NLP Scholar: An Interactive Visualization to Explore NLP Papers

NLP Scholar is a research prototype of a visual explorer to help users find related work and explore broad patterns in citations. Papers are taken from the ACL Anthology (AA). Citation information is taken from Google Scholar author profiles. (Date of last data extraction: June 2020). The NLP Scholar data and further details about the NLP Scholar Project are available at the NLP Scholar Project Home Page. While some caveats and limitations are mentioned below, a detailed list is available here.

Contact: Saif M. Mohammad (uvgotsaif@gmail.com, saif.mohammad@nrc-cnrc.gc.ca); Follow @SaifMMohammad

Responsiveness: Please allow about five seconds for the initial visualization to render. This delay is due to the limitations of the Tableau Public server, the large number of rows, and the multiple search filters. Hover information appears quickly. After a click, the updated visualization takes a few seconds to render. Visualizations on smaller subsets of the papers, e.g. ACL papers, are more responsive than those on the full set of papers.

Versions:

Version 4 with data from June 2020: Shown below on this page. This is the latest version. ACL Anthology and Google Scholar information extracted in June 2020. This includes more than 52K papers by about 45K authors. Citation information is available for 81.3% of these papers. Minor changes in the interface over version 3.
Version 3 with data from June 2019: ACL Anthology and Google Scholar information extracted in June 2019 (same as v2). Simpler design than v2. Added B3. Improved responsiveness.
Version 2 (ACL 2020 Demo paper version): ACL Anthology and Google Scholar information extracted in June 2019. The visualization is as described in the ACL 2020 system demo paper.

Disclaimer: This is a beta release and a research prototype. The nature of automatic extraction, alignment, and processing entails some amount of errors.

Visualization

Interactivity Options:

Click on the pre-made tabs at the top to explore example interactions. It is easier to start your exploration with one of these pre-made tabs (e.g., Year 2014) and change the settings from there than to start from the full collection of data (shown in the last tab), which is slow to render.
Hover over an item to see an information box with relevant details. For example, hovering over the 2019 bar in A2 will show the #papers published that year as well as #papers for which we were able to extract citation information from Google Scholar author profiles. (Note that not all authors have created a Google Scholar profile, and that limits the amount of citation information we can extract automatically.)
Click on an item to filter all visualizations so that they show only information relevant to the selection. (This may take a few seconds.) Click again or press escape to undo the selection.
In B2, the colored segments correspond to individual papers. The height of a segment is proportional to the number of citations the paper has received. Hovering on a segment shows basic paper details. Clicking on a segment takes one to the landing page for the paper in the ACL Anthology. You can access the pdf, BibTeX, abstract, etc. from there. You can also access the ACL Anthology page for a paper by clicking on the paper in the Papers list C. Click again to deselect the paper.
E provides sliders to adjust the date range and the citations range. Protip: You will find it easier to click on a number on either side of the slider and enter the desired number, rather than dragging the slider. E also has search boxes to find papers by an author, or papers with user-specified words in the title (unigram or bigram).
Examine papers from a venue (e.g. ACL) or papers of a certain type (e.g. workshop papers), by clicking on corresponding tiles in F. You can select multiple tiles by holding down control/command on your keyboard and clicking or by clicking and dragging over multiple tiles.
Reset all filters by clicking on the Reset button on the bottom right or by reloading the page. Reset a pre-made tab by clicking on the curved arrow that appears on top of it every time a selection or filter changes its home state.
Null value citations: B1, B2, and B3 show information for only those AA papers for which we were able to obtain citation information. Similarly, the citation numbers listed next to authors in D are based on only those papers for which we were able to obtain citation information.

For A and C, you can control whether to include papers with no citation information (Null values) by clicking on the downward-facing arrow on the top right of the citations slider and selecting a suitable option. Hover over the top right of the slider to make the downward-facing arrow appear, then click it.

By default, in the pre-made tabs for citations above some threshold, Null value papers are not included. In the remaining tabs, they are included.

In C, the papers without citation information are listed at the bottom of the list and no number is shown to their right. In contrast, if a paper in C is listed with number 0, then it means that the paper received 0 citations at the time of data collection.
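The distinction between a Null citation value and a value of 0 can be illustrated with a minimal Python sketch. This is not the Tableau implementation behind NLP Scholar: the `Paper` type, the `order_for_paper_list` function, the sample records, and the choice to sort known values from highest to lowest are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional


class Paper(NamedTuple):
    title: str
    # None means no citation information was found (a Null value);
    # 0 means the paper had zero citations at the time of data collection.
    citations: Optional[int]


def order_for_paper_list(papers: List[Paper], include_null: bool = True) -> List[Paper]:
    """Order papers for a list like C: known citation counts first, Null values last.

    Sorting the known counts from highest to lowest is an assumption of this sketch.
    """
    known = sorted(
        (p for p in papers if p.citations is not None),
        key=lambda p: p.citations,
        reverse=True,
    )
    unknown = [p for p in papers if p.citations is None]
    return known + unknown if include_null else known


# Hypothetical records, not taken from the NLP Scholar data.
papers = [Paper("A", 12), Paper("B", None), Paper("C", 0), Paper("D", 340)]
ordered = order_for_paper_list(papers)
without_null = order_for_paper_list(papers, include_null=False)
```

Here paper "C" (0 citations) is kept among the papers with known counts, while paper "B" (Null) moves to the bottom of the list or is dropped when `include_null` is false.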

Scroll through the slides for a quick description of the data, the visualizations, and examples. This paper has further details.

Paper

Details of the visualization are available in this paper:

NLP Scholar: An Interactive Visual Explorer for Natural Language Processing Literature. Saif M. Mohammad. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020). July 2020. Seattle, USA.
Paper (pdf) Presentation

BibTeX:

@inproceedings{mohammad2020demo,
title={NLP Scholar: An Interactive Visual Explorer for Natural Language Processing Literature},
author={Mohammad, Saif M.},
booktitle={Proceedings of the 2020 annual conference of the association for computational linguistics},
address={Seattle, USA},
year={2020} }

Note that the paper, as well as the other 2020 publications listed on the NLP Scholar Project Home Page, refers to work using data from June 2019.

Notes

Citations are not always a reflection of quality or importance. They can be affected by the popularity of an area, egregious self-citations, biases in citation practices, etc. One might view them as an imperfect but useful metric for examining sets of papers. For individual papers, I strongly encourage looking beyond citations. For example, use the timeline to see which papers came before the high-citation papers and may have influenced them and the field.
B3 shows a box plot of citations for the current set of papers in the visualization. The y axis is in log scale. The shaded segments represent quartiles on either side of the median. The average is indicated by a dashed orange line. Brown dots on the vertical line correspond to individual papers. Hover over components to view relevant information in a tool tip. Click on a dot to be taken to the ACL Anthology landing page for the paper. As usual, click again to deselect or reset the visualization.
One can make selections in the visualizations to recreate the citation box plots shown in:

Examining Citations of Natural Language Processing Literature. Saif M. Mohammad. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020), July 2020, Seattle, USA.

Note that in the above paper, most of the analyses were on papers published up to 2016, to allow a few years for papers to collect citations.
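As a rough illustration of what the box plot in B3 summarizes, the following Python sketch computes quartiles, the median, and the average for a set of citation counts. The counts are made up, and `statistics.quantiles` with its default method is an assumption: Tableau may compute quartiles slightly differently.

```python
import statistics
from typing import Dict, List


def box_plot_summary(citations: List[int]) -> Dict[str, float]:
    """Return the quartiles, median, and mean of a list of citation counts."""
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(citations, n=4)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.fmean(citations),
    }


# Hypothetical citation counts with one very highly cited paper.
citations = [0, 1, 2, 4, 8, 16, 32, 64, 1000]
summary = box_plot_summary(citations)
```

With skewed data like this, a single highly cited paper pulls the average well above the median, which is one reason a box plot shows both.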
The citation numbers listed next to authors in D correspond to the citations of their ACL Anthology papers (and not all of their papers) as of June 2019. This list can be useful when searching for researchers with expertise in a particular area. For example, entering "Arabic" in the unigram search box will show researchers that have worked on Arabic.

D is not meant to be a leaderboard or a ranking of researchers. It simply provides a coarse and imperfect way of identifying possible experts in an area. A person with very few citations may well be extremely knowledgeable about an area. Also, a person's highly cited papers might be in venues that are not part of the ACL Anthology, and thus not part of the dataset being visualized here. Further, we do not have citation information for about 26% of the papers in the ACL Anthology.

Clicking on multiple elements (including use of sliders) causes all of the corresponding filters to be applied (ANDing). For example, clicking on "ACL" in F and the bar for 2010 in A2 will show all ACL papers from 2010.
Multiple terms can be selected through the search box for Author in E. Entering multiple author names will restrict the data to papers that have at least one of them as an author.
Similarly, one can explore papers on a topic by entering multiple terms associated with the topic in the Title Unigram (or Title Bigram) search box in E. Multiple entries within a search box are OR'ed. Selections across different components are AND'ed.
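The filter semantics described above (OR within a search box, AND across components) can be sketched in Python as follows. This is only an illustration, not how Tableau evaluates filters internally; the field names, the `matches` and `as_set` helpers, and the sample records are hypothetical.

```python
from typing import Any, Dict, Iterable, Set


def as_set(value: Any) -> Set[Any]:
    """Wrap a single value in a set so every field can be compared the same way."""
    if isinstance(value, (set, frozenset, list, tuple)):
        return set(value)
    return {value}


def matches(paper: Dict[str, Any], filters: Dict[str, Iterable[Any]]) -> bool:
    """Return True if the paper passes every active filter.

    Within one filter, any accepted value is enough (OR).
    Across filters, all of them must pass (AND).
    An empty filter is treated as inactive.
    """
    for field, accepted in filters.items():
        accepted = set(accepted)
        if not accepted:
            continue
        if not (as_set(paper[field]) & accepted):
            return False
    return True


# Hypothetical records, not taken from the NLP Scholar data.
papers = [
    {"title": "Paper 1", "venue": "ACL", "year": 2010, "authors": ["Author X", "Author Y"]},
    {"title": "Paper 2", "venue": "ACL", "year": 2011, "authors": ["Author Z"]},
    {"title": "Paper 3", "venue": "EMNLP", "year": 2010, "authors": ["Author Y"]},
]

# ACL AND 2010 AND (Author X OR Author Z)
filters = {"venue": {"ACL"}, "year": {2010}, "authors": {"Author X", "Author Z"}}
selected = [p["title"] for p in papers if matches(p, filters)]

# A single search box with two names: Author Y OR Author Z
by_author = [p["title"] for p in papers if matches(p, {"authors": {"Author Y", "Author Z"}})]
```

The first selection keeps only "Paper 1", because it is the only record that satisfies all three components at once. The second keeps all three papers, since each has at least one of the two named authors.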
The size (area) of a tile in F is proportional to log (#papers + 1). This choice was made so that the order of the tiles (from left to right) still follows #papers, and yet venues with a very small number of papers are shown in a large enough size to be easy to view and select.
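A small Python sketch of the log (#papers + 1) sizing shows why small venues stay visible. The venue names and counts are hypothetical, the `tile_areas` function is not part of NLP Scholar, and the natural logarithm is assumed (the base does not matter once the areas are normalized to shares).

```python
import math
from typing import Dict


def tile_areas(paper_counts: Dict[str, int]) -> Dict[str, float]:
    """Return each venue's share of the total tile area, using log(#papers + 1) weights."""
    weights = {venue: math.log(n + 1) for venue, n in paper_counts.items()}
    total = sum(weights.values())
    if total == 0:
        # Every venue has zero papers: nothing to size.
        return {venue: 0.0 for venue in weights}
    return {venue: w / total for venue, w in weights.items()}


# Hypothetical venues and paper counts.
counts = {"Venue A": 5000, "Venue B": 500, "Venue C": 5}
areas = tile_areas(counts)

# Venues sorted by area come out in the same order as venues sorted by #papers.
by_area = sorted(areas, key=areas.get, reverse=True)
by_count = sorted(counts, key=counts.get, reverse=True)
```

In this example "Venue A" has 1000 times as many papers as "Venue C", yet its tile is less than five times larger, so the small venue remains easy to click.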
You can download the source NLP Scholar Demo file (click icon in bottom right of viz) and open it with the free Tableau Reader on your computer for a more responsive interaction.
Author name variants are derived from the list of name variants compiled by the ACL Anthology. If one or more of your AA papers are listed under a different name, you can report it as a name variant to AA. The next round of data collection by NLP Scholar will then consolidate papers accordingly. In addition to the name variants file, AA applies some heuristics to automatically find name variants. The current version of NLP Scholar does not incorporate these heuristics, but we hope to do so in the future.
Some entries in the ACL Anthology, such as forewords, tables of contents, and schedules, were removed as part of the preprocessing of the data.
Feedback: You can provide feedback, report issues, and request corrections by sending an email to uvgotsaif@gmail.com with "NLP Scholar Feedback" in the subject line. I would especially love to hear about all the cool things it helped you discover!

Screenshot of NLP Scholar When Showing the Full June 2020 Data