NLP Scholar: An Interactive Visualization to Explore NLP Papers v3

See latest version (v4) here.

Papers are taken from the ACL Anthology (AA). Citation information is taken from Google Scholar author profiles. (Date of data extraction: June 2019). We were able to obtain citation information for about 74% of the papers. The NLP Scholar data and further details about the NLP Scholar Project are available at the NLP Scholar Project Home Page. While some caveats and limitations are mentioned below, a detailed list is available here.

Contact: Saif M. Mohammad (uvgotsaif@gmail.com, saif.mohammad@nrc-cnrc.gc.ca)

Interactivity:

  • Click on the pre-made tabs at the top to explore example interactions. It is easier to start with one of these pre-made tabs (e.g., Year 2016) and change the settings from there than to start from the full collection of data (shown in the last tab). The visualization is slow to render the full data.

  • Hover over an item to see an information box with relevant details.

  • Click on an item to filter all visualizations so that they show only information relevant to the selection. (This may take a few seconds.) Click again or press escape to undo the selection.

  • In B2, the colored segments correspond to individual papers. The height of a segment is proportional to the number of citations the paper has received. Hovering on a segment shows basic paper details. Clicking on a segment takes one to the landing page for the paper in the ACL Anthology. You can access the pdf, BibTeX, abstract, etc. from there. You can also access the ACL Anthology page for a paper by clicking on the paper in the Papers list C. Click again to deselect the paper.

  • E provides a slider to adjust the date range, as well as search boxes to show papers by an author or papers with user-specified words in the title (unigram or bigram).

  • Examine papers from a venue (e.g. ACL) or papers of a certain type (e.g. workshop papers), by clicking on corresponding tiles in F. You can select multiple tiles by holding down control/command on your keyboard and clicking or by clicking and dragging over multiple tiles.

  • Reset all filters by clicking on the Reset button on the bottom right or by reloading the page. Reset a pre-made tab to its home state by clicking on the curved arrow that appears on top of it every time a selection is made or a filter is applied.

Scroll through the slides for a quick description of the data, the visualizations, and examples. This paper has further details.

Responsiveness: Please allow for about five to ten seconds for the initial visualization to render. This delay is due to the limitations of the Tableau Public server, the large number of rows, and multiple search filters. Hover information is rather responsive. After clicking, it takes a few seconds to render the updated visualization. Visualizations on smaller subsets of the papers, e.g. ACL papers, are more responsive than when working with the full set of ~45K papers.

ACL 2020 Demo paper version: The visualization below is an updated and more responsive version of the work described in the ACL 2020 demo paper. We recommend using this updated version. However, you can also explore the ACL 2020 demo paper version here.

Citations: B1, B2, and B3 show information for only those AA papers for which we were able to obtain citation information. Similarly, the citation numbers listed next to authors in D are based on only those papers for which we were able to obtain citation information. In C, the papers without citation information are listed at the bottom without any number. In contrast, if a paper in C is listed with the number 0, then the paper had received 0 citations at the time of data collection.
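The distinction between "no citation information" and "0 citations" can be illustrated with a minimal Python sketch. It is not the code behind the visualization: the records, the function names, and the assumption that list C is ordered by descending citation count are all hypothetical. The only behavior taken from the description above is that papers without citation information are excluded from citation statistics and placed at the bottom of the list.

```python
# Hypothetical records: (title, citations).
# None means no citation information was obtained for the paper;
# 0 means the paper was found and had zero citations at extraction time.
papers = [
    ("Paper A", 12),
    ("Paper B", None),
    ("Paper C", 0),
    ("Paper D", 340),
]


def sort_for_paper_list(records):
    """Order papers by citations (descending, an assumption); unknowns go last."""
    return sorted(
        records,
        # First key component pushes None entries to the bottom;
        # second sorts the rest by descending citation count.
        key=lambda rec: (rec[1] is None, -(rec[1] or 0)),
    )


def with_citation_info(records):
    """Keep only papers usable in citation statistics (as in B1, B2, B3, D)."""
    return [rec for rec in records if rec[1] is not None]


ordered = sort_for_paper_list(papers)
cited = with_citation_info(papers)
```

Here "Paper C" stays in the citation statistics with a count of 0, while "Paper B" is dropped from them and appears last in the ordered list.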

Disclaimer: This is a beta release and a research prototype. The nature of automatic extraction, alignment, and processing entails some amount of errors.

 

Visualization


 

Paper

Details of the visualization are available in this paper:

NLP Scholar: An Interactive Visual Explorer for Natural Language Processing Literature. Saif M. Mohammad. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020). July 2020. Seattle, USA.
Paper (pdf)    Presentation

Please cite the above paper if you use the visualization or screenshots from it.

 

Notes

  1. B3 shows a box plot of citations for the current set of papers in the visualization. The y axis is in log scale. The shaded segments represent quartiles on either side of the median. The average is indicated by a dashed orange line. Brown dots on the vertical line correspond to individual papers. Hover over components to view relevant information in a tool tip. Click on a dot to be taken to the ACL Anthology landing page for the paper. As usual, click again to deselect or reset the visualization.

  2. One can make selections in the visualizations to recreate the citation box plots shown in:

    Examining Citations of Natural Language Processing Literature. Saif M. Mohammad. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020), July 2020, Seattle, USA.

    Note that in the above paper, most of the analyses were on papers published up to 2016, to allow a few years for papers to collect citations.

  3. The citation numbers listed next to authors in D correspond to the citations of their ACL Anthology papers (and not all of their papers) as of June 2019. This list can be useful when searching for researchers with expertise in a particular area. For example, entering "Arabic" in the unigram search box will show researchers that have worked on Arabic.

  4. Clicking on multiple elements (including use of sliders) causes all of the corresponding filters to be applied (ANDing). For example, clicking on "ACL" in F and the bar for 2010 in A2 will show all ACL papers from 2010.

  5. Multiple terms can be selected through the search box for Author in E. Entering multiple author names will restrict the data to all papers that have at least one of them as authors.

  6. Similarly, one can explore papers on a topic by entering multiple terms associated with the topic in the Title Unigram (or Title Bigram) search box in E. Multiple entries within a search box are OR'ed. Selections across different components are AND'ed (see note 4).

  7. The size (area) of a tile in F is proportional to log(#papers + 1). This choice was made so that the tiles are still ordered (from left to right) by #papers, and yet venues with a very small number of papers are shown in a large enough size to be easy to view and select.

  8. Date of data extraction from ACL Anthology and Google Scholar is June 2019. The ACL 2020 publications listed on the NLP Scholar Project Home Page are based on this data. We will do a fresh data extraction soon.

  9. Author name variants are derived from the list of name variants compiled by ACL Anthology. If one or more of your AA papers were published under a different name, you can report it as a name variant to AA. The next round of data collection by NLP Scholar will then consolidate papers accordingly. In addition to the name variants file, AA applies some heuristics to automatically find name variants. The current version of NLP Scholar does not incorporate these heuristics, but we hope to do so in the future.

  10. You can provide feedback, report issues, and request corrections by sending an email to uvgotsaif@gmail.com with "NLP Scholar Feedback" in the subject line.
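
The filter semantics in notes 4 to 6 and the tile sizing in note 7 can be sketched in Python as follows. This is only an illustration under stated assumptions, not how the Tableau visualization is implemented: the Paper class, the matches and tile_area functions, and the sample records are all hypothetical. The sketch shows that entries within one search box are OR'ed, that selections across components are AND'ed, and that log(#papers + 1) compresses large differences in venue size.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical minimal record for one paper."""
    title: str
    authors: tuple
    venue: str
    year: int


def matches(paper, authors=(), title_unigrams=(), venues=(), year_range=None):
    """Return True if the paper passes every active filter.

    Entries inside one filter are OR'ed (any); active filters are AND'ed (all).
    An empty filter is inactive and does not restrict the result.
    """
    title_words = set(paper.title.lower().split())
    checks = []
    if authors:
        checks.append(any(a in paper.authors for a in authors))
    if title_unigrams:
        checks.append(any(w.lower() in title_words for w in title_unigrams))
    if venues:
        checks.append(paper.venue in venues)
    if year_range is not None:
        first, last = year_range
        checks.append(first <= paper.year <= last)
    return all(checks)


def tile_area(num_papers):
    """Relative tile area, proportional to log(#papers + 1) as in note 7."""
    return math.log(num_papers + 1)


# Made-up sample data for illustration only.
papers = [
    Paper("Arabic Dialect Identification", ("A. Author",), "ACL", 2010),
    Paper("Neural Machine Translation", ("B. Author",), "ACL", 2016),
    Paper("Arabic Sentiment Lexicons", ("B. Author", "C. Author"), "EMNLP", 2010),
]

# Note 4: venue AND year.
acl_2010 = [p for p in papers if matches(p, venues=("ACL",), year_range=(2010, 2010))]
# Note 5: papers with at least one of the listed authors.
by_author = [p for p in papers if matches(p, authors=("A. Author", "C. Author"))]
# Note 6: (arabic OR translation) AND venue.
topic_acl = [p for p in papers
             if matches(p, title_unigrams=("arabic", "translation"), venues=("ACL",))]
```

With the logarithm, a venue with 1000 papers gets a tile less than three times the area of a venue with 10 papers, so small venues remain easy to click.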