NLP Scholar: An Interactive Visualization to Explore NLP Papers

NLP Scholar is a research prototype of a visual explorer to help users find related work and explore broad patterns in citations. Papers are taken from the ACL Anthology (AA). Citation information is taken from Google Scholar author profiles. (Date of last data extraction: June 2020). The NLP Scholar data and further details about the NLP Scholar Project are available at the NLP Scholar Project Home Page. While some caveats and limitations are mentioned below, a detailed list is available here.

Contact: Saif M. Mohammad (uvgotsaif@gmail.com, saif.mohammad@nrc-cnrc.gc.ca)

Responsiveness: Please allow about five seconds for the initial visualization to render. This delay is due to the limitations of the Tableau Public server, the large number of rows, and the multiple search filters. Hover information appears quickly. After a click, the updated visualization takes a few seconds to render. Visualizations on smaller subsets of the papers, e.g. ACL papers, are more responsive than those on the full set of papers.

Versions:

  • Version 4 with data from June 2020: Shown below on this page. This is the latest version. ACL Anthology and Google Scholar information extracted in June 2020. This includes more than 52K papers by about 45K authors. Citation information is available for 81.3% of these papers. Minor changes in the interface over version 3.

  • Version 3 with data from June 2019: ACL Anthology and Google Scholar information extracted in June 2019 (same as v2). Simpler design than v2. Added B3. Improved responsiveness.

  • Version 2 (ACL 2020 Demo paper version): ACL Anthology and Google Scholar information extracted in June 2019. The visualization is as described in the ACL 2020 system demo paper.

Disclaimer: This is a beta release and a research prototype. The nature of automatic extraction, alignment, and processing entails some amount of errors.

 

Visualization


 

Interactivity Options:

Scroll through the slides for a quick description of the data, the visualizations, and examples. This paper has further details.

 

Paper

Details of the visualization are available in this paper:

NLP Scholar: An Interactive Visual Explorer for Natural Language Processing Literature. Saif M. Mohammad. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020). July 2020. Seattle, USA.
Paper (pdf)    Presentation

Note that the paper (as well as other 2020 publications listed on the NLP Scholar Project Home Page) refers to work using data from June 2019.

 

Notes

  1. Citations are not always a reflection of quality or importance. They can be impacted by popularity of area, egregious self-citations, biases in citation practices, etc. One might view them as an imperfect but useful metric for examining sets of papers. For individual papers, I strongly encourage looking beyond citations; for example, use the timeline to see what papers came before the high-citation papers and may have influenced them and the field.

  2. B3 shows a box plot of citations for the current set of papers in the visualization. The y axis is in log scale. The shaded segments represent quartiles on either side of the median. The average is indicated by a dashed orange line. Brown dots on the vertical line correspond to individual papers. Hover over components to view relevant information in a tool tip. Click on a dot to be taken to the ACL Anthology landing page for the paper. As usual, click again to deselect or reset the visualization.

  3. One can make selections in the visualizations to recreate the citation box plots shown in:

    Examining Citations of Natural Language Processing Literature. Saif M. Mohammad. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020), July 2020, Seattle, USA.

    Note that in the above paper, most of the analyses were on papers published up to 2016, to allow a few years for papers to collect citations.

  4. The citation numbers listed next to authors in D correspond to the citations of their ACL Anthology papers (and not all of their papers) as of June 2019. This list can be useful when searching for researchers with expertise in a particular area. For example, entering "Arabic" in the unigram search box will show researchers that have worked on Arabic.

  5. Clicking on multiple elements (including use of sliders) causes all of the corresponding filters to be applied (ANDing). For example, clicking on "ACL" in F and the bar for 2010 in A2 will show all ACL papers from 2010.

  6. Multiple terms can be selected through the search box for Author in E. Entering multiple author names will restrict the data to papers that have at least one of them as an author.

  7. Similarly, one can explore papers on a topic by entering multiple terms associated with the topic in the Title Unigram (or Title Bigram) search box in E. Multiple entries within a search box are OR'ed. Selections across different components are AND'ed.

  8. The size (area) of a tile in F is proportional to log (#papers + 1). This choice was made so that the order of the tiles (from left to right) still follows #papers, and yet venues with a very small number of papers are shown large enough to be easy to view and select.

  9. You can download the source NLP Scholar Demo file (click icon in bottom right of viz) and open it with the free Tableau Reader on your computer for a more responsive interaction.

  10. Author name variants are derived from the list of name variants compiled by ACL Anthology. If one or more of your AA papers are listed under a different name, you can report it as a name variant to AA. The next round of data collection by NLP Scholar will then consolidate papers accordingly. In addition to the name variants file, AA applies some heuristics to automatically find name variants. The current version of NLP Scholar does not incorporate this, but we hope to do so in the future.

  11. Some entries in the ACL Anthology, such as forewords, tables of contents, and schedules, were removed as part of the preprocessing of the data.

  12. Feedback: You can provide feedback, report issues, and request corrections by sending an email to uvgotsaif@gmail.com with "NLP Scholar Feedback" in the subject line. I would especially love to hear about all the cool things it helped you discover!
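
The filter semantics in notes 5 to 7 (entries within one search box are OR'ed, selections across components are AND'ed) and the tile sizing in note 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Tableau implementation behind NLP Scholar: the toy records, the field names, and the helpers matches, select, and tile_area are all hypothetical.

```python
import math

# Hypothetical toy records. The field names are assumptions made for this
# illustration and are not the actual NLP Scholar data schema.
PAPERS = [
    {"title": "Arabic morphological analysis", "venue": "ACL",
     "year": 2010, "authors": {"A. Author"}},
    {"title": "Neural machine translation", "venue": "ACL",
     "year": 2010, "authors": {"B. Author"}},
    {"title": "Arabic dialect identification", "venue": "EMNLP",
     "year": 2014, "authors": {"A. Author", "C. Author"}},
]


def matches(paper, venues=None, years=None, authors=None, unigrams=None):
    """Return True if the paper passes every active filter.

    Values inside one filter are OR'ed (any match suffices);
    separate filters are AND'ed. A filter left as None is inactive.
    """
    title_words = set(paper["title"].lower().split())
    checks = [
        venues is None or paper["venue"] in venues,
        years is None or paper["year"] in years,
        authors is None or bool(paper["authors"] & set(authors)),
        unigrams is None or bool(title_words & {u.lower() for u in unigrams}),
    ]
    return all(checks)


def select(papers, **filters):
    """Keep only the papers that pass all active filters."""
    return [p for p in papers if matches(p, **filters)]


def tile_area(num_papers, scale=1.0):
    """Tile area proportional to log(#papers + 1), as stated in note 8."""
    return scale * math.log(num_papers + 1)


# Note 5: clicking "ACL" and the bar for 2010 applies both filters (AND).
acl_2010 = select(PAPERS, venues={"ACL"}, years={2010})

# Note 4: a single title unigram such as "Arabic".
arabic = select(PAPERS, unigrams={"Arabic"})

# Note 6: several author names keep papers with at least one of them (OR).
either_author = select(PAPERS, authors={"B. Author", "C. Author"})

# Note 7: a unigram filter combined with a venue filter (AND across components).
acl_arabic = select(PAPERS, venues={"ACL"}, unigrams={"Arabic"})
```

Because the logarithm is increasing, a venue with more papers always gets a larger tile, but the gap shrinks sharply: under this sketch, a venue with 1,000 papers gets a tile less than three times the area of a venue with 10 papers, which is why small venues remain easy to select.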

 

Screenshot of NLP Scholar When Showing the Full June 2020 Data