Who’s Who in Large Language Model Science? Mapping Science as a Graph

Interactive version of the visualization: https://kenedict.com/networks/llm

Large Language Models and their applications are here to stay, with key players such as Microsoft, Google, Baidu and Alibaba all having announced ChatGPT-like functionality or products. Of course, there’s a lot of science behind these achievements. Let’s turn to graphs to find out more about the key players and their publications.

Data Collection & Preparation

OpenAlex’s excellent API was used to collect publications containing relevant terms in their title or abstract metadata. The terms used for the search were:

“large language model”
“generative language model”
“autoregressive language model”
“transformer language model”
“transformer-based language model”
“large transformer model”
“generative pre-trained transformer”

This is by no means an exhaustive list of relevant keywords, but it should give us enough relevant data to work with for the example presented here.
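As a rough illustration of how such a search could be scripted, the sketch below builds one OpenAlex request per search term and follows cursor pagination. It assumes the public /works endpoint and its title_and_abstract.search filter; the helper names (build_params, fetch_works, collect) are made up for this example, and this is not necessarily how the data for this post was collected.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.openalex.org/works"

SEARCH_TERMS = [
    "large language model",
    "generative language model",
    "autoregressive language model",
    "transformer language model",
    "transformer-based language model",
    "large transformer model",
    "generative pre-trained transformer",
]


def build_params(term, cursor="*", per_page=200):
    """Query parameters for one page of results for a single phrase."""
    return {
        # Double quotes around the term request an exact phrase match.
        "filter": f'title_and_abstract.search:"{term}"',
        "per-page": str(per_page),
        "cursor": cursor,
    }


def fetch_works(term):
    """Yield every work matching `term`, following cursor pagination."""
    cursor = "*"
    while cursor:
        url = API_URL + "?" + urllib.parse.urlencode(build_params(term, cursor))
        with urllib.request.urlopen(url) as response:
            page = json.load(response)
        yield from page["results"]
        # next_cursor is null on the last page, which ends the loop.
        cursor = page["meta"].get("next_cursor")


def collect(terms=SEARCH_TERMS):
    """Run all searches; keyed by OpenAlex ID so a work matched by
    several terms is stored only once."""
    works = {}
    for term in terms:
        for work in fetch_works(term):
            works[work["id"]] = work
    return works
```

Calling collect() performs the actual network requests; nothing is fetched when the functions are merely defined.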

The output was filtered to only include records which contain information on the participating institutions. Duplicate records (for example, the same paper posted on multiple sources) were removed manually. The institutions metadata was finally cleaned using OpenRefine.
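The filtering step can be approximated in a few lines. The sketch below assumes each record follows the OpenAlex work layout (an authorships list whose entries carry an institutions list with display_name values). Duplicates were removed manually for this post; the title-based grouping here is only a hypothetical aid that flags candidates for such a manual review, not an automatic replacement for it.

```python
import re
from collections import defaultdict


def institution_names(work):
    """Distinct institution names on a work, in order of appearance."""
    names = []
    for authorship in work.get("authorships", []):
        for inst in authorship.get("institutions", []):
            name = inst.get("display_name")
            if name and name not in names:
                names.append(name)
    return names


def has_institutions(work):
    """True if at least one author lists an institution."""
    return bool(institution_names(work))


def title_key(work):
    """Lowercased title with punctuation and extra whitespace collapsed."""
    title = work.get("title") or ""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def duplicate_candidates(works):
    """Group work IDs that share a normalized title, for manual review."""
    groups = defaultdict(list)
    for work in works:
        groups[title_key(work)].append(work["id"])
    return {key: ids for key, ids in groups.items() if key and len(ids) > 1}
```

A typical use would be to keep only the works for which has_institutions returns True, then inspect the groups returned by duplicate_candidates by hand before deciding which copy to keep.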

Creating the graph in Kenelyze

To take a look at the institutions behind LLM-related science, let’s create a graph consisting of two node types: Publications and Institutions. We can do this in Kenelyze by picking the ‘Multiple Node Types’ network type when importing data, and then selecting our columns of interest in the dataset.
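For a two-node-type import, the data needs to pair each publication with its institutions. One plausible shape is a flat table with one row per publication-institution pairing, sketched below. The column names "Publication" and "Institution" and the helper names are assumptions for this example; check the import dialog for the layout Kenelyze actually expects.

```python
import csv
import io


def edge_rows(works):
    """One (publication title, institution name) row per distinct pairing."""
    rows = []
    for work in works:
        seen = set()  # avoid repeating an institution within one work
        for authorship in work.get("authorships", []):
            for inst in authorship.get("institutions", []):
                name = inst.get("display_name")
                if name and name not in seen:
                    seen.add(name)
                    rows.append((work["title"], name))
    return rows


def write_edge_csv(rows, handle):
    """Write the rows as CSV with a header line (column names assumed)."""
    writer = csv.writer(handle)
    writer.writerow(["Publication", "Institution"])
    writer.writerows(rows)


# Small made-up sample to show the resulting table.
sample = [
    {"title": "Paper One", "authorships": [
        {"institutions": [{"display_name": "Uni A"}]},
        {"institutions": [{"display_name": "Uni A"}, {"display_name": "Uni B"}]},
    ]},
]
buffer = io.StringIO()
write_edge_csv(edge_rows(sample), buffer)
print(buffer.getvalue())
```

Each row becomes one link in the graph, with the two columns supplying the two node types.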

After importing the data, Kenelyze automatically generates a layout and colors nodes by their communities.

Many graphs consist of multiple connected components, i.e. groups of nodes in which every node can reach every other node directly or indirectly, and which have no links to the rest of the network. For this example, we’ll be looking at the largest connected component of papers and institutions. You can determine the components in your graph using the Components button in Kenelyze’s Metrics menu.
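To make the idea of components concrete, here is a minimal sketch that finds them with a breadth-first search over an undirected edge list and picks the largest one. It is a generic textbook approach, not a description of how Kenelyze computes components internally. Nodes are written as (type, name) tuples, an assumed convention that keeps a publication and an institution with the same name apart.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets from (node, node) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Return a list of node sets, one per connected component."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects its component.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def largest_component(edges):
    """Node set of the biggest component (empty set for no edges)."""
    return max(connected_components(build_adjacency(edges)), key=len, default=set())


# Made-up example: P1 and P2 share institution B, P3 stands apart.
edges = [
    (("pub", "P1"), ("inst", "A")),
    (("pub", "P1"), ("inst", "B")),
    (("pub", "P2"), ("inst", "B")),
    (("pub", "P3"), ("inst", "C")),
]
print(sorted(largest_component(edges)))
```

In this bipartite setting, two papers end up in the same component as soon as a chain of shared institutions links them.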

Here’s what the final network looks like, after adjusting some visualization settings to show more labels:

Exporting interactive visuals

To share this visual with others, you can have Kenelyze export it to a fully interactive, single HTML file using the Export Visual button in the menu bar.

The final visual can be explored here: https://kenedict.com/networks/llm

Of course, this is just one of many perspectives graphs can give on this specific dataset. In other types of analysis, we could examine collaboration networks by connecting institutions when they co-appear on papers, or apply text-based clustering to get a look at clusters of topics and themes.
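The collaboration view mentioned above amounts to projecting the publication-institution graph onto institutions. A minimal sketch, assuming the input is a list of (publication, institution) pairs and using a hypothetical helper name, counts how often each pair of institutions co-appears on a paper:

```python
from collections import Counter, defaultdict
from itertools import combinations


def collaboration_edges(pub_inst_rows):
    """Map each institution pair to the number of papers they share."""
    by_publication = defaultdict(set)
    for publication, institution in pub_inst_rows:
        by_publication[publication].add(institution)

    weights = Counter()
    for institutions in by_publication.values():
        # Sorting gives each pair a single canonical ordering.
        for pair in combinations(sorted(institutions), 2):
            weights[pair] += 1
    return weights


# Made-up example rows.
rows = [
    ("P1", "A"), ("P1", "B"),
    ("P2", "A"), ("P2", "B"), ("P2", "C"),
]
for (left, right), count in sorted(collaboration_edges(rows).items()):
    print(left, right, count)
```

The resulting counts can serve as edge weights in a single-node-type network of institutions.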

Interested in doing this for other datasets, or want to find out more about how Kenelyze can support you in going from data to graph-powered insights? Let us know at info@kenelyze.com

Kenelyze is a product by Kenedict Innovation Analytics