Kenelyze’s Autumn 2020 Release: Text Networks, Improved Performance and More

The Autumn 2020 release of Kenelyze is now live for all users!

This release adds major new functionality: you can now create Text Similarity and Co-Occurrence networks from unstructured text data such as document abstracts, book summaries, product reviews, press releases or any other text available in your dataset. In addition, significant performance improvements across the board, including new implementations of the layout algorithm, centrality measures and more, let you move from data to network even faster.

An example of a Text Similarity network created in Kenelyze (based on publications related to COVID-19)

Here’s a detailed overview of what’s new and improved:

Text Similarity and Co-Occurrence Networks

Since its inception, Kenelyze has been able to create networks based on any type of structured metadata in a dataset. This release also makes it possible to create networks based on unstructured text data in the datasets you import. Text-based networks are valuable when you want to find similar records in a dataset based on textual overlap, or when you want a high-level overview of the contents of your dataset by looking at term co-occurrence in text fields of interest. These new network types can be selected during the import process, right after importing your dataset:

Kenelyze creates text networks by pre-processing the text metadata of your choice using Natural Language Processing (NLP) techniques such as tokenization, lemmatization and stopword filtering, and by creating vectors to calculate similarities between records. All of this is done 100% locally in your own browser; as usual, no data is sent to any external server when using Kenelyze. Text similarity networks can be created from text in any language; co-occurrence networks work best with data in English.
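To make the pipeline above more concrete, here is a minimal Python sketch of the general technique: tokenize, filter stopwords, build TF-IDF vectors and compare records using cosine similarity. This is not Kenelyze's actual implementation (which runs in your browser). The stopword list, the TF-IDF weighting and all function names are assumptions chosen for illustration, and lemmatization is left out to keep the example short.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are", "with"}


def tokenize(text):
    """Lowercase the text, split it into word tokens and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Turn each document into a sparse {term: weight} TF-IDF vector."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency (one common variant).
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


abstracts = [
    "Vaccine trials for coronavirus",
    "Coronavirus vaccine trials in adults",
    "Graph layout algorithms",
]
vectors = tfidf_vectors(abstracts)
# The first two abstracts share terms, the third shares none with the first.
print(cosine(vectors[0], vectors[1]) > cosine(vectors[0], vectors[2]))  # True
```

Records with a high cosine similarity share many distinctive terms, which is the kind of score a similarity threshold can then be applied to.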

When you create a Text Similarity network, you’ll be asked three questions to select the fields in your dataset you wish to use:

In the above example, Kenelyze creates a network of documents linked by textual similarity between their Abstracts (chosen in the first question), with node labels based on Titles (second question) and with each node having an additional Date Published attribute.

You can change the similarity threshold which Kenelyze uses to link documents in the Advanced import settings panel pictured above. A value of 0.15 generally works well for medium-length text (summaries, abstracts, longer product reviews, etc.). For shorter pieces of text (document titles, short reviews, etc.), a higher value of around 0.4-0.5 is recommended. Kenelyze also automatically clusters the network by calculating communities and focuses on high-weight links between clusters to improve the readability of the visualization. You can disable these settings using the checkboxes if you wish.
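As a rough illustration of what the threshold does, the Python sketch below turns a matrix of pairwise similarity scores into a list of links. The function name, the input format and the choice to keep scores equal to the threshold are assumptions for this example, not a description of Kenelyze's internals.

```python
def build_links(similarity, threshold=0.15):
    """Return (i, j, weight) links for record pairs scoring at or above the threshold.

    `similarity` is a symmetric matrix (list of lists) of pairwise scores.
    """
    links = []
    n = len(similarity)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once, no self-links
            if similarity[i][j] >= threshold:
                links.append((i, j, similarity[i][j]))
    return links


# Hypothetical similarity scores for three records.
scores = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.05],
    [0.2, 0.05, 1.0],
]
print(len(build_links(scores, threshold=0.15)))  # 2
print(len(build_links(scores, threshold=0.45)))  # 1
```

Raising the threshold keeps only the strongest links, which leaves a sparser network when short texts produce many weak matches.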

When creating a term co-occurrence network, you’ll be asked the following questions:

The advanced settings for this network type allow you to change minimum frequencies for nodes and links, exclude low-quality terms (terms which have a disproportionately high number of highly weighted connections), and set whether you wish to automatically calculate communities:
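To illustrate how minimum frequencies shape a co-occurrence network, here is a short Python sketch. It assumes the terms of each record have already been extracted, counts a term or term pair once per record, and uses hypothetical parameter names (min_node_freq, min_link_freq); Kenelyze's own counting rules may differ.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(term_lists, min_node_freq=2, min_link_freq=2):
    """Build a term co-occurrence network from per-record term lists.

    Returns (nodes, links): nodes maps term -> number of records containing it,
    links maps (term_a, term_b) -> number of records containing both.
    """
    node_freq = Counter()
    link_freq = Counter()
    for terms in term_lists:
        unique = sorted(set(terms))  # count each term once per record
        node_freq.update(unique)
        link_freq.update(combinations(unique, 2))
    nodes = {t: f for t, f in node_freq.items() if f >= min_node_freq}
    links = {
        pair: f
        for pair, f in link_freq.items()
        if f >= min_link_freq and pair[0] in nodes and pair[1] in nodes
    }
    return nodes, links


# Hypothetical terms extracted from four product reviews.
reviews = [
    ["screen", "battery", "price"],
    ["screen", "battery"],
    ["battery", "price"],
    ["screen", "camera"],
]
nodes, links = cooccurrence_network(reviews)
print(nodes)  # {'battery': 3, 'price': 2, 'screen': 3}
print(links)  # {('battery', 'price'): 2, ('battery', 'screen'): 2}
```

Here "camera" is dropped because it appears in only one review, and the price-screen pair is dropped because the two terms occur together only once.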

Here’s an example of a text co-occurrence network based on a dataset containing product reviews for an Amazon tablet:

Labeling Groups

To help you in your exploration of networks based on text, Kenelyze now also includes functionality to automatically label groups/clusters detected in your network by analyzing the contents of an attribute of your choice. This can be found under the Groups button in the menu bar:

You pick an attribute which defines groups in your network (for example, Communities calculated using Kenelyze) and choose the metadata you want to analyze (for example, a summary associated with a node). Kenelyze then labels each group using the highest-scoring noun phrases or nouns detected in it. In the example network at the top of this post, the labels visible in the top left panel were created using this new functionality.
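One way to think about "highest-scoring" terms is sketched below in Python: a term scores higher the more often it occurs within a group and the fewer other groups it occurs in. The scoring formula, the function name and the assumption that nouns have already been extracted per node are all illustrative; the post does not describe how Kenelyze scores terms.

```python
import math
from collections import Counter, defaultdict


def label_groups(node_groups, node_terms, top_n=2):
    """Label each group with its top_n most distinctive terms.

    node_groups: {node: group id}; node_terms: {node: [terms]} (assumed to be
    nouns or noun phrases that were extracted beforehand).
    """
    group_counts = defaultdict(Counter)
    for node, group in node_groups.items():
        group_counts[group].update(node_terms.get(node, []))
    n_groups = len(group_counts)
    # In how many groups does each term occur at all?
    groups_with_term = Counter(t for counts in group_counts.values() for t in counts)
    labels = {}
    for group, counts in group_counts.items():
        scores = {
            term: count * math.log(1 + n_groups / groups_with_term[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        labels[group] = ranked[:top_n]
    return labels


# Hypothetical network: four nodes in two communities.
groups = {"a": 0, "b": 0, "c": 1, "d": 1}
terms = {
    "a": ["vaccine", "study"],
    "b": ["vaccine", "trial"],
    "c": ["layout", "study"],
    "d": ["layout", "graph"],
}
print(label_groups(groups, terms))  # {0: ['vaccine', 'trial'], 1: ['layout', 'graph']}
```

The term "study" appears in both groups, so it scores lower than terms that are specific to one group.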

Major Performance Improvements: Layouts & Metrics

Included in this new release are complete rewrites of the implementations of the layout algorithm and various metrics (betweenness, closeness, reach, eccentricity and diameter), resulting in significant overall calculation speed-ups. Generating layouts for networks containing thousands of nodes is faster than ever before – something which is especially noticeable when importing a dataset and generating a network.

New Metrics: Link Betweenness Centrality, Weighted Betweenness

It is now also possible to calculate the betweenness centrality of links to identify bridges in your network. In addition, Kenelyze can now take link weights into account when calculating betweenness centrality (based on Dijkstra’s shortest path algorithm). You can find these options right next to the button to calculate betweenness:
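For readers curious about how these metrics can be computed, the Python sketch below combines Dijkstra's shortest path search with Brandes-style accumulation to obtain both node and link betweenness. It is an illustration of the general technique, not Kenelyze's code. It assumes an undirected graph given as {node: {neighbor: length}} with positive lengths, where a smaller length means a closer connection; how Kenelyze translates link weights into path lengths is not covered in this post. Node names must be sortable, since links are stored as sorted pairs.

```python
import heapq
from collections import defaultdict
from itertools import count


def weighted_betweenness(graph):
    """Return (node_bc, link_bc) for an undirected graph with positive lengths."""
    node_bc = dict.fromkeys(graph, 0.0)
    link_bc = defaultdict(float)
    for source in graph:
        # Dijkstra from `source`, counting shortest paths (sigma) along the way.
        dist = {source: 0.0}
        sigma = defaultdict(float)
        sigma[source] = 1.0
        preds = defaultdict(list)
        order, done = [], set()
        tie = count()  # tie-breaker so the heap never compares nodes
        heap = [(0.0, next(tie), source)]
        while heap:
            d, _, v = heapq.heappop(heap)
            if v in done:
                continue  # stale heap entry
            done.add(v)
            order.append(v)
            for w, length in graph[v].items():
                nd = d + length
                if w not in dist or nd < dist[w]:
                    dist[w] = nd
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    heapq.heappush(heap, (nd, next(tie), w))
                elif nd == dist[w] and w not in done:
                    sigma[w] += sigma[v]  # another equally short path
                    preds[w].append(v)
        # Accumulate dependencies from the farthest node back to the source.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                link_bc[tuple(sorted((v, w)))] += share
                delta[v] += share
            if w != source:
                node_bc[w] += delta[w]
    # Each pair of nodes was visited in both directions, so halve the totals.
    node_bc = {n: s / 2 for n, s in node_bc.items()}
    link_bc = {e: s / 2 for e, s in link_bc.items()}
    return node_bc, link_bc


# A triangle in which the direct A-C link is much longer than the detour via B.
triangle = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1},
}
node_bc, link_bc = weighted_betweenness(triangle)
print(node_bc["B"])  # 1.0
print(link_bc)       # {('A', 'B'): 2.0, ('B', 'C'): 2.0}
```

In this example the shortest route between A and C runs through B, so B acts as the bridge and the direct A-C link carries no shortest paths at all. With equal lengths on all three links, no node would lie between any other pair.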

Community Detection Settings

When detecting communities/clusters in your network, it is now also possible to determine the granularity of the resulting communities by setting a resolution parameter. Lower values generally lead to fewer/larger communities; higher values lead to more/smaller communities. To increase the readability of layouts in dense networks, you can now also choose to only show the highest-weight links between detected communities.
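To give a feel for why a resolution parameter changes granularity, the Python sketch below computes modularity with a resolution term, a quality score used by many community detection methods. This post does not state which algorithm Kenelyze uses, so treat this purely as an illustration under that assumption; the function name and input format are made up for the example.

```python
def modularity(edges, communities, resolution=1.0):
    """Modularity of a partition of an undirected graph, with a resolution term.

    edges: list of (u, v, weight); communities: {node: community id}.
    """
    total_weight = sum(w for _, _, w in edges)
    internal = {}  # total link weight inside each community
    degree = {}    # total weighted degree of each community
    for u, v, w in edges:
        cu, cv = communities[u], communities[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    return sum(
        internal.get(c, 0.0) / total_weight
        - resolution * (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


# Two triangles joined by a single link.
edges = [
    (1, 2, 1.0), (2, 3, 1.0), (1, 3, 1.0),
    (4, 5, 1.0), (5, 6, 1.0), (4, 6, 1.0),
    (3, 4, 1.0),
]
merged = {n: 0 for n in range(1, 7)}
split = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}

# A low resolution favors one large community, a higher one favors the split.
print(modularity(edges, merged, 0.1) > modularity(edges, split, 0.1))  # True
print(modularity(edges, split, 1.0) > modularity(edges, merged, 1.0))  # True
```

A method that searches for the partition with the best score will therefore return fewer, larger communities at low resolution and more, smaller ones at high resolution.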

Improved Data Import Process

To move from data to insights even faster, Kenelyze’s default setting is now to pre-calculate communities in your network after the initial data import of any network type. The improved layout algorithm now also stops automatically when a good initial layout has been reached.

Explore Mode Enhancements

When working with large datasets, it is often worthwhile to use Kenelyze’s Explore Mode to iteratively build up your network based on specific nodes of interest. This release brings various enhancements related to Explore Mode, including smoother expansion animations and the ability to expand selections of nodes.

That’s it for now. Thanks to all users for their continued feedback! Many of the features and improvements listed above are the result of your suggestions and comments. If you need any support with the new features, please drop us a line at support@kenelyze.com.

If you’re interested in a demonstration of the platform, please let us know via the ‘Try Now’ button above. We’ll get back to you as soon as possible.

Kenelyze is a product by Kenedict Innovation Analytics