This chapter provides detailed information about how the Natural Language Processing algorithm of Neticle works.
1. Entity & Topic Recognition
Automatic topic recognition uses a lexicon of labels, where each available label has its own set of synonyms. A label is recognized in a verbatim only if one of its synonyms can be found in the text, either by matching the beginning of a phrase or by an exact match (depending on the settings of each label). Some labels can also have excluded phrases - these prevent the label from being recognized in texts where an excluded phrase is present.
For example, the English label “import” has one synonym: import. It matches if a phrase begins with the string import, so phrases like import, importing, imported and importer are all hits. However, it also has two excluded phrases: important and importance. This means that even if a phrase begins with import (a synonym), it will not be a hit if it also begins with one of the excluded phrases. The verbatim below shows exactly that: the word “imports” in the first sentence is a hit for the topic import (hence the dotted underline), while the word “importance” in the second sentence is not.
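To make the matching rule more concrete, here is a minimal sketch in Python of how prefix-based label matching with excluded phrases could work. The function and data structures are illustrative assumptions, not Neticle’s actual implementation.

```python
# Illustrative sketch of prefix-based label matching with excluded phrases.
# The function name and data structures are hypothetical, not Neticle's code.

def phrase_matches_label(phrase: str, synonyms: list[str], excluded: list[str]) -> bool:
    """Return True if the phrase begins with a synonym but with no excluded phrase."""
    p = phrase.lower()
    if any(p.startswith(ex.lower()) for ex in excluded):
        return False
    return any(p.startswith(syn.lower()) for syn in synonyms)

# The "import" label from the example: one synonym, two excluded phrases.
synonyms = ["import"]
excluded = ["important", "importance"]

print(phrase_matches_label("imports", synonyms, excluded))     # True  -> hit
print(phrase_matches_label("importing", synonyms, excluded))   # True  -> hit
print(phrase_matches_label("importance", synonyms, excluded))  # False -> excluded phrase
```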
Besides automatic topic recognition, entities are also recognized in the analyzed texts. Zurvey.io can recognize the following entities: brands, organizations, locations and people. The first three work the same way as automatic topic recognition does: each brand, organization and location has its own synonym set with optional excluded phrases.
For example, the brand MOL has only one synonym: mol. It uses the exact match method, which means it is only a hit if the text contains the word mol as a standalone three-character string (matching is not case sensitive). This way we can be sure it will not be a hit for words like molecule or Molotov without using any excluded phrases.
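An exact-match label can be sketched the same way. Again, this is only an illustration of the rule described above, using hypothetical names rather than Neticle’s implementation.

```python
# Illustrative sketch of exact (whole-word, case-insensitive) matching,
# using the MOL example. Hypothetical code, not Neticle's implementation.

def phrase_matches_exact(phrase: str, synonyms: list[str]) -> bool:
    """Return True only if the phrase is exactly one of the synonyms (case-insensitive)."""
    return phrase.lower() in {syn.lower() for syn in synonyms}

synonyms = ["mol"]

print(phrase_matches_exact("MOL", synonyms))       # True  -> hit
print(phrase_matches_exact("molecule", synonyms))  # False -> no hit, no excluded phrase needed
```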
Name (people) recognition uses a more dynamic method: the system has access to lists of existing given names and surnames, and it tries to find possible combinations of these in each verbatim.
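One simplified way to picture this combination search is sketched below. The name lists and the adjacency-based logic are invented for the example and are not the actual recognition method.

```python
# Illustrative sketch of person-name recognition by combining known given names
# and surnames. The name lists and logic are made up for this example.

GIVEN_NAMES = {"anna", "john", "peter"}
SURNAMES = {"smith", "kovacs", "brown"}

def find_person_names(verbatim: str) -> list[str]:
    """Return adjacent word pairs where one word is a known given name and the other a surname."""
    words = verbatim.split()
    hits = []
    for first, second in zip(words, words[1:]):
        w1, w2 = first.lower().strip(".,"), second.lower().strip(".,")
        # Accept both "given name + surname" and "surname + given name" order.
        if (w1 in GIVEN_NAMES and w2 in SURNAMES) or (w1 in SURNAMES and w2 in GIVEN_NAMES):
            hits.append(f"{first} {second}")
    return hits

print(find_person_names("I spoke with John Smith about the invoice."))  # ['John Smith']
```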
2. Entity-oriented sentiment analysis
Sentiment analysis helps the user determine the sentiment of any part of the examined text - be it the whole text, a paragraph or only a subsentence. The basis of sentiment analysis is the sentiment lexicon, which consists of sentiment phrases and the numeric values associated with them, meaning every sentiment phrase in Neticle’s sentiment lexicon has a corresponding numeric value. Neticle uses a scale from -3 to +3, where -3 is the most negative, +3 is the most positive, and 0 means neutral sentiment. The system then uses these sentiment phrases to build longer phrases or to modify existing ones, for example with adverbs or negation. Neticle uses a unique sentiment lexicon for every language in its repertoire, so the sentiment analysis is carried out in the language of the original text, without translation. Finally, the numeric values are added up, resulting in the sentiment score of the whole verbatim or of each entity.
For example, in the verbatim below, one sentiment phrase is recognized, not loading, with a sentiment value of -2. Adding up the values results in the sentiment score of the whole verbatim, which in this case is -2.
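As a rough illustration of this scoring step, the sketch below sums lexicon values for recognized phrases. The lexicon entries and helper function are invented for the example, not Neticle’s actual lexicon.

```python
# Illustrative sketch of verbatim-level sentiment scoring: recognized sentiment
# phrases are looked up in a lexicon and their values are summed.
# The lexicon contents and helper are hypothetical.

SENTIMENT_LEXICON = {
    "not loading": -2,
    "easy to use": +2,
    "terrible": -3,
}

def verbatim_sentiment(recognized_phrases: list[str]) -> int:
    """Add up the lexicon values of the sentiment phrases recognized in a verbatim."""
    return sum(SENTIMENT_LEXICON.get(phrase, 0) for phrase in recognized_phrases)

print(verbatim_sentiment(["not loading"]))  # -2, as in the example above
```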
Entity-oriented sentiment analysis means that we can determine the sentiment score of each recognized entity individually, by adding up the sentiment values that occur in the same subsentence as the examined entity. In the example above, two entities are recognized - usage and load. The entity usage has no recognized sentiment phrase in its surroundings, so it gets a neutral sentiment score of 0. The entity load, on the other hand, appears in the subsentence where the sentiment phrase not loading is also present, so adding up the recognized sentiment phrases’ values in that subsentence gives it a sentiment score of -2.
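The entity-level calculation can be pictured as follows. This is a hypothetical sketch that assumes subsentence splitting and phrase recognition have already been done upstream; the data structures are not Neticle’s internal representation.

```python
# Illustrative sketch of entity-oriented sentiment: each entity's score is the
# sum of sentiment values found in the same subsentence as the entity.
# The subsentence structure below mirrors the "usage" / "load" example.

subsentences = [
    {"entities": ["usage"], "sentiment_values": []},    # no sentiment phrase here
    {"entities": ["load"], "sentiment_values": [-2]},   # "not loading" -> -2
]

def entity_sentiments(subsentences: list[dict]) -> dict[str, int]:
    """Sum sentiment values per entity, counting only the subsentence the entity appears in."""
    scores: dict[str, int] = {}
    for sub in subsentences:
        sub_score = sum(sub["sentiment_values"])
        for entity in sub["entities"]:
            scores[entity] = scores.get(entity, 0) + sub_score
    return scores

print(entity_sentiments(subsentences))  # {'usage': 0, 'load': -2}
```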
Read more about this feature on our blog
Debunking sentiment analysis: why it's useful and how accurate it is