Here you can find the details of the analysis settings options.
1. Text Splitting
Text splitting is the basis of every text analysis process, so it is the very first step of any analysis. It defines the smallest independent unit of the analysis. Text splitting relies on the punctuation in the text, and splitting the text into subsentences is the most intuitive method in NLP solutions. For every Zurvey.io analysis, splitting the text into subsentences is the default. However, there are two other options: sentences and paragraphs. The splitting method can be set right before starting an analysis.
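To make the three splitting levels concrete, here is a minimal sketch that splits text into paragraphs (on blank lines), sentences (after `.`, `!` or `?`) or subsentences (additionally after `,`, `;` or `:`). The function name `split_text` and the punctuation rules are illustrative assumptions only; they are not Zurvey.io's actual splitting rules.

```python
import re

# Illustrative rules only (assumed), not Zurvey.io's actual splitter.
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")
SUBSENTENCE_BREAK = re.compile(r"(?<=[.!?,;:])\s+")


def split_text(text: str, unit: str = "subsentence") -> list[str]:
    """Split text into 'paragraph', 'sentence' or 'subsentence' units."""
    # Paragraphs are split first so that no unit ever crosses a paragraph border.
    paragraphs = [p.strip() for p in PARAGRAPH_BREAK.split(text) if p.strip()]
    if unit == "paragraph":
        return paragraphs
    pattern = {"sentence": SENTENCE_BREAK, "subsentence": SUBSENTENCE_BREAK}.get(unit)
    if pattern is None:
        raise ValueError(f"unknown unit: {unit!r}")
    return [part.strip() for p in paragraphs for part in pattern.split(p) if part.strip()]


sample = "Great app, easy to use. Support was slow.\n\nWould recommend."
print(split_text(sample, "paragraph"))
print(split_text(sample, "sentence"))
print(split_text(sample))  # subsentence is the default, as in Zurvey.io
```

The coarser the unit, the fewer and longer the resulting pieces, which is what drives the differences described in the examples that follow.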
Changing the text splitting method affects entity-oriented sentiment analysis (more on that in Entity-oriented sentiment analysis) and custom label recognition when the AND operator is used (more on that in Custom label creation and management).
When the smallest examined unit of the analysis is the subsentence, a topic is influenced only by the sentiment values of the sentiment phrases occurring in the same subsentence, which gives an intuitive result, as in the following example. The topic "product" takes the sentiment value of "itself works great", since the two are in the same subsentence; therefore the sentiment score of "product" is +2. The topic "job" is in a subsentence with no recognized sentiment phrase, so it has a neutral sentiment score, while the topic "customer service" is negative due to the negative sentiment value of the phrase "is unsatisfactory" in the same subsentence.
After changing the text splitting to sentence, the example looks as follows. Subsentence borders no longer apply, so every sentiment phrase influences the sentiment score of every topic within the same sentence. The topic "job" now has a positive sentiment score due to the sentiment phrase "itself works great" in the same sentence.
Changing the text splitting method to paragraph changes the example further. Now even sentence borders no longer apply, and every sentiment phrase influences the sentiment score of every topic within the same paragraph. This time all recognized topics are neutral, because each topic's sentiment score is the sum of all sentiment phrases in the paragraph, and in this example the positive and negative values cancel each other out.
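The scoring behaviour in the three examples above can be reproduced with a small sketch: a topic receives the sum of the sentiment values of all sentiment phrases in the same unit. This is a simplified model under stated assumptions, not Zurvey.io's implementation. The pre-annotated nested structure, the function name `topic_scores`, the value -2 for "is unsatisfactory", and the placement of "customer service" in a separate sentence are all hypothetical, chosen so that the outcomes match the example.

```python
# Simplified model (assumed): a topic's score is the sum of all sentiment
# values found in the same unit (subsentence, sentence or paragraph).

# Hypothetical annotated text: paragraphs -> sentences -> subsentences.
EXAMPLE = [
    [  # paragraph 1
        [  # sentence 1
            {"topics": ["product"], "sentiments": [("itself works great", 2)]},
            {"topics": ["job"], "sentiments": []},
        ],
        [  # sentence 2 (the -2 value is an assumption)
            {"topics": ["customer service"], "sentiments": [("is unsatisfactory", -2)]},
        ],
    ],
]


def topic_scores(doc, unit="subsentence"):
    """Return {topic: score} for the chosen text splitting unit."""
    scores = {}
    for paragraph in doc:
        subsentences = [sub for sentence in paragraph for sub in sentence]
        if unit == "paragraph":
            groups = [subsentences]
        elif unit == "sentence":
            groups = paragraph  # each sentence is already a list of subsentences
        elif unit == "subsentence":
            groups = [[sub] for sub in subsentences]
        else:
            raise ValueError(f"unknown unit: {unit!r}")
        for group in groups:
            # Every sentiment phrase in the unit counts for every topic in the unit.
            total = sum(value for sub in group for _phrase, value in sub["sentiments"])
            for sub in group:
                for topic in sub["topics"]:
                    scores[topic] = scores.get(topic, 0) + total
    return scores


for unit in ("subsentence", "sentence", "paragraph"):
    print(unit, topic_scores(EXAMPLE, unit))
```

Under these assumptions, "job" turns positive at the sentence level, and all three topics become neutral at the paragraph level because +2 and -2 cancel out.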
Based on the above examples, splitting the text into subsentences is the most intuitive choice for entity-oriented sentiment analysis. However, there is another case where changing the text splitting method comes in handy: custom labels that use the AND operator. Two or more parts of a synonym or excluded phrase connected with the AND operator produce a hit only if all of them occur within the same subsentence, sentence, or paragraph, depending on which splitting method was chosen before starting the analysis (more on that in Custom label creation and management).
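The effect of the splitting unit on an AND rule can be illustrated with a minimal sketch: the rule is a hit only if one single unit contains every part. The function name `and_rule_matches`, the example rule `delivery AND damaged`, and the plain case-insensitive substring matching are hypothetical simplifications; the real engine's matching is not described here.

```python
def and_rule_matches(units: list[str], parts: list[str]) -> bool:
    """True if all parts of the AND rule occur within one and the same unit.

    Simplification (assumed): case-insensitive substring matching.
    """
    return any(all(part.lower() in unit.lower() for part in parts) for unit in units)


# The same mention, split in two different ways (hypothetical example).
by_subsentence = ["The delivery arrived on time,", "but the box was damaged."]
by_sentence = ["The delivery arrived on time, but the box was damaged."]
rule = ["delivery", "damaged"]  # hypothetical custom label rule: delivery AND damaged

print(and_rule_matches(by_subsentence, rule))  # False: the parts are in different subsentences
print(and_rule_matches(by_sentence, rule))     # True: both parts are in the same sentence
```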
2. Accented Characters
When working with texts that may contain accented characters, you need to decide if you want to replace those characters with their non-accented counterparts for the analysis or not. The replacement only affects the analysis in the background, the text will be displayed in its original form on the portal and in the exports as well. By default, replacing the accented characters is turned on when starting a new analysis.
Replacing accented characters can make a difference when users consistently type without accents or when unexpected accents appear in the text due to misspellings. For example, the following Hungarian mention contains unexpected accents on the character e (the mention means: "+2 hours would not increase the efficiency, I would work that much less"). When accented characters are replaced by their non-accented counterparts for the analysis, the sentiment is correctly recognized ("would not increase the efficiency" as negative), and the topic hatékonyság (efficiency) is also correctly recognized.
However, when replacement is switched off, neither the sentiment nor the topic is recognized, because hatèkony is not a synonym of the topic hatékonyság and the sentiment lexicon does not contain versions of phrases with unexpected accents.
We do not recommend switching this setting off unless you are certain that your textual input is clean and does not contain misspelled words with unexpected accents.
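A common way to replace accented characters with their non-accented counterparts is Unicode NFD decomposition followed by dropping the combining marks. The sketch below uses only Python's standard `unicodedata` module; whether Zurvey.io uses this exact technique is an assumption, and `matches_synonym` with the synonym "hatékony" is a hypothetical illustration. Note that characters without a decomposition (for example "ø" or "ł") are left unchanged by this approach.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Replace accented letters with their base letters (e.g. 'è', 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFD", text)  # 'é' -> 'e' + combining accent
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches_synonym(word: str, synonym: str, replace_accents: bool = True) -> bool:
    """Hypothetical synonym check; only the analysis copy is normalized,
    the original text would still be displayed unchanged."""
    if replace_accents:
        word, synonym = strip_accents(word), strip_accents(synonym)
    return word.lower() == synonym.lower()


print(strip_accents("hatékonyság"))                                   # hatekonysag
print(matches_synonym("hatèkony", "hatékony"))                        # True
print(matches_synonym("hatèkony", "hatékony", replace_accents=False)) # False
```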
3. Lowercase Conversion
In addition to replacing accented characters, you can choose whether the text analysis should preserve uppercase characters or convert all text to lowercase. Lowercase conversion is enabled by default, but in specific scenarios it may reduce the accuracy of entity recognition, particularly for brands.
For instance, the brand "Dove" is only recognized when written with a capital letter, which requires lowercase conversion to be switched off, as demonstrated in the example.
Note that for sentiment analysis, all text is converted to lowercase regardless of this setting, so the choice between uppercase and lowercase has no effect on the precision of sentiment analysis.
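The trade-off can be sketched as follows, assuming (hypothetically) that brand names are matched case-sensitively as whole words. Once the text is lowercased, a capitalized brand pattern such as "Dove" can no longer match, while keeping the case lets the engine tell the brand apart from the common word "dove". The function `find_brand` and the matching rule are illustrative assumptions, not Zurvey.io's entity recognizer.

```python
import re


def find_brand(text: str, brand: str, lowercase: bool) -> bool:
    """Hypothetical brand check: case-sensitive, whole-word match of `brand`."""
    if lowercase:
        text = text.lower()  # what the lowercase conversion setting does to the input
    return re.search(rf"\b{re.escape(brand)}\b", text) is not None


print(find_brand("I switched to Dove soap.", "Dove", lowercase=True))   # False
print(find_brand("I switched to Dove soap.", "Dove", lowercase=False))  # True
print(find_brand("A dove sat on the roof.", "Dove", lowercase=False))   # False
```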
4. Language Recognition & Mixed Language Analysis
Zurvey.io's NLP engine can recognize languages automatically. This is useful if the analyzed document contains verbatims in various languages, if survey answers are expected in more than one language, or if it is unclear which language the answers will use. The unit of language recognition is the verbatim: the whole verbatim is recognized as being written in one language. If a verbatim contains words from several languages, the language present in the largest proportion is recognized.
Automatic language recognition can be triggered by choosing the Mixed option for language before starting an analysis.
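The "largest proportion wins" rule for a whole verbatim can be illustrated with a toy sketch that counts words per language and returns the most frequent one. The tiny word lists, the `LEXICONS` name and `detect_language` are hypothetical; a real language detector relies on much richer statistical models, and this is not Zurvey.io's algorithm.

```python
from collections import Counter
from typing import Optional

# Toy word lists for illustration only (assumed); real detectors use rich models.
LEXICONS = {
    "en": {"the", "is", "and", "great", "service"},
    "hu": {"a", "és", "nagyon", "jó", "szolgáltatás"},
    "de": {"der", "die", "und", "ist", "sehr"},
}


def detect_language(verbatim: str) -> Optional[str]:
    """Assign ONE language to the whole verbatim: the one with the most known words."""
    counts = Counter()
    for word in verbatim.lower().split():
        word = word.strip(".,!?;:")
        for lang, words in LEXICONS.items():
            if word in words:
                counts[lang] += 1
    if not counts:
        return None  # no known words: language cannot be determined
    return counts.most_common(1)[0][0]


print(detect_language("The service is great, nagyon jó"))    # en (4 English vs 2 Hungarian words)
print(detect_language("A szolgáltatás nagyon jó és gyors"))  # hu
```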
Automatic language recognition is fully compatible with multilingual custom labels (more on that in custom label Creation and management).