Different Providers, Different Sentiment Scores: Not All Sentiments Are Equal
Connexun recently integrated its Sentiment Analysis and Text Summarization APIs into the Eden AI platform. Unlike other marketplaces, Eden AI lets its users subscribe to several providers at once and compare their results side by side.
Connexun’s team subscribed to Eden AI and compared the Sentiment Analysis scores returned by six providers (Microsoft, AWS, IBM, Google, Lettria and, of course, Connexun) in three tests:
Simple and challenging test cases.
A comparison between providers on world news articles in English.
A comparison of providers' predictions against human-labelled finance news articles in English.
The sentiment scores from the different providers were reduced to a common one-dimensional scale running from +1 (positive) through 0 (neutral) to -1 (negative).
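The reduction to a common scale can be sketched as follows. This is a minimal illustration, not the conversion actually used in the study: it assumes each provider's output has already been simplified to a label plus a magnitude between 0 and 1, and the function name to_signed_score is hypothetical.

```python
def to_signed_score(label: str, magnitude: float) -> float:
    """Map a (label, magnitude) pair onto the common scale [-1, +1].

    Assumption: the provider output has been simplified to a sentiment
    label and a magnitude in [0, 1]. Real provider responses differ.
    """
    if not 0.0 <= magnitude <= 1.0:
        raise ValueError("magnitude must lie in [0, 1]")
    signs = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}
    return signs[label.lower()] * magnitude


# Two of the scores reported in this post for "Today isn't a bad day".
example = {
    "Microsoft": ("positive", 0.55),
    "Lettria": ("negative", 0.22),
}
scores = {name: to_signed_score(*pair) for name, pair in example.items()}
```

With this convention, "Microsoft 0.55 positive" becomes +0.55 and "Lettria 0.22 negative" becomes -0.22, so the two can be compared on the same axis.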
Simple and Challenging cases | Comparison of Sentiment Scores
a. “Today is a very nice day”
Microsoft 1.00 positive, AWS 0.99 positive, IBM 0.96 positive, Google 0.90 positive, Lettria 0.60 positive, Connexun 0.99 positive
b. “It was a terrible idea”
Microsoft 0.99 negative, AWS 0.99 negative, IBM 0.97 negative, Google 0.80 negative, Lettria 0.29 negative, Connexun 0.97 negative
a. “Today isn’t a bad day”
Microsoft 0.55 positive, AWS 0.86 positive, IBM 0.92 positive, Google 0.30 positive, Lettria 0.22 negative, Connexun 0.88 positive
b. “I do not have good experience with this approach”
Microsoft 0.98 negative, AWS 0.90 negative, IBM 0.44 negative, Google 0.80 negative, Lettria 0.19 negative, Connexun 0.94 negative
All providers assigned the same sentiment class in every simple and challenging case, with one exception: Lettria classified “Today isn’t a bad day” as negative.
World News Articles in English | Comparison of Sentiment Scores
We compared the sentiment scores that the different providers assign to world news. The scores for a data set of 300 randomly selected world news articles from the Connexun News API were analysed for dependencies, correlations and agreement.
Dependencies between two providers
The correlation plot below highlights the differences and similarities among the providers. Google and Amazon were the most strongly correlated pair, with a Pearson coefficient of 0.73, whereas IBM and Connexun were the least correlated, with a coefficient of 0.54. Connexun's scores matched Google's most closely, followed by Lettria's and Microsoft's, with IBM's matching ours the least. Some providers have a separate class for neutral sentiment, but IBM seems to round any neutral score to 0, which is why it shows no sentiment scores in the range from -0.20 to +0.20. This is also visible along the diagonal of the plot, which shows the distribution of each provider's sentiment values: IBM has two peaks, one in the positive and one in the negative region, with no values in between.
Note: Google provides sentiment scores in steps of 10%; the other providers have a granularity of 1% or finer.
The most similar results are between Google and Amazon, Google and Connexun, and Google and Lettria, each with a Pearson correlation above 0.7.
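For readers who want to reproduce this kind of comparison on their own data, the Pearson coefficient between two providers can be computed as in the sketch below. The two score lists are invented for illustration and are not taken from the 300-article sample; the function name pearson is our own.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Invented scores on the common [-1, +1] scale, one entry per article.
provider_a = [0.9, -0.8, 0.3, -0.2, 0.0]
provider_b = [0.8, -0.6, 0.1, -0.4, 0.2]
r = pearson(provider_a, provider_b)
```

A value close to +1 means the two providers move together across articles, even if their absolute scores differ; it says nothing about whether they land on the same side of zero, which is what the agreement measure in the next section checks.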
Agreement between two providers
We then calculated the agreement between providers as the percentage of texts to which two providers assign the same sentiment class: both positive, both negative, or both neutral (0).
The highest agreement was seen between Microsoft and Connexun (82%), Amazon and Lettria (78%), and Amazon and Connexun (76%). The lowest agreement was between IBM and Microsoft and between IBM and Google (60%).
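The agreement measure described above can be sketched in a few lines. This is an illustration under the assumption that scores are already on the common [-1, +1] scale and that only a score of exactly 0 counts as neutral; the helper names and the sample scores are ours.

```python
def sentiment_class(score: float) -> str:
    """Collapse a signed score into one of three classes.

    Assumption: only a score of exactly 0 is treated as neutral.
    """
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def agreement(scores_a, scores_b) -> float:
    """Fraction of texts on which two providers give the same class."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(
        sentiment_class(a) == sentiment_class(b)
        for a, b in zip(scores_a, scores_b)
    )
    return matches / len(scores_a)


# Invented scores: the providers disagree only on the last text.
share = agreement([0.9, -0.8, 0.0, 0.3], [0.5, -0.1, 0.0, -0.2])
```

Here three of the four texts receive the same class from both providers, so the agreement is 0.75, or 75%.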
Mean sentiment score for a provider
We then evaluated the mean sentiment score and the percentage of positive, neutral and negative news for each provider over the sample. On our data, Microsoft was the most biased towards negative sentiment, with only 19% of news labelled as positive, and Amazon was the most biased towards positive sentiment, with almost 48% of news labelled as positive. One might expect a randomly selected dataset to contain roughly the same amount of news for both sentiments, but the results from the providers tell a different story.
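A per-provider summary of this kind can be computed as sketched below. The function name summarize and the sample scores are hypothetical, and the sketch again assumes that only a score of exactly 0 counts as neutral.

```python
from collections import Counter
from statistics import mean


def summarize(scores):
    """Mean score and percentage of each sentiment class for one provider."""
    if not scores:
        raise ValueError("scores must not be empty")
    # Assumption: a score of exactly 0 is neutral, anything else is signed.
    counts = Counter(
        "positive" if s > 0 else "negative" if s < 0 else "neutral"
        for s in scores
    )
    total = len(scores)
    return {
        "mean": mean(scores),
        "positive_pct": 100 * counts["positive"] / total,
        "neutral_pct": 100 * counts["neutral"] / total,
        "negative_pct": 100 * counts["negative"] / total,
    }


# Invented scores for a single provider over four articles.
summary = summarize([0.5, -0.5, 0.0, -1.0])
```

For these four invented scores the mean is -0.25, with 25% positive, 25% neutral and 50% negative articles, which would indicate a lean towards negative sentiment.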
Finance News Articles in English (Human Labelled) | Comparison of Sentiment Scores
Next, we calculated the main metrics of the different models on a human-labelled dataset of financial news from Kaggle (https://www.kaggle.com/datasets/ankurzing/sentiment-analysis-for-financial-news). We randomly selected 150 positive and 150 negative news items from the dataset and ran them through the six providers. We treated each provider's output as a binary positive/negative classification and computed Accuracy, Precision, Recall and F1 against the human labels.
In each pair of numbers, the first is the metric for the positive class and the second is the metric for the negative class.
The best performance was shown by Connexun (F1 = 0.87) and Amazon (F1 = 0.81).
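The per-class metrics can be computed as in the following sketch. The labels and predictions are invented, the function names are ours, and the sketch does not cover how a neutral prediction from a provider should be mapped onto the binary positive/negative labels, which is a choice each evaluation has to make.

```python
def accuracy(y_true, y_pred) -> float:
    """Share of predictions that match the human label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_metrics(y_true, y_pred, target):
    """Precision, recall and F1 for one class, treated as the 'hit' class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == target and p == target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Invented example: six human labels and one provider's predictions.
human = ["pos", "pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "pos", "neg", "neg", "neg", "neg"]

acc = accuracy(human, predicted)
pos_metrics = class_metrics(human, predicted, "pos")
neg_metrics = class_metrics(human, predicted, "neg")
```

In this invented case one positive item is misread as negative, so precision for the positive class is 1.0 while recall is 2/3, and the negative class shows the opposite pattern. Reporting both classes, as done in this post, makes that asymmetry visible.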
We saw that different providers can give quite different sentiment scores for the same input text. We also saw that the agreement between providers varies significantly, suggesting that different base models, training sets or fine-tuning on the providers' side can turn a slightly positive sentiment into a mostly positive one, or vice versa. As we demonstrated, the same news item can even receive opposite sentiments from different providers. Given the variety of providers on the market at the moment, choosing the one best suited to their needs can be overwhelming for customers. Our goal with this blog post is therefore to highlight the differences among the top providers and help customers make better-informed decisions.
In the next part, we will assess how many providers can preserve the sentiment of a text when translation enters the picture.
Connexun is an innovative tech startup based in Milano, Italy.
Connexun crawls news content from tens of thousands of open web sources worldwide, turning unstructured web content into machine-readable news data APIs. Its AI-powered news engine B.I.R.B.AL. empowers organizations to transform the world’s news into real-time business insight.