A Comparative Analysis of LLM Sentiment Distribution Across Varied Situational Depressiveness


Abstract

This study aims to identify the most positive AI model at different levels of situational depressiveness, focusing on sentiment distribution. Prompts are divided into four levels based on their negativity probability: 0.6-0.7, 0.7-0.8, 0.8-0.9, and 0.9-1.0. Fifteen prompts per level are given to each model, and all answers are analyzed with Classify["Sentiment"]. The sentiment distribution describes how the positivity scores of sentences are spread; models whose scores concentrate near the upper bound are defined as more positive. Among the three models, the most positive is ChatGPT at the 0.6 - 0.7 level, Claude at the 0.7 - 0.8 level, and Gemini at both the 0.8 - 0.9 and 0.9 - 1.0 levels. However, by absolute standards, none of the models is positive; rather, they are biased toward negativity.


1. Introduction

The characteristic that most distinguishes AI writing from human writing may be its positivity. While humans tend to remain negative in depressing situations, AI rarely responds negatively in discouraging situations (Bardol, 2025). This might be the result of an effort to maximize user satisfaction: users perceive LLMs as more useful and trustworthy when they receive positive answers (Pataranutaporn et al., 2026). In this research, I intend to test the hypothesis that LLMs generate positively biased answers even to negative prompts.

Unfortunately, positivity is not always helpful to users. When users seek counsel from an AI about a depressing situation, its cheerfulness can help them. However, when users ask for a dispassionate review of a loss in the stock market, the AI's positivity can be counterproductive. The optimal level of positivity depends on the user's situation, so using different AI models for different situations can lead to a better user experience. The problem is that choosing the optimal model for a specific situation is hard, because little research compares the positivity of different AI models.

Therefore, this research aims to provide useful data for matching the optimal AI model to each situation. The positivity distribution of answers from three LLMs (ChatGPT, Gemini, and Claude) at four levels of depressing situations is analyzed. The sentence-level sentiment scores of each model's answers at each level are plotted as cumulative distribution function (CDF) graphs, which show each model's general tendency in response to discouraging prompts.

2. Methodology

2.1 Prompt

a. Creation of Prompt

Each LLM is asked to generate depressing essay topics. ChatGPT and Gemini are used without logging in; Claude requires a log-in, so it is used from an account with no baseline data. The prompt is as follows:

Suggest me the open discussion question that I can write a 300 words essay. I want the question is about the drastically tragic situation.

b. Classifying the level

Calculate the depressiveness level of the whole prompt.

Classify["Sentiment", "If a person gives their life to save another, but the person they saved dies moments later anyway, was the sacrifice a beautiful act of love or a meaningless waste of a soul?", "Probabilities"] 

Since the negative sentiment score falls within 0.6 - 0.7, the prompt is assigned to the 0.6 - 0.7 level. This process is repeated until each level has 5 prompts from each AI, that is, 15 prompts per level.
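
The binning step can be sketched as a small helper. The names levelFromNegativity and promptLevel are hypothetical, and the treatment of boundary values (a score of exactly 0.7 going to the 0.7 - 0.8 level) is an assumption, since the post does not say how ties are handled.

```wolfram
(* Hypothetical helper: map a negativity probability to one of the four levels. *)
(* Assumption: lower bounds are inclusive; scores below 0.6 are out of range. *)
levelFromNegativity[p_?NumericQ] := Which[
  p < 0.6, Missing["BelowRange"],
  p < 0.7, "0.6-0.7",
  p < 0.8, "0.7-0.8",
  p < 0.9, "0.8-0.9",
  True, "0.9-1.0"
]

(* Classify a whole prompt and bin it by its "Negative" probability. *)
promptLevel[prompt_String] := levelFromNegativity[
  Lookup[Classify["Sentiment", prompt, "Probabilities"], "Negative", 0.]
]
```

Keeping the binning separate from the Classify call makes it possible to check the level boundaries without running the classifier.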

c. Creation of answer

Each LLM is asked to write a 300-word essay on each AI-generated topic. As before, ChatGPT and Gemini are used without logging in, while Claude is logged in with no baseline data. The prompt is as follows:

Write me a 300 words essay about this topic : "AI-generated topic"

2.2 Sentiment Distribution

Segment each model's response into individual sentences.

SampleSentence = TextSentences["Surviving a tragedy can have a profound impact on a person's life. Some people emerge from difficult experiences stronger and more resilient, while others suffer lasting emotional and psychological damage. Whether tragedy makes a person stronger or permanently damaged depends on many factors, including their personality, support system, and the nature of the event itself. On one hand, tragedy can help people develop inner strength. Facing hardship often teaches valuable life lessons about perseverance, courage, and gratitude. Many survivors learn to appreciate life more deeply and become better able to handle future challenges. For example, people who overcome serious illnesses, natural disasters, or personal losses may discover strengths they never knew they possessed. Their experiences can inspire them to help others and create positive change in their communities. On the other hand, tragedies can leave deep scars that are difficult to heal. Some survivors experience long-term effects such as anxiety, depression, or post-traumatic stress disorder (PTSD). Painful memories may continue to affect their relationships, work, and daily lives. In severe cases, the emotional damage can remain for many years, making it hard for individuals to fully recover. Not everyone has access to the support, therapy, or resources needed to overcome such challenges. However, being damaged and being strong are not always opposites. A person can carry emotional wounds while still showing remarkable resilience. In fact, many survivors learn to live with their pain rather than completely erase it. Their strength comes not from forgetting the tragedy but from continuing to move forward despite it. In conclusion, surviving a tragedy can make people stronger, permanently damaged, or both at the same time. Human responses to suffering are complex, and every individual's journey is different. What matters most is the support, understanding, and opportunities available to help survivors heal and grow."]

Evaluate the positivity of sentences.

SampleSentiment = Classify["Sentiment", SampleSentence, "Probabilities"]

Plot the sentiment scores of all sentences from one AI at one level.

(* Pool the sentences of all 15 ChatGPT answers at the 0.6-0.7 level *)
SampleSentences = Flatten[{SampleSentence, SentenceGPT26gpt, SentenceGPT36gpt, SentenceGPT46gpt, SentenceGPT56gpt, SentenceGem16gpt, SentenceGem26gpt, SentenceGem36gpt, SentenceGem46gpt, SentenceGem56gpt, SentenceCla16gpt, SentenceCla26gpt, SentenceCla36gpt, SentenceCla46gpt, SentenceCla56gpt}]
(* Class probabilities for every sentence *)
SampleSentiments = Classify["Sentiment", SampleSentences, "Probabilities"]
(* Keep only the probability of the "Positive" class *)
SamplePo = Lookup[#, "Positive", 0.] & /@ SampleSentiments
(* Smooth kernel estimate of the score distribution, bounded to [0, 1] *)
SampleGPo = SmoothKernelDistribution[SamplePo, Automatic, {"Bounded", {0, 1}, "Gaussian"}]
(* Plot the cumulative distribution of the positivity scores *)
Plot[CDF[SampleGPo, x], {x, 0, 1}, Filling -> Axis, PlotRange -> All, PlotLabel -> "Positivity Distribution of ChatGPT's answers in 0.6-0.7 level"]

3. Sentiment Distribution

3.1 0.6 - 0.7 level

Draw positivity distribution graph of three models in 0.6 - 0.7 level

Plot[
    {CDF[SampleGPo, x], CDF[GPo6gem, x], CDF[GPo6cla, x]}, 
    {x, 0, 1}, 
    Filling -> Axis, 
    PlotRange -> All, 
    PlotLegends -> {"ChatGPT", "Gemini", "Claude"}, 
    PlotLabel -> "Positivity Distribution in 0.6-0.7 level" 
  ]

ChatGPT appears to be the most positive model at the 0.6 - 0.7 level: its CDF starts lowest, which implies a smaller proportion of low scores and a higher concentration near the upper bound. However, this does not mean ChatGPT is positive by an absolute standard. In all models, more than 70% of sentences have a positivity score below 0.2. Most answers are not positive, which suggests that LLMs do not have a positive bias at this level. This leaves three possibilities: the model responses are fundamentally negative, neutral, or a combination of both. To evaluate these possibilities, the negativity and neutrality distributions need to be analyzed.
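
To make this reading of the CDF concrete, here is a toy illustration with made-up scores (not data from this study). EmpiricalDistribution is used here instead of the kernel estimate so that the values are easy to check by hand; all names are hypothetical.

```wolfram
(* Made-up positivity scores for two hypothetical models; not study data. *)
toyA = {0.05, 0.10, 0.15, 0.60, 0.90};
toyB = {0.05, 0.10, 0.12, 0.15, 0.18};

distA = EmpiricalDistribution[toyA];
distB = EmpiricalDistribution[toyB];

(* Share of sentences with a positivity score at or below 0.2. *)
lowShareA = CDF[distA, 0.2];  (* 3 of the 5 scores *)
lowShareB = CDF[distB, 0.2];  (* all 5 scores *)

(* The model whose CDF is lower at 0.2 has fewer low scores, *)
(* so by the definition used in this post it is the more positive one. *)
lowShareA < lowShareB
```

The same comparison applies to the kernel distributions: the curve that stays lowest on the left side of the plot belongs to the relatively most positive model.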

Draw negativity distribution and neutrality distribution graph

Plot[
     {CDF[SampleGN, x], CDF[GN6gem, x], CDF[GN6cla, x]}, 
     {x, 0, 1}, 
     Filling -> Axis, 
     PlotRange -> All, 
     PlotLegends -> {"ChatGPT", "Gemini", "Claude"}, 
     PlotLabel -> "Negativity Distribution in 0.6-0.7 level" 
   ] 
  
 Plot[
    {CDF[SampleGNe, x], CDF[GNe6gem, x], CDF[GNe6cla, x]}, 
    {x, 0, 1}, 
    Filling -> Axis, 
    PlotRange -> All, 
    PlotLegends -> {"ChatGPT", "Gemini", "Claude"}, 
    PlotLabel -> "Neutrality Distribution in 0.6-0.7 level" 
  ]

In all models, the negativity and neutrality CDFs accumulate more slowly than the positivity CDF, which means that negativity and neutrality scores are spread over higher values than positivity scores.
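
The negativity and neutrality distributions (SampleGN, SampleGNe, and their Gemini and Claude counterparts) are used in the plots above without their definitions being shown. Presumably they are built like SampleGPo, with only the class key changed. A sketch under that assumption, where classScores and classDistribution are hypothetical helper names:

```wolfram
(* Hypothetical helper: pull one class probability out of each Classify result. *)
classScores[probs_List, class_String] := Lookup[#, class, 0.] & /@ probs

(* Hypothetical helper: kernel distribution of one sentiment class over a list of sentences. *)
(* Assumption: the same estimator settings as SampleGPo in section 2.2. *)
classDistribution[sentences_List, class_String] := SmoothKernelDistribution[
  classScores[Classify["Sentiment", sentences, "Probabilities"], class],
  Automatic,
  {"Bounded", {0, 1}, "Gaussian"}
]

(* Possible definitions of the two ChatGPT distributions plotted above: *)
(* SampleGN  = classDistribution[SampleSentences, "Negative"]  *)
(* SampleGNe = classDistribution[SampleSentences, "Neutral"]   *)
```

Wrapping the pipeline in one function would also avoid repeating the four-step cell for every model and level.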

Draw all sentiments' distribution graph of each model in one plane.

Plot[{CDF[SampleGPo, x], CDF[SampleGN, x], CDF[SampleGNe, x]}, {x, 0, 1}, PlotLegends -> {"Positivity", "Negativity", "Neutrality"}, PlotLabel -> "Sentiment Distribution of ChatGPT in 0.6-0.7 level"]
 Plot[{CDF[GPo6gem, x], CDF[GN6gem, x], CDF[GNe6gem, x]}, {x, 0, 1}, PlotLegends -> {"Positivity", "Negativity", "Neutrality"}, PlotLabel -> "Sentiment Distribution of Gemini in 0.6-0.7 level"]
 Plot[{CDF[GPo6cla, x], CDF[GN6cla, x], CDF[GNe6cla, x]}, {x, 0, 1}, PlotLegends -> {"Positivity", "Negativity", "Neutrality"}, PlotLabel -> "Sentiment Distribution of Claude in 0.6-0.7 level"]

Sentiment is not distributed equally across positivity, negativity, and neutrality; it is allocated mainly to negativity and neutrality, in similar proportions. In particular, the answers of Gemini and Claude seem to be biased toward negativity and neutrality rather than positivity. In contrast, ChatGPT's responses lean slightly toward negativity.

3.2 0.7 - 0.8 level

Draw Positivity distribution graph of three models in 0.7 - 0.8 level.

Plot[
    {CDF[GPo7gpt, x], CDF[GPo7gem, x], CDF[GPo7cla, x]}, 
    {x, 0, 1}, 
    Filling -> Axis, 
    PlotRange -> All, 
    PlotLegends -> {"ChatGPT", "Gemini", "Claude"}, 
    PlotLabel -> "Positivity Distribution in 0.7-0.8 level" 
  ]

It appears that Claude is the most positive model at the 0.7 - 0.8 level. However, Claude is only relatively positive, since its positivity scores are concentrated in [0, 0.2]. Even at this more negative level, LLMs continue to show no positive bias.

Draw all sentiments' distribution graph in one plane.

Plot[{CDF[GPo7gpt, x], CDF[GN7gpt, x], CDF[GNe7gpt, x]}, {x, 0, 1}, PlotLegends -> {"Positivity", "Negativity", "Neutrality"}, PlotLabel -> "Sentiment Distribution of ChatGPT in 0.7-0.8 level"]
Plot[{CDF[GPo7gem, x], CDF[GN7gem, x], CDF[GNe7gem, x]}, {x, 0, 1}, PlotLegends -> {"Positivity", "Negativity", "Neutrality"}, PlotLabel -> "Sentiment Distribution of Gemini in 0.7-0.8 level"]
Plot[{CDF[GPo7cla, x], CDF[GN7cla, x], CDF[GNe7cla, x]}, {x, 0, 1}, PlotLegends -> {"Positivity", "Negativity", "Neutrality"}, PlotLabel -> "Sentiment Distribution of Claude in 0.7-0.8 level"]

The graphs indicate that the answers of ChatGPT and Gemini are distributed similarly between negativity and neutrality, with a slight bias toward negativity, whereas Claude's responses seem somewhat more skewed toward negativity.

3.3 0.8 - 0.9 level

Draw Positivity distribution graph of three models in 0.8 - 0.9 level.

Plot[
    {CDF[GPo8gpt, x], CDF[GPo8gem, x], CDF[GPo8cla, x]}, 
    {x, 0, 1}, 
    Filling -> Axis, 
    PlotRange -> All, 
    PlotLegends -> {"ChatGPT", "Gemini", "Claude"}, 
    PlotLabel -> "Positivity Distribution in 0.8-0.9 level" 
  ]

Gemini emerges as the most positive model at the 0.8 - 0.9 level, while ChatGPT is the least positive. The graph shows a high concentration in the low interval, which suggests that the sentiment distribution is biased toward negativity, neutrality, or both.

Draw all sentiments' distribution graph in one plane.

Plot[{CDF[GPo8gpt, x], CDF[GN8gpt, x], CDF[GNe8gpt, x]}, {x, 0, 1}, PlotLegends -> {"Positivity", "Negativity", "Neutrality"}, PlotLabel -> "Sentiment Distribution of ChatGPT in 0.8-0.9 level"]
 Plot[{CDF[GPo8gem, x], CDF[GN8gem, x], CDF[GNe8gem, x]}, {x, 0, 1}, PlotLegends -> {"Positivity", "Negativity", "Neutrality"}, PlotLabel -> "Sentiment Distribution of Gemini in 0.8-0.9 level"]
 Plot[{CDF[GPo8cla, x], CDF[GN8cla, x], CDF[GNe8cla, x]}, {x, 0, 1}, PlotLegends -> {"Positivity", "Negativity", "Neutrality"}, PlotLabel -> "Sentiment Distribution of Claude in 0.8-0.9 level"]

Unlike at the previous levels, all models are moderately biased toward negativity. This indicates that making the prompts more depressing affects the sentiment distribution of the answers.

3.4 0.9 - 1.0 level

Draw Positivity distribution graph of three models in 0.9 - 1.0 level.

Plot[
    {CDF[GPo9gpt, x], CDF[GPo9gem, x], CDF[GPo9cla, x]}, 
    {x, 0, 1}, 
    Filling -> Axis, 
    PlotRange -> All, 
    PlotLegends -> {"ChatGPT", "Gemini", "Claude"}, 
    PlotLabel -> "Positivity Distribution in 0.9-1.0 level" 
  ]

Consistent with the trend of the previous level, the most positive model in 0.9 - 1.0 level appears to be Gemini and the least positive model seems to be ChatGPT.

Draw all sentiments' distribution graph in one plane.

Plot[{CDF[GPo9gpt, x], CDF[GN9gpt, x], CDF[GNe9gpt, x]}, {x, 0, 1}, PlotLegends -> {"Positivity", "Negativity", "Neutrality"}, PlotLabel -> "Sentiment Distribution of ChatGPT in 0.9-1.0 level"]
 Plot[{CDF[GPo9gem, x], CDF[GN9gem, x], CDF[GNe9gem, x]}, {x, 0, 1}, PlotLegends -> {"Positivity", "Negativity", "Neutrality"}, PlotLabel -> "Sentiment Distribution of Gemini in 0.9-1.0 level"]
 Plot[{CDF[GPo9cla, x], CDF[GN9cla, x], CDF[GNe9cla, x]}, {x, 0, 1}, PlotLegends -> {"Positivity", "Negativity", "Neutrality"}, PlotLabel -> "Sentiment Distribution of Claude in 0.9-1.0 level"]

The answers of ChatGPT and Claude show a sentiment distribution moderately biased toward negativity, while Gemini's graph implies a distribution that leans only mildly toward negativity.

3.5 Trend

Show the changes of positivity concentration in [0, 0.2].

SenChangeGPT = {{1, CDF[SampleGPo, 0.2]}, {2, CDF[GPo7gpt, 0.2]}, {3, CDF[GPo8gpt, 0.2]}, {4, CDF[GPo9gpt, 0.2]}}
 SenChangeGem = {{1, CDF[GPo6gem, 0.2]}, {2, CDF[GPo7gem, 0.2]}, {3, CDF[GPo8gem, 0.2]}, {4, CDF[GPo9gem, 0.2]}}
 SenChangeCla = {{1, CDF[GPo6cla, 0.2]}, {2, CDF[GPo7cla, 0.2]}, {3, CDF[GPo8cla, 0.2]}, {4, CDF[GPo9cla, 0.2]}}
 ListLinePlot[{SenChangeGPT, SenChangeGem, SenChangeCla}, PlotLegends -> {"GPT", "Gemini", "Claude"}]

All models follow the same trajectory in score concentration. From the 0.6 - 0.7 level to the 0.7 - 0.8 level, the concentration in [0, 0.2] increases, but after that every model shows a decrease as the depressiveness level rises. In other words, at the low levels a more depressing prompt makes the answers less positive, whereas at the high levels it makes the answers more positive than at the previous level. In addition, ChatGPT's concentration decreases more sharply than that of the other models, so ChatGPT seems to be the model most sensitive to changes in prompt sentiment.
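
The direction of each step can also be read off numerically by differencing the concentration values. The helper levelChanges below is a hypothetical name, and the sample input is made up rather than taken from the study.

```wolfram
(* Hypothetical helper: change in [0, 0.2] concentration between consecutive levels. *)
(* Input: a list of {level index, concentration} pairs such as SenChangeGPT. *)
levelChanges[series_List] := Differences[series[[All, 2]]]

(* Made-up values: a rise followed by a fall. *)
toySeries = {{1, 0.5}, {2, 0.7}, {3, 0.6}};
toyChanges = levelChanges[toySeries];

(* A positive change means more scores fall in [0, 0.2], i.e. less positive answers; *)
(* a negative change means the answers became more positive. *)
Sign[toyChanges]
```

Applied as levelChanges[SenChangeGPT], levelChanges[SenChangeGem], and levelChanges[SenChangeCla], this would give three step sizes per model whose magnitudes can be compared directly.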

4. Conclusion

4.1 Conclusion

$\begin{array}{ccc} \text{Level} & \text{Most positive model} & \text{Direction of bias} \\ 0.6 - 0.7 & \text{ChatGPT} & \text{Negativity and neutrality} \\ 0.7 - 0.8 & \text{Claude} & \text{Negativity} \\ 0.8 - 0.9 & \text{Gemini} & \text{Negativity} \\ 0.9 - 1.0 & \text{Gemini} & \text{Negativity} \\ \end{array}$

The most positive model at each level is shown in the table above. However, by an absolute standard, even the most positive model cannot be considered positive, because its positivity scores are low. My initial hypothesis was that LLMs would show a positive bias even when the prompt is depressing. On the contrary, LLMs exhibit a bias toward negativity and neutrality. At the 0.6 - 0.7 level, most models show a sentiment distribution skewed toward negativity and neutrality; the other levels show a bias toward negativity. The more depressing the prompts are, the more biased the distribution is. In other words, LLMs without baseline data replicate the negative sentiment of depressing prompts. One reason the LLMs did not show a positive bias might be the absence of baseline data: AI begins to produce more favorable responses after interacting with users (Jain et al., 2026), and no such interaction took place in this research. Therefore, future work should examine how LLMs' tendencies change after dynamic interaction with users.

4.2 Limitations

Although generative AI created the prompts, I selected the 15 prompts for each level from all generated prompts whose sentiment scores fell within that level's range. I eliminated repeated subjects and tried to keep the sentiment scores diverse within each level. Only after selecting the prompts did I generate the answers and analyze their sentiment scores. Because the prompts were fixed before any answers existed, the selection could not have been influenced by the answers' scores, which preserves data integrity.

I was not able to generate Claude's answers without any previous chat data, because Claude requires a log-in to converse. Chat data therefore accumulated while Claude's answers were being created. I did not rate any of Claude's answers and only asked Claude to write essays on the designated prompts. However, Claude also had the chat data from generating the prompts. Claude's results may have been influenced by this baseline data, whereas the other models' answers were created with no baseline data.

When answering the prompts, ChatGPT and Claude often included titles in their essays. ChatGPT's titles simply repeated the prompts, whereas Claude's did not. I decided to exclude all titles from the analysis for two reasons. First, since ChatGPT's titles repeat the prompts, analyzing them would only measure the prompts' sentiment again. Second, analyzing only Claude's titles would be methodologically inconsistent.

Gemini sometimes included tables in its answers. I excluded these tables for two reasons. First, the contents of a table are hard to separate into sentences automatically because they usually lack punctuation marks. Second, the contents of a table are often short, which makes sentiment analysis very difficult.

Originally, I intended to calculate a regression line in order to evaluate the drift of positivity sentiment. However, the p-value was higher than 0.4 and the coefficient of determination was lower than 0.2. The regression line was not statistically meaningful, which is why I decided not to include the regression analysis in this essay.
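
For reference, a regression of this kind could be run with LinearModelFit. The post does not say which variables were regressed, so the sketch below assumes a straight-line fit of the [0, 0.2] concentration against the level index, and it uses placeholder numbers rather than the study's values.

```wolfram
(* Placeholder {level index, concentration} pairs; not values from this study. *)
toyTrend = {{1, 0.70}, {2, 0.78}, {3, 0.74}, {4, 0.69}};

(* Fit concentration = a + b * level. *)
fit = LinearModelFit[toyTrend, x, x];

rSquared = fit["RSquared"];          (* coefficient of determination *)
pValues = fit["ParameterPValues"];   (* p-values for the intercept and the slope *)
slopePValue = Last[pValues];
```

With only four levels per model, a fit like this has very little power, so a weak result would not be surprising under this assumed setup.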

References

  • Abdullah, K., Suprith, G., Nicole, A., Dave, P., Gracia, B., Elly, B., Ahsun, A., & Hassan, J. (2026). Assessing the accuracy and bias of large language models in drafting management discussion and analysis (MD&A) reports from structured XBRL data. International Journal of Multi Discipline Science (IJ-MDS). https://www.researchgate.net/publication/403661392

  • Almulla, N. (2025). The use of hedging devices and engagement markers in AI-Generated and Human-Written Essays: A Corpus-Based comparison. Open Journal of Modern Linguistics, 15(05), 754–772. https://doi.org/10.4236/ojml.2025.155044

  • Jain, S., Park, C., Viana, M., Wilson, A., & Calacci, D. (2026). Interaction context often increases sycophancy in LLMs. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (Article 793, pp. 1–26). https://doi.org/10.1145/3772318.3791915

  • Pataranutaporn, P., Lee, E., Amores, J., & Maes, P. (2026). Personal validation effect in LLMs: positive AI responses bias perceptions of validity, reliability, personalization, and usefulness of fictitious predictions. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (Article 117, pp. 1–18). Association for Computing Machinery. https://doi.org/10.1145/3772318.3791851

POSTED BY: Yewon Lim