The need for incremental learning in sentiment analysis

By Abhinash Jena on July 7, 2025

Incremental learning, also known as continuous learning, is a crucial paradigm in machine learning that enables models to adapt over time. The world generates an enormous amount of data, often in continuous streams, which makes traditional batch-oriented analysis methods difficult to apply. In such situations, the data distribution changes over time, so models need to update themselves continually.

In dynamic, data-rich environments such as social media platforms or e-commerce portals, where user sentiments evolve, models trained once quickly become outdated if not updated continuously (Gama et al., 2014). In sentiment analysis, the need for incremental learning is significant because language and sentiment trends are temporal in nature. Platforms such as Twitter and product review sites continuously generate massive amounts of user-generated content, requiring models that can scale in real time and adapt to new sentiment indicators. A sentiment classifier trained on static data will therefore fail to generalize to newer expressions or topics, leading to reduced accuracy over time (Žliobaitė et al., 2015).

EXAMPLE

The meaning and usage of emojis, acronyms, and hashtags on social platforms can shift within weeks, rendering a previously accurate model obsolete.

Changes in data distribution over time or concept drift

In machine learning, concept drift refers to the phenomenon where the statistical properties of the input data or target labels change over time. It happens when the rules or patterns a model is trying to learn change because of hidden factors that the model cannot directly observe. Such drift directly impacts a model’s ability to make accurate predictions, particularly when it was trained on historical data that no longer reflects current realities (Widmer & Kubat, 1996).

EXAMPLE

During the COVID-19 pandemic, the sentiment polarity of specific keywords changed significantly. Before the pandemic, the phrase “tested positive” (for example, in the context of pregnancy) was strongly associated with positive sentiment, so sentiment analysis models trained before 2020 consistently labeled it with positive polarity. During the outbreak, however, “tested positive” became widely used to indicate that someone had contracted the virus. This shifted the sentiment polarity of the phrase from positive to negative (Lyu et al., 2021).

Before making changes to a model, it is important to monitor concept drift in a machine learning system, especially in tasks like sentiment analysis. Monitoring enables proactive retraining and adaptation of models before performance drops. Concept drift becomes evident through continuous monitoring of a system’s performance, particularly when handling real-time or streaming data (Widmer & Kubat, 1996). A clear indication of concept drift is a decline in model performance, measured for example by accuracy or the F1 score, or an increase in error rate.

Most concept drift detection methods, such as the Drift Detection Method (DDM) and Adaptive Windowing (ADWIN), rely on changes in error rates. However, an imbalanced data stream can cause false drift signals, because the performance degradation is induced by class imbalance rather than by actual concept drift (Gama et al., 2014).
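The error-rate idea behind DDM can be illustrated with a minimal sketch. This is a simplified illustration, not a reference implementation: the class name `SimpleDDM`, the 30-sample warm-up, the strict comparisons, and the synthetic error stream (10% errors followed by 50% errors) are all assumptions made for the example.

```python
import math


class SimpleDDM:
    """Minimal DDM-style detector over a stream of 0/1 prediction errors."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples  # warm-up size before any decision (assumed)
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Consume one outcome (1 = wrong prediction, 0 = correct) and return a status."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n                # running error rate
        s = math.sqrt(p * (1 - p) / self.n)     # its standard deviation
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:     # remember the best level seen so far
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + 3 * self.s_min:
            self.reset()                        # start over after a drift signal
            return "drift"
        if p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"


# Synthetic stream: every 10th prediction wrong, then every 2nd prediction wrong.
stream = [1 if i % 10 == 0 else 0 for i in range(1, 201)]
stream += [1 if i % 2 == 0 else 0 for i in range(1, 201)]

detector = SimpleDDM()
drift_at = None
for position, error in enumerate(stream, start=1):
    if detector.update(error) == "drift":
        drift_at = position
        break
print("first drift signalled at sample", drift_at)
```

Because the detector only sees errors, a change in class balance that raises the error rate would trigger it in the same way as a genuine change in concept, which is the limitation noted above.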

Monitoring feature distribution to detect concept drift

Statistical distribution monitoring is also a fundamental approach to detecting concept drift. It leverages statistical tests and error monitoring to identify and respond to changes in data and target distributions over time. The method maintains a reference distribution, derived from the initial training data or a stable window, and compares it to the distribution of the most recent data arriving in the stream. Tests and measures such as the Kolmogorov–Smirnov (KS) test, Kullback–Leibler (KL) divergence, or Hellinger distance help detect significant shifts in concept.
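As a rough sketch of comparing a reference window with a recent window, the two-sample KS statistic can be computed directly from the empirical distribution functions. The feature used here (review length in tokens), the synthetic values, and the helper names `ks_statistic` and `ks_threshold` are assumptions for illustration; the threshold uses the standard large-sample approximation rather than an exact test.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def ks_threshold(n, m, alpha=0.05):
    """Approximate large-sample critical value for sample sizes n and m."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


# Hypothetical review lengths: a reference window and a recent window.
reference_lengths = list(range(10, 60))
recent_lengths = list(range(40, 90))

statistic = ks_statistic(reference_lengths, recent_lengths)
threshold = ks_threshold(len(reference_lengths), len(recent_lengths))
drift_detected = statistic > threshold
print(f"KS statistic={statistic:.3f}, threshold={threshold:.3f}, drift={drift_detected}")
```

In practice a library routine such as SciPy's `ks_2samp` would normally replace the hand-written statistic; the pure-Python version is used here only to keep the sketch self-contained.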

Feature Distribution: This compares the distribution of input features (or their relationships) between the training data and incoming data streams, where a significant difference indicates drift. Common tests and measures are KS, Chi-squared, Jensen–Shannon (JS) divergence, and KL divergence. It is used when new vocabulary or review patterns emerge in the incoming data.

EXAMPLE

When a sentiment analysis model is trained on reviews with common words like “good,” “bad,” or “excellent,” and new data contains “slay,” “mid,” or “vibe,” the word frequency distribution changes, indicating drift.
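A minimal sketch of this word-frequency comparison uses the JS divergence between smoothed word distributions. The review texts, the whitespace tokenizer, and the add-one smoothing are assumptions chosen for illustration, not the author's pipeline.

```python
import math
from collections import Counter


def word_distribution(texts, vocab):
    """Add-one smoothed relative frequency of each vocabulary word."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts[w] for w in vocab)
    return [(counts[w] + 1) / (total + len(vocab)) for w in vocab]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical review batches.
reference = ["good food and excellent service", "bad service but good food", "excellent food"]
similar = ["good service and good food", "excellent food bad service"]
new = ["the food was mid", "this place is a vibe", "the dessert did slay"]

vocab = sorted({w for t in reference + similar + new for w in t.lower().split()})
p_ref = word_distribution(reference, vocab)
p_similar = word_distribution(similar, vocab)
p_new = word_distribution(new, vocab)

same_score = js_divergence(p_ref, p_similar)   # familiar vocabulary
drift_score = js_divergence(p_ref, p_new)      # new slang vocabulary
print(f"similar batch: {same_score:.3f}, new batch: {drift_score:.3f}")
```

The batch with new slang yields a clearly larger divergence than the batch that reuses the training vocabulary; where to set an alerting threshold depends on the data and is left open here.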

Target Distribution: This observes changes in the distribution of the model’s predictions or the actual target variable. Common tests used to monitor skewed predictions are Chi-squared and KL divergence.

EXAMPLE

If a model previously predicted 60% positive, 30% neutral, 10% negative sentiment, and now it’s consistently predicting 80% negative, this may indicate drift or skewed predictions.
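The check in this example can be sketched as a Chi-squared goodness-of-fit statistic on recent prediction counts. The window of 100 predictions and the 15/5 split between positive and neutral are assumptions (the example above only fixes the 80% negative share); 5.991 is the standard critical value for two degrees of freedom at the 5% level.

```python
def chi_squared_statistic(observed_counts, expected_proportions):
    """Goodness-of-fit statistic of observed counts against reference proportions."""
    total = sum(observed_counts)
    return sum(
        (obs - total * p) ** 2 / (total * p)
        for obs, p in zip(observed_counts, expected_proportions)
    )


labels = ["positive", "neutral", "negative"]
reference_share = [0.6, 0.3, 0.1]   # historical prediction mix from the example
recent_counts = [15, 5, 80]         # hypothetical window of 100 recent predictions

CRITICAL_VALUE_DF2 = 5.991          # chi-squared, 2 degrees of freedom, alpha = 0.05

statistic = chi_squared_statistic(recent_counts, reference_share)
drift_suspected = statistic > CRITICAL_VALUE_DF2
print(f"chi-squared={statistic:.2f}, drift suspected={drift_suspected}")
```

A large statistic only says that the prediction mix has changed; it does not distinguish a real change in customer sentiment from a model that has started to mispredict.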

Correlation Analysis: This tracks correlations among features and between features and predictions; shifts in these relationships signal drift. It is used to monitor structural changes in feature influence. Common measures are Pearson correlation, Spearman correlation, and mutual information (MI).

EXAMPLE

Sentiment scores predicted by a pretrained model may have correlated strongly with word frequency for “excellent” or “terrible.” A drop in this correlation signals a shift in how features influence sentiment.

These approaches help capture both subtle and large-scale shifts in the data pipeline, especially for dynamic tasks like sentiment analysis. Here is a chart showing the top 10 valid drifted words from a dataset of 7000 restaurant reviews.

Each colored line represents one of the most statistically drifted words.
The black dashed line tracks how positive sentiment changes across batches.
If a word’s frequency sharply rises or falls around changes in the sentiment line, that word likely contributes to the sentiment drift. This suggests that the language used by the users is evolving.

Drifted words with average sentiment score overlay

Word usage changed significantly across the review batches. The chart is overlaid with the average sentiment trend across the batches. Words like “food,” “service,” “great,” “experience,” and “ambience” exhibit notable variation in frequency from batch to batch.

EXAMPLE

“food” sees a spike in Batch 3, while “experience” dips after Batch 4.

The average sentiment (black dashed line) drops in Batch 2, peaks in Batch 4, and then declines slightly again. This suggests a sentiment shift, possibly due to changes in service or product quality. Batches where words like “service” or “food” peak often align with dips or rises in sentiment, which indicates that these words are driving or reflecting sentiment shifts. Such alignment supports the relevance of these features in model retraining and the need for dynamic vocabulary handling.

EXAMPLE

A model trained on early batches, such as Batches 1 and 2, may misinterpret later reviews if it fails to adapt to evolving usage patterns of key sentiment-bearing words.

Furthermore, a heatmap showing the correlation analysis explains how frequently different words co-occur across the review dataset and how strongly their usage is monotonically related.

Correlation heatmap of top 10 drifted words

This heatmap shows the Spearman correlations among the top 10 drifted review words in the dataset. Dark red cells on the diagonal show perfect self-correlation. Paler or blue shades between different words indicate weak or absent relationships, meaning these words are used independently in reviews rather than always appearing together.

Most word pairs (e.g., “food”–“service”, “great”–“experience”) show weak correlations, indicating they are not consistently used together across reviews. This means that reviewers often talk about these aspects independently rather than as part of a standard phrase or opinion structure. The lack of strong correlations implies that even frequently used sentiment terms are applied in diverse and changing contexts.

EXAMPLE

“great” may be associated with different features like “ambience” in one review, “food” in another.

These weak and scattered correlations reinforce the findings from the drift metrics: customer sentiment expression is fluid, and word pairings evolve over time. Therefore, a static model that relies on strong fixed associations (e.g., “great + service” always being positive) would risk misclassification in the newer batches.

Average correlation drift of top drifted words

This plot quantifies correlation drift across review batches using the customer review data and the selected words. The y-axis shows the average change in pairwise Spearman correlation between the top drifted words relative to Batch 1. Batch 2 shows the highest correlation drift, indicating that the way words like “great” and “experience” relate to others changed significantly after the first batch. Drift then tapers off slightly but remains present across Batches 4–6, never returning to baseline. This suggests a persistent evolution in how customers pair and relate positive sentiment words over time.
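A metric of this kind can be sketched as the mean absolute change in pairwise Spearman correlation relative to a baseline batch. The per-review word counts below are invented, and averaging absolute differences is an assumption about how the change is aggregated; the notebook behind the plot may compute it differently.

```python
from itertools import combinations


def ranks(values):
    """1-based ranks where tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treat correlation as absent
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pairwise_spearman(batch, words):
    """batch maps each word to its per-review counts within one batch."""
    return {(a, b): spearman(batch[a], batch[b]) for a, b in combinations(words, 2)}


def correlation_drift(baseline, batch, words):
    """Mean absolute change in pairwise correlation relative to the baseline."""
    base = pairwise_spearman(baseline, words)
    current = pairwise_spearman(batch, words)
    return sum(abs(current[pair] - base[pair]) for pair in base) / len(base)


words = ["great", "service", "food"]
# Hypothetical per-review counts for five reviews in each batch.
batch_1 = {"great": [1, 0, 2, 0, 1], "service": [1, 0, 2, 0, 1], "food": [0, 1, 0, 2, 1]}
batch_2 = {"great": [1, 0, 2, 0, 1], "service": [0, 2, 0, 1, 1], "food": [1, 0, 2, 0, 1]}

drift_same = correlation_drift(batch_1, batch_1, words)
drift_later = correlation_drift(batch_1, batch_2, words)
print(f"baseline vs itself: {drift_same:.2f}, baseline vs later batch: {drift_later:.2f}")
```

In the invented data, “great” moves from co-occurring with “service” to co-occurring with “food”, so the drift value rises even though the individual word frequencies are unchanged.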

Even though the same top words, such as “food”, “service”, “ambience”, and “great”, are used, the way customers relate them to each other changes over time. These evolving relationships can mislead a sentiment model trained on historical data if it assumes static word associations, such as “great + service” always being positive. This confirms that correlation drift monitoring is a critical and quantifiable approach for maintaining model relevance in sentiment analysis.

Choosing incremental learning over batch learning

In sentiment analysis systems, linguistic patterns, feature associations, and sentiment expressions evolve over time. This evolution leads to concept drift, where the statistical properties of the input data (features) and the target variable (sentiment labels) change, invalidating the assumptions of a previously trained model (Gama et al., 2014). Therefore, continuously updating the sentiment classifier becomes essential to:

Maintain high prediction accuracy.
Adapt to new vocabulary and shifting semantics.
Detect and respond to early signals of dissatisfaction or positive change.

Batch learning, also called offline learning, refers to training a model on the entire historical dataset at once, assuming that the data distribution is static and known in advance. It is effective for stationary datasets and is used in many supervised learning workflows. However, retraining a model on large datasets is computationally expensive and time-consuming (Pérez-Sánchez et al., 2018). Furthermore, Gama et al. (2014) note that in this approach the current model is discarded and a new one is built from scratch using the updated historical data.

Incremental learning (or online learning), on the other hand, allows a machine learning model to learn from streaming data without retraining on the full dataset. Incremental models support partial updates, retain prior knowledge, and can be equipped with mechanisms for concept drift adaptation. This approach is useful in situations where the concept being learned changes over time (Widmer & Kubat, 1996), and it is critical for preserving long-term model performance in dynamic environments (Lu et al., 2019). Furthermore, incremental learning models can incorporate mechanisms such as forgetting factors, which give more importance to newer data and allow quick adaptation to new contexts in changing environments. The forgetting function is also relevant when dealing with imbalanced data streams and non-stationary features (Pérez-Sánchez et al., 2018).
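The effect of a forgetting factor can be illustrated with exponentially decayed word counts. This is a minimal sketch: the class name `DecayingCounts`, the decay value of 0.9, and the one-word batches are assumptions for the example, not a method taken from the cited papers.

```python
from collections import defaultdict


class DecayingCounts:
    """Word counts in which older batches are down-weighted by a forgetting factor."""

    def __init__(self, decay=0.9):
        self.decay = decay                  # 1.0 means nothing is ever forgotten
        self.counts = defaultdict(float)

    def update(self, tokens):
        """Shrink all existing counts, then add the new batch at full weight."""
        for word in self.counts:
            self.counts[word] *= self.decay
        for word in tokens:
            self.counts[word] += 1.0


forgetting = DecayingCounts(decay=0.9)
no_forgetting = DecayingCounts(decay=1.0)

# Ten batches dominated by "great", followed by ten batches dominated by "mid".
for tracker in (forgetting, no_forgetting):
    for _ in range(10):
        tracker.update(["great"])
    for _ in range(10):
        tracker.update(["mid"])

print("with forgetting:   ", dict(forgetting.counts))
print("without forgetting:", dict(no_forgetting.counts))
```

Without forgetting, both words end with the same weight; with the decay applied, the recently used word outweighs the older one, which is the behavior a drifting stream calls for.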

Incremental learning is preferred over batch learning when data arrives continuously, as in data streams, because it can update the model as new data becomes available without needing the entire dataset at once. While classifiers trained once in batch mode, such as a conventionally fitted logistic regression, lack adaptive capacity, incremental learning algorithms offer a scalable solution (Žliobaitė et al., 2014). Models such as SGDClassifier (Pedregosa et al., 2011), Hoeffding Trees (Hulten et al., 2001), and online Naive Bayes (Zhang, 2004) support real-time updates, are computationally efficient, and offer robust adaptation to evolving sentiment expressions. Adopting such models helps ensure that sentiment analysis systems remain accurate and contextually relevant as user language and sentiments shift.
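To make the idea of partial updates concrete, here is a minimal pure-Python sketch of an online multinomial Naive Bayes classifier. The class `OnlineNaiveBayes`, its method names, and the tiny review batches are assumptions for illustration; in scikit-learn, estimators such as `SGDClassifier` and `MultinomialNB` expose a `partial_fit` method for the same purpose.

```python
import math
from collections import defaultdict


class OnlineNaiveBayes:
    """Multinomial Naive Bayes that is updated one mini-batch at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                                    # Laplace smoothing
        self.class_docs = defaultdict(int)                    # documents per class
        self.class_tokens = defaultdict(int)                  # tokens per class
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def partial_fit(self, texts, labels):
        """Add new labeled texts to the counts; earlier data is not revisited."""
        for text, label in zip(texts, labels):
            self.class_docs[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.class_tokens[label] += 1
                self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            score = math.log(n_docs / total_docs)             # log prior
            denominator = self.class_tokens[label] + self.alpha * vocab_size
            for word in text.lower().split():
                count = self.word_counts[label].get(word, 0)
                score += math.log((count + self.alpha) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = OnlineNaiveBayes()
model.partial_fit(
    ["good food", "excellent service", "bad food", "terrible service"],
    ["pos", "pos", "neg", "neg"],
)
before = model.predict("good but mid")

# A later batch introduces "mid" as a negative expression.
model.partial_fit(
    ["mid food", "service was mid", "mid and overpriced"],
    ["neg", "neg", "neg"],
)
after = model.predict("good but mid")
print("before update:", before, "| after update:", after)
```

The second call to `partial_fit` only adds counts, so the model absorbs the new usage of “mid” without retraining on the first batch.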

References

Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A survey on concept drift adaptation. ACM Computing Surveys, 46(4), 44:1–44:37. https://doi.org/10.1145/2523813
Hulten, G., Spencer, L., & Domingos, P. (2001). Mining time-changing data streams. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 97–106. https://doi.org/10.1145/502512.502529
Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., & Zhang, G. (2019). Learning under Concept Drift: A Review. IEEE Transactions on Knowledge and Data Engineering, 31(12), 2346–2363. https://doi.org/10.1109/TKDE.2018.2876857
Lyu, J. C., Han, E. L., & Luli, G. K. (2021). COVID-19 Vaccine–Related Discussion on Twitter: Topic Modeling and Sentiment Analysis. Journal of Medical Internet Research, 23(6), e24435. https://doi.org/10.2196/24435
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., … Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Pérez-Sánchez, B., Fontenla-Romero, O., & Guijarro-Berdiñas, B. (2018). A review of adaptive online learning for artificial neural networks. Artificial Intelligence Review, 49(2), 281–299. https://doi.org/10.1007/s10462-016-9526-2
Widmer, G., & Kubat, M. (1996). Learning in the presence of concept drift and hidden contexts. Machine Learning, 23(1), 69–101. https://doi.org/10.1007/BF00116900
Zhang, H. (2004). The Optimality of Naive Bayes.
Žliobaitė, I., Bifet, A., Pfahringer, B., & Holmes, G. (2014). Active Learning With Drifting Streaming Data. IEEE Transactions on Neural Networks and Learning Systems, 25(1), 27–39. https://doi.org/10.1109/TNNLS.2012.2236570
Žliobaitė, I., Bifet, A., Read, J., Pfahringer, B., & Holmes, G. (2015). Evaluation methods and decision theory for classification of streaming data with temporal dependence. Machine Learning, 98(3), 455–482. https://doi.org/10.1007/s10994-014-5441-4

Abhinash Jena

I am an interdisciplinary educator, researcher, and technologist with over a decade of experience in applied coding, educational design, and research mentorship in fields spanning management, marketing, behavioral science, machine learning, and natural language processing. I specialize in simplifying complex topics such as sentiment analysis, adaptive assessments, and data visualization. My training approach emphasizes real-world application, clear interpretation of results, and the integration of data mining, processing, and modeling techniques to drive informed strategies across academic and industry domains.
