Feature engineering and extraction in sentiment analysis

By Abhinash Jena & Priya Chetty on July 21, 2025

The term “feature” has been used for a long time but has found growing importance in recent years due to the spurt of text communication over various mediums. One of the earliest definitions of ‘feature’ in the context of systems is that of Keck and Kuehn (1998), who defined it as a unit of one or more telecommunication or telecommunication management based capabilities a network provides to a user. However, in the modern era, the meaning has undergone a significant change.

Features have become an important part of the data preparation step in machine learning. Reid Turner et al. (1999) classify definitions of ‘feature’ into three sets based on functionality:

Feature as a subset of system requirements: posits that a feature is a group of related needs or tasks from the system’s requirement list.

EXAMPLE

In a food delivery app, the “Order Tracking” feature groups all requirements related to tracking orders.

Feature as a subset of system implementation: posits that a feature is made up of the actual pieces of code that make it work.

EXAMPLE

The same “Order Tracking” feature might involve several code files that developers wrote to build it.

Feature as an aggregate view: here a feature connects everything related to it across the project, including requirements, related code, test plans, and user guides.

Dong and Liu (2018) gave a much simpler definition of ‘feature’ in data analytics, machine learning, and data mining. A feature is an attribute or variable used to describe some aspect of individual data objects, for example age and eye colour for persons, or grades for students. Features are useful for describing the underlying objects and for characterizing different groups of objects (latent or explicit). The term ‘feature’ is often used synonymously with ‘variable’ and ‘attribute’.

Some aspects of ‘feature’ in the context of systems and data mining are similar to those in statistics. For instance, different types of features require different types of analysis:

A categorical feature has discrete values, like eye colour being brown, black, grey, blue, or green.
A binary feature takes one of two values, like 1 for male and 0 for female.
An ordinal feature has values that follow an order, like income levels USD 10,000 < USD 50,000 < USD 100,000.
A numerical feature takes numeric values, like age, which can be an integer such as 50, 66, or 87.

Features help machines learn patterns

In data mining, features are pieces of data which help computers learn patterns to make predictions. This process of preparing and creating data features for predictions is called ‘feature engineering’. It is technically defined as the practice of constructing suitable features from given features that lead to improved predictive performance (Nargesian et al., 2017). Feature engineering is a foundational step in the machine learning process, crucial for transforming raw data into actionable insights through predictive modeling. It significantly impacts the accuracy, interpretability, and generalisability of machine learning models (Omoseebi, 2025).

Feature engineering is used after the data preparation step, which involves handling missing values, removing duplicates, detecting outliers, and encoding categorical variables. This step is also called ‘data processing’ and it helps in feature extraction, a process that transforms raw data into meaningful representations. Feature extraction preserves the key features of the original data while eliminating the irrelevant ones.

EXAMPLE

A pizza review which says “Pizza was delicious but the delivery was slow” contains both positive and negative sentiment. The positive sentiment can be extracted from the word “delicious” whereas negative sentiment is extracted from the word “slow”.

Therefore, understanding the data and defining the objectives are crucial steps in feature extraction and fall under the feature engineering function.

Key steps in feature engineering

Feature selection is a dimensionality reduction process which aims to choose a small subset of relevant features from the original ones by removing irrelevant, redundant, or noisy features. It reduces the complexity of the model, which is particularly significant for big datasets, removes redundancy in the data, and helps to better grasp the relationships between variables. Popular techniques include correlation analysis, the Chi-square test, Recursive Feature Elimination (RFE), and L1 regularization (Lasso) (Wang, Tang and Liu, 2016; Laakel Hemdanou et al., 2024).
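As a rough illustration of correlation analysis as a selection filter, the sketch below keeps only the features whose absolute Pearson correlation with the target reaches a threshold. The function names, the toy data, and the 0.5 threshold are assumptions made for this example, not values taken from the cited studies.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def select_by_correlation(features, target, threshold=0.5):
    """Keep feature names whose |correlation| with the target meets the threshold."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= threshold]

# Hypothetical toy data: two informative columns and one noisy column.
features = {
    "size_sqft": [800, 1000, 1200, 1500, 1800],
    "bedrooms": [1, 2, 2, 3, 4],
    "noise": [5, 1, 4, 2, 3],
}
price = [100, 130, 150, 190, 230]

selected = select_by_correlation(features, price, threshold=0.5)
print(selected)  # the weakly correlated "noise" column is dropped
```

A filter like this looks at one feature at a time, so it cannot detect features that are only useful in combination; wrapper methods such as RFE address that at a higher computational cost.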

Feature transformation refers to steps used for replacing missing features or features that are not valid. It scales, modifies, or converts features before numerical computations for prediction modeling. It is typically done after data cleaning, i.e., performing steps like removing missing values, removing duplicates, detecting outliers, and standardising data types. Common methods of feature transformation are Normalization (Scaling), Standardization (Z-Score), Log Transformation, Binning (Bucketing), One-Hot Encoding (converting categories into yes/no columns), and Label Encoding (assigning numbers to categories). According to Nogales and Benalcázar (2023), feature transformation also involves feature extraction by reducing dimensionality, i.e. creating new features that summarize most of the information contained in the original set of features, but some researchers (Zhang, Zhou and Yao, 2020; Htun, Biehl and Petkov, 2023) mention extraction as a part of the feature creation process.
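The numeric transformations named above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the helper names, the income values, and the bin edges are assumptions chosen for the example.

```python
import math
import statistics

def min_max_scale(values):
    """Normalization: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]  # assumes hi != lo

def z_score(values):
    """Standardization: subtract the mean and divide by the standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation, assumed non-zero
    return [(v - mean) / std for v in values]

def log_transform(values):
    """Log transformation: compress large values using log(1 + v)."""
    return [math.log1p(v) for v in values]

def bin_value(value, edges, labels):
    """Binning: return the label of the first bin whose upper edge exceeds the value."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]

incomes = [10_000, 50_000, 100_000]
scaled = min_max_scale(incomes)
standardized = z_score(incomes)
logged = log_transform(incomes)
bands = [bin_value(v, edges=[30_000, 80_000], labels=["low", "medium", "high"])
         for v in incomes]
print(scaled, bands)
```

Which transformation is appropriate depends on the model: distance-based models are usually sensitive to scale, while tree-based models are largely unaffected by it.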

Feature creation refers to the creation of new features from existing data to help with better predictions.

EXAMPLE

Calculating age from birthdate creates a new feature. Features can also be domain-specific, using existing domain knowledge (like “time since last purchase”), or interaction features, which combine two features to form a new one (like age and CIBIL score combined to create a new category, ‘risk level’).
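The three kinds of created features mentioned in this example can be sketched as follows. The customer record, the reference date, and the thresholds in `risk_level` are hypothetical and only serve to show the mechanics.

```python
from datetime import date

def age_in_years(birthdate, today):
    """Derived feature: whole years elapsed between birthdate and today."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)

def days_since(last_purchase, today):
    """Domain-specific feature: time since last purchase, in days."""
    return (today - last_purchase).days

def risk_level(age, credit_score):
    """Interaction feature: combines two inputs into one category.
    The cut-offs below are illustrative assumptions, not real lending rules."""
    if credit_score >= 750 and age >= 25:
        return "low"
    if credit_score >= 650:
        return "medium"
    return "high"

today = date(2025, 7, 21)
customer = {
    "birthdate": date(1990, 8, 15),
    "last_purchase": date(2025, 7, 1),
    "credit_score": 780,
}

age = age_in_years(customer["birthdate"], today)
recency = days_since(customer["last_purchase"], today)
risk = risk_level(age, customer["credit_score"])
print(age, recency, risk)  # 34 20 low
```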

Feature engineering enhances performance

Feature engineering is an important part of the machine learning process. It enhances the performance and interpretability of predictive models. The quality of features greatly affects the outcome of feature engineering, hence particular importance needs to be given to the feature selection process.

Enhancing the performance of a prediction model is the primary goal of feature engineering. Well-engineered features are key to understanding the underlying patterns in the data and can affect the accuracy of the prediction models. Sometimes data scientists even use engineered features to improve predictions without changing the underlying model itself.

EXAMPLE

Objective: To predict housing prices based on a given dataset
Original features in the dataset: Consists of features such as number of bedrooms, number of bathrooms, year of construction, size of the house, number of garage spaces, and locality.
Engineered features: ‘Age of the house’ based on year of construction, bathrooms per bedroom, total rooms, building density, location encoding.

Feature engineering improves data quality by transforming raw, messy, or incomplete data into structured information. It involves techniques like handling missing values, correcting inconsistencies, encoding text or categories into numbers, scaling features to a common range, and creating new features from existing ones to highlight important patterns. The aim is to remove noise from the data, reduce errors, and improve data accuracy, consistency, and relevance. This in turn improves the quality of the prediction model.

Avoiding overfitting with feature engineering

Overfitting happens when a machine learning model learns too much from the training data, including noise and tiny details that don’t really matter. Over-fitted models tend to memorize all the data, including the noisy data, rather than learn from it. As a result, the model works perfectly on the training data but performs poorly on new data. Overfitting happens when the dataset is too small, has too much noise, is too complex with too many variables, hypotheses, or inputs, or when too many possible options are tested through multiple comparisons (Ying, 2019). Proper feature engineering reduces the chances of overfitting by carefully selecting features that represent the true pattern rather than noise. The outcome is a more generalized and robust model, because both overfitting and complexity are reduced. Methods such as Lasso regression are used to reduce overfitting.

EXAMPLE

Objective: To predict customer churn
Original features in the dataset: Consists of features such as customer’s age, tenure, monthly charges, total charges, number of services subscribed to, type of contract, payment method, and whether the customer has churned or not (1 or 0)
Engineered features: customer’s age, tenure, monthly charges, number of services subscribed to, encoded contract type, encoded payment type.

Complex pattern discovery with feature engineering

Raw data often has content in multiple formats such as text, images, emojis, and numbers, some of which may not be usable for building a prediction model. Feature engineering transforms these complex data types into structured and simple numerical or categorical representations that algorithms can process. For this, techniques such as one-hot encoding (coding categories in yes/no format) and label encoding (assigning numbers to categories, e.g., Red = 1, Blue = 2) are used.
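The two encodings can be sketched in plain Python as below. The function names and the colour list are assumptions for illustration. Note that this sketch numbers categories alphabetically starting from 0, so the integers differ from the Red = 1, Blue = 2 illustration; the specific codes are arbitrary as long as each category gets a unique one.

```python
def label_encode(values):
    """Label encoding: map each category to an integer code."""
    categories = sorted(set(values))
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """One-hot encoding: one yes/no (1/0) column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == category else 0 for category in categories] for v in values]
    return rows, categories

colours = ["Red", "Blue", "Red", "Green"]

labels, mapping = label_encode(colours)
one_hot, columns = one_hot_encode(colours)

print(mapping)  # {'Blue': 0, 'Green': 1, 'Red': 2}
print(labels)   # [2, 0, 2, 1]
print(columns)  # ['Blue', 'Green', 'Red']
print(one_hot)  # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

Label encoding implies an order between categories, which some models may misread as meaningful; one-hot encoding avoids that at the cost of more columns.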

EXAMPLE

Objective: To analyse sentiment from customer reviews
Original features in the dataset: Consists of review ID, review text, rating, review length, and sentiment (positive/ negative/ neutral)
Engineered features: TF-IDF vectors, sentiment scores (compound, positive, negative, neutral), review length.
Initial Model Accuracy: 0.60 (using only rating)
Enhanced Model Accuracy: 0.85 (after incorporating text features and other engineered features)

Handling missing data

The term missing data refers to the absence of records, values, or observations usually expected to be present in a dataset. Missing data must be addressed during the data preprocessing stage, before the data is fed into the ML model; otherwise it will render the model too complex, affect its performance, and may create biased outcomes or predictions. Missing data is therefore handled using methods such as listwise deletion (LD), mean, mode, k-nearest neighbor (k-NN), expectation-maximization single imputation (EMSI), logistic regression (LR), SVM, random forest (RF), naïve Bayes (NB) and artificial neural network (ANN) (Makaba and Dogo, 2022).

EXAMPLE

Objective: To predict patient outcomes
Original features in the dataset: patient age, gender, blood pressure, cholesterol level, diabetes, treatment type, and outcome.
Engineered features: patient age, gender, imputed blood pressure, imputed cholesterol level, diabetes, treatment type, and missing indicators.
Initial Model Accuracy: 0.70 (with missing data handling issues).
Enhanced Model Accuracy: 0.78 (after handling missing data).
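A minimal sketch of two of the simpler strategies, mean and mode imputation, together with the missing indicators mentioned in the example. The column values are invented for illustration, and `None` is assumed to mark a missing entry.

```python
import statistics

def impute_mean_with_indicator(values):
    """Replace missing numeric values with the mean of the observed ones
    and return a 0/1 indicator marking which entries were missing."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    imputed = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator

def impute_mode(values):
    """Replace missing categorical values with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    fill = statistics.mode(observed)
    return [fill if v is None else v for v in values]

# Hypothetical patient columns with gaps.
blood_pressure = [120, None, 130, 140, None]
treatment_type = ["A", None, "B", "A"]

bp_imputed, bp_missing = impute_mean_with_indicator(blood_pressure)
treatment_imputed = impute_mode(treatment_type)

print(bp_imputed)         # [120, 130, 130, 140, 130]
print(bp_missing)         # [0, 1, 0, 0, 1]
print(treatment_imputed)  # ['A', 'A', 'B', 'A']
```

Keeping the indicator column lets the model learn whether the fact that a value was missing is itself informative.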

Leveraging domain knowledge with feature engineering

Feature engineering allows data scientists to incorporate domain knowledge into the model-building process. Ziegler et al. (2024) classify domain knowledge into two types: scientific knowledge, derived from the principles and laws of physics and engineering, and expert knowledge, rooted in the experts’ intuition and experience, which makes it less formal but more flexible. According to them, there is a data-driven feature engineering approach to integrate domain knowledge into the machine learning process, which involves data preparation, feature extraction, feature construction, feature selection, and model evaluation. According to Shaik, Shaik and Priya (2024), practitioners can create features that capture critical aspects of the data that might not be apparent through automated methods alone. Regression methods like logistic regression are applied along with domain knowledge to improve the output.

EXAMPLE

Objective: To predict employee attrition.
Original features in the dataset: employee age, tenure, job role, monthly income, performance rating, work life balance perception, training hours, attrition outcome (binary; 1 for yes and 0 for no).
Engineered features: employee age, recent promotion, work life balance score, relative salary, performance training interaction, encoded tenure group, job role, attrition.
Initial Model Accuracy: 0.75
Enhanced Model Accuracy: 0.82 (using metrics such as accuracy, precision, recall, and F1-score).

Feature extraction

Feature extraction has been acknowledged as one of the most important fields in artificial intelligence (Ahmed Medjahed, 2015). It forms an important part of the feature engineering process. Feature extraction refers to reducing the amount of data to be processed using dimensionality reduction techniques. To explain it simply, it involves the transformation of input data into a set of features. Each feature is distinct from another and therefore it helps in differentiating between the categories of inputs.

The process also involves removing noise and redundant information from the data. This step is therefore performed after feature selection and feature transformation. Feature extraction is particularly popular in the field of image processing, where an image must be represented as an object. Some of the popular feature extraction methods used for text data are bag of words, TF-IDF (Term Frequency – Inverse Document Frequency), n-grams, Word Embeddings (Word2Vec, GloVe, FastText), PCA, and Feature Hashing (Hashing Trick). Bag of Words is one of the most widely used feature extraction methods in NLP. It simply counts how many times a word appears in a text while ignoring grammar and word order (Suhaidi, Kadir and Tiun, 2021).

Bag of words (BoW)

Bag of Words (BoW) is used to convert text data into numerical features by counting the frequency of words in the corpus (Ahmed Medjahed, 2015). It ignores grammar and word order. Therefore BoW creates a list of all unique words in the dataset and counts how many times each word appears in a given document. The BoW model is mainly used in document classification based on frequency of occurrence of each word (Qader, M.Ameen and Ahmed, 2019).

Case: Sample reviews
Review text: “I love pizza” and “I hate pizza”.
Bag of words: I, love, hate, pizza
Analysis:

Word	Review 1 (“I love pizza”)	Review 2 (“I hate pizza”)
I	1	1
love	1	0
hate	0	1
pizza	1	1

Classification:
Review 1 → [1, 1, 0, 1] → Labeled as Positive (1)
Review 2 → [1, 0, 1, 1] → Labeled as Negative (0)
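The table and vectors above can be reproduced with a short sketch. The vocabulary order is fixed by hand to match the table, and the tokenizer is a plain whitespace split, which is a simplifying assumption (real pipelines usually also lowercase the text and strip punctuation).

```python
from collections import Counter

def bag_of_words(documents, vocabulary):
    """Return one count vector per document, ordered by the given vocabulary."""
    vectors = []
    for document in documents:
        counts = Counter(document.split())  # naive whitespace tokenizer
        vectors.append([counts[word] for word in vocabulary])
    return vectors

reviews = ["I love pizza", "I hate pizza"]
vocabulary = ["I", "love", "hate", "pizza"]

vectors = bag_of_words(reviews, vocabulary)
print(vectors)  # [[1, 1, 0, 1], [1, 0, 1, 1]]
```

The two vectors differ only in the “love” and “hate” positions, which is exactly the signal a classifier would use to separate the positive review from the negative one.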

Term Frequency-Inverse Document Frequency (TF-IDF)

TF-IDF weighs words based on their frequency in the document and across the corpus. It is the most commonly used weighting metric for measuring the relationship between words and documents (Zhang, Zhou and Yao, 2020). TF-IDF works by determining the relative frequency of a word in a specific document compared to the inverse proportion of that word over the entire document corpus. For example (S, George and Varghese, 2019):

Total words in a document: 100
Word “paddy” appears = 3 times
So, TF = 3 ÷ 100 = 0.03
Total documents = 10 million
Word “paddy” appears in 1,000 documents
So, IDF = log10(10,000,000 ÷ 1,000) = log10(10,000) = 4
TF-IDF = 0.03 × 4 = 0.12
A score of 0.12 indicates a moderate level of importance of the word “paddy” in this document: it appears several times in this document but does not occur frequently in other documents.
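The arithmetic of this worked example can be checked with a few lines of Python. A base-10 logarithm is assumed because it is the base that gives an IDF of 4 here; the function names are illustrative, and many libraries use a natural logarithm with smoothing instead, so their scores will differ in scale.

```python
import math

def term_frequency(term_count, total_terms):
    """Share of the document's words taken up by the term."""
    return term_count / total_terms

def inverse_document_frequency(total_docs, docs_with_term):
    """Base-10 log of how rare the term is across the corpus."""
    return math.log10(total_docs / docs_with_term)

tf = term_frequency(term_count=3, total_terms=100)
idf = inverse_document_frequency(total_docs=10_000_000, docs_with_term=1_000)
tf_idf = tf * idf

print(round(tf, 2), round(idf, 2), round(tf_idf, 2))  # 0.03 4.0 0.12
```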

N-Grams

N-grams are used to estimate the probability or occurrence of a sequence of words in a corpus (Kallmeyer, 2016). In other words, n-grams are sequences of elements (words or characters) as they appear in texts. “N” corresponds to the number of elements in a sequence and can be a unigram (single word), bigram (two words), or trigram (three words).

EXAMPLE

Sentence: “I love pizza”
Unigrams (n=1): “I”, “love”, “pizza”
Bigrams (n=2): “I love”, “love pizza”
Trigrams (n=3): “I love pizza”

N-grams are often compared to bag of words, but they capture context better than just counting individual words. They form an important part of text mining because they turn free text into numerical variables which can then be analysed using statistical techniques (Schonlau, Guenther and Sucholutsky, 2017).
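Word-level n-grams for the example sentence can be generated with a sliding window, as in the sketch below. The `ngrams` helper and its whitespace tokenization are assumptions made for illustration.

```python
def ngrams(text, n):
    """Return all runs of n consecutive words in the text, in order."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "I love pizza"

unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)

print(unigrams)  # ['I', 'love', 'pizza']
print(bigrams)   # ['I love', 'love pizza']
print(trigrams)  # ['I love pizza']
```

Counting these n-grams per document, in the same way bag of words counts single words, gives the numerical variables used in further analysis.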

Word Embeddings

Word embeddings represent words as dense vectors to capture semantic relationships (e.g., Word2Vec, GloVe). This conversion of words into vectors is necessary because mathematical algorithms need numeric inputs to work with. While other forms of data like images and audio naturally come in the form of rich, high-dimensional vectors (i.e. pixel intensity for images and power spectral density coefficients for audio data), words are treated as discrete atomic symbols (Mandelbaum and Shalev, 2016). Typically, words with similar meanings have similar vectors. For example:

Review: “The food was delicious and tasty.”
Words in review: food, delicious, tasty
Word embedding:

Word	Word Embedding (Example Numbers)
food	[0.5, 0.7, 0.2]
delicious	[0.6, 0.8, 0.3]
tasty	[0.59, 0.78, 0.31]

Result: “delicious” and “tasty” have very similar embeddings → They have similar meanings in reviews.
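Similarity between embeddings is commonly measured with cosine similarity. The sketch below applies it to the example numbers from the table above; these three-dimensional vectors are illustrative only, whereas trained embeddings typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Example vectors copied from the table above.
embeddings = {
    "food": [0.5, 0.7, 0.2],
    "delicious": [0.6, 0.8, 0.3],
    "tasty": [0.59, 0.78, 0.31],
}

sim_delicious_tasty = cosine_similarity(embeddings["delicious"], embeddings["tasty"])
sim_food_delicious = cosine_similarity(embeddings["food"], embeddings["delicious"])

# "delicious" is closer to "tasty" than to "food" in this toy example.
print(sim_delicious_tasty > sim_food_delicious)  # True
```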

Hashing Vectorizer

In Natural Language Processing (NLP), the Hashing Vectorizer is used for large and complex datasets which need to be analysed quickly. It is a text feature extraction technique that converts text files into a matrix of token occurrences. It uses a hash function to assign indices to words, allowing each word to be processed independently (Roshan, Bhacho and Zai, 2023). In simpler words, it quickly converts text into numerical features by:

Turning each word into a hashed number (using a math function).
Mapping those numbers into fixed positions in a vector.
Counting how many times words appear (frequency).

EXAMPLE

Review: “The pizza was delicious and hot and tasty”
Hashing words to positions:

Word	Hashed Position (Example)
pizza	3
was	4
tasty	7
and	1
delicious	2
hot	5

Vector after hashing: [0, 2, 1, 1, 1, 1, 0, 1, 0, 0]
“and” appears twice in the sentence, which is why position 1 of the vector holds the value 2.
Note that vector positions are counted from 0, so hashed position 1 is the second slot in the vector. This is because in most programming languages (like Python), counting starts from 0.
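The three steps (hash each word, map it to a fixed position, count occurrences) can be sketched as follows. MD5 from the standard library is used here only because it gives the same result on every run; it is an assumption for this illustration, and production libraries use their own hash functions, so the positions will not match the example table above.

```python
import hashlib

def hash_position(token, n_features):
    """Map a token to a slot in 0..n_features-1 using a deterministic hash."""
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_features

def hashing_vectorize(text, n_features=10):
    """Count tokens into a fixed-length vector without storing a vocabulary."""
    vector = [0] * n_features
    for token in text.lower().split():
        vector[hash_position(token, n_features)] += 1
    return vector

review = "The pizza was delicious and hot and tasty"
vector = hashing_vectorize(review, n_features=10)

print(vector)
print(hash_position("and", 10))  # the slot that received both occurrences of "and"
```

Because the vector length is fixed, two different words can land in the same slot (a collision). A larger `n_features` makes collisions rarer at the cost of a longer vector.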

Linguistic features

A primary challenge in natural language processing (NLP) is to enable computers to derive meaning from human or natural language input. Part-of-speech (POS) tagging is a common solution which involves assigning an appropriate POS tag to each word in a sentence, e.g. verb, adverb, noun, pronoun etc. The process of POS tagging involves four main steps:

reading the input sentence,
tokenizing the sentence into words,
using POS Tagging methods, and
deriving the tagged output for further analysis.

POS tagging is also used to introduce the relationship of one word with its previous and next word (Adhvaryu and Balani, 2015).

EXAMPLE

Review: “The pizza was very tasty.”
POS Tagging:

Word	POS Tag (Meaning)
The	Determiner (DT)
pizza	Noun (NN)
was	Verb (VBD)
very	Adverb (RB)
tasty	Adjective (JJ)

Outcome: POS Tagging has identified the grammatical role of each word.
Purpose: it helps to understand the meaning and purpose of each word.

To perform sentiment analysis, the focus will be on adjectives or adverbs like “tasty”, “bad”, “amazing”. Similarly, if the purpose is to recognize named entities, the focus will be on nouns to detect names, places, and brands.
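The four steps of POS tagging, followed by filtering for sentiment-bearing words, can be illustrated with a toy lookup tagger. The `TOY_TAGS` dictionary is a hypothetical stand-in that only covers the example review; real taggers learn tags from annotated corpora and use the surrounding words to resolve ambiguity.

```python
# Hypothetical lookup table covering only the example review.
TOY_TAGS = {"the": "DT", "pizza": "NN", "was": "VBD", "very": "RB", "tasty": "JJ"}

def tokenize(sentence):
    """Step 2: split the sentence into words and strip basic punctuation."""
    stripped = (token.strip(".,!?") for token in sentence.split())
    return [token for token in stripped if token]

def pos_tag(sentence):
    """Steps 3 and 4: look up a tag for each token ('UNK' if not in the table)."""
    return [(token, TOY_TAGS.get(token.lower(), "UNK")) for token in tokenize(sentence)]

def sentiment_candidates(tagged):
    """Keep adjectives (JJ) and adverbs (RB), the focus for sentiment analysis."""
    return [word for word, tag in tagged if tag in {"JJ", "RB"}]

review = "The pizza was very tasty."  # step 1: read the input sentence
tagged = pos_tag(review)
candidates = sentiment_candidates(tagged)

print(tagged)
# [('The', 'DT'), ('pizza', 'NN'), ('was', 'VBD'), ('very', 'RB'), ('tasty', 'JJ')]
print(candidates)  # ['very', 'tasty']
```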

Lexicon based features

General Purpose Emotion Lexicons (GPELs) associate words with emotion categories and are used for emotion analysis of text. Emotion analysis is a method for deciphering a text to identify the feelings conveyed within it (Bandhakavi et al., 2016). As a part of the feature extraction process in NLP, lexicon-based features are numbers (features) derived from sentiment dictionaries (also called lexicons) that contain predefined scores for words based on their sentiment or emotion. A sentiment lexicon contains words along with their sentiment scores. The method checks the text against this lexicon, finds which words from the text exist in it, collects their scores, and uses them as features.

EXAMPLE

Review: “The food was tasty but expensive.”
Lexicon: “tasty” → +2, “expensive” → –1
New features generated:
Total sentiment score → (+2) + (–1) = +1
Count of positive words → 1
Count of negative words → 1

Lexicon-based features are one of the most commonly used methods to extract sentiment scores from reviews, i.e. for sentiment analysis.
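The three features from the example above can be computed with a small sketch. The two-word `LEXICON` mirrors the example scores; the function name and the simple tokenization are assumptions for illustration.

```python
# Example lexicon from the review above: word -> sentiment score.
LEXICON = {"tasty": 2, "expensive": -1}

def lexicon_features(text, lexicon):
    """Look up each word in the lexicon and summarise the scores found."""
    tokens = [token.strip(".,!?").lower() for token in text.split()]
    scores = [lexicon[token] for token in tokens if token in lexicon]
    return {
        "total_score": sum(scores),
        "positive_count": sum(1 for score in scores if score > 0),
        "negative_count": sum(1 for score in scores if score < 0),
    }

review = "The food was tasty but expensive."
features = lexicon_features(review, LEXICON)

print(features)
# {'total_score': 1, 'positive_count': 1, 'negative_count': 1}
```

Words that are not in the lexicon contribute nothing, so the coverage of the lexicon limits how much sentiment this approach can detect.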

Deep Learning and Transformers

The article so far has discussed traditional feature extraction methods like Bag of Words, TF-IDF, and lexicon-based approaches, which rely on human-defined rules and simple frequency counts. However, these methods often fail to capture the deeper meaning and context of words and are effective only for basic tasks. Traditional methods cannot handle complex sentences or sarcasm.

EXAMPLE

Review: “I thought the food would be great, but it wasn’t.”
Traditional methods: Bag of Words or TF-IDF only count words like “great” and “food” and would likely think this is a positive review because of the word “great.” They don’t understand sentence structure or word relationships like “but” which changes the sentiment here.

This is where deep learning comes in. Deep learning is a powerful technology which has been used in a wide variety of analytical tasks, such as image classification and speech recognition. It can describe complex relations. In the above example, deep learning models can capture that “but” signals a shift to negative sentiment, so they would classify the review as “negative”. Deep learning techniques like transformers significantly enhance feature extraction.

Older deep learning models like LSTM, RNN and CNN have limited contextual understanding and other limitations such as slow processing speeds. An RNN reads one word at a time and is therefore suitable only for short sentences. LSTM was designed as an improvement on RNN to handle long sentences, but it still reads word by word, so it takes time (Fan et al., 2019). However, the newer deep learning architecture called the Transformer offers a solution to the limitations of sequence-to-sequence (seq-2-seq) architectures, as it works in parallel with multiple words (Islam et al., 2024). This means that transformer models focus on important words in the whole sentence at once, instead of going word by word. This makes them very powerful for language tasks like translation or chatbots. The parallel processing allows them to detect underlying sentiments better than traditional models.

EXAMPLE

Review: “The food at the restaurant was not good.”
Transformer model process: it notices that “good” usually sounds positive. But the word “not” changes the meaning, so it classifies the review as ‘negative’.

References

Adhvaryu, N. and Balani, P. (2015) ‘Survey: Part-Of-Speech Tagging in NLP’, Science and Technology [Preprint], (1).
Ahmed Medjahed, S. (2015) ‘A Comparative Study of Feature Extraction Methods in Images Classification’, International Journal of Image, Graphics and Signal Processing, 7(3), pp. 16–23. Available at: https://doi.org/10.5815/ijigsp.2015.03.03.
Bandhakavi, A. et al. (2016) ‘Lexicon based feature extraction for emotion text classification’, Pattern Recognition Letters, 93. Available at: https://doi.org/10.1016/j.patrec.2016.12.009.
Dong, G. and Liu, H. (2018) Feature Engineering for Machine Learning and Data Analytics. CRC Press.
Fan, C. et al. (2019) ‘Deep learning-based feature engineering methods for improved building energy prediction’, Applied Energy, 240, pp. 35–45. Available at: https://doi.org/10.1016/j.apenergy.2019.02.052.
Htun, H.H., Biehl, M. and Petkov, N. (2023) ‘Survey of feature selection and extraction techniques for stock market prediction’, Financial Innovation, 9(1), p. 26. Available at: https://doi.org/10.1186/s40854-022-00441-7.
Islam, S. et al. (2024) ‘A comprehensive survey on applications of transformers for deep learning tasks’, Expert Systems with Applications, 241, p. 122666. Available at: https://doi.org/10.1016/j.eswa.2023.122666.
Kallmeyer, L. (2016) Machine Learning for natural language processing – N-grams and language models. Available at: https://user.phil-fak.uni-duesseldorf.de/~kallmeyer/MachineLearning/n-grams-language-models.pdf (Accessed: 4 July 2025).
Keck, D.O. and Kuehn, P.J. (1998) ‘The feature and service interaction problem in telecommunications systems: a survey’, IEEE Transactions on Software Engineering, 24(10), pp. 779–796. Available at: https://doi.org/10.1109/32.729680.
Laakel Hemdanou, A. et al. (2024) ‘Comparative analysis of feature selection and extraction methods for student performance prediction across different machine learning models’, Computers and Education: Artificial Intelligence, 7, p. 100301. Available at: https://doi.org/10.1016/j.caeai.2024.100301.
Makaba, T. and Dogo, E. (2022) A comparison of strategies for missing values in data on machine learning classification algorithms. University of Johannesburg Institutional Repository. Available at: https://core.ac.uk/reader/286396130 (Accessed: 3 July 2025).
Mandelbaum, A. and Shalev, A. (2016) ‘Word Embeddings and Their Use In Sentence Classification Tasks’. arXiv. Available at: https://doi.org/10.48550/arXiv.1610.08229.
Nargesian, F. et al. (2017) ‘Learning Feature Engineering for Classification’, in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence. Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia: International Joint Conferences on Artificial Intelligence Organization, pp. 2529–2535. Available at: https://doi.org/10.24963/ijcai.2017/352.
Nogales, R.E. and Benalcázar, M.E. (2023) ‘Analysis and Evaluation of Feature Selection and Feature Extraction Methods’, International Journal of Computational Intelligence Systems, 16(1), p. 153. Available at: https://doi.org/10.1007/s44196-023-00319-1.
Omoseebi, A. (2025) ‘Data Preparation and Feature Engineering’, ResearchGate [Preprint]. Available at: https://www.researchgate.net/publication/389860294_Data_Preparation_and_Feature_Engineering (Accessed: 3 July 2025).
Qader, W.A., M.Ameen, M. and Ahmed, B.I. (2019) ‘An Overview of Bag of Words: Importance, Implementation, Applications, and Challenges’, in 2019 International Engineering Conference (IEC), Muscat. Available at: https://www.researchgate.net/publication/338511771_An_Overview_of_Bag_of_WordsImportance_Implementation_Applications_and_Challenges (Accessed: 4 July 2025).
Reid Turner, C. et al. (1999) ‘A conceptual basis for feature engineering’, Journal of Systems and Software, 49(1), pp. 3–15. Available at: https://doi.org/10.1016/S0164-1212(99)00062-X.
Roshan, R., Bhacho, I.A. and Zai, S. (2023) ‘Comparative Analysis of TF–IDF and Hashing Vectorizer for Fake News Detection in Sindhi: A Machine Learning and Deep Learning Approach’, Engineering Proceedings, 46(1), p. 5. Available at: https://doi.org/10.3390/engproc2023046005.
S, A.K., George, N. and Varghese, S.M. (2019) ‘A TF-IDF Method for Automatic Query Answering’, IJRTI, 4(1).
Schonlau, M., Guenther, N. and Sucholutsky, I. (2017) ‘Text Mining with n-gram Variables’, The Stata Journal, 17(4), pp. 866–881. Available at: https://doi.org/10.1177/1536867X1801700406.
Shaik, N., Shaik, A.S. and Priya, D.C.K. (2024) ‘Elevating Machine Learning Performance: The Power of Feature Engineering’, 2(6).
Suhaidi, M., Kadir, R.A. and Tiun, S. (2021) ‘A Review of Feature Extraction Methods on Machine Learning’, Journal of Information System and Technology Management (JISTM), 6(22).
Wang, S., Tang, J. and Liu, H. (2016) ‘Feature Selection’, in Encyclopedia of Machine Learning and Data Mining. Available at: https://doi.org/10.1007/978-1-4899-7502-7_101-1.
Ying, X. (2019) ‘An Overview of Overfitting and its Solutions’, in Conf. Series 1168. IOP Conf. Series: Journal of Physics, IOP.
Zhang, Y., Zhou, Y. and Yao, J. (2020) ‘Feature Extraction with TF-IDF and Game-Theoretic Shadowed Sets’, in M.-J. Lesot et al. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. Cham: Springer International Publishing, pp. 722–733. Available at: https://doi.org/10.1007/978-3-030-50146-4_53.
Ziegler, J. et al. (2024) ‘An Approach to Integrate Domain Knowledge into Feature Engineering to Enhance Data-Driven Surrogate Models of Simulations’, in Procedia CIRP. 57th CIRP Conference on Manufacturing Systems, Science Direct.

Abhinash Jena
Priya Chetty

I am an interdisciplinary educator, researcher, and technologist with over a decade of experience in applied coding, educational design, and research mentorship in fields spanning management, marketing, behavioral science, machine learning, and natural language processing. I specialize in simplifying complex topics such as sentiment analysis, adaptive assessments and data visualization. My training approach emphasizes real-world application, clear interpretation of results and the integration of data mining, processing, and modeling techniques to drive informed strategies across academic and industry domains.

I am a management graduate with specialisation in Marketing and Finance. I have over 12 years' experience in research and analysis. This includes fundamental and applied research in the domains of management and social sciences. I am well versed in academic research principles. Over the years I have developed a mastery in different types of data analysis on different applications like SPSS, Amos, and NVIVO. My expertise lies in inferring the findings and creating actionable strategies based on them.

Over the past decade I have also built a profile as a researcher on Project Guru's Knowledge Tank division. I have penned over 200 articles that have earned me 400+ citations so far. My Google Scholar profile can be accessed here.

I now consult university faculty through Faculty Development Programs (FDPs) on the latest developments in the field of research. I also guide individual researchers on how they can commercialise their inventions or research findings. Other developments I'm actively involved in at Project Guru include strengthening the "Publish" division as a bridge between industry and academia by bringing together experienced research persons, learners, and practitioners to collaboratively work on a common goal.
