Check out my other articles in this series on using ChatGPT for sentiment analysis of customer product reviews here:
Yes! ChatGPT Is A Powerful AI Model for Sentiment Analysis
Sentiment analysis with natural language processing (NLP) enables artificial intelligence systems to understand the opinions and emotions contained in text, and it has become increasingly important and accurate over the past few years. ChatGPT, developed by OpenAI, is a language model that has drawn tremendous attention over the last few months for its advanced natural language processing capabilities. But is it really better than existing machine learning NLP techniques? The short answer is YES! This article demonstrates ChatGPT’s significantly higher accuracy compared with pre-ChatGPT NLP techniques. The full Python code, which you can copy and run yourself, is included below.
How to Use The ChatGPT API AI Model for Sentiment Analysis
Use ChatGPT API: Integration and Usage
To use ChatGPT for sentiment analysis, you can use Python to send requests to the API provided by OpenAI and get back the sentiment classification in a few seconds. This lets you run sentiment analysis with ChatGPT over a large amount of input text, such as social media posts and customer reviews, rather than copying and pasting each entry into the ChatGPT application over and over again. The key to getting this to work is using the correct prompt to tell ChatGPT how to classify your text, and then setting limits on the type and amount of information you want the API to return in its response. You also have to put in place some safeguards to handle the API overloads and timeouts that can occur now that so many people and apps are trying to use the ChatGPT API.
ChatGPT has been trained on a massive amount of text data, allowing it to deliver sentiment analysis performance comparable to fine-tuned machine learning models. The generative pre-trained transformer architecture, combined with reinforcement learning techniques, makes ChatGPT a powerful AI model for sentiment analysis tasks. Because of that massive training corpus, ChatGPT already knows how to interpret language without all of the tedious text preprocessing steps that traditional machine learning models require. What’s amazing is that you only need to know how to send information to the API to get back state-of-the-art sentiment analysis accuracy, and you don’t need a massive amount of labeled training reviews to train a model in the first place, as you do with an ML model.
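As a quick preview, the core of the approach is a single API call per review. Here is a minimal sketch (the full, hardened version with prompt instructions, retries, and rate-limit handling appears later in this article); the sample review string is just an illustrative input:
import openai
openai.api_key = "YOUR_OPENAI_API_KEY"  # replace with your own key
review = "This movie was an absolute delight from start to finish."  # example input
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an AI model trained to detect the sentiment of product reviews."},
        {"role": "user", "content": f"Return only POSITIVE or NEGATIVE for this review: {review}"}
    ],
    temperature=0,
    max_tokens=3
)
print(completion.choices[0].message.content)  # e.g., POSITIVE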
For the purposes of this tutorial, we will be comparing ChatGPT to the following ML algorithms:
- Logistic Regression
- Multinomial Naive Bayes
- Random Forest
- Gradient Boosting
- Decision Tree
- AdaBoost
- CatBoost
For this test, I will use the IMDB Large Movie Review Dataset of 50,000 labeled reviews provided by Andrew Maas et al. at Stanford in their paper, Learning Word Vectors for Sentiment Analysis. Here’s the link to the dataset website. This file is called “IMDB_Dataset_TRAIN.csv” in the code below. The “Reviews” column contains the movie reviews, and the “Sentiments” column has the labeled sentiment classification.
I randomly removed 200 reviews from the dataset before training to use as a validation dataset for the ML models and ChatGPT. This file is called “IMDB_Dataset_VALIDATE.csv” in the code below.
First, install and import the required additional Python libraries.
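The imports below assume these packages are already available. In a fresh environment (and for catboost in particular on Google Colab), you may first need to install them with something along these lines:
pip install numpy pandas matplotlib seaborn nltk gensim scikit-learn catboost wordcloud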
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
import gensim
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from catboost import CatBoostClassifier
Next, load the dataset, IMDB_Dataset_TRAIN.csv, from temporary storage on Google Colab.
Note: Uploading the TRAIN CSV directly to Colab is slow but easy to explain, so that’s what I’ve done here. A better choice is to mount your Google Drive and read the files from there.
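For reference, mounting your Drive in Colab only takes a couple of lines (a sketch; adjust the path to wherever you store the CSV files):
from google.colab import drive
drive.mount('/content/drive')
# Then read the file directly from Drive, e.g.:
# data = pd.read_csv('/content/drive/MyDrive/IMDB_Dataset_TRAIN.csv')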
# Load the dataset
def load_data(file_path):
    """
    Load dataset from CSV, JSON, or other file formats.
    Args:
        file_path (str): File path to the dataset.
    Returns:
        data (pd.DataFrame): Loaded dataset as a pandas DataFrame.
    """
    file_extension = file_path.split(".")[-1].lower()
    if file_extension == "csv":
        data = pd.read_csv(file_path)
    elif file_extension == "json":
        data = pd.read_json(file_path)
    else:
        raise ValueError("Unsupported file format. Please use CSV or JSON.")
    return data

# B. Understand the data's structure
def data_overview(data):
    """
    Provide an overview of the dataset structure.
    Args:
        data (pd.DataFrame): Input dataset as a pandas DataFrame.
    Returns:
        None
    """
    print("Data Overview:")
    print("Shape of the dataset:", data.shape)
    print("\nFirst 5 rows of the dataset:")
    print(data.head())

# Use the functions
file_path = "/content/IMDB_Dataset_TRAIN.csv"
data = load_data(file_path)
data_overview(data)
Let’s do some quick data visualization to understand the dataset, starting with a bar chart and a pie chart. The training dataset is balanced nearly 50/50 between positive and negative reviews.
# Visualize the data
def visualize_data_distribution(data, target_column):
    """
    Visualize the distribution of sentiment classes using bar and pie charts.
    Args:
        data (pd.DataFrame): Input dataset as a pandas DataFrame.
        target_column (str): Column name of the target sentiment labels.
    Returns:
        None
    """
    sentiment_counts = data[target_column].value_counts()
    # Bar chart
    plt.figure(figsize=(8, 4))
    sns.barplot(x=sentiment_counts.index, y=sentiment_counts.values)
    plt.title("Sentiment Distribution (Bar Chart)")
    plt.xlabel("Sentiments")
    plt.ylabel("Counts")
    plt.show()
    # Pie chart
    plt.figure(figsize=(6, 6))
    plt.pie(sentiment_counts.values, labels=sentiment_counts.index, autopct='%1.1f%%', startangle=90)
    plt.title("Sentiment Distribution (Pie Chart)")
    plt.axis('equal')
    plt.show()
Here are the output descriptive summary graphs
And now a word cloud for good measure.
# B. Create a word cloud
def visualize_word_cloud(text_data, title):
    """
    Generate a word cloud visualization of the text data.
    Args:
        text_data (pd.Series): Text data as a pandas Series.
        title (str): Title for the word cloud plot.
    Returns:
        None
    """
    from wordcloud import WordCloud
    text = " ".join(review for review in text_data)
    wordcloud = WordCloud(background_color="white", width=800, height=400).generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()

# Usage example:
target_column = "Sentiments"  # Replace this with the actual target column name in your dataset
text_column = "Reviews"  # Replace this with the actual text column name in your dataset
visualize_data_distribution(data, target_column)
visualize_word_cloud(data[text_column], "Word Cloud for Sentiment Analysis")
Here’s the word cloud
Now, let’s have some quantitative EDA of the data. First, calculate summary statistics for the dataset. Next, look at the distribution of labels (i.e., how many positive vs. negative) and then identify the most common words in the dataset (before we do any text preprocessing to remove useless words.)
# Exploratory Data Analysis (EDA)
# A. Calculate summary statistics
def calculate_summary_statistics(data, target_column):
    """
    Calculate summary statistics for the target sentiment labels.
    Args:
        data (pd.DataFrame): Input dataset as a pandas DataFrame.
        target_column (str): Column name of the target sentiment labels.
    Returns:
        summary_statistics (pd.DataFrame): Summary statistics as a pandas DataFrame.
    """
    summary_statistics = data[target_column].describe()
    return summary_statistics

# B. Look at the distribution count of the labels
def analyze_sentiment_distribution(data, target_column):
    """
    Analyze the distribution of sentiment labels.
    Args:
        data (pd.DataFrame): Input dataset as a pandas DataFrame.
        target_column (str): Column name of the target sentiment labels.
    Returns:
        sentiment_distribution (pd.Series): Sentiment distribution as a pandas Series.
    """
    sentiment_distribution = data[target_column].value_counts(normalize=True)
    return sentiment_distribution

# C. Identify the top words in the dataset (before any preprocessing)
def identify_top_words(data, text_column, n=10):
    """
    Identify the top n most frequent words in the text data.
    Args:
        data (pd.DataFrame): Input dataset as a pandas DataFrame.
        text_column (str): Column name of the text data.
        n (int): Number of top words to identify.
    Returns:
        top_words (pd.Series): Top n most frequent words as a pandas Series.
    """
    from collections import Counter
    words = []
    for text in data[text_column]:
        for word in text.split():
            words.append(word.lower())
    counter = Counter(words)
    top_words = pd.Series(counter.most_common(n))
    top_words.index = top_words.index + 1
    return top_words

# Call the functions
summary_statistics = calculate_summary_statistics(data, target_column)
sentiment_distribution = analyze_sentiment_distribution(data, target_column)
top_words = identify_top_words(data, text_column, n=10)
print("Summary Statistics:")
print(summary_statistics)
print("\nSentiment Distribution:")
print(sentiment_distribution)
print("\nTop 10 Most Frequent Words:")
print(top_words)
This is the output
And now we need to preprocess all of the text to clean it up so it can be fed into the machine learning models. This means removing punctuation, removing low-value stopwords, reducing similar words to their stems, and so on. Google each step in the process if you want to know why it’s being done. Honestly, this is a tedious process with many steps that are prone to error. As you’ll see later, using ChatGPT for sentiment analysis skips all of this 🙂
# Text Preprocessing
# Tokenize the text
def tokenize_text(text):
    """
    Tokenize the input text.
    Args:
        text (str): Input text.
    Returns:
        tokens (list): List of tokens.
    """
    from nltk.tokenize import word_tokenize
    tokens = word_tokenize(text)
    return tokens

# Remove common stopwords (i.e., low-value words)
def remove_stopwords_punctuation(tokens, language='english'):
    """
    Remove stopwords and punctuation from the input tokens.
    Args:
        tokens (list): List of tokens.
        language (str): Language for the stopwords.
    Returns:
        filtered_tokens (list): List of filtered tokens.
    """
    from nltk.corpus import stopwords
    from string import punctuation
    stop_words = set(stopwords.words(language))
    filtered_tokens = [token for token in tokens if token.lower() not in stop_words and token not in punctuation]
    return filtered_tokens

# Perform stemming to avoid duplicating similar words
def perform_stemming_lemmatization(tokens, stemming=True, lemmatization=True):
    """
    Perform stemming and/or lemmatization on the input tokens.
    Args:
        tokens (list): List of tokens.
        stemming (bool): Whether to perform stemming.
        lemmatization (bool): Whether to perform lemmatization.
    Returns:
        processed_tokens (list): List of processed tokens.
    """
    from nltk.stem import PorterStemmer, WordNetLemmatizer
    processed_tokens = tokens
    if stemming:
        stemmer = PorterStemmer()
        processed_tokens = [stemmer.stem(token) for token in processed_tokens]
    if lemmatization:
        lemmatizer = WordNetLemmatizer()
        processed_tokens = [lemmatizer.lemmatize(token) for token in processed_tokens]
    return processed_tokens
# Vectorize the text using TF-IDF
def vectorize_text(text_data, method='tfidf'):
    """
    Vectorize the input text data using the specified method.
    Args:
        text_data (pd.Series): Text data as a pandas Series.
        method (str): Vectorization method ('count' for CountVectorizer, 'tfidf' for TfidfVectorizer).
    Returns:
        vectorized_text (sparse matrix): Vectorized text data as a sparse matrix.
        vectorizer (object): The vectorizer object used for the transformation.
    """
    if method == 'count':
        vectorizer = CountVectorizer()
    elif method == 'tfidf':
        vectorizer = TfidfVectorizer()
    else:
        raise ValueError("Invalid method. Use 'count' for CountVectorizer or 'tfidf' for TfidfVectorizer.")
    vectorized_text = vectorizer.fit_transform(text_data)
    return vectorized_text, vectorizer

# Call the functions defined above:
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
sample_text = data.loc[0, 'Reviews']  # Replace this with the actual text column name in your dataset
tokens = tokenize_text(sample_text)
filtered_tokens = remove_stopwords_punctuation(tokens)
processed_tokens = perform_stemming_lemmatization(filtered_tokens)
print("Sample Text:")
print(sample_text)
print("\nTokens:")
print(tokens)
print("\nFiltered Tokens:")
print(filtered_tokens)
print("\nProcessed Tokens:")
print(processed_tokens)
vectorized_text, vectorizer = vectorize_text(data[text_column], method='tfidf')
print("\nVectorized Text:")
print(vectorized_text)
Finally, it’s time to train the models! Everyone’s favorite part. There’s not much to do other than let your machine run and run and run to try the various models. Once the best model is found, its name and evaluation score (F1 in this case) are displayed.
# Model Selection and Training
from tqdm import tqdm
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import AdaBoostClassifier

# A. Split dataset into training and testing sets
X = vectorized_text
y = data[target_column]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the Candidate Machine Learning Algorithms
ml_models = {
    'Logistic Regression': LogisticRegression(random_state=42, n_jobs=-1),
    'Multinomial Naive Bayes': MultinomialNB(),
    'Random Forest': RandomForestClassifier(random_state=42, n_jobs=-1),
    'Gradient Boosting': GradientBoostingClassifier(random_state=42),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'AdaBoost': AdaBoostClassifier(random_state=42),
    'CatBoost': CatBoostClassifier(verbose=0, random_state=42, task_type='CPU', thread_count=-1)
}
Section VIII: Model Evaluation
# A. Function: calculate_evaluation_metrics
def calculate_evaluation_metrics(y_true, y_pred):
    """
    Calculate evaluation metrics for the model predictions.
    Args:
        y_true (array-like): True labels.
        y_pred (array-like): Predicted labels.
    Returns:
        metrics (dict): Dictionary of evaluation metrics.
    """
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
    metrics = {
        'accuracy': accuracy_score(y_true, y_pred),
        'precision': precision_score(y_true, y_pred, average='weighted'),
        'recall': recall_score(y_true, y_pred, average='weighted'),
        'f1_score': f1_score(y_true, y_pred, average='weighted')
    }
    return metrics

# B. Function: visualize_evaluation_results
def visualize_evaluation_results(model_performance):
    """
    Visualize the evaluation results of the models.
    Args:
        model_performance (dict): Dictionary of model performance results.
    Returns:
        None
    """
    metrics_df = pd.DataFrame(model_performance).T
    plt.figure(figsize=(10, 6))
    sns.barplot(x=metrics_df.index, y=metrics_df['f1_score'])
    plt.title("Model Performance Comparison")
    plt.xlabel("Models")
    plt.ylabel("F1 Score")
    plt.show()
The function definition prep work is done; now it’s time to fit the ML models and evaluate how well they predict on the test dataset.
from tqdm import tqdm
import time

# Fit the models and use them to predict on the test dataset
from retry import retry

@retry(tries=3, delay=2)  # Retries up to 3 times with a 2-second delay between each retry
def fit_and_predict(model, X_train, y_train, X_test):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return y_pred

# Evaluate ML models
model_performance = {}
completed_models = []
progress_bar = tqdm(ml_models.items(), desc="Evaluating models", total=len(ml_models))
for model_name, model in progress_bar:
    progress_bar.set_description(f"Evaluating {model_name}")
    start_time = time.time()
    try:
        y_pred = fit_and_predict(model, X_train, y_train, X_test)
    except Exception as e:
        print(f"Error occurred while fitting {model_name}: {str(e)}")
        continue
    fit_time = time.time() - start_time
    fit_time_minutes = fit_time / 60
    model_performance[model_name] = calculate_evaluation_metrics(y_test, y_pred)
    completed_models.append(model_name)
    print(f"\nTime taken to fit {model_name}: {fit_time_minutes:.2f} minutes")
# Visualize the evaluation results
visualize_evaluation_results(model_performance)
# Section IX: Model Selection and Optimization
# A. Select best performing model
best_model_name = max(model_performance, key=lambda x: model_performance[x]['f1_score'])
best_model = ml_models[best_model_name]
print("Best Model:", best_model_name)
print("Performance Metrics:")
for metric, value in model_performance[best_model_name].items():
    print(f"{metric}: {value:.4f}")
Here are the performance results for the best model (logistic regression)
Best Model: Logistic Regression
Performance Metrics:
Accuracy: 0.9023
Precision: 0.9027
Recall: 0.9023
F1_score: 0.9023
90% accuracy isn’t bad!
NLP models created with the default parameters can be good. But sometimes hyperparameter tuning can increase their performance enough to make it worth the time and effort required to tune. Tuning can take hours, depending on how many parameters you want to test, so this code block below will let us specify a max tuning time of 1 hour per model.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score, accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from catboost import CatBoostClassifier
from joblib import Parallel, delayed
import multiprocessing
import time
# Determine how many CPU cores are available to use
num_cores = multiprocessing.cpu_count()
print(num_cores)
Here you can set which ML models you want to use for hyperparameter tuning. If a model is obviously bad or does not handle large datasets well (e.g., Support Vector Machines or K-Means), then you can exclude it from the tuning process.
# Define the models and their hyperparameters
ml_models = {
    'Logistic Regression': LogisticRegression(random_state=42, n_jobs=-1),
    'Multinomial Naive Bayes': MultinomialNB(),
    'Random Forest': RandomForestClassifier(random_state=42, n_jobs=-1),
    'Gradient Boosting': GradientBoostingClassifier(random_state=42),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'AdaBoost': AdaBoostClassifier(random_state=42),
    'CatBoost': CatBoostClassifier(verbose=0, random_state=42, task_type='CPU', thread_count=-1)
}
This is why your hyperparameter tuning can take forever! The more parameters you want to test with Grid Search, the longer it takes to run; a quick way to count the combinations is shown right after the grids below.
# Hyperparameter grids for each model
hyperparameters = {
    'Logistic Regression': {
        'C': [0.001, 0.01, 0.1, 1, 10, 100],
        # Note: 'l1' and 'elasticnet' require a compatible solver (e.g., 'saga');
        # with the default 'lbfgs' solver those combinations will fail during the grid search.
        'penalty': ['l1', 'l2', 'elasticnet', 'none']
    },
    'Multinomial Naive Bayes': {
        'alpha': [0.001, 0.01, 0.1, 1, 10, 100]
    },
    'Random Forest': {
        'n_estimators': [10, 50],
        'max_depth': [None, 10],
        'min_samples_split': [2, 5],
        'min_samples_leaf': [1, 2]
    },
    'Gradient Boosting': {
        'n_estimators': [10, 50, 100, 200],
        'learning_rate': [0.001, 0.01, 0.1, 1],
        'max_depth': [3, 5, 10]
    },
    'Decision Tree': {
        'max_depth': [None, 10, 20, 30],
        'min_samples_split': [2, 5, 10],
        'min_samples_leaf': [1, 2, 4]
    },
    'AdaBoost': {
        'n_estimators': [10, 50, 100, 200],
        'learning_rate': [0.001, 0.01, 0.1, 1]
    },
    'CatBoost': {
        'iterations': [50],
        'learning_rate': [0.001, 0.01],
        'depth': [4, 6],
        'l2_leaf_reg': [1, 3]
    }
}
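To get a feel for how quickly the work grows, you can count how many model fits a grid implies before running anything. Here is a small sketch (using sklearn’s ParameterGrid, which is not part of the original code) that multiplies the number of parameter combinations by the number of cross-validation folds used later:
from sklearn.model_selection import ParameterGrid
cv_folds = 5  # matches the cv=5 used in GridSearchCV below
for name, grid in hyperparameters.items():
    n_combinations = len(ParameterGrid(grid))
    print(f"{name}: {n_combinations} combinations x {cv_folds} folds = {n_combinations * cv_folds} fits")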
Create the function that will do the actual fitting and tuning of each model and save the results to a list for later comparison.
# Define the fit_and_tune_model function
import time

def fit_and_tune_model(model_name, model, hyperparameters, X_train, y_train, X_test, y_test):
    start_time = time.time()
    if model_name in hyperparameters:
        grid_search = GridSearchCV(model, hyperparameters[model_name], scoring='f1_weighted', cv=5, n_jobs=-1)
        grid_search.fit(X_train, y_train)
        best_params = grid_search.best_params_
        optimized_model = grid_search.best_estimator_
    else:
        optimized_model = model
        optimized_model.fit(X_train, y_train)
    y_pred = optimized_model.predict(X_test)
    f1 = f1_score(y_test, y_pred, average='weighted')
    accuracy = accuracy_score(y_test, y_pred)
    end_time = time.time()
    tuning_time = end_time - start_time
    return optimized_model, f1, accuracy, tuning_time
To prevent the hyperparameter tuning from inadvertently getting out of hand, I’ve created a timeout function so any model that hasn’t finished tuning after 1 hour will be skipped and not considered as a possible “best model”. Feel free to adjust the timeout limit to suit your needs. If there are one or two models and you want to go crazy with parameters and let the process run for days, go right ahead. But if you’re running this on a machine with limited RAM or Google Colab, you probably want to have these guardrails in place. There’s nothing more frustrating than having Google Colab hang after 7 hours of tuning 🙁
import signal

class TimeoutException(Exception):
    pass

def handler(signum, frame):
    raise TimeoutException()

# Set the signal handler
signal.signal(signal.SIGALRM, handler)

# Hyperparameter tuning for each model
optimized_models = {}
total_models = len(ml_models)
timeout_max_minutes = 60
timeout = timeout_max_minutes * 60  # max minutes in seconds until timeout
for model_index, (model_name, model) in enumerate(tqdm(ml_models.items(), desc="Tuning Models")):
    print(f"\nProcessing Model {model_index + 1}/{total_models}: {model_name}")
    # Set a timeout for the model tuning
    try:
        signal.alarm(timeout)
        optimized_model, f1, accuracy, tuning_time = fit_and_tune_model(model_name, model, hyperparameters, X_train, y_train, X_test, y_test)
        print(f"\nBest Hyperparameters for {model_name}:")
        print(optimized_model.get_params())
        print(f"Tuning time for {model_name}: {tuning_time/60:.2f} minutes")
    except TimeoutException:
        print(f"{model_name} tuning timed out after {timeout/60} minutes.")
        continue
    finally:
        signal.alarm(0)
    # Store the results
    optimized_models[model_name] = optimized_model
    # Print the f1-score and accuracy
    print(f"F1-Score for {model_name}: {f1:.4f}")
    print(f"Accuracy for {model_name}: {accuracy:.4f}\n")
Almost done with the ML section! Our tuning is done, now we need to visualize the results and save the best model.
# Evaluate optimized models
optimized_model_performance = {}
for model_name, model in optimized_models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    optimized_model_performance[model_name] = calculate_evaluation_metrics(y_test, y_pred)

# Visualize the evaluation results for optimized models
visualize_evaluation_results(optimized_model_performance)

# Select the best performing optimized model
best_optimized_model_name = max(optimized_model_performance, key=lambda x: optimized_model_performance[x]['f1_score'])
best_optimized_model = optimized_models[best_optimized_model_name]
print("Best Optimized Model:", best_optimized_model_name)
print("Performance Metrics:")
for metric, value in optimized_model_performance[best_optimized_model_name].items():
    print(f"{metric}: {value:.4f}")
Still with me? Good, here’s the output after all of that tuning. Some ML models can be tuned in a few minutes, even with a large dataset. Others require hours and hours of computing time and will probably time out on Google Colab before they finish running anyway.
And the winner is…still logistic regression!
Best Optimized Model: Logistic Regression
Performance Metrics:
Accuracy: 0.9078
Precision: 0.9080
Recall: 0.9078
F1_score: 0.9078
Feel free to reduce or expand the number of parameters per model. You can also comment out the parameters for whichever models you don’t want to tune to save time. Do you also feel like time slows to a crawl when waiting for your hyperparameter tuning to finish?
Now we need to save the best model and the vectorizer that was used for text preprocessing so we can use them to predict on new, unseen data.
# Save the best model and vectorizer for later use
import joblib

joblib.dump(best_optimized_model, 'best_model.pkl')
joblib.dump(vectorizer, 'vectorizer.pkl')
First, we preprocess the unseen reviews.
# Preprocess the input text like we did for the training data above
def preprocess_input_text(text, vectorizer):
    """
    Preprocess the input text for prediction.
    Args:
        text (str): The input text.
        vectorizer: The vectorizer to convert text to a feature vector.
    Returns:
        preprocessed_text (array-like): The preprocessed text as a feature vector.
    """
    tokens = tokenize_text(text)
    filtered_tokens = remove_stopwords_punctuation(tokens)
    processed_tokens = perform_stemming_lemmatization(filtered_tokens)
    cleaned_text = ' '.join(processed_tokens)
    preprocessed_text = vectorizer.transform([cleaned_text])
    # print("Preprocessed Tokens:", processed_tokens)
    # print("Cleaned Text:", cleaned_text)
    return preprocessed_text
Then we define the functions we need to predict the sentiment using our best model from above (i.e., Logistic Regression)
# Predict the sentiment of unseen data
def predict_sentiment(text, model, vectorizer):
    """
    Predict the sentiment of the input text using the trained model.
    Args:
        text (str): The input text.
        model: The trained model for prediction.
        vectorizer: The vectorizer to convert text to a feature vector.
    Returns:
        sentiment (str): The predicted sentiment.
    """
    preprocessed_text = preprocess_input_text(text, vectorizer)
    probabilities = model.predict_proba(preprocessed_text)
    print(f"Probabilities: {probabilities}")
    sentiment_label = model.predict(preprocessed_text)[0]
    # Assuming binary sentiment classification (positive or negative)
    sentiment = 'positive' if sentiment_label == "pos" else 'negative'
    return sentiment
# Load the saved model and vectorizer
best_model_loaded = joblib.load('best_model.pkl')
vectorizer_loaded = joblib.load('vectorizer.pkl')
# Predict sentiment using the best model
input_text = "rumor , a muddled drama about coming to terms with death , feels impersonal , almost generic . "
sentiment = predict_sentiment(input_text, best_model_loaded, vectorizer_loaded)
print(f"Sentiment: sentiment")
Finally, we can make a prediction! Whew, that took forever. This is why data scientists are well paid 😉
import pandas as pd

def predict_sentiment_csv(input_csv, model, vectorizer):
    """
    Predict the sentiment of each review in the input CSV using the trained model.
    Args:
        input_csv (str): Path to the input CSV file containing reviews.
        model: The trained model for prediction.
        vectorizer: The vectorizer to convert text to a feature vector.
    Returns:
        results_df (pd.DataFrame): A DataFrame containing the original reviews and predicted sentiment.
    """
    # Read the input CSV file
    input_data = pd.read_csv(input_csv)
    # Initialize an empty list to store the predicted sentiments
    predicted_sentiments = []
    # Iterate through the reviews in the input DataFrame
    for index, row in input_data.iterrows():
        text = row['Reviews']
        preprocessed_text = preprocess_input_text(text, vectorizer)
        sentiment_label = model.predict(preprocessed_text)[0]
        # Assuming binary sentiment classification (positive or negative)
        sentiment = 'positive' if sentiment_label == "pos" else 'negative'
        predicted_sentiments.append(sentiment)
    # Add the predicted sentiments as a new column to the input DataFrame
    input_data['Predicted_Sentiment'] = predicted_sentiments
    return input_data

# Load the saved model and vectorizer
best_model_loaded = joblib.load('best_model.pkl')
vectorizer_loaded = joblib.load('vectorizer.pkl')

# Predict sentiment for the reviews in the input CSV file
input_csv_path = '/content/IMDB_Dataset_VALIDATE.csv'  # Replace this with the path to your input CSV file
results_df = predict_sentiment_csv(input_csv_path, best_model_loaded, vectorizer_loaded)

# Print the results
print(results_df)

# Save the results to a new CSV file
results_df.to_csv('output_reviews_validation.csv', index=False)
How did our optimal ML model (logistic regression) do on the unseen data? Create a confusion matrix below to find out.
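The original confusion matrix was shown as an image; here is a minimal sketch of how you could reproduce it from results_df. It assumes the validation CSV uses the same 'pos'/'neg' labels as the training data, so we map them to 'positive'/'negative' before comparing with the predicted column:
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, accuracy_score, ConfusionMatrixDisplay

# Map the ground-truth labels to match the predicted label format (assumes 'pos'/'neg' in the CSV)
true_labels = results_df['Sentiments'].map({'pos': 'positive', 'neg': 'negative'})
predicted_labels = results_df['Predicted_Sentiment']

cm = confusion_matrix(true_labels, predicted_labels, labels=['positive', 'negative'])
ConfusionMatrixDisplay(cm, display_labels=['positive', 'negative']).plot()
plt.title("Best ML Model - Validation Confusion Matrix")
plt.show()
print("Validation accuracy:", accuracy_score(true_labels, predicted_labels))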
The above code for sentiment analysis with machine learning is very lengthy. Thankfully, the code for using ChatGPT is much shorter!
pip install pandas openai requests tqdm
Add your own OpenAI API key to the code below. Note that this code uses the original (pre-1.0) openai Python package interface (openai.ChatCompletion).
import os
import pandas as pd
import openai
import requests
from tqdm import tqdm
import time
import docx

# Set up the OpenAI API
openai.api_key = "<REPLACE this with your own OpenAI API, make sure to leave the quotation marks>"
GPT_API_URL = "https://api.openai.com/v1/chat/completions"
Load the unseen validation data
input_file = "/content/IMDB_Dataset_VALIDATE.csv"
data_chatGPT = pd.read_csv(input_file)
data_chatGPT
The dataset uses “pos” and “neg” as labels so we replace them with POSITIVE and NEGATIVE to match the ChatGPT prompt instructions.
# Modify the dataset to use POSITIVE and NEGATIVE as the sentiment rating rather than pos or neg or 1 or 0
# import pandas as pd
data_chatGPT['Sentiments'] = data_chatGPT['Sentiments'].replace({'pos': "POSITIVE", 'neg': "NEGATIVE"})
# Print the first rows to verify the changes
print(data_chatGPT.head(50))
Set up the ChatGPT API call and define the prompt to get back only a POSITIVE or NEGATIVE sentiment classification.
def analyze_review(review):
    retries = 3
    sentiment = None
    while retries > 0:
        messages = [
            {"role": "system", "content": "You are an AI language model trained to analyze and detect the sentiment of product reviews."},
            {"role": "user", "content": f"Analyze the following product review and determine if the sentiment is: positive or negative. Return only a single word, either POSITIVE or NEGATIVE, do not rate any review as NEUTRAL: {review}"}
        ]
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            max_tokens=3,
            n=1,
            stop=None,
            temperature=0
        )
        response_text = completion.choices[0].message.content
        print(response_text)
        if response_text in ["POSITIVE", "NEGATIVE"]:
            sentiment = response_text
            break
        else:
            retries -= 1
            time.sleep(1)
    else:
        # The while loop used up all retries without a valid POSITIVE/NEGATIVE answer
        sentiment = "cannot determine sentiment"
    # Add a delay of 4 seconds between requests to avoid hitting the OpenAI free tier API call rate limit.
    # You can try a shorter delay, but you may risk overloading the OpenAI API and hitting an error.
    time.sleep(4)
    return sentiment
This code block will send the request to analyze each review to the ChatGPT API. In case of any errors from the API, the code will wait a few seconds, pick up where it left off, and continue sending reviews to the API until all of them are completed. The API takes about 5 seconds to process each review in the validation dataset.
import sys

predicted_sentiment = []

def save_partial_results(predicted_sentiment):
    data_chatGPT['Predicted Sentiment'] = pd.Series(predicted_sentiment)
    output_file_partial = "reviews_analyzed_partial_sentiment.csv"
    data_chatGPT.to_csv(output_file_partial, index=False)
    print("Partial results saved to:", output_file_partial)

def process_remaining_reviews(start_index):
    for review in tqdm(data_chatGPT["Reviews"][start_index:], desc="Processing remaining reviews"):
        success = False
        attempts = 0
        while not success and attempts < 5:
            try:
                pred_sentiment = analyze_review(review)
                predicted_sentiment.append(pred_sentiment)
                success = True
            except Exception as e:
                attempts += 1
                print(f"Error occurred while processing a review (attempt {attempts}):", e)
                if attempts == 5:
                    print("Reached maximum number of attempts. Saving partial results.")
                    save_partial_results(predicted_sentiment)
                    print("Continuing with the remaining reviews.")
                    process_remaining_reviews(start_index + len(predicted_sentiment))
                    break

# Start processing reviews from the beginning
process_remaining_reviews(0)

# Save the final results
data_chatGPT['Predicted Sentiment'] = predicted_sentiment
output_file_final = "reviews_analyzed_full_sentiment.csv"
data_chatGPT.to_csv(output_file_final, index=False)
print("All results saved to:", output_file_final)
Check to see how many POSITIVE and NEGATIVE results there are.
data_chatGPT['Predicted Sentiment'].value_counts()
Once in a while, ChatGPT will return “I’m sorry” if the review it is analyzing is just gibberish or a single word. This code will convert any of those errors to NEGATIVE.
# Convert any rating other than POSITIVE to NEGATIVE
data_chatGPT['Predicted Sentiment'] = data_chatGPT['Predicted Sentiment'].replace({'NEUTRAL': "NEGATIVE", "cannot determine sentiment": "NEGATIVE"})
# Print the first 5 rows to verify the changes
# print(data_chatGPT.head(5))
# Confirm the change
data_chatGPT['Predicted Sentiment'].value_counts()
Make sure we have the same labels used in the validation and predicted datasets
# import pandas as pd
data_chatGPT['Sentiments'] = data_chatGPT['Sentiments'].replace({'positive': 'POSITIVE', 'negative': 'NEGATIVE'})
# Print the first 5 rows to verify the changes
print(data_chatGPT.head(5))
And now, it’s time for the confusion matrix to check how ChatGPT did on the validation data.
First, here’s the confusion matrix for the best ML model on the validation dataset. 81% accuracy.
And here’s the confusion matrix for ChatGPT on the validation dataset. 94% accuracy, significantly better, and all we had to do was call an API!
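Both confusion matrices were shown as images in the original post. If you want to reproduce the ChatGPT one yourself, here is a minimal sketch using the columns created above (it assumes both columns now contain only POSITIVE/NEGATIVE values):
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, accuracy_score, ConfusionMatrixDisplay

true_labels = data_chatGPT['Sentiments']
predicted_labels = data_chatGPT['Predicted Sentiment']

cm = confusion_matrix(true_labels, predicted_labels, labels=['POSITIVE', 'NEGATIVE'])
ConfusionMatrixDisplay(cm, display_labels=['POSITIVE', 'NEGATIVE']).plot()
plt.title("ChatGPT - Validation Confusion Matrix")
plt.show()
print("Validation accuracy:", accuracy_score(true_labels, predicted_labels))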
ChatGPT is a Good Sentiment Analyzer!
With a validation accuracy of 94%, versus 81% for the best of the 7 optimized machine learning models tested, ChatGPT is the clear winner. With additional fine-tuning, ChatGPT’s performance could be even higher.
Congratulations! You made it to the end of this article. By now, you should have a good understanding of how to use the ChatGPT API for sentiment analysis and why it’s so powerful. You also have a new project for your data science portfolio 😉 Good luck!
If you liked this article, make sure to follow me on Medium for more ideas on how to apply data science to solve real business challenges.
Here are some other articles you may like:
- Use Predictive Analytics to Make Better Strategic Business Decisions: Best Practices and Use Cases
- Best Practices For Creating a Successful Data Strategy for Your Business
- Practical Examples of AI and Machine Learning in Business
- 5 Reasons Why Business Data Science Projects Fail
I’m happy to answer any questions you have in the comments section.
Disclosure Per Medium’s Policy: AI-assistive technology was used to help write some of the code in this article.