Here's a simple TensorFlow (Keras) example for sentiment analysis using a CSV file. This example assumes the CSV file has two columns:

text: The review or sentence.

label: 0 for negative, 1 for positive sentiment.





Step 1: Save the following data as sentiment.csv:


text,label
"I love this movie!",1
"This is terrible.",0
"Amazing experience",1
"Awful and boring",0
"Absolutely fantastic!",1
"I hated every minute of it.",0
"A masterpiece of cinema.",1
"Not worth the time.",0
"Brilliant and inspiring.",1
"Poorly written and acted.",0
"Heartwarming and beautiful.",1
"Completely disappointing.",0
"A joy to watch!",1
"I'll never watch this again.",0
"Exceeded my expectations.",1
"The plot made no sense.",0
"Touching and emotional.",1
"Full of clichés and bad jokes.",0
"Simply outstanding!",1
"Terrible from start to finish.",0
"Well-acted and directed.",1
"The acting was painful.",0
"One of the best I've seen.",1
"Boring and predictable.",0
"Highly recommended!",1
"A total waste of time.",0
"Funny and entertaining.",1
"I couldn’t finish it.",0
"Loved the characters.",1
"Nothing good about it.",0


Step 2: Save the following code as sentiment.py (in the same directory as sentiment.csv):


import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import tokenizer_from_json

# 1. Load CSV Data
df = pd.read_csv("sentiment.csv")  # Make sure the file is in the same directory
texts = df['text'].astype(str).tolist()
labels = df['label'].tolist()

# 2. Split into Train/Test
X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.2, random_state=42)

# 3. Tokenize Text
vocab_size = 1000
tokenizer = Tokenizer(num_words=vocab_size, oov_token="<OOV>")
tokenizer.fit_on_texts(X_train)

# Convert text to sequences
X_train_seq = tokenizer.texts_to_sequences(X_train)
X_test_seq = tokenizer.texts_to_sequences(X_test)

# 4. Pad Sequences
max_length = 100
X_train_pad = pad_sequences(X_train_seq, maxlen=max_length, padding='post')
X_test_pad = pad_sequences(X_test_seq, maxlen=max_length, padding='post')

# Convert labels to NumPy arrays to avoid a ValueError when passing them to Keras
y_train = np.array(y_train)
y_test = np.array(y_test)

# 5. Build the Model
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# 6. Train the Model
model.fit(X_train_pad, y_train, epochs=10, validation_data=(X_test_pad, y_test))

# Save the model in the native Keras format (.keras extension)
model.save("sentiment_model.keras")

# Save tokenizer
token_json = tokenizer.to_json()
with open('tokenizer.json', 'w') as f:
    f.write(token_json)

print("Model and tokenizer saved.")

# 7. Evaluate
loss, accuracy = model.evaluate(X_test_pad, y_test)
print(f"Test Accuracy: {accuracy:.2f}")

# 8. Load the saved model (same filename as above)
loaded_model = tf.keras.models.load_model("sentiment_model.keras")

# 9. Load Tokenizer
with open('tokenizer.json') as f:
    token_data = f.read()
loaded_tokenizer = tokenizer_from_json(token_data)

# 10. Predict on new samples
texts_to_predict = [
    "I really enjoyed this movie!",
    "It was a waste of time."
]

# Tokenize and pad
sequences = loaded_tokenizer.texts_to_sequences(texts_to_predict)
padded = pad_sequences(sequences, maxlen=max_length, padding='post')

# Predict
predictions = loaded_model.predict(padded)

for text, pred in zip(texts_to_predict, predictions):
    sentiment = "Positive" if pred > 0.5 else "Negative"
    print(f"Text: {text}\nSentiment: {sentiment} (Confidence: {pred[0]:.2f})\n")




Step 3: Run it:

python sentiment.py 


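Output (prediction lines only; the Keras training logs, the "Model and tokenizer saved." message, and the test-accuracy line are omitted):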


Text: I really enjoyed this movie!
Sentiment: Positive (Confidence: 0.56)

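Text: It was a waste of time.
Sentiment: Positive (Confidence: 0.55)

Note: both scores sit close to 0.5, and the second review is misclassified as Positive, so the model has barely learned anything. The dataset is tiny (24 training rows after the 80/20 split), training runs for only 10 epochs, and the words "really" and "enjoyed" in the first sample never appear in sentiment.csv, so the tokenizer maps them to <OOV>. Because the weights start out random and no seed is set, your exact numbers will differ from run to run.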






Detailed Explanation:


Here's a line-by-line explanation of the code:

import pandas as pd


Imports the pandas library as pd, which is used for data manipulation and analysis, especially for working with tabular data like CSV files.

import numpy as np


Imports NumPy as np, a library for numerical operations, arrays, and mathematical functions.

import tensorflow as tf


Imports TensorFlow as tf, a deep learning framework used to build and train machine learning models.

from tensorflow.keras.preprocessing.text import Tokenizer


Imports Tokenizer, a utility from Keras (part of TensorFlow) to convert text into sequences of integers.

from tensorflow.keras.preprocessing.sequence import pad_sequences


Imports pad_sequences, a function that makes all sequences the same length by padding (or truncating) them. This is needed because a batch fed to the network must be a rectangular array.

from sklearn.model_selection import train_test_split


Imports train_test_split from scikit-learn to split the dataset into training and testing sets.

from tensorflow.keras.preprocessing.text import tokenizer_from_json


Imports tokenizer_from_json, which rebuilds a Tokenizer from the JSON string produced by tokenizer.to_json() (the script reads that string from a file first).

Data Loading and Preparation:
df = pd.read_csv("sentiment.csv")  # Make sure the file is in the same directory


Reads a CSV file named "sentiment.csv" into a pandas DataFrame df.

texts = df['text'].astype(str).tolist()


Extracts the "text" column, converts every entry to a string (so non-string values don't break the tokenizer; note that a missing value becomes the literal string "nan"), and turns the column into a Python list.

labels = df['label'].tolist()


Extracts the "label" column, which holds the sentiment labels (0 or 1), and converts it into a list.

Train-Test Split:
X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.2, random_state=42)


Splits the data into training and testing sets:

80% training data (24 of the 30 rows in sentiment.csv)

20% test data (6 rows)

random_state=42 makes the split reproducible: the same rows end up in the test set on every run
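With only six test rows, a purely random split can leave the two classes unevenly represented in the test set. scikit-learn's train_test_split accepts stratify=labels to preserve the class ratio in both parts. The sketch below is a simplified pure-Python illustration of that idea, not scikit-learn's implementation; the helper name stratified_split and the placeholder rows are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(texts, labels, test_size=0.2, seed=42):
    """Hypothetical helper: hold out test_size of EACH label for the test set."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    X_train, X_test, y_train, y_test = [], [], [], []
    for label, items in by_label.items():
        items = list(items)
        rng.shuffle(items)                        # shuffle within the class
        n_test = round(len(items) * test_size)    # per-class test count
        X_test += items[:n_test]
        y_test += [label] * n_test
        X_train += items[n_test:]
        y_train += [label] * (len(items) - n_test)
    return X_train, X_test, y_train, y_test


# 30 placeholder rows with alternating 1/0 labels, like sentiment.csv (15 of each).
texts = [f"review {i}" for i in range(30)]
labels = [1 if i % 2 == 0 else 0 for i in range(30)]

X_train, X_test, y_train, y_test = stratified_split(texts, labels)
print(len(X_train), len(X_test), sorted(y_test))  # 24 6 [0, 0, 0, 1, 1, 1]
```

In sentiment.py itself, the equivalent change would be adding stratify=labels to the existing train_test_split call.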

Tokenization:
vocab_size = 1000


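Caps the vocabulary at 1000 word indices: when the tokenizer converts text, only words whose index (assigned by frequency rank in the training texts) is below this limit keep their own index. This dataset has far fewer than 1000 distinct words, so the cap has no effect here.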

tokenizer = Tokenizer(num_words=vocab_size, oov_token="<OOV>")


Creates a tokenizer that applies that cap and reserves the <OOV> token (index 1) for out-of-vocabulary words: words that were not seen during fitting, or that fall outside the cap, are replaced by <OOV> instead of being dropped.

tokenizer.fit_on_texts(X_train)


Builds the word index (word to integer, ranked by frequency) from the training texts only, so no vocabulary from the test set leaks into training.

X_train_seq = tokenizer.texts_to_sequences(X_train)
X_test_seq = tokenizer.texts_to_sequences(X_test)


Converts the training and testing text data into sequences of integers corresponding to each word.
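To make the tokenization steps concrete, here is a simplified pure-Python sketch of what fit_on_texts and texts_to_sequences do conceptually: rank words by frequency, keep index 0 free for padding, give the OOV token index 1, and map unseen words (or words at or above num_words) to OOV. It is an illustration under those assumptions, not Keras's actual code (which, for example, strips punctuation via its filters argument); the helper names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into words (a rough stand-in for Keras's filters).
    return re.findall(r"[\w']+", text.lower())


def fit_word_index(texts, oov_token="<OOV>"):
    """Hypothetical helper: frequency-ranked word index, OOV = 1, 0 left for padding."""
    counts = Counter(word for text in texts for word in tokenize(text))
    word_index = {oov_token: 1}
    for i, (word, _) in enumerate(counts.most_common(), start=2):
        word_index[word] = i
    return word_index


def to_sequences(texts, word_index, num_words, oov_token="<OOV>"):
    """Hypothetical helper: unseen words, or words with index >= num_words, become OOV."""
    oov = word_index[oov_token]
    sequences = []
    for text in texts:
        seq = []
        for word in tokenize(text):
            idx = word_index.get(word, oov)
            seq.append(idx if idx < num_words else oov)
        sequences.append(seq)
    return sequences


index = fit_word_index(["I love this movie!", "This is terrible.", "I hated this."])
seqs = to_sequences(["I really love this movie"], index, num_words=1000)
print(seqs)  # [[3, 1, 4, 2, 5]]  ("really" was never seen, so it maps to OOV = 1)
```

Here "this" (3 occurrences) gets index 2 and "i" (2 occurrences) gets index 3. With a small cap such as num_words=5, even a seen word like "hated" (index 8) would be mapped to 1.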

Padding:
max_length = 100


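Sets a maximum sequence length of 100 tokens. The longest review in this dataset has only six words, so almost every position in each padded sequence will be padding.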

X_train_pad = pad_sequences(X_train_seq, maxlen=max_length, padding='post')
X_test_pad = pad_sequences(X_test_seq, maxlen=max_length, padding='post')


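Pads sequences shorter than 100 with zeros at the end (padding='post'). Sequences longer than 100 are cut down to 100; because the truncating argument is left at its default of 'pre', the extra tokens are removed from the beginning of the sequence, not the end.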
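The combination padding='post' with the default truncating='pre' can be sketched in a few lines of plain Python. The helper name is hypothetical and this only mimics the behaviour for lists of integers; it is not Keras's pad_sequences.

```python
def pad_post_truncate_pre(sequences, maxlen, value=0):
    """Hypothetical helper mimicking pad_sequences(..., padding='post')
    with the default truncating='pre'."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]                   # 'pre': drop tokens from the START
        padded.append(seq + [value] * (maxlen - len(seq)))  # 'post': zeros at the END
    return padded


example = pad_post_truncate_pre([[5, 3], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(example)  # [[5, 3, 0, 0], [3, 4, 5, 6]]
```

To keep the first maxlen tokens of long texts instead, pad_sequences also accepts truncating='post'.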

Label Preparation:
y_train = np.array(y_train)
y_test = np.array(y_test)


Converts label lists into NumPy arrays for TensorFlow compatibility.

Model Building:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])


Defines a simple neural network model:

Embedding layer: maps each word index (0 to 999) to a learned 16-dimensional vector.

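GlobalAveragePooling1D: averages the embeddings across all sequence positions (including the zero-padding positions, since no mask is used), producing one 16-dimensional vector per review.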

Dense(16): fully connected layer with 16 neurons and ReLU activation.

Dense(1): output layer with sigmoid activation (for binary classification).
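The forward pass of this architecture can be sketched in NumPy, with random weights standing in for trained ones (an assumption for illustration only). It also shows a side effect of the settings above: by default the Embedding layer does not mask index 0, so the pooling averages over all 100 positions, and with reviews of at most six words the padding vectors dominate that average. Passing mask_zero=True to the Embedding layer is a common way to make GlobalAveragePooling1D skip padded positions; the mask_padding branch below imitates that. This is a sketch, not Keras's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, max_length = 1000, 16, 100

# Random stand-ins for trained weights (illustration only).
E = rng.normal(size=(vocab_size, embed_dim))              # Embedding table: one row per word index
W1, b1 = rng.normal(size=(embed_dim, 16)), np.zeros(16)   # Dense(16, relu)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)            # Dense(1, sigmoid)


def forward(seq, mask_padding=False):
    vectors = E[seq]                                # (max_length, 16) embedding lookup
    if mask_padding:
        pooled = vectors[seq != 0].mean(axis=0)     # average over real words only
    else:
        pooled = vectors.mean(axis=0)               # average over all 100 positions
    hidden = np.maximum(0.0, pooled @ W1 + b1)      # ReLU
    logit = hidden @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logit))             # sigmoid -> estimated P(positive)


# A 4-word review, post-padded with zeros to length 100.
seq = np.zeros(max_length, dtype=int)
seq[:4] = [12, 7, 3, 45]

share_padding = np.mean(seq == 0)
print(f"padding share: {share_padding:.2f}")        # padding share: 0.96
print(forward(seq), forward(seq, mask_padding=True))
```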

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])


Compiles the model with:

Binary crossentropy loss (since it's a binary classification)

Adam optimizer

Accuracy metric to track
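The loss being minimized is binary cross-entropy: for a label y and a predicted probability p it is -[y*log(p) + (1-y)*log(1-p)], averaged over the batch. A plain-Python sketch follows; clipping p with a small eps is an assumption here, similar in spirit to how Keras avoids log(0). A model that outputs about 0.5 for everything, as in the output shown earlier, sits near a loss of ln 2.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)]."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(round(binary_crossentropy([1, 0], [0.5, 0.5]), 4))  # 0.6931 (= ln 2)
print(round(binary_crossentropy([1, 0], [0.9, 0.1]), 4))  # 0.1054
```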

Model Training:
model.fit(X_train_pad, y_train, epochs=10, validation_data=(X_test_pad, y_test))


Trains the model for 10 epochs on the training data, validating on the test data each epoch.
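How much training does epochs=10 amount to here? When no batch_size is given, Keras's model.fit uses 32, so with 24 training rows each epoch is a single batch, that is, one weight update. A back-of-the-envelope check:

```python
import math

n_rows = 30
n_test = math.ceil(n_rows * 0.2)     # test_size=0.2 -> 6 test rows
n_train = n_rows - n_test            # 24 training rows
batch_size = 32                      # Keras's default batch_size for fit()
epochs = 10

steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs
print(n_train, steps_per_epoch, total_updates)  # 24 1 10
```

Ten optimizer updates is very little, which is consistent with the near-0.5 scores in the output; raising epochs (or passing a smaller batch_size) increases the number of updates.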

Saving Model and Tokenizer:
model.save("sentiment_model.keras")


Saves the trained model (architecture, weights, and optimizer state) to a single file in the native Keras .keras format.

token_json = tokenizer.to_json()
with open('tokenizer.json', 'w') as f:
    f.write(token_json)


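Saves the tokenizer's configuration, including its word index, as JSON. The model only understands integer indices, so the exact same word-to-index mapping must be available when predicting later.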

print("Model and tokenizer saved.")


Confirms saving is complete.

Evaluation:
loss, accuracy = model.evaluate(X_test_pad, y_test)
print(f"Test Accuracy: {accuracy:.2f}")


Evaluates the model on test data and prints test accuracy.
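With 6 test rows, the printed test accuracy can only take seven values, so a single misclassified review moves it by about 0.17. Listing them makes that granularity explicit:

```python
from fractions import Fraction

n_test = 6  # 20% of the 30 rows in sentiment.csv
possible = [Fraction(k, n_test) for k in range(n_test + 1)]
print([f"{float(a):.2f}" for a in possible])
# ['0.00', '0.17', '0.33', '0.50', '0.67', '0.83', '1.00']
```

On a dataset this small, the test accuracy works as a smoke test rather than a meaningful performance estimate.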

Loading Model and Tokenizer:
loaded_model = tf.keras.models.load_model("sentiment_model.keras")


Loads the saved model from disk.

with open('tokenizer.json') as f:
    token_data = f.read()
loaded_tokenizer = tokenizer_from_json(token_data)


Loads the tokenizer from the saved JSON file.

Prediction on New Samples:
texts_to_predict = [
    "I really enjoyed this movie!",
    "It was a waste of time."
]


Defines new sample texts to classify.

sequences = loaded_tokenizer.texts_to_sequences(texts_to_predict)
padded = pad_sequences(sequences, maxlen=max_length, padding='post')


Tokenizes and pads these new texts just like the training data.
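Because the tokenizer was fitted on the training rows only, any word it never saw becomes <OOV> at prediction time. A rough pure-Python check follows; it is an approximation that uses the whole CSV rather than the 24-row training split, with a simplified word splitter, and shows which words in the two new samples do not occur anywhere in sentiment.csv.

```python
import re

# The 30 reviews from sentiment.csv, copied so the sketch is self-contained.
csv_texts = [
    "I love this movie!", "This is terrible.", "Amazing experience",
    "Awful and boring", "Absolutely fantastic!", "I hated every minute of it.",
    "A masterpiece of cinema.", "Not worth the time.", "Brilliant and inspiring.",
    "Poorly written and acted.", "Heartwarming and beautiful.", "Completely disappointing.",
    "A joy to watch!", "I'll never watch this again.", "Exceeded my expectations.",
    "The plot made no sense.", "Touching and emotional.", "Full of clichés and bad jokes.",
    "Simply outstanding!", "Terrible from start to finish.", "Well-acted and directed.",
    "The acting was painful.", "One of the best I've seen.", "Boring and predictable.",
    "Highly recommended!", "A total waste of time.", "Funny and entertaining.",
    "I couldn’t finish it.", "Loved the characters.", "Nothing good about it.",
]


def words(text):
    # Rough approximation of the Tokenizer's lowercasing and punctuation filtering.
    return re.findall(r"[\w'’]+", text.lower())


vocab = {w for text in csv_texts for w in words(text)}

unseen_by_text = {}
for text in ["I really enjoyed this movie!", "It was a waste of time."]:
    unseen_by_text[text] = [w for w in words(text) if w not in vocab]
    print(text, "->", unseen_by_text[text])
# I really enjoyed this movie! -> ['really', 'enjoyed']
# It was a waste of time. -> []
```

Every word of the second sample appears somewhere in the CSV, although whether each one landed in the training split depends on the random split.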

predictions = loaded_model.predict(padded)


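Runs the model on the padded sequences. The result has one row per input text, each holding a single sigmoid output: the model's estimated probability that the text is positive.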

for text, pred in zip(texts_to_predict, predictions):
    sentiment = "Positive" if pred > 0.5 else "Negative"
    print(f"Text: {text}\nSentiment: {sentiment} (Confidence: {pred[0]:.2f})\n")


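Prints each input text with its predicted label ("Positive" if the score is above 0.5, otherwise "Negative") and the score itself. The score is always the probability of the positive class, so for a "Negative" prediction the number printed as "Confidence" is really the probability of the opposite label; the confidence in a Negative prediction is 1 minus that score.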
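A small helper can report the confidence of whichever label is predicted, rather than always printing the positive-class probability. The name label_and_confidence is hypothetical, and the prediction values below are made-up stand-ins for the (n, 1) array that model.predict returns.

```python
import numpy as np


def label_and_confidence(score, threshold=0.5):
    """Turn a sigmoid output P(positive) into a label and the confidence in THAT label."""
    score = float(score)
    if score > threshold:
        return "Positive", score
    return "Negative", 1.0 - score


# Made-up values shaped like model.predict output: (n, 1).
predictions = np.array([[0.56], [0.20]])
for pred in predictions:
    label, conf = label_and_confidence(pred[0])
    print(f"{label} (Confidence: {conf:.2f})")
# Positive (Confidence: 0.56)
# Negative (Confidence: 0.80)
```

In sentiment.py, the prediction loop could call this helper instead of printing pred[0] directly.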