Identify Yoga Poses with Computer Vision using Deep Learning and CNNs — A Data Science Approach
There is no doubt that yoga has become increasingly popular over the years, with millions of people practicing it worldwide. However, with the vast number of yoga poses available, it can be challenging to identify and perform them correctly. That’s where data science, AI, Python, machine learning, and deep learning come in.
Better still, the development of computer vision technology has enabled us to identify and classify images automatically and with high accuracy. By leveraging deep learning algorithms and tools like TensorFlow, we can train a model to recognize and classify yoga poses from images and videos.
In this blog post, we will explore how to use computer vision to identify yoga poses using TensorFlow. We will discuss the data collection and preparation process, how to train the model, and evaluate its performance. We will also share the results of our experiments and discuss potential future directions for this research.
Good news! You don’t need to be an expert in computer vision or machine learning to follow along with this tutorial. I will provide step-by-step instructions, and all the code examples will be in Python, making it easy for anyone to get started with this exciting technology. So, whether you’re a yoga enthusiast or a data scientist, read on to discover how computer vision can help identify yoga poses accurately.
Computer vision is a part of artificial intelligence that helps machines understand and process what they see. This is done by creating algorithms and models to analyze images and videos, which lets machines identify objects, find patterns, and carry out tasks. A popular tool for making these models is TensorFlow, a free machine-learning library by Google. TensorFlow is commonly used to recognize and classify objects in pictures and videos.
When it comes to yoga poses, there’s a wide variety of options — hundreds, in fact — each with its own unique movements and characteristics. Figuring out and sorting these poses can be quite a challenge, even for seasoned yoga enthusiasts. Fortunately, computer vision and machine learning can make this process much simpler, quicker, and more precise.
Best of all, there are several ready-to-use datasets for recognizing and classifying yoga poses, such as the Yoga-82 dataset. This dataset includes 82 different yoga poses, complete with annotated images for easy reference.
If you can't find a dataset that suits your requirements, you can always gather images and videos of the yoga poses you'd like to identify and classify, and build a custom dataset of your own.
Now that we've covered the basics of computer vision and machine learning, let's explore how to harness their power to identify yoga poses accurately.
For the sake of this article, we will use the Yoga Posture Dataset from Kaggle, which contains six different yoga poses; you can find it here.
Before training the model, we need to prepare the dataset by organizing it into training and validation sets. We also need to preprocess the data and perform data augmentation to increase the diversity of the dataset.
Dataset and Directory Structure
Our dataset consists of six classes of yoga poses: chair, cobra, downdog, goddess, tree, and warrior.
We have already downloaded the dataset and saved it in a directory named Yoga on our local machine. The Yoga directory contains six subdirectories, each corresponding to a yoga pose class.
To organize the dataset into training and validation sets, we will create two subdirectories inside the Yoga directory: train and validation. We will move 80% of the images from each class to the train directory and the remaining 20% to the validation directory. We will use the train_test_split function from the sklearn library to split the dataset.
# Import the libraries we need
import os
from sklearn.model_selection import train_test_split

# Define directories
base_dir = "/Users/randyasfandy/Downloads/Yoga"
classes = ['chair', 'cobra', 'downdog', 'goddess', 'tree', 'warrior']
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')

# Split the dataset into training and validation sets
if not os.path.exists(train_dir):
    os.makedirs(train_dir)
if not os.path.exists(validation_dir):
    os.makedirs(validation_dir)

for cls in classes:
    # Create the per-class subdirectories if they don't exist yet
    if not os.path.exists(os.path.join(train_dir, cls)):
        os.makedirs(os.path.join(train_dir, cls))
    if not os.path.exists(os.path.join(validation_dir, cls)):
        os.makedirs(os.path.join(validation_dir, cls))

    # Collect this class's images and split them 80/20
    images = [f for f in os.listdir(os.path.join(base_dir, cls)) if f.endswith('.jpg')]
    train_images, validation_images = train_test_split(images, test_size=0.2)

    # Move each image into its split directory
    for image in train_images:
        os.rename(os.path.join(base_dir, cls, image), os.path.join(train_dir, cls, image))
    for image in validation_images:
        os.rename(os.path.join(base_dir, cls, image), os.path.join(validation_dir, cls, image))
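Optionally, you can verify the split worked as intended by counting the files that landed in each directory. This quick check is my addition, not part of the original post:
# Sanity check (my addition): count how many images ended up in each split
for cls in classes:
    n_train = len(os.listdir(os.path.join(train_dir, cls)))
    n_val = len(os.listdir(os.path.join(validation_dir, cls)))
    print(f"{cls}: {n_train} training / {n_val} validation images")
With roughly four images in validation for every sixteen in training per class, the 80/20 split is easy to eyeball.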
Data Augmentation
To improve the generalization ability of our model and prevent overfitting, we will perform data augmentation on the training set. This involves randomly transforming the images by applying various image manipulations such as rotation, flipping, zooming, and shearing. We will use the ImageDataGenerator class from the tensorflow.keras.preprocessing.image module to perform data augmentation.
In addition to data augmentation, we also need to preprocess the data by rescaling the pixel values to a range of 0 to 1. We will achieve this by passing the argument rescale=1./255 to the ImageDataGenerator class.
# Import TensorFlow's image utilities
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Preprocess the data: rescale pixel values, and apply the augmentation
# transformations described above to the training set (the specific
# parameter values below are example settings)
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)
# The validation set is only rescaled, never augmented
validation_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='categorical'
)
validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='categorical'
)
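Incidentally, flow_from_directory infers the class labels from the subdirectory names, assigning indices in alphabetical order. You can print the mapping to confirm it lines up with our classes list:
# Inspect the label mapping that flow_from_directory inferred
print(train_generator.class_indices)
# Expected: {'chair': 0, 'cobra': 1, 'downdog': 2, 'goddess': 3, 'tree': 4, 'warrior': 5}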
Handling Corrupted Images
It is common to encounter corrupt images while working with large datasets. These images can have missing pixels, incorrect file formats, or other errors that prevent them from being processed by the model. In order to prevent these images from halting the training process, we can create a custom image generator that handles corrupted images.
The custom image generator function, custom_image_generator(), is defined to wrap around the original generator object, yielding data only when the image is not corrupted. If an image is corrupted, the generator skips it and continues to the next one. This allows us to continue training the model without interruption, even if there are some corrupted images in the dataset.
Here is the code for the custom image generator:
def custom_image_generator(generator):
    # Wrap the original generator, skipping any batch that raises an error
    while True:
        try:
            data = next(generator)
            yield data
        except Exception as e:
            print(f"Skipping image due to error: {e}")
We then use this custom generator to handle corrupted images in our train_generator and validation_generator:
train_generator = custom_image_generator(train_generator)
validation_generator = custom_image_generator(validation_generator)
This ensures that our model will be able to handle any corrupted images encountered during training, and continue training on the rest of the dataset without interruption.
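As a complement to the wrapper, you could also remove unreadable files before training ever starts. Here is a minimal sketch using Pillow's verify() check; the remove_corrupted_images helper is my own illustration, not part of the original post:
from PIL import Image
import os

def remove_corrupted_images(directory):
    # Walk the dataset and delete any file Pillow cannot verify
    # (this pre-scan is an assumption, not shown in the original post)
    for root, _, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                with Image.open(path) as img:
                    img.verify()  # raises an exception on corrupt image data
            except Exception:
                print(f"Removing corrupted file: {path}")
                os.remove(path)

remove_corrupted_images(train_dir)
remove_corrupted_images(validation_dir)
Pre-scanning costs one pass over the dataset but keeps the training loop itself simple.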
Next, we will move on to creating the CNN model.
For our model, we will be using a convolutional neural network (CNN), a type of neural network commonly used for image classification tasks. CNNs are designed to process data with a grid-like topology, such as images, by employing a series of convolutional and pooling layers to extract features and reduce spatial dimensions, followed by one or more fully connected layers to classify the extracted features.
Here’s the architecture we will be using for our model:
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(len(classes), activation='softmax')
])
In this model, we have:
- Three convolutional layers with increasing filters (32, 64, and 128) and 3×3 kernel size, using the ReLU activation function
- Three max-pooling layers with a 2×2 pool size to reduce spatial dimensions and retain important features
- One flatten layer to convert the 2D feature maps into a 1D feature vector
- Two fully connected layers, the first with 512 units and ReLU activation function, and the second with a number of units equal to the number of classes (yoga poses) and a softmax activation function, which outputs a probability distribution over the classes.
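To sanity-check that the layers stack up as described, you can print a summary of the architecture, which lists each layer's output shape and parameter count:
model.summary()  # prints each layer's output shape and parameter count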
After defining the architecture, we need to compile the model, which involves specifying the loss function, optimizer, and evaluation metric to use during training.
Here’s the code to compile the model:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
For our model, we are using the Adam optimizer, which is a popular optimization algorithm for deep learning models that combines the benefits of both the stochastic gradient descent (SGD) and adaptive gradient algorithms. We are using categorical cross-entropy as the loss function, which is suitable for multi-class classification problems, and accuracy as the evaluation metric, which measures the percentage of correct predictions.
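The string 'adam' uses Keras's default settings. If you later want to tune the learning rate, an equivalent compilation step with an explicit optimizer instance looks like this (0.001 is the Keras default, shown only so you can see where to change it):
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # Keras default learning rate
    loss='categorical_crossentropy',
    metrics=['accuracy']
)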
Now that we have compiled our model, we can start training it on our training data.
Here’s the code to train the model:
history = model.fit(
    train_generator,
    epochs=30,
    validation_data=validation_generator,
    steps_per_epoch=train_steps_per_epoch,
    validation_steps=validation_steps
)
We are training the model for 30 epochs, which means it will go through the entire training dataset 30 times. We are also validating the model on our validation data during training to monitor its performance and avoid overfitting. We call the model's fit() function, passing it our training and validation generators, the number of steps per epoch and per validation run, and the number of epochs to train for.
Note that since we are using a custom generator to handle corrupted images, we need to pass the train_generator and validation_generator objects returned by ImageDataGenerator.flow_from_directory() through our custom_image_generator() function before training. This ensures that any corrupted images are skipped during training and do not affect the performance of the model.
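One detail the code above glosses over: train_steps_per_epoch and validation_steps are never defined before fit() is called. A minimal sketch of one reasonable way to define them (my assumption, not from the original post) is to count the image files on disk and divide by the batch size; counting files directly also sidesteps the fact that the wrapped generators no longer expose the underlying generator's .samples attribute:
# Derive the step counts from the images on disk (an assumption; the
# original post never shows these definitions)
batch_size = 20  # must match the batch_size passed to flow_from_directory

def count_images(directory):
    # Total number of .jpg files across all class subdirectories
    return sum(
        len([f for f in os.listdir(os.path.join(directory, cls)) if f.endswith('.jpg')])
        for cls in classes
    )

train_steps_per_epoch = count_images(train_dir) // batch_size
validation_steps = count_images(validation_dir) // batch_size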
Here is the model's score.
Now let's visualize some predictions to check the model's performance. Here is the code.
import matplotlib.pyplot as plt
import numpy as np
import random

# Load images and true labels
test_images = []
true_labels = []
for cls in classes:
    test_image_files = [f for f in os.listdir(os.path.join(validation_dir, cls)) if f.endswith('.jpg')]
    for img_file in random.sample(test_image_files, 4):  # Choose 4 random images per class
        img = tf.keras.preprocessing.image.load_img(os.path.join(validation_dir, cls, img_file), target_size=(150, 150))
        img_array = tf.keras.preprocessing.image.img_to_array(img)
        img_array = img_array / 255.0  # rescale to match the training preprocessing
        test_images.append(img_array)
        true_labels.append(cls)

# Convert the list of test images to a NumPy array
test_images = np.array(test_images)

# Make predictions
predictions = model.predict(test_images)
predicted_labels = [classes[np.argmax(p)] for p in predictions]

# Display a 4x4 grid of images with their true and predicted class labels
fig, axes = plt.subplots(4, 4, figsize=(15, 15))
for i, ax in enumerate(axes.flat):
    ax.imshow(test_images[i])
    ax.set_title(f"True: {true_labels[i]}\nPredicted: {predicted_labels[i]}")
    ax.axis("off")
plt.show()
Here is the output.
Wonderful.
I also found some pictures on the internet, ran the model on them, and visualized the outputs again. The code is the same as the code above.
The output and the accuracy look good, but they could be better. How much better will depend heavily on your computational power.
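If training time is the bottleneck, one inexpensive improvement is to stop training once the validation loss stops improving, rather than always running all 30 epochs. A minimal sketch using Keras's built-in EarlyStopping callback (my suggestion, not something the original post does):
# Stop early when validation loss plateaus (my suggestion, not in the post)
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',         # watch the validation loss
    patience=5,                 # stop after 5 epochs without improvement
    restore_best_weights=True   # roll back to the best epoch's weights
)

history = model.fit(
    train_generator,
    epochs=30,
    validation_data=validation_generator,
    steps_per_epoch=train_steps_per_epoch,
    validation_steps=validation_steps,
    callbacks=[early_stop]
)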
In conclusion, we have demonstrated the potential of computer vision in identifying yoga poses using a custom convolutional neural network.
We have explained the process of data preprocessing, model creation, training, and evaluation. Additionally, we have demonstrated the use of data visualization to evaluate the model’s performance on a set of test images.
This project has the potential to be expanded upon in many ways, such as using more advanced architectures, integrating with video streams, and creating a user-friendly mobile application.
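To make the first of those directions concrete, here is a minimal sketch of what swapping our custom CNN for a pretrained backbone could look like, using MobileNetV2 as a frozen feature extractor. This is an illustration of transfer learning under my own assumptions, not something we trained in this post:
import tensorflow as tf

# Pretrained MobileNetV2 backbone (ImageNet weights) used as a frozen
# feature extractor; only the small classification head is trained
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(150, 150, 3), include_top=False, weights='imagenet'
)
base_model.trainable = False

model = tf.keras.models.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(6, activation='softmax')  # 6 yoga pose classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
Because the backbone is frozen, this kind of model typically trains faster and needs less data than a CNN trained from scratch.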
Overall, computer vision is an exciting field with many potential applications, and identifying yoga poses is just one example of its capabilities. Thanks for reading, and happy coding!