EYE FOR BLIND

This is the final submission file for the Capstone project - Eye for Blind.
A CNN-RNN attention-based model has been built on the Flickr8k dataset to predict captions for random images. The model generates captions using greedy search, and the resulting captions are evaluated using the BLEU score.

Let's read the dataset

Data understanding

1. Import the dataset and read images & captions into two separate variables

2. Visualise both the images & text present in the dataset

3. Create word-to-index and index-to-word mappings.

4. Create a dataframe that summarises each image, its path & its captions

5. Visualise the top 30 most frequently occurring words in the captions

6. Create a list which contains all the captions & paths

Each image id has 5 captions associated with it, so the full dataset has 40455 samples (8091 images x 5 captions). Steps 1, 4 & 6 are sketched below.
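A minimal sketch of steps 1, 4 & 6, assuming the caption file is `Flickr8k.token.txt` with lines of the form `image.jpg#n<TAB>caption` and images under an `Images/` folder (the file names and layout are assumptions, not confirmed by this notebook):

```python
import pandas as pd

# Assumed layout: each line is "<image>.jpg#<n>\t<caption>"
rows = []
with open('Flickr8k.token.txt') as f:
    for line in f:
        img_tag, caption = line.strip().split('\t')
        image_id = img_tag.split('#')[0]
        rows.append((image_id, 'Images/' + image_id, caption))

df = pd.DataFrame(rows, columns=['image_id', 'path', 'caption'])
print(df.shape)  # expected (40455, 3): 8091 images x 5 captions each

# Flat lists of all captions (wrapped in start/end tokens) & image paths
all_captions = ['<start> ' + c + ' <end>' for c in df['caption']]
all_img_paths = df['path'].tolist()
```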

Pre-Processing the captions

1. Create tokenized vectors by tokenizing the captions, for example splitting them on spaces & other filters. This gives us a vocabulary of all the unique words in the data. Keep the vocabulary to the top 5,000 words to save memory (see the sketch after this list).

2. Replace all other words with the unknown token "UNK".

3. Create word-to-index and index-to-word mappings.

4. Pad all sequences to be the same length as the longest one.
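A minimal sketch of these four steps using the Keras `Tokenizer`, reusing `all_captions` from the data-understanding sketch; the exact filter string is an assumption:

```python
import tensorflow as tf

top_k = 5000  # keep only the 5,000 most frequent words

tokenizer = tf.keras.preprocessing.text.Tokenizer(
    num_words=top_k,
    oov_token='UNK',  # every out-of-vocabulary word becomes "UNK"
    filters='!"#$%&()*+.,-/:;=?@[\\]^_`{|}~ ')
tokenizer.fit_on_texts(all_captions)

# Reserve index 0 for padding
tokenizer.word_index['<pad>'] = 0
tokenizer.index_word[0] = '<pad>'

# Word-to-index and index-to-word mappings
word_to_index = tokenizer.word_index
index_to_word = tokenizer.index_word

# Integer sequences, padded to the length of the longest caption
seqs = tokenizer.texts_to_sequences(all_captions)
cap_vector = tf.keras.preprocessing.sequence.pad_sequences(seqs, padding='post')
max_len = cap_vector.shape[1]
```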

Pre-processing the images & Extracting Features using a Pre-trained CNN

  1. Resizing images to the shape (299, 299)
  2. Normalizing pixel values to the range -1 to 1, the input format InceptionV3 expects
  3. Extracting features using InceptionV3 and caching them on disk (see the sketch after this list)
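A minimal sketch of the image pipeline, reusing `all_img_paths` from the data-understanding sketch; the batch size and the `.npy` cache naming are assumptions:

```python
import numpy as np
import tensorflow as tf

def load_image(path):
    # Read, resize to (299, 299) and scale pixels to [-1, 1] for InceptionV3
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (299, 299))
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    return img, path

# InceptionV3 without its classification head; output is an 8x8x2048 feature map
base = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')
feature_extractor = tf.keras.Model(base.input, base.output)

# Extract and cache one .npy feature file per unique image
image_ds = tf.data.Dataset.from_tensor_slices(sorted(set(all_img_paths)))
image_ds = image_ds.map(load_image, num_parallel_calls=tf.data.AUTOTUNE).batch(16)

for batch, paths in image_ds:
    feats = feature_extractor(batch)                                 # (16, 8, 8, 2048)
    feats = tf.reshape(feats, (feats.shape[0], -1, feats.shape[3]))  # (16, 64, 2048)
    for f, p in zip(feats, paths):
        np.save(p.numpy().decode('utf-8') + '.npy', f.numpy())
```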

Create the train & test data

  1. Combining the images & captions to create the train & test datasets using the tf.data.Dataset API. The train-test split uses an 80-20 ratio & random state = 42. Shuffling and batching are applied while building the datasets (see the sketch after this list).
  2. The shape of each image in the dataset after building is (batch_size, 299, 299, 3)
  3. The shape of each caption in the dataset after building is (batch_size, max_len)
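A minimal sketch of the split and the `tf.data` pipeline, reusing `load_image` and `cap_vector` from the earlier sketches; `BATCH_SIZE` and `BUFFER_SIZE` values are assumptions:

```python
import tensorflow as tf
from sklearn.model_selection import train_test_split

# 80-20 split with random_state=42, as described above
img_train, img_test, cap_train, cap_test = train_test_split(
    all_img_paths, cap_vector, test_size=0.2, random_state=42)

BATCH_SIZE = 64
BUFFER_SIZE = 1000

def build_dataset(img_paths, caps):
    ds = tf.data.Dataset.from_tensor_slices((img_paths, caps))
    # load_image returns (image, path); keep only the (299, 299, 3) image
    ds = ds.map(lambda p, c: (load_image(p)[0], c),
                num_parallel_calls=tf.data.AUTOTUNE)
    return ds.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

train_ds = build_dataset(img_train, cap_train)  # images: (batch, 299, 299, 3)
test_ds = build_dataset(img_test, cap_test)     # captions: (batch, max_len)
```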

Model Building

  1. Setting the model parameters (typical values are sketched after this list)

  2. Building the Encoder, Attention model & Decoder
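Typical parameter values for this architecture, following the standard show-attend-and-tell setup; the exact numbers used in the notebook are assumptions:

```python
embedding_dim = 256            # size of the word embeddings
units = 512                    # hidden units in the attention module & GRU decoder
vocab_size = 5000 + 1          # top 5,000 words plus the padding index
attention_features_shape = 64  # 8x8 InceptionV3 spatial positions, flattened
features_shape = 2048          # channels of the InceptionV3 feature map
```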

Encoder
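A minimal sketch of the encoder: a single fully connected layer that projects the flattened (64, 2048) InceptionV3 feature map into the embedding space used by the attention module:

```python
import tensorflow as tf

class Encoder(tf.keras.Model):
    def __init__(self, embedding_dim):
        super().__init__()
        self.fc = tf.keras.layers.Dense(embedding_dim)

    def call(self, x):
        # x: (batch, 64, 2048) -> (batch, 64, embedding_dim)
        return tf.nn.relu(self.fc(x))
```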

Attention model
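A minimal sketch of additive (Bahdanau-style) attention over the 64 image regions, the standard choice for this architecture; treat the exact variant as an assumption:

```python
class BahdanauAttention(tf.keras.Model):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # projects image features
        self.W2 = tf.keras.layers.Dense(units)  # projects decoder hidden state
        self.V = tf.keras.layers.Dense(1)       # scores each region

    def call(self, features, hidden):
        # features: (batch, 64, embedding_dim); hidden: (batch, units)
        hidden_with_time = tf.expand_dims(hidden, 1)
        score = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time)))
        attention_weights = tf.nn.softmax(score, axis=1)  # (batch, 64, 1)
        context_vector = tf.reduce_sum(attention_weights * features, axis=1)
        return context_vector, attention_weights
```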

Decoder
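A minimal sketch of a GRU decoder with attention: at every step it attends over the image features, concatenates the context vector with the current word embedding, and predicts the next word:

```python
class Decoder(tf.keras.Model):
    def __init__(self, embedding_dim, units, vocab_size):
        super().__init__()
        self.units = units
        self.attention = BahdanauAttention(units)
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(units, return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc1 = tf.keras.layers.Dense(units)
        self.fc2 = tf.keras.layers.Dense(vocab_size)

    def call(self, x, features, hidden):
        context_vector, attention_weights = self.attention(features, hidden)
        x = self.embedding(x)                            # (batch, 1, embedding_dim)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
        output, state = self.gru(x)
        x = self.fc1(output)
        x = tf.reshape(x, (-1, x.shape[2]))
        x = self.fc2(x)                                  # (batch, vocab_size) logits
        return x, state, attention_weights

    def reset_state(self, batch_size):
        return tf.zeros((batch_size, self.units))
```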

Model training & optimization

  1. Setting the optimizer & loss object
  2. Creating a checkpoint path
  3. Creating custom train_step & test_step functions
  4. Creating a custom loss function for the test dataset (these pieces are sketched after this list)
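A minimal sketch of the optimizer, masked loss, checkpointing & `train_step`; Adam and the masked sparse cross-entropy are assumptions based on common practice, and `test_step` mirrors `train_step` without the gradient update:

```python
encoder = Encoder(embedding_dim)
decoder = Decoder(embedding_dim, units, vocab_size)

optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')

def loss_function(real, pred):
    # Mask out padding positions so they don't contribute to the loss
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)
    loss_ *= tf.cast(mask, dtype=loss_.dtype)
    return tf.reduce_mean(loss_)

checkpoint_path = './checkpoints/train'
ckpt = tf.train.Checkpoint(encoder=encoder, decoder=decoder, optimizer=optimizer)
ckpt_manager = tf.train.CheckpointManager(ckpt, checkpoint_path, max_to_keep=5)

@tf.function
def train_step(img_tensor, target):
    loss = 0
    hidden = decoder.reset_state(batch_size=target.shape[0])
    # Teacher forcing: start with <start>, then feed the ground-truth previous word
    dec_input = tf.expand_dims([word_to_index['<start>']] * target.shape[0], 1)
    with tf.GradientTape() as tape:
        feats = feature_extractor(img_tensor)      # (batch, 8, 8, 2048)
        feats = tf.reshape(feats, (-1, 64, 2048))  # flatten the 8x8 grid
        features = encoder(feats)
        for i in range(1, target.shape[1]):
            predictions, hidden, _ = decoder(dec_input, features, hidden)
            loss += loss_function(target[:, i], predictions)
            dec_input = tf.expand_dims(target[:, i], 1)
    trainable = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, trainable)
    optimizer.apply_gradients(zip(gradients, trainable))
    return loss / int(target.shape[1])
```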

Model Evaluation

1. Define your evaluation function using greedy search

2. Define your evaluation function using beam search (optional)

3. Test it on sample data using the BLEU score (see the sketch after this list)
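A minimal sketch of greedy-search decoding and a BLEU check with NLTK, reusing helpers from the earlier sketches; the sample image is taken from the test split here purely as an illustration:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def evaluate_greedy(image_path):
    # Greedy search: at each step, emit the single most probable next word
    hidden = decoder.reset_state(batch_size=1)
    img = tf.expand_dims(load_image(image_path)[0], 0)  # (1, 299, 299, 3)
    feats = feature_extractor(img)
    feats = tf.reshape(feats, (-1, 64, 2048))
    features = encoder(feats)

    dec_input = tf.expand_dims([word_to_index['<start>']], 0)
    result = []
    for _ in range(max_len):
        predictions, hidden, _ = decoder(dec_input, features, hidden)
        predicted_id = int(tf.argmax(predictions[0]))
        word = index_to_word.get(predicted_id, 'UNK')
        if word == '<end>':
            break
        result.append(word)
        dec_input = tf.expand_dims([predicted_id], 0)
    return result

# BLEU against the image's five reference captions
sample_path = img_test[0]
references = [c.split() for c in df.loc[df['path'] == sample_path, 'caption']]
candidate = evaluate_greedy(sample_path)
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f'BLEU: {score:.3f}')
```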