Segment a video into groups of frames, where each group represents a common human action. A sample video and its corresponding frames are provided.
The frames of the sample video are fed into a pre-trained Keras VGGNet model to extract per-frame features. These features are then used to cluster the frames with Spectral Clustering, using the Normalized Cuts algorithm.
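The clustering stage can be illustrated with a minimal NumPy sketch. It is not the code in this repository: it uses synthetic vectors in place of VGGNet features, a Gaussian affinity with a median-distance bandwidth (an assumption), and the normalized-Laplacian embedding followed by a small k-means (the Ng-Jordan-Weiss variant, a close relative of Normalized Cuts). All function names here are hypothetical.

```python
import numpy as np


def gaussian_affinity(features, sigma=None):
    """Dense frame-to-frame affinity from pairwise Euclidean distances."""
    sq_norms = np.sum(features ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    if sigma is None:
        # Assumption: use the median pairwise distance as the bandwidth.
        sigma = np.sqrt(np.median(sq_dists[sq_dists > 0]))
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def spectral_embedding(affinity, k):
    """Eigenvectors of L_sym = I - D^{-1/2} W D^{-1/2} for the k smallest eigenvalues."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    laplacian = np.eye(len(affinity)) - normalized
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    embedding = eigvecs[:, :k]
    # Scale each row to unit length before clustering.
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    return embedding / np.maximum(norms, 1e-12)


def kmeans(points, k, n_iter=50):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def cluster_frames(features, k):
    """Return one cluster label per frame (row of `features`)."""
    affinity = gaussian_affinity(features)
    embedding = spectral_embedding(affinity, k)
    return kmeans(embedding, k)


# Synthetic stand-in for CNN features: three "actions" of 10 frames each.
rng = np.random.default_rng(0)
action_means = np.array([[0.0] * 8, [5.0] * 8, [-5.0] * 8])
features = np.vstack([m + 0.1 * rng.standard_normal((10, 8)) for m in action_means])
labels = cluster_frames(features, k=3)
print(labels)
```

With real data, `features` would be the matrix of per-frame VGGNet features, one row per frame. scikit-learn's `SpectralClustering` could replace the hand-written embedding and k-means.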
- Anaconda
- Scikit-Learn
- Keras along with Theano or TensorFlow (recommended)
- Set the value of `k` to the desired number of clusters.
- Pass the number of frames, the format of the frames, and the path where the frames are located to the `get_features` function.
- Pass the number of frames to the `adjacency_matrix` function.
- Run the rest of the code.
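Assuming the steps above produce one cluster label per frame in temporal order, the labels can be collapsed into contiguous frame ranges. The helper below is a hypothetical post-processing sketch, not part of this repository, and the label list is made up for illustration.

```python
from itertools import groupby


def labels_to_segments(labels):
    """Collapse per-frame labels into (start, end, label) runs; `end` is inclusive."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length - 1, label))
        start += length
    return segments


# Illustrative labels for a 10-frame clip (not real output).
frame_labels = [0, 0, 0, 2, 2, 1, 1, 1, 1, 0]
segments = labels_to_segments(frame_labels)
for start, end, label in segments:
    print(f"frames {start}-{end}: cluster {label}")
```

Spectral clustering does not enforce temporal contiguity, so the same cluster label can appear in more than one segment, as cluster 0 does here.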
Use OpenCV
- Add steps to extract the frames of the video.
- Implement over-segmentation and add a Conv3D model for finer segmentation.