
Videolab: Minimum Viable Class #231

Closed · wants to merge 14 commits

Conversation

@ghost commented Oct 17, 2023

Abstract

The purpose of issue #215 was to implement a minimal extension of the Imagelab class. Per the criteria listed here, I believe we are now very close to achieving that extension. While I fully expect the cleanvision maintainers will want to address specific issues, the extension is mature enough to begin the review process.

Implementation Strategy

In keeping with the original goal of the extension laid out by @jwmueller, I have not attempted any additional features (including default arguments or error checking). I have implemented only what is necessary to fulfill the three tenets:

1. just extract every k-th frame from the video,
2. run cleanvision on all those images,
3. aggregate the results across the frames extracted per video, into scores per video.

Tenet 1 was implemented in quite a simple way with the FrameSampler class (a simplified version of the VideoSampler written by @LemurPwned). Again, you will notice I avoided extra features except where necessary, to keep to the minimal design and limited scope of issue #215.
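
For illustration, extracting every k-th frame with PyAV looks roughly like this (a minimal sketch only; the function and variable names and the output layout are assumptions, not the PR's actual FrameSampler code):

    from pathlib import Path

    import av  # PyAV

    def sample_frames(video_path: Path, sample_dir: Path, k: int) -> None:
        """Save every k-th decoded frame of one video as a JPEG."""
        sample_dir.mkdir(parents=True, exist_ok=True)
        with av.open(str(video_path)) as container:
            for index, frame in enumerate(container.decode(video=0)):
                if index % k == 0:
                    frame.to_image().save(sample_dir / f"frame_{index}.jpg")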

Tenets 2 and 3 were achieved with the Videolab class. Again, where possible I have avoided extra features and additional design choices, preferring to defer to the cleanvision maintainers on specifics.
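
To make tenet 3 concrete, the aggregation amounts to averaging per-frame Imagelab scores into per-video scores. A sketch under assumed names (frame_to_video is a hypothetical mapping from frame path to source video; imagelab.issues is Imagelab's per-image results DataFrame):

    import pandas as pd

    # per-frame issue scores produced by Imagelab, indexed by frame path
    frame_scores = imagelab.issues.copy()
    frame_scores["video"] = [frame_to_video[path] for path in frame_scores.index]

    # one row per video: mean score per issue type across that video's sampled frames
    video_scores = frame_scores.groupby("video").mean(numeric_only=True)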

Conclusion

Thank you to @smttsp @LemurPwned @sanjanag and @jwmueller for their contributions and perspective on extending Cleanvision to run on video data.

CLAassistant commented Oct 17, 2023

CLA assistant check
All committers have signed the CLA.

@jwmueller jwmueller requested a review from sanjanag October 17, 2023 20:53
codecov bot commented Oct 17, 2023

Codecov Report

Attention: 137 lines in your changes are missing coverage. Please review.

Comparison is base (cd8f98b) 95.63% compared to head (9434049) 84.06%.
Report is 5 commits behind head on main.

Files Patch % Lines
src/cleanvision/videolab.py 0.00% 136 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #231       +/-   ##
===========================================
- Coverage   95.63%   84.06%   -11.58%     
===========================================
  Files          16       17        +1     
  Lines         986     1123      +137     
  Branches      194      214       +20     
===========================================
+ Hits          943      944        +1     
- Misses         22      158      +136     
  Partials       21       21               


pyproject.toml Outdated
Comment on lines 32 to 34
dependencies = [
"Pillow>=9.3",
"av>=10.0.0",
Member

Suggested change
- dependencies = [
-     "Pillow>=9.3",
-     "av>=10.0.0",
+ dependencies = [  # optional: av>=10.0.0
+     "Pillow>=9.3",

@jwmueller (Member) commented Oct 19, 2023

suggest making av an optional dependency, so the package still works for image data without it installed

Member

This looks good; instead of av, you could use the video keyword (as the name of the optional-dependencies extra).
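
For reference, such an extra in pyproject.toml could look roughly like this (a sketch; the extra name video and the version pin are assumptions based on the discussion above):

    [project.optional-dependencies]
    video = ["av>=10.0.0"]

Users who want Videolab would then install it with pip install "cleanvision[video]", while image-only users need no change.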

from pathlib import Path
from typing import Any, Dict, Generator, List, Optional

import av
Member

Suggested change
- import av

# create frame samples sub directory
sample_sub_dir = self._create_frame_sample_sub_dir(video_file, output_dir)

# open video file for streaming
Member

Suggested change
  # open video file for streaming
+ try:
+     import av
+ except ImportError as error:
+     raise ImportError(
+         "Cannot import package `av`. "
+         "Please install it via `pip install av` and then try again."
+     ) from error

Member

Can lazy import the optional dependency av only where it's required

Member

You can move this line of code wherever is the first time a user would do something related to videolab. I.e. when they construct the object.

The reason I don't think this line of code can be at the top of the file is because the file will be imported with: import cleanvision

And thus this exception will be raised, even for users who didn't want to use Videolab.

So you can really put this code anywhere that makes it possible for users to run Imagelab without having av installed (which you can easily test yourself), while ensuring the exception gets raised as soon as such users try to run Videolab.
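
A minimal sketch of that placement (the constructor signature is an assumption; only the lazy-import pattern itself comes from the suggestion above):

    class Videolab:
        """Extension of Imagelab for finding issues in a video dataset."""

        def __init__(self, video_dir: str) -> None:
            # Lazy import: fail only when someone actually constructs a Videolab,
            # so `import cleanvision` and Imagelab keep working without `av`.
            try:
                import av  # noqa: F401
            except ImportError as error:
                raise ImportError(
                    "Cannot import package `av`. "
                    "Please install it via `pip install av` and then try again."
                ) from error
            self.video_dir = video_dir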

@@ -0,0 +1,260 @@
"""Videolab is an extension of Imagelab for finding issues in a video dataset."""
Member

amazing stuff!

Could you comment:

A minimal example script of how to use this new Videolab and what the outputs look like?

A comment with a comprehensive list of limitations of this code?
E.g.:

  • What video file-type requirements there are.
  • What happens if some videos are really long and some really short.
  • Any efficiency bottlenecks where the code feels slow.
  • Edge cases where the results might not be good or the code might crash.
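
For context, the "minimal example script" being requested might look roughly like this (every name below is an assumption about the PR's API, not something confirmed by it):

    from cleanvision.videolab import Videolab

    # hypothetical usage: sample every 30th frame of each video in ./videos,
    # run the Imagelab checks on those frames, and aggregate scores per video
    videolab = Videolab(video_dir="./videos")
    videolab.find_issues(frame_sampling_interval=30)
    print(videolab.issue_summary)  # assumed to mirror Imagelab's issue summary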

Member

This is great! That will be super useful while our team is reviewing the current code.

Member

Great callout; yes, the FrameSampler will certainly need to be improved (or replaced) before we publicly announce this module to users. But I don't think that is critical to this PR being merged, and it could be addressed in a future follow-up PR.

Thanks for the comprehensive benchmarking/profiling, very helpful! Our team will get to the code review as soon as we're able to

Contributor

Great job! If you need me to optimise this, let me know; there are some avenues to explore here.

Member

@lafemmephile Do you want to make one of your notebooks here:
https://github.com/lafemmephile/videolab/tree/master/notebooks

the official quickstart tutorial for Videolab?

Here's a GH issue detailing what we need for our quick-start tutorial:
#239

Should be a simple PR of one of your existing notebooks, which you can make separately from the Videolab source code PR in parallel, if you're interested!


sanjanag commented Oct 25, 2023

Hi @lafemmephile ! This PR is a great start to implementing Videolab. I looked at the PR and will be posting my thoughts here.

For the question of how to keep filenames for images, hashing, etc., I suggest we use a VideoDataset class that maintains the mapping from numeric indices like [0, 1, 2, 3] to video_paths. This would help us support both video_dir and video_filepaths as arguments and reduce the overhead of hashmaps, collisions, etc. You can use this class as a base class: https://github.com/cleanlab/cleanvision/blob/43584bcb5554d52236d657d1fb9e7f1664eaf101/src/cleanvision/dataset/base_dataset.py

Using this index you can retrieve the video path, and hence the video frames and the video itself. For saving frames, you could use the following dir structure:

frame_dir/

- 0/
    - frame_0.jpg
    - frame_1.jpg
- 1/
    - frame_0.jpg
    - frame_1.jpg
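
A sketch of that index-to-path mapping (illustrative only; the real class would subclass the linked base_dataset.py, whose exact interface is not reproduced here):

    from pathlib import Path
    from typing import List, Optional

    class VideoDataset:
        """Maps numeric indices 0, 1, 2, ... to video file paths."""

        def __init__(
            self,
            video_dir: Optional[str] = None,
            video_filepaths: Optional[List[str]] = None,
        ) -> None:
            if video_dir is not None:
                self._paths = sorted(p for p in Path(video_dir).iterdir() if p.is_file())
            else:
                self._paths = [Path(p) for p in (video_filepaths or [])]

        def __len__(self) -> int:
            return len(self._paths)

        def __getitem__(self, index: int) -> Path:
            return self._paths[index]

        def frame_dir(self, index: int, frame_root: Path) -> Path:
            # frames for video `index` are saved under frame_root/<index>/frame_*.jpg
            return frame_root / str(index)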

@sanjanag

I think the frame sampler is okay for now. We can take up the performance of the frame sampler, and the strategy for sampling frames, in a separate PR when we benchmark this on different datasets.

@sanjanag

@lafemmephile Is there a specific reason you inherited Videolab from Imagelab? I think we could just have a self.imagelab object inside the Videolab class, as we only use the find_issues method on frames. We can expose the same methods, but there doesn't seem to be a need to inherit from the Imagelab class.
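
A sketch of the composition being suggested (Imagelab(data_path=...) and find_issues() are real cleanvision calls; everything else is illustrative):

    from typing import Optional

    from cleanvision import Imagelab

    class Videolab:
        def __init__(self, video_dir: str) -> None:
            self.video_dir = video_dir
            self.imagelab: Optional[Imagelab] = None  # created once frames are extracted

        def find_issues(self, frame_dir: str) -> None:
            # wrap Imagelab rather than inherit from it
            self.imagelab = Imagelab(data_path=frame_dir)
            self.imagelab.find_issues()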

@sanjanag

@lafemmephile Videolab's list of supported issues should be separate from the list of issues supported by Imagelab. Right now it also checks for near/exact duplicates, and the current algorithm doesn't make sense for videos. So I'd suggest maintaining a separate list of issues that Videolab supports and passing them to imagelab.find_issues().
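
Concretely, that could look like the following (imagelab here is the wrapped Imagelab instance; treat the exact selection of issue keys as illustrative):

    # Issues that make sense per-frame. Near/exact duplicates are excluded because
    # consecutive frames of the same video are near-duplicates by construction.
    VIDEOLAB_ISSUE_TYPES = ["dark", "light", "blurry", "low_information", "odd_aspect_ratio", "grayscale"]

    imagelab.find_issues(issue_types={issue: {} for issue in VIDEOLAB_ISSUE_TYPES})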

@sanjanag

For visualization of videos with issues, we could construct GIFs from the extracted frames; I got the idea from here.
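
A sketch of building such a GIF from already-extracted frames with Pillow (directory layout, naming, and timing are assumptions):

    from pathlib import Path

    from PIL import Image

    def frames_to_gif(frame_dir: Path, out_path: Path, duration_ms: int = 200) -> None:
        # stitch one video's sampled frames into an animated GIF for visualization
        frame_paths = sorted(frame_dir.glob("frame_*.jpg"))
        frames = [Image.open(p) for p in frame_paths]
        frames[0].save(out_path, save_all=True, append_images=frames[1:], duration=duration_ms, loop=0)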

@sanjanag

@lafemmephile for faster iteration over the code and checking if the issues make any sense, I suggest you create a test dataset for videos similar to the one we have for images. For starters just add one video of each issue type and see how it works.


sanjanag commented Nov 1, 2023

So basically no near/exact, just all the other issues should be supported?

That's correct @lafemmephile

@sanjanag

sanjanag commented Nov 3, 2023

@sanjanag @jwmueller @LemurPwned Should I also implement all the public methods found in Imagelab or just stick with the existing methods I have in Videolab?

It would be nice to have all the public methods in Imagelab as they apply to Videolab as well, but you could start with the ones you already have.

@ghost ghost requested a review from jwmueller November 18, 2023 13:55
@ghost ghost requested a review from LemurPwned November 18, 2023 13:55
@sanjanag

@jwmueller @sanjanag @LemurPwned I have done my best to implement all the PUBLIC methods ... there may be some attributes (e.g. issues) that should be implemented in order to match the usage and API of Imagelab ... The only remaining public method not implemented is visualize.

I find it difficult to be clear about how to implement the visualize method ... should we "visualize" videos? Should they be playable? Or should we visualize individual frames? From the research I did, it does not seem trivial to plot videos/GIFs in matplotlib AND have them be playable ... GIFs can be plotted, but it does not seem obvious to me how to make them playable.

Until it becomes more clear what is the best strategy for visualizing the Videolab data I will hold off on attempting to implement the visualize method. If any of you have some experience/insight/ideas related to approaching this problem, please comment 👍.

Hi @lafemmephile! That is some amazing work. Could you elaborate on what difficulty you are facing in making videos playable? And does playable mean you click and play, or that it plays as soon as it is plotted? It could be the case that this requires some frontend support from the Jupyter notebook.

We could start with frames, where we don't plot all of the (say) dark frames from a video but instead sample one frame per dark video.
I also think that with either frames or GIFs, it is important to display some kind of time information so one can find the frame in the video. For example, if frame_10.png is sampled for visualization, at what instant in the video does it occur, and in which video? This is important information we want to show alongside the sampled frame.
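
One way to recover that time information, assuming the every-k-th-frame sampling described earlier and a known frame rate (all values below are hypothetical):

    sampled_index = 10       # frame_10.jpg: the 10th *sampled* frame
    k = 30                   # every 30th decoded frame was kept
    fps = 25.0               # frame rate of the source video

    original_frame_index = sampled_index * k
    timestamp_seconds = original_frame_index / fps  # instant in the video this frame shows
    print(f"frame_10.jpg occurs around {timestamp_seconds:.1f}s into the video")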

@sanjanag

@lafemmephile We want to wrap up this PR soon, as I will be on PTO from the second week of December and won't be available during that time. Would it be okay if I take the reins of this PR from here? I think you have made a phenomenal contribution. Could you please tell us if there are any pending in-progress contributions on your side?

@ghost ghost closed this Nov 22, 2023
This pull request was closed.