From 4bcf5df2f8e04acc27d0269922b16737293b85ae Mon Sep 17 00:00:00 2001
From: Rex Cheng
Date: Mon, 23 Dec 2024 22:31:25 +0000
Subject: [PATCH] update readme

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 5209554..c069e47 100644
--- a/README.md
+++ b/README.md
@@ -146,6 +146,7 @@ MMAudio was trained on several datasets, including [AudioSet](https://research.g
 
 ## Update Logs
 
+- 2024-12-23: Added training and batch evaluation scripts.
 - 2024-12-14: Removed the `ffmpeg<7` requirement for the demos by replacing `torio.io.StreamingMediaDecoder` with `pyav` for reading frames. The read frames are also cached, so we are not reading the same frames again during reconstruction. This should speed things up and make installation less of a hassle.
 - 2024-12-13: Improved for-loop processing in CLIP/Sync feature extraction by introducing a batch size multiplier. We can approximately use 40x batch size for CLIP/Sync without using more memory, thereby speeding up processing. Removed VAE encoder during inference -- we don't need it.
 - 2024-12-11: Replaced `torio.io.StreamingMediaDecoder` with `pyav` for reading framerate when reconstructing the input video. `torio.io.StreamingMediaDecoder` does not work reliably in huggingface ZeroGPU's environment, and I suspect that it might not work in some other environments as well.