Error with making a music video (av.error.ValueError: [Errno 22] Invalid argument) #190
Not sure about the install issue, will look into it asap (next day or so). As for setup, yeah, that's a great idea. That's what the Colab was about, but I use Lambda too. Next time I spin up an instance I'll sort out the install and add Lambda-specific instructions to the README. If you have time, feel free to open a PR with these instructions and I'll give it a try :)
Legend, @nateraw, thanks buddy! I'd be happy to help fill out some instructions for Lambda VM setup, but I haven't actually gotten the music video mode working there yet. This morning I spent 2 hours or so trying some more things and sadly ended up bumping into the same error (e.g. I tried replicating the exact conda setup you mentioned using in a few other threads, and I also tried pinning torchvision to the previous 0.14.1 version). Whenever you have the time, let me know if you get it working in that context, and I'll happily help with a PR however I can.
I'm taking a look on a fresh A100 instance from Lambda 😎 Will let ya know how it goes.
A few issues going on.

To set up on a new Lambda A10 instance to make music videos:

Install deps. I added xformers because it speeds things up (from 40 sec per batch in the example below to 26 sec on the A10).
Get a song (the `youtube-dl` command in the comments of the script below downloads the one used here).
Then you can run:

```python
import random

import torch
from stable_diffusion_videos import StableDiffusionWalkPipeline

pipe = StableDiffusionWalkPipeline.from_pretrained(
    'runwayml/stable-diffusion-v1-5',
    torch_dtype=torch.float16,
    safety_checker=None,
).to("cuda")

# Comment out the line below if you do not have xformers installed.
pipe.enable_xformers_memory_efficient_attention()

# I give you permission to scrape this song :)
# youtube-dl -f bestaudio --extract-audio --audio-format mp3 --audio-quality 0 -o "music/thoughts.%(ext)s" https://soundcloud.com/nateraw/thoughts
audio_filepath = 'music/thoughts.mp3'

# Seconds in the song. Here we slice the audio from 0:07-0:13.
# Should be the same length as prompts/seeds.
audio_offsets = [7, 10, 13]

# Output video frames per second.
# Use lower values for testing (5-10ish), higher values for better quality (30 or 60).
fps = 4  # Change back to 25-30ish; 4 is for testing

# Convert seconds to frames.
# This list should have length `len(prompts) - 1`, since it holds the steps between consecutive prompts.
num_interpolation_steps = [(b - a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]

prompts = ["a cat with a funny hat", "snoop dogg at the dmv", "steak flavored ice cream"]
# random.randint needs integer bounds (a float bound like 9e9 is rejected by newer Python versions).
seeds = [random.randint(0, int(9e9)) for _ in range(len(prompts))]

pipe.walk(
    prompts=prompts,
    seeds=seeds,
    num_interpolation_steps=num_interpolation_steps,
    fps=fps,
    audio_filepath=audio_filepath,
    audio_start_sec=audio_offsets[0],
    batch_size=12,  # Increase/decrease based on available GPU memory. This fits on a 24GB A10.
    num_inference_steps=50,
    guidance_scale=15,
    margin=1.0,
    smooth=0.2,
)
```

Will add this stuff to the repo. :) Let me know if you give it a shot.
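The per-batch timings mentioned above (about 40 s per batch without xformers versus 26 s with it on an A10, presumably at the `batch_size=12` used in the script) can be turned into a rough render-time estimate before bumping fps back up to 25-30. This is a back-of-the-envelope sketch under stated assumptions: every batch takes the same time, the frame count is roughly clip length times fps, and `estimate_render_seconds` is a hypothetical helper, not part of stable_diffusion_videos.

```python
import math


def estimate_render_seconds(clip_seconds: float, fps: int, batch_size: int,
                            seconds_per_batch: float) -> float:
    """Rough wall-clock estimate for generating all frames of a clip.

    Assumes about clip_seconds * fps frames, processed in batches of
    batch_size, with every batch (including a partially filled last one)
    taking seconds_per_batch.
    """
    num_frames = math.ceil(clip_seconds * fps)
    num_batches = math.ceil(num_frames / batch_size)
    return num_batches * seconds_per_batch


# A 6-second clip at 30 fps with batch_size=12 -> 180 frames -> 15 batches.
without_xformers = estimate_render_seconds(6, 30, 12, 40.0)
with_xformers = estimate_render_seconds(6, 30, 12, 26.0)
print(f"without xformers: {without_xformers:.0f} s, with xformers: {with_xformers:.0f} s")
```

Rounding up to whole batches treats a partially filled final batch as costing a full pass, which is a simplifying assumption; the real cost of a smaller last batch may be lower.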
You're a wizard @nateraw!! Thanks a lot. Can confirm the above worked for me on a Lambda A10 :) The only minor thing to note is that upgrading pip was still required (perhaps that's worth adding to the README when you next can). Also, I'm still a little confused about `audio_offsets`: I'm interpreting them as the points in the song when each prompt should "kick in" (so the differences between offsets are the duration of each prompt). However, the mp4 generated from your example above is only 6 seconds long; not sure if this is intentional. All good, that's for me to figure out, thanks again!
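To make the `audio_offsets` arithmetic concrete, here is a small sketch that reuses the conversion from the example script. Each pair of consecutive offsets defines one segment between two prompts, so the rendered clip spans from the first offset to the last: with `[7, 10, 13]` that is two 3-second segments, 6 seconds in total. `describe_offsets` is a hypothetical helper for illustration, and treating each interpolation step as one output frame is an assumption about how the pipeline counts frames.

```python
def describe_offsets(audio_offsets, fps):
    """Split a list of audio offsets (in seconds) into interpolation segments.

    Returns (segments, num_interpolation_steps, clip_seconds).
    """
    # One segment per pair of consecutive offsets: len(audio_offsets) - 1 segments.
    segments = list(zip(audio_offsets, audio_offsets[1:]))
    # Same conversion as the example script: segment length in seconds times fps.
    num_interpolation_steps = [(b - a) * fps for a, b in segments]
    # The clip covers only the span from the first offset to the last one.
    clip_seconds = audio_offsets[-1] - audio_offsets[0]
    return segments, num_interpolation_steps, clip_seconds


segments, steps, clip_seconds = describe_offsets([7, 10, 13], fps=4)
print(segments)      # [(7, 10), (10, 13)]
print(steps)         # [12, 12]
print(clip_seconds)  # 6
```

Under this reading, the last prompt marks the end point of the walk rather than getting a segment of its own.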
Hi @nateraw, thanks for your great work on this package.
I'm currently struggling to generate a video synchronised with an mp3 file. I've set up my Python environment as per your requirements.txt file and am getting the following error:
```
Traceback (most recent call last):
  File "make_music_video.py", line 21, in <module>
    video_path = pipeline.walk(
  File "/home/ubuntu/venv/lib/python3.8/site-packages/stable_diffusion_videos/stable_diffusion_pipeline.py", line 867, in walk
    make_video_pyav(
  File "/home/ubuntu/venv/lib/python3.8/site-packages/stable_diffusion_videos/stable_diffusion_pipeline.py", line 130, in make_video_pyav
    write_video(
  File "/home/ubuntu/venv/lib/python3.8/site-packages/torchvision/io/video.py", line 124, in write_video
    for packet in a_stream.encode(frame):
  File "av/stream.pyx", line 164, in av.stream.Stream.encode
  File "av/codec/context.pyx", line 482, in av.codec.context.CodecContext.encode
  File "av/audio/codeccontext.pyx", line 42, in av.audio.codeccontext.AudioCodecContext._prepare_frames_for_encode
  File "av/audio/resampler.pyx", line 101, in av.audio.resampler.AudioResampler.resample
  File "av/filter/graph.pyx", line 211, in av.filter.graph.Graph.push
  File "av/filter/context.pyx", line 89, in av.filter.context.FilterContext.push
  File "av/error.pyx", line 336, in av.error.err_check
av.error.ValueError: [Errno 22] Invalid argument
```
The error seems to occur after all the image frames have been generated, while the final video file is being written (my assumption, but I could be mistaken). The traceback points at the audio encoding step (`a_stream.encode` inside torchvision's `write_video`).
I tried to resolve the issue by trying various combinations of the package versions in requirements.txt, including pinning all versions to releases prior to Jan 7th, when you last updated requirements.txt. This sometimes changed the error to a different one (e.g. when changing the version of librosa), but didn't ultimately resolve it.
My environment:

- Python: 3.8.10
- Pip: 23.1.2 (I had to upgrade to the latest version to properly install the basicsr package, as per my comment on #170)
- OS: Ubuntu 20.04.5 LTS
- CUDA version (`nvcc --version` output):

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
```
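When comparing environments like the one above, it can help to print the versions of the packages that appear in the traceback (torchvision, av) and the others mentioned in this thread in one go. A minimal sketch using only the standard library (`importlib.metadata` is available from Python 3.8); the package list is an assumption about what is worth checking, and `report_versions` is a hypothetical helper.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Packages mentioned in this thread (assumed list; adjust as needed).
PACKAGES = ["stable_diffusion_videos", "torch", "torchvision", "av", "librosa", "xformers"]


def report_versions(packages=PACKAGES):
    """Return {name: version string}, marking packages that are not installed."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name}: {ver}")
```

Pasting this output into an issue gives maintainers the exact torchvision/PyAV pairing, which is where this traceback ends up.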
This is actually a VM instance spun up on Lambda Labs, which gives you cheap access to a pretty powerful GPU cloud (around $0.60/hour for this instance type). I mention that because one idea that might help users would be a proven way to set this package up in a standard, known environment; maybe that could be one of these Lambda Labs VMs? Just a thought, I'm not affiliated with them in any way.
That way everyone is working off the same environment, and there's an option for people like me who don't mind spending some pocket change to avoid dependency hell :p