yaydl can't download from YouTube playlists yet. #6

Open
wkrettek opened this issue Jun 26, 2022 · 12 comments

Comments

@wkrettek

Here's a link I tried downloading:

https://www.youtube.com/watch?v=F8sZRBdmqc0&list=WL&index=10&t=1040s

yaydl can't parse this. It's a somewhat complicated link because it's part of a playlist and has a timestamp attached. However, it can parse this link:

https://www.youtube.com/watch?v=F8sZRBdmqc0

I think with a little bit of regex, the extractor could parse these links better. In the future, it would be good to recognize that the link is part of a list and offer to download the whole playlist, but an easy solution for now is to throw out everything past the video ID and send the rest to the downloader. The timestamp can probably be discarded in almost all cases.
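
As a rough illustration of the "throw everything past the ID out" idea, here is a minimal sketch. It uses plain string handling from the standard library instead of a regex so that it has no dependencies, and the function name `canonical_watch_url` is hypothetical; it is not taken from yaydl's code.

```rust
/// Hypothetical helper (not part of yaydl): rebuilds a watch URL that
/// contains only the video ID, or returns None if there is no `v` parameter.
fn canonical_watch_url(url: &str) -> Option<String> {
    // Everything after the first '?' is the query string.
    let query = url.split_once('?')?.1;
    // Drop a trailing '#fragment', if any.
    let query = query.split('#').next().unwrap_or(query);
    // Look for the `v=<id>` pair among the '&'-separated parameters.
    let id = query
        .split('&')
        .find_map(|pair| pair.strip_prefix("v="))
        .filter(|id| !id.is_empty())?;
    Some(format!("https://www.youtube.com/watch?v={}", id))
}
```

With the link from this report, the `list`, `index` and `t` parameters are dropped and only `https://www.youtube.com/watch?v=F8sZRBdmqc0` remains. A URL without a `v` parameter yields `None` rather than a panic.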

@dertuxmalwieder
Owner

Hmm. Yes, indeed. There are several TODOs for this:

  1. Add a flag to yaydl to switch between playlists and non-playlists (e.g. -p).
  2. If -p is not supplied, the &list part will be skipped.
  3. Otherwise, youtube.rs needs playlist support.
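
The first two TODO items could look roughly like the sketch below. It assumes the flag has already been parsed into a `bool`; the function name `apply_playlist_flag` is hypothetical, and dropping `index` along with `list` is an assumption of this sketch, not something stated above. Actual playlist support (item 3) is not covered.

```rust
/// Query parameters that only matter when a whole playlist is requested.
/// Treating `index` this way is an assumption made for this sketch.
const PLAYLIST_PARAMS: [&str; 2] = ["list", "index"];

/// Hypothetical helper: returns the URL unchanged when `playlist` is true
/// (e.g. `-p` was given), otherwise strips the playlist-related parameters.
fn apply_playlist_flag(url: &str, playlist: bool) -> String {
    if playlist {
        return url.to_string();
    }
    let (base, query) = match url.split_once('?') {
        Some(parts) => parts,
        None => return url.to_string(), // no query string, nothing to strip
    };
    // Keep every `key=value` pair whose key is not playlist-related.
    let kept: Vec<&str> = query
        .split('&')
        .filter(|pair| {
            let key = pair.split('=').next().unwrap_or("");
            !PLAYLIST_PARAMS.contains(&key)
        })
        .collect();
    if kept.is_empty() {
        base.to_string()
    } else {
        format!("{}?{}", base, kept.join("&"))
    }
}
```

Without the flag, the link from the report keeps its `v` and `t` parameters and loses `list` and `index`; with the flag, it is passed through untouched for a playlist-aware extractor to handle.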

I hope I'll find the time to work on this soon. Contributions are welcome.

@wkrettek
Author

I had the idea of making a Rust downloader like this that is fully interoperable with the extractors from youtube-dl. The main draw of youtube-dl is the community that is constantly adding extractors for new sites. The rest of the code, especially the CPU-bound parts, can and should be rewritten in Rust. But in the meantime, before we have extractors for every site, we could take advantage of the existing solutions. Support for that is something I would be interested in looking into if it fits within the goals of the project.

@dertuxmalwieder
Owner

That would require Python support in yaydl, wouldn’t it?

@wkrettek
Author

Hmm, it appears that it might. I envisioned using PyO3 to call the Python extractors through Rust bindings, and it looks like it uses an embedded interpreter to make that happen. I think the harder part would be that the extractors usually call other Python utilities that the youtube-dl library provides. I guess we'd have to make Python bindings that call our Rust code the other way? It would probably take a lot longer than just updating the regex in the existing Rust extractor, but if it worked it would add a lot of functionality that could later be oxidized.

@dertuxmalwieder
Owner

I would actually like to see a “generic” extractor like youtube-dl’s in yaydl which would solve most problems if done right…?

@wkrettek
Author

Looking at the youtube-dl generic extractor, the main guts of it seem to start at the _real_extract function. It checks a bunch of common things to find the video and then checks for playlist files like m3u and xspf. Funnily enough, it also does a bunch of fallback checks for embedded videos using the existing extractors for other sites. It would be interesting to see what the upper limit is for generic extractor effectiveness.

@dertuxmalwieder
Owner

Hmm. A generic "look for anything m3u(8) and fetch everything in it" extractor should already be doable with yaydl's built-in methods and the site scraper crate.

I'll be (mostly) off the keyboard over the weekend, so I probably won't look at this ticket before next week (presumably, also the weekend). Thank you for your ideas so far!
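
A rough sketch of the "look for anything m3u(8) and fetch everything in it" idea mentioned above, limited to the two parsing steps. It works on strings that have already been downloaded, uses only the standard library rather than yaydl's built-in methods or a scraper crate, and both function names are hypothetical. Splitting on quote characters is a deliberate simplification, not real HTML parsing.

```rust
/// Hypothetical helper: collects quoted strings in `html` whose path
/// (the part before any '?' or '#') ends in .m3u or .m3u8.
fn find_m3u_urls(html: &str) -> Vec<String> {
    html.split(|c: char| c == '"' || c == '\'')
        .filter(|s| {
            let path = s
                .split(|c: char| c == '?' || c == '#')
                .next()
                .unwrap_or("");
            path.ends_with(".m3u") || path.ends_with(".m3u8")
        })
        .map(str::to_string)
        .collect()
}

/// Hypothetical helper: returns the entries of an M3U playlist body,
/// i.e. every non-empty line that is not a '#' comment or directive.
fn m3u_entries(body: &str) -> Vec<&str> {
    body.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .collect()
}
```

A generic extractor would fetch the page, pass its HTML to `find_m3u_urls`, download each hit, and hand the lines returned by `m3u_entries` to the downloader. Resolving relative entries against the playlist URL is left out here.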

@dertuxmalwieder
Owner

PSA:
I pushed yaydl 0.10.1 to crates.io; this repository will be updated in a minute or two. The release only addresses the "broken" (= incomplete) regex in your original bug report.

Playlists are still left as an exercise to ... uh ... me, I guess.
(Actually, to anyone.)

@dertuxmalwieder changed the title from "yaydl can't parse youtube links with extended URLs" to "yaydl can't download from YouTube playlists yet." on Jun 30, 2022

@akshettrj

A similar error occurs if we use a Shorts link (e.g. https://youtube.com/shorts/<video_id>).

However, if we convert it into a normal video link (https://youtube.com/watch/?v=<video_id>), then it works.

So maybe you can handle that in the regex as well, because otherwise it panics due to an unwrap() on the capture groups.
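
To illustrate how the panic could be avoided, here is a minimal sketch of an ID lookup that returns an `Option` instead of unwrapping. It covers watch, youtu.be and Shorts links, uses plain string searching instead of a regex, and the name `extract_video_id` is hypothetical rather than yaydl's actual function. The marker matching is intentionally naive, so treat it as a starting point only.

```rust
/// Hypothetical helper: returns the video ID for the supported URL shapes,
/// or None when the URL matches none of them (instead of panicking).
fn extract_video_id(url: &str) -> Option<&str> {
    // Substrings that directly precede the ID in each supported URL shape.
    const MARKERS: [&str; 3] = ["v=", "youtu.be/", "shorts/"];
    // Take the text after the first marker that occurs in the URL.
    let rest = MARKERS
        .iter()
        .find_map(|m| url.find(*m).map(|pos| &url[pos + m.len()..]))?;
    // The ID ends at the next separator, if there is one.
    let id = rest
        .split(|c: char| c == '&' || c == '?' || c == '#' || c == '/')
        .next()?;
    if id.is_empty() {
        None
    } else {
        Some(id)
    }
}
```

The caller can then report an unsupported URL with a normal error message, for example via `match extract_video_id(url) { Some(id) => ..., None => ... }`, rather than crashing.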

@dertuxmalwieder
Owner

Try this:

let id_regex = Regex::new(r"(?:v=|\.be/|shorts/)(.*?)(&.*)*$").unwrap();

Does it work? If so, I'll push an update...

@akshettrj

No, it does not work.

I tried with the following link: https://www.youtube.com/shorts/HVcVhfq1SVY

@dertuxmalwieder
Owner

I pushed 0.12.0 upstream, which seems to detect the URL, at least...?


3 participants