New Extractor! #33
base: main
Walkthrough
This pull request adds a new VixCloud extractor to the mediaflow proxy system. The changes add a new extractor class for VixCloud, register it in the extractor factory, extend the schemas to accept the new host, and add a debug print statement to the base extractor. VixCloudExtractor implements methods that retrieve the version of the VixCloud parent site and extract media URLs, so the system can now handle VixCloud media sources.
Sequence Diagram
sequenceDiagram
participant Client
participant ExtractorFactory
participant VixCloudExtractor
participant VixCloudSite
Client->>ExtractorFactory: Request extractor for VixCloud
ExtractorFactory->>VixCloudExtractor: Create extractor
Client->>VixCloudExtractor: Extract URL
VixCloudExtractor->>VixCloudSite: Check site version
VixCloudSite-->>VixCloudExtractor: Return version
VixCloudExtractor->>VixCloudSite: Request media URL
VixCloudSite-->>VixCloudExtractor: Return media details
VixCloudExtractor-->>Client: Return extracted URL and headers
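The flow in the diagram can be sketched in Python as below. This is a hedged illustration, not the PR's code: Site, SketchExtractor, ExtractionResult, get_extractor and FakeSite are hypothetical names, the two calls to the site are abstracted behind an injected object instead of real HTTP requests, and returning a Referer header is an assumption about which headers a proxy would need.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Protocol


class Site(Protocol):
    """Hypothetical stand-in for the VixCloudSite participant."""

    async def version(self) -> str: ...

    async def media_url(self, version: str, media_id: str) -> str: ...


@dataclass
class ExtractionResult:
    url: str
    headers: dict[str, str] = field(default_factory=dict)


class SketchExtractor:
    def __init__(self, site: Site, referer: str) -> None:
        self.site = site
        self.referer = referer

    async def extract(self, media_id: str) -> ExtractionResult:
        # "Check site version" -> "Return version"
        version = await self.site.version()
        # "Request media URL" -> "Return media details"
        url = await self.site.media_url(version, media_id)
        # "Return extracted URL and headers" (the Referer header is an assumption)
        return ExtractionResult(url=url, headers={"Referer": self.referer})


# "Request extractor for VixCloud" -> "Create extractor"
EXTRACTORS = {"VixCloud": SketchExtractor}


def get_extractor(host: str, site: Site, referer: str) -> SketchExtractor:
    try:
        return EXTRACTORS[host](site, referer)
    except KeyError:
        raise ValueError(f"Unsupported host: {host}") from None


class FakeSite:
    """Canned responses in place of real network calls."""

    async def version(self) -> str:
        return "abc123"

    async def media_url(self, version: str, media_id: str) -> str:
        return f"https://example.invalid/playlist/{media_id}?v={version}"


extractor = get_extractor("VixCloud", FakeSite(), referer="https://example.invalid/")
result = asyncio.run(extractor.extract("42"))
print(result.url)  # https://example.invalid/playlist/42?v=abc123
```

Injecting the site object keeps the two-step handshake testable without network access.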
📜 Recent review details
Configuration used: CodeRabbit UI
⛔ Files ignored due to path filters (1)
📒 Files selected for processing (1)
🔇 Additional comments (1)
pyproject.toml (1)
The project generally uses specific versions (e.g., …). Let's verify whether this version is appropriate.
✅ Verification successful: the version constraint can safely be changed to use caret notation.
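Caret notation here presumably refers to Poetry-style requirements in pyproject.toml (an assumption about the project's tooling). Under Poetry's rules, ^X.Y.Z allows any update that keeps the left-most non-zero component fixed. The sketch below computes the resulting bounds; caret_bounds is a hypothetical helper, and "^4.12.3" is an illustrative value, not necessarily the constraint used in this PR.

```python
def caret_bounds(spec: str) -> tuple[str, str]:
    """Return (inclusive lower, exclusive upper) bounds for a caret requirement like '^4.12.3'."""
    parts = [int(p) for p in spec.lstrip("^").split(".")]
    # Caret allows any change that keeps the left-most non-zero component fixed;
    # if every component is zero, the last component given is the pivot instead.
    pivot = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:pivot] + [parts[pivot] + 1]

    def to_semver(nums: list[int]) -> str:
        # Pad to MAJOR.MINOR.PATCH with zeros.
        return ".".join(str(n) for n in nums + [0] * (3 - len(nums)))

    return to_semver(parts), to_semver(upper)


print(caret_bounds("^4.12.3"))  # ('4.12.3', '5.0.0')
```

So for a 4.x release, a caret constraint accepts new minor and patch releases but not 5.0.0.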
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for the latest stable versions and any security advisories for beautifulsoup4
# Check PyPI for latest versions
curl -s https://pypi.org/pypi/beautifulsoup4/json | jq '.info.version'
# Check for security advisories
gh api graphql -f query='
{
securityVulnerabilities(first: 5, ecosystem: PIP, package: "beautifulsoup4") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}'
Length of output: 427
Actionable comments posted: 3
🧹 Nitpick comments (2)
mediaflow_proxy/extractors/vixcloud.py (2)
1-9: Ensure robust class-level documentation.
Although the docstring for the class briefly describes its purpose, consider adding more details on usage scenarios and any assumptions made (e.g., domain patterns and potential IP-locked scenarios). This additional clarity will assist future maintainers.
61-65: Encourage additional test coverage.
Since this extractor handles multiple request/response steps, consider unit tests covering both happy paths and partial failures (e.g., missing token, invalid domain, unexpected status codes, and alternative query parameter scenarios).
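A hedged sketch of such tests, using only the standard library's unittest and AsyncMock. VersionOnlyExtractor is a hypothetical, simplified stand-in that keeps only the status-code and parsing checks of version(); unlike the real method, it reads the JSON payload straight from the response body instead of from the data-page attribute of div#app. With the real class, _make_request can be patched the same way.

```python
import json
import unittest
from types import SimpleNamespace
from unittest.mock import AsyncMock


class ExtractorError(Exception):
    """Stand-in for the project's ExtractorError."""


class VersionOnlyExtractor:
    """Hypothetical cut-down extractor: only the request/response checks of version()."""

    async def _make_request(self, url: str):
        raise NotImplementedError  # the real BaseExtractor performs the HTTP call

    async def version(self, domain: str) -> str:
        response = await self._make_request(f"https://streamingcommunity.{domain}/richiedi-un-titolo")
        if response.status_code != 200:
            raise ExtractorError("Outdated Domain")
        try:
            # Simplification: the body itself is the JSON payload.
            return json.loads(response.text)["version"]
        except (json.JSONDecodeError, KeyError) as e:
            raise ExtractorError(f"Failed to parse version: {e}") from e


def fake_response(status_code: int, text: str) -> SimpleNamespace:
    # Only the two attributes version() reads.
    return SimpleNamespace(status_code=status_code, text=text)


class VersionTest(unittest.IsolatedAsyncioTestCase):
    async def test_happy_path(self):
        ex = VersionOnlyExtractor()
        ex._make_request = AsyncMock(return_value=fake_response(200, '{"version": "abc"}'))
        self.assertEqual(await ex.version("example"), "abc")
        ex._make_request.assert_awaited_once_with("https://streamingcommunity.example/richiedi-un-titolo")

    async def test_unexpected_status_code(self):
        ex = VersionOnlyExtractor()
        ex._make_request = AsyncMock(return_value=fake_response(403, ""))
        with self.assertRaises(ExtractorError):
            await ex.version("example")

    async def test_missing_version_key(self):
        ex = VersionOnlyExtractor()
        ex._make_request = AsyncMock(return_value=fake_response(200, "{}"))
        with self.assertRaises(ExtractorError):
            await ex.version("example")
```

Run it with python -m unittest pointed at the test module.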
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- mediaflow_proxy/extractors/base.py (1 hunk)
- mediaflow_proxy/extractors/factory.py (2 hunks)
- mediaflow_proxy/extractors/vixcloud.py (1 hunk)
- mediaflow_proxy/schemas.py (1 hunk)
✅ Files skipped from review due to trivial changes (1)
- mediaflow_proxy/extractors/base.py
🧰 Additional context used
🪛 Ruff (0.8.2)
mediaflow_proxy/extractors/vixcloud.py
50-50: Local variable 'quality' is assigned to but never used. Remove assignment to unused variable 'quality'. (F841)
55-55: Local variable 'canPlayFHD' is assigned to but never used. Remove assignment to unused variable 'canPlayFHD'. (F841)
58-58: Local variable 'b' is assigned to but never used. Remove assignment to unused variable 'b'. (F841)
🔇 Additional comments (5)
mediaflow_proxy/extractors/vixcloud.py (2)
51-60: Check for the '/embed/' substring.
Splitting on '/embed/' can break if the iframe URL does not contain '/embed/'. Consider adding a check or a fallback so alternative URL formats are handled gracefully.
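A minimal sketch of such a check, assuming the code needs the path segment that follows '/embed/'. embed_id is a hypothetical helper; falling back to the last path segment is one possible policy, and anything else raises instead of failing with an IndexError.

```python
from urllib.parse import urlparse


class ExtractorError(Exception):
    """Stand-in for the project's ExtractorError."""


def embed_id(iframe_src: str) -> str:
    """Return the segment after '/embed/', falling back to the last path segment."""
    # Work on the parsed path so query strings and fragments never leak into the id.
    segments = [s for s in urlparse(iframe_src).path.split("/") if s]
    if "embed" in segments:
        idx = segments.index("embed")
        if idx + 1 < len(segments):
            return segments[idx + 1]
    elif segments:
        # Hypothetical fallback for URLs without an '/embed/' segment.
        return segments[-1]
    raise ExtractorError(f"Unrecognized iframe URL: {iframe_src!r}")


print(embed_id("https://example.invalid/embed/12345?token=abc"))  # 12345
```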
32-45: Validate domain extraction logic.
Relying on string splitting to parse domains can fail when the URL has extra subdomains or a different structure. Consider relying on urlparse more extensively so these edge cases are handled gracefully.
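Since version() builds URLs as f"https://streamingcommunity.{DOMAIN}/...", the extractor appears to need the part of the hostname after its first label. A hedged sketch with urllib.parse (domain_suffix is a hypothetical name; stripping a leading "www." is an assumption about which subdomains to tolerate):

```python
from urllib.parse import urlparse


class ExtractorError(Exception):
    """Stand-in for the project's ExtractorError."""


def domain_suffix(url: str) -> str:
    """Return what follows the first host label, e.g. 'streamingcommunity.example' -> 'example'."""
    host = urlparse(url).hostname  # lower-cased, with any port and credentials removed
    if host is None:
        raise ExtractorError(f"No hostname in {url!r}; is the scheme missing?")
    host = host.removeprefix("www.")
    _, sep, suffix = host.partition(".")
    if not sep or not suffix:
        raise ExtractorError(f"Cannot derive a domain from host {host!r}")
    return suffix


print(domain_suffix("https://StreamingCommunity.example:8443/iframe/42"))  # example
```

Unlike splitting the raw string, urlparse ignores ports, credentials, paths and letter case, and a URL without a scheme is rejected rather than silently mis-split.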
mediaflow_proxy/extractors/factory.py (2)
10-10: Import statement for 'VixCloudExtractor' looks good.
No issues here. This correctly references the new extractor.
24-24: Factory entry for 'VixCloud'.
The dictionary mapping is well done. Ensure all relevant documentation and usage references also list 'VixCloud' among the supported hosts.
mediaflow_proxy/schemas.py (1)
66-66: Addition of "VixCloud" to 'host' is consistent.
The schema change aligns with the new extractor. Ensure front-end or endpoint validations also reflect this new value.
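One way to keep the schema and the factory from drifting apart is a small consistency check. This sketch assumes the 'host' field is typed as a typing.Literal and that the factory is a plain dict keyed by host name; the stub classes and "ExampleHost" are placeholders, not the project's real entries.

```python
from typing import Literal, get_args


class ExampleHostExtractorStub:
    """Placeholder for an existing extractor class."""


class VixCloudExtractorStub:
    """Placeholder for the new VixCloudExtractor."""


# Assumption: the 'host' field in schemas.py is typed with a Literal like this one.
Host = Literal["ExampleHost", "VixCloud"]

# Assumption: the factory maps host names to extractor classes like this.
EXTRACTORS = {
    "ExampleHost": ExampleHostExtractorStub,
    "VixCloud": VixCloudExtractorStub,
}


def host_mismatches() -> tuple[set[str], set[str]]:
    """Return (hosts only in the schema, hosts only in the factory)."""
    schema_hosts = set(get_args(Host))
    factory_hosts = set(EXTRACTORS)
    return schema_hosts - factory_hosts, factory_hosts - schema_hosts


print(host_mismatches())  # (set(), set())
```

Asserting that host_mismatches() returns two empty sets in a test would flag a host added to one place but not the other.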
script = soup.find("body").find("script").text
token = re.search(r"'token':\s*'(\w+)'", script).group(1)
expires = re.search(r"'expires':\s*'(\d+)'", script).group(1)
quality = re.search(r'"quality":(\d+)', script).group(1)
🛠️ Refactor suggestion
Remove unused variables 'quality', 'canPlayFHD', and 'b'.
These variables are flagged by static analysis as unused. Removing them reduces clutter and improves maintainability.
- quality = re.search(r'"quality":(\d+)', script).group(1)
...
- canPlayFHD = "h=1"
...
- b = "b=1"
Also applies to: 55-55, 58-58
async def version(self, domain: str) -> str:
    """Get version of VixCloud Parent Site."""
    DOMAIN = domain
    base_url = f"https://streamingcommunity.{DOMAIN}/richiedi-un-titolo"
    response = await self._make_request(
        base_url,
        headers={
            "Referer": f"https://streamingcommunity.{DOMAIN}/",
            "Origin": f"https://streamingcommunity.{DOMAIN}",
        },
    )
    if response.status_code != 200:
        raise ExtractorError("Outdated Domain")
    # Soup the response
    soup = BeautifulSoup(response.text, "lxml", parse_only=SoupStrainer("div", {"id": "app"}))
    if soup:
        # Extract version
        version = json.loads(soup.find("div", {"id": "app"}).get("data-page"))["version"]
        return version
🛠️ Refactor suggestion
Improve error handling in 'version' method.
Currently, if the JSON structure or the "version" key is missing, the code will raise an unhandled exception. Consider adding a try-except block around the JSON parsing and key access. This ensures the method fails gracefully if the response format changes.
soup = BeautifulSoup(response.text, "lxml", parse_only=SoupStrainer("div", {"id": "app"}))
if soup:
-    version = json.loads(soup.find("div", {"id": "app"}).get("data-page"))["version"]
-    return version
+    try:
+        data = json.loads(soup.find("div", {"id": "app"}).get("data-page"))
+        return data["version"]
+    except (KeyError, json.JSONDecodeError, AttributeError) as e:
+        raise ExtractorError(f"Failed to parse version: {e}")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
async def version(self, domain: str) -> str:
    """Get version of VixCloud Parent Site."""
    DOMAIN = domain
    base_url = f"https://streamingcommunity.{DOMAIN}/richiedi-un-titolo"
    response = await self._make_request(
        base_url,
        headers={
            "Referer": f"https://streamingcommunity.{DOMAIN}/",
            "Origin": f"https://streamingcommunity.{DOMAIN}",
        },
    )
    if response.status_code != 200:
        raise ExtractorError("Outdated Domain")
    # Soup the response
    soup = BeautifulSoup(response.text, "lxml", parse_only=SoupStrainer("div", {"id": "app"}))
    if soup:
        try:
            data = json.loads(soup.find("div", {"id": "app"}).get("data-page"))
            return data["version"]
        except (KeyError, json.JSONDecodeError, AttributeError) as e:
            raise ExtractorError(f"Failed to parse version: {e}")
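A hedged variant of the suggestion that also covers a case it leaves open: when soup is empty, version() falls through and implicitly returns None despite its -> str annotation. Moving the parsing into a small pure helper (version_from_data_page is a hypothetical name) makes both failure modes explicit, chains the original exception with "from e", and is easy to unit-test.

```python
import json
from typing import Optional


class ExtractorError(Exception):
    """Stand-in for the project's ExtractorError."""


def version_from_data_page(data_page: Optional[str]) -> str:
    """Parse the data-page attribute of div#app and return its 'version' field."""
    if data_page is None:
        # No div#app or no data-page attribute: fail loudly instead of returning None.
        raise ExtractorError("Failed to parse version: data-page attribute not found")
    try:
        version = json.loads(data_page)["version"]
    except (json.JSONDecodeError, KeyError, TypeError) as e:
        # TypeError covers payloads that decode to a list or string instead of an object.
        raise ExtractorError(f"Failed to parse version: {e}") from e
    return str(version)


print(version_from_data_page('{"version": "abc123"}'))  # abc123
```

Inside the extractor, the call would look something like div = soup.find("div", {"id": "app"}) followed by version_from_data_page(div.get("data-page") if div else None).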
if soup:
    script = soup.find("body").find("script").text
    token = re.search(r"'token':\s*'(\w+)'", script).group(1)
    expires = re.search(r"'expires':\s*'(\d+)'", script).group(1)
    quality = re.search(r'"quality":(\d+)', script).group(1)
💡 Codebase verification
Add error handling for regex matches to prevent NoneType errors
- The regex matches for 'token', 'expires', and 'quality' in vixcloud.py are performed without any error handling.
- The class already inherits from BaseExtractor, so use the existing ExtractorError to report failed matches.
- Add checks before accessing .group(1) for all three regex patterns, for example:
token_match = re.search(r"'token':\s*'(\w+)'", script)
if not token_match:
raise ExtractorError("Failed to extract token from script")
token = token_match.group(1)
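To avoid repeating that check for every pattern, it can be folded into a small helper. search_group is a hypothetical name and the script string is a made-up sample; 'quality' is left out here because the earlier comment flags it as unused.

```python
import re


class ExtractorError(Exception):
    """Stand-in for the project's ExtractorError."""


def search_group(pattern: str, text: str, name: str) -> str:
    """Return group 1 of the first match, or raise ExtractorError naming the missing field."""
    match = re.search(pattern, text)
    if match is None:
        raise ExtractorError(f"Failed to extract {name} from script")
    return match.group(1)


script = "{'token': 'abc123', 'expires': '1700000000'}"  # made-up sample
token = search_group(r"'token':\s*'(\w+)'", script, "token")
expires = search_group(r"'expires':\s*'(\d+)'", script, "expires")
print(token, expires)  # abc123 1700000000
```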
🔗 Analysis chain
Confirm presence of 'token', 'expires', and 'quality' in the script.
If a regex match fails, re.search returns None and calling .group(1) on it raises an AttributeError. Validate these matches before calling .group(1) to avoid runtime failures.
Let me verify the presence of error handling for these regex matches by searching through the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts found
Script:
#!/bin/bash
# Search for the file content to understand the context and error handling
cat mediaflow_proxy/extractors/vixcloud.py
Length of output: 3011
Changes:
Added support for VixCloud in the extractors. VixCloud is IP-locked, and some domains are protected by Cloudflare.
URL to pass: siteurl/iframe/id
Note that the links it generates are currently not functional, but this is more an issue with mediaflow-proxy than with the extractor itself, since the same links work fine with Stremio-Server.
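Validating the input against the documented siteurl/iframe/id shape up front gives a clear error instead of a failure deep inside extraction. A hedged sketch, assuming "iframe" is always the second-to-last path segment and the id is the last one; split_iframe_url is a hypothetical name.

```python
from urllib.parse import urlparse


class ExtractorError(Exception):
    """Stand-in for the project's ExtractorError."""


def split_iframe_url(url: str) -> tuple[str, str]:
    """Split 'siteurl/iframe/id' into (site base URL, id); query strings are ignored."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    if not parsed.scheme or len(segments) < 2 or segments[-2] != "iframe":
        raise ExtractorError(f"Expected a URL of the form siteurl/iframe/id, got {url!r}")
    return f"{parsed.scheme}://{parsed.netloc}", segments[-1]


print(split_iframe_url("https://streamingcommunity.example/iframe/12345"))
# ('https://streamingcommunity.example', '12345')
```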
Summary by CodeRabbit
New Features
- Added VixCloudExtractor for extracting URLs from VixCloud.
- Extended the host options in URL extraction to include "VixCloud".