Solution #371

Open: wants to merge 2 commits into master
2 changes: 1 addition & 1 deletion .gitignore
@@ -3,6 +3,6 @@
*.iml
.env
.DS_Store
venv/
.venv/

The .venv/ entry correctly ignores the virtual environment directory. Make sure it matches the directory name your project actually uses; if your virtual environment has a different name, update this entry accordingly.

.pytest_cache/
**__pycache__/
40 changes: 38 additions & 2 deletions app/parse.py
@@ -1,4 +1,11 @@
-from dataclasses import dataclass
+import csv
+from dataclasses import dataclass, fields, astuple
+
+import requests
+from bs4 import BeautifulSoup, Tag
+
+
+BASE_URL = "https://quotes.toscrape.com/"


 @dataclass
@@ -8,8 +15,37 @@ class Quote:
     tags: list[str]


+QUOTE_FIELDS = [field.name for field in fields(Quote)]
+
+
+def parse_single_quote(quote: Tag) -> Quote:
+    return Quote(
+        text=quote.select_one(".text").text,
+        author=quote.select_one(".author").text,
+        tags=[tag.text for tag in quote.select(".tag")],
+    )
+
+
+def parse_qutes(soup: Tag) -> list[Quote]:
+    quotes = [parse_single_quote(quote) for quote in soup.select(".quote")]

There is a typo in the function name parse_qutes: it should be parse_quotes, to reflect its purpose and keep naming consistent. Remember to update the call in main() as well.

+    while next := soup.select_one(".next > a"):
+        text = requests.get(BASE_URL + next["href"]).content

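The walrus operator := requires Python 3.8 or later. This module already uses built-in generic annotations such as list[str], which need Python 3.9 or later, so the walrus adds no further compatibility constraint. If older versions must be supported, assign the result once before the loop and again at the end of each iteration instead.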
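A minimal sketch of the same pagination loop written without the walrus operator, assuming the "Next" link's href is root-relative (such as /page/2/). In that case BASE_URL + next["href"] produces a double slash (https://quotes.toscrape.com//page/2/), whereas urllib.parse.urljoin resolves it cleanly. NEXT_HREF and page_urls are hypothetical stand-ins for the requests and BeautifulSoup calls so the loop runs offline; the loop variable is named href because next would shadow the built-in next().

```python
from typing import Dict, List, Optional
from urllib.parse import urljoin

BASE_URL = "https://quotes.toscrape.com/"

# Hypothetical stand-in for the fetched pages: maps each page URL to the href
# of its "Next" link (None on the last page), so no network access is needed.
NEXT_HREF: Dict[str, Optional[str]] = {
    "https://quotes.toscrape.com/": "/page/2/",
    "https://quotes.toscrape.com/page/2/": "/page/3/",
    "https://quotes.toscrape.com/page/3/": None,
}


def page_urls(start: str) -> List[str]:
    """Follow "Next" links from start, collecting every page URL in order."""
    urls = [start]
    # Without the walrus operator: look up the next link once before the loop...
    href = NEXT_HREF[start]
    while href:
        # urljoin resolves the root-relative href against the current page,
        # avoiding the double slash that plain string concatenation produces.
        url = urljoin(urls[-1], href)
        urls.append(url)
        # ...and again at the end of every iteration.
        href = NEXT_HREF[url]
    return urls
```

For example, page_urls(BASE_URL) returns the three URLs from NEXT_HREF in order. In the real parser, the lookup would be soup.select_one(".next > a") on the freshly parsed page.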

+        soup = BeautifulSoup(text, "html.parser")
+        quotes.extend(
+            [parse_single_quote(quote) for quote in soup.select(".quote")]
+        )
+    return quotes
+
+
 def main(output_csv_path: str) -> None:
-    pass
+    text = requests.get(BASE_URL).content
+    soup = BeautifulSoup(text, "html.parser")
+    quotes = parse_qutes(soup)
+
+    with open(output_csv_path, "w", encoding="utf-8", newline="") as f:
+        writer = csv.writer(f)
+        writer.writerow(QUOTE_FIELDS)
+        writer.writerows([astuple(quote) for quote in quotes])


 if __name__ == "__main__":
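To see what main() writes, here is a minimal offline sketch that runs the same fields, astuple and csv.writer calls against an in-memory buffer instead of a file. The Quote field order (text, author, tags) is an assumption based on the visible diff context, and quotes_to_csv is a hypothetical helper, not part of the PR.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields
from typing import List


@dataclass
class Quote:
    # Assumed field order; the diff only shows tags and uses text/author.
    text: str
    author: str
    tags: List[str]


# Header row derived from the dataclass, as in the PR.
QUOTE_FIELDS = [field.name for field in fields(Quote)]


def quotes_to_csv(quotes: List[Quote]) -> str:
    """Hypothetical helper: write quotes like main() does, but into a string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(QUOTE_FIELDS)
    # astuple turns each Quote into (text, author, tags); tags stays a list.
    writer.writerows([astuple(quote) for quote in quotes])
    return buffer.getvalue()
```

Because astuple keeps tags as a list, csv.writer stores the list's string form, so a tags cell reads like "['life', 'truth']" (quoted because it contains a comma), and rows end with csv's default \r\n terminator.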
17 changes: 17 additions & 0 deletions requirements.txt
@@ -1,6 +1,23 @@
attrs==24.3.0
beautifulsoup4==4.12.3
certifi==2024.12.14
charset-normalizer==3.4.1
colorama==0.4.6
flake8==5.0.4
flake8-annotations==2.9.1
flake8-quotes==3.3.1
flake8-variables-names==0.0.5
idna==3.10
iniconfig==2.0.0
mccabe==0.7.0
packaging==24.2
pep8-naming==0.13.2
pluggy==1.5.0
py==1.11.0
pycodestyle==2.9.1
pyflakes==2.5.0
pytest==7.1.3
requests==2.32.3
soupsieve==2.6
tomli==2.2.1
urllib3==2.3.0