Change to html5lib parser
This parser keeps the whitespace characters in `span` elements, which
fixes code block rendering.
2m committed Mar 29, 2022
1 parent 46c7707 commit 4b4c715
Showing 4 changed files with 138 additions and 67 deletions.
2 changes: 1 addition & 1 deletion loconotion/modules/notionparser.py
@@ -272,7 +272,7 @@ def parse_page(self, url: str):
 self.open_toggle_blocks(self.args["timeout"])

 # creates soup from the page to start parsing
-soup = BeautifulSoup(self.driver.page_source, "html.parser")
+soup = BeautifulSoup(self.driver.page_source, "html5lib")

 self.clean_up(soup)
 self.set_custom_meta_tags(url, soup)
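
To check how a parser treats whitespace inside `span` elements, a small comparison like the following may help. This is only a sketch: `span_texts` and `SAMPLE` are hypothetical names that are not part of loconotion, the markup is a made-up stand-in for Notion's code block output, and it assumes `beautifulsoup4` and `html5lib` are installed. Real page source may behave differently from this tiny sample.

```python
from typing import List

from bs4 import BeautifulSoup

# Hypothetical sample: a code block whose indentation and spacing
# live inside <span> elements.
SAMPLE = (
    "<pre><code>"
    "<span>    indented</span><span> </span><span>code</span>"
    "</code></pre>"
)


def span_texts(markup: str, parser: str) -> List[str]:
    """Return the text of every <span>, as seen by the given parser."""
    soup = BeautifulSoup(markup, parser)
    return [span.get_text() for span in soup.find_all("span")]


if __name__ == "__main__":
    # Print the span contents for both parsers so they can be compared.
    for parser in ("html.parser", "html5lib"):
        print(parser, span_texts(SAMPLE, parser))
```

Running the same comparison against a saved copy of `self.driver.page_source` would show whether the two parsers disagree on a given page.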
178 changes: 119 additions & 59 deletions poetry.lock

1 change: 1 addition & 0 deletions pyproject.toml
@@ -13,6 +13,7 @@ cssutils = "^1.0.2"
 requests = "^2.23.0"
 selenium = "^3.141.0"
 toml = "^0.10.1"
+html5lib = "^1.1"

 [tool.poetry.dev-dependencies]
 black = "^19.10b0"