-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added lxml_html_clean and updated pdfplumber dependencies #143
Conversation
Thanks for this @koddas! Looks like some tests are failing for unrelated reasons, I'll take a look at those soon |
@GjjvdBurg No worries. Is there a way to run the tests locally? The readme isn't that informative on the matter :) |
The commit includes three tests and an update to the readme file.
Well, this sucks... I managed to to commit some changes I made to the wrong branch. It's totally fine if you reject the lastest commit, it was supposed to be a new PR after the current PR has been approved. |
Thanks for the changes @koddas! I'm happy to merge this with the addition of the DiVa provider, but the tests are currently failing because of formatting issues, and the |
paper2remarkable/providers/diva.py
Outdated
soup = bs4.BeautifulSoup(page, "html.parser") | ||
|
||
pdf_url = soup.find("meta", {"name": "citation_pdf_url"}) | ||
print(pdf_url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind removing this print
statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly!
tests/test_providers.py
Outdated
# Testing absolute URLs and sanitization of filenames | ||
prov = DiVA(upload=False, verbose=VERBOSE) | ||
url = "https://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1480467" | ||
exp = "Alhussein_-_Privacy_by_Design_Internet_of_Things_managing_privacy_2018.pdf" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When running the test I get Alhussein_-_Privacy_by_Design_Amp_Internet_of_Things_Managing_Privacy_2018.pdf
which causes it to fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, thanks!
Thanks @koddas, merged! |
The LXML project has separated its HTML cleaner into its own project, breaking the current build. This change updates the required packages list in setup.py to reflect that change. Also, for some reason, the build didn't finish as expected unless I updated the pdfplumber dependency to 0.11.