Getting AttributeError when downloading pdf #14

Open
javierelpianista opened this issue Aug 9, 2021 · 10 comments

@javierelpianista

When I try to download an article using

scidownl -D <doi>

I get the following error:

File "/home/jgarcia/.local/lib/python3.9/site-packages/scidownl/scihub.py", line 112, in find_pdf_in_html
pdf_url = soup.find('iframe', {'id': 'pdf'}).attrs['src'].split('#')[0]

AttributeError: 'NoneType' object has no attribute 'attrs'

This didn't happen before. I am using Arch Linux, but I also tried it in a virtual machine running Linux Mint.
Accessing Sci-Hub manually and downloading the article works.

javierelpianista changed the title from "Not working anymore" to "Getting AttributeError when downloading pdf" on Aug 9, 2021
@grace-reed

grace-reed commented Aug 14, 2021

Hi Javier,
I am running into a very similar problem: scidownl does not work for me either, and I am using the same command as you (scidownl -D <doi>). I checked line 112, and the AttributeError is raised because Beautiful Soup's find returns None for the iframe lookup. Sci-Hub embeds papers in an iframe, and the private function below searches for that iframe and assigns it to the iframe variable. Sci-Hub may have renamed the iframe or replaced it with something else, so my remaining question is about the HTML that Sci-Hub now serves.

def _search_direct_url(self, identifier):
    """
    Sci-Hub embeds papers in an iframe. This function finds the actual
    source url which looks something like https://moscow.sci-hub.io/.../....pdf.
    """
    res = self.sess.get(self.base_url + identifier, verify=False)
    s = self._get_soup(res.content)
    iframe = s.find('iframe')
    if iframe:
        return iframe.get('src') if not iframe.get('src').startswith('//') \
            else 'http:' + iframe.get('src')

Grace

@fridrichmrtn

fridrichmrtn commented Sep 20, 2021

It appears that Sci-Hub no longer uses iframes. The PDF now sits in an embed element inside a div; see the example below.

<div id="article">
        <embed type="application/pdf" src="https://twin.sci-hub.se/6279/8a941ec16c0cd4c9ad1bf5ab29139335/ahmed2017.pdf#navpanes=0&amp;view=FitH" id="pdf">
</div>

PR with the quickfix below.

#16
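For anyone patching this locally: the src value in the markup above carries a viewer fragment (#navpanes=0&view=FitH), and the older function quoted earlier in the thread also had to handle protocol-relative values starting with //. A minimal sketch of cleaning both up with the standard library only is shown here. The helper name normalize_pdf_src and the https default scheme are assumptions for illustration, not part of scidownl.

```python
from urllib.parse import urldefrag


def normalize_pdf_src(src, default_scheme="https"):
    """Turn an embed/iframe src value into a plain downloadable URL.

    Hypothetical helper; default_scheme is an assumption.
    """
    # Drop the viewer fragment such as "#navpanes=0&view=FitH".
    url, _fragment = urldefrag(src)
    # Protocol-relative values ("//host/path.pdf") need an explicit scheme.
    if url.startswith("//"):
        url = f"{default_scheme}:{url}"
    return url


src = (
    "https://twin.sci-hub.se/6279/8a941ec16c0cd4c9ad1bf5ab29139335/"
    "ahmed2017.pdf#navpanes=0&view=FitH"
)
print(normalize_pdf_src(src))
```

This does the same job as .split('#')[0] in the failing line, plus the // case.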

@ddh101

ddh101 commented Oct 19, 2021

Simply replace
pdf_url = soup.find('iframe', {'id': 'pdf'}).attrs['src'].split('#')[0]
with
pdf_url = soup.find('embed', {'id': 'pdf'}).attrs['src'].split('#')[0]
That works for me.
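Since the page markup has already changed once, a slightly more tolerant lookup might save the next round of patching. The sketch below tries the embed tag first and then falls back to the old iframe, returning None instead of raising when neither is present. The function name find_pdf_url is hypothetical and this is not the code in scidownl or in the PR; it only assumes Beautiful Soup 4 is installed.

```python
from bs4 import BeautifulSoup


def find_pdf_url(html):
    """Return the PDF URL from a Sci-Hub-style page, or None if absent.

    Hypothetical helper, not scidownl's own find_pdf_in_html.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Try the newer <embed id="pdf"> first, then the older <iframe id="pdf">.
    for tag_name in ("embed", "iframe"):
        tag = soup.find(tag_name, {"id": "pdf"})
        if tag is not None and tag.get("src"):
            # Strip the viewer fragment, as the original line does.
            return tag["src"].split("#")[0]
    return None


page = (
    '<div id="article">'
    '<embed type="application/pdf" '
    'src="https://example.org/paper.pdf#navpanes=0&amp;view=FitH" id="pdf">'
    "</div>"
)
print(find_pdf_url(page))
```

The caller then has to check for None, which makes the "no PDF on this page" case explicit rather than an AttributeError deep inside the parser call.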

@grace-reed

grace-reed commented Oct 19, 2021 via email

@BaHole

BaHole commented Nov 13, 2021

simply replace pdf_url = soup.find('iframe', {'id': 'pdf'}).attrs['src'].split('#')[0] with pdf_url = soup.find('embed', {'id': 'pdf'}).attrs['src'].split('#')[0] works for me

Hi, I am using Python 3.9 and ran into the same problem you describe above. I followed your advice, but it still didn't work. @ddhecnu

@grace-reed

grace-reed commented Nov 13, 2021 via email

@PhelaPoscam

I still get the error for some articles, even after changing iframe to embed:

Traceback (most recent call last):
  File "\main.py", line 20, in <module>
    download(DOIs)
  File "\main.py", line 13, in download
    SciHub(doi, out).download(choose_scihub_url_index=1)
  File "\scihub.py", line 88, in download
    pdf = self.find_pdf_in_html(res.text)
  File "\scihub.py", line 112, in find_pdf_in_html
    pdf_url = soup.find('embed', {'id': 'pdf'}).attrs['src'].split('#')[0]
AttributeError: 'NoneType' object has no attribute 'attrs'

@fridrichmrtn

fridrichmrtn commented Jan 19, 2022

Great, what about a reproducible example? DOI maybe? I just randomly checked the sci-hub, and it seems fine. PDFs flourishing and resting in their embed lane.

@PhelaPoscam

PhelaPoscam commented Jan 19, 2022

My bad. The errors were occurring in articles not yet available on scihub. I hadn't realized that was the problem.
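For batch runs like the one in the traceback above, one DOI that Sci-Hub does not have will stop the whole loop. A rough sketch of a loop that records such failures and keeps going is given here. The names download_all and download_one are hypothetical; download_one stands in for whatever performs a single download (for example a wrapper around the SciHub(doi, out).download(...) call from the traceback), and the stub below exists only so the example runs without network access.

```python
def download_all(dois, download_one):
    """Try every DOI; return (succeeded, failed) instead of stopping early.

    download_one is a hypothetical callable that downloads a single DOI.
    """
    succeeded, failed = [], []
    for doi in dois:
        try:
            download_one(doi)
        except AttributeError as exc:
            # With the parsing line quoted in this thread, a page without
            # the PDF element surfaces as AttributeError on NoneType.
            failed.append((doi, str(exc)))
        else:
            succeeded.append(doi)
    return succeeded, failed


def fake_download(doi):
    """Stub for illustration: pretends DOIs ending in 'missing' have no PDF."""
    if doi.endswith("missing"):
        raise AttributeError("'NoneType' object has no attribute 'attrs'")


ok, bad = download_all(["10.0000/ok", "10.0000/missing"], fake_download)
print("downloaded:", ok)
print("not available:", [doi for doi, _ in bad])
```

Catching AttributeError is a blunt instrument and could hide unrelated bugs, so treat this as a stopgap until the parser reports a missing PDF explicitly.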

@fridrichmrtn

image
