IndexError of Instagram #520

Open
QihanWangCo opened this issue Jul 19, 2022 · 19 comments
Labels: bug (Something isn't working), module:instagram

Comments

@QihanWangCo

Hi, I want to use snscrape to collect Instagram data. My code is:

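import snscrape.modules.instagram as sninstagram
import pandas as pd

query = 'google'  # hashtag to scrape; change as needed
ins_s = []        # collected items (not used yet in this snippet)
limit = 10        # intended maximum number of items (not used yet)

# Print the attributes of the first item only, then stop.
for ins in sninstagram.InstagramHashtagScraper(query).get_items():
    print(vars(ins))
    break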

And I got this error:

jsonData = r.text.split('<script type="text/javascript">window._sharedData = ')[1].split(';</script>')[0] # May throw an IndexError if Instagram changes something again; we just let that bubble.
IndexError: list index out of range

How can I fix it?

@mettsal

mettsal commented Jul 19, 2022

Same issue here. I have had working code since 05/07/22 with basically the same structure as yours, and it ran fine. Until today, that is: now it breaks at the same line (I believe), during the .get_items() call in the for loop.

I am also adding another part of the traceback that may be related to the issue, around "_logger.warning(f'Page does not exist')":

[...]/Python310/lib/site-packages/snscrape/modules/instagram.py
    106 def get_items(self):
--> 107     r = self._initial_page()
    108     if r.status_code == 404:
    109         _logger.warning(f'Page does not exist')

@JustAnotherArchivist
Owner

As the comment there suggests, this is due to changes on Instagram's side. They recently overhauled their site a bit. The scraper needs to be adapted to those changes.
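
To make the failure mode concrete, here is a minimal sketch that reproduces it offline. The two marker strings are copied from the traceback above; the sample HTML strings and the helper name extract_shared_data are made up for illustration and are not part of snscrape.

```python
import json

START = '<script type="text/javascript">window._sharedData = '
END = ';</script>'


def extract_shared_data(html):
    """Return the parsed _sharedData object, or None if the marker is absent.

    Hypothetical helper for illustration only; snscrape itself lets the
    IndexError bubble up instead.
    """
    parts = html.split(START)
    if len(parts) < 2:
        # str.split returns a one-element list when the separator is missing,
        # so indexing [1] is what raises "list index out of range".
        return None
    return json.loads(parts[1].split(END)[0])


# Made-up sample pages: one in the old layout, one without the marker.
old_page = '<html>' + START + '{"entry_data": {}}' + END + '</html>'
new_page = '<html><script type="application/json">{}</script></html>'

old_result = extract_shared_data(old_page)
new_result = extract_shared_data(new_page)
```

This only shows why the marker's disappearance triggers the error. It does not recover any data from the new page layout.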

@JustAnotherArchivist added the bug and module:instagram labels on Jul 20, 2022
@QihanWangCo
Author

> As the comment there suggests, this is due to changes on Instagram's side. They recently overhauled their site a bit. The scraper needs to be adapted to those changes.

Thanks for your answer! Really looking forward to the adaptation!!

@kallewesterling

Any updates on this yet? Curious if we can help somehow!

@TheTechRobo
Contributor

TheTechRobo commented Sep 26, 2022

> Any updates on this yet? Curious if we can help somehow!

If you're a programmer, you could send a fix via the "pull requests" feature (or just by suggesting a fix!).

@kallewesterling

Yeah, I know how GitHub works — just wanted to know whether there is any active development happening elsewhere on this particular issue.

@barisulgen

Is this a dead repo now?

@JustAnotherArchivist
Owner

No, but there hasn't been anything worth saying.

This issue, along with any other Instagram or Facebook issues, is effectively blocked by their silly rate limits. They make development of the corresponding scrapers very annoying since rapid testing is very tricky. I haven't had time to look into possible workarounds to make that less unpleasant and less time-intensive. So for now, those scrapers are unfortunately poorly supported by me. I'll happily consider PRs though.

@purut18

purut18 commented Oct 23, 2022

Hey @JustAnotherArchivist, I'm trying to solve this issue. Can you share what we're looking for in the source code returned?

Is it a JSON link or plain JSON? Currently, there is no script with the type "text/javascript" returned by Instagram.

It would be great if you could share what was being stored in "jsonData" before this error started appearing. Thanks!
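
One way to check which script types a saved copy of the page contains is sketched below, using only the standard library. The class and function names and the sample HTML are made up for illustration; a real check would feed in the HTML actually returned by Instagram.

```python
from html.parser import HTMLParser


class ScriptTypeCollector(HTMLParser):
    """Collect the type attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.types = []

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            # attrs is a list of (name, value) pairs; type may be missing.
            self.types.append(dict(attrs).get('type'))


def script_types(html):
    """Return the list of script type attributes found in html."""
    collector = ScriptTypeCollector()
    collector.feed(html)
    collector.close()
    return collector.types


# Made-up sample page, not real Instagram output.
sample = (
    '<html><head>'
    '<script type="application/json">{"a": 1}</script>'
    '<script src="x.js"></script>'
    '</head></html>'
)
found = script_types(sample)
has_text_javascript = 'text/javascript' in found
```

If has_text_javascript is False for the real page, the old marker cannot match, and the split in the traceback will fail as reported.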

@JustAnotherArchivist
Owner

@purut18 I don't recall the exact format etc., but it was basically some context information (profile, hashtag, location, etc.) and the first page of posts, I believe.

@purut18

purut18 commented Oct 24, 2022

Well... nothing like that is being returned in the source code of Instagram now. (If someone else can confirm this, please?)

I think Instagram changed it or moved to dynamic rendering to prevent scraping :/

@0bmay

0bmay commented Jul 1, 2023

I am working on a fix for Instagram. So far, searching by user and by hashtag is working. Location will follow soon™️

@kallewesterling

In #1001?

@0bmay

0bmay commented Jul 5, 2023

For logged-out users, a location search always returns a single page of data, and there is a pretty strict rate limit on getting data from the platform. But data is returned, for now.

@feusagittaire

@0bmay I keep getting "IndexError: list index out of range" when running "for post in sns.InstagramHashtagScraper(query).get_items()".
How can I resolve this? ;/

@TheTechRobo
Contributor

@feusagittaire The pull request hasn't been merged to snscrape yet

@feusagittaire

> For logged-out users, a location search always returns a single page of data, and there is a pretty strict rate limit on getting data from the platform. But data is returned, for now.

Tysm for that! If I may ask, will it be implemented any time soon?

@TheTechRobo
Contributor

TheTechRobo commented Aug 8, 2023

@feusagittaire Until the pull request is merged, you should be able to do a pip install -U git+https://github.com/0bmay/snscrape@insta_fix to install their copy of snscrape.
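
Whichever copy is installed, it may help to cap the number of items and catch the IndexError so a script does not crash outright. The sketch below is one possible approach, not something snscrape provides: first_items and FakeScraper are hypothetical names, and FakeScraper only stands in for a real scraper so the example runs without network access.

```python
from itertools import islice


def first_items(scraper, limit=10):
    """Return up to `limit` items, or an empty list if parsing fails.

    `scraper` is any object with a get_items() generator method.
    """
    try:
        return list(islice(scraper.get_items(), limit))
    except IndexError:
        # This is the error reported in this issue when the page no
        # longer contains the marker the scraper splits on.
        return []


class FakeScraper:
    """Stand-in for a real scraper; used only to exercise first_items."""

    def __init__(self, fail=False):
        self.fail = fail

    def get_items(self):
        if self.fail:
            raise IndexError('list index out of range')
        yield from range(100)


ok = first_items(FakeScraper(), limit=3)
broken = first_items(FakeScraper(fail=True), limit=3)
```

With snscrape installed, the same helper could be called as first_items(sninstagram.InstagramHashtagScraper('google'), limit=10), using the import from the first post. An empty result then means either no posts or a parsing failure, so logging the exception may be preferable in real use.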

@feusagittaire

tysm for the tip!!

9 participants