All Twitter scrapes are failing: blocked (404) #996

Open
JustAnotherArchivist opened this issue Jun 30, 2023 · 157 comments
Labels
bug Something isn't working module:twitter upstream

Comments

@JustAnotherArchivist
Owner

With the exception of twitter-trends, all Twitter scrapes are failing since sometime in the past hour. This is likely connected to Twitter as a whole getting locked behind a login wall since earlier today. There is no known workaround at this time, and it's not known whether this will be fixable.

@yeahjack

So sad :-(
My research project relies heavily on this library, and I want to pay tribute to your effort in maintaining it.

@viktorzen

Twitter disabled its public website today (2023-06-30) and now requires users to log in; it was publicly accessible before this date. Would it be possible to automate the login as well, by providing a username and password to snscrape, i.e. logging in to Twitter before calling the GraphQL API so that requests run in a simulated logged-in session?

@yeahjack

yeahjack commented Jun 30, 2023

I don't think the developer would do this, as he has said that authentication will never be added as a feature: see #270.
Let's see what solution our great developers come up with; I hope it won't take long.

@enzoferey

This comment was marked as off-topic.

@midnightmagic

This comment was marked as off-topic.

@midnightmagic

Please consider deleting my prior off-topic comment.

Don't nuke this one as off-topic: A Twitter employee says it's temporary:

https://twitter.com/AqueelMiq/status/1674843555486134272
"this is a temporary restriction, we will re-enable logged out twitter access in the near future"

@Wouze

Wouze commented Jul 1, 2023

Elon talked about it too 💀
https://twitter.com/elonmusk/status/1674942336583757825

@JustAnotherArchivist JustAnotherArchivist changed the title All Twitter scrapes are failing All Twitter scrapes are failing: blocked (404) Jul 1, 2023
@khorg0sh

khorg0sh commented Jul 1, 2023

Elon talked about it too 💀 https://twitter.com/elonmusk/status/1674942336583757825

Musk referred to EXTREME scraping, which suggests that scrapers may no longer work after these changes. Let's see how this plays out.

@Benniepie

Benniepie commented Jul 1, 2023

Hello,

This may or may not help. Here's a route to access Tweets without logging in (contains further iframe to platform.twitter.com):
https://cdn.embedly.com/widgets/media.html?type=text%2Fhtml&key=a19fcc184b9711e1b4764040d3dc5c07&schema=twitter&url=https://twitter.com/elonmusk/status/1674865731136020505

Would combining this with a pre-existing list of tweets allow data scraping to continue? Alternatively, users could build the tweet list using Google search, e.g. for Tesla tweets: "site:twitter.com/tesla/status", or via another cached list (e.g. the Wayback Machine: https://web.archive.org/web/*/https://twitter.com/tesla/status*)

If I'm off the mark, I apologise but thought I'd pass this on, on the off chance it may help at least as a temporary measure.

Just a note to @JustAnotherArchivist - thank you for the hard work you have put into this library - it is very much appreciated

Ben
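
A minimal sketch of the idea in this comment, assuming you already have a list of tweet URLs (from a search engine or the Wayback Machine): it extracts the tweet ID from each URL and builds the matching Embedly widget URL. The function names are hypothetical; the widget address, `key`, and `schema` values are copied from the link in the comment and may stop working at any time.

```python
import re
import urllib.parse

# Widget address and key copied from the URL in the comment above.
EMBEDLY_WIDGET = "https://cdn.embedly.com/widgets/media.html"
EMBEDLY_KEY = "a19fcc184b9711e1b4764040d3dc5c07"

# Matches ".../twitter.com/<user>/status/<id>", also inside archive URLs.
STATUS_RE = re.compile(r"twitter\.com/([^/?#]+)/status/(\d+)")


def tweet_id_from_url(url):
    """Return the numeric tweet ID contained in a status URL, or None."""
    match = STATUS_RE.search(url)
    return match.group(2) if match else None


def embedly_url(tweet_url):
    """Build the Embedly widget URL for one tweet URL (hypothetical helper)."""
    query = urllib.parse.urlencode({
        "type": "text/html",
        "key": EMBEDLY_KEY,
        "schema": "twitter",
        "url": tweet_url,
    })
    return f"{EMBEDLY_WIDGET}?{query}"


if __name__ == "__main__":
    for url in ["https://twitter.com/elonmusk/status/1674865731136020505"]:
        print(tweet_id_from_url(url), embedly_url(url))
```

Unlike the hand-written link, `urlencode` percent-encodes the tweet URL inside the `url` parameter; the decoded value is the same.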

@arfathyahiya

URL: https://cdn.syndication.twimg.com/tweet-result

CODE:

import requests

# Syndication endpoint used by embedded tweets
url = "https://cdn.syndication.twimg.com/tweet-result"

# "id" is the numeric ID of the tweet to fetch
querystring = {"id": "1652193613223436289", "lang": "en"}

# Headers mimic a Firefox request made from the platform.twitter.com embed iframe
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    # "br" is left out: requests only decodes Brotli when a brotli package is installed
    "Accept-Encoding": "gzip, deflate",
    "Origin": "https://platform.twitter.com",
    "Connection": "keep-alive",
    "Referer": "https://platform.twitter.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "cross-site",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
    "TE": "trailers"
}

# A GET request needs no body, so the empty payload is dropped
response = requests.get(url, headers=headers, params=querystring, timeout=30)
response.raise_for_status()

print(response.text)

Generated by Insomnia
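
To apply the request above to a whole list of tweet IDs, something along these lines might work. This is only a sketch: it uses the standard library instead of `requests`, the helper names and the one-second pause are assumptions rather than documented limits, and it assumes the endpoint answers with JSON for just the `id` and `lang` parameters shown above.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterable, Iterator

# Endpoint taken from the comment above.
ENDPOINT = "https://cdn.syndication.twimg.com/tweet-result"


def tweet_result_url(tweet_id: str, lang: str = "en") -> str:
    """Build the request URL with the same two parameters as the example."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"id": tweet_id, "lang": lang})


def http_get(url: str) -> bytes:
    """Default fetcher: a plain GET with a browser-like User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def fetch_tweets(
    tweet_ids: Iterable[str],
    fetch: Callable[[str], bytes] = http_get,
    delay: float = 1.0,
) -> Iterator[dict]:
    """Yield the parsed JSON body for each tweet ID, pausing between requests."""
    for index, tweet_id in enumerate(tweet_ids):
        if index:
            time.sleep(delay)  # hypothetical politeness delay
        yield json.loads(fetch(tweet_result_url(tweet_id)))


if __name__ == "__main__":
    for tweet in fetch_tweets(["1652193613223436289"]):
        print(tweet)
```

The `fetch` argument exists so the loop can be tried with a stub function before sending real requests.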

@Write

Write commented Jul 1, 2023

https://twitter.com/elonmusk/status/1675187969420828672

😂

@ElonMusk
To address extreme levels of data scraping & system manipulation, we’ve applied the following temporary limits:

  • Verified accounts are limited to reading 6000 posts/day
  • Unverified accounts to 600 posts/day
  • New unverified accounts to 300/day

@Fa5g

Fa5g commented Jul 2, 2023

Scraping still seems to be possible. See these:

https://rss-bridge.org/bridge01/?action=display&bridge=TwitterBridge&context=By+username&u=elonmusk&format=html

https://rss-bridge.org/bridge01/?action=display&bridge=TwitterBridge&context=By+username&u=elonmusk&format=json

By https://github.com/RSS-Bridge/rss-bridge
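
The two links differ only in the `format` parameter, so the same URL can be built for any username. A small sketch, assuming the public instance and query parameters shown in the links above stay as they are (the function name is hypothetical):

```python
import urllib.parse

# Public RSS-Bridge instance taken from the links above.
BRIDGE_BASE = "https://rss-bridge.org/bridge01/"


def twitter_bridge_url(username: str, fmt: str = "json") -> str:
    """Build the TwitterBridge URL for a username; fmt is "html" or "json" per the links above."""
    params = {
        "action": "display",
        "bridge": "TwitterBridge",
        "context": "By username",
        "u": username,
        "format": fmt,
    }
    return BRIDGE_BASE + "?" + urllib.parse.urlencode(params)


if __name__ == "__main__":
    print(twitter_bridge_url("elonmusk", "json"))
```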

@Nik-Kras

Nik-Kras commented Sep 7, 2023

Hi @arfathyahiya, with the script from your comment (#996 (comment)) and a working token, how far back can the tweets go?

I used Script A below to get the guest token and then applied that token in Script B to get tweets, but it doesn't work. (If you get the error "Failed to fetch guest account, is your IP rate limited or so?", turning on or changing a VPN helped me.) I assume this used to work, so I'm leaving this comment to update you on the situation.

If it has changed again, please let me know.

Script A:

#!/usr/bin/env python3
import json
import sys

import requests

# Bearer token and client headers as sent by the Twitter Android app
BEARER = "Bearer AAAAAAAAAAAAAAAAAAAAAFXzAwAAAAAAMHCxpeSDG1gLNLghVe8d74hl6k4%3DRUMF4xAQLsbeBhTSRrCiQpJtxoGWeyHrDb5te2jpGskWDFW82F"
ONBOARDING_URL = "https://api.twitter.com/1.1/onboarding/task.json"

# Subtask versions the client declares support for (identical in both onboarding requests)
SUBTASK_VERSIONS = {
    "generic_urt": 3,
    "standard": 1,
    "open_home_timeline": 1,
    "app_locale_update": 1,
    "enter_date": 1,
    "email_verification": 3,
    "enter_password": 5,
    "enter_text": 5,
    "one_tap": 2,
    "cta": 7,
    "single_sign_on": 1,
    "fetch_persisted_data": 1,
    "enter_username": 3,
    "web_modal": 2,
    "fetch_temporary_password": 1,
    "menu_dialog": 1,
    "sign_up_review": 5,
    "interest_picker": 4,
    "user_recommendations_urt": 3,
    "in_app_notification": 1,
    "sign_up": 2,
    "typeahead_search": 1,
    "user_recommendations_list": 4,
    "cta_inline": 1,
    "contacts_live_sync_permission_prompt": 3,
    "choice_selection": 5,
    "js_instrumentation": 1,
    "alert_dialog_suppress_client_events": 1,
    "privacy_options": 1,
    "topics_selector": 1,
    "wait_spinner": 3,
    "tweet_selection_urt": 1,
    "end_flow": 1,
    "settings_list": 7,
    "open_external_link": 1,
    "phone_verification": 5,
    "security_key": 3,
    "select_banner": 2,
    "upload_media": 1,
    "web": 2,
    "alert_dialog": 1,
    "open_account": 2,
    "action_list": 2,
    "enter_phone": 2,
    "open_link": 1,
    "show_code": 1,
    "update_users": 1,
    "check_logged_in_account": 1,
    "enter_email": 2,
    "select_avatar": 4,
    "location_permission_prompt": 2,
    "notifications_permission_prompt": 4,
}

with requests.Session() as session:
    # Step 1: activate a guest token
    guest_token = session.post("https://api.twitter.com/1.1/guest/activate.json", headers={
        "Authorization": BEARER,
    }).json()["guest_token"]

    # Headers shared by both onboarding requests
    headers = {
        "Authorization": BEARER,
        "Content-Type": "application/json",
        "User-Agent": "TwitterAndroid/9.95.0-release.0 (29950000-r-0) ONEPLUS+A3010/9 (OnePlus;ONEPLUS+A3010;OnePlus;OnePlus3;0;;1;2016)",
        "X-Twitter-API-Version": "5",
        "X-Twitter-Client": "TwitterAndroid",
        "X-Twitter-Client-Version": "9.95.0-release.0",
        "OS-Version": "28",
        "System-User-Agent": "Dalvik/2.1.0 (Linux; U; Android 9; ONEPLUS A3010 Build/PKQ1.181203.001)",
        "X-Twitter-Active-User": "yes",
        "X-Guest-Token": guest_token,
    }

    # Step 2: start the "welcome" onboarding flow to obtain a flow token
    flow_token_resp = session.post(ONBOARDING_URL + "?flow_name=welcome&api_version=1&known_device_token=&sim_country_code=us", headers=headers, data=json.dumps({
        "flow_token": None,
        "input_flow_data": {
            "country_code": None,
            "flow_context": {
                "start_location": {
                    "location": "splash_screen"
                }
            },
            "requested_variant": None,
            "target_user_id": 0
        },
        "subtask_versions": SUBTASK_VERSIONS,
    }))

    flow_token = flow_token_resp.json()["flow_token"]

    # Step 3: answer the "NextTaskOpenLink" subtask, which opens a guest account
    resp = session.post(ONBOARDING_URL, headers=headers, data=json.dumps({
        "flow_token": flow_token,
        "subtask_inputs": [
            {
                "open_link": {
                    "link": "next_link",
                },
                "subtask_id": "NextTaskOpenLink",
            }
        ],
        "subtask_versions": SUBTASK_VERSIONS,
    }))

    try:
        subtasks = resp.json()["subtasks"]
        # Despite the name, these are the user IDs of the opened guest accounts
        tokens = [json.dumps(subtask["open_account"]["user"]["id"]) for subtask in subtasks]
        print(json.dumps(subtasks[0]["open_account"]))
    except KeyError:
        print("Failed to fetch guest account, is your IP rate limited or so?", file=sys.stderr)
        sys.exit(1)

print("Tokens: ", tokens)

Script B:

import requests

url = "https://cdn.syndication.twimg.com/tweet-result"
select_token = 0  # index into the `tokens` list produced by Script A

search_keywords = "How much is the fish?"
params = {
    # NOTE: this passes a guest-account user ID from Script A, whereas the
    # earlier example in this thread passes a tweet ID as "id"
    "id": tokens[select_token],
    "lang": "en",
    "keywords": search_keywords,  # not a parameter used in the earlier example
}

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    # "br" is left out: requests only decodes Brotli when a brotli package is installed
    "Accept-Encoding": "gzip, deflate",
    "Origin": "https://platform.twitter.com",
    "Connection": "keep-alive",
    "Referer": "https://platform.twitter.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "cross-site",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
    "TE": "trailers"
}

# A GET request needs no body, so the empty payload is dropped
response = requests.get(url, headers=headers, params=params, timeout=30)

print(response.text)

@doveppp

doveppp commented Dec 16, 2023

Now, individual tweets can be viewed without logging in, but I tried TwitterTweetScraper and it still doesn't work.

@JFVefour

Hi, how did you get this information?

@doveppp

doveppp commented Dec 16, 2023

Hi, how did you get this information?

No specific notification, I just opened a tweet while not logged in.

@yeahjack

Confirmed: both tweets and user profiles can now be viewed without logging in. Maybe that's a good start.

@Demmenie

Demmenie commented Jul 5, 2024

Vercel's react-tweet now has a bit of a workaround. They figured out that you can use the Twitter embed API to get data from any tweet. Usually, you'd need a special token to get any data but they reverse engineered the token and you can generate it yourself using the tweet id.

The API is at this URL: 'https://cdn.syndication.twimg.com/tweet-result'

and the token generator looks like this:

function getToken(id: string) {
  return ((Number(id) / 1e15) * Math.PI)
    .toString(6 ** 2)
    .replace(/(0+|\.)/g, '')
}

Source: https://github.com/vercel/react-tweet/blob/main/packages/react-tweet/src/api/fetch-tweet.ts
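
For Python users, a rough port of the token generator might look like the sketch below. Python has no built-in equivalent of JavaScript's `Number.prototype.toString(36)` for non-integers, so the fractional base-36 digits are generated by hand here; the trailing digits may differ from what a JavaScript engine prints, so compare against the TypeScript function before relying on it. The helper names and the `token` query parameter name are assumptions based on the linked source file, and `math.nextafter` needs Python 3.9 or later.

```python
import math
import re
import urllib.parse

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"
ENDPOINT = "https://cdn.syndication.twimg.com/tweet-result"


def to_base36(value: float) -> str:
    """Format a non-negative finite float in base 36, fractional digits included."""
    integer = math.floor(value)
    fraction = value - integer
    # Half the gap to the next float: digits below this carry no information.
    delta = max(0.5 * (math.nextafter(value, math.inf) - value),
                math.nextafter(0.0, 1.0))
    frac = []
    if fraction >= delta:
        while True:
            fraction *= 36
            delta *= 36
            digit = int(fraction)
            frac.append(digit)
            fraction -= digit
            round_up = fraction > 0.5 or (fraction == 0.5 and digit % 2 == 1)
            if round_up and fraction + delta > 1:
                # Round the last digit up, carrying through trailing 'z' digits.
                while frac and frac[-1] == 35:
                    frac.pop()
                if frac:
                    frac[-1] += 1
                else:
                    integer += 1
                break
            if fraction < delta:
                break
    int_digits = ""
    while True:
        integer, remainder = divmod(integer, 36)
        int_digits = DIGITS[remainder] + int_digits
        if integer == 0:
            break
    if frac:
        return int_digits + "." + "".join(DIGITS[d] for d in frac)
    return int_digits


def get_token(tweet_id: str) -> str:
    """Mirror the arithmetic of getToken: scale by pi, print in base 36, drop zeros and dots."""
    value = (float(tweet_id) / 1e15) * math.pi
    return re.sub(r"(0+|\.)", "", to_base36(value))


def tweet_result_url(tweet_id: str) -> str:
    """Build the request URL; the "token" parameter name is an assumption."""
    query = {"id": tweet_id, "lang": "en", "token": get_token(tweet_id)}
    return ENDPOINT + "?" + urllib.parse.urlencode(query)


if __name__ == "__main__":
    print(tweet_result_url("1652193613223436289"))
```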

@MathiasExorde
Copy link

MathiasExorde commented Jul 16, 2024

Hi everyone, I know this will sound like an ad.

I used this library for a while, and waited to see whether the community would find a fix.
Apparently it's now impossible for individuals to get tweets.
I represent Exorde network (exordelabs . com), and we collect 6 million tweets a day, out of 10 million posts a day.
That's billions a year. We do it in real time and at large scale, across 8000+ sources: 300k articles daily, forums, blogs, etc.

We have an Insight API for aggregated metrics and a Fullstream API that outputs the entire annotated feed. Just reach out for a trial and access. We are willing to support researchers and OSINT efforts; we have an API and can provide raw archives.
As far as we know, we're the only option for humble researchers / OSINT experts.

Just reach out on [email protected] or visit developers.exorde.io
