[New API] Add Google Ngram Viewer API #51

Open
kagermanov27 opened this issue Mar 10, 2022 · 0 comments
Labels
status: freezer Something we don't want to work on yet

Comments

@kagermanov27

Old Canny Conversation:

Bill: The goal we're trying to hit: when did Google first index a term?
The before: and after: operators don't work for this: if a page was first indexed in 2000, it will show up for, e.g., "COVID-19" even though the term didn't exist in 2000 (only the page did).
We like this proxy:
https://books.google.com/ngrams
Useful, but of course the data is obfuscated in an SVG... if it's possible (or another way)...?

Ilya: Google Books Ngram Viewer has a JSON endpoint: https://books.google.com/ngrams/json. It accepts the same parameters as the viewer page and responds with an array of objects.
curl -s --compressed 'https://books.google.com/ngrams/json?content=Albert+Einstein%2CSherlock+Holmes%2CFrankenstein&year_start=1800&year_end=2022' | jq '.[] | keys'
[
  "ngram",
  "parent",
  "timeseries",
  "type"
]
[
  "ngram",
  "parent",
  "timeseries",
  "type"
]
[
  "ngram",
  "parent",
  "timeseries",
  "type"
]
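
For the original goal (finding roughly when a term first shows up), the same request can be scripted. The following is only a sketch under stated assumptions, not a supported client: it assumes each object's `timeseries` holds one value per year starting at `year_start`, which the thread does not confirm, and the helper names `build_url`, `first_year_above`, and `fetch_first_years` are made up for illustration. The endpoint is undocumented, so it may change or reject automated requests.

```python
import json
import urllib.parse
import urllib.request

# Undocumented endpoint mentioned in this thread; it may change without notice.
NGRAM_JSON_URL = "https://books.google.com/ngrams/json"


def build_url(terms, year_start, year_end):
    """Build the same query string as the curl example above."""
    query = urllib.parse.urlencode({
        "content": ",".join(terms),
        "year_start": year_start,
        "year_end": year_end,
    })
    return f"{NGRAM_JSON_URL}?{query}"


def first_year_above(timeseries, year_start, threshold=0.0):
    """Return the first year whose value exceeds threshold, or None.

    ASSUMPTION: timeseries[i] is the value for year_start + i.
    """
    for offset, value in enumerate(timeseries):
        if value > threshold:
            return year_start + offset
    return None


def fetch_first_years(terms, year_start, year_end, threshold=0.0):
    """Hypothetical helper: map each returned ngram to its first year above threshold."""
    with urllib.request.urlopen(build_url(terms, year_start, year_end)) as resp:
        rows = json.load(resp)
    return {
        row["ngram"]: first_year_above(row["timeseries"], year_start, threshold)
        for row in rows
    }
```

Calling `fetch_first_years(["Sherlock Holmes", "Frankenstein"], 1800, 2022)` would perform the network request. A small positive `threshold` is probably more useful than zero, since scanning errors in old books can produce stray early matches.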
Related research:

https://stackoverflow.com/a/67759742/1291371
https://jameshfisher.com/2018/11/25/google-ngram-api/

Bill, thank you for this feature request! We'll update this thread when we support Google Books Ngrams. Until then, you can use Google's undocumented API. Make sure you avoid getting blocked by Google.

Bill: Love it. Thanks so much!

Justin: Hi Bill Frischling, I inspected the HTML for https://books.google.com/ngrams and the element for one of the search items:
<text class="label hover clickable" aria-hidden="true" transform="translate(397.1,45)" x="3" dy=".1em" style="font-size: 15px; fill: rgb(211, 47, 47); opacity: 0.12; font-weight: normal;">Sherlock Holmes</text>
What measurable or static data did you want from the HTML that we could potentially scrape? If it's not in the HTML, then we won't be able to scrape it.

Bill: Understood. I'm still poking, and I was hoping the year and % could be extrapolated in some way, but it appears to be quite thoroughly obfuscated unless I'm reading it wrong. The mouseover data is what we are going for, but darned if I can figure out how to translate that from the SVG.
We are looking at a couple of code blocks we found that can translate the chart area and SVG points into a relative measurement (e.g. https://stackoverflow.com/questions/43727621/converting-svg-from-highcharts-data-into-data-points) just to see if it can be done (more on the 'damn you Google, we'll prove we can beat the obfuscation' than for any practical use on our end), but it def wouldn't be a straightforward extract from embedded attributes or JSON.
I was hoping I missed something in the code that might have expressly stated "1969" and "0.0000371656" to extract, but sounds like that's not the case.
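
The pixel-to-data translation Bill describes can be sketched as a linear mapping per axis. This is only an illustration, not the method from the linked Stack Overflow answer: it assumes both chart axes are linear, that two reference positions per axis with known values (for example, labeled ticks) can be read from the SVG, and that all coordinates are already in the same coordinate system. The names `make_axis_scale` and `points_to_data` are hypothetical.

```python
def make_axis_scale(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value.

    ASSUMPTION: the axis is linear, so two (pixel, value) reference
    points fully determine the mapping.
    """
    if pixel_a == pixel_b:
        raise ValueError("reference pixels must differ")
    slope = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        return value_a + (pixel - pixel_a) * slope

    return to_value


def points_to_data(points, x_scale, y_scale):
    """Convert (x_pixel, y_pixel) pairs to (year, frequency) pairs."""
    return [(round(x_scale(px)), y_scale(py)) for px, py in points]


# Illustrative reference points only; real values must be read from the chart.
# SVG y grows downward, so the larger pixel value maps to the smaller frequency.
x_scale = make_axis_scale(0, 1800, 400, 2000)
y_scale = make_axis_scale(300, 0.0, 0, 0.0003)
```

With these made-up reference points, `points_to_data([(200, 150)], x_scale, y_scale)` maps the pixel pair to the year 1900 and a frequency near the middle of the y range. The recovered values are only as precise as the rendered path, which is one more reason to prefer the JSON endpoint mentioned above.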

Ali: Hello Bill,
I hope you are doing well. If you can't do what you are looking to do with Google, I don't think you can do it with SerpApi. We support the operators, but I see that you have already tested them.
As for the second part, are you requesting the Google Books Ngrams page as a new API?

Bill: Yes... even just to pull basic data on term distribution over time. As part of our algos, we use proxies to try to figure out when a term first came into circulation in common language usage. Trends is great for that, but obvi limited in how far back it goes (we like that feature request, of course): back to the 1990s. Books NGrams rolls back to 1800, which for our purposes is just AWESOME.


Old Canny Link

@kagermanov27 kagermanov27 added the status: freezer Something we don't want to work on yet label Mar 23, 2022
@dimitryzub dimitryzub moved this to 🆕 Frozen in SerpApi Roadmap Sep 13, 2022