Commit SHAs from p2c not indexed? #9
Comments
The commit also still exists on GitHub: liferay/liferay-portal@4ec29d1
4ec29d1bdde625673f844e2a44cc7d9095253b35 is a regular commit (not among the bad commits listed in woc.pm) but is indeed missing. The repo is updated regularly (and was updated for version S), but that specific commit was lost in the process, so hopefully it will be successfully extracted during the next collection.
@audrism I've also come across commits that exist but have no project associated with them. Are you interested in the list of commits?
Getting a list is easy:
for inConly:
What would be helpful is a script or audit process that tries to recover missing commits for inPonly. While the first is straightforward in case the git repo is still online and has not been compacted, the second is more tricky: use ghtorrent/SwHeritage?
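A minimal sketch of the audit idea, assuming that inPonly means commits referenced by p2c that have no stored commit object, and inConly means stored commits that no project claims. The function name and both input collections are hypothetical; in practice they would be filled from the WoC maps rather than from literals.

```python
from typing import Iterable, Set, Tuple


def split_missing(
    commits_in_p2c: Iterable[str], commits_in_store: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (in_p_only, in_c_only) for two collections of commit SHAs.

    in_p_only: SHAs a project map refers to but that are not stored.
    in_c_only: SHAs that are stored but that no project map refers to.
    """
    referenced = set(commits_in_p2c)
    stored = set(commits_in_store)
    return referenced - stored, stored - referenced


if __name__ == "__main__":
    # Toy SHAs for illustration only.
    in_p_only, in_c_only = split_missing(["a1", "b2", "c3"], ["b2", "c3", "d4"])
    print(sorted(in_p_only), sorted(in_c_only))
```

Under that reading, each SHA in the first set is a candidate for re-extraction from the repo, and each SHA in the second needs a project lookup.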
ghtorrent and SwHeritage might not cover the most recent commits. There is a way to search for it on GitHub... but API limits:
Note that the CI bot has already deleted the branch, but the commit still shows up in a PR: liferay/liferay-portal@4ec29d1
I guess you could also query
But it doesn't have the metadata for whether or not the commit belongs to the repo, vs getting this link from search:
SwHeritage returns no hits: https://archive.softwareheritage.org/browse/search/?q=4ec29d1bdde625673f844e2a44cc7d9095253b35&with_visit=true&with_content=true&search_metadata=true
So the search is affected by API limits? The URL does not appear to invoke the REST/GraphQL API.
Search has a limit of 30 requests/min with a token (https://docs.github.com/en/free-pro-team@latest/rest/reference/search#rate-limit). You can also query for when your rate limit expires: https://docs.github.com/en/free-pro-team@latest/rest/reference/rate-limit
I imagine the lookup to be 2 steps:
Where
The 30 requests/min is limiting, and the non-API endpoints are also rate limited (although I'm unsure what the limit is exactly). Your mileage may vary as well with getting useful results (the example works because the commit SHA was included somewhere in the pull request body?). e.g.
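The rate-limit handling discussed above can be sketched roughly as follows, using only the Python standard library. It assumes the REST commit-search endpoint with the `hash:` qualifier and the `/rate_limit` endpoint. The helper names and the `token` parameter are hypothetical, and this is not necessarily the lookup the comments had in mind.

```python
import json
import time
import urllib.parse
import urllib.request

API = "https://api.github.com"


def commit_search_url(sha: str) -> str:
    """Build a commit-search URL for one SHA using the `hash:` qualifier."""
    return f"{API}/search/commits?" + urllib.parse.urlencode({"q": f"hash:{sha}"})


def seconds_until_search_allowed(rate_limit: dict, now: float) -> float:
    """Given a decoded /rate_limit response, return how long to wait before searching."""
    search = rate_limit["resources"]["search"]
    if search["remaining"] > 0:
        return 0.0
    return max(0.0, search["reset"] - now)


def get_json(url: str, token: str) -> dict:
    """GET a GitHub API URL with a personal access token and decode the JSON body."""
    req = urllib.request.Request(url, headers={
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def repos_containing(sha: str, token: str) -> list:
    """Wait out the search quota if needed, then list repos the search reports for sha."""
    limits = get_json(f"{API}/rate_limit", token)
    time.sleep(seconds_until_search_allowed(limits, time.time()))
    result = get_json(commit_search_url(sha), token)
    return [item["repository"]["full_name"] for item in result.get("items", [])]
```

As far as I know, commit search only covers default branches, so a commit that lived only on a deleted CI branch may still come back with no hits.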
It seems like |
I use --mirror when cloning as it gets all the branches.
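One way to check whether a given SHA actually made it into such a mirror clone is `git cat-file -e`, which exits with status zero when the object exists. A minimal sketch, where the function name, the `repo_dir` path, and the injectable `run` parameter are hypothetical:

```python
import subprocess


def commit_in_mirror(repo_dir: str, sha: str, run=subprocess.run) -> bool:
    """Return True if `sha` names a commit object in the clone at repo_dir."""
    result = run(
        # "<sha>^{commit}" makes git verify that the object peels to a commit.
        ["git", "-C", repo_dir, "cat-file", "-e", f"{sha}^{{commit}}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

For example, `commit_in_mirror("liferay-portal.git", "4ec29d1bdde625673f844e2a44cc7d9095253b35")` would report whether the commit from this issue is present, assuming the mirror was cloned into that directory. A clone only receives objects reachable from some ref, so a commit whose branch was deleted may be absent even from a mirror.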
For example, I run
echo "liferay_liferay-portal" | ~/lookup/getValues -f p2c | grep 4ec29d1bdde625673f844e2a44cc7d9095253b35
which means that commit 4ec29d1bdde625673f844e2a44cc7d9095253b35 should exist. This is what happens when I run the following: