Get badge without mirrors? #164

Open
shadiakiki1986 opened this issue Oct 8, 2019 · 9 comments
Labels
enhancement New feature or request

Comments

@shadiakiki1986

Hey there. Awesome project. Is it possible to get a badge from pepy without the mirrors? Because my project is still young, its mirror stats are much larger than the non-mirror ones, and I wouldn't want the badge on my README to be misleading.

References

https://pepy.tech/project/isitfit

https://pypistats.org/packages/isitfit

@psincraian
Owner

I don't plan to add this feature, but I may add a similar one that lists the source of downloads. If you look at your downloads in BigQuery, you can see the following results:

row  details_installer_name  downloads
1    Browser                 122
2    pip                     71
3    requests                96
4    null                    48
5    bandersnatch            2886

As you can see here, the bandersnatch mirror has 2866 downloads. That seems like quite a lot, but the mirror can be installed locally (link). So if I have a local mirror and I install your package from it, that download will not be taken into account.
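
The effect of the mirror on this project's total can be checked with a few lines of Python. This is only a minimal sketch that re-adds the counts from the table above. The `MIRROR_INSTALLERS` set and the `split_downloads` helper are illustrative assumptions, not how pepy actually computes its numbers.

```python
# Download counts per installer, copied from the BigQuery table above.
# The "null" row (installer unknown) is stored under the key None.
downloads_by_installer = {
    "Browser": 122,
    "pip": 71,
    "requests": 96,
    None: 48,
    "bandersnatch": 2886,
}

# Assumption: only bandersnatch is treated as a mirror in this sketch.
MIRROR_INSTALLERS = {"bandersnatch"}


def split_downloads(counts, mirrors=MIRROR_INSTALLERS):
    """Return (total, without_mirrors) for a mapping of installer -> count."""
    total = sum(counts.values())
    without_mirrors = sum(n for name, n in counts.items() if name not in mirrors)
    return total, without_mirrors


total, without_mirrors = split_downloads(downloads_by_installer)
mirror_share = (total - without_mirrors) / total
print(f"total={total} without_mirrors={without_mirrors} mirror_share={mirror_share:.1%}")
```

With the table's figures, the mirror accounts for 2886 of 3223 downloads, leaving 337 from everything else.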

@shadiakiki1986
Author

shadiakiki1986 commented Oct 9, 2019 via email

@shadiakiki1986
Author

shadiakiki1986 commented Oct 9, 2019 via email

@jewettaij

jewettaij commented Aug 11, 2020

For what it's worth, I can also chime in to confirm that my download stats on pepy.tech are much, much higher than they are (or were) on pypistats.org.

That's all I wanted to say. No need to reply to this post.

The remainder of this message is a discussion which is not directly relevant to this thread, but I wasn't sure if it was appropriate to start a new issue. Feel free to ignore.

Other possible reasons for high download counts

Estimating the number of users instead of download counts

Before I used PyPI, when my software was hosted on my own web page, the majority of downloads came from the same few IP addresses. (For example, I remember that one IP address downloaded my software over 10000 times. This was back when it was legal to keep track of visitor IP addresses.) Is it possible to use BigQuery to estimate the number of unique users (by discarding downloads from the same IP)? (Forgive me. I know nothing about BigQuery.)

Excluding downloads with unknown python versions

When I used pypistats.org, it was able to show which version of Python the users who downloaded my project were using (e.g. 2.7, 3.5, 3.7, etc.). This was interesting, but it's not essential. I only mention it here because it seemed that, even after excluding downloads from mirrors, the majority of downloads for my small project were from users whose Python version is "unknown" and whose OS is also "unknown". Are these downloads legitimate? Should we exclude them?

Thanks for creating this service.

@laurahanu

I also think it would be useful to have the option to choose the type of stats! Have there been any updates on this or are there any plans to add this in the future?

@psincraian
Owner

Hi @laurahanu, currently we are saving download stats without mirrors. Now we need to make changes to the API and to the frontend app :-)

@laurahanu

Hi @psincraian, thanks for the reply and good to hear! Looking forward!

@PMeira

PMeira commented Apr 10, 2021

currently we are saving download stats without mirrors. Now we need to make changes to the API and to the frontend app :-)

@psincraian I'm also looking forward to that, and thanks for PePy as a whole!

I just wanted to add some thoughts on this, hopefully not too off-topic. I know these are not trivial issues and I'm aware of the discussion on why PyPI doesn't include stats themselves. And I imagine these issues don't matter much for packages with a large number of downloads.

As you can see here, the bandersnatch mirror has 2866 downloads. That seems like quite a lot, but the mirror can be installed locally (link). So if I have a local mirror and I install your package from it, that download will not be taken into account.

Since the mirrors seem to download all files, they might greatly inflate the numbers for packages that have few users but ship binary wheels for various Python versions and platforms. I believe the total without mirrors will help a lot in those cases.

For instance, using BigQuery directly* a few weeks ago, one of my packages had:

  • around 26k downloads without mirrors
  • around 15k downloads with pip installer and details
  • more than 265k downloads total (286k from PePy)

(*=I was using the old downloads* table for this, not file_downloads)
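
To put those figures in proportion, here is a rough calculation. It is a sketch that assumes the rounded values quoted above and treats everything outside the "without mirrors" count as mirror traffic, which is an assumption rather than something the data confirms. The variable names and the `share` helper are illustrative.

```python
# Rounded figures from the list above (BigQuery, old downloads* tables).
total = 265_000            # "more than 265k", so this is a lower bound
without_mirrors = 26_000   # "around 26k"
pip_with_details = 15_000  # "around 15k"


def share(part, whole):
    """Fraction of `whole` represented by `part`."""
    return part / whole


# Assumption: anything not counted as "without mirrors" came from a mirror.
mirror_share = share(total - without_mirrors, total)
pip_share_of_total = share(pip_with_details, total)
pip_share_of_non_mirror = share(pip_with_details, without_mirrors)

print(f"mirrors: {mirror_share:.0%} of all downloads")
print(f"pip with details: {pip_share_of_total:.0%} of all, "
      f"{pip_share_of_non_mirror:.0%} of non-mirror downloads")
```

Under those assumptions, roughly nine in ten downloads would be attributable to mirrors, which is why a badge based on the total can look so different from one based on non-mirror downloads.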

@jewettaij mentioned:

the majority of downloads for my small project were from users whose python version is "unknown" and whose OS is also "unknown". Are these downloads legitimate? Should we exclude them?

Besides those, which usually reflect that the fields are null in the BigQuery table, I noticed some other weird things.
For example, I'm not sure how the "country_code" is populated in the BigQuery data, even when restricted to "pip" as the installer. For my niche package, I noticed from the data that country_code=US is disproportionately larger than everything else, so I wonder:

  • if the CDN infrastructure has any effect on that and possibly other fields
  • if the data from other countries is/was just lost more frequently
  • if maybe the current data is right(ish) and those download numbers are closer to the actual numbers

@laurahanu

Hi @psincraian, have there been any updates on the API or on the frontend side? If not, is there a timeline for when this would be included?
