Add check for uptake rate #326

Open
leplatrem opened this issue Feb 3, 2020 · 1 comment
Labels
remote-settings Related to Remote Settings checks

Comments

@leplatrem

Check that X% of clients get the update within Y minutes

Would require https://bugzilla.mozilla.org/show_bug.cgi?id=1612712

@leplatrem leplatrem added the remote-settings Related to Remote Settings checks label Feb 3, 2020
@leplatrem

leplatrem commented Feb 24, 2020

I started to work on this, using this query.

Since timestamps only move forward, we can assume that a client that received timestamp X also received every earlier timestamp X - N.
I was thinking of accumulating the number of received events by timestamp, and quickly hacked together something like this:

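    from datetime import datetime, timezone

    # `rows` is the result of the query, assumed ordered so that older
    # etags are encountered before newer ones.
    cumulated = {}

    for row in rows:
        total = row["total"]
        # The etag is a quoted millisecond timestamp; strip the quotes.
        etag = row["received_timestamp"][1:-1]
        if etag not in cumulated:
            cumulated[etag] = {
                "published": datetime.fromtimestamp(
                    int(etag) / 1000, tz=timezone.utc
                ).isoformat(),
                "start": row["min_timestamp"],
                "total": 0,
            }

        # Minutes between the first and the latest report for this etag.
        cumulated[etag]["duration"] = (
            datetime.fromisoformat(row["max_timestamp"])
            - datetime.fromisoformat(cumulated[etag]["start"])
        ).total_seconds() / 60

        # A client that received this etag also received every older one.
        for e in cumulated:
            if int(e) <= int(etag):
                cumulated[e]["total"] += total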

Which would give:

  "1582329230774": {
    "published": "2020-02-21T23:53:50.774000+00:00",
    "start": "2020-02-21T23:50:00",
    "total": 9677877,
    "duration": 1010.0
  },
  "1582416087079": {
    "published": "2020-02-23T00:01:27.079000+00:00",
    "start": "2020-02-23T00:00:00",
    "total": 7711960,
    "duration": 1000.0
  },
  "1582549284992": {
    "published": "2020-02-24T13:01:24.992000+00:00",
    "start": "2020-02-24T13:00:00",
    "total": 3140898,
    "duration": 220.0
  },
  ...
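
The accumulation can also be written so that it does not depend on the order of the rows. The following is only a sketch: it reuses the `received_timestamp` and `total` fields from the snippet above, while the function name `accumulate_by_etag` and the toy rows are made up for illustration. It sums the raw counts per etag, then walks from the newest etag to the oldest with a running sum.

```python
def accumulate_by_etag(rows):
    """Return {etag: number of reports for that etag or any newer one}."""
    # Sum the raw counts per etag first, so that row order does not matter.
    per_etag = {}
    for row in rows:
        etag = int(row["received_timestamp"].strip('"'))
        per_etag[etag] = per_etag.get(etag, 0) + row["total"]

    # Walk from newest to oldest: a report for a newer etag also counts
    # for every older one.
    cumulated = {}
    running = 0
    for etag in sorted(per_etag, reverse=True):
        running += per_etag[etag]
        cumulated[etag] = running
    return cumulated


# Toy rows (hypothetical values, deliberately out of order).
rows = [
    {"received_timestamp": '"300"', "total": 5},
    {"received_timestamp": '"100"', "total": 20},
    {"received_timestamp": '"200"', "total": 10},
    {"received_timestamp": '"100"', "total": 1},
]
result = accumulate_by_etag(rows)
print(result)
```

Like the snippet above, this counts reports rather than distinct clients.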

Some questions arose, so I thought sharing them here (most likely with myself) would be helpful:

  • how to properly pick the etag to be studied over the period? The closest one after the beginning of the period? The one with the highest uptake?
  • what better represents the global health of the pipeline? The percentage of uptake after X minutes, or the number of minutes to reach X percent?
  • how to handle the variations in connected clients during nights and weekends?
  • if accumulation is the way to go, how to treat the same client reporting several updates over the studied period of time?
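
To make the second question concrete, here is a rough sketch of both candidate metrics. It assumes a series of `(minutes since publication, cumulative clients updated)` pairs sorted by time, and a known `population` of expected clients. Neither exists in the query above; the function names and the numbers are hypothetical.

```python
def uptake_after(series, population, minutes):
    """Percentage of `population` that reported the update within `minutes`."""
    reached = 0
    for elapsed, cumulative in series:
        if elapsed <= minutes:
            reached = cumulative
    return 100.0 * reached / population


def minutes_to_reach(series, population, percentage):
    """Minutes elapsed when uptake first reaches `percentage`, or None."""
    for elapsed, cumulative in series:
        if 100.0 * cumulative / population >= percentage:
            return elapsed
    return None


# Hypothetical data: (minutes since publication, cumulative clients updated).
series = [(10, 100), (20, 400), (30, 700), (60, 900)]
population = 1000

print(uptake_after(series, population, 30))
print(minutes_to_reach(series, population, 80))
print(minutes_to_reach(series, population, 95))
```

One practical difference: the first metric always yields a value between 0 and 100, whereas the second has no answer when the target percentage is never reached during the studied period, so a check built on it would need to decide how to treat `None`.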
