Add upload statistics to report #3615

dbutenhof · 2024-04-05T20:51:25Z

Present upload counts in various buckets, including "this year", "this month", "this week", "today" along with per year/month/day-of-month/day-of-week/hour-of-day just for fun.

Upload report:
  30,086 uploads this year (2024)
  62 uploads this month (April 2024)
  62 uploads this week (March 30 to April 06)
  9 uploads today (06 April 2024)
 Total uploads by year:
    2023:  100,497    2024:   30,086
 Total uploads by month of year:
    Jan:   17,084    Feb:   12,586    Mar:      354    Apr:       62
    Dec:  100,497
 Total uploads by day of month:
    01:       14    02:       68    03:       63    04:       11
    05:    3,848    06:    3,476    07:      758    08:   10,851
    09:    5,110    10:   21,787    11:    5,450    12:      192
    13:    8,351    14:   20,960    15:    3,714    16:    7,715
    17:    3,551    18:   10,381    19:       66    20:       47
    21:       96    22:    1,620    23:   11,200    24:   10,888
    25:       34    26:       58    27:       83    28:       50
    29:       51    30:       57    31:       33
 Total uploads by day of week:
    Mon:   19,395    Tue:      512    Wed:    8,590    Thu:   21,737
    Fri:   11,912    Sat:   32,570    Sun:   35,867
 Total uploads by hour of day:
    00:    7,747    01:    8,691    02:    6,794    03:    5,600
    04:    3,248    05:    4,591    06:    3,156    07:    3,051
    08:    2,998    09:    2,565    10:    1,635    11:    1,793
    12:    1,586    13:    3,337    14:    9,135    15:    5,347
    16:    5,740    17:    6,647    18:    9,018    19:    5,731
    20:    7,637    21:   10,244    22:    6,972    23:    7,320
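
For readers curious how buckets like these can be tallied, here is a minimal sketch. It is not the code from this PR: the function name `bucket_uploads`, the dictionary keys, and the choice to treat "this week" as the trailing seven days (suggested by the "March 30 to April 06" range above) are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_uploads(uploads: list[datetime], now: datetime) -> dict:
    """Tally timestamps into report-style buckets (hypothetical helper)."""
    # Midnight at the start of "today"
    day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Assumption: "this week" means the seven days leading up to today
    week = day - timedelta(days=7)

    stats = {
        "this_year": 0,
        "this_month": 0,
        "this_week": 0,
        "this_day": 0,
        "by_year": Counter(),
        "by_month": Counter(),       # 1..12, across all years
        "by_day": Counter(),         # 1..31, across all months
        "by_weekday": Counter(),     # 0 = Monday .. 6 = Sunday
        "by_hour": Counter(),        # 0..23, across all days
    }
    for ts in uploads:
        stats["by_year"][ts.year] += 1
        stats["by_month"][ts.month] += 1
        stats["by_day"][ts.day] += 1
        stats["by_weekday"][ts.weekday()] += 1
        stats["by_hour"][ts.hour] += 1
        if ts.year == now.year:
            stats["this_year"] += 1
            if ts.month == now.month:
                stats["this_month"] += 1
        if ts >= week:
            stats["this_week"] += 1
        if ts >= day:
            stats["this_day"] += 1
    return stats


sample = [
    datetime(2023, 12, 25, 14, 0),
    datetime(2024, 1, 10, 3, 0),
    datetime(2024, 4, 2, 21, 0),
    datetime(2024, 4, 6, 9, 30),
]
stats = bucket_uploads(sample, now=datetime(2024, 4, 6, 12, 0))
```

Note that the "by month", "by day", and "by hour" counters aggregate across the whole history (every January, every 1st of a month, every midnight hour), not the most recent period.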

webbnh

Looks great (although, the headers for the buckets could be more descriptive -- I fell into the trap of thinking that the "by day" buckets were for each of the last 31 days and that the "by hour" were for the last 24 hours, as opposed to, e.g., "the second hour of any day"...unfortunately, I don't have any good suggestions off the top of my head).

I do have a pointed question and a small 1+ for your consideration.

webbnh · 2024-04-05T21:58:36Z

lib/pbench/cli/server/report.py

+    click.echo(f"  {this_month:,d} uploads this month ({month:%B, %Y})")
+    click.echo(f"  {this_week:,d} uploads this week ({week:%B %d} to {day:%B %d})")
+    click.echo(f"  {this_day:,d} uploads today ({day:%Y-%m-%d})")


Are you sure you want to use %Y-%m-%d for "today" after using %B in the previous two lines? (And, do you really want the comma in %B, %Y?) I would opt for %d %B %Y.
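
To make the formats under discussion concrete, this small standalone sketch renders one date with each of them. The variable names are illustrative only; the format codes are standard `strftime` directives, which f-strings pass through to `datetime.__format__`. `%B` is locale-dependent, so the month names assume the default C/English locale.

```python
from datetime import datetime

day = datetime(2024, 4, 6)

iso_style = f"{day:%Y-%m-%d}"       # numeric form used for "today" in the diff
with_comma = f"{day:%B, %Y}"        # month form with the questioned comma
without_comma = f"{day:%B %Y}"      # month form without the comma
suggested = f"{day:%d %B %Y}"       # the form proposed in this comment

for label, text in [
    ("iso_style", iso_style),
    ("with_comma", with_comma),
    ("without_comma", without_comma),
    ("suggested", suggested),
]:
    print(f"{label:>14}: {text}")
```

The `without_comma` and `suggested` forms match the "April 2024" and "06 April 2024" strings in the sample output in the description.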

webbnh · 2024-04-05T22:05:39Z

lib/pbench/cli/server/report.py

+def columnize(items: dict[str, Any], width: int = 80):
+    line = ""
+    for item, count in sorted(items.items()):
+        if len(line) >= width:
+            click.echo(line)
+            line = ""
+        line += f"    {item:4d}: {count:>8,d}"


Since the code adds to the line in chunks of 18 characters (I think...), the test at line 314 could produce a line which is "too long" (i.e., 90 characters, by default). [I don't know why this doesn't happen in the same output in the description...did you specify a smaller width for that?]

I would recommend something like this:

    line = ""
    for item, count in sorted(items.items()):
        addition = f"    {item:4d}: {count:>8,d}"
        if len(line) + len(addition) >= width:
            click.echo(line)
            line = ""
        line += addition

or

    line = ""
    for item, count in sorted(items.items()):
        addition = f"    {item:4d}: {count:>8,d}"
        if len(line) + len(addition) >= width:
            click.echo(line)
            line = addition
        else:
            line += addition

dbutenhof

I realized after pushing the PR that I'd meant to also touch up the feedback (and add a "show" indexing state) on reindex after our ops review experience, so this is due another pass anyway this weekend... and I'd've maybe felt slightly bad about that if you'd approved without comments.

dbutenhof · 2024-04-06T12:10:41Z

lib/pbench/cli/server/report.py

+    click.echo(f"  {this_month:,d} uploads this month ({month:%B, %Y})")
+    click.echo(f"  {this_week:,d} uploads this week ({week:%B %d} to {day:%B %d})")
+    click.echo(f"  {this_day:,d} uploads today ({day:%Y-%m-%d})")


I did bounce back and forth on the exact format for several. Basically I was doing this on half the burners in between bouts of wrestling with Horreum and Kiota.

dbutenhof · 2024-04-06T12:12:10Z

lib/pbench/cli/server/report.py

+def columnize(items: dict[str, Any], width: int = 80):
+    line = ""
+    for item, count in sorted(items.items()):
+        if len(line) >= width:
+            click.echo(line)
+            line = ""
+        line += f"    {item:4d}: {count:>8,d}"


Yeah, I threw in the columnization at the last minute because the lists were rather long: you're right, it's not entirely accurate as is, but I didn't spend too many cycles worrying about it. However, I guess it's worth cleaning up.

(And, yeah, I think I generated the sample output with --width 70 to make sure GitHub didn't wrap it. 😆 )
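
The overflow discussed in this thread can be reproduced with a standalone sketch. This is an adaptation, not the PR's code: both functions return lines instead of calling `click.echo` so they can be inspected, the key is formatted with `>4s` on the assumption that keys are short strings such as "00" or "Jan", a final flush is added, and the `line and` guard in the second function is an extra so that an empty line is never emitted. The function names are hypothetical.

```python
def columnize_as_submitted(items: dict[str, int], width: int = 80) -> list[str]:
    """Flush only once the line has already reached the width."""
    lines: list[str] = []
    line = ""
    for item, count in sorted(items.items()):
        if len(line) >= width:
            lines.append(line)
            line = ""
        line += f"    {item:>4s}: {count:>8,d}"  # each chunk is 18 characters
    if line:
        lines.append(line)
    return lines


def columnize_checked(items: dict[str, int], width: int = 80) -> list[str]:
    """Flush before adding a chunk that would reach or pass the width."""
    lines: list[str] = []
    line = ""
    for item, count in sorted(items.items()):
        addition = f"    {item:>4s}: {count:>8,d}"
        if line and len(line) + len(addition) >= width:
            lines.append(line)
            line = ""
        line += addition
    if line:
        lines.append(line)
    return lines


hours = {f"{h:02d}": 1000 * h for h in range(24)}
longest_submitted_80 = max(len(l) for l in columnize_as_submitted(hours, 80))
longest_submitted_70 = max(len(l) for l in columnize_as_submitted(hours, 70))
longest_checked_80 = max(len(l) for l in columnize_checked(hours, 80))
```

With 18-character chunks, the submitted logic lets a line grow to 90 characters at the default width of 80, while a width of 70 happens to stop at 72 characters (four columns), which is consistent with the four-column sample output in the description. The checked variant stays at 72 characters for a width of 80.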

dbutenhof · 2024-04-07T18:10:01Z

On a whim, I split the dataset history statistics to be able to show by dataset.metalog.pbench.date rather than just by the primary internal dataset.uploaded:

Dataset statistics by creation date:
  1,637 this year (2024)
  71 this month (April 2024)
  71 this week (March 31 to April 07)
  9 today (07 April 2024)
 Total by year:
    2019:    8,055    2020:   42,817    2021:   21,328    2022:   32,523
    2023:   23,146    2024:    1,637
 Total by month of year:
    Jan:   12,659    Feb:    7,472    Mar:    7,208    Apr:    8,624
    May:    7,511    Jun:    6,927    Jul:    6,254    Aug:    8,777
    Sep:   11,910    Oct:   14,346    Nov:   11,988    Dec:   25,830
 Total by day of month:
    01:    4,623    02:    4,214    03:    4,634    04:    4,493
    05:    4,265    06:    3,295    07:    2,906    08:    3,357
    09:    3,891    10:    2,646    11:    2,960    12:    3,617
    13:    4,608    14:    3,638    15:    2,713    16:    4,505
    17:    3,556    18:    4,349    19:    3,014    20:    4,235
    21:    4,944    22:    4,912    23:    5,792    24:    5,629
    25:    5,175    26:    5,271    27:    4,660    28:    4,781
    29:    4,468    30:    5,208    31:    3,147
 Total by day of week:
    Mon:   19,896    Tue:   19,504    Wed:   24,110    Thu:   18,946
    Fri:   23,197    Sat:   13,790    Sun:   10,063
 Total by hour of day:
    00:    5,355    01:    4,537    02:    5,607    03:    5,975
    04:    6,499    05:    5,306    06:    3,844    07:    4,076
    08:    4,269    09:    5,357    10:    5,481    11:    6,141
    12:    6,411    13:    5,991    14:    5,391    15:    6,015
    16:    5,611    17:    5,531    18:    5,290    19:    5,664
    20:    5,793    21:    5,694    22:    4,665    23:    5,003

webbnh

Looks excellent.

Add upload statistics to report

43d961a

Present upload counts in various buckets, including "this year", "this month", "this week", "today" along with per year/month/day-of-month/hour just for fun.

webbnh assigned dbutenhof Apr 5, 2024

webbnh self-requested a review April 5, 2024 21:44

webbnh previously approved these changes Apr 5, 2024

dbutenhof commented Apr 6, 2024

Cleanup

350be3b

dbutenhof dismissed webbnh’s stale review via 350be3b April 6, 2024 13:09

dbutenhof added 2 commits April 6, 2024 09:25

touchup

062dbb9

One more tweak: report by creation date metadata

2699e94

Don't generate error without --statistics (oops)

4d0d276

webbnh approved these changes Apr 9, 2024

dbutenhof merged commit 9c898cb into distributed-system-analysis:main Apr 9, 2024
4 checks passed

dbutenhof deleted the uploads branch April 9, 2024 18:22
