Contains code that processes M-Lab data and provides it in various formats for other use.

m-lab/stats-pipeline


Statistics Pipeline Service

This repository contains code that processes NDT data and provides aggregate metrics by day for standard global and some national geographies. The resulting aggregations are made available in JSON format for use by other applications.

The stats-pipeline service is written in Go, runs on GKE, and generates and updates daily aggregate statistics. Access is provided in public BigQuery tables and in per-year JSON formatted files hosted on GCS.
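Consuming the per-year JSON files can be sketched in Go. The record fields below (`date`, `download_MED`, `upload_MED`, `tests`) are illustrative assumptions for this sketch, not the pipeline's actual output schema.

```go
package main

// Sketch: decoding per-day aggregate rows from the per-year JSON files
// hosted on GCS. The struct fields are illustrative assumptions, not
// the pipeline's actual schema.

import (
	"encoding/json"
	"fmt"
)

type DailyStat struct {
	Date               string  `json:"date"`
	DownloadMedianMbps float64 `json:"download_MED"`
	UploadMedianMbps   float64 `json:"upload_MED"`
	Tests              int     `json:"tests"`
}

// decodeStats unmarshals a JSON array of per-day aggregate rows.
func decodeStats(raw []byte) ([]DailyStat, error) {
	var rows []DailyStat
	err := json.Unmarshal(raw, &rows)
	return rows, err
}

func main() {
	raw := []byte(`[{"date":"2021-01-01","download_MED":42.7,"upload_MED":9.3,"tests":1250}]`)
	rows, err := decodeStats(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %.1f Mbps down, %.1f Mbps up, %d tests\n",
		rows[0].Date, rows[0].DownloadMedianMbps, rows[0].UploadMedianMbps, rows[0].Tests)
}
```

The same struct tags work unchanged whether the rows come from a GCS download or a local file.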

Documentation Provided for the Statistics Pipeline Service

General Recommendations for All Aggregations of NDT data

In general, our recommendations for research aggregating NDT data are:

  • Don't oversimplify
  • Aggregate by ASN in addition to time/date and location
  • Be aware of, and illustrate, multimodal distributions
  • Use histograms and logarithmic scales
  • Take into account, and compensate for, client bias and population drift
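The histogram and log-scale recommendations above can be sketched in Go: bucketing measured speeds into base-10 logarithmic bins keeps a multimodal distribution visible where a single mean would hide it. The bin edges here are illustrative, not ones the pipeline uses.

```go
package main

// Sketch of the "histograms and logarithmic scales" recommendation:
// count NDT download speeds (Mbps) into decade-wide logarithmic bins.
// The bin edges are illustrative assumptions.

import (
	"fmt"
	"math"
)

// logBin maps a speed in Mbps to a decade bin index:
// bin 0 = [0.1, 1), bin 1 = [1, 10), bin 2 = [10, 100), bin 3 = [100, 1000).
// Values outside that range are clamped to the nearest bin.
func logBin(mbps float64) int {
	if mbps < 0.1 {
		return 0
	}
	b := int(math.Floor(math.Log10(mbps))) + 1
	if b < 0 {
		b = 0
	}
	if b > 3 {
		b = 3
	}
	return b
}

func main() {
	// A bimodal sample: one cluster of slow tests, one of fast tests.
	speeds := []float64{0.5, 3.2, 7.8, 45.0, 52.1, 480.0}
	hist := make([]int, 4)
	for _, s := range speeds {
		hist[logBin(s)]++
	}
	labels := []string{"0.1-1", "1-10", "10-100", "100-1000"}
	for i, c := range hist {
		fmt.Printf("%-9s Mbps: %d\n", labels[i], c)
	}
}
```

On a linear scale the 0.5 Mbps and 480 Mbps clusters would collapse into one skewed tail; the log bins separate them.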

Roadmap

Below we list additional features, methods, geographies, etc. which may be considered for future versioned releases of stats-pipeline.

Geographies

  • US Zip Codes, US Congressional Districts, Block Groups, Blocks

Output Formats

  • histogram_daily_stats.csv - The same data as the JSON, but in CSV format. Useful for importing into a spreadsheet.
  • histogram_daily_stats.sql - A SQL query that returns the same rows as the corresponding .json and .csv files. Useful for verifying the exported data against the source and for tweaking the query to suit different use cases.
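Since the CSV would carry the same rows as the JSON, the conversion is a straightforward pass through Go's encoding/csv. The column names in this sketch are hypothetical, not the pipeline's actual CSV header.

```go
package main

// Sketch: writing per-day histogram rows as CSV for spreadsheet import.
// Column names are illustrative assumptions, not the actual header of
// histogram_daily_stats.csv.

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// toCSV renders rows (header first) as an RFC 4180 CSV string.
func toCSV(rows [][]string) (string, error) {
	var b strings.Builder
	w := csv.NewWriter(&b)
	err := w.WriteAll(rows) // WriteAll flushes before returning
	return b.String(), err
}

func main() {
	rows := [][]string{
		{"date", "bucket_min", "bucket_max", "dl_samples"},
		{"2021-01-01", "10", "100", "312"},
		{"2021-01-01", "100", "1000", "57"},
	}
	out, err := toCSV(rows)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```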