GitHub - anadahz/ooni-sync: Fast downloader of measurements from the OONI measurements API

anadahz / ooni-sync Public

forked from ooni/ooni-sync

Notifications You must be signed in to change notification settings
Fork 0
Star 0

Fast downloader of measurements from the OONI measurements API

CC0-1.0 license

0 stars 5 forks Branches Tags Activity

Star

Notifications

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
.github/workflows		.github/workflows
.gitignore		.gitignore
.goreleaser.yml		.goreleaser.yml
LICENSE		LICENSE
Makefile		Makefile
README		README
ooni-sync.go		ooni-sync.go

Repository files navigation

ooni-sync now supports downloading OONI reports from the AWS S3 bucket
[ooni-data](https://registry.opendata.aws/ooni/). Please use the option
`-s3-direct` to download OONI reports from the AWS S3 bucket.

ATTENTION: OONI now offers a full database to quicky access all the data 
and the use of the OONI Sync tool is no longer recommended. For all the 
options available to access OONI data see: https://ooni.org/data/.

Fast downloader of OONI reports using the OONI API. It works by
downloading an index of available files, and only downloading the files
that are not already present locally. You can run it again and again to
keep a local directory up to date with newly published reports.

Example usage:
	ooni-sync -s3-direct -xz -directory reports test_name=tcp_connect
This command will create the directory "reports" if it doesn't exist,
download all tcp_connect reports that are not already present in the
directory, and compress the downloaded reports with xz.

The query is composed of name=value pairs. Possible query parameters
include:
	test_name=[name]
	probe_cc=[cc]
	probe_asn=AS[num]
	since=[yyyy-mm-dd]
	until=[yyyy-mm-dd]
The value of "test_name" can be "tcp_connect", "web_connectivity",
"ndt", etc. The query parameters "order_by", "order", "offset", and
"limit" are used internally by the program and will be overridden if
present. More documentation on the API is available from
https://api.ooni.io/api/.

Because the program only uses filename comparison to know whether a
report file has already been downloaded, it is careful not to store a
file under its final filename until it has been completely downloaded.
It is safe to interrupt the program or run two copies simultaneously.

To make your Python scripts capable of processing xz-compressed files,
just call this function intead of open(filename):
	def open_magic(filename):
	    if filename.endswith(".xz"):
	        p = subprocess.Popen(["xz", "-d", "-c", "--", filename], stdout=subprocess.PIPE)
	        return p.stdout
	    else:
	        return open(filename)

Tip: to generate a URL that links to a specific measurement OONI
Explorer, use this jq syntax:
	"https://explorer.ooni.torproject.org/measurement/\(.report_id|@uri)?input=\(.input|join(":")|@uri)"
For example, here is how to generate a table from a directory full of
reports:
	cat reports/*.json | jq -r '[.measurement_start_time,.probe_cc,.probe_asn,.test_name,"https://explorer.ooni.torproject.org/measurement/\(.report_id|@uri)?input=\(.input|join(":")|@uri)"]|@tsv'
Or, if you used the -xz option:
	xz -dc reports/*.json.xz | jq -r '[.measurement_start_time,.probe_cc,.probe_asn,.test_name,"https://explorer.ooni.torproject.org/measurement/\(.report_id|@uri)?input=\(.input|join(":")|@uri)"]|@tsv'

Contact: David Fifield <[email protected]>