Skip to content
forked from ooni/ooni-sync

Fast downloader of measurements from the OONI measurements API

License

Notifications You must be signed in to change notification settings

anadahz/ooni-sync

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ooni-sync now supports downloading OONI reports from the AWS S3 bucket
[ooni-data](https://registry.opendata.aws/ooni/). Please use the option
`-s3-direct` to download OONI reports from the AWS S3 bucket.

ATTENTION: OONI now offers a full database to quicky access all the data 
and the use of the OONI Sync tool is no longer recommended. For all the 
options available to access OONI data see: https://ooni.org/data/.

Fast downloader of OONI reports using the OONI API. It works by
downloading an index of available files, and only downloading the files
that are not already present locally. You can run it again and again to
keep a local directory up to date with newly published reports.

Example usage:
	ooni-sync -s3-direct -xz -directory reports test_name=tcp_connect
This command will create the directory "reports" if it doesn't exist,
download all tcp_connect reports that are not already present in the
directory, and compress the downloaded reports with xz.

The query is composed of name=value pairs. Possible query parameters
include:
	test_name=[name]
	probe_cc=[cc]
	probe_asn=AS[num]
	since=[yyyy-mm-dd]
	until=[yyyy-mm-dd]
The value of "test_name" can be "tcp_connect", "web_connectivity",
"ndt", etc. The query parameters "order_by", "order", "offset", and
"limit" are used internally by the program and will be overridden if
present. More documentation on the API is available from
https://api.ooni.io/api/.

Because the program only uses filename comparison to know whether a
report file has already been downloaded, it is careful not to store a
file under its final filename until it has been completely downloaded.
It is safe to interrupt the program or run two copies simultaneously.

To make your Python scripts capable of processing xz-compressed files,
just call this function intead of open(filename):
	def open_magic(filename):
	    if filename.endswith(".xz"):
	        p = subprocess.Popen(["xz", "-d", "-c", "--", filename], stdout=subprocess.PIPE)
	        return p.stdout
	    else:
	        return open(filename)

Tip: to generate a URL that links to a specific measurement OONI
Explorer, use this jq syntax:
	"https://explorer.ooni.torproject.org/measurement/\(.report_id|@uri)?input=\(.input|join(":")|@uri)"
For example, here is how to generate a table from a directory full of
reports:
	cat reports/*.json | jq -r '[.measurement_start_time,.probe_cc,.probe_asn,.test_name,"https://explorer.ooni.torproject.org/measurement/\(.report_id|@uri)?input=\(.input|join(":")|@uri)"]|@tsv'
Or, if you used the -xz option:
	xz -dc reports/*.json.xz | jq -r '[.measurement_start_time,.probe_cc,.probe_asn,.test_name,"https://explorer.ooni.torproject.org/measurement/\(.report_id|@uri)?input=\(.input|join(":")|@uri)"]|@tsv'

Contact: David Fifield <[email protected]>

About

Fast downloader of measurements from the OONI measurements API

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Go 98.6%
  • Makefile 1.4%