forked from ooni/ooni-sync
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
56 lines (48 loc) · 2.73 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
ooni-sync now supports downloading OONI reports from the AWS S3 bucket
[ooni-data](https://registry.opendata.aws/ooni/). Please use the option
`-s3-direct` to download OONI reports from the AWS S3 bucket.
ATTENTION: OONI now offers a full database to quicky access all the data
and the use of the OONI Sync tool is no longer recommended. For all the
options available to access OONI data see: https://ooni.org/data/.
Fast downloader of OONI reports using the OONI API. It works by
downloading an index of available files, and only downloading the files
that are not already present locally. You can run it again and again to
keep a local directory up to date with newly published reports.
Example usage:
ooni-sync -s3-direct -xz -directory reports test_name=tcp_connect
This command will create the directory "reports" if it doesn't exist,
download all tcp_connect reports that are not already present in the
directory, and compress the downloaded reports with xz.
The query is composed of name=value pairs. Possible query parameters
include:
test_name=[name]
probe_cc=[cc]
probe_asn=AS[num]
since=[yyyy-mm-dd]
until=[yyyy-mm-dd]
The value of "test_name" can be "tcp_connect", "web_connectivity",
"ndt", etc. The query parameters "order_by", "order", "offset", and
"limit" are used internally by the program and will be overridden if
present. More documentation on the API is available from
https://api.ooni.io/api/.
Because the program only uses filename comparison to know whether a
report file has already been downloaded, it is careful not to store a
file under its final filename until it has been completely downloaded.
It is safe to interrupt the program or run two copies simultaneously.
To make your Python scripts capable of processing xz-compressed files,
just call this function intead of open(filename):
def open_magic(filename):
if filename.endswith(".xz"):
p = subprocess.Popen(["xz", "-d", "-c", "--", filename], stdout=subprocess.PIPE)
return p.stdout
else:
return open(filename)
Tip: to generate a URL that links to a specific measurement OONI
Explorer, use this jq syntax:
"https://explorer.ooni.torproject.org/measurement/\(.report_id|@uri)?input=\(.input|join(":")|@uri)"
For example, here is how to generate a table from a directory full of
reports:
cat reports/*.json | jq -r '[.measurement_start_time,.probe_cc,.probe_asn,.test_name,"https://explorer.ooni.torproject.org/measurement/\(.report_id|@uri)?input=\(.input|join(":")|@uri)"]|@tsv'
Or, if you used the -xz option:
xz -dc reports/*.json.xz | jq -r '[.measurement_start_time,.probe_cc,.probe_asn,.test_name,"https://explorer.ooni.torproject.org/measurement/\(.report_id|@uri)?input=\(.input|join(":")|@uri)"]|@tsv'
Contact: David Fifield <[email protected]>