Transparency Report Aggregator

Overview

Our research tracks the impact of telecommunications firms on human rights. We need to be able to track how different governments around the world apply pressure on tech companies and telecommunication providers to hand over identifying user information or block content.

Major technology companies and telecommunications providers release 'transparency reports' on an annual, bi-annual, quarterly, or monthly basis. These transparency reports contain aggregate information about how many requests have been made to block data or hand over user information, and the proportion of those requests that have been complied with.

The reports are released in many different formats, including HTML, PDF, and CSV. There is no uniform way to compare across platforms -- or even necessarily on the same platform over time.

There are currently 68 companies releasing this data. A reasonably complete list of reports is maintained by Access Now at: https://www.accessnow.org/transparency-reporting-index/

This script is designed to scrape tables from CSV, PDF, or HTML reports and aggregate data from different companies into a single common format. The output format contains the following fields:

'report_start', 'report_end', 'platform', 'property', 'country', 'request_type', 'request_subtype', 'num_requests', 'num_accounts_specified', 'num_requests_complied', 'num_accounts_complied', 'agency', 'reason'
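
As a rough illustration of the common format, the sketch below writes one record with these fields to CSV using Python's standard `csv` module. The field names are copied from the list above. The `write_rows` helper and the sample values are hypothetical; they are not taken from the project's code or from any real report.

```python
import csv
import io

# Field names copied from the output format listed above.
FIELDS = [
    "report_start", "report_end", "platform", "property", "country",
    "request_type", "request_subtype", "num_requests",
    "num_accounts_specified", "num_requests_complied",
    "num_accounts_complied", "agency", "reason",
]


def write_rows(rows, handle):
    """Write dict rows to `handle` as CSV; missing fields become empty cells."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Made-up values for illustration only (not real report data).
sample = {
    "report_start": "2017-01-01",
    "report_end": "2017-06-30",
    "platform": "ExamplePlatform",
    "country": "Exampleland",
    "request_type": "requests for user data",
    "num_requests": 10,
    "num_requests_complied": 7,
}

buffer = io.StringIO()
write_rows([sample], buffer)
csv_text = buffer.getvalue()
```

Fields that a given report does not provide (here `property`, `agency`, and others) are simply left empty, so rows from different companies can share one header.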

Initially, the two main types of requests are private information requests and content removal requests.

Progress to date -- get involved!

This project is under active development. We aim to extend it to download and aggregate as many transparency reports as possible.

So far, we have developed an extensible framework and are aggregating information requests from Facebook, Google, LinkedIn, and Snapchat. We are looking for contributors to write new interpreters for data from other providers. Documentation about the steps you need to take to add a new source will be available shortly.

Contact

If you have any questions or comments, please contact Nicolas Suzor <[email protected]>.

Usage

main.py [-vl] --csv-output=FILE (--get=SOURCE | --get-all)

main.py --version

main.py --help

`-h, --help`	Show this screen.
`-v, --verbose`	Increase verbosity for debugging.
`-l, --nolog`	Don't save log to file -- for debugging only.
`-c FILE, --csv-output=FILE`
	Save results to FILE in CSV format.
`--version`	Show version.
`-a, --get-all`	Fetch all available transparency reports
`-s, --get=SOURCE`
	Fetch data from SOURCE (e.g. facebook)
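
To make the option rules concrete, here is a minimal sketch of an equivalent interface declared with Python's standard `argparse` module. It is an assumption for illustration only: the project's own argument parsing is not shown in this README and may be implemented differently, and the `build_parser` name is hypothetical. The `--version` option is omitted for brevity.

```python
import argparse


def build_parser():
    """Build a parser mirroring the usage line above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Increase verbosity for debugging.")
    parser.add_argument("-l", "--nolog", action="store_true",
                        help="Don't save log to file.")
    parser.add_argument("-c", "--csv-output", metavar="FILE", required=True,
                        help="Save results to FILE in CSV format.")
    # Exactly one of --get=SOURCE or --get-all must be supplied.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-s", "--get", metavar="SOURCE",
                        help="Fetch data from SOURCE (e.g. facebook).")
    source.add_argument("-a", "--get-all", action="store_true",
                        help="Fetch all available transparency reports.")
    return parser


# Example: fetch only the Facebook reports and save them to out.csv.
args = build_parser().parse_args(["--csv-output=out.csv", "--get=facebook"])
```

With this sketch, passing both `--get` and `--get-all`, or neither, is rejected with a usage error, which matches the `(--get=SOURCE | --get-all)` notation.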

Cache

We recommend clearing the cache before generating a report, except while developing new code.

You can clear the cache with::

    make clear_cache

Downloaded data is cached in the ./cache/ directory. Responses with error codes (e.g. 404) are not cached. Because a cached page is not downloaded again, the cache needs to be cleared before changes to sites are detected. For sites that publish each report at a different URL, this works fine. For sites where everything is at one URL (e.g. LinkedIn), or where the data on a page changes (e.g. the most recent Snapchat URL), changes will be missed until the cache is cleared.
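
The behaviour described above can be sketched as a small URL-keyed file cache. This is a simplified illustration rather than the project's actual caching code: the `cache_path` and `fetch_cached` names, the hashed file naming, and the rule that status codes of 400 and above count as errors are all assumptions.

```python
import hashlib
import os
import tempfile


def cache_path(cache_dir, url):
    """Map a URL to a file name inside cache_dir (hypothetical naming scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest)


def fetch_cached(url, cache_dir, fetch):
    """Return the body for `url`, reading from the cache when possible.

    `fetch` is any callable returning (status_code, body_bytes).
    Error responses are returned to the caller but never written to disk.
    """
    path = cache_path(cache_dir, url)
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return handle.read()
    status, body = fetch(url)
    if status < 400:  # assumption: 400 and above are treated as errors
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "wb") as handle:
            handle.write(body)
    return body


# Demonstration with a fake fetcher whose content changes on every call.
calls = []


def fake_fetch(url):
    calls.append(url)
    return 200, ("version %d" % len(calls)).encode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_cached("https://example.com/report", tmp, fake_fetch)
    second = fetch_cached("https://example.com/report", tmp, fake_fetch)
```

In this sketch the second call is served from disk, so the changed content at the same URL is never seen. That is why a page that is updated in place needs the cache cleared first.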

Troubleshooting

If you see this error::

    ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)

On macOS, navigate to "Applications/Python 3.6" and run the "Install Certificates.command" file.

Testing

You can run the unit tests with::

    make test

You can run the system tests with::

    make system_test

The system tests use the program to download and process all data (using the cached data if available) and then check that a small subset of the results agrees with data copied off the website by hand.
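
The kind of check described above can be sketched as a subset comparison: every hand-copied row must appear somewhere in the generated output. The `missing_rows` helper and the sample rows are hypothetical and made up for illustration; the project's real system tests may be structured differently.

```python
def missing_rows(expected, actual):
    """Return the expected rows that have no matching row in `actual`.

    A row in `actual` matches when it agrees with the expected row on
    every key the expected row specifies; extra keys are ignored.
    """
    def matches(exp, act):
        return all(act.get(key) == value for key, value in exp.items())

    return [exp for exp in expected
            if not any(matches(exp, act) for act in actual)]


# Made-up rows standing in for values copied by hand from a report.
expected = [
    {"platform": "ExamplePlatform", "country": "Exampleland",
     "num_requests": "10"},
]

# Made-up rows standing in for the aggregator's CSV output.
actual = [
    {"platform": "ExamplePlatform", "country": "Exampleland",
     "num_requests": "10", "num_requests_complied": "7"},
    {"platform": "ExamplePlatform", "country": "Otherland",
     "num_requests": "3", "num_requests_complied": "1"},
]

problems = missing_rows(expected, actual)
```

Comparing only the keys present in each expected row keeps the hand-copied data small, since a checker only needs to record the fields they actually verified.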

You can use the SYSTEM_TEST_USE_OLD_OUTPUT environment variable to skip creating an output file. This is useful when writing new system tests::

    SYSTEM_TEST_USE_OLD_OUTPUT=1 make system_test

Scope

Currently, the following transparency reports are read and imported:

Google: information requests, government removal requests
Twitter: information requests, government removal requests
Snapchat: information requests
LinkedIn: information requests
Facebook: information requests, preservation requests, government removal requests
