Home
Demo | Download | User Guide | API
- What is Datashare?
- Who uses it?
- Where can I ask for help?
- Where can I see it in action?
- Can I run it on my server?
- Can I customize Datashare?
What is Datashare?
Welcome to Datashare, a self-hosted document search tool. It is free and open-source software developed by the International Consortium of Investigative Journalists (ICIJ). Initially created to combine multiple named-entity recognition pipelines, this tool is now a fully featured search interface for digging into your documents. With the help of several open-source tools (Extract, Apache Tika, Tesseract, CoreNLP, OpenNLP, Elasticsearch, etc.), Datashare can run on a single personal computer as well as on 100 interconnected servers.
Who uses it?
Datashare is developed by the ICIJ, a collective of investigative journalists. Datashare is built on top of technologies and methods already tested in investigations like the Panama Papers and the Luanda Leaks. Seeing the growing interest in ICIJ's technology, we decided to open-source this key component of our investigations so that individual journalists as well as large media organizations can use it on their own documents.
Curious to know more about how we use Datashare?
- How ICIJ will rock its tech in 2020
- How ICIJ analysed 715,000 Luanda Leaks records
- Help test and improve our latest journalism tool
- How Datashare project will help journalists breach borders
Where can I ask for help?
Datashare's User Guide and FAQ are published on GitBook. Please refer to the Support page to report any problem installing Datashare. If you're familiar with GitHub, you can also open an issue to get in touch with our team quickly.
Where can I see it in action?
We set up a demo instance of Datashare with a small set of documents from the LuxLeaks investigation (2014). When you use this instance, you are assigned a temporary user account that can star, tag and explore documents.
Can I run it on my server?
Datashare was also built to run on a server; this is how we use it for our collaborative projects. Our team is currently working on proper documentation to help you set up Datashare on your own server. In the meantime, you can have a look at our Linux installer, which uses Docker Compose to create all the resources needed to run a Datashare instance.
Can I customize Datashare?
When building Datashare, one of our strategic decisions was to use Elasticsearch to create an index of documents. It would be fair to describe Datashare as a nice-looking web interface for Elasticsearch. We want our search platform to be user-friendly while keeping all of Elasticsearch's powerful features available to advanced users. This way, Datashare is usable by reporters who are not tech-savvy, yet robust enough to satisfy data analysts and developers who want to query the index directly through our API.
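As a rough illustration of querying the index directly, here is a minimal sketch that sends a standard Elasticsearch `_search` request using only Python's standard library. It talks to Elasticsearch itself rather than to Datashare's own API, whose endpoints are not described on this page. The host `localhost:9200`, the index name `local-datashare`, and the `type`/`content` field names are assumptions to check against your own installation.

```python
import json
import urllib.request

# Assumptions (not confirmed by this page): Elasticsearch listens on
# localhost:9200, the Datashare index is named "local-datashare", and
# indexed documents have type "Document" with their text in "content".
ES_URL = "http://localhost:9200"
INDEX = "local-datashare"


def build_query(text: str, size: int = 10) -> dict:
    """Build a full-text query on "content", filtered to documents only."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"content": text}}],
                "filter": [{"term": {"type": "Document"}}],
            }
        },
    }


def search(text: str, size: int = 10) -> list:
    """POST the query to the index's _search endpoint and return hit ids."""
    body = json.dumps(build_query(text, size)).encode("utf-8")
    req = urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return [hit["_id"] for hit in result["hits"]["hits"]]
```

With a local instance running, `search("tax ruling", size=5)` would return the ids of up to five matching documents. Keeping the query body in its own function makes it easy to inspect or extend with other Query DSL clauses before sending it.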
We are currently implementing support for plugins to make customization more accessible. Instead of modifying Datashare directly, you could isolate your code in a plugin with a specific set of features and then configure Datashare to use it. Each Datashare user could then pick the plugins they need and have a fully customized installation of our search platform.