Pierre Romera edited this page Jan 2, 2023 · 25 revisions

Datashare: Better analyze information, in all its forms

Demo | Download | User Guide

  1. What is Datashare?
  2. Who uses it?
  3. Where can I ask for help?
  4. Where can I see it in action?
  5. Can I run it on my server?
  6. Can I customize Datashare?

What is Datashare?

Welcome to Datashare, a self-hosted document search tool. It is free and open-source software developed by the International Consortium of Investigative Journalists (ICIJ). Initially created to combine multiple named-entity recognition pipelines, it is now a full-featured search interface for digging into your documents. Built on several open-source tools (Extract, Apache Tika, Tesseract, CoreNLP, OpenNLP, Elasticsearch, etc.), Datashare can run on a single personal computer as well as on 100 interconnected servers.

Who uses it?

Datashare is developed by the ICIJ, a collective of investigative journalists. It is built on top of technologies and methods already tested in investigations such as the Panama Papers and the Luanda Leaks. Seeing the growing interest in ICIJ's technology, we decided to open-source this key component of our investigations so that individual journalists as well as large media organizations can use it on their own documents.

Curious to know more about how we use Datashare?

Where can I ask for help?

Datashare's User Guide and FAQ are published on GitBook. Please refer to the Support page to report any problem with Datashare. If you are used to GitHub, you can also open an issue to get in touch with our team quickly.

Where can I see it in action?

We set up a Demo instance of Datashare with a small set of documents from the Luxleaks investigation (2014). When using this instance, you will be assigned a temporary user account that can star, tag and explore documents.

Can I run it on my server?

Datashare was also built to run on a server. This is how we use it for our collaborative projects. Read our documentation to learn how to set it up.

Can I customize Datashare?

When building Datashare, one of our strategic decisions was to use Elasticsearch to create an index of documents. It would be fair to describe Datashare as a nice-looking web interface for Elasticsearch. We want our search platform to be user-friendly while keeping all the powerful Elasticsearch features available to advanced users. This way, we ensure that Datashare is usable by reporters who are not tech-savvy, yet still robust enough to satisfy data analysts and developers who want to query the index directly with our API.
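
To illustrate what querying the index directly could look like, here is a minimal Python sketch that builds a standard Elasticsearch `_search` request using a `query_string` query. The base URL, the index name `my-index` and the helper name `build_search_request` are assumptions made for this example; they are not taken from Datashare's documentation, and the actual endpoint and authentication depend on how your instance is configured. The request is only built, not sent.

```python
import json
from urllib.request import Request


def build_search_request(base_url, index, text, size=10):
    """Build (but do not send) an Elasticsearch _search request.

    base_url and index are placeholders for this sketch; use the
    values that match your own installation.
    """
    # Standard Elasticsearch query DSL: a query_string query accepts
    # Lucene-style syntax such as "offshore AND company".
    body = {
        "size": size,
        "query": {"query_string": {"query": text}},
    }
    return Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host and index name, for illustration only.
    req = build_search_request(
        "http://localhost:9200", "my-index", "offshore AND company"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send the query; this is left out so the sketch runs without a server.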

To make customization more accessible, we added support for plugins. Instead of modifying Datashare directly, you can isolate your code in a plugin with a specific set of features and then configure Datashare to use it. Each Datashare user can pick the plugins they need or want and end up with a fully customized installation of our search platform. Please have a look at the documentation.

Translations

This project is currently available in English, French, Spanish and Japanese. You can help us improve and complete the translations on Crowdin.

About ICIJ

Datashare is a project by ICIJ, a collective of investigative journalists.