
Makar Data Manager

Makar Data Manager makes it easy to import data from different sources (e.g. Stack Overflow, GitHub, mailing lists). User-defined data models guarantee great flexibility, and with extendable Transformations the data can be preprocessed as needed for further analysis.

Overview

1. Requirements
2. Installation
3. Usage
4. Development
5. References
6. License

Requirements

  • Docker (>= v19.03)
  • docker-compose (>= v3)

Installation

  1. Set environment variables in a .env file in the project root:

     DATABASE_USER=postgres
     DATABASE_PASSWORD=[choose a password]
     DATABASE_HOST=db

  2. Build the Docker containers:

     docker-compose build

  3. Initialize the database:

     docker-compose run --rm web rails db:setup

  4. Start Makar:

     docker-compose up

  5. Access Makar Data Manager at localhost:3000

Usage

Schema

Schemas define the structure of the datasets managed in Makar. Go to the menu item "Schemas" to manage your schemas. Schemas are defined with a special JSON-based DSL; see lib/schema_definition.json for the exact specification of how to define a schema.
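As a rough illustration only (the authoritative format is the specification in lib/schema_definition.json; the field names below are assumptions, not the actual DSL), a schema describing Stack Overflow questions might look roughly like this:

    {
      "name": "Stack Overflow Questions",
      "attributes": [
        { "name": "Title",   "type": "string"   },
        { "name": "Body",    "type": "text"     },
        { "name": "Score",   "type": "integer"  },
        { "name": "Created", "type": "datetime" }
      ]
    }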

Import

Various import adapters are available to load data into Makar. Some import adapters need to be given an existing Schema (e.g. the CSV and JSON imports), and the data needs to match this schema exactly. Other import adapters create the Schema ad hoc. Depending on the import adapter, different parameters need to be given.
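For a schema-bound adapter such as the CSV import, this means the input columns have to correspond to the attributes of the chosen Schema. A hypothetical CSV file matching the schema sketched above could look like:

    Title,Body,Score,Created
    "How to parse JSON in Ruby?","<p>I have a string containing JSON ...</p>",12,2019-07-01T10:15:00
    "Docker volume permissions","<p>My container cannot write to ...</p>",3,2019-07-02T08:30:00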

Currently implemented import adapters include:

  • CSV import
  • JSON import
  • Stack Overflow import
  • GitHub import
  • Mailing list import


Search Interface

All imported records can be browsed and queried on the page "Records". The search interface allows fine-grained queries with multiple conditions and various search predicates (e.g. contains, equals, less than). Search queries can be saved as Filters to be reused later.

Collections

Collections can be created to group records. Every record can be part of any number of collections. Records can be added to collections manually or by assigning a filter (i.e. a saved search query) to the collection. When a filter is assigned to a collection, all records matching that filter are automatically assigned to the collection.

Collections offer functionality (e.g. delete records, compact table view, exports, custom exports) that is applied to all records in the collection.

A collection can be connected to a filter; by clicking the sync icon, the collection is updated with all records matching the assigned filter.

Transformations

Transformations perform operations on all records of a specified collection. This is useful to apply preprocessing steps (e.g. remove HTML tags) on the dataset.

For example, HTML tags can be removed from the attribute "Body" of all records in the collection "Stack Overflow Questions", storing the result in the new attribute "Body_stripped".
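Conceptually, such a transformation reads the source attribute of every record in the collection, strips the markup, and writes the result to the target attribute. A minimal Ruby sketch of the core operation (this is not Makar's actual transformation API; the method and the record access shown are assumptions for illustration):

    require "nokogiri"

    # Illustrative only: strips HTML tags from a string, which is what a
    # "remove HTML tags" transformation would do for each record.
    def strip_html(html)
      Nokogiri::HTML.fragment(html.to_s).text
    end

    # Applied per record (hypothetical record structure):
    #   record["Body_stripped"] = strip_html(record["Body"])
    #
    # strip_html("<p>Hello <b>world</b></p>")  # => "Hello world"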

A transformation is always executed in a background process. The status of such a running transformation can be seen on the page "Jobs".

In case the transformation produces an error, the error log is shown there.

Export

The datasets in Makar can be exported in various formats. The export functionality is available for collections. A user can select the attributes to export and specify the format.

Development

Various parts of this tool can be adapted easily to support more use cases, add more transformations, import capabilities, or export formats. The following list should help developers know where to start looking when extending certain functionality of the tool.

References

Demo

License

GNU General Public License v3.0 or later

See COPYING for the full text.
