SMBUD data project

Introduction

This repository contains the Systems and Methods for Big and Unstructured Data (SMBUD) project, held at Politecnico di Milano in 2021. The aim of the project is to design and implement NoSQL databases for different scenarios.

Grading

Assignment	Grade	Optional part	Total
Neo4J	9/10	✔️	10
MongoDB	9.5/10	✔️	10.5
ElasticSearch	10/10	✔️	11

Final score: 31.5/33

First assignment - Neo4J

Design, store and query graph data structures in a NoSQL DB for supporting a contact tracing application for COVID-19.

Tasks to perform:

Design conceptual model
Store a sample dataset in Neo4J
Write basic Queries (minimum 5) and Commands (minimum 3) useful for typical usage scenarios
Implement an application / UI that exploits the Neo4J database.
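
To make the contact tracing scenario concrete, here is a minimal sketch of the kind of question the graph is meant to answer: who met a person in the days before a positive test. It is not the project's Neo4J code. The graph is replaced by an in-memory dictionary, and the names `add_contact`, `exposed_people` and the 14-day window are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical in-memory stand-in for the graph:
# person -> list of (other person, day of contact).
contacts = defaultdict(list)


def add_contact(a, b, day):
    """Record an undirected contact between two people on a given day."""
    contacts[a].append((b, day))
    contacts[b].append((a, day))


def exposed_people(infected, tested_positive_on, window_days=14):
    """Return people who met `infected` within `window_days` before the positive test."""
    earliest = tested_positive_on - timedelta(days=window_days)
    return sorted(
        {
            other
            for other, day in contacts[infected]
            if earliest <= day <= tested_positive_on
        }
    )


# Sample data (invented).
add_contact("Anna", "Bruno", date(2021, 11, 1))
add_contact("Anna", "Carla", date(2021, 10, 1))
add_contact("Bruno", "Dario", date(2021, 11, 3))

print(exposed_people("Anna", date(2021, 11, 5)))
```

In a graph database the same lookup would be expressed as a traversal over relationships carrying a date property; the dictionary above only mirrors that structure.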

Second assignment - MongoDB

Design, store and query data on a NoSQL DB supporting a certification app for COVID-19. The data storage solution must keep track of people and information about their tests and vaccination status. In particular, it is required to record in a document-based storage the vaccination/testing certificates and the authorized bodies that can deliver them. Data stored in the designed database will be used to check the validity of a certificate (concerning expiration dates, evolution of the rules, and so on).

Tasks to perform:

Design conceptual model
Store a sample dataset in MongoDB (some hundred nodes at least)
Write basic Queries (minimum 5) and Commands (minimum 3) useful for typical usage scenarios
Implement an application / UI that interacts with the MongoDB database.
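
The validity check described above can be sketched on a plain Python dictionary shaped like a document. This is only an illustration under stated assumptions: the field names (`type`, `issued_on`, `issued_by`), the `is_valid` helper and the validity durations in `VALIDITY_DAYS` are hypothetical and do not come from the project or from any official rule set.

```python
from datetime import date, timedelta

# Hypothetical validity rules (days of validity per certificate type).
# Keeping them as data makes it easy to follow the evolution of the rules.
VALIDITY_DAYS = {"vaccination": 270, "test": 2}


def is_valid(certificate, today, rules=VALIDITY_DAYS):
    """Return True if the certificate has not expired under the given rules."""
    days = rules.get(certificate["type"])
    if days is None:
        return False  # unknown certificate type
    expires_on = certificate["issued_on"] + timedelta(days=days)
    return certificate["issued_on"] <= today <= expires_on


# A person document with embedded certificates (invented sample data).
person = {
    "name": "Anna Rossi",
    "certificates": [
        {"type": "vaccination", "issued_on": date(2021, 6, 1), "issued_by": "Hub A"},
        {"type": "test", "issued_on": date(2021, 11, 1), "issued_by": "Pharmacy B"},
    ],
}

today = date(2021, 11, 10)
valid = [c["type"] for c in person["certificates"] if is_valid(c, today)]
print(valid)
```

Passing a different `rules` dictionary to `is_valid` is one simple way to re-evaluate the same certificates when the rules change.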

Third assignment - ElasticSearch & Kibana

Design, store and query data on a NoSQL DB supporting a data analysis scenario over COVID-19 vaccination statistics. The goal is to build a comprehensive database of vaccinations. The analysis is performed on the dataset available at: https://raw.githubusercontent.com/italia/covid19-opendata-vaccini/master/dati/somministrazioni-vaccini-latest.csv

Tasks to perform:

Report the schema of the data, including the types of the different fields. Make sure that the format/schema is correct and motivate it (even if you use the automatic mapping).
Store the dataset in ElasticSearch
Write basic Queries (minimum 8) and data update commands (minimum 2) useful for typical usage scenarios
Implement a simple visualization dashboard using Kibana. Exploration, navigation and dynamic behavior of the dashboard are also considered valuable contributions.
Integrate other datasets
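
As a rough sketch of the "store the dataset" step, the snippet below turns CSV rows into the newline-delimited body accepted by the ElasticSearch bulk endpoint (an action line followed by a document line). It only builds the request body and does not contact a cluster. The column names in `SAMPLE_CSV`, the index name `vaccinations` and the helper `to_bulk_ndjson` are assumptions for illustration; the real CSV has its own header.

```python
import csv
import io
import json

# Hypothetical sample with assumed column names (not the real dataset header).
SAMPLE_CSV = """date,region,doses
2021-01-01,Lombardia,120
2021-01-02,Lazio,85
"""


def to_bulk_ndjson(csv_text, index_name):
    """Convert CSV rows into a newline-delimited bulk request body."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["doses"] = int(row["doses"])  # store counts as numbers, not strings
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action line
        lines.append(json.dumps(row))  # document line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


body = to_bulk_ndjson(SAMPLE_CSV, "vaccinations")
print(body)
```

Converting numeric columns before indexing matters for the schema task: if counts are sent as strings, the automatic mapping may treat them as text rather than as numbers.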

Repository structure

random_italian_things: package responsible for randomly generating the database entities, such as people, houses (groups of people living together), and amenities such as restaurants and pubs. It is used in the Neo4J and MongoDB assignments.
neo4J assignment content is divided into:
- neo4JDB-populator: package responsible for the automatic population of the graph database. The main.py file uses the classes of the random_italian_things package.
- GUI: package containing all the classes needed to run the Python application backed by the Neo4J DB.
- deliverables: contains the PDF report and the example queries.
MongoDB_assignment content is divided into:
- data: contains main.py, used to populate the document-oriented DB; the webapp package, with all the files needed to run the web application backed by the MongoDB database; and queries and commands.txt, a list of example MongoDB queries.
- [Report]: contains the LaTeX and PNG files needed to compile the PDF report.
ElasticSearch_assignment contains:
- a Kibana dashboard file that can be imported into Kibana; see the report for more details.
- queries.txt: an example set of queries for the ElasticSearch DB.
- dataset_cleaner.py and merge dataset.ipynb: used to fix some formatting issues in the CSV file.
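
The random generation of people and houses performed by random_italian_things could look roughly like the sketch below. This is not the package's actual code: the name pools, the fields of a person and the functions `random_person` and `random_houses` are hypothetical, chosen only to show the idea of grouping generated people into houses.

```python
import random

# Hypothetical name pools; the actual package may draw from different sources.
FIRST_NAMES = ["Anna", "Bruno", "Carla", "Dario", "Elena"]
LAST_NAMES = ["Rossi", "Bianchi", "Ferrari", "Russo"]


def random_person(rng):
    """Build one person as a plain dict."""
    return {
        "first_name": rng.choice(FIRST_NAMES),
        "last_name": rng.choice(LAST_NAMES),
        "age": rng.randint(0, 99),
    }


def random_houses(n_people, max_house_size, rng):
    """Generate people and group them into houses of 1..max_house_size members."""
    people = [random_person(rng) for _ in range(n_people)]
    houses = []
    while people:
        size = rng.randint(1, max_house_size)
        houses.append(people[:size])  # the last house may be smaller than `size`
        people = people[size:]
    return houses


rng = random.Random(42)  # fixed seed so the sample dataset is reproducible
houses = random_houses(n_people=10, max_house_size=4, rng=rng)
print(len(houses), "houses generated")
```

Passing an explicit `random.Random` instance keeps the generation deterministic for a given seed, which helps when the same sample dataset has to be loaded into more than one database.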
