Data, Data-Mining and Visualization for the RESTify experiment.
This repository hosts the sources and raw input data that allow replication of the empirical findings of the RESTify experiment. The data can be reproduced and inspected with a Jupyter Notebook instance, or, for more experienced users and collaborators, with a preconfigured PyCharm project.
To replicate our data analysis, you have four options:
- Inspect the rendered preview on GitHub, using only your browser.
  => You will see all figures of this paper, pre-rendered. However, you will not be able to modify or execute the notebook, and some internal links may not work.
- Deploy a local Jupyter Notebook as a preconfigured Docker container.
  => The fastest and simplest way to replicate our work and findings.
- Manually set up a local Jupyter Notebook.
  => Similar to the previous option, you can replicate the work and findings. The manual setup requires proficiency with Python installations.
- Manually run individual parts of the data analysis with the PyCharm IDE.
  => Full access to all implementation details. The preferred option for peer reviewers, software developers, and data scientists who want to investigate and understand our work. You can replicate our findings and additionally inspect the implementation, debug the code, verify the correctness of our implementation, and, if desired, build on top of it.
This repository hosts a Docker configuration that creates a containerized Jupyter Notebook instance with all runtime dependencies. The notebook allows you to locally replicate our methodology and all findings, together with in-depth explanations.
Instructions for Docker (MacOS / Linux host):
- Install Docker.
  (After install, test your setup with: `docker run hello-world`)
- Clone this repository:
  `git clone https://github.com/m5c/RestifyJupyter.git`
- Build and run the Jupyter Notebook container:
  `cd RestifyJupyter; ./docker-autostart.sh`
  (On Linux, you may need to prefix the `docker` command with `sudo`.)
- Access the Notebook: http://127.0.0.1:8889/notebooks/Restify.ipynb
If you see a notebook with all paper figures and stats, you have succeeded.
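If you prefer to verify the deployment from a script, the optional Python snippet below (not part of the repository) simply checks that the containerized notebook server answers on port 8889, matching the URL from the instructions above.

```python
# Optional sanity check, not part of the repository: confirm the containerized
# Jupyter server from the instructions above is reachable on port 8889.
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:8889", timeout=5) as response:
    # Any successful response (typically HTTP 200) means the container is up.
    print("Notebook server responded with HTTP status", response.status)
```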
This section explains how to run the Jupyter Notebook instance natively. For this to work, you must install all runtime dependencies. The steps below install the dependencies in a virtual environment, so your system-wide Python installation stays clutter-free.
- Install Python 3.9 or newer. Make sure the newly installed Python version is set as default. Verify with: `python --version`
- Go into the project and create a new virtual environment (a local Python folder with all dependencies):
  `cd RestifyJupyter`
  `python3 -m venv .env`
  `source .env/bin/activate`
- Install all required Python libraries, using the `pip3` package manager:
  `pip3 install pandas numpy matplotlib plotly scipy statsmodels seaborn jupyter`
  You can also install them all at once with `pip3 install -r requirements.txt`. (A short dependency sanity check is sketched after this list.)
- Start the Notebook:
  `jupyter notebook`
- Access the Notebook: http://localhost:8888/notebooks/Restify.ipynb
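To double-check that the virtual environment resolved all libraries (for example before starting the notebook), you can run a short import check like the sketch below. It is only an illustration and not part of the repository.

```python
# Illustrative dependency check, not part of the repository: run inside the
# activated virtual environment to confirm all analysis libraries import cleanly.
import importlib

for package in ("pandas", "numpy", "matplotlib", "plotly",
                "scipy", "statsmodels", "seaborn", "jupyter"):
    module = importlib.import_module(package)
    print(package, getattr(module, "__version__", "installed"))
```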
Complementary to replicating our results with the Jupyter Notebook, you can also directly execute the Python code used for data mining. This option provides in-depth access to implementation details and is intended for data scientists who want to either:
- Validate the correctness of our extracted data at the code level.
- Enrich the data analysis we implemented with additional insights.
All runtime dependencies, including Python itself, can be installed directly from PyCharm; however, it is important that the IDE is configured to use the correct interpreter.
- Install PyCharm. The free Community Edition is sufficient.
- Install the `python3` interpreter. You find a corresponding option in the `PyCharm -> Settings` menu.
- Install all required libraries. Open the `PyCharm -> Settings -> Project -> Interpreter` menu, click the `+` sign, then install everything listed in `requirements.txt`.
- Install PyLint. Open the plugins menu: `PyCharm -> Settings -> Plugins`.
- Configure PyLint to use the root `.pylintrc` config file, so it correctly resolves imports.
- Select the desired run configuration to replicate any of our results:
  - For every code cell of the Notebook, there is a corresponding preconfigured run configuration.
  - We recommend running the `run_all_pseudo_cell.py` script, which recreates all statistical figures and listings from the paper (a command-line sketch follows below).
- Inputs: The Notebook works on the CSV data stored in `source-csv-files`. It is the same data as provided in our replication bundle.
- Outputs:
  - Figures are generated to `generated-plots`.
  - Intermediate CSV files are generated to `generated-csv-files`.
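If you prefer the command line over PyCharm's run configurations, the sketch below shows one way to invoke the all-in-one script and then list what appeared in the output folders. It assumes the script can be launched standalone from the repository root with the virtual environment active; this is our assumption, not a documented entry point.

```python
# Hypothetical command-line alternative to the PyCharm run configurations:
# launch the all-in-one script, then count the artefacts in the output folders.
# Assumes execution from the repository root with the virtual environment active.
import pathlib
import subprocess

subprocess.run(["python3", "run_all_pseudo_cell.py"], check=True)

for folder in ("generated-plots", "generated-csv-files"):
    files = sorted(pathlib.Path(folder).glob("*"))
    print(f"{folder}: {len(files)} generated files")
```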
This section is only relevant for data analysts who want to tweak the notebook output / visualization, or reuse part of the codebase for similar project layouts.
For scatter plots and scatter series you can easily change how samples are annotated: just pass a different `LabelMaker` at the moment of scatter instantiation. `LabelMaker`s are defined in `restify_mining/scatter_plotters/extractors`. If you wish to annotate only selected dots, edit the `labeloverride.csv` and use a custom `LabelMaker`. A minimal illustrative sketch follows the list of label makers below.
- To remove all labels, use the `EmptyLabelMaker`.
- To annotate full codenames (colour + animal), use the `FullLabelMaker`.
- To annotate group-internal codenames (only animal), use the `AnimalLabelMaker`.
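The sketch below only illustrates the idea of exchanging label strategies when a scatter plot is instantiated. The tiny stand-in classes and the plotting helper are hypothetical; the real `EmptyLabelMaker`, `FullLabelMaker` and `AnimalLabelMaker` implementations, with their actual constructor signatures, live in `restify_mining/scatter_plotters/extractors` and should be consulted for the real API.

```python
# Illustrative only: the real label makers live in restify_mining/scatter_plotters/extractors.
# These stand-ins and the plot helper are hypothetical and only mirror the idea of
# swapping the annotation strategy at scatter construction time.
import matplotlib.pyplot as plt


class FullLabelMakerStub:
    """Stand-in: annotate every sample with its full codename (colour + animal)."""

    def make_labels(self, samples):
        return [f"{s['colour']}-{s['animal']}" for s in samples]


class EmptyLabelMakerStub:
    """Stand-in: suppress all annotations."""

    def make_labels(self, samples):
        return ["" for _ in samples]


def plot_scatter(samples, label_maker):
    """Hypothetical scatter helper: the injected label maker decides each dot's annotation."""
    xs = [s["x"] for s in samples]
    ys = [s["y"] for s in samples]
    plt.scatter(xs, ys)
    for x, y, label in zip(xs, ys, label_maker.make_labels(samples)):
        plt.annotate(label, (x, y))
    plt.show()


samples = [{"x": 1.0, "y": 2.0, "colour": "red", "animal": "fox"},
           {"x": 3.0, "y": 4.0, "colour": "blue", "animal": "owl"}]
plot_scatter(samples, FullLabelMakerStub())  # swap in EmptyLabelMakerStub() to drop all labels
```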
This software is released under the open source MIT License.
- Principal Investigator: Maximilian Schiedermeier
- Academic Supervisors: Bettina Kemme, Jorg Kienzle
- Implementation: Maximilian Schiedermeier
- Research Ethics Board Advisor: Lynda McNeil