This assignment demonstrates how easily researchers can produce false positives, or inflated prediction rates, through p-hacking. See the `practical-assignment.md` file for the complete description.
- Python (this was based on version 3.7.6, used via miniconda). For Python installation tutorials, refer to either conda or pip.
- Jupyter Notebook (see the installation documentation).
To install the required packages with conda, follow the instructions in the conda documentation. To install them with pip instead, refer to the pip documentation.
For pandas: `conda install pandas`
(To install a specific version of any package via conda, append `=<version>` to the package name, for example: `conda install pandas=1.0.3`.)
For scipy: `conda install scipy`
Please refer to the `requirement.txt` file for the packages needed, or install the ones in the list below:
- pandas
- numpy
- random2
- matplotlib.pyplot
- statsmodels.formula.api
- statsmodels.api
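A quick way to confirm the environment before opening the notebook is to look up each top-level module with `importlib`. This is a hypothetical helper, not part of the repo; it checks only top-level names because `matplotlib.pyplot`, `statsmodels.formula.api` and `statsmodels.api` are submodules installed with their parent packages.

```python
import importlib.util

# Top-level import names for the packages listed above.
REQUIRED = ("pandas", "numpy", "random2", "matplotlib", "statsmodels")


def missing_packages(names=REQUIRED):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```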
For the full analysis, follow the `myanalysis.ipynb` notebook, run with Jupyter Notebook. This file is in the `PelletierDeKoninck-B-QLSC612/script/` folder of this repo.
The required data file, `brainsize.csv`, is in the `PelletierDeKoninck-B-QLSC612/data/` folder of this repo.
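A minimal loading sketch with pandas, assuming the file follows the layout of the scipy-lectures `brain_size.csv` dataset (semicolon-separated, with `.` marking missing values); adjust `sep` and `na_values` if this repo's copy is formatted differently. `load_brainsize` is a hypothetical helper name.

```python
import pandas as pd


def load_brainsize(path, sep=";", na_values="."):
    """Read the brain-size table and check that the IQ columns are present.

    The default separator and missing-value marker are assumptions about
    the file format, not something confirmed by this repo.
    """
    df = pd.read_csv(path, sep=sep, na_values=na_values)
    missing = {"FSIQ", "VIQ", "PIQ"} - set(df.columns)
    if missing:
        raise ValueError(f"brainsize.csv is missing columns: {sorted(missing)}")
    return df


# Example, run from the repository root:
# df = load_brainsize("PelletierDeKoninck-B-QLSC612/data/brainsize.csv")
```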
The analysis outputs:
- A descriptive statistics table of all variables (including two added random variables, 'partY' and 'partY2', generated from a random seed)
- Summary output of the multiple regression model (model_partY) predicting partY from the factors FSIQ, VIQ and PIQ
- Regression plots of partY against each factor
- Residual plots for the three independent variables (factors) FSIQ, VIQ and PIQ
- Summary output of the multiple regression model (model_partY2) predicting partY2 from the factors FSIQ, VIQ and PIQ
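The partY regression can be sketched with the statsmodels formula API, assuming the data frame has `FSIQ`, `VIQ` and `PIQ` columns. How the notebook actually draws partY is not shown here (it lists `random2` among its dependencies); this sketch swaps in NumPy's `default_rng` with a fixed seed, and `fit_random_outcome` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def fit_random_outcome(df: pd.DataFrame, column: str = "partY", seed: int = 0):
    """Add a seeded random outcome column to a copy of df and regress it
    on FSIQ, VIQ and PIQ with an OLS formula model."""
    data = df.copy()
    rng = np.random.default_rng(seed)
    # Pure noise by construction: any "significant" factor is a false positive.
    data[column] = rng.normal(size=len(data))
    model = smf.ols(f"{column} ~ FSIQ + VIQ + PIQ", data=data).fit()
    return data, model


# Example usage (path and read_csv options are assumptions about this repo's file):
# df = pd.read_csv("PelletierDeKoninck-B-QLSC612/data/brainsize.csv", sep=";", na_values=".")
# data, model_partY = fit_random_outcome(df, "partY", seed=1)
# data, model_partY2 = fit_random_outcome(data, "partY2", seed=2)
# print(data.describe())          # descriptive statistics, now including partY and partY2
# print(model_partY.summary())    # regression summary for partY
```

Passing `data` back in for partY2 keeps both random columns in the same frame, so a single `describe()` call covers every variable.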