This project implements Neural Networks that estimate the effective population size and mutation rate parameters of a Wright-Fisher (WF) simulator.
The project has two parts:
[1] - The WF simulator, which is bulky and requires a Python 3.7 environment containing the ELFI, PhyloDeep and PyPy packages. Install it with
either
pip install -r requirements.txt
or
conda env create -f simulator_env.yml
[2] - The Neural Network, which requires PyTorch and a Python 3.10 environment. Install it with
either
pip install -r ./nn_model/requirements.txt
or
conda env create -f nn_env.yml
The scripts listed here can be used in the 'simulator_env'. Each script has a help function that can be accessed by running the script with the -h flag. The data is generated by running the simulator with different parameters.
This script runs the Wright-Fisher simulator with different parameters, which are set in the script. It is mainly used to generate summary statistics for the Neural Network. The output is a .csv file containing the summary statistics and the parameters used to generate them. Run it with
python run_simulator.py --input_fasta [str:fasta/file/path.fasta] --n_simulations [int:number of simulations] --outdir [str:output/dir/path]
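To illustrate the kind of table this step produces, here is a minimal sketch of a parameter sweep over a toy Wright-Fisher model. It is not the project's simulator: the two-allele model, the prior ranges, the summary statistics and the column names (`pop_size`, `mutation_rate`, `mean_freq`, `var_freq`) are all assumptions chosen for illustration, and it does not read a FASTA file.

```python
import csv
import io
import random


def simulate_wf(pop_size, mutation_rate, generations, rng):
    """Two-allele Wright-Fisher model with symmetric mutation.

    Returns the frequency of allele A in every generation.
    """
    freq = 0.5
    trajectory = [freq]
    for _ in range(generations):
        # Mutation: A -> a and a -> A at the same per-generation rate.
        p = freq * (1 - mutation_rate) + (1 - freq) * mutation_rate
        # Drift: binomial resampling of pop_size gene copies.
        count = sum(rng.random() < p for _ in range(pop_size))
        freq = count / pop_size
        trajectory.append(freq)
    return trajectory


def summarise(trajectory):
    """Toy summary statistics: mean and variance of the allele frequency."""
    n = len(trajectory)
    mean = sum(trajectory) / n
    var = sum((x - mean) ** 2 for x in trajectory) / n
    return mean, var


def run_simulations(n_simulations, seed=0):
    """Draw parameters from hypothetical priors and simulate once per draw."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_simulations):
        pop_size = rng.randint(50, 500)            # assumed prior range
        mutation_rate = 10 ** rng.uniform(-4, -2)  # assumed log-uniform prior
        mean, var = summarise(simulate_wf(pop_size, mutation_rate, 100, rng))
        rows.append({"pop_size": pop_size, "mutation_rate": mutation_rate,
                     "mean_freq": mean, "var_freq": var})
    return rows


def to_csv(rows):
    """Serialise rows to CSV text: parameters and statistics side by side."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = run_simulations(5)
print(to_csv(rows))
```

Each row pairs the parameters that were drawn with the statistics they produced, which is the layout a regression model needs for training.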
This script generates summary statistics from a given Newick tree and also plots the tree. It outputs path.png and path_stats.json. Run it with
python run_get_tree_stats.py --infile_path [str:tree/file/path.tree] --output_path [str:output/tree/path.png]
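As a rough illustration of turning a Newick tree into a JSON file of statistics, the sketch below parses a Newick string with the standard library and computes three simple quantities. The statistics shown (`n_leaves`, `height`, `total_length`) are assumptions and may differ from what the script actually reports. The parser handles only plain names and branch lengths, and no plotting is done.

```python
import json


def parse_newick(text):
    """Parse a simple Newick string into nested dicts (name, length, children)."""
    text = text.strip()
    if not text.endswith(";"):
        text += ";"
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                children.append(parse_node())
            pos += 1  # skip ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return parse_node()


def tree_stats(node, depth=0.0):
    """Return leaf count, tree height and total branch length."""
    depth += node["length"]
    if not node["children"]:
        return {"n_leaves": 1, "height": depth, "total_length": node["length"]}
    stats = [tree_stats(child, depth) for child in node["children"]]
    return {
        "n_leaves": sum(s["n_leaves"] for s in stats),
        "height": max(s["height"] for s in stats),
        "total_length": node["length"] + sum(s["total_length"] for s in stats),
    }


tree = parse_newick("((A:1,B:1):0.5,C:2);")
stats = tree_stats(tree)
print(json.dumps(stats, indent=2))
```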
This script runs the Bayesian Optimisation for Likelihood-Free Inference (BOLFI) algorithm. Run it with
python run_bolfi.py --observed_data [str:input/dir/path.csv] --fasta [str:fasta/file/path.fasta]
The scripts listed here can be used in the 'nn_env'. Each script has a help function that can be accessed by running the script with the -h flag. The data is generated by running the simulator with different parameters.
This script preprocesses the summary statistics generated by the simulator. Run it with
python run_preprocess.py --input_csv [str:input/dir/path] --output_csv [str:output/path.csv]
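The exact preprocessing steps are defined in the script itself. As a hedged sketch of what such a step commonly involves, the example below drops rows with missing or non-finite values and standardises every column to zero mean and unit variance. The column names and the choice of z-scoring are assumptions, not a description of run_preprocess.py.

```python
import csv
import io
import math


def preprocess(csv_text):
    """Drop incomplete rows, then z-score every column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    clean = []
    for row in reader:
        try:
            values = [float(row[c]) for c in columns]
        except (TypeError, ValueError):
            continue  # missing or non-numeric field
        if all(math.isfinite(v) for v in values):
            clean.append(values)
    if not clean:
        return columns, []
    n = len(clean)
    scaled = [[0.0] * len(columns) for _ in range(n)]
    for j in range(len(columns)):
        column = [r[j] for r in clean]
        mean = sum(column) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
        for i, x in enumerate(column):
            # Constant columns carry no information; map them to zero.
            scaled[i][j] = (x - mean) / std if std > 0 else 0.0
    return columns, scaled


# Hypothetical simulator output; the last row has a missing statistic.
example = "pop_size,mean_freq\n100,0.2\n200,0.4\n300,\n"
columns, scaled = preprocess(example)
print(columns)
print(scaled)
```

Standardising puts statistics measured on very different scales on a comparable footing, which usually makes training more stable.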
This script trains the Neural Network. Settings such as the optimiser and loss function can be adjusted. The trained model is saved to the output path. Run it with
python run_train_nn.py --input_csv [str:input/dir/path.csv] --output_path [str:output/dir/path.pth]
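For orientation, here is a minimal PyTorch training sketch showing where the optimiser and loss function are chosen and how a model is saved to a .pth file. The network architecture, the synthetic data standing in for the preprocessed CSV, and the hyperparameters are all assumptions for illustration; they are not taken from run_train_nn.py.

```python
import os
import tempfile

import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in for the preprocessed CSV:
# 4 summary statistics as inputs, 2 parameters as regression targets.
features = torch.randn(256, 4)
targets = features @ torch.randn(4, 2)

# Hypothetical architecture: one hidden layer.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))

# The loss and optimiser are the two pieces most often swapped out,
# e.g. nn.L1Loss() or torch.optim.SGD(model.parameters(), lr=1e-2).
loss_fn = nn.MSELoss()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for epoch in range(200):
    optimiser.zero_grad()
    loss = loss_fn(model(features), targets)
    loss.backward()
    optimiser.step()
    losses.append(loss.item())

# Save only the learned weights; reload with model.load_state_dict(...).
output_path = os.path.join(tempfile.mkdtemp(), "model.pth")
torch.save(model.state_dict(), output_path)
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Saving the state dict rather than the whole model object keeps the file independent of the code layout, at the cost of having to rebuild the same architecture before loading.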