PopMLvis is a population genetic analysis application. It provides a comprehensive interactive environment in which scientists, bioinformaticians, and researchers can analyze population genetic datasets in depth. To help users understand the gene structure, the platform includes dimensionality reduction algorithms, machine learning models, statistical measurements, and more.
These instructions cover how to use the Docker container.
To run the container, you need Docker installed.
If you clone the repository, install git-lfs first (https://github.com/git-lfs/git-lfs/blob/main/README.md).
Download the source code either by cloning the repository (git clone https://github.com/qcri/QCAI-PopMLVis.git) or by downloading it as a ZIP archive.
There are two configuration files: backend/gunicorn_local.conf and frontend/envfile. Modify them only if you want to change the configuration of the backend or the frontend.
After that, in your command line, navigate to the folder you downloaded/cloned and run:
docker-compose up
After initialization is complete, open your web browser and go to http://localhost:3000
The documentation and functionality are described at https://popmlvis.qcri.org/static/media/PopMLvis.b6275acf.pdf
- PCA
○ It accepts both comma separated and space separated files.
○ The input should have headers.
○ Each PCA column needs to be named PC* (or TSNE*, but not both),
where * can be any alphanumeric string.
○ The rest of the columns also need to be named, but the naming
need not be specific.
○ Make sure to have a column named IID, in order for the "Merge Metadata"
functionality to correctly map the subjects.
- PCA and Admixture
○ The PCA input file should have the above structure.
○ The Admixture input should be a file with the .Q extension.
○ The .Q file should not have headers and should be space delimited.
○ NOTE : For the correct visualization of the Scatter Plot with the admix
clustering information, the ordering of the data should be the same in the
PCA and Admix input.
- PCA (correlation matrix input)
○ It accepts a correlation matrix.
○ The input can be a comma separated file, a space delimited file, or a
pickle file containing the correlation matrix.
○ The input need not have headers or indices.
- PC-AiR
○ It accepts .bed, .bim, .fam files.
○ The kinship file can be comma or space delimited.
○ If the files are found to be inconsistent with each other
(e.g. they do not have the same number of subjects), an error is raised.
- t-SNE 2D/3D
○ It will work with both PCA/PC-AiR data or Correlation Matrix data.
○ If the number of columns is relatively large (e.g. > 50), apply
another dimensionality reduction method first.
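The PCA and Admixture rules listed above can be checked locally before a file is uploaded. The following is a minimal sketch and not part of PopMLvis: the helper names `split_row`, `check_pca_header`, and `check_q_matches_pca` are hypothetical, the script assumes that a comma anywhere in a line means the file is comma separated, and it assumes that `*` stands for at least one alphanumeric character. Because a .Q file carries no subject IDs, the sketch can only compare row counts; it cannot verify that the ordering is the same in both files.

```python
import re

# Hypothetical helpers for illustration only; they are not part of PopMLvis.


def split_row(line):
    """Split on commas if the line contains any, otherwise on whitespace."""
    if "," in line:
        return [cell.strip() for cell in line.split(",")]
    return line.split()


def check_pca_header(header_line):
    """Return a list of problems found in the header of a PCA/t-SNE file."""
    names = split_row(header_line)
    problems = []
    if any(name == "" for name in names):
        problems.append("every column needs a name")
    has_pc = any(re.fullmatch(r"PC[A-Za-z0-9]+", name) for name in names)
    has_tsne = any(re.fullmatch(r"TSNE[A-Za-z0-9]+", name) for name in names)
    if has_pc and has_tsne:
        problems.append("PC* and TSNE* columns must not both be present")
    elif not (has_pc or has_tsne):
        problems.append("no PC* or TSNE* column found")
    if "IID" not in names:
        problems.append("no IID column, so Merge Metadata cannot map subjects")
    return problems


def check_q_matches_pca(pca_lines, q_lines):
    """Compare a PCA file (with header) against a .Q file (no header)."""
    pca_rows = [line for line in pca_lines if line.strip()][1:]  # drop header
    q_rows = [line.split() for line in q_lines if line.strip()]
    problems = []
    if len({len(row) for row in q_rows}) > 1:
        problems.append(".Q rows have differing numbers of columns")
    for row in q_rows:
        try:
            [float(value) for value in row]
        except ValueError:
            problems.append(".Q file contains non-numeric values")
            break
    if len(pca_rows) != len(q_rows):
        problems.append(
            f"PCA file has {len(pca_rows)} subjects but .Q file has {len(q_rows)} rows"
        )
    return problems


print(check_pca_header("IID,PC1,PC2,TSNE1"))
# ['PC* and TSNE* columns must not both be present']
```

An empty list from either function means that none of the checked rules was violated; it does not guarantee that the application will accept the file.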
- The metadata file should have headers.
- It can be comma or space delimited.
- It should have a column called IID that stores the ids of the subjects.
- NOTE: If the number of subjects in the metadata does not match that of the initial
input file, an error screen will appear. However, the matching will still be
done based on the IIDs that appear in both the input file and the metadata file.
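The matching described in the note can be pictured as an inner join on the IID column. The sketch below only illustrates that idea and is not the PopMLvis implementation; `merge_on_iid` is a hypothetical name, and rows are represented as dictionaries such as those produced by Python's `csv.DictReader`.

```python
def merge_on_iid(data_rows, metadata_rows):
    """Join two lists of row dictionaries on their IID value.

    Only IIDs present in both inputs are kept, in the order of data_rows.
    Returns (merged_rows, counts_match), where counts_match reports whether
    both inputs had the same number of subjects.
    """
    metadata_by_iid = {row["IID"]: row for row in metadata_rows}
    merged = []
    for row in data_rows:
        extra = metadata_by_iid.get(row["IID"])
        if extra is not None:
            # Combine the input columns with the metadata columns.
            merged.append({**row, **extra})
    counts_match = len(data_rows) == len(metadata_rows)
    return merged, counts_match


data = [{"IID": "s1", "PC1": "0.1"}, {"IID": "s2", "PC1": "0.3"}]
meta = [
    {"IID": "s2", "pop": "A"},
    {"IID": "s3", "pop": "B"},
    {"IID": "s4", "pop": "B"},
]
merged, counts_match = merge_on_iid(data, meta)
print(merged)        # [{'IID': 's2', 'PC1': '0.3', 'pop': 'A'}]
print(counts_match)  # False
```

In this example the subject counts differ (two versus three), which corresponds to the case where the error screen appears, yet the subject s2 is still matched because its IID occurs in both inputs.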