-
Notifications
You must be signed in to change notification settings - Fork 8
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
UncleCAT4
committed
Dec 9, 2023
1 parent
afb0897
commit 0b79c8e
Showing
1 changed file
with
24 additions
and
15 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,75 +1,84 @@ | ||
# Description | ||
|
||
This study utilized GitLab CI/CD and GitHub Actions to automate the construction of the Hidden Markov Model ( HMM ) and conduct a search for Homeotic genes. It further screened out gene IDs that satisfy the user-defined threshold and extracted protein sequences from the obtained results files. | ||
使用 GitHub 的 action 功能来进行自动化的 hmmer,主要功能有 hmmersearch、基因筛选、蛋白质序列提取及自定义的 hmmer 命令。 | ||
|
||
# Usage | ||
|
||
You need to define the following variables in `config.py`. | ||
你需要在 `config.py`中修改一些参数。 | ||
|
||
| name | example | description | | ||
| -- | -- | -- | | ||
| PF_number | PF00031 | The Gene family number of the [pfam](http://pfam-legacy.xfam.org/) , such as PF00031 | | ||
| evaluation_threshold | 1e-5 | The threshold for filtering IDs from the results of hmmsearch | | ||
| species | Oryza_sativa | Species of gene family | | ||
|
||
The repository only contains the genome protein sequences of three species: Arabidopsis_thaliana , Oryza_sativa , Lolium_perenne, and Zea_mays . | ||
仓库文件夹中只有这几个物种的蛋白序列:Arabidopsis_thaliana , Oryza_sativa , Lolium_perenne, and Zea_mays。 | ||
|
||
## Local usage | ||
|
||
I suggest that you use Conda for installation. | ||
使用 conda 建立一个单独的环境来使用,本地使用时需要自己安装 hmmer 和对应的 python 库。 | ||
|
||
```bash | ||
conda create -n hmmer python=3.7 | ||
conda activate hmmer | ||
conda install wget biopython hmmer | ||
``` | ||
|
||
After setting up the environment, modify the information in `config.py` and run `main.py`. | ||
环境搭建完成后修改 `config.py`然后运行`main.py`。 | ||
|
||
## Cloud usage | ||
|
||
You need to clone the warehouse locally first. | ||
`fork`此仓库,clone 到本地: | ||
|
||
```bash | ||
git clone https://github.com/UncleCAT4/auto-hmmer.git | ||
``` | ||
|
||
Fork this warehouse to your own account, modify the information in `config.py`, and submit the changes to the repository. CI/CD will help you complete the remaining tasks. | ||
修改`config.py`中的参数后提交更改,action 会自动开始。 | ||
|
||
```bash | ||
git add . | ||
git commit -m "update" | ||
git push | ||
``` | ||
|
||
You can also make modifications directly on the webpage of the code repository, and the running results can be downloaded from the product. | ||
你也可以直接在 GitHub 网页版修改参数,但并不建议这样做。 | ||
|
||
# Results | ||
|
||
If you are running locally, the results are clearly visible.If you are using it in the cloud,you can download the completed files from the GitHub Action or Gitlab CI/CD running artifacts. | ||
|
||
Usually contains the following three main artifacts: | ||
action 运行完成后你可以在产物中下载以下文件: | ||
|
||
> result.out ( Results of hmmsearch ) | ||
> id_list.txt ( Filtered ID List ) | ||
> protein_for_target_id.fasta ( Protein sequence extracted from ID list ) | ||
# Customize the hmmer command | ||
# 自定义 hmmer 命令 | ||
|
||
You only need to define the format of the new command in `main.py`, such as the following command: | ||
你只需要在`main.py`中定义一个新的 hmmer 命令,例如: | ||
|
||
```python | ||
hmmbuild_command = 'hmmbuild model.hmm PFseed.txt' | ||
``` | ||
|
||
Then run the command by calling the function in `main.py`. | ||
随后在`main.py`中调用命令: | ||
|
||
```python | ||
run_command(hmmbuild_command) | ||
``` | ||
|
||
# Problem | ||
|
||
Different transcripts of the same gene were not removed. | ||
- 没有去除基因的不同转录本。 | ||
|
||
# More | ||
|
||
如果你想使用额外物种的蛋白序列,将序列放在`protein_seq`文件夹下,修改`config.py`中的物种名称即可,请注意,序列的文件拓展名应该是`fasta`,如果是其他拓展名,你可以在`main.py`中修改。 | ||
|
||
```python | ||
target_species_proteins = "./protein_seq/{}.fasta".format(config.species) | ||
target_gene_proteins = 'protein_for_target_id.fasta' | ||
``` | ||
|
||
由于 GitHub 的仓库大小限制为 1GB,所以建议`clone`到本地后先清除其它物种的蛋白序列。 |