Custom database questions #45

mbhall88 · 2022-08-29T04:45:58Z

I'm having some issues trying to create a custom database.

My understanding from the documentation is that I clone this repo, replace/change the tbdb.csv file so that it contains the mutations I want, and then run parse_db.py in the main directory?

It seems a file is missing, and I can't find it documented anywhere:

$ python parse_db.py -c tbdb.csv --custom
Traceback (most recent call last):
  File "/Users/michaelhall/Projects/drprg/paper/tmp/tbdb/parse_db.py", line 281, in <module>
    args.func(args)
  File "/Users/michaelhall/Projects/drprg/paper/tmp/tbdb/parse_db.py", line 202, in main
    gene_info = load_gene_info("genes.txt")
  File "/Users/michaelhall/Projects/drprg/paper/tmp/tbdb/parse_db.py", line 187, in load_gene_info
    for l in open(filename):
FileNotFoundError: [Errno 2] No such file or directory: 'genes.txt'

I then tried running the following from the tbdb main directory instead:

$ tb-profiler create_db --custom --include_original_mutation

This completes successfully, but I have a further issue with its output.

As per the docs, the mutations must follow HGVS nomenclature. But it seems tb-profiler only accepts a subset of this nomenclature.

For example, I have the mutation c.196_198delinsTAG, which describes an MNP starting at position 196 (TCG>TAG). According to tbdb.conversion.log, this is (incorrectly) converted as follows:

Converted pncA c.196_198delinsTAG to c.196_198delTCG

Are you able to clarify (here and in the docs) what subset you support?
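As an illustration only (this is not TB-Profiler's create_db code; the `PATTERNS` list and `classify` function are hypothetical), a minimal Python sketch of regex-based recognition for a small subset of HGVS c. edits might look like the following. Requiring each pattern to match the whole string means c.196_198delinsTAG can only be read as a delins, rather than having its inserted bases dropped as in the conversion log above.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for a small subset of HGVS coding (c.) edits.
# They are applied with re.fullmatch, so the whole string must match: the
# "del" pattern cannot consume just the "c.196_198del" prefix of a delins
# variant and silently drop the inserted bases.
PATTERNS = [
    ("delins", re.compile(r"c\.(\d+)(?:_(\d+))?delins([ACGT]+)")),
    ("dup", re.compile(r"c\.(\d+)(?:_(\d+))?dup([ACGT]*)")),
    ("del", re.compile(r"c\.(\d+)(?:_(\d+))?del([ACGT]*)")),
    ("ins", re.compile(r"c\.(\d+)_(\d+)ins([ACGT]+)")),
    ("sub", re.compile(r"c\.(\d+)([ACGT])>([ACGT])")),
]


def classify(hgvs: str) -> Tuple[str, Tuple[Optional[str], ...]]:
    """Return (edit_type, captured groups) for a supported c. variant."""
    for edit_type, pattern in PATTERNS:
        m = pattern.fullmatch(hgvs)
        if m:
            return edit_type, m.groups()
    # Unknown forms are rejected rather than guessed at.
    raise ValueError(f"unsupported HGVS c. variant: {hgvs}")


print(classify("c.196_198delinsTAG"))  # ('delins', ('196', '198', 'TAG'))
print(classify("c.643dup"))            # ('dup', ('643', None, ''))
```

Anything outside the listed forms raises a ValueError, which is arguably safer than quietly converting it into a different variant. By contrast, an unanchored `re.match(r"c\.(\d+)_(\d+)del", "c.196_198delinsTAG")` succeeds, because it only needs the prefix to match.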


mbhall88 · 2022-08-30T06:58:04Z

I've also noticed that you don't accept duplications in the recommended format, i.e. c.643dup must specify the duplicated base at the end (e.g. c.643dupC).
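Turning the recommended c.643dup into the explicit c.643dupC cannot be done with a regex alone, because the duplicated base has to be read from the reference sequence. A minimal sketch, using a hypothetical `add_dup_bases` helper and assuming a plain coding-sequence string with c.1 at index 0 (no intronic or UTR positions, and the input is assumed to already follow the HGVS 3' rule):

```python
import re


def add_dup_bases(hgvs: str, cds: str) -> str:
    """Spell out the duplicated bases of a simple c. duplication.

    Hypothetical helper. Assumes `cds` is the coding sequence with c.1 at
    index 0 and that the variant already follows the HGVS 3' rule; intronic
    and UTR positions (e.g. c.-5 or c.100+2) are not handled.
    """
    m = re.fullmatch(r"c\.(\d+)(?:_(\d+))?dup([ACGT]*)", hgvs)
    if m is None:
        raise ValueError(f"not a simple c. duplication: {hgvs}")
    if m.group(3):
        return hgvs  # bases already given, e.g. c.643dupC
    start = int(m.group(1))
    end = int(m.group(2) or start)
    if start < 1 or end < start or end > len(cds):
        raise ValueError(f"{hgvs} is outside the supplied sequence")
    bases = cds[start - 1:end]  # 1-based inclusive -> Python slice
    prefix = f"c.{start}" if start == end else f"c.{start}_{end}"
    return f"{prefix}dup{bases}"


cds = "ATGCCG"  # toy sequence, not real pncA
print(add_dup_bases("c.3dup", cds))    # c.3dupG
print(add_dup_bases("c.2_3dup", cds))  # c.2_3dupTG
```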

jodyphelan · 2022-08-31T09:07:58Z

Hi @mbhall88,

Sorry, I need to update the documentation. You are right to use tb-profiler create_db instead.

> As per the docs, the mutations must follow HGVS nomenclature. But it seems tb-profiler only accepts a subset of this nomenclature.
> For example, I have the mutation c.196_198delinsTAG, which describes an MNP at position 196 TCG>TAG. Looking at the tbdb.conversion.log this (incorrectly) gets converted.

Yes, at the moment it only accepts a subset. The pipeline uses snpEff to annotate variants in new samples, and snpEff represents each variant in only one way (e.g. c.643dupC instead of c.643dup). To simplify the variant lookup step, the create_db function tries to standardise all variants to the snpEff format using regex, but currently I've only added support for the variant types that are in tbdb.csv. Over the next few days I'll try to update the docs and look into adding compatibility for more types, such as the one you listed.
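The design described here (standardise the database entries so that lookup is an exact match against snpEff-style annotations) can be sketched as follows. This is a hypothetical illustration, not the actual create_db logic: the `SUPPORTED` pattern, `lookup_key`, `build_lookup`, and the example rows are all made up, and the "supported" set only covers forms that already spell out their bases. Rows it cannot handle are collected in a report instead of being converted into a different variant.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical "supported" forms: substitutions, plus del/dup that already
# spell out their bases (the c.643dupC style the thread attributes to snpEff).
SUPPORTED = re.compile(r"c\.\d+[ACGT]>[ACGT]|c\.\d+(?:_\d+)?(?:del|dup)[ACGT]+")


def lookup_key(variant: str) -> Optional[str]:
    """Return the lookup key for a supported variant, or None if unsupported."""
    return variant if SUPPORTED.fullmatch(variant) else None


def build_lookup(
    rows: List[Tuple[str, str, str]]
) -> Tuple[Dict[Tuple[str, str], str], List[str]]:
    """Index (gene, variant) -> drug and collect rows that could not be keyed."""
    lookup: Dict[Tuple[str, str], str] = {}
    unsupported: List[str] = []
    for gene, variant, drug in rows:
        key = lookup_key(variant)
        if key is None:
            unsupported.append(f"{gene} {variant}")  # report, don't guess
        else:
            lookup[(gene, key)] = drug
    return lookup, unsupported


# Made-up rows for illustration only.
rows = [
    ("pncA", "c.643dupC", "example_drug"),
    ("pncA", "c.196_198delinsTAG", "example_drug"),
]
lookup, unsupported = build_lookup(rows)
print(lookup.get(("pncA", "c.643dupC")))  # example_drug
print(unsupported)                        # ['pncA c.196_198delinsTAG']
```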

Thanks for raising the issue!

mbhall88 · 2022-08-31T23:34:36Z

Thanks for the clarification. Supporting all of HGVS would likely be difficult and would probably require developing a dedicated library. I just noticed https://github.com/biocommons/hgvs, though! I haven't used it before, but it looks like it might make your life a little easier.

Anyway, I got a custom db working and just thought this issue might be helpful for some docs changes.

Thanks for the quick response.

jodyphelan · 2022-09-01T12:07:23Z

Oh, I hadn't seen that before; I'll check it out, thanks!
And I'll have a go at updating the docs asap.

jodyphelan added this to TB-Profiler Sep 1, 2022

jodyphelan moved this to 🆕 New in TB-Profiler Sep 1, 2022

mbhall88 mentioned this issue Oct 11, 2022

[Errno 2] No such file or directory: 'genes.txt' #46

Open