the general gff parsing can be made to work again (TODO: why was it not enabled?)

step 0: prepare your intermine environment

for example, in your home directory $HOME:

```
mkdir .intermine
mkdir git
```

in your .intermine dir put your properties file mymine.properties

in your git directory, place your mine, following the instructions here.

Note:

cd git
git clone https://github.com/mygithub/mymine.git

Alternatively, you could just use a checkout of biotestmine, see here

Note:

cd git
git clone https://github.com/intermine/biotestmine

install intermine locally

cd git
git clone https://github.com/intermine/intermine.git 
cd intermine

edit the file bio/sources/settings.gradle: you need to add the source bio-source-gff (add the lines marked with + below)

+':bio-source-gff',
 ':bio-source-go',

and

+project(':bio-source-gff').projectDir = new File(settingsDir, './gff')
 project(':bio-source-go-annotation').projectDir = new File(settingsDir, './go-annotation')

then:

cd bio/sources
./gradlew bio-source-gff:install --stacktrace

in your mine add your gff source

add to your project.xml file something like (assuming human data -> taxid=9606):

<source name="exgff3" type="gff">
  <property name="gff3.taxonId" value="9606"/>
  <property name="gff3.seqClsName" value="Chromosome"/>
  <property name="src.data.dir" location="/datadir"/>
  <property name="gff3.dataSourceName" value="yoursourcename"/>
  <property name="gff3.dataSetTitle" value="your dataset title"/>
  <!-- add licence here -->
  <property name="gff3.licence" value="https://creativecommons.org/licenses/by-sa/3.0/" />
</source>

troubleshooting

  • if you work with human data, also check the file bio/core/src/main/resources/gff_config.properties in your intermine directory, which holds some default settings.

Issues with your gff files

  • important: the third field in the gff file must be a feature type that is modelled in the mine. if it is one of the core ones (gene, protein, etc.) there is nothing to do; otherwise you need to add the new entities to the mine model.

to run a build, I had to change all the various 'association' values in the third field (Exterior_Association, Health_Association, Reproduction_Association, etc.) to something in the model. I used 'gene'.
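To see in advance which values of the third field would need attention, a small script can count them. This is only a sketch: the function names and the `MODELLED_TYPES` set are hypothetical and must be replaced with the classes in your own model, and both the tab-separated input and the case-insensitive comparison are assumptions.

```python
from collections import Counter

# Hypothetical placeholder: replace with the classes defined in your mine's model.
MODELLED_TYPES = {"gene", "exon", "mrna", "cds"}


def count_feature_types(lines):
    """Count the values found in the third (feature type) column."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")  # assumes tab-separated columns
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def unmodelled_types(lines, modelled=MODELLED_TYPES):
    """Return {type: count} for types not in `modelled` (case-insensitive; an assumption)."""
    counts = count_feature_types(lines)
    return {t: n for t, n in counts.items() if t.lower() not in modelled}
```

For a file on disk, something like `with open("my.gff") as fh: print(unmodelled_types(fh))` lists the types that would have to be added to the model or renamed (the file name is illustrative).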

  • some of the records are missing the locations, and also the '.' that marks their absence. you need to have something like

Gene . . . . .

rather than

Gene . . .
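Records like the short one above can be found by counting columns. GFF3 has nine columns (seqid, source, type, start, end, score, strand, phase, attributes), so any record with a different count is suspect. A minimal sketch, assuming tab-separated input; the function name is hypothetical:

```python
GFF3_COLUMNS = 9  # seqid, source, type, start, end, score, strand, phase, attributes


def records_with_wrong_column_count(lines):
    """Return (line_number, column_count) for each record that does not have nine columns."""
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        n_columns = len(line.split("\t"))  # assumes tab-separated columns
        if n_columns != GFF3_COLUMNS:
            bad.append((number, n_columns))
    return bad
```

The sketch only reports the offending lines; which '.' placeholders to add back has to be decided by looking at each record.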

  • in the attributes field (the last one) you must have name=value pairs. sometimes this is not the case, e.g.

FlankMarkers;

in

Chr.16 Animal QTLdb Exterior_Association 62860102 62943884 . . . QTL_ID=17668;Name="Maternal infanticide";Abbrev=MATINF;PUBMED_ID=21303561;trait_ID=538;trait=Maternal infanticide;FlankMarkers;VTO_name="parental behavior trait";Map_Type=Linkage;Model=Mendelian;peak_cM=57.2;Significance=Significant;P-value=0.009;Likelihood_Ratio=9.485;gene_ID=100144487;gene_IDsrc=NCBIgene

and that breaks the parsing.
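Entries like FlankMarkers can be spotted before the build by scanning the attributes field. The sketch below uses hypothetical function names and splits naively on ';', so a quoted value that itself contains ';' would be misreported. Dropping the bare entries is just one possible fix and an assumption here, not necessarily what was done for the build described above.

```python
def bare_attributes(attributes):
    """Return the ';'-separated entries that are not in name=value form."""
    bare = []
    for item in attributes.split(";"):
        item = item.strip()
        if not item:
            continue  # ignore empty entries, e.g. from a trailing ';'
        name, sep, _value = item.partition("=")
        if not sep or not name:
            bare.append(item)
    return bare


def drop_bare_attributes(attributes):
    """Return the attributes string with entries lacking name=value removed."""
    bare = set(bare_attributes(attributes))
    kept = [item.strip() for item in attributes.split(";")
            if item.strip() and item.strip() not in bare]
    return ";".join(kept)
```

Applied to the last column of the record above, `bare_attributes` returns only `FlankMarkers`.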