Modify the doc
Anne Lopes committed Jan 3, 2021
1 parent 98d3c08 commit 6aa173e
Showing 27 changed files with 160 additions and 106 deletions.
3 changes: 3 additions & 0 deletions docs/.idea/.gitignore

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 8 additions & 0 deletions docs/.idea/docs.iml


6 changes: 6 additions & 0 deletions docs/.idea/inspectionProfiles/profiles_settings.xml


4 changes: 4 additions & 0 deletions docs/.idea/misc.xml


8 changes: 8 additions & 0 deletions docs/.idea/modules.xml


6 changes: 6 additions & 0 deletions docs/.idea/vcs.xml


2 changes: 1 addition & 1 deletion docs/How_it_works_orfold.md
Expand Up @@ -30,7 +30,7 @@ They are mostly associated with intermediate HCA score values.


The HCA score is calculated using the freely available
software **pyHCA** which can be downloaded and installed
softwares **pyHCA** which can be downloaded and installed
following the instructions of its developers: <https://github.com/T-B-F/pyHCA>


9 changes: 4 additions & 5 deletions site/How_it_works_orfold.html
Expand Up @@ -183,7 +183,7 @@ <h1 id="how-works-orfold">How works ORFold?<a class="headerlink" href="#how-work
knowledge of their 3D structures nor evolutionary information (orphan sequences can be treated).
The fold potential of a sequence is
calculated with the HCA
method. Also ORFold can estimate the disorder,
method. Also, ORFold can estimate the disorder,
and the aggregation propensities of the input sequences with IUPred
and Tango respectively. </p>
<h2 id="hydrophobic-clusters-analysis-hca">Hydrophobic Clusters Analysis (HCA)<a class="headerlink" href="#hydrophobic-clusters-analysis-hca" title="Permanent link">#</a></h2>
Expand Down Expand Up @@ -213,10 +213,9 @@ <h2 id="tango">Tango<a class="headerlink" href="#tango" title="Permanent link">#
Tango is not freely available software, and the user of ORFold should
first contact the Tango developers to have access to the source code: <a href="http://tango.crg.es">http://tango.crg.es</a></p>
<p>For the aggregation propensity estimation, according to the protocol
proposed by XXX et al[REF], a residue is considered as
participating in an aggregation prone region if it is located in a segment
of at least five consecutive residues which were predicted as populating
a b-aggregated conformation for more than 5%.
proposed by Linding et al.[6], a sequence segment is considered as aggregation prone
if it is composed of at least five consecutive residues predicted as
populating a b-aggregated conformation with a percentage occupancy greater than 5%.
Then, the aggregation propensity of each sequence is defined as the
fraction of residues predicted in aggregation prone segments. </p>
<h2 id="iupred">IUPred<a class="headerlink" href="#iupred" title="Permanent link">#</a></h2>
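The aggregation-propensity rule described in the Tango paragraph of this hunk can be illustrated with a short sketch. It is not ORFold's implementation: the function name, its arguments and the input format (one b-aggregation percentage per residue) are assumptions made for illustration only.

```python
from typing import List


def aggregation_propensity(b_agg: List[float],
                           min_len: int = 5,
                           threshold: float = 5.0) -> float:
    """Fraction of residues located in aggregation-prone segments.

    b_agg: hypothetical per-residue b-aggregation occupancy, in percent.
    A segment is aggregation prone if it has at least `min_len`
    consecutive residues whose occupancy is strictly above `threshold`.
    """
    if not b_agg:
        return 0.0
    residues_in_segments = 0
    run = 0  # length of the current stretch of residues above the threshold
    for score in b_agg:
        if score > threshold:
            run += 1
        else:
            if run >= min_len:
                residues_in_segments += run
            run = 0
    if run >= min_len:  # close a stretch that reaches the end of the sequence
        residues_in_segments += run
    return residues_in_segments / len(b_agg)
```

With this definition, a ten-residue sequence containing a single stretch of five residues above 5% gets a propensity of 0.5, while a stretch of only four residues contributes nothing.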
2 changes: 1 addition & 1 deletion site/Objective_orfold.html
Expand Up @@ -169,7 +169,7 @@
<div role="main">
<div class="section">

<p><img alt="LOGO_ORFold" src="img/icons/Logo_ORFold.eps" width="30%" /></p>
<p><img alt="LOGO_ORFold" src="img/icons/Logo_ORFold.png" width="30%" /></p>
<h1 id="aims-and-general-description-of-orfold">Aims and general description of ORFold<a class="headerlink" href="#aims-and-general-description-of-orfold" title="Permanent link">#</a></h1>
<p>ORFold aims at estimating the fold potential of a set of amino acid sequences
using the <strong>Hydrophobic Clusters Analysis (HCA)</strong> method [1].
22 changes: 18 additions & 4 deletions site/Plot_orfold.html
Expand Up @@ -171,14 +171,23 @@ <h1 id="plot-of-the-orfold-output">Plot of the ORFold output<a class="headerlink
<p>The output table generated by ORFold can be subsequently given to ORFold
to generate a plot of the HCA score distribution.
The user can provide several tables in order to compare different HCA score
distribution. In this case, ORFplot will plot all the distributions on the same plot
(the tables must be given with the <strong>-tab</strong> option).
distributions. In this case, ORFplot will plot all the distributions on the same plot
(the tables must be given with the <strong>-tab</strong> option). </p>
<p>The HCA score distribution of a set
of globular proteins extracted from [1] is represented by the grey histogram.
We defined three sequence categories according to their HCA scores: low, intermediate and high HCA score
sequences. The boundaries of these categories are defined so that 95% of the globular proteins
fall into the intermediate HCA score bin. Dotted black lines delineate the
boundaries of each category. </p>
<p>Each plotted distribution is compared with the one of the globular
proteins set with a Kolmogorov Smirnov test.
Asterisks on the plot denote level of significance: * &lt; 0.05, ** &lt; 0.01, *** &lt; 0.001.
By default, the names used in the legend of the resulting plot
are the root names of the input table files.
However, the user can write his own names in the legend
with the <strong>-names</strong> option. The names must be given in the same order
as the table files. </p>
<pre><code>orfplot -tab sequences_Y.tab sequences_X.tab sequences_Z.tab
<pre><code bash="bash">orfplot -tab sequences_Y.tab sequences_X.tab sequences_Z.tab
</code></pre>
<p>This example will generate the HCA score distributions of the sequences
stored in the sequences_Y.tab, sequences_X.tab and sequences_Z.tab files.
Expand All @@ -187,7 +196,7 @@ <h1 id="plot-of-the-orfold-output">Plot of the ORFold output<a class="headerlink
</code></pre>
<p>This example will generate the HCA score distributions of the sequences
stored in the sequences_Y.tab, sequences_X.tab and sequences_Z.tab files.
The resulting legend will be Noncoding, Coding and Translated, respectively.</p>
The resulting legend will be "Noncoding", "Coding" and "Translated", respectively.</p>
<div class="admonition note">
<p class="first admonition-title">
Note
Expand All @@ -204,6 +213,11 @@ <h1 id="plot-of-the-orfold-output">Plot of the ORFold output<a class="headerlink

</p>
</div>

<p>References</p>
<ol>
<li>Mészáros B, Erdős G, Dosztányi Z (2018) IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic acids research 46:W329–W337</li>
</ol>

</div>
</div>
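The three HCA score categories and the significance asterisks described in the Plot_orfold.html hunk can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the 5% of globular proteins left outside the intermediate bin is assumed to be split equally between the two tails (the page does not say so), and the p-value is assumed to come from a Kolmogorov-Smirnov test computed elsewhere.

```python
import math
from typing import List, Tuple


def intermediate_bounds(reference_scores: List[float],
                        coverage: float = 0.95) -> Tuple[float, float]:
    """Return (low, high) so that about `coverage` of the reference
    scores fall between the two boundaries (equal tails assumed)."""
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    ordered = sorted(reference_scores)
    n = len(ordered)
    tail = (1.0 - coverage) / 2.0
    low_index = int(math.floor(tail * (n - 1)))
    high_index = int(math.ceil((1.0 - tail) * (n - 1)))
    return ordered[low_index], ordered[high_index]


def categorize(score: float, low: float, high: float) -> str:
    """Assign a sequence to the low, intermediate or high HCA score bin."""
    if score < low:
        return "low"
    if score > high:
        return "high"
    return "intermediate"


def significance_stars(p_value: float) -> str:
    """Map a p-value to the asterisk notation used on the plot."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""
```

The boundaries are computed once from the globular reference set and then reused to bin every plotted distribution, which matches the dotted lines being identical across plots.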
16 changes: 11 additions & 5 deletions site/Run_orfold_advanced.html
Expand Up @@ -190,7 +190,7 @@ <h3 id="mapping-of-the-fold-potential-and-the-disorder-and-aggregation-propensit
that contain for the ORFs provided in the input FASTA file, their corresponding
property (fold potential, disorder or aggregation propensities). The
values are stored in the column #9 of the output GFF files. The GFF files can be subsequently
uploaded on a genome viewer such as IGV [REF].</p>
uploaded on a genome viewer such as IGV [1].</p>
<p>The input GFF file must be given with the <strong>-gff</strong> option as follows:</p>
<pre><code>orfold -fna sequences.fasta -options HIT -gff sequences.gff
</code></pre>
Expand All @@ -209,7 +209,7 @@ <h3 id="mapping-of-the-fold-potential-and-the-disorder-and-aggregation-propensit
a genome viewer, thereby enabling the visual inspection of these properties along
the input genome. Notice that on IGV, blue indicates low values (for all mapped properties)
while red indicates high values. </p>
<p><img alt="HCA Scale" src="img/mapping/Scale.pdf" /><br>
<p><img alt="HCA Scale" src="img/mapping/Scale.png" /><br>
<em>Figure 1: Color scale for the HCA score values
</em></p>
<div class="admonition note">
Expand Down Expand Up @@ -338,14 +338,14 @@ <h3 id="running-orfold-on-subsets-of-randomly-selected-sequences">Running ORFold
<pre><code>orfold -fna sequences_X.pfasta sequences_Y.pfasta -options H -N 1500 3000
</code></pre>

even if is the same sample size it has to be explicit for every input
Also, if the user wants to sample two subsets of same sizes, he has to indicate the subset sizes explicitly for each input

<pre><code>orfold -fna sequences_X.pfasta sequences_Y.pfasta -options H -N 1500 1500
</code></pre>

If the user wishes to calculate the fold potential of all the sequences
in one of the given inputs, he has to write in the order of this file
the word "all"
of one of the given inputs, he has to indicate it with the "all" flag (again with respect to
the order of input files)

<pre><code>orfold -fna sequences_X.pfasta sequences_Y.pfasta -options H -N all 3000
</code></pre>
Expand All @@ -356,6 +356,12 @@ <h3 id="running-orfold-on-subsets-of-randomly-selected-sequences">Running ORFold

</p>
</div>

<p>References</p>
<ol>
<li>Robinson JT, Thorvaldsdóttir H, Winckler W, et al (2011)
Integrative genomics viewer. Nature biotechnology 29:24–26</li>
</ol>

</div>
</div>
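The semantics of the **-N** option shown in the Run_orfold_advanced.html hunk (one size per input file, given in the same order, with "all" meaning no subsampling) can be sketched as below. The function name, the in-memory representation of the inputs and the `seed` argument are assumptions for illustration and do not reflect ORFold's actual code.

```python
import random
from typing import List, Optional, Union


def subsample(inputs: List[List[str]],
              sizes: List[Union[int, str]],
              seed: Optional[int] = None) -> List[List[str]]:
    """Draw one random subset per input, following the order of the inputs.

    sizes[i] is either an integer (number of sequences to draw from
    inputs[i]) or the string "all" (keep every sequence of inputs[i]).
    """
    if len(inputs) != len(sizes):
        raise ValueError("one size must be given explicitly for each input")
    rng = random.Random(seed)
    subsets = []
    for sequences, size in zip(inputs, sizes):
        if size == "all":
            subsets.append(list(sequences))
        else:
            # random.sample raises ValueError if size exceeds the input length
            subsets.append(rng.sample(sequences, int(size)))
    return subsets
```

Requiring one entry per input, even when the sizes are equal, keeps the pairing between files and sizes unambiguous.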
Binary file removed site/img/mapping/Scale.pdf
Binary file not shown.
Binary file added site/img/mapping/Scale.png
Binary file removed site/img/mapping/orf_annotation.pdf
Binary file not shown.
Binary file added site/img/mapping/orf_annotation.png
46 changes: 23 additions & 23 deletions site/orfmap_annotation.html
Expand Up @@ -185,11 +185,11 @@ <h2 id="orf-annotation">ORF annotation<a class="headerlink" href="#orf-annotatio
noncoding intergenic ORFs (nc_intergenic) or noncoding overlapping
ORFs (nc_ovp-x with x referring to the overlapping genomic feature)
(see <a href="orfmap_overlap.html">here</a> for the definition of an overlap).
The former correspond to ORFs which do not overlap with any
The former correspond to ORFs which do not overlap any
genomic feature. The latter consist of ORFs
which overlap with a non-phased genomic feature (i.e. non coding)
which overlap a non-phased genomic feature (i.e. non coding)
on the same or the opposite
strand or which overlap with a CDS in another frame.
strand or which overlap a CDS in another frame.
Depending on the localization of the overlapping feature (same or
opposite strand), the ORFs are annotated as nc_ovp_same-x or
nc_ovp_opp-x respectively.</p>
Expand All @@ -198,7 +198,7 @@ <h2 id="orf-annotation">ORF annotation<a class="headerlink" href="#orf-annotatio
Note
</p>
<p class="last">
Notice that the ORFmap annotation adopts/has a particular point of
Notice that the ORFmap annotation has a particular point of
view on the genome which is centered on the identification and
annotation of a genome's ORFs rather than the annotation of
real biological objects (e.g. tRNA, rRNA or lncRNA).
Expand All @@ -214,13 +214,13 @@ <h4 id="orf-categories">ORF categories<a class="headerlink" href="#orf-categorie
<ul>
<li>(1) <code>c_CDS</code> ORFs which include in the same frame a CDS </li>
<li>(2) <code>nc_intergenic</code> ORFs which do not overlap any genomic feature </li>
<li>(3) <code>nc_ovp_same-x</code> ORFs which overlap on the same strand, with a genetic feature no matter
<li>(3) <code>nc_ovp_same-x</code> ORFs which overlap on the same strand, a genetic feature no matter
its type</li>
<li>(4) <code>nc_ovp_opp-x</code> ORFs which overlap on the opposite strand, with a genetic feature no matter
<li>(4) <code>nc_ovp_opp-x</code> ORFs which overlap on the opposite strand, a genetic feature no matter
its type</li>
</ul>
<p>See examples of each type in Figure 1.</p>
<p><img alt="Examples_of_ORFs" src="img/mapping/orf_annotation.pdf" /><br>
<p><img alt="Examples_of_ORFs" src="img/mapping/orf_annotation.png" /><br>
<em>Figure 1: representation of the six frames of a DNA section. STOP codons
are represented with red stars, CDS with orange boxes and
the localization of the non-phased genomic features
Expand All @@ -230,52 +230,52 @@ <h4 id="orf-categories">ORF categories<a class="headerlink" href="#orf-categorie
Note
</p>
<p class="last">
Notice that a noncoding ORF which overlaps with a tRNA is not
Notice that a noncoding ORF which overlaps a tRNA is not
considered as a tRNA and will not be annotated as tRNA, but rather
as an ORF that overlaps with a tRNA. Indeed a tRNA is a RNA
as an ORF that overlaps a tRNA. Indeed a tRNA is a RNA
molecule that does not follow the ORF definition (not bounded by STOP
codons, whose sequence length is not necessarily a multiple of 3...).
Here ORFs are seen as potential peptides or proteins that could be
produced upon the pervasive translation of their corresponding RNA.
Annotating all ORFs with the genomic feature they overlap with
Annotating all ORFs with the genomic feature they overlap
enables their analysis in a very flexible fashion.
Indeed, the user can adopt different levels of annotation,
considering all noncoding ORFs as a whole (i.e. regardless of the fact they
overlap with a genomic feature or not) or differentiating noncoding ORFs
from noncoding ORFs that overlap with specific genomic features (e.g.
overlap a genomic feature or not) or differentiating noncoding ORFs
from noncoding ORFs that overlap specific genomic features (e.g.
tRNA and rRNA) (see the <a href="./orfget_run.html">ORFget section</a>
for more details).

</p>
</div>

<h4 id="priority-rules">Priority rules<a class="headerlink" href="#priority-rules" title="Permanent link">#</a></h4>
<p>If a noncoding ORF overlaps with multiple genomic features,
<p>If a noncoding ORF overlaps multiple genomic features,
it will be annotated according to the following priority rules:</p>
<ol>
<li>
<p>if the noncoding ORF overlaps with a CDS and another annotated
<p>if the noncoding ORF overlaps a CDS and another annotated
feature, the CDS has priority over the other annotated features
no matter the CDS is located on the same or the opposite strand.
The ORF will be annotated as a noncoding ORF overlapping with
The ORF will be annotated as a noncoding ORF overlapping
a CDS (e.g. nc_ovp_(same/opp)-CDS).</p>
</li>
<li>
<p>if the noncoding ORF overlaps with an annotated feature on the
<p>if the noncoding ORF overlaps an annotated feature on the
same strand and another annotated feature on the opposite
strand (except CDS), the annotated feature located on the same strand
has priority over the other features on the opposite
strand. The ORF will be annotated as a noncoding ORF overlapping
with the feature on the same strand (e.g. nc_ovp_same-x).</p>
the feature on the same strand (e.g. nc_ovp_same-x).</p>
</li>
<li>
<p>if the noncoding ORF overlaps with multiple annotated features
<p>if the noncoding ORF overlaps multiple annotated features
located on the same strand, the feature with the larger overlap
with the ORF to be annotated has priority over the other features
(e.g. nc_ovp_(same/opp)-x).</p>
</li>
<li>
<p>if the noncoding ORF overlaps with multiple features located on the
<p>if the noncoding ORF overlaps multiple features located on the
same strand and that cover the same fraction of the ORF to be
annotated, the feature which first appears in the GFF file has
priority over the others. This case occurs with large annotated
Expand All @@ -290,7 +290,7 @@ <h4 id="priority-rules">Priority rules<a class="headerlink" href="#priority-rule
match at the same time with the features "gene" and "mRNA" are annotated
as nc_(same/opp)_ovp-mRNA (see Figure 2), while those that match with a CDS
and its corresponding exon, will be annotated as c_CDS (i.e. coding
ORFs). Finally, noncoding ORFs that overlap in another frame with
ORFs). Finally, noncoding ORFs that overlap in another frame
a CDS, and an exon will be annotated as nc_(same/opp)_ovp-CDS.</p>
<p><img alt="priority_gene_vs_mRNA" src="img/mapping/priority_gene_vs_mRNA.png" /><br>
<em>Figure 2: representation of the three frames of a DNA strand section
Expand All @@ -299,11 +299,11 @@ <h4 id="priority-rules">Priority rules<a class="headerlink" href="#priority-rule
the two CDS of the multiexonic gene with orange boxes, while
the protein coding gene and its corresponding mRNA are
represented with light and dark grey boxes respectively.
The two ORFs indicated with brackets do not overlap with the CDS
The two ORFs indicated with brackets do not overlap the CDS
of the gene and are subsequently annotated as noncoding. However,
they overlap with the gene and its corresponding mRNA. As the mRNA has priority
they overlap the gene and its corresponding mRNA. As the mRNA has priority
over the gene feature, the two ORFs are annotated as noncoding ORF
overlapping with a mRNA (nc_ovp_same-mRNA).
overlapping a mRNA (nc_ovp_same-mRNA).
</em></p>

</div>
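The four priority rules listed in the orfmap_annotation.html hunk can be read as a single sort key. The sketch below is an interpretation, not ORFmap's code: the `Hit` class and its field names are hypothetical, the coverage and strand tie-breaks are also applied among CDS hits and among opposite-strand hits (an assumption), and the gene/mRNA/exon special cases discussed around Figure 2 are not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """A genomic feature overlapping a noncoding ORF (hypothetical structure)."""
    ftype: str         # feature type, e.g. "CDS", "tRNA"
    same_strand: bool  # True if the feature lies on the ORF's strand
    coverage: float    # fraction of the ORF covered by the feature
    gff_rank: int      # order of appearance in the GFF file


def pick_feature(hits: List[Hit]) -> Hit:
    """Select the feature used for the annotation.

    Key order mirrors the rules: CDS first, then same strand,
    then larger overlap, then first appearance in the GFF file.
    """
    return min(hits, key=lambda h: (h.ftype != "CDS",
                                    not h.same_strand,
                                    -h.coverage,
                                    h.gff_rank))


def annotate_noncoding(hits: List[Hit]) -> str:
    """Return the annotation label of a noncoding ORF."""
    if not hits:
        return "nc_intergenic"
    best = pick_feature(hits)
    strand = "same" if best.same_strand else "opp"
    return f"nc_ovp_{strand}-{best.ftype}"
```

For example, an ORF overlapping a tRNA on its own strand and a CDS on the opposite strand would be labelled `nc_ovp_opp-CDS` under this reading, since the CDS rule is applied before the strand rule.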
4 changes: 2 additions & 2 deletions site/orfmap_description.html
Expand Up @@ -167,13 +167,13 @@
<div role="main">
<div class="section">

<p><img alt="LOGO_ORFmap" src="img/icons/Logo_ORFmap.eps" width="30%" /></p>
<p><img alt="LOGO_ORFmap" src="img/icons/Logo_ORFmap.png" width="30%" /></p>
<h2 id="aims-and-general-description">Aims and general description<a class="headerlink" href="#aims-and-general-description" title="Permanent link">#</a></h2>
<p>ORFMap scans a given genome in the six frames, and searches for
all possible ORFs longer than a given size (default: 60 nucleotides -
STOP codons excluded). It annotates them according to a set of genomic features (e.g. noncoding intergenic,
coding, noncoding and overlapping with a specific genomic feature - see
the <a href="orfmap_orfdef.html">ORF annotation section</a> for more details).
the <a href="orfmap_annotation.html">ORF annotation section</a> for more details).
ORFmap takes as inputs a FASTA file containing the nucleotide
sequences of all chromosomes or contigs and their corresponding
annotations in a GFF file. The program returns a new GFF file that contains all
2 changes: 1 addition & 1 deletion site/orfmap_orf_extraction.html
Expand Up @@ -175,7 +175,7 @@ <h2 id="orf-extraction">ORF extraction<a class="headerlink" href="#orf-extractio
or noncoding overlapping ORFs), but also, those of a specific ORF
category (e.g. only noncoding intergenic ORFs) or a combination
of ORFs according to their annotations (e.g. noncoding intergenic
ORFs and noncoding ORFs that overlap with lncRNAs). Different
ORFs and noncoding ORFs that overlap lncRNAs...). Different
examples are detailed in the <a href="orfget_run.html">ORFget section</a>.</p>

</div>
7 changes: 3 additions & 4 deletions site/orfmap_overlap.html
Expand Up @@ -175,17 +175,16 @@ <h2 id="overlap-definition">Overlap definition<a class="headerlink" href="#overl
if the latter covers at least 70% of the ORF sequence.</p>
</li>
<li>
<p>If less than 70% of an ORF sequence overlaps with a genomic
<p>If less than 70% of an ORF sequence overlaps a genomic
feature,
but the latter is totally included in the ORF sequence,
then the ORF is also considered as overlapping with it.</p>
then the ORF is also considered as overlapping it.</p>
</li>
</ul>
<p><img alt="Overlap definition" src="img/mapping/orfmap_coverage.png" /></p>
<p>Notice that the overlap threshold can be modified with the <strong>-co_ovp</strong>
parameter. With the following instruction, an ORF is annotated
as overlapping
with a given genomic feature if the latter covers at least
as overlapping a given genomic feature if the latter covers at least
90% of the considered ORF.</p>
<pre><code class="language-bash">orfmap -fna genome.fasta -gff genome.gff -co_ovp 90
</code></pre>
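The overlap definition in the orfmap_overlap.html hunk combines a coverage threshold with an inclusion test. A minimal sketch, assuming 1-based inclusive coordinates as used in GFF files and a hypothetical function name and signature (the `min_coverage` argument plays the role of the **-co_ovp** parameter):

```python
def orf_overlaps_feature(orf_start: int, orf_end: int,
                         feat_start: int, feat_end: int,
                         min_coverage: float = 70.0) -> bool:
    """Decide whether an ORF is considered as overlapping a genomic feature.

    The ORF overlaps the feature if the feature covers at least
    `min_coverage` percent of the ORF, or if the feature is totally
    included in the ORF.
    """
    shared = min(orf_end, feat_end) - max(orf_start, feat_start) + 1
    if shared <= 0:
        return False  # the two intervals do not intersect
    orf_length = orf_end - orf_start + 1
    covered_percent = 100.0 * shared / orf_length
    feature_included = orf_start <= feat_start and feat_end <= orf_end
    return covered_percent >= min_coverage or feature_included
```

The inclusion test matters for short features such as tRNAs: a feature much smaller than the ORF can never reach the coverage threshold, yet the ORF fully contains it.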