Modify the doc

i2bc · Jan 3, 2021 · 6aa173e · 6aa173e
1 parent 98d3c08
commit 6aa173e

Showing 27 changed files with 160 additions and 106 deletions.
diff --git a/docs/.idea/.gitignore b/docs/.idea/.gitignore
diff --git a/docs/.idea/docs.iml b/docs/.idea/docs.iml
diff --git a/docs/.idea/inspectionProfiles/profiles_settings.xml b/docs/.idea/inspectionProfiles/profiles_settings.xml
diff --git a/docs/.idea/misc.xml b/docs/.idea/misc.xml
diff --git a/docs/.idea/modules.xml b/docs/.idea/modules.xml
diff --git a/docs/.idea/vcs.xml b/docs/.idea/vcs.xml
diff --git a/docs/How_it_works_orfold.md b/docs/How_it_works_orfold.md
@@ -30,7 +30,7 @@ They are mostly associated with intermediate HCA score values.
 
 
 The HCA score is calculated using the freely available 
-software **pyHCA** which can be downloaded and installed 
+softwares **pyHCA** which can be downloaded and installed 
 following the instructions of its developers: <https://github.com/T-B-F/pyHCA>
 
 

diff --git a/site/How_it_works_orfold.html b/site/How_it_works_orfold.html
@@ -183,7 +183,7 @@ <h1 id="how-works-orfold">How works ORFold?<a class="headerlink" href="#how-work
 knowledge of their 3D structures nor evolutionary information (orphan sequences can be treated). 
 The fold potential of a sequence is 
 calculated with the HCA 
-method. Also ORFold can estimate the disorder, 
+method. Also, ORFold can estimate the disorder, 
 and the aggregation propensities of the input sequences with IUPred
 and Tango respectively.    </p>
 <h2 id="hydrophobic-clusters-analysis-hca">Hydrophobic Clusters Analysis (HCA)<a class="headerlink" href="#hydrophobic-clusters-analysis-hca" title="Permanent link">#</a></h2>
@@ -213,10 +213,9 @@ <h2 id="tango">Tango<a class="headerlink" href="#tango" title="Permanent link">#
 Tango is not freely available software, and the user of ORFold should 
 first contact the Tango developers to have access to the source code: <a href="http://tango.crg.es">http://tango.crg.es</a></p>
 <p>For the aggregation propensity estimation, according to the protocol
-proposed by XXX et al[REF], a residue is considered as
-participating in an aggregation prone region if it is located in a segment 
-of at least five consecutive residues which were predicted as populating 
-a b-aggregated conformation for more than 5%. 
+proposed by Linding et al.[6], a sequence segment is considered as aggregation prone
+if it is composed of at least five consecutive residues predicted as 
+populating a β-aggregated conformation with a percentage occupancy greater than 5%. 
 Then, the aggregation propensity of each sequence is defined as the 
 fraction of residues predicted in aggregation prone segments. </p>
 <h2 id="iupred">IUPred<a class="headerlink" href="#iupred" title="Permanent link">#</a></h2>
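
The aggregation-propensity rule described in the Tango hunk above can be sketched as follows. This is a minimal illustration, not ORFold's actual code: the function names are hypothetical, and it assumes the per-residue β-aggregation occupancies (in percent) have already been parsed from the Tango output into a list of floats.

```python
def aggregation_prone_mask(occupancy, threshold=5.0, min_len=5):
    """Flag residues that lie in a run of at least `min_len` consecutive
    residues whose occupancy is strictly greater than `threshold` percent."""
    mask = [False] * len(occupancy)
    run_start = None
    # A sentinel below any threshold closes a run that reaches the last residue.
    for i, value in enumerate(list(occupancy) + [float("-inf")]):
        if value > threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                for j in range(run_start, i):
                    mask[j] = True
            run_start = None
    return mask


def aggregation_propensity(occupancy, threshold=5.0, min_len=5):
    """Fraction of residues located in aggregation-prone segments."""
    if len(occupancy) == 0:
        return 0.0
    mask = aggregation_prone_mask(occupancy, threshold, min_len)
    return sum(mask) / len(occupancy)
```

For example, a 10-residue sequence with one qualifying run of five residues and one shorter run of two residues has half of its residues in aggregation-prone segments.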

diff --git a/site/Objective_orfold.html b/site/Objective_orfold.html
@@ -169,7 +169,7 @@
           <div role="main">
             <div class="section">
 
-                <p><img alt="LOGO_ORFold" src="img/icons/Logo_ORFold.eps" width="30%" /></p>
+                <p><img alt="LOGO_ORFold" src="img/icons/Logo_ORFold.png" width="30%" /></p>
 <h1 id="aims-and-general-description-of-orfold">Aims and general description of ORFold<a class="headerlink" href="#aims-and-general-description-of-orfold" title="Permanent link">#</a></h1>
 <p>ORFold aims at estimating the fold potential of a set of amino acid sequences
 using the <strong>Hydrophobic Clusters Analysis (HCA)</strong> method [1]. 

diff --git a/site/Plot_orfold.html b/site/Plot_orfold.html
@@ -171,14 +171,23 @@ <h1 id="plot-of-the-orfold-output">Plot of the ORFold output<a class="headerlink
 <p>The output table generated by ORFold can be subsequently given to ORFplot 
 to generate a plot of the HCA score distribution. 
 The user can provide several tables in order to compare different HCA score
-distribution. In this case, ORFplot will plot all the distributions on the same plot
-(the tables must be given with the <strong>-tab</strong> option). 
+distributions. In this case, ORFplot will plot all the distributions on the same plot
+(the tables must be given with the <strong>-tab</strong> option). </p>
+<p>The HCA score distribution of a set 
+of globular proteins extracted from [1] is represented by the grey histogram. 
+We defined three sequence categories according to their HCA scores: low, intermediate and high HCA score
+sequences. The boundaries of these categories are defined so that 95% of the globular proteins 
+fall into the intermediate HCA score bin. Dotted black lines delineate the 
+boundaries of each category. </p>
+<p>Each plotted distribution is compared with that of the globular 
+protein set using a Kolmogorov-Smirnov test. 
+Asterisks on the plot denote the level of significance (p-value): * &lt; 0.05, ** &lt; 0.01, *** &lt; 0.001.
 By default, the names used in the legend of the resulting plot 
 are the root names of the input table files. 
 However, the user can write his own names in the legend 
 with the <strong>-names</strong> option. The names must be given in the same order 
 as the table files. </p>
-<pre><code>orfplot -tab sequences_Y.tab sequences_X.tab sequences_Z.tab
+<pre><code class="language-bash">orfplot -tab sequences_Y.tab sequences_X.tab sequences_Z.tab
 </code></pre>
 <p>This example will generate the HCA score distributions of the sequences 
 stored in the sequences_Y.tab, sequences_X.tab and sequences_Z.tab files. 
@@ -187,7 +196,7 @@ <h1 id="plot-of-the-orfold-output">Plot of the ORFold output<a class="headerlink
 </code></pre>
 <p>This example will generate the HCA score distributions of the sequences 
 stored in the sequences_Y.tab, sequences_X.tab and sequences_Z.tab files.
-The resulting legend will be Noncoding, Coding and Translated, respectively.</p>
+The resulting legend will be "Noncoding", "Coding" and "Translated", respectively.</p>
 <div class="admonition note">
     <p class="first admonition-title">
         Note
@@ -204,6 +213,11 @@ <h1 id="plot-of-the-orfold-output">Plot of the ORFold output<a class="headerlink
 
     </p>
 </div>
+
+<p>References</p>
+<ol>
+<li>Mészáros B, Erdős G, Dosztányi Z (2018) IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic acids research 46:W329–W337</li>
+</ol>
 
             </div>
           </div>
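
The asterisk convention introduced in the Plot_orfold hunk above can be illustrated with a small helper. This is only a sketch: the function name is hypothetical, and the p-value is assumed to come from a two-sample Kolmogorov-Smirnov test computed elsewhere (for instance with `scipy.stats.ks_2samp`, which is an assumption about tooling rather than something stated in the documentation).

```python
def significance_stars(p_value):
    """Map a p-value to the asterisk notation used on the plot."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant at the 0.05 level
```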

diff --git a/site/Run_orfold_advanced.html b/site/Run_orfold_advanced.html
@@ -190,7 +190,7 @@ <h3 id="mapping-of-the-fold-potential-and-the-disorder-and-aggregation-propensit
 that contain for the ORFs provided in the input FASTA file, their corresponding 
 property (fold potential, disorder or aggregation propensities). The 
 values are stored in the column #9 of the output GFF files. The GFF files can be subsequently
-uploaded on a genome viewer such as IGV [REF].</p>
+uploaded on a genome viewer such as IGV [1].</p>
 <p>The input GFF file must be given with the <strong>-gff</strong> option as follows:</p>
 <pre><code>orfold -fna sequences.fasta -options HIT -gff sequences.gff 
 </code></pre>
@@ -209,7 +209,7 @@ <h3 id="mapping-of-the-fold-potential-and-the-disorder-and-aggregation-propensit
 a genome viewer, thereby enabling the visual inspection of these properties along
 the input genome. Notice that on IGV, blue indicates low values (for all mapped properties) 
 while red indicates high values. </p>
-<p><img alt="HCA Scale" src="img/mapping/Scale.pdf" /><br>
+<p><img alt="HCA Scale" src="img/mapping/Scale.png" /><br>
 <em>Figure 1: Color scale for the HCA score values 
 </em></p>
 <div class="admonition note">
@@ -338,14 +338,14 @@ <h3 id="running-orfold-on-subsets-of-randomly-selected-sequences">Running ORFold
 <pre><code>orfold -fna sequences_X.pfasta sequences_Y.pfasta -options H -N 1500 3000
 </code></pre>
 
-even if is the same sample size it has to be explicit for every input
+Also, if the user wants to sample two subsets of the same size, he has to indicate the subset size explicitly for each input
 
 <pre><code>orfold -fna sequences_X.pfasta sequences_Y.pfasta -options H -N 1500 1500
 </code></pre>
 
     If the user wishes to calculate the fold potential of all the sequences 
-    in one of the given inputs, he has to write in the order of this file
-    the word "all"
+    of one of the given inputs, he has to indicate it with the "all" flag (again with respect to
+    the order of input files)
 
 <pre><code>orfold -fna sequences_X.pfasta sequences_Y.pfasta -options H -N all 3000
 </code></pre>
@@ -356,6 +356,12 @@ <h3 id="running-orfold-on-subsets-of-randomly-selected-sequences">Running ORFold
 
     </p>
 </div>
+
+<p>References</p>
+<ol>
+<li>Robinson JT, Thorvaldsdóttir H, Winckler W, et al (2011)
+   Integrative genomics viewer. Nature biotechnology 29:24–26</li>
+</ol>
 
             </div>
           </div>

diff --git a/site/img/mapping/Scale.pdf b/site/img/mapping/Scale.pdf
diff --git a/site/img/mapping/Scale.png b/site/img/mapping/Scale.png
diff --git a/site/img/mapping/orf_annotation.pdf b/site/img/mapping/orf_annotation.pdf
diff --git a/site/img/mapping/orf_annotation.png b/site/img/mapping/orf_annotation.png
diff --git a/site/orfmap_annotation.html b/site/orfmap_annotation.html
@@ -185,11 +185,11 @@ <h2 id="orf-annotation">ORF annotation<a class="headerlink" href="#orf-annotatio
 noncoding intergenic ORFs (nc_intergenic) or noncoding overlapping
 ORFs (nc_ovp-x with x referring to the overlapping genomic feature)
 (see <a href="orfmap_overlap.html">here</a> for the definition of an overlap).
-The former correspond to ORFs which do not overlap with any 
+The former correspond to ORFs which do not overlap any 
 genomic feature. The latter consist of ORFs
-which overlap with a non-phased genomic feature (i.e. non coding) 
+which overlap a non-phased genomic feature (i.e. non coding) 
 on the same or the opposite
-strand or which overlap with a CDS in another frame. 
+strand or which overlap a CDS in another frame. 
 Depending on the localization of the overlapping feature (same or
 opposite strand), the ORFs are annotated as nc_ovp_same-x or 
 nc_ovp_opp-x respectively.</p>
@@ -198,7 +198,7 @@ <h2 id="orf-annotation">ORF annotation<a class="headerlink" href="#orf-annotatio
         Note
     </p>
     <p class="last">
-Notice that the ORFmap annotation adopts/has a particular point of
+Notice that the ORFmap annotation has a particular point of
 view on the genome which is centered on the identification and
 annotation of a genome's ORFs rather than the annotation of 
 real biological objects (e.g. tRNA, rRNA or lncRNA). 
@@ -214,13 +214,13 @@ <h4 id="orf-categories">ORF categories<a class="headerlink" href="#orf-categorie
 <ul>
 <li>(1) <code>c_CDS</code> ORFs which include in the same frame a CDS </li>
 <li>(2) <code>nc_intergenic</code> ORFs which do not overlap any genomic feature </li>
-<li>(3) <code>nc_ovp_same-x</code> ORFs which overlap on the same strand, with a genetic feature no matter 
+<li>(3) <code>nc_ovp_same-x</code> ORFs which overlap, on the same strand, a genomic feature no matter 
   its type</li>
-<li>(4) <code>nc_ovp_opp-x</code> ORFs which overlap on the opposite strand, with a genetic feature no matter 
+<li>(4) <code>nc_ovp_opp-x</code> ORFs which overlap, on the opposite strand, a genomic feature no matter 
   its type</li>
 </ul>
 <p>See examples of each type in Figure 1.</p>
-<p><img alt="Examples_of_ORFs" src="img/mapping/orf_annotation.pdf" /><br>
+<p><img alt="Examples_of_ORFs" src="img/mapping/orf_annotation.png" /><br>
 <em>Figure 1: representation of the six frames of a DNA section. STOP codons
 are represented with red stars, CDS with orange boxes and 
  the localization of the non-phased genomic features 
@@ -230,52 +230,52 @@ <h4 id="orf-categories">ORF categories<a class="headerlink" href="#orf-categorie
         Note
     </p>
     <p class="last">
-       Notice that a noncoding ORF which overlaps with a tRNA is not 
+       Notice that a noncoding ORF which overlaps a tRNA is not 
 considered as a tRNA and will not be annotated as tRNA, but rather
-as an ORF that overlaps with a tRNA. Indeed a tRNA is a RNA 
+as an ORF that overlaps a tRNA. Indeed a tRNA is an RNA 
 molecule that does not follow the ORF definition (not bounded by STOP 
 codons, whose sequence length is not necessarily a multiple of 3...).
 Here ORFs are seen as potential peptides or proteins that could be 
 produced upon the pervasive translation of their corresponding RNA.
-Annotating all ORFs with the genomic feature they overlap with 
+Annotating all ORFs with the genomic feature they overlap 
 enables their analysis in a very flexible fashion.
 Indeed, the user can adopt different levels of annotation, 
 considering all noncoding ORFs as a whole (i.e. regardless of the fact they
-overlap with a genomic feature or not) or differentiating noncoding ORFs
-from noncoding ORFs that overlap with specific genomic features (e.g. 
+overlap a genomic feature or not) or differentiating noncoding ORFs
+from noncoding ORFs that overlap specific genomic features (e.g. 
 tRNA and rRNA) (see the <a href="./orfget_run.html">ORFget section</a>  
 for more details).
 
     </p>
 </div>
 
 <h4 id="priority-rules">Priority rules<a class="headerlink" href="#priority-rules" title="Permanent link">#</a></h4>
-<p>If a noncoding ORF overlaps with multiple genomic features, 
+<p>If a noncoding ORF overlaps multiple genomic features, 
 it will be annotated according to the following priority rules:</p>
 <ol>
 <li>
-<p>if the noncoding ORF overlaps with a CDS and another annotated 
+<p>if the noncoding ORF overlaps a CDS and another annotated 
     feature, the CDS has priority over the other annotated features
     regardless of whether the CDS is located on the same or the opposite strand.
-    The ORF will be annotated as a noncoding ORF overlapping with 
+    The ORF will be annotated as a noncoding ORF overlapping 
     a CDS (e.g. nc_ovp_(same/opp)-CDS).</p>
 </li>
 <li>
-<p>if the noncoding ORF overlaps with an annotated feature on the 
+<p>if the noncoding ORF overlaps an annotated feature on the 
    same strand and another annotated feature on the opposite 
    strand (except CDS), the annotated feature located on the same strand
    has priority over the other features on the opposite
    strand. The ORF will be annotated as a noncoding ORF overlapping 
-   with the feature on the same strand (e.g. nc_ovp_same-x).</p>
+   the feature on the same strand (e.g. nc_ovp_same-x).</p>
 </li>
 <li>
-<p>if the noncoding ORF overlaps with multiple annotated features
+<p>if the noncoding ORF overlaps multiple annotated features
    located on the same strand, the feature with the larger overlap
    with the ORF to be annotated has priority over the other features
    (e.g. nc_ovp_(same/opp)-x).</p>
 </li>
 <li>
-<p>if the noncoding ORF overlaps with multiple features located on the 
+<p>if the noncoding ORF overlaps multiple features located on the 
 same strand and that cover the same fraction of the ORF to be 
    annotated, the feature which first appears in the GFF file has
    priority over the others. This case occurs with large annotated 
@@ -290,7 +290,7 @@ <h4 id="priority-rules">Priority rules<a class="headerlink" href="#priority-rule
 match at the same time with the features "gene" and "mRNA" are annotated
 as nc_(same/opp)_ovp-mRNA (see Figure 2), while those that match with a CDS
 and its corresponding exon, will be annotated as c_CDS (i.e. coding 
-   ORFs). Finally, noncoding ORFs that overlap in another frame with 
+   ORFs). Finally, noncoding ORFs that overlap in another frame 
    a CDS, and an exon will be annotated as nc_(same/opp)_ovp-CDS.</p>
 <p><img alt="priority_gene_vs_mRNA" src="img/mapping/priority_gene_vs_mRNA.png" /><br>
 <em>Figure 2: representation of the three frames of a DNA strand section 
@@ -299,11 +299,11 @@ <h4 id="priority-rules">Priority rules<a class="headerlink" href="#priority-rule
  the two CDS of the multiexonic gene with orange boxes, while 
  the protein coding gene and its corresponding mRNA are 
  represented with light and dark grey boxes respectively.
-The two ORFs indicated with brackets do not overlap with the CDS
+The two ORFs indicated with brackets do not overlap the CDS
  of the gene and are subsequently annotated as noncoding. However,
- they overlap with the gene and its corresponding mRNA. As the mRNA has priority
+ they overlap the gene and its corresponding mRNA. As the mRNA has priority
 over the gene feature, the two ORFs are annotated as noncoding ORF
-overlapping with a mRNA (nc_ovp_same-mRNA).
+overlapping an mRNA (nc_ovp_same-mRNA).
  </em></p>
 
             </div>
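
The four priority rules listed in the orfmap_annotation hunks above can be read as a single sort key. The sketch below is an illustration under stated assumptions, not ORFmap's implementation: the `Overlap` class, its fields and both function names are hypothetical, the tie-break between several overlapping CDS is an assumption, and the gene/mRNA special case mentioned in the text is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Overlap:
    feature_type: str        # e.g. "CDS", "tRNA", "rRNA" (hypothetical field)
    same_strand: bool        # True if the feature is on the ORF's strand
    covered_fraction: float  # fraction of the ORF covered by the feature
    gff_rank: int            # position of the feature in the GFF file (0 = first)


def priority_key(ov):
    """Smaller keys win: CDS first, then same strand, then larger overlap,
    then earliest appearance in the GFF file."""
    return (
        ov.feature_type != "CDS",  # rule 1: a CDS beats any other feature
        not ov.same_strand,        # rule 2: same strand beats opposite strand
        -ov.covered_fraction,      # rule 3: larger overlap wins
        ov.gff_rank,               # rule 4: first feature in the GFF wins ties
    )


def annotate_noncoding_orf(overlaps):
    """Return the annotation of a noncoding ORF given its overlapping features."""
    if not overlaps:
        return "nc_intergenic"
    best = min(overlaps, key=priority_key)
    strand = "same" if best.same_strand else "opp"
    return f"nc_ovp_{strand}-{best.feature_type}"
```

With this key, an ORF overlapping a tRNA on the same strand and a CDS on the opposite strand is annotated as `nc_ovp_opp-CDS`, since the CDS has priority regardless of strand.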

diff --git a/site/orfmap_description.html b/site/orfmap_description.html
@@ -167,13 +167,13 @@
           <div role="main">
             <div class="section">
 
-                <p><img alt="LOGO_ORFmap" src="img/icons/Logo_ORFmap.eps" width="30%" /></p>
+                <p><img alt="LOGO_ORFmap" src="img/icons/Logo_ORFmap.png" width="30%" /></p>
 <h2 id="aims-and-general-description">Aims and general description<a class="headerlink" href="#aims-and-general-description" title="Permanent link">#</a></h2>
 <p>ORFmap scans a given genome in the six frames, and searches for 
 all possible ORFs longer than a given size (default: 60 nucleotides -
 STOP codons excluded). It annotates them according to a set of genomic features (e.g. noncoding intergenic,
 coding, noncoding and overlapping with a specific genomic feature - see
-the <a href="orfmap_orfdef.html">ORF annotation section</a> for more details). 
+the <a href="orfmap_annotation.html">ORF annotation section</a> for more details). 
 ORFmap takes as inputs a FASTA file containing the nucleotide
 sequences of all chromosomes or contigs and their corresponding 
 annotations in a GFF file. The program returns a new GFF file that contains all

diff --git a/site/orfmap_orf_extraction.html b/site/orfmap_orf_extraction.html
@@ -175,7 +175,7 @@ <h2 id="orf-extraction">ORF extraction<a class="headerlink" href="#orf-extractio
 or noncoding overlapping ORFs), but also, those of a specific ORF
 category (e.g. only noncoding intergenic ORFs) or a combination 
 of ORFs according to their annotations (e.g. noncoding intergenic
-ORFs and noncoding ORFs that overlap with lncRNAs). Different 
+ORFs and noncoding ORFs that overlap lncRNAs...). Different 
 examples are detailed in the <a href="orfget_run.html">ORFget section</a>.</p>
 
             </div>

diff --git a/site/orfmap_overlap.html b/site/orfmap_overlap.html
@@ -175,17 +175,16 @@ <h2 id="overlap-definition">Overlap definition<a class="headerlink" href="#overl
   if the latter covers at least 70% of the ORF sequence.</p>
 </li>
 <li>
-<p>If less than 70% of an ORF sequence overlaps with a genomic
+<p>If less than 70% of an ORF sequence overlaps a genomic
    feature,
    but the latter is totally included in the ORF sequence, 
-   then the ORF is also considered as overlapping with it.</p>
+   then the ORF is also considered as overlapping it.</p>
 </li>
 </ul>
 <p><img alt="Overlap definition" src="img/mapping/orfmap_coverage.png" /></p>
 <p>Notice that the overlap threshold can be modified with the <strong>-co_ovp</strong>
 parameter. With the following instruction, an ORF is annotated 
-as overlapping
-with a given genomic feature if the latter covers at least 
+as overlapping a given genomic feature if the latter covers at least 
 90% of the considered ORF.</p>
 <pre><code class="language-bash">orfmap -fna genome.fasta -gff genome.gff -co_ovp 90 
 </code></pre>
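
The overlap definition edited in the orfmap_overlap hunk above can be sketched in a few lines. This is a hedged illustration rather than ORFmap's code: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive as in GFF files, and the `co_ovp` argument simply mirrors the **-co_ovp** percentage threshold.

```python
def overlap_length(orf_start, orf_end, feat_start, feat_end):
    """Number of positions shared by two 1-based inclusive intervals."""
    return max(0, min(orf_end, feat_end) - max(orf_start, feat_start) + 1)


def orf_overlaps_feature(orf_start, orf_end, feat_start, feat_end, co_ovp=70.0):
    """True if the feature covers at least `co_ovp` percent of the ORF,
    or if the feature is totally included in the ORF."""
    shared = overlap_length(orf_start, orf_end, feat_start, feat_end)
    if shared == 0:
        return False
    orf_len = orf_end - orf_start + 1
    covered_percent = 100.0 * shared / orf_len
    feature_inside_orf = orf_start <= feat_start and feat_end <= orf_end
    return covered_percent >= co_ovp or feature_inside_orf
```

For an ORF spanning positions 1 to 100, a feature spanning 21 to 200 covers 80% of the ORF: it counts as an overlap with the default threshold of 70, but not with `co_ovp=90`.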