print tables reporting encode quality control metrics #166

Open
alexg9010 opened this issue Apr 12, 2021 · 0 comments

https://www.encodeproject.org/atac-seq/

  • Experiments should have two or more biological replicates. Assays performed using EN-TEx samples may be exempted due to limited availability of experimental material, but at least two technical replicates are required.
  • Each replicate should have 25 million non-duplicate, non-mitochondrial aligned reads for single-end sequencing and 50 million for paired-end sequencing (i.e. 25 million fragments, regardless of sequencing run type).
  • The alignment rate, or percentage of mapped reads, should be greater than 95%, though values >80% may be acceptable.
  • Replicate concordance is measured by calculating IDR (Irreproducible Discovery Rate) values. The experiment passes if both the rescue and self-consistency ratios are less than 2.
  • Library complexity is measured using the Non-Redundant Fraction (NRF) and PCR Bottlenecking Coefficients 1 and 2, or PBC1 and PBC2. The preferred values are as follows: NRF>0.9, PBC1>0.9, and PBC2>3.
  • Various peak files must meet certain requirements. Please visit the section on output files under the pipeline overview for more information on peak files.
    • The number of peaks within a replicated peak file should be >150,000, though values >100,000 may be acceptable.
    • The number of peaks within an IDR peak file should be >70,000, though values >50,000 may be acceptable.
    • A nucleosome free region (NFR) must be present.
    • A mononucleosome peak must be present in the fragment length distribution. These are reads that span a single nucleosome, so they are longer than 147 bp but shorter than 147*2 bp. Good ATAC-seq datasets have reads that span nucleosomes (which allows for calling nucleosome positions in addition to open regions of chromatin).
  • The fraction of reads in called peak regions (FRiP score) should be >0.3, though values greater than 0.2 are acceptable. For EN-TEx tissues, FRiP scores will not be enforced as a QC metric. TSS enrichment remains in place as a key signal-to-noise measure.
  • Transcription start site (TSS) enrichment values are dependent on the reference files used; cutoff values for high quality data are listed in the table below.
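
As a starting point for the requested tables, here is a minimal Python sketch that grades a sample against the "greater than" cutoffs quoted above and renders a plain-text table. Everything in it is an assumption made for illustration: the metric keys, the `Threshold` tuple, and the `grade` and `qc_table` helpers are hypothetical names, not part of this project or of the ENCODE pipeline. It leaves out the read-depth requirement, the IDR ratios (which use a "less than" cutoff), and TSS enrichment (whose cutoffs depend on the reference files).

```python
from typing import Dict, NamedTuple, Optional


class Threshold(NamedTuple):
    label: str
    preferred: float
    acceptable: Optional[float]  # None when the list gives no fallback value


# Cutoffs copied from the list above; the keys are hypothetical names.
THRESHOLDS: Dict[str, Threshold] = {
    "alignment_rate_pct": Threshold("Alignment rate (%)", 95, 80),
    "nrf": Threshold("NRF", 0.9, None),
    "pbc1": Threshold("PBC1", 0.9, None),
    "pbc2": Threshold("PBC2", 3, None),
    "replicated_peaks": Threshold("Replicated peaks", 150_000, 100_000),
    "idr_peaks": Threshold("IDR peaks", 70_000, 50_000),
    "frip": Threshold("FRiP", 0.3, 0.2),
}


def grade(value: float, threshold: Threshold) -> str:
    """Return 'pass', 'acceptable' or 'fail' for a higher-is-better metric."""
    if value > threshold.preferred:
        return "pass"
    if threshold.acceptable is not None and value > threshold.acceptable:
        return "acceptable"
    return "fail"


def qc_table(metrics: Dict[str, float]) -> str:
    """Render the metrics that have a known cutoff as an aligned text table."""
    rows = [("Metric", "Value", "Preferred", "Status")]
    for key, threshold in THRESHOLDS.items():
        if key not in metrics:
            continue  # skip metrics that were not measured for this sample
        value = metrics[key]
        rows.append(
            (
                threshold.label,
                f"{value:g}",
                f">{threshold.preferred:g}",
                grade(value, threshold),
            )
        )
    # Pad every column to the width of its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(4)]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )


if __name__ == "__main__":
    # Made-up values for a single replicate, for demonstration only.
    example = {"alignment_rate_pct": 97.2, "nrf": 0.93, "pbc2": 2.5, "frip": 0.25}
    print(qc_table(example))
```

Keeping the cutoffs in a single dictionary means the table and the pass/fail logic cannot drift apart, and a per-reference TSS enrichment cutoff could be added later as one more entry once its values are known.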