Update mkdocs.yaml
xie186 committed Sep 16, 2024
1 parent a2d661b commit e99dad1
Showing 2 changed files with 131 additions and 1 deletion.
127 changes: 127 additions & 0 deletions docs/FASTQ_PHRED.ipynb
@@ -0,0 +1,127 @@
{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"title: \"PRED Score in Bioinformatics\"\n",
"format: \n",
" pptx:\n",
" reference-doc: template_UMD.pptx\n",
"editor: visual\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is PHRED Scores\n",
"\n",
"A Phred score is a measure of the probability that a base call in a DNA sequencing read is incorrect. It is a logarithmic scale, meaning that a small change in the Phred score represents a large change in the probability of an error.\n",
"\n",
"$$Q = -10 \\cdot \\log_{10}(P)$$\n",
"\n",
"Where:\n",
"\n",
"- Q is the PHRED score.\n",
"\n",
"- P is the probability that the base was called incorrectly.\n",
"\n",
"For example:\n",
"\n",
"- **Q = 20**: This corresponds to a 1 in 100 probability of an incorrect base call, or an accuracy of 99%.\n",
"\n",
"- **Q = 30**: This corresponds to a 1 in 1000 probability of an incorrect base call, or an accuracy of 99.9%.\n",
"\n",
"- **Q = 40**: This corresponds to a 1 in 10,000 probability of an incorrect base call, or an accuracy of 99.99%.\n",
"\n",
"```{r}\n",
"# Print the header\n",
"cat(sprintf(\"%-5s\\t\\t%-10s\\n\", \"Phred\", \"Prob of\"))\n",
"cat(sprintf(\"%-5s\\t\\t%-10s\\n\", \"score\", \"Incorrect call\"))\n",
"\n",
"# Loop through Phred scores from 0 to 41\n",
"for (phred in 0:41) {\n",
" cat(sprintf(\"%-5d\\t\\t%0.5f\\n\", phred, 10^(phred / -10)))\n",
"}\n",
"```\n",
"\n",
"## What is ASCII \n",
"\n",
"ASCII (American Standard Code for Information Interchange) is used to represent characters in computers. We can represent Phred scores using ASCII characters. The advantage is that the quality information can be esisly stored in text based FASTQ file.\n",
"\n",
"Not all [ASCII characters](https://www.columbia.edu/kermit/ascii.html) are printable. The first printable ASCII character is `!` and the decimal code for the character for `!` is 33. \n",
"\n",
"\n",
"```{r}\n",
"# Store output in a vector to fit on a slide\n",
"output <- c(sprintf(\"%-8s %-8s\", \"Character\", \"ASCII #\"))\n",
"\n",
"# Loop through ASCII values from 33 to 89\n",
"for (i in 33:89) {\n",
" output <- c(output, sprintf(\"%-8s %-8d\", intToUtf8(i), i))\n",
"}\n",
"\n",
"# Print the output in a single block (e.g., to fit on a slide)\n",
"cat(paste(output, collapse = \"\\n\"))\n",
"```\n",
"## Phred scores in FASTQ file \n",
"\n",
"In a FASTQ file, Phred scores are represented as ASCII characters. These characters are converted back to numeric values (PHRED scores) based on the encoding scheme used:\n",
"\n",
"1. **PHRED+33 Encoding (Sanger/Illumina 1.8+)**:\n",
"\n",
" - The ASCII character for a quality score Q is calculated as:\n",
"\n",
" ASCII character=chr(Q+33)\n",
"\n",
" - For example:\n",
"\n",
" - A PHRED score of 30 is encoded as `chr(30 + 33) = chr(63)`, which corresponds to the ASCII character `?`.\n",
"\n",
"2. **PHRED+64 Encoding (Illumina 1.3-1.7)**:\n",
"\n",
" - The ASCII character for a quality score QQQ is calculated as: \n",
" \n",
" ASCII character=chr(Q+64)\n",
"\n",
" - For example:\n",
"\n",
" - A PHRED score of 30 is encoded as `chr(30 + 64) = chr(94)`, which corresponds to the ASCII character `^`.\n",
"\n",
"\n",
"```{r}\n",
"# Print the header\n",
"cat(sprintf(\"%-5s\\t\\t%-10s\\t%-6s\\t\\t%-10s\\n\", \"Phred\", \"Prob. of\", \"ASCII\", \"ASCII\"))\n",
"cat(sprintf(\"%-5s\\t\\t%-10s\\t%-6s\\t%-10s\\n\", \"score\", \"Error\", \"Phred+33\", \"Phred+64\"))\n",
"\n",
"# Loop through Phred scores from 0 to 41\n",
"for (phred in 0:41) {\n",
" # Calculate the probability of error\n",
" prob_error <- 10^(phred / -10)\n",
"\n",
" # Convert Phred scores to ASCII characters\n",
" ascii_phred33 <- intToUtf8(phred + 33)\n",
" ascii_phred64 <- intToUtf8(phred + 64)\n",
"\n",
" # Print the results in a formatted table\n",
" cat(sprintf(\"%-5d\\t\\t%0.5f\\t\\t%-6s\\t\\t%-10s\\n\", \n",
" phred, prob_error, \n",
" ascii_phred33, ascii_phred64))\n",
"}\n",
"```\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
5 changes: 4 additions & 1 deletion mkdocs.yaml
@@ -30,7 +30,10 @@ nav:

- Get ready for the course:
- Basic Linux: basic_linux.md
#- Basic R: basic_r.md
- Bulk RNA-seq
- Prepare data: bulkRNAseq_lab.md
- Phred score in FASTQ: FASTQ_PHRED.md
#- Basic R: basic_r.md
#- Basic Python: basic_python.md

#- Intronduction: