Commit
Showing 2 changed files with 131 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,127 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "raw", | ||
"metadata": {}, | ||
"source": [ | ||
"---\n", | ||
"title: \"PRED Score in Bioinformatics\"\n", | ||
"format: \n", | ||
" pptx:\n", | ||
" reference-doc: template_UMD.pptx\n", | ||
"editor: visual\n", | ||
"---" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## What is PHRED Scores\n", | ||
"\n", | ||
"A Phred score is a measure of the probability that a base call in a DNA sequencing read is incorrect. It is a logarithmic scale, meaning that a small change in the Phred score represents a large change in the probability of an error.\n", | ||
"\n", | ||
"$$Q = -10 \\cdot \\log_{10}(P)$$\n", | ||
"\n", | ||
"Where:\n", | ||
"\n", | ||
"- Q is the PHRED score.\n", | ||
"\n", | ||
"- P is the probability that the base was called incorrectly.\n", | ||
"\n", | ||
"For example:\n", | ||
"\n", | ||
"- **Q = 20**: This corresponds to a 1 in 100 probability of an incorrect base call, or an accuracy of 99%.\n", | ||
"\n", | ||
"- **Q = 30**: This corresponds to a 1 in 1000 probability of an incorrect base call, or an accuracy of 99.9%.\n", | ||
"\n", | ||
"- **Q = 40**: This corresponds to a 1 in 10,000 probability of an incorrect base call, or an accuracy of 99.99%.\n", | ||
"\n", | ||
"```{r}\n", | ||
"# Print the header\n", | ||
"cat(sprintf(\"%-5s\\t\\t%-10s\\n\", \"Phred\", \"Prob of\"))\n", | ||
"cat(sprintf(\"%-5s\\t\\t%-10s\\n\", \"score\", \"Incorrect call\"))\n", | ||
"\n", | ||
"# Loop through Phred scores from 0 to 41\n", | ||
"for (phred in 0:41) {\n", | ||
" cat(sprintf(\"%-5d\\t\\t%0.5f\\n\", phred, 10^(phred / -10)))\n", | ||
"}\n", | ||
"```\n", | ||
"\n", | ||
"## What is ASCII \n", | ||
"\n", | ||
"ASCII (American Standard Code for Information Interchange) is used to represent characters in computers. We can represent Phred scores using ASCII characters. The advantage is that the quality information can be esisly stored in text based FASTQ file.\n", | ||
"\n", | ||
"Not all [ASCII characters](https://www.columbia.edu/kermit/ascii.html) are printable. The first printable ASCII character is `!` and the decimal code for the character for `!` is 33. \n", | ||
"\n", | ||
"\n", | ||
"```{r}\n", | ||
"# Store output in a vector to fit on a slide\n", | ||
"output <- c(sprintf(\"%-8s %-8s\", \"Character\", \"ASCII #\"))\n", | ||
"\n", | ||
"# Loop through ASCII values from 33 to 89\n", | ||
"for (i in 33:89) {\n", | ||
" output <- c(output, sprintf(\"%-8s %-8d\", intToUtf8(i), i))\n", | ||
"}\n", | ||
"\n", | ||
"# Print the output in a single block (e.g., to fit on a slide)\n", | ||
"cat(paste(output, collapse = \"\\n\"))\n", | ||
"```\n", | ||
"## Phred scores in FASTQ file \n", | ||
"\n", | ||
"In a FASTQ file, Phred scores are represented as ASCII characters. These characters are converted back to numeric values (PHRED scores) based on the encoding scheme used:\n", | ||
"\n", | ||
"1. **PHRED+33 Encoding (Sanger/Illumina 1.8+)**:\n", | ||
"\n", | ||
" - The ASCII character for a quality score Q is calculated as:\n", | ||
"\n", | ||
" ASCII character=chr(Q+33)\n", | ||
"\n", | ||
" - For example:\n", | ||
"\n", | ||
" - A PHRED score of 30 is encoded as `chr(30 + 33) = chr(63)`, which corresponds to the ASCII character `?`.\n", | ||
"\n", | ||
"2. **PHRED+64 Encoding (Illumina 1.3-1.7)**:\n", | ||
"\n", | ||
" - The ASCII character for a quality score QQQ is calculated as: \n", | ||
" \n", | ||
" ASCII character=chr(Q+64)\n", | ||
"\n", | ||
" - For example:\n", | ||
"\n", | ||
" - A PHRED score of 30 is encoded as `chr(30 + 64) = chr(94)`, which corresponds to the ASCII character `^`.\n", | ||
"\n", | ||
"\n", | ||
"```{r}\n", | ||
"# Print the header\n", | ||
"cat(sprintf(\"%-5s\\t\\t%-10s\\t%-6s\\t\\t%-10s\\n\", \"Phred\", \"Prob. of\", \"ASCII\", \"ASCII\"))\n", | ||
"cat(sprintf(\"%-5s\\t\\t%-10s\\t%-6s\\t%-10s\\n\", \"score\", \"Error\", \"Phred+33\", \"Phred+64\"))\n", | ||
"\n", | ||
"# Loop through Phred scores from 0 to 41\n", | ||
"for (phred in 0:41) {\n", | ||
" # Calculate the probability of error\n", | ||
" prob_error <- 10^(phred / -10)\n", | ||
"\n", | ||
" # Convert Phred scores to ASCII characters\n", | ||
" ascii_phred33 <- intToUtf8(phred + 33)\n", | ||
" ascii_phred64 <- intToUtf8(phred + 64)\n", | ||
"\n", | ||
" # Print the results in a formatted table\n", | ||
" cat(sprintf(\"%-5d\\t\\t%0.5f\\t\\t%-6s\\t\\t%-10s\\n\", \n", | ||
" phred, prob_error, \n", | ||
" ascii_phred33, ascii_phred64))\n", | ||
"}\n", | ||
"```\n" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |