bioinformatics_and_data_science_part_II

Bioinformatics and Data Science Part II Spring 2023
BIOL 792-1036
Prof: Julie Allen; SFB 206; [email protected]
Class: Tuesdays and Thursdays 3:00 - 4:15; PE 102
Office Hours: By appointment

Course Description

The nature of biological datasets has changed dramatically in the last few decades, and the need for bioinformatic and data science skills is growing rapidly. The goal of the second part of this two-part series is to continue building on the Linux and Python skills the students learned in the first semester and to add an understanding of data science and tools for managing large datasets. The course will focus on Python programming and working in the shell, along with an introduction to data standards and version control, tools for cleaning dirty data, data visualization, relational databases, and working with High Performance Clusters (HPCs).

With an understanding of how to integrate different data sources, we will increase not only the creativity of our science but also our ability to do more broad-scale research. The prerequisites for this course are enrollment as an M.S. or PhD student and completion of Data_Science_For_Biology_I. If you have not taken that course, email me to determine eligibility. The course will be capped at 15 students.

Student Learning Outcomes

The goal of the course is to learn data science tools, tricks, and hacks from a bioinformatics angle. By the end of the course you should feel comfortable with the tools data scientists use in biology and be able to solve and/or troubleshoot both small and large-scale data challenges in biology.

Material Distribution

All readings, lab instructions, datasets, etc. will be available here.

Attendance and Participation

Because this is a graduate class, I expect full attendance and participation, including all in-class exercises, homework, and projects.

Grade

Homework assignments (40%) Assignments will involve working in Unix, writing simple Python scripts, and other small tasks given during each module. These will use data sets provided over the course of the semester. Assignments will be evaluated based on completion. You can work in teams of 2 or 3 but will turn in your own notes and scripts for each assignment. More guidelines on these files and each specific assignment will be available on GitHub.

Participation (20%) Participation entails showing up for class prepared and doing your best to work through assigned tasks and programming example problems. Because all classes build on previous classes, contact me if you need to miss a class. Some of the material we cover might be easy and quick to figure out. Other material and tasks will present roadblocks that are more difficult. We are building a positive community in this class, so your attitude and helpfulness will be evaluated.

Independent project (40%) Everyone will be responsible for an independent project (done either individually or as a group of no more than 2 people). The goal of your semester project is to incorporate the tools learned in this class into a project of your own design. Ideally this will be something related to your research that helps move your PhD forward, but you may decide to work on a new project. The project must incorporate at least 2 tools learned in the class to resolve a biological question or computational problem. You will turn in a 1-2 page write-up of the project and how you will solve it by week 6. On the last day of class you will turn in a one to three page write-up of the project, put the documented code on GitHub (or submit it to me), and present your project in a 10-15 min presentation.

White paper

  • 1-2 page White Paper: The 1-2 page write-up should follow the format of a white paper: an introduction to the biological or other type of problem you are trying to solve (with references), followed by a methods section. The methods should fully describe your plan, for example "I will write a Python script to convert the data from PHYLIP format to FASTA format". The plan should use at least two techniques from the class (e.g. Python, shell scripts, GitHub, relational databases, data cleaning).
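As an illustration of the kind of method a white paper might describe, here is a minimal sketch of a PHYLIP-to-FASTA conversion in Python. It is not a course-provided script. It assumes relaxed, sequential PHYLIP (a header line with the number of taxa and characters, then one line per taxon with the name and sequence separated by whitespace); interleaved or strict 10-character-name files would need more work. The function name `phylip_to_fasta` and the example data are hypothetical.

```python
def phylip_to_fasta(phylip_text: str, width: int = 60) -> str:
    """Convert relaxed, sequential PHYLIP text to FASTA text.

    Assumption: each taxon occupies exactly one line, with the name
    first and the sequence (optionally split by spaces) after it.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in phylip_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty PHYLIP input")

    # The header holds the number of taxa and the alignment length.
    header = lines[0].split()
    if len(header) < 2:
        raise ValueError("PHYLIP header must contain two integers")
    n_taxa, n_chars = int(header[0]), int(header[1])

    records = []
    for line in lines[1:]:
        name, *chunks = line.split()
        seq = "".join(chunks)
        if len(seq) != n_chars:
            raise ValueError(f"{name}: expected {n_chars} characters, got {len(seq)}")
        records.append((name, seq))

    if len(records) != n_taxa:
        raise ValueError(f"expected {n_taxa} taxa, found {len(records)}")

    # Write each record as a '>' header followed by wrapped sequence lines.
    out = []
    for name, seq in records:
        out.append(f">{name}")
        for start in range(0, len(seq), width):
            out.append(seq[start:start + width])
    return "\n".join(out) + "\n"


# Hypothetical example alignment, for illustration only.
example = """3 8
taxon_A  ACGTACGT
taxon_B  ACGTAC-T
taxon_C  ACGAACGT
"""
print(phylip_to_fasta(example))
```

Checking the header counts against the parsed records is a cheap way to catch a file that is actually interleaved or truncated before it causes problems downstream.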

Project Summary + Presentation

  • 1-2 page Project Paper: The 1-2 page final paper should follow the same format as the white paper, with added results and discussion sections. Explain in detail the tools from class you used. In the discussion, describe how this helped your project, what you learned, and what you would do next.

  • 10-15 min presentation: On the last day of class each of you will present your project to the class, in no more than 15 min each. Feel free to show GitHub repos and/or run code in class.

SCHEDULE

*This is a tentative outline of the schedule. Events may change according to the speed and needs of the students in the course. The course is set up in 7 parts:

Part I - Linux Refresh

Part II - Version Control, Git, Github

Part III - Python - Notebooks, Pandas

Part IV - Data Visualization

Part V - Data Cleaning, Relational Databases

Part VI - Machine Learning

Part VII - Working with High Performance Clusters (HPCs)

| Week | Month | Date | Day | Class | Due |
|---|---|---|---|---|---|
| 1 | Jan | 24 | Tues | Course intro | |
| 1 | Jan | 26 | Thurs | Part 1 Linux Refresh | |
| 2 | Jan | 31 | Tues | Go over homework 1 and start Part 2 Version Control with Git | Homework_1 Linux_Refresh |
| 2 | Feb | 2 | Thurs | Tracking Changes | |
| 3 | Feb | 7 | Tues | Exploring History Gitignore, Remotes in Github, Practice | |
| 3 | Feb | 9 | Thurs | Collaborating | |
| 4 | Feb | 14 | Tues | no class work on projects | |
| 4 | Feb | 16 | Thurs | no class Conflicts - work on Homework 2 | |
| 5 | Feb | 21 | Tues | work on projects | |
| 5 | Feb | 23 | Thurs | Work on Homework 2 | |
| 6 | Feb | 28 | Tues | Version Control Finish/Introduction to Programming/Notebooks | Homework_2 Github |
| 6 | Mar | 2 | Thurs | Work on Homework 3 | |
| 7 | Mar | 7 | Tues | Intro to Pandas Chandra Sarkar | Homework_3 Python Refres |
| 7 | Mar | 9 | Thurs | Pandas | |
| 8 | Mar | 14 | Tues | Part 4 Data Visualization - ggplot2 Avery Grant | Homework_4 Pandas |
| 8 | Mar | 16 | Thurs | GGplot Avery Grant | |
| SB | Mar | 21 - 23 | Tues-Thurs | Spring Break | |
| 9 | Mar | 28 | Tues | Data Visualization - Bobby del Carlo | Homework_5 DV ggplot |
| 9 | Mar | 30 | Thurs | Data Visualization | Data Vis Homework 6 |
| 10 | Apr | 4 | Tues | Part 5 Data Science + Open Refine | *1-2 Page Project Writeup Due |
| 10 | Apr | 6 | Thurs | MetaData Relational Databases - Sqlite | |
| 11 | Apr | 11 | Tues | Sqlite | Homework_7_Open Refine |
| 11 | Apr | 13 | Thurs | Sqlite + homework | |
| 12 | Apr | 18 | Tues | Working with HPCs | Homework_8 Sqlite |
| 13 | Apr | 20 | Thurs | Slurm scripts aTRAM intro | |
| 15 | Apr | 25 | Tues | Working with Pronghorn | |
| 15 | Apr | 27 | Thurs | Machine Learning | |
| 16 | May | 2 | Tues | Machine Learning | |
| 16 | May | 4 | Thurs | Project Prep | Homework_9 High Performance Clusters |
| 17 | May | 9 | Tues | Project presentations | *presentations due |

Statement on Academic Dishonesty:

"Cheating, plagiarism or otherwise obtaining grades under false pretenses constitute academic dishonesty according to the code of this university. Academic dishonesty will not be tolerated and penalties can include canceling a student's enrollment without a grade, giving an F for the course or for the assignment. For more details, see the University of Nevada, Reno General Catalog."

Statement of Disability Services:

Statement of Disability Services For Traditional and Seated Classrooms: “Any student with a disability needing academic adjustments or accommodations is requested to speak with me or the Disability Resource Center (Pennington Achievement Center Suite 230) as soon as possible to arrange for appropriate accommodations.”

Statement on Audio and Video Recording:

"Surreptitious or covert video-taping of class or unauthorized audio recording of class is prohibited by law and by Board of Regents policy. This class may be videotaped or audio recorded only with the written permission of the instructor. In order to accommodate students with disabilities, some students may be given permission to record class lectures and discussions. Therefore, students should understand that their comments during class may be recorded."

Statement on Maintaining a Safe Learning and Work Environment

The University of Nevada, Reno is committed to providing a safe learning and work environment for all. If you believe you have experienced discrimination, sexual harassment, sexual assault, domestic/dating violence, or stalking, whether on or off campus, or need information related to immigration concerns, please contact the University's Equal Opportunity & Title IX office at 775-784-1547. Resources and interim measures are available to assist you. For more information, please visit the Equal Opportunity and Title IX page.

Statement on Academic Success Services

Your student fees cover usage of the Math Center (775) 784-4433, Tutoring Center (775) 784-6801, and University Writing Center (775) 784-6030. These centers support your classroom learning; it is your responsibility to take advantage of their services. Keep in mind that seeking help outside of class is the sign of a responsible and successful student.