This class is a structured, collaborative study of advanced topics in Data Science. During the semester, students will apply the data analytics lifecycle to a research topic of their choosing. Students will select appropriate predictive analytical methods for their topics and evaluate the social and ethical implications of those methods. Individual work will complement peer collaboration as students explore issues of visualizing and communicating data to each other and to the public.
The class repository is maintained on GitHub/MoreDataScience and hosted at MoreDataScience.github.io/CSCI499-Spring2019/.
OCNL 220: Tuesdays, 11am - 3pm, or by appointment
Topic | Activities | Due Date | Lead |
---|---|---|---|
Collaboration and Version Control with R projects | Create a GitHub repository for your project portfolio and configure it to host a public site that will include your code and blog. Review how to maintain version control and collaborate using Git. Join the class Slack channel and use it for all out-of-class communication, including questions and requests for assistance. As the first entry in your blog, identify a /r/dataisbeautiful post that interests you and write a short critique of it. As the first commit to your code, identify a research topic and edit your README.md to provide a brief explanation, including where you expect to find the source(s) of your data. Finally, submit a Pull Request to edit this document with the topic area you choose to lead and a link to your site. | February 8 | Kevin Buffardi |
Ethics and Data Science in Society | Resources: Weapons of Math Destruction chapters 3, 6; Podcast: "Science Vs - Gentrification: what's really happening". Add at least one blog entry to communicate your thoughts on ethics and societal impact of data science and how it applies to your topic. | February 15 | Grant Esparza |
Data Analytics Lifecycle | Resources: R for Data Science chapters 4 and 8; Commit code to your project that organizes "where your data (and analysis) lives" and begin exploring your data by identifying what needs to be cleaned and which questions the available information might help you answer. Follow best practices, as guided by the reading. Write a blog entry that documents what you've done so far, including where you found the data and what you've discovered about the dataset. Provide enough detail that someone else can replicate what you have done. | March 15 | Lizz Arriaza |
Regression models and Classification | Resources: Introduction to Statistical Learning chapters 2-4; Explore how to apply the information to your project and write a blog entry that explains your decisions with justification, again with enough detail that someone can replicate your work. | March 29 | Eduardo Gomez |
Resampling and Tree-based methods | Resources: Introduction to Statistical Learning chapters 5, 8; As in the previous module, apply the reading to your project and write a blog entry about it. Continue to analyze your data and document your process while you commit and push your new versions. | April 5 | Jerry Tucay |
Information Visualization | Resources: Edward Tufte keynote (video); The Shneiderman Information Visualization Mantra (video); Learning Data Visualization (via Lynda) chapter 5: Visual Display; Using the principles you learned from this module's materials, create at least one visualization of your data that provides useful insights. In your blog, post your visualizations and discuss the design decisions you made to help communicate the insights they provide. | April 26 | Eisley Adoremos |
Peer Review and Replication | Before meeting, make sure your project code and documentation are all committed to your project. During class, we will perform a pull request review so that a peer can verify that they can replicate, review, and critique your results. | May 3 | Kevin Buffardi |
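
As a rough illustration of the kind of analysis the regression and resampling modules ask for, the sketch below estimates the test error of a simple linear regression with k-fold cross-validation. It is only a minimal, language-agnostic example: the course readings work in R, whereas this sketch uses plain Python, and the helper names (`fit_line`, `k_fold_mse`) and the synthetic data are assumptions made up for illustration, not part of the course materials.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def k_fold_mse(xs, ys, k=5, seed=0):
    """Estimate test mean squared error by k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)       # fixed seed so results can be replicated
    folds = [idx[i::k] for i in range(k)]  # k roughly equal-sized folds
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training folds only, then score on the held-out fold
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(
            mean((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
        )
    return mean(fold_errors)


if __name__ == "__main__":
    # Hypothetical data: y = 2 + 3x plus Gaussian noise
    rng = random.Random(42)
    xs = [i / 10 for i in range(50)]
    ys = [2 + 3 * x + rng.gauss(0, 0.5) for x in xs]
    print(f"5-fold CV estimate of MSE: {k_fold_mse(xs, ys):.3f}")
```

Fixing the shuffle seed is a deliberate choice here: it keeps the fold assignment stable between runs, which supports the replication goal of the peer review module.
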

Student | Project Portfolio Link |
---|---|
Eisley Adoremos | How Much Better are NBA Players Today Compared to the Past? |
Eduardo Gomez | Crime in the United States |
Grant Esparza | Public Perception of Tech Companies Following Security Leaks |
Lizz Arriaza | World Travel |
Jerry Tucay | Sales Forecasting |