Professor Laura Stoker
GSI Rochelle Terman
Fall 2015
Final(ish) Syllabus
##1. Course Description
This course will provide graduate students with the critical technical skills necessary to conduct research in computational social science and digital humanities, introducing them to the basic computer literacy, programming skills, and application knowledge that students need to be successful in further methods work. This course is not an introduction to statistics, computer science, or specialized social science / digital humanities methods. Rather, it is meant as a springboard for students to further their training once the course is finished, whether through campus workshops (e.g. D-Lab workshops), online courses, traditional classrooms, or independent learning.
####Objectives
By the end of the course, students should be able to:
- Understand basic programming terminologies, structures, and conventions
- Navigate and operate effectively in a UNIX environment
- Master basic Git and GitHub workflows
- Write, execute, and debug R code for assignments involving statistical analysis
- Write, execute, and debug Python code for assignments involving data collection and manipulation, as well as other computing tasks
- Collect data through a variety of means, including webscraping and APIs
- Be familiar with the concepts and tools of a variety of computational social science / digital humanities applications
- Be familiar with the basic guidelines around reproducible research, good scientific computing practices, and ethics/privacy/legal quandaries
- Learn independently and train themselves in a variety of computational applications and tasks through online documentation
####Structure
The course is divided into three main sections: skills, applications, and community engagement. The “skills” portion will introduce students to basic computer literacy, terminologies, and programming languages. Specifically, students will learn the basics of the Unix shell (i.e. Bash), R, Python, and Git. We chose these tools because they are considered by many to be the “core curriculum” for social scientists and humanists who wish to integrate computing into their scholarship.
The second part of the course gives students the opportunity to apply the skills they learned in the first part to practical applications. While the realm of possibilities is too great to cover in a single course, we expect to introduce students to automated text analysis, geospatial analysis, webscraping, data collection via APIs, and qualitative data analysis software. Each week will introduce students to a new application. Because of time constraints, we cannot offer robust training in any one application. Rather, the goal is to introduce students to a medley of common applications so that they can discover which avenue to pursue in their own research, and what such training would entail.
The third section on community engagement will introduce topics such as ethics and privacy, best practices of reproducible research, scholarly communication and collaboration, and how to further one’s research using UC Berkeley campus resources.
##2. Logistics
####Personnel
The course is taught by Rochelle Terman ([email protected]) and Laura Stoker ([email protected]). Email Rochelle first.
####Time
Lecture: Tu 4-6P
Section/Lab: Fr 2-4P
####Location
202 Barrows
Rochelle will have office hours on Mondays, 1-4P in the GSI room on the 7th floor of Barrows.
####bCourses
We will use bCourses for communication (announcements and questions) and for turning in assignments. You should ask questions about class material and assignments through the bCourses website so that everyone can benefit from the discussion. We encourage you to respond to each other’s questions as well. Questions of a personal nature can be emailed to us directly.
####GitHub
All course materials will be posted on GitHub at https://github.com/rochelleterman/PS239T, including class notes, code demonstrations, sample data, and assignments. Students are encouraged to submit pull requests to this repository, for example if they find a particularly helpful resource that would aid other students. Students are required to use GitHub for their final projects, which will be publicly available unless special considerations apply (e.g. proprietary data).
####Accessibility
This class is committed to creating an environment in which everyone can participate, regardless of background, discipline, or disability. If you have a particular concern, please come to us as soon as possible so that we can make special arrangements.
##3. Course Requirements and Grades
####Final Grades
This class is graded on the following:
- Completion of assigned homework (50%)
- Participation (25%)
- Final project (25%)
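
As a rough illustration of how the weights above combine, here is a minimal sketch in Python. The weights come from the list above; the 0-100 scale, the function name `final_grade`, and the example scores are assumptions made for illustration, not the instructors' official calculation.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {"homework": 0.50, "participation": 0.25, "final_project": 0.25}


def final_grade(scores):
    """Return the weighted course grade for component scores on a 0-100 scale.

    `scores` is a dict keyed by the same component names as WEIGHTS.
    The 0-100 scale is an assumption; the syllabus does not specify one.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical example scores, for illustration only.
example = {"homework": 90, "participation": 100, "final_project": 80}
print(final_grade(example))  # 0.5*90 + 0.25*100 + 0.25*80 = 90.0
```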
####Assignments
Homework will be assigned at the end of every session and is due at the start of the following class unless otherwise noted. The assignments will be frequent, but each should be fairly short.
You are encouraged to work in groups, but the work you turn in must be your own. Group submission of homework, or turning in copies of the same code or output, is not acceptable. Remember, the only way you actually learn how to write code is to write code.
Unless otherwise specified, assignments should be turned in as PDF documents via the bCourses site.
####Class Participation
The class participation portion of the grade can be satisfied in one or more of the following ways:
- attending the lecture and section
- asking and answering questions in class
- contributing to class discussion through the bCourses site, and/or
- collaborating with the campus computing community, either by attending a D-Lab or BIDS workshop, submitting a pull request to a campus GitHub repository (including the class repository), answering a question on StackExchange, or other involvement in the computational social science / digital humanities community.
####Final Project
The final project consists of using the tools we learned in class on your own data of interest. Students are required to write a short proposal by November 10 (no more than 2 paragraphs) in order to get approval / feedback from the instructors.
On Dec 11 we will have a lightning talk session where students can present their projects in talks of at most 5 minutes.
##4. Class Activities and Materials
####Activities and Guest Speakers
Classes will follow a “workshop” style, combining lecture and lab formats. We envision the class to be as interactive and hands-on as possible, with students programming every session. During the “skills” part of the class, we will be learning how to program in Unix, Python, and R by following a set of course notes with demonstrations.
During the “applications” section, we will follow a similar structure, with the possibility of guest speakers leading the class on specific tools. These guest speakers will be members of the campus computing community who are experts in the tools they are teaching.
####Section
The Friday "lab" section will be an informal session dedicated to helping students with materials from lecture and homework. It will be mostly student-led, so come with questions. If there are no questions, the lab turns into a "hackathon" where groups can work on the assignments together. Attendance is not required but strongly encouraged, especially if you are having difficulty with the assignments.
####Computer Requirements
The software needed for the course is as follows:
- Access to the UNIX command line (e.g., a Mac laptop, a Bash wrapper on Windows)
- Git
- R and RStudio (latest versions)
- A scientific Python distribution including Python, IPython and IPython notebooks, and major packages.
This requires a computer that can handle all this software. Almost any Mac will do the job. Most Windows machines are fine too if they have enough space and memory.
See Install.md for more information. We will be having an InstallFest on September 4 for those students experiencing difficulties downloading and installing the requisite software.
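
One quick way to see whether the command-line tools listed above are reachable is to look them up on your `PATH`. The sketch below is illustrative only: the command names in `REQUIRED_COMMANDS` are assumptions about how each tool is typically invoked (they may differ on your system), RStudio is not checked because it is a desktop application, and Install.md remains the authoritative guide.

```python
import shutil

# Assumed command names for the tools listed above; adjust to your install.
REQUIRED_COMMANDS = ["bash", "git", "R", "python", "ipython"]


def find_missing(commands, which=shutil.which):
    """Return the commands that cannot be found on the PATH.

    `which` defaults to shutil.which and can be swapped out for testing.
    """
    return [cmd for cmd in commands if which(cmd) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_COMMANDS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed commands were found.")
```

A command that is reported as not found may still be installed under a different name (for example `python3`), so treat the output as a starting point for troubleshooting rather than a definitive check.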
####Books and Other Resources
There are no official textbooks for this class. There is a list of topic-specific tutorials and resources in the GitHub repo in the document resources.md. If you find a good online resource, you are encouraged to add it to the document.
##5. Curriculum Outline / Schedule
- Sept 1 - Introduction (InstallFest on Sept 4)
- Sept 8 - Unix, Bash, and Git
- Sept 15 - Python 1 (Basics)
- Sept 22 - Python 2 (Pythonic Code)
- Sept 29 - Python 3 (Information Retrieval)
- Oct 6 - R 1 (Basics)
- Oct 13 - R 2 (Data Analysis)
- Oct 20 - R 3 (Visualization)
- Oct 27 - APIs
- Nov 3 - Web Scraping
- Nov 10 - Text Analysis
- Nov 17 - GeoSpatial (Guest Speaker)
- Nov 24 - AtlasTI (Guest Speaker)
- Dec 1 - Git
- Dec 11 - Final project lightning talks