Syllabus
- Course: LIN386M/CS395T Applied Natural Language Processing
- Semester: Spring 2013
- Location: GAR 1.134
- Jason Baldridge
- office hours: Mon 11:45-12:45, Thur 9:30-11:30
- office: CLA 4.738
- email: [email protected]
Graduate standing. Students should have significant previous programming experience and they should have taken Introduction to Computational Linguistics, Applied Text Analysis, Natural Language Processing, Information Retrieval, Machine Learning or relevant computer science courses. Discuss your situation with the instructor if you think you might not fulfill these prerequisites.
This page serves as the syllabus for this course.
There is no required course textbook. Readings will primarily come from online sources, including tutorials. Students are encouraged to obtain a copy of the following standard natural language processing book:
Jurafsky, D. and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd Edition). Prentice-Hall, 2008.
For Scala programming, students are encouraged to get the following book:
Horstmann, Cay. Scala for the Impatient.
You can get a free PDF of the first nine chapters from Typesafe.
Additional required readings will be made available for download from the schedule page of the course website.
There will be no midterm or final exam.
There will be 6 homework assignments and a multi-phase course project. Assignments will be posted as the semester progresses. A tentative schedule for the entire semester is posted on the schedule page. Readings and exercises may change up to one week in advance of their due dates.
Natural language processing is an applied arm of the field of computational linguistics, which has experienced significant growth in the last two decades. Some of the most important factors behind this include the use of machine learning techniques, the availability of large (sometimes annotated) corpora (including the web itself), and the availability of relatively cheap and powerful computers. Together, these factors have played a major part in making computational linguistics very relevant in applied settings. In industry, the use of natural language processing techniques is often referred to as text analytics.
The foremost goal of this course is to provide practical exposure to the core techniques and applications of natural language processing. By the end, students will understand the motivations for and capabilities of several core natural language processing and machine learning algorithms and techniques used in text analysis, including:
- regular expressions
- vector space models
- clustering
- classification
- deduplication
- n-gram language models
- topic models
- part-of-speech tagging
- named entity recognition
- PageRank
- label propagation
- dependency parsing
We will show, on a few chosen topics, how natural language processing builds on and uses the fundamental data structures and algorithms presented in this course. In particular, we will discuss:
- authorship attribution
- language identification
- spam detection
- sentiment analysis
- influence
- information extraction
- geolocation
Students will learn to write non-trivial programs for natural language processing that take advantage of existing open source toolkits. The course will involve significant guidance and instruction in software engineering practices and principles, including:
- functional programming
- distributed version control systems (git)
- build systems
- unit testing
- distributed computing (Hadoop)
The course will help prepare students both for jobs in industry and for doing original research that involves natural language processing.
See the course schedule for details.
There are several components to the class grade.
- Project (45%): There will be a group project (2-3 students) that involves building an automated Twitter user that employs natural language processing to inform its actions. There are six stages to this project throughout the semester, plus a demo and a final write-up.
  - Stages 1-6 (20%): These initial stages incrementally build up capabilities for clustering, classifying, and filtering tweets, working toward what is needed for the final project. I will drop your lowest grade, so each of the remaining five stages counts for 4% of your course grade.
  - Demo (5%): In the last week of class, each group will present a demo of their project. This will include a discussion of the choices and resources that were used.
  - Final implementation and write-up (20%): Every group's code will be developed as an open GitHub repository that forks the tshrdlu repository. The write-up will be done as a wiki
- Class assignments (50%): There will be six assignments, but I will drop your lowest grade and take the average of the remaining five. Each of those five assignment grades is worth 10% of your course grade.
- Class participation (5%): Showing up for class, demonstrating preparedness (i.e., doing the readings), and contributing to class discussions. Attendance is required since there will be many in-class exercises and discussions.
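To see how these weights combine, here is a minimal sketch of the arithmetic in Python. It is an illustration only, not the script used for grading. The function names and the example scores are hypothetical, and it assumes that the demo, final write-up, and participation are each scored on a 0-100 scale, which this syllabus does not specify.

```python
def drop_lowest_mean(scores):
    """Average the scores after discarding the single lowest one."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop one")
    return (sum(scores) - min(scores)) / (len(scores) - 1)


def course_grade(assignments, stages, demo, final, participation):
    """Combine component scores (assumed 0-100) using the syllabus weights."""
    return (
        0.50 * drop_lowest_mean(assignments)  # six assignments, lowest dropped
        + 0.20 * drop_lowest_mean(stages)     # project stages 1-6, lowest dropped
        + 0.05 * demo                         # project demo
        + 0.20 * final                        # final implementation and write-up
        + 0.05 * participation                # class participation
    )


# Hypothetical scores: one missed assignment and one weak project stage.
example = course_grade(
    assignments=[95, 85, 100, 95, 0, 95],
    stages=[95, 95, 85, 95, 75, 95],
    demo=90,
    final=88,
    participation=100,
)
print(round(example, 2))
```

In this example the missed assignment (0) and the weakest stage (75) are the dropped grades, so neither lowers the result.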
Grading scale for homeworks and project stages 1-6: I will not be able to give detailed feedback on these. Instead, I will check them by running automated scripts, reading your written answers, and inspecting key portions of the code. Based on that, you will receive one of the following scores.
- Score = 100: impressive mastery (i.e., you went above and beyond what was asked for in the assignment).
- Score = 95: satisfactory completion (i.e., everything or nearly everything was correct).
- Score = 85: adequate completion (i.e., most of the answers were correct, but there were some missing or incorrect answers).
- Score = 75: significant deficiencies (i.e., you missed some portions of the assignment and/or many of the answers were incorrect).
- Score = 65: serious deficiencies (i.e., you missed significant portions of the assignment or a significant number of the answers were incorrect).
- Score = 0: no credit (e.g., you failed to turn in the assignment).
The demo and the final project implementation and write-up will be graded using rubrics that will be made available to you before the due dates.
Homework must be turned in on the due date in order to receive credit. Late homework will be accepted only under exceptional circumstances (e.g., medical or family emergency) and at the discretion of the instructor (i.e., exceptional denotes a rare event). This policy allowing for exceptional circumstances is not a right, but a privilege and courtesy to be used when needed and not abused. Should you encounter such circumstances, simply email the assignment to the instructor and note "late submission due to exceptional circumstances". You do not need to provide any further justification or personally revealing information regarding the details.
You are encouraged to discuss assignments with classmates, but all written submissions must reflect your own, original work. If in doubt, ask the instructor. Acts like plagiarism represent a serious violation of UT's Honor Code and standards of conduct:
http://deanofstudents.utexas.edu/sjs/scholdis_plagiarism.php
http://deanofstudents.utexas.edu/sjs/conduct.php
Students who violate University rules on academic dishonesty are subject to severe disciplinary penalties, such as automatically failing the course and potentially being dismissed from the University. Don't risk it. Because honor code violations ultimately harm you as well as other students and the integrity of the University, policies on academic honesty will be strictly enforced.
For further information please visit the Student Judicial Services Web site: http://deanofstudents.utexas.edu/sjs.
The University of Texas at Austin provides appropriate accommodations for qualified students with disabilities. To determine if you qualify, please contact the Dean of Students at 512-471-6529 or UT Services for Students with Disabilities. If they certify your needs, we will work with you to make appropriate arrangements.
UT SSD Website: http://www.utexas.edu/diversity/ddce/ssd
A student who misses an examination, work assignment, or other project due to the observance of a religious holy day will be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified the instructor. It is the policy of the University of Texas at Austin that the student must notify the instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holy days that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. The student will not be penalized for these excused absences, but the instructor may appropriately respond if the student fails to complete satisfactorily the missed assignment or examination within a reasonable time after the excused absence.