saffsd/kaggle-stackoverflow2012
Code submission for the Kaggle Stack Overflow challenge.

Initial development was based on http://fastml.com/predicting-closed-questions-on-stack-overflow/ and some of the code is derived from there. The data extraction (data2vw.py) was rewritten from scratch, integrating the two scripts from the above post. I also automated the whole process using a Makefile, and introduced cross-validation implemented entirely using shell tools (GNU parallel was very useful).

I generated an expanded set of features, based on segmenting the documents into code and non-code sections. The non-code section was further segmented into sentences, and then into words (using NLTK). I also looked at some metrics about the user, as well as some aspects of the sentence structure, such as the number of questions, exclamations, etc.

Marco Lui <[email protected]>, October-November 2012
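As a rough illustration of the feature extraction described above, here is a minimal sketch; it is not the code in data2vw.py. It assumes that code sections are lines indented by four spaces or a tab (Markdown style), and it uses a naive regex sentence splitter as a stand-in for NLTK so that it runs without extra downloads. The function names, feature names, and the namespace `f` are all hypothetical. The last helper writes the counts as one line in Vowpal Wabbit's `label |namespace feature:value` text format.

```python
import re


def split_code_and_text(body):
    """Split a post into code and non-code parts.

    Assumption: a line indented by four spaces or a tab is code.
    """
    code, text = [], []
    for line in body.splitlines():
        if line.startswith("    ") or line.startswith("\t"):
            code.append(line)
        else:
            text.append(line)
    return "\n".join(code), "\n".join(text)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    A stand-in for a proper sentence tokenizer such as NLTK's.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extract_features(body):
    """Count a few structural properties of a post (hypothetical feature names)."""
    code, text = split_code_and_text(body)
    sentences = split_sentences(text)
    words = re.findall(r"[A-Za-z0-9']+", text)
    return {
        "code_lines": sum(1 for line in code.splitlines() if line.strip()),
        "sentences": len(sentences),
        "words": len(words),
        "questions": sum(1 for s in sentences if s.endswith("?")),
        "exclamations": sum(1 for s in sentences if s.endswith("!")),
    }


def to_vw_line(label, features, namespace="f"):
    """Format features as a single Vowpal Wabbit input line."""
    feats = " ".join(f"{name}:{value}" for name, value in sorted(features.items()))
    return f"{label} |{namespace} {feats}"


if __name__ == "__main__":
    post = "How do I sort a list? It fails!\n\n    xs.sort()\n    print(xs)\n\nThanks."
    print(to_vw_line(1, extract_features(post)))
```

Sorting the feature names keeps the output stable between runs, which makes the generated files easier to diff when features are added or removed.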
About
My entry to the Kaggle 2012 Stack Overflow competition. Ranked 10th on the final public leaderboard.