From 012b1246f3e8009bb80058700eb31e867d7952f8 Mon Sep 17 00:00:00 2001
From: SzymonPajzert
Date: Tue, 12 Apr 2016 17:00:27 +0200
Subject: [PATCH] similiar papers on Stack Overflow

---
 .../StackOverflowCrawler.md | 0
 docs/StackOverflowCrawler/similiarPapers.md | 31 +++++++++++++++++++
 2 files changed, 31 insertions(+)
 rename docs/{ => StackOverflowCrawler}/StackOverflowCrawler.md (100%)
 create mode 100644 docs/StackOverflowCrawler/similiarPapers.md

diff --git a/docs/StackOverflowCrawler.md b/docs/StackOverflowCrawler/StackOverflowCrawler.md
similarity index 100%
rename from docs/StackOverflowCrawler.md
rename to docs/StackOverflowCrawler/StackOverflowCrawler.md
diff --git a/docs/StackOverflowCrawler/similiarPapers.md b/docs/StackOverflowCrawler/similiarPapers.md
new file mode 100644
index 0000000..d414da1
--- /dev/null
+++ b/docs/StackOverflowCrawler/similiarPapers.md
@@ -0,0 +1,31 @@
+# List of similar projects and notes
+Most of the papers in this list are taken from:
+http://meta.stackexchange.com/questions/134495/academic-papers-using-stack-exchange-data
+
+### [Mining StackOverflow to Turn the IDE into a Self-Confident Programming Prompter](http://www.inf.usi.ch/phd/ponzanelli/profile/publications/2014b/Ponz2014b.pdf) + [Prompter: A Self-confident Recommender System](http://www.inf.usi.ch/phd/ponzanelli/profile/publications/2014d/Ponz2014d.pdf)
+An IDE plugin that queries for code snippets and retrieves evaluated solutions. Unfortunately, the algorithm uses search engines instead of its own machine learning algorithm.
+
+**Possible project value:** important
+
+### [Predicting Tags for StackOverflow Posts](http://chil.rice.edu/research/pdf/StanleyByrne2013StackOverflow.pdf)
+Predicts tags for a given text with 65% accuracy, using a Bayesian probabilistic model.
+
+**Possible project value:** significant
+
+### [StORMeD: Stack Overflow Ready Made Data](http://www.inf.usi.ch/phd/ponzanelli/profile/publications/2015a/Ponz2015a.pdf)
+A ready-made model and algorithms for mining Stack Overflow data.
+
+**Possible project value:** meagre
+
+### [Mining Questions Asked by Web Developers](http://salt.ece.ubc.ca/publications/docs/kartik-msr14.pdf)
+Unsupervised learning: topic clustering. The data contained questions about HTML5, JavaScript and CSS. The main goal was to divide and label questions using natural language processing and Latent Dirichlet Allocation, a type of statistical modeling that can be used to discover hidden topics in
+a collection of documents, based on the statistics of words in each document.
+
+**Possible project value:** meagre
+
+### [Automatic categorization of questions from Q&A sites](http://lascam.facom.ufu.br/cms/userfiles/downloads/2014/SAC2014CameraReady.pdf)
+Algorithms for classifying Q&A questions. Questions on SO are divided into 3 categories: how-to-do-it, need-to-know, seeking-something. The presented algorithms classify the data with varying efficiency; the best turned out to be Naive Bayes.
+
+Naive Bayes: These classifiers assume that all the attributes are independent and that each contributes equally to the categorization. A category is assigned to a project by combining the contribution of each feature.
+
+**Possible project value:** meagre
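
As a reading aid for the Naive Bayes note in the last entry, here is a minimal sketch of how such a classifier might combine per-word contributions to pick one of the three question categories. It is not taken from the cited paper: the function names (`train`, `classify`), the toy sentences in `EXAMPLES`, and the use of add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data; the real paper uses labelled Stack Overflow questions.
EXAMPLES = [
    ("how do i parse json in python", "how-to-do-it"),
    ("how to sort a list in java", "how-to-do-it"),
    ("why is my loop slower than expected", "need-to-know"),
    ("what is the difference between list and tuple", "need-to-know"),
    ("looking for a good library for charts", "seeking-something"),
    ("any recommended book on algorithms", "seeking-something"),
]


def train(examples):
    """Count how often each category and each (category, word) pair occurs."""
    category_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, category in examples:
        category_counts[category] += 1
        for word in text.lower().split():
            word_counts[category][word] += 1
            vocabulary.add(word)
    return category_counts, word_counts, vocabulary


def classify(text, model):
    """Return the category with the highest log-probability score."""
    category_counts, word_counts, vocabulary = model
    total_examples = sum(category_counts.values())
    scores = {}
    for category, count in category_counts.items():
        # Start from the log prior of the category.
        score = math.log(count / total_examples)
        words_in_category = sum(word_counts[category].values())
        for word in text.lower().split():
            # Independence assumption: each word adds its own log-likelihood.
            # Add-one smoothing (an assumption here) avoids log(0) for unseen words.
            likelihood = (word_counts[category][word] + 1) / (
                words_in_category + len(vocabulary)
            )
            score += math.log(likelihood)
        scores[category] = score
    return max(scores, key=scores.get)


model = train(EXAMPLES)
print(classify("how do i sort a list", model))
print(classify("looking for a library", model))
```

Summing log-likelihoods is the "combining the contribution of each feature" step from the note; working in log space only avoids numeric underflow on longer questions.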