Data Science 101

We formed a study group to discuss and learn data science. We meet every other day to work on Kaggle problems, covering the theoretical and practical knowledge we need to prepare for data science internships.

Titanic
- Kaggle Dataset kernel
- Traditional machine learning
- Data preprocessing and data visualization
- Kaggle Link

Movie Sentiment Analysis
- Text preprocessing (removal of punctuation and HTML tags)
- Creation of the dataset as a properly formatted CSV file
- Tokenization with NLTK for use with TfidfVectorizer
- Machine Learning algorithms
  - Logistic Regression
  - Support Vector Machine
  - Naive Bayes
  - KNN
  - Perceptron, MLP
  - GridSearchCV hyperparameter tuning for all of the above
- Visualization
  - Word cloud (top unigrams, bigrams, trigrams)
  - Confusion matrix
  - ROC curve with AUC
  - Histogram of top negative/positive features
- Corpus creation for word embeddings
- Word2vec word embeddings with gensim
- Dataset
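
The first and last steps of the list above (stripping punctuation and HTML tags, then finding the top unigrams, bigrams, and trigrams that would feed a word cloud) can be sketched with the standard library alone. This is a minimal illustration, not the code used in this repository: the function names `clean_review` and `top_ngrams` and the two sample reviews are made up for the example, and plain whitespace splitting stands in for NLTK tokenization.

```python
import html
import re
import string
from collections import Counter

# Matches anything that looks like an HTML tag, e.g. <br /> or <b>.
TAG_RE = re.compile(r"<[^>]+>")
# Maps every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def clean_review(text):
    """Strip HTML tags and punctuation, lowercase, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)      # drop tags first
    text = html.unescape(text)        # turn entities such as &amp; into characters
    text = text.translate(PUNCT_TABLE).lower()
    return " ".join(text.split())


def top_ngrams(docs, n, k=3):
    """Return the k most frequent n-grams across docs as (ngram, count) pairs."""
    counts = Counter()
    for doc in docs:
        tokens = clean_review(doc).split()  # whitespace split, an assumption
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return [(" ".join(gram), count) for gram, count in counts.most_common(k)]


# Hypothetical sample reviews, not taken from the actual dataset.
reviews = [
    "<br />A great movie, truly a great movie!",
    "Not a great movie... <b>boring</b> &amp; slow.",
]

if __name__ == "__main__":
    for review in reviews:
        print(clean_review(review))
    print(top_ngrams(reviews, n=2, k=2))
    print(top_ngrams(reviews, n=3, k=1))
```

The cleaned strings could then be written to a CSV file or passed to a vectorizer; swapping the whitespace split for an NLTK tokenizer would only change the one line marked as an assumption.
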

Quora Insincere Questions Analysis
- Text preprocessing
- Traditional machine learning
- Deep learning
- Dataset

oya163/DataScience101
