
oya163/DataScience101


Data Science 101

We formed a group to discuss and learn about data science. We meet every other day to work on Kaggle problems, covering the theoretical and practical knowledge we need to prepare for data science internships.

  • Titanic

    • Kaggle Dataset kernel
    • Traditional machine learning
    • Data preprocessing and data visualization
    • Kaggle Link
  • Movie Sentiment Analysis

    • Text preprocessing (removal of punctuation and HTML tags)
    • Creation of the dataset as a properly formatted CSV file
    • Use of NLTK tokenization with TfidfVectorizer
    • Machine Learning algorithms
      • Logistic Regression
      • Support Vector Machine
      • Naive Bayes
      • KNN
      • Perceptron, MLP
      • GridSearchCV for all of the above
    • Visualization
      • Word cloud (top unigrams, bigrams, trigrams)
      • Confusion matrix
      • ROC curve and AUC
      • Histogram of top negative/positive features
    • Corpus creation for word embeddings
    • Use of gensim for word2vec word embeddings
    • Dataset
  • Quora Insincere Questions Analysis

    • Text preprocessing
    • Traditional machine learning
    • Deep learning
    • Dataset
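For the Titanic data preprocessing, a minimal sketch of the usual first steps is shown below: missing `Age` values are imputed with the median, missing `Embarked` values with the most frequent port, and `Sex`/`Embarked` are encoded as numbers. The column names follow the Kaggle Titanic `train.csv`; the inline sample rows, the numeric codes and the helper name `preprocess` are hypothetical, and the group's kernel may do this differently (for example with pandas).

```python
import csv
import io
from statistics import median

# Hypothetical sample rows using (a subset of) the Kaggle Titanic column names.
SAMPLE = """PassengerId,Survived,Pclass,Sex,Age,Fare,Embarked
1,0,3,male,22,7.25,S
2,1,1,female,38,71.2833,C
3,1,3,female,,7.925,S
4,0,3,male,35,8.05,
"""

def preprocess(text):
    """Hypothetical helper: impute missing values and encode categoricals."""
    rows = list(csv.DictReader(io.StringIO(text)))
    age_median = median(float(r["Age"]) for r in rows if r["Age"])
    # Most frequent port of embarkation, used to fill missing Embarked values.
    ports = [r["Embarked"] for r in rows if r["Embarked"]]
    port_mode = max(set(ports), key=ports.count)
    port_codes = {"S": 0, "C": 1, "Q": 2}  # assumed integer encoding
    features, labels = [], []
    for r in rows:
        age = float(r["Age"]) if r["Age"] else age_median
        sex = 1 if r["Sex"] == "female" else 0
        port = port_codes[r["Embarked"] or port_mode]
        features.append([int(r["Pclass"]), sex, age, float(r["Fare"]), port])
        labels.append(int(r["Survived"]))
    return features, labels

X, y = preprocess(SAMPLE)
print(X[2])  # [3, 1, 35.0, 7.925, 0]  (Age imputed with the median of 22, 38, 35)
```

The resulting numeric feature rows are the kind of input the traditional machine learning models above expect.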
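The movie-review preprocessing (removing HTML tags and punctuation, then writing a CSV) can be sketched with the standard library alone. This is an illustration under assumptions, not the group's script: the helper names `clean_review`/`write_csv`, the `review,sentiment` header and the 1/0 labels are hypothetical.

```python
import csv
import html
import io
import re
import string

TAG_RE = re.compile(r"<[^>]+>")                     # matches tags such as <br />
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_review(text):
    """Strip HTML tags, unescape entities, drop punctuation, lowercase."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)                      # e.g. "&amp;" -> "&"
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.lower().split())           # collapse whitespace

def write_csv(reviews, out):
    """Write cleaned (text, label) pairs as CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(["review", "sentiment"])
    for text, label in reviews:
        writer.writerow([clean_review(text), label])

buf = io.StringIO()
write_csv([("Great film!<br /><br />Loved it.", 1),
           ("Boring &amp; far too long...", 0)], buf)
print(buf.getvalue())
```

Writing to an `io.StringIO` keeps the example self-contained; in practice the same `write_csv` could receive a file opened with `open(path, "w", newline="")`.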
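With scikit-learn, combining NLTK and TF-IDF is typically written along the lines of `TfidfVectorizer(tokenizer=nltk.word_tokenize)`. To show what that vectorizer computes, the sketch below spells out its default weighting (raw term counts, smoothed idf `ln((1 + n) / (1 + df)) + 1`, then L2-normalised rows) using only the standard library. The regex `tokenize` function is a hypothetical stand-in for the NLTK tokenizer, and `tfidf` is an illustrative helper, not scikit-learn's code.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Stand-in for an NLTK tokenizer: lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def tfidf(docs):
    """Return (vocabulary, rows) with smoothed idf and L2-normalised rows."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    # Smoothed idf, matching scikit-learn's default smooth_idf=True.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows

vocab, rows = tfidf(["a good movie", "a bad movie", "good good acting"])
print(vocab)  # ['a', 'acting', 'bad', 'good', 'movie']
```

One difference worth noting: scikit-learn's default `token_pattern` ignores single-character tokens such as "a", whereas this stand-in tokenizer keeps them.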
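Naive Bayes is one of the algorithms listed above. The README does not show which implementation was used, so as an illustration the sketch below implements multinomial Naive Bayes with additive (Laplace) smoothing over whitespace tokens using only the standard library. The toy reviews, the `alpha` default and the helper names `train_nb`/`predict_nb` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes: log priors and smoothed log likelihoods per class."""
    vocab = {w for d in docs for w in d.split()}
    word_counts = defaultdict(Counter)
    class_docs = Counter(labels)
    for d, y in zip(docs, labels):
        word_counts[y].update(d.split())
    model = {}
    for y in class_docs:
        total = sum(word_counts[y].values())
        log_prior = math.log(class_docs[y] / len(docs))
        log_lik = {
            w: math.log((word_counts[y][w] + alpha) / (total + alpha * len(vocab)))
            for w in vocab
        }
        model[y] = (log_prior, log_lik)
    return model

def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen words are skipped."""
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(log_lik[w] for w in doc.split() if w in log_lik)
    return max(model, key=score)

docs = ["great fun great cast", "loved it", "dull plot", "boring and dull"]
labels = [1, 1, 0, 0]
model = train_nb(docs, labels)
print(predict_nb(model, "great plot"))  # 1
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.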
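`GridSearchCV` tries every combination in a parameter grid and keeps the one with the best cross-validated score. The sketch below reproduces that idea for a tiny hand-written k-NN classifier (KNN is also in the list), tuning the neighbour count `k` and the Minkowski order `p`. It is a simplification under stated assumptions: three interleaved folds without shuffling or stratification, accuracy as the score, and hypothetical helper names and toy data. It is not scikit-learn's implementation.

```python
from itertools import product
from statistics import mean

def knn_predict(train_X, train_y, x, k, p):
    """k-nearest neighbours with Minkowski distance of order p; majority vote."""
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(xi, x)) ** (1 / p), yi)
        for xi, yi in zip(train_X, train_y)
    )
    votes = [yi for _, yi in dists[:k]]
    return max(set(votes), key=votes.count)

def cross_val_accuracy(X, y, params, folds=3):
    """Mean accuracy over `folds` interleaved splits (no shuffling)."""
    scores = []
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        tr_X = [x for i, x in enumerate(X) if i not in test_idx]
        tr_y = [t for i, t in enumerate(y) if i not in test_idx]
        hits = [knn_predict(tr_X, tr_y, X[i], **params) == y[i] for i in sorted(test_idx)]
        scores.append(sum(hits) / len(hits))
    return mean(scores)

def grid_search(X, y, grid):
    """Evaluate every parameter combination; return (best_params, best_score)."""
    names = sorted(grid)
    best = None
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cross_val_accuracy(X, y, params)
        if best is None or score > best[1]:  # first combination wins ties
            best = (params, score)
    return best

X = [[0.0], [0.2], [0.4], [0.6], [2.0], [2.2], [2.4], [2.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
best_params, best_score = grid_search(X, y, {"k": [1, 3, 5], "p": [1, 2]})
print(best_params, best_score)
```

The same loop structure applies to any of the listed models: only the prediction function and the grid change.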
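The confusion matrix and ROC/AUC visualizations rest on a few simple counts. As an illustrative sketch (standard library only, hypothetical helper names and toy scores), the snippet below builds a 2x2 confusion matrix in the `[[TN, FP], [FN, TP]]` layout that scikit-learn uses, traces ROC points by lowering the decision threshold through each distinct score, and integrates them with the trapezoidal rule.

```python
def confusion_matrix(y_true, y_pred):
    """2x2 matrix [[TN, FP], [FN, TP]] for 0/1 labels (rows: true, cols: predicted)."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def roc_points(y_true, scores):
    """(FPR, TPR) pairs, lowering the threshold through each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
print(confusion_matrix(y_true, y_pred))  # [[2, 0], [1, 1]]
print(auc(roc_points(y_true, scores)))   # 0.75
```

The ROC points are exactly what a ROC plot draws; the AUC summarizes them as the probability that a random positive review is scored above a random negative one.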
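A word cloud of top unigrams, bigrams and trigrams needs n-gram frequencies. The sketch below counts word n-grams with `collections.Counter`; the helper name `top_ngrams` and the toy reviews are hypothetical. A frequency dictionary like this could then be passed, for example, to the `wordcloud` package's `WordCloud().generate_from_frequencies(...)`.

```python
from collections import Counter

def top_ngrams(docs, n, k):
    """Return the k most frequent word n-grams as (space-joined n-gram, count) pairs."""
    counts = Counter()
    for doc in docs:
        words = doc.split()
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts.most_common(k)

reviews = ["one of the best films", "the best film of the year", "one of the worst"]
for n in (1, 2, 3):  # unigrams, bigrams, trigrams
    print(n, top_ngrams(reviews, n, 3))
```

In practice stop words such as "the" and "of" are usually removed first, otherwise they dominate the cloud.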
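For the word-embedding corpus, word2vec training expects an iterable of sentences, each a list of tokens. The sketch below builds such a corpus with a naive regex sentence splitter and tokenizer; both, and the helper name `build_corpus`, are assumptions standing in for whatever the group actually used (NLTK's `sent_tokenize` and `word_tokenize` would be a common alternative).

```python
import re

def build_corpus(docs):
    """Split each document into sentences, then each sentence into lowercase tokens."""
    corpus = []
    for doc in docs:
        for sentence in re.split(r"[.!?]+", doc):
            tokens = re.findall(r"[a-z']+", sentence.lower())
            if tokens:  # skip empty fragments left after the final punctuation
                corpus.append(tokens)
    return corpus

corpus = build_corpus(["The plot was thin. The acting, however, was superb!"])
print(corpus)
# [['the', 'plot', 'was', 'thin'], ['the', 'acting', 'however', 'was', 'superb']]
```

With gensim 4.x installed, a corpus in this shape could be used as `Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)` (imported from `gensim.models`), after which a word vector is available via `model.wv["plot"]`.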
