CREDITS: All corresponding resources
MOTIVATION: This repository was created to help upcoming aspirants and others in the data science field
Business understanding
1.Data collection
Data consists of 3 kinds
a.Structured data (tabular data,etc...)
b.Unstructured data (images,text,audio,etc...)
c.Semi-structured data (XML,JSON,etc...)
variable
a.qualitative (nominal,ordinal,binary)
b.quantitative(discrete,continuous)
a.Web scraping best article to refer-https://towardsdatascience.com/choose-the-best-python-web-scraping-library-for-your-application-91a68bc81c4f
https://www.bigdatanews.datasciencecentral.com/profiles/blogs/top-30-free-web-scraping-software
https://medium.com/analytics-vidhya/master-web-scraping-completly-from-zero-to-hero-38051423256b
1.Beautifulsoup
2.Scrapy
3.Selenium
4.Requests to access data
5.AUTOSCRAPER - https://github.com/alirezamika/autoscraper
webbot https://pypi.org/project/webbot/
6.Twitter scraping tool (twint or tweepy)-https://github.com/twintproject/twint
https://analyticsindiamag.com/complete-tutorial-on-twint-twitter-scraping-without-twitters-api/
https://developer.twitter.com/en/docs
Scraping Instagram -instaloader https://thecleverprogrammer.com/2020/07/30/scraping-instagram-with-python/
Scrape Wikipedia wikipedia
Web Scraping to Create a CSV File https://thecleverprogrammer.com/2020/08/08/web-scraping-to-create-csv/
7.urllib
8.pattern
9.Octoparse Easy Web Scraping https://www.octoparse.com/
ParseHub https://www.parsehub.com/ https://analyticsindiamag.com/parsehub-no-code-gui-based-web-scraping-tool/
Diffbot https://analyticsindiamag.com/diffbot/
Trustpilot
lxml https://lxml.de/index.html#introduction
ScrapingBee https://analyticsindiamag.com/scrapingbee-api/
MechanicalSoup https://analyticsindiamag.com/mechanicalsoup-web-scraping-custom-dataset-tutorial/
Scrape HTML tables https://www.youtube.com/watch?v=6U5xJ3mXRKA&feature=youtu.be
patang (extract product details) https://github.com/tejazz/patang
pandas(read_html)
https://analyticsindiamag.com/complete-learning-path-to-web-scraping-with-all-major-tools/
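As a small illustration of the "Scrape HTML tables" idea above, the sketch below parses a table using only the Python standard library. The HTML snippet, the `TableParser` class, and the column names are made up for this example; in practice one of the libraries listed above (Beautifulsoup, lxml, pandas(read_html)) would usually do this work.

```python
from html.parser import HTMLParser

# Hypothetical inline page; a real scraper would fetch this text over HTTP.
SAMPLE_HTML = """
<table>
  <tr><th>name</th><th>price</th></tr>
  <tr><td>apple</td><td>1.20</td></tr>
  <tr><td>pear</td><td>0.95</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()


parser = TableParser()
parser.feed(SAMPLE_HTML)

# First row is the header; the remaining rows become dictionaries.
header, *records = parser.rows
table = [dict(zip(header, record)) for record in records]
print(table)
```

The list of dictionaries can then be written to a CSV file with the standard `csv.DictWriter`.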
b.Web Crawling
https://python.libhunt.com/scrapy-alternatives
b.3rd party APIs
c.Creating own data (manual collection, e.g. Google Docs, surveys, etc...) primary data
d.Databases
Databases are of 2 kinds: SQL and NoSQL databases
SQL,SQLite,MySQL,MongoDB,Hadoop,Elasticsearch,Cassandra,Amazon S3,Hive,Google Bigtable,AWS DynamoDB,HBase,Oracle DB
sql in python https://medium.com/jbennetcodes/how-to-rewrite-your-sql-queries-in-pandas-and-more-149d341fc53e
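A minimal sketch of querying a SQL database from Python, using the built-in `sqlite3` module and an in-memory database. The `sales` table and its rows are invented for illustration only.

```python
import sqlite3

# An in-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5)],
)

# Aggregate with plain SQL and fetch the result as a list of tuples.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
print(totals)
```

The same connection pattern applies to other SQL databases through their own driver packages; only the `connect` call differs.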
Cloud AI Data labeling service https://cloud.google.com/ai-platform/data-labeling/docs?utm_source=youtube&utm_medium=Unpaidsocial&utm_campaign=guo-20200503-Data-Labeling
e.Online resources - ultimate resource https://datasetsearch.research.google.com/
1)kaggle-https://www.kaggle.com/datasets
Downloading Kaggle datasets directly into Google Colab -https://towardsdatascience.com/downloading-kaggle-datasets-directly-into-google-colab-c8f0f407d73a
2)movielens-https://grouplens.org/datasets/movielens/latest/
3)data.gov.in-https://data.gov.in/
4)uci-https://archive.ics.uci.edu/ml/datasets.php https://github.com/tirthajyoti/UCI-ML-API
5)Group Lens dataset https://grouplens.org/
Wikipedia ML Datasets https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
6)data.world https://data.world/ , World Bank
7)Google Cloud BigQuery public datasets
Google Public Datasets-cloud.google.com/bigquery/public-data/
Google Cloud Data Catalog https://cloud.google.com/data-catalog
Academic Torrents-https://academictorrents.com/check.htm?returnto=%2Fbrowse.php
8)online hackathons
9)image data from google_images_download
https://www.visualdata.io/discovery
http://xviewdataset.org/#dataset
https://ai.googleblog.com/2016/09/introducing-open-images-dataset.html
10)image data from Bing_Search
image data from simple_image_download https://github.com/RiddlerQ/simple_image_download
11)https://www.columnfivemedia.com/100-best-free-data-sources-infographic
12)Reddit:https://lnkd.in/dv5UCD4 https://www.reddit.com/r/datasets/
13)https://datasets.bifrost.ai/?ref=producthunt
14)data.world:https://lnkd.in/gEK897K
15)https://data.world/datasets/open-data
https://tinyletter.com/data-is-plural
16)FiveThirtyEight :- https://lnkd.in/gyh-HDj , https://data.fivethirtyeight.com/
17)BuzzFeed :- https://lnkd.in/gzPWyHj
Buzzfeed News -github.com/BuzzFeedNews
Socrata - https://opendata.socrata.com/
18)Google public datasets :- https://lnkd.in/g5dH8qE
Statistics Canada https://www.statcan.gc.ca/eng/start https://towardsdatascience.com/how-to-collect-data-from-statistics-canada-using-python-db8a81ce6475
19)Quandl :- https://www.quandl.com stock data
statista : https://www.statista.com/ stock data
20)Socrata Open Data :- https://lnkd.in/gea7JMz
21)Academic Torrents :- https://lnkd.in/g-Ur9Xy
22)Image labelling tools (labelme, labelImg):- https://github.com/wkentaro/labelme , https://github.com/tzutalin/labelImg
Labelbox-https://labelbox.com/
Playment-https://playment.io/
SuperAnnotate -https://www.superannotate.com/
CVAT-https://github.com/openvinotoolkit/cvat
Lionbridge- https://lionbridge.ai/
LinkedAI: A No-code Data Annotations- https://analyticsindiamag.com/linkedai/
Dataturks
V7 Darwin The Rapid Image Annotator https://docs.v7labs.com/docs/loading-a-dataset-in-python https://github.com/v7labs/darwin-py#usage-as-a-python-library
https://waliamrinal.medium.com/top-and-easy-to-use-open-source-image-labelling-tools-for-machine-learning-projects-ffd9d5af4a20
https://github.com/heartexlabs/awesome-data-labeling
Label a Dataset with a Few Lines of Code https://eric-landau.medium.com/label-a-dataset-with-a-few-lines-of-code-45c140ff119d
https://analyticsindiamag.com/complete-guide-to-data-labelling-tools/
23)tensorflow_datasets as tfds https://www.tensorflow.org/datasets (import tensorflow_datasets as tfds)
https://lionbridge.ai/datasets/tensorflow-datasets-machine-learning/
24)https://datasets.bifrost.ai/?ref=producthunt
25)https://ourworldindata.org/
26)https://data.worldbank.org/
27)google open images:https://storage.googleapis.com/openimages/web/download.html
https://cloud.google.com/bigquery/public-data/ https://towardsdatascience.com/bigquery-public-datasets-936e1c50e6bc
28)https://data.gov.in/
29)imagenet dataset-http://www.image-net.org/
30)https://parulpandey.com/2020/08/09/getting-datasets-for-data-analysis-tasks%e2%80%8a-%e2%80%8aadvanced-google-search/
31)https://storage.googleapis.com/openimages/web/index.html ,
https://storage.googleapis.com/openimages/web/visualizer/index.html?set=train&type=segmentation&r=false&c=%2Fm%2F09qck
https://console.cloud.google.com/marketplace/browse?filter=solution-type:dataset&_ga=2.35328417.1459465882.1589693499-869920574.1589693499
https://catalog.data.gov/dataset?groups=education2168#topic=education_navigation
https://vincentarelbundock.github.io/Rdatasets/datasets.html
32)coco dataset https://cocodataset.org/#explore
33)huggingface datasets-https://github.com/huggingface/datasets https://huggingface.co/datasets https://huggingface.co/languages
pip install datasets
34)Big Bad NLP Database-https://datasets.quantumstat.com/
https://github.com/niderhoff/nlp-datasets
nlp-datasets https://github.com/karthikncode/nlp-datasets
https://analyticsindiamag.com/15-most-important-nlp-datasets/ https://medium.com/ai-in-plain-english/25-free-datasets-for-natural-language-processing-57e407402c60
35)https://www.edureka.co/blog/25-best-free-datasets-machine-learning/
36)bigquery public dataset ,Google Public Data Explorer
https://cloud.google.com/public-datasets
37)inbuilt library data eg:iris dataset,mnist dataset,etc...
pandas-datareader https://github.com/pydata/pandas-datareader
tf.data.Dataset for TensorFlow Datasets
38)https://data.gov.sg/ https://data.gov.au/ https://data.europa.eu/euodp/en/data https://data.govt.nz/
data.gov.be ,data.egov.bg/ ,data.gov.cz/english ,portal.opendata.dk,govdata.de,opendata.riik.ee,data.gov.ie,data.gov.gr,datos.gob.es,data.gouv.fr,data.gov.hr
dati.gov.it,data.gov.cy,opendata.gov.lt,data.gov.lv,data.public.lu,data.gov.mt,data.overheid.nl,data.gv.at,danepubliczne.gov.pl,dados.gov.pt,data.gov.ro,podatki.gov.si
data.gov.sk,avoindata.fi,oppnadata.se,https://data.adb.org/ ,https://data.iadb.org/ ,https://www.weforum.org/agenda/2018/03/latin-america-smart-cities-big-data/
https://data.fivethirtyeight.com/ , https://wiki.dbpedia.org/ ,https://www.europeandataportal.eu/en ,https://data.europa.eu/ ,https://www.census.gov/,
https://www.who.int/data/gho ,https://data.unicef.org/open-data/ ,http://data.un.org/ ,https://data.oecd.org/ ,https://data.worldbank.org/
39.Awesome Public Dataset- https://github.com/awesomedata/awesome-public-datasets
https://github.com/the-pudding/data
datasets https://github.com/benedekrozemberczki/datasets
kdnuggets https://www.kdnuggets.com/datasets/index.html
Hub https://github.com/activeloopai/Hub
40.Datasets for Machine Learning on Graphs-https://ogb.stanford.edu/
41.https://www.johnsnowlabs.com/data/
42.30 largest tensorflow datasets-https://lionbridge.ai/datasets/tensorflow-datasets-machine-learning/
43. coco dataset-https://cocodataset.org/#home
Google Open images-https://opensource.google/projects/open-images-dataset https://storage.googleapis.com/openimages/web/index.html
50+ Object Detection Datasets-https://medium.com/towards-artificial-intelligence/50-object-detection-datasets-from-different-industry-domains-1a53342ae13d
70+ Image Classification Datasets from different Industry domains-https://medium.com/towards-artificial-intelligence/70-image-classification-datasets-from-different-industry-domains-part-2-cd1af6e48eda
bifrost- https://datasets.bifrost.ai/
https://public.roboflow.com/
https://www.visualdata.io/discovery http://www.image-net.org/ https://www.cs.toronto.edu/~kriz/cifar.html
tensorflow_datasets.object_detection - https://storage.googleapis.com/openimages/web/index.html
https://github.com/google-research-datasets/Objectron/ https://ai.googleblog.com/2020/11/announcing-objectron-dataset.html?m=1
http://idd.insaan.iiit.ac.in/ http://database.mmsp-kn.de/koniq-10k-database.html
https://www.visualdata.io/discovery https://blogs.bing.com/maps/2019-03/microsoft-releases-12-million-canadian-building-footprints-as-open-data
https://blogs.bing.com/maps/2019-09/microsoft-releases-18M-building-footprints-in-uganda-and-tanzania-to-enable-ai-assisted-mapping
https://datasets.bifrost.ai/ https://storage.googleapis.com/openimages/web/download.html https://computervisiononline.com/datasets http://yacvid.hayko.at/
https://www.cogitotech.com/use-cases/biodiversity/
ImageNet data -http://image-net.org/
ApolloScape Dataset-http://apolloscape.auto/
https://github.com/chrieke/awesome-satellite-imagery-datasets
44.https://github.com/fivethirtyeight/data
45.Recommender Systems Datasets-https://cseweb.ucsd.edu/~jmcauley/datasets.html
46.indiadataportal-https://indiadataportal.com/
47.US Government Open Dataset: https://www.data.gov/
https://censusreporter.org/ https://data.census.gov/cedsci/
48.AWS Public Data Sets:https://registry.opendata.aws/ https://aws.amazon.com/opendata/?wwps-cards.sort-by=item.additionalFields.sortDate&wwps-cards.sort-order=desc
49.https://the-eye.eu/public/AI/pile_preliminary_components/
Reddit -https://www.reddit.com/r/datasets/
wikipedia-https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
http://opendata.cern.ch/ , https://www.imf.org/en/Data
Global Health Observatory data repository-https://apps.who.int/gho/data/node.main
CERN Open Data Portal-http://opendata.cern.ch/
TensorFlow Datasets https://www.tensorflow.org/datasets
50.openblender- https://www.openblender.io/#/welcome
51.Top 10 Datasets For Cybersecurity Projects- https://analyticsindiamag.com/top-10-datasets-for-cybersecurity-projects/
52.Datasets from Web Crawl Data (nlp)-http://data.statmt.org/cc-100/
53.https://www.springboard.com/blog/free-public-data-sets-data-science-project/
54.NASA - https://nasa.github.io/data-nasa-gov-frontpage/ace
55.Academic Torrents,GitHub Datasets,CERN Open Data Portal,Global Health Observatory Data Repository
56.32 Data Sets to Uplift your Skills in Data Science-https://blog.datasciencedojo.com/data-sets-data-science-skills/?utm_content=144243072&utm_medium=social&utm_source=linkedin&hss_channel=lcp-3740012
57.OpenDaL-https://opendatalibrary.com/
Data Is Plural-https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4juclhjFgqIY8fQFMemwKL2c64vk/edit#gid=0
VisualData-https://www.visualdata.io/discovery
https://medium.com/towards-artificial-intelligence/best-datasets-for-machine-learning-data-science-computer-vision-nlp-ai-c9541058cf4f
58.Pandas Data Reader-https://pandas-datareader.readthedocs.io/en/latest/remote_data.html
59.ieee-dataport-https://ieee-dataport.org/datasets
https://github.com/neomatrix369/awesome-ai-ml-dl/blob/master/data/datasets.md#datasets-and-sources-of-raw-data
60.Faker is a Python package that generates fake data-https://github.com/joke2k/faker
Synthetic data library https://github.com/finos/datahub https://github.com/agmmnn/awesome-blender https://opendata.blender.org/ https://www.youtube.com/watch?v=eZwOeBkLL8E
61.Text Data Annotator Tool - Datasaur https://datasaur.ai/
62.Google Analytics cost data import https://segmentstream.com/google-analytics?utm_source=twitter&utm_medium=cpc&utm_campaign=ga_costs_import_en&utm_content=guide
63.https://lionbridge.ai/services/crowdsourcing/ https://lionbridge.ai/ https://www.clickworker.com/ https://appen.com/ https://www.globalme.net/
64.Azure Open Datasets https://azure.microsoft.com/en-us/services/open-datasets/ https://azure.microsoft.com/en-in/services/open-datasets/catalog/
Yelp Open Dataset https://www.yelp.com/dataset
https://data.world/
ODK Open Data Kit- https://getodk.org/
World Bank Open Data https://data.worldbank.org/
https://analyticsindiamag.com/10-biggest-data-breaches-that-made-headlines-in-2020/
https://data.mendeley.com/
https://github.com/iamtekson/geospatial-data-download-sites
https://eugeneyan.com/writing/data-discovery-platforms/
65.https://medium.com/towards-artificial-intelligence/best-datasets-for-machine-learning-data-science-computer-vision-nlp-ai-c9541058cf4f
https://towardsdatascience.com/data-repositories-for-almost-every-type-of-data-science-project-7aa2f98128b
https://github.com/MTG/freesound-datasets
https://dataform.co/
https://github.com/rfordatascience/tidytuesday https://www.youtube.com/watch?v=vCBeGLpvoYM
https://www.analyticsvidhya.com/blog/2020/12/top-15-datasets-of-2020-that-every-data-scientist-should-add-to-their-portfolio/?utm_source=linkedin&utm_medium=AV|link|high-performance-blog|blogs|44181|0.375
https://cseweb.ucsd.edu/~jmcauley/datasets.html
66.https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
https://archive.org/details/datasets
https://commoncrawl.org/
https://www.youtube.com/watch?v=1aUt8zAG09E
67.yfinance for finance data using https://github.com/ranaroussi/yfinance
import yfinance as yf
https://www.analyticsvidhya.com/blog/2021/01/bear-run-or-bull-run-can-reinforcement-learning-help-in-automated-trading/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+AnalyticsVidhya+%28Analytics+Vidhya%29
Downloading Historical Stock prices with Alpha Vantage https://medium.com/towards-artificial-intelligence/downloading-historical-stock-prices-with-alpha-vantage-688edad46a6d
Get Financial Data Directly into Python https://www.quandl.com/tools/python
openml https://www.openml.org/search?type=data
https://registry.opendata.aws/
voice_datasets https://github.com/jim-schwoebel/voice_datasets
Dynamically-Generated-Hate-Speech-Dataset https://github.com/bvidgen/Dynamically-Generated-Hate-Speech-Dataset
2.Feature engineering
Validate your Data (Schema) https://towardsdatascience.com/introduction-to-schema-a-python-libary-to-validate-your-data-c6d99e06d56a
Data cleaning-Pyjanitor-https://analyticsindiamag.com/beginners-guide-to-pyjanitor-a-python-tool-for-data-cleaning/
Speed Up Data Cleaning and Exploratory Data Analysis in Python with klib https://github.com/akanz1/klib https://towardsdatascience.com/speed-up-your-data-cleaning-and-preprocessing-with-klib-97191d320f80
Easy to use Python library of customized functions for cleaning and analyzing data https://github.com/akanz1/klib
Remove duplicate data in dataset
a.Handle missing value
Types of missing value
i.missing completely at random (no correlation between the missingness and the observed data); such rows can be deleted without disturbing the data distribution
ii.missing at random (the missingness is correlated with other observed data); such rows can't simply be deleted because that disturbs the data distribution
iii.missing not at random (there is a reason for the missing value and it is directly related to the value itself)
1.if the amount of missing data is very small then delete it a.row deletion b.column deletion c.pairwise deletion
2.replace by a statistical measure: mean (influenced by outliers),median (not influenced by outliers),mode
3.apply classifier algorithm to predict missing value
4.Iterative imputer,knn imputer, multivariate imputation
5.apply unsupervised learning
6.Random Sample Imputation
7.Adding a variable to capture NAN(missing term)
8.Arbitrary Value Imputation
9.hot deck Imputation,Cold deck imputation
10.regression Imputation
11.End of Distribution Imputation
12.Arbitrary Value Imputation
13.Frequent Category Imputation
14.MICE Imputation
Extrapolation and Interpolation
Imputation using K-NN
Imputation Using Deep Learning (Datawig)
15.autoimpute-https://github.com/kearnz/autoimpute
https://towardsdatascience.com/6-different-ways-to-compensate-for-missing-values-data-imputation-with-examples-6022d9ca0779
https://stefvanbuuren.name/fimd/want-the-hardcopy.html
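The statistical replacement and "variable to capture NAN" ideas above can be sketched in a few lines of standard-library Python. The `ages` and `cities` columns and the `impute` helper are hypothetical; `None` stands in for a missing value.

```python
import statistics

ages = [22, None, 35, 29, None, 120]          # 120 acts as an outlier
cities = ["Pune", "Delhi", None, "Delhi", "Delhi", None]


def impute(values, fill):
    """Return a copy of values with every None replaced by fill."""
    return [fill if v is None else v for v in values]


observed_ages = [v for v in ages if v is not None]
mean_age = statistics.mean(observed_ages)      # pulled upwards by the outlier
median_age = statistics.median(observed_ages)  # robust to the outlier
mode_city = statistics.mode([v for v in cities if v is not None])

# Extra indicator column that records where a value was missing.
age_was_missing = [int(v is None) for v in ages]

ages_imputed = impute(ages, median_age)
cities_imputed = impute(cities, mode_city)
print(mean_age, median_age, mode_city)
print(ages_imputed, age_was_missing)
```

Here the mean (51.5) sits far from the typical age because of the single outlier, while the median (32.0) does not, which is why the median is often preferred for skewed numeric columns.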
b.Handle imbalance
1.Under Sampling - mostly not preferred because data is lost
2.Over Sampling: RandomOverSampler (new points are copies of existing points), SMOTETomek (new points are created from nearest neighbours, so it takes a long time),BorderLine SMOTE,KMeans SMOTE,SVM SMOTE,SMOTE-NC,ADASYN https://towardsdatascience.com/5-smote-techniques-for-oversampling-your-imbalance-data-b8155bdbe2b5
https://towardsdatascience.com/7-over-sampling-techniques-to-handle-imbalanced-data-ec51c8db349f
3.class_weight gives more importance (weight) to the small class
4.use Stratified kfold to keep the ratio of classes constant
5.Weighted Neural Network
https://machinelearningmastery.com/framework-for-imbalanced-classification-projects/
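A rough sketch of two of the ideas above, random over sampling and class weights, written without external libraries. The function names and the toy labels are assumptions for this example; the weight formula `n_samples / (n_classes * class_count)` is the common "balanced" heuristic.

```python
import random
from collections import Counter

labels = [0] * 8 + [1] * 2      # 8 majority rows, 2 minority rows
rows = list(range(10))          # stand-in row ids for the features


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def random_oversample(X, y, seed=0):
    """Duplicate random minority rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, c in counts.items():
        pool = [x for x, label in zip(X, y) if label == cls]
        for _ in range(target - c):
            X_out.append(rng.choice(pool))
            y_out.append(cls)
    return X_out, y_out


weights = balanced_class_weights(labels)
X_res, y_res = random_oversample(rows, labels)
print(weights)
print(Counter(y_res))
```

Over sampling should be applied to the training split only, otherwise copies of the same row can leak into the validation data.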
c.Remove noise data
d.Format data
e.Handle categorical data: ordinal,nominal,cyclic,binary categorical variables
1.One Hot Encoding
2.Count Or Frequency Encoding
3.Target Guided Ordinal Encoding
4.Mean Encoding
5.Probability Ratio Encoding
6.label encoding
7.probability ratio encoding
8.WoE (Weight of Evidence)
9.one hot encoding with multi category (keep most frequently repeated only)
10.feature hashing
11.sparse csr matrix
12.entity embeddings
13.binary encoding
14.Rare label encoding
15.Leave-one-out(Loo) encoding
https://towardsdatascience.com/beyond-one-hot-17-ways-of-transforming-categorical-features-into-numeric-features-57f54f199ea4
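To make a few of the encodings above concrete, here is a small sketch of label, one hot, count/frequency, and mean encoding in plain Python. The `colors` column and the binary `target` are invented data.

```python
from collections import Counter

colors = ["red", "green", "red", "blue", "red"]
target = [1, 0, 1, 0, 0]        # hypothetical binary label per row

categories = sorted(set(colors))               # ['blue', 'green', 'red']

# Label encoding: each category becomes an integer id.
label_code = {c: i for i, c in enumerate(categories)}
label_encoded = [label_code[c] for c in colors]

# One hot encoding: one 0/1 column per category.
one_hot = [[int(c == cat) for cat in categories] for c in colors]

# Frequency encoding: each category becomes its share of the rows.
counts = Counter(colors)
freq_encoded = [counts[c] / len(colors) for c in colors]

# Mean encoding: each category becomes the mean target of its rows.
target_sums = Counter()
for c, t in zip(colors, target):
    target_sums[c] += t
mean_code = {c: target_sums[c] / counts[c] for c in categories}
mean_encoded = [mean_code[c] for c in colors]

print(label_encoded, one_hot, freq_encoded, mean_encoded, sep="\n")
```

Mean encoding uses the target, so the mapping should be fitted on training data only to avoid leaking labels into validation.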
f.Scaling of data
1.Normalisation
2.Standardization
3.Robust Scaler: not influenced by outliers because it uses the median and IQR
4. Min Max Scaling
5.Mean normalization
6.maximum absolute scaling
https://www.analyticsvidhya.com/blog/2020/07/types-of-feature-transformation-and-scaling/?utm_source=linkedin&utm_medium=KJ|link|high-performance-blog|blogs|44204|0.375
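The difference between the scalers above is easiest to see on data with an outlier. The following is a sketch using the `statistics` module (Python 3.8+); the helper names and the `values` list are assumptions for this example.

```python
import statistics

values = [1.0, 2.0, 3.0, 4.0, 100.0]   # 100.0 is an outlier


def min_max(xs):
    """Min max scaling: rescale to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def standardize(xs):
    """Standardization: zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def robust(xs):
    """Robust scaling: centre on the median and divide by the IQR."""
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - med) / (q3 - q1) for x in xs]


scaled_min_max = min_max(values)
scaled_standard = standardize(values)
scaled_robust = robust(values)
print(scaled_min_max)
print(scaled_standard)
print(scaled_robust)
```

With min max scaling the four ordinary points are squeezed close to 0 by the outlier, while robust scaling keeps them evenly spread around 0.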
Q-Q plot or Shapiro-Wilk normality test is used to check whether a feature is Gaussian (normally distributed), which can improve the performance of linear regression and logistic regression; if it is not, use the methods below to bring it closer to a Gaussian distribution
normal test to check for a normal distribution
Anderson test to check against a given distribution
a.Gaussian Transformation
b.Logarithmic Transformation
c.Reciprocal Transformation
d.Square Root Transformation
e.Exponential Transformation
f.Box-Cox Transformation
g.log(1+x) Transformation
h.Johnson Transformation
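One way to see what the transformations above do is to measure skewness before and after. This sketch compares the raw values with the square root and log(1+x) transformations; the `incomes` data and the `skewness` helper (a simple moment-based estimate) are assumptions for illustration.

```python
import math
import statistics


def skewness(xs):
    """Moment-based skewness: mean of the cubed standardized values."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return sum(((x - mu) / sigma) ** 3 for x in xs) / len(xs)


incomes = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]    # right-skewed toy data

rooted = [math.sqrt(x) for x in incomes]       # square root transformation
logged = [math.log1p(x) for x in incomes]      # log(1+x) transformation

skew_raw = skewness(incomes)
skew_sqrt = skewness(rooted)
skew_log = skewness(logged)
print(skew_raw, skew_sqrt, skew_log)
```

On this data both transformations reduce the right skew, and log(1+x) reduces it more than the square root. A skewness near 0 is only a rough sign of symmetry, so a Q-Q plot or a normality test is still worth checking.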
g.Remove low variance features by using VarianceThreshold
h.If a feature holds the same value in every row (only 1 unique value), remove the feature
i.Outliers: whether to remove outliers depends on the problem we are solving
2 types of outliers: Global outlier, Local outlier
eg: in fraud detection, outliers are very important
methods to find outliers: Standard Deviation,z-score,boxplot,scatter plot,IQR,TensorFlow_Data_Validation
Automatic Outlier Detection:Isolation Forest,Local Outlier Factor,Minimum Covariance Determinant,Robust Random Cut Forest,DBSCAN Clustering
outlier treatment: mean/median/random imputation,drop,discretization (binning)
if outliers are present then use robust scaling
alibi-detect https://github.com/SeldonIO/alibi-detect#adversarial-detection https://docs.seldon.io/projects/alibi-detect/en/latest/
https://medium.com/towards-artificial-intelligence/outlier-detection-and-treatment-a-beginners-guide-c44af0699754
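The IQR and z-score methods mentioned above can be sketched as follows. The data, the 1.5 IQR multiplier, and the threshold of 3 standard deviations are conventional but arbitrary choices made for this example.

```python
import statistics

data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 12,
        13, 10, 11, 12, 11, 12, 10, 13, 11, 95]


def iqr_outliers(xs, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < lo or x > hi]


def zscore_outliers(xs, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [x for x in xs if abs(x - mu) / sigma > threshold]


by_iqr = iqr_outliers(data)
by_zscore = zscore_outliers(data)
print(by_iqr, by_zscore)
```

Note that the z-score method needs enough rows to work: with n points the largest possible z-score is sqrt(n - 1), so on very small samples a threshold of 3 can never be reached.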
j.Anomaly
clustering techniques to find it
Isolation Forest(for Big Data),dbscan
Anomaly detection using PyOD https://pyod.readthedocs.io/en/latest/ https://www.youtube.com/watch?v=QPjG_313GOw
k.Sampling techniques
a.biased sampling
b.unbiased sampling
3.Exploratory Data Analysis(eda)
Explore the dataset by using python or microsoft excel or tableau or powerbi, etc...
Data visualization (Matplotlib,Seaborn,Plotly,pyqtgraph,Bokeh,Pygal,Dash,Pydot,Geoplotlib,ggplot,visualizer,etc...)
Scatterplot,multi line plot,bubble chart,bar chart,histogram,boxplot,distplot,bubble charts,area plot,heat map,index plot,violin plot,time series plot,density plot,dot plot,strip plot,plotly,Choropleth Map,Kepler,PDF,Kernel density function,networkx,Scatter_matrix,Bootstrap_plot,functionvis,Higher-Dimensional Plots,3-D Plots,Word Clouds,HoloViz
https://towardsdatascience.com/8-free-tools-to-make-interactive-data-visualizations-in-2021-no-coding-required-2b2c6c564b5b
https://datavizproject.com/ https://datavizcatalogue.com/
https://attachments.convertkitcdnm.com/232198/ee18f415-1406-4e5c-94f1-49a2c6e3ec4e/Statistics-The-Big-Picture-Poster.pdf
HiPlot (high dimensional data)-https://github.com/facebookresearch/hiplot
https://towardsdatascience.com/top-6-python-libraries-for-visualization-which-one-to-use-fe43381cd658
https://www.kaggle.com/abhishekvaid19968/data-visualization-using-matplotlib-seaborn-plotly
Keras Model Visualization generator (ann-visualizer) - pip3 install graphviz
univariate and bivariate and multivariate analysis
model visualization Tensorboard,netron,playground tensorflow,plotly,TensorDash,Dash,Microscope,Lucid
distributions(discrete,continuous)
data distributions-normal distribution,Standard Normal Distribution,Student's t-Distribution,Bernoulli Distribution,Binomial Distribution,Poisson Distribution,Uniform Distribution,F Distribution,Covariance and Correlation
Types of Statistics
1.Descriptive
2.Inferential
Types of data
1) Categorical (nominal,ordinal)
2) Numerical (discrete,continuous)
random variable(discrete random variable ,continuous random variable)
Central Limit Theorem,Bayes Theorem,Confidence Interval,Hypothesis Testing,z test,t test,f test,1 tail test,2 tail test,chi-square test,anova test,A/B testing
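As a worked example of a confidence interval and of a 2 tail z test used for A/B testing, here is a sketch built on `statistics.NormalDist` (Python 3.8+). The sample values, the conversion counts, and both function names are made up; the interval uses the normal approximation, which is only reasonable for larger samples (a t-based interval is the usual choice for small ones).

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * sem, mean + z * sem


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """2 tail z test for the difference between two conversion rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 5.1, 4.9]
low, high = mean_confidence_interval(sample)

# Hypothetical A/B test: 100/1000 conversions for A, 150/1000 for B.
z_stat, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(low, high)
print(z_stat, p_value)
```

A p-value below the chosen significance level (commonly 0.05) means the null hypothesis of equal conversion rates is rejected.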
4.Feature selection
1.Filter methods (correlation,chi-square test,t-test,anova test,mutual information,hypothesis test,information gain etc...)
2.Wrapper methods (recursive feature elimination,boruta,forward selection,backward elimination,stepwise selection etc...)
3.Embedded method (lasso,ridge regression,elasticnet,tree based etc...)
DropConstantFeatures DropDuplicateFeatures DropCorrelatedFeatures
4.Feature Importance
a.ExtraTreesClassifier,ExtraTreesregressor
b.SelectKBest
c.Logistic Regression
d.Random_forest_importance
e.decision tree
f.Linear Regression
g.xgboost
5.curse of dimensionality (as the number of dimensions increases, model performance tends to decrease)
6.if features are highly correlated, keep any 1 of them (multicollinearity)
7.dimension reduction
8.lasso regression to penalise unimportant features
9.VarianceThreshold
10.model based selection
11.Mutual Information Feature Selection
12.remove features with very low variance (quasi constant feature dropping)
13.Univariate feature selection
14.importance of feature (random forest importance)
15.feature importance with decision trees
16.PyImpetus
17.drop constant features (variance=0)
18.variance inflation factor(vif)
19.Recursive Feature Elimination RecursiveFeatureAddition
20.exhaustive feature selection
21.Statistical Methods , Hypothesis Testing ,Recursive Feature Elimination
22.Boruta https://github.com/scikit-learn-contrib/boruta_py
https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/ https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/
https://www.analyticsvidhya.com/blog/2020/10/a-comprehensive-guide-to-feature-selection-using-wrapper-methods-in-python/
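Two of the simplest filters above, dropping constant features and dropping one of each highly correlated pair, can be sketched like this. The feature table, the helper names, and the 0.95 correlation cut-off are assumptions for illustration.

```python
import math
import statistics

features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],    # near copy of height_cm
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0],          # variance = 0
    "shoe_size": [38.0, 41.0, 39.0, 44.0, 42.0],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def drop_low_variance(cols, threshold=0.0):
    """Keep only columns whose variance is above the threshold."""
    return {name: xs for name, xs in cols.items()
            if statistics.pvariance(xs) > threshold}


def drop_correlated(cols, threshold=0.95):
    """Keep a column only if it is not highly correlated with one already kept."""
    kept = {}
    for name, xs in cols.items():
        if all(abs(pearson(xs, ys)) < threshold for ys in kept.values()):
            kept[name] = xs
    return kept


selected = list(drop_correlated(drop_low_variance(features)))
print(selected)
```

The constant column is removed first because a zero standard deviation would make the correlation undefined.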
5.Data splitting
Splitting ratio of data depends on size of dataset available
Training data,Validation data,Testing data
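A minimal sketch of a training/validation/testing split with a fixed random seed. The 60/20/20 ratio and the function name are example choices, not a recommendation; as noted above, the right ratio depends on the dataset size.

```python
import random


def train_val_test_split(rows, val_ratio=0.2, test_ratio=0.2, seed=42):
    """Shuffle the rows once, then cut them into three disjoint parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)       # seeded for reproducibility
    n_test = round(len(rows) * test_ratio)
    n_val = round(len(rows) * val_ratio)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

For imbalanced classification data a stratified split, which keeps the class ratio the same in each part, is usually the safer choice.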
6.Model selection
Machine learning
A.Supervised learning (have label data)
1.Regression (output feature in continuous data form)
linear regression,polynomial regression,Robust Regression,support vector regression,Decision Tree Regression,Random Forest Regression,
least square method,xgboost,ridge(L2 Regularization),lasso(L1 Regularization (more sparse)),catboost,gradientboosting,adaboost,
elastic net,light gbm,ordinary least squares,cart,Stepwise Regression,Multivariate Adaptive Regression Splines
use cases:
2.Classification (output feature in categorical data form)
Binary,Multi-class,Multi-label
Logistic Regression,K-Nearest Neighbors,Support Vector Machine,Kernel SVM,Naive Bayes,Decision Tree Classification,
Random Forest Classification,xgboost,adaboost,Gradient Boost,catboost,gaussian NB,LGBMClassifier,LinearDiscriminantAnalysis, Extreme Gradient Boosting Machine, passive aggressive classifier algorithm,cart,c4.5,c5.0
B.Unsupervised learning(no label(target) data)
1.Dimensionality reduction - PCA,SVD,LDA,som,tsne,plsr,pcr,autoencoders,kpca,lsa,Factor Analysis,
2.Clustering :https://scikit-learn.org/stable/modules/clustering.html
https://www.kdnuggets.com/2020/12/algorithms-explained-k-means-k-medoids-clustering.html
K-Means 8x faster, 27x lower error than Scikit-learn in 25 lines https://www.kdnuggets.com/2021/01/k-means-faster-lower-error-scikit-learn.html#.YAHAAIpnx4A.linkedin
3.Association Rule Learning - support,lift,confidence,Apriori,ECLAT,FP-growth,FP-tree construction, association_rules
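The three measures named above (support, confidence, lift) can be computed directly from a list of transactions. The baskets below are invented for this sketch.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset):
    """Share of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence divided by the baseline support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)


rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
rule_lift = lift({"bread"}, {"milk"})
print(rule_support, rule_confidence, rule_lift)
```

A lift below 1, as in this toy data, means the two items appear together slightly less often than they would if they were independent.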
4.Recommendation system -
a.collaborative Recommendation system (model based, memory based(item based,user based)) user-item interaction matrix
b.content based Recommendation system
similarity based(user-user similarity,item-item similarity)
matrix factorization
c.utility based Recommendation system
d.knowledge based Recommendation system
e.demographic based Recommendation system
f.hybrid based Recommendation system
g.Average Weighted Recommendation
h.using K Nearest Neighbor
i.cosine distance recommender system
j.TensorFlow Recommenders https://www.tensorflow.org/recommenders
k.Surprise baseline model
l.Tf-Rec https://github.com/Praful932/Tf-Rec
https://analyticsindiamag.com/top-open-source-recommender-systems-in-python-for-your-ml-project/
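A toy sketch of a memory based, user based collaborative recommender using cosine similarity on a user-item rating matrix. The users, items, ratings, and the similarity-weighted sum used for scoring are all assumptions made for this example.

```python
import math

# user -> {item: rating}; a tiny hypothetical user-item interaction matrix
ratings = {
    "ann": {"A": 5, "B": 4, "C": 5},
    "bob": {"A": 4, "B": 5},
    "cat": {"A": 1, "C": 2, "D": 5},
}


def cosine(u, v):
    """Cosine similarity of two sparse rating vectors."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(user, k=2):
    """Rank unseen items by the similarity-weighted sum of other users' ratings."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, rating in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]


bob_recommendations = recommend("bob")
print(bob_recommendations)
```

Item C ranks first for bob because ann, whose ratings are most similar to bob's, rated it highly.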
C.Ensemble methods
1.Stacking models
2.Bagging models
3.Boosting models
4.Blending
5.Voting (Hard Voting,Soft Voting)
Shapley value of players (models) in weighted voting games https://github.com/benedekrozemberczki/shapley
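Hard voting and soft voting can disagree, which the sketch below shows with three hypothetical models. The probabilities are invented and the two helper functions are not taken from any library.

```python
from collections import Counter


def hard_vote(predictions):
    """Hard voting: the most common predicted label wins."""
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities):
    """Soft voting: average the class probabilities, then take the best class."""
    classes = list(probabilities[0])
    avg = {c: sum(p[c] for p in probabilities) / len(probabilities)
           for c in classes}
    return max(avg, key=avg.get)


# Predicted class probabilities from three hypothetical models for one sample.
model_probs = [
    {"cat": 0.55, "dog": 0.45},
    {"cat": 0.60, "dog": 0.40},
    {"cat": 0.05, "dog": 0.95},
]
model_labels = [max(p, key=p.get) for p in model_probs]

hard_result = hard_vote(model_labels)
soft_result = soft_vote(model_probs)
print(hard_result, soft_result)
```

Two models weakly prefer "cat", so hard voting picks "cat"; the third model is very confident about "dog", so the averaged probabilities favour "dog".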
D.Reinforcement learning
2 types a)model free b)model based
agent,environment,policy(On-Policy vs Off-Policy),reward function,value function,state,action,episode,actor-critic
the agent applies an action to the environment and gets the corresponding reward, so that it learns the environment
1.Q-Learning
2.Deep Q-Learning
3.Deep Convolutional Q-Learning
Deep Deterministic Policy Gradient
4.Twin Delayed DDPG,DQN
5.A3C (Actor Critic)
6.Advantage weighted actor critic (AWAC).
7.XCS
8.genetic algorithm,sarsa
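As an illustration of Q-Learning from the list above, here is a tabular sketch on a made-up environment: five states on a line, where reaching the last state gives a reward of 1. The environment, the hyperparameters, and the function names are all assumptions for this example.

```python
import random

N_STATES = 5                 # states 0..4 on a line; state 4 is the goal
ACTIONS = (-1, +1)           # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1


def step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy policy with random tie-breaking.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best next action.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy moves right in every non-goal state. Q-Learning is off-policy: the update uses the best next action even when the agent explores with a random one.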
https://simoninithomas.github.io/deep-rl-course/
Environments-OpenAI Gym, DeepMind Lab, Unity ML-Agents
https://data-flair.training/news/python-libraries-for-reinforcement-learning/
https://analyticsindiamag.com/8-best-free-resources-to-learn-deep-reinforcement-learning-using-tensorflow/
https://analyticsindiamag.com/top-8-autonomous-driving-open-source-projects-one-must-try-hands-on/
https://analyticsindiamag.com/8-toolkits-for-reinforcement-learning-models-that-make-reasoning-explainability-core-to-ai/
https://neptune.ai/blog/best-reinforcement-learning-tutorials-examples-projects-and-courses
Open AI Gym - https://gym.openai.com/
DeepMind's MuZero https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules?utm_campaign=Learning%20Posts&utm_content=150411901&utm_medium=social&utm_source=twitter&hss_channel=tw-3018841323
KerasRL https://github.com/keras-rl/keras-rl
pyqlearning
tensorforce https://tensorforce.readthedocs.io/en/latest/index.html
Practical_RL https://github.com/yandexdataschool/Practical_RL
rl_coach https://github.com/IntelLabs/coach#installation MushroomRL https://mushroomrl.readthedocs.io/en/latest/
TFAgents https://github.com/tensorflow/agents (https://www.tensorflow.org/agents) https://deepmind.com/blog/article/trfl
Automate The Stock Market Using FinRL (Deep Reinforcement Learning Library) https://analyticsindiamag.com/stock-market-prediction-using-finrl/
OpenAI Baselines (Stable Baselines is a fork of it) https://github.com/openai/baselines
https://www.youtube.com/playlist?list=PL_iWQOsE6TfURIIhCrlt-wj9ByIVpbfGc
https://neptune.ai/blog/the-best-tools-for-reinforcement-learning-in-python?utm_source=twitter&utm_medium=tweet&utm_campaign=blog-the-best-tools-for-reinforcement-learning-in-python
Semi-Supervised Learning-small amount of labeled data with a large amount of unlabeled data during training
E.Deep-learning (use when you have huge data and the data is highly complex; state of the art for unstructured data)
Frameworks:Pytorch,Tensorflow,Keras,caffe,theano,MXNet,Matlab,Microsoft Cognitive Toolkit,opacus(Train PyTorch models with Differential Privacy)
1.Multilayer perceptron(MLP)
1.Regression task
2.Classification task
2.Convolutional neural network ( use for image data)
1.Classification of image
create own model,LeNet,AlexNet,ResNet,GoogLeNet,Inception,VGG16,VGG19,EfficientNet,NASNet,STN,NASNet-A,SENet,AmoebaNet-C,DeiT (tiny,small,base)
2.Localization of object in image
3.Object detection and object segmentation
rcnn,fast rcnn,faster rcnn,TensorFlow Object Detection,yolo v1,yolo v2,yolo v3,yolo v4,scaled yolov4,efficientdet,fast yolo,yolo tiny,yolo lite,yolo tiny++,YOLACT++,
maskrcnn,DeepLab-v3-plus,ssd,detectron,detectron2,mobilenet,retinanet,R-fcn,detr facebook,pspnet,segnet,U-net,UNet++,EfficientDet,Vision Transformer,deit
3 kind of object segmentation are available semantic segmentation,instance segmentation,panoptic segmentation
PyTorch based low code object detection-https://github.com/alankbi/detecto
autogluon
https://awesomeopensource.com/project/hoya012/deep_learning_object_detection
4.Object tracking (mean shift, optical flow and Kalman filter)
Tracktor++,Trackrcnn,Jde,DeepSORT,FairMOT
mmtracking https://github.com/open-mmlab/mmtracking
5.Deepdream,Neural style transfer, Pose estimation
6.DEEP LEARNING METHODS FOR 2D :OpenPose,DeepPose,MultiPoseNet,AlphaPose,VIBE,DeeperCut,Mask RCNN,DeepCut,Convolutional Pose Machines,PoseNet
openpose wrnchai densepose
3D POSE ESTIMATION
3D Image Classification https://keras.io/examples/vision/3D_image_classification/
TensorFlow 2 Object Detection API tutorial https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/
https://blog.paperspace.com/how-to-train-scaled-yolov4-object-detection/
albumentations https://github.com/albumentations-team/albumentations
TensorFlow2.0-Examples https://github.com/YunYang1994/TensorFlow2.0-Examples
unadversarial https://github.com/microsoft/unadversarial/ https://analyticsindiamag.com/microsoft-research-unadversarial/
CNNs 'see' - FilterVisualizations, Heatmaps,Saliency Maps,Heat Map Visualizations,GradCAM,Class Activation Maps,ZFNet,Lucid,Activation Atlas,Blur Integrated Gradients,concept whitening,Integrated Gradients,SmoothGrad
https://github.com/utkuozbulak/pytorch-cnn-visualizations
Mediapipe for Python https://google.github.io/mediapipe/
imageai.Detection for Object detection
cnn-raccoon interactive dashboards for your Convolutional Neural Networks with a single line of code https://github.com/lucko515/cnn-raccoon
deit https://github.com/facebookresearch/deit https://wandb.ai/thibault-neveu/detr-tensorflow-log/reports/Finetuning-DETR-Object-Detection-with-Transformers-on-Tensorflow-A-step-by-step-tutorial--VmlldzozOTYyNzQ https://github.com/Visual-Behavior/detr-tensorflow
awesome-computer-vision-models https://github.com/nerox8664/awesome-computer-vision-models
EfficientDet https://github.com/ravi02512/efficientdet-keras
Vision Transformer - Pytorch https://github.com/lucidrains/vit-pytorch https://github.com/alohays/awesome-visual-representation-learning-with-transformers
https://github.com/ashishpatel26/Vision-Transformer-Keras-Tensorflow-Pytorch-Examples https://github.com/google-research/vision_transformer
DeepLab-v3-plus Semantic Segmentation in TensorFlow https://github.com/rishizek/tensorflow-deeplab-v3-plus
DEEP LEARNING METHODS FOR 3D:3D human pose estimation = 2D pose estimation + matching,Integral Human Pose Regression,Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach,A Simple Yet Effective Baseline for 3d Human Pose Estimation
Data augmentation: apply to increase the size of the dataset and improve model performance
low code object detection - detecto https://github.com/alankbi/detecto
AutoML https://github.com/dataloop-ai/AutoML
Object Detection with 10 lines of code-https://www.datasciencecentral.com/profiles/blogs/object-detection-with-10-lines-of-code
OneNet-https://analyticsindiamag.com/onenet/
Norfair https://github.com/tryolabs/norfair
Remo Improves Image Management https://www.freecodecamp.org/news/manage-computer-vision-datasets-in-python-with-remo/
yolo https://github.com/zzh8829/yolov3-tf2 https://github.com/ultralytics/yolov5 https://github.com/ashishpatel26/Yolov5-King-of-object-Detection https://github.com/sicara/tf2-yolov4
clip https://github.com/openai/CLIP
3.Recurrent neural network (use for sequential data)
1.RNN
2.GRU
3.LSTM (has memory cell,forget gate,etc.)
All 3 models above also have bidirectional variants; use them depending on the problem statement
4.Generative adversarial network https://poloclub.github.io/ganlab/ https://developers.google.com/machine-learning/gan/training
CycleGAN,DCGAN,SRGAN,InfoGAN,StarGAN,AttnGAN,StyleGAN,PixelRNN,StackGAN,DiscoGAN,LSGAN,Conditional GAN(Pix2Pix),Progressive GANs(produces higher resolution images,Image-to-Image Translation),Face Inpainting,Super-resolution
Imaginaire https://analyticsindiamag.com/guide-to-nvidia-imaginaire-gan-library-in-python/
StyleFlow https://github.com/RameenAbdal/StyleFlow
https://github.com/hindupuravinash/the-gan-zoo
5.Autoencoder
1.sparse Autoencoder
2.denoising Autoencoder
3.Contractive Autoencoder
4.stacked Autoencoder
5.deep Autoencoder
6.variational autoencoder
6.BoltzmannMachines,Restricted Boltzmann Machine,deep belief network,deep BoltzmannMachines
7.Self Organizing Maps (SOM)
8.Natural language processing
Clean data(removing stopwords depending on the problem,lowercasing,tokenization,POS tagging,stemming or lemmatization depending on the problem,skip-gram,n-gram,chunking)
NLTK,spaCy,gensim,TextBlob,iNLTK,Pattern,Stanza,OpenNLP,polyglot,CoreNLP,PyDictionary,Hugging Face,Spark NLP,AllenNLP,Rasa NLU,Megatron,texthero,Flair,textacy,finetune,gluon-nlp,VnCoreNLP,fastText libraries
clean-text https://github.com/jfilter/clean-text https://www.youtube.com/watch?v=i2TjAgga1YU
NLU,NLG,NER,text summarization,Sentiment Analysis,Text Classifications,machine translation,chat bot,Text Generation,Speech Recognition
1.bag of words
2.Tfidf
3.wordembedding
a.using pretrained model
i)word2vec( cbow,skipgram)
ii)glove
iii)fasttext
b.creating own embedding (use when have huge data)
i)word2vec library
ii)keras embedding
ELMo (contextual embeddings that capture the semantics of a word)
4.Document embedding-Doc2vec
5.sentence embedding
sense2vec,SENT2VEC,Universal sentence encoder
Top2Vec
6.using rnn,lstm,gru
the 3 models above also have bidirectional variants
7.Encoder and Decoder(sequence to sequence), ProphetNet(new pretrained seq2seq model)
8.attention
self attention,Global Attention,Multi-Head Attention,Local Attention (monotonic,predictive) https://github.com/uzaymacar/attention-mechanisms
9.Transformer (big breakthrough in NLP) - http://jalammar.github.io/illustrated-transformer/
FastFormers https://medium.com/ai-in-plain-english/fastformers-233x-faster-transformers-inference-on-cpu-4c0b7a720e1
Shrinking Transformers (reduce size): quantization,distillation,pruning
Reformer,Performers,vision transformer
Reformer: The Efficient Transformer
Longformer: The Long-Document Transformer
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Tree-Transformer https://github.com/yaushian/Tree-Transformer
10.BERT,ConvBERT,Quantized MobileBERT,ALBERT,ARBERT,MARBERT,ELECTRA,Transformer-XL,Reformer,DistilBERT,ELMo,RoBERTa,XLNet,XLM-RoBERTa,DeBERTa,T5,GPT,GPT2,GPT3,PRADO,PET,BORT,MuRIL
https://analyticsindiamag.com/top-ten-bert-alternatives-for-nlu-projects/
http://jalammar.github.io/ http://jalammar.github.io/illustrated-bert/ http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
https://jalammar.github.io/explaining-transformers/ https://jalammar.github.io/hidden-states/
11.Speech
speech to text
text to speech
Acoustic model,Speaker diarisation,apis
SpeechRecognition
googletrans (google Translator) https://pypi.org/project/googletrans/
lang-identification Google Compact Language Detector,FastText
gTTS for text to speech conversion , speech_recognition
Speech-Transformer-tf2.0 https://github.com/xingchensong/Speech-Transformer-tf2.0
The Super Duper NLP Repo https://notebooks.quantumstat.com/
ecco https://github.com/jalammar/ecco https://www.eccox.io/ https://www.youtube.com/watch?v=rHrItfNeuh0&feature=youtu.be
autonlp https://analyticsindiamag.com/hands-on-guide-to-using-autonlp-for-automating-sentiment-analysis/
https://medium.com/towards-artificial-intelligence/natural-language-processing-nlp-with-python-tutorial-for-beginners-1f54e610a1a0
https://pakodas.substack.com/p/neural-search-on-indian-languages
https://www.linkedin.com/pulse/natural-language-processing-2020-year-review-ivan-bilan/?trackingId=CYfd1ZyLStu6x09tjVIoGw%3D%3D
ConvBert https://github.com/yitu-opensource/ConvBert
SentenceTransformers https://www.sbert.net/
Reformer - The Efficient Transformer https://analyticsindiamag.com/hands-on-guide-to-reformer-the-efficient-transformer/
Funnel-Transformer https://github.com/laiguokun/Funnel-Transformer
CLIP - Connecting Text To Images https://analyticsindiamag.com/hands-on-guide-to-openais-clip-connecting-text-to-images/
Topic Modeling in One Line with Top2Vec https://towardsdatascience.com/topic-modeling-in-one-line-with-top2vec-a413991aa0ef
MT5-https://venturebeat.com/2020/10/26/google-open-sources-mt5-a-multilingual-model-trained-on-over-101-languages/?utm_content=144321587&utm_medium=social&utm_source=linkedin&hss_channel=lcp-3740012
VADER does not require any training data https://pypi.org/project/vaderSentiment/ https://analyticsindiamag.com/sentiment-analysis-made-easy-using-vader/
APPLICATIONS OF MACHINE TRANSLATION-Text-to-text,Text-to-speech,Speech-to-text,Speech-to-speech,Image (of words)-to-text
Google-GNMT (Tensorflow),Facebook-fairseq (Torch),Amazon-Sockeye (MXNet),NEMATUS (Theano),THUMT (Theano),OpenNMT (PyTorch),StanfordNMT (Matlab),DyNet-lamtram(CMU),EUREKA(MangoNMT)
awesome-gpt3 https://github.com/elyase/awesome-gpt3
Robustness Gym: Evaluation Toolkit for NLP https://github.com/robustness-gym/robustness-gym
https://analyticsindiamag.com/best-nlp-based-seo-tools-for-2021/
https://www.kdnuggets.com/2020/05/best-nlp-deep-learning-course-free.html https://analyticsindiamag.com/flair-hands-on-guide-to-robust-nlp-framework-built-upon-pytorch/
https://medium.com/modern-nlp/nlp-metablog-a-blog-of-blogs-693e3a8f1e0c
classification,clustering,recommender systems,topic modelling,sentiment analysis,semantic analysis,summarization,machine translation,conversational interface,named entity recognition
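A minimal sketch of the first two text representations listed above (bag of words and TF-IDF), written in plain Python so the arithmetic is visible. The whitespace tokenizer, the sample sentences and the unsmoothed idf = log(N / df) are assumptions made for illustration; libraries such as scikit-learn use smoothed and normalised variants, so their numbers will differ.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also strip punctuation.
    return text.lower().split()

def bag_of_words(docs):
    # Build a sorted vocabulary, then count each vocabulary term per document.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def tfidf(docs):
    # tf = term count / document length; idf = log(N / document frequency).
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    weights = []
    for row in counts:
        total = sum(row) or 1
        weights.append([
            (row[j] / total) * math.log(n_docs / doc_freq[j])
            for j in range(len(vocab))
        ])
    return vocab, weights

# Illustrative corpus (hypothetical).
docs = ["the cat sat", "the dog sat", "the dog barked"]
vocab, counts = bag_of_words(docs)
_, weights = tfidf(docs)
```

A word that appears in every document (here "the") gets idf = log(1) = 0, which is why TF-IDF down-weights very common words while bag of words counts them like any other.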
F.Time Series
here the data split is different (train,test,validate must keep time order; no random shuffling)
here handling of missing data is different
methods generally used to impute data in Time Series
1.ffill
2.bfill
3.impute with the mean of the previous or next x samples
4.take previous season value and impute (data with trend)
5.mean,mode,median,random sample imputation (data without trend and without seasonality)
6.linear interpolation(data with trend and without seasonality)
7.seasonal +interpolation(data with trend and with seasonality)
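A small sketch of three of the imputation options above (ffill, bfill, linear interpolation) in plain Python, with None standing in for a missing value. The function names and the sample series are made up for illustration; with pandas the same ideas are usually expressed through the Series fill and interpolate methods.

```python
def ffill(values):
    # Carry the last observed value forward; leading gaps stay None.
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

def bfill(values):
    # Backward fill is forward fill applied to the reversed series.
    return ffill(values[::-1])[::-1]

def linear_interpolate(values):
    # Fill interior gaps on a straight line between the nearest observed neighbours.
    # Gaps before the first or after the last observation are left as None.
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out

# Illustrative series with gaps (hypothetical data).
series = [None, 10.0, None, None, 16.0, None]
filled_forward = ffill(series)
filled_backward = bfill(series)
interpolated = linear_interpolate(series)
```

Forward fill cannot fill a gap at the start and backward fill cannot fill one at the end, so the two are often combined.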
here model selection depends on properties of the data such as stationarity,trend,seasonality,cyclicity
Anomaly Detection using Isolation Forest,AutoEncoders
Granger causality statistical test: use to check whether one variable is useful for forecasting another
Stationarity / non-stationarity statistical tests - ADF (adfuller) and KPSS
Handling Data with Regular Gaps using Facebook Prophet
models
1.arma,Arima , auto arima ,seasonal arima
2.Autoregressive
3.Moving average,Exponential Moving average,Exponential Smoothing
4.Lstm(neural network)
5.GARCH
atspy Automated time-series models
6.Naive forecasts
7.Smoothing (moving average,exponential smoothing)
8.Facebook Prophet (note: expects the date column as ds and the target column as y)
NeuralProphet Model- https://ourownstory.github.io/neural_prophet/model-overview/
hmmlearn https://github.com/ushareng/StockPricePredictionUsingHMM_Byte/blob/master/StockPricePredictionUsingHMM.ipynb
stumpy https://github.com/TDAmeritrade/stumpy
Informer (for Long Sequence Time-Series Forecasting) https://analyticsindiamag.com/informer/
DeepAR is a global model (one model trained across many related time series)
pmdarima for Auto ARIMA
GluonTS
9.Holt-Winters,Holt's linear trend
10.Auto_Timeseries by auto-ts
AutoTS-https://analyticsindiamag.com/hands-on-guide-to-autots-effective-model-selection-for-multiple-time-series/ https://github.com/AutoViML/Auto_TS
AutoTS https://github.com/winedarksea/AutoTS
GluonTS , PytorchTS https://analyticsindiamag.com/gluonts-pytorchts-for-time-series-forecasting/
11.Temporal Convolutional Network (TCN)
12.Atspy For Automating The Time-Series Forecasting-https://analyticsindiamag.com/hands-on-guide-to-atspy-for-automating-the-time-series-forecasting/
13.Darts-https://analyticsindiamag.com/hands-on-guide-to-darts-a-python-tool-for-time-series-forecasting/
14.Bayesian Neural Network , TsEuler
15.PyFlux (easy way to compare different models)-https://analyticsindiamag.com/pyflux-guide-python-library-for-time-series-analysis-and-prediction/
16.Orbit , DeepAR ,NeuralProphet(https://github.com/ourownstory/neural_prophet https://ourownstory.github.io/neural_prophet/model-overview/)
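The simplest baselines in the list above (naive forecast, moving average, simple exponential smoothing) can be sketched in a few lines of plain Python. This is only an illustration: the sample history, the window size and the smoothing factor alpha are arbitrary assumptions, and the smoothing level is initialised with the first observation, which is one common choice among several.

```python
def naive_forecast(history):
    # Next value is assumed equal to the last observed value.
    return history[-1]

def moving_average_forecast(history, window):
    # Next value is the mean of the last `window` observations.
    recent = history[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing_forecast(history, alpha):
    # level_t = alpha * y_t + (1 - alpha) * level_{t-1}, starting from the first observation.
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

# Illustrative history (hypothetical data).
history = [10.0, 12.0, 14.0, 16.0]
naive = naive_forecast(history)
ma = moving_average_forecast(history, window=2)
ses = exponential_smoothing_forecast(history, alpha=0.5)
```

On a trending series like this one all three lag behind the trend, which is why trend-aware methods such as Holt's linear trend are listed separately.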
best article-https://www.analyticsvidhya.com/blog/2018/02/time-series-forecasting-methods/,
time series visualization tool https://plotjuggler.io/
fastquant - Backtest and optimize your trading strategies with only 3 lines of code https://github.com/enzoampil/fastquant
pytorch-forecasting https://github.com/jdb78/pytorch-forecasting https://analyticsindiamag.com/guide-to-pytorch-time-series-forecasting/
https://pytorch-forecasting.readthedocs.io/en/latest/ https://pytorch-forecasting.readthedocs.io/en/latest/tutorials/ar.html
sktime-https://github.com/alan-turing-institute/sktime https://analyticsindiamag.com/sktime-library/
atspy https://github.com/firmai/atspy
tcn https://towardsdatascience.com/farewell-rnns-welcome-tcns-dd76674707c8
https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/
https://www.machinelearningplus.com/time-series/time-series-analysis-python/
https://github.com/Apress/hands-on-time-series-analylsis-python
https://otexts.com/fpp2/simple-methods.html
https://analyticsindiamag.com/top-time-series-deep-learning-methods/
G.Semi supervised learning,Self-Supervised Learning,Multi-Instance Learning
H.Active learning,Multi-Task Learning,Online Learning
I.Transfer learning(Inductive Transfer learning(similar domain,different task),Unsupervised Transfer Learning(different task,different domain but similar enough) ,Transductive Transfer Learning(similar task,different domain))
https://github.com/artix41/awesome-transfer-learning
J.Deep dream,Style transfer
K.One-shot learning,Zero-shot learning
L.Incremental Training https://blog.rasa.com/rasa-new-incremental-training/
https://github.com/ChristosChristofidis/awesome-deep-learning
101 Machine Learning Algorithms for Data Science with Cheat Sheets https://blog-datasciencedojo-com.cdn.ampproject.org/c/s/blog.datasciencedojo.com/machine-learning-algorithms/amp/
TYPES OF ACTIVATION FUNCTIONS: LINEAR ACTIVATION,RELU,LEAKY RELU,SIGMOID ACTIVATION,TANH ACTIVATION,elu,PReLU,Softmax,Swish,Softplus
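For reference, most of the activation functions named above fit in one line each. The sketch below uses only the standard math module on scalar inputs; the default slope and alpha values are common choices assumed here, not values taken from this list, and the plain sigmoid form can overflow for very large negative inputs.

```python
import math

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # Small non-zero slope for negative inputs instead of a hard zero.
    return x if x > 0 else slope * x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def swish(x):
    return x * sigmoid(x)

def softplus(x):
    # Smooth approximation of ReLU: log(1 + e^x).
    return math.log1p(math.exp(x))

def softmax(xs):
    # Subtract the maximum first for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
```

Softmax is the only one here that works on a whole vector: it turns raw scores into probabilities that sum to 1, which is why it is used on the output layer of multi-class classifiers.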
Optimizer- Gradient Descent(Batch Gradient Descent,Stochastic Gradient Descent,Mini batch Gradient Descent),sgd with momentum,Adagrad,RMSProp,Adam,AdaBelief
https://analyticsindiamag.com/ultimate-guide-to-pytorch-optimizers/ https://analyticsindiamag.com/guide-to-tensorflow-keras-optimizers/
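To make the difference between the optimizers above concrete, here is a rough sketch of plain gradient descent, SGD with momentum and Adam on a one-parameter toy loss f(w) = (w - 3)^2. The loss, learning rates and step counts are assumptions chosen for illustration; the Adam update follows the standard published form with bias correction.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, which is minimised at w = 3.
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def sgd_momentum(w, lr=0.1, beta=0.9, steps=300):
    # Velocity accumulates past gradients, smoothing and speeding up the updates.
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity + grad(w)
        w -= lr * velocity
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    # m and v are running averages of the gradient and the squared gradient.
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgd = sgd(0.0)
w_momentum = sgd_momentum(0.0)
w_adam = adam(0.0)
```

All three end up near w = 3 on this easy problem; the differences only start to matter on noisy, badly scaled or high-dimensional losses.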
Regularization- L1, L2, dropout, early stopping, data augmentation,batch normalisation,tree pruning
Learning rate scheduling,Weight Decay,Gradient clipping
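A minimal sketch of the three training tricks on the line above: a step-decay learning rate schedule, L2-style weight decay folded into an SGD step, and gradient clipping by global norm. Function names and default values are hypothetical; deep learning frameworks ship their own versions of each.

```python
import math

def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    # Multiply the learning rate by `drop` once every `epochs_per_drop` epochs.
    return initial_lr * drop ** (epoch // epochs_per_drop)

def clip_by_norm(grads, max_norm):
    # Rescale the whole gradient vector if its L2 norm exceeds max_norm.
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

def sgd_step_with_weight_decay(weights, grads, lr, weight_decay):
    # Add weight_decay * w to each gradient so large weights are pulled toward zero.
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

lr_at_25 = step_decay(0.1, epoch=25)
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
updated = sgd_step_with_weight_decay([1.0], [0.5], lr=0.1, weight_decay=0.1)
```

Clipping keeps the direction of the gradient and only shrinks its length, which is why it helps against exploding gradients in recurrent networks without changing where the update points.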
Different Normalization Layers - https://towardsdatascience.com/different-normalization-layers-in-deep-learning-1a7214ff71d6
Hyperparameters: number of hidden layers,dropout,activation function,weights initialization,learning rate,epochs,iterations and batch size
DropBlock-Keras-Implementation https://github.com/iantimmis/DropBlock-Keras-Implementation https://github.com/miguelvr/dropblock https://github.com/DHZS/tf-dropblock
Hyperparameter tuning
a.GridSearchCV (checks every given parameter combination, so it takes a long time)
b.RandomizedSearchCV (samples parameter combinations at random, which cuts search time)
c.Bayesian Optimization , Hyperopt
d.Sequential Model Based Optimization(Tuning a scikit-learn estimator with skopt)
e.Optuna
f.Genetic Algorithms
g.Keras tuner
h.Scikit-Optimize
i.ray[tune] and aisaratuners https://towardsdatascience.com/choosing-a-hyperparameter-tuning-library-ray-tune-or-aisaratuners-b707b175c1d7
Milano https://github.com/NVIDIA/Milano
Auto-PyTorch https://github.com/automl/Auto-PyTorch
https://towardsdatascience.com/10-hyperparameter-optimization-frameworks-8bc87bc8b7e3
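The difference between options a and b above (exhaustive grid search versus random search) can be sketched without any library. Everything here is illustrative: validation_loss is a hypothetical stand-in for "train a model with these parameters and return its validation loss", and the search space is made up.

```python
import itertools
import random

def validation_loss(params):
    # Hypothetical objective; in practice this would train and evaluate a model.
    return (params["lr"] - 0.1) ** 2 + (params["depth"] - 4) ** 2

def grid_search(space):
    # Try every combination of the listed values.
    keys = list(space)
    best = None
    for combo in itertools.product(*(space[k] for k in keys)):
        params = dict(zip(keys, combo))
        score = validation_loss(params)
        if best is None or score < best[1]:
            best = (params, score)
    return best

def random_search(space, n_trials, seed=0):
    # Sample a fixed budget of combinations at random.
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {k: rng.choice(v) for k, v in space.items()}
        score = validation_loss(params)
        if best is None or score < best[1]:
            best = (params, score)
    return best

space = {"lr": [0.001, 0.01, 0.1, 1.0], "depth": [2, 4, 6, 8]}
grid_best = grid_search(space)              # 16 evaluations
random_best = random_search(space, n_trials=5)  # 5 evaluations
```

Grid search is guaranteed to find the best listed combination but its cost grows multiplicatively with every added parameter; random search trades that guarantee for a fixed budget. Bayesian tools such as Optuna or Hyperopt go one step further and use earlier results to pick the next trial.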
Cross validation techniques- https://towardsdatascience.com/understanding-8-types-of-cross-validation-80c935a4976d
1.LOOCV (leave-one-out cross validation)
2.K-fold CV
3.Stratified cross validation
4.Time Series cross-validation
5.Holdout cross-validation
6.Repeated cross-validation
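A rough sketch of how K-fold and time series cross-validation in the list above differ in the way they cut the data. The two helper functions are hypothetical and only return index lists; the time series version uses an expanding window so that training indices always come before test indices.

```python
def kfold_indices(n_samples, k):
    # Split indices 0..n-1 into k contiguous folds; each fold is the test set once.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits

def time_series_splits(n_samples, n_splits):
    # Expanding window: train on everything before the test block, never after it.
    test_size = n_samples // (n_splits + 1)
    splits = []
    for i in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - i + 1) * test_size
        train = list(range(train_end))
        test = list(range(train_end, train_end + test_size))
        splits.append((train, test))
    return splits

kfold = kfold_indices(6, 3)
loocv = kfold_indices(4, 4)  # leave-one-out is K-fold with k equal to the sample count
ts = time_series_splits(6, 2)
```

In the K-fold splits the model is sometimes trained on samples that come after the test fold, which leaks future information when the rows are ordered in time; the expanding-window splits avoid that.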
TensorBoard,Neptune for visualizing model performance
Distributed Training with TensorFlow
6.Testing model
Generally used metrics
Always check bias variance tradeoff to know how model is performing
Model can be overfitting(low bias,high variance),underfitting(high bias,low variance),good fit(low bias,low variance)
https://scikit-learn.org/stable/modules/model_evaluation.html https://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model
https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks
1.Regression task - mean-squared-error, Root-Mean-Squared-Error,mean-absolute error, R², Adjusted R²,Cross-entropy loss,Mean percentage error
2.Classification task-Accuracy,confusion matrix,Precision,Recall,F1 Score,Binary Crossentropy,Categorical Crossentropy,AUC-ROC curve,log loss,Average precision,Mean average precision
3.Reinforcement learning - generally use rewards
4.In case of machine translation use BLEU score
5.Clustering - External: Adjusted Rand index, Jaccard Score, Purity Score; Internal: silhouette_score, Davies-Bouldin Index, Dunn Index
6.Object Detection loss-localization loss,classification loss,Focal Loss,IOU,L2 loss
7.Distance Metrics - Euclidean Distance,Manhattan Distance,Minkowski Distance,Hamming Distance
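Two of the metrics above worked out by hand, as a sketch: precision / recall / F1 from confusion-matrix counts for a binary classifier, and IoU for two axis-aligned boxes. The labels, predictions and boxes are invented for illustration, and the (x_min, y_min, x_max, y_max) box format is an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    # Count true positives, false positives, false negatives and true negatives.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred, positive=1):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def iou(box_a, box_b):
    # Boxes are (x_min, y_min, x_max, y_max); IoU = intersection area / union area.
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
overlap = iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

Accuracy alone would hide the trade-off here: precision asks how many predicted positives were right, recall asks how many actual positives were found, and F1 is their harmonic mean.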
metric-Built-in metrics, Custom metric without external parameters,Custom metric with external parameters,Subclassing custom metric layer
https://medium.com/swlh/custom-loss-and-custom-metrics-using-keras-sequential-model-api-d5bcd3a4ff28
loss-Built-in loss, Custom loss without external parameters,Custom loss with external parameters,Subclassing loss layer
https://analyticsindiamag.com/all-pytorch-loss-function/ https://analyticsindiamag.com/ultimate-guide-to-loss-functions-in-tensorflow-keras-api-with-python-implementation/
Docker and Kubernetes
simplest way to serve your ML models on Kubernetes https://towardsdatascience.com/the-simplest-way-to-serve-your-ml-models-on-kubernetes-5323a380bf9f
7.deployment
Platform as a Service (PaaS),Infrastructure as a Service (IaaS),SaaS (Software as a Service)
3 main approaches to saving and reloading an ML model - Pickle approach,Joblib approach,JSON approach
https://www.datacamp.com/community/tutorials/pickle-python-tutorial
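A minimal sketch of the pickle and JSON approaches named above, using only the standard library. The "model" here is a hypothetical dict of learned parameters for y = weight * x + bias, standing in for a real trained estimator, and the round trip is done in memory rather than through a file to keep the example short.

```python
import json
import pickle

# Hypothetical "trained model": just the learned parameters of y = weight * x + bias.
model = {"weight": 2.0, "bias": 1.0}

def predict(params, x):
    return params["weight"] * x + params["bias"]

# Pickle approach: serialises arbitrary Python objects to bytes.
# Only unpickle data you trust, since loading a pickle can execute arbitrary code.
pickled = pickle.dumps(model)
from_pickle = pickle.loads(pickled)

# JSON approach: human-readable text, limited to plain types (numbers, strings, lists, dicts),
# so the parameters are saved and the model object is rebuilt from them on load.
as_json = json.dumps(model)
from_json = json.loads(as_json)

prediction = predict(from_json, 3.0)
```

The Joblib approach has the same save and load shape as pickle (joblib.dump and joblib.load) and is commonly preferred for models that hold large NumPy arrays; it is left out here only because it is a third-party package.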
1.Azure
2.Heroku
3.Amazon Web Services
4.Google cloud platform
MODEL DEPLOYMENT USING TF SERVING
TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines https://www.tensorflow.org/tfx
Models visualization using Tensorboard,netron, TensorBoard.dev
Python web Frameworks for App Development- Flask,Streamlit,fastapi,Django,Web2py,Pyramid,CherryPy,Voila,Kivy and Kivymd
streamlit,plotly jupyterdash,h2o wave
https://analyticsindiamag.com/top-8-python-tools-for-app-development/
PyQt,Tkinter,PySimpleGUI for GUI programming in Python https://github.com/tirthajyoti/DS-with-PySimpleGUI
DearPyGui https://github.com/hoffstadt/DearPyGui
snapyml Deploy AI Models For Free -http://snapyml.snapy.ai/
h20wave-apps https://github.com/h2oai/wave-apps https://h2oai.github.io/wave/docs/installation/
Web-Based GUI (Gradio)- https://analyticsindiamag.com/guide-to-gradio-create-web-based-gui-applications-for-machine-learning/
Bamboolib https://medium.com/ai-in-plain-english/bamboolib-a-data-warriors-weapon-9f734f4c2553
web application(dash)- https://dash.plotly.com/
https://towardsdatascience.com/pycaret-2-1-is-here-whats-new-4aae6a7f636a
Create a Website with AI https://www.bookmark.com/
Jupyter Notebook into an interactive dashboard (voila)-https://voila.readthedocs.io/en/stable/
high-level app and dashboarding solution(Panel)-https://panel.holoviz.org/
https://github.com/gradio-app/gradio
Tensorflow lite:Use of tensorflow lite to reduce size of model https://www.tensorflow.org/lite https://codelabs.developers.google.com/codelabs/recognize-flowers-with-tensorflow-on-android-beta/#0 https://tfhub.dev/s?deployment-format=lite https://www.tensorflow.org/lite/examples https://www.tensorflow.org/lite/microcontrollers https://www.tensorflow.org/lite/models
six different types of methods:
- Pruning
- Quantization: 1. Post-Training Quantization (Reduce Float16, Hybrid Quantization, Integer Quantization) 2. During-Training Quantization 3. Post-Training Pruning 4. Post-Training Clustering
- Knowledge distillation
- Parameter sharing
- Tensor decomposition
- Linear Transformer
model optimization (architecture)
TinyML https://blog.tensorflow.org/2020/08/the-future-of-ml-tiny-and-bright.html
Post-training Quantization in TensorFlow Lite https://www.tensorflow.org/lite/performance/post_training_quantization
pruning
Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications https://github.com/Tencent/PocketFlow
leverage of model architecture
Quantization:Use Quantization to reduce size of model
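Why quantization shrinks a model can be shown with a small sketch: each float weight is mapped to an 8-bit integer plus one shared scale and offset. This is a generic affine quantization written for illustration in plain Python, not the exact scheme TensorFlow Lite applies; the function names and sample weights are made up.

```python
def quantize(values, num_bits=8):
    # Affine quantization: map [min, max] onto the integers 0 .. 2^bits - 1.
    qmax = 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo

def dequantize(q, scale, lo):
    # Recover approximate floats from the stored integers.
    return [lo + scale * x for x in q]

# Illustrative weights (hypothetical).
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, lo = quantize(weights)
restored = dequantize(q, scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Storing one byte per weight instead of a 32-bit float cuts the size roughly by four, at the price of a rounding error of at most half the scale per weight.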
8.Monitoring model
CI/CD pipeline tools - CircleCI,Jenkins
In real-world projects use a pipeline -https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
1.easy debugging
2.better readability
BIG DATA: hadoop,apache spark
research paper-https://arxiv.org/ ,https://arxiv.org/list/cs.LG/recent, https://www.kaggle.com/Cornell-University/arxiv
arXiv.org https://arxiv.org/list/cs.AI/recent https://arxiv.org/list/stat.ML/recent https://arxiv.org/list/cs.CL/recent https://arxiv.org/list/cs.CV/recent
https://github.com/amitness/papers-with-video
Semantic Scholar searches: https://www.semanticscholar.org/search?q=%22neural%20networks%22&sort=relevance&ae=false
https://www.semanticscholar.org/search?q=%22machine%20learning%22&sort=relevance&ae=false
https://www.semanticscholar.org/search?q=%22natural%20language%22&sort=relevance&ae=false
https://www.semanticscholar.org/search?q=%22computer%20vision%22&sort=relevance&ae=false
https://www.semanticscholar.org/search?q=%22deep%20learning%22&sort=relevance&ae=false
code for Research Papers-https://chrome.google.com/webstore/detail/find-code-for-research-pa/aikkeehnlfpamidigaffhfmgbkdeheil
Summarise Research Papers - https://www.semanticscholar.org/
Programming languages for data science: Python,R,Julia,Java,Scala,JavaScript (TensorFlow.js)
IDE:jupyter notebook,spyder,pycharm,visual studio
BEST ONLINE COURSES
1.COURSERA
2.UDEMY
3.EDX
4.DATACAMP
5.Udacity
6.https://www.skillbasics.com/
BEST YOUTUBE CHANNEL TO FOLLOW
1.Krish Naik-https://www.youtube.com/user/krishnaik06
2.Codebasics-https://www.youtube.com/channel/UCh9nVJoWXmFb7sLApWGcLPQ
3.Abhishek thakur-https://www.youtube.com/user/abhisheksvnit
4.AIEngineering-https://www.youtube.com/channel/UCwBs8TLOogwyGd0GxHCp-Dw
5.Ineuron-https://www.youtube.com/channel/UCb1GdqUqArXMQ3RS86lqqOw
6.Ken jee-https://www.youtube.com/c/KenJee1/featured
7.3Blue1Brown-https://www.youtube.com/c/3blue1brown/featured
8.The AI Guy -https://www.youtube.com/channel/UCrydcKaojc44XnuXrfhlV8Q
9.Unfold Data Science-https://www.youtube.com/channel/UCh8IuVJvRdporrHi-I9H7Vw
BEST BLOGS TO FOLLOW
https://www.cybrhome.com/topic/data-science-blogs
1.Towards data science-https://towardsdatascience.com/
2.Analyticsvidhya-https://www.analyticsvidhya.com/blog/?utm_source=feed&utm_medium=navbar https://analyticsindiamag.com/
3.Medium-https://medium.com/
4.Machinelearningmastery-https://machinelearningmastery.com/blog/
5.ML+ -https://www.machinelearningplus.com/
6.analyticsinsight https://www.analyticsinsight.net/category/latest-news/
7.KDnuggets https://www.kdnuggets.com/ https://www.kdnuggets.com/news/index.html
https://machinelearningknowledge.ai/
https://github.com/rushter/data-science-blogs
https://www.datamuni.com/
https://blog.ml.cmu.edu/?utm_source=towardsai.net&utm_medium=referral&utm_campaign=marketing&utm_term=machine-learning-blog&utm_content=best-machine-learning-blogs-to-follow
https://www.amazon.science/blog?utm_source=towardsai.net&utm_medium=referral&utm_campaign=marketing&utm_term=machine+learning+blog&utm_content=machine+learning+blog&f0=0000016e-2ff1-d205-a5ef-aff9651e0000&s=0
https://distill.pub/?utm_source=towardsai.net&utm_medium=referral&utm_campaign=marketing&utm_term=machine-learning-blog&utm_content=best-machine-learning-blogs-to-follow
https://ai.googleblog.com/search/label/Machine%20Learning?utm_source=towardsai.net&utm_medium=referral&utm_campaign=marketing&utm_term=machine-learning-blog&utm_content=best-machine-learning-blogs-to-follow
https://neptune.ai/blog?utm_source=towardsai.net&utm_medium=referral&utm_campaign=marketing&utm_term=machine+learning+blog&utm_content=machine+learning+blog
https://bair.berkeley.edu/blog/?utm_source=towardsai.net&utm_medium=referral&utm_campaign=marketing&utm_term=machine-learning-blog&utm_content=best-machine-learning-blogs-to-follow
https://deepmind.com/research?utm_source=towardsai.net&utm_medium=referral&utm_campaign=marketing&utm_term=machine-learning-blog&utm_content=machine-learning-blogs-to-follow&filters=%7B%22category%22:%5B%22Research%22%5D%7D
https://ai.facebook.com/blog/?utm_source=towardsai.net&utm_medium=referral&utm_campaign=marketing&utm_term=machine-learning-blog&utm_content=machine-learning-blogs-to-follow
https://becominghuman.ai/top-25-ai-and-machine-learning-blogs-for-data-scientists-9f121bcfd9a2
https://medium.com/towards-artificial-intelligence/best-machine-learning-blogs-to-follow-ml-research-ai-3994e01967f9
BEST RESOURCES
https://amitness.com/toolbox/ https://github.com/khuyentran1401/Data-science https://github.com/ml-tooling/best-of-ml-python
https://github.com/ml-tooling/best-of-ml-python#machine-learning-frameworks
https://towardsdatascience.com/data-science-tools-f16ecd91c95d https://mathdatasimplified.com/
1.paperswithcode-https://paperswithcode.com/methods
paperswithcode-client https://github.com/paperswithcode/paperswithcode-client
2.madewithml-https://madewithml.com/topics/ https://madewithml.com/courses/applied-ml-in-production/
Weights & Biases- https://wandb.ai/gallery sotabench-https://sotabench.com/
3.Deep learning-https://course.fullstackdeeplearning.com/#course-content
4.pytorch deep learning-https://atcold.github.io/pytorch-Deep-Learning/
https://www.kdnuggets.com/2019/08/pytorch-cheat-sheet-beginners.html https://www.kdnuggets.com/2019/04/nlp-pytorch.html
PyTorch Lightning-https://github.com/PyTorchLightning/pytorch-lightning
PYTORCH - https://pytorch.org/ https://pytorch.org/ecosystem/ https://pytorch.org/tutorials/ https://pytorch.org/docs/stable/index.html https://github.com/pytorch/pytorch
PYTORCH Lightning https://pytorchlightning.ai/community#projects https://seannaren.medium.com/introducing-pytorch-lightning-sharded-train-sota-models-with-half-the-memory-7bcc8b4484f2
Opacus (Training PyTorch models with differential privacy)-https://opacus.ai/
light-face-detection https://github.com/borhanMorphy/light-face-detection
DALLE-pytorch https://github.com/lucidrains/DALLE-pytorch
PyTorch JIT -https://lernapparat.de/jit-optimization-intro/
jax- https://github.com/google/jax
incubator-mxnet - https://github.com/apache/incubator-mxnet
ignite-https://github.com/pytorch/ignite
fastText - https://github.com/facebookresearch/fastText
rapidminer-https://rapidminer.com/
5.deep-learning-drizzle-https://deep-learning-drizzle.github.io/ https://deep-learning-drizzle.github.io/index.html
6.Fastaibook-https://github.com/fastai/fastbook , https://course.fast.ai/ https://www.fast.ai/2019/07/08/fastai-nlp/ https://www.fast.ai/2020/08/21/fastai2-launch/
neptune.ai-https://docs.neptune.ai/index.html
Dive into Deep Learning http://d2l.ai/
7.TopDeepLearning-https://github.com/aymericdamien/TopDeepLearning
8.NLP-progress-https://github.com/sebastianruder/NLP-progress
9.EasyOCR-https://github.com/JaidedAI/EasyOCR
10.Awesome-pytorch-list-https://github.com/bharathgs/Awesome-pytorch-list https://shivanandroy.com/awesome-nlp-resources/
11.free-data-science-books-https://github.com/chaconnewu/free-data-science-books
12.arcgis-https://github.com/Esri/arcgis-python-api https://geemap.org/
13.data-science-ipython-notebooks-https://github.com/donnemartin/data-science-ipython-notebooks
14.julia-https://github.com/JuliaLang/julia , https://docs.julialang.org/en/v1/
15.google-research-https://github.com/google-research/google-research
16.reinforcement-learning-https://github.com/dennybritz/reinforcement-learning
17.keras-applications-https://github.com/keras-team/keras-applications , https://github.com/keras-team/keras https://keras.io/examples/
18.opencv-https://github.com/opencv/opencv
19.transformers-https://github.com/huggingface/transformers
20.code implementations for research papers-https://chrome.google.com/webstore/detail/find-code-for-research-pa/aikkeehnlfpamidigaffhfmgbkdeheil
21.regarding satellite images - Geo AI,Arcgis,geemap
esri arcgis-https://www.esri.com/en-us/arcgis/about-arcgis/overview
earthcube-https://www.earthcube.eu/
geemap-https://geemap.org/
22.Monk_Object_Detection-https://github.com/Tessellate-Imaging/Monk_Object_Detection
https://github.com/Tessellate-Imaging/monk_v1
pyradox https://github.com/Ritvik19/pyradox
23.NLP-progress - https://github.com/sebastianruder/NLP-progress
24.interview-question-data-science-https://github.com/iNeuronai/interview-question-data-science-
25.recommenders-https://github.com/microsoft/recommenders
26.Awesome-NLP-Resources -https://github.com/Robofied/Awesome-NLP-Resources https://shivanandroy.com/awesome-nlp-resources/ https://github.com/keon/awesome-nlp
27.Tool for visualizing attention in the Transformer model-https://github.com/jessevig/bertviz
28.TransCoder-https://github.com/facebookresearch/TransCoder
29.Tessellate-Imaging-https://github.com/Tessellate-Imaging/monk_v1
Monk_Object_Detection-https://github.com/Tessellate-Imaging/Monk_Object_Detection/tree/master/application_model_zoo
Artificial-Intelligence-Deep-Learning-Machine-Learning-Tutorials- https://github.com/TarrySingh/Artificial-Intelligence-Deep-Learning-Machine-Learning-Tutorials
30.Machine-Learning-with-Python-https://github.com/tirthajyoti/Machine-Learning-with-Python
31.huggingface contain almost all nlp pretrained model and all tasks related to nlp field
https://github.com/huggingface https://github.com/huggingface/transformers https://huggingface.co/transformers/ https://huggingface.co/transformers/master/ https://github.com/huggingface/tokenizers
ktrain https://github.com/amaiya/ktrain
32.multi-task-NLP-https://github.com/hellohaptik/multi-task-NLP
33.gpt-2 - https://github.com/openai/gpt-2
34.Powerful and efficient Computer Vision Annotation Tool (CVAT)-https://github.com/openvinotoolkit/cvat, https://github.com/abreheret/PixelAnnotationTool
https://github.com/UniversalDataTool/universal-data-tool http://www.robots.ox.ac.uk/~vgg/software/via/
35.Data augmentation for NLP-https://github.com/makcedward/nlpaug
36.awesome Data Science-https://github.com/academic/awesome-datascience
37.mlops-https://github.com/visenger/awesome-mlops
38.gym-https://github.com/openai/gym
39.Super Duper NLP Repo-https://notebooks.quantumstat.com/ https://models.quantumstat.com/ https://miro.com/app/board/o9J_kqndLls=/ https://datasets.quantumstat.com/
40.papers summarizing the advances in the field-https://github.com/eugeneyan/ml-surveys
41.deep-translator-https://github.com/nidhaloff/deep-translator
42.detext-https://github.com/linkedin/detext
43.nlpaug-https://github.com/makcedward/nlpaug
44.ipython-sql-https://github.com/catherinedevlin/ipython-sql
45.libra-https://github.com/Palashio/libra
46.opencv-https://github.com/opencv/opencv
47.learnopencv-https://github.com/spmallick/learnopencv , https://www.learnopencv.com/
48.math is fun-https://www.mathsisfun.com/ , https://pabloinsente.github.io/intro-linear-algebra, https://hadrienj.github.io/posts/Deep-Learning-Book-Series-Introduction/
49.DEEP LEARNING WITH PYTORCH: A 60 MINUTE BLITZ - https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
50.https://data-flair.training/blogs/
https://data-flair.training/blogs/python-tutorials-home/ https://data-flair.training/blogs/hadoop-tutorials-home/ https://data-flair.training/blogs/spark-tutorials-home/
https://data-flair.training/blogs/tableau-tutorials-home/ https://data-flair.training/blogs/data-science-tutorials-home/
Spark Release 3.0.1-https://spark.apache.org/releases/spark-release-3-0-1.html
mllib https://spark.apache.org/docs/2.0.0/api/python/pyspark.mllib.html https://spark.apache.org/docs/2.0.0/api/python/index.html
https://data-flair.training/blogs/spark-tutorial/ Spark Core,Spark SQL,Spark Streaming,Spark MLlib,Spark GraphX,etc...
Machine Learning with Optimus on Apache Spark https://www.kdnuggets.com/2017/11/machine-learning-with-optimus.html
BigDL: Distributed Deep Learning Framework for Apache Spark https://github.com/intel-analytics/BigDL
51.for more cheatsheets-https://github.com/FavioVazquez/ds-cheatsheets , https://medium.com/swlh/the-ultimate-cheat-sheet-for-data-scientists-d1e247b6a60c
https://www.theinsaneapp.com/2020/12/machine-learning-and-data-science-cheat-sheets-pdf.html
https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-supervised-learning
52.text2emotion-https://pypi.org/project/text2emotion/
53.ExploriPy-https://analyticsindiamag.com/hands-on-tutorial-on-exploripy-effortless-target-based-eda-tool/
54.TCN-https://github.com/philipperemy/keras-tcn
55.deeplearning-models-https://github.com/rasbt/deeplearning-models
56.earthengine-py-notebooks-https://github.com/giswqs/earthengine-py-notebooks
57.NLP-progress -https://github.com/sebastianruder/NLP-progress
58.numerical-linear-algebra -https://github.com/fastai/numerical-linear-algebra
60.reinforcement learning by using PyTorch-https://github.com/SforAiDl/genrl
61.chatbot- from scratch,google dialogflow,rasa nlu,azure luis, chatterbot,Amazon lex,Wit.ai,Luis.ai,IBM Watson etc...
https://github.com/fendouai/Awesome-Chatbot
https://www.analyticsinsight.net/category/chatbots/
https://blog.ubisend.com/optimise-chatbots/chatbot-training-data
- No Code Machine Learning / Deep Learning
Teachable Machine-https://teachablemachine.withgoogle.com/
Microsoft Lobe -https://lobe.ai/
WEKA - https://www.cs.waikato.ac.nz/ml/weka/
Monk_Gui-https://github.com/Tessellate-Imaging/Monk_Gui
FlashML https://www.flash-ml.com/
igel https://github.com/nidhaloff/igel
obviously https://www.obviously.ai/
machine learning straight from Microsoft Excel https://venturebeat.com/2020/12/30/you-dont-code-do-machine-learning-straight-from-microsoft-excel/
ENNUI-https://math.mit.edu/ennui/ https://github.com/martinjm97/ENNUI https://www.youtube.com/watch?v=4VRC5k0Qs2w
Knime https://www.knime.com/
Accord.net http://accord-framework.net/
H2O Driverless AI https://www.h2o.ai/products/h2o-driverless-ai/
Rapid Miner https://rapidminer.com/
opennn https://www.opennn.net/
datarobot https://www.datarobot.com/
dataiku https://www.dataiku.com/product/get-started/
ludwig https://github.com/ludwig-ai/ludwig
orange https://orange.biolab.si/
OpenBlender https://openblender.io/#/welcome
create neural networks with one line of code https://github.com/PraneetNeuro/nnio.l
Machine Learning in JUST ONE LINE OF CODE libra https://github.com/Palashio/libra/ https://www.youtube.com/watch?v=N_T_ljj5vc4
62.tensorflow development-https://blog.tensorflow.org/
TensorFlow Hub (trained ready-to-deploy machine learning models in one place) - https://tfhub.dev/
TensorBoard.dev - https://tensorboard.dev/
tutorials-https://www.tensorflow.org/tutorials https://www.tensorflow.org/guide
TensorFlow Graphics - https://www.tensorflow.org/graphics Lattice-https://www.tensorflow.org/lattice
TensorFlow Probability-https://www.tensorflow.org/probability TensorFlow Privacy- tensorflow-privacy
63.Data Science in the Cloud-Amazon SageMaker,Amazon Lex,Amazon Rekognition,Azure Machine Learning (Azure ML) Services,Azure Service Bot framework,Google Cloud AutoML
64.platforms to build and deploy ML models -Uber has Michelangelo,Google has TFX,Databricks has MLFlow,Amazon Web Services (AWS) has Sagemaker
65.Time Complexity Of Machine Learning Models -https://www.thekerneltrip.com/machine/learning/computational-complexity-learning-algorithms/
66.ML from scratch-https://dafriedman97.github.io/mlbook/content/introduction.html
https://aihubprojects.com/machine-learning-from-scratch-python/
67.turn-on visual training for most popular ML algorithms https://github.com/lucko515/ml_tutor https://pypi.org/project/ml-tutor/
68.mlcourse.ai is a free online machine learning course- https://mlcourse.ai/
69.using pretrained model provided by tfhub- https://tfhub.dev/
70.Deep-Learning-with-PyTorch- https://pytorch.org/assets/deep-learning/Deep-Learning-with-PyTorch.pdf
71.MIT 6.S191 Introduction to Deep Learning-http://introtodeeplearning.com/
72.R for Data Science-https://r4ds.had.co.nz/ ,Fundamentals of Data Visualization-https://clauswilke.com/dataviz/
74.machine learning in JavaScript-https://www.tensorflow.org/js https://www.tensorflow.org/js/models https://tensorflow-js-object-detection.glitch.me/
TensorFlow.jl Julia with TensorFlow https://malmaud.github.io/tfdocs/ https://malmaud.github.io/TensorFlow.jl/latest/tutorial.html
Sonnet is a library built on top of TensorFlow 2 https://github.com/deepmind/sonnet
TensorFlow Federated (TFF) ( facilitate open research and experimentation with Federated Learning)-https://www.tensorflow.org/federated
TFX is an end-to-end platform for deploying production ML pipelines https://www.tensorflow.org/tfx https://github.com/tensorflow/tfx
Federated Learning -https://www.tensorflow.org/federated/tutorials/federated_learning_for_image_classification
Neural Structured Learning-https://www.tensorflow.org/neural_structured_learning/tutorials/graph_keras_mlp_cora
Responsible AI-https://www.tensorflow.org/resources/responsible-ai
Multilingual Representations for Indian Languages https://tfhub.dev/google/MuRIL/1
75.free list of AI/ Machine Learning Resources/Courses-https://www.marktechpost.com/free-resources/
https://github.com/kabartay/OpenUnivCourses
https://www.kdnuggets.com/2018/11/10-free-must-see-courses-machine-learning-data-science.html
https://www.kdnuggets.com/2018/12/10-more-free-must-see-courses-machine-learning-data-science.html
https://www.theinsaneapp.com/2020/12/machine-learning-and-data-science-cheat-sheets-pdf.html
https://www.theinsaneapp.com/2020/11/free-machine-learning-data-science-and-python-books.html
65 Machine Learning and Data books for free- https://towardsdatascience.com/springer-has-released-65-machine-learning-and-data-books-for-free-961f8181f189
https://www.deeplearningbook.org/ http://d2l.ai/
https://www.datasciencecentral.com/profiles/blogs/free-500-page-book-on-applications-of-deep-neural-networks-1 https://github.com/jeffheaton/t81_558_deep_learning
https://www.theinsaneapp.com/2020/12/free-data-science-books-pdf.html
https://github.com/chaconnewu/free-data-science-books
https://www.kdnuggets.com/2020/03/24-best-free-books-understand-machine-learning.html
https://www.kdnuggets.com/2020/12/15-free-data-science-machine-learning-statistics-ebooks-2021.html
http://introtodeeplearning.com/
http://d2l.ai/index.html https://www.kdnuggets.com/2020/09/best-free-data-science-ebooks-2020-update.html
https://www.youtube.com/playlist?app=desktop&list=PLypiXJdtIca5ElZMWHl4HMeyle2AzUgVB https://mit6874.github.io/
76.Code for Research Papers-https://chrome.google.com/webstore/detail/find-code-for-research-pa/aikkeehnlfpamidigaffhfmgbkdeheil
77.Natural Language Processing 365- https://ryanong.co.uk/natural-language-processing-365/
78.Top Computer Vision Google Colab Notebooks- https://www.qblocks.cloud/creators/computer-vision-google-colab-notebooks
79.For practice -https://www.confetti.ai/exams
81.Mathematics of Machine Learning,deep learning-https://towardsdatascience.com/the-mathematics-of-machine-learning-894f046c568
https://github.com/hrnbot/Basic-Mathematics-for-Machine-Learning
https://towardsdatascience.com/the-roadmap-of-mathematics-for-deep-learning-357b3db8569b
https://www.kdnuggets.com/2020/02/free-mathematics-courses-data-science-machine-learning.html
https://towardsai.net/p/data-science/how-much-math-do-i-need-in-data-science-d05d83f8cb19
https://www.mltut.com/how-to-learn-math-for-machine-learning-step-by-step-guide/
https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks#
https://www.datasciencecentral.com/profiles/blogs/free-online-book-machine-learning-from-scratch
https://www.youtube.com/playlist?list=PLRDl2inPrWQW1QSWhBU0ki-jq_uElkh2a https://github.com/jonkrohn/ML-foundations
https://ocw.mit.edu/resources/res-18-001-calculus-online-textbook-spring-2005/textbook/
82.Googleai-https://ai.google/education
83.ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions
PyBrain is a modular Machine Learning Library for Python
84.Best Online Courses for Machine Learning and Data Science-https://www.mltut.com/best-online-courses-for-machine-learning-and-data-science/
Comprehensive Project Based Data Science Curriculum https://julienbeaulieu.github.io/2019/09/25/comprehensive-project-based-data-science-curriculum/
AI Expert Roadmap-https://i.am.ai/roadmap/#data-science-roadmap
85.FastAPI-https://fastapi.tiangolo.com/deployment/deta/
86.Yann LeCun's Deep Learning Course at CDS-https://cds.nyu.edu/deep-learning/ https://atcold.github.io/pytorch-Deep-Learning/
https://www.cs.cmu.edu/~ninamf/courses/601sp15/lectures.shtml
87.Four Important Computer Vision Annotation Tools https://heartbeat.fritz.ai/4-important-computer-vision-annotation-tools-you-need-to-know-in-2020-9f964931ed7
88.Python Data Science Handbook https://jakevdp.github.io/PythonDataScienceHandbook/
89.for low code object detection (detecto)- https://github.com/alankbi/detecto
90.1 line for hundreds of NLP models and algorithms- https://github.com/JohnSnowLabs/nlu
91.AudioFeaturizer when deal with audio data- https://pypi.org/project/AudioFeaturizer/
librosa library https://librosa.org/doc/latest/index.html
MAGENTA-https://magenta.tensorflow.org/
92.Palladium-https://palladium.readthedocs.io/en/latest/
93.KNIME-https://www.knime.com/
94.Facebook Open Sourced New Frameworks to Advance Deep Learning Research https://www.kdnuggets.com/2020/11/facebook-open-source-frameworks-advance-deep-learning-research.html
95.Software Engineering for Machine Learning https://github.com/SE-ML/awesome-seml
96.Atlas web-based dashboard -https://www.atlas.dessa.com/
97.Pytest (test code) https://docs.pytest.org/en/latest/index.html
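A minimal sketch of what a pytest-style test can look like. The helper `min_max_scale` and both test names are hypothetical, made up for illustration. pytest collects functions whose names start with `test_` and reports any bare `assert` that fails.

```python
def min_max_scale(values):
    """Rescale numbers to the [0, 1] range (hypothetical helper under test)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Constant input: avoid division by zero and map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_scales_to_unit_range():
    # Smallest value maps to 0.0, largest to 1.0, midpoint to 0.5
    assert min_max_scale([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_constant_input_maps_to_zero():
    # All-equal input must not raise ZeroDivisionError
    assert min_max_scale([3, 3, 3]) == [0.0, 0.0, 0.0]
```

Saved in a file such as test_scaling.py (the file name is an assumption), running `pytest` in that directory should collect and run both tests.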
98.keras- https://keras.io/ https://keras.io/api/ https://keras.io/examples/
99.High-Performance Jupyter Notebook - BlazingSQL Notebooks https://blazingsql.com/notebooks
jupyter-tabnine https://github.com/wenmin-wu/jupyter-tabnine
100.CV-pretrained-model- https://github.com/balavenkatesh3322/CV-pretrained-model
101.Kubeflow Machine Learning Toolkit for Kubernetes https://www.kubeflow.org/
102.Daily AI updates to your inbox- https://sago-ai.news/#/
103.Three Keras API styles - Sequential model, Functional API, Model subclassing
104.Deep Learning Toolkit for Medical Image Analysis -https://github.com/DLTK/DLTK
106.Interpret The ML Model
lime(explain black box models)- https://lime-ml.readthedocs.io/en/latest/
https://github.com/slundberg/shap
Shapash makes Machine Learning models transparent and understandable by everyone https://github.com/MAIF/shapash
interpret https://github.com/interpretml/interpret
Captum Model Interpretability for PyTorch https://captum.ai/ https://github.com/pytorch/captum
ecco https://github.com/jalammar/ecco https://jalammar.github.io/explaining-transformers/ https://www.eccox.io/
dalex https://pypi.org/project/dalex/ https://blog.learningdollars.com/2021/01/02/ai-in-medical-diagnosis/ https://www.kdnuggets.com/2020/11/dalex-explain-tensorflow-model.html
google AI Explanations for AI Platform https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview?utm_source=youtube&utm_medium=Unpaidsocial&utm_campaign=guo-20200423-Intro-Aiexp
eli5 https://eli5.readthedocs.io/en/latest/
TabNet: Attentive Interpretable Tabular Learning https://github.com/dreamquark-ai/tabnet
skater https://oracle.github.io/Skater/
what if tool https://pair-code.github.io/what-if-tool/ https://pair-code.github.io/what-if-tool/demos/uci.html
DeepLIFT https://github.com/kundajelab/deeplift
explainerdashboard https://towardsdatascience.com/the-quickest-way-to-build-dashboards-for-machine-learning-models-ec769825070d
Responsible AI-https://www.tensorflow.org/resources/responsible-ai
fairlearn https://github.com/fairlearn/fairlearn
Google Facets https://pair-code.github.io/facets/
Google's Model Card Toolkit
Opening the AI Black Box -https://zetane.com/gallery
AI Explainability 360 Toolkit from IBM Research https://aix360.mybluemix.net/
onnx https://github.com/onnx/onnx
torch-dreams https://github.com/Mayukhdeb/torch-dreams
https://github.com/jphall663/awesome-machine-learning-interpretability
https://christophm.github.io/interpretable-ml-book/ https://github.com/christophM/interpretable-ml-book
https://www.kdnuggets.com/2018/12/machine-learning-explainability-interpretability-ai.html
Fairness
How to easily check if your Machine Learning model is fair (dalex) https://www.kdnuggets.com/2020/12/machine-learning-model-fair.html
LinkedIn Fairness Toolkit,Fairlearn,AI Fairness 360,scikit-fairness,Algofairness,Aequitas,CERTIFAI,ML-fairness-gym
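One common check behind fairness toolkits like those listed above is demographic parity: comparing the rate of positive predictions across groups. The sketch below is plain Python and is not the API of any toolkit named here. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def demographic_parity_difference(predictions, groups):
    """Largest gap in positive-prediction rate between any two groups.

    predictions: iterable of 0/1 model outputs
    groups: iterable of group labels, same length as predictions
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    # Positive-prediction rate per group
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values())


# Hypothetical predictions for two groups, "a" and "b"
preds = [1, 0, 1, 1, 0, 0, 1, 0]
grps = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_difference(preds, grps)
print(f"demographic parity difference: {gap:.2f}")
```

A value of 0 means every group receives positive predictions at the same rate; larger values indicate a bigger disparity. What counts as acceptable depends on the application.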
107.deep-learning-drizzle -https://deep-learning-drizzle.github.io/
108.Machine Learning University - https://aws.amazon.com/machine-learning/mlu/
109.mlflow https://mlflow.org/ An open source platform for the machine learning lifecycle
https://www.kdnuggets.com/2021/01/5-tools-effortless-data-science.html
https://azure.microsoft.com/en-us/services/machine-learning/
https://github.com/VertaAI/modeldb
110.Data Preparation / ETL https://airflow.apache.org/ https://intake.readthedocs.io/en/latest/
111.fairlearn https://github.com/fairlearn/fairlearn/blob/master/README.md Evaluating fairness of AI/ML models and training data and for mitigating bias in models determined to be unfair.
AI Fairness 360 evaluating fairness of AI/ML models and training data and mitigating bias in current models https://aif360.mybluemix.net/
An ethics checklist for data scientists https://deon.drivendata.org/
112.MONAI Framework For Medical Imaging Research https://analyticsindiamag.com/monai-datatsets-managers/
torchio https://github.com/fepegar/torchio https://analyticsindiamag.com/torchio-3d-medical-imaging/
MolBert: Molecular Representation learning with AI
medicalAI https://github.com/aibharata/medicalAI
Biopython is a set of freely available tools https://github.com/biopython/biopython
DeepIPW https://github.com/ruoqi-liu/DeepIPW
113.OpenVINO https://opencv.org/openvino-model-optimization/ https://opencv.org/how-to-speed-up-deep-learning-inference-using-openvino-toolkit-2/
115.Code faster https://www.tabnine.com/
116.Pytest for Data Scientists https://towardsdatascience.com/4-lessor-known-yet-awesome-tips-for-pytest-2117d8a62d9c
117.mlflow https://mlflow.org/docs/latest/index.html
MLOps https://github.com/microsoft/MLOps
DevOps https://github.com/collections/devops-tools
airflow https://github.com/apache/airflow
kubeflow https://github.com/kubeflow/kubeflow
kubernetes https://github.com/kubernetes/kubernetes
pipeline https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
118.algorithm to use by problem https://www.datasciencecentral.com/profiles/blogs/which-machine-learning-deep-learning-algorithm-to-use-by-problem
119.Connect the world to your data and fuel your ML.
OpenBlender Enrich ML Models with adding new Variables from Any Source to Boost Performance https://www.youtube.com/channel/UCCFN8DDrA6k7eHYLvZGdNVA https://openblender.io/
- Google's MuRIL (Multilingual Representations for Indian Languages) https://tfhub.dev/google/MuRIL/1
122.tools-https://towardsdatascience.com/data-science-tools-f16ecd91c95d
123.Elements of AI free online course https://www.elementsofai.com/
124.Best_AI_paper_2020 https://github.com/louisfb01/Best_AI_paper_2020
125.roadmap https://github.com/graykode/nlp-roadmap
https://www.freecodecamp.org/news/data-science-learning-roadmap/
https://github.com/AMAI-GmbH/AI-Expert-Roadmap
data-engineer-roadmap https://github.com/datastacktv/data-engineer-roadmap
Visualizing the Execution of Python Program http://pythontutor.com/ https://www.youtube.com/watch?v=pCSlWQjfCzA
MLPerf Model performance debugging tools https://mlperf.org/
Model debugging tools Manifold https://eng.uber.com/manifold/
Icecream https://towardsdatascience.com/stop-using-print-to-debug-in-python-use-icecream-instead-79e17b963fcc
Experiment tracking tools WandB https://wandb.ai/site
Comet manage and organize machine learning experiments https://www.comet.ml/site/
MLflow Open-source platform for tracking machine learning experiments https://mlflow.org/
neptune https://neptune.ai/
127.19 Best JupyterLab Extensions for Machine Learning https://neptune.ai/blog/jupyterlab-extensions-for-machine-learning
128.coreml https://developer.apple.com/machine-learning/core-ml/
129.Protect Your Neural Networks Against Hacking Adversarial Robustness Toolbox (ART) https://analyticsindiamag.com/adversarial-robustness-toolbox-art/
131.datascience-fails https://github.com/xLaszlo/datascience-fails
132.Jupyter notebook integration for Microsoft Excel https://github.com/pyxll/pyxll-jupyter https://towardsdatascience.com/python-jupyter-notebooks-in-excel-5ab34fc6439
Voilà turns Jupyter notebooks into standalone web applications https://github.com/voila-dashboards/voila https://github.com/voila-dashboards/voila-gridstack
How to Optimize Your Jupyter Notebook https://www.kdnuggets.com/2020/01/optimize-jupyter-notebook.html
133.rapidly develop data applications with Python https://github.com/dstackai/dstack
134.Google Research: Looking Back at 2020, and Forward to 2021 https://ai.googleblog.com/2021/01/google-research-looking-back-at-2020.html
135.cortex Run inference at scale https://www.cortex.dev/ https://github.com/cortexlabs/cortex
Follow leaders in the field to stay up to date
1.Linkedin
2.Twitter
CPU/GPU/TPU
1.Google Colab (FREE)
2.Kaggle kernel(read terms and conditions before use) (FREE)
3.Paperspace Gradient(read terms and conditions before use)
4.knime - https://www.knime.com/(read terms and conditions before use)
5.RapidMiner (read terms and conditions before use)
https://github.com/zszazi/Deep-learning-in-cloud
So what next ?
Participate in online competitions, build projects, apply for internships and jobs, solve real-world problems, etc...
Applications of data science across industries
1.E-commerce- Identifying consumers,Recommending Products,Analyzing Reviews
2.Manufacturing- Predicting potential problems,Monitoring systems,Automating manufacturing units, Maintenance Scheduling,Anomaly Detection
3.Banking- Fraud detection,Credit risk modeling,Customer lifetime value
4.Healthcare- Medical image analysis, Drug discovery,Bioinformatics,Virtual Assistants,image segmentation
5.Transport- Self-driving cars,Enhanced driving experience,Car monitoring system,Enhancing the safety of passengers
6.Finance- Customer segmentation,Strategic decision making,Algorithmic trading,Risk analytics
7.Marketing (Added from comments Credits: Jawad Ali)- LTV predictions,Predictive analytics for customer behavior,Ad targeting
and many more fields - https://www.topbots.com/enterprise-ai-companies-2020/ , https://venturebeat.com/2020/10/21/the-2020-data-and-ai-landscape/
Research blogs
1.https://ai.facebook.com/ https://ai.facebook.com/blog/
3.https://deepmind.com/blog https://deepai.org/definitions
5.https://www.malongtech.com/en/research.html
6.https://blogs.nvidia.com/blog/tag/artificial-intelligence/
https://ai.googleblog.com/2021/01/google-research-looking-back-at-2020.html?m=1
7.https://blog.tensorflow.org/
kdnuggets.com
https://www.kdnuggets.com/2020/01/top-10-ai-ml-articles-to-know.html
RESEARCH LABS IN THE WORLD
https://ai.facebook.com/ https://ai.googleblog.com/ https://research.google/ https://ai.google/research/
1.The Alan Turing Institute:https://www.turing.ac.uk/
2.J.P. Morgan AI Research Lab:https://www.jpmorgan.com/insights/tec...
3.Oxford ML Research Group:http://www.robots.ox.ac.uk/~parg/proj...
4.Microsoft Research Lab- AI:https://www.microsoft.com/en-us/resea...
5.Berkeley AI Research:https://bair.berkeley.edu/
6.LIVIA:https://en.etsmtl.ca/Unites-de-recher...
7.MIT Computer Science and Artificial Intelligence Laboratory (CSAIL):https://www.csail.mit.edu/
online competitions:
1.Kaggle-https://www.kaggle.com/
2.hackerearth-https://www.hackerearth.com/challenges/
3.machinehack-https://www.machinehack.com/
4.analyticsvidhya-https://datahack.analyticsvidhya.com/contest/all/
5.zindi-https://zindi.africa/competitions
6.crowdai-https://www.crowdai.org/
7.driven data-https://www.drivendata.org/
8.dockship-https://dockship.io/
9.SIGNATE Competition- https://signate.jp/about?rf=competition_about
9.International Data Analysis Olympiad (IDAO)
10.Codalab
11.Iron Viz
12.Data Science Challenges
13.Tianchi Big Data Competition
14.https://www.techgig.com/hackathon/ml_hackathon
Some useful content :
- H2O.ai AutoML, Google AutoML, Google ML Kit (https://developers.google.com/ml-kit), Azure Cognitive Services, Azure Machine Learning Service, Amazon ML, Azure Machine Learning Studio, Google Cloud Platform, GCP AutoML Vision, Weka, Microsoft Cognitive Toolkit, Google Cloud AutoML, DataRobot AutoML, Databricks AutoML, Azure ML, IBM Watson ML Studio, AWS SageMaker Studio, AWS Rekognition, Google AI Platform, Databricks, Domino Data Lab, Roboflow
https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-cheat-sheet
https://codegnan.com/blog/35-best-data-sciecne-tools-for-beginners-to-master/
mlkit-https://firebase.google.com/products/ml
- Tpot
auto_ml https://github.com/ClimbsRocks/auto_ml
- autopandas
- AutoGluon https://analyticsindiamag.com/how-to-automate-machine-learning-tasks-using-autogluon/
AutoGL: The First Ever AutoML Framework for Graph Datasets https://analyticsindiamag.com/meet-autogl-the-first-ever-automl-framework-for-graph-datasets/
- autosklearn,autokeras,LightAutoML (https://github.com/sberbank-ai-lab/LightAutoML)
AutoNeuro https://autoneuro.challenge-ineuron.in/
- autoviml
automate most of the data science https://github.com/Muhammad4hmed/GML
CodeLess https://pypi.org/project/codeless/ https://github.com/porky5191/codeless_demo_project
- autoViz
- hyperopt
- sweetviz (EDA purpose) - https://pypi.org/project/sweetviz/
- pandas-profiling (generates a full EDA report) - https://pypi.org/project/pandas-profiling/ https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/index.html
- autokeras,AutoSklearn,Neural Network Intelligence
FeatureTools automated feature engineering.
MLBox,Lightwood,mindsdb(machine learning models using SQL queries),mljar-supervised,Ludwig(deep learning models without the need to write code)
AdaNet is a lightweight TensorFlow-based framework
- pycaret- https://pycaret.org/
Machine Learning in Power BI using PyCaret https://www.kdnuggets.com/2020/05/machine-learning-power-bi-pycaret.html
mindsdb Machine Learning in 5 Lines of Code https://mindsdb.com/
automated feature engineering https://github.com/alteryx/featuretools
AutoML toolkit https://github.com/microsoft/nni
mljar-supervised Automates Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning https://github.com/mljar/mljar-supervised
MLBox is a powerful Automated Machine Learning python library https://github.com/AxeldeRomblay/MLBox
12.Auto_Timeseries by auto_ts
13.AutoNLP_Sentiment_Analysis by autoviml
14.automl lazypredict https://github.com/shankarpandala/lazypredict
AutoML Toolkit for Graph Datasets & Tasks AutoGL(Auto Graph Learning)https://medium.com/syncedreview/tsinghua-university-releases-first-automl-toolkit-for-graph-datasets-tasks-c61ea0261d78
AutoFeat-https://analyticsindiamag.com/guide-to-automatic-feature-engineering-using-autofeat/
15.bamboolib or pandas-ui or pandas-summary or pandas_visual_analysis or Dtale(get code also) (python package for easy data exploration & transformation)
Automating EDA using Pandas Profiling, Sweetviz and Autoviz,DataPrep,vaex,Datapane,Sweetviz,PandasGUI,Datatable,Dora,Pywedge,D-Tale,lux,Dabl,Pretty pandas,AWS Glue DataBrew,speedML,edaviz,Altair,voyager,Mito,Facets,KNIME
explainerdashboard https://towardsdatascience.com/the-quickest-way-to-build-dashboards-for-machine-learning-models-ec769825070d
Facets https://github.com/PAIR-code/facets https://towardsdatascience.com/visualize-your-data-with-facets-d11b085409bc
https://github.com/mstaniak/autoEDA-resources
ExploriPy import EDA-https://analyticsindiamag.com/hands-on-tutorial-on-exploripy-effortless-target-based-eda-tool/
Lens- Statistical Analysis of Data https://analyticsindiamag.com/hands-on-tutorial-on-lens-python-tool-for-swift-statistical-analysis/
Dashboard in Less Than 10 Lines of Code https://towardsdatascience.com/build-dashboards-in-less-than-10-lines-of-code-835e9abeae4b
MitoSheets https://analyticsindiamag.com/guide-to-mitosheets-harnessing-power-of-spreadsheets-in-python/
Datacleaner-https://analyticsindiamag.com/tutorial-on-datacleaner-python-tool-to-speed-up-data-cleaning-process/
Datacleaner: dora, Voilà - turns Jupyter Notebooks quickly into standalone web applications, Plotly Dash - for more advanced and production level dashboards
featurewiz(Select the best features from your data set fast with a single line of code) - https://github.com/AutoViML/featurewiz
explainerdashboard https://medium.com/analytics-vidhya/explainer-dashboard-build-interactive-dashboards-for-machine-learning-models-fda63e0eab9
Panel - web apps
Automating report generation with Jupyter Notebooks https://medium.com/applied-data-science/full-stack-data-scientist-5-automating-report-generation-with-jupyter-notebooks-919e32e88d18
Datapane ( Build Interactive Reports) https://towardsdatascience.com/introduction-to-datapane-a-python-library-to-build-interactive-reports-4593fd3cb9c8
pomegranate probabilistic modelling in Python https://github.com/jmschrei/pomegranate https://www.kdnuggets.com/2020/12/fast-intuitive-statistical-modeling-pomegranate.html
16.CUPY (array process parallel in gpu) https://pypi.org/project/cupy/
17.Dabl-automate the known 80% of Data Science which is data preprocessing, data cleaning, and feature engineering https://pypi.org/project/dabl/
18.dask (parallel computation) https://docs.dask.org/en/latest/ https://medium.com/rapids-ai/reading-larger-than-memory-csvs-with-rapids-and-dask-e6e27dfa6c0f#cid=av01_so-nvsh_en-us
thundergbm Fast GBDTs and Random Forests on GPUs https://github.com/Xtra-Computing/thundergbm
thundersvm https://github.com/Xtra-Computing/thundersvm
pandas chunksize,Modin , Vaex , Dask,cuDF,mars,ray,rapids,joblib,snorkel https://www.youtube.com/watch?v=eJyjB3cNIB0&feature=youtu.be
19.dataprep (Understand your data with a few lines of code in seconds)
data-preparation-tools - https://improvado.io/blog/data-preparation-tools
20.Dora library is another data analysis library designed to simplify exploratory data analysis. https://pypi.org/project/Dora/
21.FastAPI is a modern, fast (high-performance), web framework for building APIs. https://fastapi.tiangolo.com/
22.faster Hyper Parameter Tuning(sklearn-nature-inspired-algorithms) https://pypi.org/project/sklearn-nature-inspired-algorithms/
23.FlashText (A library faster than Regular Expressions for NLP tasks) https://pypi.org/project/flashtext/
24.Guietta (tool that makes simple GUIs simple) https://pypi.org/project/guietta/
pandas-visual-analysis -https://analyticsindiamag.com/hands-on-guide-to-pandas-visual-analysis-way-to-speed-up-data-visualization/
25.hummingbird (makes trained models execute faster) https://pypi.org/project/Hummingbird/ https://analyticsindiamag.com/guide-to-hummingbird-a-microsofts-library-for-expediting-traditional-machine-learning-models/
CUML- increase the speed of training your machine learning model https://towardsdatascience.com/train-your-machine-learning-model-150x-faster-with-cuml-69d0768a047a
https://docs.rapids.ai/api/cuml/stable/
26.memory-profiler (tell memory consumption line by line) https://pypi.org/project/memory-profiler/
Cython A Speed-Up Tool for your Python Function https://towardsdatascience.com/cython-a-speed-up-tool-for-your-python-function-9bab64364bfd
Python Tricks for Keeping Track of Your Data https://towardsdatascience.com/python-tricks-for-keeping-track-of-your-data-aef3dc817a4e
27.numexpr (increases the execution speed of NumPy expressions) https://github.com/pydata/numexpr
pypolars instead of pandas (beating-pandas-performance) https://www.youtube.com/watch?v=1-O_KnLZEso
50X speed up your Pandas apply function https://github.com/jmcarpenter2/swifter
JAX Autograd and XLA, facilitating high-performance machine learning research https://github.com/google/jax
Numba (optimise performance of numpy and high performance python compiler) http://numba.pydata.org/
28.pandarallel (simple and efficient tool to parallelize your pandas computation on all your CPUs) https://pypi.org/project/pandarallel/
29.PDFTableExtract(by PyPDF2) https://github.com/ashima/pdf-table-extract
Camelot-https://towardsdatascience.com/extracting-tabular-data-from-pdfs-made-easy-with-camelot-80c13967cc88
30.PyImpuyte(Python package that simplifies the task of imputing missing values in big datasets) https://pypi.org/project/PyImpuyte/
31.libra(Automates the end-to-end machine learning process in just one line of code) https://pypi.org/project/libra/
32.debug code with python -m pdb -c continue (followed by the script to run)
33.cURL (This is a useful tool for obtaining data from any server via a variety of protocols including HTTP.) https://stackabuse.com/using-curl-in-python-with-pycurl/
34.csvkit https://pypi.org/project/csvkit/
35.IPython IPython gives access to enhanced interactive python from the shell.
36.pip install faker (Create our own Dataset) https://pypi.org/project/Faker/
37.Python debugger %pdb
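A minimal sketch of post-mortem debugging with the standard-library pdb module. The functions `mean` and `safe_mean` are hypothetical examples, and the `debug` flag is an assumption added so the script also runs non-interactively.

```python
import pdb
import sys


def mean(values):
    """Average of a sequence; raises ZeroDivisionError when it is empty."""
    return sum(values) / len(values)


def safe_mean(values, debug=False):
    """Call mean(); with debug=True, open pdb at the failing frame first."""
    try:
        return mean(values)
    except ZeroDivisionError:
        if debug:
            # Interactive post-mortem session on the current traceback
            pdb.post_mortem(sys.exc_info()[2])
        raise


print(safe_mean([1, 2, 3]))
```

From a shell, `python -m pdb -c continue script.py` (the script name is hypothetical) runs the script and drops into the same kind of post-mortem prompt on an uncaught exception. In IPython, `%pdb on` enables this automatically.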
38.Voila-From notebooks to standalone web applications and dashboards https://voila.readthedocs.io/en/stable/ https://github.com/voila-dashboards/voila
39.tslearn for time series data https://github.com/tslearn-team/tslearn
40.texthero text-based dataset in Pandas Dataframe quickly and effortlessly https://github.com/jbesomi/texthero
41.kaleido (static image export for web-based visualization libraries, with zero dependencies) https://pypi.org/project/kaleido/
42.Vaex- Reading And Processing Huge Datasets in seconds https://github.com/vaexio/vaex
43.Uber's Ludwig is an Open Source Framework for Low-Code Machine Learning https://eng.uber.com/introducing-ludwig/
44.Google's TAPAS, a BERT-Based Model for Querying Tables Using Natural Language https://github.com/google-research/tapas
45.RAPIDS open GPU Data Science https://rapids.ai/
RAPIDS cuML
tick is a lightweight machine learning library https://x-datainitiative.github.io/tick/
PyBrain modular machine learning framework http://www.pybrain.org/docs/
Shogun machine learning framework; supports several programming languages, notably Python, R, Java, Scala, Ruby and Lua https://github.com/shogun-toolbox/shogun/
46.pyforest Lazy-import of all popular Python Data Science libraries. Stop writing the same imports over and over again. https://pypi.org/project/pyforest/0.1.1/
47.Modin Get faster Pandas with Modin https://github.com/modin-project/modin
48.Text2Code for Jupyter notebook - https://github.com/deepklarity/jupyter-text2code , https://towardsdatascience.com/data-analysis-made-easy-text2code-for-jupyter-notebook-5380e89bb493
49.Openrefine Tool-For Data Preprocessing Without Code https://analyticsindiamag.com/openrefine-tutorial-a-tool-for-data-preprocessing-without-code/
50.DeepSpeed, Microsoft's deep learning optimisation library- https://github.com/microsoft/DeepSpeed
51.4-pandas-tricks-https://towardsdatascience.com/4-pandas-tricks-that-most-people-dont-know-86a70a007993
52.tkinter to deploy machine learning model-https://analyticsindiamag.com/complete-tutorial-on-tkinter-to-deploy-machine-learning-model/
53.autoplotter is a python package for GUI based exploratory data analysis-https://github.com/ersaurabhverma/autoplotter
54.3 NLP Interpretability Tools For Debugging Language Models-https://www.topbots.com/nlp-interpretability-tools/
55.New Algorithm For Training Sparse Neural Networks (RigL)-https://analyticsindiamag.com/rigl-google-algorithm-neural-networks/
56.Read Data from pdf and Word-PyPDF2,PDFMiner,PDFQuery,tabula-py,pdflib for Python,PDFTables,PyFPDF2
OpenCV to Extract Information From Table Images-https://analyticsindiamag.com/how-to-use-opencv-to-extract-information-from-table-images/
57.Text Annotation-https://towardsdatascience.com/tortus-e4002d95134b
58.GDMix, A Framework That Trains Efficient Personalisation Models - https://analyticsindiamag.com/linkedin-open-sources-gdmix-a-framework-that-trains-efficient-personalisation-models/
59.Learn Machine Learning Concepts Interactively-https://towardsdatascience.com/learn-machine-learning-concepts-interactively-6c3f64518da2
60.Folium, Python Library For Geographical Data Visualization-https://analyticsindiamag.com/hands-on-tutorial-on-folium-python-library-for-geographical-data-visualization/
61.GPU Technology Conference (GTC) Keynote Oct 2020-https://www.youtube.com/watch?v=Dw4oet5f0dI&list=PLZHnYvH1qtOYOfzAj7JZFwqtabM5XPku1
62.jiant nlp task-https://github.com/nyu-mll/jiant
63.Paint (draw) your machine learning model with human-learn-https://koaning.github.io/human-learn/
64.Vector AI-https://github.com/vector-ai/vectorai
65.NVIDIA NeMo(for Conversational AI)-https://github.com/NVIDIA/NeMo
66.Deep Learning Models Without Coding(DeepCognition)-https://analyticsindiamag.com/how-to-use-deepcognition-to-build-drag-and-drop-deep-learning-models-without-coding/
67.100 Machine Learning Projects-https://medium.com/@amankharwal/100-machine-learning-projects-aff22b22dd6e
68.Question generation using Natural Language Processing-https://github.com/ramsrigouthamg/Questgen.ai
69.PixelLib(image segmentation,Blur Background,Gray Background,Background Colour Change,Background Change)-https://github.com/ayoolaolafenwa/PixelLib
70.High-Resolution 3D Human Digitization-https://shunsukesaito.github.io/PIFuHD/
71.AI model that translates 100 languages without relying on English data - https://ai.facebook.com/blog/introducing-many-to-many-multilingual-machine-translation/
72.800 free textbooks - https://open.umn.edu/opentextbooks
73.TensorDash is an application that lets you remotely monitor your deep learning model's metrics and notifies you when your model training is completed or crashed.
https://github.com/CleanPegasus/TensorDash
74.YellowBrick -select features, tune hyperparameters, select the best models, and understand the performance metrics.
75.Freely Available Python Books-https://rajukumarmishrablog.com/freely-available-python-books/
Collection of Python Cheat Sheets- https://rajukumarmishrablog.com/collection-of-python-cheat-sheets/
76.Add External Data to Your Pandas Dataframe - https://towardsdatascience.com/add-external-data-to-your-pandas-dataframe-with-a-one-liner-f060f80daaa4
https://www.openblender.io/#/welcome
77.visualize the model architecture-https://github.com/PerceptiLabs/PerceptiLabs
78.Train Conversational AI in 3 lines of code with NeMo and Lightning-https://towardsdatascience.com/train-conversational-ai-in-3-lines-of-code-with-nemo-and-lightning-a6088988ae37
79.Machine Learning for Healthcare by mit-https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-s897-machine-learning-for-healthcare-spring-2019/
80.pydot is an interface to Graphviz ,AutoGraph-Easy control flow for graphs,Neo4j-Graph Data Science Library,pyRDF2Vec-Representations of Entities in a Knowledge Graph,igraph,NetworkX,euler,pyvis
https://www.tensorflow.org/neural_structured_learning
AutoGL: The First Ever AutoML Framework for Graph Datasets https://analyticsindiamag.com/meet-autogl-the-first-ever-automl-framework-for-graph-datasets/
open-source project for analysis of graphs or networks GrasPy / graspologic https://graspy.neurodata.io/
https://www.kdnuggets.com/2019/05/60-useful-graph-visualization-libraries.html
81.HTML tables into Google Sheets -https://towardsdatascience.com/import-html-tables-into-google-sheets-effortlessly-f471eae58ac9
82.Gradio - take input from user https://gradio.app/getting_started
83.Mito, an editable spreadsheet inside your Jupyter Notebook - https://trymito.io/
84.Google Introduces Document AI (DocAI) https://www.marktechpost.com/2020/11/05/google-introduces-document-ai-docai-platform-for-automated-document-processing/
85.100 Machine Learning Projects-https://amankharwal.medium.com/100-machine-learning-projects-aff22b22dd6e
86.https://towardsdatascience.com/25-hot-new-data-tools-and-what-they-dont-do-31bf23bd8e56
87.Opacus: A high-speed library for training PyTorch models-https://ai.facebook.com/blog/introducing-opacus-a-high-speed-library-for-training-pytorch-models-with-differential-privacy
88.lazynlp https://github.com/chiphuyen/lazynlp
89.yfinance to get finance data
90.Pseudo-Labeling (deal with small datasets)https://towardsdatascience.com/pseudo-labeling-to-deal-with-small-datasets-what-why-how-fd6f903213af
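Pseudo-labeling in miniature: train on the labeled data, predict the unlabeled pool, keep only confident predictions as new labels, and retrain. The sketch below assumes a toy 1-D nearest-centroid model and a hypothetical confidence threshold of 0.8; the function names and data are made up for illustration, and a real project would more likely use a proper classifier's predicted probabilities.

```python
from statistics import mean


def fit_centroids(points, labels):
    """Toy model: one centroid (mean value) per class for 1-D points."""
    return {
        lab: mean(p for p, l in zip(points, labels) if l == lab)
        for lab in set(labels)
    }


def predict_with_confidence(centroids, x):
    """Return (nearest label, confidence); needs at least two classes."""
    dists = sorted((abs(x - c), lab) for lab, c in centroids.items())
    (d1, best), (d2, _) = dists[0], dists[1]
    # 1.0 when x sits on the nearest centroid, 0.5 when equidistant.
    confidence = 1.0 - d1 / (d1 + d2) if (d1 + d2) > 0 else 0.5
    return best, confidence


def pseudo_label(points, labels, unlabeled, threshold=0.8, max_rounds=5):
    """Repeatedly add confident predictions to the training set."""
    points, labels, pool = list(points), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(points, labels)
        confident, remaining = [], []
        for x in pool:
            lab, conf = predict_with_confidence(centroids, x)
            (confident if conf >= threshold else remaining).append((x, lab))
        if not confident:
            break  # nothing new to learn from
        for x, lab in confident:
            points.append(x)  # the pseudo-labeled point joins the training data
            labels.append(lab)
        pool = [x for x, _ in remaining]
    return fit_centroids(points, labels), pool


# Made-up data: two well-separated classes plus one ambiguous point (5.4).
model, still_unlabeled = pseudo_label(
    [1.0, 2.0, 9.0, 10.0], [0, 0, 1, 1], [1.5, 2.5, 5.4, 8.5, 9.5]
)
print(model)
print(still_unlabeled)
```

Points that never reach the threshold stay unlabeled instead of being forced into a class, which limits the noise that wrong pseudo-labels would add.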
91.Project List A - Comparatively Easy: Wine Quality Analysis,Boston Housing Prediction,Spam Email Classification,Survival Prediction - Titanic Disaster,Stock Market Prediction,Class of Flower Prediction,Bigmart Sales Prediction,Air Pollution Prediction,IMDB Prediction,Optimizing Product Price,Web Traffic Time Series Forecasting,Insurance Purchase Prediction,Tweet Classification
Project List B - Comparatively Difficult: Domain-Specific Chatbot,Fake News Detection,Human Action Recognition,Video Classification,Driver Drowsiness Detection,Medical Report Gen Using CT Scans,Sign Language Detection,Image Caption Generator,Celebrity Voice Prediction,Speech Emotion Recognition,Job Recommendation System,Interest Level in Rental Properties,Google Ads Keywords Generator
https://ml-showcase.paperspace.com/ https://github.com/ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code
https://dev.to/hb/30-machine-learning-ai-data-science-project-ideas-gf5
https://medium.com/the-innovation/130-machine-learning-projects-solved-and-explained-605d188fb392
https://thecleverprogrammer.com/machine-learning/ https://www.kdnuggets.com/2020/03/20-machine-learning-datasets-project-ideas.html
https://data-flair.training/blogs/machine-learning-datasets/# https://data-flair.training/blogs/machine-learning-project-ideas/
https://data-flair.training/blogs/artificial-intelligence-ai-tutorial/
https://data-flair.training/blogs/cartoonify-image-opencv-python/ https://data-flair.training/blogs/python-project-calorie-calculator-django/
https://www.theinsaneapp.com/2020/11/machine-learning-projects-with-source-codes.html https://www.theinsaneapp.com/2020/11/data-science-projects-with-source-code.html
https://medium.com/coders-camp/20-deep-learning-projects-with-python-3c56f7e6a721 https://amankharwal.medium.com/12-machine-learning-projects-on-object-detection-46b32adc3c37
https://amankharwal.medium.com/7-python-gui-projects-for-beginners-87ae2c695d78
https://amankharwal.medium.com/20-machine-learning-projects-for-portfolio-81e3dbd167b1 https://amankharwal.medium.com/4-chatbot-projects-with-python-5b32fd84af37
https://amankharwal.medium.com/30-python-projects-solved-and-explained-563fd7473003
https://www.aiquotient.app/projects https://www.aiquotient.app/ https://www.mltut.com/best-machine-learning-projects-for-beginners/
https://medium.com/coders-camp/20-machine-learning-projects-on-nlp-582effe73b9c
92.Visual Programming (Orange) https://orange.biolab.si/
93.The Linux Command Handbook-https://www.freecodecamp.org/news/the-linux-commands-handbook/
94.130 Machine Learning Projects Solved and Explained-https://medium.com/the-innovation/130-machine-learning-projects-solved-and-explained-605d188fb392
95.DataBrew-do drag-and-drop data cleansing
96.stratascratch- https://www.stratascratch.com/
97.5 ways to celebrate TensorFlow's 5th birthday-https://blog.google/technology/ai/5-ways-celebrate-tensorflows-5th-birthday/
98.TensorFlow.js: Machine Learning in Javascript https://blog.tensorflow.org/2018/03/introducing-tensorflowjs-machine-learning-javascript.html
99.Language Interpretability Tool open-source platform for visualization and understanding of NLP models - https://pair-code.github.io/lit/
100.Deep Learning Hardware Guide https://towardsdatascience.com/another-deep-learning-hardware-guide-73a4c35d3e86
101.johnsnowlabs- https://nlp.johnsnowlabs.com/ https://nlp.johnsnowlabs.com/docs/en/quickstart https://nlp.johnsnowlabs.com/docs/en/licensed_release_notes
103.Mito: edit a spreadsheet, generate Python https://trymito.io/?source=twitter1
104.Clarifai-https://www.clarifai.com/ https://analyticsindiamag.com/clarifai/
105.Top 10 DataRobot alternatives to rapidly build and deploy machine learning models https://analyticsindiamag.com/top-10-datarobot-alternatives-one-must-know/
106.Hive Data full-stack AI https://thehive.ai/hive-data
107.TensorGram, a real-time remote service that sends Keras callback updates, including metric details, to Telegram https://github.com/ksdkamesh99/TensorGram
108.Language Interpretability Tool - https://pair-code.github.io/lit/demos/
109.Docly will handle the comments http://thedocly.io/
110.machine-learning-roadmap-2020 https://whimsical.com/machine-learning-roadmap-2020-CA7f3ykvXpnJ9Az32vYXva
111.Django models https://www.deploymachinelearning.com/#create-django-models https://www.deploymachinelearning.com/
112.freecodecamp - https://www.freecodecamp.org/learn
113.image_to_string (pytesseract)
Extract Tables in PDFs to pandas DataFrames - tabula-py
114.NLP Pipelines in a single line of code https://medium.com/analytics-vidhya/nlp-pipelines-in-a-single-line-of-code-500b3266ac7b
115.Best and Worst Cases of Machine-Learning Models https://medium.com/towards-artificial-intelligence/best-and-worst-cases-of-machine-learning-models-part-1-36cdb9296611
https://www.youtube.com/watch?v=mlumJPFvooQ&list=PLZoTAELRMXVM0zN0cgJrfT6TK2ypCpQdY
116.aitextgen (for AI text generation)
117.http://introtodeeplearning.com/ http://cs231n.stanford.edu/ http://web.stanford.edu/class/cs224n/index.html#schedule https://www.youtube.com/playlist?list=PLkFD6_40KJIwhWJpGazJ9VSj9CFMkb79A https://www.youtube.com/playlist?list=PLwRJQ4m4UJjPiJP3691u-qWwPGVKzSlNP https://www.youtube.com/playlist?list=PLoROMvodv4rMC6zfYmnD7UG3LVvwaITY5
117.https://data-flair.training/blogs/data-science-tutorials-home
118.Integrating Tableau With Python https://analyticsindiamag.com/tabpy/
Qlib https://analyticsindiamag.com/qlib/
119.Pystiche - Create Your Artistic Image Using Pystiche https://analyticsindiamag.com/pystiche/ https://pystiche.readthedocs.io/en/latest/index.html
120.Low Light Image Enhancement using Python & Deep Learning https://github.com/soumik12345/MIRNet/ https://www.youtube.com/watch?v=b5Uz_c0JLMs
I will be so happy if this repository helps you. Thank you for reading.
HAPPY LEARNING