An interactive approach to understanding Machine Learning using scikit-learn
Updated Jun 17, 2024 - Jupyter Notebook
Developed a Windows tool using PyQt5, integrating K-means clustering for data analysis. The application recommends optimal cluster numbers, identifies cluster members, and allows exporting results to Excel.
CluStream, StreamKM++, and metrics utilities: C/C++ bindings for Python
The project performs clustering analysis (K-Means, hierarchical clustering, and visualization after PCA) to group stocks that share similar characteristics or have minimal correlation. A diversified portfolio tends to yield higher returns and face lower risk by tempering potential losses when the market is down.
Capstone Project for the IBM Professional Certificate on Coursera
PyTorch implementation of standard metrics for clustering
This case requires developing a customer segmentation to understand customers' behaviour and separate them into groups according to their preferences. Once the division is done, this information can be given to the marketing team so they can plan their strategy accordingly.
This repository contains introductory notebooks for principal component analysis.
Clustered customers into distinct groups based on similarity across demographic and geographic parameters. Applied PCA to discard insignificant and highly correlated variables. Determined the optimal number of clusters for the K-Means algorithm. Used Euclidean distance as the measure between centroids.
K-means is a least-squares optimization problem, and so is PCA: k-means seeks the least-squares partition of the data, while PCA finds the least-squares cluster membership vector.
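For reference, the two least-squares objectives named above can be written as follows. This is a sketch in assumed notation, none of which comes from the repository itself: the x_i are the n data points, C_1, ..., C_K are the clusters with centroids \mu_k, \bar{x} is the overall mean, and W is a d-by-q matrix with orthonormal columns spanning the principal subspace.

```latex
% k-means: choose the partition that minimizes within-cluster squared error
\min_{C_1,\dots,C_K} \; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad \mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i

% PCA: choose the q-dimensional subspace that minimizes squared reconstruction error
\min_{W^\top W = I_q} \; \sum_{i=1}^{n} \bigl\lVert (x_i - \bar{x}) - W W^\top (x_i - \bar{x}) \bigr\rVert^2
```

Both problems minimize a sum of squared Euclidean residuals; they differ in whether each point is approximated by a discrete centroid or by a projection onto a continuous subspace.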
Performs clustering (hierarchical, K-Means, and DBSCAN) on the airlines and crime data to obtain the optimum number of clusters, and draws inferences from the clusters obtained.
This project focuses on predicting customer churn in an e-commerce setting using machine learning techniques.
It's the HAC algorithm that I'm using to sort newspaper articles by news. You can adapt it to pretty much any type of text.
A customer profiling project based on RFM (Recency, Frequency, Monetary) analysis using a dataset from an online retail company in the United Kingdom. The aim is to identify customer habits and create personalized marketing strategies for targeted advertising.
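As a rough illustration of what an RFM table contains, the sketch below derives the three values per customer from a list of transactions. The function name `rfm_table`, the tuple layout, the snapshot date, and the sample rows are all hypothetical and are not taken from the project or its dataset. Frequency is counted here per transaction row; a real analysis would more likely count distinct invoices.

```python
from datetime import date


def rfm_table(transactions, snapshot):
    """Return {customer: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    snapshot: the reference date from which recency is measured.
    """
    summary = {}
    for customer, day, amount in transactions:
        last, freq, total = summary.get(customer, (day, 0, 0.0))
        # Track the most recent purchase, the purchase count, and total spend.
        summary[customer] = (max(last, day), freq + 1, total + amount)
    return {
        customer: ((snapshot - last).days, freq, total)
        for customer, (last, freq, total) in summary.items()
    }


# Made-up sample rows, for illustration only.
transactions = [
    ("A", date(2011, 12, 1), 20.0),
    ("A", date(2011, 12, 8), 30.0),
    ("B", date(2011, 11, 1), 5.0),
]
table = rfm_table(transactions, snapshot=date(2011, 12, 10))
print(table)
```

The resulting triples are the kind of features that would typically be scaled and then passed to a clustering step.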
Performs customer segmentation using an unsupervised learning model in Python
This repo explores the effectiveness of KMeans and Agglomerative Clustering in simplifying large datasets for ML. Goals include downloading the dataset, finding optimal clusters via the Elbow and Silhouette methods, comparing clustering techniques, validating optimal clusters, and tuning hyperparameters. Detailed explanations and analysis are provided.
OptimalCluster is a Python implementation of various algorithms to find the optimal number of clusters. The algorithms include elbow, elbow-k_factor, silhouette, gap statistics, gap statistics with standard error, and gap statistics without log. Various types of visualizations are also supported.
This project aims to analyze a transnational dataset from a UK-based online retail company and identify major customer segments. By categorizing customers into distinct groups based on their characteristics, businesses can gain valuable insights and tailor their strategies to better serve each segment.
Best Clustering using silhouette_score
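The idea of picking the best clustering by silhouette score can be sketched without any dependencies. The code below is a pure-Python illustration of the standard definition, s = (b - a) / max(a, b) averaged over all points, and is not the repository's code; scikit-learn's `silhouette_score(X, labels)` is the function one would normally call and should give the same mean for Euclidean distance. The helper `silhouette`, the toy points, and the candidate labelings are assumptions made for this example.

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes s = 0
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself does not change the sum)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy data: two visibly separate groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical candidate labelings, keyed by their number of clusters.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

scores = {k: silhouette(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

In practice the candidate labelings would come from fitting a clustering model once per value of k, and the k with the highest mean silhouette would be kept.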
Unsupervised Machine Learning project for Netflix Movies and TV Shows Clustering. The main goal of this project is to create a content-based recommender system that recommends top 10 shows to users based on their viewing history.