Vector GeoData & K-Means clustering

Course: Advanced Geocomputing | UW-Madison
Instructor: Song Gao
In this repository, I performed four tasks: (1) math computation for geometries, (2) GeoData operations, (3) customized K-means clustering, and (4) clustering in scikit-learn. The project works with GeoData (geographic data), so coordinates are given as latitudes and longitudes, which correspond to the y-coordinates and x-coordinates, respectively, in Cartesian coordinates.

Task 1: Math computation for geometries

In this task, I performed general mathematical computation to calculate/determine the centroid of a polygon. The procedure contains 3 steps:
Step 1. Set up classes of Point, Polyline, and Polygon.
Step 2. Set up functions (methods) in each class.
Step 3. Demonstrate the functionality of these classes at the end.

The method in the Polyline class allows users to calculate the length. The methods in the Polygon class allow users to
(1) calculate the area
(2) get the coordinates of the centroid of a polygon
(3) determine the four boundaries of the minimum envelope, also known as the bounding box.
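
The class layout described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the method names (`length`, `area`, `centroid`, `bounding_box`) are assumptions, the area uses the shoelace formula, and the polygon is assumed to be simple (non-self-intersecting) with planar x/y coordinates.

```python
import math


class Point:
    def __init__(self, x, y):
        self.x = x  # longitude in the GeoData case
        self.y = y  # latitude in the GeoData case


class Polyline:
    def __init__(self, points):
        self.points = list(points)

    def length(self):
        # Sum of straight-line distances between consecutive vertices.
        return sum(
            math.hypot(b.x - a.x, b.y - a.y)
            for a, b in zip(self.points, self.points[1:])
        )


class Polygon:
    def __init__(self, points):
        # Vertices in order; the ring is closed implicitly.
        self.points = list(points)

    def _edges(self):
        pts = self.points
        return list(zip(pts, pts[1:] + pts[:1]))

    def _signed_area(self):
        # Shoelace formula; positive for counter-clockwise vertex order.
        return 0.5 * sum(a.x * b.y - b.x * a.y for a, b in self._edges())

    def area(self):
        return abs(self._signed_area())

    def centroid(self):
        # Area-weighted centroid of a simple polygon.
        factor = 6.0 * self._signed_area()
        cx = sum((a.x + b.x) * (a.x * b.y - b.x * a.y) for a, b in self._edges())
        cy = sum((a.y + b.y) * (a.x * b.y - b.x * a.y) for a, b in self._edges())
        return Point(cx / factor, cy / factor)

    def bounding_box(self):
        # (min_x, min_y, max_x, max_y) of the minimum envelope.
        xs = [p.x for p in self.points]
        ys = [p.y for p in self.points]
        return (min(xs), min(ys), max(xs), max(ys))


# Demonstration with a 4 x 3 rectangle and a two-vertex line.
rect = Polygon([Point(0, 0), Point(4, 0), Point(4, 3), Point(0, 3)])
line = Polyline([Point(0, 0), Point(3, 4)])
center = rect.centroid()
print(line.length(), rect.area(), (center.x, center.y), rect.bounding_box())
```

Keeping the signed area in a private helper lets the centroid formula work for both clockwise and counter-clockwise vertex order.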

Task 2: Geometry Operation and Geodata Structure

In this task, I use the pandas, GeoPandas, json, and numpy libraries to read and load data retrieved from Flickr. I then use shapely along with pylab to visualize the data locations and create a convex hull for the data in each city. The convex hull visualizes the minimum geographic extent of a cluster. Here the data is grouped by location, that is, by the city each record belongs to, rather than by a clustering algorithm.
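
To illustrate what a per-city convex hull computes, here is a dependency-free sketch. It does not use shapely as the notebook does; it swaps in Andrew's monotone chain algorithm in plain Python. The city labels and coordinates are made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict


def cross(o, a, b):
    # Z-component of the cross product (a - o) x (b - o).
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    # Andrew's monotone chain; returns hull vertices counter-clockwise.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hulls_by_city(records):
    # Group (city, x, y) records by city, then build one hull per group.
    groups = defaultdict(list)
    for city, x, y in records:
        groups[city].append((x, y))
    return {city: convex_hull(pts) for city, pts in groups.items()}


# Hypothetical records: (city label, x = longitude, y = latitude).
records = [
    ("city_a", 0.0, 0.0), ("city_a", 2.0, 0.0), ("city_a", 2.0, 2.0),
    ("city_a", 0.0, 2.0), ("city_a", 1.0, 1.0),  # interior point, not on hull
    ("city_b", 10.0, 10.0), ("city_b", 12.0, 10.0), ("city_b", 11.0, 12.0),
]
hulls = hulls_by_city(records)
print(hulls["city_a"])
```

With shapely, the same result for one group is available through the `convex_hull` property of a `MultiPoint` geometry.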

Task 3: Customized K-means clustering

In this task, I implement a K-means clustering function from scratch. Unlike the K-means implementations available in other libraries, mine takes a GeoDataFrame as input. This makes it easier for geodata users to process their data, and it also demonstrates my ability to create and customize functions as needed.
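
The core loop of K-means (Lloyd's algorithm) can be sketched as below. This is not the notebook's function: it takes plain (x, y) tuples rather than a GeoDataFrame, uses the first k points as initial centroids (a naive, assumed initialization), and measures straight-line distance, which is only an approximation for latitude/longitude data.

```python
import math


def kmeans(points, k, max_iter=100):
    # Naive initialization (assumption): the first k points are the centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Two well-separated groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(pts, k=2)
print(labels, centroids)
```

To adapt a sketch like this to a GeoDataFrame, the point coordinates would first be extracted from the geometry column and the labels written back as a new column.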

Task 4: Clustering in scikit-learn

This task demonstrates the use of a machine learning toolkit to perform different clustering methods. You can also compare the K-means result from scikit-learn with my K-means method from the previous task to check whether they produce nearly the same results. In addition, I compare the result of DBSCAN with K-means to determine which is the better way to classify the data in my case.
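
A comparison of this kind can be sketched with scikit-learn as follows. The data points and the parameter values (`n_clusters=2`, `eps=1.5`, `min_samples=2`) are made up for illustration and are not taken from the notebook.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Hypothetical planar points: two tight groups plus one isolated point.
X = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [50, 50]],
    dtype=float,
)

# K-means has no noise label, so every point, including the isolated
# one, receives one of the k cluster labels.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN groups points by density and labels points that belong to no
# dense region as noise (-1).
db_labels = DBSCAN(eps=1.5, min_samples=2).fit_predict(X)

print("K-means:", km_labels.tolist())
print("DBSCAN: ", db_labels.tolist())
```

K-means cluster numbers are arbitrary, so two K-means results are best compared by which points share a label rather than by the label values themselves.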
