Follow the tutorial: TUTORIAL
This project is built in Motoko and is aimed at processing student data, calculating distances between data points based on specific student attributes, and making predictions using the KNN (k-Nearest Neighbors) algorithm.
K-Nearest Neighbors (KNN) is a simple yet powerful machine learning algorithm used for classification and regression tasks. It finds the 'k' nearest data points (neighbors) to a new data point, then either classifies the new point by the majority label of those neighbors or predicts its value as the average of their values. KNN is a non-parametric method, meaning it makes no assumptions about the underlying data distribution. However, it can be computationally expensive on large datasets, because each prediction requires calculating the distance from the new data point to every stored data point.
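The classification case described above can be sketched in a few lines. This is a minimal illustration in Python, not the project's Motoko implementation; the function name `knn_predict`, the training points, and the choice of Euclidean distance are all assumptions made for the example.

```python
from collections import Counter
from math import dist  # Euclidean distance, available since Python 3.8


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs; features and query are
    equal-length tuples of numbers.
    """
    # Sort every stored point by its distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (studytime, failures) -> pursues higher education?
train = [
    ((1.0, 0.0), "no"),
    ((1.0, 1.0), "no"),
    ((3.0, 0.0), "yes"),
    ((4.0, 0.0), "yes"),
    ((4.0, 1.0), "yes"),
]

print(knn_predict(train, (3.5, 0.0), k=3))
```

Note that the sort touches every training point on each call, which is the prediction-time cost mentioned above.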
- Calculate Distances: Calculate the distance between a set of student records and input records using attributes such as absences, study time, failures, and higher education status.
- Fetch Raw Data: Retrieve raw JSON data for analysis.
- KNN Prediction: Make predictions based on input data and pre-existing student records using KNN.
- Clone the repository:
- Deploy the Canisters: Use the command below to deploy the backend:
  make deploy
- Test the Application: You can run the provided tests with:
  ./dm_core_test.sh
To fetch raw data, navigate to the Candid UI. You can select a JSON file (like the one available in the data/ folder) and load its contents:
- Go to the fetch_raw_data function.
- Choose a JSON file in the data directory (e.g., kirito.json).
- Select the Raw view and copy the link.
- Click Call to fetch the data.
Use calculateAllDistancesMock to compute the distances between student records. Enter values for absences, failures, higher, and studytime, then click Call.
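As a rough illustration of what a distance over these four attributes can look like, here is a Python sketch. It is not the code behind calculateAllDistancesMock: the Euclidean metric, the yes/no to 1/0 encoding, the helper names, and the sample records are all assumptions.

```python
from math import sqrt

# The four attributes named above, in a fixed order.
FIELDS = ("absences", "failures", "higher", "studytime")


def encode(value):
    """Map "yes"/"no" to 1.0/0.0 and pass numbers through (assumed encoding)."""
    if isinstance(value, str):
        return 1.0 if value.strip().lower() == "yes" else 0.0
    return float(value)


def student_distance(a, b):
    """Euclidean distance between two student records over FIELDS."""
    return sqrt(sum((encode(a[f]) - encode(b[f])) ** 2 for f in FIELDS))


def all_distances(records, query):
    """Distance from the query record to each stored record, in order."""
    return [student_distance(r, query) for r in records]


# Hypothetical stored records and input record.
records = [
    {"absences": 2, "failures": 0, "higher": "yes", "studytime": 1},
    {"absences": 6, "failures": 3, "higher": "no", "studytime": 1},
]
query = {"absences": 2, "failures": 0, "higher": "yes", "studytime": 4}

print(all_distances(records, query))
```

Because the attributes are on different scales (absences can be much larger than studytime), an unscaled distance like this one lets the wide-ranging attributes dominate; normalizing each attribute first is a common remedy.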
Use the predictHigher function to predict the chances of a student pursuing higher education based on their study habits and failures.
The data/ directory contains multiple example student records in JSON format that you can use to fetch raw data and test the algorithms.
Example of kirito.json:
{
"studytime": 1,
"higher": "yes",
"absences": 2,
"failures": "no"
}
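A record like this has to be turned into numbers before any distance can be computed. The Python sketch below shows one way to do that; it is only an illustration, and the `to_features` helper, the attribute order, and the yes/no to 1/0 mapping are assumptions, not the project's actual parsing logic.

```python
import json

# The kirito.json record shown above, embedded as a string for the example.
raw = """
{
  "studytime": 1,
  "higher": "yes",
  "absences": 2,
  "failures": "no"
}
"""

# Assumed attribute order for the feature vector.
FIELDS = ("absences", "failures", "higher", "studytime")


def to_features(record):
    """Convert a student record into a list of floats.

    "yes"/"no" strings become 1.0/0.0 (an assumed encoding); numbers pass through.
    """
    features = []
    for field in FIELDS:
        value = record[field]
        if isinstance(value, str):
            features.append(1.0 if value.strip().lower() == "yes" else 0.0)
        else:
            features.append(float(value))
    return features


record = json.loads(raw)
print(to_features(record))
```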