WallAlec/BikeSharingDataset

Bike Sharing Demand Prediction

This project predicts hourly bike rental demand using the Bike Sharing dataset from the UCI Machine Learning Repository. Open the Python notebook to view the results.

Dataset

The dataset contains hourly rental data spanning two years. The training set contains data from the first 19 days of each month, while the test set contains data from the remaining days.
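The day-of-month split described above can be sketched as follows. This is an illustration on synthetic timestamps, not the project's loading code; the names `df`, `train_df`, and `test_df` are hypothetical.

```python
import pandas as pd

# Synthetic hourly timestamps standing in for the real data (two months only).
hours = pd.date_range(
    "2011-01-01 00:00", "2011-02-28 23:00", freq=pd.Timedelta(hours=1)
)
df = pd.DataFrame({"datetime": hours})

# Days 1-19 of each month form the training set; day 20 onward forms the test set.
is_train = df["datetime"].dt.day <= 19
train_df = df[is_train]
test_df = df[~is_train]
```

Because the split is by calendar day rather than at random, every test row falls later in its month than every training row from that month.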

The dataset contains the following features:

  • datetime: hourly date + timestamp
  • season: 1 = spring, 2 = summer, 3 = fall, 4 = winter
  • holiday: whether the day is considered a holiday
  • workingday: whether the day is neither a weekend nor a holiday
  • weather:
    • 1: Clear, Few clouds, Partly cloudy
    • 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    • 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    • 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
  • temp: temperature in Celsius
  • atemp: "feels like" temperature in Celsius
  • humidity: relative humidity
  • windspeed: wind speed

The target variable is:

  • count: number of total rentals
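Since `season` is stored as an integer code, it can help to decode it into labels before plotting. The sketch below assumes the coding listed above; the `sample` frame and its `count` values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Code-to-label mapping copied from the feature list above.
SEASON_LABELS = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}

# Invented rows; only the column names follow the dataset description.
sample = pd.DataFrame(
    {"season": [1, 2, 3, 4, 1], "count": [16, 40, 32, 13, 1]}
)

# A categorical column makes it explicit that season is a label, not a quantity.
sample["season_name"] = sample["season"].map(SEASON_LABELS).astype("category")
```

The same pattern would apply to the `weather` codes.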

Methodology

  1. Data exploration and visualization to understand the relationships between different features and the target variable.
  2. Feature engineering: create new features such as year, month, hour, weekday, day of the month, and rush hour.
  3. Data preprocessing: split the dataset into training and test sets.
  4. Model selection: use the Random Forest Regressor for prediction.
  5. Hyperparameter tuning: perform an exhaustive search over the specified parameter grid using GridSearchCV to find the best parameters for the Random Forest Regressor model.
  6. Model evaluation: measure the performance of the model using the Root Mean Squared Log Error (RMSLE) metric.
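The feature engineering in step 2 might look like the sketch below. The helper name `add_time_features` is hypothetical, and the rush-hour rule (hours 7-9 and 17-19 on working days) is an assumption made for illustration; the notebook may define it differently.

```python
import pandas as pd


def add_time_features(df):
    """Return a copy of df with calendar features derived from `datetime`."""
    out = df.copy()
    ts = pd.to_datetime(out["datetime"])
    out["year"] = ts.dt.year
    out["month"] = ts.dt.month
    out["day"] = ts.dt.day          # day of the month
    out["hour"] = ts.dt.hour
    out["weekday"] = ts.dt.weekday  # Monday = 0, Sunday = 6
    # Assumed rule: commuting hours on working days count as rush hour.
    commute = ts.dt.hour.isin([7, 8, 9, 17, 18, 19])
    out["rush_hour"] = (commute & (out["workingday"] == 1)).astype(int)
    return out


# Three invented rows: a Monday morning, a Saturday morning, a Monday afternoon.
demo = pd.DataFrame(
    {
        "datetime": [
            "2011-01-03 08:00:00",
            "2011-01-08 08:00:00",
            "2011-01-03 13:00:00",
        ],
        "workingday": [1, 0, 1],
    }
)
features = add_time_features(demo)
```

Only the first row is flagged as rush hour under this rule: the second falls on a weekend and the third falls outside the commuting hours.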

Dependencies

  • Python 3.8+
  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn

Usage

  1. Clone this repository.
  2. Install the required dependencies.
  3. Run the Jupyter Notebook or Python script containing the data analysis and prediction steps.

Results

The Random Forest Regressor model with the best hyperparameters found by GridSearchCV was used to predict the bike rental demand on the test dataset. The model's performance was evaluated using the RMSLE metric. Further improvements can be made by exploring different machine learning algorithms or by engineering additional features.
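For reference, RMSLE is the square root of the mean squared difference between `log(1 + prediction)` and `log(1 + actual)`. The sketch below runs a small grid search and scores it with that metric on synthetic data. The parameter grid, the data, and the choice to fit on `log1p(count)` are assumptions for illustration, not the settings used in the notebook, and the resulting score has no relation to the project's reported result.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def rmsle(y_true, y_pred):
    """Root Mean Squared Log Error for non-negative targets and predictions."""
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


# Synthetic stand-in for the real features: hour of day and a working-day flag.
rng = np.random.default_rng(0)
hour = rng.integers(0, 24, size=400)
workingday = rng.integers(0, 2, size=400)
X = np.column_stack([hour, workingday])
noise = rng.normal(0, 5, size=400)
y = np.round(50 + 40 * np.sin(hour / 24 * 2 * np.pi) + 20 * workingday + noise)
y = y.clip(min=0)

X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# Hypothetical grid, kept small so the example runs quickly.
param_grid = {"n_estimators": [20, 50], "max_depth": [4, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=3,
)

# Fitting on log1p(count) makes RMSE on the transformed target equal to RMSLE.
search.fit(X_train, np.log1p(y_train))
pred = np.expm1(search.predict(X_test))
score = rmsle(y_test, pred)
```

After fitting, `search.best_params_` holds the selected combination and `score` holds the held-out RMSLE for this toy problem.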

About

We beat the Kaggle record with an RMSLE of 0.3318.
