Forecasting models for scooter demand can employ various approaches, such as time series models, or simple regression when a linear relationship between variables exists. When historical data is available and its distribution is known, Bayesian models offer a robust method for estimating future demand; this assumes no major shift in the data distribution due to factors such as new policies promoting scooter usage, which could significantly increase user numbers. Ensemble techniques, including XGBoost (XGB), LightGBM (LGBM), and CatBoost, can predict potential users from factors such as date, season, year, month, hour, holiday, weekday, working day, weather, temperature, perceived temperature, humidity, wind speed, and existing users. In Table 7, we summarize some of the methods that have been used for bike and scooter demand prediction. Scooters are gaining popularity as an alternative transportation mode because of their convenience, affordability, and eco-friendliness, and predicting user numbers is essential for forecasting demand as usage continues to rise. Based on the existing literature and the available data on bike-sharing and scooter systems, our initial approach was to examine the linearity of the dataset. We then opted for a more complex model capable of handling the intricate structure and time series features highlighted by recent studies. Given the time constraints and the reasons outlined below, we selected a deep learning model for our analysis.
Our choice of a deep learning model over Long Short-Term Memory (LSTM) and graph-based neural networks was influenced by several factors [18]. First, deep learning models have demonstrated remarkable success in capturing non-linear relationships and intricate patterns within large datasets. Second, these models exhibit a higher degree of flexibility, allowing them to adapt to various types of data and problems, whereas LSTM and graph-based neural networks are more specialized architectures. Third, the computational efficiency of deep learning models is generally superior, especially when dealing with complex datasets, making them a more practical option within the constraints of our study. In conclusion, our decision to employ a deep learning model for the analysis of bike-sharing and scooter data was guided by its ability to effectively capture complex relationships. Due to time constraints, comparing our results against graph neural networks and LSTM remains a direction for future work.
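To illustrate the kind of non-linear relationship a feed-forward deep learning model can capture, the following is a minimal sketch on synthetic data. It uses scikit-learn's `MLPRegressor` as a stand-in for the (unspecified) deep learning architecture used in the study; the target function and network sizes are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(-1, 1, (n, 3))
# A non-linear target (sine plus an interaction term) that a plain
# linear model could not represent, but a small network can.
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2]

# Two hidden layers of 64 units; scaling inputs helps convergence.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
)
model.fit(X, y)
r2 = model.score(X, y)  # training R^2 on the synthetic target
```

The same pattern (several dense layers fit to tabular demand features) extends directly to larger frameworks such as PyTorch or Keras when GPU training is needed.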
Based on the features available, time series models and regression models may be good options. Time series models can capture patterns over time, which matters if user behavior varies by season or time of day, while regression models can incorporate multiple variables, which is useful when external factors (e.g., weather, holidays) affect user behavior. Each model has its own strengths and weaknesses, however, and the best choice depends on the specific context and data available. To create a robust comparative framework, we considered other machine learning models, including linear regression, XGBoost (Extreme Gradient Boosting), LightGBM (Light Gradient Boosting Machine), CatBoost (Categorical Boosting), and a Random Forest regressor. XGBoost, LightGBM, and CatBoost belong to the gradient boosting family of machine learning models, which falls under the broader category of ensemble learning methods that specifically utilize the boosting technique. These models train weak learners, such as decision trees, sequentially and combine their predictions to form a strong and precise predictive model [11]. By employing rapid learning and parallel processing, XGB and LGBM can provide accurate predictions in a short period, even for complicated problems. Additionally, LGBM's leaf-wise tree growth strategy and multi-threaded optimization can effectively reduce model complexity, thus reducing overfitting.
CatBoost is another powerful ensemble learning algorithm that can be used for classification, regression, and ordinal variable prediction problems. This technique utilizes Bayesian estimators to avoid overfitting and applies symmetric (oblivious) tree structures in the splitting process, so that all nodes at the same depth use the same split. Moreover, CatBoost uses efficient target-based statistics to model categorical input variables directly, reducing running time. Based on previous studies, CatBoost has shown superior accuracy compared to conventional machine learning techniques such as Gaussian Naïve Bayes, Decision Tree Classifier, Multi-layered Perceptron, Gradient Boosting Classifier, AdaBoost, Long Short-Term Memory, Seq2seq, and Random Forest.
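The target-based statistics that CatBoost applies to categorical features can be sketched in a few lines. The sketch below implements an *ordered* target encoding in plain NumPy: each row's encoding uses only the targets of earlier rows with the same category, smoothed toward a global prior, which is the mechanism CatBoost uses to avoid target leakage and overfitting. The function name and `prior_weight` parameter are illustrative, not CatBoost API.

```python
import numpy as np

def ordered_target_encoding(categories, target, prior_weight=1.0):
    """Encode a categorical column with ordered target statistics.

    Row i is encoded using only rows 0..i-1 of the same category,
    blended with the global mean (the prior), so the encoding of a
    row never sees its own target value.
    """
    prior = target.mean()
    sums, counts = {}, {}
    encoded = np.empty(len(target))
    for i, (c, y) in enumerate(zip(categories, target)):
        s = sums.get(c, 0.0)
        n = counts.get(c, 0)
        encoded[i] = (s + prior_weight * prior) / (n + prior_weight)
        sums[c] = s + y          # update running statistics
        counts[c] = n + 1        # *after* encoding row i
    return encoded

# Example: a season-like categorical column against demand counts.
cats = np.array(["winter", "winter", "summer", "winter", "summer"])
y = np.array([10.0, 20.0, 50.0, 30.0, 40.0])
enc = ordered_target_encoding(cats, y)
```

The first occurrence of each category falls back to the global mean (30.0 here), and later occurrences progressively reflect that category's own history.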
Patil (2015) conducted a comprehensive study on the application of the Random Forests algorithm for bike-sharing demand prediction. The authors used a dataset containing various parameters, such as weather conditions, time, day, and user information, to build the model. The results showed that the Random Forests algorithm outperformed other traditional machine learning algorithms in terms of prediction accuracy.
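The advantage a Random Forest holds over a simple linear baseline on demand data can be reproduced on synthetic data. The sketch below, using scikit-learn, compares cross-validated R² scores when demand spikes at commute hours, a pattern a linear model cannot express; the features and demand formula are illustrative assumptions, not Patil's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 1500
hour = rng.integers(0, 24, n)
temp = rng.uniform(0, 35, n)
holiday = rng.integers(0, 2, n)
# Demand jumps at commute hours (8, 17, 18): a non-linear effect of
# `hour` that a straight line through hour-of-day cannot capture.
demand = (
    20
    + 60 * np.isin(hour, [8, 17, 18]).astype(float)
    + 1.5 * temp
    - 15 * holiday
    + rng.normal(0, 5, n)
)
X = np.column_stack([hour, temp, holiday])

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf_r2 = cross_val_score(rf, X, demand, cv=3).mean()
lin_r2 = cross_val_score(LinearRegression(), X, demand, cv=3).mean()
```

The forest's tree splits isolate the peak hours and recover most of the variance, while the linear baseline only captures the temperature and holiday trends, mirroring the accuracy gap reported in such comparisons.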