This software performs feature selection for the data-driven newsvendor problem.
This method is proposed in:
Breno Serrano, Stefan Minner, Maximilian Schiffer, and Thibaut Vidal (2024). Bilevel optimization for feature selection in the data-driven newsvendor problem. European Journal of Operational Research, 315(2), 703-714. https://doi.org/10.1016/j.ejor.2024.01.025
This repository provides all components required to reproduce the results from the paper, including the data and code for solving the bilevel optimization model and all benchmark methods based on regularization.
The solver is implemented in the programming language C++ and requires CPLEX (version 20.1) to solve the mixed intereger linear programming formulations. The code requires Python (version 3.10) for generating the instances. The software was developed and tested on Linux. Compilation uses make and requires g++ to be installed.
Compilation in Windows is also possible and requires CMake to be installed.
-
Linux: The file
src/Makefile
is provided for compiling the code, which assumes that CPLEX is installed in the directory/opt/ibm/
. To compile the code, change tosrc
directory and runmake
. -
Windows: The files
CMakeLists.txt
andFindCPLEX.cmake
are provided for compiling the code with CMake.
To create training and testing instances, create a Python virtual environment and install dependencies with pip install -r requirements.txt
.
Then, activate the environment and run python generate_instances.py
to create all instance files in the current working directory.
All parameters are passed by CLI arguments to the executable file newsvendor-features
using:
./newsvendor-features problem_type instance_path backorder_cost holding_cost [--split train_val_split] [--split_size subset_samples] [--lambda regularization_param] [--folds k] [--breakpts n_bpts] [-t time_limit] [--threads nb_threads] [-o out_path]
The following CLI arguments are available:
Argument | Inputs |
---|---|
problem_type | Method used to perform feature selection (see Table 2 for a description of valid input values) |
instance_path | Path to instance file |
backorder_cost | Back-ordering cost |
holding_cost | Holding cost |
--split | Number of training-validation splits |
--split_size | Size of training-validation split |
--lambda | Regularization parameter |
--folds | Number of partitions for cross-validation |
--breakpts | Number of breakpoints for grid search cross-validation |
-t | Time limit (in seconds) |
--threads | Number of threads to be used by CPLEX |
-o | Path to save output file |
Code | Method |
---|---|
BL | Bilevel Feature Selection (BFS) |
BL_SS | BFS with Shuffle & Split cross-validation |
ERM | Empirical Risk Minimization (without regularization) |
ERM_l0 | ERM with l0-norm regularization |
ERM_l1 | ERM with l1-norm regularization |
GS_ERM_l0 | ERM_l0 with grid search over regularization parameter values |
GS_ERM_l1 | ERM_l1 with grid search over regularization parameter values |
GS_SS_ERM_l0 | GS_ERM_l0 with Shuffle & Split cross-validation |
GS_SS_ERM_l1 | GS_ERM_l1 with Shuffle & Split cross-validation |
As an example, the following command reads the specified file from the data
directory and solves the corresponding instance using the Bilevel Feature Selection (BFS) method:
./newsvendor-features BL ../data/n300-m8-linear-l1-s6.train 1 1