SNESet comprises a total of 9 million clean records of QoS and QoE telemetry metrics from 8 video streaming applications (VSAs) over four months, covering end-users across 798 edge sites, 30 cities, and 3 ISPs in one country.
We provide the artifact for our SIGMOD'24 paper, including:
- Setup
- Datasets (SNESet and all datasets for comparison)
- Characterization & Comparison
- Benchmark (Experiments for regression)
Required software dependencies are listed below:
catboost==1.1.1
matplotlib==3.5.0
numpy==1.19.2
pandas==1.1.3
seaborn==0.11.0
scikit-learn==1.1.3
scipy==1.5.2
statsmodels==0.12.2
python==3.8.5
pytorch==1.8.1
xgboost==1.7.1
For the GPU version of LightGBM, please refer to this link for installation details.
Dependencies can be installed using the following command:
pip install -r requirements.txt
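As a quick sanity check, the short sketch below (not part of the artifact) verifies that the pinned versions above are installed; note that the pytorch pin corresponds to the torch distribution name, which is assumed here.

```python
# Sanity-check sketch: compare installed package versions against the pins above.
# "pytorch" is distributed as "torch", which is assumed here.
from importlib.metadata import version, PackageNotFoundError

PINNED = {
    "catboost": "1.1.1", "matplotlib": "3.5.0", "numpy": "1.19.2",
    "pandas": "1.1.3", "seaborn": "0.11.0", "scikit-learn": "1.1.3",
    "scipy": "1.5.2", "statsmodels": "0.12.2", "torch": "1.8.1",
    "xgboost": "1.7.1",
}

for pkg, want in PINNED.items():
    try:
        got = version(pkg)
        status = "OK" if got == want else f"mismatch (found {got}, expected {want})"
    except PackageNotFoundError:
        status = "missing"
    print(f"{pkg:>12}: {status}")
```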
SNESet and all datasets for comparison are available under the path 'datasets/'. The datasets for comparison consist of:
- Huawei Dataset. The original dataset is available at http://jeremie.leguay.free.fr/qoe/index.html. Our cleaned version is <repo>/datasets/ICC_cleaned.csv.
- Alibaba cluster-trace-v2018. The original dataset is available at this repository. Our aggregated results are available at <repo>/datasets/ecs/.
- Edge Dataset. The original dataset is available at this repository. Our aggregated results are available at <repo>/datasets/ens/.
- SNESet. The raw data of our dataset SNESet is <repo>/datasets/training_2nd_dataset.csv.
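The following minimal sketch (not part of the artifact) shows how the CSV files listed above can be loaded with pandas for a first look; the column layout is printed rather than assumed.

```python
# Minimal loading sketch: inspect the shape and columns of the CSV datasets above.
import pandas as pd

datasets = {
    "SNESet (raw)": "datasets/training_2nd_dataset.csv",
    "Huawei (cleaned)": "datasets/ICC_cleaned.csv",
}

for name, path in datasets.items():
    df = pd.read_csv(path)
    print(f"{name}: {df.shape[0]:,} rows x {df.shape[1]} columns")
    print(f"  columns: {list(df.columns)}")
```

The aggregated Alibaba and Edge results under datasets/ecs/ and datasets/ens/ can be loaded file by file in the same way.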
The overall architecture of data collection and analysis is shown below. Please refer to our paper for more details about the data collection system.
We characterize and compare the QoS and QoE metrics in SNESet with existing publicly available datasets and qualitatively investigate the impact of QoS on QoE using Kendall correlation and relative information gain.
Please refer to <repo>/characterization/README.md for details.
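As an illustration, the sketch below computes the two statistics for one QoS/QoE pair; the column names tcp_conn_time and stall_rate are hypothetical placeholders, and the actual analysis scripts live under <repo>/characterization/.

```python
# Illustration of the two characterization statistics on a single QoS/QoE pair.
# The column names below are hypothetical placeholders, not guaranteed to exist.
import numpy as np
import pandas as pd
from scipy.stats import kendalltau
from sklearn.metrics import mutual_info_score

df = pd.read_csv("datasets/training_2nd_dataset.csv")
pair = df[["tcp_conn_time", "stall_rate"]].dropna()    # hypothetical QoS / QoE columns
qos, qoe = pair["tcp_conn_time"], pair["stall_rate"]

# Kendall rank correlation between the QoS metric and the QoE metric.
tau, p_value = kendalltau(qos, qoe)
print(f"Kendall tau = {tau:.3f} (p = {p_value:.2e})")

# Relative information gain: I(X; Y) / H(Y), computed on decile-binned values.
qos_bins = pd.qcut(qos, q=10, duplicates="drop", labels=False)
qoe_bins = pd.qcut(qoe, q=10, duplicates="drop", labels=False)
mi = mutual_info_score(qos_bins, qoe_bins)                       # I(X; Y) in nats
p_y = pd.Series(qoe_bins).value_counts(normalize=True).to_numpy()
h_y = -np.sum(p_y * np.log(p_y))                                 # H(Y) in nats
print(f"Relative information gain = {mi / h_y:.3f}")
```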
We quantitatively measure the impact of different QoS metrics on QoE utilizing seven mainstream regression methods.
Considering the timeliness requirements of real-world deployment, we compare prediction accuracy and time efficiency in both the domain-general scenario (for all applications) and domain-specific scenarios (for specific applications).
Please refer to <repo>/benchmark/README.md for details.
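For orientation, the sketch below shows what a single run with one of the regression methods (XGBoost here) could look like, recording both prediction accuracy and training/inference time; the feature and target column names are hypothetical placeholders, and the full pipeline lives under <repo>/benchmark/.

```python
# Single-run sketch: fit one regressor on hypothetical QoS features and a QoE target,
# then record accuracy (MAE) and training/inference time.
import time
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor

df = pd.read_csv("datasets/training_2nd_dataset.csv")
qos_features = ["tcp_conn_time", "dns_time", "first_frame_time"]  # hypothetical names
qoe_target = "stall_rate"                                         # hypothetical name

data = df[qos_features + [qoe_target]].dropna()
X_train, X_test, y_train, y_test = train_test_split(
    data[qos_features], data[qoe_target], test_size=0.2, random_state=42)

model = XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.1)

start = time.time()
model.fit(X_train, y_train)
train_time = time.time() - start

start = time.time()
pred = model.predict(X_test)
infer_time = time.time() - start

print(f"MAE: {mean_absolute_error(y_test, pred):.4f}")
print(f"train: {train_time:.2f} s, inference: {infer_time:.2f} s")
```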
If you find this repo useful, please cite our paper.
@article{li2023demystifying,
title={Demystifying the QoS and QoE of Edge-hosted Video Streaming Applications in the Wild with SNESet},
author={Li, Yanan and Deng, Guangqing and Bai, Changming and Yang, Jingyu and Wang, Gang and Zhang, Hao and Bai, Jin and Yuan, Haitao and Xu, Mengwei and Wang, Shangguang},
journal={Proceedings of the ACM on Management of Data},
volume={1},
number={4},
pages={1--29},
year={2023},
publisher={ACM New York, NY, USA}
}