Identifying and zero-variance features were removed. Both groups are listed below (a sketch of this step follows the list):
- Identifying features
  - src
  - dst
  - table_id
  - in_port
  - dl_dst
- Zero-variance features
  - port_rx_dropped
  - port_tx_dropped
  - port_rx_errors
  - port_tx_errors
  - port_rx_frame_err
  - port_rx_over_err
  - port_rx_crc_err
  - port_collisions
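A minimal sketch of the drop step, assuming the raw statistics are loaded into a pandas DataFrame; the variable name `df` and the CSV path are assumptions, while the column names come from the lists above:

```python
import pandas as pd

# Columns removed from the raw dataset (names taken from the lists above).
IDENTIFYING = ["src", "dst", "table_id", "in_port", "dl_dst"]
ZERO_VARIANCE = [
    "port_rx_dropped", "port_tx_dropped", "port_rx_errors", "port_tx_errors",
    "port_rx_frame_err", "port_rx_over_err", "port_rx_crc_err", "port_collisions",
]

df = pd.read_csv("dataset.csv")  # hypothetical path
df = df.drop(columns=IDENTIFYING + ZERO_VARIANCE)
```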
A Laplace correction (adding 1 to all values) was applied before the division transformation to prevent division by zero.
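A sketch of the correction applied to one ratio feature; the raw column names (`ip_bytes`, `duration_sec`) are illustrative assumptions, not the dataset's actual schema:

```python
import pandas as pd

def laplace_ratio(df: pd.DataFrame, numerator: str, denominator: str) -> pd.Series:
    """Add 1 to both counts before dividing, so a zero denominator
    (e.g. a flow with zero recorded duration) cannot occur."""
    return (df[numerator] + 1) / (df[denominator] + 1)

# Illustrative usage with assumed raw column names.
df = pd.DataFrame({"ip_bytes": [0, 1024, 4096], "duration_sec": [0, 2, 8]})
df["ip_bytes_sec"] = laplace_ratio(df, "ip_bytes", "duration_sec")
```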
To address the limitations of individual feature selection methods, three methods were employed and the intersection of their results was used to identify the key variables (a sketch of the intersection approach follows this list). The three methods applied were:
- FDR[1]: controls the expected proportion of false discoveries (features wrongly declared significant) across multiple significance tests
- Stepwise Selection[2]: an iterative process that adds the most informative features to an initially empty feature set and removes the worst-performing ones
- Boruta[3]: iteratively removes features that are less statistically relevant than randomly permuted "shadow" copies of the original features
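A sketch of the intersection approach, assuming scikit-learn and the `boruta` package are available; `SequentialFeatureSelector` is used here as a forward-only stand-in for the stepwise method of [2], and all parameter values are illustrative:

```python
import numpy as np
from boruta import BorutaPy
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFdr, SequentialFeatureSelector, f_classif

def intersect_selected(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return the indices of features kept by all three methods."""
    rf = RandomForestClassifier(n_jobs=-1, random_state=42)

    # 1) FDR: univariate significance tests with false-discovery-rate control.
    fdr_mask = SelectFdr(f_classif, alpha=0.05).fit(X, y).get_support()

    # 2) Stepwise-style selection (forward sequential selection here).
    sfs = SequentialFeatureSelector(rf, n_features_to_select="auto", direction="forward")
    sfs_mask = sfs.fit(X, y).get_support()

    # 3) Boruta: keep features that beat randomly permuted shadow copies.
    boruta = BorutaPy(rf, n_estimators="auto", random_state=42)
    boruta.fit(X, y)

    return np.where(fdr_mask & sfs_mask & boruta.support_)[0]
```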
The 13 transformed features, along with the final 5 selected by the intersection of the above feature selection methods (bolded), are given below:
| No | Feature Name | Equation |
|----|--------------|----------|
| 1 | ip_bytes_sec | |
| 2 | ip_packets_sec | |
| 3 | ip_bytes_packet | |
| 4 | port_bytes_sec | |
| 5 | port_packet_sec | |
| 6 | port_byte_packet | |
| 7 | port_flow_count_sec | |
| 8 | table_matched_lookup | |
| 9 | table_active_lookup | |
| 10 | port_rx_packets_sec | |
| 11 | port_tx_packets_sec | |
| 12 | port_rx_bytes_sec | |
| 13 | port_tx_bytes_sec | |
RobustScaler[9] (RS), which centres each feature on its median and scales it by the interquartile range, was then applied to handle outliers in the dataset, ensuring that the model is not unduly affected by extreme values.
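A minimal sketch of this step; the stand-in data and variable names are assumptions:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Stand-in for the 5 selected features; real data comes from the steps above.
X_selected = np.random.default_rng(0).normal(size=(100, 5))

# Centre each feature on its median and scale by the interquartile range,
# so extreme values influence the transform far less than standard scaling.
X_scaled = RobustScaler().fit_transform(X_selected)
```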
After the Model Training & Analysis step, the PCA, LDA and ICA dimensionality reduction techniques were compared using the selected RF model (a comparison sketch is given below). LDA performed best of the three, though slightly below the dataset without dimensionality reduction. Since this performance difference is negligible, we opted to use LDA, as it reduces the feature space from 5 features to 4.
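A sketch of how such a comparison can be run with scikit-learn; the cross-validation setup, the stand-in data, and `n_components=4` (which requires at least five classes for LDA) are assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_scaled = rng.normal(size=(500, 5))   # stand-in for the scaled features
y = rng.integers(0, 5, size=500)       # stand-in labels (5 classes)

reducers = {
    "PCA": PCA(n_components=4),
    "ICA": FastICA(n_components=4, random_state=42),
    "LDA": LinearDiscriminantAnalysis(n_components=4),
}
for name, reducer in reducers.items():
    # LDA is supervised, so it needs the class labels when fitting.
    X_red = (reducer.fit_transform(X_scaled, y) if name == "LDA"
             else reducer.fit_transform(X_scaled))
    score = cross_val_score(RandomForestClassifier(random_state=42), X_red, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.4f}")
```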
The final RF model pipeline, combining the manual division transformation, feature selection through intersection, scaling, and dimensionality reduction, achieved a final performance of 99.99% across all metrics, with marginal variance.
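Putting the post-selection steps together, a sketch of the final pipeline; the division transformation and feature-selection intersection happen upstream as sketched earlier, and the stand-in data and hyperparameters are assumptions:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

final_pipeline = Pipeline([
    ("scale", RobustScaler()),
    ("reduce", LinearDiscriminantAnalysis(n_components=4)),
    ("classify", RandomForestClassifier(random_state=42)),
])

# Stand-in data: 5 selected features, 5 classes (real data comes from the
# transformation and selection steps above).
rng = np.random.default_rng(0)
X_selected = rng.normal(size=(500, 5))
y = rng.integers(0, 5, size=500)
final_pipeline.fit(X_selected, y)
```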
[1] Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B (Methodological), 57, 289–300.
[2] Naser, M. (2021). Mapping functions: A physics-guided, data-driven and algorithm-agnostic machine learning approach to discover causal and descriptive expressions of engineering phenomena. Measurement, 185, 110098.
[3] Kursa, M. B., & Rudnicki, W. R. (2010). Feature Selection with the Boruta Package. Journal of Statistical Software, 36, 1–13.