This page will contain our progress in creating a report detailing the quality of different datasets.
We have acquired permission from Mike Sconzo, owner of secrepo.com, to analyze and report on his security datasets.
by Tien Tran, Citlalin Galvan, Vivian Nguyen, Huy Nguyen
Machine Learning is on the rise ⇑
A Machine Learning Algorithm can:
Detect Suspicious Activity
Stop malicious files from executing
The Problem: Machine Learning in Cyber Security faces one critical problem: security data is limited, and the quality of the available training datasets is uncertain. Without a good-quality dataset, a Machine Learning Algorithm cannot learn properly.
Downloading SecRepo’s Datasets
PE Malware Dataset: featureExtraction.py
Network Dataset: Network_LogtoCSV.py
Bro Logs Dataset: Brolog_LogtoCSV.py
System Dataset: System_LogtoCSV.py, System_Squid_LogtoCSV.py
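The scripts listed above are not reproduced on this page, so the following is only a minimal sketch of what a log-to-CSV conversion for Bro logs might look like. It assumes the standard Bro tab-separated format, where metadata lines begin with "#" and the column names follow a "#fields" tag. The function name bro_log_to_csv and the sample rows are hypothetical and are not taken from Brolog_LogtoCSV.py.

```python
import csv
import io


def bro_log_to_csv(log_lines, out_file):
    """Write the data rows of a Bro TSV log to out_file as CSV.

    Returns the number of data rows written. Hypothetical helper,
    not the project's actual Brolog_LogtoCSV.py.
    """
    writer = csv.writer(out_file)
    written = 0
    for line in log_lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # Column names follow the "#fields" tag, separated by tabs.
            writer.writerow(line.split("\t")[1:])
        elif line and not line.startswith("#"):
            # Every other non-comment, non-empty line is a data row.
            writer.writerow(line.split("\t"))
            written += 1
    return written


# Illustrative input only; these rows are made up for the example.
sample_log = [
    "#separator \\x09\n",
    "#fields\tts\tid.orig_h\tid.resp_p\tproto\n",
    "#types\ttime\taddr\tport\tenum\n",
    "1331901000.000000\t192.168.202.79\t80\ttcp\n",
    "1331901000.010000\t192.168.202.80\t53\tudp\n",
    "#close\t2012-03-16-13-30-00\n",
]
buffer = io.StringIO()
row_count = bro_log_to_csv(sample_log, buffer)
print(buffer.getvalue())
```

Writing to a file object rather than a path keeps the sketch easy to try in a notebook cell; passing an open file handle instead of the StringIO buffer would produce a CSV on disk.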
Detailing the data inside the Datasets with Jupyter Notebook
Elements in Data Quality Report:
Data Type
Count
Unique Values
Missing Values
Minimum Values
Maximum Values
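As a rough illustration of how the six elements above could be computed for each column of a converted CSV file, here is a minimal sketch using only the Python standard library. The function names, the sample columns, and the choice to treat "", "-", and "NA" as missing are assumptions made for this example; the project's notebooks may compute these values differently.

```python
import csv
import io


def infer_value(text):
    """Convert a CSV cell to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def quality_report(rows, missing_markers=("", "-", "NA")):
    """Return {column: {data_type, count, unique, missing, min, max}}.

    missing_markers is an assumption: cells equal to one of these
    strings (or absent from a short row) are counted as missing.
    """
    report = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        raw = [row[col] for row in rows]
        present = [
            infer_value(v) for v in raw
            if v is not None and v not in missing_markers
        ]
        kinds = {type(v).__name__ for v in present}
        if kinds and kinds <= {"int", "float"}:
            data_type = "float" if "float" in kinds else "int"
            comparable = present
        else:
            # Mixed or textual columns are compared as strings.
            data_type = "str"
            comparable = [str(v) for v in present]
        report[col] = {
            "data_type": data_type,
            "count": len(present),
            "unique": len(set(comparable)),
            "missing": len(raw) - len(present),
            "min": min(comparable) if comparable else None,
            "max": max(comparable) if comparable else None,
        }
    return report


# Illustrative data only; column names and values are made up.
sample = io.StringIO(
    "src_port,bytes,proto\n"
    "443,1200,tcp\n"
    "80,,tcp\n"
    "53,88.5,udp\n"
)
rows = list(csv.DictReader(sample))
report = quality_report(rows)
for column, stats in report.items():
    print(column, stats)
```

Here Count is taken to mean the number of non-missing values, so Count plus Missing Values equals the number of rows; that interpretation is also an assumption.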
Report Format
Abstract
Source
Dataset Information
Attribute Information
Relevant Papers
Associated Data Science Notebook