cyberdefendersprogram/MachineLearning


MachineLearning - DataSet Quality Research

This page documents our progress on a report detailing the quality of different datasets.

We have acquired permission from Mike Sconzo, owner of secrepo.com, to analyze and report on his security datasets.

Security Datasets for Machine Learning

by Tien Tran, Citlalin Galvan, Vivian Nguyen, Huy Nguyen

WHY FOCUS ON DATASETS?

Machine Learning is on the rise ⇑

A machine learning algorithm can:

Detect suspicious activity

Stop malicious files from executing

The Problem: A critical problem for machine learning in cyber security is the limited availability of security data, along with the uneven quality of the training datasets that do exist. Without a good-quality dataset, a machine learning algorithm cannot learn properly.

Collecting the Datasets

Downloading SecRepo’s Datasets

PE Malware Dataset: featureExtraction.py
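featureExtraction.py itself is not reproduced on this page. The sketch below illustrates the kind of static features a PE extractor can pull. It uses a hypothetical `extract_pe_header_features` function that reads a few COFF file-header fields with only the standard-library `struct` module. The function name and the chosen fields are assumptions for illustration, not the project's actual feature set. A fuller extractor could use a dedicated parser such as the third-party `pefile` package.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # Characteristics flag marking a DLL


def extract_pe_header_features(data: bytes) -> dict:
    """Hypothetical sketch: read a few COFF header fields from raw PE bytes."""
    # The DOS header starts with "MZ"; the 4-byte e_lfanew at offset 0x3C
    # gives the file offset of the PE signature.
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # 20-byte COFF file header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    (machine, num_sections, timestamp, _sym_ptr, _num_syms,
     opt_header_size, characteristics) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": machine,
        "number_of_sections": num_sections,
        "timestamp": timestamp,
        "size_of_optional_header": opt_header_size,
        "characteristics": characteristics,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }
```

In practice the bytes would come from `open(path, "rb").read()`. Each returned dictionary becomes one row of a feature table.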

Network Dataset: Network_LogtoCSV.py

Bro Logs Dataset: Brolog_LogtoCSV.py
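Bro logs are tab-separated text files in which a `#fields` header line names the columns and other `#` lines carry metadata. The sketch below shows one way to convert them in the spirit of Brolog_LogtoCSV.py. It assumes Bro's default tab separator and `-` as the unset-field marker. `parse_bro_log` and `bro_log_to_csv` are hypothetical names, not functions from the repository.

```python
import csv


def parse_bro_log(lines):
    """Hypothetical sketch: return (field_names, rows) from a tab-separated Bro log."""
    fields, rows = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Skip metadata (#separator, #types, #open, #close, ...) but keep
            # the column names listed after the "#fields" tag.
            if line.startswith("#fields"):
                fields = line.split("\t")[1:]
            continue
        if fields is None:
            raise ValueError("data row found before the #fields header")
        # "-" is Bro's default unset marker; store it as an empty CSV cell.
        rows.append(["" if value == "-" else value for value in line.split("\t")])
    return fields, rows


def bro_log_to_csv(log_path, csv_path):
    """Hypothetical helper: convert one Bro log file to a CSV file."""
    with open(log_path, encoding="utf-8") as src:
        fields, rows = parse_bro_log(src)
    if fields is None:
        raise ValueError("no #fields header found")
    with open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(fields)
        writer.writerows(rows)
```

Mapping `-` to an empty cell makes unset values show up as missing values in any later data quality check.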

System Dataset: System_LogtoCSV.py, System_Squid_LogtoCSV.py
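The sketch below assumes the Squid data follows Squid's native access.log format. That format has ten whitespace-separated fields: time, elapsed, remotehost, code/status, bytes, method, URL, rfc931, peerstatus/peerhost and type. Under that assumption, a converter like System_Squid_LogtoCSV.py might split each line as shown. `SQUID_FIELDS`, `squid_line_to_row` and `squid_log_to_csv` are hypothetical names.

```python
import csv

# Hypothetical CSV columns; code/status and peerstatus/peerhost are split in two.
SQUID_FIELDS = ["time", "elapsed", "remotehost", "code", "status", "bytes",
                "method", "url", "rfc931", "peerstatus", "peerhost", "type"]


def squid_line_to_row(line):
    """Split one native-format access.log line into 12 CSV cells."""
    parts = line.split()  # fields are separated by one or more spaces
    if len(parts) != 10:
        raise ValueError(f"expected 10 fields, got {len(parts)}")
    ts, elapsed, host, code_status, size, method, url, user, hierarchy, mime = parts
    code, _, status = code_status.partition("/")
    peerstatus, _, peerhost = hierarchy.partition("/")
    return [ts, elapsed, host, code, status, size, method, url, user,
            peerstatus, peerhost, mime]


def squid_log_to_csv(log_path, csv_path):
    """Convert a whole access.log file to CSV, skipping blank lines."""
    with open(log_path, encoding="utf-8") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(SQUID_FIELDS)
        for line in src:
            if line.strip():
                writer.writerow(squid_line_to_row(line))
```

Splitting the combined `code/status` and `peerstatus/peerhost` fields gives each value its own column, so they can be counted and compared separately.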

Analysis Reports

Detailing the data inside the datasets with Jupyter Notebook

Elements in Data Quality Report:

Data Type

Count

Unique Values

Missing Values

Minimum Values

Maximum Values
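The notebooks' own code is not shown here. The sketch below computes the six report elements listed above, using only the standard library, for rows read with `csv.DictReader`. The hypothetical `quality_report` function infers each column's type from its values. Two choices are assumptions: empty strings and `-` count as missing (the latter is Bro's default unset marker), and "count" means the number of non-missing values.

```python
def infer_type(values):
    """Classify non-missing string values as 'int', 'float', 'str' (or 'unknown' if none)."""
    if not values:
        return "unknown"
    kind = "int"
    for value in values:
        try:
            int(value)
            continue
        except ValueError:
            pass
        try:
            float(value)
            kind = "float"
        except ValueError:
            return "str"
    return kind


def quality_report(rows, missing_markers=("", "-")):
    """Hypothetical sketch: per-column type, count, unique, missing, min and max."""
    report = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        raw = [row.get(col) for row in rows]
        # None (short CSV rows) and the marker strings are treated as missing.
        present = [v for v in raw if v is not None and v not in missing_markers]
        kind = infer_type(present)
        cast = {"int": int, "float": float}.get(kind, str)
        typed = [cast(v) for v in present]
        report[col] = {
            "type": kind,
            "count": len(present),  # non-missing values only
            "unique": len(set(typed)),
            "missing": len(raw) - len(present),
            "min": min(typed) if typed else None,  # lexicographic for str columns
            "max": max(typed) if typed else None,
        }
    return report
```

With pandas, the same elements map roughly onto `df.dtypes`, `df.count()`, `df.nunique()`, `df.isnull().sum()`, `df.min()` and `df.max()`.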

Description Reports

Report Format

Abstract

Source

Dataset Information

Attribute Information

Relevant Papers

Associated Data Science Notebook
