forked from wolever/Protocol-Informatics
Patches to the Protocol Informatics project to make it work with numpy.
tumi8/Protocol-Informatics
The Protocol Informatics Framework
----------------------------------

Written by Marshall Beddoe <[email protected]>
Extended and modified by Lothar Braun <[email protected]>

Copyright (c) 2004 Baseline Research
Copyright (c) 2011 Lothar Braun

Source code repository available at https://github.com/constcast/Protocol-Informatics

Overview:

The Protocol Informatics project is a software framework for advanced sequence and protocol stream analysis using bioinformatics algorithms. The sole purpose of this software is to identify protocol fields in unknown or poorly documented network protocol formats. The algorithms perform comparative analysis on a series of samples to better understand the underlying structure of otherwise random-looking data. The PI framework was designed for experimentation through the use of a widget-based component set.

The framework aims to include a number of different algorithms that help identify protocol structures in network traces. It ships with a command line interface that lets you interactively control the process of inferring protocol information from network traces.

Requirements:
-------------

Python >= 2.4    http://www.python.org
numpy            http://numpy.scipy.org/
PyYAML           http://pyyaml.org/

Optional:

Pcapy            http://oss.coresecurity.com/projects/pcapy.html
Pydot            http://code.google.com/p/pydot/

Controlling PI using the command line interface:
------------------------------------------------

All commands in the interface should be documented in the online help. Whenever you want to learn more about a command, just use the online help:

    inf> help quit
    Quit the program.

Program start:

You can start the program with or without a configuration file:

    ./main.py -c config.yml

If you do not specify a configuration file, the file 'config.yml' in the current working directory is used. If that file does not exist, a default configuration is loaded and stored in 'config.yml'.
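The startup behavior and the docstring-based online help described above can be illustrated with a minimal sketch built on Python's standard `cmd` module. This is not PI's code: the `Shell` class, `load_config` function and the bodies of the `config`, `saveconfig` and `quit` commands are hypothetical, and the standard-library `json` module stands in for PyYAML so the sketch runs without third-party packages. It targets Python 3.

```python
import ast
import cmd
import json
import os

# Default values taken from the "config" listing in this README.
DEFAULT_CONFIG = {
    "ethOffset": 14, "maxMessages": 50, "weight": 1.0, "format": "pcap",
    "graph": False, "textBased": False, "configFile": "config.yml",
    "messageDelimiter": None, "onlyUniq": False, "gnuplotFile": None,
    "inputFile": None, "interactive": True,
}


def load_config(path="config.yml"):
    """Return the configuration stored in path, creating it from defaults if missing.

    json stands in for PyYAML so that this sketch has no third-party dependency.
    """
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    print('No default configuration found. Creating a default config file "%s".' % path)
    config = dict(DEFAULT_CONFIG, configFile=path)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config


class Shell(cmd.Cmd):
    """Interactive prompt; cmd.Cmd builds the online help from do_* docstrings."""

    prompt = "inf> "

    def __init__(self, config, **kwargs):
        cmd.Cmd.__init__(self, **kwargs)
        self.config = config

    def do_config(self, arg):
        """Command syntax: config [<name> <value>]

        Without arguments, print all configuration variables.
        With a name and a value, set that variable.
        """
        parts = arg.split(None, 1)
        if not parts:
            for name, value in self.config.items():
                self.stdout.write("%-18s %s\n" % (name, value))
        elif len(parts) == 2:
            try:
                value = ast.literal_eval(parts[1])  # numbers, True/False, None
            except (ValueError, SyntaxError):
                value = parts[1]  # plain words such as pcap stay strings
            self.config[parts[0]] = value
        else:
            self.stdout.write("Command syntax: config [<name> <value>]\n")

    def do_saveconfig(self, arg):
        """Save the current configuration to the file named by configFile."""
        with open(self.config["configFile"], "w") as f:
            json.dump(self.config, f, indent=2)

    def do_quit(self, arg):
        """Quit the program."""
        return True  # a true return value ends cmdloop()
```

To run it interactively, one would call `Shell(load_config()).cmdloop("Welcome to Protocol-Informatics. What do you want to do today?")`; typing `help quit` then prints the `do_quit` docstring, because `cmd.Cmd` derives each command's help text from its docstring.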
Whenever you change the configuration from within the program, e.g. using the "config" command, you can save it using the "saveconfig" command and load it again on the next program start. If you set the configuration parameters "inputFile" and "format" in your config, PI will automatically try to read input from this file.

Reading input data:

There are basically two ways to read sequences into your environment. You can set the "inputFile" and "format" configuration variables, save your config using "saveconfig", and restart the program using the "restart" command. Or you can explicitly read input using the "read" command in your environment.

The following shows what the first steps in the command line interface can look like.

First steps:

Start the program:

    $ ./main.py
    No default configuration found. Creating a default config file "config.yml".
    Welcome to Protocol-Informatics. What do you want to do today?
    inf>

This creates your default configuration file with default parameters and drops you into the command line prompt. You can list the available commands using the "help" command:

: inf> help
:
: Documented commands (type help <topic>):
: ========================================
: EOF PI config env exit help quit read restart saveconfig seqs show
:
: inf>

For each command, you can get verbose help by passing the command's name to the help command itself:

: inf> help read
: Command syntax: read [<bro|pcap|ascii|config>] <file>
:
: Tries to read file <file> in the specified format. If format
: equals "config", a new configuration file is read from <file>.
: In all other cases, input data for the protocol inferences are
: read in the specified format (bro, pcap, ascii)
: inf>

An important command is the "config" command, which can be used to read and set configuration variables.
If it is run without an argument, it will print the configuration:

: inf> config
: ethOffset          14
: maxMessages        50
: weight             1.0
: format             pcap
: graph              False
: textBased          False
: configFile         config.yml
: messageDelimiter   None
: onlyUniq           False
: gnuplotFile        None
: inputFile          None
: interactive        True

The configuration parameters are important for controlling the program and are documented in the following sections. They can be grouped by the module in which they are used. For the main module, denoted by the inf> prompt, there are the following important parameters:

ethOffset: Important when pcap files are read. Defines the length of the Ethernet header. The default value is 14 (use 18 if you have a trace from a VLAN-tagged network).

maxMessages: Defines how many messages are read by default from the input traces. If this is set to 0, all messages are read from the input file.

onlyUniq: Controls whether only unique messages are read from the input file or whether duplicate messages are allowed. Please note that this parameter depends on the connection context: if it is set to true, only duplicate messages within the same connection are removed. Duplicate messages that are distributed over multiple connections will still be part of the input data.

inputFile: Defines the name of the file from which messages are read.

format: Defines the format that is used to read the file specified by inputFile.
Possible values:
  - pcap  - expects a pcap file as produced by tcpdump -w <filename>
  - bro   - expects an ADU file as produced by bro with the script that is shipped with this source code in bro-scripts/adu_writer.bro
  - ascii - expects a text file which contains a number of messages separated by newline characters

PCAP files can easily be converted into BRO files with the following command, run from the bro-scripts directory:

    <path_to_bro>/bin/bro -C -r <path_to_pcap> adu_writer.bro

configFile: Name of the YAML configuration file that is used to store the current configuration with the saveconfig command.

interactive: Defines whether PI runs in interactive or non-interactive mode. Currently, only interactive mode is supported.

Other configuration parameters are only needed by submodules. Currently, there are the following submodules:

  - seqs: Offers methods for changing and looking at input data. This module allows, for example, selecting a random subsample of the input data or selecting only unique messages.
  - PI: Offers the original functionality of the PI framework. It can create distance matrices and phylogeny trees and can perform multiple sequence alignment. More information on the code can be found in PI/README.

Configuration parameters for the "seqs" module:

messageDelimiter: Splits messages at a given sequence of characters.

fieldDelimiter: Currently unused.

Configuration parameters for the PI module:

graph: Decides whether graphs are written to disk.

gnuplotFile: Currently unused.

weight: Weight used to determine how many clusters are found when grouping messages according to their similarity.
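The main-module reading parameters ethOffset, maxMessages and onlyUniq can be illustrated with a minimal sketch. It is not PI's reader: the functions `read_pcap_payloads` and `select_messages` are hypothetical, the pcap parser only handles classic libpcap files with microsecond timestamps, and messages are assumed to arrive as `(connection_id, payload)` pairs so that duplicates can be detected per connection, as the onlyUniq description requires.

```python
import struct


def read_pcap_payloads(path, eth_offset=14):
    """Yield every captured frame with its first eth_offset bytes removed.

    Only classic libpcap files with microsecond timestamps are handled
    (magic number 0xa1b2c3d4, in either byte order).
    """
    with open(path, "rb") as f:
        magic = f.read(24)[:4]  # the global header is 24 bytes long
        if magic == b"\xd4\xc3\xb2\xa1":
            endian = "<"  # file written on a little-endian host
        elif magic == b"\xa1\xb2\xc3\xd4":
            endian = ">"  # file written on a big-endian host
        else:
            raise ValueError("not a classic pcap file")
        while True:
            record = f.read(16)  # ts_sec, ts_usec, incl_len, orig_len
            if len(record) < 16:
                break
            incl_len = struct.unpack(endian + "IIII", record)[2]
            frame = f.read(incl_len)
            # Skip the link-layer header: 14 bytes for Ethernet, 18 with a VLAN tag.
            yield frame[eth_offset:]


def select_messages(messages, max_messages=50, only_uniq=False):
    """Apply maxMessages and onlyUniq to (connection_id, payload) pairs.

    Duplicates are only detected within the same connection, so an identical
    payload seen in two different connections is kept twice.
    A max_messages value of 0 means "no limit".
    """
    seen = set()
    selected = []
    for connection_id, payload in messages:
        if only_uniq:
            key = (connection_id, payload)
            if key in seen:
                continue
            seen.add(key)
        selected.append(payload)
        if max_messages and len(selected) >= max_messages:
            break
    return selected
```

In a complete reader, the bytes returned by `read_pcap_payloads` would still start with the IP and transport headers; those would also have to be stripped and used to derive the connection identifier that `select_messages` expects.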
=== Discoverer module specific config options ===

minWordLength: The minimum length of a run of printable characters that is considered a text token.

ASCIILowerBound: The lowest ASCII character considered printable (used for text classification).

ASCIIUpperBound: The highest ASCII character considered printable (used for text classification).

dumpFile: Path to which the Discoverer results are written when the 'dumpresult' command is executed. The filename is taken from the inputFile configuration parameter.
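How minWordLength, ASCIILowerBound and ASCIIUpperBound could drive text classification is sketched below. This is an illustration, not the Discoverer module's code: the `tokenize` function and its default values (3, 32, 126) are hypothetical, and treating each leftover byte as its own binary token is an assumption.

```python
def tokenize(message, min_word_length=3, ascii_lower=32, ascii_upper=126):
    """Split a bytes message into ("text", ...) and ("binary", ...) tokens.

    A maximal run of bytes whose values lie in [ascii_lower, ascii_upper]
    becomes one text token if it is at least min_word_length bytes long.
    Every other byte becomes its own binary token (an assumption).
    """
    tokens = []
    i = 0
    n = len(message)
    while i < n:
        # Find the end of the printable run starting at position i.
        j = i
        while j < n and ascii_lower <= message[j] <= ascii_upper:
            j += 1
        if j - i >= min_word_length:
            tokens.append(("text", message[i:j]))
            i = j
        else:
            # Non-printable byte, or a printable run that is too short.
            tokens.append(("binary", message[i:i + 1]))
            i += 1
    return tokens
```

For example, `tokenize(b"\x01\x02GET /\x00ab")` yields two binary tokens, the text token `b"GET /"`, and three more binary tokens, because the trailing `ab` is shorter than the minimum word length.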