Comparing changes

base repository: wolever/Protocol-Informatics
base: master
head repository: tumi8/Protocol-Informatics
compare: master
Able to merge. These branches can be automatically merged.

Commits on Oct 19, 2011

  1. 9fcba64
  2. new command line switch to limit the maximum number of sequences PI should look at (distance matrix creating is O(N*N))

    Lothar Braun committed Oct 19, 2011
    3a9b4d7

Commits on Oct 26, 2011

  1. fix missing option in usage

    Lothar Braun committed Oct 26, 2011
    af7c73a
  2. include vim formatting hints

    Lothar Braun committed Oct 26, 2011
    8018c5d

Commits on Nov 21, 2011

  1. - use env to find real python version

    - introduced new command line switch -t (for text-based protocols)
    - use different output plugin for printing text-based protocols results
    - use pcap as default input format
    Lothar Braun committed Nov 21, 2011
    a023f75
  2. Utils module: show progress bar

    Lothar Braun committed Nov 21, 2011
    b822dcf
  3. Progress bars for distance matrix and tree building

    Lothar Braun committed Nov 21, 2011
    d2c73e3
  4. Started TextBased output module

    Lothar Braun committed Nov 21, 2011
    8976b30

Commits on Nov 22, 2011

  1. Improved progress bars

    Lothar Braun committed Nov 22, 2011
    51b86de
  2. Basic text-based output

    Lothar Braun committed Nov 22, 2011
    a977025
  3. Added Bro Scripts for PCAP preprocessing

    Lothar Braun committed Nov 22, 2011
    f722f3b
  4. 2d85a4b

Commits on Nov 23, 2011

  1. Read input from Bro files

    Lothar Braun committed Nov 23, 2011
    60a29b0
  2. Create eps when graphing

    Lothar Braun committed Nov 23, 2011
    9d860aa
  3. Moved main functionality into PI core module

    Lothar Braun committed Nov 23, 2011
    fafa4e0
  4. Started entropy analysis module

    Lothar Braun committed Nov 23, 2011
    05c0c7d
  5. Use new entropy module

    - added command line option for new module
    - introduced new code path
    Lothar Braun committed Nov 23, 2011
    f856997
  6. Import submodules from PI/__init__

    Lothar Braun committed Nov 23, 2011
    a9fefc2

Commits on Nov 24, 2011

  1. New configuration interface using yaml

    Lothar Braun committed Nov 24, 2011
    4ff5d27
  2. Fix config parsing problem

    Lothar Braun committed Nov 24, 2011
    d0521dd
  3. Fix eps generation

    Lothar Braun committed Nov 24, 2011
    94263d9
  4. Ansi output changes

    - make termwidth variable
    - fix problems with empty lines
    Lothar Braun committed Nov 24, 2011
    49c737c
  5. Initialize variable before first use

    Lothar Braun committed Nov 24, 2011
    802cc3e
  6. Do not include empty messages into analysis set

    Lothar Braun committed Nov 24, 2011
    b9dc226
  7. Split messages according to user-defined delimiter

    Lothar Braun committed Nov 24, 2011
    8e65a4b

Commits on Nov 25, 2011

  1. Make analysis module configurable

    Lothar Braun committed Nov 25, 2011
    86534b4
  2. Require pydot only if graph is set in config file

    Lothar Braun committed Nov 25, 2011
    a3e541e
  3. Entropy module

    Lothar Braun committed Nov 25, 2011
    09c0d94

Commits on Dec 1, 2011

  1. Only read maxMessages from file

    Lothar Braun committed Dec 1, 2011
    811539c
  2. Configure whether messages should be uniq or not

    Lothar Braun committed Dec 1, 2011
    957913e

Commits on Dec 5, 2011

  1. Started common module

    Lothar Braun committed Dec 5, 2011
    f042b8c
  2. d21d083

Commits on Dec 7, 2011

  1. maxMessages are checked in the input modules

    Lothar Braun committed Dec 7, 2011
    a892cdc
  2. Started command line interface

    Lothar Braun committed Dec 7, 2011
    ec5c48e
  3. Do cli in own cmdinterface module

    Lothar Braun committed Dec 7, 2011
    3ef6853
  4. Implemented cmdline reader

    Lothar Braun committed Dec 7, 2011
    24ec3b8

Commits on Dec 8, 2011

  1. Started moving configuration to own module

    Lothar Braun committed Dec 8, 2011
    71aef69

Commits on Dec 9, 2011

  1. Fixed configuration for non-interactive mode

    Lothar Braun committed Dec 9, 2011
    5cb0560
  2. read configuration from cli interface

    Lothar Braun committed Dec 9, 2011
    5eaba4c

Commits on Dec 12, 2011

  1. Started PI cli interface

    Lothar Braun committed Dec 12, 2011
    41facc6
  2. pass configuration to commandline interface

    Lothar Braun committed Dec 12, 2011
    d67f650
  3. Changed parameter name, added new parameters

    - Configuration was now renamed to "config" in pycli
    - new stub for setting configuration variables
    Lothar Braun committed Dec 12, 2011
    7861c45
  4. Added parameter for saving current config

    Lothar Braun committed Dec 12, 2011
    237e7f7
  5. Renamed configuration to config

    Lothar Braun committed Dec 12, 2011
    9702a97
  6. Read config file on startup in interactive mode

    Lothar Braun committed Dec 12, 2011
    0bb471d
  7. Use generic help

    Lothar Braun committed Dec 12, 2011
    741f150
  8. Document more commands

    Lothar Braun committed Dec 12, 2011
    1980ad4
  9. Implemented save config

    Lothar Braun committed Dec 12, 2011
    33534d8
  10. Sanitize save and load config routines

    Lothar Braun committed Dec 12, 2011
    f875922
  11. Implemented show and set config

    Lothar Braun committed Dec 12, 2011
    204985b
Showing with 10,049 additions and 475 deletions.
  1. +0 −19 .hgignore
  2. +17 −0 .project
  3. +15 −0 .pydevproject
  4. +2 −1 Makefile
  5. +198 −0 PI/README
  6. +8 −1 PI/__init__.py
  7. +59 −0 PI/core.py
  8. +26 −2 PI/distance.py
  9. +0 −154 PI/input.py
  10. +1 −0 PI/multialign.py
  11. +104 −9 PI/output.py
  12. +25 −2 PI/phylogeny.py
  13. +10 −8 PI/tree.py
  14. +17 −0 PI/util.py
  15. +209 −174 README
  16. +290 −0 bro-script/adu.bro
  17. +30 −0 bro-script/adu_writer.bro
  18. +3 −0 cmdinterface/__init__.py
  19. +343 −0 cmdinterface/cli.py
  20. +967 −0 cmdinterface/disccli.py
  21. +158 −0 cmdinterface/picli.py
  22. +168 −0 cmdinterface/seqcli.py
  23. +4 −0 common/__init__.py
  24. +123 −0 common/config.py
  25. +298 −0 common/input.py
  26. +121 −0 common/sequences.py
  27. +648 −0 discoverer/DFA.py
  28. +37 −0 discoverer/Globals.py
  29. +27 −0 discoverer/UnionFind.py
  30. +11 −0 discoverer/__init__.py
  31. +661 −0 discoverer/cluster.py
  32. +492 −0 discoverer/clustercollection.py
  33. +77 −0 discoverer/common.py
  34. +235 −0 discoverer/formatinference.py
  35. +87 −0 discoverer/formattree.py
  36. +321 −0 discoverer/message.py
  37. +127 −0 discoverer/needlewunsch.py
  38. +131 −0 discoverer/peekable.py
  39. +349 −0 discoverer/pystatistics.py
  40. +125 −0 discoverer/recursiveclustering.py
  41. +339 −0 discoverer/semanticinference.py
  42. +80 −0 discoverer/setup.py
  43. +140 −0 discoverer/splitter.py
  44. +2,071 −0 discoverer/statemachine.py
  45. +27 −0 discoverer/statistics.py
  46. +82 −0 discoverer/tests/mergesimilartests.py
  47. +23 −0 discoverer/tests/messagetests.py
  48. +62 −0 discoverer/tests/variabletests.py
  49. +16 −0 discoverer/tokenformat.py
  50. +50 −0 discoverer/tokenrepresentation.py
  51. BIN dns_requests_and_responses.dump
  52. BIN dns_requests_only.dump
  53. +1 −0 entropy/__init__.py
  54. +37 −0 entropy/entropy.py
  55. +11 −0 exampleconfig.yml
  56. +18 −0 log4py.properties
  57. +35 −0 log4py/__init__.py
  58. +206 −0 log4py/appenders.py
  59. +152 −0 log4py/config.py
  60. +91 −0 log4py/layouts.py
  61. +82 −105 main.py
  62. +2 −0 setup.py
19 changes: 0 additions & 19 deletions .hgignore

This file was deleted.

17 changes: 17 additions & 0 deletions .project
@@ -0,0 +1,17 @@
<?xml version="1.0" encoding="UTF-8"?>
<projectDescription>
<name>Protocol-Informatics</name>
<comment></comment>
<projects>
</projects>
<buildSpec>
<buildCommand>
<name>org.python.pydev.PyDevBuilder</name>
<arguments>
</arguments>
</buildCommand>
</buildSpec>
<natures>
<nature>org.python.pydev.pythonNature</nature>
</natures>
</projectDescription>
15 changes: 15 additions & 0 deletions .pydevproject
@@ -0,0 +1,15 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?eclipse-pydev version="1.0"?>

<pydev_project>
<pydev_property name="org.python.pydev.PYTHON_PROJECT_INTERPRETER">Default</pydev_property>
<pydev_property name="org.python.pydev.PYTHON_PROJECT_VERSION">python 2.7</pydev_property>
<pydev_pathproperty name="org.python.pydev.PROJECT_SOURCE_PATH">
<path>/Protocol-Informatics</path>
</pydev_pathproperty>
<pydev_pathproperty name="org.python.pydev.PROJECT_EXTERNAL_SOURCE_PATH">
<path>/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/PyYAML-3.10-py2.7-macosx-10.6-intel.egg</path>
<path>/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pcapy-0.10.5-py2.7-macosx-10.6-intel.egg</path>
<path>/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/log4py.pyc</path>
</pydev_pathproperty>
</pydev_project>
3 changes: 2 additions & 1 deletion Makefile
@@ -1,6 +1,7 @@
 all:
 	python setup.py build
-	find build -name align.so | xargs -J % cp % PI
+	#find build -name align.so | xargs -J % cp % PI
+	find build -name align.so -exec cp {} PI/ \;

 install:
 	python setup.py install
198 changes: 198 additions & 0 deletions PI/README
@@ -0,0 +1,198 @@
The Protocol Informatics Framework
Written by Marshall Beddoe <mbeddoe@baselineresearch.net>
Copyright (c) 2004 Baseline Research
----

Overview:

The Protocol Informatics project is a software framework that allows for
advanced sequence and protocol stream analysis by utilizing bioinformatics
algorithms. The sole purpose of this software is to identify protocol fields in
unknown or poorly documented network protocol formats. The algorithms that are
utilized perform comparative analysis on a series of samples to better
understand the underlying structure of the otherwise random-looking data. The
PI framework was designed for experimentation through the use of a widget-based
component set.

Requirements:

Python >= 2.4 http://www.python.org
numpy http://numpy.scipy.org/
Pyrex http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/
Pcapy http://oss.coresecurity.com/projects/pcapy.html
Pydot http://code.google.com/p/pydot/
PyYAML http://pyyaml.org/

These requirements are available using pip or easy_install:

$ pip install numpy pydot pcapy pyyaml
$ make

This software has been tested and works correctly under:
- OpenBSD
- FreeBSD
- Linux
- MacOSX


Example usage: Analyzing the ICMP protocol

ICMP is a simple fixed length protocol.
Let's use the PI framework to discover the format.

Step 1: Gather 100 ICMP packets using tcpdump

# tcpdump -s 42 -c 100 -nl -w icmp.dump icmp

Step 2: Run dump through PI prototype

# ./main.py -g -p ./icmp.dump

Protocol Informatics Prototype (v0.01 beta)
Written by Marshall Beddoe <mbeddoe@baselineresearch.net>
Copyright (c) 2004 Baseline Research

Found 100 unique sequences in '../dumps/icmp.out'
Creating distance matrix .. complete
Creating phylogenetic tree .. complete

Discovered 1 clusters using a weight of 1.00
Performing multiple alignment on cluster 1 .. complete

Output of cluster 1
0097 x08 x00 xad x4b x05 xbe x00 x60
0039 x08 x00 x30 x54 x05 xbe x00 x26
0026 x08 x00 xf7 xb2 x05 xbe x00 x19
0015 x08 x00 x01 xdb x05 xbe x00 x0e
0048 x08 x00 x4f xdf x05 xbe x00 x2f
0040 x08 x00 xf8 xa4 x05 xbe x00 x27
0077 x08 x00 xe8 x28 x05 xbe x00 x4c
0017 x08 x00 xe8 x6c x05 xbe x00 x10
0027 x08 x00 xc3 xa9 x05 xbe x00 x1a
0087 x08 x00 xdd xc1 x05 xbe x00 x56
0081 x08 x00 x88 x42 x05 xbe x00 x50
0058 x08 x00 xb0 x42 x05 xbe x00 x39
0013 x08 x00 x3e x38 x05 xbe x00
0067 x08 x00 x99 x36 x05 xbe x00 x42
0055 x08 x00 x0f x56 x05 xbe x00 x36
0004 x08 x00 xe6 xda x05 xbe x00 x03
0028 x08 x00 x83 xd9 x05 xbe x00 x1b
0095 x08 x00 xc1 xd9 x05 xbe x00 x5e
0075 x08 x00 x3a x63 x05 xbe x00 x4a
0053 x08 x00 x6d x2a x05 xbe x00 x34
0021 x08 x00 x6d x8d x05 xbe x00 x14
0088 x08 x00 xa8 x07 x05 xbe x00 x57
0005 x08 x00 xa8 x8a x05 xbe x00 x04
0080 x08 x00 xa8 x62 x05 xbe x00 x4f
0023 x08 x00 x3f x18 x05 xbe x00 x16
0002 x08 x00 x3f x65 x05 xbe x00 x01
0074 x08 x00 x3f xc2 x05 xbe x00 x49
0030 x08 x00 x3f x15 x05 xbe x00 x1d
0044 x08 x00 xcc xc2 x05 xbe x00 x2b
0078 x08 x00 xcc x8a x05 xbe x00 x4d
0071 x08 x00 xd8 x18 x05 xbe x00 x46
0035 x08 x00 x9a xfd x05 xbe x00 x22
0001 x08 x00 x69 xf9 x05 xbe x00 x00
0034 x08 x00 xc5 x9e x05 xbe x00 x21
0031 x08 x00 x38 x00 x05 xbe x00 x1e
0092 x08 x00 x38 x4c x05 xbe x00 x5b
0100 x08 x00 x2b x1a x05 xbe x00 x63
0049 x08 x00 x15 x1d x05 xbe x00 x30
0008 x08 x00 x2f x64 x05 xbe x00 x07
0089 x08 x00 x80 xe5 x05 xbe x00 x58
0096 x08 x00 xb2 xb0 x05 xbe x00 x5f
0079 x08 x00 xc2 xae x05 xbe x00 x4e
0057 x08 x00 xc2 x79 x05 xbe x00 x38
0046 x08 x00 x77 x7a x05 xbe x00 x2d
0018 x08 x00 xbb xce x05 xbe x00 x11
0025 x08 x00 xfe xaa x05 xbe x00 x18
0068 x08 x00 x50 xe3 x05 xbe x00 x43
0065 x08 x00 xe0 xb7 x05 xbe x00 x40
0011 x08 x00 x8d xd6 x05 xbe x00
0029 x08 x00 x7c xf3 x05 xbe x00 x1c
0033 x08 x00 xef xf3 x05 xbe x00
0069 x08 x00 x25 x6b x05 xbe x00 x44
0083 x08 x00 x25 xff x05 xbe x00 x52
0099 x08 x00 x56 x99 x05 xbe x00 x62
0061 x08 x00 x33 x81 x05 xbe x00 x3c
0050 x08 x00 xe9 xba x05 xbe x00 x31
0042 x08 x00 xb3 x49 x05 xbe x00 x29
0059 x08 x00 x81 x4e x05 xbe x00 x3a
0098 x08 x00 x81 xad x05 xbe x00 x61
0091 x08 x00 x42 xa0 x05 xbe x00 x5a
0054 x08 x00 x42 xd8 x05 xbe x00 x35
0037 x08 x00 x4c xe8 x05 xbe x00 x24
0041 x08 x00 xeb x4d x05 xbe x00 x28
0086 x08 x00 xe4 x53 x05 xbe x00 x55
0006 x08 x00 x71 x7b x05 xbe x00 x05
0012 x08 x00 x63 x7b x05 xbe x00
0070 x08 x00 xee x7d x05 xbe x00 x45
0051 x08 x00 xc8 x57 x05 xbe x00 x32
0066 x08 x00 xb4 x3c x05 xbe x00 x41
0014 x08 x00 x2c x26 x05 xbe x00
0062 x08 x00 x2c x7c x05 xbe x00 x3d
0016 x08 x00 xed x8e x05 xbe x00 x0f
0007 x08 x00 x47 x3d x05 xbe x00 x06
0073 x08 x00 x5e x72 x05 xbe x00 x48
0052 x08 x00 x9e x06 x05 xbe x00 x33
0072 x08 x00 x9e x9d x05 xbe x00 x47
0036 x08 x00 x6f x6e x05 xbe x00 x23
0060 x08 x00 x6c xc6 x05 xbe x00 x3b
0045 x08 x00 xa2 xf5 x05 xbe x00 x2c
0085 x08 x00 x00 x47 x05 xbe x00 x54
0076 x08 x00 x14 x85 x05 xbe x00 x4b
0020 x08 x00 xa0 x85 x05 xbe x00 x13
0019 x08 x00 xa6 x2c x05 xbe x00 x12
0003 x08 x00 x14 x2c x05 xbe x00 x02
0022 x08 x00 x44 x8c x05 xbe x00 x15
0082 x08 x00 x5d xe0 x05 xbe x00 x51
0009 x08 x00 xfc x41 x05 xbe x00 x08
0084 x08 x00 x35 x05 xbe x00 x53
0032 x08 x00 x0e x17 x05 xbe x00 x1f
0056 x08 x00 xe5 x05 xbe x00 x37
0043 x08 x00 xa1 xde x05 xbe x00 x2a
0094 x08 x00 x03 x92 x05 xbe x00 x5d
0047 x08 x00 x55 x83 x05 xbe x00 x2e
0090 x08 x00 x55 x94 x05 xbe x00 x59
0064 x08 x00 x8f x05 xbe x00 x3f
0093 x08 x00 xb6 x05 xbe x00 x5c
0010 x08 x00 xd1 xb6 x05 xbe x00
0024 x08 x00 x11 x8f x05 xbe x00 x17
0063 x08 x00 x11 x04 x05 xbe x00 x3e
0038 x08 x00 x37 x3b x05 xbe x00 x25
DT BBB ZZZ BBB BBB BBB BBB ZZZ AAA
MT 000 000 081 089 000 000 000 100

Ungapped Consensus:
CONS x08 x00 x3f x18 x05 xbe x00 ???
DT BBB ZZZ BBB BBB BBB BBB ZZZ AAA
MT 000 000 081 089 000 000 000 100

Step 3: Analyze Consensus Sequence

Pay attention to datatype composition and mutation rate.

Offset 0: Binary data, 0% mutation rate
Offset 1: Zeroed data, 0% mutation rate
Offset 2: Binary data, 81% mutation rate
Offset 3: Binary data, 89% mutation rate
Offset 4: Binary data, 0% mutation rate
Offset 5: Binary data, 0% mutation rate
Offset 6: Zeroed data, 0% mutation rate
Offset 7: ASCII data, 100% mutation rate
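
A rough sketch of how a per-offset mutation rate like the one listed above could be computed. It assumes the rate is the share of distinct byte values in a column, with a constant column counted as 0%; that definition is inferred from the sample output, not taken from the PI source. `mutation_rates` is a hypothetical helper and the rows are synthetic, written for Python 3.

```python
def mutation_rates(rows):
    """Return one percentage per column for equal-length byte rows."""
    rates = []
    for column in zip(*rows):
        distinct = len(set(column))
        # Assumption: a column holding a single value counts as 0% mutation.
        rate = 0 if distinct == 1 else round(100 * distinct / len(column))
        rates.append(rate)
    return rates


# Synthetic 4-byte messages: two constant bytes, one byte cycling through
# four values, and one byte that is unique in every message.
rows = [bytes([0x08, 0x00, seq % 4, seq]) for seq in range(100)]
rates = mutation_rates(rows)
print(rates)
```

Under this assumed definition, constant offsets report 0 and an offset that differs in every sample reports 100, which matches the pattern in the MT row.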

Using this information we can construct the structure of the format:

[ 1 byte ] [ 1 byte ] [ 2 byte ] [ 2 byte ] [ 1 byte ] [ 1 byte ]

The real format of an ICMP message:

[ 1 byte ] [ 1 byte ] [ 2 byte ] [ 2 byte ] [ 2 byte ]

PI misidentified the last field because the last field in an ICMP packet is a
16-bit sequence number. We gathered only 100 packets, so the most significant
byte never changed as the field incremented.

It is therefore important to gather enough varied samples, as PI is only as
good as the data that is fed to it.
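
The effect of the sample size on a 16-bit counter can be illustrated with a small Python 3 sketch. It packs consecutive counter values in network byte order and counts the distinct values at each byte offset; `distinct_bytes_per_offset` is a hypothetical helper, not part of PI.

```python
import struct


def distinct_bytes_per_offset(values):
    """Count distinct byte values at each offset of a big-endian 16-bit field."""
    packed = [struct.pack("!H", value) for value in values]
    return [len(set(column)) for column in zip(*packed)]


# With 100 samples the high byte is always 0x00, so it looks like a
# separate constant field.
small = distinct_bytes_per_offset(range(100))
# With 1000 samples the counter crosses 0x00ff and the high byte varies too.
large = distinct_bytes_per_offset(range(1000))
print(small, large)
```

With only 100 values the high byte never changes, so a column-wise analysis has no evidence that the two bytes belong together.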
9 changes: 8 additions & 1 deletion PI/__init__.py
@@ -1 +1,8 @@
-__all__ = [ "input", "distance", "phylogeny", "multialign", "output" ]
+import distance
+import phylogeny
+import multialign
+import output
+import core
+
+# vim: set sts=4 sw=4 cindent nowrap expandtab:
+
59 changes: 59 additions & 0 deletions PI/core.py
@@ -0,0 +1,59 @@
import distance, phylogeny, multialign, output

def pi_core(sequences, weight, graph, textBased):
    #
    # Create distance matrix (LocalAlignment, PairwiseIdentity, Entropic)
    #
    print "Creating distance matrix ..."
    dmx = distance.LocalAlignment(sequences)
    print "complete"

    #
    # Pass distance matrix to phylogenetic creation function
    #
    print "Creating phylogenetic tree ..."
    phylo = phylogeny.UPGMA(sequences, dmx, minval=weight)
    print ""

    #
    # Output some pretty graphs of each cluster
    #
    if graph:
        cnum = 1
        for cluster in phylo:
            out = "graph-%d" % cnum
            print "Creating %s .." % out,
            cluster.graph(out)
            print "complete"
            cnum += 1

    print "\nDiscovered %d clusters using a weight of %.02f" % \
        (len(phylo), weight)

    #
    # Perform progressive multiple alignment against clusters
    #
    i = 1
    alist = []
    for cluster in phylo:
        print "Performing multiple alignment on cluster %d .." % i,
        aligned = multialign.NeedlemanWunsch(cluster)
        print "complete"
        alist.append(aligned)
        i += 1
    print ""

    #
    # Display each cluster of aligned sequences
    #
    i = 1
    for seqs in alist:
        print "Output of cluster %d" % i
        if textBased:
            output.TextBased(seqs)
        else:
            output.Ansi(seqs)
        i += 1
        print ""


28 changes: 26 additions & 2 deletions PI/distance.py
@@ -1,3 +1,4 @@
+# vim: set sts=4 sw=4 cindent nowrap expandtab:

 """
 Distance module
@@ -15,6 +16,7 @@
 #

 import align, zlib
+import util
 from numpy import *

 __all__ = [ "Distance", "Entropic", "PairwiseIdentity", "LocalAlignment" ]
@@ -23,7 +25,19 @@ class Distance:

     """Implementation of classify base class"""

-    def __init__(self, sequences):
+    def __init__(self, flowBasedSequences):
+        # Note: messages may now be grouped by flow/connection identifiers
+        # Since we neither have any notion to distinguish different flow
+        # directions nor distinguish between different connections, we
+        # now need to merge this into a single sequences field
+        counter = 0
+        sequences = []
+        for i in flowBasedSequences:
+            flowInfo = flowBasedSequences[i]
+            for seq in flowInfo.sequences:
+                sequences.append((counter, seq.sequence))
+                counter += 1
+
         self.sequences = sequences
         self.N = len(sequences)

@@ -190,9 +204,14 @@ def _go(self):
         #
         # Compute similarity matrix of SW scores
         #
+        progress = 0
         for i in range(self.N):
             for j in range(self.N):

+                if progress % (self.N * self.N / 100) == 0:
+                    util.progress(100, float(progress) / (self.N * self.N) * 100)
+                progress += 1
+
                 if similar[i][j] >= 0:
                     continue
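
Review note: under Python 2 semantics, which the project settings in this diff target, `self.N * self.N / 100` is integer division and evaluates to 0 when fewer than 10 sequences are loaded, so the modulo in the progress check would raise ZeroDivisionError. A minimal sketch of a guarded update interval, written for Python 3; `progress_interval` and `update_points` are hypothetical names and not part of PI:

```python
def progress_interval(total, steps=100):
    """Number of iterations between progress updates, never less than 1."""
    # max(1, ...) keeps the modulo divisor non-zero for small totals.
    return max(1, total // steps)


def update_points(n):
    """Iteration indices at which an N x N loop would redraw the bar."""
    total = n * n
    interval = progress_interval(total)
    return [p for p in range(total) if p % interval == 0]


print(progress_interval(3 * 3))      # small input: update on every iteration
print(len(update_points(100)))       # large input: about 100 updates
```

Clamping the interval to at least 1 means small inputs redraw on every iteration instead of failing.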

@@ -204,6 +223,7 @@ def _go(self):

                 similar[i][j] = similar[j][i] = score

+        util.progress(100,100)
         #
         # Compute distance matrix of SW scores
         #
@@ -213,5 +233,9 @@ def _go(self):
                 if self.dmx[i][j] >= 0:
                     continue

-                self.dmx[i][j] = 1 - (similar[i][j] / similar[i][i])
+                #print similar[i][j], " ",similar[i][i]
+                if similar[i][i] != 0:
+                    self.dmx[i][j] = 1 - (similar[i][j] / similar[i][i])
+                else:
+                    self.dmx[i][j] = 1
                 self.dmx[j][i] = self.dmx[i][j]
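
The guarded normalization in the last hunk can be sketched in isolation as follows. This is an illustrative Python 3 rewrite on plain lists, not the module's code; `distance_matrix` is a hypothetical name, and the input matrix is made-up data standing in for the similarity scores.

```python
def distance_matrix(similar):
    """Turn a similarity matrix into distances, normalizing by self-similarity."""
    n = len(similar)
    dmx = [[-1.0] * n for _ in range(n)]  # -1 marks "not yet computed"
    for i in range(n):
        for j in range(n):
            if dmx[i][j] >= 0:
                continue
            if similar[i][i] != 0:
                dmx[i][j] = 1 - similar[i][j] / similar[i][i]
            else:
                # A zero self-similarity would divide by zero, so the pair
                # is treated as maximally distant instead.
                dmx[i][j] = 1.0
            dmx[j][i] = dmx[i][j]
    return dmx


similar = [
    [4.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],
    [0.0, 0.0, 0.0],  # e.g. a sequence that scores 0 against itself
]
dmx = distance_matrix(similar)
print(dmx)
```

One consequence of the fallback is that a sequence with zero self-similarity ends up at distance 1 from itself.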