Code review: 255980043: Added xlsx output module log2timeline#263
dc3-plaso authored and joachimmetz committed Dec 31, 2015
1 parent 5e2294e commit 7f03df3
Showing 11 changed files with 460 additions and 8 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -2,7 +2,7 @@ language: python
python:
- "2.7"
before_install:
- - if test `uname -s` = 'Linux'; then sudo add-apt-repository ppa:gift/dev -y && sudo apt-get update -q && sudo apt-get install ipython libbde-python libesedb-python libevt-python libevtx-python libewf-python libfwsi-python liblnk-python libmsiecf-python libolecf-python libqcow-python libregf-python libsigscan-python libsmdev-python libsmraw-python libtsk libvhdi-python libvmdk-python libvshadow-python python-artifacts python-bencode python-binplist python-construct python-coverage python-coveralls python-dateutil python-dfvfs python-docopt python-dpkt python-hachoir-core python-hachoir-metadata python-hachoir-parser python-mock python-pefile python-protobuf python-psutil python-pyparsing python-requests python-six python-yaml python-tz pytsk3; fi
+ - if test `uname -s` = 'Linux'; then sudo add-apt-repository ppa:gift/dev -y && sudo apt-get update -q && sudo apt-get install ipython libbde-python libesedb-python libevt-python libevtx-python libewf-python libfwsi-python liblnk-python libmsiecf-python libolecf-python libqcow-python libregf-python libsigscan-python libsmdev-python libsmraw-python libtsk libvhdi-python libvmdk-python libvshadow-python python-artifacts python-bencode python-binplist python-construct python-coverage python-coveralls python-dateutil python-dfvfs python-docopt python-dpkt python-hachoir-core python-hachoir-metadata python-hachoir-parser python-mock python-pefile python-protobuf python-psutil python-pyparsing python-requests python-six python-xlsxwriter python-yaml python-tz pytsk3; fi
- sudo pip install ipython --upgrade
script:
- ./run_tests.py
2 changes: 1 addition & 1 deletion config/dpkg/control
@@ -9,7 +9,7 @@ Homepage: https://github.com/log2timeline/plaso/
Package: python-plaso
Architecture: all
Depends: libprotobuf7 | libprotobuf8 | libprotobuf9, libyaml-0-2, libbde-python, libesedb-python, libevt-python, libevtx-python, libewf-python, libfwsi-python, liblnk-python, libmsiecf-python, libolecf-python, libqcow-python, libregf-python, libtsk, libsigscan-python, libsmdev-python, libsmraw-python, libvhdi-python, libvmdk-python, libvshadow-python, ipython, python-artifacts, python-bencode, python-binplist, python-construct, python-dateutil, python-dfvfs, python-dpkt, python-hachoir-core, python-hachoir-metadata, python-hachoir-parser, python-mock, python-pefile, python-protobuf, python-psutil, python-pyparsing, python-six, python-yaml, python-tz, pytsk3, ${shlibs:Depends}, ${misc:Depends}
- Recommends: elasticsearch, libesedb-tools, libbde-tools, libevt-tools, libevtx-tools, libewf-tools, liblnk-tools, libmsiecf-tools, libolecf-tools, libqcow-tools, libregf-tools, libsmdev-tools, libsmraw-tools, libvhdi-tools, libvmdk-tools, libvshadow-tools, libtsk-dev, python-requests, pyelasticsearch, sleuthkit
+ Recommends: elasticsearch, libesedb-tools, libbde-tools, libevt-tools, libevtx-tools, libewf-tools, liblnk-tools, libmsiecf-tools, libolecf-tools, libqcow-tools, libregf-tools, libsmdev-tools, libsmraw-tools, libvhdi-tools, libvmdk-tools, libvshadow-tools, libtsk-dev, pyelasticsearch, python-requests, python-xlsxwriter, sleuthkit
Description: Plaso Log2Timeline
Log2Timeline is a framework to create super timelines.
It is a framework to parse various files and collect time-based
26 changes: 26 additions & 0 deletions config/licenses/LICENSE.xlsxwriter
@@ -0,0 +1,26 @@
Copyright (c) 2013, John McNamara <[email protected]>
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The views and conclusions contained in the software and documentation are those
of the authors and should not be interpreted as representing official policies,
either expressed or implied, of the FreeBSD Project.
1 change: 1 addition & 0 deletions plaso/cli/helpers/__init__.py
@@ -10,3 +10,4 @@
from plaso.cli.helpers import viper_analysis
from plaso.cli.helpers import virustotal_analysis
from plaso.cli.helpers import windows_services_analysis
from plaso.cli.helpers import xlsx_output
61 changes: 61 additions & 0 deletions plaso/cli/helpers/xlsx_output.py
@@ -0,0 +1,61 @@
# -*- coding: utf-8 -*-
"""The arguments helper for the xlsx output module."""

from plaso.lib import errors
from plaso.cli.helpers import interface
from plaso.cli.helpers import manager
from plaso.output import xlsx


class XlsxOutputHelper(interface.ArgumentsHelper):
  """CLI arguments helper class for the XLSX output module."""

  NAME = u'xlsx'
  CATEGORY = u'output'
  DESCRIPTION = u'Argument helper for the XLSX output module.'

  _DEFAULT_TIMESTAMP_FORMAT = u'YYYY-MM-DD HH:MM:SS.000'

  @classmethod
  def AddArguments(cls, argument_group):
    """Add command line arguments the helper supports to an argument group.

    This function takes an argument parser or an argument group object and adds
    to it all the command line arguments this helper supports.

    Args:
      argument_group: the argparse group (instance of argparse._ArgumentGroup
                      or argparse.ArgumentParser).
    """
    # Use the class default here: argparse always sets the attribute, so the
    # getattr fallback in ParseOptions would not apply to an omitted flag.
    argument_group.add_argument(
        u'--timestamp_format', dest=u'timestamp_format', type=unicode,
        action=u'store', default=cls._DEFAULT_TIMESTAMP_FORMAT, help=(
            u'Set the timestamp format that will be used in the datetime '
            u'column of the XLSX spreadsheet.'))

  @classmethod
  def ParseOptions(cls, options, output_module):
    """Parses and validates options.

    Args:
      options: the parser option object (instance of argparse.Namespace).
      output_module: an output module (instance of OutputModule).

    Raises:
      BadConfigObject: when the output module object is of the wrong type.
      BadConfigOption: when a configuration parameter fails validation.
    """
    if not isinstance(output_module, xlsx.XlsxOutputModule):
      raise errors.BadConfigObject(
          u'Output module is not an instance of XlsxOutputModule')

    timestamp_format = getattr(
        options, u'timestamp_format', cls._DEFAULT_TIMESTAMP_FORMAT)
    output_module.SetTimestampFormat(timestamp_format)

    filename = getattr(options, u'write', None)
    if filename:
      output_module.SetFilename(filename)


manager.ArgumentHelperManager.RegisterHelper(XlsxOutputHelper)
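
How the `default` given to `add_argument` interacts with the fallback given to `getattr` in `ParseOptions` is easy to misread. The following is a minimal Python 3 sketch, not plaso code: only the flag name and the format string are taken from the helper above, while `build_parser` and `resolve_format` are hypothetical names used for illustration. It suggests that argparse always sets the attribute once the flag is registered, so the `getattr` fallback only applies when the option was never registered at all.

```python
import argparse

DEFAULT_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH:MM:SS.000'


def build_parser(argparse_default):
    """Builds a parser with the flag registered (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--timestamp_format', dest='timestamp_format', type=str,
        action='store', default=argparse_default)
    return parser


def resolve_format(options):
    """Mirrors the getattr lookup with a fallback value."""
    return getattr(options, 'timestamp_format', DEFAULT_TIMESTAMP_FORMAT)


# Flag registered with an empty default: the getattr fallback is not used.
empty_default = resolve_format(build_parser('').parse_args([]))

# Flag registered with the intended default: an omitted flag yields it.
real_default = resolve_format(
    build_parser(DEFAULT_TIMESTAMP_FORMAT).parse_args([]))

# Option never registered (bare namespace): only now the fallback applies.
unregistered = resolve_format(argparse.Namespace())

# An explicit value on the command line always wins.
explicit = resolve_format(
    build_parser('').parse_args(['--timestamp_format', 'YYYY-MM-DD']))
```

Under these assumptions, keeping the default in one place (the `add_argument` call) avoids the two values drifting apart.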
1 change: 1 addition & 0 deletions plaso/dependencies.py
@@ -55,6 +55,7 @@
(u'requests', u'__version__', u'2.2.1', None),
(u'six', u'__version__', u'1.1.0', None),
(u'sqlite3', u'sqlite_version', u'3.7.8', None),
(u'xlsxwriter', u'__version__', u'0.6.5', None),
(u'yaml', u'__version__', u'3.10', None)]

# The tuple values are:
32 changes: 28 additions & 4 deletions plaso/lib/utils.py
@@ -2,11 +2,18 @@
"""This file contains utility functions."""

import logging
import re

from plaso.lib import errors
from plaso.lib import lexer


# Illegal Unicode characters for XML.
ILLEGAL_XML_RE = re.compile(
    ur'[\x00-\x08\x0b-\x1f\x7f-\x84\x86-\x9f'
    ur'\ud800-\udfff\ufdd0-\ufddf\ufffe-\uffff]')


def IsText(bytes_in, encoding=None):
"""Examine the bytes in and determine if they are indicative of a text.
@@ -42,7 +49,7 @@ def IsText(bytes_in, encoding=None):
return is_ascii

# Is this already a unicode text?
- if type(bytes_in) == unicode:
+ if isinstance(bytes_in, unicode):
return True

# Check if this is UTF-8
@@ -77,7 +84,7 @@ def IsText(bytes_in, encoding=None):

def GetUnicodeString(string):
"""Converts the string to Unicode if necessary."""
- if type(string) != unicode:
+ if not isinstance(string, unicode):
return str(string).decode('utf8', 'ignore')
return string

@@ -134,10 +141,10 @@ def GetInodeValue(inode_raw):
Returns:
An integer inode value.
"""
- if type(inode_raw) in (int, long):
+ if isinstance(inode_raw, (int, long)):
return inode_raw

- if type(inode_raw) is float:
+ if isinstance(inode_raw, float):
return int(inode_raw)

try:
@@ -149,3 +156,20 @@ def GetInodeValue(inode_raw):
return int(inode_string)
except ValueError:
return -1


def RemoveIllegalXMLCharacters(string, replacement=u'\ufffd'):
  """Replaces illegal Unicode characters for XML.

  Args:
    string: A string in which to replace all illegal characters for XML.
    replacement: A replacement character to use in place of all
                 found illegal characters.

  Returns:
    A string where all illegal Unicode characters for XML have been replaced.
    If the input is not a string it will be returned unchanged.
  """
  if isinstance(string, basestring):
    return ILLEGAL_XML_RE.sub(replacement, string)
  return string
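
For readers who want to try the sanitizing step in isolation, here is a small Python 3 sketch of the same idea. It reuses the character ranges from `ILLEGAL_XML_RE` above; the snake_case function name is an assumption made for this example, and `str` stands in for the Python 2 `basestring` check.

```python
import re

# Same code point ranges as ILLEGAL_XML_RE in the diff above. The re module
# interprets the \x and \u escapes inside the raw string pattern.
ILLEGAL_XML_RE = re.compile(
    r'[\x00-\x08\x0b-\x1f\x7f-\x84\x86-\x9f'
    r'\ud800-\udfff\ufdd0-\ufddf\ufffe-\uffff]')


def remove_illegal_xml_characters(value, replacement='\ufffd'):
    """Replaces characters matched by ILLEGAL_XML_RE in a string.

    Non-string input is returned unchanged, as in the original helper.
    """
    if isinstance(value, str):
        return ILLEGAL_XML_RE.sub(replacement, value)
    return value


# A NUL byte is replaced with U+FFFD by default.
cleaned = remove_illegal_xml_characters('a\x00b')

# Tab and newline are outside the illegal ranges and are kept.
kept = remove_illegal_xml_characters('tab\there\n')

# Passing an empty replacement strips the characters instead.
stripped = remove_illegal_xml_characters('x\x0by', replacement='')
```

Replacing rather than deleting by default keeps string lengths stable and leaves a visible marker in the spreadsheet cell where a character was dropped.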

1 change: 1 addition & 0 deletions plaso/output/__init__.py
@@ -11,3 +11,4 @@
from plaso.output import sqlite_4n6time
from plaso.output import timesketch_out
from plaso.output import tln
from plaso.output import xlsx
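
The `xlsx` module imported here guards its `xlsxwriter` import and registers itself as disabled when the library is missing, so this unconditional import stays safe. A rough Python 3 sketch of that optional-dependency pattern follows; `optional_import`, `OutputRegistry`, and both output classes are hypothetical stand-ins, not plaso's manager API.

```python
import importlib


def optional_import(module_name):
    """Returns the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


class OutputRegistry(object):
    """Toy registry that tracks enabled and disabled output classes."""

    def __init__(self):
        self.enabled = {}
        self.disabled = {}

    def register(self, output_class, disabled=False):
        """Files the class under its NAME in the matching dictionary."""
        target = self.disabled if disabled else self.enabled
        target[output_class.NAME] = output_class


class SpreadsheetOutput(object):
    NAME = 'spreadsheet'


class PlainTextOutput(object):
    NAME = 'text'


# A module name that is assumed not to exist, to simulate a missing library.
missing_backend = optional_import('module_that_is_not_installed_xyz')
# A standard library module, to simulate an available backend.
json_backend = optional_import('json')

registry = OutputRegistry()
registry.register(SpreadsheetOutput, disabled=missing_backend is None)
registry.register(PlainTextOutput, disabled=json_backend is None)
```

Registering the class as disabled, instead of skipping registration, lets a tool still list the module and explain why it is unavailable.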
4 changes: 2 additions & 2 deletions plaso/output/dynamic.py
@@ -31,7 +31,7 @@ class DynamicOutputModule(interface.LinearOutputModule):
# a callback function that formats the field value.
# They should be documented here:
# http://plaso.kiddaland.net/usage/psort/output
- _FIELD_FORMAT_CALLBACKS = {
+ FIELD_FORMAT_CALLBACKS = {
u'date': u'_FormatDate',
u'datetime': u'_FormatDateTime',
u'description': u'_FormatMessage',
@@ -348,7 +348,7 @@ def WriteEventBody(self, event_object):
"""
row = []
for field in self._fields:
- callback_name = self._FIELD_FORMAT_CALLBACKS.get(field, None)
+ callback_name = self.FIELD_FORMAT_CALLBACKS.get(field, None)
callback_function = None
if callback_name:
callback_function = getattr(self, callback_name, None)
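
The lookup above resolves a field name to a method name and then to a bound method via `getattr`, which is what lets a subclass such as the XLSX module override a single formatter. A minimal Python 3 sketch of that dispatch, under the assumption that events are plain dictionaries; the class names, field names, and `format_row` are invented for this example.

```python
class DynamicFormatter(object):
    """Maps field names to formatter method names, resolved at call time."""

    FIELD_FORMAT_CALLBACKS = {
        'date': '_format_date',
        'message': '_format_message'}

    def _format_date(self, event):
        return event['timestamp'].split('T')[0]

    def _format_message(self, event):
        return event.get('message', '-')

    def format_row(self, event, fields):
        """Formats each field with its callback, or falls back to the raw value."""
        row = []
        for field in fields:
            callback_name = self.FIELD_FORMAT_CALLBACKS.get(field, None)
            callback = None
            if callback_name:
                # getattr on self picks up overrides defined in subclasses.
                callback = getattr(self, callback_name, None)

            if callback:
                row.append(callback(event))
            else:
                row.append(event.get(field, '-'))
        return row


class UpperCaseFormatter(DynamicFormatter):
    """Overrides one callback; the shared mapping is reused unchanged."""

    def _format_message(self, event):
        return event.get('message', '-').upper()


event = {
    'timestamp': '2015-12-31T12:00:00', 'message': 'login', 'user': 'alice'}
fields = ['date', 'message', 'user', 'host']

base_row = DynamicFormatter().format_row(event, fields)
sub_row = UpperCaseFormatter().format_row(event, fields)
```

Storing method names rather than function objects in the mapping is what makes the override take effect without redefining the dictionary.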
164 changes: 164 additions & 0 deletions plaso/output/xlsx.py
@@ -0,0 +1,164 @@
# -*- coding: utf-8 -*-
"""Output module for the Excel Spreadsheet (XLSX) output format."""

import datetime
import logging
import os

try:
  import xlsxwriter
except ImportError:
  xlsxwriter = None

from plaso.lib import timelib
from plaso.lib import utils
from plaso.output import dynamic
from plaso.output import manager


class XlsxOutputModule(dynamic.DynamicOutputModule):
  """Output module for the Excel Spreadsheet (XLSX) output format."""

  NAME = u'xlsx'
  DESCRIPTION = u'Excel Spreadsheet (XLSX) output'

  _MAX_COLUMN_WIDTH = 50
  _MIN_COLUMN_WIDTH = 6

  def __init__(self, output_mediator):
    """Initializes the output module object.

    Args:
      output_mediator: The output mediator object (instance of OutputMediator).
    """
    super(XlsxOutputModule, self).__init__(output_mediator)
    self._column_widths = {}
    self._current_row = 0
    self._filename = None
    self._sheet = None
    self._timestamp_format = None
    self._workbook = None

  def _FormatDateTime(self, event_object):
    """Formats the date to a datetime object without timezone information.

    Note: Timezone information must be removed due to lack of support
    by xlsxwriter and Excel.

    Args:
      event_object: the event object (instance of EventObject).

    Returns:
      A datetime object (instance of datetime.datetime) of the event object's
      timestamp or the Excel epoch (the null time according to Excel)
      on an OverflowError.
    """
    try:
      timestamp = timelib.Timestamp.CopyToDatetime(
          event_object.timestamp, timezone=self._output_mediator.timezone,
          raise_error=True)

      return timestamp.replace(tzinfo=None)

    except OverflowError as exception:
      logging.error((
          u'Unable to copy {0:d} into a human readable timestamp with error: '
          u'{1:s}. Event {2:d}:{3:d} triggered the exception.').format(
              event_object.timestamp, exception,
              getattr(event_object, u'store_number', u''),
              getattr(event_object, u'store_index', u'')))

      return datetime.datetime(1899, 12, 31)

  def Close(self):
    """Closes the output."""
    self._workbook.close()

  def Open(self):
    """Creates a new workbook."""
    if not self._filename:
      raise ValueError((
          u'Unable to create XLSX workbook. Output filename was not provided.'))

    if os.path.isfile(self._filename):
      raise IOError((
          u'Unable to use an already existing file for output '
          u'[{0:s}]').format(self._filename))

    self._workbook = xlsxwriter.Workbook(
        self._filename,
        {u'constant_memory': True, u'strings_to_urls': False,
         u'default_date_format': self._timestamp_format})
    self._sheet = self._workbook.add_worksheet(u'Sheet')
    self._current_row = 0

  def SetFilename(self, filename):
    """Sets the filename.

    Args:
      filename: the filename.
    """
    self._filename = filename

  def SetTimestampFormat(self, timestamp_format):
    """Set the timestamp format to use for the datetime column.

    Args:
      timestamp_format: A string that describes the way to format the datetime.
    """
    self._timestamp_format = timestamp_format

  def WriteEventBody(self, event_object):
    """Writes the body of an event object to the spreadsheet.

    Args:
      event_object: the event object (instance of EventObject).
    """
    for field in self._fields:
      callback_name = self.FIELD_FORMAT_CALLBACKS.get(field, None)
      callback_function = None
      if callback_name:
        callback_function = getattr(self, callback_name, None)

      if callback_function:
        value = callback_function(event_object)
      else:
        value = getattr(event_object, field, u'-')

      if not isinstance(value, (bool, int, long, float, datetime.datetime)):
        value = utils.GetUnicodeString(value)
        value = utils.RemoveIllegalXMLCharacters(value)

      # Auto adjust column width based on length of value.
      column_index = self._fields.index(field)
      self._column_widths.setdefault(column_index, 0)
      self._column_widths[column_index] = max(
          self._MIN_COLUMN_WIDTH,
          self._column_widths[column_index],
          min(self._MAX_COLUMN_WIDTH, len(utils.GetUnicodeString(value)) + 2))
      self._sheet.set_column(
          column_index, column_index, self._column_widths[column_index])

      if field in [u'datetime', u'timestamp']:
        self._sheet.write_datetime(
            self._current_row, column_index, value)
      else:
        self._sheet.write(self._current_row, column_index, value)

    self._current_row += 1

  def WriteHeader(self):
    """Writes the header to the spreadsheet."""
    self._column_widths = {}
    bold = self._workbook.add_format({u'bold': True})
    bold.set_align(u'center')
    for index, field in enumerate(self._fields):
      self._sheet.write(self._current_row, index, field, bold)
      self._column_widths[index] = len(field) + 2
    self._current_row += 1
    # The argument order is first_row, first_col, last_row, last_col.
    self._sheet.autofilter(0, 0, 0, len(self._fields) - 1)
    self._sheet.freeze_panes(1, 0)


manager.OutputManager.RegisterOutput(
    XlsxOutputModule, disabled=xlsxwriter is None)
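
The column width logic in `WriteEventBody` can be read as a clamp that only ever grows: each width is the largest value seen so far plus padding, bounded by the minimum and maximum constants. A minimal Python 3 sketch of that arithmetic, independent of xlsxwriter; `update_column_widths` is a hypothetical helper and `str()` stands in for `utils.GetUnicodeString`.

```python
MIN_COLUMN_WIDTH = 6
MAX_COLUMN_WIDTH = 50


def update_column_widths(widths, row):
    """Grows per-column widths to fit a row, clamped to the min and max."""
    for index, value in enumerate(row):
        # Two extra characters of padding, as in the module above.
        wanted = min(MAX_COLUMN_WIDTH, len(str(value)) + 2)
        widths[index] = max(MIN_COLUMN_WIDTH, widths.get(index, 0), wanted)
    return widths


widths = {}

# Short values are raised to the minimum and long ones capped at the maximum.
first = dict(update_column_widths(widths, ['id', 'a' * 20, 'x' * 200]))

# A later row can widen a column but never shrinks one.
second = dict(update_column_widths(widths, ['1234567890', 'b', 'y']))
```

Because the module writes with `constant_memory` enabled, rows are flushed as they are written, which is presumably why widths are tracked incrementally instead of being computed from the finished sheet.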
