Releases: michelcrypt4d4mus/pdfalyzer

1.10.1

09 Oct 05:27

Use rich_argparse_plus

1.10.0

07 Oct 22:28
  • --streams arg now takes an optional PDF object ID
  • --fonts no longer takes an optional PDF object ID
  • YARA matches will display more than 512 bytes
  • Improved output formatting

1.9.0

07 Oct 04:02
  • Scan all binary streams, not just fonts. A separate --streams option is provided (the --font option now produces much less output).
  • Display MD5, SHA1, and SHA256 for all binary streams as well as overall file
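
As a rough illustration of the digests listed above, here is a minimal sketch using only Python's standard hashlib. The helper name digest_bytes is hypothetical and is not part of pdfalyzer's API; how pdfalyzer itself computes or displays the hashes is not shown in these notes.

```python
import hashlib
from typing import Dict


def digest_bytes(data: bytes) -> Dict[str, str]:
    """Return hex digests of `data` for MD5, SHA1, and SHA256.

    Hypothetical helper for illustration only; not pdfalyzer's own code.
    """
    return {
        name: hashlib.new(name, data).hexdigest()
        for name in ("md5", "sha1", "sha256")
    }


if __name__ == "__main__":
    # The same function would apply to a whole file's bytes or to one stream's bytes.
    for name, digest in digest_bytes(b"example stream bytes").items():
        print(f"{name}: {digest}")
```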

1.8.3

06 Oct 22:33

1.8.3

  • Highlight suspicious instructions in red, BOMs in green
  • Reenable guillemet quote matching
  • Clearer labeling of binary scan results
  • Sync with yaralyzer v0.4.0

1.8.2

  • Sync with yaralyzer v0.3.3

1.8.1

  • Show defaults and valid values for command line options

1.8.0

  • Add table of stream lengths for PDF objects containing streams to --doc-info output
  • Quote extraction API methods should use yara, not bespoke extraction
  • Fix bug with rich tree view of non-binary streams

1.7.0

  • Use yaralyzer as the match engine
  • Scan all binary streams, not just the fonts

1.6.0

03 Oct 02:23
  • Integrate YARA scanning - all the rules I could dig up relating to PDFs
  • Add MD5, SHA1, SHA256 to document info section
  • pdfalyzer_show_color_theme script shows the theme
  • Make README more PyPI friendly

1.5.0

30 Sep 07:21

A bunch of small changes to support releasing on PyPI

  • Invoke with shell command pdfalyze instead of local python file ./pdfalyzer.py (options are the same)
  • Core class renames: PdfWalker -> Pdfalyzer, DataStreamHandler -> BinaryScanner
  • Permanent env var configuration moved from a file called .env to a file called .pdfalyzer
  • Logging to a file is off unless configured by env var
  • To use Didier Stevens's pdf-parser.py you must provide the PDFALYZER_PDF_PARSER_PY_PATH env var

1.4.0

29 Sep 09:33
  • Hexadecimal representation of matched bytes in decode attempts table
  • --quote-type option to limit binary scans
  • --min-decode-length option to skip decode attempts on short matches
  • --file-suffix option
  • Output filenames will contain some of the options used to generate them
  • Add runtime params to export filenames where it is material to the output
  • Ensure /OpenAction etc are not subsumed by parent/child relationships in the condensed tree view
  • Tweak available configuration options for logging to file

v.1.3.1

28 Sep 01:02

Small bugfix in verification step

Improved binary scanning, summary stats, config options, more

26 Sep 09:27

1.3.0

General

  • Improved scanning of binaries for UTF-X encoded data where X is not a prime number.
  • Lots of summary data is now displayed about which encodings were the most and least successful at extracting some meaning (or at least not failing) from binary sequences surrounded by quote chars, front slashes, backticks, etc.
  • Will execute "by the book" decodes using normally untested encodings if chardet.detect() feels strongly enough about it.
  • Exporting SVGs, HTML, and colored text can be done in a single invocation.

Logging

  • Invocations of the tool are now logged in a history file log/pdfalyzer.invocation.log
  • Logging to a file can be enabled by setting a PDFALYZER_LOG_DIR environment variable but see comments in .env.example about side effects.
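
The opt-in file logging described above could look roughly like the sketch below. Only the PDFALYZER_LOG_DIR variable name comes from these notes; the function name build_logger, the logger name, and the sketch.log file name are assumptions made for illustration, and pdfalyzer's real setup (including the side effects mentioned in .env.example) may differ.

```python
import logging
import os
from pathlib import Path

LOG_DIR_ENV_VAR = "PDFALYZER_LOG_DIR"  # variable name taken from the release note


def build_logger(name: str = "pdfalyzer_sketch") -> logging.Logger:
    """Log to a file only when the env var is set; otherwise log to stderr.

    Hypothetical helper; the logger name and file name are illustrative.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid stacking handlers on repeated calls

    log_dir = os.environ.get(LOG_DIR_ENV_VAR)
    if log_dir:
        Path(log_dir).mkdir(parents=True, exist_ok=True)
        handler: logging.Handler = logging.FileHandler(Path(log_dir) / "sketch.log")
    else:
        handler = logging.StreamHandler()

    logger.addHandler(handler)
    return logger
```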

Command line options

  • --maximize-width arg means you can set your monitor to teeny tiny fonts and print out absolutely monstrous SVGs (yay!)
  • --chardet-cutoff option lets you control the cutoff for adding untested encodings to the output based on what chardet.detect() thinks is the right encoding
  • --suppress-chardet command line option removes the chardet tables that are (mostly) duplicative of the decoded text tables
  • --output-dir and --file-prefix are now shared by all the export modes
  • You can use dotenv to permanently turn on or off or change the value of some command line options; see .env.example for details on what is configurable.

Visualizations

  • Default TerminalTheme colors kind of sucked when you went to export SVGs and HTML... like black was not black, or even close. Things are simpler now - black is black, blue is blue, etc. Makes exports look better.

Bugfixes

  • Binary data highlighting now goes all the way to the end of the matched string in most cases (small bug had it falling 1-4 chars behind sometimes)
  • Fix small bug with exporting font/binary details to SVGs
  • Fix `Win-
  • BytesMatch class to keep track of binary regex matches
  • Group suppression notifications together

v1.2.0 Large expansion in binary data scouring, visualizing, etc

22 Sep 00:02

1.2.0

  • Dramatic expansion in the pdfalyzer's binary data scouring capabilities:
    • Add chardet library guesses as to the encoding of all unknown byte sequences and rank them from most to least likely
    • Add attempted decodes of all backtick, frontslash, single, double, and guillemet quoted strings in font binaries
    • Add decode attempts with Windows-1252, UTF-7, and UTF-16 encodings
    • Add --suppress-decodes to suppress attempted decodes of quoted strings in font binaries
    • Cool art gets generated when you swarm a binary's quoted strings, which are mostly but not totally random
  • The --font option takes an optional argument to limit the output to a single font ID
  • Add --limit-decodes to suppress attempted decodes of quoted strings in font binaries over a certain length
  • Add --surrounding option to specify number of bytes to print/decode before and after suspicious bytes; decrease default number of surrounding bytes
  • Add --version option
  • extract_guillemet_quoted_bytes() and extract_backtick_quoted_bytes() are now iterators
  • Fix scanning for UTF-16 BOM in font binary
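
To make the quoted-string extraction and decode attempts above more concrete, here is a minimal standard-library sketch. It assumes guillemets appear as the single bytes 0xAB and 0xBB (their Windows-1252 / Latin-1 encoding), and the names iter_guillemet_quoted and attempt_decodes are hypothetical stand-ins, not the signatures of pdfalyzer's extract_guillemet_quoted_bytes() or its decoding code.

```python
import re
from typing import Dict, Iterator, Optional

# Assumption: « and » are stored as single bytes 0xAB / 0xBB (Windows-1252 / Latin-1).
GUILLEMET_PATTERN = re.compile(rb"\xab([^\xab\xbb]+)\xbb")

# The three encodings named in the 1.2.0 notes, using Python's codec names.
ENCODINGS = ("windows-1252", "utf-7", "utf-16")


def iter_guillemet_quoted(data: bytes) -> Iterator[bytes]:
    """Lazily yield the bytes found between each «...» pair in `data`."""
    for match in GUILLEMET_PATTERN.finditer(data):
        yield match.group(1)


def attempt_decodes(chunk: bytes) -> Dict[str, Optional[str]]:
    """Try each encoding; map encoding name to decoded text, or None on failure."""
    results: Dict[str, Optional[str]] = {}
    for encoding in ENCODINGS:
        try:
            results[encoding] = chunk.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
    return results


if __name__ == "__main__":
    sample = b"\x00\x01\xabhello\xbb\x02\xabhi\xbb"
    for quoted in iter_guillemet_quoted(sample):
        print(quoted, attempt_decodes(quoted))
```

Yielding matches one at a time, rather than building a list, keeps memory use flat on large binaries, which is one plausible reason for turning the extraction methods into iterators.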