Releases · michelcrypt4d4mus/pdfalyzer

1.10.1

09 Oct 05:27

michelcrypt4d4mus

Use rich_argparse_plus

1.10.0

07 Oct 22:28

michelcrypt4d4mus

--streams arg now takes an optional PDF object ID
--fonts no longer takes an optional PDF object ID
YARA matches will display more than 512 bytes
Improved output formatting

1.9.0

07 Oct 04:02

michelcrypt4d4mus

Scan all binary streams, not just fonts. A separate --streams option is provided (the --font option now produces much less output).
Display MD5, SHA1, and SHA256 hashes for all binary streams as well as for the overall file
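
For reference, all three digests can be produced with Python's standard hashlib module. This is a minimal sketch, not pdfalyzer's own code; the `stream_digests()` helper is hypothetical and simply hashes whatever bytes you hand it (one binary stream, or the whole file).

```python
import hashlib


def stream_digests(data: bytes) -> dict:
    """Return MD5, SHA1, and SHA256 hex digests of a byte string.

    Hypothetical helper: pass it one binary stream's bytes or the
    entire PDF file's bytes.
    """
    return {
        "MD5": hashlib.md5(data).hexdigest(),
        "SHA1": hashlib.sha1(data).hexdigest(),
        "SHA256": hashlib.sha256(data).hexdigest(),
    }


if __name__ == "__main__":
    for name, digest in stream_digests(b"abc").items():
        print(f"{name:>6}: {digest}")
```

To hash an entire file the same way, something like `stream_digests(pathlib.Path("some.pdf").read_bytes())` would work (the path is illustrative).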

1.8.3

06 Oct 22:33

michelcrypt4d4mus

Highlight suspicious instructions in red, BOMs in green
Re-enable guillemet quote matching
Clearer labeling of binary scan results
Sync with yaralyzer v0.4.0

1.8.2

Sync with yaralyzer v0.3.3

1.8.1

Show defaults and valid values for command line options

1.8.0

Add a table of stream lengths for PDF objects containing streams to the --doc-info output
Quote extraction API methods now use YARA rather than bespoke extraction
Fix bug with the rich tree view of non-binary streams

1.7.0

Use yaralyzer as the match engine
Scan all binary streams, not just the fonts

1.6.0

03 Oct 02:23

michelcrypt4d4mus

Integrate YARA scanning using all the PDF-related rules I could dig up
Add MD5, SHA1, SHA256 to document info section
pdfalyzer_show_color_theme script displays the color theme
Make README more PyPI-friendly

1.5.0

30 Sep 07:21

michelcrypt4d4mus

A bunch of small changes to support releasing on PyPI:

Invoke with the shell command pdfalyze instead of the local Python file ./pdfalyzer.py (options are the same)
Core class renames: PdfWalker -> Pdfalyzer, DataStreamHandler -> BinaryScanner
Permanent env var configuration moved from a file called .env to a file called .pdfalyzer
Logging to a file is off unless enabled via an env var
To use Didier Stevens's pdf-parser.py you must provide the PDFALYZER_PDF_PARSER_PY_PATH env var

1.4.0

29 Sep 09:33

michelcrypt4d4mus

Hexadecimal representation of matched bytes in decode attempts table
--quote-type option to limit binary scans
--min-decode-length option to skip decode attempts on short matches
--file-suffix option
Output filenames will contain some of the options used to generate them
Add runtime params to export filenames where they are material to the output
Ensure /OpenAction etc. are not subsumed by parent/child relationships in the condensed tree view
Tweak available configuration options for logging to file
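
The hex column and the --min-decode-length cutoff amount to two simple steps: skip matches shorter than the threshold, then render the remaining bytes as hex next to a decode attempt. A minimal sketch under those assumptions follows; `decode_attempt_rows()` and its parameters are hypothetical and do not reproduce pdfalyzer's actual table code.

```python
def decode_attempt_rows(matches, min_decode_length, encoding="utf-8"):
    """Build (hex, decoded text) rows for matches at least min_decode_length bytes long.

    Hypothetical helper illustrating a minimum-length cutoff plus a hex column.
    """
    rows = []
    for match in matches:
        if len(match) < min_decode_length:
            continue  # too short to be worth a decode attempt
        hex_repr = match.hex(" ")  # space-separated hex, e.g. "68 65"
        decoded = match.decode(encoding, errors="replace")  # undecodable bytes become U+FFFD
        rows.append((hex_repr, decoded))
    return rows


if __name__ == "__main__":
    for hex_repr, text in decode_attempt_rows([b"hi", b"hello"], min_decode_length=3):
        print(f"{hex_repr:<20} {text}")
```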

v.1.3.1

28 Sep 01:02

michelcrypt4d4mus

Small bugfix in verification step

Improved binary scanning, summary stats, config options, more

26 Sep 09:27

michelcrypt4d4mus

1.3.0

General

Improved scanning of binaries for UTF-X encoded data where X is not a prime number.
Lots of summary data is now displayed about which encodings were most and least successful at extracting some meaning (or at least not failing) from binary sequences surrounded by quote chars, front slashes, backticks, etc.
Will execute "by the book" decodes using normally untested encodings if chardet.detect() is confident enough about it.
Exporting SVGs, HTML, and colored text can be done in a single invocation.
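
One way to produce "most and least successful encoding" summaries is to attempt every encoding on every quoted byte sequence and count the decodes that do not raise. This is only a sketch of that idea, assuming success means "no UnicodeDecodeError"; the encoding list and `tally_decodes()` are hypothetical, and pdfalyzer's real summary tables may measure success differently.

```python
from collections import Counter

# Hypothetical set of encodings to try, spelled as Python codec names.
ENCODINGS = ("ascii", "utf-8", "utf-16", "cp1252")


def tally_decodes(sequences, encodings=ENCODINGS):
    """Count, per encoding, how many byte sequences decode without error."""
    successes = Counter({enc: 0 for enc in encodings})
    for seq in sequences:
        for enc in encodings:
            try:
                seq.decode(enc)
            except UnicodeDecodeError:
                continue  # this encoding failed on this sequence
            successes[enc] += 1
    return successes


if __name__ == "__main__":
    tally = tally_decodes([b"plain", b"\xe9t\xe9", b"\x81\x81"])
    ranked = tally.most_common()
    print("most successful:", ranked[0])
    print("least successful:", ranked[-1])
```

"Not failing" is a weak signal: cp1252 decodes almost any byte, so a summary like this ranks permissiveness as much as correctness.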

Logging

Invocations of the tool are now logged in a history file log/pdfalyzer.invocation.log
Logging to a file can be enabled by setting a PDFALYZER_LOG_DIR environment variable but see comments in .env.example about side effects.

Command line options

--maximize-width arg means you can set your monitor to teeny tiny fonts and print out absolutely monstrous SVGs (yay!)
--chardet-cutoff option lets you control the cutoff for adding untested encodings to the output based on what chardet.detect() thinks is the right encoding
--suppress-chardet command line option removes the chardet tables that are (mostly) duplicative of the decoded text tables
--output-dir and --file-prefix are now shared by all the export modes
You can use dotenv to permanently turn some command line options on or off or change their values; see .env.example for details on what is configurable.

Visualizations

Default TerminalTheme colors kind of sucked when you went to export SVGs and HTML... like black was not black, or even close. Things are simpler now - black is black, blue is blue, etc. Makes exports look better.

Bugfixes

Binary data highlighting now goes all the way to the end of the matched string in most cases (small bug had it falling 1-4 chars behind sometimes)
Fix small bug with exporting font/binary details to SVGs
Fix `Win-
BytesMatch class to keep track of binary regex matches
Group suppression notifications together

v1.2.0 Large expansion in binary data scouring, visualizing, etc

22 Sep 00:02

michelcrypt4d4mus

1.2.0

Dramatic expansion in the pdfalyzer's binary data scouring capabilities:
- Add chardet library guesses as to the encoding of all unknown byte sequences, ranked from most to least likely
- Add attempted decodes of all backtick, front slash, single, double, and guillemet quoted strings in font binaries
- Add decode attempts with Windows-1252, UTF-7, and UTF-16 encodings
- Add --suppress-decodes to suppress attempted decodes of quoted strings in font binaries
- Cool art gets generated when you swarm a binary's quoted strings, which are mostly but not totally random
The --font option takes an optional argument to limit the output to a single font ID
Add --limit-decodes to suppress attempted decodes of quoted strings in font binaries over a certain length
Add --surrounding option to specify number of bytes to print/decode before and after suspicious bytes; decrease default number of surrounding bytes
Add --version option
extract_guillemet_quoted_bytes() and extract_backtick_quoted_bytes() are now iterators
Fix scanning for UTF-16 BOM in font binary
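
Since the extraction helpers became iterators, quoted byte sequences can be consumed lazily. The sketch below illustrates that pattern for backtick-quoted bytes, followed by decode attempts with Windows-1252, UTF-7, and UTF-16 as listed above. It is an illustration under stated assumptions, not pdfalyzer's code: `iter_backtick_quoted()`, `attempt_decodes()`, and the regex are hypothetical, and the real extract_backtick_quoted_bytes() signature may differ.

```python
import re
from typing import Dict, Iterator, Optional

# Hypothetical pattern: a backtick, any run of non-backtick bytes, a closing backtick.
BACKTICK_QUOTED = re.compile(rb"`([^`]*)`")

# Encodings named in the 1.2.0 notes, spelled as Python codec names.
DECODE_ENCODINGS = ("windows-1252", "utf-7", "utf-16")


def iter_backtick_quoted(data: bytes) -> Iterator[bytes]:
    """Lazily yield the bytes between each pair of backticks."""
    for match in BACKTICK_QUOTED.finditer(data):
        yield match.group(1)


def attempt_decodes(quoted: bytes) -> Dict[str, Optional[str]]:
    """Try each encoding; map it to the decoded text, or None if decoding fails."""
    results: Dict[str, Optional[str]] = {}
    for encoding in DECODE_ENCODINGS:
        try:
            results[encoding] = quoted.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
    return results


if __name__ == "__main__":
    font_bytes = b"\x00\x01`abcd`\x7f`\x81x`"
    for quoted in iter_backtick_quoted(font_bytes):
        print(quoted, attempt_decodes(quoted))
```

A generator lets a caller stop after the first few matches in a large font binary without scanning and storing every quoted string up front.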
