This repository has been archived by the owner on Apr 15, 2024. It is now read-only.

To Get Colors of Characters #197

Open
wants to merge 127 commits into
base: master
faea729
tests pass under Py 2.7 and 3.4
Sep 1, 2014
846cd18
Python 3.4 support
Sep 2, 2014
a6475b6
Python 3.4 support added and tested
Sep 3, 2014
29c07ea
Python 3.4 support and tests
Sep 3, 2014
4ab48d1
Python 3.4 compatibility + tests
Sep 4, 2014
7b610b3
tools must be a module to enable scripts tests
Sep 4, 2014
1c93468
faster, less verbose tests
Sep 4, 2014
58b8492
no logging in travis.ci
Sep 4, 2014
28c2a4e
2.7/3.4 encoding corrected
Sep 4, 2014
0a2d90c
pdf2txt: do not double encode stdout
Cybjit Sep 7, 2014
f9a67db
change xrange to range
Cybjit Sep 7, 2014
cc733c8
fixes for ARC4
Cybjit Sep 7, 2014
a6f31a7
cmap bytes and decode
Cybjit Sep 7, 2014
7b620b3
Merge pull request #1 from Cybjit/master
goulu Sep 9, 2014
9b0a3ee
decode cmap font name
Cybjit Sep 11, 2014
6357e2d
code2cid uses int, not byte
Cybjit Sep 11, 2014
cba5a42
decipher_all bytes
Cybjit Sep 11, 2014
ed13f7c
conv_cmap py3 compat
Cybjit Sep 11, 2014
31e6afc
faster and simpler bytes implementation
Cybjit Sep 11, 2014
01821c7
rename bytes to avoid built-in collision
Cybjit Sep 11, 2014
39942b6
avoid string formating when not logging
Cybjit Sep 11, 2014
7144238
setup logging for pdf2txt and fix dumppdf
Cybjit Sep 11, 2014
4f8aa9f
Merge pull request #2 from Cybjit/master
goulu Sep 12, 2014
8861d7e
version 20140915 pushed to PyPi as pdfminer_six
Sep 15, 2014
03de0f4
forgot 'six' requirement ...
Sep 15, 2014
f577f76
renamed as pdfminer.six in PyPi
Sep 15, 2014
2ee7153
add python3 in sample Makefile
Cybjit Sep 16, 2014
51a361c
clean up HTMLConverter and XMLConverter encoding
Cybjit Sep 16, 2014
2260f77
fix dict_value usage in strict mode
Cybjit Sep 16, 2014
1458598
keep password api unicode, latin1 or utf-8 is encoded in handler
Cybjit Sep 16, 2014
ad05121
password py3
Cybjit Sep 16, 2014
9b2e293
apply_png_predictor py3
Cybjit Sep 16, 2014
2639b15
guess argv encoding in py2 using sys.stdin.encoding
Cybjit Sep 16, 2014
515687e
more xrange to range
Cybjit Sep 16, 2014
0e40264
Merge pull request #3 from Cybjit/master
goulu Sep 17, 2014
d0379a2
Fix utils.decode_text
enkore Dec 4, 2014
448aa08
Merge pull request #4 from enkore/master
goulu Dec 5, 2014
1b47bed
Many changes to make pdf2txt.py work better in Py3, some in that scri…
May 17, 2015
08cb217
Progress, progress.. not nearly atomic enough, sorry.
May 30, 2015
ead8e77
Successfully compartmentalised code, getting closer to moving pdf->te…
May 30, 2015
cbe270a
Killed the old main function for pdf2txt.py
May 30, 2015
b3553ce
Cleaning up pdf2txt.py after the partition/move.
May 30, 2015
3b7edba
Forgot to add the actual compartmentalised function..
May 30, 2015
268e9fb
Removed typechecking, nothing's exploded yet and argparse does lots o…
May 30, 2015
79c97ac
Docstrings.
May 30, 2015
a2ad7a6
Fixed some bugs preventing all tests from passing in Py2.
May 30, 2015
403711e
Whoops, forgot to version-gate chardet in the actual code. Thanks Tra…
May 30, 2015
e2d3adc
Adding chardet to Travis
May 30, 2015
30e14dd
Merge pull request #5 from cathalgarvey/master
goulu Jun 1, 2015
623bd98
Update __init__.py
goulu Jun 1, 2015
131cb1e
change STRICT to be a settings attribute
Jun 22, 2015
bc8d631
Merge pull request #6 from GreenLightGo/hotfix/strict-setting
goulu Jul 21, 2015
e143ad7
Ensure to install required libraries on installation
orangain Aug 6, 2015
a46ea52
Merge pull request #7 from orangain/install_requires
goulu Aug 11, 2015
b686dd0
pdfminer/settings.py for STRICT and added ENFORCE_CHECK_EXTRACTABLE
Nov 1, 2015
2e1be57
removed settings.ENFORCE_CHECK_EXTRACTABLE
Nov 1, 2015
146abb4
Updated setup.py to work with Python 2.6
Nov 8, 2015
a9a026b
Merge remote-tracking branch 'origin/patch-1'
Dec 5, 2015
8149be1
bugfixes
Dec 5, 2015
72b2bc3
Merge pull request #11 from metachris/pdfminerX
goulu Dec 6, 2015
f1d5d68
Include compiled cmap resources to simplify installation for CJK lang…
orangain Dec 27, 2015
f8a051a
Close device to write footer of xml/html files
orangain Dec 27, 2015
92c7143
Improved settings management
stevenhair Jan 10, 2016
4f762cb
Merge pull request #16 from stevenhair/settings-management
goulu Jan 18, 2016
2103e58
Merge pull request #13 from orangain/include-cmap
goulu Jan 18, 2016
5a23fad
Merge pull request #14 from orangain/close-device
goulu Jan 18, 2016
5a2e342
Add .gitattributes to always checkout *.py files with LF line endings
orangain Jan 25, 2016
5f888fe
Merge pull request #17 from orangain/ensure-lf
goulu Feb 2, 2016
21fd2bb
v 20160202 with Py 2.6 & Py 3.5 support
Feb 2, 2016
2c8f226
Fix issues #20 - NameError: global name 'ImageWriter' is not defined
ivanteoh Apr 26, 2016
e121f7e
Merge pull request #21 from ivanteoh/master
goulu May 1, 2016
1d54ecd
Make the logger run in a namespace.
pudo May 20, 2016
0d38aa1
Merge pull request #22 from pudo/log-into-namespace
goulu Jun 9, 2016
881ea17
v 20160614
goulu Jun 14, 2016
10815bf
Fixed tests.
Daniel-KM Jun 26, 2016
19155d3
remove lf rule
pudo Sep 23, 2016
1820f96
backport changes for upstream: #145, #95, #111, #117, #129, #132.
pudo Sep 23, 2016
0cb1398
Backport LICENSE.
pudo Sep 23, 2016
865246b
fix print, upstream: 01121124587d99601cf3368e9f82f096a9e5a98f
pudo Sep 23, 2016
a7f9623
Merge pull request #25 from Daniel-KM/fix_tests
goulu Sep 23, 2016
7091809
Return an empty list when no `Differences` are found.
pudo Sep 24, 2016
447adcf
fix STRICT reference
pudo Sep 24, 2016
bc78fd2
Merge pull request #33 from pudo/backports
goulu Oct 31, 2016
0fdebc6
Removing all the "#!/usr/bin/env python" lines, they do not need for …
eracle Nov 8, 2016
e6ad15a
Added painting information (#37)
begnini Nov 8, 2016
6cc4abb
Fix import of Django settings (#41)
Crocmagnon Nov 26, 2016
61d423d
bugfix: if fontname is bytes then skip (#43)
TaeGuNi Dec 14, 2016
52feb22
Merge remote-tracking branch 'origin/master'
Jan 19, 2017
7c96fe2
links updated to new https://github.com/pdfminer owner
Jan 19, 2017
f094f0b
v. 20170119 RC
Jan 19, 2017
9b9d69a
image export works again with Py3 (issue #15)
Jan 20, 2017
fd63dbf
no more skipped tests
Jan 20, 2017
1e5db2b
some keywords can't be decoded
Jan 20, 2017
9439a3a
Miscellaneous bug fixes (#47)
0xabu Feb 6, 2017
3427dca
Merge branch 'master' of https://github.com/pdfminer/pdfminer.six.git
Feb 6, 2017
f2b0650
Fixes #54 -- don't pass bytestrings through ord() (#55)
sergei-maertens Apr 18, 2017
7055862
solves https://github.com/pdfminer/pdfminer.six/issues/50
goulu Apr 18, 2017
5ef8333
new test fails on Linux & TRavis-CI. TODO: find why
goulu Apr 18, 2017
11a4c8b
version 20170418
goulu Apr 18, 2017
f28ce1e
Merge branch 'master' of https://github.com/pdfminer/pdfminer.six.git
Apr 19, 2017
cd92883
logging (stupid bug)
Apr 19, 2017
82af7f0
issue #56 reproduced, solution attempt unsucessful
Apr 19, 2017
baddb25
v 20170419 (patches a stupid bug from yesterday...)
Apr 19, 2017
4e59fb6
Convert Windows/DOS line endings CR/LF to Unix LF (#58)
bittner May 29, 2017
4bc0a0c
Update pdftypes.py (#61)
Levantado May 29, 2017
fe21725
Please replace pycrypto with pycryptodome (#63)
mpasternak May 29, 2017
488545d
Add string expressions to asserts showing local data (#67)
hughsw May 29, 2017
35a58ee
Add tools/pdfstats.py which counts all LT* types in a PDF (#68)
hughsw May 29, 2017
87726d8
No, thank you. (#69)
mpasternak May 30, 2017
d79612c
Resolve unresolved PDFObjectRefs (#70)
sergei-maertens Jun 2, 2017
938419c
Align dumppdf tool to modified data structures. (#73)
tilusnet Jul 20, 2017
3e36435
Fixes #64 -- be less strict when inspecting a tree type (#76)
sergei-maertens Jul 20, 2017
67bf5ab
Compare byte with byte instead of int (#78)
sergei-maertens Jul 20, 2017
b010db6
solves https://github.com/pdfminer/pdfminer.six/issues/65
goulu Jul 20, 2017
4c60482
v. 20170720
goulu Jul 20, 2017
c2432c3
Fix assert message for PDFLayoutAnalyzer.end_page (#80)
vstoykov Aug 18, 2017
496bfd0
Remove leftover from removing shebangs (#81)
vstoykov Aug 18, 2017
14de393
Cleanup psparser (#83)
vstoykov Aug 18, 2017
171cdcc
Microoptimization for singlebyte fonts (#84)
vstoykov Aug 18, 2017
5ef5484
Add tox configuration for easy local testing (#85)
vstoykov Aug 18, 2017
e39800f
Move package description into package docstring (#87)
bittner Aug 18, 2017
d4118cf
Enabled PDFDevice in the with statement (#88)
massongit Aug 18, 2017
1b88575
FIX: Null character replaced by blank
tataganesh Nov 8, 2017
6d3210d
pdfdiff tool (and .spec files for compilation with pyinstaller)
Nov 21, 2017
8885469
pass GraphicState out to LTChar
Nov 28, 2017
710d685
Fix bug - Pass a copy of graphicstate to render_string, not its refer…
Dec 7, 2017
63e5515
Fix bug - fillbuf, in some cases charpos is tuple instead of int
Dec 7, 2017
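
The last three commits attach the graphic state to each character and then switch to passing a copy of it. The sketch below illustrates why a copy matters, under stated assumptions: `GraphicState`, `Char`, `render` and `char_colors` are hypothetical stand-ins written for this illustration, not pdfminer's classes, and the attribute name `ncolor` (fill color) is assumed. If one mutable state object is shared, every character ends up reporting the color that was set last.

```python
import copy


class GraphicState:
    """Hypothetical stand-in for the interpreter's graphic state."""

    def __init__(self):
        self.ncolor = None  # fill (non-stroking) color; attribute name is assumed


class Char:
    """Hypothetical stand-in for a layout character that carries its state."""

    def __init__(self, text, graphicstate):
        self.text = text
        self.graphicstate = graphicstate


def render(runs, snapshot):
    """Turn (text, fill_color) runs into Char objects.

    A single state object is mutated between runs, as an interpreter might do.
    With snapshot=True each character receives its own copy of the state;
    with snapshot=False all characters share the same mutable object.
    """
    state = GraphicState()
    chars = []
    for text, color in runs:
        state.ncolor = color
        for ch in text:
            gs = copy.copy(state) if snapshot else state
            chars.append(Char(ch, gs))
    return chars


def char_colors(chars):
    """Return (character, fill color) pairs."""
    return [(c.text, c.graphicstate.ncolor) for c in chars]


runs = [("ab", (1, 0, 0)), ("c", (0, 0, 1))]
shared = char_colors(render(runs, snapshot=False))
copied = char_colors(render(runs, snapshot=True))
print(shared)  # every character reports the last color that was set
print(copied)  # each character keeps the color active when it was drawn
```

A shallow copy is enough in this sketch because the color is stored as an immutable tuple; a state holding mutable members would need `copy.deepcopy` instead.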
18 changes: 18 additions & 0 deletions .gitignore
@@ -0,0 +1,18 @@
*.class
*.pyc
*.pyo
.svn
_svn
.pythoscope
.ipynb_checkpoints
.settings
_update.bat
docs/_build
/Goulib.egg-info/
/build/
/dist/
/pdfminer.six.egg-info/
tests/*.xml
tests/*.txt
.idea/
.tox/
9 changes: 7 additions & 2 deletions .travis.yml
@@ -2,7 +2,12 @@ language: python
python:
- "2.6"
- "2.7"
- "3.4"
- "3.5"
- "3.6"
install:
- pip install pycrypto
- pip install six
- pip install pycryptodome
- pip install chardet
script:
- make test
nosetests --nologcapture
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -1,9 +1,11 @@
include Makefile
include LICENSE
include *.txt
include *.md
include *.py
graft cmaprsrc
graft docs
graft pdfminer
graft samples
graft tools
global-exclude *.pyc
11 changes: 2 additions & 9 deletions Makefile
@@ -3,7 +3,7 @@

PACKAGE=pdfminer

PYTHON=python2
PYTHON=python
GIT=git
RM=rm -f
CP=cp -f
@@ -55,12 +55,5 @@ $(CMAPDST)/to-unicode-Adobe-Korea1.pickle.gz: $(CMAPDST)
$(CMAPDST) Adobe-Korea1 $(CMAPSRC)/cid2code_Adobe_Korea1.txt

test: cmap
$(PYTHON) -m doctest \
pdfminer/arcfour.py \
pdfminer/lzw.py \
pdfminer/ascii85.py \
pdfminer/runlength.py \
pdfminer/rijndael.py
$(PYTHON) -m pdfminer.ccitt
$(PYTHON) -m pdfminer.psparser
nosetests
cd samples && $(MAKE) test
82 changes: 15 additions & 67 deletions README.md
@@ -1,20 +1,21 @@
PDFMiner
========
PDFMiner.six
============

[![Build Status](https://travis-ci.org/euske/pdfminer.svg?branch=master)](https://travis-ci.org/euske/pdfminer)
PDFMiner.six is a fork of PDFMiner using six for Python 2+3 compatibility

[![Build Status](https://travis-ci.org/pdfminer/pdfminer.six.svg?branch=master)](https://travis-ci.org/pdfminer/pdfminer.six) [![PyPI version](https://img.shields.io/pypi/v/pdfminer.six.svg)](https://pypi.python.org/pypi/pdfminer.six/)

PDFMiner is a tool for extracting information from PDF documents.
Unlike other PDF-related tools, it focuses entirely on getting
Unlike other PDF-related tools, it focuses entirely on getting
and analyzing text data. PDFMiner allows one to obtain
the exact location of text in a page, as well as
the exact location of text in a page, as well as
other information such as fonts or lines.
It includes a PDF converter that can transform PDF files
into other text formats (such as HTML). It has an extensible
PDF parser that can be used for other purposes than text analysis.

* Webpage: https://euske.github.io/pdfminer/
* Download (PyPI): https://pypi.python.org/pypi/pdfminer/
* Demo WebApp: http://pdf2html.tabesugi.net:8080/
* Webpage: https://github.com/pdfminer/
* Download (PyPI): https://pypi.python.org/pypi/pdfminer.six/


Features
@@ -34,42 +35,16 @@ Features
How to Install
--------------

* Install Python 2.6 or newer. (**For Python 3 support have a look at [pdfminer.six](https://github.com/goulu/pdfminer)**).
* Download the source code.
* Unpack it.
* Run `setup.py`:
* Install Python 2.7 or newer. (Python 3.x is supported in pdfminer.six)
* Install

$ python setup.py install
$ pip install pdfminer.six

* Do the following test:
* Run the following test:

$ pdf2txt.py samples/simple1.pdf


For CJK Languages
-----------------

In order to process CJK languages, do the following before
running setup.py install:

$ make cmap
python tools/conv_cmap.py pdfminer/cmap Adobe-CNS1 cmaprsrc/cid2code_Adobe_CNS1.txt
reading 'cmaprsrc/cid2code_Adobe_CNS1.txt'...
writing 'CNS1_H.py'...
...
$ python setup.py install

On Windows machines which don't have `make` command,
paste the following commands on a command line prompt:

mkdir pdfminer\cmap
python tools\conv_cmap.py -c B5=cp950 -c UniCNS-UTF8=utf-8 pdfminer\cmap Adobe-CNS1 cmaprsrc\cid2code_Adobe_CNS1.txt
python tools\conv_cmap.py -c GBK-EUC=cp936 -c UniGB-UTF8=utf-8 pdfminer\cmap Adobe-GB1 cmaprsrc\cid2code_Adobe_GB1.txt
python tools\conv_cmap.py -c RKSJ=cp932 -c EUC=euc-jp -c UniJIS-UTF8=utf-8 pdfminer\cmap Adobe-Japan1 cmaprsrc\cid2code_Adobe_Japan1.txt
python tools\conv_cmap.py -c KSC-EUC=euc-kr -c KSC-Johab=johab -c KSCms-UHC=cp949 -c UniKS-UTF8=utf-8 pdfminer\cmap Adobe-Korea1 cmaprsrc\cid2code_Adobe_Korea1.txt
python setup.py install


Command Line Tools
------------------

@@ -91,53 +66,26 @@ You cannot extract any text from a PDF document which does not have extraction p

**dumppdf.py**

dumppdf.py dumps the internal contents of a PDF file in pseudo-XML format.
dumppdf.py dumps the internal contents of a PDF file in pseudo-XML format.
This program is primarily for debugging purposes,
but it's also possible to extract some meaningful contents (e.g. images).

(For details, refer to the html document.)


API Changes
-----------

As of November 2013, there were a few changes made to the PDFMiner API
prior to October 2013. This is the result of code restructuring. Here
is a list of the changes:

* PDFDocument class is moved to pdfdocument.py.
* PDFDocument class now takes a PDFParser object as an argument.
PDFDocument.set_parser() and PDFParser.set_document() is removed.
* PDFPage class is moved to pdfpage.py
* process_pdf function is implemented as a class method PDFPage.get_pages.


TODO
----

* Replace STRICT variable with something better.
* Use logging module instead of sys.stderr.
* Proper test cases.
* PEP-8 and PEP-257 conformance.
* Better documentation.
* Crypt stream filter support.


Related Projects
----------------

* <a href="http://pybrary.net/pyPdf/">pyPdf</a>
* <a href="http://www.foolabs.com/xpdf/">xpdf</a>
* <a href="http://pdfbox.apache.org/">pdfbox</a>
* <a href="http://mupdf.com/">mupdf</a>


Terms and Conditions
--------------------

(This is so-called MIT/X License)

Copyright (c) 2004-2016 Yusuke Shinyama <yusuke at shinyama dot jp>
Copyright (c) 2004-2014 Yusuke Shinyama <yusuke at cs dot nyu dot edu>

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
91 changes: 8 additions & 83 deletions docs/index.html
@@ -9,7 +9,7 @@

<div align=right class=lastmod>
<!-- hhmts start -->
Last Modified: Mon Sep 26 09:04:15 UTC 2016
Last Modified: Wed Jun 25 10:27:52 UTC 2014
<!-- hhmts end -->
</div>

@@ -82,14 +82,14 @@ <h3>Features</h3>
<h3><a name="download">Download</a></h3>
<p>
<strong>Source distribution:</strong><br>
<a href="http://pypi.python.org/pypi/pdfminer/">
http://pypi.python.org/pypi/pdfminer/
<a href="http://pypi.python.org/pypi/pdfminer_six/">
http://pypi.python.org/pypi/pdfminer_six/
</a>

<P>
<strong>github:</strong><br>
<a href="https://github.com/euske/pdfminer/">
https://github.com/euske/pdfminer/
<a href="https://github.com/goulu/pdfminer/">
https://github.com/goulu/pdfminer/
</a>

<h3><a name="wheretoask">Where to Ask</a></h3>
@@ -100,11 +100,9 @@ <h3><a name="wheretoask">Where to Ask</a></h3>
http://groups.google.com/group/pdfminer-users/
</a>


<h2><a name="install">How to Install</a></h2>
<ol>
<li> Install <a href="http://www.python.org/download/">Python</a> 2.6 or newer.
(<font color=red><strong>Python 3 is not supported.</strong></font>)
<li> Download the <a href="#source">PDFMiner source</a>.
<li> Unpack it.
<li> Run <code>setup.py</code> to install:<br>
@@ -268,7 +266,6 @@ <h4>Options</h4>
<dd> Specifies how much a horizontal and vertical position of a text matters
when determining a text order. The value should be within the range of
-1.0 (only horizontal position matters) to +1.0 (only vertical position matters).
When this value is out of the range (e.g. +2), a simpler ordering rule is used.
The default value is 0.5.
<p>
<dt> <code>-C</code>
@@ -373,82 +370,10 @@ <h4>Options</h4>
<dd> Increases the debug level.
</dl>

<h2><a name="changes">Changes</a></h2>
<h2><a name="changes">Changes:</a></h2>
<ul>
<li> 2014/03/28: Further bugfixes.
<li> 2014/03/24: Bugfixes and improvements for fauly PDFs.<br>
API changes:
<ul>
<li> <code>PDFDocument.initialize()</code> method is removed and no longer needed.
A password is given as an argument of a PDFDocument constructor.
</ul>
<li> 2013/11/13: Bugfixes and minor improvements.<br>
As of November 2013, there were a few changes made to the PDFMiner API
prior to October 2013. This is the result of code restructuring. Here
is a list of the changes:
<ul>
<li> <code>PDFDocument</code> class is moved to <code>pdfdocument.py</code>.
<li> <code>PDFDocument</code> class now takes a <code>PDFParser</code> object as an argument.
<li> <code>PDFDocument.set_parser()</code> and <code>PDFParser.set_document()</code> is removed.
<li> <code>PDFPage</code> class is moved to <code>pdfpage.py</code>.
<li> <code>process_pdf</code> function is implemented as <code>PDFPage.get_pages</code>.
</ul>
<li> 2013/10/22: Sudden resurge of interests. API changes.
Incorporated a lot of patches and robust handling of broken PDFs.
<li> 2011/05/15: Speed improvements for layout analysis.
<li> 2011/05/15: API changes. <code>LTText.get_text()</code> is added.
<li> 2011/04/20: API changes. LTPolygon class was renamed as LTCurve.
<li> 2011/04/20: LTLine now represents horizontal/vertical lines only. Thanks to Koji Nakagawa.
<li> 2011/03/07: Documentation improvements by Jakub Wilk. Memory usage patch by Jonathan Hunt.
<li> 2011/02/27: Bugfixes and layout analysis improvements. Thanks to fujimoto.report.
<li> 2010/12/26: A couple of bugfixes and minor improvements. Thanks to Kevin Brubeck Unhammer and Daniel Gerber.
<li> 2010/10/17: A couple of bugfixes and minor improvements. Thanks to standardabweichung and Alastair Irving.
<li> 2010/09/07: A minor bugfix. Thanks to Alexander Garden.
<li> 2010/08/29: A couple of bugfixes. Thanks to Sahan Malagi, pk, and Humberto Pereira.
<li> 2010/07/06: Minor bugfixes. Thanks to Federico Brega.
<li> 2010/06/13: Bugfixes and improvements on CMap data compression. Thanks to Jakub Wilk.
<li> 2010/04/24: Bugfixes and improvements on TOC extraction. Thanks to Jose Maria.
<li> 2010/03/26: Bugfixes. Thanks to Brian Berry and Lubos Pintes.
<li> 2010/03/22: Improved layout analysis. Added regression tests.
<li> 2010/03/12: A couple of bugfixes. Thanks to Sean Manefield.
<li> 2010/02/27: Changed the way of internal layout handling. (LTTextItem -&gt; LTChar)
<li> 2010/02/15: Several bugfixes. Thanks to Sean.
<li> 2010/02/13: Bugfix and enhancement. Thanks to Andr&eacute; Auzi.
<li> 2010/02/07: Several bugfixes. Thanks to Hiroshi Manabe.
<li> 2010/01/31: JPEG image extraction supported. Page rotation bug fixed.
<li> 2010/01/04: Python 2.6 warning removal. More doctest conversion.
<li> 2010/01/01: CMap bug fix. Thanks to Winfried Plappert.
<li> 2009/12/24: RunLengthDecode filter added. Thanks to Troy Bollinger.
<li> 2009/12/20: Experimental polygon shape extraction added. Thanks to Yusuf Dewaswala for reporting.
<li> 2009/12/19: CMap resources are now the part of the package. Thanks to Adobe for open-sourcing them.
<li> 2009/11/29: Password encryption bug fixed. Thanks to Yannick Gingras.
<li> 2009/10/31: SGML output format is changed and renamed as XML.
<li> 2009/10/24: Charspace bug fixed. Adjusted for 4-space indentation.
<li> 2009/10/04: Another matrix operation bug fixed. Thanks to Vitaly Sedelnik.
<li> 2009/09/12: Fixed rectangle handling. Able to extract image boundaries.
<li> 2009/08/30: Fixed page rotation handling.
<li> 2009/08/26: Fixed zlib decoding bug. Thanks to Shon Urbas.
<li> 2009/08/24: Fixed a bug in character placing. Thanks to Pawan Jain.
<li> 2009/07/21: Improvement in layout analysis.
<li> 2009/07/11: Improvement in layout analysis. Thanks to Lubos Pintes.
<li> 2009/05/17: Bugfixes, massive code restructuring, and simple graphic element support added. setup.py is supported.
<li> 2009/03/30: Text output mode added.
<li> 2009/03/25: Encoding problems fixed. Word splitting option added.
<li> 2009/02/28: Robust handling of corrupted PDFs. Thanks to Troy Bollinger.
<li> 2009/02/01: Various bugfixes. Thanks to Hiroshi Manabe.
<li> 2009/01/17: Handling a trailer correctly that contains both /XrefStm and /Prev entries.
<li> 2009/01/10: Handling Type3 font metrics correctly.
<li> 2008/12/28: Better handling of word spacing. Thanks to Christian Nentwich.
<li> 2008/09/06: A sample pdf2html webapp added.
<li> 2008/08/30: ASCII85 encoding filter support.
<li> 2008/07/27: Tagged contents extraction support.
<li> 2008/07/10: Outline (TOC) extraction support.
<li> 2008/06/29: HTML output added. Reorganized the directory structure.
<li> 2008/04/29: Bugfix for Win32. Thanks to Chris Clark.
<li> 2008/04/27: Basic encryption and LZW decoding support added.
<li> 2008/01/07: Several bugfixes. Thanks to Nick Fabry for his vast contribution.
<li> 2007/12/31: Initial release.
<li> 2004/12/24: Start writing the code out of boredom...
<li> 2014/09/15: pushed on PyPi</li>
<li> 2014/09/10: pdfminer_six forked from pdfminer since Yusuke didn't want to merge and pdfminer3k is outdated</li>
</ul>

<h2><a name="todo">TODO</a></h2>
17 changes: 14 additions & 3 deletions pdfminer/__init__.py
@@ -1,5 +1,16 @@
#!/usr/bin/env python
__version__ = '20140328'
# -*- coding: utf-8 -*-
"""
Fork of PDFMiner using six for Python 2+3 compatibility

PDFMiner is a tool for extracting information from PDF documents.
Unlike other PDF-related tools, it focuses entirely on getting and analyzing
text data. PDFMiner allows to obtain the exact location of texts in a page,
as well as other information such as fonts or lines.
It includes a PDF converter that can transform PDF files into other text
formats (such as HTML). It has an extensible PDF parser that can be used for
other purposes instead of text analysis.
"""
__version__ = '20170720'

if __name__ == '__main__':
print (__version__)
print(__version__)