Drop python2 support #433

Open: wants to merge 2 commits into base: master
3 changes: 3 additions & 0 deletions .gitignore
@@ -71,3 +71,6 @@ tests/gif/standardized_text.txt
tests/jpg/standardized_text.txt
tests/tiff/standardized_text.txt
tests/pdf/ocr_text.txt

# PyCharm
.idea/
17 changes: 4 additions & 13 deletions .travis.yml
@@ -3,35 +3,26 @@ os: linux

language: python
python:
- "2.7"
- "3.7"

# install system dependencies here with apt-get.
before_install:
- sudo ./provision/debian.sh
- python -m pip install --upgrade pip
- python -m pip install --upgrade pip setuptools wheel

# install python dependencies including this package in the travis
# virtualenv
install:

- if [[ $TRAVIS_PYTHON_VERSION == 3.7 ]];
then ./provision/python3.sh;
fi
- if [[ $TRAVIS_PYTHON_VERSION == 2.7 ]];
then ./provision/python2.sh;
fi
- pip install .[pocketsphinx]
- ./provision/python.sh
- pip install .

# commands to run the testing suite. if any of these fail, travis lets us know
script:
- cd tests && make && cd -
- nosetests --with-coverage --cover-package=textract
- cd tests && pytest && cd -
# - pycodestyle textract/ bin/textract
- if [[ $TRAVIS_PYTHON_VERSION == 3.7 ]];
then cd docs && make html && cd -;
fi
- cd docs && make html && cd -;

# commands to run after the tests successfully complete
after_success:
3 changes: 1 addition & 2 deletions Vagrantfile
@@ -27,8 +27,7 @@ Vagrant.configure("2") do |config|
vb.customize ["modifyvm", :id, "--ioapic", "on"]
vb.customize ["modifyvm", :id, "--cpus", "2"]
vb.customize ["modifyvm", :id, "--memory", "2048"]
override_config.vm.box = "trusty64"
override_config.vm.box_url = "https://cloud-images.ubuntu.com/vagrant/trusty/current/trusty-server-cloudimg-amd64-vagrant-disk1.box"
override_config.vm.box = "ubuntu/focal64"
end

# steps for provisioning so that these provisioning steps are
File mode changed: bin/textract
100644 → 100755 (no content changes)
5 changes: 5 additions & 0 deletions docs/changelog.rst
@@ -10,6 +10,11 @@ latest changes in development for next release
----------------------------------------------

.. THANKS FOR CONTRIBUTING; ADD YOUR UNRELEASED CHANGES HERE!

1.7.0
-------------------

* Dropped python2 support

1.6.5
-------------------

2 changes: 1 addition & 1 deletion docs/conf.py
@@ -58,7 +58,7 @@
# built documents.
#
# The short X.Y version.
release = version = "1.6.5"
release = version = "1.7.0"

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -86,7 +86,7 @@ file types by either mentioning them on the `issue tracker

* ``.wav`` via `SpeechRecognition`_ and `pocketsphinx`_

* ``.xlsx`` via `xlrd <https://pypi.python.org/pypi/xlrd>`_
* ``.xlsx`` via `openpyxl <https://pypi.python.org/pypi/openpyxl>`_

* ``.xls`` via `xlrd <https://pypi.python.org/pypi/xlrd>`_

2 changes: 1 addition & 1 deletion provision/python3.sh → provision/python.sh
@@ -12,5 +12,5 @@ fi
pip install -U pip

# Install the requirements for this package as well as this module.
pip install -r requirements/python-dev3
pip install -r requirements/python-dev
pip install -r requirements/python-doc
15 changes: 0 additions & 15 deletions provision/python2.sh

This file was deleted.

2 changes: 1 addition & 1 deletion provision/travis-mock.sh
@@ -6,7 +6,7 @@
# if its a problem.
# http://docs.travis-ci.com/user/languages/python/#Travis-CI-Uses-Isolated-virtualenvs
sudo apt-get update -qq
sudo apt-get install -y python-pip python-dev build-essential
sudo apt-get install -y python3-pip python3-dev build-essential

# install pep8 and nose for testing
sudo pip install pep8 nose
4 changes: 3 additions & 1 deletion requirements/debian
@@ -9,7 +9,7 @@ make

# these packages are required by python-docx, which depends on lxml
# and requires these things
python-dev
python3-dev
libxml2-dev
libxslt1-dev

@@ -48,3 +48,5 @@ swig
# libxslt1-dev for compiling lxml.
# https://github.com/deanmalmgren/textract/issues/19
zlib1g-dev

python-is-python3
21 changes: 11 additions & 10 deletions requirements/python
@@ -1,13 +1,14 @@
# This file contains all python dependencies that are required by the textract
# package in order for it to properly work.

argcomplete~=1.10.0
beautifulsoup4~=4.8.0
chardet==3.*
docx2txt~=0.8
extract-msg<=0.29.* #Last with python2 support
pdfminer.six==20191110 #Last with python2 support
python-pptx~=0.6.18
six~=1.12.0
SpeechRecognition~=3.8.1
xlrd~=1.2.0
argcomplete>=1.10.0
beautifulsoup4>=4.8.0
chardet>=3.0
docx2txt>=0.8
extract-msg>=0.29.0
pdfminer.six>=20191110
python-pptx>=0.6.18
six>=1.12.0
SpeechRecognition>=3.8.1
xlrd>=1.2.0
openpyxl>=2.0.0
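
Review note: switching every specifier from `~=` to `>=` widens the accepted range considerably. Under PEP 440, `~=1.10.0` is shorthand for `>=1.10.0, ==1.10.*`, whereas `>=1.10.0` also admits later minor and major releases. The sketch below illustrates the difference for purely numeric versions. The helper names are hypothetical, and it ignores pre-releases, epochs and other PEP 440 details, so real tooling should rely on the `packaging` library instead:

```python
def parse(version):
    """Split a purely numeric version such as '1.10.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def pad(a, b):
    """Right-pad the shorter tuple with zeros so both compare segment by segment."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def at_least(minimum, candidate):
    """Simplified '>=' check: candidate is not older than minimum."""
    low, cand = pad(parse(minimum), parse(candidate))
    return cand >= low


def compatible_release(base, candidate):
    """Simplified '~=' check: '>= base' and same prefix as base minus its last segment."""
    base_t = parse(base)
    if len(base_t) < 2:
        raise ValueError("'~=' needs at least two version segments")
    prefix = base_t[:-1]
    cand_t = parse(candidate)
    # Pad short candidates so that e.g. '1' is compared as '1.0' against the prefix.
    cand_t = cand_t + (0,) * max(0, len(prefix) - len(cand_t))
    return at_least(base, candidate) and cand_t[:len(prefix)] == prefix
```

With these helpers, `compatible_release("1.12.0", "1.16.0")` is false while `at_least("1.12.0", "1.16.0")` is true, which is the kind of upgrade the new `six>=1.12.0` line now allows.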
File renamed without changes.
16 changes: 0 additions & 16 deletions requirements/python-dev2

This file was deleted.

2 changes: 2 additions & 0 deletions requirements/python-doc
@@ -1,5 +1,7 @@
# this only includes packages that are needed for documentation build.

jinja2<3.1
sphinx==2.1.2
sphinx_rtd_theme==0.4.3
sphinx-argparse==0.2.5
pocketsphinx==0.1.15
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 1.6.5
current_version = 1.7.0
commit = True
tag = True

2 changes: 1 addition & 1 deletion setup.py
@@ -42,7 +42,7 @@ def parse_requirements(requirements_filename):

setup(
name=textract.__name__,
version="1.6.5",
version="1.7.0",
description="extract text from any document. no muss. no fuss.",
long_description=long_description,
url=github_url,
17 changes: 8 additions & 9 deletions tests/Dockerfile
@@ -1,16 +1,15 @@
FROM ubuntu:12.04
FROM ubuntu:20.04
MAINTAINER Shawn Milochik <[email protected]>
ENV DEBIAN_FRONTEND noninteractive
ENV REFRESHED_AT 2014-08-12b
ENV REFRESHED_AT 2022-08-17
RUN apt-get update
RUN apt-get install python-pip -y
ADD . /src
WORKDIR /src
RUN /bin/bash /src/provision/debian.sh
RUN /bin/bash /src/provision/python.sh
RUN apt-get install python3-pip -y
ADD . /app
WORKDIR /app
RUN /bin/bash /app/provision/debian.sh
RUN /bin/bash /app/provision/python.sh
RUN adduser --disabled-password --gecos "" --home=/home/textract textract
VOLUME ["/home/textract/src"]
ENV PATH $PATH:/home/textract/src/bin
ENV PYTHONPATH /home/textract/src
USER textract
ENTRYPOINT ["/home/textract/src/tests/run.py"]
ENTRYPOINT ["/home/textract/src/tests/docker_entry.sh"]
2 changes: 1 addition & 1 deletion tests/docker_entry.sh
@@ -3,4 +3,4 @@
# This script gets called from within the
# Docker container.

./tests/run.py
cd "$(dirname "$0")" && make && pytest && cd -
40 changes: 36 additions & 4 deletions tests/pdf/raw_text-m=pdfminer.txt
@@ -1,59 +1,91 @@
I  love  word  documents.  They  are  lovely.  They  make  me  so  happy  I  could  smile.  And  
that’s  why  I  wrote  this  package.  


Sample text is hard. That’s
where http://hipsum.co comes
in handy.



Semiotics church-key VHS, Truffaut cliche actually vegan. Cray Austin

pop-up disrupt letterpress, kitsch fixie Cosby sweater cliche craft beer

PBR&B. Gentrify cornhole Tonx McSweeney's, Shoreditch keffiyeh

ethnic Marfa 90's kogi American Apparel. Shabby chic distillery church-

key locavore beard, food truck chillwave sartorial deep v flannel authentic

Tumblr narwhal kogi organic. Cred vegan jean shorts Banksy forage

Neutra dreamcatcher, hashtag Bushwick polaroid pork belly flannel

keytar Portland post-ironic. Cred hoodie vegan, food truck leggings

Austin pour-over banjo trust fund before they sold out cray Intelligentsia

plaid typewriter. Williamsburg XOXO plaid Carles Austin tofu.

Carles Tonx keffiyeh, leggings 90's lo-fi kogi viral semiotics Brooklyn

biodiesel tousled bespoke kitsch. Vinyl Tonx art party Thundercats retro,

viral asymmetrical artisan bicycle rights bitters master cleanse Kickstarter

YOLO. Seitan street art semiotics twee skateboard, PBR&B VHS hashtag

meh. Thundercats semiotics shabby chic forage single-origin coffee retro,

3 wolf moon iPhone mumblecore 90's trust fund Intelligentsia. Beard

gluten-free seitan, VHS sartorial pork belly gastropub meh whatever

authentic synth. Beard single-origin coffee irony fixie, before they sold



out Pitchfork kitsch readymade. Helvetica butcher wayfarers, lomo artisan

hashtag Brooklyn four loko fanny pack 90's mustache 8-bit.

Meh jean shorts selfies, crucifix selvage Helvetica Carles PBR Vice

Banksy roof party master cleanse ugh PBR&B. Lo-fi freegan salvia photo

booth, Wes Anderson skateboard Odd Future. Etsy art party Bushwick

keffiyeh. Pork belly 3 wolf moon butcher mustache. YOLO raw denim lo-

fi, hoodie gentrify Schlitz 8-bit sriracha Shoreditch retro brunch.

Williamsburg farm-to-table beard, mlkshk Banksy fap kogi Etsy art party

squid semiotics. XOXO church-key Pitchfork mlkshk irony tote bag.

Farm-to-table brunch tattooed hoodie keytar, literally selvage authentic

trust fund deep v Thundercats Kickstarter narwhal locavore. Swag disrupt

chambray, leggings shabby chic gastropub YOLO plaid hoodie

Williamsburg Godard mixtape. Retro Godard keytar biodiesel, freegan

paleo Etsy you probably haven't heard of them Pitchfork Schlitz

readymade small batch cred. Pug trust fund paleo, 90's fixie typewriter

next level banjo. Banksy occupy authentic master cleanse Bushwick

fingerstache selfies, direct trade craft beer cliche +1 cray. Locavore four

loko biodiesel Neutra chia mlkshk. Fanny pack YOLO Portland, mlkshk

PBR&B single-origin coffee drinking vinegar 8-bit flannel gentrify

stumptown pop-up.

Oh. You need a little dummy text for your mockup? How quaint.

I bet you’re still using Bootstrap too…




9 changes: 2 additions & 7 deletions tests/run_docker_tests.sh
@@ -5,15 +5,10 @@
cd $(dirname $0)/..
base=$(pwd)

image="textract/ubuntu12.04"

cp tests/Dockerfile ./Dockerfile
image="textract/ubuntu20.04"

# Note: For speed, the image won't be automatically rebuilt. If the dependencies
# change and the existing image is outdated, just delete it with:
# docker rmi <image name>
docker images | grep $image || docker build -t $image .
docker images | grep $image || docker build -t $image -f tests/Dockerfile .
docker run --rm -v $base:/home/textract/src $image

rm ./Dockerfile
