pyth 0.7, with improved partial Python 3 port #44

Open
wants to merge 24 commits into
base: master

@prechelt prechelt commented Sep 8, 2017

Hi Brendon,
this replaces PR #33 and contains the same commits plus some more.
I needed to move from the pyth.zip I had used in my setup for a long time to a proper git dependency, and took this as a hint that I should consolidate my pyth work some more.

test_readrtf15.py is now a reasonable set of tests, with proper skipping and expectedFailures to indicate the state of the RTF reading functionality. It also exercises, and hence co-tests, the XHTMLWriter and PlaintextWriter, and so demonstrates their functionality as well. This involves 18 new reference output test data files in tests/rtf-as-txt.
I have also reworked README (and turned it into README.md).
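
To illustrate how skipping and expected failures can document the state of a feature, here is a minimal sketch using the standard `unittest` decorators. The test names and the stand-in `convert` function are hypothetical and are not taken from test_readrtf15.py.

```python
import unittest


def convert(rtf_text):
    """Hypothetical stand-in for an RTF-to-plaintext conversion."""
    if "\\symbol" in rtf_text:
        raise NotImplementedError("symbol fonts are not handled")
    return rtf_text.replace("\\par", "\n")


class ReadRtfStateTest(unittest.TestCase):
    def test_paragraph(self):
        # A feature that works: the test must pass.
        self.assertEqual(convert("a\\par b"), "a\n b")

    @unittest.expectedFailure
    def test_symbol_font(self):
        # A known limitation: counted as an expected failure.
        self.assertEqual(convert("\\symbol x"), "x")

    @unittest.skip("no reference output available yet")
    def test_tables(self):
        self.fail("never runs")


def run_suite():
    """Run the three tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReadRtfStateTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The run as a whole still counts as successful, while the report lists the expected failure and the skipped test separately from the passing one.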

For my purposes, this is a satisfactory package. I have therefore set the version to 0.7 in setup.py and set the url to point to my fork.
Feel free to adopt this version and release it to PyPI, although some compatibility testing on Python 2 would probably be a good idea first; I have not done any.

There is plenty left to do before the package as a whole supports Python 3.
There is still more left to do in terms of functionality. No boredom anywhere in sight.

prechelt and others added 24 commits June 13, 2015 16:35
modernize automates most of the changes required to make
Python 2 code compatible with Python 3.
The resulting code will rely on one additional package: six

Some more changes will be required for code such as this, which
handles both binary data and text data, because the original
Python 2 code is not always explicit about which is which.

The file modernize_output.txt captures the diff of these changes
as shown by the modernize call.
The file modernize_output_strippeddown.txt is a subset of that
file (as described in its own header).
pyth\plugins\rtf15\reader.py
pyth\plugins\xhtml\writer.py

The former in particular was tricky because most strings have to be handled
as bytestrings -- but not all of them.
See  http://pythonhosted.org/six/

These two now appear to work for ASCII and 8-bit non-ASCII characters in the RTF file,
at least for a simple RTF file.
Complex files and true Unicode remain to be seen.
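
The bytes-versus-text distinction can be sketched as follows. This is not the code in reader.py; it is a minimal illustration that assumes the fragment contains only plain characters and RTF hex escapes of the form `\'hh`, and that the code page (here cp1252, an assumption) is known. The function name `decode_rtf_fragment` is hypothetical.

```python
import re

# Matches an RTF hex escape such as \'e9 inside a bytestring.
HEX_ESCAPE = re.compile(rb"\\'([0-9a-fA-F]{2})")


def decode_rtf_fragment(raw, codepage="cp1252"):
    """Turn a raw RTF fragment (bytes) into text (str).

    Assumption: only plain characters and hex escapes occur,
    and the code page of the document is known.
    """
    # Step 1: stay in bytes while replacing each escape by the byte it names.
    as_bytes = HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    # Step 2: decode exactly once, at the boundary between bytes and text.
    return as_bytes.decode(codepage)
```

For example, `decode_rtf_fragment(rb"caf\'e9")` yields the text `café`. Keeping the input as bytes until the single decode step avoids mixing the two types, which Python 3 refuses to do implicitly.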
It should now work correctly.
They are based on pairs of files with the same basename in directories
tests/rtfs           (inputfiles) and
tests/rtf-as-html    (reference outputfiles)
The unittest test method for each pair is created dynamically
by test_readrtf15.py and will write the actual output to
yet another file with that same basename in directory
tests/currentoutput  (files holding actual outputs).
Those files are deleted only if the test succeeds and so
can be used for analysis if the test fails.
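
The scheme described above can be sketched roughly as follows. This is not the code of test_readrtf15.py; `convert`, `make_test`, and `build_testcase` are hypothetical names, and the conversion is a trivial stand-in for reading RTF and writing XHTML.

```python
import os
import unittest


def convert(rtf_text):
    """Hypothetical stand-in for reading RTF and writing XHTML."""
    return "<p>" + rtf_text.strip() + "</p>"


def make_test(in_path, ref_path, out_path):
    """Build one test method for one input/reference pair."""
    def test(self):
        with open(in_path, encoding="utf-8") as f:
            actual = convert(f.read())
        # Always write the actual output so it can be inspected on failure.
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(actual)
        with open(ref_path, encoding="utf-8") as f:
            expected = f.read()
        self.assertEqual(actual, expected)
        # Reached only if the comparison succeeded.
        os.remove(out_path)
    return test


def build_testcase(in_dir, ref_dir, out_dir, ref_ext=".html"):
    """Create a TestCase class with one method per file in in_dir."""
    os.makedirs(out_dir, exist_ok=True)
    methods = {}
    for name in sorted(os.listdir(in_dir)):
        base, _ = os.path.splitext(name)
        methods["test_" + base.replace("-", "_")] = make_test(
            os.path.join(in_dir, name),
            os.path.join(ref_dir, base + ref_ext),
            os.path.join(out_dir, base + ref_ext),
        )
    return type("GeneratedRtfTests", (unittest.TestCase,), methods)
```

Because each method is created by a factory function, every test captures its own set of paths, and a failing test leaves its output file behind for comparison with the reference.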
- performed changes needed to run under Python 3
- fixed one mistake in decodeTable
- added blank to the codes list
- improved the error message
- added two more test cases:
  - msword-symbol.rtf
  - wordpad-symbol.rtf
- neither of them works correctly, see 'Limitations' section
  in top-level README
- therefore, the corresponding 'correct' outputs
  are empty files (to make the automated tests fail)
I have thrown out the complex test files (except the interesting zh-cn.rtf)
as well as the overly simple or redundant test files, and have introduced
a set of simple tests (each covering only a little functionality at once)
and a regular naming scheme for the test files.
The file names in tests/rtfs now consist of two parts:
program-testcontent.rtf
where program is one of
- msword: Microsoft Word from Microsoft Office 2013 on Windows 7
- librewriter: Writer from LibreOffice 4.4 on Windows 7
- wordpad: Microsoft Wordpad on Windows 7
and testcontent is a name describing roughly what functionality
is tested in the file.
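
As a small illustration of the naming scheme, the following sketch splits a file name into its two parts. The helper `split_test_name` is hypothetical and not part of the test suite.

```python
KNOWN_PROGRAMS = {"msword", "librewriter", "wordpad"}


def split_test_name(filename):
    """Split 'program-testcontent.rtf' into (program, testcontent)."""
    if not filename.endswith(".rtf"):
        raise ValueError("not an .rtf file: %r" % filename)
    stem = filename[:-len(".rtf")]
    # Split on the first hyphen only; testcontent may contain hyphens itself.
    program, sep, testcontent = stem.partition("-")
    if not sep or program not in KNOWN_PROGRAMS or not testcontent:
        raise ValueError("unexpected test file name: %r" % filename)
    return program, testcontent
```

A name such as msword-symbol.rtf splits into `("msword", "symbol")`. A file that does not follow the scheme, such as the retained zh-cn.rtf, would be rejected by this sketch and would need separate handling.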

The corresponding found-to-be-correct test outputs are in
tests/rtf-as-html/*.html
where these outputs have been truncated to zero bytes
for test cases with errors in their output.
…-py3

Unfortunately, I am not sure why this merge is necessary.
merge robertour PR of change by Petro-Viron: pyth.plugins.plaintext.writer seems to be working now
No argument is allowed for \super (like for \sub, contrary to \up and \dn).
(This has already been fixed in the python2 version.)
Fix signature of handle_super in rtf parser
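
The difference between control words with and without an argument can be sketched as follows. This is a hypothetical dispatcher written for illustration, not the rtf15 reader itself; only the handler name `handle_super` comes from the commit message, and everything else is an assumption.

```python
class MiniHandler:
    """Hypothetical sketch of control-word dispatch; not pyth's reader."""

    def __init__(self):
        self.events = []

    # \super and \sub carry no numeric argument.
    def handle_super(self):
        self.events.append(("super", None))

    def handle_sub(self):
        self.events.append(("sub", None))

    # \up and \dn carry an offset in half-points.
    def handle_up(self, amount):
        self.events.append(("up", int(amount)))

    def handle_dn(self, amount):
        self.events.append(("dn", int(amount)))

    def dispatch(self, word, arg=None):
        """Look up handle_<word> and call it with or without an argument."""
        handler = getattr(self, "handle_" + word)
        # Pass the argument only when the control word actually had one.
        if arg is None:
            handler()
        else:
            handler(arg)
```

With this layout, a handler whose signature wrongly expects an argument fails with a `TypeError` as soon as the control word is dispatched without one, which is the kind of mismatch a signature fix removes.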

5 participants