Added serialization and token collection. #31

Craxic · 2015-03-22T11:18:19Z

Hi there!

I've been working pretty hard on this little project for some time now and I hope you like the stuff I've done. I've added the ability to write out the tree back to a Java file. So it's now possible to simply add and remove functions and variables just like you would do when modifying something like an HTML file. I've also tried to collect all the tokens from the parser and attach them to their model classes. This would make it easy for someone to find the exact location of something inside a Java file. I've also added a test that pulls all of openjdk7 and runs the parser against them to ensure that the parser succeeds without error. It also serializes and then de-serializes the result to ensure that serialization results in an identical tree as the original parse. Unfortunately, it takes too long on Travis (>50 minutes) so I disabled it when running in Travis. I've also re-factored the two source files into many separate source files because I think it's nicer that way. If you don't like it I'm sure we can fix it!

There are a couple of other reasons why you may not want to accept this pull request:

I'm almost certain that speed will have decreased a bit due to the many assertions I've added to ensure it is very difficult to create an invalid tree. Also collecting the tokens will take some time. But it's still usable and parses, serializes, re-parses and then compares the two trees over the entirety of the openjdk7 in about 10 minutes on my PC (single core only).
This is not compatible with current projects using the library. Many many things have changed, but I thought it was the best thing to do. I'm not sure how many people are really using the library, but I can't imagine it would be too hard for them to simply use an old version of the library or make necessary adjustments to restore compatibility.
I added my name in the COPYING file. I've added at least 3752 LOC, so I hope that's OK and doesn't come across as narcissistic...

Anyways, I hope you like it. Let me know!

…fy that output can be parsed and that the parse result is identical to input parse result (but that might be a stretch...)

…le for the serialization

… (takes too long)

…cacher.py

…eserves its own project

musiKk · 2015-04-01T20:04:14Z

Hi Craxic,

I noticed your work on your fork for quite a while. It's very impressive indeed. When you read the following, please keep in mind that even if I voice criticism I think you did a lot of tremendous work and I'm thankful that you put so much time into the project. I assume you had an itch to scratch.

My biggest problem with this PR is that it is so huge. I like to have discrete units of work which are only as big as necessary. This applies to branches and doubly to commits (haven't checked those yet). The branch touches a LOT of things and is basically a complete reorganization of the whole code base. Based on my reaction time you can see that I can't put too much time into the project and this makes reviewing big changes harder than small ones.

I have a few comments on what's probably only a fraction of what actually deserves commenting on. I'll add more once I get to it.

Serializing to Java is a great feature. I think this would be more appropriately implemented as a visitor. The way it's currently implemented, it bloats the model. There are many ways in which the model could be serialized: Back to Java as it was or pretty printed, XML, JSON, HTML and more. Those shouldn't be in the model as well.
I put off organizing the code into multiple modules because I thought it would be more pythonic to have everything that belongs together be in a single module. I'm not sure what's the best way, I'm not a super experienced Python dev. In addition to that the division is sometimes arbitrary. Do I group expression statements to expressions or statements? Having everything in one file made finding stuff pretty fast and definitely helped during initial development. I haven't made my mind up. What I will definitely not have is putting the burden of importing lots of packages to the user. Some import lists have Java-like proportions. There is one model package to import and that's it. I think this can be achieved by importing the rest from the __init__.py files.
If those asserts prove to be too big of a performance hit and cannot be disabled, they won't make it. This too could easily be part of a visitor during tests.
I believe in giving credit where credit is due. Seriously. However, in my opinion the license file is not the place to put it as this doesn't scale and... well... "why is this guy mentioned there and I'm not?!". I'd like to add a contributers file (or a section to the readme) and put everybody who contributed and wants a mention in there. A contribution the size of yours would definitely warrant a special thanks section. Is that something that could work for you?

Those are my comments for now. As I said, this may take a while... :)

Craxic · 2015-04-02T04:41:28Z

Hi musiKk,

You're right, it does bloat the model. I think you're correct in moving serialization out of there and into something else. I will take a look at doing this during the Easter break. As to where, I'm not sure. I sort of dislike the "visitor" pattern, only because it uses dynamic calling and ruins static analysis by PyCharm (of which I'm a big fan and I think others are too). There are also some other technical limitations I'm concerned about (not sure yet). As for an alternative, I would simply move the serialization methods to module level functions and then move them into their own "serialization" directory. This way I can avoid having a gigantic visitor class/subclasses. Although then they are kind of detached from their model classes... Let me know what you think about that idea, perhaps we can arise at a solution we both like.
I'm not a super-experienced Python developer either. I was hoping you were so you'd know the best way to do this 😃. You can indeed import everything in __init__.py, but I remember the only reason I didn't put it in there was because PyCharm wasn't happy about it for whatever reason. I will need to investigate this closer and properly ensure this is the case and investigate alternatives. I agree that the import declarations are huge, but I wanted to avoid from ... import * since people complain about it being nasty for static analysis. If you're not happy with the huge imports, obviously they're gonna have to go. As for the reason I did this in the first place: I believe that splitting code up into multiple files makes things easier for a number of reasons. While the division is arbitrary, and has no functional purpose, I think it makes it easier to deal with working on two things at once (since in many cases they'll be in different files). There are a few other reasons (git being one of them), but it is mainly a personal choice. I don't really ever see an entire project in two files, so I thought it would be best to split them up. Ultimately up to you, of course. But I believe it would be best separate.
The assertions aren't there just for testing purposes: they're to help ensure you don't accidentally create an impossible model when modifying it. Like putting a literal as the body of a class declaration. Many of them should really be if/raise statements I think. I'm sure they're going to have some sort of impact, of how much I'm unsure. I guess I'll have to do some testing.
Moving contributors to a contributor file works 100% fine for me! I wasn't sure how you'd feel about me adding myself in there like that, but its basically the same as me asking you directly anyway since you review it before pulling it.

Thanks for getting back to me!

Craxic and others added 30 commits March 9, 2015 10:33

Massive refactoring of the giant 2 files into multiple smaller files

5d1bad5

More work on making the tokens available to the user

2fc9b87

More work fixing type assertions

1ba1bfc

More work on altering models

14aa960

Yet more work...

cad0e3b

Surprise! More work.

f6cc46e

More work. Making OpenJDK test

ce065db

Fixed all parsing bugs (hopefully) and finished OpenJDK7 test

71f9284

Added a few serializers

feb0732

More work...

8571965

Added more serializers

e71d7a2

Added more serialization

48a837c

Added more serializers

a719ded

Added yet more serializers

8ad9490

Tiny bit more work here and there.

264476d

Bit more fixes for serialization

832cadb

A little bit more tidying up of serializers.

e12ca52

Serializers all done now, works for all of OpenJDK7. Now need to veri…

0536315

…fy that output can be parsed and that the parse result is identical to input parse result (but that might be a stretch...)

More bug fixes to serializers

a8b33e8

Fixed a few issues with the serializers

a373902

Started doing a of cleanup. Serializers still not 100% though.

fa7bbe8

Massive changes, many of them untested. Started writing another examp…

65473f3

…le for the serialization

Tested.

2f86829

To Python 3

e38ba11

Fixed file closing issue.

edecf7b

Fixed encoding issue

1624379

Fixed Python2 compatibility. Prevented running OpenJDK test in travis…

7851a2b

… (takes too long)

Minor updates to the function_cacher.py example

fbf979e

Updated setup.py

6bea580

Minor changes to function_cacher and source_element.py

7f44103

Craxic added 3 commits March 23, 2015 16:12

Fixed mistake I made with equality function, fixed error in function_…

49828c4

…cacher.py

Altered the way static functions are cached

c14cb92

Decided that this whole function cacher shouldn't be an example and d…

3241e41

…eserves its own project

Forced usage of old dependency

317c779

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added serialization and token collection. #31

Added serialization and token collection. #31

Craxic commented Mar 22, 2015

musiKk commented Apr 1, 2015

Craxic commented Apr 2, 2015

Added serialization and token collection. #31

Are you sure you want to change the base?

Added serialization and token collection. #31

Conversation

Craxic commented Mar 22, 2015

musiKk commented Apr 1, 2015

Craxic commented Apr 2, 2015