Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added serialization and token collection. #31

Open
wants to merge 34 commits into
base: master
Choose a base branch
from

Conversation

Craxic
Copy link

@Craxic Craxic commented Mar 22, 2015

Hi there!

I've been working pretty hard on this little project for some time now and I hope you like the stuff I've done. I've added the ability to write out the tree back to a Java file. So it's now possible to simply add and remove functions and variables just like you would do when modifying something like an HTML file. I've also tried to collect all the tokens from the parser and attach them to their model classes. This would make it easy for someone to find the exact location of something inside a Java file. I've also added a test that pulls all of openjdk7 and runs the parser against them to ensure that the parser succeeds without error. It also serializes and then de-serializes the result to ensure that serialization results in an identical tree as the original parse. Unfortunately, it takes too long on Travis (>50 minutes) so I disabled it when running in Travis. I've also re-factored the two source files into many separate source files because I think it's nicer that way. If you don't like it I'm sure we can fix it!

There are a couple of other reasons why you may not want to accept this pull request:

  • I'm almost certain that speed will have decreased a bit due to the many assertions I've added to ensure it is very difficult to create an invalid tree. Also collecting the tokens will take some time. But it's still usable and parses, serializes, re-parses and then compares the two trees over the entirety of the openjdk7 in about 10 minutes on my PC (single core only).
  • This is not compatible with current projects using the library. Many many things have changed, but I thought it was the best thing to do. I'm not sure how many people are really using the library, but I can't imagine it would be too hard for them to simply use an old version of the library or make necessary adjustments to restore compatibility.
  • I added my name in the COPYING file. I've added at least 3752 LOC, so I hope that's OK and doesn't come across as narcissistic...

Anyways, I hope you like it. Let me know!

Craxic and others added 30 commits March 9, 2015 10:33
…fy that output can be parsed and that the parse result is identical to input parse result (but that might be a stretch...)
@musiKk
Copy link
Owner

musiKk commented Apr 1, 2015

Hi Craxic,

I noticed your work on your fork for quite a while. It's very impressive indeed. When you read the following, please keep in mind that even if I voice criticism I think you did a lot of tremendous work and I'm thankful that you put so much time into the project. I assume you had an itch to scratch.

My biggest problem with this PR is that it is so huge. I like to have discrete units of work which are only as big as necessary. This applies to branches and doubly to commits (haven't checked those yet). The branch touches a LOT of things and is basically a complete reorganization of the whole code base. Based on my reaction time you can see that I can't put too much time into the project and this makes reviewing big changes harder than small ones.

I have a few comments on what's probably only a fraction of what actually deserves commenting on. I'll add more once I get to it.

  • Serializing to Java is a great feature. I think this would be more appropriately implemented as a visitor. The way it's currently implemented, it bloats the model. There are many ways in which the model could be serialized: Back to Java as it was or pretty printed, XML, JSON, HTML and more. Those shouldn't be in the model as well.
  • I put off organizing the code into multiple modules because I thought it would be more pythonic to have everything that belongs together be in a single module. I'm not sure what's the best way, I'm not a super experienced Python dev. In addition to that the division is sometimes arbitrary. Do I group expression statements to expressions or statements? Having everything in one file made finding stuff pretty fast and definitely helped during initial development. I haven't made my mind up. What I will definitely not have is putting the burden of importing lots of packages to the user. Some import lists have Java-like proportions. There is one model package to import and that's it. I think this can be achieved by importing the rest from the __init__.py files.
  • If those asserts prove to be too big of a performance hit and cannot be disabled, they won't make it. This too could easily be part of a visitor during tests.
  • I believe in giving credit where credit is due. Seriously. However, in my opinion the license file is not the place to put it as this doesn't scale and... well... "why is this guy mentioned there and I'm not?!". I'd like to add a contributers file (or a section to the readme) and put everybody who contributed and wants a mention in there. A contribution the size of yours would definitely warrant a special thanks section. Is that something that could work for you?

Those are my comments for now. As I said, this may take a while... :)

@Craxic
Copy link
Author

Craxic commented Apr 2, 2015

Hi musiKk,

  • You're right, it does bloat the model. I think you're correct in moving serialization out of there and into something else. I will take a look at doing this during the Easter break. As to where, I'm not sure. I sort of dislike the "visitor" pattern, only because it uses dynamic calling and ruins static analysis by PyCharm (of which I'm a big fan and I think others are too). There are also some other technical limitations I'm concerned about (not sure yet). As for an alternative, I would simply move the serialization methods to module level functions and then move them into their own "serialization" directory. This way I can avoid having a gigantic visitor class/subclasses. Although then they are kind of detached from their model classes... Let me know what you think about that idea, perhaps we can arise at a solution we both like.
  • I'm not a super-experienced Python developer either. I was hoping you were so you'd know the best way to do this 😃. You can indeed import everything in __init__.py, but I remember the only reason I didn't put it in there was because PyCharm wasn't happy about it for whatever reason. I will need to investigate this closer and properly ensure this is the case and investigate alternatives. I agree that the import declarations are huge, but I wanted to avoid from ... import * since people complain about it being nasty for static analysis. If you're not happy with the huge imports, obviously they're gonna have to go. As for the reason I did this in the first place: I believe that splitting code up into multiple files makes things easier for a number of reasons. While the division is arbitrary, and has no functional purpose, I think it makes it easier to deal with working on two things at once (since in many cases they'll be in different files). There are a few other reasons (git being one of them), but it is mainly a personal choice. I don't really ever see an entire project in two files, so I thought it would be best to split them up. Ultimately up to you, of course. But I believe it would be best separate.
  • The assertions aren't there just for testing purposes: they're to help ensure you don't accidentally create an impossible model when modifying it. Like putting a literal as the body of a class declaration. Many of them should really be if/raise statements I think. I'm sure they're going to have some sort of impact, of how much I'm unsure. I guess I'll have to do some testing.
  • Moving contributors to a contributor file works 100% fine for me! I wasn't sure how you'd feel about me adding myself in there like that, but its basically the same as me asking you directly anyway since you review it before pulling it.

Thanks for getting back to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants