Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code review and repo structure - suggestions #20

Open
akolonin opened this issue Dec 28, 2020 · 0 comments
Open

Code review and repo structure - suggestions #20

akolonin opened this issue Dec 28, 2020 · 0 comments
Labels
enhancement New feature or request

Comments

@akolonin
Copy link
Member

akolonin commented Dec 28, 2020

Comments and suggestions from my Russian colleague @rssdev10 - googletranslated below:

  1. Tried to find a test that verifies the code ... not so easy this .... Your project is not structured. https://github.com/aigents/aigents-java-nlp/tree/master/src/test/java/org/aigents/nlp/gen. Such files are usually placed in src / test / resources. And just src / test / java / contains exactly the test code
  2. Project file, I would recommend using gradle…. Really easier to maintain. If you need help, I can do it, but it is advisable to change the project structure a little
  3. Found a suspicious place - Does this really need to be done strictly sequentially?…. https://github.com/aigents/aigents-java-nlp/blob/master/src/main/java/org/aigents/nlp/gen/Segment.java#L1205
return l.replace("(", "").replace(")", "").replace("[", "").replace("]", "").replace("}", "").replace("{", "")
    .trim();
  1. Another suspicious place is https://github.com/aigents/aigents-java-nlp/blob/master/src/main/java/org/aigents/nlp/gen/Segment.java#L1300 - Computational complexity str.contains 8 times in the worst case .... (There is https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html which can be used)
if (str.isEmpty() || str.contains("[") || str.contains("]") || str.indexOf(".") != str.length() - 1 
      || str.contains("(") || str.contains(")") || str.contains("{") || str.contains("}")
      || str.contains("\"") || str.contains("'"))
  1. Also, I recommend running everything through a static analyzer as part of Intellij IDEA
  2. I see the sentence splitter call code in https://github.com/aigents/aigents-java-nlp/blob/master/src/main/java/org/aigents/nlp/gen/Segment.java#L92 but something very lot. And for any benchmark, we need to remove the console output from here. Otherwise it will significantly slow down
  3. The code was not written for use as a library ... https://github.com/aigents/aigents-java-nlp/blob/master/src/main/java/org/aigents/nlp/gen/Segment.java#L748 - will kill the process .... And System.err.println instead of output to a log is also kind of not a very good style.
  4. In general, it will be quite interesting to compare. We now consider the F-measure, productivity. And there is an opportunity to see the errors made when splitting.
@akolonin akolonin added the enhancement New feature or request label Dec 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant