-
-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LT-20351: Fix LT-20351 and implement Novel Root Guesser #245
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #245 +/- ##
==========================================
+ Coverage 69.80% 69.88% +0.07%
==========================================
Files 379 379
Lines 31480 31732 +252
Branches 4400 4450 +50
==========================================
+ Hits 21976 22175 +199
- Misses 8483 8525 +42
- Partials 1021 1032 +11 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 9 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @isaac091 and @jtmaxwell3)
src/SIL.Machine/Annotations/Annotation.cs
line 142 at r2 (raw file):
/// </value> public bool IsNaturalClass
The Annotation
class is intended to be general-purpose. These properties are specific to HC. You can either use the Data
property or add these properties to the FeatureStruct
. If you add them as features, you will need to update HC's feature system, i.e. HCFeatureSystem
.
src/SIL.Machine.Morphology.HermitCrab/CharacterDefinitionTable.cs
line 138 at r2 (raw file):
// Check for pattern language. // NB: This only happens when the characters don't match. if (normalized[i] == '[')
Did you consider using a regex to parse the pattern language?
src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
line 88 at r2 (raw file):
} public IList<RootAllomorph> LexicalPatterns
This should return a read-only type, such as IEnumerable
.
src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
line 135 at r2 (raw file):
File.WriteAllLines("analyses.txt", lines.OrderBy(l => l)); #endif var origAnalyses = guessRoot ? analyses.ToList() : null;
Please only use var
when the type is explicit elsewhere in the line.
src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
line 137 at r2 (raw file):
var origAnalyses = guessRoot ? analyses.ToList() : null; var syntheses = Synthesize(word, analyses); if (guessRoot && syntheses.Count() == 0)
C# uses lazy evaluation for LINQ functions. If you are going to iterate over an enumerable multiple times (Count()
will iterate over the enumerable), you should first convert it to a list or array.
src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
line 372 at r2 (raw file):
IEnumerable<ShapeNode> shapeNodes = input.Shape.GetNodes(input.Range); HashSet<string> shapeSet = new HashSet<string>(); foreach (RootAllomorph lexicalPattern in _lexicalPatterns)
Did you consider implementing this using an FSA or the Pattern
class?
src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
line 430 at r2 (raw file):
if (prefix == null) prefix = new List<ShapeNode>(); if (pattern.Count() == p)
You should use the Count
property.
src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
line 440 at r2 (raw file):
// Try skipping this item in the pattern. results.AddRange(MatchNodesWithPattern(nodes, pattern, n, p + 1, false, prefix)); if (nodes.Count() == n)
You should use the Count
property.
src/SIL.Machine.Morphology.HermitCrab/XmlLanguageLoader.cs
line 720 at r2 (raw file):
_language.NaturalClasses.Add(nc); _natClasses[(string)natClassElem.Attribute("id")] = nc; foreach (var table in _language.CharacterDefinitionTables)
Please only use var
when the type is explicit elsewhere in the line.
I didn't use a regular expression to find or match patterns because I wanted this to be backwards compatible with existing projects. The characters '[', ']', '(', ')', and '*' all could be valid characters in the character table. So, I can't tell if a lexeme with [...] in it is a pattern without analyzing the shape. I only recognize the pattern if the lexeme would have an invalid shape otherwise. If I have to analyze the shape anyway, it isn't hard to make the shape a pattern to match against other shapes. I fixed the other issues. |
One advantage of matching the shapes rather than using a regular expression is that I unify the feature structures while matching. So, the pattern can add some details to the lexeme being matched (such as accent marks or tones). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 7 files at r3, 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @isaac091 and @jtmaxwell3)
src/SIL.Machine/Annotations/Annotation.cs
line 140 at r5 (raw file):
/// <c>true</c> if this annotation is iterative, otherwise <c>false</c>. /// </value> public bool Iterative
This property is also specific to HC.
src/SIL.Machine.Morphology.HermitCrab/CharacterDefinitionTable.cs
line 138 at r5 (raw file):
// Check for pattern language. // NB: This only happens when the characters don't match. if (normalized[i] == '[')
This should be restricted so that it only applies to root allomorphs. You could pass in an optional Boolean that enables this parsing code.
src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
line 372 at r2 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
Did you consider implementing this using an FSA or the
Pattern
class?
Machine already provides multiple ways to implement this type of functionality (FSA or Pattern
class). One of these methods should be used. If you don't have time to implement this right now, you should add a TODO
comment that indicates that this should be reimplemented in the future using FSAs.
src/SIL.Machine.Morphology.HermitCrab/RootAllomorph.cs
line 35 at r5 (raw file):
/// Does this represent a lexical pattern (e.g. [Seg]*)? /// </summary> public bool IsPattern
I would move this code to the constructor, so that it is only computed once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r6, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @isaac091 and @jtmaxwell3)
src/SIL.Machine/Annotations/ShapeNode.cs
line 51 at r6 (raw file):
/// Whether this is an iterative node in a lexical pattern. /// </summary> public bool Iterative
You should use an extension method. See the HermitCrabExtensions.cs
file for examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r7, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @isaac091)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r8, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @isaac091)
Sorry for all of the formatting errors. I don't have a Resharper license and I'm new to C#.
This change is