Ignore a note that occurs right after \id marker #203

Merged: 1 commit, Jun 4, 2024

Conversation

Contributor

@ddaspit ddaspit commented May 23, 2024

Collaborator

@Enkidu93 Enkidu93 left a comment

Reviewed 3 of 4 files at r1, all commit messages.
Reviewable status: 3 of 4 files reviewed, 1 unresolved discussion (waiting on @ddaspit)


src/SIL.Machine/Corpora/UsfmTokenizer.cs line 393 at r1 (raw file):

                            if (
                                usfm[usfm.Length - 1] == ' '
                                && ((prevToken != null && prevToken.ToUsfm().Trim() != "") || !tokensHaveWhitespace)

What's the point of this logic here? I'm not quite getting it.

@codecov-commenter

Codecov Report

Attention: Patch coverage is 95.65217%, with 1 line in your changes missing coverage. Please review.

Project coverage is 67.32%. Comparing base (a19d577) to head (6cc0bc7).

| Files | Patch % | Lines |
|---|---|---|
| src/SIL.Machine/Corpora/UsfmTokenizer.cs | 93.75% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #203      +/-   ##
==========================================
+ Coverage   67.31%   67.32%   +0.01%     
==========================================
  Files         441      441              
  Lines       35001    35021      +20     
  Branches     4695     4700       +5     
==========================================
+ Hits        23560    23579      +19     
  Misses      10352    10352              
- Partials     1089     1090       +1     

Collaborator

@Enkidu93 Enkidu93 left a comment

Reviewed 1 of 4 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit)


src/SIL.Machine/Corpora/UsfmTokenizer.cs line 393 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

What's the point of this logic here? I'm not quite getting it.

Sorry, I haven't approved this yet because I was hoping to get a better handle on this logic. Could you explain what's going on, @ddaspit?

Contributor Author

@ddaspit ddaspit left a comment

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)


src/SIL.Machine/Corpora/UsfmTokenizer.cs line 393 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Sorry, I haven't approved this yet because I was hoping to get a better handle on this logic. Could you explain what's going on, @ddaspit?

This checks to see if we need to strip off the space at the end before a newline. There are two cases that we have to handle:

  1. The tokens contain whitespace. This occurs when we are trying to preserve the whitespace from the original USFM.
  2. The tokens do not contain whitespace. This occurs when we want to normalize the whitespace.

This logic is preserved from the original USFM parser code in Paratext.
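
To make the two cases concrete, here is a minimal sketch of the condition in isolation. It is not the code from `UsfmTokenizer.cs`: the helper name `ShouldStripTrailingSpace` is hypothetical, the previous token is reduced to its USFM text (a plain string, `null` when there is no previous token) instead of a `UsfmToken`, and the empty-buffer guard is an assumption added so the sketch is safe to call on its own.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of SIL.Machine. It mirrors the quoted condition,
// with prevToken.ToUsfm() replaced by a plain string (null means "no previous token").
static bool ShouldStripTrailingSpace(StringBuilder usfm, string prevTokenUsfm, bool tokensHaveWhitespace)
{
    // Assumed guard: nothing to strip if the buffer is empty or does not end in a space.
    if (usfm.Length == 0 || usfm[usfm.Length - 1] != ' ')
        return false;

    // True when the previous token is something other than pure whitespace.
    bool prevTokenHasContent = prevTokenUsfm != null && prevTokenUsfm.Trim() != "";

    // Tokens without whitespace (normalizing): always drop the trailing space.
    // Tokens with whitespace (preserving): drop it only after a non-blank token.
    return prevTokenHasContent || !tokensHaveWhitespace;
}

var buffer = new StringBuilder("\\v 1 In the beginning ");

// Normalizing: the trailing space is stripped before the newline.
Console.WriteLine(ShouldStripTrailingSpace(buffer, "In the beginning ", false)); // True

// Preserving, previous token is whitespace only: the space is kept.
Console.WriteLine(ShouldStripTrailingSpace(buffer, " ", true)); // False
```

In this reading, a caller would shorten the buffer by one character (for example `usfm.Length--`) whenever the helper returns `true`, before starting the next line.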

Collaborator

@Enkidu93 Enkidu93 left a comment

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit)


src/SIL.Machine/Corpora/UsfmTokenizer.cs line 393 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This checks to see if we need to strip off the space at the end before a newline. There are two cases that we have to handle:

  1. The tokens contain whitespace. This occurs when we are trying to preserve the whitespace from the original USFM.
  2. The tokens do not contain whitespace. This occurs when we want to normalize the whitespace.

This logic is preserved from the original USFM parser code in Paratext.

Thank you!

@ddaspit ddaspit merged commit bf2b46d into master Jun 4, 2024
4 checks passed
@ddaspit ddaspit deleted the note-after-id branch June 4, 2024 19:23
Development

Successfully merging this pull request may close these issues.

"Stack empty" error for invalid USFM
3 participants