Problem with quotes in tsv files #40
Comments
Parsing and error detection in this utility are handled by https://pkg.go.dev/encoding/csv#Reader, which seems to complain if a quote is not the first or last character in the field.
Only the second case throws the error for me.
It certainly could be the second case. Since this is foreign-language prose and not 'clean' text, the expectation is that when a file is defined as tab-delimited, it should not matter whether or where a quote occurs. So in your second example the text should 'properly' lint as follows, with \t replaced by a line feed: dondon So it looks like the bug is in csv#Reader? I'm really just checking that the number of columns is accurate. For now awk will do the job, but it would be great to see TSV handled correctly here.
I'm not certain whether it's a bug or not, because the Reader docs are not explicit about tab-delimited data.
I'll just use awk. The whole point of tab delimiters is to avoid the numerous problems of quote delimiters. In a tab-delimited file, a quote should be treated as nothing more than another character in the string. I guess csv#Reader is true to its name: comma-separated. It does not handle tab-delimited data correctly.
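For comparison, the quote-agnostic column check described in this comment can be sketched in Go without encoding/csv at all. This is an assumption about what the awk workaround does (count tab-separated fields per line), not the actual awk script, and the function name badRows and the sample data are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// badRows (hypothetical helper) returns the 1-based numbers of lines
// whose tab-separated field count differs from want. Quotes get no
// special treatment: only tabs and line breaks are structural.
func badRows(input string, want int) []int {
	var bad []int
	sc := bufio.NewScanner(strings.NewReader(input))
	// Note: the default Scanner rejects lines longer than 64 KiB, and
	// sc.Err() is not checked here to keep the sketch short.
	for n := 1; sc.Scan(); n++ {
		if strings.Count(sc.Text(), "\t")+1 != want {
			bad = append(bad, n)
		}
	}
	return bad
}

func main() {
	// Made-up data: quotes appear at the start and in the middle of fields.
	data := "id\tword\tnote\n1\t\"don\tok\n2\tdon\"t\n"
	fmt.Println(badRows(data, 3)) // [3]
}
```

Here line 2 passes even though a field begins with a quote, and line 3 is flagged only because it has two fields instead of three.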
When linting tsv files, I get:
Record 1035 is as follows. But since this is TSV (chosen for this very reason), shouldn't quotes be ignored entirely rather than reported as an error?