Add support for field delimiter detection #43

Open
PoshAlpaca opened this issue Feb 17, 2022 · 2 comments
@PoshAlpaca

Is your feature request related to a problem?

When using CodableCSV to load user-provided CSV files, one currently needs to ask the user which field delimiter is used in their file.

Describe the solution you'd like

It would be nice if CodableCSV had an option to automatically infer the field delimiter from the provided file.

I saw that this feature is on the roadmap, along with row delimiter detection and header detection. There are also already some references to it in the code, with the idea to use auto-detection when the field delimiter is set to nil in the reader's configuration.

I'd be happy to contribute this feature. My idea was to port the dialect detection code from the CleverCSV Python library to Swift.
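To make the idea concrete, here is a minimal sketch of field delimiter inference. It is not CleverCSV's algorithm and not CodableCSV API: it is a much simpler row-consistency heuristic, and the function name `guessFieldDelimiter`, its parameters, and the candidate list are all assumptions made for illustration. It counts how often each candidate appears in every sampled row and picks the candidate whose most common non-zero count covers the largest share of rows. Quoted fields are ignored, which a real implementation would have to handle.

```swift
/// Hypothetical helper (not part of CodableCSV): guesses the field delimiter
/// by picking the candidate that splits the sampled rows most consistently.
func guessFieldDelimiter(
    in sample: String,
    candidates: [Character] = [",", ";", "\t", "|"]
) -> Character? {
    // Split the sample into rows; "\r\n" is a single Character in Swift.
    let rows = sample.split(whereSeparator: \.isNewline)
    guard !rows.isEmpty else { return nil }

    var best: (delimiter: Character, score: Double)? = nil
    for candidate in candidates {
        // Occurrences of the candidate in each row (quoting is ignored here).
        let counts = rows.map { row in row.filter { $0 == candidate }.count }

        // Histogram of the non-zero per-row counts.
        var frequency: [Int: Int] = [:]
        for count in counts where count > 0 {
            frequency[count, default: 0] += 1
        }

        // Most common non-zero count; ties go to the larger count.
        guard let modal = frequency.max(by: { ($0.value, $0.key) < ($1.value, $1.key) })
        else { continue }

        // Share of rows that agree on the modal count.
        let score = Double(modal.value) / Double(rows.count)
        if best == nil || score > best!.score {
            best = (delimiter: candidate, score: score)
        }
    }
    return best?.delimiter
}
```

For example, `guessFieldDelimiter(in: "a;b;c\n1;2;3\n4,5;5;6")` returns `";"`, because every row contains exactly two semicolons while only one row contains a comma. A `nil` result means no candidate appeared in the sample at all.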

Describe alternatives you've considered

An alternative would be to use the CleverCSV library directly. However, that would add a dependency to the project and, more importantly, I'm not sure how well Swift supports calling Python code. I guess it wouldn't work on iOS, for example?

@dehesa what do you think?

@PoshAlpaca PoshAlpaca added the enhancement New feature or request label Feb 17, 2022
@dehesa
Owner

dehesa commented Feb 17, 2022

Hey @PoshAlpaca,

Delimiter inference is indeed something I have always wanted to do and planned for, but I never really got around to it. To be honest, I haven't even begun to think about how to approach the problem. So, if you want to research it and come up with a solution, I will be more than happy to review it.

The inference code is supposed to live here. You would probably want to expand the switch statement to indicate which delimiters the user has provided; i.e.:

  • you know the field delimiter, but not the row delimiter, or
  • you know the row delimiter, but not the field delimiter, or
  • you know neither.
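
As a rough sketch of that switch, assuming a configuration in which `nil` means "infer this delimiter": the types `DelimiterConfiguration` and `InferenceStrategy` and the function `strategy(for:)` below are hypothetical stand-ins for illustration, not CodableCSV's actual types.

```swift
/// Hypothetical stand-in for the reader's delimiter settings.
/// A `nil` value means the delimiter should be inferred.
struct DelimiterConfiguration {
    var field: String?
    var row: String?
}

/// What the reader has to do, given what the user provided.
enum InferenceStrategy: Equatable {
    /// Both delimiters are known; nothing to infer.
    case explicit(field: String, row: String)
    /// The field delimiter is known; the row delimiter must be inferred.
    case inferRow(field: String)
    /// The row delimiter is known; the field delimiter must be inferred.
    case inferField(row: String)
    /// Neither delimiter is known.
    case inferBoth
}

func strategy(for configuration: DelimiterConfiguration) -> InferenceStrategy {
    // Switching on the tuple makes the compiler check that all four cases are covered.
    switch (configuration.field, configuration.row) {
    case let (field?, row?): return .explicit(field: field, row: row)
    case let (field?, nil):  return .inferRow(field: field)
    case let (nil, row?):    return .inferField(row: row)
    case (nil, nil):         return .inferBoth
    }
}
```

Switching on the pair of optionals keeps the three inference cases from the list above next to the fully specified case, so a missing branch is a compile-time error rather than a runtime surprise.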

I don't want to add dependencies to the project, so if you want to take this over, I would ask you to write Swift code directly.

@ws-garcia

Hello @PoshAlpaca and @dehesa. I recently developed a simpler mechanism for detecting CSV file dialects in data ingestion pipelines. The approach is described in a research paper and also has a Python implementation.
