Protected names are not sanitized correctly in Data Lake Sink / Influx sink #2166

Closed
bossenti opened this issue Nov 10, 2023 · 3 comments · Fixed by #2172
Labels
bug: Something isn't working
connect: Related to the `connect` module (adapters)

Comments

@bossenti
Contributor

Apache StreamPipes version

dev (current development state)

Affected StreamPipes components

Connect

What happened?

When I read the following CSV file with the FileStream adapter and persist the resulting data stream in the data storage, unquoted string values are lost.
Only string values that are explicitly quoted are kept, although quoting should not be necessary in a CSV file.

ts,testId,name,assetId
1000,1,Tom,“5“
2000,2,Albert,“1“

How to reproduce?

[three screenshots]

Expected behavior

No response

Additional technical information

No response

Are you willing to submit a PR?

None

bossenti added the bug (Something isn't working) label Nov 10, 2023
bossenti added this to the 0.95.0 milestone Nov 10, 2023
bossenti added the connect (Related to the `connect` module (adapters)) label Nov 10, 2023
@muyangye
Member

Hi @bossenti, I have looked into the issue. The event is "correct" (meaning its key-value pairs match the file) in the frontend, the Kafka producer, and the Kafka consumer. However, once the code reaches the Influx store, the runtime names of some keys are modified. The root cause is that the "name" key in the file stream is on InfluxDB's list of reserved keywords, and StreamPipes sanitizes the DataLakeMeasure for keys that conflict with those reserved keywords, but not the event itself. If you rename "name" to something that is not on the list, everything works.

To fix this, I propose sanitizing the event as well, instead of only the DataLakeMeasure. Let me know what you think🙂!
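The mismatch described above can be illustrated with a simplified, hypothetical model: the writer iterates over the sanitized property names of the measure and looks each one up in the unsanitized event. The reserved-word subset, the `_` suffix, and the class and method names (`SanitizationMismatch`, `sanitize`, `buildPoint`) are assumptions for illustration only, not StreamPipes' actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class SanitizationMismatch {
    // Hypothetical subset of InfluxQL keywords; the real list is much longer.
    static final Set<String> RESERVED = Set.of("name", "key", "field", "tag");

    // Hypothetical rule: append "_" to reserved names (StreamPipes' actual rule may differ).
    static String sanitize(String runtimeName) {
        return RESERVED.contains(runtimeName.toLowerCase(Locale.ROOT)) ? runtimeName + "_" : runtimeName;
    }

    // Simplified writer: iterates over the measure's property names and looks each one up in the event.
    static Map<String, Object> buildPoint(List<String> propertyNames, Map<String, Object> event) {
        Map<String, Object> fields = new LinkedHashMap<>();
        for (String property : propertyNames) {
            Object value = event.get(property);
            if (value != null) {
                fields.put(property, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // One row of the CSV from the issue, as key-value pairs
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("ts", 1000L);
        event.put("testId", 1);
        event.put("name", "Tom");
        event.put("assetId", "5");

        // Only the measure is sanitized: "name" becomes "name_" ...
        List<String> propertyNames = event.keySet().stream()
                .map(SanitizationMismatch::sanitize)
                .collect(Collectors.toList());
        // ... but the event still uses "name", so the lookup of "name_" finds nothing.
        System.out.println(buildPoint(propertyNames, event)); // prints {ts=1000, testId=1, assetId=5}
    }
}
```

In this model, the "name" value silently disappears from the stored point, which matches the symptom reported in the issue.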

@bossenti
Contributor Author

Hi @muyangye,

you are right, changing the column name makes it work.

As you said, sanitization should prevent naming conflicts and therefore rename the name column. But then the expected outcome is that the values are stored in the DataLakeMeasure under the sanitized name, which doesn't seem to work here.

Where would you place the sanitization of the event? In this case, sanitization is a requirement specific to InfluxDB, so I'm a bit hesitant to affect other uses of the event stream as well.

@muyangye
Member

Just published a PR! The scope of the sanitization is limited to InfluxDb.
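A minimal sketch of the idea of limiting sanitization to the InfluxDB writer, under the same assumptions as before (hypothetical reserved-word subset, `_` suffix, and names such as `InfluxEventSanitizer` and `sanitizeEvent`): the writer sanitizes a copy of the event with the same rule used for the measure, so the event keys and the measure's property names line up, while other consumers of the stream keep the original runtime names. This is an illustration, not the code from #2172.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class InfluxEventSanitizer {
    // Hypothetical subset of InfluxQL keywords; the real list is much longer.
    static final Set<String> RESERVED = Set.of("name", "key", "field", "tag");

    // Hypothetical rule, identical to the one assumed for the measure.
    static String sanitize(String runtimeName) {
        return RESERVED.contains(runtimeName.toLowerCase(Locale.ROOT)) ? runtimeName + "_" : runtimeName;
    }

    // Returns a new map with sanitized keys; the original event is left untouched,
    // so the renaming stays local to the InfluxDB write path.
    static Map<String, Object> sanitizeEvent(Map<String, Object> event) {
        Map<String, Object> sanitized = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : event.entrySet()) {
            sanitized.put(sanitize(entry.getKey()), entry.getValue());
        }
        return sanitized;
    }

    public static void main(String[] args) {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("testId", 1);
        event.put("name", "Tom");

        Map<String, Object> forInflux = sanitizeEvent(event);
        System.out.println(forInflux); // prints {testId=1, name_=Tom}
        System.out.println(event);     // prints {testId=1, name=Tom}
    }
}
```

Building a new map instead of renaming keys in place also avoids modifying a map while iterating over it. A real implementation would additionally need to handle collisions, for example an event that already contains both `name` and `name_`.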

bossenti linked a pull request Nov 13, 2023 that will close this issue
bossenti added a commit that referenced this issue Nov 14, 2023
* implement new round processor

* add English locale, icon, and documentation

* fix checkstyle

* support different rounding modes

* add rounding mode in documentation

* fix time display

* let NaryMapping selection account for property scope

* implement boolean filter unit tests

* add common StoreEventCollector class and refactor TestChangedValueDetectionProcessor

* add new class

* show associated pipelines' names and allow one click deletion

* center text

* fix minor error

* replace magic number

* add timeout

* restore newline

* change baseurl

* revert port

* revert timeout

* implement pipelines owner check

* undo automatic changes

* enable admin to delete pipelines no matter ownership

* sanitize event

* add newline back

* fix iter is on a copy

---------

Co-authored-by: bossenti <[email protected]>
bossenti changed the title "CSV reader in FileStream adapter cannot handle strings" to "Protected names are not sanitized correctly in Data Lake Sink / Influx sink" Nov 14, 2023
bossenti added a commit that referenced this issue Nov 14, 2023
bossenti modified the milestones: 0.95.0, 0.93.0 Nov 14, 2023
2 participants