-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for org.openx.data.jsonserde.JsonSerDe in Hive connector #5638
Comments
Beware of rcongiu/Hive-JSON-Serde#228 if you’re going to handle timestamps with that serde. |
Some other previous past reports of problems with openx json serde -- https://prestosql.slack.com/archives/CGB0QHWSW/p1595400446310500 |
We recommend using |
Guys, we really need this issue. |
Hey guys. I don't think it's an enhancement. I think it really mandatory. Is there anything I can do to advance the issue? |
would u like to code up a PR? @GalDayan |
Sure. I would love to help. |
@GalDayan whip up something that works on your local first can deal with the paperwork later |
@pettyjamesm is there any other code version we should be using instead? |
@findepi none that I’m aware of. We’ve maintained a version with some other thread safety fixes (eg: unsynchronized static map access: https://github.com/rcongiu/Hive-JSON-Serde/blob/5c7ca56d084c6b507dcf78ad16b66e01c518201f/json-serde/src/main/java/org/openx/data/jsonserde/objectinspector/JsonObjectInspectorFactory.java#L32) but doubt we’ll be taking on the task of maintaining an open source fork, which is what seems to be what’s proposed here. |
Any updates? @electrum |
Also having the same issue when upgrading with 341+. We have tables in the data lake populated using openx JSON serde. It will very difficult for us to migrate to a different serde. |
We looked into this and it turns out the OpenX serde is incompatibility licensed and cannot be included here. We discussed this with @martint @electrum and myself, and it seems the best course of action is to reimplement the OpenX serde, avoiding the problematic dependency, and also avoiding the problematic dependency on Hive APIs (which would help avoid things like #5602). I am not aware about anyone working on this currently. |
hey guys, is there an ETA on support for openx serde with trino? even we are facing a similar issues as pointed out by @chickenPopcorn here would be great to see this support made available :) |
I managed to get a Trino query to return data from a table using I have no idea how broken this method is / if there are concurrency issues, but it did return data :) Basically, it just requires getting the $ curl -o ./plugin/hive/json-serde-1.3.8-jar-with-dependencies.jar http://www.congiu.net/hive-json-serde/1.3.8/cdh5/json-serde-1.3.8-jar-with-dependencies.jar |
The publicly available openx serde is _very _ broken when used from within Trino, because it isn’t written in a way that tolerates separate threads reading data concurrently on the same JVM. Athena has many, many patches to try to fix the issues we have found over time and it seems like we’re always finding more. When I’ve tried contributing some of those patches upstream, the PRs have sat open without review comments or being merged for years. The project seems entirely unmaintained, but I can assure you that if you’re using the publicly available artifact- you will get incorrect results. Some of those will be silent and others can cause loud failures. I do not recommend using it. |
Understood, good to know! I've come to the conclusion that re-using Athena objects in Trino is generally unsupported (views dont work either). Unfortunately the recommended SerDe for jsonlines files ( Basically, this: https://stackoverflow.com/questions/58606743/hive-table-with-nested-json-as-string-value |
@pettyjamesm @erindru , |
Starburst OpenX serde fork has some fixes regarding concurrent parsing of timestamps which was using thread-unsafe classes in the original code. There are multiple features added as well, such as compatibility with Hive 3 APIs. Check out the develop branch. |
Checklist
The text was updated successfully, but these errors were encountered: