This repository has been archived by the owner on Dec 21, 2022. It is now read-only.

no supported spark driver for datastore #3

Open
reactivedev opened this issue Oct 15, 2018 · 1 comment

Comments

@reactivedev

Dear Google,

I have been a GCP user for the past 6 months and I would like to take this opportunity to report my agony. PLEASE DO NOT FOOL DEVELOPERS WITH FALSE EXAMPLES!
Google doesn't provide a supported Spark driver for either Pub/Sub or Datastore. It's a shame. Even worse are the following lines of code:

def saveRDDtoDataStore(tags: Array[Popularity], windowLength: Int): Unit

Please read the function name: it says "saveRDD", yet the function accepts an array. This is called cheating.

Even worse:

sortedHashtags.foreachRDD(rdd => {
    handler(rdd.take(n)) //take top N hashtags and save to external source
})

Do you know the consequences of using take? Are you a Spark developer?
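
For context, `take(n)` returns the first n elements to the driver as a local `Array`, so the write in the sample happens on the driver rather than on the executors. A minimal sketch of a per-partition write is shown below for comparison. It assumes a hypothetical shape for `Popularity` and a hypothetical `writeOne` function standing in for a datastore client call; neither is taken from the sample. The helper is plain Scala so it can be tried without a cluster, and the trailing comment shows where it would plausibly attach via `foreachPartition`.

```scala
// Hypothetical record shape; the sample's actual Popularity class may differ.
final case class Popularity(tag: String, amount: Int)

object PartitionWriter {
  // Writes every record of one partition and returns how many were written.
  // `writeOne` is a stand-in for a real datastore call (an assumption).
  def savePartition(records: Iterator[Popularity], writeOne: Popularity => Unit): Int = {
    var written = 0
    records.foreach { r =>
      writeOne(r)
      written += 1
    }
    written
  }
}

// With Spark on the classpath, this would run on the executors:
//   sortedHashtags.foreachRDD { rdd =>
//     rdd.foreachPartition { part =>
//       PartitionWriter.savePartition(part, writeOne)
//       ()
//     }
//   }
```

Note the trade-off: this writes every record in the RDD rather than only the top N, so it is not a drop-in replacement for the sample's `take(n)` call.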

I had to go to great lengths to ensure I don't ack (Pub/Sub) before I process my records. I had to resort to a sub-optimal plan B (broadcast variables) because the Datastore driver didn't support stream joins.
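
A minimal sketch of the process-then-ack ordering described here, in plain Scala. The `Message` type and the `process` and `ack` callbacks are hypothetical placeholders, not the Pub/Sub client API. The idea is only that a message is acknowledged after its processing succeeds and is otherwise left unacknowledged so that it can be redelivered.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical message type; a real Pub/Sub client exposes its own classes.
final case class Message(ackId: String, payload: String)

object AckAfterProcess {
  // Processes each message and acknowledges it only if processing succeeded.
  // Returns the ackIds that were acknowledged; failed messages stay unacked
  // so they can be redelivered later.
  def handle(messages: Seq[Message], process: Message => Unit, ack: String => Unit): Seq[String] =
    messages.flatMap { m =>
      Try(process(m)) match {
        case Success(_) =>
          ack(m.ackId)
          Some(m.ackId)
        case Failure(_) =>
          None // leave unacknowledged for redelivery
      }
    }
}
```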

It's a fact that you want to capture your big clients by forcing them to use proprietary software like gRPC and Cloud Dataflow by not providing proper drivers for Spark. Why beat around the bush?

What a shame! Remember "DON'T BE EVIL"? This is evil.

@theacodes

@jphalip, @texasmichelle, @holdenk can you take a look at this bug report?
