Document Azure Blob Storage support #256
Comments
In which form should we address this? I would think of it as a continuation of https://github.com/zero-one-group/geni/blob/develop/docs/kubernetes_basic.md. At the end of the Kubernetes setup, the next natural question is: There are quite a few options for it:
All of these are a bit complex and depend on the concrete setup. They also have little to do with Geni itself and are documented elsewhere.
Maybe the best option is just to point this out at the end of kubernetes_basic.md, without giving a solution (because there are so many).
I added a chapter to the Kubernetes documentation accordingly: https://github.com/behrica/geni/blob/develop/docs/kubernetes_basic.md
Hi @behrica, thank you again for bringing this up.
Yes, I think that's probably a good place to put it. I think some docs do get a bit long, which is fine.
In that case, perhaps we can link to those documents? I still think it's good to have an example of a working version. I'm happy to try out one of your examples and get it working!
Would you like to make a PR? I believe there are some typos - I hope you're okay with it being reviewed 😄
I made a PR.
It does not yet add anything for reading files from storage. Any realistic example would need to assume the existence of some form of cloud storage for the data.
I think it's great! I'm just reviewing the styling, so that it's a bit more consistent throughout the repo 😄
Ah, I see, in that case it may become a bit too involved to set up. Could we work with some public Azure Storage files? I'm not sure what's available out there, though.
I did a complete walk-through that starts from zero and goes up to analysing a 10GB CSV file stored in a newly created Azure File Storage, using Geni on an AKS cluster. All the commands are there, but I have not written any text yet. @anthony-khong Do you have a way to try it out and give me some feedback? If you copy and paste all the commands one after another, it should all work. Starting from scratch made it rather long, but this way it's easily reproducible.
Hi @behrica, absolutely, I'll give it a go in the coming days and report back to you! It looks really neat!
Hi @behrica, I've given it a go, and, as before, it works as expected! And I've never used Azure before, so that's great! Really looking forward to merging this. I've got some comments and feedback.
Please let me know if you'd like me to chip in on some of these. I'd be very happy to work on it! I think this is a really cool guide. I would love to make a lein template that has everything here in it. All you do is
Thanks for the feedback, I will take it on board when writing the text. Working in a "kubectl exec" terminal is not the most comfortable experience in the world, so what I do personally is start an nREPL server on the driver node instead of shelling into it. I can then connect to it remotely from Emacs or any other nREPL client. I could add this as an optional step at the end.
I think that's a really good point. Copying and pasting into the terminal is not the end of the world, but it would absolutely be a deal breaker for some people. I think we can have a main-like script that gets executed during startup - it can look like Geni's main, but with a different [init-eval](https://github.com/zero-one-group/geni/blob/develop/src/clojure/zero_one/geni/main.clj#L24) where we create a SparkSession that connects to an Azure cluster and fix the port instead of picking a random one. Then, in a separate terminal instance, we do
I am still not sure if I want to go the Geni CLI way. How long will it take until I want to add dependencies to it?
I did it without using the Geni CLI, just by adding the nREPL start to the command used by the string container. This is also a potentially realistic usage scenario, in which the:
This is perhaps still an enterprise scenario, as the Kubernetes cluster costs money for as long as it exists.
We could potentially make a bash script that does the whole setup at the press of a key, including copying a data file into the blob storage (this can be the most time-consuming part). This would be more attractive for users who don't want to keep a long-running Kubernetes cluster and blob storage.
Yes, this is exactly what I meant! I agree with you, Geni CLI serves the simple use case of getting up and running quickly (most realistically on a local machine). Instead of a bash script, what do you think about making a lein template where you could just do What's the most effective way for me to help you with this? It sounds like a great addition to the library, and I would love to chip in here.
I thought about this. The setup script consists mostly of calls to "az", which could easily be shelled out to, or replaced with the proper Java/Clojure client. I have a working bash script ready; I will share it with you and you can have a look.
Please find the working setup script here: https://github.com/behrica/geni/blob/azure_storage_doc/docs/azureSetup/setupKubernetes.sh. It does all the tasks from here: in one go. At the beginning there are some parameters to set, if needed. After it finishes, you can set up two port forwards (in different shells):
and
to have the nREPL port and the Spark web UI proxied to your local machine.
I am not sure in which form the script could be reused, either as is or with some modifications. The concrete setup has so many moving parts, which a user may eventually want to do differently.
@behrica, that sounds awesome! Would you mind sharing what config options to pass, and we can add it to the docs or the README? 😄
Originally posted by @anthony-khong in #228 (comment)