Unable to update cluster with new vibe_core code. #166

Open
chetan2309 opened this issue May 7, 2024 · 15 comments
Labels: bug · local cluster · management script

Comments

@chetan2309

In which step did you encounter the bug?

Management script

Are you using a local or a remote (AKS) FarmVibes.AI cluster?

Local cluster

Bug description

Hi Team,

I am working on extending the spectral indices notebook workflow.

This is what I wish to achieve:

  1. Once the images for the various indices are generated, I would like to store them in a field-level hierarchy in our application's blob container. So, I am writing a custom component that uploads these images to the blob container.

So, I have defined a task and an edge in the indices.yaml workflow, like below:

upload_to_azure:
    op: upload_to_azure_op
    parameters:
      storage_account_name: admadatastore
      container_name: boundaryimages
      prefix: indices

and I have defined a method in the upload_to_azure_op.py file to do the task.
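
Roughly, the upload helper looks like this (a simplified sketch, not the exact code; it assumes the azure-storage-blob and azure-identity packages, and the upload_files name is just illustrative):

import os
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

def upload_files(file_paths, storage_account_name, container_name, prefix):
    # Authenticate against the storage account (assumes the identity running
    # this code has write access to the target container)
    account_url = f"https://{storage_account_name}.blob.core.windows.net"
    service = BlobServiceClient(account_url=account_url, credential=DefaultAzureCredential())
    container = service.get_container_client(container_name)
    for path in file_paths:
        blob_name = f"{prefix}/{os.path.basename(path)}"
        with open(path, "rb") as data:
            # Store each index image under the configured prefix
            container.upload_blob(name=blob_name, data=data, overwrite=True)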

Now, as I understand it, I would have to run

pip install ./src/vibe_core

to install this package and then do

farmvibes-ai local update --cluster-name mycluster

to deploy this updated code to the cluster. However, the update fails with the error below:

(screenshot: error output from farmvibes-ai local update)

I also tried stopping the cluster first and then updating, but no luck.

Steps to reproduce the problem

  1. Make a change in any file of the src/vibe_core/vibe_core package that you wish to deploy.
  2. Run command - farmvibes-ai local update --cluster-name mycluster
  3. The cluster fails to update with this error:
    (screenshot: error output)
@chetan2309 added the bug label May 7, 2024
@github-actions bot added the triage, local cluster, and management script labels May 7, 2024
@renatolfc
Contributor

Hi. What’s the output of docker ps?

@chetan2309
Author

@renatolfc - Here it is.

(screenshot: docker ps output)

@chetan2309
Author

@renatolfc - Any updates on your end? This is a blocker for me.

@chetan2309
Author

@renatolfc - I wrapped my head around the logic of updating the cluster. This is how it goes:

  1. Get the old_k3d (I'm assuming this means the current cluster, right?) -
     old_k3d = K3dWrapper(os_artifacts, OLD_DEFAULT_CLUSTER_NAME)
  2. Check if a cluster exists -> if one exists, ask for confirmation to delete clusters using the old format.
  3. Once confirmed -> destroy the old registry -> this is where it gets interesting:
def destroy_old_registry(
    os_artifacts: OSArtifacts, cluster_name: str = OLD_DEFAULT_CLUSTER_NAME
) -> bool:
    container_name = f"k3d-{cluster_name}-registry.localhost"
    docker = DockerWrapper(os_artifacts)
    try:
        result = docker.get(container_name)
        if not result:
            return True
        docker.rm(container_name)
        return True
    except Exception as e:
        log(f"Unable to remove old registry container: {e}", level="warning")
        return False

It seems there is an assumption about what the name of the old registry looks like. I am running FarmVibes on a VM, and the docker ps command gives me output like this:

(screenshot: docker ps output)

However, the code expects a container name like this: k3d-farmvibes-ai-registry.localhost

Of course, this finds no matching container and the process exits.

@chetan2309
Author

@rafaspadilha / @brsilvarec - Bringing this to your attention, since this thread is not getting traction. Please look into this as well.

@rafaspadilha
Contributor

rafaspadilha commented May 16, 2024

Hey, @chetan2309. Sorry for the delay; a new release is coming next week and the team is fully focused on that.

We currently do not support adding a custom op to FarmVibes.AI. This is on our roadmap, but for now the best way to achieve what you want is through a script interacting with the FarmVibes client. Once the workflow runs are finished, iterate over the output dictionary of each run, retrieve the paths to the index rasters, and upload them to your blob storage.

The spectral index notebook has a few examples showing how to interact with the output and retrieve the paths.
Let us know if you have any problem doing that.
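
As a rough sketch of that flow (the workflow name, the "index" sink name, and the raster_asset.url access below are illustrative; check the spectral index notebook for the exact ones, and swap in your own upload code for the upload_files call):

from datetime import datetime
from urllib.parse import unquote, urlparse

import shapely.geometry as shpg
from vibe_core.client import get_default_vibe_client

client = get_default_vibe_client()

# Illustrative inputs: replace with the field geometry and date range from your app
geometry = shpg.box(-88.07, 41.62, -88.06, 41.63)
time_range = (datetime(2023, 6, 1), datetime(2023, 8, 1))

# Workflow and run names are illustrative -- use the ones from the notebook
run = client.run("farm_ai/agriculture/spectral_index", "indices for my field",
                 geometry=geometry, time_range=time_range)
run.block_until_complete()

# run.output maps each sink name to a list of outputs; "index" is illustrative
for raster in run.output["index"]:
    local_path = unquote(urlparse(raster.raster_asset.url).path)
    upload_files([local_path], "admadatastore", "boundaryimages", "indices")  # your upload helper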


About the error you are seeing with the update command: at some point we changed how we name the registry, and probably forgot to update the destroy_old_registry method accordingly. Sorry about that; a fix will come in the future.

As a workaround, you can stop and remove the registry container with:

$ docker stop farmvibes-ai-registry
$ docker rm farmvibes-ai-registry

The update command should work after that.

Another option would be to destroy the old cluster and create it again with:

$ farmvibes-ai local destroy
$ farmvibes-ai local setup

In both cases, your data (cache) wouldn't be affected.

@rafaspadilha removed the triage label May 17, 2024
@chetan2309
Author

@rafaspadilha - thanks for your help and the suggested flow. Can you help me understand what the workflow would look like?

  1. A user will create a Field in our ADMA app.
  2. On field creation, we are thinking of calling, let's say, the "spectral indices" workflow, maybe via an Azure Function.
  3. Since it could be a long-running process, what is the best way to poll? We can't use run.monitor()?
  4. I am thinking of saving an association between my field_id in ADMA and the run_id. Once the process completes (checked via the run_id), I will iterate over the outputs as you suggested?

We will be calling this process via our app/services and not through notebooks.

BTW, is it necessary to deploy this via k8s to get the public REST API? I deployed it on a VM and I can't seem to access the REST API.

Thanks,
Chetan

@rafaspadilha
Contributor

rafaspadilha commented May 23, 2024

This looks good, @chetan2309. A few things to consider:

For polling the status of a run (3):

  • run.monitor() blocks until the workflow finishes (with a done, failed, or cancelled status) and presents a table for the user to follow the progress of the workflow. This last part might not be necessary for you, but you could reuse some of its code that checks the status of the run from time to time.
  • Another possibility would be to use the run.block_until_complete() method (see the sketch further below). This won't show any progress status, but will block the process until the workflow run finishes (with the possibility of defining a timeout).

The rest of the workflow (4) looks good. We envisioned the notebooks mostly as examples of the workflows and of how to interact with our Python client. For applications, we highly recommend using the Python client or, as a second option, interacting directly with the REST API. Let us know if any doubts arise during development or if you have any feedback on how we can improve this integration.
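
For the field_id/run_id association in (4), a minimal sketch of resolving a stored run_id back into a run and waiting on it could look like this (double-check get_run_by_id and the timeout argument against the client code; the names here are illustrative):

from vibe_core.client import get_default_vibe_client

def wait_for_field_run(run_id: str, timeout_s: int = 3600):
    """Resolve a run_id stored against an ADMA field and wait for it to finish."""
    client = get_default_vibe_client()
    # Resolve the stored run id back into a run object (method name is illustrative)
    run = client.get_run_by_id(run_id)
    # Blocks until the run reaches a done, failed, or cancelled status
    run.block_until_complete(timeout_s)
    return run  # inspect run.status and run.output afterwards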

Accessing the REST API on a local VM should work fine, as long as the port is exposed and there is no network policy blocking outside connections. Could you check whether these two are properly set up?
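
As a quick reachability test from a machine outside the VM, something along these lines can help (the address and port are placeholders for wherever the service is exposed):

import requests

# Placeholders: use the VM's address and the port the FarmVibes.AI service is exposed on
base_url = "http://<vm-ip>:<service-port>"

resp = requests.get(f"{base_url}/v0/docs", timeout=10)
print(resp.status_code)  # 200 means the REST API is reachable from this machine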

@renatolfc
Contributor

renatolfc commented May 23, 2024 via email

@chetan2309
Author

If you want to expose your cluster on the local network, please pass the --host 0.0.0.0 flag when creating your cluster. Otherwise, the cluster will only listen on localhost.

Hello @renatolfc - I take it you specifically meant passing 0.0.0.0?

I tried this. The cluster updated successfully, but I can't access the REST API at myvmip:31108/v0/docs.

@renatolfc
Contributor

Updating won't work. You have to destroy and recreate it. This flag only affects cluster creation.

@chetan2309
Author

Updating won't work. You have to destroy and recreate it. This flag only affects cluster creation.

I ran these commands

$ farmvibes-ai local destroy
$ farmvibes-ai local setup --host 0.0.0.0

@renatolfc
Contributor

renatolfc commented May 27, 2024

In that case, what's the output of these commands?

  • ip addr
  • docker network inspect $(docker network ls | grep -i farmvibes | awk '{print $1 }' | grep -v NETWORK)

@chetan2309
Author

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 60:45:bd:ec:f8:2e brd ff:ff:ff:ff:ff:ff
    inet 10.1.0.4/24 brd 10.1.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6245:bdff:feec:f82e/64 scope link 
       valid_lft forever preferred_lft forever
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:76:3a:ec:a6 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:76ff:fe3a:eca6/64 scope link 
       valid_lft forever preferred_lft forever
44: br-fcf74542337e: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:e6:74:5f:5e brd ff:ff:ff:ff:ff:ff
    inet 172.21.0.1/16 brd 172.21.255.255 scope global br-fcf74542337e
       valid_lft forever preferred_lft forever
    inet6 fe80::42:e6ff:fe74:5f5e/64 scope link 
       valid_lft forever preferred_lft forever
48: veth089d7be@if47: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-fcf74542337e state UP group default 
    link/ether 6a:13:0b:a2:e8:8d brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::6813:bff:fea2:e88d/64 scope link 
       valid_lft forever preferred_lft forever
50: veth4874985@if49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 3a:bc:f7:f3:4e:43 brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::38bc:f7ff:fef3:4e43/64 scope link 
       valid_lft forever preferred_lft forever
52: veth60b8fb7@if51: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-fcf74542337e state UP group default 
    link/ether 06:ab:2a:97:a2:4a brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::4ab:2aff:fe97:a24a/64 scope link 
       valid_lft forever preferred_lft forever
54: veth1a7f1ba@if53: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-fcf74542337e state UP group default 
    link/ether 92:a9:41:20:c6:22 brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::90a9:41ff:fe20:c622/64 scope link
$ docker network inspect $(docker network ls | grep -i farmvibes | awk '{print $1 }' | grep -v NETWORK)
[
    {
        "Name": "k3d-farmvibes-ai",
        "Id": "fcf74542337e1901c743a58429a9cd79fdac845d7c298718fb08d58d18553c25",
        "Created": "2024-05-27T12:09:14.771811529Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.21.0.0/16",
                    "Gateway": "172.21.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "6ceefa381ff14f3757882353bfb425ef046224ca1e3f8556e4dcf0c1fd533e5f": {
                "Name": "k3d-farmvibes-ai-server-0",
                "EndpointID": "78d9f56271a1bc617cc9f92e6263cb8ab314b99d63a22694a20942f06dca8932",
                "MacAddress": "02:42:ac:15:00:02",
                "IPv4Address": "172.21.0.2/16",
                "IPv6Address": ""
            },
            "e2044bfa3f6b107579717fb9ba6d6d77babe4a4912e811e8e3c214d5bf9779f8": {
                "Name": "farmvibes-ai-registry",
                "EndpointID": "2dc48b78927940894623d17ced9a43710784e34fca78d525e1d62446cc9b63e5",
                "MacAddress": "02:42:ac:15:00:04",
                "IPv4Address": "172.21.0.4/16",
                "IPv6Address": ""
            },
            "e8b6e43d32d9df59d5dc07fd0911ccbb1fe3799e33e28d0331d99e68f55edca2": {
                "Name": "k3d-farmvibes-ai-serverlb",
                "EndpointID": "6d190dbbc737602f6554ebc5658fe614fc68d921dd6c2519d323b8f9c4efaaa0",
                "MacAddress": "02:42:ac:15:00:03",
                "IPv4Address": "172.21.0.3/16",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.enable_ip_masquerade": "true"
        },
        "Labels": {
            "app": "k3d"
        }
    }
]

@renatolfc
Contributor

Hello Chetan,

Seems like I was mistaken in assuming docker would be using host networking.

Probably, the easiest way to expose the FarmVibes.AI service would be by using a reverse proxy such as nginx.

I've seen recommendations of nginx proxy manager as a tool to help you with that.

Another possibility would be to use iptables or a firewall front-end to do port-forwarding for the FarmVibes REST API.
