Troubleshooting

This document compiles the most common issues encountered when installing and running the FarmVibes.AI platform, grouped into broad categories. Besides the issues listed here, we also keep a list of known issues on our GitHub repository that the development team is currently addressing.

  • Package installation:

    Permission denied when installing `vibe_core`

    Old versions of pip might fail to install the vibe_core library because pip erroneously tries to write the library to the system's site-packages directory.

    An excerpt of the error follows:

    × python setup.py develop did not run successfully.
    │ exit code: 1
    ╰─> [32 lines of output]
        running develop
        /usr/lib/python3/dist-packages/setuptools/command/easy_install.py:158:
            EasyInstallDeprecationWarning: easy_install command is deprecated. Use
            build and pip and other standards-based tools.
          warnings.warn(
        WARNING: The user site-packages directory is disabled.
        /usr/lib/python3/dist-packages/setuptools/command/install.py:34:
            SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build
            and pip and other standards-based tools.
          warnings.warn(
        error: can't create or remove files in install directory
    

    If that happens, you might have to upgrade pip itself. Please run pip install --upgrade pip if you have write access to the directory where pip is installed, or sudo pip install --upgrade pip if you need root privileges.
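
    Before upgrading, it can help to confirm which pip and which Python interpreter are actually in use. The following is a minimal diagnostic sketch, not part of FarmVibes.AI; the function name pip_diagnostics is hypothetical. It only reports facts and does not decide whether your pip version is too old, since no minimum version is stated here.

```python
import site
import sys
from importlib import metadata


def pip_diagnostics():
    """Collect facts relevant to the 'Permission denied' install error."""
    try:
        pip_version = metadata.version("pip")
    except metadata.PackageNotFoundError:
        pip_version = None  # pip is not installed for this interpreter
    return {
        "python": sys.executable,
        "pip_version": pip_version,
        # False means pip cannot fall back to a per-user install directory.
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        # True when running inside a virtual environment.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


diagnostics = pip_diagnostics()
for key, value in diagnostics.items():
    print(f"{key}: {value}")
```

    If user_site_enabled is False and you are not in a virtual environment, the report matches the "user site-packages directory is disabled" warning in the excerpt above.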


  • Cluster setup:

    How to change the storage location during cluster creation

    You may change the storage location by defining the environment variable FARMVIBES_AI_STORAGE_PATH prior to installation with the farmvibes-ai command. Additionally, you may use the flag --storage-path when running the farmvibes-ai local setup command. For more information, please refer to the help message of the farmvibes-ai command.
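
    As an illustration, the sketch below prepares both options described above from Python: the FARMVIBES_AI_STORAGE_PATH environment variable and the --storage-path flag. The helper names and the path /mnt/farmvibes-storage are assumptions made for the example. According to the text above either option should be sufficient on its own, so setting both is only a convenience of this sketch.

```python
import os
import subprocess


def build_setup_invocation(storage_path):
    """Return (command, environment) for creating a cluster with custom storage."""
    storage_path = os.path.abspath(os.path.expanduser(storage_path))
    # Copy the current environment and add the storage variable.
    env = dict(os.environ, FARMVIBES_AI_STORAGE_PATH=storage_path)
    command = ["farmvibes-ai", "local", "setup", "--storage-path", storage_path]
    return command, env


def run_setup(storage_path):
    """Run the setup command; raises CalledProcessError if it fails."""
    command, env = build_setup_invocation(storage_path)
    return subprocess.run(command, env=env, check=True)


# Hypothetical storage location, used only to show the resulting command.
command, env = build_setup_invocation("/mnt/farmvibes-storage")
print(" ".join(command))
```

    Calling run_setup("/mnt/farmvibes-storage") would then execute the command; it is left uncalled here so the sketch has no side effects.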

    Missing secrets

    Running a workflow while missing a required secret will yield the following error message:

    Could not retrieve secret {secret_name} from Dapr.

    Add the missing secrets to the Kubernetes cluster. Learn more about secrets here.

    No route to the Rest-API

    Building a cluster with the farmvibes-ai command will set up a Rest-API service with an address visible only within the cluster. In case the client cannot reach the Rest-API, make sure to restart the cluster with:

    farmvibes-ai local restart

    Running out of space even after changing storage location

    If you are still running out of space after pointing the FARMVIBES_AI_STORAGE_PATH environment variable to another location, you might have to change the storage location of the Docker daemon.

    This happens because, although assets are stored in FARMVIBES_AI_STORAGE_PATH, the worker pods still use temporary space. If the disk that holds your operating system is small, that temporary data can fill it, especially when running multiple workers. In that case, you can move the Docker daemon data directory to another disk with more space.

    For example, to instruct the Docker daemon to save data in /mnt/docker-data, set the contents of /etc/docker/daemon.json to:

    {
      "data-root": "/mnt/docker-data"
    }

    Alternatively, you can delete data from previous workflow runs to free up space. For more information on how to do that and on other data management operations, please refer to the Data Management user guide.
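
    If /etc/docker/daemon.json already contains other settings, overwriting it would discard them. The sketch below, with hypothetical helper names, merges the "data-root" key into an existing file and reports free space on a path. It is demonstrated on a temporary file rather than the real configuration.

```python
import json
import shutil
import tempfile
from pathlib import Path


def set_docker_data_root(config_path, data_root):
    """Set "data-root" in a Docker daemon.json, keeping any other settings."""
    config_path = Path(config_path)
    config = {}
    if config_path.exists() and config_path.read_text().strip():
        config = json.loads(config_path.read_text())
    config["data-root"] = str(data_root)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


# Demonstration on a throwaway file that already has one setting.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "daemon.json"
    demo_path.write_text('{"debug": true}')
    demo_config = set_docker_data_root(demo_path, "/mnt/docker-data")

print(demo_config)
print(f"Free space on /: {free_gib('/'):.1f} GiB")
```

    Editing the real /etc/docker/daemon.json usually requires root privileges, and the Docker daemon has to be restarted before the new data directory takes effect.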

    Unable to run workflows after machine rebooted

    After a reboot, make sure to start the cluster with:

    farmvibes-ai local start

  • Composing and running workflows:

    Calling an unknown workflow

    Calling client.run() with a wrong workflow name will yield the following error message:

    HTTPError: 400 Client Error: Bad Request for url: http://192.168.49.2:30000/v0/runs. Unable to run workflow with provided parameters. Workflow "WORKFLOW_NAME" unknown

    Solutions:

    • Double-check the workflow name and parameters.

    • Verify that your cluster and repository are up to date.
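
    A mistyped workflow name is often close to a valid one. The following sketch uses Python's difflib to suggest near matches from a list of known names. The function name suggest_workflows is hypothetical and the names in available are sample values for illustration only.

```python
import difflib


def suggest_workflows(requested, available, limit=3):
    """Return up to `limit` known workflow names that resemble `requested`."""
    return difflib.get_close_matches(requested, available, n=limit, cutoff=0.6)


# Sample names for illustration; use the names known to your own cluster.
available = [
    "helloworld",
    "data_ingestion/spaceeye/spaceeye",
    "farm_ai/segmentation/segment_s2",
]

print(suggest_workflows("helloworl", available))
```

    In a live session, the list of names could come from the client, for example client.list_workflows(), assuming that call returns workflow names as strings in your version of vibe_core.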

    Tasks fail with "Abnormal Termination"

    Some workflows, such as the SpaceEye workflow (in its preprocess.s1.preprocess task) or the Segment Anything Model (SAM) workflow, might use a large amount of memory depending on the input area and/or time range used for processing. When that's the case, the operating system might terminate the offending task, which fails both the task and the workflow.

    When inspecting the error reason, users might find text that says ... ProcessExpired: Abnormal termination.

    One solution is to request processing of a smaller region.

    Another solution is to scale down the number of workers with the command ~/.config/farmvibes-ai/kubectl scale deployment terravibes-worker --replicas=1.

    If the task still fails after trying the above, the Kubernetes cluster might need to be migrated to a machine with more RAM.
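
    One way to request smaller regions is to split the area of interest into tiles and submit one run per tile. The sketch below splits a bounding box into a regular grid. It is plain Python with a hypothetical function name, and it assumes that processing tiles independently is acceptable for your workflow, which may not hold when results depend on neighboring pixels.

```python
def split_bbox(bbox, columns, rows):
    """Split (min_x, min_y, max_x, max_y) into columns x rows equal tiles."""
    min_x, min_y, max_x, max_y = bbox
    width = (max_x - min_x) / columns
    height = (max_y - min_y) / rows
    tiles = []
    for row in range(rows):
        for col in range(columns):
            tiles.append((
                min_x + col * width,
                min_y + row * height,
                min_x + (col + 1) * width,
                min_y + (row + 1) * height,
            ))
    return tiles


tiles = split_bbox((0.0, 0.0, 2.0, 2.0), columns=2, rows=2)
for tile in tiles:
    print(tile)
```

    Each tile can then be turned into the geometry passed to the workflow run, so that every run handles roughly a quarter of the original area in this example.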

    Unable to find ONNX model when running workflows

    Make sure the ONNX model was added to the FarmVibes.AI cluster:

    farmvibes-ai local add-onnx <onnx-model>

    If no output is generated, then your model was successfully added.

    Verifying why a workflow run failed

    If a workflow run fails, you might see a status table similar to the following when monitoring the run with run.monitor() (please refer to the client documentation for more information on monitor):

    >>> run.monitor()
                                  🌎 FarmVibes.AI 🌍 dataset_generation/datagren_crop_segmentation 🌏
                                          Run name: Generating dataset for crop segmentation                                    
                                            Run id: dd541f5b-4f03-46e2-b017-8e88a518dfe6                              
                                                          Run status: failed                                           
                                                        Run duration: 00:00:16
    ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
    ┃ Task Name                          ┃ Status   ┃ Start Time          ┃ End Time            ┃ Duration ┃ Progress                    ┃
    ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
    │ spaceeye.preprocess.s2.s2.download │ failed   │ 2022/10/03 22:22:16 │ 2022/10/03 22:22:20 │ 00:00:00 │  ━━━━━━━━━━━━━━━━━━━━  0/1  │
    │ cdl.download_cdl                   │ done     │ 2022/10/03 22:22:12 │ 2022/10/03 22:22:15 │ 00:00:05 │  ━━━━━━━━━━━━━━━━━━━━  1/1  │
    │ spaceeye.preprocess.s2.s2.filter   │ done     │ 2022/10/03 22:22:10 │ 2022/10/03 22:22:12 │ 00:00:02 │  ━━━━━━━━━━━━━━━━━━━━  1/1  │
    │ spaceeye.preprocess.s2.s2.list     │ done     │ 2022/10/03 22:22:09 │ 2022/10/03 22:22:10 │ 00:00:01 │  ━━━━━━━━━━━━━━━━━━━━  1/1  │
    │ cdl.list_cdl                       │ done     │ 2022/10/03 22:22:04 │ 2022/10/03 22:22:09 │ 00:00:04 │  ━━━━━━━━━━━━━━━━━━━━  1/1  │
    └────────────────────────────────────┴──────────┴─────────────────────┴─────────────────────┴──────────┴─────────────────────────────┘
                                                   Last update: 2022/10/03 22:23:59

    The platform logs the likely reason why a task failed, which can be retrieved with run.reason and run.task_details.
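
    To narrow a long task list down to the failures, a small helper can filter the task details. The sketch below assumes that run.task_details behaves like a mapping from task name to an object with status and reason attributes; that shape is an assumption, so the example runs on stand-in objects shaped like the status table above rather than on real platform output.

```python
from types import SimpleNamespace


def failed_task_reasons(task_details):
    """Map each failed task name to its reported reason."""
    failures = {}
    for name, details in task_details.items():
        # Accept both plain strings and enum-like values such as "RunStatus.failed".
        status = str(getattr(details, "status", "")).lower()
        if status.endswith("failed"):
            failures[name] = getattr(details, "reason", None)
    return failures


# Stand-in data for illustration; the reason text is a placeholder.
example_details = {
    "spaceeye.preprocess.s2.s2.download": SimpleNamespace(
        status="failed", reason="<reason reported by the platform>"
    ),
    "cdl.download_cdl": SimpleNamespace(status="done", reason=None),
}

for task, reason in failed_task_reasons(example_details).items():
    print(f"{task}: {reason}")
```

    With a real run object, the equivalent call would be failed_task_reasons(run.task_details), alongside print(run.reason) for the overall failure reason.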

    Workflow run with 'pending' status indefinitely

    If a workflow run remains in 'pending' status, make sure to restart the cluster with:

    farmvibes-ai local restart

  • Example notebooks:

    Unable to import modules when running a notebook

    Make sure you have installed and activated the micromamba environment provided with the notebook.

    Workflow run monitor table not rendering inside notebook

    Make sure to have the ipywidgets package installed in your environment.
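
    Both notebook issues above come down to packages missing from the active environment. A quick check, sketched here with a hypothetical helper name, is to ask Python which modules it can locate without importing them. Run it in the same kernel as the notebook.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules mentioned in this guide; extend the list with a notebook's own imports.
print(missing_modules(["ipywidgets", "vibe_core"]))
```

    An empty list means both modules are available to the kernel. Any name that is printed needs to be installed in the environment the notebook is using.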


  • Segment Anything Model (SAM):

    Adding SAM's ONNX models to the cluster

    Running workflows based on SAM requires the image encoder and prompt encoder/mask decoder to be exported as ONNX models, and added to the cluster. To do so, run the following command in the repository root:

    python scripts/export_sam_models.py --models <model_types>

    where <model_types> is a list of model types to be exported (vit_b, vit_l, vit_h). For example, to export all three ViT backbones, run:

    python scripts/export_sam_models.py --models vit_b vit_l vit_h

    The script will download the models from the SAM repository, export each component as a separate ONNX file, and add them to the cluster with the farmvibes-ai local add-onnx command. If you are using a different storage location, make sure to pass the --storage-path flag to the add-onnx command.

    Before running the script, make sure you have a micromamba environment set up with the required packages. You can use the environments defined by env_cpu.yaml or env_gpu.yaml files in the notebooks/segment_anything directory.

    Unreliable segmentation mask when using bounding box as prompt

    As the input Sentinel-2 rasters may be considerably larger than the images expected by SAM, we split the rasters into 1024 x 1024 chips (with an overlap defined by the spatial_overlap parameter of the workflow). This may lead to corner cases that yield unreliable segmentation masks, especially when using a bounding box as prompt. To avoid such cases, consider the following:

    • Only a single bounding box is supported per prompt group (i.e., all points with the same prompt_id).
    • We recommend providing at least one foreground point within the bounding box. Even though the model supports segmenting rasters solely with a bounding box, the results may be unreliable.
    • If the prompt contains a foreground point outside the provided bounding box, the workflow will adjust the bounding box to include all foreground points in that prompt group.
    • Background points outside the bounding box are ignored.
    • Regions outside the bounding box will be masked out in the final segmentation mask.
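
    The bounding box rules above can be sketched in a few lines of plain Python. This is an illustration of the described behavior, not the workflow's actual implementation: the function names are hypothetical, coordinates are simple (x, y) tuples, and filtering background points against the adjusted box rather than the original one is an assumption.

```python
def expand_bbox(bbox, foreground_points):
    """Grow (min_x, min_y, max_x, max_y) so it contains every foreground point."""
    min_x, min_y, max_x, max_y = bbox
    for x, y in foreground_points:
        min_x, max_x = min(min_x, x), max(max_x, x)
        min_y, max_y = min(min_y, y), max(max_y, y)
    return (min_x, min_y, max_x, max_y)


def inside(bbox, point):
    """True when the point lies within the box, borders included."""
    min_x, min_y, max_x, max_y = bbox
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


def normalize_prompt_group(bbox, foreground_points, background_points):
    """Adjust the box to the foreground points and drop outside background points."""
    adjusted = expand_bbox(bbox, foreground_points)
    kept_background = [p for p in background_points if inside(adjusted, p)]
    return adjusted, kept_background


adjusted_bbox, kept_background = normalize_prompt_group(
    bbox=(10.0, 10.0, 20.0, 20.0),
    foreground_points=[(15.0, 15.0), (25.0, 12.0)],
    background_points=[(18.0, 18.0), (40.0, 40.0)],
)
print(adjusted_bbox, kept_background)
```

    In this example the foreground point at (25.0, 12.0) stretches the box to the right, and the background point at (40.0, 40.0) is dropped because it falls outside the adjusted box.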