- Create a virtual environment with the same Python version as the Databricks runtime (for example, `virtualenv --python=python3.7 .venv` pins a specific interpreter):

  ```
  virtualenv .venv
  ```
- In `setup.py`, change the Databricks runtime version in `install_requires` from `databricks-connect==6.2.*` to the one you are using. For example, if you are on runtime 5.5, change it to `==5.5.*` (see the sketch after this list).
- Install the current directory (via its `setup.py`) into the virtual environment as an editable package:

  ```
  pip install -e .
  ```
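For reference, a minimal sketch of the `setup.py` this pin lives in; the project name and version here are hypothetical, only the `src/` layout and the `databricks-connect` pin come from this project:

```python
# Sketch of setup.py -- "my-project" is a hypothetical name; pin
# databricks-connect to the minor version of your cluster's runtime.
from setuptools import setup, find_packages

setup(
    name="my-project",                # hypothetical project name
    version="0.1.0",
    package_dir={"": "src"},
    packages=find_packages("src"),
    install_requires=[
        "databricks-connect==5.5.*",  # e.g. Databricks runtime 5.5
    ],
)
```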
Follow the steps in the official guide to finish configuring the client.
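Once the client is configured (`databricks-connect configure` walks you through it, and `databricks-connect test` runs the built-in checks), a quick smoke test is to run a trivial job from the virtual environment. A minimal sketch:

```python
# With databricks-connect configured, getOrCreate() returns a session
# backed by the remote Databricks cluster rather than a local Spark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print(spark.range(100).count())  # the count executes on the cluster
```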
Follow the steps in the guide for VS Code or Jupyter to configure the IDE.
Note: check that you don't have `SPARK_HOME` set to a local Spark installation. If it is set, either unset it or use `python.envFile` to set `SPARK_HOME` to the path returned by `databricks-connect get-spark-home`.
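If in doubt, a small check like the following compares the active `SPARK_HOME` with the one `databricks-connect` reports; the helper itself is ours, not part of any tool:

```python
# Compare the active SPARK_HOME with the one databricks-connect expects.
import os
import subprocess

expected = subprocess.check_output(
    ["databricks-connect", "get-spark-home"]
).decode().strip()
current = os.environ.get("SPARK_HOME")
if current and current != expected:
    print("SPARK_HOME is %s; unset it or point it at %s" % (current, expected))
else:
    print("SPARK_HOME is unset or already correct")
```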
Once you have code you want to deploy to Databricks:
- Import notebooks directly into Databricks from the `.ipynb` or `.py` files.
- For the libraries in `src/` you will need to build a library and upload it. Use `python setup.py bdist_spark` or `python setup.py bdist_egg` to build a library in `dist/`. Import this library into Databricks and install it on the cluster (an alternative for databricks-connect sessions is sketched after this list).
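As an alternative to installing the library on the cluster, a databricks-connect session can ship the built artifact for the duration of that session with `SparkContext.addPyFile`. A sketch, where the egg filename is hypothetical and should match whatever `bdist_egg` actually produced in `dist/`:

```python
# Ship the locally built egg to the cluster for this session only;
# afterwards, modules from src/ are importable on the executors.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sparkContext.addPyFile("dist/my_project-0.1.0-py3.7.egg")  # hypothetical path
```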