Helm upgrade from 4.0.2 -> 4.1.1 fails as psycopg2 is no longer bundled into lean Superset image #31026

Open
kirmorozov1992 opened this issue Nov 22, 2024 · 17 comments
Labels
install:dependencies Installation - Dependencies


@kirmorozov1992

Bug description

Hi,
I want to upgrade Superset from 4.0.2 to 4.1.1 using Helm:
helm upgrade --install superset superset/superset -f values.yaml
The upgrade fails with a psycopg2 error. Any help would be appreciated.

Screenshots/recordings

Defaulted container "superset-init-db" out of: superset-init-db, wait-for-postgres (init)
Upgrading DB schema...
Loaded your LOCAL configuration at [/app/pythonpath/superset_config.py]
2024-11-22 03:34:11,278:ERROR:superset.app:Failed to create app
Traceback (most recent call last):
  File "/app/superset/app.py", line 40, in create_app
    app_initializer.init_app()
  File "/app/superset/initialization/__init__.py", line 476, in init_app
    self.setup_db()
  File "/app/superset/initialization/__init__.py", line 667, in setup_db
    pessimistic_connection_handling(db.engine)
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 998, in engine
    return self.get_engine()
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 1017, in get_engine
    return connector.get_engine()
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 594, in get_engine
    self._engine = rv = self._sa.create_engine(sa_url, options)
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 1027, in create_engine
    return sqlalchemy.create_engine(sa_url, **engine_opts)
  File "<string>", line 2, in create_engine
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/deprecations.py", line 375, in warned
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 544, in create_engine
    dbapi = dialect_cls.dbapi(**dbapi_args)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 811, in dbapi
    import psycopg2
ModuleNotFoundError: No module named 'psycopg2'
Traceback (most recent call last):
  File "/usr/local/bin/superset", line 8, in <module>
    sys.exit(superset())
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1685, in invoke
    super().invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/flask/cli.py", line 355, in decorator
    app = __ctx.ensure_object(ScriptInfo).load_app()
  File "/usr/local/lib/python3.10/site-packages/flask/cli.py", line 309, in load_app
    app = locate_app(import_name, name)
  File "/usr/local/lib/python3.10/site-packages/flask/cli.py", line 238, in locate_app
    return find_app_by_string(module, app_name)
  File "/usr/local/lib/python3.10/site-packages/flask/cli.py", line 166, in find_app_by_string
    app = attr(*args, **kwargs)
  File "/app/superset/app.py", line 40, in create_app
    app_initializer.init_app()
  File "/app/superset/initialization/__init__.py", line 476, in init_app
    self.setup_db()
  File "/app/superset/initialization/__init__.py", line 667, in setup_db
    pessimistic_connection_handling(db.engine)
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 998, in engine
    return self.get_engine()
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 1017, in get_engine
    return connector.get_engine()
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 594, in get_engine
    self._engine = rv = self._sa.create_engine(sa_url, options)
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 1027, in create_engine
    return sqlalchemy.create_engine(sa_url, **engine_opts)
  File "<string>", line 2, in create_engine
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/deprecations.py", line 375, in warned
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 544, in create_engine
    dbapi = dialect_cls.dbapi(**dbapi_args)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 811, in dbapi
    import psycopg2
ModuleNotFoundError: No module named 'psycopg2'

Superset version

master / latest-dev

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

dosubot bot commented Nov 22, 2024

Hello @kirmorozov1992! I'm here to assist you with any bugs, questions, or contributions you have. Let's work together to solve the issue you're facing.

To resolve the "No module named 'psycopg2'" error when upgrading Apache Superset using Helm, you need to ensure that the psycopg2 package is installed as part of the bootstrap script in your Helm chart configuration. You can do this by adding the installation command for psycopg2 in the bootstrapScript section of your values.yaml file. Here is an example configuration:

bootstrapScript: |
  #!/bin/bash
  pip install psycopg2==2.9.6 \
    sqlalchemy-bigquery==1.6.1 \
    elasticsearch-dbapi==0.2.5 &&\
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi

This script ensures that psycopg2 is installed when the Superset application starts, which should resolve the module not found error [1].

To continue talking to Dosu, mention @dosu.

@dosubot dosubot bot added the install:dependencies Installation - Dependencies label Nov 22, 2024
@kirmorozov1992
Author

@dosu Unfortunately, this does not help. Does anybody have other advice?


dosubot bot commented Nov 22, 2024

This is a tough nut to crack, could really use your help on this, @rusackas.


@DavidProdinger

The same issue occurs with the Docker Compose setup (docker-compose-image-tag.yml), so it is not limited to Helm.

@paulchoi

This worked for me. You need psycopg2-binary.
You can ignore clickhouse-connect.

bootstrapScript: |
  #!/bin/bash
  pip install \
    "clickhouse-connect>=0.6.8" \
    "psycopg2-binary>=2.9.10" \
    &&\
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi

@kirmorozov1992
Author

@paulchoi Hi! Did you test this with Docker Compose? I've tried changing the config and reloading many times, but it doesn't help. I use Minikube.

@DavidProdinger

I use it with Docker Compose.

Just add this content to your docker/requirements-local.txt file:

# database drivers
pymysql
psycopg2-binary

Sadly, neither the non-binary version of psycopg2 nor mysqlclient can be installed.
Therefore I use pymysql with the URL mariadb+pymysql://...
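Which driver package you need follows from the SQLAlchemy URI: the part after `+` names the DBAPI driver, and a bare `postgresql://` falls back to SQLAlchemy's default driver, psycopg2, which is why the traceback above ends in `import psycopg2`. A minimal sketch, assuming a small hand-written table that only covers the drivers discussed in this thread (`DRIVER_PACKAGES`, `DEFAULT_DRIVERS` and `required_package` are hypothetical names, not part of Superset or SQLAlchemy):

```python
# Hypothetical helper: map a SQLAlchemy URI to the pip package that provides
# its DBAPI driver. Only the drivers mentioned in this thread are covered.
DRIVER_PACKAGES = {
    # (dialect, driver) -> pip package
    ("postgresql", "psycopg2"): "psycopg2-binary",  # "psycopg2" needs build tools
    ("mysql", "mysqldb"): "mysqlclient",
    ("mysql", "pymysql"): "pymysql",
    ("mariadb", "mysqldb"): "mysqlclient",
    ("mariadb", "pymysql"): "pymysql",
}

# SQLAlchemy's default driver when the URI has no "+driver" part.
DEFAULT_DRIVERS = {
    "postgresql": "psycopg2",
    "mysql": "mysqldb",
    "mariadb": "mysqldb",
}


def required_package(uri: str) -> str:
    """Return the pip package needed for the driver named in a SQLAlchemy URI."""
    scheme, sep, _ = uri.partition("://")
    if not sep:
        raise ValueError(f"not a SQLAlchemy URI: {uri!r}")
    dialect, _, driver = scheme.partition("+")
    driver = driver or DEFAULT_DRIVERS.get(dialect, "")
    try:
        return DRIVER_PACKAGES[(dialect, driver)]
    except KeyError:
        raise ValueError(f"no known package for {dialect}+{driver}") from None


if __name__ == "__main__":
    print(required_package("postgresql://superset:superset@postgres:5432/superset"))
    print(required_package("mariadb+pymysql://superset:superset@mariadb:3306/superset"))
```

Under these assumptions, a plain `postgresql://` URI resolves to `psycopg2-binary`, matching the fix reported in this thread, while a `mariadb+pymysql://` URI only needs `pymysql`.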

@richard-fairthorne

richard-fairthorne commented Nov 26, 2024

Put this in your bootstrapScript, before pip install:

apt-get update && apt-get install -y build-essential

I am surprised this is not included in the documentation.

@Rusp0

Rusp0 commented Nov 29, 2024

bootstrapScript: |
  #!/bin/bash
  apt-get update && apt-get install -y build-essential
  pip install psycopg2==2.9.6 \
    sqlalchemy-bigquery==1.6.1 \
    elasticsearch-dbapi==0.2.5 &&\
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi

Sadly, it didn't help me; same error.

@sfirke
Member

sfirke commented Dec 2, 2024

Starting with 4.1.0, the lean Docker image no longer contains the drivers for MySQL or Postgres, as described in the release notes: https://github.com/apache/superset/blob/master/RELEASING/release-notes-4-1/README.md#change-to-docker-image-builds

I know that affects people deploying with docker compose and may be the issue here with the Helm chart too. The Helm chart is in a gray area where it doesn't have a dedicated manager and gets bumped/fixed by the community as needed -- so community fixes are especially welcome here.
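To check whether a given image is affected before (or after) upgrading, you can test whether the driver modules are importable with the image's own Python interpreter, for example via `kubectl exec` or `docker compose exec`. A minimal sketch using only the standard library; `check_drivers` is a hypothetical helper name, and note that it checks import names (psycopg2-binary installs `psycopg2`, mysqlclient installs `MySQLdb`), not pip package names:

```python
import importlib.util


def check_drivers(modules):
    """Return {module_name: bool} telling whether each top-level module is importable."""
    # find_spec locates a top-level module without actually importing it.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_drivers(["psycopg2", "pymysql", "MySQLdb"]).items():
        print(f"{name}: {'installed' if found else 'MISSING'}")
```

On a lean 4.1.x image without extra drivers, a `psycopg2: MISSING` line would correspond to the `ModuleNotFoundError` in the original report.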

@sfirke sfirke changed the title upgrade error Helm upgrade from 4.0.2 -> 4.1.1 fails as psycopg2 is no longer bundled into lean Superset image Dec 3, 2024
@merlos

merlos commented Dec 6, 2024

In my case, I was trying to install sqlalchemy-drill as part of the Helm chart deployment, and it also failed because psycopg2 was not available.

This change in values.yaml fixed the issue:

bootstrapScript: |
  #!/bin/bash
  pip install sqlalchemy-drill psycopg2-binary

@martimors
Contributor

martimors commented Dec 13, 2024

Why can't psycopg2 just be bundled in the image? The docs suggest creating a derived image, but using a PostgreSQL backend is an extremely common use case for Superset. There isn't really any harm in bundling a few database libs with the image for convenience, especially ones needed for the typical backends for Superset itself.

@villebro
Member

villebro commented Dec 13, 2024

This is not directly related to the Helm chart, but rather to how the Docker image is built. Sadly, pulling in extra DB drivers on the fly is not totally straightforward, for the following reasons:

  • User account: Only the root account is allowed to install new packages on the running pod/container. If you're ok with this, remember to keep the default runAsUser: 0 in your values.yaml
  • Internet access: Many environments may have blocked external access from the Superset pods to the external internet. However, if you happen to have an internal PyPI registry, you can use that in your bootstrap script: pip install psycopg2 --index-url https://pypi.mycorp.com/simple

If neither of these is possible in your environment, you will need to build a custom image in which you preinstall all necessary drivers. This eliminates the need to run as root and doesn't require access to a PyPI registry. It is also the recommended approach, as it keeps startup times to a minimum (no need to install the drivers every time the pod starts up).
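The trade-offs above can be combined in a bootstrap step that only installs drivers that are actually missing, so a custom image with preinstalled drivers skips the install entirely, and that can point pip at an internal registry. A minimal sketch, assuming it runs as root (runAsUser: 0) with the image's Python interpreter, for example invoked from your bootstrapScript (how you ship the file into the pod is up to you); `REQUIRED`, `pip_command` and `install_missing` are hypothetical names, and https://pypi.mycorp.com/simple is the placeholder URL from the comment above:

```python
import importlib.util
import subprocess
import sys

# Hypothetical selection: import name -> pip package. Adjust to your databases.
REQUIRED = {"psycopg2": "psycopg2-binary"}


def pip_command(required, index_url=None):
    """Build a pip install command for packages whose modules are not importable.

    Returns None when everything is already present (e.g. in a custom image).
    """
    missing = [pkg for mod, pkg in required.items()
               if importlib.util.find_spec(mod) is None]
    if not missing:
        return None
    cmd = [sys.executable, "-m", "pip", "install", *missing]
    if index_url:
        # Use an internal PyPI mirror when the pod has no internet access.
        cmd += ["--index-url", index_url]
    return cmd


def install_missing(required, index_url=None, dry_run=True):
    """Run the pip command (unless dry_run) and return it, or None if nothing is missing."""
    cmd = pip_command(required, index_url)
    if cmd is not None and not dry_run:
        subprocess.run(cmd, check=True)  # needs write access to site-packages
    return cmd


if __name__ == "__main__":
    # Dry run by default; pass dry_run=False to actually install.
    cmd = install_missing(REQUIRED, index_url=None)
    print("nothing to install" if cmd is None else "would run: " + " ".join(cmd))
```

This only shortens restarts once the drivers are baked into the image; on a lean image it still installs on every pod start, which is why the custom image remains the better option.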

@villebro
Member

> Why can't psycopg2 just be bundled in the image? The docs suggest creating a derived image, but using a PostgreSQL backend is an extremely common use case for Superset. There isn't really any harm in bundling a few database libs with the image for convenience, especially ones needed for the typical backends for Superset itself.

@martimors sadly, this is a bit of a slippery slope, as Superset currently supports some 40+ databases. Since you will need to figure out a way to add drivers for your other databases anyway, prebaking psycopg2 into the image is not necessarily a good solution, for the following reasons:

  • It adds to the image size.
  • It introduces an unnecessary attack vector for environments that don't need psycopg2, should an exploit exist in it.
  • Users may run into dependency issues if the pre-baked version of psycopg2 has conflicting requirements with whichever db driver someone wants to install.

@villebro
Member

> Put this in your bootstrapScript, before pip install:
>
> apt-get update && apt-get install -y build-essential
>
> I am surprised this is not included in the documentation.

@richard-fairthorne Pull Requests improving the docs are always welcome!

@sfirke
Member

sfirke commented Dec 16, 2024

Hm, I see both points here.

  • I agree with @villebro that even if we included drivers for the backend DB, people would still likely need to build their own image to include drivers for their data warehouse, as well as a browser to take screenshots for Alerts & Reports, pyxl for Excel import/export, etc.
  • But for people who just want to spin up Superset for the first time with example data to see what it feels like, including psycopg2 and mysqlclient might benefit new users and ultimately the Superset project. We are seeing users get stuck on docker build and on installing prerequisites for mysqlclient when they are just trying to test out Superset, and that doesn't seem right.

At the very least, we could document how to build an extended image with these drivers; I hope to do that in the coming months. I personally think it would be nice to offer a new Docker image, basics, that has these drivers, pyxl, Pillow, and a headless browser installed. People who are extending could still extend from lean to avoid the bloat and security issues that Ville points out, while new users would get a plug-and-play demo option.

johnallen3d added a commit to johnallen3d/argocd-demo that referenced this issue Jan 18, 2025
@nfalco79

This also hit our Helm chart deployment.
The suggestion in this thread resolved the issue. I'm not happy about overriding the bootstrapScript, because the override may not work in a future version, so I have to remember to remove it at the next update.

10 participants