-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Socket creation problem with Upgrading database (by switching to newer PostgreSQL image version) #362
Comments
Hi @kevindafm , thanks for getting in touch! I will look into this. I tried reproduce the issue you are having just via docker, but I am unable to do so - the upgrade always succeeds for me. Will try out on an Openshift instance as well, to see what might be different. For your question about what user runs the upgrade - the upgrade is run by the user running the image, which should be in your case: Note that this is also the postgres user, since we are using nss_wrapper internally to map the random uids Openshift generates for the runtime to postgres username (postgres datadir should hold a |
Hi @pkubatrh Thanks for looking into this. You are correct about the uids. I missed the nss_wrapper piece but I can see the entry in /var/lib/pgsql/passwd for postgres maps the uid to 1001440000 sh-4.2$ more /var/lib/pgsql/passwd One difference between direct docker and OpenShift is that Openshift generally reduces the permissions a bit more. This may also happen in Docker but not always. From my postgresql pod yaml I see the following. securityContext: Also this piece securityContext: I am not sure a direct run in docker is as restrictive? Thanks again for looking into it. Kevin. |
Hi @kevindafm Tried out the upgrade on Openshift as well and was not able to reproduce with a simple test deployment. Took the following steps:
Any ideas how to reproduce are appreciated. |
Remembered today that we had a similar issue on Guster-backed PVC with regards to sockets and upgrades, could be related: https://bugzilla.redhat.com/show_bug.cgi?id=1696732 |
Hi Petr, Thanks for response and further procedures. I think in my original attempt I had some mistakes for example I did not have the remove of 9.5 image. I tred your steps above but I get the errors below. usr/share/container-scripts/postgresql/common.sh: line 121: [: 9.6: integer expression expected ========== $PGDATA upgrade: 9.5 -> 9.6 ========== ===> Initialize new data directory; we will migrate to that. The files belonging to this database system will be owned by user "postgres". The database cluster will be initialized with locale "en_US.utf8". Data page checksums are disabled. creating directory /var/lib/pgsql/data/userdata-new ... ok Success. You can now start the database server using:
WARNING: enabling "trust" authentication for local connections ===> Starting the pg_upgrade process. failure There seems to be a postmaster servicing the old cluster. pg_upgrade run on Tue Mar 10 14:46:57 2020command: "/opt/rh/rh-postgresql95/root/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data/userdata" -o "-p 50432 -c autovacuum=off -c autovacuum_freeze_max_age=2000000000 -c listen_addresses='' -c unix_socket_permissions=0700" start >> "pg_upgrade_server.log" 2>&1 Just to say I am following your procedure above when I get the errors above. I wonder what else I can try? Best Regards, Kevin. |
Hi @pkubatrh, I have proved this is a gluster file versus block problem. If I run the upgrade process on a postgress database with gluster block volumes it works no problem. If I run it on gluster file volumes it does not work. Unfortunately the default storage class is gluster file in the postgress deployment config / template. Hence I have a number of postgres databases running on gluster file. So I wonder can this be fixed or at least investigated here -> https://bugzilla.redhat.com/show_bug.cgi?id=1696732 Best Regards, Kevin. |
Hi @kevindafm, thanks for providing more info! If this is tied to using gluster file I am afraid it might be the same case as bug 1696732. I do not think we ever figured out what is wrong with gluster itself but we managed to figure out a workaround for the socket permissions issue - you just need to set Unfortunately, there is a bug in postgresql itself that causes the user-provided arguments to be overwritten by hard-coded ones (see upstream discussion). This is something that needs to be fixed in the rpms, so please get in touch with your support representative and tell them to create a bug against the Red Hat Software Collections product, postgresql component, rh-postgresql96 version. @UncleAlbie Since it has been mostly you working on the bug above, is there anything else important I missed? |
@pkubatrh Just to make sure I understand the potential solution -- would that problem be fixed if we backported the same patch as we're considering in bug 1696732? |
Hi @pkubatrh , Thanks for coming back. Can you provide an example of POSTGRESQL_UPGRADE_PGUPGRADE_OPTIONS? I tried setting to just /tmp. I also tried setting to unix_socket_directories='/tmp'. I get the same error as before. Do I need to set all options or just the /tmp? I see this in the logs of the postgres upgrade process. command: "/opt/rh/rh-postgresql95/root/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data/userdata" -o "-p 50432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql/dat Does that help me know what to set the POSTGRESQL_UPGRADE_PGUPGRADE_OPTIONS to? Any help much appreciated. Best Regards, Kevin. |
@pkubatrh I think you've summarised it pretty well. I don't think I have anything to add here. @kevindafm As described by pkubatrh, there's a bug in postgresql. The |
@UncleAlbie for information and confirmation. I have support call 02599066 raised about this issue also. Does combination of discussion here and the support call mean the hardcoded values in pg_upgrade tool will get fixed? There are 2 options here.
I think the pg_upgrade tool fix will be easiest to fix but I don't know for sure. I also thin having something like POSTGRESQL_UPGRADE_PGUPGRADE_OPTIONS is useful for future upgrades etc. Best Regards, Kevin. |
@kevindafm It definitely helped us to find out where is the problem. That said, this github repo is not a support tool, so I'd like to ask you to continue discussing this issue with the Red Hat Support, because they need to drive the update itself. In any case, thank you for spending time on debugging this and helping us to find a way forward. This is really appreciated. |
No problem @hhorak I am keeping the support call updated with all the discussions here and other things. Ultimately I have a bunch of postgres 9.5 (now out of support) with several CVEs I need to to upgrade to more recent postgres so I need a solution. postgres 9.5 is no longer available on the red hat container registry so nothing to fix the CVE gap! This means I must upgrade. Thanks all on this discussion here. It has helped inform the support call and at least we all know the problems now. |
While this issue needs to be solved either in rh-postgresql96 packages or gluster/OCP itself, I think what we can do in case of this container is documenting
How does that sound? |
That sounds great. |
All releases of postgresql should have the socketdir option now. Hopefully all that is left is documenting |
Hello. This process "Upgrading database (by switching to newer PostgreSQL image version)" looks awesome and is a really cool idea. See this readme -> https://github.com/sclorg/postgresql-container/tree/generated/9.6
Unfortunately it does not work fully for me. It would appear there is a problem creating the socket.
The logs files from the upgrade are below.
sh-4.2$ more pg_upgrade_internal.log
pg_upgrade run on Mon Mar 2 15:19:46 2020
Performing Consistency Checks
Checking cluster versions ok
failure
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.
connection to database failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/lib/pgsql/data/.s.PGSQL.50432"?
could not connect to old postmaster started with the command:
"/opt/rh/rh-postgresql95/root/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data/userdata" -o "-p 50432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql/data'" st
art
pg_upgrade run on Mon Mar 2 15:20:58 2020
Performing Consistency Checks
Checking cluster versions ok
failure
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.
connection to database failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/lib/pgsql/data/.s.PGSQL.50432"?
could not connect to old postmaster started with the command:
"/opt/rh/rh-postgresql95/root/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data/userdata" -o "-p 50432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql/data'" st
art
pg_upgrade run on Mon Mar 2 15:22:17 2020
Performing Consistency Checks
Checking cluster versions ok
failure
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.
connection to database failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/lib/pgsql/data/.s.PGSQL.50432"?
could not connect to old postmaster started with the command:
"/opt/rh/rh-postgresql95/root/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data/userdata" -o "-p 50432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql/data'" st
art
pg_upgrade run on Mon Mar 2 15:24:13 2020
Performing Consistency Checks
Checking cluster versions ok
failure
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.
connection to database failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/lib/pgsql/data/.s.PGSQL.50432"?
could not connect to old postmaster started with the command:
"/opt/rh/rh-postgresql95/root/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data/userdata" -o "-p 50432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql/data'" st
art
pg_upgrade run on Mon Mar 2 15:28:06 2020
Performing Consistency Checks
Checking cluster versions ok
failure
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.
connection to database failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/lib/pgsql/data/.s.PGSQL.50432"?
could not connect to old postmaster started with the command:
"/opt/rh/rh-postgresql95/root/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data/userdata" -o "-p 50432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql/data'" st
art
pg_upgrade run on Mon Mar 2 15:29:18 2020
Performing Consistency Checks
Checking cluster versions ok
failure
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.
connection to database failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/lib/pgsql/data/.s.PGSQL.50432"?
could not connect to old postmaster started with the command:
"/opt/rh/rh-postgresql95/root/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data/userdata" -o "-p 50432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql/data'" st
art
pg_upgrade run on Mon Mar 2 15:30:56 2020
Performing Consistency Checks
Checking cluster versions ok
failure
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.
connection to database failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/lib/pgsql/data/.s.PGSQL.50432"?
could not connect to old postmaster started with the command:
"/opt/rh/rh-postgresql95/root/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data/userdata" -o "-p 50432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql/data'" st
art
pg_upgrade run on Mon Mar 2 15:32:49 2020
Performing Consistency Checks
Checking cluster versions ok
failure
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.
connection to database failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/lib/pgsql/data/.s.PGSQL.50432"?
could not connect to old postmaster started with the command:
"/opt/rh/rh-postgresql95/root/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data/userdata" -o "-p 50432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql/data'" st
art
pg_upgrade run on Mon Mar 2 15:34:46 2020
Performing Consistency Checks
Checking cluster versions ok
failure
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.
connection to database failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/lib/pgsql/data/.s.PGSQL.50432"?
could not connect to old postmaster started with the command:
"/opt/rh/rh-postgresql95/root/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data/userdata" -o "-p 50432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql/data'" st
art
pg_upgrade run on Mon Mar 2 15:36:35 2020
Performing Consistency Checks
Checking cluster versions ok
failure
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.
connection to database failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/lib/pgsql/data/.s.PGSQL.50432"?
could not connect to old postmaster started with the command:
"/opt/rh/rh-postgresql95/root/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data/userdata" -o "-p 50432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql/data'" st
art
pg_upgrade run on Mon Mar 2 15:39:46 2020
Performing Consistency Checks
Checking cluster versions ok
failure
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.
connection to database failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/lib/pgsql/data/.s.PGSQL.50432"?
could not connect to old postmaster started with the command:
"/opt/rh/rh-postgresql95/root/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data/userdata" -o "-p 50432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql/data'" st
art
I wonder is this a permissions issue or is postgresql already started. It is worth noting that normally the socket gets created in /tmp but during the upgrade process it attempts to create the socket in /var/lib/pgsql/data/.s.PGSQL.50432. I am not sure which user attempts to create the socket during the upgrade process.
Is it the uid below
sh-4.2$ id
uid=1001440000 gid=0(root) groups=0(root),2044,1001440000
sh-4.2$
Or the postgres user?
sh-4.2$ more /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
default:x:1001:0:Default Application User:/opt/app-root/src:/sbin/nologin
postgres:x:26:26:PostgreSQL Server:/var/lib/pgsql:/bin/bash
sh-4.2$ ls -a /tmp
. .. .ICE-unix .Test-unix .X11-unix .XIM-unix .font-unix .s.PGSQL.5432 .s.PGSQL.5432.lock ks-script-OQQXsO yum.log
sh-4.2$
I have also raised a redhat support ticket for this issue (02599066). Any information/discussion much appreciated.
The text was updated successfully, but these errors were encountered: