Better support for table level locking #67

Open
simovesterinen81 opened this issue Feb 1, 2021 · 6 comments

Comments

@simovesterinen81

Hi,

We have been using the table-level locking branch for a long time now.
In our application we have multiple databases on one postgres instance and they all use IMCS heavily.

Every now and then IMCS behaves strangely and the system crashes, either through the out-of-memory killer or with a segmentation fault.
These crashes always seem to be related to multiple users running get, load, or delete actions on different tables at the same time.
Would it be possible to implement better support for table-level locking that would prevent these crashes?

@knizhnik
Owner

knizhnik commented Feb 1, 2021

Sorry, but to prevent the crashes it is first necessary to understand what causes them.
Are they caused by OOM or by some bug in the software?

Table-level locking prevents concurrent modification of a table, but it doesn't prevent (and should not prevent) concurrent execution of read-only queries. So if the problem is caused by memory exhaustion, which in turn results from concurrent execution of some heavy queries, then locking cannot solve it.

Can you somehow provide me with core files or stack traces of the crashed backends?

@simovesterinen81
Author

simovesterinen81 commented Feb 1, 2021

Both reasons:

OOM sometimes happens when multiple users run delete, load, and query operations simultaneously. The Get query somehow gets into an infinite loop --> takes a long time --> builds up memory --> OOM killer. The query result should be very small, but we can't even kill the process from the postgres side.

A segmentation fault happens sometimes. It is very difficult to reproduce, but it seems to be triggered when a delete and a query start at the same time.
When this occurs, the Get query also stops responding and crashes the server.

These are the reasons why I suspect that the problem is somehow related to deleting or adding rows in IMCS while queries are running at the same time.

@knizhnik
Owner

knizhnik commented Feb 1, 2021

Sorry, but without a core file I can't do anything.
Can you configure postgres to dump cores?
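
Editorial sketch, not part of the original comment: on Linux, whether a process may write a core file depends on its core-file size limit, which is visible in `/proc/<pid>/limits`. The limit is usually raised with `ulimit -c unlimited` in the shell that starts the server, or with the `-c` option of `pg_ctl start`; where the core file lands depends on the kernel's `core_pattern` setting. The helper names below (`parse_core_limit`, `core_dumps_enabled`, `core_dumps_enabled_for`) are hypothetical and only show how one might check the limit of a running postmaster or backend.

```python
from pathlib import Path

# Label used by the Linux kernel in /proc/<pid>/limits for the core-file limit.
CORE_LABEL = "Max core file size"


def parse_core_limit(limits_text):
    """Return (soft, hard) core-file size limits from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith(CORE_LABEL):
            fields = line[len(CORE_LABEL):].split()
            return fields[0], fields[1]
    raise ValueError("core file size limit not found")


def core_dumps_enabled(limits_text):
    """A soft limit of 0 means the kernel will not write a core file."""
    soft, _hard = parse_core_limit(limits_text)
    return soft != "0"


def core_dumps_enabled_for(pid):
    """Check a live process (Linux only); pid is e.g. the postmaster's PID."""
    return core_dumps_enabled(Path(f"/proc/{pid}/limits").read_text())
```

If the soft limit is 0 for the postmaster, its backends inherit that limit and a crash leaves no core file behind.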

@simovesterinen81
Author

simovesterinen81 commented Feb 1, 2021

No.
It's a production environment.
We are not able to reproduce this in development env.

Maybe I could create the needed locking by using advisory locks.
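
Editorial sketch, not part of the original comment: one way the advisory-lock idea could look on the application side. PostgreSQL advisory locks take a `bigint` key and are scoped to the current database, so the sketch derives a stable signed 64-bit key from the table name and builds the lock and unlock statements. Loads and deletes would take the exclusive lock, while Get queries would take the shared one. The function names and the table name `quotes` are hypothetical, and whether this serialization actually avoids the crashes is not established in this thread.

```python
import hashlib
import struct


def advisory_key(table):
    """Map a table name to a stable signed 64-bit key for pg_advisory_lock."""
    digest = hashlib.sha256(table.encode("utf-8")).digest()
    # ">q" unpacks 8 bytes as a big-endian signed 64-bit integer (a bigint).
    return struct.unpack(">q", digest[:8])[0]


def lock_statements(table, writer):
    """Return (lock_sql, unlock_sql) for a writer (load/delete) or a reader (get).

    The key is an integer computed here, so formatting it into the SQL text
    cannot inject anything.
    """
    key = advisory_key(table)
    if writer:
        return (f"SELECT pg_advisory_lock({key})",
                f"SELECT pg_advisory_unlock({key})")
    return (f"SELECT pg_advisory_lock_shared({key})",
            f"SELECT pg_advisory_unlock_shared({key})")


# Hypothetical usage: run lock_sql before the IMCS call and unlock_sql after it,
# on the same database session.
lock_sql, unlock_sql = lock_statements("quotes", writer=True)
```

These are session-level locks, so the unlock statement has to run on the same connection that took the lock. Note that this blocks readers while a load or delete is in progress, which is stricter than the table-level locking described above.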

@simovesterinen81
Author

Could there be better exception handling on the IMCS side, so that an OOM kill would not restart the postgres server?
It's a problem for us because we need to load all the data back into IMCS, and during that time our users can't use IMCS.

@knizhnik
Owner

knizhnik commented Feb 2, 2021

It is possible to dump cores in a production environment as well.
Certainly, debugging code built with optimization is much more difficult than debugging code built without optimization and with debugging symbols. But at least it is possible to see the stack trace and to examine with a disassembler the code fragment where the crash happens.

Sorry, but I have to say once again that it is not possible to do anything without first understanding the cause of the problem. It is not clear to me now whether the OOM happens because of IMCS shared memory or because of the backend's private memory used for query execution (for example, for hashing).

In general, it is not possible to protect a program from the OOM killer. You can play with the overcommit memory settings or add swap. In the first case, the application will get a malloc failure before the OOM killer fires. In the second case, you will get swapping instead of a failure (but the impact of thrashing on performance may be even more dramatic).

As far as I know, there is no way to protect a process from the OOM killer. And once a process is killed, there is no other way for the Postgres postmaster to handle the crash except a complete instance restart, because shared memory can be left in an inconsistent state after such a crash.
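
Editorial sketch, not part of the original comment: with strict overcommit accounting on Linux (`vm.overcommit_memory = 2`), allocations start failing once the total committed memory reaches the commit limit, which is what produces a malloc failure before the OOM killer gets involved. When `vm.overcommit_kbytes` is 0, the kernel documents that limit as `(MemTotal - huge pages) * overcommit_ratio / 100 + SwapTotal`. The helper names below are hypothetical, and the figures in the usage example are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict; sizes are in kB, HugePages_* are counts."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def commit_limit_kb(meminfo, overcommit_ratio):
    """Commit limit in kB under vm.overcommit_memory = 2 (overcommit_kbytes = 0)."""
    # Memory reserved for huge pages is excluded from the ratio calculation.
    huge_kb = meminfo.get("HugePages_Total", 0) * meminfo.get("Hugepagesize", 0)
    return (meminfo["MemTotal"] - huge_kb) * overcommit_ratio // 100 + meminfo["SwapTotal"]


# Made-up example: 16 GB of RAM, 2 GB of swap, no huge pages.
example = parse_meminfo(
    "MemTotal:       16000000 kB\n"
    "SwapTotal:       2000000 kB\n"
    "HugePages_Total:       0\n"
    "Hugepagesize:       2048 kB\n"
)
limit_at_50 = commit_limit_kb(example, 50)  # 8000000 + 2000000 = 10000000 kB
limit_at_80 = commit_limit_kb(example, 80)  # 12800000 + 2000000 = 14800000 kB
```

Comparing the computed limit with the `Committed_AS` value in `/proc/meminfo` gives a rough idea of how close the machine is to refusing allocations. Adding swap raises the limit directly, which matches the second option mentioned above.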


2 participants