-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pixelpipe history and style problems #18311
Pixelpipe history and style problems #18311
Conversation
In normal processing, no. The images are freshly imported, then culled. The selected ones are then processed. The style that is applied is appended to the current (virgin) history. The only time I reset the history stack is after an "insane data" or "image reset" problem when I end up with the image I was editing being "reset" to some state. I reset the history stack and remove the tags and reprocess. Shot a couple thousand images last night. I'll process tonight and test. |
Got an image reset when I tried to advance after editing an image. I stopped darktable after that, so it's at the end of the run1.txt log file. This is what it looks like after the reset Here's the log file and here's the XMP files after the reset. 129EOSR7_2R4A6098.CR3.xmp.txt 94 was the edited image that got reset and 98 was the image I was changing to. |
Another image reset with "insane data" in crop and rotate module. Again it's at the end of the file. thumbnail was same interesting shade of blue as above. 129EOSR7_2R4A6583.CR3.xmp.txt 74 was the image that reset. |
@wpferguson thanks for the logs. This is likely the interesting thing
It means that in |
9c71889
to
0c8ab00
Compare
Okay, I think I've figured it out... I just finished processing 129EOSR7_2R4A7194.CR3
then I load 129EOSR7_2R4A7219.CR3
Then I update the thumbnail for 129EOSR7_2R4A7194.CR3
Than I switch back to processing 129EOSR7_2R4A7219.CR3
Then this happens The gets applied to 129EOSR7_2R4A7194.CR3, instead of 129EOSR7_2R4A7219.CR3, and the history gets written again (image 1453), which leads to....
Maybe we should flush the pixelpipe if the imgids don't match. I also think this is only 1 reproducer since we seen the blue images on other OS's and workflows. EDIT: Maybe this is what we should check for, EDIT 2: Here's the run file |
@wpferguson indeed that looks bad. We have some remaining issues with the image cache that might be related, the first bunch of race problems i was able to reproduce is in There is one more problem seen in your first log
but not sure to understand that. History had been saved (bad or good) but while calculating dimensions for full and thumb pipe the full pipe has a bad roi_out and thumb hasn't. That might be a problem in ashift internal algorithm. One idea was to make sure we always have aligned data for |
0c8ab00
to
fa09c13
Compare
The problem here is, we start a new |
On another note, I processed a ~200 images without a loop occurring 😄 |
You mean on another machine? With or without OpenCL? OpenCL would mean we have two pipe running, that could be the culprit for your observation that OpenCL "hurts". |
f55d7e8
to
c6250af
Compare
I was processing with OpenCL on (in the logs it shows up as CL0). In all the processing I did I got a half dozen image resets, but I didn't get any processing loops. EDIT: I only have one machine. I have a set of scripts to set up environments for testing and development so that I don't pollute my main darktable installation. |
bashing is our friend :-) |
c6250af
to
07c8592
Compare
07c8592
to
38375b6
Compare
Force-pushed some more fixes for ashift.
|
Looks like we have a dead-lock with this. To reproduce:
The style I have used: After some more testing it looks like the culprit is Liquify, if I remove it from style it works better. Yet, another issue (style without liquify):
|
No, I just haven't had a chance to test yet. I'll test today |
Hold on, will first try to sort out pascals reports. |
@TurboGit no problem to reproduce :-) This is getting somewhat difficult and there is no quick solution except a quick hack not allowing reentrancy of that function. Maybe you cherrypick the first two commit and we only have history stuff here? Or i could open another PR with just those first two? |
Started testing. Processed first image and advanced to second. Image did not load (upper left preview is image 1), screenshot attached. Had to force quit darktable. |
Before the bisected commit where these errors started, the pixelpipe worked without errors. Since then we've been trying fix after fix after fix... Did we ever figure out what went wrong with the bisected commit? I'm concerned that the patient (darktable) has cancer and we keep trying to fix it with band aids. |
Which function? |
We got some things fixed while investigating this not being the real issue but still worthwhile. Try only the first commit. |
We must not apply all the requested history changes until the last request to do so has finalized. This can happen as reported in darktable-org#18311 (and elsewhere by lua calls to this function).
38375b6
to
ae91825
Compare
@wpferguson i am also - let's say deeply - concerned. And yes, your bisected commit e8c2b95 had a severe problem, the hash was wrong and that has been fixed via #18275. (but please remember we have the new iop_order scheme ...) And yes again, i confess i don't yet fully understand what is sometimes going wrong. I did a deep investigation into history/style code and had hoped, i could fix it via the history mutex. Unfortunately this doesn't fully work as in many parts of that code we use different/fresh dev's ... So here is another bunch of commits. The good point, what @TurboGit reported is a reproducer here too. ATM i hope that the reentrancy locking in ae91825 will be a fix for your lua stuff too. |
Thank you for your efforts. Unfortunately, it seems 'very few' can contribute to this. I've done my fair share of concurrency, but that was mainly about parallelising largely independent work (e.g. calling out to different backend systems to gather data), without users submitting other events that could affect the ongoing work. Hats off! Would it not be possible to set up some kind of queue for events, and handle them in a sequential manner, in order to simplify the locking? Processing itself could still be parallel, but until a cascade of processing triggered by one event completes (either finishing, or cancelling), the next event would be not taken from the queue. It may be pretty bad for performance, and the parallel processing (between darktable's various pipelines, which somewhat depend on each-other, if I understand correctly) may still have issues -- I'm not familiar with darktable's architecture, unfortunately. |
Didn't lock up. Processed 2 images and got the image reset |
The new full stack of commits here is ready to be checked as I see it. If you find issues I would love to get a fresh log. |
Okay, I pulled the latest commits, then started darktable. Resumed from where reset image. I
Here's the log, |
Tried again. Opened the image from above. The style had already been applied, even though it refused to load earlier. Processed the image, then advanced to the next image. Darkroom stopped responding with "loading ...." just like the screenshot above. Here's the log - |
1. Use only 'piece' as parameter 2. Include/deduplicate pipe->type check 3. Don't use iop_order to test for skipping 4. Some debugging logs added and improved
- use aligned data for _homography and simd matrix code - make use of feqf() where appropriate - use size_t instead of int where required for large image data - some gboolean for clarity
Easier reading of return values.
Don't apply a style via dt_styles_apply_to_image() or dt_styles_apply_to_dev() to an invalid image or not appropriate. Added logs for both functions telling if ok or for errors. Removed misleading filename `unknown* as we simply don't have that available here.
We must not apply all the requested history changes until the last request to do so has finalized. This can happen as reported in darktable-org#18311 (and elsewhere by lua calls to this function).
Ensure the image is locked while applying the style.
0a23816
to
00f5352
Compare
@wpferguson first: thanks a lot (again) for all your patience and testing!
|
Processed one image, advanced to the next and darktable hung with a loading message just like the screen shot above. EDIT: Had to force quit to get out of it. |
@wpferguson i am a bit lost atm, latest commits work just right here, but ... I tried another approach in https://github.com/jenshannoschwalm/darktable/pull/new/reverting_pipe_stuff The first two commits a) revert the commit e8c2b95 you bisected as being the bad one and b) 48282d4 the one fixing a wrong hash calculation done in a) So if you just use those first two reverting commits everything should be as you remember "pixelpipe worked without errors". EDIT: (this commit is not available any more with latest force-push) The third commit does the corrections i thought to be required to match the new iop_order scheme. Can you confirm that this breaks your workflow? EDIT: Please note that if using only the first two commits artifacts in tone-equalizer and all modules doing blending are to be expected as those require a hash based on iop_order instead of "node position in the pipe" @TurboGit you might also have a look. If the third commit is actually the culprit we can keep the "hash based on position" and will just have to fix hashing for tone equalizer and the blending cache ... Let's see what your tests show. |
@jenshannoschwalm : Will do another testing session later today. |
Hold on, I will do a PR later today that keeps the hash being calculated on position and will fix the mentioned other parts so we can do "proper" testing. |
Ok, found the time and updated the https://github.com/jenshannoschwalm/darktable/pull/new/reverting_pipe_stuff branch, it all follows the assumption (suggested by the bisected commit) that we must use the node position in the pipe to calculate the hash, not the iop_order. Two commits are code maintenance/logs work. One commit fixes the internal hash calculation in tone equalizer, one fixes the hash calculation we do to check for a correct blending cache. So now ready for testing and review. |
Some more work related to #18245 #18014 and more.
A short summary; especially when adding/changing styles in a very fast way while being in darkroom the pixelpipe might get "confused" and gets into a looping state. Anyother symptom being reported is "insane value".
a) superfluous pipe runs - many of those have been fixed by late PR's
b) gtk redrawing related to position/size changes. This could be some remaining calculation inaccuracies or some canvas resizing due to what we do
c) Something bad in the pipe.
There are two commits here to be tested the first tackling/reporting a possible problem for skipped modules, the second commit tries to protect dev history while adding/removing history items/styles.
When testing please do with
-d pipe -d ioporder -d params -d opencl -d expose
EDIT: dont-d params
for now as that explodes the log@wpferguson (i tried your tarball, i am not sure if i do everything right to test as i almost never use lua ...) would appreciate another round of in-depth "hammer-tests"
@kofa73 you also reported issues possibly related.
From commit messages
Refactor dt_iop_module_is_skipped()
Added locking for history and style operations
As develop history_mutex and global db_image mutexes are both of recursive type we
can and should use them for locking the history or image struct while modifying history.