Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pixelpipe history and style problems #18311

Closed

Conversation

jenshannoschwalm
Copy link
Collaborator

@jenshannoschwalm jenshannoschwalm commented Jan 28, 2025

Some more work related to #18245 #18014 and more.

A short summary; especially when adding/changing styles in a very fast way while being in darkroom the pixelpipe might get "confused" and gets into a looping state. Anyother symptom being reported is "insane value".

  1. The "insane value" is reported in some modules if the processed roi is too small; the reason for this could be valid-but-bad parameters or some badly inserted history
  2. Reasons for the loop - according to various logs - can either be
    a) superfluous pipe runs - many of those have been fixed by late PR's
    b) gtk redrawing related to position/size changes. This could be some remaining calculation inaccuracies or some canvas resizing due to what we do
    c) Something bad in the pipe.
  3. Still unsure if the underlying reason might be the new iop_order scheme

There are two commits here to be tested the first tackling/reporting a possible problem for skipped modules, the second commit tries to protect dev history while adding/removing history items/styles.

When testing please do with -d pipe -d ioporder -d params -d opencl -d expose EDIT: dont -d params for now as that explodes the log

@wpferguson (i tried your tarball, i am not sure if i do everything right to test as i almost never use lua ...) would appreciate another round of in-depth "hammer-tests"
@kofa73 you also reported issues possibly related.


From commit messages
Refactor dt_iop_module_is_skipped()

  1. Use only 'piece' as parameter
  2. Include/deduplicate pipe->type check
  3. Don't use iop_order to test for skipping
  4. Some debugging log

Added locking for history and style operations

As develop history_mutex and global db_image mutexes are both of recursive type we
can and should use them for locking the history or image struct while modifying history.


@jenshannoschwalm jenshannoschwalm marked this pull request as draft January 28, 2025 06:25
@jenshannoschwalm jenshannoschwalm added the bugfix pull request fixing a bug label Jan 28, 2025
src/develop/develop.c Outdated Show resolved Hide resolved
@wpferguson
Copy link
Member

@TurboGit

@wpferguson : Do you reset the history in your testing?

In normal processing, no. The images are freshly imported, then culled. The selected ones are then processed. The style that is applied is appended to the current (virgin) history.

The only time I reset the history stack is after an "insane data" or "image reset" problem when I end up with the image I was editing being "reset" to some state. I reset the history stack and remove the tags and reprocess.

@jenshannoschwalm

Shot a couple thousand images last night. I'll process tonight and test.

@wpferguson
Copy link
Member

Got an image reset when I tried to advance after editing an image. I stopped darktable after that, so it's at the end of the run1.txt log file.

This is what it looks like after the reset

image reset

Here's the log file

run1.txt

and here's the XMP files after the reset.

129EOSR7_2R4A6098.CR3.xmp.txt
129EOSR7_2R4A6094.CR3.xmp.txt

94 was the edited image that got reset and 98 was the image I was changing to.

@wpferguson
Copy link
Member

Another image reset with "insane data" in crop and rotate module. Again it's at the end of the file.

run3.txt.zip

thumbnail was same interesting shade of blue as above.

129EOSR7_2R4A6583.CR3.xmp.txt
129EOSR7_2R4A6574.CR3.xmp.txt

74 was the image that reset.

@jenshannoschwalm
Copy link
Collaborator Author

@wpferguson thanks for the logs. This is likely the interesting thing

   214.6129            toneequal, num 34, order 6500, v(2), multiprio 1
   214.6134 get dimensions                [full]                                       (   0/   0) 7128x4732 scale=1.0000 ID=39
   214.6134 insane data                   [full]           ashift                 1600 (   0/   0) 7128x4732 scale=1.0000 --> (   0/   0) -2147483648x-2147483648 scale=1.0000  
   214.6150 updated final size for ID=39 to 7128x4732

It means that in dt_dev_pixelpipe_get_dimensions() while calling modify_roi_out of ashift there is something bad happening there. As it's roi_in size is ok we must have either bad commited parameters or the code for modify_roi_out is bad.

@wpferguson
Copy link
Member

wpferguson commented Jan 29, 2025

Okay, I think I've figured it out...

I just finished processing 129EOSR7_2R4A7194.CR3


  1095.6516 [dt_dev_write_history_ext] Writing history image id=1453 `129EOSR7_2R4A7194.CR3', iop version: 0
  1095.6516           rawprepare, num  0, order 100, v(2), multiprio 0, enabled
  1095.6516             demosaic, num  1, order 800, v(4), multiprio 0, enabled
  1095.6517              colorin, num  2, order 3800, v(7), multiprio 0, enabled
  1095.6517             colorout, num  3, order 10200, v(5), multiprio 0, enabled
  1095.6517                gamma, num  4, order 10900, v(1), multiprio 0, enabled
  1095.6517          temperature, num  5, order 300, v(4), multiprio 0, enabled
  1095.6517           highlights, num  6, order 400, v(4), multiprio 0, enabled
  1095.6517      channelmixerrgb, num  7, order 3900, v(3), multiprio 0, enabled
  1095.6518             exposure, num  8, order 2700, v(6), multiprio 0, enabled
  1095.6518            filmicrgb, num  9, order 7700, v(6), multiprio 0, enabled
  1095.6518                 flip, num 10, order 1700, v(2), multiprio 0, enabled
  1095.6518      colorbalancergb, num 11, order 5700, v(5), multiprio 0
  1095.6519       denoiseprofile, num 12, order 900, v(11), multiprio 0
  1095.6519              diffuse, num 13, order 3700, v(2), multiprio 0
  1095.6519              diffuse, num 14, order 5800, v(2), multiprio 1
  1095.6519             exposure, num 15, order 2500, v(6), multiprio 1
  1095.6520            toneequal, num 16, order 2400, v(2), multiprio 0
  1095.6520            toneequal, num 17, order 6500, v(2), multiprio 1
  1095.6520             demosaic, num 18, order 800, v(4), multiprio 0, enabled
  1095.6520      channelmixerrgb, num 19, order 3900, v(3), multiprio 0, enabled
  1095.6520                 lens, num 20, order 1300, v(10), multiprio 0, enabled
  1095.6520               ashift, num 21, order 1600, v(5), multiprio 0, enabled
  1095.6521                 crop, num 22, order 3300, v(1), multiprio 0, enabled
  1095.6521             exposure, num 23, order 2700, v(6), multiprio 0, enabled
  1095.6521             exposure, num 24, order 2500, v(6), multiprio 1, enabled
  1095.6521       denoiseprofile, num 25, order 900, v(11), multiprio 0, enabled
  1095.6521         cacorrectrgb, num 26, order 1400, v(1), multiprio 0, enabled
  1095.6522              diffuse, num 27, order 3700, v(2), multiprio 0, enabled
  1095.6522      colorbalancergb, num 28, order 5700, v(5), multiprio 0, enabled
  1095.6522              diffuse, num 29, order 5800, v(2), multiprio 1, enabled
  1095.6522            toneequal, num 30, order 6500, v(2), multiprio 1, enabled
  1095.6522             exposure, num 31, order 2700, v(6), multiprio 0, enabled

then I load 129EOSR7_2R4A7219.CR3

  1095.6816 dt_view_paint_surface         [full]                                        viewport zoom_scale  1.797 backbuf_scale  0.341 (x=3321.94 y=2141.85) -> (x=+0.000 y=+0.000)
  1095.6816   painting                    [full]                                        size 2849x1899 processed 3665x5493 buf 1251x1875 scale=0.341 zoom (x=3321.93 y=2141.81) -> offset (x=-0.000 y=+0.000)
  1095.6944 get dimensions                [full]                                       (   0/   0) 7128x4732 scale=1.0000 ID=1453
  1095.6944 modify roi OUT                [full]           rawprepare              100 (   0/   0) 7128x4732 scale=1.0000 --> (   0/   0) 6984x4660 scale=1.0000
  1095.6944 modify roi OUT                [full]           ashift                 1600 (   0/   0) 6984x4660 scale=1.0000 --> (   0/   0) 6355x4240 scale=1.0000
  1095.6944 modify roi OUT                [full]           flip                   1700 (   0/   0) 6355x4240 scale=1.0000 --> (   0/   0) 4240x6355 scale=1.0000
  1095.6944 modify roi OUT                [full]           crop                   3300 (   0/   0) 4240x6355 scale=1.0000 --> (   0/ 733) 3665x5493 scale=1.0000
  1095.8097 changed CAT for channelmixerrgb from (nil) to 0x5809081d65e0
  1095.8247 used preset                                    temperature             300  preset='as shot to reference': D65 2.006 1.000 1.365, AS-SHOT 1.321 1.000 2.528
  1095.8322 [lib_modulegroups_update_iop_visibility] modulegroups
  1095.8322           rawprepare 100
  1095.8322          temperature 300
  1095.8322           highlights 400
  1095.8322             demosaic 800
  1095.8322                 flip 1700
  1095.8322             exposure 2400
  1095.8323              colorin 3200
  1095.8323      channelmixerrgb 3300
  1095.8323            filmicrgb 5800
  1095.8324           finalscale 8200, hidden
  1095.8324             colorout 8300
  1095.8324          overexposed 8500, hidden
  1095.8324       rawoverexposed 8600, hidden
  1095.8324                gamma 9000, hidden
  1095.8327 LUA processing image 129EOSR7_2R4A7219.CR3
  1095.8327 LUA image time is 2025:01:27 20:32:51
  1095.8328 LUA got tag darktable|format|cr3
  1095.8328 LUA dumping processing tags....
  1095.8328 LUA finished tags dump
  1095.8328 LUA applying modulegroup preset
  1095.8328 LUA checking for Senior Pics
  1095.8330 LUA setting Sports

Then I update the thumbnail for 129EOSR7_2R4A7194.CR3

  1095.8332 LUA INFO: responsive_cache.lua:  returning cache size 6
  1095.8333 LUA INFO: responsive_cache.lua:  returning cache size 3
  1095.8334 LUA INFO: responsive_cache.lua:  updating just developed image 129EOSR7_2R4A7194.CR3
  1095.8335 LUA INFO: responsive_cache.lua:  returning cache size 3
  1095.8709 get dimensions                [thumbnail]                                  (   0/   0) 7128x4732 scale=1.0000 ID=1453
  1095.8709 modify roi OUT                [thumbnail]      rawprepare              100 (   0/   0) 7128x4732 scale=1.0000 --> (   0/   0) 6984x4660 scale=1.0000
  1095.8709 modify roi OUT                [thumbnail]      ashift                 1600 (   0/   0) 6984x4660 scale=1.0000 --> (   0/   0) 6355x4240 scale=1.0000
  1095.8709 modify roi OUT                [thumbnail]      flip                   1700 (   0/   0) 6355x4240 scale=1.0000 --> (   0/   0) 4240x6355 scale=1.0000
  1095.8709 modify roi OUT                [thumbnail]      crop                   3300 (   0/   0) 4240x6355 scale=1.0000 --> (   0/ 733) 3665x5493 scale=1.0000
  1095.8710 pipe starting             CL0 [thumbnail]                                  (   0/   0)  600x 899 scale=0.1638 --> (   0/   0)  600x 899 scale=0.1638  ID=1453, nvidiacudanvidiageforcertx3070tilaptopgpu 5017MB
  1095.8710 modify roi IN                 [thumbnail]      crop                   3300 (   0/   0)  600x 899 scale=0.1638 --> (   0/ 120)  600x 899 scale=0.1638  ID=1453
  1095.8711 modify roi IN                 [thumbnail]      flip                   1700 (   0/ 120)  600x 899 scale=0.1638 --> (  22/   0)  899x 600 scale=0.1638  ID=1453
  1095.8711 modify roi IN                 [thumbnail]      ashift                 1600 (  22/   0)  899x 600 scale=0.1638 --> (  49/   6)  941x 663 scale=0.1638  ID=1453
  1095.8712 modify roi IN                 [thumbnail]      lens                   1300 (  49/   6)  941x 663 scale=0.1638 --> (  49/   5)  937x 662 scale=0.1638  ID=1453
  1095.8712 modify roi IN                 [thumbnail]      demosaic                800 (  49/   5)  937x 662 scale=0.1638 --> ( 298/  30) 5718x4040 scale=1.0000  ID=1453
  1095.8712 modify roi IN                 [thumbnail]      highlights              400 ( 298/  30) 5718x4040 scale=1.0000 --> (   0/   0) 6984x4660 scale=1.0000  ID=1453
  1095.8712 modify roi IN                 [thumbnail]      rawprepare              100 (   0/   0) 6984x4660 scale=1.0000 --> (   0/   0) 7128x4732 scale=1.0000  ID=1453
  1095.8712 pipe data: full               [thumbnail]                                  (   0/   0) 7128x4732 scale=1.0000 --> (   0/   0) 7128x4732 scale=1.0000
  1095.8775 process                   CL0 [thumbnail]      rawprepare              100 (   0/   0) 7128x4732 scale=1.0000 --> (   0/   0) 6984x4660 scale=1.0000  IOP_CS_RAW 265.1MB
  1095.8789 process                   CL0 [thumbnail]      temperature             300 (   0/   0) 6984x4660 scale=1.0000 --> (   0/   0) 6984x4660 scale=1.0000  IOP_CS_RAW 260.4MB
  1095.8817 process                   CL0 [thumbnail]      highlights              400 (   0/   0) 6984x4660 scale=1.0000 --> ( 298/  30) 5718x4040 scale=1.0000  IOP_CS_RAW 325.5MB
  1095.9167 process                   CPU [thumbnail]      demosaic                800 ( 298/  30) 5718x4040 scale=1.0000 --> (  49/   5)  937x 662 scale=0.1638  IOP_CS_RAW -> IOP_CS_RGB 1129MB
  1095.9962 demosaic clip_and_zoom    CPU [thumbnail]      demosaic                800 ( 298/  30) 5718x4040 scale=1.0000 --> (  49/   5)  937x 662 scale=0.1638
  1096.0614 process                   CL0 [thumbnail]      denoiseprofile          900 (  49/   5)  937x 662 scale=0.1638 --> (  49/   5)  937x 662 scale=0.1638  IOP_CS_RGB 49.6MB
  1096.0787 process                   CL0 [thumbnail]      lens                   1300 (  49/   5)  937x 662 scale=0.1638 --> (  49/   6)  941x 663 scale=0.1638  IOP_CS_RGB 44.9MB
  1096.0883 process                   CPU [thumbnail]      cacorrectrgb           1400 (  49/   6)  941x 663 scale=0.1638 --> (  49/   6)  941x 663 scale=0.1638  IOP_CS_RGB 20MB
  1096.1934 process                   CL0 [thumbnail]      ashift                 1600 (  49/   6)  941x 663 scale=0.1638 --> (  22/   0)  899x 600 scale=0.1638  IOP_CS_RGB 18.6MB
  1096.1942 process                   CL0 [thumbnail]      flip                   1700 (  22/   0)  899x 600 scale=0.1638 --> (   0/ 120)  600x 899 scale=0.1638  IOP_CS_RGB 25.9MB
  1096.1947 process                   CL0 [thumbnail]      exposure.1             2500 (   0/ 120)  600x 899 scale=0.1638 --> (   0/ 120)  600x 899 scale=0.1638  IOP_CS_RGB 30.2MB
  1096.2001 blend with form           CL0 [thumbnail]      exposure.1             2500 (   0/ 120)  600x 899 scale=0.1638 --> (   0/ 120)  600x 899 scale=0.1638  IOP_CS_RGB, BLEND_CS_RGB_SCENE, inverted
  1096.2024 process                   CL0 [thumbnail]      exposure               2700 (   0/ 120)  600x 899 scale=0.1638 --> (   0/ 120)  600x 899 scale=0.1638  IOP_CS_RGB 17.3MB
  1096.2028 process                   CL0 [thumbnail]      crop                   3300 (   0/ 120)  600x 899 scale=0.1638 --> (   0/   0)  600x 899 scale=0.1638  IOP_CS_RGB 17.3MB
  1096.2033 process                   CL0 [thumbnail]      diffuse                3700 (   0/   0)  600x 899 scale=0.1638 --> (   0/   0)  600x 899 scale=0.1638  IOP_CS_RGB 79.8MB
  1096.2082 process                   CL0 [thumbnail]      colorin                3800 (   0/   0)  600x 899 scale=0.1638 --> (   0/   0)  600x 899 scale=0.1638  IOP_CS_RGB -> IOP_CS_LAB 17.3MB
  1096.2099 transform colorspace      CL0 [thumbnail]      channelmixerrgb        3900 (   0/   0)  600x 899 scale=0.1638  IOP_CS_LAB -> IOP_CS_RGB `linear Rec2020 RGB'
  1096.2116 process                   CL0 [thumbnail]      channelmixerrgb        3900 (   0/   0)  600x 899 scale=0.1638 --> (   0/   0)  600x 899 scale=0.1638  IOP_CS_RGB 17.3MB
  1096.2122 process                   CL0 [thumbnail]      colorbalancergb        5700 (   0/   0)  600x 899 scale=0.1638 --> (   0/   0)  600x 899 scale=0.1638  IOP_CS_RGB 17.3MB
  1096.2135 process                   CL0 [thumbnail]      diffuse.1              5800 (   0/   0)  600x 899 scale=0.1638 --> (   0/   0)  600x 899 scale=0.1638  IOP_CS_RGB 131.6MB
  1096.2561 process                   CPU [thumbnail]      toneequal.1            6500 (   0/   0)  600x 899 scale=0.1638 --> (   0/   0)  600x 899 scale=0.1638  IOP_CS_RGB 17MB
  1096.2589 process                   CL0 [thumbnail]      filmicrgb              7700 (   0/   0)  600x 899 scale=0.1638 --> (   0/   0)  600x 899 scale=0.1638  IOP_CS_RGB 77.7MB
  1096.2610 transform colorspace      CL0 [thumbnail]      colorout               10200 (   0/   0)  600x 899 scale=0.1638  IOP_CS_RGB -> IOP_CS_LAB `linear Rec2020 RGB'
  1096.2624 process                   CL0 [thumbnail]      colorout               10200 (   0/   0)  600x 899 scale=0.1638 --> (   0/   0)  600x 899 scale=0.1638  IOP_CS_LAB -> IOP_CS_RGB 17.3MB
  1096.2639 process                   CPU [thumbnail]      gamma                  10900 (   0/   0)  600x 899 scale=0.1638 --> (   0/   0)  600x 899 scale=0.1638  IOP_CS_RGB 17MB
  1096.2651 cache report                  [thumbnail]                                   2 lines (important=0, used=0, invalid=0). Using 1029MB, limit=0MB. Hits/run=0.00. Hits/test=0.000
  1096.2651 pipe finished             CL0 [thumbnail]                                  (   0/   0)  600x 899 scale=0.1638 --> (   0/   0)  600x 899 scale=0.1638  ID=1453

Than I switch back to processing 129EOSR7_2R4A7219.CR3

  1096.2725 [lib_modulegroups_update_iop_visibility] modulegroups
  1096.2725           rawprepare 100
  1096.2726          temperature 300
  1096.2726           highlights 400
  1096.2726             demosaic 800
  1096.2726                 flip 1700
  1096.2726             exposure 2400
  1096.2726              colorin 3200
  1096.2726      channelmixerrgb 3300
  1096.2727            filmicrgb 5800
  1096.2727           finalscale 8200, hidden
  1096.2727             colorout 8300
  1096.2727          overexposed 8500, hidden
  1096.2727       rawoverexposed 8600, hidden
  1096.2727                gamma 9000, hidden
  1096.2728 LUA applying style
  1096.2731 [dt_dev_write_history_ext] Writing history image id=1478 `129EOSR7_2R4A7219.CR3', iop version: 0
  1096.2732           rawprepare, num  0, order 100, v(2), multiprio 0, enabled
  1096.2732             demosaic, num  1, order 800, v(4), multiprio 0, enabled
  1096.2732              colorin, num  2, order 3200, v(7), multiprio 0, enabled
  1096.2732             colorout, num  3, order 8300, v(5), multiprio 0, enabled
  1096.2733                gamma, num  4, order 9000, v(1), multiprio 0, enabled
  1096.2733          temperature, num  5, order 300, v(4), multiprio 0, enabled
  1096.2733           highlights, num  6, order 400, v(4), multiprio 0, enabled
  1096.2733      channelmixerrgb, num  7, order 3300, v(3), multiprio 0, enabled
  1096.2734             exposure, num  8, order 2400, v(6), multiprio 0, enabled
  1096.2734            filmicrgb, num  9, order 5800, v(6), multiprio 0, enabled
  1096.2734                 flip, num 10, order 1700, v(2), multiprio 0, enabled

Then this happens

The gets applied to 129EOSR7_2R4A7194.CR3, instead of 129EOSR7_2R4A7219.CR3, and the history gets written again (image 1453), which leads to....

  1096.2769 [styles_apply_to_image_ext] Apply style on image `(unknown)' id 1453, history size 32
  1096.2782 [dt_dev_write_history_ext] Writing history image id=1453 `(unknown)', iop version: 0
  1096.2783           rawprepare, num  0, order 100, v(2), multiprio 0, enabled
  1096.2784             demosaic, num  1, order 800, v(4), multiprio 0, enabled
  1096.2784              colorin, num  2, order 3800, v(7), multiprio 0, enabled
  1096.2785             colorout, num  3, order 10200, v(5), multiprio 0, enabled
  1096.2786                gamma, num  4, order 10900, v(1), multiprio 0, enabled
  1096.2786          temperature, num  5, order 300, v(4), multiprio 0, enabled
  1096.2787           highlights, num  6, order 400, v(4), multiprio 0, enabled
  1096.2788      channelmixerrgb, num  7, order 3900, v(3), multiprio 0, enabled
  1096.2788             exposure, num  8, order 2700, v(6), multiprio 0, enabled
  1096.2789            filmicrgb, num  9, order 7700, v(6), multiprio 0, enabled
  1096.2790                 flip, num 10, order 1700, v(2), multiprio 0, enabled
  1096.2790      colorbalancergb, num 11, order 5700, v(5), multiprio 0
  1096.2791       denoiseprofile, num 12, order 900, v(11), multiprio 0
  1096.2791              diffuse, num 13, order 3700, v(2), multiprio 0
  1096.2792              diffuse, num 14, order 5800, v(2), multiprio 1
  1096.2793             exposure, num 15, order 2500, v(6), multiprio 1
  1096.2793            toneequal, num 16, order 2400, v(2), multiprio 0
  1096.2794            toneequal, num 17, order 6500, v(2), multiprio 1
  1096.2794             demosaic, num 18, order 800, v(4), multiprio 0, enabled
  1096.2795      channelmixerrgb, num 19, order 3900, v(3), multiprio 0, enabled
  1096.2796                 lens, num 20, order 1300, v(10), multiprio 0, enabled
  1096.2796               ashift, num 21, order 1600, v(5), multiprio 0, enabled
  1096.2797                 crop, num 22, order 3300, v(1), multiprio 0, enabled
  1096.2797             exposure, num 23, order 2700, v(6), multiprio 0, enabled
  1096.2799             exposure, num 24, order 2500, v(6), multiprio 1, enabled
  1096.2799       denoiseprofile, num 25, order 900, v(11), multiprio 0, enabled
  1096.2800         cacorrectrgb, num 26, order 1400, v(1), multiprio 0, enabled
  1096.2800              diffuse, num 27, order 3700, v(2), multiprio 0, enabled
  1096.2801      colorbalancergb, num 28, order 5700, v(5), multiprio 0, enabled
  1096.2801              diffuse, num 29, order 5800, v(2), multiprio 1, enabled
  1096.2802            toneequal, num 30, order 6500, v(2), multiprio 1, enabled
  1096.2803             exposure, num 31, order 2700, v(6), multiprio 0, enabled
  1096.2804      colorbalancergb, num 32, order 5700, v(5), multiprio 0
  1096.2805       denoiseprofile, num 33, order 900, v(11), multiprio 0
  1096.2805              diffuse, num 34, order 3700, v(2), multiprio 0
  1096.2806              diffuse, num 35, order 5800, v(2), multiprio 1
  1096.2807             exposure, num 36, order 2500, v(6), multiprio 1
  1096.2807            toneequal, num 37, order 2400, v(2), multiprio 0
  1096.2808            toneequal, num 38, order 6500, v(2), multiprio 1
  1096.2814 get dimensions                [full]                                       (   0/   0) 7128x4732 scale=1.0000 ID=1453
  1096.2814 insane data                   [full]           ashift                 1600 (   0/   0) 7128x4732 scale=1.0000 --> (   0/   0) -2147483648x-2147483648 scale=1.0000

Maybe we should flush the pixelpipe if the imgids don't match.

I also think this is only 1 reproducer since we seen the blue images on other OS's and workflows.

EDIT: Maybe this is what we should check for, Apply style on image (unknown)' id 1453`

EDIT 2: Here's the run file
run7.txt.zip

@jenshannoschwalm
Copy link
Collaborator Author

@wpferguson indeed that looks bad. We have some remaining issues with the image cache that might be related, the first bunch of race problems i was able to reproduce is in Fix dt_image_update_final_size() and dt_image_get_final_size() (shutdown part 3) but there are more.

There is one more problem seen in your first log

   214.6128             exposure, num 32, order 2500, v(6), multiprio 1
   214.6128            toneequal, num 33, order 2400, v(2), multiprio 0
   214.6129            toneequal, num 34, order 6500, v(2), multiprio 1
   214.6134 get dimensions                [full]                                       (   0/   0) 7128x4732 scale=1.0000 ID=39
   214.6134 insane data                   [full]           ashift                 1600 (   0/   0) 7128x4732 scale=1.0000 --> (   0/   0) -2147483648x-2147483648 scale=1.0000  
   214.6150 updated final size for ID=39 to 7128x4732
   214.6190 LUA INFO: responsive_cache.lua:  returning cache size 6 
   214.6191 LUA INFO: responsive_cache.lua:  returning cache size 3 
   214.6193 LUA INFO: responsive_cache.lua:  updating just developed image 129EOSR7_2R4A6094.CR3 
   214.6194 LUA INFO: responsive_cache.lua:  returning cache size 3 
   214.6587 get dimensions                [thumbnail]                                  (   0/   0) 7128x4732 scale=1.0000 ID=39
   214.6587 modify roi OUT                [thumbnail]      rawprepare              100 (   0/   0) 7128x4732 scale=1.0000 --> (   0/   0) 6984x4660 scale=1.0000  
   214.6587 modify roi OUT                [thumbnail]      ashift                 1600 (   0/   0) 6984x4660 scale=1.0000 --> (   0/   0) 6606x4002 scale=1.0000  
   214.6587 modify roi OUT                [thumbnail]      flip                   1700 (   0/   0) 6606x4002 scale=1.0000 --> (   0/   0) 4002x6606 scale=1.0000  
   214.6588 pipe starting             CL0 [thumbnail]                                  (   0/   0)  545x 900 scale=0.1362 --> (   0/   0)  545x 900 scale=0.1362  ID=39, nvidiacudanvidiageforcertx3070tilaptopgpu 5017MB

but not sure to understand that. History had been saved (bad or good) but while calculating dimensions for full and thumb pipe the full pipe has a bad roi_out and thumb hasn't. That might be a problem in ashift internal algorithm. One idea was to make sure we always have aligned data for _homography

@jenshannoschwalm
Copy link
Collaborator Author

EDIT: Maybe this is what we should check for, Apply style on image (unknown)' id 1453`

The problem here is, we start a new dev so not sure where / how we could do a history protection.

@wpferguson
Copy link
Member

On another note, I processed a ~200 images without a loop occurring 😄

@jenshannoschwalm
Copy link
Collaborator Author

On another note, I processed a ~200 images without a loop occurring 😄

You mean on another machine? With or without OpenCL? OpenCL would mean we have two pipe running, that could be the culprit for your observation that OpenCL "hurts".

@wpferguson
Copy link
Member

wpferguson commented Jan 29, 2025

I was processing with OpenCL on (in the logs it shows up as CL0). In all the processing I did I got a half dozen image resets, but I didn't get any processing loops.

EDIT: I only have one machine. I have a set of scripts to set up environments for testing and development so that I don't pollute my main darktable installation.

@jenshannoschwalm
Copy link
Collaborator Author

EDIT: I only have one machine. I have a set of scripts to set up environments for testing and development so that I don't pollute my main darktable installation.

bashing is our friend :-)

@jenshannoschwalm
Copy link
Collaborator Author

Force-pushed some more fixes for ashift.

  1. @wpferguson i guess from your being silent there are no more issues to be reported from your side?
  2. @TurboGit i think we are good now. Personally i think it's all for 5.0.1 but let me know if (and what) you want me to prepare as a 5.0.1 specific PR

@TurboGit
Copy link
Member

Looks like we have a dead-lock with this. To reproduce:

  • Create a style with quite some modules
  • Assign a shortcut to it
  • Open an image (fresh edit)
  • Apply the style using the shortcut - wait - all ok
  • Now apply two time the style by quickly typing two time the shortcut
  • See, dead lock, need to kill darktable
    ...

The style I have used:

lot of iop.dtstyle.txt

After some more testing it looks like the culprit is Liquify, if I remove it from style it works better.

Yet, another issue (style without liquify):

  • Apply it multiple time (say 8 times) quickly on the same image
  • Then while the pipe is processing click on "reset" of the history module.
  • See that the pipe keeps recomputing

@wpferguson
Copy link
Member

i guess from your being silent there are no more issues to be reported from your side?

No, I just haven't had a chance to test yet. I'll test today

@jenshannoschwalm
Copy link
Collaborator Author

Hold on, will first try to sort out pascals reports.

@TurboGit TurboGit added the wip pull request in making, tests and feedback needed label Jan 31, 2025
@jenshannoschwalm
Copy link
Collaborator Author

@TurboGit no problem to reproduce :-) This is getting somewhat difficult and there is no quick solution except a quick hack not allowing reentrancy of that function. Maybe you cherrypick the first two commit and we only have history stuff here? Or i could open another PR with just those first two?

@jenshannoschwalm jenshannoschwalm changed the title Pixelpipe looping problems Pixelpipe history and style problems Feb 1, 2025
@wpferguson
Copy link
Member

Started testing. Processed first image and advanced to second. Image did not load (upper left preview is image 1), screenshot attached. Had to force quit darktable.

hang

run9.txt

@wpferguson
Copy link
Member

Before the bisected commit where these errors started, the pixelpipe worked without errors. Since then we've been trying fix after fix after fix... Did we ever figure out what went wrong with the bisected commit?

I'm concerned that the patient (darktable) has cancer and we keep trying to fix it with band aids.

@TurboGit
Copy link
Member

TurboGit commented Feb 1, 2025

except a quick hack not allowing reentrancy of that function.

Which function?

@jenshannoschwalm
Copy link
Collaborator Author

Since then we've been trying fix after fix after fix...

We got some things fixed while investigating this not being the real issue but still worthwhile.

Try only the first commit.

jenshannoschwalm added a commit to jenshannoschwalm/darktable that referenced this pull request Feb 2, 2025
We must not apply all the requested history changes until the last request to do so
has finalized.
This can happen as reported in darktable-org#18311 (and elsewhere by lua calls to this function).
@jenshannoschwalm
Copy link
Collaborator Author

@wpferguson i am also - let's say deeply - concerned. And yes, your bisected commit e8c2b95 had a severe problem, the hash was wrong and that has been fixed via #18275. (but please remember we have the new iop_order scheme ...)
Also we possibly calculated the dimensions wrongly (not directly related but that was at least one of the reasons leading to the observed loops) after some pipe iop errors, see 9a344b6
The ashift related f73d95c makes sure some internal simd algos always find aligend data - as we do elsewhere in that module code for good reason. I could not reproduce any of the ashift related issues here yet but that might depend on compiler and system (we has such issues many times elewhere ...)
Other reported issues that were linked to this were simply not correct

And yes again, i confess i don't yet fully understand what is sometimes going wrong. I did a deep investigation into history/style code and had hoped, i could fix it via the history mutex. Unfortunately this doesn't fully work as in many parts of that code we use different/fresh dev's ... So here is another bunch of commits.

The good point, what @TurboGit reported is a reproducer here too. ATM i hope that the reentrancy locking in ae91825 will be a fix for your lua stuff too.

@kofa73
Copy link
Contributor

kofa73 commented Feb 2, 2025

Thank you for your efforts. Unfortunately, it seems 'very few' can contribute to this. I've done my fair share of concurrency, but that was mainly about parallelising largely independent work (e.g. calling out to different backend systems to gather data), without users submitting other events that could affect the ongoing work. Hats off!

Would it not be possible to set up some kind of queue for events, and handle them in a sequential manner, in order to simplify the locking? Processing itself could still be parallel, but until a cascade of processing triggered by one event completes (either finishing, or cancelling), the next event would be not taken from the queue. It may be pretty bad for performance, and the parallel processing (between darktable's various pipelines, which somewhat depend on each-other, if I understand correctly) may still have issues -- I'm not familiar with darktable's architecture, unfortunately.

@wpferguson
Copy link
Member

Try only the first commit.

Didn't lock up. Processed 2 images and got the image reset

@jenshannoschwalm
Copy link
Collaborator Author

The new full stack of commits here is ready to be checked as I see it.

If you find issues I would love to get a fresh log.

@wpferguson
Copy link
Member

Okay, I pulled the latest commits, then started darktable. Resumed from where reset image.

I

  • removed the processing complete tag from the reset image and the next image
  • cleared the history of the reset image
  • double clicked on the reset image (whish was now "virgin").
  • darktable refused to enter darkroom. Double clicked on the image again and got a crash (force quit).

Here's the log,
run10.txt

@wpferguson
Copy link
Member

Tried again. Opened the image from above. The style had already been applied, even though it refused to load earlier. Processed the image, then advanced to the next image. Darkroom stopped responding with "loading ...." just like the screenshot above.

Here's the log -
run11.txt

1. Use only 'piece' as parameter
2. Include/deduplicate pipe->type check
3. Don't use iop_order to test for skipping
4. Some debugging logs added and improved
- use aligned data for _homography and simd matrix code
- make use of feqf() where appropriate
- use size_t instead of int where required for large image data
- some gboolean for clarity
Easier reading of return values.
Don't apply a style via dt_styles_apply_to_image() or dt_styles_apply_to_dev()
to an invalid image or not appropriate.

Added logs for both functions telling if ok or for errors.
Removed misleading filename `unknown* as we simply don't have that available here.
We must not apply all the requested history changes until the last request to do so
has finalized.
This can happen as reported in darktable-org#18311 (and elsewhere by lua calls to this function).
Ensure the image is locked while applying the style.
@jenshannoschwalm
Copy link
Collaborator Author

@wpferguson first: thanks a lot (again) for all your patience and testing!

  1. I checked about the unknown reported filename again, that message was just misleading as we simply don't have the filename available at that part of the code so the logs have been fixed.
  2. After more thinking it seems unlocking darktable.dev_threadsafe should happen before we reload the image and reconnect the accels.
  3. Could not reproduce here the re-entering darkroom or switching to next image problems, what i observe is that pressing space applies the style again due to shortcuts though.

@wpferguson
Copy link
Member

wpferguson commented Feb 4, 2025

Processed one image, advanced to the next and darktable hung with a loading message just like the screen shot above. EDIT: Had to force quit to get out of it.

run12.txt

@jenshannoschwalm
Copy link
Collaborator Author

jenshannoschwalm commented Feb 4, 2025

@wpferguson i am a bit lost atm, latest commits work just right here, but ...

I tried another approach in https://github.com/jenshannoschwalm/darktable/pull/new/reverting_pipe_stuff

The first two commits a) revert the commit e8c2b95 you bisected as being the bad one and b) 48282d4 the one fixing a wrong hash calculation done in a)

So if you just use those first two reverting commits everything should be as you remember "pixelpipe worked without errors".

EDIT: (this commit is not available any more with latest force-push) The third commit does the corrections i thought to be required to match the new iop_order scheme. Can you confirm that this breaks your workflow?

EDIT: Please note that if using only the first two commits artifacts in tone-equalizer and all modules doing blending are to be expected as those require a hash based on iop_order instead of "node position in the pipe"

@TurboGit you might also have a look. If the third commit is actually the culprit we can keep the "hash based on position" and will just have to fix hashing for tone equalizer and the blending cache ... Let's see what your tests show.

@TurboGit
Copy link
Member

TurboGit commented Feb 4, 2025

@jenshannoschwalm : Will do another testing session later today.

@jenshannoschwalm
Copy link
Collaborator Author

Hold on, I will do a PR later today that keeps the hash being calculated on position and will fix the mentioned other parts so we can do "proper" testing.

@jenshannoschwalm
Copy link
Collaborator Author

Ok, found the time and updated the https://github.com/jenshannoschwalm/darktable/pull/new/reverting_pipe_stuff branch, it all follows the assumption (suggested by the bisected commit) that we must use the node position in the pipe to calculate the hash, not the iop_order.
Basically this has no downside at all and requires two fixes.

Two commits are code maintenance/logs work.

One commit fixes the internal hash calculation in tone equalizer, one fixes the hash calculation we do to check for a correct blending cache.

So now ready for testing and review.

@jenshannoschwalm jenshannoschwalm deleted the hashing_pipe branch February 5, 2025 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix pull request fixing a bug release notes: pending wip pull request in making, tests and feedback needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants