Segmentation Fine-Tuning leads to 0 line detection #683

agombert · 2025-02-04T16:05:41Z

Hey,

Thanks for the great work. I have some questions about fine-tuning; I think the problem may come from the format of my input data. I've been looking at this link to get well-formed XML for my JPG images. But after fine-tuning (even after 1 epoch) I don't get any lines 👀.

Here is an example of xml file I have:

<?xml version="1.0" ?>
<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/standards/alto/ns-v4#" xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-3.xsd">
    <Description>
        <MeasurementUnit>pixel</MeasurementUnit>
        <sourceImageInformation>
            <fileName>/home/ubuntu/data/20250204_line_detection/images/FRANOM22_COLH78_0261_0232_4.jpg</fileName>
        </sourceImageInformation>
    </Description>
    <Tags>
        <OtherTag DESCRIPTION="line type" ID="LINE_TYPE_1" TYPE="type" LABEL="default"/>
        <OtherTag DESCRIPTION="region type" ID="REGION_TYPE_1" TYPE="region" LABEL="text"/>
    </Tags>
    <Layout>
        <Page WIDTH="491" HEIGHT="722" PHYSICAL_IMG_NR="0" ID="page_0">
            <PrintSpace HPOS="0" VPOS="0" WIDTH="491" HEIGHT="722">
                <TextBlock ID="42c3ee03-8810-4b48-9eb6-ddcb9f23321d" HPOS="0" VPOS="0" WIDTH="491" HEIGHT="722" TAGREFS="REGION_TYPE_1">
                    <TextLine ID="780aac20-46ee-4072-a714-04c14590713b" HPOS="99" VPOS="48" WIDTH="358" HEIGHT="3" BASELINE="99 48 205 50 342 51 457 50" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="f9e0050e-4014-437b-bdb6-fc85e4059699" HPOS="107" VPOS="99" WIDTH="327" HEIGHT="1" BASELINE="107 100 194 99 277 100 434 100" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="9edd2cd0-e6a5-4579-8734-afafc03819e0" HPOS="101" VPOS="135" WIDTH="196" HEIGHT="8" BASELINE="101 143 167 141 297 135" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="5f86381b-c704-4cbd-a1b7-b6295f0ed4a3" HPOS="106" VPOS="177" WIDTH="361" HEIGHT="8" BASELINE="106 185 248 177 467 181" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="b4366d1e-151d-4e8c-9960-3bb7437a9c29" HPOS="102" VPOS="226" WIDTH="176" HEIGHT="1" BASELINE="102 226 167 227 278 226" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="166863f8-0ebe-488f-97ba-3e62a88aeb79" HPOS="100" VPOS="267" WIDTH="317" HEIGHT="3" BASELINE="100 270 224 268 417 267" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="dab2d94e-3ffd-42c2-bf69-7e3d05292b8c" HPOS="100" VPOS="311" WIDTH="254" HEIGHT="1" BASELINE="100 312 192 311 354 312" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="35552902-4556-490c-bcae-3fe2826655c6" HPOS="101" VPOS="352" WIDTH="172" HEIGHT="2" BASELINE="101 352 183 354 273 352" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="f1fbad86-de43-411b-8233-99030c1235ee" HPOS="104" VPOS="395" WIDTH="268" HEIGHT="4" BASELINE="104 397 191 399 372 395" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="f76cd89c-df93-446a-baa8-489ca7a4991e" HPOS="106" VPOS="438" WIDTH="205" HEIGHT="3" BASELINE="106 441 188 438 311 441" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="9da1cfdf-e9b1-4322-9171-22843062be75" HPOS="108" VPOS="477" WIDTH="170" HEIGHT="7" BASELINE="108 481 166 477 278 484" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="ebea3fb5-1b3f-4ce4-9509-33504388e3a8" HPOS="102" VPOS="526" WIDTH="189" HEIGHT="5" BASELINE="102 531 181 528 291 526" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="d45f7502-66d5-4ed2-9a55-6f8a6db7a7dd" HPOS="89" VPOS="572" WIDTH="395" HEIGHT="8" BASELINE="89 580 246 572 417 575 484 575" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="6d5e459f-6b18-493b-bcd8-8f2082620791" HPOS="83" VPOS="619" WIDTH="403" HEIGHT="7" BASELINE="83 626 217 625 315 625 486 619" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                </TextBlock>
            </PrintSpace>
        </Page>
    </Layout>
</alto>

And here are the BASELINE points drawn on the image:

Then I'm using:

ketos -vvv segtrain -i /home/ubuntu/models/blla.mlmodel -f xml /home/ubuntu/data/20250204_line_detection/alto_xml/*.xml -cl -o /home/ubuntu/models/ft_kraken -d cuda:0

Training appears to run, but mean_iu stays around 0.25 and even decreases.

[02/04/25 15:54:37] INFO validation run: accuracy 0.9899430871009827 mean_acc 0.9899430871009827 mean_iu 0.2532690465450287 freq_iu 0.96146160364151

After a few epochs, when I run inference, I don't get any lines though...

Also, I'm using only 30 images to test the training before annotating more and scaling up the process. Do you have any idea why this is not working?


agombert · 2025-02-10T10:46:54Z

👋 hey @mittagessen any idea 😃?

mittagessen · 2025-02-10T10:58:45Z

Sorry, I'm working on an application that is due tonight. I'll be able to have a look at everything that accumulated over the last couple of weeks afterwards.

agombert · 2025-02-25T15:21:28Z

Hey @mittagessen, any availability to check the problem?

mittagessen · 2025-02-25T19:16:43Z

30 images can be OK, but you'll need to train quite a bit longer than a single epoch, probably 50+. Although I have to say it is weird that the base model breaks so completely that you're not seeing *any* line output anymore after only 30 training steps, especially as your data seems to be fairly similar to what the base model has been trained on. To verify your training data, there's a script `contrib/segmentation_overlay.py` that you can feed your ALTO files into to see what kraken makes out of them. You should also check that you aren't introducing spurious new line classes. That's most easily seen when running the training without all the verbosity switches, which will print a table of detected classes before training actually starts. There should be one line class (default) and one text region if you weren't planning on introducing a more complex typology. The validation metrics are fairly useless for segmentation training, unfortunately, but the current main branch of kraken prints training losses as well. You should see those going down over time. If that isn't the case, something is wrong^TM and we'll have to investigate.
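As a rough complement to the class table mentioned above, the labels referenced by each `TextLine` and `TextBlock` can be counted directly from the ALTO files. This is only a sketch using the Python standard library: `count_alto_types` is a hypothetical helper, not part of kraken, and it assumes the ALTO v4 namespace and the `Tags`/`TAGREFS` layout shown in the example file. kraken's own parser may assign types differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: files use the ALTO v4 namespace, as in the example above.
ALTO_URI = "http://www.loc.gov/standards/alto/ns-v4#"
ALTO_NS = {"a": ALTO_URI}


def count_alto_types(xml_data):
    """Count the labels referenced by TextLine and TextBlock elements.

    Hypothetical helper: it only mirrors the TAGREFS -> LABEL lookup
    visible in the file, not kraken's internal type resolution.
    """
    root = ET.fromstring(xml_data)
    # Map tag IDs (e.g. LINE_TYPE_1) to their LABEL (e.g. default).
    labels = {t.get("ID"): t.get("LABEL") for t in root.findall("a:Tags/*", ALTO_NS)}
    lines, regions = Counter(), Counter()
    for name, counter in (("TextLine", lines), ("TextBlock", regions)):
        for node in root.iter("{%s}%s" % (ALTO_URI, name)):
            refs = (node.get("TAGREFS") or "").split()
            if not refs:
                counter["<untagged>"] += 1
            for ref in refs:
                # A reference without a matching tag is reported separately.
                counter[labels.get(ref, "<unknown:%s>" % ref)] += 1
    return lines, regions
```

Calling it on the bytes of each file (for example `count_alto_types(pathlib.Path(p).read_bytes())`) and summing the counters gives one total per label. Anything other than a single line label and a single region label would be worth a closer look.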

agombert · 2025-03-04T09:07:10Z

Hey @mittagessen I've checked the contrib/segmentation_overlay.py on the data and it looks quite normal:

I see the lines as in the previous image and, as I don't want to detect any regions, the region covers the whole image.

Also when launching the training I get:

WARNING  Setting baseline location to centerline from unset model.                                                train.py:1032
                    INFO     Training line types:                                                                                     train.py:1038
                    INFO       default       2       258                                                                              train.py:1040
                    INFO     Training region types:                                                                                   train.py:1041
                    INFO       text  3       30                                                                                       train.py:1043
                    DEBUG    Constructing Adam optimizer (lr: 0.0002, momentum: 0.9)

This seems to show a single type each for lines and regions.

Any idea why there is a problem?

agombert · 2025-03-04T11:17:52Z

Okay, I've spotted 🔎 an "invalid geometry" on 2 xml files when I used the overlay properly on all the data.
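A quick pre-check of the baselines can help narrow down which files to inspect before running the overlay on everything. The sketch below is an assumption-laden illustration, not kraken's validation: `check_baselines` is a hypothetical helper, and the three checks it performs (too few points, points outside the page, repeated consecutive points) are guesses at common annotation mistakes. The actual criteria behind kraken's "invalid geometry" message may differ.

```python
import xml.etree.ElementTree as ET

# Assumption: files use the ALTO v4 namespace, as in the example above.
ALTO = "{http://www.loc.gov/standards/alto/ns-v4#}"


def check_baselines(xml_data):
    """Return a list of (line_id, problem) pairs for suspicious baselines.

    Hypothetical helper; the checks are heuristics, not kraken's rules.
    """
    root = ET.fromstring(xml_data)
    problems = []
    for page in root.iter(ALTO + "Page"):
        width, height = float(page.get("WIDTH")), float(page.get("HEIGHT"))
        for line in page.iter(ALTO + "TextLine"):
            line_id = line.get("ID")
            # BASELINE is a flat list of x y values; tolerate comma separators.
            raw = (line.get("BASELINE") or "").replace(",", " ").split()
            try:
                coords = [float(v) for v in raw]
            except ValueError:
                problems.append((line_id, "non-numeric coordinate"))
                continue
            if len(coords) < 4 or len(coords) % 2:
                problems.append((line_id, "fewer than two points or odd value count"))
                continue
            points = list(zip(coords[0::2], coords[1::2]))
            if any(not (0 <= x <= width and 0 <= y <= height) for x, y in points):
                problems.append((line_id, "point outside page"))
            if any(p == q for p, q in zip(points, points[1:])):
                problems.append((line_id, "repeated consecutive point"))
    return problems
```

An empty result does not guarantee that kraken accepts the file; it only rules out the specific mistakes listed above.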

➡ Now it looks to train without forgetting everything after one epoch. I'll wait for 50+ epochs to see if it can overfit just to confirm if it's learning. If so I will scale the annotation to have around 200 images for fine-tuning.

📢 Will tell you soon !

agombert · 2025-03-04T19:26:59Z

With 30 examples the training works. ✅

❎ But I then tried with 100 annotated images and used the overlay to make sure all the data was well structured (a really painful process, to be honest, as I had to fix a lot of small things to avoid `Polygonizer failed on line 6: No intersection with boundaries...`). The overlay looked okay (cf. two examples below). But when I ran the training again, after 1 epoch no segmentation was produced on the training set 😕.

Also, it looks like it reloads each image at every epoch:

[03/04/25 19:21:26] DEBUG    Attempting to load /home/ubuntu/trocr_handwritten/20250304_line_detection/alto_xml/../images/FRANOM22_COLH78_0458_0034_6.jpg    segmentation.py:163

Would it be possible to load each image only once to speed up training?

Best,

Arnault

agombert mentioned this issue Feb 11, 2025

Line Segmentation Pipeline with Kraken handwrittenOCR/trocr_handwritten#20
