Segmentation Fine-Tuning leads to 0 line detection #683

Open
agombert opened this issue Feb 4, 2025 · 7 comments

@agombert

agombert commented Feb 4, 2025

Hey,

Thanks for the great work. I have some questions about fine-tuning. I think the problem may come from the format of my input data. I've been following this link to get well-formed XML for my JPG images, but after fine-tuning (even after 1 epoch) I don't get any lines 👀.

Here is an example of one of my XML files:

<?xml version="1.0" ?>
<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/standards/alto/ns-v4#" xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-3.xsd">
    <Description>
        <MeasurementUnit>pixel</MeasurementUnit>
        <sourceImageInformation>
            <fileName>/home/ubuntu/data/20250204_line_detection/images/FRANOM22_COLH78_0261_0232_4.jpg</fileName>
        </sourceImageInformation>
    </Description>
    <Tags>
        <OtherTag DESCRIPTION="line type" ID="LINE_TYPE_1" TYPE="type" LABEL="default"/>
        <OtherTag DESCRIPTION="region type" ID="REGION_TYPE_1" TYPE="region" LABEL="text"/>
    </Tags>
    <Layout>
        <Page WIDTH="491" HEIGHT="722" PHYSICAL_IMG_NR="0" ID="page_0">
            <PrintSpace HPOS="0" VPOS="0" WIDTH="491" HEIGHT="722">
                <TextBlock ID="42c3ee03-8810-4b48-9eb6-ddcb9f23321d" HPOS="0" VPOS="0" WIDTH="491" HEIGHT="722" TAGREFS="REGION_TYPE_1">
                    <TextLine ID="780aac20-46ee-4072-a714-04c14590713b" HPOS="99" VPOS="48" WIDTH="358" HEIGHT="3" BASELINE="99 48 205 50 342 51 457 50" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="f9e0050e-4014-437b-bdb6-fc85e4059699" HPOS="107" VPOS="99" WIDTH="327" HEIGHT="1" BASELINE="107 100 194 99 277 100 434 100" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="9edd2cd0-e6a5-4579-8734-afafc03819e0" HPOS="101" VPOS="135" WIDTH="196" HEIGHT="8" BASELINE="101 143 167 141 297 135" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="5f86381b-c704-4cbd-a1b7-b6295f0ed4a3" HPOS="106" VPOS="177" WIDTH="361" HEIGHT="8" BASELINE="106 185 248 177 467 181" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="b4366d1e-151d-4e8c-9960-3bb7437a9c29" HPOS="102" VPOS="226" WIDTH="176" HEIGHT="1" BASELINE="102 226 167 227 278 226" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="166863f8-0ebe-488f-97ba-3e62a88aeb79" HPOS="100" VPOS="267" WIDTH="317" HEIGHT="3" BASELINE="100 270 224 268 417 267" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="dab2d94e-3ffd-42c2-bf69-7e3d05292b8c" HPOS="100" VPOS="311" WIDTH="254" HEIGHT="1" BASELINE="100 312 192 311 354 312" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="35552902-4556-490c-bcae-3fe2826655c6" HPOS="101" VPOS="352" WIDTH="172" HEIGHT="2" BASELINE="101 352 183 354 273 352" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="f1fbad86-de43-411b-8233-99030c1235ee" HPOS="104" VPOS="395" WIDTH="268" HEIGHT="4" BASELINE="104 397 191 399 372 395" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="f76cd89c-df93-446a-baa8-489ca7a4991e" HPOS="106" VPOS="438" WIDTH="205" HEIGHT="3" BASELINE="106 441 188 438 311 441" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="9da1cfdf-e9b1-4322-9171-22843062be75" HPOS="108" VPOS="477" WIDTH="170" HEIGHT="7" BASELINE="108 481 166 477 278 484" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="ebea3fb5-1b3f-4ce4-9509-33504388e3a8" HPOS="102" VPOS="526" WIDTH="189" HEIGHT="5" BASELINE="102 531 181 528 291 526" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="d45f7502-66d5-4ed2-9a55-6f8a6db7a7dd" HPOS="89" VPOS="572" WIDTH="395" HEIGHT="8" BASELINE="89 580 246 572 417 575 484 575" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="6d5e459f-6b18-493b-bcd8-8f2082620791" HPOS="83" VPOS="619" WIDTH="403" HEIGHT="7" BASELINE="83 626 217 625 315 625 486 619" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                </TextBlock>
            </PrintSpace>
        </Page>
    </Layout>
</alto>

And here are the BASELINE points drawn on the image:
Image

Then I'm using:

ketos -vvv segtrain -i /home/ubuntu/models/blla.mlmodel -f xml /home/ubuntu/data/20250204_line_detection/alto_xml/*.xml -cl -o /home/ubuntu/models/ft_kraken -d cuda:0

Training appears to run, but mean_iu stays around 0.25 and even decreases.

[02/04/25 15:54:37] INFO validation run: accuracy 0.9899430871009827 mean_acc 0.9899430871009827 mean_iu 0.2532690465450287 freq_iu 0.96146160364151
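
A very high accuracy next to a low mean_iu is what the usual semantic-segmentation metrics give when almost every pixel is background and the model predicts only background. The sketch below uses the textbook definitions of pixel accuracy, mean IU and frequency-weighted IU computed from a confusion matrix. Whether kraken computes them in exactly this way is an assumption, and the four classes and pixel counts are made up for illustration, not taken from this run.

```python
def segmentation_metrics(confusion):
    """Compute pixel accuracy, mean IU and frequency-weighted IU.

    confusion[i][j] is the number of pixels whose true class is i and whose
    predicted class is j (textbook definition, assumed rather than kraken's).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    ious, weights = [], []
    for i in range(n):
        true_i = sum(confusion[i])                        # pixels that are class i
        pred_i = sum(confusion[j][i] for j in range(n))   # pixels predicted as class i
        union = true_i + pred_i - confusion[i][i]
        ious.append(confusion[i][i] / union if union else 0.0)
        weights.append(true_i / total)
    return {
        "accuracy": correct / total,
        "mean_iu": sum(ious) / n,
        "freq_iu": sum(w * iou for w, iou in zip(weights, ious)),
    }


# Hypothetical example: class 0 is background (99% of the pixels) and the
# model predicts background everywhere, so no line or region pixel is found.
all_background = [
    [9900, 0, 0, 0],
    [50, 0, 0, 0],
    [30, 0, 0, 0],
    [20, 0, 0, 0],
]
metrics = segmentation_metrics(all_background)
print(metrics)
```

In this made-up case the accuracy is 0.99 while mean_iu is 0.99 / 4 = 0.2475, because three of the four classes have an IU of zero. Numbers of that shape would be consistent with a model that outputs no lines at all, although the log line alone does not prove that this is what happens here.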

After a few epochs, when I run inference, I still don't get any lines...

Also, I'm only using 30 images to test the training before annotating more and scaling up the process. Do you have any idea why this is not working?

@agombert
Author

👋 Hey @mittagessen, any idea 😃?

@mittagessen
Owner

mittagessen commented Feb 10, 2025 via email

@agombert
Author

Hey @mittagessen, any availability to look into the problem?

@mittagessen
Owner

mittagessen commented Feb 25, 2025 via email

@agombert
Author

agombert commented Mar 4, 2025

Hey @mittagessen, I ran contrib/segmentation_overlay.py on the data and the result looks normal:

Image
Image

I see the lines as in the previous image and, since I don't want to detect any regions, the single region covers the whole image.

Also, when launching the training I get:

WARNING  Setting baseline location to centerline from unset model.    train.py:1032
INFO     Training line types:                                         train.py:1038
INFO       default  2  258                                            train.py:1040
INFO     Training region types:                                       train.py:1041
INFO       text  3  30                                                train.py:1043
DEBUG    Constructing Adam optimizer (lr: 0.0002, momentum: 0.9)

This seems to show a single type each for lines and regions.

Any idea why there is a problem?

@agombert
Author

agombert commented Mar 4, 2025

Okay, I've spotted 🔎 an "invalid geometry" in 2 XML files once I ran the overlay properly on all the data.
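
Problems of this kind can be screened for before training. The sketch below checks each `TextLine` in an ALTO v4 document for a few obvious baseline defects: an odd number of coordinates, fewer than two points, or points outside the page. It uses only the Python standard library; the helper names `parse_baseline` and `check_alto` are hypothetical, and the checks are a simple approximation, not the geometry validation that kraken or the overlay script actually performs.

```python
import xml.etree.ElementTree as ET

# Namespace used by the ALTO v4 files shown in this issue.
ALTO_NS = {"a": "http://www.loc.gov/standards/alto/ns-v4#"}


def parse_baseline(value):
    """Turn 'x1 y1 x2 y2 ...' into [(x1, y1), (x2, y2), ...]."""
    nums = [int(float(v)) for v in value.split()]
    if len(nums) % 2:
        raise ValueError("odd number of coordinates")
    return list(zip(nums[0::2], nums[1::2]))


def check_alto(xml_text):
    """Return a list of human-readable problems found in one ALTO document."""
    root = ET.fromstring(xml_text)
    page = root.find(".//a:Page", ALTO_NS)
    if page is None:
        return ["no Page element"]
    width = int(float(page.get("WIDTH")))
    height = int(float(page.get("HEIGHT")))
    problems = []
    for line in root.iterfind(".//a:TextLine", ALTO_NS):
        line_id = line.get("ID")
        try:
            points = parse_baseline(line.get("BASELINE", ""))
        except ValueError as exc:
            problems.append(f"{line_id}: {exc}")
            continue
        if len(points) < 2:
            problems.append(f"{line_id}: baseline has fewer than 2 points")
        if any(not (0 <= x < width and 0 <= y < height) for x, y in points):
            problems.append(f"{line_id}: baseline point outside the page")
    return problems


# Tiny hand-made document: one valid line and one that leaves the page.
sample = (
    '<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#"><Layout>'
    '<Page WIDTH="491" HEIGHT="722"><PrintSpace><TextBlock>'
    '<TextLine ID="good" BASELINE="99 48 205 50 342 51 457 50"/>'
    '<TextLine ID="bad" BASELINE="99 48 600 50"/>'
    "</TextBlock></PrintSpace></Page></Layout></alto>"
)
for problem in check_alto(sample):
    print(problem)
```

To screen a whole dataset, the same function can be called on the text of every file in the `alto_xml` directory, and any file that returns a non-empty list can then be inspected with the overlay script.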

➡ Training now seems to run without forgetting everything after one epoch. I'll let it run for 50+ epochs to see whether it can overfit, just to confirm that it's learning. If so, I will scale up the annotation to around 200 images for fine-tuning.

📢 I will let you know soon!

@agombert
Author

agombert commented Mar 4, 2025

With 30 examples the training works. ✅

❎ But I then tried with 100 annotated images and ran the overlay to make sure all the data was well structured (a really painful process, to be honest, as I had to fix a lot of small things to avoid errors such as "Polygonizer failed on line 6: No intersection with boundaries..."). The overlay looked fine (see the two examples below). But when I ran the training again, after 1 epoch no segmentation is produced on the training set 😕.

Image
Image

Also, it looks like the images are loaded again at every epoch:

[03/04/25 19:21:26] DEBUG    Attempting to load /home/ubuntu/trocr_handwritten/20250304_line_detection/alto_xml/../images/FRANOM22_COLH78_0458_0034_6.jpg    segmentation.py:163

Would it be possible to load them only once to speed up the training?

Best,

Arnault
