mechanism to temporarily prevent text retrieval #396

hi-ko · 2022-04-04T08:51:00Z

The content tracker still needs to run synchronously. We need a mechanism to temporarily prevent text retrieval in order to avoid scalability issues and timeouts (caused by async transactions), especially for transformations we already know to be long-running, such as OCR.

In the old, synchronous transformer framework it was possible to fake such a feature by setting cm:isContentIndexed=false, which prevented the node from being picked up from the repository before it had been transformed, and by removing that aspect later, once the text transformation was available.

#395 / SEARCH-2974 breaks this old "feature". So either we get a new feature to postpone text retrieval, or the isContentIndexed mechanism works again as expected, e.g.:

if a new node gets the cm:isContentIndexed=false property added by a behavior, this must not result in an empty index doc
when the aspect is removed or isContentIndexed is set to true later, the text should be indexed
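
To make the two expectations above concrete, here is a minimal, self-contained sketch of the requested behaviour. It is a toy model, not Alfresco or Search Services code: `Node`, `IndexDoc`, `ContentTracker` and `extract_text` are hypothetical names made up for illustration, and "must not result in an empty index doc" is read here as "the text stays pending instead of being indexed as empty", which is only one possible interpretation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a repository node."""
    node_id: str
    content: str
    # Mirrors cm:isContentIndexed; None means the property is not set.
    is_content_indexed: Optional[bool] = None


@dataclass
class IndexDoc:
    """Hypothetical index document for one node."""
    node_id: str
    # None means "text not retrieved yet", which differs from empty text "".
    text: Optional[str] = None


def extract_text(node: Node) -> str:
    """Placeholder for a long-running text transformation such as OCR."""
    return node.content


class ContentTracker:
    """Toy tracker that postpones text retrieval while the flag is False."""

    def __init__(self) -> None:
        self.index: Dict[str, IndexDoc] = {}
        self.text_requests = 0  # counts calls to the (expensive) transformation

    def track(self, node: Node) -> IndexDoc:
        doc = self.index.setdefault(node.node_id, IndexDoc(node.node_id))
        if node.is_content_indexed is False:
            # Postponed: do not request the text and do not overwrite
            # the document with empty content.
            return doc
        # Flag is True or unset: retrieve and index the text.
        self.text_requests += 1
        doc.text = extract_text(node)
        return doc


tracker = ContentTracker()
node = Node("n1", "scanned invoice", is_content_indexed=False)

pending = tracker.track(node)        # text retrieval is skipped
node.is_content_indexed = True       # e.g. set once the OCR result is available
indexed = tracker.track(node)        # text is retrieved and indexed now
```

The point of the sketch is the distinction between "text pending" (`None`) and "text empty": while the flag is false the tracker never calls the transformation, and flipping the flag later is enough to trigger indexing on the next tracking run.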


hi-ko mentioned this issue Apr 4, 2022

soft timeout for long running text extractions #397

Open
