GuroGuru changed the title from "[BUG] - {Stream or Lattice} {Description}" to "[BUG] - {Lattice} SpreadsheetExtractionAlgorithm failing in capturing rows and cells" on Dec 22, 2021
Describe the bug
While extracting a PDF I realized some tables were getting split because a row was not captured, as if it was considered a blank line. On other tables, the last cell in the row was skipped.
Screenshots
The screenshots are not from the original PDF, but they will hopefully illustrate the problem.
Given a table with a schema similar to the image below:
![1-schema](https://user-images.githubusercontent.com/19675437/147165393-70fe0a86-77fa-4044-82ac-6159a15e7c45.png)
I expected to capture the entire table at once:
![2-expected](https://user-images.githubusercontent.com/19675437/147165569-1bdea1b9-3180-4232-8350-cb75fb772bf8.png)
However, once the 5th row was skipped, I ended up with two distinct capture groups.
![3-extracted](https://user-images.githubusercontent.com/19675437/147165572-6fe9ee86-50b0-4518-9c1e-2d5254017920.png)
The other problem is that sometimes some cells are skipped. One example is the 1st capture, which is missing data in its last 3 rows:
![4-skipped-data](https://user-images.githubusercontent.com/19675437/147165881-f6111ef7-107d-445e-a642-f4bd00801e5c.png)
Sometimes data is also skipped when the table is not split.
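To make the skipped cells easier to spot across many extracted tables, a small check like the one below might help. It is only a sketch and not part of the library: it assumes each extracted table has already been loaded as a list of rows of strings (for example from CSV output), and `find_suspect_rows` is a hypothetical helper name. A row is flagged when it is shorter than the widest row or when its last cell is blank, so legitimately empty trailing cells would also be reported.

```python
from typing import List

Table = List[List[str]]


def find_suspect_rows(table: Table) -> List[int]:
    """Return indices of rows whose last cell is missing or blank.

    Hypothetical helper: the table width is taken to be the length of
    the widest row, which is an assumption about the expected schema.
    """
    width = max((len(row) for row in table), default=0)
    if width == 0:
        return []

    suspect = []
    for index, row in enumerate(table):
        # Pad short rows so every row can be compared at the same width.
        padded = list(row) + [""] * (width - len(row))
        if not padded[-1].strip():
            suspect.append(index)
    return suspect


# Made-up data: the last two rows lose their final cell.
example = [
    ["a", "b", "c"],
    ["d", "e", "f"],
    ["g", "h", ""],
    ["i", "j"],
]
print(find_suspect_rows(example))  # [2, 3]
```

Running this over each capture group would list the row indices worth comparing against the original PDF.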