calling pdf.Page.GetTextByRow and got result with disordered text with some pdf file #16

cfh0081 · 2021-06-17T07:36:31Z

I found that calling pdf.Page.GetTextByRow returns disordered text for some PDF files. For example, I got "761" where the text should be "176".
The cause is that page.go sorts with sort.Sort, which is not stable. Replacing that call with sort.Stable solves the problem.
pdf.Page.GetTextByColumn needs the same change.
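
To illustrate the point about stability, here is a minimal, self-contained sketch. It does not use the library's real types: `glyph`, `byRow`, and `joinStable` are hypothetical names made up for this example, and the row coordinate is assumed to be the only sort key. Because all fragments in a row compare equal, `sort.Sort` is free to reorder them, while `sort.Stable` is documented to keep equal elements in their original order.

```go
package main

import (
	"sort"
	"strings"
)

// glyph is a hypothetical stand-in for one extracted text fragment.
type glyph struct {
	Y int    // row coordinate, the only sort key in this sketch
	S string // text of the fragment
}

// byRow orders glyphs by row only, so fragments on the same row compare equal.
type byRow []glyph

func (g byRow) Len() int           { return len(g) }
func (g byRow) Swap(i, j int)      { g[i], g[j] = g[j], g[i] }
func (g byRow) Less(i, j int) bool { return g[i].Y < g[j].Y }

// joinStable sorts a copy of the input by row with sort.Stable and
// concatenates the fragment text. Fragments that share a row keep the
// order in which they were passed in.
func joinStable(in []glyph) string {
	out := append([]glyph(nil), in...) // copy so the caller's slice is untouched
	sort.Stable(byRow(out))
	var b strings.Builder
	for _, g := range out {
		b.WriteString(g.S)
	}
	return b.String()
}
```

With the fragments "1", "7", "6" all on the same row, `joinStable` returns "176". Swapping in `sort.Sort` gives no such guarantee for equal keys, which matches the "761" symptom described above.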

stuta · 2022-05-13T14:21:22Z

I tried replacing sort.Sort with sort.Stable, but it did not fix the problem for me. The text is still not in the same order as with r.GetPlainText(). GetPlainText seems to produce text in the correct order, but without linefeeds, which makes the text hard to read.
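
A stable sort only preserves whatever order the fragments arrived in, so it cannot help if that input order is not reading order. One possible direction, not confirmed anywhere in this thread, is to sort on an explicit tie-breaker and insert the linefeeds yourself. The sketch below assumes each fragment carries X and Y coordinates. `fragment` and `linesByPosition` are hypothetical names, not part of the library, and rows are matched by exact Y equality, which real PDFs may need to relax to a tolerance.

```go
package main

import (
	"sort"
	"strings"
)

// fragment is a hypothetical stand-in for one piece of extracted text.
type fragment struct {
	X, Y float64 // position on the page (assumed available)
	S    string  // text of the fragment
}

// linesByPosition orders fragments top to bottom, then left to right,
// and joins them with a newline between rows.
func linesByPosition(in []fragment) string {
	out := append([]fragment(nil), in...) // copy so the caller's slice is untouched
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].Y != out[j].Y {
			// PDF Y coordinates grow upward, so larger Y comes first.
			return out[i].Y > out[j].Y
		}
		return out[i].X < out[j].X // explicit tie-breaker within a row
	})
	var b strings.Builder
	for i, f := range out {
		if i > 0 && f.Y != out[i-1].Y {
			b.WriteByte('\n') // a new row starts whenever Y changes
		}
		b.WriteString(f.S)
	}
	return b.String()
}
```

Because the order within a row is decided by X and not by arrival order, the result no longer depends on whether the sort is stable.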
