Add new example for custom granular output [v5] #896

Merged (3 commits) on Mar 19, 2024

Conversation

Kishlay-notabot
Contributor

Add a new example for custom granular output. Users can generate JSON files containing bounding-box (bbox) data for the words/symbols detected in the input files. These examples let users generate bulk datasets of handwritten characters/words.
See #877
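
For readers unfamiliar with the idea, the following is a minimal sketch of what turning recognition output into bbox records could look like. It is not the code from this PR. It assumes a `data` object shaped like the result of `worker.recognize(image)` in Tesseract.js v5, where each entry in `data.words` (or `data.symbols`) has `text`, `confidence`, and a `bbox` with `x0`, `y0`, `x1`, `y1`. The helper name `toBboxRecords`, the record layout, and the sample values are all hypothetical.

```javascript
// Sketch only: `data` is assumed to look like a Tesseract.js v5 recognition
// result, where each word/symbol has `text`, `confidence`, and
// `bbox: { x0, y0, x1, y1 }`. `toBboxRecords` is a hypothetical helper and
// is not part of Tesseract.js or of the example in this PR.
function toBboxRecords(data, level = 'words') {
  const items = Array.isArray(data[level]) ? data[level] : [];
  return items.map((item) => ({
    text: item.text,
    confidence: item.confidence,
    // Store position and size so a later cropping step can use them directly.
    x: item.bbox.x0,
    y: item.bbox.y0,
    width: item.bbox.x1 - item.bbox.x0,
    height: item.bbox.y1 - item.bbox.y0,
  }));
}

// Hand-written stand-in for a recognition result (not real OCR output).
const sampleData = {
  words: [
    { text: 'Hello', confidence: 91, bbox: { x0: 10, y0: 20, x1: 110, y1: 60 } },
    { text: 'world', confidence: 88, bbox: { x0: 120, y0: 22, x1: 215, y1: 61 } },
  ],
};

const records = toBboxRecords(sampleData, 'words');
console.log(JSON.stringify(records, null, 2));
```

Keeping the conversion in a pure function like this means it can be checked without running OCR; the JSON string could then be written to disk with `fs.writeFileSync`.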

@Balearica
Member

Thanks for putting this together. I reviewed it and have a couple of notes aimed at making sure other users can understand and run this example.

  1. As Tesseract.js supports both browser and Node.js, the README should specify in the first few sentences that this is an example specifically for Node.js.
    1. This may seem obvious, however users trying to run Node.js code in the browser, or browser code in Node.js, is actually one of the most common sources of confusion.
  2. The example repo, like open source Node.js projects in general, should include a package.json file that declares the correct versions of the relevant dependencies.
    1. Running npm i should install all the correct dependencies.
    2. In addition to saving other users time, specifying the expected versions of packages ensures that the example still runs even when dependencies introduce breaking changes in later versions.
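
To illustrate the point about versions, here is a rough sketch of a check that flags dependencies whose version spec places no bound on what gets installed. The helper `findUnboundedDependencies`, the package name `bbox-export-example`, and the version strings are made up for illustration and are not taken from the example repo. The check only recognizes the wildcard forms `*`, `x`, `latest`, and an empty string; it does not parse semver ranges, so something like `>=1.0.0` would slip through.

```javascript
// Hypothetical helper: returns the names of dependencies whose version spec
// allows any version at all ("*", "x", "latest", or an empty string).
// It does not parse semver ranges; it only catches the obvious wildcards.
function findUnboundedDependencies(pkg) {
  const deps = (pkg && pkg.dependencies) || {};
  const unbounded = new Set(['', '*', 'x', 'latest']);
  return Object.keys(deps).filter((name) =>
    unbounded.has(String(deps[name]).trim().toLowerCase())
  );
}

// Illustrative package.json contents; the name and versions are assumptions.
const samplePackage = {
  name: 'bbox-export-example',
  dependencies: { 'tesseract.js': '^5.0.0', canvas: '*' },
};

// Logs the dependencies that would benefit from an explicit version.
console.log(findUnboundedDependencies(samplePackage));
```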

@Kishlay-notabot
Contributor Author

@Balearica my apologies for not including the package.json file. I accidentally deleted it in the latest commit to the repo; I'll re-add it once I get back to my laptop.

@Kishlay-notabot
Contributor Author

@Balearica updated with all the necessary changes: clarified the usage instructions and added package.json files.

@Kishlay-notabot
Contributor Author

The contributing link was broken; I fixed it in the latest commit.

@Kishlay-notabot
Contributor Author

@Balearica please review this

@Balearica
Member

I reviewed this today, and have a few more comments. As before, the overall goal is making sure new users can easily run this code.

  1. Running npm i currently does not install all the required dependencies, because package.json does not include the canvas dependency.
    1. It's possible that this went unnoticed because you have canvas installed globally on your system.
  2. Please add the exact commands users need to run as code blocks in the README
    1. This appears to be:
      1. node OCR-and-bbox-export.js
      2. node Crop-from-exported-json.js
  3. Please add a sample directory/image so that the commands run without the user needing to create a new folder and copy in data.
    1. At present, downloading the example and running it immediately results in an error, as the script appears to look for images in input_images even though this path does not exist in the repo.

If you make these adjustments such that new users can simply (1) clone, (2) run npm i, and (3) run node OCR-and-bbox-export.js/node Crop-from-exported-json.js, and everything runs, I will merge in.
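
As a rough illustration of the first and third points, a small preflight check along the following lines could report both problems before the scripts run. This is only a sketch: the function `preflight` is hypothetical and not part of the example, and the package names and the `input_images` directory are simply the ones mentioned in this review.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical preflight check (not part of the example repo). It returns a
// list of human-readable problems: packages that package.json does not
// declare, and a missing input directory.
function preflight(projectDir, requiredPackages, inputDirName) {
  const problems = [];

  const pkgPath = path.join(projectDir, 'package.json');
  if (!fs.existsSync(pkgPath)) {
    problems.push('package.json is missing');
  } else {
    const pkg = JSON.parse(fs.readFileSync(pkgPath, 'utf8'));
    const declared = pkg.dependencies || {};
    for (const name of requiredPackages) {
      if (!(name in declared)) {
        problems.push(`dependency not declared in package.json: ${name}`);
      }
    }
  }

  const inputDir = path.join(projectDir, inputDirName);
  if (!fs.existsSync(inputDir) || !fs.statSync(inputDir).isDirectory()) {
    problems.push(`input directory not found: ${inputDirName}`);
  }

  return problems;
}

// Example call against the current working directory.
console.log(preflight(process.cwd(), ['tesseract.js', 'canvas'], 'input_images'));
```

An empty list means a fresh clone followed by npm i should have what the two scripts need, at least as far as these two checks go.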

@Kishlay-notabot
Contributor Author

Kishlay-notabot commented Mar 19, 2024

@Balearica Added all the specific details. Please check it out!

@Balearica
Member

Great, I will merge. Thank you for this contribution to the community.

2 participants