shinonome-bunko / 東雲文庫

Eventually, the night gives way to dawn.

In my mind...🤔

graph TD
    A["Images of Books (e.g. 国立国会図書館デジタルコレクション)"] -->|OCR| B
    B["Text File (MD? XML?; UTF-8? Shift-JIS?)"] -->|"Parser? (Should I do this?)"| C
    B --> |"Iterate + Modify by Human (Editor in Browser or Git; cf. Wiki, Qiita, Zenn)"| B
    C["Aozora Bunko File Format?"] -->| | D
    D["Publish to Aozora Bunko?"]
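
The last two steps of the graph (text file to Aozora Bunko format) could look roughly like the sketch below. It assumes a hypothetical intermediate ruby syntax `{base|reading}` that this project has not defined, and it assumes the output should be Shift_JIS with CRLF line endings, which is how Aozora Bunko text files are commonly distributed. The function names are illustrative only.

```python
import re

# Hypothetical intermediate syntax for ruby: {base|reading}.
# This is an assumption for illustration, not a format the project has chosen.
RUBY_PATTERN = re.compile(r"\{([^{}|]+)\|([^{}|]+)\}")


def to_aozora_ruby(text: str) -> str:
    """Rewrite {base|reading} as Aozora Bunko ruby notation: ｜base《reading》."""
    return RUBY_PATTERN.sub(lambda m: f"｜{m.group(1)}《{m.group(2)}》", text)


def encode_for_aozora(text: str) -> bytes:
    """Encode as Shift_JIS with CRLF line endings.

    errors="strict" makes characters outside Shift_JIS raise UnicodeEncodeError,
    so they can be handled by a human instead of being silently replaced.
    """
    normalized = text.replace("\r\n", "\n").replace("\n", "\r\n")
    return normalized.encode("shift_jis", errors="strict")


sample = "{東雲|しののめ}の空\nやがて夜は明ける。"
converted = to_aozora_ruby(sample)
data = encode_for_aozora(converted)
```

Keeping the working copy in UTF-8 and converting only at publish time would leave the UTF-8 versus Shift-JIS question open until the end of the pipeline.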

Use Existing Aozora Bunko Files as Training Data
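- Each Aozora Bunko text names the source edition it was transcribed from ("底本"), so we can look up the original printed book.
- Page images of the source edition paired with the Aozora Bunko text give labeled data for supervised learning.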
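
Before any training, the same image and text pairs could also be used to score an OCR engine. The sketch below computes a character error rate (edit distance divided by reference length) between OCR output and the Aozora Bunko text. The function names are illustrative, and real texts would first need ruby and annotation markup stripped, which is not shown here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(ocr_text, reference) / len(reference)


# One wrong character out of eight.
cer = character_error_rate("やがて夜は開ける", "やがて夜は明ける")
```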

This Project consists of...

Text Recognition
- OCR with Python
- Aim to recognize text accurately and quickly, including vertically written Japanese
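
One part of handling vertical text is restoring reading order after recognition: columns run right to left, and characters run top to bottom within a column. The sketch below assumes the OCR step yields one bounding box per character. The `CharBox` type and the tolerance value are hypothetical and not tied to any particular OCR library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharBox:
    """Hypothetical OCR result: one character and its bounding box in pixels."""
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    @property
    def center_x(self) -> float:
        return self.x + self.width / 2


def vertical_reading_order(boxes, column_tolerance=0.5):
    """Sort boxes for vertical text: columns right to left, top to bottom inside each.

    Two boxes share a column when their horizontal centers differ by at most
    column_tolerance times the width of the column's first box.
    """
    columns = []
    for box in sorted(boxes, key=lambda b: -b.center_x):
        if columns:
            first = columns[-1][0]
            if abs(first.center_x - box.center_x) <= column_tolerance * first.width:
                columns[-1].append(box)
                continue
        columns.append([box])
    return [b for column in columns for b in sorted(column, key=lambda b: b.y)]


boxes = [
    CharBox("夜", 60, 20, 20, 20),
    CharBox("や", 100, 0, 20, 20),
    CharBox("て", 60, 0, 20, 20),
    CharBox("が", 100, 20, 20, 20),
]
ordered_text = "".join(b.text for b in vertical_reading_order(boxes))
```

Skewed scans or pages with ruby beside the main column would need a more careful grouping step than this fixed tolerance.
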
Viewer/Editor
- A simple, fast viewer and editor that runs in the browser
- Anyone can correct the generated text, either in the built-in editor or on GitHub (Can we show the original page image side by side with the generated text?)
- Can this editor be built with Python as well?
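
Whichever editor is used, reviewers will probably want to see exactly what a correction changed. A minimal sketch with the standard library's `difflib`, listing character-level differences between the OCR output and the edited text (the function name is illustrative):

```python
from difflib import SequenceMatcher


def list_changes(ocr_text: str, edited_text: str):
    """Return (operation, old, new) tuples for every non-matching span."""
    # autojunk=False disables the heuristic that ignores frequent characters.
    matcher = SequenceMatcher(None, ocr_text, edited_text, autojunk=False)
    return [
        (tag, ocr_text[i1:i2], edited_text[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


changes = list_changes("やがて夜は開ける", "やがて夜は明ける")
```
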
Text Matching Game
- Matching Game for Japanese Texts
- Aim to improve the accuracy of OCR (also for fun, of course!)
- The game could double as study material for learners of Japanese (like the original concept of Duolingo)
- cf. Google reCAPTCHA
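
For the game to improve OCR output, answers from several players have to be merged into one transcription. A minimal sketch using a majority vote follows. The thresholds (`min_votes`, `min_agreement`) are arbitrary assumptions and would need tuning against real play data.

```python
from collections import Counter


def consensus(answers, min_votes=3, min_agreement=0.6):
    """Return the transcription most players agree on, or None if agreement is weak."""
    cleaned = [a.strip() for a in answers if a.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough answers yet
    text, count = Counter(cleaned).most_common(1)[0]
    if count / len(cleaned) >= min_agreement:
        return text
    return None  # players disagree; keep the snippet in rotation


accepted = consensus(["東雲", "東雲", "東雪"])
```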

Related Projects

aozorahack
- Web Page
- ideathon: There are many ideas similar to this project!
- kosakuin: Aozora Bunko Editor (MIT License)
- aozora-cli: Aozora Bunko CLI (MIT License)
- aozora-parser.js
- aozoraflow
kyukyunyorituryo/AozoraEditor: Aozora Bunko editor
kyukyunyorituryo/html2aozora
gearsns/AozoraJavaScriptParser

ItsukiKigoshi/shinonome-bunko
