OSRAChem is a desktop application that provides a semi-automated workflow for extracting chemical structures from images in full-text scientific articles. It relies on the open-source OSRA utility (many thanks to Igor Filippov). The extracted structures are displayed as 2D depictions. The application also accepts image and text inputs. All image file formats supported by GraphicsMagick are valid.
This work was part of an internship project under the supervision of Dr. Christoph Steinbeck at the European Bioinformatics Institute.
- OSRA: Optical Structure Recognition Application
- CDK: Chemistry Development Kit
- OPSIN: Open Parser for Systematic IUPAC Nomenclature
- Apache PDFBox: a Java PDF library
- JPedal: an open-source library with a fully featured PDF viewer
- Operating System: Mac OS X or Linux (Ubuntu 12.04 or later)
- Java 6 or later
- Install Homebrew (if not previously installed): see instructions
- Tap the cheminformatics repository (thanks to Matt Swain):
brew tap mcs07/cheminformatics
- Install OSRA:
brew install osra
- Verify the installation by typing
osra
If OSRA responds with its own output rather than a "command not found" error, installation is complete.
On Linux, OSRA must be compiled from source. Find detailed instructions here.
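On either platform, a quick way to confirm that the prerequisites are reachable from the shell might look like the sketch below. It only checks that the `java` and `osra` commands are on the `PATH`; it does not verify the Java version or that OSRA itself works. The helper names `have_cmd` and `check_prereqs` are hypothetical and are not part of OSRAChem or OSRA.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with OSRAChem): a PATH check only.

# Return 0 if the named command can be found on PATH, non-zero otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print one status line per command and return 1 if any is missing.
check_prereqs() {
    missing=0
    for cmd in "$@"; do
        if have_cmd "$cmd"; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

After sourcing these functions, running `check_prereqs java osra` prints a line for each command. A "missing" line for `osra` suggests repeating the OSRA installation steps for your platform.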