Tesseract boxfile web editor
The editor is used to create ground-truth files from scanned images for training AI-based OCR models with Tesseract or Transkribus. It can also be used for the other tasks described below.
To run the online editor, you must upload a scanned image and, optionally, a box (text) file. You can use this image and the accompanying box file as an example. The example is taken from Nicolae Țincu, Cele şapte virtuţi sau Faptele bune de căpetenie, Tipografia lui Ioan Gött, Brașov, 1847.
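For reference, a Tesseract box file is plain text with one box per line, in the form `<symbol> <left> <bottom> <right> <top> <page>`, where pixel coordinates are measured from the bottom-left corner of the image. The sketch below is a minimal parser under that assumption; it is not the editor's own code. It handles only the one-symbol-per-line format (not variants such as `WordStr` lines). The helper `to_canvas_rect` is hypothetical: it shows the y-axis flip a web page needs, since browser canvases measure from the top-left corner.

```python
from dataclasses import dataclass


@dataclass
class Box:
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_file(text):
    """Parse Tesseract box-file text (one symbol per line) into Box records."""
    boxes = []
    for lineno, raw in enumerate(text.split("\n"), start=1):
        line = raw.rstrip("\r")
        if not line.strip():
            continue  # skip blank lines, e.g. after a trailing newline
        # Split from the right so the symbol itself may be a tab or contain spaces.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            raise ValueError(f"line {lineno}: expected 6 fields, got {len(parts)}")
        symbol, *numbers = parts
        left, bottom, right, top, page = (int(n) for n in numbers)
        boxes.append(Box(symbol, left, bottom, right, top, page))
    return boxes


def to_canvas_rect(box, image_height):
    """Hypothetical helper: convert bottom-left-origin coordinates to a
    top-left-origin (x, y, width, height) rectangle for drawing in a browser."""
    return (box.left, image_height - box.top, box.right - box.left, box.top - box.bottom)
```

For example, `parse_box_file("a 35 20 50 45 0\n")` yields one `Box` for the letter `a`. Splitting from the right keeps lines whose symbol is whitespace intact, such as the tab that LSTM-style box files use to mark the end of a text line.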
Right now the editor supports:
- manual annotation of a document to create ground-truth files for training our AI models to perform OCR automatically on texts in the Romanian Transitional Script;
- word pairing, which maps words in the Romanian Transitional Script to their spelling in the modern Latin alphabet;
- automatic transliteration suggestions from a simple pretrained model; the suggestions can be edited online.
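To illustrate how word pairs could feed a simple suggestion model, here is a minimal sketch. It assumes a plain lookup-table approach and is not the editor's actual pretrained model. The names `train` and `suggest` are hypothetical, and the word pairs in the usage example are made-up toy data, not taken from the example document. The model first looks for an exact word match, then falls back to a per-character mapping learned from pairs of equal length.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Build word- and character-level lookup tables from (source, target) word pairs."""
    word_counts = defaultdict(Counter)
    char_counts = defaultdict(Counter)
    for src, tgt in pairs:
        word_counts[src][tgt] += 1
        # Align characters one-to-one only when both words have the same length.
        if len(src) == len(tgt):
            for s, t in zip(src, tgt):
                char_counts[s][t] += 1
    # Keep the most frequent target for each source word and each source character.
    word_map = {w: c.most_common(1)[0][0] for w, c in word_counts.items()}
    char_map = {ch: c.most_common(1)[0][0] for ch, c in char_counts.items()}
    return word_map, char_map


def suggest(word, word_map, char_map):
    """Suggest a transliteration: exact word match first, else character by character."""
    if word in word_map:
        return word_map[word]
    # Characters never seen in training are copied unchanged so the user can fix them.
    return "".join(char_map.get(ch, ch) for ch in word)
```

Usage with hypothetical pairs: after `word_map, char_map = train([("шапте", "șapte"), ("цара", "țara")])`, calling `suggest("шара", word_map, char_map)` returns `"șara"`, a guess assembled from the character table. A real model would need to handle one-to-many mappings and context, which a position-aligned table cannot capture.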