OCR Error Correction Using Character Correction and Feature-Based Word Classification
Nachum Dershowitz and Ido Kissos, "OCR Error Correction Using Character Correction and Feature-Based Word Classification", Proceedings of the IAPR International Workshop on Document Analysis Systems (DAS), Santorini, 2016, pp. 198-203.
Abstract
This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, improves on the vast majority of segmentation and recognition errors, the most frequent types of error in our dataset.
More information (IEEE Xplore Digital Library website)
Machine Learning Tools for Historical Documents, 1 October 2015 - 30 June 2016