Automated Form Filling Using OCR and Machine Learning for Enhanced Data Accuracy and Efficiency

Authors : Greshma P Sebastian and Thasneem Musthafa

Abstract :

The system begins with image processing and text extraction using optical character recognition (OCR), with an emphasis on improving image quality and extracting the relevant textual data. A data preprocessing step then handles missing or noisy text and converts it into a structured format suitable for model training. The core of the system is the construction and training of a machine learning model designed specifically for form filling, using the preprocessed text as input. This involves exploring different machine learning algorithms, possibly including classification models and Natural Language Processing (NLP) methods. The form-filling engine integrates the trained model and maps the structured text data to the appropriate form fields. The advantages of the system include streamlining the form-filling process and reducing manual data entry, which in turn improves accuracy through machine-learning-based language interpretation. The main challenges are adapting to changes in form layouts and ensuring robustness across different text and image formats.

Keywords :

Tesseract, OCR, YOLO, OpenCV
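
The preprocessing and field-mapping stages described in the abstract can be illustrated with a minimal Python sketch. It assumes the OCR text has already been extracted (for example by Tesseract) and arrives as "label: value" lines. The form schema `FORM_FIELDS`, the helper names, and the similarity cutoff are all hypothetical, and simple fuzzy string matching stands in for the trained model; it is not the authors' method.

```python
import re
from difflib import get_close_matches

# Hypothetical form schema: target field -> label variants seen on documents.
FORM_FIELDS = {
    "name": ["name", "full name"],
    "date_of_birth": ["date of birth", "dob"],
    "phone": ["phone", "mobile", "contact number"],
}


def clean_line(line: str) -> str:
    """Replace stray symbols with spaces, then collapse repeated whitespace."""
    line = re.sub(r"[^\w\s:/@.,+-]", " ", line)
    return re.sub(r"\s+", " ", line).strip()


def match_field(label: str, cutoff: float = 0.6):
    """Return the field whose known label is most similar to `label`, or None.

    Fuzzy matching is a stand-in (assumption) for the trained classifier.
    """
    lookup = {
        variant: field
        for field, variants in FORM_FIELDS.items()
        for variant in variants
    }
    hits = get_close_matches(label.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


def fill_form(ocr_text: str) -> dict:
    """Map noisy 'label: value' OCR lines onto the form fields."""
    form = {field: None for field in FORM_FIELDS}
    for raw in ocr_text.splitlines():
        line = clean_line(raw)
        if ":" not in line:
            continue  # not a label/value pair, treat as noise
        label, value = line.split(":", 1)
        field = match_field(label.strip())
        if field and value.strip():
            form[field] = value.strip()
    return form


if __name__ == "__main__":
    sample = "Nane : Asha Nair\n  D0B:  12/03/1998 \n~~ noise ~~\nPhone: +91 98765 43210"
    print(fill_form(sample))
```

Fields with no matching line stay `None`, so missing input remains visible to a later validation step. In a full system a trained classifier would replace `match_field`, and the schema would come from the target form.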