Optical Character Recognition (OCR)

Optical character recognition (OCR) is both the technology and the process of reading typed, printed or handwritten characters and converting them into machine-encoded text that a computer can search, edit and otherwise manipulate. It is a subset of image recognition and is widely used as a form of data entry, where the input is a printed document or data record such as a bank statement, sales invoice, passport, resume or business card. The document is scanned or photographed, and the OCR program recognizes the characters and outputs them as text, typically in the form of a text document.

More specifically, OCR is the recognition of language-specific characters by a computer through analysis of a digital image. The document is first scanned or photographed, which produces a raster image: a grid of pixels that the computer can process. Using pattern-recognition algorithms, many of which come from the field of artificial intelligence, the program identifies the patterns in the image that correspond to characters. It then outputs the character codes, such as ASCII or Unicode, that match the recognized characters. Most OCR programs must be trained on sample characters in order to become better at recognizing them.
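
The pipeline described above (raster image in, pattern matching, character codes out) can be illustrated with a deliberately tiny sketch. It assumes a hypothetical 3x5 pixel font, fixed-width glyphs separated by one blank column, and nearest-template matching as the recognition step. The names `TEMPLATES`, `IMAGE`, `split_glyphs` and `recognize` are invented for this illustration. Real OCR engines use far more capable methods, so this shows the idea only and is not how any particular product works.

```python
# Toy OCR: match each glyph in a tiny raster image against stored templates.
# Everything here (font, image, function names) is hypothetical.

GLYPH_W, GLYPH_H = 3, 5  # assumed fixed glyph size in pixels

# "Training" data: one known bitmap per character ('#' = ink, '.' = blank).
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

# Input raster image of the word "TOIL". The centre pixel of the "O"
# (third row) is flipped to ink to simulate scanner noise.
IMAGE = [
    "###.###.###.#..",
    ".#..#.#..#..#..",
    ".#..###..#..#..",
    ".#..#.#..#..#..",
    ".#..###.###.###",
]


def to_bits(rows):
    """Convert rows of '#'/'.' characters into rows of 1/0 pixels."""
    return [[1 if c == "#" else 0 for c in row] for row in rows]


def split_glyphs(bits):
    """Cut the image into GLYPH_W-wide cells, skipping one gap column each."""
    width = len(bits[0])
    return [
        [row[x:x + GLYPH_W] for row in bits]
        for x in range(0, width, GLYPH_W + 1)
    ]


def distance(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(image):
    """Return the text whose templates are closest to each glyph."""
    templates = {ch: to_bits(rows) for ch, rows in TEMPLATES.items()}
    text = ""
    for glyph in split_glyphs(to_bits(image)):
        text += min(templates, key=lambda ch: distance(glyph, templates[ch]))
    return text


text = recognize(IMAGE)
print(text)                    # TOIL
print([ord(c) for c in text])  # [84, 79, 73, 76]
```

The noisy "O" is still recognized because it differs from the stored "O" template by a single pixel, while every other template differs by six or more. Adding more or better templates is a crude stand-in for the training step mentioned above.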
