OCR
OCR software
OCR and ICR software are analytical artificial intelligence systems that consider only sequences of characters rather than whole words or phrases, and that do not validate the data during the recognition process (see ExperVision, ABBYY, OmniPage or CuneiForm). Based on a sequential analysis of lines and curves, OCR and ICR make "best guesses" at characters, using database look-up tables to match or closely associate the strings of characters that make up words. For these systems to recognize hand-printed or machine-printed forms, the words must be separated into individual characters. That is why most administrative forms require people to hand-print in carefully spaced boxes or combs (tick marks along the bottom line of an entry field) that force spaces between the letters written on the form. Without combs or boxes, conventional technologies reject fields when people do not follow the structure for filling out forms, which results in administrative overhead and high forms-processing costs for organizations.
History
In 1929 Gustav Tauschek obtained a patent on OCR in Germany, followed by Handel, who obtained a U.S. patent on OCR in 1933 (U.S. Patent 1,915,993). In 1935 Tauschek was also granted a U.S. patent on his method (U.S. Patent 2,026,329).
Tauschek's machine was a mechanical device that used templates. A photodetector was positioned so that when the template and the character to be recognized were lined up for an exact match and a light was directed toward them, no light would reach the photodetector.
In 1950, David H. Shepard, a cryptanalyst at the Armed Forces Security Agency in the United States, was asked by Frank Rowlett, who had broken the Japanese PURPLE diplomatic code, to work with Dr. Louis Tordella to recommend data automation procedures for the Agency. This included the problem of converting printed messages into machine language for computer processing. Shepard decided that it should be possible to build a machine to do this and, with the help of Harvey Cook, a friend, built "Gismo" in his attic during evenings and weekends. This was reported in the Washington Daily News on April 27, 1951 and in the New York Times on December 26, 1953, after his U.S. Patent 2,663,758 was issued. Shepard then founded Intelligent Machines Research Corporation (IMR), which went on to deliver the world's first several OCR systems used in commercial operation. Both Gismo and the later IMR systems used image analysis rather than character matching, and could accept some variation in font. Gismo, however, was limited to reasonably close vertical registration, whereas the commercial IMR scanners analyzed characters anywhere in the scanned field, a practical necessity on real-world documents.
The first commercial system was installed at Reader's Digest in 1955; several years later it was donated by Reader's Digest to the Smithsonian, where it was put on display. The second system was sold to the Standard Oil Company of California for reading credit card imprints for billing purposes, and many more systems were sold to other oil companies. Other systems sold by IMR during the 1950s included a bill stub reader for the Ohio Bell Telephone Company and a page scanner for the United States Air Force, used for reading typewritten messages and transmitting them by teletype. IBM and others were later licensed on Shepard's OCR patents.
In about 1965, Reader's Digest and RCA collaborated to build an OCR document reader designed to digitize the serial numbers on Reader's Digest coupons returned from advertisements. The source documents were printed by an RCA drum printer using the OCR-A font. The reader was connected directly to an RCA 301 computer (one of the first solid-state computers). This unit was followed by a specialized document reader installed at TWA, where it processed airline ticket stock (a task made more difficult by the carbonized backing on the ticket stock). The readers processed documents at a rate of 1,500 per minute and checked each document, rejecting those they were not able to process correctly. The product became part of the RCA product line as a reader designed to handle "turnaround documents" such as the utility and insurance bills returned with payments.
The United States Postal Service has been using OCR machines to sort mail since 1965, based on technology devised primarily by the prolific inventor Jacob Rabinow. The first use of OCR in Europe was by the British General Post Office (GPO). In 1965 it began planning an entire banking system, the National Giro, using OCR technology, a process that revolutionized bill payment systems in the UK. Canada Post has been using OCR systems since 1971. OCR systems read the name and address of the recipient at the first mechanized sorting center and print a routing bar code on the envelope based on the postal code. To avoid confusion with the human-readable address field, which can be located anywhere on the letter, a special ink (orange in visible light) is used that is clearly visible under ultraviolet light. The envelopes can then be processed with equipment based on simple bar code readers.
In 1974, Ray Kurzweil started the company Kurzweil Computer Products, Inc. and led the development of the first omni-font optical character recognition system, a computer program capable of recognizing text printed in any normal font. He decided that the best application of this technology would be a reading machine for the blind, which would allow blind people to have a computer read text to them aloud. This device required the invention of two enabling technologies: the CCD flatbed scanner and the text-to-speech synthesizer. On January 13, 1976, the finished product was unveiled at a widely reported press conference headed by Kurzweil and the leaders of the National Federation of the Blind. Called the Kurzweil Reading Machine, the device covered an entire tabletop. On the day of the machine's unveiling, Walter Cronkite used it to give his signature sign-off, "And that's the way it was, January 13, 1976." While listening to The Today Show, musician Stevie Wonder heard a demonstration of the device and personally purchased the first production version of the Kurzweil Reading Machine.
In 1978, Kurzweil Computer Products began selling a commercial version of the optical character recognition program. LexisNexis was one of the first customers, and bought the program to upload paper legal and news documents onto its nascent online databases. Two years later, Kurzweil sold his company to Xerox, which had an interest in further commercializing paper-to-computer text conversion. Kurzweil Computer Products became a subsidiary of Xerox known as ScanSoft, now Nuance Communications.
The current state of OCR technology
The accurate recognition of typewritten text in the Latin alphabet is now considered a largely solved problem in applications where clear images are available, such as the digitization of printed documents. Typical accuracy rates are reported to exceed 99%, although total accuracy can only be achieved through human review. Other areas of recognition, including hand printing, cursive handwriting and printed text in other scripts (especially those with a large number of characters), are still the subject of active research.
Accuracy rates can be measured in several ways, and how they are measured can greatly affect the reported accuracy. For example, if word context (essentially a lexicon of words) is not used to correct the software when it finds non-existent words, a character error rate of 1% (99% accuracy) can turn into an error rate of 5% (95% accuracy) or worse when the measurement is based on whether each whole word was recognized with no incorrect letters.
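The jump from a 1% character error rate to roughly 5% at the word level can be checked with a little arithmetic. The sketch below assumes that character errors are independent of one another and that a typical word has five letters; both are simplifying assumptions for illustration rather than figures from any particular OCR engine, and the function name is hypothetical.

```python
def word_accuracy(char_accuracy: float, word_length: int) -> float:
    """Chance that every character of a word is read correctly,
    assuming each character is recognized independently."""
    return char_accuracy ** word_length


if __name__ == "__main__":
    # A 99% per-character accuracy, evaluated at several word lengths.
    for length in (1, 5, 10):
        error = 1.0 - word_accuracy(0.99, length)
        print(f"word length {length:2d}: word error rate {error:.1%}")
```

Under these assumptions a five-letter word is read without any wrong letter about 95.1% of the time, which matches the order of magnitude quoted above; longer words fare worse.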
On-line character recognition is sometimes confused with optical character recognition (see handwriting recognition). OCR is an instance of off-line character recognition, where the system recognizes the fixed, static shape of a character, while on-line character recognition instead recognizes the dynamic motion during handwriting. For example, on-line recognition, such as that used for gestures in the PenPoint OS or on the Tablet PC, can tell whether a horizontal mark was drawn from right to left or from left to right. On-line character recognition is also known by other terms such as dynamic character recognition, real-time character recognition and intelligent character recognition (ICR).
On-line systems for recognizing hand-printed text on the fly have become well known as commercial products in recent years (see the history of the Tablet PC). These include input devices for personal digital assistants such as those running Palm OS. The Apple Newton pioneered this kind of product. The algorithms used in these devices exploit the fact that the order, speed and direction of the individual line segments are known at input. In addition, the user can be trained to use only specific letter shapes. These methods cannot be used in software that scans paper documents, so accurate recognition of hand-printed documents is still largely an open problem. Accuracy rates of 80% to 90% on neat, clean hand-printed characters can be achieved, but that accuracy still translates to dozens of errors per page, making the technology useful only in very limited applications.
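The difference between on-line and off-line input can be illustrated with a toy stroke. In this sketch a pen stroke is a time-ordered list of (x, y) points, and the off-line view is the unordered set of inked pixels; the data layout and the function names are assumptions made for the example, not the format of any real tablet API.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def horizontal_direction(stroke: List[Point]) -> str:
    """Use the point order (on-line information) to find the drawing direction."""
    start_x, end_x = stroke[0][0], stroke[-1][0]
    if end_x > start_x:
        return "left-to-right"
    if end_x < start_x:
        return "right-to-left"
    return "no horizontal movement"


def rasterize(stroke: List[Point]) -> Set[Point]:
    """Off-line view: only which pixels carry ink, with the ordering discarded."""
    return set(stroke)


if __name__ == "__main__":
    dash_ltr = [(x, 0) for x in range(5)]   # drawn left to right
    dash_rtl = list(reversed(dash_ltr))     # same mark, drawn right to left

    print(horizontal_direction(dash_ltr))
    print(horizontal_direction(dash_rtl))
    # The two rasterized marks are identical, so a scanner cannot tell them apart.
    print(rasterize(dash_ltr) == rasterize(dash_rtl))
```

The two strokes differ only in point order, which is exactly the information a scanned page no longer contains.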
Recognition of cursive text is an active area of research, with recognition rates even lower than those for hand-printed text. Higher rates of recognition of general cursive script will probably not be possible without the use of contextual or grammatical information. For example, recognizing entire words from a dictionary is easier than trying to parse individual characters from script. Reading the amount line of a cheque (which is always a written-out number) is an example where using a small dictionary can greatly increase recognition rates. Knowledge of the grammar of the language being scanned can also help determine whether a word is likely to be a verb or a noun, for example, allowing greater accuracy. The shapes of individual cursive characters themselves simply do not contain enough information to recognize all handwritten cursive script accurately (above 98%).
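The benefit of a small dictionary can be sketched with Python's standard difflib module. The example below snaps a noisy recognizer output to the closest entry in a short lexicon of number words, as might be used for the amount line of a cheque; the lexicon, the 0.6 similarity cutoff and the function name are illustrative assumptions, not part of any specific OCR product.

```python
import difflib
from typing import List, Optional

# Hypothetical small lexicon for the written amount on a cheque.
NUMBER_WORDS = [
    "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "twenty", "thirty", "forty", "fifty", "hundred",
    "thousand",
]


def snap_to_lexicon(raw: str, lexicon: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the lexicon word most similar to the raw recognizer output,
    or None when nothing is similar enough."""
    matches = difflib.get_close_matches(raw.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Two plausible misreadings and one string with no close match.
    for raw in ("forly", "hundrcd", "xyzzy"):
        print(raw, "->", snap_to_lexicon(raw, NUMBER_WORDS))
```

Because the lexicon is tiny, a single wrong letter rarely makes a word closer to a different entry, which is why restricting the vocabulary helps so much on fields like cheque amounts.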
It is important to understand that OCR is a basic technology that is also used in advanced scanning applications. For this reason, an advanced scanning solution can be unique and patented, and not easily copied, even though it is based on basic OCR technology.
For more complex recognition problems, intelligent character recognition systems are generally used, as artificial neural networks can be made indifferent to both affine and non-linear transformations.
One technique that has had considerable success in recognizing difficult words and character groups, within documents that are otherwise amenable to computer OCR, is to submit them automatically to humans through the reCAPTCHA system.
OCR software language support
Each entry below gives the product name, followed where known by its latest version, release year, recognition languages and dictionaries.
ExperVision TypeReader and OpenRTK
Latest version: 8.0
Release year: 2010
Recognition languages: English, French, German, Italian, Spanish, Portuguese, Danish, Dutch, Swedish, Norwegian, Hungarian, Polish, Simplified Chinese, Traditional Chinese, Russian, Finnish and Polynesian
ABBYY FineReader
Latest version: 10
Release year: 2009
Recognition languages: Abkhaz, Adyghe, Afrikaans, Agul, Albanian, Altai, Armenian (Eastern, Western, Grabar), Avar, Aymara, Azerbaijani (Cyrillic), Azerbaijani (Latin), Bashkir, Basic, Basque, Belarusian, Bemba, Blackfoot, Breton, Bugotu, Bulgarian, Buryat, C/C++, Catalan, Cebuano, Chamorro, Chechen, Chinese (Simplified and Traditional), Chukchi, Chuvash, COBOL, Corsican, Crimean Tatar, Croatian, Crow, Czech, Dakota, Danish, Dargwa, Dungan, Dutch (Netherlands and Belgium), English, Eskimo (Cyrillic and Latin), Esperanto, Estonian, Even, Evenki, Faroese, Fijian, Finnish, Fortran, French, Frisian, Friulian, Gagauz, Galician, Ganda, German (Luxembourg), German (new and old spelling), Greek, Guarani, Hani, Hausa, Hawaiian, Hebrew, Hungarian, Icelandic, Ido, Indonesian, Ingush, Interlingua, Irish, Italian, Japanese, Java, Jingpo, Kabardian, Kalmyk, Karachay-Balkar, Karakalpak, Kashubian, Kawa, Kazakh, Khakass, Khanty, Kikuyu, Kirghiz, Kongo, Korean, Koryak, Kpelle, Kumyk, Kurdish, Lak, Latin, Latvian, Lezgi, Lithuanian, Luba, Macedonian, Malagasy, Malay, Malinke, Maltese, Mansi, Maori, Mari, Maya, Miao, Minangkabau, Mohawk, Moldavian, Mongolian, Mordvin, Nahuatl, Nenets, Nivkh, Nogay, Norwegian (Bokmål and Nynorsk), Nyanja, Occidental, Ojibway, Ossetian, Papiamento, Pascal, Polish, Portuguese (Portugal and Brazil), Provençal, Quechua, Romansh, Romanian, Romany, Rundi, Russian, Russian (old spelling), Rwanda, Sami (Lappish), Samoan, Scottish Gaelic, Selkup, Serbian (Cyrillic and Latin), Shona, simple chemical formulas, Slovak, Slovenian, Somali, Sorbian, Sotho, Spanish, Sunda, Swahili, Swazi, Swedish, Tabasaran, Tagalog, Tahitian, Tajik, Tatar, Thai, Tok Pisin, Tongan, Tswana, Tun, Turkish, Turkmen, Tuvan, Udmurt, Uighur (Cyrillic and Latin), Ukrainian, Uzbek (Cyrillic and Latin), Welsh, Wolof, Xhosa, Yakut, Yiddish, Zapotec, Zulu
Dictionaries: Armenian (Eastern, Western, Grabar), Bashkir, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch (Netherlands and Belgium), English, Estonian, Finnish, French, German (new and old spelling), Greek, Hebrew, Hungarian, Indonesian, Italian, Latvian, Lithuanian, Norwegian (Nynorsk and Bokmål), Polish, Portuguese (Portugal and Brazil), Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tatar, Thai, Turkish, Ukrainian
OmniPage
Latest version: 17
Release year: 2009
Recognition languages: Afrikaans, Albanian, Aymara, Basque, Bemba, Blackfoot, Breton, Bugotu, Bulgarian, Byelorussian, Catalan, Chamorro, Chechen, Corsican, Croatian, Crow, Czech, Danish, Dutch, English, Esperanto, Estonian, Faroese, Fijian, Finnish, French, Frisian, Friulian, Gaelic (Irish), Gaelic (Scottish), Galician, Ganda/Luganda, German, Greek, Guarani, Hani, Hawaiian, Hungarian, Icelandic, Ido, Indonesian, Interlingua, Italian, Inuit, Kabardian, Kashubian, Kawa, Kikuyu, Kongo, Kpelle, Kurdish, Latin, Latvian, Lithuanian, Luba, Luxembourgish, Macedonian, Malagasy, Malay, Mande, Maltese, Maori, Maya, Miao, Minankabaw, Mohawk, Moldavian, Nahuatl, Norwegian, Nyanja, Occidental, Ojibway, Papiamento, Pidgin English, Polish, Portuguese (Brazilian), Portuguese, Provençal, Quechua, Romansh, Romanian, Romany, Rundi, Russian, Rwanda, Sami, Sami (Lule), Sami (Northern), Sami (Southern), Samoan, Sardinian, Serbian (Cyrillic), Serbian (Latin), Shona, Sioux, Slovak, Slovenian, Somali, Sorbian, Sotho, Spanish, Sundanese, Swahili, Swazi, Swedish, Tagalog, Tahitian, Tinpo, Tongan, Tswana, Tun, Turkish, Ukrainian, Visayan, Welsh, Wolof, Xhosa, Zapotec, Zulu
PDF OCR X
Latest version: 1.4
Release year: 2010
Recognition languages: Bulgarian, Catalan, Czech, Simplified Chinese, Traditional Chinese, Danish, German, Greek, English, Finnish, French, Hungarian, Indonesian, Italian, Japanese, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Serbian, Slovak, Slovenian, Spanish, Swedish, Tagalog, Turkish, Vietnamese, Ukrainian
Readiris
Latest version: 12 Pro & Corporate
Release year: 2009
Recognition languages: American English, British English, Afrikaans, Albanian, Aymara, Balinese, Basque, Bemba, Bikol, Bislama, Brazilian, Breton, Bulgarian, Byelorussian, Catalan, Cebuano, Chamorro, Corsican, Croatian, Czech, Danish, Dutch, Esperanto, Estonian, Faroese, Fijian, Finnish, French, Frisian, Friulian, Galician, Ganda, German, Greek, Greenlandic, Haitian (Creole), Hani, Hiligaynon, Hungarian, Icelandic, Ido, Ilocano, Indonesian, Interlingua, Irish (Gaelic), Italian, Javanese, Kapampangan, Kikongo, Kinyarwanda, Kurdish, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Madurese, Malagasy, Malay, Maltese, Manx (Gaelic), Maori, Mayan, Minangkabau, Nahuatl, Norwegian, Numeric, Nyanja, Nynorsk, Occitan, Pidgin English, Polish, Portuguese, Quechua, Rhaeto-Romanic, Romanian, Rundi, Russian, Samoan, Sardinian, Scottish (Gaelic), Serbian, Serbian (Latin), Shona, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tahitian, Tok Pisin, Tongan, Tswana, Turkish, Ukrainian, Waray, Wolof, Xhosa, Zapotec, Zulu, Bulgarian-English, Byelorussian-English, Greek-English, Macedonian-English, Russian-English, Serbian-English, Ukrainian-English, plus Moldovan, Bosnian (Cyrillic and Latin), Tetum, Swiss-German and Kazakh
Readiris
Latest version: 12 Pro & Corporate Middle East
Release year: 2009
Recognition languages: Arabic, Persian and Hebrew
Readiris
Latest version: 12 Pro & Corporate Asian
Release year: 2009
Recognition languages: Simplified Chinese, Traditional Chinese, Japanese and Korean
CuneiForm
Latest version: 12
Release year: 2007
Recognition languages: English, German, Croatian, Polish, Danish, Portuguese, Dutch, digits, Czech, French, Romanian, Hungarian, Bulgarian, Slovenian, Latvian, Lithuanian, Estonian, Turkish, Russian, Swedish, Spanish, Italian, Russian and English (mixed), Ukrainian, Serbian
RBM
Latest version: 0.47
Release year: 2009
Kirtas Technologies Arabic OCR
Release year: 2009
Recognition languages: 15 left-to-right languages, among them English, French, German and Dutch; Arabic, Farsi, Jawi, Pashto and Urdu; bilingual Arabic/English, Arabic/French and Farsi/English
MoreData
Latest version: 1.0
Release year: 2008
Notes: completely free software that uses Tesseract OCR (Google) as its OCR engine; scans several documents at a time and offers text search over the results; Windows interface
Recognition languages: English, French, Italian
Microsoft Office Document Imaging
Latest version: Office 2007
Release year: 2007
Recognition languages: Language support depends on the proofing tools installed. For languages not included in your version of MS Office, the corresponding proofing tools kit (sold separately) is required.
NEOPTEC-SCAN DATA
Latest version: 5.7
Release year: 2009
Recognition languages: French, Spanish, English
Microsoft Office OneNote 2007
Verus NovoDynamics
Latest version: Middle East Business
Release year: 2005
Recognition languages: Arabic, Persian (Farsi and Dari), Pashto and Urdu, including embedded English and French. It also recognizes Hebrew, including embedded English.
Verus NovoDynamics
Latest version: Asia Professional
Release year: 2009
Recognition languages: Simplified and Traditional Chinese, Korean and Russian, including embedded English
Ocrad
Brainware
Hocr
Latest version: 0.10.13
Release year: 2008
Recognition languages: Hebrew
OCRopus
Latest version: 0.3.1
Release year: 2008
Recognition languages: all languages and scripts that Tesseract supports, through the Tesseract plugin; its native recognizers support the Latin script and English
ReadSoft
Recognition languages: European characters, Simplified and Traditional Chinese, Korean and Japanese characters
Alt-N Technologies RelayFax Network Fax Manager
Sakhr OCR
Release year: 2009
Recognition languages: Arabic, English, French and 16 other languages; Farsi, Jawi, Dari, Pashto and Urdu (optional, in an extra language pack); bilingual documents in Arabic/English, Farsi/English and Arabic/French
Scantron Cognition
SimpleOCR
Latest version: 3.5
Release year: 2008
Recognition languages: English and French
SmartScore
Tesseract
Latest version: 2.03
Release year: 2008
Recognition languages: can recognize 6 languages, is fully UTF-8 capable, and is fully trainable
Transym TOCR
Latest version: 3.0
Release year: 2008
Recognition languages: maximum-accuracy character recognition in 11 languages: English, French, Italian, German, Dutch, Swedish, Norwegian, Finnish, Danish, Spanish and Portuguese
See also
Automatic number plate recognition
CAPTCHA
Computational linguistics
Computer vision
Machine learning
Music OCR
OCR SDK
Optical character recognition software
Optical mark recognition
Raster to vector
Raymond Kurzweil
Voice recognition
Book scanning
Institutional repository
Digital library
OCR-B
External Links
ICDAR'07, ICDAR'09: an international conference on all aspects of document recognition
Handwriting recognition: basic principles and history
Optical Character Recognition in Unicode: hex range 2440-245F