Short description
Kognition is omnifont OCR software for KDE.
Because each step of the OCR process can be visualized, you can quickly get an idea of how OCR works and where the problems lie. However, the program may be of little or no use to end users in its current state.
Long description
The program is the result of two diploma theses about omnifont optical character recognition.
The question "Is Kognition a usable OCR program?" can be answered with a clear "No", at least from an end user's point of view.
However, it is definitely worth a look for everybody who is interested in OCR! Because each step of the OCR process can be visualized, you can quickly get an idea of how OCR works and where the problems lie.
The aim of the program is good, largely font-independent text recognition. The internal OCR process is as follows:
- user selects an input picture (grayscale or b/w picture)
- program detects the lines of text and groups unconnected parts into characters
- for each character:
  - find the skeleton of the character
  - find a minimum polygon representation of this skeleton
  - normalize size, position and rotation of the character
  - match with database (one or more candidates are found)
- for each word: look up the word in the dictionary, resolving potential uncertainties
- output ASCII text of the recognition process
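The per-word dictionary step can be pictured with a small sketch. This is not Kognition's actual code: the names `Candidates` and `resolveWord` are made up for illustration, and it assumes that the candidates for each character position are stored best match first. It tries the candidate combinations in order, returns the first one found in the dictionary and otherwise returns the best guess with "[?]" appended.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical type: one string per character position, holding all candidate
// characters for that position (assumed to be ordered best match first).
using Candidates = std::vector<std::string>;

// Returns the first candidate combination found in the dictionary, or the
// best guess followed by "[?]" if no combination is a known word.
std::string resolveWord(const Candidates& candidates,
                        const std::set<std::string>& dictionary) {
    if (candidates.empty()) return "";
    std::string bestGuess;
    for (const std::string& c : candidates) {
        if (c.empty()) return "[?]";  // a position without any candidate
        bestGuess += c[0];
    }
    std::vector<std::size_t> index(candidates.size(), 0);
    for (;;) {
        std::string word;
        for (std::size_t i = 0; i < candidates.size(); ++i)
            word += candidates[i][index[i]];
        if (dictionary.count(word)) return word;

        // Advance to the next combination like an odometer, last position first.
        bool advanced = false;
        std::size_t pos = candidates.size();
        while (pos-- > 0) {
            if (++index[pos] < candidates[pos].size()) { advanced = true; break; }
            index[pos] = 0;
        }
        if (!advanced) return bestGuess + "[?]";
    }
}
```

The number of combinations grows quickly with the number of uncertain characters, so a real implementation would probably prune the search; the sketch only shows the idea.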
Key features:
- largely font-independent (omnifont OCR)
- algorithmic character thinning (non-morphological) for better results
- detects sloped text lines
- does dictionary lookups (currently for German language)
- can visualize all steps of the recognition process
- easily extensible database (among other things one character may have different variants)
- both diploma theses are included as documentation on the project homepage (in German)
Technical aspects:
Development Status: Alpha
Environment: KDE
Intended Audience: OCR developers/OCR-interested end users
License: GNU General Public License (GPL)
Natural language: English (source code, README), German (documentation, dictionary)
OS platforms: OS Independent (developed with Linux)
Libraries used: KDE/Qt (GPL), ispell dictionary (GPL)
Programming language: C++
Topic: OCR, Graphics Conversion, Information Analysis, Artificial Intelligence
Sample session
Character window
Start the program and click "File->Open". If your install went fine, the file requester should point right to the examples folder. Open "B_smile.png". The file consists of only one (fantasy) character, so the text preview and the character window show nearly the same content. First we will concentrate on the character window.
Check "Debug->Show border lists". Now yellow connected dots show the border lists of the character. They are internally arranged by being "inside" or "outside" the character. You can step through border lists of the same segment by pressing page up/down.
Now deselect "Debug->Show border lists" and select "Debug->Show border angles". (You should uncheck the previous settings from time to time to avoid brain damage from the flood of information being displayed ;-)) Now you see the angle at each point of the border as a small line. If the curvature at this point is concave, the line also ends with a dot. If "Debug->Show smoothed border angles" is off, you see the angles as returned by the Sobel operator applied to the border pixel; otherwise you see those after the smoothing process. In the latter case yellow lines indicate smoothed angles and red lines indicate possibly problematic smoothed angles.
If you check "Debug->Show cross sections" you will see all areas of the character where both sides of the stroke are almost parallel. The dots show you the starting points of the cross sections. You can check whether they are arranged in the right order by stepping through the cross sections with the keys "." and "," (watch out for the green cross section).
Checking "Debug->Show singular regions" will mark singular regions in blue. These are regions at both ends of character strokes, where strokes touch or cross other strokes, or where the form is irregular (no cross sections are found).
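For readers who have not met the Sobel operator, here is a minimal sketch of how a border angle can be derived from it and how a list of angles can be smoothed. It is an illustration only: the `Image` type and both function names are made up, and the smoothing (a circular mean over neighbouring border points of a closed border) is an assumed method, not necessarily the one Kognition uses.

```cpp
#include <cmath>
#include <vector>

using Image = std::vector<std::vector<int>>;  // img[y][x], grey values

// Gradient direction in radians at pixel (x, y), computed with the 3x3 Sobel
// kernels. The pixel must not lie on the edge of the image.
double sobelAngle(const Image& img, int x, int y) {
    const int gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                 - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]);
    const int gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                 - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]);
    return std::atan2(static_cast<double>(gy), static_cast<double>(gx));
}

// Assumed smoothing: replace each angle of a closed border by the circular
// mean of itself and its `radius` neighbours on both sides. Averaging unit
// vectors avoids the jump between -pi and +pi.
std::vector<double> smoothAngles(const std::vector<double>& angles, int radius) {
    const int n = static_cast<int>(angles.size());
    std::vector<double> smoothed(angles.size());
    for (int i = 0; i < n; ++i) {
        double sumX = 0.0, sumY = 0.0;
        for (int k = -radius; k <= radius; ++k) {
            const double a = angles[((i + k) % n + n) % n];  // wrap around
            sumX += std::cos(a);
            sumY += std::sin(a);
        }
        smoothed[i] = std::atan2(sumY, sumX);
    }
    return smoothed;
}
```

With this sign convention the angle points from dark to bright pixels, with y growing downwards as is usual for images.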
If you select "Debug->Show skeleton" you can see the skeleton of the character. Each red dot is in the middle of a cross section. Cyan dots are singular points (one for each singular region).
Checking "Debug->Show directions" shows you the points of the skeleton with their stroke direction. The attempt to describe the character with curved parts and lines didn't give good results, so this information is not used for the OCR process.
Selecting "Debug->Show polygon skeleton" shows you the skeleton approximated by a minimum number of lines. The yellow singular points are aligned to the connecting lines. This is the first step to a more abstract view of the character. Note, for example, that the nose of our fantasy character is considered to be a dot only because its stroke was not long enough compared to the average stroke thickness of the character.
You can turn on "Debug->Show guessed lines" to see the singular points connected with the rest. This also gives you a good impression of how well the positions of the singular points were chosen. If you take a look at the character "V.png" you can see that the singular region is rather large. Nevertheless the form of the V is reconstructed quite well. Here morphological algorithms, such as obtaining the skeleton by removing border points through dilation/erosion, often fail and give you a skeleton that looks like a "Y".
Checking "Debug->Show abstract char" shows you an abstract character in which resolution, size, rotation and position have been factored out, leaving a normalized representation. This representation can now be compared to all the character representations in the database.
Selecting "Debug->Show matching state" shows you which character matches best (if any). You can step through all of them with page up/down. First you see the group of matching characters and after that the rest of the alphabet. Cyan is the representation of the character you want to recognize, grey that of the currently chosen character in the database. Red marks lines that have been matched (the larger the space between them, the worse the match). Blue lines mark unmatched remains of the database character. Green lines are the same as red lines, but the quality of matching was determined by the angle of the lines. Any cyan or grey parts remaining are completely unmatched. In some cases it can be useful to draw thicker lines (key ">").
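To get a feeling for what approximating a skeleton by few lines means, here is a sketch using the well-known Ramer-Douglas-Peucker algorithm. This is a stand-in chosen for illustration; the README does not say which algorithm Kognition uses, and the names `Point`, `distanceToLine`, `simplifyRange` and `simplifyPolyline` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Distance of p from the line through a and b (or from a if a equals b).
double distanceToLine(Point p, Point a, Point b) {
    const double dx = b.x - a.x, dy = b.y - a.y;
    const double len = std::hypot(dx, dy);
    if (len == 0.0) return std::hypot(p.x - a.x, p.y - a.y);
    return std::fabs(dy * (p.x - a.x) - dx * (p.y - a.y)) / len;
}

// Recursive step: keep the point farthest from the chord first..last if it is
// farther away than `tolerance`, otherwise keep only the end point.
void simplifyRange(const std::vector<Point>& pts, std::size_t first,
                   std::size_t last, double tolerance, std::vector<Point>& out) {
    double maxDist = 0.0;
    std::size_t split = first;
    for (std::size_t i = first + 1; i < last; ++i) {
        const double d = distanceToLine(pts[i], pts[first], pts[last]);
        if (d > maxDist) { maxDist = d; split = i; }
    }
    if (maxDist > tolerance && split != first) {
        simplifyRange(pts, first, split, tolerance, out);
        simplifyRange(pts, split, last, tolerance, out);
    } else {
        out.push_back(pts[last]);
    }
}

// Ramer-Douglas-Peucker simplification of an open polyline.
std::vector<Point> simplifyPolyline(const std::vector<Point>& pts, double tolerance) {
    if (pts.size() < 3) return pts;
    std::vector<Point> out{pts.front()};
    simplifyRange(pts, 0, pts.size() - 1, tolerance, out);
    return out;
}
```

A tolerance that depends on the average stroke thickness would fit the remark about the nose above, but that coupling is a guess and not taken from the program.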
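A very reduced sketch of such a normalization follows. It only removes position and size by mapping the bounding box of the polygon points into the unit square; rotation and the baseline are ignored here, so this is a simplification for illustration and not the representation Kognition actually builds. `Point` and `normalizePoints` are made-up names.

```cpp
#include <algorithm>
#include <vector>

struct Point { double x, y; };

// Moves the points so that their bounding box starts at (0, 0) and scales
// them uniformly so that the larger side of the bounding box has length 1.
std::vector<Point> normalizePoints(const std::vector<Point>& pts) {
    if (pts.empty()) return {};
    double minX = pts[0].x, maxX = pts[0].x;
    double minY = pts[0].y, maxY = pts[0].y;
    for (const Point& p : pts) {
        minX = std::min(minX, p.x);
        maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y);
        maxY = std::max(maxY, p.y);
    }
    double extent = std::max(maxX - minX, maxY - minY);
    if (extent <= 0.0) extent = 1.0;  // a single point: only move it
    std::vector<Point> out;
    out.reserve(pts.size());
    for (const Point& p : pts)
        out.push_back({(p.x - minX) / extent, (p.y - minY) / extent});
    return out;
}
```

Scaling both axes by the same factor keeps the proportions of the character, so a narrow "l" stays narrow after normalization.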
Main window
Now please open the "Text.png" file. Hit "Cancel" to have all characters recognized at once. The result of the recognition process is written to standard out (i.e. the console where you started Kognition from). Words that could not be verified in the word database have a "[?]" appended.
Switch off everything below "Debug->Show guessed lines". The text window should now show the same as if "Text.png" were opened in a picture viewer. Now switch on "Debug->Show boundingboxes". Each unconnected part of a character should be surrounded by a blue box. (Note that the i has two bounding boxes.) You can enlarge the view by pressing the "+" key. If you take a look at the metaboxes ("Debug->Show metaboxes") you see that the stroke and the dot of the i share one metabox, which means that the program understood that both bounding boxes are parts of one character. With "Debug->Show basepoints" you can see the base points (bottom middle of the metaboxes) that were used to calculate the baseline ("Debug->Show baseline"). The tallest bounding box which lies above the baseline can be marked purple with "Debug->Show tallest character part". The tallest bounding box, the baseline and the position of the character relative to the baseline are used to create the abstract character representation.
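The following sketch shows one way these pieces could fit together. It is only an illustration under assumptions: bounding boxes are merged into a metabox when their horizontal extents overlap, and the baseline is a least-squares line through the base points. Neither criterion is confirmed by this README, and all names (`Box`, `Line`, `buildMetaboxes`, `basePoint`, `fitBaseline`) are hypothetical.

```cpp
#include <algorithm>
#include <vector>

struct Box { double left, top, right, bottom; };  // y grows downwards
struct Point { double x, y; };
struct Line { double slope, offset; };            // y = slope * x + offset

// Assumed criterion: boxes whose horizontal extents overlap belong to the
// same character (e.g. the stroke and the dot of an i).
std::vector<Box> buildMetaboxes(std::vector<Box> boxes) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.left < b.left; });
    std::vector<Box> meta;
    for (const Box& b : boxes) {
        if (!meta.empty() && b.left <= meta.back().right) {
            Box& m = meta.back();
            m.top = std::min(m.top, b.top);
            m.right = std::max(m.right, b.right);
            m.bottom = std::max(m.bottom, b.bottom);
        } else {
            meta.push_back(b);
        }
    }
    return meta;
}

// Base point as described above: bottom middle of a metabox.
Point basePoint(const Box& m) { return {(m.left + m.right) / 2.0, m.bottom}; }

// Assumed method: least-squares line through the base points. A non-zero
// slope corresponds to a sloped text line.
Line fitBaseline(const std::vector<Point>& pts) {
    const double n = static_cast<double>(pts.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (const Point& p : pts) {
        sx += p.x;
        sy += p.y;
        sxx += p.x * p.x;
        sxy += p.x * p.y;
    }
    const double denom = n * sxx - sx * sx;
    if (denom == 0.0) return {0.0, n > 0.0 ? sy / n : 0.0};  // horizontal fallback
    const double slope = (n * sxy - sx * sy) / denom;
    return {slope, (sy - slope * sx) / n};
}
```

Characters with descenders (g, p, y) pull a plain least-squares fit downwards, so a real implementation needs to be more careful than this sketch.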
Keys
key            function
-              switch zoom mode for text view
               (fit to width/fit to height/original size)
+              zoom into text view
-              zoom out of text view
cursor keys    move text view
page up        show border lists of next segment
               abstract character view: compare with next database entry
page down      show border lists of previous segment
               abstract character view: compare with previous database entry
<              draw thinner lines
>              draw thicker lines
,              mark previous cross section
.              mark next cross section
print          make snapshot of the character window: this will save a PNG
               image to the current directory with the name "snapshot.png"
Once you have made a snapshot, your config file (most likely ~/.kde/share/config/kognitionrc) will contain the following entries:
[Image Snapshot]
aspect=square
cm=5
dpi=300
linethickness=3
You can change the values while the program is running. The new values will automatically be used for the next snapshot. Meaning of the values:
dpi            dots per inch of the resulting image (0=character window
               as seen on screen, ignoring cm and aspect)
cm             length in cm of the image
aspect         whether this length should be the "width", "height" or
               "square" (larger one of width and height)
linethickness  thickness of the lines (0=use same as on the screen)
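As a worked example of how these values could translate into an image size, here is a small sketch. It reflects one reading of the descriptions above and is not taken from the program: the length in pixels is cm / 2.54 * dpi, "square" applies that length to the larger side of the character window, and the other side is scaled proportionally. The names `SnapshotSettings`, `PixelSize` and `snapshotSize` are made up.

```cpp
#include <cmath>
#include <string>

// Mirrors the [Image Snapshot] entries (linethickness is not needed here).
struct SnapshotSettings {
    std::string aspect = "square";  // "width", "height" or "square"
    double cm = 5.0;
    int dpi = 300;
};

struct PixelSize { long width, height; };

// Computes the snapshot size in pixels from the on-screen size of the
// character window and the settings.
PixelSize snapshotSize(long screenWidth, long screenHeight,
                       const SnapshotSettings& s) {
    // dpi=0: use the character window as seen on screen.
    if (s.dpi == 0 || screenWidth <= 0 || screenHeight <= 0)
        return {screenWidth, screenHeight};
    const double targetPixels = s.cm / 2.54 * s.dpi;  // 1 inch = 2.54 cm
    const bool fixWidth =
        s.aspect == "width" ||
        (s.aspect == "square" && screenWidth >= screenHeight);
    const double scale = targetPixels / (fixWidth ? screenWidth : screenHeight);
    return {std::lround(screenWidth * scale), std::lround(screenHeight * scale)};
}
```

With the default entries shown above, the requested length is 5 / 2.54 * 300, which is about 590.55 pixels, so a character window of 200x100 pixels on screen would give a snapshot of 591x295 pixels under this reading.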
Wordlists
The included German word list was generated using the ispell dictionary for the German language, igerman98 (Version 20030222). Copyright (c) 1999-2002 Björn Jacke, released under the GPL.
Simply use 'make isowordlist' to generate the German word list. The dictionary can be downloaded from:
http://lisa.goe.net/~bjacke/igerman98/
For other languages take a look at:
http://ficus-www.cs.ucla.edu/geoff/ispell-dictionaries.html
If you need to add additional characters for your language, you will have to adjust the character tables in 'character_database.cpp', as well as the constants 'dictionary_lookup_chars' and 'total_chars' in 'character_database.h'.
