INTRODUCTION
I bet you recognise the following situation. You have some paper material you need to scan because you want to copy it, mail it, or something like that. This can be papers from school, sheet music, part of a book, a whole book, etc. Whatever it is, there's a lot of it. So, you take a deep breath and prepare yourself for a couple of extremely dull hours behind your pc.
While you're in this infinite prescan / select / scan / save loop, you think ahead to stage two. Maybe, if you're a bit of a multitasking person, you're even working on stage two already. Which is of course: rotating all scanned material upright. Yes, this is important. No matter how small the angle, it will be distracting.
Then, when you are finally done, what you have is a large (huge) pile of images, probably named in some clever way so they can be viewed in alphabetical order. It's all not as sharp as the original material but it will have to do. If you want to copy it you're only halfway and you still have the painful task of printing each individual file ahead of you. If on the other hand you want to save it or mail it you probably want to compress it, just to keep it all together. Of course it will have to be extracted to be viewed again but it's the only option you have. You think.
To summarize, what you'd really want is this:
* an easy way to scan a lot of paper
* have it just as clean and sharp as the original (or better)
* bundle this in a way it can be directly viewed / printed
Now that you clearly remember this long cherished wish, here it is: a dream come true...
EPSJOIN
Actually this is only the third item of the short wishlist. The other two have already existed for some time but they clearly needed point three to be really useful. Let me explain by repeating the above situation, the better way.
an easy way to scan a lot of paper
This one is easy. The key here is to forget about scan regions, so you can do
without a gui. Sane <http://www.sane-project.org> comes with an excellent
command-line frontend scanimage
<http://www.sane-project.org/man/scanimage.1.html> which puts you in control of
all your scanner's options. Just take a minute to figure out the command-line
arguments you need and enter a simple bash loop like this:
-- for i in $(seq -w 999); do read -p 'press a key to continue'
-- scanimage --mode Gray --resolution 300 > page$i.pnm
-- done
Break out of it with control-c. This way you'll scan all that paper in no time.
have it just as clean and sharp as the original (or better)
For this use a bitmap tracer like autotrace <http://autotrace.sourceforge.net> or potrace <http://potrace.sourceforge.net>, whichever you like. These programs transform a bitmap image (like the ones you just scanned) into a smooth and sharp postscript (eps) image. Personally I prefer the latter (don't let that influence you) so I'll continue this example with potrace. It takes a single command to turn all these scanned images into postscript. However, I strongly advise you to first preprocess these images with mkbitmap, a highpass filter that comes with potrace. This will significantly improve the final postscript outcome. So, two commands:
-- mkbitmap *.pnm
-- potrace *.pbm
Easy, right?
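If one page out of several hundred fails to convert, it can help to run the two steps file by file instead. The following is only a sketch in Python, not part of epsjoin. It assumes mkbitmap and potrace are on your PATH and that both write their output next to the input with the suffix replaced (.pbm and .eps respectively), which is their default behaviour as far as I know. The function names plan_trace and run_trace are made up for this example.

```python
import subprocess
from pathlib import Path


def plan_trace(pnm_files):
    """Return (command, expected output file) pairs for each scanned page.

    Assumption: mkbitmap replaces the input suffix with .pbm and potrace
    replaces it with .eps, as both do by default.
    """
    steps = []
    for pnm in sorted(Path(p) for p in pnm_files):
        pbm = pnm.with_suffix(".pbm")
        eps = pnm.with_suffix(".eps")
        steps.append((["mkbitmap", str(pnm)], pbm))
        steps.append((["potrace", str(pbm)], eps))
    return steps


def run_trace(pnm_files):
    """Run the planned commands in order, stopping at the first failure."""
    for command, expected in plan_trace(pnm_files):
        subprocess.run(command, check=True)
        if not expected.exists():
            raise RuntimeError(f"{command[0]} did not produce {expected}")
```

Calling run_trace(Path(".").glob("page*.pnm")) would then process the pages in alphabetical order and tell you which file caused trouble.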
bundle this in a way it can be directly viewed / printed
At this point you're left with a large bunch of eps files. The images may be sharp but they're not rotated, they all have a different placement and they are not joined in a single file. So that's where epsjoin comes in. It's a gtk2 <http://www.gtk.org> / gnome <http://www.gnome.org> app that builds a single postscript file of all these loose eps files by putting each on a single page. Each eps can be individually rotated and translated to correct scanning imperfections. As this happens at postscript level this doesn't involve raster operations, which means no loss of quality no matter how many rotations you need to get it perfect. The resulting postscript can of course be directly viewed or printed. Even better, it can be opened again to add, remove, reorder or modify images.
So much for the introduction, I'll go into detail now.
USAGE
With epsjoin you can create a postscript document from a collection of eps images. Each image is put on a separate page, individually rotated and translated. The translating procedure is a bit unconventional, aimed to make it very easy to position the individual images in a consistent way throughout the document. Each image has its own coordinate system (the 'local' system) which can be translated and rotated. The document page also has a coordinate system (the 'global' system) that can be translated and scaled. The final placement of each eps image is determined by making the two coordinate systems coincide. Both are controlled from epsjoin's preview window, which can be either in local or in global mode. The mode is switched with the spacebar.
In local mode the image's coordinate system is represented by two semi-transparent perpendicular lines. The origin of the coordinate system is the intersection of these lines. It is up to you to choose a useful origin that can be found on all images. An example of such a point that may be found on each page is the page number. The origin is set by moving the two perpendicular lines with the mouse, as follows:
- button 1: move a single line
- button 2: move both lines
- button 3: rotate both lines about the current origin
Another origin that can often be used is the intersection of the vertical line along the left margin and the horizontal line through the page number. This is a more useful point than simply the page number because this way the angle is automatically set as well.
When the origins are set, use the spacebar to switch to global mode. A semi-transparent rectangle will replace the two lines to mark the page boundaries. In this mode the global image scale and the location of the chosen origin on the page are set, again using the mouse:
- button 1: move the page with respect to the origin
- button 2: move the origin (same as in local mode)
- button 3: change global scale (vertical movement)
The second mouse button lets you modify the local origin while in global mode, which can be useful when the origin point is not found on a certain image. In that case (for instance a title page) the origin can be set based on the page boundaries. Cycle through the images to confirm that they all have the correct placement. Now all you have to do is save the document and you're done.
Some notes about the other (initial) window. The menubar controls the postscript document (open / save / export). The four buttons at the bottom control the eps images (add / remove / reorder). If no image is selected, new images are appended at the end; otherwise they are inserted before the selected image. To deselect an image, just close the preview window.
A small two-page postscript demo created with epsjoin can be downloaded below. You'll see the 'scanning imperfections' are a bit exaggerated. The screenshots of epsjoin in action come from this same demo. They show the process described above, using the left margin / page number as origin.
IMPORTANT
Epsjoin uses gtk 2.4, supported by pygtk <http://pygtk.org> since version
2.3.91. This version is currently (August 2004) NOT IN DEBIAN, not even in
unstable. I use the package from the experimental
<http://packages.debian.org/experimental> distribution, which works fine for
epsjoin but there are likely some problems that I don't know of since it is
still not in unstable. I hope it will be soon. If you're willing to take the
'risk', here is the package you need: python2.3-gtk2
<http://packages.debian.org/experimental/python/python2.3-gtk2>. Otherwise,
patience... If you are not a debian user, the latest PyGTK and gnome-python can
be found here <http://pygtk.org>. The page sizes may be a problem because I
believe libpaper-utils
<http://packages.debian.org/unstable/utils/libpaper-utils> (which contains
paperconf) is a native debian package. Maybe I'm wrong. Anyway I don't think
it's hard to compile paperconf from source.
FILE FORMAT
The following is 'optional reading', for the interested.
The created postscript document is very simple. In the document prolog a single [TRANSFORM] function is defined with the three parameters that define the global coordinate system: scale [s], horizontal translation [u] and vertical translation [v]. These parameters are read from this function declaration when opening the file. If it is not found in the document's prolog section (ended with [%%EndProlog]) you'll get a warning and the global parameters are not changed.
-- /TRANSFORM { <s> <u> <v> translate dup scale neg rotate neg exch neg exch translate } bind def
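To illustrate how the three global parameters could be read back from such a prolog, here is a minimal Python sketch. It is not epsjoin's actual code. The name read_global_params is invented for the example, and I assume the whole definition sits on a single line, as in the line above.

```python
import re
import warnings

# A PostScript-style number: optional sign, digits, optional fraction/exponent.
NUMBER = r"([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
# Captures s, u and v from '/TRANSFORM { s u v translate ...'.
TRANSFORM_DEF = re.compile(
    r"/TRANSFORM\s*\{\s*" + NUMBER + r"\s+" + NUMBER + r"\s+" + NUMBER
    + r"\s+translate\b"
)


def read_global_params(lines, current=(1.0, 0.0, 0.0)):
    """Scan the prolog for the TRANSFORM definition and return (s, u, v).

    The search stops at %%EndProlog. If no definition is found, a warning is
    issued and the current parameters are returned unchanged.
    """
    for line in lines:
        if line.startswith("%%EndProlog"):
            break
        match = TRANSFORM_DEF.search(line)
        if match:
            s, u, v = (float(group) for group in match.groups())
            return s, u, v
    warnings.warn("no TRANSFORM definition found in the prolog")
    return current
```

The default value of current (scale 1, no translation) is my own choice for the sketch; the text above only says that the parameters are left as they were.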
The function modifies the postscript transformation matrix according to the three parameters that define the image's local coordinate system: horizontal translation [x], vertical translation [y] and angle [a]. Every included eps image is preceded by a call to this function, which itself is preceded by [gsave] to save the current matrix.
-- <x> <y> <a> TRANSFORM
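To see what this call does to a point of the eps image, the matrix operations can be written out by hand. The Python sketch below is my reading of the TRANSFORM definition, using the usual PostScript rule that the operator written last acts on the point first. The function name place_point is hypothetical.

```python
import math


def place_point(px, py, x, y, a, s, u, v):
    """Map a point (px, py) of the eps image to page coordinates.

    Mirrors 'u v translate, s s scale, -a rotate, -x -y translate', applied
    to the point in reverse order.
    """
    # Step 1: move the local origin (x, y) to (0, 0).
    dx, dy = px - x, py - y
    # Step 2: rotate by -a degrees (PostScript angles run counterclockwise).
    t = math.radians(-a)
    rx = dx * math.cos(t) - dy * math.sin(t)
    ry = dx * math.sin(t) + dy * math.cos(t)
    # Step 3: apply the global scale, then move the origin to (u, v).
    return s * rx + u, s * ry + v
```

Whatever the angle and scale, the local origin (x, y) always ends up at (u, v) on the page, which is exactly the 'making the two coordinate systems coincide' described under USAGE.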
When the eps is drawn the matrix is restored with [grestore] and the page is shown with [showpage]. The eps code is enclosed by [%%BeginDocument] and [%%EndDocument]. This is the standard way of including (eps) documents, which means that many postscript documents created by other programs (such as dvips) can be opened in epsjoin as well. Of course they will not contain the [TRANSFORM] call so you will have to work your way through some warning messages, but it works. In case of a missing [TRANSFORM] line the three individual parameters are left at the default.
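Put together, the body of a single page could be generated roughly as follows. This is a Python sketch of the order described here, not the exact output of epsjoin: a real file may contain additional comment lines, and the number formatting and the name page_lines are my own assumptions.

```python
def page_lines(eps_name, eps_text, x, y, a):
    """Return the PostScript lines for one page holding one eps image."""
    return [
        "gsave",                            # save the current matrix
        f"{x:g} {y:g} {a:g} TRANSFORM",     # local parameters of this image
        f"%%BeginDocument: {eps_name}",
        *eps_text.splitlines(),             # the eps code itself, unchanged
        "%%EndDocument",
        "grestore",                         # restore the matrix
        "showpage",
    ]
```

A document is then the prolog with the [TRANSFORM] definition followed by one such group of lines per image.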
JPEG2EPS
I used to state here that there was a problem with eps files created by jpeg2eps <http://wwwrses.anu.edu.au/~andy/jpeg2eps>. THIS WAS NEVER THE CASE. I had jpeg2eps mixed up with another program, jpeg2ps, which was in debian at that time but doesn't seem to be anymore. Moreover, there seem to be several different programs named 'jpeg2ps' so it wouldn't be true to say there is a problem with that either. Just keep in mind that not all eps files work the way they should. I'm pretty sure the problem lies with the eps code in such cases, just because the described postscript format is too simple to cause trouble. Still, I don't know postscript well enough to be certain.
I have tried jpeg2eps and it works just fine. In the latest epsjoin version the preview window supports color and greyscale images to better handle these eps files. This makes jpeg2eps a valid alternative to bitmap tracers such as potrace and autotrace. Referring to the introduction, this approach skips step two ('have it just as clean and sharp as the original'). In return you'll usually get a smaller document and better support for images (as opposed to text). What remains of course is an easy way to rotate and scale the separate images and join them.
