Hany Char Convertor
HCC is a character filter that replaces certain characters with other characters, one by one. It is useful for converting plain text from one character set to another.
This software is released without any warranty. See also COPYING for more information.
Copyright 1999,.. by Peter Hanecak <hanecak@megaloman.sk>.
Hany Char Convertor 0.5.7
Description
HCC is a character filter that replaces certain characters with other characters, one by one. It is useful for converting plain text from one character set to another.
Currently available filters:
win1250 -> ISO-8859-2
win1250 -> ASCII (ISO-8859-1)
ISO-8859-2 -> win1250
ISO-8859-2 -> ASCII (ISO-8859-1)
Copying
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
See the COPYING file or GNU General Public License page <http://www.gnu.org/copyleft/gpl.html> for license information.
Copyright 1998,.. by Peter Hanecak <hanecak@megaloman.sk>. All rights reserved.
Download
You can find sources at:
http://www.megaloman.com/~hany/_data/hcc/
http://terminus.sk/~hany/_data/hcc/
Also you can download RPM packages from:
http://www.megaloman.com/~hany/RPM/hcc.html
To verify files, use my public key:
<http://www.megaloman.com/~hany/gnupg-hany-public-key.txt>.
Requirements
hd2u: required to perform end-of-line conversion from UNIX to
MS-DOS format and vice versa.
For more information about hd2u see
<http://www.megaloman.com/~hany/software/hd2u/>.
Sources can be downloaded from
<http://www.megaloman.com/~hany/_data/hd2u/>.
RPM package of hd2u can be found for example at
<http://www.megaloman.com/~hany/RPM/hd2u.html>.
Installation
First, you need a UNIX system (other systems may work too) with basic development utilities already installed (make, sed, install, ...).
After you have successfully downloaded and unpacked the source tarball, do the following in the source directory:
$ make
$ make install
By default, the binary is installed in /usr/local/bin and the filters in /usr/local/lib/hcc. If you want to use different directories, run the following (this example installs the binary into /usr/bin and the filters into /usr/lib/hcc):
$ make prefix=/usr install
Usage
hcc
For now, hcc is just a filter that reads data from standard input and writes the results to standard output.
You can run it with the following parameters:
hcc <conversion table> [<conversion table file name>]
<conversion table> name of the conversion table to use
<conversion table file name> file that stores the conversion
table (default is
/usr/local/lib/hcc/hcc.ct)
- Example
cat source.txt | hcc win1250-iso88592 > target.txt
This performs the win1250 -> iso88592 conversion of the text in
source.txt, using the default conversion table file, and stores the
resulting text in target.txt.
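The same call can be written without 'cat', and the optional second parameter can point hcc at a conversion table file outside the default location. The following is a minimal sketch only: the helper name hcc_convert and the HCC variable are hypothetical conveniences, not part of hcc.

```shell
# Hypothetical wrapper around the documented call:
#   hcc <conversion table> [<conversion table file name>]
# HCC names the binary to run; override it if hcc is not on PATH.
HCC="${HCC:-hcc}"

# hcc_convert <table> <source> <target> [<table file>]
hcc_convert() {
    if [ "$#" -lt 3 ]; then
        echo "usage: hcc_convert <table> <source> <target> [<table file>]" >&2
        return 2
    fi
    if [ "$#" -ge 4 ]; then
        # Explicit conversion table file given as the fourth argument
        "$HCC" "$1" "$4" < "$2" > "$3"
    else
        # Fall back to hcc's default conversion table file
        "$HCC" "$1" < "$2" > "$3"
    fi
}
```

For example, 'hcc_convert win1250-iso88592 source.txt target.txt' matches the example above. Source and target must be different files, because the shell truncates the target before hcc starts reading.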
u2w
This utility converts text in ISO-8859-2 encoding to text in win1250 (using hcc with the iso88592-win1250 conversion table). It also converts EOL characters using 'dos2unix' (from the hd2u package).
Usage of u2w:
u2w [-b] [-n] <text file> [<text file> [...]]
-b create backup copies (.bak files)
-n do not convert end-of-line characters
- Example
u2w -n text-file.txt
This performs the iso88592 -> win1250 conversion of the text in
text-file.txt. EOL character conversion is skipped.
Note: If text-file.txt contains proper UNIX-style EOLs, the resulting
text may be hard to view or edit under Windows, so use '-n' only if
you know what you are doing.
Note: Running this utility on the same file more than once may corrupt its content, so that it is no longer in the original character set or in the expected one.
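One way to lower the risk of a second run is to check a file before converting it. The sketch below is a heuristic only and not part of hcc: it relies on the fact that ISO-8859-2 reserves bytes 0x80-0x9F for control codes while win1250 places printable characters there. The names has_win1250_bytes, u2w_once and the U2W variable are hypothetical.

```shell
# Succeeds if the file contains at least one byte in 0x80-0x9F
# (octal 200-237), which normal ISO-8859-2 text does not use.
has_win1250_bytes() {
    count=$(( $(LC_ALL=C tr -cd '\200-\237' < "$1" | wc -c) ))
    [ "$count" -gt 0 ]
}

# U2W names the converter to run; override it if u2w is not on PATH.
U2W="${U2W:-u2w}"

# Run u2w only on files that do not already look like win1250.
u2w_once() {
    for f in "$@"; do
        if has_win1250_bytes "$f"; then
            echo "skipping $f: contains win1250-only bytes" >&2
        else
            "$U2W" "$f"
        fi
    done
}
```

The check cannot detect an already converted file whose only national characters lie outside that byte range, so it reduces the risk but does not remove it.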
w2u
This utility converts text in win1250 encoding to text in ISO-8859-2 (using hcc with the win1250-iso88592 conversion table). It also converts EOL characters using 'dos2unix' (from the hd2u package).
Usage of w2u:
w2u [-b] [-n] <text file> [<text file> [...]]
-b create backup copies (.bak files)
-n do not convert end-of-line characters
- Example
w2u *.txt *.htm
This will perform win1250 -> iso88592 conversion of texts in
all .txt and .htm files in current directory.
Note: Running this utility on the same file more than once may corrupt its content, so that it is no longer in the original character set or in the expected one.
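The reason a second run is harmful can be shown with a small stand-in for the conversion. In the sketch below, 'tr' plays the role of hcc and the mapping covers only eight Slovak letters; it is an illustration of byte-for-byte replacement and not hcc's actual conversion table. The function names are hypothetical.

```shell
# win1250 byte -> ISO-8859-2 byte, written in octal:
#   Š 212->251   Ť 215->253   Ž 216->256   Ľ 274->245
#   š 232->271   ť 235->273   ž 236->276   ľ 276->265
win1250_to_iso88592_sketch() {
    LC_ALL=C tr '\212\215\216\232\235\236\274\276' \
                '\251\253\256\271\273\276\245\265'
}

# Print the bytes read from standard input as hex digits.
hex() { od -An -tx1 | tr -d ' \n'; }

# ž in win1250 is byte 0x9e (octal 236).
once=$(printf '\236' | win1250_to_iso88592_sketch | hex)
twice=$(printf '\236' | win1250_to_iso88592_sketch \
                      | win1250_to_iso88592_sketch | hex)
echo "once:  $once"     # be, which is ž in ISO-8859-2
echo "twice: $twice"    # b5, which is ľ in ISO-8859-2
```

After the first pass the byte 0xbe is a correct ISO-8859-2 ž, but the second pass treats it as a win1250 ľ and converts it again, so the letter silently changes.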
w2a
This utility converts text in win1250 encoding to text in ISO-8859-1 (using hcc with the win1250-iso88591 conversion table). It also converts EOL characters using 'dos2unix' (from the hd2u package).
During conversion, national characters are replaced by their "nearest" ISO-8859-1 equivalents, so converting back to the original text is not possible.
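A short sketch of why such a conversion cannot be reversed: once several source characters map to the same target character, the output no longer records which one was there. The mapping below is an assumed "nearest equivalent" choice for three letters, with 'tr' standing in for hcc; it is not hcc's actual win1250-iso88591 table.

```shell
# Assumed mapping of win1250 bytes to plain letters (octal):
#   š 232 -> s    ž 236 -> z    č 350 -> c
to_nearest_sketch() {
    LC_ALL=C tr '\232\236\350' 'szc'
}

# "šs" and "ss" (as win1250 bytes) give the same result.
a=$(printf '\232s' | to_nearest_sketch)
b=$(printf 'ss' | to_nearest_sketch)
echo "$a $b"    # prints "ss ss"
```

Since both inputs end up as "ss", no reverse table can tell them apart, which is why keeping backups with '-b' is worth considering.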
Usage of w2a:
w2a [-b] [-n] <text file> [<text file> [...]]
-b create backup copies (.bak files)
-n do not convert end-of-line characters
- Example
w2a -b *.htm
This will perform win1250 -> iso88591 conversion of texts in
all .htm files in current directory making backup copies in
.htm.bak files in current directory.
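If a conversion turns out to be unwanted, the backups can be moved back into place. The helper below is a hypothetical sketch, assuming backups are named as described above (file.htm is backed up as file.htm.bak); it is not shipped with hcc.

```shell
# restore_backups [<directory>]
# Move every <name>.bak in the directory back over <name>.
restore_backups() {
    dir=${1:-.}
    for bak in "$dir"/*.bak; do
        # With no .bak files the pattern stays literal, so skip it.
        [ -e "$bak" ] || continue
        mv -- "$bak" "${bak%.bak}"
    done
}
```

Running 'restore_backups' without arguments works on the current directory. Note that it overwrites the converted files with the backup copies.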
u2a
This utility converts text in ISO-8859-2 encoding to text in ISO-8859-1 (using hcc with the ISO-8859-2 -> ISO-8859-1 conversion table).
During conversion, national characters are replaced by their "nearest" ISO-8859-1 equivalents, so converting back to the original text is not possible.
Usage of u2a:
u2a [-b] <text file> [<text file> [...]]
-b create backup copies (.bak files)
- Example
u2a example/*.txt
This performs the ISO-8859-2 -> ISO-8859-1 conversion of the texts in
all .txt files in the 'example' subdirectory of the current directory.
How to contribute
If you would like to submit a patch, send it to me <hanecak@megaloman.sk>. Please be sure to include a textual explanation of what your patch does.
The preferred format for changes is 'diff -u' output. You can generate it like this:
$ diff -urN hcc-orig hcc-work > mydiffs.patch
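When scripting this step, keep in mind that 'diff' exits with status 1 whenever the two trees differ, which is the normal case when a patch was produced. The helper below is a hypothetical sketch around the command above; the name make_patch is not part of hcc.

```shell
# make_patch <original tree> <modified tree> <patch file>
# diff exit status: 0 = identical, 1 = differences found, 2 = trouble.
make_patch() {
    status=0
    diff -urN "$1" "$2" > "$3" || status=$?
    case $status in
        0) echo "no differences found, patch is empty" >&2; return 1 ;;
        1) return 0 ;;
        *) echo "diff failed with status $status" >&2; return "$status" ;;
    esac
}
```

For example, 'make_patch hcc-orig hcc-work mydiffs.patch' writes the same file as the command above and succeeds only if the patch is non-empty.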
TODO
- finish the existing conversion tables (win1250-iso88592 and win1250-iso88591), because they contain only characters used in Slovakia
- add conversion tables for other encodings (more is better :)
- catch some win1250 "features" (for now, I just know they exist, like the lower double quote)
- change parameters handling to be more like in other UNIX utilities
Authors
Peter Hanecak <hanecak@megaloman.sk>
Contributors
Sano Kurthy <kurthy@sopsr.sk>
Thank you.
Maintainer
Current maintainer is Peter Hanecak <hanecak@megaloman.sk>.
