html_parse v0.4.0 Tom Canich tcanich@betterbrain.com 02.18.2001
MANIFEST -- these files should have been included in the archive
html_parse-x.x.x/CHANGES
html_parse-x.x.x/html_parse
html_parse-x.x.x/GPL
html_parse-x.x.x/README
THIS PROGRAM IS FREE SOFTWARE RELEASED UNDER THE GNU GPL, v2.0, OR AT YOUR OPTION ANY LATER VERSION. NO WARRANTY IS EXPRESSED OR IMPLIED. USE AT YOUR OWN RISK.
IF NO COPY OF THE GNU GPL WAS PROVIDED, ONE CAN BE ACQUIRED AT http://www.gnu.org
html_parse is a tool for stripping HTML tags from a document. It can optionally load the resulting plain text into a MySQL database.
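
As an illustration of what stripping tags means, here is a minimal Python sketch. It is not the code html_parse uses; the names TagStripper and strip_tags are hypothetical, and it relies only on Python's standard html.parser module. It keeps every text node and drops the tags around it.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores all tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags
        self.parts.append(data)


def strip_tags(html_text):
    """Return html_text with the markup removed."""
    stripper = TagStripper()
    stripper.feed(html_text)
    stripper.close()  # flush any text still buffered by the parser
    return "".join(stripper.parts)
```

Note that this sketch also keeps the contents of script and style elements, since those are text nodes too. A real tool would likely want to skip them.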
INSTALLATION
Unzip and untar the archive into a location of your choosing:
# cp html_parse.tar.gz /usr/local/bin
# cd /usr/local/bin
# tar -xzvf html_parse.tar.gz
For a bzip2-compressed file (newer versions of GNU tar use -j in place of -y):
# cp html_parse.tar.bz2 /usr/local/bin
# cd /usr/local/bin
# tar -xyvf html_parse.tar.bz2
Run the program:
# html_parse
Without any options, the usage screen will be displayed. You will need write access in the present working directory.
OPTIONS
html_parse -f <input> -o <output> [-g] [-t <table>]
-f specifies the file the program will read from. This option is required.
-o specifies the file the program will write to. This option is required.
-g specifies whether the program should try to automagically generate the
   database table. The default is no.
-t specifies the table the program should write to. If the -g flag is set,
   this option is required. If the -g flag is unset, an error will be issued
   and the program will continue normally.
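
The option rules above can be sketched in Python with the standard argparse module. This is an illustration only, not the program's own option handling. The function name parse_args and the dest names are hypothetical, and the treatment of -t without -g (print a warning and continue) is one reading of the description above.

```python
import argparse
import sys


def parse_args(argv):
    """Hypothetical parser mirroring: -f <input> -o <output> [-g] [-t <table>]."""
    parser = argparse.ArgumentParser(prog="html_parse")
    parser.add_argument("-f", dest="input", required=True,
                        help="file to read from")
    parser.add_argument("-o", dest="output", required=True,
                        help="file to write to")
    parser.add_argument("-g", dest="generate", action="store_true",
                        help="try to generate the database table (default: no)")
    parser.add_argument("-t", dest="table",
                        help="table to write to")
    args = parser.parse_args(argv)

    # -t is required whenever -g is set
    if args.generate and args.table is None:
        parser.error("-t <table> is required when -g is set")

    # Assumed reading: -t without -g is reported, then processing continues
    if args.table is not None and not args.generate:
        print("html_parse: -t given without -g", file=sys.stderr)

    return args
```

For example, parse_args(["-f", "in.html", "-o", "out.txt"]) succeeds, while adding "-g" without "-t" exits with a usage error.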
NOTES, QUIRKS, AND OTHER ODDITIES
Version 0.4.0 prefixes the present working directory to the name of the file that MySQL is to read from. This solves an earlier problem in which MySQL was not always able to read the file, claiming it did not exist.
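
The idea behind this fix can be sketched in Python as follows. This is an assumption about the approach, not the program's actual code, and the function name absolute_input_path is hypothetical. A database server generally resolves a relative file name against its own directories rather than the caller's, so handing it a full path removes the ambiguity.

```python
import os


def absolute_input_path(filename, cwd=None):
    """Prefix the working directory to filename (hypothetical helper).

    cwd defaults to the present working directory of the calling process.
    """
    base = os.getcwd() if cwd is None else cwd
    # os.path.join returns filename unchanged if it is already absolute
    return os.path.join(base, filename)
```

The resulting path is what would be passed to MySQL in place of the bare file name.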
BUGS
Send all bugs and comments to tcanich@betterbrain.com.
