SourceFiles.org - Use the Source, Luke

<html><head><meta http-equiv=Content-type content="text/html; charset=utf-8"> <title>HTML file Character set converter</title> <style type="text/css"><!--
TABLE.toc {border:0px}
A:link,A:visited{text-decoration:none;color:#2A3B83} A:hover{text-decoration:underline;color:#002040} A:active{text-decoration:underline;color:#004060;background:#CCD8FF} TD.toc {font-size:80%; font-family:Tahoma; text-align:left}

H1       {font-size:250%; font-weight:bold} .level1 {text-align:center}
H2       {font-size:200%; font-weight:bold} .level2 {margin-left:1%}
H3       {font-size:160%; font-weight:bold} .level3 {margin-left:2%}
H4       {font-size:145%; font-weight:bold} .level4 {margin-left:3%}
H5       {font-size:130%; font-weight:bold} .level5 {margin-left:4%}
H6       {font-size:110%; font-weight:bold} .level6 {margin-left:5%}

BODY{background:white;color:black}
CODE{font-family:lucida console,courier new,courier;color:#105000} PRE.smallerpre{font-family:lucida console,courier new,courier;font-size:80%;color:#500010;margin-left:30px} SMALL {font-size:70%}
--></style></head>
<body>
<h1>HTML file Character set converter</h1> <h2 class=level2> 0. Contents </h2>

This is the documentation of htmlrecode-1.3.0. <div class=toc><table cellspacing=0 cellpadding=0 class=toc><tr><td width="50%" valign=middle align=left nowrap class=toc>&nbsp;&nbsp;&nbsp;1. <a href="#h0">Purpose</a><br>&nbsp;&nbsp;&nbsp;2. <a href="#h1">Usage</a><br>&nbsp;&nbsp;&nbsp;3. <a href="#h2">TODO</a><br>&nbsp;&nbsp;&nbsp;4. <a href="#h3">Installation</a><br>&nbsp;&nbsp;&nbsp;5. <a href="#h4">Example</a><br></td> <td width="50%" valign=middle align=left nowrap class=toc>&nbsp;&nbsp;&nbsp;6. <a href="#contact">Feedback</a><br>&nbsp;&nbsp;&nbsp;7. <a href="#h5">Requirements</a><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;7.1. <a href="#h6">Compilation problems</a><br>&nbsp;&nbsp;&nbsp;8. <a href="#h7">Changelog</a><br>&nbsp;&nbsp;&nbsp;9. <a href="#h8">Copying</a><br>&nbsp;&nbsp;&nbsp;10. <a href="#download">Downloading</a><br></td> </tr></table></div><H2 id=h0 class=level2><a name=h0></a>1. Purpose</H2><div class=level2 id=divh0>

Recodes an HTML file into a different character set without losing any characters. You can recode shift_jis to euc-jp, utf8 to latin1, iso-8859-15 to GB18030, iso-2022-jp to koi8-r, etc., and none of the characters on the page will become unreadable (unless you specify the -l switch, which disables generating &amp;#nnnn; escapes). <p>
Standards-compliant HTML is a good thing.
One of the goals in developing this program is that it never makes the HTML more broken than it already was; ideally it should make it better. So if you see the program doing the opposite, please <a href="#contact">tell me</a>.

</div><H2 id=h1 class=level2><a name=h1></a>2. Usage</H2><div class=level2 id=divh1>

<pre class=smallerpre>htmlrecode 1.2.0 - Copyright (C) 1992,2003 Bisqwit (http://iki.fi/bisqwit/)

Usage: htmlrecode [&lt;option> [&lt;...>]]

Reads stdin, writes stdout.

Options
 -I, --inset setname    Assumed input character set (default: iso-8859-1)
 -O, --outset setname   Wanted output character set (default: iso-8859-1)
 -V, --version          Displays version information.
 -e, --usehex           Use hexadecimal escapes.
 -g, --signature        Prefix the file with an unicode signature.
 -h, --help             This help.
 -l, --lossy            Disable lossless conversion.
 -q, --quiet            Be less verbose.
 -s, --strict           Turn off support for slightly broken HTML.
 -v, --verbose          Be less quiet.
 -x, --xmlmode          XML mode: all tag param values quoted.

Pipe in the html file and pipe the output to result file.</pre>

</div><H2 id=h2 class=level2><a name=h2></a>3. TODO</H2><div class=level2 id=divh2>

I plan to add an interface for modifying the text content of an HTML file soon.<br> This should make it easier to write filters like Pootpoot or Pikachifier. It is already supported in theory, but I haven't designed an interface for it yet.

</div><H2 id=h3 class=level2><a name=h3></a>4. Installation</H2><div class=level2 id=divh3>

<pre class=smallerpre
>$ make
$ su
# make install</pre>

If you do not want to install
<a href="http://oktober.stc.cx/source/libargh.html">libargh</a> (included in the archive), do not run "make install"; edit the Makefile and select STATIC linking rather than DYNAMIC.

</div><H2 id=h4 class=level2><a name=h4></a>5. Example</H2><div class=level2 id=divh4>

This page template is stored locally in iso-8859-1, but is automatically converted to utf-8 to produce the final version.<p> Here are some latin letters: åäöñé<br> Here are some CJK (chinese/japanese/korean ideograms): 日本<br> Here are some html escapes: >"äöê<br> <p>
Source code of the above:<pre class=smallerpre >Here are some latin letters: åäöñé&lt;br>
Here are some CJK (chinese/japanese/korean ideograms): &amp;#26085;&amp;#26412;&lt;br>
Here are some html escapes: &amp;gt;&amp;quot;&amp;auml;&amp;ouml;&amp;ecirc;&lt;br>
</pre>
What your browser receives is not &amp;#26085; etc., but the actual UTF-8 characters.

</div><H2 id=contact class=level2><a name=contact></a>6. Feedback</H2><div class=level2 id=divcontact>

If you have problems using this program or ideas on how to develop it, email me your questions or ideas.<br> Please do not omit the details.<br>
My email address (sigh) is: <em>bisqwit a<b style=font-weight:lighter>t i</b>ki <small>dot</small> fi</em>

</div><H2 id=h5 class=level2><a name=h5></a>7. Requirements</H2><div class=level2 id=divh5>

htmlrecode is written in C++ and uses the standard template library.<br> GNU make is required.<br>
I use g++ version 3.3, and htmlrecode compiles with it without warnings. For now.

</div><H3 id=h6 class=level3><a name=h6></a>7.1. Compilation problems</H3><div class=level3 id=divh6>

htmlrecode uses wide strings, a feature that different g++ versions support very inconsistently. <code>htmlrecode.hh</code> has some settings you can choose between. Try this:<p>
Replace<pre>
//#define wstring ucs4string
typedef wchar_t ucs4;
//typedef unsigned int ucs4;
//typedef basic_string&lt;ucs4> wstring;</pre> With<pre>
#define wstring ucs4string
//typedef wchar_t ucs4;
typedef unsigned int ucs4;
typedef basic_string&lt;ucs4> wstring;</pre> </p>
This might help when compiling with g++-2.95.

</div><H2 id=h7 class=level2><a name=h7></a>8. Changelog</H2><div class=level2 id=divh7>

<pre>
Since 1.2.0:

  • Abruptly terminated multibyte sequences no longer cause htmlrecode to enter an infinite loop

Since 1.1.5:

  • Tags are now recognized in all mixed case
  • Tag values can be in '', not only in ""
  • The characters -:_. are recognized as part of an unquoted tag value
  • Other non-space characters are also recognized as above :( (unless the -s option was used)
  • SCRIPT and STYLE contents are "raw" until the next &lt;/, unless -s was used
  • SCRIPT/STYLE contents are properly rehidden if necessary
  • " and ' quotes (and no quotes) are used wisely
  • Warnings from some bad HTML
  • Indentations inside tags are now kept mostly intact
  • XHTML support
  • Unicode signature character support
  • Major structural rewrites
  • New "configure" script
    • Big thanks to Winfried Szukalski for his thorough testing efforts and comments.

Since 1.1.4:

  • workaround for g++ versions, now compiles with g++-3

Since 1.1.3:

  • optimizations
  • error resistance

Since 1.1.2:

  • hex support
  • g++ string workarounds

Since 1.1.1:

  • improved documentation
  • fixed &lt; (was output as &amp;gt;, should be &amp;lt;)

</pre>

</div><H2 id=h8 class=level2><a name=h8></a>9. Copying</H2><div class=level2 id=divh8>

htmlrecode has been written by Joel Yliluoma, a.k.a. <a href="http://iki.fi/bisqwit/">Bisqwit</a>,<br> and is distributed under the terms of the <a href="http://www.gnu.org/licenses/licenses.html#GPL">General Public License</a> (GPL).

</div><H2 id=download class=level2><a name=download></a>10. Downloading</H2><div class=level2 id=divdownload>

The official home page of htmlrecode is at <a href="http://iki.fi/bisqwit/source/htmlrecode.html">http://iki.fi/bisqwit/source/htmlrecode.html</a>.<br> Check there for new versions.
</div> <p align=right><small>Generated from

<tt>progdesc.php</tt> (last updated: Sun, 19 Sep 2004 01:35:54 +0300)<br> with <tt>docmaker.php</tt> (last updated: Fri, 20 Aug 2004 12:10:17 +0300)<br> at Sun, 19 Sep 2004 01:36:03 +0300</small> </p>
</body>
</html>


Copyright 2006 Sourcefiles.org All rights reserved.