UntzUntz LAN Scan
-----------------
Version 1.3d
Contained In This File:
- Purpose
- Why
- Requirements
- Installation
- Usage
- Problems & Questions
- Current Features and Information
- Purpose
This program uses 'smbclient' to create a listing of all shared files on the local area network. The files can be shared from a Microsoft Windows machine or from any other Samba-enabled machine.
- Why
I wrote this because I set up a network where most of the people (about 35) shared a lot of files with each other. Finding a song or file that sits on maybe one machine, somewhere among 100,000 files, is a pain. So now you can search!
- Requirements
The main requirement is the 'smbclient' program. If you do not have Samba installed, go to http://www.samba.org to download it.
You'll also need a running web server to use the CGI program.
Perhaps in the future another medium for searching could be developed.
- Installation
- Compiling
g++ -O untzcgi-X.X.cpp -o untzcgi
g++ -O untzlanscan-X.X.cpp -o untzlanscan
Installation:
Copy the 'untzcgi' program into your web server's CGI directory
cp untzcgi /usr/local/apache/cgi-bin/
Copy the 'untzlanscan' program to /usr/local/bin
cp untzlanscan /usr/local/bin
Copy the 'untzls.conf' file to the /etc/ directory
cp untzls.conf /etc/
You probably want to add this to your crontab so it will scan
the network every once in a while. Here is what I use:
59 * * * * /usr/local/bin/untzlanscan
This runs the scan every hour, at 59 minutes past the hour.
Configuration:
Edit the file, /etc/untzls.conf to reflect your configuration.
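This README only shows the 'option =' form (in the Templates section), so here is a minimal sketch of reading such a file, assuming one "option = value" pair per line, option names that may contain spaces and are matched case-insensitively, and '#' comment lines. The helper names (trim, parseConfig) are hypothetical and are not taken from the UntzUntz source.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Sketch only: assumes "option = value" lines, '#' comments and
// case-insensitive option names. Not the real untzlanscan parser.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

std::map<std::string, std::string> parseConfig(std::istream& in) {
    std::map<std::string, std::string> options;
    std::string line;
    while (std::getline(in, line)) {
        const std::string t = trim(line);
        if (t.empty() || t[0] == '#') continue;       // blank line or comment
        const std::string::size_type eq = t.find('=');
        if (eq == std::string::npos) continue;        // not "option = value"
        std::string key = trim(t.substr(0, eq));
        std::transform(key.begin(), key.end(), key.begin(), [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        });
        options[key] = trim(t.substr(eq + 1));        // value may itself contain '='
    }
    return options;
}
```

Passing a std::ifstream opened on /etc/untzls.conf gives a map keyed by names such as "ip address" or "results per page"; a key that is absent from the map (check with find()) means the option was not set, so its default applies.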
Here's the skinny on the configuration options:
ip address
This option is the IP address of the master browser
on the network. I'm pretty sure any computer will
work, but the master browser has a better chance of
having all the computers listed.
temp
This is probably best set as /tmp
Basically it is a place where the program creates some
temporary files and holds the database.
smbclient
This is the path to 'smbclient' program.
If you do not have this program you will need to
download the Samba package from http://www.samba.org
logo jpeg
This is the URL of a small image shown in the upper left-hand
corner of the search screen. Keep in mind there is
no size constraint in the IMG tag, so it might look kinda
weird if you put a 640x480 size jpeg in there.
username
This is the name of the user smbclient will use to
connect to systems. If you have lax security the
user 'guest' is probably your best bet.
password
OPTIONAL. If you are connecting to systems that require
a user and a password, set the password option.
One thing to keep in mind is that this password is held in
clear text. You may want to consider creating a user
that only has READ and LIST permissions on the systems that
this program will be connecting to.
Default: None
workgroup
OPTIONAL. If given, this value is passed to smbclient
as the workgroup option.
Default: None
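To make the username, password and workgroup options more concrete, here is a hypothetical sketch of how they could map onto smbclient's own command-line flags: -L to list a host's shares, -U with the user%password form, -N to skip the password prompt, and -W for the workgroup. buildShareListArgs is an illustrative name, not a function from untzlanscan, which may build its command differently.

```cpp
#include <string>
#include <vector>

// Sketch only: one plausible smbclient share-listing command built from
// the 'smbclient', 'username', 'password' and 'workgroup' options.
std::vector<std::string> buildShareListArgs(const std::string& smbclientPath,
                                            const std::string& host,
                                            const std::string& username,
                                            const std::string& password,
                                            const std::string& workgroup) {
    std::vector<std::string> args{smbclientPath, "-L", host};
    args.push_back("-U");
    if (password.empty()) {
        args.push_back(username);
        args.push_back("-N");                       // no password: don't prompt for one
    } else {
        args.push_back(username + "%" + password);  // smbclient's user%password form
    }
    if (!workgroup.empty()) {                       // workgroup is optional
        args.push_back("-W");
        args.push_back(workgroup);
    }
    return args;
}
```

Keeping the arguments as a vector (for an exec-style call) avoids shell-quoting trouble with odd share or user names. Note that a password given on the command line is visible in the process list, which fits the clear-text warning above.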
results per page
OPTIONAL. This is the configuration option to specify the
number of results to print out per page. If it is not
given in the configuration file and not passed to the
CGI script, it will default to 15.
Default: 15
allow user robots
OPTIONAL. A value of yes or no (or YES/NO)
This will specify whether you want to give the user the
ability to block certain directories.
Default: Yes, the user can specify his/her own robots.txt
server robots
OPTIONAL. This points to a file which specifies a list of
machines, shares, and directories to block. See the
'server robots' entry under 'Usage' for the file format.
Default: None
page header
OPTIONAL. This points to a file that defines a header for
each search results page. For more information on the
variables that can be contained within this file, please
see the 'Templates' section below under 'Usage'.
page footer
OPTIONAL. This points to a file that defines a footer for
each search results page. For more information on the
variables, see below.
search template
OPTIONAL. This points to a file that defines a structure
for each result. See below for more details.
force file
OPTIONAL. This points to a file that tells the crawler to
scan the shares listed in it. This can be used if for some
reason smbclient cannot see a share or if the share
name is being cut off by smbclient. The format of the
file is simply one //COMPUTER/SHARE per line.
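Since the force file is just one //COMPUTER/SHARE per line, reading it is short work. The sketch below is an assumption about how such a file could be handled, not the crawler's actual code: it keeps everything after the second slash as the share name (so names with spaces survive), skips blank or malformed lines, and tolerates DOS line endings. parseForceFile is a hypothetical name.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Sketch only: returns (computer, share) pairs from a force file.
std::vector<std::pair<std::string, std::string>> parseForceFile(std::istream& in) {
    std::vector<std::pair<std::string, std::string>> shares;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // DOS line ending
        if (line.size() < 5 || line.compare(0, 2, "//") != 0) continue;
        const std::string::size_type slash = line.find('/', 2);
        // Skip lines with an empty computer name or an empty share name.
        if (slash == std::string::npos || slash == 2 || slash + 1 == line.size()) continue;
        shares.emplace_back(line.substr(2, slash - 2), line.substr(slash + 1));
    }
    return shares;
}
```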
- Usage
Search Form
Creating a main search web page is easy! Simply use the following code in your webpage:
<form action="/cgi-bin/untzcgi">
<input type=text value="" name=QUERY size=65 maxlength=256>
<input type=submit value="Search">
</form>
As you can see, you only need to pass the QUERY variable to begin a
search.
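A form without a method attribute is submitted with GET, so under the usual CGI convention the program receives the fields URL-encoded in the QUERY_STRING environment variable, for example QUERY=beck+loser. Here is a minimal decoding sketch; getParam and urlDecode are hypothetical names, not untzcgi's own functions.

```cpp
#include <cctype>
#include <string>

// Decode form-encoded text: '+' becomes a space, %XX becomes one byte.
static std::string urlDecode(const std::string& s) {
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == '+') {
            out += ' ';
        } else if (s[i] == '%' && i + 2 < s.size() &&
                   std::isxdigit(static_cast<unsigned char>(s[i + 1])) &&
                   std::isxdigit(static_cast<unsigned char>(s[i + 2]))) {
            out += static_cast<char>(std::stoi(s.substr(i + 1, 2), nullptr, 16));
            i += 2;
        } else {
            out += s[i];
        }
    }
    return out;
}

// Sketch only: decoded value of 'name' from "A=1&B=2", or "" if absent.
std::string getParam(const std::string& query, const std::string& name) {
    std::string::size_type start = 0;
    while (start <= query.size()) {
        std::string::size_type end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        const std::string pair = query.substr(start, end - start);
        const std::string::size_type eq = pair.find('=');
        if (eq != std::string::npos && pair.compare(0, eq, name) == 0)
            return urlDecode(pair.substr(eq + 1));
        start = end + 1;
    }
    return "";
}
```

In a CGI program this would be fed from std::getenv("QUERY_STRING"), checking for a null pointer first.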
RESULTS (cgi option)
If this is specified and passed from the form, it tells the CGI how
many results to print out per page of results. It goes hand-in-hand
with the configuration option 'results per page'. If it is not given
and not specified in the configuration file, the default is 15.
If you want to embed this in your form, simply add
<input type=hidden name=RESULTS value=15>
to your HTML within the <FORM> tag.
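Reading the two descriptions of RESULTS together, the page size seems to come from the form first, then from 'results per page', then the built-in 15. The sketch below assumes exactly that order and also shows one way the $LOW_RESULTS and $HIGH_RESULTS template values could be derived for a given page. Both function names and the PageRange struct are hypothetical.

```cpp
#include <algorithm>
#include <initializer_list>
#include <stdexcept>
#include <string>

// Sketch only: form RESULTS wins, then 'results per page', then 15.
int resolveResultsPerPage(const std::string& formValue, const std::string& configValue) {
    for (const std::string& v : {formValue, configValue}) {
        try {
            const int n = std::stoi(v);
            if (n > 0) return n;
        } catch (const std::exception&) {
            // empty or non-numeric: try the next source
        }
    }
    return 15;
}

// 1-based, inclusive result numbers shown on one page.
struct PageRange {
    int low;   // $LOW_RESULTS
    int high;  // $HIGH_RESULTS
};

// 'page' is 1-based; the last page is cut off at totalResults.
PageRange pageRange(int page, int perPage, int totalResults) {
    const int low = (page - 1) * perPage + 1;
    const int high = std::min(page * perPage, totalResults);
    return {low, high};
}
```

With 40 results and 15 per page, page 3 covers results 31 through 40.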
server robots
If this directive is specified in the configuration file, the crawler
will look through the file specified for machines, shares, and directories
to not search. You ask, why would you want to do such a thing? Well, if
you are, for example, sharing a directory tree which has all the install
files for a program, you might want to block searching of the actual install
tree because there are a lot of ambiguous files that would be of no use to
a casual search. Blocking will also keep the size of the database small and
therefore allow for faster lookups of worthwhile data.
Each line is checked in order until a match is made; the first match
wins. At the end there is an implicit 'allow all'.
[action] [type] [data]
action
This is what action to take, the options are:
allow
deny
type
This is what the filter applies to. The options are:
machine
share
directory
all
* When using 'all', the rule will block/allow on machine, share, and directory
data
This is the actual thing to deny/allow. It can use * as a
wildcard. Keep in mind that the check is made against the full
path. So, for example, if you wanted to hide a
directory called 'Monkey' and it was in a directory called 'Data'
you would want to use the following:
deny directory \Data\Monkey*
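Since '*' is the only wildcard mentioned, a pattern like \Data\Monkey* can be checked with a small glob matcher. This is a generic greedy matcher with backtracking, not the crawler's own code. Whether the real crawler compares case-insensitively is not documented here; the case-insensitive comparison below is an assumption, chosen because SMB names are normally case-insensitive.

```cpp
#include <cctype>
#include <string>

// Sketch only: '*' matches any run of characters (including none);
// every other character must match, ignoring case (an assumption).
bool wildcardMatch(const std::string& pattern, const std::string& text) {
    std::string::size_type p = 0, t = 0;
    std::string::size_type star = std::string::npos, mark = 0;
    auto same = [](char a, char b) {
        return std::tolower(static_cast<unsigned char>(a)) ==
               std::tolower(static_cast<unsigned char>(b));
    };
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;      // remember the star, first try matching nothing
            mark = t;
        } else if (p < pattern.size() && same(pattern[p], text[t])) {
            ++p;
            ++t;
        } else if (star != std::string::npos) {
            p = star + 1;    // backtrack: let the last star swallow one more char
            t = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars match nothing
    return p == pattern.size();
}
```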
Examples
The following example is to deny certain areas:
deny machine passout
deny share top*
deny share *secret
deny directory help
The first line, 'deny machine passout' will tell the crawler to
not even look at the machine named 'passout'.
The second line, 'deny share top*' will tell the crawler if
it comes upon a share that begins with 'top', to not crawl it.
The third line, 'deny share *secret' will tell the crawler if
it comes upon a share that ends with 'secret' to not crawl it.
The fourth line, 'deny directory help' tells the crawler if
it sees a directory with the word 'help' in it, to not crawl it.
This will also block all the subdirectories of that directory.
You can also use the reverse logic to tell the program to crawl only
certain machines, shares, or directories, for example:
allow machine passout
allow machine monkey
deny all
This tells the crawler to search only the machines named 'passout'
and 'monkey'.
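The first-match-wins evaluation with an implicit 'allow all' can be sketched as follows. For brevity this uses the POSIX fnmatch() function with FNM_NOESCAPE (so backslashes in paths stay literal) instead of a hand-written matcher; unlike the '*'-only syntax described here, fnmatch also treats '?' and '[...]' as special, and it is case-sensitive. The sketch matches the data pattern as written, so it does not reproduce the "any directory with help in it" behaviour described for 'deny directory help'. Treating a rule with no data (the bare 'deny all') as matching everything is an assumption, as are the Rule and isAllowed names.

```cpp
#include <fnmatch.h>
#include <string>
#include <vector>

// One line of the server robots file: [action] [type] [data]
struct Rule {
    bool allow;           // true for "allow", false for "deny"
    std::string type;     // "machine", "share", "directory" or "all"
    std::string pattern;  // data; empty is treated as "*" (assumption)
};

static bool matches(const std::string& pattern, const std::string& value) {
    // FNM_NOESCAPE keeps '\' literal so patterns like \Data\Monkey* work.
    return fnmatch(pattern.c_str(), value.c_str(), FNM_NOESCAPE) == 0;
}

// Sketch only: may the crawler visit this machine/share/directory?
// Pass an empty share or directory when checking at a higher level.
bool isAllowed(const std::vector<Rule>& rules, const std::string& machine,
               const std::string& share, const std::string& directory) {
    for (const Rule& r : rules) {
        const std::string pat = r.pattern.empty() ? "*" : r.pattern;
        const bool any = (r.type == "all");
        const bool hit =
            ((any || r.type == "machine") && matches(pat, machine)) ||
            ((any || r.type == "share") && !share.empty() && matches(pat, share)) ||
            ((any || r.type == "directory") && !directory.empty() && matches(pat, directory));
        if (hit) return r.allow;  // first matching line wins
    }
    return true;                  // implicit 'allow all'
}
```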
robots.txt (user side)
If you want the user to have the ability to block a share from being
searched simply have them place a file called 'robots.txt' in the
share's root directory. The crawler will see this and skip the share.
If they want to block just a directory, have them place the 'robots.txt'
file in that directory and it will skip that directory. It will not skip
the subdirectories at this time.
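The user-side rule boils down to a check on each directory listing. The sketch below assumes the crawler has the file names of one directory from smbclient and knows whether that directory is the share's root; matching "robots.txt" case-insensitively is an assumption (Windows shares usually ignore case). RobotsAction and robotsCheck are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

enum class RobotsAction { Crawl, SkipShare, SkipThisDirectory };

static bool iequals(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::tolower(static_cast<unsigned char>(x)) ==
                      std::tolower(static_cast<unsigned char>(y));
           });
}

// Sketch only: 'entries' are the file names in one directory.
RobotsAction robotsCheck(const std::vector<std::string>& entries, bool isShareRoot) {
    const bool hasRobots = std::any_of(entries.begin(), entries.end(),
        [](const std::string& name) { return iequals(name, "robots.txt"); });
    if (!hasRobots) return RobotsAction::Crawl;
    // In the share root the whole share is skipped; lower down only that
    // directory is skipped and its subdirectories are still crawled.
    return isShareRoot ? RobotsAction::SkipShare : RobotsAction::SkipThisDirectory;
}
```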
Templates
So you want to make your UntzUntz LAN Scan search fit into the rest of
your website? Well then templates are the answer. Included are three
example templates which are very similar to the default UntzUntz LAN
Scan search results. One thing to mention is that templates are
optional. If you do not define template files, UntzUntz LAN Scan will
use the default built into the code.
Let's get started. First, let's examine the configuration options:
page header =
This can be a filename (with path) or it can be the word
'none'. If it is set to none, then UntzUntz LAN Scan
will not print anything out for the header. Setting it to
a file that doesn't exist has the same effect: UntzUntz LAN
Scan will print nothing for the header.
page footer =
Same as page header above.
search template =
This can only be a filename. If it is defined and the
file cannot be found, UntzUntz LAN Scan will use the default
template.
OK, now that we've configured our /etc/untzls.conf file, let's make the
templates:
Each file has a different set of variables that can be used. Below is
a table of variables and the files they work in.
+----------+----------------+--------------------------------+
| File     | Variable       | Description                    |
+----------+----------------+--------------------------------+
| Header   | $SMALL_LOGO    | This is the small logo defined |
|          |                | in /etc/untzls.conf            |
|          +----------------+--------------------------------+
|          | $QUERY_DATA    | The last query entered         |
|          +----------------+--------------------------------+
|          | $SEARCH_RESULTS| Listing of each keyword and the|
|          |                | number of 'hits' for each      |
|          +----------------+--------------------------------+
|          | $FIND_SIZE     | This is the size (in MB/GB/TB) |
|          |                | of the files found             |
|          +----------------+--------------------------------+
|          | $TOTAL_SIZE    | This is the total size of all  |
|          |                | files on the network.          |
|          +----------------+--------------------------------+
|          | $LOW_RESULTS   | This is the lowest result on   |
|          |                | the page.                      |
|          +----------------+--------------------------------+
|          | $HIGH_RESULTS  | This is the highest result on  |
|          |                | the page.                      |
|          +----------------+--------------------------------+
|          | $TOTAL_RESULTS | Total number of results        |
|          +----------------+--------------------------------+
|          | $SEARCH_TIME   | Time in seconds it took to find|
|          +----------------+--------------------------------+
|          | $PAGING_BAR    | The list of result pages       |
+----------+----------------+--------------------------------+
| Footer   | $PAGING_BAR    | The list of result pages       |
|          +----------------+--------------------------------+
|          | $UNTZ_FOOTER   | A small ad for UntzUntz LS     |
+----------+----------------+--------------------------------+
| Search   | $NUMBER        | Result number                  |
| Template +----------------+--------------------------------+
|          | $FILE_PATH     | Computer\Share\Dir\file        |
|          +----------------+--------------------------------+
|          | $FILE_NAME     | Name of file found             |
|          +----------------+--------------------------------+
|          | $FILE_SIZE     | Size of file (in KB/MB/GB)     |
|          +----------------+--------------------------------+
|          | $FILE_DATE     | Date of file                   |
|          +----------------+--------------------------------+
|          | $FILE_TIME     | Time of file                   |
|          +----------------+--------------------------------+
|          | $FILE_LOCATION | Computer\Share\Dir             |
+----------+----------------+--------------------------------+
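The table says $FILE_SIZE is shown in KB/MB/GB and $FIND_SIZE in MB/GB/TB. One way to produce strings like that is sketched below; the 1024-based units, the single decimal place and the formatSize name are all assumptions rather than the program's actual output format.

```cpp
#include <cstdio>
#include <string>

// Sketch only: 'firstUnit' is the smallest unit shown, 0 = KB (as for
// $FILE_SIZE) or 1 = MB (as for $FIND_SIZE); values outside 0..3 are clamped.
std::string formatSize(double bytes, int firstUnit) {
    static const char* const units[] = {"KB", "MB", "GB", "TB"};
    const int lastUnit = 3;
    int u = firstUnit < 0 ? 0 : (firstUnit > lastUnit ? lastUnit : firstUnit);
    double value = bytes / 1024.0;                  // bytes -> KB
    for (int i = 0; i < u; ++i) value /= 1024.0;    // KB -> first unit
    while (value >= 1024.0 && u < lastUnit) {       // step up while the number is large
        value /= 1024.0;
        ++u;
    }
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.1f %s", value, units[u]);
    return buf;
}
```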
Notes:
1. $QUERY_DATA already has the double quotes ("") around it
Any HTML will work around these variables, but only one of each variable
is allowed per line. This means you can have:
$NUMBER. $FILE_NAME
but not:
$NUMBER, $NUMBER, $NUMBER
And why would you want to? But if you really did want to:
$NUMBER,
$NUMBER,
$NUMBER
That would give the same result.
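The one-variable-per-line rule suggests templates are filled line by line. Here is a hypothetical sketch of that idea (fillTemplate is not the real function name): each variable is substituted at most once per line, and substituted values are not scanned again, so a file actually named "$NUMBER.mp3" is shown as-is. What the real program does with a repeated variable on one line is not documented; leaving the second copy as literal text is an assumption.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Sketch only: fill $VARIABLES in a template, at most once each per line.
std::string fillTemplate(const std::string& tmpl,
                         const std::map<std::string, std::string>& vars) {
    std::istringstream in(tmpl);
    std::string line, out;
    bool firstLine = true;
    while (std::getline(in, line)) {
        std::set<std::string> used;   // variables already filled on this line
        std::string result;
        std::string::size_type i = 0;
        while (i < line.size()) {
            const std::pair<const std::string, std::string>* match = nullptr;
            if (line[i] == '$') {
                for (const auto& kv : vars) {  // longest variable name starting here
                    if (line.compare(i, kv.first.size(), kv.first) == 0 &&
                        (match == nullptr || kv.first.size() > match->first.size())) {
                        match = &kv;
                    }
                }
            }
            if (match != nullptr && used.insert(match->first).second) {
                result += match->second;       // value is copied, never rescanned
                i += match->first.size();
            } else {
                result += line[i++];           // ordinary text or a repeated variable
            }
        }
        if (!firstLine) out += '\n';
        out += result;
        firstLine = false;
    }
    return out;
}
```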
Look at the example template files for a better idea of how they work.
Problems or Questions?
One thing you can start with is changing the DEBUG_LEVEL variable:
change the 0 to a number between 1 and 9. 1 gives the least amount of
information, 9 the most.
Otherwise, email me at jed204@users.sourceforge.net
Current Features and Information:
Currently the program is configured through a configuration file. Within the configuration file is information about the master browser (or another computer on the network), a temporary directory, a logo file for the search page, and a network user name to search as (probably 'guest' for most networks).
The crawler currently scans the network for clients, creates a share listing, and from the share listing creates a file listing. There is support for 'robots.txt': if this file is in the root directory of a share, the share will be skipped. If it is in a lower-level directory of a share, that directory will be skipped.
Currently the crawler will also skip hidden and printer shares. Once the crawler has found all the files on the network, the CGI program can search against that listing. The CGI takes less than 1 second to search through the 30,000 files in 2,500 directories on my network. I created a database with 350,000 files in 30,000 directories and it took about 5 to 6 seconds. These benchmarks are from a Pentium III 450 MHz with 512 MB of RAM.
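The $SEARCH_RESULTS header variable lists each keyword with its number of hits. A plain linear scan over the file listing is one simple way to get such counts. The sketch below assumes whitespace-separated keywords and case-insensitive substring matching, neither of which this README specifies; keywordHits is a hypothetical name, and the sketch does not show how the real CGI combines keywords into the final result list.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <sstream>
#include <string>
#include <vector>

static std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Sketch only: for each (lower-cased) keyword, count the paths containing it.
std::map<std::string, int> keywordHits(const std::string& query,
                                       const std::vector<std::string>& paths) {
    std::map<std::string, int> hits;
    std::istringstream words(lower(query));
    std::string word;
    while (words >> word) hits[word] = 0;
    for (const std::string& path : paths) {
        const std::string p = lower(path);
        for (auto& kv : hits)
            if (p.find(kv.first) != std::string::npos) ++kv.second;
    }
    return hits;
}
```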
Also see the beginning of each source file for more information about development and releases.
If you have any problems feel free to contact me at jed204@users.sourceforge.net
Thanks,
John
jed204@users.sourceforge.net
