                                UntzUntz LAN Scan
                                -----------------
                                  Version 1.3d

Contained In This File:

  1. Purpose
  2. Why
  3. Requirements
  4. Installation
  5. Problems & Questions
  6. Current Features and Information
    Purpose

    This program uses 'smbclient' to create a listing of all shared files on the local area network. The files can be shared from a Microsoft Windows machine or from another Samba-enabled machine.

    Why
    ---

    I wrote this because I set up a network where most people (about 35) shared a lot of files with each other. Finding a song or file that lives on maybe one machine, somewhere among 100,000 files, is a pain. So now you can search!

    Requirements

    The main requirement is the 'smbclient' program. If you do not have Samba installed, go to http://www.samba.org to download it.

        You'll also need a running web server to use the CGI program.
        Perhaps in the future another medium for searching could be developed.
Installation

Compiling
                g++ -O untzcgi-X.X.cpp -o untzcgi
                g++ -O untzlanscan-X.X.cpp -o untzlanscan

        Installation:

                Copy the 'untzcgi' program into your web server's CGI directory
                        cp untzcgi /usr/local/apache/cgi-bin/                   

                Copy the 'untzlanscan' program to /usr/local/bin
                        cp untzlanscan /usr/local/bin

                Copy the 'untzls.conf' file to /etc/ directory
                        cp untzls.conf /etc/

                You probably want to add this to your crontab so it will scan
                the network every once in a while.  Here is what I use:

                59 * * * * /usr/local/bin/untzlanscan

                This will run the scan once every hour, at 59 minutes past the hour.

        Configuration:

                Edit the file /etc/untzls.conf to reflect your configuration.

                Here's the skinny on the configuration options:

                        ip address
                                This option is the IP address of the master browser
                                on the network.  I'm pretty sure any computer will
                                work, but the master browser has a better chance of
                                having all the computers listed.
        
                        temp
                                This is probably best set as /tmp
                                Basically it is a place where the program creates some
                                temporary files and holds the database.

                        smbclient
                                This is the path to the 'smbclient' program.
                                If you do not have this program you will need to
                                download the Samba package from http://www.samba.org

                        logo jpeg
                                This is the URL of a small image file shown in the upper
                                left-hand corner of the search screen.  Keep in mind there
                                is no size constraint in the IMG tag - so it might look
                                kinda weird if you put a 640x480 size jpeg in there.

                        username
                                This is the name of the user smbclient will use to
                                connect to systems.  If you have lax security the
                                user 'guest' is probably your best bet.

                        password
                                OPTIONAL. If you are connecting to systems that require
                                a user and a password, set the password option.
                                One thing to keep in mind is that this password is held in
                                clear text.  You may want to consider creating a user
                                that only has READ and LIST options on the systems that
                                this program will be connecting to.
                                Default: None
                                                        
                        workgroup
                                OPTIONAL. If given, this value is passed to smbclient
                                as its workgroup option.
                                Default: None

                        results per page
                                OPTIONAL. This is the configuration option to specify the
                                number of results to print out per page.  If it is not
                                given and not passed to the CGI script, it will default
                                to 15.
                                Default: 15

                        allow user robots
                                OPTIONAL. A value of yes or no (or YES/NO)
                                This will specify whether you want to give the user the
                                ability to block certain directories.
                                Default: Yes, the user can specify his/her own robots.txt

                        server robots
                                OPTIONAL. This points to a file which specifies a list of
                                machines, shares, and directories to block.  See the
                                'Usage' section for the format of the file.
                                Default: None

                        page header
                                OPTIONAL. This points to a file that defines a header for
                                each search results page.  For more information on the
                                variables that can be contained within this file, please
                                see the 'Templates' section below under 'Usage'.

                        page footer
                                OPTIONAL. This points to a file that defines a footer for
                                each search results page.  For more information on the
                                variables, see below.

                        search template
                                OPTIONAL. This points to a file that defines a structure
                                for each result.  See below for more details.

                        force file
                                OPTIONAL. This points to a file that tells the crawler to
                                scan the shares it lists.  This can be used if for some
                                reason smbclient cannot see a share or if the share
                                name is being cut off by smbclient.  The format of the
                                file is simply one    //COMPUTER/SHARE    per line.
Usage

Search Form


Creating a main search web page is easy! Simply use the following code in your webpage:

                        <form action="/cgi-bin/untzcgi">
                        <input type="text" value="" name="QUERY" size="65" maxlength="256">
                        <input type="submit" value="Search">
                        </form>

                As you can see, you only need to pass the QUERY variable to begin a
                search.

RESULTS (cgi option)

If this is passed from the form, it tells the CGI how many results to print out per page of results. It goes hand-in-hand with the configuration option 'results per page'. If it is not passed and not specified in the configuration file, the default is 15. If you want to embed this in your form, simply add

                        <input type=hidden name=RESULTS value=15>

to your HTML within the <FORM> tag.
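
The fallback order described above (RESULTS from the form first, then 'results per page' from the configuration file, then 15) can be sketched as follows. The function names and the upper sanity limit are assumptions made for this example, not taken from the untzcgi source.

```cpp
#include <cstdlib>
#include <string>

// Default stated in this README.
const int kDefaultResultsPerPage = 15;

// Convert text to a positive int. Returns 0 if the text is empty,
// not a number, or not positive.
// Assumption: values above 1,000,000 are rejected as a sanity limit.
int parse_positive(const std::string& text) {
    if (text.empty()) return 0;
    char* end = 0;
    const long value = std::strtol(text.c_str(), &end, 10);
    if (*end != '\0' || value <= 0 || value > 1000000) return 0;
    return static_cast<int>(value);
}

// cgi_value:    the RESULTS variable from the form ("" if not passed)
// config_value: the 'results per page' option ("" if not set)
int resolve_results_per_page(const std::string& cgi_value,
                             const std::string& config_value) {
    if (int n = parse_positive(cgi_value)) return n;     // form wins
    if (int n = parse_positive(config_value)) return n;  // then the config file
    return kDefaultResultsPerPage;                       // then the default
}
```

With this sketch, a form that passes RESULTS=25 gets 25 results per page even if the configuration file says 10, and a form that passes nothing falls back to the configuration file.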

server robots

If this directive is specified in the configuration file, the crawler will look through the file it names for machines, shares, and directories to not search. You ask, why would you want to do such a thing? Well, if you are, for example, sharing a directory tree which has all the install files for a program, you might want to block searching of the actual install tree, because it holds a lot of ambiguous files that would be of no use to a casual search. Blocking will also keep the size of the database small and therefore allow for faster lookups of worthwhile data.

Each line is checked in order until a match is made. At the end there is an implicit 'allow all'. Each line has the form:

                [action] [type] [data]

action
------

This is what action to take. The options are:

                allow
                deny

type
----

This is where to implement the filter. The options are:

                machine
                share
                directory
                all (*)

(*) When using all, it will block/allow on machine/share/directory.

data
----

This is the actual thing to deny/allow. It can use the * as a wildcard. Keep in mind that when it checks this, it checks against the full path information. So, for example, if you wanted to hide a directory called 'Monkey' and it was in a directory called 'Data', you would want to use the following:

                deny directory \Data\Monkey*

Examples

The following example denies certain areas:

                deny machine passout
                deny share top*
                deny share *secret
                deny directory help

The first line, 'deny machine passout', tells the crawler to not even look at the machine named 'passout'.
The second line, 'deny share top*', tells the crawler that if it comes upon a share that begins with 'top', it should not crawl it.
The third line, 'deny share *secret', tells the crawler that if it comes upon a share that ends with 'secret', it should not crawl it.
The fourth line, 'deny directory help', tells the crawler that if it sees a directory with the word 'help' in it, it should not crawl it. This will also block all the subdirectories of that directory.

You can also use the reverse logic to tell the program to crawl only certain machines, shares, or directories. For example:

                allow machine passout
                allow machine monkey
                deny all

tells the crawler to search only the machines named 'passout' and 'monkey'.
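
The first-match-wins checking described in this section, with its implicit 'allow all' at the end, might look roughly like the sketch below. It is not the crawler's actual code. Rule, parse_rule, and is_allowed are names invented here, matching is case-sensitive, a rule with no data (as in 'deny all') is treated as '*', and a rule is applied to an item only when its type equals the item's type or is 'all'. All of those are assumptions. Directory rules are also left out of the demonstration, because the README does not spell out exactly how the path is compared.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Rule {
    bool allow;           // true for "allow", false for "deny"
    std::string type;     // "machine", "share", "directory", or "all"
    std::string pattern;  // the data field; may contain '*'
};

// Glob match where '*' matches any run of characters, including none.
bool wildcard_match(const std::string& pattern, const std::string& text) {
    std::size_t p = 0, t = 0, mark = 0;
    std::size_t star = std::string::npos;  // position of the last '*' seen
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;   // remember the star, first try matching nothing
            mark = t;
        } else if (p < pattern.size() && pattern[p] == text[t]) {
            ++p; ++t;
        } else if (star != std::string::npos) {
            p = star + 1; // backtrack: let the star absorb one more character
            t = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}

// Parse "[action] [type] [data]". Returns false for unknown actions or types.
bool parse_rule(const std::string& line, Rule& out) {
    std::istringstream in(line);
    std::string action, type, data;
    if (!(in >> action >> type)) return false;
    if (action != "allow" && action != "deny") return false;
    if (type != "machine" && type != "share" &&
        type != "directory" && type != "all") return false;
    std::getline(in >> std::ws, data);
    if (data.empty()) data = "*";  // assumption: missing data matches everything
    out.allow = (action == "allow");
    out.type = type;
    out.pattern = data;
    return true;
}

// True if the named item may be crawled. The first matching rule decides;
// if no rule matches, the implicit final 'allow all' applies.
bool is_allowed(const std::vector<Rule>& rules,
                const std::string& item_type, const std::string& name) {
    for (std::size_t i = 0; i < rules.size(); ++i) {
        const Rule& r = rules[i];
        if (r.type != item_type && r.type != "all") continue;
        if (wildcard_match(r.pattern, name)) return r.allow;
    }
    return true;
}
```

Loaded with the first example above, is_allowed reports the machine 'passout' and the shares 'topsecret' and 'mysecret' as blocked, and everything else as allowed.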

robots.txt (user side)

If you want the user to have the ability to block a share from being searched, simply have them place a file called 'robots.txt' in the share's root directory. The crawler will see this and skip the share. If they want to block just a directory, have them place the 'robots.txt' file in that directory and the crawler will skip that directory. It will not skip the subdirectories at this time.

Templates

So you want to make your UntzUntz LAN Scan search fit into the rest of your website? Well then, templates are the answer. Included are three example templates which are very similar to the default UntzUntz LAN Scan search results. One thing to mention is that templates are optional. If you do not define template files, UntzUntz LAN Scan will use the defaults built into the code.

Let's get started. First, let's examine the configuration options:

                page header =
                        This can be a filename (with path) or it can be the word
                        'none'.  If it is set to none, then UntzUntz LAN Scan will
                        not print anything out for the header.  The equivalent
                        would also be to set it to a file that doesn't exist, in
                        which case UntzUntz LAN Scan will also print nothing for
                        the header.

                page footer =
                        Same as page header above.

                search template =
                        This can only be a filename.  If it is defined and the file
                        cannot be found, UntzUntz LAN Scan will use the default
                        template.

Ok...now we've configured our /etc/untzls.conf file...let's make the templates. Each file has a different set of variables that can be used. Below is a table of variables and the files they work in.

+----------+----------------+--------------------------------+
| File     | Variable       | Description                    |
+----------+----------------+--------------------------------+
| Header   | $SMALL_LOGO    | This is the small logo defined |
|          |                | in /etc/untzls.conf            |
|          +----------------+--------------------------------+
|          | $QUERY_DATA    | This is the last query         |
|          +----------------+--------------------------------+
|          | $SEARCH_RESULTS| Listing of each keyword and the|
|          |                | number of 'hits' for each      |
|          +----------------+--------------------------------+
|          | $FIND_SIZE     | This is the size (in MB/GB/TB) |
|          |                | of the files found             |
|          +----------------+--------------------------------+
|          | $TOTAL_SIZE    | This is the total size of all  |
|          |                | files on the network.          |
|          +----------------+--------------------------------+
|          | $LOW_RESULTS   | This is the lowest result on   |
|          |                | the page.                      |
|          +----------------+--------------------------------+
|          | $HIGH_RESULTS  | This is the highest result on  |
|          |                | the page.                      |
|          +----------------+--------------------------------+
|          | $TOTAL_RESULTS | Total number of results        |
|          +----------------+--------------------------------+
|          | $SEARCH_TIME   | Time in seconds it took to find|
|          +----------------+--------------------------------+
|          | $PAGING_BAR    | The list of result pages       |
+----------+----------------+--------------------------------+
| Footer   | $PAGING_BAR    | The list of result pages       |
|          +----------------+--------------------------------+
|          | $UNTZ_FOOTER   | A small ad for UntzUntz LS     |
+----------+----------------+--------------------------------+
| Search   | $NUMBER        | Result number                  |
| Template +----------------+--------------------------------+
|          | $FILE_PATH     | Computer\Share\Dir\file        |
|          +----------------+--------------------------------+
|          | $FILE_NAME     | Name of file found             |
|          +----------------+--------------------------------+
|          | $FILE_SIZE     | Size of file (in KB/MB/GB)     |
|          +----------------+--------------------------------+
|          | $FILE_DATE     | Date of file                   |
|          +----------------+--------------------------------+
|          | $FILE_TIME     | Time of file                   |
|          +----------------+--------------------------------+
|          | $FILE_LOCATION | Computer\Share\Dir             |
+----------+----------------+--------------------------------+

Notes:
        1. $QUERY_DATA has the "" around it already

                Any HTML will work around these variables.  Only one of each variable
                per line.  Meaning you can have:
                                
                                $NUMBER. $FILE_NAME
        
                But not,
                
                                $NUMBER, $NUMBER, $NUMBER
        
                And why would you want to? But if you really did want to:
                
                                $NUMBER,
                                $NUMBER,
                                $NUMBER
                
                Which would be the same.
                Look at the example template files for a better idea of how they work.
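
To make the one-variable-per-line rule concrete, here is a rough sketch of how a single template line could be expanded. It is an illustration, not the untzcgi implementation: expand_line is a made-up name, and leaving a repeated variable on the same line untouched is an assumption about what "only one of each variable per line" means in practice.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Expand template variables such as $NUMBER or $FILE_NAME in one line.
// Each variable is replaced at most once per line; a repeat on the same
// line is copied through unchanged (assumption, see the note above).
// Replacement values are never rescanned, so a file name that happens to
// contain "$NUMBER" is printed as-is.
std::string expand_line(const std::string& line,
                        const std::map<std::string, std::string>& vars) {
    std::string out;
    std::set<std::string> used;  // variables already expanded on this line
    std::size_t i = 0;
    while (i < line.size()) {
        if (line[i] != '$') {
            out += line[i++];
            continue;
        }
        // Collect '$' plus the run of capital letters and underscores after it.
        std::size_t end = i + 1;
        while (end < line.size() &&
               (std::isupper(static_cast<unsigned char>(line[end])) ||
                line[end] == '_')) {
            ++end;
        }
        const std::string name = line.substr(i, end - i);
        const std::map<std::string, std::string>::const_iterator it =
            vars.find(name);
        if (it != vars.end() && used.insert(name).second) {
            out += it->second;  // first use on this line: substitute
        } else {
            out += name;        // unknown or repeated: leave as written
        }
        i = end;
    }
    return out;
}
```

Calling expand_line once per line of the search template, with a fresh map of values for each result, reproduces the behaviour described above: "$NUMBER. $FILE_NAME" is fully expanded, while the second $NUMBER in "$NUMBER, $NUMBER" is not.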

Problems or Questions?

        One thing you can start with is changing the DEBUG_LEVEL variable.
        Change the 0 to a number between 1 and 9.  1 will give the least amount of
        information, 9 the most.

        Otherwise, email me at jed204@users.sourceforge.net

Current Features and Information:

Currently the program is configured through a configuration file. The configuration file holds information about the master browser (or another computer on the network), a temporary directory, a logo file for the search page, and a network user name to search as (probably guest for most networks).

The crawler currently scans the network for clients, creates a share listing, and from the share listing creates a file listing. There is support for 'robots.txt': if this file is in the 'root' directory of a share, the share will be skipped. If it is in a 'lower level' directory of a share, that directory will be skipped.
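
As an illustration of the share-listing step, the sketch below builds the kind of smbclient command line a crawler could run for one machine. It only assembles the command string. ScanConfig, shell_quote, and build_list_command are names made up for this example, and the way untzlanscan really invokes smbclient may differ. The smbclient options used are -L (list the shares on a host), -U (user name, optionally followed by %password), -W (workgroup), and -N (do not prompt for a password).

```cpp
#include <cstddef>
#include <string>

// The configuration values that matter for listing shares (hypothetical struct).
struct ScanConfig {
    std::string smbclient;  // 'smbclient' option: path to the program
    std::string username;   // 'username' option
    std::string password;   // 'password' option, empty if not set
    std::string workgroup;  // 'workgroup' option, empty if not set
};

// Wrap an argument in single quotes for a POSIX shell, escaping any
// single quote inside it.
std::string shell_quote(const std::string& arg) {
    std::string out = "'";
    for (std::size_t i = 0; i < arg.size(); ++i) {
        if (arg[i] == '\'') out += "'\\''";
        else out += arg[i];
    }
    out += "'";
    return out;
}

// Build the command that lists the shares on one host.
std::string build_list_command(const ScanConfig& cfg, const std::string& host) {
    std::string cmd = shell_quote(cfg.smbclient) + " -L " + shell_quote(host);
    if (cfg.password.empty()) {
        // No password configured: ask smbclient not to prompt for one.
        cmd += " -N -U " + shell_quote(cfg.username);
    } else {
        cmd += " -U " + shell_quote(cfg.username + "%" + cfg.password);
    }
    if (!cfg.workgroup.empty()) cmd += " -W " + shell_quote(cfg.workgroup);
    return cmd;
}
```

The resulting string could be handed to something like popen() and its output parsed for share names. That parsing is not shown here, because the output format depends on the smbclient version.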

Currently the crawler will skip hidden and printer shares as well. Once the crawler has found all the files on the network, the CGI program can search against that listing. The CGI takes less than 1 second to search through the 30,000 files in 2,500 directories on my network. I created a database with 350,000 files and 30,000 directories and it took about 5 to 6 seconds. These benchmarks are from a Pentium III 450 MHz with 512 MB of RAM.

Also see the beginning of each source file for more information about development and releases.

If you have any problems feel free to contact me at jed204@users.sourceforge.net

Thanks,
John
jed204@users.sourceforge.net

