SourceFiles.org - Use the Source, Luke
Home | Register | News | Forums | Guide | MyLinks | Bookmark

Related Sites

Latest News
  General News
  Reviews
  Press Releases
  Software
  Hardware
  Security
  Tutorials
  Off Topic


Back to files
        Modular JDBC SQL Engine (ModSQL)
        (c) 2004 Chris Studholme

Introduction

ModSQL is a project who's goal is to develop a modular SQL engine that can be grafted onto any raw database (such as a large flat ASCII file). ModSQL is a JDBC driver that will parse SQL queries and execute them by calling third-party database modules. The database module API is designed such that writing database modules is much easier than writing an entire RDBMS.

Included in this package is the ModSQL engine, three database modules, and a simple command line JDBC client. The database modules are SampleDatabase, used to demonstrate how to write database modules, TemporaryDatabase, for in-memory storage of temporary tables, and AsciiDatabase, which can be used to access various formats of flat ASCII file.

Capabilities

ModSQL works best with read-only databases; however, support for read-write databases was recently added and works well for temporary tables. The following SQL constructs currently work:

  • most standard SELECT . FROM . WHERE . GROUP BY . HAVING . ORDER BY .
  • SELECT DISTINCT
  • operators AND, OR, AS, LIKE, IN, IS, BETWEEN, EXISTS
  • functions ABS, CEIL, FLOOR, GREATEST, LEAST, MOD, POWER, ROUND, SIGN, SQRT
  • functions LEFT, RIGHT, UPPER, LOWER, LENGTH, LPAD, RPAD, SUBSTR, NVL
  • aggregates COUNT, MIN, MAX, SUM, AVG, STDEV
  • JOIN...ON and JOIN...USING
  • subqueries in the SELECT, FROM, and WHERE sections
  • CREATE INDEX ON table [USING SORT or HASH] (column)
  • CREATE TABLE, INSERT, UPDATE, DELETE, and DROP TABLE

Licence

This packages is provided under the terms of the GNU General Public Licence. Please read the COPYING file for more details.

Disclaimer

This software is considered beta quality. It is relatively bug free when used in an environment similar to the one it was designed and written in. This software is provided "AS IS" with no warranty or guarantee either written or implied. It is being made available primarily to other developers who may be interested in the ideas contained within.

Contents

AsciiDatabase -- database module for accessing flat ASCII files ModSQL -- modular SQL engine (JDBC driver; needs database modules) SampleDatabase -- in-memory sample database module (for testing purposes) TemporaryDatabase -- in-memory writable temporary database

  doc            -- JavaDoc package documentation
  samples        -- sample config files and tables

  ChangeLog      -- detailed log of changes
  COPYING        -- licence
  Makefile       -- top level make file
  NEWS           -- what's new in this version
  README         -- this file
  TODO           -- brief list of what should be done in the near & long term
  UPGRADE        -- notes regarding upgrading from earlier versions
  modsql.c       -- convenient wrapper for modsql.class

modsql.class -- compiled modsql.java modsql.java -- command line JDBC user interface

Installation

To use modsql.c, you need to edit the file to set the location of your java runtime environment (if it is not /usr/bin/java), and then type "make modsql". This code has been tested on Linux and Solaris with Java 1.2 and Java 1.3.1.

If you just want to use the Java code, it should be ready to go "out of the box". The command:

java -classpath . modsql --version

should yield something like:

modsql version 0.40 20030402
ModSQL.Driver version 0.40
failed to load database module AsciiDatabase.Manager SampleDatabase.Manager version 0.40 TemporaryDatabase.Manager version 0.40

Use

java -classpath . modsql -h

for a usage summary. A sample query can be executed with:

java -classpath . modsql -c "select * from antiques"

If modsql.c works for you (type `make` to compile it), you can use:

./modsql (or just modsql if it is in your path)

instead of the java commands above.

Typing

make install

will install the class files in /usr/local/classes and the modsql binary in /usr/local/bin. You will probably need to set your classpath as follows:

CLASSPATH=/usr/local/classes; export CLASSPATH

to run the installed classes.

Config File

The master config file for ModSQL is located at: ./ModSQL/ModSQL.conf. This config file is self documented.

The AsciiDatabase module has its own config file named asciidb.conf, which is described below. The location of this config file can be set in ModSQL.conf.

SampleDatabase

The SampleDatabase module is an in-memory database useful for testing the SQL engine. The following tables are currently defined:

AntiqueOwners
Antiques
EmployeeAddressTable
EmployeeStatisticsTable
Orders

These sample tables are useful when working through James Hoffman's SQL tutorial available at:

http://www.geocities.com/SiliconValley/Vista/2207/sql1.html

AsciiDatabase

To use the AsciiDatabase module you need to create a file named /tmp/modsql/asciidb.conf. Here is an example:

# /tmp/modsql/asciidb.conf

# sample tables
[homeusage]
tabledef=/tmp/modsql/table/home.table datafile=/tmp/modsql/data/home.log
[sysload]
tabledef=/tmp/modsql/table/load.table datafile=/tmp/modsql/data/load.log.gz

To make this work, create the directories:

/tmp/modsql
/tmp/modsql/index

and copy the directories:

./samples/table -> /tmp/modsql/table
./samples/data -> /tmp/modsql/data

and the file:

./samples/asciidb.conf -> /tmp/modsql/asciidb.conf

If all goes well, you can do:

java -classpath . modsql -c "select * from homeusage"

or

java -classpath . modsql -c "select * from sysload"

and get results. The format of asciidb.conf is fairly self explanitory. Note that AsciiDatabase will transparently read from gzip compressed files. Also note that you can have multiple datafile's for each table defined. Each additional file is just appended to the previous from ModSQL's point of view.

Table Definition File

Two sample table definition files are located in ./samples/table. Here is home.table:

# table definition file

recordsize=40

[row]
# each line is one column

  # srcType       srcLen  srcOfs  colTyp  colName options
  column=char     8       0       int     DATE
  column=char     4       9       int     TIME
  column=char     8       14      int     USAGE
  column=char     16      23      char    USER

Lines starting with a # character are comments. A record can contain multiple rows of data. If recordsize is set to -1, records are terminated by a end-ofline marker. The only supported source type is 'char'. Length is the width of the field and offset is the offset from the start of the record or the last 'offset=' line. To demonstrate the use of offset=, the following is the same as the previous:

  column=char     8       0       int     DATE
  column=char     4       9       int     TIME
  offset=14
  column=char     8       0       int     USAGE
  column=char     16      9       char    USER

Supported column types include: char, int, long, single, double, and bool. The bool type requires a case-insensitive value of "1", "yes", or "true" to be true. False otherwise.

The options column can include a comma-seperated list of options. Supported options are "a", "d", "m", and "n", for add, divide, multiply, and null replacement. Here are some examples:

  column=char     2       64      int     YEAR    a1900
  column=char     10      30      double  A4      d1000,n-0.001

The first column is a two digit year that has 1900 added after translation. The second column is a floating point number with 3 implied decimals and -1 (-0.001 after dividing) is to be interpreted as null. Note that the order of the options is important. The last line could have been:

column=char 10 30 double A4 n-1,d1000

If you need to skip some fixed number of bytes at the beginning of the file, use an offset= option before the first [row] section.

If you need to select particular rows within a record for a given file, use:

option=select,offset,text

where offset is some number offset from the beginning of the record, and text is some text to match. This line should come immediately after a [row] line.

If you need to skip particular rows within a record for a given file, use:

option=skip,offset,text

where offset is some number offset from the beginning of the record, and text is some text to match. This line should come immediately after a [row] line.

I may have missed some options but hopefully this will get you started.

Creating Indices

The index directory (specified in ModSQL.conf) will hold non-native indices created for databases. Use the SQL command:

CREATE INDEX ON table [USING SORT|HASH] (column);

to create an index on the specified column of the specified table. Indices are not updated automatically when the table data changes. An attempt is made to recognize that an underlying table data file has changed (or the table definition file has changed) and not use the index until the index is recreated. You should probably not index files that change regularly (like log files).

Both sorted and hashed index files are supported. There is very little difference between them. When being read the first time, they are simply scanned in both cases. During a table join, where multiple scans of the index are made, efficient lookups will be done. Lookups are only for an exact match, and since the hash lookup is O(1) verses O(log n) for binary search, the hash table index method is a little faster. The downside to a hash table index is that the column values are stored in an order that looks random as opposed to a sorted order. In the future, better use may be made of sorted indices.

Creating an index on multiple columns is not currently supported; however, you can create multiple indices on the same table (different columns). The SQL engine will only use one index per table specified in the FROM section of your query, but it will attempt to choose the best index to use.

Index files make relatively efficient use of disk space. I have several 10's of GB of data indexed with about 130MB of indices.

Bugs

I'm sure there are plenty. Please send comments, criticism, bug reports, etc. to:

Chris.Studholme@utoronto.ca

Home Page

Located at:

http://modsql.sourceforge.net/

--
$Id: README,v 1.8 2004/01/04 03:01:20 cvs Exp $


Other Sites

Discussion Groups
  Beginners
  Distributions
  Networking / Security
  Software
  PDAs

About | FAQ | Privacy | Awards | Contact
Comments to the webmaster are welcome.
Copyright 2006 Sourcefiles.org All rights reserved.