Modular JDBC SQL Engine (ModSQL)
(c) 2004 Chris Studholme
Introduction
ModSQL is a project who's goal is to develop a modular SQL engine that can be grafted onto any raw database (such as a large flat ASCII file). ModSQL is a JDBC driver that will parse SQL queries and execute them by calling third-party database modules. The database module API is designed such that writing database modules is much easier than writing an entire RDBMS.
Included in this package is the ModSQL engine, three database modules, and a simple command line JDBC client. The database modules are SampleDatabase, used to demonstrate how to write database modules, TemporaryDatabase, for in-memory storage of temporary tables, and AsciiDatabase, which can be used to access various formats of flat ASCII file.
Capabilities
ModSQL works best with read-only databases; however, support for read-write databases was recently added and works well for temporary tables. The following SQL constructs currently work:
- most standard SELECT . FROM . WHERE . GROUP BY . HAVING . ORDER BY .
- SELECT DISTINCT
- operators AND, OR, AS, LIKE, IN, IS, BETWEEN, EXISTS
- functions ABS, CEIL, FLOOR, GREATEST, LEAST, MOD, POWER, ROUND, SIGN, SQRT
- functions LEFT, RIGHT, UPPER, LOWER, LENGTH, LPAD, RPAD, SUBSTR, NVL
- aggregates COUNT, MIN, MAX, SUM, AVG, STDEV
- JOIN...ON and JOIN...USING
- subqueries in the SELECT, FROM, and WHERE sections
- CREATE INDEX ON table [USING SORT or HASH] (column)
- CREATE TABLE, INSERT, UPDATE, DELETE, and DROP TABLE
Licence
This packages is provided under the terms of the GNU General Public Licence. Please read the COPYING file for more details.
Disclaimer
This software is considered beta quality. It is relatively bug free when used in an environment similar to the one it was designed and written in. This software is provided "AS IS" with no warranty or guarantee either written or implied. It is being made available primarily to other developers who may be interested in the ideas contained within.
Contents
AsciiDatabase -- database module for accessing flat ASCII files ModSQL -- modular SQL engine (JDBC driver; needs database modules) SampleDatabase -- in-memory sample database module (for testing purposes) TemporaryDatabase -- in-memory writable temporary database
doc -- JavaDoc package documentation samples -- sample config files and tables ChangeLog -- detailed log of changes COPYING -- licence Makefile -- top level make file NEWS -- what's new in this version README -- this file TODO -- brief list of what should be done in the near & long term UPGRADE -- notes regarding upgrading from earlier versions modsql.c -- convenient wrapper for modsql.class
modsql.class -- compiled modsql.java modsql.java -- command line JDBC user interface
Installation
To use modsql.c, you need to edit the file to set the location of your java runtime environment (if it is not /usr/bin/java), and then type "make modsql". This code has been tested on Linux and Solaris with Java 1.2 and Java 1.3.1.
If you just want to use the Java code, it should be ready to go "out of the box". The command:
java -classpath . modsql --version
should yield something like:
modsql version 0.40 20030402
ModSQL.Driver version 0.40
failed to load database module AsciiDatabase.Manager
SampleDatabase.Manager version 0.40
TemporaryDatabase.Manager version 0.40
- Use
-
java -classpath . modsql -h
for a usage summary. A sample query can be executed with:
java -classpath . modsql -c "select * from antiques"
If modsql.c works for you (type `make` to compile it), you can use:
./modsql (or just modsql if it is in your path)
instead of the java commands above.
- Typing
-
make install
will install the class files in /usr/local/classes and the modsql binary in /usr/local/bin. You will probably need to set your classpath as follows:
CLASSPATH=/usr/local/classes; export CLASSPATH
to run the installed classes.
Config File
The master config file for ModSQL is located at: ./ModSQL/ModSQL.conf. This config file is self documented.
The AsciiDatabase module has its own config file named asciidb.conf, which is described below. The location of this config file can be set in ModSQL.conf.
SampleDatabase
The SampleDatabase module is an in-memory database useful for testing the SQL engine. The following tables are currently defined:
AntiqueOwners
Antiques
EmployeeAddressTable
EmployeeStatisticsTable
Orders
These sample tables are useful when working through James Hoffman's SQL tutorial available at:
http://www.geocities.com/SiliconValley/Vista/2207/sql1.html
AsciiDatabase
To use the AsciiDatabase module you need to create a file named /tmp/modsql/asciidb.conf. Here is an example:
# /tmp/modsql/asciidb.conf
# sample tables
[homeusage]
tabledef=/tmp/modsql/table/home.table
datafile=/tmp/modsql/data/home.log
[sysload]
tabledef=/tmp/modsql/table/load.table
datafile=/tmp/modsql/data/load.log.gz
To make this work, create the directories:
/tmp/modsql
/tmp/modsql/index
and copy the directories:
./samples/table -> /tmp/modsql/table
./samples/data -> /tmp/modsql/data
and the file:
./samples/asciidb.conf -> /tmp/modsql/asciidb.conf
If all goes well, you can do:
java -classpath . modsql -c "select * from homeusage"
or
java -classpath . modsql -c "select * from sysload"
and get results. The format of asciidb.conf is fairly self explanitory. Note that AsciiDatabase will transparently read from gzip compressed files. Also note that you can have multiple datafile's for each table defined. Each additional file is just appended to the previous from ModSQL's point of view.
Table Definition File
Two sample table definition files are located in ./samples/table. Here is home.table:
# table definition file
recordsize=40
[row]
# each line is one column
# srcType srcLen srcOfs colTyp colName options column=char 8 0 int DATE column=char 4 9 int TIME column=char 8 14 int USAGE column=char 16 23 char USER
Lines starting with a # character are comments. A record can contain multiple rows of data. If recordsize is set to -1, records are terminated by a end-ofline marker. The only supported source type is 'char'. Length is the width of the field and offset is the offset from the start of the record or the last 'offset=' line. To demonstrate the use of offset=, the following is the same as the previous:
column=char 8 0 int DATE column=char 4 9 int TIME offset=14 column=char 8 0 int USAGE column=char 16 9 char USER
Supported column types include: char, int, long, single, double, and bool. The bool type requires a case-insensitive value of "1", "yes", or "true" to be true. False otherwise.
The options column can include a comma-seperated list of options. Supported options are "a", "d", "m", and "n", for add, divide, multiply, and null replacement. Here are some examples:
column=char 2 64 int YEAR a1900 column=char 10 30 double A4 d1000,n-0.001
The first column is a two digit year that has 1900 added after translation. The second column is a floating point number with 3 implied decimals and -1 (-0.001 after dividing) is to be interpreted as null. Note that the order of the options is important. The last line could have been:
column=char 10 30 double A4 n-1,d1000
If you need to skip some fixed number of bytes at the beginning of the file, use an offset= option before the first [row] section.
If you need to select particular rows within a record for a given file, use:
option=select,offset,text
where offset is some number offset from the beginning of the record, and text is some text to match. This line should come immediately after a [row] line.
If you need to skip particular rows within a record for a given file, use:
option=skip,offset,text
where offset is some number offset from the beginning of the record, and text is some text to match. This line should come immediately after a [row] line.
I may have missed some options but hopefully this will get you started.
Creating Indices
The index directory (specified in ModSQL.conf) will hold non-native indices created for databases. Use the SQL command:
CREATE INDEX ON table [USING SORT|HASH] (column);
to create an index on the specified column of the specified table. Indices are not updated automatically when the table data changes. An attempt is made to recognize that an underlying table data file has changed (or the table definition file has changed) and not use the index until the index is recreated. You should probably not index files that change regularly (like log files).
Both sorted and hashed index files are supported. There is very little difference between them. When being read the first time, they are simply scanned in both cases. During a table join, where multiple scans of the index are made, efficient lookups will be done. Lookups are only for an exact match, and since the hash lookup is O(1) verses O(log n) for binary search, the hash table index method is a little faster. The downside to a hash table index is that the column values are stored in an order that looks random as opposed to a sorted order. In the future, better use may be made of sorted indices.
Creating an index on multiple columns is not currently supported; however, you can create multiple indices on the same table (different columns). The SQL engine will only use one index per table specified in the FROM section of your query, but it will attempt to choose the best index to use.
Index files make relatively efficient use of disk space. I have several 10's of GB of data indexed with about 130MB of indices.
Bugs
I'm sure there are plenty. Please send comments, criticism, bug reports, etc. to:
Home Page
Located at:
http://modsql.sourceforge.net/
--
$Id: README,v 1.8 2004/01/04 03:01:20 cvs Exp $
