
Soundmosaic
Copyright (c) 2001-2003 Steven Hazel <sah@thalassocracy.org>

Soundmosaic constructs an approximation of one sound out of small pieces of other sounds.

The soundmosaic algorithm is: Split the target file up into equal-sized segments, or "tiles". For each tile in the target file, find the closest match in the source files, and replace the target tile with the tile from the source files.
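The steps above can be sketched in a few lines of Python. This is an illustration only, not soundmosaic's actual code: the names `split_into_tiles`, `mosaic`, and `neg_squared_error` are hypothetical, samples are assumed to be plain lists of floats, a short trailing remainder of the target is assumed to be dropped, and the scoring function is a stand-in for whatever match criterion is in use.

```python
from typing import Callable, List, Sequence

Tile = List[float]


def split_into_tiles(samples: Sequence[float], tile_len: int) -> List[Tile]:
    """Partition samples into equal-sized tiles.

    Assumption: a trailing remainder shorter than tile_len is dropped.
    """
    return [list(samples[i:i + tile_len])
            for i in range(0, len(samples) - tile_len + 1, tile_len)]


def mosaic(target: Sequence[float], source_tiles: List[Tile], tile_len: int,
           score: Callable[[Tile, Tile], float]) -> List[float]:
    """Replace each target tile with the highest-scoring source tile."""
    output: List[float] = []
    for tile in split_into_tiles(target, tile_len):
        best = max(source_tiles, key=lambda candidate: score(tile, candidate))
        output.extend(best)
    return output


def neg_squared_error(a: Tile, b: Tile) -> float:
    """Stand-in score for the demo only: higher means a closer match."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))
```

For example, `mosaic([0.9, 1.1, -1.0, -0.8], [[1, 1], [-1, -1], [0, 0]], 2, neg_squared_error)` returns `[1, 1, -1, -1]`: each target tile is swapped for the source tile that resembles it most.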

Usage

Build it like this:

$ ./configure && make

Run it like this:

soundmosaic [options] <target file> <output file> [at least one source file]

or, to collate existing output files:

soundmosaic [options] --collate <target file> <output file> [at least two attempt files]

or, to run as a master server, collating output from slaves:

soundmosaic [options] --master <port> <target file> <output file> [optional local attempt files]

or, to run as a slave:

soundmosaic [options] --slave <url> [at least one source file]

options

  -t, --tilesize=SECONDS  Seconds per tile (usually fractional).
                          Defaults to 0.1.

  -p, --partition         Merely partition the source file(s) into tiles,
                          rather than using all of the continuous
                          (overlapping) tiles in the source file(s). This is
                          much faster, but produces lower quality results.

  -r, --resume            Attempt to resume processing on an existing,
                          aborted output file. Not valid when collating, or
                          when acting as a master or slave.

Details

Distance metric:

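The closeness of two tiles is measured by the correlation of the normalized vectors. This is the cosine of the angle between the vectors, so a higher value means a closer match. It can be calculated with a dot product once the vectors have been scaled to a common length (at unit length the dot product is the cosine itself; at any other common length it is proportional to it).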
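As a rough illustration of the metric, here is a minimal Python sketch of the cosine between two tiles. The function name `cosine_similarity` is hypothetical, tiles are assumed to be equal-length sequences of floats, and returning 0.0 for a silent (all-zero) tile is an assumption made here, not documented soundmosaic behavior.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length tiles.

    1.0 means the same waveform shape (at any volume), 0.0 means
    uncorrelated, and -1.0 means the same shape with inverted polarity.
    """
    if len(a) != len(b):
        raise ValueError("tiles must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a silent tile is treated as uncorrelated
    return dot / (norm_a * norm_b)
```

Because both vectors are divided by their own length, a tile and a louder copy of itself score the same as two identical tiles.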

In fact, the prospective match is scaled to the volume of the target tile before comparison, and it is written to the output file at that volume. Normalization before comparison means that the overall volume of tiles does not affect the comparison. This also serves to make the output sound a little bit more like the target, since it follows the same broad amplitude changes.
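The volume matching described above might look like the following Python sketch. It assumes that "volume" means the Euclidean length of the tile (equivalent to RMS level for equal-length tiles); the text does not say which measure soundmosaic uses, and the name `scale_to_volume` is hypothetical.

```python
import math
from typing import List, Sequence


def scale_to_volume(match: Sequence[float], target: Sequence[float]) -> List[float]:
    """Rescale match so its Euclidean length equals that of target."""
    match_norm = math.sqrt(sum(x * x for x in match))
    target_norm = math.sqrt(sum(x * x for x in target))
    if match_norm == 0.0:
        return list(match)  # a silent tile cannot be scaled up
    gain = target_norm / match_norm
    return [x * gain for x in match]
```

For example, `scale_to_volume([3, 4], [0, 10])` returns `[6.0, 8.0]`: the match keeps its shape but takes on the target's level.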

Before 1.1, soundmosaic used the Manhattan distance between the "normalized" vectors, where "normalization" was done in the common audio sense of increasing the volume as much as possible without clipping (this corresponds to mapping onto the surface of a hypercube rather than a hypersphere). The old metric worked reasonably well, but the new metric is much better.
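For comparison, the older approach can be sketched as below. This is a reconstruction from the description above, not the pre-1.1 source code: samples are assumed to be floats with full scale at 1.0, and the names `peak_normalize` and `manhattan_distance` are hypothetical.

```python
from typing import List, Sequence


def peak_normalize(tile: Sequence[float]) -> List[float]:
    """Scale so the largest absolute sample is 1.0 (loudest without clipping).

    Geometrically this puts the tile on the surface of a hypercube.
    """
    peak = max((abs(x) for x in tile), default=0.0)
    if peak == 0.0:
        return list(tile)  # silent tile: leave unchanged
    return [x / peak for x in tile]


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute sample differences after peak normalization.

    Unlike the cosine metric, a lower value means a closer match.
    """
    return sum(abs(x - y) for x, y in zip(peak_normalize(a), peak_normalize(b)))
```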

Resampling:

Soundmosaic automatically resamples the source files to match the sample rate of the target file. It does this using a simple zero-order hold / drop-sample resampler, which is low quality and introduces audible artifacts such as aliasing -- it doesn't even low-pass filter at the relevant Nyquist frequency. If resampling quality is important to you, you should use a higher quality resampler to adjust all of your source material to the same sample rate as the target file before you run soundmosaic.
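To make the limitation concrete, a zero-order hold / drop-sample resampler amounts to something like this Python sketch. The function name `resample_zoh` is hypothetical, sample rates are assumed to be positive integers in Hz, and the exact index rounding soundmosaic uses is not known from the text.

```python
from typing import List, Sequence


def resample_zoh(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample by picking, for each output position, the input sample at or
    just before the corresponding time.

    Upsampling repeats samples (zero-order hold); downsampling drops them.
    There is no interpolation and no low-pass filter, so downsampling aliases.
    """
    out_len = len(samples) * dst_rate // src_rate
    return [samples[i * src_rate // dst_rate] for i in range(out_len)]
```

For example, `resample_zoh([1, 2, 3], 1, 2)` returns `[1, 1, 2, 2, 3, 3]` and `resample_zoh([1, 2, 3, 4], 2, 1)` returns `[1, 3]`.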

Dealing with large amounts of data:

In order to find matches good enough to make both the target and source inputs recognizable in the output, it helps to have a tremendous amount of source data, and a tremendous amount of data storage and processing to go with it. Distributing the system across multiple machines using the --master and --slave options helps to handle that load so that a decent result can be achieved in a more reasonable amount of time.

Normally, we compare each tile with all of the continuous tiles in the source files (one beginning at the first sample, another beginning at the second, and so on). That's very time consuming, though, even for a small amount of data, so the --partition flag is provided to merely partition the source file into non-overlapping tiles, the same as is done with the target file. This method produces lower quality results, but it allows for a variety of source tiles, and prevents the processing time from getting out of hand. It can be a useful way to "test run" a soundmosaic project to get an idea of what the results might be like.
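The difference in workload between the two modes can be seen by listing where candidate tiles start. The sketch below is illustrative only: `tile_length` and `tile_starts` are hypothetical names, and rounding the tile size to a whole number of samples is an assumption.

```python
def tile_length(seconds: float, sample_rate: int) -> int:
    """Convert a tile size in seconds to samples (assumed: round to nearest)."""
    return max(1, round(seconds * sample_rate))


def tile_starts(num_samples: int, tile_len: int, partition: bool = False) -> range:
    """Start offsets of candidate tiles in a source file.

    Default: every overlapping tile (one starting at each sample).
    partition=True: only non-overlapping tiles, as with --partition.
    """
    step = tile_len if partition else 1
    return range(0, num_samples - tile_len + 1, step)
```

With the default 0.1 second tile at 44100 Hz (4410 samples), one minute of source audio (2646000 samples) yields 2641591 overlapping candidate tiles but only 600 partitioned ones, which is why --partition is so much faster.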

Future development:

I'm interested in ways of speeding up the calculation of distance -- I'm not sure whether soundmosaic can use the standard DSP techniques for calculating correlation more efficiently, because I think the per-tile normalization probably gets in the way.
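One angle that may be worth exploring, offered as a sketch rather than a tested answer: the cosine splits into a numerator that is an ordinary sliding dot product (a plain cross-correlation) and a denominator built from the energy of each source window, and those window energies can be read off a running sum of squares. The Python below demonstrates only that decomposition; the numerator is still computed directly, and `sliding_cosine` is a hypothetical name, not part of soundmosaic.

```python
import math
from itertools import accumulate
from typing import List, Sequence


def sliding_cosine(tile: Sequence[float], source: Sequence[float]) -> List[float]:
    """Cosine of tile against every overlapping window of source."""
    n = len(tile)
    tile_norm = math.sqrt(sum(x * x for x in tile))
    # energy[i] is the sum of squares of source[:i], so the energy of the
    # window starting at `start` is energy[start + n] - energy[start].
    energy = [0.0] + list(accumulate(x * x for x in source))
    scores = []
    for start in range(len(source) - n + 1):
        # Numerator: a plain sliding dot product, computed directly here.
        # This is the part a standard fast correlation method would replace.
        dot = sum(t * s for t, s in zip(tile, source[start:start + n]))
        window_norm = math.sqrt(max(energy[start + n] - energy[start], 0.0))
        denom = tile_norm * window_norm
        scores.append(dot / denom if denom > 0.0 else 0.0)
    return scores
```

Whether this is actually faster for soundmosaic's tile sizes, and whether the running sum is accurate enough over long files, would need measuring.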

I'm also interested in distance metrics which are more relevant to the sounds which are important to the human ear. It might be helpful to filter some frequency ranges before doing the comparison, or to use mp3 compression to strip out less important information.

Soundmosaic usually produces output that clicks loudly at the edges of tiles. I'd like to fix that. I could fade the ends of every output tile, but I'm not sure that would sound any better for small tile sizes, and I don't know what the falloff curve should be or how quickly to fade the edges. Or I could split tiles at the nearest zero crossing, but I don't like the idea of having variable-length tiles.
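For anyone wanting to experiment with the fading idea, here is a minimal Python sketch. The linear ramp and the `fade_len` parameter are assumptions chosen for simplicity, since the right curve and fade time are exactly the open questions; `fade_edges` is a hypothetical name and nothing like it is implemented in soundmosaic.

```python
from typing import List, Sequence


def fade_edges(tile: Sequence[float], fade_len: int) -> List[float]:
    """Linearly ramp the first and last fade_len samples toward zero.

    The fade is capped at half the tile so the two ramps never overlap.
    """
    n = len(tile)
    fade_len = min(fade_len, n // 2)
    out = list(tile)
    for i in range(fade_len):
        gain = i / fade_len  # 0.0 at the very edge, rising toward 1.0
        out[i] *= gain
        out[n - 1 - i] *= gain
    return out
```

For example, `fade_edges([1.0] * 6, 2)` returns `[0.0, 0.5, 1.0, 1.0, 0.5, 0.0]`. Every tile then starts and ends at zero, which removes the discontinuity at the cost of a brief dip in level at each boundary.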

