The Problem
You have a huge number of .wav files (or CDs/LPs that you can convert to .wav) and a number of machines running *IX. Usually you would take your fastest machine and an encoder such as lame, bladeenc or the Fraunhofer software to convert them to MP3s. All the other CPUs are idle and lonely :)
The Idea
Why not write a little framework to distribute the work? And while we're at it, also create a file structure in which you will find your MP3s again. The best thing would be: "Insert an audio CD, start something, wait a little, and soon you will find your MP3s." One issue is that ripping a whole audio CD might take up to 700MB of disk space, which might be a problem for some systems.
Another issue is naming the MP3s and setting the tags right. But there is the CDDB for that. So the idea was to provide a framework that
- Parallelizes the work
- Optionally keeps the disk space needed for ripping to a minimum
- Minimizes the things you have to do by hand, like naming files, calling the ripper and the encoder, and sorting the files
My Solution
The framework I designed consists of three components:
- The Rlamecpu (remotelamecpu). At first I wrote a somewhat larger Perl script for this, which is still included, but basically it is a two-line shell script that turns a machine running inetd into a remote encoder. You just have to install lame and the shell script on the machine and edit /etc/inetd.conf and /etc/services.
- The CPU broker (Rlamed.pl). This is a process running on one machine in the network that is responsible for managing all the Rlamecpus installed in step 1. Whenever an Rlameclient (see point 3) wants to encode some .wav files, it asks the Rlamed for CPUs, which it receives in the form ipaddress:portnumber. During work the Rlamec sends status updates to the daemon. After the work is finished the Rlamec sends a release message so that the Rlamecpu can be used by other Rlamecs. A client can also ask the Rlamed to send it status updates for monitoring purposes. Finally, the Rlamed supports dynamically adding and removing CPUs from its pool and redistributing them among clients if necessary.
- The Rlameclient (Rlamec.pl) finally does the work. It contacts the CPU broker and gets the songs to work on either from the audio CD in the CD-ROM drive (since I have a CD changer it actually is an array of devices) together with the CDDB, or from info files in a configurable directory. If there are no files in the directory and the information is fetched via the CDDB, the info files are created so that they won't have to be fetched again if you interrupt the work. For each song it checks whether there is already a file named Title_of_the_song.mp3 in the directory "Interpreten/Name of Singer" underneath a configurable start directory. If the file exists, nothing happens except that a softlink pointing to the file is created at "Interpreten/Name of Singer/Alben/Name of Album/Indexofsong_Title_of_song.mp3". Otherwise the Rlamec looks underneath another configurable directory, in a directory that is named after the CDDB ID (as is the info file, by the way), to see whether track<index>.cdda.wav exists. If it doesn't, it rips the track from the audio CD in the correct device using cdparanoia. A process is then forked off to read the file and send it to the Rlamecpu. When this is finished the MP3 is sent back and put in the proper location as stated above. The signal handler for SIGCHLD deletes the ripped .wav file if it thinks the MP3 was received correctly, and then work begins on the next song. The Rlamec also receives status updates from its children. In a later release I might use that information to get rid of the slowest CPUs first. If there are fewer songs left to work on than there are CPUs, the Rlamec releases the CPUs it doesn't need anymore. If it receives a message that it got an additional CPU or that a CPU was deleted, it either gives a song that is not yet done to the new CPU, or marks the deleted CPU to be released once it has finished its current song.
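The file layout used by the Rlameclient can be sketched roughly as follows. This is a minimal Python illustration, not the Perl code from Rlamec.pl. The function names `track_paths` and `link_into_album`, the two-digit track index, the replacement of spaces and slashes by underscores, and the use of a relative softlink are all assumptions made for the example.

```python
import os
from pathlib import Path


def track_paths(start_dir, singer, album, index, title):
    """Return (song_path, link_path) for one track (hypothetical helper).

    The MP3 itself lives under Interpreten/<singer>/, and the album
    directory only holds a softlink to it.
    """
    # Assumption: titles become file names by replacing spaces and slashes.
    name = title.replace(" ", "_").replace("/", "_")
    artist_dir = Path(start_dir) / "Interpreten" / singer
    song_path = artist_dir / f"{name}.mp3"
    # Assumption: the track index is zero-padded to two digits.
    link_path = artist_dir / "Alben" / album / f"{index:02d}_{name}.mp3"
    return song_path, link_path


def link_into_album(song_path, link_path):
    """Create the album softlink unless it is already there."""
    link_path.parent.mkdir(parents=True, exist_ok=True)
    if link_path.is_symlink():
        return
    # A relative target keeps the link valid if the start directory moves.
    target = os.path.relpath(song_path, link_path.parent)
    os.symlink(target, link_path)
```

A caller would first check `song_path.exists()`. If the file is there, only `link_into_album` is needed; otherwise the track is ripped and encoded, the MP3 is stored at `song_path`, and the link is created afterwards.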
The Protocol
The Rlamed binds to a port that is specified in its config file. When a TCP connection is made to this port, it understands the following commands (they are case-insensitive):
* getcpu number
This requests as many cpus as are stated in number. Rlamec normally requests 5 CPUs. The answer received is either
WAIT
This means there are currently no free CPUs; as soon as one becomes free it will be allocated. The other answer is
CPU ipaddr:port
There are as many lines of this format as there are free CPUs, but at most as many as were requested.
* getstatus
The answer to this command is a list of all CPUs and their status. The answer looks something like this:
CPU 192.168.1.1:9999@192.168.1.1:1224|Albumname|Singer's name|Songindex Songtitle
Which means that the CPU on 192.168.1.1:9999 is working for the client 192.168.1.1 (source port 1224 in its connection to the CPU broker) on song number Songindex of Albumname by Singer's name. You get one line per CPU. If the CPU has nothing to do and is not allocated, you will just see the IP address and port number followed by idle. If it is allocated but idle (the Rlamec is ripping for another CPU), then you will see
cpu:port@client:port|idle
If the client is currently ripping a song for this CPU, you will see "ripping index" at the end of the line.
* addcpu ipaddr:port
This adds the CPU at ipaddr:port to the pool of available CPUs. Rlamed then tries to allocate this CPU to a client that either has no CPUs or has fewer CPUs than it requested. This client receives a message like:
CPU 10.10.10.10:9999
and then gives it some work to do. All statusreceivers (see below) get a message like
CPU 10.10.10.10:9999|idle
* delcpu ipaddr:port
This removes the CPU from the pool of available CPUs. If the CPU is allocated, it is deleted as soon as the client that holds it releases it. After rlamed receives this message, the client that has allocated the CPU gets the same message, so that it can mark the CPU for release once it has finished its work. Afterwards the client sends a release message to the rlamed. Once the release message is received (or if the CPU was not allocated), all statusreceivers get the message forwarded.
* releasecpu ipaddr:port
This message can only be sent by a client that has a CPU allocated. It is sent either when there is no work left for a CPU, or when the delete message was sent to the client and the CPU has finished its work.
* sendstatusupdates
This turns the client that holds the connection to the rlamed into a statusreceiver. Whenever the Rlamed receives a performance update (see below), this client receives a message in the same format as with getstatus, but with three more fields: the bytes already processed, the total size in bytes, and the rate in bytes/second at which this CPU is processing.
* perfupdate ipaddr:port|bytesfinished|size|rate
This message is first sent by the Rlamec master process to set all fields. Later on the client's children send it with the size field empty, since the client does not know the size.
* quit
This quits the connection (as does closing the socket). Doing this will also release cpus the client might have allocated.
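
To make the line formats of the protocol more concrete, here is a small Python sketch that parses getcpu replies and status lines and builds a perfupdate message. It is not taken from Rlamed.pl or Rlamec.pl. The function names and the dictionary keys are assumptions for illustration, and the sketch does not try to interpret the individual fields beyond recognizing a lone idle.

```python
def split_addr(addr):
    """Split 'ipaddr:port' into (ipaddr, port)."""
    host, _, port = addr.strip().rpartition(":")
    return host, int(port)


def parse_getcpu_reply(lines):
    """Return the allocated CPUs; an empty list stands for WAIT."""
    cpus = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.upper() == "WAIT":
            return []
        keyword, _, addr = line.partition(" ")
        if keyword.upper() != "CPU":
            raise ValueError(f"unexpected reply line: {line!r}")
        cpus.append(split_addr(addr))
    return cpus


def parse_status_line(line):
    """Parse 'CPU cpu:port[@client:port]|field|...' into a dict."""
    line = line.strip()
    if line.upper().startswith("CPU "):
        line = line[4:]
    head, *fields = line.split("|")
    cpu, _, client = head.partition("@")
    return {
        "cpu": split_addr(cpu),
        "client": split_addr(client) if client else None,
        "idle": fields == ["idle"],
        "fields": fields,  # album, singer, song, ... left uninterpreted
    }


def format_perfupdate(cpu, bytes_finished, size, rate):
    """Build a perfupdate command; size=None leaves the size field empty."""
    host, port = cpu
    size_field = "" if size is None else str(size)
    return f"perfupdate {host}:{port}|{bytes_finished}|{size_field}|{rate}"


status = parse_status_line(
    "CPU 192.168.1.1:9999@192.168.1.1:1224|Albumname|Singer's name|3 Songtitle"
)
print(status["cpu"], status["client"], status["fields"])
print(format_perfupdate(("10.10.10.10", 9999), 1024, None, 512))
```

The parser keeps the fields after the address as a plain list, because the exact set of trailing fields differs between getstatus replies and the longer lines sent to statusreceivers.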
