README FILE FOR DIABLO TRANSIT FEEDER
---
DIABLO is a news transit and reader system. This readme file describes the transit side of things. You can operate the transit and reader sides separately or together, or just operate one or the other depending on your needs.
The transit side of Diablo is designed to transit news between one or more incoming feeds and one or more outgoing feeds. The transit side of Diablo is not designed to operate as a reader. However, it may be used to back a reader's spool for readers that fetch articles by message-id (which the reader side of Diablo does).
Since the transit portion of Diablo is not designed to support readers, an active file is not usually used with it ('active off' in diablo.config). If you want the transit portion of Diablo to act as the master article number assignment point that downstream sites are slaved to, turn ON feeder-side active file support in diablo.config and run the reader side, dreaderd, with the -x option to slave its article numbering to the master's. When turned on, the active file only affects how the Xref: header is generated by the transit side of Diablo. The transit side of Diablo will still transit articles even if none of the newsgroups are listed in the active file.
The feeder side of Diablo does not process control messages, so if you are mastering article numbers on the feeder, you must either run the reader on the same box in order to process control messages and keep the active file up to date, or periodically synchronize the active file from a remote source using dsyncgroups (taking care to use the proper options so you do not overwrite the begin/end/other article numbering parameters that Diablo uses to master Xref: lines).
The transit side of Diablo maintains a history file (the reader side does not). This means that the transit side is able to take multiple feeds in parallel without transiting duplicate articles. The reader side can handle duplicate articles only when the -x option is used on dreaderd. If the option is not used, dreaderd CANNOT handle duplicate articles being fed to it. It is often beneficial to run the feeder and reader on the same machine, with the feeder feeding the reader internally, even if the feeder is not going to be used as a (major) spool cache. It is more common to run the feeder and the reader on the same machine if the feeder is given a large enough spool to act as the backend cache for the reader. Or you may just want to run a feeder-only machine with no reader elements on it at all.
The transit side of Diablo (the 'feeder') is strictly for news transit and does not understand reader-related NNTP commands.
The Diablo spool, usually /news/spool/news, is maintained by the transit system. You must size the spool and run dexpire as appropriate to your needs. Diablo stores multiple articles per file in the spool in a two-tiered directory structure. Article files & directories should never be directly edited or removed, or you risk corrupting the reference data Diablo stores in the history file. You must use dexpire to free space in the spool. Since Diablo does not write out article files singly, it tends to be much more efficient than INN without CNFS. It is roughly on par with INN + CNFS, though I personally believe it is better.
Typically, anyone taking a full feed these days must dedicate a machine to it that is separate from the newsreader machine that your users use to read news.
The DIABLO transit system is designed to replace the dedicated newsfeeds machine and is designed to be a mostly hands-off affair once you get past configuring and stabilizing it. See TUNING_NOTES for machine configuration suggestions.
OS REQUIREMENTS
See TUNING_NOTES.
WHO SHOULD RUN DIABLO
If you need to run a USENET news feeding/transit system and/or a USENET newsreading system, Diablo may be for you.
WHERE TO GET DIABLO
http://www.openusenet.org/diablo/
REPORTING BUGS
Send the bug to: diablo-bugs@openusenet.org
Send non-bug stuff to: diablo-users@openusenet.org
NEWSGROUPS: news.software.nntp
MAILING LISTS: http://www.plig.net/mailman/listinfo/diablo-users
USE OF REALTIME FEEDS AND FEED DELAYS
If you have several outgoing feeds, you should consider using the
realtime, queueskip and startdelay options in dnewsfeeds. All of your
local and internal feeds should be realtime. Cheap external paths
to the internet can also be realtime. To reduce the cost of running
outgoing feeds over your internet transit, you may wish to weight
the feeds according to cost. For example, our MAE-WEST connection
is a lot cheaper than our MCI T3, so I run outgoing feeds with
MAE-WEST destinations in realtime and run outgoing feeds which go
via MCI in batch mode with a 10 second delay. This way the articles
may actually propagate to the more expensive destinations via other
means prior to my actually attempting to send them direct.
Likewise, if you have T1 and frame customers, it is usually cheaper
to supply them with a newsfeed yourself rather than force them to
go to someone over the internet. This way they are not eating your
transit bandwidth on newsfeeds. A realtime feed to those people is
best.
CATCHING UP AFTER BEING DOWN
The key item to monitor when catching up on incoming feeds after
being down for a while is the incoming article rate. Diablo will
generate a log line for every 1024 articles received that looks like
this:
Jun 24 11:03:59 news1 diablo[18153]: DIABLO uptime=7:46 arts=241.000K tested=0 bytes=1.842G fed=12.613M
You can calculate the article rate by looking at the delta activity
from two log lines that are around an hour apart from each other.
If the article rate is above 9 articles/sec, diablo is catching up
reasonably well. As of this writing, a full feed is around 5 articles/sec.
With a moderate number of incoming feeds, diablo can do around 30
articles/sec. If you have a huge number of incoming feeds that are
all in catchup, in-kernel filesystem locking will begin to interfere
with the history file lookups and updates. Diablo will be able to
maintain a reasonable history file write transaction rate, but the
lookup rate will suffer.
This causes diablo to catch up on articles first without appreciably
reducing the backlog at remote sites due to slow check-responses.
Once it passes a certain threshold, however, and the load on the
history file turns to mostly-read rather than read/write, the
transaction rate will increase dramatically and diablo will generally
be able to clean up the backlogs very quickly after that.
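The rate calculation described above can be sketched in a few lines of
Python. This is only an illustration and not part of Diablo: the
function names are hypothetical, the K/M/G suffixes are assumed to be
decimal (1000-based), "arts" is assumed to be a running total, and the
second log line in the usage note is made up for the example.

```python
import re
from datetime import datetime

# Multipliers for the K/M/G suffixes. Decimal scaling is an assumption;
# this README does not say how Diablo scales these counters.
SUFFIX = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}

# Matches the syslog timestamp and the arts= counter of a DIABLO status line.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s.*\bDIABLO\b.*"
    r"\barts=(?P<num>[\d.]+)(?P<suffix>[KMG]?)"
)


def parse_line(line, year=2000):
    """Return (timestamp, article_count) for one DIABLO status line.

    Syslog timestamps carry no year, so a placeholder year is supplied.
    """
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a DIABLO status line: %r" % line)
    when = datetime.strptime(
        "%d %s" % (year, m.group("stamp")), "%Y %b %d %H:%M:%S"
    )
    arts = float(m.group("num")) * SUFFIX[m.group("suffix")]
    return when, arts


def article_rate(earlier, later):
    """Articles per second between two status lines from the same run."""
    t0, a0 = parse_line(earlier)
    t1, a1 = parse_line(later)
    seconds = (t1 - t0).total_seconds()
    if seconds <= 0:
        raise ValueError("second line must be later than the first")
    return (a1 - a0) / seconds
```

Feeding it the log line shown above plus a hypothetical line logged one
hour later with arts=277.000K gives 36000 articles over 3600 seconds,
or 10 articles/sec, which is above the 9 articles/sec mark. The sketch
does not handle a restart of diablo or a wrap across midnight on
December 31 between the two lines.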
SPAMALIAS OPTION IN DNEWSFEEDS
Submitted by: uhclem @ nemesis.lonestar.org (Frank Durda IV)
Spamalias
Related to the "alias" command. Any originating server (not
transit servers) or username entry in the Path: header that
matches a "spamalias" wildcard declaration will cause that
message to not be propagated to any neighbors. Diablo
behaves as though some "alias" command matched that Path:
header on all outbound feeds.
Intended for use on transit-only servers (since in this
implementation, "spamalias" checks are too late to prevent
the message from being viewable from a local reader), the
"spamalias" parameter acts globally and must appear in the
GLOBAL section.
For example,
groupdef GLOBAL
spamalias badpornsite
spamalias xxxnilla
spamalias *.unresponsive.net
spamalias abuseisokay*
spamalias annoyizer
end
These entries would block messages with these Path: headers:
Path: someplace!elsewhere!okaysite!badpornsite!not-for-mail
Path: someplace!elsewhere!okaysite!notspammershonest!annoyizer
Path: someplace!okaysite!xxxnilla!seemelive
Path: someplace!elsewhere!okaysite!west-coast.unresponsive.net!me
Path: someplace!elsewhere!okaysite!east-coast.unresponsive.net!them
Path: someplace!goodplace!elsewhere!abuseisokay
Path: someplace!goodplace!abuseisokay.com!nobody
However, these messages would not be blocked:
Path: someplace!xxxnilla!elsewhere!west.unresponsive.net!goodplace!xyz
Path: someplace!elsewhere!badpornsite!goodplace!gooduser
Path: someplace!elsewhere!abuseisokay.com!elsewhere!innocentuser
Only the last and next-to-last elements of the Path: header are checked
by the "spamalias" command.
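The matching rule illustrated by the examples above can be sketched in
Python as follows. This is not Diablo's code: the function name is
hypothetical, and Python's fnmatch wildcards are used as a stand-in on
the assumption that they behave like Diablo's wildcards for simple
patterns such as these.

```python
from fnmatch import fnmatchcase


def spamalias_match(path_header, patterns):
    """Return (position, pattern) for the first match, or None.

    Position "A" is the next-to-last Path: element (originating site),
    position "B" is the last element (user). Earlier elements, i.e.
    transit servers, are never examined.
    """
    value = path_header.strip()
    if value.lower().startswith("path:"):
        value = value[len("path:"):]
    elements = [e.strip() for e in value.split("!") if e.strip()]

    candidates = []
    if len(elements) >= 2:
        candidates.append(("A", elements[-2]))
    if elements:
        candidates.append(("B", elements[-1]))

    for position, element in candidates:
        for pattern in patterns:
            # fnmatchcase is a stand-in for Diablo's own wildcard matcher.
            if fnmatchcase(element, pattern):
                return position, pattern
    return None


PATTERNS = ["badpornsite", "xxxnilla", "*.unresponsive.net",
            "abuseisokay*", "annoyizer"]
```

With these patterns, "Path: someplace!okaysite!xxxnilla!seemelive"
matches at position A, while
"Path: someplace!elsewhere!badpornsite!goodplace!gooduser" does not
match because badpornsite is only a transit element there.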
All too frequently there are sites that originate spam or
abuse and need to be blocked (mainly to punish a lack of
response to abuse/spam problems). Blocking everything that
carries such a site anywhere in its Path:, however, also
loses articles that were just passing through that site and
did not originate there. Once such an article has been
refused, your history database is poisoned, so you cannot
get the article via some other route. (If
you have limited redundant feeds, hoping an article will
show up via a less-tainted route may not be an option.)
Using "spamalias" provides reasonable blocking of specific
sites or spam/abuse signatures without blacking-out unintended
chunks of the USENET network.
When a "spamalias" match is made, an entry is logged to
news.debug, indicating if it was a position A (originating
site) or B (user) match. The entry in incoming.log for
such an article will show no sites to be fed.
Odd Things and Cautions about Spamalias:
If "spamalias" entries are used outside of the GLOBAL
section, they may slow Diablo because of the amount of
redundant work it ends up doing, and they may still act
globally or act on some but not all feeds. The "spamalias"
command is not designed to be used to filter subsets of
outbound feeds.
There is a potential to defeat "spamalias" if the blocked
site allows Path: preloading, but in three years of use,
the incidents of any abuser bothering to work around such
filtering have been virtually non-existent, and the number
of big sites that still allow preloading and don't take
their own action against spam/abuse is getting quite small.
When writing "spamalias" expressions that are intended to
match usernames, be cautious since some "dummy" usernames
appear in posts originating from many sites, likely in
addition to whatever it is you are trying to block. For
example, never block "not-for-mail" or "news", since these
are commonly used in place of actual user names.
Since the "spamalias" command tests both originating site
and originating user fields, make sure an entry written
for one field isn't busily zapping non-abusive posts that
happen to match on the other field.
WHEREIS COMMAND
Diablo allows an optional NNTP command 'WHEREIS' which returns
the location of an article on the local spool (filename, offset
and size). This option is useful for a reader accessing the
article from an NFS mounted spool. The use of NFS in Diablo
is strongly discouraged, but some people are forced to use
Diablo in an NFS-only environment.
200 news.example.com NNTP Service Ready
WHEREIS <3ba5b2d9.440788@news.tel.hr>
223 0 whereis <3ba5b2d9.440788@news.tel.hr> in \
/news/spool/news/D.00fe7ee6/B.036a offset 83825 length 1018
(line wrap added for this document)
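A reader-side client could pick such a reply apart roughly as shown
below. This is a sketch only: the names are hypothetical, the reply is
assumed to arrive as a single unwrapped line in exactly the layout of
the example above, and the bytes at the given offset are assumed to be
the article itself, which this README does not spell out.

```python
import re
from typing import NamedTuple


class ArticleLocation(NamedTuple):
    message_id: str
    path: str
    offset: int
    length: int


# Layout taken from the example reply above; other reply forms are not handled.
WHEREIS_RE = re.compile(
    r"^223\s+\d+\s+whereis\s+(?P<msgid><[^>]+>)\s+in\s+(?P<path>\S+)"
    r"\s+offset\s+(?P<offset>\d+)\s+length\s+(?P<length>\d+)\s*$"
)


def parse_whereis(reply):
    """Parse a single-line 223 WHEREIS reply into an ArticleLocation."""
    m = WHEREIS_RE.match(reply)
    if m is None:
        raise ValueError("unexpected WHEREIS reply: %r" % reply)
    return ArticleLocation(
        message_id=m.group("msgid"),
        path=m.group("path"),
        offset=int(m.group("offset")),
        length=int(m.group("length")),
    )


def read_article(location):
    """Read `length` bytes at `offset` from the spool file (read-only)."""
    with open(location.path, "rb") as spool_file:
        spool_file.seek(location.offset)
        return spool_file.read(location.length)
```

The spool file is opened read-only on purpose, since article files
must never be edited directly.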
