Flow Loader and Virtual Information Output
F.L.A.V.I.O. is a netflow collector data grapher. It gets data from netflow collector, loads it into a MySQL database and gets it back to graph daily, weekly, monthly and yearly charts. It shows traffic splitted by ports, as well as total statistics, and generates percentile 95 information suitable for billing in burstable services.
It has been written in perl using DBD, DBI and GD modules, so it's completely portable and can be run in many different platforms.Additionally it can be easily extended to support new features or to present data in different formats.
BASIC INFORMATION
Cisco Netflow Data Export feature is included in every Cisco router and switch
platform.
It's a lightweight data flow statistics exporting feature, from CPU usage
standpoint, that can be used to store data flow records statistics for future
analysis.
The data statistics are exported every second or so using udp to a listening
server running Cisco's Netflow Collector which generates binary or plain text
files with that information. Cisco offers Netflow Data Analyzer, an interactive
tool that allows data manipulation, suitable for occasional data review but
not designed for being run in batch processes for graphs generation.
Many people uses tools like MRTG or Cricket to take SNMP information and graph traffic, however SNMP is a CPU hog when getting information for every port in clobbered switches and routers. On the other hand, SNMP just provides global interface statistics and not per UDP and TCP port statistics as netflow does.
F.L.A.V.I.O. parses Netflow Collector data files to get flow statistics, aggregates them and loads them in a MySQL DataBase. Then it may use the stored data to draw per UDP and TCP daily, weekly, monthly and yearly charts as well as totalized charts. It stores charts in a directory tree structure that allows easy visualization.
F.L.A.V.I.O. also adds a complete drop-in replacement for netflow collector that provides:
- easier installation and configuration (no configuration file at all),
- on-the-fly collection and parsing mode,
- freely availability (both, as in free beer and free speech),
- complete integration with netflow_parse.pl.
INSTALLATION
For running F.L.A.V.I.O. you'll need a proper perl 5.0 installation and the next modules that can be found in CPAN archives:
DBI-1.21.tar.gz or newer
DBD-mysql-2.1013.tar.gz or newer
GD-1.38.tar.gz or newer
You also need to have MySQL installed and running in that server. Having Apache or any web server is desirable in order to access the charts remotely.
Steps
- Create a netflow database in MySQL using the next SQL sentence as root from mysql command line interface:
CREATE DATABASE netflow;
2. Create a netflow user and give him total permisions on netflow database:
USE mysql;
INSERT INTO user values ("localhost","netflow",password("whatever"),"N","N",
"N","N","N","N","N","N","N","N","N","N","N","N");
INSERT INTO db values ("localhost","netflow","netflow","Y","Y","Y","Y","Y","Y",
"Y","Y","Y","Y");
flush privileges;
3. Create a customers table in netflow database:
USE netflow;
CREATE TABLE customers (name varchar(30), ip varchar(15), username varchar(15),
password varchar(15), last_ip varchar(15), INDEX (ip));
4. Add your customers to customers table replacing customer_name and ip_address by the real data:
USE netflow;
INSERT INTO customers values("customer1_name","ip1_address","","","");
INSERT INTO customers values("customer1_name","ip2_address","","","");
INSERT INTO customers values("customer1_name","ip3_address","","","");
INSERT INTO customers values("customer2_name","ip4_address","","","");
INSERT INTO customers values("customer3_name","ip5_address","","","");
INSERT INTO customers values("customer3_name","ip6_address","","","");
For big customers with many consecutive addresses use this simplified format instead:
INSERT INTO customers values("customer4_name","initial_address","","","last_address");
where initial_address and last_address are the initial and the last address of that block respectively, assigned to that customer.
Next time you run netflow_parse.pl it will take care of these special records and convert them into normal one_ip_per_customer records.
5. Install perl DBI, DBD-mysql and GD modules following their instructions. Summarily untar each one and run the next command sequence for each one as root:
perl Makefile.PL
make
make test
make install
6. Untar FLAVIO-X.X.tar.gz in a directory. You'll get four perl files:
netflow_collector.pl
netflow_parse.pl
netflow_aggregate.pl
netflow_graph.pl
Look at the first lines of each script and change them if you want to use a different database name, username or password.
Decide if you'll use F.L.A.V.I.O.'s netflow_collector, Cisco's collector or OSU flow-tools collector and set $format accordingly in netflow_parse.pl.
Set $diff_as to 1 in netflow_aggregate.pl if you want traffic differentiated by Autonomous Systems in your weekly, monthly and yearly tables.
Set $diff_if to 1 in netflow_aggregate.pl if you want traffic differentiated by Interfaces in your weekly, monthly and yearly tables.
Also change $multiplier factor in netflow_graph.pl if your routers feature multisampling (i.e. they export one every N flows).
Optionally set $fast_mode to 1 in netflow_parse.pl to speed up parsing. It may create bigger tables but it will be significantly faster.
7a. If you're using Cisco's collector:
- Review your netflow collector configuration files and create at least two threads, one for incoming and one for outgoing traffic and store each of them in a different directory tree (required to be able to graph incoming versus outgoing traffic). Make sure you're using DetailASMatrix for aggregation type as data file parsing depends on aggregation scheme.
- Optionally you can make netflow collector use a single thread for incoming and outgoing traffic and pass "C" flag to netflow_parse.pl to ask it to read customers table to see if an IP is a valid internal IP. Of course, for this mode to work properly you need an up to date customers table as it will only record data for valid IP's on the table.
- F.L.A.V.I.O. is hardcoded to process 5 minutes files, so you must make sure that your threads are configured for exporting in periods of 5 minutes. Otherwise the resulting values will be wrong. If the period is different than 5 minutes, netflow_parse.pl will exit with an error.
- Aggregation format must be DetailASMatrix. If the Aggregation format is different than DetailASMatrix, netflow_parse.pl will exit with an error.
- Make sure you have your netflow collector files in an uncompressed ascii format. If they are binary files, transform them to ascii with cisco's bin_to_ascii tool. If they are compressed, use gunzip to uncompress them.
- Create a shell script with a simple for/next loop to pass each data file as an argument to netflow_parse.pl like:
for a in outgoing_data_dir/date/*; do ./netflow_parse.pl S $a; done for a in incoming_data_dir/date/*; do ./netflow_parse.pl D $a; done
7b. If you're using F.L.A.V.I.O.'s netflow_collector:
- Set @valid_routers in netflow_collector.pl to allow receiving flows from your real routers.
- Create a directory where to store flow files and error.log.
- Execute netflow_collector.pl in one of three ways: STANDARD which will left
flows in ascii format ready for batch processing from netflow_parse.pl;
ON-THE-FLY parsing mode (w/backup) to run netflow_parse.pl every time a new
flow file is created (once every five minutes by default) with "C" and the
flow file as arguments; or ON-THE-FLY wo/backup mode to run netflow_parse.pl.
with "C" and "-" arguments and pass flows through its standard input (flow
files are note created inside flows directory).
- Syntax
./netflow_collector.pl 9876 /var/flows/ ./netflow_parse.pl
(for on-the-fly w/backup parsing mode)
./netflow_collector.pl 9876 /var/flows/ ./netflow_parse.pl -
(for on-the-fly wo/backup parsing mode)
./netflow-collector.pl 9876 /var/flows/
(for batch parsing mode)
7c. If you're using OSU Flow-tools collector module:
- create a directory to store flow files.
- pass files uncompressed in ascii format to netflow_parse.pl using flow_print format 5.
- Run netflow_aggregate.pl to generate weekly, monthly and yearly tables, and run netflow_graph.pl to graph everything. Replace YEAR MONTH and DAY by proper date. Do not pad MONTH and DAY with left zeros. Replace data_root_directory accordingly. Create a cron job to run it once a day for the previous day's data.
./netflow_aggregate.pl YEAR MONTH DAY
cd /data_root_directory/
/path_to_FLAVIO/netflow_graph.pl YEAR_MONTH_DAY (i.e. 2002_4_17)
/path_to_FLAVIO/netflow_graph.pl WYEAR_WEEK (i.e. W2002_16) /path_to_FLAVIO/netflow_graph.pl MYEAR_MONTH (i.e. M2002_4) /path_to_FLAVIO/netflow_graph.pl YYEAR (i.e. Y2002)
Optionally you could ask netflow_graph.pl to aggregate by customers' names instead of ip addresses using "C" option
11. Set up Apache to look up the graphs in /data_root_directory/
12. Done!
F.L.A.V.I.O. WORKFLOW
netflow_collector.pl
Usage: netflow_collector.pl port path_to_results [path_to_netflow_parse.pl] [-]
port: UDP port to receive flow statistics from routers path_to_results: save results in this directory path_to_netflow_parse.pl:execute netflow_parse.pl for on-the-fly loading
to the database w/backup (optional) -: execute netflow_parse.pl for on-the-fly loading to the database
wo/backup (optional)
netflow_collector.pl receives flows from routers in a UDP port and exports them to ascii files. It's totally compatible with Cisco's NDE (Netflow Data Export) flows version 1, 5 and 7. It's able to call netflow_parse.pl every time it saves a flows file passing the flow file as an argument to it. i.e.:
./netflow_collector.pl 9999 /var/flows/ ./netflow_parse.pl
is similar to:
./netflow_collector.pl 9999 /var/flows
and
./netflow_parse.pl C /var/flows/fileXXXX.flw
where fileXXXX.flw is each of the created flow files.
When using the "-" (dash) mode, flow files are not created, output being directly piped to netflow_parse.pl standard input.
On-the-fly parsing modes won't work on platforms where perl is not allowed to fork new processes (for many reason being potential flows loosing one of them).
By default netflow_collector.pl will save flow files every 300 seconds. Change $period in netflow_collector.pl to modify the standard behaviour.
netflow_collector.pl will create an error.log file inside flows directory. It will drop warnings and errors into it.
netflow_parse.pl
Usage: netflow_parse.pl [S|D|C] file1 file2 file3 file4
[S|D|C] look for local IP in source/destination/customers_table
a single '-' (dash) instead of filenames reads flows from stdin
netflow_parse.pl generates a daily table for the day for which the flow statistics below, if it doesn't exist. It creates a table per day as the amount of daily data can be huge (more than 100,000 records per day for 100Mbps of peak traffic). It gets the data from the neflow collector files and loads records into the daily table.
netflow_aggregate.pl
Usage: netflow_aggregate.pl year month day
year in yyyy format
month & day without leading 0
netflow_aggregate.pl generates weekly, monthly and yearly tables for which the day passed as an argument to it belongs. It gets the data from the daily table, reduces it to its 1/6th for the weekly table, 1/12th for the monthly table and 1/24th for the yearly table. It also calculates the totals for each sample from the per port information and uploads them into the weekly, monthly and yearly tables, and create tables if they don't exist.
netflow_graph.pl
Usage: netflow_graph.pl table [C] [Customer_Name]
table in year_month_day format without left zeros for day table or
table in Wyear_week format without left zeros for week table or
table in Myear_month format without left zeros for month table or
table in Yyear format for year table
C: aggregate by customer name instead of IP address
Customer_Name: single customer mode (C flag required) (optional)
netflow_graph.pl gets data from a single table and draws a chart for that information. It draws a green line for output and a blue line for input, scalating Y and X axis accordingly. It adds a legend on the graph with the basic reference data and the 95th percentile information.
DATABASE LAYOUT
Customers table:
mysql> describe customers;
| Field | Type | Null | Key | Default | Extra | +----------+-------------+------+-----+---------+-------+
| name | varchar(30) | YES | | NULL | | | ip | varchar(15) | YES | MUL | NULL | | | username | varchar(15) | YES | | NULL | | | password | varchar(15) | YES | | NULL | | | last_ip | varchar(15) | YES | | NULL | |
+----------+-------------+------+-----+---------+-------+
mysql> show index from customers;
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment | +-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+---------+ | customers | 1 | ip_index | 1 | ip | A | 42 | NULL | NULL | | +-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+---------+
Daily tables:
mysql> describe 2002_4_12;
| Field | Type | Null | Key | Default | Extra | +-----------+----------------------+------+-----+---------+-------+
| ip | varchar(15) | YES | MUL | NULL | | | port | varchar(12) | YES | | NULL | | | starttime | int(10) unsigned | YES | | NULL | | | endtime | int(10) unsigned | YES | MUL | NULL | | | local | char(1) | YES | | NULL | | | octets | int(10) unsigned | YES | | NULL | | | intf | smallint(5) unsigned | YES | | NULL | | | auts | smallint(5) unsigned | YES | | NULL | |
+-----------+----------------------+------+-----+---------+-------+
mysql> show index from 2002_4_12;
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment | +-----------+------------+-------------------------------+--------------+-------------+-----------+-------------+----------+--------+---------+
| 2002_4_12 | 1 | ip_port_local_st_et_intf_auts | 1 | ip | A | NULL | NULL | NULL | | | 2002_4_12 | 1 | ip_port_local_st_et_intf_auts | 2 | port | A | NULL | NULL | NULL | | | 2002_4_12 | 1 | ip_port_local_st_et_intf_auts | 3 | local | A | NULL | NULL | NULL | | | 2002_4_12 | 1 | ip_port_local_st_et_intf_auts | 4 | starttime | A | NULL | NULL | NULL | | | 2002_4_12 | 1 | ip_port_local_st_et_intf_auts | 5 | endtime | A | NULL | NULL | NULL | | | 2002_4_12 | 1 | ip_port_local_st_et_intf_auts | 6 | intf | A | NULL | NULL | NULL | | | 2002_4_12 | 1 | ip_port_local_st_et_intf_auts | 7 | auts | A | NULL | NULL | NULL | | | 2002_4_12 | 1 | et_port | 1 | endtime | A | NULL | NULL | NULL | | | 2002_4_12 | 1 | et_port | 2 | port | A | NULL | NULL | NULL | |
+-----------+------------+-------------------------------+--------------+-------------+-----------+-------------+----------+--------+---------+
Any other table:
mysql> describe W2002_15;
| Field | Type | Null | Key | Default | Extra | +-----------+----------------------+------+-----+---------+-------+
| ip | varchar(15) | YES | MUL | NULL | | | port | varchar(30) | YES | | NULL | | | starttime | int(10) unsigned | YES | MUL | NULL | | | endtime | int(10) unsigned | YES | MUL | NULL | | | local | char(1) | YES | | NULL | | | octets | int(10) unsigned | YES | | NULL | | | intf | smallint(5) unsigned | YES | | NULL | | | auts | smallint(5) unsigned | YES | | NULL | |
+-----------+----------------------+------+-----+---------+-------+
mysql> show index from W2002_15;
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment | +----------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+---------+
| W2002_15 | 1 | ip_port_local_st_et | 1 | ip | A | NULL | NULL | NULL | | | W2002_15 | 1 | ip_port_local_st_et | 2 | port | A | NULL | NULL | NULL | | | W2002_15 | 1 | ip_port_local_st_et | 3 | local | A | NULL | NULL | NULL | | | W2002_15 | 1 | ip_port_local_st_et | 4 | starttime | A | NULL | NULL | NULL | | | W2002_15 | 1 | ip_port_local_st_et | 5 | endtime | A | NULL | NULL | NULL | | | W2002_15 | 1 | et_port | 1 | endtime | A | NULL | NULL | NULL | | | W2002_15 | 1 | et_port | 2 | port | A | NULL | NULL | NULL | | | W2002_15 | 1 | st_et | 1 | starttime | A | NULL | NULL | NULL | | | W2002_15 | 1 | st_et | 2 | endtime | A | NULL | NULL | NULL | |
+----------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+---------+
TIPS AND TRICKS
Flow-Tools files support
F.L.A.V.I.O has preliminary support for flow-tools since version 1.2.1. Netflow_parse.pl can parse and import records from files created from flow-tools using:
flow-cat <filename> | flow-print -f 5
Change Netflow parse $format variable to "FT" and $fixed_year to reflect current year.
Make sure your Flow-tools file format is 5 as this is hardcoded into FLAVIO.
Files parsing
Make sure your files have records belonging to just one day, otherwise you will end adding records from different days into the same daily table and messing everything up. The best way to ensure not having more than one day per file is doing file rotation at midnight.
netflow_parse.pl fast mode
Fast mode (set $fast_mode to 1 to enable it) avoids looking for existing records in daily tables in order to update them if the new potential record matchs with it. It may create bigger tables if there is more than one ascii source file for the same period and the same customer/ip (as when you get information from multiple routers in the same path) which is unlikely to happen. On the other hand it will considerably speed up things as it will avoid a SELECT and an UPDATE to the big daily tables.
"Read flows from STDIN" mode
netflow_parse.pl has the ability of reading flows in ascii format from STDIN. This mode avoids creating temporary files when converting flows from binary formats. It can be used for an on-the-fly database loading while uncompressing and converting files.
i.e.:
flow-cat <filename> | flow-print -f 5 | netflow_parse.pl C -
Some people say that this mode speeds up things, others say it doesn't. Your milleage may vary...
"Single customer" graphing mode
netflow_graph.pl now supports a customer name after the C flag to draw just graphs for a single customer. This allows to speed up things and even to generate on the fly graphs for a single customer.
i.e. for customer foo_isp:
netflow_graph.pl 2002_6_12 C foo_isp
Autonomous Systems and Interfaces Support
Since version 1.2.0 F.L.A.V.I.O. supports autonomous systems and interfaces parsing, recording, aggregating and graphing.
Just set $diff_as and/or $diff_if to 1 in netflow_aggregate and netflow_graph to generate records aggregated by Autonomous Systems and Interfaces and graph them accordingly.
Very large tables
If the amount of generated data looks excesive, verify that you have loaded your incoming traffic statistics passing "D" and your outgoing traffic statistics passing "S" parameters to netflow_parse.pl. Remember that netflow_parse.pl uses the D|S flag to determine which column to use for data aggregation (usually it will aggregate data for internal hosts). If you, mistakenly give it the wrong column, it will try to aggregate data per world host and that will usually result in 900% bigger tables.
bash: ./netflow_parse.pl: bad interpreter: No such file or directory
If you're getting "bad interpreter" error, this is likely due to your perl not located in /usr/bin/perl. In that case you can change the first line of the scripts to reflect your perl's location.
DBD::mysql::st execute failed: Table 'collector.2002_4_13' doesn't exist at ../scripts/netflow_graph.pl line 48. Unable execute query:DBI::db=HASH(0x298f8c)->err, DBI::db=HASH(0x298f8c)->errstr
This errors can reflect one of two things: either your database name, username or passwords are different to those defined in the scripts (you can change the lines with "my $database - my $username - my $password to reflect your values), or the table passed as an argument to netflow_parse.pl doesn't exist (may be you made a mistake on its name).
ADDITIONAL INFORMATION
Percentile 95 index is the 95% highest sample from the total day, week, month or year. It's calculated sorting the data, discarding the highest 5% spikes and taking the highest remaining burst. It's useful for billing in burstable situtations where the customer wants a commited available bandwidth plus some remaining burstable capacity. It's an alternative to billing for megabyte transferred.
Flavio Villanustre
flavio@geminis.myip.org
http://geminis.myip.org/
