| Up: Martijn's Homepage | |
| Prev: PostgreSQL stuff | Next: An rsync-able gzip |
If you prefer to read it as straight text, you can find it here. Also, at some point in the near future, most of this information will be merged into the Linux 2.4 Routing HOWTO.
Until then, this is a fairly good rundown of how Linux packet shaping works.
Linux 2.2 Packet Shaping HOWTO
Martijn van Oosterhout, kleptog@svana.org
v0.1, 25 Mar 2000
This document aims to help people discover how to configure and use the
packet shaping capabilities of the Linux 2.2 kernel.
Table of Contents
1. Introduction
2. Disclaimer
3. Related Documentation.
3.1. Feedback.
4. Overview of packet shaping.
5. The programs
5.1. Other requirements
6. Using 'tc'
6.1. Manipulating qdiscs
6.2. Manipulating classes
6.3. Manipulating filters
7. Types of schedulers
8. Types of filters
9. Examples of usage
9.1. Using the "fw" filter
9.2. Using the "route" filter
9.3. Using the "u32" filter
10. Copyright message
11. Acknowledgements
1. Introduction
This is the sum total of my discovered knowledge about the Linux 2.2 Packet
Shaping code. I wrote it because this information was hard to find: there is
surprisingly little documentation, and the examples are not very good.
I concede that documents describing how the system works do exist from a
programmer's point of view, but none of them describe it from the user's
point of view. Any corrections or suggestions are welcome.
The packet shaping code was mainly written by Alexey Kuznetsov,
<kuznet@ms2.inr.ac.ru>. He has an FTP site containing up-to-date versions of
the iproute2 software required to manipulate the packet shaping modules.
2. Disclaimer
I do not and cannot know everything there is to know about the Linux
network software. Please be warned that this document probably contains
errors. Please read any README files that are included with any of the
various pieces of software described in this document for more detailed
and accurate information. I will attempt to keep this document as
error-free and up-to-date as possible. Versions of software are current
as of the time of writing.
In no way do I or the authors of the software in this document offer
protection against your own actions. If you configure this software,
even as described in this document, and it causes problems on your
network, then you alone must carry the responsibility.
3. Related Documentation.
This document presumes you understand how to build a Linux kernel with
the appropriate networking options selected and that you understand
how to use the basic network tools such as ifconfig and route. If you
do not, then you should read the NET-3-HOWTO <NET-3-HOWTO.html> in
conjunction with this document as it describes these.
For a closer-to-the-kernel look at Packet Shaping and QoS in general, see
the Linux-QoS-HOWTO available at:
http://qos.ittc.ukans.edu/howto/index.html
For more information on Netlink sockets, see:
http://qos.ittc.ukans.edu/netlink/html
For an HTML version of the iproute2+tc notes, see:
http://snafu.freedom.org/linux2.2/iproute-notes.html
3.1. Feedback.
Please send any comments, updates, or suggestions to me,
<kleptog@svana.org>. The sooner I get feedback, the sooner I can
update and correct this document. If you find any problems with it, please
mail me directly, as I may miss information posted to mailing lists.
4. Overview of packet shaping.
Here is a useful comment from the include/linux/pkt_sched.h file in the
Linux kernel source:
/* "Handles"
---------
All the traffic control objects have 32bit identifiers, or "handles".
They can be considered as opaque numbers from user API viewpoint,
but actually they always consist of two fields: major and
minor numbers, which are interpreted by kernel specially,
that may be used by applications, though not recommended.
F.e. qdisc handles always have minor number equal to zero,
classes (or flows) have major equal to parent qdisc major, and
minor uniquely identifying class inside qdisc.
*/
Handles are written as major:minor. If either part is left out, it is
assumed to be 0. In some cases 0 is special and cannot be used as a major
number. The numbers are hexadecimal, so ABC is a valid major number; thus
3AB: and 45:B are both valid handles. Using :n as a handle does not work in
all cases.
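To make the hexadecimal major:minor notation concrete, here is a minimal sketch that prints the 32-bit value a handle stands for. It assumes the layout implied by the kernel comment above, with the major number in the upper 16 bits and the minor number in the lower 16 bits. The function name tc_handle_hex is hypothetical and is not part of tc; it expects an argument containing the colon.

```sh
#!/bin/sh
# Hypothetical helper (not part of tc): print the 32-bit value that a
# MAJOR:MINOR handle stands for. Both halves are read as hexadecimal,
# an empty half counts as 0, and the argument must contain the colon.
tc_handle_hex() {
    major=${1%%:*}    # text before the colon
    minor=${1#*:}     # text after the colon
    # Major number in the upper 16 bits, minor number in the lower 16.
    printf '0x%08X\n' $(( (0x${major:-0} << 16) | 0x${minor:-0} ))
}

tc_handle_hex 1:      # prints 0x00010000
tc_handle_hex 3AB:    # prints 0x03AB0000
tc_handle_hex 45:B    # prints 0x0045000B
```

Note that 1: and 1:0 give the same value, which is why a qdisc handle can be written either way.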
You can think of the packet shaping code as a huge table of nodes, with
the major numbers (1-FFFE) going down and the minor numbers (the
classes, 0-FFFF) going across. A qdisc is assigned to a whole major (row),
and within that row each class may have its own settings. There is one of
these tables for each device.
One of these qdiscs is attached to the root of the device. All packets that
go out of the device start at the zero-th class (column) of this qdisc
(row). Packets move between nodes by way of filters. Each node may have any
number of filters, and each filter has a priority. The packet is tested
against each of the filters in order of priority, and if one matches, the
packet moves to that filter's target.
If there are no filters attached to the current node, or all the filters
fail, then the packet is queued on that node. Depending on the qdisc
selected and the parameters set, the node may then send the packet straight
away, hold it for later or drop it altogether.
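The matching rule at a single node can be illustrated with a toy model. This is only a sketch of the behaviour described above, not how the kernel stores or evaluates filters: each filter is written as a hypothetical "PRIO MARK TARGET" line, and matching is done on a firewall mark as the "fw" filter does. The function name classify is hypothetical.

```sh
#!/bin/sh
# Toy model only (not kernel code). Each line on stdin is
# "PRIO MARK TARGET" for one filter on the current node. A packet with
# firewall mark $1 is tested against the filters in ascending priority
# order; the first match moves it to TARGET. If nothing matches, it stays
# queued on the current node $2.
classify() {
    mark=$1
    node=$2
    target=$(sort -n | awk -v m="$mark" '$2 == m { print $3; exit }')
    echo "${target:-$node}"
}

filters='5 2 1:2
1 1 1:1'
printf '%s\n' "$filters" | classify 1 1:0    # prints 1:1
printf '%s\n' "$filters" | classify 7 1:0    # prints 1:0 (no filter matched)
```

In the real code the packet would then continue from the target node, testing that node's filters in turn.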
Each class also has a parent. This parent seems to have various meanings.
For the CBQ scheduler, it specifies the class it may steal bandwidth from if
it exceeds its own. When a class is deleted, all its children are deleted
also, even (I believe) if they are referenced by filters. Note that it is
possible to delete all the classes attached to a qdisc. I have not yet
worked out how to delete such a qdisc.
Note that not all qdiscs have classes. CBQ (Class Based Queueing) does,
and there a child may borrow bandwidth from its parents. TBF (Token Bucket
Filter), however, has no classes: it simply shapes all the traffic passing
through it according to its parameters. So only the major is used and no
classes can be created.
5. The programs
To manipulate the packet shaping modules you need the program 'tc',
which is part of the iproute2 package. The current version is always
available on Alexey's FTP site mentioned above. As of this writing the latest
version is iproute2-2.2.4-now-ss000305.tar.gz. Most distributions package
it, but it is not usually installed by default. All the testing
for this document was done with the 991023-2 version of the Debian package
'iproute'.
To use some of the filters you may need other programs to configure
other parts of the networking code to set the appropriate flags. For
example, for the 'fw' filter you will need the ipchains package, and for the
'route' filter you will need the 'ip' command, which is also part of the
iproute2 package. The use of these commands is not covered here, though
examples will be given when appropriate.
5.1. Other requirements
You will need to be able to compile your own kernel to create the necessary
modules. In the kernel there is a whole menu under the option
"QoS and/or fair queueing" which you will need most of. I generally compile
all the listed modules as modules so I can play with them all at will.
Module auto-loading for these options does work, so if your modutils is
configured correctly you won't even have to load these modules manually.
Also, under the networking options you WILL need the CONFIG_RTNETLINK
(Routing messages) option. It is hidden under the CONFIG_NETLINK
(Kernel/User netlink socket) option in the Networking menu. The 'ip' tool
needs this option as well.
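Before building, it can save a reboot to check that the options are actually set in your kernel configuration. A minimal sketch, assuming the usual .config format where enabled options appear as NAME=y or NAME=m; the function name check_config is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: report which of the given options are enabled
# (=y or =m) in a kernel .config file. Returns non-zero if any option
# is missing or not set.
check_config() {
    cfg=$1
    shift
    missing=0
    for opt in "$@"; do
        if grep -q "^$opt=[ym]\$" "$cfg"; then
            echo "ok      $opt"
        else
            echo "MISSING $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_config /usr/src/linux/.config CONFIG_NETLINK CONFIG_RTNETLINK CONFIG_NET_SCHED`. The path is an assumption about where your kernel source lives, and CONFIG_NET_SCHED is assumed to be the option behind the "QoS and/or fair queueing" menu; CONFIG_NETLINK and CONFIG_RTNETLINK are the options named above.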
6. Using 'tc'
'tc' can be a fairly hard program to use. The user-space program does a
little syntax checking and then passes the request to the kernel. The kernel
sends back only a single integer indicating success or failure. So unless you
made a spelling mistake, your errors will generally be of the form:
RTNETLINK answers: No such file or directory
RTNETLINK answers: File exists
RTNETLINK answers: Invalid argument
The first generally means you referenced a handle that does not exist. The second
generally means you tried to add something whose handle was already in use.
The last is the catch-all error that generally means "Something went wrong".
There is usually no indication of what went wrong, and often only much
re-reading of the help and trial-and-error will help you here. Part of the
reason for writing this document is to save you such agony.
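Since the kernel gives so little feedback, a small wrapper can at least attach the usual meaning to each of the three messages listed above. This is a sketch: the function name tc_explain and the hint wording are hypothetical and reflect this document's reading of the errors, not anything tc itself prints.

```sh
#!/bin/sh
# Hypothetical wrapper: run a tc (or ip) command; if it fails, print its
# output plus the usual meaning of the RTNETLINK error (hints on stderr).
tc_explain() {
    out=$("$@" 2>&1)
    status=$?
    if [ -n "$out" ]; then
        printf '%s\n' "$out"
    fi
    if [ "$status" -ne 0 ]; then
        case $out in
            *"No such file or directory"*)
                echo "hint: you referenced a handle that does not exist" >&2 ;;
            *"File exists"*)
                echo "hint: that handle is already in use" >&2 ;;
            *"Invalid argument"*)
                echo "hint: something is wrong; re-read the help for this command" >&2 ;;
        esac
    fi
    return "$status"
}
```

For example, `tc_explain tc class del dev eth1 classid 1:1` behaves like the plain command but adds a hint line when it fails, and it keeps tc's exit status so it can still be used in scripts.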
6.1. Manipulating qdiscs
Qdiscs are added, deleted and modified using the commands beginning with
'tc qdisc'. For example, to add a qdisc with major 1 as the root of device
eth1, you would use the following command:
tc qdisc add dev eth1 handle 1: root cbq bandwidth 100Mbit avpkt 1000 mpu 64
------------ -------- --------- ---- --- -----------------------------------
 1            2        3         4    5   6
1. This part says you want to add a qdisc.
2. The device to add it to. Required.
3. This is the handle (major) you want to give it. If you don't specify, one
will be assigned for you, starting at around 8000 (remember, hex) and
going up. The minor number must be zero.
4. This means to attach to the root of the device. Only one qdisc may be the
root. Alternatives are "ingress" (for when a packet comes in) and "parent
CLASS" (which is specifying the parent of this qdisc). Each qdisc can
only be the parent of one other class, so the qdiscs form chains hanging
off the "root" and "ingress" nodes. This field is required.
5. This field specifies the qdisc to attach to this major.
There are many schedulers to choose from.
6. This specifies the parameters used to initialise the :0 class that is
automatically created when you create the qdisc. See the help for
that qdisc for more details.
After executing the above command you have a CBQ qdisc on major 1, with a class
with handle 1:0 using the CBQ parameters given. This class 1:0 is attached as the
root of device eth1. To delete it again you merely have to execute the command:
tc qdisc del dev eth1 root
Note that you only have to specify that it is the root. To delete a
non-root qdisc you need only specify the parent. When a qdisc is
deleted, all its constituent classes are also deleted.
To view all the currently defined qdiscs, use the command:
tc qdisc show [dev DEVICE]
The device is optional. If omitted, all devices are listed.
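When experimenting it helps to be able to return a device to a clean state before re-running a setup. A minimal sketch built only from the delete and show commands above; the function name reset_root is hypothetical, and setting TC=echo makes it print the commands instead of running them (running them for real needs root).

```sh
#!/bin/sh
# Hypothetical helper: delete a device's root qdisc (which also deletes
# that qdisc's classes), then show the qdiscs now on the device.
# Set TC=echo to print the commands instead of running them.
TC=${TC:-tc}

reset_root() {
    dev=$1
    # This fails if there is no root qdisc yet; that is fine here.
    $TC qdisc del dev "$dev" root 2>/dev/null || true
    $TC qdisc show dev "$dev"
}
```

Calling `reset_root eth1` at the top of a shaping script means the script can be run repeatedly without hitting "RTNETLINK answers: File exists".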
6.2. Manipulating classes
Once you have created the qdiscs, you will probably want to add classes to them to
represent the various types of data you wish to shape. For example, with the above qdisc,
to create a class that is held to its rate (1Mbit here) no matter what, you would use the
following command:
tc class add dev eth1 parent 1: classid 1:1 cbq bandwidth 2Mbit avpkt 1000 prio 1 rate 1Mbit maxburst 10 bounded isolated
------------ -------- --------- ----------- --- -------------------------------------------------------------------------
 1            2        3         4           5   6
1. This part indicates you wish to add a class.
2. The device to add it to. Required.
3. The parent of this class. This is different from the parent of a qdisc.
This mainly indicates the major this class is in, but it has special
meanings for some schedulers. For example, for CBQ it is the class that
this class may steal bandwidth from if required. This class will also be
deleted if its parent is deleted. Required.
4. The classid that you want this new class to have. Since the parent must
always have the same major number as the class itself, you are allowed to
leave off the major number and just put :n. Required.
5. The scheduler for this class, which must be the same as that of the parent
qdisc. Required.
6. The parameters to the scheduling algorithm for this class. See the
appropriate section for more information.
That command creates a new class 1:1 whose parent is 1:0, using CBQ with
the given parameters. However, as things stand this class will not be used,
because no packets are being sent to it yet. Meanwhile, if you want
to delete a class, you use the following command:
tc class del dev eth1 classid 1:1
Only the classid is required to delete a class. You may not delete a class
which is the parent of another class; you must delete the child classes
first. It is usually faster to delete the root qdisc, since that has the
effect of deleting all the sub-classes.
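If you do want to delete classes one at a time, children have to go before their parents. A sketch of working out that order with the standard tsort utility, assuming you keep your own list of "PARENT CHILD" class pairs (this does not query the kernel). The function name class_del_order is hypothetical, and it only prints the commands.

```sh
#!/bin/sh
# Hypothetical helper: print "tc class del" commands so that children
# are removed before their parents. Input lines on stdin are
# "PARENT CHILD" class ids kept by you; nothing is read from the kernel.
class_del_order() {
    dev=$1
    tsort |                                  # parents before children
    awk '{ l[NR] = $0 } END { for (i = NR; i > 0; i--) print l[i] }' |
    grep -v -e ':0$' -e ':$' |               # skip the qdisc's own :0 class
    while read -r id; do
        echo "tc class del dev $dev classid $id"
    done
}
```

For example, `printf '1:0 1:1\n1:1 1:10\n1:1 1:11\n' | class_del_order eth1` prints deletions for 1:10 and 1:11 before 1:1.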
To show all the classes that belong to a particular device, use the
following command:
tc class show dev DEVICE
Don't believe the help when it says the device is optional. If you leave it
out, it doesn't work. Something that is quite useful is the -s switch when
showing the classes (tc -s class show dev DEVICE). It lists each class
together with the number of packets transmitted or dropped and other
information about the current state of that class.
6.3. Manipulating filters
Now that you've set up all your classes, you want the data to actually be
sent to the right classes. For this you need filters that match the data you
want to shape. One of the simplest is the "fw" filter, which filters on the
basis of the mark attached by any part of the firewall. For example, the
simplest such rule would be:
tc filter add dev eth1 protocol ip parent 1: prio 1 handle 1 fw classid 1:1
------------- -------- ----------- --------- ------ -------- -- -----------
 1             2        3           4         5      6        7  8
1. Indicates you want to add a filter.
2. To device eth1. Required.
3. The protocol to filter. At least for this filter type it is required.
4. The parent is the class that this filter will be attached to. The filter
will also be automatically deleted when the parent is. The parent must
exist. This field is required.
5. Means that this filter will be checked before filters of priority greater
than 1. The priority is optional and defaults to one.
6. This is the handle. The handle means different things for different types
of filters. For the "fw" filter, it is the mark the packet must have
been given by the firewall code. The handle is required.
7. Indicates that this is an "fw" type filter. Required.
8. This field is common to all filter types. It indicates the class to
go to if the packet matches the filter. This field is required. The
target need not exist yet.
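Because the handle of an "fw" filter is simply the firewall mark, several marks can be mapped to several classes by generating one filter per mark. A sketch under assumptions: the function name fw_filters is hypothetical, the parent 1: and priority 1 are fixed purely for illustration, and the commands are printed rather than run so you can inspect them first.

```sh
#!/bin/sh
# Hypothetical helper: print one "fw" filter command per firewall mark.
# Each line on stdin is "MARK CLASSID"; the mark becomes the filter's
# handle. Parent 1: and prio 1 are fixed here purely for illustration.
fw_filters() {
    dev=$1
    while read -r mark classid; do
        [ -n "$mark" ] || continue    # skip blank lines
        echo "tc filter add dev $dev protocol ip parent 1: prio 1 handle $mark fw classid $classid"
    done
}
```

For example, `printf '1 1:1\n2 1:2\n' | fw_filters eth1` prints two filter commands; piping that output into `sh` would apply them.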
When you delete filters, only the priority is required, though you will also need to
specify the parent if it is not the default, which appears to be 1:0. For example:
tc filter del dev eth1 parent 1:0 prio 5
There is no command to list all the currently installed filters. However,
the following command will list all filters attached to a particular node.
If the optional part is omitted, the filters on the root are listed.
tc filter show dev DEVICE [parent CLASSID | root]
Again, the device is not optional in this case.
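Since there is no single command for all filters, one workaround is to ask about each class in turn. A sketch under a loudly stated assumption: it expects each line of `tc class show` output to look like "class cbq 1:1 parent 1: ...", with the class id as the third field, which may differ between tc versions. The function name show_all_filters is hypothetical; TC can be overridden for testing.

```sh
#!/bin/sh
# Hypothetical helper: list the filters attached to every class of a
# device by running "tc filter show" once per class.
# ASSUMPTION: "tc class show" lines look like "class cbq 1:1 parent 1: ...",
# i.e. the class id is the third field.
TC=${TC:-tc}

show_all_filters() {
    dev=$1
    $TC class show dev "$dev" | awk '$1 == "class" { print $3 }' |
    while read -r id; do
        echo "# filters attached to $id"
        $TC filter show dev "$dev" parent "$id"
    done
}
```

Running `show_all_filters eth1` should then give one block of output per class.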
7. Types of schedulers
[Would list the various types of schedulers available, what the differences are.]
In the kernel source, in the net/sched directory, there are source files named sch_*.c.
These files are the source to the schedulers. Each of these files contains a large header
comment describing how the scheduler works (though not how to configure it).
8. Types of filters
[List the various types of filters, how to configure them and what they do.]
9. Examples of usage
To demonstrate the various ways you can use the Packet Shaping code, I will
set up a scenario and show various ways of handling it. Through our
interface eth1 we reach a gateway, HostB, and behind it a machine, HostA. The
eth1 link is a 100Mbit interface, but we want to limit all packets to HostA
to 10Mbit. Below I show three ways of doing this.
9.1. Using the "fw" filter
The "fw" filter relies on the firewall tagging the packets to be shaped. So,
first we set up the firewall to tag them:
ipchains -I output -d HostA -m 1
Now all packets to that machine are tagged with the mark 1. Next we build the
packet shaping rules that actually shape the packets. First we create a CBQ
qdisc that covers the whole device and attach it to the root. Note that the
qdisc attached to the root should always cover the whole bandwidth of
the device, or you will simply lose the leftover bandwidth.
tc qdisc add dev eth1 handle 1: root cbq bandwidth 100Mbit avpkt 1000 mpu 64
The avpkt parameter is the average packet size; 1000 is a good estimate. The
mpu parameter is the minimum packet size; 64 is usually used here. Both are
generally good defaults.
We now have a CBQ qdisc covering all the traffic. Next we need to create the
class for the data going to that host:
tc class add dev eth1 parent 1: classid 1:1 cbq bandwidth 10Mbit avpkt 1000 prio 1 rate 10Mbit bounded isolated
The classes in CBQ have many more options. What this command basically does
is create a class which is limited to 10Mbit and may not borrow bandwidth
from any other class (bounded), nor may it lend bandwidth to other classes
(isolated).
Now we just need to indicate that we want the packets that are tagged with
the mark 1 to go to class 1:1. This is accomplished with the command:
tc filter add dev eth1 protocol ip parent 1:0 prio 1 handle 1 fw classid 1:1
This should be fairly self-explanatory: attach to the 1:0 class a filter
with priority 1 that sends all packets marked with 1 by the firewall to
class 1:1.
That's all there is to it! This is (IMHO) the easy way; I think the other
ways are harder to understand.
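The steps of this section can be collected into one script with a matching teardown. This is a minimal sketch using exactly the commands above: DEV and HOST are placeholders, the function names are hypothetical, and DRY_RUN=1 prints the commands instead of running them (running them for real needs root). The teardown assumes that `ipchains -D` with the same rule specification removes the marking rule.

```sh
#!/bin/sh
# Section 9.1 as one script: limit traffic to HOST on DEV to 10Mbit
# using an ipchains mark and the "fw" filter.
# DEV and HOST are placeholders; DRY_RUN=1 prints instead of running.
DEV=${DEV:-eth1}
HOST=${HOST:-HostA}

run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

shape_fw() {
    # Tag outgoing packets to HOST with firewall mark 1.
    run ipchains -I output -d "$HOST" -m 1
    # Root CBQ qdisc covering the full 100Mbit of the device.
    run tc qdisc add dev "$DEV" handle 1: root cbq bandwidth 100Mbit avpkt 1000 mpu 64
    # Class 1:1: 10Mbit, may not borrow (bounded) or lend (isolated).
    run tc class add dev "$DEV" parent 1: classid 1:1 cbq bandwidth 10Mbit avpkt 1000 prio 1 rate 10Mbit bounded isolated
    # Send packets carrying mark 1 to class 1:1.
    run tc filter add dev "$DEV" protocol ip parent 1:0 prio 1 handle 1 fw classid 1:1
}

unshape_fw() {
    # Deleting the root qdisc removes its classes and their filters too.
    run tc qdisc del dev "$DEV" root
    # Remove the marking rule using the same rule specification.
    run ipchains -D output -d "$HOST" -m 1
}
```

Running `DRY_RUN=1 shape_fw` (after sourcing the script) is a cheap way to check the command lines before applying them.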
9.2. Using the "route" filter
This filter classifies packets based on the results of the routing table
lookup. When a packet traversing the classes reaches a node that has a
"route" filter, the filter splits the packets up based on information in
the routing table. First, as above, we create the qdisc and the class:
tc qdisc add dev eth1 handle 1: root cbq bandwidth 100Mbit avpkt 1000 mpu 64
tc class add dev eth1 parent 1: classid 1:1 cbq bandwidth 10Mbit avpkt 1000 prio 1 rate 10Mbit bounded isolated
From here on I'm following the example given in the Advanced Linux Networking HOWTO.
tc filter add dev eth1 parent 1:0 protocol ip prio 100 route
Here we add a route filter with priority 100 onto the parent node 1:0. When
a packet reaches this node (which, since it is the root, will happen
immediately), the filter consults the routing table and, if a route matches,
sends the packet to the class given in that route. Then, to finally kick
it into action, you add the appropriate routing entry:
ip route add HostA via HostB flow 1:1
(Strangely, though I think I've done everything in the example, this doesn't
seem to work for me. I get an error that goes:
Error: either "to" is duplicate, or "flow" is a garbage.
Someone who knows will have to comment on this.)
9.3. Using the "u32" filter
The "u32" filter classifies packets directly on their contents. Thus it can
filter based on source or destination addresses or ports. It can filter
based on the TOS and other truly bizarre fields. It does this by taking a
specification of the form [offset/mask/value] and applying that to all the
packets. Fortunately you can use symbolic names, much as with tcpdump.
To begin with you create the classes as in the previous examples:
tc qdisc add dev eth1 handle 1: root cbq bandwidth 100Mbit avpkt 1000 mpu 64
tc class add dev eth1 parent 1: classid 1:1 cbq bandwidth 10Mbit avpkt 1000 prio 1 rate 10Mbit bounded isolated
Then you just add the "u32" filter to make it work.
tc filter add dev eth1 parent 1:0 protocol ip prio 1 u32 match ip dst HostA flowid 1:1
That's all there is to it.
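To see what the symbolic form hides, the sketch below turns a destination address (optionally with a /prefix) into the raw offset/mask/value form, written in tc's `match u32 VALUE MASK at OFFSET` syntax. It relies on the IPv4 destination address being the 32-bit word at byte offset 16 of the IP header; `match ip dst ...` should be equivalent to the raw match it prints. The function name u32_dst_match is hypothetical and it assumes a plain dotted-quad address without leading zeros.

```sh
#!/bin/sh
# Hypothetical helper: print the raw u32 match equivalent to
# "match ip dst A.B.C.D[/LEN]". The IPv4 destination address is the
# 32-bit word at byte offset 16 of the IP header.
u32_dst_match() {
    addr=${1%/*}
    case $1 in
        */*) len=${1#*/} ;;
        *)   len=32 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $addr                 # split the dotted quad into $1..$4
    IFS=$old_ifs
    value=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    mask=$(( (0xffffffff << (32 - len)) & 0xffffffff ))
    printf 'match u32 0x%08x 0x%08x at 16\n' $(( value & mask )) "$mask"
}

u32_dst_match 192.168.1.10      # prints match u32 0xc0a8010a 0xffffffff at 16
u32_dst_match 192.168.1.0/24    # prints match u32 0xc0a80100 0xffffff00 at 16
```

The mask is what lets a single u32 rule cover a whole network rather than one host.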
10. Copyright message
The Packet-Shaping-HOWTO, a guide to software supporting packet shaping
for Linux. Copyright (c) 2000 Martijn van Oosterhout.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or (at
your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the:
Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139,
USA.
11. Acknowledgements
[List various people]
END