IXP Switch Wishlist Draft
- Date: Wed, 2 May 2001 13:07:16 +0100 (BST)
RIPE European Internet Exchange (EIX) Working Group
Internet Exchange Point Switching "Wishlist"
Mike Hughes mike@localhost
London Internet Exchange
April 2001
ABSTRACT:
---------
At the RIPE meeting held in Amsterdam in February 2000, a number of
participants agreed that the group should produce a "wishlist" to guide
equipment manufacturers when producing boxes aimed at the core switching
market. Over the following months, ideas were collected from the EIXP
community to form the basis of this document.
In Europe, most Internet Exchange Points use a shared switch fabric to
which the participants connect. Organisations then arrange peering via
bi-lateral peering agreements. It is not compulsory for every
participant to peer with every other participant (an arrangement known
as multi-lateral peering).
Once two participants agree to peer, they set up BGP4 sessions between
their routers connected to the Exchange to exchange routes and traffic.
In the majority of cases, the Exchange Point operator does not become
involved in the routing of any traffic across the Exchange, choosing
instead to leave this to the participants.
For this reason, switched Ethernet has become one of the most common
choices for Exchange Point media. The main reasons behind this are:
* Cost effectiveness
* Simplicity of setup
* Can use standard CAT5 wiring - easy to implement and maintain
* Interfaces available across a wide range of platforms
With the growth of the Internet, more and more traffic is being routed
to Internet Exchange Points, and the importance of IXPs has grown in
line with this, especially in Europe where private peering is less
common than in North America.
The IXP operators feel that having the right tools and features
implemented in the equipment they deploy will play an important part in
scaling Ethernet technology to meet the demands placed upon Exchange
Points.
This is an informational document to outline the various features which
IXPs would like to see implemented in core Ethernet Switching products.
SECURITY FEATURES:
------------------
a) Control of dynamic MAC learning
----------------------------------
Currently, switches are provided with two options, either statically
configured or dynamically learned forwarding information.
Exchange Points like to monitor and control how many MAC addresses are
connected to a participant's port. The XP operator generally does not
desire ad-hoc extensions connected to their network. The common way of
managing this is to enforce a "router-only" or "limited MAC address"
rule.
At present, this is either enforced by statically configuring forwarding
information, or left unenforced and policed by counting the number of
MAC addresses learned on each port and taking action against offenders.
Static configuration of forwarding information is a somewhat inelegant
option, as this increases configuration overhead, and decreases
flexibility, especially in case of emergencies.
We propose a "maximum learning" limit, configurable on a per-port
basis. In this way, operators can configure participants' ports
according to their house rules, but retain the flexibility of dynamic
learning.
This feature would also have to include a "first in-last out" lockdown
facility, to avoid forwarding information for valid addresses being
overwritten by addresses in excess of the exchange's house rules.
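As an illustration only, the proposed behaviour could be sketched as
follows. This is a minimal model, not any vendor's implementation; the
class name PortLearner and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PortLearner:
    """Dynamic MAC learning on one port, capped at max_macs addresses.

    Hypothetical model of the "maximum learning" limit with lockdown.
    """
    max_macs: int
    learned: list = field(default_factory=list)  # in order of first sighting
    violations: int = 0                          # frames seen over the limit

    def learn(self, mac: str) -> bool:
        """Return True if frames from this source MAC may be forwarded."""
        mac = mac.lower()
        if mac in self.learned:
            return True
        if len(self.learned) < self.max_macs:
            self.learned.append(mac)
            return True
        # Limit reached: lock down. Entries learned earlier are never
        # overwritten by addresses in excess of the limit.
        self.violations += 1
        return False
```

The violations counter is the kind of event that the logging described
in section (e) would report.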
b) Disable acting on STP BPDU information
-----------------------------------------
Many exchange operators currently deploy Spanning Tree Protocol (STP) in
networks which contain redundant links/full meshing.
There is, however, a danger presented by STP information leaked from a
participant's network. The participant may have connected a poorly
configured switch/router product, and may be leaking their STP
information into that of the exchange.
We would wish to see a configurable option to allow STP information to
be ignored, and filtered at the port, on a per port basis.
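A rough sketch of such a per-port filter is given below. It assumes that
BPDUs are recognised solely by the IEEE 802.1D bridge group destination
address (01:80:C2:00:00:00); vendor-specific STP variants that use other
addresses are not covered. The function names are hypothetical.

```python
# IEEE 802.1D bridge group address, used as the destination of STP BPDUs.
STP_BRIDGE_GROUP = bytes.fromhex("0180c2000000")


def is_stp_bpdu(frame: bytes) -> bool:
    """True if the Ethernet frame is addressed to the bridge group address."""
    return len(frame) >= 6 and frame[:6] == STP_BRIDGE_GROUP


def accept_on_port(frame: bytes, bpdu_filter: bool) -> bool:
    """Drop BPDUs on ports where the (hypothetical) filter option is set."""
    return not (bpdu_filter and is_stp_bpdu(frame))
```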
c) Wire-speed ACL-type filtering based on L3 header info
--------------------------------------------------------
Switches should be able to look into the layer 3 information of a
packet, and selectively monitor, or filter, at wire speed, based on
certain layer 3 criteria.
d) ARP/Broadcast snooping and control
-------------------------------------
Many exchange points insist on participants using IP addresses assigned
to them by the exchange operator. It is desirable for the operator to
be able to monitor/restrict "off-net" ARP.
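The check involved is simple, as the following sketch suggests. It
assumes IPv4-over-Ethernet ARP and uses the documentation prefix
192.0.2.0/24 as a stand-in for the exchange's peering LAN; the function
names are hypothetical.

```python
import ipaddress
import struct

# Assumption: stand-in for the address block assigned by the exchange.
PEERING_LAN = ipaddress.ip_network("192.0.2.0/24")


def arp_addresses(arp: bytes):
    """Return (sender IP, target IP) from an ARP payload (no Ethernet header)."""
    htype, ptype, hlen, plen, _oper = struct.unpack("!HHBBH", arp[:8])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        raise ValueError("not IPv4-over-Ethernet ARP")
    # Layout after the 8-byte header: SHA(6) SPA(4) THA(6) TPA(4).
    sender = ipaddress.ip_address(arp[14:18])
    target = ipaddress.ip_address(arp[24:28])
    return sender, target


def is_off_net(arp: bytes, lan=PEERING_LAN) -> bool:
    """True if the sender or target address lies outside the peering LAN."""
    sender, target = arp_addresses(arp)
    return sender not in lan or target not in lan
```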
As Ethernet is a broadcast medium, broadcast storms have been known to
bring exchanges to their knees, affecting the forwarding abilities of
both the exchange's switches, and the participants' routers.
Monitoring/rate limiting/control of Ethernet broadcast frames is
desirable.
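One possible form of rate limiting is a per-port token bucket, sketched
below. This is an assumption about how a limit might be expressed, not a
description of any existing product; the class name and parameters are
hypothetical, and time is passed in explicitly to keep the example
deterministic.

```python
class BroadcastLimiter:
    """Token bucket for broadcast frames received on one port."""

    def __init__(self, rate_fps: float, burst: int):
        self.rate = rate_fps        # sustained broadcast frames per second
        self.burst = burst          # maximum back-to-back frames allowed
        self.tokens = float(burst)
        self.last = 0.0             # time of the previous frame, in seconds
        self.dropped = 0            # count available for monitoring/logging

    def allow(self, now: float) -> bool:
        """Return True if a broadcast frame arriving at time now may pass."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        self.dropped += 1
        return False
```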
Most exchanges also forbid the speaking of interior routing protocols
across their peering network. Since these take the form of broadcast
frames on Ethernet, broadcast control would help monitor this type of
incident.
e) Policy exception logging
---------------------------
In the above paragraphs, we have asked for some policy-based tools.
Operators need to know when these policies have been breached.
Good logging of policy exceptions needs to be implemented:
* SNMP-trap
* Configurable syslog (i.e. which syslog facility to write to)
f) Access to management interfaces
----------------------------------
In the past, security of management interfaces on Ethernet switching
products has often been lacking.
CLI or web interfaces should support authentication using
username/password pairs, to avoid the use of "password only"
authentication which implies shared passwords.
CLI interfaces should also support SSH access, using either
username/password or secure key authentication.
Web interfaces should be HTTPS/SSL enabled, to avoid passwords being
passed in the clear over HTTP.
Management interfaces should be able to perform authentication from an
external source, such as TACACS, RADIUS or LDAP services, as well as
providing locally held accounts (these have to be retained for
emergencies).
All management interfaces (CLI, web and SNMP) should be able to benefit
from access-list control. The access lists should be able to support
variable-length subnet masks.
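As a small illustration of what variable-length masks allow, the sketch
below matches a source address against a list of prefixes of differing
lengths. The prefixes are documentation addresses and the names are
hypothetical.

```python
import ipaddress

# Assumption: example management sources, one /26 and one single host.
MGMT_ACL = [ipaddress.ip_network(n)
            for n in ("192.0.2.0/26", "198.51.100.17/32")]


def mgmt_allowed(src: str, acl=MGMT_ACL) -> bool:
    """True if src falls within any permitted prefix."""
    addr = ipaddress.ip_address(src)
    return any(addr in net for net in acl)
```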
It should be possible to disable management interfaces on a per-VLAN
basis. Many XP operators choose to configure a "management" VLAN, so
that all management is done out-of-band from the core peering traffic.
It is desirable to have the management interfaces listen on the
management networks only.
g) Port mirroring
-----------------
It is sometimes necessary to mirror participants' ports, either because
a participant is suspected of some inappropriate activity, or to help
obtain information to debug a problem.
Not all exchange points have staff on site 24x7, and port mirroring may
need to be remotely set up, without hands-on intervention on-site.
The ability to allow any port to mirror any other port of similar or
lower speed within the chassis would allow the operator to connect a
traffic collector/analyser device to a monitoring port, and simply
configure the switch to mirror a port as desired to the monitoring port.
SCALABILITY AND RESILIENCE
--------------------------
a) Spanning Tree
----------------
Spanning Tree is currently the only solution available to operators of
exchange points for dynamically managing redundant links in their
architecture.
There are a number of problems with Spanning Tree:
* Slow convergence
- especially in cases of root bridge re-election
* Wasteful of resilient/redundant resources
- redundant links are switched off
- no traffic sharing
* Security concerns (highlighted above)
As the routes collected at an Exchange Point can be routed all over the
world, any routing instability can act like dropping a pebble in a pond,
and will spread around the Internet.
It's desirable to maintain stable routing sessions across Exchange Point
LANs to minimise these routing flaps, because of the load they place on
routers, and the effects of route dampening penalties.
We believe that being able to declare ports as "end-stations" should
avoid them being counted in the STP calculation, enable these ports to
start forwarding more rapidly, and speed overall STP convergence time.
Rapid spanning tree (IEEE 802.1w) should be implemented
(http://www.ieee802.org/1/pages/802.1w.html).
b) Resilient Packet Ring - IEEE 802.17
--------------------------------------
This is a standards-based version of the technology currently used by
Cisco called DPT (Dynamic Packet Transport). This consists of a
counter-rotating ring system, with spatial reuse and "ring wrapping"
circuit protection.
The Cisco version is currently implemented over SONET/SDH media,
however, the standardised version is being designed to be more media
agnostic, and the IEEE working group has already elected to provide
support for Gigabit Ethernet and 10 Gigabit Ethernet.
RPR shows promise of becoming an ideal backbone technology for use in a
flat layer-2 network, such as an exchange point. It will allow for
redundant self-healing backbones, with optimal use of all interswitch
capacity, without the need for STP.
c) Trunking and Link-Aggregation
--------------------------------
It has become increasingly common for exchange points to span multiple
switches and multiple sites, and many need to deploy link aggregation
to handle the volume of interswitch traffic where it exceeds the
maximum speed of a single link.
Most equipment implements load-sharing using either round-robin or
address-based algorithms. The address-based system generally employs a
hash of source/destination MAC address.
Address-based load-sharing may be preferable to minimise jitter, since
all frames between a given pair of addresses follow the same link.
In exchange points, many pieces of equipment will have similar MAC
addresses, especially the first and last bytes (corresponding to vendor
and slot position on router).
If the hash is only based on part of the address, this can result in
poor efficiency of load-sharing.
Load-sharing algorithms should consider the whole address when
calculating the hash used.
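The effect can be shown with a toy example. The two functions below are
illustrative only: real switch hash functions are implementation
specific, and the XOR fold used here is simply an assumption chosen to
be easy to follow. The MAC addresses are made up (locally administered)
and share the same leading bytes and the same last byte.

```python
from itertools import combinations


def mac_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def link_partial(src: str, dst: str, n_links: int) -> int:
    """Weak hash: considers only the last byte of each address."""
    return (mac_bytes(src)[-1] ^ mac_bytes(dst)[-1]) % n_links


def link_full(src: str, dst: str, n_links: int) -> int:
    """Hash folding all six bytes of both addresses together."""
    h = 0
    for a, b in zip(mac_bytes(src), mac_bytes(dst)):
        h ^= a ^ b
    return h % n_links


# Made-up router addresses differing only in the fifth byte.
ROUTERS = ["02:00:00:00:01:00", "02:00:00:00:02:00",
           "02:00:00:00:04:00", "02:00:00:00:07:00"]

PAIRS = list(combinations(ROUTERS, 2))
partial_links = {link_partial(s, d, 4) for s, d in PAIRS}
full_links = {link_full(s, d, 4) for s, d in PAIRS}
```

With these addresses, partial_links contains a single link, so every
conversation is placed on the same member of a four-link trunk, while
full_links spreads the same conversations over several links.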
Load-sharing of broadcasts and multicast traffic should be implemented.
Behaviours such as forwarding all broadcast/multicast traffic out of the
"primary" port in a trunk have been observed when load-sharing using
destination MAC addresses has been implemented.
IEEE 802.3ad link-aggregation should be implemented.
d) Multicast Control and Containment
------------------------------------
Most switches are configured with IGMP snooping for multicast control.
However, in an exchange point, with only routers attached, there is no
IGMP present, only PIM and MSDP, and all multicast packets are flooded
out of all ports.
An exchange point, however, is an ideal place for multicast peering to
happen: inject the traffic once, and it comes out several times (as much
as is needed, or in the current situation, as much as isn't needed!).
Cisco developed RGMP (Router Group Management Protocol). This is a
proprietary technology whereby the router can communicate to the switch
which multicast groups it wishes to see.
This is, however, a vendor specific feature, and a wide range of routing
platforms and switching platforms are present at many exchange points -
both in equipment used by the operator, and the participants.
Therefore, this is not a workable solution for most exchange points,
whose principles often include "equal treatment" of participants.
While it may not solve all potential issues with multicast peering,
implementing PIM-SM snooping and pruning within the switches will
achieve the traffic containment requirements.
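In very simplified form, the state a snooping switch would keep might
look like the sketch below. It assumes the switch can already parse PIM
Join/Prune messages, and it ignores hold timers, (S,G) state and
upstream neighbour tracking; the class and method names are
hypothetical.

```python
from collections import defaultdict


class PimSnoopTable:
    """Per-group set of ports on which a PIM Join has been seen."""

    def __init__(self):
        self.joined = defaultdict(set)   # group address -> ports with a Join

    def on_join(self, port: int, group: str) -> None:
        self.joined[group].add(port)

    def on_prune(self, port: int, group: str) -> None:
        self.joined[group].discard(port)

    def egress_ports(self, group: str, ingress_port: int) -> set:
        """Ports to forward to, instead of flooding out of all ports."""
        return self.joined.get(group, set()) - {ingress_port}
```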
e) Intelligent Layer 2 Forwarding - MAC-SPF
-------------------------------------------
The majority of Ethernet switching products in use today are
switch/router products and contain enough processing power and memory to
handle sizeable routing tables.
Exchange points which do not involve multi-lateral peering currently need
to provide a transparent layer-2 service to the participants, and
Ethernet has been widely deployed to achieve this.
In a bi-laterally peered XP, the XP cannot become involved in the
layer-3 routing decision process. There are a couple of reasons for
this, but an overriding one is a matter of scalability. Every
participant will have their own routing view. Involving the XP
infrastructure in layer-3 routing decisions would need the XP to hold as
many routing views as members (yes, MPLS could be employed, though how
is not clear right now).
With the growth in size of some exchange points, this has led to
expansion to multiple sites and switches, and consequently a more
complex topology. This has shown up the weaknesses in Spanning Tree.
An option this group considers worth exploration is the development of
a dynamic layer-2 routing protocol, to replace MAC learning and STP.
A number of thoughts have been given to this, and the following ideas
have been arrived at:
* Retain dynamic MAC learning on "end-station" ports
- added to forwarding database as usual
* Replace dynamic learning on inter-switch links with a routing process
- switches use some form of L2 neighbour discovery
* Use existing OSPF algorithms and similar LSAs to manage the routing
information
This would need to interoperate between different switch vendors to
allow for scalability and freedom.
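To make the idea more concrete, the sketch below runs a shortest-path
first (Dijkstra) calculation over a small inter-switch topology and
derives, for one switch, the first-hop neighbour towards every other
switch. No such protocol exists yet, so everything here is an
assumption: the topology, the link costs, the function name and the
notion that each MAC address is advertised together with the switch that
learned it on an end-station port.

```python
import heapq


def spf_first_hops(topology: dict, root: str) -> dict:
    """Return {switch: first-hop neighbour of root on the shortest path}."""
    dist = {root: 0}
    first_hop = {}
    heap = [(0, root, None)]
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue                      # stale entry, a better path exists
        if node != root:
            first_hop[node] = hop
        for nbr, link_cost in topology[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                # Neighbours of the root are their own first hop.
                heapq.heappush(heap, (new_cost, nbr,
                                      nbr if node == root else hop))
    return first_hop


# Hypothetical four-switch topology: {switch: {neighbour: link cost}}.
TOPOLOGY = {
    "sw1": {"sw2": 1, "sw3": 1},
    "sw2": {"sw1": 1, "sw4": 1},
    "sw3": {"sw1": 1, "sw4": 5},
    "sw4": {"sw2": 1, "sw3": 5},
}

# Hypothetical advertisement: MAC learned on an end-station port of sw4.
MAC_OWNER = {"00:00:5e:00:53:01": "sw4"}

hops = spf_first_hops(TOPOLOGY, "sw1")
next_switch = hops[MAC_OWNER["00:00:5e:00:53:01"]]
```

In this example sw1 would forward frames for the advertised address
towards sw2, since the path via sw3 has a higher total cost. Unlike STP,
the sw3-sw4 link stays available for traffic between sw3 and sw4.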
Currently, one issue is where and how to pursue this. It's unclear
whether this should lie within the IEEE 802 LMSC or within the IETF.
Many switch vendors like the sound of this idea, as it solves some
problems being faced by service providers working in the MAN space, who
need to provide transparent L2 transport services. However, they need to
be convinced of demand and ensure interoperability.
The IETF has in the past only been concerned with layer-3 and up,
"IP-over-Foo". However, they are currently of the mindset that they do
need to dig down and look at the interaction between IP and the Foo it
is transmitted over. This is one possible avenue for pursuing this
further.
f) MPLS
-------
Another option for improving scalability and resilience is MPLS (Multi
Protocol Label Switching). Currently, there is rather limited support
for MPLS in switching products, mainly because customer demand is
pushing vendors to implement this in their high-end router products
first. Many high-end router products do not possess sufficient port
density for use in an IXP environment.
MPLS may enable an exchange to run a layer-3 network internally, and use
an existing IGP to route traffic internally. By using MPLS labels to
pass the traffic between the peers on the edge of the IXP cloud, the IXP
does not need to be aware of the IP source and destination of the
packets.
PHYSICAL WISHES
---------------
IXPs are high-uptime environments. The equipment used in an IXP needs to
be able to satisfy this requirement, in terms of redundancy and
hot-swappable components.
* Hot swap of management/switch fabric cards with instantaneous failover
to any installed redundancy (not rebooting onto the "backup")
* Full redundancy of PSUs, and hot-swap (i.e. box should run on 50% of
PSUs)
* Rapid booting and card startup (after all, much functionality is
implemented in the ASIC hardware)
* GBIC-optics for flexibility, easy replacement, and maximised port
utilisation (freedom to choose SX/LX, etc)
ACKNOWLEDGEMENT AND THANKS
--------------------------
Thanks are due to the "usual suspects" in the RIPE EIX Working Group, but
specifically Christian Panigl, Keith Mitchell and Rick Payne, for their
contributions to this document.