From fm at st-kilda.org Wed May 4 14:30:18 2005
From: fm at st-kilda.org (Fearghas McKay)
Date: Wed, 4 May 2005 14:30:18 +0200
Subject: [eix-wg] IXP Switching Wishlist v3.0 draft
Message-ID:

Comments to Mike at mike at linx.net

------------ Forwarded Message ------------
Date: 04 May 2005 11:42 +0100
From: Mike Hughes
To: eix-wg at ripe.net
Subject: IXP Switching Wishlist v3.0 draft

Hi all,

Here is a current draft of the wishlist, containing all recent feedback and requests.

Key changes:

* Removed sections on MAC-SPF and MPLS options - these don't seem to be gaining any traction, and don't seem as relevant anymore given the improvement in things such as RSTP.
* Added section on VLAN tag space issues - tag rewrite or "virtual bridges"
* Added items on 240V AC power, and Environmental monitoring
* Added further detail to filtering, MAC locking, multicast, and security.

Let me know if you have any further comments, otherwise, hope to see you tomorrow between 11.00 and 12.30.

Thanks,
Mike
--
Mike Hughes
Chief Technical Officer
London Internet Exchange
mike at linx.net
http://www.linx.net/
"Only one thing in life is certain: init is Process #1"

---------- End Forwarded Message ----------

-------------- next part --------------

RIPE European Internet Exchange (EIX) Working Group
Internet Exchange Point Switching "Wishlist"
Version 3.0 DRAFT
Edited by Mike Hughes
London Internet Exchange
May 2005

ABSTRACT:
---------

At the RIPE meeting held in Amsterdam in February 2000, a number of participants agreed that the group should produce a "wishlist" to guide equipment manufacturers when producing boxes aimed at the core switching market. Over the following months, ideas were collected from the EIXP community to form the basis of this document.

In Europe, most Internet Exchange Points use a shared switch fabric to which the participants connect. Organisations then arrange peering via bi-lateral peering agreements. It is not compulsory for all participants to peer with every other participant (called multi-lateral peering).

Once two participants agree to peer, they will set up BGP4 sessions between their routers connected to the Exchange to exchange routes and traffic.

In the majority of cases, the Exchange Point operator does not become involved in the routing of any traffic across the Exchange; they choose to leave this to the participants.

For this reason, switched Ethernet has become one of the most common choices for Exchange Point media.
The main reasons behind this are:

* Cost effectiveness
* Simplicity of setup
* Can use standard CAT5 wiring - easy to implement and maintain
* Interfaces available across a wide range of platforms

With the growth of the Internet, more and more traffic is being routed to Internet Exchange Points, and the importance of IXPs has grown in line with this, especially in Europe where private peering is less common than in North America.

The IXP operators feel that having the right tools and features implemented in the equipment they deploy will play an important part in scaling Ethernet technology to meet the demands placed upon Exchange Points.

This is an informational document to outline the various features which IXPs would like to see implemented in core Ethernet Switching products.

SECURITY AND MANAGEMENT FEATURES:
---------------------------------

a) Control of dynamic MAC learning
----------------------------------

Currently, switches are provided with two options, either statically configured or dynamically learned forwarding information.

Exchange Points like to monitor and control how many MAC addresses are connected to a participant's port. The XP operator generally does not desire ad-hoc extensions connected to their network. The common way of managing this is to enforce a "router-only" or "limited MAC address" rule.

This is currently either controlled by statically configuring forwarding information, or not controlled but policed, by counting the number of MAC addresses learned on each port and taking action against offenders.

Static configuration of forwarding information is a somewhat inelegant option, as this increases configuration overhead, and decreases flexibility, especially in case of emergencies.

We propose a "maximum learning" limit, configurable on a per-port basis. In this way, operators can configure participants' ports according to their house rules, but retain the flexibility of dynamic learning.
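As a rough illustration of the per-port "maximum learning" limit proposed above, here is a minimal Python sketch. It is not any vendor's implementation: the class name LearningLimiter, its methods, and the port names are hypothetical, and a real switch would enforce this in hardware.

```python
from collections import defaultdict


class LearningLimiter:
    """Toy forwarding table with a per-port cap on dynamically learned MACs.

    Hypothetical illustration only; names and behaviour are assumptions.
    """

    def __init__(self, default_limit=1):
        self.default_limit = default_limit
        self.port_limit = {}              # port -> configured limit
        self.learned = defaultdict(list)  # port -> MACs in learning order

    def set_limit(self, port, limit):
        """Configure the maximum number of MACs learned on one port."""
        self.port_limit[port] = limit

    def learn(self, port, mac):
        """Return True if frames from this source MAC may be forwarded."""
        macs = self.learned[port]
        if mac in macs:
            return True  # already learned on this port
        if len(macs) >= self.port_limit.get(port, self.default_limit):
            return False  # over the limit: neither learned nor overwriting
        macs.append(mac)
        return True
```

With a limit of 2 on a port, the first two source addresses seen are learned dynamically and a third is refused, without any static configuration of the addresses themselves.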
The filtering should not impose a performance hit on those ports which are MAC-limited.

The lockdown should automatically flush when there is a state transition on the interface.

There should be multiple levels of locking:

* "Forwarding" limit - the maximum number of addresses you will forward for on this port
* "Soft" limit - the limit at which you will record syslog events
* "Hard" limit - the limit at which you shut down the port (drop link if able) and record a syslog event

A hard limit should require manual intervention to reset the locking and bring the port back up.

This feature must include a "first in-last out" mechanism in the lockdown facility, so that forwarding information for valid addresses is not overwritten by addresses in excess of the exchange's house rules.

All the above locking, timeout, and reset rules should be configurable by the network operator.

b) Disable acting on STP BPDU information
-----------------------------------------

Many exchange operators currently deploy Spanning Tree Protocol (STP) in networks which contain redundant links/full meshing.

There is, however, a danger presented by STP information leaked from a participant's network. The participant may have connected a poorly configured switch/router product, and may be leaking their STP information into that of the exchange.

We would wish to see a configurable option to allow STP information to be ignored, and filtered in hardware on "edge ports", on a per-port basis.

There should also be an option to generate traps or log messages based on transgressions of the policy.

c) Wire-speed ACL-type filtering based on L2/L3 header info
-----------------------------------------------------------

We would like the ability to look into the layer 2 or 3 header information of a packet, and selectively monitor, or filter, based on certain layer 2 or 3 criteria. This could be done using pattern matching or masking.
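To make the idea of matching under a mask concrete, here is a small Python sketch. It is only an illustration in software of what the wishlist asks to have done in hardware; the function names matches and permit, and the choice of ethertypes, are assumptions made for the example. It handles untagged Ethernet II frames only, where the ethertype sits at byte offset 12.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV6 = 0x86DD


def matches(frame: bytes, offset: int, value: bytes, mask: bytes) -> bool:
    """Compare frame[offset:offset+len(value)] with value under a bit mask."""
    field = frame[offset:offset + len(value)]
    if len(field) < len(value):
        return False  # frame too short to contain the field
    return all((f & m) == (v & m) for f, v, m in zip(field, value, mask))


def permit(frame: bytes, allowed_ethertypes) -> bool:
    """Permit an untagged Ethernet II frame only if its ethertype is allowed."""
    for ethertype in allowed_ethertypes:
        # The ethertype follows the 6-byte destination and source addresses.
        if matches(frame, 12, struct.pack("!H", ethertype), b"\xff\xff"):
            return True
    return False
```

The same matches() helper can test any other header field, for instance an address prefix, by changing the offset, value and mask.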
A common example of an area where this is desirable is to permit frames of only certain ethertypes to enter the network through an edge port.

This sort of filtering should be implemented in hardware wherever possible, and not have an effect on the forwarding performance of the system. Where this is not possible, it must be clearly documented.

d) ARP/Broadcast snooping and control
-------------------------------------

Many exchange points insist on participants using IP addresses assigned by the exchange operator. It is desirable for the operator to be able to monitor/restrict "off-net" ARP.

As Ethernet is a broadcast medium, broadcast storms have been known to bring exchanges to their knees, affecting the forwarding abilities of both the exchange's switches, and the participants' routers. Monitoring/rate limiting/control of Ethernet broadcast frames is desirable.

Most exchanges also forbid the speaking of interior routing protocols across their peering network. Since these take the form of broadcast or multicast frames on Ethernet, control would help monitor this type of incident.

Such control should be able to distinguish (through appropriate configuration) between legitimate ARP and genuine broadcast storms.

There should be suitable configuration knobs to be able to rate limit, shut down, log exceptions, etc.

e) Support for MARP?
--------------------

We had looked at seeking support for something such as MARP - "MultiAccess Reachability Protocol". This was defined in an Internet Draft, and allows for detection of failures at L2 across multiple switch hops.

However, this has not been published as an RFC, as Cisco has asserted IPR over the content of the draft, and it would therefore require licensing. This seems to have somewhat killed this promising idea off.

f) Policy exception logging
---------------------------

In the above paragraphs, we have asked for some policy-based tools. Operators need to know when these policies have been breached.
Good logging of policy exceptions needs to be implemented:

* SNMP-trap
* Configurable syslog (i.e. which syslog facility to write to)
* RFC3164 compliant syslogging
* Syslog over SSL

g) Access to management interfaces
----------------------------------

In the past, security of management interfaces on Ethernet switching products has often been lacking.

CLI or web interfaces should support authentication using username/password pairs, to avoid the use of "password only" authentication which implies shared passwords.

CLI interfaces should also support SSHv2 access, using either username/password pair, or public key authentication.

Web interfaces should be HTTPS/SSL enabled, to avoid passwords being passed in the clear over HTTP.

Support SCP/SFTP for config copy/upload/download, as well as existing methods (TFTP/FTP).

Management interfaces should be able to perform authentication from an external source, such as TACACS, RADIUS or LDAP services, as well as providing locally held accounts (which have to be retained for emergencies).

All management interfaces, CLI, web and SNMP should be able to benefit from access-list control. The access lists should be able to support variable-length subnet masks.

Ability to disable management interfaces on a per-VLAN basis. Many XP operators choose to configure a "management" VLAN, so that all management is done out-of-band of the core peering traffic. It is desirable to have the management interfaces listen on the management networks only.

On devices which are designed to support high bandwidth per-slot, such as high-density GigE or 10GigE, it is preferable to have a 10/100 Mb management port provided on the system, to avoid burning a fast port for management.

h) Port mirroring
-----------------

It is sometimes necessary to mirror participants' ports, either because a participant is suspected of some inappropriate activity, or to help obtain information to debug a problem.
Not all exchange points have staff on site 24x7, and port mirroring may need to be remotely set up, without hands-on intervention on-site.

The ability to allow any port to mirror any other port of similar or lower speed within the chassis would allow the operator to connect a traffic collector/analyser device to a monitoring port, and simply configure the switch to mirror a port as desired to the monitoring port.

i) Statistics and Accounting
----------------------------

As well as implementing de facto SNMP counters/RMON, also consider implementing the following:

* Per-VLAN traffic statistics
* sFlow export support (via management interface)
* Counters based on common ethertypes (IPv4, IPv6, multicast, ARP, etc)

SCALABILITY AND RESILIENCE
--------------------------

a) Spanning Tree
----------------

Spanning Tree is currently the only cross-platform solution available to operators of exchange points for dynamically managing multiple redundant links in their architecture.

There are a number of problems with Spanning Tree:

* Slow convergence - especially in cases of root bridge re-election
* Wasteful of resilient/redundant resources - redundant links are switched off - no traffic sharing
* Security concerns (highlighted above)

As the routes collected at an Exchange Point can be routed all over the world, any routing instability can act like dropping a pebble in a pond, and will spread around the Internet.

It's desirable to maintain stable routing sessions across Exchange Point LANs to minimise these routing flaps, because of the load they place on routers, and the effects of route dampening penalties.

We believe that being able to declare ports as "end-stations" should avoid them being counted in the STP calculation, enable these ports to start forwarding more rapidly, and speed overall STP convergence time.
Rapid spanning tree (IEEE 802.1w) should be implemented (http://www.ieee802.org/1/pages/802.1w.html). Results from testing RSTP on certain platforms show that for simple topologies with few redundant links, sub-second failover and reconvergence is achievable with minimal tuning or additional configuration.

b) Ring Restoration Protocols
-----------------------------

IEEE 802.17 - This is a standards-based version of the technology currently used by Cisco called DPT (Dynamic Packet Transport). This consists of a counter-rotating ring system, with spatial reuse and "ring wrapping" circuit protection.

The Cisco version is currently implemented over SONET/SDH media; however, the standardised version is being designed to be more media agnostic, and the IEEE working group has already elected to provide support for Gigabit Ethernet and 10 Gigabit Ethernet.

Proprietary Protocols - There are a number of proprietary ring protocols, such as Extreme's EAPS (published as informational RFC3619), or Foundry's MRP. They are relatively similar in operation, in that they make assumptions about the number of redundant links in a topology (i.e. only one), have a concept of master and transit nodes, use a "heartbeat" sent out by the master, and pass topology change messages between the nodes to speed network reconvergence (by triggering FDB flushing, and backup port unblocking on the master node).

These recovery protocols may become less important as RSTP becomes more graceful.

c) Trunking and Link-Aggregation
--------------------------------

It's become increasingly common for exchange points to become multiple switch and multiple site based, and many need to deploy link aggregation to handle the volume of interswitch traffic, where it exceeds the maximum speed of a single link.

Most equipment implements load-sharing using either round-robin or address-based algorithms.
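An address-based algorithm of this kind can be sketched in a few lines of Python. This is purely illustrative and does not describe any particular switch: the function names are hypothetical, CRC32 is an arbitrary stand-in for whatever hash the hardware uses, and the addresses are made up. Two variants are shown so that the difference in hash inputs is visible: one uses all twelve address bytes, the other only the last byte of each address.

```python
import zlib


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def pick_link_full(src: str, dst: str, n_links: int) -> int:
    """Choose a link by hashing the whole source and destination address."""
    return zlib.crc32(mac_bytes(src) + mac_bytes(dst)) % n_links


def pick_link_last_byte(src: str, dst: str, n_links: int) -> int:
    """Weaker variant: XOR of only the last byte of each address."""
    return (mac_bytes(src)[-1] ^ mac_bytes(dst)[-1]) % n_links


if __name__ == "__main__":
    # Made-up addresses that differ only in the fifth byte.
    sources = ["00:00:5e:00:53:00", "00:00:5e:00:54:00", "00:00:5e:00:55:00"]
    destination = "00:00:5e:00:99:00"
    for src in sources:
        print(src,
              pick_link_last_byte(src, destination, 4),
              pick_link_full(src, destination, 4))
```

In the last-byte variant, every pair of addresses sharing the same final bytes is necessarily mapped to the same link, whatever the other bytes are; the full-address variant at least takes every byte into account.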
In exchange points, many pieces of equipment will have similar MAC addresses, especially the first and last bytes (corresponding to vendor and slot position on router). This causes significant problems if the load-sharing hash does not use enough significant bits in the frame.

If the hash is only based on part of the address, this can result in poor efficiency of load-sharing, and "clutching" of traffic on a single link inside a group.

It's preferable that load-sharing algorithms should consider the whole L2 address, and where possible the L3 header information, when calculating the hash used.

Load-sharing of broadcast and multicast traffic should be implemented. This is because behaviour such as forwarding all broadcast/multicast traffic out of the "primary" port in a trunk has been observed when load-sharing using destination MAC addresses has been implemented.

IEEE 802.3ad link-aggregation "LACP" should be implemented.

d) Multicast Control and Containment
------------------------------------

Most switches are configured with IGMP snooping for multicast control. However, in an exchange point, with only routers attached, there is no IGMP present, only PIM and MSDP, and all multicast packets are flooded out of all ports.

An exchange point, however, is an ideal place for multicast peering to happen: inject the traffic once, and it comes out several times (as much as is needed, or in the current situation, as much as isn't needed!).

Cisco developed RGMP (Router Group Management Protocol). This is a proprietary technology whereby the router can communicate to the switch which multicast groups it wishes to see.

This remains, despite being released as an informational RFC (RFC3488), a vendor-specific feature, and a wide range of routing and switching platforms are present at many exchange points - both in equipment used by the operator, and the participants. These are true multi-vendor environments.
Therefore, this is not a workable solution for most exchange points, whose principles often include "equal treatment" of participants.

While it may not solve all potential issues with multicast peering, implementing PIM-SM snooping and pruning within the switches will achieve the traffic containment requirements.

Where PIM snooping is available, this should not have a negative effect on the overall forwarding performance of the system. Where there is a performance impact, this and its surrounding caveats shall be clearly documented.

e) VLAN tagspace issues/overlapping
-----------------------------------

A serious emerging issue is VLAN tag space overlapping/clashing. Most metro transport networks can solve this by using q-in-q (tag stacking); however, this doesn't apply to shared networks like Internet Exchanges.

Current switches use a 1:1 mapping of 802.1q VLANs to bridge groups, which is the way 802.1q was probably intended. This mapping should be loosened if not abandoned - nowadays there are so many ways to egress an Ethernet frame from a switch that more and more often we have to resort to 'tricks' to put the right label on the right Ethernet packet going out the right interface.

This problem is being exacerbated by a number of issues:

* Increased use of switch router products (e.g. Cisco 7600)
* Use of switches as "channel-banks" - breaking out higher speed router interfaces
* Use of metro-ethernet, LAN extension or Ethernet over MPLS ("Martini") circuits to connect to the IXP

We think there are two (fairly similar) approaches to solving this:

* Basic VLAN tag rewrite
* Separate the tag from the virtual bridge instance

VLAN tag rewrite is, as its name suggests, being able to rewrite a dot1q tag on a specific interface to a VLAN ID on the switch. This would need to be implemented on both ingress and egress.

The other option is complete separation of VLAN ID from the virtual bridges inside the switch.
You assemble a framework where you can place untagged ports, tagged ports, q-in-q tagged ports, MPLS endpoints and ATM VCs all together in the same virtual bridge. Effectively, this is a bridge group which can contain any number of these sorts of entities.

f) Link failure detection
-------------------------

Link failure detection should be implemented, along the lines of:

* UDLD - Uni-Directional Link Detection
* LFN - Link Failure Notification

This avoids the risk of an Ethernet link going "one-way" and fooling the restoration protocols into believing that the link is working, when really it isn't.

ENVIRONMENTAL MONITORING
------------------------

There should be reasonable environmental monitoring provided:

* Temperature sensors
* Fan health sensors
* Power supply health sensors

There should be exception logging via SNMP trap and syslog (as specified above) of any incidents.

It should also be possible to shut down a malfunctioning element in the system (automatic, user configurable, or manual), in order to preserve system health.

For example, a power supply failing in a system could cause an instability in the device. If the system could make a decision to shut that power supply down, and assuming a redundant configuration, the switch would then operate in a stable condition until such time that the power supply could be exchanged.

PHYSICAL WISHES
---------------

IXPs are high-uptime environments. The equipment used in an IXP needs to be able to satisfy this requirement, in terms of redundancy, and hot-swappable components.

* Hot swap of management/switch fabric cards with instantaneous failover to any installed redundancy (not rebooting onto the "backup")
* Full redundancy of PSUs, and hot-swap (i.e.
box should run on 50% of PSUs)
* Rapid booting and card startup (after all, much functionality is implemented in the ASIC hardware)
* GBIC/SFP-optics for flexibility, easy replacement, and maximised port utilisation (freedom to choose SX/LX, etc)
* "Coloured" (DWDM/CWDM) GBIC/SFP/Xenpak/XFP (etc, etc) support
* Vendor "lockdown" of pluggable interfaces should either not be implemented, or be able to be switched off in configuration.
* 220-240V AC power options. Unlike most telco-managed facilities, the carrier-neutral facilities common in Europe do not provide indigenous 48V DC power. Power distribution is done using the regular utility supply voltage in that country - usually ~230V AC in EU countries.
* Cable testing functionality in copper ports, and optical power metering in optical ports.

ACKNOWLEDGEMENT AND THANKS
--------------------------

Thanks are due to the "usual suspects" in the RIPE EIX Working Group, but specifically Christian Panigl, Kurtis Lindqvist, Keith Mitchell, Daniele Arena, and Remco Van Mook, for their contributions to this document.

From fm at st-kilda.org Wed May 4 14:51:34 2005
From: fm at st-kilda.org (Fearghas McKay)
Date: Wed, 4 May 2005 14:51:34 +0200
Subject: [eix-wg] EIX WG Final Agenda RIPE 50 Stockholm
Message-ID:

Greetings

The EIX-WG meeting will be held on May 5th at the RIPE 50 meeting in Stockholm - in the Thursday 11-12:30 slot.

Current Agenda

1 Scribe
2 Agenda Bashing
3 Minutes Approval
   5 mins for all
4 IXP Presentations
   AMSIX
   DE-CIX
   Equinix
   Gigapix
   INXS
   LINX
   LONAP
   NaMeX
   NDIX
   Netnod
   NYIIX/LAIIX
   Union-IXP
   VIX
   Euro-IX
   45 mins
5 RIPE NCC Presentation on Routing Registry & associated training courses
   5 mins
6 Switching Wishlist
7 AOCB
   5 mins for both
8 "The Great (Public vs Private) Peering Debate: Peering at 10Gbps" - Bill Norton

This new paper focuses on an Economic Analysis of 10G Peering from the ISP Perspective.
It demonstrates that even with new and pretty expensive 10G hardware, peering makes sense financially after a few Gbps of peering traffic. The Public vs. Private peering part is required for the research because the Peering Coordinator Community points out that the next best alternative to 10G public peering is peeling off cross connects or circuits and private peering, so the paper compares these two options. The data used is from both the USA and Europe.
   35 mins

-=-=-

Presentations should be prepared in advance so that they can be uploaded to both the ROSIE server and the local laptop to ensure a rapid turnaround of presenters. There is a link on the front page of the ROSIE website to the upload section.

Please upload your presentations before the coffee break, otherwise you may find that you will be presenting in front of a blank screen!

With the time constraints we will only be using the one laptop for presenting. To be clear, you will not be able to present from your own laptop.

The session will be audiocast.

See you tomorrow!

Fearghas

From fm at st-kilda.org Wed May 4 17:15:33 2005
From: fm at st-kilda.org (Fearghas McKay)
Date: Wed, 4 May 2005 17:15:33 +0200
Subject: [eix-wg] RIPE49 eix-wg minutes
Message-ID:

Comments and Corrections to me please

Thanks

Fearghas

--- begin forwarded text

Hi,

Please find attached the minutes of RIPE49's eix-wg session. Apologies for the delay, it somehow slipped my attention :(

-- Rene

==================================
Minutes EIX-WG, RIPE49, Manchester
==================================

Wednesday September 22
11:00 - 12:30 BST
14:00 - 15:30 BST

Agenda
======

A. Administrative Matters
B. IXP Presentations
C. SLAs & Pricing at IXPs - Call for information for research
D. Switching Wish List - Final call for input
E. Euro-IX Update
F. AMS-IX Membership Survey
G. Routing Registry Courses Announcement
H. KIX Overview and Future Plan

A.
Administrative Matters
=========================

Chair: Fearghas McKay
Co-Chair: Mike Hughes
Scribe: Rene Wilhelm

B. Presentations by Exchange Points
====================================

AMS-IX, Steven Bakker
---------------------

The change of topology has been completed
10 GigE is in beta testing
15 partners (NL-IX)
50% of traffic coming through partner program

Question: Any news on IPv6 participants?
Answer: we have 50 IPv6, 6 multicast connectees

DE-CIX - Bernhard Kroenung
--------------------------
http://www.ripe.net/ripe/meetings/ripe-49/presentations/ripe49-eix-decix.pdf

Proposal for a bit more independence from ECO approved.
DE-CIX will start to offer VLAN services by end of year.
Had first technical meeting on September 9th.
Aggregated traffic 23 Gbps on 21/09/2004.
134 members, 25% using IPv6

LINX, Mike Hughes
-----------------
http://www.ripe.net/ripe/meetings/ripe-49/presentations/ripe49-eix-linx.pdf

Reached 40 Gig/sec aggregate peak traffic this Monday!
Membership continues to grow, 162 members from 27 countries
Upgrade of Foundry platform is now almost completed
Extreme platform had some reliability problems during summer, researching upgrade paths.
Multicast experiment by BBC during Olympics

Question: What is your experience with 10 GigE? Is there a business case for connecting at such speeds, any customers looking at it, or are they satisfied with two or three 1 GigE connections?
Answer: LINX have no members at 10 GigE yet; connecting two or three GigE can work well (and we will sell that if requested by customer) but we do want to stimulate parties to use a 10 GigE instead; the price of a 10 GigE port is set at 3.5 times that of 1 normal GigE port.

Question: how many parties connect on each platform? how many on IPv6?
Answer: Foundry infrastructure is the primary platform to which everyone connects. 80-85% are in addition connected to the Extreme based infrastructure and that number is going up. People are encouraged to connect to both.
As for IPv6, we have 9 sessions at the v6 collector.

LONAP, Paul Thornton
--------------------
http://www.ripe.net/ripe/meetings/ripe-49/presentations/ripe49-eix-lonap.pdf

Entirely based on Cisco hardware
3 sites in London Docklands
45 members, 250 Mbit/sec traffic total
interconnect with xchange point
6 connections on IPv6
8-9 multicast
Core switch fabric upgraded to new 6509s
10 GigE connections coming very soon.
Investigating other locations.

MIX - Milan Internet Exchange, Raffaele D'Albenzino
---------------------------------------------------
http://www.ripe.net/ripe/meetings/ripe-49/presentations/ripe49-eix-mix.pdf

56 members, 14 carriers present in MIX data center
Traffic on public peering LAN peaked with 7.7 Gbps in May 2004
Today (Sep '04) 6.2 Gbps

New services:
- IPv6 operational since April 2004, currently 6 connected members
- Multicast will be in production starting from November 2004.
- Test Traffic Measurement box, also used as NTP server
- RIS route collector, rrc10, 16 IPv4 peerings, 6 IPv6
- DNS root name servers:
  o I-root server replica since December 2003
  o K-root server replica since August 2004

Inter provider traffic Analyzer, first presented at RIPE46, new version 1.0 has been released. More info from http://www.mix-it.net/ipta/index.html or ipta at mix-it.net
Software available for euro-ix members at http://www.euro-ix.net/members/resources/tools/mix/

Netnod/Autonomica, Kurtis Lindqvist
-----------------------------------

Netnod is the company which operates the exchanges, Autonomica is a wholly owned subsidiary which is responsible for services like i.root-servers.net

News:
- private VLAN service
- working on 10GigE, likely to be same price as GigE
- new general terms and conditions and ...
all SRP now out of production

NIX-CZ - Czech Neutral Internet Exchange, Jozef Chomyn
------------------------------------------------------
http://www.ripe.net/ripe/meetings/ripe-49/presentations/ripe49-eix-nixcz.pdf

46 members + 4 customers (2 x DE, 1 x SE, 1 x CZ)
8 networks peer on IPv6
Czech academic network plans to upgrade to 10GigE
NIX.CZ infrastructure must also be upgraded
currently thinking about technology and charging scheme

VIX - Vienna Internet Exchange, Christian Panigl
------------------------------------------------
http://www.ripe.net/ripe/meetings/ripe-49/presentations/ripe49-eix-vix.pdf

Sept 2004: 80 members (12 on IPv6 VLAN)
Summary peaks at 5Gbps
Core switches upgraded, intersite trunk upgraded to 2x 10Gig, prepared for more traffic to come.

Xchange Point Europe, Keith Mitchell
------------------------------------
http://www.ripe.net/ripe/meetings/ripe-49/presentations/ripe49-eix-xpe.pdf

Over 180 customers
Fully operational in Frankfurt and Amsterdam
Started running European "peering forum" events.
Xchange point does not provide connections between the metropolitan areas.
Some London network stability issues between Dec '03 & July '04, due to hardware problems with the switches; now resolved, network has been stable with no outages for over 8 weeks.
See http://www.xchangepoint.net/news/Recent-outages.html for full history

NAP of the Americas, Josh Snowhorn
----------------------------------
http://www.ripe.net/ripe/meetings/ripe-49/presentations/ripe49-eix-nap.pdf

NAP of the Americas connects North, South and Central America
It is located in Miami, Florida
18 domestic fiber providers, 12 undersea cables
168 clients, 89 unique ASNs represented in facility
Hosts J-root and I-root server anycast instances as well as VeriSign's "D" gTLD name server.
Entering partnership with LINX

C.
SLAs & Pricing at IXPs: Call for information - Tomas Marsalak NIX.CZ
=======================================================================

Tomas Marsalak is starting research on prices at internet exchanges, prices for transit and SLAs. The study is expected to be completed by the end of the year, and the results will be made public. He requests people to send him information.

D. Switching Wish List: Final call for input - Mike Hughes LINX
===============================================================

The "switching" wish list http://www.ripe.net/ripe/wg/eix/wishlist-v2.0.html is a work item of the eix working group. The current version has been up on the web for quite some time, and Mike asks for final input.

Keith Mitchell (Xchange Point Europe) mentions several items he'd like to see added. Mike agrees and will incorporate these in the final list.

Mike also proposes to invite vendors to the next euro-ix or eix-wg meeting to give a presentation about what lies beyond 10 GigE.

Bill Norton (Equinix) sees two dangers with inviting vendors:
1. Vendors often don't want the competition to see what they're doing,
2. Vendors might send folk from marketing who could promise perfect systems that are far from becoming a reality

E. Euro-IX Update - Serge Radovcic
==================================
http://www.ripe.net/ripe/meetings/ripe-49/presentations/ripe49-eix-euroix.pdf

Euro-IX is the European Internet Exchange Association
Formed in 2001, euro-ix aims to further develop, strengthen and improve the Internet Exchange Community
Currently 32 member IXPs in 21 different countries throughout Europe

Question: what is Europe for Euro-IX? where does it end?
Answer: definition is based on a workable region, larger than the European Union, but not as large as the RIPE region. Will put information up on the euro-ix website soon.

F.
AMS-IX Membership Survey - Cara Mascini, AMS-IX
==================================================
http://www.ripe.net/ripe/meetings/ripe-49/presentations/ripe49-eix-amsix.pdf

Cara Mascini presented the results of the membership survey conducted by AMS-IX. To the respondents (~50% of the membership) AMS-IX offers good value for money. The most important reasons to connect are "amount of connected parties" and "cost savings".

Question: what definition did you use for "open" peering policy?
Answer: we didn't define it; we just listed the various options in the survey, respondents picked what in their view matched best.

G. RIPE NCC Routing Registry Training - Arno Meulenkamp, RIPE NCC
=================================================================
http://www.ripe.net/ripe/meetings/ripe-49/presentations/ripe49-eix-rr-training.pdf

RIPE NCC organises Training Courses throughout the RIPE region on how to use the Routing Registry. Dates and venues of upcoming courses were shown in the slides.

Arno asks if Exchange Points would be interested in a condensed version, a short informational session to be presented at any of their regular meetings. If yes, contact him or Rumy Kamis (Training Team Leader).

H. KIX Overview and Future Plan - Chang Hun Lee, National Computerization Agency
================================================================================
http://www.ripe.net/ripe/meetings/ripe-49/presentations/ripe49-eix-kix.pdf

KIX is the first IX in Korea, built in 1995 by NCA, and now it is the exchange point for public Internet traffic. KIX interconnects non-commercial ISPs and enables these ISPs to connect to transit providers. KIX also connects to other Korean exchanges.
==================================

--- end forwarded text

From mike at linx.net Wed May 4 12:42:03 2005
From: mike at linx.net (Mike Hughes)
Date: Wed, 04 May 2005 11:42:03 +0100
Subject: [eix-wg] IXP Switching Wishlist v3.0 draft
Message-ID:

Hi all,

Here is a current draft of the wishlist, containing all recent feedback
and requests.

Key changes:

* Removed sections on MAC-SPF and MPLS options - these don't seem to be
  gaining any traction, and don't seem as relevant anymore given the
  improvement in things such as RSTP.
* Added section on VLAN tag space issues - tag rewrite or "virtual
  bridges"
* Added items on 240V AC power, and Environmental monitoring
* Added further detail to filtering, mac locking, multicast, and
  security.

Let me know if you have any further comments, otherwise, hope to see
you tomorrow between 11.00 and 12.30.

Thanks,
Mike
--
Mike Hughes
Chief Technical Officer
London Internet Exchange
mike at linx.net
http://www.linx.net/
"Only one thing in life is certain: init is Process #1"

-------------- next part --------------

RIPE European Internet Exchange (EIX) Working Group

Internet Exchange Point Switching "Wishlist"
Version 3.0 DRAFT

Edited by Mike Hughes
London Internet Exchange
May 2005

ABSTRACT:
---------

At the RIPE meeting held in Amsterdam in February 2000, a number of
participants agreed that the group should produce a "wishlist" to guide
equipment manufacturers when producing boxes aimed at the core
switching market. Over the coming months, ideas were collected from the
EIXP community to form the basis of this document.

In Europe, most Internet Exchange Points use a shared switch fabric to
which the participants connect. Organisations then arrange peering via
bi-lateral peering agreements. It is not compulsory for all
participants to peer with every other participant (called multi-lateral
peering).
Once two participants agree to peer, they will set up BGP4 sessions
between their routers connected to the Exchange to exchange routes and
traffic.

In the majority of cases, the Exchange Point operator does not become
involved in the routing of any traffic across the Exchange; they choose
to leave this to the participants.

For this reason, switched Ethernet has become one of the most common
choices for Exchange Point media. The main reasons behind this are:

* Cost effectiveness
* Simplicity of setup
* Can use standard CAT5 wiring - easy to implement and maintain
* Interfaces available across a wide range of platforms

With the growth of the Internet, more and more traffic is being routed
to Internet Exchange points, and the importance of IXPs has grown in
line with this, especially in Europe where private peering is less
common than in North America.

The IXP operators feel that having the right tools and features
implemented in the equipment they deploy will play an important part in
scaling Ethernet technology to meet the demands placed upon Exchange
Points.

This is an informational document to outline the various features which
IXPs would like to see implemented in core Ethernet Switching products.

SECURITY AND MANAGEMENT FEATURES:
---------------------------------

a) Control of dynamic MAC learning
----------------------------------

Currently, switches are provided with two options, either statically
configured or dynamically learned forwarding information.

Exchange Points like to monitor and control how many MAC addresses are
connected to a participant's port. The XP operator generally does not
desire ad-hoc extensions connected to their network.

The common way of managing this is to enforce a "router-only" or
"limited MAC address" rule. This is currently either enforced by
statically configuring forwarding information, or left unenforced but
policed by counting the number of MAC addresses learned on each port
and taking action against offenders.
Static configuration of forwarding information is a somewhat inelegant
option, as this increases configuration overhead, and decreases
flexibility, especially in case of emergencies.

We propose a "maximum learning" limit, configurable on a per-port
basis. In this way, operators can configure participants' ports
according to their house rules, but retain the flexibility of dynamic
learning.

The filtering should not impose a performance hit on those ports which
are mac-limited.

The lockdown should automatically flush when there is a state
transition on the interface.

There should be multiple levels of locking:

* "Forwarding" limit - the maximum number of addresses you will forward
  for on this port
* "Soft" limit - the limit at which you will record syslog events
* "Hard" limit - the limit at which you shut down the port (drop link
  if able) and record a syslog event

A hard limit should require manual intervention to reset the locking
and bring the port back up.

This feature must include a "first in-last out" mechanism in the
lockdown facility, to avoid forwarding information for valid addresses
being overwritten by addresses in excess of the exchange's house rules.

All the above locking, timeout, and reset rules should be configurable
by the network operator.

b) Disable acting on STP BPDU information
-----------------------------------------

Many exchange operators currently deploy Spanning Tree Protocol (STP)
in networks which contain redundant links/full meshing.

There is, however, a danger presented by STP information leaked from a
participant's network. The participant may have connected a poorly
configured switch/router product, and may be leaking their STP
information into that of the exchange.

We would wish to see a configurable option to allow STP information to
be ignored, and filtered in hardware on "edge ports", on a per-port
basis.

There should also be an option to generate traps or log messages based
on transgressions of the policy.
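As an illustration of the multi-level MAC locking proposed in section
a) above, the following Python sketch models one port with a
"forwarding", "soft" and "hard" limit. It is only a sketch under stated
assumptions: the class name PortMacLimiter, its method names, and the
decision to count every distinct source address seen on the port
towards the soft and hard limits are invented for this example and do
not describe any vendor's implementation.

```python
class PortMacLimiter:
    """Hypothetical per-port MAC learning limiter (illustration only)."""

    def __init__(self, forwarding_limit, soft_limit, hard_limit):
        self.forwarding_limit = forwarding_limit
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.forwarded = set()   # addresses we forward for on this port
        self.excess = set()      # addresses seen beyond the forwarding limit
        self.shutdown = False    # True once the hard limit has tripped
        self.log = []            # stands in for syslog / SNMP traps

    def learn(self, mac):
        """Record a source MAC; return True if its frames are forwarded."""
        if self.shutdown:
            return False
        if mac in self.forwarded:
            return True
        if mac not in self.excess:
            # "First in-last out": earlier valid addresses are never
            # evicted to make room for later ones.
            if len(self.forwarded) < self.forwarding_limit:
                self.forwarded.add(mac)
            else:
                self.excess.add(mac)
            self._check_limits()
        return (not self.shutdown) and mac in self.forwarded

    def _check_limits(self):
        seen = len(self.forwarded) + len(self.excess)
        if seen >= self.hard_limit:
            self.shutdown = True
            self.log.append(f"hard limit: {seen} MACs seen, port shut down")
        elif seen >= self.soft_limit:
            self.log.append(f"soft limit: {seen} MACs seen")

    def link_transition(self):
        """Flush learned addresses on an interface state change."""
        self.forwarded.clear()
        self.excess.clear()

    def manual_reset(self):
        """Operator intervention: clear the hard-limit shutdown."""
        self.shutdown = False
        self.link_transition()


port = PortMacLimiter(forwarding_limit=1, soft_limit=2, hard_limit=3)
for mac in ["00:00:0c:aa:aa:01", "00:00:0c:aa:aa:02", "00:00:0c:aa:aa:03"]:
    print(mac, port.learn(mac))
print(port.log)
```

Note that in this sketch a link state transition flushes the learned
addresses but does not clear a hard-limit shutdown, which only
manual_reset() does, matching the requirement for manual intervention.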
c) Wire-speed ACL-type filtering based on L2/L3 header info
-----------------------------------------------------------

Switches should be able to look into the layer 2 or 3 header
information of a packet, and selectively monitor, or filter, based on
certain layer 2 or 3 criteria. This could be done using pattern
matching or masking.

A common example of an area where this is desirable is to permit frames
of only certain ethertypes to enter the network through an edge port.

This sort of filtering should be implemented in hardware wherever
possible, and not have an effect on the forwarding performance of the
system. Where this is not possible, it must be clearly documented.

d) ARP/Broadcast snooping and control
-------------------------------------

Many exchange points insist on participants using IP addresses assigned
to them by the exchange operator. It is desirable for the operator to
be able to monitor/restrict "off-net" ARP.

As Ethernet is a broadcast medium, broadcast storms have been known to
bring exchanges to their knees, affecting the forwarding abilities of
both the exchange's switches, and the participants' routers.
Monitoring/rate limiting/control of Ethernet broadcast frames is
desirable.

Most exchanges also forbid the speaking of interior routing protocols
across their peering network. Since these take the form of broadcast or
multicast frames on Ethernet, control would help monitor this type of
incident.

Such control should be able to distinguish (through appropriate
configuration) between legitimate ARP and genuine broadcast storms.

There should be suitable configuration knobs to be able to rate limit,
shut down, log exceptions, etc.

e) Support for MARP?
--------------------

We had looked at seeking support for something such as MARP -
"MultiAccess Reachability Protocol". This was defined in an Internet
Draft, which allows for detection of failures at L2 across multiple
switch hops.
However, this has failed to be published as an RFC, as Cisco has
asserted IPR over the content of the draft, and it would therefore
require licensing. This seems to have somewhat killed this promising
idea off.

f) Policy exception logging
---------------------------

In the above paragraphs, we have asked for some policy-based tools.
Operators need to know when these policies have been breached.

Good logging of policy exceptions needs to be implemented:

* SNMP-trap
* Configurable syslog (i.e. which syslog facility to write to)
* RFC3164 compliant syslogging
* Syslog over SSL

g) Access to management interfaces
----------------------------------

In the past, security of management interfaces on Ethernet switching
products has often been lacking.

CLI or web interfaces should support authentication using
username/password pairs, to avoid the use of "password only"
authentication which implies shared passwords.

CLI interfaces should also support SSHv2 access, using either
username/password pair, or public key authentication.

Web interfaces should be HTTPS/SSL enabled, to avoid passwords being
passed in the clear over HTTP.

SCP/SFTP should be supported for config copy/upload/download, as well
as existing methods (TFTP/FTP).

Management interfaces should be able to perform authentication from an
external source, such as TACACS, RADIUS or LDAP services, as well as
providing locally held accounts (which have to be retained for
emergencies).

All management interfaces, CLI, web and SNMP, should be able to benefit
from access-list control. The access lists should be able to support
variable-length subnet masks.

It should be possible to disable management interfaces on a per-VLAN
basis. Many XP operators choose to configure a "management" VLAN, so
that all management is done out-of-band of the core peering traffic. It
is desirable to have the management interfaces listen on the management
networks only.
On devices which are designed to support high bandwidth per slot, such
as high-density GigE or 10GigE, it is preferable to have a 10/100 Mb
management port provided on the system, to avoid burning a fast port
for management.

h) Port mirroring
-----------------

It is sometimes necessary to mirror participants' ports, either because
a participant is suspected of some inappropriate activity, or to help
obtain information to debug a problem.

Not all exchange points have staff on site 24x7, and port mirroring may
need to be remotely set up, without hands-on intervention on-site.

The ability to allow any port to mirror any other port of a similar or
lower speed within the chassis would allow the operator to connect a
traffic collector/analyser device to a monitoring port, and simply
configure the switch to mirror a port as desired to the monitoring
port.

i) Statistics and Accounting
----------------------------

As well as implementing de facto SNMP counters/RMON, also consider
implementing the following:

* Per-VLAN traffic statistics
* sFlow export support (via management interface)
* Counters based on common ethertypes (IPv4, IPv6, multicast, ARP, etc)

SCALABILITY AND RESILIENCE
--------------------------

a) Spanning Tree
----------------

Spanning Tree is currently the only cross-platform dynamic solution
available to operators of exchange points for dynamically managing
multiple redundant links in their architecture.

There are a number of problems with Spanning Tree:

* Slow convergence - especially in cases of root bridge re-election
* Wasteful of resilient/redundant resources - redundant links are
  switched off - no traffic sharing
* Security concerns (highlighted above)

As the routes collected at an Exchange Point can be routed all over the
world, any routing instability can act like dropping a pebble in a
pond, and will spread around the Internet.
It's desirable to maintain stable routing sessions across Exchange
Point LANs to minimise these routing flaps, because of the load they
place on routers, and the effects of route dampening penalties.

We believe that being able to declare ports as "end-stations" should
avoid them being counted in the STP calculation, enable these ports to
start forwarding more rapidly, and speed overall STP convergence time.

Rapid spanning tree (IEEE 802.1w) should be implemented
(http://www.ieee802.org/1/pages/802.1w.html). Results from testing RSTP
on certain platforms show that for simple topologies with few redundant
links, sub-second failover and reconvergence is achievable with minimal
tuning or additional configuration.

b) Ring Restoration Protocols
-----------------------------

IEEE 802.17 - This is a standards-based version of the technology
currently used by Cisco called DPT (Dynamic Packet Transport). This
consists of a counter-rotating ring system, with spatial reuse and
"ring wrapping" circuit protection. The Cisco version is currently
implemented over SONET/SDH media; however, the standardised version is
being designed to be more media agnostic, and the IEEE working group
has already elected to provide support for Gigabit Ethernet and 10
Gigabit Ethernet.

Proprietary Protocols - There are a number of proprietary ring
protocols, such as Extreme's EAPS (published as informational RFC3619),
or Foundry's MRP. They are relatively similar in operation, in that
they make assumptions about the number of redundant links in a topology
(i.e. only one), have a concept of master and transit nodes, use a
"heartbeat" sent out by the master, and pass topology change messages
between the nodes to speed network reconvergence (by triggering FDB
flushing, and backup port unblocking on the master node).

These recovery protocols may become less important as RSTP becomes more
graceful.
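The master-node behaviour common to the proprietary ring protocols
described in section b) above can be sketched roughly as follows. This
is a deliberately simplified model and not an implementation of EAPS or
MRP: the class RingMaster, its method names, the timeout parameter and
the flush counter are all hypothetical, and real protocols add further
states and link-down notifications from transit nodes.

```python
class RingMaster:
    """Hypothetical master node on a ring with one redundant link."""

    def __init__(self, fail_timeout):
        self.fail_timeout = fail_timeout  # seconds without a heartbeat
        self.backup_blocked = True        # ring intact: backup port blocks
        self.last_heartbeat = 0.0
        self.fdb_flushes = 0              # topology-change messages sent

    def heartbeat_received(self, now):
        """The master's own heartbeat made it all the way round the ring."""
        self.last_heartbeat = now
        if not self.backup_blocked:
            # Ring has healed: block the backup port again to avoid a loop.
            self.backup_blocked = True
            self._signal_topology_change()

    def tick(self, now):
        """Periodic check for a missing heartbeat."""
        if self.backup_blocked and now - self.last_heartbeat > self.fail_timeout:
            # Ring is broken somewhere: open the backup port.
            self.backup_blocked = False
            self._signal_topology_change()

    def _signal_topology_change(self):
        # Stands in for telling transit nodes to flush their FDBs.
        self.fdb_flushes += 1


master = RingMaster(fail_timeout=3.0)
master.heartbeat_received(now=1.0)
master.tick(now=5.0)                 # no heartbeat for 4 s: unblock backup
print(master.backup_blocked, master.fdb_flushes)
```

Every change of the backup port state is paired with an FDB flush in
this sketch, since stale forwarding entries would otherwise keep
pointing traffic at the old path.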
c) Trunking and Link-Aggregation
--------------------------------

It's become increasingly common for exchange points to be built across
multiple switches and multiple sites, and many need to deploy link
aggregation to handle the volume of interswitch traffic, where it
exceeds the maximum speed of a single link.

Most equipment implements load-sharing using either round-robin or
address-based algorithms.

In exchange points, many pieces of equipment will have similar MAC
addresses, especially the first and last bytes (corresponding to vendor
and slot position on router). This causes significant problems if the
load-sharing hash does not use enough significant bits in the frame.

If the hash is only based on part of the address, this can result in
poor efficiency of load-sharing, and "clutching" of traffic on a single
link inside a group.

It's preferable that load-sharing algorithms should consider the whole
L2 address, and where possible the L3 header information, when
calculating the hash used.

Load-sharing of broadcast and multicast traffic should be implemented.
This is because behaviour such as forwarding all broadcast/multicast
traffic out of the "primary" port in a trunk has been observed when
load-sharing using destination MAC addresses has been implemented.

IEEE 802.3ad link-aggregation "LACP" should be implemented.

d) Multicast Control and Containment
------------------------------------

Most switches are configured with IGMP snooping for multicast control.
However, in an exchange point, with only routers attached, there is no
IGMP present, only PIM and MSDP, and all multicast packets are flooded
out of all ports.

An exchange point, however, is an ideal place for multicast peering to
happen: inject the traffic once, and it comes out several times (as
much as is needed, or in the current situation, as much as isn't
needed!).

Cisco developed RGMP (Router Group Management Protocol).
This is a proprietary technology whereby the router can communicate to
the switch which multicast groups it wishes to see.

This remains, despite being released as an informational RFC (RFC3488),
a vendor-specific feature, and a wide range of routing and switching
platforms are present at many exchange points - both in equipment used
by the operator, and the participants. These are true multi-vendor
environments.

Therefore, this is not a workable solution for most exchange points,
whose principles often include "equal treatment" of participants.

While it may not solve all potential issues with multicast peering,
implementing PIM-SM snooping and pruning within the switches will
achieve the traffic containment requirements.

Where PIM snooping is available, this should not have a negative effect
on the overall forwarding performance of the system. Where there is a
performance impact, this and its surrounding caveats shall be clearly
documented.

e) VLAN tagspace issues/overlapping
-----------------------------------

A serious emerging issue is VLAN tag space overlapping/clashing. Most
metro transport networks can solve this by using q-in-q (tag stacking);
however, this doesn't apply to shared networks like Internet Exchanges.

Current switches use a 1:1 mapping of 802.1q VLANs to bridge groups,
which is the way 802.1q was probably intended.

This mapping should be loosened if not abandoned - nowadays there are
so many ways to egress an Ethernet frame from a switch that more and
more often we have to resort to 'tricks' to put the right label on the
right Ethernet packet going out the right interface.

This problem is being exacerbated by a number of issues:

* Increased use of switch router products (e.g.
  Cisco 7600)
* Use of switches as "channel-banks" - breaking out higher speed router
  interfaces
* Use of metro-ethernet, LAN extension or Ethernet over MPLS
  ("Martini") circuits to connect to the IXP

We think there are two (fairly similar) approaches to solving this:

* Basic VLAN tag rewrite
* Separate the tag from the virtual bridge instance

VLAN tag rewrite is, as its name suggests, being able to rewrite a
dot1q tag on a specific interface to a VLAN ID on the switch. This
would need to be implemented on both ingress and egress.

The other option is complete separation of VLAN ID from the virtual
bridges inside the switch. You assemble a framework where you can place
untagged ports, tagged ports, q-in-q tagged ports, MPLS endpoints and
ATM VCs all together into the same virtual bridge: effectively a bridge
group which can contain any number of these sorts of entities.

f) Link failure detection
-------------------------

Link failure detection should be implemented, and should look like:

* UDLD - Uni-Directional Link Detection
* LFN - Link Failure Notification

This avoids the risk of an Ethernet link going "one-way" and fooling
the restoration protocols that the link is working, when really it
isn't.

ENVIRONMENTAL MONITORING
------------------------

There should be reasonable environmental monitoring provided:

* Temperature sensors
* Fan health sensors
* Power supply health sensors

There should be exception logging via SNMP trap and syslog (as
specified above) of any incidents.

It should also be possible to shut down a malfunctioning element in the
system (automatic, user configurable, or manual), in order to preserve
system health.

For example, a power supply failing in a system could cause an
instability in the device. If the system could make a decision to shut
that power supply down, and assuming a redundant configuration, the
switch would then operate in a stable condition until such time that
the power supply could be exchanged.
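To make the second approach from section e) above more concrete (the
VLAN ID separated from the virtual bridge), here is a small Python
sketch. It is an assumption-laden illustration only: the Switch class,
the attach() and flood() methods, and the port and bridge names are all
hypothetical. Each attachment is a (port, tag) pair, with a tag of None
meaning an untagged port, and the tag seen on the wire is local to that
port rather than global to the switch.

```python
class Switch:
    """Hypothetical switch where wire tags are local to each port."""

    def __init__(self):
        self.bridge_of = {}  # (port, tag) -> virtual bridge name
        self.members = {}    # virtual bridge name -> list of (port, tag)

    def attach(self, bridge, port, tag=None):
        """Place a (port, tag) attachment into a virtual bridge."""
        self.bridge_of[(port, tag)] = bridge
        self.members.setdefault(bridge, []).append((port, tag))

    def flood(self, in_port, in_tag=None):
        """Return the (port, tag) pairs a flooded frame egresses on.

        The egress tag is whatever the destination attachment uses, so
        the frame is effectively re-tagged on both ingress and egress.
        """
        bridge = self.bridge_of.get((in_port, in_tag))
        if bridge is None:
            return []  # no matching attachment: drop the frame
        return [m for m in self.members[bridge] if m != (in_port, in_tag)]


sw = Switch()
sw.attach("peering", "port1", tag=100)  # participant arrives on VLAN 100
sw.attach("peering", "port2", tag=200)  # same LAN, but VLAN 200 on this port
sw.attach("peering", "port3")           # untagged router port
print(sw.flood("port1", 100))
```

Because the bridge is looked up by (port, tag) rather than by tag
alone, two participants can use clashing or different VLAN IDs on their
access circuits and still end up in the same peering LAN.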
PHYSICAL WISHES
---------------

IXPs are high-uptime environments. The equipment used in an IXP needs
to be able to satisfy this requirement, in terms of redundancy, and
hot-swappable components.

* Hot swap of management/switch fabric cards with instantaneous
  failover to any installed redundancy (not rebooting onto the
  "backup")
* Full redundancy of PSUs, and hot-swap (i.e. box should run on 50% of
  PSUs)
* Rapid booting and card startup (after all, much functionality is
  implemented in the ASIC hardware)
* GBIC/SFP-optics for flexibility, easy replacement, and maximised port
  utilisation (freedom to choose SX/LX, etc)
* "Coloured" (DWDM/CWDM) GBIC/SFP/Xenpak/XFP (etc, etc) support
* Vendor "lockdown" of pluggable interfaces should either not be
  implemented, or be able to be switched off in configuration.
* 220-240V AC power options. Unlike most telco-managed facilities, the
  carrier-neutral facilities common in Europe do not provide indigenous
  48V DC power. Power distribution is done using the regular utility
  supply voltage in that country - usually ~230V AC in EU countries.
* Cable testing functionality in copper ports, and optical power
  metering in optical ports.

ACKNOWLEDGEMENT AND THANKS
--------------------------

Thanks are due to the "usual suspects" in the RIPE EIX Working Group,
but specifically Christian Panigl, Kurtis Lindqvist, Keith Mitchell,
Daniele Arena, and Remco Van Mook, for their contributions to this
document.

From Christian.Panigl at UniVie.ac.at Wed May 11 16:10:09 2005
From: Christian.Panigl at UniVie.ac.at (Christian Panigl, UniVie/ACOnet/VIX)
Date: Wed, 11 May 2005 16:10:09 +0200
Subject: [eix-wg] IXP Switching Wishlist v3.0 draft
In-Reply-To:
References:
Message-ID: <42821241.5050909@UniVie.ac.at>

Hi all,

firstly, this is a reminder to send your comments / additions for the
"v3.0" edition to Mike Hughes (and EIX-WG) by the end of this week, as
we agreed at RIPE50 last week.
And, secondly, Mike, I'm herewith including my own comments:

on 4.5.2005, Mike Hughes wrote:

>Key changes:
>
>* Removed sections on MAC-SPF and MPLS options - these don't seem to be
>gaining any traction, and don't seem as relevant anymore given the
>improvement in things such as RSTP.
>

I don't think that the MAC-SPF idea is completely obsolete but I agree
that our wishlist is probably not the right place to ask for it
(-> IETF, IEEE ?)

> * Added section on VLAN tag space issues - tag rewrite or "virtual
>   bridges"
> * Added items on 240V AC power, and Environmental monitoring
> * Added further detail to filtering, mac locking, multicast, and
>   security.

OK, thanks, further comments:

>d) ARP/Broadcast snooping and control
>-------------------------------------
>

please also add "unknown unicast" to this family of traffic where we
need configurable monitoring/logging and rate limiting;
another nice to have: ARP sponge

>g) Access to management interfaces
>----------------------------------
>

this section doesn't mention anything regarding physical console ports,
but probably everybody is happy with what we are currently getting ?

>h) Port mirroring
>-----------------
>

is specifically (only) asking for "within the chassis", shouldn't we
also be interested in port mirroring capabilities across chassis (RSPAN
in Cisco jargon) ?

>c) Trunking and Link-Aggregation
>--------------------------------
>

please add a comment that all port security features should also be
applicable to port-groups/trunks/aggregated links.

>f) Link failure detection
>-------------------------
>Link failure detection
>

and fast loop-detection and suppression mechanisms (independent of STP)

>should be implemented, and should look like:
>
>* UDLD - Uni-Directional Link Detection
>* LFN - Link Failure Notification
>
>This avoids the risk of an ethernet link going "one-way" and fooling
>the restoration protocols that the link is working, when really it
>isn't.
> > we should (also) point to BFD - "Bi-directional forwarding detection" here: http://www.ietf.org/ids.by.wg/bfd.html >PHYSICAL WISHES >--------------- > > how about also mentioning "hitless SW upgrade" here ? Thanks Mike for editing this document !!! Cheers Christian -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: eix-wishlist-v30-draft.txt URL: