Re: Multicast forwarding loop at mbone-cern.switch.ch
- Date: Fri, 17 Apr 1998 10:50:11 +0200
Hi Bill,
> There's a pretty bad forwarding loop at mbone-cern.switch.ch:
>
> Source Response Dest Overall Packet Statistics For Traffic From
> 130.59.4.22 224.0.1.32 Packet 130.59.4.22 To 224.2.200.165
> | __/ rtt 795 ms Rate Lost/Sent = Pct Rate
> v / hop 0 ms ------- ---------------------
> 130.59.4.22 nuptse.switch.ch
> | ^ ttl 17 1 pps 0/2 = -- 0 pps
> v | hop 0 ms 1 pps 0/4 = -- 0 pps
> 130.59.20.50 mbone-zh.switch.ch
> | ^ ttl 34 1 pps 0/2 = -- 0 pps
> v | hop 0 ms 1 pps -157/4 = -- 0 pps
> 192.65.185.100 mbone-cern.switch.ch
>
> This is probably the Solaris+IOS bug described at
> http://sandbox.parc.xerox.com/fenner/mcast/flooding.html .
>
> The TTL's I'm observing appear to be symptomatic of multiple interacting
> loops. I don't know the topology well enough to say who's misbehaving
> for sure, but if either all of mbone-cern.switch.ch's IOS neighbors
> are upgraded to a version of IOS that implements "DVMRP conditional
> flooding", or if mbone-cern gets the fixed multicast kernel patches,
> the problem should go away.
First of all, I cannot see the problem right now; the mtraces I run
from here look pretty normal. However, we are well aware of
instabilities here, but have so far been unable to track them down. We have
an open call #5992349 with SunSupport on this matter. What we see is
that mbone-cern.switch.ch (and to some extent our neighbouring mrouteds
as well) sometimes freezes for several minutes and no longer
responds, even on the console. Network traffic is always very high during these
periods. In some rare cases the machine crashes, and from a kernel crash
dump we know that one of the internal buffers had grown to 40MB.
Our Multicast environment:
Some of our neighbours use Ciscos with IOS 11.1. It will be hard to
convince them to upgrade, as they really want to use netflow...
However, I only see one loop involving a Cisco, and that
one runs an 11.3 release (ams-mr.att-unisource.net).
This is our Mbone setup in/near Switzerland:
v
|
ams-mr.att-unisource.net(11.3)
| . kom26-e.ethz.ch
| . | nuptse.switch.ch === LAN
ch-ws.ten-34.net . | |
| | | | . mbone-zh.switch.ch
| | v | . / | \_
| | mbone-cern.switch.ch | \
| | . | | | \ | (swima8.switch.ch)
| | . | | | \ | .
| | . | | | \-- mbone-ls.switch.ch ....
| | . | | | |
| | . | | | |
| | . | | | ro-ext-mbone.epfl.ch(11.1)
| astra.infn.it | | |
| | | | rtr-unige.gva-man.ch(11.1)
mbone.aco.net | | |
| (11.2) | | ccpnmed.in2p3.fr
| | b513-b-rci47-1-gb0.cern.ch(11.1)
| |
v v
---- tunnel
.... tunnel intentionally configured as backup with higher metric
() mrouter currently down
(11.x) IOS release (from mrinfo); the rest are mrouteds...
Questions / how can we proceed now?
o patching the Solaris kernels: we use 2.6 everywhere. Is there a
patch available somewhere?
o we could temporarily remove the redundant tunnels if this might help
o what impact does this problem have on the global Mbone? (How did you
become aware of the problem?)
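
On the second point, one way to decide which tunnels count as "redundant" is to treat the tunnel mesh as an undirected graph and list the tunnels that close a cycle, keeping the low-metric ones. The following is only a minimal sketch in Python under that assumption. The function name and the four-router example (r1 to r4, with made-up metrics) are hypothetical and are not a transcription of the diagram above. It also says nothing about whether removing those tunnels would cure the forwarding loop, only which ones could be dropped without partitioning the mesh.

```python
def redundant_tunnels(tunnels):
    """Return the tunnels that close a cycle in an undirected tunnel mesh.

    tunnels: list of (router_a, router_b, metric) tuples.
    Tunnels are considered in order of increasing metric, so high-metric
    (backup) tunnels are the ones reported as redundant where possible.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    redundant = []
    for a, b, metric in sorted(tunnels, key=lambda t: t[2]):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends are already connected: this tunnel closes a cycle.
            redundant.append((a, b, metric))
        else:
            parent[root_a] = root_b
    return redundant


# Hypothetical example mesh; names and metrics are illustrative only.
example = [
    ("r1", "r2", 1),
    ("r2", "r3", 1),
    ("r3", "r1", 1),
    ("r3", "r4", 1),
    ("r4", "r1", 3),  # backup tunnel with a higher metric
]

if __name__ == "__main__":
    for a, b, metric in redundant_tunnels(example):
        print(f"{a} -- {b} (metric {metric}) closes a cycle")
```

With the real tunnel list (for instance collected via mrinfo) in place of the example, the reported tunnels would be the candidates for temporary removal.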
Thanks for your help, Felix
BTW: I hope the Shuttle mission did not have to be postponed due to
this Mbone problem :-)
-----------------------------------------------------------------------------
Felix Kugler | email: kugler@localhost
SWITCH - The Swiss Academic and Research Network | Phone: +41 1 268 15 30
Smail: Limmatquai 138, 8001 Zurich, Switzerland | FAX: +41 1 268 15 68