
Using Real-Time Measurements in Support of Real-Time Network Management

James L. Alberi, Ta Chen, Sumit Khurana, Allen Mcintosh, Marc Pucci, Ravichander Vaidyanathan
Telcordia Technologies, Inc.

I. INTRODUCTION

Increased reliability is necessary if the Internet is to carry voice, video and other enhanced services. Congestion caused by the statistical nature of packet forwarding is a serious issue that could impede the reliable, timely delivery of data and the desired quality of service. At present, humans monitor the network for congestion with a variety of data collection mechanisms and take corrective actions on an ad hoc basis. Putting humans in the control loop yields corrective actions that are too slow, because of delays in collecting data, and too error-prone, because of the complexity of the network. This paper presents Rondo, an automated control system that manages congestion in near real time in core networks. The Rondo system is composed of three subsystems: a data collection system, a rebalancing algorithm and an element management system. Congestion in the network degrades three basic parameters that are often components of quality of service, namely delay, loss and jitter. Rondo reallocates data flows over network links to optimize an objective function that will be discussed in more detail in the full paper. We will also detail the relationship to other work in the full paper.

II. DATA COLLECTION

The Rondo data-collection subsystem is near real-time rather than hard real-time because it does not guarantee the scheduling of tasks; such a guarantee would be difficult to provide given the subsystem's scale and distributed architecture. The architecture is key to the rapid collection and processing of data. With its basic abstractions of data streams and stream policies, the data subsystem can perform preliminary computations necessary for network control, such as trending or threshold detection, close to the point of data collection. The architecture also encompasses a diverse range of data sources, including router-resident data, RMON, RTR, and customized active probes. Rondo converts the data from each source to a data stream that is a specialization of a generic data stream. The collection system supports solicited reading of a data stream and event notification to other systems on specified conditions.
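The stream and policy abstractions described above might be sketched as follows. This is only an illustration under stated assumptions, not Rondo's implementation: the names `DataStream`, `threshold_policy`, `push`, `latest` and `trend`, the sliding-window size, and the sample values are all hypothetical and do not come from the paper.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, measured value)


class DataStream:
    """Hypothetical generic stream; a real source (e.g. RMON) would specialize it."""

    def __init__(self, name: str, window: int = 5) -> None:
        self.name = name
        self.samples: Deque[Sample] = deque(maxlen=window)  # sliding window
        self.policies: List[Callable[["DataStream"], None]] = []

    def add_policy(self, policy: Callable[["DataStream"], None]) -> None:
        self.policies.append(policy)

    def push(self, timestamp: float, value: float) -> None:
        """Record a sample, then run every policy close to the point of collection."""
        self.samples.append((timestamp, value))
        for policy in self.policies:
            policy(self)

    def latest(self) -> Optional[Sample]:
        """Solicited read: return the most recent sample, if any."""
        return self.samples[-1] if self.samples else None

    def trend(self) -> float:
        """Average rate of change over the window (value units per second)."""
        if len(self.samples) < 2:
            return 0.0
        (t0, v0), (t1, v1) = self.samples[0], self.samples[-1]
        return (v1 - v0) / (t1 - t0) if t1 != t0 else 0.0


def threshold_policy(limit: float, notify: Callable[[str, float, float], None]):
    """Build a policy that raises an event when the newest value exceeds limit."""
    def policy(stream: DataStream) -> None:
        timestamp, value = stream.samples[-1]
        if value > limit:
            notify(stream.name, timestamp, value)
    return policy


# Usage: an assumed link-utilization stream with an 80% threshold.
events: List[Tuple[str, float, float]] = []
stream = DataStream("link-A-B utilization")
stream.add_policy(threshold_policy(0.8, lambda n, t, v: events.append((n, t, v))))
for t, v in [(0, 0.5), (10, 0.7), (20, 0.9)]:
    stream.push(t, v)
```

In this sketch only the third sample crosses the threshold, so a single event is recorded, while `stream.latest()` and `stream.trend()` remain available to any system that polls the stream.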

III. REBALANCING ALGORITHM

A primary goal of Rondo is a near real-time response to network congestion. Rapid response requires that a rebalancing algorithm be simple, efficient and effective. This requirement conflicts with choosing an algorithm that optimally allocates bandwidth and utilization over a large network. As a result, the architecture of Rondo's rebalancing mechanism accommodates a whole spectrum of algorithms, from a simple first-fit strategy to full off-line optimization. Different algorithms can be inserted depending on need and situation. One example that yields good results is a constrained shortest path algorithm, modified from Dijkstra's algorithm, that considers bandwidth demands, current link utilizations and other factors.
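A constrained shortest path search of this general kind can be sketched as below. The paper does not specify how Dijkstra's algorithm is modified, so the details here are assumptions: links whose residual bandwidth cannot carry the demand are skipped, and each remaining link costs one hop plus its utilization after the demand is placed. The function name, graph layout and example topology are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Directed graph: node -> {neighbor: (capacity, bandwidth already in use)}
Graph = Dict[str, Dict[str, Tuple[float, float]]]


def constrained_shortest_path(
    graph: Graph, src: str, dst: str, demand: float
) -> Optional[List[str]]:
    """Dijkstra's algorithm restricted to links that can carry `demand`."""
    dist: Dict[str, float] = {src: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, src)]
    visited = set()

    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == dst:
            break
        for nbr, (capacity, used) in graph.get(node, {}).items():
            if capacity - used < demand:
                continue  # constraint: not enough residual bandwidth
            # Assumed cost: one hop plus utilization after placing the demand.
            cost = 1.0 + (used + demand) / capacity
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                prev[nbr] = node
                heapq.heappush(heap, (d + cost, nbr))

    if dst not in dist:
        return None  # no feasible path for this demand
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return path


# Hypothetical four-node topology; capacities and loads in Mb/s.
network: Graph = {
    "A": {"B": (100.0, 90.0), "C": (100.0, 50.0)},
    "B": {"D": (100.0, 0.0)},
    "C": {"D": (100.0, 50.0)},
    "D": {},
}
small = constrained_shortest_path(network, "A", "D", 5.0)
medium = constrained_shortest_path(network, "A", "D", 20.0)
large = constrained_shortest_path(network, "A", "D", 60.0)
```

In this example the 5 Mb/s flow takes the path through B, the 20 Mb/s flow is pushed onto the path through C because link A-B has only 10 Mb/s of headroom, and the 60 Mb/s flow has no feasible path at all.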

IV. ELEMENT MANAGEMENT

Multiprotocol Label Switching (MPLS) facilitates the automated control of a network, although other mechanisms might be possible. MPLS establishes routes through a core network that are under the complete control of the management system. Both the hop-by-hop route and the class of service are under the control of the Rondo system. We will detail how our system implements this control in the full paper.

V. EXPERIMENTAL NETWORK

Our experimental network models the core network of a national service provider. The network has ten routers interconnected in a mesh, each one representing a large POP. Varying network loads are generated with a combination of hardware and software load generators. Data will be presented on the effectiveness and dynamic stability of the control system under the generated loads.

VI. ISSUES

Various issues in automated network control need further exploration. Among these are the dynamic stability of the control system under increasing scale, the interaction of network failure with automatic control, and the effect of traffic shaping and admission control.