Tuesday, September 22, 2009

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

A data center network is typically organized as a hierarchy: racks of servers connect to top-of-rack (ToR) switches, which connect to one or more end-of-row (EoR) switches, which in turn connect to core switches. Such a network contains a huge number of hosts, and virtual machine multiplexing raises this to millions of unique addressable end hosts, which challenges both Layer 2 and Layer 3 designs. Efficiency, scalability, and fault tolerance are therefore all significant concerns in data center networks.

The authors presented PortLand, a Layer 2 design that provides a plug-and-play, large-scale network fabric, so that the whole data center network can be treated as a single unified fabric. Its main design points are:

· It models the data center network as a multi-rooted tree, and uses a small-scale fat tree with three levels of switches (edge, aggregation, and core) as its working topology.
· It runs a centralized fabric manager on a dedicated machine that maintains only soft state, not hard state.
· It uses a Location Discovery Protocol (LDP) that lets switches discover their position in the topology without administrator configuration: switches periodically send Location Discovery Messages (LDMs) out all of their ports.
· It assigns every end host a 48-bit pseudo MAC (PMAC) address that encodes the host's position in the topology, which enables loop-free forwarding with small forwarding tables. The ingress edge switch rewrites the source host's actual MAC (AMAC) to its PMAC on the first hop, and the egress edge switch rewrites the destination PMAC back to the AMAC on the last hop.
· It supports migrating a VM from one physical machine to another: an invalidation message is forwarded to the VM's previous switch, which in turn sends a unicast ARP to any host still transmitting to the migrated VM's old PMAC address.
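
To get a feel for the sizes involved in a fat tree like the one above, here is a small Python sketch of the standard k-ary fat tree arithmetic. It assumes every switch has k ports (k even): there are k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches, for k^3/4 hosts in total. The function name `fat_tree_sizes` is only illustrative.

```python
def fat_tree_sizes(k: int) -> dict:
    """Element counts for a k-ary fat tree built from identical k-port switches."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even integer")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 edge switches in each of k pods
        "aggregation_switches": k * half,  # k/2 aggregation switches in each pod
        "core_switches": half * half,      # (k/2)^2 core switches
        "hosts": k * half * half,          # each edge switch serves k/2 hosts -> k^3/4
    }


for k in (4, 48):
    print(k, fat_tree_sizes(k))
```

With k = 4 this gives 16 hosts and 20 switches; with 48-port switches it already reaches 27,648 physical hosts, before any VM multiplexing is counted.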
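
The LDP bullet can be illustrated with a simplified level-inference rule. This is a hypothetical sketch, not PortLand's actual LDP state machine: it assumes a switch knows which of its ports have received an LDM and, where available, the level the neighbor reported. Ports that never hear an LDM are assumed to face hosts (hosts do not speak LDP), so the switch concludes it is an edge switch; a switch that hears from edge switches is an aggregation switch; a switch whose neighbors are all aggregation switches is a core switch.

```python
from enum import Enum
from typing import Dict, Optional


class Level(Enum):
    EDGE = 0
    AGGREGATION = 1
    CORE = 2


def infer_level(num_ports: int, ldm_neighbors: Dict[int, Optional[Level]]) -> Optional[Level]:
    """Hypothetical simplified level inference from received LDMs.

    ldm_neighbors maps port -> level reported by the neighbor on that port
    (None if an LDM arrived but the neighbor's level is still unknown).
    Ports missing from the dict never received an LDM; this sketch assumes
    they face hosts and ignores the possibility that the port is simply down.
    """
    if len(ldm_neighbors) < num_ports:
        return Level.EDGE  # silent ports => directly attached hosts
    known = [lvl for lvl in ldm_neighbors.values() if lvl is not None]
    if Level.EDGE in known:
        return Level.AGGREGATION  # hears from edge switches
    if len(known) == num_ports and all(lvl is Level.AGGREGATION for lvl in known):
        return Level.CORE  # every neighbor is an aggregation switch
    return None  # not enough information yet; keep listening


print(infer_level(4, {2: None, 3: None}))
print(infer_level(4, {0: Level.EDGE, 1: Level.EDGE, 2: None, 3: None}))
print(infer_level(4, {port: Level.AGGREGATION for port in range(4)}))
```

Returning None for the undetermined case mirrors the idea that switches keep exchanging LDMs until their position becomes clear, rather than guessing.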
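
The PMAC idea can be sketched in Python as well. This assumes the pod.position.port.vmid layout described in the PortLand paper (16, 8, 8 and 16 bits); the class, the helper functions and the dictionary-based frames are hypothetical, meant only to show the bit packing and the first-hop/last-hop rewrites.

```python
from dataclasses import dataclass

# Assumed field widths (high bits first): pod.position.port.vmid = 16.8.8.16 bits.
FIELDS = (("pod", 16), ("position", 8), ("port", 8), ("vmid", 16))


@dataclass(frozen=True)
class PMAC:
    pod: int       # pod number of the host's edge switch
    position: int  # edge switch's position within its pod
    port: int      # edge switch port the host is attached to
    vmid: int      # distinguishes VMs behind the same port

    def to_int(self) -> int:
        # Pack the fields into a single 48-bit integer, pod in the highest bits.
        value = 0
        for name, bits in FIELDS:
            field = getattr(self, name)
            if not 0 <= field < (1 << bits):
                raise ValueError(f"{name}={field} does not fit in {bits} bits")
            value = (value << bits) | field
        return value

    @classmethod
    def from_int(cls, value: int) -> "PMAC":
        # Unpack from the lowest bits upward (vmid first, pod last).
        parts = {}
        for name, bits in reversed(FIELDS):
            parts[name] = value & ((1 << bits) - 1)
            value >>= bits
        return cls(**parts)

    def __str__(self) -> str:
        # Render as a colon-separated 6-byte MAC address string.
        return ":".join(f"{b:02x}" for b in self.to_int().to_bytes(6, "big"))


def ingress_rewrite(frame: dict, amac_to_pmac: dict) -> dict:
    # First hop (ingress edge switch): replace the source AMAC with the sender's PMAC.
    return {**frame, "src": amac_to_pmac[frame["src"]]}


def egress_rewrite(frame: dict, pmac_to_amac: dict) -> dict:
    # Last hop (egress edge switch): replace the destination PMAC with the receiver's AMAC.
    return {**frame, "dst": pmac_to_amac[frame["dst"]]}


host_amac = "52:54:00:12:34:56"  # hypothetical actual MAC of a host
host_pmac = str(PMAC(pod=2, position=1, port=0, vmid=1))
print(host_pmac)  # 00:02:01:00:00:01
frame = ingress_rewrite({"src": host_amac, "dst": "00:03:00:01:00:01"}, {host_amac: host_pmac})
print(frame["src"])  # 00:02:01:00:00:01
```

Because pod and position occupy the high-order bits, switches above the edge can forward on a PMAC prefix instead of keeping one entry per host, which is roughly where the small forwarding tables come from.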

The implementation's efficiency, scalability, and fault tolerance were evaluated for both unicast and multicast communication using several measurements:
· Convergence time: the total time required to re-establish UDP and TCP flows after failures, which increases slowly with the number of failures. TCP flows take longer to recover than UDP flows.
· Scalability of the fabric manager in larger topologies, measured by the amount of control traffic it can handle
· The ability to support VM migration
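
As a rough illustration of the control traffic the fabric manager deals with, here is a hypothetical Python sketch of a fabric manager that keeps an IP-to-PMAC table as soft state and answers ARP requests forwarded by edge switches. The ARP-interception flow, the broadcast fallback on a miss, and all class and method names are assumptions made for this sketch.

```python
from typing import Dict, Optional


class FabricManager:
    """Hypothetical fabric manager holding only soft state (IP -> PMAC)."""

    def __init__(self) -> None:
        self.ip_to_pmac: Dict[str, str] = {}
        self.arp_requests_handled = 0  # crude stand-in for control-traffic load

    def register(self, ip: str, pmac: str) -> None:
        # Edge switches report hosts they see; re-registering (e.g. after a VM
        # migration) simply overwrites the old PMAC.
        self.ip_to_pmac[ip] = pmac

    def resolve_arp(self, ip: str) -> Optional[str]:
        # An edge switch forwards an intercepted ARP request here.
        # None means "unknown": in this sketch the caller falls back to broadcast.
        self.arp_requests_handled += 1
        return self.ip_to_pmac.get(ip)

    def reset(self) -> None:
        # Soft state: losing the table is recoverable, because edge switches
        # can re-report their hosts.
        self.ip_to_pmac.clear()


fm = FabricManager()
fm.register("10.0.0.1", "00:02:01:00:00:01")
print(fm.resolve_arp("10.0.0.1"))  # 00:02:01:00:00:01
print(fm.resolve_arp("10.9.9.9"))  # None
```

Counting requests per lookup is one simple way to reason about how control traffic grows with the number of hosts the fabric manager serves.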
