Sunday, September 27, 2009

Understanding TCP Incast Throughput Collapse in Datacenter Networks


Incast has been observed in distributed storage, MapReduce and web-search workloads. Many solutions have been proposed, but they either do not help (eliminating slow start, reducing the RTO), are expensive (increasing switch and router buffer sizes), or work at the application level and therefore require modifying the applications that use TCP.
The paper sets out to describe the Incast problem. It uses a distributed storage workload in which a receiver requests a number of blocks from a number of servers, which respond with either fixed-size or variable-size fragments. Each block is striped across the servers, and the next request cannot be issued until the receiver has the data from all senders.
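To make the synchronization barrier concrete, here is a minimal sketch of how goodput for one striped block could be computed. The function names, the 1 MB block size and the 200 ms stall (a commonly cited default minimum RTO) are illustrative assumptions, not values taken from the paper.

```python
def fragment_size(block_bytes, num_servers):
    """Bytes each server sends when a block is striped evenly (assumed even split)."""
    return block_bytes / num_servers


def block_goodput_bps(block_bytes, finish_times_s):
    """Goodput of one block in bits per second.

    The receiver cannot issue the next request until every server has
    delivered its fragment, so the slowest sender sets the block time.
    """
    return block_bytes * 8 / max(finish_times_s)


# Hypothetical numbers: 4 servers, 1 MB block, all fragments arrive in 8 ms.
healthy = block_goodput_bps(1_000_000, [0.008, 0.008, 0.008, 0.008])

# Same block, but one sender loses packets and waits out an assumed 200 ms timeout.
stalled = block_goodput_bps(1_000_000, [0.008, 0.008, 0.008, 0.208])
```

Under these assumptions a single timed-out sender drags the whole block down by more than an order of magnitude, which is the shape of collapse the post describes.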

The authors demonstrated that the Incast pattern is general by replicating it on a simple 1 Gbps testbed in which all servers are connected through a single switching layer, so that the Incast phenomenon can be observed easily. They observed behavior that differs from what other works reported.
They attempted to understand this phenomenon for both the fixed-size and variable-size fragment workloads by:
· Reducing and randomizing the minimum and/or initial RTO value, and setting a smaller, randomized multiplier for the RTO exponential backoff. However, these changes did not improve goodput, since the servers share the same switch buffer.
· Examining the impact of RTO timer resolution and delayed ACKs. A low-resolution timer with delayed ACKs left enabled gave the best result for the fixed-size fragment workload.
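The first experiment above can be pictured with a small sketch of an RTO backoff schedule whose multiplier is optionally randomized. This is only an illustration: the function name, the `jitter` parameter and the uniform randomization are assumptions, not the authors' implementation.

```python
import random


def rto_schedule(initial_rto_s, multiplier=2.0, retries=4, jitter=0.0, rng=None):
    """Return the successive timeout values (seconds) for `retries` timeouts.

    multiplier=2.0 with jitter=0.0 is the usual doubling backoff.
    jitter=0.2 scales each multiplier by a uniform factor in [0.8, 1.2]
    (a hypothetical way to randomize it).
    """
    if rng is None:
        rng = random.Random()
    rto = initial_rto_s
    schedule = []
    for _ in range(retries):
        schedule.append(rto)
        # Randomize the backoff multiplier for the next timeout.
        factor = multiplier * (1.0 + rng.uniform(-jitter, jitter))
        rto *= factor
    return schedule


# Standard doubling from an assumed 200 ms initial RTO.
standard = rto_schedule(0.2)

# Smaller initial RTO with a smaller, randomized multiplier (assumed values).
randomized = rto_schedule(0.001, multiplier=1.5, retries=5, jitter=0.2,
                          rng=random.Random(0))
```

Randomizing the schedule spreads out when senders retransmit, but as noted above it did not improve goodput in the authors' experiments, because the retransmissions still contend for the same switch buffer.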
