Enterprises are increasingly virtualizing IT infrastructure by migrating storage, application, and database servers into cloud or hosted data centers. As they do so, they need to partner with ISPs and service providers to establish reliable, performance-assured, bandwidth-optimized connections between each enterprise location and each data center.
In many cases, enterprises aren't getting what they paid for - often not even close. It's common to measure actual throughput at 25% of purchased link capacity. Bandwidth Performance Optimization (BPO) techniques originally developed for latency-sensitive applications like financial trading networks, 4G mobile backhaul assurance and network-to-network interconnect can recover the missing 75% for less than the price of a basic server. Too good to be true?
Bandwidth profiles used for large enterprise connections are normally based on the Metro Ethernet Forum (MEF) service definition. These services conform to a bandwidth profile with a Committed Information Rate (CIR, the guaranteed bandwidth) and an Excess Information Rate (EIR, the best-effort bandwidth).
These bandwidth 'envelopes' are policed at the service edge using regulators: any traffic exceeding the predetermined thresholds is dropped, resulting in random packet discard that makes no distinction between low- and high-priority traffic. This "crush the edge" technique is easy to implement and effective at preventing bursts of client traffic from entering the provider's network.
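To make the policing behavior concrete, here is a minimal sketch of a single-rate token-bucket regulator. It is an illustration under stated assumptions, not the MEF algorithm or any vendor's implementation: the class name `Policer`, the helper `send_burst`, the bucket depth, and the rates are all hypothetical, and the EIR is left out for brevity.

```python
class Policer:
    """Simplified single-rate token-bucket policer (illustrative only)."""

    def __init__(self, cir_bps, bucket_bytes):
        self.cir_bps = cir_bps            # committed rate, bits per second
        self.bucket_bytes = bucket_bytes  # burst tolerance (bucket depth)
        self.tokens = float(bucket_bytes) # bucket starts full
        self.last_time = 0.0

    def offer(self, now, size_bytes):
        """Return True if the frame is forwarded, False if it is dropped."""
        # Refill tokens for the elapsed time, capped at the bucket depth.
        elapsed = now - self.last_time
        self.last_time = now
        refill = elapsed * self.cir_bps / 8
        self.tokens = min(self.bucket_bytes, self.tokens + refill)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False  # dropped, whatever the frame's priority


def send_burst(policer, frames, frame_bytes, send_bps):
    """Offer back-to-back frames at send_bps; return how many get through."""
    gap = frame_bytes * 8 / send_bps  # time between frame arrivals
    return sum(policer.offer(i * gap, frame_bytes) for i in range(frames))


# Hypothetical link: 1 Gbit/s port, 250 Mbit/s CIR, 30 kB burst tolerance.
at_port_speed = send_burst(Policer(250e6, 30_000), 100, 1500, 1e9)
below_cir = send_burst(Policer(250e6, 30_000), 100, 1500, 200e6)
print(f"burst at port speed: {at_port_speed} of 100 frames forwarded")
print(f"paced below CIR:     {below_cir} of 100 frames forwarded")
```

In this sketch, the same 100 frames pass untouched when paced below the CIR, but fewer than half survive when they arrive as a burst at port speed. The drop decision never looks at what the frame carries.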
Open the Window
Traffic is predominantly transmitted using TCP. TCP requires that every packet be accounted for, with a receipt acknowledgement confirming that transmission succeeded. However, if the sender waited for each individual packet to be acknowledged before sending the next one, throughput would be severely reduced, especially over wide-area connections.
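A quick back-of-the-envelope calculation shows why. If only one packet may be in flight per round trip, throughput can never exceed the packet size divided by the round-trip time, whatever the link speed. The sketch below assumes a 1,500-byte packet and ignores serialization and processing time; the function name is hypothetical.

```python
def stop_and_wait_bps(packet_bytes, rtt_s):
    """Throughput ceiling when one packet is sent per round trip."""
    return packet_bytes * 8 / rtt_s


# Illustrative round-trip times, from a local link to a long-haul one.
for rtt_ms in (1, 10, 50, 100):
    kbps = stop_and_wait_bps(1500, rtt_ms / 1000) / 1000
    print(f"RTT {rtt_ms:>3} ms: at most {kbps:,.0f} kbit/s")
```

At a 50 ms round trip the ceiling is about 240 kbit/s, a tiny fraction of any modern enterprise link.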
TCP handles this problem with transmission 'windows' - a collection of packets sent together with the expectation that they will all arrive without loss. The size of the transmission window adapts to the success of previous windows. If a packet is lost in a window, all packets after the lost packet are retransmitted, and the window size is reduced by roughly half. When windows are successfully received, the window size increases slowly at first, then more rapidly with continued error-free transmission.
If packets are regularly lost, the window will never grow to the size required to achieve full link utilization. The mismatch between port (media) speed and the CIR of a link makes this issue ubiquitous, and it means usable throughput is reduced by up to 75% in most cases.
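The effect can be pictured with a deliberately simplified model. The sketch below is not real TCP: it assumes the window grows by one packet per round trip and is halved after a loss (a textbook additive-increase, multiplicative-decrease rule), and it assumes a loss occurs whenever the window exceeds a fixed number of packets the edge regulator will tolerate in one burst. The function names, the 100-packet threshold, and the link figures are hypothetical, chosen for illustration rather than taken from measurements.

```python
def required_window_packets(cir_bps, rtt_s, packet_bytes):
    """Packets that must be in flight per round trip to fill the CIR."""
    return cir_bps * rtt_s / (packet_bytes * 8)


def simulate_window(rounds, loss_threshold, start=1):
    """Trace the window size per round trip under a simplified AIMD rule.

    Assumption: a window larger than loss_threshold packets loses a packet
    at the regulator and is halved; otherwise it grows by one packet.
    """
    window = start
    trace = []
    for _ in range(rounds):
        trace.append(window)
        if window > loss_threshold:
            window = max(1, window // 2)
        else:
            window += 1
    return trace


def utilization(trace, needed):
    """Average fraction of the required window actually achieved.

    Optimistic: rounds that suffered a loss are counted as fully delivered.
    """
    delivered = [min(w, needed) for w in trace]
    return sum(delivered) / (len(trace) * needed)


# Hypothetical link: 100 Mbit/s CIR, 40 ms round trip, 1,500-byte packets.
needed = required_window_packets(100e6, 0.040, 1500)
trace = simulate_window(rounds=2000, loss_threshold=100)
print(f"window needed to fill the link: {needed:.0f} packets")
print(f"largest window reached:         {max(trace)} packets")
print(f"average utilization:            {utilization(trace, needed):.0%}")
```

With these illustrative numbers the window never gets past about 100 packets when more than 300 are needed, and average utilization settles between 20% and 25% of the CIR.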
Bring Your Own Optimization
WAN-Op methods are largely ineffective in these applications. Techniques that rely on compression and caching don't work without a far-end appliance to 'unwrap' the traffic - and in this case the far end is a data center owned by someone else.