
ExtraHop mines the network to glean operations intelligence

John Dix | Oct. 7, 2013
Rothstein, founder of ExtraHop, updates us on the company and what he has learned about virtual packet loss.

And in a virtual environment, you can't measure performance by looking at resource utilization -- how much CPU it takes or how much memory is required. Resource utilization is not the same thing as performance; it's not the same thing as response time. In fact, in the virtual environment, we derive greater efficiency and cost savings by not having as much headroom and by utilizing CPU and memory resources more efficiently. You actually want the CPU of your physical host to be highly utilized, but you don't want it to be under-provisioned. And that's the balance.
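To make that distinction concrete, here is a minimal sketch -- not ExtraHop's product, and the endpoint URL is hypothetical -- showing that response time is measured per request while CPU utilization is a host-level resource figure; the two answer different questions:

```python
# Minimal sketch: response time vs. resource utilization as separate metrics.
# The URL below is a hypothetical internal service endpoint.
import time
import urllib.request

import psutil  # third-party; pip install psutil

URL = "http://app.example.internal/health"  # hypothetical endpoint

def measure_response_time(url: str) -> float:
    """Time a single request in seconds -- this is what users actually experience."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=5) as resp:
        resp.read()
    return time.perf_counter() - start

def measure_cpu_utilization(interval: float = 1.0) -> float:
    """Sample host CPU utilization over the interval -- a resource metric."""
    return psutil.cpu_percent(interval=interval)

if __name__ == "__main__":
    rt = measure_response_time(URL)
    cpu = measure_cpu_utilization()
    # A host can run at 85% CPU and still respond quickly, or idle at 20%
    # and respond slowly; high utilization alone is not a performance problem.
    print(f"response time: {rt * 1000:.1f} ms, CPU utilization: {cpu:.0f}%")
```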

A great example of additional complexity in these environments is something we call virtual packet loss. Hypervisors are basically schedulers. They have to share resources across multiple guest machines, and packets can get delayed, sometimes delayed enough that they're considered lost by the underlying network stack. Now, TCP is very resilient to loss. If loss occurs on the network, TCP will retransmit, so you might see extra packets on the network, and that might affect your throughput, but it doesn't necessarily affect your performance.

However, a sufficient amount of loss can actually be devastating to performance. It can cause 1-2 second stalls in the application flows. What's particularly insidious about virtual packet loss is that when hypervisor scheduling delays packets to the point where they're considered lost, and that loss causes poor performance, it's really hard to track down. If you query your switches and routers using traditional means and ask, "How many frames did you drop?" the answer is none. No frames were dropped; they were just delayed. So a system like ours that's analyzing the wire data in real time can detect when this occurs. It's essentially a provisioning problem, where certain physical hosts were under-provisioned for the load that was on them. But it's an example of the complexity and challenges that exist in these virtual environments that we simply don't see in physical environments.
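The signature of such a stall is visible on the wire: the same TCP segment is retransmitted after a long delay, even though no switch or router reports a dropped frame. Here is a minimal sketch of that idea -- not ExtraHop's actual pipeline -- assuming the scapy library and a hypothetical trace file named capture.pcap:

```python
# Minimal sketch: scan a packet capture for TCP segments retransmitted after a
# long gap, the signature of the 1-2 second stalls described above.
from scapy.all import rdpcap, IP, TCP  # pip install scapy

STALL_THRESHOLD = 1.0  # seconds; a retransmission this late suggests an RTO stall

def find_stalls(pcap_path: str):
    first_seen = {}  # (src, dst, sport, dport, seq) -> timestamp of first send
    stalls = []
    for pkt in rdpcap(pcap_path):
        if not (IP in pkt and TCP in pkt) or len(pkt[TCP].payload) == 0:
            continue  # ignore non-TCP packets and bare ACKs
        key = (pkt[IP].src, pkt[IP].dst, pkt[TCP].sport, pkt[TCP].dport, pkt[TCP].seq)
        ts = float(pkt.time)
        if key in first_seen:
            gap = ts - first_seen[key]
            if gap >= STALL_THRESHOLD:
                # Same segment seen again after a long delay: a retransmission
                # timeout fired even though no device reported a drop.
                stalls.append((key, gap))
        else:
            first_seen[key] = ts
    return stalls

if __name__ == "__main__":
    for (src, dst, sport, dport, seq), gap in find_stalls("capture.pcap"):
        print(f"{src}:{sport} -> {dst}:{dport} seq={seq} retransmitted after {gap:.2f}s")
```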

Another similar example, but perhaps higher level, is around things like auto-discovery. I don't really want to get into a debate about whether or not configuration management databases should exist anymore or how some organizations use them and some don't, but I think we can all say that virtual environments are much more dynamic. VMs spin up and they spin down. IP addresses are reused. It's a lot more difficult to figure out exactly what's running, who's using it, what the dependencies are, where stuff is located.  

And that means the level of auto-discovery required in our virtual environments is far beyond the more static asset management that we could get away with in physical environments. Our customers have used our auto-discovery capabilities for a variety of things -- to find dependencies they inherited in systems that weren't well understood, and to manage things like data center consolidations, where they need to understand application-level dependencies in order to migrate an application to a new data center. Just yesterday a customer told me about a troubleshooting scenario that I hadn't heard before. A system that was mislabeled as a test system went down, and they didn't know what it was doing. So they used the ExtraHop system to go back in time a bit and ask, "OK, what was this system doing and who was talking to it?" It sounds bad, but in large, complex environments things like that can occur.
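At its core, that "who was talking to it?" question is answered by a reverse dependency map built from flows observed on the wire. A minimal sketch of the idea, using hypothetical flow records rather than the ExtraHop implementation:

```python
# Minimal sketch: build a dependency map from observed flow records
# (client, server, server port, timestamp). All addresses below are hypothetical.
from collections import defaultdict

def build_dependency_map(flows):
    """Map each server to the set of (client, port) pairs seen using it."""
    talkers = defaultdict(set)
    for client, server, port, _ts in flows:
        talkers[server].add((client, port))
    return talkers

# Hypothetical flows recorded for the mislabeled "test" system 10.0.8.42.
flows = [
    ("10.0.1.5",  "10.0.8.42", 1433, "2013-10-01T02:14:00Z"),  # database traffic
    ("10.0.1.5",  "10.0.8.42", 1433, "2013-10-01T02:15:10Z"),
    ("10.0.3.17", "10.0.8.42",  445, "2013-10-01T03:02:44Z"),  # file shares
]

deps = build_dependency_map(flows)
for client, port in sorted(deps["10.0.8.42"]):
    print(f"{client} was using port {port} on 10.0.8.42")
```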

 
