The QPI connections between sockets is a dual-channel 12.8Gbps channel for a total performance of 25.6Gbps
The PCIe express bandwidth of a socket is 40x 1Gbps per lane or 40 Gbps of PCIe bandwidth to the socket.
Problems quickly arise when PCIe bandwidth exceeds 25.6Gbps and the process requesting access to the PCIe bus is not on the socket with the bus where the access is being requested. Some of the workarounds attempted would lock processes on sockets with the PCIe bus that needs to be read or written. But it did not work for all applications. For example, those with data coming in and going out of multiple locations such as a striped file system are affected because you cannot break the request and move each request to each PCIe bus.
The real-world performance for general purpose applications running on a four-socket system is likely an estimated 90 percent of the QPI bandwidth between sockets (or 23Gbps) unless the data goes out on the socket with the PCIe bus. Every fourth I/O, if they are equality distributed, will run at 40Gbps, so the average performance would be (3x23Gbps +40Gbps)/4 or an average performance of about 27.25Gbps per socket for a quad-socket system.
This is, of course, the average based on equal distribution of the processes and I/O to the PCIe bus. A process that has PCIe processor affinity will significantly improve that average, but it is often difficult to architect and meet the requirements of putting every task on a PCIe bus and ensuring that the process runs on the CPU with that bus. The probability of this limitation is higher with a quad-socket system than with a dual-socket system.
The diagram below shows an example of a dual-socket system that, though having the same issues, reduces the potential of hitting that architectural limitation.
My estimate for performance for a dual-socket system is (23Gbps +40Gbps) or average socket performance of 31.5Gbps. On a dual-socket system it is much easier to architect the system so that you can put the right I/O on the right CPU and achieve near-peak performance.
CPU Conclusions Are Counter-Intuitive
New Intel systems have far more I/O bandwidth than previous systems and they have more than anything available from AMD. ARM is not currently competitive if you need to move lots of data in and out of the system.
The current Intel line quad-socket systems will average about 27.25Gbps unless significant work is done to architect the system to connect with processors and PCIe buses. The IOPS performance of the system will, of course, be higher as IOPS is not impacted by QPI bandwidth limitation.
The dual-socket systems are easier to get higher performance, and the average system performance is over 4.25Gbps. So my conclusion is you are better off using dual-socket systems for high I/O bandwidth requirements versus a quad socket. This, of course, is clearly counterintuitive, but is the best strategy given the current Intel architecture.
Sign up for CIO Asia eNewsletters.