Right now, the market for GPUs for use in machine learning is essentially a market of one: Nvidia.
AMD, the only other discrete GPU vendor of consequence, holds around 30 percent of the market for total GPU sales compared to Nvidia’s 70 percent. For machine-learning work, though, Nvidia’s lead is near-total. That is not just because the major clouds with GPU support are overwhelmingly Nvidia-powered, but because the GPU middleware used in machine learning is by and large Nvidia’s own CUDA.
AMD has long had plans to fight back. It’s been prepping hardware that can compete with Nvidia on performance and price, but it’s also ginning up a platform for vendor-neutral GPU programming resources: a way for developers to freely choose AMD when putting together a GPU-powered solution without worrying about software support.
AMD recently announced its next steps toward those goals. First is a new GPU product, the Radeon Vega, based on a new, though previously unveiled, GPU architecture. Second is a revised release of ROCm, its open source software platform, a layer that allows machine-learning frameworks and other applications to leverage multiple GPUs.
Both pieces, the hardware and the software, matter equally. Both need to be in place for AMD to fight back.
AMD’s new star GPU performer: Vega
AMD has long focused on delivering the biggest bang for the buck, whether by way of CPUs or GPUs (or long-rumored combinations of the two). Vega, the new GPU line, is not simply meant to be a more cost-conscious alternative to the likes of Nvidia’s Pascal series. It’s meant to beat Pascal outright.
Preliminary benchmarks released by AMD, as dissected by Hassan Mujtaba at WCCFTech, show a Radeon Vega Frontier Edition (a professional-grade edition of the GPU) beating the Nvidia Tesla P100 on the DeepBench benchmark by a factor of between 1.38 and 1.51, depending on which version of Nvidia’s drivers was in use.
Benchmarks are always worth taking with a jumbo-sized grain of salt, but even that much of an improvement is still impressive. What matters is at what price AMD can deliver that kind of improvement. A Tesla P100 retails for approximately $13,000, and no list price has been set yet for the Vega Frontier. Still, even offering the Vega at the same price as the competition is tempting, and falls in line with AMD’s general business approach.
AMD’s answer to CUDA: ROCm-roll
What matters even more for AMD to get a leg up, though, is not beating Nvidia on price, but ensuring its hardware is supported at least as well as Nvidia’s for common machine-learning applications.