The progress of high-performance computing can seem dizzying to outsiders. Yet, some argue that the rate of progress has been slipping in recent years. As supply chains grow more complex, primary building materials become more difficult to source, the R&D process lengthens, and corporate interests lead the way, the development of general-purpose technology has faltered. However, two recent studies from researchers at Google and Rice University signal that the plateau in progress may just be a bottleneck to overcome.
Could a future defined by the coevolution of hardware and software bring general-purpose technology back into focus and once again quicken the pace of progress?
Gordon Moore’s 1965 forecast predicted that the number of transistors per chip would double about every two years as transistors became smaller, exponentially boosting computational power with reliable regularity. Since then, Moore’s prediction has become the self-fulfilling prophecy known as Moore’s Law. But we are nearing (or, some argue, have already arrived at) the predicted slow-down, where maintaining this rate of progress has become prohibitively costly.
Moore’s prediction was based on economics. His theory of technological progress was grounded in a strikingly simple economic relationship: the more transistors you add to a chip, the cheaper each one becomes. Moore’s Law underpins the development of many technologies that have come to shape our world — from smartphones to artificial intelligence (AI). Today, however, designing and building new chips is becoming ever more expensive and challenging.
As the MIT Tech Review reports, the research effort that goes into upholding Moore’s Law has risen by a factor of 18 since 1971, while the cost of a “fab” (the facility in which chips are fabricated) is rising at around 13% a year and is expected to reach $16 billion or more by 2022. As a result, the number of companies building the next generation of chips has fallen from twenty-five in 2002, to eight in 2010, down to just three today.
Google is one of the select few companies with enough resources to make chips in-house. The company is focused on building the next generation of AI hardware. A paper published by Google researchers in Nature last month outlines the development of a reinforcement learning (RL) technique for automating part of the chip-building process, known as floorplanning, for their tensor processing unit (TPU) chip.
Chip floorplanning has traditionally been a highly manual and time-consuming process. By applying their RL algorithm, the Google team has demonstrated that they can produce layouts for chips in less than 6 hours — a job that takes human teams several months to complete. The AI’s results have been deemed comparable or superior to those produced by humans in terms of power consumption, performance, and chip density.
The RL technique used by the Google team was inspired by the approach used to train AlphaGo — the first ever computer program to beat a top-level human player at the complex board game of Go back in 2016. Google’s RL technique learns from experience to become both better and faster at chip placement. The algorithm essentially approaches a blank chip canvas as a jigsaw puzzle with millions of pieces and many potential solutions. It optimizes the specific parameters deemed important by engineers (such as power consumption, timing, area, and wirelength), while placing the various components and connections between them accurately.
Google’s RL technique treats chip floorplanning as a complex optimization problem. Initially, the algorithm produces largely random placements. Each candidate layout is then scored by a separate evaluation algorithm for accuracy and efficiency, and those scores are fed back into the original RL algorithm. This process enables the AI to learn from its past attempts in order to devise more effective strategies for chip placement.
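The propose-score-feedback loop described above can be illustrated with a toy policy-gradient (REINFORCE) sketch. This is a hedged illustration of the general idea, not Google's actual method: the grid size, nets, reward shaping, and learning rate are all invented for the example, and real floorplanning also enforces constraints such as non-overlapping blocks.

```python
import numpy as np

# Toy canvas: place 4 connected blocks on a 4x4 grid so that total
# wirelength (Manhattan distance along each net) becomes small.
rng = np.random.default_rng(0)
GRID, BLOCKS = 4, 4
nets = [(0, 1), (1, 2), (2, 3)]           # pairs of blocks wired together
logits = np.zeros((BLOCKS, GRID * GRID))  # one softmax policy per block

def sample_placement():
    """Sample a grid cell for every block from the current policy."""
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    cells = np.array([rng.choice(GRID * GRID, p=p) for p in probs])
    return cells, probs

def wirelength(cells):
    """Total Manhattan distance over all nets — the quantity to minimize."""
    xy = np.stack([cells // GRID, cells % GRID], axis=1)
    return int(sum(abs(xy[a] - xy[b]).sum() for a, b in nets))

lr, baseline = 0.5, 0.0
for step in range(300):
    cells, probs = sample_placement()
    reward = -wirelength(cells)            # shorter wiring => higher reward
    advantage = reward - baseline          # compare against a running average
    baseline += 0.1 * (reward - baseline)
    # REINFORCE: nudge the policy toward placements that beat the baseline.
    for b in range(BLOCKS):
        grad = -probs[b]
        grad[cells[b]] += 1.0
        logits[b] += lr * advantage * grad
```

The separate "evaluation algorithm" from the article corresponds here to the `wirelength` scorer; in the real system the evaluator judges power, timing, and area as well.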
Google believes their RL approach could be applied to other parts of the chip design process, reducing the time it takes to design a chip from years to days.
The technique could even be used for other impactful placement problems beyond chip design, such as city planning or vaccine distribution. As such, the implications of this research could extend beyond the company’s own immediate interests.
It’s possible that this research could also unlock opportunities for co-optimization in technology. The approach could be used as a model to accelerate or even completely automate the process of chip design for general-purpose hardware. Faster, cheaper hardware rollout could enable more responsive development in parallel with rapidly evolving algorithm research.
Breaking the AI bottleneck with CPUs
While Google has demonstrated they can use an algorithm to optimize their hardware, Rice University and Intel researchers have focused on optimizing the algorithm for specific hardware. A team of researchers led by Anshumali Shrivastava, associate professor of computer science at Rice’s Brown School of Engineering, has optimized AI training on commodity hardware.
The team demonstrated that AI algorithms run on general-purpose central processing units (CPUs) can train deep neural networks 15 times faster than platforms based on graphics processing units (GPUs).
The team’s aim is to break the bottleneck they identify in AI training: cost. Most deep neural network (DNN) training is done using GPUs adapted from gaming because the algorithms typically operate using a series of matrix multiplication operations — a problem well-suited to GPUs, which can handle parallel operations.
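To see why GPUs fit this workload so well, note that a single fully connected layer of a neural network is essentially one big matrix multiplication, and the many dot products inside it are independent of one another, so they can run in parallel. The sketch below uses numpy with invented toy sizes purely for illustration.

```python
import numpy as np

# One fully connected DNN layer as a single matrix multiplication:
# every output neuron computes a weighted sum over all inputs at once.
# Each of the batch x n_out dot products is independent, which is why
# GPUs, with thousands of parallel lanes, accelerate this so well.
rng = np.random.default_rng(1)
batch, n_in, n_out = 32, 128, 64
x = rng.standard_normal((batch, n_in))   # a batch of input activations
W = rng.standard_normal((n_in, n_out))   # the layer's weight matrix
h = np.maximum(x @ W, 0.0)               # matmul + ReLU: one layer's forward pass
```

A deep network is a stack of such layers, so training amounts to a long series of these multiplications, repeated over many batches.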
The problem is that GPUs are expensive; they cost about three times more than CPUs. Yet, most researchers have maintained tunnel vision when it comes to improving DNN design, focusing almost exclusively on making the matrix multiplication faster.
As Shrivastava explains, “Everyone is looking at specialized hardware and architectures to push matrix multiplication. People are now even talking about having specialized hardware-software stacks for specific kinds of deep learning. Instead of taking an expensive algorithm and throwing the whole world of system optimization at it, I’m saying, ‘Let’s revisit the algorithm.’”
Taking this approach, Shrivastava and his team recast DNN training as a search problem that could be solved with hash tables instead of matrix multiplication, developing their sub-linear deep learning engine (SLIDE) to run on CPUs. By optimizing the algorithm for commodity hardware, the research team has shown it’s possible to flip the script that leads ever-deeper into expensive, specialized architecture and hardware.
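The hash-table idea can be sketched as follows. This is a simplified, hedged illustration of the general "training as search" concept rather than the actual SLIDE engine: it uses random-hyperplane hashing to bucket neurons by their weight vectors, then, for a given input, computes activations only for neurons in the input's bucket. The dimensions, bit count, and single-table design are all invented for the example; the real system uses multiple tables and many further refinements.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
d, n_neurons, n_bits = 64, 1000, 8
W = rng.standard_normal((n_neurons, d))    # one weight vector per neuron
planes = rng.standard_normal((n_bits, d))  # shared random hyperplanes

def simhash(v):
    """Sign pattern of v against the hyperplanes, packed into an int key.
    Similar vectors tend to land on the same side of each plane, so they
    tend to share a bucket."""
    bits = (planes @ v) > 0
    return int(np.packbits(bits)[0])

# Build the table once: bucket key -> list of neuron ids.
table = defaultdict(list)
for i, w in enumerate(W):
    table[simhash(w)].append(i)

def sparse_forward(x):
    """Activate only the neurons whose hash bucket matches the input's —
    a small lookup-plus-matmul instead of a full dense multiplication."""
    ids = table[simhash(x)]
    return ids, W[ids] @ x
```

The payoff is that each input touches only a small fraction of the neurons, which is the kind of sparse, branchy workload CPUs handle well and GPUs do not.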
Back on track
Together, these two pieces of research — Google’s use of AI to optimize hardware and Shrivastava’s use of algorithms to make CPUs more powerful — are complementary. These advancements point to a time when researchers will work on both hardware and algorithms seamlessly in parallel.
This could enable strategic decisions that re-route the development of high-performance computing back onto the general-purpose technology track, and away from the many fast-lane specializations currently forming in industry.
Moore’s Law drove the development of ever-faster and cheaper chips that were universally available. The gradual demise of this prophecy has led to an industry that selectively targets specific problems and business opportunities, where the benefits fall unevenly toward those with the money and resources to set research priorities. These research developments may breathe new life into Moore’s prophecy, and help ensure that the gains of computational progress continue to reach a broader segment of society.
Join the Ken Kennedy Institute’s newsletter to stay up to date on the impactful work our researchers like Shrivastava and his team are doing.
Author: ANGELA WILKINS