Scientists at the University of Texas at Austin are pioneering a completely new microprocessor architecture aimed at tackling one of the significant challenges faced by chip designers. If successful, their effort, funded by the U.S. Department of Defense, will bring forth processors with unprecedented performance and flexibility.
For decades, the density of transistors on a chip has doubled approximately every two years, yielding smaller and more powerful processors. Chip designers have also relied on advanced techniques such as branch prediction and speculative execution to raise processing speeds or to execute multiple instructions simultaneously. However, as chips grow more complex, the heat they generate during operation indicates that this approach has reached its limits, and designers are shifting toward multi-core chips.
According to Professor Doug Burger of the University of Texas, however, software can take advantage of a multi-core design only if programmers write code that divides the work into parts that can be assigned to different cores. For many applications, this is either impossible or very difficult. “The field of computer science is hitting a programming wall, throwing the ball back to the software side, hoping that programmers will develop applications for the new machine system,” he noted.
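The burden Burger describes can be illustrated with a minimal Python sketch. It is not taken from the Trips project: the function names `split` and `parallel_sum` are hypothetical, and worker threads merely stand in for cores. The point is that the programmer, not the hardware, has to decide how the work is cut up and how the partial results are recombined.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split(data: List[int], parts: int) -> List[List[int]]:
    """Divide the work into roughly equal slices, one per worker."""
    size = max(1, (len(data) + parts - 1) // parts)
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum(data: List[int], workers: int = 4) -> int:
    """Sum a list by handing each slice to a separate worker (a stand-in for a core)."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))
    # The programmer must also write the step that recombines the partial results.
    return sum(partials)


total = parallel_sum(list(range(1, 101)))
```

A sum splits cleanly because its pieces are independent. A computation in which each step needs the result of the previous one offers no such obvious split, which is the kind of application the article calls impossible or very difficult to parallelize by hand.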
Professor Burger and his colleagues hope to address these issues with a new type of chip and instruction processing architecture called Trips (Tera-op Reliable Intelligently Adaptive Processing System). “Our goal is to exploit the capability for parallel execution, regardless of whether the programmer incorporates it into the application or not,” he stated.
Trips employs several techniques to achieve this. First, the Trips compiler sends executable code to the hardware in blocks of up to 128 instructions. The processor receives and executes an entire block at once, as if it were a single instruction, which greatly reduces the overhead of instruction handling and scheduling.
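As a rough illustration of block-based delivery, the following Python sketch splits a linear instruction stream into blocks of at most 128 instructions. It is a deliberate simplification: the function `form_blocks` and the instruction names are hypothetical, and a real compiler would presumably choose block boundaries from the program's control flow rather than by simple counting.

```python
from typing import List

MAX_BLOCK_SIZE = 128  # upper bound on instructions per block, as stated in the article


def form_blocks(instructions: List[str], max_size: int = MAX_BLOCK_SIZE) -> List[List[str]]:
    """Split a linear instruction stream into consecutive blocks of at most max_size."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [instructions[i:i + max_size] for i in range(0, len(instructions), max_size)]


# A hypothetical stream of 300 instructions becomes three blocks.
stream = [f"op{i}" for i in range(300)]
blocks = form_blocks(stream)
block_sizes = [len(b) for b in blocks]
```

In this toy case the hardware would handle 3 units instead of 300, which is the intuition behind the reduced handling overhead.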
Second, the instructions within a block are executed in a “data flow” manner, meaning that each instruction executes as soon as its inputs arrive, rather than in a sequence imposed by the compiler or the programmer.
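Data flow execution can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not the Trips hardware: the function `run_dataflow`, the instruction names, and the notion of a "cycle" as one loop iteration are all hypothetical. An instruction fires only when every input it needs has a value, regardless of where it appears in the program text.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Each instruction maps a result name to (function, list of input names).
Instruction = Tuple[Callable[..., int], List[str]]


def run_dataflow(block: Dict[str, Instruction],
                 initial: Dict[str, int]) -> Tuple[Dict[str, int], List[str]]:
    """Fire every instruction as soon as all of its inputs are available."""
    values = dict(initial)
    pending = dict(block)
    fire_order: List[str] = []
    while pending:
        # Instructions whose inputs all have values are ready in this cycle.
        ready = [name for name, (_, inputs) in pending.items()
                 if all(src in values for src in inputs)]
        if not ready:
            raise ValueError("some instructions can never receive their inputs")
        results = {}
        for name in ready:
            fn, inputs = pending[name]
            results[name] = fn(*(values[src] for src in inputs))
        for name in ready:
            del pending[name]
            fire_order.append(name)
        # Results become visible to consumers only after the cycle completes.
        values.update(results)
    return values, fire_order


# "t3" is written first but cannot fire until "t1" and "t2" have produced values.
block = {
    "t3": (operator.add, ["t1", "t2"]),
    "t1": (operator.mul, ["a", "b"]),
    "t2": (operator.add, ["c", "d"]),
}
values, fire_order = run_dataflow(block, {"a": 2, "b": 3, "c": 4, "d": 5})
```

Here `t1` and `t2` are ready in the same cycle and could run in parallel, while `t3` fires one cycle later even though it comes first in the listing.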
Third, within a single block, the Trips compiler can merge two instructions from different lines of code into one instruction if they share the same target and execution method.
Finally, data flow execution is made possible by a technique called “direct target encoding,” in which the result of one instruction is passed directly to the instruction that consumes it, rather than being stored temporarily in a centralized register file as is done today. This significantly reduces the workload on the chip and boosts processing speed.
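The idea of handing a result straight to its consumer can be sketched as follows. This is an illustrative Python model only: the `Instr` class, the `deliver` function, and the two-operand restriction are assumptions made for the example, not details of the Trips instruction set. Each instruction lists the consumers of its result instead of naming a destination register, so no shared register file appears anywhere in the code.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Instr:
    op: Callable[[int, int], int]
    targets: List[Tuple[str, int]]                           # (consumer name, operand slot)
    operands: Dict[int, int] = field(default_factory=dict)   # slots filled by producers


def deliver(block: Dict[str, Instr], name: str, slot: int, value: int,
            outputs: Dict[str, int]) -> None:
    """Place a value in a consumer's operand slot; fire the consumer once both slots are full."""
    instr = block[name]
    instr.operands[slot] = value
    if len(instr.operands) == 2:
        result = instr.op(instr.operands[0], instr.operands[1])
        if not instr.targets:
            outputs[name] = result  # no consumer inside the block: this is a block output
        for target_name, target_slot in instr.targets:
            deliver(block, target_name, target_slot, result, outputs)


# Computes (a * b) + (c + d) with a=2, b=3, c=4, d=5.
block = {
    "mul": Instr(operator.mul, [("sum", 0)]),
    "add": Instr(operator.add, [("sum", 1)]),
    "sum": Instr(operator.add, []),
}
outputs: Dict[str, int] = {}
for name, slot, value in [("mul", 0, 2), ("mul", 1, 3), ("add", 0, 4), ("add", 1, 5)]:
    deliver(block, name, slot, value, outputs)
```

The intermediate values 6 and 9 travel directly from `mul` and `add` into the operand slots of `sum`; only the final result leaves the block.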
Compared with earlier approaches to increasing processing speed, these techniques cause the chip to generate less heat and consume less power.
Engineer Mark McDermott, formerly with Intel and now Vice President of Coherent Logix in Austin, commented: “Just look at chips like the Pentium, and you’ll see many transistors that serve no purpose; they are just there consuming power. The Trips chip is trying to move that complexity into the compiler.” However, McDermott added that the future of the Trips chip remains uncertain, as scientists still need to study many other parameters.
According to the scientists developing Trips, the data flow technique works well with the three types of concurrency found in software (instruction-level, thread-level, and data-level parallelism) and can therefore serve a wide range of applications: scientific, commercial, and embedded software. This is why the U.S. Department of Defense has invested $15.4 million in the project, hoping to develop a chip capable of processing 1 trillion instructions per second.
The University of Texas is set to hand over the Trips chip design to IBM, which will manufacture prototype chips for delivery by February next year. The chips will feature two cores running at 500 MHz and will execute 16 billion instructions per second. The university aims to commercialize the technology and, by 2012, to reach a milestone of 10 GHz chips capable of processing 1 trillion instructions per second.
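The figures above imply how many instructions the chip must complete in each clock cycle. The short Python calculation below works this out, assuming that the quoted instruction rates are peak values for the whole chip and that the 2012 target is read as 1 trillion instructions per second; the variable names are illustrative only.

```python
# Prototype figures quoted in the article.
prototype_cores = 2
prototype_clock_hz = 500_000_000             # 500 MHz
prototype_ips = 16_000_000_000               # 16 billion instructions per second

prototype_ipc_chip = prototype_ips / prototype_clock_hz      # instructions per cycle, whole chip
prototype_ipc_core = prototype_ipc_chip / prototype_cores    # instructions per cycle, per core

# Target figures quoted in the article.
target_clock_hz = 10_000_000_000             # 10 GHz
target_ips = 1_000_000_000_000               # 1 trillion instructions per second

target_ipc_chip = target_ips / target_clock_hz               # instructions per cycle, whole chip

clock_speedup = target_clock_hz / prototype_clock_hz         # gain from clock rate alone
ipc_growth = target_ipc_chip / prototype_ipc_chip            # gain needed from parallelism
```

Under these assumptions the prototype completes 32 instructions per cycle (16 per core), and the target chip would need 100 per cycle. The clock rate accounts for a 20-fold gain, so roughly a further 3-fold gain would have to come from executing more instructions in parallel.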
One of the significant challenges for this project is compatibility with existing software. One solution is to use Trips as a coprocessor running alongside conventional chips.