Reduced instruction set computer

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 15.253 (talk | contribs) at 11:28, 21 July 2002 (more on embedded market). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

RISC is an acronym that stands for Reduced Instruction Set Computer (or Computing).

RISC is a design philosophy that arose from the rapidly increasing speed of the CPU relative to the memory it accessed. This led to a number of techniques to streamline processing within the CPU, while at the same time attempting to reduce the total number of memory accesses.

More modern terminology refers to these designs as "load-store", for reasons that will become clear below.

Pre-RISC design philosophy

Prior to the 1980s, memory tended to be much faster than the CPU. Most chip designs took advantage of this by having little internal working space, or "registers". Registers are always faster to work with than external memory, but they are also "expensive" in terms of complexity.

With normal external memory being almost as fast as the registers, it made sense to include very few registers, and instead dedicate that space on the CPU to circuitry for working with memory. For instance, you would have a command to "add two numbers", but it would come in one version that added the numbers in two registers, another that added a register to the value in a memory location, and another that added the values in two memory locations. Many designs took this further with a whole group of such commands that would take one or both of the numbers from a location offset by the value contained in another location, which was a method of speeding up loops.
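The variants described above can be modeled in a short sketch. This is purely illustrative: the operation names, the register count, and the memory layout are all invented for the example, not taken from any real instruction set.

```python
# Hypothetical sketch of the "add" variants described above, modeled with a
# small register file and a flat external memory. All names are invented.

regs = [0] * 4          # a few registers, as in early register-poor designs
mem = [0] * 16          # external memory, nearly as fast as the registers

def add_rr(rd, rs):
    """ADD reg, reg -- both operands already in registers."""
    regs[rd] += regs[rs]

def add_rm(rd, addr):
    """ADD reg, [mem] -- one operand fetched from a memory location."""
    regs[rd] += mem[addr]

def add_mm(a1, a2):
    """ADD [mem], [mem] -- both operands in memory locations."""
    mem[a1] += mem[a2]

def add_indexed(rd, base, idx_addr):
    """ADD reg, [base + [idx]] -- the operand address is offset by a value
    held in another location, the loop-speedup form mentioned above."""
    regs[rd] += mem[base + mem[idx_addr]]
```

Each variant is one op-code on a CISC-style chip; the more variants the designers wired in, the more cases they could optimize separately.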

This sort of design led to lots of op-codes, where an op-code represents one flavour of one command. To improve performance, designers would build in as many of these op-codes as they had room for on the chip, allowing them to optimize every single possible case separately. The chip itself had very few registers, anywhere from 1 to 16, which made it easy to build - op-codes are cheaper than the wiring needed for registers.

The ultimate expression of this sort of design can be seen at two ends of the power spectrum, the 6502 at one end, and the VAX at the other. The $25 single-chip 6502 effectively had only a single register, and by careful tuning of the memory interface it was still able to outperform designs running at much higher speeds (like the 4MHz Zilog Z80). The VAX was a minicomputer that filled the better part of a room and was notable for the amazing variety of memory access styles it supported, and the fact that every one of them was available for every command.

RISC design philosophy

As the speed of the CPU grew, the performance benefit of running commands on registers, as opposed to memory, grew rapidly. Even in the late 1970s it was apparent that this disparity was going to continue to grow for at least the next decade, by which time the CPU would be tens to hundreds of times faster than the memory it was talking to.

Meanwhile, research into existing systems showed that most compilers didn't actually use the vast majority of the average CPU's advanced features. All of that logic the designers had built into the chip was going to waste.

RISC attempted to take advantage of these issues by designing processors with many more registers, and many fewer commands. Instead of the three "add two numbers" commands with different modes as in our example above, a RISC design had only one, which added the numbers in two registers. Instead of relying on a single complex command that the compiler might not even know about, the compiler would write a series of instructions to load the numbers into registers, add them, and then store the result back out.
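The contrast above can be sketched directly. As before, this is a toy model: the addresses, register count, and operation names are assumptions made for the example.

```python
# Hypothetical sketch: adding two values in memory, done CISC-style in one
# complex instruction versus RISC/load-store style in four simple ones.

mem = {0x10: 5, 0x14: 7, 0x18: 0}
regs = [0] * 32   # RISC designs trade decode logic for many registers

def cisc_add(dst, a, b):
    """One complex instruction that reaches into memory three times."""
    mem[dst] = mem[a] + mem[b]

# Load-store style: only loads and stores ever touch memory.
def load(rd, addr):  regs[rd] = mem[addr]
def add(rd, ra, rb): regs[rd] = regs[ra] + regs[rb]
def store(rs, addr): mem[addr] = regs[rs]

load(1, 0x10)    # r1 <- mem[0x10]
load(2, 0x14)    # r2 <- mem[0x14]
add(3, 1, 2)     # r3 <- r1 + r2, registers only
store(3, 0x18)   # mem[0x18] <- r3
```

The RISC sequence is longer, but once the values sit in registers, further work on them needs no memory accesses at all, which is the point of the philosophy.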

It was the small number of op-codes that resulted in the term Reduced Instruction Set. This is not accurate terminology, as modern RISC designs often have huge op-code sets of their own. The real difference is the philosophy of doing everything in registers, loading and saving data to and from them; this is why the design is more properly referred to as load-store. Over time the older design technique became known as Complex Instruction Set Computer, or CISC, although this was largely to give the older designs a convenient name for comparison.

With a simpler core there was much more room on the CPU to add more registers - typical RISC designs have 32, and some have 128 or more physical registers. If the compiler can write code that avoids saving intermediate values out to main memory and instead moves them from register to register, the entire process can be completed very much faster even though it looks more complex and requires more instructions.

Since the RISC design needs considerably less chip space dedicated to logic and op-codes, given any particular transistor budget (the basic cost factor of a CPU) a RISC chip will tend to have larger caches and better pipelines, leading to much better performance for the same price chip. As the disparity between CPU and memory speed grows, the price/performance of RISC increases.

The long and short of it is that for any given level of general performance, a RISC chip will typically have many fewer transistors dedicated to the core logic. This allows the designers considerable flexibility; they can, for instance:

  • build the chips on older lines, which would otherwise go unused
  • add other functionality like I/O and timers for microcontrollers
  • add vector (SIMD) processors like AltiVec
  • add huge caches
  • do nothing, and offer the chip for low-power or size-limited applications

Meanwhile since the basic design is simpler, development costs are lower. In theory this meant that RISC developers could easily afford to develop chips with similar power to the most advanced CISC designs, but do so for a fraction of the development cost and produce them on older fabs. After a few generations, CISC would simply not be able to keep up.

Features generally found in RISC designs include:

  • uniform instruction encoding (e.g. the op-code is always in the same bit positions in each instruction, which is always one word long), which allows faster decoding
  • a homogeneous register set, allowing any register to be used in any context and simplifying compiler design (although there are almost always separate integer and floating point register files)
  • simple addressing modes, with more complex modes replaced by sequences of simple arithmetic instructions
  • few data types supported in hardware (for example, some CISC machines had instructions for dealing with byte strings; such instructions are unlikely to be found on a RISC machine)
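Uniform encoding is what makes fast decoding possible: every field sits in a fixed bit position, so the decoder never has to compute instruction lengths. The field layout below is invented for illustration (it loosely resembles a MIPS-style register format but is not any real ISA).

```python
# Hypothetical sketch of fixed-width, uniform instruction decoding.
# Assumed layout: 6-bit op-code, then three 5-bit register fields,
# packed into one 32-bit word. The layout is invented for this example.

def encode(opcode, rs, rt, rd):
    """Pack fields into a single 32-bit instruction word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11)

def decode(word):
    """Every field is always in the same bit positions, so decoding is a
    handful of parallel mask-and-shift operations, not a sequential scan
    over variable-length bytes as on many CISC machines."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs":     (word >> 21) & 0x1F,
        "rt":     (word >> 16) & 0x1F,
        "rd":     (word >> 11) & 0x1F,
    }
```

Because each field extraction is independent, hardware can decode all of them at once, one reason uniform encodings suit deep pipelines.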

RISC designs are also more likely to feature a Harvard memory model, where the instruction stream and the data stream are conceptually separated; this means that modifying the memory where code is held might not have any effect on the instructions executed by the processor (because the CPU has separate instruction and data caches), at least until a special synchronisation instruction is issued. On the upside it allows both caches to be accessed separately, which can often improve performance.
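The stale-instruction behaviour described above can be sketched with a toy split cache. The cache structure and the `isync` name are assumptions for the example, not a model of any particular CPU.

```python
# Hypothetical sketch of a Harvard-style split cache: stores go through
# the data side, and the instruction side keeps serving stale code until
# an explicit synchronisation step flushes it.

memory = {0x0: "old_insn"}
icache = {}                      # the separate instruction cache

def fetch(addr):
    """Instruction fetch, served from the instruction cache when possible."""
    if addr not in icache:
        icache[addr] = memory[addr]
    return icache[addr]

def store(addr, value):
    """Data-side store: updates memory but never the instruction cache."""
    memory[addr] = value

def isync():
    """The 'special synchronisation instruction': invalidate the i-cache."""
    icache.clear()

fetch(0x0)              # warms the instruction cache with "old_insn"
store(0x0, "new_insn")  # self-modifying write, invisible to the i-cache
stale = fetch(0x0)      # still serves the old instruction
isync()
fresh = fetch(0x0)      # only now is the new instruction visible
```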

Many of these early RISC designs also shared a not-so-nice feature, the "branch delay slot". A branch delay slot is an instruction space immediately following a jump or branch. The instruction in this space is executed whether or not the branch is taken (in other words the effect of the branch is delayed). This instruction keeps the ALU of the CPU busy for the extra time normally needed to perform a branch. Nowadays the branch delay slot is considered an unfortunate side effect of a particular strategy for implementing some RISC designs, and modern RISC designs generally do away with it (such as PowerPC, as well as more recent versions of SPARC and MIPS).
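A delay slot can be sketched with a tiny interpreter. The two-operation instruction set is invented; for simplicity the branch here is unconditional (always taken), but in a real design the slot instruction executes even when a conditional branch is not taken.

```python
# Hypothetical sketch of a branch delay slot: the instruction AFTER the
# branch always executes before control actually transfers.

def run(program):
    """Execute (op, arg) pairs; return the trace of executed OP arguments."""
    trace, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "BRANCH":
            # Execute the delay-slot instruction first...
            trace.append(program[pc + 1][1])
            # ...and only then take the branch.
            pc = arg
            continue
        trace.append(arg)
        pc += 1
    return trace

prog = [
    ("OP", "a"),
    ("BRANCH", 4),   # jump to index 4, skipping over "c"
    ("OP", "b"),     # delay slot: executed despite the branch above
    ("OP", "c"),     # skipped
    ("OP", "d"),
]
```

Compilers tried to fill the slot with useful work; when they could not, it held a wasted no-op, which is part of why the feature fell out of favour.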

Early RISC

The first system that would today be called RISC was not known as such at the time: it was the CDC 6600 supercomputer, designed in 1964 by Seymour Cray. At the time memory performance was not the specific problem it became in the 1980s, but I/O in general was consuming much of the CPU's time. Cray's solution was to use a simple but very highly tuned CPU (with 74 op-codes, compared with the 8086's 400) and a series of specialized controllers to handle I/O. This may not sound like the system outlined above, but if you consider the I/O processors to be the equivalent of cache, the overall design is largely identical.

The next system to use the RISC philosophy was a project at IBM called the 801 which started in 1975. This led to the IBM 801 CPU family which was used widely inside IBM hardware. The 801 was eventually produced in a single-chip form as the ROMP in 1981, which stood for Research (Office Products Division) Mini Processor. As the name implies this CPU was designed for "mini" tasks, and when IBM released the IBM PC/RT based on the design in 1986, the performance was not acceptable.

Nevertheless the 801 inspired several research projects, including new ones at IBM that would eventually lead to their POWER system. The most public, however, were Berkeley's RISC I and RISC II starting in 1980 under the direction of David Patterson, and Stanford University's MIPS from 1981 led by John Hennessy. Both were run with funding from the DARPA VLSI Program, which led to a huge number of advances in chip design.

Berkeley's research was so successful that the entire design philosophy took on its name, and the design would spawn a number of commercial CPUs, including the Pyramid minicomputers and the SPARC. Stanford's design did even better in a way: a number of team members later left the university to form MIPS Technologies Inc., which built the R2000 in 1986 and went on to produce some of the most widely used CPUs in the world.

Modern RISC

Starting in 1986, all of the RISC research projects started delivering products. John Hennessy left Stanford to commercialize the MIPS design, starting the company known as MIPS Technologies Inc. Their first design was the second-generation MIPS design, known as the R2000. Berkeley's research was not directly commercialized, but the RISC-II design was used by Sun to develop the SPARC, and by Pyramid to develop their line of mid-range multiprocessor machines. IBM learned from the PC/RT failure and would go on to design the RS/6000 based on the new POWER architecture.

Today RISC CPUs (and microcontrollers) represent the vast majority of all CPUs in use. This is surprising in view of the domination of the Intel x86 in the desktop PC market and the commodity server market. The reason is that the simple design of RISC chips allows them to be built at many different scales, from the low-power tiny chips found in the fuel injectors of cars, to the huge CPUs of the largest mainframes and supercomputers.

Embedded CPUs are by far the most common market for processors: consider that a family with one or two PCs may own several dozen devices with embedded processors.

Apart from the PC architecture, CISC has been pushed into tiny niche markets, largely because it can be built on very old fabs that cost effectively nothing to run.

This is largely unknown to the desktop computing public, however. There Intel was able to counteract all of the advantages of RISC by simply applying massive amounts of cash. If it costs ten times as much to double the performance of their CPU, no matter: they have ten times the cash. In fact they have more, and Intel's CPUs continue to make great (and to many, surprising) strides in performance.

RISC designs have led to a number of successful platforms and architectures.
