Intel Pentium(TM) Processor Technical Backgrounder
The Pentium(TM) processor is the newest and most powerful member of Intel'sx86 family of microprocessors. While incorporating new features and improvements made possible by advances in semiconductor technology, the Pentium processor is 100% code compatible with previous members of x86 family, preserving the value of user's software investments.
The Pentium processor incorporates a superscalar architecture, improved floating point unit, separate on-chip code and write-back data caches, 64- bit external data bus, and other features designed to provide a platform for high-performance computing.
[size="5"]The State of Processor Design Art
In recent years, developments in the art of semiconductor design and manufacturing have made it possible to produce increasingly more powerful microprocessors in smaller and smaller packages. Chief among these developments has been the decreasing size of components. Microprocessor designers are now working with CMOS (complementary metal-oxide semiconductor) process technology with features of less than a micron (one- millionth of a meter) in size.
The use of sub-micron components allows designers to fit more of them on a chip. The number of transistors in each member of the x86 family has continued to grow, culminating in the Pentium processor, which is implemented in 0.8 micron CMOS technology and has 3.1 million transistors.
The increase in transistors has made it possible to integrate components that were previously external to the CPU (such as math coprocessors and caches) and place them on-board the chip. Placing components on-board decreases the time required to access them and increases performance dramatically. Another way to decrease the distance between components (and therefore increase the speed of communication between them) is to provide multiple levels of metal for interconnection. Intel's current microprocessor technology utilizes a 3 metal layer, the layout of which requires special computer-aided design tools.
The Pentium processor utilizes the latest in microprocessor design technology to provide performance comparable to that of alternative architectures used in scientific and engineering workstations, while maintaining compatibility with the immense installed base of software now available for the x86 family of microprocessors.
[size="5"]Intel's x86 family
The history of the personal computer industry is intimately associated with the history of Intel's x86 chip family. In 1985, Intel introduced the ground-breaking Intel386TM DX CPU, a 32-bit microprocessor that executed 3 to 4 million instructions per second (MIPS). Available in speeds ranging from 16 MHz up to 33 MHz, the 80386 addresses up to 4 gigabytes of physical memory, and up to 64 terabytes of "virtual memory" (a technology borrowed from mainframe computers that allows systems to work with programs and data larger than their actual physical memory.)
The 80386 provided for true, robust multitasking and the ability to create "virtual 8086" systems, each running securely in its own 1-megabyte address space. Like its predecessors, the i386 DX microprocessor spawned a new generation of personal computers, which had the ability to run 32-bit operating systems and ever-more complicated applications, all the while maintaining compatibility with previous members of the x86 family.
In 1989, Intel shipped the Intel486TM DX microprocessor, which incorporated an enhanced 386-compatible core, math coprocessor, cache memory, and cache controller--a total of 1.2 million transistors--all on a single chip. Operating at an initial speed of 25MHz, the Intel486 DX CPU processed up to 20 MIPS. At its current peak speed of 50 MHz, the Intel486 DX CPU processes up to 41 MIPS. By incorporating RISC principles in its CPU core (specifically, instruction pipelining), the Intel486 DX CPU is able to execute most instructions in a single clock cycle. In spite of these powerful new features, the Intel486 DX microprocessor maintains 100% compatibility with previous members of the x86 family, thereby preserving customers' investment in software.
With the 1992 introduction of the Intel486TM DX2 microprocessor, Intel increased the speed of the 486 family by as much as 70 percent. The DX2 family features a technology called "clock doubling," which allows the processor to operate twice as fast internally as externally. The Intel486 DX2 CPU is nevertheless pin-compatible with the Intel486 DX processor. At its current peak speed of 66 MHz, the Intel486 DX2 CPU executes up to 54 MIPS.
The Pentium processor is the next step in Intel's commitment to provide the highest possible performance at the best price, while maintaining compatibility with previous Intel processors.
[size="5"]First Superscalar x86 Compatible Processor
The heart of the Pentium processor is its superscalar design, built around two instruction pipelines, each capable of performing independently. These pipelines allow the Pentium processor to execute two integer instructions in a single clock cycle, nearly doubling the chip's performance relative to a Intel486 chip at equal frequency.
The Pentium processor's pipelines are similar to the single pipeline of the Intel486 CPU, but they have been optimized to provide increased performance. Like the Intel486 CPU's pipeline, the pipelines in the Pentium processor execute integer instructions in 5 stages: Prefetch, Instruction Decode, Address Generate, Execute, and Write Back. When an instruction passes from Prefetch to Instruction Decode, the pipeline is then free to begin another operation.
In many instances, the Pentium processor can issue two instructions at once, one to each of the pipelines, in a process known as " instruction pairing." In this case, the instructions must both be "simple", and the v- pipe always receives the next sequential instruction after the one issued to the u-pipe. Each pipeline has its own ALU (arithmetic logic unit), address generation circuitry, and interface to the data cache.
While the Intel486 microprocessor incorporated a single 8 Kbyte cache, the Pentium processor features two 8K caches, one for instructions and one for data. These caches act as temporary storage places for instructions and data obtained from slower, main memory; when a system uses data, it will likely use it again, and getting it from an on-chip cache is much faster than getting it from main memory.
The Pentium processor's caches are 2-way set-associative caches, an improvement over simpler, direct-mapped designs. They are organized with 32-byte lines, which allows the cache circuitry to search only 2 32-byte lines rather than the entire cache. The use of 32-byte lines (up from 16- byte lines on the 486 DX) is a good match of the Pentium processor's bus width (64 bits) with burst length (4 chunks.)
When the circuitry needs to store instructions or data in a cache that is already filled, it discards the least recently used information (according to an "LRU" algorithm) and replaces it with the information at hand.
The data cache has two interfaces, one to each of the pipelines, which allows it to provide data for two separate operations in a single clock cycle. When data is removed from the data cache (and only then), it is written into main memory, a technique known as write-back caching. Write- back caching provides better performance than simpler write-through caching, in which the processor writes data to the cache and main memory at the same time (though the Pentium processor can be dynamically configured to support write-through caching).
To ensure that the data in the cache and in main memory are consistent with one another (especially a concern with multiprocessor systems), the data cache implements a cache consistency protocol known as MESI. This protocol defines four states, which are assigned to each line of the cache based on actions performed on that line by a CPU. By obeying the rules of the protocol during memory read/writes, the Pentium processor maintains cache consistency and circumvents problems that might be caused by multiple processors using the same data.
The use of separate caches for instructions and data works in conjunction with other elements of the Pentium processor's design to provide increased performance and faster throughput compared to the Intel486 microprocessor. For example, the first stage of the pipeline is Prefetch, during which instructions are obtained from the instruction cache. With a single cache, conflicts might occur between instruction prefetches and data accesses. Providing separate caches for instructions and data precludes such conflicts and allows both operations to take place simultaneously.
The Pentium processor also increases performance by using a small cache known as the Branch Target Buffer (BTB) to provide dynamic branch prediction. When an instruction leads to a branch, the BTB "remembers" the instruction and the address of the branch taken. The BTB uses this information to predict which way the instruction will branch the next time it is used, thereby saving time that would otherwise be required to retrieve the desired branch target. When the BTB makes a correct prediction, the branch is executed without delay, which enhances performance.
The combination of instruction pairing and dynamic branch prediction can speed operations considerably. For example, a single iteration of the classic Sieve of Eratosthenes benchmark requires 6 clock cycles to execute on the Intel486 microprocessor. The same code executes in only 2 clock cycles on the Pentium processor.
[size="5"]Improved Floating Point Unit
The floating point unit in the Pentium processor has been completely redesigned over that in the Intel486 microprocessor. It incorporates an 8- stage pipeline, which can execute one floating point operation every clock cycle. (In some instances, it can execute two floating point operations per clock--when the second instruction is an Exchange.)
The first four stages of the FPU pipeline are the same as that of the integer pipelines. The final four stages consist of a two-stage Floating Point Execute, rounding and writing of the result to the register file, and Error Reporting. The FPU incorporates new algorithms that increase the speed of common operations (such as ADD, MUL, and LOAD) by a factor of 3 times.
The Pentium processor's new architectural features--its superscalar design, separate instruction and data caches, write-back data caching, branch prediction, and redesigned FPU--will enable the development of new applications software, in addition to improving the performance of current applications in a manner that is completely transparent to the end user.
Internally, the Pentium processor uses a 32-bit bus, like that of the Intel486. However, the external data bus to memory is 64-bits wide, doubling the amount of data that may be transferred in a single bus cycle. The Pentium processor supports several types of bus cycles, including burst mode, which loads large (256-bit) portions of data into the data cache in a single bus cycle. The 64-bit data bus allows the Pentium processor to transfer data to and from memory at rates up to 528 Mbyte/sec, a more than 3-fold increase over the peak transfer rate of the 50 MHz Intel486 (160 Mbyte/sec).
Several instructions (such as MOV and ALU operations) have been hardwired into the Pentium processor, which allows them to operate more quickly. In addition, numerous microcode instructions execute more quickly due to the Pentium processor's dual pipelines. Finally, the Pentium processor features an increased page size, which results in less page swapping in larger applications.
The result of the Pentium processor's new architectural features and enhancements to the 486 architecture is performance improvement ranging from 3 to 5 times (5 to 10 times for floating point intensive applications) when compared to a 33 MHz 486 DX and 2.5 times when compared to the 66 MHz Intel486 DX2 CPU.
Such dramatic performance improvements will meet the demands of computing in a number of areas: advanced multitasking operating systems that support graphical user interfaces, such as Windows NT, OS/2, and new Unix implementations; compute-intensive graphics applications such as 3-D modeling, computer-aided design/engineering (CAD/CAE), large-scale financial analysis, high-throughput client/server; handwriting and voice recognition; network applications; virtual reality; electronic mail that combines many of the above areas; and new applications yet to be developed.
The Pentium processor employs a number of techniques to maintain the integrity of the data with which it is working. Error detection is performed on two levels: via parity checking at the external pins; and internally, on the on-chip memory structures (cache, buffers, and microcode ROM.)
For situations where data integrity is especially crucial, the Pentium processor supports Functional Redundancy Checking (FRC). FRC requires the use of two Pentium chips, one acting as the master and the other as the "checker". The two chips run in tandem, and the checker compares its output with that of the master Pentium processor to assure that errors have not occurred. The use of FRC results in an error detection rate that is greater than 99 percent.
The Pentium processor includes a number of built-in features for testing the reliability of the chip. These include: a Built-In Self Test that tests 70% of the Pentium processor's components upon resetting the chip; an implementation of the IEEE 1149.1 standard (Test Access Port and Boundary Scan Architecture), which provides a standard interface for manufacturers to test the external connections to the Pentium processor; and Probe Mode which provides access to the software visible Pentium processor registers for the purpose of determining the current state of the processor.
The Pentium processor also provides performance monitoring features that will make it easier for developers to take fullest advantage of the Pentium processor's superscalar architecture. System developers will be able to monitor the "hit rates" of the instruction and data caches, as well as the length of time the Pentium processor spends waiting for the external bus, which will help in the optimal design of external memory. The ability to measure address generation interlocks and parallelism will help compiler authors develop the most effective methods for instruction scheduling.
So that system developers may design systems with different features for specific applications, the Pentium processor incorporates a System Management Mode (SMM) similar to that of the Intel386 SL architecture. Power management and security features are two areas for which SMM is useful.
[size="5"]High Performance While Maintaining Compatibility
The Pentium processor is a high-performance microprocessor that incorporates the latest state-of-the-art design principles to meet the needs of newly developing areas of applications software, while nevertheless maintaining complete compatibility with the $50 billion installed base of software currently running on members of the x86 family.
Users will experience dramatic performance improvements while running their current software, and can anticipate new applications that take advantage of the Pentium processor's high-performance features.
Intel486, i486, Intel386 and Pentium are trademarks of Intel Corporation.