Many people think processors differ only in speed. They figure, for example, that a 500 MHz processor from Company A is bound to be faster than a 400 MHz version from Company B. But this is not always the case. While a processor's rated speed is a critical factor in determining how fast it performs calculations (and hence, how fast the computer housing the processor runs), there are other important differences in how various processors work their magic internally. These differences affect how fast various microprocessors perform real-world tasks, such as checking a document for spelling mistakes, recalculating the numbers in a spreadsheet, or removing imperfections from a digital photograph.
For example, many processors can work on several instructions at once by overlapping the steps each one requires; this technique is called pipelining. In addition, some jump ahead to perform extra calculations they think the running program will ask for, before the program actually does. This is called speculative execution, and it is another one of several very complex operations occurring inside today's processors. Different processors also implement these techniques, along with other technology, in different ways, which accounts for many of the differences in overall chip performance, independent of the chip's rated speed (in MHz). For example, future versions of the Pentium III (or perhaps IV) family of processors, which have the codenames Foster and Willamette (using product codenames is a common high-tech industry practice), are expected to include new internal refinements to make the chips faster than today's Pentium IIIs.
Another critical factor in overall chip performance is how efficient various processor designs are. Processors are capable of working continuously, cranking out results as quickly as they are given problems to work on. Ideally, therefore, you want to feed the processor a steady stream of data so it can keep crunching away at maximum speed.
The reality, however, is that for a number of different reasons data is not fed into the CPU on a steady, consistent basis. In fact, most processing occurs in fits and starts. This, in turn, slows a computer down. Figure 2-2 shows the principle in action. One of the many ways processor designers attempt to compensate for this is by using technologies, such as speculative execution, that I mention above. Another, even more important way is by adding special high-speed cache memory to the processor's or computer's overall design. In both cases, the goal is to keep the processor working as much as possible because this translates directly into faster computer performance.
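To put rough numbers on the idea that idle time eats into rated speed, here is a minimal sketch. The function name and the busy percentages are made up purely for illustration; real processors do not stall in such a tidy, uniform way.

```python
def effective_mhz(rated_mhz, busy_fraction):
    """Scale the rated clock speed by the share of cycles spent on useful work.

    A deliberately crude model: it assumes every non-idle cycle is equally
    productive, which real chips do not guarantee.
    """
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    return rated_mhz * busy_fraction


# Hypothetical figures, not measurements of any real chip.
chip_a = effective_mhz(500, 0.60)  # faster clock, but waiting for data 40% of the time
chip_b = effective_mhz(400, 0.85)  # slower clock, kept busier

print(f"Chip A does roughly {chip_a:.0f}MHz worth of useful work")
print(f"Chip B does roughly {chip_b:.0f}MHz worth of useful work")
```

Under these assumed figures the 400MHz chip gets more done than the 500MHz one, which is the reason a higher rated speed does not always win.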
Figure 2-2: Feeding the CPU
In an ideal environment, a computer's processor runs at its maximum rate because it is fed a steady stream of data. In reality, however, various delays that occur throughout a computer system often force a processor to sit idle for brief periods of time, waiting for the next chunk of data to arrive.
Cache memory, in its various forms, plays a particularly important role in a processor's performance. Cache can dramatically improve a processor's efficiency by giving it access to the data it needs more quickly than regular memory would. Not only are cache memory chips (typically Static Random Access Memory, or SRAM) faster than regular memory chips, but they also have a faster connection to the processor, as I explain in just a bit.
The two most common types of cache are referred to as L1, or Level 1, and L2, or Level 2, cache. (It is possible to have Level 3 caches, but they are not very common.) Although, technically speaking, caches are a type of memory, in most cases the L1 and L2 caches are actually built into the processor chip or processor card itself. Thus, they're really more a feature of the processor than of memory. Each level of cache is a separate chunk of memory and is treated independently by the processor. The level numbers refer to how close the cache is physically located to the main number-crunching section of the processor. Figure 2-3 shows how the different caches work together with main memory.
Figure 2-3: Multiple caches
In a system that has multiple caches, the processor checks the L1 cache first, then the L2 cache, and then, finally, the main memory.
Traditionally, L1 cache, which is usually the smaller of the two, has been located on the processor itself and L2 cache has been located outside, but near, the processor. (The physical location on a computer's motherboard does make a difference because when you shuttle data back and forth to different places, the farther away something is, the longer it takes to get there and back. And when it comes to computer processing, nanoseconds, or billionths of a second, really do count.) Recent processor designs have begun to integrate L2 cache onto the processor card or into the CPU chip itself, much like L1 cache. This speeds up access to the larger L2 cache which, in turn, speeds up the computer's performance. Figure 2-4 demonstrates the differences.
Another traditional difference between L1 and L2 caches has been the speed at which the processor can access each one. Because L1 cache is integrated into the core of the microprocessor, it typically runs at the same speed as the CPU; so on a 500MHz processor, the connection speed to L1 cache is usually 500MHz. On older systems, the L2 cache often connected to the processor at the same speed as main memory. This speed is determined by a connecting route, called the computer's system bus, which typically runs at 66, 100, or 133MHz (although faster speeds are possible). For more on system buses, see the "Logical connections" section later in this chapter.
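The lookup order described above (L1 first, then L2, then main memory) can be sketched in a few lines. This is a simplified illustration, not how any particular chip is wired: the addresses, the cache contents, and the relative cost figures are all invented, and a real cache would also copy newly fetched data into itself after a miss.

```python
def find_data(address, l1, l2):
    """Return the first place that holds `address`: L1, then L2, then main memory."""
    if address in l1:
        return "L1 cache"
    if address in l2:
        return "L2 cache"
    return "main memory"  # main memory is assumed to hold everything


# Hypothetical relative costs: the farther from the processor, the slower.
COST = {"L1 cache": 1, "L2 cache": 4, "main memory": 40}

l1 = {0x10, 0x14}              # small and closest to the processor
l2 = {0x10, 0x14, 0x20, 0x24}  # larger, a little farther away

requests = [0x10, 0x20, 0x30, 0x14]

total = 0
for address in requests:
    where = find_data(address, l1, l2)
    total += COST[where]
    print(f"address {address:#04x}: found in {where} (cost {COST[where]})")

no_cache_total = COST["main memory"] * len(requests)
print(f"total cost with caches: {total}, without any cache: {no_cache_total}")
```

With these made-up numbers, only one of the four requests has to go all the way to main memory, so the total cost drops from 160 to 46.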
Figure 2-4: Cache locations
The L2 cache is located in different places on different processors. Some processors have the L2 cache integrated into the main chip itself, others have L2 cache on the circuit board that holds the processor, and still others work with L2 cache that's separate from the processor on the computer's motherboard.
On newer systems, however, where the L2 cache is located on a daughtercard, as with most Pentium IIs and Pentium IIIs, or in the processor itself, as with the Celeron A, K6-3, and some mobile Pentium IIs (sometimes called Pentium II PEs, for performance enhanced) and Pentium IIIs (those designed for notebooks), communication between the processor and the L2 cache occurs much more rapidly. On the Pentium II and III, for example, the processor-to-L2 cache connection is often via a backside bus; it runs faster than the system bus, but at half the speed of the processor. (This is sometimes referred to as a 1:2 ratio.) So with a 500MHz Pentium III processor, the processor-to-L2 cache connection speed is 250MHz. Systems that incorporate L2 cache on the chip itself, on the other hand, feature a 1:1 ratio between the speed of the processor and the speed of the processor-to-L2 cache connection. So with a 500MHz processor, the connection to the L2 cache also runs at 500MHz.
The faster your processor is, the more important it is to have a reasonable amount of L2 cache. In fact, without the proper amount of L2 cache, a processor often sits idle, "wasting cycles" as they say, which means your computer is not running as fast as it can. This lack of L2 cache explains why some of the early Celeron-based computers had relatively poor performance. The original Celeron essentially wasted a great deal of its processing power. The upgraded Celeron A chip and all current Celerons (including the mobile versions), however, incorporate some L2 cache on the processor itself, which dramatically improves the performance of computers that use them.
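The connection speeds discussed above come down to simple arithmetic. The sketch below just restates the three cases from the text (L2 cache on the motherboard, on a daughtercard, and on the chip); the function name and the location labels are my own, and the half-speed figure reflects the typical Pentium II and III arrangement rather than a rule that holds for every processor.

```python
def l2_connection_mhz(processor_mhz, l2_location, system_bus_mhz=100):
    """Estimate the processor-to-L2 connection speed for three common layouts."""
    if l2_location == "motherboard":
        # Older systems: L2 cache is reached over the system bus.
        return system_bus_mhz
    if l2_location == "daughtercard":
        # Backside bus at a 1:2 ratio, i.e. half the processor speed.
        return processor_mhz / 2
    if l2_location == "on-chip":
        # 1:1 ratio: the connection runs at full processor speed.
        return processor_mhz
    raise ValueError(f"unknown L2 location: {l2_location!r}")


for location in ("motherboard", "daughtercard", "on-chip"):
    speed = l2_connection_mhz(500, location)
    print(f"500MHz processor, L2 cache {location}: {speed:g}MHz connection")
```

The default of 100MHz for the system bus is only one of the common values mentioned earlier; pass 66 or 133 to model other systems.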
You can usually tell whether a system uses the Celeron or the Celeron A by its speed: all Celerons faster than 300MHz are Celeron As and all Celerons slower than 300MHz are original Celerons. (The only exception is that all Celerons designed for notebooks have the integrated L2 cache, regardless of speed.) Unfortunately, desktop PC-oriented 300MHz chips were available in both Celeron and Celeron A formats, so the only way to tell 300MHz Celerons apart is to look at the computer's documentation (or use a diagnostic program that lists the processor's type and speed).
Because most processors incorporate L2 cache into their basic design, you often don't have the option to choose more or less cache in the system you'd like to purchase or put together. (Older processors with standalone L2 cache are the one exception.) Instead, you get the amount of L2 cache a particular model of microprocessor includes. Therefore, when you decide which type of processor to get, make sure you find out how much L2 cache it includes.
Remember, though, you can have too much of a good thing; in other words, it's possible to have too much cache (believe it or not . . .). Depending on the type of applications you plan to use, you can reach a point of diminishing returns, where additional cache won't improve your performance very much. Also, cache memory tends to be very expensive, so you need to balance price against performance. Computers that operate as servers, machines that sit at the center point of computer networks, typically need more cache than normal desktop machines because of the type of work they do. For this reason, several versions of the Pentium II and Pentium III Xeon, which is designed for servers, include 2MB (or more) of expensive L2 cache, as opposed to most desktop-oriented Pentium IIs and IIIs, which include only 512KB.
In addition to their rated megahertz (MHz) speed, and the amount and type of cache, processors can be categorized by their internal structure, or architecture, which basically determines the language software programs must use to work with them. The vast majority of IBM-compatible PCs use what are called x86 processors because they are derived from Intel's 8086 processor, a close relative of the chip found in the earliest IBM PCs. The very first IBM PC actually used Intel's 8088 processor, a variant of the 8086 that Intel released after it. The 8086 and 8088 are often confused because the later chip is so similar to the earlier one. Both work with 16 bits of data at a time internally, but the 8088 communicates with the rest of the computer only 8 bits at a time, while the 8086 uses 16 bits (a distinction I explain below); hence the difference in the last digit. The x86 family of processors shares a common set of instructions that software programs use to run on the chip. These instructions are the basic "language" of the processor and determine what types of calculations the processor is capable of doing.
Other processors use different instructions. This is why programs written for the Macintosh, for example, won't work on PC-compatible machines; Mac programs use instructions that are specific to the PowerPC family of processors. If you try to run a Mac program on a computer with an x86 chip, the x86 will think you are speaking to it in a foreign language. (Not to confuse matters, but it is possible to run applications written for one type of chip architecture on another via a technology called software emulation.)
Each generation of x86 chips has added to the original set of core instructions that previous generations could understand, thereby expanding the capabilities of the processor. This explains why some newer applications, written to work with the latest generation of processors, won't run on older computers, even though they use the same type of chip. In other words, some applications require a Pentium or Pentium-class processor to work and won't run, for example, on a 486.
MMX, 3DNow, and Streaming SIMD Extensions
One of the most well-known extensions to these core instructions is MMX technology, which chip leader Intel introduced several years ago in the Pentium with MMX Technology (the "official" name for the Pentium MMX processor). MMX consists of 57 new instructions that processors supporting the technology can understand and execute. The new instructions were primarily designed to improve the performance of multimedia applications, such as computer games and entertainment titles. In 1998, AMD developed a different set of instructions called 3DNow. It was designed not only to improve 2D games, but also to improve 3D gaming performance. The 3DNow instructions are found in AMD's K6-2 and later processors, as well as some processors from other third-party manufacturers, such as Cyrix and Centaur (makers of the IDT WinChip).
Most recently, Intel has added Streaming SIMD (Single Instruction, Multiple Data) Extensions, or SSE, technology to its newest Pentium III processors. Streaming SIMD was designed to improve the performance of 3D games as well as speech recognition and other advanced applications, and it brings an additional 70 new instructions to the core language of processors that support it. SIMD refers to a technique for performing the same operation on multiple pieces of data at the same time. This can be helpful for things like making all the pixels, or individual dots, in a digital photograph a bit darker. The original MMX instructions perform SIMD on integer data and the Streaming SIMD instructions in the Pentium III extend this to floating point data. (Integer refers to whole numbers and floating point refers to numbers with a fractional part, where the decimal point "floats.")
As great as these new additions can be, they also raise important and potentially problematic issues. On the positive side, if software programmers take advantage of these new instructions, they can make certain features in their programs run faster. But if the programmers don't provide an alternative method to perform the same operation using a non-MMX, non-3DNow, or non-Streaming SIMD set of instructions, the program won't work on machines that don't support the special extensions. Given that many of these extensions are only supported in relatively new computers, this limits the program's potential audience. Additionally, now that there are three different sets of extensions, figuring out which chips support which extensions and which applications work most efficiently with which processor can be very confusing. For the record, though, virtually every CPU now supports MMX. Brand new Intel chips support MMX and Streaming SIMD Extensions (SSE). AMD's K6-2, K6-3, and K7, as well as the Cyrix MXi and later IDT/Centaur WinChips, support MMX and 3DNow.
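To make the SIMD idea and the need for an alternative code path a little more concrete, here is a rough sketch in Python of the photo-darkening example. It only imitates what a packed instruction does (one step handles a group of eight pixel values, in the spirit of an MMX-style saturating subtract); Python itself still loops underneath. All function names and the `cpu_features` set are hypothetical, invented for this illustration.

```python
def darken_scalar(pixels, amount):
    """Fallback path: one pixel per step, using only ordinary instructions."""
    result, steps = [], 0
    for value in pixels:
        result.append(max(value - amount, 0))  # clamp at 0 (black)
        steps += 1
    return result, steps


def packed_subtract_saturate(group, amount):
    """Stand-in for ONE packed instruction applied to a whole group of values."""
    return [max(value - amount, 0) for value in group]


def darken_packed(pixels, amount, width=8):
    """SIMD-style path: each step handles `width` pixels at once."""
    result, steps = [], 0
    for start in range(0, len(pixels), width):
        result.extend(packed_subtract_saturate(pixels[start:start + width], amount))
        steps += 1
    return result, steps


def darken(pixels, amount, cpu_features):
    """Use the packed path when the chip supports it, otherwise fall back."""
    if "MMX" in cpu_features:
        return darken_packed(pixels, amount)
    return darken_scalar(pixels, amount)


pixels = [200, 120, 15, 5, 90, 255, 0, 60, 33, 10, 250, 7, 100, 180, 64, 20]

new_result, new_steps = darken(pixels, 20, cpu_features={"MMX"})
old_result, old_steps = darken(pixels, 20, cpu_features=set())

print("same picture either way:", new_result == old_result)
print(f"steps with packed path: {new_steps}, steps with fallback: {old_steps}")
```

Both paths produce the same darkened pixels; the packed path simply gets there in fewer steps. A program that shipped only the packed path would have nothing to run on a chip without the extension.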
Thankfully, many applications that support these technologies (programs that are said to be MMX-, 3DNow-, or SSE-enabled) will run on older machines because their programmers provided an alternative "backdoor" capability. A program that falls back to older instructions on a machine that doesn't support the new extensions won't run as fast as it would on a computer with a chip that supports MMX, 3DNow, or SSE, but at least it'll run. The only time you run into a problem is if the program actually requires MMX, 3DNow, or SSE. If this is the case, the program will only run on computers with processors that support the appropriate extensions. (Whew!)
If you want to read more of Chapter 2 (there's another 30 pages to go), you'll need to buy the book.