TO: Whom it may concern

FROM: Dr. Thomas R. Nicely
      Professor of Mathematics
      Lynchburg College
      Lynchburg, Virginia 24501-3199 USA
      Phone: 804-522-8374
      Fax: 804-522-8499
      Internet: nicely@acavax.lynchburg.edu

RE: Pentium FPU Bug

DATE: 94.12.09.2115 EST

Enumerated below are some questions that have frequently been posed to me. Each question is followed by my response. Many of these questions were submitted by Dr. Denis Delbecq of the Paris based computer periodical "Science et Vie Micro." Feel free to transmit unmodified copies of this document as you wish.

/*************************************************************/

Q1: How can a user check a Pentium machine for the presence of the bug?

/**************************************************************/

Perform Coe's calculation (see Question 5 below). That is, carry out the following division problem:

    4195835.0/3145727.0 = 1.333 820 449 136 241 00 (Correct value)
    4195835.0/3145727.0 = 1.333 739 068 902 037 59 (Flawed Pentium)

The division can be done in BASIC, in a spreadsheet (such as Quattro Pro, Excel, or Microsoft Works), in the Microsoft Windows calculator, or in some other programming language such as Pascal, C, or Fortran. Make sure that the FPU has not been disabled (this usually has to be done intentionally through some specific action).

/*************************************************************/

Q2: Could you summarize how you discovered the problem? Were you doing research calculations or were you studying the problem of accuracy with computers?

/**************************************************************/

RESPONSE: I was pursuing a research project in an area of pure mathematics called computational number theory. Specifically, I have written a code which enumerates the primes, twin primes, prime triplets, and prime quadruplets for all positive integers up to an extremely large limit (currently to about 6e12). The totals are written to a file at intervals of 1e9.
Also computed are the sums of the reciprocals of the twin primes, the triplets, and the quadruplets; each of these can be proved to converge to a limit, but the limit of the sum of the reciprocals of the twin primes is known imprecisely, and the others have not been previously computed. My intent is to publish the results in a research journal at such time as I have carried the computation to an extremely large limit (perhaps 20e12) and confirmed the results.

The code is written so that the computation can be distributed over a large number of independent systems, with the final results synthesized upon completion. The calculation has run for over a year simultaneously on half a dozen systems; most are 486s, but one Pentium was added in March, 1994.

Simultaneously with the calculation of the unknown quantities, a number of checks are maintained by calculating previously published values (such as pi(x), the number of primes <= x). The reciprocal sums are also computed by two different methods: to 19 digits using the FPU, and to 26 (later 53) decimal places using arrays of long integers to effect extended precision (some of the code for this purpose was modified from code kindly made available by Arjen Lenstra of Bellcore).

On 13 June 1994, a number of results were reassembled, and I found that the computed check value for pi(x) disagreed with the published value. This led to a long search for logic errors and sources of reduced precision in my source code (some 3000 lines in all). In the process, I found that the Borland C++ 4.02 compiler was producing erroneous code when compiled in 32-bit mode with certain optimizations (-Op -Om -Og) enabled. For some time I believed this to be the source of my woes.
However, after eliminating this source of error, and rewriting the code to convert certain floating point calculations from double precision to long double precision, I found that I was still encountering an error in the reciprocal sums of the twin primes; the floating point result differed from the extended precision result by an amount orders of magnitude in excess of that expected from normal rounding error accumulation. Through trial and error and finally a binary search, the discrepancy was isolated to the pair of twin primes 824633702441 and 824633702443, which were producing incorrect floating point reciprocals (the extended precision reciprocals were also in error, to a different degree, evidently due to some minor dependency on floating point arithmetic in Lenstra's original integer arithmetic code).

My first conjecture was that the error was again an artifact of the Borland compiler, but even completely disabling optimization failed to eliminate the problem. Tracing the source of the error was further complicated by the fact that on one occasion I tested the code with the Pentium FPU locked out, and the error was still present (this never happened again, and was apparently due to my own failure to properly disable the FPU).

Finally, in desperation, I ran this portion of the calculation on one of the 486s, rather than the Pentium. The error disappeared.

Even at this point, I felt the problem might still be in the PCI bus on the Pentiums, rather than the CPU. After all, a number of Pentium PCI systems had been reported in the trade press as corrupting data due to faulty design of the interface with the PCI bus (this was especially true of Intel motherboards using the Neptune chipset).

The final pieces of the puzzle fell in place during the week of 16-22 October. On 17 October I gained access to a second Pentium, which had a motherboard from a different manufacturer. The error was present in this machine as well.
On 18-19 October, I reproduced the error in a code written in Power Basic, eliminating the C compiler as a cause. I reproduced the error in a Quattro Pro spreadsheet, and also verified that the error disappeared when the FPU was locked out in real-mode DOS (this is difficult to do in Windows code or 32-bit code, which I was using for my main application). On 21 October, I ran the test code on a 486DX2-66 with a PCI bus; when no error appeared, I felt that the PCI bus had been eliminated as a cause. On 22 October, I tested the code on still a third Pentium on display at Staples, a local office supply store; this Packard-Bell machine also produced the error. I was now certain that the error was in the FPU of the Pentium chip.

On or about 19 October, I contacted tech support at Micron, Inc., from whom I purchased my system, but they were unable to provide me with any information regarding the problem. On 24 October, I contacted Intel tech support. After six days, they still had no answer to the problem. On 27 October, I provided a colleague with a copy of the test code; her husband is an engineer in the nuclear reactor group at the local firm of Babcock and Wilcox. Babcock and Wilcox reported to me on 28 October that their new P90 Gateway Pentiums all appeared to have the bug.

In the absence of any meaningful response from Intel tech support, on 30 October I sent e-mail to a number of individuals and organizations who I felt would have access to many other Pentium systems, and asked them to check for the problem. I believe you are aware of events from that point on.

/**************************************************************/

Q3: In which fields of mathematics and numerical models could the FDIV roundoff error reduce significantly confidence in the results? Many people talk about the formulas that demonstrate the problem.

/***************************************************************/

RESPONSE: Clearly, computational number theory is one area affected.
Other areas with the potential for major difficulties include computations in chaos theory (non-linear dynamics), linear programming or finite element analysis (where ill-conditioned matrices may be involved), and areas requiring numerical solution of differential equations by iterative methods (if high precision is required in the extrapolated result, as in orbital dynamics).

Bear in mind, however, that the likelihood is 1000 to 1000000 times greater that any erroneous results obtained on a Pentium are due to software errors, rather than any error in the CPU. For the average user, I do not believe the bug has a significant impact, particularly in comparison to other sources of error. However, for users in mathematics, science, and engineering, we must each be our own judge as to the danger posed by the bug.

In any case, whether you are using the Pentium or some other CPU, mission-critical applications and those which may affect the health and welfare of others should be performed in duplicate, preferably on systems with different CPUs, operating systems, and application software.

/***************************************************************/

Q4: Why did Intel contact you for a collaboration? Don't you think that people might interpret it as a way of buying your silence? Some observers find this quickly signed NDA surprising.

/****************************************************************/

RESPONSE: Intel has indicated that they are interested in having me as a consultant because I am clearly doing a type of mathematical work that they did not previously anticipate the Pentium being used for; consequently they did not conduct their stress and validation tests on the Pentium with this type of application in mind. Apparently they would consider it a useful additional test of their future steppings and chips to see if these processors can correctly perform calculations of these types to the standards of accuracy which I require.
The NDA was signed as part of an application process normally required of individuals or companies which act as independent contractors for Intel. As I have pointed out before, I accept full responsibility for misinterpreting the intent and force of the NDA. After the NDA became an issue, Intel went out of their way to make clear to me that it did not apply to information concerning the discovery that I had made; it was only relevant to confidential information the parties might exchange in any future consulting work (for example, proprietary information about a CPU before it had been released to the public). As I have explained before, my misinterpretation was primarily a consequence of the fact that I once held a Q-clearance for critical nuclear weapon design information at Los Alamos National Laboratory, and the interpretation enforced there is much, much stricter; even information acquired in the open, prior to signing the clearance, is considered "born secret" and subject to nondisclosure. Why, you might ask, would I sign the NDA if it might have the effect (due to my own mistaken interpretation) of silencing me regarding the bug? Perhaps I did not give it enough thought. On the other hand, I had to consider the value to myself, and to my employer (Lynchburg College), of a possible long-term relationship with a corporation which could provide benefits and prestige for both of us. I had already made the bug public; my original announcement and code were available almost worldwide at this point, so I certainly felt I had done my duty to the general public. Clearly Intel knew that no agreement with me could put the genie back in the bottle. I was trying to look at the possibility of an association with Intel in terms of its long-range impact. These are the kinds of decisions that are always easy to criticize if you do not have to make them yourself, without advice, under pressure. 
At this point (9 December), Intel and I have agreed to suspend all negotiations until the furor over the bug settles down. I am not an employee of or consultant for Intel; Intel has paid me no fees, either in the form of cash or equipment (they have provided me with bug-free replacement chips for the two Pentium systems I have been using). The NDA has no effect at this time, since we have in fact not exchanged any proprietary or confidential information. Perhaps after the first of the year, if my health allows, we will again explore the possibility of a relationship (on 19 December, I must enter the hospital for a heart procedure, possibly a coronary bypass; this will be the third such procedure in 13 months).

/***************************************************************/

Q5: What does this FDIV problem signify at the logical level of the FPU? Does it occur with some specific mantissa schemes?

/***************************************************************/

RESPONSE: The difficulty apparently arises from an error in the lookup tables used to implement the hardware division algorithm; the lookup tables are either incorrect or incomplete. The Pentium apparently attempts to use a much more aggressive algorithm for hardware floating point division than did the 486; this is indicated by the fact that it uses only about half as many clock cycles per floating point division. Evidently the 486 is attempting to generate one bit of the quotient per iteration, while the Pentium attempts to generate two bits per iteration.

In every case of which I am aware that produces an error, the first 16 bits of the mantissa (in an 80-bit temporary real) are 0xBFFF. Only a small portion of even these mantissas produces an error, however (roughly 1 in 1e5, or less than one in 1e9 of all possible mantissas). The exponent appears to be irrelevant.
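
The 0xBFFF pattern can be checked by hand for the divisors named in this document. The following is a minimal Python sketch, not part of the original test code; the function name `leading_mantissa_bits` is hypothetical, and it assumes the pattern refers to the divisor (the "denominator mask" mentioned under Question 8). For a positive integer, the normalized 64-bit mantissa of an 80-bit temporary real begins with the integer's own leading binary digits, so taking the top 16 bits of the integer is sufficient.

```python
def leading_mantissa_bits(n, bits=16):
    """Return the top `bits` bits of the normalized binary mantissa of a
    positive integer n (hypothetical helper, for illustration only)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    length = n.bit_length()
    if length >= bits:
        # Drop the low-order bits, keeping the leading `bits` bits.
        return n >> (length - bits)
    # Short integers are padded with zeros on the right, as normalization does.
    return n << (bits - length)


if __name__ == "__main__":
    # Coe's divisor and the twin primes whose reciprocals were in error.
    for divisor in (3145727, 824633702441, 824633702443):
        print(divisor, hex(leading_mantissa_bits(divisor)))
```

All three divisors begin with 0xBFFF. As noted above, this is only the common pattern among known failing cases; most mantissas that begin this way still divide correctly.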
The worst case error posted to date is the one discovered by Tim Coe, an engineer at Vitesse Semiconductors: 4195835.0/3145727.0 is returned correctly to only 14 significant bits (the 5th decimal digit and all beyond are in error):

    4195835.0/3145727.0 = 1.333 820 449 136 241 00 (Correct value)
    4195835.0/3145727.0 = 1.333 739 068 902 037 59 (Flawed Pentium)

Brooke Crothers reports in "Infoworld" (5 December 1994, page 1) that Intel has confirmed the existence of cases where the fourth decimal digit is also in error, but I know of no specific example where the result does not at least round correctly to the fourth significant decimal digit.

Note that the FPU instructions FPREM and FPREM1 (floating point remainders) are also subject to the bug. In fact, it was probably one of these that caused my original 13 June error, rather than the FDIV instruction.

/****************************************************************/

Q6: Do your calculations of the relative frequency of the error agree with those publicized by Intel?

/****************************************************************/

RESPONSE: Yes, for all practical purposes. Intel quotes an error rate of about 1 in 9.5e9 random divisions. I obtain a rate of 1 in 31e9 for random divisions and 1 in 1.26e9 for random reciprocals. The rates may not be directly comparable, since Intel is apparently including single and double precision operations in their count, and I am testing only long double divisions and reciprocals (since this is the natural data type for the FPU stack, and since it is the relevant data type in my own research). Note, however, that many authorities consider statistical sampling rates to be unrepresentative of the problem, since the values appearing in a particular application may not constitute a random sample of all possible mantissas.

/****************************************************************/

Q7: Do the replacement Pentium chips you received from Intel appear to eliminate the bug?
/****************************************************************/

RESPONSE: Yes. I have tested the replacement chips with > 1e15 simulated divisions and reciprocals and have observed zero errors. The critical cases, such as my original example and Tim Coe's example, have also been tested individually.

/***************************************************************/

Q8: What about the so-called "workarounds" for the bug?

/***************************************************************/

RESPONSE: The workaround suggested by Cleve Moler of MathWorks consists of replacing each division by a function call. The function call first performs the division directly, then tests the answer for correctness (e.g., by comparing x*(y/x) to y). If the result is in error due to the Pentium bug, the numerator and denominator are each multiplied by 3/4 (which destroys the 0xBFFF denominator mask causing the problem) and the division is repeated. This process is continued in a loop until the result checks correctly. I use a similar workaround in my sample code, but use a multiplier of 3 rather than 3/4, which would appear to be two clocks faster.

Of course, the workaround only works for applications whose code has been rewritten, recompiled, and reshipped since the bug appeared. Previously existing binaries can avoid the bug only by locking out the FPU (e.g., by setting 87=NO and NO87=NO87 in DOS, or by setting the emulation (EM) bit in the machine status word of CR0 otherwise).

The workaround slows the machine down slightly, perhaps 30% (this is application dependent). Locking out the FPU may slow the machine down by a factor of five or ten, depending on the application. A separate workaround is required if the floating-point remainder instructions are used (for example, through fmod or fmodl in C).
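
The checked division described in this response can be sketched as follows. This is an illustrative Python version only; it is neither the MathWorks code nor the sample code referred to above. The names `checked_divide` and `simulated_flawed_divide`, the tolerance of four machine epsilons, and the retry limit are all assumptions made for the sketch. Because a correct FPU never triggers the retry path, a stand-in divider that returns the flawed value for Coe's operands is supplied so that the retry can be exercised.

```python
import sys

COE_X, COE_Y = 4195835.0, 3145727.0
FLAWED_QUOTIENT = 1.33373906890203759  # value reported from a flawed Pentium


def simulated_flawed_divide(x, y):
    """Hypothetical stand-in for a flawed FDIV: wrong only for Coe's operands."""
    if (x, y) == (COE_X, COE_Y):
        return FLAWED_QUOTIENT
    return x / y


def checked_divide(x, y, divide=lambda a, b: a / b, max_rescales=8):
    """Divide x by y, verify the quotient, and retry with rescaled operands.

    The quotient q is accepted when q*y reproduces x to within a few units
    of rounding error (the threshold is an assumption of this sketch).
    Otherwise both operands are multiplied by 3/4, which changes the leading
    bits of the divisor without changing the true quotient.
    """
    tolerance = 4.0 * sys.float_info.epsilon
    for _ in range(max_rescales + 1):
        q = divide(x, y)
        if abs(q * y - x) <= tolerance * abs(x):
            return q
        x *= 0.75
        y *= 0.75
    raise ArithmeticError("quotient failed verification after rescaling")


if __name__ == "__main__":
    print(simulated_flawed_divide(COE_X, COE_Y))                        # flawed value
    print(checked_divide(COE_X, COE_Y, divide=simulated_flawed_divide))  # repaired value
```

Multiplying by 3/4 is exact in binary floating point for operands of this size, so the rescaled division has the same true quotient as the original one. The cap on retries is a design choice of the sketch, so that operands which can never verify (an infinite or NaN result, for instance) raise an error instead of looping indefinitely.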
/***************************************************************/

Q9: Why do you think this particular bug has received an inordinate amount of publicity, making it such a public relations nightmare for Intel?

/***************************************************************/

I believe several factors contributed to this phenomenon.

* Intel's initial failure to publicize the problem, even in a listing of errata to their OEMs and most valued customers, was in retrospect a mistake which alienated these constituencies.

* Intel's subsequent response, once the bug had been detected independently, was considered unsatisfactory by nearly everyone outside the company.

* The Pentium CPU has been the subject of a high-profile advertising campaign by Intel.

* In contrast to most previous errors found in CPUs, this one occurs in an elementary, frequently-used operation which is easy to demonstrate to the non-specialist, even those who have little or no computer training.

* The bug was found late in the life cycle of the chip, after millions of them were already distributed or in production.

* The existence of the Internet, and its current widespread availability, caused the news and the reaction to Intel's response to spread much more rapidly than for previous bugs.

/***************************************************************/

Q10: Can you tell us something of your own background?

/***************************************************************/

I was born 6 February 1943, in Wareham, Massachusetts, but grew up in the coal mining town of Amherstdale, Logan County, West Virginia. My father and most of my male relatives were coal miners; my father died in 1973 due to heart disease caused by black lung disease. I graduated from Man High School in Logan County in 1959; earned a B.S. degree in physics from West Virginia University, Morgantown, West Virginia, in 1963; an M.S. degree in theoretical physics from WVU in 1965; and earned the Ph.D.
in applied mathematics from the School of Engineering, University of Virginia, Charlottesville, Virginia, in August, 1971.

I have spent nearly all of my professional career as a professor of mathematics at Lynchburg College, Lynchburg, Virginia, beginning in 1968. Lynchburg College is a small (full time undergraduate enrollment about 1420), private, non-profit, coeducational liberal arts college, most generally noted for its excellent programs in the fine arts (dramatic arts, art, music) and its success in Division III (non-scholarship) athletics. The College was founded in 1903 by Dr. Josephus Hopwood, and is an ecumenical, non-sectarian institution affiliated with the Christian Church (Disciples of Christ).

I did take a leave of absence in 1985-86 to work as a staff member in X Division (nuclear weapon and nuclear reactor design and analysis) at Los Alamos National Laboratory, Los Alamos, New Mexico, but decided I preferred the academic environment. I also do consulting work for the Avalon Hill Game Company, Baltimore, Maryland, producing the team charts and rules each year for the "Paydirt" tabletop football game originally developed by Sports Illustrated Enterprises, and also the team charts and rules for "Bowlbound," the college football edition of the game.

My wife of 21 years is a practicing HVAC mechanical engineer and consultant, Linda Carol Taylor Nicely, a graduate of the School of Engineering at the University of Tennessee. We have no children, but have the good fortune to enjoy the company of six cats.

Sincerely,

Dr. Thomas R. Nicely