****************************************
* Matrix Multiply Algorithm Results    *
* Results file: mm_2.tbl               *
* Source file: mm.c                    *
* RAM usage: Need at LEAST 10 MBytes   *
* Al Aburto, aburto@nosc.mil           *
* 01 Oct 1997                          *
****************************************

The Matrix Multiply program mm.c is by Mark Smotherman
(mark@cs.clemson.edu). Please contact Mark regarding the mm.c code, or
with questions, comments, and results showing wide variations. I (Al
Aburto, aburto@nosc.mil) will also pass along to Mark any results I
receive.

This table of results is kept at 'ftp.nosc.mil' (128.49.192.51) in
directory 'pub/aburto'. You can access this and other programs and
results via anonymous ftp. I try to keep everything updated regularly.

mm.c is a collection of nine matrix multiply algorithms. Four of those
algorithms were selected for this database. The algorithms and options
are shown below. Compile mm.c as:

   cc -O -DN=500 mm.c -o mm

(or use whatever other compile options you prefer) and then run mm with
the options shown below.

NOTE: You must use '-DN=500'; otherwise the matrix size will be
undefined.

The results are very interesting: they reveal the enormous effect that
cache thrashing can have on run time across different machines,
algorithms, compilers, and compiler options. There are even more
efficient algorithms tuned for specific machines. Toshinori Maeno
(tmaeno@cc.titech.ac.jp) of the Tokyo Institute of Technology has sent
me a few examples for HP, IBM, DEC, and Sun.

The MFLOPS rating (for FADD and FMUL) can be obtained from the results.
For example, for the D. Warner algorithm (mm -w 50), the number of FADD
and FMUL instructions (weighted equally) is 2*N*N*N = 250,000,000 (for
N = 500). Therefore MFLOPS = 2*N*N*N / (1,000,000 * Runtime), where
Runtime is in seconds (see table below). Thus the IBM RS/6000 Model 950
is working at 250,000,000 / (1,000,000 * 3.65) = 68.5 MFLOPS relative to
equally weighted FADD and FMUL instructions with the D.
Warner algorithm with blocking of size 50. With a properly 'tuned'
algorithm this could be improved further.

   mm -p    :option p - matrix multiply using pointers
   mm -v    :option v - normal matrix multiply using temp variable
   mm -i    :option i - matrix multiply with interchanged loops
   mm -w 50 :option w - matrix multiply using D. Warner method of
                        blocking (size 50) and unrolling.
   mm -w 20 :option w - matrix multiply using D. Warner method of
                        blocking (size 20) and unrolling.

             -- Time to do 500 X 500 Matrix Multiply --

    System                      mm -p    mm -v    mm -i mm -w 50 mm -w 20
                                (sec)    (sec)    (sec)    (sec)    (sec) REF
### ------------------------ -------- -------- -------- -------- -------- ---
001 SGI Origin 200, 180 MHz      5.31     1.09     1.08     0.97     1.18  58
002 SGI Origin 200, 180MHz       5.49     5.48     6.45     1.00     1.21  46
003 IBM RS/6000 590 (power2)     2.76     2.75     2.41     1.34     1.26  20
004 SGI Indigo2, R10000/195      8.79     8.52     8.15     0.97     1.23  34
005 SGI Indigo2, R10000/195      8.52     8.61     8.00     1.07     1.32  34
006 HP 9000/J210XC, 120 MHz     11.24    11.30     5.05     1.01     1.47  52
007 Sun Ultra-1, 143 MHz        30.96    31.36     7.23     1.50     1.85  51
008 SGI O2, R10000/175          36.90    36.54    19.19     1.47     2.12  41
009 SGI Onyx, 75.0MHz            7.13     6.72     5.89     2.01     2.18  47
010 SGI O2, R10000/175          36.85    36.60    19.18     1.73     2.29  41
011 Sun Enterprise 4000/10      19.55    19.97     5.54     2.24     2.30  54
012 Sun Enterprise 4000/10      23.33    23.68     9.42     2.43     2.38  54
013 HP 9000/J210                23.17    30.12     8.85     1.86     2.39  40
014 Pentium Pro, 200 MHz        13.79    13.78     7.03     2.74     2.85  55
015 DEC 4000/710 AXP            25.98    25.77    10.40     3.56     2.90  11
016 PowerPC 7300, 200 MHz       40.27    42.38    14.43     3.13     2.96  57
017 HP 9000/735                 28.99    28.89    24.48     2.62     3.24  13
018 Pentium Pro, 180MHz         13.31    13.12    10.57     3.33     3.48  43
019 Sun Ultra-1, 168 MHz        31.86    31.19    11.46     3.63     3.43  50
020 Dell XPS Pro 200n NT        11.75    11.73     9.65     3.46     3.64  37
021 Sun Ultra 1                 30.39    31.49    11.04     4.43     3.92  23
022 IBM RS/6000 25E, 66MHz     145.62   145.58    14.31     4.18     3.96  53
023 Dell XPS Pro 200n DOS       11.81    11.92     9.72     3.68     4.06  37
024 DEC 4000/710 AXP            25.27    26.20    10.49     4.55     4.08  11
025 HP 9000/712                 31.26    31.41    15.83     3.64     4.20  27
026 DEC 3000/400 AXP            40.93    40.53    20.60     5.07     4.22  11
027 HP 9000/712, 100 MHz        31.31    31.72    16.16     3.70     4.22  52
028 DEC 4000/710 AXP            25.31    25.58     9.82     4.88     4.24  11
029 SGI Challenge S             29.47    29.86    15.44     4.26     4.29  44
030 IBM RS/6000 Model 950      105.51   105.55    11.39     3.65     4.34   2
031 IBM RS/6000 Model 950      105.54   105.62    11.40     3.89     4.34   2
032 SGI Indy, 150MHz            33.33    33.76    16.51     4.37     4.37  46
033 Sun Ultra 1                 37.47    37.95    13.32     5.02     4.65  23
034 IBM RS/6000 Model 570      103.30   103.26    11.37     4.89     4.66  10
035 PowerMac 8100/80           134.46   134.28    22.15     4.48     4.80  16
036 Sun HyperSPARC 20/HS21      51.29    51.38    19.35     4.46     5.01  22
037 DEC 4000/610 AXP            37.70    37.89    22.22     5.41     5.06   2
038 DEC 4000/610 AXP            37.85    37.21    37.91     5.43     5.06   2
039 IBM RS/6000 Model 550      105.23   104.78    11.32     5.42     5.34   3
040 SGI Indigo2 R4400 (100)     41.65    41.93    21.00     5.56     5.69   7
041 Pentium P5-166              40.70    40.92    15.54     7.52     5.77  45
042 DEC 3000/400 AXP            40.79    42.03    20.69     6.44     5.82  11
043 Pentium Pro, 200MHz         11.87    11.87     9.84     5.55     5.88  42
044 DEC 3000/400 AXP            40.40    40.40    20.13     6.94     6.10  11
045 Sun HyperSPARC              55.75    55.76    22.06     5.59     6.28  25
046 Pentium P5-133              42.18    41.41    17.57     8.51     6.86  36
047 AMD K6, 200 MHz             28.78    28.83    13.62     6.92     7.25  56
048 Pentium P5-120              48.83    47.95    21.20     9.61     7.85  33
049 IBM RS/6000 Model 950      105.68   105.67    13.30     7.45     8.09   2
050 DEC 4000/710 AXP            28.81     8.30     9.04     6.75     8.24  11
051 Dell XPS Pr200n NooptNT     15.31    16.30    15.84     7.98     8.30  37
052 SGI Indigo2 R4000 (100)     61.84    62.35    26.84     9.58     8.38  11
053 Pentium P5-100              48.45    48.56    21.86    10.65     8.85  31
054 Pentium P5-133              45.32    46.91    18.73    10.55     9.34  35
055 IBM RS/6000 Model 320      230.45   230.79    27.36 --------     9.45   1
056 SGI Crimson R4000 (100)     63.40    63.12    32.22    10.95     9.78   4
057 SGI Indigo R4000 (100)      64.49    64.49    33.62    10.95     9.81   4
058 SGI Indigo2 R4000 (100)     67.65    67.33    35.57    11.26    10.16   4
059 SGI Indigo2 R4000 (100)     65.26    64.95    32.19    11.20    10.22  11
060 SGI Indigo2 R4000 (100)     66.54    66.99    32.19    11.21    10.22  11
061 Pentium P5-120              50.31    52.07    20.81    11.64    10.33  33
062 Sun SPARCstation 10/41      85.00    86.28    36.24    11.30    11.04   3
063 DEC 4000/610 AXP            36.67     7.78     7.81    11.78    11.28   2
064 Pentium P5-133             136.10   136.16    19.33    17.41    11.32  35
065 DEC 3000/400 AXP            40.88    11.54    12.69    11.02    11.48  11
066 SGI Crimson R4000 (100)     73.76    25.73    25.52    10.57    11.56   4
067 Sun SPARCstation 10/41      76.70    79.11    25.28    11.38    11.85   3
068 SGI Indigo R4000 (100)      74.89    27.03    26.93    10.74    11.93   4
069 Pentium P5-100              49.92    52.95    21.64    13.34    11.98  31
070 AMD K5-PR133, 100MHz        50.89    50.93    23.11    10.86    12.08  48
071 Escom Pentium/100 DOS       56.96    57.89    22.25    13.18    12.08  37
072 Pentium P5-100              50.91    53.83    22.08    13.41    12.08  32
073 SGI Indigo2 R4000 (100)     77.54    28.28    28.04    11.18    12.43   4
074 Gateway Pentium P5-90       62.61    61.90    24.66    14.23    12.58  14
075 Pentium P5-75              112.49   114.30    31.20    17.52    12.96  29
076 Gateway Pentium P5-90       65.65 -------- --------    15.07    13.29  17
077 DEC 4000/710 AXP            35.93    33.52    16.16    13.59    13.46  11
078 SGI Indy PC R4000 (100)    112.98   111.92    57.33    22.48    13.94   4
079 Gateway Pentium P5-90       66.13    66.30    27.85    16.20    14.28  18
080 ZEOS Pentium P5-90          96.28    96.34    29.66    17.58    14.55  18
081 Cray YMP                     1.64     1.69     1.17     7.49    14.68   8
082 DATEL Pentium P5-90        107.10   108.21    28.01    18.07    14.72  18
083 Sun SPARCstation 10/41      76.88    81.45    26.21    14.75    15.03   3
084 Sun SPARCstation 10/41      70.15    83.34    26.42    12.77    15.25   3
085 SGI Indigo R4000 (100)      64.23    63.57    34.67    15.77    15.53   2
086 Pentium P5-75              116.03   112.42    30.06    20.32    16.60  30
087 Sun SPARCstation 10/30      99.83    99.73    21.74    17.85    16.92   3
088 Pentium (75 MHz)           110.51   110.51    30.48    20.54    16.97  28
089 Sun SPARCstation 2 (80)     64.33    63.73    73.01    19.17    17.14   3
090 Force 2CE (80 MHz)          67.44    67.60    66.62    19.49    17.47  49
091 Dell XPS Pr200n NoopDOS     38.34    33.28    28.94    18.56    18.29  37
092 Sun SPARCsystem 600-4       83.88    78.04    35.61    17.60    18.40  15
093 Sun SPARCstation 10/41      85.95    85.98    33.71    18.32    18.56   3
094 DEC 3000/400 AXP            52.26    51.50    27.89    19.35    19.17  11
095 SGI Indigo R4000 (100)      65.93    65.25    36.79    19.64    19.30   2
096 SGI Indigo R4000 (100)      65.83    65.66    36.82    19.65    19.36   2
097 AMD5K86-P90 (90MHz)         73.16    73.22    35.37    20.76    19.39  39
098 AMD5K86-P90 (90MHz)         72.12    72.94    34.16    20.93    19.61  39
099 SGI Indy PC R4000 (100)    124.44    51.05    51.88    22.50    23.49   4
100 Sun SPARCstation 2 (80)     64.32    64.81    79.79    23.94    25.26   3
101 Sun SPARCstation 2 (50)     50.46    49.16    63.88    23.87    25.49   3
102 SGI Indigo R3000 (33)      123.43   123.34    73.29    26.32    26.48   4
103 Sun SPARCserver 690 MP     101.05    98.15    62.70    29.55    32.00   3
104 Sharp PC-3060 Cyrix 5x86    86.29    89.75    74.64    36.74    32.18  26
105 MAC PowerPC 8100/80        142.73   144.52    38.47    32.18    33.30  12
106 AMD 5x86-133                67.03    79.12    63.41    38.13    35.49  38
107 486DX4/100                  66.37    70.32    61.38    40.46    36.64  24
108 486DX4/100                  77.94    79.92    67.17    39.38    37.30  31
109 Sun SPARCstation 2 (40)     85.47    85.61    89.22    35.64    37.64   3
110 AMD 486DX4/100              81.72    83.93    73.87    40.48    38.23  31
111 SGI Indigo R3000 (33)      173.38    72.85    73.80    37.28    38.77   4
112 486DX4/100                 112.55   113.15    67.22    47.18    39.22  31
113 Sun SPARCstation 670       108.38   115.45    72.92    43.31    45.51   6
114 Sun SPARCstation 670       113.53   119.85    73.94    43.59    45.89   6
115 DEC DECstation 5000/240    190.72   194.83   133.45    44.37    46.54  11
116 DEC DECstation 5000/240    164.69   176.23   128.70    45.04    47.39  11
117 DEC DECstation 5000/240    193.35   199.32   131.89    45.78    47.72  11
118 Sun SPARCstation 2 (40)     75.57    82.07    89.88    47.71    49.90   3
119 Sun SPARCstation 2 (40)     75.77    81.94    81.14    48.32    50.67   3
120 SGI Personal Iris 25G      249.63   249.38   198.65    56.19    53.80   4
121 SKY Shamrock, i860 (40)  --------   170.82    22.42    49.99    54.31  49
122 IBM RS/6000 25E, 66 MHz    187.09   198.16    61.99    54.82    55.91  53
123 DEC DECstation 5000/240    189.01   226.56   142.17    57.38    59.22  11
124 Escom Pentium/100 Noopt    107.16   125.88    66.84    60.26    62.83  37
125 Sun IPX                    147.93 --------    86.93 --------    63.64   9
126 DEC DECstation 5000/240    162.31   157.51   104.28    71.58    74.64  11
127 DEC DECstation 5000/240    164.02   158.86   105.33    71.82    74.74  11
128 Escom 80486DX2/66 DOS      117.59   117.54   120.45    86.84    82.88  37
129 Sun SPARCstation 1         133.23   132.04   185.51    82.07    87.46   5
130 DG Aviion 5225             307.40   302.50   151.53    95.95    88.10   2
131 DG Aviion 5225             301.78   318.81   186.22    97.09    89.20   2
132 DG Aviion 5225             302.95   300.90   148.13    98.18    89.99   2
133 80486DX/40                 158.45   164.14   147.10    99.51    92.71  20
134 SGI Personal Iris 25G      330.93   204.93   206.91    96.67   103.57   4
135 Vega 486DX/33, ISA         223.49   237.94   189.44   129.46   125.50   3
136 Vega 486DX/33, ISA         187.97   200.33   221.92   134.80   128.37   3
137 Escom 80486DX2/66 Noopt    281.33   348.94   235.41   212.84   205.75  37

---
###
001 SGI Origin 200, Irix 6.4, MIPS R10000, 180MHz,
    SGI Irix C Compiler 7.1.1, cc -DUNIX -O3 -64, 1MB cache, 1GB RAM
002 SGI Origin 200, Irix 6.4, MIPS R10000, 180.0 MHz,
    SGI Irix C Compiler 7.1, cc -O -64 -DUNIX, 1MB Cache, 128MB RAM
    Note: The R10000 is a 64-bit CPU with 64-bit OS and compiler
003 cc -O3 -qarch=pwr2 -DUNIX -DN=500 mm.c -o mm -lbsd
004 IRIX C Compiler, 64-bit, cc -DUNIX -O -64 -r10000
    NOTE: The R10000 is a 64-bit machine with 64-bit OS & compiler.
005 IRIX C Compiler, 32-bit, cc -DUNIX -O
    NOTE: The R10000 is a 64-bit machine with 64-bit OS & compiler.
006 HP 9000/J210XC, HP-UX 10.20, PA7200_2CPU, 120 MHz,
    HP92453-01 A.10.32.10 HP C Compiler,
    cc -DUNIX -Ae +Oall +DAJ210XC +DSJ210XC +Oparallel -Wl,aarchive
007 SunOS 5.5.1, Sun C V4.0,
    cc -DN=500 -DUNIX -O4 -xchip=ultra -xarch=v8plusa
008 IRIX C 6.2, 32-bit, cc -O -32 -DUNIX
    NOTE: The R10000 is a 64-bit machine with 64-bit OS & compiler.
009 SGI Irix C Compiler 6.2, cc -O -r8000 -DUNIX, 4MB Cache, 320MB RAM
    Note: The R8000 is a 64-bit CPU with 64-bit OS and compiler
010 IRIX C 6.2, 32-bit, 1MB cache, cc -O -n32 -DUNIX
    NOTE: The R10000 is a 64-bit machine with 64-bit OS & compiler.
011 Solaris 2.5.1, UltraSPARC, 250 MHz, Sun C 4.0,
    cc -fast -xO5 -xtarget=native -DUNIX -DN=500
012 Solaris 2.5.1, UltraSPARC, 250 MHz, gcc 2.7.2.1,
    gcc -O3 -funroll-loops -DUNIX -DN=500
013 HP 9000/J210, 120 MHz, HP-UX C compiler, cc -O -DUNIX
014 Pentium Pro, 200 MHz, Windows 95, Intel motherboard VS440FX ATX,
    32 MB RAM (60ns fast page), Borland C++ V5,
    bcc32i -DBORLAND_C -DN=500 -O2 -6
015 OSF/1 1.3, OSFCMPLRS130, C89BASE130, KPCBASE150,
    cc -O3 -Olimit 1400 -migrate -DN=500 -DUNIX
016 BeOS, CPU: 604e, 200 MHz, 512KB L2 cache, 64MB RAM,
    Metrowerks C/C++, cc -O full
017 HP-UX 9.01, HP C 9.65, cc -DUNIX +O4 +Oall
018 Pentium Pro, 180 MHz, Linux 2.0 (ELF), gcc 2.7.2,
    gcc -DUNIX -DN=500 -O3 -funroll-loops
019 Sun Ultra-1, 168 MHz, SunOS 5.5.1, gcc 2.7.2.1,
    gcc -DN=500 -DUNIX -O3 -funroll-loops -mv8
020 Pentium Pro 200 MHz, 66MHz external, 256 KB cache, 440FX PCIset,
    32 MB EDO RAM, Windows NT 3.51
    Watcom C/C++ 10.5 Win32NT -otexan -fp5 -5r -zc -dN=500 -dMSC
021 Sun UltraSPARC, Solaris 2.5, 143 MHz, gcc 2.6.0,
    gcc -O2 -msupersparc -DUNIX_Old
022 AIX 3.2.5, PPC601, 66MHz, xlc 1.2.1,
    mm -p    :cc -DUNIX -O3 -qarch=PPC -qtune=601 -qfold
    mm -v    :cc -DUNIX -O3 -qarch=PPC -qtune=601
    mm -i    :cc -DUNIX -O3 -qarch=PPC -qtune=601 -qstrict
    mm -w 50 :cc -DUNIX -O3 -qstrict -qinlglue -Q
    mm -w 20 :cc -DUNIX -O
023 Pentium Pro 200 MHz, 66MHz external, 256 KB cache, 440FX PCIset,
    32 MB EDO RAM, Windows NT 3.51/DOS
    Watcom C/C++ 10.5 Dos4GW -otexan -fp5 -5r -zc -dN=500 -dMSC
024 OSF/1 1.3, OSFCMPLRS130, C89BASE130, KPCBASE150,
    cc -O3 -Olimit 1400 -DN=500 -DUNIX
025 HP-UX 9.03, PA-RISC7100LC, 100 MHz
    cc -DUNIX +Oall +P -Wc,-DA712,-DS712
026 OSF/1 1.3, CMPLRS130,
    cc -O3 -Olimit 1400 -migrate -DN=500 -DUNIX
027 HP-UX 10.20, HP-PA7100LC, HP92453-01 A.10.32.10 HP C Compiler,
    cc -DUNIX -Ae +Oall +DA712 +DS712 -Wl,aarchive
028 OSF/1 1.3, OSFCMPLRS130, C89BASE130, KPCBASE150,
    c89 -O3 -Olimit 1400 -DN=500 -DUNIX
029 Irix 6.2, R4400, 200MHz, Irix C 7.0, 1MB cache, 128MB RAM,
    cc -O -DUNIX
030 AIX 3.2.?, xlc 1.2, c89 -O -Q -qansialias -DUNIX -DN=500,
    Result is average of 5 runs.
031 AIX 3.2.?, xlc 1.2, c89 -O -Q -DUNIX -DN=500,
    Result is average of 5 runs.
032 SGI Indy, Irix 6.2, MIPS R5000, 150.0 MHz,
    SGI IRIX C Compiler 7.0, cc -O -DUNIX, 512KB Cache, 64MB RAM
033 Sun UltraSPARC, Solaris 2.5, 143 MHz, gcc 2.6.0,
    gcc -O2 -msupersparc -DUNIX_Old
034 AIX 3.2.4, cc -DUNIX -O -Q -DN=500
035 xlc 1.02, xlc -O3 -Q
036 Sun HyperSPARC 20/HS21, 125 MHz, Solaris 2.4, gcc 2.6.0,
    gcc -O2 -msupersparc -DUNIX_Old
037 OSF/1 1.3, C89BASE120, c89 -O3 -Olimit 1400 -DUNIX -DN=500,
    Result is average of 5 runs.
038 OSF/1 1.3, C89BASE120, c89 -O4 -Olimit 1400 -DUNIX -DN=500,
    Result is average of 5 runs.
039 AIX 3.2.4, cc -O -DN=500 mm.c -o mm
    delphi.beckman.uiuc.edu
040 R4400, 150 MHz, Irix 4.0.5H, cc 3.10.1,
    cc -DUNIX -DN=500 -ansiposix -O3 -mips2 -Olimit 1400
041 430VX Chipset, 32 MB RAM, 256 KB Pipelined Burst SRAM Cache,
    Borland C++ V5, bcc32i -DBORLAND_C -O2M -5
042 OSF/1 1.3, CMPLRS130, cc -O3 -Olimit 1400 -DN=500 -DUNIX
043 Pentium Pro, 200MHz, Windows 95, 32MB DRAM, using DJGPP,
    gcc 2.7.2, gcc -DUNIX -DN=500 -O3
044 OSF/1 1.3, CMPLRS130, c89 -O3 -Olimit 1400 -DN=500 -DUNIX
045 Sun HyperSPARC, 100 MHz, SunOS 5.4, gcc 2.7.1,
    gcc -DMSC -DN=500 -msupersparc -O3
046 Pentium P5-133, 133MHz, Windows 95, MB-8500TVX motherboard,
    82437VX chipset, 256KB Pipelined Burst SRAM cache,
    64MB EDO 60ns DRAM, Borland C++ V5,
    bcc32i -DN=500 -DBORLAND_C -O2 -5
047 SOYO 5BT5 motherboard, chipset 82430TX, 512KB pipelined burst
    SRAM cache, 32MB EDO DRAM (60 ns). Borland C++ V5 (Intel
    compiler bcc32i). bcc32i -DBORLAND_C -DN=500 -O2 -6
048 Pentium P5-120, 120 MHz, Windows 95, MB-8500TVX motherboard,
    82437VX chipset, 256 KB L2 burst SRAM cache,
    32 MB Fast Page 70 ns DRAM, Borland C++ V5,
    bcc32i -DN=500 -DBORLAND_C -O2 -5
049 AIX 3.2.?, xlc 1.2, c89 -O -Q -qnomaf -DUNIX -DN=500,
    Result is average of 5 runs.
050 OSF/1 1.3, CMPLRS130, C89BASE130, KPCBASE150,
    kcc -O3 -Olimit 1400 -migrate -D_POSIX_SOURCE
    -ckapargs="-dpr=32 -arl=3 -inl -ind=10 -inll=10 -nat -o=5 -r=3
    -so=3 -ur=10" -DN=500 -DUNIX
051 Pentium Pro 200 MHz, 66MHz external, 256 KB cache, 440FX PCIset,
    32 MB EDO RAM, Windows NT 3.51
    Watcom C/C++ 10.5 Win32NT no optimization -dN=500 -dMSC -bw
052 Irix 4.0.5H, MIPS C 3.10.1,
    cc -O4 -ansiposix -mips2 -Olimit=1400 -DN=500 -DUNIX
053 Pentium P5-100, 100MHz, Windows 95, Intel MARL motherboard,
    Chipset 82430HX, 256KB Pipeline Burst SRAM,
    32MB 60ns Fast Page DRAM, Borland C++ V5,
    bcc32i -DN=500 -DBORLAND_C -O2 -5
054 Pentium P5-133, 133MHz, MS DOS 6.22, MB-8500TVX motherboard,
    82437VX chipset, 256KB Pipelined Burst SRAM cache,
    16MB Fast Page DRAM, L2 cache enabled, gcc 2.5.7,
    gcc -DN=500 -DUNIX -O2 -m486 -fomit-frame-pointer
055 cc -O -DN=500 mm.c -o mm
056 R4000 50 MHz CPU, 8 KByte I/D caches, 1 MByte external cache,
    Irix 4.0.5, cc 3.10.1, cc -DUNIX -DN=500 -O2 -mips2
057 R4000 100 MHz CPU, 8 KByte I/D caches, 1 MByte external cache,
    Irix 4.0.5H, cc 3.10.1, cc -DUNIX -DN=500 -O2 -mips2
058 R4000 100 MHz CPU, 8 KByte I/D caches, 1 MByte external cache,
    Irix 4.0.5H, cc 3.10.1, cc -DUNIX -DN=500 -O2 -mips2
059 R4000 100 MHz CPU, Irix 4.0.5H, MIPS C 3.10.1,
    cc -O3 -ansiposix -Olimit=1400 -DN=500 -DUNIX
060 R4000 100 MHz CPU, Irix 4.0.5H, MIPS C 3.10.1,
    cc -O4 -ansiposix -Olimit=1400 -DN=500 -DUNIX
061 MB-8500TVX motherboard, 82437VX chipset,
    256 KB L2 burst SRAM cache, 32 MB Fast Page 70 ns DRAM,
    gcc 2.5.7, gcc -DN=500 -DUNIX -O2 -m486 -fomit-frame-pointer
062 SunOS 4.1.3, gcc 2.2.2,
    gcc -O2 -DN=500 -fexpensive-optimizations
    marlin.nosc.mil
063 OSF/1 1.3, C89BASE120,
    kcc -O4 -Olimit 1400 Lmm7753.u dtime.o,
    Result is average of 5 runs.
    Note that this version of KAP does not do as well as the
    standard compiler with specific algorithms.
064 MB-8500TVX motherboard, 82437VX chipset,
    256KB Pipelined Burst SRAM cache, 16MB Fast Page DRAM,
    L2 cache disabled, gcc 2.5.7,
    gcc -DN=500 -DUNIX -O2 -m486 -fomit-frame-pointer
065 OSF/1 1.3, CMPLRS130,
    kcc -O3 -Olimit 1400 -migrate -D_POSIX_SOURCE
    -ckapargs="-dpr=32 -arl=3 -inl -ind=10 -inll=10 -nat -o=5 -r=3
    -so=3 -ur=10" -DN=500 -DUNIX
066 R4000 50 MHz CPU, 8 KByte I/D caches, 1 MByte external cache,
    Irix 4.0.5, cc 3.10.1,
    cc -DUNIX -DN=500 -O2 -mips2 -sopt,-inline
067 SunOS 4.1.3, Apogee AC2.2, apcc -O5 -DN=500 -dalign -cg92
    venus.nosc.mil
068 R4000 100 MHz CPU, 8 KByte I/D caches, 1 MByte external cache,
    Irix 4.0.5H, cc 3.10.1,
    cc -DUNIX -DN=500 -O2 -mips2 -sopt,-inline
069 Pentium P5-100, 100 MHz, Windows 95, Intel MARL motherboard,
    Chipset 82430HX, 256KB Pipeline Burst SRAM,
    32MB 60ns fast page DRAM, gcc 2.5.7,
    gcc -DN=500 -DUNIX -O2 -m486 -fomit-frame-pointer
070 AMD K5-PR133, 100MHz, MP070 motherboard,
    Intel 430HX PCI chipset, 512KB Pipelined Burst SRAM Cache,
    32MB Fast Page RAM, gcc 2.7.2,
    gcc -DUNIX -DN=500 -O3 -malign-loops=3 -malign-jumps=4
    -fomit-frame-pointer -funroll-loops -static mm.c
071 Pentium 100 MHz, 16 MB RAM, 256 KB cache, Neptune chipset,
    Win95/DOS
    Watcom C/C++ 10.5 Dos4GW -otexan -fp5 -5r -zc -dN=500 -dMSC
072 MB-8500TVX motherboard, 82437VX chipset,
    256 KB L2 burst SRAM cache, 16 MB Fast Page 60 ns DRAM,
    gcc 2.5.7, gcc -DN=500 -DUNIX -O2 -m486 -fomit-frame-pointer
073 R4000 100 MHz CPU, 8 KByte I/D caches, 1 MByte external cache,
    Irix 4.0.5H, cc 3.10.1,
    cc -DUNIX -DN=500 -O2 -mips2 -sopt,-inline
074 Pentium P5-90, 90 MHz, DOS 6.2,
    Watcom C32 V9.5 /oneatx /zp4 /5r
075 Borland C++ V5, bcc32i -DBORLAND_C -DN=500 -O2 -5,
    Intel ZAPPA motherboard, chipset 82437FX,
    256KB asynchronous cache, 32MB Fast Page RAM.
076 Pentium P5-90, 90 MHz, 16 MByte RAM, LINUX 1.1.35, gcc 2.6.0,
    gcc -O2 -fexpensive-optimizations
077 OSF/1 1.3, OSFCMPLRS130, C89BASE130, KPCBASE150,
    cc -O3 -Olimit 1400 -ieee_with_inexact -DN=500 -DUNIX
078 R4000 100 MHz CPU, 8 KByte I/D caches, NO external cache,
    Irix 4.0.5H, cc 3.10.1, cc -DUNIX -DN=500 -O2 -mips2
079 Gateway Pentium P5-90, 90 MHz, 16 MB RAM, 256 KB cache, ISA/PCI,
    MS DOS 6.22, gcc 2.5.4, gcc -DUNIX -DN=500 -O2
080 ZEOS Pentium P5-90, 90 MHz, 16 MB RAM, 256 KB cache, ISA/PCI,
    MS DOS 6.22, gcc 2.5.4, gcc -DUNIX -DN=500 -O2
081 Cray YMP, UNICOS 7.0.6, cc -DFORTRAN_SEC -DN=500,
    NOTE: The program was compiled without optimization, which I
    think means the code was not vectorized. That is, it was
    compiled as scalar. Compiling with -O3 produced error in
    SGEMMX@ ...
082 DATEL Pentium P5-90, 90 MHz, 24 MB RAM, 256 KB cache, EISA/PCI,
    MS DOS 6.22, gcc 2.5.4, gcc -DUNIX -DN=500 -O2
083 SunOS 4.1.3, SC2.0.1, acc -fast -O4 -DN=500 mm.c
    NOTE: fast --> -fsingle -dalign -fnonstd -libmil -cg92
    venus.nosc.mil
084 SunOS 4.1.3, Apogee AC2.2,
    apcc -DUNIX -DN=500 -O5 -cg92 -Xkap,
    NOTE: using the 'kap' preprocessor here.
    venus.nosc.mil
085 Irix 4.0.5F, MIPS C 2.40,
    cc -ansiposix -O4 -mips2 -Olimit 1400 -DN=500 -DUNIX,
    Result is average of 5 runs.
086 SCO UNIX Release 5.0.0a, cc -DUNIX_Old -O3 -Kpentium -dn,
    chipset 82437FX, ZAPPA motherboard, 256KB asynchronous cache,
    32MB Fast Page RAM.
087 SunOS 4.1.3, SC2.0.1, acc -fast -O4 -DN=500 mm.c
    NOTE: fast --> -fsingle -dalign -fnonstd -libmil -cg92
    rigel.nosc.mil
088 gcc 2.5.7, gcc -DUNIX -m486 -O2 -fomit-frame-pointer,
    Intel ZAPPA motherboard, chipset 82437FX,
    256KB asynchronous cache, 32MB Fast Page RAM.
089 SunOS 4.1.2, Weitek 80 MHz CPU replacement for SPARCstation 2
    (40 MHz) Sun C 2.0.1,
    acc -DUNIX -O4 -dalign -fnonstd -libmil -cg89 -DN=500
    ariel.nosc.mil
090 Force 2CE with 80 MHz Weitek CPU, SunOS 4.1.3_U1, gcc 2.7.1-1,
    gcc -DUNIX -DN=500 -O2 mm.c -o mm
091 Pentium Pro 200 MHz, 66MHz external, 256 KB cache, 440FX PCIset,
    32 MB EDO RAM, Windows NT 3.51/DOS
    Watcom C/C++ 10.5 Dos4GW no optimization -dN=500 -dMSC
092 SPARCsystem 600, 4 SuperSPARC CPU's, 50 MHz, SunOS 4.1.3,
    Apogee AC2.3,
    apcc -v -O5 -DN=500 -cg92 -Xkap -DGTODay mm.c -o mm
    NOTE: Results very erratic due to system load conditions and
    the number of CPU's devoted to the computations at any one
    time. As a result all results averaged at least 5 times, but
    up to 11 in some cases.
093 SunOS 4.1.3, 40.0 MHz, 64MBytes RAM, /bin/cc, cc -O4 -DN=500
    marlin.nosc.mil
094 OSF/1 1.3, CMPLRS130,
    cc -O3 -Olimit 1400 -ieee_with_inexact -DN=500 -DUNIX mm.c
    -o mm-ieee
095 Irix 4.0.5F, MIPS C 2.40,
    cc -ansiposix -O4 -Olimit 1400 -DN=500 -DUNIX,
    Result is average of 5 runs.
096 Irix 4.0.5F, MIPS C 2.40,
    cc -ansiposix -O3 -Olimit 1400 -DN=500 -DUNIX,
    Result is average of 5 runs.
097 MB-8500TVC motherboard, 82439HX chipset,
    512KB Pipelined Burst SRAM cache, 32MB EDO 60ns DRAM,
    Borland C++ Version 5 (bcc32i),
    bcc32i -DBORLAND_C -DN=500 -O2 -5
098 MB-8500TVC motherboard, 82439HX chipset,
    512KB Pipelined Burst SRAM cache, 32MB EDO 60ns DRAM,
    gcc 2.5.7, gcc -DN=500 -DUNIX -O2 -m486 -fomit-frame-pointer
099 R4000 100 MHz CPU, 8 KByte I/D caches, NO external cache,
    Irix 4.0.5H, cc 3.10.1,
    cc -DUNIX -DN=500 -O2 -mips2 -sopt,-inline
100 SunOS 4.1.2, Weitek 80 MHz CPU replacement for SPARCstation 2
    (40 MHz) Sun C 2.0.1, cc -DUNIX -fast -O4 -Bstatic -cg92
    fast: -fsingle -dalign -fnonstd -libmil
    ariel.nosc.mil
101 SunOS 4.1.3, 50 MHz, Sun C 2.0.1,
    acc -DUNIX -DN=500 -fast -O4,
    original 40 MHz motherboard replaced with 50 MHz motherboard.
    fast: -fsingle -dalign -fnonstd -libmil
    metis.nosc.mil
102 R3000 33 MHz CPU, 32 KByte I/D caches, NO external cache,
    32 MByte RAM, Irix 4.0.5H, cc 3.10.1, cc -DUNIX -DN=500 -O2
103 SunOS 4.1.2, 40 MHz, Sun C 2.0.1, acc -v -DN=500 -fast -O4
    NOTE: fast --> -fsingle -dalign -fnonstd -libmil -cg89
    sunspot.nosc.mil
104 Cyrix 5x86, 100 MHz, 8MB RAM, gcc 2.6.3, gcc -DGTODay -O2
105 Metrowerks C, Version DR/2. 256KB L2 cache.
    32KB unified L1 cache. PowerPC 601 CPU @ 80 MHz, 80ns RAM,
    40 MHz system bus.
106 AMD 5x86-P75, 133MHz, PCI, gcc 2.7.2,
    gcc -DUNIX -DN=500 -O3, 256KB L2 cache, 16MB DRAM (70 ns)
107 DCA/2 motherboard, 80486DX4, 100 MHz, Linux 1.2.10,
    16MB fast RAM, gcc 2.5.8,
    gcc -O2 -m486 -fomit-frame-pointer -fexpensive-optimizations
108 L2 Cache Enabled.
    486DX4/100, 100MHz, Windows 95, ExpertBoard 8449 motherboard,
    256 KB L2 cache, 16 MB Fast Page DRAM, gcc 2.5.7,
    gcc -DN=500 -DUNIX -O2 -m486 -fomit-frame-pointer
109 SunOS 4.1.3, 40 MHz, gcc 2.5.6, gcc -DUNIX -DN=500 -O2
    octopus.nosc.mil
110 AMD 486DX4/100, 100 MHz, Windows 95,
    ExpertBoard 8449 motherboard, 256 KB L2 cache,
    16 MB Fast Page DRAM, gcc 2.5.7,
    gcc -DN=500 -DUNIX -O2 -m486 -fomit-frame-pointer
111 R3000 33 MHz CPU, 32 KByte I/D caches, NO external cache,
    32 MByte RAM, Irix 4.0.5H, cc 3.10.1,
    cc -DUNIX -DN=500 -O2 -sopt,-inline
112 L2 Cache Disabled.
    486DX4/100, 100MHz, Windows 95, ExpertBoard 8449 motherboard,
    256 KB L2 cache, 16 MB Fast Page DRAM, gcc 2.5.7,
    gcc -DN=500 -DUNIX -O2 -m486 -fomit-frame-pointer
113 SunOS 4.1.3, 2 X 40.0 MHz, /bin/cc,
    cc -DUNIX -DN=500 -O4 mm.c -o mm
114 SunOS 4.1.3, 2 X 40.0 MHz, /bin/cc,
    cc -DUNIX -DN=500 -Bstatic -O4
115 Ultrix 4.2A, R3000, 40 MHz, gcc 2.5.8,
    gcc -O2 -DN=500 -DUNIX
116 Ultrix 4.2A, R3000, 40 MHz, gcc 2.5.8,
    gcc -O2 -funroll-loops -fstrength-reduce -ffast-math
    -mcpu=r3000 -DN=500 -DUNIX
117 Ultrix 4.2A, R3000, 40 MHz, gcc 2.5.8,
    gcc -O2 -mcpu=r3000 -DN=500 -DUNIX
118 SunOS 4.1.3, 40.0 MHz, /bin/cc, cc -O4 -DN=500
    octopus.nosc.mil
119 SunOS 4.1.3, 40.0 MHz, /bin/cc, cc -O4 -Bstatic -DN=500
    octopus.nosc.mil
120 R3000 20 MHz CPU, 64 KByte ICache, 32 KByte DCache,
    NO external cache, 16 MByte RAM, Irix 4.0.5H, cc 3.10.1,
    cc -DUNIX -DN=500 -O2
121 Sky Shamrock, i860, 40MHz,
    SKYvec High C (Metaware) Compiler V2.4,
    hc860 -DUNIX -DN=500 -O3 -mathlib=dp -vec -vec1=2 mm.c -o mm
122 xlc 1.2.1, cc -DUNIX
123 Ultrix 4.2A, R3000, 40 MHz, gcc 2.5.8, gcc -O -DN=500 -DUNIX
124 Pentium 100 MHz, 16 MB RAM, 256 KB cache, Neptune chipset,
    Win95/DOS
    Watcom C/C++ 10.5 Dos4GW no optimization -dN=500 -dMSC
125 Sun IPX, 40 MHz, SunOS 5.1, Sun C 2.0.1,
    cc -O3 -cg87 -dalign -w -c -DUNIX_Old
126 Ultrix 4.2A, R3000, 40 MHz, MIPS C V2.10,
    cc -O4 -Olimit=1400 -DN=500 -DUNIX
127 Ultrix 4.2A, R3000, 40 MHz, MIPS C V2.10,
    cc -O3 -Olimit=1400 -DN=500 -DUNIX
128 80486DX2 66 MHz, 20 MB RAM, 128 KB cache, SIS chipset,
    Windows95/DOS
    Watcom C/C++ 10.5 Dos4GW -otexan -fp5 -5r -zc -dN=500 -dMSC
129 SunOS 4.1.2, 20 MHz, Sun C 2.0.1,
    acc -DUNIX -DN=500 -fast -O4
    belch.nosc.mil
130 DG/UX 5.4.2, gcc 2.2.2,
    gcc -DUNIX -O2 -funroll-loops -mno-check-zero-division
    -muse-div-instruction -moptimize-arg-area -DUNIX -DN=500
131 DG/UX 5.4.2, gcc 2.2.2, gcc -DUNIX -O2 -DN=500,
    Result is average of 5 runs.
132 DG/UX 5.4.2, gcc 2.2.2,
    gcc -DUNIX -O2 -funroll-loops -DUNIX -DN=500,
    Result is average of 5 runs.
133 Linux 1.1.54, gcc 2.5.8,
    gcc -DUNIX -O2 -fomit-frame-pointer, 8MB
134 R3000 20 MHz CPU, 64 KByte ICache, 32 KByte DCache,
    NO external cache, 16 MByte RAM, Irix 4.0.5H, cc 3.10.1,
    cc -DUNIX -DN=500 -O2 -sopt,-inline
135 Intel 80486DX, 33.3 MHz, MS DOS 5.0, gcc 2.2.2 for DOS,
    8 MBytes Extended RAM. Accessed hard drive quite a bit.
    gcc -DUNIX -O2 -DN=500 -m486 -fomit-frame-pointer mm.c -o mm
136 Intel 80486DX, 33.3 MHz, MS DOS 5.0, gcc 2.4.1 for DOS,
    8 MBytes Extended RAM. Accessed hard drive quite a bit.
    gcc -DUNIX -O2 -DN=500 -m486 -fomit-frame-pointer mm.c -o mm
137 80486DX2 66 MHz, 20 MB RAM, 128 KB cache, SIS chipset,
    Windows95/DOS
    Watcom C/C++ 10.5 Dos4GW no optimization -dN=500 -dMSC

---
### REF:
  1 Mark Smotherman, mark@cs.clemson.edu, May 1993
  2 Benjamin Z. Goldsteen, benjamin-goldsteen@uokhsc.edu, 04 Dec 1993
  3 Al Aburto, aburto@marlin.nosc.mil, 17 Dec 1993
  4 Kristen Wedberg, wedberg@mednet.gu.se, 17 Dec 1993
  5 Al Aburto, aburto@belch.nosc.mil, 20 Dec 1993
  6 Al Aburto, aburto@athens.nosc.mil, 06 Jan 1994
  7 Jack Hunt, jack_hunt@jhuapl.edu, 12 Jan 1994
  8 Mario Guerra, mguerra@inforisc.cr, 11 Feb 1994
  9 Mohammad Bahathir Hashim, s8046@cs.shizupka.ac.jp, 19 Feb 1994
 10 Mario Guerra, mguerra@inforisc.cr, 22 Feb 1994
 11 Benjamin Z. Goldsteen, benjamin-goldsteen@uokhsc.edu, 24 Apr 1994
 12 Craig S. Steele, steele@isi.edu, 04 May 1994
 13 Bill Broadley, broadley@neurocog.lrdc.pitt.edu, 08 May 1994
 14 Harlan W Stockman, hwstock@saix531.energylan.sandia.gov, 23 May 1994
 15 Al Aburto, aburto@sunspot.nosc.mil, 29 May 1994
 16 Evan Torrie, torrie@cs.stanford.edu, 31 May 1994
 17 Mario Guerra, mguerra@cariari.ucr.ac.cr, 10 Aug 1994
 18 Al Aburto, aburto@marlin.nosc.mil, 05 Nov 1994
 20 Francis Courteaux, courtox@univ-rennes1.fr, 12 Dec 1994
 21 Michael Meskes, meskes@feivel.informatik.rwth-aachen.de, 23 Mar 1995
 22 Paul Caskey, pcaskey@swcp.com, 16 Oct 1995
 23 Paul Caskey, pcaskey@swcp.com, 07 Nov 1995
 24 Robert Debath, rdebath@cix.compulink.co.uk, 26 Nov 1995
 25 Al Aburto, aburto@nosc.mil, 30 Nov 1995
 26 Zack Smith, zacksmith@mcimail.com, 10 Mar 1996
 27 Kari Seppanen, kse@tell.tte.vtt.fi, 31 May 1996
 28 Manuel Blanca, 101347.3363@compuserve.com, 01 Sep 1996
 29 Manuel Blanca, 101347.3363@compuserve.com, 02 Sep 1996
 30 Manuel Blanca, 101347.3363@compuserve.com, 08 Sep 1996
 31 Manuel Blanca, 101347.3363@compuserve.com, 12 Sep 1996
 32 Manuel Blanca, 101347.3363@compuserve.com, 16 Sep 1996
 33 Manuel Blanca, 101347.3363@compuserve.com, 18 Sep 1996
 34 Paul Caskey, pcaskey@swcp.com, 27 Sep 1996
 35 Manuel Blanca, 101347.3363@compuserve.com, 05 Oct 1996
 36 Manuel Blanca, 101347.3363@compuserve.com, 07 Oct 1996
 37 Roy Longbottom, 101323.2241@compuserve.com, 05 Nov 1996
 38 Al Aburto, aburto@nosc.mil, 10 Nov 1996
 39 Manuel Blanca, 101347.3363@compuserve.com, 18 Nov 1996
 40 Paul Caskey, pcaskey@swcp.com, 04 Dec 1996
 41 Paul Caskey, pcaskey@swcp.com, 08 Jan 1997
 42 Randy Brannan, brannan@nosc.mil, 10 Jan 1997
 43 Jean-Marc Drezet, drezet@math.jussieu.fr, 17 Jan 1997
 44 Paul Caskey, pcaskey@swcp.com, 28 Jan 1997
 45 Manuel Blanca, 101347.3363@compuserve.com, 28 Jan 1997
 46 Paul Caskey, pcaskey@swcp.com, 31 Jan 1997
 47 Paul Caskey, pcaskey@swcp.com, 07 Feb 1997
 48 Manuel Blanca, 101347.3363@compuserve.com, 12 Feb 1997
 49 Richard Myers, myersr@bonehead.nosc.mil, 19 Feb 1997
 50 Al Aburto, aburto@nosc.mil, 04 Mar 1997
 51 Al Aburto, aburto@nosc.mil, 11 Mar 1997
 52 Kari Seppanen, kse@farfalle.tte.vtt.fi, 01 Apr 1997
 53 David Bass, david.bass@eurocontrol.be, 15 Apr 1997
 54 Wolfram Wagner, ww@mpi-sb.mpg.de, 17 Apr 1997
 55 Manuel Jose Blanca Molinos, 101347.3363@compuserve.com, 18 May 1997
 56 Manuel Blanca, 101347.3363@compuserve.com, 02 Aug 1997
 57 Manuel Blanca, 101347.3363@compuserve.com, 11 Sep 1997
 58 Paul Caskey, pcaskey@swcp.com, 01 Oct 1997

Al Aburto
aburto@nosc.mil