I just did a whole bunch of benchmarking on my ATI Radeon 5830, with an overclocked core of 1060MHz. The goal was to find out all about the new AMD Stream SDK 2.6's OpenCL performance, and what miner optimizations may get performance back where it was with 2.5.775.2 and previous SDK runtimes.
The 5830 is basically a 5870 (the same chip die) with the 1600 stream processors cut down to 1120 (failed die yields or simply product segmentation). The stock core clock on the 5870 is 850MHz, while the 5830 is 800MHz.
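For a rough sense of scale, here is a small sketch comparing the two cards by stream processors times core clock. The assumption that hashing throughput scales linearly with both is mine, not something measured in this post, and the function and dictionary names are made up for illustration.

```python
# Stream processor counts and core clocks as quoted in this post.
HD5870_STOCK = {"stream_processors": 1600, "core_mhz": 850}
HD5830_STOCK = {"stream_processors": 1120, "core_mhz": 800}
HD5830_OC = {"stream_processors": 1120, "core_mhz": 1060}  # my overclock


def raw_throughput(card):
    """Stream processors x core clock (arbitrary units)."""
    return card["stream_processors"] * card["core_mhz"]


def relative_throughput(card, reference):
    """Ratio of raw throughput between two cards.

    Assumption: performance scales linearly with stream processor count
    and core clock. Real hashrates will deviate from this.
    """
    return raw_throughput(card) / raw_throughput(reference)


print(f"stock 5830 vs stock 5870: {relative_throughput(HD5830_STOCK, HD5870_STOCK):.2f}")
print(f"1060MHz 5830 vs stock 5870: {relative_throughput(HD5830_OC, HD5870_STOCK):.2f}")
```

Under that assumption the overclocked 5830 lands at a bit under nine tenths of a stock 5870.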
My card's highest hashrate with SDK 2.4-2.5 is at 1060/366MHz with Phoenix 1.7.2 (exe) phatk2 VECTORS AGGRESSION=13 FASTLOOP=False WORKSIZE=256. The odd 366MHz optimal RAM speed differs from the 6xxx, 5850, and 57xx, where peak performance is usually found at 300MHz.
Let's see what we find. I was curious what Diapolo's new 2011-12-21 kernel could do, so here are lots of Phoenix 1.7.2 benchmarks with different kernels, worksizes, and RAM speeds (the scaling is similar to standard phatk2). Tests were run for several minutes and shares (unless the setting was obviously poor), with the -a averaging option:
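If you want to try a range of settings rather than a single one, option strings like the one above can be generated instead of typed by hand. This is only a sketch: the helper names are hypothetical, it only builds the kernel name and option part of the command line (the same tokens quoted above), and it does not launch Phoenix or touch memory clocks.

```python
from itertools import product


def kernel_options(vectors, worksize, aggression=13, fastloop=False):
    """Kernel option tokens for one run, in the form quoted in this post."""
    return [
        vectors,
        f"AGGRESSION={aggression}",
        f"FASTLOOP={fastloop}",
        f"WORKSIZE={worksize}",
    ]


def sweep(kernels, vector_modes, worksizes):
    """Yield one token list per kernel / vector mode / worksize combination."""
    for kernel, vectors, worksize in product(kernels, vector_modes, worksizes):
        yield [kernel] + kernel_options(vectors, worksize)


# Hypothetical sweep: one kernel, two vector modes, three worksizes.
runs = list(sweep(["phatk2"], ["VECTORS", "VECTORS4"], [256, 128, 64]))
for tokens in runs:
    print(" ".join(tokens))
```

Each printed line is the kernel part of one benchmark run; the pool URL and other miner arguments still have to be supplied separately.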
SDK 2.5/11.6 (Mhash/s)
worksize: | | | 256 | 128 | 64 |
Diapolo | VECTORS2 | 366MHz | 340.87 | 335.56 | 324.79 |
Diapolo | VECTORS2 | 1000MHz | 328.81 | 335.56 | 332.32 |
Diapolo | VECTORS4 | 366MHz | 200.94 | 204.04 | 200.97 |
Diapolo | VECTORS4 | 1000MHz | 244.39 | 250.09 | 244.39 |
phatk2 | VECTORS | 366MHz | 345.32 | | |
Note: VECTORS3 is in the Diapolo init file, but Phoenix crashes with a Python traceback error if it is used.

SDK 2.6/11.12 (12.1 is identical) (Mhash/s)
worksize: | | | 256 | 128 | 64 |
Diapolo | VECTORS2 | 366MHz | 332.99 | 327.86 | 321.16 |
Diapolo | VECTORS2 | 1000MHz | 326.03 | 327.90 | 325.99 |
Diapolo | VECTORS4 | 366MHz | 219 | 298 | 329.74 |
Diapolo | VECTORS4 | 1000MHz | 255 | 278 | 216 |
phatk2 | VECTORS | 366MHz | 307.37 | 306.78 | 297.25 |
phatk2 | VECTORS | 1000MHz | 300.87 | 304.61 | 298.81 |
phatk2 | VECTORS4 | 366MHz | 217.42 | 289.25 | 288.78 |
phatk2 | VECTORS4 | 1000MHz | 262.69 | 339.5 | 340.21 |
phatk | VECTORS | 366MHz | 326 | 323.5 | 315.3 |
phatk | VECTORS | 1000MHz | 317 | 321.2 | 320.5 |
Conclusions:
-AMD seems to have fixed the "bug" where underclocking memory is required for best performance; there is no massive drop at full RAM clock speed in most cases.
-Performance using phatk2/VECTORS4/WORKSIZE 64 at 1000MHz RAM is 99% of the old maximum, but with the video card using more power because the RAM is not underclocked.
-The new driver/SDK fixes the CPU bug; maybe you will break even in power usage by not using 100% CPU too...
-The stock 1000MHz RAM speed is the optimum: across the 300MHz-1200MHz range, the highest performance was at 1000MHz, and no further tweaking could get more out of the best setting.
-If you need the latest driver, play games, or don't want to underclock, now you have your setting! Dedicated miners still get more hashrate for less wattage with SDK 2.5 (until OpenCL kernel tweakers can figure out how to get even more out of 2.6...)
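To double-check the "99% of the old maximum" figure, here is a quick sketch that picks the best SDK 2.6 result from the table and compares it with the SDK 2.5 phatk2 baseline. The numbers are copied from the tables in this post; the variable and function names are just for illustration.

```python
WORKSIZES = (256, 128, 64)

# Mhash/s at WORKSIZE 256/128/64, copied from the SDK 2.6 table above.
SDK26 = {
    ("Diapolo", "VECTORS2", 366): (332.99, 327.86, 321.16),
    ("Diapolo", "VECTORS2", 1000): (326.03, 327.90, 325.99),
    ("Diapolo", "VECTORS4", 366): (219, 298, 329.74),
    ("Diapolo", "VECTORS4", 1000): (255, 278, 216),
    ("phatk2", "VECTORS", 366): (307.37, 306.78, 297.25),
    ("phatk2", "VECTORS", 1000): (300.87, 304.61, 298.81),
    ("phatk2", "VECTORS4", 366): (217.42, 289.25, 288.78),
    ("phatk2", "VECTORS4", 1000): (262.69, 339.5, 340.21),
    ("phatk", "VECTORS", 366): (326, 323.5, 315.3),
    ("phatk", "VECTORS", 1000): (317, 321.2, 320.5),
}

# Best SDK 2.5 result: phatk2 VECTORS, 366MHz RAM, WORKSIZE 256.
BASELINE_SDK25 = 345.32


def best_setting(table):
    """Return (rate, (kernel, vectors, ram_mhz), worksize) for the top rate."""
    return max(
        (rate, key, worksize)
        for key, rates in table.items()
        for rate, worksize in zip(rates, WORKSIZES)
    )


rate, (kernel, vectors, ram_mhz), worksize = best_setting(SDK26)
ratio = rate / BASELINE_SDK25
print(f"{kernel} {vectors} WORKSIZE={worksize} @ {ram_mhz}MHz: {rate} Mhash/s ({ratio:.1%} of SDK 2.5 best)")
```

The ratio works out to about 98.5%, which is the "99%" quoted in the conclusions.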
Soon to come: recompiling pyOpenCL against the 2.6 SDK (jedi95 might try this for his exe distribution). Done: no difference. pyOpenCL-0.98 seems to get about 1% more than the newest pyOpenCL-2011.2, BTW, and appears to be the version rolled into the exe. Be sure to use pyOpenCL 0.98 for best performance if running from source/Linux.
Edit: My 5770 gives similar results. Its default memory clock is 1200MHz, but the performance peak is at 1000MHz. VECTORS4 AGGRESSION=12 FASTLOOP=False WORKSIZE=128 gives me 229Mhash/s, vs. the previous best of 230Mhash/s.