Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels - page 10. (Read 61229 times)

hero member
Activity: 630
Merit: 500
Trust me 14.7RC3 is best.
Then u are unlucky enough to have a "locked" card.
Your only recourse is vbios modding the card if u want to lower the memclock to 150 and be able to run a higher overclock on the GPU.
Do the research on vbios modding ... there are pointers from me in this thread. I hate repeating myself a hundred times, which is why the info is in the thread.
https://bitcointalksearch.org/topic/m.9043545

BTW I can still clock mem at 1625 via the sgminer setting when I mine X11 or Neoscrypt with it ... I just have to set it manually for them ...

Welcome to Extreme Diamond Mining LOL
hero member
Activity: 732
Merit: 500
Hmm, now I see the miner is still showing the stock memclock of 1550 for the 280x Vapor. Why hasn't it changed? Driver is 14.4.
I changed it in the batch file and also later from within the miner.
Why can't ppl read the thread ... u will have the best performance with driver 14.7RC3 ...

Many 280x cards are locked to a small range of clock adjustment (the PowerColor 280x being one of them, which is why I had to low-level vbios mod mine). Also, many 280x cards will throttle the GPU clock at temps above 72C.

I have read the thread. I tried 14.4, 14.6, 14.7 and 14.9.
I can't set the cards to a lower memclock than the stock 1550. Maybe they really are locked!
hero member
Activity: 630
Merit: 500
Hmm, now I see the miner is still showing the stock memclock of 1550 for the 280x Vapor. Why hasn't it changed? Driver is 14.4.
I changed it in the batch file and also later from within the miner.
Why can't ppl read the thread ... u will have the best performance with driver 14.7RC3 ...

Many 280x cards are locked to a small range of clock adjustment (the PowerColor 280x being one of them, which is why I had to low-level vbios mod mine). Also, many 280x cards will throttle the GPU clock at temps above 72C.
hero member
Activity: 732
Merit: 500
Hmm, now I see the miner is still showing the stock memclock of 1550 for the 280x Vapor. Why hasn't it changed? Driver is 14.4.
I changed it in the batch file and also later from within the miner.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I am at stock core 1070 and mem 1100. Maybe this is the difference.
Not to be condescending, but have u tried this on the sgminer command line:
--gpu-clock 1100 --mem-clock 150

I will stay at core 1070 (I don't like to overclock) but will set mem to 150 to see the result.

Edit: hm, 23.4 again, but with lower temps. That's good enough, I think.

lower mem clock = less power usage and bigger core overclock potential.
hero member
Activity: 732
Merit: 500
I am at stock core 1070 and mem 1100. Maybe this is the difference.
Not to be condescending, but have u tried this on the sgminer command line:
--gpu-clock 1100 --mem-clock 150

I will stay at core 1070 (I don't like to overclock) but will set mem to 150 to see the result.

Edit: hm, 23.4 again, but with lower temps. That's good enough, I think.
hero member
Activity: 732
Merit: 500
Can I ask what software you are using to change the values? I'm using MSI Afterburner 4.1, but it won't change the memory clock, and the core clock is always lower than what I set.

These changes are really pushing my cards now: they normally sat at 50°C but are now over 60°C (they are watercooled). I will post the power usage when I next reboot and plug the power meter in.

Also Afterburner, but with the 14.4 driver for me.
hero member
Activity: 630
Merit: 500
Wow I got Best share: 702K
Was it a block? :-D

Now let's get serious: I finally have a little time to write down some considerations on the ocl and asm kernels.
I believe we should pursue the asm path for a number of reasons:

- currently the OCL kernel is a little faster on hawaii but not on all other cards and I don't think it can be improved in this respect
- the OCL kernel has been tweaked and optimized for months, while the asm one is new so there is probably much more room for improvement
- just by applying the first and last round optimization the asm kernel will probably be faster on hawaii as well; I'm sure that Realhet will find other asm tricks to apply
- with all these catalyst version problems, the best way to share kernels for the people to mine is by bin files, making the asm version and ocl equivalent (for distribution purposes); better yet would be a miner with all the bundled bin files (takes time)
- asm is cooler than ocl ;-)

what do you guys think?
I'm all for sticking with the asm route ... u need to feed your ocl tweaks to realhet, and let's maximize the asm kernel.
As I already suggested to realhet, "cross-compiling" to generate bins for every arch we support is possible; he needs bins that we create on each arch so he can dig out the minor diffs between them.
newbie
Activity: 13
Merit: 0
Well, the best I can run at without crashing is 1040 core / 1250 memory, as it won't go lower with a -0.055 core voltage drop.

1 card: the rig pulls 510 W at 28.5 MH/s, so 0.056 MH/s per watt
2 cards: the rig pulls 740 W at 57 MH/s, so 0.077 MH/s per watt
3 cards: the rig pulls 990 W at 85.5 MH/s, so 0.086 MH/s per watt

That works out to about 230-250 W per card, or 0.114 MH/s per watt (at 250 W), excluding the system's own draw.

If anybody can get a higher hash per watt, let me know.

EDIT

Even at that rate, with my electricity costs I can't make a profit...
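
The efficiency figures in the post above can be re-derived with a short script. This is only a sketch in Python: the function and variable names are made up for illustration, and the only inputs are the wall-power and hashrate readings quoted in the post.

```python
# Wall-power and hashrate readings quoted in the post: (cards, watts, MH/s).
readings = [(1, 510, 28.5), (2, 740, 57.0), (3, 990, 85.5)]


def mh_per_watt(mhs, watts):
    """Hashrate efficiency in MH/s per watt."""
    return mhs / watts


def marginal_watts(rows):
    """Extra wall power drawn by each additional card."""
    return [b[1] - a[1] for a, b in zip(rows, rows[1:])]


# Whole-rig efficiency for 1, 2 and 3 cards (system draw included).
whole_rig = [round(mh_per_watt(mhs, watts), 3) for _, watts, mhs in readings]

# Power added by the 2nd and 3rd card, and the per-card efficiency it implies.
per_card_watts = marginal_watts(readings)
per_card_mhs = readings[0][2]
per_card_eff = [round(mh_per_watt(per_card_mhs, w), 3) for w in per_card_watts]

print(whole_rig)
print(per_card_watts)
print(per_card_eff)
```

The per-card numbers come from the difference between consecutive readings, which is why they exclude the base system draw.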
hero member
Activity: 630
Merit: 500
I am at stock core 1070 and mem 1100. Maybe this is the difference.
Not to be condescending, but have u tried this on the sgminer command line:
--gpu-clock 1100 --mem-clock 150
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Wow I got Best share: 702K
Was it a block? :-D

Now let's get serious: I finally have a little time to write down some considerations on the ocl and asm kernels.
I believe we should pursue the asm path for a number of reasons:

- currently the OCL kernel is a little faster on hawaii but not on all other cards and I don't think it can be improved in this respect
- the OCL kernel has been tweaked and optimized for months, while the asm one is new so there is probably much more room for improvement
- just by applying the first and last round optimization the asm kernel will probably be faster on hawaii as well; I'm sure that Realhet will find other asm tricks to apply
- with all these catalyst version problems, the best way to share kernels for the people to mine is by bin files, making the asm version and ocl equivalent (for distribution purposes); better yet would be a miner with all the bundled bin files (takes time)
- asm is cooler than ocl ;-)

what do you guys think?
hero member
Activity: 630
Merit: 500
Can I ask what software you are using to change the values? I'm using MSI Afterburner 4.1, but it won't change the memory clock, and the core clock is always lower than what I set.

These changes are really pushing my cards now: they normally sat at 50°C but are now over 60°C (they are watercooled). I will post the power usage when I next reboot and plug the power meter in.
Not sure if you can do this to a 290/290x card, because the vbios is likely to be quite different. You will have to do some research before you attempt my method using VBE7.0.0.7b.exe, a video BIOS editor u can use to change voltages and clocks at board level. If I remember correctly it was only for Tahiti cards ... do your research, then flash the vbios with atiwinflash. There may be programs like MSI Afterburner that can do it, but I did my card at low level. Again, check that it will work on your card before you do it, or u can "brick" your card haha
newbie
Activity: 13
Merit: 0
Can I ask what software you are using to change the values? I'm using MSI Afterburner 4.1, but it won't change the memory clock, and the core clock is always lower than what I set.

These changes are really pushing my cards now: they normally sat at 50°C but are now over 60°C (they are watercooled). I will post the power usage when I next reboot and plug the power meter in.
hero member
Activity: 732
Merit: 500
I am at stock core 1070 and mem 1100. Maybe this is the difference.
hero member
Activity: 630
Merit: 500
Thanks, my problem was that I wasn't disabling the Intel in UEFI. It's working now at 26.5 MH/s per card, which is amazing.
I'm using 14.7r3, xI 2048, 1100/150, -w 256 undervolted to 1.00 and getting 23.38 MH/s.  What's your config?

Same here: 23.4.
Which is the right miner?
Yer not gonna get 30-33 MH/s right out of the box on a 290/290x; you will have to tune intensity, GPU clock and mem clock (lowest possible). Pallas can help with these cards if he's in the right mood.
On the 280x (1180/150) I was able to reuse my tuning from the previous kernel to get 26.0 MH/s, only because it was already maxed out: volt modded, vbios modded etc. Info about these techniques is in the thread if you look around ...
As for the miner ... sgminer 4.1.0 (sph) is what I use ...
hero member
Activity: 732
Merit: 500
Thanks, my problem was that I wasn't disabling the Intel in UEFI. It's working now at 26.5 MH/s per card, which is amazing.
I'm using 14.7r3, xI 2048, 1100/150, -w 256 undervolted to 1.00 and getting 23.38 MH/s.  What's your config?

Same here: 23.4.
Which is the right miner?
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
* I've updated the main page with benchmark data I've collected: http://realhet.wordpress.com/gcn-asm-groestl-coin-kernel/

30 MH/s is for the R9 290; the 290x does 33.
hero member
Activity: 630
Merit: 500
ROFL u just had to try 14.12 hahahaha ... now back to 14.7RC3 LOL
newbie
Activity: 32
Merit: 0
Well, that neoscrypt is quite complicated. I couldn't even get it compiled, as I think it needs more defines than just WORKSIZE. Some day I'm going to check it more closely, as it is interesting...

Now I have Cat 14.12 Omega (whatever that is). The asm kernel is unchanged; the original OCL kernel is 15% faster than on Cat 14.9, but it is still not good enough.

I've compared your diamondTahiti compilation with my Capeverde one. The differences are not that complicated:
- In the ELF header, the 'archtype' field is 3FF vs. 3FD
- In the small binary info section (outer ELF), 2 bytes are different: 9F vs. 9C
- In the small binary info section (inner ELF), one byte is different: 1C vs. 1A
- In the text ARG section, the only difference is the strings: capeverde vs. tahiti

So if I collect all these constants/strings, I can convert from one to the other. But Capeverde and Tahiti are identical chips; it's possible that the Hawaii binary is much more different.
And even though the two binaries (capeverde and tahiti) are almost the same, clBuildKernel() checks the hardware IDs and refuses to load it.
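
A comparison like the one described above can be done with a plain byte-by-byte diff. The following is a minimal sketch in Python, not the tool used in the post: `diff_bytes`, `format_diff` and the two sample byte strings are hypothetical, and the sample bytes are made-up stand-ins, not real kernel binaries.

```python
def diff_bytes(a: bytes, b: bytes):
    """Return (offset, byte_a, byte_b) for every position where a and b differ."""
    if len(a) != len(b):
        raise ValueError("inputs differ in length; a plain offset diff is not meaningful")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def format_diff(diffs):
    """Render the differences as 'offset: XX vs. YY' lines."""
    return [f"0x{off:04X}: {x:02X} vs. {y:02X}" for off, x, y in diffs]


# Made-up stand-ins for two compiled binaries (NOT real ELF contents).
binary_a = bytes([0x7F, 0x45, 0x4C, 0x46, 0xFF, 0x03, 0x9F, 0x1C])
binary_b = bytes([0x7F, 0x45, 0x4C, 0x46, 0xFD, 0x03, 0x9C, 0x1A])

for line in format_diff(diff_bytes(binary_a, binary_b)):
    print(line)
```

To run it on real files, the two inputs could be read with `open(path, "rb").read()`. Binaries of different lengths (for example because an embedded string such as the device name differs in length) would need an alignment-aware diff instead.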
hero member
Activity: 630
Merit: 500
Look better. Dev 0 is 280x, Dev 1 is 7950.

List of opencl devices:
Device #0
Target: Tahiti  Series: 7  Core:1150 MHz  CU:32  RAM:3072 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Device #1
Target: Tahiti  Series: 7  Core:1150 MHz  CU:28  RAM:3072 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

Using device:
Target: Tahiti  Series: 7  Core:1150 MHz  CU:32  RAM:3072 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
* core MHz value is not always accurate, use Catalyst Control Center (or ADL) instead!

Using new GCN ASM code
Kernel binary saved: C:\Miners\HetPas150111_Groestl\groestl\kernel_dump\kernel.elf

elapsed: 53.609 ms  24.449 MH/s   gain:  12.22x
elapsed: 50.710 ms  25.847 MH/s   gain:  12.92x
elapsed: 50.670 ms  25.868 MH/s   gain:  12.93x
elapsed: 50.707 ms  25.849 MH/s   gain:  12.92x

Functional test: RESULT IS OK

   idx        hi       lo           hi           lo
     0: 16410000 D9080000    373358592   -653787136
     1: 4A820000 D0630000   1250033664   -798818304
     2: C3E00000 EDA60000  -1008730112   -307888128
     3: 1F020100 33FF0000    520225024    872349696
     4: 8A200100 F8F10000  -1977614080   -118423552
     5: 9F3A0100 C22D0100  -1623588608  -1037238016
     6: A6000200 86D40100  -1509948928  -2032926464
     7: C52A0200 A7190200   -987102720  -1491533312
     8: 36610200 F6380200    912327168   -164101632
     9: B8E80200 6BAB0200  -1192754688   1806369280
     A: B72B0300 E9280300  -1221917952   -383253760
     B: 684F0300 B04C0300   1750008576  -1337195776
     C: FA6F0300 F15D0300    -93388032   -245562624
     D: A9B80300 BE8D0300  -1447558400  -1098054912
     E: 06CF0300 5FCF0300    114230016   1607402240
     F: DCF90300 EDF00300   -587660544   -303037696
    10: FF300400 2F0A0400    -13630464    789185536
    11: D2DB0400 D5830400   -757398528   -712834048
    12: 97060500 53CD0400  -1761213184   1405944832
    13: E3100500 77160500   -485489408   1997931776
    14: 0E2E0500 3E1B0500    237896960   1041958144
    15: FA460500 2F490500    -96074496    793314560
    16: 2A860500 D0650500    713426176   -798685952
    17: 4BCC0500 8C950500   1271661824  -1936390912
    18: 860F0600 1DED0500  -2045835776    502072576
    19: 3B810600 3C710600    998311424   1014040064
    1A: E09B0600 E9840600   -526711296   -377223680
    1B: 58FE0600 56AF0600   1493042688   1454310912
    1C: 44160700 CBF90600   1142294272   -872872448
    1D: F9240700 DA1F0700   -115079424   -635500800
    1E: 79910700 64700700   2039547648   1685063424
    1F: 98FD0700 DFA10700  -1728248064   -543095040
    20: 44450800 8E0E0800   1145374720  -1911683072
    21: 1E4D0800 8B570800    508364800  -1957230592
    22: 317D0800 52670800    830277632   1382483968
    23: 20A30800 BE830800    547555328  -1098708992
    24: CFAE0800 FCAA0800   -810678272    -55965696
    25: AED30800 00B40800  -1361901568     11798528
    26: 37150900 1D070900    924125440    487000320
    27: 37570900 EE210900    928450816   -299824896
    28: 8F9C0900 21740900  -1885599488    561252608
    29: 729D0900 38960900   1922894080    949356800
    2A: 8C270A00 10D20900  -1943598592    282200320
    2B: A5460A00 163C0A00  -1522136576    373033472
    2C: 93540A00 2E470A00  -1823208960    776407552
    2D: FF7A0A00 19650A00     -8779264    426052096
    2E: BCEA0A00 A09A0A00  -1125512704  -1600517632
    2F: 94210B00 76F80A00  -1809773824   1995966976
    30: 2E5B0B00 38310B00    777718528    942738176
    31: 0BF70B00 27610B00    200739584    660671232
    32: CB8B0C00 EA5D0C00   -880079872   -363000832
    33: 2AA20C00 D59B0C00    715262976   -711259136
    34: 2AB00C00 38AB0C00    716180480    950733824
    35: 79DB0C00 DFC60C00   2044398592   -540668928
    36: A21B0D00 5D0E0D00  -1575285504   1561201920
    37: 05370D00 84190D00     87493888  -2078733056
    38: 58A90D00 4FAA0D00   1487473920   1336544512
    39: 26EF0D00 BAB10D00    653200640  -1162801920
    3A: EA030E00 E0F50D00   -368898560   -520811264
    3B: 960B0E00 A5090E00  -1777660416  -1526133248
    3C: 12410E00 2F140E00    306253312    789843456
    3D: 785B0E00 47490E00   2019233280   1195970048
    3E: 017C0E00 5D7B0E00     24907264   1568345600
    3F: 87B40E00 4D9A0E00  -2018243072   1301941760
    40: 83E20E00 7ACB0E00  -2082337280   2060127744
    41: 11110F00 85E70E00    286330624  -2048455168
    42: F3270F00 AE130F00   -215544064  -1374482688
    43: 19540F00 E8390F00    424939264   -398913792
    44: F4630F00 2C5D0F00   -194834688    744296192
    45: 66780F00 997A0F00   1719144192  -1720054016
    46: D1AE0F00 0AA50F00   -777122048    178589440
    47: 96C00F00 65AD0F00  -1765798144   1705840384
    48: 4ECB0F00 D8C40F00   1321930496   -658239744
    49: 4CF90F00 F1DF0F00   1291390720   -237039872
    4A: 6A1C1000 44171000   1780224000   1142362112
    4B: 62E21000 80841000   1658982400  -2138828800
    4C: 6F2B1100 26141100   1865093376    638849280
    4D: 3D481100 DB351100   1028133120   -617279232
    4E: A04F1100 F3521100  -1605431040   -212725504
    4F: 02BC1100 06881100     45879552    109580544
    50: 1DD71100 73D41100    500633856   1943277824
    51: 0CE51100 E7DF1100    216338688   -404811520
    52: BEFA1100 8DEA1100  -1090907904  -1914040064
    53: F6341200 DB2F1200   -164359680   -617672192
    54: 99541200 54331200  -1722543616   1412633088
    55: 10931200 01711200    278073856     24187392
    56: 1BAC1200 DBA21200    464261632   -610135552
    57: 7EF11200 37F01200   2129728000    938480128
    58: 7FFB1200 40F51200   2147160576   1089802752
    59: 84811300 38791300  -2071915776    947458816
    5A: 6DCA1300 98B11300   1841959680  -1733225728
    5B: 0A001400 64F01300    167777280   1693455104
    5C: 00000000 00000000            0            0
    5D: 00000000 00000000            0            0
    5E: 00000000 00000000            0            0
    5F: 00000000 00000000            0            0
    60: 00000000 00000000            0            0
    61: 00000000 00000000            0            0
    62: 00000000 00000000            0            0
    63: 00000000 00000000            0            0
    64: 00000000 00000000            0            0
    65: 00000000 00000000            0            0
    66: 00000000 00000000            0            0
    67: 00000000 00000000            0            0
    68: 00000000 00000000            0            0
    69: 00000000 00000000            0            0
    6A: 00000000 00000000            0            0
    6B: 00000000 00000000            0            0
    6C: 00000000 00000000            0            0
    6D: 00000000 00000000            0            0
    6E: 00000000 00000000            0            0
    6F: 00000000 00000000            0            0
    70: 00000000 00000000            0            0
    71: 00000000 00000000            0            0
    72: 00000000 00000000            0            0
    73: 00000000 00000000            0            0
    74: 00000000 00000000            0            0
    75: 00000000 00000000            0            0
    76: 00000000 00000000            0            0
    77: 00000000 00000000            0            0
    78: 00000000 00000000            0            0
    79: 00000000 00000000            0            0
    7A: 00000000 00000000            0            0
    7B: 00000000 00000000            0            0
    7C: 00000000 00000000            0            0
    7D: 00000000 00000000            0            0
    7E: 00000000 00000000            0            0
    7F: 000000B8 00000000          184            0
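
The two decimal columns in the dump above appear to be the same 32-bit words as the hex columns, read as signed two's-complement integers. A minimal sketch of that conversion in Python (the helper name `to_signed32` is made up for illustration):

```python
def to_signed32(word: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed two's-complement int."""
    word &= 0xFFFFFFFF
    return word - 0x100000000 if word & 0x80000000 else word


# Row idx 0 of the dump: hi and lo words as printed in hex.
hi, lo = int("16410000", 16), int("D9080000", 16)
print(to_signed32(hi), to_signed32(lo))
```

For row 0 this gives 373358592 and -653787136, matching the decimal columns printed in the dump.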