Topic: My initial Radeon HD 7970 mining benchmarks - page 8. (Read 46778 times)

newbie
Activity: 43
Merit: 0
I can't wait for the 7990; that is going to be impressive but expensive. I might have missed this, but what is the heat like when hashing overclocked, and at what fan speed?

Overclocked @ 1125/975MHz with automatic fan speed, I'm getting temperatures hovering around 81-83C, and the fan runs at 47-49% speed. You can see some screencaps on one of the earlier pages. But since I prefer lower temperatures and am worried about VRM and memory temps not yet being reported by GPU-Z, I usually run it at 60% fan speed and get temps around 72C. The blower fan at 60% speed is quite loud (it's a reference design from Sapphire).

At 100% fan speed, the overclocked card gets below 60C while mining, but you can hear it from outside the house at that point, so as lovely as these temps are, this is not an option for me as it is also my gaming and work PC.
hero member
Activity: 560
Merit: 501
okay, i will fly to Singapore and pick one up if it all makes you happy....


i got a girl there:P
Is it Mrs. Zhou Tong?
donator
Activity: 1218
Merit: 1079
Gerald Davis
Hey OP, is there a Kill-A-Watt you could purchase locally? If you are in the States, Home Depot and Lowe's carry them. If you can find one locally, I am sure we could get together the 3 or 4 BTC to get some accurate power readings.

The Kill-A-Watt brand doesn't appear to be sold here in Europe, and I've been searching for an equivalent device locally each time I've had a chance to head out to a store for the past couple of days, but no luck so far.

Well that sucks.  A more universal albeit expensive tool is a clamp meter. 

newbie
Activity: 43
Merit: 0
Wait wait wait. Are we sure uint16 is such a good idea? Last time I tried >4 (which was before 2.6, btw, I haven't tested with 2.6), it would crash in the compiler. Also, does anyone have a count on the number of registers per CU? There might not be enough registers to handle that.

I'm not sure if it's a good idea or not, so I wanted to measure it. GCN has 64KB of vector registers per SIMD (256KB per CU), and like you said I'm not sure if that's enough. The reason for my curiosity was that GCN's compute units each contain 4 SIMD units with a width of 16 elements (the same width as Larrabee and Intel's MIC, coincidentally), and I recall reading somewhere that each of these SIMD units can retire one 16-way instruction every 4 cycles, so 16-element vectors kind of jumped out at me. I also wanted to get familiar with the OpenCL bitcoin mining code and thought it would be a neat exercise (which it was!). Nice code, by the way.

I can say for sure that 16-element vectors DO compile with the drivers that came with the card.

The -ds code dump for 16 element vectors came out nice and clean, although the last few lines where the result is stored in output seem a bit branchy. It looks something like this:

Code:
    if(XG2.s0 == 0x136032ED) { output[Xnonce.s0 & 0xF] = Xnonce.s0; }
    if(XG2.s1 == 0x136032ED) { output[Xnonce.s1 & 0xF] = Xnonce.s1; }
    if(XG2.s2 == 0x136032ED) { output[Xnonce.s2 & 0xF] = Xnonce.s2; }
    ...
    ...
    if(XG2.sd == 0x136032ED) { output[Xnonce.sd & 0xF] = Xnonce.sd; }
    if(XG2.se == 0x136032ED) { output[Xnonce.se & 0xF] = Xnonce.se; }
    if(XG2.sf == 0x136032ED) { output[Xnonce.sf & 0xF] = Xnonce.sf; }

I tried replacing it with a branch-less expression using shuffle() and vstore16() but haven't managed to get it working. What I've come up with looks something like this:

Code:
    x mask = Xnonce & 0xF;
    x temp = shuffle(select(Xnonce, 0, selection), mask);
    vstore16(temp, 0, output);
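
One possible reason the shuffle() attempt misbehaves, offered as a guess rather than a diagnosis: OpenCL's shuffle() is a gather (result lane i takes source lane mask[i]), while the branchy code is a scatter (lane i writes to slot nonce[i] & 0xF), and vstore16() replaces all 16 output slots, including the ones the branchy code leaves alone. The Python sketch below emulates both on the host. It is not OpenCL; the function names, the list-based stand-ins for uint16 vectors, the assumption about what select() leaves in each lane, and the test values are all hypothetical. Only the target constant comes from the dump above.

```python
# Host-side emulation of the two store strategies; not OpenCL, and all
# names and test values here are hypothetical.
LANES = 16
TARGET = 0x136032ED  # constant taken from the kernel dump


def scatter_store(g, nonce, output):
    """Branchy version: only matching lanes write, each to output[nonce & 0xF]."""
    out = list(output)
    for i in range(LANES):
        if g[i] == TARGET:
            out[nonce[i] & 0xF] = nonce[i]
    return out


def gather_store(g, nonce, output):
    """Emulates select + shuffle + vstore16.

    Assumes select() leaves the nonce in matching lanes and 0 elsewhere.
    shuffle(x, mask) gathers: result lane i is x[mask[i]]. vstore16 then
    replaces all 16 slots, so the previous contents of output are ignored.
    """
    sel = [nonce[i] if g[i] == TARGET else 0 for i in range(LANES)]
    mask = [n & 0xF for n in nonce]
    return [sel[mask[i]] for i in range(LANES)]


# Made-up work item: nonces 0x1005..0x1014, with a hit in lane 2 only.
nonce = [0x1005 + i for i in range(LANES)]
g = [0] * LANES
g[2] = TARGET
previous = [0xAAAA] * LANES  # stand-in for results already in the buffer

scattered = scatter_store(g, nonce, previous)
gathered = gather_store(g, nonce, previous)
print("scatter:", [hex(v) for v in scattered])
print("gather: ", [hex(v) for v in gathered])
```

In this made-up case the hit lands in slot 13 instead of slot 7, and the other 15 slots are zeroed. If every lane's nonce satisfied nonce & 0xF == lane index, the gather and scatter slots would coincide, so whether this matters depends on how the kernel assigns nonces to lanes.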

Anyhow I'm sure that my code modifications are doing all sorts of dumb things. I'm still learning how it all works so please ignore.

Also, check some of the larger -vs, -v 40 is two sets of uint4 and -v 44 does three uint4s (unlike cgminer, -v 4 does two uint2s).

I've tried all of the different -v settings available (according to the source) but haven't been able to get any higher than the 666MH/s with the default settings and 3 compute threads.
newbie
Activity: 70
Merit: 0
I can't wait for the 7990; that is going to be impressive but expensive. I might have missed this, but what is the heat like when hashing overclocked, and at what fan speed?
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
Hey OP, is there a Kill-A-Watt you could purchase locally? If you are in the States, Home Depot and Lowe's carry them. If you can find one locally, I am sure we could get together the 3 or 4 BTC to get some accurate power readings.

The Kill-A-Watt brand doesn't appear to be sold here in Europe, and I've been searching for an equivalent device locally each time I've had a chance to head out to a store for the past couple of days, but no luck so far.

I also took a stab at modifying DiabloMiner and managed to get it to use 16-component vectors, which is what GCN is supposed to be tuned for, but performance isn't what I expected, and it's really hard to profile/debug Tahiti since I could not find any development tools that specifically support it yet.

Wait wait wait. Are we sure uint16 is such a good idea? Last time I tried >4 (which was before 2.6, btw, I haven't tested with 2.6), it would crash in the compiler. Also, does anyone have a count on the number of registers per CU? There might not be enough registers to handle that.

Also, check some of the larger -vs, -v 40 is two sets of uint4 and -v 44 does three uint4s (unlike cgminer, -v 4 does two uint2s).
newbie
Activity: 43
Merit: 0
Hey OP, is there a Kill-A-Watt you could purchase locally? If you are in the States, Home Depot and Lowe's carry them. If you can find one locally, I am sure we could get together the 3 or 4 BTC to get some accurate power readings.

The Kill-A-Watt brand doesn't appear to be sold here in Europe, and I've been searching for an equivalent device locally each time I've had a chance to head out to a store for the past couple of days, but no luck so far.

I also took a stab at modifying DiabloMiner and managed to get it to use 16-component vectors, which is what GCN is supposed to be tuned for, but performance isn't what I expected, and it's really hard to profile/debug Tahiti since I could not find any development tools that specifically support it yet.
sr. member
Activity: 271
Merit: 250
man wat a beast
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
(because large parts of the chip shut off).
I know that the shaders are used to do the hashing, but is it possible to utilize more of the chip, even if it were at dramatically lower efficiency?

No. I already tried to abuse the texture/memory fetch units, but couldn't figure out a useful way of doing it. It's all fixed-function hardware and it's not particularly interesting for what we do. Although I may go try that again; SDK 2.6 seems to be a much better compiler in some areas.
rjk
sr. member
Activity: 448
Merit: 250
1ngldh
(because large parts of the chip shut off).
I know that the shaders are used to do the hashing, but is it possible to utilize more of the chip, even if it were at dramatically lower efficiency?
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
You think it will be 200 watts w/ a 20% overclock?  I wish the OP had a kill-a-watt.

That's at stock clocks, obviously. I don't know what the mining values will be; all the cards draw less than their full wattage at stock speeds when mining (because large parts of the chip shut off). I imagine 79xx may even get a larger efficiency boost from this because of AMD's work on power saving, but without a Kill-A-Watt test, no one knows.
donator
Activity: 1218
Merit: 1079
Gerald Davis
In the UK currently :

5870 costs 170 GBP and gets 440 mhash/s so about 2.6 mhash/GBP

7970 will cost roughly 430 GBP and get 666 mhash/s so about 1.5 mhash/GBP

Thus, the 5870 is still much better and you can also get a 5970 that gets 850 mhash/s for about 400 GBP.

Power figures ?

7970 is going to be 200 watts I believe, and the 5870 is 188 (both at stock clocks). This is where the 7970 suddenly shines. Even if the 7970 is 250 watts, that's still a jump in efficiency.

You think it will be 200 watts w/ a 20% overclock?  I wish the OP had a Kill-A-Watt.

Hey OP, is there a Kill-A-Watt you could purchase locally? If you are in the States, Home Depot and Lowe's carry them. If you can find one locally, I am sure we could get together the 3 or 4 BTC to get some accurate power readings.
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
In the UK currently :

5870 costs 170 GBP and gets 440 mhash/s so about 2.6 mhash/GBP

7970 will cost roughly 430 GBP and get 666 mhash/s so about 1.5 mhash/GBP

Thus, the 5870 is still much better and you can also get a 5970 that gets 850 mhash/s for about 400 GBP.

Power figures ?

7970 is going to be 200 watts I believe, and the 5870 is 188 (both at stock clocks). This is where the 7970 suddenly shines. Even if the 7970 is 250 watts, that's still a jump in efficiency.
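
A quick sketch of the efficiency comparison, using only figures quoted in this thread. The wattages are the board-power guesses from the post above, not wall measurements, and the 666 mhash/s figure appears to come from the overclocked card, so treat the result as a rough bound rather than a benchmark. The helper name `mhash_per_watt` is made up for illustration.

```python
# Figures quoted in this thread; the wattages are board-power guesses,
# not wall measurements.
def mhash_per_watt(mhash_per_s, watts):
    """Hash rate delivered per watt of assumed power draw."""
    return mhash_per_s / watts


cards = {
    "5870 @ 188 W": (440, 188),
    "7970 @ 200 W": (666, 200),
    "7970 @ 250 W": (666, 250),
}

efficiency = {name: mhash_per_watt(*figures) for name, figures in cards.items()}
for name, eff in efficiency.items():
    print(f"{name}: {eff:.2f} mhash/s per watt")
```

With these inputs the 5870 comes out around 2.34 mhash/s per watt, and the 7970 around 3.33 at 200 W or 2.66 at 250 W, which is the jump in efficiency being described.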
hero member
Activity: 914
Merit: 500
Very interesting results! The only missing piece is the power draw from the wall.

My only hesitations at this point are:

1) Price Point/Performance is still super high when compared to used 58xx series cards

2) Lack of optimization in Miners for any new features in GCN/SDK 2.6. Current Miners are heavily optimized for VLIW4/5, so obviously there's going to need to be some re-working for full GCN support.

The only way I can see this card being a viable miner is that it needs to outperform 5970/6990 in performance per watt and $/mhash, otherwise it's just a good excuse to see more 58xx's hitting eBay since gamers will be upgrading...

Thanks for the initial benchmarks though, OP!
newbie
Activity: 10
Merit: 0
In the UK currently :

5870 costs 170 GBP and gets 440 mhash/s so about 2.6 mhash/GBP

7970 will cost roughly 430 GBP and get 666 mhash/s so about 1.5 mhash/GBP

Thus, the 5870 is still much better and you can also get a 5970 that gets 850 mhash/s for about 400 GBP.
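
For anyone who wants to redo the price comparison with their own local prices, here is a minimal sketch using the UK figures quoted above. The prices and hash rates are the poster's numbers, not independently checked, and the helper name `mhash_per_gbp` is hypothetical.

```python
# Price and hash-rate figures as quoted in the post; swap in local values.
def mhash_per_gbp(mhash_per_s, price_gbp):
    """Hash rate bought per pound of purchase price."""
    return mhash_per_s / price_gbp


quotes = {
    "5870": (440, 170),
    "7970": (666, 430),
    "5970": (850, 400),
}

value = {card: mhash_per_gbp(*figures) for card, figures in quotes.items()}
for card, v in sorted(value.items(), key=lambda item: item[1], reverse=True):
    print(f"{card}: {v:.2f} mhash/s per GBP")
```

This gives roughly 2.59 for the 5870, 2.13 for the 5970, and 1.55 for the 7970. It only covers purchase price, so power cost has to be weighed separately.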

Power figures ?
hero member
Activity: 504
Merit: 500
With those two changes to the default configuration of cgminer, hashes start to get accepted, but the 290MH/s hashing performance with the default settings (-g 2 -v 2 -w 128) for this kernel was slower than the 310MH/s from the trusty OC'd HD5850 that this new card replaced. So I played around with the --gpu-threads, --vectors and --worksize settings, and here's a small table with the results:
--gpu-threads 1 --vectors 2 --worksize  32 : 141MH/s
--gpu-threads 1 --vectors 2 --worksize  64 : 285MH/s
--gpu-threads 1 --vectors 2 --worksize 128 : 283MH/s
--gpu-threads 1 --vectors 2 --worksize 256 : 284MH/s

--gpu-threads 1 --vectors 4 --worksize  32 :  66MH/s
--gpu-threads 1 --vectors 4 --worksize  64 : 133MH/s
--gpu-threads 1 --vectors 4 --worksize 128 : 133MH/s
--gpu-threads 1 --vectors 4 --worksize 256 : 133MH/s
Not that it might matter much at this point, but with vectors 4 (and, I believe, vectors 2 to some extent) you need to adjust the memory clock to get the best results. I am not sure it would even help on GCN, but if you get time, I'd check it out. Sadly, I have no clue where that thread is at this time. :/

**** UPDATE ****

Someone suggested that I give a recent version of the DiabloMiner a try since it should have decent support for GCN, so I did.

~650MH/s with the default diablominer settings and the card OC'd @ 1125/975MHz:



~530MH/s at standard clocks:



pretty freakin awesome, if you ask me. Now if they can just sell the things for <$400 I'd be happy. Do you have any TDP numbers for this card?
newbie
Activity: 28
Merit: 0
6990 is dual GPU, so it has a total of 3072 shaders and gets about 800 mhash/s using two cores.
7970 is single GPU, so it has a total of 2048 shaders and gets about 666 mhash/s using one core.
i stand corrected, thought it was also dual.
i feel the need for read
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
6990 is dual GPU, so it has a total of 3072 shaders and gets about 800 mhash/s using two cores.
7970 is single GPU, so it has a total of 2048 shaders and gets about 666 mhash/s using one core.

Get some sleep dude and stay off SR !

This.
hero member
Activity: 518
Merit: 500
now this card is out.. how about some ++real++ benchmarks?
this thing has 500 more sp than the 6990 - how can it possibly be slower?


LOL you don't know a thing about mining, do you ?

6990 has 3072 or 2*1536 shaders

5970 has 2*1600 or 3200 shaders

7970 has 2048 shaders

5870 has 1600 shaders

7990 supposedly has 4096 shaders ?


thanks for repeating what i said.
7970 has 2048 SP
6990 has 1536 SP
thats 500 more. per core, whatever, its more, MORE. so why is the card putting out so much less? maybe if you put your effort into explaining a decent answer (such as 'because the miners need re-optimisation') instead of being a flabberfinger - we could have all benefited.

6990 is dual GPU, so it has a total of 3072 shaders and gets about 800 mhash/s using two cores.
7970 is single GPU, so it has a total of 2048 shaders and gets about 666 mhash/s using one core.
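
Put as arithmetic, using only the numbers in this thread, the per-GPU and per-shader rates look like this. It is a rough sketch: clock speeds differ between the cards and the 666 mhash/s figure appears to be from the overclocked 7970, so it is not a like-for-like architecture comparison. The helper name `mhash_per_shader` is made up for illustration.

```python
# Shader counts and hash rates as quoted in this thread.
def mhash_per_shader(mhash_per_s, shaders):
    """Average hash rate contributed by each shader."""
    return mhash_per_s / shaders


per_gpu_6990 = 800 / 2   # the 6990's total is split across two GPUs
per_gpu_7970 = 666 / 1   # the 7970 is a single GPU

per_shader_6990 = mhash_per_shader(800, 2 * 1536)
per_shader_7970 = mhash_per_shader(666, 2048)

print(f"6990: {per_gpu_6990:.0f} mhash/s per GPU, {per_shader_6990:.3f} per shader")
print(f"7970: {per_gpu_7970:.0f} mhash/s per GPU, {per_shader_7970:.3f} per shader")
```

So a single 7970 GPU is ahead of a single 6990 GPU (666 vs 400 mhash/s) and also ahead per shader (about 0.325 vs 0.260); the 6990 only wins on the card total because it has two GPUs.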

Get some sleep dude and stay off SR !
newbie
Activity: 28
Merit: 0
now this card is out.. how about some ++real++ benchmarks?
this thing has 500 more sp than the 6990 - how can it possibly be slower?


LOL you don't know a thing about mining, do you ?

6990 has 3072 or 2*1536 shaders

5970 has 2*1600 or 3200 shaders

7970 has 2048 shaders

5870 has 1600 shaders

7990 supposedly has 4096 shaders ?


thanks for repeating what i said.
7970 has 2048 SP
6990 has 1536 SP
thats 500 more. per core, whatever, its more, MORE. so why is the card putting out so much less? maybe if you put your effort into explaining a decent answer (such as 'because the miners need re-optimisation') instead of being a flabberfinger - we could have all benefited.