Topic: Claymore's ZCash/BTG AMD GPU Miner v12.6 (Windows/Linux) - page 480. (Read 3839163 times)

member
Activity: 91
Merit: 10
 
It has to be a 390. My slightly overclocked 390 is getting ~290 sol/s with v8.0 at -i 4.

or 2*380 :D
full member
Activity: 189
Merit: 100
1 x sapphire r9 380 stock clock, stock bios, 63 degrees
~ 260 H/s

Any advice on how to raise that number?
Also, I have no way to measure my wattage. Can anyone tell me the approximate consumption?

380 or 390?

It has to be a 390. My slightly overclocked 390 is getting ~290 sol/s with v8.0 at -i 4.
legendary
Activity: 3808
Merit: 1723
Too many rejects on v8.0 with -i 4 and higher intensities.

r9 280x
win 8.1
virtual mem, 16gb
15.12
environment variables set with setx

http://c2n.me/3EOYCOz



That means you overclocked too much. Check the log for invalid solutions or buffer overflow messages. I had this too.
sr. member
Activity: 430
Merit: 254
I don't know how precisely you can tune memory access to GPU memory, but people who compare cards and complain here that the RX 4xx should be as fast as a 390X should first learn that every architecture is different. The driver accesses GPU memory as best it can; if Zcash needs many small accesses and a 256-bit bus is not wide enough, it is logical that a 384-bit or 512-bit bus will do better.
Fiji should be faster, but maybe the code isn't suited to HBM.

I feel the 280X is the card Claymore loves the most; it will get 200 sol/s per card in the update :)

I like the 390/390X the most: I'm going to reach 300 H/s on stock clocks.
The RX 480 will show about 190-200, I think.
280X: about 200 or a bit more.
And what about the Nano/Fury? They have 512 GB/s of bandwidth...

Yes, but the memory bus is too wide: 4096 bits is too much for most PoW algos, so it cannot be used fully.
The Nano will show about 250 H/s; maybe I will reach a bit more.
member
Activity: 91
Merit: 10
1 x sapphire r9 380 stock clock, stock bios, 63 degrees
~ 260 H/s

Any advice on how to raise that number?
Also, I have no way to measure my wattage. Can anyone tell me the approximate consumption?

380 or 390?
newbie
Activity: 18
Merit: 0
1 x sapphire r9 390 stock clock, stock bios, 63 degrees
~ 260 H/s

Any advice on how to raise that number?
Also, I have no way to measure my wattage. Can anyone tell me the approximate consumption?
sr. member
Activity: 290
Merit: 250
Devfee mining in Claymore v8:
on my 7-GPU rig it runs for 1 minute every 15 minutes. Is that good or bad?
newbie
Activity: 11
Merit: 0
Too many rejects on v8.0 with -i 4 and higher intensities.

r9 280x
win 8.1
virtual mem, 16gb
15.12
environment variables set with setx

http://c2n.me/3EOYCOz

legendary
Activity: 3416
Merit: 1059
I don't know how precisely you can tune memory access to GPU memory, but people who compare cards and complain here that the RX 4xx should be as fast as a 390X should first learn that every architecture is different. The driver accesses GPU memory as best it can; if Zcash needs many small accesses and a 256-bit bus is not wide enough, it is logical that a 384-bit or 512-bit bus will do better, even though we know that on 2xx and 39x cards the GPU and memory clocks are more "aligned" and in sync than on RX cards, which usually run at 11xx/2000.

Complaining here about the RX 4xx is pointless if you know nothing about internal GPU architecture and even less about the Zcash proof-of-work algo and how it is computed.

A lot of butthurt people here who bought RX 4xx cards, and some even sold their old cards :D

member
Activity: 71
Merit: 10
I don't know how precisely you can tune memory access to GPU memory, but people who compare cards and complain here that the RX 4xx should be as fast as a 390X should first learn that every architecture is different. The driver accesses GPU memory as best it can; if Zcash needs many small accesses and a 256-bit bus is not wide enough, it is logical that a 384-bit or 512-bit bus will do better.
Fiji should be faster, but maybe the code isn't suited to HBM.
sr. member
Activity: 449
Merit: 251
Tonga is really not optimized

Tonga itself is the problem...
It was close to being a scam from AMD...

They said Tonga was going to replace the aging Tahiti... but they were just kidding.
Tonga is more efficient for gaming: it has better memory efficiency, and perhaps a more efficient GPU? My memory is cloudy. Anyway, it isn't as good for mining; the improvements are for gaming.
jr. member
Activity: 36
Merit: 5
Hi guys, I have a question. I have 4 RX 480s. For some reason, 3 of them are hashing at around 180mh, but the other one is only doing 40mh. Is there anything you can recommend to fix this? I have an ASUS H170 Pro Gaming motherboard, a 1200 W PSU, 8 GB RAM, a 120 GB SSD, and Windows 10.

When this happens to me, I remove the driver with DDU and then reinstall it. Problem solved. Good luck.
sr. member
Activity: 430
Merit: 254
On my 290, memory controller load is rarely over 60%. The big difference is that, besides the 290's memory bus being 2x wider, its memory runs at 1250 MHz vs 2000 MHz on your 480. This means that you can try every possible trick, but there is no way to use timings as tight as those on a 290 at 1250 MHz or a 390 at 1500 MHz. OK, suppose you reduce the memory clock on the 480 to 1500 or 1250 MHz to get the same timings: you still do not get the speed that is possible with a 2x wider bus.
This is not a question of memory throughput.
Cutting the memory clock almost in half causes only about a 20% hashrate drop.
https://ip.bitcointalk.org/?u=http%3A%2F%2Fi.piccy.info%2Fi9%2Fce2e18589c91c75caab3b10a46a2c9f2%2F1480097808%2F34039%2F1051816%2Fmemdrop.png&t=571&c=zWfZWhuCJrGaPQ
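The bus-width versus clock comparison quoted above can be put in numbers. This is only a minimal sketch: it assumes the clocks in the post are GDDR5 command clocks as tools such as GPU-Z report them (GDDR5 moves 4 bits per pin per command clock), a 512-bit bus on the 290/390 and a 256-bit bus on the 480. The function name is made up for illustration.

```python
def gddr5_peak_bandwidth_gb_s(bus_width_bits, mem_clock_mhz):
    """Theoretical peak GDDR5 bandwidth in GB/s.

    Assumption: mem_clock_mhz is the command clock (e.g. 1250 MHz on a 290),
    and GDDR5 transfers 4 bits per pin per command clock.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = mem_clock_mhz * 1e6 * 4
    return bytes_per_transfer * transfers_per_second / 1e9


# Configurations mentioned in the post: (label, bus width in bits, clock in MHz)
configs = [
    ("R9 290 @ 1250 MHz", 512, 1250),
    ("R9 390 @ 1500 MHz", 512, 1500),
    ("RX 480 @ 2000 MHz", 256, 2000),
    ("RX 480 @ 1250 MHz", 256, 1250),
]

for label, bus_bits, clock_mhz in configs:
    print(f"{label}: {gddr5_peak_bandwidth_gb_s(bus_bits, clock_mhz):.0f} GB/s")
```

Under these assumptions a 290 at 1250 MHz peaks at 320 GB/s and a 480 at 2000 MHz at 256 GB/s. These are peak figures only; they say nothing about timings or about how much of that bandwidth a miner kernel actually uses.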


While my initial analysis was focused on the external GDDR5 bandwidth limits, current ZEC GPU mining software seems to be limited by the memory controller/core bus.  On AMD GCN, each memory controller can transfer 64 bytes (1 cache line) per clock.  In SA5, the ht_store function, in addition to adding to row counters, does 4 separate memory writes for most rounds (3 writes for the last couple rounds).  All of these writes are either 4 or 8 bytes, so much less than 64 bytes per clock are being transferred to the L2 cache.  A single thread (1 SIMD element) can transfer at most 16 bytes (dwordX4) in a single instruction.  This means a modified ht_store thread could update a row slot in 2 clocks.  If the update operation is split between 2 (or 4 or more) threads, one slot can be updated in one clock, since 2 threads can simultaneously write to different parts of the same 64-byte block.  This would mean each row update operation could be done in 2 GPU core clock cycles; one for the counter update, and one for updating the row slot.
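The write arithmetic in that paragraph can be sketched as follows. The 64-byte cache line and the 16-byte per-thread limit come from the post; the 32-byte slot size is an assumption, inferred from "2 clocks" with 16-byte writes, and the function names are hypothetical.

```python
import math

CACHE_LINE_BYTES = 64            # per memory controller per clock (from the post)
MAX_BYTES_PER_THREAD_WRITE = 16  # one dwordX4 store per thread (from the post)
SLOT_BYTES = 32                  # ASSUMPTION: inferred from "2 clocks" at 16 bytes each


def line_utilization(write_bytes):
    """Fraction of a 64-byte cache line carried by a single write."""
    return write_bytes / CACHE_LINE_BYTES


def clocks_per_slot_update(threads):
    """Clocks needed to write one slot when `threads` threads write in parallel."""
    bytes_per_clock = min(threads * MAX_BYTES_PER_THREAD_WRITE, CACHE_LINE_BYTES)
    return math.ceil(SLOT_BYTES / bytes_per_clock)


for size in (4, 8, 16):
    print(f"{size}-byte write uses {line_utilization(size):.1%} of a cache line")

for threads in (1, 2, 4):
    slot_clocks = clocks_per_slot_update(threads)
    # One extra clock for the row counter update, as described in the post.
    print(f"{threads} thread(s): {slot_clocks} clock(s) per slot, "
          f"{slot_clocks + 1} per row update including the counter")
```

With one thread the slot takes 2 clocks; split across 2 or more threads it takes 1, which gives the 2 clocks per row update (counter plus slot) that the post arrives at.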

Even with those changes, my calculations indicate that a ZEC miner would be limited by the core clock, according to a core:memory ratio of approximately 5:6.  In other words, when an RX 470 has a memory clock of 1750 MHz, the core would need to be clocked at 1750 * 5/6 = 1458 MHz in order to achieve maximum performance.
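The same calculation as a small helper, in case you want to try other memory clocks. This is a sketch of the arithmetic only; the 5:6 ratio is the poster's estimate, and the function name and the 2000 MHz example are added for illustration.

```python
from fractions import Fraction


def required_core_clock_mhz(mem_clock_mhz, core_to_mem=Fraction(5, 6)):
    """Core clock needed to keep up with the memory clock at a given core:memory ratio."""
    return float(mem_clock_mhz * core_to_mem)


for mem_clock in (1750, 2000):
    core = required_core_clock_mhz(mem_clock)
    print(f"memory {mem_clock} MHz -> core {core:.0f} MHz")
```

Passing a different `core_to_mem` ratio lets you redo the estimate if the kernel's write pattern changes.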

If the row counters can be kept in LDS or GDS, the core:memory ratio required would be 1:2, thereby allowing full use of the external memory bandwidth.  There is 64KB of LDS per CU, and the AMD GCN architecture docs indicate the LDS can be globally addressed; i.e. one CU can access the LDS of another CU.  However the syntax of OpenCL does not permit the local memory of one work-group to be accessed by a different work-group.  There is only 64KB of GDS shared by all CUs, and even if the row counters could be stored in such a small amount of memory, OpenCL does not have any concept of GDS.

This likely means writing a top performance ZEC miner for AMD is the domain of someone who codes in GCN assembler.  Canis lupus?


Core speed has more of an effect on 480s but they are still limited by memory bandwidth.
member
Activity: 91
Merit: 10
On my 290, memory controller load is rarely over 60%. The big difference is that, besides the 290's memory bus being 2x wider, its memory runs at 1250 MHz vs 2000 MHz on your 480. This means that you can try every possible trick, but there is no way to use timings as tight as those on a 290 at 1250 MHz or a 390 at 1500 MHz. OK, suppose you reduce the memory clock on the 480 to 1500 or 1250 MHz to get the same timings: you still do not get the speed that is possible with a 2x wider bus.
This is not a question of memory throughput.
Cutting the memory clock almost in half causes only about a 20% hashrate drop.


Update: Tahiti has 768 kB of L2 cache, Hawaii has 1 MB, and Ellesmere has 2 MB.
newbie
Activity: 51
Merit: 0
I don't know how precisely you can tune memory access to GPU memory, but people who compare cards and complain here that the RX 4xx should be as fast as a 390X should first learn that every architecture is different. The driver accesses GPU memory as best it can; if Zcash needs many small accesses and a 256-bit bus is not wide enough, it is logical that a 384-bit or 512-bit bus will do better, even though we know that on 2xx and 39x cards the GPU and memory clocks are more "aligned" and in sync than on RX cards, which usually run at 11xx/2000.

Complaining here about the RX 4xx is pointless if you know nothing about internal GPU architecture and even less about the Zcash proof-of-work algo and how it is computed.

Even so, the 480's 256-bit memory bus is still only used at 50% of its capacity! I think the manufacturer made this choice deliberately: a 256-bit bus is probably enough for the new Polaris chip, with its new memory controller giving slightly lower performance than the 390.
Sorry for my English


It would be good to see the memory controller load on 390 models; I think that would shed a little light on this issue.

On my 290, memory controller load is rarely over 60%. The big difference is that, besides the 290's memory bus being 2x wider, its memory runs at 1250 MHz vs 2000 MHz on your 480. This means that you can try every possible trick, but there is no way to use timings as tight as those on a 290 at 1250 MHz or a 390 at 1500 MHz. OK, suppose you reduce the memory clock on the 480 to 1500 or 1250 MHz to get the same timings: you still do not get the speed that is possible with a 2x wider bus.
member
Activity: 91
Merit: 10
Quote
Anyone have any ideas?
Press "s" on that miner when it runs.
member
Activity: 96
Merit: 10
I have something I cannot figure out and was wondering if any of you gents have an idea.

Two computers.
Each has an identical MSI RX480 8G in it.
Both are running Claymore v8.
Both are using an identical start script - except the worker name.

One miner periodically reports the GPU temp and fan %, which I really like.
The other doesn't ???

Here is a screen capture showing the difference:
https://snag.gy/YAPBXj.jpg


Anyone have any ideas?
cheers and thanks
newbie
Activity: 51
Merit: 0

r7 370 is actually a "pro" chip meaning it has 1024 sps like the 7850 and r7 265, but for some reason seems to perform more like a 1280 sp "XT" chip. They must have made some minor performance tweaks. The r9 270s are the same as 7870s and 270Xs with 1280 sps but most were voltage locked and so just couldn't clock as high without BIOS mods.

chip wise/core count wise: 7850 = r7 265 = r7 370  <  7870 = r9 270 = r9 270X = 370X  <  7870XT


Regarding clocks: in the past I had 270X Sapphire Toxic and MSI Hawk models, and now I have an MSI Gaming 370 4GB. The maximum 100% stable clock I could achieve with all of them is 1200 MHz. The only difference is that on the 270X cards the voltage was unlocked and I was able to undervolt them to 1150 mV, but on the 370 it is fixed at 1162 mV. The nice thing about the 370 is that while it is doing 128 sol/s the desktop responds well enough that I can work at the same time, and it is quiet and not too hot. So I do not know about "pro" or not, but I like the card. With v8 I had to reduce the GPU clock from 1200 to 1175 MHz because it was no longer 100% stable.
newbie
Activity: 10
Merit: 0
I don't know how precisely you can tune memory access to GPU memory, but people who compare cards and complain here that the RX 4xx should be as fast as a 390X should first learn that every architecture is different. The driver accesses GPU memory as best it can; if Zcash needs many small accesses and a 256-bit bus is not wide enough, it is logical that a 384-bit or 512-bit bus will do better, even though we know that on 2xx and 39x cards the GPU and memory clocks are more "aligned" and in sync than on RX cards, which usually run at 11xx/2000.

Complaining here about the RX 4xx is pointless if you know nothing about internal GPU architecture and even less about the Zcash proof-of-work algo and how it is computed.

Even so, the 480's 256-bit memory bus is still only used at 50% of its capacity! I think the manufacturer made this choice deliberately: a 256-bit bus is probably enough for the new Polaris chip, with its new memory controller giving slightly lower performance than the 390.
Sorry for my English


It would be good to see the memory controller load on 390 models; I think that would shed a little light on this issue.

R9 290 MC usage:

https://i.imgur.com/UX0NIVb.png



Did you also know that if you want to check whether an algo is memory limited, you can open GPU-Z and look at the load on the MCU (memory controller unit)?

I think this is wrong.  Although I primarily mine using Linux, I have a Windoze box that I use for testing cards.  GPU-z appears to show only external bus bandwidth use (to the GDDR), and not the utilization of the bandwidth between the controller and core.  In practical terms, a miner kernel may be using 200GB/s of memory bandwidth, but a significant percentage of it can be from the L2 cache.  The collision counter tables in SA5 would be an example of this.
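If that reading is right, the load GPU-Z reports would reflect only the share of traffic that misses the L2 cache. A minimal sketch of that argument: the 200 GB/s figure is the example from the post, while the L2 hit fractions and the 320 GB/s peak (a 512-bit GDDR5 bus at 1250 MHz) are hypothetical values chosen for illustration, as are the function names.

```python
def external_bandwidth_gb_s(total_gb_s, l2_hit_fraction):
    """Traffic that actually reaches GDDR5 once L2 hits are taken out."""
    if not 0.0 <= l2_hit_fraction <= 1.0:
        raise ValueError("l2_hit_fraction must be between 0 and 1")
    return total_gb_s * (1.0 - l2_hit_fraction)


def controller_load_percent(external_gb_s, peak_gb_s):
    """External traffic as a percentage of the peak external bandwidth."""
    return 100.0 * external_gb_s / peak_gb_s


TOTAL_GB_S = 200.0  # bandwidth the kernel uses, example figure from the post
PEAK_GB_S = 320.0   # ASSUMPTION: 512-bit GDDR5 at 1250 MHz

for hit_fraction in (0.0, 0.25, 0.5):  # hypothetical L2 hit fractions
    external = external_bandwidth_gb_s(TOTAL_GB_S, hit_fraction)
    load = controller_load_percent(external, PEAK_GB_S)
    print(f"L2 hit fraction {hit_fraction:.2f}: "
          f"{external:.0f} GB/s external, {load:.1f}% of peak")
```

The higher the share served from L2, the lower the externally visible load for the same kernel throughput, which is why a low reading alone would not prove the kernel is not memory limited.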


Do you have a source for this hypothesis? In all memory restricted algos that correlates to MCU usage. Pretty sure it pertains to any sort of memory overload, bandwidth or bus width...

My knowledge of the AMD GCN architecture (and computer architecture in general), and my experience writing OpenCL.


The controller load is slightly higher than on the 480, but GPU "BPM temperature 2" is about the same as "BPM temperature 1". Most likely it is the speed of the memory controller that holds the 480 back, on top of the narrow 256-bit bus.
hero member
Activity: 494
Merit: 500
Anyone running into an out-of-memory error? Win 10, 16 GB RAM, RX 480.