Claymore's ZCash/BTG AMD GPU Miner v12.6 (Windows/Linux) - page 480.

adaseb

legendary

Activity: 3808

Merit: 1723

Quote from: NikitaS on November 25, 2016, 05:20:46 PM

Too many rejects on v8.0 with -i 4 and higher intensity.

r9 280x
win 8.1
virtual mem, 16gb
15.12
environment variables setx to on

http://c2n.me/3EOYCOz

That means you overclocked too much. Check the log for invalid solutions or buffer overflows. I had this too.

bardacuda

sr. member

Activity: 430

Merit: 254

Quote from: PontiacGTX on November 25, 2016, 05:06:26 PM

Quote from: reb0rn21 on November 25, 2016, 02:50:44 PM

I don't know how exactly you can adjust memory access to GPU memory, but people who compare and cry here that the RX 4xx should be as fast as the 390X should first learn that every architecture is different. The driver is accessing GPU memory as best it can. If Zcash needs many small accesses and a 256-bit bus is not wide enough, it's logical that a 384- or 512-bit bus will be better.

Fiji should be faster, but maybe the code isn't suited for HBM.

Quote from: Claymore on November 23, 2016, 11:48:26 AM

Quote from: ol92 on November 23, 2016, 11:45:21 AM

Quote from: Claymore on November 23, 2016, 09:51:58 AM

Quote from: forzendiablo on November 23, 2016, 09:34:19 AM

I feel the 280X, the card Claymore loves the most, will have 200 sols per card in the update :)

I like 390-390X the most - I'm going to reach 300H/s on stock clocks.
RX480 will show about 190-200 I think.
280X - about 200 or a bit more.

And what about Nano/Fury? They have 512 GB/s of bandwidth...

Yes, but the memory bus is too wide: 4096 bits is too much for most PoW algos, so it cannot be fully used.
Nano will show about 250H/s, maybe I will reach a bit more.

Jinx99

member

Activity: 91

Merit: 10

Quote from: orbital_station on November 25, 2016, 05:41:40 PM

1 x sapphire r9 380 stock clock, stock bios, 63 degrees
~ 260 H/s

any advice how to raise that number?
Also I have no way to measure my wattage, can anyone tell me approx. consumption?

380 or 390 ?

orbital_station

newbie

Activity: 18

Merit: 0

1 x sapphire r9 390 stock clock, stock bios, 63 degrees
~ 260 H/s

any advice how to raise that number?
Also I have no way to measure my wattage, can anyone tell me approx. consumption?

naeme18720

sr. member

Activity: 290

Merit: 250

Devfee mining in Claymore v8:
on my 7-GPU rig it runs for about 1 minute every 15 minutes. Is that good or bad for me?

NikitaS

newbie

Activity: 11

Merit: 0

Too many rejects on v8.0 with -i 4 and higher intensity.

r9 280x
win 8.1
virtual mem, 16gb
15.12
environment variables setx to on

http://c2n.me/3EOYCOz

arielbit

legendary

Activity: 3444

Merit: 1061

Quote from: reb0rn21 on November 25, 2016, 02:50:44 PM

I don't know how exactly you can adjust memory access to GPU memory, but people who compare and cry here that the RX 4xx should be as fast as the 390X should first learn that every architecture is different. The driver is accessing GPU memory as best it can. If Zcash needs many small accesses and a 256-bit bus is not wide enough, it's logical that a 384- or 512-bit bus will be better, even when we know that on 2xx and 39x cards the GPU and memory clocks are more "aligned" and in sync than on RX cards, which usually run at 11xx/2000.

CRYING here about the RX 4xx is pointless if you know NOTHING about internal GPU architecture and even less about the Zcash proof-of-work algo and how it's computed.

A lot of butthurt people here who bought RX 4xx cards, and some of them sold their old cards :D

PontiacGTX

member

Activity: 71

Merit: 10

Quote from: reb0rn21 on November 25, 2016, 02:50:44 PM

I don't know how exactly you can adjust memory access to GPU memory, but people who compare and cry here that the RX 4xx should be as fast as the 390X should first learn that every architecture is different. The driver is accessing GPU memory as best it can. If Zcash needs many small accesses and a 256-bit bus is not wide enough, it's logical that a 384- or 512-bit bus will be better.

Fiji should be faster, but maybe the code isn't suited for HBM.

xeridea

sr. member

Activity: 449

Merit: 251

Quote from: alesx.onfire on November 25, 2016, 07:19:42 AM

Quote from: mitache365 on November 25, 2016, 07:13:10 AM

Tonga is really not optimized

Tonga is the problem, by itself...
It was close to being a scam from AMD...

They said Tonga was going to replace the aged Tahiti.... but they were just kiddin'

Tonga is more efficient for gaming, it has better memory efficiency, and perhaps more efficient GPU? My memory is cloudy. Anyway, for mining it isn't as good, improvements are for gaming.

pacolito

jr. member

Activity: 36

Merit: 5

Quote from: lithiumviper12 on November 25, 2016, 03:13:01 PM

Hi guys, I have a question. I have 4 RX 480s. For some reason, 3 of them are hashing around 180mh, but the other one is only doing 40mh. Is there anything you can recommend to fix my issue? I have an Asus H170 Pro Gaming motherboard, a 1200 W PSU, 8 GB RAM, a 120 GB SSD, and Windows 10.

When this happens to me I remove the driver with DDU, then reinstall the driver. Problem solved. Good luck.

bardacuda

sr. member

Activity: 430

Merit: 254

Quote from: Jinx99 on November 25, 2016, 03:57:03 PM

Quote from: KrokoTill on November 25, 2016, 03:50:45 PM

On my 290, memory controller load is rarely over 60%. The big difference is that, besides the 290's memory bus being 2x wider, its memory runs at 1250 MHz vs 2000 MHz on your 480. This means you can try every possible trick, but there is no way to use timings as tight as on a 290 at 1250 MHz or a 390 at 1500 MHz. OK, suppose you reduce the mem clock on the 480 to 1500 or 1250 MHz to get the same timings; you still do not get the speed that is possible with a 2x wider bus.

This is not a question of memory throughput.
Reducing the memclock by almost half results in only a 20% hashrate drop.
https://ip.bitcointalk.org/?u=http%3A%2F%2Fi.piccy.info%2Fi9%2Fce2e18589c91c75caab3b10a46a2c9f2%2F1480097808%2F34039%2F1051816%2Fmemdrop.png&t=571&c=zWfZWhuCJrGaPQ

Quote from: nerdralph on November 22, 2016, 03:15:08 PM

While my initial analysis was focused on the external GDDR5 bandwidth limits, current ZEC GPU mining software seems to be limited by the memory controller/core bus. On AMD GCN, each memory controller can xfer 64 bytes (1 cache line) per clock. In SA5, the ht_store function, in addition to adding to row counters, does 4 separate memory writes for most rounds (3 writes for the last couple rounds). All of these writes are either 4 or 8 bytes, so much less than 64 bytes per clock are being transferred to the L2 cache. A single thread (1 SIMD element) can transfer at most 16 bytes (dwordX4) in a single instruction. This means a modified ht_store thread could update a row slot in 2 clocks. If the update operation is split between 2 (or 4 or more) threads, one slot can be updated in one clock, since 2 threads can simultaneously write to different parts of the same 64-byte block. This would mean each row update operation could be done in 2 GPU core clock cycles; one for the counter update, and one for updating the row slot.

Even with those changes, my calculations indicate that a ZEC miner would be limited by the core clock, according to a ratio of approximately 5:6. In other words, when an RX 470 has a memory clock of 1750 MHz, the core would need to be clocked at 1750 * 5/6 = 1458 MHz in order to achieve maximum performance.

If the row counters can be kept in LDS or GDS, the core:memory ratio required would be 1:2, thereby allowing full use of the external memory bandwidth. There is 64KB of LDS per CU, and the AMD GCN architecture docs indicate the LDS can be globally addressed; i.e. one CU can access the LDS of another CU. However the syntax of OpenCL does not permit the local memory of one work-group to be accessed by a different work-group. There is only 64KB of GDS shared by all CUs, and even if the row counters could be stored in such a small amount of memory, OpenCL does not have any concept of GDS.

This likely means writing a top performance ZEC miner for AMD is the domain of someone who codes in GCN assembler. Canis lupus?

Core speed has more of an effect on 480s but they are still limited by memory bandwidth.
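The clock arithmetic in nerdralph's quoted post can be checked with a short Python sketch. The 5:6 and 1:2 core:memory ratios, the 64-byte cache line, the 4/8/16-byte write sizes, and the 1750 MHz memory clock all come from that post; the function and constant names are made up for illustration, and this is only a back-of-the-envelope model, not a measurement.

```python
from fractions import Fraction

# Ratios quoted in the post (core clock : memory clock).
RATIO_GLOBAL_COUNTERS = Fraction(5, 6)  # row counters kept in global memory
RATIO_LDS_COUNTERS = Fraction(1, 2)     # row counters kept in LDS/GDS (hypothetical case)

CACHE_LINE_BYTES = 64  # one cache line per clock per memory controller, per the post


def required_core_clock_mhz(memory_clock_mhz, core_to_memory_ratio):
    """Core clock needed so the core does not hold back the given memory clock."""
    return float(memory_clock_mhz * core_to_memory_ratio)


def cache_line_utilization(write_bytes, line_bytes=CACHE_LINE_BYTES):
    """Fraction of one cache-line transfer actually filled by a single write."""
    return write_bytes / line_bytes


if __name__ == "__main__":
    mem_clock = 1750  # MHz, the RX 470 example from the post
    print(round(required_core_clock_mhz(mem_clock, RATIO_GLOBAL_COUNTERS)))  # 1458
    print(round(required_core_clock_mhz(mem_clock, RATIO_LDS_COUNTERS)))     # 875
    for size in (4, 8, 16):
        print(size, "bytes:", f"{cache_line_utilization(size):.2%}", "of a cache line")
```

Under these numbers, a 4-byte write fills 6.25% of a cache line and a 16-byte (dwordX4) write fills 25%, which is the gap the post suggests closing by splitting a slot update across threads.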

Jinx99

member

Activity: 91

Merit: 10

Quote from: KrokoTill on November 25, 2016, 03:50:45 PM

On my 290, memory controller load is rarely over 60%. The big difference is that, besides the 290's memory bus being 2x wider, its memory runs at 1250 MHz vs 2000 MHz on your 480. This means you can try every possible trick, but there is no way to use timings as tight as on a 290 at 1250 MHz or a 390 at 1500 MHz. OK, suppose you reduce the mem clock on the 480 to 1500 or 1250 MHz to get the same timings; you still do not get the speed that is possible with a 2x wider bus.

This is not a question of memory throughput.
Reducing the memclock by almost half results in only a 20% hashrate drop.

Update: Tahiti has 768 kB of L2, Hawaii has 1 MB of L2, Ellesmere has 2 MB of L2 cache.
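One rough way to put a number on the memclock observation is to fit a simple power law, hashrate ~ clock^k. That model is an assumption made here for illustration and is not something the post claims; the inputs treat "almost twice" as exactly half the clock and use the stated 20% drop. An exponent near 1 would mean the hashrate follows memory clock directly, and an exponent near 0 would mean it barely depends on it.

```python
import math


def scaling_exponent(clock_ratio, hashrate_ratio):
    """Exponent k in hashrate ~ clock**k, from one before/after measurement.

    clock_ratio    -- new memory clock divided by old memory clock
    hashrate_ratio -- new hashrate divided by old hashrate
    """
    return math.log(hashrate_ratio) / math.log(clock_ratio)


if __name__ == "__main__":
    # Memclock roughly halved, hashrate down by 20% (figures from the post, rounded).
    k = scaling_exponent(clock_ratio=0.5, hashrate_ratio=0.8)
    print(f"scaling exponent: {k:.2f}")
```

With those inputs the exponent comes out at about 0.32, which is consistent with the point that raw memory throughput is not the only limit.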

KrokoTill

newbie

Activity: 51

Merit: 0

Quote from: Rusguy on November 25, 2016, 03:00:03 PM

Quote from: reb0rn21 on November 25, 2016, 02:50:44 PM

I don't know how exactly you can adjust memory access to GPU memory, but people who compare and cry here that the RX 4xx should be as fast as the 390X should first learn that every architecture is different. The driver is accessing GPU memory as best it can. If Zcash needs many small accesses and a 256-bit bus is not wide enough, it's logical that a 384- or 512-bit bus will be better, even when we know that on 2xx and 39x cards the GPU and memory clocks are more "aligned" and in sync than on RX cards, which usually run at 11xx/2000.

CRYING here about the RX 4xx is pointless if you know NOTHING about internal GPU architecture and even less about the Zcash proof-of-work algo and how it's computed.

Even so, the 480's 256-bit memory bus is still only used at 50% of its capacity!!! And I think the manufacturer made this move knowingly: most likely a 256-bit bus is sufficient for the new Polaris chip, with its new memory controller, which provides slightly lower performance than the 390!!!
Sorry for my English

It would be good to see the controller load on 390 models; I think it would shed a little light on this issue.

On my 290, memory controller load is rarely over 60%. The big difference is that, besides the 290's memory bus being 2x wider, its memory runs at 1250 MHz vs 2000 MHz on your 480. This means you can try every possible trick, but there is no way to use timings as tight as on a 290 at 1250 MHz or a 390 at 1500 MHz. OK, suppose you reduce the mem clock on the 480 to 1500 or 1250 MHz to get the same timings; you still do not get the speed that is possible with a 2x wider bus.
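For reference, the theoretical peak bandwidths behind this comparison can be worked out in a few lines of Python. The sketch assumes GDDR5 moving 4 bits per pin per memory-clock cycle (so 1250 MHz is 5 Gbps per pin), a 512-bit bus on the 290/390, and a 256-bit bus on the 480; the function name is made up for illustration. These are peak figures only and say nothing about timings or how much of the bus a miner actually uses.

```python
def gddr5_bandwidth_gb_s(bus_width_bits, memory_clock_mhz):
    """Theoretical peak GDDR5 bandwidth in GB/s.

    GDDR5 transfers 4 bits per pin per memory-clock cycle, so the
    effective data rate is 4x the clock that tuning tools report.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * 4
    return bytes_per_transfer * transfers_per_second / 1e9


if __name__ == "__main__":
    cards = [
        ("R9 290 @ 1250 MHz, 512-bit", 512, 1250),
        ("R9 390 @ 1500 MHz, 512-bit", 512, 1500),
        ("RX 480 @ 2000 MHz, 256-bit", 256, 2000),
        ("RX 480 @ 1250 MHz, 256-bit", 256, 1250),
    ]
    for name, bus, clock in cards:
        print(f"{name}: {gddr5_bandwidth_gb_s(bus, clock):.0f} GB/s")
```

Downclocking the 480 to 1250 MHz to match the 290's timings halves its peak bandwidth relative to the 290 at the same clock, which is the "2x wider bus" point made above.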

Jinx99

member

Activity: 91

Merit: 10

Quote

Anyone have any ideas?

Press "s" on that miner when it runs.

topgeek

member

Activity: 96

Merit: 10

I have something I cannot figure out and was wondering if any of you gents have an idea.

Two computers.
Each has an identical MSI RX480 8G in it.
Both are running Claymore v8.
Both are using an identical start script - except the worker name.

One miner periodically reports the GPU temp and fan %, which I really like.
The other doesn't. Huh?

Here is a screen capture showing the difference:
https://snag.gy/YAPBXj.jpg

Anyone have any ideas?
cheers and thanks

KrokoTill

newbie

Activity: 51

Merit: 0

Quote from: bardacuda on November 25, 2016, 02:30:02 PM

r7 370 is actually a "pro" chip meaning it has 1024 sps like the 7850 and r7 265, but for some reason seems to perform more like a 1280 sp "XT" chip. They must have made some minor performance tweaks. The r9 270s are the same as 7870s and 270Xs with 1280 sps but most were voltage locked and so just couldn't clock as high without BIOS mods.

chip wise/core count wise: 7850 = r7 265 = r7 370 < 7870 = r9 270 = r9 270X = 370X < 7870XT

Regarding clocks: in the past I had 270X Sapphire Toxic and MSI Hawk models, and now I have an MSI Gaming 370 4GB model. The maximum 100% stable clock I could achieve with all of them is 1200 MHz. The only difference is that on the 270X cards the voltage was unlocked and I was able to undervolt them to 1150 mV, but on the 370 it is fixed at 1162 mV. A nice thing about the 370 is that while it is doing 128 sol/s the desktop responds well enough that I can work at the same time, and it is quiet and not too hot. So I do not know about "pro" or not, but I like the card. With v8 I had to reduce the GPU clock from 1200 to 1175 because it was no longer 100% stable.

Rusguy

newbie

Activity: 10

Merit: 0

Quote from: bardacuda on November 25, 2016, 03:25:23 PM

Quote from: Rusguy on November 25, 2016, 03:00:03 PM

Quote from: reb0rn21 on November 25, 2016, 02:50:44 PM

I don't know how exactly you can adjust memory access to GPU memory, but people who compare and cry here that the RX 4xx should be as fast as the 390X should first learn that every architecture is different. The driver is accessing GPU memory as best it can. If Zcash needs many small accesses and a 256-bit bus is not wide enough, it's logical that a 384- or 512-bit bus will be better, even when we know that on 2xx and 39x cards the GPU and memory clocks are more "aligned" and in sync than on RX cards, which usually run at 11xx/2000.

CRYING here about the RX 4xx is pointless if you know NOTHING about internal GPU architecture and even less about the Zcash proof-of-work algo and how it's computed.

Even so, the 480's 256-bit memory bus is still only used at 50% of its capacity!!! And I think the manufacturer made this move knowingly: most likely a 256-bit bus is sufficient for the new Polaris chip, with its new memory controller, which provides slightly lower performance than the 390!!!
Sorry for my English

It would be good to see the controller load on 390 models; I think it would shed a little light on this issue.

R9 290 MC usage:

https://i.imgur.com/UX0NIVb.png

Quote from: nerdralph on November 18, 2016, 10:02:52 AM

Quote from: bensam1231 on November 17, 2016, 09:31:18 PM

Quote from: nerdralph on November 17, 2016, 03:09:59 PM

Quote from: bensam1231 on November 17, 2016, 11:26:18 AM

Did you also know that if you want to check whether an algo is memory limited, you can go into GPU-Z, check out the MCU (memory controller unit), and see the load on it?

I think this is wrong. Although I primarily mine using Linux, I have a Windoze box that I use for testing cards. GPU-z appears to show only external bus bandwidth use (to the GDDR), and not the utilization of the bandwidth between the controller and core. In practical terms, a miner kernel may be using 200GB/s of memory bandwidth, but a significant percentage of it can be from the L2 cache. The collision counter tables in SA5 would be an example of this.

Do you have a source for this hypothesis? In all memory restricted algos that correlates to MCU usage. Pretty sure it pertains to any sort of memory overload, bandwidth or bus width...

My knowledge of the AMD GCN architecture (and computer architecture in general), and my experience writing OpenCL.

The controller load is slightly higher than on the 480, but the GPU "BPM temperature2" reading is about the same as "BPM temperature1". On the 480 it is probably the speed of the memory controller that gets in the way, rather than the modest bandwidth of the 256-bit bus.

tc61

hero member

Activity: 494

Merit: 500

anyone running into an out of memory error? win 10 16gb ram rx480

bardacuda

sr. member

Activity: 430

Merit: 254

Quote from: Rusguy on November 25, 2016, 03:00:03 PM

Quote from: reb0rn21 on November 25, 2016, 02:50:44 PM

I don't know how exactly you can adjust memory access to GPU memory, but people who compare and cry here that the RX 4xx should be as fast as the 390X should first learn that every architecture is different. The driver is accessing GPU memory as best it can. If Zcash needs many small accesses and a 256-bit bus is not wide enough, it's logical that a 384- or 512-bit bus will be better, even when we know that on 2xx and 39x cards the GPU and memory clocks are more "aligned" and in sync than on RX cards, which usually run at 11xx/2000.

CRYING here about the RX 4xx is pointless if you know NOTHING about internal GPU architecture and even less about the Zcash proof-of-work algo and how it's computed.

Even so, the 480's 256-bit memory bus is still only used at 50% of its capacity!!! And I think the manufacturer made this move knowingly: most likely a 256-bit bus is sufficient for the new Polaris chip, with its new memory controller, which provides slightly lower performance than the 390!!!
Sorry for my English

It would be good to see the controller load on 390 models; I think it would shed a little light on this issue.

R9 290 MC usage:

Quote from: nerdralph on November 18, 2016, 10:02:52 AM

Quote from: bensam1231 on November 17, 2016, 09:31:18 PM

Quote from: nerdralph on November 17, 2016, 03:09:59 PM

Quote from: bensam1231 on November 17, 2016, 11:26:18 AM

Did you also know that if you want to check whether an algo is memory limited, you can go into GPU-Z, check out the MCU (memory controller unit), and see the load on it?

I think this is wrong. Although I primarily mine using Linux, I have a Windoze box that I use for testing cards. GPU-z appears to show only external bus bandwidth use (to the GDDR), and not the utilization of the bandwidth between the controller and core. In practical terms, a miner kernel may be using 200GB/s of memory bandwidth, but a significant percentage of it can be from the L2 cache. The collision counter tables in SA5 would be an example of this.

Do you have a source for this hypothesis? In all memory restricted algos that correlates to MCU usage. Pretty sure it pertains to any sort of memory overload, bandwidth or bus width...

My knowledge of the AMD GCN architecture (and computer architecture in general), and my experience writing OpenCL.
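nerdralph's point about the L2 cache can be illustrated with a toy calculation. The 200 GB/s figure is from the quoted post; the L2 hit fractions below are invented purely for illustration, and the function name is hypothetical. The sketch simply assumes that whatever share of the kernel's traffic is served from L2 never reaches the external GDDR bus, which is the only part a tool would see if it reports external bus use.

```python
def external_bandwidth_gb_s(total_gb_s, l2_hit_fraction):
    """Traffic that still reaches the external memory bus, in GB/s.

    total_gb_s      -- bandwidth the kernel consumes overall (L2 + external)
    l2_hit_fraction -- share of that traffic served from L2 (0.0 to 1.0, hypothetical)
    """
    if not 0.0 <= l2_hit_fraction <= 1.0:
        raise ValueError("l2_hit_fraction must be between 0 and 1")
    return total_gb_s * (1.0 - l2_hit_fraction)


if __name__ == "__main__":
    total = 200.0  # GB/s, the example figure from the post
    for hit in (0.0, 0.25, 0.5):  # made-up hit fractions
        external = external_bandwidth_gb_s(total, hit)
        print(f"L2 hit fraction {hit:.2f}: {external:.0f} GB/s on the external bus")
```

If the explanation in the post is right, a kernel could be working the memory subsystem harder than the reported controller load suggests.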

Jinx99

member

Activity: 91

Merit: 10

Quote from: lithiumviper12 on November 25, 2016, 03:13:01 PM

Hi guys, I have a question. I have 4 RX 480s. For some reason, 3 of them are hashing around 180mh, but the other one is only doing 40mh. Is there anything you can recommend to fix my issue? I have an Asus H170 Pro Gaming motherboard, a 1200 W PSU, 8 GB RAM, a 120 GB SSD, and Windows 10.

If you have some performance issues - check GPU-Z "sensors" tab.
