
Topic: SILENTARMY v5: Zcash miner, 115 sol/s on R9 Nano, 70 sol/s on GTX 1070

sr. member
Activity: 652
Merit: 266
Quote
Well... erm... dunno how to say this, but not only is vdrop+.rom not edited in Heliox's style, it doesn't seem to be edited for voltage AT ALL - even in the ways that DON'T work. All the DPM states on vdrop+.rom point stock (into the voltage table, which has not been changed). The core clocks have been dropped pretty much across the board, though. Default memory clock was changed to 2080, but the rest of the memory states are untouched.

Heliox/Eliovp would (in a low-power ROM) have added a new VID for the initialization of the regulator in VoltageObjectInfo (changing the length of the table), as well as a value for it, which allows for global core undervolts that apply to every power state.
https://forum.ethereum.org/discussion/9650/sapphire-rx-480-nitro-oc-8gb-11260-01-20g-modded-bios-29-mh-downvolt
Here is where I got these bioses. I haven't touched a vbios in 8 years... the last time I modded one was when I got a Radeon 9800 Pro for PC and flashed it with a Mac ROM to put it in my G4 :)
So basically my knowledge of this is pretty minimal. Don't judge me too harshly... I just started mining (1-2 weeks ago) :)

Heh, don't worry about it, I won't. I'll just tell you what's there. Or not there. Looking it over now.
I really appreciate your opinion and help.
Thank you. PM your T address and I will put one VM core to mine on it 24/7.

Okay, so, the regulator initialization hasn't been touched, so there's no global core under/over volting going on - at least not that way. But... what is this? Unlike the other two ROMs you posted, this does undervolt the states... in a really bizarre fashion. Instead of updating the pointers in the DPM states into the voltage table to all point to DPM state 3... it has written DPM state 3's voltage value into the voltage table for entries 4, 5, 6, and 7. I can only guess at the cause of this oddity, but it might be something Polaris BIOS Editor does.
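(To make the two edit styles concrete, here is a toy model in C - purely illustrative, NOT the real ATOM table layout, and the voltages are made up: the "Heliox-style" edit leaves the voltage table alone and repoints the upper DPM states' indices at state 3's entry, while the edit in this ROM left the indices alone and overwrote the table values themselves. Both land on the same voltages; only the mechanism differs.)

Code:
#include <stdio.h>
#include <string.h>

#define NUM_DPM 8

/* Print the voltage each DPM state resolves to. */
static void dump(const char *tag, const unsigned *tbl, const unsigned *idx)
{
    printf("%s:", tag);
    for (int s = 0; s < NUM_DPM; s++)
        printf(" %u", tbl[idx[s]]);
    printf(" mV\n");
}

int main(void)
{
    /* Made-up stock voltages, one table entry per DPM state. */
    const unsigned stock_tbl[NUM_DPM] = { 800, 850, 900, 950,
                                          1000, 1050, 1100, 1150 };
    const unsigned stock_idx[NUM_DPM] = { 0, 1, 2, 3, 4, 5, 6, 7 };
    unsigned tbl[NUM_DPM], idx[NUM_DPM];

    /* Style A: repoint DPM states 4..7 at state 3's table entry. */
    memcpy(tbl, stock_tbl, sizeof tbl);
    memcpy(idx, stock_idx, sizeof idx);
    for (int s = 4; s < NUM_DPM; s++)
        idx[s] = 3;
    dump("repointed indices ", tbl, idx);

    /* Style B (what this ROM did): overwrite table entries 4..7
       with state 3's value, leaving the indices untouched. */
    memcpy(tbl, stock_tbl, sizeof tbl);
    memcpy(idx, stock_idx, sizeof idx);
    for (int s = 4; s < NUM_DPM; s++)
        tbl[s] = tbl[3];
    dump("overwritten values", tbl, idx);

    return 0;
}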

The 1625 strap has been copied up to 1750 and 2000 - I would HIGHLY advise against this for stability reasons on Samsung memory. Elpida and Hynix are happy even taking the 1500 strap all the way up to 2000, but Samsung hates it in my experience, and while it may look good at first, keeping the cards running will probably be hell.

The clocks in this VBIOS have been edited for most (if not all) of the DPM states - similar to your v4.rom and vdrop+.rom. Default memclock on it is 2080 for state 1 (performance) - the idle state is untouched.

The name of the VBIOS image file implies it was dumped from a Sapphire Nitro+, but it doesn't look like it to me. Sapphire Nitro+ cards (all 470s and 480s I've seen, regardless of mem type) have an offset in the voltage regulator init that does a (by default) overvolt of +25mV to core across all power states. This VBIOS does not. But it does go to a card with the same connectors - 2 DisplayPort and 2 HDMI Type A, and one DVI-D.

You don't need to donate, don't worry about it.
Ok, so your advice is to return them to stock until a valid and tested vbios is released for the OC version?
sr. member
Activity: 652
Merit: 266
Quote
Well... erm... dunno how to say this, but not only is vdrop+.rom not edited in Heliox's style, it doesn't seem to be edited for voltage AT ALL - even in the ways that DON'T work. All the DPM states on vdrop+.rom point stock (into the voltage table, which has not been changed). The core clocks have been dropped pretty much across the board, though. Default memory clock was changed to 2080, but the rest of the memory states are untouched.

Heliox/Eliovp would (in a low-power ROM) have added a new VID for the initialization of the regulator in VoltageObjectInfo (changing the length of the table), as well as a value for it, which allows for global core undervolts that apply to every power state.
https://forum.ethereum.org/discussion/9650/sapphire-rx-480-nitro-oc-8gb-11260-01-20g-modded-bios-29-mh-downvolt
Here is where I got these bioses. I haven't touched a vbios in 8 years... the last time I modded one was when I got a Radeon 9800 Pro for PC and flashed it with a Mac ROM to put it in my G4 :)
So basically my knowledge of this is pretty minimal. Don't judge me too harshly... I just started mining (1-2 weeks ago) :)

Heh, don't worry about it, I won't. I'll just tell you what's there. Or not there. Looking it over now.
I really appreciate your opinion and help.
Thank you. PM your T address and I will put one VM core to mine on it 24/7.
sr. member
Activity: 728
Merit: 304
Miner Developer
Hey devs, I have been playing with eXtremal's latest kernel and trying to optimize kernel_sols(), since it seems to be one of the bottlenecks; as far as I can tell with CodeXL, NUMVGPR is over 170 on the RX 480. I was able to reduce it to fifty-something, but I cannot get rid of scratchpad registers that are 512 bytes in size. They must have something to do with array indexing, but I was not able to pinpoint the exact portion of the code that is causing the register spills. Do you guys have any ideas?

kernel_sols is not a bottleneck. It only takes 1.2-1.5 ms on the R9 Nano out of 17 ms of a full Equihash run.

You are absolutely right! I must tackle equihash_round()...
sr. member
Activity: 652
Merit: 266
Quote
Well... erm... dunno how to say this, but not only is vdrop+.rom not edited in Heliox's style, it doesn't seem to be edited for voltage AT ALL - even in the ways that DON'T work. All the DPM states on vdrop+.rom point stock (into the voltage table, which has not been changed). The core clocks have been dropped pretty much across the board, though. Default memory clock was changed to 2080, but the rest of the memory states are untouched.

Heliox/Eliovp would (in a low-power ROM) have added a new VID for the initialization of the regulator in VoltageObjectInfo (changing the length of the table), as well as a value for it, which allows for global core undervolts that apply to every power state.
https://forum.ethereum.org/discussion/9650/sapphire-rx-480-nitro-oc-8gb-11260-01-20g-modded-bios-29-mh-downvolt
Here is where I got these bioses. I haven't touched a vbios in 8 years... the last time I modded one was when I got a Radeon 9800 Pro for PC and flashed it with a Mac ROM to put it in my G4 :)
So basically my knowledge of this is pretty minimal. Don't judge me too harshly... I just started mining (1-2 weeks ago) :)
mrb
legendary
Activity: 1512
Merit: 1027
Hey devs, I have been playing with eXtremal's latest kernel and trying to optimize kernel_sols(), since it seems to be one of the bottlenecks; as far as I can tell with CodeXL, NUMVGPR is over 170 on the RX 480. I was able to reduce it to fifty-something, but I cannot get rid of scratchpad registers that are 512 bytes in size. They must have something to do with array indexing, but I was not able to pinpoint the exact portion of the code that is causing the register spills. Do you guys have any ideas?

kernel_sols is not a bottleneck. It only takes 1.2-1.5 ms on the R9 Nano out of 17 ms of a full Equihash run.
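(For anyone curious how per-kernel timings like that are measured: a minimal sketch using OpenCL event profiling - a hypothetical helper, not SILENTARMY's actual code. It assumes the command queue was created with CL_QUEUE_PROFILING_ENABLE and that the kernel, its arguments, and the work sizes are already set up.)

Code:
#include <CL/cl.h>

/* Device-side execution time of a single 1-D kernel dispatch, in ms. */
static double time_kernel_ms(cl_command_queue queue, cl_kernel kernel,
                             size_t global_ws, size_t local_ws)
{
    cl_event ev;
    cl_ulong start = 0, end = 0;

    clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                           &global_ws, &local_ws, 0, NULL, &ev);
    clWaitForEvents(1, &ev);
    clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
                            sizeof start, &start, NULL);
    clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
                            sizeof end, &end, NULL);
    clReleaseEvent(ev);
    return (double)(end - start) * 1e-6;   /* ns -> ms */
}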
sr. member
Activity: 728
Merit: 304
Miner Developer
I was able to remove the scratchpad registers from kernel_sols() by placing values_temp[] in __local.
The speed gain was not as much as I hoped for, though. I knew GCN's local memory was rather slow compared to CUDA's shared memory...
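(A minimal sketch of that transformation, with hypothetical names - PARAM_K = 9 as in Equihash <200,9>, so each work-item needs 512 uints = 2 KB, which is why the work-group here is kept small enough to fit the LDS budget. A __private array indexed with runtime values gets spilled to scratch memory; a per-work-item slice of __local memory does not, but the LDS cost caps occupancy, which is one reason the gain can be modest.)

Code:
#define PARAM_K   9
#define WORKSIZE  8   /* 8 work-items x 2 KB each = 16 KB of LDS */

__kernel __attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))
void demo(__global const uint *in, __global uint *out)
{
    /* Instead of "uint values_tmp[1 << PARAM_K];" in __private memory,
       carve a 512-entry slice per work-item out of local memory. */
    __local uint lds[WORKSIZE * (1 << PARAM_K)];
    __local uint *values_tmp = lds + get_local_id(0) * (1 << PARAM_K);

    uint gid = get_global_id(0);
    for (uint i = 0; i < (1 << PARAM_K); i++)
        values_tmp[i] = in[gid] ^ i;                       /* runtime-indexed writes */
    out[gid] = values_tmp[in[gid] & ((1 << PARAM_K) - 1)]; /* runtime-indexed read */
}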
sr. member
Activity: 430
Merit: 254
Just when I was about to switch to ETC....
legendary
Activity: 1898
Merit: 1024
Claymore 7:
280X modded + OC: 180 sol/s; near 200 sol/s at 1200 MHz
390X: 240 sol/s
RX 4xx did not gain any speed

Looks like it's all about memory speed plus some compute for the blake algo, so a 280X needs 1200 MHz to make near 200 sol/s.
sr. member
Activity: 728
Merit: 304
Miner Developer
Hey devs, I have been playing with eXtremal's latest kernel and trying to optimize kernel_sols(), since it seems to be one of the bottlenecks; as far as I can tell with CodeXL, NUMVGPR is over 170 on the RX 480. I was able to reduce it to fifty-something, but I cannot get rid of scratchpad registers that are 512 bytes in size. They must have something to do with array indexing, but I was not able to pinpoint the exact portion of the code that is causing the register spills. Do you guys have any ideas?

Never mind, I just found it. Let me see...

Code:
uint	values_tmp[(1 << PARAM_K)];	/* 512 uints (PARAM_K = 9) of __private memory
					   per work-item; indexed with runtime values,
					   so the compiler spills it to scratch memory
					   instead of keeping it in registers */
sr. member
Activity: 728
Merit: 304
Miner Developer
By the way, I was able to create a multi-threaded version of sa-solver for Windows with a few percent speed gain.
I would like to see more performance improvements, though. I am pretty sure they are possible if we can get rid of these annoying register spills....
sr. member
Activity: 728
Merit: 304
Miner Developer
Hey devs, I have been playing with eXtremal's latest kernel and trying to optimize kernel_sols(), since it seems to be one of the bottlenecks; as far as I can tell with CodeXL, NUMVGPR is over 170 on the RX 480. I was able to reduce it to fifty-something, but I cannot get rid of scratchpad registers that are 512 bytes in size. They must have something to do with array indexing, but I was not able to pinpoint the exact portion of the code that is causing the register spills. Do you guys have any ideas?
hero member
Activity: 1246
Merit: 708
Sorry if it was already answered, but...
I would like to ask about the new release of Optiminer.
Could someone give me an example of how to use its watchdog (on Ubuntu 14)?
Code:
--watchdog-timeout 
     Timeout after which the watchdog triggers if a GPU does not produce
     any solutions. It will execute the command specified by
     --watchdog-cmd. You can use this command to do an appropriate action
     (e.g. reset driver or reboot). 0 disables watchdog.

Are you for real? Open a thread on Optiminer then, or use an existing Optiminer one to discuss it. This thread is for a different miner.
Sorry, I am a little tired now and can't see any Optiminer thread, sorry.
I'd prefer to use a FOSS miner, but while waiting for a new release of SM I have to use another one due to its significantly better sol rate :(
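(For what it's worth, based on the help text quoted above, an invocation along these lines should work - the binary name, the 60-second timeout, and the reboot command are placeholders for whatever your setup uses, and the usual pool/user options are omitted:)

Code:
./optiminer-zcash <pool/user options> \
    --watchdog-timeout 60 \
    --watchdog-cmd "sudo reboot"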
sr. member
Activity: 449
Merit: 251


Scrypt GPU mining ended in the fall of '14 without private kernels. x11 started up shortly thereafter and became unprofitable at the beginning of winter. Gridseeds weren't ASICs either; the first ones weren't very profitable or good. You may have just remembered those little USB things coming out and thought 'well, those were ASICs' - they weren't. There were a lot of really bad ASICs. Gridseeds were never a good deal.

Unless you were running private kernels yourself, it wasn't happening.

What other algo are you looking at that's mature? Dagger doesn't count. That's a very niche scenario and it's bound almost exclusively by bus width. The GPUs never get a chance to even be close to being fully utilized.

The R9-290 has a 512-bit bus, as was already mentioned.

Who tests GPUs on SHA-256? How about trying something remotely relevant to the discussion, like, say, NeoS, Lyra2v2, or even x11? People haven't made optimized miners for SHA in years. As mentioned before, if you're talking about 'theoretical usage' scenarios, video games are a very good example of that, as GPUs are made to run as fast as possible on them.

Memory usage doesn't need to be about bandwidth or bus width; it could just be the total memory usage as well. Not just that, it doesn't need to be restricted JUST to throughput - it can utilize memory and still do a lot of processing on the GPU. At this point, though, you're just making shit up and theorycrafting again.

You can blame latency all you want, but the Fury not only has a 4096-bit bus but also gobs of memory bandwidth, and it's not eight times faster than an R9-290, or even twice as fast. It's not just all about memory speeds here, or even latency.

 The Gridseed 3355 WAS in fact an ASIC - and on scrypt it was more efficient than anything GPU-based at the time, by quite a bit. A single side of an "80 blade" would pull 2.5 Mhash/sec at 40 watts, where the best GPUs of the time were pulling less than half that at a LOT more power (the 7990 was an exception with its pair of cores; it could actually manage a bit more than half the hashrate, but it pulled a TON more power to do so).

 Dagger (ETH) isn't "bus width limited", it's "memory access limited" - NOT the same thing, or the RX 480 wouldn't even be close to matching the R9 290 on hashrate.

 For MOST usage, the Fury is a LOT faster than the R9 290 - but on ETH it's barely in the same ballpark despite the much higher "in theory" memory bandwidth. *SOMETHING* certainly keeps it uncompetitive with much older cards with lower rated memory bandwidth.



The Fury cards have HBM, which has a lot higher memory bandwidth but also higher latency.  Eth is sensitive to latency.  This is also why the 1080 sucks at Eth: its GDDR5X doesn't have much more bandwidth, but has higher latency.  Tightening memory timings gives you a speed boost on Eth or Zcash.  Games aren't really affected by latency as much, just raw bandwidth.  HBM2 will be better, though Vega 10 may or may not have that much more bandwidth.  There is a new way of accessing HBM2 that reduces latency some, though, if the application is coded for it, so we will see how things go.
sr. member
Activity: 449
Merit: 251

Do you also know if you want to check if a algo is memory limited, you can go into GPUZ and check out the MCU (memory controller unit) and see the load on it?

I think this is wrong.  Although I primarily mine using Linux, I have a Windoze box that I use for testing cards.  GPU-z appears to show only external bus bandwidth use (to the GDDR), and not the utilization of the bandwidth between the controller and core.  In practical terms, a miner kernel may be using 200GB/s of memory bandwidth, but a significant percentage of it can be from the L2 cache.  The collision counter tables in SA5 would be an example of this.


Do you have a source for this hypothesis? In all memory-restricted algos, that correlates to MCU usage. Pretty sure it pertains to any sort of memory overload, bandwidth or bus width...

The 480 and 1070 have similar TDPs.  Mining Zcash, their power usage would be similar - the 1070 maybe slightly less if you could downclock it, but you can also undervolt the 480.  Even if the 1070 is slightly more efficient with optimized Zcash, it doesn't matter much. I make 9x more on Zcash than I spend on power.  So it isn't worth spending $400 on a card that has the same speed as a $200 card.

The 38% wasn't from me.  I was using a similar method of extrapolation.  I get 160 sol/s on a 480, no overclocks, ~60% MCU on Claymore 6.0.

Their power usage would be similar if they were both being maxed out. Equihash is not a highly optimized algo yet, especially for Nvidia. That's the whole reason we're talking about this. You're trying to make a point of Nvidia not being that much more efficient than an AMD with highly unoptimized code; not sure why you assume Nvidia, with almost no one working on it, is in the same shoes as AMD. Because MBK added Nvidia support, he put just as much effort into Nvidia as his AMD endeavors?

What is a 'similar method'? I was literally talking about MCU usage. Also calling BS on 60% MCU usage. Give me a screenshot, which you didn't provide for Equihash either.

I like how you base assumptions on loose logic. The whole reason is that I'm not believing Equihash's limits are based purely on memory bus width like Dagger's (not bus bandwidth). That's what the whole BCT talk thread was about.

Screenshot, 55% average memory controller load, GPU-Z: http://prnt.sc/d8phr4
Doing 160 sol/s, Claymore 6.0. At the wall ~150 W, but I haven't tuned the voltage/core as much as I could. If you don't believe the screenshot, fire up CM 6.0 - new version tomorrow.

I know Equihash miners aren't fully optimized yet.  But it is obvious it is memory limited, so even if it is fully optimized, the cards would perform similarly.  I was saying that since the 1070 has higher compute (ignoring architecture differences that could favor either card), you may be able to underclock it some to reduce power; similarly you could undervolt the 480, but it's not worth paying an extra $200 to save $2/month.

The card wasn't in the screenshot, but I'll believe you for shits and giggles, since no one lies online, especially in an argument.  The card is at a reported 106 W, so even if the MCU is at 55%, you'll hit TDP before ever maxing out the MCU - unless the code becomes more efficient, but that can be done for Nvidia as well.

This goes to show you even more so that this algo isn't completely memory bound. If it were, we wouldn't be hitting TDPs before maxing out the MCU. If TDPs are the limiting factor, efficiency definitely becomes more important. Depending on how this algo stresses the cards when it's finally maxed out, based on what we're seeing right here, it's definitely not just memory bound. Lyra2v2 and NeoS also stress memory, but not enough for it to be the sole bottleneck.

And Wolf0 must google his name and BCT every day.
The first card.  The others are 4GB, and different 480s.  While compute is more of a factor than for Eth, memory is still a major factor; I highly doubt it would ever be worth getting a 1070 over a 480.
sr. member
Activity: 588
Merit: 251

Do you also know if you want to check if a algo is memory limited, you can go into GPUZ and check out the MCU (memory controller unit) and see the load on it?

I think this is wrong.  Although I primarily mine using Linux, I have a Windoze box that I use for testing cards.  GPU-z appears to show only external bus bandwidth use (to the GDDR), and not the utilization of the bandwidth between the controller and core.  In practical terms, a miner kernel may be using 200GB/s of memory bandwidth, but a significant percentage of it can be from the L2 cache.  The collision counter tables in SA5 would be an example of this.


Do you have a source for this hypothesis? In all memory-restricted algos, that correlates to MCU usage. Pretty sure it pertains to any sort of memory overload, bandwidth or bus width...

My knowledge of the AMD GCN architecture (and computer architecture in general), and my experience writing OpenCL.
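(An illustrative OpenCL kernel - hypothetical, not from SA5 - that shows the distinction: both reads move a similar number of bytes per work-item, but the small table stays resident in GCN's L2, so most of its traffic never reaches the memory controller and barely registers in GPU-Z's MCU load, while the large buffer streams from GDDR and shows up in full.)

Code:
#define SMALL_WORDS (64 * 1024 / 4)   /* 64 KiB of uints: fits in GCN L2 */

__kernel void l2_vs_dram(__global const uint *small_tbl,  /* SMALL_WORDS uints */
                         __global const uint *big_buf,    /* big_words uints */
                         ulong big_words,                 /* e.g. a GB-scale buffer */
                         __global uint *out)
{
    uint gid = get_global_id(0);
    uint acc = 0;
    for (uint i = 0; i < 4096; i++) {
        acc ^= small_tbl[(gid + i) % SMALL_WORDS];            /* mostly L2 hits */
        acc ^= big_buf[((ulong)gid * 4096 + i) % big_words];  /* real DRAM traffic */
    }
    out[gid] = acc;
}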
legendary
Activity: 1498
Merit: 1030

 (cores can't be compared across generations of cards or chip makers).

 AMD cores in the GCN generations have been pretty consistent in their performance; if anything they've gotten a hair MORE efficient with generational changes.

 Comparing GCN to TeraScale cores or to NVidia cores (which I've NOT DONE AT ALL - strawman comment there) is a lot more problematic.
legendary
Activity: 1498
Merit: 1030


Scrypt GPU mining ended in the fall of '14 without private kernels. x11 started up shortly thereafter and became unprofitable at the beginning of winter. Gridseeds weren't ASICs either; the first ones weren't very profitable or good. You may have just remembered those little USB things coming out and thought 'well, those were ASICs' - they weren't. There were a lot of really bad ASICs. Gridseeds were never a good deal.

Unless you were running private kernels yourself, it wasn't happening.

What other algo are you looking at that's mature? Dagger doesn't count. That's a very niche scenario and it's bound almost exclusively by bus width. The GPUs never get a chance to even be close to being fully utilized.

The R9-290 has a 512-bit bus, as was already mentioned.

Who tests GPUs on SHA-256? How about trying something remotely relevant to the discussion, like, say, NeoS, Lyra2v2, or even x11? People haven't made optimized miners for SHA in years. As mentioned before, if you're talking about 'theoretical usage' scenarios, video games are a very good example of that, as GPUs are made to run as fast as possible on them.

Memory usage doesn't need to be about bandwidth or bus width; it could just be the total memory usage as well. Not just that, it doesn't need to be restricted JUST to throughput - it can utilize memory and still do a lot of processing on the GPU. At this point, though, you're just making shit up and theorycrafting again.

You can blame latency all you want, but the Fury not only has a 4096-bit bus but also gobs of memory bandwidth, and it's not eight times faster than an R9-290, or even twice as fast. It's not just all about memory speeds here, or even latency.

 The Gridseed 3355 WAS in fact an ASIC - and on scrypt it was more efficient than anything GPU-based at the time, by quite a bit. A single side of an "80 blade" would pull 2.5 Mhash/sec at 40 watts, where the best GPUs of the time were pulling less than half that at a LOT more power (the 7990 was an exception with its pair of cores; it could actually manage a bit more than half the hashrate, but it pulled a TON more power to do so).

 Dagger (ETH) isn't "bus width limited", it's "memory access limited" - NOT the same thing, or the RX 480 wouldn't even be close to matching the R9 290 on hashrate.

 For MOST usage, the Fury is a LOT faster than the R9 290 - but on ETH it's barely in the same ballpark despite the much higher "in theory" memory bandwidth. *SOMETHING* certainly keeps it uncompetitive with much older cards with lower rated memory bandwidth.

legendary
Activity: 1750
Merit: 1024
RX 480 has faster (8000 MHz effective) but narrower (256-bit) memory than the R9 290 and R9 390, which gives it overall slightly better memory bandwidth than the R9 290 (5000 MHz effective at 384-bit) but slightly worse than the R9 390 (6000 MHz effective at 384-bit).
 The RX 480 has 12.5% MORE compute cores (2304 vs. 2048, for exactly a 9:8 ratio) at quite a bit HIGHER clock rate than the R9 390, and even more so than the R9 290.
 The RX 480 and R9 390 are both PCI-E 3.0 cards; the R9 290 is only PCI-E 2.0, but that has little or no measurable effect on most mining.

 The RX 480 is NOT "close or a bit less than a R9 290" but in fact is a superior card across the board, except ONLY for memory bus width.

 Might also want to pay attention to the R9 290x vs. the R9 290, as they have the same memory system but the 290x has the same 2304 cores that the RX 480 does.



He's gonna need some ice for that burn. Good job fact-checking.

 What burn? That GPU-Z image just proves my stated facts about it.
 If you're talking about the "listed" memory speed vs. my stated EFFECTIVE memory speed, keep in mind that GDDR5 makes 4 data transfers per command-clock cycle - on raw clocks, the R9 290 and 290x run at 1250 vs. 2000 for the RX 480, so the same ratio as I stated.

 I'm not discussing overclocked efforts, or it would be even worse - the RX 480 has demonstrated a LOT more overclock headroom than any R9 2xx series managed.

 Too bad the other 2 links appear to be broken; it would be interesting to see what they were about.

 I don't need GPU-Z images for the R9 290 or R9 290x, though, when I have several of the first one and one of the second one, have worked with them quite a bit, and have the bloody specs on them memorised.



Basic google searches disprove the majority of what you're talking about, including performance (cores can't be compared across generations of cards or chip makers). I pointed out performance and the memory bus width as that was off the top of my head. Your memory is corrupt.
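(For reference, the raw number in dispute is simple to compute: peak GDDR5 bandwidth is the effective per-pin data rate times the bus width, divided by eight. A quick sanity check using the RX 480's undisputed figures:)

Code:
#include <stdio.h>

int main(void)
{
    /* Peak bandwidth (GB/s) = effective rate (Gbps/pin) * bus width (bits) / 8.
       Reference RX 480: 8.0 Gbps effective on a 256-bit bus. */
    double gbps = 8.0, bus_bits = 256.0;
    printf("%.0f GB/s\n", gbps * bus_bits / 8.0);   /* prints "256 GB/s" */
    return 0;
}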
legendary
Activity: 1750
Merit: 1024

Do you also know if you want to check if a algo is memory limited, you can go into GPUZ and check out the MCU (memory controller unit) and see the load on it?

I think this is wrong.  Although I primarily mine using Linux, I have a Windoze box that I use for testing cards.  GPU-z appears to show only external bus bandwidth use (to the GDDR), and not the utilization of the bandwidth between the controller and core.  In practical terms, a miner kernel may be using 200GB/s of memory bandwidth, but a significant percentage of it can be from the L2 cache.  The collision counter tables in SA5 would be an example of this.


Do you have a source for this hypothesis? In all memory-restricted algos, that correlates to MCU usage. Pretty sure it pertains to any sort of memory overload, bandwidth or bus width...

The 480 and 1070 have similar TDPs.  Mining Zcash, their power usage would be similar - the 1070 maybe slightly less if you could downclock it, but you can also undervolt the 480.  Even if the 1070 is slightly more efficient with optimized Zcash, it doesn't matter much. I make 9x more on Zcash than I spend on power.  So it isn't worth spending $400 on a card that has the same speed as a $200 card.

The 38% wasn't from me.  I was using a similar method of extrapolation.  I get 160 sol/s on a 480, no overclocks, ~60% MCU on Claymore 6.0.

Their power usage would be similar if they were both being maxed out. Equihash is not a highly optimized algo yet, especially for Nvidia. That's the whole reason we're talking about this. You're trying to make a point of Nvidia not being that much more efficient than an AMD with highly unoptimized code; not sure why you assume Nvidia, with almost no one working on it, is in the same shoes as AMD. Because MBK added Nvidia support, he put just as much effort into Nvidia as his AMD endeavors?

What is a 'similar method'? I was literally talking about MCU usage. Also calling BS on 60% MCU usage. Give me a screenshot, which you didn't provide for Equihash either.

I like how you base assumptions on loose logic. The whole reason is that I'm not believing Equihash's limits are based purely on memory bus width like Dagger's (not bus bandwidth). That's what the whole BCT talk thread was about.

Screenshot, 55% average memory controller load, GPU-Z: http://prnt.sc/d8phr4
Doing 160 sol/s, Claymore 6.0. At the wall ~150 W, but I haven't tuned the voltage/core as much as I could. If you don't believe the screenshot, fire up CM 6.0 - new version tomorrow.

I know Equihash miners aren't fully optimized yet.  But it is obvious it is memory limited, so even if it is fully optimized, the cards would perform similarly.  I was saying that since the 1070 has higher compute (ignoring architecture differences that could favor either card), you may be able to underclock it some to reduce power; similarly you could undervolt the 480, but it's not worth paying an extra $200 to save $2/month.

The card wasn't in the screenshot, but I'll believe you for shits and giggles, since no one lies online, especially in an argument.  The card is at a reported 106 W, so even if the MCU is at 55%, you'll hit TDP before ever maxing out the MCU - unless the code becomes more efficient, but that can be done for Nvidia as well.

This goes to show you even more so that this algo isn't completely memory bound. If it were, we wouldn't be hitting TDPs before maxing out the MCU. If TDPs are the limiting factor, efficiency definitely becomes more important. Depending on how this algo stresses the cards when it's finally maxed out, based on what we're seeing right here, it's definitely not just memory bound. Lyra2v2 and NeoS also stress memory, but not enough for it to be the sole bottleneck.

And Wolf0 must google his name and BCT every day.
legendary
Activity: 1498
Merit: 1030
RX 480 has faster (8000 MHz effective) but narrower (256-bit) memory than the R9 290 and R9 390, which gives it overall slightly better memory bandwidth than the R9 290 (5000 MHz effective at 384-bit) but slightly worse than the R9 390 (6000 MHz effective at 384-bit).
 The RX 480 has 12.5% MORE compute cores (2304 vs. 2048, for exactly a 9:8 ratio) at quite a bit HIGHER clock rate than the R9 390, and even more so than the R9 290.
 The RX 480 and R9 390 are both PCI-E 3.0 cards; the R9 290 is only PCI-E 2.0, but that has little or no measurable effect on most mining.

 The RX 480 is NOT "close or a bit less than a R9 290" but in fact is a superior card across the board, except ONLY for memory bus width.

 Might also want to pay attention to the R9 290x vs. the R9 290, as they have the same memory system but the 290x has the same 2304 cores that the RX 480 does.



He's gonna need some ice for that burn. Good job fact-checking.

 What burn? That GPU-Z image just proves my stated facts about it.
 If you're talking about the "listed" memory speed vs. my stated EFFECTIVE memory speed, keep in mind that GDDR5 makes 4 data transfers per command-clock cycle - on raw clocks, the R9 290 and 290x run at 1250 vs. 2000 for the RX 480, so the same ratio as I stated.

 I'm not discussing overclocked efforts, or it would be even worse - the RX 480 has demonstrated a LOT more overclock headroom than any R9 2xx series managed.

 Too bad the other 2 links appear to be broken; it would be interesting to see what they were about.

 I don't need GPU-Z images for the R9 290 or R9 290x, though, when I have several of the first one and one of the second one, have worked with them quite a bit, and have the bloody specs on them memorised.
