Scrypt died for GPUs when the Gridseed and later ASICs showed up for it - it had nothing to do with AMD vs Nvidia.
I wasn't around early enough for the Bitcoin GPU days but it appears that the same thing happened there.
Also, I never specified memory bus width - I'm talking OVERALL memory access, the stuff that keeps the R9 290x hashing at the same rate on ETH as the R9 290 (among other examples).
The reason the R9 290/390 and such are competitive on ETH and ZEC is that their bus width and other memory subsystem design make up for their much lower memory speed - the algorithms used in ETH and ZEC are memory-access limited far more than compute limited. If they weren't, the R9 290x would hash noticeably better than the R9 290 does; on ETH at least, where the code has been well optimised, they hash pretty much identically presuming the same clocks and the same BIOS memory system mods.
Do keep in mind that for ETH at least there IS a miner (Genoil's) that started out CUDA-specific and is well optimised for NVidia, yet the AMD RX series cards match or better the NVidia GTX 10xx cards on that algorithm on both raw performance AND hash/watt, at a much lower price point.
This isn't the case as much for ZEC (the code is still being optimised), but it's become apparent that ZEC is yet another "memory hard" algorithm by design and implementation that does not reward superior compute performance past the point where the memory subsystem starts hitting its limits (if not as much so as ETH).
No, I'm not an "ETH baby" - all of my early ETH rigs were Scrypt rigs back in the day (give or take some cards getting moved around) that spent their time after Scrypt went ASIC doing d.net work (and most of the HD 7750s from my scrypt days are STILL working d.net via the BOINC MooWrapper project).
I don't know where you're coming up with NVidia being 40% more efficient than the RX 4xx series - right now actual efficiency looks like more or less a tossup, though very dependent on what you're actually running on a given card. Even on Folding, where NVidia offers a clear performance lead, the RX 480 is a tossup with the GTX 1070 on PPD/$ at the card level, very close at the system level, and very close on PPD/watt (less than 10% apart per the data I've seen at the card level).
I do NOT see a 40% efficiency benefit to NVidia even in one of its biggest strongholds.
That is definitely incorrect. Private kernels killed Scrypt mining... ASICs came along later. If you weren't around at the end of '14 you wouldn't have figured that out. Not everything is the big bad ASIC boogeyman... sometimes it's just greed and people turning off the lights. You can Google my posts from BCT in '14 and check them out. Hence why I'm here trying to motivate some development on Nvidia's side.
"I don't know where you're coming up with NVidia being 40% more efficient than the RX 4xx series - right now it's looking like actual efficiency is more or less a tossup"
With the lack of coding for Nvidia, you're making this statement off of current conditions and rates. Do you think as much effort is going into developing code for Nvidia as for AMD right now? The answer is no - you already said no. The efficiency argument is based off of algos that actually use more than memory, and not just that, but gaming as well. While mining isn't gaming, gaming has been optimized quite a bit over the years; when one brand is getting maxed, the other is as well. Go look up some hardware benchmarks - that's pretty fundamental stuff.
Genoil's miner isn't CUDA optimized. That was for Dagger, not Equihash. His endeavours in Equihash are focused on AMD hardware, as that's what he owns. It wasn't until recently that he made an Nvidia-compatible miner, and it's just a port of SAv5.
Alright, how about some sources for Equihash being hardware memory bus width locked that I haven't seen on BCT and that aren't extrapolated from a CPU miner or current rates of AMD hardware? You know the Fury also has a better processor than an R9 290? You also know that an RX 480 is basically a mid-range GPU with processing power to match (close to or a bit less than an R9 290)?
Do you also know that if you want to check whether an algo is memory limited, you can go into GPU-Z and check the load on the MCU (memory controller unit)? Mine sits at 38% at 108 sols on a 1070. If we want to take a page from your book and 'extrapolate' from that, that means there is potential for 284 sols on a 1070 - that is, IF it's completely memory bound and without any sort of optimizing for Nvidia hardware. NeoS also sits around 30% MCU usage. Dagger sits at 100% right before it trades off to more GPU and power usage (if you use a dual miner). Cryptonote also sits at 100% utilization. Weird - all the 'smart minds' and no one bothers checking the gauges.
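For what it's worth, that 284-sol figure is just straight division - a rough sketch, assuming hash rate scales linearly with MCU load (which is a big assumption):

```python
# Rough ceiling estimate: if the memory controller were the only
# bottleneck, hash rate at 100% MCU load would be roughly
# current_rate / current_load. Figures are the ones from the post
# above (GTX 1070 on Equihash).
def memory_bound_ceiling(current_rate, mcu_load):
    """Hash rate at which the memory controller would saturate."""
    return current_rate / mcu_load

print(memory_bound_ceiling(108, 0.38))  # ~284 sol/s
```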
Odd, I was mining Scrypt profitably with GPUs for a couple months into the Gridseed era - "private kernels" did NOT kill Scrypt mining.
Why yes, I DO base my "efficiency" numbers off current conditions - but I don't just look at ONE algorithm that's still new and not optimised for NVidia, I also look at others that ARE optimised for both and are similar in conditions.
Keep in mind that I SPECIFICALLY STATED "Genoil's miner" for ETH. Your comments about "that was Dagger" just show you didn't bother to read what I POSTED.
The RX 480 has faster (8000 MHz effective) but narrower (256-bit) memory than the R9 290 and R9 390, which gives it slightly better overall memory bandwidth than the R9 290 (5000 MHz effective at 384 bit) but slightly worse than the R9 390 (6000 MHz effective at 384 bit).
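Those bandwidth comparisons are easy to verify - peak theoretical bandwidth is just effective transfer rate times bus width, divided by 8 bits per byte. A quick back-of-envelope check using the figures above:

```python
# Peak theoretical bandwidth = effective rate (MT/s) * bus width (bits)
# / 8 bits-per-byte. Clocks and bus widths are the ones quoted above.
def bandwidth_gbs(effective_mts, bus_bits):
    return effective_mts * 1e6 * bus_bits / 8 / 1e9  # GB/s

for name, mts, bits in [("RX 480", 8000, 256),
                        ("R9 290", 5000, 384),
                        ("R9 390", 6000, 384)]:
    print(f"{name}: {bandwidth_gbs(mts, bits):.0f} GB/s")
# RX 480: 256 GB/s, R9 290: 240 GB/s, R9 390: 288 GB/s
```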
The RX 480 has somewhat FEWER compute cores (2304 vs. 2560 on the R9 290/390) but runs them at a quite a bit HIGHER clock rate - enough that its raw cores-times-clock throughput comes out roughly 15-20% ahead of the R9 390 and R9 290.
RX 480 and R9 390 are both PCI-E 3.0 cards, R9 290 is only PCI-E 2.0, but that has little or no measurable effect on most mining.
The RX 480 is NOT "close to or a bit less than an R9 290" but is in fact a superior card across the board except ONLY for memory bus width (which its much faster memory more than makes up for) - yet its speed on ETH and ZEC is almost identical to the R9 290's, definitely NOT the roughly 20% better that its cores-times-clock advantage would deliver on a compute-limited algorithm.
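To put a number on the compute gap - a sketch using published specs (2304 SPs at a ~1266 MHz boost clock for the RX 480, 2560 SPs at 947 MHz for the R9 290; the boost clocks are approximate). If the algorithm were compute-limited, hash rate should roughly track this ratio:

```python
# Relative raw shader throughput ~ cores * clock. Core counts are
# published specs; boost clocks are approximate.
def relative_compute(cores_a, clock_a_mhz, cores_b, clock_b_mhz):
    return (cores_a * clock_a_mhz) / (cores_b * clock_b_mhz)

# RX 480 vs R9 290 - yet their ETH hash rates are nearly identical
print(relative_compute(2304, 1266, 2560, 947))  # ~1.20
```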
On an actual compute-limited algorithm like SHA256 (which is still used by a few sites like GPUBoss as a benchmark), the RX 480 blows the R9 290 and R9 390 completely out of the water.
Might also want to pay attention to the R9 290x vs the R9 290 - they have the same memory system, but the 290x has 2816 cores to the 290's 2560 - yet it doesn't hash any faster than the R9 290 despite having 10% more cores.
Am I saying there isn't room for improvement on the NVIdia side for ZEC mining? Definitely not!
Am I saying I doubt that NVidia will surpass AMD on ZEC? Given the obvious "heavy memory usage for ASIC resistance" design of ZEC and the very similar memory systems on both sides, definitely.
Yes, I'm fully aware that the Fury X and Nano have 4096 cores and fairly high core clock rates (higher than most if not all R9 390s as I recall, definitely higher than any R9 290, but not quite as high as the RX 480) - which just MAGNIFIES my point, as they should be completely destroying anything else from AMD on both ETH and ZEC if the algorithms were compute-bound. In actual fact the RX 480 hashes ETH noticeably better and is close or better on ZEC in the benchmarks I've seen posted.
Apparently HBM1 has some latency issues that make it quite a bit slower than its "raw memory access speed" would indicate - which doesn't apply when comparing cards that all have GDDR5 to each other.
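The on-paper numbers illustrate the point: HBM1's raw bandwidth (4096-bit bus at 500 MHz, i.e. 1000 MT/s effective) is double the RX 480's GDDR5, yet the Fury cards don't hash proportionally better - so it must be latency and access patterns, not the headline figure, holding them back:

```python
# Same peak-bandwidth formula as before: MT/s * bus bits / 8 -> GB/s.
def bandwidth_gbs(effective_mts, bus_bits):
    return effective_mts * 1e6 * bus_bits / 8 / 1e9

print(bandwidth_gbs(1000, 4096))  # 512 GB/s - Fury X / Nano, HBM1
print(bandwidth_gbs(8000, 256))   # 256 GB/s - RX 480, GDDR5
```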