The RX 480 is a tossup with the GTX 1070 on memory access - which is why they're comparable at best in performance on algorithms like the ones ZEC and ETH use, despite the GTX 1070 costing almost twice as much.
Nvidia gets the "scraps" on mining because most mining algorithms don't use most of the parts of an Nvidia card that make it competitive with AMD cards in general or compute-bound usage at a given price point. As a result, few folks mine on Nvidia cards, which makes them a much lower priority for development.
It's not "lack of development" ALONE that keeps Nvidia uncompetitive on a hash/$ basis for ETH and ZEC (and derivatives using the same algorithms).
It's the inherent design of the ALGORITHMS, coupled with the higher PRICE of the Nvidia cards that have competitive memory access, that keeps Nvidia uncompetitive on a hash/$ basis even when development IS mature.
It's waaaaay too early to call this based on memory bus width. There is a lot of theorycrafting and it's all based on current hashrates and extrapolating against the original CPU miner code, not GPU optimized code, and not code made specifically for Nvidia hardware.
The only algo that doesn't fully utilize a 1070 is Dagger (Ethereum), which I've mentioned before. That has led to a misconception about the capabilities of a 1070... see your post. There are a lot of other algos out there - NeoS, Lyra2v2, Lbry, and more - in all of which the 1070 performs quite well. However, they aren't high volume, and that leads to statements like the one you made, assuming all of crypto land is just Dagger-Hashimoto. Dagger is the only really memory-bound algo out there; Cryptonote is too, but it's controlled by CPU botnets because of that.
It is the lack of development in Equihash, that's for certain. The only Nvidia-optimized miner that has come out was from Nicehash, and it was worthless a day later because it wasn't being made by the big three.
The term you were looking for is 'scrypt', and that is where things died for AMD as well.
ETH and ZEC are both memory-limited algorithms - and they are where AMD is currently shining once again.
NeoS, Lyra2v2, and Lbry don't make much - even with the memory system on a 1070 being no faster than the AMD RX 470/480's, I still see better profitability out of my 1070s on ETH than on any of the coins based on those algorithms.
Scrypt died for GPUs when the Gridseed and later ASICs showed up for it - it had nothing to do with AMD vs Nvidia.
I wasn't around early enough for the Bitcoin GPU days but it appears that the same thing happened there.
Also, I never specified memory bus width - I'm talking about OVERALL memory access, the stuff that keeps the R9 290X hashing at the same rate on ETH as the R9 290 (among other examples).
The reason the R9 290/390 and such are competitive on ETH and ZEC is that their bus width and other memory subsystem design make up for their much lower memory speed. The algorithms used in ETH and ZEC are limited by memory access much more than by compute - otherwise the R9 290X would hash noticeably better than the R9 290 does. On ETH at least, where the code has been well optimised, they hash pretty much identically, presuming the same clocks and the same BIOS memory system mods.
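The memory-bound claim can be sanity-checked with back-of-the-envelope arithmetic. Ethash fetches roughly 64 random 128-byte pages from the DAG per hash (about 8 KB), so peak hashrate is bounded by memory bandwidth, not shader count. A rough sketch using published spec-sheet bandwidth figures (real cards land below these ceilings because random accesses never hit peak bandwidth):

```python
# Rough Ethash hashrate ceiling from memory bandwidth alone.
# Each Ethash hash needs ~64 random DAG reads of 128 bytes each.
BYTES_PER_HASH = 64 * 128  # 8192 bytes

# Published peak memory bandwidth in GB/s (spec-sheet values).
cards = {
    "R9 290":   320,  # 512-bit bus, 5 Gbps GDDR5
    "R9 290X":  320,  # same memory subsystem as the 290
    "RX 480":   256,  # 256-bit bus, 8 Gbps GDDR5
    "GTX 1070": 256,  # 256-bit bus, 8 Gbps GDDR5
}

for name, gbps in cards.items():
    mh_ceiling = gbps * 1e9 / BYTES_PER_HASH / 1e6
    print(f"{name}: ~{mh_ceiling:.0f} MH/s ceiling")
```

Note the R9 290 and 290X share the same ceiling, which lines up with them hashing identically on ETH despite the 290X's extra compute units.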
Do keep in mind that for ETH at least there IS a miner (Genoil's) that started out as CUDA-specific and is well optimised for Nvidia, yet the AMD RX series cards match or beat the Nvidia GTX 10xx cards on that algorithm in both raw performance AND hash/watt, at a much lower price point.
This isn't as much the case for ZEC (the code is still getting optimised), but it's become apparent that ZEC is yet another "memory hard" algorithm by design and implementation, one that does not reward superior compute performance past the point where the memory subsystem starts hitting its limits (if not as much so as ETH).
No, I'm not an "ETH baby" - all of my early ETH rigs were Scrypt rigs back in the day (give or take some cards getting moved around) that spent their time after Scrypt went ASIC doing d.net work (and most of the HD 7750s from my scrypt days are STILL working d.net via the BOINC MooWrapper project).
I don't know where you're coming up with Nvidia being 40% more efficient than the RX 4xx series - right now actual efficiency looks like more or less a tossup, but it's very dependent on what you're actually running on a given card. Even on Folding, where Nvidia offers a clear performance lead, the RX 480 is a tossup with the GTX 1070 on PPD/$ at the card level, very close at the system level, and very close on PPD/watt (less than 10% apart per the data I've seen at the card level).
I do NOT see a 40% efficiency benefit to Nvidia even in one of its biggest strongholds.
That is definitely incorrect. Private kernels killed Scrypt mining... ASICs came along later. If you weren't around at the end of '14 you wouldn't have figured that out. Not everything is the big bad ASIC boogieman... sometimes it's just greed and people turning off the lights. You can Google my posts from BCT in '14 and check them out. Hence why I'm here trying to motivate some development on Nvidia's side.
"I don't know where you're coming up with Nvidia being 40% more efficient than the RX 4xx series - right now it's looking like actual efficiency is more or less a tossup"
With the lack of coding for Nvidia, you're making this statement off of current conditions and rates. Do you think as much effort is going into developing code for Nvidia as for AMD right now? The answer is no. You already said no. The efficiency argument is based on algos that actually use more than memory - and not just those, but gaming as well. While mining isn't gaming, gaming has been optimized quite a bit over the years, and when one brand is getting maxed out, the other is as well. Go look up some hardware benchmarks; that's pretty fundamental stuff.
Genoil's miner isn't CUDA optimized - that was Dagger, not Equihash. His endeavours in Equihash are focused on AMD hardware, as that's what he owns. It wasn't until recently that he made an Nvidia-compatible miner, and it's just a port of SAv5.
Alright, how about some sources for Equihash being hardware memory-bus-width locked that I haven't seen on BCT and that aren't extrapolated from a CPU miner or from the current rates of AMD hardware. You know the Fury also has a better processor than an R9 290? You also know that an RX 480 is basically a mid-range GPU with processing power to match (close to or a bit less than an R9 290)?
Do you also know that if you want to check whether an algo is memory limited, you can go into GPU-Z and check the load on the MCU (memory controller unit)? Mine sits at 38% at 108 sols for a 1070. If we want to take a page from your book and 'extrapolate' from that, it means there's potential for 284 sols on a 1070 - IF it's completely memory bound, and without any sort of optimizing for Nvidia hardware. NeoS also sits around 30% MCU usage. Dagger sits at 100% right before it trades off to more GPU and power usage (if you use a dual miner). Cryptonote also sits at 100% utilization. Weird, all the 'smart minds' and no one bothers checking the gauges.
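The extrapolation above is simple proportionality: if the workload were purely memory bound, the sol rate should scale linearly until the memory controller hits 100% load. A quick sketch of that arithmetic, using the 38% / 108 sols GTX 1070 figures quoted above (an upper bound under that assumption, not a prediction):

```python
def memory_bound_ceiling(current_rate, mcu_load_pct):
    """Extrapolate the rate at 100% memory-controller load,
    assuming the workload is purely memory bound."""
    return current_rate / (mcu_load_pct / 100.0)

# Figures quoted above for a GTX 1070 on Equihash:
ceiling = memory_bound_ceiling(108, 38)
print(f"~{ceiling:.0f} sols/s ceiling")  # ~284 sols/s
```

If the MCU is already pegged at 100% (as claimed for Dagger and Cryptonote), the ceiling equals the current rate and there's no headroom left for compute-side optimization.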