Scrypt died for GPUs when the Gridseed and later ASICs showed up for it - it had nothing to do with AMD vs Nvidia.
I wasn't around early enough for the Bitcoin GPU days but it appears that the same thing happened there.
Also, I never specified memory bus width - I'm talking OVERALL memory access, the stuff that keeps the R9 290x hashing at the same rate on ETH as the R9 290 (among other examples).
The reason the R9 290/390 and such are competitive on ETH and ZEC is that their bus width and other memory subsystem design make up for their much lower memory speed - the algorithms used in ETH and ZEC are memory-access limited far more than compute limited. If they weren't, the R9 290x would hash noticeably better than the R9 290 does; on ETH at least, where the code has been well optimised, they hash pretty much identically presuming the same clocks and the same BIOS memory system mods.
Do keep in mind that for ETH at least there IS a miner (Genoil's) that started out CUDA-specific and is well optimised for NVidia, yet the AMD RX series cards match or better the NVidia GTX 10xx cards on that algorithm on both raw performance AND hash/watt, at a much lower price point.
This isn't the case as much for ZEC (the code is still being optimised), but it's become apparent that ZEC is yet another "memory hard" algorithm by design and implementation that does not reward superior compute performance past the point where the memory subsystem starts hitting its limits (if not as much so as ETH).
No, I'm not an "ETH baby" - all of my early ETH rigs were Scrypt rigs back in the day (give or take some cards getting moved around) that spent their time after Scrypt went ASIC doing d.net work (and most of the HD 7750s from my scrypt days are STILL working d.net via the BOINC MooWrapper project).
I don't know where you're coming up with NVidia being 40% more efficient than the RX 4xx series - right now actual efficiency looks like more or less a tossup, though very dependent on what you're actually running on a given card. Even on Folding, where NVidia offers a clear performance lead, the RX 480 is a tossup with the GTX 1070 on PPD/$ at the card level, very close at the system level, and very close on PPD/watt (less than 10% apart per the data I've seen at the card level).
I do NOT see a 40% efficiency benefit to NVidia even in one of its biggest strongholds.
That is definitely incorrect. Private kernels killed Scrypt mining... ASICs came along later. If you weren't around at the end of '14 you wouldn't have figured that out. Not everything is the big bad ASIC boogeyman... sometimes it's just greed and people turning off the lights. You can Google my posts from BCT in '14 and check them out. Hence why I'm here trying to motivate some development on Nvidia's side.
"I don't know where you're coming up with NVidia being 40% more efficient than the RX 4xx series - right now it's looking like actual efficiency is more or less a tossup"
With the lack of coding for Nvidia, you're making this statement off of current conditions and rates. Do you think as much effort is going into developing code for Nvidia as for AMD right now? The answer is no - you already said no. The efficiency argument is based off of algos that actually use more than memory, and not just that, but gaming as well. While mining isn't gaming, gaming has been optimized quite a bit over the years; when one brand is getting maxed, the other is as well. Go look up some hardware benchmarks - that's pretty fundamental stuff.
Genoil's miner isn't CUDA optimized. That was for Dagger, not Equihash. His endeavours in Equihash are focused on AMD hardware, as that's what he owns. It wasn't until recently that he made an Nvidia-compatible miner, and it's just a port of SAv5.
Alright, how about some sources for Equihash being hardware memory bus width locked that I haven't seen on BCT and that aren't extrapolated from a CPU miner or current rates of AMD hardware? You know the Fury also has a better processor than an R9 290? You also know that an RX 480 is basically a mid-range GPU with processing power to match (close to or a bit less than an R9 290)?
Do you also know that if you want to check whether an algo is memory limited, you can go into GPU-Z and check the load on the MCU (memory controller unit)? Mine sits at 38% at 108 sols on a 1070. If we want to take a page from your book and 'extrapolate' from that, that means there is potential for 284 sols on a 1070 - that is, IF it's completely memory bound and without any sort of optimizing for Nvidia hardware. NeoS also sits around 30% MCU usage. Dagger sits at 100% right before it trades off to more GPU and power usage (if you use a dual miner). Cryptonote also sits at 100% utilization. Weird - all the 'smart minds' and no one bothers checking the gauges.
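For what it's worth, that 284-sol figure is just straight division - a rough sketch, assuming hash rate scales linearly with MCU load (which is a big assumption):

```python
# Rough ceiling estimate: if the memory controller were the only
# bottleneck, hash rate at 100% MCU load would be roughly
# current_rate / current_load. Figures are the ones from the post
# above (GTX 1070 on Equihash).
def memory_bound_ceiling(current_rate, mcu_load):
    """Hash rate at which the memory controller would saturate."""
    return current_rate / mcu_load

print(memory_bound_ceiling(108, 0.38))  # ~284 sol/s
```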
Odd, I was mining Scrypt profitably with GPUs for a couple months into the Gridseed era - "private kernels" did NOT kill Scrypt mining.
Why yes, I DO base my "efficiency" numbers off current conditions - but I don't just look at ONE algorithm that's still new and not optimised for NVidia, I also look at others that ARE optimised for both and are similar in conditions.
Keep in mind that I SPECIFICALLY STATED "Genoil's miner" for ETH. Your comments about "that was Dagger" just show you didn't bother to read what I POSTED.
The RX 480 has faster (8000 MHz effective) but narrower (256-bit) memory than the R9 290 and R9 390, which gives it slightly better overall memory bandwidth than the R9 290 (5000 MHz effective at 384 bit) but slightly worse than the R9 390 (6000 MHz effective at 384 bit).
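Those bandwidth comparisons are easy to verify - peak theoretical bandwidth is just effective transfer rate times bus width, divided by 8 bits per byte. A quick back-of-envelope check using the figures above:

```python
# Peak theoretical bandwidth = effective rate (MT/s) * bus width (bits)
# / 8 bits-per-byte. Clocks and bus widths are the ones quoted above.
def bandwidth_gbs(effective_mts, bus_bits):
    return effective_mts * 1e6 * bus_bits / 8 / 1e9  # GB/s

for name, mts, bits in [("RX 480", 8000, 256),
                        ("R9 290", 5000, 384),
                        ("R9 390", 6000, 384)]:
    print(f"{name}: {bandwidth_gbs(mts, bits):.0f} GB/s")
# RX 480: 256 GB/s, R9 290: 240 GB/s, R9 390: 288 GB/s
```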
The RX 480 has somewhat FEWER compute cores (2304 vs. 2560 on the R9 290/390) but runs them at a quite a bit HIGHER clock rate - enough that its raw cores-times-clock throughput comes out roughly 15-20% ahead of the R9 390 and R9 290.
RX 480 and R9 390 are both PCI-E 3.0 cards, R9 290 is only PCI-E 2.0, but that has little or no measurable effect on most mining.
The RX 480 is NOT "close to or a bit less than an R9 290" but is in fact a superior card across the board except ONLY for memory bus width (which its much faster memory more than makes up for) - yet its speed on ETH and ZEC is almost identical to the R9 290's, definitely NOT the roughly 20% better that its cores-times-clock advantage would deliver on a compute-limited algorithm.
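To put a number on the compute gap - a sketch using published specs (2304 SPs at a ~1266 MHz boost clock for the RX 480, 2560 SPs at 947 MHz for the R9 290; the boost clocks are approximate). If the algorithm were compute-limited, hash rate should roughly track this ratio:

```python
# Relative raw shader throughput ~ cores * clock. Core counts are
# published specs; boost clocks are approximate.
def relative_compute(cores_a, clock_a_mhz, cores_b, clock_b_mhz):
    return (cores_a * clock_a_mhz) / (cores_b * clock_b_mhz)

# RX 480 vs R9 290 - yet their ETH hash rates are nearly identical
print(relative_compute(2304, 1266, 2560, 947))  # ~1.20
```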
On an actual compute-limited algorithm like SHA256 (which is still used by a few sites like GPUBoss as a benchmark), the RX 480 blows the R9 290 and R9 390 completely out of the water.
Might also want to pay attention to the R9 290x vs the R9 290 - they have the same memory system, but the 290x has 2816 cores to the 290's 2560 - yet it doesn't hash any faster than the R9 290 despite having 10% more cores.
Am I saying there isn't room for improvement on the NVIdia side for ZEC mining? Definitely not!
Am I saying I doubt that NVidia will surpass AMD on ZEC? Given the obvious "heavy memory usage for ASIC resistance" design of ZEC and the very similar memory systems on both sides, definitely.
Yes, I'm fully aware that the Fury X and Nano have 4096 cores and fairly high core clock rates (higher than most if not all R9 390s as I recall, definitely higher than any R9 290, but not quite as high as the RX 480) - which just MAGNIFIES my point, as they should be completely destroying anything else from AMD on both ETH and ZEC if the algorithms were compute-bound. In actual fact the RX 480 hashes ETH noticeably better and is close or better on ZEC in the benchmarks I've seen posted.
Apparently HBM1 has some latency issues that make it quite a bit slower than its "raw memory access speed" would indicate - which doesn't apply when comparing cards that all have GDDR5 to each other.
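The on-paper numbers illustrate the point: HBM1's raw bandwidth (4096-bit bus at 500 MHz, i.e. 1000 MT/s effective) is double the RX 480's GDDR5, yet the Fury cards don't hash proportionally better - so it must be latency and access patterns, not the headline figure, holding them back:

```python
# Same peak-bandwidth formula as before: MT/s * bus bits / 8 -> GB/s.
def bandwidth_gbs(effective_mts, bus_bits):
    return effective_mts * 1e6 * bus_bits / 8 / 1e9

print(bandwidth_gbs(1000, 4096))  # 512 GB/s - Fury X / Nano, HBM1
print(bandwidth_gbs(8000, 256))   # 256 GB/s - RX 480, GDDR5
```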