I also briefly loaded the Wolf0 4-way kernel that he ships with sgminer-gm into this. Performance on the RX 470 seems just a bit better.
PLEASE EXPLAIN THE "4-WAY KERNEL"--
You may share things with developers, but please give a simple explanation of what "4-way" means. I've tried the sgminer that was released; it runs better on one of my 280X rigs and is more stable. I will likely convert it to an RX 470 rig, but I'd like to know more about the software that I am running. --scryptr
When hashing, a GPU runs many threads in parallel, where basically each thread tries to solve the same puzzle with a slightly different input (the nonce, a number that increases by one from each thread to the next). Some parts of some algorithms can be made to run faster when groups of threads temporarily join forces to do a bit of the work. You can compare it with hauling heavy boxes. If 4 people have to haul 4 very heavy boxes from A to B, they may actually be faster carrying each box with all 4 people at the same time than if each person carried only their own box.
Ethash was originally designed to haul boxes with 8 people at the same time (8-way), while Wolf0 made a 4-way variant. Of course Wolf's 4 threads still have to do the same amount of work per hash as the original kernel's 8, but it can be a little bit more efficient. Ultimately (I think) he did this to prepare for his private ETH kernel, because that kernel uses a different way of sharing the work (coordinating the joint box-hauling operation) between 4 adjacent threads, and 8 isn't supported.
If I recall correctly, Etar from the Etarminer (CUDA ETH miner) has a 16-way solution, and who knows what Claymore did.
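For anyone who prefers code to boxes, here is a toy Python model of the idea. It is only a sketch under my own assumptions: it is not the real Ethash kernel and not Wolf0's code, and the names (`make_table`, `hash_solo`, `hash_group`), the row width of 8 words and the table contents are all made up for illustration. The only thing borrowed from Ethash is the FNV-style mixing step. The point it shows is that a group of `ways` threads working on one nonce at a time, each holding a slice of the mix and reading only its slice of each table row, gets exactly the same result as one thread doing everything alone.

```python
MASK = 0xFFFFFFFF
WIDTH = 8                 # words per table row (toy value, an assumption)
FNV_PRIME = 0x01000193


def fnv(a, b):
    """FNV-style mixing step, truncated to 32 bits."""
    return ((a * FNV_PRIME) ^ b) & MASK


def make_table(rows, seed=1):
    """Deterministic stand-in for the big lookup table (hypothetical data)."""
    table, x = [], seed
    for _ in range(rows):
        row = []
        for _ in range(WIDTH):
            x = (x * 1664525 + 1013904223) & MASK
            row.append(x)
        table.append(row)
    return table


def fold(mix):
    """Compress the full mix into a single 32-bit result."""
    out = 0
    for w in mix:
        out = fnv(out, w)
    return out


def hash_solo(nonce, table, rounds=16):
    """One thread carries its own box: it holds and updates the whole mix."""
    mix = [(nonce + i) & MASK for i in range(WIDTH)]
    for r in range(rounds):
        row = table[fnv(r, mix[r % WIDTH]) % len(table)]
        mix = [fnv(m, w) for m, w in zip(mix, row)]
    return fold(mix)


def hash_group(nonces, table, ways, rounds=16):
    """`ways` threads carry each box together, one nonce at a time."""
    assert WIDTH % ways == 0 and len(nonces) == ways
    per_lane = WIDTH // ways          # words of the mix held by each thread
    results = []
    for nonce in nonces:              # the group handles each member's nonce in turn
        slices = [[(nonce + lane * per_lane + j) & MASK for j in range(per_lane)]
                  for lane in range(ways)]
        for r in range(rounds):
            # The thread that holds the selected word shares the row index
            # with the rest of the group.
            lane, offset = divmod(r % WIDTH, per_lane)
            row = table[fnv(r, slices[lane][offset]) % len(table)]
            # Every thread reads and mixes only its own part of the row.
            for k in range(ways):
                chunk = row[k * per_lane:(k + 1) * per_lane]
                slices[k] = [fnv(m, w) for m, w in zip(slices[k], chunk)]
        results.append(fold([w for s in slices for w in s]))
    return results


if __name__ == "__main__":
    table = make_table(64)
    nonces = list(range(100, 108))
    solo = [hash_solo(n, table) for n in nonces]
    print("4-way matches solo:", hash_group(nonces[:4], table, 4) == solo[:4])
    print("8-way matches solo:", hash_group(nonces, table, 8) == solo)
```

In this model the loop over threads runs one after another, so it says nothing about speed. On a real GPU the threads in a group run in lockstep, and whether 4, 8 or 16 is fastest depends on how the hardware handles the memory reads and the sharing of the row index between threads.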