Yes, that is interesting. I'm guessing you have underclocked your memory exceptionally low, as that was found to be an issue with the use of atomic ops. Some people found that raising the memory clock by 15 MHz was enough to correct it. Without the atomic functions there, you could get HW errors and lost shares. It's a tradeoff either way. The change was put in to make sure no shares were lost, which can happen with the old OpenCL code (though only a very small number would be lost).
Ah, OK! Thanks for the info. Yep, I'm at a 150 MHz memory clock. It's to prevent simultaneous nonce finds on different vectors from overwriting the result at the same address, right?
I prefer the tradeoff, tbh. I did the math a while ago on the probability of that happening (P = 1/2^32 × 1/2^32 = 1/2^64); on a 1 GH/s card, that would happen on average once every ~585 years.
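Spelled out, that estimate follows the poster's own model: 10^9 hashes per second, each treated as one independent trial with probability 2^-64 of a double find, so the expected waiting time is the reciprocal of the rate. This is only the arithmetic under that assumption, not a check of whether the model fits the kernel.

```latex
P = \frac{1}{2^{32}} \cdot \frac{1}{2^{32}} = \frac{1}{2^{64}},
\qquad
T \approx \frac{2^{64}}{10^{9}\ \text{s}^{-1}}
  \approx 1.84 \times 10^{10}\ \text{s}
  \approx 585\ \text{years}
```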
I'm still using that optimization tradeoff, which I posted more than a year ago!
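#elif defined VECTORS2
    // Per lane: W[117] == 0 means that lane found a share; W[3] holds its nonce.
    uint result = W[117].x ? 0u : W[3].x;
    // If lane y also found a share, its nonce overwrites lane x's.
    result = W[117].y ? result : W[3].y;
    // A nonce of 0 is indistinguishable from "nothing found" here.
    if (result)
        SETFOUND(result);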
No, you're not quite right there, btw. There are a few issues that made me use the atomic ops instead:
- There is no way to return a nonce value of 0, since a result of 0 is treated as "nothing found".
- Bitmasked nonce values can also be zero, which means they get lost.
- It is not just vector lanes that find nonces at the same time; a whole wavefront of threads can find nonces at the same time and corrupt both values.
- Bitmasked nonce values from results found in the same global work size can come out as the same value and overwrite each other.
- The atomic ops also consolidate the return values from different kernels and decrease the CPU usage of the return code that checks the nonce values.
Again, the probability is very small, but far from 1 in 2^64. Since Bitcoin mining is a game of odds, I didn't see the point of losing those shares, provided you don't drop the hashrate, of course. It's unusual that some devices need a higher memory clock just for one atomic op, but clearly it's a very memory-intensive operation that affects the whole wavefront. Considering that raising the memory clock by 15 or 20 MHz would barely register in extra power usage or heat, it seems the better option, to me at least.
But the beauty of free software is that you can do whatever you like with the code if you don't like the way I do it.