#elif defined VECTORS2
uint result = W[117].x ? 0u:W[3].x;
result = W[117].y ? result:W[3].y;
if (result)
	SETFOUND(result);
No, you're not quite right there btw. There are a few issues that made me use the atomic ops instead.
There is no way to return a nonce value of 0.
Bitmasked nonce values can also be zero meaning they get lost.
It is not just vectors that find nonces at the same time, it's a whole wave front of threads finding nonces at the same time and corrupting both values.
Bitmasked nonce values from results found in the same global worksize can come out the same value and overwrite each other.
It's to consolidate the return values from different kernels and decrease the CPU usage of the return code that checks the nonce values.
Again, very small but far from 2^64. Since bitcoin mining is a game of odds, I didn't see the point of losing that, provided you don't drop the hashrate of course. It's unusual that some devices need a higher memory speed just for one atomic op, but clearly it's a massively memory-intensive operation that affects the whole wavefront. Considering that increasing RAM speed by 15 or 20 would not even register in terms of extra power usage and temperature generated, to me at least it seems a better option.
But the beauty of free software is you can do whatever you like to the code if you don't like the way I do it.
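To make the overwrite issues listed in the quoted reply concrete, here is a minimal host-side sketch in plain C11. It is not cgminer's kernel or return code: the buffer size SLOTS, the struct found_buf and every function name are hypothetical, chosen only for illustration. Scheme A indexes the result buffer by the masked nonce and treats 0 as "empty", so a nonce of 0 is invisible and two nonces with the same masked bits overwrite each other. Scheme B reserves a slot through an atomic counter (the same idea as using atomic_add on a counter in the kernel), so every found nonce gets its own slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SLOTS 128u            /* hypothetical result buffer size */
#define MASK  (SLOTS - 1u)    /* low 7 bits of the nonce select a slot */

static_assert((SLOTS & MASK) == 0, "SLOTS must be a power of two");

/* Scheme A: slot = nonce & MASK, and a stored 0 means "nothing found". */
static void store_masked(uint32_t *buf, uint32_t nonce)
{
    buf[nonce & MASK] = nonce;    /* same masked bits -> overwrite */
}

/* Count the results the host would see in a scheme A buffer. */
static unsigned count_masked(const uint32_t *buf)
{
    unsigned n = 0;
    for (uint32_t i = 0; i < SLOTS; i++)
        if (buf[i] != 0)          /* a genuine nonce of 0 is lost here */
            n++;
    return n;
}

/* Scheme B: an atomic counter hands out one slot per found nonce. */
struct found_buf {
    atomic_uint count;
    uint32_t nonce[SLOTS];
};

static void found_buf_reset(struct found_buf *fb)
{
    atomic_init(&fb->count, 0u);
}

static void store_atomic(struct found_buf *fb, uint32_t nonce)
{
    /* fetch_add returns the old value, which becomes this writer's slot */
    unsigned idx = atomic_fetch_add(&fb->count, 1u);
    if (idx < SLOTS)              /* drop results past the end of the buffer */
        fb->nonce[idx] = nonce;
}

/* Number of valid entries the host needs to check in a scheme B buffer. */
static unsigned count_atomic(struct found_buf *fb)
{
    unsigned n = atomic_load(&fb->count);
    return n < SLOTS ? n : SLOTS;
}
```

Storing 0x105, 0x285 and 0 with scheme A leaves a single visible result, because 0x105 and 0x285 share slot 5 and the 0 looks like an empty slot. Scheme B keeps all three, and the host only has to read the counter and then that many entries instead of scanning the whole buffer.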
Thanks for the detailed explanation!
Some more food for thought: I think the bitmasked stuff was probably the biggest problem (because the fewer 1s the nonce has, the higher the probability of it being lost, IIRC). That's why the code above does no bitmasking when checking for nonces; it uses the SETA opcode, IIRC (the C "?:" operator gets a specific GPU ISA opcode).
A specific nonce value of 0 also happens at a rate of P = 1/(2^64) [P_finding_nonce = 1/(2^32), P_nonce_is_all_zeros = 1/(2^32)], so that's also fine by me.
About the global worksize bitmasked problem: if not using bitmasks, the only way for overwrites to happen would be if two bitwise-identical nonces were found, correct?
I will also try the tiny mem o/c to see the hashrate difference when using the atomic_add. I like to try all angles to solve a problem, thanks!
Btw, since I haven't yet, 1 BTC donation sent!
I can only imagine the number of hours you spent writing and optimizing cgminer's code!