Well ... as I said, I have no IDE set up, so currently I can't compile a version for myself. If you don't have the time to fiddle around with my commits, then I really need help setting up an IDE on Windows. Have you got this in a readme or wiki, or can you give me a brief explanation of how to do this? I worked with MS VC++ Express as a hobby some time ago ...
You said local copy; is it a copy of the latest version of my fork? As you've observed, I am new to this way of working, but I hope you see my progress.
Dia
Compiling this on Windows is nothing short of a DISASTER, so forget it.
Anyway I fixed up a few things on my Diapolo branch on github. Pull the changes to bring your local tree into sync. Alas I'm still only getting HW errors, so there's clearly something wrong. The return code for giving me a nonce I use works fine, provided I'm testing for the right thing before sending the nonce back. I've stared at it for half a day and can't find what's wrong. I even tried diablo's kernel and encountered exactly the same problem. For some reason I keep thinking it's something to do with confusion about the initial offset of the nonce and what is passed to the kernel.
Okay, so as I wrote, if Phatk works, then the base nonces passed to the kernel should be correct for diakgcn. I will check phatk.cl to be sure. I saw you added a BITALIGN path to diakgcn that doesn't use bitalign() or any other OpenCL function, but simply does its thing directly. What is that for? I'm not sure it's needed for a GCN kernel anyway.
Another idea: are you applying a BFI_INT patch on Tahiti (it must not use amd_bytealign())? This is not needed and produces wrong values ... I want that damn thing working, I stared at it for quite a few hours too ^^.
Edit: Perhaps we could try my old approach of writing to output in the kernel, because I know that worked for me?
This is the code I used, adapted to use your NFLAG. The host side would need to scan the output buffer every time after a kernel execution, which could lead to higher CPU usage (and needs changes in host code), but it saves the IF-clause and another write into output (which saves the kernel quite a few instructions, even on GCN).
u result = (V[7] == 0x136032ed) * nonce;
output[NFLAG & result] = result;
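For reference, the host-side scan mentioned above could look roughly like the C sketch below. It is only an illustration, not the existing host code: `scan_output`, `OUTPUT_SLOTS` and the value `0x7F` for NFLAG are assumptions made here, as is the idea that the host zeroes the buffer before each launch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: NFLAG is a low-bit mask and the output buffer
 * has NFLAG + 1 slots that the host zeroes before each kernel launch. */
#define NFLAG 0x7Fu
#define OUTPUT_SLOTS (NFLAG + 1u)

/* Hypothetical helper: copy every non-zero slot of `output` into `found`
 * (at most `max_found` entries), clear the slot for the next launch, and
 * return how many candidate nonces were collected. */
static size_t scan_output(uint32_t *output, uint32_t *found, size_t max_found)
{
    size_t count = 0;
    for (size_t i = 0; i < OUTPUT_SLOTS && count < max_found; i++) {
        if (output[i] != 0) {
            found[count++] = output[i];
            output[i] = 0;
        }
    }
    return count;
}
```

One thing to keep in mind with the unconditional write: every work-item without a hit stores 0 into slot 0, so a valid nonce whose masked bits are all zero would land in that same slot and could be overwritten.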
The following code would be more like your current code, but uses the compare-and-multiply approach to store either 0 or a positive nonce in result (and is slower than your current code). But surely that can't be the problem we are looking for ...
u result = (V[7] == 0x136032ed) * nonce;
if (result)
    output[FOUND] = output[NFLAG & result] = result;
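To compare the two variants off the GPU, they can be emulated on the CPU in plain C, roughly as sketched below. The values `0x7F` for NFLAG and `0x80` for FOUND, and the helper names, are assumptions for illustration; only the constant 0x136032ed is taken from the snippets. The sketch covers the scalar case, where a true comparison evaluates to 1. For OpenCL vector types a true comparison yields -1 per lane, so the multiply would need adjusting there.

```c
#include <stdint.h>

#define NFLAG    0x7Fu        /* assumed slot mask */
#define FOUND    0x80u        /* assumed index of the "something found" slot */
#define TARGET_H 0x136032edu  /* constant from the kernel snippets */

/* Compare-and-multiply: yields nonce on a match, 0 otherwise (scalar case). */
static uint32_t select_nonce(uint32_t v7, uint32_t nonce)
{
    return (uint32_t)(v7 == TARGET_H) * nonce;
}

/* Variant 1: unconditional store, host has to scan the whole buffer. */
static void store_unconditional(uint32_t *output, uint32_t v7, uint32_t nonce)
{
    uint32_t result = select_nonce(v7, nonce);
    output[NFLAG & result] = result;
}

/* Variant 2: store only on a hit and also record the nonce at FOUND. */
static void store_with_flag(uint32_t *output, uint32_t v7, uint32_t nonce)
{
    uint32_t result = select_nonce(v7, nonce);
    if (result)
        output[FOUND] = output[NFLAG & result] = result;
}
```

Both variants treat a result of 0 as "no hit", so a matching nonce of exactly 0 would go unreported in either one.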
Dia