I still need to understand how the signing works.
There is actually not much to understand. I moved most of the signing work out of the loop. On each iteration you just need to compute (a + b) * c mod d, where a, b, c and d are 256-bit numbers. Long addition is trivial, multiplication shouldn't be too hard, and finding the remainder is the hardest part.
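As a rough illustration of those three pieces, here is a minimal Python sketch that works on 32-bit limbs the way a GPU kernel would have to. Everything in it is an assumption made for illustration: the names (to_limbs, add, mul, mod, step), the 8 x 32-bit limb layout, and the bit-by-bit shift-and-subtract remainder, which is the simplest method rather than a fast one. It is not the code from the actual miner.

```python
# Sketch only: (a + b) * c mod d on 256-bit values stored as little-endian
# lists of 32-bit limbs. All names and the limb layout are hypothetical.
LIMBS = 8              # 8 x 32 bits = 256 bits
MASK = 0xFFFFFFFF

def to_limbs(x, n=LIMBS):
    """Split a non-negative int into n little-endian 32-bit limbs."""
    return [(x >> (32 * i)) & MASK for i in range(n)]

def from_limbs(limbs):
    """Rebuild an int from little-endian 32-bit limbs."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))

def add(a, b):
    """Long addition; the result has one extra limb for the final carry."""
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK)
        carry = s >> 32
    out.append(carry)
    return out

def mul(a, b):
    """Schoolbook multiplication; the result has len(a) + len(b) limbs."""
    out = [0] * (len(a) + len(b))
    for i, x in enumerate(a):
        carry = 0
        for j, y in enumerate(b):
            t = out[i + j] + x * y + carry
            out[i + j] = t & MASK
            carry = t >> 32
        out[i + len(b)] = carry
    return out

def geq(a, b):
    """Return True if a >= b (both lists must have the same length)."""
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            return x > y
    return True

def sub(a, b):
    """Long subtraction a - b, assuming a >= b and equal lengths."""
    out, borrow = [], 0
    for x, y in zip(a, b):
        t = x - y - borrow
        out.append(t & MASK)
        borrow = 1 if t < 0 else 0
    return out

def mod(n, d):
    """Remainder of n / d by shift-and-subtract, one bit at a time (d != 0)."""
    d = d + [0]                    # one spare limb so r << 1 cannot overflow
    r = [0] * len(d)
    for i in reversed(range(32 * len(n))):
        carry = (n[i // 32] >> (i % 32)) & 1   # next bit of n, MSB first
        for k in range(len(r)):                # r = (r << 1) | bit
            t = (r[k] << 1) | carry
            r[k] = t & MASK
            carry = t >> 32
        if geq(r, d):
            r = sub(r, d)
    return r[:-1]                  # r < d, so the spare limb is zero

def step(a, b, c, d):
    """Compute (a + b) * c mod d for ints below 2**256, with d != 0."""
    s = add(to_limbs(a), to_limbs(b))
    return from_limbs(mod(mul(s, to_limbs(c)), to_limbs(d)))
```

Calling step(a, b, c, d) with ordinary Python ints should agree with (a + b) * c % d, which makes the limb routines easy to check before porting them. The remainder loop runs once per bit of the 544-bit product, which fits the remark that it is the hardest (and here the slowest) part; a real kernel would probably want something faster.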
Forgot to mention: you don't really need an external API for the GPU. Since OpenCL is compiled on the fly, you can implement it inside the wallet. Just work out an internal API that makes it easy to link different miners to the wallet. (As a SpreadCoin "enthusiast", I'd prefer that, since it would keep the cost high for botnets.)
This sounds reasonable: the miner as a static or dynamic library. It could also have the advantage that you can switch to a new block faster, and so spend less time mining stale blocks than with these JSON-RPC calls. This makes things more complicated for botnets, but it still doesn't prevent anyone from actually separating the miner and the wallet.