Okay, some more info on the default-display hang bug in Linux. I actually think there's a shit fight going on between the X server, poclbm, fglrx and the kernel.
I truly thought the hardware behind GPU0 (the default display) was fried, because when I swapped the two 5970 cards, the hang on launch followed that core, even though it was no longer the display adapter and was now just another node ... go figure.
After a complete re-install to get back to my known-good set-up (fglrx 10.12, ATI Stream SDK 2.1, yada yada), I launched a poclbm instance on the default display adapter ... yay, it works, hardware not fried.
But wait: after a few moments of moving the mouse, dragging an xterm around or something, it hangs. @#$%! I restart, try to launch poclbm on GPU0 again, and this time it hangs the system immediately. Back to square one, staring down the barrel of a complete system reinstall.
One more thing to try: have you heard about the "too cold to run" 5970 crash bug, all you water-cooler guys?
http://www.tomshardware.com/forum/293030-33-5970-freezes-running-watercooled
So, on the outside chance the card was too cold, I launched an instance of gears:
$ fgl_glxgears
By default it was cranking on GPU0, so after a few seconds I launched poclbm on GPU0 and it works, yay, eh, wtf?! I shut down fgl_glxgears and poclbm on GPU0 kept going, still hashing away like crazy; looks like it just needed a starter motor to get it fired up! I tried this several times and it works reliably, but one time I used -f 5 and left fgl_glxgears running longer, and it all froze again. In amongst all this freezing, hanging and restarting, sometimes the system doesn't come back up, or it throws an ugly kernel crash and a string of errors on reboot ... so it's not the too-cold bug. I think there is a kernel conflict between the X server running on the default display, fglrx and poclbm (maybe pyOpenCL has something to answer for too).
The high CPU usage of the process on the default display is the big hint here: what the hell is it churning on the CPU for when the poclbm process is running on GPU0?
$ ps -A
...
2001 pts/5 00:01:51 poclbm.py
2004 pts/1 00:00:18 poclbm.py
2007 pts/2 00:00:18 poclbm.py
2010 pts/3 00:00:18 poclbm.py
...
All four were launched nearly simultaneously, but process 2001 is the one running on the default display adapter (started using the patented fgl_glxgears kick-starter method), and it has racked up 1:51 of CPU time against 0:18 for each of the others.
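To make that comparison without scrolling through the whole `ps -A` list, a minimal sketch like the one below should do. It assumes the procps `ps` that ships with most Linux distros, and that poclbm was started as `./poclbm.py` so its command name shows up as `poclbm.py` (as in the listing above). The function name `busiest` is just made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list processes with a given command name, busiest first.
# TIME = cumulative CPU time, %CPU = CPU time used / time the process has run.
# Assumes procps ps (-C, -o and --sort are procps options).
busiest() {
    ps -C "${1:-poclbm.py}" -o pid,tty,time,pcpu,comm --sort=-pcpu
}
```

Running `busiest` with no argument lists the poclbm.py processes, so a core whose process is chewing far more CPU than its siblings sits at the top.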
EDIT/UPDATE: I swapped the hardware again and the hanging process followed the dodgy GPU again, i.e. it's not a default-display-adapter problem but a weird GPU core. Get this: the core will not run poclbm.py unless I first launch fgl_glxgears (maybe other graphics-intensive code would work too, but I haven't tested it) on the display associated with that core (e.g. $ export DISPLAY=:0.2; fgl_glxgears) to "kickstart" it. I also noticed that this core runs 1-2 Mhash/s FASTER than the other cores while it is crunching, but then it eventually hangs: sometimes after 30-60 minutes, sometimes after only 10.
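If you want the kickstart to be the same every time, it can be scripted. This is a minimal sketch, not a fix, and it assumes: `fgl_glxgears` is on the PATH (it comes with fglrx), the flaky core's X screen is something like `:0.2` as in the example above, and you pass your normal poclbm command line through unchanged. The function name `kickstart` and the `KICK_SECS` / `GEARS_CMD` variables are invented for this example, not fglrx or poclbm options. The gears get stopped after a short, fixed time since leaving them running longer preceded one of the freezes above.

```shell
#!/bin/sh
# Hypothetical "kickstart" helper: put a 3D load on the core, start the miner,
# then stop the 3D load after a short, fixed time.
# Usage: kickstart DISPLAY miner-command [args...]
# Assumptions: fgl_glxgears is on PATH; KICK_SECS and GEARS_CMD are made-up
# knobs for this sketch. GEARS_PID and MINER_PID are left set for the caller.
kickstart() {
    disp="$1"
    shift
    kick_secs="${KICK_SECS:-5}"
    gears_cmd="${GEARS_CMD:-fgl_glxgears}"

    # Point everything started from here at the flaky core's X screen.
    DISPLAY="$disp"
    export DISPLAY

    # 1. Start the 3D load in the background (left unquoted on purpose so
    #    GEARS_CMD may carry its own arguments).
    $gears_cmd &
    GEARS_PID=$!

    # 2. Give the core a few seconds of 3D work before OpenCL touches it.
    sleep "$kick_secs"

    # 3. Start the miner in the background with the caller's command line.
    "$@" &
    MINER_PID=$!

    # 4. Stop the 3D load again after a short, fixed time and reap it.
    sleep "$kick_secs"
    kill "$GEARS_PID" 2>/dev/null || true
    wait "$GEARS_PID" 2>/dev/null || true
}
```

After sourcing the file (say `kickstart.sh`), usage would be something like `kickstart :0.2 ./poclbm.py` followed by whatever options you normally give poclbm for that core; `$MINER_PID` then holds the miner's PID.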
Since it is no longer the default display GPU, it doesn't hang the system unless I try to kill it (Ctrl+C) or reboot, at which point it needs a hard power restart, euchh. So most likely an ultra-sensitive GPU core, but not so dodgy that it won't run, and since it's a fast core it's tempting to see if I can trick it into stabilising ... any ideas, hardware tech heads? NB: I have also tried under-clocking and over-clocking the same core, with the same results ... GPU BIOS problem maybe? ... anyone been under the hood there?