python OpenCL bitcoin miner - page 52.

jef.blanc

newbie

Activity: 29

Merit: 0

Hello, these are the errors I had when running poclbm from poclbm_py2exe.7z with the original OpenCL.dll inside.

The 7z archive contains the DLL from the ATI driver, so with an nvidia card I had to replace it with the one provided by my nvidia driver, which is in C:\Windows\System32\.

Is that your case: are you running poclbm from the poclbm_py2exe.7z package?

What I had to do to use poclbm:
- build pyopencl for my system; to do so I installed the latest CUDA Toolkit from http://developer.nvidia.com/object/cuda_3_2_toolkit_rc.html#Windows%20XP,%20Windows%20Vista%20and%20Windows7
- use the latest driver (Developer Drivers) from http://developer.nvidia.com/object/cuda_3_2_toolkit_rc.html#Windows%20XP,%20Windows%20Vista%20and%20Windows7 ; without it I had the following error:

Code:

ImportError: DLL load failed: La procÚdure spÚcifiÚe est introuvable.

- use the latest version of poclbm from http://github.com/m0mchil/poclbm ; a correction has been made so that poclbm runs on nvidia GPUs without errors

Guybrush01

newbie

Activity: 20

Merit: 0

Hi there!

Trying to use this, but here is the error I see:

Code:

D:\pycuda>poclbm.exe
No device specified, you may use -d to specify one of the following

Traceback (most recent call last):
  File "poclbm.py", line 58, in <module>
  File "pyopencl\__init__.pyc", line 138, in program_build
pyopencl.RuntimeError: clBuildProgram failed: build program failure

Build on :

:52: error: incompatible type assigning 'int', expected 'uint2'
        work[5]=0x00000000;
               ^~~~~~~~~~~
:53: error: incompatible type assigning 'int', expected 'uint2'
        work[6]=0x00000000;
               ^~~~~~~~~~~
:54: error: incompatible type assigning 'int', expected 'uint2'
        work[7]=0x00000000;
               ^~~~~~~~~~~
:55: error: incompatible type assigning 'int', expected 'uint2'
        work[8]=0x00000000;
               ^~~~~~~~~~~
:56: error: incompatible type assigning 'int', expected 'uint2'
        work[9]=0x00000000;
               ^~~~~~~~~~~
:57: error: incompatible type assigning 'int', expected 'uint2'
        work[10]=0x00000000;
                ^~~~~~~~~~~
:58: error: incompatible type assigning 'int', expected 'uint2'
        work[11]=0x00000000;
                ^~~~~~~~~~~
:59: error: incompatible type assigning 'int', expected 'uint2'
        work[12]=0x00000000;
                ^~~~~~~~~~~
:60: error: incompatible type assigning 'int', expected 'uint2'
        work[13]=0x00000000;
                ^~~~~~~~~~~
:61: error: incompatible type assigning 'int', expected 'uint2'
        work[14]=0x00000000;
                ^~~~~~~~~~~
:62: error: incompatible type assigning 'int', expected 'uint2'
        work[15]=0x00000280;
                ^~~~~~~~~~~
:141: error: incompatible type assigning 'int', expected 'uint2'
        work[9]=0x00000000;
               ^~~~~~~~~~~
:142: error: incompatible type assigning 'int', expected 'uint2'
        work[10]=0x00000000;
                ^~~~~~~~~~~
:143: error: incompatible type assigning 'int', expected 'uint2'
        work[11]=0x00000000;
                ^~~~~~~~~~~
:144: error: incompatible type assigning 'int', expected 'uint2'
        work[12]=0x00000000;
                ^~~~~~~~~~~
:145: error: incompatible type assigning 'int', expected 'uint2'
        work[13]=0x00000000;
                ^~~~~~~~~~~
:146: error: incompatible type assigning 'int', expected 'uint2'
        work[14]=0x00000000;
                ^~~~~~~~~~~
:147: error: incompatible type assigning 'int', expected 'uint2'
        work[15]=0x00000100;
                ^~~~~~~~~~~
:149: error: incompatible type assigning 'int', expected 'uint2'
        A=0x6a09e667;
         ^~~~~~~~~~~
:151: error: incompatible type assigning 'int', expected 'uint2'
        C=0x3c6ef372;
         ^~~~~~~~~~~
:153: error: incompatible type assigning 'int', expected 'uint2'
        E=0x510e527f;
         ^~~~~~~~~~~
:155: error: incompatible type assigning 'int', expected 'uint2'
        G=0x1f83d9ab;
         ^~~~~~~~~~~
:156: error: incompatible type assigning 'int', expected 'uint2'
        H=0x5be0cd19;

Wish I were smart enough to understand what that means!

Any ideas?

jef.blanc

newbie

Activity: 29

Merit: 0

Ah, yes, I didn't check that before speaking.

1800 * 7.5 = 13500, which is not so far from my 15200, so it is logical after all.

ArtForz

sr. member

Activity: 406

Merit: 257

9400GT 16SPs @ 1.4GHz
9800GT 112SPs @ 1.5GHz
-> a 9800GT is about 7.5x 9400GT ...
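
A quick way to check that ratio, as a rough sketch: it assumes throughput scales with the number of stream processors times the shader clock, which is only an approximation and ignores other architectural differences. The function name is made up for this example.

```python
# Rough model (an assumption): relative throughput ~ stream processors * shader clock.
# The SP counts and clocks are the figures quoted above.
def relative_throughput(stream_processors, clock_ghz):
    return stream_processors * clock_ghz

ratio = relative_throughput(112, 1.5) / relative_throughput(16, 1.4)
print(round(ratio, 2))

# Scaling the 1800 khash/s reported in this thread for the 9400 GT by that ratio
estimate_khash = round(1800 * ratio)
print(estimate_khash)
```

Under that model the estimate lands in the same range as the roughly 15200 khash/s reported for the 9800 GT.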

jef.blanc

newbie

Activity: 29

Merit: 0

Only 1800 kh/s? That seems very low compared to my 9800 GT, which gives me about 15200 kh/s.

Perhaps for some strange reason your GPU is not fully used. If your graphics card reports this information, you can check the GPU load with GPU-Z, in the "Sensors" tab (http://www.techpowerup.com/gpuz/).

sandos

sr. member

Activity: 440

Merit: 250

Quote from: sandos on October 19, 2010, 03:29:01 PM

Quote from: jef.blanc on October 19, 2010, 11:27:11 AM

you’re welcome

The new code you provided runs on my computer.

OpenCL-wise it seems to work fine on my 9400 GT, now I just have trouble compiling bitcoind

Ah, it worked; it was just using another port.

I knew the 9400 GT would be slow relative to other cards, but maybe not this slow: I get 1800kh/s, which is slower than my cpu =)

I was hoping to maybe double performance, but oh well.

sandos

sr. member

Activity: 440

Merit: 250

Quote from: jef.blanc on October 19, 2010, 11:27:11 AM

you’re welcome

The new code you provided runs on my computer.

OpenCL-wise it seems to work fine on my 9400 GT, now I just have trouble compiling bitcoind

jef.blanc

newbie

Activity: 29

Merit: 0

you’re welcome

Thank you for a better code design than mine :)

The new code you provided runs on my computer.

m0mchil

full member

Activity: 171

Merit: 127

Thanks Jef! There is a new version that should work on Nvidia now. I also changed the default getwork request rate to 5 seconds to reduce the risk of solving a stale block. The patch itself now has one more check so that it does not show 'proof of work found' in this case.

Can someone with Nvidia please test this, it should resolve the 'code selection failed to select' issue.

jef.blanc

newbie

Activity: 29

Merit: 0

Hello again, thanks for the code,

but it still doesn't work :(

It seems Nvidia doesn't like reversing numbers.

However, if I comment out any one of the four assignments, it builds.

After trying to decompose the reversal in many ways, I abandoned that path and worked around the problem instead.

I made a first version where the byte reversal and the comparisons are done in the python script, and a second version with a function that compares a word directly with a reversed word in the OpenCL code. For now this second version assumes one particular byte order in memory: indexing r[0] against n[3] is correct for a little-endian layout (I hope that is what the GPUs use).

Note: I added some prints to the screen to show some information when running the script.
I also left some comments in the code.

The two versions build and run without error, but I'm not sure that the operations are correct.
As expected, the second version is more efficient (about 14800 khash/s with the 1st version, and 15200 khash/s with the 2nd version).

Version 1: byte reversal and comparison in the python script

Code:

--- F:/Desktop/m0mchil-poclbm-cf33815/poclbm.py    Fri Oct 15 08:48:48 2010
+++ F:/Desktop/m0mchil-poclbm-cf33815/poclbm_nv1.py    Mon Oct 18 23:36:46 2010
@@ -21,6 +21,9 @@
     t2=(rot(a, 30)^rot(a, 19)^rot(a, 10))+((a&b)|(c&(a|b)))
     return (uint32(d + t1), uint32(t1+t2))
 
+def bytereverse(x):
+    return (((x & 0x000000ffL) << 24) | ((x & 0x0000ff00L) << 8) | ((x & 0x00ff0000L) >> 8) | ((x & 0xff000000L) >> 24))
+
 def sysWrite(format, args=()):
     sys.stdout.write('\r                                        \r' + format % args)
     sys.stdout.flush()
@@ -50,16 +53,40 @@
     context = cl.Context([devices[options.device]], None, None)
 else:
     print 'No device specified, you may use -d to specify one of the following\n'
+    # create_some_context should ask for the platform and device to use
     context = cl.create_some_context()
+
+device = context.devices[0]
+print "==============================================================="
+print "Platform name:", platform.name
+print "Platform profile:", platform.profile
+print "Platform vendor:", platform.vendor
+print "Platform version:", platform.version
+print "---------------------------------------------------------------"
+print "Device name:", device.name
+print "Device type:", cl.device_type.to_string(device.type)
+print "Device memory: ", device.global_mem_size//1024//1024, 'MB'
+print "Device max clock speed:", device.max_clock_frequency, 'MHz'
+print "Device compute units:", device.max_compute_units
+print "==============================================================="
+
 queue = cl.CommandQueue(context)
 
-kernelFile = open('btc_miner.cl', 'r')
+kernelFile = open('btc_miner_nv1.cl', 'r')
 miner = cl.Program(context, kernelFile.read()).build()
 kernelFile.close()
 
+print "==============================================================="
+print "build OK"
+print "==============================================================="
+
 if (options.worksize == -1):
     options.worksize = miner.search.get_work_group_info(cl.kernel_work_group_info.WORK_GROUP_SIZE, context.devices[0])
 
+print "==============================================================="
+print "worksize = ", options.worksize
+print "==============================================================="
+
 frames = options.frames
 frame = float(1)/frames
 window = frame/30
@@ -75,7 +102,7 @@
 work['extraNonce'] = 0
 work['block'] = ''
 
-output = np.zeros(2, np.uint32)
+output = np.zeros(6, np.uint32)
 
 while True:
     try:
@@ -117,10 +144,12 @@
     output_buf = cl.Buffer(context, mf.WRITE_ONLY | mf.USE_HOST_PTR, hostbuf=output)
 
     rate = start = time()
+    success = 0
+    nonce = 0
     while True:
-        if (output[0]):
-            work['block'] = work['block'][:152] + pack('I', long(output[1])).encode('hex') + work['block'][160:]
-            sysWriteLn('found: %s, %s', (output[1], datetime.now().strftime("%d/%m/%Y %H:%M")))
+        if (success):
+            work['block'] = work['block'][:152] + pack('I', long(nonce)).encode('hex') + work['block'][160:]
+            sysWriteLn('found: %s, %s', (nonce, datetime.now().strftime("%d/%m/%Y %H:%M")))
             break
 
         if (time() - start > options.askrate or base + globalThreads == 0x7FFFFFFF):
@@ -131,8 +160,20 @@
             base = 0x7FFFFFFF - globalThreads
 
         kernelStart = time()
-        miner.search(queue, (globalThreads, ), (options.worksize, ), block2[0], block2[1], block2[2], state[0], state[1], state[2], state[3], state[4], state[5], state[6], state[7], state2[1], state2[2], state2[3], state2[5], state2[6], state2[7], target[6], pack('I', base), output_buf)
+        miner.search(queue, (globalThreads, ), (options.worksize, ), block2[0], block2[1], block2[2], state[0], state[1], state[2], state[3], state[4], state[5], state[6], state[7], state2[1], state2[2], state2[3], state2[5], state2[6], state2[7], pack('I', base), output_buf)
         cl.enqueue_read_buffer(queue, output_buf, output).wait()
+        #target = target[6]
+        #H.x = output[0] ; H.y = output[1]
+        #G.x = output[2] ; G.y = output[3]
+        #nonce.x = output[4] ; nonce.y = output[5]
+        if (output[0] == 0 and bytereverse(output[2])<=target[6]):
+            success = 1
+            nonce = output[4]
+        
+        if (output[1] == 0 and bytereverse(output[3])<=target[6]):
+            success = 1
+            nonce = output[5]
+        
         kernelTime = time() - kernelStart
 
         if (kernelTime < lower):
@@ -142,4 +183,4 @@
 
         if (time() - rate > options.rate):
             rate = time()
-            sysWrite('%s khash/s', int((base / (time() - start)) / 500))
\ No newline at end of file
+            sysWrite('%s : %s khash/s', (datetime.now().strftime("%d/%m/%Y %H:%M:%S"),int((base / (time() - start)) / 500)))
\ No newline at end of file

Code:

--- F:/Desktop/m0mchil-poclbm-cf33815/btc_miner.cl    Fri Oct 15 08:48:48 2010
+++ F:/Desktop/m0mchil-poclbm-cf33815/btc_miner_nv1.cl    Mon Oct 18 22:01:08 2010
@@ -1,5 +1,4 @@
-#define bytereverse(x) ( ((x) << 24) | (((x) << 8) & 0x00ff0000) | (((x) >> 8) & 0x0000ff00) | ((x) >> 24) )
-#define rot(x, y) rotate(x, (uint)y)
+#define rot(x, y) rotate(x, (uint2)y)
 #define R(x) (work[x] = (rot(work[x-2],15)^rot(work[x-2],13)^((work[x-2])>>10)) + work[x-7] + (rot(work[x-15],25)^rot(work[x-15],14)^((work[x-15])>>3)) + work[x-16])
 #define sharound(a,b,c,d,e,f,g,h,x,K) {h=(h+(rot(e, 26)^rot(e, 21)^rot(e, 7))+(g^(e&(f^g)))+K+x); t1=(rot(a, 30)^rot(a, 19)^rot(a, 10))+((a&b)|(c&(a|b))); d+=h; h+=t1;}
 
@@ -8,7 +7,6 @@
                         const uint state4, const uint state5, const uint state6, const uint state7,
                         const uint B1, const uint C1, const uint D1,
                         const uint F1, const uint G1, const uint H1,
-                        const uint target,
                         const uint base,
                         __global uint * output)
 {
@@ -209,17 +207,12 @@
     G+=0x1f83d9ab;
     H+=0x5be0cd19;
 
-    if((H.x==0) && (bytereverse(G.x)<=target))
-    {
-        output[0] = 1;
-        output[1] = nonce.x;
-    }
-
-    if((H.y==0) && (bytereverse(G.y)<=target))
-    {
-        output[0] = 1;
-        output[1] = nonce.y;
-    }
+    output[0] = H.x;
+    output[1] = H.y;
+    output[2] = G.x;
+    output[3] = G.y;
+    output[4] = nonce.x;
+    output[5] = nonce.y;
 }
 
 // end
\ No newline at end of file
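
For readers on a current interpreter, the host-side check from the Version 1 diff can be sketched in Python 3 roughly as follows. This is an illustration, not the patch itself: `find_nonce` is a hypothetical helper name, `target` stands for `target[6]` in the script, and the output layout is taken from the patched kernel above.

```python
def bytereverse(x):
    """Swap the byte order of a 32-bit unsigned value (same formula as in the diff)."""
    x &= 0xFFFFFFFF
    return (((x & 0x000000FF) << 24) | ((x & 0x0000FF00) << 8) |
            ((x & 0x00FF0000) >> 8) | ((x & 0xFF000000) >> 24))


def find_nonce(output, target):
    """Return the winning nonce from the 6-word kernel output, or None.

    Assumed layout: output = [H.x, H.y, G.x, G.y, nonce.x, nonce.y].
    As in the diff, a hit in the second lane overrides a hit in the first.
    """
    nonce = None
    for lane in (0, 1):
        h, g = int(output[lane]), int(output[2 + lane])
        if h == 0 and bytereverse(g) <= target:
            nonce = int(output[4 + lane])
    return nonce


# Lane 0 has H == 0 and bytereverse(G) == 1, so its nonce is returned.
print(find_nonce([0, 5, 0x01000000, 0, 111, 222], 1))
```

The `int(...)` conversions are only there so the same function accepts plain lists as well as numpy `uint32` arrays.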

Version 2: a function that compares a reversed word with a non-reversed word

Code:

--- F:/Desktop/m0mchil-poclbm-cf33815/poclbm.py    Fri Oct 15 08:48:48 2010
+++ F:/Desktop/m0mchil-poclbm-cf33815/poclbm_nv2.py    Mon Oct 18 23:53:18 2010
@@ -50,16 +50,40 @@
     context = cl.Context([devices[options.device]], None, None)
 else:
     print 'No device specified, you may use -d to specify one of the following\n'
+    # create_some_context should ask for the platform and device to use
     context = cl.create_some_context()
+
+device = context.devices[0]
+print "==============================================================="
+print "Platform name:", platform.name
+print "Platform profile:", platform.profile
+print "Platform vendor:", platform.vendor
+print "Platform version:", platform.version
+print "---------------------------------------------------------------"
+print "Device name:", device.name
+print "Device type:", cl.device_type.to_string(device.type)
+print "Device memory: ", device.global_mem_size//1024//1024, 'MB'
+print "Device max clock speed:", device.max_clock_frequency, 'MHz'
+print "Device compute units:", device.max_compute_units
+print "==============================================================="
+
 queue = cl.CommandQueue(context)
 
-kernelFile = open('btc_miner.cl', 'r')
+kernelFile = open('btc_miner_nv2.cl', 'r')
 miner = cl.Program(context, kernelFile.read()).build()
 kernelFile.close()
 
+print "==============================================================="
+print "build OK"
+print "==============================================================="
+
 if (options.worksize == -1):
     options.worksize = miner.search.get_work_group_info(cl.kernel_work_group_info.WORK_GROUP_SIZE, context.devices[0])
 
+print "==============================================================="
+print "worksize = ", options.worksize
+print "==============================================================="
+
 frames = options.frames
 frame = float(1)/frames
 window = frame/30
@@ -142,4 +166,4 @@
 
         if (time() - rate > options.rate):
             rate = time()
-            sysWrite('%s khash/s', int((base / (time() - start)) / 500))
\ No newline at end of file
+            sysWrite('%s : %s khash/s', (datetime.now().strftime("%d/%m/%Y %H:%M:%S"),int((base / (time() - start)) / 500)))
\ No newline at end of file

Code:

--- F:/Desktop/m0mchil-poclbm-cf33815/btc_miner.cl    Fri Oct 15 08:48:48 2010
+++ F:/Desktop/m0mchil-poclbm-cf33815/btc_miner_nv2.cl    Mon Oct 18 23:33:50 2010
@@ -1,8 +1,88 @@
-#define bytereverse(x) ( ((x) << 24) | (((x) << 8) & 0x00ff0000) | (((x) >> 8) & 0x0000ff00) | ((x) >> 24) )
-#define rot(x, y) rotate(x, (uint)y)
+#define rot(x, y) rotate(x, (uint2)y)
 #define R(x) (work[x] = (rot(work[x-2],15)^rot(work[x-2],13)^((work[x-2])>>10)) + work[x-7] + (rot(work[x-15],25)^rot(work[x-15],14)^((work[x-15])>>3)) + work[x-16])
 #define sharound(a,b,c,d,e,f,g,h,x,K) {h=(h+(rot(e, 26)^rot(e, 21)^rot(e, 7))+(g^(e&(f^g)))+K+x); t1=(rot(a, 30)^rot(a, 19)^rot(a, 10))+((a&b)|(c&(a|b))); d+=h; h+=t1;}
 
+// test 1 : recursive function, build error :
+// Error: Code selection failed to select: 0x504c1d8: i8 = NVPTXISD::MoveParam 0x504c150
+/*
+bool recurseReversedCompare(uchar *reversed, uchar *normal, uchar ir, uchar in)
+{
+    if(reversed[ir] > normal[in])
+    {
+        return false;
+    }
+    else if(reversed[ir] < normal[in])
+    {
+        return true;
+    }
+    else if(in > 0)
+    {
+        return recurseReversedCompare(reversed, normal, ir+1, in-1);
+    }
+    else
+    {
+        return true;
+    }
+}
+
+bool reversedSmallerThan(uint reversed, uint normal)
+{
+    uchar *r = (uchar *)&reversed;
+    uchar *n = (uchar *)&normal;
+
+    return recurseReversedCompare(r, n, 0, 3);
+}
+*/
+
+// test 2 : non recursive
+bool reversedSmallerThan(uint reversed, uint normal)
+{
+    uchar *r = (uchar *)&reversed;
+    uchar *n = (uchar *)&normal;
+
+    if(r[0] > n[3])
+    {
+        return false;
+    }
+    else if(r[0] < n[3])
+    {
+        return true;
+    }
+    else
+    {
+        if(r[1] > n[2])
+        {
+            return false;
+        }
+        else if(r[1] < n[2])
+        {
+            return true;
+        }
+        else
+        {
+            if(r[2] > n[1])
+            {
+                return false;
+            }
+            else if(r[2] < n[1])
+            {
+                return true;
+            }
+            else
+            {
+                if(r[3] > n[0])
+                {
+                    return false;
+                }
+                else
+                {
+                    return true;
+                }
+            }
+        }
+    }
+}
+
 __kernel void search(    const uint block0, const uint block1, const uint block2,
                         const uint state0, const uint state1, const uint state2, const uint state3,
                         const uint state4, const uint state5, const uint state6, const uint state7,
@@ -209,13 +289,13 @@
     G+=0x1f83d9ab;
     H+=0x5be0cd19;
 
-    if((H.x==0) && (bytereverse(G.x)<=target))
+    if((H.x==0) && reversedSmallerThan(G.x, target)) // (bytereverse(G.x)<=target)
     {
         output[0] = 1;
         output[1] = nonce.x;
     }
 
-    if((H.y==0) && (bytereverse(G.y)<=target))
+    if((H.y==0) && reversedSmallerThan(G.y, target)) // (bytereverse(G.y)<=target)
     {
         output[0] = 1;
         output[1] = nonce.y;

There are probably some optimisations to make
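
One way to gain confidence in the byte-wise comparison of Version 2 is to model it on the host and compare it with `bytereverse(x) <= target`. The sketch below does that in Python 3, assuming a little-endian byte layout for the `(uchar *)` views; the function names are illustrative and not part of poclbm.

```python
def bytereverse(x):
    """Byte-swap a 32-bit unsigned value."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def reversed_smaller_than(reversed_word, normal_word):
    """Model of the non-recursive OpenCL helper, for a little-endian layout."""
    # r[i] and n[i] play the role of the uchar pointers in the kernel code
    r = (reversed_word & 0xFFFFFFFF).to_bytes(4, "little")
    n = (normal_word & 0xFFFFFFFF).to_bytes(4, "little")
    # r[0] is the most significant byte of bytereverse(reversed_word),
    # n[3] is the most significant byte of normal_word, and so on down.
    for i in range(4):
        if r[i] != n[3 - i]:
            return r[i] < n[3 - i]
    return True  # all four bytes equal, so "<=" holds


samples = [0x00000000, 0x00000001, 0x01000000, 0x12345678, 0x78563412, 0xFFFFFFFF]
for a in samples:
    for b in samples:
        assert reversed_smaller_than(a, b) == (bytereverse(a) <= b)
print("byte-wise comparison agrees with bytereverse(x) <= target on the samples")
```

If the device laid out bytes big-endian instead, the same indexing would start from the least significant bytes and the check would no longer match.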

m0mchil

full member

Activity: 171

Merit: 127

Jef, at least it's now clear the problem is in the bytereverse(x) macro. Please try to replace it with:

uint bytereverse(const uint x)
{
   uint result;
   uchar* b = (uchar *)&x;
   uchar* l = (uchar *)&result;
   l[0] = b[3];
   l[1] = b[2];
   l[2] = b[1];
   l[3] = b[0];
   return result;
}
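
As a sanity check that the byte-copy function computes the same value as the shift-and-mask macro, here is a small Python 3 model of both. It is only a sketch: the two function names are invented for the comparison, and it says nothing about whether the Nvidia compiler accepts the OpenCL version.

```python
import struct


def bytereverse_shifts(x):
    """Shift-and-mask form, as in the original bytereverse(x) macro."""
    x &= 0xFFFFFFFF
    return (((x << 24) & 0xFF000000) | ((x << 8) & 0x00FF0000) |
            ((x >> 8) & 0x0000FF00) | (x >> 24))


def bytereverse_bytes(x):
    """Byte-copy form, mirroring l[0] = b[3], l[1] = b[2], l[2] = b[1], l[3] = b[0]."""
    b = struct.pack("<I", x & 0xFFFFFFFF)
    l = bytes([b[3], b[2], b[1], b[0]])
    return struct.unpack("<I", l)[0]


for value in (0x00000000, 0x00000280, 0x12345678, 0xFFFFFFFF):
    assert bytereverse_shifts(value) == bytereverse_bytes(value)
print(hex(bytereverse_bytes(0x12345678)))  # prints 0x78563412
```

Because both the input and the result are viewed through the same byte layout, the equivalence does not depend on the endianness chosen for the model.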

jef.blanc

newbie

Activity: 29

Merit: 0

After checking, the macro CL_VERSION_1_1 is defined in CL/cl.h from the CUDA Toolkit I installed, and this file declares the OpenCL 1.1 functions (which is logical, since I would have had a compilation error otherwise)!

So the Toolkit I installed supports OpenCL 1.1, while my current Nvidia driver (which I hadn't updated) is still on OpenCL 1.0.

I checked the "public" drivers on the net: the version is still 258.96 (the one I use), so I downloaded and installed the "Developer Drivers for WinVista and Win7 (260.61)" from http://developer.nvidia.com/object/cuda_3_2_toolkit_rc.html#Windows%20XP,%20Windows%20Vista%20and%20Windows7 (the page from which I downloaded the CUDA Toolkit).

My OpenCL.dll file now provides the OpenCL 1.1 functions.
I tried to run poclbm.py; as m0mchil said, there is a problem with the rotate function, but pyopencl seems to work*, which is a good sign.

* There are still some errors when running the tests provided with pyopencl, but I don't know whether they are critical.

Errors on rotate :

Code:

:204:2: error: no matching function for call to 'rotate'
        sharound(D,E,F,G,H,A,B,C,R(61),0xA4506CEB);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:4:103: note: instantiated from:
#define sharound(a,b,c,d,e,f,g,h,x,K) {h=(h+(rot(e, 26)^rot(e, 21)^rot(e, 7))+(g^(e&(f^g)))+K+x); t1=(rot(a, 30)^rot(a,
19)^rot(a, 10))+((a&b)|(c&(a|b))); d+=h; h+=t1;}
                                                                                                      ^
:2:19: note: instantiated from:
#define rot(x, y) rotate(x, (uint)y)
                  ^~~~~~
:3025:26: note: candidate function
ulong16 __OVERLOADABLE__ rotate(ulong16, ulong16);
                         ^
:3024:25: note: candidate function
ulong8 __OVERLOADABLE__ rotate(ulong8, ulong8);
                        ^
:3023:25: note: candidate function
ulong4 __OVERLOADABLE__ rotate(ulong4, ulong4);
                        ^
:3019:25: note: candidate function
ulong2 __OVERLOADABLE__ rotate(ulong2, ulong2);
                        ^
:3018:25: note: candidate function
long16 __OVERLOADABLE__ rotate(long16, long16);
                        ^
:3017:24: note: candidate function
long8 __OVERLOADABLE__ rotate(long8, long8);
                       ^
:3016:24: note: candidate function
long4 __OVERLOADABLE__ rotate(long4, long4);
                       ^
:3012:24: note: candidate function
long2 __OVERLOADABLE__ rotate(long2, long2);
                       ^
:3011:25: note: candidate function
uint16 __OVERLOADABLE__ rotate(uint16, uint16);
                        ^
:3010:24: note: candidate function
uint8 __OVERLOADABLE__ rotate(uint8, uint8);
                       ^
:3009:24: note: candidate function
uint4 __OVERLOADABLE__ rotate(uint4, uint4);
                       ^
:3005:24: note: candidate function
uint2 __OVERLOADABLE__ rotate(uint2, uint2);
                       ^
:3004:24: note: candidate function
int16 __OVERLOADABLE__ rotate(int16, int16);
                       ^
:3003:23: note: candidate function
int8 __OVERLOADABLE__ rotate(int8, int8);
                      ^
:3002:23: note: candidate function
int4 __OVERLOADABLE__ rotate(int4, int4);
                      ^
:2998:23: note: candidate function
int2 __OVERLOADABLE__ rotate(int2, int2);
                      ^
:2997:27: note: candidate function
ushort16 __OVERLOADABLE__ rotate(ushort16, ushort16);
                          ^
:2996:26: note: candidate function
ushort8 __OVERLOADABLE__ rotate(ushort8, ushort8);
                         ^
:2995:26: note: candidate function
ushort4 __OVERLOADABLE__ rotate(ushort4, ushort4);
                         ^
:2991:26: note: candidate function
ushort2 __OVERLOADABLE__ rotate(ushort2, ushort2);
                         ^
:2990:26: note: candidate function
short16 __OVERLOADABLE__ rotate(short16, short16);
                         ^
:2989:25: note: candidate function
short8 __OVERLOADABLE__ rotate(short8, short8);
                        ^
:2988:25: note: candidate function
short4 __OVERLOADABLE__ rotate(short4, short4);
                        ^
:2984:25: note: candidate function
short2 __OVERLOADABLE__ rotate(short2, short2);
                        ^
:2983:26: note: candidate function
uchar16 __OVERLOADABLE__ rotate(uchar16, uchar16);
                         ^
:2982:25: note: candidate function
uchar8 __OVERLOADABLE__ rotate(uchar8, uchar8);
                        ^
:2981:25: note: candidate function
uchar4 __OVERLOADABLE__ rotate(uchar4, uchar4);
                        ^
:2977:25: note: candidate function
uchar2 __OVERLOADABLE__ rotate(uchar2, uchar2);
                        ^
:2976:25: note: candidate function
char16 __OVERLOADABLE__ rotate(char16, char16);
                        ^
:2975:24: note: candidate function
char8 __OVERLOADABLE__ rotate(char8, char8);
                       ^
:2974:24: note: candidate function
char4 __OVERLOADABLE__ rotate(char4, char4);
                       ^
:2970:24: note: candidate function
char2 __OVERLOADABLE__ rotate(char2, char2);
                       ^
:2967:24: note: candidate function
ulong __OVERLOADABLE__ rotate(ulong, ulong);
                       ^
:2966:23: note: candidate function
long __OVERLOADABLE__ rotate(long, long);
                      ^
:2965:23: note: candidate function
uint __OVERLOADABLE__ rotate(uint, uint);
                      ^
:2964:22: note: candidate function
int __OVERLOADABLE__ rotate(int, int);
                     ^
:2963:25: note: candidate function
ushort __OVERLOADABLE__ rotate(ushort, ushort);
                        ^
:2962:24: note: candidate function
short __OVERLOADABLE__ rotate(short, short);
                       ^
:2961:24: note: candidate function
uchar __OVERLOADABLE__ rotate(uchar, uchar);
                       ^
:2960:23: note: candidate function
char __OVERLOADABLE__ rotate(char, char);
                      ^

Edit: I just saw the rot function in poclbm.py (the one m0mchil was talking about, I guess). I will try to see how to change the two rot functions, and perhaps bytereverse in btc_miner.cl?

Edit 2: phew, it's hard!
This is the first time I have worked with python, and therefore with pyopencl and the nvidia toolkit. I don't even know where the rotate functions (the candidate functions) come from or where they are defined (nvidia toolkit, boost, standard C++, ... and from which include)?

Edit 3: ...
I made some attempts, without success so far:

Code:

Traceback (most recent call last):
  File "poclbm.py", line 57, in <module>
    miner = cl.Program(context, kernelFile.read()).build()
  File "E:\Python26\lib\site-packages\pyopencl-0.92-py2.6-win32.egg\pyopencl\__init__.py", line 138, in program_build
    "Build on %s:\n\n%s" % (dev, log) for dev, log in build_logs))
pyopencl.RuntimeError: clBuildProgram failed: build program failure

Build on :

Error: Code selection failed to select: 0x527c7f8: i32 = bswap 0x527cbb0

... I have no idea what that means.

(For this test I just changed #define rot(x, y) rotate(x, (uint)y) to #define rot(x, y) rotate(x, (uint2)y) in btc_miner.cl, trying to get it running first before checking the operations performed.)

Edit 4:
A "funny" thing: the problem I have comes from the end of the search function in btc_miner.cl.

If I remove these lines

Code:

    if((H.x==0) && (bytereverse(G.x)<=target))
    {
        output[0] = 1;
        output[1] = nonce.x;
    }

    if((H.y==0) && (bytereverse(G.y)<=target))
    {
        output[0] = 1;
        output[1] = nonce.y;
    }

the script builds without error and starts.

If I leave the tests without the output[n] = ... assignments, it builds, and if I leave the output[n] = ... assignments without the tests, it also builds.
But if I keep both the tests and the output[n] = ... assignments, I get the following error:

Code:

Traceback (most recent call last):
  File "poclbm.py", line 57, in <module>
    miner = cl.Program(context, kernelFile.read()).build()
  File "E:\Python26\lib\site-packages\pyopencl-0.92-py2.6-win32.egg\pyopencl\__init__.py", line 138, in program_build
    "Build on %s:\n\n%s" % (dev, log) for dev, log in build_logs))
pyopencl.RuntimeError: clBuildProgram failed: build program failure

Build on :

Error: Code selection failed to select: 0x4f5a8d0: i32 = bswap 0x4f94670

GeorgeH

member

Activity: 83

Merit: 10

Definitely hoping to see this on nvidia in the near future! I have 2 graphics cards now and can only use one with the CUDA client!

m0mchil

full member

Activity: 171

Merit: 127

jef.blanc, try to replace the rotate() function with some native rotate left. On AMD, rotate() is compiled to the AMD-specific bit_align instruction. Also there are problems with casts... I switched to uint2 vectors and all operations with mixed types (vector, scalar) work fine on AMD, but the Nvidia compiler has problems.
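
For reference, a rotate left can be built from two shifts and an OR. The Python 3 sketch below models it for 32-bit words; `rotl32` and `rotr32` are illustrative names, and whether the equivalent OpenCL expression `((x) << (y)) | ((x) >> (32 - (y)))` compiles cleanly on Nvidia is an assumption that would need testing.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit unsigned value left by n bits using only shifts and OR."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotr32(x, n):
    """Rotate right, expressed through the left rotation."""
    return rotl32(x, (32 - n) % 32)


print(hex(rotl32(0x12345678, 8)))  # prints 0x34567812

# The kernel's rot(e, 26) is a left rotation by 26, i.e. a right rotation by 6.
assert rotl32(0x12345678, 26) == rotr32(0x12345678, 6)
```

Python integers are unbounded, hence the explicit mask; on a 32-bit `uint` in the kernel the overflow bits fall away on their own.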

jef.blanc

newbie

Activity: 29

Merit: 0

After writing the post above, I checked the file pyopencl-0.92\src\wrapper\wrap_cl.hpp to see how the 6 functions that are not in OpenCL 1.0 are used, and I noticed an interesting point:

The use of the 6 functions is "protected" by #ifdef CL_VERSION_1_1 / #endif blocks.
So I will check why CL_VERSION_1_1 was defined when I compiled the wrapper, and I will try to compile it without this macro defined.

More information to come ... later (it is now 2:33 AM in France, time to go to bed).

jef.blanc

newbie

Activity: 29

Merit: 0

Hello All,

I tried without success to use the Python OpenCL Miner. Here's what I did and what I got:

First, my config: I run Windows 7 Pro 32-bit with an NVIDIA GeForce 9800 GT.

I downloaded and installed CUDA Toolkit from :
http://developer.nvidia.com/object/cuda_3_2_toolkit_rc.html#Windows%20XP,%20Windows%20Vista%20and%20Windows7
-> cudatoolkit_3.2.7_win_32.msi

I downloaded Boost 1.44.0 and built it with :

Code:

bjam.exe toolset=msvc --with-python --with-date_time --with-thread threading=multi link=shared

as davidonpda did in post #81.
Then I copied all the DLLs from H:\boost_1_44_0\stage\lib to system32.

I downloaded and built pyopencl-0.92 :
1) running > python configure.py
2) modifying siteconf.py :
(example from http://www.mail-archive.com/[email protected]/msg00349.html)

Code:

BOOST_INC_DIR = [r'H:\boost_1_44_0']
BOOST_LIB_DIR = [r'H:\boost_1_44_0\stage\lib']
BOOST_COMPILER = 'msvc'
BOOST_PYTHON_LIBNAME = ['boost_python-vc90-mt-1_44']
USE_SHIPPED_BOOST = False
CL_TRACE = False
CL_ENABLE_GL = False
CL_INC_DIR = [r'E:\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include']
CL_LIB_DIR = [r'E:\NVIDIA GPU Computing Toolkit\CUDA\v3.2\lib\Win32']
CL_LIBNAME = ['OpenCL']
CXXFLAGS = ['/EHsc', '/DBOOST_PYTHON_NO_PY_SIGNATURES']
LDFLAGS = ['/FORCE']

3) running > ctags -R src || true
4) running > python setup.py build
-> compilation without error
5) running > python setup.py install
-> installation without error

I downloaded poclbm from http://github.com/m0mchil/poclbm
-> m0mchil-poclbm-cf33815.zip

Then I tried to run it: > python poclbm.py

Code:

Traceback (most recent call last):
  File "poclbm.py", line 5, in <module>
    import pyopencl as cl
  File "E:\Python26\lib\site-packages\pyopencl-0.92-py2.6-win32.egg\pyopencl\__init__.py", line 3, in <module>
    import pyopencl._cl as _cl
ImportError: DLL load failed: La procÚdure spÚcifiÚe est introuvable.

(The French text with the mangled accents means: The specified procedure could not be found.)

Note: I did not get all of this working in one go; it took a long time, so I may have forgotten some steps.

I also tried poclbm_py2exe.7z, which I got from the board (post #87). First, with the original OpenCL.dll from the archive, I got an error that is due to my Nvidia card, I guess:

Code:

No device specified, you may use -d to specify one of the following

Traceback (most recent call last):
  File "poclbm.py", line 57, in <module>
  File "pyopencl\__init__.pyc", line 138, in program_build
pyopencl.RuntimeError: clBuildProgram failed: build program failure

Build on :

:37: error: incompatible type assigning 'int', expected 'uint2'
        work[5]=0x00000000;
               ^~~~~~~~~~~
:38: error: incompatible type assigning 'int', expected 'uint2'
        work[6]=0x00000000;
               ^~~~~~~~~~~
...

Then I tried replacing the DLL with the one from the CUDA Toolkit, and I got the same error as when running the poclbm I built myself:

Code:

  File "poclbm.py", line 5, in <module>
  File "pyopencl\__init__.pyc", line 3, in <module>
  File "pyopencl\_cl.pyc", line 12, in <module>
  File "pyopencl\_cl.pyc", line 10, in __load
ImportError: DLL load failed: La procÚdure spÚcifiÚe est introuvable.

So the problem comes from Nvidia's OpenCL DLL!

I checked _cl.pyd with Dependency Walker. If I'm correct (this is the first time I have used this tool), some functions are missing from Nvidia's OpenCL.dll:

Code:

clSetUserEventStatus (pyopencl-0.92\src\wrapper\wrap_cl.hpp(1051,31))
clCreateUserEvent (pyopencl-0.92\src\wrapper\wrap_cl.hpp(1062,20))
                  (pyopencl-0.92\src\wrapper\wrap_cl.hpp(1063,32))
clCreateSubBuffer (pyopencl-0.92\src\wrapper\wrap_cl.hpp(1165,22))
                  (pyopencl-0.92\src\wrapper\wrap_cl.hpp(1168,36))
clEnqueueReadBufferRect (pyopencl-0.92\src\wrapper\wrap_cl.hpp(1416,27))
clEnqueueWriteBufferRect (pyopencl-0.92\src\wrapper\wrap_cl.hpp(1460,27))
clEnqueueCopyBufferRect (pyopencl-0.92\src\wrapper\wrap_cl.hpp(1496,27))

In the OpenCL.dll file in poclbm_py2exe.7z, those 6 functions exist!

I found the explanation after a little more searching on the net, with confirmation on these two pages:
http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/
http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/

The 6 functions were added in OpenCL 1.1, and Nvidia's OpenCL implements version 1.0!
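
The same check can be scripted instead of done by hand in Dependency Walker. The Python 3 sketch below lists which of the six entry points a loaded library does not export; `missing_exports` is a hypothetical helper, and the library name passed to `ctypes.CDLL` is an assumption that has to be adapted to the system (for example a full path to OpenCL.dll).

```python
import ctypes

# The six OpenCL 1.1 entry points named in the Dependency Walker listing above
OPENCL_1_1_ENTRY_POINTS = [
    "clSetUserEventStatus",
    "clCreateUserEvent",
    "clCreateSubBuffer",
    "clEnqueueReadBufferRect",
    "clEnqueueWriteBufferRect",
    "clEnqueueCopyBufferRect",
]


def missing_exports(lib, names=OPENCL_1_1_ENTRY_POINTS):
    """Return the names that the loaded library object does not export.

    ctypes raises AttributeError for an unknown symbol, so hasattr() is enough.
    """
    return [name for name in names if not hasattr(lib, name)]


if __name__ == "__main__":
    try:
        opencl = ctypes.CDLL("OpenCL")  # assumed name; use the real path on your system
    except OSError as exc:
        print("could not load the OpenCL library:", exc)
    else:
        print(missing_exports(opencl) or "all six OpenCL 1.1 entry points found")
```

An empty result only shows that the symbols are exported, not that the driver implements them correctly.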

~~So if I'm correct pyopencl won't work on Nvidia cards until Nvidia release an OpenCL 1.1 compliant Toolkit and driver.~~
See the post below.

Note from http://developer.nvidia.com/object/opencl.html

Quote

On the same day Khronos Group announced the new OpenCL v1.1 specification update (June 14th, 2010), NVIDIA released OpenCL v1.1 pre-release drivers and SDK code samples to all GPU Computing registered developers.

So you have to register as a "GPU Computing developer" to get the SDK and driver. I haven't done it for now (you have to fill in a big form which is verified before validation, and I'm not a "GPU Computing developer").

PS: As you might have guessed, I'm French. Sorry for any mistakes and strange turns of phrase.

m0mchil

full member

Activity: 171

Merit: 127

The IDs stay the same as long as there are no changes in hardware (new cards for example).

The -w parameter sets the 'work group size' (the number of local threads). Unfortunately it's very difficult to determine the optimal value because it differs between hardware platforms. The default is to use the maximum reported by OpenCL for the specific hardware. This was also the behavior of previous versions. Over the last few days I tried an optimization which tests two hashes per kernel run; it is 1-2% slower with the default local threads (5770 - 256), but 5-6% faster with half of them (128) and slightly more global threads (-f 35). Generally, with default parameters one should achieve about the same or slightly worse performance than before.

m0mchil

full member

Activity: 171

Merit: 127

bethel, what are 5870s showing at stock frequency (850 MHz)? Perhaps one of the cards throttles down?

The latest version has a new parameter, -w, to set the work group size. I get the best results on a 5770 with -w 128 -f 35.

bethel

newbie

Activity: 9

Merit: 0

They are the same.
8.14.10.0779 Catalyst 10.9 on Windows 7 64-bit

I would not expect them to be different because one is a clone of the other.

SmokeTooMuch

legendary

Activity: 860

Merit: 1026

Are both machines using the same driver version ?

Topic: python OpenCL bitcoin miner - page 52. (Read 1239149 times)