Topic: [BOUNTY] sha256 shader for Linux OSS video drivers (15 BTC pledged)

legendary
Activity: 1596
Merit: 1100
Here's another 10 BTC. With the recent USD price of bitcoins, I wouldn't even say "it's not much".

Retracting my offer, as GPUs are now useless for sha256 mining.

Yes, the OP was long ago updated to reflect current bounty status (15 BTC from me).

sr. member
Activity: 520
Merit: 253
555
Here's another 10 BTC. With the recent USD price of bitcoins, I wouldn't even say "it's not much".

Retracting my offer, as GPUs are now useless for sha256 mining.
newbie
Activity: 26
Merit: 0
Hi everyone.
I'm probably reviving an old thread and I apologize for it.

I made a GLSL implementation of the sha256d script I posted here:
https://bitcointalksearch.org/topic/how-fastsimpleshort-a-cpu-bitcoin-mining-core-script-can-be-286532

Using OpenGL 3.3 and GLSL 1.30 shaders, which have built-in support for bitwise operations, I can almost reach cgminer performance.

As an example, one of my machines has a GeForce 9600GT. It's pretty bad for mining, but it shows well how things turned out.

https://en.bitcoin.it/wiki/Mining_hardware_comparison
The chart says this GPU gets 15.66 Mh/s. Here is what a couple of benchmarks show:

9600GT -> ideally: 15.66 Mh/s
9600GT -> cgminer speed: 15.34 Mh/s
9600GT -> my GLSL script: 14.80 Mh/s
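
To put those three numbers in proportion, here is a small Python sketch that expresses the GLSL result as a percentage of the other two. The dictionary keys and the `percent_of` helper are names made up for this illustration; only the three rates come from the figures above.

```python
# Hash rates in Mh/s, copied from the benchmark lines above.
rates = {"ideal": 15.66, "cgminer": 15.34, "glsl": 14.80}


def percent_of(value, reference):
    """Return value as a percentage of reference, rounded to one decimal."""
    return round(100.0 * value / reference, 1)


glsl_vs_cgminer = percent_of(rates["glsl"], rates["cgminer"])
glsl_vs_ideal = percent_of(rates["glsl"], rates["ideal"])
print(glsl_vs_cgminer, glsl_vs_ideal)
```

So the naive shader lands within a few percent of cgminer on this card.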

I'm not sure how much this is going to help. The shader is currently a naive translation of the C code, and even if most of it can be optimized for GLSL, the best it can do is match cgminer's performance, not a single Kh/s more.

I considered using a combination of vertex, fragment and geometry shaders together, but this didn't work out (how do you run 1M vertex shader invocations? With 1M GL_POINTS to be rasterized...).

By the way, I'm posting the code here in case someone is interested.

Code:
#version 130
#pragma optionNV(unroll all)

uint ROTLEFT(in uint a, in int b) { return (a << b) | (a >> (32-b)); }
uint ROTRIGHT(in uint a, in int b) { return (a >> b) | (a << (32-b)); }

uint CH(in uint x,in uint y,in uint z) { return (x & y) ^ (~x & z); }
uint MAJ(in uint x,in uint y,in uint z) { return (x & y) ^ (x & z) ^ (y & z); }
uint EP0(in uint x) { return ROTRIGHT(x,2) ^ ROTRIGHT(x,13) ^ ROTRIGHT(x,22); }
uint EP1(in uint x) { return ROTRIGHT(x,6) ^ ROTRIGHT(x,11) ^ ROTRIGHT(x,25); }
uint SIG0(in uint x) { return ROTRIGHT(x,7) ^ ROTRIGHT(x,18) ^ (x >> 3); }
uint SIG1(in uint x) { return ROTRIGHT(x,17) ^ ROTRIGHT(x,19) ^ (x >> 10); }

// The U suffix makes every literal a uint. GLSL 1.30 has no implicit int-to-uint
// conversion, so strict compilers may reject unsuffixed literals in a uint[] constructor.
uint k[64] = uint[64](
0x428a2f98U,0x71374491U,0xb5c0fbcfU,0xe9b5dba5U,0x3956c25bU,0x59f111f1U,0x923f82a4U,0xab1c5ed5U,
0xd807aa98U,0x12835b01U,0x243185beU,0x550c7dc3U,0x72be5d74U,0x80deb1feU,0x9bdc06a7U,0xc19bf174U,
0xe49b69c1U,0xefbe4786U,0x0fc19dc6U,0x240ca1ccU,0x2de92c6fU,0x4a7484aaU,0x5cb0a9dcU,0x76f988daU,
0x983e5152U,0xa831c66dU,0xb00327c8U,0xbf597fc7U,0xc6e00bf3U,0xd5a79147U,0x06ca6351U,0x14292967U,
0x27b70a85U,0x2e1b2138U,0x4d2c6dfcU,0x53380d13U,0x650a7354U,0x766a0abbU,0x81c2c92eU,0x92722c85U,
0xa2bfe8a1U,0xa81a664bU,0xc24b8b70U,0xc76c51a3U,0xd192e819U,0xd6990624U,0xf40e3585U,0x106aa070U,
0x19a4c116U,0x1e376c08U,0x2748774cU,0x34b0bcb5U,0x391c0cb3U,0x4ed8aa4aU,0x5b9cca4fU,0x682e6ff3U,
0x748f82eeU,0x78a5636fU,0x84c87814U,0x8cc70208U,0x90befffaU,0xa4506cebU,0xbef9a3f7U,0xc67178f2U
);

uniform uint midstate[8];
uniform uint text[16];

void main() {

uint a,b,c,d,e,f,g,h,t1,t2,m[64];
uint ee,eee,eeee;
int i;

a = midstate[0];
b = midstate[1];
c = midstate[2];
d = midstate[3];
e = midstate[4];
f = midstate[5];
g = midstate[6];
h = midstate[7];

for (i = 0;  i < 16; i++) m[i] = text[i];

for (; i < 64; i++) m[i] = SIG1(m[i-2]) + m[i-7] + SIG0(m[i-15]) + m[i-16];

for (i = 0; i < 64; i++) {
t1 = h + EP1(e) + CH(e,f,g) + k[i] + m[i];
t2 = EP0(a) + MAJ(a,b,c);
h = g;
g = f;
f = e;
e = d + t1;
d = c;
c = b;
b = a;
a = t1 + t2;
}

m[0] = midstate[0] + a;
m[1] = midstate[1] + b;
m[2] = midstate[2] + c;
m[3] = midstate[3] + d;
m[4] = midstate[4] + e;
m[5] = midstate[5] + f;
m[6] = midstate[6] + g;
m[7] = midstate[7] + h;

a = 0x6a09e667U;
b = 0xbb67ae85U;
c = 0x3c6ef372U;
d = 0xa54ff53aU;
e = 0x510e527fU;
f = 0x9b05688cU;
g = 0x1f83d9abU;
h = 0x5be0cd19U;

m[8]  = 0x80000000U;
m[9]  = 0x00U;
m[10] = 0x00U;
m[11] = 0x00U;
m[12] = 0x00U;
m[13] = 0x00U;
m[14] = 0x00U;
m[15] = 0x100U;

for (i = 16; i < 64; i++) m[i] = SIG1(m[i-2]) + m[i-7] + SIG0(m[i-15]) + m[i-16];

for (i = 0; i < 57; i++) {
t1 = h + EP1(e) + CH(e,f,g) + k[i] + m[i];
t2 = EP0(a) + MAJ(a,b,c);
h = g;
g = f;
f = e;
e = d + t1;
d = c;
c = b;
b = a;
a = t1 + t2;
}

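// Rounds 57..60 of the second hash: only the new value of e is tracked, because the
// e produced in round 60 is the value that ends up in h after round 63.
eeee = d + h + EP1(e) + CH(e,f,g) + 0x78a5636fU + m[57];
eee = c + g + EP1(eeee) + CH(eeee,e,f) + 0x84c87814U + m[58];
ee = b + f + EP1(eee) + CH(eee,eeee,e) + 0x8cc70208U + m[59];
h = a + e + EP1(ee) + CH(ee,eee,eeee) + 0x90befffaU + m[60];

// The last word of the final hash is 0x5be0cd19 + h; a candidate has that word equal to zero.
if (0x5be0cd19U + h == 0x00U) {
gl_FragColor=vec4(0.0,1.0,0.0,1.0); // green: candidate found
} else { gl_FragColor=vec4(1.0,0.0,0.0,1.0); } // red: no match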
}
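
For anyone who wants to check the shader's arithmetic on the CPU, here is a rough Python sketch of the same flow: compress the first 64 header bytes into a midstate, finish the first hash with the padded remainder (the role `text` plays above), hash the 32-byte result again, and test the last state word. It assumes `header80` is the raw 80-byte serialized header; the function names and the placeholder header are made up for illustration, and it runs all 64 rounds rather than stopping early like the shader.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
K = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
]
H0 = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """Run one SHA-256 compression over a 64-byte block; return the new 8-word state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        sig0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        sig1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((sig1 + w[i - 7] + sig0 + w[i - 16]) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        ep1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ep0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        t1 = (h + ep1 + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = (ep0 + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]


def sha256d_state(header80):
    """Double SHA-256 of an 80-byte header, returned as the final 8 state words."""
    midstate = compress(H0, header80[:64])  # constant while only the nonce changes
    # Remaining 16 header bytes plus padding: 0x80, zeros, 64-bit length (640 bits).
    text = header80[64:] + b"\x80" + b"\x00" * 39 + struct.pack(">Q", 640)
    first = compress(midstate, text)
    # Second hash input: 32 bytes of digest, then 0x80000000 ... 0x100 (256 bits).
    second = struct.pack(">8I", *first) + b"\x80" + b"\x00" * 23 + struct.pack(">Q", 256)
    return compress(H0, second)


def is_candidate(state):
    """Mirror the shader's test: the last state word must be zero."""
    return state[7] == 0


header = bytes(range(80))  # arbitrary placeholder, not a real block header
state = sha256d_state(header)
digest = struct.pack(">8I", *state)
# Cross-check the hand-written rounds against hashlib.
assert digest == hashlib.sha256(hashlib.sha256(header).digest()).digest()
print(digest.hex(), is_candidate(state))
```

Feeding the same midstate and `text` words to the shader and to this sketch should give the same red/green answer, which makes it easier to tell a shader bug from a data-packing bug.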
legendary
Activity: 1596
Merit: 1100
ACK.  I left the 15 BTC pledge active, as I do consider the project still worthwhile... although of diminished importance in FPGA/ASIC era.
legendary
Activity: 1708
Merit: 1011
I have an update.  I'm withdrawing any pledges that I have made here.  I no longer consider this project to be relevant or worthwhile.
member
Activity: 89
Merit: 10
any updates on this project?

I have a nice old Radeon HD 2600 to play with
legendary
Activity: 1596
Merit: 1100
bounty still "alive"?

Yes.  I'll personally keep it alive, matching the $subject pledge (200 BTC).

Updated OP.

legendary
Activity: 1792
Merit: 1008
/dev/null
bounty still "alive"?
legendary
Activity: 965
Merit: 1000
Current status:

- Code is incomplete and buggy, but compiles.

- The kernel is not optimized, and in particular the stream transport of the nonces to the kernel is not really implemented.

- A few issues: the nonce is not in the first hash, and I think some information is not passed to the 2nd hash round. I also think the endianness of the hash and of the difficulty are not the same.

- The block header and the difficulty are not set yet, since I'm testing other stuff now.

- BrookGPU runs into some sort of infinite loop, consumes up to 2 GB of memory and is then terminated (no clue why yet).

- I had tons of problems with arrays, since Brook wanted to convert array constants to Brook streams, which are not constant during the nonce yet. That's a no-go, so I just split the arrays into single variables and wrote myself scripts to generate the code (since all variables are passed as values and not pointers, it's not much of an issue for now).

- Arrays as local variables are causing trouble, since Brook wants to align them in a way that cgc doesn't like, so I split them up, too.
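
On the endianness point in the list above, here is a rough CPU-side sketch in Python of the comparison the kernel eventually has to reproduce. It assumes the usual Bitcoin conventions: the compact `bits` field expands to a 256-bit target, and the 32-byte double-SHA-256 digest is read as one little-endian number before comparing. The function names are made up for illustration, and the sign bit of the compact format is simply masked off.

```python
import hashlib


def bits_to_target(bits):
    """Expand the 4-byte compact 'bits' field into a 256-bit integer target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF  # sign bit ignored in this sketch
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def meets_target(header80, bits):
    """Return True if the double SHA-256 of the header is at or below the target."""
    digest = hashlib.sha256(hashlib.sha256(header80).digest()).digest()
    # SHA-256 emits its state words big-endian, but the comparison treats the
    # whole 32-byte digest as a single little-endian 256-bit integer.
    return int.from_bytes(digest, "little") <= bits_to_target(bits)


difficulty_one = bits_to_target(0x1D00FFFF)
print(hex(difficulty_one))
print(meets_target(bytes(80), 0x1D00FFFF))
```

In the float2 kernel this means the word order, and the byte order inside each word, of the final h0..h7 must match how the decoded_difficulty words are laid out before isSmaller is applied.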

Just to give you an idea of the strange-looking code:

Code:
/**
 * Aminer - a bitcoin miner for various platforms.
 *
 * Andreas Rueckert
 *
 * A good part of this code is based on the GLSL sha256 code of xaci: https://bitcointalk.org/index.php?topic=4618.msg191488#msg191488
 */

/*
#pragma optionNV looplimit 32768
*/



/**
 * Some utility functions to process integers represented as float2.
 */

/**
 * Add 2 integers represented as float2.
 *
 * Do not let overflow happen with this function, or use sum_c instead!
 */
kernel float2 add( float2 a, float2 b) {
        float2 ret;

        ret.x = a.x + b.x;
        ret.y = a.y + b.y;

        if (ret.y >= 65536.0) {
                ret.y -= 65536.0;
                ret.x += 1.0;
        }

        if (ret.x >= 65536.0) {
                ret.x -= 65536.0;
        }

        return ret;
}

/**
 * Shift an integer represented as a float2 by log2(shift).
 *
 * Note: shift should be a power of two, e.g. to shift 3 steps, use 2^3.
 */
kernel float2 shiftr( float2 a, float shift) {
        float2 ret;

        ret.x = a.x / shift;

        ret.y = floor( a.y / shift) + frac( ret.x) * 65536.0;

        ret.x = floor( ret.x);

        return ret;
}

/**
 * Rotate an integer represented as a float2 by log2(shift).
 *
 * Note: shift should be a power of two, e.g. to rotate 3 steps, use 2^3.
 */
kernel float2 rotater( float2 a, float shift) {
        float2 ret;

        ret.x = a.x / shift;  // Shift words and keep fractions to shift those bits later.
        ret.y = a.y / shift;

        ret.y += frac( ret.x) * 65536.0;  // Shift low bits from x into y.
        ret.x += frac( ret.y) * 65536.0;  // Rotate low bits from y into x.

        ret.x = floor( ret.x);  // Cut shifted bits.
        ret.y = floor( ret.y);

        return ret;
}

/**
 * Xor half of an integer, represented as a float.
 */
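kernel float xor16( float a<>, float b<>) {

        float ret = 0.0;
        float fact = 32768.0;

        // Walk the 16 bits from the most significant (2^15) down to the least (2^0).
        // The loop has to stop at 1.0: halving a float does not reach 0 after 16 steps,
        // so a "fact > 0" test keeps looping long after all bits have been processed.
        while (fact >= 1.0) {
                // Set the result bit if exactly one of a and b has this bit set.
                if( ( ( a >= fact) || ( b >= fact)) && ( ( a < fact) || ( b < fact))) {
                        ret += fact;
                }

                // Clear the bit that was just examined.
                if( a >= fact) {
                        a -= fact;
                }
                if( b >= fact) {
                        b -= fact;
                }

                fact /= 2.0;
        }
        return ret;
}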

/**
 * Xor a complete integer represented as a float2.
 */
kernel float2 xor( float2 a<>, float2 b<>) {
       float2 ret = { xor16( a.x, b.x), xor16( a.y, b.y) };

       return ret;
}

/**
 * And operation on half of an integer, represented as a float.
 */
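kernel float and16( float a<>, float b<>) {
        float ret = 0.0;
        float fact = 32768.0;

        // Walk the 16 bits from 2^15 down to 2^0. Stop at 1.0, because halving a
        // float does not reach 0 after 16 steps and "fact > 0" would keep looping.
        while (fact >= 1.0) {
                // Set the result bit if both a and b have this bit set.
                if( ( a >= fact) && ( b >= fact)) {
                        ret += fact;
                }

                // Clear the bit that was just examined.
                if( a >= fact) {
                        a -= fact;
                }
                if( b >= fact) {
                        b -= fact;
                }

                fact /= 2.0;
        }
        return ret;
}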

/**
 * And operation on a full integer, represented as a float2.
 */
kernel float2 and( float2 a<>, float2 b<>) {
        float2 ret =  { and16( a.x, b.x), and16( a.y, b.y) };

        return ret;
}

/*
 * Logical complement ("not")
 */
kernel float2 not( float2 a<>) {
       float2 ret = { 65535.0 - a.x, 65535.0 - a.y};

       return ret;
}

/**
 * Swap the 2 words of an int.
 */
kernel float2 swapw( float2 a) {
       float2 ret;

       ret.x = a.y;
       ret.y = a.x;

       return ret;
}

/**
 * Swap the 2 bytes in an 16-bit word.
 */
kernel float swapb( float a) {
       float ret = a / 256.0;

       ret += frac( ret) * 65536.0;

       return floor( ret);
}

/**
 * Swap the 4 bytes of a 4-byte integer;
 */
kernel float2 swapInt( float2 a) {
       float2 ret = swapw( a);

       ret.x = swapb( ret.x);
       ret.y = swapb( ret.y);

       return ret;
}

/**
 * Check if float2 integer a is smaller than float2 integer b.
 */
kernel float isSmaller( float2 a, float2 b) {
       if( ( a.x < b.x) || ( ( a.x == b.x) && ( a.y < b.y))) {
           return 1.0;
       } else {
           return 0.0;
       }
}

kernel float2 blend( float2 m16, float2 m15, float2 m07, float2 m02) {
        float2 s0 = xor( rotater( m15, 128.0), xor( rotater( swapw( m15), 4.0), shiftr( m15, 8.0)));
        float2 s1 = xor( rotater( swapw( m02), 2.0), xor( rotater( swapw( m02), 8.0), shiftr( m02, 1024.0)));

        return add( add( m16, s0), add( m07, s1));
}

kernel float2 s0( float2 a) {
        return xor( rotater( a, 4.0), xor( rotater( a, 8192.0), rotater( swapw( a), 64.0)));
}

kernel float2 s1( float2 a) {
        return xor( rotater( a, 64.0), xor( rotater( a, 2048.0), rotater( swapw( a), 512.0)));
}

kernel float2 ch( float2 a, float2 b, float2 c) {
        return xor( and( a, b), and( not( a), c));
}

kernel float2 maj( float2 a, float2 b, float2 c) {
        return xor( xor( and( a, b), and( a, c)), and( b, c));
}


/**
 * Let the kernel check a nonce for a given bitcoin block.
 * That's basically 2 rounds of sha256 and a difficulty check.
 *
 * @param nonce The nonce to test for the given block
 * @param block_header* The block header as a set of ints, since brcc always converts constant arrays to brook::stream here.
 * @param difficulty The difficulty as a 256-bit superlong int.
 *
 * @return result Return the nonce, if it is valid. Return -nonce if not.
 */
kernel void kernelMinerCheckNonce( float2 nonce<>
   , float2 block_header0
   , float2 block_header1
   , float2 block_header2
   , float2 block_header3
   , float2 block_header4
   , float2 block_header5
   , float2 block_header6
   , float2 block_header7
   , float2 block_header8
   , float2 block_header9
   , float2 block_header10
   , float2 block_header11
   , float2 block_header12
   , float2 block_header13
   , float2 block_header14
   , float2 block_header15
   , float2 block_header16
   , float2 block_header17
   , float2 block_header18
   , float2 block_header19
   , float2 decoded_difficulty0
   , float2 decoded_difficulty1
   , float2 decoded_difficulty2
   , float2 decoded_difficulty3
   , float2 decoded_difficulty4
   , float2 decoded_difficulty5
   , float2 decoded_difficulty6
   , float2 decoded_difficulty7
   , out float2 result<>) {

       // brcc seems to have problems with array alignment in fp40 model, so no arrays for now... :-(
       float2 k0,k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15;
       float2 k16,k17,k18,k19,k20,k21,k22,k23,k24,k25,k26,k27,k28,k29,k30,k31;
       float2 k32,k33,k34,k35,k36,k37,k38,k39,k40,k41,k42,k43,k44,k45,k46,k47;
       float2 k48,k49,k50,k51,k52,k53,k54,k55,k56,k57,k58,k59,k60,k61,k62,k63;
       float2 h0,h1,h2,h3,h4,h5,h6,h7;
       float2 w0,w1,w2,w3,w4,w5,w6,w7,w8,w9,w10,w11,w12,w13,w14,w15;
       float2 w16,w17,w18,w19,w20,w21,w22,w23,w24,w25,w26,w27,w28,w29,w30,w31;
       float2 w32,w33,w34,w35,w36,w37,w38,w39,w40,w41,w42,w43,w44,w45,w46,w47;
       float2 w48,w49,w50,w51,w52,w53,w54,w55,w56,w57,w58,w59,w60,w61,w62,w63;
       float2 a,b,c,d,e,f,g,h;
       float2 t1, t2;

       // Initialize k
       // ( Code generated with the following C program: )
       /*
       #include <stdio.h>

       int main(int argc, char *argv[]) {
            unsigned int k[64] = { 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
                                   0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
                                   0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
                                   0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
                                   0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
                                   0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
                                   0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
                                   0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 };
            int i;

            // Print each constant as two 16-bit halves: high word in .x, low word in .y.
            for( i = 0; i < 64; ++i) {
                printf( "k%d.x = %5d.0; k%d.y = %5d.0;\n", i, k[i] >> 16, i, k[i] & 0xffff);
            }
            return 0;
       }
       */
       
       k0.x = 17034.0; k0.y = 12184.0;
       k1.x = 28983.0; k1.y = 17553.0;
       k2.x = 46528.0; k2.y = 64463.0;
       k3.x = 59829.0; k3.y = 56229.0;
       k4.x = 14678.0; k4.y = 49755.0;
       k5.x = 23025.0; k5.y =  4593.0;
       k6.x = 37439.0; k6.y = 33444.0;
       k7.x = 43804.0; k7.y = 24277.0;
       k8.x = 55303.0; k8.y = 43672.0;
       k9.x =  4739.0; k9.y = 23297.0;
       k10.x =  9265.0; k10.y = 34238.0;
       k11.x = 21772.0; k11.y = 32195.0;
       k12.x = 29374.0; k12.y = 23924.0;
       k13.x = 32990.0; k13.y = 45566.0;
       k14.x = 39900.0; k14.y =  1703.0;
       k15.x = 49563.0; k15.y = 61812.0;
       k16.x = 58523.0; k16.y = 27073.0;
       k17.x = 61374.0; k17.y = 18310.0;
       k18.x =  4033.0; k18.y = 40390.0;
       k19.x =  9228.0; k19.y = 41420.0;
       k20.x = 11753.0; k20.y = 11375.0;
       k21.x = 19060.0; k21.y = 33962.0;
       k22.x = 23728.0; k22.y = 43484.0;
       k23.x = 30457.0; k23.y = 35034.0;
       k24.x = 38974.0; k24.y = 20818.0;
       k25.x = 43057.0; k25.y = 50797.0;
       k26.x = 45059.0; k26.y = 10184.0;
       k27.x = 48985.0; k27.y = 32711.0;
       k28.x = 50912.0; k28.y =  3059.0;
       k29.x = 54695.0; k29.y = 37191.0;
       k30.x =  1738.0; k30.y = 25425.0;
       k31.x =  5161.0; k31.y = 10599.0;
       k32.x = 10167.0; k32.y =  2693.0;
       k33.x = 11803.0; k33.y =  8504.0;
       k34.x = 19756.0; k34.y = 28156.0;
       k35.x = 21304.0; k35.y =  3347.0;
       k36.x = 25866.0; k36.y = 29524.0;
       k37.x = 30314.0; k37.y =  2747.0;
       k38.x = 33218.0; k38.y = 51502.0;
       k39.x = 37490.0; k39.y = 11397.0;
       k40.x = 41663.0; k40.y = 59553.0;
       k41.x = 43034.0; k41.y = 26187.0;
       k42.x = 49739.0; k42.y = 35696.0;
       k43.x = 51052.0; k43.y = 20899.0;
       k44.x = 53650.0; k44.y = 59417.0;
       k45.x = 54937.0; k45.y =  1572.0;
       k46.x = 62478.0; k46.y = 13701.0;
       k47.x =  4202.0; k47.y = 41072.0;
       k48.x =  6564.0; k48.y = 49430.0;
       k49.x =  7735.0; k49.y = 27656.0;
       k50.x = 10056.0; k50.y = 30540.0;
       k51.x = 13488.0; k51.y = 48309.0;
       k52.x = 14620.0; k52.y =  3251.0;
       k53.x = 20184.0; k53.y = 43594.0;
       k54.x = 23452.0; k54.y = 51791.0;
       k55.x = 26670.0; k55.y = 28659.0;
       k56.x = 29839.0; k56.y = 33518.0;
       k57.x = 30885.0; k57.y = 25455.0;
       k58.x = 33992.0; k58.y = 30740.0;
       k59.x = 36039.0; k59.y =   520.0;
       k60.x = 37054.0; k60.y = 65530.0;
       k61.x = 42064.0; k61.y = 27883.0;
       k62.x = 48889.0; k62.y = 41975.0;
       k63.x = 50801.0; k63.y = 30962.0;


       // Initialize h

       h0.x = 27145.0; h0.y = 58983.0;  // 0x6a09 0xe667
       h1.x = 47975.0; h1.y = 44677.0;  // 0xbb67 0xae85
       h2.x = 15470.0; h2.y = 62322.0;  // 0x3c6e 0xf372
       h3.x = 42319.0; h3.y = 62778.0;  // 0xa54f 0xf53a
       h4.x = 20750.0; h4.y = 21119.0;  // 0x510e 0x527f
       h5.x = 39685.0; h5.y = 26764.0;  // 0x9b05 0x688c
       h6.x =  8067.0; h6.y = 55723.0;  // 0x1f83 0xd9ab
       h7.x = 23520.0; h7.y = 52505.0;  // 0x5be0 0xcd19

       // For the following algorithm, see: http://en.wikipedia.org/wiki/SHA-2#Examples_of_SHA-2_variants

       // Initialize the first 16 w values
       // ToDo? Precompute this outside of the kernel, since the nonce is not in these 16 values?

       /*
        * Process the message in successive 512-bit chunks:
        * break message into 512-bit chunks
        * for each chunk
        *   break chunk into sixteen 32-bit big-endian words w[0..15]
        */
       // Implementation:
w0 = swapInt( block_header0);
w1 = swapInt( block_header1);
w2 = swapInt( block_header2);
w3 = swapInt( block_header3);
w4 = swapInt( block_header4);
w5 = swapInt( block_header5);
w6 = swapInt( block_header6);
w7 = swapInt( block_header7);
w8 = swapInt( block_header8);
w9 = swapInt( block_header9);
w10 = swapInt( block_header10);
w11 = swapInt( block_header11);
w12 = swapInt( block_header12);
w13 = swapInt( block_header13);
w14 = swapInt( block_header14);
w15 = swapInt( block_header15);

/*
 * for( i = 16; i < 64; i++) {
 *   w[i] = blend( w[ i - 16], w[ i - 15], w[ i - 7], w[ i - 2]);
 * }
 */
// Implementation
// (Generated with bash script: for i in {16..63}; do echo "w$i = blend( w$((i - 16)), w$((i - 15)), w$((i -7)), w$((i - 2)));"; done ):
w16 = blend( w0, w1, w9, w14);
w17 = blend( w1, w2, w10, w15);
w18 = blend( w2, w3, w11, w16);
w19 = blend( w3, w4, w12, w17);
w20 = blend( w4, w5, w13, w18);
w21 = blend( w5, w6, w14, w19);
w22 = blend( w6, w7, w15, w20);
w23 = blend( w7, w8, w16, w21);
w24 = blend( w8, w9, w17, w22);
w25 = blend( w9, w10, w18, w23);
w26 = blend( w10, w11, w19, w24);
w27 = blend( w11, w12, w20, w25);
w28 = blend( w12, w13, w21, w26);
w29 = blend( w13, w14, w22, w27);
w30 = blend( w14, w15, w23, w28);
w31 = blend( w15, w16, w24, w29);
w32 = blend( w16, w17, w25, w30);
w33 = blend( w17, w18, w26, w31);
w34 = blend( w18, w19, w27, w32);
w35 = blend( w19, w20, w28, w33);
w36 = blend( w20, w21, w29, w34);
w37 = blend( w21, w22, w30, w35);
w38 = blend( w22, w23, w31, w36);
w39 = blend( w23, w24, w32, w37);
w40 = blend( w24, w25, w33, w38);
w41 = blend( w25, w26, w34, w39);
w42 = blend( w26, w27, w35, w40);
w43 = blend( w27, w28, w36, w41);
w44 = blend( w28, w29, w37, w42);
w45 = blend( w29, w30, w38, w43);
w46 = blend( w30, w31, w39, w44);
w47 = blend( w31, w32, w40, w45);
w48 = blend( w32, w33, w41, w46);
w49 = blend( w33, w34, w42, w47);
w50 = blend( w34, w35, w43, w48);
w51 = blend( w35, w36, w44, w49);
w52 = blend( w36, w37, w45, w50);
w53 = blend( w37, w38, w46, w51);
w54 = blend( w38, w39, w47, w52);
w55 = blend( w39, w40, w48, w53);
w56 = blend( w40, w41, w49, w54);
w57 = blend( w41, w42, w50, w55);
w58 = blend( w42, w43, w51, w56);
w59 = blend( w43, w44, w52, w57);
w60 = blend( w44, w45, w53, w58);
w61 = blend( w45, w46, w54, w59);
w62 = blend( w46, w47, w55, w60);
w63 = blend( w47, w48, w56, w61);

/*
 * Initialize hash value for this chunk:
 * a := h0
 * b := h1
 * c := h2
 * d := h3
 * e := h4
 * f := h5
 * g := h6
 * h := h7
 */
// Implementation:
a = h0;
b = h1;
c = h2;
d = h3;
e = h4;
f = h5;
g = h6;
h = h7;

/*
 * Main loop:
 * for i from 0 to 63
 *     s0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
 *     maj := (a and b) xor (a and c) xor (b and c)
 *     t2 := s0 + maj
 *     s1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
 *     ch := (e and f) xor ((not e) and g)
 *     t1 := h + s1 + ch + k[i] + w[i]
 *
 *     h := g
 *     g := f
 *     f := e
 *     e := d + t1
 *     d := c
 *     c := b
 *     b := a
 *     a := t1 + t2
 */
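// Implementation:
// ( Generated with bash script:
// for i in {0..63}; do echo "t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k$i + w$i;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;"; done
// )
// FIXME: "+" on float2 is a plain component-wise add, with no carry from .y into .x and
// no wrap at 65536. These sums need to go through add() to stay valid 16-bit halves.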
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k0 + w0;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k1 + w1;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k2 + w2;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k3 + w3;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k4 + w4;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k5 + w5;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k6 + w6;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k7 + w7;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k8 + w8;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k9 + w9;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k10 + w10;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k11 + w11;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k12 + w12;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k13 + w13;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k14 + w14;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k15 + w15;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k16 + w16;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k17 + w17;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k18 + w18;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k19 + w19;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k20 + w20;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k21 + w21;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k22 + w22;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k23 + w23;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k24 + w24;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k25 + w25;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k26 + w26;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k27 + w27;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k28 + w28;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k29 + w29;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k30 + w30;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k31 + w31;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k32 + w32;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k33 + w33;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k34 + w34;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k35 + w35;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k36 + w36;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k37 + w37;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k38 + w38;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k39 + w39;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k40 + w40;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k41 + w41;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k42 + w42;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k43 + w43;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k44 + w44;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k45 + w45;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k46 + w46;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k47 + w47;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k48 + w48;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k49 + w49;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k50 + w50;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k51 + w51;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k52 + w52;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k53 + w53;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k54 + w54;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k55 + w55;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k56 + w56;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k57 + w57;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k58 + w58;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k59 + w59;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k60 + w60;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k61 + w61;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k62 + w62;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k63 + w63;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;

/*
 * Add this chunk's hash to result so far:
 * h0 := h0 + a
 * h1 := h1 + b
 * h2 := h2 + c
 * h3 := h3 + d
 * h4 := h4 + e
 * h5 := h5 + f
 * h6 := h6 + g
 * h7 := h7 + h
 */
h0 = add( h0, a);
h1 = add( h1, b);
h2 = add( h2, c);
h3 = add( h3, d);
h4 = add( h4, e);
h5 = add( h5, f);
h6 = add( h6, g);
h7 = add( h7, h);

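// The result of the first sha256 round should now be in h0..h7 as big-endian encoded ints.

// So use it as the new input for w0..w7, followed by the sha256 padding for a 32-byte message.

// ToDo: check if the h-order should be reversed, like w0 = h15; w1 = h14; ...

w0 = h0;
w1 = h1;
w2 = h2;
w3 = h3;
w4 = h4;
w5 = h5;
w6 = h6;
w7 = h7;
w8.x = 32768.0; w8.y = 0.0;   // 0x8000 0x0000: the padding bit right after the 256-bit message
w9.x = 0.0; w9.y = 0.0;
w10 = w9;
w11 = w9;
w12 = w9;
w13 = w9;
w14 = w9;
w15.x = 0.0; w15.y = 256.0;   // 0x0000 0x0100: message length in bits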

// Re-initialize h for the new sha256 round

h0.x = 27145.0; h0.y = 58983.0;  // 0x6a09 0xe667
h1.x = 47975.0; h1.y = 44677.0;  // 0xbb67 0xae85
h2.x = 15470.0; h2.y = 62322.0;  // 0x3c6e 0xf372
h3.x = 42319.0; h3.y = 62778.0;  // 0xa54f 0xf53a
h4.x = 20750.0; h4.y = 21119.0;  // 0x510e 0x527f
h5.x = 39685.0; h5.y = 26764.0;  // 0x9b05 0x688c
h6.x =  8067.0; h6.y = 55723.0;  // 0x1f83 0xd9ab
h7.x = 23520.0; h7.y = 52505.0;  // 0x5be0 0xcd19

/*
* for( i = 16; i < 64; i++) {
*   w[i] = blend( w[ i - 16], w[ i - 15], w[ i - 7], w[ i - 2]);
* }
*/
// Implementation
// (Generated with bash script: for i in {16..63}; do echo "w$i = blend( w$((i - 16)), w$((i - 15)), w$((i -7)), w$((i - 2)));"; done ):
w16 = blend( w0, w1, w9, w14);
w17 = blend( w1, w2, w10, w15);
w18 = blend( w2, w3, w11, w16);
w19 = blend( w3, w4, w12, w17);
w20 = blend( w4, w5, w13, w18);
w21 = blend( w5, w6, w14, w19);
w22 = blend( w6, w7, w15, w20);
w23 = blend( w7, w8, w16, w21);
w24 = blend( w8, w9, w17, w22);
w25 = blend( w9, w10, w18, w23);
w26 = blend( w10, w11, w19, w24);
w27 = blend( w11, w12, w20, w25);
w28 = blend( w12, w13, w21, w26);
w29 = blend( w13, w14, w22, w27);
w30 = blend( w14, w15, w23, w28);
w31 = blend( w15, w16, w24, w29);
w32 = blend( w16, w17, w25, w30);
w33 = blend( w17, w18, w26, w31);
w34 = blend( w18, w19, w27, w32);
w35 = blend( w19, w20, w28, w33);
w36 = blend( w20, w21, w29, w34);
w37 = blend( w21, w22, w30, w35);
w38 = blend( w22, w23, w31, w36);
w39 = blend( w23, w24, w32, w37);
w40 = blend( w24, w25, w33, w38);
w41 = blend( w25, w26, w34, w39);
w42 = blend( w26, w27, w35, w40);
w43 = blend( w27, w28, w36, w41);
w44 = blend( w28, w29, w37, w42);
w45 = blend( w29, w30, w38, w43);
w46 = blend( w30, w31, w39, w44);
w47 = blend( w31, w32, w40, w45);
w48 = blend( w32, w33, w41, w46);
w49 = blend( w33, w34, w42, w47);
w50 = blend( w34, w35, w43, w48);
w51 = blend( w35, w36, w44, w49);
w52 = blend( w36, w37, w45, w50);
w53 = blend( w37, w38, w46, w51);
w54 = blend( w38, w39, w47, w52);
w55 = blend( w39, w40, w48, w53);
w56 = blend( w40, w41, w49, w54);
w57 = blend( w41, w42, w50, w55);
w58 = blend( w42, w43, w51, w56);
w59 = blend( w43, w44, w52, w57);
w60 = blend( w44, w45, w53, w58);
w61 = blend( w45, w46, w54, w59);
w62 = blend( w46, w47, w55, w60);
w63 = blend( w47, w48, w56, w61);

/*
* Initialize hash value for this chunk:
    * a := h0
    * b := h1
    * c := h2
    * d := h3
    * e := h4
    * f := h5
    * g := h6
    * h := h7
*/
// Implementation:
a = h0;
b = h1;
c = h2;
d = h3;
e = h4;
f = h5;
g = h6;
h = h7;

/*
* Main loop:
* for i from 0 to 63
*     s0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
*     maj := (a and b) xor (a and c) xor (b and c)
*     t2 := s0 + maj
*     s1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
*     ch := (e and f) xor ((not e) and g)
*     t1 := h + s1 + ch + k[i] + w[i]
*
*     h := g
*     g := f
*     f := e
*     e := d + t1
*     d := c
*     c := b
*     b := a
*     a := t1 + t2
*/
// Implementation:
// ( Generated with bash script:
// for i in {0..63}; do echo "t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k$i + w$i;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;"; done
// )
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k0 + w0;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k1 + w1;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k2 + w2;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k3 + w3;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k4 + w4;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k5 + w5;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k6 + w6;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k7 + w7;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k8 + w8;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k9 + w9;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k10 + w10;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k11 + w11;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k12 + w12;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k13 + w13;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k14 + w14;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k15 + w15;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k16 + w16;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k17 + w17;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k18 + w18;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k19 + w19;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k20 + w20;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k21 + w21;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k22 + w22;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k23 + w23;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k24 + w24;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k25 + w25;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k26 + w26;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k27 + w27;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k28 + w28;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k29 + w29;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k30 + w30;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k31 + w31;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k32 + w32;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k33 + w33;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k34 + w34;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k35 + w35;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k36 + w36;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k37 + w37;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k38 + w38;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k39 + w39;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k40 + w40;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k41 + w41;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k42 + w42;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k43 + w43;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k44 + w44;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k45 + w45;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k46 + w46;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k47 + w47;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k48 + w48;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k49 + w49;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k50 + w50;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k51 + w51;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k52 + w52;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k53 + w53;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k54 + w54;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k55 + w55;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k56 + w56;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k57 + w57;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k58 + w58;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k59 + w59;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k60 + w60;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k61 + w61;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k62 + w62;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k63 + w63;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;

/*
* Add this chunk's hash to result so far:
    * h0 := h0 + a
    * h1 := h1 + b
    * h2 := h2 + c
    * h3 := h3 + d
    * h4 := h4 + e
    * h5 := h5 + f
    * h6 := h6 + g
    * h7 := h7 + h
*/
h0 = add( h0, a);
h1 = add( h1, b);
h2 = add( h2, c);
h3 = add( h3, d);
h4 = add( h4, e);
h5 = add( h5, f);
h6 = add( h6, g);
h7 = add( h7, h);

// Compare the resulting hash with the decoded difficulty.

// ToDo: check for endianness!!!

if( isSmaller( h7, decoded_difficulty7)
    + isSmaller( h6, decoded_difficulty6)
    + isSmaller( h5, decoded_difficulty5)
    + isSmaller( h4, decoded_difficulty4)
    + isSmaller( h3, decoded_difficulty3)
    + isSmaller( h2, decoded_difficulty2)
    + isSmaller( h1, decoded_difficulty1)
    + isSmaller( h0, decoded_difficulty0) == 8.0) {
    result = nonce;  // Found a valid nonce!
} else {
    result.x = -nonce.x;  // Return -nonce to indicate that this nonce was not valid...
    result.y = -nonce.y;
}

}
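
For checking the shader's verdict on the host, here is a minimal sketch in C of a 256-bit "hash below target" test. It is not part of the shader above: `hash_below_target` is a made-up name, and the sketch assumes that index 0 holds the most significant 32-bit word (the endianness ToDo still applies). If isSmaller returns 1.0 only when its first argument is smaller, then summing eight results accepts a hash only when every single word is smaller; the sketch instead walks from the most significant word down and decides at the first word that differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the 256-bit value in `hash` is strictly
 * smaller than `target`, else 0.
 * Assumption: index 0 is the most significant 32-bit word in both arrays. */
static int hash_below_target(const uint32_t hash[8], const uint32_t target[8]) {
    assert(hash != NULL && target != NULL);

    for (int i = 0; i < 8; i++) {
        if (hash[i] < target[i]) {
            return 1;  /* first differing word is smaller: hash < target */
        }
        if (hash[i] > target[i]) {
            return 0;  /* first differing word is larger: hash > target */
        }
        /* words are equal, so the next less significant word decides */
    }
    return 0;  /* all words equal: not strictly smaller */
}
```

Note that a hash can be below the target even when some of its lower words are larger than the corresponding target words, which is why a per-word "all smaller" test rejects valid nonces.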

Ciao,
Andreas
legendary
Activity: 965
Merit: 1000
Well, at the moment I'm only interested in getting something running on my old GeForce 7600 GS, which isn't supported by CUDA or OpenCL. Just to give you an idea: my CPU mines at about 300 khash/s at the moment, so 1 Mhash/s would be an improvement for me... :)

But even if it works, you have to consider that the performance would be poor, since simple operations like bit shifting require quite a few float operations. And there's no way to really use the entire card, since Brook compiles only fragment shaders _or_ vertex shaders. The vertex profile even has limited integer support, but I have only 5 of those shaders; that's why I guess Xaci's system with the floats makes more sense.

So it's a great learning project, but I don't see any practical use for it, to be honest...
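
To give an idea of what a bit shift costs without integer support, here is a small sketch in C of a 32-bit right shift done with float arithmetic only, on a value split into two 16-bit halves. The `f2` struct and all function names are made up for illustration; the sketch assumes the shift amount is passed as a power of two no larger than 65536, and it uses truncation instead of a floor() call because all values are non-negative.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit integer stored as two floats: x = high 16 bits, y = low 16 bits. */
typedef struct { float x, y; } f2;

static f2 f2_from_u32(uint32_t v) {
    f2 r = { (float)(v >> 16), (float)(v & 0xffffu) };
    return r;
}

static uint32_t f2_to_u32(f2 v) {
    return ((uint32_t)v.x << 16) | (uint32_t)v.y;
}

/* floor() for non-negative values, done by truncation. */
static float floor_pos(float v) {
    return (float)(uint32_t)v;
}

/* Shift right by log2(pow2) bits, e.g. pow2 = 8.0f shifts by 3 bits.
 * A native shift is one instruction; this takes two divisions, two floors,
 * a subtraction, a multiplication and an addition. */
static f2 f2_shiftr(f2 a, float pow2) {
    assert(pow2 >= 1.0f && pow2 <= 65536.0f);

    float hi = a.x / pow2;            /* exact: division by a power of two */
    float hi_floor = floor_pos(hi);
    f2 r;

    r.x = hi_floor;
    /* The bits that fall out of the high half enter the top of the low half. */
    r.y = floor_pos(a.y / pow2) + (hi - hi_floor) * 65536.0f;
    return r;
}
```

All intermediate values stay below 2^16 with at most 16 fractional bits, so they fit exactly into a float's 24-bit mantissa.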
hero member
Activity: 731
Merit: 503
Libertas a calumnia
[ watching (mining on open source drivers would be awesome) ]
hero member
Activity: 518
Merit: 500
Getting the GLSL code to work properly is really tricky for me. Here's a tutorial that describes some of the issues:

http://www.mathematik.tu-dortmund.de/~goeddeke/gpgpu/tutorial.html

You have to render the GLSL output to a texture and read it back to the host.

At this point, I'm not really sure how Xaci wants to pass the header and the nonce to the shader. Is the header supposed to be variable in a way, too?

I'm trying to simplify things for myself a bit, so I translated some of the code to BrookGPU to get float2 streams. This might cause a performance hit, since I'm not sure yet what kind of texture Brook generates and passes to the GPU (I found a posting that said it's a streamlength^2 * 4 * sizeof(float) texture, which would be really big).

Since I'm trying to simplify things, I just assume the header is constant and pass an array of nonces to the kernel. The shader should then replace the header nonce with the current nonce and do the double sha256 computation. I guess I'll have to pass the decoded difficulty too, but I'll look at that later...

Ciao,
Andreas


If you can get this working you are my absolute HERO.

I absolutely DESPISE ATI and their proprietary BS drivers that always break. Once they fix X, Y comes up, and once they fix Y, Z and X come up, etc.

It's a never-ending cycle of desperation, at least for me.

Good luck !
legendary
Activity: 965
Merit: 1000
Completed 'and' and fixed a bug in the 'not' function. This is the Brook version, but it should be easy to port the change back to GLSL if wanted:

Code:
/**
 * Some utility functions to process integers represented as float2.
 */

/**
 * Add 2 integers represented as float2.
 *
 * Do not let overflow happen with this function, or use sum_c instead!
 */
kernel float2 add( float2 a, float2 b) {
        float2 ret;

        ret.x = a.x + b.x;
        ret.y = a.y + b.y;

        if (ret.y >= 65536.0) {
                ret.y -= 65536.0;
                ret.x += 1.0;
        }

        if (ret.x >= 65536.0) {
                ret.x -= 65536.0;
        }

        return ret;
}

/**
 * Shift an integer represented as a float2 by log2(shift).
 *
 * Note: shift should be a power of two, e.g. to shift 3 steps, use 2^3.
 */
kernel float2 shiftr( float2 a, float shift) {
        float2 ret;

        ret.x = a.x / shift;

        ret.y = floor( a.y / shift) + frac( ret.x) * 65536.0;

        ret.x = floor( ret.x);

        return ret;
}

/**
 * Rotate an integer represented as a float2 by log2(shift).
 *
 * Note: shift should be a power of two, e.g. to rotate 3 steps, use 2^3.
 */
kernel float2 rotater( float2 a, float shift) {
        float2 ret;

        ret.x = a.x / shift;  // Shift words and keep fractions to shift those bits later.
        ret.y = a.y / shift;

        ret.y += frac( ret.x) * 65536.0;  // Shift low bits from x into y.
        ret.x += frac( ret.y) * 65536.0;  // Rotate low bits from y into x.

        ret.x = floor( ret.x);  // Cut shifted bits.
        ret.y = floor( ret.y);

        return ret;
}

/**
 * Xor half of an integer, represented as a float.
 */
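kernel float xor16( float a<>, float b<>) {

        float ret = 0;
        float fact = 32768.0;

        // Walk the 16 bits from the highest (32768) down to the lowest (1).
        // Stop at 1: a float halved repeatedly does not reach 0 for a long time.
        while (fact >= 1.0) {
                // Set the result bit if exactly one of a and b has this bit set.
                if( ( ( a >= fact) || ( b >= fact)) && ( ( a < fact) || ( b < fact))) {
                  ret += fact;
                }

                if( a >= fact) {
                  a -= fact;
                }
                if (b >= fact) {
                  b -= fact;
                }

                fact /= 2.0;
        }
        return ret;
}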

/**
 * Xor a complete integer represented as a float2.
 */
kernel float2 xor( float2 a<>, float2 b<>) {
       float2 ret = { xor16( a.x, b.x), xor16( a.y, b.y) };

       return ret;
}

/**
 * And operation on half of an integer, represented as a float.
 */
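kernel float and16( float a<>, float b<>) {
        float ret = 0;
        float fact = 32768.0;

        // Walk the 16 bits from the highest (32768) down to the lowest (1).
        while (fact >= 1.0) {
                // Set the result bit if both a and b have this bit set.
                if( ( a >= fact) && ( b >= fact)) {
                  ret += fact;
                }

                if( a >= fact) {
                  a -= fact;
                }
                if (b >= fact) {
                  b -= fact;
                }

                fact /= 2.0;
        }
        return ret;
}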

/**
 * And operation on a full integer, represented as a float2.
 */
kernel float2 and( float2 a<>, float2 b<>) {
        float2 ret =  { and16( a.x, b.x), and16( a.y, b.y) };

        return ret;
}

/*
 * Logical complement ("not")
 */
kernel float2 not( float2 a<>) {
       float2 ret = { 65535.0 - a.x, 65535.0 - a.y};

       return ret;
}

/**
 * Swap the 2 words of an int.
 */
kernel float2 swapw( float2 a) {
       float2 ret;

       ret.x = a.y;
       ret.y = a.x;

       return ret;
}

kernel float2 blend( float2 m16, float2 m15, float2 m07, float2 m02) {
        float2 s0 = xor( rotater( m15, 128.0), xor( rotater( swapw( m15), 4.0), shiftr( m15, 8.0)));
        float2 s1 = xor( rotater( swapw( m02), 2.0), xor( rotater( swapw( m02), 8.0), shiftr( m02, 1024.0)));

        return add( add( m16, s0), add( m07, s1));
}

kernel float2 e0( float2 a) {
        return xor( rotater( a, 4.0), xor( rotater( a, 8192.0), rotater( swapw( a), 64.0)));
}

kernel float2 e1( float2 a) {
        return xor( rotater( a, 64.0), xor( rotater( a, 2048.0), rotater( swapw( a), 512.0)));
}

kernel float2 ch( float2 a, float2 b, float2 c) {
        return xor( and( a, b), and( not( a), c));
}

kernel float2 maj( float2 a, float2 b, float2 c) {
        return xor( xor( and( a, b), and( a, c)), and( b, c));
}

This code compiles here at least. I don't know if it actually works, since I don't have the actual sha256 code in Brook yet.
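
One way to find out whether the helpers work before the full sha256 kernel exists is to port them to plain C and compare them against native integer operations. The following is only a sketch under assumptions: the `f2` struct, the helper names and the test approach are made up here, the ports follow the Brook functions above (with the bit loop stopping at 1), and x is taken to be the high 16-bit word. It checks the s0 term of blend() (rotate right by 7, rotate right by 18, shift right by 3) against an integer reference.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { float x, y; } f2;  /* x = high 16 bits, y = low 16 bits */

static float floor_pos(float v) { return (float)(uint32_t)v; }  /* v >= 0 only */
static float frac_pos(float v)  { return v - floor_pos(v); }

static f2 to_f2(uint32_t v) { f2 r = { (float)(v >> 16), (float)(v & 0xffffu) }; return r; }
static uint32_t to_u32(f2 v) { return ((uint32_t)v.x << 16) | (uint32_t)v.y; }

static f2 swapw(f2 a) { f2 r = { a.y, a.x }; return r; }

/* Shift right by log2(shift) bits; shift is a power of two <= 65536. */
static f2 shiftr(f2 a, float shift) {
    assert(shift >= 1.0f && shift <= 65536.0f);
    float hx = a.x / shift;
    f2 r = { floor_pos(hx), floor_pos(a.y / shift) + frac_pos(hx) * 65536.0f };
    return r;
}

/* Rotate right by log2(shift) bits; shift is a power of two <= 65536. */
static f2 rotater(f2 a, float shift) {
    assert(shift >= 1.0f && shift <= 65536.0f);
    float hx = a.x / shift, hy = a.y / shift;
    f2 r;
    r.x = floor_pos(hx) + frac_pos(hy) * 65536.0f;  /* low bits of y wrap into x */
    r.y = floor_pos(hy) + frac_pos(hx) * 65536.0f;  /* low bits of x move into y */
    return r;
}

/* Bitwise xor of two 16-bit values held in floats. */
static float xor16(float a, float b) {
    float ret = 0.0f;
    for (float fact = 32768.0f; fact >= 1.0f; fact /= 2.0f) {
        int abit = a >= fact, bbit = b >= fact;
        if (abit != bbit) ret += fact;
        if (abit) a -= fact;
        if (bbit) b -= fact;
    }
    return ret;
}

static f2 xor2(f2 a, f2 b) { f2 r = { xor16(a.x, b.x), xor16(a.y, b.y) }; return r; }

/* s0 term of blend(), float emulation. */
static uint32_t sigma0_float(uint32_t v) {
    f2 m = to_f2(v);
    return to_u32(xor2(rotater(m, 128.0f),
                       xor2(rotater(swapw(m), 4.0f), shiftr(m, 8.0f))));
}

/* Integer reference for the same term. */
static uint32_t rotr32(uint32_t v, unsigned n) { return (v >> n) | (v << (32u - n)); }
static uint32_t sigma0_int(uint32_t v) { return rotr32(v, 7) ^ rotr32(v, 18) ^ (v >> 3); }
```

Comparing `sigma0_float(v)` with `sigma0_int(v)` for a handful of values (all zeros, all ones, single bits, random words) should quickly show whether a helper is off by a bit.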

Ciao,
Andreas
legendary
Activity: 965
Merit: 1000
Getting the GLSL code to work properly is really tricky for me. Here's a tutorial that describes some of the issues:

http://www.mathematik.tu-dortmund.de/~goeddeke/gpgpu/tutorial.html

You have to render the GLSL output to a texture and read it back to the host.

At this point, I'm not really sure how Xaci wants to pass the header and the nonce to the shader. Is the header supposed to be variable in a way, too?

I'm trying to simplify things for myself a bit, so I translated some of the code to BrookGPU to get float2 streams. This might cause a performance hit, since I'm not sure yet what kind of texture Brook generates and passes to the GPU (I found a posting that said it's a streamlength^2 * 4 * sizeof(float) texture, which would be really big).

Since I'm trying to simplify things, I just assume the header is constant and pass an array of nonces to the kernel. The shader should then replace the header nonce with the current nonce and do the double sha256 computation. I guess I'll have to pass the decoded difficulty too, but I'll look at that later...
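
As a rough host-side illustration of the two inputs mentioned here, the following C sketch patches a nonce into a constant 80-byte block header and expands the compact "bits" field into a 256-bit target. The function names are made up, and the sketch only handles exponents from 3 to 32; how the bytes are then packed into float2 words for the kernel is left open.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HEADER_SIZE  80  /* length of a Bitcoin block header in bytes */
#define NONCE_OFFSET 76  /* the nonce is the last 4 bytes of the header */

/* Hypothetical helper: copy the constant header and overwrite the nonce
 * field, which is stored in little-endian byte order. */
static void header_with_nonce(const uint8_t base[HEADER_SIZE], uint32_t nonce,
                              uint8_t out[HEADER_SIZE]) {
    memcpy(out, base, HEADER_SIZE);
    out[NONCE_OFFSET + 0] = (uint8_t)(nonce);
    out[NONCE_OFFSET + 1] = (uint8_t)(nonce >> 8);
    out[NONCE_OFFSET + 2] = (uint8_t)(nonce >> 16);
    out[NONCE_OFFSET + 3] = (uint8_t)(nonce >> 24);
}

/* Hypothetical helper: expand the compact "bits" value into a 32-byte
 * big-endian target, i.e. target = mantissa * 256^(exponent - 3). */
static void decode_target(uint32_t bits, uint8_t target[32]) {
    uint32_t exponent = bits >> 24;
    uint32_t mantissa = bits & 0x007fffffu;

    assert(exponent >= 3 && exponent <= 32);  /* sketch covers this range only */

    memset(target, 0, 32);
    /* The three mantissa bytes start (32 - exponent) bytes from the top. */
    target[32 - exponent + 0] = (uint8_t)(mantissa >> 16);
    target[32 - exponent + 1] = (uint8_t)(mantissa >> 8);
    target[32 - exponent + 2] = (uint8_t)(mantissa);
}
```

Since only the last 16 bytes of the header change with the nonce, the state after hashing the first 64-byte block could also be computed once on the host and passed in as a constant.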

Ciao,
Andreas
legendary
Activity: 1204
Merit: 1000
฿itcoin: Currency of Resistance!
subscribing... I love Linux and its new video memory management, called GEM + KMS...

Mining with purely open source tools and drivers will be awesome!!

I wanna test this out!!
legendary
Activity: 965
Merit: 1000
After some more debugging, it seems I've found the problem:

In line 210, nonce is declared as a vec2, so it has two elements, x and y. But in lines 232 and 233 (IIRC), nonce.zw is used for computation. That doesn't work, as nonce has no z and w elements. When I change those expressions to nonce.xy, the code compiles and it seems something even gets started, although I get no output so far. I will have to investigate that further and fix more issues in the test code.

Any help is really appreciated!

Ciao,
Andreas
legendary
Activity: 965
Merit: 1000
Did anyone get the sha256 GLSL code to work?

So far I've been reading GLSL tutorials and hacked together a test app (from too many sources to recall all the authors... sorry :( ):

Code:
#include <stdio.h>                      //C standard IO
#include <stdlib.h>                     //C standard lib
#include <string.h>                     //C string lib

#include <GL/glew.h>                    //GLEW lib
#include <GL/glut.h>                    //GLUT lib


//Function from: http://www.evl.uic.edu/aej/594/code/ogl.cpp
//Read in a textfile (GLSL program)
// we need to pass it as a string to the GLSL driver
char *textFileRead(char *fn) {
  FILE *fp;
  char *content = NULL;
  
  int count=0;
  
  if (fn != NULL) {
    
    fp = fopen(fn,"rt");
    
    if (fp != NULL) {
      
      fseek(fp, 0, SEEK_END);
      count = ftell(fp);
      rewind(fp);
      
      if (count > 0) {
        content = (char *)malloc(sizeof(char) * (count+1));
        count = fread(content,sizeof(char),count,fp);
        content[count] = '\0';
      }
      fclose(fp);
      
    }
  }
  
  return content;
}

//Function from: http://www.evl.uic.edu/aej/594/code/ogl.cpp
//Write a string to a textfile
// we can use this to write to a text file
int textFileWrite(char *fn, char *s) {
  FILE *fp;
  int status = 0;
  
  if (fn != NULL) {
    fp = fopen(fn,"w");
    
    if (fp != NULL) {                  
      if (fwrite(s,sizeof(char),strlen(s),fp) == strlen(s))
        status = 1;
      fclose(fp);
    }
  }
  return(status);
}

/**
 * Setup shaders
 */
void setShaders() {
  char *my_fragment_shader_source;
  // char * my_vertex_shader_source;
  GLenum error;

  GLhandleARB my_program;
  // GLhandleARB my_vertex_shader;
  GLhandleARB my_fragment_shader;
  
  // Get Vertex And Fragment Shader Sources
  my_fragment_shader_source = textFileRead( "sha256.glsl");
  // my_vertex_shader_source = GetVertexShaderSource();

  // my_vertex_shader = glCreateShaderObjectARB(GL_VERTEX_SHADER_ARB);
  my_fragment_shader = glCreateShaderObjectARB(GL_FRAGMENT_SHADER_ARB);
 
  // Load Shader Sources
  // glShaderSourceARB(my_vertex_shader, 1, &my_vertex_shader_source, NULL);
  glShaderSourceARB( my_fragment_shader, 1, (const GLcharARB** )&my_fragment_shader_source, NULL);
 
  // Compile The Shaders
  // glCompileShaderARB(my_vertex_shader);
  glCompileShaderARB(my_fragment_shader);
  
  // Check for compile errors
  int compiled = 0;
  glGetObjectParameterivARB( my_fragment_shader, GL_OBJECT_COMPILE_STATUS_ARB, &compiled );

  if ( !compiled ) {
    GLint maxLength = 0;

    // The shader was created through the ARB interface, so query its log the same way.
    glGetObjectParameterivARB( my_fragment_shader, GL_OBJECT_INFO_LOG_LENGTH_ARB, &maxLength);

    if ( maxLength > 0) {
      /* maxLength includes the terminating NUL character */
      char *fragmentInfoLog = malloc( maxLength * sizeof(char));

      if ( fragmentInfoLog != NULL) {
        glGetInfoLogARB( my_fragment_shader, maxLength, &maxLength, fragmentInfoLog);

        printf( "Compile error log: %s\n\n", fragmentInfoLog);

        free( fragmentInfoLog);
      }
    }
  }

  // Create Shader And Program Objects
  my_program = glCreateProgramObjectARB();

  if(( error=glGetError()) != GL_NO_ERROR) {
    exit( error);
  }

  // Attach The Shader Objects To The Program Object
  // glAttachObjectARB(my_program, my_vertex_shader);
  glAttachObjectARB(my_program, my_fragment_shader);
 
  // Link The Program Object
  glLinkProgramARB(my_program);
  
  // Use The Program Object Instead Of Fixed Function OpenGL
  glUseProgramObjectARB(my_program);
}

int main( int argc, char *argv[]) {

  glutInit(&argc, argv);
  //glutInitDisplayMode(GLUT_DEPTH | GLUT_DOUBLE | GLUT_RGBA);
  glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA);
  glutInitWindowPosition(100,100);
  glutInitWindowSize(320,320);
  glutCreateWindow("GPU");
  
  //  glutDisplayFunc(renderScene);
  // glutIdleFunc(renderScene);
  // glutReshapeFunc(changeSize);
  // glutKeyboardFunc(processNormalKeys);
  
  glewInit();
  if (glewIsSupported("GL_VERSION_2_1"))
    printf("Ready for OpenGL 2.1\n");
  else {
    printf("OpenGL 2.1 not supported\n");
    exit(1);
  }
  if (GLEW_ARB_vertex_shader && GLEW_ARB_fragment_shader && GL_EXT_geometry_shader4)
    printf("Ready for GLSL - vertex, fragment, and geometry units\n");
  else {
    printf("Not totally ready :( \n");
    exit(1);
  }

  setShaders();
  
  glutMainLoop();
  
  // just for compatibility purposes
  return 0;

  // glDeleteObjectARB( my_program);
  // glDeleteObjectARB( my_fragment_shader);
}

There are lots of bugs in this code, but at the moment, I just want to compile the shader and start it to do further checks.

I also wrote a small makefile:
Code:
PROGRAM := glslminer

SOURCES := $(wildcard *.c)

CC = gcc
CCOPTS =
LINKEROPTS = -lGL -lGLEW -lglut

# Libraries go after the sources so the linker can resolve their symbols.
.PHONY: all
all:
	$(CC) $(CCOPTS) $(SOURCES) -o $(PROGRAM) $(LINKEROPTS)

.PHONY: clean
clean:
	rm -f $(PROGRAM) *.o

and when I compile and start the code as root (as a regular user, I don't get access to the nvidia card here), I get:

Code:
localhost glsl # ./glslminer
Ready for OpenGL 2.1
Ready for GLSL - vertex, fragment, and geometry units
Compile error log: 0(232) : error C1031: swizzle mask element not present in operand "zw"
0(233) : error C1031: swizzle mask element not present in operand "zw"

, which seems to mean that some of the .zw operations fail (I don't know the line number yet, since the newlines seem to get lost in my shader source import).
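
If the number in parentheses in the log is a line number of the shader source (an assumption about the compiler's log format, not something verified here), a small helper can pull that line out of the string returned by textFileRead. This is a minimal sketch in C; `source_line` is a made-up name, and it treats '\n' as the only line separator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the start of the 1-based line `n`
 * in `src` and store its length (without the newline) in *len.
 * Returns NULL if the source has fewer than n lines. */
static const char *source_line(const char *src, int n, size_t *len) {
    assert(src != NULL && len != NULL && n >= 1);

    const char *p = src;
    for (int line = 1; line < n; line++) {
        p = strchr(p, '\n');
        if (p == NULL) {
            return NULL;  /* ran out of lines before reaching line n */
        }
        p++;              /* step past the newline to the next line */
    }

    const char *end = strchr(p, '\n');
    *len = (end != NULL) ? (size_t)(end - p) : strlen(p);
    return p;
}
```

With the source in `content`, the reported line could then be printed with something like `printf("%.*s\n", (int)len, line)` after `line = source_line(content, 232, &len)` returns a non-NULL pointer.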

Anyone with more luck?

Ciao,
Andreas