
Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 5. (Read 106928 times)

sr. member
Activity: 256
Merit: 250
The OpenCL compiler does involve constant folding as an optimization pass. It is an obvious optimization, no need to try this.
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
True - however, consider this little comparison ...
A reasonably simple version of sha256 in C, compiled with -O2, runs almost twice as fast as the same code compiled without it.
(Yeah, I spent a couple of weeks recently playing with sha256 in C code and seeing what I could do with it ... and early on I wondered why I was getting such bad results, until I noticed I had stupidly left out -O2 ... :P)
Their compiler may not be as good as gcc, but hopefully not much worse.

Of course, yes, do try; many will be interested in your results :)
sr. member
Activity: 378
Merit: 250
Is there a way to disassemble the compiled version into a readable format so that I can search it for things to optimize?  I've learned never to leave to a compiler what you can do yourself.  Sometimes compilers will take you at your word.
hero member
Activity: 772
Merit: 500
Small change I could suggest just looking at some of the code.  I notice that some variables use simple addition and subtraction a few times.  For example:  
Code:
// intermediate W calculations
#define P1(n) (rot(W[(n) - 2 - O], 15U) ^ rot(W[(n) - 2 - O], 13U) ^ (W[(n) - 2 - O] >> 10U))
#define P2(n) (rot(W[(n) - 15 - O], 25U) ^ rot(W[(n) - 15 - O], 14U) ^ (W[(n) - 15 - O] >> 3U))
#define P3(n) W[n - 7 - O]
#define P4(n) W[n - 16 - O]

// full W calculation
#define W(n) (W[n - O] = P4(n) + P3(n) + P2(n) + P1(n))
You notice that n - O comes up several times in a row.  Wouldn't it be better to combine n - O into its own variable and subtract from it, to reduce the number of values that have to be read?  After all, n - 7 - O and n - 16 - O share the same n - O term.  If you combine n - O, it's just a simple "pull from buffer and subtract this number" problem.
I haven't tested it, but I imagine it could speed things along slightly.
Just looking at that from a standard compiler point of view
(I have no idea how good or bad or literal the OpenCL compiler is)
The compiler would probably notice that anyway if the OpenCL compiler was able to do basic optimisations.

Actually, Diapolo, do you know if it has an optimiser in it? (and how good it is?)
i.e. would that change suggested make a difference, or would the compiler work it out itself?
... just curious :)

It's not a beneficial change, because the compiler optimizes this out, and the current form keeps the code a bit more readable.
I'm pretty sure the easy optimizations are all done, but if you guys prove me wrong it would be nice ;).

Dia
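
To make the point about the compiler concrete, here is a minimal sketch in plain C (not the actual kernel). It assumes O is a compile-time constant, set to a hypothetical 0 here, and the function names are made up for illustration. With a literal n and a constant O, every index such as 16 - 2 - O is a constant expression, so spelling it out or hoisting n - O into one name should give the same result:

```c
#include <stdint.h>

/* Hypothetical offset into W; the kernel's real value of O is not shown in the thread. */
#define O 0

/* Rotate left by s bits (0 < s < 32), standing in for the kernel's rot(). */
static uint32_t rot(uint32_t x, uint32_t s) { return (x << s) | (x >> (32U - s)); }

/* Variant A: indices written out in full, as in the quoted macros (n = 16). */
uint32_t w16_spelled_out(const uint32_t *W) {
    uint32_t p1 = rot(W[16 - 2 - O], 15U) ^ rot(W[16 - 2 - O], 13U) ^ (W[16 - 2 - O] >> 10U);
    uint32_t p2 = rot(W[16 - 15 - O], 25U) ^ rot(W[16 - 15 - O], 14U) ^ (W[16 - 15 - O] >> 3U);
    return W[16 - 16 - O] + W[16 - 7 - O] + p2 + p1;
}

/* Variant B: the suggested change, with n - O hoisted into one name. */
uint32_t w16_hoisted(const uint32_t *W) {
    enum { BASE = 16 - O };  /* an enum value must be a compile-time constant */
    uint32_t a = W[BASE - 2], b = W[BASE - 15];
    uint32_t p1 = rot(a, 15U) ^ rot(a, 13U) ^ (a >> 10U);
    uint32_t p2 = rot(b, 25U) ^ rot(b, 14U) ^ (b >> 3U);
    return W[BASE - 16] + W[BASE - 7] + p2 + p1;
}
```

Both functions return the same word for any 16-entry input. Whether a given OpenCL compiler emits identical machine code for the two forms would still have to be checked in its disassembly.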
legendary
Activity: 1512
Merit: 1036
Yes, in fact there are some tweaks done in the code now to make the OpenCL compiler produce more optimized code than it normally does.
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
Small change I could suggest just looking at some of the code.  I notice that some variables use simple addition and subtraction a few times.  For example:  
Code:
// intermediate W calculations
#define P1(n) (rot(W[(n) - 2 - O], 15U) ^ rot(W[(n) - 2 - O], 13U) ^ (W[(n) - 2 - O] >> 10U))
#define P2(n) (rot(W[(n) - 15 - O], 25U) ^ rot(W[(n) - 15 - O], 14U) ^ (W[(n) - 15 - O] >> 3U))
#define P3(n) W[n - 7 - O]
#define P4(n) W[n - 16 - O]

// full W calculation
#define W(n) (W[n - O] = P4(n) + P3(n) + P2(n) + P1(n))
You notice that n - O comes up several times in a row.  Wouldn't it be better to combine n - O into its own variable and subtract from it, to reduce the number of values that have to be read?  After all, n - 7 - O and n - 16 - O share the same n - O term.  If you combine n - O, it's just a simple "pull from buffer and subtract this number" problem.
I haven't tested it, but I imagine it could speed things along slightly.
Just looking at that from a standard compiler point of view
(I have no idea how good or bad or literal the OpenCL compiler is)
The compiler would probably notice that anyway if the OpenCL compiler was able to do basic optimisations.

Actually, Diapolo, do you know if it has an optimiser in it? (and how good it is?)
i.e. would that change suggested make a difference, or would the compiler work it out itself?
... just curious :)
sr. member
Activity: 378
Merit: 250
Small change I could suggest just looking at some of the code.  I notice that some variables use simple addition and subtraction a few times.  For example:  
Code:
// intermediate W calculations
#define P1(n) (rot(W[(n) - 2 - O], 15U) ^ rot(W[(n) - 2 - O], 13U) ^ (W[(n) - 2 - O] >> 10U))
#define P2(n) (rot(W[(n) - 15 - O], 25U) ^ rot(W[(n) - 15 - O], 14U) ^ (W[(n) - 15 - O] >> 3U))
#define P3(n) W[n - 7 - O]
#define P4(n) W[n - 16 - O]

// full W calculation
#define W(n) (W[n - O] = P4(n) + P3(n) + P2(n) + P1(n))
You notice that n - O comes up several times in a row.  Wouldn't it be better to combine n - O into its own variable and subtract from it, to reduce the number of values that have to be read?  After all, n - 7 - O and n - 16 - O share the same n - O term.  If you combine n - O, it's just a simple "pull from buffer and subtract this number" problem.
I haven't tested it, but I imagine it could speed things along slightly.
hero member
Activity: 772
Merit: 500
Sorry for not donating.
I tried & used your software when I was mining.
2 btc sent to the address in signature. 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM
Something is better than nothing.
Thank you for your valuable kernel.

I received your donation, a warm thank you :).

Dia
hero member
Activity: 772
Merit: 500
I did :), liked getting (positive) feedback and being an interesting part of Bitcoin for a short period of time.
Dia

Still coding OpenCL stuff, Diapolo?

I had no time to make any further progress. From time to time I visit AMD's OpenCL forum to stay a little up to date, but I'm currently not coding. The last thing I tried was to implement 3-component vectors in the kernel, but AMD's drivers still seem buggy there.

Dia
sr. member
Activity: 256
Merit: 250
I did :), liked getting (positive) feedback and being an interesting part of Bitcoin for a short period of time.
Dia

Still coding OpenCL stuff, Diapolo?
legendary
Activity: 1855
Merit: 1016
Sorry for not donating.
I tried & used your software when I was mining.
2 btc sent to the address in signature. 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM
Something is better than nothing.
Thank you for your valuable kernel.
hero member
Activity: 772
Merit: 500
The Bitcoin price doesn't encourage work, and kernels are almost perfectly efficient, so many threads are dead...

It's not only the BTC price; the thankfulness in terms of donations was not really motivating anymore, and most people compared my mod with Phat's 2.X kernel only in terms of raw performance, which made me think no one cares about a different approach or even about testing whether there are differences in power draw or other factors. Well, now I'm too far away anyway ^^.

Dia

The donation problems throughout Bitcoin are rather strange.  Despite having a very easy way to send wealth across the internet, even the most well-known and respected developers receive very little as thanks for their efforts.  Perhaps this is partly due to the fact that Bitcoin (particularly Bitcoin mining) attracts people who are, on average, not so inclined to donate, and that most of the miners who care about the extra 0.5% of income from mining are naturally very tight with their money.

Ah well.  Thanks once again for all of your work.  Following this thread and trying out the patches as they came out was fun.  I hope you had fun too!


I did :), liked getting (positive) feedback and being an interesting part of Bitcoin for a short period of time.

Dia
legendary
Activity: 1246
Merit: 1011
The Bitcoin price doesn't encourage work, and kernels are almost perfectly efficient, so many threads are dead...

It's not only the BTC price; the thankfulness in terms of donations was not really motivating anymore, and most people compared my mod with Phat's 2.X kernel only in terms of raw performance, which made me think no one cares about a different approach or even about testing whether there are differences in power draw or other factors. Well, now I'm too far away anyway ^^.

Dia

The donation problems throughout Bitcoin are rather strange.  Despite having a very easy way to send wealth across the internet, even the most well-known and respected developers receive very little as thanks for their efforts.  Perhaps this is partly due to the fact that Bitcoin (particularly Bitcoin mining) attracts people who are, on average, not so inclined to donate, and that most of the miners who care about the extra 0.5% of income from mining are naturally very tight with their money.

Ah well.  Thanks once again for all of your work.  Following this thread and trying out the patches as they came out was fun.  I hope you had fun too!
hero member
Activity: 772
Merit: 500
The Bitcoin price doesn't encourage work, and kernels are almost perfectly efficient, so many threads are dead...

It's not only the BTC price; the thankfulness in terms of donations was not really motivating anymore, and most people compared my mod with Phat's 2.X kernel only in terms of raw performance, which made me think no one cares about a different approach or even about testing whether there are differences in power draw or other factors. Well, now I'm too far away anyway ^^.

Dia
legendary
Activity: 1029
Merit: 1000
The Bitcoin price doesn't encourage work, and kernels are almost perfectly efficient, so many threads are dead...
legendary
Activity: 1344
Merit: 1004
:( Dead thread is dead. No more kernel updates?
legendary
Activity: 1344
Merit: 1004
Diapolo, I tried the latest kernel, and you say that BFI_INT will be enabled if it is supported. Well, on my 5870 it wasn't...
So I went from 385 to 334 MH/s.

That's strange. Could you check whether your OpenCL driver reports cl_amd_media_ops as available (via GPU Caps Viewer)?

Thanks,
Dia

Did he even bother to post what flags he was using? I know there was some confusion between VECTORS and VECTORS2 between your kernel and Phateus' kernels. It sounds like he's only doing 1 nonce per execution instead of 2.
hero member
Activity: 772
Merit: 500
Diapolo, I tried the latest kernel, and you say that BFI_INT will be enabled if it is supported. Well, on my 5870 it wasn't...
So I went from 385 to 334 MH/s.

That's strange. Could you check whether your OpenCL driver reports cl_amd_media_ops as available (via GPU Caps Viewer)?

Thanks,
Dia
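
For anyone who prefers checking this in code rather than in GPU Caps Viewer, here is a minimal sketch. It assumes you already have the device's space-separated extension string, which the OpenCL API returns from clGetDeviceInfo with CL_DEVICE_EXTENSIONS. The helper name has_extension is made up for illustration and is not part of any kernel or miner:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated token in
 * `extensions`, else 0. A plain strstr() alone would also match
 * longer names that merely contain `name` as a prefix or suffix. */
int has_extension(const char *extensions, const char *name) {
    size_t len = strlen(name);
    const char *p = extensions;
    if (len == 0) return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == extensions) || (p[-1] == ' ');
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token) return 1;
        p += 1;  /* keep searching after this partial match */
    }
    return 0;
}
```

Calling has_extension(ext_string, "cl_amd_media_ops") on the string reported for the 5870 would show whether the driver exposes the extension at all, which is a separate question from which flags the miner was started with.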
legendary
Activity: 1862
Merit: 1011
Reverse engineer from time to time
Diapolo, I tried the latest kernel, and you say that BFI_INT will be enabled if it is supported. Well, on my 5870 it wasn't...
So I went from 385 to 334 MH/s.
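
As background on why BFI_INT matters for the hash rate, here is a rough sketch in plain C. It assumes the usual description of a bitfield insert, where a mask picks each result bit from one of two inputs, and all function names are illustrative rather than taken from the kernel. Under that assumption, the SHA-256 Ch and Maj functions each collapse into a single select:

```c
#include <stdint.h>

/* Assumed BFI_INT semantics: take bits of b where mask is 1, bits of c where it is 0. */
uint32_t bfi(uint32_t mask, uint32_t b, uint32_t c) { return (mask & b) | (~mask & c); }

/* Textbook SHA-256 definitions, several logic ops each. */
uint32_t ch_plain(uint32_t x, uint32_t y, uint32_t z) { return (x & y) ^ (~x & z); }
uint32_t maj_plain(uint32_t x, uint32_t y, uint32_t z) { return (x & y) ^ (x & z) ^ (y & z); }

/* Ch is a select of y or z, keyed on x. */
uint32_t ch_bfi(uint32_t x, uint32_t y, uint32_t z) { return bfi(x, y, z); }

/* Maj: where x and z agree the majority is x; where they differ it is y. */
uint32_t maj_bfi(uint32_t x, uint32_t y, uint32_t z) { return bfi(z ^ x, y, x); }
```

If the instruction is not used, these functions fall back to several separate logic operations per round, which could explain part of a drop like the one reported above. The thread itself does not confirm the cause.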
hero member
Activity: 772
Merit: 500
I have finally found the 3% boost that I was talking about. Would this work with your improvement?

https://bitcointalksearch.org/topic/3-faster-mining-with-phoenixphatk-diablo-or-poclbm-for-everyone-23067

This has been in since I saw that thread, I'm sorry to say ;).

Dia