further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 5.

gat3way

sr. member

Activity: 256

Merit: 250

The OpenCL compiler does perform constant folding as an optimization pass. It is an obvious optimization, so there is no need to try this.
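As a rough illustration of the constant folding mentioned here, the sketch below is plain C rather than OpenCL C. The offset value `O = 3` and the names `IDX_DIRECT`, `BASE` and `IDX_COMBINED` are made up for the example and are not taken from the kernel. It only shows that, when `n` is a literal, both ways of writing the index are integer constant expressions that the compiler must evaluate at compile time.

```c
#include <assert.h>

/* Hypothetical offset; the real kernel defines its own O. */
#define O 3

/* Index written the way the kernel macros write it. */
#define IDX_DIRECT(n)   ((n) - 16 - O)

/* Index written with the suggested combined "n - O" term. */
#define BASE(n)         ((n) - O)
#define IDX_COMBINED(n) (BASE(n) - 16)

/* Enum values must be integer constant expressions, so this only
 * compiles if both spellings are folded at compile time. */
enum { DIRECT_19 = IDX_DIRECT(19), COMBINED_19 = IDX_COMBINED(19) };

/* C11 compile-time check that both spellings give the same index. */
_Static_assert(DIRECT_19 == COMBINED_19,
               "both spellings fold to the same constant");

/* Accessors so the folded values can be inspected at run time. */
int folded_index_direct(void)   { return DIRECT_19; }
int folded_index_combined(void) { return COMBINED_19; }
```

Calling `folded_index_direct()` and `folded_index_combined()` from a small `main` returns the same value for both. Whether a given OpenCL compiler emits identical machine code for the two forms would still have to be checked in its own output.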

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

True - however, consider this little comparison ...
A reasonably simple version of sha256 in C compiled with -O2 is almost twice as fast as the same code compiled without it.
(Yeah, I spent a couple of weeks recently playing with sha256 in C code and seeing what I could do with it ... and early on I was wondering why I was getting such bad results, until I noticed I had stupidly left out -O2 ... :P)
Their compiler may not be as good as gcc, but hopefully it is not much worse.

Of course, yes, do try; many will be interested in your results.

d3m0n1q_733rz

sr. member

Activity: 378

Merit: 250

Is there a way to disassemble the compiled version into a readable format so that I can search it for things to optimize? I've learned never to leave to a compiler what you can do yourself. Sometimes compilers will take you at your word.

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: kano on December 08, 2011, 06:27:41 AM

Quote from: d3m0n1q_733rz on December 08, 2011, 03:50:39 AM

Small change I could suggest just looking at some of the code. I notice that some variables use simple addition and subtraction a few times. For example:

Code:

// intermediate W calculations
#define P1(n) (rot(W[(n) - 2 - O], 15U) ^ rot(W[(n) - 2 - O], 13U) ^ (W[(n) - 2 - O] >> 10U))
#define P2(n) (rot(W[(n) - 15 - O], 25U) ^ rot(W[(n) - 15 - O], 14U) ^ (W[(n) - 15 - O] >> 3U))
#define P3(n) W[n - 7 - O]
#define P4(n) W[n - 16 - O]

// full W calculation
#define W(n) (W[n - O] = P4(n) + P3(n) + P2(n) + P1(n))

You'll notice that n - O comes up about 3 times in a row. Wouldn't it be better to combine n - O into its own variable and subtract from that, to reduce the number of variables that have to be read? After all, n - 7 - O and n - 16 - O share the same n - O term. If you combine n - O, it's just a simple "pull from buffer and subtract this number" problem.
I haven't tested it, but I imagine it could speed things along slightly.

Just looking at that from a standard compiler point of view
(I have no idea how good or bad or literal the OpenCL compiler is)
The compiler would probably notice that anyway if the OpenCL compiler was able to do basic optimisations.

Actually, Diapolo, do you know if it has an optimiser in it? (and how good it is?)
i.e. would that change suggested make a difference, or would the compiler work it out itself?
... just curious

It's not a beneficial change, because the compiler optimizes this out anyway, and the current form keeps the code a bit more readable.
I'm pretty sure the easy optimizations are all done, but if you guys prove me wrong, that would be nice ;)

Dia

deepceleron

legendary

Activity: 1512

Merit: 1036

Yes, in fact there are some tweaks done in the code now to make the OpenCL compiler produce more optimized code than it normally does.

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Quote from: d3m0n1q_733rz on December 08, 2011, 03:50:39 AM

Small change I could suggest just looking at some of the code. I notice that some variables use simple addition and subtraction a few times. For example:

Code:

// intermediate W calculations
#define P1(n) (rot(W[(n) - 2 - O], 15U) ^ rot(W[(n) - 2 - O], 13U) ^ (W[(n) - 2 - O] >> 10U))
#define P2(n) (rot(W[(n) - 15 - O], 25U) ^ rot(W[(n) - 15 - O], 14U) ^ (W[(n) - 15 - O] >> 3U))
#define P3(n) W[n - 7 - O]
#define P4(n) W[n - 16 - O]

// full W calculation
#define W(n) (W[n - O] = P4(n) + P3(n) + P2(n) + P1(n))

You'll notice that n - O comes up about 3 times in a row. Wouldn't it be better to combine n - O into its own variable and subtract from that, to reduce the number of variables that have to be read? After all, n - 7 - O and n - 16 - O share the same n - O term. If you combine n - O, it's just a simple "pull from buffer and subtract this number" problem.
I haven't tested it, but I imagine it could speed things along slightly.

Just looking at that from a standard compiler point of view
(I have no idea how good or bad or literal the OpenCL compiler is)
The compiler would probably notice that anyway if the OpenCL compiler was able to do basic optimisations.

Actually, Diapolo, do you know if it has an optimiser in it? (and how good it is?)
i.e. would that change suggested make a difference, or would the compiler work it out itself?
... just curious

d3m0n1q_733rz

sr. member

Activity: 378

Merit: 250

Small change I could suggest just looking at some of the code. I notice that some variables use simple addition and subtraction a few times. For example:

Code:

// intermediate W calculations
#define P1(n) (rot(W[(n) - 2 - O], 15U) ^ rot(W[(n) - 2 - O], 13U) ^ (W[(n) - 2 - O] >> 10U))
#define P2(n) (rot(W[(n) - 15 - O], 25U) ^ rot(W[(n) - 15 - O], 14U) ^ (W[(n) - 15 - O] >> 3U))
#define P3(n) W[n - 7 - O]
#define P4(n) W[n - 16 - O]

// full W calculation
#define W(n) (W[n - O] = P4(n) + P3(n) + P2(n) + P1(n))

You'll notice that n - O comes up about 3 times in a row. Wouldn't it be better to combine n - O into its own variable and subtract from that, to reduce the number of variables that have to be read? After all, n - 7 - O and n - 16 - O share the same n - O term. If you combine n - O, it's just a simple "pull from buffer and subtract this number" problem.
I haven't tested it, but I imagine it could speed things along slightly.
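To make the suggestion concrete, here is a minimal sketch in plain C (not the OpenCL kernel itself) that writes the message-schedule macros both ways and checks that they produce the same words. Several things are assumptions for the example: the offset `O = 3`, the helper `rot` as a 32-bit rotate-left, the renamed macros `W_ORIG`, `B`, `Q1` to `Q4` and `W_COMB`, and the loop bounds. It shows only that the two spellings are equivalent; it measures nothing about speed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define O 3  /* hypothetical offset; logical word n lives at W[n - O] */

/* 32-bit rotate left; only called with 0 < n < 32 here. */
static uint32_t rot(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32u - n));
}

/* Macros as posted in the thread (full macro renamed to W_ORIG). */
#define P1(n) (rot(W[(n) - 2 - O], 15U) ^ rot(W[(n) - 2 - O], 13U) ^ (W[(n) - 2 - O] >> 10U))
#define P2(n) (rot(W[(n) - 15 - O], 25U) ^ rot(W[(n) - 15 - O], 14U) ^ (W[(n) - 15 - O] >> 3U))
#define P3(n) W[(n) - 7 - O]
#define P4(n) W[(n) - 16 - O]
#define W_ORIG(n) (W[(n) - O] = P4(n) + P3(n) + P2(n) + P1(n))

/* Suggested variant: name the shared "n - O" term once. */
#define B(n)  ((n) - O)
#define Q1(n) (rot(W[B(n) - 2], 15U) ^ rot(W[B(n) - 2], 13U) ^ (W[B(n) - 2] >> 10U))
#define Q2(n) (rot(W[B(n) - 15], 25U) ^ rot(W[B(n) - 15], 14U) ^ (W[B(n) - 15] >> 3U))
#define Q3(n) W[B(n) - 7]
#define Q4(n) W[B(n) - 16]
#define W_COMB(n) (W[B(n)] = Q4(n) + Q3(n) + Q2(n) + Q1(n))

/* Fill W[16..63] from W[0..15] using the posted macros. */
void expand_original(uint32_t *W) {
    for (int n = 16 + O; n < 64 + O; ++n) W_ORIG(n);
}

/* Same expansion using the combined-base macros. */
void expand_combined(uint32_t *W) {
    for (int n = 16 + O; n < 64 + O; ++n) W_COMB(n);
}

/* Returns 1 when both spellings produce the same 64-word schedule. */
int schedules_match(const uint32_t seed[16]) {
    uint32_t a[64], b[64];
    memcpy(a, seed, 16 * sizeof a[0]);
    memcpy(b, seed, 16 * sizeof b[0]);
    expand_original(a);
    expand_combined(b);
    return memcmp(a, b, sizeof a) == 0;
}

/* Sanity check against the padded one-block SHA-256 message "abc". */
void check_abc_block(void) {
    uint32_t W[64] = { 0x61626380u };  /* "abc" plus the 0x80 pad byte */
    W[15] = 0x18u;                     /* message length: 24 bits */
    expand_original(W);
    assert(W[16] == 0x61626380u);
    assert(W[17] == 0x000f0000u);
}
```

Calling `schedules_match` with any 16 seed words should return 1, since the two macro sets differ only in how the index expression is grouped. In the kernel the macros are presumably used with literal `n`, so the index is a compile-time constant either way; comparing the compiled output of both forms would be the way to confirm that nothing changes.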

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: dishwara on November 04, 2011, 10:11:47 AM

Sorry for not donating.
I tried & used your software when I was mining.
2 btc sent to the address in signature. 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM
Something is better than nothing.
Thank you for your valuable kernel.

I received your donation, a warm thank you.

Dia

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: gat3way on November 05, 2011, 03:27:23 PM

Quote from: Diapolo on November 04, 2011, 09:12:26 AM

I did; I liked getting (positive) feedback and being an interesting part of Bitcoin for a short period of time.
Dia

Still coding OpenCL stuff, Diapolo?

I have had no time to make any further progress. From time to time I visit AMD's OpenCL forum to stay a little up to date, but I'm currently not coding. The last thing I tried was to implement 3-component vectors in the kernel, but AMD's drivers still seem buggy there.

Dia

gat3way

sr. member

Activity: 256

Merit: 250

Quote from: Diapolo on November 04, 2011, 09:12:26 AM

I did; I liked getting (positive) feedback and being an interesting part of Bitcoin for a short period of time.
Dia

Still coding OpenCL stuff, Diapolo?

dishwara

legendary

Activity: 1855

Merit: 1016

Sorry for not donating.
I tried & used your software when I was mining.
2 btc sent to the address in signature. 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM
Something is better than nothing.
Thank you for your valuable kernel.

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: teukon on November 04, 2011, 07:25:35 AM

Quote from: Diapolo on November 04, 2011, 07:00:28 AM

Quote from: Dexter770221 on November 04, 2011, 03:30:54 AM

The Bitcoin price doesn't encourage work, and the kernels are almost perfectly efficient, so many threads are dead...

It's not only the BTC price; the thankfulness in terms of donations was not really motivating anymore, and most people compared my mod with Phat's 2.X kernel only in terms of raw performance, which made me think no one cares about a different approach or even about testing whether there are differences in power draw or other factors. Well, now I'm too far away anyway ^^.

Dia

The donation problems throughout Bitcoin are rather strange. Despite having a very easy way to send wealth across the internet even the most well known and respected developers receive very little as thanks for their efforts. Perhaps this is partly due to the fact that Bitcoin (particularly Bitcoin mining) attracts people who are, on average, not so inclined to donations, and that most of the miners that care about the extra 0.5% of income from mining are naturally very tight with their money.

Ah well. Thanks once again for all of your work. Following this thread and trying out the patches as they came out was fun. I hope you had fun too!

I did; I liked getting (positive) feedback and being an interesting part of Bitcoin for a short period of time.

Dia

teukon

legendary

Activity: 1246

Merit: 1011

Quote from: Diapolo on November 04, 2011, 07:00:28 AM

Quote from: Dexter770221 on November 04, 2011, 03:30:54 AM

The Bitcoin price doesn't encourage work, and the kernels are almost perfectly efficient, so many threads are dead...

It's not only the BTC price; the thankfulness in terms of donations was not really motivating anymore, and most people compared my mod with Phat's 2.X kernel only in terms of raw performance, which made me think no one cares about a different approach or even about testing whether there are differences in power draw or other factors. Well, now I'm too far away anyway ^^.

Dia

The donation problems throughout Bitcoin are rather strange. Despite having a very easy way to send wealth across the internet even the most well known and respected developers receive very little as thanks for their efforts. Perhaps this is partly due to the fact that Bitcoin (particularly Bitcoin mining) attracts people who are, on average, not so inclined to donations, and that most of the miners that care about the extra 0.5% of income from mining are naturally very tight with their money.

Ah well. Thanks once again for all of your work. Following this thread and trying out the patches as they came out was fun. I hope you had fun too!

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: Dexter770221 on November 04, 2011, 03:30:54 AM

The Bitcoin price doesn't encourage work, and the kernels are almost perfectly efficient, so many threads are dead...

It's not only the BTC price; the thankfulness in terms of donations was not really motivating anymore, and most people compared my mod with Phat's 2.X kernel only in terms of raw performance, which made me think no one cares about a different approach or even about testing whether there are differences in power draw or other factors. Well, now I'm too far away anyway ^^.

Dia

Dexter770221

legendary

Activity: 1029

Merit: 1000

The Bitcoin price doesn't encourage work, and the kernels are almost perfectly efficient, so many threads are dead...

ssateneth

legendary

Activity: 1344

Merit: 1004

Dead thread is dead. No more kernel updates?

ssateneth

legendary

Activity: 1344

Merit: 1004

Quote from: Diapolo on September 06, 2011, 12:03:07 AM

Quote from: Remember remember the 5th of November on September 05, 2011, 08:03:07 PM

Diapolo, I tried the latest kernel, and you say that if BFI_INT is supported it will be enabled; well, on my 5870 it wasn't...
So I went from 385 to 334 MH/s.

That's strange. Could you check whether your OpenCL driver reports cl_amd_media_ops as available (via GPU Caps Viewer)?

Thanks,
Dia

Did he even bother to post which flags he was using? I know there was some confusion between VECTORS and VECTORS2 between your kernel and Phateus' kernel. It sounds like he's only doing 1 nonce per execution instead of 2.

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: Remember remember the 5th of November on September 05, 2011, 08:03:07 PM

Diapolo, I tried the latest kernel, and you say that if BFI_INT is supported it will be enabled; well, on my 5870 it wasn't...
So I went from 385 to 334 MH/s.

That's strange. Could you check whether your OpenCL driver reports cl_amd_media_ops as available (via GPU Caps Viewer)?

Thanks,
Dia

Remember remember the 5th of November

legendary

Activity: 1862

Merit: 1014

Reverse engineer from time to time

Diapolo, I tried the latest kernel, and you say that if BFI_INT is supported it will be enabled; well, on my 5870 it wasn't...
So I went from 385 to 334 MH/s.

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: mute20 on September 02, 2011, 11:00:21 AM

I have finally found the 3% boost that I was talking about. Would this work with your improvement?

https://bitcointalksearch.org/topic/3-faster-mining-with-phoenixphatk-diablo-or-poclbm-for-everyone-23067

This has been in since I saw that thread, I'm sorry to say ;)

Dia

Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 5. (Read 107059 times)