a flurry of kernel improvements + constant folding

bitless

Well, I hope people will still continue looking into kernel improvements, but start clearly indicating how they've tested their changes and, most importantly, why they think their changes make a difference. Otherwise it is just guessing...

Also, rethaw, thanks for reposting my change to the proper board back when I was a newbie and couldn't post there! It made it into git and into other kernels quite quickly thanks to you!

rethaw

Thanks for this post. I have steered clear of every other kernel change from the last week except yours.

bitless

Hi all,

First of all, I'm very glad that there are now lots of people out there trying to improve the kernels. Thanks for doing this!

Second, if I may ask this of you: could you please be very, very careful when introducing these modifications? By careful, I mean test your changes for at least a day or so and see what happens. Also, looking at the disassembly in Kernel Analyzer never hurts.

Third, most of the changes I saw simply remove some additions. Normally, any modern compiler should eliminate useless additions on its own (adding two constants, for example), so I saw no reason why these changes would actually improve speed. However, some of them did help, which warranted at least some investigation into why. So...
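
To illustrate the kind of folding described above, here is a small C sketch (plain C standing in for OpenCL C; the names CONST_A, CONST_B, CONST_C and add_offsets are hypothetical). An expression built only from constants is an integer constant expression, so it is evaluated at compile time, which the _Static_assert makes visible. The runtime function shows two constant adds that an optimizing compiler typically merges into one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants standing in for kernel inputs known at compile time. */
#define CONST_A 0xF0u
#define CONST_B 0x3Cu
#define CONST_C 5u

/* Only constants: evaluated entirely at compile time.
   This line would not compile if the expression were not a constant. */
_Static_assert(((CONST_A & CONST_B) + CONST_C) == 0x35u,
               "(A & B) + C folds to 0x35");

/* x is a runtime value, but the two constant adds can still be merged:
   optimizing compilers typically emit a single "x + 8" here. */
static inline uint32_t add_offsets(uint32_t x)
{
    return x + 3u + 5u;
}

/* Small self-check of the runtime behaviour. */
static void check_add_offsets(void)
{
    assert(add_offsets(0u) == 8u);
    assert(add_offsets(0xFFFFFFFFu) == 7u); /* unsigned wrap-around mod 2^32 */
}
```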

Fourth, here's why I think they help. Most compilers will rip out an expression like (a&b)+c and replace it with a single constant, as long as a, b and c are constants. However, they won't do this if the expression uses an intrinsic. For instance, if you write a rotate as x<<(32-n) | x>>n, then for constant x and n the whole expression gets replaced with a single constant. If you use amd_bitalign for it instead, it will not be folded into a constant (especially for Ch/Maj, which get patched after compilation; how could the compiler fold them?). On the other hand, if you write a rotate with bit shifts when x or n isn't constant, you'll slow things down, because now the compiler can't fold it and you've replaced a single bitalign() with several ops. So, long story short, fewer intrinsics and more constant expressions is probably the reason your changes help.
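
A C sketch of the two rotate forms discussed above. The real amd_bitalign only exists in AMD's OpenCL compiler, so bitalign_emu below is a hypothetical host-side emulation of its documented behaviour (the low 32 bits of the 64-bit value hi:lo shifted right by shift & 31); rotr(x, n) is then bitalign(x, x, n). The check confirms that the shift/OR rotate and the bitalign form agree for 1 <= n <= 31, and the _Static_assert shows that the shift/OR form with constant operands is a compile-time constant. Note that a C optimizer would likely fold this inline emulation as well; the point in the post is that the OpenCL compiler did not see through the real intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n using only shifts and OR (the form from the post).
   Only valid for 1 <= n <= 31: shifting a 32-bit value by 32 is undefined
   in C, so n == 0 would need separate handling. */
#define ROTR_SHIFT(x, n) \
    ((uint32_t)((uint32_t)(x) << (32u - (n))) | (uint32_t)((uint32_t)(x) >> (n)))

/* Hypothetical host-side emulation of amd_bitalign(hi, lo, shift):
   low 32 bits of the 64-bit value hi:lo shifted right by (shift & 31). */
static inline uint32_t bitalign_emu(uint32_t hi, uint32_t lo, uint32_t shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> (shift & 31u));
}

/* With constant x and n, the shift form is an integer constant expression. */
_Static_assert(ROTR_SHIFT(0x80000000u, 7u) == 0x01000000u,
               "constant rotate folds at compile time");

/* Both forms compute the same rotate for every valid n. */
static void check_rotr_equivalence(uint32_t x)
{
    for (uint32_t n = 1u; n < 32u; ++n)
        assert(ROTR_SHIFT(x, n) == bitalign_emu(x, x, n));
}
```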

PcChip and I tested this theory: for constant inputs, we replaced the rotates that use intrinsics with rotates built from shifts and ORs (i.e. something the compiler can easily see through). We got a 1-2% improvement, and PcChip posted the kernel here: http://pastebin.com/NPDTfAVd. We've only done this for the rotates, though. If somebody could go through the kernel carefully and find other constants and constant expressions that could be folded away, it would probably help even more.
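
One way to apply this idea, sketched in C under the same assumptions (ROTR_SHIFT, SIGMA0_CONST, SIGMA1_CONST, bitalign_emu, sigma0_var and sigma1_var are hypothetical names, not taken from PcChip's kernel): write the SHA-256 small sigma functions with shift/OR rotates when the argument is a known constant, so each call collapses to a single value, and keep the bitalign-style rotate for variable inputs. The two constant arguments below, a padding word 0x80000000 and a length word 0x100, are only illustrative inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Shift/OR rotate right, valid for 1 <= n <= 31; foldable when x and n are constants. */
#define ROTR_SHIFT(x, n) \
    ((uint32_t)((uint32_t)(x) << (32u - (n))) | (uint32_t)((uint32_t)(x) >> (n)))

/* SHA-256 small sigma functions built only from shifts, XOR and OR,
   so a constant argument yields a compile-time constant. */
#define SIGMA0_CONST(x) (ROTR_SHIFT((x), 7u) ^ ROTR_SHIFT((x), 18u) ^ ((uint32_t)(x) >> 3))
#define SIGMA1_CONST(x) (ROTR_SHIFT((x), 17u) ^ ROTR_SHIFT((x), 19u) ^ ((uint32_t)(x) >> 10))

/* Illustrative constant inputs: both sigmas fold to single constants. */
_Static_assert(SIGMA0_CONST(0x80000000u) == 0x11002000u, "sigma0 of a constant folds");
_Static_assert(SIGMA1_CONST(0x00000100u) == 0x00A00000u, "sigma1 of a constant folds");

/* Hypothetical emulation of amd_bitalign, used for variable inputs, where
   (per the post) a single-instruction rotate beats a shift/shift/OR sequence. */
static inline uint32_t bitalign_emu(uint32_t hi, uint32_t lo, uint32_t shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> (shift & 31u));
}

/* Variable-input versions: rotr(x, n) == bitalign(x, x, n). */
static inline uint32_t sigma0_var(uint32_t x)
{
    return bitalign_emu(x, x, 7u) ^ bitalign_emu(x, x, 18u) ^ (x >> 3);
}

static inline uint32_t sigma1_var(uint32_t x)
{
    return bitalign_emu(x, x, 17u) ^ bitalign_emu(x, x, 19u) ^ (x >> 10);
}
```

Both versions compute the same function; the only difference is which one the compiler can reduce to a constant, so the choice is made per call site depending on whether the argument is known at compile time.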

Fifth, feel free to donate coins, but also consider donating to PcChip and to the original authors of these kernels. These people put a lot of work into the miners everybody is using, and they deserve donations more than some dude who happened to notice a 3% improvement to Ma() ;)

(Also, I'm not trying to be critical or overly judgmental, just sharing some things I happen to know. It's all my personal opinion and I may be wrong, so feel free to criticize and/or disagree.)

Topic: a flurry of kernel improvements + constant folding (Read 811 times)