Topic: Modified Kernel for Phoenix 1.5 - page 12. (Read 96725 times)

sr. member
Activity: 254
Merit: 250
July 28, 2011, 08:19:37 PM
#84
Great to see the continuous improvement
newbie
Activity: 52
Merit: 0
July 28, 2011, 04:16:19 PM
#83
Yours gives a lower hashrate than Diapolo's
http://forum.bitcoin.org/index.php?topic=25860.0

Using Diapolo's 2011-7-17 version I get 434 & 427 on HD 5870,
while yours gave 424 & 417, exactly 10 Mhash/s less.

Ah.. there is a lot I've missed since I've been gone...

I will combine my improvements and his to see if I can get it lower.  Thanks for the info.

-Phateus
legendary
Activity: 1855
Merit: 1016
July 28, 2011, 03:23:40 AM
#82
Yours gives a lower hashrate than Diapolo's
http://forum.bitcoin.org/index.php?topic=25860.0

Using Diapolo's 2011-7-17 version I get 434 & 427 on HD 5870,
while yours gave 424 & 417, exactly 10 Mhash/s less.
newbie
Activity: 52
Merit: 0
July 27, 2011, 10:53:33 PM
#81
Sorry I haven't really been on the forums much lately... wedding planning stuff :D

But...

Any chance of getting a kernel optimized for the 6xxx series?

The kernel is optimized for the 5xxx, 66xx, 67xx and 68xx series, since they all use the same architecture. Only the 69xx cards use a different architecture, which is less efficient for mining (VLIW4 instead of VLIW5, for those who are interested). I have debated whether to rewrite the kernel for the 69xx series, but it would increase performance by ~1% at most.

Phateus, would you consider replacing the Ma() macro as suggested by bitless and re-running the ATI optimization to check whether it can be improved further? Bitless saved 1 operation in each Ma() call. Maybe, with some reordering, this can be optimized further.

In the current version, in addition to numerous very tiny optimizations, I have reordered the Ma() operands, which reduces the number of instructions in operations with at least one non-vector operand.
Code:
#define Ma(z, x, y) amd_bytealign((y), (x | z), (z & x))
I think this is what you are talking about...
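A quick way to convince yourself that the reordered macro still computes the SHA-256 majority function is to emulate it on the CPU. This is a sketch under one assumption: that with the BFI_INT option, every amd_bytealign() in the compiled kernel ends up as the BFI_INT bitwise select. The `bfi()` helper is a hypothetical stand-in for that instruction, not a real OpenCL function, and the check covers correctness only, not the instruction count.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: Phoenix's BFI_INT option patches each amd_bytealign() in the
 * compiled kernel into the BFI_INT instruction, a bitwise select: bits of b
 * where a is 1, bits of c where a is 0. bfi() emulates that select on the
 * CPU; it is not what amd_bytealign() computes natively. */
static uint32_t bfi(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (~a & c);
}

/* The macro from the post, with amd_bytealign swapped for the emulation. */
#define Ma(z, x, y) bfi((y), (x | z), (z & x))

/* Textbook SHA-256 majority: a bit is set when at least two inputs have it. */
static uint32_t maj(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (x & z) ^ (y & z);
}

/* Both functions work bit by bit (bit i of the result depends only on bit i
 * of the inputs), so trying all 8 combinations of all-zero / all-one words
 * covers every possible input. */
static void check_ma_is_majority(void)
{
    const uint32_t w[2] = { 0x00000000u, 0xFFFFFFFFu };
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b)
            for (int c = 0; c < 2; ++c)
                assert(Ma(w[a], w[b], w[c]) == maj(w[a], w[b], w[c]));
}
```

Per bit, the select reads: if y is 1 the result is x | z, otherwise x & z, which is exactly "at least two of three". Call check_ma_is_majority() from a main() to run the check.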

Anywho... here is my new version which is a very slight improvement over 1.0 (about 1% faster for me).

One thing to note is that you MUST put in a valid WORKSIZE value when running version 1.1 due to one of the optimizations.
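Since version 1.1 depends on WORKSIZE being set, a launcher script could sanity-check the kernel option string before starting the miner. The sketch below is hypothetical: the helper names are invented, and reading "valid" as "a power of two no larger than the device's maximum work-group size" is an assumption, not something the post specifies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Phoenix or phatk): return the value of
 * WORKSIZE=<n> from a space-separated kernel option string such as
 * "VECTORS BFI_INT AGGRESSION=12 WORKSIZE=256", or 0 if it is missing. */
static long find_worksize(const char *opts)
{
    static const char key[] = "WORKSIZE=";
    const char *p = opts;
    assert(opts != NULL);
    while ((p = strstr(p, key)) != NULL) {
        /* Only accept the key at the start of a token. */
        if (p == opts || p[-1] == ' ')
            return strtol(p + strlen(key), NULL, 10);
        p += strlen(key);
    }
    return 0;
}

/* Assumed rule for "valid": a power of two no larger than the device's
 * maximum work-group size (256 on the HD 5xxx cards in this thread). */
static int worksize_is_valid(long ws, long device_max)
{
    return ws > 0 && ws <= device_max && (ws & (ws - 1)) == 0;
}
```

A missing option returns 0, which worksize_is_valid() rejects, so one check catches both "not set" and "out of range".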

https://sourceforge.net/projects/phatk/files/phatk-1.1.zip/download

Post any questions or bugs you have. Thanks!

-Phateus
hero member
Activity: 531
Merit: 505
June 27, 2011, 04:08:35 PM
#80
Phateus, would you consider replacing the Ma() macro as suggested by bitless and re-running the ATI optimization to check whether it can be improved further? Bitless saved 1 operation in each Ma() call. Maybe, with some reordering, this can be optimized further.
newbie
Activity: 2
Merit: 0
June 11, 2011, 01:41:34 PM
#79
Can you post some numbers? I'm contemplating switching from Windows to Linux just because of this and I'm not sure if it's worthwhile. Currently getting 300 MH/s with a 5830 at 970/300.

These are already great numbers; I don't think they'll change much between Linux and Windows. I also do not believe mining gets faster just because the CPU can process 64 bits in a single cycle; that is not GPU-related.
hero member
Activity: 504
Merit: 500
June 11, 2011, 11:11:07 AM
#78
Hashkill is faster for me on Linux 64-bit.


Can you post some numbers? I'm contemplating switching from Windows to Linux just because of this and I'm not sure if it's worthwhile. Currently getting 300 MH/s with a 5830 at 970/300.
legendary
Activity: 1148
Merit: 1001
June 11, 2011, 10:13:45 AM
#77
My experience with my 5870 shows that worksize=128 works best. With worksize=256 I see a slightly higher hashrate, but the number of submitted shares goes down a bit overall.

How is this possible?
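One likely factor, assuming the pool hands out difficulty-1 shares (2^32 hashes per share on average): shares are found at random, so over a few hours the count is far too noisy to reveal a hashrate difference of a percent or two. A rough sketch of that arithmetic, with hypothetical function names:

```c
#include <assert.h>

/* Assumption: difficulty-1 shares, which take 2^32 hashes on average. */
#define HASHES_PER_SHARE 4294967296.0

/* Expected number of shares found in `seconds` at `mhash_per_s` MH/s. */
static double expected_shares(double mhash_per_s, double seconds)
{
    assert(mhash_per_s >= 0.0 && seconds >= 0.0);
    return mhash_per_s * 1e6 * seconds / HASHES_PER_SHARE;
}

/* Share counts are modelled as independent Poisson variables, so the
 * variance of (count_a - count_b) is mean_a + mean_b. Returns 1 when the
 * expected gap is within k standard deviations of that difference, i.e.
 * too small to tell apart reliably. Squares are compared to avoid sqrt(). */
static int gap_within_noise(double mean_a, double mean_b, double k)
{
    double gap = mean_a - mean_b;
    return gap * gap <= k * k * (mean_a + mean_b);
}
```

For example, 400 MH/s for one hour gives expected_shares(400, 3600) of about 335 shares, and a card 1% faster expects only about 3 more, so gap_within_noise() returns 1 for k = 2.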
newbie
Activity: 2
Merit: 0
June 11, 2011, 09:52:33 AM
#76
HD5830 (Sapphire, stock volts) with SDK 2.4
VECTORS BFI_INT AGGRESSION=12 DEVICE=0 FASTLOOP=false WORKSIZE=256

1000/300: 298MH/s, 66°C
1000/300: 310MH/s, 66°C (phatk)

Thanks a lot man!
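To put the two runs above on a common scale, the gain is simply the relative change between them. A trivial sketch; the function name is made up for illustration:

```c
#include <assert.h>

/* Relative change in percent going from `before` to `after` MH/s. */
static double pct_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

pct_change(298, 310) is about +4.0% for this HD5830; the same formula applied to post #82's figures, pct_change(434, 424), gives about -2.3%.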
newbie
Activity: 12
Merit: 0
June 07, 2011, 02:10:01 PM
#75
Any chance of getting a kernel optimized for the 6xxx series?

+1 as well!

I'm now using phatk (with Phoenix) on my two 6870s and it is working like a charm. But the thought that they might do better with an optimized kernel is killing me ;-)
Are there indications a better/optimized kernel for the 6xxx series can be created?
legendary
Activity: 1855
Merit: 1016
June 04, 2011, 12:28:05 PM
#74
Waiting for a Windows version so I can get more hashes too.
legendary
Activity: 3080
Merit: 1080
June 04, 2011, 04:00:46 AM
#73
Hashkill is faster for me on Linux 64-bit.


Hmm, wish they'd release a windblowz binary soon :(
sr. member
Activity: 392
Merit: 250
June 03, 2011, 12:07:55 PM
#72
Hashkill is faster for me on Linux 64-bit.
hero member
Activity: 575
Merit: 500
May 25, 2011, 11:39:25 PM
#71
My experience with my 5870 shows that worksize=128 works best. With worksize=256 I see a slightly higher hashrate, but the number of submitted shares goes down a bit overall.
legendary
Activity: 800
Merit: 1001
May 25, 2011, 04:19:28 PM
#70
Any chance of getting a kernel optimized for the 6xxx series?

+1 !!
legendary
Activity: 3878
Merit: 1193
May 24, 2011, 12:23:40 PM
#69
Any chance of getting a kernel optimized for the 6xxx series?
newbie
Activity: 15
Merit: 0
May 21, 2011, 02:10:55 PM
#68
Works very well. I am getting over 440 Mhash/s on HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT and about 416 Mhash/s on poclbm. However my other HD 5870 running at 950/375 with same switches only hashes about 410 MHash/s with phatk while poclbm gives about 400MHash/s.


Just out of curiosity, how do you tell what worksize you need for a specific card?

There is no general rule. It mostly depends on the architecture and memory technology used. In heavy scientific computations, the best worksize is usually the one the card can process natively, but in mining, where a single loop iteration is very simple and fast, the optimal worksize can vary. In mining, lowering memory clocks saves power and may therefore leave room for extra overclocking on the core, which speeds up computation. If you lower your memory clocks too much, it can reduce your processing power, but this kind of loss can be compensated for by lowering the worksize.

So without a solid background in high-speed computing architectures, the fastest way to find out is to try all possible combinations.
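The "try everything" approach is easy to organize as a grid. A minimal sketch, assuming you run the miner once per pair (by hand or from a script) and record the average rate it reports; `struct trial`, `make_grid()` and `pick_best()` are hypothetical names, and memory and core clocks would be extra dimensions added the same way.

```c
#include <assert.h>
#include <stddef.h>

/* One benchmark run: the settings tried and the average MH/s observed. */
struct trial {
    int worksize;
    int aggression;
    double mhash;
};

/* Write every (WORKSIZE, AGGRESSION) pair into out[], row by row, with
 * mhash zeroed for the caller to fill in after running each one.
 * Returns the number of pairs written (at most cap). */
static size_t make_grid(const int *ws, size_t n_ws,
                        const int *ag, size_t n_ag,
                        struct trial *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < n_ws; ++i)
        for (size_t j = 0; j < n_ag && n < cap; ++j) {
            out[n].worksize = ws[i];
            out[n].aggression = ag[j];
            out[n].mhash = 0.0;
            ++n;
        }
    return n;
}

/* Return the fastest trial (the first one on ties). */
static const struct trial *pick_best(const struct trial *t, size_t n)
{
    const struct trial *best = t;
    assert(t != NULL && n > 0);
    for (size_t i = 1; i < n; ++i)
        if (t[i].mhash > best->mhash)
            best = &t[i];
    return best;
}
```

Let each run go long enough for the reported rate to settle before recording it; short runs make neighbouring settings look more different than they are.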
member
Activity: 67
Merit: 10
May 21, 2011, 01:17:20 PM
#67
Very nice job!
Finally got to break the 400 Mhash/s barrier on my HD5850, an increase of about 8-10 Mhash/s over the POCLBM kernel. ;D
full member
Activity: 154
Merit: 100
May 21, 2011, 12:47:35 PM
#66
Works very well. I am getting over 440 Mhash/s on HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT and about 416 Mhash/s on poclbm. However my other HD 5870 running at 950/375 with same switches only hashes about 410 MHash/s with phatk while poclbm gives about 400MHash/s.


This worked well for me! Thanks for the tip, man! +1: I have three 5850s and these settings added about 20 Mhash/s, plus +10 Mhash/s on my 6850! Cool!
newbie
Activity: 34
Merit: 0
May 21, 2011, 10:23:52 AM
#65
Works very well. I am getting over 440 Mhash/s on HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT and about 416 Mhash/s on poclbm. However my other HD 5870 running at 950/375 with same switches only hashes about 410 MHash/s with phatk while poclbm gives about 400MHash/s.


Just out of curiosity, how do you tell what worksize you need for a specific card?