Modified Kernel for Phoenix 1.5 - page 12.

pennytrader

sr. member

Activity: 254

Merit: 250

Great to see the continuous improvement

Phateus

newbie

Activity: 52

Merit: 0

Quote from: dishwara on July 28, 2011, 03:23:40 AM

Yours gives less hash than Diapolo's
http://forum.bitcoin.org/index.php?topic=25860.0

Using Diapolo's 2011-7-17 I get 434 & 427 on my 5870s,
while yours gave 424 & 417, exactly 10 Mhash/s less.

Ah.. there is a lot I've missed since I've been gone...

I will combine my improvements and his to see if I can get it lower. Thanks for the info.

-Phateus

dishwara

legendary

Activity: 1855

Merit: 1016

Yours gives less hash than Diapolo's
http://forum.bitcoin.org/index.php?topic=25860.0

Using Diapolo's 2011-7-17 I get 434 & 427 on my 5870s,
while yours gave 424 & 417, exactly 10 Mhash/s less.

Phateus

newbie

Activity: 52

Merit: 0

Sorry I haven't really been on the forums much lately... wedding planning stuff.

But...

Quote from: Syke on May 24, 2011, 12:23:40 PM

Any chance of getting a kernel optimized for the 6xxx series?

The kernel is optimized for the 5xxx series and for the 66xx, 67xx and 68xx series, since they all use the same architecture. Only the 69xx cards use a different architecture, which is less efficient for mining (VLIW4 instead of VLIW5, for those who are interested). I have debated whether to rewrite the kernel for the 69xx series, but it would increase performance by at most ~1%.

Quote from: Hawkix on June 27, 2011, 04:08:35 PM

Phateus, would you consider replacing the Ma() macro as suggested by bitless and re-running the ATI optimization to check if it can be further improved? Bitless saved 1 operation from each Ma() call. Maybe, with some re-ordering, this can be optimized further.

In the current version, in addition to numerous very tiny optimizations, I have reordered the Ma() operands, which reduces the number of instructions on operations with at least one non-vector operand.

Code:

#define Ma(z, x, y) amd_bytealign((y), (x | z), (z & x))

I think this is what you are talking about...
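
For anyone who wants to convince themselves that the reordered macro is still the SHA-256 majority function, here is a minimal sketch in plain C. It assumes that the BFI_INT patch makes amd_bytealign(a, b, c) behave as a bit-select, (a & b) | (~a & c). The helper names bfi, maj_ref and ma_mismatches are made up for this illustration and are not part of phatk.

```c
#include <stdint.h>

/* Software model of BFI_INT(mask, a, b): bits of a where mask is 1,
   bits of b elsewhere. Assumption: this is what the patched
   amd_bytealign computes on the GPU. */
static uint32_t bfi(uint32_t mask, uint32_t a, uint32_t b) {
    return (mask & a) | (~mask & b);
}

/* Textbook SHA-256 majority function, used as the reference. */
static uint32_t maj_ref(uint32_t x, uint32_t y, uint32_t z) {
    return (x & y) | (x & z) | (y & z);
}

/* The macro from the post, with amd_bytealign swapped for the model. */
#define Ma(z, x, y) bfi((y), (x | z), (z & x))

/* Majority is purely bitwise, so the 8 combinations of all-zeros and
   all-ones words cover every possible input bit pattern.
   Returns the number of combinations where Ma() and maj_ref() differ. */
static int ma_mismatches(void) {
    int bad = 0;
    for (unsigned i = 0; i < 8; i++) {
        uint32_t x = (i & 1u) ? 0xFFFFFFFFu : 0u;
        uint32_t y = (i & 2u) ? 0xFFFFFFFFu : 0u;
        uint32_t z = (i & 4u) ? 0xFFFFFFFFu : 0u;
        if (Ma(x, y, z) != maj_ref(x, y, z)) {
            bad++;
        }
    }
    return bad;
}
```

Under that assumption, ma_mismatches() returns 0: the select mask picks (x | z) where y is 1 and (z & x) where y is 0, which is exactly the majority of the three bits.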

Anywho... here is my new version which is a very slight improvement over 1.0 (about 1% faster for me).

One thing to note is that you MUST put in a valid WORKSIZE value when running version 1.1 due to one of the optimizations.

https://sourceforge.net/projects/phatk/files/phatk-1.1.zip/download

Post any questions or bugs you have, thanks

-Phateus

Hawkix

hero member

Activity: 531

Merit: 505

Phateus, would you consider replacing the Ma() macro as suggested by bitless and re-running the ATI optimization to check if it can be further improved? Bitless saved 1 operation from each Ma() call. Maybe, with some re-ordering, this can be optimized further.

mbraun

newbie

Activity: 2

Merit: 0

Quote from: hchc on June 11, 2011, 11:11:07 AM

Can you post some numbers? I'm contemplating switching from Windows to Linux just because of this and am not sure if it's worthwhile. Currently getting 300 MH/s with a 5830 at 970/300.

These are already great numbers; I don't think they'll change much between Linux and Windows. I also do not believe that mining gets faster because the CPU can work on 64 bits in a single cycle. That's not GPU-related.

hchc

hero member

Activity: 504

Merit: 500

Quote from: AngelusWebDesign on June 03, 2011, 12:07:55 PM

Hashkill is faster for me on Linux 64-bit.

Can you post some numbers? I'm contemplating switching from Windows to Linux just because of this and am not sure if it's worthwhile. Currently getting 300 MH/s with a 5830 at 970/300.

hugolp

legendary

Activity: 1148

Merit: 1001

Radix-The Decentralized Finance Protocol

Quote from: tiberiandusk on May 25, 2011, 11:39:25 PM

My experience with my 5870 shows that worksize=128 works the best. With worksize=256 I see a slightly higher hashrate, but overall submitted shares go down a bit.

How is this possible?

mbraun

newbie

Activity: 2

Merit: 0

HD5830 (Sapphire, stock volts) with SDK 2.4
VECTORS BFI_INT AGGRESSION=12 DEVICE=0 FASTLOOP=false WORKSIZE=256

1000/300: 298MH/s, 66°C
1000/300: 310MH/s, 66°C (phatk)

Thanks a lot man!

redcodenl

newbie

Activity: 12

Merit: 0

Quote from: Syke on May 24, 2011, 12:23:40 PM

Any chance of getting a kernel optimized for the 6xxx series?

+1 as well!

I'm now using phatk (with Phoenix) for my two 6870s, and it is working like a charm. But the thought that it might do better with an optimized kernel is killing me ;-)
Are there indications that a better/optimized kernel for the 6xxx series can be created?

dishwara

legendary

Activity: 1855

Merit: 1016

Waiting for the Windows version, so I too can get more hashes.

allinvain

legendary

Activity: 3080

Merit: 1083

Quote from: AngelusWebDesign on June 03, 2011, 12:07:55 PM

Hashkill is faster for me on Linux 64-bit.

Hmm, wish they'd release a windblowz binary soon

AngelusWebDesign

sr. member

Activity: 392

Merit: 250

Hashkill is faster for me on Linux 64-bit.

tiberiandusk

hero member

Activity: 575

Merit: 500

The North Remembers

My experience with my 5870 shows that worksize=128 works the best. With worksize=256 I see a slightly higher hashrate, but overall submitted shares go down a bit.

EPiSKiNG

legendary

Activity: 800

Merit: 1001

Quote from: Syke on May 24, 2011, 12:23:40 PM

Any chance of getting a kernel optimized for the 6xxx series?

+1 !!

Syke

legendary

Activity: 3878

Merit: 1193

Any chance of getting a kernel optimized for the 6xxx series?

William Reed

newbie

Activity: 15

Merit: 0

Quote from: JayC on May 21, 2011, 10:23:52 AM

Quote from: William Reed on May 21, 2011, 10:05:57 AM

Works very well. I am getting over 440 Mhash/s on HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT and about 416 Mhash/s on poclbm. However my other HD 5870 running at 950/375 with same switches only hashes about 410 MHash/s with phatk while poclbm gives about 400MHash/s.

Just out of curiosity, how do you tell what worksize you need for a specific card?

There is no general rule. It mostly depends on the architecture and the memory technology used. In heavy scientific calculations, the best worksize is usually the one the card can process natively, but in mining, where a single loop is very simple and fast, the optimal worksize can vary. Lowering memory clocks saves power and may therefore allow for extra OC on the core, speeding up computation. If you lower your memory clocks too much, it can reduce your processing power, but this kind of loss can be compensated for by lowering the worksize.

So without a solid background in high-speed computation architectures, the fastest way to find out is to try all possible combinations.
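
If you do try the combinations by hand, a small helper can keep the comparison honest. The sketch below is only an illustration in plain C: the struct trial layout and the best_trial function are hypothetical names, and every number has to come from your own runs. It ranks runs by accepted shares first and reported hashrate second, since, as noted elsewhere in this thread, a higher displayed hashrate does not always mean more submitted shares.

```c
#include <stddef.h>

/* One manual test run of the miner. All fields are filled in by hand
   from your own measurements; nothing here talks to the miner. */
struct trial {
    int    worksize;        /* WORKSIZE value used for the run */
    int    aggression;      /* AGGRESSION value used for the run */
    double mhash_per_s;     /* hashrate the miner reported */
    double accepted_per_h;  /* shares accepted by the pool per hour */
};

/* Return the index of the run with the most accepted shares per hour,
   breaking ties by reported hashrate. Returns -1 when n is 0. */
static int best_trial(const struct trial *t, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0
            || t[i].accepted_per_h > t[best].accepted_per_h
            || (t[i].accepted_per_h == t[best].accepted_per_h
                && t[i].mhash_per_s > t[best].mhash_per_s)) {
            best = (int)i;
        }
    }
    return best;
}
```

Run each setting long enough for the share count to settle before recording it, then pass the array to best_trial and read back the worksize and aggression of the winning entry.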

lagmo

member

Activity: 67

Merit: 10

Very nice job!
Finally got to break the 400Mhash/s barrier on my HD5850, an increase of about 8-10Mhash/s over the POCLBM kernel.

huayra.agera

full member

Activity: 154

Merit: 100

Quote from: William Reed on May 21, 2011, 10:05:57 AM

Works very well. I am getting over 440 Mhash/s on HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT and about 416 Mhash/s on poclbm. However my other HD 5870 running at 950/375 with same switches only hashes about 410 MHash/s with phatk while poclbm gives about 400MHash/s.

This worked well for me! Thanks for this tip man! +1: I have 3 5850s and these settings added like 20 Mhash/s while on my 6850 +10Mh/s! Cool!

JayC

newbie

Activity: 34

Merit: 0

Quote from: William Reed on May 21, 2011, 10:05:57 AM

Works very well. I am getting over 440 Mhash/s on HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT and about 416 Mhash/s on poclbm. However my other HD 5870 running at 950/375 with same switches only hashes about 410 MHash/s with phatk while poclbm gives about 400MHash/s.

Just out of curiosity, how do you tell what worksize you need for a specific card?

Topic: Modified Kernel for Phoenix 1.5 - page 12. (Read 96811 times)