
Topic: Modified Kernel for Phoenix 1.5 (Read 96713 times)

member
Activity: 77
Merit: 10
January 23, 2012, 11:38:31 PM
Note: the new SDK version works best with worksize 64 on a 5870.
legendary
Activity: 1344
Merit: 1004
January 23, 2012, 09:43:21 PM
I'm checking back in after being gone for so long...  I just downloaded the 2.6 SDK and it destroys my optimization...  I will see if there is anything I can do without completely rewriting.  Stay tuned and I should have more info later this week.

P.S. Thanks to everyone who has donated to me in the past, I have been busy lately, but I have not forgotten.

-Phateus

It should be noted that the current phatk 2.2 still runs amazingly fast on the 2.1 SDK; people suggested poclbm, but phatk2 still runs fastest on my system with 2.1 as well as 2.4/2.5. At least it does for me. I use VLIW5 hardware. Perhaps keep a 2.1-2.5 kernel around for those who want to keep using it, and a separate 2.6-optimized kernel for those who need it for GCN architecture hardware or for people who game with their GPUs (they'll probably be running 1GHz+ on memory, which is better suited to VECTORS4, which works well with high memory frequency).
newbie
Activity: 52
Merit: 0
January 21, 2012, 03:44:43 PM
I'm checking back in after being gone for so long...  I just downloaded the 2.6 SDK and it destroys my optimization...  I will see if there is anything I can do without completely rewriting.  Stay tuned and I should have more info later this week.

P.S. Thanks to everyone who has donated to me in the past, I have been busy lately, but I have not forgotten.

-Phateus
legendary
Activity: 1344
Merit: 1004
January 19, 2012, 03:58:16 AM
Bumping an ancient thread. Just noting that phatk 2.2 still holds the crown for fastest kernel on 2.1, 2.5, and 2.6 SDKs for VLIW5 tech (radeon 5xxx and 60xx-68xx). I hope phateus can come back and do even moar tweaks for moar speed!
hero member
Activity: 686
Merit: 500
Shame on everything; regret nothing.
October 09, 2011, 11:29:51 AM
If it works out for you and you're feeling generous, any donations would be greatly appreciated so I can continue to put out bitcoin related software:
124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv

-Phateus

It worked out for me and I'm feeling generous, so I donated for further development.

+1
I would if I could.  I wonder how many BTC enthusiasts live below the USA's so-called "poverty line"?
donator
Activity: 477
Merit: 250
October 09, 2011, 08:47:33 AM
If it works out for you and you're feeling generous, any donations would be greatly appreciated so I can continue to put out bitcoin related software:
124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv

-Phateus

It worked out for me and I'm feeling generous, so I donated for further development.
hero member
Activity: 686
Merit: 500
Shame on everything; regret nothing.
October 06, 2011, 12:58:56 PM
Is the latest version of phatk the one that's included in LinuxCoin final?

I could probably check somehow, as I am a LinuxCoin user... I just don't know much about Linux and don't want to poke at my rig while it's on a roll...  Grin
legendary
Activity: 1708
Merit: 1020
October 02, 2011, 03:33:46 PM
first you shocked me with TWO double hashes  Shocked

but ~3375 integer operations per hash is just perfect  Grin

edit: did you mean 3385??

It's actually closer to 3375 because some VLIW5 instructions only have 4 operations in them.  I can get a more exact number if needed, but it's kinda a PITA cuz AMD's software won't actually tell you outright.
ok thanks for elaborating. I used 3385 in the last calc but will just say it makes up for all the 6xxx cards Wink
newbie
Activity: 52
Merit: 0
October 01, 2011, 05:29:26 PM
first you shocked me with TWO double hashes  Shocked

but ~3375 integer operations per hash is just perfect  Grin

edit: did you mean 3385??

It's actually closer to 3375 because some VLIW5 instructions only have 4 operations in them.  I can get a more exact number if needed, but it's kinda a PITA cuz AMD's software won't actually tell you outright.
legendary
Activity: 1708
Merit: 1020
September 30, 2011, 10:04:31 AM
first you shocked me with TWO double hashes  Shocked

but ~3375 integer operations per hash is just perfect  Grin

edit: did you mean 3385??
newbie
Activity: 52
Merit: 0
September 29, 2011, 10:21:15 AM

1354 OPs are for two double hashes.

SHA256(SHA256(Block_Header1)), SHA256(SHA256(Block_Header2))

so, 677 per double hash.

Although these aren't completely full hashes, since the first and last few rounds (a few %) have been optimized out.

Also, each ALU OP is a VLIW5 (very long instruction word) instruction which contains 5 integer operations that run simultaneously, so... depending on how you think about it,

could be ~3375 integer operations or 677 VLIW5 instructions

Hope this helps, let me know if you need any more help with this.  I am interested in how this turns out.
legendary
Activity: 1708
Merit: 1020
donator
Activity: 477
Merit: 250
September 25, 2011, 01:28:38 PM
donated knickknack
legendary
Activity: 1344
Merit: 1004
September 06, 2011, 11:36:44 AM

how can I generate this kind of a graph for my 5850 and 5750? I'm having an argument with Diablo about the best memory clocks vs. core clocks

go to google docs, make a spreadsheet, test all the speeds and options on your end manually (this part will be extremely time consuming for a high resolution graph), and put the data in yourself, and generate graph. presto pronto.
surely, this can be done programmatically, it's just changing clocks and measuring speeds for x seconds and averaging

although on my cards some values will make them unstable, lol

well the OP already said he did it manually. you're free to write a program to do it automatically, or hire someone to write one for you.
hero member
Activity: 658
Merit: 500
September 06, 2011, 05:26:13 AM

how can I generate this kind of a graph for my 5850 and 5750? I'm having an argument with Diablo about the best memory clocks vs. core clocks

go to google docs, make a spreadsheet, test all the speeds and options on your end manually (this part will be extremely time consuming for a high resolution graph), and put the data in yourself, and generate graph. presto pronto.
surely, this can be done programmatically, it's just changing clocks and measuring speeds for x seconds and averaging

although on my cards some values will make them unstable, lol
legendary
Activity: 1344
Merit: 1004
September 05, 2011, 05:47:39 PM

how can I generate this kind of a graph for my 5850 and 5750? I'm having an argument with Diablo about the best memory clocks vs. core clocks

go to google docs, make a spreadsheet, test all the speeds and options on your end manually (this part will be extremely time consuming for a high resolution graph), and put the data in yourself, and generate graph. presto pronto.
hero member
Activity: 658
Merit: 500
September 04, 2011, 07:47:26 AM

how can I generate this kind of a graph for my 5850 and 5750? I'm having an argument with Diablo about the best memory clocks vs. core clocks
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
August 18, 2011, 08:48:38 PM
Hey phateus, just a heads up. Your cgminer code only worked for 2 vectors. I've updated it in my git tree to work with 1 and 4. Simple enough change.
hero member
Activity: 658
Merit: 500
August 18, 2011, 08:45:40 PM
I am using BFI_INT, the hardware errors are kind of random
I should mention I'm using fpgaminer's poclbm fork for this so maybe it might have something to do with it
newbie
Activity: 52
Merit: 0
August 18, 2011, 12:07:53 PM
I'm getting hardware errors on phatk 2.2, didn't get them on diapolo's or 2.1

the three are about indistinguishable in terms of speed for me

Are you using BFI_INT?  If not, there is a bug in the 2.2 kernel: Vince found that in the kernel.cl file, you have to replace

Code:
#define Ch(x, y, z) bitselect(x,y,z)

on line 78 with

Code:
#define Ch(x, y, z) bitselect(z, y, x)

I haven't gotten around to releasing a new version, but if you make the change yourself, it should fix it.

-Phateus