
Topic: DiaKGCN kernel for CGMINER + Phoenix 2 (79XX / 78XX / 77XX / GCN) - 2012-05-25 - page 6. (Read 27827 times)

hero member
Activity: 772
Merit: 500
I have 3, 7970s I'd be willing to test on. Shoot me a pm.

Done, thanks for helping :)

Dia
full member
Activity: 131
Merit: 100
I have 3, 7970s I'd be willing to test on. Shoot me a pm.
hero member
Activity: 772
Merit: 500
DiaKGCN is a work-in-progress, GCN-optimised mining kernel for CGMINER and Phoenix 2. So far it has taken weeks of hard work and trial and error. It will run on VLIW4 and VLIW5 GPUs just fine, but it's not optimised for them.

As the kernel has been part of CGMINER since version 2.2.7, there is no need to download additional files; you can use it out of the box. I will supply an updated kernel package for Phoenix 2 when the final version is available!

I'd like to get feedback, performance results and ideas to optimise it even further!
To support the further development of this kernel please donate to: 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x (0.94 BTC donated so far, thanks!)

Diapolo



CGMINER thread with download links and documentation:
https://bitcointalksearch.org/topic/official-cgminer-mining-software-thread-for-linuxwinosxmipsarmr-pi-4110-28402

DiaKGCN - Phoenix 2 download history:
https://anonfiles.com/file/a88219997407050d4b2ec153b35b2c0a
http://www.filedropper.com/diakgcnphoenix2
http://www.filedropper.com/diakgcnphoenix2preview_1

DiaKGCN - Phoenix 1 download history (just for reference):
http://www.filedropper.com/diakgcn04-02-2012
http://www.filedropper.com/diakgcn03-02-2012_1
http://www.filedropper.com/diakgcn02-02-2012
http://www.filedropper.com/diakgcn29-01-2012
http://www.filedropper.com/diakgcn28-01-2012



instructions for CGMINER

To use the current optimal settings on 79XX cards, add these parameters to your CGMINER command line:
Code:
-k diakgcn -v 2 -w 256

You need CGMINER >= 2.2.7 to be able to use diakgcn!
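
If you start CGMINER from a wrapper script, the version requirement can be checked before the diakgcn switches are added. This is only a minimal sketch: it assumes you already have the version as a dotted string such as "2.2.7" (how you get it from your install is left out), and supports_diakgcn and diakgcn_args are hypothetical helper names, not part of CGMINER.

```python
# Minimum CGMINER version that ships the diakgcn kernel (from the post above).
MIN_DIAKGCN_VERSION = (2, 2, 7)


def parse_version(text):
    """Turn a dotted version string like '2.2.7' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def supports_diakgcn(version_text):
    """True if the given CGMINER version is new enough for -k diakgcn."""
    return parse_version(version_text) >= MIN_DIAKGCN_VERSION


def diakgcn_args(version_text):
    """Return the 79XX switches from the post, or an empty list if too old."""
    if not supports_diakgcn(version_text):
        return []
    return ["-k", "diakgcn", "-v", "2", "-w", "256"]


if __name__ == "__main__":
    print(" ".join(diakgcn_args("2.2.7")))
```

Comparing tuples of ints instead of raw strings keeps a version like 2.10.0 from being sorted below 2.2.7.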



instructions for Phoenix 2

Place the folder diakgcn in phoenix2\plugins and use this for your config-file on 79XX cards (here it's for platform and device 0):
Code:
[cl:0:0]
kernel = diakgcn
aggression = 12
goffset = true
vectors2 = true
vectors4 = false
vectors8 = false
worksize = 256

For VLIW4 / VLIW5 you should use:
Code:
[cl:0:0]
kernel = diakgcn
aggression = 12
goffset = true
vectors2 = false
vectors4 = false
vectors8 = true
worksize = 128

With the current Phoenix 2 version, don't run a single instance with a mix of GCN and VLIW4 / VLIW5 GPUs, as this will lead to very poor performance!
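
For several cards, the two config blocks above can be generated instead of copied by hand. The following is a rough sketch, not something Phoenix 2 provides: render_config is a hypothetical helper, and it assumes the section name is [cl:platform:device] in that order. It deliberately writes one architecture per config, in line with the warning about mixed instances.

```python
import configparser
import io

# Settings shared by both architectures (values taken from the post).
COMMON = {"kernel": "diakgcn", "aggression": "12", "goffset": "true"}

# Per-architecture vector and worksize settings (values taken from the post).
GCN = {"vectors2": "true", "vectors4": "false", "vectors8": "false",
       "worksize": "256"}
VLIW = {"vectors2": "false", "vectors4": "false", "vectors8": "true",
        "worksize": "128"}


def render_config(devices, is_gcn):
    """Render config sections for a list of (platform, device) pairs.

    All devices in one call get the same architecture settings, so a
    single rendered config never mixes GCN and VLIW4 / VLIW5 cards.
    """
    config = configparser.ConfigParser()
    for platform, device in devices:
        settings = dict(COMMON)
        settings.update(GCN if is_gcn else VLIW)
        # Assumed section naming: [cl:<platform>:<device>]
        config["cl:%d:%d" % (platform, device)] = settings
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(render_config([(0, 0), (0, 1)], is_gcn=True))
```

Run it once per architecture and start one Phoenix 2 instance per resulting file.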



instructions for Phoenix 1

Place the folder diakgcn in phoenix\kernels and use this command line on 79XX cards:
Code:
-k diakgcn AGGRESSION=12 VECTORS2 WORKSIZE=256

For VLIW4 / VLIW5 you should use:
Code:
-k diakgcn AGGRESSION=12 VECTORS4 WORKSIZE=128
or
Code:
-k diakgcn AGGRESSION=12 VECTORS8 WORKSIZE=128

If you encounter high CPU usage and use multiple cards, try to give each Phoenix instance a single CPU core (set a CPU affinity)!
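
One way to plan the affinity is to give instance N core N and work out the matching mask. This is just a sketch under my own assumptions (round-robin assignment, one instance per card); assign_cores, affinity_mask and taskset_prefix are hypothetical helper names.

```python
def assign_cores(instance_count, core_count):
    """Give each miner instance one core index, wrapping around if needed."""
    if core_count < 1:
        raise ValueError("need at least one CPU core")
    return [index % core_count for index in range(instance_count)]


def affinity_mask(core):
    """Bit mask with only the given core set (core 0 -> 0x1, core 2 -> 0x4)."""
    return 1 << core


def taskset_prefix(core):
    """Linux command prefix that pins the following command to one core."""
    return ["taskset", "-c", str(core)]


if __name__ == "__main__":
    for instance, core in enumerate(assign_cores(3, 4)):
        print("instance %d -> core %d, mask 0x%X"
              % (instance, core, affinity_mask(core)))
```

On Linux the prefix goes in front of the Phoenix command line. On Windows the hex mask should be usable with start /affinity, but check that against your own system.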



DiaKGCN parameter description for Phoenix

BFI_INT
Use BFI_INT instruction patching (default is true).

GOFFSET
Use OpenCL 1.1 global offset parameter (default is true).

VECTORS2
Enable uint2 vector support in the kernel (default is false).

VECTORS4
Enable uint4 vector support in the kernel (default is false).

VECTORS8
Enable uint8 vector support in the kernel (default is false).
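
To make the defaults concrete, here is a small sketch of how switches like VECTORS2 or GOFFSET=False could be turned into settings. It is not the parsing code Phoenix itself uses. parse_kernel_options and vector_width are hypothetical names, and rejecting more than one VECTORS switch is my own assumption.

```python
# Defaults as listed in the parameter description above.
DEFAULTS = {
    "BFI_INT": True,
    "GOFFSET": True,
    "VECTORS2": False,
    "VECTORS4": False,
    "VECTORS8": False,
}

FALSE_WORDS = ("false", "0", "no", "off")


def parse_kernel_options(tokens):
    """Apply switches such as 'VECTORS2' or 'GOFFSET=False' to the defaults."""
    options = dict(DEFAULTS)
    for token in tokens:
        name, separator, value = token.partition("=")
        name = name.upper()
        if name not in DEFAULTS:
            continue  # e.g. AGGRESSION and WORKSIZE are handled elsewhere
        if separator:
            options[name] = value.strip().lower() not in FALSE_WORDS
        else:
            options[name] = True  # a bare switch enables the option
    return options


def vector_width(options):
    """Return 2, 4 or 8 for the selected VECTORS switch, or 1 for scalar."""
    chosen = [w for w in (2, 4, 8) if options["VECTORS%d" % w]]
    if len(chosen) > 1:
        raise ValueError("only one VECTORS switch should be set (assumption)")
    return chosen[0] if chosen else 1


if __name__ == "__main__":
    opts = parse_kernel_options("AGGRESSION=12 VECTORS2 WORKSIZE=256".split())
    print(opts, vector_width(opts))
```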



BFI_INT patching whitelist (only VLIW4 / VLIW5 GPUs)

Barts
BeaverCreek
Caicos
Cayman
Cedar
Cypress
Devastator
Juniper
Loveland
Redwood
Scrapper
Turks
WinterPark
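
The whitelist check, together with the cl_amd_media_ops check mentioned in the 03-02-2012 changelog, can be pictured roughly like this. It is a simplified sketch and not the actual init code. can_patch_bfi_int is a hypothetical name, and it assumes the device name and a space-separated extension string are passed in (for example from PyOpenCL's device.name and device.extensions).

```python
# Device names from the BFI_INT patching whitelist above.
BFI_INT_WHITELIST = frozenset([
    "Barts", "BeaverCreek", "Caicos", "Cayman", "Cedar", "Cypress",
    "Devastator", "Juniper", "Loveland", "Redwood", "Scrapper", "Turks",
    "WinterPark",
])


def can_patch_bfi_int(device_name, extensions, requested=True):
    """Decide whether BFI_INT patching should be used for a device.

    Patching is used only if the user did not disable it, the device is
    whitelisted and the device reports the cl_amd_media_ops extension.
    """
    return (requested
            and device_name.strip() in BFI_INT_WHITELIST
            and "cl_amd_media_ops" in extensions.split())


if __name__ == "__main__":
    print(can_patch_bfi_int("Cayman", "cl_khr_fp64 cl_amd_media_ops"))
    print(can_patch_bfi_int("Tahiti", "cl_khr_fp64 cl_amd_media_ops"))
```

Splitting the extension string before the lookup avoids a false match on a longer extension name that merely starts with cl_amd_media_ops.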



changelog 04-02-2012:
- added uint8 vectors support in the kernel and the init (use VECTORS8 switch to activate it)
- added GOFFSET switch to be able to disable global offset parameter (use GOFFSET=False to disable it)
  -> perhaps GOFFSET is slower for some, now you can try the alternative
- changed some kernel parameter descriptions
- removed unused VECTORS3 code, never got it working :-/
- renamed OpenCL11 flag to hasOpenCL11 in the init
- removed some unneeded references to phatk from the init
- added a few comments in the init
- upped init revision to 127

changelog 03-02-2012:
- fixed the VECTORS4 code-path, which is now usable again
  -> VECTORS4 should be beneficial for VLIW4 / VLIW5, but not for GCN
- removed the (u) typecasts in the non BFI_INT Ch() and Ma() versions
  -> the hex values that are used directly in Ch() or Ma() were changed to be unsigned
- added 2 different Ma() versions, one for VECTORS2 or VECTORS4 defined (was in before), the other for the scalar version of the kernel (new)
  -> new scalar version saves 4 Bytes in compiled GPU ISA code (but VECTORS2 is still fastest for GCN)
- hardened the BFI_INT auto patching code in the init
  -> a whitelisted OpenCL device is now checked for cl_amd_media_ops extension
- fixed a small bug where I tried to use the C-operator "&" as a "logical and" in the init
  -> changed into a Python "and" ^^
- removed a few lines of unused code from the init
- upped init revision to 126
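
For anyone wondering why the "&" versus "and" fix in this changelog matters: in Python "&" is a bitwise operator that also binds tighter than comparisons, so it can silently give a different answer than "and". A tiny illustration with made-up values (this is not the code from the init):

```python
def both_set_bitwise(a, b):
    """Combine two values with '&' (bitwise AND), then test the result."""
    return bool(a & b)


def both_set_logical(a, b):
    """Combine two values with 'and' (logical AND), then test the result."""
    return bool(a and b)


def compare_with_ampersand(a, b):
    """'&' binds tighter than '==': this is the chain a == (1 & b) == 3."""
    return a == 1 & b == 3


def compare_with_and(a, b):
    """The intended check: a equals 1 and b equals 3."""
    return a == 1 and b == 3


if __name__ == "__main__":
    # 2 and 1 are both truthy, but they share no bits, so '&' yields 0.
    print(both_set_bitwise(2, 1), both_set_logical(2, 1))
    print(compare_with_ampersand(1, 3), compare_with_and(1, 3))
```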

changelog 02-02-2012:
- added an automatic usage of the OpenCL 1.1 global offset parameter, on OpenCL >= 1.1 platforms -> Thanks DiabloD3 for the idea
- removed both __constant arrays in the kernel, values are now used directly
- changed Ma() function from a general one into faster ones for the BFI_INT path and the non BFI_INT path
- added new kernel parameters (W16addK16, W17addK17, state0A and state0B)
- added 2 new local variables state0AaddV0 and state0BaddV0
- rewrote some rounds to use new kernel parameters and variables for faster execution
- fixed a write to output buffer bug for the non VECTORS path in the kernel
- changed the BFI_INT whitelisted flag code in the init
- added an OpenCL >= 1.1 flag in the init used for activating the global offset parameter
- reactivated PyOpenCL version output in the init
- upped init revision to 125
- removed unneeded code or comments from the kernel and the init
- added the DiabloMiner kernel to the kernel header as an additional reference for getting new ideas
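
As background on the Ma() change in this changelog: the SHA-256 majority function can be written in several equivalent ways, and one of them maps onto a single bitfield-insert operation. The sketch below only checks that equivalence in Python. It is not the kernel's OpenCL code, and the exact forms used in DiaKGCN are an assumption on my part.

```python
MASK32 = 0xFFFFFFFF


def bfi_int(mask, a, b):
    """Bitfield insert: bits of a where mask is 1, bits of b elsewhere."""
    return ((mask & a) | (~mask & b)) & MASK32


def ma_general(x, y, z):
    """Textbook majority: each result bit is set if two or more inputs agree."""
    return (x & y) | (x & z) | (y & z)


def ma_short(x, y, z):
    """Majority with one operation fewer (candidate for a non BFI_INT path)."""
    return (x & z) | (y & (x | z))


def ma_bfi(x, y, z):
    """Majority as one bitfield insert: where x and z differ, y decides."""
    return bfi_int(z ^ x, y, x)


def check_equivalence():
    """Compare all three forms on every 1-bit input and a few 32-bit values."""
    samples = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
    samples += [(0x6A09E667, 0xBB67AE85, 0x3C6EF372),
                (0xFFFFFFFF, 0x00000000, 0xA54FF53A)]
    return all(ma_general(*s) == ma_short(*s) == ma_bfi(*s) for s in samples)


if __name__ == "__main__":
    print(check_equivalence())
```

Because the operations are purely bitwise, agreement on all eight single-bit input combinations already implies agreement on full 32-bit words; the extra samples are only a sanity check.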

changelog 29-01-2012:
- reordered kernel parameters in order of usage in the kernel
- removed unused kernel parameters (B1addF1addK6, C1addG1addK5, D1addH1)
- added new kernel parameter (PreVal0addK7)
- rewrote first 4 rounds to speed up the kernel
- VECTORS4 parameter is not finished, it currently uses VECTORS2 code-path