DiaKGCN is a work-in-progress, GCN-optimised mining kernel for CGMINER and Phoenix 2. So far it has taken weeks of hard work and trial and error. It will run on VLIW4 and VLIW5 GPUs just fine, but it's not optimised for them.
As the kernel has been part of CGMINER since version 2.2.7, there is no need to download additional files; you can use it out of the box. I will supply an updated kernel package for Phoenix 2 when the final version is available!
I'd like to get feedback, performance results and ideas to optimise it even further!
To support the further development of this kernel please donate to: 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x (0.94 BTC donated so far, thanks!)
Diapolo
CGMINER thread with download links and documentation:
https://bitcointalksearch.org/topic/official-cgminer-mining-software-thread-for-linuxwinosxmipsarmr-pi-4110-28402
DiaKGCN - Phoenix 2 download history:
https://anonfiles.com/file/a88219997407050d4b2ec153b35b2c0a
http://www.filedropper.com/diakgcnphoenix2
http://www.filedropper.com/diakgcnphoenix2preview_1
DiaKGCN - Phoenix 1 download history (just for reference):
http://www.filedropper.com/diakgcn04-02-2012
http://www.filedropper.com/diakgcn03-02-2012_1
http://www.filedropper.com/diakgcn02-02-2012
http://www.filedropper.com/diakgcn29-01-2012
http://www.filedropper.com/diakgcn28-01-2012
instructions for CGMINER
To use the current optimal settings on 79XX cards, add these parameters to your CGMINER command line:
You need CGMINER >= 2.2.7 to be able to use diakgcn!
instructions for Phoenix 2
Place the folder diakgcn in phoenix2\plugins and use this for your config file on 79XX cards (here it's for platform and device 0):
[cl:0:0]
kernel = diakgcn
aggression = 12
goffset = true
vectors2 = true
vectors4 = false
vectors8 = false
worksize = 256
For VLIW4 / VLIW5 you should use:
[cl:0:0]
kernel = diakgcn
aggression = 12
goffset = true
vectors2 = false
vectors4 = false
vectors8 = true
worksize = 128
With the current Phoenix 2 version, don't run a single instance with a mix of GCN and VLIW4 / VLIW5 GPUs, as this will lead to very poor performance!
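As a rough sketch of the two configurations above, the following Python snippet builds one config section per device for a single GPU family. The function name `build_config` and the family labels "gcn" and "vliw" are made up for illustration and are not Phoenix 2 options; only the keys and values are taken from the settings listed above.

```python
import configparser
import io

# Settings copied from the two config examples above.
GCN_SETTINGS = {
    "kernel": "diakgcn", "aggression": "12", "goffset": "true",
    "vectors2": "true", "vectors4": "false", "vectors8": "false",
    "worksize": "256",
}
# VLIW4 / VLIW5 differs only in the vector width and the worksize.
VLIW_SETTINGS = dict(GCN_SETTINGS, vectors2="false", vectors8="true",
                     worksize="128")


def build_config(family, devices, platform=0):
    """Return config text with one [cl:platform:device] section per device.

    family is "gcn" or "vliw" (hypothetical labels used only in this sketch).
    """
    if family not in ("gcn", "vliw"):
        raise ValueError("family must be 'gcn' or 'vliw'")
    settings = GCN_SETTINGS if family == "gcn" else VLIW_SETTINGS
    parser = configparser.ConfigParser()
    for device in devices:
        parser[f"cl:{platform}:{device}"] = settings
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Calling `build_config("gcn", [0, 1])` and `build_config("vliw", [2])` separately gives two config texts, one per Phoenix 2 instance, which keeps GCN and VLIW cards out of the same instance.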
instructions for Phoenix 1
Place the folder diakgcn in phoenix\kernels and use this command line on 79XX cards:
-k diakgcn AGGRESSION=12 VECTORS2 WORKSIZE=256
For VLIW4 / VLIW5 you should use:
-k diakgcn AGGRESSION=12 VECTORS4 WORKSIZE=128
or
-k diakgcn AGGRESSION=12 VECTORS8 WORKSIZE=128
If you encounter high CPU usage and use multiple cards, try to give each Phoenix instance a single CPU core (set a CPU affinity)!
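One way to think about the affinity tip is a simple one-core-per-instance mapping. The sketch below is only an illustration: `assign_cores` and `pin_current_process` are hypothetical helper names, and the pinning part relies on `os.sched_setaffinity`, which exists only on Linux. On Windows the usual route is `start /affinity` with a hex mask, and on Linux `taskset -c` does the same from the shell.

```python
import os


def assign_cores(num_instances, num_cores):
    """Map each miner instance index to one CPU core.

    Wraps around if there are more instances than cores.
    """
    if num_cores < 1:
        raise ValueError("need at least one CPU core")
    return {i: i % num_cores for i in range(num_instances)}


def pin_current_process(core):
    """Pin the calling process to a single core (Linux only).

    Returns True if the affinity was set, False if this platform
    does not offer os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, {core})  # pid 0 means the current process
    return True
```

With two cards on a quad-core CPU, `assign_cores(2, 4)` would put instance 0 on core 0 and instance 1 on core 1.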
DiaKGCN parameter description for Phoenix
BFI_INT
Use BFI_INT instruction patching (default is true).
GOFFSET
Use OpenCL 1.1 global offset parameter (default is true).
VECTORS2
Enable uint2 vector support in the kernel (default is false).
VECTORS4
Enable uint4 vector support in the kernel (default is false).
VECTORS8
Enable uint8 vector support in the kernel (default is false).
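To make the defaults above concrete, here is a small sketch of how a Phoenix 1 style option string could be turned into switch values. This is not the parsing code Phoenix or the kernel init actually uses; `parse_kernel_options` is a hypothetical helper, and treating a bare switch such as VECTORS2 as "true" is an assumption based on the command lines shown earlier.

```python
# Defaults as listed in the parameter description above.
DEFAULTS = {
    "BFI_INT": True,
    "GOFFSET": True,
    "VECTORS2": False,
    "VECTORS4": False,
    "VECTORS8": False,
}


def parse_kernel_options(option_string):
    """Return the switch values for a string like 'VECTORS2 GOFFSET=False'."""
    options = dict(DEFAULTS)
    for token in option_string.split():
        name, sep, value = token.partition("=")
        name = name.upper()
        if name not in DEFAULTS:
            continue  # e.g. AGGRESSION or WORKSIZE, not handled in this sketch
        if not sep:
            options[name] = True  # assumption: a bare switch enables it
        else:
            options[name] = value.strip().lower() in ("true", "1", "yes")
    return options
```

For example, `parse_kernel_options("AGGRESSION=12 VECTORS2 WORKSIZE=256")` leaves BFI_INT and GOFFSET at their defaults and switches VECTORS2 on.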
BFI_INT patching whitelist (only VLIW4 / VLIW5 GPUs)
Barts
BeaverCreek
Caicos
Cayman
Cedar
Cypress
Devastator
Juniper
Loveland
Redwood
Scrapper
Turks
WinterPark
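The whitelist check, combined with the cl_amd_media_ops extension check mentioned in the 03-02-2012 changelog, could look roughly like the sketch below. It is not the actual init code: `can_patch_bfi_int` is a hypothetical name, and the device name and extension string are assumed to be plain strings as an OpenCL runtime would report them.

```python
# Device names from the whitelist above.
BFI_INT_WHITELIST = frozenset([
    "Barts", "BeaverCreek", "Caicos", "Cayman", "Cedar", "Cypress",
    "Devastator", "Juniper", "Loveland", "Redwood", "Scrapper",
    "Turks", "WinterPark",
])


def can_patch_bfi_int(device_name, extensions):
    """Return True if the device is whitelisted and reports cl_amd_media_ops.

    extensions is the space-separated extension string of the device.
    Splitting it avoids matching longer names that merely start with
    cl_amd_media_ops.
    """
    return (device_name.strip() in BFI_INT_WHITELIST
            and "cl_amd_media_ops" in extensions.split())
```

A 79XX card (device name Tahiti) is not on the list, so a check like this would leave BFI_INT patching off for GCN.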
changelog 04-02-2012:
- added uint8 vectors support in the kernel and the init (use VECTORS8 switch to activate it)
- added GOFFSET switch to be able to disable global offset parameter (use GOFFSET=False to disable it)
-> perhaps GOFFSET is slower for some, now you can try the alternative
- changed some kernel parameter descriptions
- removed unused VECTORS3 code, never got it working :-/
- renamed OpenCL11 flag to hasOpenCL11 in the init
- removed some unneeded references to phatk from the init
- added a few comments in the init
- upped init revision to 127
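For anyone wondering what GOFFSET changes: with an OpenCL 1.1 global offset, get_global_id(0) already starts at the offset, so the kernel does not have to add a base nonce itself. The following is a plain Python simulation of that idea for the scalar case only; the function names are invented for this sketch and the real kernel's nonce handling (especially with vectors) is not shown in this post.

```python
MASK32 = 0xFFFFFFFF


def nonces_with_base_argument(base, global_size):
    """Without a global offset: ids run 0..global_size-1 and the
    kernel adds a base nonce passed in as a kernel argument."""
    return [(base + gid) & MASK32 for gid in range(global_size)]


def nonces_with_global_offset(offset, global_size):
    """With a global offset: the ids themselves run
    offset..offset+global_size-1, so no extra addition is needed."""
    return [gid & MASK32 for gid in range(offset, offset + global_size)]
```

Both variants cover the same nonce range, which is why GOFFSET=False is a safe fallback to try if the offset path turns out slower on some setup.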
changelog 03-02-2012:
- fixed the VECTORS4 code-path, which is now usable again
-> VECTORS4 should be beneficial for VLIW4 / VLIW5, but not for GCN
- removed the (u) typecasts in the non BFI_INT Ch() and Ma() versions
-> the hex values that are used directly in Ch() or Ma() were changed to be unsigned
- added 2 different Ma() versions, one for VECTORS2 or VECTORS4 defined (was in before), the other for the scalar version of the kernel (new)
-> new scalar version saves 4 Bytes in compiled GPU ISA code (but VECTORS2 is still fastest for GCN)
- hardened the BFI_INT auto patching code in the init
-> a whitelisted OpenCL device is now checked for cl_amd_media_ops extension
- fixed a small bug where I tried to use the C-operator "&" as a "logical and" in the init
-> changed into a Python "and" ^^
- removed a few lines of unused code from the init
- upped init revision to 126
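As background for the Ch() and Ma() changes: these are the standard SHA-256 choice and majority functions, and each can be written as a single bit-select, which is the kind of operation BFI_INT provides. The Python below is a reference sketch, not the kernel source; the function names are chosen for illustration and the exact formulations used in the kernel are not shown in this post.

```python
import random

MASK32 = 0xFFFFFFFF


def bitselect(a, b, c):
    """Like OpenCL bitselect: take bits of b where c is 1, else bits of a."""
    return ((a & ~c) | (b & c)) & MASK32


def ch_plain(x, y, z):
    """SHA-256 Ch: for each bit, x chooses between y and z."""
    return ((x & y) ^ (~x & z)) & MASK32


def ma_plain(x, y, z):
    """SHA-256 Maj: each result bit is the majority of the three inputs."""
    return ((x & y) ^ (x & z) ^ (y & z)) & MASK32


def ch_bitselect(x, y, z):
    return bitselect(z, y, x)


def ma_bitselect(x, y, z):
    # Where x and z agree the majority is x, otherwise y decides.
    return bitselect(x, y, z ^ x)


def check_equivalence(rounds=1000, seed=0):
    """Compare both formulations on random 32-bit inputs."""
    rng = random.Random(seed)
    for _ in range(rounds):
        x, y, z = (rng.getrandbits(32) for _ in range(3))
        if ch_plain(x, y, z) != ch_bitselect(x, y, z):
            return False
        if ma_plain(x, y, z) != ma_bitselect(x, y, z):
            return False
    return True
```

Running `check_equivalence()` compares the plain and bit-select forms on random inputs, which is a cheap sanity check when experimenting with alternative Ch() or Ma() variants.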
changelog 02-02-2012:
- added an automatic usage of the OpenCL 1.1 global offset parameter, on OpenCL >= 1.1 platforms -> Thanks DiabloD3 for the idea
- removed both __constant arrays in the kernel, values are now used directly
- changed Ma() function from a general one into faster ones for the BFI_INT path and the non BFI_INT path
- added new kernel parameters (W16addK16, W17addK17, state0A and state0B)
- added 2 new local variables state0AaddV0 and state0BaddV0
- rewrote some rounds to use new kernel parameters and variables for faster execution
- fixed a write to output buffer bug for the non VECTORS path in the kernel
- changed the BFI_INT whitelisted flag code in the init
- added an OpenCL >= 1.1 flag in the init used for activating the global offset parameter
- reactivated PyOpenCL version output in the init
- upped init revision to 125
- removed unneeded code or comments from the kernel and the init
- added the DiabloMiner kernel to the kernel header as an additional reference for getting new ideas
changelog 29-01-2012:
- reordered kernel parameters in order of usage in the kernel
- removed unused kernel parameters (B1addF1addK6, C1addG1addK5, D1addH1)
- added new kernel parameter (PreVal0addK7)
- rewrote first 4 rounds to speed up the kernel
- VECTORS4 parameter is not finished; it currently uses the VECTORS2 code-path