Phoenix 2 beta discussion - page 6. | Bitcointalksearch.org

Diapolo

hero member

Activity: 772

Merit: 500

#[cl:0:0] should be [cl:0:0], currently it's unused, because it's treated as a comment and is therefore simply ignored.
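To illustrate the effect, here is a minimal sketch that uses Python's standard `configparser` as a stand-in; Phoenix 2's own config loader is not shown in this thread and may differ in detail. With the header commented out, the options that follow it end up in the preceding `[web]` section instead of a device section.

```python
import configparser

# Trimmed-down config in the shape discussed above (values are illustrative).
CFG = """
[general]
verbose = True

[web]
password = rpc_password

#[cl:0:0]
phatk2 = True
worksize = 256
"""

def load(text):
    """Parse config text with the standard-library parser (a stand-in only)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser

commented = load(CFG)
fixed = load(CFG.replace("#[cl:0:0]", "[cl:0:0]"))

# The commented header is skipped, so no device section exists...
print(commented.sections())
# ...and the kernel options are silently attached to [web].
print(dict(commented["web"]))
# After removing the '#', the device section shows up as intended.
print(fixed.sections())
```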

Dia

baby_ghost

member

Activity: 89

Merit: 10

Here is my phoenix.cfg. Did I get any of the other sections wrong? The miner is slower than with Phoenix 1.7.5.

Code:

[general]
verbose = True
autodetect = +cl -cpu # The rightmost parameter takes precedence. This enables all OpenCL devices, except those that are CPUs.
backend = http://vinhpk.06:[email protected]:8332/ # URL format is exactly as it was in Phoenix 1

[web]
password = rpc_password # Set an RPC password to keep people from messing with your miners.

# If you want to configure miners yourself, edit and uncomment this section:
#[cl:0:0] # Or whatever ID you want to configure.
#autoconfigure = True # Do you still want autoconfiguration?
#disabled = False # Do you want to disable the miner?
#bfi_int = True    # Any other kernel options...
#vectors = False # can go into this section.
phatk2 = True
VECTORS = True
BFI_INT = True
WORKSIZE = 256 
AGGRESSION = 11 
FASTLOOPS = false
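The "rightmost parameter takes precedence" comment on the autodetect line can be read as a last-match-wins rule list. Below is a minimal Python sketch of that reading. The function name and the prefix matching against device IDs are assumptions made for illustration, not Phoenix 2's actual code.

```python
def autodetect_allows(rules, device_ids):
    """Return True if the last rule matching any of device_ids is a '+' rule.

    rules      -- string such as "+cl -cpu" (hypothetical parsing of the option)
    device_ids -- every ID a device answers to, e.g. ["cl:0:1", "cpu:0"]
    """
    allowed = False  # devices matched by no rule stay disabled
    for rule in rules.split():
        sign, pattern = rule[0], rule[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with + or -: %r" % rule)
        # Assumed matching: exact ID or ID prefix up to a ':' separator.
        if any(d == pattern or d.startswith(pattern + ":") for d in device_ids):
            allowed = (sign == "+")  # later rules overwrite earlier ones
    return allowed

print(autodetect_allows("+cl -cpu", ["cl:0:0"]))           # GPU-only OpenCL device
print(autodetect_allows("+cl -cpu", ["cl:0:1", "cpu:0"]))  # OpenCL device that is a CPU
```

Under these assumptions, swapping the order to `-cpu +cl` would re-enable the OpenCL CPU device, because `+cl` is then the last matching rule.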

jedi95

full member

Activity: 219

Merit: 120

Quote from: Diapolo on February 07, 2012, 10:47:42 AM

The above config generates "Detected [cl:0:0]: [Tahiti 0] using opencl (rating 2)", which I don't understand. Shouldn't it simply use the kernel specified and report that autodetect has been overridden by my own settings?

That's because you still have autodetect = +cl in the config file. Any devices with specific settings defined in the config file will use those instead of autodetect. The autodetect messages are currently displayed even if the settings are overridden by the config file. This will be clarified in a future release by either hiding the autodetect message or changing the message to indicate that the user-defined settings were used.
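A rough sketch of the precedence described here, in Python. The function, the dictionary layout and the message format are hypothetical and only meant to show user-defined sections winning over autodetected settings.

```python
def effective_settings(device_id, detected, user_sections):
    """Pick user-defined settings for a device when present, else autodetected ones."""
    if device_id in user_sections:
        return dict(user_sections[device_id]), "config file"
    return dict(detected[device_id]), "autodetect"

# What autodetect proposed (illustrative) versus what the config file defines.
detected = {"cl:0:0": {"kernel": "opencl"}, "cl:0:1": {"kernel": "opencl"}}
user_sections = {"cl:0:0": {"kernel": "diakgcn", "aggression": "12", "worksize": "256"}}

for dev in sorted(detected):
    settings, source = effective_settings(dev, detected, user_sections)
    # One possible clearer status line: name the source of the settings.
    print("[%s] using %s (settings from %s)" % (dev, settings["kernel"], source))
```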

Quote from: Diapolo on February 07, 2012, 10:47:42 AM

Another thing I don't understand is why getDevice() and autodetect() reside in kernels\opencl\__init__.py. I dislike the idea that these functions are taken from there, because opencl is simply another kernel folder. I think they should be placed somewhere else (I have no good idea currently, but perhaps in PhoenixCore.py).

The reason we have the device detection code at the kernel level is so that it can support any type of device. For example, the current FPGA miners don't have a standard API, which makes including this functionality in the Phoenix core a bad idea: we would have to add support for each new device to the core. Handling these functions at the kernel level allows other developers to support new hardware with no changes to Phoenix itself.
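As a loose illustration of that design (not the real Phoenix 2 interfaces), the core can stay hardware-agnostic if every kernel reports its own devices. All class names, method signatures and device IDs below are made up for the sketch.

```python
class OpenCLKernel:
    """Stand-in for a kernel that would scan OpenCL platforms."""
    name = "opencl"

    def autodetect(self):
        # A real kernel would query the OpenCL runtime here.
        return ["cl:0:0", "cl:0:1"]


class SerialFpgaKernel:
    """Stand-in for a third-party kernel with its own device discovery."""
    name = "fpga_demo"

    def autodetect(self):
        # A real kernel would probe serial ports or a vendor API here.
        return ["fpga:ttyUSB0"]


def detect_all(kernels):
    """Core-side loop: ask each kernel what it found, without knowing how."""
    found = {}
    for kernel in kernels:
        for device_id in kernel.autodetect():
            found.setdefault(device_id, kernel.name)
    return found


print(detect_all([OpenCLKernel(), SerialFpgaKernel()]))
```

Adding support for new hardware in this sketch means adding another class with an `autodetect()` method; `detect_all()` does not change.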

Quote from: Diapolo on February 07, 2012, 10:47:42 AM

The supplied phatk2 version also uses code from opencl\__init__.py via "opencl = sys.modules['opencl']", which seems less than ideal. I think every kernel should specify its own options, even if they are the same. Your idea was perhaps to have only one place to edit when you make changes, but for add-on kernels like diakgcn I really have to specify my own options. I would promote this as a rule for all supplied or add-on kernels, so that they are better structured and independent of the opencl kernel folder. What do you think?

Using functions from opencl for other kernels isn't required. This is simply how we decided to implement the supplied version of phatk2. Kernels DO NOT need to be implemented in this way.

Quote from: Diapolo on February 07, 2012, 10:47:42 AM

Another small change I would suggest for analyzeDevice() is your CPU detection code, which could be replaced with:

Code:

# Check if the device is a CPU
if device.get_info(cl.device_info.TYPE) == cl.device_type.CPU:
    return (1, {'name': name, 'aggression': 0}, [devid, 'cpu:0'])

Thanks for this code, I will modify opencl/phatk2 to use this method of detecting CPUs.

ssateneth

legendary

Activity: 1344

Merit: 1004

Quote from: Schwede65 on February 07, 2012, 12:44:43 PM

Great work... just tested 1.75... and now there is only one process for all GPUs.

A question about setting the backup pool:

beginning of phoenix.cfg:

[general]
verbose = True
autodetect = +cl -cpu
backend = http: 123:[email protected]:8332/ # URL format is exactly as it was in Phoenix 1
backup = http: 456:[email protected]:8332/

Is that done correctly?

There is no backup pool support at the moment.

d3m0n1q_733rz

sr. member

Activity: 378

Merit: 250

Think you could add the ability to use plugins? That might quiet some of the people wanting more functions like overclocking, slowing hash rates based on core temps, restarting crashed GPUs, pausing GPUs, etc. by letting them code things themselves. Just push P for plugins and start configuring away, or access the plugin's config file directly to change settings. Seems like a good solution.

Schwede65

sr. member

Activity: 309

Merit: 250

Great work... just tested 1.75... and now there is only one process for all GPUs.

A question about setting the backup pool:

beginning of phoenix.cfg:

[general]
verbose = True
autodetect = +cl -cpu
backend = http: 123:[email protected]:8332/ # URL format is exactly as it was in Phoenix 1
backup = http: 456:[email protected]:8332/

Is that done correctly?

Diapolo

hero member

Activity: 772

Merit: 500

Code:

[general]
autodetect = +cl
backend = XYZ
verbose = true

[cl:0:0]
kernel = diakgcn
aggression = 12
vectors2 = true
vectors4 = false
vectors8 = false
worksize = 256

[cl:0:1]
disabled = true

[cl:0:2]
disabled = true

[web]
disabled = true

The above config generates "Detected [cl:0:0]: [Tahiti 0] using opencl (rating 2)", which I don't understand. Shouldn't it simply use the kernel specified and report that autodetect has been overridden by my own settings? Another thing I don't understand is why getDevice() and autodetect() reside in kernels\opencl\__init__.py. I dislike the idea that these functions are taken from there, because opencl is simply another kernel folder. I think they should be placed somewhere else (I have no good idea currently, but perhaps in PhoenixCore.py).

The supplied phatk2 version also uses code from opencl\__init__.py via "opencl = sys.modules['opencl']", which seems less than ideal. I think every kernel should specify its own options, even if they are the same. Your idea was perhaps to have only one place to edit when you make changes, but for add-on kernels like diakgcn I really have to specify my own options. I would promote this as a rule for all supplied or add-on kernels, so that they are better structured and independent of the opencl kernel folder. What do you think?

Another small change I would suggest for analyzeDevice() is your CPU detection code, which could be replaced with:

Code:

# Check if the device is a CPU
if device.get_info(cl.device_info.TYPE) == cl.device_type.CPU:
    return (1, {'name': name, 'aggression': 0}, [devid, 'cpu:0'])

Dia

HendrikJan

member

Activity: 64

Merit: 10

Quote from: CFSworks on February 07, 2012, 12:11:55 AM

Adding OC and fan management is very tempting, I must admit. A thought occurs to me: What if we renamed the "kernels" directory to "modules" or the like, and allowed non-kernel modules to be loaded into Phoenix as well? This allows a third party to easily develop a complete GPU management subsystem that integrates into Phoenix while still keeping the core slim and fast for those who prefer to do OC themselves.

It could be a nice way to get the best of both worlds.

But I still get fewer MH/s with this 2.0 version.
What could be the difference?
Should I get the same result, or does this version still need some tweaking?

echris1

full member

Activity: 125

Merit: 100

Just switched all my miners to this after a drastic drop in cgminer performance (something to do with the SDK, not sure).

Working great on my 2x6770, 6850 and 5770, back at max after a bit of tweaking.

I was just wondering how to add a backup pool to the conf file?

Keep up the good work!

wind

member

Activity: 125

Merit: 10

Is it possible to add command-line parameters like Phoenix 1.x does?

lodcrappo

hero member

Activity: 616

Merit: 506

Quote from: CFSworks on February 07, 2012, 12:11:55 AM

Adding OC and fan management is very tempting, I must admit. A thought occurs to me: What if we renamed the "kernels" directory to "modules" or the like, and allowed non-kernel modules to be loaded into Phoenix as well? This allows a third party to easily develop a complete GPU management subsystem that integrates into Phoenix while still keeping the core slim and fast for those who prefer to do OC themselves.

As the developer of a popular mining farm management system that uses "best of breed" tools for each function, including Phoenix for the mining client part, I think this is a great idea. Currently we have to hack the management code into each Phoenix release, which isn't a big deal, but having a standard way to interface would be much nicer.

CFSworks

member

Activity: 63

Merit: 10

Quote from: bulanula on February 06, 2012, 06:28:32 AM

Will this include automatic fan and gpu management like cgminer ?

What about killing mining thread on GPU that has a dead fan while I am away ?

While both of these features are really nice, there is no easy way to interface with these functions on the card through OpenCL. We can do it, and I've asked jedi95 about it. His stance on the issue is pretty much what lodcrappo said:

Quote from: lodcrappo on February 06, 2012, 03:40:02 PM

please no. don't turn phoenix into the monstrosity of some other miners. its simplicity is its beauty.

do one thing, and do it well.

when you throw everything possible into one program, you end up with too many compromises.

So, we're definitely not adding GPU management features to the Phoenix 2 core.

Quote from: bulanula on February 06, 2012, 06:28:32 AM

IMHO cgminer is good because of auto fan feature and support for backup pools and screw all that python BS.

Thank you !

It's important to keep in mind that cgminer and Phoenix are alternatives, not competitors. We're both on the same side here. If you like cgminer better, please, use it!

Phoenix's purpose is to do things differently for those that don't prefer the way cgminer operates. (And, Python BS? I'm assuming you mean the dependencies you have to install to get Phoenix operational, since I don't see how choice of language affects the end-user, especially when the core was written with speed in mind.)

Quote from: bulanula on February 06, 2012, 03:49:20 PM

I 100% agree and support this decision. Remember the KISS principle. If you need OC stick to AMD API thing.

Adding OC and fan management is very tempting, I must admit. A thought occurs to me: What if we renamed the "kernels" directory to "modules" or the like, and allowed non-kernel modules to be loaded into Phoenix as well? This allows a third party to easily develop a complete GPU management subsystem that integrates into Phoenix while still keeping the core slim and fast for those who prefer to do OC themselves.

Bananington

sr. member

Activity: 1470

Merit: 428

How high can I turn up the aggression? Is 12 the max?

Bananington

sr. member

Activity: 1470

Merit: 428

Quote from: CFSworks on February 06, 2012, 12:39:05 AM

Quote from: d3m0n1q_733rz on February 06, 2012, 12:21:50 AM

Could we get a list of all accepted options?
Kernel selection
Worksize=
Vectors, Vectors4, Vectors8, Vectors16
etc.

The Phoenix 2 core itself recognizes (not an exhaustive list):

Code:

# [cl:0:0] autoconfigure # Allow the kernel to choose its own configuration variables?
# [cl:0:0] disabled # Prevent mining on this device?
# [cl:0:0] kernel # Can be used to manually choose a kernel to mine on.
# [cl:0:0] start_undetected # Start even if autodetect doesn't find it?
# [general] autodetect # A list of rules for what devices to autodetect.
# [general] backend # The URL to mine on in the backend.
# [general] logfile # Set this option to log to a file.
# [general] queuedelay # Advanced - ???
# [general] queuesize # Advanced - ???
# [general] ratesamples # Advanced - number of samples to average for rate reporting
# [general] statusinterval # Advanced - how long to delay between statusbar updates
# [general] verbose # Enable verbose mode? (Shows debug messages)
# [web] bind # Bind the web/RPC server to a specific IP
# [web] disabled # Disable the webserver altogether?
# [web] logbuffer # Advanced - how many logs to remember in the getlogs() RPC call
# [web] password # The password necessary for web/RPC login (username is ignored)
# [web] port # What port should the web/RPC server listen on?
# [web] root # Advanced - The root directory for the webserver.

The phatk2/opencl kernels recognize:

Code:

# [cl:0:0] vectors # Enable two-way vectors?
# [cl:0:0] vectors4 # Enable four-way vectors?
# [cl:0:0] fastloop # Advanced - fastloop optimization for low aggressions
# [cl:0:0] aggression # Controls how hard Phoenix 2 hits the hardware
# [cl:0:0] worksize # Advanced - controls size of individual executions
# [cl:0:0] bfi_int # Enable BFI_INT optimization for Radeon cards that support it

Quote from: d3m0n1q_733rz on February 06, 2012, 12:37:11 AM

[cl:0:0]
autoconfigure = False
BFI_INT VECTORS4 WORKSIZE=64

What's wrong with this portion that it's not giving me the correct settings?

Try:

Code:

[cl:0:0]
autoconfigure = false # Not actually needed since autoconfiguration disables by default when you supply your own args
BFI_INT = true # Also the boolean options aren't case sensitive.
VECTORS4 = true
WORKSIZE = 64
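The difference between the two forms can be checked with Python's standard `configparser`, used here only as a stand-in for whatever parser Phoenix 2 uses. In this parser, option names are case-insensitive, `true`/`True` both read as booleans, and the one-line form collapses into a single oddly named key.

```python
import configparser

GOOD = """
[cl:0:0]
autoconfigure = false # not needed once you supply your own options
BFI_INT = true
VECTORS4 = true
WORKSIZE = 64
"""

BAD = """
[cl:0:0]
autoconfigure = False
BFI_INT VECTORS4 WORKSIZE=64
"""

def load(text):
    # inline_comment_prefixes lets the parser drop trailing "# ..." remarks.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return parser["cl:0:0"]

dev = load(GOOD)
settings = {
    "autoconfigure": dev.getboolean("autoconfigure"),
    "bfi_int": dev.getboolean("BFI_INT"),    # lookup is case-insensitive
    "vectors4": dev.getboolean("vectors4"),
    "worksize": dev.getint("worksize"),
}
print(settings)

# The one-line form is read as ONE option whose name contains spaces.
print(list(load(BAD).keys()))
```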

I believe this should be edited into the first post for the late joiners.

d3m0n1q_733rz

sr. member

Activity: 378

Merit: 250

Alright, I have a register spill somewhere in here...can someone find it for me? I'm using 8 vectors to make better use of the 16 available to the HD79xx cards. But the code itself isn't made to handle it. I just can't figure out what part needs to be changed to make it capable.

Code:

// This file is in the public domain

#ifdef VECTORS8
	typedef uint8 u;
#elif defined VECTORS4
	typedef uint4 u;
#elif defined VECTORS
	typedef uint2 u;
#else
	typedef uint u;
#endif

__constant uint K[64] = { 
	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

__constant uint ConstW[128] = {
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x80000000U, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000280U,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,

0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x80000000U, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000100U,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000
};

__constant uint H[8] = { 
	0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
};

#ifdef BITALIGN
	#pragma OPENCL EXTENSION cl_amd_media_ops : enable
	#define rot(x, y) amd_bitalign(x, x, (uint)(32 - y))
#else
	#define rot(x, y) rotate(x, (uint)y)
#endif

// Some AMD devices have the BFI_INT opcode, which behaves exactly like the
// SHA-256 Ch function, but provides it in exactly one instruction. If
// detected, use it for Ch. Otherwise, use bitselect() for Ch.

#ifdef BFI_INT
	// Well, slight problem... It turns out BFI_INT isn't actually exposed to
	// OpenCL (or CAL IL for that matter) in any way. However, there is 
	// a similar instruction, BYTE_ALIGN_INT, which is exposed to OpenCL via
	// amd_bytealign, takes the same inputs, and provides the same output. 
	// We can use that as a placeholder for BFI_INT and have the application 
	// patch it after compilation.

	// This is the BFI_INT function
	#define Ch(x, y, z) amd_bytealign(x,y,z)
	// Ma can also be implemented in terms of BFI_INT...
	#define Ma(z, x, y) amd_bytealign(z^x,y,x)
#else
	#define Ch(x, y, z) bitselect(z,y,x)
	#define Ma(x, y, z) bitselect(x,y,(z^x))
#endif

//Various intermediate calculations for each SHA round
#define s0(n) (S0(Vals[(128 - (n)) % 8]))
#define S0(n) (rot(n, 30u)^rot(n, 19u)^rot(n,10u))

#define s1(n) (S1(Vals[(132 - (n)) % 8]))
#define S1(n) (rot(n, 26u)^rot(n, 21u)^rot(n, 7u))

#define ch(n) Ch(Vals[(132 - (n)) % 8],Vals[(133 - (n)) % 8],Vals[(134 - (n)) % 8])
#define maj(n) Ma(Vals[(129 - (n)) % 8],Vals[(130 - (n)) % 8],Vals[(128 - (n)) % 8])

//t1 calc when W is already calculated
#define t1(n) K[(n) % 64] + Vals[(135 - (n)) % 8] +  W[(n)] + s1(n) + ch(n) 

//t1 calc which calculates W
#define t1W(n) K[(n) % 64] + Vals[(135 - (n)) % 8] +  W(n) + s1(n) + ch(n)

//Used for constant W Values (the compiler optimizes out zeros)
#define t1C(n) (K[(n) % 64]+ ConstW[(n)]) + Vals[(135 - (n)) % 8] + s1(n) + ch(n)

//t2 Calc
#define t2(n)  maj(n) + s0(n)

#define rotC(x,n) (x<<n | x >> (32-n))

//W calculation used for SHA round
#define W(n) (W[n] = P4(n) + P3(n) + P2(n) + P1(n))


//Partial W calculations (used for the beginning where only some values are nonzero)
#define P1(n) ((rot(W[(n)-2],15u)^rot(W[(n)-2],13u)^((W[(n)-2])>>10U)))
#define P2(n) ((rot(W[(n)-15],25u)^rot(W[(n)-15],14u)^((W[(n)-15])>>3U)))
#define p1(x) ((rot(x,15u)^rot(x,13u)^((x)>>10U)))
#define p2(x) ((rot(x,25u)^rot(x,14u)^((x)>>3U)))
#define P3(n)  W[n-7]
#define P4(n)  W[n-16]

//Partial Calcs for constant W values
#define P1C(n) ((rotC(ConstW[(n)-2],15)^rotC(ConstW[(n)-2],13)^((ConstW[(n)-2])>>10U)))
#define P2C(n) ((rotC(ConstW[(n)-15],25)^rotC(ConstW[(n)-15],14)^((ConstW[(n)-15])>>3U)))
#define P3C(x)  ConstW[x-7]
#define P4C(x)  ConstW[x-16]

//SHA round with built in W calc
#define sharoundW(n) Barrier1(n);  Vals[(131 - (n)) % 8] += t1W(n); Vals[(135 - (n)) % 8] = t1W(n) + t2(n);  

//SHA round without W calc
#define sharound(n)  Barrier2(n); Vals[(131 - (n)) % 8] += t1(n); Vals[(135 - (n)) % 8] = t1(n) + t2(n);

//SHA round for constant W values
#define sharoundC(n)  Barrier2(n); Vals[(131 - (n)) % 8] += t1C(n); Vals[(135 - (n)) % 8] = t1C(n) + t2(n);

//The compiler is stupid... I put this in there only to stop the compiler from (de)optimizing the order
#define Barrier1(n) t1 = t1C((n+1))
#define Barrier2(n) t1 = t1C((n))

__kernel
//removed this to allow detection of invalid work size
//__attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))
void search(	const uint state0, const uint state1, const uint state2, const uint state3,
						const uint state4, const uint state5, const uint state6, const uint state7,
						const uint B1, const uint C1, const uint D1,
						const uint F1, const uint G1, const uint H1,
						const u base,
						const uint W16, const uint W17,
						const uint PreVal4, const uint PreVal0,
						const uint PreW31, const uint PreW32,
						const uint PreW19, const uint PreW20,
						__global uint * output)
{

	u W[124];
	u Vals[8];

//Dummy Variable to prevent compiler from reordering between rounds
	u t1;

	W[16] = W16;
	W[17] = W17;

#ifdef VECTORS8
	
//Modified from VECTORS4
	W[3] = base + (uint)(get_local_id(0)) * 8u + (uint)(get_group_id(0)) * (WORKSIZE * 8u);
	uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
	W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U, r ^ 0x8010000U, r ^ 0xA050000U, r ^ 0xC090000U, r ^ 0xE0D0000U};

#elif defined VECTORS4
	//Less dependencies to get both the local id and group id and then add them
	W[3] = base + (uint)(get_local_id(0)) * 4u + (uint)(get_group_id(0)) * (WORKSIZE * 4u);
	uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
	//Since only the 2 LSB is opposite between the nonces, we can save an instruction by flipping the 4 bits in W18 rather than the 1 bit in W3
	W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U};

#elif defined VECTORS
		W[3] = base + (uint)(get_local_id(0)) * 2u + (uint)(get_group_id(0)) * (WORKSIZE * 2u);
		uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
		W[18] = PreW20 + (u){r, r ^ 0x2004000U};

#else
		W[3] = base + get_local_id(0) + get_group_id(0) * (WORKSIZE);
		u r = rot(W[3],25u)^rot(W[3],14u)^((W[3])>>3U);
		W[18] = PreW20 + r;
#endif

	//the order of the W calcs and Rounds is like this because the compiler needs help finding how to order the instructions


	//Vals[0]=state0;
	Vals[0] = PreVal0 + W[3];
	Vals[1]=B1;
	Vals[2]=C1;
	Vals[3]=D1;
	//Vals[4]=PreVal4;
	Vals[4] = PreVal4 + W[3];
	Vals[5]=F1;
	Vals[6]=G1;
	Vals[7]=H1;

	sharoundC(4);
	W[19] = PreW19 + W[3];
	sharoundC(5);
	W[20] = P1(20) + P4C(20);
	sharoundC(6);
	W[21] = P1(21);
	sharoundC(7);
	W[22] = P1(22) + P3C(22);
	sharoundC(8);
	W[23] = W[16] + P1(23);
	sharoundC(9);
	W[24] = W[17] + P1(24);
	sharoundC(10);
	W[25] = P1(25) + P3(25);
	W[26] = P1(26) + P3(26);
	sharoundC(11);
	W[27] = P1(27) + P3(27);
	W[28] = P1(28) + P3(28);
	sharoundC(12);
	W[29] = P1(29) + P3(29);
	sharoundC(13);
	W[30] = P1(30) + P2C(30) + P3(30);
	W[31] = P1(31) + P3(31) + PreW31;
	sharoundC(14);
	W[32] = P1(32) + P3(32) + PreW32;
	sharoundC(15);
	sharound(16);
	sharound(17);
	sharound(18);
	sharound(19);
	sharound(20);
	sharound(21);
	sharound(22);
	sharound(23);
	sharound(24);
	sharound(25);
	sharound(26);
	sharound(27);
	sharound(28);
	sharound(29);
	sharound(30);
	sharound(31);
	sharound(32);
	sharoundW(33);
	sharoundW(34);
	sharoundW(35);
	sharoundW(36);
	sharoundW(37);
	sharoundW(38);
	sharoundW(39);
	sharoundW(40);
	sharoundW(41);
	sharoundW(42);
	sharoundW(43);
	sharoundW(44);
	sharoundW(45);
	sharoundW(46);
	sharoundW(47);
	sharoundW(48);
	sharoundW(49);
	sharoundW(50);
	sharoundW(51);
	sharoundW(52);
	sharoundW(53);
	sharoundW(54);
	sharoundW(55);
	sharoundW(56);
	sharoundW(57);
	sharoundW(58);
	sharoundW(59);
	sharoundW(60);
	sharoundW(61);
	sharoundW(62);
	sharoundW(63);

	W[64]=state0+Vals[0];
	W[65]=state1+Vals[1];
	W[66]=state2+Vals[2];
	W[67]=state3+Vals[3];
	W[68]=state4+Vals[4];
	W[69]=state5+Vals[5];
	W[70]=state6+Vals[6];
	W[71]=state7+Vals[7];

	Vals[0]=H[0];
	Vals[1]=H[1];
	Vals[2]=H[2];
//	Vals[3]=H[3];
	Vals[3] = 0xa54ff53aU + (0xb0edbdd0U + K[0]) +  W[64];
	Vals[4]=H[4];
	Vals[5]=H[5];
	Vals[6]=H[6];
//	Vals[7]=H[7];
	Vals[7] = 0x08909ae5U + (0xb0edbdd0U + K[0]) +  W[64];

//const u Temp = (0xb0edbdd0U + K[0]) +  W[64];
	
	

//#define P124(n) P1(n) + P2(n) + P4(n)

	W[80] = + P2(80) + P4(80);
	sharound(65);
	W[81] = P1C(81) + P2(81) + P4(81);
	sharound(66);
	W[82] = P1(82) + P2(82) + P4(82);
	sharound(67);
	W[83] = P1(83) + P2(83) + P4(83);
	sharound(68);
	W[84] = P1(84) + P2(84) + P4(84);
	sharound(69);
	W[85] = P1(85) + P2(85) + P4(85);
	sharound(70);
	W[86] = P1(86) + P2(86) + P3C(86) + P4(86);
	sharound(71);
	W[87] = P1(87) + P2C(87) + P3(87) + P4(87);
	sharoundC(72);
	W[88] =   P1(88) + P3(88) + P4C(88);
	sharoundC(73);
	W[89] = P1(89) + P3(89);
	sharoundC(74);
	W[90] = P1(90) + P3(90);
	sharoundC(75);
	W[91] = P1(91) + P3(91);
	sharoundC(76);
	W[92] = P1(92) + P3(92);
	sharoundC(77);
	W[93] = P1(93) + P3(93);
	W[94] = P1(94) + P2C(94) + P3(94);
	sharoundC(78);
	W[95] = P1(95) + P2(95) + P3(95) + P4C(95);
	sharoundC(79);
	sharound(80);
	sharound(81);
	sharound(82);
	sharound(83);
	sharound(84);
	sharound(85);
	sharound(86);
	sharound(87);
	sharound(88);
	sharound(89);
	sharound(90);
	sharound(91);
	sharound(92);
	sharound(93);
	sharound(94);
	sharound(95);
	sharoundW(96);
	sharoundW(97);
	sharoundW(98);
	sharoundW(99);
	sharoundW(100);
	sharoundW(101);
	sharoundW(102);
	sharoundW(103);
	sharoundW(104);
	sharoundW(105);
	sharoundW(106);
	sharoundW(107);
	sharoundW(108);
	sharoundW(109);
	sharoundW(110);
	sharoundW(111);
	sharoundW(112);
	sharoundW(113);
	sharoundW(114);
	sharoundW(115);
	sharoundW(116);
	sharoundW(117);
	sharoundW(118);
	sharoundW(119);
	sharoundW(120);
	sharoundW(121);
	sharoundW(122);

	u v = W[117] + W[108] + Vals[3] + Vals[7] + P2(124) + P1(124) + Ch((Vals[0] + Vals[4]) + (K[59] + W(123)) + s1(123)+ ch(123),Vals[1],Vals[2]); 
	u g = -(K[60] + H[7]) - S1((Vals[0] + Vals[4]) + (K[59] + W(123))  + s1(123)+ ch(123));

	uint nonce = 0;

#ifdef VECTORS8
	if (v.s0 == g.s0)
	{
		nonce = W[3].s0;
	}
	if (v.s1 == g.s1)
	{
		nonce = W[3].s1;
	}
	if (v.s2 == g.s2)
	{
		nonce = W[3].s2;
	}
	if (v.s3 == g.s3)
	{
		nonce = W[3].s3;
	}
	if (v.s4 == g.s4)
	{
		nonce = W[3].s4;
	}
	if (v.s5 == g.s5)
	{
		nonce = W[3].s5;
	}
	if (v.s6 == g.s6)
	{
		nonce = W[3].s6;
	}
	if (v.s7 == g.s7)
	{
		nonce = W[3].s7;
	}
#elif defined VECTORS4
	if (v.s0 == g.s0)
	{
		nonce = W[3].s0;
	}
	if (v.s1 == g.s1)
	{
		nonce = W[3].s1;
	}
	if (v.s2 == g.s2)
	{
		nonce = W[3].s2;
	}
	if (v.s3 == g.s3)
	{
		nonce = W[3].s3;
	}
#elif defined VECTORS
		if (v.s0 == g.s0)
		{
			nonce = W[3].s0;
		}
		if (v.s1 == g.s1)
		{
			nonce = W[3].s1;
		}
	#else
		if (v == g)
		{
			nonce = W[3];
		}
#endif
	if(nonce)
	{
		//Faster to shift the nonce by 2 due to 4-DWORD addressing and does not add more collisions
		output[WORKSIZE] = nonce;
		output[get_local_id(0)] = nonce;
	}
}
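Separately from the register-spill question, the per-lane XOR constants used for W[18] can be cross-checked with a short Python script. This is a sketch under one assumption, the same one the VECTORS4 comment relies on: the nonce in lane 0 has its low bits clear, so lane k differs from lane 0 only in those bits. Because the rotate/shift expression used for `r` is linear over XOR, the mask for lane k is that same expression applied to k.

```python
MASK32 = 0xFFFFFFFF

def rotl(x, n):
    """32-bit left rotate, matching OpenCL rotate() on uint."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def sigma0(x):
    """Same expression the kernel uses for r: rot(x,25) ^ rot(x,14) ^ (x >> 3)."""
    return rotl(x, 25) ^ rotl(x, 14) ^ (x >> 3)

def lane_masks(lanes):
    """XOR mask that turns lane 0's r into lane k's r, for k = 0..lanes-1."""
    return [sigma0(k) for k in range(lanes)]

masks8 = lane_masks(8)
for k, m in enumerate(masks8):
    print("lane %d: r ^ 0x%07X" % (k, m))

# Spot check against direct evaluation for an aligned example nonce.
base = 0x12345678  # low three bits are zero
assert all(sigma0(base + k) == sigma0(base) ^ masks8[k] for k in range(8))
```

By this calculation lanes 0 to 4 match the constants in the posted code, while lanes 5, 6 and 7 come out as 0xA014000, 0xC018000 and 0xE01C000, so the last three values in the VECTORS8 initializer may be worth double-checking.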

Barlog

member

Activity: 65

Merit: 10

You will need a new address for donations, aka 2PHoenix.

HendrikJan

member

Activity: 64

Merit: 10

I did try the diakgcn kernel.
Still about 10 MH/s less than the older version.

I would also like to know what the "Rolling time" is.

bulanula

hero member

Activity: 518

Merit: 500

Quote from: lodcrappo on February 06, 2012, 03:40:02 PM

Quote from: ZPK on February 06, 2012, 05:38:38 AM

please add to phoenix overclocking support...

please no. don't turn phoenix into the monstrosity of some other miners. its simplicity is its beauty.

do one thing, and do it well.

when you throw everything possible into one program, you end up with too many compromises.

I 100% agree and support this decision. Remember the KISS principle. If you need OC stick to AMD API thing.

lodcrappo

hero member

Activity: 616

Merit: 506

Quote from: ZPK on February 06, 2012, 05:38:38 AM

please add to phoenix overclocking support...

please no. don't turn phoenix into the monstrosity of some other miners. its simplicity is its beauty.

do one thing, and do it well.

when you throw everything possible into one program, you end up with too many compromises.

Topic: Phoenix 2 beta discussion - page 6. (Read 58097 times)