Topic: DiabloMiner GPU Miner - page 40. (Read 866674 times)

member
Activity: 70
Merit: 10
September 08, 2011, 04:27:17 AM
Hi Diablo,
I have a Radeon 6670 running on Win7 64-bit, and after I upgraded Catalyst to the latest version (11.8, from 11.6) the reported hardware errors are between 20% and 25% and CPU usage is 40% to 50% (it was under 10% before).

Any idea what I should do or where I should look for more information?

Thanks.
Edit: I forgot to mention that I use the default configuration (I launched the .exe with only the -o, -r, -u and -p options).
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
September 04, 2011, 07:40:53 AM
Phateus posted this graph:

does that look like 316 is the fastest? no, I'm pretty sure 410 is faster (vectors 2, worksize 256) right after the dip in speeds

and it obviously doesn't matter that much whether you're running 300ish or 400ish clocks according to the graph

Huh, I wonder what hes using for vectors, I assume he means uint4 = V4, etc. That graph is very interesting, it highlights the register spillover problem in the phatk design quite nicely.

I also wonder what card that is.

5870 overclocked, and v4 is indeed uint4

Those numbers might not be entirely valid then. (Some?) 1200 MHz cards do not seem to have the same timing as 1000 MHz cards, so 1/4th might work better. On my 5850, the peak seems to be around 1/3rd instead, and on some 5870s, from what I've heard, it's still 1/3rd.
hero member
Activity: 658
Merit: 500
September 04, 2011, 07:20:43 AM
Phateus posted this graph:

does that look like 316 is the fastest? no, I'm pretty sure 410 is faster (vectors 2, worksize 256) right after the dip in speeds

and it obviously doesn't matter that much whether you're running 300ish or 400ish clocks according to the graph

Huh, I wonder what hes using for vectors, I assume he means uint4 = V4, etc. That graph is very interesting, it highlights the register spillover problem in the phatk design quite nicely.

I also wonder what card that is.

5870 overclocked, and v4 is indeed uint4
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
September 02, 2011, 11:50:19 AM
Phateus posted this graph:

does that look like 316 is the fastest? no, I'm pretty sure 410 is faster (vectors 2, worksize 256) right after the dip in speeds

and it obviously doesn't matter that much whether you're running 300ish or 400ish clocks according to the graph

Huh, I wonder what he's using for vectors; I assume he means uint4 = V4, etc. That graph is very interesting; it highlights the register spillover problem in the phatk design quite nicely.

I also wonder what card that is.
hero member
Activity: 658
Merit: 500
September 02, 2011, 10:06:41 AM
Phateus posted this graph:

does that look like 316 is the fastest? no, I'm pretty sure 410 is faster (vectors 2, worksize 256) right after the dip in speeds

and it obviously doesn't matter that much whether you're running 300ish or 400ish clocks according to the graph
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
August 27, 2011, 12:09:59 AM
Update: Make kernel arrays an option, default to off, use -a to turn on.

This should help users who saw a speed decrease after the introduction of phatk-like arrays, such as OSX, Nvidia, and SDK 2.1 users.
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
August 24, 2011, 06:20:53 PM
As I just said to iopq, some kernels require more than others. Try 1/3rd, it will probably bring back your missing hashes.

Diablo,

with cores 860,287 and 980,327 and -v 18 I now get 795-797 so it is 3-4 MHs faster than the previous version.

Thanks a lot.

spiccioli

btw, what makes a kernel depend upon memory speed?

I thought that there is no (or very very little) video memory use in hashing the bitcoin chain and, up until now, I always did lower memory clock as much I could to lower energy consumption.
Well, it's not such a bad idea to keep the algorithm in memory to unroll back to the GPUs once the unroll is used up.  It should be faster than referring to the system memory.  Of course, if it could be unrolled straight from the GPU back to the GPU once the unrolls nearly reach their end (1 unroll away), it would be a lot better.  But that would involve holding the entire code in a register or so and somehow converting it which doesn't seem all that possible.

The program (the kernel) is kept loaded in graphics memory, but the compute units dump the program when they switch to something else (EVERYTHING is a program, even rendering boring 2D desktop shit).

Radeons have multiple levels of graphics memory; the memory clock just controls the actual GDDR5 RAM chips (i.e., the "lowest" level as far as OpenCL is concerned). Kernel arguments and constants are stored in constant RAM (which for all intents and purposes is as fast as registers), and then there's scratch RAM that belongs to the CU, which can be used to backfill register overflow (it isn't controlled by the memory clock, but seems to synchronize timings in some way). There are also multiple levels of caches for the CU and the texture processing units.
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
August 24, 2011, 06:12:29 PM
As I just said to iopq, some kernels require more than others. Try 1/3rd, it will probably bring back your missing hashes.

Diablo,

with cores 860,287 and 980,327 and -v 18 I now get 795-797 so it is 3-4 MHs faster than the previous version.

Thanks a lot.

spiccioli

btw, what makes a kernel depend upon memory speed?

I thought that there is no (or very very little) video memory use in hashing the bitcoin chain and, up until now, I always did lower memory clock as much I could to lower energy consumption.

You have a limited number of registers, and the drivers build programs that swap unused registers in and out as needed. My kernel uses far fewer registers than phatk. It also nails memory timing harder, but it goes faster as a result, since fewer registers get swapped out.
sr. member
Activity: 378
Merit: 250
August 24, 2011, 05:43:45 PM
As I just said to iopq, some kernels require more than others. Try 1/3rd, it will probably bring back your missing hashes.

Diablo,

with cores 860,287 and 980,327 and -v 18 I now get 795-797 so it is 3-4 MHs faster than the previous version.

Thanks a lot.

spiccioli

btw, what makes a kernel depend upon memory speed?

I thought that there is no (or very very little) video memory use in hashing the bitcoin chain and, up until now, I always did lower memory clock as much I could to lower energy consumption.
Well, it's not such a bad idea to keep the algorithm in memory to unroll back to the GPUs once the unroll is used up.  It should be faster than referring to the system memory.  Of course, if it could be unrolled straight from the GPU back to the GPU once the unrolls nearly reach their end (1 unroll away), it would be a lot better.  But that would involve holding the entire code in a register or so and somehow converting it which doesn't seem all that possible.
legendary
Activity: 1379
Merit: 1003
nec sine labore
August 24, 2011, 04:34:24 PM
As I just said to iopq, some kernels require more than others. Try 1/3rd, it will probably bring back your missing hashes.

Diablo,

with cores 860,287 and 980,327 and -v 18 I now get 795-797 so it is 3-4 MHs faster than the previous version.

Thanks a lot.

spiccioli

btw, what makes a kernel depend upon memory speed?

I thought that there is no (or very, very little) video memory use in hashing the bitcoin chain and, up until now, I have always lowered the memory clock as much as I could to reduce energy consumption.
hero member
Activity: 658
Merit: 500
August 24, 2011, 04:02:22 PM
No, it isn't a guideline. 1/3rd core clock for memory clock sits in a zone that on most Radeon 5xxxes it hits the stock memory timings correctly and incurs no speed loss for applications that don't rely on memory bandwidth.

If you're too low or too high, you incur a speed loss or sometimes the card just locks up.

Some kernels require better compliance with this than others.

except it is a guideline, because my 5750 is not stable with memory at 233 MHz
my 5850 card is faster at slightly more than 1/3: its core clock is 725, and 275 is faster than both 242 and 300

you can blame the kernel, but phatk 2.2 is the fastest kernel on both cards and those timings are the fastest timings in practice
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
August 24, 2011, 02:57:26 PM
It seems my response got eaten.

this is just a guideline
the fastest for me at 700 clock speed is 204 memory speed (205 gives artifacts, 210 hangs computer) with a 5750
on a 5850 the fastest with 725 clock speed is 275 memory speed, with 250 and 300 being slower

No, it isn't a guideline. On most Radeon 5xxx cards, a memory clock of 1/3rd the core clock sits in a zone that hits the stock memory timings correctly and incurs no speed loss for applications that don't rely on memory bandwidth.

If you're too low or too high, you incur a speed loss or sometimes the card just locks up.

Some kernels require better compliance with this than others.

They are the lowest ones that don't slow down mining.

spiccioli.

As I just said to iopq, some kernels require more than others. Try 1/3rd, it will probably bring back your missing hashes.
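
A minimal sketch of the 1/3rd rule argued for above, in Python. The function name is made up for illustration and this is not part of DiabloMiner. It only computes a starting point; as the replies in this thread show, the fastest stable memory clock on a given card may sit somewhat above or below it.

```python
def third_of_core(core_mhz):
    """Return the memory clock (MHz) nearest to 1/3rd of the core clock."""
    return round(core_mhz / 3)


# Core clocks mentioned in this thread.
for core in (700, 725, 860, 980):
    print(core, third_of_core(core))
```

For 860 and 980 this gives 287 and 327, the memory clocks spiccioli reports trying. For 700 and 725 it gives 233 and 242, the values iopq reports as unstable or slower on his cards.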
legendary
Activity: 1379
Merit: 1003
nec sine labore
August 24, 2011, 11:52:06 AM
DiabloD3,

I've downloaded latest DiabloMiner, sadly it went from 793-795 MHs on a dual GPU (5850/5870) rig (860,260/980,280 as clocks) to 445-448 MHs!!

I start it with -v 19 -l url

best regards.

spiccioli


Try -v 2 or -v 18.


DiabloD3,

it is still slower

-v 18:  790/792 MHs
-v 2  :  789/792 MHs

best regards.

spiccioli


Btw, why are your memory clocks wrong? They should be 1/3rd of your core clock's speed.

They are the lowest ones that don't slow down mining.

spiccioli.
hero member
Activity: 658
Merit: 500
August 24, 2011, 11:37:18 AM
DiabloD3,

I've downloaded latest DiabloMiner, sadly it went from 793-795 MHs on a dual GPU (5850/5870) rig (860,260/980,280 as clocks) to 445-448 MHs!!

I start it with -v 19 -l url

best regards.

spiccioli


Try -v 2 or -v 18.


DiabloD3,

it is still slower

-v 18:  790/792 MHs
-v 2  :  789/792 MHs

best regards.

spiccioli


Btw, why are your memory clocks wrong? They should be 1/3rd of your core clock's speed.
this is just a guideline
the fastest for me at 700 clock speed is 204 memory speed (205 gives artifacts, 210 hangs computer) with a 5750
on a 5850 the fastest with 725 clock speed is 275 memory speed, with 250 and 300 being slower
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
August 24, 2011, 10:26:36 AM
DiabloD3,

I've downloaded latest DiabloMiner, sadly it went from 793-795 MHs on a dual GPU (5850/5870) rig (860,260/980,280 as clocks) to 445-448 MHs!!

I start it with -v 19 -l url

best regards.

spiccioli


Try -v 2 or -v 18.


DiabloD3,

it is still slower

-v 18:  790/792 MHs
-v 2  :  789/792 MHs

best regards.

spiccioli


Btw, why are your memory clocks wrong? They should be 1/3rd of your core clock's speed.
legendary
Activity: 1379
Merit: 1003
nec sine labore
August 24, 2011, 03:04:48 AM
DiabloD3,

I've downloaded latest DiabloMiner, sadly it went from 793-795 MHs on a dual GPU (5850/5870) rig (860,260/980,280 as clocks) to 445-448 MHs!!

I start it with -v 19 -l url

best regards.

spiccioli


Try -v 2 or -v 18.


DiabloD3,

it is still slower

-v 18:  790/792 MHs
-v 2  :  789/792 MHs

best regards.

spiccioli
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
August 23, 2011, 11:27:48 PM
DiabloD3,

I've downloaded latest DiabloMiner, sadly it went from 793-795 MHs on a dual GPU (5850/5870) rig (860,260/980,280 as clocks) to 445-448 MHs!!

I start it with -v 19 -l url

best regards.

spiccioli


Try -v 2 or -v 18.
legendary
Activity: 1379
Merit: 1003
nec sine labore
August 23, 2011, 09:40:22 AM
DiabloD3,

I've downloaded the latest DiabloMiner; sadly, it went from 793-795 MHs on a dual GPU (5850/5870) rig (860,260/980,280 as clocks) to 445-448 MHs!!

I start it with -v 19 -l url

best regards.

spiccioli
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
August 22, 2011, 02:12:18 AM
OSX user here, new version is working steadily for 15 mins now, no more running wild behavior. I will keep it going for a day and see what happens, but I believe you fixed it, thanks! Do you use a variant of your original kernel again, or is this still phatk?

BTW I am still using Snow Leopard since I don't believe in 1.0 OS versions. Can anyone confirm the miner working on Lion?

It never really was pure phatk to begin with. I tried to phatk-arize the existing kernel, but it ended up causing more work and problems than it was worth. It is still phatk-ized, but in a way that properly uses the technique (something phateus himself doesn't do yet).
What technique is that?  Oh!  And hey, what do you think of compressing the initial tables by utilizing the difference between the current and previous values in the table instead of the actual values?  I know that it might add a slight overhead on the addition/subtraction, but it would decrease the amount of memory that would need to be read before running the calculations and might allow for another unroll or two.

The phatk technique is... to use an array. That's it.

Those arguments are loaded into constant memory, which is as fast as registers but is workgroup wide. You can't win that way.
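
To make the suggestion being answered here concrete, this is a rough Python sketch of difference (delta) encoding a table of 32-bit constants. It assumes the "initial tables" are something like the SHA-256 round constants (only the first four are shown as sample data), and the function names are made up for illustration; nothing here is taken from the DiabloMiner kernel.

```python
MASK32 = 0xFFFFFFFF

# First four SHA-256 round constants, used here only as sample data.
K = [0x428A2F98, 0x71374491, 0xB5C0FBCF, 0xE9B5DBA5]


def delta_encode(table):
    """Keep the first value, then store each value minus its predecessor (mod 2**32)."""
    out = [table[0]]
    for prev, cur in zip(table, table[1:]):
        out.append((cur - prev) & MASK32)
    return out


def delta_decode(deltas):
    """Rebuild the original table with a running sum (mod 2**32)."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append((out[-1] + d) & MASK32)
    return out


encoded = delta_encode(K)
assert delta_decode(encoded) == K  # round trip recovers the table
print([hex(d) for d in encoded])
```

In this sample the differences still need a full 32-bit word each, so the encoded table is no smaller, and decoding costs one extra addition per entry.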
member
Activity: 78
Merit: 10
August 21, 2011, 02:29:03 AM
Update: Increase speed 1.3% on SDK 2.1 and 0.2% on SDK 2.5, use Deque instead of AtomicReference for incoming new work

OSX users: Test to see if the new kernel fixes things or makes it faster.
Past versions have performed slightly better Mhash-wise. I actually saw a decrease in speed with this version, so I will probably use an older version since I never had the problems others have had with the older versions.