DiabloMiner GPU Miner - page 40. | Bitcointalksearch.org

TheMalon

member

Activity: 70

Merit: 10

Hi Diablo,
I have a Radeon 6670 running on Win7 64-bit, and after I upgraded Catalyst to the latest version (11.8, from 11.6) the reported hardware errors are between 20% and 25% and CPU usage is 40% to 50% (before, it was under 10%).

Any idea what I should do or where I should look for some info?

Thanks.
Edit: forgot to mention that I use the default configuration (launched the .exe with only the -o, -r, -u and -p options).

DiabloD3

legendary

Activity: 1162

Merit: 1000

DiabloMiner author

Quote from: iopq on September 04, 2011, 07:20:43 AM

Quote from: DiabloD3 on September 02, 2011, 11:50:19 AM

Quote from: iopq on September 02, 2011, 10:06:41 AM

Phateus posted this graph:

does that look like 316 is the fastest? no, I'm pretty sure 410 is faster (vectors 2, worksize 256) right after the dip in speeds

and it obviously doesn't matter that much whether you're running 300ish or 400ish clocks according to the graph

Huh, I wonder what he's using for vectors, I assume he means uint4 = V4, etc. That graph is very interesting, it highlights the register spillover problem in the phatk design quite nicely.

I also wonder what card that is.

5870 overclocked, and v4 is indeed uint4

Those numbers might not be entirely valid then. (Some?) 1200 MHz cards do not seem to have the same timing as 1000 MHz cards, so 1/4th might work better. On my 5850, the peak seems to be around 1/3rd instead, and on some 5870s, from what I've heard, it's still 1/3rd.

iopq

hero member

Activity: 658

Merit: 500

Quote from: DiabloD3 on September 02, 2011, 11:50:19 AM

Quote from: iopq on September 02, 2011, 10:06:41 AM

Phateus posted this graph:

does that look like 316 is the fastest? no, I'm pretty sure 410 is faster (vectors 2, worksize 256) right after the dip in speeds

and it obviously doesn't matter that much whether you're running 300ish or 400ish clocks according to the graph

Huh, I wonder what he's using for vectors, I assume he means uint4 = V4, etc. That graph is very interesting, it highlights the register spillover problem in the phatk design quite nicely.

I also wonder what card that is.

5870 overclocked, and v4 is indeed uint4

DiabloD3

legendary

Activity: 1162

Merit: 1000

DiabloMiner author

Quote from: iopq on September 02, 2011, 10:06:41 AM

Phateus posted this graph:

does that look like 316 is the fastest? no, I'm pretty sure 410 is faster (vectors 2, worksize 256) right after the dip in speeds

and it obviously doesn't matter that much whether you're running 300ish or 400ish clocks according to the graph

Huh, I wonder what he's using for vectors, I assume he means uint4 = V4, etc. That graph is very interesting, it highlights the register spillover problem in the phatk design quite nicely.

I also wonder what card that is.

iopq

hero member

Activity: 658

Merit: 500

Phateus posted this graph:

does that look like 316 is the fastest? no, I'm pretty sure 410 is faster (vectors 2, worksize 256) right after the dip in speeds

and it obviously doesn't matter that much whether you're running 300ish or 400ish clocks according to the graph

DiabloD3

legendary

Activity: 1162

Merit: 1000

DiabloMiner author

Update: Make kernel arrays an option, default to off, use -a to turn on.

This should help users who saw a speed decrease after phatk-like arrays were introduced, such as OSX, Nvidia, and SDK 2.1 users.

DiabloD3

legendary

Activity: 1162

Merit: 1000

DiabloMiner author

Quote from: d3m0n1q_733rz on August 24, 2011, 05:43:45 PM

Quote from: spiccioli on August 24, 2011, 04:34:24 PM

Quote from: DiabloD3 on August 24, 2011, 02:57:26 PM

As I just said to iopq, some kernels require more than others. Try 1/3rd, it will probably bring back your missing hashes.

Diablo,

with cores 860,287 and 980,327 and -v 18 I now get 795-797 so it is 3-4 MHs faster than the previous version.

Thanks a lot.

spiccioli

btw, what makes a kernel depend upon memory speed?

I thought that there is no (or very very little) video memory use in hashing the bitcoin chain and, up until now, I always did lower memory clock as much as I could to lower energy consumption.

Well, it's not such a bad idea to keep the algorithm in memory to unroll back to the GPUs once the unroll is used up. It should be faster than referring to the system memory. Of course, if it could be unrolled straight from the GPU back to the GPU once the unrolls nearly reach their end (1 unroll away), it would be a lot better. But that would involve holding the entire code in a register or so and somehow converting it which doesn't seem all that possible.

The program (the kernel) is kept loaded in graphics memory, but the compute units dump the program when they switch to something else (EVERYTHING is a program, even rendering boring 2D desktop shit).

Radeons have multiple levels of graphics memory; the memory clock just controls the actual GDDR5 RAM chips (i.e., the "lowest" level as far as OpenCL is concerned). Kernel arguments and constants are stored in constant RAM (which for all intents and purposes is as fast as registers), and then there's scratch RAM that belongs to the CU, which can be used to backfill register overflow (which isn't controlled by the memory clock, but seems to synchronize timings in some way). There are also multiple levels of caches for the CU and the texture processing units.

DiabloD3

legendary

Activity: 1162

Merit: 1000

DiabloMiner author

Quote from: spiccioli on August 24, 2011, 04:34:24 PM

Quote from: DiabloD3 on August 24, 2011, 02:57:26 PM

As I just said to iopq, some kernels require more than others. Try 1/3rd, it will probably bring back your missing hashes.

Diablo,

with cores 860,287 and 980,327 and -v 18 I now get 795-797 so it is 3-4 MHs faster than the previous version.

Thanks a lot.

spiccioli

btw, what makes a kernel depend upon memory speed?

I thought that there is no (or very very little) video memory use in hashing the bitcoin chain and, up until now, I always did lower memory clock as much as I could to lower energy consumption.

You have a limited number of registers, and the drivers build programs that swap unused registers in and out as needed. My kernel uses far fewer registers than phatk; it also nails memory timing harder, but it goes faster as a result since fewer registers get swapped out.

d3m0n1q_733rz

sr. member

Activity: 378

Merit: 250

Quote from: spiccioli on August 24, 2011, 04:34:24 PM

Quote from: DiabloD3 on August 24, 2011, 02:57:26 PM

As I just said to iopq, some kernels require more than others. Try 1/3rd, it will probably bring back your missing hashes.

Diablo,

with cores 860,287 and 980,327 and -v 18 I now get 795-797 so it is 3-4 MHs faster than the previous version.

Thanks a lot.

spiccioli

btw, what makes a kernel depend upon memory speed?

I thought that there is no (or very very little) video memory use in hashing the bitcoin chain and, up until now, I always did lower memory clock as much as I could to lower energy consumption.

Well, it's not such a bad idea to keep the algorithm in memory to unroll back to the GPUs once the unroll is used up. It should be faster than referring to the system memory. Of course, if it could be unrolled straight from the GPU back to the GPU once the unrolls nearly reach their end (1 unroll away), it would be a lot better. But that would involve holding the entire code in a register or so and somehow converting it which doesn't seem all that possible.

spiccioli

legendary

Activity: 1379

Merit: 1003

nec sine labore

Quote from: DiabloD3 on August 24, 2011, 02:57:26 PM

As I just said to iopq, some kernels require more than others. Try 1/3rd, it will probably bring back your missing hashes.

Diablo,

with cores 860,287 and 980,327 and -v 18 I now get 795-797 so it is 3-4 MHs faster than the previous version.

Thanks a lot.

spiccioli

btw, what makes a kernel depend upon memory speed?

I thought that there is no (or very very little) video memory use in hashing the bitcoin chain and, up until now, I always did lower memory clock as much as I could to lower energy consumption.

iopq

hero member

Activity: 658

Merit: 500

Quote from: DiabloD3 on August 24, 2011, 02:57:26 PM

No, it isn't a guideline. 1/3rd core clock for memory clock sits in a zone where, on most Radeon 5xxx cards, it hits the stock memory timings correctly and incurs no speed loss for applications that don't rely on memory bandwidth.

If you're too low or too high, you incur a speed loss or sometimes the card just locks up.

Some kernels require better compliance with this than others.

except it is a guideline, because my 5750 is not stable with memory at 233 MHz
my 5850 card is faster at slightly more than 1/3: its core clock is 725 and 275 is faster than both 242 and 300

you can blame the kernel, but phatk 2.2 is the fastest kernel on both cards and those timings are the fastest timings in practice

DiabloD3

legendary

Activity: 1162

Merit: 1000

DiabloMiner author

It seems my response got eaten.

Quote from: iopq on August 24, 2011, 11:37:18 AM

this is just a guideline
the fastest for me at 700 clock speed is 204 memory speed (205 gives artifacts, 210 hangs computer) with a 5750
on a 5850 the fastest with 725 clock speed is 275 memory speed, with 250 and 300 being slower

No, it isn't a guideline. 1/3rd core clock for memory clock sits in a zone where, on most Radeon 5xxx cards, it hits the stock memory timings correctly and incurs no speed loss for applications that don't rely on memory bandwidth.

If you're too low or too high, you incur a speed loss or sometimes the card just locks up.

Some kernels require better compliance with this than others.

Quote from: spiccioli on August 24, 2011, 11:52:06 AM

They are the lowest ones that don't slow down mining.

spiccioli.

As I just said to iopq, some kernels require more than others. Try 1/3rd, it will probably bring back your missing hashes.
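
For anyone who wants to check their own clocks against the 1/3rd figure discussed above, here is a minimal sketch. It only does the arithmetic: it computes one third of the core clock and compares it with the (core, memory) pairs people reported in this thread. The function name `third_of_core` and the labels are made up for illustration, and the order of the two GPUs on spiccioli's rig is assumed from the order he listed them; nothing here talks to a card or to DiabloMiner.

```python
def third_of_core(core_mhz):
    """Memory clock at one third of the core clock, rounded to the nearest MHz."""
    return round(core_mhz / 3)


# (core MHz, memory MHz) pairs quoted by posters in this thread.
# Labels are illustrative only; GPU order on the dual rig is an assumption.
reported = {
    "spiccioli GPU 1": (860, 287),
    "spiccioli GPU 2": (980, 327),
    "iopq 5750": (700, 204),
    "iopq 5850": (725, 275),
}

for name, (core, mem) in reported.items():
    target = third_of_core(core)
    ratio = mem / core
    print(f"{name}: core {core}, memory {mem}, 1/3 target {target}, ratio {ratio:.3f}")
```

The two spiccioli pairs land exactly on the rounded 1/3rd value, while iopq's fastest settings sit below it on the 5750 and above it on the 5850, which is the disagreement the posts above are about.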

spiccioli

legendary

Activity: 1379

Merit: 1003

nec sine labore

Quote from: DiabloD3 on August 24, 2011, 10:26:36 AM

Quote from: spiccioli on August 24, 2011, 03:04:48 AM

Quote from: DiabloD3 on August 23, 2011, 11:27:48 PM

Quote from: spiccioli on August 23, 2011, 09:40:22 AM

DiabloD3,

I've downloaded latest DiabloMiner, sadly it went from 793-795 MHs on a dual GPU (5850/5870) rig (860,260/980,280 as clocks) to 445-448 MHs!!

I start it with -v 19 -l url

best regards.

spiccioli

Try -v 2 or -v 18.

DiabloD3,

it is still slower

-v 18: 790/792 MHs
-v 2 : 789/792 MHs

best regards.

spiccioli

Btw, why are your memory clocks wrong? They should be 1/3rd of your core clock's speed.

They are the lowest ones that don't slow down mining.

spiccioli.

iopq

hero member

Activity: 658

Merit: 500

Quote from: DiabloD3 on August 24, 2011, 10:26:36 AM

Quote from: spiccioli on August 24, 2011, 03:04:48 AM

Quote from: DiabloD3 on August 23, 2011, 11:27:48 PM

Quote from: spiccioli on August 23, 2011, 09:40:22 AM

DiabloD3,

I've downloaded latest DiabloMiner, sadly it went from 793-795 MHs on a dual GPU (5850/5870) rig (860,260/980,280 as clocks) to 445-448 MHs!!

I start it with -v 19 -l url

best regards.

spiccioli

Try -v 2 or -v 18.

DiabloD3,

it is still slower

-v 18: 790/792 MHs
-v 2 : 789/792 MHs

best regards.

spiccioli

Btw, why are your memory clocks wrong? They should be 1/3rd of your core clock's speed.

this is just a guideline
the fastest for me at 700 clock speed is 204 memory speed (205 gives artifacts, 210 hangs computer) with a 5750
on a 5850 the fastest with 725 clock speed is 275 memory speed, with 250 and 300 being slower

DiabloD3

legendary

Activity: 1162

Merit: 1000

DiabloMiner author

Quote from: spiccioli on August 24, 2011, 03:04:48 AM

Quote from: DiabloD3 on August 23, 2011, 11:27:48 PM

Quote from: spiccioli on August 23, 2011, 09:40:22 AM

DiabloD3,

I've downloaded latest DiabloMiner, sadly it went from 793-795 MHs on a dual GPU (5850/5870) rig (860,260/980,280 as clocks) to 445-448 MHs!!

I start it with -v 19 -l url

best regards.

spiccioli

Try -v 2 or -v 18.

DiabloD3,

it is still slower

-v 18: 790/792 MHs
-v 2 : 789/792 MHs

best regards.

spiccioli

Btw, why are your memory clocks wrong? They should be 1/3rd of your core clock's speed.

spiccioli

legendary

Activity: 1379

Merit: 1003

nec sine labore

Quote from: DiabloD3 on August 23, 2011, 11:27:48 PM

Quote from: spiccioli on August 23, 2011, 09:40:22 AM

DiabloD3,

I've downloaded latest DiabloMiner, sadly it went from 793-795 MHs on a dual GPU (5850/5870) rig (860,260/980,280 as clocks) to 445-448 MHs!!

I start it with -v 19 -l url

best regards.

spiccioli

Try -v 2 or -v 18.

DiabloD3,

it is still slower

-v 18: 790/792 MHs
-v 2 : 789/792 MHs

best regards.

spiccioli

DiabloD3

legendary

Activity: 1162

Merit: 1000

DiabloMiner author

Quote from: spiccioli on August 23, 2011, 09:40:22 AM

DiabloD3,

I've downloaded latest DiabloMiner, sadly it went from 793-795 MHs on a dual GPU (5850/5870) rig (860,260/980,280 as clocks) to 445-448 MHs!!

I start it with -v 19 -l url

best regards.

spiccioli

Try -v 2 or -v 18.

spiccioli

legendary

Activity: 1379

Merit: 1003

nec sine labore

DiabloD3,

I've downloaded latest DiabloMiner, sadly it went from 793-795 MHs on a dual GPU (5850/5870) rig (860,260/980,280 as clocks) to 445-448 MHs!!

I start it with -v 19 -l url

best regards.

spiccioli

DiabloD3

legendary

Activity: 1162

Merit: 1000

DiabloMiner author

Quote from: d3m0n1q_733rz on August 21, 2011, 02:05:07 AM

Quote from: DiabloD3 on August 20, 2011, 06:01:13 AM

Quote from: kaosbit on August 20, 2011, 03:44:32 AM

OSX user here, new version is working steadily for 15 mins now, no more running wild behavior. I will keep it going for a day and see what happens, but I believe you fixed it, thanks! Do you use a variant of your original kernel again, or is this still phatk?

BTW I am still using Snow Leopard since I don't believe in 1.0 OS versions. Can anyone confirm the miner working on Lion?

It never really was pure phatk to begin with. I tried to phatk-arize the existing kernel, but it ended up causing more work and problems than it was worth. It is still phatk-ized, but in a way that properly uses the technique (something phateus himself doesn't do yet).

What technique is that? Oh! And hey, what do you think of compressing the initial tables by utilizing the difference between the current and previous values in the table instead of the actual values? I know that it might add a slight overhead on the addition/subtraction, but it would decrease the amount of memory that would need to be read before running the calculations and might allow for another unroll or two.

The phatk technique is... to use an array. That's it.

Those arguments are loaded into constant memory, which is as fast as registers but is workgroup wide. You can't win that way.
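
To make the delta idea from the quoted question concrete, here is a rough sketch of what storing differences instead of values would look like. It assumes the "initial tables" are 32-bit round constants like the first four SHA-256 K values; the post does not say which table is meant, so that is a guess. The names `delta_encode` and `delta_decode` are hypothetical and this is not how the DiabloMiner kernel works.

```python
MASK32 = 0xFFFFFFFF  # keep all arithmetic in 32-bit unsigned range


def delta_encode(table):
    """Replace each entry with its difference from the previous entry (mod 2**32)."""
    prev = 0
    deltas = []
    for value in table:
        deltas.append((value - prev) & MASK32)
        prev = value
    return deltas


def delta_decode(deltas):
    """Rebuild the original table with a running sum (mod 2**32)."""
    acc = 0
    table = []
    for delta in deltas:
        acc = (acc + delta) & MASK32
        table.append(acc)
    return table


# First four SHA-256 round constants, used here only as sample data.
K_HEAD = [0x428A2F98, 0x71374491, 0xB5C0FBCF, 0xE9B5DBA5]

deltas = delta_encode(K_HEAD)
print([hex(d) for d in deltas])
print([d.bit_length() for d in deltas])
```

On this sample the differences are still around 30 bits wide each, so each entry would take a full 32-bit word either way, and decoding adds one addition per entry. That is a separate observation from the constant-memory point made in the reply above.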

Druas

member

Activity: 78

Merit: 10

Quote from: DiabloD3 on August 19, 2011, 08:28:30 PM

Update: Increase speed 1.3% on SDK 2.1 and 0.2% on SDK 2.5, use Deque instead of AtomicReference for incoming new work

OSX users: Test to see if the new kernel fixes things or makes it faster.

Past versions have performed slightly better Mhash-wise. I actually saw a decrease in speed with this version, so I will probably use an older version since I never had the problems others have had with the older versions.
