
Topic: OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.0 - page 847. (Read 5805677 times)

-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
@Diapolo. The HW errors are to do with loading the kernel and running the code asynchronously with a flush afterwards. Because there's a flush right at the start, it seems to pick up crap left over from the last cl code lying around. Maybe a flush/finish before that will help those of you who are seeing that. The HW errors are harmless if they don't continue after startup.

Utility is simply how many accepted shares are returned per minute.
Efficiency is calculated as the number of accepted shares compared to the number of requested work items. (i.e. it is NOT the rejection rate, but lots of rejects will decrease it). Mining software that does not search the entire work space offered to it tends to have low efficiency (such as cpu mining which gives up long before it's finished searching all hashes).
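
To make the two definitions above concrete, here is a minimal Python sketch. It is not cgminer's actual code; the function names and the zero guards are assumptions for illustration, and only the formulas follow the descriptions above.

```python
def utility(accepted: int, elapsed_seconds: float) -> float:
    """Accepted shares per minute over the elapsed run time."""
    if elapsed_seconds <= 0:
        return 0.0  # assumption: report 0 before any time has passed
    return accepted / (elapsed_seconds / 60.0)


def efficiency(accepted: int, requested: int) -> float:
    """Accepted shares as a percentage of requested work items."""
    if requested == 0:
        return 0.0  # assumption: report 0 before any work is requested
    return 100.0 * accepted / requested


if __name__ == "__main__":
    # 30 accepted shares in 10 minutes, from 40 requested work items.
    print(f"U: {utility(30, 600):.2f}/m  E: {efficiency(30, 40):.0f}%")
```

Under this definition rejects lower efficiency only indirectly, because a rejected share is simply not counted as accepted. A value above 100% is also possible when one work item yields more than one accepted share.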

As for the intensity: yeah, it just spends too long in GPU space if you set it too high. That's why I recommend 8 max for most cards and 9 for the 6990 only.
hero member
Activity: 772
Merit: 500
Another observation, without -I 9 I get a smooth GPU usage of ~99%, with -I 9 defined I get spikes with highs and lows down to 90% GPU usage (I use MSI AfterBurner for monitoring).

Dia
newbie
Activity: 51
Merit: 0
Hi,

I'm trying to get this running in my test environment and I'm having a mare. I'm not a Linux bod in any shape or form, though I can move about inside the system OK.

Is there a specific make command for this?
I've run ./configure and found it needs yasm, so I got that installed, but there doesn't seem to be a way that I can see of getting this made into an install package... I'm a Windows bod mostly, so be gentle :P
hero member
Activity: 772
Merit: 500
What is meant by utility, I currently don't understand that value.
How is efficiency calculated?

I tried it now and it looks quite nice, what bothers me is, that I seem to get hw errors right after I start cgminer.
After a few seconds this fades out and everything works ... could there be a small bug in hw error detection?

5870:
cgminer.exe -o http://bitcoins.lc:8080/ -u XXX -p YYY -Q 2 -d 0 -v 2 -w 128 --no-dynamic

5830:
cgminer.exe -o http://bitcoins.lc:8080/ -u XXX -p YYY -Q 2 -d 1 -v 2 -w 256 --no-dynamic

Thanks,
Dia
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Just wanted to mention: in the README file, the forum link still points to the old thread.

Both include the new dynamic feature. Disable for dedicated mining!
What does this "dynamic feature" do, exactly?

Thanks for the heads up.

Dynamic keeps an eye on how long the GPU code is executing for and makes sure it always returns within a time frame that will allow the screen to take focus and refresh at a reliable rate that is visible to the user. When your machine is not in use by you, the GPU code will be able to do much more execution during this time frame which will raise your mhash. When you start using your machine, even just moving the mouse, clicking windows, browsing, and especially with watching videos, gaming etc, it will execute less and less hashing code to ensure it returns the GPU for the user experience. Basically it's a dynamic mining process that should make it invisible to the user who actually uses the GPU for regular PC uses, but mines with all the excess GPU power available. The difference on my desktop is up to 15MHash more when it's idle, but it's much snappier when I actually use it.
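
The feedback idea described above can be sketched as a small control step in Python. This is an illustration only, not cgminer's implementation: the function name, the 15 ms target, the half-target threshold, and the 0 to 10 bounds are all made-up values for the example.

```python
def adjust_intensity(intensity: int, kernel_ms: float,
                     target_ms: float = 15.0, lo: int = 0, hi: int = 10) -> int:
    """Return the next intensity given how long the last GPU run took.

    Hypothetical policy: back off one step when a run overshoots the
    target time, and step up only when it finishes in under half of it.
    """
    if kernel_ms > target_ms and intensity > lo:
        return intensity - 1  # desktop is busy: give the GPU back sooner
    if kernel_ms < target_ms / 2 and intensity < hi:
        return intensity + 1  # GPU is idle: do more hashing per run
    return intensity


if __name__ == "__main__":
    level = 5
    # Made-up run times: idle, idle, user starts a video, then idle again.
    for ms in (4.0, 6.0, 40.0, 35.0, 5.0):
        level = adjust_intensity(level, ms)
        print(f"last run {ms:5.1f} ms -> intensity {level}")
```

The gap between the two thresholds is a design choice in this sketch: it stops the level from bouncing up and down when run times sit close to the target.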
full member
Activity: 373
Merit: 100
Just wanted to mention: in the README file, the forum link still points to the old thread.

Both include the new dynamic feature. Disable for dedicated mining!
What does this "dynamic feature" do, exactly?
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
@d3m0n1q_733rz Is that different to the existing "atom" asm code in cgminer? Does it need specific CPU support? If so, I'll need to have it as a separate optional assembly miner.

New windows build.
http://ck.kolivas.org/apps/cgminer-1.2.4-win32.zip

New Source tarball.
http://ck.kolivas.org/apps/cgminer-1.2.4-1.tar.bz2

Both include the new dynamic feature. Disable for dedicated mining!

Discussed the other problem of TurdHurdur's (off the forum) and it turns out it was missing the kernel file because he had done "make install" which doesn't really work properly unless you run from the directory you install to. The files should be all together in the same directory.
sr. member
Activity: 378
Merit: 250
Here, have fun with this; it's the atom optimized code by Neil Kettle with some slight SSE4.1 mods to it.  It doesn't include any of the YMM additions or SSE horizontal math calculations I've been playing around with, but it's a good general purpose hash speed increase for CPU.  Don't forget to add -msse4.1 to your CFLAGS.  And Neil, if you read this, let me know what you think of the reordering of the commands for the prefetch.

Code:
;; SHA-256 for X86-64 for Linux, based off of:

; (c) Ufasoft 2011 http://ufasoft.com mailto:[email protected]
; Version 2011
; This software is Public Domain

; Significant re-write/optimisation and reordering by,
; Neil Kettle
; ~18% performance improvement

; SHA-256 CPU SSE cruncher for Bitcoin Miner

ALIGN 32
BITS 64

%define hash rdi
%define data rsi
%define init rdx

; 0 = (1024 - 256) (mod (LAB_CALC_UNROLL*LAB_CALC_PARA*16))
%define LAB_CALC_PARA 2
%define LAB_CALC_UNROLL 8

%define LAB_LOOP_UNROLL 8

extern g_4sha256_k

global CalcSha256_x64
; CalcSha256 hash(rdi), data(rsi), init(rdx)
CalcSha256_x64:

push rbx

LAB_NEXT_NONCE:

mov rcx, 64*4 ; 256 - rcx is # of SHA-2 rounds
mov rax, 16*4 ; 64 - rax is where we expand to

LAB_SHA:
push rcx
lea rcx, qword [data+rcx*4] ; + 1024
lea r11, qword [data+rax*4] ; + 256

LAB_CALC:
%macro lab_calc_blk 1

movntdqa xmm0, [r11-(15-%1)*16] ; xmm0 = W[I-15]
movdqa xmm2, xmm0 ; xmm2 = W[I-15]
movntdqa xmm4, [r11-(15-(%1+1))*16] ; xmm4 = W[I-15+1]
movdqa xmm6, xmm4 ; xmm6 = W[I-15+1]

psrld xmm0, 3 ; xmm0 = W[I-15] >> 3
movdqa xmm1, xmm0 ; xmm1 = W[I-15] >> 3
pslld xmm2, 14 ; xmm2 = W[I-15] << 14
psrld xmm4, 3 ; xmm4 = W[I-15+1] >> 3
movdqa xmm5, xmm4 ; xmm5 = W[I-15+1] >> 3
psrld xmm5, 4 ; xmm5 = W[I-15+1] >> 7
pxor xmm4, xmm5 ; xmm4 = (W[I-15+1] >> 3) ^ (W[I-15+1] >> 7)
pslld xmm6, 14 ; xmm6 = W[I-15+1] << 14
psrld xmm1, 4 ; xmm1 = W[I-15] >> 7
pxor xmm0, xmm1 ; xmm0 = (W[I-15] >> 3) ^ (W[I-15] >> 7)
pxor xmm0, xmm2 ; xmm0 = (W[I-15] >> 3) ^ (W[I-15] >> 7) ^ (W[I-15] << 14)
psrld xmm1, 11 ; xmm1 = W[I-15] >> 18
psrld xmm5, 11 ; xmm5 = W[I-15+1] >> 18
pxor xmm4, xmm6 ; xmm4 = (W[I-15+1] >> 3) ^ (W[I-15+1] >> 7) ^ (W[I-15+1] << 14)
pxor xmm4, xmm5 ; xmm4 = (W[I-15+1] >> 3) ^ (W[I-15+1] >> 7) ^ (W[I-15+1] << 14) ^ (W[I-15+1] >> 18)
pslld xmm2, 11 ; xmm2 = W[I-15] << 25
pslld xmm6, 11 ; xmm6 = W[I-15+1] << 25
pxor xmm4, xmm6 ; xmm4 = (W[I-15+1] >> 3) ^ (W[I-15+1] >> 7) ^ (W[I-15+1] << 14) ^ (W[I-15+1] >> 18) ^ (W[I-15+1] << 25)
pxor xmm0, xmm1 ; xmm0 = (W[I-15] >> 3) ^ (W[I-15] >> 7) ^ (W[I-15] << 14) ^ (W[I-15] >> 18)
pxor xmm0, xmm2 ; xmm0 = (W[I-15] >> 3) ^ (W[I-15] >> 7) ^ (W[I-15] << 14) ^ (W[I-15] >> 18) ^ (W[I-15] << 25)
paddd xmm0, [r11-(16-%1)*16] ; xmm0 = s0(W[I-15]) + W[I-16]
paddd xmm4, [r11-(16-(%1+1))*16] ; xmm4 = s0(W[I-15+1]) + W[I-16+1]
movntdqa xmm3, [r11-(2-%1)*16] ; xmm3 = W[I-2]
movntdqa xmm7, [r11-(2-(%1+1))*16] ; xmm7 = W[I-2+1]

;;;;;;;;;;;;;;;;;;

movdqa xmm2, xmm3 ; xmm2 = W[I-2]
psrld xmm3, 10 ; xmm3 = W[I-2] >> 10
movdqa xmm1, xmm3 ; xmm1 = W[I-2] >> 10
movdqa xmm6, xmm7 ; xmm6 = W[I-2+1]
psrld xmm7, 10 ; xmm7 = W[I-2+1] >> 10
movdqa xmm5, xmm7 ; xmm5 = W[I-2+1] >> 10

paddd xmm0, [r11-(7-%1)*16] ; xmm0 = s0(W[I-15]) + W[I-16] + W[I-7]
paddd xmm4, [r11-(7-(%1+1))*16] ; xmm4 = s0(W[I-15+1]) + W[I-16+1] + W[I-7+1]

pslld xmm2, 13 ; xmm2 = W[I-2] << 13
pslld xmm6, 13 ; xmm6 = W[I-2+1] << 13
psrld xmm1, 7 ; xmm1 = W[I-2] >> 17
psrld xmm5, 7 ; xmm5 = W[I-2+1] >> 17



pxor xmm3, xmm1 ; xmm3 = (W[I-2] >> 10) ^ (W[I-2] >> 17)
psrld xmm1, 2 ; xmm1 = W[I-2] >> 19
pxor xmm3, xmm2 ; xmm3 = (W[I-2] >> 10) ^ (W[I-2] >> 17) ^ (W[I-2] << 13)
pslld xmm2, 2 ; xmm2 = W[I-2] << 15
pxor xmm7, xmm5 ; xmm7 = (W[I-2+1] >> 10) ^ (W[I-2+1] >> 17)
psrld xmm5, 2 ; xmm5 = W[I-2+1] >> 19
pxor xmm7, xmm6 ; xmm7 = (W[I-2+1] >> 10) ^ (W[I-2+1] >> 17) ^ (W[I-2+1] << 13)
pslld xmm6, 2 ; xmm6 = W[I-2+1] << 15



pxor xmm3, xmm1 ; xmm3 = (W[I-2] >> 10) ^ (W[I-2] >> 17) ^ (W[I-2] << 13) ^ (W[I-2] >> 19)
pxor xmm3, xmm2 ; xmm3 = (W[I-2] >> 10) ^ (W[I-2] >> 17) ^ (W[I-2] << 13) ^ (W[I-2] >> 19) ^ (W[I-2] << 15)
paddd xmm0, xmm3 ; xmm0 = s0(W[I-15]) + W[I-16] + s1(W[I-2]) + W[I-7]
pxor xmm7, xmm5 ; xmm7 = (W[I-2+1] >> 10) ^ (W[I-2+1] >> 17) ^ (W[I-2+1] << 13) ^ (W[I-2+1] >> 19)
pxor xmm7, xmm6 ; xmm7 = (W[I-2+1] >> 10) ^ (W[I-2+1] >> 17) ^ (W[I-2+1] << 13) ^ (W[I-2+1] >> 19) ^ (W[I-2+1] << 15)
paddd xmm4, xmm7 ; xmm4 = s0(W[I-15+1]) + W[I-16+1] + s1(W[I-2+1]) + W[I-7+1]

movdqa [r11+(%1*16)], xmm0
movdqa [r11+((%1+1)*16)], xmm4
%endmacro

%assign i 0
%rep    LAB_CALC_UNROLL
        lab_calc_blk i
%assign i i+LAB_CALC_PARA
%endrep

add r11, LAB_CALC_UNROLL*LAB_CALC_PARA*16
cmp r11, rcx
jb LAB_CALC

pop rcx
mov rax, 0

; Load the init values of the message into the hash.

movntdqa xmm7, [init]
pshufd xmm5, xmm7, 0x55 ; xmm5 == b
pshufd xmm4, xmm7, 0xAA ; xmm4 == c
pshufd xmm3, xmm7, 0xFF ; xmm3 == d
pshufd xmm7, xmm7, 0 ; xmm7 == a

movntdqa xmm0, [init+4*4]
pshufd xmm8, xmm0, 0x55 ; xmm8 == f
pshufd xmm9, xmm0, 0xAA ; xmm9 == g
pshufd xmm10, xmm0, 0xFF ; xmm10 == h
pshufd xmm0, xmm0, 0 ; xmm0 == e

LAB_LOOP:

;; T t1 = h + (Rotr32(e, 6) ^ Rotr32(e, 11) ^ Rotr32(e, 25)) + ((e & f) ^ AndNot(e, g)) + Expand32(g_sha256_k[j]) + w[j]

%macro lab_loop_blk 0
movntdqa xmm6, [data+rax*4]
paddd xmm6, g_4sha256_k[rax*4]
add rax, 4

paddd xmm6, xmm10 ; +h

movdqa xmm1, xmm0
movdqa xmm2, xmm9
pandn xmm1, xmm2 ; ~e & g

movdqa xmm10, xmm2 ; h = g
movdqa xmm2, xmm8 ; f
movdqa xmm9, xmm2 ; g = f

pand xmm2, xmm0 ; e & f
pxor xmm1, xmm2 ; (e & f) ^ (~e & g)
movdqa xmm8, xmm0 ; f = e

paddd xmm6, xmm1 ; Ch + h + w[i] + k[i]

movdqa xmm1, xmm0
psrld xmm0, 6
movdqa xmm2, xmm0
pslld xmm1, 7
psrld xmm2, 5
pxor xmm0, xmm1
pxor xmm0, xmm2
pslld xmm1, 14
psrld xmm2, 14
pxor xmm0, xmm1
pxor xmm0, xmm2
pslld xmm1, 5
pxor xmm0, xmm1 ; Rotr32(e, 6) ^ Rotr32(e, 11) ^ Rotr32(e, 25)
paddd xmm6, xmm0 ; xmm6 = t1

movdqa xmm0, xmm3 ; d
paddd xmm0, xmm6 ; e = d+t1

movdqa xmm1, xmm5 ; =b
movdqa xmm3, xmm4 ; d = c
movdqa xmm2, xmm4 ; c
pand xmm2, xmm5 ; b & c
pand xmm4, xmm7 ; a & c
pand xmm1, xmm7 ; a & b
pxor xmm1, xmm4
movdqa xmm4, xmm5 ; c = b
movdqa xmm5, xmm7 ; b = a
pxor xmm1, xmm2 ; (a & b) ^ (a & c) ^ (b & c)
paddd xmm6, xmm1 ; t1 + ((a & b) ^ (a & c) ^ (b & c))

movdqa xmm2, xmm7
psrld xmm7, 2
movdqa xmm1, xmm7
pslld xmm2, 10
psrld xmm1, 11
pxor xmm7, xmm2
pxor xmm7, xmm1
pslld xmm2, 9
psrld xmm1, 9
pxor xmm7, xmm2
pxor xmm7, xmm1
pslld xmm2, 11
pxor xmm7, xmm2
paddd xmm7, xmm6 ; a = t1 + (Rotr32(a, 2) ^ Rotr32(a, 13) ^ Rotr32(a, 22)) + ((a & b) ^ (a & c) ^ (b & c));
%endmacro

%assign i 0
%rep    LAB_LOOP_UNROLL
        lab_loop_blk
%assign i i+1
%endrep

cmp rax, rcx
jb LAB_LOOP

; Finished the 64 rounds, calculate hash and save

movntdqa xmm1, [rdx]
pshufd xmm2, xmm1, 0x55
paddd xmm5, xmm2
pshufd xmm6, xmm1, 0xAA
paddd xmm4, xmm6
pshufd xmm11, xmm1, 0xFF
paddd xmm3, xmm11
pshufd xmm1, xmm1, 0
paddd xmm7, xmm1

movntdqa xmm1, [rdx+4*4]
pshufd xmm2, xmm1, 0x55
paddd xmm8, xmm2
pshufd xmm6, xmm1, 0xAA
paddd xmm9, xmm6
pshufd xmm11, xmm1, 0xFF
paddd xmm10, xmm11
pshufd xmm1, xmm1, 0
paddd xmm0, xmm1

movdqa [hash+0*16], xmm7
movdqa [hash+1*16], xmm5
movdqa [hash+2*16], xmm4
movdqa [hash+3*16], xmm3
movdqa [hash+4*16], xmm0
movdqa [hash+5*16], xmm8
movdqa [hash+6*16], xmm9
movdqa [hash+7*16], xmm10

LAB_RET:
pop rbx
ret
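
For readers following the comments in the lab_calc_blk macro above, here is a small Python sketch of the same message-schedule arithmetic on a single 32-bit word. It is a reference illustration written for this thread, not part of the posted assembly; the function names are made up. It shows why the assembly can build the SHA-256 s0 and s1 functions from plain shifts and XORs: the two halves of each rotation never share a bit, so XOR gives the same result as OR.

```python
MASK32 = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def s0_rot(x: int) -> int:
    """SHA-256 small sigma 0, written with rotations."""
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def s0_shifts(x: int) -> int:
    """Same function using only the shifts named in the asm comments."""
    return ((x >> 3) ^ (x >> 7) ^ (x << 14) ^ (x >> 18) ^ (x << 25)) & MASK32


def s1_rot(x: int) -> int:
    """SHA-256 small sigma 1, written with rotations."""
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def s1_shifts(x: int) -> int:
    """Same function using only the shifts named in the asm comments."""
    return ((x >> 10) ^ (x >> 17) ^ (x << 13) ^ (x >> 19) ^ (x << 15)) & MASK32


def expand_schedule(w16: list) -> list:
    """Expand 16 message words to 64: W[i] = s1(W[i-2]) + W[i-7] + s0(W[i-15]) + W[i-16]."""
    w = list(w16)
    for i in range(16, 64):
        w.append((s1_rot(w[i - 2]) + w[i - 7] + s0_rot(w[i - 15]) + w[i - 16]) & MASK32)
    return w


if __name__ == "__main__":
    for x in (1, 0x80000000, 0xDEADBEEF):
        assert s0_rot(x) == s0_shifts(x) and s1_rot(x) == s1_shifts(x)
    print("shift-only forms match the rotation forms")
```

The assembly does this work on XMM registers, so each instruction handles four 32-bit lanes at once, whereas the sketch handles one word at a time.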
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
The windows build is never quite as good because of the mingw interface. I do know the CPU usage is much much higher as a result of the pthread library. Dunno what to do about that.
full member
Activity: 126
Merit: 100
Just try a couple of times to start it, if you're on windows. It happens to me as well...5 or 6 tries and it should work.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Alright I did typo the subdomain, this time there was a

Code:
[2011-07-13 02:06:45] Long-polling activated for http://uscentral.btcguild.com:8332/LP

before...

Code:
[2011-07-13 02:06:47] Attempting to restart thread 0, idle for more than 60 seconds

with the rest being the same.

Can you do the same with -D and -P and log all the output and either pastebin it or email me or something please? (feel free to XXX out your name and pass in the logs)
full member
Activity: 216
Merit: 100

Thanks. That doesn't look like it ever started mining. Were all the login parameters ok? Did debug show you any http error messages or the like? Unfortunately cgminer doesn't abort when it fails to connect right at the start and may just sit there looking stupid.

Alright I did typo the subdomain, this time there was a

Code:
[2011-07-13 02:06:45] Long-polling activated for http://uscentral.btcguild.com:8332/LP

before...

Code:
[2011-07-13 02:06:47] Attempting to restart thread 0, idle for more than 60 seconds

with the rest being the same.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Sounds good, if I like CGMiner, is there any way to talk about (or edit in) new or some more init values for the kernel? I guess it could be tweaked some more for even higher performance :).

Sure. Email me, post here or see me on IRC? I hang out in #bitcoin-mining and #ozcoin (the pool I use).
hero member
Activity: 772
Merit: 500
I didn't try this version, but perhaps someone can answer my questions.

1. Are OpenCL kernels editable (.cl file somewhere) or is it hard coded? If no, could you please make that an option or make it modular.
2. Are the OpenCL init values for the kernel editable? If no, could you please make that an option or make it modular.
3. How much CPU time is eaten for each GPU running?

Thanks,
Dia

The kernels are .cl source kernels, built once and then loaded as binaries from then on.
The init values are the same as those for phatk. Internally I modify the kernel to suit how my code expects output results and to patch properly with BFI INT, BITALIGN and VECTORS, and to support 4 vectors as well.
The CPU time is negligible unless you use a very low intensity level and then (paradoxically) it rises slightly because it loops more often. On my 4x6970 machine pushing 1690 Mhash/s it is using 8% CPU with the AMD Phenom throttled to 800 MHz.

Sounds good, if I like CGMiner, is there any way to talk about (or edit in) new or some more init values for the kernel? I guess it could be tweaked some more for even higher performance :).

Dia
full member
Activity: 182
Merit: 100
I'd like to see answers to #3 from different people because on my system it's quite high.
On my other machine with a slower CPU and another with a faster one, they get < 1% usage.

Intensity and verbosity have no effect so it might be a broken library or something.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
I didn't try this version, but perhaps someone can answer my questions.

1. Are OpenCL kernels editable (.cl file somewhere) or is it hard coded? If no, could you please make that an option or make it modular.
2. Are the OpenCL init values for the kernel editable? If no, could you please make that an option or make it modular.
3. How much CPU time is eaten for each GPU running?

Thanks,
Dia

The kernels are .cl source kernels, built once and then loaded as binaries from then on.
The init values are the same as those for phatk. Internally I modify the kernel to suit how my code expects output results and to patch properly with BFI INT, BITALIGN and VECTORS, and to support 4 vectors as well.
The CPU time is negligible unless you use a very low intensity level and then (paradoxically) it rises slightly because it loops more often. On my 4x6970 machine pushing 1690 Mhash/s it is using 8% CPU with the AMD Phenom throttled to 800 MHz.
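
The "built once, then loaded as binaries" pattern mentioned above can be sketched in Python as a small file cache. This is a generic illustration under stated assumptions, not how cgminer names or stores its binaries: `load_kernel_binary`, the cache file naming, and the `compile_fn` callback (a stand-in for the real OpenCL build step) are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


def load_kernel_binary(source: str, device: str, cache_dir: Path,
                       compile_fn: Callable[[str, str], bytes]) -> bytes:
    """Return a compiled kernel, building it only on the first request.

    The cache key covers the device name and the kernel source, so a
    different card or an edited .cl file gets its own binary.
    """
    key = hashlib.sha256((device + "\0" + source).encode()).hexdigest()[:16]
    path = cache_dir / f"kernel-{key}.bin"
    if path.exists():
        return path.read_bytes()          # later runs: load the saved binary
    binary = compile_fn(source, device)   # first run: slow compile step
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(binary)
    return binary
```

Keying the cache on the source text is a choice made for this sketch, so that editing the kernel forces a rebuild without deleting files by hand.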
hero member
Activity: 772
Merit: 500
I didn't try this version (but will do this later today), perhaps someone can answer my questions.

1. Are the OpenCL kernels editable (.cl file somewhere) or is it hard coded? If no, could you please make that an option or make it modular.
2. Are the OpenCL init values for the kernel editable? If no, could you please make that an option or make it modular.

I'm asking because it's very easy for me to customize and modify kernels and init values in Phoenix :).

3. How much CPU time is eaten for each GPU running?

Thanks,
Dia
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Debian 6 built from git:

Quote
$ cgminer -o http://domain.tld:8332 -u X -p Y -d 0 -D
cgminer version 1.2.4
--------------------------------------------------------------------------------
Totals:
--------------------------------------------------------------------------------
GPU 0: [0.0 Mh/s] [Q:0  A:0  R:0  HW:0  E:0%  U:0.00/m]
GPU 1: [0.0 Mh/s] [Q:0  A:0  R:0  HW:0  E:0%  U:0.00/m]


--------------------------------------------------------------------------------

[2011-07-13 00:42:22] Attempting to restart thread 0, idle for more than 60 seconds
[2011-07-13 00:42:22] Failed to pthread_cancel in reinit_gputhread
[2011-07-13 00:42:22] Received kill message
[2011-07-13 00:42:22] Thread 0 restarted
[2011-07-13 00:42:22] Attempting to restart thread 1, idle for more than 60 seconds
Segmentation fault

/var/log/messages:
Quote
Jul 13 00:42:22 debianminer2 kernel: [  385.148119] cgminer[6395]: segfault at 28 ip 00007f296cb02ed4 sp 00007f29620c9d20 error 4 in libpthread-2.11.2.so[7f296cafa000+17000]

Thanks. That doesn't look like it ever started mining. Were all the login parameters ok? Did debug show you any http error messages or the like? Unfortunately cgminer doesn't abort when it fails to connect right at the start and may just sit there looking stupid.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Thanks for taking over "the C miner" project!

You're most welcome. Thanks so much for the original stable working framework and all the cpu mining bits without which it would have taken me much longer (if ever) to get this far!
legendary
Activity: 1596
Merit: 1100
Thanks for taking over "the C miner" project!