Nexus - Pure SHA3 + CPU/GPU + nPoS + 15 Active Innovations + More to Come - page 306.

paulthetafy

hero member

Activity: 820

Merit: 1000

Quote from: skunk on October 26, 2014, 09:44:33 PM

Quote from: paulthetafy on October 26, 2014, 09:21:22 PM

Quote from: Supercomputing on October 26, 2014, 08:55:35 PM

Quote from: paulthetafy on October 26, 2014, 08:42:06 PM

Thanks for the tips everyone. Removing the unrolls made the difference, but now that I have a compiled (and running) version I'll keep playing with the compiler options and retry with the pragma unrolls back in.

Incidentally when compiled with sm_35 the miner is reporting a MH/s value that I think should be KH/s...

Code:

[MASTER] Coinshield Network: New Block 29091
367528.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:17
365121.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:28

When compiled with sm_30 it reports correctly around the 28 MH/s mark, but with sm_35 it gives this. Should I be concerned?

Your GPU's compute capability is not set correctly in the Makefile and the kernel is not being launched correctly. Please see the link below:

https://developer.nvidia.com/cuda-gpus

Also please note that the application currently only works on GPUs with compute capability 3.5 or greater.

Thanks for the quick reply. Changed it back to 30 and all is working fine... I hope

Code:

[MASTER] Coinshield Network: New Block 29127
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:09
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:20
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:31

ehmmm... you missed:

Quote from: Supercomputing on October 26, 2014, 08:55:35 PM

Also please note that the application currently only works on GPUs with compute capability 3.5 or greater.

you'll not find any block ever...

You're absolutely right, in my haste I did miss that! I'll switch off until the kernel is rewritten to support sm_30.

skunk

sr. member

Activity: 329

Merit: 250

Quote from: paulthetafy on October 26, 2014, 09:21:22 PM

Quote from: Supercomputing on October 26, 2014, 08:55:35 PM

Quote from: paulthetafy on October 26, 2014, 08:42:06 PM

Thanks for the tips everyone. Removing the unrolls made the difference, but now that I have a compiled (and running) version I'll keep playing with the compiler options and retry with the pragma unrolls back in.

Incidentally when compiled with sm_35 the miner is reporting a MH/s value that I think should be KH/s...

Code:

[MASTER] Coinshield Network: New Block 29091
367528.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:17
365121.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:28

When compiled with sm_30 it reports correctly around the 28 MH/s mark, but with sm_35 it gives this. Should I be concerned?

Your GPU's compute capability is not set correctly in the Makefile and the kernel is not being launched correctly. Please see the link below:

https://developer.nvidia.com/cuda-gpus

Also please note that the application currently only works on GPUs with compute capability 3.5 or greater.

Thanks for the quick reply. Changed it back to 30 and all is working fine... I hope

Code:

[MASTER] Coinshield Network: New Block 29127
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:09
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:20
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:31

ehmmm... you missed:

Quote from: Supercomputing on October 26, 2014, 08:55:35 PM

Also please note that the application currently only works on GPUs with compute capability 3.5 or greater.

you'll not find any block ever...

paulthetafy

hero member

Activity: 820

Merit: 1000

Quote from: Supercomputing on October 26, 2014, 08:55:35 PM

Quote from: paulthetafy on October 26, 2014, 08:42:06 PM

Thanks for the tips everyone. Removing the unrolls made the difference, but now that I have a compiled (and running) version I'll keep playing with the compiler options and retry with the pragma unrolls back in.

Incidentally when compiled with sm_35 the miner is reporting a MH/s value that I think should be KH/s...

Code:

[MASTER] Coinshield Network: New Block 29091
367528.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:17
365121.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:28

When compiled with sm_30 it reports correctly around the 28 MH/s mark, but with sm_35 it gives this. Should I be concerned?

Your GPU's compute capability is not set correctly in the Makefile and the kernel is not being launched correctly. Please see the link below:

https://developer.nvidia.com/cuda-gpus

Also please note that the application currently only works on GPUs with compute capability 3.5 or greater.

Thanks for the quick reply. Changed it back to 30 and all is working fine... I hope

Code:

[MASTER] Coinshield Network: New Block 29127
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:09
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:20
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:31

Supercomputing

sr. member

Activity: 278

Merit: 250

Quote from: paulthetafy on October 26, 2014, 08:42:06 PM

Thanks for the tips everyone. Removing the unrolls made the difference, but now that I have a compiled (and running) version I'll keep playing with the compiler options and retry with the pragma unrolls back in.

Incidentally when compiled with sm_35 the miner is reporting a MH/s value that I think should be KH/s...

Code:

[MASTER] Coinshield Network: New Block 29091
367528.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:17
365121.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:28

When compiled with sm_30 it reports correctly around the 28 MH/s mark, but with sm_35 it gives this. Should I be concerned?

Your GPU's compute capability is not set correctly in the Makefile and the kernel is not being launched correctly. Please see the link below:

https://developer.nvidia.com/cuda-gpus

Also please note that the application currently only works on GPUs with compute capability 3.5 or greater.

paulthetafy

hero member

Activity: 820

Merit: 1000

Quote from: djm34 on October 26, 2014, 08:12:55 PM

Quote from: paulthetafy on October 26, 2014, 07:20:54 PM

Has anyone experienced nvcc taking a long time to complete on Linux? Mine has been going for over an hour now and has still not finished. The process appears to be running fine but at 100% CPU.

If you compile for compute 30 up to compute 50 (or more), it may take some time...
Alternatively, you can remove the pragma unroll in the main loop of skein; compilation will be really fast, but power consumption will be higher and the miner won't run any faster...

Thanks for the tips everyone. Removing the unrolls made the difference, but now that I have a compiled (and running) version I'll keep playing with the compiler options and retry with the pragma unrolls back in.

Incidentally when compiled with sm_35 the miner is reporting a MH/s value that I think should be KH/s...

Code:

[MASTER] Coinshield Network: New Block 29091
367528.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:17
365121.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:28

When compiled with sm_30 it reports correctly around the 28 MH/s mark, but with sm_35 it gives this. Should I be concerned?

Poena

newbie

Activity: 48

Merit: 0

Working now! Got 3 blocks in a few hours, but 1 rejected. Using two 750 Ti cards for a total of 33,500 kH/s (about 33.5 MH/s).

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: paulthetafy on October 26, 2014, 07:20:54 PM

Has anyone experienced nvcc taking a long time to complete on Linux? Mine has been going for over an hour now and has still not finished. The process appears to be running fine but at 100% CPU.

If you compile for compute 30 up to compute 50 (or more), it may take some time...
Alternatively, you can remove the pragma unroll in the main loop of skein; compilation will be really fast, but power consumption will be higher and the miner won't run any faster...

Supercomputing

sr. member

Activity: 278

Merit: 250

Quote from: cbuchner1 on October 26, 2014, 04:44:15 PM

Quote from: RUNNY on October 26, 2014, 04:26:26 PM

AMD GPU miner??? >:(

the GPU mining channel draws way more power on my 780 Tis, so I am back to the CPU channel with our nonpublic GPU miner. ;)

Rig runs quieter and cooler now.

here's what 36 hours on the GPU channel got me (on Linux, thanks Supercomputing for the makefiles):
98.5 MH/s | 98 Blks ACC=82 REJ=16 | Height = 28862 | Diff = 35 0-bits | 35:44:32

lots of rejects though.

Christian

The GTX 750 Ti is the most efficient card for mining on the GPU channel at this time. It also may prove itself to be the most efficient for mining on both channels.

Supercomputing

sr. member

Activity: 278

Merit: 250

Quote from: paulthetafy on October 26, 2014, 07:20:54 PM

Has anyone experienced nvcc taking a long time to complete on Linux? Mine has been going for over an hour now and has still not finished. The process appears to be running fine but at 100% CPU.

In your Makefile, replace the NVCC command with the one below and see if it makes a difference:

$(NVCC) -g -O3 -I . -Xptxas "-v" -arch=compute_50 --ptxas-options=-v $(JANSSON_INCLUDES) -o $@ -c $<

Also, do not forget to change the compute capability to match your GPU's.
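As a rough illustration of matching the flag to the card, here is a minimal C++ sketch. The helper names `ArchFlag` and `MeetsMinimum` are made up for this example and are not part of the miner; the major/minor numbers are meant to be read off NVIDIA's compute capability table. It builds the `-arch=compute_XY` value used in the command above and checks a card against the 3.5 minimum mentioned elsewhere in this thread.

```cpp
#include <string>

// Hypothetical helper (not from the miner's source): builds the value passed
// to nvcc for a given compute capability, e.g. 5.0 -> "-arch=compute_50".
std::string ArchFlag(int major, int minor) {
    return "-arch=compute_" + std::to_string(major) + std::to_string(minor);
}

// Hypothetical helper: true when a card meets the 3.5 minimum that the
// current kernel is reported to need in this thread.
bool MeetsMinimum(int major, int minor) {
    return major > 3 || (major == 3 && minor >= 5);
}
```

For a GTX 750 Ti (compute capability 5.0) this gives `-arch=compute_50` and passes the check; an sm_30 card such as the GTX 760 fails it.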

paulthetafy

hero member

Activity: 820

Merit: 1000

Quote from: paulthetafy on October 26, 2014, 07:20:54 PM

Has anyone experienced nvcc taking a long time to complete on Linux? Mine has been going for over an hour now and has still not finished. The process appears to be running fine but at 100% CPU.

Never mind - it eventually finished!

paulthetafy

hero member

Activity: 820

Merit: 1000

Has anyone experienced nvcc taking a long time to complete on Linux? Mine has been going for over an hour now and has still not finished. The process appears to be running fine but at 100% CPU.

enerbyte

hero member

Activity: 556

Merit: 501

{
"blocks" : 28996,
"currentblocksize" : 0,
"currentblocktx" : 0,
"difficulty" : 0.00000193,
"errors" : "",
"generate" : false,
"genproclimit" : -1,
"pooledtx" : 0,
"testnet" : false
}

antonio8

legendary

Activity: 1400

Merit: 1000

Can someone confirm the correct block? I show I am on block 28,947.

Just want to make sure I am not forked or anything from the amount I have found.

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: RUNNY on October 26, 2014, 04:26:26 PM

AMD GPU miner??? >:(

the GPU mining channel draws way more power on my 780 Tis, so I am back to the CPU channel with our nonpublic GPU miner. ;)

Rig runs quieter and cooler now.

here's what 36 hours on the GPU channel got me (on Linux, thanks Supercomputing for the makefiles):
98.5 MH/s | 98 Blks ACC=82 REJ=16 | Height = 28862 | Diff = 35 0-bits | 35:44:32

lots of rejects though.

Christian

RUNNY

sr. member

Activity: 251

Merit: 250

AMD GPU miner??? >:(

enerbyte

hero member

Activity: 556

Merit: 501

Quote from: antonio8 on October 26, 2014, 03:28:43 PM

@ enerbyte

Thanks again for the help.

Received!
I help where I can, although my English is not good.
I hope you find many blocks.

antonio8

legendary

Activity: 1400

Merit: 1000

@ enerbyte

Thanks again for the help.

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: skunk on October 26, 2014, 12:49:15 PM

Quote from: mumus on October 26, 2014, 12:07:26 PM

Quote from: skunk on October 26, 2014, 11:09:14 AM

Quote from: mumus on October 26, 2014, 09:37:30 AM

Quote from: skunk on October 26, 2014, 08:56:15 AM

Quote from: djm34 on October 26, 2014, 07:50:15 AM

Quote from: skunk on October 26, 2014, 05:50:13 AM

Quote from: Supercomputing on October 26, 2014, 12:37:57 AM

GTX 760 is an sm_30 GPU and the application only seems to work on GPUs with greater compute capability for now.

so is there a chance to get it fixed?
ps: could somebody tell if sm_30 cards are finding any block under windows?

I didn't find any on testnet with a 660.
I am looking into a new kernel for the compute_30 cards (they don't have enough registers for the current one... )
but I have many things on my plate at the moment...
(I might have something... but the diff on testnet is a bit too high to get a fast answer... so it tells you already it won't be fast...)

so would lowering minimal difficulty on testnet help?
@viz: are you about to release a c release of the wallet? could you also please consider lowering the diff. in this release?
i'm considering donating the first 24h (or more if asked) of mining with a fixed sm_30 miner, if that would help as an incentive for your (or anybody else's) time...
thank you.

edit: djm34, if you have a rough idea about which code parameters need to be changed to fix this, please explain what i should try so i can run a trial-and-error loop myself until a fix is found. i'm not a software developer, but i have enough programming knowledge to read code and make simple modifications...

If you just want to "simulate" a lower difficulty in the code, simply do this in MinerThread.cpp (in the MinerThread::SK1024Miner() function):

Code:

CBigNum target;
target.SetCompact(m_pBLOCK->GetBits());
target.SetCompact(0x7e003fff); // simulate lower difficulty (overrides the real target set on the line above)

With this setting my single 750ti "finds" a "block" every couple of seconds. Of course submission will fail but you can check that the mining algo code is working.
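To get a feel for how much easier that hard-coded target is, here is a small C++ sketch. It assumes that CBigNum::SetCompact follows the Bitcoin-style compact encoding (top byte = size in bytes, low 23 bits = mantissa) and that the SK1024 hash is 1024 bits wide; both are assumptions, not something confirmed in this thread. The function names are made up for the example.

```cpp
#include <cstdint>

// Bit length of the target encoded by a compact value, assuming the
// Bitcoin-style layout: target = mantissa * 256^(size - 3).
// The sign bit (0x00800000) is ignored in this sketch.
int CompactBitLength(std::uint32_t compact) {
    const int size = static_cast<int>(compact >> 24);
    std::uint32_t mantissa = compact & 0x007fffffu;
    int mantissa_bits = 0;
    while (mantissa != 0) {
        ++mantissa_bits;
        mantissa >>= 1;
    }
    if (mantissa_bits == 0) return 0;  // a zero mantissa encodes a zero target
    const int bits = mantissa_bits + 8 * (size - 3);
    return bits > 0 ? bits : 0;
}

// Leading zero bits the target has when written as a hash_bits-wide number.
int LeadingZeroBits(std::uint32_t compact, int hash_bits) {
    return hash_bits - CompactBitLength(compact);
}
```

Under those assumptions, `LeadingZeroBits(0x7e003fff, 1024)` comes out at 26. If the "0-bits" figure in the miner log counts leading zero bits (a guess), that is 9 bits easier than the 35 shown there, which would fit a single card "finding" a block every few seconds.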

thank you mumus, there are no submission attempts thus confirming the mining code is actually not working...
i've blindly tried to enable SKEIN_ERR_CHECK in hash/skein.h but nothing changed. do you know if there's anything in the code that could be tuned to help fix it?

Try lowering const int throughput = 512*8 * 512 * 4; in sk1024.cu, for example to 512 * 8 * 512 * 1 or just 512 * 512. Let me know whether it works. I'm still trying to understand the code, and I'm now playing with another parameter that may be related and may help. I'm a beginner in CUDA coding and I definitely don't understand the crypto algorithm in the code yet.

unfortunately still no submission attempts, not even with just 512...

It isn't related to the throughput.
I need to rewrite the kernel for skein for compute_30 and I don't have time at the moment to work on that task...
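For reference, the throughput values tried above work out as follows. This is only a C++ sketch of the arithmetic: it assumes `throughput` is the number of hashes per kernel launch (as in ccminer-style miners), and the 256 threads per block in the usage note is a made-up figure, not taken from sk1024.cu.

```cpp
#include <cstdint>

// The three throughput values discussed in the thread.
constexpr std::int64_t kOriginal = 512LL * 8 * 512 * 4;  // value quoted from sk1024.cu
constexpr std::int64_t kReduced  = 512LL * 8 * 512 * 1;  // first suggested reduction
constexpr std::int64_t kSmallest = 512LL * 512;          // second suggested reduction

static_assert(kOriginal == 8388608, "2^23");
static_assert(kReduced  == 2097152, "2^21");
static_assert(kSmallest == 262144,  "2^18");

// Hypothetical helper: blocks needed to cover `throughput` threads,
// rounding up when threads_per_block does not divide it evenly.
constexpr std::int64_t BlocksFor(std::int64_t throughput,
                                 std::int64_t threads_per_block) {
    return (throughput + threads_per_block - 1) / threads_per_block;
}
```

With the hypothetical 256 threads per block, `BlocksFor(kOriginal, 256)` is 32768 blocks and `BlocksFor(kSmallest, 256)` is 1024. Shrinking the launch changes how much work is issued at once, not how many registers each thread needs, which is consistent with the register limit on compute_30 being the real problem.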