
Topic: Nexus - Pure SHA3 + CPU/GPU + nPoS + 15 Active Innovations + More to Come - page 306. (Read 785514 times)

hero member
Activity: 820
Merit: 1000

Thanks for the tips, everyone. Removing the unrolls made the difference, and now that I have a compiled (and running) version I'll keep playing with the compiler options and retry with the pragma unrolls back in.

Incidentally, when compiled with sm_35 the miner is reporting an MH/s value that I think should be KH/s...

Code:
[MASTER] Coinshield Network: New Block 29091
367528.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:17
365121.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:28

When compiled with sm_30 it reports correctly around the 28 MH/s mark, but with sm_35 it gives this.  Should I be concerned?


Your GPU's compute capability is not set correctly in the Makefile, so the kernel is not being launched correctly. Please see the link below:

https://developer.nvidia.com/cuda-gpus

Also please note that the application currently only works on GPUs with compute capability 3.5 or greater.
Thanks for the quick reply. Changed it back to 30 and all is working fine... I hope
Code:
[MASTER] Coinshield Network: New Block 29127
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:09
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:20
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:31

ehmmm... you missed:

Also please note that the application currently only works on GPUs with compute capability 3.5 or greater.

you'll not find any block ever...
You're absolutely right, in my haste I did miss that!  I'll switch off until the kernel is rewritten to support sm_30.
sr. member
Activity: 329
Merit: 250

Thanks for the tips, everyone. Removing the unrolls made the difference, and now that I have a compiled (and running) version I'll keep playing with the compiler options and retry with the pragma unrolls back in.

Incidentally, when compiled with sm_35 the miner is reporting an MH/s value that I think should be KH/s...

Code:
[MASTER] Coinshield Network: New Block 29091
367528.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:17
365121.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:28

When compiled with sm_30 it reports correctly around the 28 MH/s mark, but with sm_35 it gives this.  Should I be concerned?


Your GPU's compute capability is not set correctly in the Makefile, so the kernel is not being launched correctly. Please see the link below:

https://developer.nvidia.com/cuda-gpus

Also please note that the application currently only works on GPUs with compute capability 3.5 or greater.
Thanks for the quick reply. Changed it back to 30 and all is working fine... I hope
Code:
[MASTER] Coinshield Network: New Block 29127
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:09
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:20
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:31

ehmmm... you missed:

Also please note that the application currently only works on GPUs with compute capability 3.5 or greater.

you'll not find any block ever...
hero member
Activity: 820
Merit: 1000

Thanks for the tips, everyone. Removing the unrolls made the difference, and now that I have a compiled (and running) version I'll keep playing with the compiler options and retry with the pragma unrolls back in.

Incidentally, when compiled with sm_35 the miner is reporting an MH/s value that I think should be KH/s...

Code:
[MASTER] Coinshield Network: New Block 29091
367528.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:17
365121.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:28

When compiled with sm_30 it reports correctly around the 28 MH/s mark, but with sm_35 it gives this.  Should I be concerned?


Your GPU's compute capability is not set correctly in the Makefile, so the kernel is not being launched correctly. Please see the link below:

https://developer.nvidia.com/cuda-gpus

Also please note that the application currently only works on GPUs with compute capability 3.5 or greater.
Thanks for the quick reply. Changed it back to 30 and all is working fine... I hope
Code:
[MASTER] Coinshield Network: New Block 29127
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:09
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:20
24.0 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29127 | Diff = 36 0-bits | 00:04:31
sr. member
Activity: 278
Merit: 250

Thanks for the tips, everyone. Removing the unrolls made the difference, and now that I have a compiled (and running) version I'll keep playing with the compiler options and retry with the pragma unrolls back in.

Incidentally, when compiled with sm_35 the miner is reporting an MH/s value that I think should be KH/s...

Code:
[MASTER] Coinshield Network: New Block 29091
367528.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:17
365121.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:28

When compiled with sm_30 it reports correctly around the 28 MH/s mark, but with sm_35 it gives this.  Should I be concerned?


Your GPU's compute capability is not set correctly in the Makefile, so the kernel is not being launched correctly. Please see the link below:

https://developer.nvidia.com/cuda-gpus

Also please note that the application currently only works on GPUs with compute capability 3.5 or greater.
hero member
Activity: 820
Merit: 1000
Has anyone experienced nvcc taking a long time to complete on Linux?  Mine has been going for over an hour now and has still not finished. The process appears to be running fine but at 100% CPU.
If you compile for compute 30 up to compute 50 (or more), it may take some time...
Alternatively, you can remove the pragma unroll in the main loop of skein; compilation will be really fast, but power consumption will be higher and it won't run faster...

Thanks for the tips, everyone. Removing the unrolls made the difference, and now that I have a compiled (and running) version I'll keep playing with the compiler options and retry with the pragma unrolls back in.

Incidentally, when compiled with sm_35 the miner is reporting an MH/s value that I think should be KH/s...

Code:
[MASTER] Coinshield Network: New Block 29091
367528.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:17
365121.1 MH/s | 0 Blks ACC=0 REJ=0 | Height = 29091 | Diff = 35 0-bits | 00:03:28

When compiled with sm_30 it reports correctly around the 28 MH/s mark, but with sm_35 it gives this.  Should I be concerned?
newbie
Activity: 48
Merit: 0
Working now! Got 3 blocks in a few hours, but 1 rejected. Using two 750ti for a total of 33500kh/s.
legendary
Activity: 1400
Merit: 1050
Has anyone experienced nvcc taking a long time to complete on Linux?  Mine has been going for over an hour now and has still not finished. The process appears to be running fine but at 100% CPU.
If you compile for compute 30 up to compute 50 (or more), it may take some time...
Alternatively, you can remove the pragma unroll in the main loop of skein; compilation will be really fast, but power consumption will be higher and it won't run faster...
sr. member
Activity: 278
Merit: 250
AMD GPU miner??? Angry Angry Angry


the GPU mining channel draws way more power on my 780Ti's, so I am back to the CPU channel with our nonpublic GPU miner. Wink
Rig runs quieter and cooler now.

here's what 36 hours on the GPU channel got me (on Linux, thanks Supercomputing for the makefiles):
98.5 MH/s | 98 Blks ACC=82 REJ=16 | Height = 28862 | Diff = 35 0-bits | 35:44:32

lots of rejects though.

Christian


The GTX 750 Ti is the most efficient card for mining on the GPU channel at this time. It also may prove itself to be the most efficient for mining on both channels.
sr. member
Activity: 278
Merit: 250
Has anyone experienced nvcc taking a long time to complete on Linux?  Mine has been going for over an hour now and has still not finished. The process appears to be running fine but at 100% CPU.

In your Makefile, replace the NVCC command with the one below and see if it makes a difference:

$(NVCC) -g -O3 -I . -Xptxas "-v" -arch=compute_50 --ptxas-options=-v $(JANSSON_INCLUDES) -o $@ -c $<

Also, do not forget to change the compute capability to match your GPU's.
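To make the "match your GPU" advice concrete, here is a small sketch for the cards mentioned in this thread. The compute capability values are the ones I believe NVIDIA lists on the CUDA GPUs page linked elsewhere on this page, so please verify them there; the struct and function names are hypothetical, and the 3.5 minimum is the requirement stated in the thread for the current kernel.

```cpp
#include <string>

// Compute capabilities for cards mentioned in this thread
// (as listed by NVIDIA to the best of my knowledge; verify before relying on them).
struct Card { const char* name; int major; int minor; };

const Card kCards[] = {
    {"GTX 660",    3, 0},
    {"GTX 760",    3, 0},
    {"GTX 780 Ti", 3, 5},
    {"GTX 750 Ti", 5, 0},
};

// Per the thread, the current kernel needs compute capability 3.5 or greater.
bool kernelSupported(int major, int minor) {
    return major > 3 || (major == 3 && minor >= 5);
}

// Hypothetical helper: the value to put after -arch= for a given capability.
std::string archFlag(int major, int minor) {
    return "compute_" + std::to_string(major) + std::to_string(minor);
}
```

For a 750 Ti this gives compute_50, matching the command above; for a 660 or 760 it gives compute_30, which compiles but is below the 3.5 minimum, so kernelSupported returns false.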
hero member
Activity: 820
Merit: 1000
Has anyone experienced nvcc taking a long time to complete on Linux?  Mine has been going for over an hour now and has still not finished. The process appears to be running fine but at 100% CPU.
Never mind - it eventually finished!
hero member
Activity: 820
Merit: 1000
Has anyone experienced nvcc taking a long time to complete on Linux?  Mine has been going for over an hour now and has still not finished. The process appears to be running fine but at 100% CPU.
hero member
Activity: 556
Merit: 501
{
"blocks" : 28996,
"currentblocksize" : 0,
"currentblocktx" : 0,
"difficulty" : 0.00000193,
"errors" : "",
"generate" : false,
"genproclimit" : -1,
"pooledtx" : 0,
"testnet" : false
}
legendary
Activity: 1400
Merit: 1000
Can someone confirm the correct block? I show I am on block 28,947.

Just want to make sure I am not forked or anything from the amount I have found.
hero member
Activity: 756
Merit: 502
AMD GPU miner??? Angry Angry Angry


the GPU mining channel draws way more power on my 780Ti's, so I am back to the CPU channel with our nonpublic GPU miner. Wink
Rig runs quieter and cooler now.

here's what 36 hours on the GPU channel got me (on Linux, thanks Supercomputing for the makefiles):
98.5 MH/s | 98 Blks ACC=82 REJ=16 | Height = 28862 | Diff = 35 0-bits | 35:44:32

lots of rejects though.

Christian
sr. member
Activity: 251
Merit: 250
hero member
Activity: 556
Merit: 501
@ enerbyte

Thanks again for the help.

received!
help as I can, although my English is not good.
I hope you find many blocks.
legendary
Activity: 1400
Merit: 1000
@ enerbyte

Thanks again for the help.
legendary
Activity: 1400
Merit: 1050
GTX 760 is an sm_30 GPU and the application only seems to work on GPUs with greater compute capability for now.
so is there a chance to get it fixed?
ps: could somebody tell if sm_30 cards are finding any block under windows?

I didn't find any on testnet with a 660.
I am looking into a new kernel for the compute_30 cards (they don't have enough registers for the current one... )
but I have many things on my plate at the moment...
(I might have something... but the diff on testnet is a bit too high to get a fast answer... so it tells you already it won't be fast...)

so would lowering minimal difficulty on testnet help?
@viz: are you about to release a c release of the wallet? could you also please consider lowering the diff. in this release?
i would consider donating the first 24h (or more if asked) of mining with a fixed sm_30 miner if this would help as an incentive for your (or anybody else's) time...
thank you.

edit: djm34, if you have a rough idea about which code parameters need to be changed to fix this, please explain what i should try so i could do a trial/error loop myself until a fix is found. i'm not a software developer, but i've enough programming knowledge to perform simple code reading and modifications...

If you just want to "simulate" a lower difficulty in the code then just simply do this in MinerThread.cpp (-> MinerThread::SK1024Miner() function)
            CBigNum target;
            target.SetCompact(m_pBLOCK->GetBits());            
            target.SetCompact(0x7e003fff); //simulate lower difficulty

With this setting my single 750ti "finds" a "block" every couple of seconds. Of course submission will fail but you can check that the mining algo code is working. 
thank you mumus, there are no submission attempts thus confirming the mining code is actually not working...
i've blindly tried to enable SKEIN_ERR_CHECK in hash/skein.h but nothing changed, do you know if there's something that could be tuned in the code that could help fix it?

Try to lower the const int throughput = 512*8 * 512 * 4; in sk1024.cu, for example to 512* 8 * 512 * 1 or just 512 *  512. Let me know if it worked or not. I'm trying to understand the code, and I'm now playing with another parameter that may be related and may help. I'm a beginner in CUDA coding and I definitely don't understand the crypto algorithm code yet.
unfortunately still no submission attempts, not even with just 512...
It isn't related to the throughput.
I need to rewrite the kernel for skein for compute_30 and I don't have time at the moment to work on that task...
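For anyone wondering how low the 0x7e003fff value quoted above actually sets the bar, here is a minimal sketch. It ASSUMES CBigNum::SetCompact uses the Bitcoin-style compact layout (top byte = size in bytes, low 23 bits = mantissa, target = mantissa * 256^(size - 3)) and that the hash is 1024 bits wide; I have not checked either against the Coinshield source, and both function names are hypothetical.

```cpp
#include <cstdint>

// Bit length of the target encoded by a compact value, ASSUMING the
// Bitcoin-style layout described above.
int compactTargetBitLength(std::uint32_t compact) {
    const int size = static_cast<int>(compact >> 24);
    const std::uint32_t mantissa = compact & 0x007fffffu;
    int mantissaBits = 0;  // number of significant bits in the mantissa (0..23)
    while (mantissaBits < 23 && (mantissa >> mantissaBits) != 0) ++mantissaBits;
    if (mantissaBits == 0) return 0;  // zero mantissa encodes a zero target
    const int bits = 8 * (size - 3) + mantissaBits;
    return bits > 0 ? bits : 0;
}

// Minimum number of leading zero bits that any hash at or below the target
// must have, for a hash that is hashBits wide.
int requiredLeadingZeroBits(std::uint32_t compact, int hashBits = 1024) {
    const int len = compactTargetBitLength(compact);
    return len >= hashBits ? 0 : hashBits - len;
}
```

Under those assumptions requiredLeadingZeroBits(0x7e003fff) is 26, compared with the 35 or 36 shown in the miner's Diff field. That is on the order of 2^26 (about 67 million) hashes per simulated "block", which is in line with a single 750 Ti "finding" one every few seconds.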
sr. member
Activity: 329
Merit: 250
GTX 760 is an sm_30 GPU and the application only seems to work on GPUs with greater compute capability for now.
so is there a chance to get it fixed?
ps: could somebody tell if sm_30 cards are finding any block under windows?

I didn't find any on testnet with a 660.
I am looking into a new kernel for the compute_30 cards (they don't have enough registers for the current one... )
but I have many things on my plate at the moment...
(I might have something... but the diff on testnet is a bit too high to get a fast answer... so it tells you already it won't be fast...)

so would lowering minimal difficulty on testnet help?
@viz: are you about to release a c release of the wallet? could you also please consider lowering the diff. in this release?
i would consider donating the first 24h (or more if asked) of mining with a fixed sm_30 miner if this would help as an incentive for your (or anybody else's) time...
thank you.

edit: djm34, if you have a rough idea about which code parameters need to be changed to fix this, please explain what i should try so i could do a trial/error loop myself until a fix is found. i'm not a software developer, but i've enough programming knowledge to perform simple code reading and modifications...

If you just want to "simulate" a lower difficulty in the code then just simply do this in MinerThread.cpp (-> MinerThread::SK1024Miner() function)
            CBigNum target;
            target.SetCompact(m_pBLOCK->GetBits());            
            target.SetCompact(0x7e003fff); //simulate lower difficulty

With this setting my single 750ti "finds" a "block" every couple of seconds. Of course submission will fail but you can check that the mining algo code is working.  
thank you mumus, there are no submission attempts thus confirming the mining code is actually not working...
i've blindly tried to enable SKEIN_ERR_CHECK in hash/skein.h but nothing changed, do you know if there's something that could be tuned in the code that could help fix it?

Try to lower the const int throughput = 512*8 * 512 * 4; in sk1024.cu, for example to 512* 8 * 512 * 1 or just 512 *  512. Let me know if it worked or not. I'm trying to understand the code, and I'm now playing with another parameter that may be related and may help. I'm a beginner in CUDA coding and I definitely don't understand the crypto algorithm code yet.
unfortunately still no submission attempts, not even with just 512...
sr. member
Activity: 291
Merit: 250
GTX 760 is an sm_30 GPU and the application only seems to work on GPUs with greater compute capability for now.
so is there a chance to get it fixed?
ps: could somebody tell if sm_30 cards are finding any block under windows?

I didn't find any on testnet with a 660.
I am looking into a new kernel for the compute_30 cards (they don't have enough registers for the current one... )
but I have many things on my plate at the moment...
(I might have something... but the diff on testnet is a bit too high to get a fast answer... so it tells you already it won't be fast...)

so would lowering minimal difficulty on testnet help?
@viz: are you about to release a c release of the wallet? could you also please consider lowering the diff. in this release?
i would consider donating the first 24h (or more if asked) of mining with a fixed sm_30 miner if this would help as an incentive for your (or anybody else's) time...
thank you.

edit: djm34, if you have a rough idea about which code parameters need to be changed to fix this, please explain what i should try so i could do a trial/error loop myself until a fix is found. i'm not a software developer, but i've enough programming knowledge to perform simple code reading and modifications...

If you just want to "simulate" a lower difficulty in the code then just simply do this in MinerThread.cpp (-> MinerThread::SK1024Miner() function)
            CBigNum target;
            target.SetCompact(m_pBLOCK->GetBits());            
            target.SetCompact(0x7e003fff); //simulate lower difficulty

With this setting my single 750ti "finds" a "block" every couple of seconds. Of course submission will fail but you can check that the mining algo code is working.  
thank you mumus, there are no submission attempts thus confirming the mining code is actually not working...
i've blindly tried to enable SKEIN_ERR_CHECK in hash/skein.h but nothing changed, do you know if there's something that could be tuned in the code that could help fix it?

Try to lower the const int throughput = 512*8 * 512 * 4; in sk1024.cu, for example to 512* 8 * 512 * 1 or just 512 *  512. Let me know if it worked or not. I'm trying to understand the code, and I'm now playing with another parameter that may be related and may help. I'm a beginner in CUDA coding and I definitely don't understand the crypto algorithm code yet.