[ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 240.

Poena

newbie

Activity: 48

Merit: 0

Quote from: sp_ on July 17, 2014, 08:05:49 PM

Quote from: Waldozaur12 on July 17, 2014, 08:01:42 PM

Any viruses or trojans inside the cudaMiner software?

Claymore made $100,000 on the Monero miner with a 5% tip...

Hidden tip or a legit fee for using his apps?

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

Quote from: Waldozaur12 on July 17, 2014, 08:01:42 PM

Any viruses or trojans inside the cudaMiner software?

Claymore made $100,000 on the Monero miner with a 5% tip...

Waldozaur12

legendary

Activity: 1223

Merit: 1000

Any viruses or trojans inside the cudaMiner software?

Newwsr

sr. member

Activity: 311

Merit: 250

Quote from: d33_man on July 17, 2014, 03:44:11 PM

Hey,

I'm using ccminer 2.1 and keep getting the error "abnormal hashes, exiting with code 211!" when mining x11 on four 750 Tis. Any advice as to what may be causing this? ccminer just shuts down after the error, which occurs after a few hours of mining.

Thanks

Friend, use ccminer v1.1. I use it here and have no problems.

d33_man

member

Activity: 65

Merit: 10

Hey,

I'm using ccminer 2.1 and keep getting the error "abnormal hashes, exiting with code 211!" when mining x11 on four 750 Tis. Any advice as to what may be causing this? ccminer just shuts down after the error, which occurs after a few hours of mining.

Thanks

Schleicher

hero member

Activity: 675

Merit: 514

Quote from: NeuroticFish on July 17, 2014, 06:00:22 AM

I am already bound to ancient versions of ccminer because I still have compute 2.1....

Couldn't such areas be handled with #ifdef, keeping a not-so-good version for older GPUs?
But maybe I wish for too much. Obviously everybody improves the versions mostly for their own needs....

Sure, that's possible.
Example for rotate function in sha256:

Code:

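#if __CUDA_ARCH__ < 350
// Older GPUs: rotate right with two shifts and an OR (valid for 0 < bits < 32)
#define rrot(x, bits) (((x) >> (bits)) | ((x) << (32 - (bits))))
#else
// Compute 3.5+: a single funnel shift performs the same rotate
#define rrot(x, bits) __funnelshift_r((x), (x), (bits))
#endif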

But usually there are other reasons for not supporting older cards.
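To see why passing the same word as both halves of a funnel shift gives a rotate, here is a minimal host-side sketch in plain C. `funnelshift_r_emulated` is a hypothetical helper that mimics the documented behaviour of the CUDA intrinsic `__funnelshift_r(lo, hi, shift)` (shift the 64-bit value hi:lo right by `shift & 31` and keep the low 32 bits). It is not part of ccminer or of any library, and `rrot_portable` and `rotate_variants_agree` are likewise names made up for this illustration.

```c
#include <stdint.h>

/* Portable rotate right, same idea as the pre-3.5 branch of the macro.
   The masks avoid an undefined shift by 32 when bits == 0. */
static inline uint32_t rrot_portable(uint32_t x, uint32_t bits)
{
    bits &= 31u;
    return (x >> bits) | (x << ((32u - bits) & 31u));
}

/* Hypothetical host-side emulation of __funnelshift_r(lo, hi, shift). */
static inline uint32_t funnelshift_r_emulated(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;  /* the 64-bit value hi:lo */
    return (uint32_t)(wide >> (shift & 31u));   /* keep the low 32 bits   */
}

/* Returns 1 if both variants agree for x on every shift count 0..31. */
static int rotate_variants_agree(uint32_t x)
{
    for (uint32_t bits = 0; bits < 32; ++bits) {
        if (rrot_portable(x, bits) != funnelshift_r_emulated(x, x, bits))
            return 0;
    }
    return 1;
}
```

Calling `rotate_variants_agree` with a few test words is a cheap sanity check before swapping the macro in a hashing kernel. It only covers the emulation, so the real intrinsic still has to be tested on a compute 3.5+ card.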

polanskiman

full member

Activity: 266

Merit: 100

Quote from: cayars on July 17, 2014, 05:40:43 AM

Quote from: polanskiman on July 17, 2014, 02:11:21 AM

Quote from: Boffinboy on July 17, 2014, 02:08:03 AM

Quote from: polanskiman on July 17, 2014, 02:03:43 AM

I've updated to the Beta drivers 340.88 and the hash rate increased slightly. Now getting 7.1 Mh or so, but still not the 7.8/7.9 Mh you are getting. I also tried the minep.it pool but, as I was expecting, it didn't change anything.

Any more ideas?

I have not followed the whole conversation, but I am wondering if one of you is running on risers and the other not?

That's an interesting theory. I read somewhere that indeed it could affect the hash rate http://cryptomining-blog.com/1276-first-impressions-from-a-6-card-mining-rig-using-geforce-gtx-750-ti-gpus/

I, on my side, am using risers. Powered ones, for that matter. The difference is rather significant: ~800/900 Kh

I am also using ASUS, not Gigabyte. Perhaps that could also be the reason.

Good observation. No risers of any kind for me.

Well I don't think it's a riser issue. I've connected the 2 GPUs directly to the slots of the mobo and I get the same hashrate as with risers... back to square one.

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: sp_ on July 17, 2014, 06:08:25 AM

Quote from: cayars on July 17, 2014, 05:54:16 AM

I won't have the source available until after the weekend, as I want to compile/test on Linux first and need to get the makefiles set up properly.
However, if you want to start playing now, pull down the latest from djm34 and use this to test with: https://github.com/djm34/ccminer
If I remember correctly, I tried to compile for compute 5.0 under CUDA 6 and it broke something in the FRESH algo. Because of this I went back to compute 3/3.5 on CUDA 5.5 for the last release.
So the point is: after trying your changes, please test every algo to make sure nothing else breaks under CUDA 6 and compute 5 (if you try this).
BTW, are you doing your testing on Windows or Linux?

OK. I'll start with the djm34 branch. But in order to test all the algos I need the unified source code. I will be developing on Windows.

If possible, try to put changes in cuda_helper.h rather than breaking everybody else's code...

yellowduck2

hero member

Activity: 868

Merit: 1000

Quote from: djm34 on July 17, 2014, 05:25:23 AM

Quote from: yellowduck2 on July 17, 2014, 05:00:44 AM

Quote from: sp_ on July 17, 2014, 02:30:28 AM

Can anyone send me a link to the latest NVminer source code (merged)?

It seems the funnel shift is missing in some of the 11 algorithms. (With a ROL, compute 5.0+ devices will get a boost.)

From GitHub (ccminer 1.2):

cuda_x11_luffa512.cu:

#define TWEAK(a0,a1,a2,a3,j)\
   a0 = (a0<<(j))|(a0>>(32-j));\
   a1 = (a1<<(j))|(a1>>(32-j));\
   a2 = (a2<<(j))|(a2>>(32-j));\
   a3 = (a3<<(j))|(a3>>(32-j));

#define MIXWORD(a0,a4)\
   a4 ^= a0;\
   a0 = (a0<<2) | (a0>>(30));\
   a0 ^= a4;\
   a4 = (a4<<14) | (a4>>(18));\
   a4 ^= a0;\
   a0 = (a0<<10) | (a0>>(22));\
   a0 ^= a4;\
   a4 = (a4<<1) | (a4>>(31));

cuda_x11_cubehash512.cu:

#define ROTATEUPWARDS7(a) (((a) << 7) | ((a) >> 25))
#define ROTATEUPWARDS11(a) (((a) << 11) | ((a) >> 21))

etc..

By rewriting these macros, the shift+shift+or CUDA instruction sequence can be replaced with a single rol (3 times faster) on Maxwell / compute 5.0+.

Hope you succeed in optimizing the code.

The problem is that it may permanently break compatibility with other versions.

5.0 is the future. We need to get started and pave the way for the 800 series. By the time the 800 series is out, there will be perfectly optimized 5.0 code.
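The macro rewrite discussed in this exchange could look roughly like the sketch below, which routes every rotate through one helper so that a single #if decides between the funnel shift and the shift pair. `ROTL32`, `mixword_reference` and `mixword_variants_agree` are hypothetical names for illustration and are not taken from the ccminer source. The 350 threshold follows the "compute 3.5 and higher" figure given elsewhere in the thread.

```c
#include <stdint.h>

/* ROTL32 is a hypothetical helper name, not taken from the ccminer source. */
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
/* Device code for compute 3.5+: rotate left as a funnel shift of x:x. */
#define ROTL32(x, n) __funnelshift_l((x), (x), (n))
#else
/* Host code and older GPUs: two shifts and an OR, valid for 0 < n < 32. */
#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#endif

#define ROTATEUPWARDS7(a)  ROTL32((a), 7)
#define ROTATEUPWARDS11(a) ROTL32((a), 11)

/* MIXWORD from the quoted luffa512 code with each shift pair replaced. */
#define MIXWORD(a0, a4)   \
    a4 ^= a0;             \
    a0 = ROTL32(a0, 2);   \
    a0 ^= a4;             \
    a4 = ROTL32(a4, 14);  \
    a4 ^= a0;             \
    a0 = ROTL32(a0, 10);  \
    a0 ^= a4;             \
    a4 = ROTL32(a4, 1);

/* The original shift-based sequence, kept only as a reference. */
static void mixword_reference(uint32_t *p0, uint32_t *p4)
{
    uint32_t a0 = *p0, a4 = *p4;
    a4 ^= a0;
    a0 = (a0 << 2) | (a0 >> 30);
    a0 ^= a4;
    a4 = (a4 << 14) | (a4 >> 18);
    a4 ^= a0;
    a0 = (a0 << 10) | (a0 >> 22);
    a0 ^= a4;
    a4 = (a4 << 1) | (a4 >> 31);
    *p0 = a0;
    *p4 = a4;
}

/* Returns 1 if the rewritten macro matches the reference for one input. */
static int mixword_variants_agree(uint32_t a0, uint32_t a4)
{
    uint32_t r0 = a0, r4 = a4;
    mixword_reference(&r0, &r4);
    MIXWORD(a0, a4)
    return a0 == r0 && a4 == r4;
}
```

On the host both paths use plain shifts, so `mixword_variants_agree` only confirms that the rewrite kept the rotate amounts. The funnel-shift branch needs nvcc and a compute 3.5+ target to be exercised, and older cards keep working through the fallback.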

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

Quote from: cayars on July 17, 2014, 05:54:16 AM

I won't have the source available until after the weekend, as I want to compile/test on Linux first and need to get the makefiles set up properly.
However, if you want to start playing now, pull down the latest from djm34 and use this to test with: https://github.com/djm34/ccminer
If I remember correctly, I tried to compile for compute 5.0 under CUDA 6 and it broke something in the FRESH algo. Because of this I went back to compute 3/3.5 on CUDA 5.5 for the last release.
So the point is: after trying your changes, please test every algo to make sure nothing else breaks under CUDA 6 and compute 5 (if you try this).
BTW, are you doing your testing on Windows or Linux?

OK. I'll start with the djm34 branch. But in order to test all the algos I need the unified source code. I will be developing on Windows.

NeuroticFish

legendary

Activity: 3668

Merit: 6382

I am already bound to ancient versions of ccminer because I still have compute 2.1....

Couldn't such areas be handled with #ifdef, keeping a not-so-good version for older GPUs?
But maybe I wish for too much. Obviously everybody improves the versions mostly for their own needs....

cayars

full member

Activity: 168

Merit: 100

Quote from: sp_ on July 17, 2014, 04:21:35 AM

Cuda 6.0

Check:

Version features and specifications

The funnel shift is available for compute 3.5 and higher.

http://en.wikipedia.org/wiki/CUDA

How to inline CUDA Assembly:
http://docs.nvidia.com/cuda/inline-ptx-assembly/index.html#axzz37iRLSsMj

Instruction set.

docs.nvidia.com/cuda/parallel-thread-execution/index.html#logic-and-shift-instructions

8.7.5.1. Logic and Shift Instructions:
(8.7.5.6. Logic and Shift Instructions: shf)

So the following macro should be converted to something like:

a0 = (a0<<(j))|(a0>>(32-j)); --> shf.l.wrap.b32 a0, a0, a0, j;
...

I have some time in the weekend to do the full implementation. Just give me the latest branch to work on...

I won't have the source available until after the weekend, as I want to compile/test on Linux first and need to get the makefiles set up properly.

However, if you want to start playing now, pull down the latest from djm34 and use this to test with: https://github.com/djm34/ccminer

If I remember correctly, I tried to compile for compute 5.0 under CUDA 6 and it broke something in the FRESH algo. Because of this I went back to compute 3/3.5 on CUDA 5.5 for the last release.

So the point is: after trying your changes, please test every algo to make sure nothing else breaks under CUDA 6 and compute 5 (if you try this).

BTW, are you doing your testing on Windows or Linux?

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

Christian has already broken compatibility for any hardware below compute 3.0. In the Killer Groestl implementation he uses compute 3.0+ instructions, like perm:

static __device__ uint32_t cuda_swab32(uint32_t x)
{
    return __byte_perm(x, 0, 0x0123);
}

I will implement the changes using a preprocessor conditional like this:

#if __CUDA_ARCH__ >= 130
    return (uint32_t)__double2hiint(__longlong_as_double(x));
#else
    return (uint32_t)(x >> 32);
#endif

So if you compile for compute 5.0 you will get the Maxwell funnel shift instead.

The current ccminer runs at the same speed for compute 3.0, 3.5 and 5.0. This is about to change.
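For readers unsure what the selector 0x0123 in cuda_swab32 does, the following plain C sketch emulates the byte selection on the host and compares it with an ordinary shift-and-mask byte swap. `byte_perm_emulated` is a hypothetical stand-in written from the documented behaviour of `__byte_perm(x, y, s)` (each selector nibble, low 3 bits used, picks one of the eight bytes of y:x). `swab32_portable` and `swab32_perm` are illustrative names as well.

```c
#include <stdint.h>

/* Hypothetical host-side emulation of the CUDA intrinsic __byte_perm(x, y, s):
   selector nibble i chooses which byte of y:x becomes result byte i. */
static inline uint32_t byte_perm_emulated(uint32_t x, uint32_t y, uint32_t s)
{
    uint64_t pool = ((uint64_t)y << 32) | x;   /* bytes 0..3 from x, 4..7 from y */
    uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t sel = (s >> (4 * i)) & 7u;    /* low 3 bits of nibble i */
        uint32_t byte = (uint32_t)(pool >> (8 * sel)) & 0xffu;
        result |= byte << (8 * i);
    }
    return result;
}

/* Shift-and-mask byte swap that needs no special instruction. */
static inline uint32_t swab32_portable(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Same selector as the cuda_swab32 quoted in the post. */
static inline uint32_t swab32_perm(uint32_t x)
{
    return byte_perm_emulated(x, 0, 0x0123);
}
```

With 0x0123 the lowest result byte takes source byte 3, the next takes byte 2, and so on, which is a full byte reversal. A shift-based version like `swab32_portable` is the kind of fallback an #if branch could keep for older hardware.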

cayars

full member

Activity: 168

Merit: 100

Quote from: polanskiman on July 17, 2014, 02:11:21 AM

Quote from: Boffinboy on July 17, 2014, 02:08:03 AM

Quote from: polanskiman on July 17, 2014, 02:03:43 AM

I've updated to the Beta drivers 340.88 and the hash rate increased slightly. Now getting 7.1 Mh or so, but still not the 7.8/7.9 Mh you are getting. I also tried the minep.it pool but, as I was expecting, it didn't change anything.

Any more ideas?

I have not followed the whole conversation, but I am wondering if one of you is running on risers and the other not?

That's an interesting theory. I read somewhere that indeed it could affect the hash rate http://cryptomining-blog.com/1276-first-impressions-from-a-6-card-mining-rig-using-geforce-gtx-750-ti-gpus/

I, on my side, am using risers. Powered ones, for that matter. The difference is rather significant: ~800/900 Kh

I am also using ASUS, not Gigabyte. Perhaps that could also be the reason.

Good observation. No risers of any kind for me.

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: yellowduck2 on July 17, 2014, 05:00:44 AM

Quote from: sp_ on July 17, 2014, 02:30:28 AM

Can anyone send me a link to the latest NVminer source code (merged)?

It seems the funnel shift is missing in some of the 11 algorithms. (With a ROL, compute 5.0+ devices will get a boost.)

From GitHub (ccminer 1.2):

cuda_x11_luffa512.cu:

#define TWEAK(a0,a1,a2,a3,j)\
   a0 = (a0<<(j))|(a0>>(32-j));\
   a1 = (a1<<(j))|(a1>>(32-j));\
   a2 = (a2<<(j))|(a2>>(32-j));\
   a3 = (a3<<(j))|(a3>>(32-j));

#define MIXWORD(a0,a4)\
   a4 ^= a0;\
   a0 = (a0<<2) | (a0>>(30));\
   a0 ^= a4;\
   a4 = (a4<<14) | (a4>>(18));\
   a4 ^= a0;\
   a0 = (a0<<10) | (a0>>(22));\
   a0 ^= a4;\
   a4 = (a4<<1) | (a4>>(31));

cuda_x11_cubehash512.cu:

#define ROTATEUPWARDS7(a) (((a) << 7) | ((a) >> 25))
#define ROTATEUPWARDS11(a) (((a) << 11) | ((a) >> 21))

etc..

By rewriting these macros, the shift+shift+or CUDA instruction sequence can be replaced with a single rol (3 times faster) on Maxwell / compute 5.0+.

Hope you succeed in optimizing the code.

The problem is that it may permanently break compatibility with other versions.

yellowduck2

hero member

Activity: 868

Merit: 1000

Quote from: sp_ on July 17, 2014, 02:30:28 AM

Can anyone send me a link to the latest NVminer source code (merged)?

It seems the funnel shift is missing in some of the 11 algorithms. (With a ROL, compute 5.0+ devices will get a boost.)

From GitHub (ccminer 1.2):

cuda_x11_luffa512.cu:

#define TWEAK(a0,a1,a2,a3,j)\
   a0 = (a0<<(j))|(a0>>(32-j));\
   a1 = (a1<<(j))|(a1>>(32-j));\
   a2 = (a2<<(j))|(a2>>(32-j));\
   a3 = (a3<<(j))|(a3>>(32-j));

#define MIXWORD(a0,a4)\
   a4 ^= a0;\
   a0 = (a0<<2) | (a0>>(30));\
   a0 ^= a4;\
   a4 = (a4<<14) | (a4>>(18));\
   a4 ^= a0;\
   a0 = (a0<<10) | (a0>>(22));\
   a0 ^= a4;\
   a4 = (a4<<1) | (a4>>(31));

cuda_x11_cubehash512.cu:

#define ROTATEUPWARDS7(a) (((a) << 7) | ((a) >> 25))
#define ROTATEUPWARDS11(a) (((a) << 11) | ((a) >> 21))

etc..

By rewriting these macros, the shift+shift+or CUDA instruction sequence can be replaced with a single rol (3 times faster) on Maxwell / compute 5.0+.

Hope you succeed in optimizing the code.

miner256

newbie

Activity: 23

Merit: 0

Not that I can see either.
All closed, and binaries for Windows only :-(

Looking forward to trying something when (if?) the source does get released though.

This weekend I am going to try KopiemTu 1.4 and see what that is like - reading good things about it, and I like the idea of trying some tweaking of the card on linux.
https://litecointalk.org/index.php?topic=16800.0

Quote from: sp_ on July 17, 2014, 03:00:16 AM

No sourcecode available in nvminer.zip.

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

Cuda 6.0

Check:

Version features and specifications

The funnel shift is available for compute 3.5 and higher.

http://en.wikipedia.org/wiki/CUDA

How to inline CUDA Assembly:
http://docs.nvidia.com/cuda/inline-ptx-assembly/index.html#axzz37iRLSsMj

Instruction set.

docs.nvidia.com/cuda/parallel-thread-execution/index.html#logic-and-shift-instructions

8.7.5.1. Logic and Shift Instructions:
(8.7.5.6. Logic and Shift Instructions: shf)

So the following macro should be converted to something like:

a0 = (a0<<(j))|(a0>>(32-j)); --> shf.l.wrap.b32 a0, a0, a0, j;
...

I have some time in the weekend to do the full implementation. Just give me the latest branch to work on...
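A minimal sketch of how the inline PTX route might be wrapped is shown below. It assumes the PTX operand order `shf.l.wrap.b32 d, a, b, c` (funnel-shift the pair b:a left by c modulo 32 and keep the upper word), so a rotate passes the same register as both a and b. `rotl32` and `ROT_FN` are hypothetical names, the 350 threshold simply follows the "compute 3.5 and higher" statement in this post, and the fallback branch lets the same file build as plain C for host-side testing.

```c
#include <stdint.h>

/* ROT_FN is a hypothetical qualifier macro: usable from host and device
   under nvcc, a plain static inline function under an ordinary C compiler. */
#ifdef __CUDACC__
#define ROT_FN __host__ __device__ __forceinline__
#else
#define ROT_FN static inline
#endif

/* Rotate x left by n bits (n is taken modulo 32). */
ROT_FN uint32_t rotl32(uint32_t x, uint32_t n)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
    uint32_t r;
    /* Funnel shift of x:x is a rotate; one instruction instead of three. */
    asm("shf.l.wrap.b32 %0, %1, %2, %3;" : "=r"(r) : "r"(x), "r"(x), "r"(n));
    return r;
#else
    /* Host and older GPUs: masked shifts avoid an undefined shift by 32. */
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
#endif
}
```

Keeping the #if inside the function body means callers see one signature on every architecture, so a macro such as TWEAK could call `rotl32(a0, j)` without its own conditional. Whether hand-written PTX beats the `__funnelshift_l` intrinsic is something only a benchmark on the target card can show.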

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: sp_ on July 17, 2014, 02:30:28 AM

Can anyone send me a link to the latest NVminer source code (merged)?

It seems the funnel shift is missing in some of the 11 algorithms. (With a ROL, compute 5.0+ devices will get a boost.)

From GitHub (ccminer 1.2):

cuda_x11_luffa512.cu:

#define TWEAK(a0,a1,a2,a3,j)\
a0 = (a0<<(j))|(a0>>(32-j));\
a1 = (a1<<(j))|(a1>>(32-j));\
a2 = (a2<<(j))|(a2>>(32-j));\
a3 = (a3<<(j))|(a3>>(32-j));

#define MIXWORD(a0,a4)\
a4 ^= a0;\
a0 = (a0<<2) | (a0>>(30));\
a0 ^= a4;\
a4 = (a4<<14) | (a4>>(18));\
a4 ^= a0;\
a0 = (a0<<10) | (a0>>(22));\
a0 ^= a4;\
a4 = (a4<<1) | (a4>>(31));

cuda_x11_cubehash512.cu:

#define ROTATEUPWARDS7(a) (((a) << 7) | ((a) >> 25))
#define ROTATEUPWARDS11(a) (((a) << 11) | ((a) >> 21))

etc..

By rewriting these macros, the shift+shift+or CUDA instruction sequence can be replaced with a single rol (3 times faster) on Maxwell / compute 5.0+.

Which CUDA version?

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

No sourcecode available in nvminer.zip.
