[ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 111.

bigjme

sr. member

Activity: 350

Merit: 250

Quote from: djm34 on September 08, 2014, 11:06:27 AM

Quote from: bigjme on September 08, 2014, 10:50:21 AM

Memory clocks can have an effect on it as well. It may not handle memory+core clock well.
If it's not a memory-intensive algo, try dropping memory to stock and upping the core. It often gives better results, but a core of 1.37GHz is pretty good.

I wonder if the 980s will overclock as well. That would be a monster if we could bump the clock up by an extra 200MHz.

2x 6-pin might limit the headroom for extra juice and higher clocks a bit (?)

It must be 2x 8-pin. A 6-pin is only 75W, limiting the card to 150W plus the power from the motherboard slot. 2x 8-pin would allow over 300W.

It would be strange to see 6-pins now, I think.
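
A quick sanity check of those numbers, as a rough sketch only. It assumes the nominal PCIe limits of 75W from the slot, 75W per 6-pin and 150W per 8-pin connector. The function name and the lookup table are made up for illustration, and a real card can be capped lower by its BIOS power target.

```python
# Nominal PCIe power limits in watts (spec values, not measured draw).
SLOT_W = 75
CONNECTOR_W = {"6pin": 75, "8pin": 150}


def board_power_budget(connectors):
    """Return the nominal budget in watts: slot plus auxiliary connectors."""
    return SLOT_W + sum(CONNECTOR_W[c] for c in connectors)


if __name__ == "__main__":
    print("2x 6-pin:", board_power_budget(["6pin", "6pin"]), "W")
    print("2x 8-pin:", board_power_budget(["8pin", "8pin"]), "W")
```

That works out to 225W for 2x 6-pin and 375W for 2x 8-pin, so the "over 300W" figure holds for the 8-pin case.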

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: bigjme on September 08, 2014, 10:50:21 AM

Memory clocks can have an effect on it as well. It may not handle memory+core clock well.
If it's not a memory-intensive algo, try dropping memory to stock and upping the core. It often gives better results, but a core of 1.37GHz is pretty good.

I wonder if the 980s will overclock as well. That would be a monster if we could bump the clock up by an extra 200MHz.

2x 6-pin might limit the headroom for extra juice and higher clocks a bit (?)

Ignition75

sr. member

Activity: 462

Merit: 250

www.dashpay.io

Quote from: bigjme on September 08, 2014, 10:50:21 AM

Memory clocks can have an effect on it as well. It may not handle memory+core clock well.
If it's not a memory-intensive algo, try dropping memory to stock and upping the core. It often gives better results, but a core of 1.37GHz is pretty good.

I wonder if the 980s will overclock as well. That would be a monster if we could bump the clock up by an extra 200MHz.

Droooool... :P

bigjme

sr. member

Activity: 350

Merit: 250

Memory clocks can have an effect on it as well. It may not handle memory+core clock well.
If it's not a memory-intensive algo, try dropping memory to stock and upping the core. It often gives better results, but a core of 1.37GHz is pretty good.

I wonder if the 980s will overclock as well. That would be a monster if we could bump the clock up by an extra 200MHz.

Ignition75

sr. member

Activity: 462

Merit: 250

www.dashpay.io

Quote from: bigjme on September 08, 2014, 10:03:20 AM

Some cards simply won't overclock as high as others, even with the same model and BIOS.
Best bet is to overclock all cards separately and go from there.

Make sure you untick Sync when overclocking, or else they will all clock the same.

I had 3 cards, and one would go over 100MHz higher than the other 2, for no obvious reason.

Worked a treat, thanks for the suggestion. It was one card that didn't want to go over factory OC, the rest were happy with +70/350.

Must have just scraped through on the test bench.

Ignition75

sr. member

Activity: 462

Merit: 250

www.dashpay.io

Quote from: bigjme on September 08, 2014, 10:03:20 AM

Some cards simply won't overclock as high as others, even with the same model and BIOS.
Best bet is to overclock all cards separately and go from there.

Make sure you untick Sync when overclocking, or else they will all clock the same.

I had 3 cards, and one would go over 100MHz higher than the other 2, for no obvious reason.

Roger that...Thx...

bigjme

sr. member

Activity: 350

Merit: 250

Some cards simply won't overclock as high as others, even with the same model and BIOS.
Best bet is to overclock all cards separately and go from there.

Make sure you untick Sync when overclocking, or else they will all clock the same.

I had 3 cards, and one would go over 100MHz higher than the other 2, for no obvious reason.

Ignition75

sr. member

Activity: 462

Merit: 250

www.dashpay.io

2 Nvidia rigs, exactly the same hardware and software, mining x11, 6 x 750ti GA Blacks:

First one, happy as, clocked a little:

Second one, cracks the shits as soon as I wanna clock even a little bit:

They both even have the same BIOS image.

Wipe it clean and start again with drivers/CUDA/Visual Studio, etc.?

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: bigjme on September 08, 2014, 07:06:14 AM

Basically it means you can tell the program what to do and when. So instead of it automatically running a long task and leaving the quick ones to wait, you could do the quicker tasks first and let the longer ones run after, in theory reducing the time the GPU spends waiting around.

May be wrong though

Yeah, I guess I need the layman's terms for your explanation ;D

In layman's terms, I would say that usually when you ask your GPU to do something, you use a translator (the compiler).
But since the translator isn't really a native of the GPU's country, sometimes things get a bit distorted...
So what is proposed here is to speak directly to the GPU in its own language, without even using Google Translate (inline PTX).

bigjme

sr. member

Activity: 350

Merit: 250

Basically it means you can tell the program what to do and when. So instead of it automatically running a long task and leaving the quick ones to wait, you could do the quicker tasks first and let the longer ones run after, in theory reducing the time the GPU spends waiting around.

May be wrong though

yellowduck2

hero member

Activity: 868

Merit: 1000

Quote from: cbuchner1 on September 08, 2014, 06:41:59 AM

Quote from: yellowduck2 on September 08, 2014, 06:19:46 AM

So it's gonna be Killer Groestl part 2, final episode? :D

I think one can kill a lot of things with such a low-level assembler. ;)

It offers much more control than inline PTX does. Inline PTX still doesn't allow you to perform manual register allocation. And a lot of PTX instructions expand to several machine instructions, reducing the control the programmer may want to exert over the exact instruction sequence.

I've totally got no idea what you are talking about, but I am sure your insight will be understood by many here ;D

Lots of developers lurking around.

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: yellowduck2 on September 08, 2014, 06:19:46 AM

So it's gonna be Killer Groestl part 2, final episode? :D

I think one can kill a lot of things with such a low-level assembler. ;)

It offers much more control than inline PTX does. Inline PTX still doesn't allow you to perform manual register allocation. And a lot of PTX instructions expand to several machine instructions, reducing the control the programmer may want to exert over the exact instruction sequence.

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: yellowduck2 on September 08, 2014, 06:08:23 AM

Quote from: cbuchner1 on September 08, 2014, 05:34:19 AM

FYI.

Scott Gray from the nVidia forums is about to release a nearly feature complete assembler for the Maxwell architecture.

https://devtalk.nvidia.com/default/topic/773064/cuda-programming-and-performance/maxwell-assembler/

It could be highly interesting to tweak innermost loops of hashing algorithms for minimal register use, and for maximum instruction throughput.

He's got one interesting example of what kind of performance boost he is able to achieve:
https://devtalk.nvidia.com/default/topic/690631/cuda-programming-and-performance/so-whats-new-about-maxwell-/post/4305310/#4305310

He has also hinted at the possibility of a Kepler version of his assembler.

Christian

I don't really understand. In layman's terms, does it mean being able to lower wattage and increase hash?

I think that could increase the hash... as well as the wattage.
I mean nobody wants a card working at 50% of its capability.
You want to use all the TDP to do calculations, not some percentage of it (right?).
Usually when a card shows low TDP usage, it just means that the card is spending some of its time waiting to be able to perform calculations.

Edit: for anyone wanting to reduce TDP usage, just set the slider in MSI Afterburner to your preferred max TDP usage, and the card will adjust its clocks accordingly...

Epsylon3

legendary

Activity: 1484

Merit: 1082

ccminer/cpuminer developer

Quote from: cbuchner1 on September 08, 2014, 05:34:19 AM

FYI.

Scott Gray from the nVidia forums is about to release a nearly feature complete assembler for the Maxwell architecture.

https://devtalk.nvidia.com/default/topic/773064/cuda-programming-and-performance/maxwell-assembler/
...

Nice

I was trying to link custom functions in PTX files, but it's hell to do :p

yellowduck2

hero member

Activity: 868

Merit: 1000

Quote from: cbuchner1 on September 08, 2014, 06:16:08 AM

Quote from: yellowduck2 on September 08, 2014, 06:08:23 AM

I don't really understand. In layman's terms, does it mean being able to lower wattage and increase hash?

I pitched this at developers mostly, not laymen ;)

The net effect will be more hash *and* more wattage.

So it's gonna be Killer Groestl part 2, final episode? :D

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: yellowduck2 on September 08, 2014, 06:08:23 AM

I don't really understand. In layman's terms, does it mean being able to lower wattage and increase hash?

I pitched this at developers mostly, not laymen ;)

The net effect will be more hash *and* more wattage.

yellowduck2

hero member

Activity: 868

Merit: 1000

Quote from: cbuchner1 on September 08, 2014, 05:34:19 AM

FYI.

Scott Gray from the nVidia forums is about to release a nearly feature complete assembler for the Maxwell architecture.

https://devtalk.nvidia.com/default/topic/773064/cuda-programming-and-performance/maxwell-assembler/

It could be highly interesting to tweak innermost loops of hashing algorithms for minimal register use, and for maximum instruction throughput.

He's got one interesting example of what kind of performance boost he is able to achieve:
https://devtalk.nvidia.com/default/topic/690631/cuda-programming-and-performance/so-whats-new-about-maxwell-/post/4305310/#4305310

He has also hinted at the possibility of a Kepler version of his assembler.

Christian

I don't really understand. In layman's terms, does it mean being able to lower wattage and increase hash?

cbuchner1

hero member

Activity: 756

Merit: 502

FYI.

Scott Gray from the nVidia forums is about to release a nearly feature complete assembler for the Maxwell architecture.

https://devtalk.nvidia.com/default/topic/773064/cuda-programming-and-performance/maxwell-assembler/

It could be highly interesting to tweak innermost loops of hashing algorithms for minimal register use, and for maximum instruction throughput.

He's got one interesting example of what kind of performance boost he is able to achieve:
https://devtalk.nvidia.com/default/topic/690631/cuda-programming-and-performance/so-whats-new-about-maxwell-/post/4305310/#4305310

He has also hinted at the possibility of a Kepler version of his assembler.

Christian

bigjme

sr. member

Activity: 350

Merit: 250

Quote from: Ignition75 on September 08, 2014, 03:35:20 AM

Quote from: bigjme on September 08, 2014, 03:02:31 AM

It depends on whether the talk is right, and whether those results scale with mining performance.
If so, the GTX 980 is the same as three 750 Tis power-wise.

Performance-wise a 780 is three 750 Tis, and a 980 is 1.3 times a 780, making a 980 equivalent to about four 750 Tis with the power usage of three, plus it only takes a single slot.

I may be wrong but that's how I see things, although I do think their graph is most likely incorrect.

That is awesome isn't it?

Do we have any idea what the 980 may retail for?

Under $500 USD is probably wishful thinking...

I think the expected retail was $550-$600, which isn't bad.
I'm unsure how much a 750 Ti is in $ now, but even at, say, $110 each, the 980 would be better off space-wise and power-wise than four 750 Tis, even at a slightly higher price.
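
Rough numbers for that comparison, as a sketch only. It takes the figures from this thread at face value (a 780 is about three 750 Tis, a 980 is about 1.3x a 780 with the power draw of three 750 Tis, $110 per 750 Ti, $550-$600 for the 980). None of these are measured results, and the helper names are purely illustrative.

```python
# All inputs are the guesses quoted in this thread, not benchmarks.
TI_PRICE_USD = 110         # assumed price of one 750 Ti
GTX780_IN_TI = 3.0         # a 780 performs like ~3x 750 Ti
GTX980_VS_780 = 1.3        # a 980 is rumoured at ~1.3x a 780
GTX980_POWER_IN_TI = 3.0   # a 980 is rumoured to draw like 3x 750 Ti


def gtx980_in_ti_units():
    """Performance of one 980 expressed in 750 Ti equivalents."""
    return GTX780_IN_TI * GTX980_VS_780


def compare(gtx980_price_usd):
    """Return (extra cost vs. the equivalent 750 Tis, perf per unit of power)."""
    perf = gtx980_in_ti_units()
    extra_cost = gtx980_price_usd - perf * TI_PRICE_USD
    perf_per_power = perf / GTX980_POWER_IN_TI
    return extra_cost, perf_per_power


if __name__ == "__main__":
    for price in (550, 600):
        extra, efficiency = compare(price)
        print(f"980 at ${price}: ${extra:.0f} extra, {efficiency:.2f}x hash per watt")
```

With those inputs a 980 comes out at about 3.9 Ti-equivalents, so it would cost roughly $121 to $171 more than the matching 750 Tis while doing about 1.3x the work per watt, assuming the rumoured figures hold for mining at all.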

AizenSou

hero member

Activity: 938

Merit: 1000

Quote from: Epsylon3 on September 08, 2014, 03:17:36 AM

I just want to make a small announcement in case you missed it:

You can find my "new" ccminer fork for blake (NEOS and SFR) and blakecoin here:

This algo seems to report nice hashrates on pools, and I've put in a protection to prevent useless recomputing of the same blocks.
GPU usage seems limited to 90%, with small pauses when the pool does not send new jobs, which reduces the heat of the card(s).

Reminder: the SFR (SaffronCoin) wallet has a bridge to buy coins via PayPal, so it's nice for playing with some money on exchanges... promising

https://bitcointalksearch.org/topic/ann-ccminer-23-opensource-gpl-tpruvot-770064

Wow nice. I will ask whether djm could merge your changes in his next release. Thanks.

Quote from: Ignition75 on September 08, 2014, 03:35:20 AM

Quote from: bigjme on September 08, 2014, 03:02:31 AM

It depends on whether the talk is right, and whether those results scale with mining performance.
If so, the GTX 980 is the same as three 750 Tis power-wise.

Performance-wise a 780 is three 750 Tis, and a 980 is 1.3 times a 780, making a 980 equivalent to about four 750 Tis with the power usage of three, plus it only takes a single slot.

I may be wrong but that's how I see things, although I do think their graph is most likely incorrect.

That is awesome isn't it?

Do we have any idea what the 980 may retail for?

Under $500 USD is probably wishful thinking...

Prepare to sell all your AMD farm, mate :D
