
Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 111. (Read 3426930 times)

sr. member
Activity: 350
Merit: 250
Memory clocks can have an effect on it as well. It may not handle memory+core overclocks well.
If it's not a memory-intensive algo, try dropping memory to stock and upping the core. It often gives better results, but a core of 1.37GHz is pretty good

I wonder if the 980s will overclock as well, that would be a monster if we bump the clock up by the extra 200MHz
2x 6-pin might limit the headroom for extra juice and higher clocks a bit (?)

It must be 2 8-pins. A 6-pin is only 75W, limiting the card to 150W plus the power from the mb. 2 8-pins would allow over 300W.

It would be strange to see 6 pins now I think
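
To put rough numbers on the connector argument, here is a minimal sketch. It assumes the nominal PCIe limits (75W from the slot, 75W per 6-pin, 150W per 8-pin); the function name is made up for illustration, and real cards can be capped lower by their BIOS power target.

```python
# Nominal PCIe power limits in watts (spec ceilings, not measured draw).
SLOT_W = 75
CONNECTOR_W = {"6pin": 75, "8pin": 150}


def board_power_budget(connectors):
    """Return the nominal power ceiling for a card: slot plus aux connectors."""
    return SLOT_W + sum(CONNECTOR_W[c] for c in connectors)


if __name__ == "__main__":
    for layout in (["6pin", "6pin"], ["8pin", "8pin"]):
        print(layout, board_power_budget(layout), "W")
```

So 2x 6-pin tops out around 225W including the slot, while 2x 8-pin leaves a lot more room on paper.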
legendary
Activity: 1400
Merit: 1050
Memory clocks can have an effect on it as well. It may not handle memory+core overclocks well.
If it's not a memory-intensive algo, try dropping memory to stock and upping the core. It often gives better results, but a core of 1.37GHz is pretty good

I wonder if the 980s will overclock as well, that would be a monster if we bump the clock up by the extra 200MHz
2x 6-pin might limit the headroom for extra juice and higher clocks a bit (?)
sr. member
Activity: 462
Merit: 250
www.dashpay.io
Memory clocks can have an effect on it as well. It may not handle memory+core overclocks well.
If it's not a memory-intensive algo, try dropping memory to stock and upping the core. It often gives better results, but a core of 1.37GHz is pretty good

I wonder if the 980s will overclock as well, that would be a monster if we bump the clock up by the extra 200MHz

Droooool... :P

sr. member
Activity: 350
Merit: 250
Memory clocks can have an effect on it as well. It may not handle memory+core overclocks well.
If it's not a memory-intensive algo, try dropping memory to stock and upping the core. It often gives better results, but a core of 1.37GHz is pretty good

I wonder if the 980s will overclock as well, that would be a monster if we bump the clock up by the extra 200MHz
sr. member
Activity: 462
Merit: 250
www.dashpay.io
Some cards simply won't overclock as high as others, even with the same model and BIOS.
Best bet is to overclock all cards separately and go from there.

Make sure you untick Sync when overclocking, else they will all clock the same

I had 3 cards, one would go over 100MHz higher than the other 2, no reason why

Worked a treat, thanks for the suggestion.  It was one card that didn't want to go over factory OC, the rest were happy with +70/350.

Must have just scraped through on the test bench.
sr. member
Activity: 462
Merit: 250
www.dashpay.io
Some cards simply won't overclock as high as others, even with the same model and BIOS.
Best bet is to overclock all cards separately and go from there.

Make sure you untick Sync when overclocking, else they will all clock the same

I had 3 cards, one would go over 100MHz higher than the other 2, no reason why

Roger that...Thx...
sr. member
Activity: 350
Merit: 250
Some cards simply won't overclock as high as others, even with the same model and BIOS.
Best bet is to overclock all cards separately and go from there.

Make sure you untick Sync when overclocking, else they will all clock the same

I had 3 cards, one would go over 100MHz higher than the other 2, no reason why
sr. member
Activity: 462
Merit: 250
www.dashpay.io
2 Nvidia rigs, exactly the same hardware and software, mining x11, 6 x 750ti GA Blacks:


First one, happy as, clocked a little:




Second one, cracks the shits as soon as I wanna clock even a little bit:




They both even have the same BIOS image.

Wipe it clean and start again with drivers/cuda/visual studio etc?
legendary
Activity: 1400
Merit: 1050
Basically it means you can tell the program what to do when, so instead of it auto-running a long task and leaving quick ones to wait, you could do the quicker tasks and let the longer ones run after. In theory that reduces the time the GPU spends waiting around.

May be wrong though
Yeah, I guess I need the layman's terms for your explanation ;D

In layman's terms, I would say that usually when you ask your GPU to do something, you use a translator (the compiler).
But since the translator isn't really a native of the GPU's country, sometimes things get a bit distorted...
So what is proposed here is to speak directly to the GPU in its own language and not even use Google Translate (inline PTX)

sr. member
Activity: 350
Merit: 250
Basically it means you can tell the program what to do when, so instead of it auto-running a long task and leaving quick ones to wait, you could do the quicker tasks and let the longer ones run after. In theory that reduces the time the GPU spends waiting around.

May be wrong though
hero member
Activity: 868
Merit: 1000
So it's gonna be Killer Groestl part 2, final episode? :D

I think one can kill a lot of things with such a low-level assembler. ;)

It offers much more control than inline PTX. Inline PTX still doesn't allow you to perform manual register allocation. And a lot of PTX instructions expand to several machine instructions, reducing the control the programmer has over the exact instruction sequence.


I've got no idea what you are talking about, but I am sure your insight will be understood by many here ;D Lots of developers lurking around
hero member
Activity: 756
Merit: 502
So it's gonna be Killer Groestl part 2, final episode? :D

I think one can kill a lot of things with such a low-level assembler. ;)

It offers much more control than inline PTX. Inline PTX still doesn't allow you to perform manual register allocation. And a lot of PTX instructions expand to several machine instructions, reducing the control the programmer has over the exact instruction sequence.
legendary
Activity: 1400
Merit: 1050
FYI.

Scott Gray from the nVidia forums is about to release a nearly feature complete assembler for the Maxwell architecture.

https://devtalk.nvidia.com/default/topic/773064/cuda-programming-and-performance/maxwell-assembler/

It could be highly interesting to tweak innermost loops of hashing algorithms for minimal register use, and for maximum instruction throughput.

He's got one interesting example of what kind of performance boost he is able to achieve:
https://devtalk.nvidia.com/default/topic/690631/cuda-programming-and-performance/so-whats-new-about-maxwell-/post/4305310/#4305310

He has also hinted at the possibility of a Kepler version of his assembler.

Christian


I don't really understand. In layman's terms, does it mean being able to lower wattage and increase hash?
I think that could increase the hash... as well as the wattage.
I mean nobody wants a card working at 50% of its capability.
You want to use all of the TDP to do calculations, not some percentage of it (right?).
Usually when a card shows low TDP usage it just means that the card is spending some time waiting to be able to perform calculations

edit: for anyone wanting to reduce TDP usage, just set the slider in MSI Afterburner to your preferred max TDP usage and the card will adjust its clocks accordingly...
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
FYI.

Scott Gray from the nVidia forums is about to release a nearly feature complete assembler for the Maxwell architecture.

https://devtalk.nvidia.com/default/topic/773064/cuda-programming-and-performance/maxwell-assembler/
...

Nice :) I was trying to link custom functions in PTX files, but it's hell to do :p
hero member
Activity: 868
Merit: 1000
I don't really understand. In layman's terms, does it mean being able to lower wattage and increase hash?

I pitched this mostly at developers, not laymen ;)

The net effect will be more hash *and* more wattage.


So it's gonna be Killer Groestl part 2, final episode? :D
hero member
Activity: 756
Merit: 502
I don't really understand. In layman's terms, does it mean being able to lower wattage and increase hash?

I pitched this mostly at developers, not laymen ;)

The net effect will be more hash *and* more wattage.
hero member
Activity: 868
Merit: 1000
FYI.

Scott Gray from the nVidia forums is about to release a nearly feature complete assembler for the Maxwell architecture.

https://devtalk.nvidia.com/default/topic/773064/cuda-programming-and-performance/maxwell-assembler/

It could be highly interesting to tweak innermost loops of hashing algorithms for minimal register use, and for maximum instruction throughput.

He's got one interesting example of what kind of performance boost he is able to achieve:
https://devtalk.nvidia.com/default/topic/690631/cuda-programming-and-performance/so-whats-new-about-maxwell-/post/4305310/#4305310

He has also hinted at the possibility of a Kepler version of his assembler.

Christian


I don't really understand. In layman's terms, does it mean being able to lower wattage and increase hash?
hero member
Activity: 756
Merit: 502
FYI.

Scott Gray from the nVidia forums is about to release a nearly feature complete assembler for the Maxwell architecture.

https://devtalk.nvidia.com/default/topic/773064/cuda-programming-and-performance/maxwell-assembler/

It could be highly interesting to tweak innermost loops of hashing algorithms for minimal register use, and for maximum instruction throughput.

He's got one interesting example of what kind of performance boost he is able to achieve:
https://devtalk.nvidia.com/default/topic/690631/cuda-programming-and-performance/so-whats-new-about-maxwell-/post/4305310/#4305310

He has also hinted at the possibility of a Kepler version of his assembler.

Christian
sr. member
Activity: 350
Merit: 250
It depends on whether the talk is right, and whether those results scale with mining performance.
If so, the GTX 980 is the same as 3 750 Tis power-wise.

Performance-wise a 780 is 3 750 Tis, and a 980 is 1.3 times a 780, making a 980 the equivalent of 4 750 Tis at the power usage of 3, plus it only needs a single slot

I may be wrong but that's how I see things, although I do think their graph is most likely incorrect

That is awesome isn't it?

Do we have any idea what the 980 may retail for?

Under $500 USD is probably wishful thinking...

I think the expected retail was $550-$600, which isn't bad.
Unsure how much a 750 Ti is in $ now, but even at say $110 each, the 980 would be better off space-wise and power-wise than 4 750 Tis, even at a slightly higher price
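
The back-of-envelope comparison above can be written out like this. A rough sketch only: every figure is one of the guesses from this thread (the $110 and $550-$600 prices, the 3x and 1.3x ratios), not a benchmark, and the variable and function names are invented for illustration.

```python
# All inputs are forum estimates from this thread, not measurements.
PRICE_750TI = 110               # USD per 750 Ti (guess)
PRICE_980_RANGE = (550, 600)    # USD, expected 980 retail range
PERF_780_IN_750TI = 3.0         # a 780 is roughly three 750 Tis
PERF_980_VS_780 = 1.3           # a 980 is roughly 1.3x a 780
POWER_980_IN_750TI = 3.0        # a 980 draws roughly what three 750 Tis draw


def perf_980_in_750ti():
    """Performance of one 980 expressed in 750 Ti units."""
    return PERF_780_IN_750TI * PERF_980_VS_780


def summary():
    perf = perf_980_in_750ti()
    return {
        "perf_in_750ti": perf,
        "cost_of_equivalent_750tis": perf * PRICE_750TI,
        "perf_per_watt_vs_750ti": perf / POWER_980_IN_750TI,
        "premium_low": PRICE_980_RANGE[0] - perf * PRICE_750TI,
        "premium_high": PRICE_980_RANGE[1] - perf * PRICE_750TI,
    }


if __name__ == "__main__":
    for key, value in summary().items():
        print(f"{key}: {value:.1f}")
```

With these guesses a 980 works out to about 3.9 750 Tis, so you pay a premium of somewhere over $100 for roughly 30% better performance per watt and one slot instead of four.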
hero member
Activity: 938
Merit: 1000
I just want to make a small announcement in case you missed it:

You can find my "new" ccminer fork for blake (NEOS and SFR) and blakecoin here:

This algo seems to report nice hashrates on pools, and I've added a protection to prevent useless recomputing of the same blocks.
GPU usage seems limited to 90%, with small pauses when the pool does not send new jobs, which reduces the card(s) heat.

Reminder: the SFR (SaffronCoin) wallet has a bridge to buy coins via PayPal, it's nice to play with some money on exchanges.. promising

https://bitcointalksearch.org/topic/ann-ccminer-23-opensource-gpl-tpruvot-770064

Wow nice. I will ask whether djm could merge your changes in his next release. Thanks.

It depends on whether the talk is right, and whether those results scale with mining performance.
If so, the GTX 980 is the same as 3 750 Tis power-wise.

Performance-wise a 780 is 3 750 Tis, and a 980 is 1.3 times a 780, making a 980 the equivalent of 4 750 Tis at the power usage of 3, plus it only needs a single slot

I may be wrong but that's how I see things, although I do think their graph is most likely incorrect

That is awesome isn't it?

Do we have any idea what the 980 may retail for?

Under $500 USD is probably wishful thinking...

Prepare to sell all your AMD farm, mate :D