Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 599. (Read 3426985 times)

newbie
Activity: 19
Merit: 0
Can you add to your table what the changes are between each line?

Sure! I have updated my post accordingly.

Did you try on Windows?

I haven't, because I do not have a Windows rig, and I likely will not test this because I do not want to reimage or deal with Windows taking over my boot record. See my diff in a previous post above for the changes I made. If you are capable of compiling this, I'd be very curious to see the results.

Must say I am a bit surprised by the 23MHash/s. You should run a little longer to make sure everything is stable.

Configurations with the highest hashrates were stable enough to run in the sense that the program would not crash, however they were not stable enough to provide valid shares. For instance, 384 blocks x 768 threads @ 23213 khash/s attempted 27 shares, but only 16 were valid (less than half that of the 550x768 config).
newbie
Activity: 19
Merit: 0
How on earth did you manage that? We haven't been able to get over 13MH/s.

Just by benchmarking various launch configs until I found one that worked well, in addition to the other changes I listed in my original post. I modified the hefty_cpu_hash function in cuda_hefty1.cu. Changes made are expressed in this diff: https://gist.github.com/danryan/6a631e0ece773e5f6788

Correct. I should have been more clear about that. Fixing the original post. Thanks for pointing that out!
Is this with or without the failed hashes included?

Could you clarify what you mean by failed hashes? If you're referring to ones that didn't pass CPU validation, yes, they are included in the hashrate average, but they are not included in the share metrics (I care more about these, as they are the canonical numbers by which one gets credited for work).
legendary
Activity: 1400
Merit: 1050
I did some more HVC benchmarking of ccminer, varying the launch parameters of the hefty_gpu_hash kernel. I chose this kernel to tweak as the majority of the runtime is spent on it according to nvprof (due to stream synchronization after hefty and sha256 kernels are launched).

Each launch config was tested 5 times over 5 minute intervals (25 minute total sample) at the hvc.1gh.com pool, and results were averaged. Note that I did see CPU validation failures; however, both the average hashrate and accepted shares outweighed them, as confirmed by the 1gh dashboard. My best configuration was 550 blocks x 768 threads per block (average khash/s rate is per 750ti; share metrics are for all six cards):
Code:
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+========+=========+===================+==================+=================+=================+==================+
|   550  |   768   |       16781       |        32        |       28        |        4        |       87         |
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+

Other than the launch parameter change, the miner code under test has no local modifications. I have, however, made a few changes to how the code is compiled:
  • Using CUDA 6 RC
  • Compiled with relocatable device code support (--relocatable-device-code=true --compile; requires manual linking for both host and device objects)
  • Removed maxrregcount to let compiler choose register count

The full data for all block configs can be found here: https://docs.google.com/spreadsheets/d/1C6fSk0pkDXBFIzXselXDE8IJP26dj6grWAJxnRrHO3Y/edit?usp=sharing

Tests run on a system with the following specs: https://gist.github.com/danryan/7c8762fda4d9783a58ae

Can you add to your table what the changes are between each line?
Did you try on Windows?
Must say I am a bit surprised by the 23MHash/s. You should run a little longer to make sure everything is stable.
full member
Activity: 182
Merit: 100
Code:
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+========+=========+===================+==================+=================+=================+==================+
|   550  |   768   |       16781       |        32        |       28        |        4        |       87         |
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+

almost 17MH/s for 1 750Ti?

Correct. I should have been more clear about that. Fixing the original post. Thanks for pointing that out!
Is this with or without the failed hashes included?
sr. member
Activity: 350
Merit: 250
How on earth did you manage that? We haven't been able to get over 13MH/s.

Christian, i found an example of how to implement an RPC in a C++ program. i will try to see if i can get it working, but i don't have a clue what i am doing.
newbie
Activity: 19
Merit: 0
Code:
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+========+=========+===================+==================+=================+=================+==================+
|   550  |   768   |       16781       |        32        |       28        |        4        |       87         |
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+

almost 17MH/s for 1 750Ti?

Correct. I should have been more clear about that. Fixing the original post. Thanks for pointing that out!
sr. member
Activity: 350
Merit: 250
Code:
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+========+=========+===================+==================+=================+=================+==================+
|   550  |   768   |       16781       |        32        |       28        |        4        |       87         |
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+

almost 17MH/s for 1 750Ti?
newbie
Activity: 19
Merit: 0
I did some more HVC benchmarking of ccminer, varying the launch parameters of the hefty_gpu_hash kernel. I chose this kernel to tweak as the majority of the runtime is spent on it according to nvprof (due to stream synchronization after hefty and sha256 kernels are launched). I based block size on a multiple of SMs per card (e.g. 110 * 5 SMs on 750ti == 550).
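The block-count arithmetic described above can be sketched roughly as follows. This is only an illustration: the helper name `candidate_block_counts` and the range of multipliers are hypothetical, and only the 5 SMs per 750ti, the multiplier 110, and the 768 threads per block come from the post.

```python
def candidate_block_counts(sm_count, multipliers):
    """Return grid sizes that are whole multiples of the SM count."""
    return [m * sm_count for m in multipliers]

SM_COUNT_750TI = 5        # SMs per 750ti, as stated in the post
THREADS_PER_BLOCK = 768   # threads per block used in the benchmark

# Hypothetical sweep of multipliers (not the actual list that was tested).
blocks = candidate_block_counts(SM_COUNT_750TI, range(100, 125, 5))
print(blocks)                     # [500, 525, 550, 575, 600]
print(110 * SM_COUNT_750TI)       # 550, the best block count in the post
print(550 * THREADS_PER_BLOCK)    # 422400 threads per kernel launch
```

Sizing the grid as a multiple of the SM count is one common way to try to keep every SM equally loaded; whether it helps for a given kernel still has to be measured, as was done here.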

Each launch config was tested 5 times over 5 minute intervals (25 minute total sample) at the hvc.1gh.com pool, and results were averaged. Note that I did see CPU validation failures; however, both the average hashrate and accepted shares outweighed them, as confirmed by the 1gh dashboard. My best configuration was 550 blocks x 768 threads per block (average khash/s rate is per 750ti; share metrics are for all six cards):

Code:
‡ is default launch config.
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
|         || blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+=========++========+=========+===================+==================+=================+=================+==================+
| best    ||   550  |   768   |       16781       |        32        |       28        |        4        |       87         |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| default || ‡ 683  |   768   |       13987       |        17        |       16        |        1        |       94         |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| diff    ||  -133  |    -    |       +2794       |       +15        |      +12        |       +3        |       -7         |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
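For anyone who wants to re-derive the diff row and the success column, here is a small sketch using only the figures from the table. It assumes the success percentage is truncated to a whole number (28 of 32 is 87.5%, shown as 87); that is a guess about how the column was produced, and the function name `success_percent` is made up for this example.

```python
def success_percent(accepted, attempted):
    """Share success rate as a whole percent, truncated (assumed rounding)."""
    return 100 * accepted // attempted

# (blocks, threads, avg khash/s, attempted, accepted, rejected) from the table
best = (550, 768, 16781, 32, 28, 4)
default = (683, 768, 13987, 17, 16, 1)

diff = tuple(b - d for b, d in zip(best, default))
print(diff)                       # (-133, 0, 2794, 15, 12, 3)
print(success_percent(28, 32))    # 87
print(success_percent(16, 17))    # 94
```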

Other than the launch parameter change, the miner code under test has no local modifications. I have, however, made a few changes to how the code is compiled:
  • Using CUDA 6 RC
  • Compiled with relocatable device code support (--relocatable-device-code=true --compile; requires manual linking for both host and device objects)
  • Removed maxrregcount to let compiler choose register count

The full data for all block configs can be found here: https://docs.google.com/spreadsheets/d/1C6fSk0pkDXBFIzXselXDE8IJP26dj6grWAJxnRrHO3Y/edit?usp=sharing

Tests run on a system with the following specs: https://gist.github.com/danryan/7c8762fda4d9783a58ae

edits:
  • added default block size baseline for comparison
  • clarified block size calculation
  • added ± diff comparison
sr. member
Activity: 350
Merit: 250
parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.

straight over my head :p
im stuck in windows so i am pretty limited. hmm i wonder if i can go through the cudaminer code and see where it reports hashrate and edit the outputs

under windows, you could write a little app which starts the cudaminer instances and then reads from stdout. should be easy.
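A rough sketch of that "little app" idea, written in Python rather than as a batch file. It starts a child process and hands back each output line as it arrives. The function name `stream_lines` is hypothetical and the real miner command line is left out; stderr is merged into stdout on the assumption that the miner may log there.

```python
import subprocess
import sys

def stream_lines(command):
    """Start a process and yield its output lines as they are produced."""
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr, in case the miner logs there
        text=True,
        bufsize=1,                 # line-buffered on the reading side
    )
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        # Stop the child if the caller quit reading before it exited.
        if proc.poll() is None:
            proc.terminate()
        proc.stdout.close()
        proc.wait()

if __name__ == "__main__":
    # Stand-in child process; replace with the real miner command line.
    demo = [sys.executable, "-c", "print('line 1'); print('line 2')"]
    for line in stream_lines(demo):
        print("miner said:", line)
```

One instance of this loop per miner process (for example one thread each) would give a separate stream of lines for every card.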

i will have to have a look. writing websites i can do. writing and manipulating batch files, im not so good at
ok then so i have been able to write a code that will read the last reported hashrate from a file, the only problem is that after a while the file will get large, so i would need the file to be overwritten with every output line cudaminer gives, that part i can not figure out
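One possible way around the growing file is to keep only the newest line: write it to a temporary file and then swap that file into place, so the status file never holds more than one line. This is a sketch under assumptions, not something cudaminer does itself; `write_latest` is a hypothetical helper that would be called from whatever program is reading the miner's output.

```python
import os
import tempfile

def write_latest(path, line):
    """Replace the contents of `path` with a single line."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(line + "\n")
    # After the swap, readers see either the old line or the new one.
    os.replace(tmp_path, path)
```

Calling `write_latest("last_hashrate.txt", line)` for every output line leaves only the most recent one on disk. On Windows the swap can fail if another program has the target file open at that moment, so a retry may be needed there.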

soooo much easier if someone was able to figure out how to add in a curl to cudaminer :-(
newbie
Activity: 4
Merit: 0
I have a gt650m, using cudaminer from feb 9.
Whenever I set clocks below 800 MHz, it mines at 65 C no problem at 65 kH/s.
When I increase the clocks even a little bit, the temps slowly rise to 90-100 C (wtf?) and its mining rate doesn't even change significantly.
Is anyone else having this issue?

I would also appreciate it if cudaminer had backup pool support...
full member
Activity: 182
Merit: 100
parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.

straight over my head :p
im stuck in windows so i am pretty limited. hmm i wonder if i can go through the cudaminer code and see where it reports hashrate and edit the outputs

under windows, you could write a little app which starts the cudaminer instances and then reads from stdout. should be easy.
Can even be done in Java
legendary
Activity: 914
Merit: 1001
parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.

straight over my head :p
im stuck in windows so i am pretty limited. hmm i wonder if i can go through the cudaminer code and see where it reports hashrate and edit the outputs

under windows, you could write a little app which starts the cudaminer instances and then reads from stdout. should be easy.
sr. member
Activity: 350
Merit: 250
parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.

straight over my head :p
im stuck in windows so i am pretty limited. hmm i wonder if i can go through the cudaminer code and see where it reports hashrate and edit the outputs
legendary
Activity: 914
Merit: 1001
parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.
sr. member
Activity: 350
Merit: 250


how are you getting the reported hashrate?
one thing i have wanted is to be able to read the last reported hashrate for each cudaminer instance
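A minimal sketch of pulling the last reported hashrate out of captured output lines. The exact log format is an assumption here: the pattern only looks for a number followed by "khash/s", and the sample lines are made up for illustration rather than copied from real cudaminer output.

```python
import re

# Assumed format: a number followed by "khash/s" somewhere in the line.
HASHRATE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*khash/s")

def last_hashrate(lines):
    """Return the last khash/s figure found in `lines`, or None."""
    latest = None
    for line in lines:
        match = HASHRATE_RE.search(line)
        if match:
            latest = float(match.group(1))
    return latest

# Made-up sample lines for illustration only, not real miner output.
sample = [
    "GPU #0: 250.10 khash/s",
    "accepted: 1/1",
    "GPU #0: 251.75 khash/s",
]
print(last_hashrate(sample))   # 251.75
```

Feeding it the lines from one miner instance at a time gives a per-instance figure.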
legendary
Activity: 914
Merit: 1001
just throw a line here, when we can test it again, I'll happily help testing as much as I can.
hero member
Activity: 756
Merit: 502
just compiled it under ubuntu 13.10 server, but something seems to be very wrong. It looks like only one card is used out of my 5x750Ti rig. Even with -d 0,1,2,3,4 only one card seems to be working (i guess that from the cards' temperatures):

well he made substantial changes in unfamiliar code with only a few days to spare, so it's expected that he broke a couple of things. Don't worry, I'll clean up.

Christian
legendary
Activity: 914
Merit: 1001
Hmm, Linux compilation of cudaminer is borked.

cpu-miner.c won't build, most likely because Alexey used C++-isms, which work on Windows because I compile this module with the /TP flag in order to trick it into allowing inline declarations of variables (and other things requiring C99 support, which Visual Studio 2010 is lacking).

EDIT: it's fixed!

Christian


just compiled it under ubuntu 13.10 server, but something seems to be very wrong. It looks like only one card is used out of my 5x750Ti rig. Even with -d 0,1,2,3,4 only one card seems to be working (i guess that from the cards' temperatures) using scrypt-n:



Edit: switched back to tagged release 2014-2-28 and everything is back to normal.
newbie
Activity: 52
Merit: 0

I should try djm34's version because I don't get that sort of performance on scrypt with my gtx780ti.
For scrypt-jane try to autotune first

djm... it was an EVGA 780Ti Superclocked with Skynet BIOS. OC was +135 (total 1180.6).

The autotune for YAC works on my 780, so I will try the 750's when I get home tonight, as I have to leave for work in a few.

legendary
Activity: 1400
Merit: 1050
I tried the binaries posted by djm34 and it looks like I am getting a little better scrypt performance with the new version:

GTX 780Ti: ~20-40 khash improvement. It might be me, but I think I'm seeing a little more CPU usage as well. It's a little too early in the morning for me, though.


However scrypt-jane appears to be broken:

I should try djm34's version because I don't get that sort of performance on scrypt with my gtx780ti.
For scrypt-jane try to autotune first