
Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 598. (Read 3426975 times)

newbie
Activity: 4
Merit: 0
NVIDIA GeForce 605 (Elitegroup)

Ouch. I think you may just have to accept you're not going to mine with that.
If it is elitegroup, you should try to mine ~BCX~ the coin for the elite

Have mercy: DABceSeUHRfeRnByhP3yGHnHk8Fx136ehR
legendary
Activity: 1400
Merit: 1050
NVIDIA GeForce 605 (Elitegroup)

Ouch. I think you may just have to accept you're not going to mine with that.
If it is elitegroup, you should try to mine ~BCX~ the coin for the elite
full member
Activity: 154
Merit: 100
NVIDIA GeForce 605 (Elitegroup)

Ouch. I think you may just have to accept you're not going to mine with that.
newbie
Activity: 4
Merit: 0
I'm having trouble with mining dogecoin. Getting 9 h/s. Should I do anything to the script instead of autotune?

I downloaded the 337.50 beta update from Nvidia with no improvement!

Here's my specs:

Windows 8 64-bit
Intel Pentium G2020 @ 2.90GHz   35 °C
Ivy Bridge 22nm Technology
4.00GB Single-Channel DDR3 @ 665MHz (9-9-9-24)
Acer Aspire XC600 (SOCKET 0)   28 °C
1023MB NVIDIA GeForce 605 (Elitegroup)   58 °C

cudaminer.exe -H 1 -i 0 -l auto -C 1 -o stratum+tcp://stratum.teamdoge.com:3333 -O   
legendary
Activity: 1400
Merit: 1050
I'm having trouble with mining dogecoin. Getting 9 h/s. Should I do anything to the script instead of autotune?
9 hashes per second? Sounds like my calculator can do more :)
He didn't say what he was using... :D
full member
Activity: 182
Merit: 100
I'm having trouble with mining dogecoin. Getting 9 h/s. Should I do anything to the script instead of autotune?
9 hashes per second? Sounds like my calculator can do more :)
newbie
Activity: 4
Merit: 0
I'm having trouble with mining dogecoin. Getting 9 h/s. Should I do anything to the script instead of autotune?
full member
Activity: 182
Merit: 100
Suddenly I no longer feel sad that I sold that bitcoin at the price I did. I am sad that I didn't sell my other bitcoin at the higher price :'(

Edit:
Just wondering: How can you people even get a profit out of cards like GTX 780?
sr. member
Activity: 350
Merit: 250
any ideas?

do you have two main functions in the code now?  does your extra main function use a different argument list?

Code:
int main(int argc, char **argv)

I checked all the code and there is no other main function. The only other place "main" appears in any of the files is inside the word "domain".

I will have to check through again and see what's going on.
hero member
Activity: 756
Merit: 502
any ideas?

do you have two main functions in the code now?  does your extra main function use a different argument list?

Code:
int main(int argc, char **argv)
hero member
Activity: 756
Merit: 502
How on earth did you manage that? We haven't been able to get over 13Mh/s

Just by benchmarking various launch configs until I found one that worked well, in addition to the other changes I listed in my original post. I modified the hefty_cpu_hash function in cuda_hefty1.cu. Changes made are expressed in this diff: https://gist.github.com/danryan/6a631e0ece773e5f6788

This change is potentially dangerous: the total number of threads run on the GPU is no longer aligned with the "throughput" variable used by the heavycoin scanhash function (passed in as the variable "threads" to the function you modified). This could lead to overlapping shares being found (the same nonce found twice, leading to rejects), part of the nonce space being skipped (not actually a problem), or buffers being overrun (potentially serious).

You need to add some code to compute the throughput variable (=total number of GPU threads) based on device properties, e.g. in an early function call to the cuda_hefty1.cu module.

Christian
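
One way to read the advice above is to derive the grid from the device's SM count and then define the throughput from that same grid, so the two cannot drift apart. The following is only a host-side sketch under that assumption: `LaunchPlan`, `plan_launch` and `skipped_nonces` are hypothetical names, not ccminer functions, and the SM count is assumed to come from the `multiProcessorCount` field filled in by `cudaGetDeviceProperties`.

```cpp
#include <cstdint>

// Hypothetical launch plan; the field names are illustrative, not ccminer's.
struct LaunchPlan {
    std::uint32_t blocks;             // grid size
    std::uint32_t threads_per_block;  // block size
    std::uint32_t throughput;         // total GPU threads = nonces per pass
};

// Derive the grid from the SM count and compute throughput from that same
// grid, so the host-side nonce loop and the kernel launch always agree.
inline LaunchPlan plan_launch(std::uint32_t sm_count,
                              std::uint32_t blocks_per_sm,
                              std::uint32_t threads_per_block) {
    LaunchPlan p;
    p.blocks = sm_count * blocks_per_sm;
    p.threads_per_block = threads_per_block;
    p.throughput = p.blocks * p.threads_per_block;
    return p;
}

// Nonces per pass that no thread touches when the launch is smaller than
// the throughput the host code still assumes.
inline std::uint32_t skipped_nonces(std::uint32_t assumed_throughput,
                                    std::uint32_t launched_threads) {
    return assumed_throughput > launched_threads
               ? assumed_throughput - launched_threads
               : 0;
}
```

With the 32 x 15 = 480 blocks of 768 threads mentioned elsewhere in this thread, `plan_launch(15, 32, 768).throughput` is 368640, so host code still assuming 524288 would leave 155648 nonces per pass unscanned.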
newbie
Activity: 19
Merit: 0
On the original version the program is using 683 blocks and 768 threads per block.
threads = 524288
threadsperblock = 768
dim3 grid((threads + threadsperblock-1)/threadsperblock); gives blocks = 683 (525055/768 = 683.66, rounded down by integer division; plain 524288/768 would give 682)

http://runnable.com/U0YK9Jzak4RoTzpU/ccminer-grid-dimensions-example-for-c%2B%2B
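
The block count in this post is a ceiling division, which can be checked without a GPU. This is a minimal sketch in plain C++; `blocks_for` and `surplus_threads` are hypothetical helper names, not functions from ccminer.

```cpp
#include <cstdint>

// Hypothetical helper: number of blocks needed so that
// blocks * threads_per_block covers at least `threads` threads.
constexpr std::uint32_t blocks_for(std::uint32_t threads,
                                   std::uint32_t threads_per_block) {
    // Ceiling division: round up so the last partial block is still launched.
    return (threads + threads_per_block - 1) / threads_per_block;
}

// Threads in the final block that fall beyond the requested count and so
// need a bounds check inside the kernel.
constexpr std::uint32_t surplus_threads(std::uint32_t threads,
                                        std::uint32_t threads_per_block) {
    return blocks_for(threads, threads_per_block) * threads_per_block - threads;
}

static_assert(blocks_for(524288, 768) == 683, "683 blocks are launched");
static_assert(surplus_threads(524288, 768) == 256, "last block is partly idle");
```

The 256 surplus threads are why a kernel launched this way normally compares its thread index against the total thread count before doing any work.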
newbie
Activity: 19
Merit: 0
On the original version the program is using 683 blocks and 768 threads per block.
With your modification it is using 32x15=480 blocks and 768 threads/block.
However, the number of threads is 524288, which in my opinion is the reason why I get
"does not validate on cpu" and why 683 got chosen, since it is just threads/threads_per_block.
This gives me around 36MHash/s

Yes, your numbers are correct, though it is not as simple as dividing the number of total threads by desired threads per block. Not all of the 524288 threads can be executed simultaneously; max resident threads for 3.x-5.x devices is 2048/SM (10240 on 750 Ti for example). However, they can be scheduled, and are processed once resources become available as previous tasks complete.
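
The residency figures above can be worked through in a few lines. This is a rough sketch only: `max_resident_threads` and `min_waves` are hypothetical names, "wave" is used informally for one full set of resident threads, and real occupancy also depends on block size, registers and shared memory, so the result is a lower bound on the number of passes.

```cpp
#include <cstdint>

// Upper bound on threads resident at once: SM count times the per-SM limit
// (2048 on compute capability 3.x-5.x, as stated in the post).
constexpr std::uint32_t max_resident_threads(std::uint32_t sm_count,
                                             std::uint32_t per_sm_limit = 2048) {
    return sm_count * per_sm_limit;
}

// Lower bound on how many times the GPU must refill with threads before
// all of `total_threads` have run.
constexpr std::uint32_t min_waves(std::uint32_t total_threads,
                                  std::uint32_t resident) {
    return (total_threads + resident - 1) / resident;
}

static_assert(max_resident_threads(5) == 10240, "750 Ti: 5 SMs");
static_assert(min_waves(524288, max_resident_threads(5)) == 52, "at least 52");
```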
 
I have the feeling it is faster because it throws away  a lot of things...

This is indeed what happens when you get the "does not validate" error. The CPU tries to recreate the hash one last time before submitting it as proof, and it gets dropped if it fails validation. Work in this case is simply trashed. I have not finished instrumenting the code fully to provide exact details. What I do have is verification from pools through higher reported hashrate (calculated from rate of valid shares) and in particular a correlated increase in valid share counts.
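
The CPU-side check described above can be pictured roughly as follows. This is a sketch of the idea, not ccminer's actual code: `Hash256`, `meets_target` and `validates_on_cpu` are hypothetical names, the hash function is passed in as a stand-in, and the hash is assumed to be eight 32-bit words with the most significant word last.

```cpp
#include <array>
#include <cstdint>

// Assumed layout: 256-bit value as eight 32-bit words, word 7 most significant.
using Hash256 = std::array<std::uint32_t, 8>;

// True if hash <= target when both are read as 256-bit numbers.
inline bool meets_target(const Hash256& hash, const Hash256& target) {
    for (int i = 7; i >= 0; --i) {  // compare the most significant word first
        if (hash[i] < target[i]) return true;
        if (hash[i] > target[i]) return false;
    }
    return true;  // an exact match still meets the target
}

// Hypothetical final check before submitting a share: recompute the hash on
// the CPU and drop the nonce if the result does not meet the target.
// `cpu_hash` stands in for the real hash function.
template <typename HashFn>
bool validates_on_cpu(std::uint32_t nonce, const Hash256& target,
                      HashFn cpu_hash) {
    return meets_target(cpu_hash(nonce), target);
}
```

A nonce that the GPU reported but that fails this check is never sent to the pool, which matches the observation that it shows up as "does not validate" rather than as a rejected share.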

Would be interesting to have Christian's opinion on that.
Is there a way to decrease the number of threads (assuming it works)?

Agreed, I will 100% defer to Christian on this subject :)

legendary
Activity: 1400
Merit: 1050
On the original version the program is using 683 blocks and 768 threads per block.
With your modification it is using 32x15=480 blocks and 768 threads/block.
However, the number of threads is 524288, which in my opinion is the reason why I get
"does not validate on cpu" and why 683 got chosen, since it is just threads/threads_per_block.
This gives me around 36MHash/s

I changed 768 to 512 and then I get 39~40MHash/s,
with no rejects but a high rate of "does not validate".
This means a large fraction of the shares are just thrown away.

I have the feeling it is faster because it throws away a lot of things...

Would be interesting to have Christian's opinion on that.
Is there a way to decrease the number of threads (assuming it works)?
newbie
Activity: 19
Merit: 0
What does this 768 correspond to? Is it the number of CUDA cores of the 750 Ti? (I need to see how this can be updated for the 780 Ti.)

Launching a CUDA kernel uses the following syntax (ignoring optional parameters for now):

Code:
kernel_name<<<blocks_per_grid, threads_per_block>>>(kernel_function_args...)

768 is the number of threads launched per block. The 750 Ti has 640 cores (128 per SM (multiprocessor), 5 SMs per card). The 780 Ti has 2880 cores (192 per SM, 15 SMs per card). I used a very basic calculation, essentially choosing a block count that is some multiple of the number of SMs. In the case of the 780 Ti, 100 * SM count, or 100 * 15 == 1500. I haven't looked closely at the 780's specs, so one might run into a limitation on how many blocks per grid the card can support. You should be able to glean additional information from the following references:

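The core counts and the block-count rule of thumb in this post reduce to two multiplications. A small sketch using only the figures quoted here; `Gpu`, `total_cores` and `blocks_for_gpu` are hypothetical names, and the multiplier of 100 is the poster's choice, not a recommended value.

```cpp
#include <cstdint>

// Per-card figures as quoted in the post.
struct Gpu {
    std::uint32_t sm_count;      // multiprocessors per card
    std::uint32_t cores_per_sm;  // CUDA cores per multiprocessor
};

constexpr Gpu kGtx750Ti{5, 128};
constexpr Gpu kGtx780Ti{15, 192};

constexpr std::uint32_t total_cores(Gpu g) {
    return g.sm_count * g.cores_per_sm;
}

// Rule of thumb from the post: block count as a multiple of the SM count.
constexpr std::uint32_t blocks_for_gpu(Gpu g, std::uint32_t multiple) {
    return g.sm_count * multiple;
}

static_assert(total_cores(kGtx750Ti) == 640, "5 SMs x 128 cores");
static_assert(total_cores(kGtx780Ti) == 2880, "15 SMs x 192 cores");
static_assert(blocks_for_gpu(kGtx780Ti, 100) == 1500, "100 x 15 SMs");
```

On compute capability 3.0 and later the x-dimension of a grid may hold up to 2^31 - 1 blocks, so 1500 blocks is well within the limit on these cards.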

legendary
Activity: 1400
Merit: 1050
I did a quick test of the code modification.
I get quite a lot of "hash for nonce ... does not validate on cpu"
However, the shares are accepted and the speed is 35Mh/s (instead of 28~30, depending on clock speed and number of instances).
What does this 768 correspond to? Is it the number of CUDA cores of the 750 Ti? (I need to see how this can be updated for the 780 Ti.)

I only modified cuda_hefty1.cu (I am lazy...), compiled with CUDA 5.5 (without --relocatable-device-code=true either) and, in principle, for compute capability 3.5.
sr. member
Activity: 350
Merit: 250
christian i just found a webserver to implement into ccminer for giving some json formatted output, i am able to implement it

i guess i need to add it into cpu-miner.c as thats the base
but reading through the code, i have a few options for the hashrate value, which of these is it i need?

1. 1e-3 * hashrate
2. hashrate
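
For what it's worth, the two options differ only in units, assuming `hashrate` is stored in hashes per second as the 1e-3 factor suggests: option 1 is then the same figure in kH/s. A minimal sketch of a JSON line carrying both; `hashrate_json` and the key names are hypothetical, not part of ccminer or of any webserver library.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a hashrate given in hashes per second as a
// small JSON object holding both the raw value and the kH/s value.
inline std::string hashrate_json(double hashrate_hps) {
    char buf[96];
    std::snprintf(buf, sizeof buf,
                  "{\"hashrate_hps\": %.0f, \"hashrate_khps\": %.2f}",
                  hashrate_hps, 1e-3 * hashrate_hps);
    return std::string(buf);
}
```

Calling `hashrate_json(35000000.0)` would yield `{"hashrate_hps": 35000000, "hashrate_khps": 35000.00}`; putting the unit in the key name saves the consumer from guessing which scale is meant.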

EDIT:

also since updating my drivers it seems i am unable to compile, i am getting the following message on new and old versions
Code:
C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 5.5.targets(592,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2010 -ccbin "M:\Program Files\Microsoft Visual Studio 10.0\VC\bin"  -I. -Icompat -Icompat\jansson -Icompat\getopt -I"..\pthreads\Pre-built.2\include" -I"..\curl-7.29.0\include" -I"..\OpenSSL-Win32\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include"    --keep --keep-dir Release -maxrregcount=64 --ptxas-options=-v --machine 32 --compile -cudart static -Xptxas -v,-abi=no     -DWIN32 -DNDEBUG -D_CONSOLE -D_CRT_SECURE_NO_WARNINGS -DCURL_STATICLIB -DSCRYPT_KECCAK512 -DSCRYPT_CHACHA -DSCRYPT_CHOOSE_COMPILETIME -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Zi  /MD  " -o Release\fermi_kernel.cu.obj "M:\CUDAMINER\Building\CudaMiner-13-2-14\fermi_kernel.cu"" exited with code 1.

can't test if my implemented webserver works or not without compiling :(

you know what, it's nice to have a good anti-virus, but not when i can't compile because it's too paranoid :'(

UPDATE:
so i added all my webserver stuff, its very basic additional code, about 12 lines. but it fails at the end of compile with this

Code:
1>cl : Command line warning D9025: overriding '/TC' with '/TP'
1>  util.c
1>  sha2.c
1>  cpu-miner.c
1>m:\cudaminer\building\ccminer-0.5\cpu-miner.c(1389): error C2731: 'main' : function cannot be overloaded
1>          m:\cudaminer\building\ccminer-0.5\cpu-miner.c(1388) : see declaration of 'main'
1>  Generating Code...
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

any ideas?
newbie
Activity: 19
Merit: 0

wow, i think at over 16Mh/s i would accept 1 validation error for every 12 accepted lmao

O.o   o.O Impressive! What was done to achieve these numbers?
sr. member
Activity: 350
Merit: 250

:D

wow, i think at over 16Mh/s i would accept 1 validation error for every 12 accepted lmao
full member
Activity: 263
Merit: 100