Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 735. (Read 3426922 times)

member
Activity: 84
Merit: 10
6x 750 Ti's (OC'd to stable + BIOS Power Mod)

Pretty please could you edit your post to include the exact make and model of your cards? It helps other people greatly :) Thank you!
His results are consistent with my 8x Zotac 750 Ti's on Win7. Note that in my testing there was no need for the BIOS power mod, since it yielded no increase in hashrate or stability, just extra power consumption. My overclock is +135 core, +600 mem on all 8 cards.

Benchmarking results of each of my Zotacs here: https://bitcointalksearch.org/topic/m.5441080
member
Activity: 70
Merit: 10

What I can't understand is why T12x10 gives me 188 kh/s


Something must be bottlenecking you: RAM, VRAM, CPU, PCI?

I'm running a 3960x with 16gb of ram. The only thing that could possibly be causing issues is that I've got all 4 slots populated with 780's. Don't know if that would make a difference or not.
Yeah that does make a difference, looks like Christian is working on fixing that though.

Wait really? It should mine better if I disabled one of my 780's? -_-

I guess I can test it when I get home and flip one of the dip switches on the RIVE
full member
Activity: 182
Merit: 100

What I can't understand is why T12x10 gives me 188 kh/s


Something must be bottlenecking you: RAM, VRAM, CPU, PCI?

I'm running a 3960x with 16gb of ram. The only thing that could possibly be causing issues is that I've got all 4 slots populated with 780's. Don't know if that would make a difference or not.
Yeah that does make a difference, looks like Christian is working on fixing that though.
full member
Activity: 140
Merit: 100
Quote
[2014-03-03 15:45:15] GPU #0: cudaError 30 (unknown error) calling 'cudaStreamQuery(context_streams[stream][thr_id])' (D:/Christian/Documents/Visual Studio 2010/Projects/CudaMiner/salsa_kernel.cu line 958)

[2014-03-03 15:45:15] GPU #0: cudaError 30 (unknown error) calling 'cudaStreamWaitEvent(context_streams[stream][thr_id], context_serialize[(stream+1)&1][thr_id], 0)' (D:/Christian/Documents/Visual Studio 2010/Projects/CudaMiner/salsa_kernel.cu line 946)

[2014-03-03 15:45:15] GPU #0: cudaError 30 (unknown error) calling 'cudaMemcpyAsync(hash, context_hash[stream][thr_id], mem_size, cudaMemcpyDeviceToHost, context_streams[stream][thr_id])' (D:/Christian/Documents/Visual Studio 2010/Projects/CudaMiner/sha256.cu line 446)

When Cudaminer does this, attempting to mine afterwards results in a very, very low hashrate. Is there any way to resolve this without having to reboot the computer?
sr. member
Activity: 476
Merit: 250
For some reason my graphics driver crashes when autotuning with higher lookup-gaps. Not immediately, but when it's almost done.
The crashing is due to a time-out... haven't found a way to solve it yet. Use the -D option to at least get some readings before the crash.
The timeout settings can be changed in the registry:
http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx

PS- Does it normally list your folder structure on errors? D:\Christian\Documents\Visual Studio....etc?

WDDM timeout can only be disabled on non-display driving cards.

I added code to not show my folder structure, but I wasn't really expecting forward slashes to be used on Windows in the __FILE__ macro.  So currently this only works on Linux to suppress my path.

Well apparently Windows 8 has issues with ones that aren't driving displays because the one driving the display never seemed to get knocked out.  I thought it was temps or the lower pcie slots, but changing configs didn't make a difference.  So far so good after registry change.

I've also got a few on risers now so thanks for working on that fix as well.
full member
Activity: 140
Merit: 100
6x 750 Ti's (OC'd to stable + BIOS Power Mod)

Pretty please could you edit your post to include the exact make and model of your cards? It helps other people greatly :) Thank you!
full member
Activity: 196
Merit: 100
I have been getting an odd error with the latest version 2014-02-18

It doesn't happen right away, but after a few hours of running.

I have reverted to cudaminer-2013-12-18 which doesn't exhibit this issue.

For both versions I am using x64 version.
My full system detail
https://drive.google.com/file/d/0B5cEvOA-L4zcd3dXN1R1Z1Biak0/edit?usp=sharing

I run via a bat file that just says:
cudaminer.exe -o stratum+tcp://middlecoin.com:3333 -u USERNAME -p x
member
Activity: 70
Merit: 10

What I can't understand is why T12x10 gives me 188 kh/s


Something must be bottlenecking you: RAM, VRAM, CPU, PCI?

I'm running a 3960x with 16gb of ram. The only thing that could possibly be causing issues is that I've got all 4 slots populated with 780's. Don't know if that would make a difference or not.
legendary
Activity: 1400
Merit: 1050
Guys, check out the development of auroracoin. It is poised to replace LTC in coin market cap... (okay, the huge premine artificially inflates the market cap of course, the only tradable coins at the moment are the freshly mined ones!)

The only major risk is that the intended airdrop to Icelanders will be a major disaster. In this case, expect the coin to crash...



Been trying to mine aurora since it was around diff 300.. but completely unable to find a stable pool :(  They've gotten better but the diff is through the roof now
Guys, check out the development of auroracoin. It is poised to replace LTC in coin market cap... (okay, the huge premine artificially inflates the market cap of course, the only tradable coins at the moment are the freshly mined ones!)

The only major risk is that the intended airdrop to Icelanders will be a major disaster. In this case, expect the coin to crash...

I think many people will set up their proxy in iceland...
actually if you want to mine it, p2pool from a private node is the best option. (I don't think I saw one though)

edit: I saw one, I have even compiled it few days ago... (forgot about it...)

Funny, I ran on the p2pool for something like 2 or 3 hours last friday or saturday, I still received payout from my shares...
sr. member
Activity: 280
Merit: 250
Guys, check out the development of auroracoin. It is poised to replace LTC in coin market cap... (okay, the huge premine artificially inflates the market cap of course, the only tradable coins at the moment are the freshly mined ones!)

The only major risk is that the intended airdrop to Icelanders will be a major disaster. In this case, expect the coin to crash...



Been trying to mine aurora since it was around diff 300.. but completely unable to find a stable pool :(  They've gotten better but the diff is through the roof now
hero member
Activity: 756
Merit: 502
Guys, check out the development of auroracoin. It is poised to replace LTC in coin market cap... (okay, the huge premine artificially inflates the market cap of course, the only tradable coins at the moment are the freshly mined ones!)

The only major risk is that the intended airdrop to Icelanders will be a major disaster. In this case, expect the coin to crash...
hero member
Activity: 756
Merit: 502
My feeling is that I'm leaving about 10-30 kH/s per card on the table because of x1 slots / risers.

this will be fixed at the software level soon'ish. I will be using risers myself, so it's in my best interest to fix it.

Christian
sr. member
Activity: 350
Merit: 250
Hmm, 173 khash/s is all I can get out of my 780

Are those the right numbers for a 780? My asus 680 gives me 356 kh, and that's with stock settings


bigjme, after a crash of a CUDA kernel, the GPU clocks are sometimes throttled to half of the peak speeds. A reboot may fix it. You can check your clocks with GPU-z...


I was using lower case T not upper case. Switching to uppercase T fixed it
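
Since the case of the kernel letter matters here, a small C++ sketch of how a launch-config string such as T12x10 breaks down may help. This is only an illustration under assumptions: it treats the string as an optional kernel letter followed by blocks x warps, and LaunchConfig, read_number and parse_launch_config are hypothetical names, not cudaminer's actual parser.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical container for a launch config such as "T12x10".
struct LaunchConfig {
    char kernel;  // kernel prefix letter with its case preserved, '\0' if absent
    int blocks;   // number before the 'x'
    int warps;    // number after the 'x' (one CUDA warp is 32 threads)
};

// Reads a run of decimal digits starting at position i.
// Returns false if there are no digits or the value is absurdly large.
static bool read_number(const std::string& s, std::size_t& i, int& value) {
    const std::size_t start = i;
    value = 0;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
        if (value > 100000) return false;
        value = value * 10 + (s[i] - '0');
        ++i;
    }
    return i > start;
}

// Parses "[letter]<blocks>x<warps>". The letter is NOT case-folded,
// so "T12x10" and "t12x10" name different kernels.
static bool parse_launch_config(const std::string& s, LaunchConfig& out) {
    std::size_t i = 0;
    out.kernel = '\0';
    if (i < s.size() && std::isalpha(static_cast<unsigned char>(s[i]))) {
        out.kernel = s[i++];
    }
    if (!read_number(s, i, out.blocks)) return false;
    if (i >= s.size() || s[i] != 'x') return false;
    ++i;
    if (!read_number(s, i, out.warps)) return false;
    return i == s.size() && out.blocks > 0 && out.warps > 0;
}

// Threads per block implied by a config, using the fixed warp size of 32.
static int threads_per_block(const LaunchConfig& c) {
    return c.warps * 32;
}
```

With this sketch, "T12x10" and "t12x10" parse to the same block and warp counts but different kernel letters, which is the distinction that made the difference above.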
newbie
Activity: 21
Merit: 0
6x 750 Ti's (OC'd to stable + BIOS Power Mod)
-2x Zotac Reference
-4x ASUS OC'd w/ 6-pin PCIe header
-1x ASUS OC'd w/ 6-pin PCIe header (not currently in use...someone please help me get 7 cards working on a Z87-Pro!) :)
ASUS Z87-Pro
Intel G3220
4GB DDR1333
1kW PSU
320GB HDD
6x USB3.0 x1-x16 Powered Riser Cable Assemblies

W/ Windows 8.1, scaling past two cards was futile: even placed in x16 slots and running at PCIe 3.0 x8 each, there was a small performance hit. With six cards on the riser cables, I could not get more than 230-240 kH/s from the cards, and some had a hard time hitting 200 kH/s. I was not getting above 1400 kH/s total. The machine was not stable enough to allow overclocking via a software tool (MSI AB).

Same HW setup w/ Ubuntu 12.04 yielded about 1600 kH/s (BIOS power mod+modest MEM CLK increase).

Same HW setup w/ WINDOWS 7, was able to individually OC with MSI AB, each card, and get ~ 1750 kH/s out of six riser'd cards. This is the closest I could get to the 1800 kH/s holy grail.

Someone mentioned that disabling the iGPU would allow the 7th PCIe slot to be utilized? That sounds interesting. :)

I have also witnessed the performance degradation from using anything but a PCIe 3.0 x16 slot in its native mode. Going down to x8 makes a small hit. Forcing a PCIe 3.0 x16 slot to operate in x1 mode causes about the same performance hit as a x1 riser.


So there were two issues going on here...the x1 / riser performance hit....and then a Windows 8.1 scalability issue. No idea WTF was going on with W8.1, but I'm sticking with W7x64 for now for sure.

Just having six cards connected + W8.1 = unstable feeling system.

Here is screen capture of W7x64 with six riser'd GTX 750 Ti's.
http://1drv.ms/1fBX72c

My feeling is that I'm leaving about 10-30 kH/s per card on the table because of x1 slots / risers.
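
For reference, the per-card numbers implied by the totals in this post can be worked out with a few lines of C++. This is just arithmetic on the figures quoted above (1400, 1600 and 1750 kH/s across six cards, with 1800 kH/s as the target); the function names are made up for the illustration.

```cpp
// Average hashrate per card for a rig total, in kH/s.
static double per_card_khs(double total_khs, int cards) {
    return total_khs / cards;
}

// How far each card is, on average, from a target rig total, in kH/s.
static double shortfall_per_card(double total_khs, double target_khs, int cards) {
    return (target_khs - total_khs) / cards;
}
```

Plugging in the totals gives roughly 233 kH/s per card on Windows 8.1, 267 on Ubuntu 12.04 and 292 on Windows 7, against 300 per card for the 1800 kH/s target. So the Windows 7 setup averages a bit over 8 kH/s per card short of that target.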
legendary
Activity: 1400
Merit: 1050

We should experiment using SIMPLE also for scrypt with N>=2048. Those with access to the source code
and a working compiler environment can already play with swapping ANDERSEN for SIMPLE in the kernel launches (the places with three brackets like <<< >>>) to check if there are any benefits to be had.

Christian

In which file is this ?

titan_kernel.cu (compute 3.5) or kepler_kernel.cu (compute 3.0)
OK, I recompiled with the SIMPLE method.
That doesn't seem to improve the hashrate.
I usually get around 300 khash/s at t15x32; here, with L1, I get around 200 khash/s.
If I change to L2, I get a somewhat better hashrate, 280 khash/s at t32x32 (if I remember correctly), but it is still lower than what I get with the other method.

Now that doesn't mean it won't work for other types of 780 Ti (T15x24 and T15x16 don't work at all for me and give something around 15 khash/s).
newbie
Activity: 6
Merit: 0

We should experiment using SIMPLE also for scrypt with N>=2048. Those with access to the source code
and a working compiler environment can already play with swapping ANDERSEN for SIMPLE in the kernel launches (the places with three brackets like <<< >>>) to check if there are any benefits to be had.

Christian

In which file is this ?

titan_kernel.cu (compute 3.5) or kepler_kernel.cu (compute 3.0)

I would have a crack at it myself, but I don't compile stuff (never learnt :/). I tried autotuning and it gave me the same config that I previously posted, so I guess that's good :P

I'm not sure how to force the autotuner to autotune for "t kernels" only rather than either kernel. I tried setting "-l t" in the config file, but cudaminer handled it as an invalid argument and autotuned as usual (eventually giving me T15x16).

Is there something I'm missing here?
hero member
Activity: 756
Merit: 502

We should experiment using SIMPLE also for scrypt with N>=2048. Those with access to the source code
and a working compiler environment can already play with swapping ANDERSEN for SIMPLE in the kernel launches (the places with three brackets like <<< >>>) to check if there are any benefits to be had.

Christian

In which file is this ?

titan_kernel.cu (compute 3.5) or kepler_kernel.cu (compute 3.0)
hero member
Activity: 756
Merit: 502
For some reason my graphics driver crashes when autotuning with higher lookup-gaps. Not immediately, but when it's almost done.
The crashing is due to a time-out... haven't found a way to solve it yet. Use the -D option to at least get some readings before the crash.
The timeout settings can be changed in the registry:
http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx

PS- Does it normally list your folder structure on errors? D:\Christian\Documents\Visual Studio....etc?

WDDM timeout can only be disabled on non-display driving cards.

I added code to not show my folder structure, but I wasn't really expecting forward slashes to be used on Windows in the __FILE__ macro.  So currently this only works on Linux to suppress my path.
legendary
Activity: 1400
Merit: 1050
can I be doing better with something else?

Yeah, for Vertcoin there is a bit of a performance cliff, which you can see firsthand when running
autotune with the -D flag. At some warp number (dependent on the block count) the performance
drops drastically. So the ideal configurations for normal scrypt with warp numbers at the kernel's
limit (x24) don't work.

I think this must be due to saturating/overloading the memory controller.

Have you ever tried autotuning the lower case "t" kernel for Vertcoin? This used to be the fastest
kernel before nVidia submitted something better. This "t" kernel implements two different memory
access schemes. One is called SIMPLE (used for Yacoin), the other is called ANDERSEN (used for scrypt).

We should experiment using SIMPLE also for scrypt with N>=2048. Those with access to the source code
and a working compiler environment can already play with swapping ANDERSEN for SIMPLE in the kernel launches (the places with three brackets like <<< >>>) to check if there are any benefits to be had.

Christian

In which file is this ?