[ANN][YAC] Yacminer GPU miner for Yacoin - page 11.

mikaelh

sr. member

Activity: 301

Merit: 250

Quote from: FreeTrade on July 08, 2013, 02:20:34 PM

Quote from: mikaelh on July 08, 2013, 12:49:22 PM

GPUs being 10x as efficient sounds a bit dubious but I don't have the numbers right now to check that. I'm not sure how you did your math either. Did you calculate when the efficiency of GPUs would meet with CPUs? Also you seem to be assuming that GPUs scale linearly when cores go idle which may not be the case. I actually did try idle cores but I don't remember the numbers unfortunately.

Yes, just using some rough numbers and simple assumptions as best I am able with my limited knowledge. Also assuming the hardware stays relatively the same.

Interesting that you say that some idle cores on the GPU would allow others to run faster . . . less heat to dissipate I'm guessing . . . and I guess the GPU could cycle through its cores, so as one got hotter, it could switch to a cooler one.

Well, it's almost linear. I actually did a quick benchmark of it.

Code:

Active cores | Hashrate
-------------+----------
100%         | 59.88
50%          | 30.78
25%          | 14.61
12.5%        | 7.361

I would assume that power consumption would go down as well but I cannot measure that now. It is very unlikely that it would scale linearly though.
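As a rough check on "almost linear", the figures in the table can be compared with perfect linear scaling from the 100% row. This is only a minimal Python sketch: the unit is assumed to be kh/s (the table does not state it) and the names are made up for illustration.

```python
# Benchmark figures from the table: fraction of active cores -> hashrate.
# Assumption: the unit is kh/s; the table itself does not say.
benchmark = {1.0: 59.88, 0.5: 30.78, 0.25: 14.61, 0.125: 7.361}

full_rate = benchmark[1.0]


def linear_ratio(fraction, rate):
    """Measured hashrate divided by what perfect linear scaling predicts."""
    return rate / (full_rate * fraction)


ratios = {fraction: linear_ratio(fraction, rate)
          for fraction, rate in benchmark.items()}

for fraction, ratio in ratios.items():
    print(f"{fraction:>6.1%} active: {ratio:.3f} of linear")
```

Every measured point lands within about 3% of the linear prediction.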

FreeTrade

legendary

Activity: 1470

Merit: 1030

Quote from: mikaelh on July 08, 2013, 12:49:22 PM

GPUs being 10x as efficient sounds a bit dubious but I don't have the numbers right now to check that. I'm not sure how you did your math either. Did you calculate when the efficiency of GPUs would meet with CPUs? Also you seem to be assuming that GPUs scale linearly when cores go idle which may not be the case. I actually did try idle cores but I don't remember the numbers unfortunately.

Yes, just using some rough numbers and simple assumptions as best I am able with my limited knowledge. Also assuming the hardware stays relatively the same.

Interesting that you say that some idle cores on the GPU would allow others to run faster . . . less heat to dissipate I'm guessing . . . and I guess the GPU could cycle through its cores, so as one got hotter, it could switch to a cooler one.

gyverlb

hero member

Activity: 896

Merit: 1000

Quote from: mikaelh on July 08, 2013, 12:09:48 PM

In either case, the difficulty is pretty low. If the number of HW errors is exceeding accepted shares with that difficulty, then it's definitely not good.

I'm wondering whether the p2pool fork ( https://github.com/Rav3nPL/p2pool-yac.git ) was based on a recent enough p2pool: there was a bug with scrypt where p2pool didn't handle the difficulty correctly and sent a lower one to the miners. These logs look exactly like what we had with Litecoin.

HW errors can be limited, but 5x70 cards perform poorly. The 7950 and 7970 are OK: I only had to tune memory and intensity for my particular setup to get half the hashrate I had with the previous N value (as expected). For the 5870 and 5970 it is more like a third.

mikaelh

sr. member

Activity: 301

Merit: 250

Quote from: FreeTrade on July 08, 2013, 11:57:22 AM

Thanks for the extra info and explanation. Based on some rough back of the envelope calculations, assuming a GPU with 4GB vs. a Quad Core CPU, assuming the CPU cores are 10x as efficient as the GPU cores at hashing (correct me if that sounds barmy) . . . we're looking at about an NFactor of 19 (August 2017) before CPUs draw even with GPUs. At this point, about 95% of the GPU cores could be idling for lack of memory, and most of the memory on the PC could still be unused. Does that sound about right?

GPUs being 10x as efficient sounds a bit dubious but I don't have the numbers right now to check that. I'm not sure how you did your math either. Did you calculate when the efficiency of GPUs would meet with CPUs? Also you seem to be assuming that GPUs scale linearly when cores go idle which may not be the case. I actually did try idle cores but I don't remember the numbers unfortunately.

mikaelh

sr. member

Activity: 301

Merit: 250

Quote from: gyverlb on July 08, 2013, 11:17:57 AM

Quote from: mikaelh on July 08, 2013, 10:57:35 AM

Just to double check, can you tell me the difficulty on the p2pool? The HW error numbers are a bit difficult to interpret because they are difficulty-1 shares. You need to multiply your accepted shares by the difficulty reported by the miner before you can compare them with HW errors.

Looking for this I noticed in my yacoin p2pool logs lots of:

Code:

2013-07-08 18:07:05.117352 Worker submitted share with hash > target:
2013-07-08 18:07:05.117431 Hash: fe983383877a82de6330326a41011914da8606f1900c3b669d4fcea9bb8b
2013-07-08 18:07:05.117495 Target: fffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

I'm not sure of the formula for Target -> diff for scrypt-jane, but that's an interesting yacminer/p2pool diff mismatch. Doesn't affect my effective hashrate though.

That's interesting. The numbers are missing leading zeros, but it does look like a target miss. If I add the leading zeros, I get:

Code:

Hash: 0000fe983383877a82de6330326a41011914da8606f1900c3b669d4fcea9bb8b
Target: 00000fffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

I'm not sure if that's a bug in Yacminer or P2pool. Yacminer is working fine with pushpools and yacoind.

Yacminer (cgminer) uses the target 0x0000ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff as difficulty 1. It then calculates difficulty as diff1_target / target. That gives a nice integer number. Your P2pool's difficulty should be 16.

Yacoind uses the target 00000000ffff0000000000000000000000000000000000000000000000000000 as difficulty 1. It then uses a tricky algorithm that corresponds roughly to diff1_target / target. That produces a floating point number. Your P2pool's difficulty should be about 0.00024413 that way.
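For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes difficulty is simply the difficulty-1 target divided by the share target (yacoind's actual algorithm is more involved), and the constant and function names are invented for illustration.

```python
# Difficulty-1 targets as quoted in this post (64 hex digits each).
CGMINER_DIFF1 = int("0000" + "f" * 60, 16)
YACOIND_DIFF1 = int("00000000ffff" + "0" * 52, 16)

# Share hash and target from the p2pool log, with leading zeros restored.
SHARE_HASH = int(
    "0000fe983383877a82de6330326a41011914da8606f1900c3b669d4fcea9bb8b", 16)
SHARE_TARGET = int("00000" + "f" * 59, 16)


def difficulty(diff1_target, target):
    """Assumed simplification: difficulty = diff1_target / target."""
    return diff1_target / target


cgminer_diff = difficulty(CGMINER_DIFF1, SHARE_TARGET)
yacoind_diff = difficulty(YACOIND_DIFF1, SHARE_TARGET)

print(f"cgminer-style difficulty: {cgminer_diff:.4f}")
print(f"yacoind-style difficulty: {yacoind_diff:.8f}")
print("hash misses target:", SHARE_HASH > SHARE_TARGET)
```

The two conventions differ only in the difficulty-1 constant, so the same share target maps to roughly 16 in one and roughly 1/4096 in the other.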

In either case, the difficulty is pretty low. If the number of HW errors is exceeding accepted shares with that difficulty, then it's definitely not good.

FreeTrade

legendary

Activity: 1470

Merit: 1030

Quote from: mikaelh on July 08, 2013, 09:50:04 AM

Well, my current theory is that N = 8192 will still be doable with lookup gap = 2 on my HD 7790. After that I will probably need to increase lookup gap when N hits 16384 near the end of September.

My HD 7790 has 896 cores and 1 GB of memory. I can allocate about 768 MB of that (which corresponds to thread concurrency = 12000 with lookup gap = 2). 896 is the absolute minimum value of TC needed to sustain 896 cores at N = 1024. The effective thread concurrency is divided by 2 for each Nfactor increase after N = 1024. At N = 8192, the lower bound of TC is 8 * 896 = 7168. My GPU still has enough memory for that but not after the next increase of N.

The exact technical details are fairly complicated so I'm not going to try to explain them here. I'm not even sure about all the details myself. Please note using the optimal value for TC is the key to being able to use high intensities with lookup gap = 2.

I will try to do some benchmarks and post some numbers to support my theory. That's going to take a while though.

Thanks for the extra info and explanation. Based on some rough back of the envelope calculations, assuming a GPU with 4GB vs. a Quad Core CPU, assuming the CPU cores are 10x as efficient as the GPU cores at hashing (correct me if that sounds barmy) . . . we're looking at about an NFactor of 19 (August 2017) before CPUs draw even with GPUs. At this point, about 95% of the GPU cores could be idling for lack of memory, and most of the memory on the PC could still be unused. Does that sound about right?

cryptrol

hero member

Activity: 637

Merit: 500

Using the TC that the table suggests for a 7950 does not work.

I get the best hashrate for a 7950 with these settings:

Quote

yacminer --scrypt -o http://pool -u me -p x --gpu-platform 0 -w 256 -I 19 --thread-concurrency 41216 --gpu-memclock 1483 --gpu-engine 1044 --lookup-gap 2

I get 120 kh/s with the above settings. I was getting 230 kh/s before the N change.

gyverlb

hero member

Activity: 896

Merit: 1000

Quote from: mikaelh on July 08, 2013, 10:57:35 AM

Just to double check, can you tell me the difficulty on the p2pool? The HW error numbers are a bit difficult to interpret because they are difficulty-1 shares. You need to multiply your accepted shares by the difficulty reported by the miner before you can compare them with HW errors.

Looking for this I noticed in my yacoin p2pool logs lots of:

Code:

2013-07-08 18:07:05.117352 Worker submitted share with hash > target:
2013-07-08 18:07:05.117431 Hash: fe983383877a82de6330326a41011914da8606f1900c3b669d4fcea9bb8b
2013-07-08 18:07:05.117495 Target: fffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

I'm not sure of the formula for Target -> diff for scrypt-jane, but that's an interesting yacminer/p2pool diff mismatch. Doesn't affect my effective hashrate though.

gyverlb

hero member

Activity: 896

Merit: 1000

Quote from: gyverlb on July 08, 2013, 10:42:54 AM

Quote from: mikaelh on July 08, 2013, 08:06:13 AM

I have discovered a new sweet spot for thread concurrency. After setting it to 6912 (which is 14 * 512 - 256), I'm getting 61 kh/s on a 7790 (instead of 59.3 kh/s with a TC of 8000). My full settings are:

Code:

--scrypt -w 128 --lookup-gap 2 -I 17 --thread-concurrency 6912 -g 1

For other models in the HD 7000 series, I suggest the following formula for thread concurrency with high intensities:

Code:

x * 512 - 256

x should be the number of compute units if possible. That's 14 for my HD 7790. For the HD 7000 series, the number of compute units should be the number of shaders divided by 64. You can also try other values.

Note that these values will need to be adjusted when N changes because thread concurrency is automatically adjusted by the kernel. Right now it's being divided by 4, so setting TC to 6912 means that the real TC is 1728.

It doesn't work for me with the 7950 and 7970. With your formula I get a higher hashrate in yacminer but lots of HW errors (several times the accepted shares), and less than half of the effective hashrate that my p2pool node reports with optimal settings.

I have to use 16x the number of shaders to avoid HW errors on 79x0 and get the best result. I can use higher numbers but it doesn't improve the effective hashrate.

As I mine on p2pool I had to reduce my intensity to 16 from 17 to cope with longer processing times. I noticed that gpu-memdiff=60 is the sweet spot too now (-150 was optimal with previous N).

On 5x70 cards the situation is noticeably different on my hardware: I need a thread concurrency above 20000 to avoid hardware errors and a low effective hashrate (20000 with lookup-gap 5 seems optimal, with some but relatively few hardware errors). The hashrate is ~30% of what it was before.

mikaelh

sr. member

Activity: 301

Merit: 250

Quote from: eule on July 08, 2013, 10:53:54 AM

yacminer.exe --scrypt -w 64 -I 10 --thread-concurrency 5120
Using that for my 5770, get 23-24Kh/s. Can't go higher than I 10, thread concurrency seems to be at the sweet spot, got 20kH around TC 8000 and 4000. Switching to YBC for now

The HD 5770 has 10 compute units. You can try adding/subtracting values like 64, 128, 256 to/from 5120. I'm not sure how exactly it works for HD 5000 series.

mikaelh

sr. member

Activity: 301

Merit: 250

Quote from: gyverlb on July 08, 2013, 10:42:54 AM

Quote from: mikaelh on July 08, 2013, 08:06:13 AM

I have discovered a new sweet spot for thread concurrency. After setting it to 6912 (which is 14 * 512 - 256), I'm getting 61 kh/s on a 7790 (instead of 59.3 kh/s with a TC of 8000). My full settings are:

Code:

--scrypt -w 128 --lookup-gap 2 -I 17 --thread-concurrency 6912 -g 1

For other models in the HD 7000 series, I suggest the following formula for thread concurrency with high intensities:

Code:

x * 512 - 256

x should be the number of compute units if possible. That's 14 for my HD 7790. For the HD 7000 series, the number of compute units should be the number of shaders divided by 64. You can also try other values.

Note that these values will need to be adjusted when N changes because thread concurrency is automatically adjusted by the kernel. Right now it's being divided by 4, so setting TC to 6912 means that the real TC is 1728.

It doesn't work for me with the 7950 and 7970. With your formula I get a higher hashrate in yacminer but lots of HW errors (several times the accepted shares), and less than half of the effective hashrate that my p2pool node reports with optimal settings.

I have to use 16x the number of shaders to avoid HW errors on 79x0 and get the best result. I can use higher numbers but it doesn't improve the effective hashrate.

As I mine on p2pool I had to reduce my intensity to 16 from 17 to cope with longer processing times. I noticed that gpu-memdiff=60 is the sweet spot too now (-150 was optimal with previous N).

Alright, thanks for testing. I don't have a 7970, so I'm just guessing here.

Just to double check, can you tell me the difficulty on the p2pool? The HW error numbers are a bit difficult to interpret because they are difficulty-1 shares. You need to multiply your accepted shares by the difficulty reported by the miner before you can compare them with HW errors.
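A minimal Python sketch of that comparison follows. The function name and the counter values are hypothetical and not taken from any miner output. It only shows the unit conversion: accepted shares are scaled to difficulty-1 units before being compared with the HW error count.

```python
def hw_error_ratio(hw_errors, accepted_shares, share_difficulty):
    """HW errors relative to accepted work, both in difficulty-1 units."""
    accepted_diff1 = accepted_shares * share_difficulty
    if accepted_diff1 <= 0:
        raise ValueError("need at least one accepted share to compare")
    return hw_errors / accepted_diff1


# Hypothetical counters: 40 HW errors against 100 accepted shares at difficulty 16.
ratio = hw_error_ratio(hw_errors=40, accepted_shares=100, share_difficulty=16)
print(f"HW errors are {ratio:.1%} of accepted difficulty-1 work")
```

Without the scaling, the same counters would look like 40 errors per 100 shares, which greatly overstates the problem.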

eule

hero member

Activity: 756

Merit: 501

yacminer.exe --scrypt -w 64 -I 10 --thread-concurrency 5120
Using that for my 5770, get 23-24Kh/s. Can't go higher than I 10, thread concurrency seems to be at the sweet spot, got 20kH around TC 8000 and 4000. Switching to YBC for now

gyverlb

hero member

Activity: 896

Merit: 1000

Quote from: mikaelh on July 08, 2013, 08:06:13 AM

I have discovered a new sweet spot for thread concurrency. After setting it to 6912 (which is 14 * 512 - 256), I'm getting 61 kh/s on a 7790 (instead of 59.3 kh/s with a TC of 8000). My full settings are:

Code:

--scrypt -w 128 --lookup-gap 2 -I 17 --thread-concurrency 6912 -g 1

For other models in the HD 7000 series, I suggest the following formula for thread concurrency with high intensities:

Code:

x * 512 - 256

x should be the number of compute units if possible. That's 14 for my HD 7790. For the HD 7000 series, the number of compute units should be the number of shaders divided by 64. You can also try other values.

Note that these values will need to be adjusted when N changes because thread concurrency is automatically adjusted by the kernel. Right now it's being divided by 4, so setting TC to 6912 means that the real TC is 1728.

It doesn't work for me with the 7950 and 7970. With your formula I get a higher hashrate in yacminer but lots of HW errors (several times the accepted shares), and less than half of the effective hashrate that my p2pool node reports with optimal settings.

I have to use 16x the number of shaders to avoid HW errors on 79x0 and get the best result. I can use higher numbers but it doesn't improve the effective hashrate.

As I mine on p2pool I had to reduce my intensity to 16 from 17 to cope with longer processing times. I noticed that gpu-memdiff=60 is the sweet spot too now (-150 was optimal with previous N).

mikaelh

sr. member

Activity: 301

Merit: 250

Quote from: mikaelh on July 08, 2013, 08:06:13 AM

I have discovered a new sweet spot for thread concurrency. After setting it to 6912 (which is 14 * 512 - 256), I'm getting 61 kh/s on a 7790 (instead of 59.3 kh/s with a TC of 8000). My full settings are:

Code:

--scrypt -w 128 --lookup-gap 2 -I 17 --thread-concurrency 6912 -g 1

For other models in the HD 7000 series, I suggest the following formula for thread concurrency with high intensities:

Code:

x * 512 - 256

x should be the number of compute units if possible. That's 14 for my HD 7790. For the HD 7000 series, the number of compute units should be the number of shaders divided by 64. You can also try other values.

Note that these values will need to be adjusted when N changes because thread concurrency is automatically adjusted by the kernel. Right now it's being divided by 4, so setting TC to 6912 means that the real TC is 1728.

If someone wants to help test the above formula, I calculated the suggested TC values for the HD 7000 series:

Code:

Model               | Shaders | Compute Units | Suggested TC
--------------------+---------+---------------+--------------
HD 7750             |     512 |             8 |         3840
HD 7770 GHz Edition |     640 |            10 |         4864
HD 7790             |     896 |            14 |         6912
HD 7850             |    1024 |            16 |         7936
HD 7870 GHz Edition |    1280 |            20 |         9984
HD 7870 XT          |    1536 |            24 |        12032
HD 7950             |    1792 |            28 |        14080
HD 7970             |    2048 |            32 |        16128

Use lookup gap = 2 and grab the thread concurrency from the table. Try increasing intensity as high as possible. You can ignore a few HW errors but if you start getting a lot of them, then it's not working.

mikaelh

sr. member

Activity: 301

Merit: 250

Quote from: FreeTrade on July 08, 2013, 03:36:23 AM

Quote from: mikaelh on July 03, 2013, 09:08:02 AM

Well, let's re-iterate this once more. GPU mining is not going to magically stop at N = 8192. My miner should survive it just fine. I'm not going to benchmark it because people always keep finding new settings after N changes. Some people will use a higher lookup gap and others will find ways to use lower values. I might come up with some new tricks but at the moment I'm pretty happy how the code scales with the lookup gap.

Hi Mikael. Firstly, thanks for your releases. Could you make a prediction about the long term viability of GPU mining? Yacoin was originally designed to be a currency that could be mined with CPU efficiently. Is that a busted flush now? Or will continued increments in N lead to a situation where CPU miners compete with, or even outmine GPU for equivalent energy input? I'm sure it's difficult to say for certain, but your guess is likely to be more educated than others.

Well, my current theory is that N = 8192 will still be doable with lookup gap = 2 on my HD 7790. After that I will probably need to increase lookup gap when N hits 16384 near the end of September.

My HD 7790 has 896 cores and 1 GB of memory. I can allocate about 768 MB of that (which corresponds to thread concurrency = 12000 with lookup gap = 2). 896 is the absolute minimum value of TC needed to sustain 896 cores at N = 1024. The effective thread concurrency is divided by 2 for each Nfactor increase after N = 1024. At N = 8192, the lower bound of TC is 8 * 896 = 7168. My GPU still has enough memory for that but not after the next increase of N.
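The lower-bound argument can be written out as a short Python sketch. It rests on one reading of the paragraph above: because the effective TC is halved with each N doubling, the TC that has to be configured doubles. The names are invented for illustration, and the 12000 limit is the approximate figure quoted for 768 MB with lookup gap = 2.

```python
BASE_N = 1024  # N at which one thread per core is taken as the minimum


def min_thread_concurrency(cores, n):
    """Lower bound on configured TC; doubles with every doubling of N past 1024."""
    return cores * (n // BASE_N)


CORES = 896                 # HD 7790
MAX_ALLOCATABLE_TC = 12000  # roughly 768 MB with lookup gap = 2

for n in (1024, 8192, 16384):
    needed = min_thread_concurrency(CORES, n)
    verdict = "fits" if needed <= MAX_ALLOCATABLE_TC else "does not fit"
    print(f"N = {n:>5}: minimum TC {needed:>5} ({verdict})")
```

At N = 16384 the bound is 16 * 896 = 14336, which is above the roughly 12000 the card can allocate, hence the need for a higher lookup gap.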

The exact technical details are fairly complicated so I'm not going to try to explain them here. I'm not even sure about all the details myself. Please note using the optimal value for TC is the key to being able to use high intensities with lookup gap = 2.

I will try to do some benchmarks and post some numbers to support my theory. That's going to take a while though.

Wolf0

member

Activity: 81

Merit: 1002

It was only the wind.

Does anyone know when N last changed?

paulthetafy

hero member

Activity: 820

Merit: 1000

Quote from: Wolf0 on July 08, 2013, 08:54:34 AM

Does anyone know when N last changed?

Sun 07 Jul 2013 21:54:40

http://yacexplorer.tk/graphs.htm

mikaelh

sr. member

Activity: 301

Merit: 250

I have discovered a new sweet spot for thread concurrency. After setting it to 6912 (which is 14 * 512 - 256), I'm getting 61 kh/s on a 7790 (instead of 59.3 kh/s with a TC of 8000). My full settings are:

Code:

--scrypt -w 128 --lookup-gap 2 -I 17 --thread-concurrency 6912 -g 1

For other models in the HD 7000 series, I suggest the following formula for thread concurrency with high intensities:

Code:

x * 512 - 256

x should be the number of compute units if possible. That's 14 for my HD 7790. For the HD 7000 series, the number of compute units should be the number of shaders divided by 64. You can also try other values.

Note that these values will need to be adjusted when N changes because thread concurrency is automatically adjusted by the kernel. Right now it's being divided by 4, so setting TC to 6912 means that the real TC is 1728.
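The formula and the current divide-by-4 adjustment can be sketched in Python. The function names are made up for illustration, and the divisor of 4 only reflects the current N, so treat it as a parameter rather than a constant.

```python
SHADERS_PER_COMPUTE_UNIT = 64  # HD 7000 series, as stated above


def suggested_tc(shaders):
    """Suggested thread concurrency: x * 512 - 256, with x = compute units."""
    compute_units = shaders // SHADERS_PER_COMPUTE_UNIT
    return compute_units * 512 - 256


def effective_tc(configured_tc, divisor=4):
    """TC the kernel actually uses after the N-dependent adjustment."""
    return configured_tc // divisor


tc = suggested_tc(896)  # HD 7790: 896 shaders, 14 compute units
print(tc, effective_tc(tc))
```

For the HD 7790 this gives 6912 configured and 1728 effective, matching the numbers above.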

mikaelh

sr. member

Activity: 301

Merit: 250

Quote from: nfuse on July 08, 2013, 07:50:57 AM

I tried mining for a while (about 18 hours) and didn't find any blocks at about 283 kh/s per card. Am I doing something wrong?

Settings I use for solo mining with Yacminer 3.3.1 x64:

yacminer --scrypt -o 127.0.0.1:8112 -u yacoin -p x -I 12 -w 256 --thread-concurrency 8192 -g 2

If your HW error count isn't going up, your miner should be doing just fine. It's just a matter of the difficulty being high and being unlucky in general.

nfuse

member

Activity: 97

Merit: 10

I tried mining for a while (about 18 hours) and didn't find any blocks at about 283 kh/s per card. Am I doing something wrong?

Settings I use for solo mining with Yacminer 3.3.1 x64:

yacminer --scrypt -o 127.0.0.1:8112 -u yacoin -p x -I 12 -w 256 --thread-concurrency 8192 -g 2

Topic: [ANN][YAC] Yacminer GPU miner for Yacoin - page 11. (Read 57543 times)