
Topic: Core, Bus, Cache... What is more important for mining? (Read 2191 times)

legendary
Activity: 2142
Merit: 1010
Newbie
Thx, but I switched to another project.
full member
Activity: 132
Merit: 100
Why "bond to a pool" and not allow it to be used by all?

Just to earn some money for beer.

I would recommend http://pool-x.eu

Contact info:
skype: ag2x3k
irc:#pool-x.eu @ Freenode iRC ( http://webchat.freenode.net/?channels=#pool-x.eu )
web: [email protected]
sr. member
Activity: 462
Merit: 250
I heart thebaron
If everyone uses a fast miner, it's the same as if everyone uses a slow one, coz the DIFFICULTY adjusts. So there is no point in giving it to everyone.

Why not offer it up for sale then? Require a key or some sort of license validation and make some cash from it.

There is a pretty fancy BTC mining utility on this forum, the flashiest I have ever seen, but it is also tied to a pool, and unfortunately that pool is not as popular as I am sure the pool owner would like it to be, regardless of how well his software performs and looks.
The 'pool binding' idea is bad... just sell it.
legendary
Activity: 2142
Merit: 1010
Newbie
Why "bond to a pool" and not allow it to be used by all?

Just to earn some money for beer.
legendary
Activity: 2142
Merit: 1010
Newbie
When I said "tweak" I meant "lowering". Yes, I lower the mining rate, coz there is no need to publish a very fast miner. It's enough for me if it's just 10% faster than any other miner.

Why would you lower it when you can make it better and when you're going to give it out for free anyway? That seems really counterintuitive.

If everyone uses a fast miner, it's the same as if everyone uses a slow one, coz the DIFFICULTY adjusts. So there is no point in giving it to everyone.
vip
Activity: 980
Merit: 1001


I'd like to publish my miner, but I wish to have it bonded to a particular pool, so now I'm looking for a pool owner who could cooperate with me. Do u know anyone btw?
Also, I'm not going to create Linux or MacOS editions of the miner, to avoid a situation where all miners (human) use only my miner (soft).
Why "bond to a pool" and not allow it to be used by all?

All of the pools have threads in this forum; you could post there or private message one? (they don't hide) Cheesy

In my experience, even when there is an excellent miner (soft), many users (for whatever reason) will continue to use the one they have or other miners they think work better for them. If your software is truly revolutionary, someone will port it, even if you don't Smiley
hero member
Activity: 784
Merit: 1000
bitcoin hundred-aire
When I said "tweak" I meant "lowering". Yes, I lower the mining rate, coz there is no need to publish a very fast miner. It's enough for me if it's just 10% faster than any other miner.

Why would you lower it when you can make it better and when you're going to give it out for free anyway? That seems really counterintuitive.
legendary
Activity: 2142
Merit: 1010
Newbie
>> I'm not sure if I fully understand this part. What does "tweak the ratio per GHz" mean?

When I said "tweak" I meant "lowering". Yes, I lower the mining rate, coz there is no need to publish a very fast miner. It's enough for me if it's just 10% faster than any other miner.

>> Have you noticed that pooler has also updated his miner and introduced processing of two hashes at once?

No, I didn't know that. If he didn't follow my advice to calculate 2 hashes in a manner where the instructions are interleaved to avoid CPU stalls, then he still has a trick to use.

>> Does your miner have any speed advantage over it?

The current version calculates 2000 hashes per 1 GHz. If pooler's miner has the same ratio, then there is no speed advantage in using my miner. I can't test it myself coz the Windows version of pooler's miner crashes on my computer.

>> After all, your benchmark numbers look pretty normal for "two hashes at once" processing.

Yes, seems so.

>> How much does the algorithmic optimization actually contribute?

Not much; when I remove all the unnecessary code I get 2500 hashes per 1 GHz.
newbie
Activity: 39
Merit: 0
Yes, I removed some redundant calculations from salsa (not all though, I use them to tweak the ratio per GHz). It's the only trick I use. All others are just standard optimization techniques.
I'm not sure if I fully understand this part. What does "tweak the ratio per GHz" mean?

Quote
I'd like to publish my miner, but I wish to have it bonded to a particular pool, so now I'm looking for a pool owner who could cooperate with me. Do u know anyone btw?
Also, I'm not going to create Linux or MacOS editions of the miner, to avoid a situation where all miners (human) use only my miner (soft).
Have you noticed that pooler has also updated his miner and introduced processing of two hashes at once? And actually, his commit on GitHub seems to be 2 days old already. Does your miner have any speed advantage over it? After all, your benchmark numbers look pretty normal for "two hashes at once" processing. How much does the algorithmic optimization actually contribute?
legendary
Activity: 2142
Merit: 1010
Newbie
I calculate 2 hashes at once to avoid register/memory read stalls (xmm0-xmm3 for 1st hash and xmm4-xmm7 for 2nd one).
OK, this explains why we get similar performance improvements (using pooler's code as a baseline for comparison). Do you have any other trick up your sleeve? Of course, unless you want to keep it secret Wink Anyway, I got the impression that you managed to somehow simplify the algorithm (for the N = 1024, r = 1, p = 1 configuration) and reduce the number of arithmetic operations, based on the information from this post.

Quote
So there was something else that increased performance.
My guess is that it's because Sandy Bridge is a nice microarchitecture improvement over previous processors (it eliminated register read stalls and instruction decoder bottlenecks). Agner Fog explains it quite well, and various benchmarkers/reviewers have also confirmed better performance per MHz.

Quote
After u mentioned the cache, I noticed that the computer from the picture had a very big cache size, 4 x 32 instead of 2 x 32 like on the other machines. I suspect that cache size is more important.
I think 4 x 32 just means four cores with 32K of L1 each; the other processor was likely dual-core. 32K is much smaller than the 128K needed for the hash calculation, which means that the L2 cache plays a more significant role. Because each core has 256K of L2 cache, parallel computation of two hashes at once makes a lot of sense, at least for processors without hyperthreading.

What are your plans regarding your miner?

Yes, I removed some redundant calculations from salsa (not all though, I use them to tweak the ratio per GHz). It's the only trick I use. All others are just standard optimization techniques.

I'd like to publish my miner, but I wish to have it bonded to a particular pool, so now I'm looking for a pool owner who could cooperate with me. Do u know anyone btw?
Also, I'm not going to create Linux or MacOS editions of the miner, to avoid a situation where all miners (human) use only my miner (soft).
newbie
Activity: 39
Merit: 0
I calculate 2 hashes at once to avoid register/memory read stalls (xmm0-xmm3 for 1st hash and xmm4-xmm7 for 2nd one).
OK, this explains why we get similar performance improvements (using pooler's code as a baseline for comparison). Do you have any other trick up your sleeve? Of course, unless you want to keep it secret Wink Anyway, I got the impression that you managed to somehow simplify the algorithm (for the N = 1024, r = 1, p = 1 configuration) and reduce the number of arithmetic operations, based on the information from this post.

Quote
So there was something else that increased performance.
My guess is that it's because Sandy Bridge is a nice microarchitecture improvement over previous processors (it eliminated register read stalls and instruction decoder bottlenecks). Agner Fog explains it quite well, and various benchmarkers/reviewers have also confirmed better performance per MHz.

Quote
After u mentioned the cache, I noticed that the computer from the picture had a very big cache size, 4 x 32 instead of 2 x 32 like on the other machines. I suspect that cache size is more important.
I think 4 x 32 just means four cores with 32K of L1 each; the other processor was likely dual-core. 32K is much smaller than the 128K needed for the hash calculation, which means that the L2 cache plays a more significant role. Because each core has 256K of L2 cache, parallel computation of two hashes at once makes a lot of sense, at least for processors without hyperthreading.

What are your plans regarding your miner?
legendary
Activity: 2142
Merit: 1010
Newbie
It could also be the effect of turbo boost. Can you share any additional details about your implementation? Are you calculating more than one hash in parallel? If so, it is important for the working set not to exceed the L2 cache size per CPU core.

Right now, with my own tweaks coincidentally developed just today (for now using intrinsics for prototyping purposes), I'm getting ~3.4 khash/s when calculating one hash at a time and ~5.1 khash/s when calculating two hashes at once, running a single thread on an Intel Core i7 860 @2.8GHz (turbo boost disabled). Pooler's hand-tuned assembly code runs at ~4.0 khash/s under the same conditions on my machine. But if hyperthreading comes into action, it ruins everything Smiley With 8 threads in total, both my and pooler's implementations get roughly the same ~3 khash/s per thread. Two hashes at once use a doubled working set (~256K of memory), and that's exactly the size of the L2 cache. There is no room for the second hardware thread on the same core, and hyperthreading apparently thrashes the L2 cache, killing the performance benefits. Converting the code to hand-tuned assembly may turn the tables though.

BTW, your Sandy Bridge CPU should also have an advantage per MHz over my Nehalem, because it does not suffer from register read stalls, which are a pest under register pressure.

I calculate 2 hashes at once to avoid register/memory read stalls (xmm0-xmm3 for 1st hash and xmm4-xmm7 for 2nd one). So there was something else that increased performance. After u mentioned the cache, I noticed that the computer from the picture had a very big cache size, 4 x 32 instead of 2 x 32 like on the other machines. I suspect that cache size is more important.
legendary
Activity: 2142
Merit: 1010
Newbie
Instruction set/pipelining/instructions per cycle are just as important as GHz when looking at CPUs... What were the other ones? The i5 probably takes fewer cycles per hash.

I recall there was no i5. Only this one...
newbie
Activity: 39
Merit: 0
I tested my miner for LTC an hour ago. The main task was to adjust the ratio of hashes per GHz. I lowered it to 1.87 (1870 hashes per 1 GHz), and it was 1.8-1.9 on most machines. But 1 computer had a 2.1 ratio, and I noticed no difference in hardware compared to the other machines. That puzzled me. Could u look at this screenshot and tell me what is so special about this machine? I have no idea.
It could also be the effect of turbo boost. Can you share any additional details about your implementation? Are you calculating more than one hash in parallel? If so, it is important for the working set not to exceed the L2 cache size per CPU core.

Right now, with my own tweaks coincidentally developed just today (for now using intrinsics for prototyping purposes), I'm getting ~3.4 khash/s when calculating one hash at a time and ~5.1 khash/s when calculating two hashes at once, running a single thread on an Intel Core i7 860 @2.8GHz (turbo boost disabled). Pooler's hand-tuned assembly code runs at ~4.0 khash/s under the same conditions on my machine. But if hyperthreading comes into action, it ruins everything Smiley With 8 threads in total, both my and pooler's implementations get roughly the same ~3 khash/s per thread. Two hashes at once use a doubled working set (~256K of memory), and that's exactly the size of the L2 cache. There is no room for the second hardware thread on the same core, and hyperthreading apparently thrashes the L2 cache, killing the performance benefits. Converting the code to hand-tuned assembly may turn the tables though.

BTW, your Sandy Bridge CPU should also have an advantage per MHz over my Nehalem, because it does not suffer from register read stalls, which are a pest under register pressure.
newbie
Activity: 40
Merit: 0
Instruction set/pipelining/instructions per cycle are just as important as GHz when looking at CPUs... What were the other ones? The i5 probably takes fewer cycles per hash.
legendary
Activity: 2142
Merit: 1010
Newbie
I tested my miner for LTC an hour ago. The main task was to adjust the ratio of hashes per GHz. I lowered it to 1.87 (1870 hashes per 1 GHz), and it was 1.8-1.9 on most machines. But 1 computer had a 2.1 ratio, and I noticed no difference in hardware compared to the other machines. That puzzled me. Could u look at this screenshot and tell me what is so special about this machine? I have no idea.
