MemoryCoin 2.0 Proof Of Work - page 5.

FreeTrade

legendary

Activity: 1470

Merit: 1030

Quote from: Sharky444 on December 04, 2013, 05:45:40 AM

FreeTrade, you forget that GPUs can have up to 300 GB/s of main memory bandwidth. So you get only a latency advantage with L3, not a bandwidth advantage. The data will not be in L2 at all, as you have only 256 KB per core.

Wow, yes, I actually hadn't realized the gap between newer GPUs and CPUs was so large. I'll need to reconsider.

Sharky444

hero member

Activity: 724

Merit: 500

FreeTrade, you forget that GPUs can have up to 300 GB/s of main memory bandwidth. So you get only a latency advantage with L3, not a bandwidth advantage. The data will not be in L2 at all, as you have only 256 KB per core.
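To make the "latency advantage, not a bandwidth advantage" point concrete, here is a rough back-of-the-envelope model in Python. It assumes streaming reads are limited only by bandwidth and dependent (pointer-chasing) reads only by latency. The 300 GB/s GPU figure is from the post; the L3 bandwidth and both latency values are hypothetical placeholders, not measurements.

```python
# Back-of-the-envelope model contrasting bandwidth-bound and latency-bound reads.
# Only the 300 GB/s GPU bandwidth comes from the post; the L3 bandwidth and
# both latencies are hypothetical placeholders for illustration.

GB = 1e9    # bandwidth scale (bytes per second)
NS = 1e-9   # seconds per nanosecond
LINE = 64   # bytes per cache line (assumed)


def streaming_time(total_bytes: int, bandwidth: float) -> float:
    """Seconds to read total_bytes sequentially at a sustained bandwidth (bytes/s)."""
    return total_bytes / bandwidth


def dependent_time(num_reads: int, latency: float) -> float:
    """Seconds for num_reads reads where each address depends on the previous
    value, so every read pays the full latency and nothing overlaps."""
    return num_reads * latency


# Hypothetical figures: equal bandwidth, very different latency.
MEMORY_LEVELS = {
    "CPU L3": {"bandwidth": 300 * GB, "latency": 10 * NS},
    "GPU GDDR5": {"bandwidth": 300 * GB, "latency": 300 * NS},
}

if __name__ == "__main__":
    chunk = 512 * 1024        # one 512 KB working set
    reads = chunk // LINE     # number of cache-line reads to touch it once
    for name, spec in MEMORY_LEVELS.items():
        s = streaming_time(chunk, spec["bandwidth"])
        d = dependent_time(reads, spec["latency"])
        print(f"{name}: streaming {s * 1e6:.2f} us, dependent {d * 1e6:.2f} us")
```

With equal bandwidth the two levels stream the chunk in the same time and differ only on dependent reads. Note that a GPU can also hide latency by keeping many independent reads in flight across threads, which this single-stream model ignores.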

FreeTrade

legendary

Activity: 1470

Merit: 1030

So, a small update to the algorithm: rather than using SHA512 to fill the pseudorandom data, I've decided to use Scrypt instead. This takes longer to generate, so it should protect against the possibility of a GPU simply generating the pseudorandom data and processing it as needed, rather than storing it and fetching it from main memory.
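The post doesn't spell out the exact construction, so here is a minimal Python sketch of the idea, assuming each 64-byte block of the buffer is derived from a seed plus a block index. The scrypt parameters (n=1024, r=1, p=1), the seed, and the 16 KB demo size are hypothetical, not MemoryCoin's actual values. The point is only that every block regenerated on demand costs a full scrypt evaluation instead of one SHA-512 call.

```python
import hashlib
import time

CHUNK = 64  # bytes produced per derivation (SHA-512 digest size)


def fill_sha512(seed: bytes, size: int) -> bytes:
    """Fill `size` bytes with SHA-512(seed || index) blocks (the old, cheap approach)."""
    out = bytearray()
    i = 0
    while len(out) < size:
        out += hashlib.sha512(seed + i.to_bytes(8, "little")).digest()
        i += 1
    return bytes(out[:size])


def fill_scrypt(seed: bytes, size: int, n: int = 1024, r: int = 1, p: int = 1) -> bytes:
    """Fill `size` bytes with scrypt(seed, salt=index) blocks.

    Each 64-byte block needs a full scrypt run (about 128*r*n bytes of
    scratch memory), so recomputing blocks on demand instead of storing
    them is far more expensive than with plain SHA-512.
    """
    out = bytearray()
    i = 0
    while len(out) < size:
        out += hashlib.scrypt(seed, salt=i.to_bytes(8, "little"),
                              n=n, r=r, p=p, dklen=CHUNK)
        i += 1
    return bytes(out[:size])


if __name__ == "__main__":
    seed = b"block header"  # hypothetical seed
    for name, fill in (("SHA-512", fill_sha512), ("scrypt", fill_scrypt)):
        t0 = time.perf_counter()
        data = fill(seed, 16 * 1024)
        print(f"{name}: {len(data)} bytes in {time.perf_counter() - t0:.4f} s")
```

Timings depend on the machine, but the per-block gap between the two functions is what would make "generate as needed" unattractive compared to "store and fetch".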

Here are comparisons of Intel and AMD processors, including their L2 and L3 cache sizes:

http://en.wikipedia.org/wiki/Comparison_of_AMD_processors
http://en.wikipedia.org/wiki/Comparison_of_Intel_processors

I think the L2/L3 caches on newer and older processors are not directly comparable, and it'll be difficult to tell how efficient a given processor will be without testing it.

For a process to be efficient, it'll need:

1. Reasonably fast access to 512 MB of memory (main memory)
2. Very fast access to 512 KB of memory (L2/L3 cache)

The first few processes on a GPU will have these, but then run out of them in the same way a CPU does. The GPU's additional processing power won't help it, because it won't have data to operate on.
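The "first few processes get the resources, then they run out" argument is a min() over two budgets. A small sketch, assuming each instance needs its own 512 MB in main memory and its own 512 KB of cache (the two figures above). The device configurations below are hypothetical examples; the 768 KB GPU L2 is the upper end of the range quoted later in the thread.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

PER_PROCESS_BULK = 512 * MB  # "reasonably fast" main-memory working set
PER_PROCESS_HOT = 512 * KB   # "very fast" cache-resident working set


def efficient_processes(main_memory: int, fast_cache: int) -> int:
    """Number of instances that can have both resources at once.

    Each instance needs its own 512 MB of main memory *and* its own
    512 KB of cache; whichever resource runs out first caps the count.
    """
    by_memory = main_memory // PER_PROCESS_BULK
    by_cache = fast_cache // PER_PROCESS_HOT
    return min(by_memory, by_cache)


if __name__ == "__main__":
    # Hypothetical configurations, for illustration only.
    configs = {
        "GPU, 3 GB memory, 768 KB L2": (3 * GB, 768 * KB),
        "CPU, 16 GB RAM, 8 MB L3": (16 * GB, 8 * MB),
    }
    for name, (mem, cache) in configs.items():
        print(f"{name}: {efficient_processes(mem, cache)} efficient instance(s)")
```

In this sketch the hypothetical GPU has memory for six instances but cache for only one, which is the "run out of them" effect described above.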

Sharky444

hero member

Activity: 724

Merit: 500

Quote from: AnonyMint on December 04, 2013, 04:18:55 AM

Also, as a separate problem that I will reveal: because of the 512 MB, it appears he is hitting main memory bandwidth on the CPU, not L2 bandwidth, and CPU main memory bandwidth is 10 to 30 times lower than the GPU's.

It appears from the OP that he thinks he is staying within L2 by computing XORs in 512 KB chunks, but it appears to me that he reads from the entire 512 MB space he has written, so there is no cache locality.

I haven't verified any of this against his algorithm, so he would need to test it to verify. I can only go by what I believe the description says it is doing. Perhaps my interpretation is incorrect.

Yes, it will not be in L2 (especially since that's only 256 KB per core), but probably in L3. GPUs have an L2 cache of 256-768 KB, but usually no L3. The problem is that GDDR5 bandwidth is probably as good as Intel's L3 bandwidth, but L3 latency is much shorter.

traderCJ

sr. member

Activity: 280

Merit: 250

Quote from: AnonyMint on December 04, 2013, 04:11:05 AM

Quote from: Sharky444 on December 04, 2013, 04:09:40 AM

Quote from: AnonyMint on December 03, 2013, 11:07:52 AM

Sorry, I can't give away my algorithms sooner. They will soon be open sourced.

Will you release the algo so that it could be used in MemoryCoin, or will it be a separate coin?

Both. But I don't know if MemoryCoin can use it, because the hash is necessarily slow. This affects denial-of-service rejection. I think there must be a holistic design to deal with that, but I don't claim to be omniscient. I wish FreeTrade the best in all his endeavors.

And I hate to talk about my vaporware. FreeTrade invited me to comment on this thread. I wish I could help him more now. The best way for me to help him is to rush my release.

He has other features and ideas for his coin, so there is probably plenty of room for differentiation. Let a thousand flowers bloom, and may the best be picked for a bouquet.

Looking forward to seeing what you're working on!

AnonyMint

hero member

Activity: 518

Merit: 521

Quote from: AnonyMint on December 04, 2013, 04:15:23 AM

Quote from: Sharky444 on December 04, 2013, 04:14:04 AM

Quote from: AnonyMint on December 04, 2013, 03:58:14 AM

However, the L2 are 256 MB on Intel so you need to adjust your 512 MB downwards.

L2 on Intel Core processors is 1-2 MB, not 256.

Excuse me, I meant 256 KB. I will correct my typo.

Also, as a separate problem that I will reveal: because of the 512 MB, it appears he is hitting main memory bandwidth on the CPU, not L2 bandwidth, and CPU main memory bandwidth is 10 to 30 times lower than the GPU's.

It appears from the OP that he thinks he is staying within L2 by computing XORs in 512 KB chunks, but it appears to me that he reads from the entire 512 MB space he has written, so there is no cache locality.

I haven't verified any of this against his algorithm, so he would need to test it to verify. I can only go by what I believe the description says it is doing. Perhaps my interpretation is incorrect.
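The locality concern can be put in numbers with a simple model, assuming reads are uniformly random 64-byte lines and the cache behaves like an LRU cache (both are assumptions, since the actual access pattern isn't given here). Under uniform random access, a cache holds at most cache/region of the lines, so the hit rate is roughly that ratio. The scaled-down simulation just checks the model.

```python
import random
from collections import OrderedDict

KB = 1024
MB = 1024 * KB
LINE = 64  # bytes per cache line (assumed)


def expected_hit_rate(cache_bytes: int, region_bytes: int) -> float:
    """Steady-state hit rate for uniformly random reads over a region:
    the cache can hold at most cache/region of the lines."""
    return min(1.0, cache_bytes / region_bytes)


def simulate_lru_hit_rate(cache_bytes: int, region_bytes: int,
                          reads: int = 100_000, seed: int = 1) -> float:
    """Simulate uniformly random cache-line reads through an LRU cache."""
    rng = random.Random(seed)
    capacity = cache_bytes // LINE
    num_lines = region_bytes // LINE
    cache = OrderedDict()
    hits = 0
    for _ in range(reads):
        line = rng.randrange(num_lines)
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / reads


if __name__ == "__main__":
    # Thread figures: 256 KB L2 vs a 512 KB chunk or the whole 512 MB.
    print("512 KB chunk:", expected_hit_rate(256 * KB, 512 * KB))
    print("512 MB space:", expected_hit_rate(256 * KB, 512 * MB))
    # Scaled-down simulation (16 KB cache) as a sanity check of the model.
    print("sim, region = 2x cache:", simulate_lru_hit_rate(16 * KB, 32 * KB))
    print("sim, region = 2048x cache:", simulate_lru_hit_rate(16 * KB, 32 * MB))
```

The model gives 0.5 for reads confined to a 512 KB chunk and 1/2048 (about 0.05%) for reads spread over the full 512 MB, i.e. effectively every read goes to main memory in the second case.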

AnonyMint

hero member

Activity: 518

Merit: 521

Quote from: Sharky444 on December 04, 2013, 04:14:04 AM

Quote from: AnonyMint on December 04, 2013, 03:58:14 AM

However, the L2 are 256 MB on Intel so you need to adjust your 512 MB downwards.

L2 on Intel Core processors is 1-2 MB, not 256.

Excuse me, I meant 256 KB. I will correct my typo.

Sharky444

hero member

Activity: 724

Merit: 500

Quote from: AnonyMint on December 04, 2013, 03:58:14 AM

However, the L2 are 256 MB on Intel so you need to adjust your 512 MB downwards.

L2 on Intel Core processors is 1-2 MB, not 256.

Quote from: AnonyMint on December 04, 2013, 04:11:05 AM

And I hate to talk about my vaporware. FreeTrade invited me to comment on this thread. I wish I could help him more now. The best way for me to help him is to rush my release.

Why don't you create a coin together with him? That would be best for the community.

AnonyMint

hero member

Activity: 518

Merit: 521

Quote from: Sharky444 on December 04, 2013, 04:09:40 AM

Quote from: AnonyMint on December 03, 2013, 11:07:52 AM

Sorry, I can't give away my algorithms sooner. They will soon be open sourced.

Will you release the algo so that it could be used in MemoryCoin, or will it be a separate coin?

Both. But I don't know if MemoryCoin can use it, because the hash is necessarily slow. This affects denial-of-service rejection. I think there must be a holistic design to deal with that, but I don't claim to be omniscient. I wish FreeTrade the best in all his endeavors.

And I hate to talk about my vaporware. FreeTrade invited me to comment on this thread. I wish I could help him more now. The best way for me to help him is to rush my release.

He has other features and ideas for his coin, so there is probably plenty of room for differentiation. Let a thousand flowers bloom, and may the best be picked for a bouquet.

Sharky444

hero member

Activity: 724

Merit: 500

Quote from: AnonyMint on December 03, 2013, 11:07:52 AM

Sorry, I can't give away my algorithms sooner. They will soon be open sourced.

Will you release the algo so that it could be used in MemoryCoin, or will it be a separate coin?

AnonyMint

hero member

Activity: 518

Merit: 521

Quote from: FreeTrade on December 03, 2013, 01:31:10 PM

Quote from: AnonyMint on December 03, 2013, 12:15:05 PM

Top-of-the-line GPUs have main memory bandwidth nearly the same as CPU L2 cache bandwidth, or within a factor of 2 or 3, e.g. the 2012-model AMD Tahiti at 264 GB per second. Some of the latest GPUs may reach 1 TB per second, nearly as fast as L1.

That's fine, as long as those GPUs are of similar cost to, and/or have higher energy requirements than, comparable CPUs. Even if GPUs are 2 or 3 times more efficient than CPUs, the capital investment still precludes GPU mining from being a viable business, which is the aim.

However, L2 is 256 KB on Intel, so you need to adjust your 512 KB downwards.

Also, AMD has no L2 and its L3 is significantly slower. Maybe you are not concerned about losing those who run AMD.

Quote from: FreeTrade on December 03, 2013, 01:31:10 PM

Quote from: AnonyMint on December 03, 2013, 12:15:05 PM

Worse, as far as I can see, your algorithm is trivially parallelizable.

The memory bus is the bottleneck - you can parallelize until you run out of bandwidth there.

I see one definite problem that makes your assumption false, and potentially a second one, but if I tell you what they are, I will give away a lot of the work I have done to make a truly CPU-only proof-of-work.

CPU-only will always have a slow hash. There is no way around it.

I guess you will find out when you release this and it is attacked by GPUs (and botnets). And if I release my open source, you can copy it (although you won't be able to, because the slow hash won't work in your overall design). I don't want to give you first-mover advantage by telling you now.

ludd

newbie

Activity: 21

Merit: 0

All my Core i7s are ready, just waiting for a sign!

FreeTrade

legendary

Activity: 1470

Merit: 1030

Quote from: AnonyMint on December 03, 2013, 12:15:05 PM

Top-of-the-line GPUs have main memory bandwidth nearly the same as CPU L2 cache bandwidth, or within a factor of 2 or 3, e.g. the 2012-model AMD Tahiti at 264 GB per second. Some of the latest GPUs may reach 1 TB per second, nearly as fast as L1.

That's fine, as long as those GPUs are of similar cost to, and/or have higher energy requirements than, comparable CPUs. Even if GPUs are 2 or 3 times more efficient than CPUs, the capital investment still precludes GPU mining from being a viable business, which is the aim.

Quote from: AnonyMint on December 03, 2013, 12:15:05 PM

Worse, as far as I can see, your algorithm is trivially parallelizable.

The memory bus is the bottleneck - you can parallelize until you run out of bandwidth there.
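"Parallelize until you run out of bandwidth" can be sketched as a throughput ceiling: adding workers raises compute linearly, but total traffic can't exceed the bus. In this sketch, 264 GB/s is the Tahiti figure quoted above, and 25.6 GB/s is the theoretical peak of dual-channel DDR3-1600, used as a stand-in for a desktop CPU. The bytes moved per hash and the per-worker rate are hypothetical.

```python
GB = 1e9  # decimal gigabytes, as memory bandwidth is usually quoted


def hash_rate(workers: int, per_worker_rate: float,
              bandwidth: float, bytes_per_hash: float) -> float:
    """Hashes per second for `workers` parallel instances.

    Compute scales with the number of workers, but total memory traffic
    can never exceed the bus bandwidth, so throughput is capped at
    bandwidth / bytes_per_hash.
    """
    compute_limit = workers * per_worker_rate
    bus_limit = bandwidth / bytes_per_hash
    return min(compute_limit, bus_limit)


GPU_BW = 264 * GB         # AMD Tahiti figure from the thread
CPU_BW = 25.6 * GB        # dual-channel DDR3-1600 peak (stand-in CPU)
BYTES_PER_HASH = 16e6     # hypothetical memory traffic per hash
PER_WORKER_RATE = 100.0   # hypothetical hashes/s per worker

if __name__ == "__main__":
    for workers in (4, 64, 1024):
        cpu = hash_rate(workers, PER_WORKER_RATE, CPU_BW, BYTES_PER_HASH)
        gpu = hash_rate(workers, PER_WORKER_RATE, GPU_BW, BYTES_PER_HASH)
        print(f"{workers} workers: CPU {cpu:.0f} h/s, GPU {gpu:.0f} h/s")
    # Once both sides are bus-bound, the advantage is just the bandwidth ratio.
    print(f"ceiling ratio: {GPU_BW / CPU_BW:.2f}")
```

The ratio of the two ceilings doesn't depend on the hypothetical per-hash traffic, so once both sides are bus-bound, bandwidth alone decides the comparison.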

AnonyMint

hero member

Activity: 518

Merit: 521

Quote from: FreeTrade on December 03, 2013, 11:17:28 AM

Quote from: AnonyMint on December 03, 2013, 11:07:52 AM

That can only be true because you've eliminated what you thought I meant to eliminate and thus made the GPU faster. I said you were getting closer, meaning you are going to learn an important lesson.

Sorry, I can't give away my algorithms sooner. They will soon be open sourced.

Not following you. A single hash takes 0.01 seconds, but performing related hashes in bulk on a CPU takes more like 0.0006 seconds per hash. A GPU can't just scale up, because it lacks the memory bandwidth and L2 cache.

Top-of-the-line GPUs have main memory bandwidth nearly the same as CPU L2 cache bandwidth, or within a factor of 2 or 3, e.g. the 2012-model AMD Tahiti at 264 GB per second. Some of the latest GPUs may reach 1 TB per second, nearly as fast as L1.

Worse, as far as I can see, your algorithm is trivially parallelizable.

P.S. ASICs scale too well and result in centralization of mining:

http://www.kotaku.com.au/2013/11/bitcoin-mining-is-getting-out-of-control/

FreeTrade

legendary

Activity: 1470

Merit: 1030

Quote from: AnonyMint on December 03, 2013, 11:07:52 AM

That can only be true because you've eliminated what you thought I meant to eliminate and thus made the GPU faster. I said you were getting closer, meaning you are going to learn an important lesson.

Sorry, I can't give away my algorithms sooner. They will soon be open sourced.

Not following you. A single hash takes 0.01 seconds, but performing related hashes in bulk on a CPU takes more like 0.0006 seconds per hash. A GPU can't just scale up, because it lacks the memory bandwidth and L2 cache.
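The two figures work out to about 100 versus about 1,667 hashes per second. One way to read the gap (an interpretation, not something the post states) is a fixed setup cost, such as generating the shared pseudorandom buffer, amortized over k related hashes. The 0.0094 s setup / 0.0006 s per-hash split below is hypothetical, chosen only so the model reproduces both quoted numbers.

```python
SINGLE_HASH_S = 0.01   # one isolated hash (figure from the post)
BULK_HASH_S = 0.0006   # per-hash cost when related hashes are done in bulk (from the post)


def hashes_per_second(seconds_per_hash: float) -> float:
    """Throughput of a single worker at a given cost per hash."""
    return 1.0 / seconds_per_hash


def amortized_time(k: int, setup_s: float, per_hash_s: float) -> float:
    """Average seconds per hash when k related hashes share one setup step
    (hypothetically, generating the shared pseudorandom buffer)."""
    return setup_s / k + per_hash_s


# Hypothetical split that reproduces both figures at k=1 and large k.
SETUP_S = SINGLE_HASH_S - BULK_HASH_S

if __name__ == "__main__":
    print(f"isolated: {hashes_per_second(SINGLE_HASH_S):.0f} hashes/s")
    print(f"bulk: {hashes_per_second(BULK_HASH_S):.0f} hashes/s")
    for k in (1, 10, 100, 1000):
        print(f"{k} related hashes: {amortized_time(k, SETUP_S, BULK_HASH_S):.6f} s each")
```

Under this reading, the roughly 16.7x bulk speedup comes from reusing the setup across related hashes, so additional parallel instances only help if each one can afford its own buffer.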

AnonyMint

hero member

Activity: 518

Merit: 521

Quote from: FreeTrade on December 03, 2013, 03:31:57 AM

Quote from: AnonyMint on December 03, 2013, 03:11:01 AM

A slow hash is a huge problem in terms of denial of service attacks on the mining nodes.

The hash looks like it will take about 0.01 seconds, maybe less with some optimization. That should be sufficiently fast.

That can only be true because you've eliminated what you thought I meant to eliminate and thus made the GPU faster. I said you were getting closer, meaning you are going to learn an important lesson.

Sorry, I can't give away my algorithms sooner. They will soon be open sourced.
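The disagreement over whether 0.01 seconds is "sufficiently fast" comes down to simple arithmetic: a node can verify at most cores / hash_time submissions per second, and every bogus submission an attacker sends burns one verification. A sketch, where the 0.01 s figure is from the thread and the 4-core node and the 1 microsecond "fast hash" are hypothetical comparison points.

```python
def max_verifications_per_second(hash_seconds: float, cores: int) -> float:
    """Upper bound on hashes a node can check per second if every core verifies."""
    return cores / hash_seconds


def cpu_fraction_spent(bogus_per_second: float, hash_seconds: float, cores: int) -> float:
    """Fraction of the node's CPU consumed just rejecting invalid submissions.
    Values of 1.0 or more mean the node is saturated."""
    return bogus_per_second * hash_seconds / cores


if __name__ == "__main__":
    cores = 4               # hypothetical node
    slow, fast = 0.01, 1e-6  # 0.01 s from the thread; 1 us is a hypothetical fast hash
    print("max verifications/s:", max_verifications_per_second(slow, cores))
    for attack_rate in (100, 400):
        print(f"{attack_rate} bogus/s -> "
              f"{cpu_fraction_spent(attack_rate, slow, cores):.0%} CPU (slow hash), "
              f"{cpu_fraction_spent(attack_rate, fast, cores):.4%} CPU (fast hash)")
```

With these numbers, a few hundred bogus submissions per second is enough to saturate the slow-hash node, while the same load is negligible with a fast hash.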

AnonyMint

hero member

Activity: 518

Merit: 521

You might consider how much DRAM is needed before the user notices his PC isn't performing correctly, even with CPU usage scaled down to 50%. If the paged virtual memory of his games is now swapping to the hard disk, they may slow down considerably.

Stinky_Pete

hero member

Activity: 560

Merit: 500

I like the RAM idea too. It will rule out many of the PCs in a botnet, assuming that PC enthusiasts with 16 GB are the sort of people who will notice that their machines are compromised. But it will also cut out many potential users, which is not a good thing. Perhaps (hastily checks own machines) 8 GB is the right level?

It's a tricky one: getting mass adoption of the coin requires it to run on the most basic machines, which are also the ones most likely to be in a botnet. Do machines in a botnet automatically send their mined coins to the same address?

superresistant

legendary

Activity: 2156

Merit: 1131