Topic: MemoryCoin 2.0 Proof Of Work - page 5. (Read 21451 times)

legendary
Activity: 1470
Merit: 1030
December 04, 2013, 06:18:17 AM
#42
Freetrade you forget that GPUs can have up to 300GB/s main memory bandwidth. So you get only a latency advantage with L3, not a bandwidth advantage. The data will not be in L2 at all, as you have only 256KB/core.

Wow, yes actually I hadn't realized the differential between newer GPUs and CPUs was so great. I'll need to reconsider.
hero member
Activity: 724
Merit: 500
December 04, 2013, 05:45:40 AM
#41
Freetrade you forget that GPUs can have up to 300GB/s main memory bandwidth. So you get only a latency advantage with L3, not a bandwidth advantage. The data will not be in L2 at all, as you have only 256KB/core.
legendary
Activity: 1470
Merit: 1030
December 04, 2013, 05:42:26 AM
#40
So - small update to the algorithm. Rather than use SHA512 to fill the pseudorandom data, I've decided to use Scrypt instead. This takes longer to generate, so should protect against the possibility of a GPU just generating the pseudorandom data and processing it as it needs it rather than storing it and fetching it from main memory.

Here's a comparison of Intel and AMD processors and includes measures of L2 and L3 cache -

http://en.wikipedia.org/wiki/Comparison_of_AMD_processors
http://en.wikipedia.org/wiki/Comparison_of_Intel_processors

I think the L2/L3 caches on newer and older processors are not directly comparable, and it'll be difficult to tell how efficient a given processor will be without testing it.

In order for a process to be efficient it'll need

1. Reasonably Fast access to 512MB memory - main memory
2. Very Fast access to 512KB memory  - L2/L3 cache memory

The first few processes on a GPU will have these, but will run out of them in the same way a CPU does. The additional processing power of the GPU won't help it, because it won't have data to operate on.
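
As a rough illustration of filling the data with Scrypt output, here is a minimal Python sketch. It is not the MemoryCoin implementation: the name `fill_scratchpad`, the per-block salt scheme, the 64-byte block size, and the tiny scrypt parameters (n=16, r=1, p=1) are all assumptions picked so the example runs quickly, and the buffer is 4 KB rather than 512MB.

```python
import hashlib


def fill_scratchpad(seed: bytes, total_bytes: int, block_bytes: int = 64) -> bytes:
    """Fill a buffer with scrypt-derived pseudorandom blocks.

    Hypothetical helper for illustration only. Each block is derived from
    the seed plus the block index (used as the salt), so a miner either
    stores the buffer or pays the scrypt cost again to regenerate a block.
    """
    if total_bytes % block_bytes != 0:
        raise ValueError("total_bytes must be a multiple of block_bytes")
    out = bytearray()
    for index in range(total_bytes // block_bytes):
        salt = index.to_bytes(8, "little")
        # Assumed toy parameters; a real design would use far costlier ones.
        out += hashlib.scrypt(seed, salt=salt, n=16, r=1, p=1, dklen=block_bytes)
    return bytes(out)


pad = fill_scratchpad(b"example block header", 4096)
print(len(pad))
```

The point of the design choice is that the more expensive each block is to derive, the less attractive it becomes to recompute blocks on demand instead of fetching them from memory.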


hero member
Activity: 724
Merit: 500
December 04, 2013, 05:02:34 AM
#39
Also, as a separate problem which I will reveal: it appears he is hitting main memory bandwidth on the CPU, not L2 bandwidth, due to the 512 MB, and that is 10 - 30 times slower than the GPU's main memory bandwidth.

It appears from the OP that he thinks he is staying within L2 by computing XORs in 512 KB chunks, but it appears to me that he reads from the entire written 512 MB space, thus there is no cache locality.

I haven't verified any of this with his algorithm, so he would need to test to verify. I can only go by what I believe the description means it is doing. Perhaps my interpretation is incorrect.

Yes, it will not be in L2 (especially since it's 256KB per core), but probably in L3. GPUs have an L2 cache of 256-768KB, but usually no L3. The problem is GDDR5 bandwidth is probably as good as Intel's L3 bandwidth, but latency with the L3 is much shorter.
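
To put the bandwidth figures in this exchange side by side, here is a back-of-the-envelope Python sketch. It only models streaming the 512 MB once at a fixed bandwidth and ignores latency entirely. The 264 GB/s figure is the GPU number quoted in the thread; the CPU figures are simply that number divided by 10 and by 30, per the "10 - 30 times slower" claim, not measurements. The function name is hypothetical.

```python
MIB = 2 ** 20  # bytes in one mebibyte


def stream_time_ms(size_bytes: int, bandwidth_gb_per_s: float) -> float:
    """Milliseconds needed to read size_bytes once at the given bandwidth.

    Bandwidth is taken in decimal GB/s (1e9 bytes per second).
    Latency is ignored, so this is a lower bound on the transfer time.
    """
    return size_bytes / (bandwidth_gb_per_s * 1e9) * 1e3


DATA_BYTES = 512 * MIB
GPU_GB_PER_S = 264.0  # figure quoted in the thread for a 2012 GPU

for label, bandwidth in [
    ("GPU main memory", GPU_GB_PER_S),
    ("CPU, 10x slower (assumed)", GPU_GB_PER_S / 10),
    ("CPU, 30x slower (assumed)", GPU_GB_PER_S / 30),
]:
    print(f"{label}: {stream_time_ms(DATA_BYTES, bandwidth):.1f} ms per pass")
```

Under these assumptions one pass over the data costs roughly 2 ms on the GPU against roughly 20 to 61 ms on the CPU, which is why the argument here turns on latency and cache locality rather than raw bandwidth.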
sr. member
Activity: 280
Merit: 250
December 04, 2013, 04:36:56 AM
#38
Sorry I can't give away my algorithms sooner. They will soon be open sourced.

Will you release the algo so that it could be used in Memorycoin, or will it be a separate coin?

Both. But I don't know if MemoryCoin can use it, because the hash is necessarily slow. This impacts on denial of service rejection. There must be a holistic design to deal with that I think. But I don't claim to be omniscient. I wish FreeTrade the best in all his endeavors.

And I hate to talk about my vaporware. FreeTrade invited me to comment on this thread. I wish I could help him more now. The best way for me to help him, is rush my release.

He has other features and ideas for his coin, so there is probably much room for differentiation. Let a 1000 flowers bloom. May the best be picked for a bouquet.

Looking forward to seeing what you're working on!
hero member
Activity: 518
Merit: 521
December 04, 2013, 04:18:55 AM
#37
However, the L2 are 256 MB on Intel so you need to adjust your 512 MB downwards.

L2 on Intel Core processors is 1-2MB, not 256.

Excuse me I meant 256 KB. I will correct my typo.

Also, as a separate problem which I will reveal: it appears he is hitting main memory bandwidth on the CPU, not L2 bandwidth, due to the 512 MB, and that is 10 - 30 times slower than the GPU's main memory bandwidth.

It appears from the OP that he thinks he is staying within L2 by computing XORs in 512 KB chunks, but it appears to me that he reads from the entire written 512 MB space, thus there is no cache locality.

I haven't verified any of this with his algorithm, so he would need to test to verify. I can only go by what I believe the description means it is doing. Perhaps my interpretation is incorrect.
hero member
Activity: 518
Merit: 521
December 04, 2013, 04:15:23 AM
#36
However, the L2 are 256 MB on Intel so you need to adjust your 512 MB downwards.

L2 on Intel Core processors is 1-2MB, not 256.

Excuse me I meant 256 KB. I will correct my typo.
hero member
Activity: 724
Merit: 500
December 04, 2013, 04:14:04 AM
#35
However, the L2 are 256 MB on Intel so you need to adjust your 512 MB downwards.

L2 on Intel Core processors is 1-2MB, not 256.

And I hate to talk about my vaporware. FreeTrade invited me to comment on this thread. I wish I could help him more now. The best way for me to help him, is rush my release.

Why don't you create a coin together with him? This will be best for the community.
hero member
Activity: 518
Merit: 521
December 04, 2013, 04:11:05 AM
#34
Sorry I can't give away my algorithms sooner. They will soon be open sourced.

Will you release the algo so that it could be used in Memorycoin, or will it be a separate coin?

Both. But I don't know if MemoryCoin can use it, because the hash is necessarily slow. This impacts on denial of service rejection. There must be a holistic design to deal with that I think. But I don't claim to be omniscient. I wish FreeTrade the best in all his endeavors.

And I hate to talk about my vaporware. FreeTrade invited me to comment on this thread. I wish I could help him more now. The best way for me to help him, is rush my release.

He has other features and ideas for his coin, so there is probably much room for differentiation. Let a 1000 flowers bloom. May the best be picked for a bouquet.
hero member
Activity: 724
Merit: 500
December 04, 2013, 04:09:40 AM
#33
Sorry I can't give away my algorithms sooner. They will soon be open sourced.

Will you release the algo so that it could be used in Memorycoin, or will it be a separate coin?
hero member
Activity: 518
Merit: 521
December 04, 2013, 03:58:14 AM
#32
Top-of-the-line GPUs have nearly the same main memory bandwidth as L2 cache, or within a factor of 2 or 3, e.g. the 2012 model AMD Tahiti at 264 GB per second. Some of the latest GPUs may reach 1 TB per second, nearly as fast as L1.

That's fine - as long as those GPUs are of a similar cost and/or have higher energy requirements than comparable CPUs. Even if the GPUs are 2 or 3 times more efficient than CPUs - the capital investment still precludes it from being a viable business, which is the aim.

However, the L2 are 256 KB on Intel so you need to adjust your 512 KB downwards.

Also AMD has no L2 and the L3 is significantly slower. Maybe you are not concerned about losing those who run AMD.

Worse, as far as I can see your algorithm can be trivially parallelized.

The memory bus is the bottleneck - you can parallelize until you run out of bandwidth there.

I see one definite problem that makes your assumption false and potentially a second problem, but if I tell you what they are then I will give away a lot of the work I have done to make a truly CPU-only proof-of-work.

CPU-only will always have a slow hash. There is no way around it.

I guess you will find out when you release this and it is attacked by GPUs (and botnets), and if I release my open source, then you can copy it (although you won't be able to because the slow hash won't work in your overall design). I don't want to give you first mover advantage by telling you now.
newbie
Activity: 21
Merit: 0
December 04, 2013, 02:02:05 AM
#31
All my Core i7s are ready - just waiting for a sign! :)
legendary
Activity: 1470
Merit: 1030
December 03, 2013, 01:31:10 PM
#30
Top-of-the-line GPUs have nearly the same main memory bandwidth as L2 cache, or within a factor of 2 or 3, e.g. the 2012 model AMD Tahiti at 264 GB per second. Some of the latest GPUs may reach 1 TB per second, nearly as fast as L1.

That's fine - as long as those GPUs are of a similar cost and/or have higher energy requirements than comparable CPUs. Even if the GPUs are 2 or 3 times more efficient than CPUs - the capital investment still precludes it from being a viable business, which is the aim.

Worse, as far as I can see your algorithm can be trivially parallelized.

The memory bus is the bottleneck - you can parallelize until you run out of bandwidth there.
hero member
Activity: 518
Merit: 521
December 03, 2013, 12:15:05 PM
#29
That can only be true because you've eliminated what you thought I meant to eliminate and thus made the GPU faster. I said you were getting closer, meaning you are going to learn an important lesson.

Sorry I can't give away my algorithms sooner. They will soon be open sourced.

Not following you. A specific hash takes 0.01 seconds, but performing related hashes in bulk on a CPU takes more like 0.0006 seconds per hash... a GPU can't just scale up because it lacks the memory bandwidth and L2 cache.

Top-of-the-line GPUs have nearly the same main memory bandwidth as L2 cache, or within a factor of 2 or 3, e.g. the 2012 model AMD Tahiti at 264 GB per second. Some of the latest GPUs may reach 1 TB per second, nearly as fast as L1.

Worse, as far as I can see your algorithm can be trivially parallelized.

P.S. ASICs scale too well and result in centralization of mining:

http://www.kotaku.com.au/2013/11/bitcoin-mining-is-getting-out-of-control/
legendary
Activity: 1470
Merit: 1030
December 03, 2013, 11:17:28 AM
#28
That can only be true because you've eliminated what you thought I meant to eliminate and thus made the GPU faster. I said you were getting closer, meaning you are going to learn an important lesson.

Sorry I can't give away my algorithms sooner. They will soon be open sourced.

Not following you. A specific hash takes 0.01 seconds, but performing related hashes in bulk on a CPU takes more like 0.0006 seconds per hash... a GPU can't just scale up because it lacks the memory bandwidth and L2 cache.
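
The two timings quoted here can be turned into throughput numbers with a few lines of Python. This is only arithmetic on the figures stated in the post (0.01 s for a single hash, 0.0006 s per hash in bulk); the function names are hypothetical, and nothing here measures real hardware.

```python
SINGLE_HASH_SECONDS = 0.01    # one isolated hash, as stated in the post
BULK_HASH_SECONDS = 0.0006    # per-hash cost when related hashes are batched


def hashes_per_second(seconds_per_hash: float) -> float:
    """Throughput implied by a fixed per-hash cost."""
    return 1.0 / seconds_per_hash


def bulk_speedup(single_seconds: float, bulk_seconds: float) -> float:
    """How many times cheaper a hash becomes when computed in bulk."""
    return single_seconds / bulk_seconds


print(f"single: {hashes_per_second(SINGLE_HASH_SECONDS):.0f} hashes/s")
print(f"bulk:   {hashes_per_second(BULK_HASH_SECONDS):.0f} hashes/s")
print(f"speedup: {bulk_speedup(SINGLE_HASH_SECONDS, BULK_HASH_SECONDS):.1f}x")
```

So a miner working in bulk gets about 1667 hashes per second against about 100 for someone computing one hash in isolation, a factor of roughly 17, presumably because the expensive data generation is shared across the batch.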
hero member
Activity: 518
Merit: 521
December 03, 2013, 11:07:52 AM
#27
A slow hash is a huge problem in terms of denial of service attacks on the mining nodes.

The hash looks like it takes about 0.01 seconds - maybe faster with some optimization. That should be sufficiently fast.

That can only be true because you've eliminated what you thought I meant to eliminate and thus made the GPU faster. I said you were getting closer, meaning you are going to learn an important lesson.

Sorry I can't give away my algorithms sooner. They will soon be open sourced.
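
The denial-of-service concern can be made concrete with a small Python sketch. It assumes a node must compute one full hash to reject each bogus submission and uses the 0.01 second figure quoted above; the function names, the single-core default, and the example rate of 50 submissions per second are illustrative assumptions, not details of any real node.

```python
def verification_load(submissions_per_second: float, hash_seconds: float) -> float:
    """CPU cores kept busy just checking incoming submissions."""
    return submissions_per_second * hash_seconds


def saturating_rate(hash_seconds: float, cores: int = 1) -> float:
    """Submissions per second that fully occupy the given number of cores."""
    return cores / hash_seconds


HASH_SECONDS = 0.01  # per-hash verification cost quoted in the thread

print(f"load at 50 submissions/s: {verification_load(50, HASH_SECONDS):.2f} cores")
print(f"rate that saturates one core: {saturating_rate(HASH_SECONDS):.0f}/s")
```

On these assumptions about 100 bogus submissions per second would tie up a whole core, which is the trade-off behind the argument: the slower the hash, the cheaper it is to flood a verifying node.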
hero member
Activity: 518
Merit: 521
December 03, 2013, 11:03:40 AM
#26
You might consider how much DRAM is necessary to cause the user to notice his PC isn't performing correctly even with CPU usage scaled down to 50%. If the paged virtual memory in his games is now swapping to hard disk, they may slow down considerably.
hero member
Activity: 560
Merit: 500
December 03, 2013, 10:07:34 AM
#25
I like the RAM idea too. It will rule out many of the PCs in a botnet - assuming that PC enthusiasts with 16GB are the sort of people who will notice that their machines are compromised. But it will also cut out many potential users, which is not a good thing. Perhaps (hastily checks own machines) 8GB is the right level?

It's a tricky one - to get mass adoption of the coins requires it to run on the most basic of machines, which are also those most likely to be in a botnet. Do machines in a botnet automatically send their mined coins to the same address?
legendary
Activity: 2156
Merit: 1131
December 03, 2013, 09:10:45 AM
#24
I like the RAM idea: anyone can afford some RAM, unlike $10000 ASIC hardware for Bitcoin.
hero member
Activity: 695
Merit: 500
December 03, 2013, 07:47:41 AM
#23
Well OK :(
I guess there is a way to stop botnets, we just have to get the right idea, but this one does not seem to be the right one :D