Pages:
Author

Topic: [neㄘcash, ᨇcash, net⚷eys, or viᖚes?] Name AnonyMint's vapor coin? - page 21. (Read 95306 times)

legendary
Activity: 2282
Merit: 1050
Monero Core Team
https://forum.bitcoin.com/ama-ask-me-anything/i-m-zooko-wilcox-ceo-of-the-zcash-company-ask-me-anything-t5413.html

Quote
Hello Zooko,

I'm interested in reading your goals and motivations for taking on anonymity in general or anonymous digital cash specifically as your priority project?

Haven't you seen the new laws coming (eventually in all Five Eyes countries I've heard from reliable sources) that will ban end-to-end encryption?

To that end do you expect to support a viewkey or other way that users individually or a global backdoor, so that Zcash can be compliant with the lurch towards a 666 NWO which seems to be rapidly taking form now (and I assert will accelerate with the full global contagion sovereign debt collapse 2017 -2020)?

I am all for the ideology, but I am also pragmatic. We as society may have to fight with social networking and the political-economic revolution of a DIY economy, e.g. self-publishing, 3D printing, etc.. I have been looking at the concept of a decentralized social network. Any comments?

Sincerely,
TPTB_need_war
AnonyMint
Shelby Moore III

Bold my emphasis. The proposed legislation in the United Kingdom does not ban end to end encryption. What it does however is to require those companies that provide proprietary encryption products to retain a back door key and make this back door key available to law enforcement. http://www.telegraph.co.uk/news/uknews/terrorism-in-the-uk/11970391/Internet-firms-to-be-banned-from-offering-out-of-reach-communications-under-new-laws.html?utm_campaign=Echobox&utm_medium=Social&utm_source=Twitter#link_time=1446482200 There is a critical difference here in that FLOSS end to end encryption tools remain perfectly legal and secure while proprietary end to end encryption tools would have a back door. This is not unlike the FinCEN guidance in the United States https://www.fincen.gov/statutes_regs/guidance/html/FIN-2013-G001.html where proprietary development funding models for crypto currency (using the emission to fund development) will likely lead to a legal requirement for MSB registration while FLOSS development funding models can retain their decentralized virtual currency designation and avoid the MSB registration requirement. In Canada there was a proposed law that did not pass that required providers of email servers to retain records for law enforcement with an exception for those who ran their own mail server from their own homes.. A more relevant question to ask would have been: Given that you plan to use a portion of the emission over the next 4 years to fund your company, have you registered as an MSB with FinCEN or have you obtained guidance from FinCEN that MSB registration is not required in your case?

The pattern here is starting to become apparent. Click I agree on some company's terms for the use of proprietary software and become a slave, click I do not agree and use FLOSS tools instead and remain free.
legendary
Activity: 990
Merit: 1108
How extensively is this specific NP problem studied?

The closest seems to be the study of tradeoffs in undirected graph traversal such as

http://www.cs.toronto.edu/~bor/Papers/time-space-tradeoffs-undirected-graph-traversal-graph-automata.pdf

which states:

Quote
Although it would be desirable to show a tradeoff for a general model of computation
such as a random access machine, obtaining such a tradeoff is beyond the
reach of current techniques. Thus it is natural to consider a ``structured'' model
(Borodin [16]), that is, one whose basic move is based on the adjacencies of the
graph, as opposed to one whose basic move is based on the bits in the graph's
encoding.

My model is rather odd in that you cannot easily figure out the neighbours of a node,
since edges are generated from nonces, and in the worst case you have to map over
all nonces to find the edges adjacent to a given node.

The paper establishes a quadratic lower bound on the
product of time and space for graph traversal, which I interpret
as good evidence for the edge trimming in my Cuckoo Cycle being optimal,
as it uses just 1 bit per edge.
sr. member
Activity: 420
Merit: 262
https://forum.bitcoin.com/ama-ask-me-anything/i-m-zooko-wilcox-ceo-of-the-zcash-company-ask-me-anything-t5413.html

Quote
Hello Zooko,

I'm interested in reading your goals and motivations for taking on anonymity in general or anonymous digital cash specifically as your priority project?

Haven't you seen the new laws coming (eventually in all Five Eyes countries I've heard from reliable sources) that will ban end-to-end encryption?

To that end do you expect to support a viewkey or other way that users individually or a global backdoor, so that Zcash can be compliant with the lurch towards a 666 NWO which seems to be rapidly taking form now (and I assert will accelerate with the full global contagion sovereign debt collapse 2017 -2020)?

I am all for the ideology, but I am also pragmatic. We as society may have to fight with social networking and the political-economic revolution of a DIY economy, e.g. self-publishing, 3D printing, etc.. I have been looking at the concept of a decentralized social network. Any comments?

Sincerely,
TPTB_need_war
AnonyMint
Shelby Moore III
sr. member
Activity: 420
Merit: 262
Proving data is stored on a decentralised node is something of an ongoing project.  

Others are looking at the issue:

https://bitslog.wordpress.com/2015/09/16/proof-of-unique-blockchain-storage-revised/

So far I think PoW for nodes to validate blocks or data they contain is an interesting approach.

Afaics, all of these proof-of-storage/retrievably designs are fundamentally flawed (MaidSafe, Storj, Sia, etc) and can't ever be fixed.

They try to use network latency to prevent centralized outsourcing, but ubiquitously consistent network latency is not a reliable commodity. Duh. Sorry!

Sergio also the one who started the idea for a DAG which I have explained is fundamentally flawed (and afaics even had an egregious/fundamental error in this white paper for his DagCoin). Find that info in the thread linked above.
sr. member
Activity: 420
Merit: 262
Note the 5.5 minute per GB proving time for your Cuckoo Cycle PoW (and slower on mobile devices I presume) is entirely unacceptable for my design for decentralizing Satoshi's PoW design by having each payer include a PoW share. Even one minute is too slow, so then you aren't even monetizing a fraction of the CPU's main memory which is the performance per $ hardware economics.

Less time translates to less memory. A graph size of 2^26 takes about 8MB of memory and 5 sec runtime single-threaded. But you might need several runs to find a cycle of the required length.
You'd probably want to accept a whole range of lengths, like 16-64, to reduce the variance.

Yeah but my point is then we haven't monetized the DRAM of the computer relative to the professional mining farm. The user paid the entire purchase price for the computer not just 8MB of his 2+GB DRAM. As I wrote upthread, this impacts the ratio of unprofitable miners to ones that desire to profitable required to make all unprofitable in my design (it also has impacts in a profitable PoW mining design such as Satoshi's design). The economics of mining is electricity and hardware amortization. For a memory hard scenario, afaics the DRAM amortization becomes a more significant factor in the economics equation.

I figure I can probably do 0.5 GB in roughly 5 seconds or less (perhaps under a second, I need to restudy my late 2013/early 2014 Shazam 512B variant of Chacha work in a new context). I lose the asymmetric verification of yours, but gain much better proving times relative to memory amortized (and this wouldn't be vulnerable to probabilistic graph-theoretic analysis breakage of your NP problem). Nevertheless afaics (thus far) it doesn't help me with the music marketing strategy.  Undecided  It may help with my decentralized PoW design. Need to study that aspect. I am more focused on marketing strategy today.
legendary
Activity: 990
Merit: 1108
Note the 5.5 minute per GB proving time for your Cuckoo Cycle PoW (and slower on mobile devices I presume) is entirely unacceptable for my design for decentralizing Satoshi's PoW design by having each payer include a PoW share. Even one minute is too slow, so then you aren't even monetizing a fraction of the CPU's main memory which is the performance per $ hardware economics.

Less time translates to less memory. A graph size of 2^26 takes about 8MB of memory and 5 sec runtime single-threaded. But you might need several runs to find a cycle of the required length.
You'd probably want to accept a whole range of lengths, like 16-64, to reduce the variance.
sr. member
Activity: 420
Merit: 262
Also I am wondering if you tried instead of hitting the memory bound with one instance, run multiple instances on the GPU since I assume you are not using 6GB for each instance? In this case the tradeoff between increased computation (thus masking memory latency) and memory bandwidth may be more favorable to the GPU?

Cuckoo Cycle, even using smaller amounts of memory, like 512MB, consumes practically all memory bandwidth of a GPU. So the GPU having 4GB or 6GB is irrelevant.
If you run two 512MB Cuckoo instances, then each runs at about half the speed due to memory contention.
What happens is that the memory used by Cuckoo Cycle is spread across all the memory banks
(using let's say 1/8th of each) as this allows the maximum throughput. Each memory bank suffers a row switching delay on every memory access, forming the bottleneck. Running two instances means they
have to take turns getting results from each memory bank, so there's no increase in throughput.
No amount of extra computational resources changes that.

I've measured the same phenomenon on CPUs.

I know that of course. What I was getting at is whether the parallelization on one instance was more costly in terms of memory accesses versus running a separate instance. It has been 2 years since I studied that Cuckoo Cycle graph NP problem briefly (and no available time or human brain bandwidth to delve into it today), so I didn't know the answer. See also my other reply above (which I even edited) while you were replying.

Note the 5.5 minute per GB proving time for your Cuckoo Cycle PoW (and slower on mobile devices I presume) is entirely unacceptable for my design for decentralizing Satoshi's PoW design by having each payer include a PoW share. Even one minute is too slow, so then you aren't even monetizing a fraction of the CPU's main memory which is the performance per $ hardware economics.

Designing a holistic PoW system for crypto currency has many variables, including marketing strategy.
legendary
Activity: 990
Merit: 1108
Also I am wondering if you tried instead of hitting the memory bound with one instance, run multiple instances on the GPU since I assume you are not using 6GB for each instance? In this case the tradeoff between increased computation (thus masking memory latency) and memory bandwidth may be more favorable to the GPU?

Cuckoo Cycle, even using smaller amounts of memory, like 512MB, consumes practically all memory bandwidth of a GPU. So the GPU having 4GB or 6GB is irrelevant.
If you run two 512MB Cuckoo instances, then each runs at about half the speed due to memory contention.
What happens is that the memory used by Cuckoo Cycle is spread across all the memory banks
(using let's say 1/8th of each) as this allows the maximum throughput. Each memory bank suffers a row switching delay on every memory access, forming the bottleneck. Running two instances means they
have to take turns getting results from each memory bank, so there's no increase in throughput.
No amount of extra computational resources changes that.

I've measured the same phenomenon on CPUs.

We cannot say what is the performance-per-watt advantage of GPU over CPU until we measure it.
Note that I don't even have physical access to the machine that I bench. So I'm hoping other people
can do the measurement.

sr. member
Activity: 420
Merit: 262
Was that running all threads of the i7 versus all compute units of the GPU? Did you maximize the # of instances each could do with its available compute units, i.e. normalize for if the GPU has 8 or 16GB as necessary to max out its FLOPS?

That's with maxing out either cores (CPU) or memory bandwidth (GPU).

But not necessarily maxing out CPU computation, since the NP problem is memory latency bound i.e. cores spend a lot of time idle. This is why TDP isn't a reliable estimate of what is going on. I assume you know the CPU has 25 times lower/faster main memory latency than the GPU. So the CPU may be sucking a lot more electricity than the GPU which is hitting its bound on memory bandwidth (and not latency thus not maximum masked parallel computation).

I am shocked that you hadn't purchased a $20 Kill-A-Watt meter given you have invested so much of your time on this. However, I believe these may not be available in some countries in Europe. In the USA, we can order it from Amazon.

I see you said you maxed out memory bandwidth, but what about trading some memory for 10X more computation until the memory bandwidth bound and computation bound (FLOPS) are matched?

The only known trade-off uses k times less memory but 15k times more computation *and* memory accesses.

Perhaps there might be a probabilistic approach. I really haven't studied the NP problem you are applying. It worries me that you assume there aren't other tradeoffs when this is not serial random walk. Your claims are graph-theoretic only, not informational theoretic (as is the case in a perfectly uniformly distributed serial random walk). How extensively is this specific NP problem studied?
sr. member
Activity: 420
Merit: 262
tromp, the bottom line is that when I wrote that document I was trying to find a way to make a PoW where CPUs would be as power efficient as GPUs and ASICs, so that a professional miner wouldn't be more profitable than a CPU miner. That was back when I hadn't yet contemplated the solution of making mining unprofitable. And in the portion of the paper I omitted, I concluded that I had failed (even with fiddling around the various SRAM caches on Intel CPUs). Even if one makes a Scrypt-like random walk through memory entirely latency bound, thus GPU can run multiple instances until the latency is masked by computation and becomes computation bound or memory bandwidth bound. And believe in both cases, then the GPU will be more efficient on computation being performed.

What Cryptonote's Cryptonite PoW hash apparently does is make it impossible to run enough instances on a typical GPU (with its limited memory of say 6GB unless one were to customize one) to overcome the AES-NI instructions incorporated into the memory hard algorithm, since the GPU is apparently only at par in computational efficiency on AES-NI. Or what I think is really going on but I haven't confirmed it, is Cryptonite is AES-NI bound, so the GPU remains at parity. Which is exactly the direction I investigated next in 2014 after abandoning memory hard PoW (even the NP complexity class asymmetric variant such as yours). Also CN attempts to fiddle around the size of the various SRAM caches, but that can be a pitfall in scenarios such as ASICS or Tilegra or other hardware competitors.

So that is why I had abandoned memory hard PoW and investigated a specific instruction in the AES-NI instruction set which appears to have the highest level of optimization in terms of power efficiency as far as I can estimate. This also meant the PoW hash could be very fast (noting Cryptonite is slow) and would help with asymmetric validation of PoW shares in the DDoS scenario (although in my latest 2015 coin design I can verify Merkel signatures orders-of-magnitude faster than any memory hard PoW hash so the point becomes irrelevant).

I have recently entertained the thought that the only way to make them (nearly) equal with a memory hard approach would be to be no computation or for the computation to be so small so as to require an inordinate amount of total RAM or where the memory bandwidth bound would limit the computation latency masking to an insignificant portion of total power consumed. But this may not be easy to accomplish because DRAM is so power efficient. I also noticed an error in my thought process before in my rough draft paper where I hadn't contemplated another way to force a serial random walk that much more strongly resists memory vs. computation tradeoff and for which the computation would be very tiny relative to memory latency. And now my goal is no longer for them to be equal (and besides even if they were equal the mining farms have up to an order-of-magnitude cheaper electricity and more efficient amortization of power supplies), but just to be within say an order-of-magnitude because I am targeting unprofitable mining and the ratio dictates the ratio of unprofitable miners to those miners who desire to be profitable required for all miners to be unprofitable. This approach might be superior to the specific AES-NI instruction I had designed a PoW hash around in 2014.

But the main reason I revisited memory hard PoW is because I can't optimize an AES-NI instruction PoW hash from a browser (no native assembly code and because WebGL on mobile phones means as GPU or ASIC orders of magnitude more power efficient and hardware cost efficient) which impacted a marketing strategy I was investigating. However I have concluded last night that the marketing strategy I was contemplated is flawed because there isn't enough value in the electricity (and memory cost) consumed by the PoW hash to give sufficient value to computing the hash for transferred income (even if unprofitable) on a mobile phone. It turns out that marketing is much more important than PoW in terms of a problem that needs to be solved for crypto currency. The income transfer would make music download bandwidth profitable, but that is peanuts compared to the value generated by social media advertising and user expenditures. I am starting to bump up against some fundamental marketing barriers, e.g. microtransactions are useless is most every scenario (even music!), mobile is the future and there isn't enough electricity to monetize anything from PoW (besides competing on electricity consumption due to computation is a losing strategy w.r.t. to GPUs and ASICs). The money to be made in social media isn't from monetizing the CPU/DRAM nor the users' Likes as Synereo is attempting, but from creating value for the users (then either profiting on the advertising and/or the users' expenditures on what they value). This unfortunately has nothing to do with crypto currency, although it is very interesting to me and requires a lot of fun programming & design so I am starting to get frustrated with crypto currency as being an enormous time waster for me. Three years of researching and still not funding a sure project to work on in crypto currency that doesn't just devolve to a P&D to speculators, because crypto currency has no significant adoption markets (subject to change as I do my final thinking on these fundamental question before I quit crypto).

Bottom line is your Cuckcoo Cycle PoW can be parallelized and so the GPU can employ more computation to mask some of its slower main memory latency up to the memory bandwidth bound. With a guesstimated power efficiency advantage of perhaps 2 - 3X, although you have only estimated that from TDP and not actually measured it. It behoves you to attempt to measure that as it is a very important metric for considering whether to deploy your PoW. The reason it can be parallelized is because the entropy of the data structures is not a serial random walk (which is the point I was trying to make in my poorly worded text from the rough draft of an unpublished paper). Also I am wondering if you tried instead of hitting the memory bound with one instance, run multiple instances on the GPU since I assume you are not using 6GB for each instance? In this case the tradeoff between increased computation (thus masking memory latency) and memory bandwidth may be more favorable to the GPU?

Note that other paper's proposed PoW can also be parallelized up to the memory bandwidth bound but afaics they didn't measure relative power efficiency:


Quote
thus the total advantage
of GPU over CPU is about the factor of 4, which is even
smaller than bandwidth ratio (134 GB/s in GTX480 vs 17
GB/s for DDR3). This supports our assumption of very limited
parallelism advantage due to restrictions of memory bandwidth


tromp, I think you will remember the discussion we had in 2014 about ASICs and I was claiming it could be parallelized and you were pointing out the bandwidth limitation at the interconnection between IC chips due to the fact the memory can't fit on the same silicon chip as the computational logic (or that not all the memory can fit on one chip).

So what I am really saying above is that afaics the fundamentally important invention of lasting value that I have found for crypto is unprofitable mining. I haven't decided yet whether to pursue that to implementation.
legendary
Activity: 2968
Merit: 1198
I don't know how to measure power usage of CPU and GPU. What I can measure is that the GPU is about 5x faster.

The easiest is probably just measure at the wall using Kill-A-Watt (around $20 for a cheap version)

TDP specs are kind of useless for anything, except as a very rough indicator.
legendary
Activity: 990
Merit: 1108
The paper says:

Quote
1The project webpage [37] claims Andersen’s optimizations be integrated into the miner, but the performance numbers are mainly unchanged since before the cryptanalysis appeared


All my code is right there to inspect and ready to be run if the authors don't trust my stated numbers.

Quote
What is the performance per Watt and performance per $ hardware comparing CPU and GPU now for reference miners?


IN another thread I commented
Quote
I have very limited data. A GTX980 with a TDP of around 160W was 5x faster than an i7-4790K with a TDP of around 80W. So the GPU appears to be between 2x and 3x more efficient in this case.
But since Cuckoo Cycle is memory bound, neither CPU nor GPU is going to be near their TDP.
So we really need to bench it with power measuring tools, which I'm lacking...
I don't know how to measure power usage of CPU and GPU. What I can measure is that the GPU is about 5x faster.

Quote
Was that running all threads of the i7 versus all compute units of the GPU? Did you maximize the # of instances each could do with its available compute units, i.e. normalize for if the GPU has 8 or 16GB as necessary to max out its FLOPS?

That's with maxing out either cores (CPU) or memory bandwidth (GPU).

Quote
I see you said you maxed out memory bandwidth, but what about trading some memory for 10X more computation until the memory bandwidth bound and computation bound (FLOPS) are matched?

The only known trade-off uses k times less memory but 15k times more computation *and* memory accesses.
sr. member
Activity: 420
Merit: 262
What I am saying is that the entropy of your problem space is large but limited, which is indeed because the confusion and diffusion injected into the memory space is not entirely randomized over the entire memory space allocated to the PoW computation. Duh. Which is precisely what Andersen discovered when he broke your Cuckoo Cycle as I warned you would be the case. Quoting from the above paper:

You're living in the past quoting 2 papers that both focus on an early 2014 version of Cuckoo Cycle.

What David Andersen did in april 2014 is to reduce memory consumption by a factor of 32, which has become
part of the reference miner in may 2014, well in the past when my Cuckoo Cycle paper was published in BITCOIN 2015.

The paper says:

But you don't need to read that paper to learn of the linear time memory trade off, which is right on the project page:

"I claim that trading off memory for running time, as implemented in tomato_miner.h, incurs at least one order of magnitude extra slowdown".

Btw, there is maximum entropy in the bitmap of alive edges once 50% of them have been eliminated.

But the GPU gets that computation for free because it is masked by the latency for the random accesses. That is why I asked for some performance figures above comparing CPU and GPU. I haven't looked at your project page for 2+ years.

Edit#3: so apparently about a 3X advantage in rate per watt since the TDP of the GPU you cited is 165W (including 4 GB ram) and i7 afair is about 125W:

https://github.com/tromp/cuckoo#1-memory-bank--1-virtual-core--1-vote

Was that running all threads of the i7 versus all compute units of the GPU? Did you maximize the # of instances each could do with its available compute units, i.e. normalize for if the GPU has 8 or 16GB as necessary to max out its FLOPS? I see you said you maxed out memory bandwidth, but what about trading some memory for 10X more computation until the memory bandwidth bound and computation bound (FLOPS) are matched?
legendary
Activity: 990
Merit: 1108
What I am saying is that the entropy of your problem space is large but limited, which is indeed because the confusion and diffusion injected into the memory space is not entirely randomized over the entire memory space allocated to the PoW computation. Duh. Which is precisely what Andersen discovered when he broke your Cuckoo Cycle as I warned you would be the case. Quoting from the above paper:

You're living in the past quoting 2 papers that both focus on an early 2014 version of Cuckoo Cycle.

What David Andersen did in march 2014 is to show how to reduce memory consumption by a an order of magnitude, which has become part of the reference miner since april 2014, well in the past when my Cuckoo Cycle paper was published in the January 2015 BITCOIN workshop. Please focus your criticism on that version.

But you don't need to read that paper to learn of the linear time memory trade off, which is right on the project page:

"I claim that trading off memory for running time, as implemented in tomato_miner.h, incurs at least one order of magnitude extra slowdown".

Btw, there is maximum entropy in the bitmap of alive edges once 50% of them have been eliminated.
sr. member
Activity: 420
Merit: 262
Verification—that a hash output is valid for a given input—should be orders-of-magnitude more efficient than computing the hash.

The validation ratio when used in a proof-of-work system also depends on how fast the hash is, because validation only requires one hash whereas solving proof-of-work requires as many hashes as the collective difficulty or more importantly the pool's minimum share difficulty.

Cuckoo Cycle is not a hash.

I was using the term 'hash' to mean proof-of-work problem with a verifier in the sense as defined here:

https://eprint.iacr.org/2015/946.pdf#page=2

I wasn't focused on defining terms (in an informal internal paper for my own analysis that I never prepared for publishing), but rather on analyzing the issues pertaining to memory hardness, such as resistance to parallelization and/or trading computation for space as should be obvious from my analysis of Scrypt.

The Cuckoo Cycle hash [19] significantly increases the asymmetric validation ratio, but has unprovable security because its algorithm is based on permutated orderings which do not incorporate diffusion and confusion [20]

Diffusion and confusion are properties of hash functions. Cuckoo Cycle is not a hash function.
You need to get your basics straight.

Careful of that condescending attitude. You are starting to act like Shen-noether, Gregory Maxwell and other condescending jerks that have come before you.

What I am saying is that the entropy of your problem space is large but limited, which is indeed because the confusion and diffusion injected into the memory space is not entirely randomized over the entire memory space allocated to the PoW computation. Duh. Which is precisely what Andersen discovered when he broke your Cuckoo Cycle as I warned you would be the case. Quoting from the above paper:


Although its entropy is large, i.e. the factorial permutation of all buckets in the hash space, it doesn't require that its entire memory space be accessed, thus possibly bit flags could be employed to reduce the memory used and make it faster.

There is no "entire memory space". Cuckoo Cycle uses memory for data structures, not for filling up with random garbage like scrypt does.

You lack imagination and abstraction conceptualization skills. Think of about the algorithm from the analogy that applies, and discover new insights.

Think about the entropy of the memory allocated for the data structures. If the entropy is not maximum one can in theory find a way to shrink that memory space and trade computation for memory space, then your algorithm is no longer memory hard. As well if one can parallelize computation within the same memory structures, it is no longer memory hard. As well if the entropy is not maximized, there is no evidence that a faster algorithm can't be found.

The "random garbage" is precisely necessary to maximize entropy and eliminate the possibility to trade memory for computation and/or parallelize the computation. Even the above paper's proposal seems to give up parallelization to the GPU (although they claim only by an order-of-magnitude and that is not including the GPU has multi-GB of memory and can run more than one instance if not memory bandwidth bounded).

I agree that Cuckoo Cycle can be parallellized. In fact the GPU solver works best with 16384 threads.
But it's only marginally faster than 4096 threads. Because they're saturating the random access
memory bandwidth.

Not just parallelization but also memory for computation time improvement as I predicted and now confirmed:

"Andersen [6]: a prover can reduce the memory by the factor of 50 with time increase by the factor of 2 only."
legendary
Activity: 1456
Merit: 1000
Proving data is stored on a decentralised node is something of an ongoing project. 

Others are looking at the issue:

https://bitslog.wordpress.com/2015/09/16/proof-of-unique-blockchain-storage-revised/

So far I think PoW for nodes to validate blocks or data they contain is an interesting approach.
legendary
Activity: 990
Merit: 1108
Verification—that a hash output is valid for a given input—should be orders-of-magnitude more efficient than computing the hash.

The validation ratio when used in a proof-of-work system also depends on how fast the hash is, because validation only requires one hash whereas solving proof-of-work requires as many hashes as the collective difficulty or more importantly the pool's minimum share difficulty.

Cuckoo Cycle is not a hash.

Quote
The Cuckoo Cycle hash [19] significantly increases the asymmetric validation ratio, but has unprovable security because its algorithm is based on permutated orderings which do not incorporate diffusion and confusion [20]

Diffusion and confusion are properties of hash functions. Cuckoo Cycle is not a hash function.
You need to get your basics straight.

Quote
Although its entropy is large, i.e. the factorial permutation of all buckets in the hash space, it doesn't require that its entire memory space be accessed, thus possibly bit flags could be employed to reduce the memory used and make it faster.

There is no "entire memory space". Cuckoo Cycle uses memory for data structures, not for filling up with random garbage like scrypt does.

I agree that Cuckoo Cycle can be parallellized. In fact the GPU solver works best with 16384 threads.
But it's only marginally faster than 4096 threads. Because they're saturating the random access
memory bandwidth.
sr. member
Activity: 420
Merit: 262
I decided to publish a section of my research on memory hard PoW hash functions from 2013. This is only the first section where I analyzed Scrypt. I may have published this section before when Monero was first announced and I was publicly debating with one of the Monero dudes about Cryptonote's PoW hash Cryptonite (yet another discussion that turned condescending and foul mouthed). Note there are 23 references cited in the complete version of this paper. This explains why Cryptonite employs AES-NI instructions to defeat the FLOPS and superior memory bandwidth advantage of the GPU.

I claim my analysis of tromp's Cuckoo below predated this as I added it to my paper immediately after tromp posted about his new paper:

David Andersen. A public review of cuckoo cycle. http://www.cs.cmu.edu/dga/crypto/cuckoo/analysis.pdf, 2014
http://da-data.blogspot.com/2014/03/a-public-review-of-cuckoo-cycle.html



Scrypt ROMix

   1: X <- B
   2: for i = 0 to N-1 do
   3:    V[ i] <- X
   4:    X <- H(X)
   5: end for
   6: for i = 0 to N-1 do
   7:    j <- Integerify(X) mod N
   8:    X <- H(X ^ V[j])
   9: end for
  10: B' <- X

Without parallelism the execution time of ROMix is bounded by that of the hash function H or the random access memory latency to read V[j].

Without parallelism the memory bandwidth to write V[ i] can not be a significant factor because in the case that the execution time of the first loop is dependent on memory bandwidth instead of H and not random access latency since the writes are sequential, the random access latency to read V[j] in the second loop is slower than the memory bandwidth in the first loop.

Percival's sequential memory-hard proof [1] states that redundant recomputation of H in exchange for a reduced memory footprint will not be asymptotically faster. For example even if H is asymptotically (as N goes to infinity) perfectly distributed in the Random Oracle model so the second loop only accesses each V[j] at most once, the H for each second element will be computed twice when reducing the memory footprint by ⅔ by recomputing the H for each second and third element in the second loop instead of retrieved from memory.

Percival's proof fails if the random access latency is not insignificant compared to the execution time of H because the execution time for H in a parallel thread is free as it masked by the other thread is which is stalled for the duration of the latency. This is why BlockMix is employed to increase the execution time of H.

Consider the example that the execution of H is twice as fast as the random access memory latency, i.e. H executes in ½ the delay of each random access. Analogous to cpuminer's "lookup gap" of 3, the computation of H for each second and third elements of V[j] can be repeated in the second loop instead of retrieved from memory. Thus ⅓ the memory requirements, average execution time equal to the latency (indicated by the computed value of 1 below), and only a ½ × ⅓ = 1/6 average increase in computational cost for accessing the third elements which is masked by ½ of the latency not offset by H in line 8. Each first, second, and third element of V[j] has a ⅓ probability of being accessed, so the relative execution time is computed as follows.

   ½ × ⅓ + (½ + ½) × ⅓ + (½ + ½ + ½) × ⅓ = 1

Since GPU DDR main memory has nearly an order-of-magnitude slower random access memory latency than the CPU's DRAM, the GPU employs a "lookup gap" to reduce the memory footprint to allow more parallel instances of Scrypt to execute simultaneously up to the available number of threads and memory bandwidth. The GPU's order-of-magnitude faster memory bandwidth allows running more parallel instances of the first loop. Thus the superior FLOPs of the GPU is fully utilized, making it faster than the CPU.

L3crypt

[...]

Even without employing "lookup gap", the GPU could potentially execute more than 200 concurrent instances of L3crypt to leverage its superior FLOPs and offset the 25x slower main memory latency and the CPU's 8 hyperthreads. To defeat this, the output of L3crypt should be hashed with a cryptographic hash the leverages the CPU's AES-NI instructions and with enough rounds to roughly equal the computation time of L3crypt. GPUs are roughly at parity with AES-NI in hashes per watt [24].

[...]

Asymmetric Validation

Verification—that a hash output is valid for a given input—should be orders-of-magnitude more efficient than computing the hash.

The validation ratio when used in a proof-of-work system also depends on how fast the hash is, because validation only requires one hash whereas solving proof-of-work requires as many hashes as the collective difficulty or more importantly the pool's minimum share difficulty.

The Cuckoo Cycle hash [19] significantly increases the asymmetric validation ratio, but has unprovable security because its algorithm is based on permutated orderings which do not incorporate diffusion and confusion [20] and thus not provably secure in the Random Oracle model, i.e. we can't prove there aren't algorithms to speed up the solutions. Although its entropy is large, i.e. the factorial permutation of all buckets in the hash space, it doesn't require that its entire memory space be accessed, thus possibly bit flags could be employed to reduce the memory used and make it faster.

The Cuckoo Cycle hash isn't a personal computer only hash because it requires only a minimal CPU and it is non-sequentially latency bound. It can be parallelized over the same memory space masking much of the latency. Thus for example the Tilera server CPUs will outperform Intel's personal computer CPUs because Tilera's has 3X to 12X more hardware threads per watt and isn't plagued by the GPU's slow main memory latency. Whereas for L3crypt no further parallelization is possible so even though compared to the 8-core Xeon E5 or Haswell-E the Tilera has same L3 cache [21] and 50% to 67% of the power consumption, the latency is 2X greater [22] and each clock cycle is 2X to 3X slower. Although parallelization can be applied for computing H to try to match Intel's AVX2 acceleration, L3crypt is sequentially memory latency bound.

[...]

Future Proof


CPU memory bandwidth is doubling approximately every four years [7] with up to a 50% improvement expected by 2015 [6] and memory size is doubling approximately every two years which why Moore's Law expects a doubling of performance every 18 months [8] computed as follows.

   2^(years/2) × 2^(years/4) = 2^(years/1.5)

However Percival noted that memory latency is not following Moore's Law [1].

References

[1] Percival, Colin. Stronger key derivation via sequential memory-hard functions.
    BSDCan'09, May 2009. http://www.tarsnap.com/scrypt/scrypt.pdf

[...]

[6] http://www.pcworld.com/article/2050260/hefty-price-premium-awaits-early-ddr4-memory-adopters.html

[7] http://www.timeline-help.com/computer-memory-timeline-4.html

[8] http://en.wikipedia.org/wiki/Moore's_law#cite_note-IntelInterview-2

[...]

[19] https://github.com/tromp/cuckoo/blob/master/cuckoo.pdf

[20] http://en.wikipedia.org/wiki/Confusion_and_diffusion
     http://www.theamazingking.com/crypto-block.php

[21] http://www.tilera.com/sites/default/files/productbriefs/TILE-Gx8072_PB041-03_WEB.pdf

[22] http://www.tilera.com/scm/docs/UG101-User-Architecture-Reference.pdf#page=369

[...]

[24] https://www.iacr.org/workshops/ches/ches2010/presentations/CHES2010_Session06_Talk03.pdf#page=16
legendary
Activity: 2996
Merit: 1903
...

Welcome back, TPTB.

Now act nice, ya hear?   :p   Smiley
sr. member
Activity: 420
Merit: 262
And they insulted me about doing a 30 second rough sketch of a logo (just to see how it would appear not because I thought it was a great result):



Saint Patrick's Day  Huh  Roll Eyes

4-leaf clover or a bowtie.

Smooth you are out of your experience zone.

The above is apparently after "many iterations" whereas my ideas below were quick one-offs:



Very roughly (the proportions are not balanced in this rapidly hacked up mockup) what I am thinking about for a Sync logo, if we choose the Sync name for the consensus network.



I am still considering 'ion' as well. Perhaps people can learn to think of it as the "electric charge" (fuel) they need to do actions in virtual reality. The logo would probably be a lighting bolt and/or positively charge ions floating in a cloud . If 'ion' wins out, I won't be upset. I just want to make sure all perspectives have been weighed by voters.

Here were some refined logos I did:

Here are various logos in I designed from 1999 - 2002:




Pages:
Jump to: