What gives is very simple: You're wrong;
I still believe you are wrong. See below...
you're also being needlessly insulting, in a discussion that need not become personal.
You claimed authority with "professional opinion" instead of publishing analysis of all possible attacks. Sorry, peer review requires that we publish analysis, not authority.
If you'd like to engage in a credential pissing match, fine, but that seems like a waste of time.
No, I'd like to engage in published analysis instead of appealing to the black box (closed source) called "professional opinion".
Let's settle for me pointing out that I'm the original source of the code that's now used in the inner loop of the CPU cryptonight mining and block verification code, so I will claim some familiarity thereby.
Okay but I spent considerable time designing what the CryptoNote designers were attempting to design (even designing a 512 BYTE version of the ChaCha ARX style hash to make it fast enough) and wrote a very detailed set of whitepapers on the L3crypt and the Shazam! hash with around 30 citations. Also thought about the math and wrote it down, even for example studying cryptanalysis attacks on the design of ARX hashes.
You haven't posted enough details about your L3scrypt design to determine if your analysis actually applies to CryptoNight, but let's walk through the math a little:
The math I posted applies to any algorithm that uses randomized lookups from reading and writing to a table.
There are 1,000,000 random accesses of the inner loop of CryptoNight.
There are 131,072 individual 128 bit slots in the lookup table.
...
Your approach: Dynamic recomputation.
The first flaw in your analysis: Your l3scrypt seems, from what you wrote below, to use 512b (bit? likely, if scrypt) entries. CryptoNight uses 128 bit entries, which means that the cost of a 24 bit counter to indicate the last-modified-in round information for a particular value is still fairly significant in comparison to the original storage.
As an example, consider LOOKUP_GAP=2:
1MB of full cache to store actual values + 64k*4bytes ~= 256KB = 1.25MB of space.
Correct: if you make your hash so slow that you can't deal with DDoS attacks (which is the case for CryptoNote), the 'values' table needed to walk back each path of computation, trading computation for space, becomes larger than the space needed to store the values normally.
Whereas in L3crypt I have 512B entries in order to make the hash fast enough and still cover 1MB of cache to keep it in L3 cache in order to defeat economies-of-scale with Tilera cpus, GPUs, and ASICs.
So, agreed: CryptoNote in that case (1MB/16B table with 128-bit, i.e. 16B, elements with 1M writes) defeats the dynamic lookup gap strategy, but at the cost of making the hash too slow to defeat DDoS attacks.
If you actually design a hash that won't subject your coin to the threat of being destroyed by DDoS in the future, then the dynamic lookup gap strategy can't be avoided. Do some calculations to verify.
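The storage arithmetic in this exchange can be checked with a rough sketch. It assumes the figures quoted above: 131,072 slots of 16 bytes (2 MiB in total) for CryptoNight, a 1 MiB table of 512-byte entries for L3crypt, LOOKUP_GAP=2, and one 4-byte last-modified counter per stored entry. The function name is made up for illustration, and a real dynamic-recomputation scheme would need more bookkeeping than this.

```python
MIB = 1024 * 1024


def lookup_gap_footprint(table_bytes, entry_bytes, lookup_gap=2, counter_bytes=4):
    """Bytes kept when only 1/lookup_gap of the entries are stored,
    plus one last-modified counter per stored entry (assumed layout)."""
    slots = table_bytes // entry_bytes
    stored = slots // lookup_gap
    return stored * entry_bytes + stored * counter_bytes


# 16-byte entries: 65,536 stored values plus 65,536 counters.
cn = lookup_gap_footprint(2 * MIB, 16)
# 512-byte entries: 1,024 stored values plus 1,024 counters.
l3 = lookup_gap_footprint(1 * MIB, 512)

print(cn / MIB, cn / (2 * MIB))  # 1.25 MiB, i.e. 62.5% of the full table
print(l3 / MIB)                  # about 0.504 MiB, barely above half
```

With 16-byte entries the counters alone add 25% to the stored values, which is the point being conceded; with 512-byte entries they add under 1%.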
You furthermore haven't dealt with the issue of potential cycles in the recomputation graph, which requires a somewhat more sophisticated data structure to handle: A depends on B depends on C which depends on an earlier-computed version of A. (Keeping in mind that there's a non-negligible chance of A immediately modifying A! It happens, on average, a few times per hash).
Each entry in the 'values' table can index another entry in the 'values' table, which enables tracing back.
I missed the part of your proposal that handled that. Furthermore, there's some internal state associated with the mixing that happens at each round -- it's not simply a crank-through of X iterations of a hash on a static data item. That state is carried forward from the previous half-round (the multiply or the AES mix, respectively), so you have to have a way to backtrack to that.
I believe what I quoted from my rough draft whitepaper is incorrect on that (hadn't looked at that for some months), and each entry in the 'values' table should point to another entry in the table until it is traced back to a stored value.
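The "A immediately modifying A" frequency mentioned in the quote can be estimated with a quick sketch. It assumes every access picks a slot uniformly and independently, which is an idealization and not a claim about the real address derivation; the access and slot counts are the ones quoted earlier.

```python
import random

ACCESSES = 1_000_000  # random accesses per hash, as quoted above
SLOTS = 131_072       # 128-bit slots in the table, as quoted above

# Each consecutive pair of accesses lands on the same slot with
# probability 1/SLOTS, so the expected count is (ACCESSES - 1) / SLOTS.
expected_repeats = (ACCESSES - 1) / SLOTS
print(round(expected_repeats, 2))  # 7.63


def simulate(seed=0, accesses=ACCESSES, slots=SLOTS):
    """Count back-to-back hits on the same slot for one simulated hash."""
    rng = random.Random(seed)
    prev = rng.randrange(slots)
    repeats = 0
    for _ in range(accesses - 1):
        cur = rng.randrange(slots)
        if cur == prev:
            repeats += 1
        prev = cur
    return repeats
```

Under that assumption, roughly seven or eight immediate self-hits per hash is consistent with "a few times per hash"; calling `simulate()` with different seeds gives individual samples around that value.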
As I said in my post, there are possibly some weaknesses involved in the use of a single round of AES as a random number generator, but I *suspect* they're not exploitable enough to confer a major speed advantage. That's not an expert part of my conclusion, because I'm not a cryptographer.
Single round can only diffuse over 32-bits (and that doesn't even mean all the 32-bit space is randomized), and there are other attacks such as on the key scheduling.
The GPU has more FLOPs and can mask away the latency by running sufficient threads but it lacks an AES circuit. ASICs can add the AES circuit to eliminate that CPU advantage (and even accelerate the computational portion) and apply the GPU style advantage for masking the random access latency.
The CryptoNote hash keeps GPUs at power efficiency parity (and MemoryCoin 2.0 did that too), and it doesn't defeat ASIC dominance; rather, it only delays it, because implementing an ASIC is more complex. And I have stated that making it complex but not impossible to make a superior ASIC is a big risk, because it could mean that when they do come, they won't be ubiquitous (and this is the reason I aborted my L3crypt design).
And these design choices come at the cost of making your coin DDoS attackable (even worse for MemoryCoin at 10 hpm, not per second) and also the slow proof-of-work hash eliminates the opportunity to solve the anonymity correctly (i.e. not using Tor or I2P) but I won't reveal that to you.
I think you're being overly optimistic about the success of your own approach based upon the flaws in your (completely unexplained) l3scrypt.
I know of no design flaws in L3crypt. It achieves the goal of being fast enough and making it complex to implement an ASIC, leveraging Intel's economy-of-scale with L3 caches (which is even superior to Tilera). Note at 512B writes, the write-back bandwidth becomes a factor in the design. However in attaining that speed, it is vulnerable to the aforementioned lookup-gap approach of trading computation for space. The concept of reading and writing over a memory table is the same in L3crypt and CryptoNote. The difference is the size of the r/w elements, the number of random access iterations relative to the table size, and the resultant speed of the hash. And the design choices in those variables for CryptoNote makes it DDoS attackable because the hash is slow.
If a DDoS attacker sends bogus proof-of-work blocks, the calculation time is around 1/100 of a second for an average node, or 1/1000 of a second for a high-powered node.
This affects how many IP addresses you can blacklist per second, and also the propagation time of new blocks, which affects the orphan rate[1], which in turn limits how fast transactions can be. DDoS could drive the orphan rate sky-high.
[1]
https://bitcointalksearch.org/topic/reasons-to-keep-10-min-target-blocktime-260180
http://bitcoin.stackexchange.com/a/4958
https://eprint.iacr.org/2013/881.pdf#page=11

Of course you can resolve it by reducing decentralization and having everything go through pools that trust each other, a la Bitcoin which now has 1 pool with 50% of hashrate.
You're missing way too many CryptoNight-specific details to be convincing at all. I think that underlying this is an important difference: Your PoW design didn't carry as much information forward between rounds as CN does. Your approach isn't crazy, but you've left way too many important parts out of the analysis.
It is carrying state forward just the same. The differences are the variables I stated above.
Regarding the bandwidth-intensive approach, you're still wrong about where the time is being spent in the GPU. It's about 50/50 in random memory access and AES computation time. Amdahl's law gets you again there -- I'll certainly grant something like a 4x speedup, but it starts to decline after that.
That is because you aren't running enough threads on the GPU to mask away all the latency with the coalescing of memory accesses on the GPU. As the number of threads increases, this will improve.
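The Amdahl's law point in the quote can be made concrete with a small sketch. The 50/50 split is the figure from the quoted post; the acceleration factors and the function name are purely illustrative assumptions.

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when `accelerated_fraction` of the run time
    is made `factor` times faster and the rest is left unchanged."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / factor)


# Accelerate only one half of a 50/50 workload by increasing factors.
for factor in (2, 4, 16, 1_000_000):
    print(factor, round(amdahl_speedup(0.5, factor), 3))
```

If one half of the time is left untouched, the overall gain stays below 2x no matter how fast the other half becomes, so larger gains require addressing both halves. Which half a given design accelerates, and by how much, is not something this sketch settles.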
Update: I also read your linked thread's comments about the use of AES. You're not looking at the big picture. In the context of a proof-of-work scheme (NOT as the hash to verify integrity), the limitation of 128 bits at each step is unimportant.
In terms of you missing the 'big picture' see my points up-post.
CryptoNote employs AES encryption as a random oracle so that all possible cache table elements should be equally probable at each random access. But AES encryption isn't designed to be a random oracle. Thus there may exist attacks on the structure of the probabilities of random accesses in the table.
Note that the AES vulnerability isn't required to implement an ASIC that outperforms. It is an orthogonal potential attack. There might be a way to trade computation for space within some structure that deviates from the uniform random distribution, given the misuse of AES encryption.
More to the point, your post has absolutely no substantiation of your claim and has a link to a stackexchange article that in no way suggests any easy-to-exploit repeating pattern of the output bits that could be used to shrink the scratchpad size. If you'd care to actually provide a substantive reference for and explanation of your claim, then perhaps the Monero developers (or bytecoin developers) might take it a little more seriously.
I don't have to do your work for you. Ask a cryptographer who knows about AES, and they can explain this to you in more detail.
Seriously, until you get some qualified cryptanalysis on your proof-of-work, you are just blowing hot air.