[XMR] Monero - A secure, private, untraceable cryptocurrency - page 1607.

aminorex

legendary

Activity: 1596

Merit: 1030

Sine secretum non libertas

Quote from: RentaMouse on July 22, 2014, 10:08:51 PM

...Mr X...

Mr X is female, and her name is Eve (because she's "evil"). Just sayin'

smooth

legendary

Activity: 2968

Merit: 1198

Quote from: AnonyMint on July 22, 2014, 08:41:19 PM

Okay but I spent considerable time designing what the CryptoNote designers were attempting to design

Actually as far as I can tell, they delivered what you claim you were attempting to design. You have delivered no work product at all. No code. No white papers. Nothing. Except maybe delivered to yourself, which does not count. It is all hot air.

At this point I would ask that you take this discussion elsewhere. There may well be (theoretical, unidentified and undemonstrated) cryptographic or other issues with Cryptonight, but that is a tiny subset of topics of general interest to the user base and potential user base of Monero. Now we have the past several pages of this thread being taken over by the discussion, and the way the last few exchanges have gone, every appearance this back-and-forth will continue for several more pages. That is inappropriate. Please stop, or just go ahead and create your own thread focused on that particular subtopic.

smooth

legendary

Activity: 2968

Merit: 1198

Quote from: RentaMouse on July 22, 2014, 10:08:51 PM

Your post makes some good points and these issues are definitely being looked at. There is wide agreement that the one minute block time was a mistake absent other changes in the design of the block chain (such as GHOST or similar ideas).

One minor technical correction though:

Quote

In the future those key factors protecting Monero may well change, with more pools online and more decentralisation, plus a much higher tx generation rate resulting in significantly larger blocks requiring longer to validate.

As things stand today larger blocks don't take much longer to validate. The PoW hash is performed not directly on the entire block but on a much faster hash of the block, so it is essentially independent of the block size. The transactions within a block also need to be validated, but currently that is much faster than the PoW. That could change if database lookups are required. Note that if the dominant portion of block validation time is database lookups, then the speed of the PoW becomes unimportant to validation speed.
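To illustrate why the PoW cost does not grow with block size, here is a minimal sketch. It assumes SHA-256 as a stand-in for the fast hash and PBKDF2 as a stand-in for the slow PoW hash; neither is the function Monero actually uses, and all names are hypothetical.

```python
import hashlib

def fast_hash(data: bytes) -> bytes:
    # Stand-in for the cheap hash over the block contents (assumption: SHA-256).
    return hashlib.sha256(data).digest()

def slow_pow_hash(digest: bytes) -> bytes:
    # Stand-in for an expensive PoW hash. PBKDF2 is NOT CryptoNight;
    # it is only here to represent "something slow".
    return hashlib.pbkdf2_hmac("sha256", digest, b"pow-demo", 10_000)

def pow_input(block: bytes) -> bytes:
    # The slow hash only ever sees this fixed-size digest, never the raw block.
    return fast_hash(block)

small_block = b"tx" * 100
large_block = b"tx" * 1_000_000

# Both inputs to the slow hash are 32 bytes, whatever the block size.
small_input = pow_input(small_block)
large_input = pow_input(large_block)
```

The slow step costs the same for both blocks; only the cheap hash scales with block size.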

bitbudget

newbie

Activity: 24

Merit: 0

Quote from: eizh on July 22, 2014, 10:04:49 PM

Quote from: bitbudget on July 22, 2014, 09:56:23 PM

Is there a place where one can discuss technical issues and project development, and get timely replies from devs?
I'm working on a rather interesting project and need some help getting Monero RPC to work.

#monero-dev is the best place. You left the channel right before busoni offered to help regarding your RPC issues.

thanks

I'll try again

smooth

legendary

Activity: 2968

Merit: 1198

Quote from: eizh on July 22, 2014, 10:02:25 PM

Quote from: AnonyMint on July 22, 2014, 08:41:19 PM

CryptoNote employs AES encryption as a random oracle so that all possible cache table elements should be equally probable at each random access. But AES encryption isn't designed to be a random oracle. Thus there may exist attacks on the structure of the probabilities of random accesses in the table.

Can this be quantified somehow? Saying AES may be exploitable isn't that much more interesting than saying that SHA3 or some other hash function may some day be found to have collisions. While no one may disagree with that, no one will care either if the possibility is extremely remote.

Reasonably good info here: http://security.stackexchange.com/questions/8048/why-aes-is-not-used-for-secure-hashing-instead-of-sha-x

RentaMouse

sr. member

Activity: 252

Merit: 250

Most of that long Anonymint post I don't know enough about to debate with any authority, but the issue of block time, DDoS and orphan rate has been of more interest to me recently. The section in question was:

Quote

If a DDoS attacker sends bogus proof-of-work blocks, the calculation time is around 1/100 of a second for an average node, or 1/1000 of a second for a high-powered node.

This impacts how many IP addresses you can blacklist per second, and also the propagation time of new blocks, which affects the orphan rate[1], which thus impacts how fast transactions can be. DDoS could drive the orphan rate sky-high.

[1] https://bitcointalksearch.org/topic/reasons-to-keep-10-min-target-blocktime-260180
http://bitcoin.stackexchange.com/a/4958
https://bitcointalksearch.org/topic/m.3647346
https://eprint.iacr.org/2013/881.pdf#page=11

Of course you can resolve it by reducing decentralization and having everything go through pools that trust each other, a la Bitcoin which now has 1 pool with 50% of hashrate.

I won't disagree with the figure of 1ms for a high powered node to verify a share; my rough figures suggest it's within that order of magnitude for the average pool server.

Interpreting your argument into more easily understood language (doing my best not to lose the essence of it, but it does allow more readers to participate):

An attacker, let's call him Mr X, who could make a significant financial gain by disrupting the Monero network and delaying transaction confirmations would not have to spend a huge amount to obtain the use of a 2M node botnet (interesting aside: from another rough calculation, an average botnet could currently make $0.75/hr per 1000 nodes mining XMR) and use it for a DDoS attack against mining pools by sending bogus shares. The pools will ban each node for submitting bogus shares, but at their maximum rate that could only be 1000 banned every second. With the current distribution there are only about 5 pools the botnet would need to attack (although there are at least 3 other "dark" pools of 1-3MH/s, not sure how Mr X identifies them). If they spent all of their processing power banning botnet nodes, it's going to require around 7 minutes to blacklist the entire botnet, but after 4 minutes over half of it will have been blocked. With an average blocktime of 1min there could be several found blocks waiting to be submitted, so we can be pretty sure that one or more will have been accepted by a pool and started to propagate over the network.
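The ban-rate estimate can be reproduced with simple arithmetic. This is a rough sketch, assuming the botnet is spread evenly over the 5 pools, every pool bans at the quoted 1000 nodes per second, and no node is counted twice; the constant and function names are illustrative only.

```python
BOTNET_NODES = 2_000_000        # "2M node botnet" from the post
POOLS = 5                       # "about 5 pools"
BANS_PER_POOL_PER_SEC = 1_000   # "1000 banned every second"

def seconds_to_ban_all(nodes: int = BOTNET_NODES) -> float:
    # Time for all pools together to blacklist every attacking node.
    return nodes / (POOLS * BANS_PER_POOL_PER_SEC)

def banned_after(seconds: float, nodes: int = BOTNET_NODES) -> int:
    # Nodes already blacklisted after the given time, capped at the botnet size.
    return min(nodes, int(seconds * POOLS * BANS_PER_POOL_PER_SEC))

total_seconds = seconds_to_ban_all()
fraction_after_4_min = banned_after(4 * 60) / BOTNET_NODES
```

Under these assumptions `total_seconds` is 400 (a little under 7 minutes) and 60% of the botnet is banned after 4 minutes, which matches the figures in the paragraph.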

Which takes us to the other part of the argument: by hammering the pools Mr X slows down network communication, leading to segmentation and mini-forks being created which will subsequently be orphaned. With its fast block time Monero is more at risk of divergence - ideally you want the majority of the network to have validated a block before the next one is found, although it can cope with a few "quick blocks" in a row because there will always be a longer one to provide time for the network to agree (converge) again. Incidentally this is why exchanges tend to require a large number of confirmations before accepting a tx, to ensure they aren't invalidated by an orphaned chain. Most of the theory here about problems with convergence seems to be based on bitcoin and a large array of decentralised network nodes, meaning you only have to cause a small increase in propagation time to get an exponential effect over the entire network. I would actually describe the Monero mining network as semi-centralised, because most of the pools configure each other as priority nodes so they are all connected by 1-3 network hops, meaning new blocks only in fact require 4 hops to propagate on average.

So it must be easy for Mr X then - he only has to DDoS those core pools and he can take down the whole network surely? Well not really, we've already seen that it would take a large botnet to kill the pools for a few minutes, and just putting them under 50% load for twice as long will have even less effect. It also ignores the fact that the pools usually run their daemons on separate servers to the pool code, so he may delay a new block for a few minutes but it will be able to propagate quickly once found. Currently the best Mr X can probably achieve is a few minutes of disruption and some agitated pool ops before the network sorts itself out and the blockchain continues.

In the future those key factors protecting Monero may well change, with more pools online and more decentralisation, plus a much higher tx generation rate resulting in significantly larger blocks requiring longer to validate. This could make it possible for Mr X to launch a more disruptive attack, so it is worth considering methods of mitigating that risk, one of the more obvious being to increase the block time, which has been discussed before and I don't believe it has been ruled out by the dev team yet. Monero is still at the stage where major changes to the protocol could be made if deemed necessary; it is not set in stone yet.

eizh

hero member

Activity: 560

Merit: 500

Quote from: bitbudget on July 22, 2014, 09:56:23 PM

Is there a place where one can discuss technical issues and project development, and get timely replies from devs?
I'm working on a rather interesting project and need some help getting Monero RPC to work.

#monero-dev is the best place. You left the channel right before busoni offered to help regarding your RPC issues.

eizh

hero member

Activity: 560

Merit: 500

Quote from: AnonyMint on July 22, 2014, 08:41:19 PM

CryptoNote employs AES encryption as a random oracle so that all possible cache table elements should be equally probable at each random access. But AES encryption isn't designed to be a random oracle. Thus there may exist attacks on the structure of the probabilities of random accesses in the table.

Can this be quantified somehow? Saying AES may be exploitable isn't that much more interesting than saying that SHA3 or some other hash function may some day be found to have collisions. While no one may disagree with that, no one will care either if the possibility is extremely remote.

bitbudget

newbie

Activity: 24

Merit: 0

Is there a place where one can discuss technical issues and project development, and get timely replies from devs?
I'm working on a rather interesting project and need some help getting Monero RPC to work.

phzi

hero member

Activity: 700

Merit: 500

Quote from: AnonyMint on July 22, 2014, 08:41:19 PM

Seriously until you get some qualified cryptanalysis on your proof-of-work, you are just blowing hot air.

Says the guy blowing more hot air than anybody, and refusing to back up his statements...

eizh

hero member

Activity: 560

Merit: 500

Quote from: uvwvj on July 22, 2014, 08:58:04 PM

Otherwise having to read 1 page thesis about what appears to be who thinks who is god is going to turn off others who are investing in this coin to use it, and just turn it into another shitty ass Altcoin.

No one is obligated to read any post. It's precisely because XMR isn't "another shitty ass Altcoin" that these sort of discussions take place. The level of intelligence on display in most altcoin threads is just depressing.

The ones who may be scared off are not investors -- it would be the legions of small-time day traders who are cluelessly hemorrhaging their own money.

uvwvj

full member

Activity: 145

Merit: 100

Alright all you "smart people" arguing who has the biggest nutsack or brain, could you all just work together on getting a Bitcoind type system up or an official GUI wallet so we can store this coin elsewhere other than Poloniex and we can use it to buy shit and gamble. This adds value to the coin.

Otherwise having to read 1 page thesis about what appears to be who thinks who is god is going to turn off others who are investing in this coin to use it, and just turn it into another shitty ass Altcoin.

AnonyMint

hero member

Activity: 518

Merit: 521

Quote from: dga on July 22, 2014, 03:11:11 PM

What gives is very simple: You're wrong;

I still believe you are wrong. See below...

Quote from: dga on July 22, 2014, 03:11:11 PM

you're also being needlessly insulting, in a discussion that need not become personal.

You claimed authority with "professional opinion" instead of publishing analysis of all possible attacks. Sorry peer review requires we publish analysis not authority.

Quote from: dga on July 22, 2014, 03:11:11 PM

If you'd like to engage in a credential pissing match, fine, but that seems like a waste of time.

No I'd like to engage in published analysis instead of claiming the blackbox (closed source) called "professional opinion".

Quote from: dga on July 22, 2014, 03:11:11 PM

Let's settle for me pointing out that I'm the original source of the code that's now used in the inner loop of the CPU cryptonight mining and block verification code, so I will claim some familiarity thereby.

Okay but I spent considerable time designing what the CryptoNote designers were attempting to design (even designing a 512 BYTE version of the ChaCha ARX style hash to make it fast enough) and wrote a very detailed set of whitepapers on the L3crypt and the Shazam! hash with around 30 citations. Also thought about the math and wrote it down, even for example studying cryptanalysis attacks on the design of ARX hashes.

Quote from: dga on July 22, 2014, 03:11:11 PM

You haven't posted enough details about your L3scrypt design to determine if your analysis actually applies to CryptoNight, but let's walk through the math a little:

The math I posted applies to any algorithm that uses randomized lookups from reading and writing to a table.

Quote from: dga on July 22, 2014, 03:11:11 PM

There are 1,000,000 random accesses of the inner loop of CryptoNight.

There are 131,072 individual 128 bit slots in the lookup table.

...

Your approach: Dynamic recomputation.

The first flaw in your analysis: Your l3scrypt seems, from what you wrote below, to use 512b (bit? likely, if scrypt) entries. CryptoNight uses 128 bit entries, which means that the cost of a 24 bit counter to indicate the last-modified-in round information for a particular value is still fairly significant in comparison to the original storage.

As an example, consider LOOKUP_GAP=2:
1MB of full cache to store actual values + 64k * 4 bytes (~256KB) = 1.25MB of space.

Correct, if you make your hash so slow that you can't deal with DDoS attacks (which is the case for CryptoNote), the size of the 'values' table needed to walk back each path of computation to trade computation for space becomes larger than the space to store the values normally.

Whereas in L3crypt I have 512B entries in order to make the hash fast enough and still cover 1MB of cache to keep it in L3 cache in order to defeat economies-of-scale with Tilera cpus, GPUs, and ASICs.

So agreed CryptoNote in that case (1MB/16B table with 128-bit, i.e. 16B, elements with 1M writes) defeats the dynamic lookup gap strategy, but at the cost of making the hash too slow to defeat DDoS attacks.

If you actually design a hash that won't subject your coin to the threat of being DDoS destroyed in the future, then the dynamic lookup gap strategy can't be avoided. Do some calculations to verify.
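For anyone who wants to do those calculations, the LOOKUP_GAP=2 figures quoted from dga above can be checked as follows. This is only a sketch of that one example: the slot count and slot size come from the quoted post, and rounding the 24-bit counter up to 4 bytes is taken from the "64k*4bytes" term. Variable names are illustrative.

```python
SLOTS = 131_072        # 128-bit slots in the lookup table (from the quoted post)
SLOT_BYTES = 16        # 128 bits
COUNTER_BYTES = 4      # 24-bit "last modified in round" counter, rounded up
LOOKUP_GAP = 2

MB = 2 ** 20

full_table = SLOTS * SLOT_BYTES               # every slot stored
stored_slots = SLOTS // LOOKUP_GAP            # only every second value kept
values_bytes = stored_slots * SLOT_BYTES      # the "1MB of full cache"
counter_bytes = stored_slots * COUNTER_BYTES  # the "64k*4bytes" term
total_bytes = values_bytes + counter_bytes

saving_ratio = total_bytes / full_table
```

With these inputs the full table is 2MB, the reduced layout is 1.25MB, and the bookkeeping therefore eats a quarter of what halving the table would otherwise save. How the counter term scales for other gap values is not stated in the thread, so the sketch does not try to generalize it.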

Quote from: dga on July 22, 2014, 03:11:11 PM

You furthermore haven't dealt with the issue of potential cycles in the recomputation graph, which requires a somewhat more sophisticated data structure to handle: A depends on B depends on C which depends on an earlier-computed version of A. (Keeping in mind that there's a non-negligible chance of A immediately modifying A! It happens, on average, a few times per hash).

Each entry in the 'values' table can index to another entry in the 'values' table enabling to trace back.

Quote from: dga on July 22, 2014, 03:11:11 PM

I missed the part of your proposal that handled that. Furthermore, there's some internal state associated with the mixing that happens at each round -- it's not simply a crank-through of X iterations of a hash on a static data item. That state is carried forward from the previous half-round (the multiply or the AES mix, respectively), so you have to have a way to backtrack to that.

I believe what I quoted from my rough draft whitepaper is incorrect on that (hadn't looked at that for some months), and each entry in the 'values' table should point to another entry in the table until it is traced back to a stored value.

Quote from: dga on July 22, 2014, 03:11:11 PM

As I said in my post, there are possibly some weaknesses involved in the use of a single round of AES as a random number generator, but I *suspect* they're not exploitable enough to confer a major speed advantage. That's not an expert part of my conclusion, because I'm not a cryptographer.

Single round can only diffuse over 32-bits (and that doesn't even mean all the 32-bit space is randomized), and there are other attacks such as on the key scheduling.

The GPU has more FLOPs and can mask away the latency by running sufficient threads but it lacks an AES circuit. ASICs can add the AES circuit to eliminate that CPU advantage (and even accelerate the computational portion) and apply the GPU style advantage for masking the random access latency.

The CryptoNote hash keeps GPUs at power efficiency parity (and MemoryCoin 2.0 did that too) and it doesn't defeat ASICs dominance, rather it only delays due to more complexity of implementing an ASIC. And I have stated that making it complex but not impossible to make a superior ASIC is a big risk because it could mean when they do come, they won't be ubiquitous (and this is the reason I aborted my L3crypt design).

And these design choices come at the cost of making your coin DDoS attackable (even worse for MemoryCoin at 10 hpm, not per second) and also the slow proof-of-work hash eliminates the opportunity to solve the anonymity correctly (i.e. not using Tor or I2P) but I won't reveal that to you.

Quote from: dga on July 22, 2014, 03:11:11 PM

I think you're being overly optimistic about the success of your own approach based upon the flaws in your (completely unexplained) l3scrypt.

I know of no design flaws in L3crypt. It achieves the goal of being fast enough and making it complex to implement an ASIC, leveraging Intel's economy-of-scale with L3 caches (which is even superior to Tilera). Note at 512B writes, the write-back bandwidth becomes a factor in the design. However in attaining that speed, it is vulnerable to the aforementioned lookup-gap approach of trading computation for space. The concept of reading and writing over a memory table is the same in L3crypt and CryptoNote. The difference is the size of the r/w elements, the number of random access iterations relative to the table size, and the resultant speed of the hash. And the design choices in those variables for CryptoNote make it DDoS attackable because the hash is slow.

If a DDoS attacker sends bogus proof-of-work blocks, the calculation time is around 1/100 of a second for an average node, or 1/1000 of a second for a high-powered node.

This impacts how many IP addresses you can blacklist per second, and also the propagation time of new blocks, which affects the orphan rate[1], which thus impacts how fast transactions can be. DDoS could drive the orphan rate sky-high.

[1] https://bitcointalksearch.org/topic/reasons-to-keep-10-min-target-blocktime-260180
http://bitcoin.stackexchange.com/a/4958
https://eprint.iacr.org/2013/881.pdf#page=11

Of course you can resolve it by reducing decentralization and having everything go through pools that trust each other, a la Bitcoin which now has 1 pool with 50% of hashrate.

Quote from: dga on July 22, 2014, 03:11:11 PM

You're missing way too many CryptoNight-specific details to be convincing at all. I think that underlying this is an important difference: Your PoW design didn't carry as much information forward between rounds as CN does. Your approach isn't crazy, but you've left way too many important parts out of the analysis.

It is carrying state forward just the same. The differences are the variables I stated above.

Quote from: dga on July 22, 2014, 03:11:11 PM

Regarding the bandwidth-intensive approach, you're still wrong about where the time is being spent in the GPU. It's about 50/50 in random memory access and AES computation time. Amdahl's law gets you again there -- I'll certainly grant something like a 4x speedup, but it starts to decline after that.

That is because you aren't running enough threads on the GPU to mask away all the latency with the coalescing of memory accesses on the GPU. As the number of threads increase, this will improve.

Quote from: dga on July 22, 2014, 03:11:11 PM

Update: I also read your linked thread's comments about the use of AES. You're not looking at the big picture. In the context of a proof-of-work scheme (NOT as the hash to verify integrity), the limitation of 128 bits at each step is unimportant.

In terms of you missing the 'big picture' see my points up-post.

CryptoNote employs AES encryption as a random oracle so that all possible cache table elements should be equally probable at each random access. But AES encryption isn't designed to be a random oracle. Thus there may exist attacks on the structure of the probabilities of random accesses in the table.

Note the AES vulnerability isn't required to implement an ASIC that outperforms. It is an orthogonal potential attack. There might be a way to trade computation for space within some structure that deviates from uniform random distribution given by the misuse of AES encryption.

Quote from: dga on July 22, 2014, 03:11:11 PM

More to the point, your post has absolutely no substantiation of your claim and has a link to a stackexchange article that in no way suggests any easy-to-exploit repeating pattern of the output bits that could be used to shrink the scratchpad size. If you'd care to actually provide a substantive reference for and explanation of your claim, then perhaps the Monero developers (or bytecoin developers) might take it a little more seriously.

I don't have to do your work for you. Ask a cryptographer that knows about AES, and they can explain this to you in more detail.

Seriously until you get some qualified cryptanalysis on your proof-of-work, you are just blowing hot air.

bitbudget

newbie

Activity: 24

Merit: 0

Hey, can someone help me with RPC commands? I get empty "result": { } on get_payments method.

Quote

Request:
array (
  'jsonrpc' => '2.0',
  'method' => 'get_payments',
  'id' => 1378931342,
  'params' =>
  array (
   'payment_id' => '64-char string',
  ),
)

Response JSON:
'{
  "id": 1378931342,
  "jsonrpc": "2.0",
  "result": {
  }
}'

However incoming_transfers method works fine

Quote

Request:
array (
  'jsonrpc' => '2.0',
  'method' => 'incoming_transfers',
  'id' => 1998971379,
  'params' =>
  array (
   'transfer_type' => 'all',
  ),
)

Response JSON:
'{
  "id": 1998971379,
  "jsonrpc": "2.0",
  "result": {
   "transfers": [{
   "amount": 90000000000,
   "global_index": 86923,
   "spent": false,
   "tx_hash": "",
   "tx_hash_proper": "AAAAAAAAAAAAAAAAAAAAA"
   },{
   "amount": 200000000000,
   "global_index": 242013,
   "spent": false,
   "tx_hash": "",
   "tx_hash_proper": "AAAAAAAAAAAAAAAAAAAAA"
   }]
  }
}'
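For comparison, the same two requests can be built and inspected from Python. This is a hedged sketch, not a fix for the empty result: the endpoint URL is a placeholder, and the `payments` key read from the response is an assumption, since the response shown above contains no fields at all. Only the method and parameter names are taken from the post.

```python
import json
import urllib.request

# Hypothetical endpoint; host, port and path are assumptions, adjust to your wallet.
WALLET_RPC_URL = "http://127.0.0.1:18082/json_rpc"

def build_request(method: str, params: dict, request_id: int = 0) -> dict:
    # JSON-RPC 2.0 envelope, same shape as the PHP arrays in the post.
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}

def call(payload: dict, url: str = WALLET_RPC_URL) -> dict:
    # POST the payload as JSON and decode the JSON reply.
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))

def payments_or_empty(response: dict) -> list:
    # Treat a missing or empty "result" as "no payments found" instead of an error.
    # The "payments" key is an assumption about the reply format.
    return response.get("result", {}).get("payments", [])

# Building the request does not touch the network; call(...) would send it.
get_payments = build_request("get_payments", {"payment_id": "0" * 64}, 1)
incoming = build_request("incoming_transfers", {"transfer_type": "all"}, 2)
```

An empty `result` may simply mean the wallet knows of no transfer carrying that payment ID, so it is worth checking that the ID sent is exactly the 64-character hex string attached to the incoming transaction.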

DogTheHunter

sr. member

Activity: 784

Merit: 272

Quote from: r05 on July 22, 2014, 04:08:51 PM

These guys are brilliant!

certainly worth a read.

r05

full member

Activity: 193

Merit: 100

test cryptocoin please ignore

These guys are brilliant!

dga

hero member

Activity: 737

Merit: 511

Quote from: AnonyMint on July 22, 2014, 01:24:57 PM

Quote from: AnonyMint on July 22, 2014, 04:14:50 AM

However that is a significant disadvantage to all those who run a 32-bit operating system, which is apparently still greater than 50% of all computers:

http://community.spiceworks.com/topic/426628-windows-32-bit-vs-64-bit-market-share

http://en.wikipedia.org/wiki/Usage_share_of_operating_systems#Desktop_and_laptop_computers

Probably due to having twice as many fat registers in 64-bit mode, which means among other possibilities you can pipeline up to doubly better (although a hyperthreaded CPU should compensate if your pipelining is nailed fully with 16 fat registers).

Quote from: dga on July 22, 2014, 10:14:26 AM

I've posted an informal summary of my analysis of the CryptoNight algorithm earlier in this thread with respect to GPU balance and its likely eventual balance with ASICs.

It's good.

Link please?

Quote from: dga on July 22, 2014, 10:26:01 AM

Quote from: AnonyMint on July 22, 2014, 04:29:29 AM

Quote from: smooth on July 22, 2014, 04:10:33 AM

The L3 cache by itself is almost half of the chip.

I looked at an image of the Haswell die and appears to be less than 20%. The APU (GPU) is taking up more space on the consumer models. On the server models there is no GPU and the cache is probably a higher percentage of the die.

Quote from: smooth on July 22, 2014, 04:10:33 AM

There is also a 64 bit multiply, which I'm told is non-trivial. Once you combine that with your observation about Intel having a (likely persistent) process advantage (and also the inherent average unit cost advantage of a widely-used general purpose device), there just isn't much if anything left for an ASIC-maker to work with.

So no I don't think the point is really valid. You won't be able to get thousands of times anything with a straightforward ASIC design here. There may be back doors though, we don't know. The point about lack of a clear writeup and peer review is valid.

Quote

The CPU has an inherent disadvantage in that it is designed to be a general purpose computing device so it can't be as specialized at any one computation as an ASIC can be.

This is obviously going to be true, but the scope of the task here is very different. Thousands of copies will not work.

I believe that is wrong. I suspect an ASIC can be designed that vastly outperforms (at least on a power efficiency basis) and one of the reasons is the algorithm is so complex, thus it probably has many ways to be optimized with specific circuitry instead of generalized circuitry. My point is isolating a simpler ("enveloped") instruction such as aesinc would be a superior strategy (and embrace USB pluggable ASICs and get them spread out to the consumer).

Also I had noted (find my post in my thread a couple of months ago) that the way the AES is incorrectly employed as a random oracle (as the index to lookup in the memory table), the algorithm is very likely subject to some reduced solution space. This is perhaps Claymore's advantage (I could probably figure it out if I was inclined to spend sufficient time on it).

There is no cryptographic analysis of the hash. It might have impossible images, collisions, etc..

I strongly disagree.

The algorithm is *not* complex, it's very simple. Grab a random-indexed 128 bit value from the big lookup table. Mix it using a single round of AES. Store part of the result back. Use that to index the next item. Mix that with a 64 bit multiply. Store back. Repeat. It's intellectually very close to scrypt, with a few tweaks to take advantage of things that are fast on modern CPUs.
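The loop structure described above can be sketched in a few lines. This is a toy illustration of the data-dependency pattern only, not CryptoNight: a keyed BLAKE2b call stands in for the single AES round, the table is shrunk to 1024 slots, and the exact way results are combined and stored back is an assumption made for the example.

```python
import hashlib

SLOTS = 1024                 # toy size; the thread cites 131,072 slots
MASK128 = (1 << 128) - 1

def h16(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=16).digest()

def mix(block16: bytes, key16: bytes) -> bytes:
    # Stand-in for one AES round (NOT AES): keyed BLAKE2b truncated to 128 bits.
    return hashlib.blake2b(block16, key=key16, digest_size=16).digest()

def slot_index(value16: bytes) -> int:
    # The low 64 bits of the running state pick the next slot.
    return int.from_bytes(value16[:8], "little") % SLOTS

def xor16(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))

def toy_inner_loop(seed: bytes, iterations: int = 1000) -> bytes:
    pad = [h16(seed + i.to_bytes(4, "little")) for i in range(SLOTS)]
    a, b = h16(seed + b"a"), h16(seed + b"b")
    for _ in range(iterations):
        # Half-round 1: fetch a state-indexed slot, mix it, store part back.
        i = slot_index(a)
        c = mix(pad[i], a)
        pad[i] = xor16(b, c)
        # Half-round 2: the mixed value indexes the next slot; 64-bit multiply.
        j = slot_index(c)
        d = pad[j]
        prod = int.from_bytes(c[:8], "little") * int.from_bytes(d[:8], "little")
        a_int = (int.from_bytes(a, "little") + prod) & MASK128
        pad[j] = a_int.to_bytes(16, "little")
        a = xor16(pad[j], d)
        b = c
    # Fold the final table into one digest.
    return hashlib.blake2b(b"".join(pad), digest_size=32).digest()
```

Each read address depends on the result of the previous step, which is what makes the loop latency-bound and is the property the lookup-gap discussion is about.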

I know more about this because I independently developed an algorithm several months ago that is very similar which I named L3scrypt. I also solved several problems which are not solved in CryptoNight such as the speed.

The concept of lookup can still be utilized to trade computation for space. I will quote from my white papers as follows which also explains why I abandoned it (at least until I can do some real world testing on GPUs) when I realized the coalescing of memory access is likely much more sophisticated on the GPU.

Code: (AnonyMint)

However, since first loop of L3crypt is overwriting values in random order instead of sequentially, a more
complex data structure is required for implementing "lookup gap" than was the case for Scrypt. For every element
store in an elements table the index to a values table, index of a stored value in that values table and the
number of iterations of H required on the stored value. Each time an element in the values table needs to be
overwritten, an additional values table must be created because other elements may reference the existing stored
value.

Thus for example, reducing the memory required by up to half (if no element is overwritten), doubles the number of H
computed for the input of FH as follows because there is a recursive 50% chance to recompute H before reaching a
stored value.

1 + 1/2 + 1/4 + 1/8 + 1/16 + ... = 2 [15]

Storing only every third V[j], reducing the memory required by up to two-thirds (if no element is overwritten), trebles the
number of H computed for the input of FH as follows because there is a recursive 2/3 chance to recompute H before
reaching a stored value.

1 + 2/3 + 4/9 + 8/27 + 16/81 + ... = 3

The increased memory required due to overwritten 512B elements is approximately the factor n. With n = 8 to reduce the
1MB memory footprint to 256KB would require 32X more computation of H if the second loop isn't also overwriting elements.
Optionally given the second loop overwrites twice as many 32B elements, to reduce the 1MB memory footprint to 256KB
would also require 4X more computation of FH.

However since execution time of the first loop bounded by latency can be significantly reduced by trading recomputation of
H for lower memory requirements if FLOPs exceed the CPU by significantly more than a factor of 8, it is a desirable precaution
to make the latency bound of the second loop a significant portion of the execution time so L3crypt remains latency
bound in that case.

Even without employing "lookup gap", the GPU could potentially execute more than 200 concurrent instances of L3crypt to
leverage its superior FLOPs and offset the 25x slower main memory latency and the CPU's 8 hyperthreads.
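The two series in the excerpt are ordinary geometric series and can be checked numerically. A small sketch, assuming (as the excerpt does) that each lookup independently has probability p of hitting a value that was not stored and must be recomputed; the function names are illustrative.

```python
def expected_h_computations(p_recompute: float, terms: int = 200) -> float:
    # Partial sum of 1 + p + p^2 + ..., the expected number of H evaluations
    # when every step back has probability p of needing another recomputation.
    return sum(p_recompute ** k for k in range(terms))

def closed_form(p_recompute: float) -> float:
    # Limit of the geometric series for 0 <= p < 1.
    return 1.0 / (1.0 - p_recompute)

half_stored = expected_h_computations(1 / 2)    # store every second value
third_stored = expected_h_computations(2 / 3)   # store every third value
```

The partial sums converge to 2 and 3 respectively, matching 1/(1 - p) for p = 1/2 and p = 2/3, so a lookup gap of n costs roughly n times the H computations under this model.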

So if you can trade computation for space, then an ASIC can potentially clobber the CPU. The GPU would be beating the CPU, except for the inclusion of the AES instructions which the GPU doesn't have. An ASIC won't have this limitation.

Also the use of AES as random oracle to generate the lookup in the table is a potential major snafu, because AES is not designed to be a hash function. Thus it is possible that certain non-randomized patterns exist which can be exploited. I covered this in a post in my thread, which the Monero developers were made aware of but apparently decided to conveniently ignore(?).

Adding all these together it may also be possible to utilize more efficient caching design on the ASIC that is tailored for the access profile of this algorithm. There are sram caches for ASICs (e.g. Toshiba) and there is a lot of leeway in terms of parameters such as set associativity, etc..

So sorry, I don't think you know in depth what you are talking about. And I think I do.

Quote from: dga on July 22, 2014, 10:26:01 AM

Remember that there are two ways to implement the CryptoNight algorithm:
(1) Try to fit a few copies in cache and pound the hell out of them;
(2) Fit a lot of copies in DRAM and use a lot of bandwidth.

Approach (1) is what's being done on CPUs. Approach (2) is what's being done on GPUs.

And the coalescing of memory accesses for #2 is precisely what I meant. It is only the AES instructions that are impeding the GPU from clobbering the CPU.

Quote from: dga on July 22, 2014, 10:26:01 AM

I tried implementing #2 on CPU and couldn't get it to perform as well as my back-of-the-envelope analysis suggests it should, but it's possible it could outperform the current CPU implementations by about 20%. (I believe yvg1900 tried something similar and came to the same conclusion I did).

No because external memory access parameters decline as the number of threads simultaneously accessing it increases (less so for the high end server CPUs). So what you saw is what I expected. If you need me to cite a reference I can go dig it up.

Quote from: dga on July 22, 2014, 10:26:01 AM

An ASIC approach might well be better off with #2, however, but it simply moves the bottleneck to the memory controller, and it's a hard engineering job compared to building an AES unit, a 64 bit multiplier, and 2MB of DRAM. But that 2MB of DRAM area limits you in a big way.

Computation can be traded for space to use fast caches. See what I wrote up-post. And/or you could design an ASIC to drop into an existing GPU memory controller setup, etc. There are numerous options. Yes, it is a more difficult engineering job, and that is worse for CryptoNight because it means whoever is first to solve it will limit supply and give an incredible advantage to a few, which is what plagued Bitcoin in 2013 until the ASICs became ubiquitous. This proprietary advantage might linger for a much longer duration.

Quote from: dga on July 22, 2014, 10:26:01 AM

In my best professional opinion, barring funky weaknesses lingering within the single round of AES, CryptoNight is a very solid PoW. Its only real disadvantage is comparatively slow verification time, which really hurts the time to download and verify the blockchain.

In my professional opinion, I think you lack depth of understanding. What gives?

What gives is very simple: You're wrong; you're also being needlessly insulting, in a discussion that need not become personal. If you'd like to engage in a credential pissing match, fine, but that seems like a waste of time. Let's settle for me pointing out that I'm the original source of the code that's now used in the inner loop of the CPU cryptonight mining and block verification code, so I will claim some familiarity thereby.

You haven't posted enough details about your L3scrypt design to determine if your analysis actually applies to CryptoNight, but let's walk through the math a little:

There are 1,000,000 random accesses of the inner loop of CryptoNight.

There are 131,072 individual 128 bit slots in the lookup table.

Simple approach #1: Store only elements after they have been modified as part of the execution of CryptoNight. Assume *zero* cost to compute the "initial" table entries on-demand, but assume that all values are stored after they have been modified, so that the inner loop doesn't have to backtrack:

- Balls-in-bins analysis of 1M balls into 128k bins; how many are empty at the end? As a first approximation, not very many at all. Saves less than 10% of the storage space. Not an effective optimization.
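The balls-in-bins estimate can be checked with a few lines of Python. This is a rough sketch that assumes every access lands on a uniformly random, independent slot, which is an idealization of the real address generation; the function and constant names are made up for illustration.

```python
import math

ACCESSES = 1_000_000   # random accesses in the inner loop (figure quoted above)
SLOTS = 131_072        # 128-bit slots in the lookup table


def expected_untouched_fraction(accesses: int, slots: int) -> float:
    """Expected fraction of slots never hit by `accesses` uniform random accesses."""
    # Each slot is missed by one access with probability (1 - 1/slots),
    # so it is missed by all of them with that probability raised to `accesses`.
    return (1.0 - 1.0 / slots) ** accesses


frac = expected_untouched_fraction(ACCESSES, SLOTS)
print(f"untouched fraction:        {frac:.6f}")
print(f"expected untouched slots:  {frac * SLOTS:.1f}")
# The closed form is very close to exp(-accesses/slots) for large tables.
print(f"exp(-accesses/slots):      {math.exp(-ACCESSES / SLOTS):.6f}")
```

Under this model the untouched fraction comes out far below the 10% bound stated above, so storing only modified entries saves next to nothing.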

Your approach: Dynamic recomputation.

The first flaw in your analysis: Your l3scrypt seems, from what you wrote below, to use 512b (bit? likely, if scrypt) entries. CryptoNight uses 128 bit entries, which means that the cost of a 24 bit counter to indicate the last-modified-in round information for a particular value is still fairly significant in comparison to the original storage.
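To put a number on "fairly significant", here is a small sketch comparing the relative cost of a 24 bit counter for the two entry sizes mentioned. The 512 bit figure is only the guess made above about l3scrypt, and the helper name is hypothetical.

```python
def counter_overhead(counter_bits: int, entry_bits: int) -> float:
    """Bookkeeping bits per stored entry, as a fraction of the entry size."""
    return counter_bits / entry_bits


COUNTER_BITS = 24  # last-modified-in-round counter discussed above

cryptonight = counter_overhead(COUNTER_BITS, 128)  # CryptoNight entry size
guessed_512 = counter_overhead(COUNTER_BITS, 512)  # assumed l3scrypt entry size

print(f"128 bit entries: {cryptonight:.2%} overhead")
print(f"512 bit entries: {guessed_512:.2%} overhead")
```

The same counter costs four times as much, relatively, on 128 bit entries as it would on 512 bit ones.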

As an example, consider LOOKUP_GAP=2:
1MB of cache to store the actual values, plus 64k * 4 bytes ~= 256KB of counters, for a total of 1.25MB of space.
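The LOOKUP_GAP=2 arithmetic, spelled out as a sketch. It assumes one 4-byte counter per stored entry (the 24 bit counter rounded up), which is how the figures in the example read; the names are illustrative only.

```python
SLOTS = 131_072      # 128-bit slots in the full table
SLOT_BYTES = 16      # 128 bits
COUNTER_BYTES = 4    # 24-bit round counter rounded up to 4 bytes (assumption)

FULL_TABLE_BYTES = SLOTS * SLOT_BYTES  # 2 MB


def storage_bytes(lookup_gap: int) -> int:
    """Bytes needed when only every `lookup_gap`-th entry is kept, plus counters."""
    stored_entries = SLOTS // lookup_gap
    return stored_entries * SLOT_BYTES + stored_entries * COUNTER_BYTES


gap2 = storage_bytes(2)
print(f"full table:    {FULL_TABLE_BYTES / 2**20:.2f} MB")
print(f"LOOKUP_GAP=2:  {gap2 / 2**20:.2f} MB")
print(f"ratio:         {gap2 / FULL_TABLE_BYTES:.3f}")
```

So halving the stored entries only brings the footprint down to 62.5% of the original 2MB, before any of the extra bookkeeping discussed below.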

You furthermore haven't dealt with the issue of potential cycles in the recomputation graph, which requires a somewhat more sophisticated data structure to handle: A depends on B depends on C which depends on an earlier-computed version of A. (Keeping in mind that there's a non-negligible chance of A immediately modifying A! It happens, on average, a few times per hash).
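The "few times per hash" figure for A immediately modifying A can be sanity-checked under a simple model: treat the accessed slots as a sequence of independent uniform random indices and count how often two consecutive accesses hit the same slot. That model is an assumption, not a description of the real address derivation, and the function names are hypothetical.

```python
import random

ACCESSES = 1_000_000   # random accesses per hash (figure quoted above)
SLOTS = 131_072        # 128-bit slots in the lookup table


def expected_immediate_repeats(accesses: int, slots: int) -> float:
    """Expected number of consecutive access pairs that hit the same slot."""
    # Each of the (accesses - 1) consecutive pairs matches with probability 1/slots.
    return (accesses - 1) / slots


def simulate_immediate_repeats(accesses: int, slots: int, seed: int = 0) -> int:
    """Count same-slot consecutive accesses in one simulated run."""
    rng = random.Random(seed)
    repeats = 0
    prev = rng.randrange(slots)
    for _ in range(accesses - 1):
        cur = rng.randrange(slots)
        if cur == prev:
            repeats += 1
        prev = cur
    return repeats


print(f"expected per hash: {expected_immediate_repeats(ACCESSES, SLOTS):.2f}")
print(f"one simulated run: {simulate_immediate_repeats(ACCESSES, SLOTS)}")
```

The expectation is about 7.6 per hash under this model, which is consistent with "a few times".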

I missed the part of your proposal that handled that. Furthermore, there's some internal state associated with the mixing that happens at each round -- it's not simply a crank-through of X iterations of a hash on a static data item. That state is carried forward from the previous half-round (the multiply or the AES mix, respectively), so you have to have a way to backtrack to that. Likely, you could store another bit in the entry to indicate if it was in the first or second half-round, but you still need to be able to track that part back to an up-to-date stored value as well. And you need to have a previous round in which not only did you generate the value to be stored in the LOOKUP_GAP space, but where you're able to go back to a previous round and find the initial value of 'a' that was used in the AES encryption.

Your stored values table must be versioned, because each subsequent modification updates it. That's going to add another bit of bookkeeping overhead.

As I said in my post, there are possibly some weaknesses involved in the use of a single round of AES as a random number generator, but I *suspect* they're not exploitable enough to confer a major speed advantage. That's not an expert part of my conclusion, because I'm not a cryptographer.

I think you're being overly optimistic about the success of your own approach based upon the flaws in your (completely unexplained) l3scrypt. I'm delighted at the idea that CryptoNight is flawed, but you've completely failed to prove it, and presenting an analysis of some *other* PoW function that you designed and that, as far as I can tell, exists only in your head and in your own private document repository, is hardly a way to go about it.

You're missing way too many CryptoNight-specific details to be convincing at all. I think that underlying this is an important difference: Your PoW design didn't carry as much information forward between rounds as CN does. Your approach isn't crazy, but you've left way too many important parts out of the analysis.

Regarding the bandwidth-intensive approach, you're still wrong about where the time is being spent in the GPU. It's about 50/50 in random memory access and AES computation time. Amdahl's law gets you again there -- I'll certainly grant something like a 4x speedup, but it starts to decline after that.
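For reference, Amdahl's law as a sketch, taking the 50/50 split above as the fraction of time spent in AES. How the 4x figure in the post maps onto this formula isn't spelled out there, so the code doesn't try to reproduce it; it only shows how the overall gain flattens as the AES part alone gets faster.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the runtime gets `factor`x faster."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / factor)


AES_FRACTION = 0.5  # the ~50/50 split between memory access and AES quoted above

for factor in (2, 4, 16, 1_000_000):
    overall = amdahl_speedup(AES_FRACTION, factor)
    print(f"AES {factor:>9}x faster -> overall {overall:.3f}x")
```

With half the time left in random memory access, the overall gain from faster AES alone approaches but never reaches 2x in this model.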

Update: I also read your linked thread's comments about the use of AES. You're not looking at the big picture. In the context of a proof-of-work scheme (NOT as the hash to verify integrity), the limitation of 128 bits at each step is unimportant. More to the point, your post has absolutely no substantiation of your claim and has a link to a stackexchange article that in no way suggests any easy-to-exploit repeating pattern of the output bits that could be used to shrink the scratchpad size. If you'd care to actually provide a substantive reference for and explanation of your claim, then perhaps the Monero developers (or bytecoin developers) might take it a little more seriously.

otila

sr. member

Activity: 336

Merit: 250

Quote from: fluffypony on July 22, 2014, 02:41:21 PM

Quote from: otila on July 22, 2014, 02:21:40 PM

Quote from: fluffypony on July 22, 2014, 04:28:26 AM

We didn't create it, we inherited it from the CryptoNote reference code. All optimisations we've made to it are public and in master on github.

It had the world's slowest AES implementation in git for a month, before someone bothered to add AES-NI support.
Why did you not have AES-NI from day one?

How? thankful_for_today forked and launched it, everyone that played around with it from very early on (myself included) solo mined on the miner that came with it. It took some time to even figure out how all the moving pieces in the code fit together, much less begin to grok it. Only after all of that could any optimisations happen, and that was LONG before we added AES-NI support (which was a complete PITA). Remember: at the time it was code we inherited, not code we wrote.

OK, I forgot the dev change. I also looked at the code and thought WTH... I optimized it and made it three times faster (I didn't try AES-NI, because it was not clear whether it was just mixing features of AES randomly in the PoW), and I also wondered at the great effort that was put into the obfuscation.
I found one block and then the AES-NI patch was released. Roll Eyes

Oh and I missed the initial announcement because BCT didn't send me new thread notification about monero.

fluffypony

donator

Activity: 1274

Merit: 1060

GetMonero.org / MyMonero.com

Quote from: otila on July 22, 2014, 02:21:40 PM

Quote from: fluffypony on July 22, 2014, 04:28:26 AM

We didn't create it, we inherited it from the CryptoNote reference code. All optimisations we've made to it are public and in master on github.

It had the world's slowest AES implementation in git for a month, before someone bothered to add AES-NI support.
Why did you not have AES-NI from day one?

How? thankful_for_today forked and launched it, everyone that played around with it from very early on (myself included) solo mined on the miner that came with it. It took some time to even figure out how all the moving pieces in the code fit together, much less begin to grok it. Only after all of that could any optimisations happen, and that was LONG before we added AES-NI support (which was a complete PITA). Remember: at the time it was code we inherited, not code we wrote.

cAPSLOCK

legendary

Activity: 3766

Merit: 5146

Note the unconventional cAPITALIZATION!

Quote from: tacotime on July 22, 2014, 02:27:23 PM

Quote from: otila on July 22, 2014, 02:21:40 PM

It had the world's slowest AES implementation in git for a month, before someone bothered to add AES-NI support.
Why did you not have AES-NI from day one?

None of the current members of the core team had anything to do with the initial reference code or the cryptocurrency's inception.

So what he meant was: "Congrats for the AES-NI support!"
