
Topic: [XMR] Monero - A secure, private, untraceable cryptocurrency - page 1608. (Read 4670622 times)

hero member
Activity: 518
Merit: 521
We've had the troll wars, now we have the crypto-expert wars! Cheesy

False dilemma?

Agreed, only good can come out of hashing out such details earlier rather than later.
r05
full member
Activity: 193
Merit: 100
test cryptocoin please ignore
We've had the troll wars, now we have the crypto-expert wars! Cheesy

False dilemma?

I have no idea what you mean?
legendary
Activity: 1596
Merit: 1030
Sine secretum non libertas
We've had the troll wars, now we have the crypto-expert wars! Cheesy

False dilemma?
legendary
Activity: 1596
Merit: 1030
Sine secretum non libertas
the emission halves every 512 days?

This is a continuous process, not a sudden halving (like Bitcoin). The decrease is continuous, such that after 512 days the reward is half of the original one.

i always wondered why satoshi designed bitcoin to drop off these gigantic cliffs every few years.

It is a marvelous opportunity to study the impulse response of a real economy.  Generations of econometricians will be deeply indebted to (presumptive default gender) him for the emission step function.  The guinea pigs already owe him for bitcoin, so they can't complain much.  You can unwind a lot of structure from a binary ping!
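For illustration, the continuous schedule described above behaves like a simple exponential decay with a 512-day half-life. This is only a model: the actual Monero emission is computed per block from the remaining supply, not from elapsed days.

```python
def block_reward(t_days, r0=1.0, half_life_days=512.0):
    """Smoothly decaying reward with a 512-day half-life.

    Illustrative model only: the real emission curve is computed per
    block from the remaining supply, not from elapsed wall-clock days.
    """
    return r0 * 2.0 ** (-t_days / half_life_days)

# No cliff: the reward shrinks a tiny amount every block, reaching
# half the original after one half-life and a quarter after two.
print(block_reward(0), block_reward(512), block_reward(1024))
```

Contrast with Bitcoin's step function, where the reward is constant for ~4 years and then drops by half in a single block.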

r05
full member
Activity: 193
Merit: 100
test cryptocoin please ignore
We've had the troll wars, now we have the crypto-expert wars! Cheesy
hero member
Activity: 518
Merit: 521
However that is a significant disadvantage to all those who run a 32-bit operating system, which is apparently still greater than 50% of all computers:

http://community.spiceworks.com/topic/426628-windows-32-bit-vs-64-bit-market-share

http://en.wikipedia.org/wiki/Usage_share_of_operating_systems#Desktop_and_laptop_computers

Probably due to having twice as many fat registers in 64-bit mode, which means, among other things, you can pipeline up to twice as deeply (although a hyperthreaded CPU should compensate if your pipelining fully saturates all 16 fat registers).

I've posted an informal summary of my analysis of the CryptoNight algorithm earlier in this thread with respect to GPU balance and its likely eventual balance with ASICs.

It's good.

Link please?

The L3 cache by itself is almost half of the chip.

I looked at an image of the Haswell die and it appears to be less than 20%. The APU (GPU) is taking up more space on the consumer models. On the server models there is no GPU and the cache is probably a higher percentage of the die.

There is also a 64-bit multiply, which I'm told is non-trivial. Once you combine that with your observation about Intel having a (likely persistent) process advantage (and also the inherent average unit-cost advantage of a widely-used general-purpose device), there just isn't much, if anything, left for an ASIC-maker to work with.

So no I don't think the point is really valid. You won't be able to get thousands of times anything with a straightforward ASIC design here. There may be back doors though, we don't know. The point about lack of a clear writeup and peer review is valid.

Quote
The CPU has an inherent disadvantage in that it is designed to be a general purpose computing device so it can't be as specialized at any one computation as an ASIC can be.

This is obviously going to be true, but the scope of the task here is very different. Thousands of copies will not work.

I believe that is wrong. I suspect an ASIC can be designed that vastly outperforms (at least on a power-efficiency basis), and one of the reasons is that the algorithm is so complex that it probably has many ways to be optimized with specific circuitry instead of generalized circuitry. My point is that isolating a simpler ("enveloped") instruction such as aesenc would be a superior strategy (and embracing USB-pluggable ASICs to get them spread out to the consumer).

Also I had noted (find my post in my thread a couple of months ago) that the way the AES is incorrectly employed as a random oracle (as the index to lookup in the memory table), the algorithm is very likely subject to some reduced solution space. This is perhaps Claymore's advantage (I could probably figure it out if I was inclined to spend sufficient time on it).

There is no cryptographic analysis of the hash. It might have impossible images, collisions, etc..

I strongly disagree.

The algorithm is *not* complex, it's very simple.  Grab a random-indexed 128 bit value from the big lookup table.  Mix it using a single round of AES.  Store part of the result back.  Use that to index the next item.  Mix that with a 64 bit multiply.  Store back.  Repeat.  It's intellectually very close to scrypt, with a few tweaks to take advantage of things that are fast on modern CPUs.
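The loop described above can be sketched in a few lines. This toy Python version uses BLAKE2 as a stand-in for the AES round and the multiply, so it is *not* CryptoNight — it only reproduces the read-mix-write-reindex dependency structure, with an assumed 2 MB scratchpad:

```python
import hashlib

SCRATCHPAD_BYTES = 2 * 1024 * 1024  # 2 MB scratchpad, as discussed above
ITEM_BYTES = 16                     # 128-bit items

def mix_step(data):
    # Stand-in for one AES round / 64-bit multiply: any fast 16-byte
    # mixing function illustrates the dependency chain (NOT the real cipher).
    return hashlib.blake2b(data, digest_size=16).digest()

def memory_hard_loop(seed, iterations=1000):
    n_items = SCRATCHPAD_BYTES // ITEM_BYTES
    # Fill the scratchpad pseudorandomly from the seed.
    pad = bytearray(hashlib.shake_256(seed).digest(SCRATCHPAD_BYTES))
    idx = int.from_bytes(seed[:8], "little") % n_items
    acc = bytes(ITEM_BYTES)
    for _ in range(iterations):
        off = idx * ITEM_BYTES
        item = bytes(pad[off:off + ITEM_BYTES])    # random-indexed 128-bit read
        acc = mix_step(bytes(a ^ b for a, b in zip(acc, item)))  # mix
        pad[off:off + ITEM_BYTES] = acc            # store the result back
        idx = int.from_bytes(acc[:8], "little") % n_items  # next index depends on result
    return acc
```

Because each read address depends on the previous write, the iterations cannot be parallelized within one hash; that serial chain of random 2 MB accesses is what makes the function latency-bound.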

I know more about this because I independently developed a very similar algorithm several months ago, which I named L3crypt. I also solved several problems which are not solved in CryptoNight, such as speed.

The concept of lookup can still be utilized to trade computation for space. I will quote from my white paper as follows, which also explains why I abandoned it (at least until I can do some real-world testing on GPUs) when I realized that coalescing of memory access is likely much more sophisticated on the GPU.

Code: (AnonyMint)
However, since the first loop of L3crypt overwrites values in random order instead of sequentially, a more
complex data structure is required for implementing "lookup gap" than was the case for scrypt. For every element,
store in an elements table the index into a values table, the index of the stored value in that values table, and the
number of iterations of H required on the stored value. Each time an element in the values table needs to be
overwritten, an additional values table must be created, because other elements may reference the existing stored
value.

Thus for example, reducing the memory required by up to half (if no element is overwritten), doubles the number of H
computed for the input of FH as follows because there is a recursive 50% chance to recompute H before reaching a
stored value.

   1 + 1/2 + 1/4 + 1/8 + 1/16 + ... = 2 [15]

Storing only every third V[j], reducing the memory required by up to two-thirds (if no element is overwritten), trebles the
number of H computed for the input of FH as follows, because there is a recursive 2/3 chance to recompute H before
reaching a stored value.

   1 + 2/3 + 4/9 + 8/27 + 16/81 + ... = 3

The increased memory required due to overwritten 512B elements is approximately the factor n. With n = 8 to reduce the
1MB memory footprint to 256KB would require 32X more computation of H if the second loop isn't also overwriting elements.
Optionally given the second loop overwrites twice as many 32B elements, to reduce the 1MB memory footprint to 256KB
would also require 4X more computation of FH.

However, since the execution time of the first loop, bounded by latency, can be significantly reduced by trading recomputation of
H for lower memory requirements if FLOPs exceed the CPU's by significantly more than a factor of 8, it is a desirable precaution
to make the latency bound of the second loop a significant portion of the execution time, so L3crypt remains latency
bound in that case.

Even without employing "lookup gap", the GPU could potentially execute more than 200 concurrent instances of L3crypt to
leverage its superior FLOPs and offset the 25x slower main memory latency and the CPU's 8 hyperthreads.
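The two geometric series in the quoted text are easy to sanity-check numerically: the expected number of H evaluations per lookup, when only every n-th value is stored, converges to n.

```python
from fractions import Fraction

def expected_h_evals(n, terms=64):
    """Partial sum of 1 + ((n-1)/n) + ((n-1)/n)^2 + ...

    This is the expected number of H evaluations per lookup when only
    every n-th value is stored; the infinite sum converges to n.
    """
    p = Fraction(n - 1, n)  # chance the needed value was not stored
    return float(sum(p ** k for k in range(terms)))

print(expected_h_evals(2))  # storing every 2nd value: 1 + 1/2 + 1/4 + ... -> 2
print(expected_h_evals(3))  # storing every 3rd value: 1 + 2/3 + 4/9 + ... -> 3
```

So a k-fold memory reduction costs roughly a k-fold increase in H computations — the classic time-memory trade-off an ASIC designer would weigh against the cost of on-die memory.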

So if you can trade computation for space, then an ASIC can potentially clobber the CPU. The GPU would be beating the CPU, except for the inclusion of the AES instructions which the GPU doesn't have. An ASIC won't have this limitation.

Also, the use of AES as a random oracle to generate the lookup index into the table is a potential major snafu, because AES is not designed to be a hash function. Thus it is possible that certain non-randomized patterns exist which can be exploited. I covered this in a post in my thread, which the Monero developers were made aware of but apparently decided to conveniently ignore(?).

Adding all these together, it may also be possible to utilize a more efficient caching design on the ASIC that is tailored for the access profile of this algorithm. There are SRAM caches for ASICs (e.g. Toshiba) and there is a lot of leeway in terms of parameters such as set associativity, etc.

So sorry, I don't think you know in depth what you are talking about. And I think I do.

Remember that there are two ways to implement the CryptoNight algorithm:
  (1) Try to fit a few copies in cache and pound the hell out of them;
  (2) Fit a lot of copies in DRAM and use a lot of bandwidth.

Approach (1) is what's being done on CPUs.  Approach (2) is what's being done on GPUs.
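A back-of-the-envelope for approach (2): if each hash makes on the order of 2^19 dependent 16-byte read-plus-write passes over a 2 MB scratchpad, raw DRAM bandwidth caps the hash rate well before the arithmetic does. Both the iteration count and the bandwidth figure below are assumed example numbers, not measured parameters of any real card.

```python
# Rough model only; the iteration count and the 300 GB/s figure are
# illustrative assumptions, not specs of CryptoNight or of any GPU.
ACCESSES = 2 ** 19            # assumed dependent iterations per hash
BYTES_PER_ACCESS = 32         # 16-byte read + 16-byte write per iteration
traffic_per_hash = ACCESSES * BYTES_PER_ACCESS   # 16 MiB of DRAM traffic

gpu_bandwidth = 300e9         # hypothetical 300 GB/s memory bus
ceiling = gpu_bandwidth / traffic_per_hash
print(f"bandwidth ceiling ~ {ceiling:.0f} hashes/s")
```

GPU hash rates reported in this thread sit far below any such ceiling, which suggests the implementations are latency- and occupancy-bound rather than bandwidth-bound.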

And the coalescing of memory accesses for #2 is precisely what I meant. It is only the AES instructions that keep the GPU from clobbering the CPU.

I tried implementing #2 on CPU and couldn't get it to perform as well as my back-of-the-envelope analysis suggests it should, but it's possible it could outperform the current CPU implementations by about 20%.  (I believe yvg1900 tried something similar and came to the same conclusion I did).

No, because external memory access performance declines as the number of threads simultaneously accessing it increases (less so for the high-end server CPUs). So what you saw is what I expected. If you need me to cite a reference I can go dig it up.

An ASIC approach might well be better off with #2, however, but it simply moves the bottleneck to the memory controller, and it's a hard engineering job compared to building an AES unit, a 64 bit multiplier, and 2MB of DRAM.  But that 2MB of DRAM area limits you in a big way.

Computation can be traded for space to use fast caches. See what I wrote up-post. And/or you could design an ASIC to drop into an existing GPU memory-controller setup. Etc. There are numerous options. Yes, it is a more difficult engineering job, which is worse for CryptoNight because whoever is first to solve it will limit supply and gain an incredible advantage for a few, which is what plagued Bitcoin in 2013 until ASICs became ubiquitous. This proprietary advantage might linger for a much longer duration.

In my best professional opinion, barring funky weaknesses lingering within the single round of AES, CryptoNight is a very solid PoW.  Its only real disadvantage is comparatively slow verification time, which really hurts the time to download and verify the blockchain.

In my professional opinion, I think you lack depth of understanding. What gives?
legendary
Activity: 1470
Merit: 1000
Want privacy? Use Monero!
the emission halves every 512 days?

This is a continuous process, not a sudden halving (like Bitcoin). The decrease is continuous, such that after 512 days the reward is half of the original one.

i always wondered why satoshi designed bitcoin to drop off these gigantic cliffs every few years.

yes, when you look at the monero emission, it makes much more sense Cheesy
The block reward halving can cause volatility, speculation + the risk that mining pools just don't want to switch to lower block rewards. If all the mining pools decide not to do the halving, that can cause real trust problems Tongue
legendary
Activity: 3766
Merit: 5146
Whimsical Pants
After testing the "--restore-deterministic-wallet" on windows, i lost some transactions.
I re-download the blockchain but only the last transactions appear.

What is the problem?


Your transactions are still there.  This bug has been fixed in a subsequent version of the wallet on github, and I assume it will be rolled into the main distribution eventually.
donator
Activity: 1274
Merit: 1060
GetMonero.org / MyMonero.com
After testing the "--restore-deterministic-wallet" on windows, i lost some transactions.
I re-download the blockchain but only the last transactions appear.

What is the problem?


(save Bitmonero Appdata on usb storage)

Move wallet.bin somewhere else other than the appdata folder and rerun your wallet.

Then press refresh and wait

That won't solve it - the address of the wallet creation is serialised and stored in the .keys file, and it only rescans from there. We've fixed this quite a while back, though:)
donator
Activity: 1274
Merit: 1060
GetMonero.org / MyMonero.com
After testing the "--restore-deterministic-wallet" on windows, i lost some transactions.
I re-download the blockchain but only the last transactions appear.

What is the problem?


There was a bug in an older version of simplewallet where a restore will only scan blocks from 24 hours before the restore. Are you using the most recent version? We may have to put new binaries out for Windows if the patch isn't in the latest.
hero member
Activity: 565
Merit: 500
After testing the "--restore-deterministic-wallet" on windows, i lost some transactions.
I re-download the blockchain but only the last transactions appear.

What is the problem?


(save Bitmonero Appdata on usb storage)

Move wallet.bin somewhere else other than the appdata folder and rerun your wallet.

Then press refresh and wait

 
newbie
Activity: 28
Merit: 0

1. July 18th: at the price of 0.004976 at 10:33:41.045 by an anonymous person
2. July 19th: at the price of 0.004888 at 21:44:09.451 by equipoise here on Bitcointalk.org. He also shared a link to his personal page.

congrats guys! I'm considering marrying you, rich people!  Grin
dga
hero member
Activity: 737
Merit: 511
The L3 cache by itself is almost half of the chip.

I looked at an image of the Haswell die and it appears to be less than 20%. The APU (GPU) is taking up more space on the consumer models. On the server models there is no GPU and the cache is probably a higher percentage of the die.

There is also a 64-bit multiply, which I'm told is non-trivial. Once you combine that with your observation about Intel having a (likely persistent) process advantage (and also the inherent average unit-cost advantage of a widely-used general-purpose device), there just isn't much, if anything, left for an ASIC-maker to work with.

So no I don't think the point is really valid. You won't be able to get thousands of times anything with a straightforward ASIC design here. There may be back doors though, we don't know. The point about lack of a clear writeup and peer review is valid.

Quote
The CPU has an inherent disadvantage in that it is designed to be a general purpose computing device so it can't be as specialized at any one computation as an ASIC can be.

This is obviously going to be true, but the scope of the task here is very different. Thousands of copies will not work.

I believe that is wrong. I suspect an ASIC can be designed that vastly outperforms (at least on a power-efficiency basis), and one of the reasons is that the algorithm is so complex that it probably has many ways to be optimized with specific circuitry instead of generalized circuitry. My point is that isolating a simpler ("enveloped") instruction such as aesenc would be a superior strategy (and embracing USB-pluggable ASICs to get them spread out to the consumer).

Also I had noted (find my post in my thread a couple of months ago) that the way the AES is incorrectly employed as a random oracle (as the index to lookup in the memory table), the algorithm is very likely subject to some reduced solution space. This is perhaps Claymore's advantage (I could probably figure it out if I was inclined to spend sufficient time on it).

There is no cryptographic analysis of the hash. It might have impossible images, collisions, etc..

I strongly disagree.

The algorithm is *not* complex, it's very simple.  Grab a random-indexed 128 bit value from the big lookup table.  Mix it using a single round of AES.  Store part of the result back.  Use that to index the next item.  Mix that with a 64 bit multiply.  Store back.  Repeat.  It's intellectually very close to scrypt, with a few tweaks to take advantage of things that are fast on modern CPUs.

Claymore has no fundamental advantage beyond lots of memory bandwidth and compute.  His results are actually slightly slower than what is achievable on a GPU with no algorithmic magic -- compare Claymore's speeds to tsiv's for nvidia and extrapolate another 10%-20% due to slightly better code.

Remember that there are two ways to implement the CryptoNight algorithm:
  (1) Try to fit a few copies in cache and pound the hell out of them;
  (2) Fit a lot of copies in DRAM and use a lot of bandwidth.

Approach (1) is what's being done on CPUs.  Approach (2) is what's being done on GPUs.  I tried implementing #2 on CPU and couldn't get it to perform as well as my back-of-the-envelope analysis suggests it should, but it's possible it could outperform the current CPU implementations by about 20%.  (I believe yvg1900 tried something similar and came to the same conclusion I did).  An ASIC approach might well be better off with #2, however, but it simply moves the bottleneck to the memory controller, and it's a hard engineering job compared to building an AES unit, a 64 bit multiplier, and 2MB of DRAM.  But that 2MB of DRAM area limits you in a big way.

In my best professional opinion, barring funky weaknesses lingering within the single round of AES, CryptoNight is a very solid PoW.  Its only real disadvantage is comparatively slow verification time, which really hurts the time to download and verify the blockchain.
dga
hero member
Activity: 737
Merit: 511
There is much more that has to be investigated and I can find nearly nothing on Monero's proof-of-work hash. No benchmarking. No detailed whitepaper. Nothing but the source code.

We have the same issue with the CryptoNight PoW algorithm. To quote the initial CryptoNote whitepaper review we've released:

Quote
It's absolutely unconscionable to come up with a new "Proof of Work Algorithm" and then refrain from including any sort of pseudocode to describe that algorithm. Upon which. Your entire. Coin. Is. Based. Ugh.

I've posted an informal summary of my analysis of the CryptoNight algorithm earlier in this thread with respect to GPU balance and its likely eventual balance with ASICs.

It's good.

I concur with you about the lameness of the ByteCoin release, of course, but it's obvious why they did it -- they took a simple and elegant proof of work function that was clearly designed by someone with a clue, and they wrapped it in a complex load of poop to slow it down artificially.  Most likely, they then used that to fake two years of the blockchain, but it's impossible to prove that, so that's purely my speculation.
legendary
Activity: 1722
Merit: 1217
the emission halves every 512 days?

This is a continuous process, not a sudden halving (like Bitcoin). The decrease is continuous, such that after 512 days the reward is half of the original one.

i always wondered why satoshi designed bitcoin to drop off these gigantic cliffs every few years.
jr. member
Activity: 54
Merit: 257
New exchange: lazycoins.com

Disclaimer: please take note that referencing it does not imply we endorse it. Trade at your own risk.

Updated by David Latapie
r05
full member
Activity: 193
Merit: 100
test cryptocoin please ignore
Quite interested in having a purely OpenCL miner. I know Claymore is OpenCL, but it only works with AMD GPUs. The Intel HD series of GPUs are very poor, I know, but they too utilize OpenCL - for those of us hoping to squeeze every last bit from our systems, using the onboard GPU would be very advantageous.
So you mean that I can use both AMD GPU and Intel HD graphics for mining on one computer?
And your CPU, yes. They are separate devices with separate resource pools, despite the Intel HD GPU being on the same unit as the CPU. I have read up on it and other coin algos have miners for the Intel HD series and the use of it doesn't impact the CPU mining speed.

Miner for Intel HD? Rough estimate on h/s on 4000/5000?


on scrypt, gpu mined slower than cpu.
but probably using less power.
Can confirm pretty much the same metrics. It's slower than CPU but can be used at the same time.

Seems a bit silly to be having idle processing power  Wink
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Quite interested in having a purely OpenCL miner. I know Claymore is OpenCL, but it only works with AMD GPUs. The Intel HD series of GPUs are very poor, I know, but they too utilize OpenCL - for those of us hoping to squeeze every last bit from our systems, using the onboard GPU would be very advantageous.
So you mean that I can use both AMD GPU and Intel HD graphics for mining on one computer?
And your CPU, yes. They are separate devices with separate resource pools, despite the Intel HD GPU being on the same unit as the CPU. I have read up on it and other coin algos have miners for the Intel HD series and the use of it doesn't impact the CPU mining speed.

Miner for Intel HD? Rough estimate on h/s on 4000/5000?


on scrypt, gpu mined slower than cpu.
but probably using less power.
sr. member
Activity: 525
Merit: 250
Hello once again, people!

As we’ve announced yesterday, the 100 XMR rewards for taking part in our most recent contest were sent out to their respective winners.

Here are the winning deals:
1. July 18th: at the price of 0.004976 at 10:33:41.045 by an anonymous person
2. July 19th: at the price of 0.004888 at 21:44:09.451 by equipoise here on Bitcointalk.org. He also shared a link to his personal page.
Thanks once again for your partaking and for your intense interest in the coin’s performance on our exchange, and congrats!
Please let us know if this was exciting and if you want to see us continue running such contests, and we shall keep doing that.
Read the full text of this announcement

r05
full member
Activity: 193
Merit: 100
test cryptocoin please ignore
Quite interested in having a purely OpenCL miner. I know Claymore is OpenCL, but it only works with AMD GPUs. The Intel HD series of GPUs are very poor, I know, but they too utilize OpenCL - for those of us hoping to squeeze every last bit from our systems, using the onboard GPU would be very advantageous.
So you mean that I can use both AMD GPU and Intel HD graphics for mining on one computer?
And your CPU, yes. They are separate devices with separate resource pools, despite the Intel HD GPU being on the same unit as the CPU. I have read up on it and other coin algos have miners for the Intel HD series and the use of it doesn't impact the CPU mining speed.