Scrypt's inner loop uses the very efficient Salsa20 core (ChaCha is nearly identical) from Bernstein, not SHA256, and it has a very low count of ALU operations. Afaik, Litecoin (which uses Scrypt) has ASICs, which I assume have displaced CPU and GPU mining. I haven't been following Litecoin since the ASICs were rumored to be released.
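For concreteness, here is the Salsa20/8 core as scrypt uses it inside BlockMix (this is the standard construction, nothing Litecoin-specific). The entire round is 32-bit add, xor, and rotate, with no S-boxes or multipliers, which is exactly why it maps to so little silicon:

```c
#include <stdint.h>

#define R(a, b) (((a) << (b)) | ((a) >> (32 - (b))))

/* Salsa20/8 core: 4 double rounds over a 64-byte block held as sixteen
   32-bit words.  Every operation below is a plain ALU add/xor/rotate. */
static void salsa20_8(uint32_t B[16])
{
    uint32_t x[16];

    for (int i = 0; i < 16; i++)
        x[i] = B[i];

    for (int i = 0; i < 8; i += 2) {
        /* column round */
        x[ 4] ^= R(x[ 0] + x[12],  7);  x[ 8] ^= R(x[ 4] + x[ 0],  9);
        x[12] ^= R(x[ 8] + x[ 4], 13);  x[ 0] ^= R(x[12] + x[ 8], 18);
        x[ 9] ^= R(x[ 5] + x[ 1],  7);  x[13] ^= R(x[ 9] + x[ 5],  9);
        x[ 1] ^= R(x[13] + x[ 9], 13);  x[ 5] ^= R(x[ 1] + x[13], 18);
        x[14] ^= R(x[10] + x[ 6],  7);  x[ 2] ^= R(x[14] + x[10],  9);
        x[ 6] ^= R(x[ 2] + x[14], 13);  x[10] ^= R(x[ 6] + x[ 2], 18);
        x[ 3] ^= R(x[15] + x[11],  7);  x[ 7] ^= R(x[ 3] + x[15],  9);
        x[11] ^= R(x[ 7] + x[ 3], 13);  x[15] ^= R(x[11] + x[ 7], 18);
        /* row round */
        x[ 1] ^= R(x[ 0] + x[ 3],  7);  x[ 2] ^= R(x[ 1] + x[ 0],  9);
        x[ 3] ^= R(x[ 2] + x[ 1], 13);  x[ 0] ^= R(x[ 3] + x[ 2], 18);
        x[ 6] ^= R(x[ 5] + x[ 4],  7);  x[ 7] ^= R(x[ 6] + x[ 5],  9);
        x[ 4] ^= R(x[ 7] + x[ 6], 13);  x[ 5] ^= R(x[ 4] + x[ 7], 18);
        x[11] ^= R(x[10] + x[ 9],  7);  x[ 8] ^= R(x[11] + x[10],  9);
        x[ 9] ^= R(x[ 8] + x[11], 13);  x[10] ^= R(x[ 9] + x[ 8], 18);
        x[12] ^= R(x[15] + x[14],  7);  x[13] ^= R(x[12] + x[15],  9);
        x[14] ^= R(x[13] + x[12], 13);  x[15] ^= R(x[14] + x[13], 18);
    }

    for (int i = 0; i < 16; i++)
        B[i] += x[i];
}
```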
Yes, I was referring to the gains from SHA256 ASICs. I'm less familiar with Scrypt ASICs in all respects (how much they cost, how much performance they gain, etc.).
But as you are aware, Scrypt is designed in a manner that makes the time-memory trade-off (TMTO) particularly trivial to implement. Even then it isn't necessarily the case that a Scrypt ASIC would pay off (as you said). It seems they do, but that is an empirical result that depends on the particulars of the algorithm. It is not the case that the lookup-table approach is equally trivial for read-write scratchpads. Still feasible, possibly, but it isn't obvious.
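To make "trivial TMTO" concrete, here is a sketch of the classic trade-off against scrypt's SMix: store only every K-th scratchpad entry and regenerate the rest by recomputing forward from the nearest checkpoint. The block_mix and integerify below are toy placeholders of my own (not scrypt's real primitives); only the structure matters:

```c
#include <stdint.h>
#include <string.h>

#define N     1024   /* scratchpad entries (tiny here; real scrypt uses more) */
#define K     8      /* keep every K-th entry: 1/K the memory */
#define WORDS 16     /* one 64-byte block, as with scrypt r = 1 */

/* Toy stand-ins for scrypt's primitives -- placeholders, NOT the real ones. */
static void block_mix(uint32_t x[WORDS])
{
    for (int w = 0; w < WORDS; w++)
        x[w] = x[w] * 2654435761u + (uint32_t)w + x[(w + 1) % WORDS];
}
static uint32_t integerify(const uint32_t x[WORDS]) { return x[0]; }

static uint32_t checkpoints[N / K][WORDS];

/* Regenerate V[j] by recomputing forward from the nearest checkpoint. */
static void lookup_v(uint32_t j, uint32_t out[WORDS])
{
    uint32_t x[WORDS];
    memcpy(x, checkpoints[j / K], sizeof x);
    for (uint32_t i = j - (j % K); i < j; i++)
        block_mix(x);                  /* (K-1)/2 extra steps on average */
    memcpy(out, x, sizeof x);
}

void smix_tmto(uint32_t X[WORDS])
{
    /* First loop: store only 1/K of the scratchpad. */
    for (uint32_t i = 0; i < N; i++) {
        if (i % K == 0)
            memcpy(checkpoints[i / K], X, sizeof checkpoints[0]);
        block_mix(X);
    }
    /* Second loop: each pseudo-random read is served by recomputation. */
    for (uint32_t i = 0; i < N; i++) {
        uint32_t v[WORDS], j = integerify(X) % N;
        lookup_v(j, v);
        for (int w = 0; w < WORDS; w++)
            X[w] ^= v[w];
        block_mix(X);
    }
}
```

Memory shrinks by a factor of K at the cost of roughly (K-1)/2 extra block_mix calls per lookup, and it works precisely because V[j] depends only on V[j-1]. With a read-write scratchpad, the entry you need may since have been overwritten along a data-dependent path, so there is no equivalent straight-line recomputation, which is the point above.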
Indeed: making the hash more compute-intensive, so that it is less trivial for an ASIC to convert it to compute bound, thereby makes it compute bound in the first place. Hehe. See where I ended up?
Even if you are using dedicated hardware inside the CPU, the benefit must to some extent be diluted by accessing it through generalized pipelines, three levels of cache (L1, L2, L3), and all the glue logic that makes those parts of the CPU serve its general-purpose use case. It is very unlikely that you can isolate things such that you activate only the same number of transistors that an ASIC would devote to the specialized silicon.
Clearly true, but it is less clear that there is anything yielding "orders of magnitude" here, nor, again, that the gain is enough to offset the costs of trying to reduce latency.
Apologies; perhaps the plural "orders" in my prior post is what you meant by an overly generalized statement, and in that case you are literally correct.
When I write "orders" in the plural, I include the singular as one of the applicable cases.
I am more inclined to believe both Cryptonight and my pure AES-NI design are within 1 to 2 orders of magnitude of any possible ASIC optimization, in terms of the relevant parameters of Hash/watt and Hash/$ of capital outlay.
I would only argue that a more complex approach such as the one Cryptonight uses has a greater risk of slipping a bit, maybe into the 1 to 2 orders range. Again, we are just making very rough guesses based on plausible theories.
We try to minimize the advantage the ASIC will have, but we cannot be at parity with the ASIC.
Of course, it is a given that you can take some things off the chip and yield an ASIC with better marginal economics. But equally, ASICs can (probably) never equal the economies of scale of general-purpose CPUs, either in production or in R&D. So the actual numbers matter. If you can convincingly show an orders-of-magnitude gain, then it is easy to dismiss the rest. Otherwise not.
I almost mentioned Intel's superior fabs, but I didn't because, for one thing, Intel has been failing to meet deadlines in recent years. Moore's law appears to be failing.
If we are talking about a $4 billion Bitcoin market cap, then yeah. But if we are talking about a $10 trillion crypto-coin world, then maybe not so much any more.
Agreed the devil is in the details.
I think what I am trying to argue is that I've aimed to stay below a 2 orders-of-magnitude advantage for the ASIC, and, by being very conservative about which CPU features are combined, hopefully below 1 order of magnitude. And that is enough for me in my consensus network design, because I don't need economic parity with ASICs in order to eliminate the profitability of ASICs in my holistic coin design.
And I am saying that the extra CPU features Cryptonight leverages seem to add more risk, not less (not only the risk of ASICs but the risk that someone else finds a way to optimize the source code). So I opted for simplicity and clarity instead. AES benchmarks against ASICs are published; Cryptonight-versus-ASIC benchmarks don't exist. I prefer the marketing argument that is easier to support.
In order to make it uneconomical for the ASIC to convert the latency-bound hash to compute bound, you end up making your "memory-hard" hash compute bound in the first place! That was a major epiphany.
I don't think you have shown it is economical, only that it is possible. In fact you have specifically agreed that the gain from ASICs vs. in-CPU dedicated circuits is likely relatively small, which is why you propose using the in-CPU circuits!
I think perhaps you are missing my point. In order to make it merely "possible" and not likely for the ASIC to convert the latency-bound CPU hash to compute bound, you likely end up making the CPU hash compute bound anyway, by adding more computation between latency accesses, as I believe Cryptonight has done. I haven't benchmarked it, though.
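A rough way to state that trade-off, with illustrative numbers of my own choosing:

```latex
t_{\mathrm{hash}} \approx n \cdot \max(t_{\mathrm{mem}},\ t_{\mathrm{alu}})
```

If each of the n scratchpad accesses stalls for, say, ~100 ns of DRAM latency, then deterring the TMTO means inserting enough arithmetic that t_alu between accesses approaches that same ~100 ns. At that point, shrinking t_mem (the ASIC's whole latency advantage) barely changes t_hash: the hash has become compute bound by construction.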
So that is one reason why I discarded all that latency-bound circuitry and just applied K.I.S.S., relying on the documented less-than-one-order-of-magnitude advantage for ASICs w.r.t. AES-NI. And then I also did one innovation better than that (secret).
Thus why waste time pretending the memory-hard complexity is helping you? If anything it is likely hurting you. Just go directly to using AES-NI for the hash, which is what I did in spades.
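For the record, here is what "using AES-NI for the hash" means mechanically. This is only a minimal sketch of the primitive in a Davies-Meyer-style chain, with a made-up name (aesni_mix); it is not my actual (secret) design and not a vetted hash:

```c
#include <stddef.h>
#include <emmintrin.h>   /* SSE2: _mm_xor_si128 */
#include <wmmintrin.h>   /* AES-NI: _mm_aesenc_si128; compile with -maes */

/* Each _mm_aesenc_si128 executes one full AES round (SubBytes,
   ShiftRows, MixColumns, AddRoundKey) in dedicated silicon, so an
   ASIC has comparatively little left to strip away. */
static __m128i aesni_mix(__m128i state, const __m128i *msg, size_t nblocks)
{
    for (size_t i = 0; i < nblocks; i++) {
        __m128i t = _mm_xor_si128(state, msg[i]);   /* absorb 16-byte block */
        t = _mm_aesenc_si128(t, state);             /* two AES rounds keyed */
        t = _mm_aesenc_si128(t, msg[i]);            /* by chaining values   */
        state = _mm_xor_si128(t, state);            /* feed-forward         */
    }
    return state;
}
```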
Because I think this more clearly loses to economies of scale: an ASIC can replicate the relatively small AES hardware many, many times while shedding the overhead of packaging, of instruction decoding inside a CPU (you can never eliminate all of that overhead, no matter how "tight" your algorithm is), and of sitting inside a computer (memory, fans, batteries, I/O, etc.) with a relatively tiny hash rate, compared to a chip with 1000 AES circuits inside a mining rig with 100 such chips.
In terms of Hashes/capital outlay, yes. But that is not necessarily true for Hashes/watt.
Even you admitted upthread that the 40 Gbytes/sec for an AES ASIC might not be an order-of-magnitude advantage in terms of Hashes/watt. I had seen some stats on relative Hashes/watt, but I've forgotten what they said. I will need to dig through my past notes for them.
But for our application, I wasn't concerned about the capital cost, because the user's capital cost is $0: they already own a CPU.
I was supremely focused on Hashes/watt, because I know that if I can get users to mine, and they don't notice a 1% increase in their electricity bill, then their perceived Hashes/watt is effectively infinite.
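Purely illustrative arithmetic (the 900 kWh/month household is an assumption, not data):

```latex
0.01 \times \frac{900\ \text{kWh/month}}{730\ \text{h/month}} \approx 12\ \text{W}
```

A 1% bump on such a bill is roughly 12 W of continuous draw, plausibly in the range of what one busy core's AES units add, so the cost sits below the user's perception threshold and the perceived denominator of Hashes/watt is effectively zero.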
Also, the rest of the user's CPU isn't tied up by the hashing, so they can mine and work simultaneously. And thus I make ASICs relatively unprofitable. They may still be able to make a "profit" if the coin value is rising relative to the hashrate invested (which would be so awesome if you have 100 million users mining), but the ASICs likely cannot dominate the network hashrate. Or at least they can't make it totally worthless for a user to mine, if there is perceived value in microtransactions and users don't want the hassle of an exchange just to use a service they want that only accepts microtransactions. But even that is not the incentive I count on to get users to mine.
There is a lot of holistic economics involved in designing your proof-of-work hash. It is very complex.
Memory gives physical size (and also its own cost, but mostly just size), which makes it impossible to replicate to whatever arbitrary degree is needed to amortize product-type and packaging overhead.
Agreed, if Hashes/capital outlay is your objective, but you may also worsen your Hashes/watt relative to the ASIC. Maybe not, but again there are no Cryptonight vs. ASIC benchmarks, so we can't know. There are AES benchmarks.
There may be economic arguments, apart from mining efficiency, for why people mine on computers vs. mining rigs. But if those exist, they must not rely exclusively on pure raw efficiency (because that will clearly lose, by the argument in the previous paragraph) but on efficiency being "close enough" for the other factors to come into play. So again, the numbers matter. And once you resort to "close enough", then yet again the numbers matter.
Yup. Well stated: as we get closer to parity on efficiency, all sorts of game-theoretic economic scenarios come into play. That is what I meant by saying it is complex.
Anyway, I won't be surprised to be proven "wrong" (not really wrong, because I'm not saying any of this won't work, just that it isn't clear that it will work), but I will be surprised if the market cap at which Cryptonight ASICs appear isn't a lot higher than it was for Scrypt ASICs (adjusting for differences in time period and such).
Well, no one can, for any circuit of reasonable complexity implemented on the complex circuitry of a CPU, which is somewhat of a black box. Bernstein even complained recently that Intel isn't documenting all its instruction timings.
The basic problem, as I see it, is that the only thing Cryptonote does to deal with ASICs is the proof-of-work hash. It doesn't look at other aspects of the system for ways to drive away ASICs. And in some sense it shouldn't, since Cryptonote, like all existing crypto, is subject to a 51% attack and thus needs as much hashrate as it can get (up to some level).
EDIT: there is actually a real-world example of in-CPU crypto alone not being enough to compete with ASICs, and that is Intel's SHA extensions. They are not competitive with Bitcoin mining ASICs.
Seems I had read something about that, where Intel hadn't done the circuit work necessary to be competitive. As you said, SHA is a complex algorithm. If there is some instruction which is fundamental and highly optimized, e.g. a 64x64 multiply, it might be as optimized as it can get in silicon. I doubt Intel has fully optimized anything to that degree, but apparently on the AES benchmarks they got close.
It seems to me that a slower SHA hash is not something that can slow your computer to a crawl, whereas with full-disk encryption Intel needs AES to be very fast.