
Topic: Looking for PoW recommendations. - page 2. (Read 2573 times)

legendary
Activity: 924
Merit: 1132
March 30, 2015, 10:13:06 PM
#24
I figured out the solution to another problem; unfortunately this solution will completely screw with fine distinctions among hashing algorithms.  Pretty much the only meaningful parameter remaining to tweak is the amount of memory it'll take, because no matter what happens, mining will be CPU-bound by the rate at which miners can calculate signatures.

And yes, that can be parallelized, and GPU'd, and ASIC'd.  I can screw with momentum hash badly enough to make it remain memory size bound, and maybe even somewhat memory bandwidth bound.  But it's gonna be ugly.

You guys want a laugh?  Here's the current block header structure. See if you can spot the ringer.  Hint:  I regard mining pools as an existential threat to the stability of any cryptocurrency.


offset   bytes    block
0        4        0   block version
4        4        0   block height
12       8        0   UNIX timestamp
20       4        0   current target (of this algm, compressed form)
24       4        0   current hash algm
28       4        0   zero, do not use (future update space)
32      32        0   merkle root of current tx tree

0       32        1   hash of prev block
32      32        1   merkle root of current unspent txout set

0       20        2   current nonce (yes damnit, that's huge)
20      32        2   sig of all the above (including nonce) with privkey of coinbase txOut
52      12        2   reserved for SHA2 endbit & length field (it actually uses 8 bytes)

hero member
Activity: 524
Merit: 500
March 30, 2015, 09:45:03 PM
#23
More fun with the Bitcoin block structure: the order of transactions in a block is almost unrestricted, so a couple of KB of hidden data could be embedded in the blockchain per 1 MB block (see the sketch below). If updating nonce2 becomes the bottleneck for mining and transaction fees become significant, a miner could also iterate through transaction orderings by swapping children of Merkle nodes; the optimal block would then contain a power-of-two number of transactions - 64, 128, 256, etc.
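
A quick back-of-the-envelope check of that "couple of KB" figure (a sketch with an assumed transaction count, not measured data): if the ordering is essentially free, a block with n transactions can encode roughly log2(n!) bits in the order alone.

Code:
# Rough estimate only: n_tx is an assumed ballpark count for a full 1 MB block.
import math

n_tx = 2000
bits = math.lgamma(n_tx + 1) / math.log(2)   # log2(n!) distinct orderings
print(f"~{bits / 8 / 1024:.1f} KB encodable in transaction order")  # ~2.3 KB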
hero member
Activity: 524
Merit: 500
March 30, 2015, 09:28:29 PM
#22
There are variations on the theme, some of which work by making it deliberately harder to construct an efficiently searchable compact index (forcing more table searching for each sub-nonce considered).
That pushes you to create an efficiently searchable and updatable index, which is good for a database but, I think, wrong for mining. The scarcest resource is going to be memory bandwidth; we can sacrifice memory and then CPU for it. The miner doesn't have to be deterministic - some nonces can be thrown away. I'd start with a set of buckets, each with a ring buffer inside, and advance through nonces in big chunks: newly calculated hashes are first uploaded (from cache to main memory) into buckets, and then, once an entire new chunk is distributed, each bucket is downloaded back to fast cache and searched for triplets with a hash table (roughly as in the sketch below). Memory access is going to be smooth; maybe even some HDD bandwidth could be used Smiley
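
A rough sketch of that bucket/ring-buffer loop, just to make the shape concrete - all sizes, the 8-byte little-endian nonce encoding, and the SHA-256 sub-hash are made-up illustration choices, not anything from the actual coin:

Code:
import hashlib

COLLISION_BITS = 20          # a triplet must collide on this many bits
BUCKET_BITS    = 8           # top bits of the truncated hash pick a bucket
RING_SIZE      = 4096        # per-bucket ring buffer capacity
CHUNK          = 1 << 16     # nonces hashed per pass

def sub_hash(header: bytes, nonce: int) -> int:
    d = hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
    return int.from_bytes(d[:4], "little") & ((1 << COLLISION_BITS) - 1)

def search_chunk(header: bytes, start: int):
    buckets = [[] for _ in range(1 << BUCKET_BITS)]
    # Phase 1: stream the new chunk of hashes out to the buckets
    # (writes from cache to main memory).
    for nonce in range(start, start + CHUNK):
        hv = sub_hash(header, nonce)
        ring = buckets[hv >> (COLLISION_BITS - BUCKET_BITS)]
        if len(ring) >= RING_SIZE:       # ring buffer: drop the oldest entry
            ring.pop(0)
        ring.append((hv, nonce))
    # Phase 2: pull each bucket back to fast cache and look for triplets
    # with a small hash table.
    for ring in buckets:
        seen = {}
        for hv, nonce in ring:
            seen.setdefault(hv, []).append(nonce)
            if len(seen[hv]) == 3:
                return seen[hv]          # three nonces whose sub-hashes collide
    return None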
legendary
Activity: 1400
Merit: 1050
March 30, 2015, 04:50:33 PM
#21
One of the ways to do momentum hashing is to search for a triplet birthday; that is, find three nonces such that when the block is hashed with each of them, the resulting hashes match (to a certain width). 

There are variations on the theme, some of which work by making it deliberately harder to construct an efficiently searchable compact index (forcing more table searching for each sub-nonce considered).  But the idea is that in order to form any kind of valid *whole* momentum nonce you have to find some number of sub-nonces that have some particular mathematical relationship, which you can only find by comparing their results, meaning you have to keep an indexed table of the intermediate results.  The birthday paradox says you can find pairs with some related property fairly quickly, but efficiently finding triplets requires either lotsa luck or a pretty full table. 



could be interesting... (haven't had time yet to read through the whole discussion)
legendary
Activity: 924
Merit: 1132
March 30, 2015, 03:12:29 PM
#20
One of the ways to do momentum hashing is to search for a triplet birthday; that is, find three nonces such that when the block is hashed with each of them, the resulting hashes match (to a certain width). 

There are variations on the theme, some of which work by making it deliberately harder to construct an efficiently searchable compact index (forcing more table searching for each sub-nonce considered).  But the idea is that in order to form any kind of valid *whole* momentum nonce you have to find some number of sub-nonces that have some particular mathematical relationship, which you can only find by comparing their results, meaning you have to keep an indexed table of the intermediate results.  The birthday paradox says you can find pairs with some related property fairly quickly, but efficiently finding triplets requires either lotsa luck or a pretty full table. 
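
For anyone skimming, a bare-bones illustration of that triplet search - the 24-bit collision width and the sub-hash are placeholders, not the parameters being discussed:

Code:
import hashlib

COLLISION_BITS = 24                              # width the triplet must match on

def sub_hash(header: bytes, nonce: int) -> int:
    d = hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
    return int.from_bytes(d[:8], "little") & ((1 << COLLISION_BITS) - 1)

def find_triplet(header: bytes):
    table = {}                                   # truncated hash -> sub-nonces seen
    nonce = 0
    while True:
        hits = table.setdefault(sub_hash(header, nonce), [])
        hits.append(nonce)
        if len(hits) == 3:                       # pairs show up early (birthday),
            return hits                          # triplets need a fairly full table
        nonce += 1

print(find_triplet(b"example header"))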

hero member
Activity: 524
Merit: 500
March 30, 2015, 07:23:53 AM
#19
I was a bit confused by the term 'collision' in the description of modified Momentum - is it for all 3 out of 3, or any 2 out of 3? Anyway, both lead to nice puzzles Smiley
A GPU could calculate all hashes, discard all images except those in a small range that fits in GPU memory, and do the collision search there; so the hash should be slow enough, or the search space big enough, to prevent that. Separate PCs or GPUs could calculate ranges around adjacent base nonces (separated by two offset widths), perform the collision search, and then exchange data layer by layer instead of dropping a layer and calculating the next; so the hash should be fast enough to saturate PCI-E and LAN bandwidth.
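
To illustrate the first of those attacks, a toy version of "discard all images but a small range": each pass keeps only hashes whose truncated value falls in the slice that fits in fast memory (slice count and sub-hash are invented for the sketch):

Code:
import hashlib

COLLISION_BITS = 24
NUM_SLICES     = 16                    # how many passes/devices split the image space

def sub_hash(header: bytes, nonce: int) -> int:
    d = hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
    return int.from_bytes(d[:8], "little") & ((1 << COLLISION_BITS) - 1)

def collisions_in_slice(header: bytes, slice_id: int, nonces):
    lo = slice_id * (1 << COLLISION_BITS) // NUM_SLICES
    hi = (slice_id + 1) * (1 << COLLISION_BITS) // NUM_SLICES
    table = {}
    for nonce in nonces:
        hv = sub_hash(header, nonce)
        if lo <= hv < hi:              # keep only images in our small range
            table.setdefault(hv, []).append(nonce)
    return {hv: ns for hv, ns in table.items() if len(ns) >= 2}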

New ARM cores (some or all, not sure) have hardware SHA1 - a chance for mobiles to have a stake in mining.

Burst coin employs a novel way to waste resources: HDD mining. Perhaps they failed to address the time-memory tradeoff.
legendary
Activity: 924
Merit: 1132
March 30, 2015, 01:18:02 AM
#18
Another potential point of failure is the hash function used in the Merkle tree.

Oh?  I haven't heard of anything dodgy or suspect about the Merkle tree implementation.  What's better about something else?  

Aw crap, you're absolutely right.  Bitcoin (and pretty much every alt) is using a non-standard variation of Merkle trees which is subject to malleability.   Under some circumstances sequences of transactions could be repeated in a block without changing the Merkle root.  If there are clients that respond differently (reprocess, ignore, or reject block) to the repeated sequence of tx, and somebody puts an attack block out there that contains such a repeated sequence, that could cause a chain fork.  
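
For the curious, the repeated-sequence trick in miniature - this is the well-known duplicate-last-hash quirk (CVE-2012-2459); the toy tree below mirrors the duplication rule but isn't Bitcoin's exact serialization:

Code:
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root(leaves):
    level = [h(tx) for tx in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])      # the non-standard duplication rule
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

honest  = [b"coinbase", b"tx-a", b"tx-b"]
mutated = [b"coinbase", b"tx-a", b"tx-b", b"tx-b"]    # trailing tx repeated
print(merkle_root(honest) == merkle_root(mutated))    # True: same Merkle root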
legendary
Activity: 924
Merit: 1132
March 30, 2015, 12:46:32 AM
#17
I can answer some of that.  

Double SHA256 was chosen over single SHA256 to thwart block boundary extension attacks.  

With single SHA256, the mid-state (basis, along with the last block, on which the final hash is calculated) would depend only on the data in the first block.  If this were the case, an adversary finding a collision for just the final block of the header could  substitute bogus data for the last block.  This would result in a fake header having the same hash (although the second block is different).  Finding a one-block collision, in this case, isn't as difficult as you'd hope, because the bitcoin header varies only 12 bytes of the final block.  

By using SHA256D, the final mid state (AND the final block which is hashed starting from that mid-state) depends on every bit of the header, including the bits in the last block.

This makes finding a collision on the whole header much, much harder.

That's why Satoshi picked SHA256D rather than plain SHA256.  
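
In code, the construction is just this (a minimal hashlib sketch; the 80-byte header here is a placeholder, not a real block):

Code:
import hashlib

def sha256d(data: bytes) -> bytes:
    # the second pass is what thwarts the block-boundary extension attack
    # described above
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

header = bytes(80)                          # placeholder 80-byte block header
print(hashlib.sha256(header).hexdigest())   # single pass: midstate set by bytes 0..63,
                                            # then only the 16-byte tail block
print(sha256d(header).hexdigest())          # double SHA-256, as Bitcoin hashes headers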

And from that fact alone I knew he must be a crypto pro.  Most beginners, or even talented, interested programmers who haven't done a lot of crypto already, would never have considered the possibility of an extension attack and modified the application of SHA256 to counter it.  The usual thing for everybody except old hands, is that people think of SHA256 as a bulletproof primitive.  But it isn't.  It's sequential, and its state when it comes to any block depends only on the earlier blocks.  

That said, I have absolutely no idea why Satoshi picked secp256k1 for the elliptic-curve stuff.  

And I have no idea what attack Sergio is talking about.  I was pretty sure that with the doubled construction to stop extension attacks, bitcoin's application of SHA256 was flawless.  

hero member
Activity: 524
Merit: 500
March 29, 2015, 03:56:46 PM
#16
Another potential point of failure is the hash function used in the Merkle tree.
Oh?  I haven't heard of anything dodgy or suspect about the Merkle tree implementation.  What's better about something else?  
At the time of writing I meant a possible full solution of SHA-256 in the future, but your response reminded me about old concerns with the block header and the inner structure of SHA-256 - and guess who started the discussion here on bitcointalk? Well, SHA-256 splits the input message into 512-bit (64-byte) blocks and processes them in iterations (64 rounds each); the last block is padded with a 1 bit, zeroes, and the message length. When Bitcoin computes the Merkle root, it mostly (except for the transaction data itself) applies double SHA-256 to 64-byte messages. So the first iteration of the first hash processes a fixed state (the initialization vector) and a variable schedule (two 256-bit child hashes); the second iteration processes a variable 256-bit state (the result of the previous iteration) and a fully fixed schedule (the padding block); and the only iteration of the second hash gets the fixed initialization vector and a schedule with 256 variable and 256 fixed bits. Clearly there is some interference between the SHA-256 design and the way it is used in Bitcoin. I see no way to exploit it, and on the other hand double SHA-256 as used in block header hashing seems safe to me too, but Sergio_Demian_Lerner talks about possible attacks! (OK, there is a way to boost mining by throwing away ~40000 binary operations out of ~120000; unfortunately, the days when that could be used with dynamically reconfigured FPGAs are gone.) The proposed new header structure hints that Sergio considers the first 512-bit block of the header too fixed, but I fail to grasp the nature of his concerns.
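
If it helps, the block accounting for one Merkle node works out like this (a small sketch of the padding rule; nothing coin-specific):

Code:
def sha256_blocks(msg_len: int) -> int:
    """Number of 512-bit blocks SHA-256 processes for a msg_len-byte message."""
    padded = msg_len + 1            # mandatory 0x80 byte
    padded += (56 - padded) % 64    # zero padding up to the length field
    padded += 8                     # 64-bit big-endian message bit length
    return padded // 64

# First hash of a 64-byte node (two 256-bit children): data block + padding block.
# Second hash of the 32-byte digest: one block with 256 variable + 256 fixed bits.
print(sha256_blocks(64), sha256_blocks(32))   # 2 1  -> 3 compressions per node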
Speculating about a total solution of SHA-256 (quite unlikely while the NP barrier stands; even Keccak seems more vulnerable despite its bigger inner state, because of its lower height), it could potentially be possible to isolate new nodes by network manipulation, give them a rewritten history with the same hashes, and cause a fork or perform a double-spend. This rather fantastic scenario could be performed only once, given that Bitcoin developers watch the blockchain.

EDIT: Still trying to understand what attack this header change protects from. Sergio dragged as much noise as possible from the transaction data into the schedule part. So maybe the attack works by fixing the schedule (nonce and time in the header) and searching in the state. Or there are some bad schedules that make the state hashing trivial. But there is the Merkle tree; even if degenerate, it's at least one double SHA-256. But there is the obligatory block height in the coinbase, so we cannot precompute more than for the current block. Maybe there is some way to lower the complexity by neutralizing one part of the schedule with a properly calculated rest? Then nonce + free time bits would lower the hardness of header hashing, and in the coinbase we have plenty of free variables. Or the ability to do length extension. But Satoshi chose DOUBLE hashing, so enough noise goes through full hashing - how to jump over it? No, the SHA-2 designers chose a fixed IV (even for short messages) and were aware of differentials and linear approximations. A good low-degree approximation? Unbelievable, and that would be a sensation. Mystery. Oh, sorry for the braindump Smiley
legendary
Activity: 924
Merit: 1132
March 29, 2015, 01:38:16 PM
#15
the "nonce" presented for the final hashing against the difficulty function is presented as a full-width collision nonce plus two n-bit offsets, where your table size (and collision requirement) is 2n.  If you don't update the table for every hash, evicting old values and inserting new ones, the things stored in the table will get too old to be within the range of offsets.
Probably I didn't understand your trick - how will the PoW be verified on small nodes?

Okay...  What's visible in the block is the nonce presented in the block header, which, when the block header is hashed, yields a hash that meets the difficulty target.  You take *that* nonce and deconstruct it into a (64-bit) base nonce and two (smaller, size depends on table size) offsets.  The 64-bit base nonce is one of the three momentum nonces, the base nonce minus the first of the two offsets is the second momentum nonce, and the base nonce plus the second of the two offsets is the third momentum nonce (and yes, I have made the nonce field in the header much wider to accommodate real memory-hard algorithms).

Next you take the header and hash it three times - once with each of the three momentum nonces, instead of the original presented nonce.  If the resulting hashes have the required degree of collision, then the presented nonce is valid.  Meaning, it is one of the very few possible presentable nonces among billions which satisfies the triplet property required by momentum hashing.

Nothing that's hard for a small node to check; it has to do a hash, check the difficulty, separate the fields of the presented nonce, do one addition and one subtraction, then do three more hashes and compare the results for collision.
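
Something like this, in sketch form - the field widths, the 12-byte nonce serialization, and the sub-hash are invented here purely to show the shape of the check, not the real header layout:

Code:
import hashlib

OFFSET_BITS    = 16                  # width of each encoded offset (assumed)
COLLISION_BITS = 32                  # required collision width (assumed)

def sub_hash(header: bytes, nonce: int) -> int:
    d = hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
    return int.from_bytes(d[:8], "little") & ((1 << COLLISION_BITS) - 1)

def verify(header: bytes, presented: int, target: int) -> bool:
    # 1. hash with the presented nonce and check the difficulty target
    block_hash = hashlib.sha256(header + presented.to_bytes(12, "little")).digest()
    if int.from_bytes(block_hash, "little") > target:
        return False
    # 2. separate the fields of the presented nonce
    off2 = presented & ((1 << OFFSET_BITS) - 1)
    off1 = (presented >> OFFSET_BITS) & ((1 << OFFSET_BITS) - 1)
    base = presented >> (2 * OFFSET_BITS)
    if base < off1:
        return False                 # malformed: second nonce would be negative
    # 3. three more hashes, compared for the required collision
    n1, n2, n3 = base, base - off1, base + off2
    return sub_hash(header, n1) == sub_hash(header, n2) == sub_hash(header, n3)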

The trick is that because the two offsets are limited in width, the three momentum nonces must all be found in a narrow subrange of the 64-bit search space.  That means that, once you have populated your table with two collisions per hash target, you can't just start hashing at top speed, presenting whatever's in the table along with the new value as a triplet; if you do, your hashing will rapidly overshoot the subrange, and then you won't be able to form triplets that can be encoded that way, because you'd have to use offsets that are too big to fit in the limited offset bit width.  

Instead, you have to keep updating your table.  Every time you find a new nonce, you have to evict the oldest nonce found in its table location (the one nearest to, or past, the trailing edge of the subrange you're currently searching) and replace it with the new nonce (which is on the absolute leading edge of the subrange you're currently searching).  That way, when you find the next nonce whose hash matches that target, you don't have the too-old entry in your table (now past the trailing edge of the subrange) preventing you from forming a valid presentable hash.  And this means that you have to do a table read and a table write with every new nonce, instead of just a table read.  Writes are much slower, and mean that hashing can't be parallelized much because of table contention.
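
And the miner's side of that, sketched with placeholder widths (the point is just the read-evict-write on every hash):

Code:
import hashlib

OFFSET_BITS    = 16                  # how far apart encodable nonces may be (assumed)
COLLISION_BITS = 20                  # table index width (assumed)
SLOTS_PER_HASH = 2                   # keep two candidates per hash target

def sub_hash(header: bytes, nonce: int) -> int:
    d = hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
    return int.from_bytes(d[:8], "little") & ((1 << COLLISION_BITS) - 1)

def mine(header: bytes, start: int, limit: int):
    table = [[] for _ in range(1 << COLLISION_BITS)]
    for nonce in range(start, start + limit):
        slot = table[sub_hash(header, nonce)]          # the table read
        # anything past the trailing edge of the window can't be encoded anymore
        live = [n for n in slot if nonce - n < (1 << OFFSET_BITS)]
        if len(live) >= 2:                             # two old + the new = triplet
            base = live[1]                             # middle nonce is the base
            yield base, base - live[0], nonce - base   # base nonce + two offsets
        live.append(nonce)
        slot[:] = live[-SLOTS_PER_HASH:]               # the table write: evict oldest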

Another potential point of failure is the hash function used in the Merkle tree.

Oh?  I haven't heard of anything dodgy or suspect about the Merkle tree implementation.  What's better about something else?  

legendary
Activity: 924
Merit: 1132
March 29, 2015, 01:03:17 PM
#14
It is not necessary that something should actually be more efficient for home CPU's; in fact, if something is truly efficient for ordinary run-of-the-mill home CPU's, that's like a pledge to feed all the poor starving botnet operators. 

And botnet operators are the scum of the earth and I would rather kick them in the face than feed them, so I really and truly don't want anything that wouldn't enable a better class of machine to absolutely run circles around truly ordinary home machines.

What meets my criterion is that different classes of machines, when the various hashing algorithms are arranged in order of expected revenue per minute of hashing, should all end up with a different sequence - and preferably with a different *first* algorithm in the sequence.
hero member
Activity: 672
Merit: 500
March 29, 2015, 08:01:09 AM
#13
No such thing as "efficient for home CPUs": CPUs are not intensive compute devices. As long as the algo needs compute, it will be blasted by a proper compute-oriented architecture.

The only thing that makes sense for CPU is yescrypt, I'm surprised nobody pulled that out. Yescrypt is not compute oriented and therefore cannot be accelerated by using fast compute-oriented architectures. It is surprisingly scalable to cheap CPUs (that does not imply mining with them is profitable).
hero member
Activity: 524
Merit: 500
March 29, 2015, 06:00:55 AM
#12
the "nonce" presented for the final hashing against the difficulty function is presented as a full-width collision nonce plus two n-bit offsets, where your table size (and collision requirement) is 2n.  If you don't update the table for every hash, evicting old values and inserting new ones, the things stored in the table will get too old to be within the range of offsets.
Probably I didn't understand your trick - how will the PoW be verified on small nodes?

There is no known attack, but it isn't as well studied and well vetted as some other curves, and at least one theoretical weakness (which could eventually turn into an attack) has been found.  It doesn't affect mining/block chain security, but if there actually is a viable attack it HORRIBLY affects the security of the coins in the block chain against theft.
Another potential point of failure is the hash function used in the Merkle tree.
legendary
Activity: 924
Merit: 1132
March 28, 2015, 09:29:53 PM
#11
Dr. Courtois has a point about secp256k1.  It's more serious than the choice of one of multiple hash functions, which will just get kicked to the curb by difficulty adjustments if it gets broken: secp256k1 is what actually secures people's coins in Bitcoin and every altcoin I've ever heard of.  There is no known attack, but it isn't as well studied and well vetted as some other curves, and at least one theoretical weakness (which could eventually turn into an attack) has been found.  It doesn't affect mining/block chain security, but if there actually is a viable attack it HORRIBLY affects the security of the coins in the block chain against theft.

I haven't done a damn thing about it yet in the code I'm working on, but that's a pointed reminder that I really need to before I release.  

I'm nervous about it though; as a rule of thumb the more things you change the more chances you're taking on breaking it.  And I'm changing a lot of stuff.

legendary
Activity: 924
Merit: 1132
March 28, 2015, 09:15:45 PM
#10

Choosing a good base hash function is a tough task. If it is too slow, hash calculations and collision search could be separated: GPUs would serve as hashing co-processors for workstations with loads of memory, or Ciscos would do a bucket sort, distributing images between small PCs in a cluster. If it is too small, the algorithm could be vulnerable to SAT solvers or algebraic attacks.


I've discovered a couple ways to "tweak" momentum hash.

I can force contention for the table (1 read and 1 write at an unpredicted location, per hash performed) by using a representation trick, so GPU hashing parallelism gets crushed by a bandwidth-to-main-memory requirement.   So I can prevent pipelined hashing from being done without a need to read and write the table.

The representation trick is rather simple, really:  the "nonce" presented for the final hashing against the difficulty function is presented as a full-width collision nonce plus two n-bit offsets, where your table size (and collision requirement) is 2^n.  If you don't update the table for every hash, evicting old values and inserting new ones, the things stored in the table will get too old to be within the range of offsets.  This also reduces the effectiveness of the Bloom filter implementation by at least 50%; Bloom filter users will have to throw away their filters/tables every so often, losing their momentum, and build new ones.
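
One way the packing could look (the widths are illustrative - the post only fixes the shape: a full-width base nonce plus two n-bit offsets with a 2^n-entry table):

Code:
OFFSET_BITS = 16                      # n, so the table has 2^n entries

def encode(base: int, n_low: int, n_high: int) -> int:
    """Pack the base nonce and its two momentum partners into one presented nonce."""
    off1, off2 = base - n_low, n_high - base
    assert 0 <= off1 < (1 << OFFSET_BITS) and 0 <= off2 < (1 << OFFSET_BITS), \
        "partner nonce is outside the encodable window (table entry too old)"
    return (base << 2 * OFFSET_BITS) | (off1 << OFFSET_BITS) | off2

def decode(presented: int):
    off2 = presented & ((1 << OFFSET_BITS) - 1)
    off1 = (presented >> OFFSET_BITS) & ((1 << OFFSET_BITS) - 1)
    base = presented >> (2 * OFFSET_BITS)
    return base, base - off1, base + off2

base = 0x123456789ABC
assert decode(encode(base, base - 100, base + 2000)) == (base, base - 100, base + 2000)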

hero member
Activity: 524
Merit: 500
March 28, 2015, 07:12:48 PM
#9
It's weird to think that if some hash function gets completely broken, the difficulty adjustments will immediately kick it to the curb and after a few quick blocks are awarded to it, life will continue as though it hadn't been broken at all.
Dr. Nicolas Courtois talking about "rolling curves"
hero member
Activity: 524
Merit: 500
March 28, 2015, 07:04:31 PM
#8
So there'll be a long period of GPU and CPU co-dominance before any ASICs exist (if mining the thing ever becomes valuable enough to justify ASIC development at all).  Still, I want to have at least one hashing algo that will be obvious "low hanging fruit" for the ASIC developers so that if and when they get there they can leave the other algos alone.
Bitcoin almost turned FPGAs into consumer-grade devices and I was too lazy to grab one. Maybe there will be another chance with your coin Smiley

For the memory-hard hashes I have been considering momentum, cuckoo cycle and scrypt, all with some tougher parameters than usually found.  

I particularly like momentum hash, because somebody who can build a complete table should asymptotically approach one momentum hash per colliding-hash performed.  This makes it very efficient for a particular amount of memory, with no advantage for greater amounts, so it seems like a good way to tune for machines with different levels of memory.  I've seen how it broke for Protoshares, though, and that's a pretty general break.  It means GPU memory counts for about 10x main memory in terms of hashing speed, so if the complete momentum table is less than 20Gbytes, it'll be faster on GPU.  That's a pretty tough spec for a high-end home machine, but if GPUs have no advantage there, the GPU miners will be working on a different algorithm I hope.  If I pop it up to 40Gbytes and 60Gbytes, those can be for the server-class-machines (for now - home machines in a few more years).  

At 20Gbytes it's good for a custom-built high-end home machine - GPUs might be as fast there, but GPUs have easier things to do like concentrate on the SHA256D stuff.  
Choosing a good base hash function is a tough task. If it is too slow, hash calculations and collision search could be separated: GPUs would serve as hashing co-processors for workstations with loads of memory, or Ciscos would do a bucket sort, distributing images between small PCs in a cluster. If it is too small, the algorithm could be vulnerable to SAT solvers or algebraic attacks.

I don't care as much for cuckoo cycle, because it doesn't approach anything near the asymptotic hashing efficiency for large-memory machines that momentum does.  Using a classic example from Graph theory was a great idea though; it inspires me to look into solvable instances of other classic problems like factoring, etc. 
The nearest vector problem looks like a good candidate, with a decent cryptographic background.
hero member
Activity: 524
Merit: 500
March 28, 2015, 06:36:59 PM
#7
I can already tell you that the Whirlpool algo performs pretty well on older (=< compute 3.5, =< 780TI) Nvidia cards, while the newer Nvidia cards and most AMDs are in the same region. Quark does great on recent Nvidias, but not by that much difference. It all depends on optimization levels, I suppose.
I'm glad you're thinking about us cudaminers too Cheesy Many devs release new algos with just an OpenCL miner. It works on Nvidias, but mostly at 1/3 of what it could do when programmed in CUDA.
Some people also think only AMDs are worth it for mining, while my records show otherwise.
Actually, I did WhirlpoolX for AMD, which is just Whirlpool, and slaughtered high end Maxwell.  Grin
The NVidia camp is defenseless - sp_ is on vacation, djm34 was busy with Ziftr, the Christians are rather silent. Anyway, I think we are getting close to the GCN hardware limits. The straightforward 8-bit table approach appears to be limited by something like 5 instructions per s-box; wide-block multiplication could save ~25%. My current kernel is slightly above 540 MH/s on a 280x@1150, which equals ~9 ops/s-box. I don't yet have a good representation of the Whirlpool s-box; the best known AES s-box takes 114 AND/XOR/NXOR gates, which with 32-bit registers is ~3.6 ops/s-box, and the linear part of the circuit is probably going to be of the same size, so expect ~7 ops/s-box for bitslicing.

And what about systems like Curecoin? They take 45% of a mined block and reward that to folders, off-chain. They also take 10% and use that for the back-end of that system. That latter is an argument why it might be less popular, but it's an interesting coin and distribution system nonetheless.
The idea of a scientific coin is great, but a good implementation is yet to come. Curecoin performs some calculation by an undisclosed algorithm; we can only hope that it is for good. On the bright side, anonymous microbiology research could become profitable with North Korean or Iranian grants Grin Grin Grin
hero member
Activity: 644
Merit: 500
March 28, 2015, 05:54:47 PM
#6
Thanks for the pointers to Ramhog and Whirlpool.  It's fairly easy to target several memory sizes for Memory-intensive and CPU algorithms, and the ASIC-able algorithms are also easy to find.  But I'm really sort of at a loss to figure out what hashes are better for which families of graphics cards. 

Well, you could make a new algo altogether per family and make sure it exploits the strengths of those families. Starting from scratch with that in mind will also make it A LOT easier to fully optimize the miner.
It will also generate attention; miners will like such algos and they will probably be used in future coins too, but then you can claim you were the first.
legendary
Activity: 924
Merit: 1132
March 28, 2015, 05:19:34 PM
#5
Thanks for the pointers to Ramhog and Whirlpool.  It's fairly easy to target several memory sizes for Memory-intensive and CPU algorithms, and the ASIC-able algorithms are also easy to find.  But I'm really sort of at a loss to figure out what hashes are better for which families of graphics cards. 

Anyway, GPUs are going to own a lot of the hashing power on this thing for quite a while; aside from the seriously memory-intensive things I'm putting together, GPUs are going to be getting both their own share and the ASIC-able share for the foreseeable future.   And because an equal number of coins is awarded per algorithm, developing an ASIC will only net people 1/(number of algorithms) of the coins anyway; I'm hoping that the prospect of getting 1/10 (or less) of the coins for the effort of developing ASICs will make them a poor business proposition for a long-ish time. But hopefully not forever.   Grin

It's weird to think that if some hash function gets completely broken, the difficulty adjustments will immediately kick it to the curb and after a few quick blocks are awarded to it, life will continue as though it hadn't been broken at all.
