[BBR] Boolberry Hash-on-blockchain discussion

funnyman21

member

Activity: 109

Merit: 10

Are there any other coins that have copied this hash function yet?

doldgigger

full member

Activity: 170

Merit: 100

Quote from: crypto_zoidberg on April 28, 2014, 09:54:25 PM

Let's open the hash-function discussion, friends.
I just want to describe our approach and show how it differs from CryptoNote, in the project we announced here: https://bitcointalksearch.org/topic/bbr-boolberry-privacy-and-security-guaranteed-since-2014-577267

First of all, I want to say that the CryptoNote hash function (cn_slow_hash) is actually well protected against ASICs: it uses a range of CPU instructions and is a memory-consuming algorithm. cn_slow_hash works hard on a 2 MB scratchpad, and most of that scratchpad fits in the CPU cache.

For now it is difficult to imagine specific hardware that would be both more efficient than a CPU and cheaper than a CPU. But the world changes fast, and nobody knows what will happen in the near future. We have all seen how rapid technological breakthroughs can be in the computer industry.

Since cn_slow_hash creates a 2 MB scratchpad, it has to cover all of this data; that is why it uses 2²⁰ iterations, and the side effect is pretty slow work (about 500 ms on a normal laptop, twice as fast on a normal PC with a suitable CPU cache). This may slow down synchronisation when downloading the blockchain (which is not a big problem), and theoretically it may allow an attack on the network: connect and send a random block to make a peer calculate slow_hash for a useless fake block.

So, putting it all together, we want to have:
1. Wide CPU instruction set
2. Memory-oriented algo
3. Small work time.

With that in mind, we tried to take a step to the side.

The idea of using blockchain data as the scratchpad resulted in this hash function:

Actually this is a Keccak hybrid which uses an external scratchpad. After each Keccak round, pseudo-randomly addressed data (the state vector is used as addresses) is taken from the scratchpad and XORed with the state.
Calculating the PoW for each block usually takes about 1100 randomly addressed reads of 32-byte blocks.
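The state-addressed mixing described above can be illustrated with a toy sketch. This is not the Boolberry code: stand_in_round is a hypothetical placeholder (built on SHAKE256) for a real Keccak round, and the function names, the choice of the first state lane as the address, and the one-read-per-round schedule are all assumptions made only for illustration.

```python
import hashlib

STATE_WORDS = 25  # a Keccak-f[1600] state has 25 lanes of 64 bits


def stand_in_round(state, round_index):
    """Hypothetical placeholder for one Keccak round (NOT the real permutation)."""
    raw = b"".join(w.to_bytes(8, "little") for w in state) + bytes([round_index])
    digest = hashlib.shake_256(raw).digest(STATE_WORDS * 8)
    return [int.from_bytes(digest[i:i + 8], "little") for i in range(0, len(digest), 8)]


def mix_from_scratchpad(state, scratchpad):
    """XOR one 32-byte scratchpad entry into the first four lanes of the state.

    The entry index comes from the state itself, so the access pattern is
    data-dependent and only known once the preceding round has finished.
    """
    index = state[0] % len(scratchpad)
    entry = scratchpad[index]  # 32 bytes = four 64-bit words
    for lane in range(4):
        state[lane] ^= int.from_bytes(entry[lane * 8:(lane + 1) * 8], "little")
    return index


def toy_wild_hash(data, scratchpad, rounds=24):
    """Alternate a (stand-in) round with a state-addressed scratchpad read."""
    state = [0] * STATE_WORDS
    seed = hashlib.sha3_256(data).digest()
    for lane in range(4):
        state[lane] = int.from_bytes(seed[lane * 8:(lane + 1) * 8], "little")
    touched = []
    for r in range(rounds):
        state = stand_in_round(state, r)
        touched.append(mix_from_scratchpad(state, scratchpad))
    return b"".join(w.to_bytes(8, "little") for w in state[:4]), touched


# Toy scratchpad of 1000 pseudorandom 32-byte entries.
scratchpad = [hashlib.sha3_256(i.to_bytes(4, "little")).digest() for i in range(1000)]
digest, touched = toy_wild_hash(b"block header blob", scratchpad)
```

The point of the sketch is only that each read address depends on the state produced by the previous round, so the reads cannot be prefetched in bulk ahead of time.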

I used "performance_tests" with different scratchpad sizes to measure memory hardness:

Quote

Warm up: 2161 ms
test_wild_keccak<400> - OK:
  loop count: 100000
  elapsed: 3020 ms
  time per call: 0 ms/call

Warm up: 2158 ms
test_wild_keccak<40000> - OK:
  loop count: 100000
  elapsed: 3060 ms
  time per call: 0 ms/call

Warm up: 2168 ms
test_wild_keccak<4000000> - OK:
  loop count: 100000
  elapsed: 3484 ms
  time per call: 0 ms/call

Warm up: 2156 ms
test_wild_keccak<40000000> - OK:
  loop count: 100000
  elapsed: 8119 ms
  time per call: 0 ms/call

Warm up: 2150 ms
test_wild_keccak<100000000> - OK:
  loop count: 100000
  elapsed: 8574 ms
  time per call: 0 ms/call

As you can see, with a small amount of memory 100000 hash operations take 3020 ms, while the same number of operations on a 100 MB scratchpad takes 8574 ms.
This difference (caused by overflowing the cache) points to real memory hardness, we guess.
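The log prints "time per call: 0 ms/call" for every run, presumably because the value is rounded down to whole milliseconds. A small sketch recomputes the per-call time in microseconds from the elapsed figures above; the dictionary keys are the test_wild_keccak template parameters, whose unit is not stated in the log.

```python
LOOP_COUNT = 100_000

# elapsed milliseconds per scratchpad size parameter, copied from the log above
elapsed_ms = {
    400: 3020,
    40_000: 3060,
    4_000_000: 3484,
    40_000_000: 8119,
    100_000_000: 8574,
}

# microseconds per hash call for each scratchpad size
per_call_us = {size: ms * 1000 / LOOP_COUNT for size, ms in elapsed_ms.items()}

# slowdown of the largest scratchpad relative to the smallest
slowdown = elapsed_ms[100_000_000] / elapsed_ms[400]

for size, us in per_call_us.items():
    print(f"{size:>11}: {us:.2f} us/call")
print(f"slowdown: {slowdown:.2f}x")
```

So a call costs roughly 30 µs when the scratchpad fits in cache and roughly 86 µs at the largest size, a slowdown of about 2.8x.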

Comments are welcome.

Do you have some cryptography-based rationale on the "wild keccak" approach?

otila

sr. member

Activity: 336

Merit: 250

Quote from: BitRock on May 19, 2014, 10:22:40 AM

Have you got a chance to optimize the miner?

Not much to do, since the PoW still uses the % operation (and gcc generates four div instructions in that function because there is also some array indexing), and I couldn't figure out the C++ code with its splattered-around lambda functions, callbacks and recursive macros. And then summer came to Finland and I have been out more.

The hash rate could be two times faster if I figure out what the code does and it can do with fewer div operations. (I have been programming in C for 20 years, maybe I should learn more C++..)

However, I replaced the keccak function with a faster one, but it does not help much, because much of the CPU time is spent in div operations.

BitRock

full member

Activity: 137

Merit: 100

Quote from: otila on May 14, 2014, 08:52:13 AM

It seems this miner stuff is obfuscated on purpose; the dev is probably running a 10x faster miner himself..
but I have two days to make a faster version.

Code:

#0 std::vector >::operator[] (this=0x7f837d3fb950, __n=12302) at /usr/include/c++/4.9.0/bits/stl_vector.h:780
#1 0x0000000000c4e75c in currency::miner::::operator()(uint64_t) const (__closure=0x7f837d3fb4e0, index=9325784170990468272)
at /c/boolberry/src/currency_core/miner.cpp:365
#2 0x0000000000c4f912 in currency::::operator()(crypto::state_t_m &, crypto::mixin_t &) const (__closure=0x7f837d3fb200, st=...,
mix=...) at /c/boolberry/src/currency_core/currency_format_utils.h:189
#3 0x0000000000c4fdab in crypto::wild_keccak; currency::blobdata = std::basic_string; uint64_t = long unsigned int]:: >(const uint8_t *, size_t, uint8_t *, size_t, currency::) (
in=0x7f837d3fb540 "\366+\307\351\330\b3pQ\264\067\061ǭVjf1\034s \237\224\233\327\016\226-\332xko", inlen=33,
md=0x7f837d3fb540 "\366+\307\351\330\b3pQ\264\067\061ǭVjf1\034s \237\224\233\327\016\226-\332xko", mdlen=32, cb=...) at /c/boolberry/src/crypto/wild_keccak.h:134
#4 0x0000000000c4fb09 in crypto::wild_keccak_dbl; currency::blobdata = std::basic_string; uint64_t = long unsigned int]:: >(const uint8_t *, size_t, uint8_t *, size_t, currency::) (
in=0x7f8381dba098 "\001-\261FVm\345O\330\061\021\237\257\210\200\204\364=\374\243\031.\023\254\350\233O\372\373\262\032~łz$i", inlen=76,
md=0x7f837d3fb540 "\366+\307\351\330\b3pQ\264\067\061ǭVjf1\034s \237\224\233\327\016\226-\332xko", mdlen=32, cb=...) at /c/boolberry/src/crypto/wild_keccak.h:151
#5 0x0000000000c4fa8d in currency::get_blob_longhash >(const currency::blobdata &, crypto::hash &, uint64_t, currency::miner::) (
bd="\001-\261FVm\345O\330\061\021\237\257\210\200\204\364=\374\243\031.\023\254\350\233O\372\373\262\032~łz$i\000\216\353қ\005QJ\034k0\023pq\024y\\\356\031\343\376\376\342\366\250{\340\327\363\344RSn\002c\f\277\236\001", res=..., height=2056, accessor=...) at /c/boolberry/src/currency_core/currency_format_utils.h:179
#6 0x0000000000c4ec65 in currency::miner::worker_thread (this=0x7fff82e7fdd8) at /c/boolberry/src/currency_core/miner.cpp:366

Have you got a chance to optimize the miner?

otila

sr. member

Activity: 336

Merit: 250

Quote from: perl on May 18, 2014, 05:48:44 PM

You have smoke for write algo hash or I did not understand ?

Talking to me?
I don't understand you.

perl

legendary

Activity: 1918

Merit: 1190

You have smoke for write algo hash or I did not understand ?

What interest is one pool not receveid more 20 personne ?
Pool as need validate best resultat of miner for validate submit .
Make validation 20 personne get more resources of mining directly .

otila

sr. member

Activity: 336

Merit: 250

It seems this miner stuff is obfuscated on purpose; the dev is probably running a 10x faster miner himself..
but I have two days to make a faster version.

Code:

#0 std::vector >::operator[] (this=0x7f837d3fb950, __n=12302) at /usr/include/c++/4.9.0/bits/stl_vector.h:780
#1 0x0000000000c4e75c in currency::miner::::operator()(uint64_t) const (__closure=0x7f837d3fb4e0, index=9325784170990468272)
at /c/boolberry/src/currency_core/miner.cpp:365
#2 0x0000000000c4f912 in currency::::operator()(crypto::state_t_m &, crypto::mixin_t &) const (__closure=0x7f837d3fb200, st=...,
mix=...) at /c/boolberry/src/currency_core/currency_format_utils.h:189
#3 0x0000000000c4fdab in crypto::wild_keccak; currency::blobdata = std::basic_string; uint64_t = long unsigned int]:: >(const uint8_t *, size_t, uint8_t *, size_t, currency::) (
in=0x7f837d3fb540 "\366+\307\351\330\b3pQ\264\067\061ǭVjf1\034s \237\224\233\327\016\226-\332xko", inlen=33,
md=0x7f837d3fb540 "\366+\307\351\330\b3pQ\264\067\061ǭVjf1\034s \237\224\233\327\016\226-\332xko", mdlen=32, cb=...) at /c/boolberry/src/crypto/wild_keccak.h:134
#4 0x0000000000c4fb09 in crypto::wild_keccak_dbl; currency::blobdata = std::basic_string; uint64_t = long unsigned int]:: >(const uint8_t *, size_t, uint8_t *, size_t, currency::) (
in=0x7f8381dba098 "\001-\261FVm\345O\330\061\021\237\257\210\200\204\364=\374\243\031.\023\254\350\233O\372\373\262\032~łz$i", inlen=76,
md=0x7f837d3fb540 "\366+\307\351\330\b3pQ\264\067\061ǭVjf1\034s \237\224\233\327\016\226-\332xko", mdlen=32, cb=...) at /c/boolberry/src/crypto/wild_keccak.h:151
#5 0x0000000000c4fa8d in currency::get_blob_longhash >(const currency::blobdata &, crypto::hash &, uint64_t, currency::miner::) (
bd="\001-\261FVm\345O\330\061\021\237\257\210\200\204\364=\374\243\031.\023\254\350\233O\372\373\262\032~łz$i\000\216\353қ\005QJ\034k0\023pq\024y\\\356\031\343\376\376\342\366\250{\340\327\363\344RSn\002c\f\277\236\001", res=..., height=2056, accessor=...) at /c/boolberry/src/currency_core/currency_format_utils.h:179
#6 0x0000000000c4ec65 in currency::miner::worker_thread (this=0x7fff82e7fdd8) at /c/boolberry/src/currency_core/miner.cpp:366

smooth

legendary

Activity: 2968

Merit: 1198

Quote from: otila on May 13, 2014, 05:22:37 AM

Quote from: smooth on May 13, 2014, 04:00:45 AM

Quote from: otila on May 13, 2014, 03:50:46 AM

Now, when mining on testnet, currency::get_blob_longhash takes 75% of CPU time, and 75% of the CPU time in that function is spent on div instructions due to size() and operator[]. Not quite memory-hard.

The block chain is tiny right? Probably all in near cache

EDIT: div cycle count on Sandy Bridge depends on divisor, bigger divisors take more time (latency: 30-94 cycles). As a comparison, L2 cache minimum latency is 11 cycles.

Right but memory is much higher latency. The idea is for the block chain data to (eventually) be in memory, not L2.

I agree replacing div by something faster is probably a good idea, but I haven't looked at this code at all.

otila

sr. member

Activity: 336

Merit: 250

Quote from: smooth on May 13, 2014, 04:00:45 AM

Quote from: otila on May 13, 2014, 03:50:46 AM

Now, when mining on testnet, currency::get_blob_longhash takes 75% of CPU time, and 75% of the CPU time in that function is spent on div instructions due to size() and operator[]. Not quite memory-hard.

The block chain is tiny right? Probably all in near cache

EDIT: div cycle count on Sandy Bridge depends on divisor, bigger divisors take more time (latency: 30-94 cycles). As a comparison, L2 cache minimum latency is 11 cycles.

The vector size could just as well be rounded up to the next power of two, with some padding, so that the modulus is replaced by a bitwise AND; instead of size() you would use a shift count..
I wouldn't like to uglify the code by implementing reciprocal multiplication hacks.

But who knows, maybe GPUs and ASICs have slow 64-bit divide.
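The power-of-two idea can be sketched as follows. The helper names are hypothetical; the sketch only shows that once the table length is a power of two, masking with (length - 1) selects the same entry as the modulus. The sample index is the one visible in the backtrace posted in this thread, used here only as an arbitrary 64-bit value.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def pad_to_power_of_two(entries, filler):
    """Return a copy of entries padded with filler up to a power-of-two length."""
    size = next_power_of_two(len(entries))
    return entries + [filler] * (size - len(entries))


def index_mod(value, table):
    """Index selection with a modulus (compiles to a div instruction in C/C++)."""
    return value % len(table)


def index_mask(value, table):
    """Index selection with a bitwise AND; only valid for power-of-two lengths."""
    return value & (len(table) - 1)


table = pad_to_power_of_two(list(range(12303)), filler=0)
sample = 9325784170990468272  # arbitrary 64-bit value taken from the backtrace
print(len(table), index_mod(sample, table), index_mask(sample, table))
```

Note that padding changes which entry a given value selects compared with taking the modulus by the unpadded size, so in a real coin this would change the hash output and would have to be a consensus-level change.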

smooth

legendary

Activity: 2968

Merit: 1198

Quote from: otila on May 13, 2014, 03:50:46 AM

Now, when mining on testnet, currency::get_blob_longhash takes 75% of CPU time, and 75% of the CPU time in that function is spent on div instructions due to size() and operator[]. Not quite memory-hard.

The block chain is tiny right? Probably all in near cache

otila

sr. member

Activity: 336

Merit: 250

Now, when mining on testnet, currency::get_blob_longhash takes 75% of CPU time, and 75% of the CPU time in that function is spent on div instructions due to size() and operator[]. Not quite memory-hard.

otila

sr. member

Activity: 336

Merit: 250

Quote from: crypto_zoidberg on May 10, 2014, 09:46:26 AM

I feel that I need to give a clearer description.
Each block's entry in the scratchpad does not depend on the number of transactions included in it.
It is fixed at about 320 bytes and uses prev_id, the merkle root, the one-time coinbase key, and hashed coinbase outs (usually 8).

With mixin_t, width=1600, rate=1536 and capacity=(1600-1536)=64 give collision resistance of 2^32 and (second) preimage resistance of 2^32.
However, data is mixed into the state each round and you use multiply instead of xor, so the security level of the construction is unknown.
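The 2^32 figures follow from the generic sponge bounds (security against generic attacks is limited to capacity/2 bits). A small sketch of that arithmetic is below; the function name is hypothetical, the 256-bit output length is an assumption based on the 32-byte digests seen in this thread, and the bounds describe a plain sponge only, not the modified construction discussed here.

```python
def sponge_generic_security_bits(width, rate, output_bits):
    """Generic security levels (in bits) of a plain sponge.

    Collision resistance is limited by min(output/2, capacity/2) and
    (second) preimage resistance by min(output, capacity/2).
    """
    capacity = width - rate
    collision = min(output_bits // 2, capacity // 2)
    preimage = min(output_bits, capacity // 2)
    return capacity, collision, preimage


# Parameters quoted in the post: width=1600, rate=1536.
print(sponge_generic_security_bits(1600, 1536, 256))

# For comparison, a plain Keccak-256 style sponge with rate=1088.
print(sponge_generic_security_bits(1600, 1088, 256))
```

With rate=1536 the capacity is only 64 bits, which is where the 2^32 levels come from; with rate=1088 the same formula gives 128-bit collision and 256-bit preimage resistance.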

crypto_zoidberg

hero member

Activity: 976

Merit: 646

Quote from: hirschhornsalz on May 10, 2014, 06:52:43 AM

Now let's imagine, just for a moment, that this currency will be really successful. The Bitcoin blockchain grows faster than linearly in time, so it makes sense to assume slow exponential growth for a successful altcoin too.

Code:

~/.bitcoin $ du -sh blocks
20G blocks/

Now let's just assume you hit a 40 GB blockchain in 3 years. Are you sure there will be enough nodes left with 64 GB of RAM? What about the distribution of this kind of workstation?

I feel that I need to give a clearer description.
Each block's entry in the scratchpad does not depend on the number of transactions included in it.
It is fixed at about 320 bytes and uses prev_id, the merkle root, the one-time coinbase key, and hashed coinbase outs (usually 8).
I'll post a more detailed description.

hirschhornsalz

newbie

Activity: 16

Merit: 0

Now let's imagine, just for a moment, that this currency will be really successful. The Bitcoin blockchain grows faster than linearly in time, so it makes sense to assume slow exponential growth for a successful altcoin too.

Code:

~/.bitcoin $ du -sh blocks
20G blocks/

Now let's just assume you hit a 40 GB blockchain in 3 years. Are you sure there will be enough nodes left with 64 GB of RAM? What about the distribution of this kind of workstation?

crypto_zoidberg

hero member

Activity: 976

Merit: 646

Quote from: otila on May 10, 2014, 03:21:33 AM

Quote from: crypto_zoidberg on April 28, 2014, 09:54:25 PM

As you can see, with a small amount of memory 100000 hash operations take 3020 ms, while the same number of operations on a 100 MB scratchpad takes 8574 ms.
This difference (caused by overflowing the cache) points to real memory hardness, we guess.

Compare memory read/written per second by the hash to memmove() speed. What do you get?

Does each round of keccak read from different areas of scratchpad?

1. It shows the correlation between calculation time and memory wait time.
2. Yes. Addressing is based on the state buffer.

crypto_zoidberg

hero member

Activity: 976

Merit: 646

Quote from: superresistant on May 10, 2014, 01:48:20 AM

Quote from: crypto_zoidberg on April 28, 2014, 09:54:25 PM

1. Wide CPU instruction set
2. Memory-oriented algo
3. Small work time.

Memorycoin failed on the first two points (I don't know about the last). It was AES-NI only: if you didn't have a recent CPU, it was very slow or didn't work.
It was supposed to rely on the amount of RAM to counter bot farming, but it didn't.
I think it is great to be memory dependent.

What do you think about a minimum memory amount for mining?

Not very big.
It's not about a huge amount of memory: the scratchpad is built from pseudorandom block data, such as hashes and tx keys, and will grow by about 90 MB/year. A huge scratchpad would make an SPV client almost impossible.
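The growth figure can be roughly reproduced from the per-block entry size of about 320 bytes mentioned elsewhere in this thread. The 120-second block target used below is an assumption that is not stated in the thread.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
BLOCK_TARGET_SECONDS = 120      # ASSUMPTION: block interval, not stated in the thread
ENTRY_BYTES = 320               # approximate scratchpad entry size per block (from the thread)

blocks_per_year = SECONDS_PER_YEAR // BLOCK_TARGET_SECONDS
scratchpad_bytes_per_year = blocks_per_year * ENTRY_BYTES

print(blocks_per_year, "blocks/year")
print(round(scratchpad_bytes_per_year / 1e6, 1), "MB/year")
```

Under that assumption the scratchpad grows by about 84 MB per year, which is in the same range as the quoted 90 MB/year.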

otila

sr. member

Activity: 336

Merit: 250

Quote from: crypto_zoidberg on April 28, 2014, 09:54:25 PM

As you can see, with a small amount of memory 100000 hash operations take 3020 ms, while the same number of operations on a 100 MB scratchpad takes 8574 ms.
This difference (caused by overflowing the cache) points to real memory hardness, we guess.

Compare memory read/written per second by the hash to memmove() speed. What do you get?

Does each round of keccak read from different areas of scratchpad?
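A rough estimate of the scratchpad read bandwidth can be derived from figures given in the opening post: about 1100 reads of 32 bytes per hash, and 100000 hashes in 3020 ms (smallest scratchpad) or 8574 ms (largest). This is only a back-of-the-envelope sketch; it counts scratchpad reads alone and assumes the 1100-read figure applies to the benchmark runs.

```python
READS_PER_HASH = 1100      # approximate, from the opening post
BYTES_PER_READ = 32
HASHES = 100_000


def read_bandwidth_mb_per_s(elapsed_ms):
    """Scratchpad bytes read per second, in decimal megabytes."""
    total_bytes = READS_PER_HASH * BYTES_PER_READ * HASHES
    return total_bytes / (elapsed_ms / 1000) / 1e6


small = read_bandwidth_mb_per_s(3020)   # scratchpad fits in cache
large = read_bandwidth_mb_per_s(8574)   # largest scratchpad in the benchmark

print(f"{small:.0f} MB/s vs {large:.0f} MB/s")
```

That is roughly 1.2 GB/s in cache and about 0.4 GB/s on the large scratchpad, which can then be compared against a measured memmove() throughput on the same machine.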

superresistant

legendary

Activity: 2156

Merit: 1131

Quote from: crypto_zoidberg on April 28, 2014, 09:54:25 PM

1. Wide CPU instruction set
2. Memory-oriented algo
3. Small work time.

Memorycoin failed on the first two points (I don't know about the last). It was AES-NI only: if you didn't have a recent CPU, it was very slow or didn't work.
It was supposed to rely on the amount of RAM to counter bot farming, but it didn't.
I think it is great to be memory dependent.

What do you think about a minimum memory amount for mining?

otila

sr. member

Activity: 336

Merit: 250

Quote from: crypto_zoidberg on May 02, 2014, 04:56:24 PM

We are looking for a way to be memory hard (at mining) on the one hand, and on the other to use a wider CPU instruction set (if possible). In our opinion this makes sense for ASIC protection. Please correct me if I am wrong.

You can't make it memory-hard with 24-round non-optimized keccak, so why insist on using it?

crypto_zoidberg

hero member

Activity: 976

Merit: 646

Quote from: digicoin on May 01, 2014, 04:03:37 AM

Is it possible to extend the daemon to return the full list of block header hashes instead of the full blockchain? What are the security implications of this approach? E.g., can a malicious/compromised node respond with a purposefully modified hash list?

Yes, for example we can make the daemon able to return randomly requested block headers. But I don't think it's necessary.

An SPV client has to keep the block id vector, like our wallet does; it is 8 MB per year.
For an SPV client, each new block should be received together with the headers required to compute its PoW.
To check that the supplied headers are valid, you just need to compute the id hash (cn_fast_hash, which is actually Keccak) of each header and check that it equals the id in the SPV client's vector at the corresponding height.

Even if a compromised node computes the PoW with fake headers, the SPV client is able to detect it.
So we probably need to extend the daemon to work with SPV clients by making it possible to send blocks coupled with the related PoW headers.
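The header check described above can be sketched like this. The function names and the height-to-blob dictionary are hypothetical, and hashlib.sha3_256 is only a stand-in for cn_fast_hash: the thread says cn_fast_hash is Keccak, and standard SHA3-256 uses different padding, so the digests would not match the real ids. The storage estimate assumes 32-byte ids and a 120-second block target, neither of which is stated in the thread.

```python
import hashlib


def header_id(header_blob):
    """Stand-in for cn_fast_hash (ASSUMPTION: SHA3-256 is used here for illustration only)."""
    return hashlib.sha3_256(header_blob).digest()


def verify_pow_headers(id_vector, supplied_headers):
    """Check headers supplied by a peer against the locally stored block ids.

    id_vector: list of block ids indexed by height (kept by the SPV client).
    supplied_headers: dict mapping height -> header blob sent along with a block.
    """
    for height, blob in supplied_headers.items():
        if not 0 <= height < len(id_vector):
            return False
        if header_id(blob) != id_vector[height]:
            return False
    return True


# Toy chain of 10 headers and the id vector an SPV client would hold.
headers = [b"header-%d" % h for h in range(10)]
id_vector = [header_id(h) for h in headers]

honest = {2: headers[2], 7: headers[7]}
forged = {2: headers[2], 7: b"fake header"}
print(verify_pow_headers(id_vector, honest), verify_pow_headers(id_vector, forged))

# Storage estimate for the id vector: 32-byte ids, one block every 120 s (assumed).
id_vector_bytes_per_year = 32 * (365 * 24 * 3600 // 120)
print(round(id_vector_bytes_per_year / 1e6, 1), "MB/year")
```

Under those assumptions the id vector grows by about 8.4 MB per year, in line with the figure quoted above.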

I think an SPV client could be built from Wallet + p2p layer + PoW.

Quote from: digicoin on May 01, 2014, 04:03:37 AM

I believe that this coin can take CryptoNote as its core technology but it must separate itself from Bytecoin to make room for improvement. At least for the middle term.

I'm not sure I understand what you mean by separating from Bytecoin.
