Topic: [BBR] Boolberry Hash-on-blockchain discussion (Read 6885 times)

member
Activity: 109
Merit: 10
August 27, 2015, 09:19:12 AM
#45
Are there any other coins that have copied this hash function yet?
full member
Activity: 170
Merit: 100
Let's open a discussion of the hash function, friends.
I want to explain our approach and show how it differs from CryptoNote's in the project we announced here: https://bitcointalksearch.org/topic/bbr-boolberry-privacy-and-security-guaranteed-since-2014-577267

First of all, I want to say that the CryptoNote hash function (the so-called cn_slow_hash) is actually very well protected against ASICs: it uses a varied set of CPU instructions and is a memory-consuming algo. cn_slow_hash works hard on a 2MB scratchpad, and most of this scratchpad fits in CPU cache.

For now it is difficult to imagine specific hardware that would be both more efficient than a CPU and cheaper than a CPU. But the world changes fast, and nobody knows what will happen in the near future. We've all seen how rapidly technological breakthroughs can transform the computer industry.

Since cn_slow_hash creates a 2MB scratchpad, it has to cover all of this data. That's why it uses 2^20 iterations, and the side effect is pretty slow work (about 500 ms on a normal laptop, twice as fast on a normal PC with a suitable CPU cache). This may slow down synchronisation when downloading the blockchain (not a big problem), and theoretically it makes a network attack possible: connect and send a random block to make a peer calculate slow_hash for a useless fake block.

So, putting all together, we want to have:
1. Wide CPU instruction set
2. Memory-oriented algo
3. Small work time.

With that in mind, we tried to take a step to the side.

The idea of using blockchain data as the scratchpad resulted in this hash function:



Actually this is a keccak hybrid that uses an external scratchpad. After each keccak round, pseudo-randomly addressed data (the state vector is used as the addresses) is taken from the scratchpad and xored with the state.
Calculating each block's PoW usually takes about 1100 randomly addressed reads of 32-byte blocks.
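
To make the idea above concrete, here is a toy sketch of "run a round, then xor in scratchpad data addressed by the state". It is not the real wild_keccak: SHA3-256 stands in for a single keccak round, and the function name, the number of rounds and the number of reads per round are assumptions chosen for illustration only, so it does not reproduce the ~1100 reads mentioned above.

```python
import hashlib
from typing import Sequence

ENTRY_SIZE = 32  # bytes per scratchpad entry, matching the 32-byte reads in the post


def toy_scratchpad_hash(data: bytes, scratchpad: Sequence[bytes],
                        rounds: int = 24, reads_per_round: int = 4) -> bytes:
    """Toy illustration only; NOT the real wild_keccak."""
    if not scratchpad:
        raise ValueError("scratchpad must not be empty")
    state = hashlib.sha3_256(data).digest()  # 32-byte toy state
    for _ in range(rounds):
        # Stand-in for one permutation round (assumption: a full SHA3-256 call).
        state = hashlib.sha3_256(state).digest()
        for i in range(reads_per_round):
            # Use 8 bytes of the current state as a pseudo-random address.
            word = int.from_bytes(state[8 * i:8 * i + 8], "little")
            entry = scratchpad[word % len(scratchpad)]
            # Mix the addressed entry into the state with xor.
            state = bytes(a ^ b for a, b in zip(state, entry))
    return hashlib.sha3_256(state).digest()


# Hypothetical scratchpad: 16 entries of 32 pseudo-random bytes each.
pad = [hashlib.sha3_256(bytes([i])).digest() for i in range(16)]
print(toy_scratchpad_hash(b"block header blob", pad).hex())
```

The point of the sketch is only that the result depends on the scratchpad contents, so whoever computes the hash needs the scratchpad at hand.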

I used "performance_tests" with different scratchpad sizes to gauge memory hardness:

Quote
Warm up: 2161 ms
test_wild_keccak<400> - OK:
  loop count:    100000
  elapsed:       3020 ms
  time per call: 0 ms/call

Warm up: 2158 ms
test_wild_keccak<40000> - OK:
  loop count:    100000
  elapsed:       3060 ms
  time per call: 0 ms/call

Warm up: 2168 ms
test_wild_keccak<4000000> - OK:
  loop count:    100000
  elapsed:       3484 ms
  time per call: 0 ms/call

Warm up: 2156 ms
test_wild_keccak<40000000> - OK:
  loop count:    100000
  elapsed:       8119 ms
  time per call: 0 ms/call

Warm up: 2150 ms
test_wild_keccak<100000000> - OK:
  loop count:    100000
  elapsed:       8574 ms
  time per call: 0 ms/call

As you can see, when working on a small amount of memory, 100000 hash operations take 3020 ms, while working on a 100Mb scratchpad with the same operation count takes 8574 ms.
Such a difference (caused by overflowing the cache memory) points to real memory hardness, we guess.
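
The "time per call: 0 ms/call" lines above hide the per-hash cost, presumably because the value is printed in whole milliseconds. A small sketch that recomputes it from the quoted elapsed times (the helper name is made up for this example):

```python
# Figures copied from the performance_tests output quoted above.
LOOP_COUNT = 100_000
elapsed_ms = {
    400: 3020,
    40_000: 3060,
    4_000_000: 3484,
    40_000_000: 8119,
    100_000_000: 8574,
}


def per_call_us(total_ms: float, loops: int = LOOP_COUNT) -> float:
    """Average time per hash in microseconds."""
    return total_ms * 1000 / loops


for size, ms in elapsed_ms.items():
    print(f"test_wild_keccak<{size}>: {per_call_us(ms):.2f} us per call")

# Ratio between the largest and the smallest scratchpad parameter.
slowdown = elapsed_ms[100_000_000] / elapsed_ms[400]
print(f"largest vs smallest: {slowdown:.2f}x")
```

With these numbers one hash costs about 30 us on the smallest scratchpad and about 86 us on the largest, a factor of roughly 2.8.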

Welcome to comment.

Do you have a cryptography-based rationale for the "wild keccak" approach?
sr. member
Activity: 336
Merit: 250
Have you got a chance to optimize the miner?

Not much to do, since the PoW still uses the % operation (and gcc generates four div instructions in that function because there's also some array indexing), and I haven't figured out the C++ code with its splattered-around lambda functions, callbacks and recursive macros. And then summer came in Finland and I have been out more.
Hash rate could be two times faster if I figure out what the code does and can do with fewer div operations. (I have been programming in C for 20 years, maybe I should learn more C++..)

However, I replaced the keccak function with a faster one, but it does not help much, because much of the CPU time is spent in div operations.
full member
Activity: 137
Merit: 100
Have you got a chance to optimize the miner?
sr. member
Activity: 336
Merit: 250
You have smoke for write algo hash or I did not understand ?

Talking to me?
I don't understand you
legendary
Activity: 1918
Merit: 1190
You have smoke for write algo hash or I did not understand ?


What is the point of a pool that does not accept more than 20 people?
A pool needs to validate each miner's best result in order to validate a submission.
Validating for 20 people takes more resources than mining directly.
sr. member
Activity: 336
Merit: 250
seems this miner stuff is obfuscated on purpose, dev is probably running 10x faster miner himself..
but I have two days to make a faster version

Code:
#0  std::vector >::operator[] (this=0x7f837d3fb950, __n=12302) at /usr/include/c++/4.9.0/bits/stl_vector.h:780
#1  0x0000000000c4e75c in currency::miner::::operator()(uint64_t) const (__closure=0x7f837d3fb4e0, index=9325784170990468272)
    at /c/boolberry/src/currency_core/miner.cpp:365
#2  0x0000000000c4f912 in currency::::operator()(crypto::state_t_m &, crypto::mixin_t &) const (__closure=0x7f837d3fb200, st=...,
    mix=...) at /c/boolberry/src/currency_core/currency_format_utils.h:189
#3  0x0000000000c4fdab in crypto::wild_keccak; currency::blobdata = std::basic_string; uint64_t = long unsigned int]:: >(const uint8_t *, size_t, uint8_t *, size_t, currency::) (
    in=0x7f837d3fb540 "\366+\307\351\330\b3pQ\264\067\061ǭVjf1\034s \237\224\233\327\016\226-\332xko", inlen=33,
    md=0x7f837d3fb540 "\366+\307\351\330\b3pQ\264\067\061ǭVjf1\034s \237\224\233\327\016\226-\332xko", mdlen=32, cb=...) at /c/boolberry/src/crypto/wild_keccak.h:134
#4  0x0000000000c4fb09 in crypto::wild_keccak_dbl; currency::blobdata = std::basic_string; uint64_t = long unsigned int]:: >(const uint8_t *, size_t, uint8_t *, size_t, currency::) (
    in=0x7f8381dba098 "\001-\261FVm\345O\330\061\021\237\257\210\200\204\364=\374\243\031.\023\254\350\233O\372\373\262\032~łz$i", inlen=76,
    md=0x7f837d3fb540 "\366+\307\351\330\b3pQ\264\067\061ǭVjf1\034s \237\224\233\327\016\226-\332xko", mdlen=32, cb=...) at /c/boolberry/src/crypto/wild_keccak.h:151
#5  0x0000000000c4fa8d in currency::get_blob_longhash >(const currency::blobdata &, crypto::hash &, uint64_t, currency::miner::) (
    bd="\001-\261FVm\345O\330\061\021\237\257\210\200\204\364=\374\243\031.\023\254\350\233O\372\373\262\032~łz$i\000\216\353қ\005QJ\034k0\023pq\024y\\\356\031\343\376\376\342\366\250{\340\327\363\344RSn\002c\f\277\236\001", res=..., height=2056, accessor=...) at /c/boolberry/src/currency_core/currency_format_utils.h:179
#6  0x0000000000c4ec65 in currency::miner::worker_thread (this=0x7fff82e7fdd8) at /c/boolberry/src/currency_core/miner.cpp:366
legendary
Activity: 2968
Merit: 1198
now when mining in testnet, currency::get_blob_longhash takes 75% of CPU time, and 75% of CPU time in that function is spent in doing div instructions due to size() and operator[], not quite memory-hard

The block chain is tiny right? Probably all in near cache

EDIT: div cycle count on Sandy Bridge depends on divisor, bigger divisors take more time (latency: 30-94 cycles). As a comparison, L2 cache minimum latency is 11 cycles.

Right but memory is much higher latency. The idea is for the block chain data to (eventually) be in memory, not L2.

I agree replacing div by something faster is probably a good idea, but I haven't looked at this code at all.

sr. member
Activity: 336
Merit: 250
now when mining in testnet, currency::get_blob_longhash takes 75% of CPU time, and 75% of CPU time in that function is spent in doing div instructions due to size() and operator[], not quite memory-hard

The block chain is tiny right? Probably all in near cache

EDIT: div cycle count on Sandy Bridge depends on divisor, bigger divisors take more time (latency: 30-94 cycles). As a comparison, L2 cache minimum latency is 11 cycles.

The vector size could just as well be rounded up to the next power of two with some padding, avoiding the modulus by doing a bitwise AND, so instead of size() you use a shift count.
I wouldn't like to uglify the code by implementing reciprocal multiplication hacks.

But who knows, maybe GPUs and ASICs have slow 64 bit divide
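
A minimal sketch of the power-of-two idea from this post. The function names are made up for illustration, and the example index is simply the value that appears in the backtrace earlier in the thread. Note that padding the vector changes which entries get addressed, so in a real coin this would change the hash output and is not a drop-in optimization.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def index_by_modulo(value: int, size: int) -> int:
    """Addressing as described in the thread: needs a division."""
    return value % size


def index_by_mask(value: int, padded_size: int) -> int:
    """Addressing with a bitwise AND; only valid when padded_size is a power of two."""
    return value & (padded_size - 1)


size = 12303                      # hypothetical vector size
padded = next_power_of_two(size)  # 16384
value = 9325784170990468272       # index value seen in the backtrace

# For a power-of-two size, masking and modulo agree.
print(index_by_mask(value, padded) == index_by_modulo(value, padded))
```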
legendary
Activity: 2968
Merit: 1198
now when mining in testnet, currency::get_blob_longhash takes 75% of CPU time, and 75% of CPU time in that function is spent in doing div instructions due to size() and operator[], not quite memory-hard

The block chain is tiny right? Probably all in near cache

sr. member
Activity: 336
Merit: 250
now when mining in testnet, currency::get_blob_longhash takes 75% of CPU time, and 75% of CPU time in that function is spent in doing div instructions due to size() and operator[], not quite memory-hard
sr. member
Activity: 336
Merit: 250
I feel I need to give a clearer description.
Each block's entry in the scratchpad does not depend on the number of transactions included in the block.
It is fixed at about 320 bytes and uses prev_id, the merkle root, the one-time coinbase key, and the hashed coinbase outs (usually 8).

With mixin_t, and with width=1600, rate=1536 and capacity=(1600-1536)=64, you get collision resistance=2^32 and (second) preimage resistance=2^32.
However, data is mixed into the state each round and you use multiply instead of xor, so the security level of the construction is unknown.
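
The 2^32 figures follow from the generic bounds for a sponge construction, where the capacity caps the security at capacity/2 bits. A small sketch of that arithmetic (the function name and the 256-bit output length are assumptions for this example; as the post says, the bounds describe a plain sponge, not a construction that mixes in extra data each round):

```python
def sponge_generic_security_bits(width: int, rate: int, output_bits: int) -> dict:
    """Generic-attack security levels (in bits) of a plain sponge."""
    capacity = width - rate
    return {
        "capacity": capacity,
        # Collisions: limited by the output length and by the capacity.
        "collision": min(output_bits // 2, capacity // 2),
        # (Second) preimages: limited by the output length and by the capacity.
        "preimage": min(output_bits, capacity // 2),
    }


# Parameters quoted in the post: width=1600, rate=1536.
print(sponge_generic_security_bits(1600, 1536, 256))
# For comparison, the usual 256-bit parameters: rate=1088, capacity=512.
print(sponge_generic_security_bits(1600, 1088, 256))
```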
hero member
Activity: 976
Merit: 646
Now let's imagine - just for a short time - that this currency will be really successful. The blockchain of bitcoin grows faster than linearly in time, so it makes sense to assume a slow exponential growth for a successful altcoin too.

Code:
~/.bitcoin $ du -sh blocks
20G     blocks/

Now let's just assume you hit a 40 GB blockchain in 3 years. Are you sure there are enough nodes left with 64 GB of RAM? What about the distribution of this kind of workstation?
I feel I need to give a clearer description.
Each block's entry in the scratchpad does not depend on the number of transactions included in the block.
It is fixed at about 320 bytes and uses prev_id, the merkle root, the one-time coinbase key, and the hashed coinbase outs (usually 8).
I'll post a more detailed description.
newbie
Activity: 16
Merit: 0
Now let's imagine - just for a short time - that this currency will be really successful. The blockchain of bitcoin grows faster than linearly in time, so it makes sense to assume a slow exponential growth for a successful altcoin too.

Code:
~/.bitcoin $ du -sh blocks
20G     blocks/

Now let's just assume you hit a 40 GB blockchain in 3 years. Are you sure there are enough nodes left with 64 GB of RAM? What about the distribution of this kind of workstation?
hero member
Activity: 976
Merit: 646
As you can see, when working on a small amount of memory, 100000 hash operations take 3020 ms, while working on a 100Mb scratchpad with the same operation count takes 8574 ms.
Such a difference (caused by overflowing the cache memory) points to real memory hardness, we guess.

Compare memory read/written per second by the hash to memmove() speed. What do you get?

Does each round of keccak read from different areas of scratchpad?

1. It gives the correlation between calculation time and memory wait time.
2. Yes. Addressing is based on the state buffer.
hero member
Activity: 976
Merit: 646
1. Wide CPU instruction set
2. Memory-oriented algo
3. Small work time.

Memorycoin failed on the first two points (I don't know about the last). It was AES-NI instructions only; if you didn't have a recent CPU, it was very slow or didn't work.
It was supposed to rely on the amount of RAM to counter bot farming, but it didn't.
I think it is great to be memory dependent.

What do you think about a minimum memory amount to mine?
Not very big.
It's not about a huge amount of memory. The scratchpad is built from the blocks' pseudorandom data, such as hashes and tx keys, and will grow by about 90MB/year. A huge scratchpad would make it almost impossible to have an SPV client.
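
A rough cross-check of the growth figure, using the ~320 bytes per block stated elsewhere in this thread. The 120-second block time is an assumption of this sketch (it is not stated in the thread), and the helper names are made up:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def blocks_per_year(block_time_s: int) -> int:
    """Number of blocks produced in a year at a fixed block time."""
    return SECONDS_PER_YEAR // block_time_s


def yearly_growth_bytes(bytes_per_block: int, block_time_s: int) -> int:
    """Bytes added per year when every block adds a fixed-size entry."""
    return bytes_per_block * blocks_per_year(block_time_s)


BLOCK_TIME_S = 120  # ASSUMPTION: target block time, not taken from this thread

# Scratchpad: about 320 bytes per block (figure from the thread).
print(yearly_growth_bytes(320, BLOCK_TIME_S) / 1e6, "MB/year")
# Block id vector: one 32-byte id per block.
print(yearly_growth_bytes(32, BLOCK_TIME_S) / 1e6, "MB/year")
```

Under that assumption the scratchpad grows by about 84 MB per year, in the same range as the ~90MB/year above, and a vector of 32-byte block ids grows by about 8.4 MB per year.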
sr. member
Activity: 336
Merit: 250
As you can see, when working on a small amount of memory, 100000 hash operations take 3020 ms, while working on a 100Mb scratchpad with the same operation count takes 8574 ms.
Such a difference (caused by overflowing the cache memory) points to real memory hardness, we guess.

Compare memory read/written per second by the hash to memmove() speed. What do you get?

Does each round of keccak read from different areas of scratchpad?
legendary
Activity: 2156
Merit: 1131
1. Wide CPU instruction set
2. Memory-oriented algo
3. Small work time.

Memorycoin failed on the first two points (I don't know about the last). It was AES-NI instructions only; if you didn't have a recent CPU, it was very slow or didn't work.
It was supposed to rely on the amount of RAM to counter bot farming, but it didn't.
I think it is great to be memory dependent.

What do you think about a minimum memory amount to mine?

sr. member
Activity: 336
Merit: 250
We are looking for a way to be memory-hard (when mining) on the one hand, and on the other hand to use a wider CPU instruction set (if possible). In our opinion this makes sense for ASIC protection. Please correct me if I'm wrong.

You can't make it memory-hard with 24-round non-optimized keccak, so why insist on using it?
hero member
Activity: 976
Merit: 646
Is it possible to extend the daemon to return the full list of block header hashes instead of the full blockchain? What are the security implications of this approach? E.g., could a malicious/compromised node respond with a purposefully modified hash list?
Yes, for example we can make the daemon able to return randomly requested block headers. But I don't think it's necessary.

An SPV client has to keep the block id vector, like our wallet does; it is 8Mb per year.
For an SPV client, each new block should be received together with the headers required to compute its PoW.
To check whether the supplied headers are valid, you just need to compute the id hash (cn_fast_hash, which is actually keccak) of each header and check that this id equals the id in the SPV client's vector at the corresponding height.

Even if a compromised node builds a PoW with fake headers, the SPV client is able to detect it.
So we probably need to extend the daemon to work with SPV clients by making it possible to send blocks coupled with the related PoW headers.
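
The check described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: SHA3-256 from hashlib stands in for cn_fast_hash (the two use different padding, so real ids would differ), and the function names and data layout are made up for the example.

```python
import hashlib
from typing import Dict, Sequence


def header_id(header_blob: bytes) -> bytes:
    """Stand-in for the id hash of a block header (assumption: SHA3-256)."""
    return hashlib.sha3_256(header_blob).digest()


def headers_match_known_ids(supplied: Dict[int, bytes],
                            known_ids: Sequence[bytes]) -> bool:
    """Check headers supplied by a node against the client's own id vector.

    `supplied` maps block height to header blob; `known_ids[height]` is the
    block id the SPV client already trusts for that height.
    """
    for height, blob in supplied.items():
        if height < 0 or height >= len(known_ids):
            return False  # the client has no id to compare against
        if header_id(blob) != known_ids[height]:
            return False  # header does not hash to the trusted id
    return True


# Hypothetical chain of five headers and the ids a client would store.
chain = [b"header-%d" % i for i in range(5)]
ids = [header_id(h) for h in chain]

print(headers_match_known_ids({1: chain[1], 3: chain[3]}, ids))  # honest node
print(headers_match_known_ids({1: b"forged header"}, ids))       # forged header
```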

I think an SPV client could be built from Wallet + p2p layer + PoW.

I believe that this coin can take CryptoNote as its core technology but it must separate itself from Bytecoin to make room for improvement. At least for the middle term.
I'm not sure I understand what you mean about separating from Bytecoin.


