Topic: [BOUNTY] Next Generation Proof of Work Algorithm 1 BTC + 1000 BitShare

hero member
Activity: 770
Merit: 566
fractally
The hardware that you would have to specialize to improve on my current algorithm would be RAM densities, bus speeds, and fast recovery from branch misprediction. These are all general-purpose improvements that would benefit everyone.

A GPU is data-parallel and thus can gain by adding even more specialized (less general-purpose) transistors.
newbie
Activity: 67
Merit: 0
Don't you want specialized hardware, or the advance of technology? A GPU is an advance of technology, not specialized hardware.
I think the problem with ASICs is that they are not general purpose. With those ASICs you can compute only SHA, and SHA is not a general-purpose computation.
What a GPU does is massively parallel general-purpose computation, which is more general purpose than a serial CPU.
The old theoretical advantage of a serial general-purpose processor was its ability to scale a generic problem in which one part is parallelizable and the other is not: if you increase the power of the serial processor, you increase performance by the same factor.
With a parallel processor, adding cores increases performance only for the parallel side of the problem (see the formulas at the end of this post).
History showed that assumption to be wrong, for the simple reason that in reality a bigger problem is a problem whose parallel side has grown, so the natural solution is to grow the parallel solver. This is well known in the scientific community.
So I don't think the answer is an ASIC specialized for serial computation.

I think the solution is to find a proof of work that is a natural general-purpose problem solver; if that still allows an ASIC to be built, it will be welcome, because it will be an advance of technology useful not only for proof of work but for everything.
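What I am describing is roughly Amdahl's law versus Gustafson's law, with p the parallelizable fraction of the work and N the number of cores:

$$S_{\text{Amdahl}}(N) = \frac{1}{(1-p) + p/N} \le \frac{1}{1-p}, \qquad S_{\text{Gustafson}}(N) = (1-p) + pN$$

Amdahl's bound is why serial speed once looked decisive; Gustafson's observation, which matches the history argument above, is that bigger problems grow p, so adding cores keeps paying off.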
hero member
Activity: 770
Merit: 566
fractally
My first optimization: compute the final 'hash' of the entire 128MB buffer with CityHash128 instead of sha1.  This is 'secure' because you are hashing 'random data' produced by a secure random number generator that is itself seeded by the sha256 hash of the block header.

End result:  to create a valid collision, you would have to flip a bit in the block header that could survive sha256+blowfish and then collide with CityHash.

Given that CityHash should be about 20x faster than sha1, and the final sha1 is a large part of my benchmark time (~1.5-2 sec) to hash 128 MB, I should be able to reduce the validation time significantly, from 7 days to less than 1.  Given that my latest design for BitShares only requires 1 year of history, it could potentially be well under 12 hours to validate the chain.

I could probably accelerate it more by 'spot checking' instead of hashing the whole thing.
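A minimal sketch of the swap, assuming Google's CityHash library (city.h); the function and buffer names below are illustrative, not taken from proof_of_work.cpp:

Code:
// Sketch only: the 128 MB buffer is assumed to already hold the
// pseudo-random data derived from the block header (sha256 -> seeded
// generator -> blowfish stages as described above).
#include "city.h"      // uint128, CityHash128
#include <vector>

// Final step: collapse the whole buffer into a 128-bit digest.
// CityHash128 is not a cryptographic hash, but the input is already the
// output of a cryptographically seeded pipeline, so a collision would
// still have to survive the sha256+blowfish stages first.
uint128 final_digest(const std::vector<char>& buffer)
{
    return CityHash128(buffer.data(), buffer.size());
}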
hero member
Activity: 770
Merit: 566
fractally
   I am in need of a next-generation proof-of-work algorithm, and my goal is to select the algorithm that BitShares will use.  The goal of the algorithm is to make it unprofitable to develop specialized hardware with an order-of-magnitude advantage over the hardware already owned by the masses.  I will pay 1 BTC for the best POW algorithm that I select for use with BitShares.  To kick off the ideas, I would like to present my draft algorithm: https://github.com/bytemaster/bitshares/blob/master/src/proof_of_work.cpp   Now, I want to encourage people to post ideas and cooperate, so I will divide up the bounty among whoever provided the insights that are ultimately included in the final algorithm.  In other words, there is partial credit for contributing to the solution and NOTHING to gain by hiding your algorithm from everyone else to avoid giving away your insights.

   The proof-of-work used by Bitcoin is double SHA256, and it has resulted in the production of specialized ASIC chips designed to perform this one function.  This level of specialization has caused mining to become a high-risk specialty business controlled by a small minority of the Bitcoin community.  This centralization is a liability for the network because these miners can be manipulated or shut down just like the exchanges.  Mining pools are another form of centralization that concentrates power.  Taking out a single mining pool could remove so much of the network's hashing power that transactions could be delayed for days until the difficulty adjusted.

   The first step toward keeping the proof-of-work decentralized is to design a hash function that performs best on a CPU and gains nothing from a GPU or even a specialized ASIC.  There are several keys to creating such an algorithm.

   First, it must be based upon the densest, most efficient transistors that are already widely available to all consumers.  RAM and cache are very dense, highly optimized ASIC chips that are already distributed far and wide.  Historically, RAM and CPU power have tended to grow at about the same rate; therefore, heavy RAM usage is a requirement that specialized ASIC chips, or even GPUs, will not be able to optimize away via 'parallel' operation.  This leaves all performance gains to be serial in nature.  Another consequence of leaning on RAM is that system bus speed matters far more than CPU power, because the CPU will be starved for data most of the time.  With the CPU starved for data, optimizations to the CPU itself do not touch the bottleneck, so even a 10-fold increase in CPU performance will not increase throughput.
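   To illustrate the idea (a sketch only, not the actual proof_of_work.cpp code): fill a ~128 MB buffer from a generator seeded by the block-header hash, using a generator so cheap per word that memory bandwidth, not the ALU, sets the pace:

Code:
// Sketch only: every word must land in RAM, so a faster core gains
// almost nothing; the memory writes dominate the runtime.
#include <cstddef>
#include <cstdint>
#include <vector>

static const size_t BUFFER_BYTES = 128 * 1024 * 1024;  // 128 MB working set

std::vector<uint64_t> fill_buffer(uint64_t seed)   // seed derived from sha256(header)
{
    std::vector<uint64_t> buf(BUFFER_BYTES / sizeof(uint64_t));
    uint64_t x = seed ? seed : 1;                  // xorshift must not start at 0
    for (size_t i = 0; i < buf.size(); ++i)
    {
        x ^= x << 13;  x ^= x >> 7;  x ^= x << 17; // xorshift64: a few cycles per word
        buf[i] = x;
    }
    return buf;
}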
    
   Second, it must be based upon sequential data processing in which branches and memory accesses cannot be predicted.  This alone prevents most data-parallel or pre-fetching optimizations from working, which makes the GPU architecture perform far worse than a CPU.  GPUs also suffer from memory bus limitations, and their performance drops significantly if the working set cannot be kept in the much smaller local caches associated with each core of the GPU.
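   A sketch of the kind of access pattern that enforces this (illustrative only, not the actual algorithm): each value read determines the address of the next read, so nothing can be prefetched or run in parallel:

Code:
// Sketch only: the next index depends on the value just fetched, so every
// step stalls on a RAM access that no prefetcher or GPU can anticipate.
#include <cstddef>
#include <cstdint>
#include <vector>

uint64_t dependent_walk(const std::vector<uint64_t>& buf, uint64_t seed, size_t steps)
{
    uint64_t acc = seed;
    size_t   idx = seed % buf.size();
    for (size_t i = 0; i < steps; ++i)
    {
        acc ^= buf[idx];            // this load must complete...
        idx  = acc % buf.size();    // ...before the next address is known
    }
    return acc;
}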
  
   My algorithm so far should meet almost all of my criteria.  The only 'problem' with it is that it takes about 1-2 seconds per hash on a Core i7, which means it would take almost 7 days just to validate the existing blockchain.  This also means that a secondary 'proof-of-work' would be required to prevent spamming the network with bogus block headers and forcing everyone to perform a 2-second calculation.
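   For example, that secondary check could be as simple as requiring a few leading zero bits in the header's sha256 before anyone runs the expensive hash (a sketch of the idea, not a committed design):

Code:
// Sketch only: a cheap pre-filter so nodes never run the 1-2 second hash
// on spam.  `header_sha256` is the block header's sha256 digest, assumed
// to be computed elsewhere.
bool passes_cheap_pow(const unsigned char header_sha256[32], unsigned difficulty_bits)
{
    for (unsigned i = 0; i < difficulty_bits && i < 256; ++i)
    {
        unsigned byte = i / 8, bit = 7 - (i % 8);
        if ((header_sha256[byte] >> bit) & 1)
            return false;           // found a 1 where a leading 0 is required
    }
    return true;
}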

   One of the measures by which candidates will be judged is how quickly they can be validated while still preventing optimizations such as 'sparse memory buffers' or any significant reduction of the RAM requirement.

   Assuming the winning algorithm is ultimately included in a deployed BitShares blockchain, I will also award a 1000 BS bounty, payable within the first 6 months of the BS launch, in the same ratio as the 1 BTC bounty.  Obviously the BitShares portion of this bounty cannot and will not be paid if BitShares are never released.

Let the games begin!
 
