Author

Topic: Musings on Proof of Disk Space (Read 269 times)

sr. member
Activity: 467
Merit: 578
October 06, 2019, 03:13:17 PM
#5
In my mind, the only way to accurately check for space is to write as much random data to the drive as possible, store a secure hash of all the data, read it all back and confirm it still matches. File or space sizes can be spoofed. You've got to write, and then read the whole drive. Anything less, can, in theory be spoofed or faked.
This could  be further optimized but the overall idea of writing random data to the whole of the disk space and then hashing it and rechecking it would be a way of assuring that disk space has not been spoofed.  If we look at the way proof of work (works) then we will see the basic fundamentals at work and would just replace computation power with storage. The same principals can be used to determine proof of capacity.
hero member
Activity: 1778
Merit: 764
www.V.systems
October 02, 2019, 09:53:14 AM
#4
The problem I find with these types of functions, is that they all use some sort of short cut to determine the proof of space.

In my mind, the only way to accurately check for space is to write as much random data to the drive as possible, store a secure hash of all the data, read it all back and confirm it still matches. File or space sizes can be spoofed. You've got to write, and then read the whole drive. Anything less, can, in theory be spoofed or faked.

I have seen this kind of concepts before. What happens is people never consider the data redundancy fact or the effective read write rate. Which is basically the more important factor when you consider that there will be a read write event on the entire capacity of the storage that is being tested.
legendary
Activity: 990
Merit: 1108
October 02, 2019, 06:59:00 AM
#3
- Is there anything left to grind?
- Is anything to be gained by not storing PoW records?
- Is anything to be gained by storing PoW records differently?

The answers to these questions turn out to be yes;
one can successfully apply the "Hellman attack" from
https://ee.stanford.edu/~hellman/publications/36.pdf
by iterating the function that maps an index to (a PoW instance to a solution to) its hash.
legendary
Activity: 3416
Merit: 1912
The Concierge of Crypto
October 01, 2019, 11:58:40 AM
#2
The problem I find with these types of functions, is that they all use some sort of short cut to determine the proof of space.

In my mind, the only way to accurately check for space is to write as much random data to the drive as possible, store a secure hash of all the data, read it all back and confirm it still matches. File or space sizes can be spoofed. You've got to write, and then read the whole drive. Anything less, can, in theory be spoofed or faked.
legendary
Activity: 990
Merit: 1108
September 30, 2019, 09:00:05 AM
#1
I'd like to explore some simple-minded proof of disk space (PoD) design based on
the idea of separate PoD and tx chains, a characteristic shared by Burst and Chia.
I want to keep things as simple as possible. In particular,
- no proof of time
- no need to process more than a few disk blocks per second

The PoD chain will be as deterministic as possible, to prevent the possibility
of grinding attacks. It will not commit to any transaction data.
Each PoD block will just have the following fields
- height
- previous block hash
- timestamp
- farmer public key
- PoW solution.

A PoD block is valid if its timestamp is larger than the previous block's, and

    hash(prePoW) <= hash(PoW solution) < hash(prePoW) + 1 / difficulty

where prePoW consists of all fields apart from the PoW solution, and hashes are
interpreted as fractions in [0,1). The underlying PoW can be anything, either
hashcash or some memory hard asymmetric PoW with instant verification. We need
each PoW instance to commit to the farmer public key though, so that no
grinding is possible over the public key. Valid PoD blocks with timestamps in
the future are not considered until the local time catches up.

The separate tx chain will have blocks of tx data, each of which links to a
same-height PoD block and is signed with the public key in the latter.

The farming process
=============
I.e. how to extend the PoD chain.

Each farmer computes prePow hashes at increasingly larger timestamps and
looks for a close enough stored PoW solution hash on its hard disk.
On success, it regenerates the PoW instance and solves it to complete the new PoD block.

Storage of PoW hashes
===============

Let a PoW record be a pair (index, hash) where the 64-bit index allows us to
reconstruct the PoW instance, and hash is the most significant 64-bits of the
hash of the PoW solution.

A typical 4TB hard disk costs $100. Let's use up to $100 worth of leased mining
hardware and electricity to fill the disk with bucket sorted 12-byte PoW
records, where the most significant 32 bits of hash are used as bucket
selector.  Each of the 2^32 buckets will have on average 4*10^12/2^32/12 ~ 78
hash-sorted records, taking up under 1KB, or 2 512-byte disk blocks.

Work per PoW record
==============

If we optimistically assume that electricity costs $0.04 per KWh and that we
spend $10 on electricity, then that gives us (10 / 0.04) * 3600J = 0.9 MJ of energy.
Dividing that over the 10^12/3 PoW records gives 2.7 uJ (microJoule) per PoW
solution.  With memory access costs measured in picoJoules per bit, that could
involve on the order of a million bit accesses.

PoW solution rate
============

If we want to fill the disk in about one month, or 30 days, than the mining
hardware will use 250KWh / (30*24h) ~ 350W, and will need to produce 10^12/3 /
(30*24*3600) = 128600 solutions per second.

PoD as PoW amplification
================

Although this scheme aims to substitute disk space for work, it still involves a
large amount of work in filling the disk. This is mostly due to PoW records
being so compact. One would like to force them to take more space, but I do
not see any way to do so. Perhaps this is the real reason that Chia needs
to use proof of time.
This scheme can also be seen as a PoW amplifier. A 30 day run of a mining rig
is captured on disk so that it can be re-used for many years.

Questions
=========

- Is there some big error in my calculations?
- Or in the logic of this scheme?
- Is there anything left to grind?
- Is anything to be gained by not storing PoW records?
- Is anything to be gained by storing PoW records differently?
- Can we force larger PoW records?
- Is the assumed mining hardware within the realm of feasibility?
- Will the hard disk manufacturer have to prepare disks, presumably in a place with very cheap electricity?

-John
Jump to: