Evaluating difficulty will depend on the individual miner's experience after reading the dataset's metadata and any existing reported results, which are also recorded on the blockchain.
How to prevent Sybil attacks on the evaluation process? Or is reading and evaluating a dataset also subject to a processing fee?
How would existing reported results help in determining the difficulty of a challenge?
What metric is used to objectively define the difficulty of a challenge to begin with?
1) The evaluation process is similar to validating a Hashcash hash: verification costs essentially nothing, and as long as 51% of the miners are incentivized to be honest, the evaluation will be sound.
2) Existing reported results could be the results of previous epochs, e.g., for dataset 236 (hash signature fadc432ad...., random seed 342):

epoch   datetime stamp   winner's MSE   winning margin over runner-up
1       10:33            0.233          0.20
2       10:34            0.210          0.15
3       10:35            0.200          0.12
....
3) Metric = 10-fold cross-validated prediction/classification error on dataset 236 using random seed 342 (the seed is used so every miner generates the same K=10 equal-sized partitions of dataset 236; a verification sketch follows this list).
3a) Will someone manipulate the metric for his own benefit? Only if the manipulated results help his own submission, and that submission will in turn be validated by everyone else.
3b) At the beginning of the challenge, with zero reported results, miners can run a quick training pass to get results for the first few epochs and get a feel for the problem's difficulty.
3c) Experts in the relevant field, e.g., computer vision, will know how difficult the problem is after looking at the dataset.
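To make 1) and 3) concrete, here is a minimal sketch in Python using scikit-learn. The function name and the assumption that a submission ships one fitted model per fold are mine, not part of the proposal; the point is only that the on-chain seed pins down the same 10 partitions for every node, so checking a claimed MSE costs a prediction pass over the held-out folds rather than a full training run.

# Sketch only: how any node could recompute a submitted 10-fold CV score.
# Dataset id, seed 342, and fold count come from the challenge header on chain;
# "submitted_models" (one fitted model per fold) is an assumed submission format.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

def cross_validated_mse(X, y, submitted_models, seed=342, n_splits=10):
    """Recompute the challenge metric: mean MSE over the K held-out folds.

    Every node derives identical folds from the same seed, so a reported
    score can be checked directly against a resubmitted set of models."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_errors = []
    for (train_idx, test_idx), model in zip(folds.split(X), submitted_models):
        preds = model.predict(X[test_idx])      # one cheap prediction pass per fold
        fold_errors.append(mean_squared_error(y[test_idx], preds))
    return float(np.mean(fold_errors))

# A verifier would accept the submission only if the recomputed score matches
# the claimed one up to floating-point tolerance:
# assert abs(cross_validated_mse(X, y, models) - claimed_mse) < 1e-9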
Sybil attacks can be discouraged by requiring each dataset with a defined problem to come with an attached processing fee. X% of the miners could collude to pick one fake dataset and pretend to work on it, where X is simply the share of miners behind the most heavily backed (colluded) dataset, but in the end only one winner earns the reward. And this does not prevent the remaining miners from also working on it, possibly with better algorithms or more experience, and eventually winning the competition.
How to bootstrap such a cryptocurrency if mining a block requires a processing fee? I.e., where do the first miners get their coins for paying the processing fee if no coins have yet been mined?
How to determine if a dataset is fake?
Note that when referring to Sybil attacks I'm not yet talking about the mining process. I'm talking about the dataset contribution process that happens beforehand, where no miner is yet involved.
Or are you suggesting that the miners solving the challenges should also be the ones contributing the datasets, each dataset submission being attached to a fee?
1) Bootstrapping is similar to how Bitcoin started: no miners at first, so it was easy to control 51% of the hash power, and the coin's value was low. The pioneers will have to bootstrap the chain by giving out coins, etc., or follow in the footsteps of ETH. More research is needed on how those two started.
2) Yes, authentic dataset creation is a tough problem. Dataset contribution can be constrained as follows: only aggregate datasets are considered, e.g., 100 individual user profile pictures contributed by 100 distinct users. This increases the cost of a Sybil attack (a sketch of this constraint follows the list). The mining problem could then be to classify the user pictures by gender, race, etc.
3) A fake dataset contributed by a single person is hard to detect; that's why I suggested the above. Someone posting a dataset and solving it himself would not reap the benefits unless he mobilizes an army of miners, as described earlier.
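A rough sketch of the aggregation constraint in 2). The field names, the 100-contributor threshold, and the eligibility rule are all illustrative assumptions; the idea is only that a dataset becomes a candidate for the next block when it bundles fee-paying records from many distinct contributors, so faking one costs an attacker that many identities and fees.

# Sketch of the aggregation constraint: eligibility requires many distinct,
# fee-paying contributors. Signature verification is omitted for brevity.
from dataclasses import dataclass

@dataclass
class Contribution:
    contributor_pubkey: str   # identity that signed the record
    record_hash: str          # hash of the contributed item, e.g. one profile picture
    fee_paid: int             # processing fee, in the chain's smallest unit

def dataset_is_eligible(contributions, min_contributors=100, min_fee=1):
    """A Sybil attacker must fund `min_contributors` identities and
    `min_contributors * min_fee` in fees just to fake a single dataset."""
    payers = {c.contributor_pubkey for c in contributions if c.fee_paid >= min_fee}
    return len(payers) >= min_contributors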
This leads to another problem: if the dataset is artificially generated by the colluder with a known formula or DNN, then he already knows the answer (a perfect-fit model), so he can guarantee a win. But he must then control enough miners to make this dataset the chosen one for the next block, so in effect he has to gather on average 30% of the node power in order to fake the win.
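A toy illustration of that collusion risk (the formula, noise level, and sizes below are entirely made up): the colluder generates the data from a function only he knows, so he starts with a near-perfect model while honest miners must search for it from scratch.

# Sketch of the synthetic-dataset attack: the generator keeps the formula secret
# and therefore holds a near-zero-error model before the challenge even starts.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 5))
secret = lambda X: 3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2     # known only to the colluder
y = secret(X) + rng.normal(scale=0.01, size=len(X))        # tiny noise, looks like real data

# The colluder's "submission" is just the secret formula: near-zero MSE, no search.
colluder_mse = np.mean((y - secret(X)) ** 2)               # on the order of 1e-4

# Honest miners must discover the relationship themselves, so the colluder wins
# as soon as he can also get this dataset nominated for the next block.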
How did you reach the conclusion that 30% of computational power would be sufficient for faking a challenge win? 30% seems an awfully low threshold for maintaining security.
30% is just an illustrative number, assuming a free-form nomination process in which the highest-voted candidate gets 30% of the votes, the second highest 25%, the third 15%, and so on.
How to stop rogue clients from DDoSing the network by flooding it with wrong timestamps and turning 2-minute block intervals into 2 days, weeks, or years?
By attaching a cost/fee to each broadcasted result?
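A fee would help, and it could be paired with a cheap verifier-side filter. The sketch below is modeled on Bitcoin's own timestamp rules; the 2-hour drift window and the median over the last 11 timestamps are Bitcoin's constants, borrowed here purely for illustration. A node simply drops any broadcast whose timestamp runs too far ahead of its clock or is not later than the median of recently accepted ones, so flooding with bogus timestamps cannot drag the apparent block interval around.

# Illustrative timestamp sanity check, patterned after Bitcoin's rules.
import time
from statistics import median

MAX_FUTURE_DRIFT = 2 * 60 * 60   # seconds a timestamp may run ahead of local time
MEDIAN_WINDOW = 11               # how many recent timestamps the median covers

def timestamp_is_acceptable(ts, recent_timestamps, now=None):
    now = time.time() if now is None else now
    if ts > now + MAX_FUTURE_DRIFT:
        return False                                        # too far in the future
    if recent_timestamps and ts <= median(recent_timestamps[-MEDIAN_WINDOW:]):
        return False                                        # not ahead of recent history
    return True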