1. Separate transaction processing from block generation, so transactions can be confirmed by consensus quickly without being held back by verification of node uptime for block generation.
2. Divide fees proportionally among all nodes according to individual weights for each block, instead of having one node take the entire fee for the block.
3. Have nodes confirm each other's timestamps and transactions by iterative random sampling with consensus.
a. The first sample, 10 nodes for example, must reach 100% consensus. For a sample size of 10 with 10% bad nodes on the network, the odds of a fail (all 10 sampled nodes being bad) are 0.1^10 = 0.00000001%, or 1 in 10 billion.
b. If the first sample does not reach 100% consensus, a second, larger sample is taken, for example 100 nodes, requiring 90% consensus. For a sample size of 100 with 10% bad nodes on the network, the odds of a fail with 90 or more nodes being bad are effectively 0 (my calculator calculates the sum of the probabilities of 90 through 100 nodes being bad as 0). This may continue through further iterations with increasing numbers of nodes and lower consensus requirements as a failsafe against network attacks by floods of bad nodes.
(This assumes the number of nodes N is much larger than the sample size, so the binomial distribution can be used as an approximation in place of sampling without replacement: http://en.wikipedia.org/wiki/Binomial_distribution)
c. The above sample sizes can be adjusted so that the probability of step a failing is equal to the probability of step b failing.
4. Flag nodes with consistently disagreeing results as bad.
a. Ignore them in confirmations.
b. Alert the owner that the node is malfunctioning.
c. Maybe auto-restart the node when it is flagged as bad.
d. Too many flags and restarts blacklist the node permanently.
5. Since fees are split proportionally, blocks may be forged randomly, with difficulty used to select valid forks: the valid chain is the one having the highest difficulty.
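The odds quoted in steps 3a and 3b can be checked with a few lines of Python. This is only a sketch under the same assumptions as the example above (10% bad nodes, binomial approximation); the function name binom_tail and the constant P_BAD are made up for illustration.

```python
from math import comb

def binom_tail(n: int, k_min: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p): at least k_min bad nodes in a sample of n."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

P_BAD = 0.10  # assumed fraction of bad nodes on the network, as in the example

# Step 3a: all 10 of 10 sampled nodes are bad.
p_a = binom_tail(10, 10, P_BAD)

# Step 3b: 90 or more of 100 sampled nodes are bad.
p_b = binom_tail(100, 90, P_BAD)

print(f"step a fail probability: {p_a:.3e}")
print(f"step b fail probability: {p_b:.3e}")
```

Under these assumptions the step b figure comes out many orders of magnitude below the step a figure, which is consistent with a calculator displaying it as 0.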
Overall, this amounts to a quality and reliability engineering approach to crypto.
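To make point 2 concrete, here is a rough sketch of a proportional fee split. It assumes fees and weights are whole numbers (fees in the smallest currency unit), and it hands out the rounding remainder to the nodes with the largest fractional share. The function name split_fees and the remainder rule are my own assumptions, not a description of existing code.

```python
from typing import Dict

def split_fees(total_fee: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Divide total_fee among nodes in proportion to their weights.

    Shares are rounded down first; the leftover units go one each to the
    nodes with the largest remainders, so the shares sum exactly to total_fee.
    """
    total_w = sum(weights.values())
    if total_w <= 0:
        raise ValueError("total weight must be positive")
    shares = {node: total_fee * w // total_w for node, w in weights.items()}
    leftover = total_fee - sum(shares.values())
    # Largest remainder first; ties are broken by node id so every node
    # computes the same result.
    order = sorted(weights, key=lambda n: (-(total_fee * weights[n] % total_w), n))
    for node in order[:leftover]:
        shares[node] += 1
    return shares

print(split_fees(1000, {"a": 3, "b": 1}))
```

The deterministic tie-break matters because every node has to arrive at the same split independently.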
Ideally, the weight should reflect the quality of a node. Currently the metrics are its reliability as expressed in uptime and its activity in the amount it spends. More metrics may be added, such as the speed and accuracy of its results in processing transactions and reaching consensus, the accuracy of its time stamping, and the speed with which it responds to queries from its peers for consensus. The random sampling of nodes may be dynamically weighted to sample nodes of known higher quality more frequently than nodes of lower or unknown quality to improve the reliability of the network.
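A possible way to do the quality-weighted sampling is sketched below. It picks k distinct nodes, with higher-weight nodes more likely to be chosen. The technique is the Efraimidis-Spirakis key method (a standard approach to weighted sampling without replacement), which I am swapping in as an illustration; the function name weighted_sample and the example weights are hypothetical.

```python
import random
from typing import Dict, List

def weighted_sample(weights: Dict[str, float], k: int, rng: random.Random) -> List[str]:
    """Pick up to k distinct nodes, favoring nodes with higher weight.

    Each node gets a key u ** (1 / w) with u uniform in [0, 1); the k
    largest keys win. Nodes with non-positive weight are never sampled.
    """
    keyed = [(rng.random() ** (1.0 / w), node) for node, w in weights.items() if w > 0]
    keyed.sort(reverse=True)
    return [node for _, node in keyed[:k]]

# Hypothetical quality weights: "a" is a known high-quality node.
quality = {"a": 10.0, "b": 1.0, "c": 1.0, "d": 1.0}
print(weighted_sample(quality, 2, random.Random()))
```

Nodes of unknown quality could be given a small but nonzero weight so they are still sampled occasionally and can build up a record.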
The probabilistic model is a reliability problem. The required reliability is defined by the length of time the network must run with an acceptable probability of failure. If it does fail, it must do so gracefully, without catastrophic results and with full recovery possible. In crypto, the most graceful failure is a degradation of performance. Less graceful is a full halt with no loss or corruption of data or transactions. Least graceful is an error requiring a rollback of the blockchain for recovery. Any failure beyond this is unacceptable because it would be catastrophic, and even a rollback must be avoided.
To anticipate and avoid problems, a Failure Mode and Effects Analysis (FMEA - http://en.wikipedia.org/wiki/Failure_mode_and_effects_analysis) will be needed to identify all possible failure modes and what effects they would have on network reliability and its users. The parameters of the probabilistic iterations may be adjusted to achieve the required reliability. The scheme is what is known in reliability engineering as a standby redundant system: if a primary module fails, a secondary takes its place, and further iterations of consensus act as additional standby redundancy.
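As a rough illustration of adjusting the parameters to a required reliability, the sketch below does two things: it converts a per-round fail probability into a fail probability over a whole run, and it finds how many bad nodes a sample can be required to contain before the per-round target is exceeded. It assumes rounds are independent and uses the binomial approximation; all names and numbers (mission_failure, min_bad_count, 10% bad nodes, one million rounds) are assumptions for the example.

```python
from math import comb, expm1, log1p
from typing import Optional

def binom_tail(n: int, k_min: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

def mission_failure(q: float, rounds: int) -> float:
    """P(at least one failed round) over `rounds` independent rounds,
    each failing with probability q. Equals 1 - (1 - q) ** rounds."""
    return -expm1(rounds * log1p(-q))

def min_bad_count(n: int, p_bad: float, q_target: float) -> Optional[int]:
    """Smallest k such that P(at least k bad nodes in a sample of n) <= q_target.

    Returns None if no threshold up to n meets the target."""
    for k in range(n + 1):
        if binom_tail(n, k, p_bad) <= q_target:
            return k
    return None

# Fail probability over one million rounds at 1e-10 per round.
print(mission_failure(1e-10, 10**6))

# Number of bad nodes in a sample of 100 needed to keep the per-round
# fail probability at or below 1e-10, with 10% bad nodes assumed.
print(min_bad_count(100, 0.10, 1e-10))
```

The same search could be run over sample size instead of threshold to balance the fail probabilities of successive iterations, as in step 3c.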
Thanks for the feedback,
Brian