Author

Topic: SelectCoins algorithm (Read 1636 times)

legendary
Activity: 1428
Merit: 1093
Core Armory Developer
November 06, 2011, 10:28:08 AM
#12
So, after spending a couple days battling my own SelectCoins algorithm, I came up with a decent idea for how to do this without thinking too hard, but still maintain the ability to expand it, later.  The magic is in defining the following method:

      uint32_t ScoreSelectCoinsOutput(vector selectedCoins, uint64_t targetVal, uint64_t minFee)

The input to this function is a coin-selection, a target output value and a fee.  By writing a method that assigns a score to a selection, you you can then randomly sort your input list 100 times, pick the top X inputs that satisfy your target+fee, and then keep the one with the best score.  In my code, I am slowly adding smarter solutions and phasing out the randomly selected ones, but if you pick a high enough sample size, you're bound to get at least one good solution in pot even with nothing but random selections in the search:

Code:
vector finalSelection;
uint32_t bestScore = UINT32_MAX;
for(int i=0; i<100; i++)
{
   random_shuffle(unspentTxOutList.begin(), unspentTxOutList.end())
   uint64_t sum = 0;
   uint32_t index=0
   vector selection;
   while(sum < target+fee)
   {
      selection.push_back(selection[index]);
      sum += selection[index].getValue();
      index++
   }
   uint32_t score = ScoreSelectCoinsOutput(selection, target, fee);
   if( score < bestScore)
   {
      finalSelection = selection;
      bestScore = score;
   }
}
// Now you have a pretty good selection!

Then, the entire SelectCoins algorithm is based on how you "score" a selectCoins solution.  In my code, I evaluate a bunch of different "selection factors", such as how many input addresses it combines, if it includes zero-confirm txs, how many bytes, how obvious it is which output is change/recipient, etc.  I make sure that each of these factors spits out a number between 0 and 1 (with 1 being favorable), and then multiply each score by a weight:

Code:
WEIGHT_NOZEROCONF   = 1000
WEIGHT_ALLOWFREE  = 100
WEIGHT_NUMADDR    =  100
WEIGHT_TXSIZE     = 50
WEIGHT_PRIORITY   =  60
WEIGHT_OUTANONYM  =  30

The weights are configurable (and completely arbitrary--only significant relative to each other), and gives you the option to modify them for your use-case:  you can make "WEIGHT_NUMADDR=10000" and "WEIGHT_OUTANONYM=10000" to try to max-out anonymity at any expense.  
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
October 24, 2011, 10:20:02 PM
#11
Here's a naive implementation that I bet would work well in practice:
+ Sort potential inputs by priority (priority is #confirmations * amount)
+ Use the N highest-priority coins whose sum >= amount needed

I know you just threw this idea out there off the top of your head so this isn't a real criticism, I'm just expanding on your initial idea.  I think this would severely neglect your outputs with tiny values--after some point in time, you're going to be left with thousands of tiny inputs... of which you're going to have to use many dozen to execute subsequent transactions.   For example:  if you have an output in your wallet for 0.0001 BTC, you're going to have to wait 1,440,000 blocks for it to accrue the same priority as a 1 BTC output that is 1 day old.

I started investigating this with the idea that maybe I can even it out a bit by simply modifying the BTC amount before multiplying by numConfirm.   After playing with this in a spreadsheet, I see this alone won't solve the entire problem, but perhaps a combination could.  Here's three options:


("days til maturity" is the time until the BTC amount achieves the same prioritization as 1.0 BTC after 1 day)

The techniques are sorted from steepest to shallowest (left to right).  I was just making up equations that I know will guarantee maturity of all outputs before the sun runs out of energy.   Personally, I think the cube-root (technique 1) has a lot of promise, making sure that your tiny TxOuts will eventually get spent.  I'm going to consider trying Gavin's idea, but applying technique 1 before sorting the values, and occasionally throw in the top value from technique 2 or 3, to make sure I'm not letting too many tiny inputs accumulate.

administrator
Activity: 5222
Merit: 13032
October 19, 2011, 01:10:05 AM
#10
One more I was just thinking about: when selecting coins to create a new transaction, does it make sense to use transactions which are not yet in a block?

Only if you're sure that the transaction you're spending will eventually be included in a block. Bitcoin spends 0-confirmation transactions that it has created, but never 0-conf transactions from other people.

These transactions will have very low (maybe 0) priority, but this will increase as the inputs age.

Quote from: bitcoinandroid
Similarly, does the official client clear out orphan transactions after some period if they haven't made it into a block?

No, but that would be nice.
newbie
Activity: 37
Merit: 0
October 19, 2011, 12:31:13 AM
#9
Good point - the test of "would it fork the blockchain" makes sense.  Not sure if the wiki is the right place.

This forum has been super helpful as a newer developer trying to understand the pieces better.

One more I was just thinking about: when selecting coins to create a new transaction, does it make sense to use transactions which are not yet in a block?  Gavin's naive implementation above deemphasizes these, but in a worst case I'm guessing it is still ok to include a tx not in a block because the protocol rules will prevent the double spend?

Similarly, does the official client clear out orphan transactions after some period if they haven't made it into a block?

I very much appreciate the help.  It's taking me a long time to wrap my head around it but I feel like I'm slowly getting there.
staff
Activity: 4284
Merit: 8808
October 19, 2011, 12:02:00 AM
#8
This is a big request, but I'll throw it out there anyway.

Having some sort of writeup like this: https://en.bitcoin.it/wiki/Protocol_rules
but for the process of creating a new transaction could be useful.

It seems like there are a few pitfalls to be aware of with fees, selecting coins, etc.

But it's not a protocol rule (well, maximum txn size is I guess but thats already there)... this is more an issue of common norms on the network today, enforced by peers sure— but its not part of the distributed algorithm, nodes can and do disagree on it and it doesn't fork bitcoin.

Perhaps best practices should be described there, but just like the protocol rules don't tell you not to perform undefined operations in C I don't know that these other recommendations belong there.



newbie
Activity: 37
Merit: 0
October 18, 2011, 11:39:52 PM
#7
This is a big request, but I'll throw it out there anyway.

Having some sort of writeup like this: https://en.bitcoin.it/wiki/Protocol_rules
but for the process of creating a new transaction could be useful.

It seems like there are a few pitfalls to be aware of with fees, selecting coins, etc.
staff
Activity: 4284
Merit: 8808
October 18, 2011, 11:26:57 PM
#6
I am trying to understand the SelectCoins function (at least at a high level):
It computes an approximate solution to an one-dimensional knapsack problem. In textbooks it is typically posed as finding a maximum lower than the size of the knapsack (to minimize the empty space). Here it is inverted: the goal is to find minimum higher than the size of the knapsack (to minimize the change required).

Whatever you do, don't reinvent the wheel, there's a rich literature on this problem.

Well, ideally (but not in the official bitcoin today) it's complicated by an additional integer-linear constraint:  You want the priority to be greater than the fee minimum unless there isn't a feasible solution with that constraint.

You also want to avoid creating change less than 0.01 btc, all things equal, unless you have a real output smaller than 0.01 btc or if meeting this constraint causes you to fail the prior one. (in which case, abandoning this one is better)


newbie
Activity: 37
Merit: 0
October 18, 2011, 10:54:17 PM
#5
Great info, thanks guys!

Yep I remember the knapsack problem from back in the day.  Looks like there are some implementations here:
http://rosettacode.org/wiki/Knapsack_problem/0-1

Gavin, your naive implementation seems like a great start.  I was a little confused why you used num confirmations * amount as the priority, but after reading this (technical info at the bottom) it makes more sense: https://en.bitcoin.it/wiki/Transaction_fees

And yes, at this point I'm optimizing for ease of implementation.

I'm slowly understanding more of it, thanks guys!
legendary
Activity: 2128
Merit: 1073
October 18, 2011, 01:08:15 PM
#4
I am trying to understand the SelectCoins function (at least at a high level):
It computes an approximate solution to an one-dimensional knapsack problem. In textbooks it is typically posed as finding a maximum lower than the size of the knapsack (to minimize the empty space). Here it is inverted: the goal is to find minimum higher than the size of the knapsack (to minimize the change required).

Whatever you do, don't reinvent the wheel, there's a rich literature on this problem.
legendary
Activity: 1652
Merit: 2301
Chief Scientist
October 18, 2011, 08:29:29 AM
#3
What are you optimizing for?  Ease of implementation? Wallet size?

Here's a naive implementation that I bet would work well in practice:

+ Sort potential inputs by priority (priority is #confirmations * amount)

+ Use the N highest-priority coins whose sum >= amount needed

If you want to optimize for fragmentation and/or paying of fees, then also do:

+ If the change transaction is larger than some threshold amount, then split it in half (or maybe split it into Y change outputs, each of which is about the size of the threshold amount).

+ If the change transaction is small and there are other small-valued/low-priority inputs available, add a few small-value inputs to merge them together.

You could also optimize for privacy (try to avoid using more than one input and/or always create multiple change outputs), or tweak the above rules to try to always avoid fees...
staff
Activity: 4284
Merit: 8808
October 18, 2011, 01:55:04 AM
#2
1. find all tx outputs that meet the following conditions
 - sent to me (have a hash160 belonging to a key in my wallet)
 - have not been spent yet
 - are in a confirmed block
2. sort the outputs smallest to largest amount
3. for each output
  - create an input that references it and add to current_amount
  - break if current_amount > amount i want to send
4. create one output (change) sending the difference back to me at a new address

naive indeed!   This will tend to generate _much_ larger transactions that required and will pretty much not work on many wallets.

Consider what happens when someone rains a bunch of 1e-8 inputs on you and then you try to send 1BTC ... your resulting TXN will be too big and will be either need enormous fees and may be ignored by your peers and the miners even with fees.

The subset sum solver isn't just an optimization. It's an important optimization.

(The actual coin selection in bitcoin could be a lot better too... right now its fee blind and will tend to produce solutions that pay fees which could be avoided… the only real defense against this is that pre-filtering that makes it first try inputs with many confirms, unfortunately good solvers for big integer programming problems aren't small pieces of code)


newbie
Activity: 37
Merit: 0
October 18, 2011, 01:23:08 AM
#1
Hi,

I am trying to understand the SelectCoins function (at least at a high level):
https://github.com/bitcoin/bitcoin/blob/master/src/wallet.cpp#L746

It looks like it has a bunch of optimizations, but I'm wondering if something simple like this would work as a naive implementation:

1. find all tx outputs that meet the following conditions
 - sent to me (have a hash160 belonging to a key in my wallet)
 - have not been spent yet
 - are in a confirmed block
2. sort the outputs smallest to largest amount
3. for each output
  - create an input that references it and add to current_amount
  - break if current_amount > amount i want to send
4. create one output (change) sending the difference back to me at a new address

Does something like this make sense or am I missing any pieces?  Also, does it make sense to use the smallest coins first to prevent fragmentation or is there a better way?

Thanks!  Haven't seen a good writeup on this elsewhere, but if I missed it feel free to link me to it.
Jump to: