Topic: How do miners solve the knapsack problem? (Read 269 times)

legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
July 27, 2024, 04:55:21 AM
#18
F2Pool and Ocean pools are going to be outliers here. Simply put, F2Pool censors transactions for a political agenda and Ocean censors ordinals/BRC-20 tokens. mempool has a similar and better-looking indicator called health; here is an example
I thought MARA was the only one, disappointing to see others following in their footsteps as well.

And it's been analyzed by 0xB10C last year on https://b10c.me/observations/08-missing-sanctioned-transactions/.

It's a very interesting question. Aside from fee rate, miners also need to ensure that the total sigops cost of the block isn't above 80,000[1]. Otherwise, they may accidentally create an invalid block, which happened to F2Pool last year[2].

[1] https://github.com/bitcoin/bitcoin/blob/v27.1/src/consensus/consensus.h#L16-L17
[2] https://bitcoin.stackexchange.com/q/117837

F2Pool probably doesn't use the default getblocktemplate from Core, because BlockAssembler does indeed track the number of sigops for the transactions it includes in the block; more details can be found in validation.cpp.

This is the first time I've heard about this incident, but I am not surprised, because F2Pool censors transactions based on the OFAC list, which means they did alter the code that creates their blocks. So somewhere along the line their devs failed and ended up screwing things up massively. Good for them -- they deserve it.

While looking for a reference link while replying to another post, I also found it actually happened twice: https://b10c.me/observations/11-invalid-blocks-783426-and-784121/. It's crazy they didn't update their software immediately after the first incident.
staff
Activity: 4284
Merit: 8808
One of the first commits leading to the new 'cluster based' mempool algorithm was merged today: https://github.com/bitcoin/bitcoin/pull/30126
member
Activity: 97
Merit: 43
To optimize their profits, do miners use software like Gurobi or other LP-solvers to solve the knapsack problem and select the most profitable transactions?
I'm only talking about honest miners who only want to maximize their profits.
Miners always want to maximize their income from mining and to make a profit, not break even or take a loss. If they cannot make a profit, they will capitulate.

https://www.bitcoinmagazinepro.com/charts/hash-ribbons/

Solo miners may not be very technical or professional, but mining pools are professional and have tools to automate their work. Nowadays there is little room for solo miners, and most Bitcoin miners participate in mining pools, which use automated tools to filter and pick transactions to maximize their income from transaction fees.

Bitcoin mining pools run their own Bitcoin full nodes with customized settings, such as a minimum transaction fee rate.
https://learnmeabitcoin.com/technical/mining/memory-pool/#settings
staff
Activity: 4284
Merit: 8808
I understand where you're coming from. But I'm wondering if there are any benefits to running a solver after having a solution given by getblocktemplate; we can solve for a better solution concurrently while it is running. Hence, the tradeoff is minimal: if you get a better solution from your LP, use it; else, just use whatever is provided by getblocktemplate.

Part of my interest is also in narrowing the search space. I understand the entire mempool is far too big,

It's not too big.  Even a mempool with 100 blocks of backlog will only have on the order of 200k transactions.  A standard open source ILP solver will find an optimal solution quite quickly.

Not quick enough for relay/eviction decisions or for the very first template of a new block-- but quickly.

All the information needed is in the getrawmempool true output -- well, except for the sigops counts, because who the hell knows, but it's a one-line patch to add it. (getblocktemplate does give you the sigops, but doesn't give you the whole mempool.)
 
The LP is simple: there is a set of binary inclusion variables; you maximize the sum of the tx fees times their inclusion variables, subject to the weights times the inclusion variables staying under the weight limit, the sigops times the inclusion variables staying under the sigops limit, and the constraint that no transaction's inclusion variable exceeds that of any of its ancestors (a child can only be included if its ancestors are).
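
To make that concrete, here is a minimal sketch of such an ILP in Python using PuLP (which drives the open-source CBC solver by default). It is illustrative only, not something any pool is known to run: the weight, fees and depends fields come straight from getrawmempool true, while the sigops field is hypothetical and assumes the one-line patch mentioned above.

Code:
import json
import subprocess

from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum, value

MAX_WEIGHT = 4_000_000    # consensus block weight limit
MAX_SIGOPS = 80_000       # consensus sigops-cost limit
COINBASE_RESERVE = 4_000  # rough weight set aside for the coinbase (assumption)

# Snapshot the mempool; "weight", "fees" and "depends" are standard fields,
# "sigops" is NOT exposed by default, so we pretend a patched node added it.
mempool = json.loads(subprocess.check_output(["bitcoin-cli", "getrawmempool", "true"]))

prob = LpProblem("block_packing", LpMaximize)
inc = {txid: LpVariable("inc_" + txid, cat=LpBinary) for txid in mempool}

# Objective: maximize total fees (converted from BTC to satoshis) of included txs.
prob += lpSum(round(e["fees"]["base"] * 1e8) * inc[t] for t, e in mempool.items())

# Stay under the weight and sigops limits.
prob += lpSum(e["weight"] * inc[t] for t, e in mempool.items()) <= MAX_WEIGHT - COINBASE_RESERVE
prob += lpSum(e.get("sigops", 0) * inc[t] for t, e in mempool.items()) <= MAX_SIGOPS

# A transaction may only be included if every unconfirmed parent is included.
for txid, e in mempool.items():
    for parent in e["depends"]:
        prob += inc[txid] <= inc[parent]

prob.solve()
chosen = [t for t in mempool if value(inc[t]) > 0.5]
print(len(chosen), "transactions selected")

Enforcing the inclusion constraint only against direct parents is enough, since it chains transitively to all ancestors.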

But the result is that, except during big sustained backlogs, it hardly increases income, and even then it only produces one-shot income for being smart enough to take some transactions that were failing to make it in.

As an aside, when I looked, almost all the income gains came from "grandchildren pay for grandparents" -- there was some set of low-feerate transactions where the sum total of their children made them quite attractive even though no particular child did so. Actually, packing the block more tightly hardly mattered, because there were a lot of options all at about the same feerate just past the limit, and totally failing to fit a transaction only costs you a tiny amount (some random missed transaction is 800 weight out of a block of 4 million -- who cares?).

Quote
I don't know if any pool runs an LP solver before or after getblocktemplate
When I checked, none were; you can tell because they weren't taking the low-hanging fruit the solver was finding.

Quote
I'd assume the safe option is to do it before and not after, since getblocktemplate will do the verifications needed to ensure your block gets accepted by the network; your LP may add a transaction that would render your block invalid.

You can ask GBT to validate a candidate block, ignoring the invalid POW.
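
For example, a rough sketch of such a check over JSON-RPC, using getblocktemplate's "proposal" mode (the credentials and the serialized block hex are placeholders):

Code:
import requests

RPC_URL = "http://127.0.0.1:8332/"     # placeholder node endpoint
RPC_AUTH = ("rpcuser", "rpcpassword")  # placeholder credentials

def check_block_proposal(block_hex):
    """Ask the node whether a serialized candidate block would be accepted,
    ignoring the proof of work. Returns None if it looks valid, otherwise a
    short rejection reason string."""
    payload = {
        "jsonrpc": "1.0",
        "id": "proposal",
        "method": "getblocktemplate",
        "params": [{"mode": "proposal", "data": block_hex, "rules": ["segwit"]}],
    }
    return requests.post(RPC_URL, json=payload, auth=RPC_AUTH, timeout=30).json()["result"]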

Miners do dick around with their txn selection and have managed to make invalid blocks many times because of it... but they're not doing anything so clever as running an LP solver, or at least they weren't when I checked a couple of years ago.
legendary
Activity: 2394
Merit: 6581
be constructive or S.T.F.U
What I wanted to highlight was actually the size, because it is consistently 99.94/99.95% for F2Pool and less for others. The way they're selecting transactions is probably better in terms of quantity, or they have another pushtx utility. The txes being pushed are quite sporadic and I don't see any within the last day or two, so I'm thinking it might be the former.

Using Mempool health indicator I collected the following:

Kano.is = 99.92%
Cksolo = 99.15%
Viabtc = 98.91%
Binance pool = 98.33%
F2pool = 98.66%
Antpool = 98.53%
Foundry = 98.96%
Oceanpool = 87.66%
Poolin = 98.23%
Braiin = 98.71%
Spiderpool = 96.35%

Not much of a conclusion can be drawn from these numbers, but we know Kano and CK pools are never transaction-biased and just use GBT; the rest we just can't tell. None of the pool operators talk to us here; in fact, you won't even know the things you're supposed to know, like how they count FPPS/PPS+. But yeah, some of the results may be affected by the fact that they are using something other than the default Core GBT, which doesn't seem to be working great judging by the figures. Cheesy
legendary
Activity: 3038
Merit: 4418
Crypto Swap Exchange
F2Pool and Ocean pools are going to be outliers here. Simply put, F2Pool censors transactions for a political agenda and Ocean censors ordinals/BRC-20 tokens. mempool has a similar and better-looking indicator called health; here is an example
I thought MARA was the only one, disappointing to see others following in their footsteps as well.
Other pools that have acceleration services, like ViaBTC, will also have some bad selections for obvious reasons (I talked about how ViaBTC scams miners in the past). You also have propagation delay issues, so the perfect set of transactions isn't always the same on all nodes; your node may see 10 good-paying transactions that mine doesn't see at the time of solving a block, but you would think I missed adding them because my LP solver is bad.
Yep, that is among other things like directly pushing TXes to mining pools (some of them include non-standard transactions). Another good consideration is the delay between the block being mined and the refreshing of the merkle root; the merkle root isn't refreshed in real time, and I presume there are instances where new transactions are seen after the block is mined.

I don't know if any pool runs an LP solver before or after getblocktemplate. I'd assume the safe option is to do it before and not after, since getblocktemplate will do the verifications needed to ensure your block gets accepted by the network; your LP may add a transaction that would render your block invalid. I doubt any pool is willing to risk a whole block just to add a few sats to a 6-7 BTC block, but as you mentioned earlier, when fees make up a majority of the total reward it would be worth the effort and the risk.
Possibly not, actually. But given how many invalid transactions pools have mined previously, I wouldn't be surprised if they're evaluating and building block templates themselves. Setting rigorous limits and test cases should mitigate this.

What I wanted to highlight was actually the size, because it is consistently 99.94/99.95% for F2Pool and less for others. The way they're selecting transactions is probably better in terms of quantity, or they have another pushtx utility. The txes being pushed are quite sporadic and I don't see any within the last day or two, so I'm thinking it might be the former.

legendary
Activity: 2394
Merit: 6581
be constructive or S.T.F.U
Interestingly, while searching to see the discrepancy between mining pools and GBT, I found this site by 0xb10c: https://miningpool.observer/template-and-block. It definitely doesn't tell the full story, due to a myriad of factors affecting the selection of transactions, but some of the pools are definitely selecting transactions themselves: there is a high discrepancy between GBT and the actual mining pool. F2Pool has an average of 99.95%, compared to the rest, which average around 99.85% with regard to total capacity.

F2Pool and Ocean pools are going to be outliers here. Simply put, F2Pool censors transactions for a political agenda and Ocean censors ordinals/BRC-20 tokens. mempool has a similar and better-looking indicator called health; here is an example

Other pools that have acceleration services, like ViaBTC, will also have some bad selections for obvious reasons (I talked about how ViaBTC scams miners in the past). You also have propagation delay issues, so the perfect set of transactions isn't always the same on all nodes; your node may see 10 good-paying transactions that mine doesn't see at the time of solving a block, but you would think I missed adding them because my LP solver is bad.

I don't know if any pool runs an LP solver before or after getblocktemplate. I'd assume the safe option is to do it before and not after, since getblocktemplate will do the verifications needed to ensure your block gets accepted by the network; your LP may add a transaction that would render your block invalid. I doubt any pool is willing to risk a whole block just to add a few sats to a 6-7 BTC block, but as you mentioned earlier, when fees make up a majority of the total reward it would be worth the effort and the risk.
legendary
Activity: 3038
Merit: 4418
Crypto Swap Exchange
Other miners who don't have that set ready would likely reach a new "best" combination before you do, because you've added extra steps to the process. Besides, new transactions keep coming in every second, so you are always going to be rotating transactions, so preparing anything beforehand is unlikely to work. And again, given that most Bitcoin transactions have the same size, a cutting-edge method to choose transactions isn't really needed -- maybe at some point in the future, as you mentioned.
I understand where you're coming from. But I'm wondering if there are any benefits to running a solver after having a solution given by getblocktemplate; we can solve for a better solution concurrently while it is running. Hence, the tradeoff is minimal: if you get a better solution from your LP, use it; else, just use whatever is provided by getblocktemplate.

Part of my interest is also in narrowing the search space. I understand the entire mempool is far too big, but the tip of the mempool probably has a slower rate of growth, which means we don't have to run a solver too often, since considering lower-feerate transactions would most likely not yield a better maximum. Also, implementing it is far from necessary, since the difference is likely to be negligible.

Interestingly, while searching to see the discrepancy between mining pools and GBT, I found this site by 0xb10c: https://miningpool.observer/template-and-block. It definitely doesn't tell the full story, due to a myriad of factors affecting the selection of transactions, but some of the pools are definitely selecting transactions themselves: there is a high discrepancy between GBT and the actual mining pool. F2Pool has an average of 99.95%, compared to the rest, which average around 99.85% with regard to total capacity.


Most of these are just hypotheses, and I think experimenting and simulating might or might not tell a different story.

legendary
Activity: 2394
Merit: 6581
be constructive or S.T.F.U
The way that miners work now is that they will build an empty block upon receiving a new block, before consolidating the transactions together.

Not all pools do that. Some of them publicly disclose that they don't and prove it with on-chain data; others never talk about it but still have never produced an empty block despite solving a huge portion of the blocks. Furthermore, until you download the previous block and remove the included transactions from your mempool, you have no way of knowing which transactions are to be removed, so it's not like you can prepare a set of transactions before you receive a block and then just use them; in fact, that might even be more time- and resource-consuming.

Imagine an available set of transactions as follows:

A,B,C,D,E,F,G,H

Now, before you find a block, you run whatever algorithm picked A, B, C for you, so you prepare that set of transactions and wait until you hit a block. If it's you who finds the block -- great, you just add that set; if it's someone else, you need to redo the set. Now imagine you get a new block notification and start mining a block, and the block you received has B in it, so you would have to do two operations:

1- remove B from the set
2- find a replacement for B that matches A and C best

Other miners who don't have that set ready would likely reach a new "best" combination before you do, because you've added extra steps to the process. Besides, new transactions keep coming in every second, so you are always going to be rotating transactions, so preparing anything beforehand is unlikely to work. And again, given that most Bitcoin transactions have the same size, a cutting-edge method to choose transactions isn't really needed -- maybe at some point in the future, as you mentioned.
staff
Activity: 4284
Merit: 8808
Let's ignore sigops for a moment, as they're usually irrelevant anyway.

Let's also ignore dependencies for a moment (which are always significant), so let's just assume all transactions are spending distinct confirmed outputs.

There is a simple greedy solution: order transactions by feerate and take the highest feerate first. If the sizes of the transactions line up with the maximum block size such that the last transaction you add exactly fills the block, then this trivial algorithm gives the optimal solution.

Now, of course, you will seldom fill the block exactly -- so how badly can this algorithm do? Well, in the worst case, where you just stop as soon as one transaction won't fit, this algorithm wastes the worst included feerate times the amount of weight left behind. The maximum amount of weight left behind is the maximum size of a transaction. In practice a greedy solution will be somewhat better because, instead of terminating, you'll continue to insert the next best transaction that still fits. This won't necessarily be optimal, because you might have been better off going back and skipping a transaction in order to better fit multiple lower-feerate txns, but it will still never be worse than just terminating a max-size txn from the end.

So given that feerates are kinda low at the end of blocks and the maximum transaction size is small relative to the size of a block, the worst case approximation error of this algorithm is not so bad.
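
As a rough sketch of that greedy variant in Python (purely illustrative: the field names are made up, transactions are assumed independent, and sigops are ignored, as above):

Code:
def greedy_fill(txs, max_weight=4_000_000):
    """Sort candidates by feerate and keep inserting the best one that still fits.
    Each tx is a dict like {"txid": ..., "fee": satoshis, "weight": weight units}."""
    selected, used_weight, total_fee = [], 0, 0
    for tx in sorted(txs, key=lambda t: t["fee"] / t["weight"], reverse=True):
        if used_weight + tx["weight"] <= max_weight:
            selected.append(tx["txid"])
            used_weight += tx["weight"]
            total_fee += tx["fee"]
    return selected, used_weight, total_fee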

Reality isn't quite so kind, however, because transaction dependencies that we ignored above actually do exist.

So now let's consider those:

It turns out that if you have a set of dependent transactions (a "partial order knapsack"), finding the optimal order to include them appears to take potentially exponential time, though many cases can be solved quickly. (By potentially exponential I mean that the Bitcoin Core devs have an algorithm that produces an optimal ordering, and that algorithm has an exp-time worst case... but AFAIK they don't have a proof that there is no polytime solution to the specific problem of finding the best order.)

If you take the set of transactions, divide them up into independent clusters, and then find an optimal inclusion order for each cluster, you can get an optimal order for the whole block by round-robin taking the best-feerate part from each independent cluster (optimal in the same sense the dumb greedy algorithm is optimal -- it is optimal if the length just happens to line up exactly, and has a worst case bounded by the left-out cluster). ... but in this case the worst-case approximation error is the bottom feerate times the maximum weight of a cluster, which is obviously less tight than the independent case, because while transactions are limited to a portion of the block, a cluster could be bigger than a whole block. Usually it's not, however, so in practice this approach gives pretty good solutions.
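
To make the clustering step concrete, here is a small sketch that only partitions a getrawmempool true snapshot into independent clusters using union-find over the unconfirmed-parent ("depends") edges; finding an optimal inclusion order inside each cluster -- the hard part discussed above -- is not shown:

Code:
from collections import defaultdict

def mempool_clusters(mempool):
    """mempool: {txid: entry} as returned by getrawmempool true; entry["depends"]
    lists unconfirmed parents. Returns lists of txids, one per connected cluster."""
    parent = {txid: txid for txid in mempool}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for txid, entry in mempool.items():
        for dep in entry["depends"]:
            if dep in mempool:
                parent[find(txid)] = find(dep)  # union the two clusters

    groups = defaultdict(list)
    for txid in mempool:
        groups[find(txid)].append(txid)
    return list(groups.values())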

The current behavior in Bitcoin Core mostly works like the initial greedy algorithm mentioned above, but there is some special handling (called 'ancestor fee rate') intended to get better results when a child transaction effectively pays for its parents. There is a bunch of recent ongoing development to move to the second approach I described, which is called cluster mempool.

Going back to the OP's question: you absolutely can use an integer linear programming solver to pack blocks, including all the details like the sigops limit. A couple of years back I made a little script that read the mempool from RPC and fed it to the COIN-OR CBC tool, and in my testing it would yield a proven optimal solution in half a second or so. That sounds kinda fast, but really the node's solving needs to be quite a bit faster, though perhaps most of my time was spent just doing IO.

In terms of the gain, my testing showed that sometimes it produced a fair bit of additional fees, like a quarter of a bitcoin. But extrapolating the income difference to multiple blocks is hard and the results can be misleading: e.g. my tool would show, block after block, that 0.2 extra BTC could be earned. But that wasn't true, because the gain came from some transactions that were left out over and over again; as soon as a single miner picked up that excess it would be gone. The actual amount of additional fees that could be obtained from a smarter solver is, in fact, quite small (zero when there isn't a big backlog)... which probably explains why no one was doing it at the time.

So why is Bitcoin Core bothering with the cluster mempool stuff? Because the same mempool logic needs to be used for transaction relay and eviction, so that the network isn't wasting bandwidth relaying stuff that won't get mined but does relay everything that will get mined. And for that it's extremely important that the solutions be incremental and fast; running a half-second ILP solver for every single txn that gets relayed won't fly.

Smiley
legendary
Activity: 3038
Merit: 4418
Crypto Swap Exchange
The knapsack problem is basically about maximizing the value of the items you include. Miners can maximize profits straightforwardly by choosing the transactions with the highest fees, subject to size.
That would be the greedy approach, which rarely gives you the global maximum. The issue with knapsack lies predominantly in the size of each item that is supposed to be packed into a bag of a fixed capacity; hence, deciding which items to pack becomes the key decision. In operations research we usually formulate it as a DP, which would probably give better results than greedy (a small sketch follows at the end of this post).
Using complex and sophisticated LP solvers will actually just slow a miner down, which is not ideal.
Transaction selection can be done concurrently, and this is independent of the miners because block templates are not built by them. The way that miners work now is that they will build an empty block upon receiving a new block, before consolidating the transactions together. An easy way to think of profit maximization would be to attempt to increase the value by selecting another set of transactions on top of the greedy method. As fees become the main revenue for miners, I expect marginal gains to be the priority.

LP solvers are actually sufficiently fast. However, if they are slow because you have too many items to consider, you can narrow your search space. Given how much the size varies, the majority of your set of items lies within the first X MB of the tip, which probably comes to fewer than 10,000 transactions.
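
As referenced above, a textbook 0/1 knapsack DP would look roughly like the sketch below. It is purely illustrative: it ignores transaction dependencies, and with a real 4,000,000-weight capacity and a large mempool the table becomes impractically large, so weights would have to be bucketed.

Code:
def knapsack_dp(txs, capacity):
    """Classic 0/1 knapsack over (transaction, remaining capacity).
    txs: list of (weight, fee) pairs with integer weights.
    Returns the maximum total fee achievable within the capacity."""
    best = [0] * (capacity + 1)
    for weight, fee in txs:
        # Walk capacity downwards so each transaction is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + fee)
    return best[capacity]

# Toy example with a tiny made-up capacity: prints 2300.
print(knapsack_dp([(300, 900), (500, 1200), (400, 1100)], 1000))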
full member
Activity: 2520
Merit: 214
Eloncoin.org - Mars, here we come!
To optimize their profits, do miners use software like Gurobi or other LP-solvers to solve the knapsack problem and select the most profitable transactions?
The knapsack problem is basically about maximizing the value of the items you include. Miners can maximize profits straightforwardly by choosing the transactions with the highest fees, subject to size.

Using complex and sophisticated LP solvers will actually just slow a miner down, which is not ideal.
legendary
Activity: 2394
Merit: 6581
be constructive or S.T.F.U
It's a very interesting question. Aside from fee rate, miners also need to ensure that the total sigops cost of the block isn't above 80,000[1]. Otherwise, they may accidentally create an invalid block, which happened to F2Pool last year[2].

[1] https://github.com/bitcoin/bitcoin/blob/v27.1/src/consensus/consensus.h#L16-L17
[2] https://bitcoin.stackexchange.com/q/117837

F2Pool probably doesn't use the default getblocktemplate from Core, because BlockAssembler does indeed track the number of sigops for the transactions it includes in the block; more details can be found in validation.cpp.

This is the first time I've heard about this incident, but I am not surprised, because F2Pool censors transactions based on the OFAC list, which means they did alter the code that creates their blocks. So somewhere along the line their devs failed and ended up screwing things up massively. Good for them -- they deserve it.
legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
It's a very interesting question. Aside from fee rate, miners also need to ensure that the total sigops cost of the block isn't above 80,000[1]. Otherwise, they may accidentally create an invalid block, which happened to F2Pool last year[2].

[1] https://github.com/bitcoin/bitcoin/blob/v27.1/src/consensus/consensus.h#L16-L17
[2] https://bitcoin.stackexchange.com/q/117837
legendary
Activity: 2394
Merit: 6581
be constructive or S.T.F.U
To optimize their profits, do miners use software like Gurobi or other LP-solvers to solve the knapsack problem and select the most profitable transactions?
I'm only talking about honest miners who only want to maximize their profits.

I'd assume that most pools use Bitcoin Core's getblocktemplate function. The way it works is very simple and straightforward: it prioritizes high-fee transactions, taking into account transactions that depend on other transactions, like CPFP transactions. You can go to GitHub, find the Core repo, and look for the BlockAssembler class in miner.cpp.

getblocktemplate is obviously just a trial-and-error, hope-for-the-best solution to the knapsack problem. As mentioned above, since the problem is NP-hard, you just can't tell what the best solution is. Also, given that most Bitcoin transactions have about the same size, the default greedy algorithm is fairly efficient, since the operation is straightforward for most cases.

Obviously, other pools like Ocean have their own way of selecting transactions since they ban a dozen of them. Other pools like ViaBTC, which offers a transaction acceleration service, will prioritize their paid transactions when constructing a block. Except for Kano.is and CKsolo, I don't think anyone else makes anything that technical public.
member
Activity: 378
Merit: 93
Enable v2transport=1 and mempoolfullrbf=1
Fun info: The purpose of the 100 kvB limit (1/10 of the size of a block) for standard transaction relay is to make it easier for miners to solve this problem. When a single transaction gets close to the size of an entire block, the calculation to determine the most profitable combination of transactions gets much more difficult.
legendary
Activity: 3038
Merit: 4418
Crypto Swap Exchange
There is no way to know; each miner has their own proprietary mining software, and probably no one can give you an answer beyond the usual reference implementations and the ones that are open source. Validating this idea would be quite difficult, since we don't have a way to get the exact same mempool as the miners, and it is difficult to see which transactions are not in their mempool.

I would go out on a limb and assume that most miners are selecting transactions by fee rate greedily, aka the greedy algorithm. For example, Bitcoin Core orders transactions from the highest fee rate to the lowest and selects them accordingly. I'm assuming this gives a decent tradeoff; miners cannot afford to spend too much time consolidating the best combination of transactions, and given how small Bitcoin transactions are, you can probably fill the blocks pretty much full.

This is the first time I've heard of the knapsack problem. I did a bit of research, and according to some responses I found, it talks about an infinite block size.

If you are mining Bitcoin, I don't think you can develop something that lets you select which transactions you want to include in the block you mine.
I don't think something called Gurobi or an LP solver is used in mining BTC; maybe if you own a pool with huge hashrate then you can choose and optimize which transactions you want to add to the block you mine.
The knapsack problem is exactly the problem miners face, because block space is not infinite. Knapsack is quite tricky because it is NP-hard, and in our case the mempool has tons of transactions. It concerns optimizing fees over a pool of transactions of differing sizes, fitting them into a block so as to maximize your profit. OR-Tools (Python) is quite flexible in this regard, and it would probably be easy to formulate something like the following (naively; you would also need to consider other things such as sigops):

Max z = b1·x1·y1 + b2·x2·y2 + ... + bn·xn·yn
subject to
b1·x1 + b2·x2 + ... + bn·xn <= W
bi ∈ {0, 1}

where bi indicates whether transaction i is included, xi is its size, yi is its fee rate (so xi·yi is its fee), and W is the block capacity (roughly 1,000,000 vbytes, i.e. the 4,000,000 weight-unit limit).
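
As a hedged sketch of how that naive model could be written with OR-Tools (using its CP-SAT solver; fees in satoshis and sizes in weight units keep every coefficient integral, and the numbers below are made up):

Code:
from ortools.sat.python import cp_model

# Toy data: (size in weight units, fee in satoshis) per transaction.
txs = [(800, 4000), (1200, 5500), (600, 2100), (2000, 9000)]
CAPACITY = 3000  # made-up tiny "block" for illustration

model = cp_model.CpModel()
include = [model.NewBoolVar("tx_%d" % i) for i in range(len(txs))]

# Total size of the included transactions must fit the capacity.
model.Add(sum(size * include[i] for i, (size, _) in enumerate(txs)) <= CAPACITY)

# Maximize the total fees collected.
model.Maximize(sum(fee * include[i] for i, (_, fee) in enumerate(txs)))

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    chosen = [i for i in range(len(txs)) if solver.Value(include[i])]
    print("include transactions:", chosen, "total fee:", solver.ObjectiveValue())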

Just to add on: there are a few edge cases too, one of which concerns the fact that including child transactions might yield a higher overall revenue, and hence optimizing according to groups of transactions might be better.
legendary
Activity: 3374
Merit: 3095
Playbet.io - Crypto Casino and Sportsbook
This is the first time I've heard of the knapsack problem. I did a bit of research, and according to some responses I found, it talks about an infinite block size.

If you are mining Bitcoin, I don't think you can develop something that lets you select which transactions you want to include in the block you mine.
I don't think something called Gurobi or an LP solver is used in mining BTC; maybe if you own a pool with huge hashrate then you can choose and optimize which transactions you want to add to the block you mine.
newbie
Activity: 13
Merit: 36
To optimize their profits, do miners use software like Gurobi or other LP-solvers to solve the knapsack problem and select the most profitable transactions?
I'm only talking about honest miners who only want to maximize their profits.