My org is building a payment processor and we need to identify transactions as soon as they enter the mempool, so that a status message can be displayed.
May I ask what this is good for? I cannot imagine any real scenario that requires such a thing, please enlighten us.
It is mainly for reducing costs. Block explorers are the preferred solution as they have quick response times, but API keys can cost hundreds of dollars to get enough requests per month to operate with. This setup can work with a $100 - $200 server rented from somewhere like OVH, and it also allows you to support arbitrary altcoins, or at least the ones that are based on Bitcoin.
So in any given payment page, you are going to see something like this:
This is a payment that I am making at CoinGate's example shop (https://example.coingate.com) for an order of $0.50 brewed coffee. I will explain why the order is important in a minute.
When you send a transaction with these details, the system has two options. It can detect the transaction in the mempool and quickly show a progress screen so that the user, especially a crypto-illiterate one, doesn't think their payment is lost. Or it can wait until the transaction is confirmed and show the progress screen then, since nobody should be settling payments with unconfirmed transactions now that full RBF (-mempoolfullrbf) is a thing.
In Bitcoin Core, the fastest way to do all this (besides using ZeroMQ channels, which achow talks about above) is by calling getrawmempool and then batch calling getrawtransaction on all of the returned transactions. A node with default settings that has been running for a few hours should already hold up to the default 300 MB limit's worth of transactions. A recently started node will have far fewer transactions in the mempool, or none at all, as it is still downloading transactions from its peers. You can observe this by running:
while true; do bitcoin-cli getrawmempool | jq '. | length'; sleep 1; done
After running a couple of tests yesterday, I found that calling getrawtransaction hundreds of times is slow and very disk-intensive, even with the industrial-grade HDD inside my server. This is equivalent to fetching 1-10 future blocks' worth of unconfirmed transactions. I'm sure with an SSD it would be multiple times faster, but getting all 142,000 unconfirmed transactions will still take quite a long time. And there is a lot of memory usage too - the way Bitcoin Core reads the data when it gets the raw transaction causes it to use several gigabytes if you batch too many transactions in one RPC call, which can crash the server with "out of memory" errors.
The only way to avoid the out-of-memory errors is to make your batch size small enough that it fits in your RAM. But with the number of transactions in the mempool, that will take an unacceptably long time (hours, or even days I believe).
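To make the small-batch idea concrete, here is a minimal sketch in Python. It assumes the node accepts JSON-RPC batch arrays over HTTP with basic auth. The URL, the credentials, the batch size of 50 and the function names are placeholders I made up, not anything from a real setup.

```python
import base64
import json
import urllib.request


def build_batches(txids, batch_size=50):
    """Split txids into small JSON-RPC batch payloads for getrawtransaction."""
    batches = []
    for start in range(0, len(txids), batch_size):
        chunk = txids[start:start + batch_size]
        batches.append([
            {
                "jsonrpc": "1.0",
                "id": txid,  # echo the txid back so replies can be matched up
                "method": "getrawtransaction",
                "params": [txid, True],  # True = return decoded JSON
            }
            for txid in chunk
        ])
    return batches


def post_batch(batch, url="http://127.0.0.1:8332/",
               user="rpcuser", password="rpcpassword"):
    """Send one batch to the node. URL and credentials are hypothetical."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        data=json.dumps(batch).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

You would call build_batches on the txid list from getrawmempool and feed the batches to post_batch one at a time, so only one small batch is ever held in memory by the node.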
It's a classic space/time trade-off, which can only be avoided by not only using small batch sizes but also sending each query to a different node. But then that no longer makes economic sense so I will write that off as an option.
One way to deal with this is to sort the mempool transactions, once you get them, by how likely they are to get into the next block, i.e. fee * the average of the CPFP parent transaction fees, if there are any. This lets you parse all the real transactions first and not waste time on a bunch of 1 sat/byte consolidations and 23 sats/byte Ordinals that are definitely not payments until after all the higher-fee transactions are processed, as humans will usually make a payment with the fee their wallet tells them to use. By default getrawmempool seems to return the transactions in a random order, so you'll have to sort them manually, but this can be done very quickly.
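A rough sketch of that sorting step in Python, under a few assumptions. It takes the output of getrawmempool with verbose=true, which reports vsize, ancestorsize, fees.base and fees.ancestor per txid. Instead of the fee * parent-average formula above, it swaps in a simpler score: the lower of the transaction's own feerate and its ancestor-package feerate, so a child cannot jump the queue ahead of a cheap unconfirmed parent. The function names are my own.

```python
SATS_PER_BTC = 100_000_000


def next_block_score(entry):
    """Score in sat/vB: min(own feerate, ancestor-package feerate)."""
    own = entry["fees"]["base"] * SATS_PER_BTC / entry["vsize"]
    # fees.ancestor and ancestorsize include the transaction itself
    package = entry["fees"]["ancestor"] * SATS_PER_BTC / entry["ancestorsize"]
    return min(own, package)


def sort_by_next_block_likelihood(verbose_mempool):
    """Return txids ordered from most to least likely to confirm soon."""
    return sorted(
        verbose_mempool,
        key=lambda txid: next_block_score(verbose_mempool[txid]),
        reverse=True,
    )


# Made-up sample shaped like `getrawmempool true` output (fees in BTC).
sample = {
    # 1 sat/vB consolidation, no unconfirmed parents
    "aa": {"vsize": 200, "ancestorsize": 200,
           "fees": {"base": 0.00000200, "ancestor": 0.00000200}},
    # 30 sat/vB payment, no unconfirmed parents
    "bb": {"vsize": 150, "ancestorsize": 150,
           "fees": {"base": 0.00004500, "ancestor": 0.00004500}},
    # 50 sat/vB child of "aa"; the package feerate is only about 17 sat/vB
    "cc": {"vsize": 100, "ancestorsize": 300,
           "fees": {"base": 0.00005000, "ancestor": 0.00005200}},
}

ordered = sort_by_next_block_likelihood(sample)
```

With the verbose output you already have the fee data at this point, so the sort happens before any getrawtransaction calls, and you can stop fetching once the score drops below whatever feerate you consider a plausible payment.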
Although you will still miss the ultra-low fee transactions by doing this (and those people won't get an in-progress message), it was never likely that those transactions would confirm before the timeout which is usually just a few hours.
Basically, you can only keep a few dozen vMB of transactions in memory - if you want all of them then you have to offload them to a database, like the block explorers and mempool.space do.
Really, it is only a cosmetic issue. Ideally you should wait until a transaction is mined before you show people a "there is nothing else you need to do!" message, and the workaround I wrote above will work for the vast majority of transactions - the ones with a low fee were probably manually specified by users who know a thing or two about crypto, which means they won't really be bothered if a payment processor doesn't immediately detect their transaction after they broadcast it. On the other hand, the ones that would complain are just using the default fee set by their wallet, and assuming the wallet gives them a high-enough fee, then such transactions can still be detected immediately with the resources available.
But it still wouldn't detect the $0.50 brewed coffee payment unless the sender used a normal fee, which is 5-10x the size of the payment itself. But that proportion is a well-known issue.
None of these problems really exist when using ZeroMQ, as it gives you the raw transaction itself (is this correct, @achow101?), so you can build a list of transactions from the time you start your node. And since transactions are usually evicted after 14 days, you will eventually have the full set after waiting that long from when you started your node.
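For the ZeroMQ route, a minimal sketch in Python, assuming the node was started with -zmqpubrawtx=tcp://127.0.0.1:28332 (the endpoint is just an example) and that the third-party pyzmq package is installed. As far as I know from Bitcoin Core's ZMQ documentation, each rawtx notification is a three-part message: the topic, the serialized transaction, and a 4-byte little-endian sequence number. The function names are made up for illustration.

```python
import struct


def decode_rawtx_message(frames):
    """Split one rawtx notification into (raw transaction bytes, sequence)."""
    topic, body, seq = frames
    if topic != b"rawtx":
        raise ValueError(f"unexpected topic: {topic!r}")
    (sequence,) = struct.unpack("<I", seq)  # 4-byte little-endian counter
    return body, sequence


def listen(endpoint="tcp://127.0.0.1:28332"):
    """Yield (raw_tx, sequence) for every transaction the node announces."""
    import zmq  # pyzmq; imported here so the decoder works without it

    context = zmq.Context()
    socket = context.socket(zmq.SUB)
    socket.setsockopt(zmq.SUBSCRIBE, b"rawtx")
    socket.connect(endpoint)
    while True:
        yield decode_rawtx_message(socket.recv_multipart())
```

A gap in the sequence numbers suggests a notification was dropped, in which case falling back to getrawmempool to resync is probably the safest option.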
Those are just a few of the things that go into the design when scanning for transactions in a cost- and performance-efficient way.
It is necessary to download the entire mempool
Anyone, please correct me if I'm wrong or misunderstood. I think there is no single mempool. Each full node has its own mempool.
That is correct. Nodes build their own mempools by fetching unconfirmed transactions from other nodes but they are free to keep the transactions that they like and discard the ones that they don't like.
An example of this is setting -maxmempool to limit the mempool size. In other words, a node can prefer to keep only the highest-fee txes that fit in n MB.