Core: unconfirmed transactions containing a specific address

achow101

staff

Activity: 3458

Merit: 6793

Just writing some code

Quote from: NotATether on February 25, 2024, 05:08:16 AM

And there is a lot of memory usage too - the way Bitcoin Core reads the data when it gets the raw transaction, causes it to use several gigabytes if you batch too many transactions in your RPC call at once, which will definitely crash a server with any amount of RAM due to "out of memory" errors.

Memory usage is probably due to converting transaction info to JSON text so it can be output, rather than to reading transaction data. Mempool transactions are always held in memory and do not use disk for storage. Writing to disk is only to preserve the mempool across restarts.

When you request transactions via getrawtransaction, they need to be encoded as text, which is not very efficient. And if you ask for decoding as well, that's a whole lot more text for the decoded info, and all of that text will only ever live in memory.

Quote from: NotATether on February 25, 2024, 05:08:16 AM

None of these problems really exist when using ZeroMQ as it gives you the raw transaction itself (is this correct, @achow101?) so you can build a list of transactions from the time you start your node.

Yes

DaveF

legendary

Activity: 3500

Merit: 6320

Crypto Swap Exchange

Why not run your own local block explorer?
There are open source ones that have an API.

Yes, you might wind up needing a tiny bit more HW, but it's probably going to just work better than trying to force the issue by making other software do what it's not designed for.
All that most of the explorers do is put the data your node sees in a database and then query that DB and display it.

-Dave

NotATether

legendary

Activity: 1568

Merit: 6660

bitcoincleanup.com / bitmixlist.org

Quote from: citb0in on February 23, 2024, 12:29:49 PM

Quote from: NotATether on February 23, 2024, 06:10:51 AM

My org is building a payment processor and we need to identify transactions as soon as they enter the mempool, so that a status message can be displayed.

may I ask what this is good for? I cannot imagine any real scenario that requires such a thing, please enlighten us.

It is mainly for reducing costs. Block explorers are the preferred solution as they have quick response times, but API keys can cost hundreds of dollars to get enough requests per month to operate with. This setup can work with a $100 - $200 server rented from somewhere like OVH and also allows you to support arbitrary altcoins. Well at least the ones that are based on Bitcoin.

So in any given payment page, you are going to see something like this:

This is a payment that I am making at CoinGate's example shop (https://example.coingate.com) for an order of $0.50 brewed coffee. I will explain why the order is important in a minute.

When you send a transaction with these details, the system either has to detect it in the mempool and quickly show a progress screen so that the user doesn't think their payment is lost, especially the crypto-illiterate users, or it can wait until the transaction is confirmed and show the progress screen then. Nobody should be settling payments with unconfirmed transactions anyway, now that mempoolfullrbf is a thing.

In Bitcoin Core, the fastest way to do all this (besides using zeromq channels which achow talks about above) is by calling getrawmempool and then batch calling getrawtransaction on all of the returned transactions. A node with default settings that has been running for a few hours should have the default 300 vMB worth of transactions already stored. A recently started node will have far fewer transactions in the mempool, or none at all, while it downloads transactions from its peers, which you can observe by running while true; do bitcoin-cli getrawmempool | jq '. | length'; sleep 1; done.

After running a couple of tests yesterday, I found that calling getrawtransaction hundreds of times is slow and very disk-intensive, even with the industrial-grade HDD inside my server. This is equivalent to fetching 1-10 future blocks' worth of unconfirmed transactions. I'm sure with an SSD it would be multiple times faster but for getting the 142,000 total unconfirmed transactions it will still take quite a long time. And there is a lot of memory usage too - the way Bitcoin Core reads the data when it gets the raw transaction, causes it to use several gigabytes if you batch too many transactions in your RPC call at once, which will definitely crash a server with any amount of RAM due to "out of memory" errors.

The only way to avoid the out-of-memory errors is if you make your batch size small enough that it doesn't run over your RAM. But with the amount of transactions in the mempool, that will take an unacceptably long amount of time (hours, or even days I believe).
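A minimal sketch of keeping each batch bounded, assuming the txids come from getrawmempool. The helper name chunked and the batch size of 4 are made up for illustration; a real batch size would have to be tuned against the RAM available.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(txids: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size txids."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(txids)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical txids standing in for the output of getrawmempool.
txids = [f"{i:064x}" for i in range(10)]
batches = list(chunked(txids, 4))
```

Only one batch of responses needs to be held in memory at a time, which is the space side of the trade-off; the time side is the extra round trips.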

It's a classic space/time trade-off, which can only be avoided by not only using small batch sizes but also sending each query to a different node. But then that no longer makes economic sense so I will write that off as an option.

One way to deal with this: when you get the mempool transactions, sort them by how likely they are to get into the next block, i.e. fee * the average of the CPFP transaction parent fees if there are any. This lets you parse all the real transactions first and not waste time parsing a bunch of 1 sat/byte consolidations and 23 sats/byte Ordinals that are definitely not payments until after all the higher-fee transactions are processed, since humans will usually make a payment with the fee their wallet tells them to use. By default the mempool seems to return the transactions in a random order, so you'll have to sort them manually, but this can be done very quickly.
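A rough sketch of that sorting step. Note that it swaps in the ancestor feerate (fees.ancestor divided by ancestorsize, both fields of the getrawmempool verbose output) as the priority score instead of the exact formula described above, on the assumption that it is a reasonable stand-in for "likely to be mined soon". The function names and the sample entries are made up.

```python
from typing import Dict, List


def ancestor_feerate(entry: dict) -> float:
    """Ancestor feerate in sat/vB for one getrawmempool verbose entry."""
    fee_sats = entry["fees"]["ancestor"] * 100_000_000  # BTC -> satoshis
    return fee_sats / entry["ancestorsize"]


def sort_by_priority(mempool: Dict[str, dict]) -> List[str]:
    """Return txids ordered from highest to lowest ancestor feerate."""
    return sorted(
        mempool,
        key=lambda txid: ancestor_feerate(mempool[txid]),
        reverse=True,
    )


# Made-up sample entries; real ones come from `getrawmempool true`.
sample = {
    "aa": {"fees": {"ancestor": 0.00001000}, "ancestorsize": 200},  # about 5 sat/vB
    "bb": {"fees": {"ancestor": 0.00010000}, "ancestorsize": 250},  # about 40 sat/vB
    "cc": {"fees": {"ancestor": 0.00000150}, "ancestorsize": 150},  # about 1 sat/vB
}
order = sort_by_priority(sample)
```

The sorted txid list can then be fed to getrawtransaction from the top, stopping whenever the time or memory budget runs out.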

Although you will still miss the ultra-low fee transactions by doing this (and those people won't get an in-progress message), it was never likely that those transactions would confirm before the timeout which is usually just a few hours.

Basically, you can only keep a few dozen vMB of transactions in the memory - if you want all of them then you have to offload them to a database like the block explorers and mempool.space do.

Really, it is only a cosmetic issue. Ideally you should wait until a transaction is mined before you show people a "there is nothing else you need to do!" message, and the workaround I wrote above will work for the vast majority of transactions - the ones with a low fee were probably manually specified by users who know a thing or two about crypto, which means they won't really be bothered if a payment processor doesn't immediately detect their transaction after they broadcast it. On the other hand, the ones that would complain are just using the default fee set by their wallet, and assuming the wallet gives them a high-enough fee, then such transactions can still be detected immediately with the resources available.

But it still wouldn't detect the $0.50 brewed coffee payment unless you used a normal feerate that is 5-10x the size of the payment itself. But that proportion is a well-known issue.

None of these problems really exist when using ZeroMQ as it gives you the raw transaction itself (is this correct, @achow101?) so you can build a list of transactions from the time you start your node. And since transactions are usually evicted after 14 days, you will eventually have the full set once your node has been running that long.

Those are just a few of the things that go into the design when scanning for transactions in a cost- and performance-efficient way.

Quote from: citb0in on February 23, 2024, 12:29:49 PM

Quote from: NotATether on February 23, 2024, 06:10:51 AM

It is necessary to download the entire mempool

Anyone, please correct me if I'm wrong or misunderstood. I think there is no one mempool. Each full node has its own mempool.

That is correct. Nodes build their own mempools by fetching unconfirmed transactions from other nodes but they are free to keep the transactions that they like and discard the ones that they don't like.

An example of this is setting -maxmempool to limit the mempool size. In other words, the node keeps only the highest-fee txes that fit in n MB.

achow101

staff

Activity: 3458

Merit: 6793

Just writing some code

Quote from: NotATether on February 23, 2024, 06:10:51 AM

My org is building a payment processor and we need to identify transactions as soon as they enter the mempool, so that a status message can be displayed. It is necessary to download the entire mempool because Core doesn't have a "get all transactions of an address" RPC call except if you use the wallet subsystem, but the rescanning overhead after importing each address makes that non-viable. Then our copy of the mempool is parsed into a better format which can be queried for any address. This is updated every 10-60 seconds (subject to RPC latency benchmarking that I'm performing now).

We already have a workflow for getting confirmed transactions in a block. But because getblock also returns raw transactions (and undo data), it has never been a problem.

I suggest that you use the ZMQ interface as it can provide the raw blocks and transactions as they are verified. It's actually quite reliable. It also notifies about transactions being removed from the mempool which can be useful. Since ZMQ is a push service, there won't be a polling delay where things could happen that you aren't aware of.
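A minimal sketch of decoding the body of a "sequence" notification, which is the ZMQ topic that reports mempool additions and removals. It assumes the node was started with -zmqpubsequence and that some ZMQ subscriber (not shown) hands you the raw message body. The layout assumed here (32-byte hash, one label byte, then an 8-byte little-endian mempool sequence for "A" and "R" events) follows my reading of Bitcoin Core's doc/zmq.md, so check it against your version. SequenceEvent and parse_sequence_body are made-up names.

```python
import struct
from typing import NamedTuple, Optional


class SequenceEvent(NamedTuple):
    hash_hex: str                     # hash bytes as received, hex-encoded
    label: str                        # "A" added, "R" removed, "C"/"D" block (dis)connected
    mempool_sequence: Optional[int]   # only present for "A" and "R"


def parse_sequence_body(body: bytes) -> SequenceEvent:
    """Decode the body frame of one `sequence` notification."""
    if len(body) < 33:
        raise ValueError("sequence body too short")
    hash_hex = body[:32].hex()
    label = chr(body[32])
    mempool_sequence = None
    if label in ("A", "R"):
        if len(body) < 41:
            raise ValueError("missing mempool sequence")
        (mempool_sequence,) = struct.unpack("<Q", body[33:41])
    return SequenceEvent(hash_hex, label, mempool_sequence)


# Fabricated body: all-zero hash, "A" label, mempool sequence 7.
event = parse_sequence_body(bytes(32) + b"A" + struct.pack("<Q", 7))
```

An "R" event is the cue to drop a transaction from your own index without polling getrawmempool again.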

citb0in

hero member

Activity: 630

Merit: 731

Bitcoin g33k

Quote from: NotATether on February 23, 2024, 06:10:51 AM

My org is building a payment processor and we need to identify transactions as soon as they enter the mempool, so that a status message can be displayed.

may I ask what this is good for? I cannot imagine any real scenario that requires such a thing, please enlighten us.

Quote from: NotATether on February 23, 2024, 06:10:51 AM

It is necessary to download the entire mempool

Anyone, please correct me if I'm wrong or misunderstood. I think there is no one mempool. Each full node has its own mempool.

NotATether

legendary

Activity: 1568

Merit: 6660

bitcoincleanup.com / bitmixlist.org

Quote from: citb0in on February 22, 2024, 02:24:16 PM

What are your specific intentions with obtaining unconfirmed transaction data? Are these addresses belonging to you or potentially to any random stranger? What is your goal ?

My org is building a payment processor and we need to identify transactions as soon as they enter the mempool, so that a status message can be displayed. It is necessary to download the entire mempool because Core doesn't have a "get all transactions of an address" RPC call except if you use the wallet subsystem, but the rescanning overhead after importing each address makes that non-viable. Then our copy of the mempool is parsed into a better format which can be queried for any address. This is updated every 10-60 seconds (subject to RPC latency benchmarking that I'm performing now).
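A minimal sketch of the "better format" idea: an in-memory map from address to the txids that pay it. It assumes the transactions were decoded by getrawtransaction with verbose output on a recent Core version, where each output carries scriptPubKey.address. It only covers outputs; matching inputs would additionally need the previous outputs to be looked up. The function name, addresses and txids below are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def index_outputs_by_address(decoded_txs: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each output address to the set of txids paying to it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for tx in decoded_txs:
        for vout in tx.get("vout", []):
            address = vout.get("scriptPubKey", {}).get("address")
            if address is not None:  # e.g. OP_RETURN outputs have no address
                index[address].add(tx["txid"])
    return dict(index)


# Placeholder data shaped like decoded transactions.
decoded = [
    {"txid": "tx1", "vout": [{"scriptPubKey": {"address": "addrA"}},
                             {"scriptPubKey": {"type": "nulldata"}}]},
    {"txid": "tx2", "vout": [{"scriptPubKey": {"address": "addrA"}},
                             {"scriptPubKey": {"address": "addrB"}}]},
]
index = index_outputs_by_address(decoded)
```

A lookup for a payment address is then a single dictionary access, regardless of how many transactions are in the mempool.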

We already have a workflow for getting confirmed transactions in a block. But because getblock also returns raw transactions (and undo data), it has never been a problem.

citb0in

hero member

Activity: 630

Merit: 731

Bitcoin g33k

The help page (which you already got and posted) explains the parameters; see the examples at the bottom.

All you need to do is run

Code:

bitcoin-cli getrawmempool true

this will return the decoded transactions. Cool

Bitcoin core 24.0.0 documentation

Out of curiosity, may I ask:

What are your specific intentions with obtaining unconfirmed transaction data? Are these addresses belonging to you or potentially to any random stranger? What is your goal ?

seoincorporation

legendary

Activity: 3388

Merit: 3154

Quote from: NotATether on February 22, 2024, 03:53:16 AM

Quote from: seoincorporation on February 21, 2024, 06:10:34 PM

If you are the owner of those addresses you could use the listunspent command. There is a way to list the unconfirmed transactions too, but I wouldn't recommend dealing with unconfirmed transactions because you can be a victim of a double spend attack, or think that you had 2 deposits while it was an RBF transaction.

You mean with the Wallet API?

It is very slow and is always rescanning when you import an address.

Nevertheless, I have found that you can actually batch JSON-RPC calls together when you query the node by HTTP.
...

I'm not talking about wallet API, I'm talking about RPC calls, here are some interesting links:

https://bitcoin.stackexchange.com/questions/118681/does-listunspent-0-return-my-non-mempool-unconfirmed-outputs

Quote

No. listunspent, even if passed 0 as the minimum number of confirmations, will not list coins created in wallet transactions that are not part of the node's mempool at this time.

https://bitcoin.stackexchange.com/questions/116058/does-listunspent-work-for-any-address-is-txindex-1

Quote

The Bitcoin Core Wallet tracks addresses associated with its own keys. listunspent is a wallet RPC that refers to the wallet data only. Beyond the wallet's tracking of its own data, Bitcoin Core does not have functionality to keep an address index regardless of whether txindex is used or not. I surmise that address tracking was never prioritized by a Bitcoin Core contributor (or even pushed back upon by others) due to the intended single-use nature of addresses.

https://bitcoincore.org/en/doc/0.19.0/rpc/wallet/listunspent/

Quote

Examples:
> bitcoin-cli listunspent
> bitcoin-cli listunspent 6 9999999 "[\"1PGFqEzfmQch1gKD3ra4k18PNj3tTUUSqg\",\"1LtvqCaApEdUGFkpKMM4MstjcaL4dKg8SP\"]"
> curl --user myusername --data-binary '{"jsonrpc": "1.0", "id":"curltest", "method": "listunspent", "params": [6, 9999999, ["1PGFqEzfmQch1gKD3ra4k18PNj3tTUUSqg","1LtvqCaApEdUGFkpKMM4MstjcaL4dKg8SP"]] }' -H 'content-type: text/plain;' http://127.0.0.1:8332/
> bitcoin-cli listunspent 6 9999999 '[]' true '{ "minimumAmount": 0.005 }'
> curl --user myusername --data-binary '{"jsonrpc": "1.0", "id":"curltest", "method": "listunspent", "params": [6, 9999999, [] , true, { "minimumAmount": 0.005 } ] }' -H 'content-type: text/plain;' http://127.0.0.1:8332/

NotATether

legendary

Activity: 1568

Merit: 6660

bitcoincleanup.com / bitmixlist.org

Quote from: seoincorporation on February 21, 2024, 06:10:34 PM

If you are the owner of those addresses you could use the listunspent command. There is a way to list the unconfirmed transactions too, but I wouldn't recommend dealing with unconfirmed transactions because you can be a victim of a double spend attack, or think that you had 2 deposits while it was an RBF transaction.

You mean with the Wallet API?

It is very slow and is always rescanning when you import an address.

Nevertheless, I have found that you can actually batch JSON-RPC calls together when you query the node by HTTP.

All that needs to be done is to put all the requests in an array and send them.

Apparently it is a little-known feature present since the very early versions of Core, and also present in the JSON-RPC standard. Here's an implementation by Jeff Garzik from 2012, likely the form originally used by Core: https://github.com/jgarzik/rpcsrv
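A minimal sketch of what such a batch looks like on the wire, assuming one getrawtransaction call per txid. The helper names build_batch and match_responses are made up. The resulting bytes would be sent as the body of a single HTTP POST to the node's RPC port with the usual RPC credentials, which is not shown here.

```python
import json
from typing import Dict, List


def build_batch(txids: List[str], verbose: bool = False) -> bytes:
    """Encode one getrawtransaction call per txid as a JSON-RPC batch array."""
    calls = [
        {"jsonrpc": "1.0", "id": i, "method": "getrawtransaction",
         "params": [txid, verbose]}
        for i, txid in enumerate(txids)
    ]
    return json.dumps(calls).encode()


def match_responses(txids: List[str], responses: List[dict]) -> Dict[str, object]:
    """Pair each txid with its result via the echoed id, skipping failed calls."""
    by_id = {resp["id"]: resp for resp in responses}
    results: Dict[str, object] = {}
    for i, txid in enumerate(txids):
        resp = by_id.get(i)
        if resp is not None and resp.get("error") is None:
            results[txid] = resp["result"]
    return results


# Placeholder txids; real ones would come from getrawmempool.
payload = json.loads(build_batch(["aa", "bb"]))
```

Matching on the id field rather than on position means the code does not depend on the order in which the node returns the responses.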

Best of all, this means it will even work with other altcoins' nodes (well, I'm not sure about non-Bitcoin-like chains).

The speed when batching thousands of calls together has drastically exceeded my expectations and I am happy with its performance.

The only thing I am wondering now is what is the hard limit for the number of calls I can batch together.

seoincorporation

legendary

Activity: 3388

Merit: 3154

If you are the owner of those addresses you could use the listunspent command. There is a way to list the unconfirmed transactions too, but I wouldn't recommend dealing with unconfirmed transactions because you can be a victim of a double spend attack, or think that you had 2 deposits while it was an RBF transaction.

achow101

staff

Activity: 3458

Merit: 6793

Just writing some code

Quote from: NotATether on February 21, 2024, 01:38:14 AM

Are you sure that returns decoded transactions? Every time I use that parameter, I get an output that looks like the one described in the RPC docs:

Oops, I was thinking of getblock

ABCbits

legendary

Activity: 2870

Merit: 7490

Crypto Swap Exchange

Quote from: Knight Hider on February 20, 2024, 02:50:21 PM

A watch only Electrum wallet instantly shows unconfirmed transactions. That means the Electrum server keeps the database you are looking for. See if you can use this for your needs.

I also believe this should be an appropriate solution. Looking at the Electrum protocol, you could just use this API call.

Quote from: https://electrumx-spesmilo.readthedocs.io/en/latest/protocol-methods.html#blockchain-scripthash-get-mempool

blockchain.scripthash.get_mempool

Return the unconfirmed transactions of a script hash.

Signature

   blockchain.scripthash.get_mempool(scripthash)

   New in version 1.1.

   scripthash

   The script hash as a hexadecimal string.

Result

   A list of mempool transactions in arbitrary order. Each mempool transaction is a dictionary with the following keys:

--snip--

The script hash is derived from the scriptPubKey (it is its SHA-256 hash), and you can run your own Electrum server.
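A minimal sketch of computing that script hash, assuming you already have the scriptPubKey as hex. Per the Electrum protocol documentation, the script hash is the SHA-256 of the output script, written as hex in reversed byte order. The function name is made up, and the P2PKH script below is only a sample input.

```python
import hashlib


def electrum_scripthash(script_pubkey_hex: str) -> str:
    """SHA-256 of the scriptPubKey, byte-reversed, as a hex string."""
    digest = hashlib.sha256(bytes.fromhex(script_pubkey_hex)).digest()
    return digest[::-1].hex()


# Sample P2PKH scriptPubKey (OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG).
script_pubkey = "76a91462e907b15cbf27d5425399ebf6f0fb50ebb88f1888ac"
scripthash = electrum_scripthash(script_pubkey)
```

The resulting string is what would be passed as the scripthash argument of blockchain.scripthash.get_mempool.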

NotATether

legendary

Activity: 1568

Merit: 6660

bitcoincleanup.com / bitmixlist.org

Quote from: achow101 on February 20, 2024, 07:12:04 PM

Quote from: NotATether on February 20, 2024, 09:10:17 AM

- Use getrawmempool, and call getrawtransaction on each of the returned transaction IDs

getrawmempool has a verbose parameter which will return the decoded transactions rather than just txids.

Are you sure that returns decoded transactions? Every time I use that parameter, I get an output that looks like the one described in the RPC docs:

Code:

{ (json object)
"transactionid" : { (json object)
"vsize" : n, (numeric) virtual transaction size as defined in BIP 141. This is different from actual serialized size for witness transactions as witness data is discounted.
"weight" : n, (numeric) transaction weight as defined in BIP 141.
"fee" : n, (numeric) transaction fee in BTC (DEPRECATED)
"modifiedfee" : n, (numeric) transaction fee with fee deltas used for mining priority (DEPRECATED)
"time" : xxx, (numeric) local time transaction entered pool in seconds since 1 Jan 1970 GMT
"height" : n, (numeric) block height when transaction entered pool
"descendantcount" : n, (numeric) number of in-mempool descendant transactions (including this one)
"descendantsize" : n, (numeric) virtual transaction size of in-mempool descendants (including this one)
"descendantfees" : n, (numeric) modified fees (see above) of in-mempool descendants (including this one) (DEPRECATED)
"ancestorcount" : n, (numeric) number of in-mempool ancestor transactions (including this one)
"ancestorsize" : n, (numeric) virtual transaction size of in-mempool ancestors (including this one)
"ancestorfees" : n, (numeric) modified fees (see above) of in-mempool ancestors (including this one) (DEPRECATED)
"wtxid" : "hex", (string) hash of serialized transaction, including witness data
"fees" : { (json object)
"base" : n, (numeric) transaction fee in BTC
"modified" : n, (numeric) transaction fee with fee deltas used for mining priority in BTC
"ancestor" : n, (numeric) modified fees (see above) of in-mempool ancestors (including this one) in BTC
"descendant" : n (numeric) modified fees (see above) of in-mempool descendants (including this one) in BTC
},
"depends" : [ (json array) unconfirmed transactions used as inputs for this transaction
"hex", (string) parent transaction id
...
],
"spentby" : [ (json array) unconfirmed transactions spending outputs from this transaction
"hex", (string) child transaction id
...
],
"bip125-replaceable" : true|false, (boolean) Whether this transaction could be replaced due to BIP125 (replace-by-fee)
"unbroadcast" : true|false (boolean) Whether this transaction is currently unbroadcast (initial broadcast not yet acknowledged by any peers)
},
...
}

achow101

staff

Activity: 3458

Merit: 6793

Just writing some code

Quote from: NotATether on February 20, 2024, 09:10:17 AM

- Use getrawmempool, and call getrawtransaction on each of the returned transaction IDs

getrawmempool has a verbose parameter which will return the decoded transactions rather than just txids.

Quote from: NotATether on February 20, 2024, 09:10:17 AM

- Use scantxoutset (But I'm not sure if you can make this only look for unconfirmed transactions)

I don't think it searches the mempool at all.

Quote from: NotATether on February 20, 2024, 09:10:17 AM

I am also aware of the ZMQ topic that publishes raw transactions. The issue is I'm not sure how reliable that is.

Although the documentation states that it does not guarantee that data was sent, I don't think it has ever been unreliable in practice, especially if used locally.
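If you want to check that rather than trust it, here is a rough sketch of detecting dropped notifications. It assumes each ZMQ message arrives as three frames (topic, body, and a 4-byte little-endian counter kept per topic), which is how I read Bitcoin Core's doc/zmq.md. GapDetector is a made-up name, and receiving the frames from a socket is not shown.

```python
import struct
from typing import Dict, List


class GapDetector:
    """Track the per-topic message counter and report skipped notifications."""

    def __init__(self) -> None:
        self._last: Dict[bytes, int] = {}

    def observe(self, frames: List[bytes]) -> int:
        """Return how many messages were missed since the previous one on this topic."""
        topic, _body, seq_raw = frames
        (seq,) = struct.unpack("<I", seq_raw)
        last = self._last.get(topic)
        self._last[topic] = seq
        if last is None:
            return 0  # first message seen for this topic, nothing to compare against
        # The counter is 32-bit, so compare modulo 2**32 to survive wrap-around.
        return (seq - last - 1) % 2**32


detector = GapDetector()
# Fabricated frames with counters 0, 1 and 4: two messages are missing before the last.
missed = [
    detector.observe([b"rawtx", b"", struct.pack("<I", n)]) for n in (0, 1, 4)
]
```

A non-zero result would be the signal to resynchronise with one getrawmempool call. A node restart resets the counter, which this sketch would misreport as a large gap, so that case needs separate handling.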

Quote from: NotATether on February 20, 2024, 09:10:17 AM

Ideally if mempool raw transactions are cached on the disk, this will be the fastest way to process them all.

Although Bitcoin Core does write mempool transactions to disk, it only does this on shutdown as the purpose is to allow mempools to persist across restarts. You could use the savemempool RPC to force this and then parse it, but of course the file will not be updated with any new transactions until you do savemempool again.

Knight Hider

member

Activity: 239

Merit: 59

a young loner on a crusade

A watch only Electrum wallet instantly shows unconfirmed transactions. That means the Electrum server keeps the database you are looking for. See if you can use this for your needs.

NotATether

legendary

Activity: 1568

Merit: 6660

bitcoincleanup.com / bitmixlist.org

I am looking for a fast way to obtain a list of unconfirmed transactions which have a particular address in either its inputs or outputs in Bitcoin Core.

I am aware of 3rd party web services which can provide that, however I am looking to get that information directly from the full node.

AFAIK there are two options:

- Use getrawmempool, and call getrawtransaction on each of the returned transaction IDs
- Use scantxoutset (But I'm not sure if you can make this only look for unconfirmed transactions)

Obviously the problem with the first method is that it is fairly slow, even with multithreading and tuned RPC parameters, but more importantly the result uses way too much memory when the transactions are decoded.

Ideally, I would like to store all the decoded (unconfirmed) transactions inside a database specifically for that purpose, to make future queries faster.

I am also aware of the ZMQ topic that publishes raw transactions. The issue is I'm not sure how reliable that is.

Ideally if mempool raw transactions are cached on the disk, this will be the fastest way to process them all.

But I'm not sure what to do here.

Edit: I also hear that setting -maxmempool to a small value might ease the load too?
