
Topic: Let's analyze fee rate vs confirmation time! (~4m tx data inside) (Read 1490 times)

sr. member
Activity: 476
Merit: 250
Bawga
bitcoin = miners rich scheme
legendary
Activity: 1102
Merit: 1014

That link does not produce a download. Would you happen to have a recent update?
I've been focused on Rein and had a performance issue that caused the script to fall behind under heavy transaction volume.

Will see what's possible, but note that the software is available, and if you have a full node you can do it yourself or hire someone to do it.
newbie
Activity: 21
Merit: 0

That link does not produce a download. Would you happen to have a recent update?
legendary
Activity: 1102
Merit: 1014
While not terribly useful, this scatter plot of # of blocks to confirm
I think that's exactly the opposite... fee rate vs time is not very useful: time just adds in the noise of a Poisson process with no influence from fees.

If you have a good estimator based on block count, you can simply multiply its predictions by the exponential distribution of interblock intervals and get a good estimator of time... but taking data that has been smeared by random block times and constructing good estimates from it is hard due to all the injected noise.

In effect, your time data is heavily biased by random correlations between higher- and lower-fee intervals and luckier or less lucky block finding.

This remains true so long as people aren't turning hashrate on and off based on fees -- and as far as I know, no one is today.

An interesting chart is a grid over n-blocks-wait and fee-rate, where each cell gives the percentage of transactions paying at least that fee rate that were confirmed in that number of blocks or fewer.

How does your data handle transaction replacement?   How do you compute feerates for CPFP transactions?   One possibility is to only consider transactions which are not dependent on unconfirmed transactions and which have no children;  and similarly do not consider replacement or replaced transactions.

I understand blocks-to-confirm is more useful, but just looking at that graph I could see key information wasn't visible in 2D black and white. That's what I meant by the graph not being terribly useful. I like the grid chart idea.

A question that comes to mind with blocks: has anyone done analysis to detect patterns in luck or transaction inclusion week by week? I suppose not many people are turning miners on and off due to weather or variable energy pricing over the day, but I'd be interested to look into it.

The data doesn't take into account CPFP or dependencies among unconfirmed transactions at this point, but it's something I've thought about.

I've only started collecting some of this data and want to see what people throw on the wall in terms of analysis, then iterate on the data that's being collected to learn more. I feel like diversity in fee estimation strategies is important to defend against some entity trying to game those strategies.

staff
Activity: 4326
Merit: 8951
While not terribly useful, this scatter plot of # of blocks to confirm
I think that's exactly the opposite... fee rate vs time is not very useful: time just adds in the noise of a Poisson process with no influence from fees.

If you have a good estimator based on block count, you can simply multiply its predictions by the exponential distribution of interblock intervals and get a good estimator of time... but taking data that has been smeared by random block times and constructing good estimates from it is hard due to all the injected noise.

In effect, your time data is heavily biased by random correlations between higher- and lower-fee intervals and luckier or less lucky block finding.

This remains true so long as people aren't turning hashrate on and off based on fees -- and as far as I know, no one is today.

An interesting chart is a grid over n-blocks-wait and fee-rate, where each cell gives the percentage of transactions paying at least that fee rate that were confirmed in that number of blocks or fewer.

How does your data handle transaction replacement?   How do you compute feerates for CPFP transactions?   One possibility is to only consider transactions which are not dependent on unconfirmed transactions and which have no children;  and similarly do not consider replacement or replaced transactions.
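
A rough pandas sketch of that grid, for anyone who wants to try it: for each fee-rate floor and block count, the share of transactions paying at least that fee rate that confirmed in that many blocks or fewer. The column names 'fee_rate' (sat/byte) and 'blocks' (blocks to confirm) are assumptions about the CSV layout, not something confirmed by the data file.

Code:
import pandas as pd

tx = pd.read_csv('confirmation_times.csv')

fee_floors = [1, 5, 10, 20, 50, 100, 200, 500]   # sat/byte thresholds
block_counts = range(1, 11)                      # 1..10 blocks waited

grid = pd.DataFrame(index=fee_floors, columns=list(block_counts), dtype=float)
for floor in fee_floors:
    # All transactions that paid at least this fee rate.
    subset = tx[tx['fee_rate'] >= floor]
    for n in block_counts:
        # Percentage of those confirmed within n blocks.
        grid.loc[floor, n] = 100.0 * (subset['blocks'] <= n).mean()

print(grid.round(1))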
legendary
Activity: 1102
Merit: 1014
I was curious how much revenue is coming from fees on the transactions which miners include in only the next one or two blocks.

One problem with using OP's data (confirmation_times.csv) for this is that it uses a timestamp, instead of block #.  So there's no real way to know how many confirmations it took for a transaction to get mined.  But using the average of 1 block = 10 minutes, I should be able to get kind of close.  

What this tells us is that miners receive ~75% of their revenue from transactions they include within the first two blocks (well, within the first 20 minutes, to be technically accurate).


Very cool! Note that the latest data file does have the # of blocks to confirm for each transaction, which may simplify your script. That file is at http://www.filedropper.com/conftimes

I'll probably post another this weekend and will look to hook this up to http://bitcoinexchangerate.org/fees to generate the best fee estimates possible.
newbie
Activity: 21
Merit: 0
I was curious how much revenue is coming from fees on the transactions which miners include in only the next one or two blocks.

One problem with using OP's data (confirmation_times.csv) for this is that it uses a timestamp, instead of block #.  So there's no real way to know how many confirmations it took for a transaction to get mined.  But using the average of 1 block = 10 minutes, I should be able to get kind of close.  

What this tells us is that miners receive ~75% of their revenue from transactions they include within the first two blocks (well, within the first 20 minutes, to be technically accurate).

https://i.imgur.com/26SmxdY.png

Generated using the following Python3 source for use in a Jupyter notebook.

Code:
import pandas as pd
from plotly import graph_objs as go
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode()   # Jupyter notebook
data = []
daily_threshold = 10   # Ignore days w/less than N btc in fees.
filename = 'confirmation_times.csv'
col_names = ['conf_time', 'first_confirmed', 'fee', 'size']
labels = ['0+ minutes', '10+ minutes', '20+ minutes', '30+ minutes', '1+ hours']
bins = [0, 1*10*60, 2*10*60, 3*10*60, 6*10*60, 9999*10*60]
df = pd.read_csv('./{}'.format(filename), usecols=col_names, parse_dates=['first_confirmed'])
df['fee'] = df['fee'] / 100000000  # Use BTCs, not Satoshis.
df['date'] = pd.to_datetime(df['first_confirmed']).dt.date
df['time'] = pd.cut(df['conf_time'], bins, labels=labels)
grouped = df.groupby(['date', 'time'])
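# Sum fees per (date, time-bin) cell, keep only days whose total fees meet the
# threshold, then pivot to a days-by-bins table for the stacked bars.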
df2 = grouped['fee'].sum().to_frame().rename(columns = lambda x: x + '_sum')
s = df2.unstack()['fee_sum'].sum(1).ge(daily_threshold)
plot_df = df2.loc[s.index[s].tolist()]['fee_sum'].unstack()
for time_bin in list(plot_df):
    data.append(
        go.Bar(
            x=['{:%Y-%m-%d}'.format(dt) for dt in list(plot_df[time_bin].index)],
            y=plot_df[time_bin].tolist(),
            name=time_bin))
layout = go.Layout(
    barmode='stack', title='Bitcoin Mining Fee Revenue By Confirmation Speed',
    xaxis=go.XAxis(title='Date', type='category'),
    yaxis=go.YAxis(title='Total Fees (BTC)'))
fig=go.Figure(data=data, layout=layout)
iplot(fig, filename='stacked-bar')
legendary
Activity: 1102
Merit: 1014
While not terribly useful, this scatter plot of # of blocks to confirm vs fee_rate does have some artistic value: http://imgur.com/a/j5iPl

New data file: http://www.filedropper.com/conftimes

More important was doing the code to calculate this from first seen time and first confirmed block. https://github.com/weex/bitcoin-fee-distribution/commit/f80e1cf41b1d533544470758f85ce5f4af8c102a

Should get some better fee estimates out of this soon.
legendary
Activity: 3248
Merit: 1072
Is the average tx size increasing over time? Because I remember it was 300 bytes before; this also leads to more fees, of course.

The average tx size has changed over 8 years, e.g. in 2009 it was under 250 bytes and it has grown over time.

The OP's data in the spreadsheet only covers 1m txs, which is well under 500 blocks = less than a week of data, so it's not going to reveal much long-term change, just short-term.

I did a few selective averages:
0-335k txs = 447 bytes
335k-666k txs = 473 bytes
666k-1m txs = 454 bytes

and they all come in around 447-473 bytes.

This is what I was talking about, and the only explanation is that people are receiving many small transactions and sending a few big ones, and doing this will only increase the fee.
legendary
Activity: 1102
Merit: 1014
Another pic with better axes: https://i.imgur.com/FGGBYpe.png

Code:
tx <- read.csv(file="confirmation_times.csv",sep=",",head=TRUE)
plot(tx$fee_rate,tx$conf_time, log="xy", yaxt="n", xaxt="n", pch = 20, cex=0.05)
marks <- c(0,60,600,3600,86400)
axis(2,at=marks,labels=marks)
xmarks <- c(1,5,10,20,100,300,1000,5000)
axis(1,at=xmarks,labels=xmarks)
legendary
Activity: 4424
Merit: 4794
Personally I don't believe this data. Why?
Because if it were true that the average tx confirmation is 40 min, that would be a pure failure of bitcoin's design of 10-minute confirmations.
The data could be badly calculated because of edge cases vastly different from the other data being included in the calculation.

Firstly, the 10-minute expectation is not actually a bitcoin rule;
no tx is guaranteed to be accepted within 10 minutes.

The rule in bitcoin is that difficulty adjusts so that 2016 blocks are produced in roughly a fortnight. There is no rule that forces a tx into a block, and no rule that forces any particular number of txs per block either, so bitcoin could have empty blocks forever and still meet its protocol rules.
The 10 minutes per block comes from dividing 2 weeks by 2016 blocks.


Now let's get into the details of the transactions.

An average block can only hold ~2,200 txs, or a max of 1MB of data.
Check the mempool count: yep, more than 2,200 txs or 1MB waiting most of the time.
https://blockchain.info/unconfirmed-transactions - about 3 blocks' worth of txs waiting at the time of writing this post = ~30 min wait for some txs (using velkro's very simplistic time view of bitcoin confirms)

Yes, sometimes the mempool count can be low and all txs get into a block promptly. Other times it can take a couple of blocks or up to an hour, depending on demand and other criteria.

I'd say the data is not bad; it's actually accurate. You just have to understand the context of the data.



So here is the context.

Not everyone pays an excess/top fee to be first in line. Some pay the minimum fee and then get outbid by 2,200 others who pay slightly more, so the minimum payer is left waiting lower in the queue.

And yep, some pay no fee at all, meaning they won't get accepted for hours. It's things like paying the minimum or no fee that push the range of times out, which then affects the average time.

I'm sure people can go through the data and selectively delete the zero-fee txs, but getting selective/creative over which txs are deemed worthy of being part of someone's expectations is what starts being 'manipulative' and causing bad data.



Edit:
I just checked the couple hundred txs with 0 fee; the average confirm time was 5 hours 7 minutes 37 seconds.

The funny part is there were even txs that paid over $100 at the time (last week the BTC price was over $1k/btc), 0.1 BTC in fees, and the average for these big spenders was 11 mins 32 secs (they basically bribed their way to the top of the list) with silly huge fees. A rough sketch of how to pull these averages from the CSV follows the txid list below.

txid: 61d9e2841e462f0a73668bf37601f2c021e9a90a3810ef654a21063b5722840a
txid: 3a5546217b76ae91f0fd113dc7f8c863fd9099ab62abedaa71f2a856ccd48d6f
txid: 2fce0c36505aece2fa77df5f3bc02cf7d5ffe5231e7a41597b51d4a2ffb61383
txid: 3ea07465f19e188535766c1d4f60b6b5b968294212a5778b65ed13889c753636
txid: 0c48281a819ca34ae837297e1ece737dc779d7eee0025c8a46e4e87fc6658696
txid: d8d194e4ae415323a90a56cd999e2e7cca9dfe13258a4e82469223b8f2fbbc8c
txid: d14d0ddbdc269ebe09174a9d02e85b13c9ae97fc27e6e57cba196ef7c46ddab3
txid: 99901d44db56788b74999c0f6b4f3bc1c960fec61761b447be878ae1721ec6e4
txid: e78f3045c8348ff5882da99c0f8555294d889b641c820c82cc6f3ab62df103b0
txid: f6bf8c706fabb59489b1152cac038c69dbd565bd23dfed35d8131a08a655b846
txid: 415856aefeb42a6050abb8ec9b66b8a3688fd24632f36299b5deebf1a16e5c85
txid: e9ad2d09ec5de999b723ce8e74667243ccb7656ce09920c2b76fd91dea8b89ff
txid: 5ce70be2cf3163fad5192daf8356f9819f79622894c510c192929e78a714b332
txid: adf3dccfa9b2e24a5dfe7c997e927f6ac6865780c78f8afed388f34603883033
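
A minimal sketch of how averages like those can be pulled from the CSV, assuming the 'fee' (satoshis) and 'conf_time' (seconds) columns used by the Python script elsewhere in this thread:

Code:
import pandas as pd

tx = pd.read_csv('confirmation_times.csv')

zero_fee = tx[tx['fee'] == 0]               # transactions that paid no fee
big_spenders = tx[tx['fee'] >= 10000000]    # 0.1 BTC or more in fees, in satoshis

print('zero-fee txs:', len(zero_fee),
      'average wait:', pd.to_timedelta(zero_fee['conf_time'].mean(), unit='s'))
print('0.1+ BTC fee txs:', len(big_spenders),
      'average wait:', pd.to_timedelta(big_spenders['conf_time'].mean(), unit='s'))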

legendary
Activity: 2296
Merit: 1014
Based on the million txs (mine stopped at 1,048,575 results):

average tx confirmed in 40 mins 30 secs
average tx size: 458 bytes
average tx fee: 36,892-41,750 sat (depending on whether you include or exclude the 0-fee txs in the average)

average fee per byte: 91 sat/byte
max fee per byte in range: 34,883 sat/byte
min fee per byte in range: 0
-- as for the max fee, either the source data has an error or someone last week paid a LOT for one of their transactions

max tx size: 98,888 bytes (98.9KB)
min tx size: 170 bytes
-- as for the max bytes, either the source data has an error or someone last week had a near-99KB tx (filling 10% of a block with 1 tx)
Personally I don't believe this data. Why?
Because if it were true that the average tx confirmation is 40 min, that would be a pure failure of bitcoin's design of 10-minute confirmations.
The data could be badly calculated because of edge cases vastly different from the other data being included in the calculation.
legendary
Activity: 1102
Merit: 1014
Made a scatterplot with R:  http://imgur.com/cPkJ6tq

The code to make this is:

Code:
tx <- read.csv(file="confirmation_times.csv",sep=",",head=TRUE)
plot(tx$fee_rate,tx$conf_time, log="xy", pch = 20, cex=0.05)

In the plot command, both axes are set to log scale, pch=20 means draw a dot, and cex=0.05 scales it down so it's about a pixel.
legendary
Activity: 4424
Merit: 4794
Is the average tx size increasing over time? Because I remember it was 300 bytes before; this also leads to more fees, of course.

The average tx size has changed over 8 years, e.g. in 2009 it was under 250 bytes and it has grown over time.

The OP's data in the spreadsheet only covers 1m txs, which is well under 500 blocks = less than a week of data, so it's not going to reveal much long-term change, just short-term.

I did a few selective averages:
0-335k txs = 447 bytes
335k-666k txs = 473 bytes
666k-1m txs = 454 bytes

and they all come in around 447-473 bytes.

Is the average tx size increasing over time? Because I remember it was 300 bytes before; this also leads to more fees, of course.
If I remember correctly the size of a tx is only based on how many inputs you receive and some bytes from the output.
Does this mean that many are doing a few big transactions and receiving many small ones? Correct?

Using old legacy transactions:
(148 * inputs used) + (34 * outputs used), +/- 10 bytes variance = tx size estimate
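
A quick sketch of that estimate as a helper function, purely for illustration (the function name is made up here):

Code:
def estimate_legacy_tx_size(inputs_used, outputs_used):
    # (148 bytes per input) + (34 bytes per output), give or take ~10 bytes of variance
    return 148 * inputs_used + 34 * outputs_used

print(estimate_legacy_tx_size(2, 2))   # ~364 bytes, +/- ~10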

As for multisigs, that's a whole different calculation to work out the bytes of a tx, as there are more variables involved.
I'm sure someone else has found a workable calculation for multisigs,
but to answer your question: multisigs do use more bytes per tx if you compare a 2-in 2-out multisig to a 2-in 2-out legacy tx.

Oh, and let's not forget LN settlements, which will also include extra bytes for CLTV and CSV data and will bloat a tx even if it's still just 2-in 2-out.
Yep, segwit suggests more tx space, but LN settlements then refill that space with larger txs.
full member
Activity: 224
Merit: 117
legendary
Activity: 1274
Merit: 1004
Well, good work. 80MB is a pretty big set to calculate, but the more the better; we're going to get some good calculations and stats out of this soon. It's a different matter that now I have to wait another hour before going to bed. I'm excited to see all those collected reports, which are going to reveal many questions.
legendary
Activity: 1102
Merit: 1014
Someone is trying to combine dust transactions into one address. All the inputs are less than a milli. If the average fee rate is 0.91μBTC/byte, and each input contributes 180 bytes, addresses with less than 0.16 mBTC are useless dust.

As for the data, can someone please make a scatter-plot of time vs. fee rate?
This should not be too difficult to do with Excel; I tried but could not get it to work.
This must be done outside a spreadsheet, as none of those handle more than 64k records well. Pyplot is set up to do some graphing in the repo that collected this data, but maybe someone wants to attack this with R?
full member
Activity: 224
Merit: 117
Much of the spam in the blockchain comes from the following sources:
1. Faucets
2. Gambling
3. Dust change addresses
Faucets are a pathetic way to make bitcoin; I know this from personal experience. However, they serve two purposes for newbies. They fulfill a newbie's need to experiment with addresses and transactions, and, to a newbie, it is quite exciting to get their first chunk of bitcoin (personal experience). The first need can be satisfied by testnet coins, but the second is harder to eliminate, and is likely the main reason faucets are so prevalent.
Gambling spams up the blockchain and provides entertainment (honourable enough) and the potential for addiction and large losses (this ruins many people's lives).

Dust change addresses, however, are a problem that can be reduced. Say your wallet drafts a transaction using up several outputs to produce a payment. Most of the transaction goes into the payment output, but a small amount is left over.
Adding another output costs only about 0.03094 mBTC, but spending that output later costs about 0.1638 mBTC, for a total of about 0.2 mBTC.
The fix I suggest: when the leftover is too small (less than, for example, 1 mBTC), the wallet should overpay the fee instead of creating a change output.
This should certainly not be a problem for the recipient, and it would reduce blockchain spam.
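
A small sketch of that trade-off, using this thread's rough numbers (~34 bytes to add an output now, ~180 bytes to spend it later, ~91 sat/byte average fee rate); the helper name and the 1 mBTC threshold come from the suggestion above and are not any wallet's actual behaviour:

Code:
FEE_RATE = 91        # sat/byte, rough average from this data set
OUTPUT_BYTES = 34    # extra bytes to add a change output now
INPUT_BYTES = 180    # extra bytes to spend that output later

def change_worth_creating(leftover_sat, threshold_sat=100000):
    # Lifetime cost of a change output: create it now plus spend it later (~19,474 sat).
    lifetime_cost = FEE_RATE * (OUTPUT_BYTES + INPUT_BYTES)
    # Only create change if the leftover clears both the fee cost and the 1 mBTC threshold.
    return leftover_sat > max(lifetime_cost, threshold_sat)

print(change_worth_creating(50000))    # False: cheaper to just overpay the fee
print(change_worth_creating(500000))   # True: worth creating a change output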
legendary
Activity: 3038
Merit: 4418
Crypto Swap Exchange
Is the average tx size increasing over time? Because I remember it was 300 bytes before; this also leads to more fees, of course.
No. If you use compressed keys, then for a transaction with 1 input and 1 output that would be quite near the size you would get.
If I remember correctly the size of a tx is only based on how many inputs you receive and some bytes from the output.
Yes. Each output should occupy about 34 bytes. It depends on how many inputs you spend and how many UTXOs you create.
Does this mean that many are doing a few big transactions and receiving many small ones? Correct?
You can't assume that.