Author

Topic: The fastest way to download blockchain data in csv format (Read 266 times)

sr. member
Activity: 310
Merit: 727
---------> 1231006505
It sounds like XY problem, why do you specifically need those datas in csv format? If you're willing to share what you're trying to solve, some member might able to give better solution.

I want to collect all inputs and outputs of transactions with needed parameters. My idea is to create something like walletexplorer but with different heuristics. I have aws server with btc node. This data could be inother formats like json, xls. I could always convert it.
You could always interact with the full node you are running, just be sure to set the  txindex=1 value.

You can create your own program to do something along the lines like this:
Code:
#get hash of block you want to parse (600000 in this example:
bitcoin-cli getblockhash 600000
Returns value: 00000000000000000007316856900e76b4f7a9139cfbfba89842c8d196cd5f91

#get all txids in the block
bitcoin-cli getblock 00000000000000000007316856900e76b4f7a9139cfbfba89842c8d196cd5f91
Returns json object with the hashes of all transactions in that block in "tx"

#Loop through every value found, in this example I chose the 10th txid, use that to get the rawtransaction and in turn decode it:
bitcoin-cli getrawtransaction 72ff0008adcf3100a20e76333554d2fb9341820266456c4aeb089d328a001a5d
Returns the raw transaction, decode it in the next step:
bitcoin-cli decoderawtransaction 010000000001025a6d6ab8d1142df6709e04cb649a580831280f266d65ea5a052f89987e825cb30700000023220020fdab9a48cf9e636fc3a352486cb4ea2e3209eefe560b7950c18c06a41624dd75ffffffff9335593b4b4d923d0e809c07d2221670f1b4a46a8d64227b55185f889d6cb0ef00000000232200209363ea25f0c3fdf47b9c76ab0fc21bd44e910a504ff748d33d265908842b9ab9ffffffff02e09da3010000000017a914f975e09c756c209b6e9d7031a8a330a91f3fb20a8744d503000000000017a914a6c3f581c093cabe4644c948ab3ff8f966d57d588704004830450221009733ddafbe721c82aaf78aba6b9ba18d6a063e5393882e8073e93f5c22c75aef022023038d94ea67c5eb4d1a75eebda5b256cf0d934e20cc723b8ceb9a4c152cf73a014730440220789266c818bb2ecc9d0f1836ab7bb0179df4abc5f772b8cc96b7778eaeef3d3902201756a2a86b23041397eb7e3c2059156c9b0f037a066305ae0529d739049ab5620147522103607df524b5afee15ece0e87b362fa66a698488919dac93faf11b27273d2b224121030adf9b1d50dfe31367f033923458d5b2488d2ee92d7c806411b517eb021d960352ae040047304402205924875cd609cba315ba70a1edd7b17eac181a4fe5eab9e48884219ca453fcd7022001b6e89f91cfb7427f0aefb13e77ca3da84f611d07bbff7cdc4dd58fb705a435014730440220386974a306a111da6bb22256de3defe911d936a98faa0d0a47616fed948c35e202207f1459c69f4b60d821ab88df5baf3ba019ae7894faba2fd349eb70dec7a5cbb50147522102a03bb11024359133088573cf644941d90451fcd4a71375556735fb42c6ea0b2a21030adf9b1d50dfe31367f033923458d5b2488d2ee92d7c806411b517eb021d960352ae00000000

Returns this JSON:
{
  "txid": "72ff0008adcf3100a20e76333554d2fb9341820266456c4aeb089d328a001a5d",
  "hash": "5c5383609de5c12519f330fa6751326a5b99d799b96a3d9db879d835f15d948e",
  "version": 1,
  "size": 665,
  "vsize": 336,
  "weight": 1343,
  "locktime": 0,
  "vin": [
    {
      "txid": "b35c827e98892f055aea656d260f283108589a64cb049e70f62d14d1b86a6d5a",
      "vout": 7,
      "scriptSig": {
        "asm": "0020fdab9a48cf9e636fc3a352486cb4ea2e3209eefe560b7950c18c06a41624dd75",
        "hex": "220020fdab9a48cf9e636fc3a352486cb4ea2e3209eefe560b7950c18c06a41624dd75"
      },
      "txinwitness": [
        "",
        "30450221009733ddafbe721c82aaf78aba6b9ba18d6a063e5393882e8073e93f5c22c75aef022023038d94ea67c5eb4d1a75eebda5b256cf0d934e20cc723b8ceb9a4c152cf73a01",
        "30440220789266c818bb2ecc9d0f1836ab7bb0179df4abc5f772b8cc96b7778eaeef3d3902201756a2a86b23041397eb7e3c2059156c9b0f037a066305ae0529d739049ab56201",
        "522103607df524b5afee15ece0e87b362fa66a698488919dac93faf11b27273d2b224121030adf9b1d50dfe31367f033923458d5b2488d2ee92d7c806411b517eb021d960352ae"
      ],
      "sequence": 4294967295
    },
    {
      "txid": "efb06c9d885f18557b22648d6aa4b4f1701622d2079c800e3d924d4b3b593593",
      "vout": 0,
      "scriptSig": {
        "asm": "00209363ea25f0c3fdf47b9c76ab0fc21bd44e910a504ff748d33d265908842b9ab9",
        "hex": "2200209363ea25f0c3fdf47b9c76ab0fc21bd44e910a504ff748d33d265908842b9ab9"
      },
      "txinwitness": [
        "",
        "304402205924875cd609cba315ba70a1edd7b17eac181a4fe5eab9e48884219ca453fcd7022001b6e89f91cfb7427f0aefb13e77ca3da84f611d07bbff7cdc4dd58fb705a43501",
        "30440220386974a306a111da6bb22256de3defe911d936a98faa0d0a47616fed948c35e202207f1459c69f4b60d821ab88df5baf3ba019ae7894faba2fd349eb70dec7a5cbb501",
        "522102a03bb11024359133088573cf644941d90451fcd4a71375556735fb42c6ea0b2a21030adf9b1d50dfe31367f033923458d5b2488d2ee92d7c806411b517eb021d960352ae"
      ],
      "sequence": 4294967295
    }
  ],
  "vout": [
    {
      "value": 0.27500000,
      "n": 0,
      "scriptPubKey": {
        "asm": "OP_HASH160 f975e09c756c209b6e9d7031a8a330a91f3fb20a OP_EQUAL",
        "hex": "a914f975e09c756c209b6e9d7031a8a330a91f3fb20a87",
        "reqSigs": 1,
        "type": "scripthash",
        "addresses": [
          "3QS3PaF9Z9jcTaULo91bq4BeU8Ps868X2W"
        ]
      }
    },
    {
      "value": 0.00251204,
      "n": 1,
      "scriptPubKey": {
        "asm": "OP_HASH160 a6c3f581c093cabe4644c948ab3ff8f966d57d58 OP_EQUAL",
        "hex": "a914a6c3f581c093cabe4644c948ab3ff8f966d57d5887",
        "reqSigs": 1,
        "type": "scripthash",
        "addresses": [
          "3Gtnqrjt11dKZXyxAyY3qhMecbfU5bKD31"
        ]
      }
    }
  ]
}

You can then grab all the data you want from the JSON-file and store it in a way you like. BTW: Inputs are in the "vin"-block and outputs are in the "vout"-block.
jr. member
Activity: 68
Merit: 1
It sounds like XY problem, why do you specifically need those datas in csv format? If you're willing to share what you're trying to solve, some member might able to give better solution.

I want to collect all inputs and outputs of transactions with needed parameters. My idea is to create something like walletexplorer but with different heuristics. I have aws server with btc node. This data could be inother formats like json, xls. I could always convert it.
HCP
legendary
Activity: 2086
Merit: 4316
Theoretically, it should be possible to sync up a full node (with indexing switched on)... and then write (or possibly find) a block parser that can read through the raw block files and extract the data you require.

Or instead of parsing the block files, you could try and run one of the open-source "block explorers" and then use the API to query your local block explorer to retrieve the block/transaction data and output as CSV.

Honestly not sure which way would be quickest/easiest... although have the node setup and synced along with block explorer would be "reusable" if you needed to update the data etc.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I want to download all transactions with input and output addresses.
You can use Blockchair Database Dumps to download all transactions, inputs and outputs. There's a catch though: you'll only get 100 kB/s. The last one ("outputs") is 106 GB in size, and I've downloaded it already. I think "inputs" is 3-4 times larger, so around 350 GB. That's going to take 40 days to download at that speed.
I'd love to add this data to my "collection" too, but I don't have the server space for it. Update: See Bitcoin block data (728 GB): inputs, outputs and transactions.

I only tried bitcoin-etl (https://github.com/blockchain-etl/bitcoin-etl) but I calculated that it takes more than one month to download it and save to small csv files (e.g. 50-150 blocks in one).
A fast cloud server might help. Converting small CSV files into larger files shouldn't be much of a problem.

Can you share what you're trying to do with this data?



jr. member
Activity: 68
Merit: 1
I wonder how to download bitcoin data to csv files. I would like to get all blocks, transactions in csv files. What is the fastest way to achieve this?

Been there, done that. Trust me unless your searching skill is way better mine then LoyceV's blockdata will be your best source, alternatively, you can download the "Blockchair Database Dumps" from blockchair, but if you don't have a script/bot to do that for you, then you will need to manually download 10 years worth of DAILY data, assuming we never had a whole day without finding a block then you are going to download about 3650 files, so I suggest you stick to the former option.

Keep in mind that LoyceV has that data in a .txt file, so for you to convert it to CSV you going to need a workaround, and the easiest one would be using MS Excel.


It is really great work by LoyceV but it hasn't resolved my problem. I want to download all transactions with input and output addresses. I only tried bitcoin-etl (https://github.com/blockchain-etl/bitcoin-etl) but I calculated that it takes more than one month to download it and save to small csv files (e.g. 50-150 blocks in one).  Do you know something else ?
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
newer versions of Windows (although i don't use them myself) have added the functionality 2 years ago.
Wow! Let me quote microsoft.com:
Quote
For many years, Windows Notepad only supported text documents containing Windows End of Line (EOL) characters
~snip~
Today, we’re excited to announce that we have fixed this issue!
So I was wrong, it didn't take 37 years, but only 35 years. Such progress Cheesy
HCP
legendary
Activity: 2086
Merit: 4316
...but the newer versions of Windows (although i don't use them myself) have added the functionality 2 years ago.
Wouldn't you know it... I didn't even look in notepad... hahaha



It even happily tells you that it has Unix EOL encoding! Been so long since I actually bothered using notepad Tongue Roll Eyes


legendary
Activity: 3472
Merit: 10611
I don't use Windows, but I'm amazed after 37 years notepad still can't process Linux new lines. Last time I checked (years ago), Wordpad could handle it.

the screenshot above looks like Windows 7 which only supported the default windows new line (\r\n) which is not a bad thing IMO but they added the support for Unix specific new line encoding (\n), but the newer versions of Windows (although i don't use them myself) have added the functionality 2 years ago.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
It shows up fine in Notepad++... but by displaying the "hidden" whitespace, we can see it only has LF or "Line Feed" characters for "End of Line"... this is a Unix/Linux standard. However, Windows (and notepad.exe) uses "CR LF"... aka "Carriage Return, Line Feed".
I don't use Windows, but I'm amazed after 37 years notepad still can't process Linux new lines. Last time I checked (years ago), Wordpad could handle it.
I could easily convert the files to "Windows new lines", but that would mean adding even more files. id.txt is the only file without actual commas in it.
HCP
legendary
Activity: 2086
Merit: 4316
That's the individual "column" files... which aren't exactly CSV[1], as they're really just a single column of data... and that's an "End Of Line" separator issue.


It shows up fine in Notepad++... but by displaying the "hidden" whitespace, we can see it only has LF or "Line Feed" characters for "End of Line"... this is a Unix/Linux standard. However, Windows (and notepad.exe) uses "CR LF"... aka "Carriage Return, Line Feed".

Hence, the file not displaying correctly in notepad.exe


Here is how it looks after importing it to excel and exporting it as .csv
I suspect that Excel simply changed the EOL to CRLF so that it shows "correctly" in notepad.exe


[1] - Technically, they are valid according to csv specs
legendary
Activity: 2268
Merit: 6424
be constructive or S.T.F.U
No need to convert anything... a .csv is a plaintext file... it's just got the .csv extension to indicate that the data contained in the text file is in "comma separated value" format... ie. it is just a bunch of rows/columns where the column delimiter is "," and, optionally, the first row is column headers.

That's a good point but the plaintext in these files is not separated by a comma, will that still work in all csv viewers? Excel does recognize it as a file that has eliminated data in it, it also gives you the option of "My data has headers".

Here is how the .txt file looks by default:



Here is how it looks after importing it to excel and exporting it as .csv




Simply changing the extension from .txt to .csv didn't change the way the data is represented in Notepad, so I am not sure different programs deal with this stuff.


HCP
legendary
Activity: 2086
Merit: 4316
Keep in mind that LoyceV has that data in a .txt file, so for you to convert it to CSV you going to need a workaround, and the easiest one would be using MS Excel.
Or you could simply rename the file from blockdata.txt to blockdata.csv Roll Eyes Roll Eyes Roll Eyes

No need to convert anything... a .csv is a plaintext file... it's just got the .csv extension to indicate that the data contained in the text file is in "comma separated value" format... ie. it is just a bunch of rows/columns where the column delimiter is "," and, optionally, the first row is column headers.
legendary
Activity: 2268
Merit: 6424
be constructive or S.T.F.U
I wonder how to download bitcoin data to csv files. I would like to get all blocks, transactions in csv files. What is the fastest way to achieve this?

Been there, done that. Trust me unless your searching skill is way better mine then LoyceV's blockdata will be your best source, alternatively, you can download the "Blockchair Database Dumps" from blockchair, but if you don't have a script/bot to do that for you, then you will need to manually download 10 years worth of DAILY data, assuming we never had a whole day without finding a block then you are going to download about 3650 files, so I suggest you stick to the former option.

Keep in mind that LoyceV has that data in a .txt file, so for you to convert it to CSV you going to need a workaround, and the easiest one would be using MS Excel.
full member
Activity: 173
Merit: 120
jr. member
Activity: 68
Merit: 1
Yes, I plan to split it into many smaller files. I am looking for the best solution to quickly download this.

I know blocksci but I am not sure that it could help me to do this in my way
legendary
Activity: 4172
Merit: 8075
'The right to privacy matters'
Hi,

I wonder how to download bitcoin data to csv files. I would like to get all blocks, transactions in csv files. What is the fastest way to achieve this?
I heard about bitcoin-etl and blocksci. Is it possible with both of them ?

It is a pretty big file I would think you need to spilt it into smaller ones.


this is a pdf of blocksci white paper

https://arxiv.org/pdf/1709.02489.pdf

do you read it? it may give you some insight to your question.
jr. member
Activity: 68
Merit: 1
Hi,

I wonder how to download bitcoin data to csv files. I would like to get all blocks, transactions in csv files. What is the fastest way to achieve this?
I heard about bitcoin-etl and blocksci. Is it possible with both of them ?
Jump to: