
Topic: Bitcoin block data (728 GB): inputs, outputs and transactions - page 2.

legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Ethereum data

I've added ethereumdata.loyce.club:
ethereumdata.loyce.club/calls/ (329 GB)
ethereumdata.loyce.club/erc-20_transactions/ (363 GB)
ethereumdata.loyce.club/transactions/ (69 GB)

I needed this data from gz.blockchair.com/ethereum/ for a project (which I abandoned before my download completed). Since I now have the data, I'll share it (downloading 762 GB at 100 kB/s took forever: about 3 months).

Updates
New files are added daily.

Disclaimer
I don't like Ethereum, which is a centralized shitcoin that abandoned its one unique selling point ("code is law") the moment it was convenient for the creator. Don't waste your money on it!
And since it's a shitcoin, I don't think it deserves its own topic.

Missing files
Those 2 files are missing from my mirror because they're corrupted:
https://gz.blockchair.com/ethereum/calls/blockchair_ethereum_calls_20200113.tsv.gz
https://gz.blockchair.com/ethereum/calls/blockchair_ethereum_calls_20211110.tsv.gz

I sent Blockchair an email about it 3 months ago, but the response I got was this:
Quote
We've checked and haven't detected the issue with the file.
If anyone has the tools to rebuild those 2 files, I'd love to add them!
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I stumbled upon something peculiar. Take blockchair_bitcoin_outputs_20220203.tsv.gz for example:
Code:
721577  f2ec8c7f07725959014613a5cf04dde4cf3079c8948bc011298479e751935fc3        11      2022-02-03 01:43:54     51200000        18970.623       bc1qt2kc82kr0wdyyyqns7qyvl377dap69ygzkpwmc      witness_v0_scripthash   00145aad83aac37b9a4210138780467e3ef37a1d1488    0       -1
721577  f2ec8c7f07725959014613a5cf04dde4cf3079c8948bc011298479e751935fc3        11      2022-02-03 01:43:54     51200000        18970.623       bc1qt2kc82kr0wdyyyqns7qyvl377dap69ygzkpwmc      witness_v0_scripthash   00145aad83aac37b9a4210138780467e3ef37a1d1488    0       -1
721577  f2ec8c7f07725959014613a5cf04dde4cf3079c8948bc011298479e751935fc3        11      2022-02-03 01:43:54     51200000        18970.623       bc1qt2kc82kr0wdyyyqns7qyvl377dap69ygzkpwmc      witness_v0_scripthash   00145aad83aac37b9a4210138780467e3ef37a1d1488    0       -1
721577  f2ec8c7f07725959014613a5cf04dde4cf3079c8948bc011298479e751935fc3        12      2022-02-03 01:43:54     51200000        18970.623       bc1qx9jm85e08jasw75g0drr7y2x9xx45xck6xvxhe      witness_v0_scripthash   00143165b3d32f3cbb077a887b463f1146298d5a1b16    0       -1
721577  f2ec8c7f07725959014613a5cf04dde4cf3079c8948bc011298479e751935fc3        13      2022-02-03 01:43:54     51200000        18970.623       bc1qmx0eraunssvj9ukel8m40cpt8ez4wxj4t2jn4q      witness_v0_scripthash   0014d99f91f793841922f2d9f9f757e02b3e45571a55    0       -1
721577  f2ec8c7f07725959014613a5cf04dde4cf3079c8948bc011298479e751935fc3        13      2022-02-03 01:43:54     51200000        18970.623       bc1qmx0eraunssvj9ukel8m40cpt8ez4wxj4t2jn4q      witness_v0_scripthash   0014d99f91f793841922f2d9f9f757e02b3e45571a55    0       -1
721577  f2ec8c7f07725959014613a5cf04dde4cf3079c8948bc011298479e751935fc3        13      2022-02-03 01:43:54     51200000        18970.623       bc1qmx0eraunssvj9ukel8m40cpt8ez4wxj4t2jn4q      witness_v0_scripthash   0014d99f91f793841922f2d9f9f757e02b3e45571a55    0       -1
721577  f2ec8c7f07725959014613a5cf04dde4cf3079c8948bc011298479e751935fc3        14      2022-02-03 01:43:54     51200000        18970.623       bc1qr4jgu3t5fnjrcux646kssfmavsw5zftmj4tsc6      witness_v0_scripthash   00141d648e45744ce43c70daaead08277d641d41257b    0       -1
721577  f2ec8c7f07725959014613a5cf04dde4cf3079c8948bc011298479e751935fc3        15      2022-02-03 01:43:54     51200000        18970.623       bc1qw6dnjhw8qjyxtszn950l5zh57x60tm5lcsdtkr      witness_v0_scripthash   0014769b395dc7048865c0532d1ffa0af4f1b4f5ee9f    0       -1
721577  f2ec8c7f07725959014613a5cf04dde4cf3079c8948bc011298479e751935fc3        15      2022-02-03 01:43:54     51200000        18970.623       bc1qw6dnjhw8qjyxtszn950l5zh57x60tm5lcsdtkr      witness_v0_scripthash   0014769b395dc7048865c0532d1ffa0af4f1b4f5ee9f    0       -1
721577  f2ec8c7f07725959014613a5cf04dde4cf3079c8948bc011298479e751935fc3        15      2022-02-03 01:43:54     51200000        18970.623       bc1qw6dnjhw8qjyxtszn950l5zh57x60tm5lcsdtkr      witness_v0_scripthash   0014769b395dc7048865c0532d1ffa0af4f1b4f5ee9f    0       -1
There are many duplicated lines! It doesn't matter much for the compressed file size, but if I remove them, the number of lines drops from 1,046,856 to 879,157 (a difference of 167,699)!
I checked more archives: the older ones have only a few duplicate lines, the newer archives have tens or hundreds of thousands of duplicates.
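For anyone who wants to reproduce the count, a minimal sketch (the input filename is the example above; the output filenames are just placeholders):
Code:
# Count total vs. unique lines in one outputs dump (header stripped first)
gunzip -c blockchair_bitcoin_outputs_20220203.tsv.gz | grep -v block_id > outputs.tsv
wc -l < outputs.tsv
sort outputs.tsv | uniq | wc -l
# Write a de-duplicated copy that keeps the original line order
awk '!seen[$0]++' outputs.tsv > outputs_dedup.tsv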

What could be the reason? And worse: it also makes me wonder if other entries could be missing.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Hi, what's the state of ETH addresses dump? Is it uploaded already?
All funded Ethereum addresses: ethereumdata.loyce.club/blockchair_ethereum_addresses_latest.tsv.gz (2.1 GB). I downloaded this from gz.blockchair.com/ethereum/addresses/ on July 21, 2022. There's currently no update available.
hero member
Activity: 1659
Merit: 687
LoyceV on the road. Or couch.
Hi, what's the state of ETH addresses dump? Is it uploaded already?
I have it, updated until June only, but it's not online yet. And I'm currently sailing, so I can't access it.
The data I meant is the full Ethereum transaction data, about 800 GB. That will take my home internet a while to upload.

Quote
Do you want to talk about doing it in DB?
I'm still a total noob, but it would be good to learn.
legendary
Activity: 952
Merit: 1386
Update: the server is currently offline for an upgrade Cheesy With more disk space, I can add Ethereum data from Blockchair (which I have locally already), and later Dogecoin data too.

Hi, what's the state of ETH addresses dump? Is it uploaded already?

Problem:
There's a lot of data, and I don't do databases. I was actually running out of disk space to sort the data, so this upgrade came at the right time.

Do you want to talk about doing it in DB?
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I just got a very nice Xeon-powered dedicated server (no more VPS!) from an anonymous donation, so I'm covered for now.
Update: the server is currently offline for an upgrade Cheesy With more disk space, I can add Ethereum data from Blockchair (which I have locally already), and later Dogecoin data too.



I've been playing with an idea: I want to make a graph of funded (potential) ChipMixer chips over the years, with daily data.

Assumptions:
I'll start by looking for addresses that received 0.512 BTC. I'll exclude all addresses that received more than one transaction (ever), and I'll count the chips from the day they were funded until the day they're emptied. In Bitcoin's early years, before ChipMixer even existed, many potential chips were emptied again the same day. Those won't be counted.
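A rough, untested sketch of the first step, assuming the /outputs/ dump layout shown earlier (value in satoshi in column 5, recipient address in column 7). It only catches repeats among the 0.512 BTC outputs, so a full run would still need to check the other dumps for additional funding and for the spend dates:
Code:
# Collect date and address of every output of exactly 0.512 BTC (51,200,000 satoshi)
for f in outputs/blockchair_bitcoin_outputs_*.tsv.gz; do
  gunzip -c "$f" | awk -F'\t' '$5 == 51200000 {print substr($4,1,10) "\t" $7}'
done > candidate_chips.tsv
# Addresses that appear exactly once among these outputs (first pass at "funded only once")
cut -f2 candidate_chips.tsv | sort | uniq -u > single_funded_candidates.txt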

Problem:
There's a lot of data, and I don't do databases. I was actually running out of disk space to sort the data, so this upgrade came at the right time.
legendary
Activity: 952
Merit: 1386
That's exactly what I was thinking about.
Is it? It doesn't have the block hashes you asked for, only txids and block numbers.

Yes, I was just stunned by the amount of data to download. I only need txids.
With block hashes it is indeed simpler.
(You may delete the file.)
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Perfect, great, fantastic, thank you!
That's exactly what I was thinking about.
Is it? It doesn't have the block hashes you asked for, only txids and block numbers.

Quote
I owe you beer & frites.
No worries, I've had my fair share today already Cheesy
legendary
Activity: 952
Merit: 1386
See: blockdata.loyce.club/PawGo.tsv or blockdata.loyce.club/PawGo.tsv.gz. It's 27 GB now; hashes don't compress very well. It's scheduled to be deleted in 7 days.

Quote
Otherwise I would have to download packs from http://blockdata.loyce.club/, decompress them and parse the tsv files (which seems to be the simplest solution) Wink
That's what I would do Smiley

Perfect, great, fantastic, thank you!
That's exactly what I was thinking about. I owe you beer & frites.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Would it be possible to prepare a single file with:
- all block hashes
- all transaction IDs

Or maybe you have such a script I could execute on a full node.
What format are you looking for? Just a long list of hashes, or do you need to know which txids belong to which block?
I think the easiest way is to get the data from /transactions/ (55 GB) and /blocks/.

I can save you a 55 GB download if you tell me the format you want.
To test, this is running now in /transactions/:
Code:
for file in `ls`; do echo "Now processing $file"; gunzip -c $file | grep -v 'block_id' | cut -f1-2 >> ../PawGo.tsv; done; gzip ../PawGo.tsv
See: blockdata.loyce.club/PawGo.tsv or blockdata.loyce.club/PawGo.tsv.gz. It's 27 GB now; hashes don't compress very well. It's scheduled to be deleted in 7 days.
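For example, with the two-column layout the loop above produces (block_id, then transaction hash), pulling all txids of one block from the uncompressed file is a one-liner (721577 is just the block from the earlier example):
Code:
# All transaction hashes in block 721577
awk -F'\t' '$1 == 721577 {print $2}' PawGo.tsv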

Quote
Otherwise I would have to download packs from http://blockdata.loyce.club/, decompress them and parse the tsv files (which seems to be the simplest solution) Wink
That's what I would do Smiley

Quote
Do you plan to back up the /blocks/ folder from Blockchair?
No need: these files are tiny, so downloading them from Blockchair directly shouldn't take too long anyway.
legendary
Activity: 952
Merit: 1386
Hi Loyce,

Would it be possible to prepare a single file with:
- all block hashes
- all transaction IDs

Or maybe you have such a script I could execute on a full node. Otherwise I would have to download packs from http://blockdata.loyce.club/, decompress them and parse the tsv files (which seems to be the simplest solution) Wink
Do you plan to back up the /blocks/ folder from Blockchair?
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
In the list of funded addresses there is an unnecessary "address" line (a stray column header) between the "3..." and "bc..." addresses, at line 31808561. It looks like you had separate lists and concatenated them.
Thanks! I'll take this to my other topic.
legendary
Activity: 952
Merit: 1386
In the list of funded addresses there is an unnecessary "address" line (a stray column header) between the "3..." and "bc..." addresses, at line 31808561. It looks like you had separate lists and concatenated them.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Would anyone be interested in block data for Ethereum?
Update: I have downloaded "calls", "erc-20" and "transactions". The total size is 712 GB, and I don't have the server space to store it. I have it in local storage (so I can upload it in about 2 days).

Unfortunately, 2 files are corrupted:
  • blockchair_ethereum_calls_20200113.tsv.gz
  • blockchair_ethereum_calls_20211110.tsv.gz
I contacted Blockchair, but they "haven't detected the issue".

Quote
I'm currently downloading it to compile a list of Eth addresses and their balance from this data. Amazingly, I couldn't find a complete list anywhere.
By now Blockchair has the complete Ethereum address list online: blockchair_ethereum_addresses_latest.tsv.gz. That ends my efforts to compile this list myself.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Would anyone be interested in block data for Ethereum? Apart from the disk space it takes (it's even larger than Bitcoin's block data), I wouldn't mind adding it to my collection.
I'm currently downloading it to compile a list of Eth addresses and their balance from this data. Amazingly, I couldn't find a complete list anywhere.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Just a question: for the list of funded addresses, do you take unconfirmed balances into account at the moment of the snapshot or not? And the opposite: is an address with an unconfirmed spend still included, or already excluded?
The data dump comes from the blockchain only. Anything in the mempool is ignored.
legendary
Activity: 952
Merit: 1386
Just a question: for the list of funded addresses, do you take unconfirmed balances into account at the moment of the snapshot or not? And the opposite: is an address with an unconfirmed spend still included, or already excluded?
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I noticed "outputs" wasn't updating because the cronjob started when "transactions" was still running (and Blockchair.com only allows one connection at a time). It's updated now and daily updates should work again.



I used this data to check for myself if the Bitcoin blockchain has duplicate transaction hashes:
Code:
for day in `ls /var/www/blockdata.loyce.club/public_html/outputs/*gz`; do echo $day; cat $day | gunzip | cut -f 1-2 | grep -v transaction_hash > outputs.tmp; for block in `cat outputs.tmp | cut -f1 | sort -nu`; do grep -P "^$block\t" outputs.tmp | cut -f2 | awk '!a[$0]++' >> all_transaction_hashes_outputs.txt; done; rm outputs.tmp; done
# The awk-part removes duplicates within each block
cat all_transaction_hashes_outputs.txt | sort -S69% | uniq -d
d5d27987d2a3dfc724e359870c6644b40e497bdc0589a033220fe15429d88599
e3bf3d07d4b0375638d5f1db5255fe07ba2c4cb067cd81b84ee974b6585fb468
This confirmed that two transaction hashes are each shared by two different transactions.

I tried to do the same for "inputs", but unfortunately I don't have the disk space to sort this large amount of data.
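One way around that (untested here) would be to let GNU sort compress its temporary files and point them at whichever partition has the most free space; the input filename below just mirrors the outputs run above:
Code:
# Compress sort's temp files and keep them on a roomier partition
sort -S 50% -T /path/to/big/partition --compress-program=gzip all_transaction_hashes_inputs.txt | uniq -d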
legendary
Activity: 3500
Merit: 6320
Crypto Swap Exchange
...I believe you have previously stated that you don't want to use AWS because you don't want to have to use a credit card to pay, nor associate your IRL identity with the service. My argument would be that this is really your only option, as using a VPS with a smaller provider (as you have been doing) is eventually going to result in your account being shut down, or your bandwidth being exhausted...

Not unless he is working with a really small provider. More and more, 1 Gbps unmetered for co-location is the standard.* Or if it is metered, it's in the multi-TB range.
Bandwidth has gotten so cheap at data centers that it's pointless to meter it anymore. Case in point: Cogent just made us an offer for a 1 Gbps fiber loop to our rack for under $400 a month all in. 3 years ago that same circuit was well over $1500. Hurricane is under $2500 for a 10 Gbps circuit. And we are a very small buyer of bandwidth. "Real" companies that are buying multiple 100 Gbps circuits are paying very, very little.

-Dave

* Most places are going for a 10-to-1 oversubscription, so you may not get the full 1 Gbps all the time, but the point is the same.
It is unlikely that Loyce would be dealing with Cogent directly, but rather would be dealing with one of Cogent's customers.

Even if someone's bandwidth is "unmetered", I can assure you that usage is still "monitored". As you note, Cogent is going to oversell their capacity, and most likely, Cogent's customer who sells VPS services will also oversell their capacity. If Loyce is constantly sending hundreds of GB worth of data to the internet, there will be less capacity for other customers to send their own data to the internet, which will degrade service for others.

The files that Loyce is hosting total over 660 GB. It would not take much for Loyce to hit the multi-TB range with files of that size, especially considering that it is trivial for someone to request those files multiple times.

Cogent and most providers like them do not, as a rule, oversubscribe; if anything, they undersubscribe.

They used to do it, but stopped around the GFC. No, I don't think it's related, but around 2007-2009 we just stopped seeing it, at least in the major DCs. I think there are just too many competitors. I can call Monday morning, have an agreement with another provider in a couple of days, have the fiber connected a couple of days after that, and be running a couple of days after that. So if I don't get my 1 Gbps, you're gone. Heck, when Sandy took out people in NY, loops were being turned up in hours as we all ran around the DCs getting stuff wired.

The people they sell to oversubscribe what they bought.

If you use a storage bucket on AWS or GCS, someone will need to pay $0.09/GB to transfer the file to the internet, or $0.01/GB to transfer your file to another data center run by the same cloud provider on the same continent where your data is housed (or $0.00/GB, i.e. free, within the same data center location). You can configure the storage bucket so that the person downloading the file pays the transfer charges.
Up to $60 (665 GB × $0.09/GB) to download a few files. No wonder Bezos is rich Cheesy

Till we got out of that business, we used to joke that Amazon is our best salesperson.
Their product is good, don't get me wrong, but it's super expensive, most people don't need it, and a 2nd or 3rd tier provider is usually fine.
Not to mention, if your provider is not peered fast enough with Amazon, they tend to be super slow, since they prioritize amazon.com traffic over AWS traffic. In most major markets it's not a big deal. But in the middle of nowhere, when your local ISP only has a 10 Gbps connection to the peering network, come Christmas time it can get bad.

Either way this is drifting from the main point of the thread, so I'm going to leave it alone for now.

Side note: do you know how many people are actually using this data? It's neat and all, but outside of the geek base on this forum I don't see a lot of people caring.

-Dave
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
AWS and GCS have the same pricing structure, generally speaking. If you hosted the files on either of those platforms, it would cost approximately US$20 per month. This is not the same as hosting your files on a VPS. Accessing and transferring the files would be much quicker compared to using a VPS.
There are several problems with that: first, I can find a much better deal for that price. The seedbox, for instance, costs much less for 6 TB of bandwidth (and unlimited at 100 Mbit/s after that). And I don't really mind if someone has to wait a day for a 665 GB download. That's a small inconvenience, and I prefer that over them having to buy their own storage bucket and pay for bandwidth before they can download the files.

If you use a storage bucket on AWS or GCS, someone will need to pay $0.09/GB to transfer the file to the internet, or $0.01/GB to transfer your file to another data center run by the same cloud provider on the same continent where your data is housed (or $0.00/GB, i.e. free, within the same data center location). You can configure the storage bucket so that the person downloading the file pays the transfer charges.
Up to $60 (665 GB × $0.09/GB) to download a few files. No wonder Bezos is rich Cheesy

The files that Loyce is hosting total over 660 GB. It would not take much for Loyce to run into hitting the multi TB range with files of that size, especially considering that it is trivial for someone to request those files multiple times.
It still takes several people crazy enough to download this much data per day to reach 50 TB. My other project currently uses over 2 TB per month, so for now I'm good Smiley