
Topic: Bitcoin block data (728 GB): inputs, outputs and transactions (page 2)

legendary
Activity: 952
Merit: 1385
That's exactly what I was thinking about.
Is it? It doesn't have the block hashes you asked for, only txids and block numbers.

Yes, I was just stunned by the amount of data to download. I only need the txids.
With block hashes it is indeed simpler.
(you may delete the file)
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Perfect, great, fantastic, thank you!
That's exactly what I was thinking about.
Is it? It doesn't have the block hashes you asked for, only txids and block numbers.

Quote
I owe you beer & frites.
No worries, I've had my fair share today already Cheesy
legendary
Activity: 952
Merit: 1385
See: blockdata.loyce.club/PawGo.tsv or blockdata.loyce.club/PawGo.tsv.gz. It's 27 GB now, hashes don't compress very well. It's scheduled to be deleted in 7 days.

Quote
Otherwise I would have to download the packs from http://blockdata.loyce.club/ , decompress them and parse the TSV files (which seems to be the simplest solution Wink
That's what I would do Smiley

Perfect, great, fantastic, thank you!
That's exactly what I was thinking about. I owe you beer & frites.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Would it be possible to prepare a single file with:
- all block hashes
- all transaction IDs

Or maybe you have such a script that I could execute on a full node.
What format are you looking for? Just a long list of hashes, or do you need to know which txid belongs to which block?
I think the easiest way is to get the data from /transactions/ (55 GB) and /blocks/.

I can save you a 55 GB download if you tell me which format you want.
To test, this is running now in /transactions/:
Code:
for file in `ls`; do echo "Now processing $file"; gunzip -c $file | grep -v 'block_id' | cut -f1-2 >> ../PawGo.tsv; done; gzip ../PawGo.tsv
See: blockdata.loyce.club/PawGo.tsv or blockdata.loyce.club/PawGo.tsv.gz. It's 27 GB now, hashes don't compress very well. It's scheduled to be deleted in 7 days.

Quote
Otherwise I would have to download the packs from http://blockdata.loyce.club/ , decompress them and parse the TSV files (which seems to be the simplest solution Wink
That's what I would do Smiley

Quote
Do you plan to back up the /blocks/ folder from Blockchair?
No need: these files are tiny, so downloading them from Blockchair directly shouldn't take too long anyway.
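As a side note for anyone combining the two: PawGo.tsv holds block_id and txid, so block hashes can be joined in afterwards from the daily blocks dumps. A minimal sketch, assuming the blocks TSVs are named blockchair_bitcoin_blocks_<date>.tsv.gz and have the block id in column 1 and the block hash in column 2 (check the actual header first):
Code:
# build an "id <tab> hash" lookup from the blocks dumps (drop the header line)
gunzip -c blockchair_bitcoin_blocks_*.tsv.gz | grep -v '^id' | cut -f1-2 | sort -k1,1 > block_hashes.tsv
# sort the txid list on the same key
sort -k1,1 PawGo.tsv > PawGo.sorted.tsv
# join on the block id; output columns: block_id, block_hash, txid
join -t $'\t' -o 1.1,1.2,2.2 block_hashes.tsv PawGo.sorted.tsv > txid_with_block_hash.tsv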
legendary
Activity: 952
Merit: 1385
Hi Loyce,

Would it be possible to prepare a single file with:
- all block hashes
- all transaction IDs

Or maybe you have such a script that I could execute on a full node. Otherwise I would have to download the packs from http://blockdata.loyce.club/ , decompress them and parse the TSV files (which seems to be the simplest solution Wink
Do you plan to back up the /blocks/ folder from Blockchair?
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
In the list of funded addresses there is an unnecessary "address" line (a kind of column header) between the "3.." and "bc.." addresses, at line 31808561. It looks like you had separate lists and concatenated them.
Thanks! I'll take this to my other topic.
legendary
Activity: 952
Merit: 1385
In the list of funded addresses there is an unnecessary "address" line (a kind of column header) between the "3.." and "bc.." addresses, at line 31808561. It looks like you had separate lists and concatenated them.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Would anyone be interested in block data for Ethereum?
Update: I have downloaded "calls", "erc-20" and "transactions". The total size is 712 GB, and I don't have the server space to store it on. I have it on my local storage (so I can upload it in about 2 days).

Unfortunately, 2 files are corrupted:
  • blockchair_ethereum_calls_20200113.tsv.gz
  • blockchair_ethereum_calls_20211110.tsv.gz
I contacted Blockchair, but they "haven't detected the issue".

Quote
I'm currently downloading it to compile a list of Eth addresses and their balance from this data. Amazingly, I couldn't find a complete list anywhere.
By now Blockchair has the complete Ethereum address list online: blockchair_ethereum_addresses_latest.tsv.gz. That ends my efforts to compile this list myself.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Would anyone be interested in block data for Ethereum? Apart from the disk space it takes (it's even larger than Bitcoin's block data), I wouldn't mind adding it to my collection.
I'm currently downloading it to compile a list of Eth addresses and their balance from this data. Amazingly, I couldn't find a complete list anywhere.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Just a question: for the list of funded addresses, do you take unconfirmed balances at the moment of the snapshot into account or not? And the opposite: is an address with an unconfirmed spend still included, or already excluded?
The data dump comes from the blockchain only. Anything in the mempool is ignored.
legendary
Activity: 952
Merit: 1385
Just a question: for the list of funded addresses, do you take unconfirmed balances at the moment of the snapshot into account or not? And the opposite: is an address with an unconfirmed spend still included, or already excluded?
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I noticed "outputs" wasn't updating because the cronjob started when "transactions" was still running (and Blockchair.com only allows one connection at a time). It's updated now and daily updates should work again.



I used this data to check for myself if the Bitcoin blockchain has duplicate transaction hashes:
Code:
for day in /var/www/blockdata.loyce.club/public_html/outputs/*gz; do
  echo "$day"
  gunzip -c "$day" | cut -f 1-2 | grep -v transaction_hash > outputs.tmp
  for block in $(cut -f1 outputs.tmp | sort -nu); do
    grep -P "^$block\t" outputs.tmp | cut -f2 | awk '!a[$0]++' >> all_transaction_hashes_outputs.txt
  done
  rm outputs.tmp
done
# The awk-part removes duplicates within each block
cat all_transaction_hashes_outputs.txt | sort -S69% | uniq -d
d5d27987d2a3dfc724e359870c6644b40e497bdc0589a033220fe15429d88599
e3bf3d07d4b0375638d5f1db5255fe07ba2c4cb067cd81b84ee974b6585fb468
This confirmed that the blockchain indeed contains two duplicate transaction hashes (each of the two txids above appears in more than one block).

I tried to do the same for "inputs", but unfortunately I don't have the disk space to sort this large amount of data.
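For what it's worth, GNU sort can put its temporary files on another disk and compress them, which shrinks the scratch space needed for such a dedup run considerably. A sketch, with /mnt/bigdisk/tmp and the inputs file name as placeholders:
Code:
# temp files go to /mnt/bigdisk/tmp and are gzip-compressed while sorting
sort -S69% -T /mnt/bigdisk/tmp --compress-program=gzip all_transaction_hashes_inputs.txt | uniq -d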
legendary
Activity: 3500
Merit: 6320
Crypto Swap Exchange
...I believe you have previously stated that you don't want to use AWS because you don't want to have to use a credit card to pay, nor associate your IRL identity with the service. My argument would be that this is really your only option, as using a VPS with a smaller provider (as you have been doing) is eventually going to result in your account being shut down, or your bandwidth being exhausted...

Not unless he is working with a really small provider. More and more, 1 Gb unmetered is the standard for co-location.* Or if it is metered, it's in the multi-TB range.
Bandwidth has gotten so cheap at data centers that it's pointless to meter it anymore. Case in point: Cogent just made us an offer for a 1 Gb fiber loop to our rack for under $400 a month, all in. Three years ago that same circuit was well over $1500. Hurricane is under $2500 for a 10 Gb circuit. And we are a very small buyer of bandwidth. "Real" companies that are buying multiple 100 Gb circuits are paying very, very little.

-Dave

* Most places go for a 10-to-1 oversubscription, so you may not get the full 1 Gb all the time, but the point is the same.
It is unlikely that Loyce would be dealing with Cogent directly, but rather would be dealing with one of Cogent's customers.

Even if someone's bandwidth is "unmetered", I can assure you that usage is still "monitored". As you note, Cogent is going to oversell their capacity, and most likely, Cogent's customer who sells VPS services will also oversell their capacity. If Loyce is constantly sending hundreds of GB's worth of data to the internet, there will be less capacity for other customers to send their own data to the internet, which will degrade service for others.

The files that Loyce is hosting total over 660 GB. It would not take much for Loyce to hit the multi-TB range with files of that size, especially considering that it is trivial for someone to request those files multiple times.

Cogent, and most providers like them, as a rule do not oversubscribe; if anything, they undersubscribe.

They used to do it, but stopped around the GFC. No, I don't think it's related, but around 2007-2009 we just stopped seeing it, at least in the major DCs. I think there are just too many competitors. I can call Monday morning, have an agreement with another provider in a couple of days, have the fiber connected a couple of days after that, and be running a couple of days after that. So if I don't get my 1 Gb, you're gone. Heck, when Sandy took out people in NY, loops were being turned up in hours as we all ran around the DCs getting stuff wired.

The people they sell to oversubscribe what they bought.

If you use a storage bucket on AWS or GCS, someone will need to pay $0.09/GB to transfer the file to the internet, or $0.01/GB to transfer your file to another data center run by the same cloud provider on the same continent where your data is housed (or $0.00/GB, i.e. free, within the same data center location). You can configure the storage bucket so that the person downloading the file will need to pay the transfer charges.
Up to $60 (665 GB × $0.09/GB) to download a few files. No wonder Bezos is rich Cheesy

Until we got out of that business, we used to joke that Amazon was our best salesperson.
Their product is good, don't get me wrong, but it's super expensive, most people don't need it, and a 2nd or 3rd tier provider is usually fine.
Not to mention that if your provider is not peered fast enough with Amazon, they tend to be super slow, since they prioritize amazon.com traffic over the AWS traffic. In most major markets it's not a big deal, but in the middle of nowhere, when your local ISP only has a 10 Gb connection to the peering network, it can get bad come Christmas time.

Either way this is drifting from the main point of the thread, so I'm going to leave it alone for now.

Side note: do you know how many people are actually using this data? It's neat and all, but outside of the geek base on this forum I don't see a lot of people caring.

-Dave
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
AWS and GCS have the same pricing structure, generally speaking. If you hosted the files on either of those platforms, it would cost approximately US$20 per month. This is not the same as hosting your files on a VPS. Accessing and transferring the files would be much quicker compared to using a VPS.
There are several problems with that: first, I can find a much better deal for that price. The seedbox, for instance, costs much less for 6 TB of bandwidth (and unlimited at 100 Mbit after that). And I don't really mind if someone has to wait a day for a 665 GB download. That's a small inconvenience, and I prefer it over people having to buy their own storage bucket and pay for bandwidth before they can download the files.

If you use a storage bucket on AWS or GCS, someone will need to pay $0.09/GB to transfer the file to the internet, or $0.01/GB to transfer your file to another data center run by the same cloud provider on the same continent where your data is housed (or $0.00/GB, i.e. free, within the same data center location). You can configure the storage bucket so that the person downloading the file will need to pay the transfer charges.
Up to $60 (665 GB × $0.09/GB) to download a few files. No wonder Bezos is rich Cheesy

The files that Loyce is hosting total over 660 GB. It would not take much for Loyce to hit the multi-TB range with files of that size, especially considering that it is trivial for someone to request those files multiple times.
It still takes several people crazy enough to download this much data per day to reach 50 TB. My other project currently uses over 2 TB per month, so for now I'm good Smiley
legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
(Actually I'd prefer to do that .. can't do rsync straight to that appliance and I would rather avoid setting up a buffer VM just for rsync)

What's the problem? Don't both scp and rsync use SSH when copying data between different devices?
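For reference, both of the commands below go over SSH (host, user and file names are placeholders); the catch is that rsync also needs an rsync binary on the remote end, which a storage appliance may not provide, while scp only needs an SSH server:
Code:
# plain scp over ssh
scp dump.tsv.gz user@appliance:/data/
# rsync over ssh (resumable with -P); requires rsync on the appliance as well
rsync -avP -e ssh dump.tsv.gz user@appliance:/data/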

Quote
If the "input outputs transaction" dump is something like a csv file, just compiling the "missing" data with Bitcoin Core and append it to the dump could do the trick.
Until I can get the full data from my own Bitcoin Core installation, I'll keep using the data dumps. And even if I can get the data from Bitcoin Core, it would further increase the VPS requirements.

You could run Bitcoin Core and a processing script on a local device, then upload the result to your VPS/seedbox.
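If it ever comes to that, a rough sketch of such a script against a local node could look like this; it only uses the standard getblockcount/getblockhash/getblock RPCs plus jq, writes block_hash<tab>txid pairs, and will take a long time over the full chain (the output file name is just a placeholder):
Code:
# dump "block_hash <tab> txid" for every block from a local Bitcoin Core node
tip=$(bitcoin-cli getblockcount)
for height in $(seq 0 "$tip"); do
  hash=$(bitcoin-cli getblockhash "$height")
  # verbosity 1 returns the block header fields plus the list of txids
  bitcoin-cli getblock "$hash" 1 | jq -r --arg h "$hash" '.tx[] | [$h, .] | @tsv'
done > block_txids.tsv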

Is it true that if you tell an average user to leave vi
It can't be: the average user doesn't use vi.

I think he meant the average terminal user, although I would argue nano is the more popular option.
copper member
Activity: 1652
Merit: 1901
Amazon Prime Member #7
...I believe you have previously stated that you don't want to use AWS because you don't want to have to use a credit card to pay, nor associate your IRL identity with the service. My argument would be that this is really your only option, as using a VPS with a smaller provider (as you have been doing) is eventually going to result in your account being shut down, or your bandwidth being exhausted...

Not unless he is working with a really small provider. More and more, 1 Gb unmetered is the standard for co-location.* Or if it is metered, it's in the multi-TB range.
Bandwidth has gotten so cheap at data centers that it's pointless to meter it anymore. Case in point: Cogent just made us an offer for a 1 Gb fiber loop to our rack for under $400 a month, all in. Three years ago that same circuit was well over $1500. Hurricane is under $2500 for a 10 Gb circuit. And we are a very small buyer of bandwidth. "Real" companies that are buying multiple 100 Gb circuits are paying very, very little.

-Dave

* Most places go for a 10-to-1 oversubscription, so you may not get the full 1 Gb all the time, but the point is the same.
It is unlikely that Loyce would be dealing with Cogent directly, but rather would be dealing with one of Cogent's customers.

Even if someone's bandwidth is "unmetered", I can assure you that usage is still "monitored". As you note, Cogent is going to oversell their capacity, and most likely, Cogent's customer who sells VPS services will also oversell their capacity. If Loyce is constantly sending hundreds of GB's worth of data to the internet, there will be less capacity for other customers to send their own data to the internet, which will degrade service for others.

The files that Loyce is hosting total over 660 GB. It would not take much for Loyce to hit the multi-TB range with files of that size, especially considering that it is trivial for someone to request those files multiple times.
legendary
Activity: 3500
Merit: 6320
Crypto Swap Exchange
...I believe you have previously stated that you don't want to use AWS because you don't want to have to use a credit card to pay, nor associate your IRL identity with the service. My argument would be that this is really your only option, as using a VPS with a smaller provider (as you have been doing) is eventually going to result in your account being shut down, or your bandwidth being exhausted...

Not unless he is working with a really small provider. More and more, 1 Gb unmetered is the standard for co-location.* Or if it is metered, it's in the multi-TB range.
Bandwidth has gotten so cheap at data centers that it's pointless to meter it anymore. Case in point: Cogent just made us an offer for a 1 Gb fiber loop to our rack for under $400 a month, all in. Three years ago that same circuit was well over $1500. Hurricane is under $2500 for a 10 Gb circuit. And we are a very small buyer of bandwidth. "Real" companies that are buying multiple 100 Gb circuits are paying very, very little.

-Dave

* Most places go for a 10-to-1 oversubscription, so you may not get the full 1 Gb all the time, but the point is the same.
copper member
Activity: 1652
Merit: 1901
Amazon Prime Member #7
Hosting your files in a storage bucket
We discussed this already. But I just got a very nice Xeon-powered dedicated server (no more VPS!) from an anonymous donation, so I'm covered for now.
Ahh, yes, there it is. I thought I remembered giving you this advice, but I couldn't find the discussion in this thread.

Quote
All major cloud providers offer storage buckets. Many smaller cloud providers do as well.
I'm curious: what would it cost to store a TB in a storage bucket?
AWS and GCS have the same pricing structure, generally speaking. If you hosted the files on either of those platforms, it would cost approximately US$20 per month. This is not the same as hosting your files on a VPS. Accessing and transferring the files would be much quicker compared to using a VPS.

I believe you have previously stated that you don't want to use AWS because you don't want to have to use a credit card to pay, nor associate your IRL identity with the service. My argument would be that this is really your only option, as using a VPS with a smaller provider (as you have been doing) is eventually going to result in your account being shut down, or your bandwidth being exhausted.

There might be other cloud storage providers that offer storage buckets at a lower price, though they might not be as reliable or have as high throughput. Your project is not one where it is critical to always have access to all your data at a moment's notice, so this may be okay. It is possible you can find one that is willing to accept crypto for their services.

If you use a storage bucket on AWS or GCS, someone will need to pay $0.09/GB to transfer the file to the internet, or $0.01/GB to transfer your file to another data center run by the same cloud provider on the same continent where your data is housed (or $0.00/GB, i.e. free, within the same data center location). You can configure the storage bucket so that the person downloading the file will need to pay the transfer charges.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Hosting your files in a storage bucket
We discussed this already. But I just got a very nice Xeon-powered dedicated server (no more VPS!) from an anonymous donation, so I'm covered for now.

Quote
All major cloud providers offer storage buckets. Many smaller cloud providers do as well.
I'm curious: what would it cost to store a TB in a storage bucket?
copper member
Activity: 1652
Merit: 1901
Amazon Prime Member #7
My VPS doesn't have enough storage for the "inputs" torrent, so I didn't try it until the download finished. But it starts almost immediately, at a speed similar to the previous one.
It is ~never appropriate to store that much data on a VPS. You are much better off storing these files in a storage bucket. If you try any other solution, you will either quickly hit your transfer limitations, or your files will eventually get taken down because you are taking up too many resources.

Hosting your files in a storage bucket will mean that others can ~instantly download your files (limited only by their own bandwidth and computer equipment). Allowing people to download your files from the internet will be expensive; however, this can be addressed by configuring your bucket such that the requestor (the person downloading your file) pays for the egress bandwidth.

All major cloud providers offer storage buckets. Many smaller cloud providers do as well.
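For completeness, on AWS the requester-pays setting described above can be switched on with the CLI roughly like this (bucket and object names are placeholders); the downloader then has to acknowledge the charges explicitly:
Code:
# mark the bucket as Requester Pays
aws s3api put-bucket-request-payment --bucket blockdata-example --request-payment-configuration Payer=Requester
# downloading from it then requires opting in to the transfer charges
aws s3 cp s3://blockdata-example/inputs.tar.gz . --request-payer requester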