
Topic: Bitcoin block data (728 GB): inputs, outputs and transactions - page 2. (Read 2861 times)

legendary
Activity: 952
Merit: 1367
In the list of funded addresses there is an unnecessary "address" line (a leftover column header) between the addresses starting with "3.." and "bc..", at line 31808561. It looks like you had separate lists and concatenated them.
legendary
Activity: 3276
Merit: 16448
Thick-Skinned Gang Leader and Golden Feather 2021
Would anyone be interested in block data for Ethereum?
Update: I have downloaded "calls", "erc-20" and "transactions". The total size is 712 GB, and I don't have the server space to store it. I have it on my local storage (so I can upload it in about 2 days).

Unfortunately, 2 files are corrupted:
  • blockchair_ethereum_calls_20200113.tsv.gz
  • blockchair_ethereum_calls_20211110.tsv.gz
I contacted Blockchair, but they "haven't detected the issue".

Quote
I'm currently downloading it to compile a list of Eth addresses and their balances from this data. Amazingly, I couldn't find a complete list anywhere.
By now Blockchair has the complete Ethereum address list online: blockchair_ethereum_addresses_latest.tsv.gz. That ends my efforts to compile this list myself.
legendary
Activity: 3276
Merit: 16448
Thick-Skinned Gang Leader and Golden Feather 2021
Would anyone be interested in block data for Ethereum? Apart from the disk space it takes (it's even larger than Bitcoin's block data), I wouldn't mind adding it to my collection.
I'm currently downloading it to compile a list of Eth addresses and their balances from this data. Amazingly, I couldn't find a complete list anywhere.
legendary
Activity: 3276
Merit: 16448
Thick-Skinned Gang Leader and Golden Feather 2021
Just a question: for the list of funded addresses, do you take into account unconfirmed balances at the moment of the snapshot? And the opposite: is an address with an unconfirmed spend still included, or already excluded?
The datadump comes from the blockchain only. Anything in mempool is ignored.
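As an illustration of a confirmed-only balance list (this is not Loyce's actual pipeline), balances can be derived purely from the outputs and inputs dumps: sum what each address received, subtract what it spent, and keep the positive remainders. The two-column address/value layout below is a simplified assumption, not Blockchair's real schema:

```shell
# Toy data: address<TAB>value (simplified stand-ins for the real dump columns)
printf 'addr1\t50\naddr2\t30\naddr1\t20\n' > outputs.tsv   # coins received
printf 'addr1\t60\n' > inputs.tsv                          # coins spent
# First pass (NR==FNR) sums received value per address, second pass subtracts
# spends; only addresses with a positive confirmed balance are kept.
awk -F'\t' 'NR==FNR {bal[$1]+=$2; next} {bal[$1]-=$2}
            END {for (a in bal) if (bal[a] > 0) print a "\t" bal[a]}' \
    outputs.tsv inputs.tsv | sort > balances.tsv
cat balances.tsv   # addr1 keeps 50+20-60=10, addr2 keeps 30
```

Since only confirmed rows ever appear in the dumps, mempool transactions are excluded automatically.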
legendary
Activity: 952
Merit: 1367
Just a question: for the list of funded addresses, do you take into account unconfirmed balances at the moment of the snapshot? And the opposite: is an address with an unconfirmed spend still included, or already excluded?
legendary
Activity: 3276
Merit: 16448
Thick-Skinned Gang Leader and Golden Feather 2021
I noticed "outputs" wasn't updating because the cronjob started when "transactions" was still running (and Blockchair.com only allows one connection at a time). It's updated now and daily updates should work again.



I used this data to check for myself if the Bitcoin blockchain has duplicate transaction hashes:
Code:
for day in /var/www/blockdata.loyce.club/public_html/outputs/*gz; do
  echo "$day"
  gunzip -c "$day" | cut -f 1-2 | grep -v transaction_hash > outputs.tmp
  for block in $(cut -f1 outputs.tmp | sort -nu); do
    grep -P "^$block\t" outputs.tmp | cut -f2 | awk '!a[$0]++' >> all_transaction_hashes_outputs.txt
  done
  rm outputs.tmp
done
# The awk part removes duplicate hashes within each block
sort -S69% all_transaction_hashes_outputs.txt | uniq -d
d5d27987d2a3dfc724e359870c6644b40e497bdc0589a033220fe15429d88599
e3bf3d07d4b0375638d5f1db5255fe07ba2c4cb067cd81b84ee974b6585fb468
This confirmed that 2 transaction hashes each appear twice: both belong to duplicated coinbase transactions from 2010, a quirk that BIP 30 later made impossible.

I tried to do the same for "inputs", but unfortunately I don't have the disk space to sort this large amount of data.
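For what it's worth, GNU sort can compress its temporary spill files, which may squeeze a large sort into less free disk than the raw data would need (at the cost of CPU time). A small sketch; gzip as the compressor is an arbitrary choice:

```shell
# Sample data standing in for transaction hashes, with one duplicate:
printf 'aaa\nbbb\nccc\nbbb\n' > hashes.txt
# -S caps memory use, -T picks the directory for spill files, and
# --compress-program gzips those spill files to cut temporary disk usage.
sort -S 50% -T /tmp --compress-program=gzip hashes.txt | uniq -d
# prints: bbb
```

`--compress-program` is a GNU coreutils extension; BSD/busybox sort implementations don't have it.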
legendary
Activity: 3444
Merit: 6182
Crypto Swap Exchange
...I believe you have previously stated that you don't want to use AWS because you don't want to have to use a credit card to pay, nor associate your IRL identity with the service. My argument would be that this is really your only option, as using a VPS with a smaller provider (as you have been doing) is eventually going to result in your account being shut down, or your bandwidth being exhausted...

Not unless he is working with a really small provider. More and more, 1 Gb unmetered is the standard for co-location.* Or if it is metered, it's in the multi-TB range.
Bandwidth has gotten so cheap at data centers that it's pointless to meter it anymore. Case in point: Cogent just made us an offer for a 1 Gb fiber loop to our rack for under $400 a month, all in. 3 years ago that same circuit was well over $1500. Hurricane is under $2500 for a 10 Gb circuit. And we are a very small buyer of bandwidth. "Real" companies that are buying multiple 100 Gb circuits are paying very, very little.

-Dave

* Most places are going for a 10-to-1 oversubscription, so you may not get the full 1 Gb all the time, but the point is the same.
It is unlikely that Loyce would be dealing with Cogent directly, but rather would be dealing with one of Cogent's customers.

Even if someone's bandwidth is "unmetered", I can assure you that usage is still "monitored". As you note, Cogent is going to oversell their capacity, and most likely, Cogent's customer who sells VPS services will also oversell their capacity. If Loyce is constantly sending hundreds of GB's worth of data to the internet, there will be less capacity for other customers to send their own data to the internet, which will degrade service for others.

The files that Loyce is hosting total over 660 GB. It would not take much for Loyce to run into hitting the multi TB range with files of that size, especially considering that it is trivial for someone to request those files multiple times.

Cogent, and most providers like them, as a rule do not oversubscribe; if anything, they undersubscribe.

They used to do it, but stopped around the GFC. No, I don't think it's related, but around 2007-2009 we just stopped seeing it, at least in the major DCs. I think there are just too many competitors. I can call Monday AM, have an agreement with another provider within a couple of days, have the fiber connected a couple of days after that, and be running a couple of days after that. So if I don't get my 1 Gb, you're gone. Heck, when Sandy took out people in NY, loops were being turned up in hours as we all ran around the DCs getting stuff wired.

The people they sell to oversubscribe what they bought.

If you use a storage bucket on AWS or GCS, someone will need to pay $0.09/GB to transfer the file to the internet, or $0.01/GB to transfer your file to another data center run by the same cloud provider on the same continent where your data is housed (or $0.00/GB, i.e. free, within the same datacenter location). You can configure the storage bucket so that the person downloading the file pays the transfer charges.
Up to $60 to download a few files. No wonder Bezos is rich Cheesy

Till we got out of that business we used to joke Amazon is our best salesperson.
Their product is good, don't get me wrong, but it's super expensive and most people don't need it and a 2nd or 3rd tier provider is usually fine.
Not to mention, if your provider is not peered fast enough with Amazon, they tend to be super slow, since they prioritize amazon.com traffic over the AWS traffic. In most major markets it's not a big deal. But in the middle of nowhere, when your local ISP only has a 10 Gb connection to the peering network, come Christmas time it can get bad.

Either way this is drifting from the main point of the thread, so I'm going to leave it alone for now.

Side note: do you know how many people are actually using this data? It's neat and all, but outside of the geek base on this forum I don't see a lot of people caring.

-Dave
legendary
Activity: 3276
Merit: 16448
Thick-Skinned Gang Leader and Golden Feather 2021
AWS and GCS have the same pricing structure, generally speaking. If you hosted the files on either of those platforms, it would cost approximately US$20 per month. This is not the same as hosting your files on a VPS. Accessing and transferring the files would be much quicker compared to using a VPS.
There are several problems with that: first, I can find a much better deal for that price. The Seedbox for instance costs much less for 6 TB bandwidth (and unlimited at 100 Mbit after that). And I don't really mind if someone has to wait a day for a 665 GB download. That's a small inconvenience, and I prefer that over having to buy their own storage bucket and pay for bandwidth before they can download the files.

If you use a storage bucket on AWS or GCS, someone will need to pay $0.09/GB to transfer the file to the internet, or $0.01/GB to transfer your file to another data center run by the same cloud provider on the same continent where your data is housed (or $0.00/GB, i.e. free, within the same datacenter location). You can configure the storage bucket so that the person downloading the file pays the transfer charges.
Up to $60 to download a few files. No wonder Bezos is rich Cheesy

The files that Loyce is hosting total over 660 GB. It would not take much for Loyce to run into hitting the multi TB range with files of that size, especially considering that it is trivial for someone to request those files multiple times.
It still takes several people crazy enough to download this much data per day to reach 50 TB. My other project currently uses over 2 TB per month, so for now I'm good Smiley
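Rough arithmetic backs that up: against a 50 TB monthly cap, the full 665 GB dataset fits roughly 75 times, so it would take 2-3 complete downloads every single day to hit the limit:

```shell
# 50 TB cap divided by one full 665 GB download, in decimal units:
echo $(( 50 * 1000 / 665 ))   # 75 full downloads per month
```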
copper member
Activity: 1610
Merit: 1898
Amazon Prime Member #7
...I believe you have previously stated that you don't want to use AWS because you don't want to have to use a credit card to pay, nor associate your IRL identity with the service. My argument would be that this is really your only option, as using a VPS with a smaller provider (as you have been doing) is eventually going to result in your account being shut down, or your bandwidth being exhausted...

Not unless he is working with a really small provider. More and more, 1 Gb unmetered is the standard for co-location.* Or if it is metered, it's in the multi-TB range.
Bandwidth has gotten so cheap at data centers that it's pointless to meter it anymore. Case in point: Cogent just made us an offer for a 1 Gb fiber loop to our rack for under $400 a month, all in. 3 years ago that same circuit was well over $1500. Hurricane is under $2500 for a 10 Gb circuit. And we are a very small buyer of bandwidth. "Real" companies that are buying multiple 100 Gb circuits are paying very, very little.

-Dave

* Most places are going for a 10-to-1 oversubscription, so you may not get the full 1 Gb all the time, but the point is the same.
It is unlikely that Loyce would be dealing with Cogent directly, but rather would be dealing with one of Cogent's customers.

Even if someone's bandwidth is "unmetered", I can assure you that usage is still "monitored". As you note, Cogent is going to oversell their capacity, and most likely, Cogent's customer who sells VPS services will also oversell their capacity. If Loyce is constantly sending hundreds of GB's worth of data to the internet, there will be less capacity for other customers to send their own data to the internet, which will degrade service for others.

The files that Loyce is hosting total over 660 GB. It would not take much for Loyce to run into hitting the multi TB range with files of that size, especially considering that it is trivial for someone to request those files multiple times.
legendary
Activity: 2842
Merit: 7333
Crypto Swap Exchange
(Actually I'd prefer to do that .. can't do rsync straight to that appliance and I would rather avoid setting up a buffer VM just for rsync)

What's the problem? Don't both scp and rsync use SSH when copying data to a different device?

Quote
If the "input outputs transaction" dump is something like a csv file, just compiling the "missing" data with Bitcoin Core and append it to the dump could do the trick.
Until I can get the full data from my own Bitcoin Core installation, I'll keep using the data dumps. And even if I can get the data from Bitcoin Core, it would further increase the VPS requirements.

You could run Bitcoin Core and the processing script on a local device, then upload the result to your VPS/seedbox.

Is it true that if you tell the average user to leave vi
It can't be: the average user doesn't use vi.

I think he meant the average terminal user, although I would argue nano is the more popular option.
legendary
Activity: 3444
Merit: 6182
Crypto Swap Exchange
...I believe you have previously stated that you don't want to use AWS because you don't want to have to use a credit card to pay, nor associate your IRL identity with the service. My argument would be that this is really your only option, as using a VPS with a smaller provider (as you have been doing) is eventually going to result in your account being shut down, or your bandwidth being exhausted...

Not unless he is working with a really small provider. More and more, 1 Gb unmetered is the standard for co-location.* Or if it is metered, it's in the multi-TB range.
Bandwidth has gotten so cheap at data centers that it's pointless to meter it anymore. Case in point: Cogent just made us an offer for a 1 Gb fiber loop to our rack for under $400 a month, all in. 3 years ago that same circuit was well over $1500. Hurricane is under $2500 for a 10 Gb circuit. And we are a very small buyer of bandwidth. "Real" companies that are buying multiple 100 Gb circuits are paying very, very little.

-Dave

* Most places are going for a 10-to-1 oversubscription, so you may not get the full 1 Gb all the time, but the point is the same.
copper member
Activity: 1610
Merit: 1898
Amazon Prime Member #7
Hosting your files in a storage bucket
We discussed this already. But I just got a very nice Xeon-powered dedicated server (no more VPS!) from an anonymous donation, so I'm covered for now.
Ahh, yes, there it is. I thought I remembered giving you this advice, but I couldn't find the discussion in this thread.

Quote
All major cloud providers offer storage buckets. Many smaller cloud providers do as well.
I'm curious: what would it cost to store a TB in a storage bucket?
AWS and GCS have the same pricing structure, generally speaking. If you hosted the files on either of those platforms, it would cost approximately US$20 per month. This is not the same as hosting your files on a VPS. Accessing and transferring the files would be much quicker compared to using a VPS.

I believe you have previously stated that you don't want to use AWS because you don't want to have to use a credit card to pay, nor associate your IRL identity with the service. My argument would be that this is really your only option, as using a VPS with a smaller provider (as you have been doing) is eventually going to result in your account being shut down, or your bandwidth being exhausted.

There might be other cloud storage providers that offer storage buckets at a lower price, though they may not be as reliable or have as high throughput. Your project is not one that critically needs access to all its data at a moment's notice, so this may be okay. It is possible you can find one that will be willing to accept crypto for their services.

If you use a storage bucket on AWS or GCS, someone will need to pay $0.09/GB to transfer the file to the internet, or $0.01/GB to transfer your file to another data center run by the same cloud provider on the same continent where your data is housed (or $0.00/GB, i.e. free, within the same datacenter location). You can configure the storage bucket so that the person downloading the file pays the transfer charges.
legendary
Activity: 3276
Merit: 16448
Thick-Skinned Gang Leader and Golden Feather 2021
Hosting your files in a storage bucket
We discussed this already. But I just got a very nice Xeon-powered dedicated server (no more VPS!) from an anonymous donation, so I'm covered for now.

Quote
All major cloud providers offer storage buckets. Many smaller cloud providers do as well.
I'm curious: what would it cost to store a TB in a storage bucket?
copper member
Activity: 1610
Merit: 1898
Amazon Prime Member #7
My VPS doesn't have enough storage for the "inputs" torrent, so I didn't try it until the download was finished. But it starts almost immediately, with a speed similar to before.
It is ~never appropriate to store that much data on a VPS. You are much better off storing these files in a storage bucket. If you try any other solution, you will either quickly hit your transfer limitations, or your files will eventually get taken down because you are taking up too many resources.

Hosting your files in a storage bucket will mean that others can ~instantly download your files (limited only by their own bandwidth and computer equipment). Allowing people to download your files from the internet will be expensive; however, this can be addressed by configuring your bucket such that the requestor (the person downloading your file) pays for the egress bandwidth.

All major cloud providers offer storage buckets. Many smaller cloud providers do as well.
legendary
Activity: 3276
Merit: 16448
Thick-Skinned Gang Leader and Golden Feather 2021
Big update

First: the Torrents won't last. I got an anonymous sponsor for a dedicated server. The Seedbox expires on April 6, so the Torrents won't work much longer:
New! Torrents!
inputs.torrent
outputs.torrent
transactions.torrent
For privacy, you may want to consider using a VPN so other users can't see your IP address.

The data is back at its original location:
This server has a 50 TB/month bandwidth limit. So far, at most a few people per month have downloaded this (crazy amount of) data, so it should be sufficient.

Note: some files are missing; those are being updated now. By tomorrow, automated daily updates should be on track again.
legendary
Activity: 3276
Merit: 16448
Thick-Skinned Gang Leader and Golden Feather 2021
Yesterday I was looking for some hosting services for BTC and honestly speaking I haven't found anything I would trust.
For projects like this topic I don't really need AWS-level uptime, so I gladly go for the budget hosts. I run this project at Racknerd (good deals via Lowendtalk.com), and (so far) I'm quite happy with it. If I look at prices on their own site, they're much higher.
You could also go for RamNode; it's pay-by-the-hour and less cheap, but from what I've seen RamNode has very solid performance too.
I don't think any of those are run from their garage Wink I run this project at Gullo's Hosting (again: see Lowendtalk.com for deals), which has the unique feature that it's run by one guy. Servers are international, so not from his garage, but you'll always deal with the same guy. He'll try his best, which for this project is everything I need.

I also found out that several of the higher range VPS providers don't accept Bitcoin, or demand a copy of my passport (yeah, right!) first.

Quote
That's not a VPS, so won't help me much.
legendary
Activity: 952
Merit: 1367
See:
Credits
Blockchair Database Dumps has a staggering amount of data, easily accessible (at 10-100 kB/s) with daily updates. All data in this topic comes from Blockchair.
(nobody ever reads the OP)

;-) Your point.
The same applies to the privacy policy and the washing machine manual.

Quote
Quote
For operations on local blocks I have used that parser: https://github.com/gcarq/rusty-blockparser
Configuration was really easy, processing of course takes some time.
Memory usage to get balances: ~18GB. That would take a big VPS.

Oops. I did not use it, only the transactions dump. It is up to you if you want to use it, even partially.

Yesterday I was looking for some hosting services for BTC, and honestly speaking I haven't found anything I would trust. Once I tried another host just to compare with my main one, but the service and reliability were terrible. I had the impression that all those companies which offer servers for BTC are running their "datacenters" somewhere in a garage.
Have you seen https://www.sync.com/pricing-individual/ ?
legendary
Activity: 3276
Merit: 16448
Thick-Skinned Gang Leader and Golden Feather 2021
How do you currently extract the data you publish?
See:
Credits
Blockchair Database Dumps has a staggering amount of data, easily accessible (at 10-100 kB/s) with daily updates. All data in this topic comes from Blockchair.
(nobody ever reads the OP)

Quote
For operations on local blocks I have used that parser: https://github.com/gcarq/rusty-blockparser
Configuration was really easy, processing of course takes some time.
Memory usage to get balances: ~18GB. That would take a strong VPS.
legendary
Activity: 2842
Merit: 7333
Crypto Swap Exchange
Let me know, I can provide an FTP access to you.

People still use FTP these days? I remember using FTP with FileZilla a long time ago.

I'm still uploading data, but I'm not sure if I'll extend this host's contract. It's now spitting out hardware errors:

I would demand partial/full refund.
legendary
Activity: 952
Merit: 1367

You could run Bitcoin Core and the processing script on a local device, then upload the result to your VPS/seedbox.
Apart from the fact that I wouldn't know how to do this, I don't really want to add more load to my local PC.

Wait, I am lost. How do you currently extract the data you publish?
For operations on local blocks I have used that parser: https://github.com/gcarq/rusty-blockparser
Configuration was really easy, processing of course takes some time.