
Topic: List of all Bitcoin addresses ever used - currently UNavailable on temp location - page 7. (Read 4161 times)

legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I had used AWS as an example because I believed you used it for some of your other projects.
Correct, loyce.club runs on AWS (sponsored).

I meant on a VPS with another cloud provider with unmetered traffic, such as Hetzner.
I'm not using anything with "unmetered" traffic.



Still working on restoring all data from scratch. I'm curious to see if it matches either of the two existing files.
I don't really get the focus on data traffic though, right after I got a good deal on a new VPS. I'm good for now Smiley

I was under the impression that traffic out of the AWS network (for AWS) will count as egress traffic, and will be billed accordingly.
AWS charges $0.09/GB, and especially since this one is sponsored, I don't want to abuse it. I love how stable the server is though; it has never been down.
copper member
Activity: 1666
Merit: 1901
Amazon Prime Member #7
~snip

If you have the network capacity, then it's better to just serve it locally (except AWS bills your upload traffic too  Angry)
Your local ISP might not like it very much if you are uploading that much data.

Sorry, when I said locally, I meant on a VPS with another cloud provider with unmetered traffic, such as Hetzner.

I guess I have been doing too much of my work on the cloud to tell the difference anymore.
Ahh, gotcha.

I was under the impression that traffic out of the AWS network (for AWS) will count as egress traffic, and will be billed accordingly. Migrating your data from AWS to GCS will incur a charge from AWS based on the amount of your data. There might be ways around this; I'm not sure.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
~snip

If you have the network capacity, then it's better to just serve it locally (except AWS bills your upload traffic too  Angry)
Your local ISP might not like it very much if you are uploading that much data.

Sorry, when I said locally, I meant on a VPS with another cloud provider with unmetered traffic, such as Hetzner.

I guess I have been doing too much of my work on the cloud to tell the difference anymore.
Vod
legendary
Activity: 3668
Merit: 3010
Licking my boob since 1970
Your local ISP might not like it very much if you are uploading that much data.

Quickseller, most ISPs have a download bottleneck - not upload.

So few people upload more than they download that most ISPs don't even restrict uploads. 

What ISP does LoyceV use that does not like uploading?
copper member
Activity: 1666
Merit: 1901
Amazon Prime Member #7
As an FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket that can be accessed by a server.
Amazon charges $0.09 per GB of outgoing data; that's ridiculous for this purpose (my current 5 TB bandwidth limit would cost $450 per month when maxed out). And Amazon wants my credit card instead of Bitcoin.
I had used AWS as an example because I believed you used it for some of your other projects.

Yes, transferring data to the internet is very expensive. You can use a CDN (content delivery network) to reduce costs a little bit. 5 TB of data is a lot.

Quote
Separately, sorting lists is not scalable, period.
Actually, sort performs quite well. I've tested:
10M lines: 10 seconds (fits in RAM)
50M lines: 63 seconds (starts using temporary files)
250M lines: 381 seconds (using 2 GB RAM and temporary files)
So a 5 times larger file takes 6 times longer to sort. I'd say scalability is quite good.
I think you are proving my point. The more input you have, the more time it takes to process one additional input.

To put it another way, it takes 1 unit of time to sort a list with a length of 2, it takes 1 + a units of time to sort a list with a length of 3, it takes 1 + a + b units of time to sort a list with a length of 4, and so on. The longer the list, the longer it will take to sort one additional line.
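For a rough sense of scale: GNU sort is an n·log(n) algorithm, and your own numbers fit that (a back-of-the-envelope check, nothing more):
Code:
# expected work ratio for 250M vs 50M lines under n*log(n)
awk 'BEGIN { n1 = 50e6; n2 = 250e6; print (n2 * log(n2)) / (n1 * log(n1)) }'
# prints about 5.45; roughly consistent with the observed 6x once the temporary files are factored in
So the per-line cost does keep growing with the size of the list, just slowly.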

As an FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket that can be accessed by a server.

If updating a file takes a lot of resources, you can create a VM and execute a script that updates the file and uploads it to an S3 bucket (on AWS). You would then be able to access that file from another VM that needs fewer resources.

That may save on local resources, but you will be paying a lot of money per month if people download several hundred gigabytes each month, particularly if the files are as large as the ones hosted in the OP.

If you have the network capacity, then it's better to just serve it locally (except AWS bills your upload traffic too  Angry)
Your local ISP might not like it very much if you are uploading that much data.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
As an FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket that can be accessed by a server.

If updating a file takes a lot of resources, you can create a VM and execute a script that updates the file and uploads it to an S3 bucket (on AWS). You would then be able to access that file from another VM that needs fewer resources.

That may save on local resources, but you will be paying a lot of money per month if people download several hundred gigabytes each month, particularly if the files are as large as the ones hosted in the OP.

If you have the network capacity, then it's better to just serve it locally (except AWS bills your upload traffic too  Angry)
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Thanks for the update; the last .gz you had was, I think, from September.
Correct (August 6 and September 2).

As an FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket that can be accessed by a server.
Amazon charges $0.09 per GB of outgoing data; that's ridiculous for this purpose (my current 5 TB bandwidth limit would cost $450 per month when maxed out). And Amazon wants my credit card instead of Bitcoin.

Quote
If you want to update a file that takes a lot of resources, you can create a VM, execute a script that updates the file, and uploads it to a S3 (on AWS) bucket. You would then be able to access that file using another VM that takes fewer resources.
Still, that's quite excessive for just 2 files that are barely used.

Quote
Separately, sorting lists is not scalable, period.
Actually, sort performs quite well. I've tested:
10M lines: 10 seconds (fits in RAM)
50M lines: 63 seconds (starts using temporary files)
250M lines: 381 seconds (using 2 GB RAM and temporary files)
So a 5 times larger file takes 6 times longer to sort. I'd say scalability is quite good.

It just takes a while because it uses temporary disk storage. Given enough RAM, it can utilize multiple cores.
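For anyone who wants to reproduce this, the relevant GNU sort options are -S, -T and --parallel (a sketch; the sizes and paths below are just examples):
Code:
sort -u -S 8G --parallel=4 -T /mnt/scratch addresses.txt > addresses_sorted.txt
# -S: how much RAM to use before spilling to temporary files
# -T: directory for those temporary files (needs roughly the size of the input free)
# --parallel: number of sort threads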

Quote
There are some things you can do to increase the speed, such as keeping the list in RAM or cutting the number of times the entire list is reviewed, but you ultimately cannot sort a very large unordered list.
The 256 GB RAM server idea would cost a few dollars per hour, so I'll make do with less.
copper member
Activity: 1666
Merit: 1901
Amazon Prime Member #7
Some results: The awk-thing uses just over 1 GB memory for 10 million addresses. So for 1.5 billion addresses, a 256 GB server should be enough. At AWS, that would cost a few dollars per hour.
As an FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket that can be accessed by a server.

If updating a file takes a lot of resources, you can create a VM and execute a script that updates the file and uploads it to an S3 bucket (on AWS). You would then be able to access that file from another VM that needs fewer resources.
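Roughly like this, to sketch the idea (the bucket and script names are made up, and it assumes the AWS CLI is configured):
Code:
# on the big, short-lived VM:
./rebuild_address_list.sh      # hypothetical script that regenerates the list
gzip -k addresses_sorted.txt
aws s3 cp addresses_sorted.txt.gz s3://my-address-dumps/addresses_sorted.txt.gz
# the small always-on VM (or the bucket itself) then serves that object to downloaders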

Separately, sorting lists is not scalable, period. There are some things you can do to increase the speed, such as keeping the list in RAM or cutting the number of times the entire list is reviewed, but you ultimately cannot sort a very large unordered list.
newbie
Activity: 12
Merit: 3
Just yesterday, I got a good deal on a new VPS (more memory, more disk, more CPU and more bandwidth). It's dedicated to only this project (and I have no idea how reliable it's going to be). I've updated the OP.

There's a problem though. There are:
756,494,121 addresses according to addresses_in_order_of_first_appearance.txt.gz
756,524,407 addresses according to addresses_sorted.txt.gz
Obviously, these numbers should be the same. I haven't scheduled automated updates yet, I first want to recreate this data from scratch to see which number is correct.

Thanks for the update; the last .gz you had was, I think, from September.

legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Just yesterday, I got a good deal on a new VPS (more memory, more disk, more CPU and more bandwidth). It's dedicated to only this project (and I have no idea how reliable it's going to be). I've updated the OP.

There's a problem though. There are:
756,494,121 addresses according to addresses_in_order_of_first_appearance.txt.gz
756,524,407 addresses according to addresses_sorted.txt.gz
Obviously, these numbers should be the same. I haven't scheduled automated updates yet, I first want to recreate this data from scratch to see which number is correct.
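For anyone who wants to verify the counts themselves, counting lines is all it takes:
Code:
zcat addresses_in_order_of_first_appearance.txt.gz | wc -l
zcat addresses_sorted.txt.gz | wc -l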
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Are you downloading Blockchair dumps at the slow rate?
Yes. But 100 kB/s isn't a problem anymore: the initial download took a long time, but for daily updates it doesn't take that long.

Quote
I just contacted Blockchair for an API key, which enables people to download at the fast rate, and a support rep told me they cost $500/month.
I thought they'd offer it for free for certain users, but this makes sense from a business point of view.

Quote
If network bandwidth is a problem I'm able to host this on my hardware if you like.
Just this month I'm at 264 GB for this project, and 174 GB for all Bitcoin addresses with a balance. That means this full list is only downloaded a few times per month, but the funded addy list is downloaded a few times per day.
I'm more in need of more disk space for sorting this data, but I haven't decided yet where to host it. 100 GB of disk space isn't enough.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
@LoyceV

Are you downloading Blockchair dumps at the slow rate? I just contacted Blockchair for an API key, which enables people to download at the fast rate, and a support rep told me they cost $500/month.

If network bandwidth is a problem I'm able to host this on my hardware if you like.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Sample: unique_addresses.txt.gz: all Bitcoin addresses ever used, without duplicates, sorted by address (Warning: 15 GB)
I didn't have enough disk space to process the 31 GB file the way I want it, so I've (temporarily) removed this file. After I'm done with that, I'll restore the missing file. Give it a few days.
Well, that didn't go as planned Sad Although I can keep all unique addresses in order of first appearance, it turns out 100 GB of disk space is not enough for the temporary space it needs. Because of the large data traffic, I don't want to use loyce.club's AWS hosting for this, and I'm not sure yet if I should get another VPS just for this.

An alternative would be to run it from my home PC, but the heavy writing will just wear out my SSD. So this project is on hold for now. Daily updates still continue.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Sample: unique_addresses.txt.gz: all Bitcoin addresses ever used, without duplicates, sorted by address (Warning: 15 GB)
I didn't have enough disk space to process the 31 GB file the way I want it, so I've (temporarily) removed this file. After I'm done with that, I'll restore the missing file. Give it a few days.

Since I got no response to my question above, I'll go with 2 versions:
  • All addresses ever used, without duplicates, in order of first appearance.
  • All addresses ever used, without duplicates, sorted.
The first file feels nostalgic; the second file will be very convenient for matching addresses against a list of your own.
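For example, matching your own list against it could look like this (just a sketch; my_addresses.txt is a made-up file name, and comm needs both inputs sorted the same way):
Code:
comm -12 <(zcat addresses_sorted.txt.gz) <(sort my_addresses.txt)
# -12 prints only the addresses that appear in both lists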
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Sample: addresses.txt.gz: all addresses in chronological order, with duplicates (Warning: 31 GB):
Code:
1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
1HLoD9E4SDFFPDiYfNYnkBLQ85Y51J3Zb1
.......
3GFfFQAFgXKiA1qqUK6rqBpEpG4vZDos6t
3Mbtv47gZ2eN6Fy7owpgHHwSLYHS42P56P
38JyF2RQknBUMETyRT2yGndDJFYSp6hJNg
Due to disk space limitations, I'm considering removing this file, unless anyone has a need for it. So: can anyone tell me what this can be used for? I know it can be used to make a Top 100 of addresses with the most receiving transactions.

Instead of this list, I want to make a new list without duplicates, but still in order of first appearance of each address. Thanks to bob123, I can do that now!
I'll also keep the sorted list, because it is very convenient for finding matches against another list.
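In short, the deduplication boils down to this one-liner (a sketch; it needs enough RAM to hold every unique address once):
Code:
zcat addresses.txt.gz | awk '!a[$0]++' | gzip > addresses_in_order_of_first_appearance.txt.gz
# awk remembers every line it has seen and only prints a line the first time it appears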



I need some time to process all data. When done, I'll rewrite some of my posts.
legendary
Activity: 3346
Merit: 3130
This is an awesome contribution to the community. Some weeks ago I saw a user asking for a list like this to run a brute force... Some users use their address as a password; that's why a list like this is a great tool. Thanks again to LoyceV for making it for us.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
If someone has enough RAM to experiment, I'd love to see the result of this (on the 31 GB file):
This looks very promising:
Code:
cat -n input.txt | sort -uk2 | sort -nk1 | cut -f2- > output.txt
I'll be testing it soon.

Some results: The awk-thing uses just over 1 GB memory for 10 million addresses. So for 1.5 billion addresses, a 256 GB server should be enough. At AWS, that would cost a few dollars per hour.
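One way to measure that memory use (a sketch; this needs GNU time at /usr/bin/time, not the shell builtin):
Code:
head -n 10000000 addresses.txt | /usr/bin/time -v awk '!a[$0]++' > /dev/null
# look for "Maximum resident set size" in the output; that's the peak memory of the awk process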

I've tested with the first 10 million lines, and can confirm both give the same result:
Code:
head -n 10000000 addresses.txt | awk '!a[$0]++' | md5sum
head -n 10000000 addresses.txt | nl | sort -uk2 | sort -nk1 | cut -f2 | md5sum
As expected, awk is faster.
newbie
Activity: 29
Merit: 50
I actually can Cheesy I found this regexp on Stackoverflow:
Code:
egrep --regexp="^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$" filename
With some slight changes it stops matching parts of Eth-addresses:
Code:
egrep -w --regexp="[13][a-km-zA-HJ-NP-Z1-9]{25,34}" *


I have compiled these from various sources and use them to automatically set my blockchain explorer options based on user input, and I also keep them in my .zshrc:
Code:
#cryptocurrency greps

#btc1 and btc2 combined
alias btcgrep="grep -Ee '\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b' -e '\bbc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})\b'"

#legacy addresses only
alias btcgrep1="grep -E '\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b'"
#http://mokagio.github.io/tech-journal/2014/11/21/regex-bitcoin.html

#bech32 v1 and v0 addresses
alias btcgrep2="grep -E '\bbc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})\b'"
#https://stackoverflow.com/questions/21683680/regex-to-match-bitcoin-addresses

#bech32 addresses only
alias btcgrep3="grep -E '\bbc1[ac-hj-np-zAC-HJ-NP-Z02-9]{11,71}\b'"

#both legacy and bech32
alias btcgrep4="grep -E '\b([13][a-km-zA-HJ-NP-Z1-9]{25,34}|bc1[ac-hj-np-zAC-HJ-NP-Z02-9]{11,71})\b'"
#http://mokagio.github.io/tech-journal/2014/11/21/regex-bitcoin.html

#private keys
alias btcgrep5="grep -E '\b[5KL][1-9A-HJ-NP-Za-km-z]{50,51}\b'"
#word boundary: '\b'
#https://bitcoin.stackexchange.com/questions/56737/how-can-i-find-a-bitcoin-private-key-that-i-saved-in-a-text-file

#transaction hashes
alias btcgrep6="grep -E '\b[a-fA-F0-9]{64}\b'"
#https://stackoverflow.com/questions/46255833/bitcoin-block-and-transaction-regex
#https://bitcoin.stackexchange.com/questions/70261/recognize-bitcoin-address-from-block-hash-and-transaction-hash

#block hashes
alias btcgrep7="grep -E '\b[0]{8}[a-fA-F0-9]{56}\b'"
#https://stackoverflow.com/questions/46255833/bitcoin-block-and-transaction-regex

#ethereum address hash
#test for 'plausibility'
alias ethgrep="grep -E '\b(0x)?[0-9a-fA-F]{40}\b'"
#https://ethereum.stackexchange.com/questions/1374/how-can-i-check-if-an-ethereum-address-is-valid

#ethereum transaction hash
alias ethgrep2="grep -E '\b(0x)?([A-Fa-f0-9]{64})\b'"  #parentheses are not necessary
#https://ethereum.stackexchange.com/questions/34285/what-is-the-regex-to-validate-an-ethereum-transaction-hash/34286

The -w flag means 'word boundary' and can also be set within the regex with '\b' at both ends.
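A quick usage example (the file name is just an illustration):
Code:
btcgrep4 some_logfile.txt    # lines containing legacy or bech32 addresses
btcgrep6 some_logfile.txt    # lines containing possible transaction hashes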

Very good work on compiling those addresses, mate!
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
If someone has enough RAM to experiment, I'd love to see the result of this (on the 31 GB file):
Instead of the awk one-liner, I suggest you look at gz-sort; it's a small Linux program that sorts gzip-compressed files on disk while using a very small memory buffer, as low as 4 megabytes.
I checked, but it does what I'm already doing. The awk command removes duplicate lines without sorting them; I'd like to run it, but I can't.

Quote
This prints 1111111111111111111114oLvT2. This address was used 55405 times (!)
I'd be interested to see which real address is the shortest. The 111111111-addresses are all burn addresses. I'm not entirely sure what determines address length, but from what I've seen, shorter addresses are much harder to find. I've been looking for short addresses created from mini-private-keys, and they were quite rare.
To find a real short address, it needs to have sent funds too.
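If someone wants to dig for it, something like this should list the candidates (a sketch; it's a single pass over the sorted dump):
Code:
zcat addresses_sorted.txt.gz | grep -v '^111111' | awk 'length($0) <= 27 { print length($0), $0 }' | sort -n | head
# skips the 111111...-style burn addresses, keeps anything of 27 characters or fewer, shortest first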

Quote
Maybe you can also make a list of addresses sorted by balance
See List of all Bitcoin addresses with a balance.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
If someone has enough RAM to experiment, I'd love to see the result of this (on the 31 GB file):

Instead of the awk one-liner, I suggest you look at gz-sort; it's a small Linux program that sorts gzip-compressed files on disk while using a very small memory buffer, as low as 4 megabytes.

You sort the file using
Code:
gz-sort -u addresses.txt.gz addresses_sorted.txt.gz

The -u switch removes duplicate lines from the sorted output, and you can increase the buffer size to give it more working memory, but this isn't necessary. I used -S 1G to give it a 1-gigabyte buffer, and it took around 7 hours to complete, not much shorter than the advertised completion time of 9 or 10 hours. So this program will run well in your VM; the RAM factor isn't important.

You need to compile it yourself using make, but it has minimal dependencies: only zlib and GNU headers.
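For reference, building and running it goes roughly like this (paths assumed; grab the source from the gz-sort project page):
Code:
# inside the gz-sort source directory
make
./gz-sort -u -S 1G addresses.txt.gz addresses_sorted.txt.gz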

I used it to find the smallest address in the dump using
Code:
zcat addresses_sorted.txt.gz | head -n 55405 | uniq

This prints 1111111111111111111114oLvT2. This address was used 55405 times (!)

Here are some of the other smallest addresses:

Code:
1111111111111111111114oLvT2
111111111111111111112BEH2ro
111111111111111111112xT3273
1111111111111111111141MmnWZ
111111111111111111114ysyUW1
1111111111111111111184AqYnc
11111111111111111111BZbvjr
11111111111111111111CJawggc
11111111111111111111HV1eYjP
11111111111111111111HeBAGj
11111111111111111111QekFQw
11111111111111111111UpYBrS
11111111111111111111g4hiWR
11111111111111111111jGyPM8
11111111111111111111o9FmEC
11111111111111111111ufYVpS
111111111111111111121xzjPWX1
111111111111111111128gzo7iT
11111111111111111112AmVxQeF
11111111111111111112Fr3DURyz
11111111111111111112GvNtZ1K
11111111111111111112VUYD4wA
1111111111111111111313xyAwW
111111111111111111137vGPgFbT
11111111111111111113aT9ZSLG
111111111111111111168xDACCG
11111111111111111116B8w87yU



Maybe you can also make a list of addresses sorted by balance, now that you have an efficient way to deduplicate them.