
Topic: List of all Bitcoin addresses with a balance (Read 9477 times)

legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
January 29, 2025, 09:45:17 AM
I think sort was single-threaded when no --parallel setting was given.
Mine uses all cores until available memory becomes a limitation.
full member
Activity: 316
Merit: 193
Quote
Quote
Code:
sort -u -S 20% --parallel=16
Are you sure this makes it faster? When I tested it (on a server with HDD), adding more CPU threads only helped if the data fit in RAM. With more threads, sort needs more memory, so you don't want that if it means writing more to /tmp.
Without the parallel-setting, sort already uses many cores. So I used this setting to limit it.

Well, we need to test different values for the memory percentage and the number of threads to find the fastest combination.

I think sort was single-threaded when no --parallel setting was given.
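One way to run that comparison is a small timing loop. This is only a sketch: bench_sort is a hypothetical helper name, the value lists are arbitrary starting points, and it assumes GNU sort and GNU date (for %N).

```shell
#!/usr/bin/env bash
# bench_sort FILE: hypothetical helper that times GNU sort for a few
# combinations of buffer size (-S) and thread count (--parallel).
# The value lists below are arbitrary; adjust them to your RAM and CPU.
bench_sort() {
    local file=$1 mem threads start end
    for mem in 10% 20% 40%; do
        for threads in 1 4 16; do
            start=$(date +%s%N)    # nanoseconds since the epoch (GNU date)
            LC_ALL=C sort -u -S "$mem" --parallel="$threads" "$file" > /dev/null
            end=$(date +%s%N)
            # Columns: buffer size, threads, elapsed milliseconds
            printf '%s\t%s\t%d\n' "$mem" "$threads" $(( (end - start) / 1000000 ))
        done
    done
}
```

Run it on a sample that is large enough to spill to temporary files (for example "bench_sort sample.txt"), since that is where the HDD case discussed above is likely to show up.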
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I've put LC_ALL=C before sort command
Why not just put it at the start of your script once?
Code:
export LC_ALL=C
I'll add this to the OP. And I'll add it to my own script, so it no longer depends on the server I'm using. And right when I wanted to do this, I realized it's there already:
Code:
export LC_ALL=C   # This makes "sort" a few percent faster
The reason I added this a long time ago is in the comment. I completely forgot this was in there. I'll also add it to "all addresses ever used".

Quote
Code:
sort -u -S 20% --parallel=16
Are you sure this makes it faster? When I tested it (on a server with HDD), adding more CPU threads only helped if the data fit in RAM. With more threads, sort needs more memory, so you don't want that if it means writing more to /tmp.
Without the parallel-setting, sort already uses many cores. So I used this setting to limit it.
full member
Activity: 316
Merit: 193
@LoyceV

So I've run some further tests.

And it seems everything is OK!

LC_ALL=C should be set explicitly on systems whose default locale is something else.

I've put LC_ALL=C before the sort command and before the compare command (my cmn script, which uses the comm program) to test:

sort-u-mt:
Code:
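#!/usr/bin/env bash
# Sort a file in place: unique lines, byte order (LC_ALL=C), multi-threaded.
FILESIZE=$(stat -c%s "$1")
time pv -cN input "$1" | dos2unix -f | LC_ALL=C sort -u -S 20% --parallel=16 | pv -cN output -s "$FILESIZE" > "$1.sorted~"
# Replace the original only if the sorted output is non-empty.
if [[ -s "$1.sorted~" ]]
then
    mv "$1.sorted~" "$1"
    echo Done.
else
    echo Error!
fi
>&2 echo -ne "\a"   # ring the terminal bell when finished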

cmn:
Code:
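#!/usr/bin/env bash
# Usage: cmn SORTED1 SORTED2 OUTPUT
# Writes the lines common to both inputs; both must be sorted with LC_ALL=C.
time LC_ALL=C comm -12 <(pv -cN in1 "$1") <(pv -cN in2 "$2") | (pv -cN out) > "$3"
echo -e "\nResult file has $(wc -l < "$3") lines, head:"
head "$3"
>&2 echo -ne "\a"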

So your files are sorted with LC_ALL=C, but the files that went through my sort script were not.

If we add LC_ALL=C before sort and comm we get the expected results.

So in my opinion there is no change needed.
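To make the point concrete, here is a minimal sketch of an intersection that does not depend on the system locale. common_lines is a hypothetical helper name, not part of either script above; it sorts both inputs itself, which is wasteful for files that are already in LC_ALL=C order.

```shell
#!/usr/bin/env bash
# common_lines A B: hypothetical helper; prints the lines present in both files.
# Both inputs are sorted with the same locale that comm uses, so the result
# does not depend on the system's default locale.
common_lines() {
    local tmp_a tmp_b status
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    LC_ALL=C sort -u "$1" > "$tmp_a" &&
        LC_ALL=C sort -u "$2" > "$tmp_b" &&
        LC_ALL=C comm -12 "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

For inputs that are known to be sorted with LC_ALL=C already, calling comm directly as in cmn avoids the extra sorting pass.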
?
Activity: -
Merit: -
oh wowwww :o

Thank you very much
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
We were talking about that in our private messages.
Thank you for pointing this out :)

Quote
Maybe sorting should use LC_ALL=C or LC_ALL=C.UTF-8 before the sort command, so there is always one sort order on all systems (it should work like that)
I'll wait to see if someone responds with a good reason to keep things the way they are. If not, I think I'll go for LC_ALL=C.

Quote
Because systems/servers/OSes differ, we should always specify the sort order for each sort command (LC_ALL...)
I agree. I just didn't know about the difference, and (before my dedicated server disappeared) never stumbled upon this problem.

Quote
If we change that now, we can break people's scripts, but we should settle on one sort order forever; that's how it should be from an engineering standpoint
Let's give it 2 weeks. But I guess most people won't read this until after I've broken their script by changing things :P

Quote
We can see on the first screenful of the sorted file that the sort order differs depending on the system or the given LC_ALL; it is visible to the naked eye that the addresses are sorted differently (mainly, lowercase and uppercase are in a different order)
Here's the difference:
Code:
11111111111111111111HV1eYjP
11111111111111111111HeBAGj
11111111111111111111QekFQw
11111111111111111111UpYBrS
11111111111111111111g4hiWR
11111111111111111111jGyPM8
11111111111111111111o9FmEC
11111111111111111111ufYVpS
vs:
Code:
11111111111111111111g4hiWR
11111111111111111111HeBAGj
11111111111111111111HV1eYjP
11111111111111111111jGyPM8
11111111111111111111o9FmEC
11111111111111111111QekFQw
11111111111111111111ufYVpS
11111111111111111111UpYBrS
That is annoying to deal with!
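The two orderings can be reproduced on a few of the addresses listed above. This is a sketch: the function names are hypothetical, and en_US.UTF-8 is only an example locale that may not be installed on your system.

```shell
#!/usr/bin/env bash
# Sample addresses copied from the listings above.
sample_addresses() {
    printf '%s\n' \
        11111111111111111111g4hiWR \
        11111111111111111111HeBAGj \
        11111111111111111111HV1eYjP \
        11111111111111111111jGyPM8
}

# Byte order: every uppercase letter sorts before every lowercase letter.
sort_bytewise() { LC_ALL=C sort; }

# Locale order: depends on the named locale being installed; en_US.UTF-8 is
# an assumption about your system, pass another locale name as $1 if needed.
sort_with_locale() { LC_ALL=${1:-en_US.UTF-8} sort; }
```

"sample_addresses | sort_bytewise" gives the order of the first listing (HV1eYjP, HeBAGj, g4hiWR, jGyPM8). "sample_addresses | sort_with_locale" typically gives the order of the second listing, but only if that locale exists on the machine.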



This can of course easily be avoided by sorting the data on your local system before using it. For this project, it's quite easy. But for all Bitcoin addresses ever used, it can take hours to sort the data.
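A cheap way to find out whether a local re-sort is needed at all is to let sort check the order first. A minimal sketch, assuming GNU sort; ensure_c_sorted is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# ensure_c_sorted FILE: hypothetical helper. Checks whether FILE is already in
# LC_ALL=C order and re-sorts it in place only when it is not.
ensure_c_sorted() {
    local file=$1
    # "sort -c" only checks the order: exit status 0 means already sorted.
    if LC_ALL=C sort -c "$file" 2>/dev/null; then
        echo "already sorted"
        return 0
    fi
    # Sort into a temporary file first so a failed sort cannot clobber FILE.
    LC_ALL=C sort -S 20% -o "$file.sorted~" "$file" &&
        mv "$file.sorted~" "$file" &&
        echo "re-sorted"
}
```

The check reads the file once and stops at the first out-of-order line, so it is much faster than sorting when the file is large.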
full member
Activity: 316
Merit: 193
It was brought to my attention that my "sort" is "different" now, and I got these results testing:

Code:
cat Bitcoin_addresses_LATEST.txt.gz | gunzip | sha256sum
df0baad2301e9b897a02bd3fccb115968c82eb3956143e2f5b4c3ad7b2c227bf  -

So far so good.
Now, this file is sorted on my server from a cronjob. But when I sort it on my local computer, I get this:
Code:
cat Bitcoin_addresses_LATEST.txt.gz | gunzip | sort -S20% | sha256sum
27c2541369d0546ec7c7e70d09d807d8fc6d39435f8857e5ebbf8386584be2d2  -

Has anyone else noticed an incompatible sorting method? Should I change this to a different sorting? Or would that break scripts from people who are currently using it?

We were talking about that in our private messages.

My suggestions:
  • Use pv instead of cat so you can see progress; it won't affect the result
  • Maybe sorting should use LC_ALL=C or LC_ALL=C.UTF-8 before the sort command, so there is always one sort order on all systems (it should work like that)
  • Because systems/servers/OSes differ, we should always specify the sort order for each sort command (LC_ALL...)
  • If we change that now, we can break people's scripts, but we should settle on one sort order forever; that's how it should be from an engineering standpoint
  • We can see on the first screenful of the sorted file that the sort order differs depending on the system or the given LC_ALL; it is visible to the naked eye that the addresses are sorted differently (mainly, lowercase and uppercase are in a different order)
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
It was brought to my attention that my "sort" is "different" now, and I got these results testing:

Code:
cat Bitcoin_addresses_LATEST.txt.gz | gunzip | sha256sum
df0baad2301e9b897a02bd3fccb115968c82eb3956143e2f5b4c3ad7b2c227bf  -

So far so good.
Now, this file is sorted on my server from a cronjob. But when I sort it on my local computer, I get this:
Code:
cat Bitcoin_addresses_LATEST.txt.gz | gunzip | sort -S20% | sha256sum
27c2541369d0546ec7c7e70d09d807d8fc6d39435f8857e5ebbf8386584be2d2  -

Has anyone else noticed an incompatible sorting method? Should I change this to a different sorting? Or would that break scripts from people who are currently using it?
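The same check can be pinned to an explicit locale, so the result no longer depends on the machine it runs on. This is a sketch with hypothetical helper names, assuming gunzip, GNU sort and sha256sum are available:

```shell
#!/usr/bin/env bash
# digest_plain FILE.gz: SHA-256 of the decompressed data exactly as published.
digest_plain() {
    gunzip -c "$1" | sha256sum | cut -d ' ' -f 1
}

# digest_sorted FILE.gz [LOCALE]: SHA-256 of the decompressed lines after
# sorting them under LOCALE (default: C).
digest_sorted() {
    gunzip -c "$1" | LC_ALL=${2:-C} sort -S 20% | sha256sum | cut -d ' ' -f 1
}
```

If digest_plain and digest_sorted print the same hash for a file, that file is already in LC_ALL=C order. Passing your own locale name as the second argument shows what a default sort on your system would produce; if that locale is not installed, sort may silently behave as if C had been given.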
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
It's back!
My site addresses.loyce.club/ is back online. For now, there's only the most recent snapshot. During the next year, I'll keep more snapshots again.

Direct link to LATEST versions
blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz (currently 1.5GB)
Bitcoin_addresses_LATEST.txt.gz (currently 1.3GB)

Bandwidth
Starting December 2024, I have a new VPS. This server is allowed 16 TB bandwidth per month. Enjoy!



I've received a few PMs from people who missed my data. It's always good to see it fills a need.
Mek
jr. member
Activity: 75
Merit: 7
mtc.mekweb.eu - mega transistor clock
I still have the "latest" txt file on my disk, list of addresses without balances. Date modified says 29 Nov 2024. Would that help?
I'd appreciate having it. Is there anywhere I can download it?
In retrospect, I should have made backups of this too. I will next time; I now regret losing monthly snapshots from the past year.
Sure, link sent via PM :)
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I still have the "latest" txt file on my disk, list of addresses without balances. Date modified says 29 Nov 2024. Would that help?
I'd appreciate having it. Is there anywhere I can download it?
In retrospect, I should have made backups of this too. I will next time; I now regret losing monthly snapshots from the past year.
Mek
jr. member
Activity: 75
Merit: 7
mtc.mekweb.eu - mega transistor clock
I still have the "latest" txt file on my disk, list of addresses without balances. Date modified says 29 Nov 2024. Would that help?
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
To get a good list of addresses with balances, I used a script in Python.
That's a long way to turn the list with balances into the list without balances (which I had on my site already). But it doesn't fix the problem that they're still unavailable.

I'm still adding data to my new server. When I'm done, I'll see if I can get this list from Bitcoin Core myself.
full member
Activity: 316
Merit: 193
To get a good list of addresses with balances, I used a script in Python.
This file is very useful for checking other addresses against it if you need a fast solution.
Both files must be sorted; then use "comm -12" to find the addresses that are in both files.

Code:
#!/usr/bin/env bash
# apt install aria2
echo Downloading...
aria2c -x4 http://addresses.loyce.club/blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz
echo Unpacking...
pv blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz | gunzip > blockchair_bitcoin_addresses_and_balance_LATEST.tsv
echo Sorting...
# cut -f 1 keeps only the first (address) column; sort -u removes duplicates
pv -B 1M -cN input blockchair_bitcoin_addresses_and_balance_LATEST.tsv | cut -f 1 | sort -u --parallel=16 | pv -cN output > addrs-with-bal.txt
echo Done!
>&2 echo -ne "\a"
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Update:
I have a new server for this project. But... I didn't keep a backup of the actual data. There was just a lot of data, so I kinda skipped this one :( I have my script to download and process it, but the page design is hard to recover, and my year of monthly snapshots is gone.
To make matters worse, gz.blockchair.com where I get the data was last updated on November 30, and hasn't updated since. This has happened before and usually they come back, but for now I have no data.
If it takes too long I'll have to figure out how to get this data from Bitcoin Core myself.
If anyone happens to have some old backups, please let me know :)
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Still working for me.  https://loyce.club/
That's on AWS, still going strong. "This topic" is on my dedicated Xeon server:
The data
See addresses.loyce.club. I keep 18 snapshots of Blockchair's daily data.
Vod
legendary
Activity: 3668
Merit: 3010
Licking my boob since 1970
It looks like my server is gone :( I'm waiting for a response from my sponsor.

Still working for me.  https://loyce.club/
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
It looks like my server is gone :( I'm waiting for a response from my sponsor.
Mek
jr. member
Activity: 75
Merit: 7
mtc.mekweb.eu - mega transistor clock
I noticed my bandwidth consumption dropped from 20 to 14 TB/month ;) But now that I think about it, my memory may be incorrect. I think it used to be 10 TB/month, which means it increased. I'm not so sure anymore, I don't really keep track.
Possibly because there were many days without an update. No need to redownload old files.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Hahaha, is this post something that really needs a bump?
I noticed my bandwidth consumption dropped from 20 to 14 TB/month ;) But now that I think about it, my memory may be incorrect. I think it used to be 10 TB/month, which means it increased. I'm not so sure anymore, I don't really keep track.

Quote
After that, users always ask how to process, edit, or manage those big files... most of them don't know any Linux environment or commands :-/
If you know an equivalent that works in Windows, please post it.

Quote
Anyway thanks for your contributions to this community.
No problem :)