Topic: All used addresses

newbie
Activity: 29
Merit: 50
August 24, 2020, 06:44:49 AM
#29
I wrote up my methodology for my address list in more detail in my GitHub repo.
Here is how I did the sorting:

Code:
$ export TMPDIR='/large/tmp/dir'   # put sort's temporary files on a big disk instead of /tmp
$ export LC_ALL=C                  # plain byte comparison, much faster than locale-aware sorting
$ nl concat.txt | sort -k2 -u | sort -n | cut -f2 > final.txt   # number lines, dedupe by address, restore original order

Note that using LC_ALL=C will greatly speed up sorting!
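To see what the pipeline does, here is a toy run on the same small sample LoyceV uses further down this thread (not real addresses); with GNU coreutils this keeps only the first occurrence of each line while preserving the original order:
Code:
$ printf 'A\nG\nD\nA\nB\nC\nD\n' > concat.txt
$ nl concat.txt | sort -k2 -u | sort -n | cut -f2
A
G
D
B
C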
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
August 24, 2020, 06:44:38 AM
#28
Code:
cat -n input.txt | sort -uk2 | sort -nk1 | cut -f2- > output.txt
This looks genius in its simplicity! It worked on a small sample; I'm currently transferring and extracting 31 GB of data to do the full test. There's no way the double sort will fit on the 100 GB VPS, but this should work without using a lot of RAM.
I'll continue my test results in my own topic: List of all Bitcoin addresses ever used.
legendary
Activity: 1624
Merit: 2481
August 24, 2020, 06:13:58 AM
#27
I want to remove duplicate lines without changing the order, so only keeping the first occurrence.

If I am not mistaken, the following should work:
Code:
cat -n input.txt | sort -uk2 | sort -nk1 | cut -f2- > output.txt

None of these commands needs to hold the file in memory all at once.
But as mentioned previously, sort does need quite a lot of disk space to create its temporary files, so that might be a bottleneck depending on your system specs.
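If temporary space is the limiting factor, GNU sort can be pointed at a larger temp directory with -T and have its memory buffer capped with -S; a sketch of the same pipeline (the temp path is just a placeholder):
Code:
cat -n input.txt | sort -uk2 -T /big/tmp -S 2G | sort -nk1 -T /big/tmp -S 2G | cut -f2- > output.txt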
hero member
Activity: 1659
Merit: 687
LoyceV on the road. Or couch.
August 24, 2020, 05:43:44 AM
#26
I might have just misunderstood the problem; could you elaborate on the actual issue?
I want to remove duplicate lines without changing the order, so only keeping the first occurrence.

Say:
A
G
D
A
B
C
D

I want to keep:
A
G
D
B
C
legendary
Activity: 1624
Merit: 2481
August 24, 2020, 05:10:57 AM
#25
The sort command can keep chronological order without using much RAM
How? I haven't found that option.

The sort command only uses roughly 50% of your available RAM.
If sort fails on a large file, the problem is most likely not RAM but a lack of space on your hard drive for its temporary files.

What sort does, when the file is larger than your available RAM, is create temporary files on the hard drive which are then merge-sorted at the end.
So the overall disk capacity needed is roughly the size of the file times 3 (if you keep the original) or times 2 (if you overwrite the original file).
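If disk space for those temporary files is tight, GNU sort can also compress them; a minimal sketch (gzip is just an example compressor):
Code:
# trade CPU time for temp disk space by compressing sort's temporary files
sort --compress-program=gzip -T /large/tmp/dir input.txt > sorted.txt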


I might have just misunderstood the problem; could you elaborate on the actual issue?
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
August 24, 2020, 04:59:41 AM
#24
The sort command can keep chronological order without using much RAM
How? I haven't found that option.
newbie
Activity: 29
Merit: 50
August 23, 2020, 05:06:59 PM
#23
I made List of all Bitcoin addresses ever used.

One feature of my lists is that I tried to keep the original order in which addresses first appeared in the Blockchair dumps..
It works with awk:
Code:
awk '!a[$0]++'
But this requires far too much memory. I can use this on a single day's data, but not on all the data.
So for now, I gave up trying to keep addresses in chronological order. I'll keep the original data in case I find a different solution (or enough RAM) later.

Hey, that is very nice, bro.
You have got the means, you work hard, and you are very good at it.
I knew that awk one-liner you wrote, though I tried using Perl because I thought it might need less RAM..
The sort command can keep chronological order without using much RAM, but it needs a large temp directory (/tmp will not work if it is mounted as tmpfs, since that is limited by system RAM).
OK, cheers!
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
August 03, 2020, 03:27:11 AM
#22
I made List of all Bitcoin addresses ever used.

One feature of my lists is that I tried to keep the original order in which addresses first appeared in the Blockchair dumps..
It works with awk:
Code:
awk '!a[$0]++'
But this requires far too much memory. I can use this on a single day's data, but not on all the data.
So for now, I gave up trying to keep addresses in chronological order. I'll keep the original data in case I find a different solution (or enough RAM) later.
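For reference, a minimal usage sketch of that one-liner (file names are placeholders); the array a keeps one entry per unique line in RAM, which is why it does not scale to the full data set:
Code:
# print each line only the first time it is seen; memory grows with the number of unique lines
awk '!a[$0]++' input.txt > output.txt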
newbie
Activity: 29
Merit: 50
July 20, 2020, 07:44:45 PM
#21
Please donate to me instead (LOL), I never received a donation lol.
No, I am not selling data.
[It will take me some days (not many, though) to upload everything to git as of today.]

And I think this is a much more powerful proof that we are NOT really interested in money, though some private companies may be.

Only one-liners, yeah oh... go learn some more.
@loyce you didn't post the resulting lists, so how can anybody be sure about them?
@windows/osx people, please remove Windows or Mac OS.
newbie
Activity: 29
Merit: 50
July 20, 2020, 02:03:27 PM
#20
Really pleased with GNU sort (that is called the genius sort)..

This ought to be about right.. I will try to upload the split unique address list files to git.

One feature of my lists is that I tried to keep the original order in which addresses first appeared in the Blockchair dumps..

You should not do it all at once if you have got the disk space.. I have produced some intermediate files for processing..


Dump files from Blockchair (4,210 files), from 2009-01-03 to 2020-07-18:

Addresses total: 1,483,853,800

Unique addresses total: 692,773,144

https://github.com/mountaineerbr/bitcoin-all-addresses
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
July 20, 2020, 03:23:10 AM
#19
Answering this deleted post, I'm currently running this:
Code:
for file in *.tsv.gz; do gunzip -c "$file" | grep -v is_from_coinbase | cut -f 7 >> /tmp/addresses.txt; done
It takes a while, but doesn't consume a lot of memory. When done, I'll sort | uniq the file and get the result. Now that I think about it, I could have piped the whole thing through sort in the same line instantly.
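Something like this, as a sketch (not what I actually ran), with sort -u doing the deduplication at the end of the pipe:
Code:
for file in *.tsv.gz; do gunzip -c "$file" | grep -v is_from_coinbase | cut -f 7; done | sort -u > /tmp/addresses.txt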

When I have a bit more time, I'll create daily updates for all used Bitcoin addresses and make it available for download. It's going to be a big file though.



Blockchair's /bitcoin/outputs currently takes 106 GB, and grows by 2 GB per month. At 100 KB/s, it takes just under 2 weeks to download from Blockchair. For $32 per year, I can run a VPS in Germany with a 1 Gbit connection, enough disk space to keep up for a few years, and enough bandwidth to allow 9 downloads per month. If anyone can use this, let me know.



The list with all addresses is 49 GB in size. If you tried to load it into RAM, that's probably why you ran out of memory.
Total address count: 1,484,589,749
1... address count: 1,039,899,708
3... address count: 343,485,961
bc1q... address count: 55,006,904
...-... (with a "dash") address count: 46,197,161

Unique address count:
1... address count: 470,943,308
3... address count: 167,941,821
bc1q... address count: 39,137,878
...-... (with a "dash") weird address count: 15,157,808

And here it stops for now: after processing data for 5 hours, I made a mistake and accidentally overwrote my end result. I'll restart later.
I'd like to see which address has received the most transactions.
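A rough way to get that from the raw (non-unique) list would be to count how often each address appears in the outputs; a sketch (appearances in outputs only approximate received transactions):
Code:
# top 10 most frequently seen addresses in the outputs dump
sort addresses.txt | uniq -c | sort -rn | head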
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
July 16, 2020, 01:08:37 PM
#18
Sorry I can't repay you.
No problem. I'll run out of disk space on the VPS soon though, so I'll have to start deleting older files.
Judging by bandwidth consumption, someone else downloaded the data already. I'll also run out of monthly bandwidth if only a few people download all 84 GB, but since I have no other use for the VPS, that's okay too.

Quote
Here is a shell script I'm using for downloading these files (requires curl and bash)..
https://github.com/mountaineerbr/scripts/blob/master/blockchair.btcblockhain.outputs.sh
Mine is a lot shorter; usually I type everything on just one line. I still have to adjust it for processing the data, but I'll do that when it's complete.
newbie
Activity: 29
Merit: 50
July 16, 2020, 08:22:01 AM
#17
Hey LoyceV

Thank you very much for your effort in uploading those dumps..
Sorry I can't repay you.

I am downloading the files; it should not take too long now..

As for cheating Blockchair's download restrictions, I don't think that would be bad or evil.

Here is a shell script I'm using for downloading these files (requires GNU coreutils)..
https://github.com/mountaineerbr/scripts/blob/master/blockchair.btcoutputs.sh

Take care everyone!

legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
July 16, 2020, 06:55:15 AM
#16
I tried downloading it and here it shows 100 KB/s. I can download them and upload them to Google Drive if you want :) I'll be away for a few weeks but I can do that job, just let me know.
Thanks, but no need. I downloaded 4 months' worth since yesterday, so I should be done in 4 days.
legendary
Activity: 2240
Merit: 3150
₿uy / $ell ..oeleo ;(
July 16, 2020, 03:46:18 AM
#15

The easiest way is to download Blockchair Database Dumps, but at 10 KB/s it takes months.

I tried downloading it and here it shows 100 KB/s. I can download them and upload them to Google Drive if you want :) I'll be away for a few weeks but I can do that job, just let me know.

legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
July 15, 2020, 10:05:34 AM
#14
I really just wanted to grep those dumps, and sort and uniq those addresses..
That's my plan :P I'm curious to see how many different addresses have been used.
Once I have it, I'll provide daily updates.

Quote
if I could access from three different locations
Getting 3 different VPSses (VPSs?) wouldn't even be very expensive, but it feels like cheating Blockchair's download restrictions. I don't think it will take 200 days at 100 kB/s. I'm currently at February 2019.

This is all I have so far, feel free to download 74 GB: http://loyceipv6.tk:20319/blockdata/
newbie
Activity: 29
Merit: 50
July 15, 2020, 07:25:37 AM
#13
Hey LoyceV

Yeah, and there is this new feature in 0.20 which, it seems, can dump snapshots of unspent transaction outputs with `dumptxoutset`..
When they update bitcoind in my distro, I will check that out.
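For reference, the call would look something like this (a sketch, assuming a running bitcoind 0.20+); note that it writes a compact binary UTXO snapshot, not a plain list of addresses, so it would still need post-processing:
Code:
# dump a snapshot of the current UTXO set; a relative path is placed inside the datadir
$ bitcoin-cli dumptxoutset utxo-snapshot.dat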
Indeed my downloads are between 48 KB/s and 98 KB/s from Blockchair; still, if I could access from three different locations it should take less than 70 days to download everything.. if from one location, 200 days..
I really just wanted to grep those dumps, and sort and uniq those addresses..
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
July 14, 2020, 11:43:00 PM
#12
I am downloading it too, with a bash script; in two days I've downloaded 10 GB..
I think those dumps amount to 1 TB, so it will take too long to download from only one location..
Blockchair increased the download speed from 10 to 100 kB/s. I'm currently downloading November 2018. I'm not sure yet if I'll share the raw data; I'd need another VPS because of the size.
It's probably much less than 1 TB though; the entire blockchain isn't that big.
newbie
Activity: 29
Merit: 50
July 14, 2020, 06:48:56 PM
#11
I'm already downloading it, and will publish the data for faster downloads once it's done. I don't know how many weeks/months it'll take though.

Hey LoyceV
Thanks for downloading and offering to share that.
I am downloading it too, with a bash script; in two days I've downloaded 10 GB..
I think those dumps amount to 1 TB, so it will take too long to download from only one location..
If you can share the files when you are ready, I will be very interested.
Cheers!

PS: yes, I think I added some zeroes when calculating the download size. This data set from Blockchair, specifically, seems close to 100 GB..
sr. member
Activity: 310
Merit: 727
---------> 1231006505
July 08, 2020, 02:58:55 AM
#10
Get all balances here (currently unlocked, 771 megabytes):
https://balances.crypto-nerdz.org/balances/balances-bitcoin-20200708-0000-MXSyuyTD.gz

Please note: it's not my site/data and the link will probably be locked again, so if you need it I suggest you get it now.