Yes, I'm looking for spending_signature_hex. Sorry about the mixup.
I'm currently running this:
for day in `ls inputs/*gz`; do echo $day; gunzip -c $day | cut -f7,8,18 | grep -v spending_signature_hex | grep pubkeyhash | grep -vP "\t$" >> output2.txt; done
If it doesn't run out of disk space, I'll continue from there tomorrow. It won't fit for sure
That's going to be a problem, because I need a few times more disk space to sort the data. 1 TB might not even be enough.
This is a bit smaller:
for day in `ls /var/www/blockdata.loyce.club/public_html/inputs/*gz`; do echo $day; gunzip -c $day | cut -f7,8,18 | grep -v spending_signature_hex | grep pubkeyhash | grep -vP "\t$" | cut -f1,3 >> output2.txt; done
MrFreeDragon already described what we're looking for here.
I'll try this:
for day in `ls /var/www/blockdata.loyce.club/public_html/inputs/*gz`; do echo $day; gunzip -c $day | cut -f7,8,18 | grep -v spending_signature_hex | grep pubkeyhash | grep -vP "\t$" | cut -f1,3 > /dev/shm/tmp.file; paste <(cat /dev/shm/tmp.file | cut -f1) <(cat /dev/shm/tmp.file | cut -f2 | cut -c149-) | sort -u -S40% >> output2.txt; rm /dev/shm/tmp.file; done
Explanation: it truncates
spending_signature_hex to only the pubkey (I used a detour, but it works without reducing performance much). This reduces the size, and I can remove duplicates for each day already. That should reduce the output size.
I'll let it run overnight.
Update: This produces different results:
1zxhKVZtMBt8kf7km2shn2mkR4NLHGigT 0405eec604993048314294f7c1f9b45c3ed8424ef940426336153831f8813228a788f845e1df353c2021174573e33f2fab05d94e1dd5e5449832ec83ac3d5db17e
1zxhKVZtMBt8kf7km2shn2mkR4NLHGigT 05eec604993048314294f7c1f9b45c3ed8424ef940426336153831f8813228a788f845e1df353c2021174573e33f2fab05d94e1dd5e5449832ec83ac3d5db17e
Somehow the 148 characters isn't a constant.
Another example with full data:
block_id transaction_hash index time value value_usd recipient type script_hex is_from_coinbase is_spendable spending_block_id spending_transaction_hash spending_index spending_time spending_value_usd spending_sequence spending_signature_hex spending_witness lifespan cdd
355251 e73b8e08f86b6130ed00531cf3fb9fff4718f87d33bee879ff6119bd6b88f142 0 2015-05-06 18:42:33 300000 0.7111 14XvKaYnFEH2SioW2oRpgCKaMsaFwmFT7C pubkeyhash 76a91426c15da2beb1fb4a6db5ba4c993a806d22cd10d688ac 0 -1 356405 12c440ae9cd07a530d5a942e56cfae0f0b3797bd3c579366ecd0a7fa8a2d24d0 11 2015-05-14 15:51:23 0.7307 4294967295 47304402204a20d3044915706057c2a8a2a2ec7ba7e6ceb5735ab172329565df9bce70ea68022022f81e2d8b041c9eb3526168ab6ae0a6d375b1696abd86a0721cd64c7951be8101410421557041f930252b79b0fa28e6587680053b3a3672ff0c1dca6a623c79bdc0b6125a7a2be5450e28e49731ba8f60231dd8eceeff170923717d97a1ca5a67acd4 680930 0.023643402777777777
355939 077b2b53f3dc4ab060ad0e5752dd7ab4d036085d58d6e61f1bf335a8e9385c1a 0 2015-05-11 15:34:00 40000 0.097 14XvKaYnFEH2SioW2oRpgCKaMsaFwmFT7C pubkeyhash 76a91426c15da2beb1fb4a6db5ba4c993a806d22cd10d688ac 0 -1 356405 12c440ae9cd07a530d5a942e56cfae0f0b3797bd3c579366ecd0a7fa8a2d24d0 12 2015-05-14 15:51:23 0.0974 4294967295 4730440220066b7baba05a4dca623e7719a7ddca253c0ac9dd9b11c8a4d78a480c2a7b9fbf02203cc0f7f64a919ed44ca5b05cc766f708697e091fee13b9fe3be84b5a12c656c501410421557041f930252b79b0fa28e6587680053b3a3672ff0c1dca6a623c79bdc0b6125a7a2be5450e28e49731ba8f60231dd8eceeff170923717d97a1ca5a67acd4 260243 0.0012048287037037036
355948 8d43ed844cc99bfdb0faa1fa0f7803af32e21457053aba385380d712393773f5 0 2015-05-11 16:35:42 28300 0.0687 14XvKaYnFEH2SioW2oRpgCKaMsaFwmFT7C pubkeyhash 76a91426c15da2beb1fb4a6db5ba4c993a806d22cd10d688ac 0 -1 356405 12c440ae9cd07a530d5a942e56cfae0f0b3797bd3c579366ecd0a7fa8a2d24d0 13 2015-05-14 15:51:23 0.0689 4294967295 4830450221008bf415b6c4bc7118a1d93ef8f6c63b0801d9abe2e41e390670acf9677ee58e5602200da3df76f11ae04758c947a975f84dd7dba990e00c146b451dc4fa514c6cb52d01410421557041f930252b79b0fa28e6587680053b3a3672ff0c1dca6a623c79bdc0b6125a7a2be5450e28e49731ba8f60231dd8eceeff170923717d97a1ca5a67acd4 256541 0.0008402905439814814
356093 65c62d27591ae513e9b5ddb1630b083e9f804c0bc89052c5c17e9ee3cdde29a6 0 2015-05-12 14:16:22 10000 0.0242 14XvKaYnFEH2SioW2oRpgCKaMsaFwmFT7C pubkeyhash 76a91426c15da2beb1fb4a6db5ba4c993a806d22cd10d688ac 0 -1 356405 12c440ae9cd07a530d5a942e56cfae0f0b3797bd3c579366ecd0a7fa8a2d24d0 14 2015-05-14 15:51:23 0.0244 4294967295 483045022100a83ca95b6b3153c5fce971c1eebbeebc892ba6c297157c326a8359c9b408ce1902201904060ce4e1fbd455403546232779dc9ca7bfe3582d3055270f27f245575d0901410421557041f930252b79b0fa28e6587680053b3a3672ff0c1dca6a623c79bdc0b6125a7a2be5450e28e49731ba8f60231dd8eceeff170923717d97a1ca5a67acd4 178501 0.00020659837962962966
356247 9873b5b567b7947fdb271ed37240da6ee45085f9804565c82a02f8a3a376833d 0 2015-05-13 14:46:45 12000 0.0292 14XvKaYnFEH2SioW2oRpgCKaMsaFwmFT7C pubkeyhash 76a91426c15da2beb1fb4a6db5ba4c993a806d22cd10d688ac 0 -1 356405 12c440ae9cd07a530d5a942e56cfae0f0b3797bd3c579366ecd0a7fa8a2d24d0 15 2015-05-14 15:51:23 0.0292 4294967295 47304402202a6531662b10fdc9d4b4e7ada1a4db2cdba6e0d22155ebbcedb8545a9ddfd1170220760a431d5eac0f32e8f6fc5812cc9ea5c1530eac37f11768288185bb87cba3ea01410421557041f930252b79b0fa28e6587680053b3a3672ff0c1dca6a623c79bdc0b6125a7a2be5450e28e49731ba8f60231dd8eceeff170923717d97a1ca5a67acd4 90278 0.00012538611111111112
356375 a6afeb2893da616d75c39a4eeaed51c91fef715930f356988665a32028cad057 204 2015-05-14 10:33:58 99600 0.2426 14XvKaYnFEH2SioW2oRpgCKaMsaFwmFT7C pubkeyhash 76a91426c15da2beb1fb4a6db5ba4c993a806d22cd10d688ac 0 -1 356405 12c440ae9cd07a530d5a942e56cfae0f0b3797bd3c579366ecd0a7fa8a2d24d0 16 2015-05-14 15:51:23 0.2426 4294967295 47304402202ad9c30164315b7ae2a3d280f629a014d316f4177e4dbde6167bd83c628a4d050220308bba5e628de5a47606d1e0ecd939332f7036a6da7eb6458e2bd6e690bd553d01410421557041f930252b79b0fa28e6587680053b3a3672ff0c1dca6a623c79bdc0b6125a7a2be5450e28e49731ba8f60231dd8eceeff170923717d97a1ca5a67acd4 19045 0.00021954652777777774
Any idea why? The pubkey doesn't seem to have a fixed length either, so counting characters from the end won't work.