@iceland2k14: thanks, but it feels like I'm in over my head. I've abandoned the pubkey project (at least for now).
It looks like I'm going to need to learn how to use a database.
Let's say I have a list like this:
sender recipient value fee
0xae0bf57678cf8151ff95889078e944a7696e18d5 0x930509a276601ca55d508cb5983c2c0d699fd7e9 1 39890258223793255
0x8c0fcd139568055e92a2b96c48ac85fa076c6c6a 0x202f1cbc8a208ee6dece54bb8837950b89e704b6 0 316430894723098310
0x1c7e19f5283aa41a496c1f351b36e96dbaad507f 0x7e75aefd78dbfd7e0846cf608151563164fbb7b2 0 42016257624091770
0xeca2e2d894d19778939bd4dfc34d2a3c45e96456 0xeca2e2d894d19778939bd4dfc34d2a3c45e96456 0 7521743554059000
0x26bce6ecb5b10138e4bf14ac0ffcc8727fef3b2e 0x26bce6ecb5b10138e4bf14ac0ffcc8727fef3b2e 0 7521743554059000
0x57845987c8c859d52931ee248d8d84ab10532407 0xd9e1ce17f2641f24ae83637ab66a2cca9c378b9f 63984998281303121920 9298601644341820
0x6604ac53a82cd784525e5f90652c4d6e6b2252af 0x8a0f69b5f97d5c5a2573314e91ef9d7f46ba6da1 0 32647788000000000
0x23d9e4be4d1d2b2a43a51cc66da725f0bd25ec43 0x95172ccbe8344fecd73d0a30f54123652981bd6f 0 17067300000000000
0x6e90ae41af1dea6f0006aa7752d9db2cf5e6a49f 0xd9e1ce17f2641f24ae83637ab66a2cca9c378b9f 55455427699986136143 38403951353497129
What I want is a database that contains only addresses and their balances. To get this, Value and Fee are subtracted from the Sender's balance, and Value is added to the Recipient's balance. The input data will be around 200 GB, including many duplicate addresses.
Given that I know nothing about databases, how would I start doing this? Is it going to be a problem if the database is larger than my RAM? If needed, I can (easily) split this list up into 2 lists: one with Sender, Value and Fee, and the other with Recipient and Value.
@TryNinja: Considering the performance you managed to get on ninjastic.space, I think you're the right person to ask
Allow me to notify you
To make it easier to understand what I need, I can turn the above table into this:
0x930509a276601ca55d508cb5983c2c0d699fd7e9 1
0xd9e1ce17f2641f24ae83637ab66a2cca9c378b9f 63984998281303121920
0xd9e1ce17f2641f24ae83637ab66a2cca9c378b9f 55455427699986136143
0xae0bf57678cf8151ff95889078e944a7696e18d5 -1
0x57845987c8c859d52931ee248d8d84ab10532407 -63984998281303121920
0x6e90ae41af1dea6f0006aa7752d9db2cf5e6a49f -55455427699986136143
0xae0bf57678cf8151ff95889078e944a7696e18d5 -39890258223793255
0x8c0fcd139568055e92a2b96c48ac85fa076c6c6a -316430894723098310
0x1c7e19f5283aa41a496c1f351b36e96dbaad507f -42016257624091770
0xeca2e2d894d19778939bd4dfc34d2a3c45e96456 -7521743554059000
0x26bce6ecb5b10138e4bf14ac0ffcc8727fef3b2e -7521743554059000
0x57845987c8c859d52931ee248d8d84ab10532407 -9298601644341820
0x6604ac53a82cd784525e5f90652c4d6e6b2252af -32647788000000000
0x23d9e4be4d1d2b2a43a51cc66da725f0bd25ec43 -17067300000000000
0x6e90ae41af1dea6f0006aa7752d9db2cf5e6a49f -38403951353497129
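The conversion from the four-column list to signed per-address amounts could be sketched in Python roughly like this. It is only an illustration: the function name `to_deltas` is made up, the input is assumed to be whitespace-separated, and zero values are skipped to match the list above.

```python
def to_deltas(lines):
    """Yield (address, signed_amount) pairs from 'sender recipient value fee' lines."""
    for line in lines:
        parts = line.split()
        # Skip the header row, blank lines, and anything that is not a data row.
        if len(parts) != 4 or not parts[0].startswith("0x"):
            continue
        sender, recipient = parts[0], parts[1]
        # Python integers have arbitrary precision, so large amounts are safe here.
        value, fee = int(parts[2]), int(parts[3])
        if value:
            yield recipient, value   # recipient gains the value
            yield sender, -value     # sender loses the value
        if fee:
            yield sender, -fee       # sender also pays the fee


sample = [
    "sender recipient value fee",
    "0xae0bf57678cf8151ff95889078e944a7696e18d5 0x930509a276601ca55d508cb5983c2c0d699fd7e9 1 39890258223793255",
    "0x8c0fcd139568055e92a2b96c48ac85fa076c6c6a 0x202f1cbc8a208ee6dece54bb8837950b89e704b6 0 316430894723098310",
]

for address, amount in to_deltas(sample):
    print(address, amount)
```

Because it is a generator, it processes one line at a time and never needs the whole 200 GB input in memory.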
Sorting gives this:
0x1c7e19f5283aa41a496c1f351b36e96dbaad507f -42016257624091770
0x23d9e4be4d1d2b2a43a51cc66da725f0bd25ec43 -17067300000000000
0x26bce6ecb5b10138e4bf14ac0ffcc8727fef3b2e -7521743554059000
0x57845987c8c859d52931ee248d8d84ab10532407 -63984998281303121920
0x57845987c8c859d52931ee248d8d84ab10532407 -9298601644341820
0x6604ac53a82cd784525e5f90652c4d6e6b2252af -32647788000000000
0x6e90ae41af1dea6f0006aa7752d9db2cf5e6a49f -38403951353497129
0x6e90ae41af1dea6f0006aa7752d9db2cf5e6a49f -55455427699986136143
0x8c0fcd139568055e92a2b96c48ac85fa076c6c6a -316430894723098310
0x930509a276601ca55d508cb5983c2c0d699fd7e9 1
0xae0bf57678cf8151ff95889078e944a7696e18d5 -1
0xae0bf57678cf8151ff95889078e944a7696e18d5 -39890258223793255
0xd9e1ce17f2641f24ae83637ab66a2cca9c378b9f 55455427699986136143
0xd9e1ce17f2641f24ae83637ab66a2cca9c378b9f 63984998281303121920
0xeca2e2d894d19778939bd4dfc34d2a3c45e96456 -7521743554059000
Sorting the full data set is going to be slow, and will probably take about 500 GB of temporary disk space, but it's doable if it helps.
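If the list is sorted by address, all lines for one address sit next to each other, so the balances can be summed in a single streaming pass with only one address in memory at a time. A minimal Python sketch, assuming "address amount" lines already sorted by address (`sum_sorted` is a made-up name):

```python
from itertools import groupby


def sum_sorted(lines):
    """Yield (address, balance) from 'address amount' lines sorted by address."""
    pairs = (line.split() for line in lines if line.strip())
    # groupby only merges adjacent equal keys, which is why the input must be sorted.
    for address, group in groupby(pairs, key=lambda p: p[0]):
        yield address, sum(int(amount) for _, amount in group)


sorted_lines = [
    "0x57845987c8c859d52931ee248d8d84ab10532407 -63984998281303121920",
    "0x57845987c8c859d52931ee248d8d84ab10532407 -9298601644341820",
    "0x6604ac53a82cd784525e5f90652c4d6e6b2252af -32647788000000000",
]

for address, balance in sum_sorted(sorted_lines):
    print(address, balance)
```

For the real data, `lines` could simply be an open file object, since iterating over a file yields one line at a time.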
Main question:
how do I put this in .db format?
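One possible starting point is SQLite through Python's built-in `sqlite3` module. This is a sketch under assumptions, not a recommendation for this exact workload: the table name `balances`, the file name and the helper functions are all made up. SQLite keeps its data in a file on disk, so the database is not limited by RAM. Note that some amounts above (for example 63984998281303121920) do not fit in SQLite's 64-bit signed INTEGER type, so this sketch stores balances as TEXT and converts them back to Python integers when reading.

```python
import sqlite3


def write_balances(db_path, balances):
    """Store (address, balance) pairs, one row per address."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS balances ("
        "address TEXT PRIMARY KEY, balance TEXT NOT NULL)"
    )
    with con:  # one transaction for the whole batch
        con.executemany(
            "INSERT OR REPLACE INTO balances (address, balance) VALUES (?, ?)",
            ((address, str(balance)) for address, balance in balances),
        )
    con.close()


def read_balance(db_path, address):
    """Return the stored balance as an int, or 0 if the address is unknown."""
    con = sqlite3.connect(db_path)
    row = con.execute(
        "SELECT balance FROM balances WHERE address = ?", (address,)
    ).fetchone()
    con.close()
    return int(row[0]) if row else 0


if __name__ == "__main__":
    # Hypothetical file name; the balance is the sum of the two amounts received above.
    write_balances(
        "balances.db",
        [("0xd9e1ce17f2641f24ae83637ab66a2cca9c378b9f",
          55455427699986136143 + 63984998281303121920)],
    )
    print(read_balance("balances.db", "0xd9e1ce17f2641f24ae83637ab66a2cca9c378b9f"))
```

This assumes each address arrives already summed (one pair per address). `INSERT OR REPLACE` overwrites an existing row instead of adding to it, so feeding it unsummed duplicates would give wrong balances.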