
Topic: python script compare lines in 2 text files and output matches (Read 291 times)

legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
i use windows so yes it has to be python
You can use bash utilities on Windows too. Or boot your computer from an Ubuntu LIVE DVD just to do this.

Quote
yes it is very very slow for files above 1,000,000 lines

when i want to compare a file with 10,000,000+ lines to another with 2,000,000 lines (i had to cancel and close the script)
It took me a while to create 2 test-files with 50,000 Bitcoin addresses, so I just copied the same addresses to the same files to make it 10 million and 2 million lines per file.
The comm code above took 9 seconds to find all matches (and my PC is not very fast). I strongly suggest using the proper tools for the job.

Update: it used 1.5 GB RAM to do this. If you have much more data to compare, it might reduce memory load if you sort the files first.
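
If the job has to stay in Python, the slow part is usually the `in` test against a list, which rescans the list for every line. A minimal sketch of one alternative, assuming the smaller file fits in memory as a set; the function name `common_lines` and the file names in the usage line are placeholders, not something from this thread:

```python
def common_lines(small_path, large_path):
    """Yield every line of large_path that also occurs in small_path."""
    # Hold only the smaller file in memory; set lookups do not rescan the data.
    with open(small_path) as small:
        wanted = {line.rstrip("\n") for line in small}
    # Stream the larger file line by line instead of loading it whole.
    with open(large_path) as large:
        for line in large:
            line = line.rstrip("\n")
            if line in wanted:
                yield line
```

Usage would look like `for match in common_lines("file2.txt", "file1.txt"): print(match)`. A line that is repeated in the larger file is reported once per repetition.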
jr. member
Activity: 56
Merit: 3
Does it have to be python? Bash command comm does exactly what you need:
Code:
comm -12 <(sort file1) <(sort file2)

I don't know how fast a Python loop would be, but the above code takes about 0.05 seconds for 2 files with 50,000 lines each.

i use windows so yes it has to be python
yes it is very very slow for files above 1,000,000 lines

when i want to compare a file with 10,000,000+ lines to another with 2,000,000 lines (i had to cancel and close the script)

i sorted the first file, then split it into many small files of 100k lines each (and it was very very slow too)

at some point python did not help my requirement

legendary
Activity: 4466
Merit: 3391
Code:
...
for firstline in firstfile:
  if firstline in secondfile:
    print >>f1, (firstline)
For a small number of lines, that might be OK. But for a large number of lines, it would be faster to sort the files first and then compare. It's O(n log n) vs. O(n^2).

Like this:

Does it have to be python? Bash command comm does exactly what you need:
Code:
comm -12 <(sort file1) <(sort file2)

I don't know how fast a Python loop would be, but the above code takes about 0.05 seconds for 2 files with 50,000 lines each.

Comparison pseudocode looks like this:
Code:
e1 = file1.begin()
e2 = file2.begin()
while e1 ≠ file1.end() and e2 ≠ file2.end()
    if *e1 < *e2
        ++e1
    else if *e1 > *e2
        ++e2
    else
        print *e1
        ++e1
        ++e2

Computer science FTW.
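
The pseudocode above translates to Python fairly directly. This is only a sketch: `sorted_intersection` is a made-up name, and it assumes both inputs are already sorted in the same order (for example with `sorted()` on the stripped lines).

```python
def sorted_intersection(a, b):
    """Yield items present in both a and b; both must be sorted ascending."""
    it_a, it_b = iter(a), iter(b)
    done = object()  # marker meaning "this input is exhausted"
    x, y = next(it_a, done), next(it_b, done)
    while x is not done and y is not done:
        if x < y:
            x = next(it_a, done)      # advance the side holding the smaller item
        elif x > y:
            y = next(it_b, done)
        else:
            yield x                   # equal: a match, advance both sides
            x = next(it_a, done)
            y = next(it_b, done)
```

Because it only walks each input once, it also works on open file objects whose lines are already sorted, without loading either file completely.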
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Does it have to be python? Bash command comm does exactly what you need:
Code:
comm -12 <(sort file1) <(sort file2)

I don't know how fast a Python loop would be, but the above code takes about 0.05 seconds for 2 files with 50,000 lines each.
jr. member
Activity: 56
Merit: 3
#edit

Code:
import ecdsa, ecdsa.der, ecdsa.util, hashlib
import hashlib, os, re, struct
import bitcoin

f1=open("output","a") # f1=open("output","w")
firstfile= [line.rstrip('\n') for line in open("file1t")]
secondfile= [line.rstrip('\n') for line in open("file2")]

for firstline in firstfile:
  if firstline in secondfile:
    print >>f1, (firstline)
jr. member
Activity: 56
Merit: 3
1- save this as comprs.py
Code:
firstfile= [line.rstrip('\n') for line in open("file1.txt")]
secondfile= [line.rstrip('\n') for line in open("file2.txt")]
for firstline in firstfile:
  if firstline in secondfile:
    print(firstline)

2- save this as result.bat (batch file)
Code:
@echo off
comprs.py >> file3.txt
start file3.txt
exit

3- you only need to run result.bat
jr. member
Activity: 56
Merit: 3
can you correct this code:
~snip~
the output file part?

If it prints the correct key, but writes the wrong (or nothing) to the file, my solution has been posted.
Just view the code and at least try to understand it. I gave an explanation on how to fix it, together with an example integrated in the code from above.


i tried it the first minute you posted it
it didn't work either! no need for explanation

thank you bob123 and mocacinno, i sent you positive feedback
legendary
Activity: 1624
Merit: 2481
can you correct this code:
~snip~
the output file part?

If it prints the correct key, but writes the wrong (or nothing) to the file, my solution has been posted.
Just view the code and at least try to understand it. I gave an explanation on how to fix it, together with an example integrated in the code from above.
legendary
Activity: 3584
Merit: 5243
https://merel.mobi => buy facemasks with BTC/LTC
To tell you the truth, I've never used the module "BitcoinKeypair". That being said, I do see a potential problem with the indentation of the code you've shared:
on line 10, you start looping over all private keys in the list in_prvkey, but on line 14 you haven't used any indentation, so the line "outfile.write(k.address(x)+"\n")" will only write one address to your file (the address generated by the last private key in in_prvkey).
If your input file happens to contain a newline at the bottom of the file, there is a chance the last element of the list is empty, so the address function might fail because of this...

Anyways, if you see errors, my gut feeling tells me that it's either wrong indentation combined with an error on the last line of the input file, or a misuse of the BitcoinKeypair module... Can you post the exact error message?
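
To illustrate the two points above (write inside the loop, skip empty lines) without depending on any particular key library, here is a sketch in which `to_address` is a hypothetical callable standing in for whatever the key module provides; `convert_keys` is likewise an illustrative name.

```python
def convert_keys(in_path, out_path, to_address):
    """Write to_address(key) for every non-empty line of in_path.

    to_address is a placeholder for the real key-to-address conversion.
    Returns the number of lines written.
    """
    count = 0
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            key = line.strip()
            if not key:
                continue  # skip blank lines, e.g. an empty last line
            # The write is indented under the loop, so it runs once per key.
            outfile.write(to_address(key) + "\n")
            count += 1
    return count
```

The `with` statement closes both files on exit, so no separate `close()` calls are needed.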
jr. member
Activity: 56
Merit: 3

can you correct this code:

Code:
from coinkit.keypair import BitcoinKeypair

with open("prvkey.txt","r") as f:
    in_prvkey = f.readlines()
in_prvkey = [x.strip() for x in in_prvkey]
f.close()
#print  in_prvkey

outfile = open("prvkey2add.txt","w")
for x in in_prvkey:
  k = BitcoinKeypair(x)
  print k
 
outfile.write(k.address(x)+"\n")
outfile.close()


the output file part?

(kindly also suggest a library that is up to date and supports all private key formats)
- read private keys from a file
- output public address
legendary
Activity: 1624
Merit: 2481
it prints the correct result
but the outputfile.txt file always contains only the last line of file1

Then you are most probably calling it in the wrong place.
You need to write it to the file when you are checking (and printing) the line which is present in both files.

If we take the code from above:

Code:
# open the output file for writing, in this example as "file"
file = open("outputfile.txt", "w")

firstfile= [line.rstrip('\n') for line in open("textfile_containing_first_list.txt")]
secondfile= [line.rstrip('\n') for line in open("textfile_containing_second_list.txt")]
for firstline in firstfile:
  if firstline in secondfile:
    print(firstline)
    file.write(firstline+"\n")
file.close()
jr. member
Activity: 56
Merit: 3
make sure your indentations are correct... Tab =/= space

As for writing to a file...
file= open("outputfile.txt","a+")
file.write("key %s\r\n" % firstline)
file.close()

I'll be heading home for the day, if you have more questions... Don't hesitate to ask them, i'll be answering them tomorrow (or somebody else will probably help you out in my absence)

it prints the correct result
but the outputfile.txt file always contains only the last line of file1

have a good day


for now i am using this batch file as a temporary solution
Code:
@echo off
comprs.py >> 3.txt
exit
legendary
Activity: 3584
Merit: 5243
https://merel.mobi => buy facemasks with BTC/LTC
make sure your indentations are correct... Tab =/= space

As for writing to a file...
file= open("outputfile.txt","a+")
file.write("key %s\r\n" % firstline)
file.close()

I'll be heading home for the day, if you have more questions... Don't hesitate to ask them, i'll be answering them tomorrow (or somebody else will probably help you out in my absence)
jr. member
Activity: 56
Merit: 3
i tried to add a line to redirect the print output to 3.txt

i need the proper line
 
Code:
print(firstline) , file=open("3.txt", "a"))
errors 
>>IndentationError: unexpected indent
>>SyntaxError: invalid syntax
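
For reference, in Python 3 `file=` is a keyword argument of `print()` and has to go inside the parentheses; the posted line closes the parenthesis before the comma. A small sketch, where `write_matches` is an illustrative helper name and the output file is opened once rather than once per line:

```python
def write_matches(matches, out_path):
    """Append each match to out_path, one per line."""
    with open(out_path, "a") as out:
        for line in matches:
            # file= is passed inside the call, after the value to print
            print(line, file=out)
```

Calling `write_matches(["some line"], "3.txt")` appends to 3.txt and leaves any existing content in place, because the file is opened in "a" mode.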
jr. member
Activity: 56
Merit: 3
in python, that's relatively easy...
I wrote this code from memory (and copy/pasted 2 lines from the source i mentioned below), it should work, but typos might happen

Code:
firstfile= [line.rstrip('\n') for line in open("textfile_containing_first_list.txt")]
secondfile= [line.rstrip('\n') for line in open("textfile_containing_second_list.txt")]
for firstline in firstfile:
  if firstline in secondfile:
    print(firstline)

part of the source : https://qiita.com/visualskyrim/items/1922429a07ca5f974467 (i was too lazy to write a loop over a filehandle from memory)

5 STARS
thank you so much
legendary
Activity: 3584
Merit: 5243
https://merel.mobi => buy facemasks with BTC/LTC
in python, that's relatively easy...
I wrote this code from memory (and copy/pasted 2 lines from the source i mentioned below), it should work, but typos might happen

Code:
firstfile= [line.rstrip('\n') for line in open("textfile_containing_first_list.txt")]
secondfile= [line.rstrip('\n') for line in open("textfile_containing_second_list.txt")]
for firstline in firstfile:
  if firstline in secondfile:
    print(firstline)

part of the source : https://qiita.com/visualskyrim/items/1922429a07ca5f974467 (i was too lazy to write a loop over a filehandle from memory)
jr. member
Activity: 56
Merit: 3
compare every line in file1 with the lines in file2
(compare the whole string, not only the first number)

like this:

file1
Code:
1FFYY4EGHTVBWHEQbPcceME9YA6BWnEJxK
1GYeVf48v55hWHwynqgpXSnP84A96K9JxJ
1Ji25E8DaLpsgekWhkQk4UG5L6pz468EKy
1K5MT7BbKvCj4YeALeoEQr5sK2bH2uZdWi
1KRQjx2T31HC5boSoj9h3eMxHPkTFVtcJX

file2
Code:
1C1wxy5pcFj9KBFDFFnVyUYr7puT8abHaW
1K5MT7BbKvCj4YeALeoEQr5sK2bH2uZdWi
1Ly8X7xSoJdM6nfZSi1HDQuBjMjiuiev1r
12ux1FpMq5iJ14wycDV2DpBcqxHTTGPSjC
16jw8vgKjA8DThTwpBb3pfk6tGbMHWnz6x

output
Code:
1K5MT7BbKvCj4YeALeoEQr5sK2bH2uZdWi