-u, --unique: this means you have the same address on more than one line, which actually means duplicates. By the way, you don't have any duplicates, as I saw in your files last year.
I don't have duplicates because I use "sort -u". The input data I use to create this file contains every Bitcoin transaction ever made, and many addresses have been reused.
Without -u, your sort would be in the right order.
With "
-u" too
To remove duplicates from big files, you can use a perl command; no memory issue/error:
perl -ne'print unless $_{$_}++' big-file.txt >> dup-remove.txt
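For readers unfamiliar with the one-liner: it keeps a hash of every line seen so far and prints a line only on its first occurrence. The same logic with a named hash (functionally identical, the variable name is just clearer):

# %seen holds every distinct line as a key; $seen{$_}++ is 0 (false)
# the first time a line appears, so only first occurrences get printed.
perl -ne 'print unless $seen{$_}++' big-file.txt > dup-removed.txt

Note that the hash grows with the number of distinct lines, which is why memory usage scales with the input.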
I tried to test the performance difference, but perl used up my 16 GB RAM:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1665014 loyce 20 0 11.8g 11.8g 4128 R 82.7 75.7 1:07.01 perl
So I'll stick to "
uniq" for this.
For duplicates, the awk command will give errors even if you have a lot of RAM, but perl works best.
I'm not sure which awk command you mean: in another thread, I've mentioned removing duplicates while keeping addresses in their original order, and that indeed requires a lot of memory. But that's more complicated than simply removing duplicates.
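The awk one-liner usually quoted for order-preserving deduplication is probably the one meant here; it uses the same hash-of-seen-lines idea as the perl version, so its memory behaviour is the same (this is a guess at the command in question, not a quote from the thread):

# !seen[$0]++ is true only on a line's first occurrence,
# so each line is printed once, in original input order.
awk '!seen[$0]++' big-file.txt > dup-removed.txt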
sort -u will not give results in the right order; better to use plain sort.
That's incorrect: the command sorts the data, then removes duplicates. The output is sorted.
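A quick way to convince yourself (toy input, output shown below the command):

$ printf 'b\na\nb\n' | sort -u
a
b

With default whole-line keys, "sort -u" is equivalent to "sort | uniq": sort first, then collapse adjacent duplicates, so the result is both sorted and duplicate-free.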