cat unsorted.txt | sort -u -S 65% -T tmp > sorted.txt
I'm already using "sort", which uses /tmp by default.
I'll try "sort -u" though; it might need less temporary storage than "sort | uniq". The next update is scheduled for tomorrow, so I'll see how it performs.
-S will tell your machine to use at most 65% CPU
I think you mean RAM, not CPU. This VM has only 256 MB, so I'll let "sort" figure it out on its own.
-T puts temporary files in a directory (here named "tmp") and not in RAM; if you have an SSD, the speed isn't too shabby
That's default behaviour
It doesn't have an SSD though, and I'm using "cputool" to keep server load low. I'm okay without daily updates on this; I wouldn't want users to download this large file on a daily basis anyway.
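For reference, here is a minimal sketch of the options discussed above, assuming GNU sort. The function name "sort_unique" and its three arguments are made up for illustration, and the 65% figure is simply the value from the quoted command, not a recommendation.

```shell
# Hypothetical helper: sort a file and drop duplicate lines.
# $1 = input file, $2 = output file, $3 = directory for temporary files
sort_unique() {
    # Make sure the temporary directory exists before sort needs it.
    mkdir -p "$3"
    # -u  keeps one copy of each line (same result as "sort | uniq")
    # -S  caps the in-memory buffer; GNU sort accepts a percentage of RAM
    # -T  puts temporary files in "$3" instead of $TMPDIR or /tmp
    # -o  writes the result to "$2"
    LC_ALL=C sort -u -S 65% -T "$3" -o "$2" "$1"
}
```

Usage would be something like: sort_unique unsorted.txt sorted.txt tmp. LC_ALL=C makes sort compare bytes instead of using locale rules, which is usually faster but changes the order of the output compared to a UTF-8 locale.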
I have sorted huge lists (>80 GB) on budget laptops using these two arguments. Worth a shot! If you want better hosting, PM me.
Since last year, I've been using an AWS server donated by suchmoon for loyce.club. However, since AWS charges $0.15/GB, I'm not comfortable hosting very large files on suchmoon's server.
When I tested sorting data on AWS, it started throttling disk IO after a while, which made it very slow. I've also tested a pay-by-the-hour VPS, and obviously it was a lot faster.
There's one thing on my wish list though: a method to show only unique addresses in order of appearance (without sorting them). It can be done with awk '!a[$0]++', but this requires a lot of memory and doesn't use temporary files.
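One possible way around the memory limit, as a rough sketch and not something tested on files of this size: number the lines, sort by content so duplicates end up next to each other, keep the lowest-numbered line of each group, then sort by line number again to restore the order of appearance. Both sort steps can spill to temporary files, and the awk steps only remember one line at a time. The function name "unique_in_order" is made up for illustration, and the sketch assumes GNU sort and awk.

```shell
# Hypothetical helper: print each line of stdin once, in order of first appearance.
unique_in_order() {
    tab=$(printf '\t')
    # 1. Prefix every line with its line number and a tab.
    awk '{ print NR "\t" $0 }' |
    # 2. Sort by content, then by line number, so the first occurrence leads each group.
    LC_ALL=C sort -t "$tab" -k2 -k1,1n |
    # 3. Keep only the first line of each group of identical content.
    awk '{ key = substr($0, index($0, "\t") + 1) }
         NR == 1 || key != prev { print }
         { prev = key }' |
    # 4. Sort by line number to restore the original order, then strip the numbers.
    LC_ALL=C sort -t "$tab" -k1,1n |
    cut -f2-
}
```

Usage would be something like: unique_in_order < addresses.txt > unique.txt (file names are placeholders). The trade-off is that the data gets sorted twice, so it costs more disk IO than a plain "sort -u"; the -S and -T options can be added to both sort calls if needed.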