Topic: linux question

legendary
Activity: 1890
Merit: 1086
Ian Knowles - CIYAM Lead Developer
June 09, 2012, 03:53:19 AM
#17
I can barely keep my eyes open, but I sent your payment anyway. I trust that if I have issues with the script you will support me :)

Payment received (thanks), and sure, if you have any problems with it just PM me.

:)
legendary
Activity: 1148
Merit: 1000
June 09, 2012, 03:49:37 AM
#16
Assuming that these scripts do actually accomplish what you are trying to do, payment to 16grCc2rdtfRvnY2tKStaJDN3xgUHA4gjy would be much appreciated.


Cheers,

Ian.

http://blockchain.info/address/16grCc2rdtfRvnY2tKStaJDN3xgUHA4gjy

I can barely keep my eyes open, but I sent your payment anyway. I trust that if I have issues with the script you will support me :)

I'll message you later on, thanks for your help!
legendary
Activity: 1890
Merit: 1086
Ian Knowles - CIYAM Lead Developer
June 09, 2012, 03:41:59 AM
#15
Assuming that these scripts do actually accomplish what you are trying to do, payment to 16grCc2rdtfRvnY2tKStaJDN3xgUHA4gjy would be much appreciated.


Cheers,

Ian.
legendary
Activity: 1890
Merit: 1086
Ian Knowles - CIYAM Lead Developer
June 09, 2012, 03:22:38 AM
#14
Okay - no problem - this is an updated "copy_new_file" script:

Code:
if [ $# -lt 3 ]; then
 echo "Usage: copy_new_file [destination directory] [hash value] [source file]"
else
 if [ ! -d "$1/hashes" ]; then
  echo "Error: Did not find $1/hashes directory (create it and re-run if $1 is the correct destination)."
 else
  if [ ! -f "$1/hashes/$2" ]; then
   # Record the hash so this content is never copied again.
   echo "$2 $3" > "$1/hashes/$2"
   fname=$(basename "$3")
   if [ -f "$1/$fname" ]; then
    # A different file with the same name is already there:
    # prefix this copy with its hash so both survive.
    cp "$3" "$1/$2.$fname"
   else
    cp "$3" "$1"
   fi
  fi
 fi
fi

Now if it finds that a file with the same name already exists, the destination filename will have the md5 hash prefixed (e.g. y.txt becomes 2f8ff6fabf4b2936197b8a93702461f9.y.txt).
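A hypothetical run to illustrate (the directory names are made up, and the hash is whatever md5sum reports for the second file): if two different files both named y.txt get copied in, the second arrives with its hash as a prefix:

Code:
./copy_files drive1 dest_dir
./copy_files drive2 dest_dir
ls dest_dir
2f8ff6fabf4b2936197b8a93702461f9.y.txt  hashes  y.txt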
legendary
Activity: 1148
Merit: 1000
June 09, 2012, 03:01:13 AM
#13
This does have one problem: if you have two (or more) files that share the same name but have different hashes, the subsequent files will just overwrite the earlier ones. If this is going to be an issue for you then I'll work out a way to handle it, perhaps by prefixing the filename with the hash.


Yeah that could be a problem. If you can just have it rename one of the files, that would be superb.
legendary
Activity: 1890
Merit: 1086
Ian Knowles - CIYAM Lead Developer
June 09, 2012, 02:57:59 AM
#12
Okay - I've done this using two scripts. The first is called 'copy_files' and does the "find" and "md5sum" work:

Code:
if [ $# -lt 2 ]; then
 echo "Usage: copy_files [source directory] [destination directory]"
else
 if [ ! -d "$2/hashes" ]; then
  echo "Error: Did not find $2/hashes directory (create it and re-run if $2 is the correct destination)."
 else
  # Hash every file under the source; note this pipeline assumes filenames without whitespace.
  find "$1" -type f | xargs -n1 md5sum | xargs -n2 ./copy_new_file "$2"
 fi
fi

and the second is called "copy_new_file", which copies the file to the destination unless a file with the same hash has already been copied:

Code:
if [ $# -lt 3 ]; then
 echo "Usage: copy_new_file [destination directory] [hash value] [source file]"
else
 if [ ! -d "$1/hashes" ]; then
  echo "Error: Did not find $1/hashes directory (create it and re-run if $1 is the correct destination)."
 else
  if [ ! -f "$1/hashes/$2" ]; then
   # Record the hash so this content is never copied again.
   echo "$2 $3" > "$1/hashes/$2"
   cp "$3" "$1"
  fi
 fi
fi

To use them, first make sure you have execute permissions on both scripts:
Code:
chmod a+x copy_files copy_new_file

Now it is as simple as:
Code:
./copy_files source_dir dest_dir

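Note that both scripts expect the hashes sub-directory to already exist inside the destination, so (using the same example names) you would create it before the first run:

Code:
mkdir -p dest_dir/hashes
./copy_files source_dir dest_dir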
This does have one problem: if you have two (or more) files that share the same name but have different hashes, the subsequent files will just overwrite the earlier ones. If this is going to be an issue for you then I'll work out a way to handle it, perhaps by prefixing the filename with the hash.
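To make the limitation concrete (the paths here are hypothetical): if drive1 and drive2 each hold a different file named notes.txt, the second run silently replaces the first copy:

Code:
./copy_files drive1 dest_dir   # copies notes.txt
./copy_files drive2 dest_dir   # a different notes.txt overwrites it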
legendary
Activity: 1148
Merit: 1000
June 09, 2012, 02:17:08 AM
#11
This could be fairly simply done just as a bash script (assuming you can run bash scripts on a Mac) and I'd be willing to write it for you for 2 BTC. However, there are a couple of things I would need to know first:

1) Are you happy for MD5 (or SHA1) to be the deciding factor that the files are identical?

2) Can the destination files simply go to one directory, or if not, how should the destination directory be determined?


I would say that md5 is sufficient. It's going to be 98% text files and 2% other text-based files (.sql, .sqlite, etc.).

And the destination can be one directory, yes.
legendary
Activity: 1890
Merit: 1086
Ian Knowles - CIYAM Lead Developer
June 09, 2012, 02:12:06 AM
#10
This could be fairly simply done just as a bash script (assuming you can run bash scripts on a Mac) and I'd be willing to write it for you for 2 BTC. However, there are a couple of things I would need to know first:

1) Are you happy for MD5 (or SHA1) to be the deciding factor that the files are identical?

2) Can the destination files simply go to one directory, or if not, how should the destination directory be determined?
legendary
Activity: 1904
Merit: 1002
June 09, 2012, 02:08:39 AM
#9
If you can program, just get a list of all the files, hash each one, and check it against your hash list.  If the hash is already in the list, move on; otherwise add the hash and copy the file to the destination folder.  If you can't program, hire a programmer.

I'd be willing to send someone a few BTC to write a script for me. It will have to work using Mac binaries.

I would think something like Python would run fine on a Mac, but there are likely programmers around who could do it without you having to install an interpreter.  If you post your requirements here you should get a decent response: https://bitcointalk.org/index.php?board=52.0.  If not, I can do it for you in Python, but it will take me a couple of days since I'm busy with a lot of other stuff.
legendary
Activity: 1148
Merit: 1000
June 09, 2012, 02:02:49 AM
#8
If you can program, just get a list of all the files, hash each one, and check it against your hash list.  If the hash is already in the list, move on; otherwise add the hash and copy the file to the destination folder.  If you can't program, hire a programmer.

I'd be willing to send someone a few BTC to write a script for me. It will have to work using Mac binaries.
legendary
Activity: 1904
Merit: 1002
June 09, 2012, 01:53:50 AM
#7
If you can program, just get a list of all the files, hash each one, and check it against your hash list.  If the hash is already in the list, move on; otherwise add the hash and copy the file to the destination folder.  If you can't program, hire a programmer.
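A minimal sketch of that recipe in shell (the paths are hypothetical, and it assumes filenames without whitespace):

Code:
seen=/tmp/hashes.txt            # running list of hashes already copied
touch "$seen"
for f in $(find /mnt/source -type f); do
 h=$(md5sum "$f" | awk '{ print $1 }')
 if ! grep -q "$h" "$seen"; then
  echo "$h" >> "$seen"          # remember the hash...
  cp "$f" /mnt/dest/            # ...and copy the file just this once
 fi
done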
legendary
Activity: 1890
Merit: 1086
Ian Knowles - CIYAM Lead Developer
June 09, 2012, 01:50:10 AM
#6
I see - a little tricky but you might find the following useful for this job:

Code:
find . -type f | xargs -n1 md5sum | sort > x

Change . to, for example, /mnt/raid (do this for each of your drives, changing the output name x to something different each time).

If you then check the contents of each 'x' file you should see something like the following:

Code:
cat x
0f4fa2bf42a91febbde52b7a32495f94  ./sub1/usage.bat
1bd3cb2ef387818ebe0fc318c232e27d  ./test.txt
2f8ff6fabf4b2936197b8a93702461f9  ./y.txt
8594592b8f830139b266c2d167a6fc5c  ./test.bun.gz
8c917a3450a4969d7e32e8da71e176ab  ./sub2/menu.bat
d41d8cd98f00b204e9800998ecf8427e  ./x
f258dcd6600d3ebf238662f8445b5e4a  ./sub1/sub1.1/hello.txt

If you then diff the various 'x' files you should be able to spot any identical md5 hashes (a matching hash doesn't strictly guarantee the files are identical, but in practice it almost always means they are).
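For example, assuming one list per drive named x1 through x4 (hypothetical names), any hash that appears more than once across them can be listed with:

Code:
awk '{ print $1 }' x1 x2 x3 x4 | sort | uniq -d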
legendary
Activity: 1148
Merit: 1000
June 09, 2012, 01:26:31 AM
#5
I have 4 external drives and an 8 TB NAS. I'm trying to take all of the data from the 4 externals, compare it, and create one master folder of all unique files.

Any suggestions on how to accomplish this easily? Some files may have the same filename, but different content.
legendary
Activity: 1890
Merit: 1086
Ian Knowles - CIYAM Lead Developer
June 09, 2012, 12:59:56 AM
#4
Pretty much that simple - if your string has characters that are normally regexp ones then you'll need to escape those (or perhaps there is an option to indicate you are not using a regexp).

You may also need some other options if binary files need to be included (not sure but I don't think they are normally included).
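A minimal sketch of both points (-F is standard, and -I is supported by both GNU and BSD grep; the pattern and path are just examples):

Code:
# -F treats the pattern as a fixed string, so no regexp escaping is needed;
# -I skips binary files (use -a instead to search them as plain text).
grep -rFI "some*literal*string" /mnt/raid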
legendary
Activity: 1148
Merit: 1000
June 09, 2012, 12:27:33 AM
#3
grep -r "string" /mnt/raid/*


damn, it's really that simple?
legendary
Activity: 1890
Merit: 1086
Ian Knowles - CIYAM Lead Developer
June 09, 2012, 12:18:03 AM
#2
grep -r "string" /mnt/raid/*
legendary
Activity: 1148
Merit: 1000
June 09, 2012, 12:14:20 AM
#1
How can I search for a text string (*string*) inside every file on /mnt/raid?