[BBLISA] Live Sync / Backup / Sync without crawling
John Orthoefer
jco at direwolf.com
Mon Nov 3 21:34:33 EST 2008
On Nov 3, 2008, at 5:36 PM, Tom Metro wrote:
> John Orthoefer wrote:
>> Tony Rudie wrote:
>>> Rsync should be fine. And searching for a specific entry in a
>>> directory should be way faster than looking at every entry to see
>>> if it needs copying. Right?
>> If your filesystem stores directory entries in something other than
>> an unsorted list, yes it should be faster.
>> ...
>> But if your filesystem still keeps your directories in something
>> unhashed, then you might as well just let rsync do its job; you are
>> only saving a stat call at that point...
>
> I'm not sure I see the relevance of hashed directories with respect
> to the OP's question.
>
The relevance is in how rsync handles a list of files (we don't know
whether the OP plans to sync each file as inotify fires, or to sync
the files every n minutes from a list generated by inotify). As far
as I know, rsync takes the first file in the list and grabs that
file, which means it needs to scan each directory down the path until
it hits the file, then rinse and repeat for each file. If some of
those directories are huge and the scan is linear, it can take a long
time to scan each directory. But if rsync does its own traversal, it
handles each file as it encounters it, so it doesn't end up
rescanning the directories for each file.
Really, I can't tell if it's relevant to the OP's question, because
he didn't give any of those details. Furthermore, I was responding
not to the OP's question but to Tony's statement that searching for a
file is faster than checking each file. I was trying to point out
that you need to profile what is going on, and know where the
bottleneck is and what problem you are actually trying to solve.
(Not the problem you want to solve; that is called research. And no,
they aren't always the same thing.)
> If you let rsync operate in its usual fashion, then it needs to scan
> the directory hierarchy, and look at the file system metadata for
> each file, comparing it with the remote file, and if a difference is
> found, perform a more detailed block comparison.
It depends on what other flags you give rsync; you can tell it to
skip the block comparison and copy changed files whole based on
metadata alone with --whole-file (as I recall.)
> The OP was seeking to replace that scan with an event driven model
> using inotify or an equivalent service hooked into the OS's kernel
> that would fire events when a change occurred in the area of interest.
>
>
>> When I first saw this message, my answer was use rsync with
>> --files-from...
>
> So where does the list of files come from that you put into the file
> pointed at by --files-from?
>
> Sure, you can use something like:
>
> inotifywait -q -r -m ... /path | perl -pe ... | rsync --files-
> from=- ...
>
> but it requires more than just rsync.
>
Yes, I didn't provide a whole script; some glue is required. The
right design depends on how fast you expect files to change, how fast
you can sync them, and how many change per unit time. (Maybe the
right answer for his problem is something even more complex: inotify
feeding a program that forks off 20 or so rsyncs and queues files to
them to be synced.) So it requires more knowledge about what is going
on to find the best solution. With that said, based on what he asked,
I would look at using rsync --files-from=. Also, you'll have to do
something like a full rsync to resync after a restart, because you
won't be able to guarantee that nothing changed while your notify
watch wasn't in place.
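For the record, the "sync a batch every n minutes from an
inotify-generated list" variant could be glued together roughly like
this. This is just a minimal sketch, not from the thread: it assumes
something like inotifywait hands you paths relative to the source
root, and src_root/dest are made-up placeholder names.

```python
import subprocess

def build_rsync_cmd(src_root, dest):
    # --files-from=- tells rsync to read the list of files to copy
    # from stdin (relative to src_root) instead of scanning the tree.
    return ["rsync", "-a", "--files-from=-", src_root, dest]

def batch_to_stdin(paths):
    # De-duplicate and sort, so a file that fired several inotify
    # events in one interval is only transferred once.
    return "\n".join(sorted(set(paths))) + "\n"

def sync_batch(paths, src_root, dest):
    # One rsync invocation per batch of changed files.
    if not paths:
        return
    subprocess.run(build_rsync_cmd(src_root, dest),
                   input=batch_to_stdin(paths), text=True, check=True)
```

The batching (rather than one rsync per event) is the point: it
amortizes connection setup and lets rsync walk each directory once
per batch.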
Then again, you might not be able to do what he needs with inotify,
and you might have to bring out the really heavy guns, like a pair of
NetApps with SnapSync/SnapMirror (I think those are the products that
keep two NetApps in sync.)
Johno