[Novalug] bash & grep question - best for optimizing?
James Ewing Cottrell 3rd
JECottrell3@Comcast.NET
Tue Nov 14 17:42:30 EST 2006
One could argue, and I will, that hacking -r into grep is an
abomination. The tool for recursing is FIND. Hacking every tool's
features into every other tool violates one of the ancient UNIX
Fundamentals: "A Tool does one thing and does it well". Hopefully,
-r ignores symlinks, but what about -xdev, name selection, or
restricting to regular files only?
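For the record, here is a sketch of the find-based traversal that -r
can't easily express (the directory, file names, and pattern are made
up for illustration):

```shell
# Throwaway test tree (illustrative paths, not from the thread).
mkdir -p /tmp/demo/sub
printf 'hello pat\n' > /tmp/demo/sub/a.txt
printf 'no match\n'  > /tmp/demo/sub/b.log

# -xdev:  don't cross filesystem boundaries
# -type f: regular files only (also skips symlinks)
# -name:  select by file name, which grep -r can't do by itself
find /tmp/demo -xdev -type f -name '*.txt' \
    | xargs grep -l pat
```

Each of those switches would have to be hacked into grep separately;
find already has all of them.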
Solutions in this thread that lack -l are incorrect. And -H is wrong too.
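To see the difference, compare the two on a couple of throwaway files
(names are made up for illustration):

```shell
# Two scratch files; /tmp/f1 matches twice, /tmp/f2 once.
printf 'pat here\npat again\n' > /tmp/f1
printf 'pat once\n'            > /tmp/f2

# -l: one line per matching FILE -- exactly what a file list needs.
grep -l pat /tmp/f1 /tmp/f2

# -H: one filename:line pair per MATCH -- /tmp/f1 appears twice,
# and the name is glued to the matched text, so it's useless as a list.
grep -H pat /tmp/f1 /tmp/f2
```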
Finally, now that you have this list, what are you going to do with it?
The temptation is to do something like
for file in $(find . -type f | xargs grep -l pat)
do process $file; done
but a better way is to do something like:
find . -type f | xargs grep -l pat |
while read file
do process "$file"; done
This allows all commands to run in parallel and won't die no matter how
long the argument list is. If the command is trivial (rm or a shell
script) then the following is possible
find . -type f |
xargs grep -l pat |
xargs ./process
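If the file names might contain spaces, the same pipeline can be made
safe with NUL delimiters end to end; this assumes GNU find, xargs, and
grep (for -print0, -0, and -Z), and echo stands in for ./process:

```shell
# Scratch file with a space in its name (illustrative path).
mkdir -p /tmp/demo2
printf 'pat\n' > '/tmp/demo2/with space.txt'

# -print0 / -Z / -0 pass NUL-terminated names at every stage,
# so whitespace in file names can't split an argument.
find /tmp/demo2 -type f -print0 \
    | xargs -0 grep -lZ pat \
    | xargs -0 -n1 echo processing
```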
JIM
Ross Patterson wrote:
> At 09:08 11/13/2006, Nick Danger wrote:
>
>> 2. Should I do one grep for each pattern, or a single grep with multiple
>> matches?
>
>
> Assuming a modern grep,
>
> grep -rl -E "(pattern1)|(pattern2)|(pattern3)|(pattern4)" .
>
> is likely to be the highest performance and most reliable. In
> particular, assuming your hash-structured directory is kind of broad,
> grepping recursively on "." instead of on "*" will prevent the shell
> from trying to expand "*" into a list of subdirectory names. There is
> a limit on the size of a command string, and this will avoid it and
> also cut down on the storage size to launch the command.
>
> Ross
> _______________________________________________
> Novalug mailing list
> Novalug@calypso.tux.org
> http://calypso.tux.org/cgi-bin/mailman/listinfo/novalug
>
>