[Novalug] Unix script help request

James Ewing Cottrell 3rd JECottrell3@Comcast.NET
Fri Feb 27 18:43:32 EST 2009


Well, it just goes to show that No Good Deed Goes Unpunished.

And I did acknowledge that any working program is good. My purpose was 
not to attack you personally. Perhaps I should have omitted your name.

But remember, your post should not only solve his problem, but should 
demonstrate good coding practice to the Newbies here among us.

OK, you merely answered his question, you didn't write an entire script. 
I was responding to the script as a whole. Sorry for attributing the 
entire thing to you.

Yes, you are correct about grep -q. But -q is a relatively new feature, 
and people used to write 'if grep -s pattern file >/dev/null', thus 
reading the whole file. Indeed, Sun's manpage still doesn't mention -q.
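
For a system whose grep lacks -q, the older form can be wrapped up as in the rough sketch below. The helper name has_match and the scratch file are made up for illustration; only the redirect idiom comes from the discussion above.

```shell
#!/bin/sh
# has_match is a hypothetical helper for a grep without -q: the exit
# status is the same, the matching lines are simply thrown away.
has_match() {
    grep "$1" "$2" >/dev/null 2>&1
}

# Throwaway sample file so the sketch can be run as-is.
sample=$(mktemp)
printf 'alpha\nbeta\n' > "$sample"

if has_match beta "$sample"
then echo "beta found"
else echo "beta not found"
fi

rm -f "$sample"
```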

Slow in and of itself isn't wrong, but you don't know the size and scope 
of the problem, altho in this case, it is hinted at strongly. Perhaps 
the guy was running this command in a news group spool, where there 
could be thousands of files...no longer "a few greps".

Of course you get a pass on my pet peeve; I merely intended to show how 
else it can be done, and why I think it is better. But if you had 
realized (or thought to mention it) that the proper idiom is "if grep 
..." rather than throwing an ugly test for exit status, WHICH IS WHAT IF 
TESTS ANYWAY, that issue would never have come up.
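
A minimal sketch of that idiom, for illustration only. The helper name check_file and the scratch files are invented here; the pattern is the one from Kevin's post.

```shell
#!/bin/sh
# check_file is a hypothetical helper: "if" tests grep's exit status
# directly, so there is no separate test of $?.
check_file() {
    if grep -q '<TRANS_CDE>RSCF' "$1"
    then echo "string was found"
    else echo "string was not found"
    fi
}

# Throwaway sample data so the sketch can be run as-is.
tmpdir=$(mktemp -d)
printf '<TRANS_CDE>RSCF</TRANS_CDE>\n' > "$tmpdir/hit.xml"
printf '<TRANS_CDE>OTHER</TRANS_CDE>\n' > "$tmpdir/miss.xml"

check_file "$tmpdir/hit.xml"
check_file "$tmpdir/miss.xml"

rm -rf "$tmpdir"
```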

And BTW, FORTRAN uses .EQ., altho I doubt many people here are old 
enough to remember that. And FORTRAN is no example of good taste.

The parallel example is rather interesting tho. But probably too hard to 
do in a shell script without overloading the system.

I should also mention that the ls is unnecessary; set -- EBPV*.out.xml 
will set the script arguments ($1, $2, $3, etc) to the list of files.
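
Roughly like this, as a sketch: the scratch directory and the file names are made up, and the check for an unmatched pattern is an addition of mine that assumes default shell settings (no nullglob or similar).

```shell
#!/bin/sh
# Work in a scratch directory with invented file names.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch EBPV001.out.xml EBPV002.out.xml unrelated.txt

# The shell expands the glob itself; no ls is needed.
set -- EBPV*.out.xml

# With default settings an unmatched pattern is left as-is, so check
# that the first argument really names an existing file.
if [ ! -e "$1" ]; then
    echo "no matching files"
else
    echo "$# files"
    for f in "$@"; do
        echo "$f"
    done
fi

# Clean up; the positional parameters stay set after this.
cd / && rm -rf "$tmpdir"
```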

But still, the script contained an inferior idiom: grep -l is CLEARLY 
what is called for, eliminating the loop.
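
As a sketch of what I mean, with one grep over the whole set and the only loop left being the renames. The scratch directory and sample files are made up; the pattern and the S prefix are from Kevin's post. Quoting and "read -r" are there so odd file names survive, though names containing newlines would still break it.

```shell
#!/bin/sh
# Scratch directory with invented sample files.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '<TRANS_CDE>RSCF</TRANS_CDE>\n' > EBPV001.out.xml
printf '<TRANS_CDE>OTHER</TRANS_CDE>\n' > EBPV002.out.xml

# grep -l prints only the names of the files that contain the pattern,
# using a single grep invocation for all of them.
grep -l '<TRANS_CDE>RSCF' EBPV*.out.xml |
while IFS= read -r name
do
    mv "$name" "S$name"
done

# Show the result; the scratch directory is left behind for inspection.
ls
```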

I agree that People Time is more expensive than Computer Time, but where 
we can have both, why not? My idioms are cleaner in both cases.

JIM

Garrett Nievin wrote:
> Firstly, I'm not sure how you think that I wrote a script.  I used a
> 3-line code fragment to show Kevin how to use the exit code of a quiet
> grep to detect the existence of a string in a file in a script he was
> working on last week.  I was helping a guy with a question about his
> script, running in an environment he knows much better than I, and
> offered a quick bit of help as I was heading out the door.  If he wants
> more, I trust him to ask for it.
> 
> Kevin was working to get a small and simple job done.  In that
> context, I'd argue that there's nothing at all wrong with what I
> offered.  In fact, I'd argue that you don't understand what the word
> wrong means.  Those three lines work completely correctly.  Every
> mistake you pointed out is either a potential alleged optimization or
> your "pet peeve" ( -eq works exactly the same as it does in FORTRAN,
> BTW), and not all are even correct.  The "-q" option on grep stops
> reading at the first match the same as "-l" does.  Yes, it does. Read
> the code or use "strace" or "time" to convince yourself.
> 
> There's a ton more we could do to optimize Kevin's script, IF there's a
> reason to do so.  Why didn't you do anything at all to parallelize and
> take advantage of modern architectures?  Saving a few greps is nothing
> in comparison to that.  If it's going to be running on a 128-processor
> system with tons of I/O, and there are hundreds of thousands of files an
> hour, that would be really valuable.  In comparison, saving greps would
> be barely noticeable.
> 
> While we're at it, let's write the whole thing in C instead of shell
> and save tons of overhead.  And use an inotify event to do the job
> immediately when the XML files are closed and avoid the latency of a
> shell script being run by cron.  We could keep enhancing this in a
> zillion ways.
> 
> Or, just possibly, in the case of a few files, it's not worth spending
> even the time you spent writing the email.  Much better to have
> basic, simple code that can be understood and modified by a larger body
> of people with varying levels of experience.  People time is expensive;
> computer time is cheap.  Correct and slow is usually just as good as
> correct and less slow.  Slower != wrong.
> 
> Kevin specifically asked for a way to use "grep" and a conditional
> statement. That's precisely what I gave him.  I also gave it to him when
> he asked for it - a week ago.  And it worked correctly.  Job done to
> requirements and on time.
> 
> This is why I rarely raise my head in this forum - too often I get it
> bitten off because everybody knows something I don't and some are
> itching to prove it.
> 
> 
> 
> On Fri, 27 Feb 2009 13:04:38 -0500
> James Ewing Cottrell 3rd <JECottrell3@Comcast.NET> wrote:
> 
>> Well, it has been said that any program that works is a good one, but it 
>> can be done more eloquently:
>>
>> grep -l pattern files |
>> while read name
>> do
>> 	mv "$name" "S$name"
>> done
>>
>> Or, if you are into using xargs:
>>
>> grep -l pattern files | xargs -I @ mv @ S@
>>
>> I hate to be a wet blanket, but there are several things wrong with 
>> Garrett's script:
>>
>> [1] grep is executed once per file rather than once only.
>> [2] grep -l quits reading a file once the pattern is found. His version 
>> unnecessarily reads the rest of the file after the first match.
>> [3] anytime you test $? you missed an opportunity to use if:
>>
>> 	if grep ....
>> 	then it_matched
>> 	else it_did_not
>> 	fi
>> [4] this is more of a pet peeve of mine, but I would rather use a case 
>> statement to do string matching:
>>
>> 	case $? in
>> 	(0) success;;
>> 	(*) failure;;
>> 	esac
>>
>> The reason is that the == (or is it = ?) and -eq are the *opposite* from 
>> what Perl uses, and in my mind, Perl's version makes the most sense.
>>
>> One way to debug this is to place an echo in front of the mv to see what 
>> kind of commands will be generated. Then, when you are done, either take 
>> the echo out, or pipe it to sh.
>>
>> And yes, you can pipe both into and out of a while statement:
>>
>> grep -l pattern files | while read x; do echo mv $x S$x; done | sh -x
>>
>> grep -l pattern files | xargs -I @ echo mv @ S@ | sh -x
>>
>> JIM
>>
>> Garrett Nievin wrote:
>>> One possible solution:
>>>
>>> If you do a grep with the "-q" option (quiet), it will just set the
>>> exit code to 0 if the text was found and 1 if it was not.  You can
>>> check the exit code using the shell variable ?.  The question mark
>>> character is the name of the variable.
>>>
>>> Here's some nice structured mainframe-y code:
>>>
>>> # see if $loopFile contains the magic string, and do something accordingly
>>> grep -q '<TRANS_CDE>RSCF' "$loopFile"
>>> if [ $? -eq 0 ] ; then
>>> 	echo "string was found"
>>> else
>>> 	echo "string was not found"
>>> fi
>>>
>>>
>>>
>>> On Sat, 21 Feb 2009 08:32:38 -0500
>>> Kevin Starkey <kevin.linuxfan@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> My Linux experience (meager as it is) is finally starting to be 
>>>> applicable at my job (mainframe programmer). We just started using a 
>>>> Unix file server on the mainframe and occasionally we run into an issue 
>>>> to solve and I volunteered to tackle this one. The issue is, I might 
>>>> have 0, 1, or > 1 files in a directory with the same name pattern of 
>>>> "EBPV************.out.xml" and one of these files might be the one I'm 
>>>> looking for, or it might not. I need to identify a particular string 
>>>> inside of the file, and if it matches, then I want to add an 'S' to the 
>>>> start of the name (SEBPV....).
>>>>
>>>> Below is the start of a script that I hope is close to what I need, but 
>>>> I need to add "grep" and a conditional statement where I only do the 
>>>> renaming if the grep finds what I'm looking for, but I'm not sure how to 
>>>> code that part (this is my first script).
>>>>
>>>> ---------------------------------------------------------------------------------------------------
>>>> #!/bin/bash
>>>>
>>>> # Change to directory
>>>> cd /proj/preed/output
>>>>
>>>> #Get all needed files in current directory
>>>> originalFiles=$(ls EBPV*.out.xml)
>>>>
>>>> # Loop through all files and do your changes
>>>> for loopFile in $originalFiles
>>>> do
>>>> # Create your new filename including the extension
>>>> mv $loopFile S$loopFile
>>>> done
>>>> ---------------------------------------------------------------------------------------------------
>>>>
>>>> I need to somehow add:
>>>>
>>>> grep "<TRANS_CDE>RSCF" $loopFile
>>>>
>>>> and only if it's found then I want to do the "mv ...." (renaming).
>>>>
>>>> Any help would be greatly appreciated.
>>>>
>>>> Thanks,
>>>> Kevin.
>>>>
>>>> _______________________________________________
>>>> Novalug mailing list
>>>> Novalug@calypso.tux.org
>>>> http://calypso.tux.org/cgi-bin/mailman/listinfo/novalug
>>>>