[Novalug] tidbit: using recoll to search file contents

Doug Toppin dougtoppin@gmail.com
Sat Aug 15 18:17:59 EDT 2009


I thought that I would pass along info about an extremely useful tool
that I ran across a few hours ago.  I'm doing a lot of school work
this weekend and have to frequently refer to a library of more than 90
files of various formats (almost all MS files such as doc, xls, ...).
To find pieces of information in them I had been grep'ing, but for the
binary and xml formats I was getting a lot of hits that I wasn't
interested in.  I looked around and ran across a Linux tool called
'recoll', which is described as a "personal full text search tool for
Unix/Linux" at http://www.lesbonscomptes.com/recoll/.  It is a
format-aware search tool that does exactly what I need.  I used beagle
in the past and had various CPU issues with it, and I didn't want to
spend any time messing with that approach right now.  I installed
recoll on my Kubuntu 9.04 system (sudo apt-get install recoll) and it
works perfectly.  It
starts indexing on the first run (and gives you a chance to change the
default configuration before it starts).
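
To make the "format aware" point concrete, here is a minimal Python
sketch of the general idea: pull the readable text out of a document
first, then search that text instead of the raw bytes.  This is only an
illustration, not how recoll itself is implemented.  It handles just
the zipped-XML .docx format using the standard library; the older
binary doc/xls files need external converters such as catdoc.  The
function names and the sample document are made up for the example.

```python
import io
import zipfile
from xml.etree import ElementTree

# XML namespace used by WordprocessingML (.docx) body elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(data: bytes) -> str:
    """Return the visible text of a .docx file given its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        xml = archive.read("word/document.xml")
    root = ElementTree.fromstring(xml)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def matching_lines(text: str, term: str) -> list:
    """Case-insensitive, grep-like search over already extracted text."""
    term = term.lower()
    return [line for line in text.splitlines() if term in line.lower()]


def make_sample_docx() -> bytes:
    """Build a tiny, hypothetical .docx in memory for demonstration."""
    body = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body>'
        "<w:p><w:r><w:t>Quarterly budget </w:t></w:r>"
        "<w:r><w:t>summary</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Nothing else here</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", body)
    return buf.getvalue()


if __name__ == "__main__":
    text = docx_text(make_sample_docx())
    for line in matching_lines(text, "budget"):
        print(line)
```

Searching the extracted text returns only lines a reader would actually
see in the document, which is the difference from grep'ing the file
directly.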

Very valuable things that it does:
* works out of the box
* indexes quickly if you limit the dirs being indexed
* shows a plain-text preview of each matching file in the query box

A couple of notes on using it:
* the default configuration on the first run will index everything
from your home dir down; you might want to reduce that to the dirs
that you really care about (mine was taking forever)
* it uses a number of external tools (if they are available on your
system) to index the files, and you may need to install some of them
to get the full value (I had to install the catdoc package to get xls
support); the documentation section called "7.2. Supporting packages"
will tell you which packages it will use if available
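
For anyone who wants to limit the indexed dirs by hand rather than
through the first-run dialog, the setting involved is "topdirs" in
recoll's configuration file.  The fragment below is a sketch: it
assumes the usual per-user location of ~/.recoll/recoll.conf, and the
directory names are placeholders for your own.

```text
# ~/.recoll/recoll.conf  (assumed per-user location)
# Index only these directories instead of the whole home dir.
# The paths below are placeholders.
topdirs = ~/school/library ~/school/notes
```

After changing it, the index has to be rebuilt; running the recollindex
command should do that, but check the documentation for your version.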

I suspect that there are other tools available, but I needed something
working quickly and this fit the bill perfectly.  If anyone has any
related tidbits to pass along please do.

Doug


