[nylug-talk] Desktop search
Zachary Stern
rollerskatejamms at gmail.com
Fri Aug 3 12:37:12 EDT 2007
Sounds great! Get to work!
:-P
Zach Stern
347.531.8810
wordrockets.com/blog
-----Original Message-----
From: "Peter C. Norton" <spacey-nylug at lenin.net>
Date: Fri, 3 Aug 2007 07:46:05
To: nylug-talk at nylug.org
Subject: [nylug-talk] Desktop search
Desktop+network searching is becoming more and more interesting to
me, and so far I don't see a good answer.
For this purpose, I haven't liked what I've seen from heavy use of a
site whose search is driven by lucene. The search syntax is OK, but it
has a lot of limitations that make it less useful (e.g. not being able
to index in a way that allows substring searches for IP addresses in
documents), though I'm not 100% sure whether this is a problem with
lucene or with the implementation I've ended up using.
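One common way to get substring matches on things like IP addresses is to index overlapping character n-grams rather than whole tokens. The sketch below is plain Python and is not tied to lucene or to any real search engine; the names `ngrams` and `NgramIndex` are made up for illustration, and it only shows the idea under those assumptions.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of all overlapping character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramIndex:
    """Toy inverted index from character n-grams to document ids (hypothetical)."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> set of doc ids
        self.docs = {}                    # doc id -> original text

    def add(self, doc_id, text):
        """Store the text and register every n-gram it contains."""
        self.docs[doc_id] = text
        for gram in ngrams(text, self.n):
            self.postings[gram].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing query as a substring."""
        if len(query) < self.n:
            # Too short to produce an n-gram: fall back to a linear scan.
            return sorted(d for d, t in self.docs.items() if query in t)
        # Candidates must contain every n-gram of the query...
        candidates = set.intersection(
            *(self.postings.get(g, set()) for g in ngrams(query, self.n))
        )
        # ...and are then verified, since n-grams may occur non-contiguously.
        return sorted(d for d in candidates if query in self.docs[d])
```

For example, after adding a document containing "192.168.10.1", a query for "168.10" finds it even though no whitespace-delimited token starts with that string. The cost is a larger index, which is the usual trade-off with this approach.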
I like a lot of the features of xapian, but I'm sort of turned off by
the idea that the indexer is a part of the search engine, rather than
an API, since that seems to place limitations on what a desktop can
feed to the engine. That said, it's not a dealbreaker. I guess I
should look into it some more.
What are the technical requirements for a desktop search? The things I
can think of:
* Opportunistic: by this I mean that it's running in the background,
not trying to contend with the user. However, it should also be tied
into desktop apps. When you read an email, the reader should feed
the search engine instead of the search engine having to pick apart
your personal mail store.
* Good tracking of already-searched documents. The desktop and the
OS should provide a good source of GUIDs (to identify the
document) and something like a checksum (to know when the document
has changed), as well as change tracking (if there is a change
history, as for source code or a document with tracking on),
so that the search engine can point users back to the right
application and revision of a document.
* Doesn't search files; searches documents and contexts. The search
engine should know the difference between an executable
file, a read-only document, a database file, etc. and return
results in the right context. E.g. it'd be nice to have a better
search than "apropos" and "man" for finding a program to do xyz,
e.g. by invoking the search engine on all the strings in all the
binaries and cross-referencing that with their man pages and
possibly their prior executions (or better yet the equivalent of an
audit log of past invocations) to return the program you're looking
for, how it relates to what you're looking for, and how it was
invoked in the past (long after your shell history has lost it).
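The GUID-plus-checksum idea from the second requirement can be sketched in a few lines. This assumes the caller (a mail reader, an editor, etc.) supplies some stable document id; the `DocumentTracker` class and the "mail:1"-style ids are hypothetical, and SHA-256 is just one reasonable choice of checksum.

```python
import hashlib


class DocumentTracker:
    """Remember a checksum per document id so unchanged documents are skipped."""

    def __init__(self):
        self.seen = {}  # doc id (stand-in for a GUID) -> hex digest

    @staticmethod
    def checksum(content: bytes) -> str:
        """Return the SHA-256 hex digest of the document bytes."""
        return hashlib.sha256(content).hexdigest()

    def update(self, doc_id: str, content: bytes) -> str:
        """Record the content and report 'new', 'changed', or 'unchanged'."""
        digest = self.checksum(content)
        previous = self.seen.get(doc_id)
        self.seen[doc_id] = digest
        if previous is None:
            return "new"
        return "unchanged" if previous == digest else "changed"
```

An application would call `update("mail:1", body)` each time it hands a document to the indexer and re-index only when the result is "new" or "changed". Revision history is not modeled here; that would need the application to pass along its own revision ids.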
What else can you think of?
-Peter
--
The 5 year plan:
In five years we'll make up another plan.
Or just re-use this one.