[nylug-workshop] Log analyzer (Re: Regular meetings of the Python workshop)
Yusuke Shinyama
yusuke at cs.nyu.edu
Sun Feb 25 12:13:13 EST 2007
Peter, thanks for the bug report and the patches! I'll take a look at
them later. Unfortunately, I have to have minor surgery
tomorrow, so I probably can't go to the talk on Tuesday. Hopefully
I'll recover soon and we can meet at the next Python meeting.
Yusuke
On Sun, 25 Feb 2007 01:43:38 -0800, "Peter C. Norton" <spacey-nylug-workshop at lenin.net> wrote:
> On Wed, Feb 14, 2007 at 01:48:54PM -0500, Yusuke Shinyama wrote:
> > Okay, I just put this up. See
> > http://www.unixuser.org/~euske/python/logweeder/
> >
> > I know the documentation is still crappy.
> > Please feel free to point out any weird English errors to me! ;)
>
> I had a fun time looking at this. Most of the techniques for analyzing
> the differences are new to me, but I think I've got a feel for how it
> works.
>
> Attached I've got a patch that changes a bit of how this works:
>
> bugs:
> 1) I think the third element of each output pattern line can run into
> a situation where the string to be eval()'d by match.py can get
> through with unescaped single quotes, or at least this happened to me
> while I was fooling around with it, and I don't see the original code
> protecting against that. Added a call to repr() to fix that.
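A minimal sketch of the quoting problem described above, not the actual
logweeder code. The function names and the sample regex are hypothetical,
and ast.literal_eval() stands in for the eval() call that match.py is
said to use. It shows why formatting the third element with repr() lets
the line be read back even when the regex contains a single quote.

```python
import ast

def format_pattern_line(name, count, regex):
    """Format one pattern entry as a Python tuple literal.

    %r applies repr(), which picks a quoting style and escapes as needed.
    (Hypothetical helper; the real output code may look different.)
    """
    return "(%r, %d, %r)" % (name, count, regex)

def parse_pattern_line(line):
    """Read a pattern line back; literal_eval is a stand-in for eval()."""
    return ast.literal_eval(line)

# A made-up regex containing a single quote, the case that breaks.
regex = "^brick gconfd.*can't.*"

# Naive formatting wraps the regex in single quotes without escaping.
naive = "('%s', %d, '%s')" % ("type-0", 3, regex)
safe = format_pattern_line("type-0", 3, regex)

try:
    parse_pattern_line(naive)
    naive_ok = True
except (SyntaxError, ValueError):
    naive_ok = False

round_trip = parse_pattern_line(safe)
```

With the naive format the embedded quote ends the string early and the
line no longer parses; with repr() the tuple round-trips unchanged.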
>
>
> changes for consideration:
> 1) Added a configuration file. This is mostly for fun, but I think
> it's a useful way to experiment.
> 2) Created a default (configurable) pattern file that makes patterns
> persistent and automatically handled within the program. The names of
> the patterns are maintained, and new patterns can be added to old ones.
> 3) When learning a new logfile, existing patterns are used to filter
> out unwanted elements. I think that this creates a substantial
> improvement in speed (the following is for a syslog messages file of
> about 1000 lines, very repetitive):
>
> spacey at brick:~/src/logweeder-0.1$ time ./learn.py /var/log/messages > patterns
> [...]
> real 2m47.487s
> user 2m30.941s
> sys 0m1.400s
> spacey at brick:~/src/logweeder-0.1$ grep 'type-' patterns | wc -l
> 160
>
> spacey at brick:~/src/logweeder-0.1-pn$ rm patterns
> spacey at brick:~/src/logweeder-0.1-pn$ time for i in $(seq 1 10) ; do \
> ./learn.py -s 100 /var/log/messages; done
> real 0m12.247s
> user 0m10.597s
> sys 0m0.264s
> spacey at brick:~/src/logweeder-0.1-pn$ wc -l patterns
> 35 patterns
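The filtering step from change 3 can be sketched roughly as follows.
This is an illustration under assumptions, not the patch itself: the
function name and the sample pattern are made up, and the lines are
assumed to have had their timestamps stripped already, as the "^brick"
patterns later in this message suggest.

```python
import re

def filter_unmatched(lines, patterns):
    """Return only the lines that no known pattern matches.

    Lines already covered by an existing pattern are dropped, so the
    learning step sees fewer lines. (Hypothetical helper.)
    """
    compiled = [re.compile(p) for p in patterns]
    return [ln for ln in lines
            if not any(c.search(ln) for c in compiled)]

# Made-up pattern in the spirit of the ones shown in this message.
known = [r"^brick kernel: \["]

lines = [
    "brick kernel: [ 1189.708190] ohci_hcd 0000:00:13.1: wakeup",
    "brick gconfd (root-4802): GConf server is not in use, shutting down.",
]

remaining = filter_unmatched(lines, known)
```

Only the gconfd line is left for learning here, which would account for
a speedup on a repetitive log, since already-known lines are skipped.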
>
> The way that the patterns are created, I thought that the new version
> should be exposed to as many strings, if not more, from which to create
> patterns due to the filtering step, but I may be wrong:
>
> spacey at brick:~/src/logweeder-0.1$ ./match.py patterns
> /var/log/messages | egrep -v 'type-'
> unknown: Feb 21 23:44:34 brick gconfd (root-4802): GConf server is not
> in use, shutting down.
> unknown: Feb 22 07:51:00 brick gconfd (spacey-4525): SIGHUP received,
> reloading all databases
>
>
> spacey at brick:~/src/logweeder-0.1-pn$ ./match.py patterns
> /var/log/messages | egrep -v 'type-'
> unknown: Feb 21 23:44:34 brick gconfd (root-4802): GConf server is not
> in use, shutting down.
> unknown: Feb 22 07:51:00 brick gconfd (spacey-4525): SIGHUP received,
> reloading all databases
> unknown: Feb 24 13:43:22 brick kernel: [ 1189.708190] ohci_hcd
> 0000:00:13.1: wakeup
> unknown: Feb 24 13:43:22 brick kernel: [ 1189.878617] usb 2-3: new low
> speed USB device using ohci_hcd and address 2
> unknown: Feb 24 13:43:22 brick kernel: [ 1189.971468] usb 2-3:
> configuration #1 chosen from 1 choice
> unknown: Feb 24 13:43:23 brick kernel: [ 1190.083431] input: Logitech
> USB Trackball as /class/input/input3
> unknown: Feb 24 13:43:23 brick kernel: [ 1190.083976] input: USB HID
> v1.10 Mouse [Logitech USB Trackball] on usb-0000:00:13.1-3
> unknown: Feb 24 13:43:23 brick kernel: [ 1190.084351]
> drivers/usb/input/hid-core.c: v2.6:USB HID core driver
>
> I think this may be a bug that I created, though. It doesn't seem to
> want to create patterns for things it's seeing only once, which is
> still desired. I'm going to work on that later, when I've slept.
>
> In exchange for the improved time to learn patterns, it's likely that
> I'm breaking something about the behavior of the cluster class
> and possibly allowing patterns to be created for related
> entries that aren't going to be localized in the same pattern, but I
> suspect that in the case of log files it's unlikely for this to
> happen.
>
> But something is weird here. If I look at the first few patterns in
> the original, I get things like this that I just don't think I should:
>
> ('type-0', 212, '^brick\\ kernel.*\\:.*\\ \\[.*\\ .*\\ .*\\ .*\\ ')
> ('type-1', 205, '^brick\\ kernel.*\\:.*\\ \\[.*\\ .*\\ .*\\ .*\\ ')
> ('type-2', 198, '^brick\\ kernel.*\\:.*\\ \\[.*\\ .*\\ .*\\ .*\\ ')
> ('type-3', 187, '^brick\\ kernel.*\\:.*\\ \\[.*\\ .*\\ .*\\ .*\\ ')
> ('type-4', 187, '^brick\\ kernel.*\\:.*\\ \\[.*\\ .*\\ .*\\ .*\\ ')
> ('type-5', 187, '^brick\\ kernel.*\\:.*\\ \\[.*\\ .*\\ .*\\ .*\\ ')
> ('type-6', 151, '^brick\\ kernel.*\\:.*\\ \\[.*\\ .*\\ .*\\ .*\\ ')
> ('type-7', 128, '^brick\\ kernel\\:\\ \\[.*\\ .*\\ .*\\ .*\\ ')
> ('type-8', 107, '^brick\\ kernel\\:\\ \\[.*\\ .*\\ .*\\ .*\\ ')
>
> I think these patterns should be impossible since 0-6 appear to be
> identical, etc. However, that's how it is right now.
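One way to confirm the duplication is to group the entries by their
regex string. A small sketch, assuming the pattern file holds one
(name, number, regex) tuple per line as shown above; the helper name is
hypothetical and only four of the listed entries are used.

```python
from collections import OrderedDict

def group_by_regex(entries):
    """Map each distinct regex to the pattern names that share it."""
    groups = OrderedDict()
    for name, _number, regex in entries:
        groups.setdefault(regex, []).append(name)
    return groups

# Four entries copied from the listing in this message.
entries = [
    ('type-0', 212, '^brick\\ kernel.*\\:.*\\ \\[.*\\ .*\\ .*\\ .*\\ '),
    ('type-1', 205, '^brick\\ kernel.*\\:.*\\ \\[.*\\ .*\\ .*\\ .*\\ '),
    ('type-7', 128, '^brick\\ kernel\\:\\ \\[.*\\ .*\\ .*\\ .*\\ '),
    ('type-8', 107, '^brick\\ kernel\\:\\ \\[.*\\ .*\\ .*\\ .*\\ '),
]

groups = group_by_regex(entries)

# Any regex listed under more than one name is a duplicate.
duplicates = [names for names in groups.values() if len(names) > 1]
```

For these four entries there are two distinct regexes, each shared by
two names.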
>
> things I'm not sure will fly:
>
> 1) The way I handle the pattern file, the sample data gets thrown
> out. This isn't good, but I'm too tired now.
> 2) There's some debugging gunk lying around. Comment out the
> sys.stderr.write() calls you don't like.
>
> -Peter
> (see attached for patches)
>
>
> --
> The 5 year plan:
> In five years we'll make up another plan.
> Or just re-use this one.