[BBLISA] My notes from Nagios and SEC talk
johnmalloy at comcast.net
Thu Jan 11 14:12:18 EST 2007
I appreciate it!
Are there any slides available for the talk?
--
Thanks!
John Malloy
johnmalloy AT comcast DOT net
-------------- Original message ----------------------
From: "Daniel Clark" <dclark at pobox.com>
> Sort of garbled, but thought they might be of use to someone
>
> Are the slides available anywhere?
>
> = John Rouillard - Nagios and SEC =
>
> == Nagios ==
>
> Nagios, unlike some other FLOSS monitoring software, has correlation (parent relationships and others)
> * Limited cause/effect detection
>
> Don't use host_name in a "define service" stanza -- use hostgroup_name instead!
> * He has a test on each host where it looks up its own name to make sure
> DNS is working on that host
> * Flap detection is problematic - he leaves it turned off
> * Nagios can put performance data "somewhere" - DB, RRD, etc.
> * is_volatile is useful in special cases
> * read the manual - twice
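The talk only mentions the DNS self-lookup check; its implementation wasn't shown. A minimal sketch of such a check as a Python Nagios plugin, assuming a forward lookup of the host's own name and the standard plugin exit codes (0 = OK, 2 = CRITICAL). The names check_own_name and main are hypothetical:

```python
import socket
import sys

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_own_name(resolve=socket.gethostbyname, name=None):
    """Resolve this host's own name; return (exit_code, message)."""
    name = name or socket.gethostname()
    try:
        addr = resolve(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return CRITICAL, f"DNS CRITICAL - cannot resolve {name}: {exc}"
    return OK, f"DNS OK - {name} resolves to {addr}"


def main():
    """Plugin entry point: print one status line and exit with its code."""
    code, message = check_own_name()
    print(message)
    sys.exit(code)
```

The resolver is a parameter so the check can be exercised without real DNS; a deployed plugin script would just call main().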
>
> Correlation - find the fingerprint - only be notified of things that matter
>
> Nagios 3 will support defining your own (custom) variables. A write-up of
> a hack to do this now, covering how to monitor SSL (check_ldaps), is on
> the nagios-users list (find post)
> * I think this post is:
> http://article.gmane.org/gmane.network.nagios.user/40093/match=ldaps
>
> Servicegroup - a group of services that together provide a
> customer-visible service (e.g. DB2, WebSphere app server, Apache)
>
> Serviceextinfo/Hostextinfo are going away in Nagios 3 -- the info becomes
> attributes of the service and host objects
>
> Nagios 3 is in alpha now.
>
> * Nagios is really a service monitoring program, not a host monitoring system
>
> Many other monitoring projects are missing correlation.
>
> In Nagios 2, host checks are done in series (in Nagios 3, they will be in parallel)
>
> Correlation types (from the slide): topology, thresholds, service, cluster
> (meta) plugin, flap detection (doesn't quite work, but SEC replaces
> it)
>
> Tricks:
> * Links to TWiki for a knowledge base for services, hosts, and additional commands
> * Can change the HTML pages - he has an "Unack Svc Probs" page - the on-call
> person lives in this screen
> * Downtime scheduling
>
> * He uses Cacti and RT integrated with TWiki - interesting feature -
> find the last ticket in RT that mentions the system
> * connect via (ajaxterm?)
> * look at Nagios definitions
> * (Cacti isn't fed from Nagios - he doesn't like Nagios for RRD stuff - he
> uses drraw instead)
> * Also have wiki pages for services
> * Nagios just has a link - no two-way automation, but it isn't really
> needed in this case - wiki-side templates for hosts and services do
> exist, however
>
> == SEC ==
> * Is very passive
> * Often you may need to hook rule types together -- in groups
> * Only usable in real time at the moment
> * Can do everything that Nagios does except topology
>
> A plugin talks to the device, SEC determines the severity level and gives
> the data back to Nagios (Nagios is not time-aware; SEC is)
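To illustrate the kind of time awareness SEC adds, here is a sketch of a sliding-window threshold in Python. It is only in the spirit of SEC's SingleWithThreshold rule type, not SEC's implementation; the class name WindowThreshold, the count/window values, and reporting WARNING below the threshold are all assumptions:

```python
from collections import deque

# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


class WindowThreshold:
    """Hypothetical helper: escalate to CRITICAL only when `count` error
    events arrive within `window` seconds; otherwise report WARNING."""

    def __init__(self, count, window):
        self.count = count
        self.window = window
        self.times = deque()

    def event(self, timestamp):
        """Record one error event and return the severity to hand to Nagios."""
        self.times.append(timestamp)
        # Drop events that have fallen out of the time window.
        while self.times and timestamp - self.times[0] > self.window:
            self.times.popleft()
        return CRITICAL if len(self.times) >= self.count else WARNING
```

With count=3 and window=60, errors at t=0, 10 and 20 yield WARNING, WARNING, CRITICAL, while a lone error at t=200 drops back to WARNING because the earlier ones have aged out.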
>
> * He has created a patch to Nagios that allows the active events to be
> passed through to SEC - the patch is in beta this month, with 2 open
> slots still left for more beta testers - the beta period will last at least 2 months.
>
> When used with Nagios, his patch adds:
> * counting OK states before rearming
> * different triggers or polling intervals based on analysis of the error,
> not just non-OK severity
> * changing trouble thresholds per time period/activity
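The patch itself wasn't shown. As a hypothetical sketch of the first feature, counting OK states before rearming, this Python class keeps a problem raised until `needed` consecutive OK results arrive (the class and parameter names are made up for illustration):

```python
class RearmCounter:
    """Hypothetical sketch: once a problem is raised, clear it (re-arm)
    only after `needed` consecutive OK results, so a single good poll
    does not end the alert."""

    def __init__(self, needed):
        self.needed = needed
        self.in_problem = False
        self.ok_streak = 0

    def result(self, is_ok):
        """Feed one check result; return True while the alert stays raised."""
        if not is_ok:
            # Any failure (re)starts the problem and resets the OK streak.
            self.in_problem = True
            self.ok_streak = 0
        elif self.in_problem:
            self.ok_streak += 1
            if self.ok_streak >= self.needed:
                # Enough consecutive OKs: clear the problem and re-arm.
                self.in_problem = False
                self.ok_streak = 0
        return self.in_problem
```

An OK that arrives while no problem is active simply leaves the alert cleared; a failure in the middle of a recovery streak starts the count over.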
>
> * SEC also monitors the Nagios log file - often this file will show
> Nagios configuration errors
>
> Contexts
> * See the ssh example in the 2004 LISA paper (http://www.cs.umb.edu/~rouilj/sec/)
>
> Nagios is good at "what is happening now"; SEC is good at figuring out
> "how I got to now"
> * His patch will be released under GPL
>
> * Personal Website: www.cs.umb.edu/~rouilj
>
> * The easy part: passive service event -> Nagios
> * The trick here is getting the active event stream out of Nagios
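For the easy direction, a passive result is handed to Nagios by writing a PROCESS_SERVICE_CHECK_RESULT line to its external command file (the command_file setting in nagios.cfg; check_external_commands must be enabled). A small Python sketch; the function names are hypothetical and the command file path is site-specific:

```python
import time


def passive_result(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if timestamp is None:
        timestamp = int(time.time())
    # External commands are newline-delimited, so keep the output on one line.
    output = output.replace("\n", " ")
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


def submit(line, command_file):
    """Append the command to Nagios' command file. It is usually a named
    pipe, so opening it may block if Nagios is not reading it."""
    with open(command_file, "a") as f:
        f.write(line)
```

For example, passive_result("web1", "HTTP", 2, "CRITICAL - timeout", timestamp=1168542738) produces `[1168542738] PROCESS_SERVICE_CHECK_RESULT;web1;HTTP;2;CRITICAL - timeout` followed by a newline; the host and service names here are placeholders.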
>
> OpenNMS (in 2004) didn't have good correlation compared to Nagios,
> and certainly nothing comparable to SEC
> * Does it have correlation now?
> * It used to have thresholding issues as well, and may still
>
> ZenOSS:
> * He couldn't see correlation aspects that he really needed.
>
> Temperature sensors - lm_sensors and smartctl can be used instead
> of stand-alone devices in some cases
>
> Some tricks:
> * Rack as host - if 3 boxes in rack have high temp, rack is overheated
> * Room as host - "room is on fire" alert if 3 racks have high temp
> * But really needed "room is underwater" alert :-)
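The rack-as-host and room-as-host tricks are aggregate checks. A sketch of the logic in Python, assuming per-box temperatures have already been collected (e.g. via lm_sensors) and using the talk's threshold of 3; the function names and the temperature limit are hypothetical:

```python
def rack_overheated(box_temps, limit, min_hot=3):
    """Rack as host: overheated when at least `min_hot` boxes exceed `limit`."""
    hot_boxes = sum(1 for temp in box_temps.values() if temp > limit)
    return hot_boxes >= min_hot


def room_on_fire(rack_temps, limit, min_hot_racks=3):
    """Room as host: alarm when at least `min_hot_racks` racks are overheated.
    `rack_temps` maps rack name -> {box name: temperature}."""
    hot_racks = sum(1 for boxes in rack_temps.values()
                    if rack_overheated(boxes, limit))
    return hot_racks >= min_hot_racks
```

Each function could back a check on a pseudo-host (the rack or the room), so one alert replaces a burst of per-box alerts.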
>
> * Q: With lots of hosts, does he edit the config manually? A: Yes, but he is
> working towards defining every host once in config (his config mgmt app,
> akin to cfengine/puppet/bcfg2/lcfg)
> * automation issue: Think of a host group as a set, nagios only has
> set subtraction - makes automation very difficult
> * could just not use hostgroups, but then that makes the nagios web GUI suck
> * hostgroups for admin data
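Because, as the talk notes, Nagios only offers set subtraction on groups, one workaround is to compute other set operations outside Nagios and generate explicit hostgroups. A hypothetical Python generator for an intersection; the function name, group names and output layout are illustrative only:

```python
def derived_hostgroup(name, groups, *names):
    """Emit a Nagios hostgroup whose members are the hosts common to every
    group in `names` (set intersection), computed outside Nagios.
    `groups` maps group name -> list of host names; at least one name is
    required, and an empty intersection yields an empty members line."""
    members = set.intersection(*(set(groups[n]) for n in names))
    return "\n".join([
        "define hostgroup {",
        f"    hostgroup_name  {name}",
        f"    alias           {name}",
        f"    members         {','.join(sorted(members))}",
        "}",
    ]) + "\n"
```

Generating the group from a single source of truth fits the "define every host once" goal above: the derived group is rebuilt rather than edited by hand.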
>
> * GroundWork's stuff may be pretty good for automating config for lots
> of machines - http://www.groundworkopensource.com/products/os-overview.html
> * Nagios 3 isn't going to push config into DB - Nagios 4 might.
>
> * Oreon, a graphical interface for Nagios out of France, might be
> nice - http://www.oreon-project.org/
>
> --
> Daniel Clark # http://dclark.us # http://opensysadmin.com
>
> _______________________________________________
> bblisa mailing list
> bblisa at bblisa.org
> http://www.bblisa.org/mailman/listinfo/bblisa