<font><font face="verdana,sans-serif">There is an important academic and practical discussion to be had about this.  In fact, Alva Couch, I, and others have been examining similar topics for years.  Unfortunately I don&#39;t have the bandwidth to get into it right now; perhaps in a few months.  I&#39;ll leave you with two tidbits.  First, thresholds are no good in these circumstances (except as a coarse lower/upper bound); you need to combine learning (with small amounts of hysteresis) and highly reactive management.  Second, one might be able to obtain unrefined but useful estimates of performance in various components (e.g., CPU, disk, network) without an agent, via analysis of response-time and other statistics: essentially building a black-box model over time of how the system is *expected* to work.</font></font><div>
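<font face="verdana, sans-serif"><br>To make the first tidbit concrete, here is a minimal sketch in Python of a learned baseline combined with hysteresis. It is only an illustration under stated assumptions: the class name, the exponentially weighted mean and variance, and all default parameters are hypothetical choices for the example, not a description of any existing tool.<br></font>
<pre>
```python
class BaselineDetector:
    """Learned baseline (EWMA mean/variance) with hysteresis on the alert state."""

    def __init__(self, alpha=0.2, raise_at=4.0, clear_at=2.0, warmup=5, min_sigma=1.0):
        self.alpha = alpha          # learning rate of the moving baseline
        self.raise_at = raise_at    # deviation (in sigmas) that starts an alert
        self.clear_at = clear_at    # lower deviation needed to end an alert
        self.warmup = warmup        # samples to observe before alerting at all
        self.min_sigma = min_sigma  # floor so a flat history cannot divide by zero
        self.mean = None
        self.var = 0.0
        self.seen = 0
        self.alerting = False

    def update(self, x: float) -> bool:
        """Feed one sample; return True while the detector is in the alert state."""
        self.seen += 1
        if self.mean is None:
            self.mean = float(x)
            return False
        dev = x - self.mean
        score = abs(dev) / max(self.var ** 0.5, self.min_sigma)
        if self.seen > self.warmup:
            # Hysteresis: the bar for clearing is lower than the bar for raising.
            limit = self.clear_at if self.alerting else self.raise_at
            self.alerting = score > limit
        if not self.alerting:
            # Learn only from samples that look normal.
            self.mean += self.alpha * dev
            self.var = (1 - self.alpha) * (self.var + self.alpha * dev * dev)
        return self.alerting
```
</pre>
<font face="verdana, sans-serif">With these defaults and a baseline learned from a steady run of 10s, a sample of 13 does not raise an alert on its own, but it does keep an alert active if it arrives right after a spike to 100. That asymmetry is the hysteresis.<br></font>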

<font face="verdana, sans-serif"><br></font></div><div><font face="verdana, sans-serif">Regards,</font></div><div><font face="verdana, sans-serif">Marc<br></font><div><font face="verdana, sans-serif"><br></font><div class="gmail_quote">

On Sun, Aug 4, 2013 at 12:00 PM,  <span dir="ltr">&lt;<a href="mailto:bblisa-request@bblisa.org" target="_blank">bblisa-request@bblisa.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<br>

Today&#39;s Topics:<br>

<br>

   1. statistics-based zero config network management: why doesnt<br>

      this exist? (Alex Aminoff)<br>

   2. Re: statistics-based zero config network management: why<br>

      doesnt this exist? (Brian O&#39;Neill)<br>

   3. Re: statistics-based zero config network management: why<br>

      doesnt this exist? (<a href="mailto:kurin@delete.org">kurin@delete.org</a>)<br>

   4. Re: statistics-based zero config network management: why<br>

      doesnt this exist? (Matt Simmons)<br>

   5. Re: statistics-based zero config network management: why<br>

      doesnt this exist? (Edward Ned Harvey (bblisa4))<br>

   6. Re: statistics-based zero config network management: why<br>

      doesnt this exist? (Brian O&#39;Neill)<br>

<br>

<br>

----------------------------------------------------------------------<br>

<br>

Message: 1<br>

Date: Sat, 03 Aug 2013 15:52:41 -0400<br>

From: Alex Aminoff &lt;<a href="mailto:alex@basespace.net">alex@basespace.net</a>&gt;<br>

Subject: [BBLISA] statistics-based zero config network management: why<br>

        doesnt this exist?<br>

To: <a href="mailto:bblisa@bblisa.org">bblisa@bblisa.org</a><br>

Message-ID: &lt;<a href="mailto:51FD5F89.2010907@basespace.net">51FD5F89.2010907@basespace.net</a>&gt;<br>

Content-Type: text/plain; charset=ISO-8859-1; format=flowed<br>

<br>

<br>

I&#39;m looking at SNMP-based network monitoring systems: cacti, zabbix,<br>

some other similar ones. All of them seem to require you to configure<br>

your devices on the system. There are some auto-discovery functions, but<br>

they only work if you have loaded up the &quot;profile&quot; or &quot;template&quot; for<br>

your particular network hardware.<br>

<br>

So why is this necessary? Suppose instead there was a network monitoring<br>

system that worked like this:<br>

<br>

  - Find any SNMP device on your subnet<br>

  - Walk its SNMP tree, collecting all data, no matter what it is:<br>

interface counters, manufacturer&#39;s serial number, I don&#39;t care<br>

  - Save this data in some sort of time series storage, like RRD<br>

  - Then use statistics to throw an alert when a new value (or more<br>

likely a group of new values) differs sufficiently in statistical terms<br>

from the history of that value.<br>
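<br>
One caveat for the steps above: many SNMP values, interface counters in particular, are running totals, so comparing per-interval deltas against history is more informative than comparing raw samples. A minimal sketch in Python, assuming 32-bit counters (an SNMP Counter32 wraps at 2^32); the function names are hypothetical:<br>
<pre>
```python
COUNTER32_MODULUS = 2 ** 32  # Counter32 values wrap back to 0 after 2**32 - 1


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Return how much a wrapping counter advanced between two polls.

    Assumes at most one wrap between polls; a device reboot resets the
    counter and would show up here as a spuriously large delta.
    """
    return (current - previous) % modulus


def deltas(samples: list[int]) -> list[int]:
    """Turn a series of raw counter readings into per-interval deltas."""
    return [counter_delta(a, b) for a, b in zip(samples, samples[1:])]
```
</pre>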

<br>

The great thing about this plan is you don&#39;t need to configure in<br>

advance the MIBs and OIDs. When an alert happens, the system can include<br>

the OID in the message. A human can then look it up or otherwise deal with it.<br>

<br>

There will be false positives, but one should be able to filter those<br>

out once they happen. A real network problem in my experience involved<br>

some values jumping from 0-1-2-0 to 1,234,567 so you can dial the<br>

sensitivity way down on the statistical tests.<br>
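<br>
The kind of statistical test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the z-score approach, the threshold of 6 standard deviations, and the floor of 1.0 on the standard deviation (reasonable only for integer-valued data) are choices made for the example, not part of the proposal itself.<br>
<pre>
```python
from statistics import mean, pstdev


def is_anomalous(history, new_value, z_threshold=6.0, min_samples=10, min_sigma=1.0):
    """Return True if new_value lies far outside the spread of history."""
    if min_samples > len(history):
        return False  # not enough history to judge yet
    mu = mean(history)
    # Floor the spread so a perfectly flat history cannot divide by zero.
    sigma = max(pstdev(history), min_sigma)
    return abs(new_value - mu) / sigma > z_threshold
```
</pre>
With a history of 0-1-2-0 repeated a few times, a new value of 1,234,567 is flagged while a new value of 2 is not, even with the threshold set this high.<br>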

<br>

My question is, why does this not exist? Is there some reason I have<br>

overlooked why this would be impractical? Or does it exist and I just<br>

have not found it?<br>

<br>

  - Alex<br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 2<br>

Date: Sat, 03 Aug 2013 16:09:05 -0400<br>

From: Brian O&#39;Neill &lt;<a href="mailto:oneill@oinc.net">oneill@oinc.net</a>&gt;<br>

Subject: Re: [BBLISA] statistics-based zero config network management:<br>

        why doesnt this exist?<br>

To: <a href="mailto:bblisa@bblisa.org">bblisa@bblisa.org</a><br>

Message-ID: &lt;<a href="mailto:51FD6361.8020606@oinc.net">51FD6361.8020606@oinc.net</a>&gt;<br>

Content-Type: text/plain; charset=ISO-8859-1; format=flowed<br>

<br>

90% of the data available in SNMP isn&#39;t generally relevant, and it can<br>

be massive on some systems and take a long time to poll.<br>

<br>

Zenoss seems to have a decent auto-discovery system, and will do its<br>

best to detect the type of system and apply a template. That template<br>

defines what is relevant to monitor. I personally didn&#39;t like it as it<br>

was rather complicated to do anything &quot;out of the box&quot;, and the kind of<br>

monitoring I generally dealt with needed more flexibility (based on my<br>

evaluation).<br>

<br>

Aside from auto-discovery, I like cacti for its presentation and seeing<br>

trend data. I like nagios for its flexibility and notification handling,<br>

but I think it requires a lot of scripting to use effectively. I may soon<br>

investigate some autodiscovery-type systems, or design one of my<br>

own. I do like that nagios can be provisioned by outside systems, even<br>

multiple ones.<br>

<br>

<br>


<br>

<br>

<br>

------------------------------<br>

<br>

Message: 3<br>

Date: Sat, 3 Aug 2013 20:13:26 +0000<br>

From: <a href="mailto:kurin@delete.org">kurin@delete.org</a><br>

Subject: Re: [BBLISA] statistics-based zero config network management:<br>

        why doesnt this exist?<br>

To: Alex Aminoff &lt;<a href="mailto:alex@basespace.net">alex@basespace.net</a>&gt;<br>

Cc: <a href="mailto:bblisa@bblisa.org">bblisa@bblisa.org</a><br>

Message-ID: &lt;<a href="mailto:20130803201326.GC4490@delete.org">20130803201326.GC4490@delete.org</a>&gt;<br>

Content-Type: text/plain; charset=us-ascii<br>

<br>

I&#39;ve toyed with the idea of applying machine learning to syslog alerts,<br>

trying to predict failures, but I never got off the ground.  The whole<br>

thing has to be unsupervised, unless you&#39;re willing to sit there<br>

classifying every event.<br>
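<br>
As a rough illustration of what an unsupervised first step could look like, the Python sketch below masks the variable parts of each syslog line and flags message shapes it has never seen before. It does not predict failures; it only detects novelty. The names and the digit-masking rule are assumptions made for the example.<br>
<pre>
```python
import re
from collections import Counter

_DIGITS = re.compile(r"\d+")


def template_of(line: str) -> str:
    """Collapse every run of digits to 'N' so similar messages share one template."""
    return _DIGITS.sub("N", line)


class NoveltyDetector:
    """Counts message templates and reports the first sighting of each one."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, line: str) -> bool:
        """Record one log line; return True if its template is new."""
        template = template_of(line)
        self.counts[template] += 1
        return self.counts[template] == 1
```
</pre>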

<br>


<br>

<br>

<br>

------------------------------<br>

<br>

Message: 4<br>

Date: Sat, 3 Aug 2013 19:10:32 -0400<br>

From: Matt Simmons &lt;<a href="mailto:bandman@gmail.com">bandman@gmail.com</a>&gt;<br>

Subject: Re: [BBLISA] statistics-based zero config network management:<br>

        why doesnt this exist?<br>

To: <a href="mailto:kurin@delete.org">kurin@delete.org</a><br>

Cc: <a href="mailto:bblisa@bblisa.org">bblisa@bblisa.org</a>, Alex Aminoff &lt;<a href="mailto:alex@basespace.net">alex@basespace.net</a>&gt;<br>

Message-ID:<br>

        &lt;CAL0sVA_FZ-Qbo8iZMe-NrFKOQjodrGPW2r1mC1B-mBwar=<a href="mailto:CgZw@mail.gmail.com">CgZw@mail.gmail.com</a>&gt;<br>

Content-Type: text/plain; charset=&quot;utf-8&quot;<br>

<br>

Have you looked into any of the Windows-based solutions like Spiceworks<br>

(free ad-supported)? They do an amazing job with autodiscovery, not just of<br>

SNMP-enabled devices, but also UNIX/Linux and other Windows machines. I&#39;ve<br>

been impressed; although I&#39;ve never actually found that the tools fit into my<br>

workflow, I appreciate what they do.<br>

<br>

--Matt<br>

<br>

<br>

<br>


<br>

<br>

<br>

--<br>

&quot;Today, vegetables... Tomorrow, the world!&quot;<br>


<br>

------------------------------<br>

<br>

Message: 5<br>

Date: Sun, 4 Aug 2013 12:21:43 +0000<br>

From: &quot;Edward Ned Harvey (bblisa4)&quot; &lt;<a href="mailto:bblisa4@nedharvey.com">bblisa4@nedharvey.com</a>&gt;<br>

Subject: Re: [BBLISA] statistics-based zero config network management:<br>

        why doesnt this exist?<br>

To: Alex Aminoff &lt;<a href="mailto:alex@basespace.net">alex@basespace.net</a>&gt;, &quot;<a href="mailto:bblisa@bblisa.org">bblisa@bblisa.org</a>&quot;<br>

        &lt;<a href="mailto:bblisa@bblisa.org">bblisa@bblisa.org</a>&gt;<br>

Message-ID:<br>

        &lt;<a href="mailto:54dc388cccfc4970a726fdc3bad7c891@BLUPR04MB040.namprd04.prod.outlook.com">54dc388cccfc4970a726fdc3bad7c891@BLUPR04MB040.namprd04.prod.outlook.com</a>&gt;<br>

<br>

Content-Type: text/plain; charset=&quot;us-ascii&quot;<br>

<br>

&gt; From: <a href="mailto:bblisa-bounces@bblisa.org">bblisa-bounces@bblisa.org</a> [mailto:<a href="mailto:bblisa-bounces@bblisa.org">bblisa-bounces@bblisa.org</a>] On<br>

&gt; Behalf Of Alex Aminoff<br>

&gt;<br>

&gt; I&#39;m looking at SNMP-based network monitoring systems: cacti, zabbix,<br>

&gt; some other similar ones. All of them seem to require you to configure<br>

&gt; your devices on the system. There are some auto-discovery functions, but<br>

&gt; they only work if you have loaded up the &quot;profile&quot; or &quot;template&quot; for<br>

&gt; your particular network hardware.<br>

<br>

I don&#39;t think that&#39;s correct.  I think SNMP auto-discovery does exactly what you said.  It just walks the system, discovers whatever it can discover, and there you have it.<br>

<br>

The thing is:  Very rarely is SNMP sufficient.  For most devices, it counts for no more than a ping monitor.  If you want reliable statistics of CPU, disk, network, and memory usage, you have to install an agent.  I emphasize reliable, because although SNMP technically supports all that, I&#39;ve never seen it usable for that purpose.<br>


<br>

<br>

<br>

------------------------------<br>

<br>

Message: 6<br>

Date: Sun, 04 Aug 2013 10:14:24 -0400<br>

From: Brian O&#39;Neill &lt;<a href="mailto:oneill@oinc.net">oneill@oinc.net</a>&gt;<br>

Subject: Re: [BBLISA] statistics-based zero config network management:<br>

        why doesnt this exist?<br>

To: <a href="mailto:bblisa@bblisa.org">bblisa@bblisa.org</a><br>

Message-ID: &lt;<a href="mailto:51FE61C0.2040600@oinc.net">51FE61C0.2040600@oinc.net</a>&gt;<br>

Content-Type: text/plain; charset=ISO-8859-1; format=flowed<br>

<br>

On 8/4/2013 8:21 AM, Edward Ned Harvey (bblisa4) wrote:<br>

&gt; The thing is:  Very rarely is SNMP sufficient.  For most devices, it counts for no more than a ping monitor.  If you want reliable statistics of CPU, disk, network, and memory usage, you have to install an agent.  I emphasize reliable, because although SNMP technically supports all that, I&#39;ve never seen it usable for that purpose.<br>


&gt;<br>

<br>

I use it all the time for network, disk, CPU and memory monitoring on my<br>

Linux boxes using Net-SNMP.<br>

<br>

Windows, on the other hand, is more of a problem. SNMP out of the box on<br>

Windows only exposes network info. You can add SNMP-Informant - the free<br>

version adds disk space, CPU and memory, but the memory monitoring isn&#39;t<br>

terribly useful from what I&#39;ve found. And I&#39;m running into problems with<br>

reliability, not due to SNMP itself, but to the Windows implementation of it.<br>

On some systems, it is just really slow at times getting even a small<br>

amount of info, like space used on a single-volume system. It also<br>

doesn&#39;t appear to be a complete version 2 implementation (does not<br>

support getbulkrequest). Windows doesn&#39;t seem to want to support<br>

anything for monitoring but their own, like WMI, and even then they seem<br>

non-committal - we investigated Exchange 2010 (or maybe 2007 - I forget<br>

how long ago) monitoring via WMI, and there were indications they didn&#39;t<br>

plan on providing the data...I think they did eventually provide it. But<br>

WMI can also be slow when accessed remotely, and sometimes requires<br>

elevated credentials that, depending on the local bureaucracy, might not<br>

be obtainable.<br>

<br>

On network devices, it depends on the manufacturer, but most of the big<br>

ones will give you decent info. Finding the MIBs to know what the info<br>

actually is can be a challenge.<br>
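<br>
Keeping the numeric OID alongside each value means the MIB lookup can be deferred until someone actually needs to interpret it. As a rough sketch in Python, assuming values are collected with Net-SNMP&#39;s snmpwalk using numeric OID output (the -On option) and that each output line has the form OID = TYPE: VALUE; the function name, host address, and community string are hypothetical:<br>
<pre>
```python
from typing import Optional, Tuple

# Example collection command (host and community are placeholders):
#   snmpwalk -v 2c -c public -On 192.0.2.10


def parse_walk_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Split one 'OID = TYPE: VALUE' line into (oid, type_name, value).

    Returns None for lines that do not match that shape, such as
    status messages or values printed without a type prefix.
    """
    oid, sep, rest = line.partition(" = ")
    if not sep:
        return None
    type_name, sep, value = rest.partition(": ")
    if not sep:
        return None
    return oid.strip(), type_name.strip(), value.strip()
```
</pre>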

<br>

<br>

<br>

------------------------------<br>

<br>

_______________________________________________<br>

bblisa mailing list<br>

<a href="mailto:bblisa@bblisa.org">bblisa@bblisa.org</a><br>

<a href="http://www.bblisa.org/mailman/listinfo/bblisa" target="_blank">http://www.bblisa.org/mailman/listinfo/bblisa</a><br>

<br>

End of bblisa Digest, Vol 117, Issue 1<br>

**************************************<br>

</blockquote></div><br></div></div>