[BBLISA] Monitoring survey
Dewey Sasser
dewey at sasser.com
Thu Feb 4 11:46:24 EST 2016
On 2/4/2016 10:35 AM, John Miller wrote:
> Does anyone have experience with LogicMonitor, New Relic, Scout,
> Librato, Traverse, etc.? I could see those tools being a very good
> fit for us, provided that metrics are collected in a sane fashion (we
> don't need hundreds of devices trying to report off site) and the
> billing model is flexible (we want our VMs now, thank you very much).
> We've also got strong in-house Windows experience: has anyone used
> SCOM to monitor Linux devices?
>
I use Librato extensively and have used New Relic peripherally.
New Relic was fairly easy, but their first solution was in the appserver
monitoring space; while they've grown beyond it, I think it still colors them.
On the other hand, they have great navigation and summaries for problems
in that space. If I have any needs for appserver monitoring I'd
probably look to them first, though the DataDog demos I've seen also
look very cool.
Librato is basically Graphite-as-a-service, though it's a much improved
Graphite. It has excellent support for metric ingest from a number of
sources (including directly from CloudWatch), a very easy-to-use REST
API, and language-specific client libraries. It supports standard threshold
alerting and integrates with PagerDuty. I've built more complex
alerting by extracting data via their API (quite easy) and doing more
interesting calculations.
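
As a rough illustration of that kind of derived alert (a sketch only:
the two series, the window, and the threshold are hypothetical, and the
step that pulls the data out via the API is omitted), the calculation
can be as simple as a ratio of two metrics averaged over recent points:

```python
from statistics import mean


def error_ratio_alert(errors, requests, threshold=0.05, window=5):
    """Return True when the mean error/request ratio over the last
    `window` samples exceeds `threshold`.

    `errors` and `requests` are equal-length lists of per-interval
    counts that were already fetched from the metrics service
    (hypothetical data shape).
    """
    recent = list(zip(errors, requests))[-window:]
    # Skip intervals with no traffic to avoid dividing by zero.
    ratios = [e / r for e, r in recent if r > 0]
    if not ratios:
        return False
    return mean(ratios) > threshold
```

A plain threshold on the error count alone would fire on every traffic
spike; the ratio only fires when errors grow faster than requests.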
I'm currently using it with CloudWatch ingest for infrastructure,
collectd for guest-level metrics (heavily filtered to drop irrelevant
metrics and avoid duplication with CloudWatch), and direct use of their
API for app-level metrics.
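
For the app-level case, a minimal sketch of what a direct submission
might look like follows. The endpoint URL, the "gauges" payload shape,
and the user/token basic auth are written from memory of their REST API
and should be checked against the current documentation; the metric
name and source are made up. The function only builds the request and
does not send anything:

```python
import base64
import json
import urllib.request

# Assumed endpoint; verify against the vendor's API documentation.
LIBRATO_URL = "https://metrics-api.librato.com/v1/metrics"


def build_gauge_request(user, token, name, value, source):
    """Build (but do not send) a POST that reports one gauge value."""
    # Assumed payload shape: a list of gauges, each with name/value/source.
    payload = {"gauges": [{"name": name, "value": value, "source": source}]}
    # HTTP basic auth with the account user and API token (assumption).
    auth = base64.b64encode(f"{user}:{token}".encode()).decode()
    return urllib.request.Request(
        LIBRATO_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {auth}",
        },
        method="POST",
    )


# Hypothetical usage: one latency sample from one app server.
req = build_gauge_request(
    "me@example.com", "api-token", "app.checkout.latency_ms", 42.0, "web-1"
)
```

Passing the request to urllib.request.urlopen() would actually send it;
in practice you would batch several measurements into one POST rather
than sending one request per sample.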
Their billing model is per metric with different update rates costing
different amounts. I *never* have to think about Librato when adding
new resources -- it "just works". I believe it costs us on the order of
1% of our AWS bill :-)
I'm not sure what you mean by "metrics are collected in a sane
fashion". It *is* a service and thus your devices do need to report to
their system over the Internet (TLS, of course). For our needs (mix of
public cloud and on-prem) this is sane. I don't recall any time their
metric collection went down.
I haven't investigated whether they have some kind of relay agent. We did
take down their (then beta) display system once by doing evil things:
lots of people viewing dashboards of composite metrics over large
metric sets. However, they were back up within 4 hours (and worked
with us really well), and their metric ingest never hiccuped. (Note to
self: inform all service vendors in advance of a large product launch.
Your blast radius is larger than your company.)
--
Dewey