[BBLISA] Large scale log processing

Fri May 15 15:29:33 EDT 2009

Mike Sprague wrote:
> One obvious solution is syslog-ng and a central log server.

On the collection side of things, I'd recommend taking a look at rsyslog:
http://www.rsyslog.com/

It's a drop-in replacement for syslogd, is forked from sysklogd, and adds 
features like reliable transports (TCP, or its own RELP protocol over 
TCP), queues, multiple storage drivers (e.g., SQL databases), and filtering. 
The author is currently working on batch processing of queues to further 
boost performance.

I don't have a recommendation on the analysis side, but you might want 
to start there and work backwards, as it will likely dictate or at least 
influence how the data is gathered and stored.

> A colleague mentioned hadoop/MapReduce (http://hadoop.apache.org/).

Isn't that more of a raw storage and processing technology that will 
still require an analysis app? Is that something you want to write? I 
see there is a general-purpose data summarization tool (Hive) that works 
with Hadoop, but even that may require coding to get it to behave as a 
useful log analysis tool.
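
To illustrate the kind of code you'd still end up writing, here's a rough 
sketch in plain Python. It does not use Hadoop's actual API; it just shows 
the MapReduce shape: a map step that parses each line into a 
(host, program) key with a count of 1, and a reduce step that sums the 
counts per key. The line format and regex are assumptions (traditional 
BSD-style syslog text as written to a file), and the function names are 
hypothetical. In Hadoop, the framework would distribute the map calls 
and group the keys before the reduce step.

```python
import re
from collections import defaultdict
from itertools import chain

# Assumed BSD-style syslog line, e.g.
# "May 15 15:29:33 web1 sshd[1234]: Accepted publickey for tom"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+(?P<host>\S+)\s+(?P<prog>[^\s\[:]+)"
)

def map_line(line):
    """Map step (hypothetical): emit ((host, program), 1) per parseable line."""
    m = LINE_RE.match(line)
    if m:
        yield (m.group("host"), m.group("prog")), 1

def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def summarize(lines):
    """Run map over every line, then reduce all emitted pairs."""
    return reduce_counts(chain.from_iterable(map_line(l) for l in lines))

sample = [
    "May 15 15:29:33 web1 sshd[1234]: Accepted publickey for tom",
    "May 15 15:29:34 web1 sshd[1240]: Failed password for root",
    "May 15 15:29:35 db1 kernel: eth0: link up",
    "garbage line",  # unparseable lines are silently skipped
]
print(summarize(sample))
# {('web1', 'sshd'): 2, ('db1', 'kernel'): 1}
```

Even this toy needs decisions about parsing, malformed lines, and what to 
count, which is the part a log analysis app would normally supply.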

  -Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/