Sunday, March 11, 2012

Raw data is good, Normalized is better!

This week my company arranged a seminar on log management and I had the opportunity to make a demo of one of our products.

My goal was to show why using normalized data is better than just using raw data, and how this accelerates the analysis process, resulting in a faster response and therefore in cost savings.

When I talk of normalized data, I mean that the information contained in an event is split into different "field:value" pairs. Put another way, we understand the content of that event.
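To make the idea concrete, here is a minimal sketch of what normalization could look like. The log format, field names, and parsing logic are all assumptions made up for illustration, not any particular product's implementation:

```python
import re

# A hypothetical raw event from a transaction application
# (format and field names are invented for this example).
raw_event = ("2012-03-11 10:42:13 srv01 txn-app: "
             "user=alice action=transfer amount=1500 status=ok")

def normalize(line):
    """Split a raw event into field:value pairs so its content is understood."""
    fields = {}
    # Fixed prefix: timestamp, host, and application name.
    m = re.match(r"(\S+ \S+) (\S+) (\S+): (.*)", line)
    if m:
        fields["timestamp"], fields["host"], fields["app"], rest = m.groups()
        # In this invented format, the payload is already key=value pairs.
        for pair in rest.split():
            key, _, value = pair.partition("=")
            fields[key] = value
    return fields

event = normalize(raw_event)
print(event["user"])    # alice
print(event["amount"])  # 1500
```

Once events are stored this way, a question like "show me all failed transfers above 1000 by user" becomes a simple field lookup instead of a text search across raw lines.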

Imagine a scenario where you are the IT person responsible for your company's custom-developed transaction application. This application manages all the financial transactions between the company's different locations and the central server. You, aware as you are of the importance of keeping logs properly secured, have the logs of this application sent to your company's log management system.

As you don't have any specific use for these logs beyond possible future troubleshooting, and there are no regulatory requirements that specify anything else, you decided that keeping the logs in raw format is enough. When I say "raw" format, I mean storing the logs just as they are created by the application, in other words, without understanding their content. And this may look like a completely valid approach in some cases.

But imagine...