Thursday, April 12, 2012

Raw data is good, Normalized is better, Categorized is awesome!

Continuing my last post on why using normalized data is better than just using raw data, and how it accelerates the analysis process, resulting in a faster response and therefore saving money, I'd now like to focus on the data mining aspect.

Remember the scenario: you are the IT person responsible for your company's custom-developed transaction application, and your boss asks you to send him a report with all the activity related to account number 1234567890.

Of course you can give all the raw information to your boss, but I'm not sure he is going to like the idea of receiving a 20-page report with every entry where that account has been involved...

Having the data in raw format is good, and we need it, but data mining on raw data is very difficult.

I'd prefer to give him something easier to handle, maybe an Excel file where the information is easy to visualize, filter, group, etc. Maybe create some graphs...
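
To make the idea concrete, here is a minimal Python sketch of turning raw entries into a CSV file that Excel can open. Everything specific in it is an assumption for illustration: the raw line layout (date, time, then key=value pairs), the field names, and the helper names `parse_line` and `account_report` are all hypothetical, since the real application's log format is not shown here.

```python
import csv
import io

# Hypothetical raw entries; the real application's format will differ.
RAW_LOG = """\
2012-04-12 10:15:02 action=transfer account=1234567890 amount=250.00
2012-04-12 10:16:40 action=login account=5555555555
2012-04-12 10:17:11 action=withdrawal account=1234567890 amount=80.00
"""


def parse_line(line):
    """Split one raw line into a dict: date, time, then key=value pairs."""
    date, time, *pairs = line.split()
    record = {"date": date, "time": time}
    for pair in pairs:
        key, _, value = pair.partition("=")
        record[key] = value
    return record


def account_report(raw_text, account):
    """Return CSV text containing only the entries for the given account."""
    columns = ["date", "time", "action", "account", "amount"]
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=columns,
        restval="",              # fields missing from an entry stay empty
        extrasaction="ignore",   # fields outside the chosen columns are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    for line in raw_text.splitlines():
        if not line.strip():
            continue
        record = parse_line(line)
        if record.get("account") == account:
            writer.writerow(record)
    return out.getvalue()


report = account_report(RAW_LOG, "1234567890")
print(report)
```

Saving that text to a .csv file gives one row per entry and one column per field, which is the shape you need for filtering, grouping and charting in a spreadsheet.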