Friday, October 12, 2012

A Darwinian Theory of Logs

Hi again,

In September I had the opportunity to participate in the ISF (Norwegian Information Security Forum) autumn conference with my talk "A Darwinian Theory of Logs". It was a great conference, well organized, with several interesting talks and networking opportunities.

Obviously, just seeing the slides isn't the same as being there in person. The original slides had very little text and were used only as a supporting mechanism for delivering the content of the talk. But I'd still like to share the presentation with you here on our blog.

Click here to download the presentation. 

Comments are welcome :)


Thursday, April 12, 2012

Raw data is good, Normalized is better, Categorized is awesome!

Continuing my last post on why using normalized data is better than just using raw data, and how it accelerates the analysis process, resulting in a faster response and therefore money saved, I'd like to focus now on the data mining aspect.

Remember the scenario: you are the IT person responsible for your company's custom-developed transaction application, and your boss asks you to send him a report with all the activity related to account number 1234567890.

Of course you can give all the raw information to your boss, but I'm not sure he's going to like the idea of receiving a 20-page report with all the entries where that account has been involved...

Having the data in raw format is good, and we need it, but data mining on it is very difficult.

I'd prefer to give him data that's easier to handle, maybe an Excel file where the information is easy to visualize, filter, group, etc. Maybe create some graphs...
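To give an idea of what I mean, here is a minimal sketch in Python of that kind of mining on normalized data. The file names and field names (account, location, etc.) are made up for illustration, and a real export would probably target a proper Excel file rather than CSV:

    # Sketch only: "normalized_events.csv" and its columns (account,
    # location, ...) are hypothetical, not from any real product.
    import csv
    from collections import Counter

    ACCOUNT = "1234567890"

    with open("normalized_events.csv", newline="") as f:
        rows = [r for r in csv.DictReader(f) if r["account"] == ACCOUNT]

    if rows:
        # Write a filtered file the boss can open directly in Excel.
        with open("account_report.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)

        # A quick aggregation that could feed a graph: transactions per location.
        for location, count in Counter(r["location"] for r in rows).most_common():
            print(location, count)

With the events normalized, the filtering and grouping takes a few lines; against raw text you would be writing ad-hoc regexes for every question.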

Sunday, March 11, 2012

Raw data is good, Normalized is better!

This week my company arranged a seminar on log management and I had the opportunity to give a demo of one of our products.

My goal was to show why using normalized data is better than just using raw data, and how this accelerates the analysis process, resulting in a faster response and therefore money saved.

When I talk about normalized data, I mean that the information contained in an event is split into different "field:value" pairs. Said in a different way, we understand the content of that event.
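As a quick illustration, here is a minimal sketch of that normalization step in Python; the raw log line, the regex, and the field names are invented for the example, not taken from any real product:

    # Sketch only: a made-up raw line from a hypothetical transaction app.
    import re

    raw = "2012-03-11 10:42:17 TXN OK account=1234567890 amount=250.00 location=Oslo"

    pattern = re.compile(
        r"(?P<timestamp>\S+ \S+) TXN (?P<status>\S+) "
        r"account=(?P<account>\d+) amount=(?P<amount>[\d.]+) location=(?P<location>\S+)"
    )

    match = pattern.match(raw)
    if match:
        normalized = match.groupdict()  # the event as field:value pairs
        for field, value in normalized.items():
            print(f"{field}:{value}")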

Imagine a scenario where you are the IT person responsible for your company's custom-developed transaction application. This application manages all the financial transactions between the company's different locations and the central server. You, aware as you are of the importance of having the logs properly secured, have the logs of this application sent to your company's log management system.

As you don't have any specific use for these logs beyond possible future troubleshooting, and there aren't any regulatory requirements that specify anything else, you decided that having the logs directly in raw format is enough. When I say "raw" format, I mean storing the logs just as they are created by the application, in other words, without understanding their content. And this may look like a completely valid approach in some cases.

But imagine...

Thursday, January 5, 2012

Automated open source intelligence utilities

In my last post I talked about how you can use open source intelligence information to prioritize your alerts. And I think it could be interesting to make a short comparison between the two utilities I mentioned: ArcOSI and EnigmaIndicators.

As said before, the general idea in both utilities is the same:
  1. Scrape different open source intelligence sites for known malware information such as IPs, domains, URLs, MD5 file hashes, email addresses, etc.
  2. For each entry, create a CEF event with the source and type of intelligence and the threat information.
  3. Send it via syslog to a defined destination.
Both utilities are designed for easy integration with ArcSight, using CEF, so no parser is needed.

And in both you can define your own sources of information and whitelist specific entries.
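To make the pattern concrete, here is a minimal sketch in Python of the common flow both utilities implement: build a CEF event for one indicator and send it via UDP syslog. The destination address, severity, and CEF extension fields are my own assumptions for illustration, not taken from either tool:

    # Sketch only: host, CEF fields and sample indicator are assumptions.
    import socket

    SYSLOG_HOST = "192.0.2.10"  # your ArcSight syslog connector (assumed)
    SYSLOG_PORT = 514

    def send_indicator(indicator: str, intel_source: str, intel_type: str) -> None:
        # CEF header: CEF:Version|Vendor|Product|Version|SignatureID|Name|Severity|Extension
        cef = (
            "CEF:0|OSINT|IndicatorFeed|1.0|100|Known malware indicator|5|"
            f"src={indicator} cs1={intel_source} cs1Label=IntelSource "
            f"cs2={intel_type} cs2Label=IntelType"
        )
        message = f"<134>{cef}"  # syslog priority: facility local0, severity info
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(message.encode("utf-8"), (SYSLOG_HOST, SYSLOG_PORT))

    send_indicator("203.0.113.42", "example-blocklist", "ip")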

The main differences I can see are summarized below:

ArcOSI (http://code.google.com/p/arcosi/)
  - Scripting language used: Python
  - Types / number of reputation sources:
      1. IP / 7
      2. Domain / 7
  - Entropy calculation: N/A

EnigmaIndicators (http://enigmaindicators.codeplex.com/)
  - Scripting language used: Bash (dependencies: bash, cut, grep, zgrep, sed, awk, curl, wget, sort, perl, and *nix /dev/udp and/or /dev/tcp sockets)
  - Types / number of reputation sources:
      1. IP / 49
      2. Domain / 35
      3. Web requested URL / 8
      4. URL file name / 8
      5. User agent string / 2
      6. Email address sender / 1
      7. Email subject / 1
      8. Suspicious files / 4
      9. News feed / 1
      10. MD5 file hash / 7
  - Entropy calculation: Enigma calculates entropy (a measure of the randomness of possible outcomes) against the relevant data it parses, for advanced heuristic detection
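For those curious about the entropy part, here is a minimal sketch of a Shannon entropy calculation in Python. This shows the general idea (not Enigma's actual code): algorithmically generated malware domains tend to look more random than human-chosen ones, so high entropy is a useful, if noisy, signal:

    # Sketch only: the example domains are made up for illustration.
    import math
    from collections import Counter

    def shannon_entropy(s: str) -> float:
        """Entropy in bits per character: higher means more random-looking."""
        counts = Counter(s)
        n = len(s)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    for domain in ["google", "x3fz9qkw7bnd2vjp"]:
        print(domain, round(shannon_entropy(domain), 2))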

Do you know of any other interesting open source intelligence utility?

Prioritizing alerts using automated open source intelligence

After a very busy 2011, I'm starting 2012 with a new year's resolution: "To write posts more often". And here is the first one...

Lately I've been working on how to enhance the data using open source intelligence information. And I'm amazed at how much value you can get from it.

The idea is to use reputation information from public sources and correlate it with your internal events in order to prioritize alerts. For example, with IDS/IPS alerts, you can correlate the external IPs in the IDS signatures against a list of known malware IPs and increase the priority of the alert if you get a match. Of course you can extend this to domain names, URLs, etc., and also to different log sources such as firewalls, proxies and so on.
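As a rough sketch of that correlation in Python, with made-up alerts and a made-up blocklist (field names and priority values are illustrative only):

    # Sketch only: alerts, blocklist and priority bump are assumptions.
    MALWARE_IPS = {"203.0.113.42", "198.51.100.7"}  # from open source intel feeds

    alerts = [
        {"signature": "ET TROJAN check-in", "external_ip": "203.0.113.42", "priority": 3},
        {"signature": "SCAN nmap probe",    "external_ip": "192.0.2.77",   "priority": 3},
    ]

    for alert in alerts:
        if alert["external_ip"] in MALWARE_IPS:
            alert["priority"] += 5  # bump alerts matching known malware IPs
            print("ESCALATED:", alert["signature"], alert["external_ip"], alert["priority"])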