Hi again,
In September I had the opportunity to participate in the ISF (Norwegian Information Security Forum) autumn conference with my talk "A Darwinian Theory of Logs". It was a great conference, well organized and with several interesting talks and networking opportunities.
Obviously, just seeing the slides won't be the same as being there in person. The original slides had very little text and were used only as a supporting mechanism to transmit the content of the talk. But I'd still like to share the presentation with you here on our blog.
Click here to download the presentation.
Comments are welcome :)
Friday, October 12, 2012
Thursday, April 12, 2012
Raw data is good, Normalized is better, Categorized is awesome!
Continuing my last post on why using normalized data is better than just using raw data and how it accelerates the analysis process, resulting in a faster response and therefore money savings, I'd like to focus now on the data mining aspect.
Remember the scenario: you are the IT person responsible for your company's custom-developed transaction application, and your boss asks you to send him a report with all the activity related to the account number 1234567890.
Of course you can give all the raw information to your boss, but I'm not sure he is going to like the idea of receiving a 20-page report with all the entries where that account has been involved...
Having the data in raw format is good, we need it, but data mining on it is very difficult.
I'd prefer to give him data that is easier to handle, maybe an Excel file where the information is easy to visualize, filter, group, etc. Maybe create some graphs...
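To make this concrete, here is a minimal Python sketch of how such a report could be built once the events are normalized. The field names and sample values are invented for this example; the point is simply that filtering, exporting to an Excel-friendly file and summarizing become trivial once the data is split into fields.

```python
import csv
from collections import Counter

# Hypothetical normalized events (field:value pairs) as stored in the log
# management system; field names and values are invented for illustration
events = [
    {'timestamp': '2012-04-10 09:12:01', 'action': 'transfer', 'account': '1234567890',
     'amount': '250.00', 'dest_account': '9876543210', 'status': 'OK'},
    {'timestamp': '2012-04-10 11:47:33', 'action': 'balance',  'account': '1234567890',
     'amount': '',       'dest_account': '',           'status': 'OK'},
    {'timestamp': '2012-04-11 16:05:12', 'action': 'transfer', 'account': '5555555555',
     'amount': '90.00',  'dest_account': '1234567890', 'status': 'FAILED'},
]

# Keep only the activity involving the account the boss asked about
account = '1234567890'
related = [e for e in events if account in (e['account'], e['dest_account'])]

# Write a CSV that opens directly in Excel, ready to filter, group and graph
with open('account_%s_activity.csv' % account, 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=related[0].keys())
    writer.writeheader()
    writer.writerows(related)

# A quick summary is also easy once the data is normalized
print(Counter(e['action'] for e in related))
```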
Sunday, March 11, 2012
Raw data is good, Normalized is better!
This week my company arranged a seminar on log management and I had the opportunity to give a demo of one of our products.
My goal was to show why using normalized data is better than just using raw data and how this accelerates the analysis process, resulting in a faster response and therefore money savings.
When I talk about normalized data, I mean that the information contained in an event is split into different "field:value" pairs. To say it another way, we understand the content of that event.
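As a rough illustration, here is a minimal Python sketch of that idea. The log line and field names are made up for this example; the point is just that the raw text becomes a set of "field:value" pairs we can actually work with.

```python
import re

# Hypothetical raw line from the transaction application (format invented for illustration)
raw = '2012-03-11 14:32:07 srv01 TXAPP: transfer account=1234567890 amount=250.00 dest=9876543210 status=OK'

# Normalizing means splitting the event into "field:value" pairs we understand
pattern = re.compile(
    r'(?P<timestamp>\S+ \S+) (?P<host>\S+) (?P<app>\S+): (?P<action>\S+) '
    r'account=(?P<account>\d+) amount=(?P<amount>[\d.]+) '
    r'dest=(?P<dest_account>\d+) status=(?P<status>\w+)'
)

match = pattern.match(raw)
if match:
    normalized = match.groupdict()
    for field, value in normalized.items():
        print(f'{field}: {value}')
```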
Imagine a scenario where you are the IT person responsible for your company's custom-developed transaction application. This application manages all the financial transactions between the company's different locations and the central server. You, aware as you are of the importance of having the logs properly secured, have the logs of this application sent into your company's log management system.
As you don't have any specific use for these logs beyond possible future troubleshooting, and there aren't any regulatory requirements which specify anything else, you decided that having the logs directly in raw format is enough. When I say "raw" format, I mean storing the logs just as they are created by the application, in other words, without understanding their content. And this may look like a completely valid decision in some cases.
But imagine...
Thursday, January 5, 2012
Automated open source intelligence utilities
In my last post I talked about how you can use open source intelligence information to prioritize your alerts. And I think it could be interesting to make a short comparison between the two utilities I mentioned: ArcOSI and EnigmaIndicators.
As said before, the general idea in both utilities is the same:
- Scrape different open source intelligence sites for known malware indicators such as IPs, domains, URLs, MD5 file hashes, email addresses, etc.
- For each entry, create a CEF event with the source and type of intelligence and the threat information.
- Send it via Syslog to a defined destination.
And in both you can define your own sources of information and whitelist specific entries.
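To give an idea of what this flow looks like, here is a minimal Python sketch of the same three steps. The feed URL, CEF fields and syslog destination are invented for illustration; this is not the actual code of either utility.

```python
import socket
import urllib.request

# Hypothetical reputation feed (one malicious IP per line); in practice you
# would define your own sources, as both utilities allow
FEED_URL = 'http://example.com/malware-ips.txt'
SYSLOG_DEST = ('192.168.1.50', 514)   # your log management system / SIEM
WHITELIST = {'8.8.8.8'}               # entries you never want reported

def fetch_indicators(url):
    """Scrape a simple text feed of indicators (IPs, domains, ...)."""
    with urllib.request.urlopen(url) as resp:
        lines = resp.read().decode('utf-8', errors='ignore').splitlines()
    return [line.strip() for line in lines if line.strip() and not line.startswith('#')]

def to_cef(indicator, source, indicator_type):
    """Build a CEF event carrying the source, type and threat information."""
    return ('CEF:0|OSINT|feed-collector|1.0|100|Malicious %s|5|'
            'src=%s cs1=%s cs1Label=FeedSource cs2=%s cs2Label=IndicatorType'
            % (indicator_type, indicator, source, indicator_type))

def send_syslog(message, dest):
    """Send the event via UDP syslog to the defined destination."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(('<134>%s' % message).encode('utf-8'), dest)
    sock.close()

for ip in fetch_indicators(FEED_URL):
    if ip not in WHITELIST:
        send_syslog(to_cef(ip, FEED_URL, 'IP'), SYSLOG_DEST)
```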
The main differences I can see are shown in the table below:
| | ArcOSI (http://code.google.com/p/arcosi/) | EnigmaIndicators (http://enigmaindicators.codeplex.com/) |
|---|---|---|
| Scripting language used | Python | Bash (dependencies: bash, cut, grep, zgrep, sed, awk, curl, wget, sort, perl and *nix /dev/udp and/or tcp sockets) |
| Types / number of reputation sources | | |
| Entropy calculation | N/A | Enigma calculates entropy (a measure of the randomness of possible outcomes) against the relevant data it parses, for advanced heuristics detection |
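For those unfamiliar with the entropy idea mentioned in the table, here is a small Python sketch of a Shannon entropy calculation over a string. This only illustrates the concept, not EnigmaIndicators' actual implementation; random-looking values (for example algorithmically generated domains) tend to score higher than ordinary ones.

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy in bits per character: higher means more random-looking."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(shannon_entropy('google.com'))        # relatively low
print(shannon_entropy('xj4k9q2zv7w1.com'))  # relatively high
```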
Do you know of any other interesting open source intelligence utility?
Labels:
ArcOSI,
enigmaIndicators,
Open Source Intelligence,
Reputation
Prioritizing alerts using automated open source intelligence
After a very busy 2011, I'm starting 2012 with a new year's resolution: "To write posts more often". And here is the first one...
Lately I've been working on how to enhance data using open source intelligence information. And I'm amazed at how much value you can get from it.
The idea is to use reputation information from public sources and correlate it with your internal events in order to prioritize alerts. For example, with IDS/IPS alerts, you can correlate the external IPs in the IDS signatures against a list of known malware IPs and increase the priority of the alert if you get a match. Of course you can extend this to domain names, URLs, etc., and also to different log sources such as firewalls, proxies and so on.
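As a simple illustration, here is a minimal Python sketch of that correlation. The alerts, field names and IP list are invented for this example; in practice the match would happen inside your SIEM or log management correlation rules.

```python
# Hypothetical reputation list of known malware IPs and a couple of IDS alerts;
# field names and values are invented for illustration
known_malware_ips = {'203.0.113.7', '198.51.100.23'}

ids_alerts = [
    {'signature': 'ET TROJAN Callback', 'src_ip': '10.0.0.15', 'dst_ip': '203.0.113.7', 'priority': 3},
    {'signature': 'SCAN Port Sweep',    'src_ip': '10.0.0.22', 'dst_ip': '192.0.2.80',  'priority': 3},
]

for alert in ids_alerts:
    # Correlate the external IPs against the reputation list and raise the
    # priority of the alert on a match
    if alert['dst_ip'] in known_malware_ips or alert['src_ip'] in known_malware_ips:
        alert['priority'] = 1   # highest priority, investigate first
    print(alert['priority'], alert['signature'], alert['dst_ip'])
```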
Labels:
ArcOSI,
arcsight,
enigmaIndicators,
envision,
Open Source Intelligence,
q1,
Reputation