PhDidact: 2007

Thursday, September 13, 2007

Machine Learning Approaches to Classifying Behavior

Nothing here yet...

The Shapes of Normality

Discussion about our submitted LISA '07 paper.

Using a Lattice to Predict Performance

Sets/Subsets placeholder.

Thresholds

These are limited in their ability to signal abnormal conditions.

Friday, September 7, 2007

Internal models seem daunting, even if an OS and its applications are well tested and even if you have a programmatic way to accurately explore a system's state space. Documentation is another issue: most software does not come with a state machine description, but it does come with bugs! How do the bugs affect the state space?

By contrast, external models attempt to paint a picture of (or extract a set of rules that govern) normal behavior by observing input and output statistics over time. What is normal? That's a difficult question, worthy of PhD research. Suppose that I look at the number of incoming HTTP requests versus disk reads on a web server during a given time period. I might notice a strong positive correlation during this period: as the requests increase, so does the number of reads. If I am quick to define this relationship as normal, I would be ignoring several important facts:

correlation is not causality; while it may be true that incoming requests are the primary driver of disk reads, they are not likely to be the only driver. There is also the classic statistical caveat that a third variable, or any number of other known or hidden drivers, may be influencing both variables.

my sample is too small; it could be that my observations fall during a period when there is a strong correlation, but further sampling would reveal a much weaker correlation, or a cyclic correlation.

caching kicks in; depending on the nature of the requests during the observation period, caching may cause disk reads to drop precipitously after a sufficient time.
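To make the sample-size and caching caveats concrete, here is a minimal sketch. The numbers are invented for illustration (they are not measurements from any real server), and the split into a "cold cache" and a "warm cache" window is an assumption. It computes the Pearson correlation between requests and disk reads over an early window, a late window, and the whole series.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-minute samples: requests climb steadily the whole time.
requests = [100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300, 320]

# Hypothetical disk reads: proportional to requests while the cache is cold,
# then low and drifting downward once the cache has warmed up.
reads = [200, 240, 280, 320, 360, 400, 60, 55, 50, 52, 48, 45]

early = pearson(requests[:6], reads[:6])   # cold-cache window only
late = pearson(requests[6:], reads[6:])    # warm-cache window only
overall = pearson(requests, reads)         # the full observation period

print(f"early window:  {early:+.3f}")
print(f"late window:   {late:+.3f}")
print(f"whole period:  {overall:+.3f}")
```

An observer who only saw the early window would call the relationship strongly positive; the late window and the full period tell a different story, which is exactly why a short sample is a shaky basis for a "norm".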

There are, of course, ways to deal with these pesky facts, but the point is that we need to be quite careful when establishing "norms" for the behavior of a particular system. The ideal behavioral profile would consist of a set of multivariate functions whose input variables are independent of each other and uniquely determine each output value. The multidimensional surface that is determined by these functions would be the model. A point that does not lie on or near (by some as-yet-undefined measure) this surface represents abnormal behavior.
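As a rough sketch of the "surface" idea, suppose (purely as an assumption) that disk reads were determined by two inputs treated as independent: the request count and the cache hit ratio. The function `expected_disk_reads`, the `reads_per_miss` constant, and the relative-deviation measure with its `tolerance` are all hypothetical stand-ins; the text deliberately leaves the distance measure undefined.

```python
def expected_disk_reads(requests, cache_hit_ratio, reads_per_miss=2.0):
    """Hypothetical model surface: reads as a function of two inputs."""
    return requests * (1.0 - cache_hit_ratio) * reads_per_miss

def is_abnormal(requests, cache_hit_ratio, observed_reads, tolerance=0.25):
    """Flag a point whose relative distance from the surface exceeds tolerance.

    The relative deviation used here is only one possible choice of measure.
    """
    expected = expected_disk_reads(requests, cache_hit_ratio)
    # Guard against division by a near-zero expectation.
    deviation = abs(observed_reads - expected) / max(expected, 1.0)
    return deviation > tolerance

# A point close to the surface, and one far from it.
print(is_abnormal(200, 0.5, 210))
print(is_abnormal(200, 0.5, 600))
```

The hard part in practice is not this check but finding functions that actually fit the system.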

Sadly, there is usually a big difference between an ideal and reality. While inputs and outputs may be correlated in some way, they may not be related by functions, or if they are, the functions may not be easily extracted.

Wednesday, September 5, 2007

Internal Models

There are only a few ways to describe and/or predict the behavior of computer systems and networks. One can attempt to produce an internal model. An assumption about internal models is that they describe, as precisely and completely as possible, the internal state machine that governs behavior. All computer system state machines are finite, although they may be impractically large and/or complex for human understanding. Besides their size and complexity, which tend to hide dependencies, if we look at them as implementing hidden Markov processes, it is nearly impossible to determine the probabilities of all but a small subset of potential state transitions. To see why these state machines are so complex, consider the typical *NIX-based operating system. Encoded in the kernel is the equivalent of thousands, if not millions, of conditional statements. Think of how many states are represented by the various combinations of true and false conditionals. It's just a bit staggering! Now assume that you have a good handle on even the types of states available: if you know what state the machine is in, you should be able to determine, with reasonable confidence, the class of states to which the system will transition next. If the system doesn't make such a transition, then there are two possibilities: either your model is incorrect or not accurate enough, or your system is broken.
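Two small illustrations of the points above, as a sketch only. First, n independent true/false conditionals admit up to 2^n combinations, which is where the staggering numbers come from. Second, a toy internal model over classes of states: the class names and the `ALLOWED` transition table are invented for illustration and do not describe any real kernel.

```python
def upper_bound_states(num_conditionals):
    """Upper bound on distinct combinations of independent boolean conditionals."""
    return 2 ** num_conditionals

# Hypothetical classes of states and the classes each may transition to next.
ALLOWED = {
    "idle": {"idle", "serving"},
    "serving": {"serving", "idle", "swapping"},
    "swapping": {"swapping", "serving"},
}

def is_expected_transition(current, observed_next):
    """True if the toy model predicts that observed_next can follow current."""
    return observed_next in ALLOWED.get(current, set())

print(upper_bound_states(10))
print(len(str(upper_bound_states(1000))), "decimal digits for 1000 conditionals")

# An unexpected transition means the model is wrong or the system is broken;
# the check by itself cannot say which.
print(is_expected_transition("idle", "serving"))
print(is_expected_transition("idle", "swapping"))
```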

Tuesday, September 4, 2007

I Am a Didact

di · dact [dahy-dakt]
-noun
a didactic person; one overinclined to instruct others.

I am a didact, but I figure that first I need to complete my own instruction. I'm working on my PhD in Computer Science and trying to finish by the end of next summer. I hope this blog will help me in my final push to finish the research and write my dissertation. Many of you will find this blog incomprehensible, which is OK, because I'm writing it for me. I have a hard time writing coherently for two reasons: one, because it is very difficult for me to write in a train-of-thought style and edit afterward; two, because I have collected a tremendous amount of information toward this endeavor and I don't see where to start or how to organize everything. I figure that if I write a little every day, I can collect my thoughts more consistently and get better at writing.