Friday, September 7, 2007

External Models

Internal models seem daunting, even if an OS and its applications are well tested and even if you have a programmatic way to explore a system's state space accurately. Documentation is another issue: most software does not come with a state machine description, but it does come with bugs! How do the bugs affect the state space?

By contrast, external models attempt to paint a picture of (or extract a set of rules that govern) normal behavior by observing input and output statistics over time. What is normal? That's a difficult question, worthy of PhD research. Suppose that I look at the number of incoming HTTP requests versus disk reads on a web server during a given time period. I might notice a strong positive correlation during this period: as the requests increase, so does the number of reads. If I am quick to define this relationship as normal, I would be ignoring several important facts:
  • correlation is not causality; while it may be true that incoming requests are the primary driver of disk reads, they are not likely to be the only driver. There is also the classic statistical caveat that a third variable, or any number of other known or hidden drivers, may be influencing both variables.
  • my sample is too small; it could be that my observations fall during a period when there is a strong correlation, but further sampling would reveal a much weaker correlation, or a cyclic correlation.
  • caching kicks in; depending on the nature of the requests during the observation period, caching may cause disk reads to drop precipitously after a sufficient time.
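The sampling and caching caveats can be illustrated with a small sketch. It computes the Pearson correlation between requests and disk reads separately for each observation window. The sample numbers are made up for illustration, and the function names (pearson, windowed_correlation) are hypothetical helpers, not part of any monitoring tool.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant series; report 0 by convention.
        return 0.0
    return cov / sqrt(var_x * var_y)


def windowed_correlation(xs, ys, window):
    """Correlation within each consecutive, non-overlapping window."""
    return [
        pearson(xs[i:i + window], ys[i:i + window])
        for i in range(0, len(xs) - window + 1, window)
    ]


# Made-up samples, one per minute (assumption: not real measurements).
requests = [10, 20, 30, 40, 50, 60, 50, 40, 30, 20, 10, 20]
# First six minutes: reads track requests. After that, a warm cache
# keeps reads nearly flat no matter how many requests arrive.
reads = [21, 39, 62, 80, 101, 119, 12, 11, 12, 10, 11, 12]

correlations = windowed_correlation(requests, reads, window=6)
for index, r in enumerate(correlations):
    print(f"window {index}: r = {r:.3f}")
```

With this data the first window shows a correlation very close to 1, while the second is much weaker. An observer who saw only the first six minutes would have declared the wrong "norm".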
There are of course ways to deal with these pesky facts, but the point is that we need to be quite careful when establishing "norms" for the behavior of a particular system. The ideal behavioral profile would consist of a set of multivariate functions whose input variables are independent of each other and uniquely determine each output value. The multidimensional surface that is determined by these functions would be the model. A point that does not lie on or near (by some as yet undefined measure) this surface represents abnormal behavior.
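As a rough sketch of the idea, assume for a moment that such a function is already known. The one below (expected_reads, which charges one disk read per cache miss) is purely hypothetical, as is the choice of relative deviation as the distance measure and the 20% tolerance; the real measure is still undefined.

```python
def expected_reads(requests, cache_hit_ratio):
    """Hypothetical model surface: every cache miss costs one disk read."""
    return requests * (1.0 - cache_hit_ratio)


def is_abnormal(requests, cache_hit_ratio, observed_reads, tolerance=0.2):
    """Return True when the observation lies too far from the surface.

    Distance is measured as deviation relative to the expected value,
    which is just one arbitrary choice of measure.
    """
    expected = expected_reads(requests, cache_hit_ratio)
    scale = max(expected, 1.0)  # avoid dividing by a near-zero expectation
    return abs(observed_reads - expected) / scale > tolerance


# Expected reads here are 100 * (1 - 0.5) = 50.
near_surface = is_abnormal(100, 0.5, observed_reads=55)  # 10% off the surface
far_from_surface = is_abnormal(100, 0.5, observed_reads=90)  # 80% off the surface
print(near_surface, far_from_surface)
```

The first observation falls within the tolerance and the second does not. Everything interesting is hidden in the two assumptions: that the function exists, and that the tolerance means something.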

Sadly, there is usually a big difference between an ideal and reality. While inputs and outputs may be correlated in some way, they may not be related by functions, or if they are, the functions may not be easily extracted.
