Occam's Razor and Model Complexity
When using predictive analytics to develop a model, it is important to understand the principles of model complexity. Occam's Razor is a concept that is frequently stated but not always fully understood. The basic idea is that "All else being equal, simpler models should be favored over more complex ones." It is a concept we both embrace and approach with caution so that it is not misused.
First, let's flesh out the concept of Occam's Razor beyond the simple aphorism given above and see how it applies to predictive analytics.
Suppose I flip a coin ten times, and I get a run that goes “HHTTTHHTTT”. After observing the coin flips I assess that there are two possible models for the behavior of the coin:
(A) The coin is fair and has a 50/50 chance of getting either heads or tails on each flip. The observed run was just one of 1024 possible results of the ten coin flips.
(B) The coin flips are deterministic and will land in a repeating pattern of “HHTTT” which perfectly fits with the results of our sample of coin flips.
Without further experimentation I have no certain way of knowing which model is actually true. If I were to flip the coin five more times and got anything other than "HHTTT", all confidence in (B) would be gone; the same cannot be said for (A). This is because (B) is a much more complex model than (A). In other words, it would take much more evidence to be confident in (B) over (A).
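To make the comparison concrete, here is a minimal Python sketch of how the two models could be scored against the flips. The 1% prior placed on model (B) is purely an illustrative assumption standing in for the extra evidence a complex model has to earn; nothing here comes from an actual CAN project.

```python
# Compare two candidate models of the coin against an observed run of flips.
# Model A: a fair coin, so every specific sequence of n flips has probability (1/2)**n.
# Model B: a deterministic repeating pattern "HHTTT", so a sequence has probability 1
#          if it matches the pattern and 0 otherwise.

PATTERN = "HHTTT"

def likelihood_fair(flips):
    """Probability of this exact sequence under a fair coin."""
    return 0.5 ** len(flips)

def likelihood_pattern(flips):
    """Probability of this exact sequence under the deterministic pattern."""
    expected = (PATTERN * (len(flips) // len(PATTERN) + 1))[:len(flips)]
    return 1.0 if flips == expected else 0.0

def posterior_odds_b_over_a(flips, prior_b=0.01, prior_a=0.99):
    """Posterior odds of B over A. The priors are illustrative guesses that
    encode how much extra evidence the more complex model has to earn."""
    return (likelihood_pattern(flips) * prior_b) / (likelihood_fair(flips) * prior_a)

observed = "HHTTTHHTTT"
print(posterior_odds_b_over_a(observed))            # pattern fits: odds of roughly 10 favor B
print(posterior_odds_b_over_a(observed + "HHTTT"))  # five more matching flips: the odds grow
print(posterior_odds_b_over_a(observed + "HTHTH"))  # a run that breaks the pattern: odds for B drop to 0
```

With only the original ten flips the odds give (B) a modest edge under these made-up priors; matching flips keep strengthening it, while a single run that breaks the pattern wipes out (B) entirely, which is exactly the asymmetry described above.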
Keeping this concept in mind is important when developing predictive models. With the huge volume of information and the massive data sets that CAN utilizes, it can be tempting to include as many parameters as possible. Not paying attention to model complexity, and to the evidence required to support the claims of observed relationships, can lead to false assumptions and to predictions that fall outside of desired bounds.
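As a toy illustration of that failure mode (the data below are synthetic and the model choices arbitrary, not drawn from any CAN engagement), compare a straight-line fit with a nine-degree polynomial on the same noisy points:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a simple linear trend plus noise, standing in for a
# process we would like to predict.
x_train = np.linspace(0, 1, 10)
y_train = 2.0 * x_train + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(0, 1, 50)
y_test = 2.0 * x_test + rng.normal(scale=0.2, size=x_test.size)

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # fit a polynomial of this degree
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # The degree-9 fit hugs the training points, but it typically predicts
    # new points worse than the simple straight line does.
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The extra parameters buy a better fit to what has already been seen, not better predictions, which is exactly the trap the razor warns against.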
Another problem arises when Occam's Razor gets misapplied. A common mistake is to interpret it as meaning that models should be as simple as possible, when the thought process should be to keep models as simple as they need to be to explain what is observed.
One of the more famous examples of this is in the history of the understanding of the motion of the planets. In an attempt to explain the retrograde motion of the planets, Ptolemy devised a complicated geocentric model in which the known planets, and the sun as well, orbited the earth while also moving in their own smaller circular paths.
Alternatively, Copernicus' heliocentric model, which placed the sun at the center of the solar system, was a much simpler model, and it fit the observed motion of the planets: because the planets orbit the sun at different rates, the apparent retrograde motion falls out naturally.
As revolutionary as the Copernican idea was, it does fit Occam's premise. At the time, Ptolemy's model was a better fit for what had been observed. But, like the more complicated coin flip model, the geocentric model fit past observations yet utterly failed once later observations stopped fitting.
But simpler models alone do not mean that the truth is revealed. Copernicus' model was not perfect. Later, when Johannes Kepler discovered that the planets move in elliptical orbits, not the circular ones supposed by Copernicus, it made the model much more accurate at predicting the movement of the planets. But ellipses are more complex than circles, so what does this mean for Occam?
Again, utilizing Occam's Razor is not a search for the simplest model, but for the simplest model that is required. While Kepler added complexity to the model, it was complexity supported by decades of information gathering on the inconsistencies in the Copernican model. More complicated models require more evidence to defend, which is not the same as saying they are indefensible.
Let's go back to the coin flip example and change the scenario slightly. Say we are attending a magic show, and the magician claims that he can flip a coin and make it follow the pattern "HHTTT" (i.e., model (B)). Does this change how we weigh the two models? Of course it does. We expect there to be a trick that will make the magician's claim come true.
Like Kepler's model, there is more complexity in the model that predicts a distinct pattern of coin flips, but the presence of new information (the magician's claim about the coin, or the detailed observations of planetary positions for Kepler) allows us to have more confidence in the more complex model.
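In the coin flip sketch above, that shift shows up as a shift in the prior placed on model (B) before any flips are seen. The numbers below are again purely illustrative:

```python
# Magic-show version of the earlier comparison: the magician's claim makes
# the pattern model far more plausible up front (these priors are illustrative).
prior_b, prior_a = 0.9, 0.1        # we now expect a trick
likelihood_b = 1.0                 # "HHTTTHHTTT" matches the claimed pattern exactly
likelihood_a = 0.5 ** 10           # the same sequence under a fair coin
print(likelihood_b * prior_b / (likelihood_a * prior_a))  # odds now swing heavily toward B
```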
At Contemporary Analysis we solve hard problems, and hard problems often necessitate complex models; complex models are not in themselves bad. Throwing out informative parameters just for the sake of simplicity would strip away the predictive value of our models. It's about ensuring that the factors in our complex models are both valuable and grounded in sound theory. This is why we don't just jump into building models as soon as we have data. We spend a lot of time working with customers to understand the nature of the environment we are modeling. Simple or complex, if our theory is misinformed there is no way our models can accurately reflect the real world.