Information theory introduced the image of system boundaries as implicitly projecting “input categories” upon reality, parsing it into “variables” with associated spaces of possible outcomes. Metaphorically, a system queries Nature, asking it yes/no questions whose answers constitute bits. Nowhere is this more evident than in children, who with unceasing curiosity experiment with their surroundings and never accept an answer without following it up with a progressively more patience-testing “But why?”. Eventually, however, our internal schema of reality becomes rich enough for us to live a functional life without craving answers to everything and exploring every corner in sight. Curiosity-driven exploration becomes a waste of resources. We lose the childhood habit of isolating some aspect of reality by mentally situating it in a space of other conceivable scenarios – of counterfactual could-have-beens, of whys – and we replace it with “whatever”.
This is where science comes in. Science may be seen as an outgrowth of our instinctive curiosity, but it is by no means spontaneous. Humans relied on ad hoc religious explanations for hundreds of thousands of years before Galileo Galilei, in the 16th and 17th centuries, pioneered the use of experiments to discover causes and make better predictions, and the scientific method is still undergoing refinement. But science is more than methodology – through the establishment of research institutions, humans are given resources and economic incentive to sustain their inquisitiveness and keep asking “Why?”.
But posing questions is no trivial task. It takes a brain of Newton’s caliber to watch an apple fall to the ground, wonder why it doesn’t fall upwards or sideways, and from empirical data construct a mathematical model that makes its downward fall necessary and predictable. Historians of science use the term “Fragestellung” to refer to the fact that our adult capacity to mentally place an observation in a space of alternative scenarios, thereby creating uncertainty which we become motivated to resolve, depends critically on the intellectual climate and ideas to which the scientist is exposed. For example, the classical Greeks excelled at careful, quantitative observation, yet they never arrived at Linnaeus’s taxonomic scheme, nor conducted Mendel’s simple pea experiments, which allowed him to infer the existence of genes.
And this is where information theory comes in. Not only have its ideas generated extremely fruitful lines of inquiry, it also brings a fresh perspective on science itself, as well as on the strange loopiness that characterizes the ontological endeavor to access raw reality while simultaneously being embedded in it. The information-theoretical account runs along the following lines, describing how new systems of information processing arise organically out of older ones:
- DNA encodes information about the evolutionary past, in the sense that systematic differences pierce through the noisy cloud of random fluctuations and, via natural selection, leave statistical traces in the gene pool, making life more viable.
- Natural selection hones the nervous system so that systematic features of the environment can be encoded in the organism during its own lifetime (this is called “learning”). Perception is here regarded as the reception of a message.
- An organism’s learning may involve a writing system, allowing it to encode systematic, stable cognitions on physical media like paper and computer documents. This acts as a collective conscious memory, which on Earth is unique to humans. To conduct an experiment is to query Nature, and the acceptance or rejection of an experimental hypothesis constitutes a bit.
- This way, low-entropy regularities in the environment will be translated into low-entropy regularities in neural activity and eventually low-entropy regularities in external representations like text and diagrams (which are then fed back, in a loop, as humans read and integrate symbolic information, expanding our collective knowledge).
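The notion of a “low-entropy regularity” can be made concrete with Shannon’s formula, H = −Σ p·log₂ p. The sketch below (the strings and function name are invented for illustration) shows that a repetitive signal carries fewer bits per symbol than a maximally varied one:

```python
import math
from collections import Counter

def entropy_bits(text):
    """Shannon entropy of the symbol frequencies, in bits per symbol."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy_bits("abababababababab"))  # 1.0 – a regular, two-symbol signal
print(entropy_bits("abcdefghijklmnop"))  # 4.0 – sixteen distinct symbols, no redundancy
```

The regular string needs only one bit per symbol; the patternless one needs four. Regularity is what makes a signal compressible – and learnable.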
Information theory also taught us that a message can always be encoded in a way that makes distortions negligible, by building redundancy into the transmission code (for example, we may append extra letters along with instructions on how to recover the original message from the redundancies). This raises the interesting question of the extent to which natural selection has polished neural encoding to optimize its fidelity and maximize its mutual information with Nature. The discovery that humans are prone to cognitive biases and make poor intuitive statisticians, due to peculiarities of System 1 and System 2, suggests that natural selection has not put a high priority on making humans sophisticated predictors. So in what way exactly does the scientific method supplement our flawed intuitions? What is the secret behind its incredible success? The keys lie in active probing, falsifiability, and the principle of Occam’s razor. Let us explain these in terms of information.
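Redundant coding can be illustrated with the simplest error-correcting scheme, a triple-repetition code – a toy sketch, not how practical channel codes actually work:

```python
def encode(bits):
    # Triple each bit: redundancy that lets us outvote single-bit noise.
    return [b for b in bits for _ in range(3)]

def decode(received):
    # Majority vote over each group of three transmitted bits.
    return [1 if sum(received[i:i + 3]) >= 2 else 0
            for i in range(0, len(received), 3)]

message = [1, 0, 1, 1]
sent = encode(message)
sent[4] ^= 1                    # noise flips one transmitted bit
print(decode(sent) == message)  # True: the distortion is voted away
```

Any single flipped bit per group is corrected, at the cost of tripling the message length – redundancy buys fidelity.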
Scientific experiments are active interventions in causal chains. A scientist does not passively await data to investigate hypotheses – he brings about the conditions in which the data are most precise and least noise-ridden. In Bayesian predictive coding, this implies that the prediction-error signal will be minimized, since it is weighted by precision. Because precise data have greater influence in the updating of hypotheses, agency reduces uncertainty far more efficiently than passive perceptual inference. Note how this makes science an extension of action in general, from the eyes’ micro-saccades, to hand-reaching and everyday exploration. Science is Bayesian in a way that our brains’ System 1 and 2 are not. According to Richard Feynman, it is “a way of trying not to fool yourself” – of making optimal guesses while protecting our pooled knowledge from human fallibilities like confirmation bias and bandwagon effects.
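The precision-weighting idea can be sketched with a conjugate Gaussian belief update, where the posterior mean is a precision-weighted average of prior and observation (a minimal illustration; the function name and numbers are made up for the example):

```python
def update(prior_mean, prior_prec, obs, obs_prec):
    """Conjugate Gaussian update: precisions add, and the posterior mean
    is a precision-weighted average of prior mean and observation."""
    post_prec = prior_prec + obs_prec
    post_mean = (prior_prec * prior_mean + obs_prec * obs) / post_prec
    return post_mean, post_prec

# Same observation, different precision – an intervention yields precise data.
noisy_mean, _   = update(0.0, 1.0, obs=10.0, obs_prec=0.1)
precise_mean, _ = update(0.0, 1.0, obs=10.0, obs_prec=10.0)
print(noisy_mean)    # ≈0.91: noisy data barely shift the belief
print(precise_mean)  # ≈9.09: precise data dominate the update
```

The same datum moves the belief ten times further when it is trusted ten times more – which is why engineering precise conditions is so efficient.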
The notion of “falsifiability” means that a theory, in order to be meaningful, must provide test criteria by which it can be rejected. Science, according to this logic, cannot tell you what reality is like, but it can tell you what it is not like. It describes the world in terms of negatives. Recall that the more surprising an event is – that is, the bigger our misprediction and the greater our confidence in our theory – the more informative it is, because it forces a greater revision of our beliefs. For a well-established theory to pass a test and be experimentally confirmed is a low-information event, but prediction failures carry high information and may cause great paradigmatic upheaval. By regarding science as a system that randomly generates conjectures and then eliminates them by refutation, it becomes analogous to a species undergoing evolution, and can be seen as a form of information processing that results in the editing of our universe-compression program, just as the environment is compressed into DNA.
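The claim that surprise equals information is exactly Shannon’s surprisal, −log₂ p. A quick sketch (the probabilities are illustrative):

```python
import math

def surprisal_bits(p):
    """Shannon information content of an event with probability p."""
    return -math.log2(p)

print(surprisal_bits(0.99))  # ≈0.014 bits: a confirmed prediction teaches almost nothing
print(surprisal_bits(0.01))  # ≈6.6 bits: a refutation carries far more information
```

A result we assigned 99% probability barely registers; the 1% upset is worth hundreds of times more bits – hence the paradigm-shaking power of failed predictions.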
“Occam’s razor” is the philosophical principle of parsimony according to which, all else being equal, a scientist faced with competing explanations for a phenomenon should select the one with the fewest ontological assumptions about what entities exist. The reason why a short explanation is intrinsically more plausible can be understood with reference to the concept of “algorithmic probability”. Much as monkeys typing randomly in a programming environment are more likely to produce a functional computer program if it is short, a stable dynamical organization is more likely to emerge randomly if it is simple. Fractals abound in Nature precisely because they are so easy to generate: a simple mechanism iterated many times.
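The monkey intuition can be quantified: under random typing over a binary alphabet, a specific program of L bits appears with probability 2⁻ᴸ, so each bit saved doubles the odds. A tiny sketch (the function name is invented for illustration):

```python
def chance_of_program(length_bits):
    """Probability that `length_bits` fair coin flips spell out one
    specific program of exactly that length."""
    return 2.0 ** -length_bits

# A program half as long is exponentially more likely to arise by chance:
print(chance_of_program(10) / chance_of_program(20))  # 1024.0
```

This exponential weighting of brevity is the core of algorithmic probability, and it is one formal reading of why parsimony is rational rather than merely aesthetic.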
Moreover, short explanations are also more useful. Statisticians must be careful not to use too many variables in their models – a pitfall known as “overfitting” – since extra parameters risk absorbing random noise, making the model a worse predictor. The idea that reality is organized into nested systems and stable regularities, which make it predictable, is the same as saying that reality contains informational redundancies and is algorithmically compressible. A scientific model can be viewed as information compression: a physical equation subsumes a multitude of disparate observations. The deeper a theory – the more phenomena we can derive from it, by either prediction or postdiction – the more powerful it is, so when choosing between competing theories, we naturally favor the one that compresses the most, for it is the lightest to carry.
Information theory also has interesting things to say about randomness. Because any theory, like a program, is finite, it can produce only a limited range of predictions. The outcomes that are unaccounted for will in effect be random and incompressible. These unpredicted outcomes are also our only source of new information. Seen in this light, our understanding is complete only when the universe can no longer surprise us. However, we can merely be confident in a putative “Theory of Everything” – we cannot know that it is correct unless we know that it is irrefutable. And because a theory must be falsifiable in order to be meaningful, we can never know whether a black swan – a high-information scientific discovery – awaits us. Because there will always be new experiments to perform, randomness is in this way inherent in the universe.
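The link between regularity and compressibility can be checked empirically with an off-the-shelf compressor (a rough sketch using Python’s standard `zlib`; the byte strings are illustrative):

```python
import os
import zlib

regular = b"ab" * 500        # 1000 bytes with an obvious regularity
random_ = os.urandom(1000)   # 1000 bytes of noise

print(len(zlib.compress(regular)) < 50)   # True: the pattern compresses away
print(len(zlib.compress(random_)) > 900)  # True: noise is essentially incompressible
```

The patterned kilobyte shrinks to a few dozen bytes; the random kilobyte cannot be shortened at all. What a theory cannot compress is, for all practical purposes, random.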
Finally, information theory places limits on what science can in principle accomplish. A well-known theorem by Alan Turing, the “halting problem”, states that no general algorithm can decide whether an arbitrary program will terminate. A program’s own future behavior is intrinsically unpredictable – it is said to be “uncomputable”. A debugger that checks whether a program will crash before we run it therefore cannot exist in general. More broadly, when a logical system is expressive enough to support self-reference, there are truths it can state but never prove – in mathematics, this is known as Gödel’s incompleteness theorem. Laplace’s demon cannot exist, because to simulate the future it would need as much processing power as the Universe itself. Seen in a different way, we cannot predict our own behavior, because the very act of prediction interferes with how we would otherwise behave. We cannot know where our train of thought will lead us, and we cannot predict the precise course of the universe, because we are integral to its evolution.
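Turing’s diagonal argument can itself be sketched as code: from any candidate halting predictor, build a program that does the opposite of whatever the predictor claims about it. The predictor and function names below are invented for the illustration:

```python
def spite(halts_predictor):
    """Build a program that defies the predictor's verdict about itself."""
    def g():
        if halts_predictor(g):
            while True:    # predictor said "halts", so loop forever
                pass
        return "halted"    # predictor said "loops", so halt immediately
    return g

# Any concrete predictor is refuted by its own diagonal program.
# Here, one that pessimistically claims nothing halts:
def claims_nothing_halts(program):
    return False

g = spite(claims_nothing_halts)
print(claims_nothing_halts(g))  # False: predicted to run forever...
print(g())                      # "halted": ...yet it halts. Prediction refuted.
```

Whatever the predictor answers, the diagonal program does the reverse, so no predictor can be right about every program – self-reference defeats total foresight, in machines as in minds.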