V. Basic Information Theory

Time to collision can be estimated simply from how fast the image of an object is expanding. According to Gibson's ecological approach, organisms are tuned to such invariant relations.
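
A minimal sketch of that estimate (the numbers and sampling interval are arbitrary, and the formula assumes a roughly constant approach speed): the time to collision is simply the current angular size of the image divided by its rate of expansion.

```python
# Sketch: estimating time to collision from image expansion alone.
# Assumes the object approaches at a roughly constant speed and that we can
# measure its angular size (in radians) at two closely spaced moments.

def time_to_collision(theta_prev: float, theta_now: float, dt: float) -> float:
    """Return the estimated time (in seconds) until contact.

    theta_prev, theta_now -- angular size of the object at two samples dt apart.
    The estimate is theta / (d_theta / d_t): no knowledge of the object's true
    size or distance is needed, only how fast its image is expanding.
    """
    expansion_rate = (theta_now - theta_prev) / dt
    if expansion_rate <= 0:
        return float("inf")  # not expanding: no collision imminent
    return theta_now / expansion_rate

# Example: an image growing from 0.010 to 0.011 radians over 0.1 s
# yields an estimate of about 1.1 s to contact.
print(time_to_collision(0.010, 0.011, 0.1))
```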

Most psychologists studying perception and cognition today argue that Gibson’s radio metaphor is flawed because a brain, unlike a radio, identifies a “signal” not directly but in a memory-dependent way: a brain learns over time, via structural changes, what is significant and what isn’t, so the radio’s “tuning” is determined by a history of interactions. However, if a hierarchy-theoretical approach is taken, Gibson is somewhat vindicated: at the levels of neurons and molecules, “perception” becomes more and more direct, since these smaller systems “anticipate” much simpler environmental features in order to change their state. But the greatest virtue of Gibson’s metaphor lies in its recognition that what leaves a lasting, structural trace in a system is other systems. Interaction with non-systems – with dynamics that lack pattern and repetition – will not be reinforced, and will consequently be cancelled out. A system’s predicament can thus be likened to that of detecting a signal through a noisy channel, regardless of whether there is an intelligent producer of that signal.

The visual system’s task of creating a stable internal model of some perceived distant object (the “distal stimulus”), despite the fact that the optical input (the “proximal stimulus”) is never the same twice, can be compared with a telegraph wire-designer’s job of ensuring that interfering disturbances do not admix with a sent message (the analogue of the distal stimulus) to the extent that the received version (the analogue of the proximal stimulus) becomes incomprehensible and therefore meaningless. The task, essentially, is one of distinguishing between possible scenarios, between input categories. If the telegraph communication concerns a weather forecast, the recipient wants to know unambiguously whether it will be rain or sunshine. But just as an object is “signaled” to the perceiving organism via the three dimensions of reflected light, a telegraph message is communicated via Morse code, so preserving the rain/sunshine message is ultimately a matter of preserving much simpler physical distinctions – those between short and long signals across the wire. If noise prevents short and long signals from being distinguished, the original message cannot be reconstructed.
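
To make the predicament concrete, here is a toy simulation (a sketch only, not part of the telegraph example; the repetition-and-majority-vote scheme is just the simplest way to give the receiver something to decode). Each transmitted binary symbol is flipped with some probability; below a certain noise level the rain/sunshine distinction survives, while at coin-flip noise it is gone.

```python
import random

# Toy noisy channel: each transmitted binary symbol is flipped with
# probability flip_prob. A single rain/sunshine distinction is sent as a
# repeated symbol, and the receiver takes a majority vote.

def send_forecast(rain: bool, flip_prob: float, repetitions: int,
                  rng: random.Random) -> bool:
    sent = [int(rain)] * repetitions
    received = [b ^ (rng.random() < flip_prob) for b in sent]
    return sum(received) > repetitions / 2

def success_rate(flip_prob: float, trials: int = 10_000) -> float:
    rng = random.Random(42)
    correct = sum(send_forecast(True, flip_prob, 5, rng) for _ in range(trials))
    return correct / trials

print(success_rate(0.05))  # mild noise: the distinction almost always survives
print(success_rate(0.5))   # symbols flipped at random: recovery is pure chance (~0.5)
```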

We may cast the issue formally in the following way:

  • Any system that specifies a medium so as to indirectly specify the state of another system may be considered a “sender”, or “message source”. The “recipient” treats it as a “random variable” – a cluster of possible outcomes, one of which will occur, though the recipient is unsure of which. The system, therefore, is probabilistic.
  • For now, suppose there are 20 different types of weather and that the messages are equiprobable, in the sense that the recipient has encountered each message equally often, so that all possible messages have a 1/20 probability.
  • Represent each distinct outcome by a number that serves as an identifier. The assignment can be arbitrary: for example, “hail” could be 12. The recipient system knows about this label, and is able to disambiguate between outcomes using it.
  • We could represent such distinctions in a base-10 system, since we find this counting system very natural as a consequence of being born with ten fingers. But disregarding such human-centric quirks, the objectively simplest way is to use the base-2 system, which works on the same principles but with only two numerals: 0 and 1.
  • This way, outcomes involving complex stimuli are reduced to a binary sequence of atomic distinctions – to bits. Two bits suffice to decide among four equally likely outcomes, three bits among eight. The number of bits required is equivalent to the number of yes/no questions needed to determine which message it is, out of all possible messages; since four bits give only 2^4 = 16 labels while five give 2^5 = 32, we will require 5 bits for our 20 weather types.
  • Given a number n of equiprobable possible outcomes that need to be distinguished, the number of bits required to encode them is given by log2(n), and this constitutes the “information content” of each message. The number of bits that a medium can preserve – in other words, its maximum possible transmission rate – is likewise its “information channel capacity”. (Note that in the illustration below, the last two bits are correlated, and therefore only count as one; a short computational sketch follows the illustration.)

Illustration: For each independent physical distinction with two equiprobable micro-states, or “bit”, the number of possible macro-states is multiplied by 2. The number of bits defines the information content.
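
A short computational sketch of the points above (the weather names are placeholders standing in for the twenty message types):

```python
import math

n_outcomes = 20                        # twenty equiprobable weather types
info_content = math.log2(n_outcomes)   # ≈ 4.32 bits per message
bits_needed = math.ceil(info_content)  # 5 whole binary digits to label them all

print(f"log2(20) = {info_content:.2f} bits, so {bits_needed} binary digits are needed")

# Assign each outcome an arbitrary numeric identifier and write it in base 2.
weather_types = [f"weather_{i}" for i in range(n_outcomes)]  # placeholder names
codebook = {w: format(i, f"0{bits_needed}b") for i, w in enumerate(weather_types)}
print(codebook["weather_12"])  # e.g. "hail" labelled 12 -> '01100'

# Each binary digit answers one yes/no question ("is this digit a 1?"),
# so five such questions always pin down which of the 20 messages was sent.
```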

So if you ever wondered what a “bit” is, now you know. A bit can be thought of as the key that unlocks the answer to a binary query. It was the anthropologist Gregory Bateson who defined a bit as “a difference that makes a difference”, the smallest physical difference distinguishable by a system. To resolve the uncertainty of a bit – to have it collapse into either one state or the other – is to convey “information”. Paradoxically, this means that the amount of information is determined by the initial absence of content: information is the decrease in a data deficit. The numerals 0 and 1 could be replaced by “no”/“yes”, “false”/“true” or “heads”/“tails” as we see fit. A bit could be realized neurally, electrically, and in an indefinite number of other ways. It could also refer to any level in the hierarchy: if there were only two types of weather, the rain/sun distinction would constitute a bit as well. Some philosophers therefore find it useful to distinguish primary data or “dedomena” (“data in the wild”, cognitively unprocessed lack of uniformity in the world) from “secondary data” (the registered differences) and “derivative data” (lack of uniformity between two signals, for example between two averages in a statistical analysis).

A related, useful classification of information is the following, by mathematician Warren Weaver:

  • A computer file has a certain bit-content, defined by the number of registers (memory units) it occupies. This includes bits that encode not only letters and symbols, but also format specifications like font size and text alignment, which a document reader uses to determine how to color the pixels on your computer screen. All of this may be referred to as “syntactic information”. The meaning that a human may invest in it is irrelevant: once rendered, it could be absolute gibberish – what matters is that all the distinctions in its physical state are preserved. (A small code sketch of this point follows the list.)
  • When a human reads the document, which happens to be a research report, and its written symbols trigger brain activity patterns that correspond to those of the document’s author, the kilobytes of the file may serve to resolve bits of uncertainty about what ideas the document embodies. These bits are the “semantic information”.
  • For a human who is already aware of these findings, the document brings nothing new to the table – it does not permit the reader to make any more adaptive decisions than he already could – and therefore resolves no uncertainty. Its “pragmatic information” content is zero.
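
A rough sketch of the first distinction above (the two sentences are arbitrary stand-ins): the syntactic bit-count of a string depends only on its physical encoding, not on whether anything meaningful can be read out of it.

```python
# Syntactic information: bits needed to preserve the physical distinctions,
# regardless of whether any reader finds them meaningful.

report = "The treatment group improved significantly."
gibberish = "Xqf zplm wokrt jjnsd aaaaa qqqqqqqq zzzzzz."

for text in (report, gibberish):
    n_bits = len(text.encode("utf-8")) * 8  # 8 bits per stored byte
    print(f"{n_bits} syntactic bits: {text!r}")

# Both strings occupy a comparable number of syntactic bits, but only the
# first can resolve a reader's uncertainty about a research finding (semantic
# information), and only if the finding is news to that reader does it make
# a difference to their decisions (pragmatic information).
```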

Confusion arises around the concept of information when we forget what system we refer to: the physical medium (“syntactic”), the recipient’s understanding (“semantic”), or how it affects his decision-making (“pragmatic”). The latter two are much harder to quantify.

An analogous example of syntactic, semantic and pragmatic information is that of a dog smelling food. Features of the odorant’s molecular structure are used to disambiguate between rotten and edible meat, but because the dog’s stomach is already full, this bit is not very meaningful anyway – as pragmatic information it can hardly be said to exist. This highlights the relativistic nature of information: depending on measuring instrument and interpreter, a physical difference can convey different numbers of bits. In normal parlance, “information” connotes wisdom, meaning, useful chunks of knowledge and a rather ethereal quality of “aboutness”. This is because we normally use the word with reference to our own umwelt, in which a construction manual is “uninformative” if it is written in a language we don’t speak, and a textbook is “meaningless” if it covers nothing of what we will be examined on.

As already pointed out, our everyday concept of “meaning” is typically applied only to systems that have been under selective pressure to inform, and that in this sense intend to inform. Humans may have discovered a correlation between clouds and rain, but clouds did not evolve for the sake of telling you to seek shelter or bring an umbrella. “Meaning” feels much more natural when applied to man-made symbolic systems like weather forecasts and poetry, or to instinctive semiotic systems like courtship rituals and facial expressions. Before they were deciphered, hieroglyphs obviously carried information, but their meaning was buried so deep in the linguistic expressions that a Rosetta stone was needed to excavate it. The litmus test between information and meaning can therefore be said to be whether or not it makes sense to speak of misrepresentation.

It is actually common, in the history of science, for a concept initially to be regarded as a kind of ethereal substance, and only later to be clarified in relational terms. Our brains, it seems, wish to cling to the conviction that a notion represents something concrete for as long as they can. During the 19th century, heat was thought of as a fluid, and energy was conceived of as intangible stuff, an “élan vital” or “ether” that by mysterious means invigorated machines and bodies, before it was re-framed as the quantity that is conserved but transformed across processes of induced change. Similarly, “information” is often thought of as a disembodied commodity that can be transferred, stored, and sold, endowed with a mystical “aboutness”. It would take until the publication of Claude Shannon’s mathematical theory of communication in 1948 for it to be quantitatively defined, providing solutions to engineering problems on which our globalized culture critically depends.

Shannon’s theory also deals with messages that have different probabilities, and we will discuss that later; for now it suffices to remember that information is not something intrinsic but something relational. Defining it as the resolution of uncertainty – as the context-determined distinguishability between states – means that we can measure the information capacity of any physical substrate given the number of distinguishable outcomes it can support. If the micro-states are clear-cut, mutually exclusive, and uncorrelated with each other, then it really is just a matter of counting the number of possible macro-states and taking the base-2 logarithm of that count.
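
A quick numerical sketch of this counting (the substrates listed are merely illustrative, and each is assumed to offer equally available, mutually exclusive states):

```python
import math

# Capacity of a substrate = log2(number of distinguishable, mutually exclusive
# macro-states it can support), assuming the states are equally available.
substrates = {
    "charged/uncharged capacitor": 2,   # 1 bit
    "DNA base (A, C, G, T)": 4,         # 2 bits
    "face of a die": 6,                 # ~2.58 bits
    "one of the 20 weather types": 20,  # ~4.32 bits
}
for name, states in substrates.items():
    print(f"{name}: log2({states}) = {math.log2(states):.2f} bits")

# Correlation shrinks the count: two binary components that always agree
# realise only 2 joint macro-states, not 4, so together they hold 1 bit, not 2.
print(math.log2(2 * 2), "bits if independent;", math.log2(2), "bit if perfectly correlated")
```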

A computer is a powerful tool precisely because it affords us so many crisp physical differences that may be flexibly assigned reference, and be physically manipulated as a stand-in, or simulation, for some physical process that humans find useful. In a computer, bits are stored in “capacitors”, microscopic buckets that hold electrons. A capacitor has an umwelt of two categories: that of zero voltage, in which it holds no excess electrons (it registers 0), and that of non-zero voltage, in which a surplus of electrons represents the registration of 1. The nervous system, too, is powerful precisely because it has lots of simple systems that distinguish between two different micro-states, and which collectively can organize adaptive responses to the environment that they represent. And with four different bases, DNA may not be binary, but it makes excellent sense to speak of this linear string as “coding for” proteins and traits, in that, given a normal chemical environment and ecological backdrop, changes in DNA will reliably correlate with changes in the latter. Who, then, is the programmer of DNA? The message source is systematic changes in the ecosystem, which, unlike non-systemic fluctuations, are persistent enough to influence the gene pool and inform it about what genetic changes are appropriate.
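
To put a number on the DNA case (a toy illustration of the counting only, not a claim about how cells actually read the molecule): with four bases, each position can register log2(4) = 2 binary digits, so a string of length L can hold at most 2·L bits.

```python
import math

# Each DNA position distinguishes 4 bases, so it can register log2(4) = 2 bits.
BITS_PER_BASE = {"A": "00", "C": "01", "G": "10", "T": "11"}  # arbitrary labels

def dna_to_bits(sequence: str) -> str:
    return "".join(BITS_PER_BASE[base] for base in sequence.upper())

sequence = "GATTACA"
encoded = dna_to_bits(sequence)
print(encoded)  # '10001111000100'
print(len(encoded), "bits =", len(sequence), "bases x", int(math.log2(4)), "bits/base")
```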