Shannon’s measure of information is actually known as “entropy”, a word better known from thermodynamics, whose famous second law states that, in a closed system, entropy always increases to a maximum. For example, when we place a cool object next to something warm – heat orderly packaged into one but not the other – we always end up with two objects of equalized temperature. The distribution is evened out and becomes unable to do anything interesting. Heat flowing from a hot body to a cold one can drive a turbine or push a piston to lift a weight, but if you do not supply more heat, the system soon dies, and we no longer experience “order”, whatever this elusive property is.

19th-century pioneer of statistical mechanics Ludwig Boltzmann’s formula for thermodynamic entropy, S = k log W, is mathematically equivalent to Shannon’s, and on advice from computing pioneer John von Neumann, Shannon chose the same name as a self-conscious analogy with it – a decision that has led to considerable confusion as to wherein their deep relationship resides. Here, the equivalence will be explained via the concepts of “visible”, “invisible” and “mutual” bits.
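The equivalence can be made concrete with a small sketch. Boltzmann’s formula counts W equally likely microstates; for a uniform distribution over W outcomes, Shannon’s formula reduces to the same logarithm (up to the constant k and the base of the logarithm). The W = 8 below is just an illustrative number.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Boltzmann's S = k log W counts W equally likely microstates.
# For a uniform distribution over W outcomes, Shannon's formula
# reduces to log2(W) -- the same expression up to the constant k
# and the choice of logarithm base.
W = 8
uniform = [1 / W] * W
print(shannon_entropy(uniform))  # 3.0 bits
print(math.log2(W))              # 3.0
```

For non-uniform distributions, Shannon’s formula is the strictly more general of the two, which is where the two notions of entropy part ways.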

Recall the trade-off between extent and grain at the boundary filter of any entity. For a quantitative variable, “extent” refers to the *range* of values a system can register, and “grain” to the size of the smallest distinguishable unit, in other words its *precision* (number of significant figures). Dividing range by grain gives the number of values that a system can distinguish between. This number represents the amount of *available* information, which, depending on context, may be seen as either syntactic, semantic, or pragmatic. Syntactically, a thermometer reading of 31.56 degrees, for example, has a finer grain and therefore more bits than a friend screaming “30-ish”.
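The range-over-grain calculation can be sketched in a few lines. The numbers here are hypothetical: a thermometer covering a 60-degree range read to 0.01 degrees, versus a friend who only distinguishes 10-degree bins.

```python
import math

def available_bits(value_range, grain):
    """Distinguishable values = range / grain; bits = log2 of that count."""
    return math.log2(value_range / grain)

# Hypothetical numbers: a 60-degree range at two different grains.
print(available_bits(60, 0.01))  # ~12.55 bits for the thermometer
print(available_bits(60, 10))    # ~2.58 bits for "30-ish"
```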

If the potential sun-bather relies on a less precise friend, we may say that more of the information in the over-arching system (the bits required to define the detailed microscopic state of people living with a variable atmosphere) becomes *invisible*. The bits that describe the temperature with infinite precision operate cryptically beneath a blur. The bits describing the coarse-grained, collective, statistical properties extracted by the system nested inside of it – the temperature registered by the sun-bather – meanwhile become *visible*, in that they are used to distinguish between different brain states (such as that of “30-ish”) in the human observer.

Consider how, at microscopic levels, the molecules in a gas behave myopically in a Newtonian fashion, and for a moment assume the viewpoint of such a molecule. Each molecule may be regarded as a system in its own right, and while gas molecules may not evolve biologically, the formation of molecules from atoms nevertheless involves feedback and a “survival of the stablest”. Molecules therefore possess an umwelt of sorts, as they register the position and velocity of colliding molecules to “calculate” their future trajectories.

Now, zoom out hierarchy-style and take the perspective of an external, human observer. To obtain data about the position and velocity of each gas molecule is a practical impossibility, but suppose you have managed to extract these for one molecule. For simplicity, regard the molecule as a single bit whose state we know to be 0. This bit interacts with an unknown bit, such that the known bit’s state comes to depend on the unknown bit. An example of such a bit-flipping rule would be “If the unknown bit is 1, then flip the known bit”. As a result, the bits become correlated, with a state of either (0,0) or (1,1), although we don’t know which of these.
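A minimal sketch of this interaction, assuming the correlating rule flips the known bit whenever the unknown bit is 1 (in effect a classical controlled-NOT):

```python
# The known bit starts at 0; the unknown bit may be 0 or 1.
# Rule assumed here: if the unknown bit is 1, flip the known bit.
def interact(known, unknown):
    if unknown == 1:
        known ^= 1  # flip the known bit
    return known, unknown

# Enumerate both possibilities for the unknown bit:
outcomes = {interact(0, u) for u in (0, 1)}
print(outcomes)  # {(0, 0), (1, 1)} -- perfectly correlated
```

Whichever value the unknown bit had, the pair ends up in a matched state, which is exactly what it means for the bits to be correlated.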

After interacting with an unknown bit, each bit now has one bit of uncertainty, and because of the correlation the two bits also *share* one bit of uncertainty. Adding their individual uncertainties and *subtracting* their joint uncertainty gives us a quantity known as *mutual information*. The *total* information content – the sum of both the invisible and visible components – remains constant, but *our* ignorance of the system has spread – the *visible* information has decreased – and will continue to do so, almost like an epidemic. The collision could, of course, be reversed and thus reduce our uncertainty, but assuming molecular chaos any two molecules colliding again will effectively be uncorrelated.
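The bookkeeping can be checked directly. For the correlated pair above, each marginal has one bit of uncertainty, the joint state has one bit of uncertainty, and the difference is the one shared bit:

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a {outcome: probability} dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Joint distribution after the interaction: (0,0) and (1,1), each
# with probability 1/2 (the unknown bit was a fair coin).
joint = {(0, 0): 0.5, (1, 1): 0.5}
a = {0: 0.5, 1: 0.5}  # marginal of the first bit
b = {0: 0.5, 1: 0.5}  # marginal of the second bit

# Mutual information I(A;B) = H(A) + H(B) - H(A,B)
mi = entropy(a) + entropy(b) - entropy(joint)
print(mi)  # 1.0 bit shared between the two bits
```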

The idea of total information as a conserved quantity is known as “Landauer’s principle”. For a gas molecule, being part of an ordered gas where, say, the particles are concentrated in a corner, means that its micro-state encodes fewer past collisions and registers fewer bits. For this molecule, there is less uncertainty as to the microstate of another gas molecule about to collide with it. The particle carries little information, because there is little uncertainty to resolve. There is little *invisible* information in the gas. For an external observer, meanwhile, an ordered gas means we have less uncertainty as to the whereabouts of a particular molecule (it is somewhere in the corner): there is a lot of *visible* information. You may inflate a gas container so that molecules inside it will register fewer bits, but only at the expense of making molecules outside of it register more bits. If you add heat, there will be more possible microstates, and the state description would require more bits – but adding heat means that the system is no longer closed (so the principle is not violated).

Consider a thought experiment by Scottish physicist James Clerk Maxwell, reminiscent of an earlier demon imagined by French mathematician Pierre-Simon Laplace, in which a demon sorts gas molecules into a “faster-than-average” and a “slower-than-average” compartment of a gas chamber so as to convert disorganized energy into ordered energy, thereby breaking the second law of thermodynamics. What makes such a perpetual-motion machine impossible? Well, in order to sort, a demon would have to process information – to compute. Computation is necessarily physical – it requires energy, and erasing the demon’s memory records releases heat, so the entropy bill is duly paid.

Landauer’s principle is, however, perhaps best illustrated by the heat emitted from your laptop: to erase a bit, the electrons in the capacitor realizing the bit must be discharged, and the resulting dissipation transfers the information into the environment as heat, preserving the bit you tried to delete. In the overall system (environment + laptop) there is thus no ambiguity about the initial states, and if your cooling fan does not work properly, this becomes painfully palpable. One way to express this is that the classical laws of physics are one-to-one maps: each input state has only one output state, and vice versa. The classical universe is *deterministic*, or *computationally reversible*. (Later we will see that, depending on interpretation, quantum chance may be said to be an exception to Landauer’s principle and determinism, injecting new, fresh bits into the Universe.)
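The minimum heat cost of erasure is easy to put a number on. Landauer’s bound states that erasing one bit dissipates at least kT ln 2 of heat; room temperature is assumed at 300 K here.

```python
import math

# Landauer's bound: erasing one bit dissipates at least k*T*ln(2) of heat.
k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # room temperature, kelvin

min_heat_per_bit = k_B * T * math.log(2)
print(min_heat_per_bit)  # ~2.87e-21 joules per erased bit
```

Real laptops dissipate many orders of magnitude more than this per logical operation, which is why the fan is so audible while the fundamental bound is not.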

But how is the invariance of total information content compatible with the second law of thermodynamics? Thermodynamic entropy corresponds to the invisible component: the microscopic jigglings that, as time goes on, increase as interactions with unknown values wash away an external observer’s initial certainty about a molecule.

To reinforce the equivalence of entropy-as-disorder with invisible information, imagine a phase space representing all the possible micro-states of a gas chamber (by “micro-state” we mean a complete description of all particles’ positions and velocities). For an external observer in a state of total ignorance, every micro-state has the same probability. However, if gas has just been released in one of the chamber’s corners, then the number of likely micro-states is dramatically reduced: the observer knows the gas is in the macro-state category of “majority of particles are in the upper-right corner”. As molecules slavishly whiz around in their Newton-dictated ways, the mutual information will increase as the position and velocity values encode more and more previous encounters, converting visible information into invisible information, with the number of likely micro-states increasing as a result.

Molecules are thus inexorably driven to explore more space, with an overwhelming tendency to spread out, evolving into a highly disordered configuration. While it is not *impossible* for a gas to unmix itself into two compartments, the relative frequency of micro-states that do so is vanishingly small, and the probability is effectively zero. It becomes useful to partition the phase space into the macro-state categories “order” and “disorder”, and the latter is what corresponds to the concept of higher observer uncertainty and more entropy, that is, more invisible information.
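Just how vanishingly small can be sketched with a back-of-envelope model: treat each of N molecules as independently and equally likely to be in either half of the chamber, so the chance of spontaneous unmixing into one particular half is 2 to the power of minus N.

```python
# Toy model: N independent molecules, each equally likely to sit in
# either half of the chamber. The chance that all N happen to occupy
# one particular half is 2**-N -- effectively zero for macroscopic N.
def prob_all_in_one_half(n_molecules):
    return 2.0 ** -n_molecules

print(prob_all_in_one_half(10))   # ~1e-3: plausible for 10 molecules
print(prob_all_in_one_half(100))  # ~8e-31: never observed in practice
```

For a real gas with N on the order of 10^23 molecules, the exponent is so enormous that “effectively zero” is, if anything, an understatement.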