# Nottingham FP Lab Blog

Welcome to the blog of the Functional Programming Lab at the University of Nottingham!

To post something, read the instructions at http://bitbucket.org/fplab/hblog.

## Categorical Semantics of Dependent Type Theory

by nicolai on February 7, 2013

In the Type Theory meeting last week, I have given an overview over the categorical semantics of dependent type theory. The semantics of the simply typed lambda calculus are well-known and generally accepted, but this is not so much true for the dependently typed setting. Also, it is not so easy to state the conditions on a category $$\mathbb C$$ to be suitable for the construction of a model. Often, people just say that $$\mathbb C$$ should be locally cartesian closed. This makes a lot of sense, but is neither really sufficient, nor necessary. However, for the moment, it is a good description for $$\mathbb C$$.

We interpret every (valid) context as an object in $$\mathbb C$$. For the simply typed lambda calculus, it is not too important to distinguish between contexts and types. This is more subtle for dependent types, as not every type is closed. We have typing judgments $\Gamma \vdash A \textit{ : type},$ and from such a judgment, we get a new valid context, often written $$\Gamma.A$$, the context extension. We interprete this judgment as a morphism between these contexts: $[[ \Gamma \vdash A \textit{ : type} ]] \; : \; [[ \Gamma.A ]] \to [[ \Gamma ]].$ One should think of this map as a projection, that "projects A away". Being lazy, one also writes $$[[A]]$$ instead of the rather long expression above. In the simply typed setting, one usually models types as objects. Equivalently, one could model a type as the unique morphism from the object to the terminal object.

If we have the type judgment above, we can also have a term of this type, $\Gamma \vdash t \, : \, A .$ We model this judgment as a section of the type judgment, i.e. it is a morphism from $$[[\Gamma ]]$$ to $$[[\Gamma.A]]$$ such that $[[ \Gamma \vdash A \textit{ : type} ]] \; \circ \; [[ \Gamma \vdash t \, : \, A]] \; = \; \operatorname{id}_{[[\Gamma]]},$ or just $$[[A]] \circ [[t]]$$ in the lazy version.

The nice thing is that we can do many things in the same way as it works for the simply typed case; we just need to restrict ourselves to the slice category over $$[[\Gamma]]$$, if we work in context $$\Gamma$$. For example, for the simply typed lambda calculus, we use exponentials to model function types. Note that the functor "$$A\to$$", or $$\_^A$$, is right adjoint to the product functor $$\_ \times A$$. The product with $$[[A]]$$ in the slice over $$[[\Gamma]]$$ is nothing else but the pullback along $$[[A]]$$, let us write $$A^*$$, and we get dependent function types as the right adjoints of these pullbacks. Similarly, we get dependent sums as left adjoints of these pullbacks. Summarized, we have $\Sigma_A \dashv A^* \dashv \Pi_A.$ One can convince oneself that this makes sense by drawing a couple of diagrams.

I think these semantics have first been described by Seely. “Locally Cartesian Closed Categories and Type Theory”, and have since then often been discussed; in particular, there is a coherence problem (things that hold strictly in type theory can, by default, only be described up to isomorphism in category theory). The (arguable) cleanest solution for this problem is a universe construction by Voevodsky. The plan is that more detailed semantics, together with this construction, the presheaf of simplicial sets, and univalence, will be the topics of the Type & Homotopy Theory meetings, as well as the Sheaf Theory seminars, in the next two weeks.

## ε0 happy returns

by Venanzio on December 15, 2012

Today is the 100th anniversary of the birth of Reuben Goodstein. It's a very young age for someone immortalized by his voyages into the transfinite realm. A few months ago I talked about his theorem during an FP lunch. After some wait, here is a blog post about it.

### Cantor Normal Forms

Every natural number $$x$$ can be put in Cantor normal form with respect with any base $$b$$. First we write $$x$$ in base $$b$$, as a sum of powers of $$b$$ multiplied by coefficients smaller than $$b$$, for example, if $$x=68630377369258$$ and $$b=3$$ we have $68630377369258 = 3^{29} + 3^7\cdot 2 + 1$ We recursively write the exponents in the same base: $29 = 3^3+2 = 3^{3^1}+2, \quad 7 = 3\cdot 2 +1 = 3^1\cdot 2+1$ Thus the Cantor normal form of 68630377369258 in base 3 is: $68630377369258 = 3^{3^{3^1}+2} + 3^{3^1\cdot 2+1}\cdot 2 + 1$ So a Cantor normal form in base b expresses a number x as $$x = b^{c_0} \cdot a_0 + b^{c_1} \cdot a_1 + \cdots + b^{c_n} \cdot a_n$$ where $$c_0$$, ..., $$c_n$$ are themselves Cantor normal forms.

Abstractly, generalizing over the base, it's a list of pairs of coefficients and exponents, which are themselves Cantor normal forms. Here it is expressed as a Haskell data type with functions converting between it and naturals with respect to a given base.

data Cantor = Cantor [(Integer, Cantor)]
deriving Show

-- (cantNum c b) computes the number with CNF c in base b
cantNum :: Cantor -> Integer -> Integer
cantNum (Cantor l) = cantNumL l

cantNumL :: [(Integer,Cantor)] -> Integer -> Integer
cantNumL [] _ = 0
cantNumL ((i,c):cs) base = i*base^(cantNum c base) + cantNumL cs base

-- (numCant b x) computes the CNF of x in base b
numCant :: Integer -> Integer -> Cantor
numCant base = Cantor . map ($$c,n) -> (c,numCant base n)) . (inBase base) iLog :: Integer -> Integer -> Integer iLog base n = iLogAux 0 1 where iLogAux i k = if (k*base)>n then i else iLogAux (i+1) (k*base) inBase :: Integer -> Integer -> [(Integer,Integer)] inBase base 0 = [] inBase base n = (n div k,i) : inBase base (n rem k) where i = iLog base n k = base^i Now take a Cantor normal form (CNF), change the base and recompute the number. For example, earlier we had the CNF of \(68630377369258$$ in base 3. If we change the base to 4 we obtain $\begin{array}{l} 3^{3^{3^1}+2} + 3^{3^1\cdot 2+1}\cdot 2 + 1 \leadsto 4^{4^{4^1}+2} + 4^{4^1\cdot 2+1}\cdot 2 + 1\\ = 21452492687908155359318439997129353803966985312947829\\ \quad 40435769830995482244811767516288299887706704548430405\\ \quad 09730983776813660062124991145119142938384097869825. \end{array}$ The number has become much bigger, so let's make it smaller by subtracting one! $\begin{array}{l} 21452492687908155359318439997129353803966985312947829\\ 40435769830995482244811767516288299887706704548430405\\ 09730983776813660062124991145119142938384097869825 - 1 \\ = 21452492687908155359318439997129353803966985312947829 \\ \quad 40435769830995482244811767516288299887706704548430405 \\ \quad 09730983776813660062124991145119142938384097869824. \end{array}$ This is one step of the Goodstein procedure: find the Cantor normal form of a number $$x$$ with respect to a base $$b_1$$, change the base to $$b_2$$, recompute the number and subtract one. The Goodstein sequence associated with a number is obtained by iterating the step increasing the base by one every time.

-- (goodstep b1 b2 n) first finds the CNF of n in base b1
-- then recomputes it in base b2 and subtracts 1
goodstep :: Integer -> Integer -> Integer -> Integer
goodstep base1 base2 n = cantNum (numCant base1 n) base2 - 1

-- Given a list of bases and a number n
-- we iterate the Goodstein step on n using all the bases
goodsteinGen :: [Integer] -> Integer -> [Integer]
goodsteinGen _ 0 = []
goodsteinGen (b1:b2:bs) n = n : goodsteinGen (b2:bs) (goodstep b1 b2 n)

goodstein :: Integer -> [Integer]
goodstein = goodsteinGen [2..]

Let's compute a few:

*Main> goodstein 1
[1]
*Main> goodstein 2
[2,2,1]
*Main> goodstein 3
[3,3,3,2,1]

Now do what I did: run it on 4 on your university's server and enjoy the screams of frustration of your colleagues when everything hangs up.

Although computing the Goodstein sequence for 4 is not feasible in practice, one can prove that it terminates.

### Goodstein's Theorem

The Goodstein procedure terminates on every starting number.

Although the theorem has a simple formulation and a short elegant proof, it cannot be proved in Peano Arithmetic: it essentially needs transfinite induction. It is probably the simplest result that is undecidable in PA.

The proof consists in associating an ordinal to every CNF, proving that every step in the process reduces this ordinal and therefore concluding that the process terminates by well-foundedness of the ordinal numbers.

Given a CNF, just compute it using the ordinal $$\omega$$ (the ordinal associated to the natural numbers) as a base. The example that we've seen earlier gives: $\omega^{\omega^{\omega^1}+2} + \omega^{\omega^1\cdot 2+1}\cdot 2 + 1$ (Addition and multiplication of transfinite ordinals are not commutative; to get the correct result the coefficients must be multiplied on the right and the terms must be in decreasing order.)

Now it is very easy to see that every step reduces the ordinal. If the CNF contained a non-zero term with the base raised to the power zero (that is, if the number is not divisible by the base) then the associated ordinal is a successor an the step will just give its predecessor. Otherwise there is a bigger decrease in the ordinal (depending on the base used).

In any case, we get a decreasing sequence of ordinals that will eventually reach zero. This proves the theorem.

Since every tower of exponentiation of the base is a valid CNF, we get that every ordinal smaller that $$\epsilon_0$$ can occur: $\epsilon_0 = \sup \{\omega, \omega^\omega, \omega^{\omega^\omega}, \ldots\}.$ So we are doing $$\epsilon_0$$-induction, which goes beyond elementary arithmetic.

## A Quotient in HoTT

by Nicolai Kraus on December 12, 2012

Last Friday (07/12/12), I have talked about the difficulties of quotients in Homotopy Type Theory.

Traditionally, the quotient $$A/\sim$$ of a set $$A$$ by some equivalence relation $$\sim : A \to A \to \textbf{Prop}$$ is a set $$Q$$ (or, by abuse of notation, just $$A/\sim$$ , think of the set of equivalence classes), together with a function $$[\_] : A \to Q$$ (mapping every element on its equivalence class) and an eliminator. The latter says, more or less, whenever a function $$f : A \to C$$ does not distinguish between elements that are $$\sim$$-equivalent, it can be lifted to a function $$\overline f : Q \to C$$. For the details, I would recommend Definable Quotients in Type Theory by Nuo, Thomas and Thorsten.

This is a very natural concept and, in some cases, it is necessary. For example, the real numbers cannot be defined without quotients (this statement refers only to Type Theory, but even in classical mathematics, notions as the Cauchy Reals make usage of quotients). However, the concept becomes much more involved as soon as the principle UIP (uniqueness of identity proofs) is dropped. It would make sense to generalise the notion of an equivalence relation, but even if we don't, it is not clear at all what it means for a function to "not distinguish between elements that are equivalent". Without UIP, the actual proofs and their behaviours matter, while we did not have to care about that issue before.

I talked about the example of the possibly simplest equivalence relation, which is constantly $$\textbf{True}$$. In that case, the quotient should be the h-propositional reflection (also known as "squash-type" or "bracket-type"), but it is already very hard to formalise the conditions under which a function $$f : A \to C$$ can be lifted to $$Q \to C$$. Informally, for any finite number of points in $$A$$, we can, by induction on the number of points, define the corresponding simplex in $$Q$$, and the condition on $$f$$ is that these simplices can be built out of coherence proofs. This leads to one of the standard constructions when it comes to $$\omega$$-category theory, and possibly to concepts that are well-known in Homology. Of course, the first goal is the formalisation of the informally described idea.

## Induction-Recursion and Impredicativity

by Venanzio on December 6, 2012

The type theory meeting of Wednesday 5 December was a short introduction to induction-recursion with a divagation about impredicativity.

### Type Universes

Martin-Löf distinguishes universes à la Russell and universes à la Tarski. The difference is that a Russell-style Universe $$\mathsf{U}$$ is a type such that any of its elements $$A:\mathsf{U}$$ is itself a type; while a Tarski-style universe is a set $$\mathsf{U}$$ of codes together with a decoding function $$\mathsf{Elem}: \mathsf{U}\rightarrow \mathsf{Type}$$.

A simple Tarski universe with non-dependent types can be defined by plain induction and the decoding function is given by recursion on it: $\begin{array}{l} \mathsf{U}: \mathsf{Type}\\ \mathsf{zero}: \mathsf{U}\\ \mathsf{one}: \mathsf{U}\\ \mathsf{plus}: \mathsf{U}\rightarrow \mathsf{U}\rightarrow \mathsf{U}\\ \mathsf{times}: \mathsf{U}\rightarrow \mathsf{U}\rightarrow \mathsf{U}\\ \end{array} \quad \begin{array}{l} \mathsf{Elem}: \mathsf{U}\rightarrow \mathsf{Type}\\ \mathsf{Elem}\,\mathsf{zero}= \emptyset\\ \mathsf{Elem}\,\mathsf{one}= \{\bullet\}\\ \mathsf{Elem}\,(\mathsf{plus}\,a\,b) = \mathsf{Elem}\,a + \mathsf{Elem}\,b\\ \mathsf{Elem}\,(\mathsf{times}\,a\,b) = \mathsf{Elem}\,a \times \mathsf{Elem}\,b \end{array}$

But when we want to add constructors $$\mathsf{pi}$$ and $$\mathsf{sigma}$$ for dependent products and sums to $$\mathsf{U}$$, then the definitions of $$\mathsf{U}$$ and $$\mathsf{Elem}$$ become entangled: $$\mathsf{Elem}$$ occurs in the type of the constructors of $$\mathsf{U}$$: $\begin{array}{l} \mathsf{pi}: \Pi\,a:\mathsf{U}, (\mathsf{Elem}\, a \rightarrow \mathsf{U}) \rightarrow \mathsf{U}\\ \mathsf{sigma}: \Pi\,a:\mathsf{U}, (\mathsf{Elem}\, a \rightarrow \mathsf{U}) \rightarrow \mathsf{U}\\ \ \\ \mathsf{Elem}\, (\mathsf{pi}\,a\,b) = \Pi\,x:\mathsf{Elem}\, a, \mathsf{Elem}\, (b\, x)\\ \mathsf{Elem}\, (\mathsf{sigma}\,a\,b) = \Sigma\,x:\mathsf{Elem}\, a, \mathsf{Elem}\, (b\, x) \end{array}$ This was the first instance of an inductive-recursive definition: the simultaneous specification of an inductive type and a recursive function on it.

The construction can be iterated and generalized. We can define another universe $$\mathsf{U}_1$$ that contains $$\mathsf{U}$$ by the same recursive-inductive specification plus a code $$\mathsf{u}_0:\mathsf{U}_1$$ with associated type $$\mathsf{Elem}\, \mathsf{u}_0 = \mathsf{U}$$. In fact, if we already defined an entire family of universes, we can use this construction to create an new universe that contains them all. Eric Palmgren introduced a superuniverse that is closed under this universe-creation rule.

### Induction-Recursion

Peter Dybjer further generalized the construction and invented simultaneous inductive-recursive definitions. These allow us to define an inductive type together with a recursive function on it. The function is allowed to appear in the type of the constructors; but there must be syntactic restrictions to guarantee that it is applied to arguments that are smaller than the one generated by the constructor. Later Dybjer and Anton Setzer devised a elegant type of codes for inductive-recursive definitions (we'll look at it on a future meeting).

A simple example is the type of lists without repetitions, by Catarina Coquand. The type is defined simultaneously with a freshness relation that guarantees that we can only add elements that are not already in the list. $\begin{array}{l} \mathsf{FList}\, (A:\mathsf{Set}): \mathsf{Set}\\ \mathsf{fnil}: \mathsf{FList}\, A\\ \mathsf{fcons}: \Pi\,x:A, \Pi\,l:\mathsf{FList}\, A, \mathsf{Fresh}\, x\, l \rightarrow \mathsf{FList}\, A\\ \ \\ \mathsf{Fresh}: A \rightarrow \mathsf{FList}\, A \rightarrow \mathsf{Prop}\\ \mathsf{Fresh}\, x\, \mathsf{fnil}= \top\\ \mathsf{Fresh}\, x\, (\mathsf{fcons}\, y\, l\, p) = (x\neq y) \land \mathsf{Fresh}\, x\, l \end{array}$

Ana Bove and I used inductive-recursive definitions to formalize general recursive functions in type theory. The idea is to characterize the domain of a function by an inductive predicate and then define the function itself by recursion on the proof of the predicate. Take for example the quicksort algorithm. In Haskell it is written as:

quicksort :: [A] -> [A]
quicksort [] = []
quicksort (x:l) = quicksort ll ++ x : quicksort lr
where ll = [y | y <- l, y<=x]
lr = [y | y <- l, y>x]


In this case the function happens to be total: we can prove that the domain predicate is always satisfied, $$H:\forall l:\mathsf{List}\,A, \mathsf{D}_{\mathsf{qs}}\,l$$. Therefore, we can eliminate the dependency on the proof: $$\mathsf{quicksort}: \mathsf{List}\,A \rightarrow \mathsf{List}\,A$$, $$\mathsf{quicksort}\,l = \mathsf{qs}\,l\,(H\,l)$$.

If the recursive function we try to implement has nested recursive calls, then we need induction-recursion. Take for example McCarthy's 91 function:

mccarthy :: Int -> Int
mccarthy n | n > 100 = n - 10
| otherwise = mccarthy (mccarthy (n+11))

The function always terminates and returns 91 on every argument less or equal to 101. However, the recursive calls are made on arguments that may be larger than the original input and are nested. We can formalize it in type theory in the same style as the quicksort algorithm:


### Impredicativity

I've been a bit too casual in defining the domain predicates as functions to $$\mathsf{Prop}$$. There are issues related to the consistency of elimination rules between type universes. You can simply adopt the Curry-Howard correspondence in full and replace $$\mathsf{Prop}$$ everywhere with $$\mathsf{Set}$$. However, if instead you have a separate impredicative universe of proposition (as, for example, in Coq), then there are consistency problems.

Impredicativity refers to the possibility of quantifying over a type universe and obtaining an element of that same universe. In general, the formation rule of product types can be formulated as follows: $\dfrac {A:s_1 \quad x:A \vdash B[x]:s_2} {\Pi\,x:A, B[x] : s_3}$ where $$s_1$$, $$s_2$$ and $$s_3$$ are sorts (type universes). One can define a Pure Type System by specifying a set of such rules simply by giving the triples $$(s_1,s_2,s_3)$$ and some axioms in the form $$s_1:s_2$$. We have that a sort $$s_1$$ is impredicative if there is an axiom $$s_1:s_2$$ and a rule for $$\Pi\,$$-types with triple $$(s_2,s_1,s_1)$$. In our specific case, $$s_1 = \mathsf{Prop}$$ and $$s_2 = \mathsf{Set}$$: we can quantify over any set, including the set of all propositions.

The most famous impredicative theory is Girard's system F. Girard proved that the normalization property of his system was equivalent to the consistency of second order Peano Arithmetic, that is, analysis. He then proved normalization with the new technique of reducibility candidates.

Although pure type systems only have function types, in an impredicative theory we can encode inductive types by the polymorphic quantification of their iterators. For example, the natural numbers can be defined in system F as $\mathbb{N}= \Pi\,X:\mathsf{Set}, X \rightarrow (X\rightarrow X) \rightarrow X$ (In this case it is the sort $$\mathsf{Set}$$ that's impredicative).

If $$\mathsf{Prop}$$ is impredicative, eliminating an inductive proposition over a set like I have done is forbidden. Allowing this kind of elimination in general would lead to a contradiction, a version of Girard's paradox described by Thierry Coquand.


### References

• Per Martin-Löf, Intuitionistic type theory, Napoli, Bibliopolis, 1984.

• Eric Palmgren, On universes in type theory, in: G. Sambin and J. Smith (eds.) Twenty Five Years of Constructive Type Theory, Oxford Logic Guides, 1998.

• Peter Dybjer, A General Formulation of Simultaneous Inductive-Recursive Definitions in Type Theory, J. Symb. Log. 65(2): 525-549 (2000).

• Peter Dybjer and Anton Setzer, A Finite Axiomatization of Inductive-Recursive Definitions, TLCA 1999:129-146.

• Ana Bove and Venanzio Capretta, Nested General Recursion and Partiality in Type Theory, in TPHOLs 2001, LNCS 2152, pages 121-135.

• Henk Barendregt, Lambda calculi with Types, in: Handbook of Logic in Computer Science, Vol 2, OUP, (1992), pp 117-309.

• Jean-Yves Girard, Yves Lafont and Paul Taylor, Proofs and Types, Cambridge Tracts in Theoretical Computer Science, 1989

• Thierry Coquand, An Analysis of Girard's Paradox, LICS 1986: 227-236.

## Isomorphisms and weak equivalences

by Paolo Capriotti on November 28, 2012

Today we discussed about isomorphisms and weak equivalences in type theory.

There are at least two ways to define the notion of equivalence between two types X and Y.

The first, which we will call isomorphism, is a tuple (f, g, H, K), with:

f : X → Y
g : Y → X

H : (x : X) → g (f x) ≡ x
K : (y : Y) → f (g y) ≡ y

i.e. essentially a pair of functions and pair of homotopies. This corresponds to the notion of homotopy equivalence in homotopy theory.

The other possible definition is the usual one by Voevodsky: a function f : X → Y is a weak equivalence if the following type is inhabited:

(y : Y) → h₀ (Σ X λ x → f x ≡ y)

i.e. every fiber is contractible.

We can show that these two notions are logically equivalent. Going from a weak equivalence to an isomorphism is quite straightforward. Given a weak equivalence f, unfolding the definition gives that, for all y : Y, we have a triple

(g y, K y, ϕ y)

with

g y : X
K y : f (g y) ≡ y
ϕ y : (x : X)(p : f x ≡ y) → (x, p) ≡ (g y, K y)

So we automatically get an inverse g and a homotopy K, and we only need to find an H. So we define:

H x = cong proj₁ (ϕ (f x))

and we're done.

The other direction is trickier. We first prove the type-theoretic equivalent of Vogt's lemma:

Given an isomorphism (f, g, H, K), we can find K', of the same type as K, and γ : (x : X) → cong f (H x) ≡ K' (f x).

The outline of the proof is as follows: we first show that the following diagram of homotopies


commutes. In fact, it is quite easy to see that we can replace fHgf with fgfH in the top arrow. Similarly, we can replace fgKf with Kfgf in the left arrow. Then commutativity of the diagram follows immediately from the interchange law (i.e. the fact that horizontal composition of homotopies is well defined).

Next, we can observe that f appears on the right of every arrow except the bottom one, hence if we define:

$K' = K \circ f H g \circ (f g K)^{-1}$

it's clear that

$K' f \equiv f H$

$$\square$$

Now we're ready to show that any isomorphism is a weak equivalence. Given any isomorphism (f, g, H, K), Vogt's lemma implies that we can assume the existence of γ : (x : X) → cong f (H x) ≡ K (f x) (we say that the isomorphism is coherent, in this case). Again, a weak equivalence is given by a triple (g, K, ϕ), and we have g and K already, so we only need to provide ϕ.

Using the fact that equality in a $$Σ$$-type is the same as a $$Σ$$ of equalities, and given x : X, y : Y and p : f x ≡ y, we define

q : x ≡ g y
q = (H x)⁻¹ ; cong g p

contr : subst (λ x → f x ≡ y) q p ≡ K y
contr = ? -- see below

ϕ x y p = q , contr

The proof contr consists simply of algebraic manipulations on paths:

  subst (λ x → f x ≡ y) q p
≡⟨ substituting on an equality is the same as composing ⟩
cong f q⁻¹ ; p
≡⟨ definition of q ⟩
cong f (cong g p⁻¹) ; cong f (H x) ; p
≡⟨ coherence ⟩
cong (g ∘ f) p⁻¹ ; K (f x) ; p
≡⟨ interchange law ⟩
K y ; p⁻¹ ; p
≡⟨ groupoid law ⟩
K y

Questions raised:

1. What's an example of non-coherent isomorphism, i.e. a quadruple (f, g, H, K) for which there exists no corresponding γ?
2. Is the logical equivalence above (restricted to coherent isomorphisms) an isomorphism itself?
3. It is possible to define a dual notion of coherence by swapping f and g. Fortunately, one coherence is enough to prove the other, but it raises a question: is it possible to have non-coherent coherence proofs (and so on)?

Update:

An answer to 1 can be found in this agda file, which uses our work in progress base library for HoTT

## The Infinite Jail

by Venanzio on November 19, 2012

There is a jail containing a countably infinite number of inmates. Because of budget cuts, the government is closing it down. The prisoners will either be all released or all executed. The decision depends on whether they pass a certain test.

On the fatal day, the inmates are allowed to agree a common strategy beforehand, but they can't communicate during the test itself. They will be put in a (very) long line and coloured hats, either black or red, will be put on their heads. A prisoner cannot see the colour of his own hat, but he can see the hats of all the infinite people in front of him. Afterwards, each of them is taken to a separate room and must guess his own hat's colour.

If only a finite number of them guess wrongly, then they pass the test and they all go free. But if they make an infinite number of wrong guesses, then they will all be executed.

Now: do you think that the inmates can agree on a strategy that will guarantee their success?

Note: These prisoners have very good eyesight: they can see an infinite sequence of hats in front of them. They also can compute everything that exists in classical mathematics, even if it's not computable by finite means. They have a very very large memory. In other words, each of them has the mind of God.

## Solution

Here is how the prisoners can agree on a winning strategy beforehand. Let's define an equivalence relation on the infinite sequences of hat colours: Two sequences are equivalent if they differ only on a finite number of positions. It is easy to verify that this is an equivalence relation. Therefore it partitions the set of all sequences into equivalence classes. The prisoners agree on a choice of a specific representative for each equivalence class. At the time of the test, each prisoner can see the the whole sequence except the initial positions up to his own. This is enough to determine the equivalence class. Then the prisoner will give as answer the colour given to its position by the chosen representative in that class. This ensures that they all choose according to the same representative sequence, and this sequence is equivalent to the actual sequence of hat colours. These will then differ only on a finite number of positions, that is, there will be only a finite number of mistakes.

This solution is so clever that it's not measurable by human standards. In fact, here is a proof by Christian that there cannot be a measurable strategy.

Let $$C = 2^{\mathbb{N}}$$ denote the hat configuration space. Let there be a fixed strategy $$s : C \to C$$ with non-negative probability of winning with the constraint that each prisoner cannot see their own and previous hats, i.e. $$s\ c\ n = s\ d\ n$$ if $$c\ m = d\ m$$ for all $$m > n$$ (1). Abbreviate $A = \{ c : C \mid \{ n : \mathbb{N} \mid c\ n = \operatorname{false} \}\ \text{finite}\ \},$ $B_n = \{ c : C \mid c\ m = \operatorname{true}\ \text{forall}\ m \geq n \}.$ Since $$A = \bigcup_{n : \mathbb{N}} B_n$$, the probability of the prisoners winning is given by $\mathbb{P}(s(X) \operatorname{eq} X \in A) = \lim_n \mathbb{P}(s(X) \operatorname{eq} X \in B_n),$ where $$X$$ is a random variable denoting the hat configuration, and $$\operatorname{eq}$$ is the equality operator on booleans (or hat colours). Formally, $$\mathbb{P}(X \in D)$$ is defined to mean $$\mu(D)$$, where $$\mu : \sigma(C) \to [0, 1]$$ is the measure generated by $$\mu(S_n) = \frac{1}{2}$$ and $$\sigma(C)$$ is the $$\sigma$$-algebra generated by $$S_n$$ where $$S_n := \{d\in C \mid d\ n = \operatorname{true}\}$$ for $$n : \mathbb{N}$$. Here, in talking about $$\mathbb{P}(s(X) \operatorname{eq} X \in B_n)$$, we have crucially assumed that $$s$$ is a measurable function.

Fix $$n : \mathbb{N}$$ such that $$\epsilon := \mathbb{P}(s(X) \operatorname{eq} X \in B_n) > 0$$ and abbreviate $$t\ c = s\ c \operatorname{eq} c$$. For $$m > n$$, let $$E_m$$ denote $$t^{-1}(B_n)$$ with all its elements flipped at hat $$m$$.

The $$E_m$$ are all disjoint: Suppose $$E_{m_1}$$ and $$E_{m_1}$$ have a common element. This means that there are two sequences $$c_1, c_2 \in t^{-1}(B_n)$$ whose only differences on indices larger that $$n$$ are at positions $$m_1$$ and $$m_2$$. The last difference between them is at $$n' = \max(m_1,m_2) > n$$, so $$c_1\ k = c_2\ k$$ for $$k>m$$. By (1) this implies that $$s\ c_1\ n' = s\ c_2\ n'$$. We also know (because $$n'>n$$) that $$t\ c_1\ n' = t\ c_2\ n' = \operatorname{true}$$, that is, $$c_1\ n' = s\ c_1\ n' = s\ c_2\ n' = c_2\ n'$$. This contradict the assumption that $$c_1$$ and $$c_2$$ differ at position $$n'$$.

But then $1 \geq \sum_{m > n} \mu(E_m) = \sum_{m > n} \mu(t^{-1}(B_n)) = \infty \cdot \epsilon = \infty,$ a contradiction.

Hence, the strategy must not be measurable. This suggests a Banach-Tarski-like solution.

--Christian