Welcome to the blog of the Functional Programming Lab at the University of Nottingham!

To post something, read the instructions at http://bitbucket.org/fplab/hblog.

by *nicolai* on **February 7, 2013**

In the Type Theory meeting last week, I have given an overview over the categorical semantics of dependent type theory. The semantics of the simply typed lambda calculus are well-known and generally accepted, but this is not so much true for the dependently typed setting. Also, it is not so easy to state the conditions on a category \(\mathbb C\) to be suitable for the construction of a model. Often, people just say that \(\mathbb C\) should be *locally cartesian closed*. This makes a lot of sense, but is neither really sufficient, nor necessary. However, for the moment, it is a good description for \(\mathbb C\).

We interpret every (valid) context as an object in \(\mathbb C\). For the simply typed lambda calculus, it is not too important to distinguish between contexts and types. This is more subtle for dependent types, as not every type is closed. We have typing judgments \[
\Gamma \vdash A \textit{ : type},
\] and from such a judgment, we get a new valid context, often written \(\Gamma.A\), the *context extension*. We interprete this judgment as a morphism between these contexts: \[[[ \Gamma \vdash A \textit{ : type} ]] \; : \; [[ \Gamma.A ]] \to [[ \Gamma ]]. \] One should think of this map as a projection, that "projects A away". Being lazy, one also writes \([[A]]\) instead of the rather long expression above. In the simply typed setting, one usually models types as objects. Equivalently, one could model a type as the unique morphism from the object to the terminal object.

If we have the type judgment above, we can also have a term of this type, \[ \Gamma \vdash t \, : \, A .\] We model this judgment as a section of the type judgment, i.e. it is a morphism from \([[\Gamma ]]\) to \([[\Gamma.A]]\) such that \[ [[ \Gamma \vdash A \textit{ : type} ]] \; \circ \; [[ \Gamma \vdash t \, : \, A]] \; = \; \operatorname{id}_{[[\Gamma]]}, \] or just \([[A]] \circ [[t]]\) in the lazy version.

The nice thing is that we can do many things in the same way as it works for the simply typed case; we just need to restrict ourselves to the slice category over \([[\Gamma]]\), if we work in context \(\Gamma\). For example, for the simply typed lambda calculus, we use exponentials to model function types. Note that the functor "\(A\to\)", or \(\_^A\), is right adjoint to the product functor \(\_ \times A\). The product with \([[A]]\) in the slice over \([[\Gamma]]\) is nothing else but the pullback along \([[A]]\), let us write \(A^*\), and we get dependent function types as the right adjoints of these pullbacks. Similarly, we get dependent sums as left adjoints of these pullbacks. Summarized, we have \[\Sigma_A \dashv A^* \dashv \Pi_A.\] One can convince oneself that this makes sense by drawing a couple of diagrams.

I think these semantics have first been described by Seely. “Locally Cartesian Closed Categories and Type Theory”, and have since then often been discussed; in particular, there is a *coherence problem* (things that hold strictly in type theory can, by default, only be described up to isomorphism in category theory). The (arguable) cleanest solution for this problem is a universe construction by Voevodsky. The plan is that more detailed semantics, together with this construction, the presheaf of simplicial sets, and univalence, will be the topics of the Type & Homotopy Theory meetings, as well as the Sheaf Theory seminars, in the next two weeks.

by *Venanzio* on **December 15, 2012**

Today is the 100th anniversary of the birth of Reuben Goodstein. It's a very young age for someone immortalized by his voyages into the transfinite realm. A few months ago I talked about his theorem during an FP lunch. After some wait, here is a blog post about it.

Every natural number \(x\) can be put in *Cantor normal form* with respect with any base \(b\). First we write \(x\) in base \(b\), as a sum of powers of \(b\) multiplied by coefficients smaller than \(b\), for example, if \(x=68630377369258\) and \(b=3\) we have \[
68630377369258 = 3^{29} + 3^7\cdot 2 + 1
\] We recursively write the exponents in the same base: \[
29 = 3^3+2 = 3^{3^1}+2, \quad 7 = 3\cdot 2 +1 = 3^1\cdot 2+1
\] Thus the Cantor normal form of 68630377369258 in base 3 is: \[
68630377369258 = 3^{3^{3^1}+2} + 3^{3^1\cdot 2+1}\cdot 2 + 1
\] So a Cantor normal form in base b expresses a number x as \(x = b^{c_0} \cdot a_0 + b^{c_1} \cdot a_1 + \cdots + b^{c_n} \cdot a_n\) where \(c_0\), ..., \(c_n\) are themselves Cantor normal forms.

Abstractly, generalizing over the base, it's a list of pairs of coefficients and exponents, which are themselves Cantor normal forms. Here it is expressed as a Haskell data type with functions converting between it and naturals with respect to a given base.

```
data Cantor = Cantor [(Integer, Cantor)]
deriving Show
-- (cantNum c b) computes the number with CNF c in base b
cantNum :: Cantor -> Integer -> Integer
cantNum (Cantor l) = cantNumL l
cantNumL :: [(Integer,Cantor)] -> Integer -> Integer
cantNumL [] _ = 0
cantNumL ((i,c):cs) base = i*base^(cantNum c base) + cantNumL cs base
-- (numCant b x) computes the CNF of x in base b
numCant :: Integer -> Integer -> Cantor
numCant base = Cantor . map (\(c,n) -> (c,numCant base n)) . (inBase base)
iLog :: Integer -> Integer -> Integer
iLog base n = iLogAux 0 1
where iLogAux i k = if (k*base)>n then i else iLogAux (i+1) (k*base)
inBase :: Integer -> Integer -> [(Integer,Integer)]
inBase base 0 = []
inBase base n = (n `div` k,i) : inBase base (n `rem` k)
where i = iLog base n
k = base^i
```

Now take a Cantor normal form (CNF), change the base and recompute the number. For example, earlier we had the CNF of \(68630377369258\) in base 3. If we change the base to 4 we obtain \[ \begin{array}{l} 3^{3^{3^1}+2} + 3^{3^1\cdot 2+1}\cdot 2 + 1 \leadsto 4^{4^{4^1}+2} + 4^{4^1\cdot 2+1}\cdot 2 + 1\\ = 21452492687908155359318439997129353803966985312947829\\ \quad 40435769830995482244811767516288299887706704548430405\\ \quad 09730983776813660062124991145119142938384097869825. \end{array} \] The number has become much bigger, so let's make it smaller by subtracting one! \[ \begin{array}{l} 21452492687908155359318439997129353803966985312947829\\ 40435769830995482244811767516288299887706704548430405\\ 09730983776813660062124991145119142938384097869825 - 1 \\ = 21452492687908155359318439997129353803966985312947829 \\ \quad 40435769830995482244811767516288299887706704548430405 \\ \quad 09730983776813660062124991145119142938384097869824. \end{array} \] This is one step of the Goodstein procedure: find the Cantor normal form of a number \(x\) with respect to a base \(b_1\), change the base to \(b_2\), recompute the number and subtract one. The Goodstein sequence associated with a number is obtained by iterating the step increasing the base by one every time.

```
-- (goodstep b1 b2 n) first finds the CNF of n in base b1
-- then recomputes it in base b2 and subtracts 1
goodstep :: Integer -> Integer -> Integer -> Integer
goodstep base1 base2 n = cantNum (numCant base1 n) base2 - 1
-- Given a list of bases and a number n
-- we iterate the Goodstein step on n using all the bases
goodsteinGen :: [Integer] -> Integer -> [Integer]
goodsteinGen _ 0 = []
goodsteinGen (b1:b2:bs) n = n : goodsteinGen (b2:bs) (goodstep b1 b2 n)
goodstein :: Integer -> [Integer]
goodstein = goodsteinGen [2..]
```

Let's compute a few:

```
*Main> goodstein 1
[1]
*Main> goodstein 2
[2,2,1]
*Main> goodstein 3
[3,3,3,2,1]
```

Now do what I did: run it on 4 on your university's server and enjoy the screams of frustration of your colleagues when everything hangs up.

Although computing the Goodstein sequence for 4 is not feasible in practice, one can prove that it terminates.

*The Goodstein procedure terminates on every starting number.*

Although the theorem has a simple formulation and a short elegant proof, it cannot be proved in Peano Arithmetic: it essentially needs transfinite induction. It is probably the simplest result that is undecidable in PA.

The proof consists in associating an ordinal to every CNF, proving that every step in the process reduces this ordinal and therefore concluding that the process terminates by well-foundedness of the ordinal numbers.

Given a CNF, just compute it using the ordinal \(\omega\) (the ordinal associated to the natural numbers) as a base. The example that we've seen earlier gives: \[ \omega^{\omega^{\omega^1}+2} + \omega^{\omega^1\cdot 2+1}\cdot 2 + 1 \] (Addition and multiplication of transfinite ordinals are not commutative; to get the correct result the coefficients must be multiplied on the right and the terms must be in decreasing order.)

Now it is very easy to see that every step reduces the ordinal. If the CNF contained a non-zero term with the base raised to the power zero (that is, if the number is not divisible by the base) then the associated ordinal is a successor an the step will just give its predecessor. Otherwise there is a bigger decrease in the ordinal (depending on the base used).

In any case, we get a decreasing sequence of ordinals that will eventually reach zero. This proves the theorem.

Since every tower of exponentiation of the base is a valid CNF, we get that every ordinal smaller that \(\epsilon_0\) can occur: \[ \epsilon_0 = \sup \{\omega, \omega^\omega, \omega^{\omega^\omega}, \ldots\}. \] So we are doing \(\epsilon_0\)-induction, which goes beyond elementary arithmetic.

by *Nicolai Kraus* on **December 12, 2012**

Last Friday (07/12/12), I have talked about the difficulties of quotients in Homotopy Type Theory.

Traditionally, the quotient \(A/\sim\) of a set \(A\) by some equivalence relation \(\sim : A \to A \to \textbf{Prop}\) is a set \(Q\) (or, by abuse of notation, just \(A/\sim\) , think of the set of equivalence classes), together with a function \([\_] : A \to Q\) (mapping every element on its equivalence class) and an eliminator. The latter says, more or less, whenever a function \(f : A \to C\) does not distinguish between elements that are \(\sim\)-equivalent, it can be lifted to a function \(\overline f : Q \to C\). For the details, I would recommend Definable Quotients in Type Theory by Nuo, Thomas and Thorsten.

This is a very natural concept and, in some cases, it is necessary. For example, the real numbers cannot be defined without quotients (this statement refers only to Type Theory, but even in classical mathematics, notions as the Cauchy Reals make usage of quotients). However, the concept becomes much more involved as soon as the principle UIP (uniqueness of identity proofs) is dropped. It would make sense to generalise the notion of an equivalence relation, but even if we don't, it is not clear at all what it means for a function to "not distinguish between elements that are equivalent". Without UIP, the actual proofs and their behaviours matter, while we did not have to care about that issue before.

I talked about the example of the possibly simplest equivalence relation, which is constantly \(\textbf{True}\). In that case, the quotient should be the h-propositional reflection (also known as "squash-type" or "bracket-type"), but it is already very hard to formalise the conditions under which a function \(f : A \to C\) can be lifted to \(Q \to C\). Informally, for any finite number of points in \(A\), we can, by induction on the number of points, define the corresponding simplex in \(Q\), and the condition on \(f\) is that these simplices can be built out of coherence proofs. This leads to one of the standard constructions when it comes to \(\omega\)-category theory, and possibly to concepts that are well-known in Homology. Of course, the first goal is the formalisation of the informally described idea.

by *Venanzio* on **December 6, 2012**

The type theory meeting of Wednesday 5 December was a short introduction to induction-recursion with a divagation about impredicativity.

Martin-Löf distinguishes universes à la Russell and universes à la Tarski. The difference is that a Russell-style Universe \(\mathsf{U}\) is a type such that any of its elements \(A:\mathsf{U}\) is itself a type; while a Tarski-style universe is a set \(\mathsf{U}\) of codes together with a decoding function \(\mathsf{Elem}: \mathsf{U}\rightarrow \mathsf{Type}\).

A simple Tarski universe with non-dependent types can be defined by plain induction and the decoding function is given by recursion on it: \[ \begin{array}{l} \mathsf{U}: \mathsf{Type}\\ \mathsf{zero}: \mathsf{U}\\ \mathsf{one}: \mathsf{U}\\ \mathsf{plus}: \mathsf{U}\rightarrow \mathsf{U}\rightarrow \mathsf{U}\\ \mathsf{times}: \mathsf{U}\rightarrow \mathsf{U}\rightarrow \mathsf{U}\\ \end{array} \quad \begin{array}{l} \mathsf{Elem}: \mathsf{U}\rightarrow \mathsf{Type}\\ \mathsf{Elem}\,\mathsf{zero}= \emptyset\\ \mathsf{Elem}\,\mathsf{one}= \{\bullet\}\\ \mathsf{Elem}\,(\mathsf{plus}\,a\,b) = \mathsf{Elem}\,a + \mathsf{Elem}\,b\\ \mathsf{Elem}\,(\mathsf{times}\,a\,b) = \mathsf{Elem}\,a \times \mathsf{Elem}\,b \end{array} \]

But when we want to add constructors \(\mathsf{pi}\) and \(\mathsf{sigma}\) for dependent products and sums to \(\mathsf{U}\), then the definitions of \(\mathsf{U}\) and \(\mathsf{Elem}\) become entangled: \(\mathsf{Elem}\) occurs in the type of the constructors of \(\mathsf{U}\): \[ \begin{array}{l} \mathsf{pi}: \Pi\,a:\mathsf{U}, (\mathsf{Elem}\, a \rightarrow \mathsf{U}) \rightarrow \mathsf{U}\\ \mathsf{sigma}: \Pi\,a:\mathsf{U}, (\mathsf{Elem}\, a \rightarrow \mathsf{U}) \rightarrow \mathsf{U}\\ \ \\ \mathsf{Elem}\, (\mathsf{pi}\,a\,b) = \Pi\,x:\mathsf{Elem}\, a, \mathsf{Elem}\, (b\, x)\\ \mathsf{Elem}\, (\mathsf{sigma}\,a\,b) = \Sigma\,x:\mathsf{Elem}\, a, \mathsf{Elem}\, (b\, x) \end{array} \] This was the first instance of an inductive-recursive definition: the simultaneous specification of an inductive type and a recursive function on it.

The construction can be iterated and generalized. We can define another universe \(\mathsf{U}_1\) that contains \(\mathsf{U}\) by the same recursive-inductive specification plus a code \(\mathsf{u}_0:\mathsf{U}_1\) with associated type \(\mathsf{Elem}\, \mathsf{u}_0 = \mathsf{U}\). In fact, if we already defined an entire family of universes, we can use this construction to create an new universe that contains them all. Eric Palmgren introduced a superuniverse that is closed under this universe-creation rule.

Peter Dybjer further generalized the construction and invented simultaneous inductive-recursive definitions. These allow us to define an inductive type together with a recursive function on it. The function is allowed to appear in the type of the constructors; but there must be syntactic restrictions to guarantee that it is applied to arguments that are smaller than the one generated by the constructor. Later Dybjer and Anton Setzer devised a elegant type of codes for inductive-recursive definitions (we'll look at it on a future meeting).

A simple example is the type of lists without repetitions, by Catarina Coquand. The type is defined simultaneously with a freshness relation that guarantees that we can only add elements that are not already in the list. \[ \begin{array}{l} \mathsf{FList}\, (A:\mathsf{Set}): \mathsf{Set}\\ \mathsf{fnil}: \mathsf{FList}\, A\\ \mathsf{fcons}: \Pi\,x:A, \Pi\,l:\mathsf{FList}\, A, \mathsf{Fresh}\, x\, l \rightarrow \mathsf{FList}\, A\\ \ \\ \mathsf{Fresh}: A \rightarrow \mathsf{FList}\, A \rightarrow \mathsf{Prop}\\ \mathsf{Fresh}\, x\, \mathsf{fnil}= \top\\ \mathsf{Fresh}\, x\, (\mathsf{fcons}\, y\, l\, p) = (x\neq y) \land \mathsf{Fresh}\, x\, l \end{array} \]

Ana Bove and I used inductive-recursive definitions to formalize general recursive functions in type theory. The idea is to characterize the domain of a function by an inductive predicate and then define the function itself by recursion on the proof of the predicate. Take for example the quicksort algorithm. In Haskell it is written as:

```
quicksort :: [A] -> [A]
quicksort [] = []
quicksort (x:l) = quicksort ll ++ x : quicksort lr
where ll = [y | y <- l, y<=x]
lr = [y | y <- l, y>x]
```

The arguments of the recursive calls are not subterms of the principal argument, so this definition is not accepted in systems like Coq or Agda. Instead we can first define an inductive domain predicate: \[ \newcommand{\dnil}{\mathsf{d}_{\mathsf{nil}}} \newcommand{\dcons}{\mathsf{d}_{\mathsf{cons}}} \newcommand{\append}{\mathop{+\!\!+}} \begin{array}{l} \mathsf{D}_{\mathsf{qs}} : \mathsf{List}\, A \rightarrow \mathsf{Prop}\\ \dnil : \mathsf{D}_{\mathsf{qs}}\,\mathsf{nil}\\ \dcons : \Pi\,x:A, \Pi\,l:\mathsf{List}\, A, \mathsf{D}_{\mathsf{qs}}\, l_{\leq x}\rightarrow \mathsf{D}_{\mathsf{qs}}\, l_{>x}\rightarrow \mathsf{D}_{\mathsf{qs}}\, (\mathsf{cons}\, x\, l) \end{array} \] where \(l_{\leq x}\) and \(l_{>x}\) are the lists of elements of \(l\) respectively less or equal and larger than \(x\). The actual sorting algorithm can be defined by structural recursion on the proof that the list satisfies the domain predicate: \[ \begin{array}{l} \mathsf{qs}: \Pi\,l:\mathsf{List}\,A, \mathsf{D}_{\mathsf{qs}} \rightarrow \mathsf{List}\,A\\ \mathsf{qs}\,\mathsf{nil}\,\dnil = \mathsf{nil}\\ \mathsf{qs}\,(\mathsf{cons}\,x\,l)\,(\dcons\,x\,l\,p_1,p_2) = (\mathsf{qs}\,l_{\leq x}\,p_1) \append \mathsf{cons}\,x\,(\mathsf{qs}\,l_{>x}\,p_2) \end{array} \]

In this case the function happens to be total: we can prove that the domain predicate is always satisfied, \(H:\forall l:\mathsf{List}\,A, \mathsf{D}_{\mathsf{qs}}\,l\). Therefore, we can eliminate the dependency on the proof: \(\mathsf{quicksort}: \mathsf{List}\,A \rightarrow \mathsf{List}\,A\), \(\mathsf{quicksort}\,l = \mathsf{qs}\,l\,(H\,l)\).

If the recursive function we try to implement has nested recursive calls, then we need induction-recursion. Take for example McCarthy's 91 function:

```
mccarthy :: Int -> Int
mccarthy n | n > 100 = n - 10
| otherwise = mccarthy (mccarthy (n+11))
```

The function always terminates and returns 91 on every argument less or equal to 101. However, the recursive calls are made on arguments that may be larger than the original input and are nested. We can formalize it in type theory in the same style as the quicksort algorithm:

\[ \newcommand{\mcc}{\mathsf{mc}} \newcommand{\dgr}{\mathsf{d}_{0}} \newcommand{\dls}{\mathsf{d}_{1}} \begin{array}{l} \mathsf{D}_{\mcc} : \mathbb{N}\rightarrow \mathsf{Prop}\\ \dgr : \Pi\,n:\mathbb{N}, n>100 \rightarrow \mathsf{D}_{\mcc}\,n\\ \dls : \Pi\,n:\mathbb{N}, n\leq 100 \rightarrow \\ \qquad \Pi\,h: \mathsf{D}_{\mcc}\,(n+11), \mathsf{D}_{\mcc}\,(\mcc\,(n+11)\,h) \rightarrow \mathsf{D}_{\mcc}\,n \\ \ \\ \mcc : \Pi\,n:\mathbb{N}, \mathsf{D}_{\mcc}\,n \rightarrow \mathbb{N}\\ \mcc\,n\,(\dgr\,n\,p) = n - 10\\ \mcc\,n\,(\dls\,n\,p\,h_1\,h_2) = \mcc\,(\mcc\,(n+11)\,h_1)\,h_2 \end{array} \] In this case we used the function \(\mcc\) in the type of the constructor \(\dls\), which makes it a simultaneous inductive-recursive definition.

I've been a bit too casual in defining the domain predicates as functions to \(\mathsf{Prop}\). There are issues related to the consistency of elimination rules between type universes. You can simply adopt the Curry-Howard correspondence in full and replace \(\mathsf{Prop}\) everywhere with \(\mathsf{Set}\). However, if instead you have a separate impredicative universe of proposition (as, for example, in Coq), then there are consistency problems.

*Impredicativity* refers to the possibility of quantifying over a type universe and obtaining an element of that same universe. In general, the formation rule of product types can be formulated as follows: \[
\dfrac
{A:s_1 \quad x:A \vdash B[x]:s_2}
{\Pi\,x:A, B[x] : s_3}
\] where \(s_1\), \(s_2\) and \(s_3\) are sorts (type universes). One can define a *Pure Type System* by specifying a set of such rules simply by giving the triples \((s_1,s_2,s_3)\) and some axioms in the form \(s_1:s_2\). We have that a sort \(s_1\) is impredicative if there is an axiom \(s_1:s_2\) and a rule for \(\Pi\,\)-types with triple \((s_2,s_1,s_1)\). In our specific case, \(s_1 = \mathsf{Prop}\) and \(s_2 = \mathsf{Set}\): we can quantify over any set, including the set of all propositions.

The most famous impredicative theory is Girard's *system F*. Girard proved that the normalization property of his system was equivalent to the consistency of second order Peano Arithmetic, that is, analysis. He then proved normalization with the new technique of *reducibility candidates*.

Although pure type systems only have function types, in an impredicative theory we can encode inductive types by the polymorphic quantification of their iterators. For example, the natural numbers can be defined in system F as \[ \mathbb{N}= \Pi\,X:\mathsf{Set}, X \rightarrow (X\rightarrow X) \rightarrow X \] (In this case it is the sort \(\mathsf{Set}\) that's impredicative).

If \(\mathsf{Prop}\) is impredicative, eliminating an inductive proposition over a set like I have done is forbidden. Allowing this kind of elimination in general would lead to a contradiction, a version of Girard's paradox described by Thierry Coquand.

However, there is a nice trick to get around it; it was taught to me by Christine Paulin. There is no contradiction if the proposition you eliminate has at most one proof. This is in fact the case for our domain predicates. To convince Coq to accept our formalization we need to modify it slightly by defining destructors for the proofs, for example: \[ \newcommand{\getlx}{\mathsf{get}_{\leq x}} \newcommand{\getgx}{\mathsf{get}_{>x}} \begin{array}{l} \getlx : \Pi\,x:A, \Pi\,l:\mathsf{List}\,A,\mathsf{D}_{\mathsf{qs}}\, (\mathsf{cons}\, x\, l) \rightarrow \mathsf{D}_{\mathsf{qs}}\, l_{\leq x}\\ \getlx\,(\mathsf{cons}\, x\, l)\,(\dcons\,x\,l\,h_1\,h_2) = h_1\\ \ \\ \getgx : \Pi\,x:A, \Pi\,l:\mathsf{List}\,A, \mathsf{D}_{\mathsf{qs}}\, (\mathsf{cons}\, x\, l) \rightarrow \mathsf{D}_{\mathsf{qs}}\, l_{\leq x}\\ \getgx\,(\mathsf{cons}\, x\, l)\,(\dcons\,x\,l\,h_1\,h_2) = h_2\\ \ \\ \end{array} \] Then we use these destructors to extract the subproofs when defining the function: \[ \begin{array}{l} \mathsf{qs}\,(\mathsf{cons}\,x\,l)\,h =\\ \qquad (\mathsf{qs}\,l_{\leq x}\,(\getlx\,x\,l\,h)) \append \mathsf{cons}\,x\,(\mathsf{qs}\,l_{>x}\,(\getgx\,x\,l\,h)). \end{array} \]

Per Martin-Löf,

*Intuitionistic type theory*, Napoli, Bibliopolis, 1984.Eric Palmgren,

*On universes in type theory*, in: G. Sambin and J. Smith (eds.) Twenty Five Years of Constructive Type Theory, Oxford Logic Guides, 1998.Peter Dybjer,

*A General Formulation of Simultaneous Inductive-Recursive Definitions in Type Theory*, J. Symb. Log. 65(2): 525-549 (2000).Peter Dybjer and Anton Setzer,

*A Finite Axiomatization of Inductive-Recursive Definitions*, TLCA 1999:129-146.Ana Bove and Venanzio Capretta,

*Nested General Recursion and Partiality in Type Theory*, in TPHOLs 2001, LNCS 2152, pages 121-135.Henk Barendregt,

*Lambda calculi with Types*, in: Handbook of Logic in Computer Science, Vol 2, OUP, (1992), pp 117-309.Jean-Yves Girard, Yves Lafont and Paul Taylor,

*Proofs and Types*, Cambridge Tracts in Theoretical Computer Science, 1989Thierry Coquand,

*An Analysis of Girard's Paradox*, LICS 1986: 227-236.

by *Paolo Capriotti* on **November 28, 2012**

Today we discussed about isomorphisms and weak equivalences in type theory.

There are at least two ways to define the notion of equivalence between two types `X`

and `Y`

.

The first, which we will call **isomorphism**, is a tuple `(f, g, H, K)`

, with:

```
f : X → Y
g : Y → X
H : (x : X) → g (f x) ≡ x
K : (y : Y) → f (g y) ≡ y
```

i.e. essentially a pair of functions and pair of homotopies. This corresponds to the notion of *homotopy equivalence* in homotopy theory.

The other possible definition is the usual one by Voevodsky: a function `f : X → Y`

is a weak equivalence if the following type is inhabited:

`(y : Y) → h₀ (Σ X λ x → f x ≡ y)`

i.e. every fiber is contractible.

We can show that these two notions are logically equivalent. Going from a weak equivalence to an isomorphism is quite straightforward. Given a weak equivalence `f`

, unfolding the definition gives that, for all `y : Y`

, we have a triple

`(g y, K y, ϕ y)`

with

```
g y : X
K y : f (g y) ≡ y
ϕ y : (x : X)(p : f x ≡ y) → (x, p) ≡ (g y, K y)
```

So we automatically get an inverse `g`

and a homotopy `K`

, and we only need to find an `H`

. So we define:

`H x = cong proj₁ (ϕ (f x))`

and we're done.

The other direction is trickier. We first prove the type-theoretic equivalent of **Vogt's lemma**:

*Given an isomorphism (f, g, H, K), we can find K', of the same type as K, and γ : (x : X) → cong f (H x) ≡ K' (f x).*

The outline of the proof is as follows: we first show that the following diagram of homotopies

\[ \newcommand{\ra}[1]{\kern-1.5ex\xrightarrow{\ \ #1\ \ }\phantom{}\kern-1.5ex} \newcommand{\ras}[1]{\kern-1.5ex\xrightarrow{\ \ \smash{#1}\ \ }\phantom{}\kern-1.5ex} \newcommand{\da}[1]{\bigg\downarrow\raise.5ex\rlap{\scriptstyle#1}} \begin{array}{c} f g f g f & \ra{fHgf} & f g f \\ \da{fgKf} & & \da{K f} \\ f g f & \ra{f H} & f \\ \end{array} \]

commutes. In fact, it is quite easy to see that we can replace `fHgf`

with `fgfH`

in the top arrow. Similarly, we can replace `fgKf`

with `Kfgf`

in the left arrow. Then commutativity of the diagram follows immediately from the interchange law (i.e. the fact that horizontal composition of homotopies is well defined).

Next, we can observe that `f`

appears on the right of every arrow except the bottom one, hence if we define:

\[ K' = K \circ f H g \circ (f g K)^{-1} \]

it's clear that

\[ K' f \equiv f H \]

\(\square\)

Now we're ready to show that any isomorphism is a weak equivalence. Given any isomorphism `(f, g, H, K)`

, Vogt's lemma implies that we can assume the existence of `γ : (x : X) → cong f (H x) ≡ K (f x)`

(we say that the isomorphism is **coherent**, in this case). Again, a weak equivalence is given by a triple `(g, K, ϕ)`

, and we have `g`

and `K`

already, so we only need to provide `ϕ`

.

Using the fact that equality in a \(Σ\)-type is the same as a \(Σ\) of equalities, and given `x : X`

, `y : Y`

and `p : f x ≡ y`

, we define

```
q : x ≡ g y
q = (H x)⁻¹ ; cong g p
contr : subst (λ x → f x ≡ y) q p ≡ K y
contr = ? -- see below
ϕ x y p = q , contr
```

The proof `contr`

consists simply of algebraic manipulations on paths:

```
subst (λ x → f x ≡ y) q p
≡⟨ substituting on an equality is the same as composing ⟩
cong f q⁻¹ ; p
≡⟨ definition of q ⟩
cong f (cong g p⁻¹) ; cong f (H x) ; p
≡⟨ coherence ⟩
cong (g ∘ f) p⁻¹ ; K (f x) ; p
≡⟨ interchange law ⟩
K y ; p⁻¹ ; p
≡⟨ groupoid law ⟩
K y
```

Questions raised:

- What's an example of non-coherent isomorphism, i.e. a quadruple
`(f, g, H, K)`

for which there exists no corresponding`γ`

? - Is the logical equivalence above (restricted to coherent isomorphisms) an isomorphism itself?
- It is possible to define a dual notion of coherence by swapping
`f`

and`g`

. Fortunately, one coherence is enough to prove the other, but it raises a question: is it possible to have non-coherent coherence proofs (and so on)?

Update:

An answer to 1 can be found in this agda file, which uses our work in progress base library for HoTT

by *Venanzio* on **November 19, 2012**

There is a jail containing a countably infinite number of inmates. Because of budget cuts, the government is closing it down. The prisoners will either be all released or all executed. The decision depends on whether they pass a certain test.

On the fatal day, the inmates are allowed to agree a common strategy beforehand, but they can't communicate during the test itself. They will be put in a (very) long line and coloured hats, either black or red, will be put on their heads. A prisoner cannot see the colour of his own hat, but he can see the hats of all the infinite people in front of him. Afterwards, each of them is taken to a separate room and must guess his own hat's colour.

If only a finite number of them guess wrongly, then they pass the test and they all go free. But if they make an infinite number of wrong guesses, then they will all be executed.

Now: do you think that the inmates can agree on a strategy that will guarantee their success?

**Note:** These prisoners have very good eyesight: they can see an infinite sequence of hats in front of them. They also can compute everything that exists in classical mathematics, even if it's not computable by finite means. They have a very very large memory. In other words, each of them has the mind of God.

Here is how the prisoners can agree on a winning strategy beforehand. Let's define an equivalence relation on the infinite sequences of hat colours: Two sequences are equivalent if they differ only on a finite number of positions. It is easy to verify that this is an equivalence relation. Therefore it partitions the set of all sequences into equivalence classes. The prisoners agree on a choice of a specific representative for each equivalence class. At the time of the test, each prisoner can see the the whole sequence except the initial positions up to his own. This is enough to determine the equivalence class. Then the prisoner will give as answer the colour given to its position by the chosen representative in that class. This ensures that they all choose according to the same representative sequence, and this sequence is equivalent to the actual sequence of hat colours. These will then differ only on a finite number of positions, that is, there will be only a finite number of mistakes.

This solution is so clever that it's not measurable by human standards. In fact, here is a proof by Christian that there cannot be a measurable strategy.

Let \(C = 2^{\mathbb{N}}\) denote the hat configuration space. Let there be a fixed strategy \(s : C \to C\) with non-negative probability of winning with the constraint that each prisoner cannot see their own and previous hats, i.e. \(s\ c\ n = s\ d\ n\) if \(c\ m = d\ m\) for all \(m > n\) (1). Abbreviate \[A = \{ c : C \mid \{ n : \mathbb{N} \mid c\ n = \operatorname{false} \}\ \text{finite}\ \},\] \[B_n = \{ c : C \mid c\ m = \operatorname{true}\ \text{forall}\ m \geq n \}.\] Since \(A = \bigcup_{n : \mathbb{N}} B_n\), the probability of the prisoners winning is given by \[\mathbb{P}(s(X) \operatorname{eq} X \in A) = \lim_n \mathbb{P}(s(X) \operatorname{eq} X \in B_n),\] where \(X\) is a random variable denoting the hat configuration, and \(\operatorname{eq}\) is the equality operator on booleans (or hat colours). Formally, \(\mathbb{P}(X \in D)\) is defined to mean \(\mu(D)\), where \(\mu : \sigma(C) \to [0, 1]\) is the measure generated by \(\mu(S_n) = \frac{1}{2}\) and \(\sigma(C)\) is the \(\sigma\)-algebra generated by \(S_n\) where \(S_n := \{d\in C \mid d\ n = \operatorname{true}\}\) for \(n : \mathbb{N}\). Here, in talking about \(\mathbb{P}(s(X) \operatorname{eq} X \in B_n)\), we have crucially assumed that \(s\) is a measurable function.

Fix \(n : \mathbb{N}\) such that \(\epsilon := \mathbb{P}(s(X) \operatorname{eq} X \in B_n) > 0\) and abbreviate \(t\ c = s\ c \operatorname{eq} c\). For \(m > n\), let \(E_m\) denote \(t^{-1}(B_n)\) with all its elements flipped at hat \(m\).

The \(E_m\) are all disjoint: Suppose \(E_{m_1}\) and \(E_{m_1}\) have a common element. This means that there are two sequences \(c_1, c_2 \in t^{-1}(B_n)\) whose only differences on indices larger that \(n\) are at positions \(m_1\) and \(m_2\). The last difference between them is at \(n' = \max(m_1,m_2) > n\), so \(c_1\ k = c_2\ k\) for \(k>m\). By (1) this implies that \(s\ c_1\ n' = s\ c_2\ n'\). We also know (because \(n'>n\)) that \(t\ c_1\ n' = t\ c_2\ n' = \operatorname{true}\), that is, \(c_1\ n' = s\ c_1\ n' = s\ c_2\ n' = c_2\ n'\). This contradict the assumption that \(c_1\) and \(c_2\) differ at position \(n'\).

But then \[1 \geq \sum_{m > n} \mu(E_m) = \sum_{m > n} \mu(t^{-1}(B_n)) = \infty \cdot \epsilon = \infty,\] a contradiction.

Hence, the strategy must not be measurable. This suggests a Banach-Tarski-like solution.

--Christian