*Wednesday February 28 2024.*

In some previous post I expressed interest in autoformalization. I’m not so sure anymore that it’s all that powerful…

Formalizing is not that hard, proof synthesis is.

I think about the success of chess-playing algorithms that learned by playing against themselves. Is self-play the key? But there is no playing against the self here; we have to generate the data in some other way…?

*Monday February 26 2024.*

A birth-death process is a (continuous-time) Markov chain where the state-space is \(N=\{0,1,2,\cdots\}\) and where for every \(n\) there is a transition to \(n+1\) (birth) and a transition to \(n-1\) (death), except \(n=0\) which can only transition to \(n=1\).

Each of these transitions has a rate assigned, \(\alpha_n\) for the birth rate from \(n\) to \(n+1\) and \(\beta_n\) for the death rate from \(n+1\) to \(n\). For example, if \(n\) was the number of individuals with an infectious disease in a closed system, then \(\alpha_0=0\) because at least 1 individual is needed to spread, making \(n=0\) an absorbing state.

You can convert this to a *discrete-time* Markov chain by defining \[
p_n = \frac{\alpha_n}{\alpha_n+\beta_{n-1}}
\] and \[
q_n = \frac{\beta_n}{\alpha_{n+1}+\beta_n}
\] assuming the denominators \(\alpha_n+\beta_{n-1}\) and \(\alpha_{n+1}+\beta_n\) are nonzero for all \(n\). In this model, \(p_n\) is the probability of transitioning from \(n\) to \(n+1\) and \(q_n\) is the probability of transitioning from \(n+1\) to \(n\).

What if you have a dynamical system on a state space \(X\) and a map \(f:X \to \mathbb{N}\) assigning each state to a natural number? Can you reconstruct a birth-death process from this? It appears so, here’s how:

For each \(n \in N\) let \(X_n\) be the preimage of \(n\). (The \(X_n\) are automatically disjoint, since \(f\) is a function.) We should assume \(X_n\) only borders \(X_{n-1}\) and \(X_{n+1}\), that is, the closures only meet for adjacent indices: \[ |m-n| \geq 2 \implies \overline{X_m} \cap \overline{X_n} = \emptyset, \quad \forall m,n \in N \] so that an orbit leaving \(X_n\) can only enter \(X_{n-1}\) or \(X_{n+1}\).

Then, for each \(n\) and for each \(x_0 \in X_n\), the orbit \(x(t)\) starting from \(t=0\) at \(x_0\) can either 1) remain in \(X_n\) forever, 2) leave into \(X_{n-1}\), or 3) leave into \(X_{n+1}\). Thus partition \(X_n\) into \(A_n \cup A_n^- \cup A_n^+\) according to these three cases. (Assume \(A_0^-=\emptyset\).)

At this point, we also need a measure on \(X\). In fact let’s just assume we have a measure-preserving dynamical system so that we can measure (some) subsets \(S \subseteq X\) and get a real number \(\mu(S)\) between 0 and 1 in return, with \(\mu(X)=1\) and \(\mu(\emptyset)=0\).

If we assume for all \(n\) that \(A_n^+\) and \(A_n^-\) are measurable, and \(\mu(A_n^+ \cup A_n^-)\neq 0\), then we can define \[ p_n = \frac{\mu(A_n^+)}{\mu(A_n^+ \cup A_n^-)} \] and \[ q_n = \frac{\mu(A_n^-)}{\mu(A_n^+ \cup A_n^-)} \] for all \(n\). In other words, for a given \(n\), \(p_n\) is the proportion of exiting trajectories that leave upward, and \(q_n\) the proportion that leave downward. This seems to yield a valid discrete-time birth-death process. :)

TODO: shouldn’t this apply to continuous time too..? In that case i think you integrate the time-to-exit over \(A_n^\pm\).
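
A quick numerical sketch of the construction above (all names are mine, and I take \(\mu\) to be the uniform measure on each cell): approximate \(\mu(A_n^\pm)\) by sampling initial points uniformly in the cell \(X_n = [n, n+1)\) of a flow on the half-line, following each orbit until it exits, and counting upward versus downward exits.

```python
import random

def estimate_pq(v, n, n_samples=1000, dt=1e-3, t_max=100.0, seed=0):
    """Monte Carlo estimate of (p_n, q_n) for the cell X_n = [n, n+1)
    under the flow dx/dt = v(x).

    Sample initial points uniformly in X_n, integrate with forward Euler
    steps, and record whether each orbit first exits upward (into X_{n+1})
    or downward (into X_{n-1}). Orbits that never leave before t_max are
    counted as staying (the set A_n)."""
    rng = random.Random(seed)
    up = down = 0
    for _ in range(n_samples):
        x = n + rng.random()          # uniform sample in [n, n+1)
        t = 0.0
        while t < t_max:
            x += v(x) * dt            # forward Euler step
            t += dt
            if x >= n + 1:
                up += 1
                break
            if x < n:
                down += 1
                break
    exits = up + down
    if exits == 0:
        return None                   # mu(A_n^+ ∪ A_n^-) = 0: undefined
    return up / exits, down / exits

# Sanity check: under a constant rightward drift every orbit exits
# upward, so p_n = 1 and q_n = 0.
p, q = estimate_pq(lambda x: 1.0, n=3, n_samples=100)
print(p, q)   # -> 1.0 0.0
```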

*Friday February 2 2024.*

When agents are allowed to evolve in pursuit of a goal (e.g. multi-agent reinforcement learning), the following “theorem” seems to apply:

- purely competitive agents will never communicate
- purely cooperative agents will tell-all to their peers

In situations of mixed cooperation (like inter-species communication), we should expect some mixture of these extremes. But where is this explicitly stated and/or proved? It probably lies somewhere in signaling theory. Some papers:

- Learning to communicate with deep multi-agent reinforcement learning (2016) by Jakob Foerster and corresponding video lecture
- Generalized beliefs for cooperative AI (2022)

*Thursday February 1 2024.*

Some people are working on “language agents” which try to turn GPT into an agent by giving it the ability to take “actions” in some way, and rolling this into an action-perception loop.

I think this is mostly bullshit, but these systems do seem to have the ability to take a very short text prompt and turn it into working software with high probability…

Some interesting projects include:

- Generative agents: interactive simulacra of human behavior
- Microsoft’s Autogen
- ChatDev: communicative agents for software development

*Monday January 29 2024.*

Let N be the Petri net (equivalently chemical reaction network) with a single species X and two transitions, b: ∅ → X and d: X → ∅. If you assign each of these transitions a *reaction rate*, say \(r_b\) and \(r_d\) respectively, then this defines a birth-death process where (using Wikipedia’s notation) \(\lambda_n = r_b\) and \(\mu_n=nr_d\). The system has a steady state (in the probabilistic sense) when \(\lambda_n=\mu_n\), i.e. when \(n = \frac{r_b}{r_d}\), the ratio of the two rates and the “carrying capacity” of the model.
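
As a sanity check, here is a small Python sketch (function name mine) that computes the stationary distribution from the detailed-balance relation \(\lambda_n \pi_n = \mu_{n+1}\pi_{n+1}\); it comes out Poisson with mean \(r_b/r_d\):

```python
import math

def stationary_dist(r_b, r_d, n_max):
    """Stationary distribution of the birth-death process with
    lambda_n = r_b and mu_n = n * r_d. Detailed balance,
    lambda_n * pi_n = mu_{n+1} * pi_{n+1}, gives pi_n ∝ (r_b/r_d)^n / n!,
    i.e. a Poisson distribution with mean r_b / r_d (truncated at n_max)."""
    rho = r_b / r_d
    weights = [rho**n / math.factorial(n) for n in range(n_max + 1)]
    z = sum(weights)
    return [w / z for w in weights]

pi = stationary_dist(r_b=3.0, r_d=1.0, n_max=60)
mean = sum(n * p for n, p in enumerate(pi))
print(round(mean, 6))   # -> 3.0, the "carrying capacity" r_b / r_d
```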

Extreme technical bit incoming. Let C be the commutative monoidal category associated to this Petri net (see Proposition 3.3 in arXiv:2101.04238), and let S be any dynamical system, regarded as a category, where objects are states and morphisms are transitions. Define a “birth-death system” to be any functor S → C.

I wonder if this has a name? It seems to extend the notion of birth-death process because it not only encodes the probabilities of transition between different n but also *tracks each individual*, since any time-evolution in the dynamical system S is mapped to a morphism in C, and every such morphism is generated by (consists of sequential and parallel compositions of) b and d.

*Sunday January 7 2024.*

You may know the prisoners’ dilemma as a classic example in game theory. But for whatever reason, this example never resonated with me and I didn’t really get the point. Then, I was reading a book about evolutionary game theory (forgot which oops) that describes the public goods game and claimed it is a *generalization* of the prisoner’s dilemma. So, if we can understand one, we can understand the other!

In the public goods game, each player starts with some fixed allowance - for simplicity, let’s assume \(n\) is the number of players and they all get the same allowance of 100. Each player then has a *choice*: what *proportion* of their allowance to keep, and what proportion to put into a *public pool*. After all players have made the choice, each player’s score is the sum of 1) the part of their allowance they kept, and 2) the total amount in the public pool, divided by the number of players, multiplied by some factor \(R\).

What is the meaning of \(R\)? It sort of models the beneficial effect of cooperation. When \(R=1\), the public pool has no effect, and there’s no reason for anyone to contribute. When \(R>1\), however, the situation becomes more interesting. For concreteness let’s assume \(R=2\). Then:

- If no one contributes anything to the pool (and everyone keeps their whole allowance), then each player gets a score of 100.
- If everyone contributes their whole allowance to the pool, then the pool contains \(100 \times n\), which is multiplied by \(R=2\) to give \(200 \times n\); when this is distributed equally among the players, each receives a score of 200.

Everyone contributing their entire allowance is thus the state which maximizes everybody’s score.

Unfortunately, this situation is not a *Nash equilibrium*, and we can see why: suppose all your peers contribute their full allowance, and so do you. Let’s name the corresponding score as the cooperation score: \[
\mathsf{score_{cooperate}}= 200
\] Alternatively, assuming your peers will contribute, you could choose to defect: to keep all your allowance and contribute none to the pool. In that case your score is given by your allowance, plus the pooled contributions of the remaining \(n-1\) players. \[
\mathsf{score_{defect}} = 100 + \frac{200 \times (n-1)}{n}
\] As you may have guessed by now, \[
\mathsf{score_{defect}} > \mathsf{score_{cooperate}}
\] (indeed \(\mathsf{score_{defect}} = 300 - \frac{200}{n}\), which exceeds 200 whenever \(n>2\)). As a result, the “rational” behavior of each player is to keep all their allowance. And thus the Nash equilibrium, the equilibrium that would presumably be reached by repeated competitive play, is the one where no player contributes anything, and everyone receives a score of 100. How pitiful!

The public goods game reduces to the prisoner’s dilemma if you take the number of players to be 2 (also reduce \(R\) to 1.5, because \(R\) needs to be strictly between 1 and the number of players for the dilemma to appear), and, instead of allowing any split, require players to put all or nothing into the pool.

In that case, there are only 3 distinct outcomes:

- If they both cooperate, they both get 150 (neither prisoner tattles).
- If they both defect, they both get 100 (both tattle).
- If one cooperates and one defects, the sucker (cooperator) gets 75 (their own allowance times 1.5 over 2) and the defector gets 175 (100 kept + 75 from the sucker).

Just as before, even though they would both benefit from cooperating, the Nash equilibrium is to defect.
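
The payoffs above are easy to script. Here is a minimal sketch (function name mine) that reproduces both the \(R=2\) public goods numbers and the 2-player prisoner’s dilemma table:

```python
def public_goods_payoffs(contributions, R, allowance=100):
    """Score for each player in the public goods game: the part of the
    allowance they kept, plus an equal share of the pooled total
    multiplied by R."""
    n = len(contributions)
    pool = sum(contributions) * R
    return [allowance - c + pool / n for c in contributions]

# Everyone cooperates (full contribution), R = 2, n = 4: each scores 200.
print(public_goods_payoffs([100] * 4, R=2))
# One player defects against three cooperators: the defector scores
# 100 + 200*(n-1)/n = 250, matching the score_defect formula.
print(public_goods_payoffs([0, 100, 100, 100], R=2))

# The 2-player all-or-nothing case with R = 1.5 recovers the
# prisoner's dilemma payoffs from the text: 150/150, 100/100, 75/175.
print(public_goods_payoffs([100, 100], R=1.5))
print(public_goods_payoffs([0, 0], R=1.5))
print(public_goods_payoffs([100, 0], R=1.5))
```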

*Thursday December 28 2023.*

```
structure Monoid where
  element : Type
  op : element -> element -> element
  identity : element
  identity_left : forall x: element, op identity x = x
  identity_right : forall x: element, op x identity = x
  associativity : forall x y z: element, op x (op y z) = op (op x y) z
```

The above code defines a new type called `Monoid`. In order to construct a term of this type, you need to provide the six fields listed.

- `element` is some type. Lean has a lot of builtin types, such as integers or natural numbers. You could also pass a custom type here.
- `op` is a function that takes two terms of type `element` and returns a result, another term of type `element`. In other words, it is a binary operation on `element`.
- `identity` is some specific term of type `element`.
- The remaining three, `identity_left`, `identity_right`, and `associativity`, are *propositions*.

Thus, in order to construct a term of type `Monoid`, you not only need to provide an underlying set of elements and a binary operation with an identity, you also need to present *proofs* of the corresponding propositions that they satisfy the conditions of a monoid.

For example, Lean4 comes with the builtin type of natural numbers, `Nat`. Lean4 also comes with the builtin function `Nat.add`, which has type `Nat -> Nat -> Nat`, which is exactly what we want for the binary operation, and `Nat.zero`, which is a term of type `Nat`.
```
example: Monoid := {
  element := Nat,
  op := Nat.add,
  identity := Nat.zero,
  identity_left := by {intros; rw [Nat.add_eq, Nat.zero_eq, Nat.zero_add]},
  identity_right := by {intros; rw [Nat.add_eq, Nat.zero_eq, Nat.add_zero]},
  associativity := by {intros; rw [Nat.add_eq, Nat.add_eq, Nat.add_eq, Nat.add_eq, Nat.add_assoc]}
}
```

The identity and associativity proofs are a little convoluted, and in truth I just found them via trial and error in an interactive prover. But it should not be surprising they are short proofs, because the fact that the natural numbers form a monoid under addition is one of their fundamental properties.

As another example, we could also define the monoid of natural numbers under multiplication. The `element` field will be the same, but the operation will be `Nat.mul` and the identity is 1 instead of 0. There is no `Nat.one`; instead we can write `Nat.succ Nat.zero`, which is the application of `Nat.succ`, a function, to `Nat.zero`, an element. In English: the successor of zero. (We could also simply write `1` but this would involve type inference.)
```
example: Monoid := {
  element := Nat,
  op := Nat.mul,
  identity := Nat.succ Nat.zero,
  identity_left := by {intros; rw [Nat.mul_eq, Nat.one_mul]},
  identity_right := by {intros; rw [Nat.mul_eq, Nat.mul_one]},
  associativity := by {intros; rw [Nat.mul_eq, Nat.mul_eq, Nat.mul_eq, Nat.mul_eq, Nat.mul_assoc]}
}
```

For many algebraic objects, you want to look at *structure-preserving maps* between them. For monoids, these are called *monoid homomorphisms* (they are also sometimes called *monoid morphisms* since they are the morphisms in the category of monoids). In short, a monoid homomorphism is a function between two monoids that “preserves” the binary operation, as well as the identity element.

```
structure Monoid.morphism (M N: Monoid) where
  function: M.element -> N.element
  identity_preserving: function M.identity = N.identity
  operation_preserving: forall x y: M.element, function (M.op x y) = N.op (function x) (function y)
```

This code defines, for any two monoids `M` and `N`, a new type `Monoid.morphism M N`, the type of monoid homomorphisms from `M` to `N`. To construct a term of this type, you need to provide the function between the underlying element types as well as proofs it is identity- and operation-preserving.

Next, we can define, for any monoid `M`, the identity morphism from `M` to itself.
```
def Monoid.morphism.id (M: Monoid): Monoid.morphism M M := {
  function := fun x => x,
  identity_preserving := by simp,
  operation_preserving := by simp
}
```

Note that the input type is `Monoid`, while the output type is a `Monoid.morphism M M`. The type of the output depends on the input value - that’s a dependent type!

Luckily the proofs of structure-preserving-ness were simple enough that Lean could handle them “by simp”.

We can also define the composition of two monoid morphisms:

```
def Monoid.morphism.comp (f: Monoid.morphism M1 M2) (g: Monoid.morphism M2 M3): Monoid.morphism M1 M3 := {
  function := fun x => g.function (f.function x),
  identity_preserving := by {simp; rw [f.identity_preserving, g.identity_preserving]},
  operation_preserving := by {intros; simp; rw [f.operation_preserving, g.operation_preserving]}
}
```

This definition takes as input two monoid homomorphisms, from `M1` to `M2` and from `M2` to `M3`, and returns a monoid homomorphism from `M1` to `M3`. Note that the types of `M1, M2, M3` are actually implicit in the definition. It would be more verbose to explicitly introduce them, but Lean4 allows this shorthand.

**TODO**: some interesting theorem to cap off this post.

*Thursday December 28 2023.*

I was reading about allopatric speciation, which occurs in evolution when two groups of members of some species become geographically isolated from each other. The evolutionary paths of the two groups diverge and eventually they are considered different species.

I guess some biologists interpret geographical location as a genomic variable, or, in the other direction, have generalized their view of the genome to include geography. In that case, allopatric and sympatric speciation (which occurs without geographic separation) are really the same, just with different trait variables.

Then, I was reading this other paper where they modeled the evolution of population genotypes as a non-local reaction-diffusion PDE, and I saw the connection to the replicator-mutator equation, which I first heard about in this Youtube video, “Biology as information dynamics” by John Baez.

The reaction-diffusion equation is often written as follows, which I will call equation 1: \[
\frac{du}{dt} = \Delta u + f(u)
\] I actually don’t like this notation, and prefer the following, which will be equation 2: \[
\frac{du}{dt} = \Delta u + f \circ u
\] The difference is subtle, but in equation 2, f is applied *pointwise* to the field u - this corresponds to a *local* reaction term. This is usually the situation being described even when equation 1 is written.

In an evolutionary model, the fitness of each individual doesn’t just depend on the population density in a neighborhood of their genotype, it depends on the whole ecology, that is, the whole distribution of genotypes. This is actually better reflected by equation 1, where the reaction term depends, at each point in genospace, on the *entire* distribution of genotypes across the whole genospace.

I almost forgot to mention, \(\Delta\) is the Laplace operator, and models diffusion over the genospace as a result of mutation.
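
A discretized sketch makes the local/nonlocal distinction concrete (all names are mine; forward Euler on a periodic 1-D grid):

```python
def step(u, reaction, dx=1.0, dt=0.1):
    """One forward-Euler step of du/dt = Laplacian(u) + reaction term on
    a periodic 1-D grid. `reaction` receives the whole field and returns
    a field, so it can be local (applied pointwise, equation 2) or
    nonlocal (depending on the entire field, equation 1)."""
    n = len(u)
    lap = [(u[i - 1] - 2 * u[i] + u[(i + 1) % n]) / dx**2 for i in range(n)]
    r = reaction(u)
    return [u[i] + dt * (lap[i] + r[i]) for i in range(n)]

# Local reaction (equation 2): logistic growth applied pointwise.
local = lambda u: [x * (1 - x) for x in u]
# Nonlocal reaction (equation 1): growth limited by the *total*
# population over the whole genospace, as in the evolutionary models.
nonlocal_reaction = lambda u: [x * (1 - sum(u)) for x in u]

u = [0.1] * 10
u1 = step(u, local)
print(len(set(u1)))   # -> 1: a constant field stays spatially constant
```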

*Thursday December 21 2023.*

Wikipedia defines a semiautomaton as

- a set \(S\) of states,
- a set \(I\) of inputs,
- a function \(S \times I \to S\) that updates the state given an input. Let’s call it \(\mathsf{step}\).

This is a deterministic automaton: when put in an initial state \(s_0\) and given a sequence of inputs \(i_0, i_1, i_2, \dots\), the resulting sequence of states is uniquely determined by the recurrence relation \(s_{t+1} = \mathsf{step}(s_t,i_t)\), where \(t=0,1,2,\cdots\).

A fair warning, everything I’m about to say is covered in depth on the Wikipedia page linked above. Alas…

Let \(\mathrm{Seq}(S)\) denote the set of all possible sequences of states, and likewise \(\mathrm{Seq}(I)\) the set of possible sequences of inputs. What we have described is a function \(\mathrm{run}: S \times \mathrm{Seq}(I) \to \mathrm{Seq}(S)\). The property that \(\mathrm{run}\) satisfies is illustrated by the following commutative diagram:

In other words, starting in the top left corner with an initial state/input sequence pair \((s_0,(i_0,i_1,i_2,\dots)) \in S \times \mathrm{Seq}(I)\), the following two processes do the same thing:

- First apply \(\mathrm{run}\), yielding the sequence \((s_0,s_1,s_2,\dots)\), and then take the *tail* of that sequence, which is \((s_1,s_2,s_3,\cdots)\).
- Split the input sequence into its head/tail pair, leading to \((s_0,i_0,(i_1,i_2,i_3,\dots))\). Then apply \(\mathsf{step}\) to \((s_0,i_0)\) in the first two positions to get \((s_1,(i_1,i_2,i_3,\dots))\). Finally apply \(\mathrm{run}\) to get \((s_1,s_2,s_3,\cdots)\).

We can view the update rule as an action of \(I\) on \(S\). This uniquely determines a monoid action of the free monoid on I, on S.
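
In code, \(\mathrm{run}\) is one fold away from \(\mathsf{step}\), and the free monoid action falls out of the same recurrence. A minimal sketch (the toy step function is mine):

```python
from itertools import accumulate
from functools import reduce

def run(step, s0, inputs):
    """Unfold s_{t+1} = step(s_t, i_t) into the state sequence (s_0, s_1, ...)."""
    return list(accumulate(inputs, step, initial=s0))

def act(step, word, s0):
    """The action of a word in the free monoid on I: feed the inputs in
    sequence and keep only the final state."""
    return reduce(step, word, s0)

# Toy semiautomaton: states are integers mod 3; "inc" adds 1, "reset" -> 0.
def step(s, i):
    return 0 if i == "reset" else (s + 1) % 3

print(run(step, 0, ["inc", "inc", "inc", "reset", "inc"]))  # -> [0, 1, 2, 0, 0, 1]

# The commutative diagram: tail(run(s0, i)) = run(step(s0, i0), tail(i)).
inputs = ["inc", "inc", "reset", "inc"]
assert run(step, 0, inputs)[1:] == run(step, step(0, inputs[0]), inputs[1:])

# Concatenating input words corresponds to composing their actions.
w1, w2 = ["inc", "inc"], ["reset", "inc"]
assert act(step, w1 + w2, 0) == act(step, w2, act(step, w1, 0))
```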

*Tuesday November 7 2023.*

One definition of a “spatio-temporal” dynamical system is that you have

- a set \(X\) of points called the spatial domain,
- a set \(Y\) of states,
- a monoid \(T\) representing the time variable,
- an evolutionary operator (monoid action) \[f:T \times S \to S\] where \(S \subseteq (X \to Y)\) is a subset of functions \(X \to Y\).

For example, heat diffusion in 3 dimensions, where \(X = \mathbb{R}^3\), \(Y=\mathbb{R}\), \(T=[0,\infty)\), and the evolutionary operator \(f^t\) involves the heat kernel. Another example is Conway’s Game of Life, where \(X = \mathbb{Z}^2\), \(Y=\{0,1\}\), and \(T=\mathbb{N}\).
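
As a concrete illustration of an evolutionary operator, here is a sketch of one step of the Game of Life (representing a state \(u:\mathbb{Z}^2 \to \{0,1\}\) by its set of live cells is my choice):

```python
from collections import Counter

def life_step(alive):
    """One application of the Game of Life's evolutionary operator f^1,
    acting on a state u: Z^2 -> {0,1} represented by its (finite) set of
    live cells."""
    # Count, for every cell, how many of its 8 neighbours are alive.
    counts = Counter((x + dx, y + dy)
                     for (x, y) in alive
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # A cell is alive next step if it has 3 live neighbours, or has 2
    # live neighbours and is currently alive.
    return {c for c, k in counts.items() if k == 3 or (k == 2 and c in alive)}

blinker = {(0, 0), (1, 0), (2, 0)}
print(life_step(blinker))   # the vertical blinker {(1, -1), (1, 0), (1, 1)}
```

One can check directly that `life_step` commutes with translations of \(\mathbb{Z}^2\) — the kind of spatial symmetry discussed next.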

Another important property of spatiotemporal systems is that they often have spatial symmetries, in the sense that there is a group \(G\) along with a group action \((g,x) \mapsto g \ast x\), such that the corresponding dynamical system is *equivariant* with respect to the action of \(G\), meaning \[
f(t,g\ast x)=g\ast f(t,x)
\]

Definition 1: Let \(G\) be a group, let \(X\) be a set, and let the mapping \((g, x) \mapsto g \ast x\) be a group action of \(G\) on \(X\). The group action is called *transitive* if for any \(x_1, x_2 \in X\), there exists \(g \in G\) such that \(g \ast x_1 = x_2\).

By “vacuum state” I just mean a constant state, i.e. a function \(u: X \to Y\) such that there exists some \(y \in Y\) with \(u(x) = y\) for all \(x \in X\). Some people just write \(u \equiv y\). Anyway, because of the transitive group action, if \(u\) is constant, then so is \(f(t,u)\) for all \(t \in T\). Let’s prove it!

Lemma 1: Suppose the group action \((g,x) \mapsto g\ast x\) is transitive. Then \(u \in Y^X\) is constant iff. \(u = g\ast u\), \(\forall g \in G\).

Proof: For the forward direction, suppose \(u\) is constant, \(u \equiv y\), and let \(g \in G\). Then \(u(x)=(g\ast u)(x)=y\) for all \(x \in X\), so \(g\ast u = u\). For the other direction, suppose \(g\ast u=u\) for all \(g \in G\). We want to show \(u\) is constant, so let \(x_1,x_2 \in X\). Since \(G\) is transitive, we know there exists \(g \in G\) such that \(x_2 = g\ast x_1\). Thus \(u(x_2) = u(g\ast x_1) = (g\ast u)(x_1) = u(x_1)\). Thus, \(u\) is constant. QED.

The next lemma shows that if the group action is transitive, and the dynamical system is equivariant with respect to the group action, then constant states will remain constant.

Lemma 2: Suppose the action \((g,x) \mapsto g \ast x\) is transitive, and suppose \(f\), the action of \(T\) on \(Y^X\), is \(G\)-equivariant, i.e. \(f(t,g\ast u) = g\ast f(t,u)\) for all \(u \in Y^X\), \(g \in G\), and \(t \in T\). Then if \(u \in Y^X\) is constant, so is \(f(t,u)\) for any \(t \in T\).

Proof: Suppose \(u\) is constant, i.e. \(u \equiv y\) for some \(y \in Y\). Now let \(t \in T\). We wish to show \(f(t,u)\) is constant. By Lemma 1, it suffices to show \(g\ast f(t,u)=f(t,u)\) for all \(g \in G\). So let \(g \in G\). By Lemma 1, since \(u\) is constant we know \(g\ast u = u\). By assumption we have \(g\ast f(t,u)=f(t,g\ast u)\). Replacing \(g\ast u=u\), we have \(g\ast f(t,u) = f(t,u)\), the desired result. QED.

Corollary: If the action \((g,x) \mapsto g \ast x\) is transitive and the action \(f\) of \(T\) on \(Y^X\) is \(G\)-equivariant, then there is a unique monoid action \(\phi:T\times Y \to Y\) satisfying

\[ f \circ (1_T \times i) = i \circ \phi \]

where \(i: Y \to Y^X\) sends each \(y \in Y\) to the constant state \(u \equiv y\).

*Sunday November 5 2023.*

Why free monoids? Just because I think they are cool. For example, consider a *discrete-time dynamical system*: you have a set \(X\) of states and a function \(f:X \to X\) which updates the state. Any initial state \(x_0 \in X\) has a trajectory \(\{x_0,x_1,x_2,\cdots\}\) given by \(x_t = f^t(x_0)=f(f(\cdots(f(x_0))\cdots))\) (\(f\) applied \(t\) times). It obeys \(f^{s+t}(x_0) = f^s(f^t(x_0))\), which is derived from the fact that the set of natural numbers \(\mathbb{N}=\{0,1,2,\cdots\}\) along with the operation of addition forms a *monoid*, sometimes denoted \((\mathbb{N},+)\) to emphasize that the monoid consists of both a set *and* the operation. This monoid is special because it is the *free monoid on 1 generator*.

Given a set \(S=\{x,y,z\}\) of “symbols”, you can form the *free monoid* on \(S\) (aka the free monoid generated by \(S\)) which consists of all the finite sequences of elements of \(S\). These are sometimes called “words” of elements in \(S\), and the set of all words is often denoted by \(S^*\), where \(*\) is the Kleene star. For example, some finite sequences of the aforementioned \(S\) could be written as \([x,x]\), \([x,y]\), \([y,x]\), \([x,x,x]\), \([x,x,x,x]\), the empty list \([]\), etc. The *monoid operation* on these lists is to concatenate them, e.g. \([x,y] + [y,z] = [x,y,y,z]\). The empty list is an identity element for this operation because concatenating it to any list has no effect.

Returning to the case of 1 generator, suppose \(S\) contains just one symbol, \(S = \{ x \}\). Then the free monoid on \(S\) consists of \(S^*=\{[], [x], [x,x], [x,x,x], [x,x,x,x],\cdots\}\). Notice that

- each element of this monoid can be represented by its length
- if you concatenate elements of length \(m\) and \(n\), the result has length \(m+n\)

The two properties described above characterize the monoid of natural numbers. What we have just shown is that the free monoid on 1 generator *is isomorphic to* the monoid of natural numbers. In fact, if we label the list containing \(n\) copies of \(x\) by \([x]^n\), then the map \(n \mapsto [x]^n\) is the isomorphism!
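
This isomorphism is easy to see in code, with words as Python lists (a small sketch; the helper name is mine):

```python
def word(n):
    """The isomorphism n -> [x]^n from (N, +) to the free monoid on {x}."""
    return ["x"] * n

# List concatenation is the monoid operation; lengths add.
m, n = 3, 4
assert word(m) + word(n) == word(m + n)   # [x]^m ++ [x]^n = [x]^(m+n)
assert [] + word(n) == word(n)            # the empty word is the identity
print(len(word(m) + word(n)))             # -> 7
```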

In the language of category theory, there is a functor \(F:Set \to Mon\) from the category of sets to the category of monoids called the free monoid functor. As we just showed, \(F(1) \cong (\mathbb{N},+)\). Being a functor, if \(A\) and \(B\) are two sets with the same cardinality (same size) then \(F(A)\) is isomorphic to \(F(B)\). Also, for any sets \(X\) and \(S\), the actions of \(F(S)\) on \(X\) are in one-to-one correspondence with maps \(S \to (X \to X)\). This is why earlier we could specify an action of \(\mathbb{N}\) on \(X\) by choosing some point \(1 \to (X \to X)\), i.e. some function \(f: X \to X\).

In computer science these are called transition systems, and they are apparently typically defined in terms of a *graph* where each vertex represents a state of the system, and each vertex has a labelled set of outgoing edges going to other states. With non-determinism, you already have plenty of interesting problems, such as *speedrunning*: suppose my initial state is \(x_0 \in X\) and I have a target state \(x^* \in X\) - what sequence of actions will get me there quickest?
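
Here is a breadth-first-search sketch of the speedrunning problem (the toy state space and action names are mine):

```python
from collections import deque

def speedrun(step, actions, x0, target):
    """Breadth-first search for a shortest sequence of actions taking
    the transition system from x0 to target (None if unreachable)."""
    queue = deque([(x0, [])])
    seen = {x0}
    while queue:
        x, path = queue.popleft()
        if x == target:
            return path
        for a in actions:
            y = step(x, a)
            if y not in seen:
                seen.add(y)
                queue.append((y, path + [a]))
    return None

# Toy system: states are integers mod 10; you can increment or double.
def step(x, a):
    return (x + 1) % 10 if a == "inc" else (2 * x) % 10

path = speedrun(step, ["inc", "dbl"], x0=1, target=8)
print(path)   # a shortest route, of length 3
```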

*Saturday October 28 2023.*

Suppose you want to solve a **linear partial differential equation** of order \(m\) in \(n\) variables: for example when \(n=3\) and \(m=2\) you might be considering the wave equation in 2 dimensions: \[
\frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} - \frac{\partial^2 u}{\partial y^2} = 0
\] This equation can be written as \(L[u]=0\), where \(L\) is the linear differential operator \(L[u] = \partial_t^2[u] - \partial_x^2[u] - \partial_y^2[u]\). (Yes, \(L\) is a polynomial over \(\partial_t,\partial_x,\partial_y\), and yes, it is order \(2\) because this polynomial has degree \(2\).)

In general, given some open set \(U \subseteq \mathbb{R}^n\) and a linear differential operator \(L:C^m(U,\mathbb{R}) \to C^0(U,\mathbb{R})\), you want to find solutions \(u \in C^m(U,\mathbb{R})\) such that \(L[u]=0\). Because \(L\) is linear, the set of all such solutions is itself a vector space called the kernel (or null space) of \(L\). In notation, if \(S\) is the set of solutions to the equation \(L[u] = 0\), then \(S = \mathrm{kernel}(L) = \{ u \in C^m(U,\mathbb{R}) \mid L[u]=0 \}\). It’s not hard to check these solutions form a vector space: try it! (Solution: because \(L\) is linear, for any \(u,u' \in S\), \(L[u+u']=L[u]+L[u']=0+0=0 \implies u+u' \in S\), and for any \(u \in S\) and \(c \in \mathbb{R}\), \(L[c\cdot u]=c\cdot L[u]=c\cdot 0=0 \implies c\cdot u \in S\). And don’t forget: \(0 \in S\) because \(L[0]=0\).)

To pose an initial value problem (IVP) you need one of the dimensions to be time - say it’s the first dimension. Now let \(D = \{(0,x) \mid x \in \mathbb{R}^{n-1}\}\) be the subset of the domain where \(t=0\). Given some initial data \(f: D \to \mathbb{R}\), the initial condition becomes \(u|_D=f\), so the IVP is \[ L[u] = 0, \quad u|_D = f.\] We can write the solution space of this problem as \(S = \mathrm{kernel}(L) \cap R(D,f)\), where \(R(D,f)=\{u \mid u|_D =f\}\) (R is for restriction). \(S\) is not a vector space, because if \(u,u' \in S\) are solutions, then \((u+u')|_D = u|_D + u'|_D = f+f=2f\), which is not equal to \(f\) unless \(f=0\). What we have instead is that \(S\) is an affine space over \(\mathrm{kernel}(L) \cap R(D,0)\). This follows from the more general argument that \(R(D,f)\) is an affine space over \(R(D,0)\) - try proving it! (solution: if \(u \in R(D,f)\) and \(v \in R(D,0)\) then \((u+v)|_D = u|_D + v|_D = f+0 = f\).)

An initial boundary value problem (IBVP) in \(n\) variables might take the form \[ L[u] = 0, \quad u|_{t=0} = f, \quad u|_{x=0} = g \] where \((x,t) \in \mathbb{R}^{n-1} \times (0,\infty)\), \(f:\mathbb{R}^{n-1} \to \mathbb{R}\), and \(g:(0,\infty) \to \mathbb{R}\). (For an example take Shearer & Levy 4.2.2.) So, we have some initial condition \(f\) along with some forcing term \(g\) which is applied at \(x=0\) for all time \(t\). We want to find a solution \(u \in \mathrm{kernel}(L) \cap R(D_1,f) \cap R(D_2,g)\), where \(D_1 = \mathbb{R}^{n-1}\) and \(D_2=(0,\infty)\).

Sometimes, as in the case of the wave equation, splitting the problem into two other IBVPs leads to a solution. We consider the IBVPs \[ L[u] = 0, \quad u|_{t=0} = 0, \quad u|_{x=0} = g \] and \[ L[u] = 0, \quad u|_{t=0} = f, \quad u|_{x=0} = 0 \] In the first, the initial condition is replaced with zero, and in the second the boundary condition is replaced with zero.

We can show that if \(u_1\) is a solution to the first and \(u_2\) is a solution to the second, then \(u_1+u_2\) is a solution to the original IBVP, i.e. it belongs to the solution space \(S = \mathrm{kernel}(L) \cap R(D_1,f) \cap R(D_2,g)\) where \(D_1=\mathbb{R}^{n-1}\) and \(D_2 = (0,\infty)\). If \(u_1 \in \mathrm{kernel}(L) \cap R(D_1,f) \cap R(D_2,0)\) and \(u_2 \in \mathrm{kernel}(L) \cap R(D_1,0) \cap R(D_2,g)\), then

- \(u_1+u_2 \in \mathrm{kernel}(L)\), since \(u_1,u_2 \in \mathrm{kernel}(L)\) and it is a vector space,
- \(u_1+u_2 \in R(D_1,f)\), since \(u_1 \in R(D_1,f)\), \(u_2 \in R(D_1,0)\), and \(R(D_1,f)\) is an affine space over \(R(D_1,0)\),
- \(u_1+u_2 \in R(D_2,g)\), since \(u_1 \in R(D_2,0)\), \(u_2 \in R(D_2,g)\), and \(R(D_2,g)\) is an affine space over \(R(D_2,0)\).

Thus \(u_1+u_2 \in S\), so it solves the IBVP.

Appendix: it seems like we are using the following theorem: if \(\{(D_i,f_i)\}\), \(i \in I\), is a (finite) set of pairs where each \(D_i \subseteq U\) and \(f_i:D_i \to \mathbb{R}\), and \(\{u_i\}_{i \in I} \subseteq C(U,\mathbb{R})\) satisfies \(u_i \in R(D_i,f_i) \cap \bigcap_{j \neq i} R(D_j,0)\) for all \(i\), then \(\sum_i u_i \in \bigcap_i R(D_i,f_i)\).

But it’s just a special case of the more general fact about affine spaces: if \(\{X_i\}\), \(i \in I\) is a (presumably finite) collection of subspaces of some vector space \(X\), and \(\{A_i\}\) is a collection of affine spaces, each \(A_i\) affine over \(X_i\), and if \(\{a_i\}\) is a collection such that \(a_i \in A_i\) and \(a_i \in X_j\) for all \(j \neq i\), and \(a = \sum_i a_i\), then \(a \in A_i\) for all \(i\).

*Friday October 27 2023.*

I’ve been wanting to learn to use Lean4 for a while, but it’s so weird and hard. I followed the instructions on this page and ran the following command:

`curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh`

then rebooted my machine.

Next, the page said I need to install an editor like VS Code. But last I checked, programming languages should not depend on an editor.

At this point, if we run `elan` or `lean` in the terminal, they seem to be installed. What I would like to do is run `lean example.lean` and have it return something. So, I followed the instructions on this page. It said to run `lake +leanprover/lean4:nightly-2023-02-04 new my_project math`, which makes a new folder called `my_project`. Then I went into `my_project` and ran `lake update`, which “downloaded component: lean” (~200Mb). Then I ran `lake exe cache get` and downloaded a bunch more files. Next we make a new folder, `MyProject`, inside `my_project`, and create the file `test.lean`. In this file I wrote

```
import Mathlib.Topology.Basic
#check TopologicalSpace
```

Then in the terminal I ran the command `lean test.lean`.

The output was

```
test.lean:1:0: error: unknown package 'Mathlib'
test.lean:3:7: error: unknown identifier 'TopologicalSpace'
test.lean:3:0: error: unknown constant 'sorryAx'
```

So clearly I did something wrong.

*Thursday March 2 2023.*

If \(A\) and \(B\) are sets, then so is \(B^A\), the set of functions \(A \to B\). In general, we have that \(B^A\) is an exponential object in the category of sets. The superscript notation is justified when \(A\) and \(B\) have finitely many elements, because then \(| B^A | = |B|^{|A|}\), where vertical bars denote the cardinality of a set.

Given some fixed set \(S\), consider the mapping \(F:A \mapsto A^S\). This defines an endofunctor on the category of sets: if \(f:A \to B\) is a function between sets \(A\) and \(B\), then \(F(f)\) is the corresponding map \(A^S \to B^S\) which takes a function \(g\in A^S\) and returns \(f \circ g \in B^S\).

Likewise, there is a contravariant functor \(G:A \mapsto S^A\) which sends every function \(f : A \to B\) to the map \(S^B \to S^A\), which takes each \(g \in S^B\) and returns \(g \circ f \in S^A\). In the case when \(S = 2\), the set with two elements, we can regard functions \(A \to 2\) as subsets of \(A\). Thus, the mapping \(A \mapsto 2^A\) is the contravariant power set functor. Given two sets \(A\) and \(B\), and their power sets \(2^A\) and \(2^B\), the power set functor sends any map \(f:A \to B\) to the preimage map \(2^B \to 2^A\) given by \(U \mapsto f^{-1}(U) = \{a \in A \mid f(a) \in U\}\) for each subset \(U \in 2^B\).
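
A sketch of the contravariant power set functor in Python, with subsets represented as sets (the helper name is mine):

```python
def preimage_map(f, A):
    """Send f: A -> B to the preimage map 2^B -> 2^A of the contravariant
    power set functor: U |-> f^{-1}(U) = {a in A | f(a) in U}."""
    return lambda U: {a for a in A if f(a) in U}

A = {0, 1, 2, 3, 4, 5}
parity = lambda a: a % 2      # a map A -> 2 = {0, 1}, i.e. a subset of A
Finv = preimage_map(parity, A)
print(Finv({0}))              # -> {0, 2, 4}, the even elements of A
```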

*Friday February 3 2023.*

Proving mathematical theorems on computers is still apparently pretty hard. The most well-known successes are maybe the four color theorem, the Kepler conjecture, and the Liquid Tensor Experiment. One interesting metric is Freek Wiedijk’s list of 100 mathematical theorems and their formalization progress in various languages. Currently Isabelle and HOL Light are tied for first at 87, but my personal favorite is Lean with its extensive mathematical library, mathlib.

While *automated theorem proving* (ATP) refers to automatically finding the *proofs* of theorems, given their statements, *autoformalization* refers to converting natural language into formal mathematics. It’s no surprise that autoformalization didn’t get as much attention as ATP until recently, because without *language models* like GPTs, it’s basically impossible!

One can imagine an ideal AF system: input an old mathematics paper and out comes its formalization in some formal language of your choosing. Not only that, but once you've formalized all the existing papers, you get a whole formal database to search! Anyway, the main obstacle seems to be *lack of labeled data*, i.e. natural language/formal statement pairs. There seems to be hope, though: even if you only encode the *statements* of theorems by hand (not their proofs), this would still be useful, as you could retroactively rediscover many proofs, maybe using the natural language proof as a conditioning variable.

Sites like Math Stack Exchange (MSE) have a pretty strict question/answer format where every question needs to be well-defined enough to be answered precisely. But this is precisely the form of the natural language theorem/proof pairs needed to train an AF system. Could MSE assist in this effort by allowing volunteers to transcribe questions and answers into their corresponding formalizations?

First Experiments with Neural Translation of Informal to Formal Mathematics by Qingxiang Wang, Cezary Kaliszyk, and Josef Urban (2018).

Autoformalization with Large Language Models by Yuhuai Wu, Albert Q. Jiang, Wenda Li, Markus N. Rabe, Charles Staats, Mateja Jamnik, and Christian Szegedy (2022).

A Promising Path Towards Autoformalization and General Artificial Intelligence by Christian Szegedy (2020). (paywalled… wait a minute, didn’t this guy invent batch normalization…?)

Exploration of Neural Machine Translation in Autoformalization of Mathematics in Mizar by Qingxiang Wang, Chad Brown, Cezary Kaliszyk, and Josef Urban (2019).

The work on univalent foundations seems relevant, but I can’t see how. Need to explore more.

*Sunday November 20 2022.*

```
K = [
    [1, 1, 1],
    [1, 8, 1],
    [1, 1, 1]
]
```

```
def f(u: int) -> int:
    # Activation: 1 if the cell is alive in the next generation, else 0.
    if u == 3 or u == 10 or u == 11:
        return 1
    else:
        return 0
```

The idea is that we are given an array `U` of 1s and 0s, which is the state of Game of Life, and at each point we count up the number of live neighbors, adding 8 if the center is alive and 0 otherwise, and store these values in a new array `V` of the same shape. This corresponds to the Game of Life as follows:

- center dead, 3 live neighbors ⇔ v = 3
- center alive, 2 live neighbors ⇔ v = 10
- center alive, 3 live neighbors ⇔ v = 11

You can represent other cellular automata like this too - see arXiv:1809.02942. I realized this after playing around with neuralpatterns.io (**flashing colors warning**), which lets you choose the kernel and activation function in-browser.
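The whole scheme can be sketched in a few lines of numpy. I use `np.roll` for the neighbor sum, which wraps around the edges - so a toroidal board is my assumption here, just to keep the sketch short:

```python
import numpy as np

def step(U: np.ndarray) -> np.ndarray:
    """One Game of Life step via the kernel-plus-activation scheme above."""
    # Live-neighbor count: shift U in all 8 directions and sum. np.roll
    # wraps around the edges, i.e. a toroidal board (my assumption).
    V = sum(np.roll(np.roll(U, dy, axis=0), dx, axis=1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0))
    V = V + 8 * U  # +8 if the center cell is alive
    # Activation f: alive next step iff v is 3, 10, or 11.
    return np.isin(V, (3, 10, 11)).astype(int)

# A glider on a 5x5 board; after 4 steps it has moved one cell down-right.
U = np.zeros((5, 5), dtype=int)
U[0, 1] = U[1, 2] = U[2, 0] = U[2, 1] = U[2, 2] = 1
```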

*Sunday November 6 2022.*

I started by reading the raw html of my friends' websites, or of random internet people I thought were cool. Eventually I started hosting a page on neocities where I wrote the html from scratch. I also made a portfolio on Github Pages in html. I used Jekyll for a while, and it is good; many people use it for their portfolios and such. But I found myself spending too much time learning how to use it… and I realized that I could just write a static site generator by myself using **pandoc**. So, that is what I do now: I have some folders with markdown files (like the one I am writing in now) and a Python script that loops through them, converts each to html using pandoc, saves the output to a ./static/ folder, and then deploys all that to Netlify.
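A minimal sketch of such a build loop, assuming pandoc is on the PATH - the folder names and pandoc flags here are invented for illustration, not my actual script:

```python
import subprocess
from pathlib import Path

# Hypothetical layout: markdown sources in posts/, generated html in static/.
SRC = Path("posts")
OUT = Path("static")

def out_path(md: Path) -> Path:
    # posts/foo.md -> static/foo.html
    return OUT / md.with_suffix(".html").name

def build() -> None:
    OUT.mkdir(exist_ok=True)
    for md in sorted(SRC.glob("*.md")):
        # Let pandoc do the markdown -> html conversion.
        subprocess.run(
            ["pandoc", str(md), "--standalone", "--output", str(out_path(md))],
            check=True,
        )

# build()  # run when pandoc is installed and posts/ exists
```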

Edit: click here to view script used to build site.