freemonoid.xyz


One in a million chance

Tuesday May 14 2024.

If something has one-in-a-hundred chance, and you attempt it a hundred times, it is guaranteed to happen right?

Not quite, turns out it is only \(\approx 63\%\).

#tries chance
0 0%
1 1%
2 1.99%
3 2.97%
100 63.3%
200 86.6%
500 99.3%

You can see why by the following calculation:

\[ 1 - \left( 1 - \frac{1}{100}\right)^{100} \,=\, 0.6339... \]

In fact, replace a one-in-hundred with one-in-million, and the result will be about the same after a million attempts:

\[ 1 - \left( 1 - \frac{1}{1000000}\right)^{1000000} \,=\, 0.6321... \]

So what is this magic number? You can find out by computing the limit as \(n\to\infty\), making use of the following exponential identity:

\[ \lim_{n\to\infty} \left( 1 + \frac{x}{n} \right)^{n} \,=\, e^x \]

Compute the limit by using the above identity with \(x=-1\):

\[ \lim_{n\to\infty} 1 - \left( 1 - \frac{1}{n}\right)^{n} \quad = \quad 1 - e^{-1} \quad = \quad 0.63212055... \]

One in a million chance

Tuesday May 14 2024.

If something has one-in-a-hundred chance, and you attempt it a hundred times, it is guaranteed to happen right?

Not quite, turns out it is only \(\approx 63\%\).

#tries chance
0 0%
1 1%
2 1.99%
3 2.97%
100 63.3%
200 86.6%
500 99.3%

You can see why by the following calculation:

\[ 1 - \left( 1 - \frac{1}{100}\right)^{100} \,=\, 0.6339... \]

In fact, replace a one-in-hundred with one-in-million, and the result will be about the same after a million attempts:

\[ 1 - \left( 1 - \frac{1}{1000000}\right)^{1000000} \,=\, 0.6321... \]

So what is this magic number? You can find out by computing the limit as \(n\to\infty\), making use of the following exponential identity:

\[ \lim_{n\to\infty} \left( 1 + \frac{x}{n} \right)^{n} \,=\, e^x \]

Compute the limit by using the above identity with \(x=-1\):

\[ \lim_{n\to\infty} 1 - \left( 1 - \frac{1}{n}\right)^{n} \quad = \quad 1 - e^{-1} \quad = \quad 0.63212055... \]



Tuesday May 14 2024. If something has one-in-a-hundred chance, and you attempt it a hundred times, it is guaranteed to happen right?

Not quite, turns out it is only \(\approx 63\%\).

#tries chance
0 0%
1 1%
2 1.99%
3 2.97%
100 63.3%
200 86.6%
500 99.3%

You can see why by the following calculation:

\[ 1 - \left( 1 - \frac{1}{100}\right)^{100} \,=\, 0.6339... \]

In fact, replace a one-in-hundred with one-in-million, and the result will be about the same after a million attempts:

\[ 1 - \left( 1 - \frac{1}{1000000}\right)^{1000000} \,=\, 0.6321... \]

So what is this magic number? You can find out by computing the limit as \(n\to\infty\), making use of the following exponential identity:

\[ \lim_{n\to\infty} \left( 1 + \frac{x}{n} \right)^{n} \,=\, e^x \]

Compute the limit by using the above identity with \(x=-1\):

\[ \lim_{n\to\infty} 1 - \left( 1 - \frac{1}{n}\right)^{n} \quad = \quad 1 - e^{-1} \quad = \quad 0.63212055... \]

Public key encryption: a quicky in diagrams

Thursday May 9 2024.

TLDR: I explain public key (asymmetric) encryption using string diagrams. For the standard references, read New directions in cryptography by Diffe and Hellman (1976).

In symmetric key encryption you have 3 types of data:

and two one-way functions,

Given a message \(x\) and a key \(k\), one computes the encrypted message \(y = e(x,k)\) and can only recover \(x\) using the same key via \(x = d(y,k)\); that is, they satisfy

\[ d(e(x, k), k) = x \]

In fact, even stronger, a message encrypted with key \(k\) can only be decrypted using \(k\) again, no other key \(k'\) will work:

\[ d(e(x, k), k') = x \quad \iff \quad k = k', \quad \forall k' \]

The symmetric communication protocol goes like this: Alex wants to send a message \(x\) to Blair, and we assume they both know the key \(k\) beforehand. Alex computes the ciphertext \(y = e(x, k)\) and sends it across the channel. Blair receives the ciphertext and recovers the plaintext via \(x = d(y, k)\). Simple! Here is a diagram showing which parties observe which data:

Notice how the communication channel (highlighted in red) only sees the ciphertext \(y\), and not the plaintext \(x\) nor the key \(k\).

On the other hand, asymmetric encryption like RSA and ECC does not assume Alex and Blair share the key \(k\) beforehand. In this scenario, we have four data types:

and three one-way functions:

If \(k\) is a private key and \(p=p(k)\) is the corresponding public key, then \[ d(e(x, p(k)), k) = x \] In fact, even stronger as before: \[ d(e(x, p(k)), k') = x \quad \iff \quad k = k', \quad \forall k' \]

The asymmetric communication protocol now goes like this: Alex wants to send a message \(x\) to Blair, but they don’t share a private key. So, Blair takes their private key \(k\) and generates their public key \(p = p(k)\), then sends it across the channel to Alex. Alex then computes \(y = e(x, p)\) and returns the encrypted message back across the channel to Blair, who then recovers the original message via \(x = d(y, k)\).

As before, notice how the channel only sees the ciphertext and Blair’s public key, but not the original message nor Blair’s private key \(k\); furthermore, Alex never sees Blair’s private key either!

Of course, I never described how any such one-way functions with the above properties could be constructed. That’s where all the prime number bullshit comes in!

Public key encryption: a quicky in diagrams

Thursday May 9 2024.

TLDR: I explain public key (asymmetric) encryption using string diagrams. For the standard references, read New directions in cryptography by Diffe and Hellman (1976).

In symmetric key encryption you have 3 types of data:

and two one-way functions,

Given a message \(x\) and a key \(k\), one computes the encrypted message \(y = e(x,k)\) and can only recover \(x\) using the same key via \(x = d(y,k)\); that is, they satisfy

\[ d(e(x, k), k) = x \]

In fact, even stronger, a message encrypted with key \(k\) can only be decrypted using \(k\) again, no other key \(k'\) will work:

\[ d(e(x, k), k') = x \quad \iff \quad k = k', \quad \forall k' \]

The symmetric communication protocol goes like this: Alex wants to send a message \(x\) to Blair, and we assume they both know the key \(k\) beforehand. Alex computes the ciphertext \(y = e(x, k)\) and sends it across the channel. Blair receives the ciphertext and recovers the plaintext via \(x = d(y, k)\). Simple! Here is a diagram showing which parties observe which data:

Notice how the communication channel (highlighted in red) only sees the ciphertext \(y\), and not the plaintext \(x\) nor the key \(k\).

On the other hand, asymmetric encryption like RSA and ECC does not assume Alex and Blair share the key \(k\) beforehand. In this scenario, we have four data types:

and three one-way functions:

If \(k\) is a private key and \(p=p(k)\) is the corresponding public key, then \[ d(e(x, p(k)), k) = x \] In fact, even stronger as before: \[ d(e(x, p(k)), k') = x \quad \iff \quad k = k', \quad \forall k' \]

The asymmetric communication protocol now goes like this: Alex wants to send a message \(x\) to Blair, but they don’t share a private key. So, Blair takes their private key \(k\) and generates their public key \(p = p(k)\), then sends it across the channel to Alex. Alex then computes \(y = e(x, p)\) and returns the encrypted message back across the channel to Blair, who then recovers the original message via \(x = d(y, k)\).

As before, notice how the channel only sees the ciphertext and Blair’s public key, but not the original message nor Blair’s private key \(k\); furthermore, Alex never sees Blair’s private key either!

Of course, I never described how any such one-way functions with the above properties could be constructed. That’s where all the prime number bullshit comes in!



Thursday May 9 2024. TLDR: I explain public key (asymmetric) encryption using string diagrams. For the standard references, read New directions in cryptography by Diffe and Hellman (1976).

In symmetric key encryption you have 3 types of data:

and two one-way functions,

Given a message \(x\) and a key \(k\), one computes the encrypted message \(y = e(x,k)\) and can only recover \(x\) using the same key via \(x = d(y,k)\); that is, they satisfy

\[ d(e(x, k), k) = x \]

In fact, even stronger, a message encrypted with key \(k\) can only be decrypted using \(k\) again, no other key \(k'\) will work:

\[ d(e(x, k), k') = x \quad \iff \quad k = k', \quad \forall k' \]

The symmetric communication protocol goes like this: Alex wants to send a message \(x\) to Blair, and we assume they both know the key \(k\) beforehand. Alex computes the ciphertext \(y = e(x, k)\) and sends it across the channel. Blair receives the ciphertext and recovers the plaintext via \(x = d(y, k)\). Simple! Here is a diagram showing which parties observe which data:

Notice how the communication channel (highlighted in red) only sees the ciphertext \(y\), and not the plaintext \(x\) nor the key \(k\).

On the other hand, asymmetric encryption like RSA and ECC does not assume Alex and Blair share the key \(k\) beforehand. In this scenario, we have four data types:

and three one-way functions:

If \(k\) is a private key and \(p=p(k)\) is the corresponding public key, then \[ d(e(x, p(k)), k) = x \] In fact, even stronger as before: \[ d(e(x, p(k)), k') = x \quad \iff \quad k = k', \quad \forall k' \]

The asymmetric communication protocol now goes like this: Alex wants to send a message \(x\) to Blair, but they don’t share a private key. So, Blair takes their private key \(k\) and generates their public key \(p = p(k)\), then sends it across the channel to Alex. Alex then computes \(y = e(x, p)\) and returns the encrypted message back across the channel to Blair, who then recovers the original message via \(x = d(y, k)\).

As before, notice how the channel only sees the ciphertext and Blair’s public key, but not the original message nor Blair’s private key \(k\); furthermore, Alex never sees Blair’s private key either!

Of course, I never described how any such one-way functions with the above properties could be constructed. That’s where all the prime number bullshit comes in!

What are language models good for?

Sunday April 28 2024.

What are language models even good for? TODO

What are language models good for?

Sunday April 28 2024.

What are language models even good for? TODO



Sunday April 28 2024. What are language models even good for? TODO

Panic relief

Sunday April 28 2024.

Anxiety! Panic! a cruel dictator that often rule and ruin my life. Here are my coping strategies:

  1. Understand and avoid short-term triggers. Mine include (TW) any mention of the heart, heartrate, pulse, or diseases related to the blood or heart.

  2. Understand and avoid medium-term causes. For me this includes more than 1 cup of coffee per day, and skipping SSRI medication.

  3. Recognize panic symptoms. For me are psychosomatic chest pain (random pain spikes, dull chest aches) and shortness of breath and unsatisfying breaths. Your mind can exaggerate pain and only notice pain near your area of concern (e.g. chest). Another one is jaw pain in the TMJ muscle from grinding or clenching, which may occur unconsciously during anxious sleep. Since often occur to me at night, I become acutely aware no urgent care center is open.

  4. Relax muscles and perform breathing exercises. The shortness of breath often comes from unconscious tension in the chest, so relax your chest. A normal unstrained breath should raise and lower the stomach, not the chest. My preferred exercise is 4 seconds each on in/hold/out/hold. Try listening to a youtube video guide or asmr. For nighttime clenching, consider a mouth guard.

  5. In last resort, immediate relief drugs. The main one are alprazolam (Xanax), lorazepam (Ativan), and diazepam (Valium). I have heard of beta blockers but they seem less relevant.

  6. Accept precarity of life and appreciate its absurd beauty. If you can read this, you already made it far: you are human, you are alive and with vision or e-reader and have internet access. Life owes us nothing, and we can’t obsess over every possible misfortune or else lose the human spirit.

Panic relief

Sunday April 28 2024.

Anxiety! Panic! a cruel dictator that often rule and ruin my life. Here are my coping strategies:

  1. Understand and avoid short-term triggers. Mine include (TW) any mention of the heart, heartrate, pulse, or diseases related to the blood or heart.

  2. Understand and avoid medium-term causes. For me this includes more than 1 cup of coffee per day, and skipping SSRI medication.

  3. Recognize panic symptoms. For me are psychosomatic chest pain (random pain spikes, dull chest aches) and shortness of breath and unsatisfying breaths. Your mind can exaggerate pain and only notice pain near your area of concern (e.g. chest). Another one is jaw pain in the TMJ muscle from grinding or clenching, which may occur unconsciously during anxious sleep. Since often occur to me at night, I become acutely aware no urgent care center is open.

  4. Relax muscles and perform breathing exercises. The shortness of breath often comes from unconscious tension in the chest, so relax your chest. A normal unstrained breath should raise and lower the stomach, not the chest. My preferred exercise is 4 seconds each on in/hold/out/hold. Try listening to a youtube video guide or asmr. For nighttime clenching, consider a mouth guard.

  5. In last resort, immediate relief drugs. The main one are alprazolam (Xanax), lorazepam (Ativan), and diazepam (Valium). I have heard of beta blockers but they seem less relevant.

  6. Accept precarity of life and appreciate its absurd beauty. If you can read this, you already made it far: you are human, you are alive and with vision or e-reader and have internet access. Life owes us nothing, and we can’t obsess over every possible misfortune or else lose the human spirit.



Sunday April 28 2024. Anxiety! Panic! a cruel dictator that often rule and ruin my life. Here are my coping strategies:

  1. Understand and avoid short-term triggers. Mine include (TW) any mention of the heart, heartrate, pulse, or diseases related to the blood or heart.

  2. Understand and avoid medium-term causes. For me this includes more than 1 cup of coffee per day, and skipping SSRI medication.

  3. Recognize panic symptoms. For me are psychosomatic chest pain (random pain spikes, dull chest aches) and shortness of breath and unsatisfying breaths. Your mind can exaggerate pain and only notice pain near your area of concern (e.g. chest). Another one is jaw pain in the TMJ muscle from grinding or clenching, which may occur unconsciously during anxious sleep. Since often occur to me at night, I become acutely aware no urgent care center is open.

  4. Relax muscles and perform breathing exercises. The shortness of breath often comes from unconscious tension in the chest, so relax your chest. A normal unstrained breath should raise and lower the stomach, not the chest. My preferred exercise is 4 seconds each on in/hold/out/hold. Try listening to a youtube video guide or asmr. For nighttime clenching, consider a mouth guard.

  5. In last resort, immediate relief drugs. The main one are alprazolam (Xanax), lorazepam (Ativan), and diazepam (Valium). I have heard of beta blockers but they seem less relevant.

  6. Accept precarity of life and appreciate its absurd beauty. If you can read this, you already made it far: you are human, you are alive and with vision or e-reader and have internet access. Life owes us nothing, and we can’t obsess over every possible misfortune or else lose the human spirit.

Cyberattacks!

Friday April 26 2024.

TODO

Cyberattacks!

Friday April 26 2024.

TODO



Friday April 26 2024. TODO

All stories are stories about people

Friday April 5 2024.

“All stories are stories about people”.

Remind me to update this post whenever I figure out who came up with this concept first.

Some scifi stories are about automation and the rise of machines, at least on the surface. But they really speak about capitalist human relations between worker and owner.

Similarly stories about disaster are not really about the disaster itself, but about the people you fear to lose.

People are the atoms of our lives, not physical atoms - we had to discover those.

All stories are stories about people

Friday April 5 2024.

“All stories are stories about people”.

Remind me to update this post whenever I figure out who came up with this concept first.

Some scifi stories are about automation and the rise of machines, at least on the surface. But they really speak about capitalist human relations between worker and owner.

Similarly stories about disaster are not really about the disaster itself, but about the people you fear to lose.

People are the atoms of our lives, not physical atoms - we had to discover those.



Friday April 5 2024. “All stories are stories about people”.

Remind me to update this post whenever I figure out who came up with this concept first.

Some scifi stories are about automation and the rise of machines, at least on the surface. But they really speak about capitalist human relations between worker and owner.

Similarly stories about disaster are not really about the disaster itself, but about the people you fear to lose.

People are the atoms of our lives, not physical atoms - we had to discover those.

Multicellularity

Wednesday March 20 2024.

Humans are made of cells, and societies are made of humans. Are we, like, cells in a “social organism”?

Some references to find out include:

Multicellularity

Wednesday March 20 2024.

Humans are made of cells, and societies are made of humans. Are we, like, cells in a “social organism”?

Some references to find out include:



Wednesday March 20 2024. Humans are made of cells, and societies are made of humans. Are we, like, cells in a “social organism”?

Some references to find out include:

Solipsistic theorem

Thursday March 7 2024.

I was talking to a friend about “mind-body dualism” and started thinking about Mealy machines, a mathematical formalization of systems with input, output, and state (memory). Formally it consists of:

Given an initial state \(s_0 \in S\) and a sequence of inputs \(i_0,i_1,i_2,...\in I\) a Mealy machine can deterministically produce a sequence of outputs \(o_1,o_2,o_3,... \in O\) and states \(s_1,s_2,s_3,... \in S\) by using the equation \((s_{t+1},o_{t+1}) = f(s_t,i_t)\).

Note: animation was inspired by this blog post.

In “mind-body dualism” I imagine there are two Mealy machines, hooked up to one another, where one machine is the world, including the body, and other is the mind (doesn’t matter which is which due to symmetry). We observe our senses, process thoughts, take action. Likewise the world “observes” our action, processes it via physics, and “acts” on us by sending us a new observation in an endless cycle… 🌹🪰😊

You could formally define a “dual system” (my terminology) to consist of

Thus, \(S_1\) and \(S_2\) are the state sets of the two machines, and \(I_1\) and \(I_2\) are their “interfaces” (input/output and output/input respectively). This is reminiscent of a partially observed Markov decision process, a class of Markov decision process which are commonly used in reinforcement learning.

Next consider multiple agents: define a multi agent system with \(n\) agents to consist of

This is like having one central Mealy machine, the environment, hooked up to a bunch of other Mealy machines, the agents. Similar concepts are used in multi-agent reinforcement learning. When \(n=1\), it’s exactly the same as a dual system.

Theorem (informal): any agent in a multi agent system can view themselves being part of a dual system, where all the other agents are part of the environment. See the figure below, where the pink box designates the enlarged environment containing every agent except \(j\), where \(j=1\).

Theorem (more formal): choose an agent \(j\). Then we can turn the multi agent system into a dual system where \(S_1 = S_1 \times \prod_{i \neq j} (S_2^i \times I_1^i \times I_2^i)\), \(S_2 = S_2^j\), \(I_1 = I_1^j\), \(I_2 = I_2^j\), and \(f_2 = f_2^j\). As for \(f_1\), it can be described as follows:

Todo. What is the philosophical history of this argument, if there’s any? It sounds like something Douglas Hofstadter would say. It also reminds me of the idea of Cartesian framing. It’s pretty naive, but I’m just doing what Lawvere (1992) said:

It is my belief that in the next decade and in the next century the technical advances forged by category theorists will be of value to dialectical philosophy, lending precise form with disputable mathematical models to ancient philosophical distinctions such as general vs. particular, objective vs. subjective, being vs. becoming, space vs. quantity, equality vs. difference, quantitative vs. qualitative etc.

Categorical systems theory. You could probably express it like Example 4.60 in polynomial functors book by Spivak. But they use Moore machines instead of Mealy machines, which in their language is a lens \(Sy^S \to Oy^I\). And that means I would have to redraw all the animations.

Solipsistic theorem

Thursday March 7 2024.

I was talking to a friend about “mind-body dualism” and started thinking about Mealy machines, a mathematical formalization of systems with input, output, and state (memory). Formally it consists of:

Given an initial state \(s_0 \in S\) and a sequence of inputs \(i_0,i_1,i_2,...\in I\) a Mealy machine can deterministically produce a sequence of outputs \(o_1,o_2,o_3,... \in O\) and states \(s_1,s_2,s_3,... \in S\) by using the equation \((s_{t+1},o_{t+1}) = f(s_t,i_t)\).

Note: animation was inspired by this blog post.

In “mind-body dualism” I imagine there are two Mealy machines, hooked up to one another, where one machine is the world, including the body, and other is the mind (doesn’t matter which is which due to symmetry). We observe our senses, process thoughts, take action. Likewise the world “observes” our action, processes it via physics, and “acts” on us by sending us a new observation in an endless cycle… 🌹🪰😊

You could formally define a “dual system” (my terminology) to consist of

Thus, \(S_1\) and \(S_2\) are the state sets of the two machines, and \(I_1\) and \(I_2\) are their “interfaces” (input/output and output/input respectively). This is reminiscent of a partially observed Markov decision process, a class of Markov decision process which are commonly used in reinforcement learning.

Next consider multiple agents: define a multi agent system with \(n\) agents to consist of

This is like having one central Mealy machine, the environment, hooked up to a bunch of other Mealy machines, the agents. Similar concepts are used in multi-agent reinforcement learning. When \(n=1\), it’s exactly the same as a dual system.

Theorem (informal): any agent in a multi agent system can view themselves being part of a dual system, where all the other agents are part of the environment. See the figure below, where the pink box designates the enlarged environment containing every agent except \(j\), where \(j=1\).

Theorem (more formal): choose an agent \(j\). Then we can turn the multi agent system into a dual system where \(S_1 = S_1 \times \prod_{i \neq j} (S_2^i \times I_1^i \times I_2^i)\), \(S_2 = S_2^j\), \(I_1 = I_1^j\), \(I_2 = I_2^j\), and \(f_2 = f_2^j\). As for \(f_1\), it can be described as follows:

Todo. What is the philosophical history of this argument, if there’s any? It sounds like something Douglas Hofstadter would say. It also reminds me of the idea of Cartesian framing. It’s pretty naive, but I’m just doing what Lawvere (1992) said:

It is my belief that in the next decade and in the next century the technical advances forged by category theorists will be of value to dialectical philosophy, lending precise form with disputable mathematical models to ancient philosophical distinctions such as general vs. particular, objective vs. subjective, being vs. becoming, space vs. quantity, equality vs. difference, quantitative vs. qualitative etc.

Categorical systems theory. You could probably express it like Example 4.60 in polynomial functors book by Spivak. But they use Moore machines instead of Mealy machines, which in their language is a lens \(Sy^S \to Oy^I\). And that means I would have to redraw all the animations.



Thursday March 7 2024.

I was talking to a friend about “mind-body dualism” and started thinking about Mealy machines, a mathematical formalization of systems with input, output, and state (memory). Formally it consists of:

Given an initial state \(s_0 \in S\) and a sequence of inputs \(i_0,i_1,i_2,...\in I\) a Mealy machine can deterministically produce a sequence of outputs \(o_1,o_2,o_3,... \in O\) and states \(s_1,s_2,s_3,... \in S\) by using the equation \((s_{t+1},o_{t+1}) = f(s_t,i_t)\).

Note: animation was inspired by this blog post.

In “mind-body dualism” I imagine there are two Mealy machines, hooked up to one another, where one machine is the world, including the body, and other is the mind (doesn’t matter which is which due to symmetry). We observe our senses, process thoughts, take action. Likewise the world “observes” our action, processes it via physics, and “acts” on us by sending us a new observation in an endless cycle… 🌹🪰😊

You could formally define a “dual system” (my terminology) to consist of

Thus, \(S_1\) and \(S_2\) are the state sets of the two machines, and \(I_1\) and \(I_2\) are their “interfaces” (input/output and output/input respectively). This is reminiscent of a partially observed Markov decision process, a class of Markov decision process which are commonly used in reinforcement learning.

Next consider multiple agents: define a multi agent system with \(n\) agents to consist of

This is like having one central Mealy machine, the environment, hooked up to a bunch of other Mealy machines, the agents. Similar concepts are used in multi-agent reinforcement learning. When \(n=1\), it’s exactly the same as a dual system.

Theorem (informal): any agent in a multi agent system can view themselves being part of a dual system, where all the other agents are part of the environment. See the figure below, where the pink box designates the enlarged environment containing every agent except \(j\), where \(j=1\).

Theorem (more formal): choose an agent \(j\). Then we can turn the multi agent system into a dual system where \(S_1 = S_1 \times \prod_{i \neq j} (S_2^i \times I_1^i \times I_2^i)\), \(S_2 = S_2^j\), \(I_1 = I_1^j\), \(I_2 = I_2^j\), and \(f_2 = f_2^j\). As for \(f_1\), it can be described as follows:

Todo. What is the philosophical history of this argument, if there’s any? It sounds like something Douglas Hofstadter would say. It also reminds me of the idea of Cartesian framing. It’s pretty naive, but I’m just doing what Lawvere (1992) said:

It is my belief that in the next decade and in the next century the technical advances forged by category theorists will be of value to dialectical philosophy, lending precise form with disputable mathematical models to ancient philosophical distinctions such as general vs. particular, objective vs. subjective, being vs. becoming, space vs. quantity, equality vs. difference, quantitative vs. qualitative etc.

Categorical systems theory. You could probably express it like Example 4.60 in polynomial functors book by Spivak. But they use Moore machines instead of Mealy machines, which in their language is a lens \(Sy^S \to Oy^I\). And that means I would have to redraw all the animations.

4 types of categories to know about

Thursday February 29 2024.

If you wish to know more about categories (in the sense of category theory) then the following 4 types may become of importance to you:

  1. Concrete categories are ones where, roughly speaking, the objects are sets with some additional structure. Many objects in mathematics, like fields, vector spaces, groups, rings, and topological spaces, can be represented this way. For a detailed list of examples check out Example 1.1.3 in Category Theory in Context (free pdf). For example, if X is a topological space, then X would be an object in the category of topological spaces. Now, every topological space has an underlying set of points. This is represented by the forgetful functor from the category of topological spaces to the category of sets, sometimes denoted \(U:\mathrm{Top} \to \mathrm{Set}\). In that case \(U(X)\) would be the underlying set of points of \(X\).

  2. Monoidal categories. Many categories have a way to combine two objects, like the Cartesian product \(A \times B\) of sets, the direct product of two groups, the tensor product of two vector spaces, etc, that constitute a binary operation. Furthermore in each case there is a unit object which serves as an identity for the binary operation: the singleton set (with one element), the trivial vector space (with one point), etc. This is the premise of a monoidal category. Insert segue into Rosetta stone paper here.

  3. Cartesian closed categories. This type of category is commonly used to represent the type systems within some programming languages. Given two objects \(A\) and \(B\) in a “CCC”, you can form the product \(A \times B\) as well as the exponential \(B^A\), also written more suggestively as \(A \to B\). When \(A\) and \(B\) are sets (or types) then \(B^A\) can be understood as the set (type) of functions (or “programs”) from \(A\) to \(B\). If you combine this with point 2, so that the product \(\times\) obeys the monoidal laws, then you get a closed monoidal category. I need a better reference on this.

  4. Abelian categories. If you ever try to study algebraic topology you will come across the concept of homology (or cohomology) and the notion of a chain complex. Let \(Ab\) be the category of abelian groups (which is a concrete category as well as a monoidal category). A chain complex of abelian groups is a sequence \(A_0, A_1, A_2, \cdots\) of abelian groups, along with homomorphisms \(d_n: A_{n+1} \to A_n\) for each \(n\), such that \(d_{n} \circ d_{n+1} = 0\) for all \(n\), where on the left is the composition of \(d_{n+1}\) with \(d_n\) and on the right is the zero morphism from \(A_{n+2}\) to \(A_n\), which sends every element to zero. If one asks, “what is the most general category I can replace \(Ab\) with?” the answer would be one where you have zero morphisms, and a few other things… anyway, people already figured this out, and they are called abelian categories!

4 types of categories to know about

Thursday February 29 2024.

If you wish to know more about categories (in the sense of category theory) then the following 4 types may become of importance to you:

  1. Concrete categories are ones where, roughly speaking, the objects are sets with some additional structure. Many objects in mathematics, like fields, vector spaces, groups, rings, and topological spaces, can be represented this way. For a detailed list of examples check out Example 1.1.3 in Category Theory in Context (free pdf). For example, if X is a topological space, then X would be an object in the category of topological spaces. Now, every topological space has an underlying set of points. This is represented by the forgetful functor from the category of topological spaces to the category of sets, sometimes denoted \(U:\mathrm{Top} \to \mathrm{Set}\). In that case \(U(X)\) would be the underlying set of points of \(X\).

  2. Monoidal categories. Many categories have a way to combine two objects, like the Cartesian product \(A \times B\) of sets, the direct product of two groups, the tensor product of two vector spaces, etc, that constitute a binary operation. Furthermore in each case there is a unit object which serves as an identity for the binary operation: the singleton set (with one element), the trivial vector space (with one point), etc. This is the premise of a monoidal category. Insert segue into Rosetta stone paper here.

  3. Cartesian closed categories. This type of category is commonly used to represent the type systems within some programming languages. Given two objects \(A\) and \(B\) in a “CCC”, you can form the product \(A \times B\) as well as the exponential \(B^A\), also written more suggestively as \(A \to B\). When \(A\) and \(B\) are sets (or types) then \(B^A\) can be understood as the set (type) of functions (or “programs”) from \(A\) to \(B\). If you combine this with point 2, so that the product \(\times\) obeys the monoidal laws, then you get a closed monoidal category. I need a better reference on this.

  4. Abelian categories. If you ever try to study algebraic topology you will come across the concept of homology (or cohomology) and the notion of a chain complex. Let \(Ab\) be the category of abelian groups (which is a concrete category as well as a monoidal category). A chain complex of abelian groups is a sequence \(A_0, A_1, A_2, \cdots\) of abelian groups, along with homomorphisms \(d_n: A_{n+1} \to A_n\) for each \(n\), such that \(d_{n} \circ d_{n+1} = 0\) for all \(n\), where on the left is the composition of \(d_{n+1}\) with \(d_n\) and on the right is the zero morphism from \(A_{n+2}\) to \(A_n\), which sends every element to zero. If one asks, “what is the most general category I can replace \(Ab\) with?” the answer would be one where you have zero morphisms, and a few other things… anyway, people already figured this out, and they are called abelian categories!



Thursday February 29 2024. If you wish to know more about categories (in the sense of category theory) then the following 4 types may become of importance to you:

  1. Concrete categories are ones where, roughly speaking, the objects are sets with some additional structure. Many objects in mathematics, like fields, vector spaces, groups, rings, and topological spaces, can be represented this way. For a detailed list of examples check out Example 1.1.3 in Category Theory in Context (free pdf). For example, if X is a topological space, then X would be an object in the category of topological spaces. Now, every topological space has an underlying set of points. This is represented by the forgetful functor from the category of topological spaces to the category of sets, sometimes denoted \(U:\mathrm{Top} \to \mathrm{Set}\). In that case \(U(X)\) would be the underlying set of points of \(X\).

  2. Monoidal categories. Many categories have a way to combine two objects, like the Cartesian product \(A \times B\) of sets, the direct product of two groups, the tensor product of two vector spaces, etc, that constitute a binary operation. Furthermore in each case there is a unit object which serves as an identity for the binary operation: the singleton set (with one element), the trivial vector space (with one point), etc. This is the premise of a monoidal category. Insert segue into Rosetta stone paper here.

  3. Cartesian closed categories. This type of category is commonly used to represent the type systems within some programming languages. Given two objects \(A\) and \(B\) in a “CCC”, you can form the product \(A \times B\) as well as the exponential \(B^A\), also written more suggestively as \(A \to B\). When \(A\) and \(B\) are sets (or types) then \(B^A\) can be understood as the set (type) of functions (or “programs”) from \(A\) to \(B\). If you combine this with point 2, so that the product \(\times\) obeys the monoidal laws, then you get a closed monoidal category. I need a better reference on this.

  4. Abelian categories. If you ever try to study algebraic topology you will come across the concept of homology (or cohomology) and the notion of a chain complex. Let \(Ab\) be the category of abelian groups (which is a concrete category as well as a monoidal category). A chain complex of abelian groups is a sequence \(A_0, A_1, A_2, \cdots\) of abelian groups, along with homomorphisms \(d_n: A_{n+1} \to A_n\) for each \(n\), such that \(d_{n} \circ d_{n+1} = 0\) for all \(n\), where on the left is the composition of \(d_{n+1}\) with \(d_n\) and on the right is the zero morphism from \(A_{n+2}\) to \(A_n\), which sends every element to zero. If one asks, “what is the most general category I can replace \(Ab\) with?” the answer would be one where you have zero morphisms, and a few other things… anyway, people already figured this out, and they are called abelian categories!

Autoprovers > autoformalizers?

Wednesday February 28 2024.

Autoformalization is a way to encode all the mathematical work that’s already been done by inputting the pdfs or latex files into a computer and asking it to parse out all the little details into a completely rigorous, type-checked proof.

But what if it’s easier to just redo all the work from scratch? This is the basic premise of automated theorem proving. Let’s go with the following dubious logic: if there was some amazing method to automatically prove theorems without advances in deep learning, then they would have already been found. Therefore it suffices to look at automated theorem proving in the context of neural theorem provers specifically.

Alright! Let’s first look at GPT-\(f\) from 2 people at OpenAI.

Here’s a list of some neural theorem provers:

Jesse Michael Han, Jason Rute, Yuhuai Wu, Edward W Ayers, and Stanislas Polu. Proof artifact co-training for theorem proving with language models. arXiv preprint arXiv:2102.06203, 2021. [18] Stanislas Polu, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor Babuschkin, and Ilya Sutskever. Formal mathematics statement curriculum learning. arXiv preprint arXiv:2202.01344, 2022. [19] Daniel Whalen. Holophrasm: a neural automated theorem prover for higher-order logic. arXiv preprint arXiv:1608.02644, 2016

I think about the success of the AlphaZero, the machine learning algorithm which became superhuman at chess, Go and other games via self-play using Monte-Carlo tree search (MCTS). A lot of methods I see now use existing training data, like existing math libraries, databases or forums. But don’t we need a way to automatically generately data like in self-play? But there’s no self-play in mathematics, aside from excessive self-admiration. I need to read more but this paper, Hypertree proof search for neural theorem proving (2022), was inspired by AlphaZero. They say this:

Theorem proving can be thought of as computing game-theoretic value for positions in a min/max tree: for a goal to be proven, we need one move (max) that leads to subgoals that are all proven (min). Noticing heterogenity in the arities of min or max nodes, we propose a search method that goes down simultaneously in all children of min nodes, such that every simulation could potentially result in a full proof-tree.

Autoprovers > autoformalizers?

Wednesday February 28 2024.

Autoformalization is a way to encode all the mathematical work that’s already been done by inputting the pdfs or latex files into a computer and asking it to parse out all the little details into a completely rigorous, type-checked proof.

But what if it’s easier to just redo all the work from scratch? This is the basic premise of automated theorem proving. Let’s go with the following dubious logic: if there was some amazing method to automatically prove theorems without advances in deep learning, then they would have already been found. Therefore it suffices to look at automated theorem proving in the context of neural theorem provers specifically.

Alright! Let’s first look at GPT-\(f\) from 2 people at OpenAI.

Here’s a list of some neural theorem provers:

Jesse Michael Han, Jason Rute, Yuhuai Wu, Edward W Ayers, and Stanislas Polu. Proof artifact co-training for theorem proving with language models. arXiv preprint arXiv:2102.06203, 2021. [18] Stanislas Polu, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor Babuschkin, and Ilya Sutskever. Formal mathematics statement curriculum learning. arXiv preprint arXiv:2202.01344, 2022. [19] Daniel Whalen. Holophrasm: a neural automated theorem prover for higher-order logic. arXiv preprint arXiv:1608.02644, 2016

I think about the success of the AlphaZero, the machine learning algorithm which became superhuman at chess, Go and other games via self-play using Monte-Carlo tree search (MCTS). A lot of methods I see now use existing training data, like existing math libraries, databases or forums. But don’t we need a way to automatically generately data like in self-play? But there’s no self-play in mathematics, aside from excessive self-admiration. I need to read more but this paper, Hypertree proof search for neural theorem proving (2022), was inspired by AlphaZero. They say this:

Theorem proving can be thought of as computing game-theoretic value for positions in a min/max tree: for a goal to be proven, we need one move (max) that leads to subgoals that are all proven (min). Noticing heterogenity in the arities of min or max nodes, we propose a search method that goes down simultaneously in all children of min nodes, such that every simulation could potentially result in a full proof-tree.



Wednesday February 28 2024. Autoformalization is a way to encode all the mathematical work that’s already been done by inputting the pdfs or latex files into a computer and asking it to parse out all the little details into a completely rigorous, type-checked proof.

But what if it’s easier to just redo all the work from scratch? This is the basic premise of automated theorem proving. Let’s go with the following dubious logic: if there was some amazing method to automatically prove theorems without advances in deep learning, then they would have already been found. Therefore it suffices to look at automated theorem proving in the context of neural theorem provers specifically.

Alright! Let’s first look at GPT-\(f\) from 2 people at OpenAI.

Here’s a list of some neural theorem provers:

Jesse Michael Han, Jason Rute, Yuhuai Wu, Edward W Ayers, and Stanislas Polu. Proof artifact co-training for theorem proving with language models. arXiv preprint arXiv:2102.06203, 2021. [18] Stanislas Polu, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor Babuschkin, and Ilya Sutskever. Formal mathematics statement curriculum learning. arXiv preprint arXiv:2202.01344, 2022. [19] Daniel Whalen. Holophrasm: a neural automated theorem prover for higher-order logic. arXiv preprint arXiv:1608.02644, 2016

I think about the success of the AlphaZero, the machine learning algorithm which became superhuman at chess, Go and other games via self-play using Monte-Carlo tree search (MCTS). A lot of methods I see now use existing training data, like existing math libraries, databases or forums. But don’t we need a way to automatically generately data like in self-play? But there’s no self-play in mathematics, aside from excessive self-admiration. I need to read more but this paper, Hypertree proof search for neural theorem proving (2022), was inspired by AlphaZero. They say this:

Theorem proving can be thought of as computing game-theoretic value for positions in a min/max tree: for a goal to be proven, we need one move (max) that leads to subgoals that are all proven (min). Noticing heterogenity in the arities of min or max nodes, we propose a search method that goes down simultaneously in all children of min nodes, such that every simulation could potentially result in a full proof-tree.

Learned communication theorem

Friday February 2 2024.

When agents are allowed to evolve in the pursuit of goal (e.g. multi-agent reinforcement learning) then the following “theorem” seems to apply:

In situations of mixed cooperation (like inter-species communication) then some mixture of these extremes. But where is this explicitly stated and/or proved? It lies somewhere in signaling theory probably. Some papers:

Learned communication theorem

Friday February 2 2024.

When agents are allowed to evolve in the pursuit of goal (e.g. multi-agent reinforcement learning) then the following “theorem” seems to apply:

In situations of mixed cooperation (like inter-species communication) then some mixture of these extremes. But where is this explicitly stated and/or proved? It lies somewhere in signaling theory probably. Some papers:



Friday February 2 2024. When agents are allowed to evolve in the pursuit of goal (e.g. multi-agent reinforcement learning) then the following “theorem” seems to apply:

In situations of mixed cooperation (like inter-species communication) then some mixture of these extremes. But where is this explicitly stated and/or proved? It lies somewhere in signaling theory probably. Some papers:

Language agents

Thursday February 1 2024.

Some people are working on “language agents” which try to turn ChatGPT into an agent by giving it the ability to take “actions” in some way, and rolling this into an action-perception loop.

I think this is mostly psuedoscience but it does seem to have the ability to take a short text prompt and turn it into working software… Some interesting projects include:

Language agents

Thursday February 1 2024.

Some people are working on “language agents” which try to turn ChatGPT into an agent by giving it the ability to take “actions” in some way, and rolling this into an action-perception loop.

I think this is mostly psuedoscience but it does seem to have the ability to take a short text prompt and turn it into working software… Some interesting projects include:



Thursday February 1 2024. Some people are working on “language agents” which try to turn ChatGPT into an agent by giving it the ability to take “actions” in some way, and rolling this into an action-perception loop.

I think this is mostly psuedoscience but it does seem to have the ability to take a short text prompt and turn it into working software… Some interesting projects include:

The public goods game is a generalization of the prisoner’s dilemma

Sunday January 7 2024.

Prisoners’ dilemma is a classic example in game theory, but for whatever reason it never stuck with me. Then, I was reading a book about evolutionary game theory (forgot which oops) that describes the public goods game and claimed it is a generalization of the prisoner’s dilemma. I didn’t know that!

In the public goods game, each player starts with some fixed allowance - say \(N\) is the number of players and they all get the same allowance of 100. Each player then has the choice of what portion of their allowance to keep (all, nothing, or anywhere in between) and for the rest to go into the public pool. After all players make their choice, each player’s score is determined by the sum of the portion of their allowance they kept, plus the total amount in the public pool divided by the number of players and multiplied by some factor \(R\).

What is the meaning of \(R\)? It sort of models the beneficial effect of cooperation, because when \(R>1\), it would be theoretically better for everybody to go all in and reap the reward. For concreteness assume \(R=2\). Then:

Everyone contributing their entire allowance is thus the state which maximizes everybody’s score. Unfortunately it is not the Nash equilibrium.

A Nash equilibrium is defined where if you fix all your opponents strategies, and vary your own, then you cannot improve your score whatsoever. If every player finds themeselves in this situation, no one has an incentive to change their strategy or else they will forfeit score, so it persists.

To see why in this case, suppose all your peers contribute their full allowance to the public pool and so do you. We will call the resulting score the score of cooperation: \[ \text{score}(\text{cooperation}) = 200 \] Alternatively, you could choose to defect: keep all your allowance and contribute none. In that case your score is given by your whole allowance plus the pooled contributions of the remaining \(N-1\) players: \[ \text{score}(\text{defection}) = 100 + \frac{200 \cdot (N-1)}{N} \] Here comes a little algebra: \[ \text{score}(\text{defection}) - \text{score}(\text{cooperation}) = 100 - \frac{200}{N} \] Given \(N\) is large enough (3 or more is enough) then the quantity on the right is positive, so: \[ \text{score}(\text{defection}) > \text{score}(\text{cooperation}) \] In the other extreme, if every player adopted the selfish play style of contributing nothing, then any individual who foolishly decided to contribute would only be hurt. Therefore the Nash equilibrium (the “rational” behavior) is for every player to hoard their allowance. Boo!

But how is it a generalization of the prisoner’s dilemma? Suppose there are only 2 players and reduce the value of \(R\) to 1.5 because it needs to be less than the number of players to work. Instead of letting players choose a fractional split, they have to put all or nothing into the pool. Now it’s equivalent to the prisoner’s dilemma, and there are only three outcomes:

Just as before, even though they would mutually both benefit from cooperating, the Nash equilibrium is to defect.

The public goods game is a generalization of the prisoner’s dilemma

Sunday January 7 2024.

Prisoners’ dilemma is a classic example in game theory, but for whatever reason it never stuck with me. Then, I was reading a book about evolutionary game theory (forgot which oops) that describes the public goods game and claimed it is a generalization of the prisoner’s dilemma. I didn’t know that!

In the public goods game, each player starts with some fixed allowance - say \(N\) is the number of players and they all get the same allowance of 100. Each player then has the choice of what portion of their allowance to keep (all, nothing, or anywhere in between) and for the rest to go into the public pool. After all players make their choice, each player’s score is determined by the sum of the portion of their allowance they kept, plus the total amount in the public pool divided by the number of players and multiplied by some factor \(R\).

What is the meaning of \(R\)? It sort of models the beneficial effect of cooperation, because when \(R>1\), it would be theoretically better for everybody to go all in and reap the reward. For concreteness assume \(R=2\). Then:

Everyone contributing their entire allowance is thus the state which maximizes everybody’s score. Unfortunately it is not the Nash equilibrium.

A Nash equilibrium is defined where if you fix all your opponents strategies, and vary your own, then you cannot improve your score whatsoever. If every player finds themeselves in this situation, no one has an incentive to change their strategy or else they will forfeit score, so it persists.

To see why in this case, suppose all your peers contribute their full allowance to the public pool and so do you. We will call the resulting score the score of cooperation: \[ \text{score}(\text{cooperation}) = 200 \] Alternatively, you could choose to defect: keep all your allowance and contribute none. In that case your score is given by your whole allowance plus the pooled contributions of the remaining \(N-1\) players: \[ \text{score}(\text{defection}) = 100 + \frac{200 \cdot (N-1)}{N} \] Here comes a little algebra: \[ \text{score}(\text{defection}) - \text{score}(\text{cooperation}) = 100 - \frac{200}{N} \] Given \(N\) is large enough (3 or more is enough) then the quantity on the right is positive, so: \[ \text{score}(\text{defection}) > \text{score}(\text{cooperation}) \] In the other extreme, if every player adopted the selfish play style of contributing nothing, then any individual who foolishly decided to contribute would only be hurt. Therefore the Nash equilibrium (the “rational” behavior) is for every player to hoard their allowance. Boo!

But how is it a generalization of the prisoner’s dilemma? Suppose there are only 2 players and reduce the value of \(R\) to 1.5 because it needs to be less than the number of players to work. Instead of letting players choose a fractional split, they have to put all or nothing into the pool. Now it’s equivalent to the prisoner’s dilemma, and there are only three outcomes:

Just as before, even though they would mutually both benefit from cooperating, the Nash equilibrium is to defect.



Sunday January 7 2024. Prisoners’ dilemma is a classic example in game theory, but for whatever reason it never stuck with me. Then, I was reading a book about evolutionary game theory (forgot which oops) that describes the public goods game and claimed it is a generalization of the prisoner’s dilemma. I didn’t know that!

In the public goods game, each player starts with some fixed allowance - say \(N\) is the number of players and they all get the same allowance of 100. Each player then has the choice of what portion of their allowance to keep (all, nothing, or anywhere in between) and for the rest to go into the public pool. After all players make their choice, each player’s score is determined by the sum of the portion of their allowance they kept, plus the total amount in the public pool divided by the number of players and multiplied by some factor \(R\).

What is the meaning of \(R\)? It sort of models the beneficial effect of cooperation, because when \(R>1\), it would be theoretically better for everybody to go all in and reap the reward. For concreteness assume \(R=2\). Then:

Everyone contributing their entire allowance is thus the state which maximizes everybody’s score. Unfortunately it is not the Nash equilibrium.

A Nash equilibrium is defined where if you fix all your opponents strategies, and vary your own, then you cannot improve your score whatsoever. If every player finds themeselves in this situation, no one has an incentive to change their strategy or else they will forfeit score, so it persists.

To see why in this case, suppose all your peers contribute their full allowance to the public pool and so do you. We will call the resulting score the score of cooperation: \[ \text{score}(\text{cooperation}) = 200 \] Alternatively, you could choose to defect: keep all your allowance and contribute none. In that case your score is given by your whole allowance plus the pooled contributions of the remaining \(N-1\) players: \[ \text{score}(\text{defection}) = 100 + \frac{200 \cdot (N-1)}{N} \] Here comes a little algebra: \[ \text{score}(\text{defection}) - \text{score}(\text{cooperation}) = 100 - \frac{200}{N} \] Given \(N\) is large enough (3 or more is enough) then the quantity on the right is positive, so: \[ \text{score}(\text{defection}) > \text{score}(\text{cooperation}) \] In the other extreme, if every player adopted the selfish play style of contributing nothing, then any individual who foolishly decided to contribute would only be hurt. Therefore the Nash equilibrium (the “rational” behavior) is for every player to hoard their allowance. Boo!

But how is it a generalization of the prisoner’s dilemma? Suppose there are only 2 players and reduce the value of \(R\) to 1.5 because it needs to be less than the number of players to work. Instead of letting players choose a fractional split, they have to put all or nothing into the pool. Now it’s equivalent to the prisoner’s dilemma, and there are only three outcomes:

Just as before, even though they would mutually both benefit from cooperating, the Nash equilibrium is to defect.

The replicator-mutator equation is a non-local reaction-diffusion equation (??!)

Thursday December 28 2023.

I was reading about allopatric speciation, which occurs in evolution when two groups of members of some species become geographically isolated from each other. The evolutionary paths of the two groups diverge and eventually they are considered different species.

I guess some biologists can interpret geographical location as a genomic variable, or in the other direction, had generalized their view of genome to include geography. In that case, allopatric and sympatric speciation (which occurs without geographic separation) are really the same, just with different trait variables.

Then, I was reading this other paper where they modeled the evolution of population genotypes as a non-local reaction-diffusion PDE, and I saw the connection to the replicator-mutator equation, which I first heard about in this Youtube video, “Biology as information dynamics” by John Baez.

The reaction-diffusion equation is often written as follows, which I will call equation 1: \[ \frac{du}{dt} = \Delta u + f(u) \] I actually don’t like this notation, and prefer the following, which will be equation 2: \[ \frac{du}{dt} = \Delta u + f \circ u \] The difference is subtle, but in equation 2, f is applied pointwise to the field u - this corresponds to a local reaction term. This is usually the situation being described even when equation 1 is written.

In an evolutionary model, the fitness of each individual doesn’t just depend on the population density in a neighborhood of their genotype, it depends on the whole ecology, that is, the whole distribution of genotypes. This is actually better reflected by equation 1, where the reaction term depends, at each point in genospace, on the entire distribution of genotypes across the whole genospace.

I almost forgot to mention, \(\Delta\) is the Laplace operator, and models diffusion over the genospace as a result of mutation.

The replicator-mutator equation is a non-local reaction-diffusion equation (??!)

Thursday December 28 2023.

I was reading about allopatric speciation, which occurs in evolution when two groups of members of some species become geographically isolated from each other. The evolutionary paths of the two groups diverge and eventually they are considered different species.

I guess some biologists can interpret geographical location as a genomic variable, or in the other direction, had generalized their view of genome to include geography. In that case, allopatric and sympatric speciation (which occurs without geographic separation) are really the same, just with different trait variables.

Then, I was reading this other paper where they modeled the evolution of population genotypes as a non-local reaction-diffusion PDE, and I saw the connection to the replicator-mutator equation, which I first heard about in this Youtube video, “Biology as information dynamics” by John Baez.

The reaction-diffusion equation is often written as follows, which I will call equation 1: \[ \frac{du}{dt} = \Delta u + f(u) \] I actually don’t like this notation, and prefer the following, which will be equation 2: \[ \frac{du}{dt} = \Delta u + f \circ u \] The difference is subtle, but in equation 2, f is applied pointwise to the field u - this corresponds to a local reaction term. This is usually the situation being described even when equation 1 is written.

In an evolutionary model, the fitness of each individual doesn’t just depend on the population density in a neighborhood of their genotype, it depends on the whole ecology, that is, the whole distribution of genotypes. This is actually better reflected by equation 1, where the reaction term depends, at each point in genospace, on the entire distribution of genotypes across the whole genospace.

I almost forgot to mention, \(\Delta\) is the Laplace operator, and models diffusion over the genospace as a result of mutation.



Thursday December 28 2023. I was reading about allopatric speciation, which occurs in evolution when two groups of members of some species become geographically isolated from each other. The evolutionary paths of the two groups diverge and eventually they are considered different species.

I guess some biologists can interpret geographical location as a genomic variable, or in the other direction, had generalized their view of genome to include geography. In that case, allopatric and sympatric speciation (which occurs without geographic separation) are really the same, just with different trait variables.

Then, I was reading this other paper where they modeled the evolution of population genotypes as a non-local reaction-diffusion PDE, and I saw the connection to the replicator-mutator equation, which I first heard about in this Youtube video, “Biology as information dynamics” by John Baez.

The reaction-diffusion equation is often written as follows, which I will call equation 1: \[ \frac{du}{dt} = \Delta u + f(u) \] I actually don’t like this notation, and prefer the following, which will be equation 2: \[ \frac{du}{dt} = \Delta u + f \circ u \] The difference is subtle, but in equation 2, f is applied pointwise to the field u - this corresponds to a local reaction term. This is usually the situation being described even when equation 1 is written.

In an evolutionary model, the fitness of each individual doesn’t just depend on the population density in a neighborhood of their genotype, it depends on the whole ecology, that is, the whole distribution of genotypes. This is actually better reflected by equation 1, where the reaction term depends, at each point in genospace, on the entire distribution of genotypes across the whole genospace.

I almost forgot to mention, \(\Delta\) is the Laplace operator, and models diffusion over the genospace as a result of mutation.

Learning Lean4 involves failure

Friday October 27 2023.

I’ve been wanting to learn to use Lean4 for a while, but it’s so weird and hard. I followed the instructions on this page and ran the following command:

curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh

then rebooted by machine.

Next the page said I need to to install an editor like VS Code. But last I checked programming languages should not depend on an editor.

At this point if we run elan or lean in terminal they seem to be installed. What I would like to do is run lean example.lean and have it return something. So, I followed the instructions on this page. It said to run lake +leanprover/lean4:nightly-2023-02-04 new my_project math which makes a new folder called my_project. Then I went into my_project and ran lake update, which “downloaded component: lean” (~200Mb). Then I ran lake exe cache get and downloaded a bunch more files. Next we make a new folder, MyProject inside my_project, and create the file test.lean. In this file I wrote

import Mathlib.Topology.Basic

#check TopologicalSpace

Then in termimal I ran the command

lean test.lean

The output was

test.lean:1:0: error: unknown package 'Mathlib'
test.lean:3:7: error: unknown identifier 'TopologicalSpace'
test.lean:3:0: error: unknown constant 'sorryAx'

So clearly I did something wrong.

Learning Lean4 involves failure

Friday October 27 2023.

I’ve been wanting to learn to use Lean4 for a while, but it’s so weird and hard. I followed the instructions on this page and ran the following command:

curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh

then rebooted by machine.

Next the page said I need to to install an editor like VS Code. But last I checked programming languages should not depend on an editor.

At this point if we run elan or lean in terminal they seem to be installed. What I would like to do is run lean example.lean and have it return something. So, I followed the instructions on this page. It said to run lake +leanprover/lean4:nightly-2023-02-04 new my_project math which makes a new folder called my_project. Then I went into my_project and ran lake update, which “downloaded component: lean” (~200Mb). Then I ran lake exe cache get and downloaded a bunch more files. Next we make a new folder, MyProject inside my_project, and create the file test.lean. In this file I wrote

import Mathlib.Topology.Basic

#check TopologicalSpace

Then in termimal I ran the command

lean test.lean

The output was

test.lean:1:0: error: unknown package 'Mathlib'
test.lean:3:7: error: unknown identifier 'TopologicalSpace'
test.lean:3:0: error: unknown constant 'sorryAx'

So clearly I did something wrong.



Friday October 27 2023. I’ve been wanting to learn to use Lean4 for a while, but it’s so weird and hard. I followed the instructions on this page and ran the following command:

curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh

then rebooted by machine.

Next the page said I need to to install an editor like VS Code. But last I checked programming languages should not depend on an editor.

At this point if we run elan or lean in terminal they seem to be installed. What I would like to do is run lean example.lean and have it return something. So, I followed the instructions on this page. It said to run lake +leanprover/lean4:nightly-2023-02-04 new my_project math which makes a new folder called my_project. Then I went into my_project and ran lake update, which “downloaded component: lean” (~200Mb). Then I ran lake exe cache get and downloaded a bunch more files. Next we make a new folder, MyProject inside my_project, and create the file test.lean. In this file I wrote

import Mathlib.Topology.Basic

#check TopologicalSpace

Then in termimal I ran the command

lean test.lean

The output was

test.lean:1:0: error: unknown package 'Mathlib'
test.lean:3:7: error: unknown identifier 'TopologicalSpace'
test.lean:3:0: error: unknown constant 'sorryAx'

So clearly I did something wrong.

Monoids versus semigroups

Sunday May 28 2023.

Read the following passage from the nLab page for semigroups under the heading “Attitudes towards semigroups”

Some mathematicians consider semigroups to be a case of centipede mathematics. Category theorists sometimes look with scorn on semigroups, because unlike a monoid, a semigroup is not an example of a category.

However, a semigroup can be promoted to a monoid by adjoining a new element and decreeing it to be the identity. This gives a fully faithful functor from the category of semigroups to the category of monoids. So, a semigroup can actually be seen as a monoid with extra property.

How weird! Who wrote this anyway? By checking the revision history we can see the phrase “Category theorists sometimes look with scorn on semigroups” was added in revision #2 and the comment about “centipede mathematics” in revision #3, both in 2009.

So, what other sorts of mathematical objects do category theorists scorn? Apparently the notion of centipede mathematics is well known enough to get its own wiki page, which cites the semigroup as centipede mathematics with a link to nLab right away.

The following quote summarises the value and usefulness of the concept: “The term ‘centipede mathematics’ is new to me, but its practice is surely of great antiquity. The binomial theorem (tear off the leg that says that the exponent has to be a natural number) is a good example. A related notion is the importance of good notation and the importance of overloading, aka abuse of language, to establish useful analogies.” — Gavin Wraith

There we find a 2000 blog post from John Baez accusing all sorts of objects of being centipede mathematics, including ternary rings, nearfields, quasifields, Moufang loops, Cartesian groups, and so on!

Monoids versus semigroups

Sunday May 28 2023.

Read the following passage from the nLab page for semigroups under the heading “Attitudes towards semigroups”

Some mathematicians consider semigroups to be a case of centipede mathematics. Category theorists sometimes look with scorn on semigroups, because unlike a monoid, a semigroup is not an example of a category.

However, a semigroup can be promoted to a monoid by adjoining a new element and decreeing it to be the identity. This gives a fully faithful functor from the category of semigroups to the category of monoids. So, a semigroup can actually be seen as a monoid with extra property.

How weird! Who wrote this anyway? By checking the revision history we can see the phrase “Category theorists sometimes look with scorn on semigroups” was added in revision #2 and the comment about “centipede mathematics” in revision #3, both in 2009.

So, what other sorts of mathematical objects do category theorists scorn? Apparently the notion of centipede mathematics is well known enough to get its own wiki page, which cites the semigroup as centipede mathematics with a link to nLab right away.

The following quote summarises the value and usefulness of the concept: “The term ‘centipede mathematics’ is new to me, but its practice is surely of great antiquity. The binomial theorem (tear off the leg that says that the exponent has to be a natural number) is a good example. A related notion is the importance of good notation and the importance of overloading, aka abuse of language, to establish useful analogies.” — Gavin Wraith

There we find a 2000 blog post from John Baez accusing all sorts of objects of being centipede mathematics, including ternary rings, nearfields, quasifields, Moufang loops, Cartesian groups, and so on!



Sunday May 28 2023. Read the following passage from the nLab page for semigroups under the heading “Attitudes towards semigroups”

Some mathematicians consider semigroups to be a case of centipede mathematics. Category theorists sometimes look with scorn on semigroups, because unlike a monoid, a semigroup is not an example of a category.

However, a semigroup can be promoted to a monoid by adjoining a new element and decreeing it to be the identity. This gives a fully faithful functor from the category of semigroups to the category of monoids. So, a semigroup can actually be seen as a monoid with extra property.

How weird! Who wrote this anyway? By checking the revision history we can see the phrase “Category theorists sometimes look with scorn on semigroups” was added in revision #2 and the comment about “centipede mathematics” in revision #3, both in 2009.

So, what other sorts of mathematical objects do category theorists scorn? Apparently the notion of centipede mathematics is well known enough to get its own wiki page, which cites the semigroup as centipede mathematics with a link to nLab right away.

The following quote summarises the value and usefulness of the concept: “The term ‘centipede mathematics’ is new to me, but its practice is surely of great antiquity. The binomial theorem (tear off the leg that says that the exponent has to be a natural number) is a good example. A related notion is the importance of good notation and the importance of overloading, aka abuse of language, to establish useful analogies.” — Gavin Wraith

There we find a 2000 blog post from John Baez accusing all sorts of objects of being centipede mathematics, including ternary rings, nearfields, quasifields, Moufang loops, Cartesian groups, and so on!

Equivariant dynamics theorem

Friday March 3 2023.

Suppse you have a “spatiotemporal dynamical system”:

In addition, assume there is a group \(G\) along with a group action on \(X\) denoted by \(g \cdot x\) for \(g \in G\) and \(x \in X\). Any such action can be extended to an action on \(U\) by applying the transformation pointwise, which we can also denote by \(g \cdot u\) for \(u \in U\). Now assume the dynamical system is \(G\)-equivariant with respect to the action of \(G\), meaning it obeys the equation \[ F(t,g\cdot u)=g \cdot F(t,u) \] If \(F\) is \(G\)-equivariant, then whenever \(u\) is a constant function, so is \(F(t,u)\) for all \(t\). It follows that we can define a local action \(f:T \times Y \to Y\) by the action of \(F\) on the constant functions. Is there a name for this little \(f\)?

Edit: Here is a quote from Long-time behavior of a class of biological models by Weinberger (1982):

A constant function \(\alpha\) is clearly translation invariant. That is, \(T_y\alpha=a\) for all \(y\). Consequently, \(T_yQ[\alpha]= Q[\alpha]\) for all \(y\), which means that \(Q[\alpha]\) is again a constant. This simply states that in a homogeneous habitat the effects of migration cancel when \(u\) is constant. Thus, the properties of the model in the absence of migration can be found by looking at what \(Q\) does to constant functions.

Equivariant dynamics theorem

Friday March 3 2023.

Suppse you have a “spatiotemporal dynamical system”:

In addition, assume there is a group \(G\) along with a group action on \(X\) denoted by \(g \cdot x\) for \(g \in G\) and \(x \in X\). Any such action can be extended to an action on \(U\) by applying the transformation pointwise, which we can also denote by \(g \cdot u\) for \(u \in U\). Now assume the dynamical system is \(G\)-equivariant with respect to the action of \(G\), meaning it obeys the equation \[ F(t,g\cdot u)=g \cdot F(t,u) \] If \(F\) is \(G\)-equivariant, then whenever \(u\) is a constant function, so is \(F(t,u)\) for all \(t\). It follows that we can define a local action \(f:T \times Y \to Y\) by the action of \(F\) on the constant functions. Is there a name for this little \(f\)?

Edit: Here is a quote from Long-time behavior of a class of biological models by Weinberger (1982):

A constant function \(\alpha\) is clearly translation invariant. That is, \(T_y\alpha=a\) for all \(y\). Consequently, \(T_yQ[\alpha]= Q[\alpha]\) for all \(y\), which means that \(Q[\alpha]\) is again a constant. This simply states that in a homogeneous habitat the effects of migration cancel when \(u\) is constant. Thus, the properties of the model in the absence of migration can be found by looking at what \(Q\) does to constant functions.



Friday March 3 2023. Suppse you have a “spatiotemporal dynamical system”:

In addition, assume there is a group \(G\) along with a group action on \(X\) denoted by \(g \cdot x\) for \(g \in G\) and \(x \in X\). Any such action can be extended to an action on \(U\) by applying the transformation pointwise, which we can also denote by \(g \cdot u\) for \(u \in U\). Now assume the dynamical system is \(G\)-equivariant with respect to the action of \(G\), meaning it obeys the equation \[ F(t,g\cdot u)=g \cdot F(t,u) \] If \(F\) is \(G\)-equivariant, then whenever \(u\) is a constant function, so is \(F(t,u)\) for all \(t\). It follows that we can define a local action \(f:T \times Y \to Y\) by the action of \(F\) on the constant functions. Is there a name for this little \(f\)?

Edit: Here is a quote from Long-time behavior of a class of biological models by Weinberger (1982):

A constant function \(\alpha\) is clearly translation invariant. That is, \(T_y\alpha=a\) for all \(y\). Consequently, \(T_yQ[\alpha]= Q[\alpha]\) for all \(y\), which means that \(Q[\alpha]\) is again a constant. This simply states that in a homogeneous habitat the effects of migration cancel when \(u\) is constant. Thus, the properties of the model in the absence of migration can be found by looking at what \(Q\) does to constant functions.