Monday, March 6, 2023

Double categories

Perhaps the most fundamental idea in category theory is internalisation. This lets you take some concept familiar from classical mathematics and set theory, and then turn it into an internal concept in some category. With a sufficiently advanced category, with a powerful enough internal language like a topos, you can really remake all mathematics internal to that category. This creates a different perspective on everything.

A particular example of internalisation is that of a double category. This is a category internal to the category of categories $Cat$. These double categories are the first in a line of different types of n-fold categories. Looking back at it now, I believe these n-fold categories, categories internal to the category of categories, are more fundamental then bicategories because internalisation is more interesting and fundamental then enrichment.

I want to escape from set-theoretic preconceptions which are still so ubiquitous even as category theory gains traction. The only way I think that you can do that is through internalisation, and the concept of mathematics internal to some category other then $Sets$. If you just go with enrichment, then you have all the assumptions about $Set$-categories packed in, you are just adding on more structure to the $Set$-category without doing anything really different.

Perhaps you could then do enriched category theory internal to a given category, e.g internal enriched category theory. That would resolve all issues. But the point is that this internalisation notion is more fundamental, and you want a bit of it even if you do enriched category theory. This gives me the impression that double categories are more fundamental then category enriched categories.

The perception that double categories are more fundamental then 2-categories is formalised by the notion that any 2-category can be converted in to a double category. This can be achieved by defining the horizontal or the vertical morphisms of the double category to be identities. So an advantage of a double-categorical framework is that it can subsume 2-categories as a special case.

Double categories:
A category internal to the category of categories $Cat$ is a structure $C$ that has six components:
  • A category of objects $Ob(C)$
  • A category of morphisms $Arrows(C)$
  • A source functor $S: Arrows(C) \to Ob(C)$
  • A target functor $T: Arrows(C) \to Ob(C)$
  • An identity functior $id: Ob(C) \to Arrows(C)$
  • A composition functor $\circ: Arrows(C) \times_{Ob(C)} \times Arrows(C) \to Arrows(C)$.
The key takeaway is that for any category $C$ then the category of its commutative squares $Sq(C)$ forms a double category. This formalizes the fact that commutative squares can be composed either horizontally or vertically. This is a very important part of the theory of commutative squares.

References:
Double categories

Sunday, February 19, 2023

Functorial characterizations of internal magmoids

An internal quiver in a category $C$ is just a diagram $F: T_2^* \to C$. These diagrams consist of a pair of parallel arrows in the category. Then if we are given such an internal quiver, we can always form an equalizer of its to arrows to get a subobject of the arrow object $A$. The problem is that this subobject $S$ is not going to contain all the elements we want for a composition domain. For example, for a category $C$ this consists of all endomorphisms instead of all composable pairs of arrows. Instead we have to work with pullbacks, and get the pullback of the first and second arrow. This is why internal categories are always defined over categories that have all pullbacks. Then this pullback has the property that it is the equalizer of the first and second arrows as they filter through the product object $A^2$. This makes $R$ into an internal relation in the category $C$, embedded in the object $A$. If we take a look at the above diagram, its not hard to see that the first and second projection arrows can be made to filter through $A \times_O A$ to make them into a ternary quiver consisting of three parallel arrows going to $A$. This leads to a diagram that looks like this. Then diagrams of this type form a category $C$. The axioms of this category $C$ can be used to ensure the laws specifying the source and target of composite morphisms. In particular, we can ensure that the source of the composite is the source of the second morphism and the target of the composite is the target of the first morphism. All that can be encoded in the index category. We then have the idea of a compositional quiver, which is a further over this index category $CQ$.

The importance of this functorial characterization is that we can form a functor category $C^{CQ}$. The problem, however, is that internal compositional quivers don't necessarily have to have nice input domains $A \times_O A$. This is why to have an internal magmoid, we must have a compositional quiver diagram with the property that the composition domain is the pullback $A \times_O A$.

In other words, this means that in the compositional quiver diagram $A \times_O A$ and the morphisms $1$ and $2$ form a universal limiting cone for the pullback. This leads me to an interesting idea: functors with the property that certain subcones within the functor, described as subdiagrams, are universal. This can be used for example to characterize whenever an object in a diagram is a product or a coproduct. Something like this would make for an interesting data structure for those interested in implementing category theory on the machine.

To form an internal category is a little bit more involved. We can always add identities to an internal magmoid diagram, by adding another identity morphism going from objects to arrows. To make that unital magmoid into a category also requires that we satisfy associativity and the unit laws. Rest assured, however, that internal categories can be formed in $Sets^{\to}$ because it has all pullbacks, and if there is anything interesting to be said there I will be saying it eventually. For now these are just some of my thoughts on internal magmoids and related notions.

A key takeaway is that everything should be represented by a functor. Natural transformations for example are simply functors from a category in to an arrow category, so they also have nicely defined functorial semantics. If we take categories to be something like functors or diagrams internal to some category, that is a step forwards too. While internal categories could be just some collection of objects and arrows satisfying some axioms, if we can make them in to special types of functors that is most convenient. Functors are the ideal structures we should work towards dealing with.

References:
internal categories

Saturday, February 18, 2023

Future directions in declarative programming

This blog has been around for a while now, and as time has gone on some of my opinions and views have changed. One thing that has remained the same is that I have always had a mind focused on logic. Everything to do with logic interests me. I don't know why this is, some of us just have a mind focused on logic I suppose.

As a logician, I have tried to seek out a wide variety of tools of logical reasoning: logic programming, horn clauses, set theory, predicate logic, ontologies, lattices and formal concept analysis, categorical logic and topoi, etc. As I have done all this research and thinking about logic I've felt there is a mismatch between functional programming and the techniques I use. My primary problem then has been how do we solve this mismatch between functional programming and logic?

Problem: functional programming is not logical enough

Solution 1: integrate functional and logic programming in to a single language

Solution 2: embrace functional programming but make it more logical by developing a new logic of functions

When I first encountered topos theory, I thought that solution 1 would be the way to go. We would simply take existing logical and functional programming techniques and use topoi as a unifying bridge between them. The result would be a unified multi-paradigm declarative language with topos theory as its core.

The idea of unifying functional and logical programming is rather obvious and it fits with the common practice of multi-paradigm programming. The problem with the approach of throwing everything and the kitchen sink into a language is its indecisive. The multi-paradigm approach is too fence sitting and it fails to committ to one thing or another.

It occurs to me in developing the toposic structure theory of computations (2023) that we can just develop new logical techniques for handling functions to get our 'logic fix' that way, all without leaving functional programming for another paradigm. Practical work in modern logic uses categoric logic and topos theory. Therefore, this new logic of functional programming will have to be developed using categorical logic and the topos of functions.

What should the future trend in declarative programming be? It should be towards more and better functional programming. We can make things more functional by developing them in the functional languages: Clojure, F#, Haskell, etc and we can make these existing functional languages better. Instead of the multi-paradigm concept, functional programming should come first.

Previously:
The future of declarative programming

References:
[1] Hartmanis, J., & Stearns, R. E. (1966). Algebraic structure theory of sequential machines.

[2] Bernier, J. (2023). Toposic structure theory of computations.

Tuesday, February 14, 2023

The logic of functional programming

A trend toward functional programming has emerged in the last fifteen years. Functional programming languages have seen greater adoption, and several functional features have finally started to make their way into mainstream languages. Lambda expressions have been added to C++ (with C++11), Java (with Java 8), and C# (with version 2).

I rode on the wave of this rise in functional programming. When functional programming started becoming mainstream, I decided to join in. As I went through this personal journey, there have been various problems I have sought to solve along the way and ideas that I have sought to make sense of.

Functional modification

I made my own journey from imperative languages, primarily JavaScript, to more functional ones. One of the first things we are all faced with when making the transition to functional programming is making sense of immutability and functional purity. How is it possible to program if we cannot change anything?

As it turns out, changing things in functional languages is possible. You are just encouraged to do so with pure functions. A case in point is the assoc function in Clojure. This pure function takes in a composite structure and returns a changed version without affecting the original structure.
(def coll [1 2 3 4])
(def new-coll (assoc coll 0 10))
(= coll new-coll) ;=> false
The assoc function is generalized to handle various composite data structures, all within a functional framework. So we can use assoc to modify the parts of a map:
(def point {:x 1, :y 2})
(assoc point :x 10) ;=> {:x 10, :y 2}
In addition, the assoc-in method can be used to make modifications deep within the nested parts of a composite data structure. For example, it can be used to modify the values within a matrix.
(def matrix 
  [[1 2 3]
   [4 5 6]
   [7 8 9]])
(assoc-in matrix [1 1] 50) ;=> [[1 2 3] [4 50 6] [7 8 9]]
So it would seem that there is a way of modifying things in functional languages. It is just that the modifications take on a different form. This leads us to functional references, which help us to make sense of functional modifications.

Functional references:
Traditional languages like C++, Java, C#, and JavaScript provide generalized imperative references in the form of lvalues (locator values). An lvalue is anything that occurs on the left-hand side of an equal sign. The right-hand side contains what the lvalue is to be turned into. When you move over to the functional world, you get generalized functional references, also known as lenses.
; the basic idea is that a lens can be defined by combining a pure 
; getter function and a pure setter function.  
(deftype Lens [getter setter]
  clojure.lang.IFn 
  (invoke [this arg] (get arg))
  (applyTo [this args] (clojure.lang.AFn/applyToHelper this args)))
Lenses implement clojure.lang.IFn, so they can be treated a lot like functions. However, we need to define special methods that wrap their setter functions to make further use of them.
(defn modify 
  "Set the part of object at lens to val."
  [lens obj val]
  
  (let [setter (.-setter lens)]
    (setter obj val)))

(defn zap
  "Apply the function func to the part of the obj at lens."
  [lens obj func]
  
  (let [setter (.-setter lens)]
    (setter obj (func (lens obj)))))
The clojure get and get-in functions are the counterparts of assoc and assoc-in. We can combine get and assoc methods to form lenses.
; an example lens which describes the first part of a tuple
(def first-lens
  (Lens.
   ; get the first part of a tuple
   (fn [arg]
     (get arg 0)) 
    
    ; set the first part of a tuple
    (fn [arg v]
      (assoc arg 0 v))))

; an example lens which describes the second part of a tuple
(def second-lens 
  (Lens.
   ; get the second part of a tuple
   (fn [arg]
     (get arg 1)) 
    
    ; set the second part of a tuple
    (fn [arg v]
      (assoc arg 1 v))))
A fundamental property of references in imperative and functional languages is that they can be arbitrarily nested. In the functional world, this nesting is defined compositionally.
(defn ^Lens compose-lenses 
  "combine two lenses to get their composition"
  [^Lens f, ^Lens g]
  
  (Lens.
   (comp (.-getter f) (.-getter g))
    (let [set-f (.-setter f)
          set-g (.-setter g)]
      (fn [a v]
        (set-g a (set-f (g a) v))))))
By using lens composition, we can recapture the nesting provided by Clojure's get-in and assoc-in methods with a much more extensible framework. With lenses, we have now translated references from an imperative context to a functional one.

Bijections are lenses:
Another key fact is that bijections are lenses. In particular, we can form a lens from a function and its inverse. This basic fact is implemented in the code below.
(defn bijection-lens 
  "get a lens from a function and its inverse."
  [func inv-func]
  
  (Lens.
   func 
    (fn [arg v]
      (inv-func v))))
A most obvious example is the bijection between booleans like true and false and bits like zero and one. It is customary to treat false as zero and one as true. So we use this to define a bijection.
(def boolean-to-bit 
  (bijection-lens
    {false 0, true 1} 
    {0 false, 1 true}))
We can now define an object of the boolean type and then modify it in terms of its bit representation.
(def condition true)
(modify boolean-to-bit condition 0) ;=> false
Recalling that we earlier demonstrated that lenses are compositional, any lens can now be composed with a bijection to get a different lens. A lens now has two different components: the information it refers to and its representation of that information. There is a single notion of the first part of a pair of numbers, but there are potentially infinite representations of that same first part.

Algebraic theory of information:
The fact that lenses can be composed with bijections suggests a dichotomy between information and its accessors.
  • Information: a location within a larger structure
  • Accessor: the way that a given piece of information is viewed
The first part of a tuple $X^2$ is a unique piece of information, but it may have many different representations. Each composition with a bijection produces a different representation. With so many bijections available producing different representations, it is important that we consider lenses and information as different.

There is a further important property. We often find that some reference has more information than another, or that one memory location has more data then another. So, for example, consider the class of 2x2 matrices. Then these matrices have a first row and a first column, and both of them together contain the $(0,0)$ entry. So the $(0,0)$ is an information substructure of the first row and the first column.

We can think of a lens as a way of drilling down into part of a structure, and some lenses drill down deeper into a structure. The depth that a lens goes down to is part of the theory of information orders, which is a kind of logic. This motivates our further study of the logic of functional programming.

The varied behaviour of functions:
Even with this groundwork laid by functional references we don't really know how general functions behave. With all this work with lenses, all we can do is translate concepts from imperative programming (like modification and references) to functional programming. But we still don't have a way of describing how functions behave.

To explain why we haven't gotten that far yet, we should look at some examples. If we have a quadruple in the set $X^4$ then we can say that it has four different lenses associated to each of its indices. Each lens refers to a different part of the quadruple. We can define a permutation $f(a,b,c,d) = (b,a,d,c)$ which swaps the first and second pairs of values.
(defn permute-quadruple 
  "Swap the first value with the second and the third with  
  the fourth"
  [[a b c d]]
  
  [b a d c])
  
(permute-quadruple [10, 20, 30, 40]) ;=> [20, 10, 40, 30]
This is not simply the modification of some part of a structure. This permutation moves data from one lens to another. There seem to be motions and subcomputations occuring inside of our pure functions! What are these? How do you explain them? This will need to be the subject of further investigation.
(defn transform-quadruple
  "A transformation of a quadruple which is not a permutation."
  [[a b c d]]
  
  [c a a c])
  
(permute-quadruple [10, 20, 30, 40]) ;=> [30, 10, 10, 30]
Aside from moving values around from place to place, a function can modify them. For example, you have things like map which lets you take a function and apply it to all parts of a structure.
(defn increment-triple
  "Increment all parts of a triple"
  [[a b c]]
  
  (vec (map inc [a b c])))
  
(increment-triple [1 2 3]) ;=> [2 3 4]
One further example uses combiners like + and * to take multiple different places and combine them into a single place in the result.
(defn sum-product-pair
  "Get the sum of the first two elements and the product of the 
   next two."
  [[a b c d]]
  
  [(+ a b) (* c d)])
  
(sum-product-pair [10 20 30 40]) ;=> [30 1200]
These are some of the many examples of things we can do with functions. A couple more examples come from linear algebra:
  • When you transpose a matrix each entry in the output matrix is determined by an entry in the input matrix.
  • If you take two matrices and multiply them then each entry of the output matrix is determined by a row in the first matrix and a column in the second.
  • When you add two matrices each entry in the output matrix is determined by the an entry in the first matrix and an entry in the second matrix.
There are examples abound throughout math. This concept of information dependence appears everywhere. Once we have a concept of information and of functional references, it is desirable that we should have a logical theory as to how functions behave around these references. To describe how they move information around. It follows that there is still more that needs to be done.

Describing the transport of data:
The varied behaviour of functions that we are seeking to explain all have a common description. They all transport data from one place to another using some intermediate function. Permutations move data from one place to another without changing them along the way. They are purely a rearrangement of data. The map function moves data from a location back to itself, but it changes it along the way.

Commutative square diagrams can be used to explain all these varied behaviours of functions that we have been seeing. In order to see this it might be helpful to explain commutative square diagrams using exceptionally descriptive variable names. While we could describe each of its four components by single letter variables: f,g,h, and i perhaps the approach I am about to propose will be more suggestive.

We can say that a commutative square diagram for a given function, has in addition to that function three additional components: an input function, an output function, and a transport function. Intuitively, this means that the commutative square diagram describes the movement of data from the input place to the output place occurs along the given transport function. This is how we will describe the movement of data from one place to another. With this fundamental formalism we now have the means we need to describe the behavior of functions, such as how data is moved from one place to another. If we look back at the examples we were exploring before, like permutations, transformations, matrix transpositions, the incrementation of triples, arithmetic functions, etc they all have an explanation for their behavior using this fundamental formalism.

Take the function permute-quadruple. It moves the first index to the second index, and vice versa, and the third index to the fourth index and vice versa. In each case data is moved from one place to another along the identity function. In the case of increment-triple each index is moved back to itself using the increment-triple function. In these examples, the motion of data from one place to another by a function is described by a commutative square diagram.

When we introduced lenses and functional references we made a very big step forward, because we had a general formalism for describing functional mutations, yet the story could not end there. Introducing these functional references left us open to the question of how we could understand how various different kinds of functions move the data these lenses point to around. With commutative square diagrams, we have the means we need to explain all the behaviour we have been seeing.

Information flows:
What we have so far described is how data can be moved from an input place to an output place along some transport function. We can also ask what can be said of data migrations without a transport function? What happens if we just want to talk about input and output locations by themselves?

Then the formalism we get is a pair $(I,O)$ of an input location and an output location. These location pairs, or partition pairs, describe how ignorance spreads or information flows under the operation of a function. The idea of exploring such pairs with a view towards studying how information flows was first put forward by Hartmanis and Stearns:
Therefore, the S.P partitions on a machine describe the cases in which the "ignorance" about the exact states of the machine does not increase as the machine operates. The concept of partition pairs is more general and is introduced to study how "ignorance spreads" or "informaton flows" through a sequential machine when it operates.
J. Hartmanis, R. E. Stearns
By studying these partition pairs in the context of commutative square diagrams, we can make them part of a larger framework that includes a theory of the transport of information. At the same time, these information flows help us to describe how these commutative square diagrams function.

In a recent paper of mine, I study these information flows in terms of the commutative diagrams they produce. It is noted, for example, that for any function $f: A \to B$ the partition pairs of $f$ form a lattice, and the properties of these lattices are described. We use these results to form a new kind of structure theory.

The topos of functions:
If we take two different commutative square diagrams, where the transport of one is the function of another, then these two diagrams can be composed. The composition of the two square diagrams occurs componentwise along their input and output functions. The result is a new square diagram. If we take this to be our composition law, the result is a category: the category $Sets^{\to}$. If we look a little deeper we find there is something very special about this category: it is a topos.

The fact that it is a topos gives us a lot. This means that $Sets^{\to}$ has all small limits and colimits, products, coproducts, epi-mono factorisations, subobject classifiers, an object of truth values, logical connectives, an internal logic, exponentation, power objects, subobject lattices, etc. Everything you can do in a topos you can do in $Sets^{\to}$. This makes $Sets^{\to}$ a broadly interesting subject of study in its own right, with some interesting properties of its own, like the failure of the law of excluded middle to hold.

The key thing for us is how $Sets^{\to}$ can help us all better understand functional programming. Its central concept, the commutative square diagram, describes how data moves around in functions. The commutative square can be described by a function, an input place, an output place, and a transport function that describes how inputs are mapped to outputs. As is true for any form of categorical logic, the dual logical theories of $Sets^{\to}$ are emergent properties of these commutative square diagrams.

Functional lenses let you focus in on some particular piece of a complex data structure. Commutative square diagrams, on the other hand, let you focus on some particular piece of a function. The result is the structure theory of functions. With this structure theory, we can now reason logically about the behaviour of functional programs.

Feedback:
How can we further improve the structure theory of functional programs? Write to me at jhuni.v2@gmail.com or leave a comment in the comment section below.

References:
[1] Hartmanis, J., & Stearns, R. E. (1966). Algebraic structure theory of sequential machines.

[2] Bernier, J. (2023). Toposic structure theory of computations.

[3] Goldblatt, R. (2014). Topoi the categorial analysis of logic. Elsevier Science.

[4] Johnstone, P. T. (2008). Sketches of an Elephant: A topos theory compendium. Oxford University Press.

[5] Lane, M. S., & Moerdijk, I. (2010). Sheaves in geometry and logic: A first introduction to topos theory. Springer.

[6] Barr, M., & Wells, C. (1985). Toposes, triples and theories. Springer-Verlag.

[7] MacLarty, C. (2005). Elementary categories, elementary toposes. Clarendon Press.

[8] Bell, J. L. (2008). Toposes and local set theories: An introduction. Dover Publications.

[9] Moerdijk, I. (2008). Classifying spaces and classifying topoi. Springer.

[10] Caramello, O. (2018). Theories, sites, toposes: Relating and studying mathematical theories through topos-theoretic 'Bridges'. Oxford University Press.

Tuesday, February 7, 2023

The number of available processors

In order to parallelize a function that has a well-defined product decomposition, such as the map function or one of its variants it is good that we should determine how many processors are available in a given system. This can be achieved on the JVM using the availableProcessors method of the Runtime class.

Java:
public class ProcessorCounter {
    public static void main(String[] args) {
        int processorsCount = Runtime.getRuntime().availableProcessors();
        System.out.println("The number of available processors is:" + Integer.toString(processorsCount));
    }
} 

Clojure:
(let [runtime (Runtime/getRuntime)
      processors-count (.availableProcessors runtime)]
  (println "The number of available processors is: " (str processors-count)))
Limits to parallelism:
There are two limits to parallelism:
  • The number of available processors, which is determined by the available processors method.
  • The overhead of threading, which states that sometimes the cost of launching parts of a computation into separate threads can outweigh the advantages of parallelism.
It doesn't make much sense to parallelize a computation if you don't have the available processors to take advantage of that. On the other hand, the overhead of threading is one reason why it isn't always good to use pmap in Clojure.

Saturday, February 4, 2023

On Clojure's map and pmap functions

Let $f: A \to B$ be a function then mapping $f$ over $n$-tuples is the same as applying the product function $f^n$. The map function is one of the simplest types of product functions: the product of a function with itself. Consequently, map is embarrassingly parallel. Its parallelism is one of the main ingredients of modern parallel programming alongside the referential transparency that comes with functional programming languages.

If a programming language has referential transparency, then its components can be computed in any order and then combined afterwards to produce a common result. This is why functional programming languages are so good at modern parallel computing, and its a large part of why they have be seen to be getting greater traction in recent times. The parallelism of reduce is essentially in this category, because the combination of associativity and referential transparency allows you to freely determine your computation order.

The map function is distinguished by the fact that it is embarassingly parallel. This means that map can be decomposed into different independent computation tasks requiring no communication between one another until they are joined at the end. The fact that map is embarassingly parallel has already been put to good use in a number of parallel computing applications, such as MapReduce libraries like Apache Hadoop.

However, while the use of parallel mapping constructs is satisfactory that doesn't mean we should stop there. It is also desirable that we should have a general mathematical structure theorem for describing when functions can be given embarassingly parallel implementations. This will capture the categorical universal properties that make embarassing parallelism possible, without regard for representation details like the use of sequences. A truly abstract mathematical description should not make reference to representations.

If we have an embarrasingly parallel computation such as the application of the map function, then we can break it down into parts using threads or multiple computers. Then when the computations are done they can be combined to produce a common result. This is what the Clojure pmap does, as it uses the Java threading API to split up a computation into parts that can be run on as many processors as possible.
  • The pmap function first gets the number of processors on the machine by using the static getRuntime method on java.lang.Runtime and then calling the virtual availableProcessors method on that to get the number of processors available on the JVM. The number of processors is used to determine how many threads to break the computation in to.
  • Once the number of processors is determined the pmap function breaks up the computation of the map into separate threads using Clojure futures. These take a computation and run it on another thread without blocking, and then they cache the result so that it can be accessed later by dereferences.
  • Finally, the computation of the parallel map is computed semi-lazily by using the lazy-sequence function. It computes an appropriate chunk of the sequence determined by the number of available processors but it doesn't compute everything unless required.
The implementation of an embarassingly parallel function like map will probably have to take a similar one to that played by Clojure's pmap: the computation will be broken up into threads handled by different processors. However, what makes this possible is a mathematical property shared not only by map but potentially by a number of other different types of functions. By understanding these structural properties, we can better understand the parallelisation of functional applications.

References:
Map (parallel pattern)

Tuesday, January 31, 2023

Embarassingly parallel decompositions of functions

A basic primitive of functional programming, the map function, is embarrassingly parallel. This means that map can be broken up into independent computations requiring no communication or synchronization between them except for a join operation at the end. It would be nice if we could describe embarrassing parallelism in the most general sense using category-theoretic universal properties.

To do this we will use the topos $Sets^{\to}$. It will be shown that products in this category describe embarrassingly parallel computations. If $X^n$ is the set of $n$-tuples of elements of $X$, then the map function with argument $f$ is equivalent to the product function $f^n$ of $f$ with itself $n$ times. We will describe how arbitrary functions in $Sets^{\to}$ can be given product decompositions, even when they do not use representations by sequences so that they can be treated as embarrassingly parallel functions.

Background on the algebraic theory of information:
The algebraic theory of information, introduced by Hartmanis and Stearns requires that we model information using partitions. In modern categorical treatments, partitions are equivalence classes of epimorphisms in the topos $Sets$. As every topos, including $Sets$, has epi-mono factorizations, every function in $Sets$ has an epi-component which is its partition or kernel.

Definition. let $f$ be a function then $f$ defines a partition called $ker(f)$ which describes equality with respect to $f$: \[ (a,b) \in ker(f) \Leftrightarrow f(a) = f(b) \] Example 1. let $\Omega : \{0,\frac{1}{2},1\} \to \{0,1\}$ be the object of truth values in $Sets^{\to}$ then its kernel is the equivalence relation defined by the partition $\{\{0\},\{\frac{1}{2},1\}\}$

Example 2. let $abs : \mathbb{R} \to \mathbb{R}$ be the absolute value function. Then $(-1,1) \in ker(abs)$ because $-1$ and $1$ have the same absolute value.

Background on products in Sets:
Definition. let $S_1,S_2,...S_n$ be a family of sets in the topos $Sets$. Then a universal product cone is a family of epimorphisms $f_i$ from an object $X$ to each $S_i$. By the kernel construction it follows that each $f_i$ has a corresponding partition $ker(f_i)$. These partitions have the following property: \[ \forall F \in \prod_{i \in I} P_i: |\bigcap_{i \in I} F_i| = 1 \] Which equivalently states that each selection of equivalence classes for each partition has a single common intersection. To demonstrate the use of this formalism we have provided a set of examples:

Example 1. {{0,1},{2,3}} and {{0,2},{1,3}} form a product decomposition because all pairs of equivalence classes between them have singular intersection:
0,2 1,3
0,1 0 1
2,3 2 3

Example 2. {{0,1,2},{3,4,5},{6,7,8}} and {{0,3,6},{1,4,7},{2,5,8}} form a product decomposition because all pairs of equivalence classes between them have singular intersection:
0,3,6 1,4,7 2,5,8
0,1,2 0 1 2
3,4,5 3 4 5
6,7,8 6 7 8
Example 3. the following three equivalence classes form a product decomposition because all triples of equivalence classes chosen between the three of them have singular intersection: \[ \{\{0,1,2,3\}, \{4,5,6,7\} \} \] \[ \{\{0,2,4,6\}, \{1,3,5,7\} \} \] \[ \{\{0,1,4,5\}, \{2,3,6,7\} \} \] This can be confirmed by examining all the triples of equivalence classes of these three partitions manually to determine that they have singular intersection: \[ \{0,1,2,3\} \cap \{0,2,4,6\} \cap \{0,1,4,5\} = \{0\} \] \[ \{0,1,2,3\} \cap \{0,2,4,6\} \cap \{2,3,6,7\} = \{2\} \] \[ \{0,1,2,3\} \cap \{1,3,5,7\} \cap \{0,1,4,5\} = \{1\} \] \[ \{0,1,2,3\} \cap \{1,3,5,7\} \cap \{2,3,6,7\} = \{3\} \] \[ \{4,5,6,7\} \cap \{0,2,4,6\} \cap \{0,1,4,5\} = \{4\} \] \[ \{4,5,6,7\} \cap \{0,2,4,6\} \cap \{2,3,6,7\} = \{6\} \] \[ \{4,5,6,7\} \cap \{1,3,5,7\} \cap \{0,1,4,5\} = \{5\} \] \[ \{4,5,6,7\} \cap \{1,3,5,7\} \cap \{2,3,6,7\} = \{7\} \] Each of these eight triples are families of sets in the product $\prod_{i \in I} P_i$ that have singular intersection.

Product functions:
Definition. let $f_i: A_i \to B_i $ be a family of functions then their product in $Sets^{\to}$ is the function: \[ \prod_{i \in I} f_i : \prod_{i \in I} A_i \to \prod_{i \in I} B_i \] That is defined by applying the function $f_i$ to the value at index $i$ of a sequence in $\prod_{i \in I} A_i$: \[ \prod_{i \in I} f_i(a_1,...a_n) = (f_1(a_1),...,f_n(a_n)) \] Example 1. let $f: X \to Y$ be a function then $f^n: X^n \to Y^n$ is the function that applies the higher-order map function to sequences of size $n$ using the function $f$.

Example 2. let $inc: \mathbb{R} \to \mathbb{R}$ be the function $inc(x) = x+1$ and let $double : \mathbb{R} \to \mathbb{R}$ be the function $double(x) = 2*x$. Then the function $inc \times double : \mathbb{R}^2 \to \mathbb{R}^2$ has $f(x,y) = (x+1, 2*y)$.

The critical characteristic of product functions is that they are embarrassingly parallel. Given a product function $f \times g$, the results of the functions $f$ and $g$ can be computed entirely separately from one another. Then all we need to do is combine the results of these separately computed values at the end to get a final result.

Category theory teaches us that we can treat products using universal properties instead of by reference to their common representations. The product of sets is not necessarily a set of ordered pairs but rather a universal limiting cone, and that is one of its representations. To move beyond issues of representation, we will also have to consider universal cones in $Sets^{\to}$.

Embarassingly parallel decompositions:
Proposition. let $f: A \to B$ be a function in $Sets^{\to}$. Then a decomposition of $f$ into separate information flows is a family $(P_1,Q_1), (P_2,Q_2), ... (P_n,Q_n)$ of information flows such that the source partitions $P_1, P_2, ... P_n$ form a product decomposition of $A$ and the target partitions $Q_1, Q_2, ... Q_n$ form a product decomposition of $B$.

Example 1. let $f(a,b) = (a+1,b+1)$ then ((0,0), (1,1)) form a family of flow relations where (0,1) and (0,1) are both product decompositions. These information flows produce a universal limiting cone in $Sets^{\to}$: Example 2. let $f(a,b) = (b,a)$ be the transposition function then $((0,1),(1,0))$ forms a family of flow relation where (0,1) and (1,0) are both product decompositions. These information flows produce a universal limiting cone in $Sets^{\to}$: We see now that to reconstruct a product function from a family of independent information flows, you only need to take the product of all its quotients. This demonstrates how the theory of information flows in the topos of functions $Sets^{\to}$ can be used to create embarassingly parallel decompositions.

References:
[1] Hartmanis, J., & Stearns, R. E. (1966). Algebraic structure theory of sequential machines.

[2] Bernier, J. (2023). Toposic structure theory of computations.

[3] Goldblatt, R. (2014). Topoi the categorial analysis of logic. Elsevier Science.

Sunday, January 29, 2023

A comparative study of structure theories

The most groundbreaking work in the computer industry was essentially done in the late 1950s to the 1960s, because there wasn't much ground to break yet. The Algebraic Structure Theory of Sequential Machines (1966) owes its origin to that era. Every single computer science treatise now owes something to that time, and most ones have broad similarity to something back then. But the similarities often end there, as examinations of any modern theory always exposes key differences.

That is also the case with the topos theory of computations. While the basic intuitive framework is the same as Hartmanis and Stearns, all technical aspects of the theory are different. In 2023, a preliminary research paper Toposic Structure Theory of Computations (2023) was released with this new and updated theory. If this is well recieved, a full textbook on the the topos of computation, containing the advanced theory, will be produced.

With these two structure theories now widely available for consideration, it is high time that a comparative study was conducted to highlight their similarities and differences. In this study, we will describe why it is so necessary that a new and updated theory should be produced. The value brought about by the application of new techniques in categorical logic and topos theory will be described and the limitations of the classical theory will be considered.

Background and motivation:
The theory of Hartmanis and Stearns was designed with engineering applications in mind. To quote their text, "The engineering motivations demands new mathematical techniques to be invented with no counterpart in the development of algebra. Thus, this theory has a characteristic flavour and mathematical identity of its own." As a consequence, they developed the idea of information flows with its engineering applications in mind.

On the other hand, the Toposic Structure Theory of Computations (2023) is a purely mathematical treatise. It was created with the idea of exploring an aspect of topos theory in mind. That the topos it studies happens to be the topos of computations is a convenient coincidence, that makes this study worthy of publication and wide spread dissemination. This text should be of interest even to pure mathematicians working in topos theory.

Scope:
The Algebraic Structure Theory of Sequential Machines (1966) studies information flows with respect to sequential machines. These are treated as transition functions $\delta: S \times I \to S$ on a set of states $S$ and a set of inputs $I$. In effect, however, they are families of functions $\delta : S \to S$ for each $i \in I$. Then a partition pair of $\delta$ for $(P,Q)$ says that for each $i \in I$ : $s =_P t \Rightarrow \delta(i,s) =_Q \delta(i,t)$.

They never consider to examine what a partition pair $(P,Q)$ be defined an a single function $f: A \to B$ would mean. Already in section 0.1 of their text, on "sets and functions" you can see that they are hopelessly lost in the old ways of set theory. The great mental liberation and significant technical advantages of modern topos theory were not considered.

That is why they define information flows always for families of functions $\delta_i$ rather then considering them individually. If they'd taken the later perspective, their theory would have had better foundations. This approach produces a theory with very limited scope and applicability. It is no wonder then why this theory was hardly ever used and history went the way it did.

By contrast, the Toposic Structure Theory of Computations (2023) is applicable to any kind of functional computation. This produces a theory with far greater and wider applicability than ever before, and it exposes all functions to the logic of information flows. This is really a paradigm shift and a transition from foundations in the topos $Sets$ to the topos $Sets^{\to}$.

Mathematical context:
In the Toposic Structure Theory of Computations (2023) we study the topos $Sets^{\to}$ and its morphisms. These are ordered pairs of functions $(i,o)$ between functions $f$ and $g$ that form commutative diagrams like so: Then $(i,o)$ has an epi-mono factorisation like so: There are your partition pairs $(P,Q)$ emerging right out of morphisms in the Sierpinski topos. They exist between individual functions. A morphism in the topos $Sets^{\to}$ is the embedding of the information flow of some function in another. The algebraic approach to information, it turns out, is nothing more then the logic of the topos $Sets^{\to}$.

By taking this categorical perspective we are able to create a far richer theory than the classical one of Hartmanis and Stearns. The effects of morphisms in $Sets^{\to}$ on information flows is considered categorically, leading to a number of interesting results. Functorial semantics for information flows are developed. None of this would be possible without the use of new developments in category theory.

Conclusions:
The prior work of Hartmanis and Stearns on the algebraic theory of information is inspiring. Any further work in the algebraic theory of information should mention their fundamental work and give proper credit where it is due. Nonetheless, their approach is frought with a number of limitations: limited scope, a lack of category theory, underdeveloped mathematical foundations. This is solved by creating a new theory.

References:
[1] Hartmanis, J., & Stearns, R. E. (1966). Algebraic structure theory of sequential machines. Prentice-Hall.

Saturday, January 28, 2023

Visualising information flows of finite functions

One way that I can demonstrate that information flows is through visualisation. That information flows form a lattice can be seen intuitively by realising that if the information in $P_1$ determines the information in $Q_1$ and the information in $P_2$ determines the information in $Q_2$ then the combined information $P_1P_2$ determines the information in $Q_1Q_2$. That they form a lattice also means they can be displayed as Hasse diagrams, so you can see what they look like.

All visualisations will be done with Clojure and Graphviz. In order to create set theoretic functions, we need the data of an input set, an output set, and a Clojure function or map. This ensures that we can handle functions that are not surjective. But in the easier case where they are surjective, we can just get a function using the to-function method. The elements of the information flow lattices are partition pairs, which are vectors of disjoint families of sets.

Small functions:
(SetFunction. #{0} #{0} {0 0})
(SetFunction. #{0 1} #{0} {0 0, 1 0})
(SetFunction. #{0 1} #{0 1} {0 0, 1 1})
(SetFunction. #{0 1} #{0 1} {0 0, 1 0})
(SetFunction. #{0 1 2} #{0} {0 0, 1 0, 2 0})
(SetFunction. #{0 1} #{0 1 2} {0 0, 1 1})
(SetFunction. #{0 1 2} #{0 2} {0 0, 1 0, 2 2})
(SetFunction. #{0 1 2} #{0 1 2} {0 0, 1 1, 2 2})

Intermediate sized functions:
(to-function {0 0, 1 0, 2 0, 3 0})
(to-function {0 0, 1 0, 2 2, 3 2})
(to-function {0 0, 1 0, 2 0, 3 3})
(to-function {0 0, 1 1, 2 2, 3 2})
(to-function {0 0, 1 1, 2 2, 3 3})
(to-function {0 0, 1 0, 2 0, 3 0, 4 0})
(to-function {0 0, 1 0, 2 2, 3 2, 4 2})
(to-function {0 0, 1 0, 2 0, 3 0, 4 4})
(to-function {0 0, 1 0, 2 0, 3 3, 4 4})
(to-function {0 0, 1 1, 2 1, 3 3, 4 3})
(to-function {0 0, 1 1, 2 2, 3 3, 4 3})
(to-function {0 0, 1 1, 2 2, 3 3, 4 4})
Large or infinite functions:
By generality, functions larger then these have information flow lattices even if we cannot visualize them. The algebraic theory of information ensures that we can exist. Using mathematical techniques we can describe their properties even in the infinite case.

References:
[1] Hartmanis, J., & Stearns, R. E. (1966). Algebraic structure theory of sequential machines. Prentice-Hall.

[2] Lausmaa, T. (2005). Extropy-based quantitative evaluation of finite functions. Proceedings of the Estonian Academy of Sciences. Physics. Mathematics, 54(2), 67. https://doi.org/10.3176/phys.math.2005.2.01

Friday, January 27, 2023

Paying homage to the work of Hartmanis and Stearns

A commenter alerted me to the fundamental work of Hartmanis and Stearns: Algebraic Structure Theory of Sequential Machines (1966). This came closest to capturing the fundamental ideas that I was getting at in my study of information flows. To put my work in context, I want to do something to pay homage to this prior work.

When I first thought about what to title my paper, I couldn't think of anything particularly good that would stick. So I took the good old fashioned shotgun approach and made a big title with a bit of everything: "The Topos Theory of Computing: Introduction to the Mathematics of Dataflow." Yet its clear that calling this a theory of computing doesn't get to the point, because the whole point of this thesis is an examination of structures.

By changing the title of my work to Toposic Structure Theory of Computations (2023) I get an intrinsically better title while also more effectively paying homage to the excellent prior work of Hartmanis and Stearns. This hopefully reduces any controversy around my work and puts it into an appropriate historical context. We now have two different types of structure theory in computer science:
  • The algebraic structure theory
  • The toposic structure theory
The Algebraic Structure Theory of Sequential Machines (1966) is one of those marvelous gems of science and technology that due to historical accident seems to have fallen to the way side. It is something that despite its age still has something to teach us, just like the Lisp machines. There is always something to be learned from studying the best technologies of the past.

But I am not here to revive old technologies. In my work, I developed novel new foundations for the mathematics of information flows using topos theory. It is demonstrated that the information flows of Hartmanis and Stearns are nothing more then quotients in $Sets^{\to}$. By establishing functors from a wide variety of different categories to $Sets^{\to}$ information flows are applied to a wider variety of contexts then ever before.

It is shown that morphisms in $Sets^{\to}$ preserve and reflect partition pairs. The result is a monotone Galois connection between information flow lattices of functions associated to every morphism in $Sets^{\to}$. This opens up a wide variety of different computations that would not be possible without the topos theoretic framework. All of this makes this subject an exciting and fertile ground for further exploration.

References:
Hartmanis, J., & Stearns, R. E. (1966). Algebraic structure theory of sequential machines. Prentice-Hall.

Thursday, January 26, 2023

A new way of programming with information flows

Informal discussion:
The physical world is highly parallel, with many things happening side by side. A realistic model of computation is dataflow, which can be conceptualized as a kind of motion. Much like how particles move around in physical systems, bits of information move around in computers. This motivates our development of a theory of information flows.

But what exactly are these bits of information? Consider even and odd; then both are bit-valued functions that produce ones and zeros. But even and odd produce the same bits of information. They represent the information in two different ways. If $n$ is even then we know it is not odd, and if its not even then we know it is odd. The same applies the other way around.

To represent the same bit of information as evenness or oddness, we can use a partition. Both the even and odd functions are representations of the abstract notion of parity, which can be defined as a partition with two elements. An abstract bit of information is then a partition of a set into two different classes. It does not matter rather you get the parity from even, odd, 2 * mod(n,2), or any other representation. An abstract bit of information may be realized in many different forms.

A piece of information, in general, is just a collection of bits of information. We can call these partitions, but the important point is that they refer to a place within the information of a larger structure. A piece of information is larger than another one if it has more bits than it. Partitions abstract functions so that every function can now be treated as a representation of a piece of information.

We now know how to think abstractly about places within an information system, but we still need to understand how information moves around. Let $f: A \to B$ be a function; then we can describe a flow relation on $f$ by a pair (P,Q) which proclaims that the function moves the information in the place $P$ to the place $Q$. We now have a fundamental notion of information flows.

As an example, return back to our notion of parity. If we take a function like $x+1$ then we see that $x+1$ maps the parity information in its input to the parity information in its output. So (parity, parity) is a datflow relation on $x+1$. If we take an even number and add one to it, then we get an odd number and if we take an odd number and add one to it we get an even number. So $x+1$ flips parity bits around.

On the other hand, $x^2$ leaves the parity bit as it is: the square of an even number is even, and the square of an odd number is odd. Two other examples are $2x$, which always produces a number of even parity, and $2x+1$, which always produces one that is odd. Other functions like $\Omega$ might not map parities back to parities at all. These functions might take different numbers of the same parity and map them to numbers of different parities.

It would be nice if we could take a place in an input structure, like the parity of a number, and then find out exactly what place in the output it flows into. By doing this, we will depart from the traditional view of set theory which is that functions values in sets and instead make them take values in partitions.

In fact, we can do this if we take the notion of a partition image $f(P)$. This produces the set of information we know from $f$ when it is given the information in $P$. So the partition image directly codifies how a function moves information from places to places, and it lets you compute how a function moves bits from one location back to another. The dual concept is the partition inverse image $f^{-1}(Q)$, which is the collection of all bits of information that map into $Q$.

Partition images are the basis of our new way of programming with dataflow relations. A whole new parallel universe of functions that take operate on partitions is created alongside those that take values in sets. This is a new paradigm of computing in information systems which we call functional dataflow.

We provide a system of support for smart functions: special data types of functions that know things about their own data flow characteristics. These are combined with special partition data types, so that smart functions can have computable partition images. They can take a place in a system and return the place it maps to. This makes information flows accessible to the user.

As a basic example, consider a transposition permutation on pairs. Then this function swaps the order in an ordered pair $f(x,y) = (y,x)$. We might make this into a smart function that records its own flow information so that it records that the information in the first index is moved to the second index, and the information in the second is moved to the first. This flow information might be made computable using partition images, so that if you $f$ what it maps the first index to it will return the second index.

If bits of information are the simplest units of storage, then this suggests that bit-flow relations $(P,Q)$ between bits of information $P$ and $Q$ are the simplest forms of dataflow. We already saw an example of a bit flow relation with the (parity, parity) relations in arithmetic functions. These bit-flow relations are the atomic building blocks for all other dataflow relations.

Information flows can now be seen to form an ordering by scale. There can be large events with millions of bits moving around to ones with only one or two. By the same token, an event in a physical system might have anything from the motion of an astronomical body on the higher end to the motions of subatomic particles on the smallest scales. In both cases, we have an intuitive notion of scale. We now have a way of capturing information flows from the smallest scales to the largest.

For technical experts:
For aficionados of functors, categories, adjunctions, toposes, lattices, lenses, theorems and proofs: the topos structure theory of computing

Sunday, January 22, 2023

Toposic structure theory of computations FAQ

As I present this new thesis on the role of topos theory in computing to the world, I have anticipated a number of possible questions and answers ahead of time. I will do my best to answer any questions you may have.

What prior work is closest to yours?
With the help of a commentor on hacker news, I have determined that the Algebraic Structure Theory of Sequential Machines (1966) comes closest my work. Most impressively, they came close to developing the idea of a dataflow relation without using Sierpinski topos theory. Instead of an ordinary algebraic structure theory, I provide an updated theory based instead upon topoi.

What other prior works do you build upon the most?
The first and foremost is probably Robert Goldblatt's Topoi: The Categorial Analysis of Logic. It is with this text that I first discovered the Sierpinski topos which opened up the way for my own further studies in the subject. A second reference text is Sketches of an Elephant – A Topos Theory Compendium. Aside from these a variety of lattice theory textbooks and papers provided extra tidbits of information.

What is the practical basis of applied topos theory?
I believe the answer to this is the principle of locality. The physical universe of which we are all a part is organized around this principle which states that an object is influenced directly only by its immediate surroundings. This creates an organizing principle of relevance to all fields of engineering. In classical physics this motivates the modeling of physical systems based upon their local behaviour. In computing, this motivates our theories of local computation and dataflow.

What use is a topos theory of computing?
The topos theory of computing can be used to model information loss and information flow. The first is relevant because the second law of thermodynamics states that the entropy of a logical system cannot decrease - which implies that information loss leads to heat dissipitation. The second is relevant because the principal of locality necessitates that computer systems should be modeled in terms of information flows and local effects.

What is this topos theory of computing?
I would like to explain it by analogy to physics. In physics we study the movement of particles from place to place. In this theory of computing, we instead study the movement of packets of bits from place to place. In order to describe these information flows I have presented a new formalism based upon Sierpinski topos theory.

Toposic structure theory of computations

Abstract:
The theory of data-flow analysis of computer programs has been extensively studied. The increasing need for dataflow analysis in the automatic parallelization of computer programs motivates the development of mathematical foundations for this field. We present a new approach to program dataflow analysis that incorporates partition lattices and the Sierpinski topos.

Link:
The topos theory of computing: introduction to the mathematics of dataflow