Sunday, February 19, 2023

Functorial characterizations of internal magmoids

An internal quiver in a category $C$ is just a diagram $F: T_2^* \to C$. These diagrams consist of a pair of parallel arrows in the category. Then if we are given such an internal quiver, we can always form an equalizer of its to arrows to get a subobject of the arrow object $A$. The problem is that this subobject $S$ is not going to contain all the elements we want for a composition domain. For example, for a category $C$ this consists of all endomorphisms instead of all composable pairs of arrows. Instead we have to work with pullbacks, and get the pullback of the first and second arrow. This is why internal categories are always defined over categories that have all pullbacks. Then this pullback has the property that it is the equalizer of the first and second arrows as they filter through the product object $A^2$. This makes $R$ into an internal relation in the category $C$, embedded in the object $A$. If we take a look at the above diagram, its not hard to see that the first and second projection arrows can be made to filter through $A \times_O A$ to make them into a ternary quiver consisting of three parallel arrows going to $A$. This leads to a diagram that looks like this. Then diagrams of this type form a category $C$. The axioms of this category $C$ can be used to ensure the laws specifying the source and target of composite morphisms. In particular, we can ensure that the source of the composite is the source of the second morphism and the target of the composite is the target of the first morphism. All that can be encoded in the index category. We then have the idea of a compositional quiver, which is a further over this index category $CQ$.

The importance of this functorial characterization is that we can form a functor category $C^{CQ}$. The problem, however, is that internal compositional quivers don't necessarily have to have nice input domains $A \times_O A$. This is why to have an internal magmoid, we must have a compositional quiver diagram with the property that the composition domain is the pullback $A \times_O A$.

In other words, this means that in the compositional quiver diagram $A \times_O A$ and the morphisms $1$ and $2$ form a universal limiting cone for the pullback. This leads me to an interesting idea: functors with the property that certain subcones within the functor, described as subdiagrams, are universal. This can be used for example to characterize whenever an object in a diagram is a product or a coproduct. Something like this would make for an interesting data structure for those interested in implementing category theory on the machine.

To form an internal category is a little bit more involved. We can always add identities to an internal magmoid diagram, by adding another identity morphism going from objects to arrows. To make that unital magmoid into a category also requires that we satisfy associativity and the unit laws. Rest assured, however, that internal categories can be formed in $Sets^{\to}$ because it has all pullbacks, and if there is anything interesting to be said there I will be saying it eventually. For now these are just some of my thoughts on internal magmoids and related notions.

A key takeaway is that everything should be represented by a functor. Natural transformations for example are simply functors from a category in to an arrow category, so they also have nicely defined functorial semantics. If we take categories to be something like functors or diagrams internal to some category, that is a step forwards too. While internal categories could be just some collection of objects and arrows satisfying some axioms, if we can make them in to special types of functors that is most convenient. Functors are the ideal structures we should work towards dealing with.

References:
internal categories

Saturday, February 18, 2023

Future directions in declarative programming

This blog has been around for a while now, and as time has gone on some of my opinions and views have changed. One thing that has remained the same is that I have always had a mind focused on logic. Everything to do with logic interests me. I don't know why this is, some of us just have a mind focused on logic I suppose.

As a logician, I have tried to seek out a wide variety of tools of logical reasoning: logic programming, horn clauses, set theory, predicate logic, ontologies, lattices and formal concept analysis, categorical logic and topoi, etc. As I have done all this research and thinking about logic I've felt there is a mismatch between functional programming and the techniques I use. My primary problem then has been how do we solve this mismatch between functional programming and logic?

Problem: functional programming is not logical enough

Solution 1: integrate functional and logic programming in to a single language

Solution 2: embrace functional programming but make it more logical by developing a new logic of functions

When I first encountered topos theory, I thought that solution 1 would be the way to go. We would simply take existing logical and functional programming techniques and use topoi as a unifying bridge between them. The result would be a unified multi-paradigm declarative language with topos theory as its core.

The idea of unifying functional and logical programming is rather obvious and it fits with the common practice of multi-paradigm programming. The problem with the approach of throwing everything and the kitchen sink into a language is its indecisive. The multi-paradigm approach is too fence sitting and it fails to committ to one thing or another.

It occurs to me in developing the toposic structure theory of computations (2023) that we can just develop new logical techniques for handling functions to get our 'logic fix' that way, all without leaving functional programming for another paradigm. Practical work in modern logic uses categoric logic and topos theory. Therefore, this new logic of functional programming will have to be developed using categorical logic and the topos of functions.

What should the future trend in declarative programming be? It should be towards more and better functional programming. We can make things more functional by developing them in the functional languages: Clojure, F#, Haskell, etc and we can make these existing functional languages better. Instead of the multi-paradigm concept, functional programming should come first.

Previously:
The future of declarative programming

References:
[1] Hartmanis, J., & Stearns, R. E. (1966). Algebraic structure theory of sequential machines.

[2] Bernier, J. (2023). Toposic structure theory of computations.

Tuesday, February 14, 2023

The logic of functional programming

A trend toward functional programming has emerged in the last fifteen years. Functional programming languages have seen greater adoption, and several functional features have finally started to make their way into mainstream languages. Lambda expressions have been added to C++ (with C++11), Java (with Java 8), and C# (with version 2).

I rode on the wave of this rise in functional programming. When functional programming started becoming mainstream, I decided to join in. As I went through this personal journey, there have been various problems I have sought to solve along the way and ideas that I have sought to make sense of.

Functional modification

I made my own journey from imperative languages, primarily JavaScript, to more functional ones. One of the first things we are all faced with when making the transition to functional programming is making sense of immutability and functional purity. How is it possible to program if we cannot change anything?

As it turns out, changing things in functional languages is possible. You are just encouraged to do so with pure functions. A case in point is the assoc function in Clojure. This pure function takes in a composite structure and returns a changed version without affecting the original structure.
(def coll [1 2 3 4])
(def new-coll (assoc coll 0 10))
(= coll new-coll) ;=> false
The assoc function is generalized to handle various composite data structures, all within a functional framework. So we can use assoc to modify the parts of a map:
(def point {:x 1, :y 2})
(assoc point :x 10) ;=> {:x 10, :y 2}
In addition, the assoc-in method can be used to make modifications deep within the nested parts of a composite data structure. For example, it can be used to modify the values within a matrix.
(def matrix 
  [[1 2 3]
   [4 5 6]
   [7 8 9]])
(assoc-in matrix [1 1] 50) ;=> [[1 2 3] [4 50 6] [7 8 9]]
So it would seem that there is a way of modifying things in functional languages. It is just that the modifications take on a different form. This leads us to functional references, which help us to make sense of functional modifications.

Functional references:
Traditional languages like C++, Java, C#, and JavaScript provide generalized imperative references in the form of lvalues (locator values). An lvalue is anything that occurs on the left-hand side of an equal sign. The right-hand side contains what the lvalue is to be turned into. When you move over to the functional world, you get generalized functional references, also known as lenses.
; the basic idea is that a lens can be defined by combining a pure 
; getter function and a pure setter function.  
(deftype Lens [getter setter]
  clojure.lang.IFn 
  (invoke [this arg] (get arg))
  (applyTo [this args] (clojure.lang.AFn/applyToHelper this args)))
Lenses implement clojure.lang.IFn, so they can be treated a lot like functions. However, we need to define special methods that wrap their setter functions to make further use of them.
(defn modify 
  "Set the part of object at lens to val."
  [lens obj val]
  
  (let [setter (.-setter lens)]
    (setter obj val)))

(defn zap
  "Apply the function func to the part of the obj at lens."
  [lens obj func]
  
  (let [setter (.-setter lens)]
    (setter obj (func (lens obj)))))
The clojure get and get-in functions are the counterparts of assoc and assoc-in. We can combine get and assoc methods to form lenses.
; an example lens which describes the first part of a tuple
(def first-lens
  (Lens.
   ; get the first part of a tuple
   (fn [arg]
     (get arg 0)) 
    
    ; set the first part of a tuple
    (fn [arg v]
      (assoc arg 0 v))))

; an example lens which describes the second part of a tuple
(def second-lens 
  (Lens.
   ; get the second part of a tuple
   (fn [arg]
     (get arg 1)) 
    
    ; set the second part of a tuple
    (fn [arg v]
      (assoc arg 1 v))))
A fundamental property of references in imperative and functional languages is that they can be arbitrarily nested. In the functional world, this nesting is defined compositionally.
(defn ^Lens compose-lenses 
  "combine two lenses to get their composition"
  [^Lens f, ^Lens g]
  
  (Lens.
   (comp (.-getter f) (.-getter g))
    (let [set-f (.-setter f)
          set-g (.-setter g)]
      (fn [a v]
        (set-g a (set-f (g a) v))))))
By using lens composition, we can recapture the nesting provided by Clojure's get-in and assoc-in methods with a much more extensible framework. With lenses, we have now translated references from an imperative context to a functional one.

Bijections are lenses:
Another key fact is that bijections are lenses. In particular, we can form a lens from a function and its inverse. This basic fact is implemented in the code below.
(defn bijection-lens 
  "get a lens from a function and its inverse."
  [func inv-func]
  
  (Lens.
   func 
    (fn [arg v]
      (inv-func v))))
A most obvious example is the bijection between booleans like true and false and bits like zero and one. It is customary to treat false as zero and one as true. So we use this to define a bijection.
(def boolean-to-bit 
  (bijection-lens
    {false 0, true 1} 
    {0 false, 1 true}))
We can now define an object of the boolean type and then modify it in terms of its bit representation.
(def condition true)
(modify boolean-to-bit condition 0) ;=> false
Recalling that we earlier demonstrated that lenses are compositional, any lens can now be composed with a bijection to get a different lens. A lens now has two different components: the information it refers to and its representation of that information. There is a single notion of the first part of a pair of numbers, but there are potentially infinite representations of that same first part.

Algebraic theory of information:
The fact that lenses can be composed with bijections suggests a dichotomy between information and its accessors.
  • Information: a location within a larger structure
  • Accessor: the way that a given piece of information is viewed
The first part of a tuple $X^2$ is a unique piece of information, but it may have many different representations. Each composition with a bijection produces a different representation. With so many bijections available producing different representations, it is important that we consider lenses and information as different.

There is a further important property. We often find that some reference has more information than another, or that one memory location has more data then another. So, for example, consider the class of 2x2 matrices. Then these matrices have a first row and a first column, and both of them together contain the $(0,0)$ entry. So the $(0,0)$ is an information substructure of the first row and the first column.

We can think of a lens as a way of drilling down into part of a structure, and some lenses drill down deeper into a structure. The depth that a lens goes down to is part of the theory of information orders, which is a kind of logic. This motivates our further study of the logic of functional programming.

The varied behaviour of functions:
Even with this groundwork laid by functional references we don't really know how general functions behave. With all this work with lenses, all we can do is translate concepts from imperative programming (like modification and references) to functional programming. But we still don't have a way of describing how functions behave.

To explain why we haven't gotten that far yet, we should look at some examples. If we have a quadruple in the set $X^4$ then we can say that it has four different lenses associated to each of its indices. Each lens refers to a different part of the quadruple. We can define a permutation $f(a,b,c,d) = (b,a,d,c)$ which swaps the first and second pairs of values.
(defn permute-quadruple 
  "Swap the first value with the second and the third with  
  the fourth"
  [[a b c d]]
  
  [b a d c])
  
(permute-quadruple [10, 20, 30, 40]) ;=> [20, 10, 40, 30]
This is not simply the modification of some part of a structure. This permutation moves data from one lens to another. There seem to be motions and subcomputations occuring inside of our pure functions! What are these? How do you explain them? This will need to be the subject of further investigation.
(defn transform-quadruple
  "A transformation of a quadruple which is not a permutation."
  [[a b c d]]
  
  [c a a c])
  
(permute-quadruple [10, 20, 30, 40]) ;=> [30, 10, 10, 30]
Aside from moving values around from place to place, a function can modify them. For example, you have things like map which lets you take a function and apply it to all parts of a structure.
(defn increment-triple
  "Increment all parts of a triple"
  [[a b c]]
  
  (vec (map inc [a b c])))
  
(increment-triple [1 2 3]) ;=> [2 3 4]
One further example uses combiners like + and * to take multiple different places and combine them into a single place in the result.
(defn sum-product-pair
  "Get the sum of the first two elements and the product of the 
   next two."
  [[a b c d]]
  
  [(+ a b) (* c d)])
  
(sum-product-pair [10 20 30 40]) ;=> [30 1200]
These are some of the many examples of things we can do with functions. A couple more examples come from linear algebra:
  • When you transpose a matrix each entry in the output matrix is determined by an entry in the input matrix.
  • If you take two matrices and multiply them then each entry of the output matrix is determined by a row in the first matrix and a column in the second.
  • When you add two matrices each entry in the output matrix is determined by the an entry in the first matrix and an entry in the second matrix.
There are examples abound throughout math. This concept of information dependence appears everywhere. Once we have a concept of information and of functional references, it is desirable that we should have a logical theory as to how functions behave around these references. To describe how they move information around. It follows that there is still more that needs to be done.

Describing the transport of data:
The varied behaviour of functions that we are seeking to explain all have a common description. They all transport data from one place to another using some intermediate function. Permutations move data from one place to another without changing them along the way. They are purely a rearrangement of data. The map function moves data from a location back to itself, but it changes it along the way.

Commutative square diagrams can be used to explain all these varied behaviours of functions that we have been seeing. In order to see this it might be helpful to explain commutative square diagrams using exceptionally descriptive variable names. While we could describe each of its four components by single letter variables: f,g,h, and i perhaps the approach I am about to propose will be more suggestive.

We can say that a commutative square diagram for a given function, has in addition to that function three additional components: an input function, an output function, and a transport function. Intuitively, this means that the commutative square diagram describes the movement of data from the input place to the output place occurs along the given transport function. This is how we will describe the movement of data from one place to another. With this fundamental formalism we now have the means we need to describe the behavior of functions, such as how data is moved from one place to another. If we look back at the examples we were exploring before, like permutations, transformations, matrix transpositions, the incrementation of triples, arithmetic functions, etc they all have an explanation for their behavior using this fundamental formalism.

Take the function permute-quadruple. It moves the first index to the second index, and vice versa, and the third index to the fourth index and vice versa. In each case data is moved from one place to another along the identity function. In the case of increment-triple each index is moved back to itself using the increment-triple function. In these examples, the motion of data from one place to another by a function is described by a commutative square diagram.

When we introduced lenses and functional references we made a very big step forward, because we had a general formalism for describing functional mutations, yet the story could not end there. Introducing these functional references left us open to the question of how we could understand how various different kinds of functions move the data these lenses point to around. With commutative square diagrams, we have the means we need to explain all the behaviour we have been seeing.

Information flows:
What we have so far described is how data can be moved from an input place to an output place along some transport function. We can also ask what can be said of data migrations without a transport function? What happens if we just want to talk about input and output locations by themselves?

Then the formalism we get is a pair $(I,O)$ of an input location and an output location. These location pairs, or partition pairs, describe how ignorance spreads or information flows under the operation of a function. The idea of exploring such pairs with a view towards studying how information flows was first put forward by Hartmanis and Stearns:
Therefore, the S.P partitions on a machine describe the cases in which the "ignorance" about the exact states of the machine does not increase as the machine operates. The concept of partition pairs is more general and is introduced to study how "ignorance spreads" or "informaton flows" through a sequential machine when it operates.
J. Hartmanis, R. E. Stearns
By studying these partition pairs in the context of commutative square diagrams, we can make them part of a larger framework that includes a theory of the transport of information. At the same time, these information flows help us to describe how these commutative square diagrams function.

In a recent paper of mine, I study these information flows in terms of the commutative diagrams they produce. It is noted, for example, that for any function $f: A \to B$ the partition pairs of $f$ form a lattice, and the properties of these lattices are described. We use these results to form a new kind of structure theory.

The topos of functions:
If we take two different commutative square diagrams, where the transport of one is the function of another, then these two diagrams can be composed. The composition of the two square diagrams occurs componentwise along their input and output functions. The result is a new square diagram. If we take this to be our composition law, the result is a category: the category $Sets^{\to}$. If we look a little deeper we find there is something very special about this category: it is a topos.

The fact that it is a topos gives us a lot. This means that $Sets^{\to}$ has all small limits and colimits, products, coproducts, epi-mono factorisations, subobject classifiers, an object of truth values, logical connectives, an internal logic, exponentation, power objects, subobject lattices, etc. Everything you can do in a topos you can do in $Sets^{\to}$. This makes $Sets^{\to}$ a broadly interesting subject of study in its own right, with some interesting properties of its own, like the failure of the law of excluded middle to hold.

The key thing for us is how $Sets^{\to}$ can help us all better understand functional programming. Its central concept, the commutative square diagram, describes how data moves around in functions. The commutative square can be described by a function, an input place, an output place, and a transport function that describes how inputs are mapped to outputs. As is true for any form of categorical logic, the dual logical theories of $Sets^{\to}$ are emergent properties of these commutative square diagrams.

Functional lenses let you focus in on some particular piece of a complex data structure. Commutative square diagrams, on the other hand, let you focus on some particular piece of a function. The result is the structure theory of functions. With this structure theory, we can now reason logically about the behaviour of functional programs.

Feedback:
How can we further improve the structure theory of functional programs? Write to me at jhuni.v2@gmail.com or leave a comment in the comment section below.

References:
[1] Hartmanis, J., & Stearns, R. E. (1966). Algebraic structure theory of sequential machines.

[2] Bernier, J. (2023). Toposic structure theory of computations.

[3] Goldblatt, R. (2014). Topoi the categorial analysis of logic. Elsevier Science.

[4] Johnstone, P. T. (2008). Sketches of an Elephant: A topos theory compendium. Oxford University Press.

[5] Lane, M. S., & Moerdijk, I. (2010). Sheaves in geometry and logic: A first introduction to topos theory. Springer.

[6] Barr, M., & Wells, C. (1985). Toposes, triples and theories. Springer-Verlag.

[7] MacLarty, C. (2005). Elementary categories, elementary toposes. Clarendon Press.

[8] Bell, J. L. (2008). Toposes and local set theories: An introduction. Dover Publications.

[9] Moerdijk, I. (2008). Classifying spaces and classifying topoi. Springer.

[10] Caramello, O. (2018). Theories, sites, toposes: Relating and studying mathematical theories through topos-theoretic 'Bridges'. Oxford University Press.

Tuesday, February 7, 2023

The number of available processors

In order to parallelize a function that has a well-defined product decomposition, such as the map function or one of its variants it is good that we should determine how many processors are available in a given system. This can be achieved on the JVM using the availableProcessors method of the Runtime class.

Java:
public class ProcessorCounter {
    public static void main(String[] args) {
        int processorsCount = Runtime.getRuntime().availableProcessors();
        System.out.println("The number of available processors is:" + Integer.toString(processorsCount));
    }
} 

Clojure:
(let [runtime (Runtime/getRuntime)
      processors-count (.availableProcessors runtime)]
  (println "The number of available processors is: " (str processors-count)))
Limits to parallelism:
There are two limits to parallelism:
  • The number of available processors, which is determined by the available processors method.
  • The overhead of threading, which states that sometimes the cost of launching parts of a computation into separate threads can outweigh the advantages of parallelism.
It doesn't make much sense to parallelize a computation if you don't have the available processors to take advantage of that. On the other hand, the overhead of threading is one reason why it isn't always good to use pmap in Clojure.

Saturday, February 4, 2023

On Clojure's map and pmap functions

Let $f: A \to B$ be a function then mapping $f$ over $n$-tuples is the same as applying the product function $f^n$. The map function is one of the simplest types of product functions: the product of a function with itself. Consequently, map is embarrassingly parallel. Its parallelism is one of the main ingredients of modern parallel programming alongside the referential transparency that comes with functional programming languages.

If a programming language has referential transparency, then its components can be computed in any order and then combined afterwards to produce a common result. This is why functional programming languages are so good at modern parallel computing, and its a large part of why they have be seen to be getting greater traction in recent times. The parallelism of reduce is essentially in this category, because the combination of associativity and referential transparency allows you to freely determine your computation order.

The map function is distinguished by the fact that it is embarassingly parallel. This means that map can be decomposed into different independent computation tasks requiring no communication between one another until they are joined at the end. The fact that map is embarassingly parallel has already been put to good use in a number of parallel computing applications, such as MapReduce libraries like Apache Hadoop.

However, while the use of parallel mapping constructs is satisfactory that doesn't mean we should stop there. It is also desirable that we should have a general mathematical structure theorem for describing when functions can be given embarassingly parallel implementations. This will capture the categorical universal properties that make embarassing parallelism possible, without regard for representation details like the use of sequences. A truly abstract mathematical description should not make reference to representations.

If we have an embarrasingly parallel computation such as the application of the map function, then we can break it down into parts using threads or multiple computers. Then when the computations are done they can be combined to produce a common result. This is what the Clojure pmap does, as it uses the Java threading API to split up a computation into parts that can be run on as many processors as possible.
  • The pmap function first gets the number of processors on the machine by using the static getRuntime method on java.lang.Runtime and then calling the virtual availableProcessors method on that to get the number of processors available on the JVM. The number of processors is used to determine how many threads to break the computation in to.
  • Once the number of processors is determined the pmap function breaks up the computation of the map into separate threads using Clojure futures. These take a computation and run it on another thread without blocking, and then they cache the result so that it can be accessed later by dereferences.
  • Finally, the computation of the parallel map is computed semi-lazily by using the lazy-sequence function. It computes an appropriate chunk of the sequence determined by the number of available processors but it doesn't compute everything unless required.
The implementation of an embarassingly parallel function like map will probably have to take a similar one to that played by Clojure's pmap: the computation will be broken up into threads handled by different processors. However, what makes this possible is a mathematical property shared not only by map but potentially by a number of other different types of functions. By understanding these structural properties, we can better understand the parallelisation of functional applications.

References:
Map (parallel pattern)