\documentclass[7Sketches]{subfiles}
\begin{document}
\setcounter{chapter}{2}%Just finished 2.
%------------ Chapter ------------%
\chapter[Databases: Categories, functors, and (co)limits]{Databases:\\Categories, functors, and universal constructions}\label{chap.databases}
%\settocdepth{subsubsection}
%\tableofcontents*
%-------- Section --------%
\section{What is a database?}\label{sec.C2_motivation}
\index{database}
Integrating data from disparate sources is a major problem in industry today. A
study in 2008 \cite{bernstein.Hass:2008a} showed that data integration accounts for 40\% of IT (information technology) budgets, and that the market for
data integration software was \$2.5 billion in 2007 and increasing at a rate of
more than 8\% per year. In other words, it is a major problem; but what is it?
%---- Subsection ----%
%\subsection{What is a database?}
\paragraph{A database is a system of interlocking tables}\index{database!as
interlocking tables}
Data becomes information when it is stored \emph{in} a given \emph{formation}. That is, the numbers and letters don't mean anything until they are organized, often into a system of interlocking tables. An organized system of interlocking tables is called a database. Here is a favorite example:
\begin{equation}\label{eqn.fav_ex_db}
\begin{tabular}{ c | c c c}
\textbf{Employee}&\textbf{FName}&\textbf{WorksIn}&\textbf{Mngr}\\\hline
1&Alan&101&2\\
2&Ruth&101&2\\
3&Kris&102&3
\end{tabular}
\hspace{.6in}
\begin{tabular}{ c | c c}
\textbf{Department}&\textbf{DName}&\textbf{Secr}\\\hline
101&Sales&1\\
102&IT&3\\~
\end{tabular}
\end{equation}
These two tables interlock by use of a special left-hand column, demarcated by a vertical line; it is called the ID column. The ID column of the first table is called ``Employee'', and the ID column of the second table is called ``Department''. The entries in the ID column---e.g.\ 1,2,3 or 101, 102---are like row labels; they indicate a whole row of the table they're in. Thus each row label must be unique (no two rows can have the same label), so that it can unambiguously specify its row.
Each table's ID column, and the unique identifiers found therein, is what allows for the interlocking mentioned above. Indeed, other entries in various tables can reference rows in a given table by use of its ID column. For example, each entry in the WorksIn column references a department for each employee; each entry in the Mngr (manager) column references an employee for each employee, and each entry in the Secr (secretary) column references an employee for each department. Managing all this cross-referencing is the purpose of databases.
Looking back at \cref{eqn.fav_ex_db}, one might notice that every non-ID column, found in either table, is a reference to a label of some sort. Some of these, namely WorksIn, Mngr, and Secr, are \emph{internal references}; they refer to rows in some ID column. Others, namely FName and DName, are \emph{external references}; they refer to strings or integers, which are also labels whose meaning is known more broadly. Internal reference labels can be changed as long as the change is consistent---1 could be replaced by 1001 everywhere without changing the meaning---whereas external reference labels certainly cannot! Changing Ruth to Bruce everywhere would change how people understood the data.
The reference structure for a given database---i.e.\ how tables interlock---tells us something about what information was intended to be stored in it. One may visualize the reference structure for \cref{eqn.fav_ex_db} graphically as follows:
\begin{equation}\label{eqn.free_schema}
\text{easySchema}\coloneqq
\boxCD{
\begin{tikzcd}[row sep=large, ampersand replacement=\&]
\LTO{Employee}\ar[rr, shift left, "\text{WorksIn}"]\ar[dr, bend right, "\text{FirstName}"']\ar[loop left, "\text{Manager}"]\&\&
\LTO{Department}\ar[ll, shift left, "\text{Secretary}"]\ar[dl, bend left, "\text{DepartmentName}"]\\
\&\LTO{String}
\end{tikzcd}
}
\end{equation}
\vskip 1em
\begin{equation}
\text{easySchema}\coloneqq
\boxCD{
\begin{tikzcd}[row sep=large, ampersand replacement=\&]
\LTO{Employee}\ar[dr, bend right, "\text{FirstName}"']\ar[loop left, "\text{Manager}"]\&\&
\\
\&\LTO{String}
\end{tikzcd}
}
\end{equation}
\begin{tikzcd}[row sep=large, ampersand replacement=\&]
\LTO{Germans}\ar[rr, shift left, "\text{FriendOf}"]\&\&
\LTO{Italians}\ar[rr, shift left, "\text{ChildOf}"]\&\&
\LTO{Austrians}
\end{tikzcd}
\vskip 1em
\begin{tikzcd}[row sep=large, ampersand replacement=\&]
\LTO{Germans}\ar[rr, shift left, "\text{FriendOf}"]\&\&
\LTO{Italians}\ar[ll, shift left, "\text{FriendOf}{}^\prime "]
\end{tikzcd}
\vskip 2em
\begin{tikzcd}[row sep=large, ampersand replacement=\&]
\LTO{Germans}\ar[rr, shift left, "\text{Friend}"]\&\&
\LTO{Italians}
\end{tikzcd}
\vskip 2em
\begin{tikzcd}[row sep=large, ampersand replacement=\&]
\LTO{Germans}
\end{tikzcd}
\vskip 2em
\begin{tikzcd}[row sep=large, ampersand replacement=\&]
\LTO{People}\ar[loop left, "\text{FriendOf}"]
\end{tikzcd}
\vskip 2em
This is a kind of ``Hasse diagram for databases'', much like the Hasse diagrams for posets in \cref{rem.Hasse}. How should you read it?
\index{database!schema}
The two tables from \cref{eqn.fav_ex_db} are represented in the graph \eqref{eqn.free_schema} by the
two black nodes, which are given the same name as the ID columns: Employee and Department. There is
another node---drawn white rather than black---which represents the external reference type of strings, like ``Alan'', ``Alpha'', and ``Sales". The arrows in the diagram represent non-ID columns of the tables; they point in the direction of reference: WorksIn refers an employee to a department.
\begin{exercise}
Count the number of non-ID columns in \cref{eqn.fav_ex_db}. Count the number of arrows in \cref{eqn.free_schema}. They should be the same number; is this a coincidence?
\end{exercise}
\index{free!schema}
A Hasse-style diagram like the one in \cref{eqn.free_schema} can be called a \emph{database schema}; it represents how the information is being organized, the formation in which the data is kept. One may add rules, sometimes called ``business rules'' to the schema, in order to ensure the integrity of the data. If these rules are violated, one knows that data being entered does not conform to the way the database designers intended. For example, the designers may enforce that
\begin{itemize}
\item every department's secretary must work in that department;
\item every employee's manager must work in the same department as the employee.
\end{itemize}
Doing so changes the schema, say from ``easySchema'' \eqref{eqn.free_schema} to ``mySchema'' below.
\begin{equation}\label{eqn.mySchema}
\text{mySchema}\coloneqq
\boxCD{
\begin{tikzcd}[row sep=large, ampersand replacement = \&]
\LTO{Employee}\ar[rr, shift left, "\text{WorksIn}"]\ar[dr, bend right, "\text{FirstName}"']\ar[loop left, "\text{Manager}"]\&\&
\LTO{Department}\ar[ll, shift left, "\text{Secretary}"]\ar[dl, bend left, "\text{DepartmentName}"]\\
\&\LTO{String}
\end{tikzcd}
\\~\\\footnotesize
Department.Secr.WorksIn = Department\\
Employee.Mngr.WorksIn = Employee.WorksIn
}
\end{equation}
The difference is that $\mathrm{easySchema}$ plus constraints equals $\mathrm{mySchema}$.
We will soon see that database schemas are categories $\cat{C}$, that the data itself is given by a ``set-valued'' functor $\cat{C}\to\smset$, and that databases can be mapped to each other via functors $\cat{C}\to\cat{D}$. In other words, there is a relatively large overlap between database theory and category theory. This has been worked out in a number of papers; see \cref{sec.ch2_further_reading}. It has also been implemented in working software, called FQL, which stands for \emph{functorial query language}. Here is example FQL code for the schema shown above:
\footnotesize
\begin{verbatim}
schema mySchema = {
nodes
Employee, Department;
attributes
DName : Department -> string,
FName : Employee -> string;
arrows
Mngr : Employee -> Employee,
WorksIn : Employee -> Department,
Secr : Department -> Employee;
equations
Department.Secr.worksIn = Department,
Employee.Mngr.WorksIn = Employee.WorksIn;
}
\end{verbatim}
\normalsize
\index{functorial query language, FQL}
%---- Subsection ----%
%\subsection{Communication between databases}
\paragraph{Communication between databases}
We have said that databases are designed to store information about something. But different people or organizations can view the same sort of thing in different ways. For example, one bank stores its financial records according to European standards and another does so according to Japanese standards. If these two banks merge into one, they will need to be able to share data despite differences in the shape of their database schemas.
Such problems are huge and intricate in general, because databases often comprise hundreds or thousands of interlocking tables. Moreover, these problems occur more frequently than just when companies want to merge. It is quite common that a given company moves data between databases on a daily basis. The reason is that different ways of organizing information are convenient for different purposes. Just like we pack our clothes in a suitcase when traveling but use a closet at home, there is generally not one best way to organize anything.
Category theory provides a mathematical approach for translating between these
different organizational forms. That is, it formalizes a sort of automated
reorganizing process called \emph{data migration}, which takes data that fits
snugly in one schema and moves it into another.\index{database!data migration}
Here is a simple case. Imagine an airline company has two different databases, perhaps created at different times, that hold roughly the same data.
\begin{equation}\label{eqn.airline_schemas}
\begin{tikzpicture}[commutative diagrams/every diagram, inner sep=10pt, baseline=(A)]
\matrix[matrix of math nodes, name=A, row sep=25pt, column sep=15pt, commutative diagrams/every cell] {
&
|(AD)|\LTO[\circ]{\$}
\\
|(AE)|\LTO{Economy}
&&
|(AF)|\LTO{First Class}
\\
&
|(AS)|\LTO[\circ]{string}
\\
};
%
\path[commutative diagrams/.cd, every arrow, every label, font=\scriptsize]
(AE) edge["Price"] (AD)
edge["Position"'] (AS)
(AF) edge["Price"'] (AD)
edge["Position"] (AS);
\node[draw, fit=(AE) (AD) (AF) (AS)] (A box) {};
\node[left=0 of A box] {$A\coloneqq$};
%%
\matrix[matrix of math nodes, name=B, row sep=25pt, commutative diagrams/every cell, matrix anchor=south, right=3 of A.south east] {
|(BD)|\LTO[\circ]{\$}
\\
|(BAS)|\LTO{Airline Seat}
\\
|(BS)|\LTO[\circ]{string}
\\
};
%
\path[commutative diagrams/.cd, every arrow, every label, font=\scriptsize]
(BAS) edge["Price"'] (BD)
edge["Position"] (BS);
\node[draw, fit=(BAS) (BD) (BS)] (B box) {};
\node[right=0 of B box] {$=:B$};
%
% \draw[functor] (A box) to node[above] {$F$} (B box);
\end{tikzpicture}
\end{equation}
Schema $A$ has more detail than schema $B$---an airline seat may be in first class or economy---but they are roughly the same. We will see that they can be connected by a functor, and that data conforming to $A$ can be migrated through this functor to schema $B$ and vice versa.
The statistics at the beginning of this section show that this sort of problem---when occurring at enterprise scale---has proved difficult and expensive. If one attempts to move data from a source schema to a target schema, the migrated data could fail to fit into the target schema or fail to satisfy some of its constraints. This happens surprisingly often in the world of business: a night may be spent moving data, and the next morning it is found to have arrived broken and unsuitable for further use. In fact, it is believed that over half of database migration projects fail.
In this chapter, we will discuss a category-theoretic method for migrating data. Using categories and functors, one can prove up front that a given data migration will not fail, i.e.\ that the result is guaranteed to fit into the target schema and satisfy all its constraints.
The material in this chapter gets to the heart of category theory: in particular, we discuss categories, functors, natural transformations, adjunctions, limits, and colimits. In fact, many of these ideas have been present in the discussion above:
\begin{itemize}
\item The schema pictures, e.g.\ \cref{eqn.mySchema} depict categories $\cat{C}$.
\item The instances, e.g.\ \cref{eqn.fav_ex_db} are functors from $\cat{C}$ to a certain category called $\smset$.
\item The implicit mapping in \cref{eqn.airline_schemas} that takes economy and first class seats in $A$ to airline seats in $B$ constitutes a functor $A\to B$.
\item The notion of data migration for moving data between schemas is formalized by adjoint functors.
\end{itemize}
We begin in \cref{sec.categories} with the definition of categories and a bunch of different sorts of examples. In \cref{sec.cat_fun_nt_db} we bring back databases, in particular their instances and the maps between them, by discussing functors and natural transformations. In \cref{sec.adjunctions_mig} we discuss data migration by way of adjunctions, which generalize the Galois connections we introduced in \cref{sec.galois_connections}. Finally in \cref{sec.bonus_lims_colims} we give a bonus section on limits and colimits.%
\footnote{By ``bonus'', we mean that this is important material---limits and colimits will show up throughout the book and throughout ones interaction with category theory---so we think the reader will especially benefit from this material in the long run.}
%-------- Section --------%
\section{Categories}\label{sec.categories}
A category $\cat{C}$ consists of four pieces of data---objects, morphisms,
identities, and a composition rule---satisfying two properties.
\begin{definition}\label{def.category}\index{category}
To specify a \emph{category} $\cat{C}$:
\begin{enumerate}[label=(\roman*)]
\item one specifies a collection $\Ob(\cat{C})$, elements of which are called
\emph{objects}.
\item for every two objects $c,d$, one specifies a set $\cat{C}(c,d)$,%
\footnote{This set $\cat{C}(c,d)$ is often denoted
$\Hom_{\cat{C}}(c,d)$, and called the ``hom-set from $c$ to $d$.'' The
word ``hom'' stands for homomorphism, of which the word ``morphism'' is
a shortened version.} elements of which are called \emph{morphisms} from
$c$ to $d$.
\item for every object $c\in \Ob(\cat{C})$, one specifies a morphism $\id_c\in
\cat{C}(c,c)$, called the \emph{identity morphism} on
$c$.\index{identity!morphism}
\item for every three objects $c,d,e\in\Ob(\cat{C})$ and morphisms $f\in
\cat{C}(c,d)$ and $g\in \cat{C}(d,e)$, one specifies a morphism $f\cp g\in \cat{C}(c,e)$, called \emph{the composite of $f$ and $g$}.
\end{enumerate}
It is convenient to denote elements $f\in\cat{C}(c,d)$ as $f\colon c\to
d$. Here, $c$ is called the \emph{domain} of $f$, and $d$ is called the
\emph{codomain} of $f$.\index{domain}\index{codomain}
These constituents are required to satisfy two conditions:
\begin{enumerate}[label=(\alph*)]
\item \emph{unitality}: for any morphism $f\colon c\to d$, composing with the identities at $c$ or $d$ does nothing: $\id_c.f=f$ and $f.\id_d=f$.
\item \emph{associativity}: for any three morphisms $f\colon c_0\to c_1$, $g\colon c_1\to c_2$, and $h\colon c_2\to c_3$, the following are the equal: $(f.g).h=f.(g.h)$. We can write it simply as $f.g.h$.
\end{enumerate}
\end{definition}\index{unitality}\index{associativity}
Our next goal is to give lots of examples of this concept. Our first source of
examples is that of free and finitely presented categories, which generalize the
notion of Hasse diagram from \cref{rem.Hasse}.
% Subsubsection %
\subsection{Free
categories}\label{subsubsec.path_cats}\index{free!category}\index{graph}
Recall from \cref{def.graph} that a graph consists of two types of thing: vertices and arrows. From there one can define
paths, which are just head-to-tail sequences of arrows. Every path $p$ has a start
vertex and an end vertex; if $p$ goes from $v$ to $w$, we write $p\colon v\to w$. To every vertex $v$, there is a trivial path, containing no arrows,
starting and ending at the given vertex; we often denote it simply by $v$. We may also concatenate paths: if $p\colon v\to w$ and $q\colon w\to x$, their concatenation is denoted $p\cp q$, and it goes $v\to w$.
In \cref{chap.posets}, we used graphs to depict posets $(V,\leq)$: the vertices form the
elements of the poset and for the order, we say that $v\leq w$ if there is a path from $v$ to $w$ in $G$.
We will now use graphs in a very similar way to depict certain categories, known
as \emph{free categories}. Then we will explain the strong relationship between
posets and categories in \cref{subsubsec.pos_free_spectrum}.
For any graph $G = (V,A,s,t)$, we can define a category $\free(G)$,
called the \emph{free category on $G$}, whose objects are the vertices $V$ and
whose morphisms from $c$ to $d$ are the paths from $c$ to $d$. The identity
morphism on an object $c$ is simply the trivial path at $c$. Composition is given by concatenation of paths.
For example, we define $\Cat{2}$ to be the free category generated by the graph shown below:
\begin{equation}\label{eqn.graphs_1_2}
\Cat{2}\coloneqq\free\left(\;\raisebox{-.05in}{\fbox{
\begin{tikzcd}[ampersand replacement=\&]
\LMO{v_1}\ar[r, "f_1"]\&\LMO{v_2}
\end{tikzcd}
}}
\;\right)
\end{equation}
\begin{equation}
\Cat{2}\coloneqq\free\left(\;\raisebox{-.05in}{\fbox{
\begin{tikzcd}[ampersand replacement=\&]
\LMO{s(e)}\ar[r, "e"]\&\LMO{t(e)}
\end{tikzcd}
}}
\;\right)
\end{equation}
\begin{equation}
\fbox{
\begin{tikzcd}[ampersand replacement=\&]
\LMO{x}\ar[loop above, "f"]\ar[r, bend left,"g"]\&\LMO{y}\ar[l, bend left, "h"]
\end{tikzcd}
}
\end{equation}
It has two objects $v_1$ and $v_2$,
and three morphisms: $\id_{v_1}\colon v_1 \to v_1$, $f_1\colon v_1 \to v_2$,
and $\id_{v_2}\colon v_2 \to v_2$. Here $\id_{v_1}$ is the path of length 0
starting and ending at $v_1$, $f_1$ is the path of length 1 consisting
of just the arrow $f_1$, and $\id_{v_2}$ is the length 0 path at $v_2$. As our
notation suggests $\id_{v_1}$ is the identity morphism for the object $v_1$, and
similarly $\id_{v_2}$ for $v_2$. As composition is given by concatenation, we
have, for example $\id_{v_1}.f_1 =f_1$, $\id_{v_2}.\id_{v_2}=\id_{v_2}$, and so
on.
From now on, we may elide the difference between a graph and the corresponding free category $\free(G)$, at least when the one we mean is clear enough from context.
\begin{exercise}
For $\free(G)$ to really be a category, we must check that this data we
specified obeys the unitality and associativity properties. Check that these
obeyed.
\end{exercise}
\begin{exercise}
The free category on the graph shown here:%
\footnote{As mentioned above, we elide the difference between the graph and the corresponding free category.}
\begin{equation}\label{eqn.graphs_rand9851}
\Cat{1} ={\mathbf{Free\bigg(}}\;\raisebox{-.05in}{\fbox{
\begin{tikzcd}[ampersand replacement=\&]
\LMO{v_1}
\end{tikzcd}
}}
\;\bigg)
\end{equation}
\begin{equation}\label{eqn.graphs_rand9851}
\Cat{2} ={\mathbf{Free\bigg(}}\;\raisebox{-.05in}{\fbox{
\begin{tikzcd}[ampersand replacement=\&]
\LMO{v_1}\ar[r, "f_1"]\&\LMO{v_2}
\end{tikzcd}
}}
\;\bigg)
\end{equation}
\begin{equation}\label{eqn.graphs_rand9851}
\Cat{3} ={\mathbf{Free\bigg(}}\;\raisebox{-.05in}{\fbox{
\begin{tikzcd}[ampersand replacement=\&]
\LMO{v_1}\ar[r, "f_1"]\&\LMO{v_2}\ar[r, "f_2"]\&\LMO{v_3}
\end{tikzcd}
}}
\;\bigg)
\end{equation}
has three objects and six morphisms: the three vertices and six paths in the graph.
Create six names, one for each of the six morphisms in $\Cat{3}$. Write down a six-by-six table, label the rows and columns by the six names you chose.
\begin{enumerate}
\item Fill out the table by writing the name of the composite in each cell.
\item Where are the identities?
\qedhere
\end{enumerate}
\end{exercise}
\begin{exercise}\label{exc.Cat_n}
Let's make some definitions, based on the pattern above:
\begin{enumerate}
\item What is the category $\Cat{1}$? That is, what are its objects and morphisms?
\item What is the category $\Cat{0}$?
\item What is the formula for the number of morphisms in $\Cat{n}$ for arbitrary $n\in\NN$?
\qedhere
\end{enumerate}
\end{exercise}
\begin{example}[Natural numbers as a free category]
\label{ex.monoid_nats}\index{natural numbers!as free category}
Consider the following graph:
\begin{equation}\label{eqn.loop_graph}
\fbox{
\begin{tikzcd}[ampersand replacement=\&]
\LMO[under]{z}\ar[loop above, "s"]
\end{tikzcd}
}
\end{equation}
\begin{equation}
\fbox{
\begin{tikzcd}[ampersand replacement=\&]
\LMO[under]{z}\ar[loop left, "s"]\ar[loop right, "t"]
\end{tikzcd}
}
\end{equation}
It has only one vertex and one arrow, but it has infinitely many paths. Indeed, it
has a unique path of length $n$ for every natural number $n\in\NN$. That is,
$\Set{Path}=\{z, s, s\cp s, s\cp s\cp s, \ldots\}$, where we write $z$ for the
length 0 path on $z$; it represents the morphism $\id_z$. There is a one-to-one correspondence between $\Set{Path}$
and the natural numbers, $\NN=\{0,1,2,3,\ldots\}$.
This is an example of a category with one object. Such categories are known as
\emph{monoids}, because they are defined by a set (of morphisms) with an identity and a single binary
operation (the composition rule). Monoidal posets, which we met in
\cref{chap.resource_theory}, are so-named because they add monoid structure to
posets.
\end{example}
\begin{exercise}
In \cref{ex.monoid_nats} we identified the paths of the loop graph \eqref{eqn.loop_graph} with numbers $n\in\NN$. Paths can be concatenated. Given numbers $m,n\in\NN$, what number corresponds to the concatenation of their associated paths?
\end{exercise}
% Subsubsection %
\subsection{Presenting categories via path equations}
\label{subsec.presenting_cats}
\index{presentation!of
category}\index{category!presented}\index{category!finitely presented}
So for any graph $G$, there is a free category on $G$. But we don't have to stop
there: we can add equations between paths in the graph, and still get a
category. We are only allowed to equate two paths $p$ and $q$ when they are \emph{parallel}, meaning they have the same source vertex and the same target vertex.
A finite graph with path equations is called a \emph{finite presentation} for a
category, and the category that results is known as a \emph{finitely presented category}.
Here are two examples:
\[
\mathrm{Free\_square}\coloneqq\boxCD{
\begin{tikzcd}[ampersand replacement=\&]
\LMO{A}\ar[r, "f"]\ar[d, "g"']\&\LMO{B}\ar[d, "h"]\\
\LMO[under]{C}\ar[r, "i"']\&\LMO[under]{D}
\end{tikzcd}
\\~\\\footnotesize
\textit{no equations}
}
\hspace{.8in}
\mathrm{Comm\_square}\coloneqq
\boxCD{
\begin{tikzcd}[ampersand replacement=\&]
\LMO{A}\ar[r, "f"]\ar[d, "g"']\&\LMO{B}\ar[d, "h"]\\
\LMO[under]{C}\ar[r, "i"']\&\LMO[under]{D}
\end{tikzcd}
\\~\\\footnotesize
$f.h=g.i$
}
\]
Both of these are presentations of categories: in the left-hand one, there are no equations so it presents a free category, as discussed in \cref{subsubsec.path_cats}. The free square category has ten morphisms, because every path is a unique morphism.
\begin{exercise} \label{exc.label_free_square}
\begin{enumerate}
\item Write down the ten paths in the free square category above.
\item Name two different paths that are parallel.
\item Name two different paths that are not parallel.
\qedhere
\end{enumerate}
\end{exercise}
On the other hand, the category presented on the right has only nine morphisms,
because $f.h$ and $g.i$ are made equal. This category is called the
``commutative square''.\index{commutative square} Its morphisms are
\[
\{A, B, C, D, f, g, h, i, f\cp h\}
\]
One might say ``the missing one is $g\cp i$,'' but that is not quite right: $g\cp i$ is there too, because it is equal to $f\cp h$. As usual, $A$ denotes $\id_A$, etc.
\begin{exercise}
Write down all the morphisms in the category presented by the following diagram:
\[
\boxCD{
\begin{tikzcd}[ampersand replacement=\&]
\LMO{A}\ar[r, "f"]\ar[d, "g"']\ar[dr, "j" description]\&\LMO{B}\ar[d, "h"]\\
\LMO[under]{C}\ar[r, "i"']\&\LMO[under]{D}
\end{tikzcd}
\\~\\\footnotesize
$f\cp h=j=g\cp i$
}
\]
\end{exercise}
\begin{example} \label{ex.group_of_order_2}\index{group}
Here are two more another examples:
\[
\cat{C}\coloneqq\boxCD{\begin{tikzcd}[ampersand replacement=\&]
\LMO[under]{z}\ar[loop above, "s"]
\end{tikzcd}
\\~\\\footnotesize
$s\cp s=z$
}
\hspace{1in}
\cat{D}\coloneqq\boxCD{\begin{tikzcd}[ampersand replacement=\&]
\LMO[under]{z}\ar[loop above, "s"]
\end{tikzcd}
\\~\\\footnotesize
$s\cp s\cp s\cp s=s\cp s$
}
\]
The set of morphisms in $\cat{C}$ is $\{z,s\}$, where $s\cp s=z$. Thus $s\cp s\cp s=s$ and $s\cp s\cp s\cp s=z$, etc.
\end{example}
\begin{exercise}
Write down all the morphisms in the category $\cat{D}$ from \cref{ex.group_of_order_2}.
\end{exercise}
\begin{remark}\label{rem.db_schemas_are_cats}
We can now see that the schemas in \cref{sec.C2_motivation}, e.g.\ \cref{eqn.free_schema,eqn.mySchema} are finite presentations of categories. We will come back to that in \cref{sec.cat_fun_nt_db}.
\end{remark}
% Subsubsection %
\subsection{Posets and free categories: two ends of a spectrum}\label{subsubsec.pos_free_spectrum}
Now that we have used graphs to depict posets in \cref{chap.posets} and categories above, one may want
to know the relationship between these uses. The main idea we want to explain now is that
\begin{quote}
``A poset is a category where every two parallel arrows are the same.''
\end{quote}
Thus any poset can be regarded as a category, and any category can be somehow ``crushed down'' into a poset. Let's discuss these ideas.
\paragraph{Posets as categories}\index{poset!as category}
Suppose $(P,\leq)$ is a poset. It specifies a category $\cat{P}$ as follows. The
objects of $\cat{P}$ are precisely the elements of $P$; that is,
$\Ob(\cat{P})=P$. As for morphisms, $\cat{P}$ has exactly one morphism $p\to q$ if
$p\leq q$ and no morphisms $p\to q$ if $p\not\leq q$.
This means that a Hasse diagram for a poset can be thought of a presentation of
a category where, for all vertices $p$ and $q$, every two paths from $p\to q$ are
declared equal. For example, in \cref{eqn.parts_of_3} we saw a Hasse diagram
that was like the graph on the left:
\[
\boxCD{
\begin{tikzcd}[row sep=15pt, ampersand replacement=\&]
\&\bullet\\
\bullet\ar[ur]\&\bullet\ar[u]\&\bullet\ar[ul]\\
\&\bullet\ar[lu]\ar[u]\ar[ru]
\end{tikzcd}
\\~\\\footnotesize
}
\hspace{.6in}
\boxCD[red]{
\begin{tikzcd}[row sep=15pt, ampersand replacement=\&]
\&\bullet\\
\bullet\ar[ur,"d"]\&\bullet\ar[u,"e"]\&\bullet\ar[ul,"f"']\\
\&\bullet\ar[lu, "a"]\ar[u,"b"]\ar[ru,"c"']
\end{tikzcd}
\\~\\\footnotesize
\textit{\color{red}no equations?}
}
\hspace{.6in}
\boxCD{
\begin{tikzcd}[row sep=15pt, ampersand replacement=\&]
\&\bullet\\
\bullet\ar[ur,"d"]\&\bullet\ar[u,"e"]\&\bullet\ar[ul,"f"']\\
\&\bullet\ar[lu, "a"]\ar[u,"b"]\ar[ru,"c"']
\end{tikzcd}
\\~\\\footnotesize
$a\cp d = b\cp e = c\cp f$
}
\]
The Hasse diagram (left) might look the most like the free category presentation (middle) which has no equations, but that is not correct. The free category has three morphisms from bottom object to top
object, whereas posets are categories with at most one morphism between two given
objects. Thus the diagram on the right is the correct presentation for the poset on the left.
\begin{exercise}
What equations would you need to add to the graphs below in order to present the associated posets?
\[
G_1=\boxCD{
\begin{tikzcd}[ampersand replacement=\&, column sep=20pt]
\bullet\ar[r, shift left, "f"]\ar[r, shift right, "g"']\&\bullet
\end{tikzcd}
}
\hspace{.4in}
G_2=\boxCD{
\begin{tikzcd}[ampersand replacement=\&]
\bullet\ar[loop above, "f"]
\end{tikzcd}
}
\hspace{.4in}
G_3=\boxCD{
\begin{tikzcd}[ampersand replacement=\&, column sep=20pt]
\bullet\ar[r, "f"]\ar[d, "g"']\&\bullet\ar[d, "h"]\\
\bullet\ar[r, "i"']\&\bullet
\end{tikzcd}
}
\hspace{.4in}
G_4=\boxCD{
\begin{tikzcd}[ampersand replacement=\&, column sep=20pt]
\bullet\ar[r, "f"]\ar[d, "g"']\&\bullet\ar[d, "h"]\\
\bullet\&\bullet
\end{tikzcd}
}
\qedhere
\]
\end{exercise}
\paragraph{The poset reflection of a category}\index{category!poset reflection}
Given any category $\cat{C}$, one can obtain a poset $(C,\leq)$ from it by destroying the distinction between any two parallel morphisms. That is, let $C=\Ob(\cat{C})$, and put $c_1\leq c_2$ iff $\cat{C}(c_1,c_2)\neq\emptyset$. If there is one, or two, or fifty, or infinitely many morphisms $c_1\to c_2$ in $\cat{C}$, the poset reflection does not see the difference. But it does see the difference between some and none.
\begin{exercise}
What is the poset reflection of the category $\NN$ from \cref{ex.monoid_nats}?
\end{exercise}
%\begin{exercise}
%Let $\finset\ss\smset$ denote the category of finite sets and the functions between them. Consider the poset reflection of $\finset$. What does its Hasse diagram look like?
%\end{exercise}
We have only discussed adjoint functors between posets, but soon we will discuss adjoints in general. Here is a statement you might not understand exactly, but it's true; you can ask a category theory expert about it and they should be able to explain it to you:
\begin{quote}
Considering a poset as a category is right adjoint to turning a category into a poset by poset reflection.
\end{quote}
\begin{remark}[Ends of a spectrum]\index{ends of a spectrum}
The main point of this subsection is that both posets and free categories are specified by a graph without path equations, but they denote opposite ends of a spectrum. In both cases, the vertices of the graph become the objects of a category and the paths become morphisms. But in the case of free categories, each path becomes a different morphism. In the case of posets, all parallel paths become the same morphism. Every category presentation, i.e.\ graph with equations, lies somewhere in between the free category and its poset reflection.
\end{remark}
% Subsubsection %
\subsection{Important categories in mathematics} \label{subsec.important_cats}
We have been talking about category presentations, but there are categories that are best understood directly, not by way of presentations. Recall the definition of category from \cref{def.category}. The most important category in mathematics is the category of sets.
\begin{definition}\label{def.category_of_sets}\index{category!$\smset$}
The \emph{category of sets}, denoted $\smset$, is defined as follows.
\begin{enumerate}[label=(\roman*)]
\item $\Ob(\smset)$ is the collection of all sets.
\item If $S$ and $T$ are sets, then $\smset(S,T)=\{f\colon S\to T\mid f\text{ is a function}\}$.
\item For each set $S$, the identity morphism is the function $\id_S\colon S\to S$ given by $\id_S(s)=s$ for each $s\in S$.
\item Given $f\colon S\to T$ and $g\colon T\to U$, their composite $f.g$ sends $s\in S$ to $g(f(s))\in U$.
\end{enumerate}
These definitions satisfy the unitality and associativity conditions, so $\smset$ is indeed a category.
\end{definition}
Closely related is the category $\finset$. This is the category where the
objects are finite sets and the morphisms are functions between them.
\index{category!$\finset$}
\begin{exercise}
Let $\ul{2}=\{1,2\}$ and $\ul{3}=\{1,2,3\}$. These are objects in the
category $\smset$ discussed in \cref{def.category_of_sets}. Write down all the elements of the set $\smset(\ul{2},\ul{3})$; there should be nine.
\end{exercise}
\begin{remark} \label{rem.cats_and_vcats1}\index{enriched category!vs category}
You may have wondered what categories have to do with
$\cat{V}$-categories (\cref{def.cat_enriched_mpos}); perhaps you think the
definitions hardly look alike. Despite the term `enriched category',
$\cat{V}$-categories are not categories with extra structure. While some sorts
of $\cat{V}$ categories, such as $\Bool$-categories, i.e.\ posets, can naturally be
seen as categories, other sorts, such as $\Cost$-categories, cannot.
The reason for the importance of $\smset$ is that, if we generalize the
definition of enriched category (\cref{def.cat_enriched_mpos}), we find that
categories in the sense of \cref{def.category} are exactly $\smset$-categories---so categories are
$\cat{V}$-categories for a very special choice of $\cat{V}$. We'll come back to
this in \cref{subsec.SMC_enrichment}. For now, we simply remark that just like
a deep understanding of the category $\Cost$---for example, knowing that it is a
quantale---yields insight into Lawvere metric spaces, so the study of $\smset$
yields insights into categories.
\end{remark}
There are many other categories that mathematicians care about:
\begin{itemize}
\item $\Cat{Top}$: the category of topological spaces (neighborhoods)
\item $\Cat{Grph}$: the category of graphs (connection)
\item $\Cat{Meas}$: the category of measure spaces (amount)
\item $\Cat{Mon}$: the category of monoids (action)
\item $\Cat{Grp}$: the category of groups (reversible action, symmetry)
\item $\Cat{Cat}$: the category of small categories (action in context, structure)
\end{itemize}
But in fact, this does not at all do justice to which categories mathematicians
think about. They work with whatever category they find fits their purpose at
the time, like `the category of Riemannian manifolds of dimension at most
4'.
Here is one more source of examples: take any category you already have and reverse all its morphisms; the result is again a category.
\begin{definition}\label{def.opposite_cat}\index{opposite!category}
Let $\cat{C}$ be a category. Its \emph{opposite}, denoted $\cat{C}\op$ is the category with the same objects, $\Ob(\cat{C}\op)=\Ob(\cat{C})$, and for any two objects $c,d\in\Ob(\cat{C})$, one has $\cat{C}\op(c,d)=\cat{C}(d,c)$. Identities and composition are as in $\cat{C}$.
\end{definition}
% Subsubsection %
\subsection{Isomorphisms in a category}
The previous sections have all been about examples of categories: free categories, presented categories, and important categories in math. In this section, we briefly switch gears and talk about an important concept in category theory, namely the concept of isomorphism.
In a category, there is often the idea that two objects are interchangeable. For
example, in the category $\smset$, one can exchange the set $\{\blacksquare,\square\}$ for the set $\{0,1\}$
and everything will be the same, other than the names for the elements. Similarly, if one has a poset with elements $a,b$, such that $a \le b$ and $b \le a$, i.e.\ $a\cong b$, then are essentially the same. How so? Well they act the same, in that for any other object $c$, we know that $c \le a$ if and only if $c
\le b$, and $c\ge a$ iff $c\geq b$. The notion of isomorphism formalizes this notion of interchangeability.
\begin{definition}\index{isomorphism}
An \emph{isomorphism} is a morphism $f\colon A \to B$ such that there exists a
morphism $g\colon B \to A$ satisfying $f\cp g=\id_A$ and $g\cp f=\id_B$. In this case
we call $f$ and $g$ \emph{inverses}, and we often write $g=f\inv$, or
equivalently $f=g\inv$. We also say that $A$ and $B$ are \emph{isomorphic}
objects.
\end{definition}
\begin{example}\label{ex.simple_iso}
The set $A\coloneqq\{a,b,c\}$ and the set $\ul{3}=\{1,2,3\}$ are isomorphic; that is, there exists an isomorphism $f\colon A\to \ul{3}$ given by $f(a)=2$, $f(b)=1$, $f(c)=3$.
\end{example}
\begin{exercise}
\begin{enumerate}
\item What is the inverse $f\inv\colon\ul{3}\to A$ of the function $f$ given in \cref{ex.simple_iso}?
\item How many distinct isomorphisms are there $A\to\ul{3}$?
\qedhere
\end{enumerate}
\end{exercise}
\begin{exercise}
Show that in any category $\cat{C}$, for any object $c\in\cat{C}$, the identity $\id_c$ is an isomorphism.
\end{exercise}
\begin{exercise}
Recall Examples \ref{ex.monoid_nats} and \ref{ex.group_of_order_2}. A monoid in
which every morphism is an isomorphism is known as a \emph{group}.
\begin{enumerate}
\item Is the monoid in \cref{ex.monoid_nats} a group?
\item What about the monoid in \cref{ex.group_of_order_2}?
\qedhere
\end{enumerate}
\end{exercise}
\begin{exercise}\index{free!category}
Let $G$ be a graph, and let $\free(G)$ be the corresponding free category. Somebody tells you that the only isomorphisms in $\free(G)$ are the identity morphisms. Is that person correct? Why or why not?
\end{exercise}
\begin{example}
In this example, we will see that it is possible for $g$ and $f$ to be almost---but not quite---inverses, in a certain sense.
Consider the functions $f\colon\ul{2}\to\ul{3}$ and $g\colon\ul{3}\to\ul{2}$ drawn below:
\[
\begin{tikzpicture}[y=.3cm, shorten <=-2pt, shorten >=-2pt]
\node[label={[above=-4pt, font=\tiny]:$1$}] (A1) {$\bullet$};
\node[below=1 of A1, label={[above=-4pt, font=\tiny]:$2$}] (A2) {$\bullet$};
\node[ellipse, draw, inner sep=0pt, fit=(A1) (A2)] (A) {};
%
\node[above right=0 and 1 of A1, label={[above=-4pt, font=\tiny]:$1$}] (B1) {$\bullet$};
\node[below=1 of B1, label={[above=-4pt, font=\tiny]:$2$}] (B2) {$\bullet$};
\node[below=1 of B2, label={[above=-4pt, font=\tiny]:$3$}] (B3) {$\bullet$};
\node[ellipse, draw, inner sep=0pt, fit=(B1) (B3)] (B) {};
%
\node[right=3 of B1, label={[above=-4pt, font=\tiny]:$1$}] (C1) {$\bullet$};
\node[below=1 of C1, label={[above=-4pt, font=\tiny]:$2$}] (C2) {$\bullet$};
\node[below=1 of C2, label={[above=-4pt, font=\tiny]:$3$}] (C3) {$\bullet$};
\node[ellipse, draw, inner sep=0pt, fit=(C1) (C3)] (C) {};
%
\node[right=6 of A1, label={[above=-4pt, font=\tiny]:$1$}] (D1) {$\bullet$};
\node[below=1 of D1, label={[above=-4pt, font=\tiny]:$2$}] (D2) {$\bullet$};
\node[ellipse, draw, inner sep=0pt, fit=(D1) (D2)] (D) {};
%
\draw[->] (A1) to (B1);
\draw[->] (A2) to (B3);
\draw[->] (C1) to (D1);
\draw[->] (C2) to (D1);
\draw[->] (C3) to (D2);
\end{tikzpicture}
\]
Then the reader should be able to instantly check that $f\cp g=\id_{\ul{2}}$ but
$g\cp f\neq\id_{\ul{3}}$. Thus $f$ and $g$ are not inverses and hence not
isomorphisms. %In fact there is a special word for the situation when at least one composite is the identity, namely we call this situation a \emph{retraction}. However, we will not need this concept again in the course.
\end{example}
%\subsection{Back to wiring diagrams}\label{ssec.wirdia1}
%
%We introduced a very simple sort of wiring diagram in \cref{subsec.interlude_course}.
%\[
%\begin{tikzpicture}[oriented WD, bb medium, bb port length=0]
% \node[bb={1}{1}] at (0,0) (a) {$\le$};
% \node[bb={1}{1}] at (3,0) (b) {$\le$};
% \node[bb={1}{1}] at (6,0) (c) {$\le$};
% \node[bb={1}{1}, fit=(a) (b) (c)] (outer) {};
% \begin{scope}[font=\tiny]
% \draw (outer_in1') to node[above=-2pt] {$x_0$} (a_in1);
% \draw (a_out1) to node[above=-2pt] {$x_1$} (b_in1);
% \draw (b_out1) to node[above=-2pt] {$x_2$} (c_in1);
% \draw (c_out1) to node[above=-2pt] {$x_3$} (outer_out1');
% \end{scope}
%\end{tikzpicture}
%\]
%It was in the context of posets which, as we saw in \cref{subsubsec.pos_free_spectrum}, are categories where there is at most one morphism between any two objects. If there is one, we write $v\leq w$. The wiring diagram above is basically a proof that $x_0\leq x_3$ (the exterior box) using three facts (the interior boxes), $x_0\leq x_1$, $x_1\leq x_2$, and $x_2\leq x_3$.
%
%Categories have basically the same sort of wiring diagram as posets---namely sequences of boxes inside a box---but there may be more than one morphism between two objects, so we need to be able to label our boxes with morphism names. All of our boxes still have one string on the left and one on the right; these correspond to the domain and codomain of the morphism; see \cref{def.category}.
%
%So suppose given morphisms $f\colon x_0\to x_1$, $g\colon x_1\to x_2$, and $h\colon x_2\to x_3$. Then their composite $f\cp g\cp h$ is the outer box in the following wiring diagram
%\[
%\begin{tikzpicture}[oriented WD, bb medium, bb port length=0]
% \node[bb={1}{1}] at (0,0) (a) {$f$};
% \node[bb={1}{1}] at (3,0) (b) {$g$};
% \node[bb={1}{1}] at (6,0) (c) {$h$};
% \node[bb={1}{1}, fit=(a) (b) (c)] (outer) {};
% \begin{scope}[font=\tiny]
% \draw (outer_in1') to node[above=-2pt] {$x_0$} (a_in1);
% \draw (a_out1) to node[above=-2pt] {$x_1$} (b_in1);
% \draw (b_out1) to node[above=-2pt] {$x_2$} (c_in1);
% \draw (c_out1) to node[above=-2pt] {$x_3$} (outer_out1');
% \end{scope}
%\end{tikzpicture}
%\]
%
%The identity morphism on an object $x$ is drawn as on the left below, but we
%will see that it is not harmful to draw $\id_x$ in any of the following three
%ways:
%\[
%\begin{tikzpicture}[oriented WD, bb medium, bb port length=0]
% \node[bb={1}{1}] (empty) {{\color{white}$\leq$}};
% \draw (empty_in1) to (empty_out1);
% \node[font=\tiny] at ($(empty_in1)+(-4pt,0)$) {$x$};
% \node[font=\tiny] at ($(empty_out1)+(4pt,0)$) {$x$};
%\end{tikzpicture}
%\hspace{1in}
%\begin{tikzpicture}[oriented WD, bb medium, bb port length=0]
% \node[bb={1}{1}, dotted] (empty) {{\color{white}$\leq$}};
% \draw (empty_in1) to (empty_out1);
% \node[font=\tiny] at ($(empty_in1)+(-4pt,0)$) {$x$};
% \node[font=\tiny] at ($(empty_out1)+(4pt,0)$) {$x$};
%\end{tikzpicture}
%\hspace{1in}
%\begin{tikzpicture}[oriented WD, bb medium, bb port length=0]
% \node[bb={1}{1}, white] (empty) {{\color{white}$\leq$}};
% \draw (empty_in1) to (empty_out1);
% \node[font=\tiny] at ($(empty_in1)+(-4pt,0)$) {$x$};
% \node[font=\tiny] at ($(empty_out1)+(4pt,0)$) {$x$};
%\end{tikzpicture}
%\]
%By \cref{def.category} the morphisms in a category satisfy two properties, called the unitality property and the associativity property. The unitality says that $\id_x\cp f=f=f\cp\id_y$ for any $f\colon x\to y$. In terms of string diagrams this would say
%\[
%\begin{tikzpicture}[oriented WD, bb medium, bb port length=0]
% \node[bb={1}{1}, dotted] (a) {{\color{white}$f$}};
% \node[bb={1}{1}, right=1 of a] (b) {$f$};
% \node[bb={1}{1}, fit=(a) (b)] (outer) {};
% \begin{scope}[font=\tiny]
% \draw (outer_in1') to node[above=-2pt] {$x$} (a_in1);
% \draw (a_in1) to (a_out1);
% \draw (a_out1) to node[above=-2pt] {$x$} (b_in1);
% \draw (b_out1) to node[above=-2pt] {$y$} (outer_out1');
% \end{scope}
%%
% \node[bb={1}{1}, right=14 of a] (c) {$f$};
% \node[bb={1}{1}, dotted, right=1 of c] (d) {{\color{white}$f$}};
% \node[bb={1}{1}, fit=(c) (d)] (outer2) {};
% \begin{scope}[font=\tiny]
% \draw (outer2_in1') to node[above=-2pt] {$x$} (c_in1);
% \draw (c_out1) to node[above=-2pt] {$y$} (d_in1);
% \draw (d_in1) to (d_out1);
% \draw (d_out1) to node[above=-2pt] {$y$} (outer2_out1');
% \end{scope}
%%
% \node[bb={1}{1}] at ($(b)!.5!(c)$) (m) {$f$};
% \node[bb={1}{1}, fit=(m)] (outerm) {};
% \begin{scope}[font=\tiny]
% \draw (outerm_in1') to node[above=-2pt] {$x$} (m_in1);
% \draw (m_out1) to node[above=-2pt] {$y$} (outerm_out1');
% \end{scope}
%%
% \node at ($(outer.east)!.5!(outerm.west)$) {=};
% \node at ($(outer2.west)!.5!(outerm.east)$) {=};
%\end{tikzpicture}
%\]
%This means you can insert or discard any identity morphism you see in a wiring diagram.
%
%\begin{exercise}
%The associativity property says that for any three morphisms $f\colon c_0\to c_1$, $g\colon c_1\to c_2$, and $h\colon c_2\to c_3$, we have $(f\cp g)\cp h=f\cp(g\cp h)$. Draw wiring diagrams for each of the two sides of that equation.
%\end{exercise}
%
%
%-------- Section --------%
\section{Functors, natural transformations, and databases}\label{sec.cat_fun_nt_db}
In \cref{sec.C2_motivation} we showed some database schemas: graphs with path equations. Then in \cref{subsec.presenting_cats} we said that graphs with path equations correspond to finitely presented categories. Now we want to explain what the data in a database is, as a way to introduce functors. To do so, we begin by noticing that sets and functions---the objects and morphisms in the category $\smset$---can be captured by simple databases.
%---- Subsection ----%
\subsection{Sets and functions as databases}
\index{database!schema}\index{category!as database schema}
The first observation is that any set can be understood as a table with only one column: the ID column.
\[
\begin{tabular}{ c |}
\textbf{Planet of Sol}\\\hline
Mercury\\
Venus\\
Earth\\
Mars\\
Jupiter\\
Saturn\\
Uranus\\
Neptune
\end{tabular}
\hspace{.7in}
\begin{tabular}{ c |}
\textbf{Prime number}\\\hline
2\\
3\\
5\\
7\\
11\\
13\\
17\\
$\vdots$
\end{tabular}
\hspace{.7in}
\begin{tabular}{ c |}
\textbf{Flying pig}\\\hline
~\\
~\\
~\\
~\\
~\\
~\\
~\\
~
\end{tabular}
\]
Rather than put the elements of the set between braces, e.g.\ $\{2,3,5,7,11,\ldots\}$, we write them down as rows in a table.
In databases, single-column tables are often called controlled vocabularies, or master data. Now to be honest, we can only write out every single entry in a table when its set of rows is finite. A database practitioner might find the idea of our prime number table a bit unrealistic. But we're mathematicians, so since the idea makes perfect sense abstractly, we will continue to think of sets as one-column tables.
The above databases have schemas consisting of just one vertex:
\[
\fbox{
\begin{tikzcd}[ampersand replacement=\&, column sep=50pt]
\LTO{Planet of Sol}
\end{tikzcd}
}
\hspace{1in}
\fbox{
\begin{tikzcd}[ampersand replacement=\&, column sep=50pt]
\LTO{Prime number}
\end{tikzcd}
}
\hspace{1in}
\fbox{
\begin{tikzcd}[ampersand replacement=\&, column sep=50pt]
\LTO{Flying pig}
\end{tikzcd}
}
\]
Obviously, there's really not much difference between these schemas, other than the label of the unique vertex. So we could say ``sets are databases whose schema consists of a single vertex''. Let's move on to functions.
A function $f\colon A\to B$ can almost be depicted as a two-column table
\[
\begin{tabular}{ c | c}
\textbf{Beatle}&\textbf{Played}\\\hline
George&Lead guitar\\
John&Rhythm guitar\\
Paul&Bass guitar\\
Ringo&Drums
\end{tabular}
\]
except it is unclear whether the elements of the right-hand column exhaust all of $B$. What if there are rock-and-roll instruments out there that none of the Beatles played? So a function $f\colon A\to B$ requires two tables, one for $A$ and its $f$ column, and one for $B$:
\[
\begin{tabular}{ c | c}
\textbf{Beatle}&\textbf{Played}\\\hline
George&Lead guitar\\
John&Rhythm guitar\\
Paul&Bass guitar\\
Ringo&Drums\\
~
\end{tabular}
\hspace{1in}
\begin{tabular}{ c |}
\textbf{Rock-and-roll instrument}\\\hline
Bass guitar\\
Drums\\
Keyboard\\
Lead guitar\\
Rhythm guitar
\end{tabular}
\]
\[
\begin{tabular}{ c | c }
\textbf{Germans}&\textbf{Friend}\\\hline
Ilsa & Giulia \\
Klaus& Gian-Carlo\\
J\"org & Martina \\
Sabine & Alessandro \\
Heinrich & Martina \\
~
\end{tabular}
\hspace{1in}
\begin{tabular}{ c }
\textbf{Italians}\\\hline
Giuseppe \\
Giulia\\
Gian-Carlo\\
Alessandro\\
Martina\\
Bianca
\end{tabular}
\]
\[
\begin{tabular}{ c }
\textbf{Germans}\\\hline
Ilsa \\
Klaus\\
J\"org \\
Sabine \\
\end{tabular}
\]
\[
\begin{tabular}{ c | c }
\textbf{Germans}&\textbf{Friend}\\\hline
Ilsa & Italian${}_1$ \\
Klaus& Italian${}_2$\\
J\"org & Italian${}_3$ \\
Sabine & Italian${}_4$ \\
\end{tabular}
\hspace{1in}
\begin{tabular}{ c }
\textbf{Italians}\\\hline
Italian${}_1$ \\
Italian${}_2$\\
Italian${}_3$\\
Italian${}_4$\\
\end{tabular}
\]
Thus the database schema for any function is just a labeled version of $\Cat{2}$:
\[
\fbox{
\begin{tikzcd}[ampersand replacement=\&, column sep=50pt]
\LTO{Beatle}\ar[r, "\text{Played}"]\&\LTO{\parbox{.7in}{\centering Rock-and-roll\\\vspace{-.1in}instrument}}
\end{tikzcd}
}
\]
The lesson is that an instance of a database takes a presentation of a category,
and turns every vertex into a set, and every arrow into a function. As such, it
describes a map from the presented category to the category $\smset$. In
\cref{subsec.enriched_functors} we saw that maps of $\cat{V}$-categories are
known as $\cat{V}$-functors. Similarly, we call maps of plain old categories,
functors.
\subsection{Functors}\label{sec.functors}
A functor is a mapping between categories. It sends objects to objects and morphisms to morphisms, all while preserving identities and composition. Here is the formal definition.
\begin{definition}\index{functor}
Let $\cat{C}$ and $\cat{D}$ be categories. To specify a \emph{functor from $\cat{C}$ to $\cat{D}$}, denoted $F\colon\cat{C}\to\cat{D}$,
\begin{enumerate}[label=(\roman*)]
\item for every object $c\in\Ob(\cat{C})$, one specifies an object $F(c)\in\Ob(\cat{D})$;
\item for every morphism $f\colon c_1\to c_2$ in $\cat{C}$, one specifies a morphism $F(f)\colon F(c_1)\to F(c_2)$ in $\cat{D}$.
\end{enumerate}
The above constituents must satisfy two properties:
\begin{enumerate}[label=(\alph*)]
\item for every object $c\in\Ob(\cat{C})$, we have $F(\id_c)=\id_{F(c)}$.
\item for every three objects $c_1,c_2,c_3\in\Ob(\cat{C})$ and two morphisms $f\in\cat{C}(c_1,c_2)$, $g\in\cat{C}(c_2,c_3)$, the equation $F(f\cp g)=F(f)\cp F(g)$ holds in $\cat{D}$.
\end{enumerate}
\end{definition}
\begin{example}
For example, here we draw three functors $F\colon\Cat{2}\to\Cat{3}$:
\[
\begin{tikzpicture}[x=.7in, y=.25in, inner sep=5pt]
\foreach \i in {0,1,2}{
\node (A\i-n0) at (3*\i,-1) {$\LMO{m_0}$};
\node (A\i-n1) at (3*\i,-3) {$\LMO[under]{m_1}$};
\draw[->] (A\i-n0) to node[left=-2pt, font=\scriptsize] {$f_1$} (A\i-n1);
\node[draw, inner ysep=1pt, fit=(A\i-n0) (A\i-n1)] (A\i) {};
%
\node (B\i-n0) at (3*\i+1,0) {$\LMO{n_0}$};
\node (B\i-n1) at (3*\i+1,-2) {$\LMO{n_1}$};
\node (B\i-n2) at (3*\i+1,-4) {$\LMO{n_2}$};
\draw[->] (B\i-n0) to node[right=-2pt, font=\scriptsize] {$g_1$} (B\i-n1);
\draw[->] (B\i-n1) to node[right=-2pt, font=\scriptsize] {$g_2$} (B\i-n2);
\node[draw, inner ysep=1pt, fit=(B\i-n0) (B\i-n2)] (B\i) {};
}
%
\begin{scope}[dotted, thick, ->]
\draw (A0-n0) -- (B0-n0);
\draw (A0-n1) -- (B0-n0);
%
\draw (A1-n0) -- (B1-n0);
\draw (A1-n1) -- (B1-n1);
%
\draw (A2-n0) -- (B2-n0);
\draw (A2-n1) -- (B2-n2);
\end{scope}
\end{tikzpicture}
\]
In each case, the dotted arrows show what the functor $F$ does to the vertices in $\Cat{2}$; once that information is specified, it turns out---in this special case---that what $F$ does to the three paths in $\Cat{2}$ is completely determined. In the left-hand diagram, $F$ sends every path to the trivial path, i.e.\ the identity on $n_0$. In the middle diagram $F(m_0)=n_0$, $F(f_1)=g_1$, and $F(m_1)=n_1$. In the right-hand diagram, $F(m_0)=n_0$, $F(m_1)=n_2$, and $F(f_1)=g_1.g_2$.
\end{example}
\begin{exercise}
Above we wrote down three functors $\Cat{2}\to\Cat{3}$. Find and write down all the rest of the functors $\Cat{2}\to\Cat{3}$.
\end{exercise}
\begin{example}\index{commutative square}
Recall the categories presented by $\mathrm{Free\_square}$ and
$\mathrm{Comm\_square}$ in \cref{subsec.presenting_cats}. Here they are again,
with $'$ added to the labels in $\mathrm{Free\_square}$ to help distinguish
them:
\[
\mathrm{Free\_square}\coloneqq\boxCD{
\begin{tikzcd}[ampersand replacement=\&]
\LMO{A'}\ar[r, "f'"]\ar[d, "g'"']\&\LMO{B'}\ar[d, "h'"]\\
\LMO[under]{C'}\ar[r, "i'"']\&\LMO[under]{D'}
\end{tikzcd}
\\~\\\footnotesize
\textit{no equations}
}
\hspace{.4in}
\mathrm{Comm\_square}\coloneqq\boxCD{
\begin{tikzcd}[ampersand replacement=\&]
\LMO{A}\ar[r, "f"]\ar[d, "g"']\&\LMO{B}\ar[d, "h"]\\
\LMO[under]{C}\ar[r, "i"']\&\LMO[under]{D}
\end{tikzcd}
\\~\\\footnotesize
$f.h=g.i$
}
\]
There are lots of functors from the free square category (let's call it
$\cat{F}$) to the commutative square category (let's call it $\cat{C}$).
However, there is exactly one such functor $F\colon\cat{F}\to\cat{C}$ that
sends $A'$ to $A$, $B'$ to $B$, $C'$ to $C$, and $D'$ to $D$. That is, once we
have made this decision about $F$ on objects, each of the ten paths in $\cat{F}$ is forced to go to a certain path in $\cat{C}$: the one with the right source and target.
\end{example}
\begin{exercise}
Say where each morphism in $\cat{F}$ is sent under the above functor $F$.
\end{exercise}
All of our example functors so far have been completely determined by what they do on objects, but this is usually not the case.
\begin{exercise}
Consider the free categories $\cat{C}=\fbox{$\bullet\to\bullet$}$ and $\cat{D}=\fbox{$\bullet\tto\bullet$}$. Give two functors $F,G\colon\cat{C}\to\cat{D}$ that act the same on objects but differently on morphisms.
\end{exercise}
\begin{example}
There are also lots of functors from the commutative square category $\cat{C}$ to the free square category $\cat{F}$, but \emph{none} that sends $A$ to $A'$, $B$ to $B'$, $C$ to $C'$, and $D$ to $D'$. The reason is that if $F$ were such a functor, then since $f.h=g.i$ in $\cat{F}$, we would have $F(f.h)=F(g.i)$, but then the rules of functors would let us reason as follows:
\[f'.h'=F(f).F(h)=F(f.h)=F(g.i)=F(g).F(i)=g'.i'\]
The resulting equation, $f'.h'=g'.i'$ does not hold in $\cat{F}$ because it is a free category (there are ``no equations''): every two paths are considered different morphisms. Thus our proposed $F$ is not a functor.
\end{example}
\begin{example}[Functors between posets are monotone maps]
\label{ex.poset_functor}
Recall from \cref{subsubsec.pos_free_spectrum} that posets are categories with
at most one morphism between any two objects. A functor between posets is
exactly a monotone map.
For example, consider the poset $(\NN,\leq)$ considered as a category $\cat{N}$ with objects $\Ob(\cat{N})=\NN$ and a unique morphism $m\to n$ iff $m\leq n$. A functor $F\colon\cat{N}\to\cat{N}$ sends each object $n\in\NN$ to an object $F(n)\in\NN$. It must send morphisms in $\cat{N}$ to morphisms in $\NN$. This means if there is a morphism $m\to n$ then there had better be a morphism $F(m)\to F(n)$. In other words, if $m\leq n$, then we had better have $F(m)\leq F(n)$. But as long as $m\leq n$ implies $F(m)\leq F(n)$, we have a functor.
Thus a functor $F\colon\cat{N}\to\cat{N}$ and a monotone map $\NN\to\NN$ are the same thing.
\end{example}
\begin{exercise}[The category of categories]
Back in the primordial ooze, there is a category $\Cat{Cat}$ in which \emph{the
objects are themselves categories}. Your task here is to construct this
category.
\begin{enumerate}
\item Given any category $\cat{C}$, show that there exists a functor $\id_{\cat{C}}\colon
\cat{C} \to \cat{C}$, known as the \emph{identity functor on $\cat{C}$}, that
maps each object to itself and each morphism to itself.
\item Note that a functor $\cat{C} \to \cat{D}$ consists of a function from
$\Ob(\cat{C})$ to $\Ob(\cat{D})$ and for each pair of objects $c_1,c_2 \in \cat{C}$ a
function from $\cat{C}(c_1,c_2)$ to $\cat{D}(F(c_1),F(c_2))$.
\item Show that given
$F\colon \cat{C} \to \cat{D}$ and $G\colon \cat{D} \to \cat{E}$, we can define a
new functor $F\cp G\colon \cat{C} \to \cat{E}$ just by composing functions.
\item Show that there is a category, call it $\Cat{Cat}$, where the objects are categories, morphisms
are functors, and identities and composition are given as above.
\qedhere
\end{enumerate}
\end{exercise}
\subsection{Databases as $\smset$-valued functors}
Let $\cat{C}$ be a category, and recall the category $\smset$ from \cref{def.category_of_sets}. A functor $F\colon\cat{C}\to\smset$ is known as a \emph{set-valued functor} on $\cat{C}$. Much of database theory (not how to make them fast, but what they are and what you do with them) can be cast in this light.
Indeed, we already saw in \cref{rem.db_schemas_are_cats} that any database schema can be regarded as (presenting) a small
category $\cat{C}$. The next thing to notice is that the data itself---any
instance of the database---is given by set-valued functor
$I\colon\cat{C}\to\smset$. The only additional detail is that for any white
node, such as $c=\LTO[\circ]{string}$, we want to force $I$ to map to the set of
strings. We suppress this detail in the following definition.
\begin{definition}\label{rdef.instance}
Let $\cat{C}$ be a schema, i.e.\ a finitely presented category. A
\emph{$\cat{C}$-instance} is a functor
$I\colon\cat{C}\to\smset$.\footnote{Warning: a $\cat{C}$-instance is a state of the database ``at an instant in time''. The term ``instance'' should not be confused with that in object oriented programming, which would correspond more to what we call a row $r\in I(c)$.}
\end{definition}
\begin{exercise}\label{ex.set_1}
Let $\Cat{1}$ denote the category with one object, called 1, one identity morphism $\id_1$, and no other morphisms. For any functor $F\colon\Cat{1}\to\smset$ one can extract a set $F(1)$. Show that for any set $S$, there is a functor $F_S\colon\Cat{1}\to\smset$ such that $F_S(1)=S$.
\end{exercise}
The above exercise reaffirms that the set of planets, the set of prime numbers, and the set of flying pigs are all set-valued functors---instances---on the schema $\Cat{1}$. Similarly, set-valued functors on the category $\Cat{2}$ are functions. All our examples so far are for the situation where the schema is a free category (no equations). Let's try an example of a category that is not free.
\begin{example}
Consider the following category:
\begin{equation}\label{eqn.idempotent}
\cat{C}\coloneqq\boxCD{\begin{tikzcd}[ampersand replacement=\&]
\LMO[under]{z}\ar[loop above, "s"]
\end{tikzcd}
\\~\\\footnotesize
$s.s=s$
}
\end{equation}
What is a set-valued functor $F\colon\cat{C}\to\smset$? It will consist of a set $Z\coloneqq F(z)$ and a function $S\coloneqq F(s)\colon Z\to Z$, subject to the requirement that $S.S=S$. Here are some examples
\begin{itemize}
\item $Z$ is the set of US citizens, and $S$ sends each citizen to her
or his president. The president's president is her- or him-self.
\item $Z=\NN$ is the set of natural numbers and $S$ sends each number to $0$. In particular, 0 goes to itself.
\item $Z$ is the set of all well-formed arithmetic expressions, such as $13+(2*4)$ or $-5$, that one can write using integers and the symbols $+,-,*,(,)$. The function $S$ evaluates the expression to return an integer, which is itself a well-formed expression. The evaluation of an integer is itself.
\item $Z=\NN_{\geq 2}$, and $S$ sends $n$ to its smallest prime factor. The smallest prime factor of a prime is itself.
\end{itemize}
\[\small
\begin{array}{ c | c}
\NN_{\geq2}&\textbf{smallest prime factor}\\\hline
2&2\\
3&3\\
4&2\\
\vdots&\vdots\\
49&7\\
50&2\\
51&3\\
\vdots&\vdots
\end{array}
\qedhere
\]
\end{example}
\begin{exercise}
Above, we thought of the sort of data that would make sense for the schema \eqref{eqn.idempotent}. Give an example of the sort of data that would make sense for the following schemas:\qquad
\begin{enumerate*}[itemjoin=\hspace{1in}]
\item \boxCD{\begin{tikzcd}[ampersand replacement=\&]
\LMO[under]{z}\ar[loop above, "s"]
\end{tikzcd}
\\~\\\footnotesize
$s\cp s=z$
}
\item
\boxCD{\begin{tikzcd}[ampersand replacement=\&]
\LMO{a}\ar[r, "f"]\&\LMO{b}\ar[r, shift left, "g"]\ar[r, shift right, "h"']\&\LMO{c}
\end{tikzcd}
\\~\\\footnotesize
$f\cp g=f\cp h$
}
\qedhere
\end{enumerate*}
\end{exercise}
The main idea is this: a database schema is a category, and an instance on that schema---the data itself---is a set-valued functor. All the constraints, or business rules, are ensured by the rules of functors, namely that functors preserve composition.
%---- Subsection ----%
\subsection{Natural transformations}
If $\cat{C}$ is a schema---i.e.\ a finitely presented category---then there are many database instances on it, which we can organize into a category. But this is part of a larger story, namely that of natural transformations. An abstract picture to have in mind is this:
\[
\begin{tikzcd}[column sep=large]
\cat{C}\ar[r, bend left, "F"]\ar[r, bend right, "G"']\ar[r, phantom, "\alpha\!\Downarrow\;"]&\cat{D}.
\end{tikzcd}
\]
\begin{definition}\label{def.natural_transformation}\index{natural
transformation}
Let $\cat{C}$ and $\cat{D}$ be categories, and let $F,G\colon\cat{C}\to\cat{D}$
be functors. To specify a \emph{natural transformation} $\alpha\colon F\Rightarrow G$,
\begin{enumerate}[label=(\roman*)]
\item for each object $c\in\cat{C}$, one specifies a morphism $\alpha_c\colon F(c)\to G(c)$ in $\cat{D}$, called the \emph{$c$-component of $\alpha$}.
\end{enumerate}
These components must satisfy the following, called the \emph{naturality condition}:
\begin{enumerate}[label=(\alph*)]
\item for every morphism $f\colon c\to d$ in $\cat{C}$, the following equation must hold:
\[I(f).\alpha_d=\alpha_c.J(f).\]
\end{enumerate}
A natural transformation $\alpha\colon F\to G$ is called a \emph{natural isomorphism} if each component $\alpha_c$ is an isomorphism in $\cat{D}$.
\end{definition}
\index{naturality condition}\index{commutative diagram}\index{commutative
square}\index{diagram!commutative}
The naturality condition can also be written as a so-called \emph{commutative diagram}. A
diagram in a category is drawn as a graph whose vertices and arrows are labeled by objects and morphisms in the category. For example, here is a diagram that's relevant to the naturality condition in \cref{def.natural_transformation}:
\begin{equation}\label{eqn.naturality_condition}
\begin{tikzcd}
I(c)\ar[r, "\alpha_c"]\ar[d, "I(f)"']&J(c)\ar[d, "J(f)"]\\
I(d)\ar[r, "\alpha_d"']&J(d)
\end{tikzcd}
\end{equation}
\begin{definition}\label{def.diagram_commutes}\index{diagram}
A \emph{diagram} in $\cat{C}$ is a functor from any category
$D\colon\cat{J} \to \cat{C}$. We say that the diagram $D$ \emph{commutes} if $Df=Df'$ holds for every parallel pair of morphisms $f,f'\colon a \to b$ in $\cat{J}$.%
\footnote{We package this formally by saying that $D$ commutes iff it factors through the poset reflection of $\cat{J}$.}
We call $\cat{J}$ the \emph{indexing category} for the diagram.
\end{definition}
In terms of \cref{eqn.naturality_condition}, the only case of two parallel morphisms is that of $I(c)\tto J(d)$, so to say that the diagram commutes is to say that $I(f).\alpha_d=\alpha_c.J(f)$. This is exactly the naturality
condition from \cref{def.natural_transformation}.
\begin{example}
A representative picture is as follows:
\[
\begin{tikzpicture}[x=.7in, y=.3in, inner sep=5pt]
\node (1) at (0,0) {$\LMO{1}$};
\node (2) at (1,0) {$\LMO{2}$};
\draw[->] (1) to node[above=-2pt, font=\scriptsize] {$f$} (2);
\node[draw, fit=(1) (2)] (box) {};
\node[left=0 of box] {$\cat{C}\coloneqq$};
%
\node (a) at (2.8,0) {$\LMO{u}$};
\node[blue] (b) at (3.5,1) {$\LMO{v}$};
\node[blue] (c) at (4.5,1) {$\LMO{w}$};
\node[red] (d) at (3.5,-1) {$\LMO{x}$};
\node[red] (e) at (4.5,-1) {$\LMO{y}$};
\node (f) at (5.2,0) {$\LMO{z}$};
\draw[->] (a) to node[above, font=\scriptsize] {$a$} (b);
\draw[->] (a) to node[below, font=\scriptsize] {$b$} (d);
\draw[->,blue] (b) to node[below=-2pt, font=\scriptsize] {$d$} (c);
\draw[->,green!50!black] (b) to node[right=-2pt, font=\scriptsize] {$c$} (d);
\draw[->,red] (d) to node[above=-2pt, font=\scriptsize] {$e$} (e);
\draw[->,green!50!black] (c) to node[left=-2pt, font=\scriptsize] {$g$} (e);
\draw[->] (c) to node[above, font=\scriptsize] {$h$} (f);
\draw[->] (e) to node[below, font=\scriptsize] {$k$} (f);
\node[draw, fit=(a) (b) (c) (d) (e) (f)] (box) {};
\node[right=0 of box] {$=:\cat{D}$};
%
\begin{scope}[dotted, thick, ->, blue, bend left=25]
\draw (1) to (b);
\draw (2) to (c);
\end{scope}
\begin{scope}[dotted, thick, ->, red, bend right=25]
\draw (1) to (d);
\draw (2) to (e);
\end{scope}
\end{tikzpicture}
\]
We have
depicted, in blue and red respectively, two functors $F,G \colon \cat{C} \to
\cat{D}$. A natural transformation $\alpha\colon F \Rightarrow G$ is given by
choosing components $\alpha_1\colon v\to x$ and $\alpha_2\colon w\to y$; we have highlighted the only choice for each in green.
The key point is that the functors $F$ and $G$ are ways of viewing the category
$\cat{C}$ as lying inside the category $\cat{D}$. The natural transformation
$\alpha$, then, is a way of relating these two views using the morphisms in
$\cat{D}$.
\end{example}
\begin{example}\label{ex.1_inst}
We said in \cref{ex.set_1} that a functor $\Cat{1}\to\smset$ can be identified with a set. So suppose $A$ and $B$ are sets considered as functors $A,B\colon\Cat{1}\to\smset$. A natural transformation between these functors is just a function between the sets.
\end{example}
\begin{definition}\label{def.functor_cat}\index{category!functor category}
Let $\cat{C}$ and $\cat{D}$ be categories. We denote by $\cat{C}^{\cat{D}}$ the category whose objects are functors $F\colon\cat{D}\to\cat{C}$ and whose morphisms $\cat{C}^{\cat{D}}(F,G)$ are the natural transformations $\alpha\colon F\to G$. This category $\cat{C}^\cat{D}$ is called the \emph{functor category}.
\end{definition}
\begin{exercise}
Let's look more deeply at how $\cat{D}^{\cat{C}}$ is a category.
\begin{enumerate}
\item Figure out how to compose natural transformations. (Hint: an expert tells you ``for each object $c\in\cat{C}$, compose the $c$-components''.)
\item Propose an identity natural transformation on any object $F\in\cat{D}^\cat{C}$, and check that it is unital.
\qedhere
\end{enumerate}
\end{exercise}
\begin{example}
In our new language, \cref{ex.1_inst} says that $\smset^{\Cat{1}}$ is equivalent to $\smset$.
\end{example}
\begin{example}
Let $\cat{N}$ denote the category associated to the poset $(\NN,\leq)$, and recall from \cref{ex.poset_functor} that we can identify a functor $F\colon\cat{N}\to\cat{N}$ with a non-decreasing sequence $(F_0,F_1,F_2)$ of natural numbers, i.e.\ $F_0\leq F_1\leq F_2\leq\cdots$. If $G$ is another functor, considered as a non-decreasing sequence, then what is a natural transformation $\alpha\colon F\to G$?
Since there is at most one morphism between two objects in a poset, each component $\alpha_n\colon F_n\to G_n$ has no ``data'', it just tells us a fact: that $F_n\leq G_n$. And the naturality condition is vacuous: every square in a poset commutes. So a natural transformation between $F$ and $G$ exists iff $F_n\leq G_n$ for each $n$. Any two natural transformations $F\Rightarrow G$ are the same. In other words, $\cat{N}^\cat{N}$ is itself a poset; see \cref{subsubsec.pos_free_spectrum}.
\end{example}
\begin{exercise}
Let $\cat{C}$ be an arbitrary category and let $\cat{P}$ be a poset, thought of as a category. Consider the following statements:
\begin{enumerate}
\item For any two functors $F,G\colon\cat{C}\to\cat{P}$, there is at most one natural transformation $F\to G$.
\item For any two functors $F,G\colon\cat{P}\to\cat{C}$, there is at most one natural transformation $F\to G$.
\end{enumerate}
For each, if it is true, say why; if it is false, give a counterexample.
\end{exercise}
\begin{remark} \label{rem.poset_boolcats2}\index{equivalence of categories}
Recall that in \cref{rem.poset_boolcats} we said the category of posets is
equivalent to the category of $\Bool$-categories. We can now state the precise meaning
of this sentence. First, there exists a category $\Cat{Pos}$ in which the
objects are posets and the morphisms are monotone maps. Second, there exists a
category $\Bool\textrm{-}\Cat{Cat}$ in which the objects are $\Bool$-categories and the
morphisms are $\Bool$-functors. We call these two categories equivalent because
there exist functors $F\colon \Cat{Pos} \to \Bool\textrm{-}\Cat{Cat}$ and $G\colon
\Bool\textrm{-}\Cat{Cat} \to \Cat{Pos}$ such that there exist natural isomorphisms $F\cp G
\cong \id_{\Cat{Pos}}$ and $G\cp F \cong \id_{\Bool\textrm{-}\Cat{Cat}}$ in the sense of \cref{def.natural_transformation}.
\end{remark}
%---- Subsection ----%
\subsection{The category of instances on a schema} \label{subsec.instances_cat}
\begin{definition}\label{def.instance}\index{database!instance}
Suppose that $\cat{C}$ is a database schema and $I,J\colon\cat{C}\to\smset$ are database instances. An \emph{instance homomorphism} between them is a natural transformation $\alpha\colon I\to J$. Write $\cat{C}\inst\coloneqq\smset^\cat{C}$ to denote the functor category as defined in \cref{def.functor_cat}.
\end{definition}
We saw in \cref{ex.1_inst} that $\Cat{1}\inst$ is equivalent to the category
$\smset$. In this subsection, we will show that there is a schema whose instances are graphs and whose instance homomorphisms are graph homomorphisms.
% Subsection %
%\subsection{Graphs as a category of instances}\label{subsubsec.graph_instances}
\paragraph{Extended example: the category of graphs as a functor category}
\index{graph!as functor} \index{primordial ooze}
You may find yourself back in the primordial ooze (first discussed in \cref{subsec.posets_Bool_enriched}), because we have been using graphs to present categories; now we obtain graphs themselves as database instances on a specific schema (which is itself a graph):
\[
\Cat{Gr}\coloneqq\boxCD{
\begin{tikzcd}[ampersand replacement=\&, column sep=50pt]
\LTO{Arrow}\ar[r, shift left, "\text{source}"]\ar[r, shift right, "\text{target}"']\&\LTO{Vertex}
\end{tikzcd}
\\~\\\footnotesize
\textit{no equations}
}
\]
Here's an example $\Cat{Gr}$-instance, i.e.\ set-valued functor $I\colon\Cat{Gr}\to\smset$, in table form:
\[
\begin{tabular}{ c | c c}
\textbf{Arrow}&\textbf{source}&\textbf{target}\\\hline
a&1&2\\
b&1&3\\
c&1&3\\
d&2&2\\
e&2&3\\
\end{tabular}
\hspace{1in}
\begin{tabular}{ c |}
\textbf{Vertex}\\\hline
1\\
2\\
3\\
4\\
~
\end{tabular}
\]
Here $I(\mathrm{Arrow})=\{a,b,c,d,e\}$, and $I(\mathrm{Vertex})=\{1,2,3,4\}$. One can draw the instance $I$ as a graph:
\[
I=\fbox{
\begin{tikzcd}[ampersand replacement=\&]
\LMO{1}\ar[r, "a"]\ar[dr, shift left, "b"]\ar[dr, shift right, "c"']\&
\LMO{2}\ar[d, "e"]\ar[loop right, "d"]\\
\&\LMO{3}\&\LMO{4}
\end{tikzcd}
}
\]
Every row in the Vertex table is drawn as a vertex, and every row in the Arrow table is drawn as an arrow, connecting its specified source and target. Every possible graph can be written as a database instance on the schema $\Cat{Gr}$, and every possible $\Cat{Gr}$-instance can be represented as a graph.
\begin{exercise}
In \cref{eqn.free_schema}, a graph is shown (forget the distinction between white and black nodes). Write down the corresponding $\Cat{Gr}$-instance. (Do not be concerned that you are in the primordial ooze.)
\end{exercise}
Thus the objects in the category $\Cat{Gr}\inst$ are the graphs. In fact, the morphisms in $\Cat{Gr}\inst$ are called graph homomorphisms. Let's unwind this. Suppose that $G,H\colon\Cat{Gr}\to\smset$ are functors (i.e.\ $\Cat{Gr}$-instances); that is, they are objects $G,H\in\Cat{Gr}\inst$. A morphism $G\to H$ is a natural transformation $\alpha\colon G\to H$ between them; what does that entail?
By \cref{def.natural_transformation}, since $\Cat{Gr}$ has two objects, $\alpha$ consists of two components,
\[
\alpha_{\Set{Vertex}}\colon G(\Set{Vertex})\to H(\Set{Vertex})
\qquad\text{ and }\qquad
\alpha_{\Set{Arrow}}\colon G(\Set{Arrow})\to H(\Set{Arrow})
\]
both of which are morphisms in $\smset$. In other words, $\alpha$ consists of a function from vertices of $G$ to vertices of $H$ and a function from arrows of $G$ to arrows of $H$. For these functions to constitute a graph homomorphism, they must ``respect source and target'' in the precise sense that the naturality condition, \cref{eqn.naturality_condition} holds. That is, for every morphism in $\Cat{Gr}$, namely $\text{source}$ and $\text{target}$, the following diagrams must commute:
\[
\begin{tikzcd}[column sep=large]
G(\text{Arrow})\ar[r, "\alpha_{\Set{Arrow}}"]\ar[d, "G(\text{source})"']&
H(\text{Arrow})\ar[d, "H(\text{source})"]\\
G(\text{Vertex})\ar[r, "\alpha_{\Set{Vertex}}"']&
H(\text{Vertex})
\end{tikzcd}
\hspace{.8in}
\begin{tikzcd}[column sep=large]
G(\text{Arrow})\ar[r, "\alpha_{\Set{Arrow}}"]\ar[d, "G(\text{target})"']&
H(\text{Arrow})\ar[d, "H(\text{target})"]\\
G(\text{Vertex})\ar[r, "\alpha_{\Set{Vertex}}"']&
H(\text{Vertex})
\end{tikzcd}
\]
These may look complicated, but they say exactly what we want. We want the functions $\alpha_{\Set{Vertex}}$ and $\alpha_{\Set{Arrow}}$ to respect source and targets in $G$ and $H$. The left diagram says ``start with an arrow in $G$. You can either apply $\alpha$ to the arrow and then take its source in $H$, or you can take its source in $G$ and then apply $\alpha$ to that vertex; either way you get the same answer''. The right-hand diagram says the same thing about targets.
\begin{example}
Consider the graphs $G$ and $H$ shown below
\[
G\coloneqq\boxCD{
\begin{tikzcd}[ampersand replacement=\&]
\LMO{1}\ar[r, "a"]\& \LMO{2}\ar[r, "b"]\& \LMO{3}
\end{tikzcd}
}
\hspace{.7in}
H\coloneqq\boxCD{
\begin{tikzcd}[ampersand replacement=\&]
\LMO{4}\ar[r, shift right, "c"']\ar[r, shift left, "d"]\& \LMO{5}\ar[loop right, "e"]
\end{tikzcd}
}
\]
Here they are, written as database instances---i.e.\ set-valued functors---on $\Cat{Gr}$:
\[
\begin{array}{c c c c c}
G\coloneqq&&
\begin{array}{c | c c}
\textbf{Arrow}&\textbf{source}&\textbf{target}\\
a&1&2\\
b&2&3\\
~
\end{array}
&&
\begin{array}{c |}
\textbf{Vertex}\\
1\\
2\\
3\\
\end{array}
\\
H\coloneqq&
\begin{array}{c | c c}
\color{gray}{\textbf{Arrow}}&\color{gray}{\textbf{source}}&\color{gray}{\textbf{target}}\\
c&4&5\\
d&4&5\\
e&5&5
\end{array}
&&
\begin{array}{c |}
\color{gray}{\textbf{Vertex}}\\
4\\
5\\
~
\end{array}
\end{array}
\]
The top row is $G$ and the bottom row is $H$. They are offset so you can more easily complete the following exercise.
\end{example}
\begin{exercise}\label{exc.unique_alpha}\index{graph!homomorphism}
We claim that there is exactly one graph homomorphism $\alpha\colon G\to H$ such that $\alpha_{\Set{Arrow}}(a)=d$.
\begin{enumerate}
\item What is the other value of $\alpha_{\Set{Arrow}}$, and what are the three values of $\alpha_{\Set{Vertex}}$?
\item Draw $\alpha_{\Set{Arrow}}$ as two lines connecting the cells in the ID column of $G(\Set{Arrow})$ to those in the ID column of $H(\Set{Arrow})$. Similarly, draw $\alpha_{\Set{Vertex}}$ as connecting lines.
\item Check the source column and target column and make sure that the matches are natural, i.e.\ that ``alpha-then-source equals source-then-alpha'' and similarly for ``target''.
\qedhere
\end{enumerate}
\end{exercise}
%-------- Section --------%
\section{Adjunctions and data migration}\label{sec.adjunctions_mig}
We have talked about how set-valued functors on a schema can be understood as filling that schema with data. But there are also functors between schemas. When the two sorts of functors are composed, data is migrated. This is the simplest form of data migration; more complex ways to migrate data come from using adjoints. All of this is the subject of this final section.
%---- Subsection ----%
\subsection{Pulling back data along a functor}\label{subsec.pullback_data}
\index{data migration}
To begin, we will migrate data between the graph-indexing schema $\Cat{Gr}$ and the loop schema, which we call $\Cat{DDS}$, shown below
\[
\Cat{Gr}\coloneqq\boxCD{
\begin{tikzcd}[ampersand replacement=\&, column sep=50pt]
\LTO{Arrow}\ar[r, shift left, "\text{source}"]\ar[r, shift right, "\text{target}"']\&\LTO{Vertex}
\end{tikzcd}
\\~\\\footnotesize
\textit{no equations}
}
\hspace{1in}
\Cat{DDS}\coloneqq\boxCD{
\begin{tikzcd}[ampersand replacement=\&]
\LTO{State}\ar[loop below, "\text{next}"]
\end{tikzcd}
\\~\\\footnotesize
\textit{no equations}
}
\]
We begin by writing down a sample instance $I\colon\Cat{DDS}\to\smset$ on this schema:
\begin{equation}\label{eqn.state_table}
\begin{array}{c | c}
\textbf{State}&\textbf{next}\\\hline
1 & 4\\
2 & 4\\
3 & 5\\
4 & 5\\
5 & 5\\
6 & 7\\
7 & 6
\end{array}
\end{equation}
We call the schema $\Cat{DDS}$ to stand for discrete dynamical system. Once we migrate the data in \cref{eqn.state_table} to data on $\Cat{Gr}$, it will be a graph and we will draw it. In the meantime, think of the data in \cref{eqn.state_table} like the states of a deterministic machine: at every point in time it is in a state and in the next instant it transforms to a next state.
We will use a functor connecting schemas in order to move data between them. The reader can create any functor she likes, but we will use a specific functor $F\colon\Cat{Gr}\to\Cat{DDS}$ to migrate data in a way that makes sense to us, the authors. Here we draw $F$, using colors to hopefully aid understanding:
\[
\begin{tikzpicture}[color=blue]
\node (Arrow) {$\LTO{Arrow}$};
\node[below=of Arrow] (Vertex) {$\LTO{Vertex}$};
\draw[->]
($(Arrow.south)+(-3pt,0)$) to node[left, font=\scriptsize] (source) {source}
($(Vertex.north)+(-3pt,0)$);
\draw[->, color=red]
($(Arrow.south)+(3pt,0)$) to node[right, font=\scriptsize] (target) {target}
($(Vertex.north)+(3pt,0)$);
\node[draw, color=black, fit=(Arrow) (Vertex) (source) (target)] (Gr) {};
\node[below=0 of Gr, black] {$\Cat{Gr}$};
%
\node[right=2 of target] (State) {$\LTO{State}$} edge [loop below, red] node[font=\scriptsize] (next) {next} ();
\node[draw, color=black, fit=(State) (next)] (DDS) {};
\node[below=0 of DDS, black] {$\Cat{DDS}$};
%
\draw[functor, black] (Gr.east|-target) to node[above, font=\scriptsize] {$F$} (DDS.west|-State);
\end{tikzpicture}
\]
$F$ sends both objects of $\Cat{Gr}$ to the `State' object of $\Cat{DDS}$; it sends the `source' morphism to the identity morphism on `State', and the `target' morphism to the morphism `next'.
Sample data on $\Cat{DDS}$ was given in \cref{eqn.state_table}; we think of it as a functor $I\colon\Cat{DDS}\to\smset$. So now we have two functors as follows:
\[
\begin{tikzcd}
\Cat{Gr}\ar[r, "F"]&\Cat{DDS}\ar[r, "I"]&\smset.
\end{tikzcd}
\]
Objects in $\Cat{Gr}$ are sent by $F$ to objects in $\Cat{DDS}$, which are sent by $I$ to objects in $\smset$, which are sets. Morphisms in $\Cat{Gr}$ are sent by $F$ to morphisms in $\Cat{DDS}$, which are sent by $I$ to morphisms in $\smset$, which are functions. This defines a composite functor $F\cp I\colon\Cat{Gr}\to\smset$. Both $F$ and $I$ respect identities and composition, so $F\cp I$ does too. Thus we have obtained an instance on $\Cat{Gr}$, i.e.\ we have converted our discrete dynamical system from \cref{eqn.state_table} into a graph! What graph is it?
For an instance on $\Cat{Gr}$, we need to fill an Arrow table and a Vertex table. Both of these are sent by $F$ to State, so let's fill both with the rows of State in \cref{eqn.state_table}. Similarly, since $F$ sends `source' to the identity and sends `target' to `next', we obtain the following tables:
\[
\begin{tabular}{ c | c c}
\textbf{Arrow}&\textbf{source}&\textbf{target}\\\hline
1&1&4\\
2&2&4\\
3&3&5\\
4&4&5\\
5&5&5\\
6&6&7\\
7&7&6
\end{tabular}
\hspace{1in}
\begin{tabular}{ c |}
\textbf{Vertex}\\\hline
1\\
2\\
3\\
4\\
5\\
6\\
7
\end{tabular}
\]
Now that we have a graph, we can draw it.
\[
\boxCD{
\begin{tikzcd}[column sep=20pt, row sep=5, pos=.1, ampersand replacement=\&]
\&\LMO{1}\ar[dr, "1"']\&\&\LMO{2}\ar[dl,"2"]\\
\LMO{3}\ar[dr, "3"']\&\&\LMO{4}\ar[dl, "4"]\&\&\LMO{6}\ar[r, bend left, pos=.5, "6"]\&[20pt]\LMO{7}\ar[l, bend left, pos=.5, "7"]\\
\&\LMO[under]{5}\ar[loop below, looseness=5, pos=.5, "5"]
\end{tikzcd}
}
\]
Each arrow is labeled by its source vertex, as if to say ``I am only what I do.''
\begin{exercise}
Consider the functor $G\colon\Cat{Gr}\to\Cat{DDS}$ given by sending `source' to `next' and sending `target' to the identity on `State'. Migrate the same data, \cref{eqn.state_table}, using $G$. Write down the tables and draw the corresponding graph.
\end{exercise}
We call the above procedure, ``pulling back data along a functor''. We pulled
back $I$ along $F$.\index{pullback along a map}
\begin{definition}\label{def.pullback_along}
Let $\cat{C}$ and $\cat{D}$ be categories and let $F\colon\cat{C}\to\cat{D}$ be a functor. For any set-valued functor $I\colon\cat{D}\to\smset$, we refer to the functor $F\cp I\colon\cat{C}\to\smset$ as the \emph{pullback of $I$ along $F$}.
Given a natural transformation $\alpha\colon I\to J$, there is a natural transformation $\alpha_F\colon F\cp I\to F\cp J$, whose component $(F\cp I)(c)\to (F\cp J)(c)$ for any $c\in\Ob(\cat{C})$ is given by $(\alpha_F)_c\coloneqq\alpha_{Fc}$.
Thus we have used $F$ to define a functor which we denote $\Delta_F\colon\cat{D}\inst\to\cat{C}\inst.$
\end{definition}
%---- Subsection ----%
\subsection{Adjunctions}\label{subsec.adjoints_lims_colims}
In \cref{sec.galois_connections} we discussed Galois connections. These are
adjunctions between posets. Now that we've defined categories and functors, we
can discuss adjunctions in general. The relevance to databases is that the
data migration functor $\Delta$ from \cref{def.pullback_along} always has two
adjoints of its own: a left adjoint which we denote $\Sigma$ and a right adjoint
which we denote $\Pi$.
Recall that an adjunction between posets $P$ and $Q$ is a pair of monotone maps
$f\colon P \to Q$ and $g\colon Q \to P$ that are \emph{almost} inverses: we have
\begin{equation}\label{eqn.poset_adjunction_revisit}
f(p) \le q \mbox{ if and only if } p \le g(q).
\end{equation}
Recall from \cref{subsubsec.pos_free_spectrum} that in a poset $P$, a hom-set $P(a,b)$ has one element when $a \le b$, and no elements otherwise. Thus we can thus rephrase \cref{eqn.poset_adjunction_revisit} as an isomorphism of sets $Q(f(p),q) \cong P(p,g(q))$: either both are one-element sets or both are 0-element sets.
This suggests how to define adjunctions in the general case.
\begin{definition}\label{def.adjoints}\index{adjunction}
Let $\cat C$ and $\cat D$ be categories, and $L\colon \cat C \to \cat D$ and $R
\colon \cat D \to \cat C$ be
functors. We say that \emph{$L$ is left adjoint to $R$} (and that \emph{$R$ is
right adjoint to $L$}) if, for any $c\in\cat{C}$ and $d\in\cat{D}$, there is an isomorphism of hom-sets
\[\alpha_{c,d}\colon\cat{C}(c,R(d))\To{\cong}\cat{D}(L(c),d)\]
that is natural in $c$ and $d$.%
%\footnote{This naturality is between functors $\cat{C}\op\times\cat{D}\to\smset$. It says that for any morphisms $f\colon c\to c'$ in $\cat{C}$ and $g\colon d\to
%d'$ in $\cat{D}$, the following diagram commutes:
%\[
%\begin{tikzcd}[sep=large, ampersand replacement=\&]
% \cat{C}(c,Gd) \ar[d, "{\cat C(f,Gg)}"'] \ar[r, "\alpha_{c,d}"] \&
% \cat{D}(Fc,d) \ar[d, "{\cat D(Ff,g)}"] \\
% \cat{C}(c',Gd') \ar[r, "\alpha_{c',d'}"'] \&
% \cat{D}(Fc',Gd')
%\end{tikzcd}
%\]
%}
%Given a morphism $f\colon c\to R(d)$ in $\cat{C}$, its image $g\coloneqq\alpha_{c,d}(f)$ is called its \emph{mate}. Similarly, the mate of $g\colon L(c)\to d$ is $f$.
To denote an adjunction we write $L \dashv R$, or in diagrams,
\[
\begin{tikzcd}[column sep=large]
\cat{C}\ar[r, shift left=2pt, bend left=15, "L"]\ar[r, phantom,
"\scriptstyle\bot"]&\cat{D}\ar[l, shift left=2pt, bend left=15, "R"]
\end{tikzcd}
\]
with the $\bot$-sign resting against the right adjoint.
\end{definition}
\begin{example}
Recall that every poset $\cat{P}$ can be regarded as a category. Galois connections between posets and adjunctions between the corresponding categories are exactly the same thing.
\end{example}
\begin{example}\label{ex.currying}\index{currying}
Let $B\in\Ob(\smset)$ be any set. There is an adjunction called ``currying
$B$'', after the logician Haskell Curry:
\[
\begin{tikzcd}[column sep=large]
\smset\ar[r, shift left=2pt, bend left=15, "-\times B"]\ar[r, phantom,
"\scriptstyle\bot"]&\smset\ar[l, shift left=2pt, bend left=15, "(-)^B"]
\end{tikzcd}
\hspace{1in}
\smset(A\times B,C)\cong\smset(A,C^B)
\]
Abstractly we write it as on the left, but what this means is that for any sets
$A,C$, there is a natural isomorphism as on the right.
To explain this, we need to talk about exponential objects in $\smset$. Suppose
that $B$ and $C$ are sets. Then the set of functions $B\to C$ is also a set;
let's denote it $C^B$. It's written this way because if $C$ has $10$ elements and $B$ has $3$ elements then $C^B$ has $10^3$ elements, and more generally for any two finite sets $|C^B|=|C|^{|B|}$.
The idea of currying is that given sets $A$, $B$, and $C$, there is a one-to-one
correspondence between functions $(A\times B)\to C$ and functions $A\to C^B$.
Intuitively, if I have a function $f$ of two variables $a,b$, I can ``put off''
entering the second variable: if you give me just $a$, I'll return a function
$B\to C$ that's waiting for the $B$ input. This is the curried version of $f$.
\end{example}
\begin{exercise}
In \cref{ex.currying}, we discussed an adjunction between functors $-\times B$
and $(-)^B$. But we only said how these functors worked on objects: for any set
$X$, they return sets $X\times B$ and $X^B$ respectively.
\begin{enumerate}
\item Given a morphism $f\colon X\to Y$, what morphism should $-\times B\colon X\times B\to Y\times B$ return?
\item Given a morphism $f\colon X\to Y$, what morphism should $(-)^B\colon X^B\to Y^B$ return?
\item Consider the function $+\colon\NN\times\NN\to\NN$, which sends
$(a,b)\mapsto a+b$. Currying $+$, we get a certain function
$p\colon\NN\to\NN^\NN$. What is $p(3)$?
\qedhere
\end{enumerate}
\end{exercise}
\begin{example}\label{ex.adjunctions}\index{adjunction!examples}
If you know some abstract algebra or topology, here are some other examples of adjunctions.
\begin{enumerate}
\item Free constructions: given any set you get a free group, free monoid, free ring, free vector space, etc.; each of these is a left adjoint. The corresponding right adjoint takes a group, a monoid, a ring, a vector space etc.\ and forgets the algebraic structure to return the ``underlying set''.
\item Similarly, given a graph you get a free category or a free poset, as we discussed in \cref{subsubsec.pos_free_spectrum}; each is a left adjoint. The corresponding right adjoint is the ``underlying graph'' of a category or of a poset.
\item Discrete things: given any set you get a discrete graph, a discrete metric space, a discrete topological space; each of these is a left adjoint. The corresponding right adjoint is again ``underlying set.''
\item Given a group, you can quotient by its commutator subgroup to get an abelian group; this is a left adjoint. The right adjoint is the inclusion of abelian groups into groups.
\qedhere
\end{enumerate}
\end{example}
% Subsubsection %
\subsection{Left and right pushforward functors, $\Sigma$ and
$\Pi$}\index{$\Sigma$}\index{$\Pi$}
Given $F\colon\cat{C}\to\cat{D}$, the data migration functor $\Delta_F$ turns
$\cat{D}$-instances into $\cat{C}$-instances. This functor has both a left and a
right adjoint:
\[
\begin{tikzcd}[column sep=70pt]
\cat{C}\inst\ar[r, shift left=8pt, bend left=8pt, "\Sigma_F"]\ar[r, shift right=8pt, bend right=8pt, "\Pi_F"']
\ar[r, shift left=9pt, phantom, "\scriptstyle\bot"]
\ar[r, shift right=9pt, phantom, "\scriptstyle\bot"]
&
\cat{D}\inst\ar[l, "\Delta_F" description]
\end{tikzcd}
\]
Using the names $\Sigma$ and $\Pi$ in this context is fairly standard in category theory. In the case of
databases, they have the following helpful mnemonic:
\[
\begin{tabular}{c | p{.8in} p{1.1in} p{1.5in}}
\textbf{Migration Functor}&\textbf{Pronounced}&\textbf{Reminiscent of}&\textbf{Database idea}\\\hline
$\Delta$ & Delta & Duplicate\qquad~ or destroy&Duplicate or destroy tables or columns\\
$\Sigma$ & Sigma & Sum & Union (sum up) data\\\\
$\Pi$ & Pi & Product & Pair\footnote{This is more commonly called ``join''
by database programmers.} and query data
\end{tabular}
\]
Just like we used $\Delta_F$ to pull back any discrete dynamical system along $F\colon\Cat{Gr}\to\Cat{DDS}$ and get a graph,
the migration functors $\Sigma_F$ and $\Pi_F$ can be used to turn any graph into a
discrete dynamical system. Indeed, given instance $J\colon\Cat{Gr}\to\smset$, we
would get instances $\Sigma_F(J)$ and $\Pi_F(J)$ on $\Cat{DDS}$. This, however, is quite
technical, and we leave it to the adventurous reader to compute an example, with
help perhaps from \cite{Spivak:2014a}, which explores the definitions of
$\Sigma$ and $\Pi$ in detail. A less technical shortcut is simply to code up the
computation in the open-source FQL software.
To get the basic idea across without getting mired in technical idea, here we
shall instead discuss a very simple example. Recall the schemas from
\cref{eqn.airline_schemas}. We can set up a functor between them, the one sending black dots to black dots and white dots to white dots:
\[
\begin{tikzpicture}[commutative diagrams/every diagram, inner sep=10pt]
\matrix[matrix of math nodes, name=A, row sep=20pt, commutative diagrams/every cell] {
&
|(AD)|\LTO[\circ]{\$}
\\
|(AE)|\LTO{Economy}
&&
|(AF)|\LTO{First Class}
\\
&
|(AS)|\LTO[\circ]{string}
\\
};
%
\path[commutative diagrams/.cd, every arrow, every label, font=\scriptsize]
(AE) edge["Price"] (AD)
edge["Position"'] (AS)
(AF) edge["Price"'] (AD)
edge["Position"] (AS);
\node[draw, inner ysep=0pt, fit=(AE) (AD) (AF) (AS)] (A box) {};
\node[left=0 of A box] {$A\coloneqq$};
%%
\matrix[matrix of math nodes, name=B, row sep=20pt, commutative diagrams/every cell, matrix anchor=south, right=3 of A.south east] {
|(BD)|\LTO[\circ]{\$}
\\
|(BAS)|\LTO{Airline Seat}
\\
|(BS)|\LTO[\circ]{string}
\\
};
%
\path[commutative diagrams/.cd, every arrow, every label, font=\scriptsize]
(BAS) edge["Price"'] (BD)
edge["Position"] (BS);
\node[draw, inner ysep=0pt, fit=(BAS) (BD) (BS)] (B box) {};
\node[right=0 of B box] {$=:B$};
\draw[functor] (A box.east) to node[above, font=\scriptsize] {$F$} (B box.west);
\end{tikzpicture}
\]
With this functor $F$ in hand, we can transform any $B$-instance into an
$A$-instance using $\Delta_F$. Whereas $\Delta$ was interesting in the case of
turning discrete dynamical systems into graphs in \cref{subsec.pullback_data},
it is not very interesting in this case. Indeed, it will just copy---$\Delta$
for duplicate---the rows in Airline seat into both Economy and First Class.
$\Delta_F$ has two adjoints, $\Sigma_F$ and $\Pi_F$, both of which transform
any $A$-instance $I$ into a $B$-instance. The functor $\Sigma_F$ does the what one would most expect by reading the names on each object: it will put into Airline Seat the union of Economy and First Class:
\[\Sigma_F(I)(\mathrm{Airline\ Seat})=I(\mathrm{Economy})\sqcup I(\mathrm{First\ Class}).\]
The functor $\Pi_F$ puts into Airline Seat the set of those pairs $(e,f)$ where
$e$ is an Economy seat, $f$ is a First Class seat, and $e$ and $f$ have the same
price and position. In this particular example, one imagine that there should be
no such seats in a valid instance $I$, in which case $\Pi_F(I)(\mathrm{Airline\
Seat})$ would be empty. But in other uses of these same schemas, $\Pi_F$ can be a useful
operation. For example, in the schema $A$ replace the label `Economy' by
`Rewards Program', and in $B$ replace `Airline Seat' by `First Class Seats'. Then the operation $\Pi_F$ finds those first class seats that are
also rewards program seats. This operation is a kind of database query; querying
is the operation that databases are built for.\index{database!query}
The moral is that complex data migrations can be specified by constructing
functors $F$ between schemas and using the ``induced'' functors $\Delta_F$, $\Sigma_F$, and $\Pi_F$.
Indeed, in practice essentially all useful migrations can be built in this
way. Hence the language of categories provides a framework for specifying
and reasoning about data migrations.
% Subsubsection %
\subsection{Single set summaries of databases}
To give a stronger idea of the flavor of $\Sigma$ and $\Pi$, we consider another
special case, namely where the target category $\cat{D}$ is equal to $\Cat{1}$; see \cref{exc.Cat_n}.
In this case, there is exactly one functor $\cat{C}\to\Cat{1}$ for any
$\cat{C}$; let's denote it $!\colon\cat{C}\to\Cat{1}$.
\begin{exercise} \label{ex.terminal_cat}
Describe this functor $!\colon \cat{C} \to \Cat{1}$. Where does it send each
object? What about each morphism?
\end{exercise}
We want to consider the data migration functors
$\Sigma_!\colon\cat{C}\inst\to\Cat{1}\inst$ and
$\Pi_!\colon\cat{C}\inst\to\Cat{1}\inst$. In \cref{ex.1_inst}, we saw that an
instance on $\Cat{1}$ is the same thing as a set. So let's identify
$\Cat{1}\inst$ with $\smset$, and hence discuss
\[
\Sigma_!\colon\cat{C}\inst\to\smset
\qquad\text{and}\qquad
\Pi_!\colon\cat{C}\inst\to\smset.
\]
Given any schema $\cat{C}$ and instance $I\colon\cat{C}\to\smset$, we will get
sets $\Sigma_!(I)$ and $\Pi_!(I)$. Thinking of these sets as database instances,
each corresponds to a single one-column table---a controlled
vocabulary---summarizing an entire database instance on the schema $\cat{C}$.
Consider the following schema
\[
\cat{G}\coloneqq\boxCD{
\begin{tikzcd}[ampersand replacement=\&, column sep=60pt]
\LTO{Email}\ar[r, shift left, "\text{sent\_by}"]\ar[r, shift right, "\text{received\_by}"']\&\LTO{Address}
\end{tikzcd}
\\~\\\footnotesize
\textit{no equations}
}
\]
Here's a sample instance $I\colon\cat{G}\to\smset$:
\[
\begin{tabular}{ c | c c}
\textbf{Email}&\textbf{sent\_by}&\textbf{received\_by}\\\hline
Em\_1 & Bob & Grace\\
Em\_2 & Grace & Pat\\
Em\_3 & Bob & Emory\\
Em\_4 & Sue & Doug\\
Em\_5 & Doug & Sue\\
Em\_6 & Bob & Bob
\end{tabular}
\hspace{1in}
\begin{tabular}{ c |}
\textbf{Address}\\\hline
Bob\\
Doug\\
Emory\\
Grace\\
Pat\\
Sue
\end{tabular}
\]
\begin{exercise}
Note that $\cat{G}$ is isomorphic to the schema $\Cat{Gr}$. In
\cref{subsec.instances_cat} we saw that instances on $\Cat{Gr}$ are graphs. Draw
the above instance $I$ as a graph.
\end{exercise}
%Of course, $\cat{G}$ is isomorphic to the schema $\Cat{Gr}$, so we can draw this instance as a graph:
%\[
%\boxCD{
%\begin{tikzcd}[ampersand replacement=\&]
% \LTO{Bob}\ar[loop above, "6"]\ar[r, "1"]\ar[d, "3"']\&\LTO{Grace}\ar[r, "2"]\&\LTO{Pat}\\
% \LTO{Emory}\&\LTO{Sue}\ar[r, shift left, "4"]\&\LTO{Doug}\ar[l, shift left, "5"]
%\end{tikzcd}
%}
%\]
Now we have a unique functor $!\colon\cat{G}\to\Cat{1}$, and we want to say what $\Sigma_!(I)$ and $\Pi_!(I)$ give us as single-set summaries. First, $\Sigma_!(I)$ tells us all the emailing groups---the ``connected components''---in $I$:
\[
\begin{tabular}{ c |}
\textbf{1}\\\hline
Bob-Grace-Pat-Emory\\
Sue-Doug
\end{tabular}
\]
This form of summary, involving identifying entries into common groups, or
quotients, is typical of $\Sigma$-operations.
The functor $\Pi_!(I)$ lists the emails from $I$ which were sent from a person to
her- or him-self.
\[
\begin{tabular}{ c |}
\textbf{1}\\\hline
Em\_6
\end{tabular}
\]
This is again a sort of query, selecting the entries that fit the criterion of
self-to-self emails. Again, this is typical of $\Pi$-operations.
Where do these facts---that $\Pi_!$ and $\Sigma_!$ act the way we said---come
from? Everything follows from the definition of adjoint functors
(\ref{def.adjoints}): indeed we hope this, together with the examples given in \cref{ex.adjunctions}, give the reader some idea of how general and useful adjunctions are, both in mathematics and in database theory.
One more point: while we will not spell out the details, we note that these
operations are also examples of constructions known as small colimits and limits in
$\smset$. We end this chapter by exploring these key category theoretic
constructions. The reader should keep in mind that, in general and not just for
functors to $\Cat{1}$, $\Sigma$-operations are built from colimits in $\smset$,
and $\Pi$-operations are built from limits in $\smset$.
%-------- Section --------%
\section{Bonus: An introduction to limits and colimits}\label{sec.bonus_lims_colims}
What do products of sets, the results of $\Pi_!$-operations on database instances, and meets in a poset all have in
common? The answer, as we shall see, is that they are all examples of limits. Similarly, disjoint unions of sets, the results of $\Sigma_!$-operations on database instances, and joins in a poset are all colimits. Let's begin with limits.
Recall that $\Pi_!$ takes a database instance $I\colon \cat{C} \to \smset$ and turns it
into a set $\Pi_!(I)$. More generally, limits turn a functor $F\colon \cat{C} \to
\cat{D}$ into an object of $\cat{D}$.
%---- Subsection ----%
\subsection{Terminal objects and products} \label{subsec.terminals_products}
Terminal objects and products are each a sort of limit. Let's discuss them in turn.
\paragraph{Terminal objects}
The most basic limit is a terminal object.
\begin{definition}\index{terminal object}
Let $\cat{C}$ be a category. Then an object $Z$ in $\cat{C}$ is a \emph{terminal
object} if, for each object $C$ of $\cat{C}$, there exists a unique morphism
$!\colon C \to Z$.
\end{definition}
Since this unique morphism exists for all objects in $\cat{C}$, we say that
terminal objects have a \emph{universal property}.\index{universal property}
\begin{example}
In $\smset$, any set with exactly one element is a terminal object. Why?
Consider some such set $\{\bullet\}$. Then for any other set $C$ we need to
check that there is exactly one function $!\colon C \to \{\bullet\}$. This
unique function is the one that does the only thing that can be done: it maps
each element $c \in C$ to the element $\bullet \in \{\bullet\}$.
\end{example}
\begin{exercise}
Let $(P,\le)$ be a poset. Show that $z \in P$ is a terminal object if and
only if it is maximal: that is, if and only if for all $c \in P$ we have $c \le
z$.
\end{exercise}
\begin{exercise}
What is the terminal object in the category $\Cat{Cat}$? (Hint: recall
\cref{ex.terminal_cat}.)
\end{exercise}
\begin{exercise}
Not every category has a terminal object. Find one that doesn't.
\end{exercise}
\begin{proposition}
All terminal objects are isomorphic.
\end{proposition}
\begin{proof}
This is a simple, but powerful standard argument. Suppose $Z$ and $Z'$ are both
terminal objects in some category $\cat{C}$. Then there exist unique maps
$a\colon Z \to Z'$ and $b\colon Z \to Z'$. Composing these, we get maps
$a\cp b\colon Z \to Z$ and $b\cp a\colon Z'\to Z'$. But we know there is a map $\id_Z\colon Z \to Z$, and
since $Z$ is terminal there is only one such map. So we must have $a\cp b = \id_Z$.
Similarly, we find that $b \cp a= \id_{Z'}$. Thus $a$ (and its inverse $b$) is an isomorphism.
\end{proof}
\begin{remark}[``The limit'' vs.\ ``a
limit'']\label{rem.the_vs_a}\index{universal property}
Not only are all terminal objects isomorphic, there is a unique isomorphism between any two. We hence say ``terminal objects are unique up to unique isomorphism.'' To a category theorist, this is
very nearly the same thing as saying ``all terminal objects are equal''. Thus we often abuse terminology and talk of `the' terminal object, rather than `a' terminal
object. We will do the same for any sort of limit or colimit, e.g.\ speak of ``the product'' of two sets, rather than ``a product''.
\end{remark}
\paragraph{Products} Products are slightly more complicated to formalize than terminal objects are, but they are familiar in practice.
\begin{definition}\index{product}
Let $\cat{C}$ be a category, and let $X,Y$ be objects in $\cat{C}$. A
\emph{product} of $X$ and $Y$ is an object, denoted $X\times Y$, together with
morphisms $p_X\colon X\times Y \to X$ and $p_Y\colon X \times Y \to Y$ such that
for all objects $C$ together with morphisms $f\colon C \to X$ and $g\colon C \to
Y$, there exists a unique morphism $C \to X \times Y$, denoted $\pair{f,g}$, for which the following diagram commutes:
\[
\begin{tikzcd}[row sep=small]
& C \ar[dl,"f",swap] \ar[dr, "g"] \ar[dd,"\pair{f,g}",dotted]\\
X && Y \\
& X\times Y \ar[ul, "p_X"] \ar[ur,"p_Y",swap]
\end{tikzcd}
\]
\end{definition}
We will try to bring this down to earth in \cref{ex.product_in_set}. Before we do, note that $X\times Y$ is an object equipped with morphisms to $X$ and $Y$. Roughly speaking, it is like ``the best object-equipped-with-morphisms-to-$X$-and-$Y$'' in all of $\cat{C}$, in the sense that any other object-equipped-with-morphisms-to-$X$-and-$Y$ maps to it uniquely. This is called a \emph{universal property}. It's customary to use a dotted line
to indicate the unique morphism that exists because of some universal property.
\begin{example}\label{ex.product_in_set}\index{product!of sets}
In $\smset$, a product of two sets $X$ and $Y$ is their usual cartesian product
\[
X \times Y\coloneqq \{(x,y) \mid x \in X, y \in Y\},
\]
which comes with two projections $p_X\colon X\times Y\to X$ and $p_Y\colon X\times Y\to Y$, given by $p_X(x,y)=x$ and $p_Y(x,y)=y$.
Given any set $C$ with functions $f\colon C \to X$ and $g\colon C \to Y$, the
unique map from $C$ to $X\times Y$ such that the required diagram commutes is
given by $\langle f,g\rangle(c)\coloneqq (f(c),g(c))$.
Here is a picture of the product of sets $\ul{4}$ and $\ul{6}$.
\[
\begin{tikzpicture}[y=.9cm, rounded corners, shorten <=2pt, shorten >=2pt, baseline=(allf)]
\foreach \i in {1,...,6}{
\foreach \j in {1,...,4} {
\node (ij\i\j) at (\i,\j) {$\LMO{(\i,\j)}$};
\ifthenelse{\i=1}{\node (j\j) at (-1,\j) {$\LMO{\j}$};}{};
}
\node (i\i) at (\i,-1) {$\LMO{\i}$};
}
\node[draw, inner ysep=0pt, fit=(i1) (i6)] (i) {};
\node[draw, inner ysep=0pt, fit=(j1) (j4)] (j) {};
\node[draw, inner ysep=0pt, fit=(ij11) (ij64)] (ij) {};
\draw[->, thick] (ij) -- (i);
\draw[->, thick] (ij) -- (j);
%
\begin{scope}[gray]
\node (X) at (8,6) {$C$};
\draw[->, bend left=10pt] (X) to node[right=5pt] (allf) {$\forall f$} (i.north east);
\draw[->, bend right=10pt] (X) to node[above=5pt] {$\forall g$} (j.north east);
\draw[->] (X) to node[fill=white] {$\exists!$}(ij.north east);
\end{scope}
\end{tikzpicture}
\qedhere
\]
\end{example}
\begin{exercise}\index{meet}
Let $(P,\le)$ be a poset, and suppose that $x,y \in P$. Show that their product and their meet are equivalent, $(x\times y)\cong (x\wedge y)$.
\end{exercise}
\begin{example}\index{product!of categories}
Given two categories $\cat{C}$ and $\cat{D}$, their product $\cat{C} \times
\cat{D}$ may be given as follows. The objects of this category are pairs
$(c,d)$, where $c$ is an object of $\cat{C}$ and $d$ is an object of $\cat{D}$.
Similarly, morphisms $(c,d) \to (c',d')$ are pairs $(f,g)$ where $f\colon c \to
c'$ is a morphism in $\cat{C}$ and $g\colon d \to d'$ is a morphism in
$\cat{D}$. Composition of morphisms is simply given by composing each entry in
the pair separately, so $(f,g)\cp (f',g')=(f\cp f',g\cp g')$.
\end{example}
\begin{exercise}
\begin{enumerate}
\item What are the identity morphisms in a product category $\cat{C}\times\cat{D}$?
\item Why is composition in a product category associative?
\item What is the product category $\Cat{1} \times \Cat{2}$?
\qedhere
\item What is the product category $P \times Q$ when $P$ and $Q$ are posets?
\end{enumerate}
\end{exercise}
These two constructions, terminal objects and products, are subsumed by the
notion of limit.
\subsection{Limits}
We'll get a little abstract. Consider the definition of product. This says that
given any pair of maps $X \xleftarrow{f} C \xrightarrow{g} Y$, there exists a
unique map $C \to X\times Y$ such that certain diagrams commute. This has the
flavor of being terminal---there is a unique map to $X\times Y$---but it's a bit more
complicated. How are the two ideas related?
It turns out that products \emph{are} terminal objects, but of a different
category, which we'll call $\Cat{Cone}(X,Y)$, \emph{the category of cones
over $X$ and $Y$ in $\cat{C}$}. It will turn out---see \cref{exc.prod_as_term_cone}---that $X \xleftarrow{p_X} X\times Y\xrightarrow{p_Y} Y$ is a
terminal object in $\Cat{Cone}(X,Y)$.
An object of $\Cat{Cone}(X,Y)$ is simply a pair of maps $X \xleftarrow{f} C
\xrightarrow{g} Y$. A morphism from $X \xleftarrow{f} C
\xrightarrow{g} Y$ to $X \xleftarrow{f'} C' \xrightarrow{g'} Y$ in $\Cat{Cone}(X,Y)$
is a morphism $a\colon C \to C'$ in $\cat{C}$ such that the following diagram commutes:
\[
\begin{tikzcd}[row sep=small]
& C \ar[dl,"f",swap] \ar[dr, "g"] \ar[dd,"a"]\\
X && Y \\
& C' \ar[ul, "f'"] \ar[ur,"g'",swap]
\end{tikzcd}
\]
\begin{exercise}\label{exc.prod_as_term_cone}
Check that a product $X \xleftarrow{p_X} X\times Y\xrightarrow{p_Y} Y$ is
exactly the same as a terminal object in $\Cat{Cone}(X,Y)$.
\end{exercise}
We're now ready for the abstract definition. Don't worry if the details are
unclear; the main point is that it is possible to unify terminal objects, maximal
elements, meets, products of sets, posets, categories, and many other familiar
friends. In fact, they're all just terminal objects in different categories.
Recall from \cref{def.diagram_commutes} that formally speaking, a diagram in $\cat{C}$ is just a functor $D\colon\cat{J}\to\cat{C}$. Here $\cat{J}$ is called the i
\begin{definition}\label{def.cones_limits}\index{limit}
Let $D\colon \cat{J}\to \cat{C}$ be a diagram. A \emph{cone} $(C,c_\ast)$ over $D$ is
\begin{enumerate}[label=(\roman*)]
\item an object $C \in \cat{C}$, called the \emph{cone point}.
\item for each object $j \in \cat{J}$, a morphism $c_j\colon T \to D(j)$, called the \emph{$j^{\tn{th}}$ structure map}
\end{enumerate}
obeying
\begin{enumerate}[label=(\alph*)]
\item for each $f\colon i \to j$ in $\cat{J}$, we have $c_i=c_j\cp D(f)$.
\end{enumerate}
\index{cone}
A \emph{morphism of cones} $(C,c_\ast) \to (C',c'_\ast)$ is a morphism $a\colon C \to
C'$ in $\cat{C}$ such that for all $j \in \cat{J}$ we have $c_j=a\cp c'_j$.
Cones and their morphisms form a category $\Cat{Cone}(D)$.
The \emph{limit} of $D$ is the terminal object in the category of
$\Cat{Cone}(D)$. Its cone point is often called the \emph{limit object} and its structure maps are often called \emph{projection maps}.
\end{definition}
For visualization purposes, if $D\colon\cat{J}\to\cat{C}$ looks like the diagram to the left, then a cone on it shown in the diagram to the right:
\[
\boxCD{
\begin{tikzcd}[sep=small, ampersand replacement=\&]
{\color{white}C}\\[10pt]
D_1\ar[dr]\&\&D_3\ar[ld]\ar[dr, bend left]\ar[dr, bend right]\\
\&D_2\&\&D_4\ar[r]\&D_5
\end{tikzcd}
}
\hspace{.8in}
\boxCD{
\begin{tikzcd}[sep=small, ampersand replacement=\&]
\&\&C\ar[dll, bend right, "c_1"']\ar[ddl, bend right, "c_2"']\ar[d, "c_3"]\ar[ddr, bend left, "c_4"]\ar[ddrr, bend left, "c_5"]\\[10pt]
D_1\ar[dr]\&\&D_3\ar[ld]\ar[dr, bend left]\ar[dr, bend right]\\
\&D_2\&\&D_4\ar[r]\&D_5
\end{tikzcd}
}
\]
Here, any two parallel paths that start at $C$ are considered the same.
\begin{example}
Terminal objects are limits where the indexing category is empty, $\cat{J}=\varnothing$.
\end{example}
\begin{example}\label{ex.prods_as_lims}\index{product!as limit}
Products are limits where the indexing category consists of two objects $v, w$
and no arrows, $\cat{J}=\fbox{$\LMO{v}\quad\LMO{w}$}$.
\end{example}
%---- Subsection ----%
\subsection{Finite limits in $\smset$}
Recall that this discussion was inspired by wanting to understand
$\Pi$-operations, and in particular $\Pi_!$. We can now see that a database
instance $I \colon \cat{C} \to \smset$ is a diagram in $\smset$. The functor
$\Pi_!$ takes the limit of this diagram. In this subsection we give a formula
describing the result. This captures \emph{all finite limits in $\smset$}.
In database theory, we work with categories $\cat{C}$ that are presented by a finite graph. We won't explain the details, but
it's in fact enough just to work with this graph: as far as limits are concern,
the equations in $\cat{C}$ don't matter. For consistency with the rest of this section, let's denote the database schema by $\cat{J}$ instead of $\cat{C}$.
\begin{theorem} \label{thm.set_limits}\index{limit!formula for finite limits in
$\smset$}
Let $\cat{J}$ be a category presented by the finite graph $(V,A,s,t)$ together with
some equations, and let $D\colon \cat{J} \to \smset$ be a set-valued functor. Write $V=\{v_1,\ldots,v_n\}$. The set
\[\lim_\cat{J} D \coloneqq \big\{(d_1,\ldots,d_n)\mid d_i\in D(v_i)\text{ and for all }a\colon v_i\to v_j\in A, \text{ we have } D(a)(d_i)=d_j\big\},\]
together with the projection maps $p_i\colon(\lim_\cat{J} D)\to D(v_i)$ given by $p_i(d_1,\ldots,d_n)\coloneqq d_i$, is a limit of $D$.
\end{theorem}
\begin{example}
If $J$ is the empty graph \fbox{\color{white}$\LMO{}$}, then $n=0$: there are no vertices. There is
exactly one empty tuple $(\ )$, which vacuously satisfies the properties, so
we've constructed the limit as the singleton set $\{ (\ ) \}$ consisting of
just the empty tuple. The limit of the empty diagram, i.e.\ the terminal object in $\smset$ is the singleton set. See \cref{rem.the_vs_a}.
\end{example}
\begin{exercise}
Show that the limit formula in \cref{thm.set_limits} works for products. See \cref{ex.prods_as_lims}.
\end{exercise}
\begin{exercise}
If $D\colon\Cat{1}\to\smset$ is a functor, what is the limit of $D$? Compute it using \cref{thm.set_limits}, and check your answer against \cref{def.cones_limits}.
\end{exercise}
\paragraph{Pullbacks}
\index{pullback}
In particular, the condition that the limit of $D\colon\cat{J}\to\smset$ selects tuples $(d_1,\dots,d_n)$
such that $D(a)(d_i)=d_j$ for each morphism $a\colon i\to j$ in $\cat{J}$ allows us to use limits to select data that satisfies
certain equations or constraints. This is what allows us to express queries in
terms of limits. Here is an example.\index{database!query}
\begin{example}\label{ex.pullback}
If $J$ is presented by the \emph{cospan} graph
\fbox{$\LMO{x}\Too{f}\LMO{a}\Fromm{g}\LMO{y}$}, then its limit is
known as a \emph{pullback}. Given the diagram $X \xrightarrow{f} A
\xleftarrow{g} Y$, the pullback is the cone shown on the left below:
\[
\begin{tikzcd}
C\ar[d, "t_x"']\ar[r, "c_y"]\ar[dr, description, "c_a"]&Y\ar[d, "g"]\\
X\ar[r, "f"']&A
\end{tikzcd}
\hspace{1in}
\begin{tikzcd}
X\times_AY\ar[d, "t_x"']\ar[r, "t_y"]&Y\ar[d, "g"]\\
X\ar[r, "f"']&A\ar[ul, phantom, very near end, "\lrcorner"]
\end{tikzcd}
\]
The fact that the diagram commutes means that the diagonal arrow $c_a$ is in some sense superfluous, so one generally denotes pullbacks by dropping the diagonal arrow, naming the cone point $X\times_AY$, and adding the $\lrcorner$ symbol, as shown to the right above.
Here is a picture to help us unpack the definition in $\smset$. We take $X
=\ul{4}$, $Y=\ul{6}$, and $A$ to be the set of colors
$\{\textrm{red},\textrm{blue},\textrm{black}\}$.
\[
\begin{tikzpicture}[rounded corners, y=5ex, shorten <=2pt, shorten >=2pt]
\foreach \i in {1,...,6}{
\tikzmath{
if or(or(\i==1, \i==3), \i==4) then {let \coli=red; let \ci=1;} else {
if or(\i==2, \i==6) then {let \coli=blue; let \ci=2;} else {let \coli=black; let \ci=3;};
};
}
\foreach \j in {1,...,4} {
\tikzmath{
if or(\j==2, \j==4) then {let \colj=red; let \cj=1;} else {
if \j==3 then {let \colj=blue; let \cj=2;} else {let \colj=black; let \cj=3;};
};
}
\ifthenelse{\ci=\cj}
{\node[\coli] (ij\i\j) at (\i,\j) {$\LMO{(\i,\j)}$};}
{\node[white] (ij\i\j) at (\i,\j) {$\LMO{(\i,\j)}$};};
\ifthenelse{\i=1}{\node[\colj] (j\j) at (-1,\j) {$\LMO{\j}$};}{};
}
\node[\coli] (i\i) at (\i,-1) {$\LMO{\i}$};
}
\node[draw, inner ysep=0, fit=(i1) (i6)] (i) {};
\node[draw, inner ysep=0, fit=(j1) (j4)] (j) {};
\node[draw, inner ysep=0, fit=(ij11) (ij64)] (ij) {};
\draw[->, thick] (ij) -- (i);
\draw[->, thick] (ij) -- (j);
\end{tikzpicture}
\]
The functions $f\colon \ul{4} \to A$ and $g\colon \ul{6} \to A$ are expressed in
the coloring of the dots: for example, $f(2)=f(4)=\textrm{red}$, while
$g(5)=\textrm{black}$. The pullback selects pairs $(i,j) \in \ul{4} \times
\ul{6}$ such that $f(i)$ and $g(j)$ have the same color.
\end{example}
%---- Subsection ----%
\subsection{A brief note on colimits}
\label{subsec.brief_colimits}\index{colimit}
Just like upper bounds have as a dual concept, namely that of lower bounds, so
limits have a dual concept: colimits. To expose the reader to this concept, we
provide a succinct definition of these using opposite categories and opposite
categories. The point, however, is just exposure; we will return to explore
colimits in detail in \cref{chap.hypergraph_cats}.
\begin{exercise}
Recall from \cref{def.opposite_cat} that every category $\cat{C}$ has an opposite $\cat{C}\op$. Let $F\colon \cat{C} \to \cat{D}$ be a functor. How should we define its opposite, $F\op\colon \cat{C}\op \to \cat{D}\op$? That is, how should $F\op$ do on objects, and how should it act on morphisms?
\end{exercise}
\begin{definition}\label{def.colimit}\index{cocone}
Given a category $\cat{C}$ we say that a \emph{cocone} in $\cat{C}$ is a cone
in $\cat{C}\op$.
Given a diagram $D \colon \cat{J} \to \cat{C}$, we may take the limit of the
functor $D\op \colon \cat{J}\op \to \cat{C}\op$. This is a cone in $\cat{C}\op$,
and so by definition a cocone in $\cat{C}$. The \emph{colimit} of $D$ is this
cocone.
\end{definition}
\cref{def.colimit} is like a compressed file: useful for transmitting quickly, but completely useless for working with. In order to work with it, we will need to unpack it; we do so later in \cref{chap.hypergraph_cats} when we discuss electric circuits.
%---- Section ----%
\section{Summary and further reading}\label{sec.ch2_further_reading}
Congratulations on making it through the longest chapter in the book! We apologize for the length, but this chapter had a lot of work to do. Namely it introduced the ``big three'' of category theory---categories, functors, and natural transformations---as well as discussing adjunctions, limits, and very briefly colimits.
That's really quite a bit of material. For more on all these subjects, one can
consult any standard book on category theory, of which there are many. The bible
(old, important, seminal, and requires a priest to explain it) is
\cite{MacLane:1998a}; another thorough introduction is \cite{Borceux:1994a}; a logical perspective is given in \cite{Awodey:2010a}; a computer science perspective is given in \cite{Barr.Wells:1990a}; math students should probably read \cite{Leinster:2014a} or \cite{Riehl:2017a}; a general audience might start with \cite{Spivak:2014a}.
We presented categories from a database perspective, because data is pretty ubiquitous in our world. A database schema---i.e.\ a system of interlocking tables---can be captured by a category $\cat{C}$, and filling it with data corresponds to a functor $\cat{C}\to\smset$. Here $\smset$ is the category of sets, perhaps the most important category to mathematicians.
The perspective of using category theory to model databases has been rediscovered several times. It seems to have first been discussed by various authors around the mid-90's \cite{Islam.Phoa:1994a,Cadish.Diskin:1995a,Piessens.Steegmans:1995a,Tuijn.Gyssens:1996a}. Bob Rosebrugh and collaborators took it much further in a series of papers including \cite{Fleming.Gunther.Rosebrugh:2003a,Johnson.Rosebrugh:2002a,Rosebrugh.Wood:1992a}. Most of these authors tend to focus on sketches, which are more expressive categories. Spivak rediscovered the idea again quite a bit later, but focused on categories rather than sketches, so as to have all three data migration functors $\Delta,\Sigma,\Pi$; see \cite{Spivak:2013a,Spivak.Wisnesky:2015a}. The version of this story presented in the chapter, including the white and black nodes in schemas, is part of a larger theory of algebraic databases, where a programming language such as Java is attached to a database. This is worked out in \cite{Schultz.Spivak.Vasilakopoulou.Wisnesky:2017a}, and its use in database integration projects can be found in \cite{Schultz.Wisnesky:2015a,Wisnesky.Spivak.Schultz.Subrahmanian:2015a,}.
Before we leave this chapter, we want to emphasize two things: coherence conditions and universal constructions.
\paragraph{Coherence conditions}\index{coherence}
In the definitions of category, functor, and natural transformations, we always had one or more structures that satisfied one or more conditions: for categories it was about associativity and unitality of composition, for functors it was about respecting composition and identities, and for natural transformations it was the naturality condition. These conditions are often called \emph{coherence conditions}: we want the various structures to cohere, to work well together, rather than to flop around unattached.
Understanding why these particular structures and coherence conditions are ``the right ones'' is more science than mathematics: we empirically observe that certain combinations result in ideas that are both widely applicable and also strongly compositional. That is, we become satisfied with coherence conditions when they result in beautiful mathematics down the road.
\paragraph{Universal constructions}\index{universal property}
Universal constructions are one of the most important themes of category theory. Roughly speaking, one gives some specified shape in a category and says ``find me the best solution!'' And category theory comes back and says ``do you want me to approximate from the left or the right (colimit or limit)?'' You respond, and either there is a best solution or there is not. If there is, it's called the (co)limit; if there's not we say ``the (co)limit does not exist''.
Even data migration fits this form. We say ``find me the closest thing in $\cat{D}$ that matches my $\cat{C}$-instance using my functor $F\colon\cat{C}\to\cat{D}$. In fact this approach---known as Kan extensions---in fact subsumes the others. One of the two founders of category theory, Saunders Mac Lane, has a section in his book \cite{MacLane:1998a} called ``All concepts are Kan extensions'', a big statement, no?
\index{Kan extension}
\end{document}