Theory of Computing
Matthew Barnes, Mathias Ritter
Books
Automata theory: Regular languages
Deterministic Finite Automata
Alphabets and strings
Languages
Deterministic finite automata (DFAs)
Product construction
Regular languages
Nondeterminism, subset construction and ϵ-moves
Nondeterminism
Subset construction
ϵ-moves
Regular expressions and Kleene’s theorem
Defining regular expressions
Atomic patterns
Compound patterns
Examples
Kleene’s theorem
Reg exp to εNFA
εNFA to reg exp
Limitations of regular languages
Pumping (🎃) Lemma for regular languages - contrapositive form
Automata theory: Context free languages
Pushdown Automata (PDA)
Transition relation
Configuration
Acceptance
By final state
By empty stack
Example PDA accepting palindromes
Context free grammars
Chomsky Normal Form
Greibach Normal Form
Removing epsilon and unit productions
Removing left-recursive productions
Converting between CFG, CNF and GNF
Converting a CFG to CNF
Converting a CFG to GNF
PDA and CFG conversions
CFG to PDA
PDA to CFG
Proving closure
Union
Concatenation
Kleene star
Intersection with regular languages
Limits of context-free languages
Pumping (🎃) Lemma for context-free languages - contrapositive form
Computability theory
Turing machines
Decidability
Recursive and R.E. languages
Proof: Recursive Sets Closed under Complement
(Semi) Decidable properties
Universal Turing machines
Multiple tapes
Simulating Turing machines
Encoding Turing machines
Constructing a UTM
Halting problem
Proof by contradiction
Proof by diagonalisation
Decidable / Undecidable problems
Reductions
Properties of reductions
Rice’s theorem
Complexity theory
Big-O and Big-ϴ
Big-ϴ
Big-O
The class P
Feasible and infeasible problems
Decision problems
Definition
The PATH problem
The HAMPATH problem
Regular and context-free languages
The class NP
Non-deterministic Turing machine
Time complexity of Turing machines
Definition
The SAT problem
The HAMPATH problem
The TSP(D) problem
The 3COL problem
NP = P?
Presburger arithmetic
NP-Completeness
Polynomial time reduction
3COL to SAT
NP-hard
The Cook-Levin theorem
What is NP-completeness
Space complexity
Space complexity
PSPACE
NPSPACE
Savitch’s Theorem
EXPTIME
Relationships between complexity classes
PSPACE-hard
PSPACE-completeness
TL;DR
Automata theory: Regular languages
Automata theory: Context free languages
Computability theory
Complexity theory
Books
- There are two main books for this module. I’ve added PDF links to both of them.
Automata theory: Regular languages
- Automata theory relates to machines with finite memory.
- Unlike maths, where we can have infinitely many of something, an automaton cannot store an unbounded amount of information, because it would run out of memory.
- Automata theory also relates to pattern matching, modelling, verification of hardware, etc.
Deterministic Finite Automata
Alphabets and strings
- An alphabet is a finite set of elements. It is denoted with capital sigma, Σ.
- Examples:
  - Σ = {0, 1}
  - Σ = {a, b, c}
  - Σ ≠ {1, 2, 3, 4, ...} (not an alphabet, because it is infinite)
- A string ‘s’ over Σ is a finite sequence of elements of Σ.
- Two strings ‘s’ and ‘t’ are equal when they have the same elements in the same order.
- Examples:
  - Σ = {0, 1}, s = “0100110”
  - Σ = {a, b}, s = “abbaba”
- The set of all strings over Σ is denoted with sigma star, Σ*.
  - It is always infinite, unless the alphabet is empty.
  - For every alphabet, including the empty alphabet, Σ* contains the empty string ϵ.
  - Other names for Σ* are the “free monoid on Σ” and the “Kleene star of Σ”.
- Examples:
  - Σ = {}, Σ* = { ϵ }
  - Σ = {a}, Σ* = { ϵ, a, aa, aaa, aaaa, ... }
  - Σ = {0, 1}, Σ* = { ϵ, 0, 1, 10, 11, 100, 001, 000, ... }
- We can clean up our notation by not naming Σ separately, but just replacing it with its definition:
  - {}* = { ϵ }
  - {a}* = { ϵ, a, aa, aaa, aaaa, ... }
  - {0, 1}* = { ϵ, 0, 1, 10, 11, 100, 001, 000, ... }
- The length #(s) of a string s is its number of elements.
- Examples:
  - #(abba) = 4
  - #(00101) = 5
  - #(joestar) = 7
- The empty string is usually denoted with a lunate epsilon character, ϵ.
- Laws:
  - You can concatenate two strings, ‘s’ and ‘t’, like this: ‘st’.
  - Examples:
    - s = “ab”, t = “ba”, st = “abba”
    - s = 011, t = 101, st = 011101
    - s = “za”, t = “warudo”, st = “zawarudo”
    - s = ϵ, t = ϵ, st = ϵ
  - ϵs = sϵ = s
Languages
- A language L over an alphabet Σ is some subset of the strings in Σ*.
- Examples:
  - Σ = {a, b, c}
  - L1 = {ϵ, a, bb, cba}
  - L2 = {bbb, aaa}
  - L3 = {cccccacccccc}
  - L4 = Ø = {}
- Two languages are equal when they contain the same strings (in terms of sets, they are equal).
- If a language should also contain the empty string ϵ, it must be explicitly added to the language as an element of the set.
Deterministic finite automata (DFAs)
- A finite automaton is a mathematical model of a system with a finite number of states and a finite amount of memory.
- A state is a description of a system at some point in time.
- A transition allows the DFA to go from one state to another.
- Think of DFAs like flow-charts; you go from one state to the next until you reach a goal state.
- There are three kinds of finite automata:
  - Deterministic
  - Nondeterministic
  - Nondeterministic with ϵ-moves
- Right now, we’ll focus on deterministic finite automata.
- A DFA (deterministic finite automaton) has 5 properties:

| Property | Symbol | Description |
|---|---|---|
| States | Q | The possible states that the automaton can be in. |
| Alphabet | Σ | The alphabet that the transition names depend on. |
| Transitions | δ | A function that allows the automaton to go from one state to another by consuming a symbol. It takes an element of Σ and a state from Q, and outputs the next state from Q to transition to. |
| Start state | s | The state that the automaton starts at. |
| Final states (accept states) | F | The set of states at which, if the automaton ends on one of them, it accepts the string that was entered into it. |

- To define a DFA mathematically, put the properties above into a tuple. For example, given that M is a DFA: M = (Q, Σ, δ, s, F)
- An automaton takes in a string that contains letters from the alphabet and reads it from left to right, with each letter being an action (or transition) that changes the state of the automaton.
- All states must have all possible transitions, i.e. each state must have exactly one outgoing transition for every element of the alphabet.
- If the automaton ends on a final state, it accepts the string.
- If the automaton does not end on a final state, it rejects the string.
- You can also represent DFAs as graphs (pictures), similar to state-machine diagrams:
  - The small arrow pointing to the ‘0’ state coming from nowhere means that ‘0’ is the initial state.
  - The states that are not filled in are not final states.
  - The states that are filled in are final states.
  - The arrows represent the transitions between states, which are triggered by the input character shown.
- This DFA accepts strings like abbab and babbbab, but not abba or ϵ.
- To say that a string ‘s’ takes the DFA from state 0 to state 1, you would write: 0 —s→ 1
  - Where:
    - Left-hand side = beginning state
    - Right-hand side = ending state
    - Text on top of the arrow = the string consumed along the way
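
To make the “read left to right, one transition per letter” idea concrete, here is a minimal DFA simulator sketch in Python. The concrete machine is an assumption for illustration: a two-state DFA accepting exactly the strings that end in ‘b’, which agrees with the accepted/rejected examples above, but it is not the pictured automaton itself.

```python
# A minimal DFA simulator. The concrete DFA here is an assumption for
# illustration: two states, accepting exactly the strings ending in 'b'.

DFA = {
    "alphabet": {"a", "b"},
    "delta": {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1},
    "start": 0,
    "finals": {1},
}

def accepts(dfa, string):
    """Run the DFA on `string`; accept iff we end on a final state."""
    state = dfa["start"]
    for symbol in string:
        state = dfa["delta"][(state, symbol)]  # exactly one successor
    return state in dfa["finals"]

assert accepts(DFA, "abbab") and accepts(DFA, "babbbab")
assert not accepts(DFA, "abba") and not accepts(DFA, "")
```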
Product construction
- Product construction is where you merge two DFAs together, so that the combined DFA is the two component DFAs running in parallel.
- The states aren’t just single string / character names. The states are referenced as a tuple (or pair), with each element being a state name from one of the component DFAs.
- Let’s define the DFAs of L1 and L2 as M1 = (Q1, Σ, δ1, s1, F1) and M2 = (Q2, Σ, δ2, s2, F2) respectively (say Q1 = {0, 1, 2, 3} and Q2 = {0, 1, 2}, as in the pictures).
- Now, if we take the product of the DFA of L1 and the DFA of L2, the states of that product would be:
  - (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2), (3, 0), (3, 1), (3, 2)
- The product transition needs a pair state and a transition name. For example, you could go from (2,0) to (3,1) by using ‘a’, or from (0,0) to (1,0) by using ‘b’.
- The set of final states F is F1 ⨯ F2, where F1 is the set of final states of L1’s DFA and F2 is the set of final states of L2’s DFA. Therefore, with normal, unedited product construction, for the product DFA to accept the string, we need to be in a final state in both component DFAs.
- This can be changed, however, to “at least one DFA must be on a final state” if we define F to be (Q1 ⨯ F2) ∪ (F1 ⨯ Q2), where Qx is the set of states of DFA x.
- Therefore, the language expressed by the product of M1 and M2 (with F = F1 ⨯ F2) is L(M1) ∩ L(M2).
- It’s best to view this as a table, where you can see the DFAs of L1 and L2 simultaneously:

| L1’s DFA state | L2’s DFA state | Product state | Transition | Step |
|---|---|---|---|---|
| 0 | 0 | (0,0) | None (yet) | 0 |
| 1 | 1 | (1,1) | a | 1 |
| 2 | 1 | (2,1) | b | 2 |
| 3 | 2 | (3,2) | a | 3 |

- To view this more simply, you can just follow two DFAs at the same time, and put one’s state on the left of the tuple and the other’s state on the right. It’s nothing more than running two DFAs at the same time, inputting the same word into both of them.
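
Here’s a runnable sketch of product construction. The two component machines are made-up stand-ins (a hypothetical M1 accepting an even number of a’s, M2 accepting strings ending in b), since the original example DFAs were pictures. The function just pairs up states and steps both transition functions in lock-step; the two `finals` sets show the F1 ⨯ F2 (intersection) choice and the (Q1 ⨯ F2) ∪ (F1 ⨯ Q2) (union) variant described above.

```python
# Sketch of product construction: the product DFA runs two DFAs in parallel.
# Hypothetical example machines: M1 accepts an even number of a's,
# M2 accepts strings ending in b (both over {a, b}).

ALPHABET = {"a", "b"}
delta1 = {("e", "a"): "o", ("e", "b"): "e", ("o", "a"): "e", ("o", "b"): "o"}
delta2 = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
F1, F2 = {"e"}, {1}

def product_dfa(delta1, s1, delta2, s2, alphabet):
    """Build the reachable part of the product automaton."""
    start = (s1, s2)
    delta, states, todo = {}, {start}, [start]
    while todo:
        q1, q2 = todo.pop()
        for a in alphabet:
            nxt = (delta1[(q1, a)], delta2[(q2, a)])
            delta[((q1, q2), a)] = nxt      # step both DFAs in lock-step
            if nxt not in states:
                states.add(nxt)
                todo.append(nxt)
    return delta, start, states

delta, start, states = product_dfa(delta1, "e", delta2, 0, ALPHABET)
finals_and = {(q1, q2) for (q1, q2) in states if q1 in F1 and q2 in F2}  # F1 x F2
finals_or = {(q1, q2) for (q1, q2) in states if q1 in F1 or q2 in F2}    # union variant

def accepts(delta, start, finals, s):
    q = start
    for c in s:
        q = delta[(q, c)]
    return q in finals

assert accepts(delta, start, finals_and, "aab")     # even a's AND ends in b
assert not accepts(delta, start, finals_and, "ab")  # odd number of a's
assert accepts(delta, start, finals_or, "ab")       # ends in b, so union accepts
```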
Regular languages
- When you have a DFA called M, then L(M) is the language of all the strings it accepts.
- For example, in the previous DFA, L(M) = {b, ab, aab, bb, abb, ...} and it will not include strings like ‘a’, ‘ϵ’, ‘ba’ or ‘bba’, because they will not be accepted by that DFA.
- A language L is regular when there exists some DFA M such that L(M) = L.
- More informally, a language L is regular if you can come up with some DFA that accepts all the words in L, and only those words.
- For example, the language { ab, ba, aa, bb } is regular, because I can construct a DFA that accepts only those words:
- All finite languages are regular, because we could (in theory) just build a decision tree of finite size (you can visualise it as a bunch of nested if-statements that examine one symbol after the other).
- The empty language Ø is regular, because you can create a DFA that can’t accept anything (for example, a DFA with no final states), not even the empty string. If a DFA can’t accept any strings, then the L(M) of that DFA is the empty set Ø, making Ø regular.
- The set of all strings Σ* is regular, because you can set up a DFA consisting of a single final state that transitions to itself on every symbol. This accepts every string over the alphabet Σ, like this one when Σ = {a, b, c} and Σ* = {ϵ, a, b, c, ab, bc, abc, cba, ...}:
- If L is regular, then the complement (Σ* - L, or ~L) is also regular.
  - For example, let’s take the language L = {ab, ba, aa, bb} from before, because we know that’s regular (therefore there must be a DFA whose L(M) is L). The alphabet it’s over is Σ = {a, b}. The complement of L, ~L, is the set of all strings from Σ* except for the ones in L.
  - If we take the DFA whose L(M) is L from before and flip all the final states to non-final states and vice versa, we create a new DFA whose L(M) is ~L.
  - This works on any regular language. We’re simply taking the “inverse” of the DFA.
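
As a quick sanity check, here’s the flip-the-finals trick in code, reusing the “ends in b” toy DFA as a stand-in (an assumption; the trick works on any total DFA):

```python
# Complement sketch: flip final and non-final states of a (total) DFA.

delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
states, finals = {0, 1}, {1}

complement_finals = states - finals  # every non-final state becomes final

def accepts(start, finals, s):
    q = start
    for c in s:
        q = delta[(q, c)]
    return q in finals

assert accepts(0, finals, "ab") and not accepts(0, complement_finals, "ab")
assert not accepts(0, finals, "ba") and accepts(0, complement_finals, "ba")
```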
- If L1 and L2 are regular, then the union, L1 ⋃ L2, is regular.
  - If we want to prove that L1 ⋃ L2 is regular, we need to show a DFA exists for L1 ⋃ L2.
  - We can use product construction on the DFAs for L1 and L2, then modify it a bit to suit what we need.
  - First of all, let’s define the product of the DFAs for L1 and L2 to be M.
  - Now, we change the set of final states F of M to mean “at least one component DFA must be on a final state to accept the string”, so we define F to be (Q1 ⨯ F2) ∪ (F1 ⨯ Q2), where Qx is the set of states of DFA x.
  - By doing that, we have now defined a DFA M where L(M) is L1 ⋃ L2.
- If L1 and L2 are regular, then the intersection, L1 ∩ L2, is regular.
  - This one’s even easier than the union one.
  - Basically, we use product construction on the DFAs for L1 and L2.
  - That’s it. We don’t even need to change the set of final states. This is because, by default, the product of two DFAs requires both DFAs to be on a final state for the aggregate DFA to accept the string, which is exactly the intersection.
- If L1 and L2 are regular, then L1L2 = { xy | x ∈ L1, y ∈ L2 } is regular.
  - This proof requires NFAs (nondeterministic finite automata), which are covered in more detail later. Basically, an NFA is the same as a DFA except a state can have several transitions for the same symbol (or none), and you can take multiple paths at the same time. Think of it like two threads running in parallel, or two trains travelling on different tracks in parallel. It’s similar to a fork node in a UML activity diagram and how it models concurrency in a system.
  - First, we define two DFAs for L1 and L2:
    - M1 = (Q1, Σ, δ1, s1, F1)
    - M2 = (Q2, Σ, δ2, s2, F2)
  - Now, we’re going to create an NFA from these two DFAs:
    - M3 = (Q1 ∪ Q2, Σ, δ1 ∪ δ2, s1, F2)
    - and for each state q ∈ F1, we add δ(q, ϵ) = s2 (what this means is: for each final state of the first DFA, make a transition that goes from that final state to the starting state of the second DFA. The transition has the name ‘ϵ’, which means that we can take this transition without using up any characters in the string; it’s basically a free move, but more on that in a later topic. See ‘ϵ-moves’).
  - What this NFA really means is that we start off in terms of M1, but we need to finish off in terms of M2.
  - For example, take the two example DFAs from before:
  - If we define a new NFA, like we did above, for these two example DFAs, it’ll look something like the illustration below. I’ve created two illustrations of the same NFA: one without annotations (“the proper one”) and one with annotations (“the illustrative one”).
  - First of all, the top half of this NFA is basically M1. However, when we reach M1’s final state, there’s an epsilon move taking the NFA to its bottom half. If we don’t reach that epsilon move, we can’t move onto the bottom half of the NFA. When we reach that epsilon move, the string (up to this point) must be one of M1’s accepted strings.
  - Second, the bottom half of the NFA is M2. Once we’ve reached the bottom half, we’ve found an M1 string; now we just need an M2 string. Once we’ve got an M2 string, we’ll be on a final state and the string will be accepted.
  - By reaching the final state, we know that the input string is made up of an M1-accepted string as the first half, and an M2-accepted string as the second half. This is literally the definition of L1L2, so we can safely say that the L(M) of this NFA is L1L2.
  - This NFA accepts exactly the concatenations of strings from L1 and L2. Therefore, L1L2 is regular.
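
Here’s a tiny runnable sketch of this glue construction. The component machines are made-up stand-ins (M1 accepts exactly “ab”, M2 accepts exactly “ba”), and EPS is just the marker this sketch uses for ϵ; the structural point is the single F1 —ϵ→ s2 edge.

```python
# Concatenation sketch: take machines M1 and M2 (disjoint state names) and
# add an epsilon-move from every final state of M1 to the start of M2.

EPS = ""
delta = {
    ("p0", "a"): {"p1"}, ("p1", "b"): {"p2"},   # M1: accepts "ab", p2 final
    ("q0", "b"): {"q1"}, ("q1", "a"): {"q2"},   # M2: accepts "ba", q2 final
    ("p2", EPS): {"q0"},                        # glue: F1 --eps--> s2
}

def eps_closure(states):
    """All states reachable via epsilon-moves alone."""
    seen, todo = set(states), list(states)
    while todo:
        q = todo.pop()
        for r in delta.get((q, EPS), set()):
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return seen

def accepts(string, start="p0", finals={"q2"}):
    current = eps_closure({start})
    for c in string:
        nxt = set()
        for q in current:
            nxt |= delta.get((q, c), set())
        current = eps_closure(nxt)
    return bool(current & finals)

assert accepts("abba")                    # "ab" from L1 followed by "ba" from L2
assert not accepts("ab") and not accepts("ba")
```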
- If L is regular then L* = { x1 ... xk | k ∈ ℕ, xi ∈ L } is regular.
  - L* is basically the set of every possible word you can make by concatenating the words of L with each other, any number of times.
  - So if L had “add” and “dad”, L* would have words like “adddad”, “dadadddad”, “daddaddad” etc., and it would go on infinitely.
  - The operator * is called the “Kleene star”, and we’ve been using it already (when we write Σ* to refer to the set of all words over that alphabet).
  - Yet again, we’ll need NFAs to prove this.
  - Let’s use my “add dad” example above, and construct a DFA, M, where L(M) is L:
  - The red nodes are “error nodes”, where the DFA goes when we know the string will be rejected.
  - This accepts “add” and “dad”, but we need it to accept words like “adddad” as well.
  - Luckily, epsilon-moves allow us to very easily construct an NFA that does this:
  - Now, when we complete the word “add” or “dad”, we can move straight back to the start node and search for “add” or “dad” again. Remember, epsilon moves don’t consume any characters, so we can go straight from the end of “add” to the start of “dad”, for example.
  - The reason why we create a new final start state that immediately ϵ-moves to the old start state is that the Kleene star must accept the empty string (in the picture, it’s state 8).
  - To generalise this proof, let’s take a general DFA M = (Q, Σ, δ, s, F) and build the NFA:
    - M′ = (Q ∪ {s′}, Σ, δ′, s′, F ∪ {s′})
  - We add an epsilon move from the new final start state to the old start state, δ′(s′, ϵ) = s, and we set epsilon moves branching from all the final states back to the new final start state:
    - δ′(q, ϵ) = s′ for all q ∈ F
    - (For all states q in the set of final states F: when the NFA is in state q, it can take a free ϵ-move back to the new final start state s′.)
  - That’s all there is to this proof.
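
And a matching sketch for the star construction, on a toy machine that accepts just “ab” (an assumption for brevity): note the new final start state s′ with an ϵ-move to the old start, and the ϵ-move from the old final state back to s′.

```python
# Star sketch: new final start state s' --eps--> old start; old final
# states --eps--> s'. The base machine accepts just "ab".

EPS = ""
delta = {
    ("s'", EPS): {"0"},                 # new final start state -> old start
    ("0", "a"): {"1"}, ("1", "b"): {"2"},
    ("2", EPS): {"s'"},                 # old final state loops back to s'
}
finals = {"s'", "2"}

def eps_closure(states):
    seen, todo = set(states), list(states)
    while todo:
        q = todo.pop()
        for r in delta.get((q, EPS), set()):
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return seen

def accepts(string):
    current = eps_closure({"s'"})
    for c in string:
        nxt = set()
        for q in current:
            nxt |= delta.get((q, c), set())
        current = eps_closure(nxt)
    return bool(current & finals)

assert accepts("") and accepts("ab") and accepts("abab")
assert not accepts("a") and not accepts("aba")
```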
Nondeterminism, subset construction and ϵ-moves
Nondeterminism
- With a DFA, there is always exactly one next state to go to when you input a letter.
- However, what if there are cases where there isn’t always a next state to go to? What if there are multiple states we could go to?
- That’s what an NFA (nondeterministic finite automaton) is: it’s basically a DFA that can either branch off to multiple states, or fail if there are no states to go to. In other words, it guesses where it’s supposed to go. Paths that guess wrong “die”.
- In other words, a state no longer needs exactly |Σ| outgoing transitions, one for each letter of the automaton’s alphabet.
- The whole gimmick of an NFA is that we can go through different “paths” simultaneously. Some paths can die, and some paths can survive and reach final states. We only need one path to be on a final state to accept the string.
- “It is helpful to think about this process in terms of guessing and verifying. On a given input, imagine the automaton guessing a successful computation or proof that the input is a ‘yes’ instance of the decision problem, then verifying that its guess was indeed correct.” Source: Kozen textbook
- As you can see, it’s slightly different from a DFA:
  - There are two ‘b’ transitions in state ‘0’
  - There are no transitions in state ‘1’
- Let’s walk through this NFA to get a feel for how NFAs work:
  - Step 1: So far, we haven’t gone anywhere. The arrow that seems to come from nowhere indicates where we are initially, so we start at state ‘0’. In the illustrations, the smiley faces indicate the different “paths” that the NFA is going through: a different coloured face is a different path. For example, the green path has just started on 0.
  - Step 2: Let’s try inputting ‘a’ and see where that leads us. There’s only one transition we can go down if we input ‘a’, so no new simultaneous paths are created; we just stick with our original one. The ‘a’ transition just goes from 0 to 0, so our green path goes from ‘0’ to ‘0’.
  - Step 3: Let’s try inputting a ‘b’ and see where that leads us. There are two transitions that ‘b’ can take us down: from 0 to 0, or from 0 to 1. Here, our green path splits into two, one red and one blue. The red one picks the path that stays on 0, and the blue path picks the one that goes to 1. Right now, there exists a path that is on a final state: the blue one. Therefore, if we left the string as just ‘ab’, this NFA would accept it.
  - Step 4: Let’s try inputting another ‘b’ and see where that leads us. The red and blue paths will both need to go through a ‘b’ transition. Since the blue path doesn’t have a ‘b’ transition where it is, it ‘dies’. The red path has two ‘b’ transitions it can go down: the one from 0 to 0 and the one from 0 to 1. Like last time, the red path splits into two different paths: a yellow path and a purple path. The yellow path takes the transition from 0 to 0, so it is still on 0. The purple path takes the transition from 0 to 1, so it is now on 1. The NFA accepts this string (‘abb’) too, since the purple path is on a final state.
- Now, let’s formally construct an NFA N = (Q, Σ, δ, s, F):
  - Q is the set of all states
  - Σ is an alphabet
  - δ: Q × Σ → ℘(Q) is the transition function, which basically means: input a state and a letter from the alphabet, and get the set of all states we could be in next (see ‘Subset construction’)
  - s ∈ Q is the start state
  - F ⊆ Q is the set of final states
- Let’s formally construct the NFA shown above:
  - Q = {0, 1}, Σ = {a, b}, s = 0, F = {1}
  - δ(0, a) = {0}, δ(0, b) = {0, 1}, δ(1, a) = δ(1, b) = Ø
- There is some notation that is worth noting. When you see something like 0 —ab→ 2, this means that the NFA will go from 0 to 2 if you input ‘a’ and then ‘b’. There should be some intermediate state, either ‘0’, ‘2’, or even ‘1’, that you enter when you input ‘a’ and leave when you input ‘b’.
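
The “paths” intuition translates almost literally into code: instead of tracking coloured smiley faces, keep the set of all states the NFA could currently be in. This sketch hard-codes the walkthrough NFA from above.

```python
# The "paths" intuition as code: keep the set of all states the NFA could
# be in. This is the walkthrough NFA: delta(0,a)={0}, delta(0,b)={0,1}.

delta = {(0, "a"): {0}, (0, "b"): {0, 1}}
start, finals = 0, {1}

def accepts(string):
    current = {start}                        # one green path at state 0
    for c in string:
        nxt = set()
        for q in current:
            nxt |= delta.get((q, c), set())  # missing entry = path dies
        current = nxt
    return bool(current & finals)            # one surviving path suffices

assert accepts("ab") and accepts("abb")      # matches steps 3 and 4
assert not accepts("a") and not accepts("")
```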
Subset construction
- It’s easy to convert DFAs into NFAs, because DFAs are a special case of NFAs.
- But how do you convert NFAs to DFAs?
- You use subset construction. Basically, each DFA state is the subset of states that we could be in, in our NFA counterpart.
- For example, in our NFA, if we’re in state 0 and by inputting ‘b’ we can either go to 0 or go to 1, then in the DFA we’ll actually go to the state {0,1}, because we could be in either 0 or 1 in the NFA.
- Let’s define an NFA N = (Q_N, Σ, δ_N, s_N, F_N), then let’s create a DFA M = (Q_M, Σ, δ_M, s_M, F_M) which is based on N:
  - Q_M = ℘(Q_N)
  - δ_M(A, x) = ⋃ of δ_N(q, x) over all q ∈ A
  - s_M = {s_N}
  - F_M = { A ⊆ Q_N | A ∩ F_N ≠ Ø }
- This may look like an alien language to you now, but I’ll go through each line and explain it in plain English, then I’ll apply subset construction to the example above.
- First things first, what’s going on up there?

| Line in weird maths language | Line in plain English |
|---|---|
| N = (Q_N, Σ, δ_N, s_N, F_N) | Here, we have an NFA with the parameters Q_N, Σ, δ_N, s_N and F_N. |
| M = (Q_M, Σ, δ_M, s_M, F_M) | Here, we create a DFA with the parameters Q_M, Σ, δ_M, s_M and F_M. The parameters Q_M, δ_M, s_M and F_M are all built from the parameters of the NFA N. |
| Q_M = ℘(Q_N) | Q_N is the set of all NFA states. Q_M is the powerset of Q_N, so it’s the set of all subsets of Q_N; for example, ℘({0, 1}) = { Ø, {0}, {1}, {0,1} }. Because of this, the states in the DFA are all the subsets of Q_N, so you could have a state {0}, a state {1} or even a state {0, 1}. |
| δ_M(A, x) = ⋃ δ_N(q, x) over q ∈ A | From before, we know that δ_M is a transition function that defines how we go from one state to another. Therefore, the inputs must be a state and a letter. A is our state input. It’s actually a set, which makes sense, because the states in our DFA are subsets of Q_N. x is our letter input; nothing strange here, it’s just a letter from the alphabet Σ. Don’t be too confused by the big ⋃: it simply means that, for each NFA state q in our input A, we apply the NFA transition function δ_N on that state with our input letter x, and put the result in our output. More formally, ⋃ is a bit like the sum Σ in maths, except instead of adding all the terms, it unions them. Remember that the transition function must also output a DFA state; since the states of our DFA are subsets of Q_N, δ_M also outputs a subset of Q_N. For example, say we input δ_M({0, 1}, b): the output will be δ_N(0, b) ∪ δ_N(1, b) = {0, 1} ∪ Ø = {0, 1}. The point is that this transition function just transitions all the states in A and unions the results together into one big output. |
| F_M = { A ⊆ Q_N \| A ∩ F_N ≠ Ø } | The set of final states F_M is a set of subsets of Q_N. For a DFA state to be a final state, it needs to contain an NFA state that is in F_N. For example, if 0 is a non-final state and 1 is a final state in the NFA, then in the DFA, {0} would not be a final state, but {0,1} and {1} would be final states. That’s what the left expression builds: the subsets of Q_N that contain at least one final state from F_N. |

- Now that we’ve got the formalities out of the way, we can work through an example. Let’s convert the NFA above into a DFA.
- As said above, we’ll define our DFA as M and our NFA as N, and go through this step-by-step, explaining how we get each parameter of the DFA M:

| What we’re defining | How we did it |
|---|---|
| Q_M | The value of Q_N is {0, 1}, so we just need the powerset of Q_N, or the ‘set of subsets’. This is: Q_M = { Ø, {0}, {1}, {0,1} } |
| Σ | This is just the same as before, nothing new here: Σ = {a, b} |
| δ_M | First of all, let’s create a transition table. This is a table where the input letters go on the top, the states go on the left, and you can look up transition results by looking up a state and a letter. Now we work through this table from top to bottom. First of all, there’s the empty set state Ø. You can’t go anywhere from the empty set state; it’s practically a death state, so we just put Ø everywhere in its row. Now we move onto the singleton states. For {0} and ‘a’, just have a look at the NFA and see what states you’d end up at if you were at 0 and took the paths labelled ‘a’. As you can see, you’d just end up back at 0, so we put {0} there. As for {0} and ‘b’, you can see that you can go to 0 or 1 using a ‘b’ path on 0, so we put {0,1} there. Continue this until we’re done with all the singleton states (state 1 has no outgoing transitions, so {1} goes to Ø on both letters). Now we’re faced with a state with more than one element! Remember the definition from before: we perform the transition function on all the elements, then union the results together. For example, on {0,1} and ‘b’, we union the results of δ_N(0, b) and δ_N(1, b), i.e. {0,1} and Ø. When we do that, we get {0,1}, so in this transition table, when you input ‘b’ on the state {0,1}, you get {0,1}. Repeat this for all the states and inputs until you finish the table: δ_M(Ø, a) = Ø, δ_M(Ø, b) = Ø, δ_M({0}, a) = {0}, δ_M({0}, b) = {0,1}, δ_M({1}, a) = Ø, δ_M({1}, b) = Ø, δ_M({0,1}, a) = {0}, δ_M({0,1}, b) = {0,1}. Once you’re done, that’s pretty much it! This transition table is the transition function of your DFA. |
| s_M | It’s pretty much the same as the original, but you encapsulate it in a set: s_M = {0} |
| F_M | Get all the elements of Q_M: { Ø, {0}, {1}, {0,1} }. Now get rid of the ones that do not contain final states: Ø and {0} go. Therefore, your new F_M is: { {1}, {0,1} } |
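
Here’s the same idea as a runnable sketch. One difference from the hand-worked table above: this version only builds the subsets that are actually reachable from {0}, so the dead states Ø and {1} from the full powerset never show up here, because no transition leads to them.

```python
# Subset construction sketch for the same NFA: each DFA state is a
# frozenset of NFA states, built by exploring from the start subset.

delta_n = {(0, "a"): {0}, (0, "b"): {0, 1}}
alphabet = {"a", "b"}
start_n, finals_n = 0, {1}

def subset_construction():
    start = frozenset({start_n})
    delta, states, todo = {}, {start}, [start]
    while todo:
        A = todo.pop()
        for a in alphabet:
            # union of delta_N(q, a) over all q in A
            B = frozenset(r for q in A for r in delta_n.get((q, a), set()))
            delta[(A, a)] = B
            if B not in states:
                states.add(B)
                todo.append(B)
    finals = {A for A in states if A & finals_n}
    return delta, start, finals

delta_m, start_m, finals_m = subset_construction()
assert delta_m[(frozenset({0, 1}), "b")] == frozenset({0, 1})
assert frozenset({0, 1}) in finals_m and frozenset({0}) not in finals_m
```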
ϵ-moves
- An epsilon-move (ϵ-move) is a type of transition that you can use without needing to “spend” or “consume” any symbols.
- It’s basically a free move.
- An NFA that uses epsilon-moves is called an ϵNFA.
- They’re formally defined like this: E = (Q, Σ, δ, s, F), where δ: Q × (Σ ∪ {ϵ}) → ℘(Q)
- This is basically the same as an NFA, except transitions now support epsilon-moves.
- Here is an example of an ϵNFA:
- From the starting state, you can go to the left-most state and the right-most final state, all without consuming any symbols.
- You can also convert ϵNFAs to NFAs. Don’t worry! It’s not as long as subset construction.
- You only need to edit the transition function and the set of final states.
- With an ϵNFA E = (Q, Σ, δ_E, s, F_E) we can create an NFA N = (Q, Σ, δ_N, s, F_N) like so:

| Line in weird maths language | Line in plain English |
|---|---|
| E = (Q, Σ, δ_E, s, F_E) | We’re just defining an ϵNFA with these parameters. |
| N = (Q, Σ, δ_N, s, F_N) | We’re just defining an NFA with these parameters. These parameters will be based off of the parameters of E. |
| δ_N(q, x) = { p \| q can reach p using ϵ-moves and exactly one x-transition } | When you input a state and a symbol, the output will be all the possible states the ϵNFA could reach using epsilon-moves and that symbol. |
| F_N = F_E ∪ { q \| q can reach a state in F_E using only ϵ-moves } | The new set of final states is a superset of the old set of final states, meaning we still keep the old final states, but we add more: we add the states that can epsilon-move towards a final state. |

- Now I’ll go over an example, by converting the ϵNFA above into an NFA. Just for example’s sake, I’m calling the initial state 0, the left-most state 1 and the final state 2:

| What we’re defining | How we did it |
|---|---|
| δ_N | First, let’s create that transition table again, with a row for each of the states 0, 1 and 2. If you were at state 0 and had an input ‘a’, what states could you get to? Well, you could go down the left epsilon-move to get to state 1 and spend the ‘a’ on going back to state 1. That’s pretty much it, so here, we put just state 1: δ_N(0, a) = {1}. If you were at state 0 and had an input ‘b’, what states could you get to? You could go down the left epsilon-move and use ‘b’ to go back to state 1, or use your ‘b’ to get to state 2. If you went down the right epsilon-move, you wouldn’t be able to use ‘b’, so you can ignore it. Seems like you can get to states 1 and 2, so we put δ_N(0, b) = {1, 2} in the transition table. We keep going like this until we fill out the transition table: δ_N(1, a) = {1}, δ_N(1, b) = {1, 2}, and δ_N(2, a) = δ_N(2, b) = Ø. This transition table is our new transition function! |
| F_N | For this one, you could look at the final states in the graph, and work your way back up the epsilon-moves, looking for non-final states that can epsilon-move to the final state. As you can see in the graph above, if you start at state 2 (the final state) and work your way back up the epsilon-move, you’ll find yourself at state 0. Therefore, we make state 0 a final state. However, you cannot get from state 1 to state 2 using epsilon-moves alone, so state 1 cannot be a new final state. Obviously, we keep the old final states from the ϵNFA: F_N = {0, 2} |
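
A small sketch of the conversion in code: compute the ϵ-closure (all states reachable by free moves), read the symbol, then take ϵ-closures again (one common convention; here it gives the same table as the hand-worked one). The state numbering follows the example above.

```python
# epsilon-removal sketch for the example: state 0 has epsilon-moves to
# states 1 and 2; state 1 loops on a, and reads b to 1 or 2; F_E = {2}.

EPS = ""
delta_e = {(0, EPS): {1, 2}, (1, "a"): {1}, (1, "b"): {1, 2}}
states, finals_e = {0, 1, 2}, {2}

def eps_closure(q):
    seen, todo = {q}, [q]
    while todo:
        p = todo.pop()
        for r in delta_e.get((p, EPS), set()):
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return seen

def convert():
    """Build the NFA transition function and the enlarged final-state set."""
    delta_n = {}
    for q in states:
        for a in ("a", "b"):
            out = set()
            for p in eps_closure(q):                 # free moves first
                for r in delta_e.get((p, a), set()):
                    out |= eps_closure(r)            # then free moves again
            delta_n[(q, a)] = out
    finals_n = {q for q in states if eps_closure(q) & finals_e}
    return delta_n, finals_n

delta_n, finals_n = convert()
assert delta_n[(0, "a")] == {1} and delta_n[(0, "b")] == {1, 2}
assert finals_n == {0, 2}    # 0 can epsilon-move to the final state 2
```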
Regular expressions and Kleene’s theorem
Defining regular expressions
- A string s matches a regular expression α whenever s ∈ L(α).
- Regular expressions can be defined using atomic and compound patterns.
Atomic patterns
- There exist the following atomic patterns, which do not need to be broken down further:

| Pattern | Matched by … |
|---|---|
| a | … the single symbol from the alphabet: a |
| ε | … the empty string |
| Ø | … nothing |
| # | … any single symbol from the alphabet |
| @ | … any string formed by the symbols in Σ (including the empty string), i.e. any string in Σ* |

Compound patterns
- Compound patterns are defined recursively, based on the atomic patterns.

| Pattern | A string will match if … |
|---|---|
| α + β | … it matches α or β |
| α ∩ β | … it matches both α and β |
| αβ | … it can be broken down into two parts such that the first part matches α and the second part matches β |
| ~α | … it does not match α |
| α* | … it matches 0 or more repetitions of α |
| α⁺ | … it matches 1 or more repetitions of α |

Examples
- All strings that end in “vita”: #*vita, or more elegantly: @vita
- All strings with an even number of a’s (over {a, b}): b*(ab*ab*)*
- All strings with an odd number of a’s (over {a, b}): b*ab*(ab*ab*)*
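
You can sanity-check patterns like these with Python’s re module (its syntax differs a bit: there are no # or @ shorthands, so ‘.’ plays the role of “any symbol”):

```python
# The example patterns above, checked with Python's re module.
import re

ends_in_vita = re.compile(r"^.*vita$")      # roughly @vita
even_as = re.compile(r"^b*(ab*ab*)*$")      # even number of a's over {a, b}
odd_as = re.compile(r"^b*ab*(ab*ab*)*$")    # odd number of a's over {a, b}

assert ends_in_vita.match("lavita") and not ends_in_vita.match("vital")
assert even_as.match("abba") and even_as.match("")
assert odd_as.match("bab") and not even_as.match("bab")
```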
Kleene’s theorem
- If α is a regular expression, then L(α) = { x | x matches α } is a regular language.
  - All strings that match a regular expression form a language. This language is regular.
- If L is a regular language, then L = L(α) for some regular expression α.
  - If a language is regular, then there exists a regular expression that matches all strings in that language (and no more/fewer).
- Therefore, we can conclude that finite automata and regular expressions describe the same languages; they are equally powerful.
- In order to prove this theorem, we need to show that we can convert a regular expression to a finite automaton and vice versa.
Reg exp to εNFA
- We will convert a regular expression to an εNFA.
- Basically, we’re going to split our regular expression into “building blocks”, then we’re going to convert those “building blocks” into automata individually.
- With each regex “building block”, we need to show that we can build an automaton out of it. In other words, for each possible regex “building block”, we need to show that its language is regular.
- Once we’ve shown that all our building blocks are regular, we can use said building blocks to build an automaton out of any regular expression.
- We can prove this by induction, as regular expressions are built inductively.
- This means that we can build any regular expression by combining our atomic patterns, which results in a compound pattern. Think of the atomic patterns as our basic building blocks:
  - First we check that our basic building blocks are regular.
  - Then we assume that the blocks we use to build compound patterns are regular.
  - Then we check that the compound patterns are regular.
  - If they are regular, we have completed the proof.
- Let’s start by checking the base cases. These are our basic building blocks, represented by the atomic regular expressions:
  - a is regular: a two-state εNFA that reads a single ‘a’ accepts L(a) = {a}.
  - ε is regular: a single final state with no transitions accepts L(ε) = {ε}.
  - Ø is regular: a single non-final state accepts L(Ø) = {}, i.e. nothing.
- These are all the atomic regular expressions we have to check, as # and @ are redundant, meaning that they can also be represented by compound patterns.
- Now, assume that the building blocks we use to build compound patterns are regular. Then we check that the compound patterns are regular:
  - α + β: we have already proved above that the union of two regular languages is regular.
  - αβ: it was also proved above that the concatenation of two regular languages is regular.
  - α*: again, we have proved above that the Kleene star of a regular language is regular.
- These are all the compound regular expressions we have to check, as all other compound regular expressions are redundant, meaning that they can also be represented by other compound patterns.
- Now we have completed the proof that we can construct an automaton from any regular expression.
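
Here is a compact sketch of this inductive construction in code: one small εNFA fragment per case (single symbol, union ‘+’, concatenation, star), glued together with ϵ-moves. The tuple-based AST input is an assumption, made to avoid writing a parser.

```python
# Inductive regex -> epsilon-NFA sketch. Each case builds a fragment with
# one start and one final state, combined with epsilon-moves.
import itertools

EPS = ""
fresh = itertools.count()

def build(ast):
    """Return (delta, start, final) for an AST of nested tuples/symbols."""
    delta = {}

    def add(p, a, q):
        delta.setdefault((p, a), set()).add(q)

    def go(node):
        if isinstance(node, str):            # atomic pattern: a single symbol
            s, f = next(fresh), next(fresh)
            add(s, node, f)
            return s, f
        op = node[0]
        if op == "+":                        # union: branch with eps-moves
            s, f = next(fresh), next(fresh)
            s1, f1 = go(node[1]); s2, f2 = go(node[2])
            for x, y in [(s, s1), (s, s2), (f1, f), (f2, f)]:
                add(x, EPS, y)
            return s, f
        if op == ".":                        # concatenation: glue f1 to s2
            s1, f1 = go(node[1]); s2, f2 = go(node[2])
            add(f1, EPS, s2)
            return s1, f2
        if op == "*":                        # star: skip edge plus loop back
            s, f = next(fresh), next(fresh)
            s1, f1 = go(node[1])
            for x, y in [(s, s1), (f1, f), (f1, s1), (s, f)]:
                add(x, EPS, y)
            return s, f
        raise ValueError(node)

    s, f = go(ast)
    return delta, s, f

def accepts(delta, start, final, string):
    def closure(states):
        seen, todo = set(states), list(states)
        while todo:
            q = todo.pop()
            for r in delta.get((q, EPS), set()):
                if r not in seen:
                    seen.add(r)
                    todo.append(r)
        return seen

    cur = closure({start})
    for c in string:
        cur = closure({r for q in cur for r in delta.get((q, c), set())})
    return final in cur

# (a + b)* a b  --  strings over {a, b} that end in "ab"
d, s, f = build((".", (".", ("*", ("+", "a", "b")), "a"), "b"))
assert accepts(d, s, f, "ab") and accepts(d, s, f, "babab")
assert not accepts(d, s, f, "ba") and not accepts(d, s, f, "")
```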
εNFA to reg exp
- First of all, let’s simply state what we want to do. We want to prove that there is always a way to convert an εNFA to a regular expression.
- We take an NFA M = (Q, Σ, δ, s, F):
  - Q is the set of states
  - Σ is the alphabet
  - δ is the transition function
  - s is the start state
  - F is the set of final states
- This is what we want: the regular expression α(u→v, X) which describes all possible paths
  - starting at the state u,
  - ending at the state v,
  - passing through 0 or more intermediary states, which are in the set X.
- It’s important to note that we do not need to pass through any intermediate states in X, but we have the opportunity to pass through as many as we want.
- If we can define the regex α(u→v, X) mathematically, then we’ve shown that we can convert an εNFA to a regex, thereby finishing our proof.
- We shall prove this by induction! In case you’ve forgotten, proof by induction goes like this:
  - Base case
  - Inductive step
- Yes, there will be some weird maths language, but don’t worry; I’ll translate it for you. It sounds scary at first, but once you see the proof, it’s not that bad:
| Step | Weird maths language | Normal English |
|---|---|---|
| Base case | Let a1, …, ak be all the symbols such that v ∈ δ(u, ai). Then: α(u→v, Ø) = a1 + … + ak, if u ≠ v and such transitions exist; α(u→v, Ø) = Ø, if u ≠ v and no such transitions exist; α(u→v, Ø) = a1 + … + ak + ε, if u = v and such transitions exist; α(u→v, Ø) = ε, if u = v and no such transitions exist | Firstly, our base case. The simplest case is going from the state u to the state v without going through any intermediate states. This is what we’re defining on the left here; from top to bottom, we’re defining the cases: if the start state isn’t the same as the end state and there are transitions from the start state to the end state; if the start state isn’t the same as the end state and there are no transitions from the start state to the end state; if the start state is the end state and there are transitions from the start state to itself; if the start state is the end state and there are no transitions from the start state to itself. The symbols a1, …, ak are all the transitions that go straight from the start state to the end state. |
| Inductive step | Assume α(u→v, X) exists for all states u, v. Then, for any extra state q: α(u→v, X ∪ {q}) = α(u→v, X) + α(u→q, X) (α(q→q, X))* α(q→v, X) | First, we have a proposition (assumption). We propose that α(u→v, X) exists and works. Then, we define α(u→v, X ∪ {q}) out of it: either we never visit q (the first summand), or we go from u to q, loop from q back to q as often as we like, and finally go from q to v (the second summand). Once we do that, we’ve defined our inductive step. By doing this, we show that if there exists a regex from u to v through intermediate states X, there also exists a regex from u to v with intermediate states X ∪ {q}, where ‘q’ is any other state in the automaton. This means we can keep adding on states from our base case until we’ve covered all the states in our automaton. |
- As you can see from the proof, we can just keep adding new states to our set X. That means we can take our start state and end state as u and v, and then add all the intermediate states into X, and it’ll work.
- That’s great, but how do you actually convert an εNFA to a regular expression? We’ve proved that we can do it, so how do we actually do it?
- Below you’ll see an example of how to actually convert an εNFA into a regular expression. It involves breaking down the regex you need by taking out intermediate states, until you get something trivial, then working your way back up again.
- Given the following NFA (with states A, B and C):
- We need to find a regular expression that starts at state A, ends at state C and passes through any of A, B, C zero or more times.
- Therefore, we are looking for α(A→C, {A, B, C}):
  - The starting state u is going to be A
  - The ending state v is going to be C
  - The set of states which we can pass through, X, is equal to Q
- Now, we are going to rewrite this regular expression using the formula above.
- We are going to pull out one of the states from X.
- It is best to choose a state that cuts down the possible paths a lot.
- I’m going to pull out B. This gives:
  - α(A→C, {A,B,C}) = α(A→C, {A,C}) + α(A→B, {A,C}) (α(B→B, {A,C}))* α(B→C, {A,C})
- We can decompose each of these smaller expressions further in the same way, by pulling another state out of X.
- It is up to you how far you decompose.
- If you are able to see what the regular expression is right away, you don’t need to decompose. However, if you barely decompose, the chances that you miss a path are high.
- We could decompose what we have got now even further, but I don’t think that it is necessary to do so, as the regular expressions for these are fairly easy to find.
- Let’s take a look at the first decomposed regular expression, α(A→C, {A,C}):
  - b + b(a+b)*(a+b) = b(a+b)*
  - Any string that starts with b, or that starts with b followed by one or more mixed a’s and/or b’s, will be matched. This can be simplified to: a string starting with b followed by zero or more mixed a’s and/or b’s.
- Let’s take a look at the second decomposed regular expression, α(A→B, {A,C}):
  - a + ε(ε)*a = a + a = a
  - Any string that contains exactly one a will be matched.
- Let’s take a look at the third decomposed regular expression, α(B→C, {A,C}):
  - a + a(a+b)*(a+b) = a(a+b)*
  - This is very similar to the first one. It will match a string starting with a followed by zero or more mixed a’s and/or b’s.
- As you might have noticed, I didn’t decompose the remaining part, α(B→B, {A,C}). The reason for that is that this regular expression starts and ends in the same state, meaning that it can only take transitions from that state to itself (if one exists):
  - b
  - Any string that contains exactly one b will be matched.
- Now, let’s put together the regular expressions from the parts above:
  - b(a+b)* + ab*a(a+b)*
- This is a regular expression represented by this NFA - we are done!
- Often, there is more than one regular expression that is represented by an automaton, meaning that there is no single correct solution. For example, the regular expression (ab*a + b)(a + b)* is correct as well.
Limitations of regular languages
- Is every language regular? No!
- For example, look at the language L = { a^n b^n | n ∈ ℕ }.
- Our automaton has finitely many states.
- It’s going to scan the string from the left to the right, starting with the a’s.
- Then, when it’s going to scan the b’s, it has to remember how many a’s there were previously.
- Since n can be arbitrarily big, there is no way to do this with our finite number of states!
- We are going to prove by contradiction that this language is not regular.
  - Suppose we have an automaton M that accepts this language. Since the number of states it has is finite, we are going to say that it has k states.
  - Next, we are going to choose n such that n > k. We can always find an n that is greater than k, because n is not bounded, i.e. it can be arbitrarily large.
  - Now, our automaton is going to start to scan the a’s from the left to the right. Since we have fewer states than the number of a’s, there must be a state we are going to enter at least twice (pigeonhole principle)! We are going to call this state q. Between the first time we enter q and the second time we enter q, we are going to scan some j ≥ 1 a’s. After the second time we enter q, our automaton is going to continue to the final state.
  - When the automaton enters state q for the first time, it goes on a path (a loop) that leads it to q again. Then, after arriving at q for the second time, it goes on a path that leads it to the final state.
  - Here is the problem: why arrive at q, and then take a path that leads to q again? Why not take the path that goes to the final state upon arriving at q for the first time? This means that our automaton would also accept a string with n − j a’s. However, now the number of a’s is no longer equal to the number of b’s, and here we have our contradiction!
Pumping (🎃) Lemma for regular languages - contrapositive form
- We can use the contrapositive form of the pumping lemma for regular languages to show that a language is not regular. We cannot use the pumping lemma to show that a language is regular!
- It works by splitting a string of the language into parts, then “pumping” a subpart until you get a string that isn’t in the language. This is similar to what we have just done before.
- The different steps of the proof can be understood as a game versus a demon.
- The demon chooses a number k such that k > 0.
  - We cannot choose a value for k; we only know that it is greater than 0.
- We choose strings x, y, z such that xyz ∈ L and #(y) ≥ k.
  - We can choose any string xyz that is within the language. However, we have to make sure that, no matter what k is, our string xyz is always in the language. We do not need to cover the whole language with our string!
  - x and z can be empty; there is no restriction concerning them.
  - The length of the y part must be at least k. A way to make sure that this is always the case is to let the y part be equal to something to the power of k.
  - y ≠ ε, due to the lower bound of k (since k > 0).
  - As a general tip, try to make x and z as trivial as possible, like in the following example:
  - For instance, we could let x = z = ε and y = a^k to make sure that y is at least k long (a is a symbol from the alphabet).
- The demon splits the y part of the string into u, v, w such that y = uvw and v ≠ ε, i.e. #(v) ≥ 1. The whole string would be xuvwz.
  - You cannot choose how the string is going to be split! All you know is that the v part of the string is not going to be empty.
  - Therefore, you have to do a general case and define u, v, w such that every way of splitting the string is possible and we do not introduce any restrictions.
  - u and w can be empty; there is no restriction concerning them.
  - For instance, if y = a^k, then v = a^l with l ≥ 1, because v cannot be empty. Furthermore, the remaining part uw = a^(k−l). If we add the two parts, we get back to a^k. Also, we haven’t made any assumptions about how the string is going to be split.
- We pick an i such that the string xuv^i wz ∉ L. If this is the case, we have proven that the language is not regular!
  - We can pick any i we want, as long as it is equal to or greater than 0.
  - Our goal is to choose an i such that the resulting string is outside of the language.
  - For instance, take y = a^k and v = a^l again. If we now choose i = 0, we get uv⁰w = uw = a^(k−l). We have now removed l a’s compared to what we had as y in the step before, which was a^k. Also, we know that l ≥ 1, meaning that we removed at least one a!
- You MUST say “Therefore L is not regular”; if you don’t, you will lose marks.
- If this still doesn’t make any sense, look at the few examples below, and then read the general case above again.
Example 1: L = { a^n b^n | n ∈ ℕ }
- The demon chooses k such that k > 0.
- Now choose x, y, z such that xyz ∈ L and #(y) ≥ k.
  - I’ll do it like x = ε, y = a^k, z = b^k, so xyz = a^k b^k.
  - For every value of k I will get a specific string which is in the language of L.
  - It is ok that I cannot get the empty string (which is in the language), because I do not need to cover the whole language.
- The demon splits your y into u, v, w such that y = uvw and v ≠ ε.
  - Now, we have to do the splitting in the general case.
  - Since v cannot be empty, v = a^l with l ≥ 1.
  - Since uvw = a^k, the remaining part is uw = a^(k−l).
- We pick an i such that the string xuv^i wz ∉ L.
  - For this example, choose i = 0.
  - Then xuwz = a^(k−l) b^k ∉ L, as the number of b’s will be bigger than the number of a’s, no matter what value k is.
- Therefore L is not regular.

Example 2: L = { a^(n!) | n ∈ ℕ }
- The demon chooses k such that k > 0.
- Now choose x, y, z such that xyz ∈ L and #(y) ≥ k.
  - I’m going to choose x = z = ε and y = a^((k+1)!), so xyz = a^((k+1)!) ∈ L and #(y) = (k+1)! ≥ k.
- The demon splits your y into u, v, w such that y = uvw and v ≠ ε.
  - v = a^l; as it cannot be empty, we have l ≥ 1.
  - The remaining part is uw = a^((k+1)!−l), because uvw = a^((k+1)!).
- We pick an i such that the string xuv^i wz ∉ L.
  - We pick i = 2.
  - Then xuv²wz = a^((k+1)!+l) ∉ L, because the power we have got here is not going to be a factorial!
  - Let me prove why this is not going to be a factorial:
    - (k+1)! is a factorial.
    - The next higher factorial is going to be (k+2)! = (k+2) · (k+1)!.
    - However, our power (k+1)! + l is going to be strictly between these two factorials: (k+1)! < (k+1)! + l ≤ 2 · (k+1)! < (k+2)!. Therefore, it cannot be a factorial!
    - If you still have some doubts, try it out with the smallest k we could have, k = 1. Then choose the largest l we can choose, l = (k+1)! = 2. So we have 2! < 2! + 2 < 3!, i.e. 2 < 4 < 6, which holds. For any larger choice of k the next higher factorial is going to grow even faster, therefore this will always hold.
- Therefore L is not regular.

Example 3: L = { a^p | p is prime }
- The demon chooses k such that k > 0.
- Now choose x, y, z such that xyz ∈ L and #(y) ≥ k.
  - I’m going to choose x = z = ε and y = a^p, where p is going to be the next prime number ≥ k. Since the number of primes is infinite, this will always work, no matter how large k is.
  - Therefore, xyz = a^p ∈ L and #(y) = p ≥ k.
- The demon splits your y into u, v, w such that y = uvw and v ≠ ε.
  - The length of v is going to be l, so v = a^l.
  - Since v ≠ ε, we have l ≥ 1.
  - That’s all the info we need to continue our proof; you will see why in the next step.
- We pick an i such that the string xuv^i wz ∉ L.
  - We will pick i = p + 1.
  - I’m going to show that the length of xuv^(p+1)wz is not going to be equal to a prime number, therefore it is outside of the language.
  - #(xuv^(p+1)wz) = p + (i − 1) · l = p + p · l = p · (1 + l), which is not going to be a prime number, since we also have the factor (1 + l) ≥ 2 here, but prime numbers should only be divisible by 1 and themselves.
  - Therefore, xuv^(p+1)wz ∉ L.
- Therefore L is not regular.

Example 4: (the language and the details of this example were illustrations; only the game skeleton survives)
- The demon chooses k such that k > 0.
- Now choose x, y, z such that xyz ∈ L and #(y) ≥ k.
- The demon splits your y into u, v, w such that y = uvw and v ≠ ε.
- We pick an i such that the string xuv^i wz ∉ L: pick i = 0, so that we delete at least one symbol and the resulting string falls outside of L.
- Therefore L is not regular.
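
The demon game can also be brute-forced for one concrete k, just to see the mechanics: for L = { a^n b^n }, whichever way the demon splits y = a^k into u, v, w with v ≠ ε, replying i = 0 pumps the string out of the language. (The real proof does this symbolically for every k; this sketch only checks a single assumed k.)

```python
# Brute-force check of example 1 for one concrete k: every possible demon
# split of y = a^k loses to the reply i = 0.

def in_lang(s):
    """Membership test for L = { a^n b^n }."""
    n = s.count("a")
    return s == "a" * n + "b" * n

k = 5                                    # one demon choice (proof needs all k)
x, y, z = "", "a" * k, "b" * k           # our choice: xyz = a^k b^k, #(y) >= k
assert in_lang(x + y + z)

for cut1 in range(len(y) + 1):
    for cut2 in range(cut1 + 1, len(y) + 1):       # v = y[cut1:cut2] != ""
        u, v, w = y[:cut1], y[cut1:cut2], y[cut2:]
        i = 0                                      # our winning reply
        assert not in_lang(x + u + v * i + w + z)  # fewer a's than b's now
```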
Automata theory: Context free languages
Pushdown Automata (PDA)
- Pushdown automata are an extension of the automata we have seen before.
- We take an εNFA and add a stack to the control unit.
- The stack does not have a size limit, i.e. it is infinitely big.
- More formally, a PDA is a 7-tuple M = (Q, Σ, Γ, δ, s, ⊥, F):
  - Q is the set of states
  - Σ is the input alphabet
  - Γ is the stack alphabet (the set of things we can put on the stack)
  - δ is the transition relation (more detail below)
  - s is the start state
  - ⊥ is the initial stack symbol
  - F is the set of final states
Transition relation
- Now, let’s take a closer look at the transition relation δ:
  - δ ⊆ (Q × (Σ ∪ {ε}) × Γ) × (Q × Γ*)
  - An element of δ has the form ((p, a, A), (q, B₁B₂…Bₖ)), where:
  - p is the state the PDA is currently in.
  - a is the symbol we are currently reading. This symbol can be any symbol from our alphabet, or epsilon, which means that we can do epsilon-moves by reading nothing!
  - A is the symbol which is currently on top of the stack. When we make a transition, we are going to pop (remove) this symbol from the top of the stack.
  - q is the resulting state after we have made the transition.
  - B₁B₂…Bₖ are 0 or more symbols from our stack alphabet that we are going to push onto the stack when we make the transition.
- Let’s look at one tuple of our transition relation:
  - ((p, a, A), (q, B₁B₂…Bₖ))
  - Here, we are in the state p, we will read the symbol a, and we see A on top of the stack. We will then pop (remove) A off the stack, go to the next state q and push (add) B₁B₂…Bₖ onto the stack. We will start pushing with Bₖ and end with B₁, such that we now have B₁ on top of the stack.
- Keep in mind that if A is not on the top of the stack, we cannot perform this transition. Think of the stack top as a kind of “second input”.
Configuration
- A configuration is a complete description of our PDA at a certain point in time. It is an element of:
  - Configurations = Q × Σ* × Γ*
  - q ∈ Q is the current state.
  - x ∈ Σ* is the whole part of the input we still have left to scan/read. Note that this is not only the next symbol.
  - γ ∈ Γ* is the whole stack content. Note that this is not only the symbol on top of the stack; it’s the entire thing.
- Therefore, every configuration will be of the form (q, x, γ).
- We can define relations between those configurations, which tell us how to go from one configuration to the next configuration. We have to distinguish between two cases: the case where we consume a symbol from the input, and the case where we don’t consume any symbol and take an epsilon move.
- Case 1: We consume a symbol. We write the relation as:
  - (p, ax, Aγ) →¹ (q, x, βγ), when there exists an element ((p, a, A), (q, β)) in δ
- Case 2: We don’t consume a symbol and take an epsilon move. We write the relation as:
  - (p, x, Aγ) →¹ (q, x, βγ), when there exists an element ((p, ε, A), (q, β)) in δ
Acceptance
- A PDA can accept a string either by empty stack or by final state.
- We can turn a PDA that accepts by final state into a PDA that accepts by empty stack, and vice versa.
By final state
- A PDA accepts a string by final state if we can go from the start configuration to a configuration which contains a final state and where we don’t have any input left to scan/consume. Between those two configurations there can be 0 or more other intermediary configurations which we pass through.
- Formally, this is written as:
  - (s, x, ⊥) →* (f, ε, γ) for some f ∈ F and γ ∈ Γ*
  - We start at state s with the whole input string x still remaining to scan, and our stack contains the initial stack symbol ⊥ only.
  - We go through as many intermediary configurations as we want to (that’s the * in →*).
  - We end in a final state f where we don’t have anything remaining to scan, and our stack content is some γ (whatever is left on the stack doesn’t matter).
- Therefore, the language accepted by the PDA is:
  - L(M) = { x ∈ Σ* | (s, x, ⊥) →* (f, ε, γ) for some f ∈ F and γ ∈ Γ* }
By empty stack
- A PDA accepts a string by empty stack if we can go from the start configuration to a configuration where the stack is completely empty and where we don’t have any input left to scan/consume. Between those two configurations there can be 0 or more other intermediary configurations which we pass through.
- Formally, this is written as:
  - (s, x, ⊥) →* (q, ε, ε) for some q ∈ Q
  - We start at state s with the whole input string x still remaining to scan, and our stack contains the initial stack symbol ⊥ only.
  - We go through as many intermediary configurations as we want to.
  - We end in some state q where we don’t have anything remaining to scan, and our stack is empty.
  - “Stack is empty” means that we do not have ⊥ or any other symbol on the stack.
- Therefore, the language accepted by the PDA is:
  - L(M) = { x ∈ Σ* | (s, x, ⊥) →* (q, ε, ε) for some q ∈ Q }
Example PDA accepting palindromes
- A palindrome is a string where the reverse of that string is equal to the original string.
- Examples: abba, abdba, aaaa
- A PDA that accepts all palindromes by final state (and also by empty stack) looks like this, where A ∈ Σ, i.e. A is any symbol from our alphabet Σ.
- Let’s start from scratch and build a PDA that looks like this.
- We are going to differentiate between palindromes with an even number of symbols and those with an odd number of symbols.
- Our strategy for constructing the PDA that accepts palindromes with an even number of symbols will be the following:
  - We scan the word until we reach the middle of it.
  - Then we scan the second half of the word, expecting the symbols from the first half of the word in reverse order.
  - If this is the case, our word is a palindrome.
- Now, we can assign a state to every step of our strategy:
  - We will be in state 1 when we scan the first half of the word.
  - We will be in state 2 when we scan the second half of the word.
  - We will be in the final state when we have finished scanning the whole word and the reverse of the first half is equal to the second half of the word.
- Let’s think about the transitions for each of our states:
  - We stay in state 1 and keep pushing the current symbol onto the stack while we haven’t reached the middle of the word.
  - We make a transition to state 2 when we have reached the middle of the word, without changing the stack.
  - We stay in state 2 and keep popping the top symbol off the stack as long as the current input symbol is equal to the symbol on top of the stack.
  - We make a transition to the final state if our stack contains only the initial stack symbol, and remove the initial stack symbol.
- However, there is still one problem: palindromes with an odd number of symbols, such as “abdba”. How do we deal with them? We can throw away the symbol in the middle! For this example, it means that we throw away “d” and we check the remaining string “abba”. We have already defined a PDA for checking palindromes with an even number of symbols above.
- In order to integrate the “throw away the middle symbol” action into our existing PDA, we add one transition from state 1 to 2 where we consume a symbol but don’t change the stack contents.
- Now we have arrived at our final PDA, which is equal to the image above.
- Some questions you might ask yourself are:
  - How does the PDA know when it has reached the middle of the word?
  - How does the PDA know that it has arrived at the middle symbol of an odd string, which can be thrown away?
- The answer is that the PDA is going to guess. The PDA can do this because it is non-deterministic, similar to the NFAs we have seen before. In other words, the PDA can take any of the transitions that are possible.
- If we have a palindrome as a string and we process this palindrome, there will be one combination of transitions where the PDA goes into the final state at the end.
- Let’s simulate our PDA on the string “abcba”, which is a palindrome (calling the final state f):
  - (1, abcba, ⊥) →¹ (1, bcba, a⊥) →¹ (1, cba, ba⊥) →¹ (2, ba, ba⊥) [we guess that c is the middle symbol and throw it away] →¹ (2, a, a⊥) →¹ (2, ε, ⊥) →¹ (f, ε, ε)
  - The PDA has reached the final state with nothing left to scan, and will therefore accept.
- Let’s simulate our PDA on the string “abcde”, which is not a palindrome:
  - (1, abcde, ⊥) →¹ (1, bcde, a⊥) →¹ (1, cde, ba⊥) →¹ (2, de, ba⊥) [guessing that c was the middle symbol] ✗
  - At this point our computation dies, because there is no transition we could take (d ≠ b). The other possible guesses for the middle die in the same way, so the PDA will reject.
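
Here is a runnable sketch of the palindrome PDA, with the nondeterministic guessing simulated by exploring every reachable configuration. The state names (1, 2, “f”) and the bottom-of-stack marker “_” are assumptions matching the construction described above.

```python
# Palindrome PDA sketch: nondeterminism is simulated by exploring all
# reachable configurations (state, remaining input, stack as a string
# with the top at index 0).

BOT = "_"

def step(state, rest, stack):
    """Yield every configuration reachable in one move."""
    top = stack[0] if stack else None
    if state == 1 and rest:
        yield (1, rest[1:], rest[0] + stack)   # first half: push the symbol
        yield (2, rest[1:], stack)             # guess: middle of an odd word
    if state == 1:
        yield (2, rest, stack)                 # guess: middle of an even word
    if state == 2 and rest and top == rest[0]:
        yield (2, rest[1:], stack[1:])         # second half: match and pop
    if state == 2 and top == BOT:
        yield ("f", rest, stack[1:])           # pop the bottom symbol, finish

def accepts(word):
    seen, todo = set(), [(1, word, BOT)]
    while todo:
        cfg = todo.pop()
        if cfg in seen:
            continue
        seen.add(cfg)
        state, rest, stack = cfg
        if state == "f" and rest == "" and stack == "":
            return True                        # final state, input consumed
        todo.extend(step(state, rest, stack))
    return False

assert accepts("abcba") and accepts("abba") and accepts("")
assert not accepts("abcde") and not accepts("ab")
```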
Context free grammars
- A CFG (context-free grammar), consisting of terminal and non-terminal symbols, is used to generate strings (sentences) by following its production rules.
- The terminal symbols are the symbols which will form the string we are going to generate.
- The non-terminal symbols are going to be replaced by other symbols. These other symbols can be terminals, non-terminals or a mix of both.
- There exist so-called “productions” which tell us how to replace the non-terminal symbols.
- We have to keep replacing non-terminal symbols until we arrive at a string that consists solely of terminal symbols. This process is called “derivation”.
- The final string is also called a sentence.
- Formally, a CFG is a quadruple G = (N, Σ, P, S):
  - N is a finite set of nonterminal symbols
  - Σ is a finite set of terminal symbols
  - P is the set of productions
  - S ∈ N is the start nonterminal
- To define production rules …
  - we write A → α to mean “the nonterminal A can be replaced by the string α”
  - we use the shorthand A → α₁ | α₂ to mean the two productions A → α₁ and A → α₂
- To express that β is derivable from α, we write α →* β or α →ⁿ β:
  - Without n: this means that, given the string α, we have applied some number of production rules and arrived at the string β.
  - With n: this means that, given the string α, we have applied exactly n production rules and arrived at the string β.
- The language generated by a CFG G is the set of all its sentences, i.e. the set of all strings that can be generated by using its production rules:
  - L(G) = { x ∈ Σ* | S →* x }
- The language generated is equal to the set of all strings x of terminals that are derivable from the start nonterminal.
- We can apply as many production rules as we want, as long as we end with a string (sentence) that solely consists of terminals.
- A language L(G) generated by some CFG G is called context-free (a CFL).
- If it’s all still a bit too confusing for you, look at the examples below, where we convert context-free languages into context-free grammars:
- Language: { a^n b^n | n ∈ ℕ }
  - Every string with some number of a’s followed by the same number of b’s is in the language.
  - We have shown that this language is not regular; however, it is context-free, as we can come up with a CFG for it.
  - The CFG is S → aSb | ε.
  - We can produce 0 a’s and b’s, i.e. the empty string, by applying only the second production rule: S → ε.
  - We can produce 1 a and 1 b by applying the first production rule once and the second production rule last: S → aSb → ab.
  - We can produce 2 a’s and 2 b’s by applying the first production rule 2 times and the second production rule last: S → aSb → aaSbb → aabb.
  - Therefore, we can produce n a’s and n b’s by applying the first production rule n times and the second production rule last.
  - We achieve this by pumping S in the middle of the string every time we want to add an a and a b.
- Language: the set of all palindromes over the alphabet {a, b}
  - Grammar: S → aSa | bSb | a | b | ε
- Language: the set of all balanced parentheses
  - Grammar: S → (S) | SS | ε
- Language: { a^i b^(2(i+k)) a^k | i, k ∈ ℕ }
  - English, please: so you have a bunch of ‘a’s, then a bunch of ‘b’s, then another bunch of ‘a’s, and there need to be twice as many ‘b’s as ‘a’s in total.
  - Examples: ε, abb, abbbba, bbbbaa
  - Simplify: we know that the first and the last block are just going to be repetitions of the letter a. We let i, k ∈ ℕ and use those for the number of times we repeat the first and the last block of a’s; we put i and k in the corresponding exponents to denote these repetitions.
  - Simplify further: we use i and k to denote repetitions of the letters, so they can’t be negative. For example, what would a⁻¹ mean? It doesn’t make sense, as we can’t repeat a letter minus one times.
  - We split b^(2(i+k)) into b^(2i) b^(2k). No magic here, just exponent rules. This gives { a^i b^(2i) b^(2k) a^k | i, k ∈ ℕ }.
  - We can now split this into two grammars which we will then concatenate, because CFLs are closed under concatenation.
  - One will be for the first half, { a^i b^(2i) }: S₁ → aS₁bb | ε
  - One will be for the second half, { b^(2k) a^k }: S₂ → bbS₂a | ε
  - Now, the final grammar is the concatenation of the grammar for the left and the right part: S → S₁S₂
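
The derivation in the first example, as a tiny script (the string encoding of sentential forms is just an illustration choice):

```python
# Derivation sketch for G with productions S -> aSb | eps: apply the
# first rule n times and then the second, yielding a^n b^n.

def derive(n):
    """Return the sentential forms of the derivation of a^n b^n."""
    form, steps = "S", ["S"]
    for _ in range(n):
        form = form.replace("S", "aSb", 1)   # pump S in the middle
        steps.append(form)
    form = form.replace("S", "", 1)          # finish with S -> eps
    steps.append(form)
    return steps

assert derive(2) == ["S", "aSb", "aaSbb", "aabb"]
```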
Chomsky Normal Form
- A CFG is in Chomsky normal form (CNF) when all productions are of the form:
  - A → BC (one non-terminal derives two non-terminals)
  - A → a (one non-terminal derives one terminal)
- Note that we don’t have A → ε here, therefore we cannot generate the empty string.
- Chomsky normal form is harder to convert into a PDA, but it’s used in things like the CYK algorithm, which parses context-free grammars.
Greibach Normal Form
-
A CFG is in Greibach normal form (GNF) when all productions
are of the form
One terminal followed by
non-terminals is derivable by one non-terminal
-
Note that we don’t have
here, therefore we cannot generate the empty
string.
-
Greibach normal form makes it easier to show that there is
a PDA for every CFG, and can also be used with recursive
descent parsers.
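For instance, a small GNF grammar of my own (not from the slides): S → aSB | aB, B → b. Every production starts with exactly one terminal followed by non-terminals, so it is in GNF; it generates { aⁿbⁿ | n ≥ 1 }.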
Removing epsilon and unit productions
-
A production of the form A → ε is called an epsilon production
-
Assume we have another production
-
We can get rid of the epsilon production and refine this
production
-
Our new refined and only production will be
-
A production of the form A → B, where one non-terminal derives exactly one other non-terminal, is called a unit production
-
Assume we have another production
-
We can get rid of this production and refine the unit
production
-
Our new refined and only production will be
-
Since we can remove both epsilon and unit productions, we can conclude that for every CFG there exists another CFG that generates the same language, except that it cannot generate epsilon:
L(G′) = L(G) \ {ε}
Where G′ is the grammar G but without any epsilon or unit productions
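Here is a tiny worked example of both removals (my own grammar, not from the slides):

    Start:                 S → ASA | b        A → a | ε
    Remove A → ε:          for every production using A, add copies with A deleted:
                           S → ASA | SA | AS | S | b        A → a
    Remove the unit
    production S → S:      it is trivial (it derives itself), so just drop it:
                           S → ASA | SA | AS | b            A → a

The new grammar generates the same language as the old one (no ε was lost here, since the original could not generate ε anyway).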
Removing left-recursive productions
-
This is only required for conversion to GNF and therefore
not examinable
-
A left-recursive production looks like the following: A → Aα | β
-
This rule generates strings consisting of exactly one β followed by zero or more α’s.
-
It is left recursive because when we want to generate more α’s, we are going to prepend them to the string, i.e. we are going to add them on the left side of the string.
-
In other words, we are generating the string from the right to the left, starting with the rightmost (last) symbol.
-
Every left recursive rule must have the β part because it allows us to break out of our recursion, i.e. it allows us to end “spamming” α’s, by replacing the non-terminal A with β at some point in our derivation.
-
We want to eliminate the left recursion in A → Aα | β
-
We know this rule produces βαⁿ where n ≥ 0
-
We can easily create the chain of α’s with this new rule: A′ → α | αA′
-
Now, we need to add another rule to produce the β in front: A → β | βA′
-
So in total we need these two rules to eliminate the left-recursion:
A → β | βA′
A′ → α | αA′
-
In general, we can have multiple alternatives that are
left-recursive, and multiple alternatives to break out of
our left-recursion. We have to get rid of all of them.
-
Our left-recursive production is in the form A → Aα₁ | … | Aαₙ | β₁ | … | βₘ
-
For every alpha, we need the following new rule: A′ → αᵢ | αᵢA′
-
For every beta, we need the following new rule: A → βⱼ | βⱼA′
-
So all in all, we need the following rules for every alpha and beta:
A → β₁ | … | βₘ | β₁A′ | … | βₘA′
A′ → α₁ | … | αₙ | α₁A′ | … | αₙA′
-
What? You still don’t understand?
-
No, don’t feel bad! Have a look at a few examples
below, they might help:
With left-recursion
|
Without left-recursion
|

|


|

|


|

|


|

|


|
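One more fully worked concrete example in the same style (my own): eliminating the left recursion in E → E+T | T, where α = +T and β = T:

    E  → T | TE′
    E′ → +T | +TE′

Both grammars generate T, T+T, T+T+T, … and the new one is no longer left-recursive (and, unlike the video’s answers, has no epsilon productions).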
-
You don’t get it? Yare yare daze... it’s
alright. There are just some topics that are really hard to
master. We won’t judge.
-
What better last-resort revision technique to fall back on
than Indian YouTubers?
-
For this topic, I recommend this one. Sure, he leaves his answers with epsilon productions,
but all the examples above are from this video and
it’s really easy to follow.
Converting between CFG, CNF and GNF
-
For any context-free grammar, there exists a corresponding
CNF and GNF that produces the same language, except
epsilon.
Converting a CFG to CNF
-
Remove all epsilon and unit productions like described
above
-
For all terminals a ∈ Σ, add a new non-terminal Aₐ to our set of non-terminals
-
We will now have a new non-terminal symbol for each of our
symbols in the alphabet
-
We will use these later when we define some new
productions
-
In all productions, replace any terminal a with the non-terminal Aₐ
-
We will replace all terminal symbols in every production by
the new non-terminal symbols we have generated in the step
before
-
Add productions Aₐ → a to our set of productions
-
We will need those productions to produce the terminals,
since we replaced all of them with our new
non-terminals.
-
While there exists a production A → B₁B₂…Bₖ with k > 2, i.e. while there exists a production that derives more than 2 non-terminals and therefore cannot be in CNF:
-
Remove this production from the set of productions
-
Add a new non-terminal C to the set of non-terminals
-
It has to have a different name in every iteration of this “while-loop”
-
Add the following two productions to the set of productions: A → B₁C and C → B₂…Bₖ
-
Now our grammar is in CNF.
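If you would rather see that last splitting step as code, here is a minimal Python sketch of my own (the names C1, C2, … are just illustrative):

import itertools

def split_long(prods):
    # prods: list of (head, body) pairs, where body is a list of non-terminals
    fresh = (f"C{i}" for i in itertools.count(1))   # C1, C2, ... used only for shortening
    out = []
    for head, body in prods:
        while len(body) > 2:                 # A -> B1 B2 ... Bk with k > 2
            c = next(fresh)
            out.append((head, [body[0], c]))  # A -> B1 C
            head, body = c, body[1:]          # C -> B2 ... Bk (may need splitting again)
        out.append((head, body))
    return out

# split_long([("S", ["A", "B", "C", "D"])])
# -> [("S", ["A", "C1"]), ("C1", ["B", "C2"]), ("C2", ["C", "D"])]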
-
If this is a little hard to take in, try following through
the example below:
Step
|
Progress
|
Description
|
0
|



|
We haven’t done anything yet, I just
wanted to show you the grammar first before we
start converting it. It’s on the
left.
|
1
|
Remove



Remove



Remove



Remove



|
In this step, we’re simply removing the ε-productions and the unit productions, as detailed in the previous section.
First, we remove the ε-production in A, but then it shifts it to B.
So then, we remove the ε-production in B. We’ve gotten rid of all the ε-productions, but now we’ve got to get rid of all the unit productions.
We get rid of the unit productions in B and S. For the one in B, we just replace it with A’s contents, and for the one in S, we just remove it, because S → S is trivial.
|
2
|


|
In this step, we create a non-terminal for
every terminal that exists. I call these and .
|
3
|



|
In this step, I convert all terminals into
their respective non-terminals that I’ve
defined in the last step. At first, this
doesn’t make much sense, until the next
step...
|
4
|


|
In this step, I set up productions such that
the non-terminals can produce their respective
terminals. Now the previous step makes
sense!
|
5
|












|
In this step, I split up productions that have
more than 2 non-terminals. I create
non-terminals called so I don’t run out of names and so that I
know that any non-terminal called is just used for shortening.
Every red highlighted production is a
production I need to shorten, and the blue
highlighted production underneath is the
shortened version.
|
Finish
|









|
This isn’t really a step. I’ve just
compiled all the productions and put them into
one place, so you can see the finished CNF
result!
This grammar accepts the same language as the
one at the start, except now it’s in a
different form: Chomsky Normal Form.
|
Converting a CFG to GNF
This is not examinable according to Gennaro, but still worth
reading to give further insight into context-free
grammars.
-
Convert the CFG into CNF.
-
Re-order the rules such that the start rule is at the top,
the next derivable rules are below and so on. In other words it should be the case that for every
possible derivation we are going down in the list of rules
we have, but never up.
-
This will not always be possible, but try to order them as
much as possible according to this schema.
-
You can’t do anything wrong here, the purpose of this
ordering is to minimise the amount of work you will have to
do.
-
Replace all non-terminals with non-terminals of the form Aᵢ, where i ranges from 1 to n and n is equal to the number of all non-terminals. Start with the first rule at the top and work down, otherwise we cannot benefit from the ordering we did in the last step.
-
Check if for every production Aᵢ → Aⱼγ, it is true that i < j.
Check the rules in ascending order, i.e. start with the rule where i = 1. If we find one rule where this does not hold, we must modify this rule immediately before we continue to check other rules. We have to deal with the following two cases:
-
Case i > j in Aᵢ → Aⱼγ
-
Replace Aⱼ with whatever Aⱼ produces, i.e. derive Aⱼ
-
If for the resulting rule we still have i > j, keep replacing until this is not the case anymore.
-
Case i = j (left-recursive rule), so
-
Eliminate left recursion according to the description above.
-
Now, we are taking a closer look at every production starting with a non-terminal, i.e. every production which is not in GNF yet. For all of those rules it must hold that they look like Aᵢ → Aⱼγ where i < j. Starting with the rule where i is the highest number and working in descending order, derive Aⱼ. The result of deriving Aⱼ will be either a terminal symbol or two non-terminal symbols:
-
If we get another non-terminal, repeat this step with the
current rule, i.e. derive the left-most non-terminal again,
check again, and so on.
-
If we get a terminal symbol, our production must be in GNF,
meaning that it starts with a terminal followed by some
non-terminals. This means that this rule is OK, we can leave
it as it is and continue to the next rule.
-
Rename all non-terminals back to their old names.
-
There might exist some rules which are unreachable from the
start symbol. Remove those rules.
-
Now, our grammar is in GNF. Congrats if you are not
confused at this point.
-
If you need to visualise this problem (or if you’re
masochistic), an example of this is provided below:
Step
|
Progress
|
Description
|
0
|


|
Here, we haven’t done anything yet!
This is the grammar we want to convert to
GNF.
Right now, it’s not in any normal
form.
|
1
|




|
In this step, we convert the grammar into CNF
using the method described above that you should
have read.
|
2
|




|
In this step, we reorder the steps so that,
when going through productions, you go from the
top and work your way down; you don’t go
back up.
Sometimes this isn’t completely possible, for example when a production uses a non-terminal which is further up.
|
3
|




|
In this step, we replace all non-terminals with Aᵢ, where i is the position of the non-terminal from the top to the bottom.
|
4
|




----------------------------




----------------------------





----------------------------





|
In this step, we look for any productions where .
There doesn’t actually exist any
productions in the example where this is true,
so I’ve created a sub-example to
demonstrate this step. This won’t affect
the example we’re working on. That’s
why the text looks like that.
In the first block, we notice the production where this condition is true. So we replace the offending non-terminal in that production with its derivation.
Next, we notice that, in the production , , which means that we have left recursion.
Therefore, we introduce another non-terminal to
rectify this called .
Finally, on the last block, we notice that in . Therefore, we convert both non-terminals into
their derivations, which happen to be single
terminals.
With this, we’ve completed this
step.
|
5
|




----------------------------




----------------------------




----------------------------




|
In this step, we look at all the productions
that are not in GNF, and we substitute
productions until all the productions are in
GNF. We start with the highest rule (which is
the furthest down in our list) and work in
descending order.
For example, in the first block, is not in GNF. So we substitute ’s production into it, making the first
part ‘a’.
Next, is not in GNF, so we substitute the same thing
in that, and we get .
After that, the only production that is not in
GNF is . So we substitute into that one, and we get .
There are no longer any more productions that
are not in GNF. We’re now in Greibach
normal form! But we’re not done yet; our
non-terminal symbols are all . We need to change them back.
|
6
|




|
In this step, we’ve just renamed our
non-terminal symbols back to normal.
|
7
|


|
In this step, we’re just getting rid of
any productions that are unreachable /
unused.
In this example, the productions for and are never used, so they are
deleted.
|
PDA and CFG conversions
-
PDAs and CFGs have the same power, therefore we can convert a PDA to a CFG and vice versa.
CFG to PDA
-
We will do this by first converting a CFG into GNF, and
then converting GNF into a one state PDA.
-
Assume that our grammar is in GNF. Then all productions will look like:
A → cB₁B₂…Bₖ where k ≥ 0
One terminal followed by zero or more non-terminals is derivable by one non-terminal
-
The PDA we are going to construct will only have one state
and will accept by empty stack.
-
We are going to call the one and only state q.
-
Formally, this means that the PDA will accept the string x when there exists a derivation S ⇒* x in the grammar; every move the PDA makes mirrors one step of a leftmost derivation.
-
Therefore, the PDA we will construct is going to be of the
form:
Component | representing | is defined as | because
Q | set of states | a singleton set containing the state q | we will only have a single state in this PDA
Σ | input alphabet | terminal symbols of the CFG | these are the symbols, produced by the grammar, which our PDA is going to scan
Γ | stack alphabet | non-terminal symbols of the CFG | this is what we will push on/pop off the stack
Δ | transition relation | see below | see below
q | start state | the state q | we only have a single state, so we don’t have much choice where to start
S | initial stack symbol | the start symbol of the CFG | we won’t have anything else on the stack when we start scanning our string
F | final states | empty set | our PDA will accept by empty stack, therefore we don’t need any final states
-
Now, we still need to define the transition relation.
-
We are going to add an element to the relation for every
production we have in our grammar.
-
Therefore, for each production A → cB₁B₂…Bₖ, add a transition ((q, c, A), (q, B₁B₂…Bₖ)): on input c with A on top of the stack, pop A and push B₁…Bₖ.
-
Example for the grammar of the language
-
The resulting PDA will have
-
A set of states
-
The input alphabet
-
The stack alphabet
-
The start state
-
The initial stack symbol
-
The transition relation as shown in the following
graphic:
-
If we want to test if our PDA accepts the same language as
our grammar, we can try to match every derivation of our
grammar with a transition in our PDA
-
We will do this by giving the leftmost derivation, which
means that productions are only applied to the leftmost
non-terminal in every step.
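To make the whole construction concrete, here is a minimal Python sketch of my own (not from the notes). It assumes the GNF grammar S → aSB | aB, B → b for { aⁿbⁿ | n ≥ 1 } and simulates the one-state PDA we would get from it, accepting by empty stack:

# Each GNF production A -> c B1...Bk becomes one single-state PDA move:
# "on input c with A on top of the stack, pop A and push B1...Bk".
productions = {
    ("S", "a"): [["S", "B"], ["B"]],   # S -> aSB | aB
    ("B", "b"): [[]],                  # B -> b
}

def accepts(w, start="S"):
    stacks = {(start,)}                # all possible stacks; the top is element 0
    for c in w:
        next_stacks = set()
        for st in stacks:
            if not st:
                continue               # stack emptied too early: dead guess
            for rhs in productions.get((st[0], c), []):
                next_stacks.add(tuple(rhs) + st[1:])   # pop the top, push B1...Bk
        stacks = next_stacks
    return any(not st for st in stacks)   # accept by empty stack

# accepts("aabb") -> True, accepts("aab") -> False

Each surviving stack is one nondeterministic guess, which also previews the “guessing” story in the next section.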
PDA to CFG
-
This proof is pretty much just the previous proof, but in
reverse!
-
The steps we will go through are as follows:
-
Prove that one-state PDAs can be converted to a CFG
-
Prove that any PDA can be converted to a one-state
PDA
-
As you can see, the law of transitivity shows that by doing this, we will prove that any PDA can be converted into a CFG.
-
First, we will prove that one-state PDAs can be converted
to a CFG.
-
The obvious thing to do first is to start off with a
PDA:
-
Now we’re going to create a grammar (in GNF) like
this:
-
This is just the previous proof in reverse, so if you
understood that, you’ll get this.
-
If you don’t, I’ll translate that maths
language above for you:
Line in weird maths language
|
Line in plain English
|

|
This one’s nice and simple; I’m
just defining a PDA with only one state, .
is the alphabet that the PDA
accepts
is the set of symbols accepted onto the
stack
is the transition function
is the initial stack symbol
Where it says just means that there is no final state;
this PDA will accept by empty stack.
|

|
Again, I’m just defining a grammar.
is the set of non-terminal symbols
(notice how it’s the same as the set of
symbols accepted onto the PDA stack)
is the set of terminal symbols (notice
how it’s the same as the alphabet that the
PDA accepts)
is the set of productions
is the start non-terminal (notice how
it’s the same as the initial stack symbol
in the PDA)
|


|
For every transition that exists in the PDA,
there is also a production in the grammar that
takes in the input stack symbol, and outputs the
input symbol followed by all the stack symbols
put into the stack.
|
-
Just by simply reversing the previous proof, we can show
that a single-state PDA can be converted to a CFG.
-
But not every PDA is a single-state PDA. To complete the
proof, we need to convert any PDA into a single-state
PDA.
-
Now, we will prove that any PDA can be converted to a
single-state PDA.
-
To convert a PDA to a single-state PDA, we need to store
the state information inside the stack (since we’re
not going to be moving to any state).
-
Before we do that, let’s change the PDA a little
bit.
-
Let’s change it so that when the PDA reaches a single final state, it can clear its entire stack. Basically, we’re changing it so the PDA only has one final state.
-
Second, we’re going to introduce a new type of stack
symbol. It looks like this:
-
where
and
-
All the new stack symbols will be in a new set called
-
We do this so we can store state information on the
stack.
-
We’re also going to change
to just
, where
is the name of the only state in the PDA.
-
So, basically, our PDA looks like this:
-
What is
? Well, it’s defined as follows:
-
... and when
, then:
-
To some people (normal people), this is just maths
gibberish; what does this actually mean?
-
Remember that PDAs are also nondeterministic. Here, each
guess the PDA makes, the stack “predicts” what
transitions it’ll go through in the future. If the prediction is correct, it’ll pass. If the
prediction is wrong, that guess will
“die”.
-
In other words, at each stage, we nondeterministically
guess the remainder of the computation of the automaton and
then verify it.
-
This is just a way of encoding the non-deterministic nature
of the states into the stack.
-
This is really hard to visualise just by reading the
descriptions, so let’s go through an example.
-
We have a PDA here that maps out the language
:
-
Before we pick out transitions, let’s have a look at
all the possible stack symbols:
-
You can think of each of these as like a possible
transition that the PDA could take.
-
For example,
could refer to the transition going from state 1 to 1,
accepting
on the stack.
-
Yes, there are some transitions that do not exist, like
or
, but they could exist. The stack doesn’t know what
transitions exist, so it guesses with all possible
combinations of transitions. The guesses that predict
transitions that do exist will live and are more likely to
predict everything correctly, and the guesses that predict
transitions that do not exist end up dying.
-
Let’s pick out a transition from this and convert it
to its single-state equivalent.
-
Let’s pick the transition that goes from state 0 to 0
with ⊥ on the stack:
-
So if we compute that transition, our single-state PDA will
compute 3 parallel guesses, depending on what’s on the
stack.
-
Let’s pick out a simpler transition and convert
that:
-
Since this transition will only pop from the stack and not
push, it’ll only generate one transition relation for
our single-state PDA:
-
So in our single-state PDA, if it sees
on the stack, that means the guess’ prediction was
correct and the guess will live. If it does not see that on
the stack, the guess’ prediction was wrong and will
die.
-
Now that we’ve defined these, let’s try inputting a string into this multi-state PDA, and see how its single-state equivalent works by comparing the two.
Multi-state PDA
|
Single-state PDA guesses
|
Description
|

|

|
We haven’t really done anything
yet.
The initial stack symbol in a single-state PDA
is always:

This is possible because there’s only one
initial state and, like what we’ve defined
before, only one final state.
|

|



|
We’ve inputted an ‘a’ as
input, and now our single-state PDA has started
guessing!
As you’ve (hopefully) read above, three guesses will be made in this situation; a ⊥ was read from the stack and an ‘a’ input was read, and now the following three guesses have been established.
These three guesses go like this:
What if we go from state 0 to state 0 using
‘a’ on the stack, then go from state
0 to state 2 using floor?
What if we go from state 0 to state 1 using
‘a’ on the stack, then go from state
1 to state 2 using floor?
What if we go from state 0 to state 2 using
‘a’ on the stack, then go from state
2 to state 2 using floor?
I admit, some of these guesses are a little
(really) dumb, but can you blame them? The stack
doesn’t know what transitions exist; it
doesn’t know that there isn’t a
transition from state 0 to state 2. So what does
it do? It guesses. It guesses every single
combination so that if it gets something wrong,
it can fall back on another guess. If all the
guesses are wrong, then there’s no way the
PDA can accept the string.
|

|







|
Wow! Look at all those guesses!
We’ve inputted another ‘a’
into our PDAs, and now, we’re
exponentially making guesses. None of our
guesses have died yet.
Because we’re pushing more than
we’re popping, we’re making more
guesses about what we want to do next; there are
more possibilities now.
The transition here is slightly different,
because we’re not using floor ⊥, but
the concept is still the same.
What? You want to see how this transition is converted to its single-state PDA equivalent? Well, alright, but only because you asked so nicely:

Converts to:









|

|







-------------------------------------



|
Oh no! The guesses! They’re dying!
We’ve taken the transition:

...which converts into:

So, what the single-state PDA is doing is,
it’s looking for a on the stack. If it’s there, the
guess is right, it just pops it off and
doesn’t push anything else on. If
it’s not there, the guess is wrong and the
guess dies.
As you can see, 6 guesses died, leaving only 3
guesses from each coloured partition.
The top guesses show which ones survived and
which ones died from the previous configuration
(the gothy red and black ones are the ones that
died).
The bottom 3 guesses shows the top 3 living
guesses after their ‘b’s have been
inputted and stacks have been popped.
|

|



-------------------------------------

|
We’ve taken the transition:

... which converts to:

So, like before, it looks for a on top of the stack. If it’s there, we
live. If it’s not there, we die.
There’s only one guess here with a ! This guess has correctly predicted every move
that the PDA has done. Congratulations,
guess!
So now, we pop off and continue.
The existence of this guess is enough to accept
the inputted string, right?
Almost. There’s one formality we need to
take care of first: we need to move to state 2,
the final state, to properly conclude this. Will
this last-standing guess be able to conquer the
final step by predicting the final transition
from state 1 to state 2?
Spoilers: it does
|

|

|
It should be needless to say that this final
transition looks for a to pop, and finds one within the final
guess.
We can now accept by empty stack. This string
is within the language of this PDA, and this
single-state PDA is equivalent to this
multi-state PDA!
|
-
If it helps to visualise it better, here’s the
successful guess from start to finish:
Proving closure
Union
-
CFLs are closed under union
-
This means that the language of the union of two CFGs is
also going to be context-free
-
We are going to create a grammar that is the union of two
other grammars.
-
We can assume that the non-terminals of our two grammars
are going to be distinct, meaning that we don’t have
non-terminals that appear in both grammars.
-
If we do, we are just going to rename them; it won’t change the grammar.
-
Let’s introduce a new start symbol and a new
production for this new start symbol.
-
The new production is going to link our new start symbol to the old start symbols from our two grammars like this:
S → S₁ | S₂
Where S₁ is the start symbol of the first grammar and S₂ is the start symbol of the second grammar
-
As for the other non-terminals and the other productions,
we are just copying them from what we have in our two other
grammars.
-
Now we can generate everything that the first grammar is able to produce by deriving S₁, and we can also generate everything that the second grammar is able to produce by deriving S₂
Concatenation
-
CFLs are closed under concatenation
-
This means that the language of the concatenation of two
CFGs is also going to be context-free
-
Proving this is exactly the same as proving the union, just that our start production is going to be different:
S → S₁S₂
Where S₁ is the start symbol of the first grammar and S₂ is the start symbol of the second grammar
-
Now we can generate everything that the first grammar is able to produce by deriving S₁, and then we can generate everything that the second grammar is able to produce by deriving S₂
Kleene star
-
CFLs are closed under Kleene star
-
This means that the language of 0 or more repetitions of a
CFG is also going to be context-free.
-
We are going to create a grammar that represents the Kleene star of another grammar.
-
Let’s introduce a new start symbol and a new
production for this new start symbol.
-
The new production is going to look like this:
S → ε | S₁S
Where S₁ is the start symbol of the other grammar.
-
As for the other non-terminals and the other productions,
we are just copying them from what we have in the other
grammar.
-
Now we can produce 0 or more repetitions of everything that the other grammar is able to produce by deriving ε or S₁S from S
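Putting the three constructions side by side (S is the new start symbol, S₁ and S₂ the old ones):

    Union:          S → S₁ | S₂
    Concatenation:  S → S₁S₂
    Kleene star:    S → ε | S₁S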
Intersection with regular languages
-
CFLs are closed under the intersection with regular
languages
-
This means that the language of a CFG intersected with a
regular language is going to be context-free.
-
In contrast, the intersection of two CFLs is not always context-free.
-
Idea: Product construction with PDA and DFA.
-
The slides don’t go into more detail than that, but I
will.
-
Let’s give our automata names, because that makes them easier to work with. Let’s call our DFA D and our PDA P.
-
DFAs are just weak PDAs, so we can convert our DFA into a PDA. It’s basically just going to be a PDA that doesn’t use its stack and doesn’t use any non-deterministic features. Let’s call this D′, where L(D′) = L(D).
-
So now we have two PDAs, D′ and P. We need to perform product construction on these, so that we can construct a PDA that accepts strings accepted by both D′ and P (because that’s what intersection means).
-
Let’s call this product-constructed PDA P′. Normally, product construction with PDAs doesn’t work, because the two machines interfere with each other on the stack (they have to share a stack). However, this case is different. How do we know that D′ and P don’t interfere with each other on the stack?
-
We know they don’t interfere because D′ doesn’t even use the stack! Remember, it’s just a DFA calling itself a PDA; it doesn’t even use the stack. Heck, it can’t even use any non-deterministic features, or any epsilon-moves. The stack is completely free for P to use.
-
Because we’ve constructed a PDA P′ that accepts the intersection of a context-free language and a regular language, we can convert this PDA into a context-free grammar, showing that the language represented by this PDA is context-free, therefore showing that the intersection with regular languages is closed.
Limits of context-free languages
-
Are all languages context free? No!
-
E.g. the language { aⁿbⁿcⁿ | n ≥ 0 } is not context-free, meaning that we cannot construct a PDA that accepts or a CFG that produces this language. In fact, we would need a Turing machine to accept this language, which we will see later.
-
Assume our grammar is in CNF
-
Remember that we can convert any grammar to CNF if it
isn’t
-
I’m going to use the following CNF for further
explanations:



-

Which represents the language
-
If we start deriving a string using our CNF, we can draw
the process of deriving the string as a binary tree.
-
For instance, let’s derive
and draw the derivations as a binary tree:
-
We can see that in the middle path of the tree, we have a
long path of non-terminals. We will have this in every
derivation which is long enough (we’ll define what
long enough means below).
-
We can also see that some non-terminals are repeated.
-
If we go one step down in the tree, we are going to at most
double its width
-
This is because our grammar is in CNF, which allows us to
derive at most 2 non-terminals in a single rule.
-
Therefore, the derivation tree of a string of size n must have a depth of at least log₂ n
-
Suppose that there are k non-terminals. In any derivation tree of a string of length at least 2ᵏ there must be a path where some non-terminal appears at least twice, i.e. is repeated.
-
The derivation tree of a string of size at least 2ᵏ must have a depth of at least k + 1
-
However, we only have k non-terminals, therefore along such a path we need to repeat at least one of our k non-terminals.
-
If we see a repeated non-terminal, we can start pumping
this non-terminal into the tree.
-
Here, we split our string into the parts uvwxy, where the subtree rooted at the blue A covers vwx and the subtree rooted at the red A covers w only. Everything outside of the blue A will be covered by u and y
-
Now, we are copying the subtree of the blue A and pasting it into the position of the red A.
-
Every time we do this, we are essentially adding another part v and another part x to our tree.
-
We can also replace the subtree of the blue A with the subtree of the red A. By doing this we are taking v and x out of the tree completely.
-
This act of ‘removing’ and ‘pumping in’ v and x forms the basis for the pumping lemma for context-free languages.
Pumping (🎃) Lemma for context-free languages -
contrapositive form
-
We can use the contrapositive form of the pumping lemma for
context free languages to show that a language is not
context free. We cannot use the pumping lemma to show that a
language is context free!
-
It works by splitting up the language into parts, then
“pumping” a subpart until you get a string that
isn’t in the language. This is similar to what we have
just done before.
-
The different steps of the proof can be understood as a
game versus a demon.
-
The demon chooses a number k such that k ≥ 0.
-
We cannot choose a value for k, we only know that it is greater than or equal to 0.
-
We choose a string z ∈ L such that |z| ≥ k.
-
We can choose any string z that is within the language. However, we have to make sure that, no matter what k is, our string z is always in the language. We do not need to cover the whole language with our string!
-
The length of z must be at least k. A way to make sure that this is always the case is to have at least a part in z that is equal to something to the power of k.
-
As a general tip, try to make
as trivial as possible, like in the following
example:
-
For instance, we could include a part like aᵏ in z to make sure that z is at least k long (where a is a symbol from the alphabet).
-
The demon splits z into uvwxy such that |vx| ≥ 1, i.e. v and x are not both empty, and |vwx| ≤ k
-
You cannot choose how the string is going to be split!
-
What you do know is that the vx part of the string is not going to be empty. However, one of v or x can still be empty.
-
You also know that the length of vwx is at most k.
-
You can think of vwx as a window you can slide across what you defined as z.
-
Therefore, we have to list all possible cases in which z can be split and proceed with each of them to the next step.
-
For instance, if z = aᵏbᵏcᵏ, we have to consider the following cases:
-
Case 1: vwx contains only one repeated letter: either only a’s, or only b’s, or only c’s
-
Case 2: either v or x (but not both) is a word that contains more than one kind of symbol.
-
Case 3: v and x do not have any symbols in common.
-
It cannot be the case that vwx contains all three letters, because the length of vwx is at most k.
-
We pick an i such that the string u vⁱ w xⁱ y ∉ L. If this is the case, we have proven that the language is not context-free!
-
We can pick any i we want as long as it is equal to or greater than 0.
-
Our goal is to choose an i such that the resulting string is outside of the language.
-
We have to do this for every case we defined in the
previous step.
-
Here is how to do it for the example cases above:
-
Let i = 2
-
Case 1: vwx contains only one repeated letter: then u v² w x² y is going to have either too many a’s, b’s or c’s
-
Case 2: either v or x contains two different symbols: then u v² w x² y is going to have too many of two of a, b or c, and the letters will be out of order (not of the form a*b*c*).
-
Case 3: v does not contain the same symbol as x: then u v² w x² y is going to have too many of two of a, b or c.
-
You MUST say “Therefore L is not context-free”, if you
don’t, you will lose marks.
-
If this still doesn’t make any sense, look at a few
examples below, and then read the general case above
again.
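Compactly, the contrapositive pumping lemma used above (the standard statement, with the usual names): L is not context-free if, for the demon’s k, we can pick some z ∈ L with |z| ≥ k such that for every split z = uvwxy with |vx| ≥ 1 and |vwx| ≤ k, there is some i ≥ 0 with u vⁱ w xⁱ y ∉ L.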
Example
|
Solution
|

|
-
The demon chooses a number
such that .
-
We choose a string
such that  .
-
Let’s choose
.
-
We can see that
 .
-
The demon splits
into such that , i.e. , and
-
Let’s consider all the different
cases:
-
Case 1:
contains only one repeated letter:
Either only ’s, or only ’s, or only ’s
-
Case 2: either
or (but not both) is a word that
contains more than one kind of letter.
-
Case 3:
and do not have any letters in
common.
-
We pick an
such that the string
-
pick
, then
-
Case 1:
-
Our resulting word will have one letter
repeated more often than the other two
letters.
-
Our resulting word is not going to be of
the form a*b*c*.
-
Our resulting word is going to have more of
two of the letters but the number of
occurrences of the third letter remains
unchanged.
-
Therefore, L is not context-free
|

|
-
The demon chooses a number
such that .
-
We choose a string
such that  .
-
Let’s choose
.
-
We can see that
 .
-
The demon splits
into such that , i.e. , and
-
Let’s consider all the different
cases:
-
Case 1:
contains only one repeated letter:
Either only ’s, or only ’s.
-
Case 2: either
or (but not both) is a word that
contains more than one kind of letter.
-
Case 3:
and do not have any letters in
common.
-
We pick an
such that the string
-
Pick
, then
-
Case 1:
-
We are changing the number of either the
’s or the ’s. Therefore, the number of ’s won’t be eight times the
number of ’s anymore.
-
Our resulting word is not going to be of
the form a*b*a*.
-
We are changing the number of either the
first
’s and the ’s or the ’s and the second ’s. Therefore, the number of the first ’s won’t match with the number of
the second ’s anymore.
-
Therefore, L is not context-free
|


|
-
The demon chooses a number
such that .
-
We choose a string
such that  .
-
Let’s choose
.
-
We can see that
 .
-
The demon splits
into such that , i.e. , and
-
This time we only have to consider one case
as we only have one letter
in our window
-
We know that we have at least 1 and at most
’s in our window, which we are going
to pump in the next step.
-
We pick an
such that the string
-
pick
-
where we define l to be
-
Define the bounds of
to be
-
This is because
and when
-
I know that
because its length, , cannot be expressed as some number .
-
This is because
is going to be strictly between and :
-
If you have doubts, check the bounds of
again and also try to plug in and (imagine you are the demon)
-
Therefore, L is not context-free
|
Computability theory
Turing machines
-
With DFAs and NFAs, we had states and transitions.
-
With PDAs, we had states, transitions and a stack.
-
Now, with Turing machines, we have states, transitions and a “tape”, which is basically a linked list storing the input string, one symbol at each node.
-
With this new data structure at our disposal, we can do
even more things than PDAs.
-
In fact, according to the Church-Turing thesis, Turing
machines (TMs, not to be confused with Pokémon TMs)
are just as powerful as algorithms.
-
However, TMs do not cover all languages. They’re not that good. See Undecidable Problems for languages not covered by
TMs.
-
So, what exactly are TMs? I’ll go through them informally and then formally.
- Informally
-
They’re basically like PDAs, but instead of a stack,
you get a tape, where you can move left or right on each
transition.
-
You can think of the tape as like a doubly linked list,
where you can traverse left and right.
-
The input string is on the tape, and we start at the far
left and usually work our way to the right.
-
Here is an example of a tape in the middle of a TM
execution:
-
The ⊢ symbol means the ‘start’,
it’s a left marker.
-
The ⊔ symbol means ‘empty’, there’s
nothing here.
-
The ⊣ symbol means the ‘end’, it’s
a right marker.
-
The part that’s shaded in red is the current position we’re on, and the other blue parts are all the other positions.
-
Formally
-
Formally, a TM is a 9-tuple:
M = (Q, Σ, Γ, ⊢, ⊔, δ, s, t, r)
- where
-
Q is the finite set of states
-
|Q| ≥ 3 : accept, reject and start states as well as other states
-
Σ is the input alphabet
-
Γ is the tape alphabet (Σ ⊊ Γ : the input alphabet is a proper subset of the tape alphabet)
-
⊢ is the left endmarker (left endmarkers are part of the tape alphabet, but not the input alphabet)
-
⊔ is the blank symbol (blanks are part of the tape alphabet, but not the input alphabet)
-
δ is the transition function
-
In English, it’s in the form δ(p, a) = (q, b, d), where
-
p is a possible state that is not the accept / reject state
-
a is a tape symbol that must be at our current position on the tape
-
q is a state to go to
-
b is a tape symbol to write to the tape
-
d is either left (L) or right (R), to traverse the tape
-
s is the start state
-
t is the accept state
-
r is the reject state
-
When performing transitions, we cannot overwrite ⊢ and we cannot move left if we’re at the left endmarker.
-
We can write any element of Γ onto the tape (including ⊢ and ⊔).
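To tie the 9-tuple to something you can run, here is a minimal Python sketch of my own (not from the notes); the example machine, which accepts strings of a’s of even length, is hypothetical:

LEFT, BLANK = "⊢", "⊔"

def run(delta, s, t, r, w, max_steps=10_000):
    # tape = left endmarker, then the input, then blanks as needed
    tape = [LEFT] + list(w) + [BLANK]
    state, pos = s, 0
    for _ in range(max_steps):          # guard: a non-total TM may loop forever
        if state == t:
            return True                 # reached the accept state t
        if state == r:
            return False                # reached the reject state r
        if pos == len(tape):
            tape.append(BLANK)          # the tape is unbounded to the right
        state, write, move = delta[(state, tape[pos])]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    raise RuntimeError("no verdict yet; the machine may be looping")

delta = {
    ("even", LEFT):  ("even", LEFT, "R"),   # never overwrite ⊢, just move right
    ("even", "a"):   ("odd",  "a",  "R"),
    ("odd",  "a"):   ("even", "a",  "R"),
    ("even", BLANK): ("acc",  BLANK, "R"),  # saw an even number of a's: accept
    ("odd",  BLANK): ("rej",  BLANK, "R"),  # odd number of a's: reject
}
# run(delta, "even", "acc", "rej", "aa")  -> True
# run(delta, "even", "acc", "rej", "aaa") -> False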
Decidability
Recursive and R.E languages
-
A total Turing machine is a TM that halts on every input (does not loop infinitely)
-
A Turing machine that isn’t total can hang (like Microsoft Visual Studio)
-
A language is recursive if there exists a total Turing machine that accepts
it.
-
A language is recursively enumerable (r.e) if there exists a Turing machine that accepts
all of the strings in that language.
-
Remember that all recursive languages are recursively
enumerable: R ⊂ R.E
Proof: Recursive Sets Closed under Complement
-
Recursive sets are closed under complement (if L is recursive, ~L is also recursive)
-
Remember how we proved DFAs are closed under complement?
It’s the same ordeal here.
-
Let’s say we have a language ‘A’ that is
accepted by a TM ‘M’. If we can construct a TM
for ~A, then we can prove this lemma.
-
Just swap the accept state and the reject state. This way, the new TM will accept exactly the strings that M rejects, i.e. it will accept ~A.
-
In other words, whenever A accepts something, ~A will
reject it, and whenever A rejects something, ~A will accept
it. That’s literally the definition of a
complement!
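In terms of the run() sketch from the Turing machines section above, the construction is tiny (again, my own illustration):

def complement(tm):
    # tm = (delta, start, accept, reject); keep the transitions exactly
    # the same, and just swap the accept and reject states
    delta, s, t, r = tm
    return (delta, s, r, t)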
-
Lemma: If L and ~L are r.e, then L is recursive
-
Let’s say that the TMs ‘M1’ accepts
‘L’ and ‘M2’ accepts
‘~L’.
-
If we can construct a total TM ‘N’ that
simulates both M1 and M2, then we can show that this lemma
is true.
-
Why the existence of N proves this lemma:
-
M1 accepts everything in L, so it never hangs for strings
in L.
-
M2 accepts everything in ~L, so it never hangs for strings
that are not in L.
-
When ‘N’ simulates M1 and M2, that means if M1 accepts, N accepts, and if M2 accepts, N rejects, so the language of N is the same as M1’s.
-
But M1 never hangs for L, and M2 never hangs for anything
not in L. If ‘N’ will halt when either M1 or M2
halts, that means ‘N’ will never hang for L and never hang for anything not in L. But that just means ‘N’ will halt on any string!
-
Therefore ‘N’ won’t hang on any string, making the language L recursive.
-
Constructing N:
-
We need to create a TM ‘N’ that simulates both
M1 and M2.
-
Remember subset construction back in Automata theory? Well,
this is similar to that.
-
Basically, we have two tapes, stuck to each other. The top tape simulates M1 (accepting L), and the bottom tape simulates M2 (accepting ~L).
-
But how do we know what position M1 and M2 are at?
Currently, we can only select
and
together, where
is the same number for both ‘a’ and
‘b’, like
or
.
-
That’s why we introduce a ‘hat’:
-
Now we know what position M1 and M2 are on! For example, if
our simulation of M1 is on the tape element
and our simulation for M2 is on the tape element
, we can represent that on our tape like so:
-
Now that our tape is all set up, all we have to do is set
the instructions for the TM to follow:
-
Scans the tape to find a symbol with a hat in the
"upper" section of the tape. Then, according to
the transitions of M1, it performs M1’s move.
-
If M1 accepts then N accepts;
-
Scans the tape to find a symbol with a hat in the
"lower" section of the tape. Then, according to
the transitions of M2 , it performs M2’s move.
-
If M2 accepts then N rejects;
-
Go back to step 1
-
The states work just like subset construction; we have tuples as states, in the form:
-
(p, q)
-
Where p is a state in M1 and q is a state in M2.
-
To complete this formal proof, all we need now is a short
statement to show that ‘N’ is total:
-
“Any string x is either in L or ~L: so it is accepted by either M1 or M2. So N will eventually accept or reject, thus N is total and so L is recursive.”
-
Just like that, the proof is done!
-
Lemma: If L is r.e and not recursive, ~L is not r.e
-
Let’s say TM ‘M1’ accepts
‘L’. If ‘L’ is r.e and not
recursive, then that means ‘M1’ loops on some
strings that are not in L (which are instead in ~L)
-
Let’s try to construct a TM ‘M2’ that accepts ~L. This Turing Machine would have to accept all strings that are not in L.
-
But if such an M2 existed, then both L and ~L would be r.e, and by the previous lemma L would then be recursive. That contradicts our assumption that L is not recursive, so no Turing Machine can accept ~L, making ~L not recursively enumerable.
(Semi) Decidable properties
-
First of all, let’s define what a property is. A property of a string is a predicate that tells you
something about the string. It is either true or false.
-
A few examples are:
-
The string has length 2 : {w | #w = 2}
-
There is a letter ‘g’ in the string : {w | ‘g’ ∈ w}
-
There is an even number of letters in the string : {w | #w
mod 2 = 0}
-
The string contains the substring “jojo” : {w | “jojo” ∈ w}
-
A property is decidable if all strings with that property form a language that is
recursive.
-
For example: the property that the given string is equal to
“duwang” (because all finite languages are
recursive)
-
A property is semi-decidable if all strings with that property form a language that is
recursively enumerable.
-
For example: the property that the given string is an
encoding for a Turing Machine that halts (you can simulate
the TM and accept if it halts, but if it loops, then your
simulation will loop, making this recursively enumerable but
not recursive).
-
semi-decidable ⇔ recursively enumerable
-
decidable ⇔ recursive
Universal Turing machines
Multiple tapes
-
Are TMs with multiple tapes more powerful than TMs with
only one tape?
-
No! Remember the section “Recursive and R.E
languages” where we constructed a dual-tape TM out of
a single-tape TM?
-
We can apply the same logic, but with any number of
tapes.
-
If we can simulate a multi-tape TM using a single-tape TM, that means multi-tape TMs are no more powerful than single-tape TMs.
Simulating Turing machines
-
Turing machines are so powerful that they can simulate
other Turing machines! Think of it like a virtual machine,
like VMware or VirtualBox.
-
These Turing machines are called Universal Turing machines
(UTM).
Weird maths language
|
Plain English
|
There exists a TM U such that L(U) = { M#x | x ∈ L(M) }
Where M#x is an encoding of the TM ‘M’, followed by a ‘#’, followed by an encoding of ‘x’ in M’s input alphabet.
|
Like all Turing machines, they have a string
input.
Universal Turing machines only accept strings
that fit this pattern:

... which is an encoding of the TM
‘M’, a separating character (in this
case #), and the encoding of the input string
‘x’.
First of all, the UTM checks if M#x is a valid
encoding. If it isn’t, it’ll reject
it straight away.
If the TM ‘M’ accepts
‘x’, then the UTM will accept
M#x
If the TM ‘M’ rejects
‘x’, then the UTM will reject
M#x
The UTM does this by simulating M on
‘x’. If it finds that M accepts,
then the UTM will accept. If it finds that M
rejects, then the UTM will reject.
|
-
We use UTMs without even thinking about it, for example
interpreters (for Java, Python, Ruby etc.) and virtual
machines.
Encoding Turing machines
-
So how do you actually encode a TM and an input into
M#x?
-
Well, the details aren’t really important, but
there’s a method from the Kozen book that converts all
TM parameters into numbers, and then encodes them into one
big number like this:

- where:
-
is the set of states
-
is the tape alphabet where the first k numbers are the
input alphabet
-
Start, accept and reject states are
,
and
respectively
-
Blank symbol is
and the endmarker is
-
So you use the number of zeroes to determine what state /
alphabet character the encoding is referring to.
-
To encode the transitions, this encoding could be followed
up by:

-
... which is the encoded form of
, or

-
... which is the encoded form of
Constructing a UTM
-
First of all, a UTM checks if the input string M#x is of
the right encoding.
-
The rules for encoding could be anything; there’s
lots of ways to do it. One of which is defined in the
previous section.
-
Second, we use three tapes to store:
-
Description of M
-
Contents of M’s tape
-
M’s current state and position on the tape
-
Third, the UTM will loop through the following steps:
-
looks at M’s current state and head position (tape
3);
-
reads the tape contents at the correct position (tape
2);
-
reads the relevant transition (tape 1);
-
simulates transition, updating tape, state and head
position;
-
accepts if M accepts, rejects if M rejects.
Halting problem
-
The Halting problem goes like this:
-
Can you come up with a TM that takes in an encoding M#x and
accept if M halts on x and rejects if M loops on x?
-
Or, more formally...
-
Is the set HP = { M#x | M halts on x } recursive?
- No.
Proof by contradiction
-
Here’s an intuitive proof before we move onto
diagonalisation.
-
Ever heard of the Pinocchio paradox?
-
What happens when Pinocchio says “My nose will
grow?”
-
If it’s a lie, it won’t grow. But if it
doesn’t grow, Pinocchio will be lying, so it will
grow. But then he’s telling the truth, so it
won’t grow, and we’re just going around in
circles.
-
This proof is just like that.
-
Let’s say we have a TM called K that takes in an
encoding M#x and accepts if it halts, and rejects if it
loops. Basically, K solves the HP.
-
Let’s create another TM called P that takes in an
encoding of M, and runs K on M#M. It goes into an infinite
loop if K accepts, and halts if K rejects.
-
What if we run P on P? Will it loop or halt?
-
If it loops, that means P halts. But if P halts, that means
it has to loop. We’re going around in circles again;
there is a contradiction!
-
If that’s a bit too wordy for you, here’s some
pseudocode showing this:
P on M:
    call K on M#M
    if K says halt --> go into infinite loop
    if K says doesn't halt --> halt

P(P) = ?
Proof by diagonalisation
-
We can also show how P is going to behave via
diagonalisation
-
We can define the output of our TM K, which is supposed to
decide the HP, by constructing a table
-
If K exists then we will be able to list the halting
behaviour of all TMs in this table
        | ε | 0 | 1 | 00 | 01 | 10 | 11 ...
M1      | H | L | L | H  | L  | H  | L
M2      | L | L | H | H  | L  | H  | L
M3      | H | L | L | L  | L  | L  | L
M4 ...  | L | H | H | L  | H  | L  | H
(H = halts, L = loops; the rows M1, M2, … list out all the possible TMs)
-
Where the left column is all the possible Turing machines and the top row is all the possible inputs.
-
We can create the TM P by taking a diagonal snippet of this
table:
        | ε   | 0   | 1   | 00  | 01 | 10 | 11 ...
M1      | [H] | L   | L   | H   | L  | H  | L
M2      | L   | [L] | H   | H   | L  | H  | L
M3      | H   | L   | [L] | L   | L  | L  | L
M4 ...  | L   | H   | H   | [L] | H  | L  | H
(the bracketed cells are the diagonal snippet)
-
Then, we flip all of the cells. Our TM P behaves like
this:
        | ε | 0 | 1 | 00 | 01 | 10 | 11 ...
P       | L | H | H | H  | H  | L  | H
-
This new TM P can’t fit anywhere on our original table, because it disagrees with row i on the i-th input.
-
The TM K cannot exist, because if K existed, P would be a real TM and would have to appear somewhere on our table of all TMs, but it doesn’t.
-
Examples of how the table entries for P come about:
P on M1:
    call K on M1#M1
    according to the table, K returns H
    therefore we go into an infinite loop
P on M2:
    call K on M2#M2
    according to the table, K returns L
    therefore we halt
-
So we’ve proved that there’s no total Turing
machine for this. However, the set HP is recursively
enumerable.
-
Why? Because if the TM simulates M#x and it halts, the TM
will accept this encoding. On all input strings that the TM
accepts, the TM will accept it and not loop (because
that’s the definition of an accepting string; it
won’t loop).
-
So HP is recursively enumerable, but not recursive. That
also means ~HP is not recursively enumerable (see proof in
Recursive and R.E languages).
-
If ~HP is not recursively enumerable, there is no way of
knowing if a Turing machine will loop on an input string or
not.
Decidable / Undecidable problems
-
A decidable problem is a decision problem for which there exists a
total TM that decides it.
-
The total TM will accept for all yes/true instances and
reject for all no/false instances of the problem.
-
Therefore, it will never loop for any input.
-
An undecidable problem is a decision problem for which there does not exist a total TM that decides it.
-
However, there might exist a TM that accepts yes/true
instances and loops or rejects on no/false instances of the
problem, making it semi-decidable.
-
Ever heard of the membership problem? It goes like this:
A = { M#x | x ∈ L(M) }
-
Basically, can you make a TM that decides if a TM M will
accept an input x?
-
Sounds simple, right? Well, it’s actually impossible.
It’s an undecidable problem.
-
Why is the membership problem undecidable?
-
Because if it was decidable, you’d be able to solve
the Halting problem, which we know cannot be solved because
of our proof from before!
If we had a TM K that solved the membership problem, we could
create a TM N that would solve the Halting problem.
-
Let’s construct a new TM N with input M#x:
-
Constructs a new TM M’ which will change our existing
machine M: All reject states become accept states, and all
other states stay the same as before. Basically, M’
will accept iff M halts on x.
-
Simulates K on M’#x. If K accepts, then make N
accept. If K rejects, then make N reject.
-
Here, N solves the Halting problem because it’ll
accept if M halts on x and reject if M loops on x.
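In the same pseudocode style as the Halting problem proof, here is my sketch of the construction above:

N on M#x:
    build M' from M by turning every reject state into an accept state
    call K on M'#x
    if K accepts --> accept    (M halts on x)
    if K rejects --> reject    (M loops on x)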
-
We know that the Halting problem is undecidable, so that
means the membership problem must be undecidable.
-
By doing this, we’ve shown that the membership
problem is just as hard as the Halting problem, because we
can use the membership problem to solve the Halting
problem.
-
This is called a reduction, at which we will take a closer
look soon.
Reductions
-
“Reduction” is an operation between two
languages.
-
It is used to show the decidability of a language.
-
For example, in the previous section we
“reduced” the Halting problem to the membership
problem, thus showing that the membership problem is
undecidable (because we know the Halting problem is
undecidable).
-
To represent this, we say that HP ≤ MP or, equivalently, using the set A defined in the previous section, HP ≤ A.
-
Informally, a reduction is where you map one language to
another language, so if you solve the second language with a
TM, you can use that solution to solve the first
language.
-
Like what we did with MP and HP; we reduced HP to MP, so if
we solve MP, we can solve HP.
-
Formally, a function f is computable when there exists a total TM ‘K’ that, when started with x on its tape, eventually halts with f(x) on its tape.
-
A reduction of A to B is a computable function f s.t. x ∈ A ⇔ f(x) ∈ B
-
So a reduction function maps the inputs of one TM to
another TM.
-
For example, remember the previous section when we showed
how to solve the Halting problem if we had a TM K that could
solve the membership problem by creating a TM N that could
solve the Halting problem?
-
Here’s a diagram of what the TM N actually
does:
-
What N actually does is it takes in a Halting problem
input, and uses the reduction function to turn it into a
membership problem input.
-
That’s all reduction is! It’s just a function
converting input of one problem into an equivalent
other.
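As a tiny Python sketch (mine, just to make the shape of the argument concrete):

# Given a computable reduction f from A to B and a machine deciding B,
# deciding A is just composition: x ∈ A  <=>  f(x) ∈ B.
def decider_for_A(f, decide_B):
    return lambda x: decide_B(f(x))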
-
Let’s pluck out that reduction function and take a
further look:
-
Let’s say we have our TM for the Halting problem N
and our TM for the membership problem K again.
-
If we run N on M#x, it should yield the same result as
running K on M’#x.
-
In summary, just think this:
-
If you see A ≤ B, then you can convert the problem A
into the problem B.
-
Let’s look at some more examples
State Entry Problem
(SEP)
|
Decide whether a TM M’ enters a given
state q on input x
|
Assume that we already have a TM K that decides
the SEP. Create a TM N which decides the HP
using a reduction from the HP to the SEP.
|
N on inputs M and x (e.g. as an encoding
M#x):
-
Construct a new TM M’ from M
-
Add a new state q which is not in M
-
From all halting states add a new
transition to q
-
Make all halting states non-halting
states
-
Simulate K on inputs M', q, x (e.g. as
an encoding M’#q#x)
-
Accept if K accepts, reject if K
rejects
|
|
|
Since we know that the HP is undecidable, the
SEP must also be undecidable
|
Blank Tape Halting Problem (BTHP)
|
Decide whether a TM M’ halts on the blank
tape (no input).
Decide whether a TM M’ halts on input
epsilon.
|
Assume that we already have a TM K that decides
the BTHP. Create a TM N which decides the HP
using a reduction from the HP to the BTHP.
|
N on inputs M and x (e.g. as an encoding
M#x):
-
Construct a new TM M’ from M
-
Ignore any input
-
Write x on the blank tape and reset the
tape head
-
Execute M with x on the tape and return the
result
-
Simulate K on input M’
-
Accept if K accepts, reject if K
rejects
|
|
If you are confused about M’:
-
M’ is a special version of M
-
M’ has the same instructions as M,
however, it has some instructions prepended
to the other instructions
-
We will ignore any input given to
M’
-
We need to write x on the tape and reset
the tape head to the start
-
So all in all we have a TM M’ which
has x hard coded in its instructions
|
Since we know that the HP is undecidable, the
BTHP must also be undecidable
|
Emptiness Problem
(EP)
|
Semi-decide whether the language of a TM
M’ is the empty set
|
Assume that we already have a TM K that
semi-decides the EP. Create a TM N which
semi-decides the LP (complement of HP) using a
reduction from the LP to the EP.
|
N on inputs M and x (e.g. as an encoding
M#x):
-
Construct a new TM M’ from M
-
Ignore any input
-
Write x on the blank tape and reset the
tape head
-
Execute M with x on the tape
-
Accept if the execution ends
-
Simulate K on input M’
-
Accept if K accepts
|
|
If you are confused about M’:
-
Similarly to the last example M’ will
ignore any input and write x to the tape, so
again our TM M’ comes with the x
hard-coded
-
Now, if M loops on x, then M’ will
also loop and never accept. Therefore,
L(M’) is the empty set.
-
If M doesn’t loop on x, then M’
will accept and its language is clearly not
equal to the empty set
|
Since we know that the LP is not semi-decidable, the EP cannot be semi-decidable either.
|
Same
Language
Problem
(SLP)
|
Decide whether TMs M1 and M2 accept the same
language.
|
Assume that we already have a TM K that decides
the SLP. Create a TM N which decides the HP
using a reduction from HP to SLP.
|
N on inputs M and x (e.g. as an encoding
M#x):
-
Construct a new TM M1:
-
Simulate M on x
-
If simulation halts accept (no matter what
the input is)
-
Construct a new TM M2:
-
Accept (no matter what the input is)
-
Simulate K on inputs M1 and M2
-
Accept if K accepts, reject if K
rejects
|
This works because
-
M2 always accepts everything, so its
language is
-
M1 either accepts nothing or
everything
-
If the simulation halts, i.e. if M halts on
x, it will accept everything, so its
language is
-
If the simulation doesn’t halt, then
it will loop and run forever. Therefore its
language is the empty set.
-
We can see that the language of M1 and M2
is the same if and only if M halts on
x
-
So when we ask K if M1 and M2 have the same
language it will accept if and only if M
halts on x and reject otherwise
-
We can use this answer to solve the HP,
which is a contradiction because we know
that the HP is undecidable
|
Since we know that the HP is not decidable, the
SLP must be undecidable as well.
|
Properties of reductions
-
If A ≤ B and B is r.e. then so is A. Equivalently, if A
is not r.e. then B is not r.e.
-
An example of this reduction rule is MP ≤ HP.
-
If A ≤ B, then there’s a reduction function from A
to B:
-
If B is r.e, then there exists a TM ‘M’ that
accepts B.
-
To prove A is r.e, we just need to define a TM that accepts
A.
-
This is that TM ‘N’:
-
On an input ‘x’, compute f(x)
-
Run ‘M’ on f(x)
-
If that accepts, make N accept
-
If that rejects, make N reject
-
If A is not r.e, then there is no TM that accepts it.
-
Let’s just say a TM exists for B, called
‘M’.
-
Using the same TM we defined before (N), we can now define
a TM to accept A, if the TM ‘M’ actually
exists.
-
This is a contradiction, so a TM for B cannot exist if A is
not r.e., and therefore B is not r.e.
-
If A ≤ B and B is recursive then so is A. Equivalently,
if A isn’t recursive then neither is B
-
It’s the same proof as before, except now, the TM
‘M’ halts on all inputs.
-
If we follow through the same steps as before, our TM ‘N’ will halt on all inputs too, because it just computes f(x) and runs ‘M’, which always halts. This means that A is recursive when B is.
-
The same goes for B not being recursive if A
isn’t.
-
If A is not recursive, then it’s either r.e or not
r.e. We’ve already proved the case for when A is not
r.e, so let’s prove the case for when A is r.e.
-
Let’s say that B is recursive and has a TM
‘M’ that halts on all inputs.
-
If we have ‘M’, we can create a total TM ‘N’ that halts on all inputs and
determines A.
-
Therefore, B cannot be recursive if A is not
recursive.
Rice’s theorem
-
Does Theory of Computing depress you?
-
Well, this theorem is about to make it worse.
-
Rice’s theorem states that every non-trivial property
of r.e sets is undecidable.
-
So, what does this actually mean?
-
It means that no total TM can decide, given a TM, whether that TM’s language satisfies the property.
-
A r.e set is actually the same as a TM, but we’re
just referring to a TM by its language. By saying r.e set
instead of TM, we ignore things like the TM’s states,
tape etc. and we only focus on the language it
recognises.
-
A property of an r.e set is a property of the TM’s
language. It’s a predicate that tells you about what
the TM recognises.
-
A non-trivial property is a property that isn’t
always true or always false.
-
So what Rice’s theorem means is that if you partition
the set of all r.e sets into two using some condition, both
partitions will be undecidable.
-
Let’s visualise this. We have the set of all r.e
sets:
-
What if I were to split this set into two, using some
property?
-
Let’s say the property is this: “the r.e set
must contain the empty string”.
-
Now we know that neither of the two partitions is recursive any more! This is due to Rice’s theorem.
-
Why are they not recursive?
-
Remember the “Decidable / Undecidable problems”
section where we proved that it’s impossible to have a
total TM that tells if M accepts the empty string?
-
It doesn’t stop at accepting the empty string. Any predicate you can think of that splits the set of all r.e sets is undecidable:
-
Accepts any string at all?
-
Accepts every string?
-
Accepts a given finite language?
-
Accepts a given regular language?
-
Accepts a context-free language?
-
Accepts a recursive language?
-
Accepts a language with the word “davidtyoember” (😍) in it?
-
You may be thinking “Why keep using the term
‘r.e set’? Why not just say any non-trivial
property of a Turing machine is undecidable?”
-
There’s a reason why it’s only a property of a
TM’s language and not a property of the TM
itself.
-
The following are properties of TMs, but not r.e
sets:
-
M has less than 400 states (this isn’t a property of an r.e set because 1. it
queries nothing about the language and 2. you could find two
TMs that recognise the same set, but one having less than 400 states and one having more
than or equal to 400 states)
-
M halts on all inputs (this isn’t a property of an r.e set because 1. it,
again, queries nothing about the language and 2. you could
have two TMs that recognise the same set, one halting on all
inputs and one which doesn’t)
-
So if the concept of r.e sets is a little weird to you,
then don’t worry; just think about it in terms of TMs
and their languages. Instead of a property of a r.e set,
think of a property of a TM’s language.
-
Sure, because of Rice’s theorem, there are lots of
properties that are undecidable, like:
-
membership problem for TMs: does a given TM accept a given
string?
-
emptiness problem for TMs: does a given TM accept any
strings at all?
-
equivalence problem for TMs: given two TMs, do they accept
the same language?
-
... but there are properties that are decidable:
-
membership problem for DFAs/NFAs: does a given DFA/NFA
accept a given string?
-
emptiness problem for DFAs/NFAs: does a given DFA/NFA
accept any strings at all?
-
equivalence problem for DFAs: given two DFAs, do they
accept the same language?
-
membership problem for CFGs
-
emptiness problem for CFGs
-
They’re all decidable because simulating and analysing DFAs, NFAs and CFGs can never loop: each transition spends a bit of the string, and the grammar questions boil down to finite checks.
-
If Rice’s theorem depressed you, don’t worry.
You’ve come this far, and there’s only one topic
left to go. You can do it! I believe in you!
Complexity theory
-
We know that Turing machines and algorithms are equivalent.
However, how do we measure and compare the complexity of
algorithms?
-
Complexity theory is concerned with measuring how the
time/space required by an algorithm/problem increases as the
input gets larger
-
You should remember Big-O notation from Algorithmics (I
hope).
-
In this module, we use Big-O and Big-ϴ, and we use the classes P and NP to categorise
problems.
Big-O and Big-ϴ
-
To define complexity in an algorithm or a problem, we use
Big-O and Big-ϴ notation.
-
The complexity of a problem is the complexity of the best
algorithm that currently exists for this problem.
Big-ϴ
-
If an algorithm is Θ(f(n)) for some function f(n), that means the algorithm’s complexity scales exactly with f(n): it can do no better and no worse than that.
-
For example, the average-case complexity of quick sort is Θ(n log n).
-
If a problem is Θ(f(n)) for some function f(n), that means there exists no algorithm that can solve the problem at a complexity better than f(n).
-
In other words, it means f(n) is the complexity of the best possible algorithm for the problem.
-
For example, the sorting problem has the complexity Θ(n log n), because no (comparison-based) sorting algorithm can do better than n log n (except sleepsort, but that doesn’t count).
-
Formally (weird maths language):
-
We say that f(n) is Θ(g(n)) for some g(n) if there exist reals:
-
c > 0
-
d > 0
-
integer M > 0
-
such that c·g(n) ≤ f(n) ≤ d·g(n) for all n > M.
-
We say that f(n) and g(n) have the same rate of growth.
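-
For instance, f(n) = 3n² + 5n is Θ(n²): taking c = 3, d = 4 and M = 5 gives 3n² ≤ 3n² + 5n ≤ 4n² for all n > 5, because 5n ≤ n² once n ≥ 5.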
Big-O
-
If an algorithm is O(f(n)) for some function f(n), that means the algorithm’s complexity scales, at worst, with f(n), but it could do better in certain cases.
-
For example, the worst-case complexity of bubble sort is O(n²).
-
If a problem is O(f(n)) for some function f(n), that means our best algorithm can solve this problem with complexity f(n). There might exist an algorithm that does better, but we haven’t discovered it yet.
-
In other words, it means f(n) is the complexity of the best known algorithm for the problem.
-
For example, if we had an algorithm that had a worst-case of O(n²), then that means the algorithm will, at worst, perform on the order of n² steps, but may perform better on average-case or best-case.
-
Formally (weird maths language):
-
We say that f(n) is O(g(n)) for another function g(n) if there exist reals c > 0 and integer M > 0:
-
such that f(n) ≤ c·g(n) for all n > M.
-
We say that g(n) is an asymptotic upper bound for
f(n)
The class P
-
The class P is the set of all decision problems that can be
solved by a deterministic Turing machine in polynomial
time.
-
Decision problem → For all inputs we receive a
“yes” or “no” answer
-
Deterministic Turing machine → A Turing machine that
cannot make any choices in terms of the next move
-
This is the model of a Turing machine that we have seen so
far
-
We will see a non-deterministic Turing machine later
-
Solved in polynomial time → The problem is
feasible
-
Some examples that are in P are
-
Searching an unordered list
-
Use sequential search, which is O(n)
-
Searching an ordered list
-
Use binary search, which is O(log n)
-
Testing whether a list is sorted
-
Go through the list sequentially and compare each element
with the element before, which is O(n)
-
An example that is not in P is: Given a first-order
statement about non-negative integer variables (with only
symbols 0, 1, +, = and logical operators allowed), is it
true?
-
We are going to look at some equation with non-negative
integer variables, i.e. variables that can be assigned
natural numbers
-
On the left and right hand side of the equation we can
only
-
use the variables
-
use the “+” operation
-
use the numbers “0”, “1”
-
We will use quantifiers to form the first-order logic
statement, which is going to be either true or false
-
We can use the quantifiers “for all” and/or
“there exists”
-
We must define one quantifier for each variable
-
For all x there exists y such that the equation holds
-
There exists x then for all y there exists z such that the
equation holds
-
This problem is known to be decidable, but infeasible
-
This means that we can always say whether this equation
holds or not
-
However, this cannot be done in polynomial time with a
deterministic machine
Feasible and infeasible problems
-
A problem is said to be feasible if it has an algorithm
with time complexity O(f(n)) for some polynomial f(n).
-
Examples: O(n), O(n log n), O(n²), O(n³)
-
These are problems with polynomial time complexity
-
Otherwise, a problem that can be solved in finite time but
has no polynomial time algorithm is said to be
infeasible
-
Examples: O(2ⁿ), O(3ⁿ), O(n!)
-
These are problems with complexity greater than polynomial
time (exponential and factorial time complexity)
Decision problems
-
A computational problem is called a decision problem if the
output for any problem instance is either “yes”
or “no”.
-
A problem instance is a specific input for a problem
-
For example, the numbers “1”, “4”,
“5”, “8” are problem instances for
an algorithm that determines whether a number is an even
number or not
-
This is a decision problem because, given a number, the algorithm can say “yes, this number is even” or “no, this number is not even”
-
We can classify the problem instances into
positive/”yes” instances and
negative/”no” instances
-
For example, “4” and “8” are
positive instances and “1” and “5”
are negative instances of the “is this number
even” problem.
-
The “yes” instances form a language.
Therefore, solving the decision problem is the same as
deciding this language
-
For example, the positive instances of the “is this
number even” problem form the language of all even
numbers.

The PATH problem
-
Let PATH be the following decision problem: Given a
directed graph, does there exist a directed path connecting
two given nodes?
-
For example, given the following graph:
-
Does there exist a path from 1 to 6?
-
Does there exist a path from 3 to 4?
-
Does there exist a path from 5 to 1?
-
No, from 5 we can only reach 4 and 6
-
Does there exist a path from 1 to 3?
-
No, from 1 we can reach all nodes but 3
-
A polynomial time algorithm for PATH is the
following:
-
On input (G, s, t), where G is the graph, s is the start
node and t is the goal node
-
Place a mark on node s
-
Repeat the following until no additional nodes are
marked
-
For each edge from node a to b of G, if a is marked then
also mark b
-
If t is marked then accept, otherwise reject
-
This algorithm runs in polynomial time
-
The graph has n nodes and m edges
-
Each round of the marking loop scans all m edges, and there can be at most n rounds, because every round except the last marks at least one new node
-
Therefore, this algorithm runs in O(n · m) time
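-
A direct transcription of the marking algorithm (a sketch; the graph is given as a list of directed edges):

    def path(edges, s, t):
        marked = {s}                  # place a mark on node s
        changed = True
        while changed:                # repeat until no new nodes get marked
            changed = False
            for a, b in edges:        # for each edge a -> b of G
                if a in marked and b not in marked:
                    marked.add(b)     # if a is marked, also mark b
                    changed = True
        return t in marked            # accept iff t ends up marked

    # path([(1, 2), (2, 5), (5, 6)], 1, 6) returns True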
The HAMPATH problem
-
Let HAMPATH be the following decision problem: Given a
directed graph, does there exist a directed path connecting
two given nodes that goes through each node exactly
once?
-
This is an extension of PATH: We do not only have to find a
path from one node to another node - the path also needs to
go through each node exactly once.
-
The best known algorithm for this problem is exponential,
however, we do not know if there exists a polynomial time
algorithm.
-
Given a candidate for a HAMPATH, we can verify it in
polynomial time.
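-
Here is what such a polynomial-time verifier could look like (a sketch; nodes and edges describe the graph, candidate is the guessed path):

    def verify_hampath(nodes, edges, s, t, candidate):
        return (candidate[0] == s and candidate[-1] == t
                and sorted(candidate) == sorted(nodes)   # each node exactly once
                and all((a, b) in edges                  # consecutive nodes connected
                        for a, b in zip(candidate, candidate[1:])))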
Regular and context-free languages
-
Every regular language is in P
-
If a language is regular then there exists a DFA that
accepts the language
-
A Turing machine can simulate a DFA
-
It will have the input string and a symbol that marks the
end of the string (end marker) on the tape
-
It will always move right after consuming a symbol and
change state in the same way as the DFA would change
state
-
When it reaches the end marker and it is in a state that
corresponds to the accept state of the DFA it will accept,
otherwise it will reject
-
Therefore, the Turing machine will execute exactly n steps (where n is the length of the input string), so its time complexity is O(n)
-
Every context-free language is in P
-
For every context-free language there exists a
corresponding context-free grammar.
-
For every context-free grammar, there exists another
grammar which accepts the same language in Chomsky normal
form.
-
Grammars in Chomsky normal form can be parsed using the CYK (Cocke–Younger–Kasami) algorithm, which runs in cubic time, which is polynomial.
-
One could create a Turing machine which applies the CYK
algorithm to a given context-free grammar, and that Turing
machine would accept said context-free language in cubic
time.
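-
For the curious, CYK fits in a few lines (a sketch; rules are (head, body) pairs in Chomsky normal form, with body either a single terminal like "a" or two variables like "AB"):

    def cyk(rules, start, s):
        n = len(s)
        if n == 0:
            return (start, "") in rules   # CNF only allows epsilon from the start symbol
        # T[i][j] holds the variables deriving the substring of length j+1 starting at i
        T = [[set() for _ in range(n)] for _ in range(n)]
        for i, ch in enumerate(s):                       # substrings of length 1
            for head, body in rules:
                if body == ch:
                    T[i][0].add(head)
        for length in range(2, n + 1):                   # longer substrings
            for i in range(n - length + 1):
                for split in range(1, length):
                    for head, body in rules:
                        if (len(body) == 2
                                and body[0] in T[i][split - 1]
                                and body[1] in T[i + split][length - split - 1]):
                            T[i][length - 1].add(head)
        return start in T[0][n - 1]                      # cubic in n

    # grammar for {a^n b^n : n >= 1} in CNF: S -> AX | AB, X -> SB, A -> a, B -> b
    rules = [("S", "AX"), ("S", "AB"), ("X", "SB"), ("A", "a"), ("B", "b")]
    print(cyk(rules, "S", "aabb"))   # True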
The class NP
-
The class NP is the set of all decision problems that can
be solved by a non-deterministic Turing machine in
polynomial time
-
NP stands for non-deterministic polynomial
-
Anything in P is also in NP. If a problem can be solved in
polynomial time by a deterministic machine, then also by a
non-deterministic one.
-
Let’s look at some NP problems which are believed not
to be in P
Non-deterministic Turing machine
-
A non-deterministic TM is of the same form as a
deterministic TM:
-
M = (Q, Σ, Γ, ⊢, ␣, δ, s, t, r)
-
However, the transition function is defined
differently
-
δ : Q × Γ → ℘(Q × Γ × {L, R}) is the transition function
-
Note that the output of the function is now a set of possible moves (an element of a power set), rather than a single move
-
This means that given a configuration, there are many
possible other configurations in which the TM can be after making the transition
-
Therefore, the Turing machine becomes non-deterministic.
-
We can think of the non-deterministic TM as a TM that makes all possible transitions from a given
configuration in parallel
-
Therefore, the computation of a non-deterministic TM is a tree, whose branches are all traversed in
parallel
-
In contrast, the computation of a deterministic TM is a linear path, as there is exactly one
configuration after making a transition.
-
Non-deterministic TMs have the same expressibility as
deterministic TMs. We can simulate a non-deterministic TM
with a multi-tape deterministic TM.
-
The deterministic TM has four tapes:
-
One to remember the input.
-
One to simulate the current state of the branch.
-
One to remember the branch currently being simulated.
-
One to remember the branches already simulated.
-
The deterministic TM accepts an input when one of the
branches accepts, and rejects otherwise.
-
A non-deterministic TM is a decider when all its branches halt on all inputs, similar to a total TM.

-
Since we now have a tree of configurations, we need to
re-think the condition when the non-deterministic TM accepts/rejects.
-
Reminder: A deterministic TM
-
accepts/rejects as soon as it reaches an accept/reject configuration
-
loops if it never reaches such a configuration
-
A non-deterministic TM, on the other hand,
-
accepts if some branch of the computation leads to the accept state
-
rejects if all branches of the computation lead to the reject
state
-
loops if none of those two conditions are ever met
-
For example, the tree can be infinite and without any
accept state
-
If all branches halt on all inputs, i.e. if the computation
tree is finite, the non-deterministic TM is called a
decider
-
This is similar to a deterministic total TM, which always halts on any input
-
The decider accepts an input if there exists a sequence of
choices, where at least one ends in the accepting
state.
-
Otherwise all choices will end in the reject state and the
decider will reject the input
-
Non-deterministic TMs are not more powerful than deterministic TMs as we are going to see shortly
-
For any non-deterministic TM there exists a deterministic TM accepting the same language
-
A non-deterministic TM N can be simulated by a 3-tape deterministic TM D
-
D will try all branches of N’s computation using
breadth first search
-
If we use depth first search our simulation can get stuck
in an infinite loop in case of an infinite tree
-
There might exist an accept state on another branch which
we are unable to reach because of the infinite loop
-
In order to avoid this we have to use breadth first
search
-
The 3 tapes are used as follows
-
One tape stores the input
-
One tape used to simulate a branch up to a given
depth
-
One tape used to remember the branch being simulated
-
If D ever finds the accept state in the tree it will
accept. Otherwise D will loop.
-
For any non-deterministic TM that halts on all branches (= decider) there exists a
total TM deciding the same language
-
We can modify the proof from above such that if N halts on
all branches then D halts
-
Therefore, we will introduce a fourth tape where we
remember the branches that halt. We are not going to explore
these branches again
-
If there is no branch left to explore, reject
-
Therefore, we can conclude that any non-deterministic TM
has an equivalent deterministic TM
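-
The breadth-first idea in code (a sketch; step and is_accept are hypothetical stand-ins: step returns all successor configurations of a configuration):

    from collections import deque

    def simulate_ntm(initial, step, is_accept):
        queue = deque([initial])
        seen = {initial}
        while queue:                      # runs forever if the tree is infinite
            c = queue.popleft()           # breadth first: oldest configuration first
            if is_accept(c):
                return True               # some branch accepts
            for nxt in step(c):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False                      # every branch halted without accepting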
Time complexity of Turing machines
-
Let’s think about the time complexity of Turing
machines
-
First, we will define the following:
-
The size of a problem instance = the number of symbols on
the input tape
-
The algorithm cost (measure of time taken) = number of
steps the Turing machine takes
-
The time complexity of a decider is a function f(n), where
f(n) is the maximum number of steps the machine takes on any
branch of its computation, on any input of length n.
-
In other words, to determine the time complexity of a
decider, we are going to take a look at the computation tree
for some input of length n
-
The time complexity is equal to the depth of this tree
-
Every f(n) time multi-tape deterministic TM has an equivalent O(f(n)²) time single-tape deterministic TM
-
If a multi-tape TM runs in f(n) time, it can only use f(n) tape cells on each tape
-
We can construct a single-tape TM that simulates the
multi-tape TM
-
Before simulating a single step of the multi-tape TM we
need to find the position of the tape heads. These positions
will be marked using special symbols on the tape.
-
Therefore, we are going to scan the tape from left to right, which takes an extra O(f(n)) steps. The tape heads might be anywhere on the tape but cannot be beyond the first f(n) tape cells, which is why we use big O notation here.
-
All in all, we have f(n) steps to simulate, and for each step we can find the tape heads in O(f(n)) time, so we require O(f(n)²) time for the whole simulation.
-
The following is an example of a single-tape TM simulating
a 3-tape TM. The position of the heads is marked with a
“^” symbol:
⊢ ⊢ a b̂ c d e ⊢ h i j k̂ l ⊢ v̂ w x y z
-
Every f(n) time non-deterministic Turing machine has an equivalent 2^(O(f(n))) time deterministic Turing machine
-
Each branch of an f(n) time non-deterministic TM N has length at most f(n).
-
Remember: the time complexity of a non-deterministic TM is
defined as the depth of the computation tree
-
Therefore, the length of the longest branch is equal to the
time complexity
-
Each node in the computation tree of N can have up to b children, where b is the maximum number of choices allowed by the transition function.
-
This is also known as the branching factor of the
tree.
-
Therefore, there are at most b^f(n) leaves, and roughly f(n) · b^f(n) nodes in the computation tree of the non-deterministic TM N
-
A tree with branching factor b and depth d cannot have more than b^(d+1) nodes
-
A tree with branching factor b and depth d cannot have more than b^d leaves. Furthermore, it also cannot have more than b^(d+1) leaves, which is a less restrictive statement.
-
We can construct a multi tape deterministic TM to simulate
N. This will take f(n) · b^f(n) = 2^(O(f(n))) time.
-
We need to simulate the path from the root to each of the leaves. Since we have at most b^f(n) leaves we need to simulate at most b^f(n) paths.
-
Each of the paths we are going to simulate is no longer than f(n)
-
Therefore the simulation will take no longer than f(n) · b^f(n) time
-
f(n) · b^f(n) is 2^(O(f(n))) (see tutorial 8 question 2)
-
The big O notation is an upper bound and hides a constant c in the exponent, therefore we have to show that there exist some c and M such that f(n) · b^f(n) ≤ 2^(c·f(n)) for all n > M
-
One way to see it: b^f(n) = 2^(log₂(b) · f(n)) and f(n) ≤ 2^(f(n)), so f(n) · b^f(n) ≤ 2^((1 + log₂(b)) · f(n)) for all n. Taking c = 1 + log₂(b) gives exactly the bound above, hence f(n) · b^f(n) is 2^(O(f(n))).
The SAT problem
-
SAT: Boolean satisfiability
-
Given a boolean expression, does there exist an assignment
of true and false (or 0s and 1s) to all variables such that
the formula is satisfied?
-
Example: (x ∨ ¬y) ∧ (y ∨ z) ∧ (¬x ∨ ¬z)
-
Does there exist an assignment for the variables x, y, z such that the formula is satisfied?
-
SAT is in NP: We can non-deterministically generate all
possible assignments to the variables and then check if the
formula is satisfied.
-
For each variable, we have two options: It can be either
true or false
-
If we want to solve this problem in a deterministic way, we
have to try out either true or false for each of the
variables
-
Therefore, it will take O(2ⁿ) time, where n is the number of variables
-
However, a non-deterministic machine has the power to
always guess the right assignment for each of the
variable
-
For instance, if we have three variables there are 2³ = 8 different assignments for these variables
-
The non-deterministic machine will always pick the right
assignment for each of the variables before moving on to the
next variable
-
We can visualise this in a computation tree. The
non-deterministic machine will instantly pick the right path
from the root to the leave which makes the formula true (if
such an assignment exists)
-
Therefore, it can solve the problem in linear time (linear
in the number of variables)
-
We don’t know if SAT is in P because we currently do
not have any polynomial time algorithm for it. However, no
one has been able to prove that there isn’t one.
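-
The contrast in code (a sketch; clauses is a CNF formula given as lists of (variable, wanted value) literals). Checking one assignment is polynomial, which is all a single non-deterministic branch has to do, while the deterministic search below tries all 2ⁿ assignments:

    from itertools import product

    def satisfied(clauses, assignment):
        # polynomial-time check: every clause has a satisfied literal
        return all(any(assignment[v] == val for v, val in clause)
                   for clause in clauses)

    def brute_force_sat(clauses, variables):
        # deterministic: try all 2^n assignments
        for values in product([False, True], repeat=len(variables)):
            assignment = dict(zip(variables, values))
            if satisfied(clauses, assignment):
                return assignment
        return None

    # (x OR y) AND (NOT x OR NOT y) is satisfiable:
    print(brute_force_sat([[("x", True), ("y", True)],
                           [("x", False), ("y", False)]], ["x", "y"]))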
The HAMPATH problem
-
HAMPATH: Given a directed graph, does there exist a
directed path connecting two given nodes that goes through
each node exactly once?
-
We can non-deterministically generate all sequences of
nodes of length equal to the size of the graph, and then
check the following:
-
No repetitions are found in the sequence
-
The start and end of the sequence coincide with the
required nodes
-
The sequence defines a directed path through the
graph
-
We don’t know if HAMPATH is in P
The TSP(D) problem
-
TSP(D) refers to the Travelling Salesperson Problem.
-
For this problem, you have a graph with weighted edges and
an input D.
-
Is there a route leading all around the graph and back to
the start with a total distance at most D?
-
This problem is in NP because the TM would have to
‘guess’ the order of cities to check in, then
add up the distances to see if it’s at most D.
-
We don’t know if TSP(D) is in P.
The 3COL problem
-
3COL: 3-colourability
-
Given a graph G, is it possible to colour all the vertices with one of three available
colours such that no two adjacent vertices are the same
colour?
-
Here is an example of a solution:
-
This problem is in NP because we can non-deterministically
guess the colour of each vertex; our computation splits into
3 branches of “is this node red?”, “is
this node blue?”, or “is this node
green?”.
-
We don’t know if 3COL is in P.
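-
Again, checking a guessed colouring is cheap (a sketch; colouring maps each vertex to one of the three colours):

    def verify_3col(edges, colouring):
        # valid iff no edge joins two vertices of the same colour
        return all(colouring[a] != colouring[b] for a, b in edges)

    # a triangle is 3-colourable:
    print(verify_3col([(1, 2), (2, 3), (1, 3)],
                      {1: "red", 2: "green", 3: "blue"}))   # True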
NP = P?
-
This is one of the biggest problems in computer science. Is
NP equal to P? What if every problem in NP has a P
algorithm, but we just haven’t discovered them all
yet?
- I don’t know.
-
But if P does equal NP, that means:
-
P = NP = NP-complete
-
To prove that P = NP-complete, we need to show that all
problems in NP can reduce to all problems in P in polynomial
time.
-
So we’ll have some arbitrary problem X in P and some
arbitrary problem Y in NP.
-
We will then reduce Y to X. When we do that, we’ll
have proved that all NP problems reduce to all P problems
and that P = NP-complete, hence P = NP = NP-complete.
-
If Y is in NP, that means it’s in P also (because P =
NP).
-
This proof only works if the language of X is non-trivial,
meaning it’s not the empty set or sigma star!
-
If X is not trivial, that means there exists an input w_yes that X accepts, and an input w_no that X rejects.
-
Our reduction function will remember these two inputs, and decide the given instance y of Y (which takes polynomial time, since Y is in NP = P by assumption). If Y accepts, the function returns w_yes. If Y rejects, the function returns w_no.
-
This is computable in polynomial time and maps yes/no
instances of Y to yes/no instances of X, hence proving that
if P = NP, then P = NP = NP-complete.
-
P ⊂ NP-hard
-
Following from the P = NP = NP-complete rule we proved
earlier...
-
Some NP-hard problems are NP-complete.
-
If P = NP-complete, that means some NP-hard problems are
also in P.
-
Therefore, P is a strict subset of NP-hard.
-
Here is what the problem sets will look like if P != NP and
P = NP (just remember that trivial problems are in P and NP,
just not NP-complete!).
Presburger arithmetic
-
Here’s a problem that’s so hard that it’s
not even in NP. However, it is decidable (possible to
solve).
-
It goes like this:
-
You have a set of variables, each with either a ‘for
all’ for ‘there exists’.
-
There is also a formula that links up all the variables
together.
-
Is the formula true?
-
If that explanation didn’t make much sense, the
problems look like this:
-
For example: ∀x∃y (x + x = y), or ∃x∀y (x + y = y)
-
The question is: do these formulae hold?
-
This is so hard that this is at least doubly exponential,
which is 2^(2^(cn)) for some constant c!
NP-Completeness
Polynomial time reduction
-
Remember reduction with Turing machines?
-
We can do the same here; we can reduce a problem to another
problem.
-
If we can reduce problem A to problem B, then any
polynomial algorithm for B also applies for A, right?
-
Almost! The reduction needs to be in polynomial time as
well. This is called a polynomial time reduction.
-
So to solve problem A, we need to reduce it to B in
polynomial time, then solve B in polynomial time. Polynomial
+ polynomial = polynomial, so solving A would also be in
polynomial time.
-
If you’re a visual learner, maybe this diagram will
help (it goes from left to right):
-
Polynomial time reduction is written like this: A ≤ₚ B
-
More formally, it’s defined like this:
-
A function f : Σ* → Σ* is polynomial time computable if there exists a polynomial time deterministic Turing machine that halts on each input w with just f(w) on its tape.
-
From this, we can conclude that if A reduces to B in
polynomial time, and B is in P, then A is in P.
-
Additionally, if A reduces to B in polynomial time, and A
is not in P, then B is not in P.
3COL to SAT
-
We can reduce 3COL (3-colourability) to SAT (boolean
satisfiability).
-
We can do this by:
-
for each node ‘x’ in 3COL...
-
... create 3 variables in SAT, rx (is ‘x’
red?), gx (is ‘x’ green?), and bx (is
‘x’ blue?).
-
After we’ve created our variables, we need to define
three expressions:
-
For each vertex ‘x’, the expression:
(rx OR gx OR bx)
must be true (each vertex must be assigned a colour)
-
For each vertex ‘x’, the expression:
( !(rx AND gx) AND !(rx AND bx) AND !(bx AND gx) )
must be true (each vertex may only have ONE colour)
-
For each edge joining vertices ‘x’ and ‘y’, the expression:
!(rx AND ry) AND !(gx AND gy) AND !(bx AND by)
must be true (no two adjacent vertices should have the same
colour)
-
All these expressions are ‘AND’ed together, and
that forms the final expression for SAT.
-
For example, suppose our 3COL problem is just two nodes with one edge joining them.
-
The first block in the SAT expression makes sure that each
node has a colour.
-
The second block in the SAT expression makes sure each node
has only ONE colour.
-
The third block in the SAT expression makes sure that nodes
1 and 2 don’t share the same colour.
-
Great, but is it in polynomial time?
-
The number of characters in the output SAT problem is proportional to c₁·V + c₂·E for some constants
-
Where V is the number of vertices and E is the number of edges: each vertex contributes a constant number of literals, and so does each edge. I don’t know about you, but that looks like a polynomial to me.
-
By applying this method, we can convert any 3COL problem to
SAT.
-
That means if we solve SAT in polynomial time, we can solve
3COL in polynomial time.
-
Conversely, if we find that 3COL is not in P, then SAT
cannot be in P.
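-
The reduction is mechanical enough to write down (a sketch; literals are represented as strings purely for illustration, with "!" meaning negation):

    def threecol_to_sat(vertices, edges):
        clauses = []
        for x in vertices:
            r, g, b = f"r{x}", f"g{x}", f"b{x}"
            clauses.append([r, g, b])            # x gets at least one colour
            clauses += [[f"!{r}", f"!{g}"],      # ...and at most one
                        [f"!{r}", f"!{b}"],
                        [f"!{b}", f"!{g}"]]
        for x, y in edges:
            for c in "rgb":                      # endpoints never share a colour
                clauses.append([f"!{c}{x}", f"!{c}{y}"])
        return clauses   # output size is linear in V + E: a polynomial reduction

    # two nodes joined by one edge:
    print(threecol_to_sat([1, 2], [(1, 2)]))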
NP-hard
-
What’s NP-hard?
-
NP-hard is the set of problems such that every NP problem can be reduced to each of them in polynomial time.
-
This shows that NP-hard problems are, in a sense, hard,
because every NP problem can reduce to it.
The Cook-Levin theorem
-
The Cook-Levin theorem states that any NP problem can
reduce to SAT in polynomial time.
-
That means that SAT is an NP-hard problem, because every NP
problem can reduce to it.
-
This also means that if we can solve SAT in polynomial
time, then every NP problem will be solvable in polynomial
time, and NP will be equal to P.
-
To convert an NP problem X into SAT, you need to
‘encode’ the parameters of X into boolean
variables and construct an expression for SAT.
-
There’s a proof for it, but it’s not
examinable. Basically, it’s about converting a Turing
machine into a boolean expression.
What is NP-completeness
-
An NP-complete problem is a problem in NP and
NP-hard.
-
SAT is an example of an NP-complete problem, because it can
be solved by a non-deterministic Turing machine and every
other NP problem can be reduced to it in polynomial
time.
-
To find another NP-complete problem, you just need to have a problem X that is in NP, then find a polynomial time reduction from SAT to X.
-
Just remember: if you solve an NP-complete problem in
polynomial time, you’ve solved all NP problems in
polynomial time, too, proving P = NP.
Space complexity
Space complexity
-
First of all, what is space complexity?
-
The space complexity of a decider is the function
where
is the maximum number of tape cells the machine scans
on any branch of its computation, on any input of length
n.
-
Basically, what’s the maximum number of tape cells we need to solve this
problem?
-
Here’s a theorem for you:
-
Assume f(n) ≥ n. If a decider runs in f(n) space, then it runs in 2^(O(f(n))) time.
- Why?
-
Well, think of all the different possible combinations the
Turing tape could be:
-
There’s lots of possibilities the tape could
be!
-
What are all the possible combinations of the tape?
-
Well, first of all, we need to know how long the tape is.
We already know that! It’s the space complexity of the
algorithm, which is f(n). That’s how many tape cells
we need to solve the problem.
-
Let’s just say that our decider has to run through
every possible tape configuration to solve the
problem.
-
We need a bit of combinatorics from Foundations here; what
are all the possible combinations of this tape?
-
For each position on the tape, there are ‘m’
possibilities, where ‘m’ is the number of
symbols available to put on the tape.
-
So, we multiply for each tape cell, m * m * m * m ... f(n)
times. This yields us: m^f(n)
-
We still have a ways to go!
-
We have our base ‘m’. However, with a little mathsy magic, we can generalise this
further and make our base ‘2’:
-
What’s the opposite of exponential? That’s
right; it’s logarithms!
-
We can simply take the logarithm of ‘m’ and
convert the base to a ‘2’: m = 2^(log₂ m), so m^f(n) = (2^(log₂ m))^f(n)
-
Still remember your GCSE maths? When you have a power to
the power, you multiply the powers: (2^(log₂ m))^f(n) = 2^(log₂(m) · f(n))
-
Wait a minute... log₂(m) is just a constant. It could be anything, depending on what ‘m’ is! Look familiar? That’s the constant ‘c’ found in Big-O notation! Therefore, we can introduce Big-O notation to generalise that constant: 2^(log₂(m) · f(n)) = 2^(O(f(n)))
-
As you can see, we have resulted in 2^(O(f(n))), thereby proving the link between time and space complexity.
-
What does this mean?
-
It means the time complexity is at most 2 to the power of the space complexity: 2^(O(f(n))).
-
Remember our assumption: the decider checks all possible
combinations of tape. Therefore, this is simply an
“upper bound”. In other words, a decider of
space complexity linear or above will have at worst exponential time complexity.
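-
A quick sanity check with numbers: with m = 4 tape symbols and space f(n) = n, there are at most 4ⁿ = 2^(log₂(4)·n) = 2^(2n) possible tape contents, which is 2^(O(n)): linear space gives at worst exponential time.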
PSPACE
-
PSPACE is the set of all decision problems that can be
solved by a deterministic Turing machine in polynomial
space.
NPSPACE
-
NPSPACE is the set of all decision problems that can be
solved by a non-deterministic Turing machine in polynomial
space.
-
Anything in PSPACE is in NPSPACE, and anything in NPSPACE
is in PSPACE (see Savitch’s Theorem, which is literally just
below)
Savitch’s Theorem
-
Here’s another theorem, called Savitch’s
theorem:
-
For f(n) ≥ n, every f(n) space non-deterministic Turing machine has an equivalent O(f(n)²) space deterministic Turing machine.
-
This means that PSPACE = NPSPACE.
-
There’s a proof for this, but it’s not
examinable.
EXPTIME
-
The class EXPTIME is the set of all decision problems that
can be solved by a deterministic Turing machine in time 2^(p(n)) for some polynomial p(n).
Relationships between complexity classes
-
P ⊆ PSPACE: a machine that runs in polynomial time can
only use polynomial space
-
NP ⊆ NPSPACE: same reason
-
Hence, using Savitch’s theorem, NP ⊆
PSPACE
-
PSPACE ⊆ EXPTIME:
-
If a decider runs in f(n) space, then it runs in 2^(O(f(n))) time.
-
We don’t know which inclusion is strict.
-
All we know is P ⊊ EXPTIME.
-
Note: One can also define EXPSPACE. We know that
-
EXPTIME ⊆ EXPSPACE
-
PSPACE ⊊ EXPSPACE
PSPACE-hard
-
A problem X is PSPACE-hard if every problem A in PSPACE is
polynomial-time reducible to X.
-
Examples of PSPACE-hard problems are (suitably generalised, n × n) Chess and Go.
-
Why not space-reducible?
-
We defined it like that because we want reductions to be
easy compared to problems in NP!
-
If we used a polynomial space reduction from A to B, then found an easy solution for B, the reduction itself could still take exponential time, so solving A could still be infeasible!
-
However, because we use time-reducible, it has to be both
polynomial time and polynomial space (because P ⊆ PSPACE).
PSPACE-completeness
-
PSPACE-complete simply means that a problem is both in PSPACE and PSPACE-hard.
-
An example of a PSPACE-complete problem is TQBF.
-
Let TQBF be the problem of deciding whether a fully
quantified boolean formula is true.
-
Examples of fully quantified boolean formulas:
-
∀x∀y(x ∨ y)
-
∀x∃y((x ∨ y) ∧ (¬x ∨
¬y))
-
Is the given fully quantified formula true, yes or no? That is the TQBF problem.
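-
A tiny recursive evaluator shows why TQBF only needs polynomial space: the recursion uses one frame per variable, even though it may take exponential time (a sketch; the formula is given as a hypothetical Python function of a complete assignment):

    def tqbf(quantifiers, formula, assignment=()):
        # quantifiers: list of ("forall" / "exists", variable name) pairs
        if not quantifiers:
            return formula(dict(assignment))
        q, v = quantifiers[0]
        results = (tqbf(quantifiers[1:], formula, (*assignment, (v, val)))
                   for val in (False, True))
        return all(results) if q == "forall" else any(results)

    # forall x exists y ((x or y) and (not x or not y)) -- true: pick y = not x
    print(tqbf([("forall", "x"), ("exists", "y")],
               lambda a: (a["x"] or a["y"]) and (not a["x"] or not a["y"])))   # True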
TL;DR
-
Can’t be bothered to read all these notes?
-
Need a bite-sized form of these notes?
-
The TL;DR section has your back!
Automata theory: Regular languages
-
Alphabet: a set of symbols
-
Language: a set of strings over an alphabet
-
DFAs start at the initial state, then go through transitions (one per character in the string) until the string is consumed; they accept if they end in an accept state.
-
DFAs cannot guess; they are in exactly one state at a time, and each state must have a transition for every symbol of the alphabet
-
DFAs either accept a string or reject a string
-
DFAs are deterministic, NFAs are non-deterministic
-
NFAs and DFAs have the same power
-
The set of all strings that a DFA / NFA can accept is
called its language
-
εNFAs have epsilon-moves, which are free moves that
do not cost any symbols in the string
-
A language accepted by some DFA or NFA is called
‘regular’
-
Regular expressions have the same power as DFAs and
NFAs
-
“a*” means 0 or more repetitions of a
-
“a + b” means a or b
-
“ab” means a followed by b
-
You can always convert an NFA/DFA to regex or vice
versa
-
You can prove a language is regular by constructing a DFA /
NFA / regex
-
You can prove a language is not regular by using the
pumping lemma
Automata theory: Context free languages
-
A language is context free if it can be accepted by a PDA
or CFG
-
Pushdown automata (PDA) are like DFAs and NFAs, but you get
a stack.
-
Each transition, an element is popped or pushed onto the
stack
-
PDAs accept by final state or by empty stack
-
Context free grammars are just as powerful as PDAs
-
There are two forms of CFGs:
-
Chomsky normal form
-
All productions look like A → BC or A → a
-
Greibach normal form
-
All productions look like A → aB₁…Bₖ (a terminal followed by variables)
-
You can always convert a CFG to either Chomsky or
Greibach
-
You can always convert a CFG to PDA or vice versa
-
You can prove a language isn’t context-free using the
pumping lemma
Computability theory
-
A Turing machine is like a PDA, but instead of a stack you
get tape. You can move left or right on the tape, write to
the tape and read from the tape.
-
They are the closest mathematical abstraction of an actual
computer.
-
A language is recursively enumerable (r.e) if it can be
accepted by a Turing machine.
-
A language is recursive if some Turing machine accepts exactly that language and halts on every input, never looping.
-
Multi-tape TMs are just as powerful as single-tape
TMs
-
A Universal Turing machine can simulate a Turing machine
with a given input
-
The Halting problem: Is there a Turing machine that can accept if a given TM halts on a given input, and reject if it loops?
-
It’s impossible. There is a diagonalisation proof for
this.
-
The Halting problem is ‘undecidable’.
-
The membership problem is also ‘undecidable’,
because if we had a TM to solve the membership problem, we
could use that to solve the Halting problem.
-
Reduction: a computable function that maps instances of one problem to instances of another, preserving yes/no answers.
-
Rice’s theorem states that any non-trivial property of a Turing machine’s language is undecidable:
-
You can’t tell if a TM accepts the empty string
-
You can’t tell if a TM accepts any string
-
You can’t tell if a TM accepts all the strings
- etc.
Complexity theory
-
Big-O is an upper bound: the algorithm scales at worst like this; we could possibly do better
-
Big-Θ is a tight bound: the algorithm scales exactly like this, no better and no worse
-
Time complexity is the number of steps it takes for a
Turing machine to solve a problem.
-
The class P is the set of all decision problems that can be solved by a deterministic Turing
machine in polynomial time
-
The class NP is the set of all decision problems that can be solved by a non-deterministic Turing
machine in polynomial time.
-
The class NP-hard is the set of all problems such that every NP problem can be reduced to them in polynomial time.
-
The class NP-complete is the set of all problems such that
they are in both NP and NP-hard.
-
If you find a (deterministic) polynomial algorithm for an
NP-complete problem, then P = NP.
-
If P = NP, then:
-
P = NP = NP-complete
-
P ⊂ NP-hard
-
Space complexity is the maximum number of tape cells a Turing machine scans while solving a problem.
-
PSPACE is the set of all decision problems that can be
solved by a deterministic Turing machine in polynomial
space.
-
NPSPACE is the set of all decision problems that can be
solved by a non-deterministic Turing machine in polynomial
space.
-
Savitch’s Theorem states that for f(n) ≥ n, every f(n) space non-deterministic Turing machine has an equivalent O(f(n)²) space deterministic Turing machine.
-
Therefore, PSPACE = NPSPACE.
-
EXPTIME is the set of all decision problems that can be
solved by a deterministic Turing machine in time 2^(p(n)) for some polynomial p(n).
-
PSPACE = NPSPACE
-
NP ⊆ PSPACE
-
PSPACE ⊆ EXPTIME
-
P ⊂ EXPTIME
-
PSPACE-hard is the set of all problems such that every PSPACE problem can be reduced to them in polynomial time.
-
PSPACE-complete is the set of all problems that are in
PSPACE and PSPACE-hard.