Automata theory

Automata theory is a branch of computer science closely related to formal language theory and automata. It focuses on the study of abstract machines and automata, as well as, the problems that can be solved by using them. Formal learning of the fundamentals of automata can be a complex subject for students, due to its highly theoretical background.

Seshat works with two main concepts of automaton theory: regular expression and finite automata.

There are several algorithms to build a finite automaton given a regular expression and Seshat implements several of these:

  • Regular expression to Deterministic Finite Automaton: Go
  • Deterministic Finite Automaton to Non-deterministic Finite Automaton: Go
  • Regular expression to Deterministic Finite Automaton: Go

It is also able to minimize finite automata both deterministic and non-deterministic: Go

Regular expressions are a mechanism for describing special types of formal languages, known as regular languages. A regular language is any language with a finite set of words, or obtained from other regular languages using the operations of: union, concatenation, and Kleene closure (zero or more concatenations of language strings). The syntax of a regular expression is a way of denoting these three operations. The union of languages is represented with the character '|', the closure with '*', and no special operator is needed for the concatenation (although some authors use '⋅'). For example, the regular expression: (a|b)(cc)*c represents the set of all the words that start with either a or b followed by an odd number of letters c (the two c letters can be repeated an arbitrary number of times, including zero times, and then an additional letter c is concatenated).

In the context of compiler construction, regular expressions are used to define the lexical components that are recognized in the lexical analysis phase. Regular expressions are used to define identifiers, floating-point numbers, quoted strings, and all the words that, in a syntactic analysis phase, are combined to construct the sentences of a programming language.

A finite automaton is a recognition device that is characterized by having a finite memory, represented by the activation of its states, and a function that defines how the activation of each state changes as the symbols are read from the input. Some states are final and there is a single special state active at the beginning of the recognition process (i.e. there is only one initial state). If a word evolves into the configuration of a final active state, the word has been recognized by an automaton.

Finite automata can be constructed from regular expressions and can then be used to recognize the components described by regular expressions. The figure below shows a graphic image of two automata, both recognizing the language represented by the regular expression (a|b)(cc)*c. The one at the left is a Deterministic Finite Automaton (DFA), with only one single current state (i.e. only one state is active at the same time), and given its current state and the next input symbol, only one subsequent state is possible. The one at the right is a Non-deterministic Finite Automaton (NFA), so it can be in several states simultaneously, and the states can change without consuming any input symbol (arcs labeled with ε).

Automaton examples