McNaughton-Yamada-Thompson algorithm

The McNaughton-Yamada-Thompson algorithm (also known as Thompson algorithm) is designed to transform a regular expression into a Non-Deterministic Finite Automaton (NDFA) that recognizes the language represented by the regular expression.

The basic idea of the algorithm is to construct the automaton on an incremental basis that recognizes the language represented by the regular expression.

First, the automata are constructed for the sub-expressions, then, using epsilon transitions (ε), these automata are merged according to the operation that combines the regular expressions. The automaton corresponding to a regular expression with a single symbol only has the initial and the final state, and the transition that connects them is labeled with the symbol that constitutes the regular expression (epsilon, used for representing the empty string, or a symbol from the alphabet).

INPUT: A regular expression r on a Σ alphabet.

OUTPUT: A NDFA N that accepts L(r).

METHOD:

  1. Build a NDFA for each base regular expresion.
  2. Connect all the NDFA each other according to induction rules.

BASE:

Empty expression: Put an epsilon transition. Autómata para una expresión vacía Symbol: It is the same that empty transition but, in this case, epsilon is changed by the proper symbol. Autómata para un símbolo (a)

INDUCTION: Suppose that N(s) and N(t) are NDFA for regular expressions s and t respectively.

Union expression: In this rule, the children subexpressions of the automaton are combined, as a result they share the same initial and final states. Autómata para una unión (s|t) Concatenation expression: In this rule, the children subexpressions of the automaton are combined, as a result the final node of the first expression is the initial node of the second. Autómata de una concatenación (s.t) Closure expression: In the closure expression two new states are created: initial and final. An epsilon transition is made from the initial to the final state (because the empty string is in the regular expression of the closure) and other more from the final state to the initial one (to make the repetition possible). Autómata para la expresión de cierre

The expression a.b|c* is used as an example. First of all, the tree representation of the regular expression is shown. Árbol del autómata a.b|c* It performs a post-order tree traversal, creating the automata of the following symbols a, b, and c separately. Autómata de la expresión a The next node is a concatenation one, it links its children: expressions a and b, creating the automaton of the expression a.b. Autómata de la expresión a.b The father of the c node creates the expression c* with it. Autómata de la expresión c* Finally, the union node links its children: a.b and c^*, creating the result automaton a.b|c^*. Autómata de la expresión a.b|c* This procedure creates a nondeterministic finite automaton (NDFA). If you want a deterministic one, you can apply the transformation method that is called powerset construction.

To obtain the automaton, in JSON format, just use the address /thompsonJson with get expr parameter, where expr is the regular expression that you want.

For example: a|b.c*

Instructions

Characters to use in precedence order:
  • Parenthesis: ( )
  • Closure: *
  • Concatenation .
  • Union: |
  • Symbols: lowercase letters [a-z]
  • Epsilon: ε, E, €

Try a regular expression