McNaughton-Yamada-Thompson algorithm

Description

The McNaughton-Yamada-Thompson algorithm (also known as Thompson algorithm) is designed to transform a regular expression into a Non-Deterministic Finite Automaton (NDFA) that recognizes the language represented by the regular expression.

The basic idea of the algorithm is to construct the automaton on an incremental basis that recognizes the language represented by the regular expression.

First, the automata are constructed for the sub-expressions, then, using epsilon transitions (ε), these automata are merged according to the operation that combines the regular expressions. The automaton corresponding to a regular expression with a single symbol only has the initial and the final state, and the transition that connects them is labeled with the symbol that constitutes the regular expression (epsilon, used for representing the empty string, or a symbol from the alphabet).

Algorithm

INPUT: A regular expression r on a Σ alphabet.

OUTPUT: A NDFA N that accepts L(r).

METHOD:

Build a NDFA for each base regular expresion.
Connect all the NDFA each other according to induction rules.

BASE:

Empty expression: Put an epsilon transition.

Symbol: It is the same that empty transition but, in this case, epsilon is changed by the proper symbol.

INDUCTION: Suppose that N(s) and N(t) are NDFA for regular expressions s and t respectively.

Union expression: In this rule, the children subexpressions of the automaton are combined, as a result they share the same initial and final states. Concatenation expression: In this rule, the children subexpressions of the automaton are combined, as a result the final node of the first expression is the initial node of the second. Closure expression: In the closure expression two new states are created: initial and final. An epsilon transition is made from the initial to the final state (because the empty string is in the regular expression of the closure) and other more from the final state to the initial one (to make the repetition possible).

Example

The expression a.b|c* is used as an example. First of all, the tree representation of the regular expression is shown.

It performs a post-order tree traversal, creating the automata of the following symbols a, b, and c separately.

The next node is a concatenation one, it links its children: expressions a and b, creating the automaton of the expression a.b.

The father of the c node creates the expression c* with it.

Finally, the union node links its children: a.b and c^*, creating the result automaton a.b|c^*.

This procedure creates a nondeterministic finite automaton (NDFA). If you want a deterministic one, you can apply the transformation method that is called powerset construction.

JSON

To obtain the automaton, in JSON format, just use the address /thompsonJson with get expr parameter, where expr is the regular expression that you want.

For example: a|b.c*

Instructions

Characters to use in precedence order:

Parenthesis: ( )
Closure: *
Concatenation .
Union: |
Symbols: lowercase letters [a-z]
Epsilon: ε, E, €

McNaughton-Yamada-Thompson algorithm

Description

Algorithm

Example

JSON

Instructions

Examples

Try a regular expression

Cited this software as:

Authors