McNaughton-Yamada-Thompson algorithm
The McNaughton-Yamada-Thompson algorithm (also known as Thompson algorithm) is designed to transform a regular expression into a Non-Deterministic Finite Automaton (NDFA) that recognizes the language represented by the regular expression.
The basic idea of the algorithm is to construct the automaton on an incremental basis that recognizes the language represented by the regular expression.
First, the automata are constructed for the sub-expressions, then, using epsilon transitions (ε), these automata are merged according to the operation that combines the regular expressions. The automaton corresponding to a regular expression with a single symbol only has the initial and the final state, and the transition that connects them is labeled with the symbol that constitutes the regular expression (epsilon, used for representing the empty string, or a symbol from the alphabet).
INPUT: A regular expression r on a Σ alphabet.
OUTPUT: A NDFA N that accepts L(r).
METHOD:
- Build a NDFA for each base regular expresion.
- Connect all the NDFA each other according to induction rules.
BASE:
Empty expression: Put an epsilon transition. Symbol: It is the same that empty transition but, in this case, epsilon is changed by the proper symbol.INDUCTION: Suppose that N(s) and N(t) are NDFA for regular expressions s and t respectively.
Union expression: In this rule, the children subexpressions of the automaton are combined, as a result they share the same initial and final states. Concatenation expression: In this rule, the children subexpressions of the automaton are combined, as a result the final node of the first expression is the initial node of the second. Closure expression: In the closure expression two new states are created: initial and final. An epsilon transition is made from the initial to the final state (because the empty string is in the regular expression of the closure) and other more from the final state to the initial one (to make the repetition possible).
Instructions
- Parenthesis: ( )
- Closure: *
- Concatenation .
- Union: |
- Symbols: lowercase letters [a-z]
- Epsilon: ε, E, €