A century of parentheses languages with some amazing returns

Prof. Stefano Crespi Reghizzi

Professore emerito del Dipartimento di Elettronica, Informazione e Bioingegneria del Politecnico di Milano

Parentheses have appeared in algebraic writing in the XV-XVI century. Erasmus of Rotterdam calls them lunulae. Earlier and until the XVIII century, overline vinculum had been used for grouping literals into a term. Abstracting from the contents of parenthesized text, Walter von Dyck’s (1856-1934) name has been given to the formal language every computer science student knows, which is algebraically defined by cancellation rules. If Noam Chomsky’s Context-Free grammar [1956] is used to define the Dyck language, to check if a text is well-parenthesized we can use a push-down machine, i.e., a finite-state device equipped with an auxiliary Last-In-First-Out memory (a push-down stack of cells). A deeper relation between Dyck languages and context-free grammars is stated by the Chomsky-Schutzenberger theorem: each context-free language is obtained using a many-parentheses Dyck language that outlines the syntactic constructs, a finite-state language that specifies which syntax constructs may be adjacent in a sentence, and a letter-by-letter translation. What happens if, instead of a LIFO, we use a first-in-first-out memory, i.e., a queue of cells? We obtain the anti-Dyck language, that is generated by pretty unusual grammars that operate in breadth-first rather than depth-first order. Heavily nested parentheses are confusing, and to avoid them altogether logician Jan Lukasiewicz introduced the Polish notation in 1924, quite convenient inside computers. A more readable way to save parentheses is to assign priorities to, say, arithmetic operators. An outgrowth of the idea are the operator-precedence grammars invented by Robert Floyd in 1963, which allow very fast parallel syntax analysis on multi-core computers. Natural languages rarely exhibit deeply nested structures although in principle possible; good writers moderately use parentheticals since they depart from the main subject. Quite the contrary in artificial languages, such as Lisp, and in marked-up texts such as XML documents. Parentheses languages, in the form generated by Robert McNaughton grammars (1967), have algebraic and decidability properties similar to the regular or finite-state languages. For instance, one can check if two languages are included one into the other. Preserving the nice algebraic and algorithmic properties of parentheses languages, the input-driven languages have recently caught much interest among computer scientists, and are proposed as a model of systems where nested events, such as call or return to subroutine, may be disrupted by interrupts or exceptional events, which leave some extra parentheses unmatched. Parentheses structures prove to be as essential in artificial languages as they are unwelcome in natural communication.