Saturday, February 11, 2012

A trick for parsing first-order logic with equality

Suppose your language is some superset of the following minimal subset of first-order logic with
equality[1].

id     ::= [a-z]+
func ::= id
prop ::= id
term ::= func ( term ... ) 
rel    ::= prop (term ... ) | term = term
form ::= rel | true | form & form | form => form 

This is a natural representation of first-order terms, and is seen for example in the TPTP library.  The problem is that this language is not LR(N) for any N.  To see this, consider the following atomic formulas:

p(t) = q
p(t)

Assume t is huge.  A trick to force this into your LALR grammar is to parse the equality branch not as 

  term = term 

but as 

  rel = term.  

If this rule matches, you can convert the LHS back to a term before adding it to the AST.  This works because = is non-associative; a = b = c is not well formed.  (In higher-order logic this is not the case, but that's another story.)  Thus you know that the LHS has the form 

  prop (term ...) 

which can easily be converted to 

  func (term ...).

[1] This trick obviously holds for any infix predicate symbol.  Equality is just the most common.