Context-independent derivative of regexes with lookarounds

Léopold Gravier

Mines Paris - PSL




Abstract: Lookarounds are a convenient extension to regexes that allows to add a constraint on anterior or posterior parts of the input. The feature is traditionally reserved to backtracking engines, which face severe worst-case performance issues. This motivates a research effort to integrate lookarounds in non-backtracking engines. The state of the art is currently to use oracles that tell whether the lookaround conditions are met.

In this talk, I will present an alternative method for matching lookarounds that relies on regex-derivatives instead of oracles. This means that it transforms the regex to see what it can still accept after removing some letters from the input. I will first present a new semantics for language-derivatives and exhibit some of its properties. I will show that it induces an operation on regexes, but it is computationally expensive in the worst case.