Tools for Checking the Counter-(un)ambiguity of Regexes

In Java, we have implemented a program that can provide a static analysis over regexes to determine whether a regex is counter-(un)ambiguous, where counter-(un)ambiguity is defined in here.

We packaged this program as a light-weight Java library, which requires Java's OpenJDK with version >= 17. The library can be download here.

Configuration

This library requires a Java OpenJDK with version >= 17.

To use the library with the command line shown on this webpage, the java binary needs to be added to the source path. Type java --version on the command line to check whether the binary is added. Below is the example output if the configuration is correct.


              openjdk 17.0.2 2022-01-18
              OpenJDK Runtime Environment (build 17.0.2+8-86)
              OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)
            
Usage of the Java Library

Below shows the usage of our Java library (checker.jar):


              java -jar checker.jar --cambiguity {regex}
              java -jar checker.jar --cambiguity --witness {regex}
              java -jar checker.jar --nca {regex}
            

Here, {regex} is the regular expression used as the input for static analysis. Our Java library supports regexes with a Posix-style syntax.

Usage 1.

First of all, java -jar checker.jar --cambiguity {regex} checks the counter-ambiguity of the regular expression.

Example 1.1. For example, we can use the command below to check the counter-ambiguity of regex a{3,}:


              java -jar checker.jar --cambiguity "a{3,}"
            

The result will be:


              regex is parsed as: (a){3}(a)*
              is counter-ambiguous = false
              token pairs explored = 4
              running time = xxx nanoseconds
            

Example 1.2. If we check regex .*[a-d]{3,5}, the command will be:


              java -jar checker.jar --cambiguity ".*[a-d]{3,5}"
            

The result will be:


              regex is parsed as: (.)*([abcd]){3,5}
              is counter-ambiguous = true
              token pairs explored = 4
              running time = xxx nanoseconds
            

Example 1.3. If we check regex .*(a[^a]{1000}|b[^b]{1000}) by:


              java -jar checker.jar --cambiguity ".*(a[^a]{1000}|b[^b]{1000})"
            

The result will be:


              regex is parsed as: (.)*(a([^a]){1000}|b([^b]){1000})
              is counter-ambiguous = false
              token pairs explored = 4008
              running time = xxx nanoseconds
            

Usage 2.

java -jar checker.jar --cambiguity --witness {regex} also checks whether a regex is counter-ambiguous. However, instead of simply returning a boolean result, it will return a string that is a witness of counter-amgibuity -- by executing on such a string, the NCA of the regex will have a state that has more than two counter values.

Example 2.1. For example:


              java -jar checker.jar --cambiguity --witness ".*ab[ab]{5}"
            

The result will be:


              regex is parsed as: (.)*ab([ab]){5}
              is counter-ambiguous = true
              witness = ababa
            

Example 2.2. When checking counter-unambiguous regex:


              java -jar checker.jar --cambiguity --witness ".*ab{1000}"
            

The result will be:


              regex is parsed as: (.)*a(b){1000}
              is counter-ambiguous = false
              no witness.
            

Usage 3.

Finally, java -jar checker.jar --nca {regex} prints the NCA constructed from the regex.

Example 3.1. For example:


              java -jar checker.jar --nca "a{5}(b{1,6}|c{6,})d"
            

The result will be:


              regex is parsed as: 
              (a){5}((b){1,6}|(c){6}(c)*)d
              >>>>>>>>>>>>>>>> NCA is below >>>>>>>>>>>>>>>>
              states: [0, 1, 2, 3, 4, 5]
              initial States: [0]
              final States: [5]
              ===== below are transitions ======
              (0)  ----  a, true / cnt := 0  --->  (1)
              (1)  ----  a, cnt < 4 / cnt := cnt + 1  --->  (1)
              (1)  ----  b, cnt == 4 / cnt := 0  --->  (2)
              (1)  ----  c, cnt == 4 / cnt := 0  --->  (3)
              (2)  ----  d, 0 <= cnt < 6 / null  --->  (5)
              (2)  ----  b, cnt < 5 / cnt := cnt + 1  --->  (2)
              (3)  ----  d, cnt == 5 / null  --->  (5)
              (3)  ----  c, cnt < 5 / cnt := cnt + 1  --->  (3)
              (3)  ----  c, cnt == 5 / null  --->  (4)
              (4)  ----  d, null / null  --->  (5)
              (4)  ----  c, null / null  --->  (4)