RegEx limitations

These regular expression (RegEx) limitations could be encountered when using the SMAT database:

  • The circumflex "^" and dollar sign "$" metacharacters could incorrectly match in the middle of the message if the target pattern matches at the start or end of a 1024-byte block.
  • The pipe "|" metacharacter might not match correctly when the match happens at the end of a 1024-byte block.

    For example, the pattern 1234|3789 does not match the subject string ABC123789 if ABC123 ends at a 1024-byte block boundary and 789 continues into the next block. This is because the character 3 occurs in both sub-patterns, 1234 and 3789, and the match falls on a block boundary.

  • Capturing parentheses and back references (including backward assertions \b or \B) are not supported and, if used, could result in an engine crash.
  • To search for a new line, use \r\n on Windows and \n on UNIX.
  • SMAT encodings that are not supported by the ICU library are treated as UTF-8 strings for regex.
  • The engine and the IDE must run on the same system when they access the SMAT database simultaneously. Otherwise, the database could become corrupted.

    This scenario is possible when a .smatdb file is present on a network file system (NFS), is being written to by the engine, and is read from another host by the IDE.

    Multiple IDEs accessing the same .smatdb file from several hosts is not an issue, unless the engine is writing to that file.
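
The RegEx limitations in the list above can be checked for before a pattern is used. The following Python sketch is an illustration only and is not part of the product: unsupported_constructs and boundary_sensitive_matches are hypothetical helper names. The first scans a pattern for capturing parentheses, back references, and \b or \B. The second uses Python's own re module (not the SMAT engine) to find matches that touch a 1024-byte block boundary, assuming that blocks start at offset 0 of the message.

```python
import re

BLOCK_SIZE = 1024  # block size named in the limitations above


def unsupported_constructs(pattern: str) -> list[str]:
    r"""List capturing groups, back references, and \b / \B in pattern.

    Simplified scanner: it skips escaped characters and the contents of
    [...] character classes, and does not cover every syntax corner case.
    """
    found = []
    i, in_class = 0, False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if not in_class and nxt in "bB":
                found.append(f"\\{nxt} at offset {i}")
            elif not in_class and (nxt in "123456789" or nxt == "k"):
                found.append(f"back reference at offset {i}")
            i += 2  # skip the escaped character
            continue
        if in_class:
            in_class = ch != "]"
        elif ch == "[":
            in_class = True
        elif ch == "(":
            tail = pattern[i + 1:i + 4]
            # (?<name>...) captures; (?<=...) and (?<!...) are look-behinds.
            named = tail.startswith("?<") and tail[2:3] not in ("=", "!")
            if not tail.startswith("?") or named:
                found.append(f"capturing group at offset {i}")
        i += 1
    return found


def boundary_sensitive_matches(pattern: bytes, message: bytes,
                               block_size: int = BLOCK_SIZE):
    """Yield (start, end, reason) for matches that touch a block boundary.

    Matches come from Python's re module over the whole message, so they
    are a reference result to compare against, not the engine's result.
    Assumption: blocks are aligned to offset 0 of the message.
    """
    for m in re.finditer(pattern, message):
        start, end = m.span()
        if start == end:
            continue  # ignore empty matches
        if start // block_size != (end - 1) // block_size:
            yield start, end, "straddles a block boundary"
        elif start % block_size == 0 and start != 0:
            yield start, end, "starts at a block boundary"
        elif end % block_size == 0 and end != len(message):
            yield start, end, "ends at a block boundary"


# The example from the list: ABC123 ends exactly at the first block boundary.
message = b"x" * (BLOCK_SIZE - 6) + b"ABC123789"
risky = list(boundary_sensitive_matches(rb"1234|3789", message))
problems = unsupported_constructs(r"(\d+)-\1")
```

Here risky contains one entry for the 3789 match, which begins in the first block and ends in the second, and problems reports the capturing group and the back reference. A flagged match indicates only that the result from the SMAT database is worth verifying. It does not prove that the engine returns a wrong result.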

Note: An encrypted SMAT database is tied to the site under which it was created. Remember the site name for each .smatdb file that is moved out of the site, for example, for archival purposes. Store all .smatdb files under a folder named after the site. If the site name is changed, a .smatdb file created under the old site name no longer works. Contact Support before changing a site name.