Regex


MMBasic Regex Syntax used in INSTR and LINSTR functions.

The alternate forms of the INSTR() and LINSTR() functions can take a regular expression as the search pattern in the Picomites and some other platforms.
The alternate form of the commands are:

 INSTR([start],text$, search$ [,size])
 LINSTR(text%(),search$ [,start] [,size]

In both cases specifying the size parameter causes the firmware to interpret the search string as a regular expression. The size parameter is a floating-point variable that is used by the firmware to return the size of a matching string. If the variable doesn't exist it is created. As implemented in MMBasic you need to apply the returned start and size values to the MID$ function to extract the matched string. e.g.
IF start THEN match$=MID$(text$,start,size) ELSE match$=”” ENDIF

The library used for the regular expressions “implements POSIX draft P1003.2/D11.2, except for some of the internationalization features”. 
See  http://mirror.math.princeton.edu/pub/oldlinux/Linux.old/Ref-docs/POSIX/all.pdf section 2.8 for details of constructing Regular Expressions or other online tutorials if you are not familiar with them.

The syntax of regular expressions can vary slightly with the various implementations. This document is a summary of the syntax and supported operations used in the MMBasic implementation.

Anchors
^    Start of string
$    End of string
\b  Word Boundary
\B  Not a word boundary
\<  Start of word
\>  End of  word

Qualifiers
*          0 or more (not escaped)
\+         1 or more
\?          0 or 1
\{3\}     Exactly 3
\{3,\}   3 or more
\{3,5\} 3,4 or 5

Groups and Ranges
(a\|b)      a or b
\(…\)     group
[abc]      Range (a or b or c)
[^abc]    Not (a or b or c]
[a-q]       lower case letters a to q
[A-Q]    upper case letters A to Q
[0-7]      Digits from 0 to 7

Escapes Required to Match Normal Characters
\^     to match ^   (caret)
\.      to match .   (dot)
\*     to match *  (asterix)
\$     to match $   (dollar)
\[      to match  [   (left bracket)
\\      to match \   (backslash)

Escapes with Special Functions
\+  See Quantifiers
\?  See Quantifiers
\{  See Quantifiers
\}  See Quantifiers
\|   See Groups and Ranges
\(   See Groups and Ranges
\)   See Groups and Ranges
\w  See Character Classes

Character Classes
\w              digits,letters and _                                              
[:word:]     digits,letters and _                                              
[:upper:]    Upper case letters_                                              
[:lower:]    Lower case letters_                                              
[:alpha:]    All  letters                                            
[:alnum:]   Digits and  letters                                            
[:digit:]      Digits
[:xdigit:]    Hexidecimal digits
[:punct:]    Puntuation
[:blank:]    Space and tab
[:space:]    Blank charaters
[:cntrl:]      Control charaters
[:graph:]    Printed characters
[:print:]      Printed chars and spaces

Example expression to match an IP Address which is contained within a word boundary.
    "\<[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\>"

 

Last edited: 04 October, 2023