diff options
| author | Tyge Løvset <[email protected]> | 2023-01-02 22:36:51 +0100 |
|---|---|---|
| committer | Tyge Løvset <[email protected]> | 2023-01-02 22:36:51 +0100 |
| commit | 16e004c62f8d8d502152a85b2ffd384a1c91a470 (patch) | |
| tree | 368ed5b319c18f88d0ae2e8c291b0c3889ac72c5 /docs/cregex_api.md | |
| parent | 364b8833cb5d91bbe2c7640869912cde4de12846 (diff) | |
| download | STC-modified-16e004c62f8d8d502152a85b2ffd384a1c91a470.tar.gz STC-modified-16e004c62f8d8d502152a85b2ffd384a1c91a470.zip | |
Replaced c_STATIC_ASSERT() which works for C99 (also multiple on same line).
Some regex updates.
Diffstat (limited to 'docs/cregex_api.md')
| -rw-r--r-- | docs/cregex_api.md | 20 |
1 files changed, 10 insertions, 10 deletions
diff --git a/docs/cregex_api.md b/docs/cregex_api.md index 91868235..20cb5d6d 100644 --- a/docs/cregex_api.md +++ b/docs/cregex_api.md @@ -177,17 +177,11 @@ For reference, **cregex** uses the following files: | \B | Not UTF8 word boundary | * | | \Q | Start literal input mode | * | | \E | End literal input mode | * | -| (?i) (?-i) | Ignore case on/off (override global) | * | -| (?s) (?-s) | Dot matches newline on/off (override global) | * | +| (?i) (?-i) | Ignore case on/off (override CREG_C_ICASE) | * | +| (?s) (?-s) | Dot matches newline on/off (override CREG_C_DOTALL) | * | | \n \t \r | Match UTF8 newline, tab, carriage return | | | \d \s \w | Match UTF8 digit, whitespace, alphanumeric character | | | \D \S \W | Do not match the groups described above | | -| \p{Alpha} | Match UTF8 alpha (L& Ll) | * | -| \p{Alnum} | Match UTF8 alphanumeric (Lu Ll Nd Nl) | * | -| \p{Blank} | Match UTF8 blank (Zs \t) | * | -| \p{Space} | Match UTF8 whitespace: (Zs \t\r\n\v\f] | * | -| \p{Word} | Match UTF8 word character: (Alnum Pc) | * | -| \p{XDigit} | Match hex number | * | | \p{Cc} or \p{Cntrl} | Match UTF8 control char | * | | \p{Ll} or \p{Lower} | Match UTF8 lowercase letter | * | | \p{Lu} or \p{Upper} | Match UTF8 uppercase letter | * | @@ -203,6 +197,12 @@ For reference, **cregex** uses the following files: | \p{Zl} | Match UTF8 line separator | * | | \p{Zp} | Match UTF8 paragraph separator | * | | \p{Zs} | Match UTF8 space separator | * | +| \p{Alpha} | Match UTF8 alphabetic letter (L& Nl) | * | +| \p{Alnum} | Match UTF8 alpha-numeric letter (L& Nl Nd) | * | +| \p{Blank} | Match UTF8 blank (Zs \t) | * | +| \p{Space} | Match UTF8 whitespace: (Zs \t\r\n\v\f] | * | +| \p{Word} | Match UTF8 word character: (Alnum Pc) | * | +| \p{XDigit} | Match hex number | * | | \P{***Class***} | Do not match the classes described above | * | | [:alnum:] [:alpha:] [:ascii:] | Match ASCII character class. NB: only to be used inside [] brackets | * | | [:blank:] [:cntrl:] [:digit:] | " | * | @@ -210,7 +210,7 @@ For reference, **cregex** uses the following files: | [:punct:] [:space:] [:upper:] | " | * | | [:xdigit:] [:word:] | " | * | | [:^***class***:] | Match character not in the ASCII class | * | -| $***n*** | *n*-th substitution backreference to capture group. ***n*** in 0-9. $0 is the entire match. | * | +| $***n*** | *n*-th replace backreference to capture group. ***n*** in 0-9. $0 is the entire match. | * | | $***nn;*** | As above, but can handle ***nn*** < CREG_MAX_CAPTURES. | * | ## Limitations @@ -219,6 +219,6 @@ The main goal of **cregex** is to be small and fast with limited but useful unic - In order to limit table sizes, most general UTF8 character classes are missing, like \p{L}, \p{S}, and all specific scripts like \p{Greek} etc. Some/all of these may be added in the future as an alternative source file with unicode tables to link with. - {n, m} syntax for repeating previous token min-max times. - Non-capturing groups -- Lookaround and backreferences +- Lookaround and backreferences (cannot be implemented efficiently). If you need a more feature complete, but bigger library, use [RE2 with C-wrapper](https://github.com/google/re2) which uses the same type of regex engine as **cregex**, or use [PCRE2](https://www.pcre.org/). |
