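The prototype of pcre_exec() is:

int pcre_exec(const pcre *code, const pcre_extra *extra,
              const char *subject, int length, int startoffset,
              int options, int *ovector, int ovecsize);

A typical call:

int rc;
int ovector[30];
rc = pcre_exec(
    re,             /* result of pcre_compile() */
    NULL,           /* result of pcre_study(), which can speed up matching; NULL if the pattern was not studied */
    "some string",  /* the subject string; it may contain \0 bytes */
    11,             /* length of the subject, given explicitly because the subject may contain \0 */
    0,              /* offset in the subject at which matching starts; pcre_exec() has no equivalent of /g, so matching every occurrence means looping and adjusting this offset yourself */
    0,              /* default options */
    ovector,        /* output array for the match offsets */
    30);            /* number of elements in ovector */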
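Since there is no /g option, a "match all" loop has to be written by hand. Below is a minimal sketch of that loop. To keep it compilable without libpcre, the matcher is passed in as a function pointer: match_fn, count_all_matches and demo_match_ab are hypothetical names invented for this sketch, and demo_match_ab is only a stand-in that finds the literal "ab". In real code the callback would be a thin wrapper around pcre_exec() with a fixed compiled pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical matcher type. Like pcre_exec(), it returns a negative value
 * when nothing matches at or after startoffset, and otherwise fills
 * ovector[0] and ovector[1] with the start and end offsets of the match. */
typedef int (*match_fn)(const char *subject, int length, int startoffset,
                        int *ovector, int ovecsize);

/* Stand-in matcher for the sketch only: finds the literal "ab". */
int demo_match_ab(const char *subject, int length, int startoffset,
                  int *ovector, int ovecsize)
{
    int i;
    if (ovecsize < 3)
        return -1;
    for (i = startoffset; i + 1 < length; i++) {
        if (subject[i] == 'a' && subject[i + 1] == 'b') {
            ovector[0] = i;
            ovector[1] = i + 2;
            return 1;
        }
    }
    return -1;  /* same numeric value as PCRE_ERROR_NOMATCH */
}

/* Counts all non-overlapping matches in subject, emulating /g. */
int count_all_matches(match_fn match, const char *subject, int length)
{
    int ovector[30];
    int offset = 0;
    int count = 0;

    assert(match != NULL && subject != NULL && length >= 0);

    while (offset <= length) {
        int rc = match(subject, length, offset, ovector, 30);
        if (rc < 0)
            break;                      /* no further match, or an error */
        count++;
        if (ovector[1] > ovector[0])
            offset = ovector[1];        /* resume right after this match */
        else
            offset = ovector[1] + 1;    /* empty match: step forward so the loop terminates */
    }
    return count;
}
```

Stepping one position past an empty match is a simplification. In UTF-8 mode the step would have to cover a whole character, and a Perl-compatible loop would first retry at the same offset with PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED.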
The option bits below are the ones documented for pcre_compile(). Only a few of them (PCRE_ANCHORED, the PCRE_BSR_xxx and PCRE_NEWLINE_xxx settings, PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK) may also be passed to pcre_exec(), which in addition has match-time options of its own such as PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY and PCRE_PARTIAL_SOFT.

PCRE_ANCHORED            Force pattern anchoring
PCRE_AUTO_CALLOUT        Compile automatic callouts
PCRE_BSR_ANYCRLF         \R matches only CR, LF, or CRLF
PCRE_BSR_UNICODE         \R matches all Unicode line endings
PCRE_CASELESS            Do caseless matching
PCRE_DOLLAR_ENDONLY      $ not to match newline at end
PCRE_DOTALL              . matches anything including NL
PCRE_DUPNAMES            Allow duplicate names for subpatterns
PCRE_EXTENDED            Ignore white space and # comments
PCRE_EXTRA               PCRE extra features (not much use currently)
PCRE_FIRSTLINE           Force matching to be before newline
PCRE_JAVASCRIPT_COMPAT   JavaScript compatibility
PCRE_MULTILINE           ^ and $ match newlines within data
PCRE_NEVER_UTF           Lock out UTF, e.g. via (*UTF)
PCRE_NEWLINE_ANY         Recognize any Unicode newline sequence
PCRE_NEWLINE_ANYCRLF     Recognize CR, LF, and CRLF as newline sequences
PCRE_NEWLINE_CR          Set CR as the newline sequence
PCRE_NEWLINE_CRLF        Set CRLF as the newline sequence
PCRE_NEWLINE_LF          Set LF as the newline sequence
PCRE_NO_AUTO_CAPTURE     Disable numbered capturing parentheses (named ones available)
PCRE_NO_AUTO_POSSESS     Disable auto-possessification
PCRE_NO_START_OPTIMIZE   Disable match-time start optimizations
PCRE_NO_UTF16_CHECK      Do not check the pattern for UTF-16 validity (only relevant if PCRE_UTF16 is set)
PCRE_NO_UTF32_CHECK      Do not check the pattern for UTF-32 validity (only relevant if PCRE_UTF32 is set)
PCRE_NO_UTF8_CHECK       Do not check the pattern for UTF-8 validity (only relevant if PCRE_UTF8 is set)
PCRE_UCP                 Use Unicode properties for \d, \w, etc.
PCRE_UNGREEDY            Invert greediness of quantifiers
PCRE_UTF16               Run in pcre16_compile() UTF-16 mode
PCRE_UTF32               Run in pcre32_compile() UTF-32 mode
PCRE_UTF8                Run in pcre_compile() UTF-8 mode
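Return value rc:
rc < 0 means the match did not succeed: PCRE_ERROR_NOMATCH (-1) means the pattern did not match, and any other negative value is an error. rc == 0 means the pattern matched but ovector was too small to hold all the captured substrings. rc > 0 means the pattern matched, and the value is the number of elements returned (the highest-numbered offset pair that was set, plus one).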
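According to the pcreapi documentation, a failed match is reported as the negative value PCRE_ERROR_NOMATCH (-1), while 0 means "matched, but ovector was too small". A minimal sketch of how the return value might be classified follows. The enum, the function name and SKETCH_ERROR_NOMATCH are made up for this example; the constant is redefined locally only so the snippet compiles without pcre.h.

```c
/* Same numeric value that pcre.h assigns to PCRE_ERROR_NOMATCH; redefined
 * here only so the sketch compiles without the PCRE headers. */
#define SKETCH_ERROR_NOMATCH (-1)

enum rc_kind {
    RC_MATCH,            /* rc > 0: matched, rc elements were returned */
    RC_MATCH_TRUNCATED,  /* rc == 0: matched, but ovector was too small */
    RC_NOMATCH,          /* rc == -1: the pattern did not match */
    RC_ERROR             /* any other negative value */
};

/* Classifies the value returned by pcre_exec(). */
enum rc_kind classify_rc(int rc)
{
    if (rc > 0)
        return RC_MATCH;
    if (rc == 0)
        return RC_MATCH_TRUNCATED;
    if (rc == SKETCH_ERROR_NOMATCH)
        return RC_NOMATCH;
    return RC_ERROR;
}
```

In real code you would include pcre.h and compare against PCRE_ERROR_NOMATCH directly.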
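ovector is an array of int whose length should be a multiple of 3 (a length that is not a multiple of 3 is rounded down). With a length of 3n, at most n elements are returned, so rc <= n.
ovector[0] and ovector[1] are the start and end offsets of the whole match, where the end offset is that of the first character after the match. ovector[2*i] and ovector[2*i+1] are the offsets of the i-th captured substring, that is, the text captured by the i-th pair of parentheses in the pattern. Groups are numbered in the order in which their opening parentheses appear.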
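For example, matching the pattern /abc((.*)cf(exec))test/ against the subject 11111abcword1cfexectest11111 returns 4 elements, whose start and end offsets occupy ovector[0] through ovector[7]:
Element 0 = abcword1cfexectest
Element 1 = word1cfexec
Element 2 = word1
Element 3 = exec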
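To turn offset pairs back into strings, each element can be copied out of the subject by start offset and length. The helper below is a minimal sketch; copy_group is a hypothetical name, not a PCRE function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies element i (0 = whole match) into buf as a NUL-terminated string.
 * Returns the substring length, or -1 if i is out of range, the group is
 * unset, or the text does not fit in buf. */
int copy_group(const char *subject, const int *ovector, int rc, int i,
               char *buf, int bufsize)
{
    int start, len;

    assert(subject != NULL && ovector != NULL && buf != NULL);

    if (i < 0 || i >= rc)
        return -1;
    start = ovector[2 * i];
    if (start < 0)              /* pcre_exec() stores -1 for a group that took no part in the match */
        return -1;
    len = ovector[2 * i + 1] - start;
    if (len < 0 || len >= bufsize)
        return -1;
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

For the example above, the offsets worked out by hand (not captured from a real run) are {5, 23, 8, 19, 8, 13, 15, 19} with rc = 4, so copy_group(subject, ovector, 4, 2, buf, sizeof buf) would leave "word1" in buf. PCRE also ships pcre_copy_substring() and pcre_get_substring() for the same job, and they are the better choice in real code.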
The last third of ovector, that is [2n .. 3n-1], is used by pcre_exec() as workspace during matching and returns no results.
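Because only the first two thirds of ovector carry results, the array needs 3 ints per reported element, counting the whole match as one element. The arithmetic can be sketched as below. Both function names are hypothetical; the capture count itself could come from pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT.

```c
#include <assert.h>

/* Number of ints to allocate for ovector so that the whole match plus
 * capture_count groups can all be reported. */
int ovector_ints_needed(int capture_count)
{
    assert(capture_count >= 0);
    return 3 * (capture_count + 1);
}

/* Largest number of (start, end) pairs that can be reported for a given
 * ovecsize: only the first two thirds of the array hold results. */
int max_reported_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;    /* a size that is not a multiple of 3 is rounded down */
}
```

For the pattern above, which has 3 capture groups, this gives 12 ints, while the 30-element array used earlier leaves room for up to 10 elements.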
Reference: http://swoolley.org/man.cgi/3/pcreapi