Tech: Regular Expression Support in C++

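C++11 includes almost all of the Boost.Regex classes and flags; just include <regex> and you will find them under std::.
Still, I prefer (and am used to) the Boost library, so this post walks through several examples with it.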

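Introduction

Boost provides regular expression support (libboost_regex) and supports three syntaxes:

1. Perl regular expressions (case-sensitive by default)
2. POSIX basic regular expressions (case-sensitive by default)
3. POSIX extended regular expressions (case-sensitive by default)

The syntax is selected through the boost::regex::flag_type argument; adding boost::regex::icase makes any of them case-insensitive.

Every use of a regular expression starts by constructing a boost::regex object, followed by matching as needed with boost::regex_match(string, boost::regex) or one of the other boost::regex_* functions.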

So there are two main classes involved:

1. boost::regex   // defines a regular expression object
2. boost::smatch  // holds the match results (the whole match and the sub-matches)

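For details, see the following books and documentation:
1. "Beyond the C++ Standard Library: An Introduction to Boost"
2. "Boost C++ Application Development Cookbook"
3. The regex section of boost_1_56_pdf

btw: remember to link the boost_regex library (-lboost_regex) when compiling any of the code below.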

Main Text

Detailed Walkthrough

boost::regex

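This is the class that represents a regular expression object. It stores the pattern you write (if the pattern is not valid regex syntax, construction may fail with an error).
(Unlike the regex support in POSIX C, though, you do not call a separate regcomp() to compile and then regexec() to match; the constructor takes care of compiling the pattern.)

Creating an object that stores a regular expression (usually declared const):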

1. boost::regex e(string_pattern, flag);
2. boost::regex e(string_pattern);

Here flag can take different values to select the syntax:

1. boost::regex::perl
2. boost::regex::perl | boost::regex::icase
3. boost::regex::extended
4. boost::regex::extended | boost::regex::icase
5. boost::regex::basic
6. boost::regex::basic | boost::regex::icase

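There is one more value, boost::regex::no_except. When string_pattern is not valid regex syntax, construction fails and, by default, throws an exception.
If you set this flag, however, you can check "boost::regex::status()" instead:
boost::regex e(string_pattern, flag);  // flag includes boost::regex::no_except
if (e.status()) {
    // error happened
}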

boost::regex_match

boost::regex e(string_pattern);
bool matched = boost::regex_match(target_string, e);

A more complete demo:

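#include <boost/regex.hpp>
#include <locale>
#include <iostream>
#include <string>

int main(void)
{
    // Locale names are platform-specific; an unknown name makes std::locale throw.
    std::locale::global(std::locale("German"));
    std::string s = "Boris Schaling";
    boost::regex expr("\\w+\\s\\w+");
    // regex_match succeeds only if the pattern matches the whole string.
    std::cout << boost::regex_match(s, expr) << std::endl;

    return 0;
}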

boost::regex_search

#include <boost/regex.hpp>
#include <locale>
#include <iostream>
#include <string>

int main()
{
    std::locale::global(std::locale("German"));
    std::string s = "Boris Schaling";
    boost::regex expr("(\\w+)\\s(\\w+)");
    boost::smatch what;
    if (boost::regex_search(s, what, expr)) {
        // what[0] is the whole match; what[1] and what[2] are the two groups.
        std::cout << what[0] << std::endl;
        std::cout << what[1] << "-----" << what[2] << std::endl;
    }
}

boost::regex_replace

#include <boost/regex.hpp>
#include <locale>
#include <iostream>
#include <string>

int main()
{
    std::locale::global(std::locale("German"));
    std::string s = " Boris Schaling ";
    boost::regex expr("\\s");
    std::string fmt("_");
    // Every whitespace character is replaced with an underscore.
    std::cout << boost::regex_replace(s, expr, fmt) << std::endl;
}

And also, using back-references to the captured groups in the format string:

#include <boost/regex.hpp>
#include <locale>
#include <iostream>
#include <string>

int main()
{
    std::locale::global(std::locale("German"));
    std::string s = "Boris Schaling";
    boost::regex expr("(\\w+)\\s(\\w+)");
    // \2 and \1 refer to the second and first captured groups.
    std::string fmt("\\2 \\1");
    std::cout << boost::regex_replace(s, expr, fmt) << std::endl;
}

With boost::regex_constants::format_literal, the format string is copied verbatim, so the back-references are not expanded:

#include <boost/regex.hpp>
#include <locale>
#include <iostream>
#include <string>

int main(void)
{
    std::locale::global(std::locale("German"));
    std::string s = "Boris Schaling";
    boost::regex expr("(\\w+)\\s(\\w+)");
    std::string fmt("\\2 \\1");
    // format_literal: fmt is inserted as-is instead of being interpreted.
    std::cout << boost::regex_replace(s, expr, fmt,
        boost::regex_constants::format_literal) << std::endl;
    return 0;
}

Polished Code

boost::regex_match

#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main(void) {
    std::cout
        << "Available regex syntaxes:\n"
        << "\t[0] Perl\n"
        << "\t[1] Perl case insensitive\n"
        << "\t[2] POSIX extended\n"
        << "\t[3] POSIX extended case insensitive\n"
        << "\t[4] POSIX basic\n"
        << "\t[5] POSIX basic case insensitive\n"
        << "Choose regex syntax: ";

    boost::regex::flag_type flag;
    switch (std::cin.get()) {
    case '0': flag = boost::regex::perl;
        break;

    case '1': flag = boost::regex::perl|boost::regex::icase;
        break;

    case '2': flag = boost::regex::extended;
        break;

    case '3': flag = boost::regex::extended|boost::regex::icase;
        break;

    case '4': flag = boost::regex::basic;
        break;

    case '5': flag = boost::regex::basic|boost::regex::icase;
        break;
    default:
        std::cout << "Incorrect number of regex syntax. Exiting... \n";
        return -1;
    }
    // Disabling exceptions
    flag |= boost::regex::no_except;

    // Restoring std::cin: drop the newline left after the digit
    std::cin.ignore();
    std::cin.clear();

    std::string regex, str;
    do {
        std::cout << "Input regex: ";
        if (!std::getline(std::cin, regex) || regex.empty()) {
            return 0;
        }

        // Without `boost::regex::no_except` flag this
        // constructor may throw
        const boost::regex e(regex, flag);
        if (e.status()) {
            std::cout << "Incorrect regex pattern!\n";
            continue;
        }

        std::cout << "String to match: ";
        while (std::getline(std::cin, str) && !str.empty()) {
            bool matched = boost::regex_match(str, e);
            std::cout << (matched ? "MATCH\n" : "DOES NOT MATCH\n");
            std::cout << "String to match: ";
        } // end of `while (std::getline(std::cin, str))`

        std::cout << '\n';

        // Restoring std::cin: only clear the error state here; calling
        // ignore() would swallow the first character of the next regex
        std::cin.clear();
    } while (1);
} // int main()

boost::regex_replace

#include <boost/regex.hpp>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

int main(void) {
    std::cout
        << "Available regex syntaxes:\n"
        << "\t[0] Perl\n"
        << "\t[1] Perl case insensitive\n"
        << "\t[2] POSIX extended\n"
        << "\t[3] POSIX extended case insensitive\n"
        << "\t[4] POSIX basic\n"
        << "\t[5] POSIX basic case insensitive\n"
        << "Choose regex syntax: ";

    boost::regex::flag_type flag;
    switch (std::cin.get()) {
    case '0': flag = boost::regex::perl;
        break;

    case '1': flag = boost::regex::perl|boost::regex::icase;
        break;

    case '2': flag = boost::regex::extended;
        break;

    case '3': flag = boost::regex::extended|boost::regex::icase;
        break;

    case '4': flag = boost::regex::basic;
        break;

    case '5': flag = boost::regex::basic|boost::regex::icase;
        break;
    default:
        std::cout << "Incorrect number of regex syntax. Exiting... \n";
        return -1;
    }
    // Disabling exceptions
    flag |= boost::regex::no_except;

    // Restoring std::cin: drop the newline left after the digit
    std::cin.ignore();
    std::cin.clear();

    std::string regex, str, replace_string;
    do {
        std::cout << "\nInput regex: ";
        if (!std::getline(std::cin, regex)) {
            return 0;
        }

        // Without `boost::regex::no_except` flag this
        // constructor may throw
        const boost::regex e(regex, flag);
        if (e.status()) {
            std::cout << "Incorrect regex pattern!\n";
            continue;
        }

        std::cout << "String to match: ";
        while (std::getline(std::cin, str) && !str.empty()) {
            boost::smatch results;
            bool matched = boost::regex_search(str, results, e);
            if (matched) {
                // Print the captured groups (index 0 is the whole match)
                std::cout << "MATCH: ";
                std::copy(
                    results.begin() + 1,
                    results.end(),
                    std::ostream_iterator<std::string>(std::cout, ", ")
                );

                std::cout << "\nReplace pattern: ";
                if (
                    std::getline(std::cin, replace_string)
                    && !replace_string.empty())
                {
                    // std::cout << "RESULT: "
                    //     << boost::regex_replace(str, e, replace_string);
                    std::cout << "RESULT: " << results.format(replace_string);
                } else {
                    // Restoring std::cin: clear the error state only, so
                    // no character of the next input line is discarded
                    std::cin.clear();
                }
            } else { // `if (matched) `
                std::cout << "DOES NOT MATCH";
            }

            std::cout << "\nString to match: ";
        } // end of `while (std::getline(std::cin, str))`

        std::cout << '\n';

        // Restoring std::cin: clear the error state only
        std::cin.clear();
    } while (1);

    return 0;
} // int main()

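Wrapping Up

As you can see, all of this is quite pleasant to use.
Of course, if you need code that works in both C and C++, you can use the API that POSIX C provides; if it is C++ only, I recommend the Boost one.
C++11 is also compatible with it: include the <regex> header and everything is available under std::.

Before C++11, you still have to use the Boost library.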
