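C++11 includes almost all of the Boost.Regex classes and flags: include <regex> and you will find them under std::.
Even so, I still prefer (and am used to) the Boost library, so this post walks through several examples with it.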
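Introduction
Boost supports regular expressions through Boost.Regex (libboost_regex), which understands three syntaxes:
1. Perl regular expressions (the default syntax)
2. POSIX basic regular expressions
3. POSIX extended regular expressions
All three are case-sensitive by default; add boost::regex::icase for case-insensitive matching.
The syntax is selected by the boost::regex::flag_type argument passed to the constructor.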
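To use a regular expression, first construct a boost::regex object, then call boost::regex_match(string, boost::regex) or one of the other boost::regex_* functions, depending on what you need.
So two classes do most of the work:
1. boost::regex // defines a regular expression object
2. boost::smatch // holds the match results (the whole match and each sub-match)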
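For details, see the following books and documentation:
1. "Beyond the C++ Standard Library: An Introduction to Boost"
2. "Boost C++ Application Development Cookbook"
3. "boost_1_56_pdf", regex section
btw: remember to link against the boost_regex library (-lboost_regex) when compiling any of the code below.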
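Main text
Detailed walkthrough
boost::regex
This class represents a regular expression and stores the pattern you wrote. If the pattern is not valid, construction reports an error.
(Unlike the POSIX C regex API, there is no separate compile step: you do not call regcomp() before matching with regexec(). The constructor compiles the pattern.)
Creating an object that holds a regular expression (it is usually declared const):
1. boost::regex e(string_pattern, flag);
2. boost::regex e(string_pattern);
Different values of flag select different syntaxes: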
1. boost::regex::perl
2. boost::regex::perl | boost::regex::icase
3. boost::regex::extended
4. boost::regex::extended | boost::regex::icase
5. boost::regex::basic
6. boost::regex::basic | boost::regex::icase
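As a rough illustration of how the syntax flag changes what a pattern means, here is a minimal sketch. It is written against the C++11 <regex> counterpart so that it builds without linking Boost; std::regex::ECMAScript stands in for boost::regex::perl (std:: has no perl flag), and the helper name full_match is made up for this example. The Boost version should look the same apart from the header and namespace, but treat that as an assumption to check against your Boost version.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern`
// when the pattern is compiled with the given syntax flags.
bool full_match(const std::string& text,
                const std::string& pattern,
                std::regex::flag_type flags)
{
    const std::regex e(pattern, flags);
    return std::regex_match(text, e);
}
```

For example, full_match("aaa", "a+", std::regex::extended) is true because '+' is a quantifier in POSIX extended syntax, while the same call with std::regex::basic is false because '+' is an ordinary character in POSIX basic syntax. Adding icase, as in std::regex::extended | std::regex::icase, makes "hello" match "HELLO".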
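There is one more flag, boost::regex::no_except. When string_pattern is not a valid regular expression, construction fails and by default throws an exception.
If you set this flag, no exception is thrown and you can check boost::regex::status() instead; it returns a non-zero value on error.
boost::regex e(string_pattern, flag | boost::regex::no_except);
if (e.status()) {
    // error happened
}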
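The default, exception-based path can be sketched as follows. This uses the C++11 <regex> counterpart so it builds without Boost; there an invalid pattern makes the constructor throw std::regex_error (as far as I know, std:: has no no_except flag). is_valid_pattern is a hypothetical helper name. With Boost the same structure would presumably catch boost::regex_error instead.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: reports whether `pattern` compiles as a regex.
bool is_valid_pattern(const std::string& pattern)
{
    try {
        const std::regex e(pattern);  // compiling happens in the constructor
        (void)e;                      // only the success of construction matters
        return true;
    } catch (const std::regex_error&) {
        return false;                 // e.g. unbalanced '(' or '['
    }
}
```

A call such as is_valid_pattern("(abc") returns false because of the unbalanced parenthesis, while is_valid_pattern("(abc)") returns true.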
boost::regex_match
boost::regex e(string_pattern);
bool matched = boost::regex_match(target_string, e);
A more complete demo:
#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main(void)
{
    // The original set std::locale::global(std::locale("German")) here; that
    // name is platform-specific and is not needed for this ASCII-only input.
    std::string s = "Boris Schaling";
    boost::regex expr("\\w+\\s\\w+");
    // regex_match succeeds only if the whole string matches the pattern
    std::cout << boost::regex_match(s, expr) << std::endl;

    return 0;
}
boost::regex_search
#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main()
{
    // (global "German" locale from the original omitted; not needed for ASCII input)
    std::string s = "Boris Schaling";
    boost::regex expr("(\\w+)\\s(\\w+)");
    boost::smatch what;
    if (boost::regex_search(s, what, expr)) {
        // what[0] is the whole match
        std::cout << what[0] << std::endl;
        // what[1] and what[2] are the two captured groups
        std::cout << what[1] << "-----" << what[2] << std::endl;
    }
}
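The difference between the two calls is easy to miss: regex_match succeeds only when the pattern covers the entire string, while regex_search looks for a match anywhere inside it. A minimal sketch, using the C++11 std:: counterpart so it builds without Boost (first_word and is_single_word are hypothetical helper names):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first run of word characters in `text`,
// or an empty string if there is none.
std::string first_word(const std::string& text)
{
    static const std::regex word("\\w+");
    std::smatch what;
    if (std::regex_search(text, what, word)) {
        return what[0].str();  // what[0] is the matched substring
    }
    return std::string();
}

// Hypothetical helper: true only if `text` is exactly one word and nothing else.
bool is_single_word(const std::string& text)
{
    static const std::regex word("\\w+");
    return std::regex_match(text, word);
}
```

With the string "Boris Schaling", first_word finds "Boris", but is_single_word is false because the pattern does not cover the space and the second word.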
boost::regex_replace
#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main()
{
    // (global "German" locale from the original omitted; not needed for ASCII input)
    std::string s = " Boris Schaling ";
    boost::regex expr("\\s");
    std::string fmt("_");
    // Every whitespace character is replaced with '_'
    std::cout << boost::regex_replace(s, expr, fmt) << std::endl;
}
And:
#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main()
{
    // (global "German" locale from the original omitted; not needed for ASCII input)
    std::string s = "Boris Schaling";
    boost::regex expr("(\\w+)\\s(\\w+)");
    // \1 and \2 in the format string refer to the captured groups,
    // so this swaps the two words
    std::string fmt("\\2 \\1");
    std::cout << boost::regex_replace(s, expr, fmt) << std::endl;
}
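With boost::regex_constants::format_literal, the format string is inserted verbatim instead of being expanded:
#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main(void)
{
    // (global "German" locale from the original omitted; not needed for ASCII input)
    std::string s = "Boris Schaling";
    boost::regex expr("(\\w+)\\s(\\w+)");
    std::string fmt("\\2 \\1");
    // format_literal: \2 and \1 are copied as plain text, not treated as groups
    std::cout << boost::regex_replace(s, expr, fmt,
                                      boost::regex_constants::format_literal)
              << std::endl;
    return 0;
}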
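For comparison, here is a small sketch of two other formatting options, written against the C++11 std:: counterpart so no Boost linkage is needed: the $N placeholder form for captured groups, and format_first_only, which replaces only the first match. swap_names and replace_first_space are hypothetical helper names. Boost also has a format_first_only flag in boost::regex_constants, but check the exact behavior against your version.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: swaps two whitespace-separated words using $N placeholders.
std::string swap_names(const std::string& s)
{
    static const std::regex expr("(\\w+)\\s(\\w+)");
    return std::regex_replace(s, expr, "$2 $1");
}

// Hypothetical helper: replaces only the first whitespace character with '_'.
std::string replace_first_space(const std::string& s)
{
    static const std::regex expr("\\s");
    return std::regex_replace(s, expr, "_",
                              std::regex_constants::format_first_only);
}
```

Here swap_names("Boris Schaling") yields "Schaling Boris", and replace_first_space(" Boris Schaling ") changes only the leading space.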
Full examples
boost::regex_match
#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main(void)
{
    std::cout
        << "Available regex syntaxes:\n"
        << "\t[0] Perl\n"
        << "\t[1] Perl case insensitive\n"
        << "\t[2] POSIX extended\n"
        << "\t[3] POSIX extended case insensitive\n"
        << "\t[4] POSIX basic\n"
        << "\t[5] POSIX basic case insensitive\n"
        << "Choose regex syntax: ";

    boost::regex::flag_type flag;
    switch (std::cin.get()) {
    case '0': flag = boost::regex::perl; break;
    case '1': flag = boost::regex::perl | boost::regex::icase; break;
    case '2': flag = boost::regex::extended; break;
    case '3': flag = boost::regex::extended | boost::regex::icase; break;
    case '4': flag = boost::regex::basic; break;
    case '5': flag = boost::regex::basic | boost::regex::icase; break;
    default:
        std::cout << "Incorrect number of regex syntax. Exiting...\n";
        return -1;
    }
    // Report bad patterns through status() instead of exceptions
    flag |= boost::regex::no_except;

    // Drop the newline left behind by std::cin.get()
    std::cin.ignore();
    std::cin.clear();

    std::string regex, str;
    do {
        std::cout << "Input regex: ";
        if (!std::getline(std::cin, regex) || regex.empty()) {
            return 0;
        }

        const boost::regex e(regex, flag);
        if (e.status()) {
            std::cout << "Incorrect regex pattern!\n";
            continue;
        }

        std::cout << "String to match: ";
        // An empty line (or end of input) ends this inner loop
        while (std::getline(std::cin, str) && !str.empty()) {
            bool matched = boost::regex_match(str, e);
            std::cout << (matched ? "MATCH\n" : "DOES NOT MATCH\n");
            std::cout << "String to match: ";
        }

        std::cout << '\n';
        // Only reset the stream state. The original also called
        // std::cin.ignore() here, which would swallow the first
        // character of the next regex after an empty line.
        std::cin.clear();
    } while (1);
}
boost::regex_replace
#include <boost/regex.hpp>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

int main(void)
{
    std::cout
        << "Available regex syntaxes:\n"
        << "\t[0] Perl\n"
        << "\t[1] Perl case insensitive\n"
        << "\t[2] POSIX extended\n"
        << "\t[3] POSIX extended case insensitive\n"
        << "\t[4] POSIX basic\n"
        << "\t[5] POSIX basic case insensitive\n"
        << "Choose regex syntax: ";

    boost::regex::flag_type flag;
    switch (std::cin.get()) {
    case '0': flag = boost::regex::perl; break;
    case '1': flag = boost::regex::perl | boost::regex::icase; break;
    case '2': flag = boost::regex::extended; break;
    case '3': flag = boost::regex::extended | boost::regex::icase; break;
    case '4': flag = boost::regex::basic; break;
    case '5': flag = boost::regex::basic | boost::regex::icase; break;
    default:
        std::cout << "Incorrect number of regex syntax. Exiting...\n";
        return -1;
    }
    // Report bad patterns through status() instead of exceptions
    flag |= boost::regex::no_except;

    // Drop the newline left behind by std::cin.get()
    std::cin.ignore();
    std::cin.clear();

    std::string regex, str, replace_string;
    do {
        std::cout << "\nInput regex: ";
        if (!std::getline(std::cin, regex)) {
            return 0;
        }

        const boost::regex e(regex, flag);
        if (e.status()) {
            std::cout << "Incorrect regex pattern!\n";
            continue;
        }

        std::cout << "String to match: ";
        while (std::getline(std::cin, str) && !str.empty()) {
            boost::smatch results;
            bool matched = boost::regex_search(str, results, e);
            if (matched) {
                std::cout << "MATCH: ";
                // Print every captured group (results[0] is the whole match)
                std::copy(
                    results.begin() + 1,
                    results.end(),
                    std::ostream_iterator<std::string>(std::cout, ", ")
                );

                std::cout << "\nReplace pattern: ";
                if (std::getline(std::cin, replace_string)
                    && !replace_string.empty())
                {
                    // format() expands the replace pattern against this match
                    std::cout << "RESULT: " << results.format(replace_string);
                } else {
                    // Only reset the stream state; the original's extra
                    // std::cin.ignore() would eat the next input character.
                    std::cin.clear();
                }
            } else {
                std::cout << "DOES NOT MATCH";
            }

            std::cout << "\nString to match: ";
        }

        std::cout << '\n';
        std::cin.clear();
    } while (1);

    return 0;
}
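Wrap-up
As you can see, the API above is quite comfortable to use.
If the code has to work in both C and C++, use the POSIX C regex API; if it is C++ only, I recommend the Boost API.
C++11 has adopted this interface as well: include the <regex> header and use it from std::.
Before C++11, you still need the Boost library.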