Introduction
The official website introduces it as follows:
RapidXml is an attempt to create the fastest XML parser possible, while retaining useability, portability and reasonable W3C compatibility. It is an in-situ parser written in modern C++, with parsing speed approaching that of strlen function executed on the same data.
Integration with your project will be trivial, because entire library is contained in a single header file, and requires no building or configuration.
The author of RapidXml is Marcin Kalicinski.
Main text
Note one very important property of rapidxml: RapidXml is an in-situ parser, which allows it to achieve very high parsing speed. In-situ means that parser does not make copies of strings. Instead, it places pointers to the source text in the DOM hierarchy.
In other words:
Nodes and attributes produced by RapidXml do not own their name and value strings. They merely hold the pointers to them.
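To make the in-situ idea concrete, here is a minimal sketch in plain C++. It is not RapidXml's code: the InSituElement struct and the parse_in_situ function are hypothetical names invented for this illustration, and the function only understands a single <name>value</name> element. What it tries to show is the technique itself: the parser writes terminators into the caller's buffer and hands back pointers into that same buffer, without copying any text.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical illustration only; this is not a RapidXml type.
struct InSituElement {
    char *name;   // points into the caller's buffer
    char *value;  // points into the caller's buffer
};

// Parses exactly one "<name>value</name>" element in place.
// Returns false (and leaves the buffer untouched) if the text
// does not have that shape.
inline bool parse_in_situ(char *text, InSituElement &out)
{
    assert(text != nullptr);
    if (text[0] != '<')
        return false;

    char *name = text + 1;
    char *name_end = std::strchr(name, '>');
    if (!name_end)
        return false;

    char *value = name_end + 1;
    char *value_end = std::strchr(value, '<');
    if (!value_end)
        return false;

    // Overwrite the delimiters with terminators inside the source
    // buffer, so name and value become ordinary C strings without
    // any copy being made.
    *name_end = '\0';
    *value_end = '\0';

    out.name = name;
    out.value = value;
    return true;
}
```

Calling it on a writable array such as char buf[] = "<red>true</red>"; leaves out.name and out.value pointing inside buf. The consequence is the same constraint the quoted passage describes: the buffer has to be writable, and it has to stay alive for as long as those pointers are used.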
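Installation
There are only four header files in total, so just download the archive:
$ wget https://nchc.dl.sourceforge.net/project/rapidxml/rapidxml/rapidxml%201.13/rapidxml-1.13.zip
A look at the source shows that the code base is very small, fewer than 4,000 lines in total. The main logic is all in the template header rapidxml.hpp, and once you set aside the general-purpose classes such as parse_error and memory_pool, the core logic is easy to follow.
I won't analyze the source here; I'll add that when I have time (the main things to read are class xml_base, xml_attribute, and so on).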
Getting started
Just pick it up and use it (assuming you already know the concepts of DOM parsing, SAX parsing, node, root, attribute, and document).
Parsing
Parse a C-style (zero-terminated) string:
using namespace rapidxml;
xml_document<> doc; // character type defaults to char
doc.parse<0>(text); // 0 means the default parse flags
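The empty angle brackets (<>) mean that the template parameter takes its default value. The flags template argument of parse must be a compile-time constant (a constant expression); the 0 here selects the default flags.
doc is the root (node) of the DOM tree. It represents the whole parsed XML document, and it also serves as the memory pool for that document.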
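The point about parse<0> needing a compile-time constant can be illustrated with a small standalone sketch. Everything in it is hypothetical: demo_trim_whitespace, demo_uppercase and process are names made up for this example and are not RapidXml constants or functions. It only shows the general pattern of passing a bitmask of flags as a template argument and testing individual bits inside the function.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical flag values for illustration; not RapidXml's constants.
const int demo_trim_whitespace = 0x1;
const int demo_uppercase       = 0x2;

// Flags is a template argument, so the caller must supply a constant
// expression. Every `Flags & ...` test below is then known at compile
// time, which typically lets the compiler discard the unused branch.
template<int Flags>
std::string process(std::string text)
{
    if (Flags & demo_trim_whitespace) {
        const std::string::size_type first = text.find_first_not_of(' ');
        if (first == std::string::npos) {
            text.clear();  // nothing but spaces
        } else {
            const std::string::size_type last = text.find_last_not_of(' ');
            assert(last >= first);
            text = text.substr(first, last - first + 1);
        }
    }
    if (Flags & demo_uppercase) {
        for (char &c : text)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return text;
}
```

With this sketch, process<0>(" ab ") returns the text unchanged, while process<demo_trim_whitespace | demo_uppercase>(" ab ") applies both steps. Passing a value that is only known at run time as the template argument would not compile, which is the restriction mentioned above.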
Reading
Reading the DOM tree goes through the xml_node and xml_attribute classes. Boilerplate code:
cout << "Name of my first node is: " << doc.first_node()->name() << "\n";
Modifying
Nodes and attributes can be added/removed, and their contents changed. The below example creates a HTML document, whose sole contents is a link to google.com website:
xml_document<> doc;
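This is the string issue again. The code above uses string literals, so there is no need to worry about string lifetime. But if a string added to a node or attribute is an ordinary variable, you must make sure it stays alive and valid for as long as the node or attribute uses it, because nodes and attributes do not store the content itself; they only keep its address (a shallow copy).
So the author recommends the following way to guarantee the string's lifetime:
xml_document<> doc;
char *node_name = doc.allocate_string(name); // Allocate string and copy name into it
xml_node<> *node = doc.allocate_node(node_element, node_name); // Set node name to node_name
This uses the memory pool, memory_pool::allocate_string(), so that the allocated string has the same lifetime as doc. (As noted above, doc represents both the whole XML document and its memory pool.)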
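The ownership idea behind allocate_string can be sketched without RapidXml at all. The StringPool class, the Node struct and the make_node function below are hypothetical stand-ins written for this illustration; the real memory_pool is more elaborate, and this sketch only mirrors the one property discussed here: the pool owns a copy of the string, and the node merely points at that copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pool: owns copies of strings until the pool is destroyed.
class StringPool {
public:
    // Copies `source` into pool-owned storage and returns a pointer
    // that stays valid for the lifetime of the pool.
    char *allocate_string(const char *source)
    {
        assert(source != nullptr);
        const std::size_t size = std::strlen(source) + 1;  // include the terminator
        std::unique_ptr<char[]> block(new char[size]);
        std::memcpy(block.get(), source, size);
        blocks_.push_back(std::move(block));
        return blocks_.back().get();
    }

private:
    std::vector<std::unique_ptr<char[]>> blocks_;
};

// Hypothetical node: holds a pointer only, never the characters.
struct Node {
    const char *name;
};

// Builds a node whose name survives even if `name` itself goes away,
// because the node points at the pool's copy, not at `name`.
inline Node make_node(StringPool &pool, const std::string &name)
{
    Node node = { pool.allocate_string(name.c_str()) };
    return node;
}
```

If make_node stored name.c_str() directly, the node would dangle as soon as the std::string was destroyed. Routing the string through the pool ties its lifetime to the pool, which is the role doc plays in the snippet above.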
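Printing
Use the print() function and operator<<() to print xml_document and xml_node objects. (Header file: rapidxml_print.hpp)
using namespace rapidxml;
A small caveat: the 1.13 release I used has a pitfall. Even with -std=c++11, compilation still fails with a "not declared" error involving print_node(), and the definition of print_node in the source has to be moved further down.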
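Details
Here is a closer look at the API (classes, functions, constants, and so on). The author's naming is very clear and matches the style of a developer who came to C++ from C, which is very nice.
(There are only four header files in total, so I won't say which header each item lives in.)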
Namespace rapidxml:
enum node_type
function parse_error_handler(const char *what, void *where)
function print(OutIt out, const xml_node< Ch > &node, int flags=0)
function print(std::basic_ostream< Ch > &out, const xml_node< Ch > &node, int flags=0)
function operator<<(std::basic_ostream< Ch > &out, const xml_node< Ch > &node)
constant parse_no_data_nodes
constant parse_no_element_values
constant parse_no_string_terminators
constant parse_no_entity_translation
constant parse_no_utf8
constant parse_declaration_node
constant parse_comment_nodes
constant parse_doctype_node
constant parse_pi_nodes
constant parse_validate_closing_tags
constant parse_trim_whitespace
constant parse_normalize_whitespace
constant parse_default
constant parse_non_destructive
constant parse_fastest
constant parse_full
constant print_no_indenting
A special note on enum node_type:
node_pi
A PI node. Name contains target. Value contains instructions.
node_document
A document node. Name and value are empty.
node_element
An element node. Name contains element name. Value contains text of first data node.
node_data
A data node. Name is empty. Value contains data text.
node_cdata
A CDATA node. Name is empty. Value contains data text.
---------------------------------------------------------------
node_comment
A comment node. Name is empty. Value contains comment text.
node_declaration
A declaration node. Name and value are empty. Declaration parameters (version, encoding and standalone) are in node attributes.
node_doctype (in practice node_pi is usually used instead)
A DOCTYPE node. Name is empty. Value contains DOCTYPE text.
Class template rapidxml::memory_pool:
constructor memory_pool()
destructor ~memory_pool()
function allocate_node(node_type type, const Ch *name=0, const Ch *value=0, std::size_t name_size=0, std::size_t value_size=0)
function allocate_attribute(const Ch *name=0, const Ch *value=0, std::size_t name_size=0, std::size_t value_size=0)
function allocate_string(const Ch *source=0, std::size_t size=0)
function clone_node(const xml_node< Ch > *source, xml_node< Ch > *result=0)
function clear()
function set_allocator(alloc_func *af, free_func *ff)
Class rapidxml::parse_error, an exception class derived from std::exception:
constructor parse_error(const char *what, void *where)
function what() const
function where() const
Class template rapidxml::xml_document:
constructor xml_document()
function parse(Ch *text)
function clear()
Class template rapidxml::xml_node:
constructor xml_node(node_type type)
function type() const // a node can be one of several types
function type(node_type type)
function document() const
function first_node(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const
function last_node(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const
function previous_sibling(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const
function next_sibling(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const
function prepend_attribute(xml_attribute< Ch > *attribute)
function first_attribute(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const
function last_attribute(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const
function prepend_node(xml_node< Ch > *child)
function append_node(xml_node< Ch > *child)
function insert_node(xml_node< Ch > *where, xml_node< Ch > *child)
function remove_first_node()
function remove_last_node()
function remove_node(xml_node< Ch > *where)
function remove_all_nodes()
function append_attribute(xml_attribute< Ch > *attribute)
function insert_attribute(xml_attribute< Ch > *where, xml_attribute< Ch > *attribute)
function remove_first_attribute()
function remove_last_attribute()
function remove_attribute(xml_attribute< Ch > *where)
function remove_all_attributes()
Note that the attribute operations also live in xml_node.
Class template rapidxml::xml_attribute:
constructor xml_attribute()
function document() const
function previous_attribute(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const
function next_attribute(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const
Class template rapidxml::xml_base:
constructor xml_base()
function name() const
function name_size() const
function value() const
function value_size() const
function name(const Ch *name, std::size_t size)
function name(const Ch *name)
function value(const Ch *value, std::size_t size)
function value(const Ch *value)
function parent() const
Examples
Roughly, these cover:
- creating an XML file
- writing, reading, and modifying it.
Creating an XML file:
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
// Header paths assume the install location that appears in the compiler
// error further down (/usr/local/include/rapidxml); adjust to your setup.
#include <rapidxml/rapidxml.hpp>
#include <rapidxml/rapidxml_print.hpp>

using document = rapidxml::xml_document<>;
using node = rapidxml::xml_node<>;
using rapidxml::node_pi;
using rapidxml::node_element;

int main(void)
{
    document doc;
    node *first = doc.allocate_node(node_pi,
        doc.allocate_string("xml version='1.0' encoding='utf-8'"));
    doc.append_node(first);

    // this node serves as the root node
    node *root = doc.allocate_node(node_element, "config", NULL);
    doc.append_node(root);

    // the following nodes have child nodes
    node *color = doc.allocate_node(node_element, "color", NULL);
    root->append_node(color);
    color->append_node(doc.allocate_node(node_element, "red", "true"));
    color->append_node(doc.allocate_node(node_element, "green", "true"));
    color->append_node(doc.allocate_node(node_element, "blue", "true"));
    color->append_node(doc.allocate_node(node_element, "alpha", "true"));

    node *screen = doc.allocate_node(node_element, "screen", NULL);
    screen->append_node(doc.allocate_node(node_element, "wide", "640"));
    screen->append_node(doc.allocate_node(node_element, "length", "480"));
    root->append_node(screen);

    // the following node has an attribute
    node *mode = doc.allocate_node(node_element, "mode", "screen mode");
    mode->append_attribute(doc.allocate_attribute("fullscreen", "false"));
    root->append_node(mode);

    // print the document into a string and show it
    std::string text;
    rapidxml::print(std::back_inserter(text), doc, 0);
    std::cout << text << std::endl;

    // write directly to a stream of your choice
    std::ofstream out("config.xml");
    out << doc;
    out.close();
    return 0;
}
Compiling on Linux, I ran straight into this pitfall:
/usr/local/include/rapidxml/rapidxml_print.hpp: In instantiation of
‘OutIt rapidxml::internal::print_node(OutIt, const rapidxml::xml_node<Ch>*, int, int) [with OutIt = std\
::back_insert_iterator<std::basic_string<char> >; Ch = char]’:
/usr/local/include/rapidxml/rapidxml_print.hpp:390:36:
required from ‘OutIt rapidxml::print(OutIt, const rapidxml::xml_node<Ch>&, int) [with OutIt = std::back_insert_ite\
rator<std::basic_string<char> >; Ch = char]’
main.cpp:52:51: required from here
/usr/local/include/rapidxml/rapidxml_print.hpp:115:37:
error: ‘print_children’ was not declared in this scope, and no declarations were found by argument-dependent lookup \
at the point of instantiation [-fpermissive]
Roughly speaking, print_node gets instantiated, and at rapidxml_print.hpp:115:37 the name print_children has not been declared.
Looking at the source, the code starting at line 107 reads:
// Print node
template<class OutIt, class Ch>
inline OutIt print_node(OutIt out, const xml_node<Ch> *node, int flags, int indent)
{
    // Print proper node type
    switch (node->type())
    {
    // Document
    case node_document:
        out = print_children(out, node, flags, indent);
        break;
    // Element
    case node_element:
        out = print_element_node(out, node, flags, indent);
        break;
    // Data
    case node_data:
        out = print_data_node(out, node, flags, indent);
        break;
    // CDATA
    case node_cdata:
        out = print_cdata_node(out, node, flags, indent);
        break;
    // Declaration
    case node_declaration:
        out = print_declaration_node(out, node, flags, indent);
        break;
    // Comment
    case node_comment:
        out = print_comment_node(out, node, flags, indent);
        break;
    // Doctype
    case node_doctype:
        out = print_doctype_node(out, node, flags, indent);
        break;
    // Pi
    case node_pi:
        out = print_pi_node(out, node, flags, indent);
        break;
    // Unknown
    default:
        assert(0);
        break;
    }
    // If indenting not disabled, add line break after node
    if (!(flags & print_no_indenting))
        *out = Ch('\n'), ++out;
    // Return modified iterator
    return out;
}
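This print_node calls the specific helpers such as print_children and print_element_node, yet it is defined before them, and there are no forward declarations.
After patching the source (moving the print_node function further down and placing a forward declaration of it ahead of print_children), it compiles:
template<class OutIt, class Ch>
inline OutIt print_node(OutIt out, const xml_node<Ch> *node, int flags, int indent);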
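The same situation can be reproduced in a few lines that do not depend on RapidXml. In the sketch below, the demo namespace, the Node struct and to_string are hypothetical names invented for the example; only the shape of the problem is taken from the error above. Two function templates in a nested namespace call each other, so one of them necessarily appears before the other is defined, and a forward declaration is what makes the earlier call legal.

```cpp
#include <cassert>
#include <iterator>
#include <string>

namespace demo {

// Hypothetical stand-in for a tree node; not a RapidXml type.
struct Node {
    char tag;
    const Node *first_child;
    const Node *next_sibling;
};

namespace internal {

// Forward declaration. print_children below calls print_node before
// its definition has been seen. Argument-dependent lookup cannot
// rescue that call at instantiation time, because demo::internal is
// not an associated namespace of Node (namespace demo) or of the
// standard output iterator (namespace std).
template<class OutIt>
OutIt print_node(OutIt out, const Node *node);

template<class OutIt>
OutIt print_children(OutIt out, const Node *node)
{
    for (const Node *child = node->first_child; child; child = child->next_sibling)
        out = print_node(out, child);
    return out;
}

template<class OutIt>
OutIt print_node(OutIt out, const Node *node)
{
    assert(node != nullptr);
    *out = '<';       ++out;
    *out = node->tag; ++out;
    *out = '>';       ++out;
    return print_children(out, node);
}

} // namespace internal

// Prints the opening tag of every node in document order.
inline std::string to_string(const Node &root)
{
    std::string text;
    internal::print_node(std::back_inserter(text), &root);
    return text;
}

} // namespace demo
```

Deleting the forward declaration from this sketch should produce the same kind of diagnostic as above on compilers that implement two-phase name lookup strictly, while older or more permissive compilers may accept the code, which would explain why the library compiled elsewhere.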
The output looks good:
<?xml version='1.0' encoding='utf-8' ?>
<config>
<color>
<red>true</red>
<green>true</green>
<blue>true</blue>
<alpha>true</alpha>
</color>
<screen>
<wide>640</wide>
<length>480</length>
</screen>
<mode fullscreen="false">screen mode</mode>
</config>
config.xml is generated as well.
Next, read the file back in and modify its contents:
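#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
// Header paths are an assumption; adjust them to where RapidXml is installed.
#include <rapidxml/rapidxml.hpp>
#include <rapidxml/rapidxml_print.hpp>
#include <rapidxml/rapidxml_utils.hpp> // provides file<>

using namespace rapidxml;

void print_info(const char *str)
{
    std::cout << str << std::endl;
}

int main(void)
{
    // read the file into memory
    file<> fdoc("config.xml");
    // fdoc.data() gives the text as a string buffer;
    // parse it into the document object
    xml_document<> doc;
    doc.parse<0>(fdoc.data());

    // get the root node
    xml_node<> *root = doc.first_node();
    //if(root){} // a check that the node exists would go here
    //print_info(root->name());

    // get the root's first child, color
    xml_node<> *node1 = root->first_node();
    print_info(node1->name());

    // first child of node1
    xml_node<> *node11 = node1->first_node();
    print_info(node11->name());
    print_info(node11->value());

    // add one more color before saving
    node1->append_node(doc.allocate_node(rapidxml::node_element, "yellow", "false"));
    // remove a child of color: <alpha>true</alpha>
    node1->remove_node(node11->next_sibling("alpha"));
    // remove_all_nodes() removes every child of the node it is called on
    // (omitted here)

    std::string text;
    rapidxml::print(std::back_inserter(text), doc, 0);
    std::cout << text << std::endl;

    // write back to the file (an existing file is overwritten,
    // a missing one is created)
    std::ofstream out("config.xml", std::ios::out | std::ios::trunc);
    out << doc;
    out.close();
    return 0;
}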
The output is as follows:
color
red
true
<config>
<color>
<red>true</red>
<green>true</green>
<blue>true</blue>
<yellow>false</yellow>
</color>
<screen>
<wide>640</wide>
<length>480</length>
</screen>
<mode fullscreen="false">screen mode</mode>
</config>
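In the code above, doc.parse<0>(char_buffer) can fail no matter where the text comes from, whether it is your own char * buffer or the data() of file<> fdoc("config.xml"), so the parse-failure case should be handled:
std::string err;
try {
    // parsing is in-situ, so tmpbuf must outlive every use of doc,
    // not just the parse call itself
    doc.parse<0>((char*)tmpbuf);
} catch (const rapidxml::parse_error &e) {
    err = "parse xml error: ";
    err += e.what();
    delete []tmpbuf;
    // decide whether to pass the error up to the caller
}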
Supplement
Editing nodes:
Remove first, then add. Note that insert places the new node before the specified node.
xml_node<>* root = doc.first_node();
// remove the color node together with its children
xml_node<>* delnode = root->first_node("color");
root->remove_node(delnode);
// find the position to insert at
xml_node<>* lnode = root->first_node("screen");
// create the node
xml_node<>* mynode = doc.allocate_node(node_element, "address", "SH");
// insert the node before lnode
root->insert_node(lnode, mynode);
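Iterating over all child nodes of the current node:
// first_node() with no name starts at the first child of any name;
// pass a name to first_node() and next_sibling() to visit only matching children
for (rapidxml::xml_node<char> *node = parent_node->first_node();
     node != NULL;
     node = node->next_sibling()) {
    // do sth: name(), value()
}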
Iterating over all attributes of the current node:
// first_attribute() with no name starts at the first attribute of any name
for (rapidxml::xml_attribute<char> *attr = node->first_attribute();
     attr != NULL;
     attr = attr->next_attribute()) {
    // do sth
    char *value = attr->value();
}
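Wrap-up
A simple, small open-source library. It is not especially powerful, but it handles everyday parsing work without much trouble.
Open source has pitfalls; remember to fill them in.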