Technology: RapidXml

This post explains and summarizes hands-on use of RapidXml.

Introduction

The official website introduces it as follows:
RapidXml is an attempt to create the fastest XML parser possible, while retaining useability, portability and reasonable W3C compatibility. It is an in-situ parser written in modern C++, with parsing speed approaching that of strlen function executed on the same data.

Integration with your project will be trivial, because entire library is contained in a single header file, and requires no building or configuration.

The author of RapidXml is Marcin Kalicinski.

Main Content

Note one very important property of RapidXml: RapidXml is an in-situ parser, which allows it to achieve very high parsing speed.

In-situ means that parser does not make copies of strings. Instead, it places pointers to the source text in the DOM hierarchy.

In other words:
Nodes and attributes produced by RapidXml do not own their name and value strings. They merely hold the pointers to them.
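To make "in-situ" concrete, here is a toy sketch. It is not RapidXml's code, and the names toy_element and parse_simple_element are hypothetical. It parses a single `<name>value</name>` element from a mutable buffer without validating the closing tag. Like an in-situ parser, it never copies anything. It stores pointers into the buffer and writes '\0' terminators in place. This is why the source buffer must be writable and must outlive the parsed result.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical toy node: like RapidXml's nodes, it does not own its strings;
// it only points into the caller's buffer.
struct toy_element {
    const char *name = nullptr;
    const char *value = nullptr;
};

// Parses exactly one "<name>value</name>" element in place (closing tag not
// validated). Returns false on malformed input. The buffer IS modified:
// '\0' overwrites the '>' after the name and the '<' of the closing tag.
bool parse_simple_element(char *text, toy_element &out) {
    if (text[0] != '<') return false;
    char *name_begin = text + 1;
    char *name_end = std::strchr(name_begin, '>');
    if (!name_end) return false;
    char *value_begin = name_end + 1;
    char *value_end = std::strchr(value_begin, '<');
    if (!value_end) return false;
    *name_end = '\0';   // terminate the name in place, no copy
    *value_end = '\0';  // terminate the value in place, no copy
    out.name = name_begin;
    out.value = value_begin;
    return true;
}
```

Because the element only holds pointers, changing a character in the buffer afterwards also changes what `e.value` reads.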

Installation

The whole library is just 4 header files, so simply download it:

$ wget https://nchc.dl.sourceforge.net/project/rapidxml/rapidxml/rapidxml%201.13/rapidxml-1.13.zip
$ unzip rapidxml-1.13.zip
Archive: rapidxml-1.13.zip
creating: rapidxml-1.13/
inflating: rapidxml-1.13/license.txt
inflating: rapidxml-1.13/manual.html
inflating: rapidxml-1.13/rapidxml.hpp
inflating: rapidxml-1.13/rapidxml_iterators.hpp
inflating: rapidxml-1.13/rapidxml_print.hpp
inflating: rapidxml-1.13/rapidxml_utils.hpp
$ sudo mv rapidxml-1.13 /usr/local/include/rapidxml

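Looking at the source files, the code base is very small: fewer than 4000 lines in total. The main logic is concentrated in the template header rapidxml.hpp. Apart from general-purpose helper classes such as parse_error and memory_pool, the core logic is quite easy to follow.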

I won't analyze the source here and may add that later. The classes worth reading are mainly xml_base, xml_attribute, and the like.

Getting Started

You can start using it right away. This assumes you already know concepts such as DOM parsing, SAX parsing, node, root, attribute, and document.

Parsing

Parse a C-style (zero-terminated) string:

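using namespace rapidxml;
char text[] = "<foobar id=\"1\">hello</foobar>"; // must be mutable and zero-terminated: parse() modifies it in place
xml_document<> doc;    // character type defaults to char
doc.parse<0>(text);    // 0 means default parse flags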

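The empty angle brackets in xml_document<> mean the template parameter takes its default value (char). The template argument of parse, which is the set of parse flags, must be a compile-time constant expression. The 0 used here is the default.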
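Why must it be a compile-time constant? Because the flags are a template argument, each flag combination gets its own specialization. Tests such as `Flags & some_flag` then become constant expressions that the compiler can fold away. The following is a minimal sketch of that idea, not RapidXml's code. The names flag_trim, flag_upper, and process are hypothetical and are not RapidXml constants.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical flags, combined with | in the same way as RapidXml's parse_* constants.
const int flag_trim  = 0x1;  // strip leading/trailing spaces
const int flag_upper = 0x2;  // convert to upper case

// Flags is a template parameter, so (Flags & flag_trim) is known at compile
// time and each instantiation only keeps the work it actually needs.
template<int Flags>
std::string process(std::string s) {
    if (Flags & flag_trim) {
        std::size_t b = s.find_first_not_of(' ');
        std::size_t e = s.find_last_not_of(' ');
        s = (b == std::string::npos) ? std::string() : s.substr(b, e - b + 1);
    }
    if (Flags & flag_upper) {
        for (char &c : s)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}
```

For example, `process<flag_trim | flag_upper>(" ab ")` yields "AB". Writing `int f = 1; process<f>(...)` does not compile, because f is not a constant expression. The same restriction applies to `doc.parse<...>()`.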

doc is the root of the DOM tree. It represents the entire parsed XML, and it is also the memory pool that owns all nodes and attributes.

Reading

The DOM tree is read through the xml_node and xml_attribute classes. Boilerplate code:

cout << "Name of my first node is: " << doc.first_node()->name() << "\n";
xml_node<> *node = doc.first_node("foobar");
if (node) { // first_node() returns a null pointer if no such node exists
    cout << "Node foobar has value " << node->value() << "\n";
    for (xml_attribute<> *attr = node->first_attribute();
         attr; attr = attr->next_attribute())
    {
        cout << "Node foobar has attribute " << attr->name() << " ";
        cout << "with value " << attr->value() << "\n";
    }
}

Modifying

Nodes and attributes can be added or removed, and their contents changed. The example below creates an HTML document whose sole content is a link to the google.com website:

xml_document<> doc;
// node_element is the node type (see enum node_type)
xml_node<> *node = doc.allocate_node(node_element, "a", "Google");
doc.append_node(node);
xml_attribute<> *attr = doc.allocate_attribute("href", "google.com");
node->append_attribute(attr);

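Strings are again the issue. The code above uses string literals, so string lifetime is not a concern. If the strings assigned to a node or attribute are ordinary variables, however, you must ensure they remain valid for as long as the node or attribute is used. Nodes and attributes do not store the content itself; they only keep its address (a shallow copy).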

To guarantee the string's lifetime, the library's author recommends the following approach:

xml_document<> doc;
char *node_name = doc.allocate_string(name); // Allocate string and copy name into it
xml_node<> *node = doc.allocate_node(node_element, node_name); // Set node name to node_name

memory_pool::allocate_string() copies the string into the document's memory pool, so the copy lives exactly as long as doc. (As noted above, doc represents the entire XML as well as its memory pool.)
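The ownership idea behind allocate_string() can be illustrated with a toy pool. toy_pool, toy_attr, and make_attr are hypothetical stand-ins, not RapidXml's memory_pool, which is more elaborate. The pool copies the string into storage it owns. A node that only stores a pointer therefore stays valid after the caller's std::string is destroyed, and everything is freed together when the pool goes away.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a node/attribute: it stores a pointer only.
struct toy_attr {
    const char *value = nullptr;
};

// Hypothetical stand-in for memory_pool::allocate_string():
// copies the source into storage whose lifetime is the pool's lifetime.
class toy_pool {
public:
    char *allocate_string(const char *source) {
        std::size_t n = std::strlen(source) + 1;  // include the '\0'
        blocks_.push_back(std::unique_ptr<char[]>(new char[n]));
        std::memcpy(blocks_.back().get(), source, n);
        return blocks_.back().get();
    }
private:
    // One heap block per string; when the vector grows it moves the
    // unique_ptrs, not the characters, so returned pointers stay valid.
    std::vector<std::unique_ptr<char[]>> blocks_;
};

// Builds an attribute whose value comes from a temporary std::string.
toy_attr make_attr(toy_pool &pool, int width) {
    std::string tmp = std::to_string(width);      // destroyed on return
    toy_attr a;
    a.value = pool.allocate_string(tmp.c_str());  // safe: the pool owns the copy
    // a.value = tmp.c_str();                     // would dangle after return
    return a;
}
```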

Printing

Use the print() function or operator<< to print xml_document and xml_node objects (header rapidxml_print.hpp):

using namespace rapidxml;
xml_document<> doc; // character type defaults to char
// ... some code to fill the document

// Print to stream using operator <<
std::cout << doc;

// Print to stream using print function, specifying printing flags
print(std::cout, doc, 0); // 0 means default printing flags

// Print to string using output iterator
std::string s;
print(std::back_inserter(s), doc, 0);

// Print to memory buffer using output iterator
char buffer[4096]; // You are responsible for making the buffer large enough!
char *end = print(buffer, doc, 0); // end contains pointer to character after last printed character
*end = 0; // Add string terminator after XML

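One caveat: version 1.13, which I used, has a pitfall. Even with -std=c++11, code that uses print() fails to compile with errors in print_node(). The workaround is to move the definition of print_node further down in rapidxml_print.hpp.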

Details

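Here is a closer look at the API: classes, functions, constants, and so on. The author's naming is very clear and matches the style of a developer who moved from C to C++, which is nice.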

(There are only 4 headers, so I won't say which header each item lives in.)

Namespace rapidxml:

enum node_type
function parse_error_handler(const char *what, void *where)
function print(OutIt out, const xml_node< Ch > &node, int flags=0)
function print(std::basic_ostream< Ch > &out, const xml_node< Ch > &node, int flags=0)
function operator<<(std::basic_ostream< Ch > &out, const xml_node< Ch > &node)

constant parse_no_data_nodes
constant parse_no_element_values
constant parse_no_string_terminators
constant parse_no_entity_translation
constant parse_no_utf8
constant parse_declaration_node
constant parse_comment_nodes
constant parse_doctype_node
constant parse_pi_nodes
constant parse_validate_closing_tags
constant parse_trim_whitespace
constant parse_normalize_whitespace
constant parse_default
constant parse_non_destructive
constant parse_fastest
constant parse_full
constant print_no_indenting

A closer look at enum node_type:

node_pi
A PI node. Name contains target. Value contains instructions.
node_document
A document node. Name and value are empty.
node_element
An element node. Name contains element name. Value contains text of first data node.
node_data
A data node. Name is empty. Value contains data text.
node_cdata
A CDATA node. Name is empty. Value contains data text.
node_comment
A comment node. Name is empty. Value contains comment text.
node_declaration
A declaration node. Name and value are empty. Declaration parameters (version, encoding and standalone) are in node attributes.
node_doctype (usually replaced by node_pi in practice)
A DOCTYPE node. Name is empty. Value contains DOCTYPE text.

Class template rapidxml::memory_pool:

constructor memory_pool()
destructor ~memory_pool()
function allocate_node(node_type type, const Ch *name=0, const Ch *value=0, std::size_t name_size=0, std::size_t value_size=0)
function allocate_attribute(const Ch *name=0, const Ch *value=0, std::size_t name_size=0, std::size_t value_size=0)
function allocate_string(const Ch *source=0, std::size_t size=0)
function clone_node(const xml_node< Ch > *source, xml_node< Ch > *result=0)
function clear()
function set_allocator(alloc_func *af, free_func *ff)

rapidxml::parse_error, an exception class derived from std::exception:

constructor parse_error(const char *what, void *where)
function what() const
function where() const
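In RapidXml, where() is a member template, so the error position is obtained as `e.where<char>()`. Subtracting the start of the parsed buffer gives a byte offset. In-situ parsing may already have overwritten parts of the buffer with terminators or translated entities. It is therefore safer to turn that offset into a line and column using an untouched copy of the input. A minimal sketch of that conversion follows; line_col is a hypothetical helper, not part of RapidXml.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Converts a byte offset into a 1-based (line, column) pair by scanning the
// ORIGINAL, unmodified text for '\n'. Offsets past the end are clamped.
std::pair<int, int> line_col(const std::string &original, std::size_t offset) {
    if (offset > original.size()) offset = original.size();
    int line = 1, col = 1;
    for (std::size_t i = 0; i < offset; ++i) {
        if (original[i] == '\n') { ++line; col = 1; }
        else ++col;
    }
    return {line, col};
}
```

Usage, assuming `buffer` is the char array passed to parse and `copy` is a std::string copy taken before parsing: inside `catch (const rapidxml::parse_error &e)`, call `line_col(copy, e.where<char>() - buffer)`.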

Class template rapidxml::xml_document:

constructor xml_document()
function parse(Ch *text)
function clear()

Class template rapidxml::xml_node:

constructor xml_node(node_type type)
function type() const // a node can be one of several types (node_type)
function type(node_type type)

function document() const
function first_node(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const
function last_node(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const

function previous_sibling(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const
function next_sibling(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const

function first_attribute(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const
function last_attribute(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const

function prepend_node(xml_node< Ch > *child)
function append_node(xml_node< Ch > *child)
function insert_node(xml_node< Ch > *where, xml_node< Ch > *child)

function remove_first_node()
function remove_last_node()
function remove_node(xml_node< Ch > *where)
function remove_all_nodes()

function prepend_attribute(xml_attribute< Ch > *attribute)
function append_attribute(xml_attribute< Ch > *attribute)
function insert_attribute(xml_attribute< Ch > *where, xml_attribute< Ch > *attribute)

function remove_first_attribute()
function remove_last_attribute()
function remove_attribute(xml_attribute< Ch > *where)
function remove_all_attributes()

Note that the functions for manipulating attributes are also members of xml_node.

Class template rapidxml::xml_attribute:

constructor xml_attribute()
function document() const
function previous_attribute(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const
function next_attribute(const Ch *name=0, std::size_t name_size=0, bool case_sensitive=true) const

Class template rapidxml::xml_base:

constructor xml_base()
function name() const
function name_size() const
function value() const
function value_size() const
function name(const Ch *name, std::size_t size)
function name(const Ch *name)
function value(const Ch *value, std::size_t size)
function value(const Ch *value)
function parent() const

Examples

These roughly cover:

  • Creating an XML file
  • Writing, reading, and modifying it

Creating an XML file:

#include <fstream>
#include <iostream>
#include <iterator> // std::back_inserter
#include <string>
#include "rapidxml/rapidxml.hpp"
#include "rapidxml/rapidxml_utils.hpp"
#include "rapidxml/rapidxml_print.hpp"

using document = rapidxml::xml_document<>;
using node = rapidxml::xml_node<>;

using rapidxml::node_pi;
using rapidxml::node_element;

int main()
{
    document doc;
    // XML declaration, written here as a processing-instruction node
    // (node_declaration with version/encoding attributes is the dedicated type)
    node *first = doc.allocate_node(node_pi,
        doc.allocate_string("xml version='1.0' encoding='utf-8'"));
    doc.append_node(first);

    // this node is the root element
    node *root = doc.allocate_node(node_element, "config", nullptr);
    doc.append_node(root);

    // a node that has child nodes
    node *color = doc.allocate_node(node_element, "color", nullptr);
    root->append_node(color);

    color->append_node(doc.allocate_node(node_element, "red", "true"));
    color->append_node(doc.allocate_node(node_element, "green", "true"));
    color->append_node(doc.allocate_node(node_element, "blue", "true"));
    color->append_node(doc.allocate_node(node_element, "alpha", "true"));

    node *screen = doc.allocate_node(node_element, "screen", nullptr);
    screen->append_node(doc.allocate_node(node_element, "wide", "640"));
    screen->append_node(doc.allocate_node(node_element, "length", "480"));
    root->append_node(screen);

    // a node that has an attribute
    node *mode = doc.allocate_node(node_element, "mode", "screen mode");
    mode->append_attribute(doc.allocate_attribute("fullscreen", "false"));
    root->append_node(mode);

    // print to a string and show it
    std::string text;
    rapidxml::print(std::back_inserter(text), doc, 0);
    std::cout << text << std::endl;

    // write directly to a file stream
    std::ofstream out("config.xml");
    out << doc;
    out.close();

    return 0;
}

Compiling this on Linux, I unexpectedly ran into the following error:

/usr/local/include/rapidxml/rapidxml_print.hpp: In instantiation of ‘OutIt rapidxml::internal::print_node(OutIt, const rapidxml::xml_node<Ch>*, int, int) [with OutIt = std::back_insert_iterator<std::basic_string<char> >; Ch = char]’:
/usr/local/include/rapidxml/rapidxml_print.hpp:390:36: required from ‘OutIt rapidxml::print(OutIt, const rapidxml::xml_node<Ch>&, int) [with OutIt = std::back_insert_iterator<std::basic_string<char> >; Ch = char]’
main.cpp:52:51: required from here
/usr/local/include/rapidxml/rapidxml_print.hpp:115:37: error: ‘print_children’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]

In short, print_node is being instantiated, and at rapidxml_print.hpp:115:37 print_children has not been declared.

Looking at the source, the code starting at line 107 of rapidxml_print.hpp reads:

// Print node
template<class OutIt, class Ch>
inline OutIt print_node(OutIt out, const xml_node<Ch> *node, int flags, int indent)
{
    // Print proper node type
    switch (node->type())
    {

    // Document
    case node_document:
        out = print_children(out, node, flags, indent);
        break;

    // Element
    case node_element:
        out = print_element_node(out, node, flags, indent);
        break;

    // Data
    case node_data:
        out = print_data_node(out, node, flags, indent);
        break;

    // CDATA
    case node_cdata:
        out = print_cdata_node(out, node, flags, indent);
        break;

    // Declaration
    case node_declaration:
        out = print_declaration_node(out, node, flags, indent);
        break;

    // Comment
    case node_comment:
        out = print_comment_node(out, node, flags, indent);
        break;

    // Doctype
    case node_doctype:
        out = print_doctype_node(out, node, flags, indent);
        break;

    // Pi
    case node_pi:
        out = print_pi_node(out, node, flags, indent);
        break;

    // Unknown
    default:
        assert(0);
        break;
    }

    // If indenting not disabled, add line break after node
    if (!(flags & print_no_indenting))
        *out = Ch('\n'), ++out;

    // Return modified iterator
    return out;
}

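print_node calls the concrete printing functions such as print_children and print_element_node, yet it is defined before them, and none of them is forward-declared. A conforming compiler therefore cannot find those functions when the template is instantiated, which produces the error above.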

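I modified the source by moving the print_node definition below the other print functions and placing the following forward declaration before print_children. After that, it compiles: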

template<class OutIt, class Ch>
inline OutIt print_node(OutIt out, const xml_node<Ch> *node, int flags, int indent);
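Why does this only break on newer compilers? print_node is a template, and its call to print_children is unqualified and depends on template parameters. At the point of instantiation, the compiler may only add functions found by argument-dependent lookup (ADL). ADL searches the namespaces of the arguments: std for the output iterator and rapidxml for xml_node. It never searches rapidxml::internal, where print_children lives. Older GCC versions accepted the code anyway, but GCC 4.7 and later enforce this rule, hence the `-fpermissive` hint. The self-contained sketch below reproduces the same layout with hypothetical names (lib, node, print_node, print_children). It shows that a forward declaration placed before the caller is enough.

```cpp
#include <cassert>
#include <iterator>
#include <string>

namespace lib {

template<class Ch>
struct node {                       // hypothetical stand-in for xml_node
    std::basic_string<Ch> text;
};

namespace internal {

// Forward declaration. Without it, a conforming compiler (e.g. GCC >= 4.7)
// rejects print_node: ordinary lookup at the definition finds nothing, and
// ADL only searches std (the iterator) and lib (node), not lib::internal.
template<class OutIt, class Ch>
OutIt print_children(OutIt out, const node<Ch> *n);

template<class OutIt, class Ch>
OutIt print_node(OutIt out, const node<Ch> *n) {
    out = print_children(out, n);   // resolved via the declaration above
    *out = Ch('\n');
    ++out;
    return out;
}

// The definition may now come after its caller.
template<class OutIt, class Ch>
OutIt print_children(OutIt out, const node<Ch> *n) {
    for (Ch c : n->text) {
        *out = c;
        ++out;
    }
    return out;
}

}  // namespace internal
}  // namespace lib
```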

The output looks good:

<?xml version='1.0' encoding='utf-8' ?>
<config>
	<color>
		<red>true</red>
		<green>true</green>
		<blue>true</blue>
		<alpha>true</alpha>
	</color>
	<screen>
		<wide>640</wide>
		<length>480</length>
	</screen>
	<mode fullscreen="false">screen mode</mode>
</config>

config.xml is generated as well.

Next, read that file back and modify its contents:

#include <fstream>
#include <iostream>
#include <iterator> // std::back_inserter
#include <string>

#include "rapidxml/rapidxml.hpp"
#include "rapidxml/rapidxml_utils.hpp"
#include "rapidxml/rapidxml_print.hpp"

using namespace rapidxml;

void print_info(const char *str)
{
    std::cout << str << std::endl;
}

int main()
{
    // read the file into memory; fdoc.data() returns the zero-terminated buffer
    file<> fdoc("config.xml");

    // parse it into a document (in-situ: doc points into fdoc's buffer,
    // so fdoc must outlive doc)
    xml_document<> doc;
    doc.parse<0>(fdoc.data());

    // get the root element
    xml_node<> *root = doc.first_node();
    // if (root) { ... }  // you may check that the node exists
    // print_info(root->name());

    // the first child of the root: color
    xml_node<> *node1 = root->first_node();
    print_info(node1->name());
    // the first child of node1
    xml_node<> *node11 = node1->first_node();
    print_info(node11->name());
    print_info(node11->value());

    // add a color, to be saved below
    node1->append_node(doc.allocate_node(rapidxml::node_element, "yellow", "false"));
    // remove color's child <alpha>true</alpha>
    node1->remove_node(node11->next_sibling("alpha"));

    // root->remove_all_nodes() would remove all children of root (omitted)

    std::string text;
    rapidxml::print(std::back_inserter(text), doc, 0);
    std::cout << text << std::endl;

    // write back to the file: trunc overwrites an existing file, otherwise a new one is created
    std::ofstream out("config.xml", std::ios::out | std::ios::trunc);
    out << doc;
    out.close();

    return 0;
}

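The output is shown below. The <?xml ...?> declaration no longer appears. With flags 0, the parser skips the XML declaration; pass parse_declaration_node to keep it as a node.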

color
red
true
<config>
	<color>
		<red>true</red>
		<green>true</green>
		<blue>true</blue>
		<yellow>false</yellow>
	</color>
	<screen>
		<wide>640</wide>
		<length>480</length>
	</screen>
	<mode fullscreen="false">screen mode</mode>
</config>

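Whatever buffer you pass to doc.parse<0>(), you should handle parse failures. This applies whether the buffer is your own char * or the data() of file<> fdoc("config.xml"). By default, parse() reports errors by throwing rapidxml::parse_error: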

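std::string err;
try {
    // In-situ parsing: tmpbuf must stay alive for as long as doc is used,
    // not just until parse() returns.
    doc.parse<0>((char*)tmpbuf);
} catch (const rapidxml::parse_error &e) {
    err = "parse xml error: ";
    err += e.what();
    delete[] tmpbuf;  // parsing failed, so the buffer is no longer needed

    // decide whether to rethrow or return err so the caller can handle it
}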
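Managing tmpbuf by hand with new[]/delete[] is easy to get wrong. The buffer handed to parse must be mutable and zero-terminated, and it must outlive doc along with every node pointer taken from it. file<> from rapidxml_utils.hpp, used in the reading example above, handles this for you. If you manage the buffer yourself, a std::vector<char> removes the manual delete[]. A minimal sketch follows; load_zero_terminated is a hypothetical helper name.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Reads a whole file into a mutable buffer and appends '\0', which is the
// shape an in-situ parser such as doc.parse<0>() expects. The caller must
// keep the returned vector alive while the parsed document is in use,
// because nodes point into it.
std::vector<char> load_zero_terminated(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
    buf.push_back('\0');
    return buf;
}
```

With RapidXml, this would be used as `std::vector<char> buf = load_zero_terminated("config.xml"); doc.parse<0>(buf.data());`, keeping buf in a scope at least as long as doc.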

Addendum

Editing nodes:
Remove first, then add. Note that insert_node() inserts the new node before the specified node.

xml_node<>* root = doc.first_node();

// remove the color node together with its children
xml_node<>* delnode = root->first_node("color");
root->remove_node(delnode);

// find the position to insert at
xml_node<>* lnode = root->first_node("screen");
// create the new node
xml_node<>* mynode = doc.allocate_node(node_element, "address", "SH");

// insert mynode before lnode (i.e. before <screen>)
root->insert_node(lnode, mynode);
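insert_node(where, child) places child immediately before where. According to the RapidXml manual, passing 0 as where inserts at the back. The pointer bookkeeping behind "insert before" can be sketched with a hypothetical toy_node that keeps parent, first/last child, and previous/next sibling links. This only illustrates the idea and is not RapidXml's implementation.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical toy node with parent, first/last child and prev/next sibling links.
struct toy_node {
    std::string name;
    toy_node *parent = nullptr;
    toy_node *first = nullptr, *last = nullptr;
    toy_node *prev = nullptr, *next = nullptr;

    explicit toy_node(std::string n) : name(std::move(n)) {}

    void append(toy_node *child) { insert_before(nullptr, child); }

    // Inserts child before `where`; a null `where` appends at the end.
    void insert_before(toy_node *where, toy_node *child) {
        child->parent = this;
        child->next = where;
        child->prev = where ? where->prev : last;
        if (child->prev) child->prev->next = child; else first = child;
        if (where) where->prev = child; else last = child;
    }
};

// Concatenates child names in sibling order, e.g. "a,b,c".
std::string child_names(const toy_node &parent) {
    std::string s;
    for (const toy_node *c = parent.first; c; c = c->next) {
        if (!s.empty()) s += ',';
        s += c->name;
    }
    return s;
}
```

This mirrors the example above: with children screen and mode, inserting address before screen gives the order address, screen, mode.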

Iterate over all children of the current node:

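// first_node() without a name returns the first child of any name.
// To visit only children named "x", use first_node("x") and next_sibling("x").
for (rapidxml::xml_node<char> *node = parent_node->first_node();
     node != NULL; node = node->next_sibling()) {
    // do something: node->name(), node->value()
}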

Iterate over all attributes of the current node:

// first_attribute() without a name returns the first attribute of any name
for (rapidxml::xml_attribute<char> *attr = node->first_attribute();
     attr != NULL; attr = attr->next_attribute()) {
    // do something, e.g. read the value
    char *value = attr->value();
}
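The first_node()/next_sibling() loop shown earlier in this section only visits direct children. To visit every descendant, recurse into each child with the same loop. In RapidXml terms, that is roughly `for (xml_node<> *c = n->first_node(); c; c = c->next_sibling()) walk(c);`. The sketch below uses a hypothetical tree_node type with first_child/next_sibling pointers so that it compiles without the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node exposing only what the traversal needs,
// analogous to first_node() and next_sibling() on xml_node.
struct tree_node {
    std::string name;
    tree_node *first_child = nullptr;
    tree_node *next_sibling = nullptr;
    explicit tree_node(std::string n) : name(std::move(n)) {}
};

// Depth-first, pre-order walk: record the node (indented by depth),
// then walk each child in sibling order.
void walk(const tree_node *n, int depth, std::vector<std::string> &out) {
    out.push_back(std::string(static_cast<std::size_t>(depth) * 2, ' ') + n->name);
    for (const tree_node *c = n->first_child; c; c = c->next_sibling)
        walk(c, depth + 1, out);
}
```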


Wrap-up

RapidXml is a small, simple open-source library. It is not very feature-rich, but it handles everyday parsing work without much trouble.

Open-source code has pitfalls; when you hit one, remember to fix it.

References

  1. RAPIDXML Manual