2016-06-22

技术: C++ 迭代器失效问题

针对顺序容器的迭代器失效问题, 很长一段事件困扰我, 现在集中说一下, 汇总一下

基本认识

一般就发生在容器(vector, string, deque)中insert/erase等操作后, iterators,pointers,references失效.

C++ primer 原话:

当处理vector,string,deque时，当在一个循环中可能增加或移除元素时，要考虑到迭代器可能会失效的问题.

我个人觉得, 需要注意迭代器失效的真正原因是, 如果你使用失效前的迭代器, 其行为未定义, 简单说,

就怕你store了旧的已经失效的迭代器, 并且拿出来用了.

其实主要注意一下增 & 删 方法, 一旦容器失效手动更新一下. 即重新获取一下即可.

下面说说简单的注意事项.

erase/push_back

例如《C++ Primier》有个例子是这么写的:

int arr[]={0,1,2,3,4,5,6,7,8,9,10}; 
vector<int> a(arr,arr+sizeof(arr)/sizeof(*arr));
for (auto it = a.begin(); it != a.end();){
	if ((*it)&1){
		it=a.erase(it);
	}
	else
		++it;    
}

也就是说, erase之后, 迭代器指针会指向元素的下一个位置(相当于自动refresh了).

但是如果这样写, 例如:

int arr[]={0,1,2,3,4,5,6,7,8,9,10}; 
vector<int> a(arr,arr+sizeof(arr)/sizeof(*arr));

for (auto it = a.begin(); it != a.end();++it ){
	if ((*it)&1){
		a.erase(it);
	}
}

小结:

指针是与内存绑定的，而迭代器是与容器里的元素绑定的，删除了之后，该迭代器就失效了，不能再访问此迭代器
并且删除后, 容器一般是自动更新了迭代器, 例如本例中的自动指向了一下元素.

insert

使用range for的时候, 强调过, 在遍历的时候, 最好不要改变容器的属性.
比如, vector, 由于你遍历途中插入一个元素, ok, capacity翻倍了, 然后迭代器全部失效了, 后面的全部不能再遍历了.

具体的例子:

#include <iostream>
#include <vector>

using namespace std;

int main(void)
{
    vector<int> a;
    for (int i = 0; i < 10; ++i) //一会儿把容量修改成16
    {
        a.push_back(i);
    }

    for (vector<int>::iterator it = a.begin(); it != a.end(); ++it){
        if (*it == 5){ 
            a.insert(it, 100);
            ++it;
        }
    } 
    
    cout << a.capacity() << endl; //16

    for(auto &elem : a)
    {
        cout << elem << " ";        
    }
    cout << endl;

    return 0;
}

因为insert之后指向当前insert元素, 所以要手动++一次, 程序运行结果都不错:

1 2	16 0 1 2 3 4 100 5 6 7 8 9

表面上看上面的程序是没有什么问题的, 但是稍微改一下就知道问题了

#include <iostream>
#include <vector>

using namespace std;

int main(void)
{
    vector<int> a;
    for (int i = 0; i < 16; ++i) //一会儿把容量修改成16
    {
        a.push_back(i);
    }

    for (vector<int>::iterator it = a.begin(); it != a.end(); ++it){
        if (*it == 5){ 
            a.insert(it, 100); //容器此时容量翻倍所有迭代器失效

            ++it;
            cout << " now " << *it << endl; 
        }
    }

    cout << a.capacity() << endl; //16

    for(auto &elem : a)
    {
        cout << elem << " ";        
    }
    cout << endl;

    return 0;
}

运行结果如下:

 now 6
 now 5
32
0 1 2 3 4 100 100 5 6 7 8 9 10 11 12 13 14 15

容器容量翻倍之前, 即使insert也可以继续使用后面的迭代器; 但是翻倍之后, 再继续使用原来的迭代器(不更新)那么就会造成奇怪的后果(当然也可以进行分析, 没必要).

一般情况下, insert之后容器后面的迭代器都失效了(如果你之前存储了之后的迭代器就不要在用了), 甚至可能全部失效(比如 vector capacity翻倍的情况).幸运的是, 容器也会内部调整迭代器, 所以这个时候重新获取即可.

参考代码如下:

#include <iostream>
#include <vector>

using namespace std;

int main(void)
{
    vector<int> a;
    for (int i = 0; i < 16; ++i) //一会儿把容量修改成16
    {
        a.push_back(i);
    }

    for (vector<int>::iterator it = a.begin(); it != a.end(); ++it){
        if (*it == 5){ 
            it = a.insert(it, 100); //重新获取, 手动更新

            ++it;
            cout << " now " << *it << endl; 
        }
    } 
    
    cout << a.capacity() << endl; //16

    for(auto &elem : a)
    {
        cout << elem << " ";        
    }
    cout << endl;

    return 0;
}

第一段代码, 没有重新获取, 为什么还能实现, 我只能说编译器可能做了其他的辅助动作, 因为按照标准:

Causes reallocation if the new size() is greater than the old capacity(). 
If the new size() is greater than capacity(), all iterators and references are invalidated. 
Otherwise, only the iterators and references before the insertion point remain valid. 

The past-the-end iterator is also invalidated.

总结

网上有一堆总结, 我看的苦笑不得, 有时候你去实验一下就知道谁说错了(多试验几个平台, 编译运行环境).

为啥要实验?

的确, 了解容器内部数据结构, 然后分析容器存储新旧元素内存状态, 发现改变部分, 即可分析是否失效, 但是不要忽略一个问题, 那就是编译器做了哪些 under table 的小动作.

真心不敢猜测编译器背后在搞什么鬼, 我通常实验一下, gdb观察一下

我不再多说, 给出一个经典的总结:

除了链式容器list, 以及关联容器(红黑树)set, map;
只要是顺序存储的比如string, vector, array, deque

凡是涉及到增, 删操作的, 不管怎么样, 都手动更新一下迭代器(重新获取一下), 确保安全.

为啥链式和红黑树不用管, 因为原来的空间根本没有发生变化, 也就不存在迭代器失效一说.

本文标题:技术: C++ 迭代器失效问题

文章作者:Merlin

发布时间:2016-06-22, 10:50:22

最后更新:2018-04-12, 14:55:17

原始链接:http://www.merlinblog.site/posts/24da8a94/

许可协议: "署名-非商用-相同方式共享 4.0" 转载请保留原文链接及作者。