Is std::vector so much slower than plain arrays?
I've always gone by the general wisdom that std::vector is "implemented as an array", blah blah blah. Today I went and tested it, and it seems that is not the case:
Here are some test results:
UseArray completed in 2.619 seconds
UseVector completed in 9.284 seconds
UseVectorPushBack completed in 14.669 seconds
The whole thing completed in 26.591 seconds
That's about 3 to 4 times slower! It doesn't really justify the "vector may be slower for a few nanoseconds" comments.
And the code I used:
#include <cstdlib>
#include <vector>

#include <iostream>
#include <string>

#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>

class TestTimer
{
    public:
        TestTimer(const std::string & name) : name(name),
            start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
        {
        }

        ~TestTimer()
        {
            using namespace std;
            using namespace boost;

            posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
            posix_time::time_duration d = now - start;

            cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
                " seconds" << endl;
        }

    private:
        std::string name;
        boost::posix_time::ptime start;
};

struct Pixel
{
    Pixel()
    {
    }

    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
    {
    }

    unsigned char r, g, b;
};

void UseVector()
{
    TestTimer t("UseVector");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.resize(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

void UseVectorPushBack()
{
    TestTimer t("UseVectorPushBack");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.reserve(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
            pixels.push_back(Pixel(255, 0, 0));
    }
}

void UseArray()
{
    TestTimer t("UseArray");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);

        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }

        free(pixels);
    }
}

int main()
{
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();
    UseVectorPushBack();

    return 0;
}
Am I doing something wrong? Or have I just busted this performance myth?
I'm using Release mode in Visual Studio 2005.
In Visual C++, #define _SECURE_SCL 0 cuts the UseVector time in half (bringing it down to 4 seconds). That is really huge, IMO.
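For reference, a minimal sketch of how that macro is typically applied on that era of MSVC (this is an assumption of mine, not part of the question): it has to be defined before any standard library header is included, or passed on the compiler command line as /D_SECURE_SCL=0.

// Sketch: disable checked iterators (VC++ 2005/2008 era).
// The define must appear before every standard library include.
#define _SECURE_SCL 0
#include <vector>
#include <string>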
Using the following:
g++ -O3 Time.cpp -I <MyBoost>
./a.out
UseArray completed in 2.196 seconds
UseVector completed in 4.412 seconds
UseVectorPushBack completed in 8.017 seconds
The whole thing completed in 14.626 seconds
So the array is twice as quick as the vector.
But after looking at the code in more detail this is expected: you run across the vector twice and the array only once. Note: when you resize() the vector you are not only allocating the memory but also running through the vector and calling the constructor on each member.
Re-arranging the code slightly so that the vector initializes each object only once:
std::vector<Pixel> pixels(dimensions * dimensions, Pixel(255,0,0));
Now doing the same timing again:
g++ -O3 Time.cpp -I <MyBoost>
./a.out
UseVector completed in 2.216 seconds
The vector now performs only slightly worse than the array. IMO this difference is insignificant and could be caused by a whole bunch of things not associated with the test.
I would also take into account that you are not correctly initializing/destroying the Pixel objects in the UseArrray() method, since neither constructor nor destructor is called (this may not be an issue for this simple class, but anything slightly more complex, i.e. with pointers or members with pointers, will cause problems).
Great question. I came in here expecting to find some simple fix that would speed the vector tests right up. That didn't work out quite like I expected!
Optimization helps, but it's not enough. With optimization on I'm still seeing a 2x performance difference between UseArray and UseVector. Interestingly, UseVector was significantly slower than UseVectorPushBack without optimization.
# g++ -Wall -Wextra -pedantic -o vector vector.cpp
# ./vector
UseArray completed in 20.68 seconds
UseVector completed in 120.509 seconds
UseVectorPushBack completed in 37.654 seconds
The whole thing completed in 178.845 seconds

# g++ -Wall -Wextra -pedantic -O3 -o vector vector.cpp
# ./vector
UseArray completed in 3.09 seconds
UseVector completed in 6.09 seconds
UseVectorPushBack completed in 9.847 seconds
The whole thing completed in 19.028 seconds
Idea #1 - Use new[] instead of malloc
I tried changing malloc() to new[] in UseArray so that the objects would get constructed, changing from individual field assignment to assigning a Pixel instance, and, oh, renaming the inner loop variable to j.
void UseArray()
{
    TestTimer t("UseArray");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        // Same speed as malloc().
        Pixel * pixels = new Pixel[dimension * dimension];

        for(int j = 0 ; j < dimension * dimension; ++j)
            pixels[j] = Pixel(255, 0, 0);

        delete[] pixels;
    }
}
Surprisingly (to me), none of those changes made any difference. Not even the change to new[], which default constructs all of the Pixels. It seems that gcc can optimize out the default constructor calls when using new[], but not when using vector.
Idea #2 - Remove repeated operator[] calls
I also attempted to get rid of the triple operator[] lookup and cache a reference to pixels[j]. This actually slowed UseVector down! Oops.
for(int j = 0; j < dimension * dimension; ++j)
{
    // Slower than accessing pixels[j] three times.
    Pixel &pixel = pixels[j];
    pixel.r = 255;
    pixel.g = 0;
    pixel.b = 0;
}

# ./vector
UseArray completed in 3.226 seconds
UseVector completed in 7.54 seconds
UseVectorPushBack completed in 9.859 seconds
The whole thing completed in 20.626 seconds
Idea #3 - Remove the constructors
What about removing the constructors entirely? Then perhaps gcc can optimize out the construction of all of the objects when the vector is created. What happens if we change Pixel to:
struct Pixel
{
    unsigned char r, g, b;
};
Result: about 10% faster. Still slower than an array. Hm.
# ./vector
UseArray completed in 3.239 seconds
UseVector completed in 5.567 seconds
Idea #4 - Use an iterator instead of a loop index
How about using a vector<Pixel>::iterator instead of a loop index?
for (std::vector<Pixel>::iterator j = pixels.begin(); j != pixels.end(); ++j)
{
    j->r = 255;
    j->g = 0;
    j->b = 0;
}
Result:
# ./vector
UseArray completed in 3.264 seconds
UseVector completed in 5.443 seconds
Nope, no different. At least it's not slower. I thought this would perform similarly to #2, where I used a Pixel& reference.
Conclusion
Even if some smart cookie figures out how to make the vector loop as fast as the array one, this does not speak well of the default behavior of std::vector. So much for the compiler being smart enough to optimize out all the C++-ness and make STL containers as fast as raw arrays.
The bottom line is that the compiler is unable to optimize away the no-op default constructor calls when using std::vector. If you use plain new[] it optimizes them away just fine. But not with std::vector. Even if you can rewrite your code to eliminate the constructor calls, that flies in the face of the mantra around here: "The compiler is smarter than you. The STL is just as fast as plain C. Don't worry about it."
To be fair, you cannot compare a C++ implementation to a C implementation, which is what I would call your malloc version. malloc does not create objects - it only allocates raw memory. Treating that memory as objects without calling the constructor is poor C++ (possibly invalid - I'll leave that to the language lawyers).
That said, simply changing the malloc to new Pixel[dimensions*dimensions] and the free to delete [] pixels does not make much difference with the simple implementation of Pixel that you have. Here are the results on my box (E6600, 64-bit):
UseArray completed in 0.269 seconds
UseVector completed in 1.665 seconds
UseVectorPushBack completed in 7.309 seconds
The whole thing completed in 9.244 seconds
But with a slight change, the tables turn:
Pixel.h
struct Pixel
{
    Pixel();
    Pixel(unsigned char r, unsigned char g, unsigned char b);

    unsigned char r, g, b;
};
Pixel.cc
#include "Pixel.h" Pixel::Pixel() {} Pixel::Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b) {}
main.cc
#include "Pixel.h" [rest of test harness without class Pixel] [UseArray now uses new/delete not malloc/free]
Compiled as:
$ g++ -O3 -c -o Pixel.o Pixel.cc
$ g++ -O3 -c -o main.o main.cc
$ g++ -o main main.o Pixel.o
we get very different results:
UseArray completed in 2.78 seconds
UseVector completed in 1.651 seconds
UseVectorPushBack completed in 7.826 seconds
The whole thing completed in 12.258 seconds
With a non-inlined constructor for Pixel, std::vector now beats the raw array.
It appears that the complexity of allocation through std::vector and std::allocator is too much to be optimised as effectively as a simple new Pixel[n]. However, we can see that the problem lies simply with the allocation, not the vector access, by tweaking a couple of the test functions to create the vector/array once by moving it outside the loop:
void UseVector()
{
    TestTimer t("UseVector");

    int dimension = 999;
    std::vector<Pixel> pixels;
    pixels.resize(dimension * dimension);

    for(int i = 0; i < 1000; ++i)
    {
        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}
and
void UseArray()
{
    TestTimer t("UseArray");

    int dimension = 999;
    Pixel * pixels = new Pixel[dimension * dimension];

    for(int i = 0; i < 1000; ++i)
    {
        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }

    delete [] pixels;
}
We now get these results:
UseArray completed in 0.254 seconds
UseVector completed in 0.249 seconds
UseVectorPushBack completed in 7.298 seconds
The whole thing completed in 7.802 seconds
What we can learn from this is that std::vector is comparable to a raw array for access, but if you need to create and delete the vector/array many times, creating a complex object is more time-consuming than creating a simple array when the element's constructor is not inlined. I don't think that this is very surprising.
This is an old and popular question.
At this point, many programmers will be working in C++11. And in C++11 the OP's code as written runs equally fast for UseArray and UseVector.
UseVector completed in 3.74482 seconds
UseArray completed in 3.70414 seconds
The basic problem was that, while your Pixel structure is uninitialized, std::vector<T>::resize( size_t, T const&=T() ) takes a default constructed Pixel and copies it. The compiler did not notice that it was being asked to copy uninitialized data, so it actually performed the copy.
In C++11, std::vector<T>::resize has two overloads. The first is std::vector<T>::resize(size_t), the other is std::vector<T>::resize(size_t, T const&). This means that when you invoke resize without a second argument, it simply default constructs the elements, and the compiler is smart enough to realize that default construction does nothing, so it skips the pass over the buffer.
(The two overloads were added to handle movable, constructible and non-copyable types - the performance improvement when working on uninitialized data is a bonus.)
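A minimal sketch of the difference just described (n is simply assumed to hold the element count; the names are illustrative, not from the answer):

std::vector<Pixel> a, b;
a.resize(n);                     // C++11 overload (1): default-constructs n elements;
                                 // a no-op default ctor lets the compiler skip the pass over the buffer
b.resize(n, Pixel(255, 0, 0));   // overload (2): copy-constructs every new element from the given value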
The push_back solution also performs fencepost checking, which slows it down, so it remains slower than the malloc version.
Live example (I also replaced the timer with chrono::high_resolution_clock).
Note that if you have a structure that usually requires initialization, but you want to handle it yourself after growing your buffer, you can do this with a custom std::vector allocator. If you then want to move it into a more normal std::vector, I believe careful use of allocator_traits and overriding of == might pull that off, but I am not sure.
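As a rough illustration of the kind of allocator being hinted at (entirely my own sketch, not code from the answer), the idea is to override construct() so that no-argument construction performs default-initialization rather than value-initialization, leaving trivially-constructible elements untouched when the vector grows:

#include <memory>
#include <utility>
#include <vector>

// Sketch: an allocator whose no-argument construct() default-initializes
// instead of value-initializing, so resize() does not zero POD-like elements.
template <typename T, typename A = std::allocator<T>>
struct default_init_allocator : A
{
    using A::A;

    template <typename U> struct rebind {
        using other = default_init_allocator<
            U, typename std::allocator_traits<A>::template rebind_alloc<U>>;
    };

    template <typename U>
    void construct(U* p) { ::new (static_cast<void*>(p)) U; }   // default-init: no zeroing

    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        std::allocator_traits<A>::construct(static_cast<A&>(*this), p,
                                            std::forward<Args>(args)...);
    }
};

// Usage sketch: elements are left uninitialized on resize, much like new Pixel[n]:
//   std::vector<Pixel, default_init_allocator<Pixel>> pixels;
//   pixels.resize(dimension * dimension);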
Try this:
void UseVectorCtor()
{
    TestTimer t("UseConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;
        std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));
    }
}
I get almost exactly the same performance as with the array.
The thing about vector is that it is a much more general tool than an array. And that means you have to consider how you use it. It can be used in a lot of different ways, providing functionality that an array doesn't even have. If you use it the "wrong" way for your purpose, you incur a lot of overhead, but if you use it correctly, it is usually basically a zero-overhead data structure. In this case, the problem is that you separately initialized the vector (causing all elements to get their default value) and then separately overwrote each element with the correct value. That is much harder for the compiler to optimize than when you do the same thing with an array. Which is why the vector provides a constructor that lets you do exactly that: initialize N elements with the value X.
And when you use that, the vector is just as fast as an array.
So no, you haven't busted the performance myth. But you have shown that it's only true if you use the vector optimally, which is a pretty good point too. 🙂
On the bright side, it's really the simplest usage that turns out to be fastest. If you contrast my code snippet (a single line) with John Kugelman's answer, which contains heaps of tweaking and optimizing and still doesn't quite eliminate the performance difference, it's pretty clear that vector is pretty cleverly designed. You don't have to jump through hoops to get speed on par with an array. On the contrary, you have to use the simplest possible solution.
When I first looked at your code, I didn't think it was a fair comparison; I definitely thought you weren't comparing apples with apples. So I thought: let's get constructors and destructors called in all the tests, and then compare.
const size_t dimension = 1000;

void UseArray()
{
    TestTimer t("UseArray");

    for(size_t j = 0; j < dimension; ++j)
    {
        Pixel* pixels = new Pixel[dimension * dimension];

        for(size_t i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }

        delete[] pixels;
    }
}

void UseVector()
{
    TestTimer t("UseVector");

    for(size_t j = 0; j < dimension; ++j)
    {
        std::vector<Pixel> pixels(dimension * dimension);

        for(size_t i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
    }
}

int main()
{
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();

    return 0;
}
My thinking was that with this setup they should be exactly the same. It turns out I was wrong.
UseArray completed in 3.06 seconds
UseVector completed in 4.087 seconds
The whole thing completed in 10.14 seconds
So why did this 30% performance loss even occur? The STL has everything in headers, so it should have been possible for the compiler to understand everything that was required.
My thought was that it lies in how the loop initialises all the values to the default constructor. So I performed a test:
class Tester
{
public:
    static int count;
    static int count2;

    Tester() { count++; }
    Tester(const Tester&) { count2++; }
};

int Tester::count = 0;
int Tester::count2 = 0;

int main()
{
    std::vector<Tester> myvec(300);

    printf("Default Constructed: %i\nCopy Constructed: %i\n",
           Tester::count, Tester::count2);

    return 0;
}
The results were as I suspected:
Default Constructed: 1
Copy Constructed: 300
This is clearly the source of the slowdown: the vector uses the copy constructor to initialise the elements from a default constructed object.
This means that the following pseudo-operation order happens during construction of the vector:
Pixel pixel;
for (auto i = 0; i < N; ++i)
    vector[i] = pixel;
Which, due to the implicit copy constructor generated by the compiler, is expanded to the following:
Pixel pixel;
for (auto i = 0; i < N; ++i)
{
    vector[i].r = pixel.r;
    vector[i].g = pixel.g;
    vector[i].b = pixel.b;
}
So the default Pixel remains uninitialised, while the rest are initialised with the default Pixel's uninitialised values.
Compared to the alternative situation with New[] / Delete[]:
int main()
{
    Tester* myvec = new Tester[300];

    printf("Default Constructed: %i\nCopy Constructed:%i\n",
           Tester::count, Tester::count2);

    delete[] myvec;

    return 0;
}

Default Constructed: 300
Copy Constructed: 0
They are all left with their uninitialised values, and without the double iteration over the sequence.
Armed with this information, how can we test it? Let's try overwriting the implicit copy constructor:
Pixel(const Pixel&) {}
And the results?
UseArray completed in 2.617 seconds
UseVector completed in 2.682 seconds
The whole thing completed in 5.301 seconds
So in summary: if you are making hundreds of vectors very often, re-think your algorithm.
In any case, the STL implementation isn't slower for some unknown reason; it just does exactly what you ask of it, in the hope that you know better.
Try disabling checked iterators and building in release mode. You shouldn't see much of a performance difference.
GNU's STL (and others), given vector<T>(n), default constructs a prototype object T() - the compiler will optimise away the empty constructor - but then a copy of whatever garbage happens to be in the memory addresses now reserved for that object is taken by the STL's __uninitialized_fill_n_aux, which loops populating copies of that object as the default values in the vector. So, "my" STL is not looping-constructing, but constructing then loop-copying. It's counter-intuitive, but I should have remembered, as I commented on a recent stackoverflow question about this very point: the construct-then-copy can be more efficient for reference-counted objects etc.
So:
vector<T> x(n);
or
vector<T> x;
x.resize(n);
is - on many STL implementations - something like:
T temp;
for (int i = 0; i < n; ++i)
    x[i] = temp;
The issue is that the current generation of compiler optimisers don't seem to work from the insight that temp is uninitialised garbage, and fail to optimise out the loop and the default copy constructor invocations. You could credibly argue that compilers absolutely shouldn't optimise this away, as a programmer writing the above has a reasonable expectation that all the objects will be identical after the loop, even if garbage (the usual caveats about 'identical'/operator== vs memcmp/operator= etc. apply). The compiler can't be expected to have any extra insight into the larger context of std::vector<>, or into the later use of the data, that would suggest this optimisation is safe.
This can be contrasted with the more obvious, direct implementation:
for (int i = 0; i < n; ++i)
    x[i] = T();
which we can expect a compiler to optimise out.
To be a bit clearer about the justification for this aspect of vector's behaviour, consider:
std::vector<big_reference_counted_object> x(10000);
Clearly it makes a major difference whether we make 10000 independent objects or 10000 objects all referencing the same data. There is a reasonable argument that the benefit of protecting casual C++ users from accidentally doing something that expensive outweighs the real-world cost of hard-to-optimise copy construction.
Original answer (for reference / making sense of the comments): no chance. A vector is as fast as an array, at least if you reserve space sensibly. …
Martin York's answer bothers me because it seems like an attempt to brush the initialisation problem under the carpet. But he is right to identify redundant default construction as the source of the performance problem.
[EDIT: Martin's answer no longer suggests changing the default constructor.]
For the immediate problem at hand, you can certainly call the 2-parameter version of the vector<Pixel> ctor:
std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));
That works if you want to initialise with a constant value, which is a common case. But the more general problem is: how can you efficiently initialise with something more complicated than a constant value?
For this you can use a back_insert_iterator, which is an iterator adaptor. Here's an example with a vector of ints, although the general idea works just as well for Pixels:
#include <iterator>

// Simple functor returning a list of squares: 1, 4, 9, 16...
struct squares {
    squares() { i = 0; }
    int operator()() { ++i; return i * i; }   // non-const so it can update i

private:
    int i;
};

...

std::vector<int> v;
v.reserve(someSize);     // To make insertions efficient
std::generate_n(std::back_inserter(v), someSize, squares());
Alternatively, you could use copy() or transform() instead of generate_n().
The downside is that the logic for constructing the initial values has to be moved into a separate class, which is less convenient than having it in place (although lambdas in C++1x make this much nicer; see the sketch below). I also expect this will still not be as fast as the malloc()-based non-STL version, but I expect it will be close, since it only does one construction per element.
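For what it's worth, here is a sketch of what the C++11 lambda version of the same idea might look like (someSize is assumed, as above; this is my illustration, not code from the answer):

#include <algorithm>
#include <iterator>
#include <vector>

std::vector<int> v;
v.reserve(someSize);                           // make the insertions efficient
int i = 0;
std::generate_n(std::back_inserter(v), someSize,
                [&i] { ++i; return i * i; });  // same squares: 1, 4, 9, 16...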
The vector versions additionally call the Pixel constructor. Each one accounts for almost a million constructor runs inside the timed loop.
Edit: then there's the outer 1...1000 loop, so make that a billion ctor calls!
Edit 2: it would be interesting to see the disassembly of the UseArray case. An optimizer could optimize the whole thing away, since it has no effect other than burning CPU.
Here's how the push_back method of a vector works:
- The vector allocates X amount of space when it is initialized.
- As described below, it checks whether there is room in the current underlying array for the item.
- It makes a copy of the item in the push_back call.
After calling push_back X items:
- The vector reallocates kX amount of space into a 2nd array.
- It copies the entries of the first array onto the second.
- Discards the first array.
- Now uses the second array as storage until it reaches kX entries.
Repeat. If you're not reserving space, it's definitely going to be slower. More than that, if it's expensive to copy the item, then 'push_back' like that is going to eat you alive.
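A small sketch (mine, not from the answer) that makes the growth pattern visible: push_back into a vector and print the capacity every time it changes, i.e. every time a reallocation plus element copy just happened.

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last) {      // a reallocation (and copy of all elements) occurred
            last = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << last << '\n';
        }
    }
    // Calling v.reserve(1000) up front would avoid every one of these reallocations.
}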
As to the vector versus array thing, I'm going to have to agree with the other people. Run in release, turn optimizations on, and put in a few more flags so that the friendly people at Microsoft don't #@%$^ it up for ya.
One more thing, if you don't need to resize, use Boost.Array.
Some profiler data (pixel is aligned to 32 bits):
g++ -msse3 -O3 -ftree-vectorize -g test.cpp -DNDEBUG && ./a.out
UseVector completed in 3.123 seconds
UseArray completed in 1.847 seconds
UseVectorPushBack completed in 9.186 seconds
The whole thing completed in 14.159 seconds
Blah
andrey@nv:~$ opannotate --source libcchem/src/a.out | grep "Total samples for file" -A3
Overflow stats not available
 * Total samples for file : "/usr/include/c++/4.4/ext/new_allocator.h"
 *
 * 141008 52.5367
 */
--
 * Total samples for file : "/home/andrey/libcchem/src/test.cpp"
 *
 * 61556 22.9345
 */
--
 * Total samples for file : "/usr/include/c++/4.4/bits/stl_vector.h"
 *
 * 41956 15.6320
 */
--
 * Total samples for file : "/usr/include/c++/4.4/bits/stl_uninitialized.h"
 *
 * 20956 7.8078
 */
--
 * Total samples for file : "/usr/include/c++/4.4/bits/stl_construct.h"
 *
 * 2923 1.0891
 */
In allocator:
               :    // _GLIBCXX_RESOLVE_LIB_DEFECTS
               :    // 402. wrong new expression in [some_] allocator::construct
               :    void
               :    construct(pointer __p, const _Tp& __val)
141008 52.5367 :    { ::new((void *)__p) _Tp(__val); }
vector:
               :void UseVector()
               :{ /* UseVector() total: 60121 22.3999 */
...
               :
 10790  4.0201 :    for (int i = 0; i < dimension * dimension; ++i) {
   495  0.1844 :        pixels[i].r = 255;
 12618  4.7012 :        pixels[i].g = 0;
  2253  0.8394 :        pixels[i].b = 0;
               :
               :    }
array:
               :void UseArray()
               :{ /* UseArray() total: 35191 13.1114 */
               :
...
   136  0.0507 :    for (int i = 0; i < dimension * dimension; ++i) {
  9897  3.6874 :        pixels[i].r = 255;
  3511  1.3081 :        pixels[i].g = 0;
 21647  8.0652 :        pixels[i].b = 0;
Most of the overhead is in the copy constructor. For example:
std::vector < Pixel > pixels;//(dimension * dimension, Pixel());
pixels.reserve(dimension * dimension);

for (int i = 0; i < dimension * dimension; ++i) {
    pixels[i].r = 255;
    pixels[i].g = 0;
    pixels[i].b = 0;
}
It has the same performance as an array.
My laptop is a Lenovo G770 (4 GB RAM).
The OS is Windows 7 64-bit (the one that came with the laptop).
Compiler is MinGW 4.6.1.
The IDE is Code::Blocks .
I tested the source code from the first post.
The results
O2 optimization
UseArray completed in 2.841 seconds
UseVector completed in 2.548 seconds
UseVectorPushBack completed in 11.95 seconds
The whole thing completed in 17.342 seconds
system pause
O3 optimization
UseArray completed in 1.452 seconds
UseVector completed in 2.514 seconds
UseVectorPushBack completed in 12.967 seconds
The whole thing completed in 16.937 seconds
It looks like the performance of vector is worse under O3 optimization.
If you change the loop to
pixels[i].r = i;
pixels[i].g = i;
pixels[i].b = i;
The speed of array and vector under O2 and O3 are almost the same.
A better benchmark (I think…): in the original code, the compiler's optimizations can change or remove code, because the results of the allocated vectors/arrays are not used anywhere. Results:
$ g++ test.cpp -o test -O3 -march=native
$ ./test
UseArray inner completed in 0.652 seconds
UseArray completed in 0.773 seconds
UseVector inner completed in 0.638 seconds
UseVector completed in 0.757 seconds
UseVectorPushBack inner completed in 6.732 seconds
UseVectorPush completed in 6.856 seconds
The whole thing completed in 8.387 seconds
Compiler:
gcc version 6.2.0 20161019 (Debian 6.2.0-9)
CPU:
model name : Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
And the code:
#include <cstdlib>
#include <vector>

#include <iostream>
#include <string>

#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>

class TestTimer
{
    public:
        TestTimer(const std::string & name) : name(name),
            start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
        {
        }

        ~TestTimer()
        {
            using namespace std;
            using namespace boost;

            posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
            posix_time::time_duration d = now - start;

            cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
                " seconds" << endl;
        }

    private:
        std::string name;
        boost::posix_time::ptime start;
};

struct Pixel
{
    Pixel()
    {
    }

    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
    {
    }

    unsigned char r, g, b;
};

void UseVector(std::vector<std::vector<Pixel> >& results)
{
    TestTimer t("UseVector inner");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;
        std::vector<Pixel>& pixels = results.at(i);
        pixels.resize(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

void UseVectorPushBack(std::vector<std::vector<Pixel> >& results)
{
    TestTimer t("UseVectorPushBack inner");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;
        std::vector<Pixel>& pixels = results.at(i);
        pixels.reserve(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
            pixels.push_back(Pixel(255, 0, 0));
    }
}

void UseArray(Pixel** results)
{
    TestTimer t("UseArray inner");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;
        Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);
        results[i] = pixels;

        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }

        // free(pixels);
    }
}

void UseArray()
{
    TestTimer t("UseArray");

    Pixel** array = (Pixel**)malloc(sizeof(Pixel*)* 1000);
    UseArray(array);

    for(int i=0;i<1000;++i)
        free(array[i]);
    free(array);
}

void UseVector()
{
    TestTimer t("UseVector");

    {
        std::vector<std::vector<Pixel> > vector(1000, std::vector<Pixel>());
        UseVector(vector);
    }
}

void UseVectorPushBack()
{
    TestTimer t("UseVectorPush");

    {
        std::vector<std::vector<Pixel> > vector(1000, std::vector<Pixel>());
        UseVectorPushBack(vector);
    }
}

int main()
{
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();
    UseVectorPushBack();

    return 0;
}
With the right options, vectors and arrays can generate identical asm. In these cases, they are of course the same speed, because you get the same executable file either way.
By the way, the slowdown you're seeing in classes using vector also occurs with standard types like int. Here's some multithreaded code:
#include <iostream>
#include <cstdio>
#include <map>
#include <string>
#include <typeinfo>
#include <vector>
#include <pthread.h>
#include <sstream>
#include <fstream>
using namespace std;

//pthread_mutex_t map_mutex=PTHREAD_MUTEX_INITIALIZER;

long long num=500000000;
int procs=1;

struct iterate
{
    int id;
    int num;
    void * member;
    iterate(int a, int b, void *c) : id(a), num(b), member(c) {}
};

//fill out viterate and piterate
void * viterate(void * input)
{
    printf("am in viterate\n");
    iterate * info=static_cast<iterate *> (input);
    // reproduce member type
    vector<int> test= *static_cast<vector<int>*> (info->member);
    for (int i=info->id; i<test.size(); i+=info->num)
    {
        //printf("am in viterate loop\n");
        test[i];
    }
    pthread_exit(NULL);
}

void * piterate(void * input)
{
    printf("am in piterate\n");
    iterate * info=static_cast<iterate *> (input);;
    int * test=static_cast<int *> (info->member);
    for (int i=info->id; i<num; i+=info->num)
    {
        //printf("am in piterate loop\n");
        test[i];
    }
    pthread_exit(NULL);
}

int main()
{
    cout<<"producing vector of size "<<num<<endl;
    vector<int> vtest(num);
    cout<<"produced a vector of size "<<vtest.size()<<endl;
    pthread_t thread[procs];

    iterate** it=new iterate*[procs];
    int ans;
    void *status;

    cout<<"begining to thread through the vector\n";
    for (int i=0; i<procs; i++)
    {
        it[i]=new iterate(i, procs, (void *) &vtest);
        // ans=pthread_create(&thread[i],NULL,viterate, (void *) it[i]);
    }
    for (int i=0; i<procs; i++)
    {
        pthread_join(thread[i], &status);
    }
    cout<<"end of threading through the vector";
    //reuse the iterate structures

    cout<<"producing a pointer with size "<<num<<endl;
    int * pint=new int[num];
    cout<<"produced a pointer with size "<<num<<endl;

    cout<<"begining to thread through the pointer\n";
    for (int i=0; i<procs; i++)
    {
        it[i]->member=&pint;
        ans=pthread_create(&thread[i], NULL, piterate, (void*) it[i]);
    }
    for (int i=0; i<procs; i++)
    {
        pthread_join(thread[i], &status);
    }
    cout<<"end of threading through the pointer\n";

    //delete structure array for iterate
    for (int i=0; i<procs; i++)
    {
        delete it[i];
    }
    delete [] it;

    //delete pointer
    delete [] pint;

    cout<<"end of the program"<<endl;
    return 0;
}
The behavior of the code shows that the instantiation of the vector is the longest part of the code. Once you get through that bottleneck, the rest of the code runs extremely fast. This is true no matter how many threads you are running on.
By the way, ignore the absolutely insane number of includes. I have been using this code to test things for a project, so the number of includes kept growing.
I just want to mention that vector (and smart_ptr) is just a thin layer added on top of raw arrays (and raw pointers). And actually the access time of a vector in contiguous memory is faster than an array. The following code shows the result of initializing and accessing a vector and an array.
#include <boost/date_time/posix_time/posix_time.hpp>
#include <cstdlib>   // srand, rand
#include <ctime>     // time
#include <iostream>
#include <vector>

using namespace std;

#define SIZE 20000

int main() {
    srand (time(NULL));
    vector<vector<int>> vector2d;
    vector2d.reserve(SIZE);
    int index(0);
    boost::posix_time::ptime start_total = boost::posix_time::microsec_clock::local_time();    // timer start - build + access
    for (int i = 0; i < SIZE; i++) {
        vector2d.push_back(vector<int>(SIZE));
    }
    boost::posix_time::ptime start_access = boost::posix_time::microsec_clock::local_time();   // timer start - access
    for (int i = 0; i < SIZE; i++) {
        index = rand()%SIZE;
        for (int j = 0; j < SIZE; j++) {
            vector2d[index][index]++;
        }
    }
    boost::posix_time::ptime end = boost::posix_time::microsec_clock::local_time();
    boost::posix_time::time_duration msdiff = end - start_total;
    cout << "Vector total time: " << msdiff.total_milliseconds() << "milliseconds.\n";
    msdiff = end - start_access;
    cout << "Vector access time: " << msdiff.total_milliseconds() << "milliseconds.\n";

    int** raw2d = nullptr;
    raw2d = new int*[SIZE];
    start_total = boost::posix_time::microsec_clock::local_time();     // timer start - build + access
    for (int i = 0; i < SIZE; i++) {
        raw2d[i] = new int[SIZE];
    }
    start_access = boost::posix_time::microsec_clock::local_time();    // timer start - access
    for (int i = 0; i < SIZE; i++) {
        index = rand()%SIZE;
        for (int j = 0; j < SIZE; j++) {
            raw2d[index][index]++;
        }
    }
    end = boost::posix_time::microsec_clock::local_time();
    msdiff = end - start_total;
    cout << "Array total time: " << msdiff.total_milliseconds() << "milliseconds.\n";
    msdiff = end - start_access;
    cout << "Array access time: " << msdiff.total_milliseconds() << "milliseconds.\n";
    for (int i = 0; i < SIZE; i++) {
        delete [] raw2d[i];
    }
    delete [] raw2d;
    return 0;
}
The output is:
Vector total time: 925milliseconds.
Vector access time: 4milliseconds.
Array total time: 30milliseconds.
Array access time: 21milliseconds.
So the speed will be almost the same if you use it properly (as others mentioned, using reserve() or resize()).
Well, that's because vector::resize() does much more processing than plain memory allocation (by malloc).
Try putting a breakpoint in your copy constructor (define one so that you can set a breakpoint in it!), and there goes the additional processing time.
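For example, a user-defined copy constructor you could break on might look like this (a sketch of my own; it just reproduces what the implicit one does):

struct Pixel
{
    unsigned char r, g, b;

    Pixel() {}
    Pixel(const Pixel& other) : r(other.r), g(other.g), b(other.b)
    {
        // set a breakpoint here to see how often resize()/push_back copies elements
    }
};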
I have to say I'm not an expert in C++, but here are some experimental results:
compile: gcc-6.2.0/bin/g++ -O3 -std=c++14 vector.cpp
machine:
Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
OS:
2.6.32-642.13.1.el6.x86_64
Output:
UseArray completed in 0.167821 seconds
UseVector completed in 0.134402 seconds
UseConstructor completed in 0.134806 seconds
UseFillConstructor completed in 1.00279 seconds
UseVectorPushBack completed in 6.6887 seconds
The whole thing completed in 8.12888 seconds
The only thing I find strange here is the performance of "UseFillConstructor" compared with "UseConstructor".
The code:
void UseConstructor()
{
    TestTimer t("UseConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;
        std::vector<Pixel> pixels(dimension*dimension);

        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

void UseFillConstructor()
{
    TestTimer t("UseFillConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;
        std::vector<Pixel> pixels(dimension*dimension, Pixel(255,0,0));
    }
}
So providing the additional fill "value" slows down performance quite a lot, which I think is due to the repeated calls to the copy constructor. But…
Compile:
gcc-6.2.0/bin/g++ -std=c++14 -O vector.cpp
Output:
UseArray completed in 1.02464 seconds
UseVector completed in 1.31056 seconds
UseConstructor completed in 1.47413 seconds
UseFillConstructor completed in 1.01555 seconds
UseVectorPushBack completed in 6.9597 seconds
The whole thing completed in 11.7851 seconds
So in this case, gcc's optimization matters a lot, but it can't help you much when a fill value is provided by default. This is actually against my intuition. Hopefully this helps new programmers when choosing a vector initialization format.