Standard Non-STL Containers: bitset

朱雀 2022-08-05 07:27

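1. Concept
What is a "standard non-STL container"? It is a type that can be regarded as a container but does not satisfy all of the requirements of an STL container. The container adapters stack, queue, and priority_queue mentioned in the earlier post all belong to this group, and so does valarray.
bitset: a container for efficient operations on a set of bits.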

2. API
API provided by bitset (all entries are public member functions):
(constructor)  Construct bitset
operator[]     Access bit
set            Set bits
reset          Reset bits
flip           Flip bits
to_ulong       Convert to unsigned long integer
to_string      Convert to string
count          Count bits set
size           Return size
test           Return bit value
any            Test if any bit is set
none           Test if no bit is set
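
The member functions listed above can be exercised on a small 8-bit set. The following is a minimal sketch; `BitsetSummary` and `bitset_api_demo` are hypothetical names introduced only for this illustration.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Hypothetical helper type: collects the results of the query functions.
struct BitsetSummary {
    std::string text;      // to_string(), most significant bit first
    unsigned long value;   // to_ulong()
    std::size_t ones;      // count()
    std::size_t size;      // size()
    bool bit3;             // test(3)
    bool any;              // any()
    bool none;             // none()
};

BitsetSummary bitset_api_demo() {
    std::bitset<8> bits;   // default constructor: all bits are 0
    bits.set(1);           // 00000010
    bits[3] = true;        // 00001010 (operator[] does no range check)
    bits.flip(0);          // 00001011
    bits.reset(1);         // 00001001

    // test() is range-checked and throws std::out_of_range on a bad index.
    return BitsetSummary{bits.to_string(), bits.to_ulong(), bits.count(),
                         bits.size(), bits.test(3), bits.any(), bits.none()};
}
```

Calling `bitset_api_demo()` yields the text `00001001`, the value 9, and a count of 2 set bits out of 8.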

3. Source code analysis
Part of the SGI bitset implementation:

```cpp
template<size_t _Nb>
class bitset : private _Base_bitset<__BITSET_WORDS(_Nb)>
{
private:
  typedef _Base_bitset<__BITSET_WORDS(_Nb)> _Base;
  typedef unsigned long _WordT;
private:
  // Clear the unused high bits of the last word.
  void _M_do_sanitize() {
    _Sanitize<_Nb%__BITS_PER_WORD>::_M_do_sanitize(this->_M_hiword());
  }
  // ...
};
```

```cpp
#define __BITS_PER_WORD (CHAR_BIT*sizeof(unsigned long))
#define __BITSET_WORDS(__n) \
 ((__n) < 1 ? 1 : ((__n) + __BITS_PER_WORD - 1)/__BITS_PER_WORD)
```
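
The two macros round the requested bit count up to a whole number of `unsigned long` words, with a minimum of one word. A minimal sketch of the same arithmetic as a `constexpr` function follows; `kBitsPerWord` and `words_for_bits` are hypothetical names, not part of the SGI source.

```cpp
#include <climits>
#include <cstddef>

// Bits held by one storage word, mirroring __BITS_PER_WORD.
constexpr std::size_t kBitsPerWord = CHAR_BIT * sizeof(unsigned long);

// Words needed to hold n bits, mirroring __BITSET_WORDS:
// round up, and never return fewer than one word.
constexpr std::size_t words_for_bits(std::size_t n) {
    return n < 1 ? 1 : (n + kBitsPerWord - 1) / kBitsPerWord;
}

// Checked at compile time, independent of the platform's word size.
static_assert(words_for_bits(0) == 1, "zero bits still take one word");
static_assert(words_for_bits(1) == 1, "one bit fits in one word");
static_assert(words_for_bits(kBitsPerWord) == 1, "a full word is one word");
static_assert(words_for_bits(kBitsPerWord + 1) == 2, "one extra bit spills over");
```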

```cpp
template<size_t _Nw>
struct _Base_bitset {
  typedef unsigned long _WordT;
  _WordT _M_w[_Nw]; // 0 is the least significant word.

  _Base_bitset( void ) { _M_do_reset(); }
  _Base_bitset(unsigned long __val) {
    _M_do_reset();
    _M_w[0] = __val;
  }

  // Map a bit position to its word, byte, bit offset, and mask.
  static size_t _S_whichword( size_t __pos )
    { return __pos / __BITS_PER_WORD; }
  static size_t _S_whichbyte( size_t __pos )
    { return (__pos % __BITS_PER_WORD) / CHAR_BIT; }
  static size_t _S_whichbit( size_t __pos )
    { return __pos % __BITS_PER_WORD; }
  static _WordT _S_maskbit( size_t __pos )
    { return (static_cast<_WordT>(1)) << _S_whichbit(__pos); }

  _WordT& _M_getword(size_t __pos) { return _M_w[_S_whichword(__pos)]; }
  _WordT _M_getword(size_t __pos) const { return _M_w[_S_whichword(__pos)]; }
  _WordT& _M_hiword() { return _M_w[_Nw - 1]; }
  _WordT _M_hiword() const { return _M_w[_Nw - 1]; }

  // Bitwise operations work one whole word at a time.
  void _M_do_and(const _Base_bitset<_Nw>& __x) {
    for ( size_t __i = 0; __i < _Nw; __i++ ) {
      _M_w[__i] &= __x._M_w[__i];
    }
  }
  void _M_do_or(const _Base_bitset<_Nw>& __x) {
    for ( size_t __i = 0; __i < _Nw; __i++ ) {
      _M_w[__i] |= __x._M_w[__i];
    }
  }
  void _M_do_xor(const _Base_bitset<_Nw>& __x) {
    for ( size_t __i = 0; __i < _Nw; __i++ ) {
      _M_w[__i] ^= __x._M_w[__i];
    }
  }
  // ... (the excerpt ends here; the rest of the class is omitted)
};
```
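
To make the roles of `_S_whichword` and `_S_maskbit` concrete, here is a minimal standalone sketch of the same word-and-mask indexing. `TinyBits` and its members are hypothetical names for illustration only; it is not the SGI class and performs no range checking.

```cpp
#include <climits>
#include <cstddef>

const std::size_t kWordBits = CHAR_BIT * sizeof(unsigned long);

// Hypothetical fixed-size bit array of four words.
// Callers must keep pos below 4 * kWordBits.
struct TinyBits {
    unsigned long words[4];   // index 0 is the least significant word

    TinyBits() : words() {}   // value-initialization zeroes every word

    // Which word holds bit pos (same idea as _S_whichword).
    static std::size_t which_word(std::size_t pos) { return pos / kWordBits; }
    // Single-bit mask inside that word (same idea as _S_maskbit).
    static unsigned long mask_bit(std::size_t pos) { return 1UL << (pos % kWordBits); }

    void set(std::size_t pos)   { words[which_word(pos)] |= mask_bit(pos); }
    void reset(std::size_t pos) { words[which_word(pos)] &= ~mask_bit(pos); }
    bool test(std::size_t pos) const {
        return (words[which_word(pos)] & mask_bit(pos)) != 0;
    }
};
```

Setting bit `kWordBits + 1`, for example, touches only `words[1]` and leaves it equal to 2, because the position maps to word 1 and bit offset 1.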

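From the SGI excerpt above we can observe:

  1. bitset inherits (privately) from _Base_bitset, and the concrete operations are encapsulated in _Base_bitset.
  2. The size of a bitset is a template parameter. A non-type template argument must be known to the compiler at compile time, so the size of a bitset is fixed at compile time and elements cannot be inserted or removed.
  3. The operations are bitwise operations applied to whole words, so they are fast.
  4. _Base_bitset uses unsigned long as its underlying storage, so individual bits are not addressable: pointers, references, and iterators to elements are not supported.
  5. The storage is declared as `_WordT _M_w[_Nw];`, an array inside the object itself. A bitset defined as a local variable therefore lives on the stack, and its size needs care (unlike the standard STL containers, which allocate their elements on the heap).
     For example, the code below overflows the stack (the test machine has a 10 MB stack):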

```cpp
#include <bitset>
#include <iostream>
using namespace std;

void fun()
{
    const int n = 800000000;
    bitset<n> a;   // about 100 MB of automatic storage, far beyond a 10 MB stack
    cout << a.size() << endl;
}
int main(int argc, char** argv)
{
    fun();
    return 0;
}
```

A large bitset can be allocated on the heap instead, as follows:

```cpp
// Fragment: needs <bitset>, <iostream>, and <new> (for std::nothrow).
const int n = 800000000;
bitset<n> *a = new(std::nothrow) bitset<n>;
if(a)   // new(std::nothrow) returns a null pointer on failure
{
    cout << a->size() << endl;
    delete a;
    a = NULL;
}
```
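
As an alternative to a raw `new`/`delete` pair, the heap allocation can be owned by a smart pointer. This is a minimal sketch assuming a C++14 compiler (for `std::make_unique`); `kBitCount` and `make_big_bitset` are hypothetical names, and the size is an arbitrary example value smaller than the one used above.

```cpp
#include <bitset>
#include <cstddef>
#include <memory>

// 80,000,000 bits is about 10 MB: too large for many default stacks,
// but unremarkable on the heap.
constexpr std::size_t kBitCount = 80000000;

// Returns a heap-allocated bitset; the unique_ptr frees it automatically.
// Unlike new(std::nothrow), make_unique reports failure by throwing
// std::bad_alloc rather than returning a null pointer.
std::unique_ptr<std::bitset<kBitCount>> make_big_bitset() {
    return std::make_unique<std::bitset<kBitCount>>();  // all bits start at 0
}
```

The caller uses the result like a pointer, for example `make_big_bitset()->size()`, and no explicit `delete` is needed.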

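4. vector<bool> and deque<bool>
bitset is efficient, but its size must be fixed at compile time and it does not support insertion or removal. Two possible alternatives are therefore vector<bool> and deque<bool>.
The difference between the two:
vector<bool> is not an STL container and does not actually hold bool objects (it packs bits, much like the underlying mechanism of bitset).
deque<bool> is an STL container and stores real bool values.
Run each of the following in turn: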

```cpp
deque<bool> a(1);    // one element, so that a[0] is a valid access
a[0] = 0;
bool* b = &a[0];     // fine: a[0] is a real bool
cout << *b << endl;
```

```cpp
vector<bool> a(1);   // one element, so that a[0] is a valid access
a[0] = 0;
bool* b = &a[0];     // does not compile: a[0] is a proxy object, not a bool
cout << *b << endl;
```

You will find that:
the deque<bool> version works, while the vector<bool> version fails to compile with: "cannot convert `std::_Bit_reference*' to `bool*' in initialization"
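
The difference can also be checked at compile time, and the bits of a vector<bool> can still be modified through the proxy type. The sketch below uses only standard library facilities; `flip_first_via_proxy` is a hypothetical helper name, and it assumes the vector is not empty.

```cpp
#include <deque>
#include <type_traits>
#include <utility>
#include <vector>

// deque<bool>::operator[] returns a genuine bool&.
static_assert(
    std::is_same<decltype(std::declval<std::deque<bool>&>()[0]), bool&>::value,
    "deque<bool> hands out real references");

// vector<bool>::operator[] returns the proxy type vector<bool>::reference.
static_assert(
    std::is_same<decltype(std::declval<std::vector<bool>&>()[0]),
                 std::vector<bool>::reference>::value,
    "vector<bool> hands out proxy objects");

// Flips element 0 through the proxy and returns the new value.
// Precondition: v is not empty.
bool flip_first_via_proxy(std::vector<bool>& v) {
    std::vector<bool>::reference first = v[0];  // proxy bound to bit 0 of v
    first = !first;                             // the assignment writes into v
    return v[0];
}
```

Holding the result of `v[0]` as `std::vector<bool>::reference` (or `auto`) compiles where `bool*` does not, because the proxy is the only handle the specialization offers to a single bit.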

However, deque<bool> is extremely wasteful of memory.
Using deque<bool>:

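```cpp
#include <deque>
#include <unistd.h>   // sleep
using namespace std;

int main(int argc, char** argv)
{
    deque<bool> a(10000000000);
    sleep(100);   // keep the process alive so its memory can be inspected with top
    return 0;
}
```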

Memory usage:
PID   USER PR NI VIRT  RES  SHR S %CPU %MEM TIME+   COMMAND
23612 work 25 0  9990m 9.8g 720 S 0.0  65.0 0:39.35 test

Using vector<bool>:

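```cpp
#include <vector>
#include <unistd.h>   // sleep
using namespace std;

int main(int argc, char** argv)
{
    vector<bool> a(10000000000);
    sleep(100);   // keep the process alive so its memory can be inspected with top
    return 0;
}
```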

Memory usage:
PID   USER PR NI VIRT  RES  SHR S %CPU %MEM TIME+   COMMAND
23909 work 25 0  1198m 1.2g 716 S 0.0  7.8  0:01.31 test

Using bitset:

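```cpp
#include <bitset>
#include <new>        // std::nothrow
#include <unistd.h>   // sleep
using namespace std;

int main(int argc, char** argv)
{
    const unsigned long int n = 10000000000;
    bitset<n> *a = new(std::nothrow) bitset<n>;
    sleep(100);   // keep the process alive so its memory can be inspected with top
    return 0;
}
```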

Memory usage:
PID   USER PR NI VIRT  RES  SHR S %CPU %MEM TIME+   COMMAND
24439 work 25 0  1198m 1.2g 712 S 30.7 7.8  0:00.92 test

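For 10 billion bool values (the 10000000000 used in the code above), vector<bool> and bitset each use about 1198 MB of memory, while deque<bool> uses about 9990 MB.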
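
The measured figures are close to what simple arithmetic predicts. The sketch below computes only the raw element storage, assuming 8 bits per byte and `sizeof(bool) == 1` (typical, but implementation-defined); all names are hypothetical. The values reported by top are somewhat higher, which is plausible since the process baseline and allocator or container bookkeeping are not counted here.

```cpp
#include <cstdint>

constexpr std::uint64_t kElements = 10000000000ULL;   // 10^10 flags, as in the tests
constexpr std::uint64_t kMiB = 1024ULL * 1024ULL;

// Bit-packed storage (bitset, vector<bool>): one bit per element.
constexpr std::uint64_t packed_mib(std::uint64_t elements) {
    return elements / 8 / kMiB;
}

// One byte per element (deque<bool>, assuming sizeof(bool) == 1).
constexpr std::uint64_t byte_per_element_mib(std::uint64_t elements) {
    return elements / kMiB;
}

static_assert(packed_mib(kElements) == 1192, "about 1.2 GiB when packed");
static_assert(byte_per_element_mib(kElements) == 9536, "about 9.3 GiB unpacked");
```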

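5. Summary
When you need to operate on a set of bits and the size of the set is fixed, prefer the efficient bitset.
If elements must be added or removed dynamically, or the size cannot be determined at compile time, consider vector<bool>. The memory overhead of deque<bool> is too high, so it is generally not worth considering.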
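
For the second case, here is a minimal sketch of a flag set whose size is known only at run time and that grows afterwards, which bitset cannot do. `make_flags` is a hypothetical name introduced for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Builds n flags at run time, marks the even indices,
// then appends one more flag to show that the set can grow.
std::vector<bool> make_flags(std::size_t n) {
    std::vector<bool> flags(n, false);   // size chosen at run time
    for (std::size_t i = 0; i < n; i += 2) {
        flags[i] = true;
    }
    flags.push_back(true);               // growth is allowed, unlike bitset
    return flags;
}
```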

References:
http://www.sgi.com/tech/stl/download.html
http://www.cplusplus.com/reference/stl/vector/
http://www.cplusplus.com/reference/stl/bitset/

Further reading:

Vector specialization: vector<bool>

The vector class template has a special template specialization for the bool type.

This specialization is provided to optimize for space allocation: In this template specialization, each element occupies only one bit (which is eight times less than the smallest type in C++: char).

The references to elements of a bool vector returned by the vector members are not references to bool objects, but a special member type which is a reference to a single bit, defined inside the vector class specialization as:

```cpp
class vector<bool>::reference {
  friend class vector;
  reference();                                  // no public constructor
public:
  ~reference();
  operator bool () const;                       // convert to bool
  reference& operator= ( const bool x );        // assign from bool
  reference& operator= ( const reference& x );  // assign from bit
  void flip();                                  // flip bit value.
}
```
