An earlier post discussed using virtual memory APIs to implement a FILO memory allocator. It is meant to support Entity creation from within range-based for loops over the ranges created by EntityDatabaseView.
Range-Based For Loop
There are currently several factors preventing this from becoming real. The first is how range-based for loops work. A range-based for loop is basically syntactic sugar that the compiler rewrites to:
// https://en.cppreference.com/w/cpp/language/range-for
{
    init-statement
    auto && __range = range_expression ;
    auto __begin = begin_expr ;
    auto __end = end_expr ;
    for ( ; __begin != __end; ++__begin) {
        range_declaration = *__begin;
        loop_statement
    }
}
It is obvious that end_expr is only evaluated once. However, inserting new Entities while iterating over the Entity Pages can invalidate that end iterator by pushing new pages into the PageAllocator.
The current begin() and end() functions of EntityDatabaseView return iterators filtered by checking whether each Entity Page satisfies the Component Filter. The problem of a changing end iterator could be fixed by introducing a helper class EndIterator, whose equality with normal iterators is checked by fetching the latest end() iterator from the std::vector inside the PageAllocator. However, the standard states that when memory reallocation happens, all iterators into the vector in question are invalidated. To conform to the standard and not rely on undefined behavior, the begin iterator has to be fixed as well. This can be done by simply getting rid of the base iterator of the vector, or anything else that could be affected by the reallocation: just use an index to access the vector.
In this way, the range used in the for loop is dynamically based on the real-time size of the underlying vector and is not affected by any iterator invalidation.
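A minimal sketch of this index-based scheme is shown below. The type names (PageIterator, EndSentinel) and the use of a plain std::vector of pages are illustrative assumptions, not the actual EntityDatabaseView code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: an index-based page iterator plus an end sentinel.
// The vector is always accessed through the stored index, never through a
// cached element iterator, so reallocations cannot invalidate anything here.
template <typename Page>
struct PageIterator
{
    std::vector<Page>* pages = nullptr;
    std::size_t        index = 0;

    Page& operator*() const { return (*pages)[index]; }
    PageIterator& operator++() { ++index; return *this; }
};

// The "end" carries no state; equality is decided against the vector's
// *current* size, so pages pushed during iteration are still visited.
struct EndSentinel { };

template <typename Page>
bool operator==(const PageIterator<Page>& it, EndSentinel)
{
    return it.index >= it.pages->size();
}

template <typename Page>
bool operator!=(const PageIterator<Page>& it, EndSentinel s)
{
    return !(it == s);
}
```

Since C++17 the begin and end expressions of a range-based for loop may have different types, so begin() can keep returning the filtered iterator while end() returns such a sentinel.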
Allocator for std::vector
Another problem is the vector class itself. The only thing std::vector lacks compared with our ideal is the ability to grow its storage in place. When it has to expand the storage, it allocates a larger new buffer, moves all the old elements into the new buffer, and deallocates the old one. It is an unfortunate design that the std::allocator interface does not support reallocation. However, we don’t want to reinvent the vector class over this one tiny problem, since doing so takes time and is error-prone. Therefore, let’s first check how MSVC’s standard library does the job and see whether we can work around it.
void _Resize_reallocate(const size_type _Newsize, const _Ty2& _Val) {
    if (_Newsize > max_size()) {
        _Xlength();
    }

    auto& _My_data    = _Mypair._Myval2;
    pointer& _Myfirst = _My_data._Myfirst;
    pointer& _Mylast  = _My_data._Mylast;

    const auto _Oldsize           = static_cast<size_type>(_Mylast - _Myfirst);
    const size_type _Newcapacity  = _Calculate_growth(_Newsize);
    const pointer _Newvec         = _Getal().allocate(_Newcapacity);
    const pointer _Appended_first = _Newvec + _Oldsize;
    pointer _Appended_last        = _Appended_first;

    _TRY_BEGIN
    _Appended_last = _Ufill(_Appended_first, _Newsize - _Oldsize, _Val);
    _Umove_if_noexcept(_Myfirst, _Mylast, _Newvec);
    _CATCH_ALL
    _Destroy(_Appended_first, _Appended_last);
    _Getal().deallocate(_Newvec, _Newcapacity);
    _RERAISE;
    _CATCH_END

    _Change_array(_Newvec, _Newsize, _Newcapacity);
}
The _Umove_if_noexcept function, in turn, calls _Uninitialized_move if the element type is nothrow move-constructible (or has no copy constructor to fall back on):
void _Umove_if_noexcept(pointer _First, pointer _Last, pointer _Dest) {
    // move_if_noexcept [_First, _Last) to raw _Dest, using allocator
    _Umove_if_noexcept1(_First, _Last, _Dest,
        bool_constant<disjunction_v<is_nothrow_move_constructible<_Ty>,
            negation<is_copy_constructible<_Ty>>>>{});
}

void _Umove_if_noexcept1(pointer _First, pointer _Last, pointer _Dest, true_type) {
    // move [_First, _Last) to raw _Dest, using allocator
    _Uninitialized_move(_First, _Last, _Dest, _Getal());
}
The _Uninitialized_move function lives in the <xmemory> header. It memmove()s the elements if their move is trivial (and the allocator does not customize construction); otherwise it relies on the allocator to move-construct them into the uninitialized storage in a loop.
template <class _InIt, class _Alloc>
_Alloc_ptr_t<_Alloc> _Uninitialized_move(
    const _InIt _First, const _InIt _Last, _Alloc_ptr_t<_Alloc> _Dest, _Alloc& _Al) {
    // move [_First, _Last) to raw _Dest, using _Al
    // note: only called internally from elsewhere in the STL
    using _Ptrval     = typename _Alloc::value_type*;
    auto _UFirst      = _Get_unwrapped(_First);
    const auto _ULast = _Get_unwrapped(_Last);
    if constexpr (conjunction_v<bool_constant<_Ptr_move_cat<decltype(_UFirst), _Ptrval>::_Really_trivial>,
                      _Uses_default_construct<_Alloc, _Ptrval, decltype(_STD move(*_UFirst))>>) {
        // calls std::memmove
        _Copy_memmove(_UFirst, _ULast, _Unfancy(_Dest));
        return _Dest + (_ULast - _UFirst);
        (void) _Al;
    } else {
        _Uninitialized_backout_al<_Alloc> _Backout{_Dest, _Al};
        for (; _UFirst != _ULast; ++_UFirst) {
            _Backout._Emplace_back(_STD move(*_UFirst));
        }
        return _Backout._Release();
    }
}
_Uninitialized_backout_al is basically a helper that performs the move construction through the allocator, using the pointer type of the element, and undoes partially constructed ranges if an exception is thrown. It is also found in the <xmemory> header.
template <class _Alloc>
class _Uninitialized_backout_al {
    // struct to undo partially constructed ranges in _Uninitialized_xxx_al algorithms
    using pointer = _Alloc_ptr_t<_Alloc>;

public:
    _Uninitialized_backout_al(pointer _Dest, _Alloc& _Al_) : _First(_Dest), _Last(_Dest), _Al(_Al_) {}

    // * irrelevant code *

    template <class... _Types>
    void _Emplace_back(_Types&&... _Vals) {
        // construct a new element at *_Last and increment
        allocator_traits<_Alloc>::construct(_Al, _Unfancy(_Last), _STD forward<_Types>(_Vals)...);
        ++_Last;
    }

    // * irrelevant code *

private:
    pointer _First;
    pointer _Last;
    _Alloc& _Al;
};
libstdc++ generally does the same thing, except that I didn’t see it perform the memmove() optimization. The relevant source files are listed below:
- https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/vector.tcc
- https://github.com/gcc-mirror/gcc/tree/master/libstdc%2B%2B-v3/include/bits/stl_vector.h
- https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_uninitialized.h
Because we know that our allocator always returns the same memory address and ignores deallocate() invocations, our aim is to trick the vector into only expanding its capacity without performing any move operations at all.
Since the vector uses the allocator to perform the move operations, we can overload the construct() function of the allocator so that it becomes a no-op when moving the elements. Of course, this comes at a price: we cannot move instances of the contained element type into the vector either, e.g. emplace_back(T()), because that goes through the same construct() function we just turned into a no-op.
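Below is a minimal sketch of such an allocator. The class name is made up, and plain operator new / operator delete stand in for the virtual memory allocator that, in our case, always returns the same base address and ignores deallocations:

```cpp
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical sketch of the "no-op move" allocator. Plain operator new
// and delete are placeholders for the virtual memory allocator described
// earlier, which keeps handing back the same base address.
template <typename T>
struct NoMoveAllocator
{
    using value_type = T;

    NoMoveAllocator() = default;
    template <typename U>
    NoMoveAllocator(const NoMoveAllocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) noexcept
    {
        ::operator delete(p); // the real allocator would ignore this call
    }

    // Ordinary construction still goes through placement new...
    template <typename... Args>
    void construct(T* p, Args&&... args)
    {
        ::new (static_cast<void*>(p)) T(std::forward<Args>(args)...);
    }

    // ...but constructing from an rvalue T is a no-op. This is exactly the
    // call std::vector makes when it "moves" elements into the new buffer.
    void construct(T*, T&&) noexcept {}

    bool operator==(const NoMoveAllocator&) const noexcept { return true; }
    bool operator!=(const NoMoveAllocator&) const noexcept { return false; }
};
```

Note that because this allocator defines its own construct(), the _Uses_default_construct check in the _Uninitialized_move code above also fails, so even trivially movable element types take the construct() path rather than memmove(). And the trick only makes sense when paired with an allocator that keeps returning the same base address; with an ordinary allocator the elements would simply never arrive in the new buffer.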
Conclusions
Even though we have eliminated the expensive memory copying that drove us to deal with the virtual memory system in the first place, using the standard vector with our crazy allocator is still far from ideal:
- It relies on too many implementation details of the actual vector class, and thus may break after a compiler or standard library update.
- The logic inside the vector class is overwhelmingly comprehensive compared with our usage: we merely want a dynamic array that allocates and deallocates like a stack. Nothing more.
- The vector class grows geometrically, typically to 1.5 or 2 times its old size, each time it runs out of capacity. Meanwhile, our virtual memory allocator is capable of committing and decommitting in units of pages: if we need more space, we commit one more page. The expansion scheme of the vector works well when we know nothing about the underlying memory allocation, but it translates into a lot of wasted pages if we want to control memory usage precisely. Even with demand paging, we cannot be sure that the implementation never touches the unused pages, which would cause them to be physically backed. Sure, we can check the source code, and this case is unlikely to happen. But there is still a risk of overcommitting when we have a lot of allocations.
Therefore, it is probably better to specialize our usage into a dedicated container class that implements only what we actually need, as sketched below.
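As a rough illustration of what such a dedicated container might look like, here is a minimal sketch for Windows. It assumes a fixed reserved address range and a 4 KiB page size, omits error handling and element destruction details, and uses made-up names and numbers:

```cpp
#include <windows.h>
#include <cstddef>
#include <new>

// Hypothetical sketch of the "stack-like dynamic array" suggested above:
// reserve a large virtual address range up front and commit exactly one
// page at a time as elements are pushed, instead of growing geometrically.
template <typename T>
class PageStack
{
public:
    explicit PageStack(std::size_t reserve_bytes = 1 << 30) // reserve 1 GiB of address space
        : mBase(static_cast<std::byte*>(
              ::VirtualAlloc(nullptr, reserve_bytes, MEM_RESERVE, PAGE_READWRITE)))
    {
        // Null check omitted for brevity.
    }

    ~PageStack()
    {
        // Element destructors omitted for brevity.
        ::VirtualFree(mBase, 0, MEM_RELEASE);
    }

    T& push(const T& value)
    {
        const std::size_t needed = (mSize + 1) * sizeof(T);
        while (needed > mCommitted)
        {
            // Grow by exactly one page instead of doubling the capacity.
            ::VirtualAlloc(mBase + mCommitted, kPageSize, MEM_COMMIT, PAGE_READWRITE);
            mCommitted += kPageSize;
        }
        return *::new (mBase + mSize++ * sizeof(T)) T(value);
    }

    void pop()
    {
        reinterpret_cast<T*>(mBase + --mSize * sizeof(T))->~T();
        // Trailing pages could be decommitted here with MEM_DECOMMIT.
    }

    std::size_t size() const { return mSize; }
    T& operator[](std::size_t i) { return reinterpret_cast<T*>(mBase)[i]; }

private:
    static constexpr std::size_t kPageSize = 4096; // assume 4 KiB pages

    std::byte*  mBase      = nullptr;
    std::size_t mCommitted = 0;
    std::size_t mSize      = 0;
};
```

Growth happens one committed page at a time, which is exactly the page-granular control we wanted, and popping elements could decommit trailing pages just as easily.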