Debugging Iterator Race Condition Caused by Entity Page Insertion during Iteration

Introduction

Since Input Systems require the Entities to be visited in an incremental order of time, the behavior of inserting Entity Pages was changed from inserting at the head of the linked list to inserting at the end of the list. This switch, unfortunately, led to an unexpected outcome which is a race condition with page iterators. This bug works together with the multithreaded execution of Systems to make the begin iterator to leap over the end iterator so the iteration will finally lead to an access violation.

Bug Analysis

The bug manifests itself as a begin iterator in a ranged iteration trying to dereference the INVALID_PAGE, thus causes an access violation. This happens in Systems that create Entities inside an iteration of a filtered Entity Page range. An example of such a System is System_fireworks_explode in the fireworks demo. This System iterates over Entities with ComponentFireworks and ComponentPosition, which are basically launched fireworks, then turns fireworks satisfying certain conditions into sparks. After that, the expired fireworks are deleted.

The race condition caused by inserting Entity Page at the list back

To see why this bug happens, we have to first understand how Entities are iterated in a multithreaded context. Currently, std::for_each() is used as a convenient method for distributing the workload over an iterable range to multiple threads. Since iterating directly over the filtered Entities is relatively expensive, we provide a range of Entity Pages to the for_each() function and use another range over the dereferenced page inside the lambda passed to the function. From now on, only the range over the dereferenced page is relevant.

// The update code of System_fireworks_explode

template <typename RuntimeServices, typename EntityDatabaseAccess>
void update(RuntimeServices &&rt, EntityDatabaseAccess &&db)
{
    // Workload distributed over the Entity Pages
    std::for_each(std::execution::par, db.begin(), db.end(), [&](auto &&p) mutable {
        static thread_local ArchetypeSpark spark;

        // Create a range over the dereferenced page
        auto range = db.page_view(p, Filter());
        auto begin = range.begin(); // <- points to INVALID_PAGE
        auto end = range.end(); // <- oops, points to the new page

        for(; begin != end; ++begin) // loop will never end!
        {
            assert(end == range.end());

            auto e = *begin;
            auto &f_f = USAGI_COMPONENT(e, ComponentFireworks);
            auto &f_phy = USAGI_COMPONENT(e, ComponentPhysics);
            auto &f_c = USAGI_COMPONENT(e, ComponentColor);

            // f_c.time_to_explode -= 1.f / 60;
            if(f_phy.velocity.y() < 10)
            {
                for(auto i = 0; i < f_f.num_sparks; ++i)
                {
                    spark.val<ComponentSpark>().fade_time_total =
                        spark.val<ComponentSpark>().fade_time_left =
                        dis_ft(gen);
                    spark.val<ComponentSpark>().base_color = f_c.rgb;

                    // copy the rocket position & color
                    spark.val<ComponentPosition>() =
                        USAGI_COMPONENT(e, ComponentPosition);
                    spark.val<ComponentColor>() =
                        USAGI_COMPONENT(e, ComponentColor);

                    // set particle props
                    spark.val<ComponentPhysics>().velocity =
                        polarToCartesian(dis_v(gen), dis(gen));
                    spark.val<ComponentSprite>().size = 5;

                    db.create(spark); // where a new page may be created
                }
                e.destroy();
            }
        }
    });
}

The range over the dereferenced page filters out only the Entities on that page satisfying the provided filter. This range is based on a pair of EntityIteratorFiltered. This filtered iterator works by taking a pair of begin and end iterators of the unfiltered iterator and using a predicate function to determine whether the current Entity obtained by dereferencing the underlying unfiltered begin iterator should be visible through this filtered iterator. When a filtered iterator is created, it increments the unfiltered begin iterator to the first Entity that satisfies the predicate from the current position or the unfiltered end iterator, in which case, the iteration is finished. When it is incremented, the underlying begin iterator is first incremented by one before looking for the first Entity satisfying the predicate. A range based on a pair of filtered iterators must have the same delimiting end iterator so that finally the filtered begin iterator gets incremented to equal the filtered end iterator, signaling the finish of the iteration.

In our case where the Entities satisfying a filter inside a page are being iterated, the filtered entity range is created by pointing the unfiltered begin iterator to the first Entity inside the current page and the unfiltered end to the first Entity inside the next page of the current page. Therefore, the filtered Entities inside the current page get visited without accessing any Entity in the next page.

The race condition occurs when we are trying to iterate over the Entities in the current last page in the linked list while some other thread is trying to insert a new page at the end of the page list. During the creation of the filtered begin iterator over the Entities inside a page, the delimiting end iterator refers to INVALID_PAGE since this is the last page. However, imagine just after the begin iterator was created, another thread inserts a new page at the end of the linked list so that the page we are iterating over is not the last page anymore. Instead, it’s current next page points to the page just inserted by the other thread. Now, during the creation of the filtered end iterator, its delimiting end iterator points to the first Entity of the newly inserted page. Thus the delimiting iterators of begin and end are out-of-sync.

Technically, if the begin iterator could ever be incremented so that it points to the first Entity of the newly inserted page, the iteration terminated because it now equals the end iterator. However, this is unlikely to be the case. With the delimiting end iterator of the filtered begin iterator being INVALID_PAGE, while a new page was inserted just after the page we are currently visiting, the filtered begin iterator could technically increment and visit the Entities in the newly inserted page. Because the first Entity in the newly inserted page is unlikely to satisfy the filter, the filtered begin iterator will likely skip over the position pointed by the filtered end iterator. Once this happens, it never goes back again, finally leads to the action trying to access INVALID_PAGE because the loop was fooled to think that there are still Entities to visit since the begin iterator does not equal the end.

// imagine we were to iterate over the entities in the last entity page
auto begin()
{
    // current page - ok
    auto this_page = this->entity_page_at_index(mPageIndex);
    auto next_page = this_page;
    // advances to page.next_page - ok: got -1 since this is the last page
    ++next_page;
    // a new page was inserted between here and the invocation of end()
    return IteratorT(
        this->mDatabase,
        this_page,
        0,
        // still refers to -1!
        { this->mDatabase, next_page, 0 }
    );
}

auto end()
{
    // oops, refer to the newly inserted page
    const auto next_page = ++this->entity_page_at_index(mPageIndex);
    return IteratorT(
        this->mDatabase,
        next_page,
        0,
        { this->mDatabase, next_page, 0 }
    );
}

void loop()
{
    auto _begin = begin();
    auto _end = end();
    while(_begin != _end)
    {
        // ...
      
        // when operator++ is called, the filtered iterator moves to
        // the next eligible entity until the specified end iterator,
        // which is -1 for _begin. Therefore, this iterator continues
        // to the end of the entity page list. because _end refers to
        // merely the first entity in the newly inserted page, which does
        // not necessarily satisfies the filter, it got skipped by the
        // _begin iterator, causing _begin > _end so the loop will
        // never end and finally an attempt to dereference the -1 page
        // will be made, causing an access violation.
        ++_begin;
    }
}

Conclusion

The steps to reproduce the bug are:

  • Create a begin filtered iterator on the first Entity of the last Entity Page with the delimiting end iterator being the first Entity of the next page, which is INVALID_PAGE.
  • Insert a new page whose first Entity does not satisfy the filter.
  • Create an end filtered iterator pointing to the first Entity of the next page of the original last page with the same delimiting end iterator. It was supposed to be the first Entity of INVALID_PAGE. However, it now points to the first Entity in the newly inserted page.
  • Iterate over this pair of iterators. The begin iterator will go past the end iterator, finally causing an access violation.
Yukino

1 thought on “Debugging Iterator Race Condition Caused by Entity Page Insertion during Iteration

Leave a Reply

Your email address will not be published. Required fields are marked *