
A Case for a Custom Array

By Amit Karim, DeepMotion’s Lead Engine Programmer

At DeepMotion we’re interested in the boundary where intelligent physics simulation meets animation. Our product line includes a revolutionary articulated physics solver and machine learning-based character controllers that synthesize motion skills in real time, learned from animation or motion capture data sets. Performance optimization across our entire stack, from C++ to C#, is essential: every superfluous operation adds latency to our product. To accomplish our goal of imbuing digital characters with physical intelligence, we require a few things from our core infrastructure:

  1. It has to be fast
  2. It has to be memory efficient
  3. It has to be platform independent

The following four-part series will detail my process for constructing our core framework. Using new language features from C++11 and C++14, I’ll make a case for a custom array, share how we built our Reflection System and Property Descriptors, and cover the methods used. While you may not be building a next-generation tool for character simulation, we hope this series will help you develop a leaner and more efficient foundation for your own products. Some of the work detailed here is inspired by some extremely clever people and some truly excellent code I have had the privilege to work with in the past. (Anyone who ever worked at Ubisoft Montreal on Assassin’s Creed or Rainbow6 may recognize the inspiration for the next post.) An example implementation of a custom array can be found on GitHub.

The Case for Replacing std::vector

While the case for replacing the standard std::vector is nothing new, language features of C++11 and C++14 have made custom containers even easier to implement.

The case against std::vector:

  • size_t for size
    • On x64, size_t is 64 bits, which is rather wasteful; your container should ideally be as compact as possible.
    • 2^64 (18,446,744,073,709,551,616) elements is a few too many. (If you hit this boundary you may have other problems to consider.)
    • Not platform agnostic. On x86, size_t is 32 bits. This causes all kinds of problems when attempting any communication across platform boundaries (network messages, serialized data, etc.).
  • No concept of a stack-allocated vector

Here’s what a custom implementation might look like:

template<typename CountType, typename ObjectType, typename Allocator = std::allocator<ObjectType>>
class BaseArray
{
public:
   // All the usual methods
   BaseArray();
   ~BaseArray();
   BaseArray(CountType size);
   BaseArray(const BaseArray&);
   BaseArray(BaseArray&&);
   BaseArray& operator=(const BaseArray&);
   BaseArray& operator=(BaseArray&&);
   void PushBack(const ObjectType&);
   void PushBack(ObjectType&&);
   // etc.
private:
   void Reallocate(CountType newSize);
   CountType m_Size;
   CountType m_Capacity;
   ObjectType* m_Data;
};

We template our BaseArray on the count type used to maintain our current size and capacity, the element type ObjectType we expect to contain, and the allocator to use for our internal memory allocation.

One of the core methods of your custom Array would be Reallocate(). This is the act of dynamically growing the array to accommodate new entries.

Here’s an example of what a vanilla version of Reallocate() would look like:

void Reallocate(CountType newCapacity)
{
   if (newCapacity <= m_Capacity)
      return;
   auto newData = Allocator::Allocate(newCapacity);

   if (m_Data != nullptr)
   {
      // move all existing items to the new array
      for (CountType i = 0; i < m_Size; ++i)
      {
         new (&newData[i]) ObjectType(std::move(m_Data[i]));
      }
      // call all our destructors in the original array and clean up
      for (CountType i = 0; i < m_Size; ++i)
      {
         m_Data[i].~ObjectType();
      }
      Allocator::Free(m_Data);
   }
   m_Data = newData;
   m_Capacity = newCapacity;
}

We perform a per-element move constructor initialization of each element to the new memory address, followed by a per-element destructor call to safely destroy each element in the old buffer before freeing the memory.

We can use SFINAE to specialize this method on a type trait of T. In this case whether T can be considered to be POD (“Plain Old Data”):

template<typename U = ObjectType>
typename std::enable_if<std::is_same<U, ObjectType>::value && std::is_pod<U>::value>::type
    Reallocate(CountType newCapacity)
{
   if (newCapacity <= m_Capacity)
      return;
   auto newData = Allocator::Allocate(newCapacity);
   if (m_Data != nullptr)
   {
      // a raw memcpy works just fine for relocating the data
      memcpy(newData, m_Data, m_Size * sizeof(ObjectType));
      // we don't need to manually destruct each element
      Allocator::Free(m_Data);
   }
   m_Data = newData;
   m_Capacity = newCapacity;
}

template<typename U = ObjectType>
typename std::enable_if<std::is_same<U, ObjectType>::value && !std::is_pod<U>::value>::type
    Reallocate(CountType newCapacity)
{
   // our original Reallocate definition
}

We can perform these optimizations where needed across all our methods such as when we copy our BaseArray.

It’s worth noting that POD is a catch-all for saying that this object is trivially constructible, trivially destructible, trivially copyable and trivially assignable. In our case there are actually two optimizations:

  1. Replacing the per-element move with a simple memcpy (trivially copy constructible)
  2. Skipping the per-element destructor calls on the old data (trivially destructible)

You could instead write four variations of Reallocate:

template<typename U = ObjectType>
typename std::enable_if<std::is_same<U, ObjectType>::value && std::is_trivially_copy_constructible<U>::value && std::is_trivially_destructible<U>::value>::type
    Reallocate(CountType newCapacity); // Contains both optimizations

template<typename U = ObjectType>
typename std::enable_if<std::is_same<U, ObjectType>::value && std::is_trivially_copy_constructible<U>::value && !std::is_trivially_destructible<U>::value>::type
    Reallocate(CountType newCapacity); // Can do a memcpy, but still need per member destructors

template<typename U = ObjectType>
typename std::enable_if<std::is_same<U, ObjectType>::value && !std::is_trivially_copy_constructible<U>::value && std::is_trivially_destructible<U>::value>::type
    Reallocate(CountType newCapacity); // Per member copy but don't need to call destructors

template<typename U = ObjectType>
typename std::enable_if<std::is_same<U, ObjectType>::value && !std::is_trivially_copy_constructible<U>::value && !std::is_trivially_destructible<U>::value>::type
    Reallocate(CountType newCapacity); // Generic implementation with no optimizations

We now have an array that is fast and customizable. We’ve eliminated the need for size_t, replacing it with whatever we want.

We’ll run into problems, though, when we try to do something like:

BaseArray<std::unique_ptr<int>, uint16_t> myArray;

Your compiler will complain that you can’t actually create copies of unique_ptr.

We need to prevent the array from being copyable if its contents aren’t copyable. There’s a trick for doing this. We begin by defining our BaseArray with an additional defaulted template parameter IsCopyable:

template<typename C, typename O, typename A, bool IsCopyable = std::is_copy_constructible<O>::value>
class BaseArray
{
public:
    // specifically we have all our copy/move assignment and copy/move construction
    BaseArray(const BaseArray&);
    BaseArray& operator=(const BaseArray&);
    BaseArray(BaseArray&&);
    BaseArray& operator=(BaseArray&&);
    // the rest of our usual methods
    // ...
    // ...
};

We can then partially specialize our BaseArray for the case where IsCopyable == false (BaseArray<C, O, A, false>) and make it a derivation of BaseArray<C, O, A, true>, but explicitly delete the copy constructor and copy assignment operator:

template<typename C, typename O, typename A>
class BaseArray<C, O, A, false> : public BaseArray<C, O, A, true>
{
public:
    // Mark these explicitly default (will call base implementations)
    BaseArray() = default;
    BaseArray(BaseArray&&) = default;
    BaseArray& operator=(BaseArray&&) = default;

    // delete the copy assignment + construction
    BaseArray(const BaseArray&) = delete;
    BaseArray& operator=(const BaseArray&) = delete;
};

Now we can hide most of the template ugliness:

template<typename ObjectType, typename Allocator = DefaultAllocatorT<ObjectType>>
using Array = BaseArray<uint16_t, ObjectType, Allocator>;

template<typename ObjectType, typename Allocator = DefaultAllocatorT<ObjectType>>
using BigArray = BaseArray<uint32_t, ObjectType, Allocator>;

And can now declare our Array as follows:

Array<std::unique_ptr<int>> myArray;
BigArray<std::unique_ptr<int>> myArrayBig;

We’re basically at parity with std::vector at this point, barring some additional operators and methods, iterator support, etc. But before we’re done we’re going to add a few more bit fields to our array:

union
{
    uint8_t m_Flags;
    struct
    {
        uint8_t m_OwnsData : 1;
        // couple of other flags we don't care about
    };
};

This takes up a little more memory but allows us to do something pretty interesting.

The concept of m_OwnsData means we can initialize an array with data it doesn’t own: specifically, data that it does not have to delete. Data from the stack, for example.

So we add one more constructor (the one place that m_OwnsData is initialized to false):

BaseArray(ObjectType* data, CountType numElements, CountType maxCapacity, bool ownsData);

And we need one more check in our Reallocate:

template<typename U = ObjectType>
typename std::enable_if<std::is_same<U, ObjectType>::value && std::is_pod<U>::value>::type
    Reallocate(CountType newCapacity)
{
    if (newCapacity <= m_Capacity)
        return;
    auto newData = Allocator::Allocate(newCapacity);
    if (m_Data != nullptr)
    {
        memcpy(newData, m_Data, m_Size * sizeof(ObjectType));
        if (m_OwnsData) // only free data we own
            Allocator::Free(m_Data);
    }
    m_Data = newData;
    m_OwnsData = 1;
    m_Capacity = newCapacity;
}

We can now declare our InplaceArray<>:

template<typename ObjectType, uint16_t FIXED_SIZE, typename Allocator = DefaultAllocatorT<ObjectType>>
class InplaceArray : public BaseArray<uint16_t, ObjectType, Allocator>
{
    typedef BaseArray<uint16_t, ObjectType, Allocator> super;
public:
    InplaceArray() : super(reinterpret_cast<ObjectType*>(&m_FixedBuffer), 0, FIXED_SIZE, false) {}
    InplaceArray(const InplaceArray&) = default;
    InplaceArray(InplaceArray&& other) = default;
    ~InplaceArray() = default;
    InplaceArray(const BaseArray<uint16_t, ObjectType, Allocator>& other)
        : super(other)
    {
    }
    InplaceArray& operator=(const InplaceArray&) = default;
    InplaceArray& operator=(InplaceArray&& other) = default;
private:
    typename std::aligned_storage<sizeof(ObjectType)*FIXED_SIZE, alignof(ObjectType)>::type m_FixedBuffer;
};

An advantage to this approach is that InplaceArray is an Array, which means we can pass it to any method that expects a parameter of type Array transparently. This can significantly cut down on unnecessary allocations during function calls that populate or return you an array:

// usually returns around 5 items but could return 10,000!
void GetItems(std::vector<Item>& itemsOut); // how you wrote it in C++03
std::vector<Item> GetItems();               // how you wrote it in C++11
void GetItems(Array<Item>& itemsOut);       // and we’re back!

int main()
{
    InplaceArray<Item, 10> items;           
    GetItems(items);                        // zero allocation most of the time
}

The entire definition of Array + InplaceArray, including using directives, specializations, and 90% of use-case methods, is under 600 lines of code. The class is extremely testable, and you could write a set of tests in Catch that get you to near 100% code coverage quite easily. The upside is something that can potentially give you a significant performance gain, greater platform stability, a smaller memory footprint, and better cache coherency.

I hope some of you will find these techniques useful as you build your own projects and classes.


DeepMotion is working on core technology to transform traditional animation into intelligent simulation. Through articulated physics and machine learning, we help developers build lifelike, interactive, virtual characters and machinery. Many game industry veterans remember the days when NaturalMotion procedural animation used in Grand Theft Auto was a breakthrough from IK-based animation; we are using deep reinforcement learning to do even more than was possible before. We are creating cost-effective solutions beyond keyframe animation, motion capture, and inverse kinematics to build a next-gen motion intelligence for engineers working in VR, AR, robotics, machine learning, gaming, animation, and film. Interested in the future of interactive virtual actors? Learn more here or sign up for our newsletter.