Engineering Game Development

Lee Winder, Technical Manager at BlitzTech on Software Engineering, Game Development and Education

Browsing Posts in FTL

I’ve wanted to blog more about our internal STL implementation for a while and one of the interesting things I wanted to cover was how we implemented the STL deque, which I think is a sometimes misunderstood and unfortunately not a very often used container.

But first, what is a deque.

A good place to start is to say it’s pronounced ‘Deck’ and is short hand for ‘double ended queue’, giving you an idea of what it’s main strengths might be. Like a vector it offers constant time operations when adding or removing from the back, but unlike the vector it also offers the same when adding or removing from the front, making it a perfect container for queue style containers.

And it’s this later ability that makes it vastly different from a vector than the interface might otherwise suggest.

One major missing set of functions is something that you should always be using by default on a vector – reserve() and capacity(). And those missing functions should indicate an underlying difference in the way memory is allocated.

An STL implemented deque usually allocates memory in pools which are able to contain N number of entries in the container. Exceed the number that can fit in a pool and another pool is created, linked to the previous one. This gives it the ability to add values to the front and back of the container, in effect creating a chunky link list structure internally.

And it’s a good idea. By enabling a chunky (or sometimes not so chunky) list structure you have less of a need to preallocate memory, inserting values leans more towards a fixed, or more predictable, model and you lose the spikes that often accompany a vector should you keep extending past the current capacity.

But it’s not perfect. And it’s primary strength can also be it’s primary weakness.

Game developers like static memory. They like to allocate memory and stick with it. They don’t like memory that is allocating left, right and centre and they don’t like to have to guess at whats been allocated and when. Consoles like static memory too, and they like memory to be in the general vicinity of memory they are already looking at.

Coming from this, game developers and consoles generally don’t like link lists (though that’s not to say we don’t use them when we need to). And if we have to use them, it’s good to explicitly use them, or even better to use an intrusive list instead (more on those another day).

But surely you can be a bit clever. Maybe setting the size of a pool to be how ever many elements you want to add, in other words trying to mimic reserve(). But then you still need to allocate more memory when you add to the front, or the back, or where-ever depending where your particular implementation of a deque drops in the first elements (and since there is nothing guiding the implementation in the standard it could be anywhere).

And some implementations (Visual Studio I am looking squarely at you) have horrific ways of specifying how big these pools should be and simply do not provide the flexibility you need when working with them.

So what did we want and how did we approach implementing our own Deque container?

The first thing we wanted was our reserve and capacity back (along with our custom resize_capacity and other related functions) because that gives programmers vital control over what’s being allocated and when. Granted you could probably get similar behaviour by using a different allocator, but we don’t want to have to use a unique allocator 99% of the time!

As a result (and it’s a good one) that leads us back to having a block of continuous memory allocated within the deque which can make traversing the container much more cache friendly and makes memory management much easier. It also removes the question of “if I remove this element will the memory layout change?”. We’re sticking with vector style memory, which dictates that memory will not be freed unless specifically requested, another feature that ties in with developers needing to be confident of what their containers are doing.

This also allows us to lead quite easily towards a fixed::deque container with the same interface and very similar behaviour which is much harder with the chunky linked list approach.

But obviously a vector has this memory layout, and doesn’t have constant time insertion at the start of the deque. So something needs to be different?

A Ring Buffer is a pretty common data structure in computer science and fits the bill perfectly. Add an element to the back or front, and simply wrap around the available heap until your buffer start/end hit each other. At that point either stop accepting new entries (as a fixed deque would) or allocate additional memory and keep adding (in our case we increase the allocated size by 50% to keep it consistent with the vector).

This allows us to have constant time insertion at the start/end but it does make our iterators a bit more complicated as they need to be aware of their heap and buffer boundaries, but that’s only a few pointers so it’s not to bad (and traversing linked lists using iterators isn’t much different so it’s not much of a trade off).

Obviously going down this route has it’s downsides. Inserting elements one after another without reserving is much slower but developers shouldn’t be doing that anyway and most importantly traversal of the deque is much improved. Benchmarks on the platform we develop for at work show an increase in traversal speed in the order of 100% on some platforms, with the worst being around 30% faster which is nothing to sniff at.

But a fundamental change like this does means that people coming to this Deque when they are intimately familiar with a std::deque will either miss the sometimes small tweaks and use it inefficiently or be thrown by what they see and be less inclined to use it. But decent documentation and mentoring can easily overcome those hurdles.

From the outset memory management and traversal were the areas we were most concerned about when looking at the ftl::deque, and while insertion is certainly slower if you don’t reserve (though no-one should be using an FTL Deque without reserving memory before hand) this is a price I’m quite happy to pay.

Title image by mdezemery

It would be a good idea to start with a bit of history. In previous posts I’ve talked about our custom STL implementation which contains a basic vector, but also a fixed::vector which takes a fixed size and allocates all memory within that container (usually on the stack) and cannot increase or decrease in size. This comes with a lot of benefits and provides game teams with a more consistent memory model when using containers.

But it comes with a lot of duplication. Those two containers are classes in their own right, each with over 100 tests (written with UnitTest++) which is a lot to maintain and (more importantly) keep consistent. A change to one usually means a change to the other and additional tests and checks need to be rolled out. I wasn’t happy about this, so moved all the vector code into a base class and the fixed and normal vectors are now types based on that, with a different allocator being the only difference (I didn’t want to go for inheritance when a templated structure could do all the work).

That was the easy bit. But now, how do you unit test an object (in this case the vector) who’s behaviour greatly depends on an internal object (in this case the different allocator models). The allocators have been unit tested, so we know their behaviour is correct, but the behaviour of the vector will depend on the allocator it is using (in some cases it will allocate more memory – so it’s capacity varies – in others it won’t allocate memory at all – so it’s ability to grow is different). And this behaviour may be obvious or it may be much more subtle.

And it’s this which causes the problem. The behaviour should be more or less the same, but in some cases, the behaviour will be different. push_back/insert may fail, reserve may reserve more memory than has been requested and construction behaviour will greatly depend on the allocator being used.

So to start – the simple and easy option. Create all the unit tests for one type and then duplicate them for each different allocator we come up against. But we went down this road to reduce the amount of duplication so there is still going to be a large number of tests that are the same and need to be maintained. In fact you could say it would be harder to maintain because the behaviour in some tests would be suitably different rather than clearly the same.

So I started to look for a more intelligent approach to this.

The first thing was go back to the basics – what was unit testing for? Such basics are often lost in the grand scheme of things, but in this case the point of the tests was to make sure the behaviour was correct, or more importantly, what was expected. There are obviously other aspects to unit tests (which are outside the scope of this post) but this is my primary concern.

When thought about it from this level the conflicting behaviour becomes quite clear – or at least it should. We can split the various methods into fixed and dependant calls, ones which will be the same regardless of the allocator and those which depend on what is being used. If we cannot split the behaviour then we have a problem with either our interface or our implementation. Luckily the allocation code is quite contained, but there were little tendrils of code that assumed things about the allocation of memory that it shouldn’t. Not a problem as the tests will easily pick these out – I hope.

So I may as well start with the easy stuff to test, the stuff that I now know is common across a vector regardless of the allocator. Iteration, removal (since the vector doesn’t de-allocate on removal), access, equality checks (vectors are equal based on their contents) – in effect anything that reads, accesses or removes content, nothing else. And since I have tests for these already this is quickly done.

This is testing against the vectors behaviour as I know this behaviour is constant and will never change.

The hard part is now a little bit easier. Looking at what I have left I can easily see what I need to test. Since the changes in behaviour are based on nothing other than the allocator, we can easily extract that and create a couple of mocks specifying specific allocator behaviour. One which allocates the memory as you request, one which allocates a different amount of memory and one which returns no memory at all.

But now I’m not testing the behaviour of the vector, I’m testing the implementation of the vector – knowing how it works internally and how that affects its external state.

At first I tried to be a bit clever with this. Each test was extracted into a small templated helper function, with the checks being done in those. That way I only had one implementation of a test and simply passed in the vector, source data and expected values. I had to make a few tweaks to the UT++ source to enable checks to be done outside the TEST(…) call, but this was pretty simple.

At first this worked well. As long as the set of tests were the same across all objects using our various mocks it was fine. The same number of expected results, the same or similar  input values and the same general behaviour. But it started to get tricky when the tests needed to deviate. For example, when testing a vector that uses a fixed size allocator, I need to test that the vector works fine on the boundaries, boundaries which the other vectors I’m testing don’t have. So we either end up with a large number of superficial tests, or much worse, I forget the extract the specific test cases for the specific method of implementation.

But since there is nothing wrong with a bit of duplication, especially when each ‘duplicated’ test is testing a specific thing I don’t need to be so clever. By having very discrete tests based on the implementation of the vector means I am free to deviate the test suite as I see fit, when I know that the behaviour of the vector will be different based on the type of allocator it is using. It would be beautiful to be able to use the same tests to test each type, but frankly that’s just not possible when you have to understand the internal behaviour of the object to test it correctly.

So the final solution – duplicate some tests but only those that you know have different behaviour based on an implementation or an expected outcome. There might be a bit more code but it produces the best results, the best output (it’s much easier to see what has failed and why) and the easiest to understand.

If the object depended on multiple internal objects (a policy based smart pointer is an excellent example) then I don’t believe this solution would work. The more variations you have, the blurrier the lines between fixed and dependant behaviour become. In those cases I probably would expect to be writing many more tests based on the final object, simply for piece of mind.

You can only go so far with this, and unit tests are designed to aid you and give you a security blanket. In an ideal world (one where TDD rules all) you don’t need to know what the internal implementation is doing, only the final outcome. But if you are spending more time figuring out your tests (let along writing them) then you need to take a step back and reassess what you are doing.

Title image by fisserman.

I recently implemented smart pointers into the FTL and was looking for a similar set to Boost’s own due to their wide spread use.  Obviously there are a few ways in which smart pointers can be implemented and I thought it would be interesting to document the reasons why I followed a particular route rather than the most obvious (and some would say simplest) method.

Due to the complexities of shared pointers (shared_ptr and weak_ptr etc.) it made sense to start with simply non-copyable pointers which would allow me to nail down the design process before moving on so I ended up with the following specification (obviously based on Boost’s specification)

simple_ptr – A non-copyable pointer with no garbage collection
simple_array – A non-copyable pointer to array with no garbage collection
scoped_ptr – A non-copyable pointer with automatic garbage collection when the pointer leaves scope
scoped_array – A non-copyable pointer to array with automatic garbage collection when the pointer leaves scope

I did originally debate the need for ‘array’ versions of these pointers instead hoping people would prefer the FTL vector or array, which would obviously be much safer than standard C arrays.  But very often people are used to working with memory and need to work at the lowest level possible.  Not giving the _array option will just limit the use of the smart pointers, which will turn people away from using them and generally lead to more unsafe code.

 

Boost Style Pointers
The initial approach was to look at ‘Boost Style’ pointers which in other words are individual, well thought-out classes build for a specific purpose.  Each class is defined and created manually, taking an individual template which specifies the type contained within the pointer.

ftl::ptr::simple_ptr<int> aSimplePointer; 
ftl::ptr::scope_ptr<int> aScopedPointer;

The good thing about this approach is that the template class is as simple as they come, generally only containing one type member and a couple of access functions.  Any other code will act as if this class wasn’t templated and is very easy to read and understand.  Also related to the simplicity of the template is that it is pretty much guaranteed to compile on any platforms that have template support (which probably covers 99.9% of mature C++ compilers and certainly the ones we work with every day).

Unfortunately its simplicity is also its main disadvantage.  Each class (unless you want to use an inheritance approach) is a self contained unit, which means there is a lot of duplication between the different classes.  Operator overloads, memory management etc. will be the same in multiple classes and while the classes can be quite small it does mean fixes need to be made more than once which can lead to copy-and-paste errors.  Unit Testing is also constantly duplicated as you have to test the same code in different classes again and again, doubling the number of changes you need to make for even a simple fix.

It also means that if someone wants to extend a pointer for a specific case they will have to manually create their pointer specification due to the base pointers containing no virtual functions and not supporting inheritance.

It was due to these problems that made me move away from this method.  The rigid structure that they bring to the table and the way projects cannot easily create their own versions lead me to think it would cause frustration when doing anything other than using them as simple pointers. 

 

Template Method Specialisation
To try and limit the amount of duplicated code I looked at how some of the functions that were different could be wrapped up in the same class.  The simplest way to do this would be to use a take on the Template Method Pattern which is designed to provide a class that has 99% the same behaviour but has some elements that needs to behave slightly different.

Using this would allow me to wrap up the _ptr and _array versions of a pointer into the same class and halving the amount of duplication needed to have multiple types.

For example
template<typename T, bool Array = false>
class simple_ptr_type
{
public:
  ~simple_ptr_type(void)
  {
    deallocate_memory<Array>();
  }
private:
  template<bool C>
  void deallocate_memory()
  {
    delete m_ptr;
  }
  template<>
  void deallocate_memory<true>()
  {
    delete [] m_ptr;
  }
};

ftl::ptr::simple_ptr_type<int> aSimplePointer;
ftl::ptr::simple_ptr_type<int, true> aSimpleArray;

This would be an elegant solution which gives the class a bit more flexibility though care would have to be taken to pass in the correct type when declaring them.  We could make this clearer by defining an enum and using this as the templated type

enum EPointerType
{
  EPointerType_Ptr,
  EPointerType_Array,
};

and taking that further, creating a typedef for each pointer type which makes it even clearer (see the next section for problems with this…)

typedef simple_ptr_type<T, EPointerType_Ptr>   simple_ptr;
typedef simple_ptr_type<T, EPointerType_Array> simple_array;

So far so good and this cut down on the main problem we hit when using the initial implementation.  The second template could be quite misleading and requires people to investigate the class to figure out exactly what it’s for but that’s not exactly a serious drawback due to the solutions I’ve already covered.

The main problem comes from the additional template type that is used to decide which method to call.  On some platforms the compiler was smart enough to only compile what was used for each templated type.  On others it compiled all the code regardless, which means that if a particular branch was illegal with a given templated type, it would still cause compiler errors even if it was never used.

Even with those problems I did see the advantages to this but also that it could lead down the following path
template<typename T, bool Array = false, bool Scoped = false, …> class simple_ptr_type

So while having the pointer customisable with various templated types could have it’s uses, it would start to become quite unmanageable (and confusing) and in the end not actually all that extendible since you could only switch it one way or the other.

 

Policy Based Ideas
It was around this time that I finally received a copy of Modern C++ Design by Andrei Alexandrescu.  In one section he discussed a policy design approach to programming where-as the templated types actually provide specific behaviour within a fixed scope.  This started to ring bells with what I was moving towards during the previous implementation where the templates would allow me to customise a ‘basic’ pointer type but also allow others to create the pointers they needed.

So I quickly moved forwards and ended up with something similar to the following

template< typename T,
          class StoragePolicy,
          class DeallocationPolicy,
          class DestructionPolicy,
          class CheckingPolicy,
          class ConversionPolicy>
class smart_pointer_noncopyable
{
};

Now obviously this is a lot of templates and a lot of work to understand – especially when you come to it cold. So it made sense to provide various objects that would come as stock polices that people could put together to create a range of smart pointers suited to their needs.  These policies would also provide the default interface that others could build from.

So as an example of how these policies can be used in the smart pointer object

const_reference operator*(void) const
{
  // This allows the user to provide
  // additional checks when dereferencing
  // and then return the pointer type as
  // a reference (you may be using multiple
  // pointer types which need to be manually
  // checked before parts of it are returned)
  CheckingPolicy::on_dereference( m_storageValue );
  return StoragePolicy::as_reference_type(m_storageValue);
}

~smart_pointer_noncopyable(void)
{
  // Here the deallocation policy can simply
  // call ‘delete’ or ‘delete []‘ or maybe does
  // something more extensive with heap pools etc.
  DeallocationPolicy::deallocate_memory( m_storageValue );
}

It was a conscious decision to make the individual polices static member functions rather than internal or inherited objects to reduce the size of the pointer.  If it was to use a component based system where the pointer contained individual objects then the size of the pointer could bloat quite quickly and it would also lead towards inheritance based solutions to other pointer types which is something I wanted to avoid.

You’ll also notice that the pointer type used in the example has the post-fix ‘_noncopyable’ which obviously leads to another type called ’smart_pointer_copyable’.  It would have been ideal to have a single pointer template and the policies themselves dictate which can be copied and which cannot (which is the whole idea after all!).  This is something I initially looked into where the pointer object’s copy constructor and =operator used polices themselves and these denied copying or asserted when they were used.  Unfortunately this meant that we couldn’t use a compile time check (we would have to wait until the operator was actually used) for the same reasons the template method couldn’t be used (some compilers compiling the whole file regardless of use). So while this still leads to a bit of code duplication between the copy and non-copy variants it still limits it to the structure rather than the actual implementation of the polices.

Obviously this is all good and well, but it would be a colossal pain if we had to manually list all the policies (even if we provided defaults it would still be difficult to change the odd policy).  Typedef’s should be the answer to this so we can manually provide the template type, but the template provides the rest of the policies.

For example (namespaces removed to make it readable on here):
template<typename T>
typedef smart_ptr<T>  simple_pointer;
template<typename T>
typedef smart_ptr<T, policy::DeleteArray>  simple_array;

Which would just be

ftl::ptr::simple_ptr<int> aSimplePointer;
ftl::ptr::simple_array<int> aSimpleArray;

Unfortunately, templated typedefs are not part of the C++ standard (but hopefully this feature will be added sometime in the future), so the only other option was to do the following…

template <typename T>
struct simple_ptr
{
  typedef ftl::ptr::smart_ptr<
                              T
                             > ptr;
};

template <typename T>
struct simple_array
{
  typedef ftl::ptr::smart_ptr<
                              T,
                              policy::DeleteArray
                             > ptr;
};

Which finally leads to

ftl::ptr::simple_ptr<int>::ptr aSimplePointer;
ftl::ptr::simple_array<int>::ptr aSimpleArray;

I really cannot take credit for that little gem and it was a relief when I found a solution otherwise this was yet another implementation that would have been so close but just not good enough. 

So finally we have a nice way of creating basic smart pointer types that allows any other user to customise their pointers to make them ideal for whatever situation they find themselves in.  While the static policy based approach can limit what individual pointers can do it still provides a wide range of extensibility without over-bloating the pointer.

Allocators, by anyone’s standards, are the most difficult and ‘tacked-on’ feature of the STL and it is because of this that different implementations have very different opinions on how they should be implemented.

Obviously we want the FTL allocator model to be as simple as possible to actively encourage people to create their own custom classes and greatly increase the efficiency of the library.

The first decision made was to use only static member functions (inline with the Sgi implementation) as opposed to an allocation object which is created and contained within the container. This was done due to concerns about adding additional objects to classes that need to be as small as possible and not contain anything that isn’t strictly necessary.

The second decision related to the actual use of the allocator. An allocator should only ever do two things, allocate memory when needed and deallocate memory when it’s finished with. There should be no re-allocation available due to worries of fragmentation and there should be no need to define additional and superficial interfaces.

Unfortunately this is not perfect.

Since a collection of containers will use the same allocator model, you cannot have an individual container that has an individual allocator without creating one per object. I did initially pass a unique container ID to the allocator which would allow people to work around this but removed it for simplicity. If this does become a problem for developers in the future then I will consider reading this feature.

Other than that, it has so far proved to be a success and people have already started creating their own allocators for efficiency and debugging reasons amongst others. Hopefully the other alterations to the STL standard will be as successful!

It makes sense to start with the simplest and most obvious addition to the standard in the FTL which are fixed length containers.

The FTL provides fixed length containers for the vector, queue, stack, bitset and array objects which allow developers to allocate all space for the container on the stack or within the allocation of another objects memory. By containing all required memory within the container itself the object will generally have more cache friendly memory access and gives developers much more control over the size of the. Within fixed containers no dynamic memory allocation takes place and a fatal warning is given if someone attempts to store values outside its scope.

There is no scope within the container for allocating or gracfully ignoring a request to add more data than the container can manage. If the container is full, this is a fatal error and the developer needs to know about this as soon as possible.

One notable omission from the fixed container set is a fixed deque. Due to the way memory is allocated for the deque (more on that in a future post), it would generally be more costly to provide a fixed amount of space for the deque when the library doesn’t know how the container will be ultimately used.

The fixed containers also provide null operations for functions like vector::reserve so a developer can swap between them at will without having to alter their code. This allows a dynamic version of the object to be used during development and then swapped to a fixed length container near the end of the project when the full range of the container is known and they are working on small optimisations.

To keep a consistent interface, the fixed array class is the implementation of the TR1 array specification and the array object outside the fixed namespace is based on an altered, dynamic implementation of that specification. The same goes for the bitset container.

Fixed length containers are declared by simply passing the size of the container along with the type.

ftl::fixed::fvector<int, 10> fixedVector;

You might be wondering why the fixed vector is called fvector rather than vector as it is within a separate name space. This is simply due to a bug in Doxygen which leads to the classes not being documented correctly. Since this is the tool we use for all documentation, this is a simple and easy work-around.

There is often a lot of debate amongst game developers about whether we should be using STL in development code or whether it can, in the long run, cause more problems than it solves. Unfortunately, ignoring the STL and dismissing it as a complicated and bloated set of containers can lead to development teams missing out on a lot of fantastic features and a lot of time saving classes that end up being written from scratch, debugged and tested every time a new member wants a container to work ‘just a little bit differently’.

The reasons for these worries are well documented and without doubt well founded. A lot of them were explained in a paper written by Paul Pedriana of EA, where he described their implementation of STL called EASTL (which I hope is read by all members of the C++ Standards Committee before the final release of the C++09 Standards Document). What was interesting about the EASTL document was that at the time I was in the middle of writing a set of internal containers based on the STL specification for many of the same reasons he stated.

While our current internal implementation (currently called the Framework Template Library or the FTL) in no way covers as much ground as the EASTL, it is slowly making ground with the most commonly used containers developed (and various other elements) while being widely used on some of our projects. Over the next few weeks, this Blog will focus on the various design decisions I made while developing an STL replacement for Blitz, while still keeping it as tight to the STL (and TR1) specification as possible.

To start off, I want to list a few of the decisions that led me to develop our own STL style implementation before going onto the differences that out implementation has compared to the majority of implementations out there.

Platform Support
Without doubt the main reason for developing an internal STL implementation was platform support. Blitz currently supports 9 (at the last count) platforms with our internal tool chain. To keep up with development needs, new platforms have to be investigated and preliminary development started as soon as the tools become available. Initially these new platforms might not support even templates, let alone have a fully featured and stable STL implementation. Having our own internal implementation at least gives us some protection against this.

Consistency
One of the most vocal opponents to the internal implementation voiced a valid argument – “If you are developing your own implementation, why limit yourself to a fixed interface”. My counter argument was simple and one I stick with today. The STL is widely used by both game developers and other industry programmers, so why not use something that a new developer to the company will instantly feel comfortable with. Also, with the fantastic amount of tutorials out there on STL, if we stick with the STL specification people will be able to the find the answers they need without additional documentation which I would inevitable have to write.

Memory Allocation
I have yet to talk to someone who has enjoyed writing an STL Allocator replacement. It is quite obvious that allocators were tagged onto the end of the specification when it was realised that not everyone would want the default memory model. Since Blitz has an extensive collection of memory allocation and debugging tools, we needed containers that took advantage of this while making it easy for developers to alter them slightly without having to spend a few days figuring out some of the nuances of Allocators.

Readability
Have you seen some of the STL implementations? Some of the code could make even the most hardened C++ developer weep, let alone allow us mere mortals to understand what is going on under the hood. You might say that you don’t need to know what’s happening when you push_back a vector element, but I am always wary of tools that do things I don’t fully understand (you could say that’s my problem, but then I’m not the only one).

Feature Bloat
STL contains a wide range of container classes, tools and objects that can be used quite easily. Unfortunately, as mentioned a few of them are simply not suited for game development. It becomes even more complicated when what you think is a simple container actually uses another STL class that is wholly inappropriate for game development. By only implementing the classes that are deemed ’suitable’ this problem can easily be removed.

So for the next few weeks I am going to concentrate on the main differences between the FTL and the STL in both implementation details and content. Hopefully most of them will make sense!