Engineering Game Development

Lee Winder, Technical Manager at BlitzTech on Software Engineering, Game Development and Education

Browsing Posts in Programming

I’ve wanted to blog more about our internal STL implementation for a while and one of the interesting things I wanted to cover was how we implemented the STL deque, which I think is a sometimes misunderstood and unfortunately not a very often used container.

But first, what is a deque.

A good place to start is to say it’s pronounced ‘Deck’ and is short hand for ‘double ended queue’, giving you an idea of what it’s main strengths might be. Like a vector it offers constant time operations when adding or removing from the back, but unlike the vector it also offers the same when adding or removing from the front, making it a perfect container for queue style containers.

And it’s this later ability that makes it vastly different from a vector than the interface might otherwise suggest.

One major missing set of functions is something that you should always be using by default on a vector – reserve() and capacity(). And those missing functions should indicate an underlying difference in the way memory is allocated.

An STL implemented deque usually allocates memory in pools which are able to contain N number of entries in the container. Exceed the number that can fit in a pool and another pool is created, linked to the previous one. This gives it the ability to add values to the front and back of the container, in effect creating a chunky link list structure internally.

And it’s a good idea. By enabling a chunky (or sometimes not so chunky) list structure you have less of a need to preallocate memory, inserting values leans more towards a fixed, or more predictable, model and you lose the spikes that often accompany a vector should you keep extending past the current capacity.

But it’s not perfect. And it’s primary strength can also be it’s primary weakness.

Game developers like static memory. They like to allocate memory and stick with it. They don’t like memory that is allocating left, right and centre and they don’t like to have to guess at whats been allocated and when. Consoles like static memory too, and they like memory to be in the general vicinity of memory they are already looking at.

Coming from this, game developers and consoles generally don’t like link lists (though that’s not to say we don’t use them when we need to). And if we have to use them, it’s good to explicitly use them, or even better to use an intrusive list instead (more on those another day).

But surely you can be a bit clever. Maybe setting the size of a pool to be how ever many elements you want to add, in other words trying to mimic reserve(). But then you still need to allocate more memory when you add to the front, or the back, or where-ever depending where your particular implementation of a deque drops in the first elements (and since there is nothing guiding the implementation in the standard it could be anywhere).

And some implementations (Visual Studio I am looking squarely at you) have horrific ways of specifying how big these pools should be and simply do not provide the flexibility you need when working with them.

So what did we want and how did we approach implementing our own Deque container?

The first thing we wanted was our reserve and capacity back (along with our custom resize_capacity and other related functions) because that gives programmers vital control over what’s being allocated and when. Granted you could probably get similar behaviour by using a different allocator, but we don’t want to have to use a unique allocator 99% of the time!

As a result (and it’s a good one) that leads us back to having a block of continuous memory allocated within the deque which can make traversing the container much more cache friendly and makes memory management much easier. It also removes the question of “if I remove this element will the memory layout change?”. We’re sticking with vector style memory, which dictates that memory will not be freed unless specifically requested, another feature that ties in with developers needing to be confident of what their containers are doing.

This also allows us to lead quite easily towards a fixed::deque container with the same interface and very similar behaviour which is much harder with the chunky linked list approach.

But obviously a vector has this memory layout, and doesn’t have constant time insertion at the start of the deque. So something needs to be different?

A Ring Buffer is a pretty common data structure in computer science and fits the bill perfectly. Add an element to the back or front, and simply wrap around the available heap until your buffer start/end hit each other. At that point either stop accepting new entries (as a fixed deque would) or allocate additional memory and keep adding (in our case we increase the allocated size by 50% to keep it consistent with the vector).

This allows us to have constant time insertion at the start/end but it does make our iterators a bit more complicated as they need to be aware of their heap and buffer boundaries, but that’s only a few pointers so it’s not to bad (and traversing linked lists using iterators isn’t much different so it’s not much of a trade off).

Obviously going down this route has it’s downsides. Inserting elements one after another without reserving is much slower but developers shouldn’t be doing that anyway and most importantly traversal of the deque is much improved. Benchmarks on the platform we develop for at work show an increase in traversal speed in the order of 100% on some platforms, with the worst being around 30% faster which is nothing to sniff at.

But a fundamental change like this does means that people coming to this Deque when they are intimately familiar with a std::deque will either miss the sometimes small tweaks and use it inefficiently or be thrown by what they see and be less inclined to use it. But decent documentation and mentoring can easily overcome those hurdles.

From the outset memory management and traversal were the areas we were most concerned about when looking at the ftl::deque, and while insertion is certainly slower if you don’t reserve (though no-one should be using an FTL Deque without reserving memory before hand) this is a price I’m quite happy to pay.

Title image by mdezemery

One of the most difficult things when it comes to unit testing is cutting down on the number of duplicated tests. In the past I’ve often developed objects which have very different implementations but externally produce very similar results (for example when developing a vector compared to a fixed vector, or implementing different types of iterators), and one thing I don’t want to do is have a set of tests which are almost identical to another set of tests I wrote last week.

One method I’ve used is to write a common set of tests independent of a specific type and had the type defined in another file, along with other flags which are used to indicate which tests should or shouldn’t be run.

So for example, when developing a ring buffer style iterator, it was useful to know that the const and non-const versions produced the same results in a lot of different situations without having to have comparison tests or duplicating a large amount of test code.

The following examples are written using UnitTest++ but will work with a number of different unit testing Frameworks

// In the file RingBufferIterator_Tests.inl

// Tests are defined in a separate file assuming certain
// settings have been defined
TEST(Blah)
{
RINGBUFFER_ITERATOR_TYPE myItor;

//...  Doing tests with this type
}

Now we simply need to define the type of iterator and indicate if some tests should or shouldn’t be run

// In the file RingBufferIterator_TestTypes.cpp

SUITE(RingBufferItor)
{
// Define the type of iterator we want to test
#define RINGBUFFER_ITERATOR_TYPE ftl::itor::ringbuffer

// Define a couple of settings which the tests file uses to make
// sure some type specific tests are run or excluded

// This one simply indicates that the tests checking the ability
// to alter the content of the iterator will be run
#define RINGERBUFFER_ITERATOR_NONCONST_CONTENT

// Now we simply need to include the file used to
// implement all the unit tests for this kind of iterator
#include "RingBufferIterator_Tests.inl"
}

We do the same for the const version of the iterator we are testing

// In the file RingBufferConstIterator_TestTypes.cpp

SUITE(RingBufferConstItor)
{
// Define the type of iterator we want to test
#define RINGBUFFER_ITERATOR_TYPE ftl::itor::ringbuffer_const

// In this case we don't have any additional settings so we
// simply include the test files and let the tests run
#include "RingBufferIterator_Tests.inl"
}

You can take this as far as you want but there is obviously going to be a point where the number of flags you’re defining starts to make the tests hard to read or unmanageable. Limiting it to about two, where you can easily group the tests within these defines generally works quite well.

You don’t need to limit it to a single file. For example, some types might be similar enough to share 50% of the tests but not the rest. Splitting up the common tests is easy enough.

// In the file VectorContainer_TestTypes.cpp

SUITE(Vector)
{
// Define our type
#define VECTOR_TYPE ftl::vector

// Include our common tests
#include "Vector_CommonTests.inl"

// Include only the tests for this type
#include "Vector_DynamicTests.inl"
}

When tests are not written like this there is a a big gap in the ability to test if comparable types behave the same (fixed and non-fixed containers for example) especially when simple things like human error can get in the way of creating duplicate behaviour tests. By changing over to this style of testing we can automatically test the behaviour of similar types without creating additional work.

It does have it’s draw backs, the main one being that you can get multiple test failures (if a test is used by >1 type and they both fail at the same time) or worse you get one fail and you’re not sure which type caused the problem. An easy solution to this would be to allow the failure message to have a option component, allowing you to identify the type in the message, but this isn’t supported by any test framework that I know of.

Occasionally my compilers dependancy checker doesn’t cope very well, compiling only one of the type files rather than all of them if a test changes, but this has been remarkably rare and these draw backs don’t overshadow the ability to confirm that your types are functionally the same even if their implementation is vastly different.

Unity builds. I don’t like them.

Of all the tools at your disposal to make a build faster, this is the worst. And it’s not just the “hey let’s #include .cpp files” weirdness, but the way that it can make a well structure, modular code base become worse than spaghetti junction at rush hour, and the worse thing is that it’s not the fault of the programmer, especially when something as exceptional as Visual Assist can start helping you create non-modular code because of the way Unity Builds collect the code you write.

What Is a Unity Build?
Unity Builds have nothing to do with the excellent Unity Tool Set. I’ll just clear that up right off the bat.

These Unity Builds are a way of reducing build over-head (specifically opening and closing files and reducing link times by reducing the number of object files generated) and as such are used to drastically speed up build times. I say drastically because they are usually used after a code base has starting generating build times that totally cut down on a programmer’s productivity. This might be 10 minutes, 30 minutes or even hours.

Because Unity Builds give you a quick gain, it’s seen as a pretty magical solution, and when a full build is reduced from 30 to 3 minutes you can see why. But the caveat in that statement? Full builds.

I won’t go through how to generate Unity Builds here as the blog post The Magic of Unity Builds does a great job so have a look there. It’s interesting that I’m linking to a blog post that is in the total opposite direction that I’m coming from, but unless you know both sides of a tale, you don’t know the tale.

So without any more delay, why don’t I like them?

Say Goodbye to Local Properties
Well written code is modular, and modular code relies on a few assumptions. One of those being that a single translation unit (a cpp file and any associated include files) is isolated from other translation units, so (unless you extern the contents) anything defined within a cpp file is not available outside, and as a consequence you can have local objects being called the same things without conflict.

Take for example the following simple code (easily expanded to a real world example I’m sure)

// In VectorTest.cpp
namespace
{
   const uint StandardTestCount = 10;
}
// In ListTest.cpp
namespace
{
   const uint StandardTestCount = 3;
}

In a normal compilation environment these variables are local to the file they are working in, and this is a good thing. Why should the vector tests file care what is defined within another test file?

But if you have a Unity Build with the following then you’re going to have issues…

#include "VectorTest.cpp"
#include "ListTest.cpp"
#include "ArrayTest.cpp"
#include "QueueTest.cpp"

Specifically errors relating to the variable ::StandardTestCount already being defined. So we now have to be aware of what is defined throughout the entire project and resolve any issues, and inevitable this will end up with all variables being pre-appended with (in this example) VectorTest_ and ListTest_ etc.

I don’t want to refer to the ‘Magic’ post to much, as it’s a good article, but there is a statement in there referring directly to this. Specifically the following

“Technically you shouldn’t have this problem if you’re writing “proper” code”

This is wrong. “Proper” code is well structured and well maintained, meaning it’s modular and only refers to what it needs to refer to. In a lot of cases you will have variables with the same name, functions with the same name and other elements that you naturally write as you go along. That’s why the anonymous namespace (or the C style ’static’) is so useful, and so useless if you switch to Unity Builds.

Using Is (even more) Lethal
Never use the ‘using’  keyword in a header file. It’s a standard statement and one that stands up easily. The reasons behind this are pretty clear (declaring ‘using’ in a header file forces all code including the header file to use that statement – in effect wiping out a namespace).

But using it in a .cpp file isn’t a crime. In fact it’s remarkably useful, even within the whole files scope. As a developer, I should know what I’m using in a local file, what elements I’m bringing into the file and what would/could conflict. Some people might not agree that ‘using’ at the top of a cpp file is a good idea at all, and I see their point, but on a whole file scope, it can be massively useful and as it’s localised it’s not going to pollute anything other than the file it’s in.

But in Unity Builds, using (that is not scoped within a function or type etc.) is lethal. Declaring ‘using ‘ in a cpp file makes it available throughout the code base (or more confusingly only in those files that are included after the one declaring it). Suddenly, using is everywhere. And your namespaces are now useless.

Every Change is a Rebuild
This is one of my biggest problems with Unity Builds. In well structured projects (more on that below) changing the content of a cpp file (or even a local header file) shouldn’t require everything to be rebuilt. Every. Single. Time. Yes, the unity build rebuilds are faster than doing a full build without a unity file, but doing even a fast rebuild every time will build up. Quickly.

When-ever I change a .cpp file, all I need to build is that .cpp file. Granted, link times are a tad long because it still has to re-link the other object files, but it takes a second to compile that one file. On average (I took count today) when I changed a header file it compiled on average 5 .cpp files. And it (again on average) took about 5 seconds to build.

Very rarely should I be required to re-build the entire project, and most of the time I’m don’t. And that saves me a lot of time. Every single day.

Multiple Configurations
This is mainly a bugbear of mine rather than a direct issue with Unity Builds, but I see it in nearly every Unity Build project I use. The main project is the Unity Build project, but there is another project that is built and maintained that doesn’t use Unity Builds. Now there is a point here – by having an additional ‘normal’ project you are forcing the modularity that can collapse with Unity Builds to be checked (usually a Continuous Build server will be building this as well as the Unity Build every time).

But we have problems with this.

Firstly, the non-unity build is only ever being built on the CB server. So any problems are going to break the build, and it will break if people are not using it day-to-day. Secondly you now have multiple projects to maintain. Not too much of a problem if you have an automated project generation step (see below) but it is still another project to maintain.

People may occasionally have to use the non-unity configurations, especially if they are getting an unknown error on the CB server. So now they are left working on a configuration that is uncared for and probably builds so slowly and erratically that they are probably losing all the time they saved from those quick unity builds they have been doing all day.

But What About My Build Times Then?
Well structured software builds quickly (or as quickly as they can anyway). But what is a well structured project when it comes to build times?

  • Sensible file inclusion – Only include the files you need and try as best you can to limit them to .cpp files. This means the compiler only needs to bring in what’s necessary and when you change one of these header files only those things that directly rely on it will change. If you find yourself having to include files that constantly change into a large number of cpp files, then I’d wager that a refactoring of the large header file would be in order. You should be building only a small number of cpp files when you change the content of a header file.
  • Forward Declarations – Where possible use forward declarations rather than including the header file in your class or function declaration. Annoyingly you cannot forward declare enums (on standard compliant compliers anyway) which sometimes throws this out of the window. But by not including the files until use, you’re cutting down on the amount of code being included and the number of files being opened.
  • Pre-compiled Headers – Use Pre-compiled Headers (PCH). Using PCH’s is the one (built into the IDE) feature that will cause your build times to plummet, especially if you are being sensible with them (such as including files that don’t change – SDK’s and common, fixed headers for example). Using pre-compiled headers across multiple platforms can be a pain, but it is possible to unify them and get a massive boost. I’ll cover these a little bit more below.
  • Library Files – Modular code usually leads towards easily extracting common code into a collection of libs. Reducing the amount of code in a project (and as a result how much you need to build) can speed up your productivity quickly.
  • Parallel Building – If you’re IDE supports it (and it might do and you just don’t know) and you have a multi-core machine, turn on parallel building of files. Building individual files at the same time is obviously going to cut down on your compile times no matter how quick they are at the moment.
  • Get a Better PC – It goes without saying (but I will anyway) doing this will speed everything up.

Pre-Compiled Headers
Pre-compiled headers are one of the best ways to get a massive boost to your compile times. So how fast are my build times compared to that of a Unity Build version when using PCH’s and taking into account the other build improvements suggestions?

As stated above the ‘average’ build time of a minimal build throughout the day was around 5 seconds. On a Unity Build it was a full build every time and was around 2 minutes on the slowest platform.

Changes to ‘core’ files, which are included by a higher than average number of files, resulted in builds of around 30 seconds on around 20 files. Again on a Unity Build this would have been around 2 minutes regardless.

Full rebuild (which I did twice today) was around 3 minutes. Granted a whole minute slower than a Unity Build but I did it twice rather than every single time.

Pre-compiled headers are not a silver bullet solution. Nothing is. And because of this here are issues that you do need to be aware of

  • Compiler Support – Some compilers out there simply do not support PCH’s. On a daily basis I use between 4 or 5 different compilers and while I’ve never encountered one that doesn’t, they are out there. This can shoot my argument in the foot, but it is a rare problem and one most people won’t encounter.
  • PCH Quirks – While I use a number of compilers that do support PCH’s, every single one of them has a slightly different way of building them and slightly different requirements before they can be used. This doesn’t affect the code that is written but does affect the content of your project, especially if you want to make them seem as consistent as possible.
  • Over-Inclusion – Because your PCH usually includes common files and files that rarely change, it does mean that some files are being brought into the project in a compilation unit that wouldn’t otherwise be required

Unity builds are a solution for the wrong problem. Long compile times are not caused by not using Unity Builds, they are the result of having badly structured and badly referenced code. Fix that (and have better code to boot) and you’ll be able to use ‘proper’ build tools without resorting to a quick fix.

Making Unity Builds Better
I don’t want to just leave it at that, because no matter how much I argue, Unity Builds will always exist and people will always use them (after all it’s quicker to rescue a broken down build by making it a Unity Build than doing it the hard way). So what can people do to actually make Unity Builds a bit more desirable and less of a problematic fix?

  • Automate Adding Files – A lot of teams have auto project generation code already (XML files, template projects etc.) so it’s important that you automate adding files to the project, otherwise people will forget to remove the file from the build step and they will forget to add it to the unity build file.
  • Multiple Unity Files – Running a Unity Build doesn’t require you to have one unity file with every cpp file inside it. You can have multiple build files (usually per module or per component) which means at least some of the translation unit leaking is limited to each module rather than the whole program.
  • Additional Project – No, this isn’t a contradiction from the above “Multiple Configurations” comment above. In this situation you will have a project that contains the cpp and header files so you can reference them but this project isn’t built. Instead, you have the ‘proper’ project simply contain the unity file(s). This isn’t something I’ve personally tried, but it does get around the issues of adding files if you don’t have an automated step.

SAFE_DELETE and SAFE_FREE. What do those functions (macros?) say to you? What if you also have over-ridden calls to DELETE and FREE too?

I think most people see a safe delete function as one that handles a null pointer safely – only calling delete or free when the ptr isn’t null. Since this is totally irrelevant (‘delete null’ is a no-op as defined by the standard and free(null) seems to be just as safe) there’s nothing special or even needed here.

So the ’safe’ part must be something else?

What about the usual delete/free over-ride behaviour of setting the pointer to the free’d memory to null itself. Well, that’s just common sense. No-one wants dangling pointers left lying around as that way only leads to memory over-rides and random crashes (usually in the submission candidate build 10 minutes before it needs to be uploaded). So if that’s the reason, then to be honest the macros should be called SENSIBLE_FREE or REASONABLE_DELETE. But then that’s just daft.

I suppose you might have a situation where setting it to NULL isn’t really necessary.

delete myPointer;
/* myPointer = NULL */ // Don't need this now

myPointer = new SomeClass;

But seriously, is having an extra line setting ‘myPointer’ to NULL a problem in the grand scheme of things? Any half decent compiler would probably strip it out anyway, and besides you’re freeing and allocating memory – an assignment isn’t going to mean squat in the middle of that.

Thinking some more, you might also have a const pointer, that cannot be set to null. This, I suppose, is an issue, but it’s an interesting one, that gets into the realms of what’s a const pointer (or a pointer to const data) represents to the user? I don’t want to cover that here, as it could take pages and pages (and generate lots and lots of comment), but it’s a case that is rare and isn’t something that should be seen as ‘the norm’.

Another possibility is that these ’safe’ functions are doing a little extra. Maybe in debug, the heap code is checking that this pointer was actually allocated in the same heap you are trying to free it from, or if indeed it was allocated at all. Safe in this context seems to be more like ‘close your eyes and hope for the best’ kind of safe. It’s all good and well being safe in debug, but I’m guessing code like this isn’t run in master builds, so you’re left with finding problems in master, when it becomes so much harder to track down issues of any kind. It’s much better to have these problems flagged at the earliest opportunity, so doing these kinds of checks in a vanilla over-ride makes more sense than doing it in a ’safe’ version.

So really, there is nothing ’safe’ about functions like these, nothing they do that wouldn’t be carried out by a programmer in their day to day work. So why do so many libraries have different versions of an over-ridden function, ones which are ’safe’ and one’s which, apparently, are not.

So just bite the bullet and reduce the number of functions that a programmer can call to delete some memory. Have one version of FREE and one version of DELETE (or SDK_FREE and SDK_DELETE to make the distinction obvious) that hooks into your memory manager, making NULL a no-op and making the pointer in question a NULL pointer when it’s done. Reducing the number of available functions makes even more sense if the function is actually a macro, as with no scoping available, polluting the namespace is something you really want to avoid.

And this isn’t just about FREE or DELETE. Reduce your macros, reduce your function calls and make your code easier to use, easier to understand and easier to test as a result.

C++ can have some interesting behaviour, sometimes expected and sometimes not so. It’s one of the reasons I enjoy using the language (maybe I like punishment) but it’s one of the reasons many people don’t.

One of the often overlooked aspects of C++ is the way classes can be easily, and incorrectly, copied without you knowing about it, due to the way classes will have copy and creation methods automatically generated for you if you don’t specify them yourself.

This post intends to look at how you can avoid this by making sure you are explicit when creating your C++ classes, through the use of copy constructors and assignment operators. To look at how these can be used, I’ll go through a possibly contrived example, which will show exactly where these members come into play and how not knowing how they work can generate behaviour you don’t expect, or don’t want.

A Vector Implementation
To show how and when these C++ features can be a help or a hindrance I’m going to use a simple example, creating an implementation of a basic vector container. While you might not do this yourself, it will easily show how we can use various member functions to make the object behave exactly how we want it to.

So to start with the basics, this is what our implementation of a vector (and probably most of you basic classes) looks like – and for the sake of simplicity, it won’t be getting more complicated either.

class MyVector
{
public:
  MyVector();
  ~MyVector();

private:
  int       m_entryCount;
  void*     m_entries;
};

So we have the most basic implementation here, a pointer to the vector elements which references dynamic heap memory (this is the member that’s going to be causing us problems) and an integer indicating how many elements the vector currently contains.

As the implementation continues, the vector gains member functions to add elements, return references to them, remove them, increase or decrease the capacity and so on.

Copying The Vector
But as people continue to use the container, they become more comfortable with them until eventually they do the following

MyVector masterVector;
MyVector localCopy;

// ... Add plenty of things to masterVector

// Copy a vector into our local copy so we
// have something to play with
localCopy = masterVector;

This all seems innocent enough – we are copying the one vector into another so we can use the contents and, we assume, modify them without affecting the original. After all, this is how built in types work.

But because our basic vector implementation hasn’t defined an assignment operator it’s not going to do what we expect.

But if we haven’t defined one, surely the compiler will generate a couple of errors because it doesn’t know what to do? Unfortunately in this case, the compiler tries to help you out, by creating a default assignment operator, which does nothing more than a shallow copy. And all a shallow copy does is copy the basic types, nothing more, nothing less.

In effect it does nothing more the following…

localCopy.m_entryCount = masterVector.m_entryCount;
localCopy.m_entries = masterVector.m_entries;

So when we do this, we end up with both m_entries pointers pointing to the exact same area of memory. And this is not a good thing. Especially if the user continues and does the following…

// Copy the vector into a local copy
// so we can alter it
localCopy = masterVector;

// Since the user would assume that
// they have an independent vector,
// they do what they want with it

// Lets clear the contents - which deletes
// all the allocated memory
localCopy.clear();

Since we have two objects pointing to the same area of memory, and one of them just called ‘delete’ on it, we are left in a very bad state. The master vector now references invalid memory and it thinks it actually contains elements, because its m_entryCount hasn’t been effected by the change to the copied vector…

So while in some cases it’s handy for an assignment operator to be automatically generated, in this case it certainly isn’t. And we should make sure that an assignment operator is only ever generated when we explicitly require it.

So the first thing to do is to actually generate an error when someone tries to use ‘=’ unless we specifically request it. And the way to do this is through a private declaration as part of the class.

So we end up with the following

class MyVector
{
public:
  MyVector();
  ~MyVector();

private:
  // Declare a private operator= so nobody can use it
  MyVector& operator= (const MyVector& rhs);
};

By putting the operator= method into the private part of the class, and not defining it, we stop anyone from using it, with the compiler generating an easy to understand and quick error.

Obviously in this case people want to be able to copy one vector to another, but now we have the ability to control how and when this happens. By simply moving the operator= method into the public space, and defining the function so it does a deep copy (in other words it copies the content of the allocated memory rather than just the pointer to the memory) we can properly copy one vector into another.

Users can now copy vectors at will, knowing that each one is independent of the one that came before it and if you don’t want them to, we can easily block them and let the compiler tell them the bad news.

Creating Copies
So we can now control what happens with the vector when someone tries to copy it, but what happens in the following situation?

// Function that takes a vector of objects
void DoSomethingCool(MyVector listOfObjects)
{
  // ... Do some stuff with listOfObjects
  // ... Again, lets clear it
  listOfObjects.clear();
}

// Create our vector
MyVector objectList;

// ... Add plenty of objects to it

// Then pass it along
DoSomethingCool(objectList);

Now this might be a simple mistake. We might have wanted to pass by reference rather than value, but that’s irrelevant. By passing by value we have forced the compiler to automatically generate a copy constructor. And this comes with all the problems we had with an automatically generated assignment operator. Especially as the local function goes and clears the vector again…

So we have the exact same solution to the copy constructor problem as we had with the assignment operator…

class MyVector
{
public:
  MyVector();
  ~MyVector();

private:
  // Declare a private copy constructor so nobody can use it
  MyVector(const MyVector& rhs);
  MyVector& operator= (const MyVector& rhs);
};

So again we can make the pass by value safe by either blocking the creation by keeping it private or by moving the copy constructor into the public space and defining a deep copy, making sure that again our vectors are independent.

You can also use the copy constructor to generate new objects at the point of creation, by calling the copy constructor explicitly.

MyVector masterVector;
// Add stuff to the master vector

// Create a new one using the original
MyVector copiedVector(masterVector);

By using the copy constructor explicitly, you may be able to reduce a more expensive call to the default constructor followed by an assignment operator into a single, more efficient call.

Creation or Assignment?
The following might not do what you think it does

MyVector masterVector;
// Add stuff to the master vector

// Create a new one using the original
MyVector copiedVector = masterVector;

Does this call the assignment operator, or the copy constructor, or something else?!?

It’s using the ‘=’ operator, but the important part of the question is that we using this as the point of creation, so we, maybe surprisingly, use the copy constructor in this case.  I’ll look at ways that assignment/creation operations can be made more explicit in a later post.

This is worth remembering as calling the default constructor followed by the assignment operator might be more expensive than simply calling the copy constructor.

Summary

  • The assignment operator and copy constructor are used at different times of an objects lifecycle.
  • MyVector createdVector1(masterVector);  // Uses the copy constructor
    MyVector createdVector2 = masterVector;  // Uses the copy constructor
    createdVector1 = createdVector2; // Uses the assignment operator
  • If you need an assignment operator, or a copy constructor, then you will more than likely need the other. Even if you think you might not, you need to make your class safe for the end user, so adding them both it the best way forward.
  • You should create a private copy constructor and assignment operator by default, only removing them when you need a shallow copy, or defining them if they want a deep copy. So by default all your classes should start life looking like the following (you’re more likely to forget to add them when you don’t need them, than remove them if you do)
  • class DefaultClass
    {
    public:
      DefaultClass();
      ~DefaultClass();
    
    private:
      // Declare private copy constructor and assignment operator,
      // only removing them when we actually need them...
      DefaultClass(const DefaultClass& rhs);
      DefaultClass& operator= (const DefaultClass& rhs);
    };
  • If you only need a shallow copy, then you can remove the private declarations since the compiler can do all the work for you. But only remove them if you explicitly want the objects to be copied.
  • All this relates to structures as well as classes.  But if you are looking at assignment operators and copy constructors for one of your structs, you might be better off creating a class rather than a struct.

The information here only scrapes the surface of how to best use assignment operators and copy constructors, but you can easily investigate how to use these and other automatically generated functions to make your objects more efficient and, most importantly, safe for the users.  Without knowing how and when these class members are used, and how to take advantage of them, you can’t confidently say that the objects you are creating are safe to use in the situations they will be used in.

I’m going to look at a couple of other well hidden features of C++ in future post which will compliment the use of assignment operators and copy constructors quite nicely.


It would be a good idea to start with a bit of history. In previous posts I’ve talked about our custom STL implementation which contains a basic vector, but also a fixed::vector which takes a fixed size and allocates all memory within that container (usually on the stack) and cannot increase or decrease in size. This comes with a lot of benefits and provides game teams with a more consistent memory model when using containers.

But it comes with a lot of duplication. Those two containers are classes in their own right, each with over 100 tests (written with UnitTest++) which is a lot to maintain and (more importantly) keep consistent. A change to one usually means a change to the other and additional tests and checks need to be rolled out. I wasn’t happy about this, so moved all the vector code into a base class and the fixed and normal vectors are now types based on that, with a different allocator being the only difference (I didn’t want to go for inheritance when a templated structure could do all the work).

That was the easy bit. But now, how do you unit test an object (in this case the vector) who’s behaviour greatly depends on an internal object (in this case the different allocator models). The allocators have been unit tested, so we know their behaviour is correct, but the behaviour of the vector will depend on the allocator it is using (in some cases it will allocate more memory – so it’s capacity varies – in others it won’t allocate memory at all – so it’s ability to grow is different). And this behaviour may be obvious or it may be much more subtle.

And it’s this which causes the problem. The behaviour should be more or less the same, but in some cases, the behaviour will be different. push_back/insert may fail, reserve may reserve more memory than has been requested and construction behaviour will greatly depend on the allocator being used.

So to start – the simple and easy option. Create all the unit tests for one type and then duplicate them for each different allocator we come up against. But we went down this road to reduce the amount of duplication so there is still going to be a large number of tests that are the same and need to be maintained. In fact you could say it would be harder to maintain because the behaviour in some tests would be suitably different rather than clearly the same.

So I started to look for a more intelligent approach to this.

The first thing was go back to the basics – what was unit testing for? Such basics are often lost in the grand scheme of things, but in this case the point of the tests was to make sure the behaviour was correct, or more importantly, what was expected. There are obviously other aspects to unit tests (which are outside the scope of this post) but this is my primary concern.

When thought about it from this level the conflicting behaviour becomes quite clear – or at least it should. We can split the various methods into fixed and dependant calls, ones which will be the same regardless of the allocator and those which depend on what is being used. If we cannot split the behaviour then we have a problem with either our interface or our implementation. Luckily the allocation code is quite contained, but there were little tendrils of code that assumed things about the allocation of memory that it shouldn’t. Not a problem as the tests will easily pick these out – I hope.

So I may as well start with the easy stuff to test, the stuff that I now know is common across a vector regardless of the allocator. Iteration, removal (since the vector doesn’t de-allocate on removal), access, equality checks (vectors are equal based on their contents) – in effect anything that reads, accesses or removes content, nothing else. And since I have tests for these already this is quickly done.

This is testing against the vectors behaviour as I know this behaviour is constant and will never change.

The hard part is now a little bit easier. Looking at what I have left I can easily see what I need to test. Since the changes in behaviour are based on nothing other than the allocator, we can easily extract that and create a couple of mocks specifying specific allocator behaviour. One which allocates the memory as you request, one which allocates a different amount of memory and one which returns no memory at all.

But now I’m not testing the behaviour of the vector, I’m testing the implementation of the vector – knowing how it works internally and how that affects its external state.

At first I tried to be a bit clever with this. Each test was extracted into a small templated helper function, with the checks being done in those. That way I only had one implementation of a test and simply passed in the vector, source data and expected values. I had to make a few tweaks to the UT++ source to enable checks to be done outside the TEST(…) call, but this was pretty simple.

At first this worked well. As long as the set of tests were the same across all objects using our various mocks it was fine. The same number of expected results, the same or similar  input values and the same general behaviour. But it started to get tricky when the tests needed to deviate. For example, when testing a vector that uses a fixed size allocator, I need to test that the vector works fine on the boundaries, boundaries which the other vectors I’m testing don’t have. So we either end up with a large number of superficial tests, or much worse, I forget the extract the specific test cases for the specific method of implementation.

But since there is nothing wrong with a bit of duplication, especially when each ‘duplicated’ test is testing a specific thing I don’t need to be so clever. By having very discrete tests based on the implementation of the vector means I am free to deviate the test suite as I see fit, when I know that the behaviour of the vector will be different based on the type of allocator it is using. It would be beautiful to be able to use the same tests to test each type, but frankly that’s just not possible when you have to understand the internal behaviour of the object to test it correctly.

So the final solution – duplicate some tests but only those that you know have different behaviour based on an implementation or an expected outcome. There might be a bit more code but it produces the best results, the best output (it’s much easier to see what has failed and why) and the easiest to understand.

If the object depended on multiple internal objects (a policy based smart pointer is an excellent example) then I don’t believe this solution would work. The more variations you have, the blurrier the lines between fixed and dependant behaviour become. In those cases I probably would expect to be writing many more tests based on the final object, simply for piece of mind.

You can only go so far with this, and unit tests are designed to aid you and give you a security blanket. In an ideal world (one where TDD rules all) you don’t need to know what the internal implementation is doing, only the final outcome. But if you are spending more time figuring out your tests (let along writing them) then you need to take a step back and reassess what you are doing.

Title image by fisserman.

I recently implemented smart pointers into the FTL and was looking for a similar set to Boost’s own due to their wide spread use.  Obviously there are a few ways in which smart pointers can be implemented and I thought it would be interesting to document the reasons why I followed a particular route rather than the most obvious (and some would say simplest) method.

Due to the complexities of shared pointers (shared_ptr and weak_ptr etc.) it made sense to start with simply non-copyable pointers which would allow me to nail down the design process before moving on so I ended up with the following specification (obviously based on Boost’s specification)

simple_ptr – A non-copyable pointer with no garbage collection
simple_array – A non-copyable pointer to array with no garbage collection
scoped_ptr – A non-copyable pointer with automatic garbage collection when the pointer leaves scope
scoped_array – A non-copyable pointer to array with automatic garbage collection when the pointer leaves scope

I did originally debate the need for ‘array’ versions of these pointers instead hoping people would prefer the FTL vector or array, which would obviously be much safer than standard C arrays.  But very often people are used to working with memory and need to work at the lowest level possible.  Not giving the _array option will just limit the use of the smart pointers, which will turn people away from using them and generally lead to more unsafe code.

 

Boost Style Pointers
The initial approach was to look at ‘Boost Style’ pointers which in other words are individual, well thought-out classes build for a specific purpose.  Each class is defined and created manually, taking an individual template which specifies the type contained within the pointer.

ftl::ptr::simple_ptr<int> aSimplePointer; 
ftl::ptr::scope_ptr<int> aScopedPointer;

The good thing about this approach is that the template class is as simple as they come, generally only containing one type member and a couple of access functions.  Any other code will act as if this class wasn’t templated and is very easy to read and understand.  Also related to the simplicity of the template is that it is pretty much guaranteed to compile on any platforms that have template support (which probably covers 99.9% of mature C++ compilers and certainly the ones we work with every day).

Unfortunately its simplicity is also its main disadvantage.  Each class (unless you want to use an inheritance approach) is a self contained unit, which means there is a lot of duplication between the different classes.  Operator overloads, memory management etc. will be the same in multiple classes and while the classes can be quite small it does mean fixes need to be made more than once which can lead to copy-and-paste errors.  Unit Testing is also constantly duplicated as you have to test the same code in different classes again and again, doubling the number of changes you need to make for even a simple fix.

It also means that if someone wants to extend a pointer for a specific case they will have to manually create their pointer specification due to the base pointers containing no virtual functions and not supporting inheritance.

It was due to these problems that made me move away from this method.  The rigid structure that they bring to the table and the way projects cannot easily create their own versions lead me to think it would cause frustration when doing anything other than using them as simple pointers. 

 

Template Method Specialisation
To try and limit the amount of duplicated code I looked at how some of the functions that were different could be wrapped up in the same class.  The simplest way to do this would be to use a take on the Template Method Pattern which is designed to provide a class that has 99% the same behaviour but has some elements that needs to behave slightly different.

Using this would allow me to wrap up the _ptr and _array versions of a pointer into the same class and halving the amount of duplication needed to have multiple types.

For example
template<typename T, bool Array = false>
class simple_ptr_type
{
public:
  ~simple_ptr_type(void)
  {
    deallocate_memory<Array>();
  }
private:
  template<bool C>
  void deallocate_memory()
  {
    delete m_ptr;
  }
  template<>
  void deallocate_memory<true>()
  {
    delete [] m_ptr;
  }
};

ftl::ptr::simple_ptr_type<int> aSimplePointer;
ftl::ptr::simple_ptr_type<int, true> aSimpleArray;

This would be an elegant solution which gives the class a bit more flexibility though care would have to be taken to pass in the correct type when declaring them.  We could make this clearer by defining an enum and using this as the templated type

enum EPointerType
{
  EPointerType_Ptr,
  EPointerType_Array,
};

and taking that further, creating a typedef for each pointer type which makes it even clearer (see the next section for problems with this…)

typedef simple_ptr_type<T, EPointerType_Ptr>   simple_ptr;
typedef simple_ptr_type<T, EPointerType_Array> simple_array;

So far so good and this cut down on the main problem we hit when using the initial implementation.  The second template could be quite misleading and requires people to investigate the class to figure out exactly what it’s for but that’s not exactly a serious drawback due to the solutions I’ve already covered.

The main problem comes from the additional template type that is used to decide which method to call.  On some platforms the compiler was smart enough to only compile what was used for each templated type.  On others it compiled all the code regardless, which means that if a particular branch was illegal with a given templated type, it would still cause compiler errors even if it was never used.

Even with those problems I did see the advantages to this but also that it could lead down the following path
template<typename T, bool Array = false, bool Scoped = false, …> class simple_ptr_type

So while having the pointer customisable with various templated types could have it’s uses, it would start to become quite unmanageable (and confusing) and in the end not actually all that extendible since you could only switch it one way or the other.

 

Policy Based Ideas
It was around this time that I finally received a copy of Modern C++ Design by Andrei Alexandrescu.  In one section he discussed a policy design approach to programming where-as the templated types actually provide specific behaviour within a fixed scope.  This started to ring bells with what I was moving towards during the previous implementation where the templates would allow me to customise a ‘basic’ pointer type but also allow others to create the pointers they needed.

So I quickly moved forwards and ended up with something similar to the following

template< typename T,
          class StoragePolicy,
          class DeallocationPolicy,
          class DestructionPolicy,
          class CheckingPolicy,
          class ConversionPolicy>
class smart_pointer_noncopyable
{
};

Now obviously this is a lot of templates and a lot of work to understand – especially when you come to it cold. So it made sense to provide various objects that would come as stock polices that people could put together to create a range of smart pointers suited to their needs.  These policies would also provide the default interface that others could build from.

So as an example of how these policies can be used in the smart pointer object

const_reference operator*(void) const
{
  // This allows the user to provide
  // additional checks when dereferencing
  // and then return the pointer type as
  // a reference (you may be using multiple
  // pointer types which need to be manually
  // checked before parts of it are returned)
  CheckingPolicy::on_dereference( m_storageValue );
  return StoragePolicy::as_reference_type(m_storageValue);
}

~smart_pointer_noncopyable(void)
{
  // Here the deallocation policy can simply
  // call ‘delete’ or ‘delete []‘ or maybe does
  // something more extensive with heap pools etc.
  DeallocationPolicy::deallocate_memory( m_storageValue );
}

It was a conscious decision to make the individual polices static member functions rather than internal or inherited objects to reduce the size of the pointer.  If it was to use a component based system where the pointer contained individual objects then the size of the pointer could bloat quite quickly and it would also lead towards inheritance based solutions to other pointer types which is something I wanted to avoid.

You’ll also notice that the pointer type used in the example has the post-fix ‘_noncopyable’ which obviously leads to another type called ’smart_pointer_copyable’.  It would have been ideal to have a single pointer template and the policies themselves dictate which can be copied and which cannot (which is the whole idea after all!).  This is something I initially looked into where the pointer object’s copy constructor and =operator used polices themselves and these denied copying or asserted when they were used.  Unfortunately this meant that we couldn’t use a compile time check (we would have to wait until the operator was actually used) for the same reasons the template method couldn’t be used (some compilers compiling the whole file regardless of use). So while this still leads to a bit of code duplication between the copy and non-copy variants it still limits it to the structure rather than the actual implementation of the polices.

Obviously this is all good and well, but it would be a colossal pain if we had to manually list all the policies (even if we provided defaults it would still be difficult to change the odd policy).  Typedef’s should be the answer to this so we can manually provide the template type, but the template provides the rest of the policies.

For example (namespaces removed to make it readable on here):
template<typename T>
typedef smart_ptr<T>  simple_pointer;
template<typename T>
typedef smart_ptr<T, policy::DeleteArray>  simple_array;

Which would just be

ftl::ptr::simple_ptr<int> aSimplePointer;
ftl::ptr::simple_array<int> aSimpleArray;

Unfortunately, templated typedefs are not part of the C++ standard (but hopefully this feature will be added sometime in the future), so the only other option was to do the following…

template <typename T>
struct simple_ptr
{
  typedef ftl::ptr::smart_ptr<
                              T
                             > ptr;
};

template <typename T>
struct simple_array
{
  typedef ftl::ptr::smart_ptr<
                              T,
                              policy::DeleteArray
                             > ptr;
};

Which finally leads to

ftl::ptr::simple_ptr<int>::ptr aSimplePointer;
ftl::ptr::simple_array<int>::ptr aSimpleArray;

I really cannot take credit for that little gem and it was a relief when I found a solution otherwise this was yet another implementation that would have been so close but just not good enough. 

So finally we have a nice way of creating basic smart pointer types that allows any other user to customise their pointers to make them ideal for whatever situation they find themselves in.  While the static policy based approach can limit what individual pointers can do it still provides a wide range of extensibility without over-bloating the pointer.

Recently the topic of code size and its effect on the productivity and maintainability of a code base has been brought up in a number of forums and blogs across the internet. One of the more read would be the following (I won’t go into the issue I have with the author only ‘talking’ to young or inexperienced programmers) -

Codes Worst Enemy – Steve Yegge December 19th 2007

But what exactly is the problem that’s being brought up in these comments? Is it actually the size of the code base that seems to be the issue or is it everything else that has contributed to a problematic increase (a lot of the comments lead towards the difference between strong and weak type languages but a bit on that later)?

A Symptom Not A Problem?
Take for example the attack on design patterns (see ‘Design Patterns Are Not Features’)

A Factory isn’t a feature, nor is a Delegate nor a Proxy nor a Bridge. They “enable” features in a very loose sense, by providing nice boxes to hold the features in. But boxes and bags and shelves take space. And design patterns – at least most of the patterns in the “Gang of Four” book – make code bases get bigger.

When it comes down to it, every single feature of a language ‘enables’ a set of features, as do linked lists, rendering systems, input handling and anything else that comes to mind. But a correctly implemented pattern using these features is much more suitable for a large system than an individual solution that crams as much functionality into as small a space as possible. God help the person who then has to come in a fix this solution when the original author has moved on, where-as a simple comment of ‘This uses a proxy pattern to do…’ would make the code instantly more understandable to any half-decent programmer.

I wouldn’t have such a problem if the issue came from over-use of patterns that lead to a large increase of very small and distributed classes that spread out the functionality across the entire code base, but it seems as though this has come through ‘simple’ implementations, so the question that is begging to be asked is “what has happened to make it so big?”.

I really feel that the main problem being missed here is confusion between code bloat and duplication (and I don’t just mean simple copy and paste duplication but implementation duplication). Steve has mentioned this in his original post but seems to quickly skip past it.

However, copy-and-paste is far more insidious than most scarred industry programmers ever suspect. The core problem is duplication, and unfortunately there are patterns of duplication that cannot be eradicated from Java code.

Now I will admit that it has been a few years since I worked closely with Java, and I am sure someone of Steve’s experience with the language is not wrong, but there are problems with every language and solutions need to be found for the problems you encounter (not everyone is fortunate enough to be able to say “From now on, we are working with X”).

As an example take the FTL. On previous titles I’ve worked on, the mantra was if you needed something, then you implemented it. It didn’t matter if someone else had written a linked list, you needed something slightly different so you created what you needed (one of the first titles I ever worked on professionally actually had 3 different list classes used in throughout the project). But now we are moving forward and using the FTL to avoid this, the code base has been reduced, and it’s clear that it wasn’t the size of the code that was the problem, but the way in which the code base had been approached.

Language Choices

A lot of the discussion (as mentioned above) has come down heavily on the strong/weak language divide. I’m fortunate that while I work primarily with C++, I spend a good chunk of my time working with Ruby. And I will be the first to admit that my Ruby tools are written with less code than the equivalent would be with C++ (or more likely C#). But why?

The simplest offering is not that the language is weakly typed (I would personally say that has very little to do with it), but with the amazing level of support from externally developed libraries. I don’t have the code to hand, but recently I needed to create an FTP interface for a particular script I was working with. With Ruby, this was simply a case of requiring the relevant module and away we go? And if I was writing the tool in C#, if there was an equivalent library, it would have taken just as many lines of code. The only difference being that (in my opinion) the C# version would be easier to understand without running it than the Ruby one for a majority of developers.

An interesting quote from one of the developers of Bugzilla earlier this year

Since 1998 there have been many advances in programming languages. PHP has decent object-oriented features, python has many libraries and excellent syntax, Java has matured a lot, and Ruby is coming up in the world quickly. Nowadays, almost all of our competitors have one advantage: they are not written in Perl. They can actually develop features more quickly than we can…

Full article can be read here – The Problems of Perl: The Future Of Bugzilla

This accurately sums up the point I just made, that these newer language are easier to develop for (and indirectly decrease the code base) because of the number of modules and contributors they have developing for them. Without these (and it’s a shame Java and C++ don’t have such plug-able modules) the languages would be used to create programs that are just as big as any other language.

Conclusions?

Don’t get me wrong on any of this, a large code base is difficult to maintain and develop and care must always be taken not to overly bloat something when it isn’t necessary, but rather than look at a code base and say “This is big, before I move on I need to make it smaller to make my life easier”, you need to approach it from a different direction. Examine the complexity from all angles, and if trimming code makes it easier the read or work with, then go for it, but if refactoring it increases the size, but makes it more structured and maintainable, then that is just a valid as any other solution.

Allocators, by anyone’s standards, are the most difficult and ‘tacked-on’ feature of the STL and it is because of this that different implementations have very different opinions on how they should be implemented.

Obviously we want the FTL allocator model to be as simple as possible to actively encourage people to create their own custom classes and greatly increase the efficiency of the library.

The first decision made was to use only static member functions (inline with the Sgi implementation) as opposed to an allocation object which is created and contained within the container. This was done due to concerns about adding additional objects to classes that need to be as small as possible and not contain anything that isn’t strictly necessary.

The second decision related to the actual use of the allocator. An allocator should only ever do two things, allocate memory when needed and deallocate memory when it’s finished with. There should be no re-allocation available due to worries of fragmentation and there should be no need to define additional and superficial interfaces.

Unfortunately this is not perfect.

Since a collection of containers will use the same allocator model, you cannot have an individual container that has an individual allocator without creating one per object. I did initially pass a unique container ID to the allocator which would allow people to work around this but removed it for simplicity. If this does become a problem for developers in the future then I will consider reading this feature.

Other than that, it has so far proved to be a success and people have already started creating their own allocators for efficiency and debugging reasons amongst others. Hopefully the other alterations to the STL standard will be as successful!

It makes sense to start with the simplest and most obvious addition to the standard in the FTL which are fixed length containers.

The FTL provides fixed length containers for the vector, queue, stack, bitset and array objects which allow developers to allocate all space for the container on the stack or within the allocation of another objects memory. By containing all required memory within the container itself the object will generally have more cache friendly memory access and gives developers much more control over the size of the. Within fixed containers no dynamic memory allocation takes place and a fatal warning is given if someone attempts to store values outside its scope.

There is no scope within the container for allocating or gracfully ignoring a request to add more data than the container can manage. If the container is full, this is a fatal error and the developer needs to know about this as soon as possible.

One notable omission from the fixed container set is a fixed deque. Due to the way memory is allocated for the deque (more on that in a future post), it would generally be more costly to provide a fixed amount of space for the deque when the library doesn’t know how the container will be ultimately used.

The fixed containers also provide null operations for functions like vector::reserve so a developer can swap between them at will without having to alter their code. This allows a dynamic version of the object to be used during development and then swapped to a fixed length container near the end of the project when the full range of the container is known and they are working on small optimisations.

To keep a consistent interface, the fixed array class is the implementation of the TR1 array specification and the array object outside the fixed namespace is based on an altered, dynamic implementation of that specification. The same goes for the bitset container.

Fixed length containers are declared by simply passing the size of the container along with the type.

ftl::fixed::fvector<int, 10> fixedVector;

You might be wondering why the fixed vector is called fvector rather than vector as it is within a separate name space. This is simply due to a bug in Doxygen which leads to the classes not being documented correctly. Since this is the tool we use for all documentation, this is a simple and easy work-around.