AIStatefulTask ‐ Asynchronous, Stateful Task Scheduler library.

Threads-like task objects evolving through user-defined states.


AIStatefulTask is an open source project written by Carlo Wood and is the culmination of more than a decade of design effort, testing and improvements. AIStatefulTask allows you to write event-driven applications of arbitrary complexity.


The features of the library include:

• High performance.

  • A great amount of effort was put into ensuring the fastest possible algorithms and code. The whole library was designed with one goal in mind: to have every core in the machine run a thread that never blocks and makes as few system calls as possible, certainly none that might put the thread to sleep.

  • Where possible a lock-free approach has been chosen, using weakly ordered atomics, while critical sections, when unavoidable, are kept as few clock cycles long as possible in order to reduce contention on mutexes to a negligible amount.

  • To give one example: it is possible to have millions of timers running without significant overhead. The algorithm required to achieve this dictates one sacrifice: such high-performance timers can only be started with a timeout interval that is known at compile time. A more naïve design would have approached the problem from the premise that one must be able to start timers with arbitrary (run-time) intervals, with the unavoidable result that using too many timers causes significant overhead.
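
  • To illustrate the compile-time-interval idea (a sketch of ours, not the library's actual Timer API): when the interval is part of the timer's type, every timer of a given type expires in the order it was started, so a plain FIFO queue per interval suffices and starting or cancelling a timer is O(1).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ratio>

// Sketch only -- not the library's real Timer class.  The interval is baked
// into the type, so it cannot be chosen at run time; in exchange, all timers
// of one type can share a single FIFO queue whose front is always the next
// timer to expire.
template<std::intmax_t Num, std::intmax_t Den = 1>
struct FixedIntervalTimer
{
  using interval_type = std::chrono::duration<std::intmax_t, std::ratio<Num, Den>>;
  static constexpr interval_type interval{1};

  // Compute the deadline at the moment the timer is started.
  std::chrono::steady_clock::time_point
  start(std::chrono::steady_clock::time_point now) const
  {
    return now + std::chrono::duration_cast<std::chrono::steady_clock::duration>(interval);
  }
};
```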

  • Another example is the extremely efficient I/O. This library is the only library in the world that allows reading and writing the same socket concurrently. Input is buffered, but without the need to copy the data: data is read from a socket into memory and in most cases is never moved, but processed in place.

  • In order to minimize the number of context switches, a thread only goes to sleep when there is nothing else to do (when the lock-free ring buffers of the thread pool are empty). Furthermore, to avoid unnecessary system calls when waking up sleeping threads, a special SpinSemaphore has been designed (again based on weakly ordered atomics) that uses the Linux-specific futex system call directly; this saves CPU cycles by being able to wake up more than one thread at a time (the default POSIX semaphore only allows waking up a single thread at a time, because not all operating systems support waking up more). A (specifically tuned) test application became 20 times faster using this custom semaphore instead of the one provided by libc.
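
  • The spin-then-sleep idea behind that semaphore can be sketched portably (hypothetical names, our own simplification; the real SpinSemaphore uses weakly ordered atomics and futex(2) directly, avoiding the mutex taken below):

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Portable sketch of "spin first, sleep only as a last resort".  In the real
// thing a single FUTEX_WAKE can release several sleeping threads with one
// system call; here notify_all stands in for that.
class SpinThenSleepSemaphore
{
 public:
  void post(int n = 1)
  {
    m_tokens.fetch_add(n, std::memory_order_release);
    std::lock_guard<std::mutex> lk(m_mutex);    // Serializes with a waiter about to sleep.
    if (m_sleepers > 0)
      m_cv.notify_all();                        // Stand-in for futex's "wake up to n waiters".
  }

  void wait()
  {
    for (int i = 0; i < 100; ++i)               // Spin phase: no system calls at all.
      if (try_acquire())
        return;
    std::unique_lock<std::mutex> lk(m_mutex);   // Sleep phase.
    ++m_sleepers;
    m_cv.wait(lk, [this] { return try_acquire(); });
    --m_sleepers;
  }

  bool try_acquire()
  {
    int t = m_tokens.load(std::memory_order_relaxed);
    while (t > 0)
      if (m_tokens.compare_exchange_weak(t, t - 1, std::memory_order_acquire))
        return true;
    return false;
  }

 private:
  std::atomic<int> m_tokens{0};
  std::mutex m_mutex;
  std::condition_variable m_cv;
  int m_sleepers{0};                            // Protected by m_mutex.
};
```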

  • And the list goes on. If anything, this is a Maximum High Performance library.

• Scalable to many cores.

  • Doing I/O requires one thread to sleep in epoll. For timers we use the (hardware) per-process POSIX timer (see timer_create(2)). Apart from that there is no distinction between threads: the user can of course have dedicated threads and have them donate cycles to the tasks running in this library by calling a function (e.g. from the 'idle' point of their main loop), but the library works perfectly well dividing the work over the threads of its thread pool. The idea is to have as many threads as there are cores, so that the kernel never has to context-switch within the application (on an otherwise unloaded machine, that is). A thread that mostly sleeps, like the dedicated I/O thread that basically does nothing but wait for I/O events in epoll, does not need to be counted here of course: not giving it a core of its own merely costs a context switch upon I/O events.
  • As the work of an application can be divided over many little tasks, those tasks will be executed in parallel whenever possible and when (more) cores are available.
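
  • A minimal sketch (Linux only, our own simplification, not the library's code) of what the dedicated I/O thread does: it spends its life blocked in epoll_wait(2) and merely reports which file descriptor became ready, so the actual work can be handed to the thread pool.

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <unistd.h>

// Block until one registered file descriptor is ready and return it,
// or -1 on error.  A real I/O thread would loop and dispatch the event
// to a worker instead of returning.
int wait_for_one_event(int epoll_fd)
{
  epoll_event ev;
  int n = epoll_wait(epoll_fd, &ev, 1, -1);   // -1: no timeout, sleep until ready.
  return n == 1 ? ev.data.fd : -1;
}
```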

• Highly robust.

  • Every demand that the library puts on its usage is enforced. Preferably at compile time; but when that is not possible (or would lead to less efficient code), usage errors are caught with asserts accompanied by detailed comments that explain what you did wrong when the assert fires and how to fix it. Once a program is finished and uses the library correctly, those asserts can of course be removed by compiling for the production environment as opposed to debug mode.
  • In fact, because of this, a reasonable approach is to just "try something" and/or make random changes to working code; after fixing all compile errors and testing the application well enough to be sure that no asserts fire, it is guaranteed to work again. I have used this approach myself many times, so this feature is in fact a tested feature.
  • For example:
    // The lifetime of at least one FileLock object, of the corresponding
    // canonical path, must be longer than the lifetime of FileLockAccess
    // objects created from it. I.e. you deleted the FileLock object
    // corresponding to this FileLockAccess. Don't do that.

• Builtin debugging support.

  • Debugging complex, multi-threaded programs can be quite a challenge. Stepping through the code with a debugger is often not going to cut it. The only reasonable way to debug such a program is with highly detailed debug output that prints everything that goes on. This library has been written with that in mind from the beginning. Based on libcwd, debug output is written to so-called debug channels, allowing you to follow exactly what happened just prior to the problem that you want to investigate. The debug code can of course be omitted entirely from production code, but even in debug mode it doesn't cause very significant overhead apart from producing the actual output (especially to screen). Moreover, one can, for example, turn all debug output off except for one or more specific Tasks.

• Thread-safe by design.

  • AIStatefulTask helps in many ways with writing a (heavily) multi-threaded program without running into all the usual problems: each Task is an object that manipulates its own (private) members. The core of a task is implemented in a single virtual function (multiplex_impl). Only one thread at a time executes that function, so it is basically single-threaded; no mutex locking is necessary for any of the private variables. Synchronization with other tasks is provided by the library in a fail-safe way.
  • For example, a parent task could start a child task that will call signal on its parent once it is done. The parent has to wait until the child task has finished, and therefore calls wait immediately after starting the child task. This would normally lead to a race condition: will the signal arrive before or after the parent calls wait? But here that doesn't matter: both the signal and the wait call merely toggle a flag and are processed atomically once the parent has returned to the library and it has to be decided whether the parent task needs to go to sleep, which it only does when wait was called and signal wasn't.
  • When a task is run from an AIEngine it is executed from a known point in the main loop of, for example, some other library. That means the task runs at moments when that library isn't running, so again there is no need for complex mutex locking. This is the preferred way to run tasks that have to interact with non-thread-safe third-party libraries.
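
  • The flag-toggling mechanism described in the parent/child example above can be modelled in a few lines (a simplified model of ours, not the library's actual implementation):

```cpp
#include <atomic>
#include <cassert>

// Why wait() before or after signal() both work: each call just sets a bit;
// the decision to sleep is taken afterwards, atomically, from the combined
// state.  Not the library's real code -- illustration only.
class WaitSignalFlags
{
 public:
  void wait()   { m_flags.fetch_or(want_signal, std::memory_order_relaxed); }
  void signal() { m_flags.fetch_or(signalled, std::memory_order_relaxed); }

  // Called when the task returns to the library: go to sleep only when
  // wait() was called and the matching signal() has not been seen yet.
  bool must_sleep() const
  {
    unsigned f = m_flags.load(std::memory_order_relaxed);
    return (f & want_signal) && !(f & signalled);
  }

 private:
  static constexpr unsigned want_signal = 1;
  static constexpr unsigned signalled   = 2;
  std::atomic<unsigned> m_flags{0};
};
```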

• Error handling and reporting via C++ exceptions.

  • C++ exceptions are the preferred error reporting mechanism for most applications. The exceptions thrown include detailed information, important for diagnosing the exact cause of an error. One can use English text or keywords; the errors are in a format that allows translation (and reordering of information to accommodate a different grammar), so one can start in English and add translations later. In most cases such errors are fatal only for whatever task initiated the operation: if, say, the user tries to write a file to a non-existent directory, the rest of the program can continue running; the only thing needed is to cancel that user action. Hence an error is closely related to (if not the same as) a pop-up telling the user about a problem.


Problem statement

Applications often need to do something along the lines of

"do this, and when you're done [...]"

If that task simply requires lots of CPU cycles then one can pass the task to a thread and let it crunch away until the task is done. If the task is something basic, like reading a socket, then you might be able to find a library that supports performing that particular task in an asynchronous manner. In both cases you probably need some kind of event mechanism to be notified when the task has finished.

In reality life isn't that simple. Often a task will use some (or possibly a significant amount of) CPU cycles, alternated with the need to idle (e.g. waiting for network I/O). It might even never finish, requiring a timeout and/or error handling, etc. There certainly won't be an existing library that does this work asynchronously for you, and unless you can spare a core to do the task synchronously, passing the task to another thread won't help you one single bit.

If your application is complex, tasks will need to perform an arbitrary number of other tasks, which in turn need to perform tasks before they can finish, so that it is both completely unpredictable what resources and timing a single task needs, and not feasible to use a thread per task, because there are simply too many tasks.

Moreover, each task needs a callback mechanism. If such callbacks are performed by whichever thread happens to finish a task (remember, you can't dedicate one particular thread to a task), then it is impossible to predict which thread that is or what state it is in; as a result you can't do much more than set a flag in the callback, and you still need something that picks up on the fact that the task finished.

Heavily asynchronous code tends to consist of a large collection of callback functions, where one quickly loses the overview of which functions will be called when and in what order. Object orientation is lost because of this, and thread-safety can only be guaranteed by locking each and every variable every time it is accessed, which leads either to unpredictable deadlocks or to thread-unsafe accesses when one attempts to avoid 'unneeded' mutex locking.

The objective of AIStatefulTask

The design goal of AIStatefulTask was to create a framework that allows one to dispatch tasks to a thread pool without losing thread-safety or object orientation and without wasting CPU cycles or losing the ability to fully utilize the power of every available core.

A primary goal has also been to allow a developer to achieve all that by only concentrating on a single object at a time; having a clear one-on-one relationship between a Task and an object.

Despite the complexity of thread dispatching, asynchronous execution and error handling, the code of such a task should have a (mostly) linear feel, along the lines of:

"To do this task, do this, and when done, do this, and when done, do this, and when done, then this task is finished. Do this if an error occurs."

Thread-safety and Object Orientation

Each Task in AIStatefulTask is derived from class AIStatefulTask, overriding up to six virtual functions.

These virtual functions allow the developer to perform initialization, execute code in the case something unexpected happens or when the task finishes, and of course define what the task is supposed to accomplish by overriding the pure virtual AIStatefulTask::multiplex_impl member function of the base class.

The linear feel of a task is accomplished by putting the code that comprises the bulk of the task's work in a single switch statement, in the overridden multiplex_impl member function:

void MyTask::multiplex_impl(state_type run_state)
{
  switch (run_state)
  {
    case MyTask_start:
      /* Do this, and when done continue with MyTask_next_step. */
      set_state(MyTask_next_step); break;
    case MyTask_next_step:
      /* Do this, and when done continue with MyTask_last_step. */
      set_state(MyTask_last_step); break;
    case MyTask_last_step:
      /* Do this, and when done continue with MyTask_done. */
      set_state(MyTask_done); break;
    case MyTask_done:
      /* This task is finished. */
      finish(); break;
  }
}

void MyTask::abort_impl()
{
  /* Do this if an error occurs. */
}

The states are user defined. This construct allows for a sequential feel, while the thread, at any point, can still return to its main loop and later reenter the code where it left off. Note that the progression doesn't have to be linear, but in general the state of a Task will more or less advance in one direction; in the above example from MyTask_start to MyTask_next_step to MyTask_last_step and finally MyTask_done. So if the case statements are put in that order, this gives an intuitive feel for how the task will progress (from top to bottom).
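
As a toy model of this reentrant behaviour (our own code, not the library's API): a driver can call the task repeatedly, executing exactly one state per call, so the thread is free to return to its main loop between states.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: run_slice() executes exactly one state and returns, so the
// calling thread can go back to its main loop and reenter later exactly
// where the task left off.
enum ToyState { Toy_start, Toy_next_step, Toy_last_step, Toy_done };

class ToyTask
{
 public:
  bool run_slice()   // Returns true while the task is not finished.
  {
    switch (m_state)
    {
      case Toy_start:     m_log.push_back("start");     m_state = Toy_next_step; return true;
      case Toy_next_step: m_log.push_back("next_step"); m_state = Toy_last_step; return true;
      case Toy_last_step: m_log.push_back("last_step"); m_state = Toy_done;      return true;
      case Toy_done:      m_log.push_back("done");      return false;
    }
    return false;
  }
  std::vector<std::string> const& log() const { return m_log; }

 private:
  ToyState m_state = Toy_start;
  std::vector<std::string> m_log;
};
```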

A Task object is only executed by one thread at a time. Although theoretically that can be a different thread each time multiplex_impl is reentered, the effect is still single-threaded: as long as the variables that the Task accesses aren't also accessed outside of the Task, no locking is necessary. Most notably, (non-static) private member variables of the Task do not need a mutex to protect them when the Task works in isolation (that is, no other threads call its member functions while it is working), which is the normal mode of operation for a Task, as its internal state is basically unknown until it has finished: a task should be left alone while it is doing its thing.

Moreover, a Task is executed from a well-defined point in each thread's main loop. This makes it possible to access variables of thread-unsafe third-party code by running the Task in the same thread as that code. Note that it is possible to tell a Task in which thread to run.
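
The idea of running tasks from a known point in a main loop (cf. AIEngine above) can be sketched as follows; all names here are our own, not the library's:

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Minimal model of the engine idea: task steps queued on the engine run only
// when the owning thread calls mainloop(), i.e. at a well-known point where
// the surrounding (possibly thread-unsafe) code is idle.
class ToyEngine
{
 public:
  void add(std::function<void()> task_step)
  {
    m_queue.push_back(std::move(task_step));
  }

  // Called from one known point in the owning thread's main loop;
  // returns the number of task steps executed.
  int mainloop()
  {
    int executed = 0;
    while (!m_queue.empty())
    {
      auto step = std::move(m_queue.front());
      m_queue.pop_front();
      step();
      ++executed;
    }
    return executed;
  }

 private:
  std::deque<std::function<void()>> m_queue;
};
```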

See Usage for more detailed information.