Details

Overview of the structure of the *alpaka* library with concepts and implementations.

The full stack of concepts defined by the alpaka library and their inheritance hierarchy is shown in the third column of the preceding figure. Default implementations for those concepts can be seen in the blueish columns. The various accelerator implementations, shown in the lower half of the figure, differ only in some of their underlying concepts and share most of the base implementations. The default implementations can be used, but do not have to be. They can be replaced by user code at arbitrary granularity. By substituting, for instance, the atomic operation implementation of an accelerator, the execution can be fine-tuned to better utilize the hardware instruction set of a specific processor. Complete accelerators, devices and all of the other concepts can also be implemented by the user without changing any part of the alpaka library itself. How this is implemented is explained in the following paragraphs.

Concept Implementations

The alpaka library has been implemented with extensibility in mind. This means that there are no predefined classes modeling the concepts that the alpaka functions require as input parameters. The functions accept arbitrary types as parameters, as long as those types model the required concept.

C++ provides a language inherent object oriented abstraction that allows checking that the parameters of a function comply with the concept they are required to model. By defining interface classes which model the alpaka concepts, users would be able to derive their extension classes from the interfaces they want to model and implement the abstract virtual methods the interfaces define. The alpaka functions in turn would use the corresponding interface types as their parameter types. For example, the Buffer concept requires methods for getting the pitch or changing the memory pinning state. With this intrusive object oriented design pattern, the BufCpu or BufCudaRt classes would have to inherit from an IBuffer interface and implement the abstract methods it declares. An example of this basic pattern is shown in the following source snippet:

struct IBuffer
{
  virtual std::size_t getPitch() const = 0;
  virtual void pin() = 0;
  virtual void unpin() = 0;
  ...
};

struct BufCpu : public IBuffer
{
  virtual std::size_t getPitch() const override { ... }
  virtual void pin() override { ... }
  virtual void unpin() override { ... }
  ...
};

ALPAKA_FN_HOST auto copy(
  IBuffer & dst,
  IBuffer const & src)
-> void
{
  ...
}

The compiler can then check at compile time that the objects the user wants to use as function parameters can be implicitly converted to the interface type, which is the case for classes derived from it. The compiler reports an error on a type mismatch. However, if the alpaka library were using those language inherent object oriented abstractions, the extensibility and optimizability it promises would not be possible. Classes and run-time polymorphism require the implementer of extensions to intrusively inherit from predefined interfaces and override special virtual functions.

This is feasible for user defined classes or types where the source code is available and can be changed. The std::vector class template, on the other hand, would not be able to model the Buffer concept, because it is part of the standard library and its definition cannot be changed to inherit from the IBuffer interface class. The standard inheritance based object orientation of C++ only works well when all the code it is to interoperate with can be changed to implement the interfaces. It does not enable interaction with code that is unalterable or too complex to change, which is the reality in the majority of software projects.

Another option to implement an extensible library is to follow the approach of the C++ standard library. It allows function templates to be specialized for user types to model concepts without altering the types themselves. For example, the std::begin and std::end free function templates can be specialized for user defined types. With those functions specialized, the C++11 range-based for loops (for(auto & i : userContainer){...}, see C++ Standard 6.5.4/1) can be used with user defined types. Equally, specializations of std::swap and other standard library function templates can be defined to extend those with support for user types. One problem with function template specialization is that only full specializations are allowed. Partial function template specialization is not allowed by the standard. Another problem can emerge when users carelessly overload the function templates instead of specializing them. Mixing function overloading and function template specialization on the same base function template can produce unexpected results. The reasons and effects of this are described in more detail in an article by H. Sutter (currently convener of the ISO C++ committee) called Sutter’s Mill: Why Not Specialize Function Templates? in the C/C++ Users Journal in July 2001.
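The interaction between overloading and specialization can be illustrated with a small, self-contained sketch. It is not taken from alpaka. The namespace overload_sketch and the functions f and callWithIntPointer are hypothetical names chosen for this illustration, and the integer return values only serve to make the selected function visible.

```cpp
namespace overload_sketch
{
    // (a) Base function template.
    template<typename T>
    int f(T)
    {
        return 1;
    }

    // (b) Explicit (full) specialization of (a) for T = int*.
    template<>
    int f<int*>(int*)
    {
        return 2;
    }

    // (c) A second base template that overloads (a) for pointer arguments.
    template<typename T>
    int f(T*)
    {
        return 3;
    }

    // Overload resolution only considers the base templates (a) and (c).
    // For a pointer argument (c) is the more specialized base template, so it
    // is selected. The explicit specialization (b) belongs to (a) and is
    // therefore never looked at for this call.
    inline int callWithIntPointer()
    {
        int* p = nullptr;
        return f(p);
    }
}
```

A reader who wrote (b) expecting it to handle all int* arguments would be surprised that callWithIntPointer ends up in (c). Only a call that names the template argument explicitly, such as f<int*>(p), reaches (b).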

The solution given in the article is to provide “a single function template that should never be specialized or overloaded”. This function simply forwards its arguments “to a class template containing a static function with the same signature”. This class template can be fully or partially specialized without affecting overload resolution.

The way the alpaka library implements this is by not using the C++ inherent object orientation but lifting those abstractions to a higher level. Instead of using a non-extensible class/struct and abstract virtual member functions for the interface, alpaka defines free functions. All those functions are templates, allowing the user to call them with arbitrary self-defined types and not only those inheriting from a special interface type. Unlike member functions, they have no implicit this pointer, so the object instance has to be passed explicitly as a parameter. Overriding the abstract virtual interface methods is replaced by the specialization of a template type that is defined for each such function.

A concept is completely implemented by specializing the predefined template types. This makes it possible to extend and fine-tune the implementation non-intrusively. For example, the corresponding pitch and memory pinning template types can be specialized for std::vector. After doing this, a std::vector can be used everywhere a buffer is accepted as an argument throughout the whole alpaka library without ever touching its definition.
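As a rough sketch of what such a non-intrusive adaptation could look like, the following code specializes a pitch trait for std::vector. The namespace buffer_sketch, the trait GetPitchBytes and the function getPitchBytes are assumptions made for this illustration. They are not the names or signatures alpaka actually uses.

```cpp
#include <cstddef>
#include <vector>

namespace buffer_sketch
{
    // Hypothetical trait: declared for all types, defined only by specializations.
    template<typename TBuf, typename TSfinae = void>
    struct GetPitchBytes;

    // Free function template that only forwards to the trait.
    template<typename TBuf>
    auto getPitchBytes(TBuf const& buf) -> std::size_t
    {
        return GetPitchBytes<TBuf>::getPitchBytes(buf);
    }

    // Partial specialization covering every std::vector instantiation.
    // The definition of std::vector itself is left untouched.
    template<typename TElem, typename TAllocator>
    struct GetPitchBytes<std::vector<TElem, TAllocator>>
    {
        static auto getPitchBytes(std::vector<TElem, TAllocator> const& buf) -> std::size_t
        {
            // A vector is treated as a single contiguous row here, so the pitch
            // is taken to be its size in bytes (std::vector<bool> is ignored).
            return buf.size() * sizeof(TElem);
        }
    };
}
```

With this in place, buffer_sketch::getPitchBytes(v) compiles for any std::vector v, while a type without a specialization produces a compile error because the primary template is only declared.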

A simple function allowing arbitrary tasks to be enqueued into a queue can be implemented in the way shown in the following code. The TSfinae template parameter will be explained in a following section.

namespace alpaka
{
  template<
    typename TQueue,
    typename TTask,
    typename TSfinae = void>
  struct Enqueue;

  template<
    typename TQueue,
    typename TTask>
  ALPAKA_FN_HOST auto enqueue(
    TQueue & queue,
    TTask & task)
  -> void
  {
    Enqueue<
      TQueue,
      TTask>
    ::enqueue(
      queue,
      task);
  }
}

Users who want their queue type to be used with this enqueue function have to specialize the Enqueue template struct. This can be done either partially, by only replacing the TQueue template parameter and accepting arbitrary tasks, or fully, by replacing both TQueue and TTask. This gives the user complete freedom of choice. The example given in the following code shows this by specializing the Enqueue type for a user queue type UserQueue and arbitrary tasks.

struct UserQueue{};

namespace alpaka
{
  // partial specialization
  template<
    typename TTask>
  struct Enqueue<
    UserQueue,
    TTask>
  {
    ALPAKA_FN_HOST static auto enqueue(
      UserQueue & queue,
      TTask & task)
    -> void
    {
      //...
    }
  };
}

In addition, the following code shows a full specialization of the Enqueue type for a given UserQueue and a UserTask.

struct UserQueue{};
struct UserTask{};

namespace alpaka
{
  // full specialization
  template<>
  struct Enqueue<
    UserQueue,
    UserTask>
  {
    ALPAKA_FN_HOST static auto enqueue(
      UserQueue & queue,
      UserTask & task)
    -> void
    {
      //...
    }
  };
}

When the enqueue function template is called with an instance of UserQueue, the most specialized version of the Enqueue template is selected depending on the type of the task TTask it is called with.
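The selection rule can be checked with a condensed, self-contained version of the snippets above. This is only a sketch: the namespace dispatch_sketch and the OtherTask type are hypothetical, the ALPAKA_FN_HOST macro is omitted, and the functions return an int instead of void purely so that the chosen specialization can be observed.

```cpp
namespace dispatch_sketch
{
    // Primary template: declared only, so unsupported queues fail to compile.
    template<typename TQueue, typename TTask, typename TSfinae = void>
    struct Enqueue;

    // The single forwarding function template that is never specialized.
    template<typename TQueue, typename TTask>
    auto enqueue(TQueue& queue, TTask& task) -> int
    {
        return Enqueue<TQueue, TTask>::enqueue(queue, task);
    }

    struct UserQueue {};
    struct UserTask {};
    struct OtherTask {};

    // Partial specialization: UserQueue with arbitrary tasks.
    template<typename TTask>
    struct Enqueue<UserQueue, TTask>
    {
        static auto enqueue(UserQueue&, TTask&) -> int
        {
            return 1;
        }
    };

    // Full specialization: UserQueue with UserTask.
    template<>
    struct Enqueue<UserQueue, UserTask>
    {
        static auto enqueue(UserQueue&, UserTask&) -> int
        {
            return 2;
        }
    };
}
```

Calling dispatch_sketch::enqueue with a UserQueue and a UserTask selects the full specialization, while any other task type falls back to the partial one.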

A type can model the queue concept completely by defining specializations for alpaka::Enqueue and alpaka::Empty. This functionality can be accessed by the corresponding alpaka::enqueue and alpaka::empty template functions.

Currently there is no native language support for describing and checking concepts in C++ at compile time. A study group (SG8) is working on the ISO specification for concepts, and compiler forks implementing them do exist. For usage in current C++ there are libraries like Boost.ConceptCheck which try to emulate requirement checking of concept types. Those libraries often exploit the preprocessor and require non-trivial changes to the function declaration syntax. Therefore the alpaka library does not currently make use of Boost.ConceptCheck. Nor does it use the proposed concept specification, due to its dependency on non-standard compilers.

The usage of concepts as described in the working draft would often dramatically improve the compiler error messages when concept requirements are violated. Currently the error messages point deep inside the stack of library template invocations, to the place where the missing method or the like is called. With concept checking, compilation would instead fail directly at the point of invocation of the outermost template function, with an expressive error message about the parameter and its violation of the concept requirements. This would especially simplify working with extensible template libraries like Boost or alpaka. However, in the way concept checking would be used in the alpaka library, omitting it does not change the semantics of the program, only the compile time error diagnostics. In the future, when the standard incorporates concept checking and the major compilers support it, it will be added to the alpaka library.

Template Specialization Selection on Arbitrary Conditions

Basic template specialization only allows for a selection of the most specialized version where all explicitly stated types have to be matched identically. It is not possible to enable or disable a specialization based on arbitrary compile time expressions depending on the parameter types. To allow such conditions, alpaka adds a defaulted and unused TSfinae template parameter to all declarations of the implementation template structs. This was shown using the example of the Enqueue template type. The C++ technique called SFINAE, an acronym for “Substitution failure is not an error”, allows arbitrary specializations to be disabled depending on compile time conditions. Specializations where the substitution of the parameter types by the deduced types would result in invalid code do not cause a compile error, but are simply omitted. An example in the context of the Enqueue template type is shown in the following code.

struct UserQueue{};

namespace alpaka
{
  template<
    typename TQueue,
    typename TTask>
  struct Enqueue<
    TQueue,
    TTask,
    std::enable_if_t<
      std::is_base_of<UserQueue, TQueue>::value
      && (TTask::TaskId == 1u)
    >>
  {
    ALPAKA_FN_HOST static auto enqueue(
      TQueue & queue,
      TTask & task)
    -> void
    {
      //...
    }
  };
}

The Enqueue specialization shown here does not require any direct type match for the TQueue or the TTask template parameter. It will be used in all contexts where TQueue inherits from UserQueue and where TTask has a static const integral member value TaskId that equals one. If the TTask type does not have a TaskId member, this code would be invalid and the substitution would fail. However, due to SFINAE, this does not result in a compiler error but only in this specialization being omitted. The std::enable_if template results in a valid type if the condition it contains evaluates to true, and in invalid code if it evaluates to false. Therefore it can be used to disable specializations depending on arbitrary boolean conditions. This is what happens when the TaskId member is not equal to one or TQueue does not inherit from UserQueue. In these circumstances, the condition itself results in valid code, but because it evaluates to false, the std::enable_if specialization results in invalid code and the whole Enqueue template specialization is omitted.
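The three cases described above can be reproduced with a self-contained sketch. Several parts are assumptions added for illustration and are not part of alpaka: the namespace sfinae_sketch, the task and queue types, the int return values, and the primary template being given a fallback body so that an omitted specialization becomes observable instead of causing a compile error.

```cpp
#include <type_traits>

namespace sfinae_sketch
{
    // Primary template with a fallback body (an addition of this sketch).
    template<typename TQueue, typename TTask, typename TSfinae = void>
    struct Enqueue
    {
        static auto enqueue(TQueue&, TTask&) -> int
        {
            return 0;
        }
    };

    // Forwarding function template.
    template<typename TQueue, typename TTask>
    auto enqueue(TQueue& queue, TTask& task) -> int
    {
        return Enqueue<TQueue, TTask>::enqueue(queue, task);
    }

    struct UserQueue {};
    struct DerivedQueue : UserQueue {};

    struct TaskOne { static constexpr unsigned TaskId = 1u; };
    struct TaskTwo { static constexpr unsigned TaskId = 2u; };
    struct TaskWithoutId {};

    // Enabled only if TQueue derives from UserQueue and TTask::TaskId == 1.
    template<typename TQueue, typename TTask>
    struct Enqueue<
        TQueue,
        TTask,
        std::enable_if_t<
            std::is_base_of<UserQueue, TQueue>::value
            && (TTask::TaskId == 1u)>>
    {
        static auto enqueue(TQueue&, TTask&) -> int
        {
            return 1;
        }
    };
}
```

A DerivedQueue with TaskOne selects the constrained specialization. With TaskTwo the condition is valid but false, and with TaskWithoutId the expression TTask::TaskId itself is invalid. In both of these cases the specialization is silently dropped and the fallback is used.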

Argument dependent lookup for math functions

Alpaka comes with a set of basic mathematical functions in the namespace alpaka::math. These functions are dispatched in two ways to support user defined overloads of these functions.

Let’s take alpaka::math::abs as an example: When alpaka::math::abs(acc, value) is called, a concrete implementation of abs is picked via template specialization. Concretely, something similar to alpaka::math::traits::Abs<decltype(acc), decltype(value)>{}(acc, value) is called. This allows alpaka (and the user) to specialize the template alpaka::math::traits::Abs for various backends and various argument types. For example, alpaka contains specializations for float and double. If there is no specialization within alpaka (or by the user), the default implementation of alpaka::math::traits::Abs<...>{}(acc, value) will just call abs(value). This is called an unqualified call, and C++ will try to find a function called abs in the namespace where the type of value is defined. This feature is called Argument Dependent Lookup (ADL). Using ADL for types which are not covered by specializations in alpaka allows users to bring their own implementation for types for which abs is meaningful, e.g. a custom implementation of complex numbers or a fixed precision type.
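The two dispatch paths can be sketched in a few lines. This is a simplified model and not alpaka's actual code: the namespaces math_sketch and user, the AccTag accelerator stand-in and the Fixed type are hypothetical names introduced for the illustration.

```cpp
namespace math_sketch
{
    // Stand-in for an accelerator type.
    struct AccTag {};

    namespace traits
    {
        // Default implementation: an unqualified call, resolved through ADL
        // in the namespace of the argument type at instantiation time.
        template<typename TAcc, typename TArg, typename TSfinae = void>
        struct Abs
        {
            auto operator()(TAcc const&, TArg const& arg) const
            {
                return abs(arg);
            }
        };

        // Specialization for a built-in type, picked via template specialization.
        template<typename TAcc>
        struct Abs<TAcc, double>
        {
            auto operator()(TAcc const&, double const& arg) const -> double
            {
                return arg < 0.0 ? -arg : arg;
            }
        };
    }

    // Public entry point that forwards to the trait.
    template<typename TAcc, typename TArg>
    auto abs(TAcc const& acc, TArg const& arg)
    {
        return traits::Abs<TAcc, TArg>{}(acc, arg);
    }
}

namespace user
{
    // A user type for which abs is meaningful, e.g. a fixed precision value.
    struct Fixed
    {
        int raw;
    };

    // Found by ADL because it lives in the same namespace as Fixed.
    inline Fixed abs(Fixed value)
    {
        return Fixed{value.raw < 0 ? -value.raw : value.raw};
    }
}
```

Calling math_sketch::abs with a double uses the specialization, while calling it with a user::Fixed falls through to the default implementation, which finds user::abs without the user having to touch the math_sketch namespace.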