Using `extern template` to Reduce Compile Times

Why am I talking about this?

Something I've begun making judicious use of in our codebase at work is strong types (also known as newtypes). Strong types are extremely thin wrappers around other types, usually language-provided primitives, that help ensure you don't accidentally use a type or value in a place where you shouldn't. I'll be writing about them in more detail soon, but in my opinion the best way to implement them is with templates.
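
To give a rough idea of what such a wrapper can look like, here is a minimal sketch. It is only one possible shape, not necessarily the one a real codebase would use, and every name in it (Strong, WidthTag, HeightTag, Width, Height, area) is made up for illustration.

```cpp
// Hypothetical strong type: a thin wrapper that carries a tag type, so two
// wrappers around the same primitive are distinct, incompatible types.
template <typename T, typename Tag>
class Strong {
  public:
    constexpr explicit Strong(T value) : value_(value) {}
    constexpr T get() const { return value_; }

  private:
    T value_;
};

// Empty tag structs that exist only to make the aliases below different types.
struct WidthTag {};
struct HeightTag {};

using Width = Strong<int, WidthTag>;
using Height = Strong<int, HeightTag>;

// Passing a Height where a Width is expected no longer compiles.
constexpr int area(Width w, Height h) {
    return w.get() * h.get();
}
```

With these definitions, area(Width{3}, Height{4}) compiles, while area(Height{4}, Width{3}) and area(3, 4) are rejected by the compiler.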

Why is compiling templates slow?

Templates are instantiated in every translation unit (roughly, every cpp file) that uses them unless you explicitly tell the compiler not to. If the same template is used with the same type in many files, the compiler repeats the same work in each of them. The linker then has to clean up the compiler's mess and throw out all but one copy of each duplicated instantiation, so that the program ends up with the single definition the one definition rule (ODR) calls for.

What can we do about it?

Assuming that templates are necessary in the situation you're using them in (in my case I believe they are), there are some easy things we can do for large reductions in compilation time.

If you know which types you are going to use with your templates the majority of the time, declare those instantiations as extern in the header that defines the template, and explicitly instantiate them once in a single translation unit. Let's use a Pixel3 strong type as an example.

// In file pixel3.hpp

#include <cstdint> // uint8_t, uint16_t

template<typename T>
class Pixel3 {
  public:
    T b;
    T g;
    T r;
};

When working with pixel data, there's a finite number of types we can expect T to be, mainly primitive integral and floating point types. The code I work with tends to use only 8-bit integers, 16-bit integers, and the two floating point types. Instead of making the compiler repeat work, let's make its life a little easier:

// at the bottom of file pixel3.hpp, after the Pixel3 class implementation

extern template class Pixel3<uint8_t>;
extern template class Pixel3<uint16_t>;
extern template class Pixel3<float>;
extern template class Pixel3<double>;

using Pixel3b = Pixel3<uint8_t>;
using Pixel3us = Pixel3<uint16_t>;
using Pixel3f = Pixel3<float>;
using Pixel3d = Pixel3<double>;

extern template tells the compiler not to instantiate Pixel3 for any of the four types above in a translation unit that includes pixel3.hpp, and instead to trust that the linker will find those definitions elsewhere. I also define some aliases to save on typing in the future. Then in one single file we can tell the compiler to generate the definitions:

// in file pixel3.cpp
// don't forget to compile and link this file!

#include "pixel3.hpp"

template class Pixel3<uint8_t>;
template class Pixel3<uint16_t>;
template class Pixel3<float>;
template class Pixel3<double>;

And that's it! The compiler now only performs the work of generating our four Pixel3 types once. This would also be a good place to add a few static_asserts if your strong types have specific size and/or alignment requirements.
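
Here is a sketch of how the pieces fit together, collapsed into a single file so it can be compiled on its own. In a real project the three marked parts would live in separate files. The member function max_channel() and the free function brightest() are hypothetical additions, and the layout requirements in the static_asserts are assumptions that hold on mainstream platforms rather than guarantees. One caveat worth knowing: extern template does not suppress inline functions, and Pixel3 as shown earlier has no member functions at all, so the savings show up once the template has members defined outside the class body.

```cpp
#include <cstdint>
#include <type_traits>

// ---- would live in pixel3.hpp ----
template <typename T>
class Pixel3 {
  public:
    T b;
    T g;
    T r;

    // Hypothetical member, not part of the original class. It is defined
    // outside the class body so that it is not implicitly inline.
    T max_channel() const;
};

template <typename T>
T Pixel3<T>::max_channel() const {
    T m = b;
    if (g > m) m = g;
    if (r > m) m = r;
    return m;
}

// Explicit instantiation declarations: files that include the header do not
// instantiate max_channel() for these types themselves.
extern template class Pixel3<std::uint8_t>;
extern template class Pixel3<float>;

using Pixel3b = Pixel3<std::uint8_t>;
using Pixel3f = Pixel3<float>;

// ---- would live in any .cpp that includes pixel3.hpp ----
// Hypothetical caller; it relies on the instantiation from pixel3.cpp.
std::uint8_t brightest(const Pixel3b& p) {
    return p.max_channel();
}

// ---- would live in pixel3.cpp ----
// Explicit instantiation definitions: the members are generated here, once.
template class Pixel3<std::uint8_t>;
template class Pixel3<float>;

// Assumed layout requirements, checked once next to the instantiations.
static_assert(sizeof(Pixel3b) == 3, "Pixel3b should be three packed bytes");
static_assert(sizeof(Pixel3f) == 3 * sizeof(float), "Pixel3f should have no padding");
static_assert(alignof(Pixel3f) == alignof(float), "Pixel3f should align like float");
static_assert(std::is_trivially_copyable<Pixel3f>::value, "Pixel3f should be safe to memcpy");
```

If one of the explicit instantiation definitions is left out, the file still compiles, but any caller of max_channel() for that type fails at link time with an undefined reference, which is a handy reminder to keep the two lists in sync.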

Measuring Elapsed Time in C and Rust

C: <time.h>

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

enum { v_size = 100000000 };

void long_function(void) {
  double* v = malloc(v_size * sizeof(double));

  if (!v) {
    fprintf(stderr, "Memory allocation failed!\n");
    exit(1);
  }

  // initializing array to all 55's
  for (int i = 0; i < v_size; i += 1) {
    v[i] = 55;
  }

  // performing the transformation
  for (int i = 0; i < v_size; i += 1) {
    v[i] += 1;
  }

  free(v);
}

int main(void) {
  // current time measured in seconds relative to an epoch
  time_t start_calendar = time(0);

  // processor time consumed by the program so far, in clock ticks
  clock_t start_clock = clock();

  // new in C11, returns calendar time in seconds and nanoseconds based on a
  // given time base
  struct timespec start_timespec = {0};
  timespec_get(&start_timespec, TIME_UTC); // returns 0 on failure; the check
                                           // is omitted here for brevity

  long_function();

  time_t end_calendar = time(0);
  clock_t end_clock = clock();
  struct timespec end_timespec = {0};
  timespec_get(&end_timespec, TIME_UTC);

  // computes the difference between two time_t calendar times in seconds
  double calendar_diff = difftime(end_calendar, start_calendar);
  // difference between two clock_t using CLOCKS_PER_SEC
  double clock_diff = 1.0 * (end_clock - start_clock) / CLOCKS_PER_SEC;
  // difference between two timespecs; the nanosecond part can come out
  // negative, so borrow one second from the seconds part when it does
  double timespec_sec_diff =
      difftime(end_timespec.tv_sec, start_timespec.tv_sec);
  long timespec_nanosec_diff = end_timespec.tv_nsec - start_timespec.tv_nsec;
  if (timespec_nanosec_diff < 0) {
    timespec_sec_diff -= 1.0;
    timespec_nanosec_diff += 1000000000L;
  }

  printf("calendar_diff: %lf\nclock_diff: %lf\ntimespec_sec_diff: "
         "%lf\ntimespec_nanosec_diff: %ld\n",
         calendar_diff, clock_diff, timespec_sec_diff, timespec_nanosec_diff);
}

Rust: std::time

use std::time::{Instant, SystemTime};

// Function which presumably takes a while to execute
fn long_function() {
    let mut v = vec![55; 100000000];
    v.iter_mut().for_each(|x| *x += 1);
}

fn main() {
    // A system clock which uses your operating system's current time
    // This can go backwards!
    let start_system = SystemTime::now();

    // A monotonic clock, i.e. one whose readings never go backwards,
    // which makes it the right choice for measuring elapsed time.
    let start_instant = Instant::now();

    long_function();

    // Because the system clock can be adjusted and could
    // go backwards, this operation returns a Result.
    let duration_system = start_system.elapsed().unwrap();

    // there is no possibility of this occurring with durations from Instants
    let duration_instant = start_instant.elapsed();

    println!(
        "Time elapsed in long function: \nSystemTime: {:?}\nInstant: {:?}",
        // {:?} on a Duration picks a readable unit (s, ms, µs or ns) by itself
        duration_system,
        // converts to floating point seconds, see the docs for other
        // possible as_*() methods
        duration_instant.as_secs_f32()
    );
}