Learning to Listen to CppCon Talks
What is this?
A blog that's a slideshow so that I can present my thoughts in this form.
Click the buttons below to see the slides. Or use the drop-down above.
I can't be bothered to CSS well enough to keep those buttons in the same spot.
Also, some parts have no reason to be one slide, except that I was lazy. This might be better thought of as a collection of mini-posts around a sort of trend I'm seeing.
Alternate Titles
Things to Think About When Watching CppCon
What I Wish I Knew About Low Level Languages
How to Stop Worrying And Love Systems Programming
Trends in Programming Languages and Talks I Like
Actual Topics
This is what I mean to include:
  1. The End of Moore's Law
  2. SIMD
  3. Async
  4. Atomics
  5. Instruction Execution
  6. Better Hash Map Implementations
  7. Benchmarking
  8. Static Checking and Correctness
  9. Superior C Syntax
  10. Modules
  11. Compile-Time Programming
  12. RAII
  13. Linear Types
  14. Error Handling
  15. Dispatch
  16. Type Erasure
  17. Things I Wish I Could Talk About
I should mention: I sort of prattle on about topics in whatever depth I feel like. If you can read (x86) assembly, know something about object-oriented programming, and have messed around with C, I think you should be able to follow most of what I ramble on about. If you are ever confused, please note that it's more likely my lack of coherence, patience, clarity, or editing, than it is any fault on your part.
The End of Moore's Law
Moore's law was a rule of thumb declaring that the number of transistors on a chip would double every two years (every year, in its original 1965 form).
This has hit physical limits. In particular, CPU transistor density no longer doubles on schedule, and we've reached peak clock rates for the time being.
There are a few countermeasures: parallelism and alternate hardware. This is opening up the world of low-level programming where we can question how we're designing the interactions between the hardware and the programming language. We can look back again and ask: what are we doing? Are we doing the best we can? This leads to the next two big movements I'd like to talk about: SIMD and Async programming. It's worth noting that instruction execution is another important factor.
Two other noteworthy innovations are part of the SIMD story:
  • GPUs: great for SIMD, but really hard to use well for some tasks.
  • TPUs: GPUs specialized even further for ML. I do not know enough about these to discuss them.
SIMD
This is a parallelism paradigm that executes a single instruction on multiple datapoints in parallel.
This is what GPUs do and they're really good at it, but the more interesting movement is the fact that CPUs also execute SIMD instructions and many compilers will output them.
CPU SIMD is a strange sort of thing since it only supports certain operations on extra-wide registers: 128 bits classically, and 256 or even 512 with AVX. Yet this can be faster for large enough lists. Look into AVX and AMX for industry implementations of this.
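To make this concrete, here's a minimal sketch of mine (not from any talk) using the SSE intrinsics that any x86-64 CPU supports: it sums four floats per instruction instead of one.
#include <immintrin.h>

float sum(const float* xs, int n) {
    __m128 acc = _mm_setzero_ps();  // four float lanes, all zero
    int i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(xs + i));  // 4 adds in 1 instruction
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++) total += xs[i];  // leftover tail, one at a time
    return total;
}
For a long enough array, the vector loop does a quarter of the additions. Compilers will often auto-vectorize the plain loop into something like this anyway.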
SIMD Algorithms
This is more applicable to GPUs but brings up a lot of interesting points in SIMD programming.
Fundamentally, there is an interesting quirk, best expressed in PTX (NVIDIA's GPU assembly language).
.reg .pred p; // declare p a predicate register.
setp.lt.s32 p, i, n; // p = (i < n)
@p add.s32 i, i, 1; // i++
	    
Suppose p was not set, that i >= n. In this case, if all the GPU cores were running the same instruction in lock-step, what would happen?
Therein lies the catch: SIMD does not handle branching very well. In GPUs, this branch would lead to some threads running a no-op. If there was an else case:
.reg .pred p; // declare p a predicate register.
setp.lt.s32 p, i, n; // p = (i < n)
@p add.s32 i, i, 1; // if (p) i++
@!p add.s32 n, n, 1; // if (!p) n++
	    
Now, some threads will no-op in the first branch, and the rest will no-op in the second. This can be slow. It really depends.
This changes the way some algorithms can be approached. There are a lot of good algorithms for SIMD, particularly at the level of parallelization that a GPU can offer. Parallel reductions (summing) and bitonic sort are classics, but radix sort also exists. Graph algorithms on GPUs are even more challenging as questions about the memory layout are essential.
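As a taste, here's a sketch of the parallel-reduction idea in plain C++ (my illustration, not any particular talk's). Every iteration of the inner loop is independent, which is exactly what a GPU would spread across its threads.
#include <vector>

float reduce(std::vector<float> xs) {
    // Tree reduction: halve the number of active elements each pass.
    for (size_t stride = 1; stride < xs.size(); stride *= 2)
        for (size_t i = 0; i + stride < xs.size(); i += 2 * stride)
            xs[i] += xs[i + stride];  // these adds don't depend on each other
    return xs.empty() ? 0.0f : xs[0];
}
The whole sum takes O(log n) passes instead of one O(n) chain of dependent additions.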
Async
Another, simpler, more CPU-bound way to handle the fact that speed no longer doubles for free is the use of multi-threading. And one really interesting abstraction for it is asynchronous execution. Technically, the two are actually completely unrelated: we could run code in any wacky order (I see you, Haskell) without running multiple threads, and we could make the execution order seem "synchronous" (or sort of linear) even if it crosses through multiple threads. However, marrying the two concepts justifies asynchronous execution while simplifying multithreading.
This is also the culmination of a long movement that arguably started in JavaScript. JavaScript used asynchrony for web development (and a lot of low-level usage follows the same pattern). In JavaScript, you’d use async to load external resources so that the page could render and then wait for external requests. This was a slow and steady escalation (of which I’ve only used the first half, and only half-heartedly heard and read about the second): there was AJAX, then jQuery and promises, then Axios and Fetch, which finally reached compatibility with a native async/await syntax. This too may be best first seen in code:
async function loadStuff() {
    let someFirstThing = await loadThatFirstThing();
}
	    
If you’re familiar with promises, you’d expect that they may fail. In this case there are two possible resolutions, detailed here.
In either case, the notable thing is that this fits really well with "left-leaning" code. This means that code does not end up in some nested callback mess. You can see this with the use of the await keyword since it keeps the result on the left instead of inside a scope and (hopefully) an indentation level. However, there is an interesting complication that I do not know how JS addresses:
async fn loadStuff() {
    let someFirstThing = await loadThatFirstThing();
    let someOtherThing = await loadThatOtherThing();
}
	    
Here, in Rust (note my cheeky s/function/fn/g), the awaits will execute one after another. There are ways to join them so they will execute concurrently.
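Since the rest of this slideshow lives in C++, here's a hedged sketch of that join idea with std::async (loadThatFirstThing and loadThatOtherThing are the hypothetical loaders from above):
#include <future>
#include <string>

std::string loadThatFirstThing();   // hypothetical slow loaders
std::string loadThatOtherThing();

void loadStuff() {
    // Launch both immediately so the loads overlap...
    auto first = std::async(std::launch::async, loadThatFirstThing);
    auto other = std::async(std::launch::async, loadThatOtherThing);
    // ...then block on each result: this is the "join."
    std::string someFirstThing = first.get();
    std::string someOtherThing = other.get();
}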
This paradigm has some interesting implications that are related to ideas in RAII and with closures.
Atomics
Somehow I cannot help but think of Dune when I say Atomics.
In a way the Dune reference might be relevant: you could want to nuke your code with atomics. Use these as carefully as most people would use Dune atomics. Or, like most people in Dune, don’t.
Atomics are operations that cannot be divided: no thread can ever observe one half-done. This is useful for one big reason: multithreading.
#include <atomic>

template<class T>
class Locked{
public:
    Locked(T t): mutex_(false), locked_(t) {}
    T& ExclusivelyAccess() {
        bool f = false;
        // On failure, compare_exchange writes the value it saw into f,
        // so reset f to false before trying again.
        while(!mutex_.compare_exchange_strong(f, true)) f = false;
        return locked_;
    }
    void EndExclusiveAccess() {
        mutex_ = false;
    }
    bool IsHeld() { return mutex_; }
private:
    std::atomic<bool> mutex_;
    T locked_;
};
	    
This is a way to implement a lock. In theory. I have yet to test this (and I doubt I ever really will -- testing multithreading is really hard). A std::atomic<bool> handles the locking, ensuring that no other thread can sneak in between the check that the bool was false and the moment it is set to true. This makes the locking safe. Documentation on std::atomic. Be very careful with this: the built-in is great for small types that fit in hardware-based instructions. Otherwise you will see that the class will use locks and, honestly, it should.
These are important basic building blocks. I will (theoretically) implement an improvement on this in my discussion on RAII.
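For a sense of how the sketch above would be used (and misused):
Locked<int> counter(0);

void bump() {
    int& c = counter.ExclusivelyAccess();  // spins until we hold the lock
    c++;
    counter.EndExclusiveAccess();  // easy to forget -- hence RAII, later
}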
Execution Order
(I only know how Intel does it (sorta) but the point here is that there can be a lot more abstraction than you’re expecting.)
This has to do with pipelining which is an essential point in modern computing. Memory is slow. This is really worth its own slide. Or emphasis.
Memory is Slow
The CPU is roughly 100x faster than main memory, so there are two key countermeasures:
  • Caching
  • Pipelining
I’ll rant about caching later on. So this bit is about pipelining.
Pipelining is where the CPU loads the next instruction while executing the current one. This has a lot of interesting issues, particularly with branching, where there’s a whole branch prediction system I will discuss a bit later. But first: out of order execution. CPUs are really smart. Not just in how much they compute, but also in how they make sure they’re being fed instructions at all times. They pipeline, and then they go one step further and reorder the instruction stream itself.
But let me get to the main point as I discuss these things.
Meltdown: Attacking Out of Order Execution
This is much better covered in the paper about this attack.
So first, let me describe instructions. x86, a common commodity assembly language, is a CISC assembly. This means that instructions can sometimes decompose into multiple different ones. Consider something like char aznable = white_base[amuro][ray] (suppose that amuro and ray are ints and white_base is a char*[]).
;suppose %rdi is the address of white_base, %rbx = amuro, and %rcx = ray
movq (%rdi, %rbx, 8), %rdx
movb (%rdx, %rcx), %al
;now aznable is in %al
            
Technically that assembly is more for illustrative purposes than for real technical detail. This pair of instructions would expand into a few RISC instructions (RISC being the simpler style of assembly whose instructions really are single operations):
;suppose %rdi is the address of white_base, %rbx = amuro, and %rcx = ray
mult %rbx, %rbx, 8
add %rdi, %rdi, %rbx
load %rdx, %rdi
add %rdx, %rdx, %rcx
load %al, %rdx
;now aznable is in %al
            
The mult and adds are in the x86 too, just hidden in the addressing modes. The upshot (other than an assembly crash course that went faster than the red comet) is that CISC has to get turned into RISC (sort of -- it is probably more complicated and with different restrictions since the CPU does it).
Since the CPU does it, it can also be smarter about its own state, and reorder instructions. In particular, consider the following C:
#define CACHE_LINE_SIZE 1024
char probe[CACHE_LINE_SIZE * 256];
/* pointer_into_kernel_memory: a char* to an address we may not read */
char should_not_be_accessed = *pointer_into_kernel_memory;
char* for_cache = &probe[should_not_be_accessed * CACHE_LINE_SIZE];
            
Here, the CPU actually will probably schedule the multiplication and read that happen after the pointer read to run at the same time. This is sort of magical since it really speeds up the rate at which the CPU can execute instructions: it will hardly wait for the memory since it is preparing the subsequent operations.
However, the example above is more interesting. If pointer_into_kernel_memory actually points to an address the running process cannot read, the read will signal an error (SIGSEGV, for instance). If the program can recover from that, however, the Meltdown vulnerability found that the memory behind for_cache may already have been speculatively loaded by the time the exception occurred. This is not really an issue except that the cache will have changed (yes, I will talk about caching). By measuring which part of the probe array loads quickly, the attacker can deduce the value of should_not_be_accessed.
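The recovery step, sketched (time_access is a hypothetical helper that would read a cycle counter like rdtsc around one load):
int recover_secret(void) {
    int best = -1;
    unsigned long best_time = (unsigned long)-1;
    for (int guess = 0; guess < 256; guess++) {
        unsigned long t = time_access(&probe[guess * CACHE_LINE_SIZE]);
        if (t < best_time) { best_time = t; best = guess; }
    }
    return best;  /* the one slot still in the cache reads far faster */
}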
Spectre: Attacking Branch Prediction
This too is better explained in the paper.
This is another example of speculative execution. Earlier, I mentioned pipelining as an optimization on running programs, and just vaguely mentioned that branching is difficult. Here I will have to elaborate.
When the code branches, the CPU guesses what path the code will take and prefetches along that path. However, if the CPU guesses incorrectly, it has to rewind and run along the correct path. This can be slow, so, impressively, branch predictors have been tuned to be right roughly 95% of the time. Initially, I thought that this was obvious since the majority of branches would be jumping to the top of a loop. However, it is more interesting to consider that the branch predictor may also have to reason about returns. And here I was thinking that 50% was good.
However, when the CPU is wrong, it may not reset the cache when it goes back to execute the branch. This can be interesting if the instructions set up the cache so that:
  1. Some secret data is in the cache
  2. Data related to the branching is not in the cache
  3. The CPU has been mistrained to think that the branch will go one way
Then, the cache might incorrectly have some data that reveals a secret.
The Upshot
The point of including this, particularly in a menagerie of rants about various things "relating to systems programming" is to point out that despite being able to understand assembly, you likely do not know what the computer is really doing. That how the CPU works is really, really hard to reason about. Simply, the point is: keep questioning it -- you don’t understand it.
Better Hash Maps
This is really a rant about my education, but brings up some useful points. I hope to whine about Hyrum’s law and benchmarking enough to make this more than just complaining that I did not understand a hash map implementation from back when I was in data structures.
If you read the last slide (woah you’re still reading?! OK, sorry for the Gundam references.) you might be waiting for me to talk about caching. Well, this is that bit.
Hashmap Crash Course
Hashmaps are the bread and butter of fast storage and retrieval. The only places that shirk them have other constraints that force them to do so. If you can use a hashmap, it’s likely that you should. It gives you this API:
  • Insert: puts something in the hashmap. This "something" is usually a "key" and the "value" it maps to. Average O(1).
  • Update: updates something in the hashmap, usually the new "value" an existing "key" maps to. Average O(1).
  • Get: retrieves the value attached to the key. Average O(1).
  • Delete: removes the key and value that were in the hashmap. Usually this returns the value and sometimes also the key. Average O(1).
If you do not understand this, I should apologise and admit I am not too great at explaining how to use these wonderful data structures since I kind of got used to them before properly understanding how they worked. I think they are difficult to motivate if you are particularly inexperienced in coding. If you have coded much, you might have found yourself needing to relate pairs of things, and maybe this sentence helped something click. I can only hope.
Implementing Hash Maps
The rightmost column in the table above should look somewhat magical: O(1) for everything does sound like a scam. The catch is that this is the average case: the worst case for all of those operations is O(n). The secret sauce is called hashing. You project values into 64-bit integers (nowadays) and use those integers to look in an array.
Hashing is not quite that simple, though, since there is one problem common to all implementations of hashmaps: not all values of every type can fit into 64 bits. (Consider strings, particularly those with more than 8 characters.) This leads to a problem with the hashing: no hash function can be injective (there will be two keys k1 and k2 where hash(k1) = hash(k2) despite k1 not being the same as k2). Distinct keys that hash to the same value are called collisions, and they would be placed in the same spot in the array in a hashmap. Implementations tend to differ on how to handle this. In particular:
  • Chaining: placing keys and values into a linked list in the relevant array slot
  • Linear probing: placing collisions into the next spot on the array
A First Glance: Is Chaining Better?
Initially, it seems that the chaining approach would be better: since every collision is in an independent linked-list, one collision does not affect any others, and deletions do not make look up too difficult. Consider this sequence:
  1. insert(k1)
  2. insert(k2) (where k2 collides with k1)
  3. insert(k3) (where k3 hashes "next to" k1 and k2)
  4. delete(k2)
Naive implementations of linear probing could be stuck: once k2 is removed, for correct behavior with k3 you’d either have to move k3 down to where it "belongs" or look through the entire array. This, however, can be alleviated with a simple fix: have a "tombstone" state to mark the ghosts of collisions past. This way, only a few lookups would have to perform a (partial) linear search through the array.
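A minimal sketch of that lookup/delete dance (fixed capacity, int keys, no insert or resizing -- just enough to show the tombstone rule):
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

struct Slot {
    enum State { kEmpty, kFull, kTombstone } state = kEmpty;
    int key;
    int value;
};

struct ProbedMap {
    std::vector<Slot> slots{64};

    std::optional<int> get(int key) {
        size_t i = std::hash<int>{}(key) % slots.size();
        // Stop at kEmpty; a tombstone means "keep looking past me."
        while (slots[i].state != Slot::kEmpty) {
            if (slots[i].state == Slot::kFull && slots[i].key == key)
                return slots[i].value;
            i = (i + 1) % slots.size();
        }
        return std::nullopt;
    }

    void erase(int key) {
        size_t i = std::hash<int>{}(key) % slots.size();
        while (slots[i].state != Slot::kEmpty) {
            if (slots[i].state == Slot::kFull && slots[i].key == key) {
                slots[i].state = Slot::kTombstone;  // a ghost, not a hole
                return;
            }
            i = (i + 1) % slots.size();
        }
    }
};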
This leaves the question of which one is actually better: it may still seem like chaining is better since it is not only simpler, but has slightly stronger guarantees: you can be a little more sure that collisions will be contained to the array slot (commonly called bucket).
Caching: Why Linear Probing is (Sort of) Better
Despite the simplicity of chaining, there is an implementation called a dense hashmap which turns out to be better (sometimes). There are a few key observations that help. In particular, it’s a lot rarer for users of hashmaps to delete things. Usually, hashmaps are built in a function and then queried a bunch, with perhaps more insertions along the way. This can be intuited from implementations of graph traversals and algorithms. This benefits linear probing with tombstones since its worst case is rarer. But there's more, since, recall:
Memory is Slow!
This is where the CPU does something smart: it prepares for memory I/O with layers of caching. Each time it reads memory, it pulls the surrounding memory into a cache, since programs tend to access nearby memory (spatial locality) and the same memory again soon (temporal locality). Linked lists ruin this since it’s hard to ensure that the nodes will be close enough for the cache to suck in adjacent nodes. Arrays, however, work really well for this. Traversing an array can be a lot faster. Another advantage is that the arrays could be traversed via SIMD instructions, particularly when probing for occupied slots.
This can be further optimized as the flat_hash_map. This can even be further optimized to incorporate chaining, but we end up reaching an important (tangential) discussion: theorizing only helps so much. At some point, you should just benchmark. Your dataset might have some biases, your CPU, or the memory layouts you end up with could vary. You do not actually know what will perform better until you measure it.
Benchmarking
Not to imply that I have actually done any of this. But do as I say, not as I do.
I hope the last two slides have made it clear that computers are a lot more complicated than you would think. And even then, they are hard to predict. It is definitely odd that the interaction of small predictable rules leads to unpredictable behavior. But such is life. So if you want to know about speed and be sure, benchmark.
The catch is that getting reliable benchmarks is hard. The behavior of various data structures and algorithms depends really heavily on the data you run them on. Sorting is a really important special case of this since it has been studied very heavily.
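For flavor, here's roughly what a micro-benchmark looks like with Google Benchmark (one popular harness; the synthetic sequential keys here are exactly the kind of dataset bias to worry about):
#include <benchmark/benchmark.h>
#include <unordered_map>

static void BM_MapLookup(benchmark::State& state) {
    std::unordered_map<int, int> m;
    for (int i = 0; i < state.range(0); i++) m[i] = i;
    int probe = 0;
    for (auto _ : state) {
        benchmark::DoNotOptimize(m.find(probe));  // keep the lookup alive
        probe = (probe + 1) % state.range(0);
    }
}
BENCHMARK(BM_MapLookup)->Range(8, 1 << 16);
BENCHMARK_MAIN();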
Static Checking, Correctness...
...and the worst transition in topics ever.
Now for something completely different.
The previous topics were sort of granular, small details. 細かい. They vary by CPU and detail precisely how the computer executes commands and what sorts of tricks it uses. The rest of these slides are really one abstraction layer above: they detail programming language design decisions and how they interact with programming. This can be seen as a lot broader, more opinionated, and (sorta) a move from science to mathematics.
The discussion of programming languages does have measurable hypotheses and long-term experiments, but also tends to lead to less data-dependent and more mathematical facts. Furthermore, it really helps to reason about the compiler as a theorem prover (particularly not for C++). These facts assist coding since when code behaves like math, it tends to be error (and weirdness) free. Once code is weirdness-free, it also becomes easier to optimize. (I will define "weirdness" as it is relevant.)
A key takeaway is that there is a difference between static and dynamic certitude that a program is correct. The point is that static assurances apply to the code and dynamic assurances would apply depending on the data.
Hence, there is an important point to be made about static guarantees: they assure you regardless of the data. This can protect your code from security vulnerabilities and crashes without the need of tests. (Though you should still have unit tests for some sweet dynamic guarantees too.)
This is made even better by the fact that static analysis tools are getting better. They understand more errors and patterns in larger programs. Compilers are getting smarter too.
Superior C Syntax
This is a trend I have seen in quite a few programming languages that starts with this line of C/C++ code: int* a, b; and asks what the type of b is. Here, many people will have thought that b is an int*, whereas it is really an int.
Furthermore, consider typedef void* (*alloc[2])(int,int); or any other weird alias of a function pointer type. This is a large complication, to the point where there is a website to decipher it. (Exercise: guess what I was trying to forward declare. Suggest some (useful) values of that type.)
That’s one of the simpler surface-level issues with C: its syntax makes it easy to run with scissors. Nowadays, having the type after the variable name has become more popular, so let a, b: *int would make both a and b pointers to integers. (Actually, I would hope that the notion of an int of unspecified width also vanishes and we move to stating the size in the type like Rust’s i32. I also like that the unsigned keyword can be replaced by a (in my opinion) cleaner u prefix.)
Bigger Critiques of C
The above points are important: they would reduce the bugs in code and make the learning curve for some programming languages less steep. But there are bigger fish to fry. And that is mostly what the next sections discuss. I leave you with this link: to a really good writeup.
The most compelling critique to me is the small but growing crop of replacement languages:
  • Zig
  • Odin
  • Jiyu
(All are pretty small still, but they were made for a reason.) Some trends are:
  • Immutability by default.
  • Support for Generics.
  • (Sometimes.) Closures.
  • (Sometimes.) RAII or destruction.
  • Better error handling either with panics or a result sum type
  • Enums being tagged unions (or at least namespaced)
  • Smarter handling of integer overflows or underflows (depending on compilation modes sometimes)
Modules
using namespace std;
Not really: do not use that, and C++ namespaces are not modules.
This is (ok, also) at two levels: the C/C++ (and COBOL) use, and how C++20, Rust, and various newer languages do it.
Header Files Suck
I’m going to say some things and if you recognise them, it will hurt.
ODR Violation
Static Initialization Of a Global
Include Guard
Makefile (and automake)
Hurt yet? Had enough? (I had enough of my own CSS as well as these issues.)
What if I told you you were doing something COBOL does? COBOL provides the most egregious and striking parallel to C/C++’s inclusion system, in my opinion.
In COBOL, the included file is called a copybook. And it is literally copied into the file. But wait, there’s more:
COPY some-procedure IN some-library REPLACING ==:TAG:== BY ==some-example==
	    
This is explained in more detail in IBM’s manual.
My point is that there was a time when copying files in and replacing text was the cutting edge. But modules mean a lot more now. There are some invariants we would like that simple inclusion does not provide. In particular, I would like:
  • Order of imports not to matter
  • Modules inside modules
  • Ways of specifying access and aliasing
  • Everything being in a module
This makes a lot of issues go away since name collisions will be harder to contrive, and build tools can be significantly simpler since they do not have to worry about the order in which they build things. Programmers would have all the conveniences they need too. But I have a second point:
Modules: The Real Encapsulation
OOP is dead. Well kinda. That is the hot new thing to say. Why you say it and what you really mean does not matter, just like how it never mattered what you said when you "did OOP." But there was a rule in OOP that is quite interesting, and, to some extent important: encapsulation.
Encapsulation is the idea that drivers need not know how engines work. It is the division between understanding what something does and understanding how it works. In code, you do not have to look at the innards of a library to use it. An object would not reveal its fields.
This has been implemented in a sort of mantra that plagues Java code:
public class Point3D {
  private int x, y, z;

  Point3D(int x, int y, int z) {
    this.x = x;
    this.y = y;
    this.z = z;
  }

  //Drumroll please...
  public int getX(){ return x; }
  public void setX(int x){ this.x = x; }
  public int getY(){ return y; }
  public void setY(int y){ this.y = y; }
  public int getZ(){ return z; }
  public void setZ(int z){ this.z = z; }
}
	    
This is not intended to critique Java as a programming language, but the lack of understanding of the point of encapsulation. The Point3D encapsulates nothing: it is just three integers and a smidge of semantics. It would be deceptive to add any semantics over this class -- it would lose its point. Philosophically, the death of OOP means that coupling data and behavior is no longer fashionable. This moves the responsibility of encapsulation away from the structures that store data (classes, structs, records, or whatever).
Modules can manage the encapsulation instead: instead of objects hiding fields in themselves, type erasure (stay tuned!) and simply not exposing the type would help. This can extend to functions, traits (interfaces, concepts, type families, and so on), and even sub-modules.
To me, this is better for two reasons: classes do not end up becoming namespaces in some strange way, and the access specifications are simpler since there are only two and it is easier to trace what is being used where.
Compile-Time Programming
Because everything becomes O(1), of course.
This is also called metaprogramming. That is really what I shall focus on (particularly in the cases (Lisps) where there is not necessarily a compiler involved).
Macros
C/C++ have a notion of preprocessor macros. Lisp has something different (as does Rust). Macros are perhaps the simplest metaprogramming: they quite transparently transform code into code. This can get messy, though.
C/C++ preprocessor macros are notorious for being a headache. Consider this:
#define SQUARE(x) x*x

double vecNorm2D(Vec2D* v) {
  return sqrt(SQUARE(v->x) + SQUARE(v->y));
}
	    
Looks cool, right? SQUARE might seem like it is acting as a sort of "function" that, at compile-time, puts in the code to square something. Seems awesome, but this has drawbacks. A whole list of them:
  • The lack of type safety
  • The weirdness of precedence rules (consider that SQUARE(x + 1) computes x + 1 * x + 1)
  • The silliness of blind text substitution (if you #define PI 3.1415, every later use of the identifier PI becomes 3.1415, regardless of what scope or namespace it was supposed to live in)
  • The lack of hygiene (consider #define INVENT(v) int x = v; which dirties the namespace where it is called to include some variable)
These can be mitigated by good conventions in C/C++ and Lisps. (Clojure, for instance, has gensym to ensure that values do not pollute the namespaces.) Rust is stricter, not only in limiting the complexity of macros (making them more local AST transformations unless you implement a procedural macro which is really Rust code that acts on a compiler-provided token stream), but also in the capability since macros cannot (yet) define values or implement interfaces of types.
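The C/C++ convention, for the record, is to parenthesize everything:
#define SQUARE(x) ((x) * (x))
/* SQUARE(x + 1) now expands to ((x + 1) * (x + 1)). It still evaluates
   its argument twice, though, so SQUARE(i++) remains a trap. */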
Some languages use these ideas for conditional compilation too. This is important in cases where programs are compiled to run on different systems. Different systems may label things differently or simply include or exclude features. I conflate these since the C/C++ and Rust implementations of this (which are the only ones I know) use conditionals that can look similar to the macro invocation. Rust also integrates this with its build system, though, which is an essential difference (OK, if you wanna brave -D flags, you could say C/C++ build systems ("systems") include it).
GADTs And Stuff
GADTs are another way to get the compiler to do a lot of work for you. There is a different focus here, though, since GADTs are not directly about code generation. GADTs are special types that specify other types in their constructors. They can change behavior depending on the types they wrap. That is, they really start to look at generic types as functions into types, allowing the result to vary based on any input.
This is perhaps the more complex, theoretically rigorous way to understand Rust's const generics, C++'s non-type template parameters, and C++'s template specialization, and I will cover a simple view on them later, but if it was not already apparent to you, I like the sight of my own typing (like I like the sound of my own voice at times), so I shall have a section on GADTs anyway.
I'm so serious about how skippable this is that I will just let you skip: just collapse the content here.
Since the theory cat is out of the bag (not Schrödinger’s cat, my theoretical cat), I should also admit that I will use Haskell for the time being. Since you may have realized that my explanations are impatient gobs of unedited text, you may want to skip this or read a little on GADTs in Haskell before proceeding. I will develop the example that introduced me to GADTs, not the one discussed in the Haskell wiki. Before that, however, I want to detour into HoTT.
HoTT Stuff
HoTT means "homotopy type theory," and most of the time, the "type theory" part is the more important part for programming. The "homotopy" part only tangentially comes up and is brought up here mostly for a pun in the title (though technically, there's some refls that Haskell will need later on, so may be there's kinda homotopy?).
So we can reduce this to a discussion of just type theory. (Sort of meaning any intuitionistic (I think) type theory.) There are some really beautiful bits I already ranted about, but for GADTs, I'd like to draw an equivalence to one thing: product types. Not the garden-variety AxB stuff, but the type theoretic big pi-notation, that is, \prod_{x \in X} A(x). Normally, we programmers are weirdos who work with \prod_{x \in \mathcal{U}} A(x), which you need a nested universe structure for and doesn't really come up in proofs. But consider more: consider what you could do with something like \prod_{l \in Nat, T \in \mathcal{U}} List(l, T). If you know C++, this is std::array. Except, the compiler really reasons about the l and assures operations. Let’s work through one.
This is mostly for my ego (and own future reference) so feel free to skip.
Now that you have two disclaimers, let me steamroll through some Haskell. I'm going to focus on the data declaration with GADTs first. Normally, data begins the declaration of a sum type (tagged union, discriminated union, and what Rust calls an enum). Hence, saying something like data GoodWeekday = SATURDAY | FRIDAY is declaring an enum of two days. To add a generic, we put it before the = and use it after, like: data Option a = Some a | None (that being Rust's wording of Haskell's Maybe). This still is not the product type. For that we need more and I'm going to just jump in (adding a comment or two to help you follow).
{-# LANGUAGE DataKinds, KindSignatures, GADTs,
  TypeFamilies, PolyKinds #-}
-- DataKinds lets us define our own kinds (types of types)
-- KindSignatures allows the * -> * sort of notation,
-- TypeFamilies let us express functions on our kinds,
-- PolyKinds lets us express functions about our kinds,
-- and GADTs adds... well, GADTs.

-- A kind for natural numbers. ZNat is 0, SNat is the next one.
data Nat :: * where
  ZNat :: Nat
  SNat :: Nat -> Nat

-- A fixed length vector. 
-- This takes the length as the second parameter.
-- The property of Cons producing a vector one longer is key.
data FixedVec :: * -> Nat -> * where
  Empty :: FixedVec a ZNat
  Cons :: a -> FixedVec a n -> FixedVec a (SNat n)
	      
Since "kinds" came up, I'll mention that they're types of types: the higher universe we need when talking of functions on types. I like to think of it as the requisite Russell's paradox dodging you always need in a theoretical foundation of mathematics: your foundational object can't contain all instances of itself (sorta, usually). We can't have the type of all types, so we have the kind (*) of all types.
-- Small complaint: it was hard to find the right syntax.
type family Plus (x :: Nat) (y :: Nat) :: Nat
type instance Plus ZNat     m = m
type instance Plus (SNat n) m = SNat (Plus n m)

-- GHC just figures this out.
fconcat :: FixedVec a n -> FixedVec a m -> FixedVec a (Plus n m)
fconcat Empty       ys = ys
fconcat (Cons x xs) ys = Cons x $ fconcat xs ys
	      
A lot of what follows is thanks to this SO post.
-- Brings in the :~: type-level equality
import Data.Type.Equality
		
-- Brings the Nat kind into the type universe.
data ReifyNat (n :: Nat) where
  Zeify :: ReifyNat ZNat
  Seify :: ReifyNat n -> ReifyNat (SNat n)

-- Brings the vector lengths into the type universe.
reifyVecLen :: FixedVec a n -> ReifyNat n
reifyVecLen Empty       = Zeify
reifyVecLen (Cons _ xs) = Seify $ reifyVecLen xs

-- The proof of 1 + (x + y) = x + (y + 1).
-- Base case: 1 + (0 + 0) = 0 + (0 + 1)
sumSuccRight :: ReifyNat x -> ReifyNat y
  -> (SNat (Plus x y)) :~: (Plus x (SNat y))
sumSuccRight Zeify     Zeify     = Refl
sumSuccRight (Seify n) Zeify     = apply Refl
  $ sumSuccRight n Zeify
sumSuccRight Zeify     (Seify n) = apply Refl
  $ sumSuccRight Zeify n
sumSuccRight (Seify m) (Seify n) = apply Refl
  $ sumSuccRight m (Seify n)

-- GHC needs that (1 + (x + y)) = (x + (y + 1)).
-- You'll note that that's not how we defined +.
fdup :: FixedVec a n -> FixedVec a (Plus n n)
fdup Empty       = Empty
fdup (Cons x xs) = case
  sumSuccRight (reifyVecLen xs) (reifyVecLen xs) of
    Refl -> Cons x (Cons x (fdup xs))
	      
The inductive cases above are non-trivial and this snippet brings in homotopies. The :~: type (yeah, TypeOperators extension go brr) is the identity type that much of HoTT fixates on. It has one constructor called Refl. This Refl is the compiler marking that it can show that two types are equal. The other non-trivial thing is the use of apply. I'm going to butcher this with words when a diagram (in fact "geometric" intuition) would really help. apply takes two equal functions and two equal arguments and proves that the outputs of the functions on the arguments are equal. Hence, the inductive cases are really saying:
  1. Base case: 1 + (0 + 0) = 0 + (0 + 1).
  2. f(x) = x + 1 equals itself. This isn't directly said, but pronounced as Refl since it is the fact that SNat = SNat
  3. 1 + ((n + 1) + 0) = (n + 1) + (0 + 1), using 2, with n = 0 reducing to the base case.
  4. The next part of the function counts the other parameter down, showing 1 + (0 + (m + 1)) = 0 + ((m + 1) + 1) from the same base case (using apply thanks to 2).
  5. The final case counts down the first parameter, noting the same fact as 3, but for m (second parameters) more than 0.
That about wraps up this skippable type theory side-rant (that's probably reaching the length of my original type theory rant). Here are some links: the whole Haskell file and a similar thing in Rust (note that the Rust one needs some massaging for the correct lifetimes -- the simplest thing to do is make everything needed a 'static, but that's not useful enough for me to actually want to put it as my final version). This is one way to look at the next section and I really hope more theoretically inclined people actually unify this GADT approach with compile-time programming as we know it in C++.
Actually, there's a different side-rant I'd like to give before discussing constexpr. Again, feel free to skip.
TMP does not GADTs Make
TMP (template metaprogramming) is the C++ construct that is a small functional programming language with pattern matching for conditionals and recursion (yes, it's Turing complete) that simplifies some C++ code generation. I'll spare you an example. All I mean to say here is that formally, TMP does not add GADTs into C++. (Concepts in C++20 might, but I cannot be too sure: since they don't have to be complete, they might not reach the same theorem-proving strength.)
There are a few explanations for this. The one I personally found is that the C++ compiler does not prove any facts about the templates it sees: it merely stashes them for later use. You can put in some impossible nonsense in a template and have a compiling C++ program. The errors arise when, in the buildable unit, there is a use of the template (an instance) that the compiler cannot understand. The compiler tries to fill out every template for each of its instances and only raises errors when this "filling out" (expansion, instantiation) fails. GHC does not do that with GADTs: it actually ensures that the Nats in the FixedVec, despite being unused, actually are the correct Nats each time. Rustc does the same thing with the PhantomData marker and phantom types (where you specifically make types stronger than their representations as seen by the CPU).
constexpr and Compile-time Programming
It's strange to basically repeat the title of the slide so close to the end. Cinema sins voice: roll credits. But, there's no better way to express this. This type of programming does not (yet) emphasize code generation or verification. This really ends up filling in some gaps that low-level programming needs: easier initialization and computation of constants. The essential idea is that some constants need a little nudge to be ready for the lifetime of the program. Something like the list of the first n primes or some other product. Even complex types like a lookup table (or strings or a lookup table of strings). There are a few more complications: such as how many copies of the constant value the compiler may create (see C++'s inline constexpr), and the fact that constexpr functions and constructors can be run at runtime.
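Here's a minimal sketch of that primes example (mine; C++17 is enough for this):
#include <array>

constexpr bool is_prime(int n) {
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return n > 1;
}

template <int N>
constexpr std::array<int, N> first_primes() {
    std::array<int, N> out{};
    int found = 0;
    for (int candidate = 2; found < N; ++candidate)
        if (is_prime(candidate)) out[found++] = candidate;
    return out;
}

// Baked into the binary; zero work at runtime.
inline constexpr auto kPrimes = first_primes<10>();
static_assert(kPrimes[9] == 29);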
There are other instances of this. Really, compilers do this more than just when you tell them to: there's an optimization called constant propagation that does this sort of thing as much as possible. Furthermore, other programming languages have equivalents to constexpr: Zig has comptime as a keyword for function parameters known at compile time (these serve as means for adding generics too), const generics are coming to Rust to complement const fns. It's worth noting that this is the simpler way to reason about array types too (that really should be qualified by their lengths).
RAII
Resource Acquisition is Initialization
What a wonderful phrase
Resource Acquisition is Initialization
Ain't no passing craze!
It means no worries, for your resources.
It's our problem-free philosophy,
Resource Acquisition is Initialization
It really is. The incantation doesn't sound like Hakuna Matata, but if you're used to it, it will feel like that. This originates in C++, so my discussion will be focused on that. Rust has the same concept too, but it's easier for me to explain it in C++. In C++, when you declare a variable, you allocate memory for it. Consider:
Hyena ed, banzai, shenzi;
if (be_prepared()) {
  ed = banzai;
} else {
  ed = shenzi;
}
	    
Besides being a nonsensical film reference, this snippet demonstrates something important about C++: the default constructor of ed was run -- hence, it was allocated. This could hold some resources (not that hyenas held much in the film). And there was only one way to get these resources: initialization. In a good API, the way to get a resource would be to initialize a handle. Then when that handle is destroyed (like any good leak-free program must), the destructor can release the resource. Consider (as promised earlier):
#include <atomic>
#include <utility>  // for std::exchange

template <class T>
class Locked {
 public:
  class HeldValue {
   public:
    HeldValue(T* h, std::atomic<bool>* m) {
      bool f = false;
      // On failure, compare_exchange writes the value it saw into f,
      // so reset f to false before trying again.
      while (!m->compare_exchange_strong(f, true)) f = false;
      held_ = h;
      mutex_ = m;
    }

    // For more on this, see below
    HeldValue(HeldValue&& other)
        : held_(std::exchange(other.held_, nullptr)),
          mutex_(std::exchange(other.mutex_, nullptr)) {}
    HeldValue& operator=(HeldValue&& rhs) {
      if (this != &rhs) {
        held_ = std::exchange(rhs.held_, nullptr);
        mutex_ = std::exchange(rhs.mutex_, nullptr);
      }
      return *this;
    }

    T& operator*() { return *held_; }
    T* operator->() { return held_; }
    // A moved-from HeldValue owns nothing, so it must not unlock.
    ~HeldValue() { if (mutex_) *mutex_ = false; }

   private:
    T* held_;
    std::atomic<bool>* mutex_;

    // We will only move and construct with 2 params.
    HeldValue& operator=(const HeldValue& rhs) = delete;
    HeldValue(const HeldValue& h) = delete;
    HeldValue() = delete;
  };

  Locked(T t) : mutex_(false), locked_(t) {}
  HeldValue Lock() { return HeldValue(&locked_, &mutex_); }
  bool IsHeld() { return mutex_; }

 private:
  std::atomic<bool> mutex_;
  T locked_;
};

void useLockedThingy(Locked<int>* p) {
  Locked<int>::HeldValue grabbed = p->Lock();
  (*grabbed)++;
}
	    
I think it's important to emphasize how this works. So here's (roughly) the assembly, since that's how I'd like to explain it. (And gcc.godbolt.org is really cool.)
useLockedThingy(Locked<int>*):
        mov     edx, 1
.L3:
        xor     eax, eax
        lock cmpxchg BYTE PTR [rdi], dl
        jne     .L3
        add     DWORD PTR [rdi+4], 1
        xor     eax, eax
        xchg    al, BYTE PTR [rdi]
        ret
	    
This is really amazing: you'll notice that the notion of the HeldValue entirely disappears: all the functions that were called get inlined to just the five bytes that are the int and the bool as the values pointed to by rdi.
The most wonderful thing, though, is that despite having no increased cost, the right thing happens: the xchg al, BYTE PTR [rdi] releases the mutex.
RAII doesn't just apply to mutexes and memory, but files, sockets, database connections, and pretty much anything that you'll cease to need at a deterministic time.
In the memory, unmanaged memory,
Resources sleep tonight.
Run destructors, clean up garbage,
Resources sleep tonight.
Afterword: the rule of zero and why C++ is difficult.
I like yakking, but this video covers this stuff better and introduced me to the rule of zero.
RAII is great for simplifying correctness guarantees in C++, but it isn't the whole story. It's actually more complex given the semantics of C++ objects. (Note: this doesn't apply to Rust, C, and probably most other languages.)
Suppose I have some really simple C++ code. Like a few characters: struct Basic { int x; };. This actually produces (at least, something roughly like):
struct Basic {
  int x;

  // constructors, 
  Basic() {}  // note: this leaves x uninitialized
  Basic(int x_) : x(x_) {}  // (modeling aggregate initialization)
  Basic(const Basic& other) { x = other.x; }
  Basic(Basic&& other) { x = other.x; }

  // assignment operators, 
  Basic& operator=(const Basic& other) {
    x = other.x;
    return *this;
  }
  Basic& operator=(Basic&& other) {
    x = other.x;
    return *this;
  }

  // and a destructor.
  ~Basic() {}
};
	    
These constructors actually end up enabling different things like default construction (like Basic b;), assignment (Basic c, b(5); c = b; uses it in the second statement, and you'll note that the Basic& return type allows for chaining, like a = b = c;), and copy-construction (Basic b = c; or Basic b(c);). The destructor is also there in case the class holds any resources.
Now that I've introduced a lot of non-basic stuff in our seemingly Basic struct, I should mention why: I got this wrong in my first writing of HeldValue above. You see, without deleting the copy constructor, this is possible: Locked<Basic>::HeldValue h = something.Lock(), i = h; and now h and i could both mess with the mutex (holding copies of the same pointers). Technically, this copy behavior would break the mutex: whichever copy is destroyed first releases the lock while the other can still reach the value. This makes the mutex a little more specific than something simpler like a vector. Mutexes are unique: you can only have one holder at a time.
Once you settle on the "one at a time" semantics, life becomes easy: you just need to delete the copy constructor and copy assignment operator. However, you could move the HeldValue around, ensuring that there is only one copy. You'll note that I mercilessly set all the pointers to nullptr to maintain unique ownership (and then let the user segfault if they dereference a dangling moved value -- but I consider this their fault since they'd have to say std::move in the first place).
In general, once you write a destructor, start wondering about what the semantics of copying and moving should be for your class. And watch the video that showed me my folly.
Linear Types
So linear types aren't really just about move semantics. This is another wedding of concepts that really do go hand-in-hand: move semantics help implement linear types, and linear types help motivate move semantics. But I'm really just going to mention that moves exist as a thing. Bjarne Stroustrup has a good brief explanation of the issue. Since this is about linear types, you should assume that a lot of things use move semantics.
Other than that, it's worth mentioning the more important linear types: std::unique_ptr in C++ and any type that doesn't implement Copy in Rust. Haskell also has linear types. C++ and Haskell support linear types as a special subset of the semantics, but Rust makes them the default.
The idea of a linear type is that it is produced and then consumed. In a way, it's like thinking of values as molecules in chemistry. This, to me, is more like how resources should work in computers too since you don't get copies for free. I've discussed this before too.
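Sketched with the C++ one, since that's this slideshow's lingua franca:
#include <memory>
#include <utility>

std::unique_ptr<int> produce() { return std::make_unique<int>(42); }

void consume(std::unique_ptr<int> p) {}  // frees *p when p dies

int main() {
    auto p = produce();       // produced once...
    consume(std::move(p));    // ...consumed once; p is now empty
    // consume(std::move(p)); // compiles, sadly, but hands over nullptr:
    //                        // C++ checks linearity less than Rust does
}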
Error Handling
This is a trend I've seen that is pretty interesting. Rust has most of what I'm on about. Golang does too, and C++ supports it, but I know very little about exceptions and think that they're a philosophical and a syntactic mess. So first I'll bash them and then present the (monadic) light.
Exceptions are a really weird term for errors in a program. Pragmatically, there are two types of errors: something the program can handle, and something that the program cannot handle. The former is a reasonable candidate for error handling, the latter being something the OS should resolve on the program's behalf, perhaps terminating it. Consider some invalid input or a message from the network failing to parse versus a segmentation fault or running out of memory. The word "exception" has an implication: that something exceptional occurred. This is a really weird conflation of the subjective divide on what errors programs should and should not be able to recover from. If the program can recover from the error, just how "exceptional" was the behavior? Are we supposed to expect exceptions to our rules? These semantics are strange to me. I'm sure the word made more sense earlier, but now, we have recoverable errors and crashes, and I don't think we need to really care about further distinctions.
Furthermore, semantics aside, I also think that the syntax of exceptions is invariably clumsy. You end up with a separate exit from a function, separate handling of the errors and various really strange workarounds for "exceptional exits." In particular, when trying to catch an exception, the control flow becomes really strange: if you must run code before exiting the function, you need a special goto now that's just relabelled as finally (at least in Java which also now has some sort of RAII scoped try thingy that will cleanup resources for you). This adds a lot of mental overhead to a programming language, in my opinion. It is worsened by the notion of checked and unchecked exceptions (another quirk in the semantics too) and odd conventions like noexcept's details. I think these facts make error handling unsightly -- much uglier than it needs to be.
Now that I've bashed one approach, let me be even more opinionated by presenting the solution I like: the Result built-in sum type. Rust has this. This type is great for various reasons: there is no overhead in terms of new syntax (except may be the ? operator, but that's a success story if anything), no notion of unchecked exceptions, no extra changes in control flow, and no odd semantic implications about "exceptional behavior." The Result type is also really well-understood in Haskell's Either which provides a lot of useful functions for proceeding until the first error, or transforming the success state if there is one, or even transforming the error if that's there. This can lead to some beautiful code since you can chain operations rather elegantly with either ? being used a lot or the .and_then function.
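C++ (as of 2020) has no built-in Result, but the shape of the idea is easy to hand-roll with std::variant (parse_port and its bounds are made up for illustration):
#include <cstdlib>
#include <string>
#include <variant>

template <class T>
using Result = std::variant<T, std::string>;  // value or error, never both

Result<int> parse_port(const std::string& s) {
    char* end = nullptr;
    long port = std::strtol(s.c_str(), &end, 10);
    if (end == s.c_str() || *end != '\0') return std::string("not a number");
    if (port < 1 || port > 65535) return std::string("port out of range");
    return static_cast<int>(port);
}

int port_or_default(const std::string& s) {
    auto r = parse_port(s);
    if (auto* port = std::get_if<int>(&r)) return *port;
    return 8080;  // the error path is ordinary code, not a catch block
}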
Such a sum type is no panacea: the types of errors are hard to extend when working with a sum type, but this is an area of active work in Rust.
Dispatch
How to Stop Worrying and Generically Program
This is about static and dynamic dispatch. Broadly, dispatch is about knowing which concrete function to call for abstract types. Consider (in C++):
#include <string>

class Slide {
public:
  virtual std::string GetText() = 0;
  virtual ~Slide() = default;
};

class LongSlide : public Slide {
public:
  std::string GetText() override;
  int GetScrollLevel();
};
	    
This could lead to vague code: if I have a Slide* s and I call s->GetText(), the exact implementation of GetText is ambiguous. If I added more child classes under Slide, the choices would abound. There are two ways the compiler may resolve this:
  • Static dispatch: where different functions are generated every time a choice is made and so there is exactly one function every time. This can also be called monomorphization.
  • Dynamic dispatch: where every object carries a small table of function pointers disambiguating which function you'd call.
There's a tradeoff worth knowing here: dynamic dispatch costs at least two jumps (since some lookups in C++ may defer to parent classes), which can really slow the CPU down, but static dispatch can bloat the binary, also slowing down execution. The above example with Slides actually does a dynamic dispatch (as of December 2020 on GCC with -O3). C++ templates may do static dispatch; Rust's behavior varies based on the way you're using traits, how many traits you're using, and how large the traits are (in instances and probably number of functions). Java does something like what Rust does.
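To see the two flavors side by side (reusing the Slide classes from above):
template <class S>
std::string read_static(S& s) {  // monomorphized: one copy per slide type,
    return s.GetText();          // often devirtualized and inlined
}

std::string read_dynamic(Slide& s) {  // one copy total: the vtable decides
    return s.GetText();
}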
From Closures to Type Erasure
Consider this: it's the mid 2000s and C++ has a standard (finally), but it's just a record of what C++ has become. There's a great library, and it provides much: <algorithm>. It's really neat since a lot of array and vector transformations are easier -- more readable and things you don't need to rewrite repeatedly. Consider, however, C's lfind: void* lfind(const void* key, const void* base, size_t* nitems, size_t size, int (*compar)(const void*, const void*)); which, the bad function pointer syntax notwithstanding, makes the comparator a stateless function: there is no way to pass extra context into it. Suppose that we wanted to search the same range with more or less the same comparator? If we wanted these comparators to vary, we'd need different functions. For instance, if the void* referred to names, and we only wanted to find by first names or last names, and sometimes by the whole name, this would be three functions.
C++98 makes this a little easier since we could have what C++ calls a functor (it's sort of an abuse of notation, but I'm not talking about real functors, so I'll use the C++ terminology). A functor is a callable with data, a struct that overloads operator(). So consider the C++ std::find_if. In C++20 (or C++23 when this gets more standard), this would look like template<Iterator It, FunctionType<bool, It::ValueType> Fn> It std::find_if(It begin, It end, Fn compar); (which seems like jumping the gun since I am still really talking about C++98 where this would just be a template with nothing too fancy). In this case, you'd encode the logic of the three comparators (first name only, last name only, or whole name from the example above) into a struct. The really neat part is that the compiler can inline away nearly every trace of there having been a struct at all.
C++11 made this situation better by adding a really nice syntactic sugar for functors: lambdas. The notion of capturing codifies the state the functor would've been initialized with and the lambda body takes the place of the operator(). This is getting better and better as C++ evolves, but there is more to the puzzle. The functor struct used to be a concrete type. Lambdas are also supposed to be compiler-generated (opaque) types. This brings up an issue: what if I want to write my own higher-order function? Am I forced to add a template? Does that have to propagate all the way down?
There's some alternate magic: std::function. This can wrap any callable, but you can peer behind that curtain (other than the syntactic sugar that lets you write the return type first -- I have no clue what that is). There are two ways to do this. C++17 and earlier needed std::enable_if hacks. But now I present another way to deal with this as of C++20's draft (using clang-trunk in December 2020):
#include <concepts>

template <typename F, typename RT, typename... Args>
concept Function = requires (F f, Args... a) {
  {f(a...)} -> std::convertible_to<RT>;
  requires std::copy_constructible<F>;
};

// Technically useless, but this is what C++17 (sorta) had as its std::function
template <typename F, typename RT, typename... Args>
requires Function<F, RT, Args...>
class MyStdFunction {
public:
  MyStdFunction(F f) : f_(f) {}
  RT operator()(Args... a) {
      return f_(a...);
  }
private:
  F f_;
};
	    
Amazingly, this makes the "good" type declaration of std::find_if that I wrote above fairly accurate. At the risk of overusing concepts (and not discussing the boring old std::enable_if-based classes in C++17), here's an iterator concept too:
template <typename It>
concept IteratorForFind = requires(It i) {
  typename It::value_type;
  { *i } -> std::convertible_to<typename It::value_type>;
  { i++ } -> std::convertible_to<It>;
};
	    
This isn't quite the right thing; the next slide is, because I got distracted and the concepts are way cooler than the class-based old type erasure. Rust has this too: see the std::ops::Fn trait.
C++20 Concepts: A Detour Because I Can
So this started with baby type erasure:
#include <concepts>

template <typename F, typename RT, typename... Args>
concept Function = requires (F f, Args... a) {
  {f(a...)} -> std::convertible_to<RT>;
  requires std::copy_constructible<F>;
};

template <typename It>
concept IteratorForFind = requires(It i) {
  typename It::value_type;
  { *i } -> std::convertible_to<typename It::value_type>;
  { i++ } -> std::convertible_to<It>;
};
	    
To get the compiler explorer (as of December 2020, clang trunk, --std=c++2a -O3) to output something, I added (rather naturally if I say so myself):
template <typename It, typename Comp>
requires IteratorForFind<It> && Function<Comp, bool, typename It::value_type>
It my_find_if(It begin, It end, Comp c) {
  for(; begin != end && !c(*begin); begin++);
  return begin;
}

int * find_int(int n, int* arr, int key) {
    return my_find_if(arr, arr + n, [key](int i) { return i == key;});
}
	    
This doesn't compile since you can't pronounce int *::value_type. I tried to do some template voodoo to make that work with a "concept specialization" (yes it's a thing, you can now TMP with higher predicates UwU). After some finagling, and giving up on the value_type being handed to me, I did get this to compile to a tidy loop:
#include <concepts>

template <typename F, typename RT, typename... Args>
concept Function = requires (F f, Args... a) {
  {f(a...)} -> std::convertible_to<RT>;
  requires std::copy_constructible<F>;
};

template <typename It, typename T>
concept IteratorForFind = requires(It i) {
  { *i } -> std::convertible_to<T>;
  { i++ } -> std::convertible_to<It>;
  std::equality_comparable_with<It, It>;
};

template <typename T, typename It, typename Comp>
requires IteratorForFind<It, T> && Function<Comp, bool, T>
It my_find_if(It begin, It end, Comp c) {
  for(; begin != end && !c(*begin); begin++);
  return begin;
}

int * find_int(int n, int* arr, int key) {
    return my_find_if<int>(arr, arr + n, [key](int i) { return i == key;});
}
	    
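For good measure, here's a quick sketch (one I believe works, though it's beyond what I tried on compiler explorer) showing the same template swallowing class-type iterators, since the concept only asks for dereference, increment, and equality:
#include <vector>

int find_in_vector(const std::vector<int>& v, int key) {
  // std::vector's const_iterator satisfies IteratorForFind<It, int>:
  // *i yields an int, i++ yields an iterator, and iterators compare equal.
  auto it = my_find_if<int>(v.begin(), v.end(),
                            [key](int i) { return i == key; });
  return it == v.end() ? -1 : *it;
}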
All of this works, but it got me wondering about the built-ins. Turns out that in <iterator>, there are concepts for iterators already. (In hindsight, duh.) Here, std::iterator_traits gets at the value type (duh, again).
#include <concepts>
#include <iterator>

template <std::forward_iterator It, typename Comp>
requires std::predicate<Comp, typename std::iterator_traits<It>::value_type>
It my_find_if(It begin, It end, Comp c) {
  for (; begin != end && !c(*begin); begin++);
  return begin;
}

int* find_int(int n, int* arr, int key) {
  return my_find_if(arr, arr + n, [key](int i) { return i == key; });
}
	    
Yeah, there's a built-in for predicates too. And this makes life really easy.
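And in fact, C++20's ranges algorithms ship with these constraints baked in already, so in real code you'd just write something like this (a sketch, but I'm fairly confident in it):
#include <algorithm>

int* find_int_ranges(int n, int* arr, int key) {
  // std::ranges::find_if is constrained with the standard iterator and
  // predicate concepts, much like the hand-rolled version above.
  return std::ranges::find_if(arr, arr + n,
                              [key](int i) { return i == key; });
}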
References and TL;DR
Yeah, this should've been everything, maybe.
TL;DR
Instead of reading all of the stuff I wrote out, I think this summary would help:
  • CPUs are not speeding up as much as they used to so hardware is specializing.
  • One specialization is SIMD, where one instruction acts on many data points at once -- in commodity CPUs, but especially in GPUs.
  • Atomics provide the primitives needed to implement multi-threaded code correctly.
  • CPUs are confusing and complicated, and micro-architectural state is not easy to reason about.
  • Benchmarking is difficult, and specific datasets will make algorithms behave differently.
  • Static guarantees and tooling are the best they've ever been and are only getting better.
  • There's a wellspring of low-level systems languages. They're all modernizing C in many ways and agree on some interesting new trends, making code easier to understand.
  • Compile-time programming is a great way to get static guarantees and code generation.
  • Modules are the "right" way to do encapsulation.
  • Resource acquisition is initialization: that is, tie resources to object lifetimes.
  • Error handling should fit in with other bits of code.
  • Static and dynamic dispatch are well-understood, and you should know how they work; type erasure puts both to use.
  • C++20 concepts are really cool.
It's worth noting that this "slideshow" is a little bit of everything, best understood through larger, complicated, low-level (that is, non-managed) languages. I've generally used C++ because it's the easiest language for me to pull examples from, but Rust is just as good. Zig, Odin, C, and similar languages could also cover a lot of the earlier topics. I broadly call this systems programming since systems programmers seem to engineer large complex systems with the help of the latest programming tools and with great attention to how these abstract bits of code interact with the hardware. I'm not sure what the hard academic line is between systems programming, operating systems, distributed systems, programming language theory, and computer architecture -- I like to think that systems programming is a unification of all these topics.
References
You may not recall, but one of the titles mentioned CppCon. Here's why:
Talk Topics
Empirically Measuring, & Reducing, C++’s Accidental Complexity - Herb Sutter - CppCon 2020 This is about parameter passing (not something I talked about enough), which is a great example of declarative code.
How C++20 Changes the Way We Write Code - Timur Doumler - CppCon 2020 An overview of concepts, ranges, and coroutines as they will be in C++20.
C++20 Ranges in Practice - Tristan Brindle - CppCon 2020 A detailed look at ranges. I love how this can interact with constexpr.
Structure and Interpretation of Computer Programs: SICP - Conor Hoekstra - CppCon 2020 Just another plug for both C++20 and one of my favorite books.
How to Implement Your First Compiler Feature: The Story of Concepts in Clang - Saar Raz - CppCon 2019 A really fun look at the story of concepts in C++20 with a lot of good advice on collaboration and navigating a large codebase.
Back to Basics: Lambdas from Scratch - Arthur O'Dwyer - CppCon 2019 and then Back to Basics: Type Erasure - Arthur O'Dwyer - CppCon 2019 A great coverage of lambdas and how they work followed by the talk that actually made me want to talk about type erasure.
CppCon 2019: Matt Godbolt “Path Tracing Three Ways: A Study of C++ Style” A really fun talk about some good C++ code, with a good bit on benchmarking.
CppCon 2019: Matt Godbolt “Compiler Explorer: Behind The Scenes” Explains some of compiler explorer, which is one of my favorite things.
CppCon 2019: Chandler Carruth “There Are No Zero-cost Abstractions” I honestly don't fully remember what the talk was about, but it was good. I like the notion of "no zero-cost abstractions" given the explanations in the talk.
CppCon 2019: Titus Winters “Maintainability and Refactoring Impact of Higher-Level Design Features” Really similar to the above, but mentions how to refactor incrementally in really interesting ways.
CppCon 2019: Hyrum Wright Time Travel: Applying Gradual Typing to Time Types with Clang's LibTooling Conversion of types and really neat arithmetic optimizations leading to a great library.
CppCon 2019: Herb Sutter “De-fragmenting C++: Making Exceptions and RTTI More Affordable and Usable” This is a really neat summary on the entire error handling point I wanted to make.
CppCon 2019: Andrei Alexandrescu “Speed Is Found In The Minds of People" A really good talk about benchmarking and optimization. This is the speaker who made me want to mention benchmarking.
CppCon 2018: Jefferson Amstutz “Compute More in Less Time Using C++ Simd Wrapper Libraries” A really good talk about SIMD, particularly about AVX and how to use it practically.
CppCon 2018: Chandler Carruth “Spectre: Secrets, Side-Channels, Sandboxes, and Security” A great overview on the Spectre attack in the CPU.
CppCon 2017: Matt Kulukundis “Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step” and then CppCon 2019: Matt Kulukundis “Abseil's Open Source Hashtables: 2 Years In” The talks that got me talking about hashmaps.
CppCon 2017: Bob Steagall “How to Write a Custom Allocator” A really nice overview of std::allocator and the design of allocators. I wish I could discuss these -- allocators are essential in low-level programming.
CppCon 2017: Fedor Pikus “C++ atomics, from basic to advanced. What do they really do?” std::atomic and the talk that made me want to talk about those.
CppCon 2017: P. McKenney, M. Michael & M. Wong “Is Parallel Programming still hard? PART 1 of 2” and then CppCon 2017: P. McKenney, M. Michael & M. Wong “Is Parallel Programming still hard? PART 2 of 2” Very interesting survey on issues in parallel programming, particularly history and low-level details with caching. I should mention that I don't really like the speakers, but their content is awesome.
CppCon 2017: Herb Sutter “Meta: Thoughts on generative C++” An ambitious proposal for generative C++ macros (they seem to be like better Rust procedural macros).
CppCon 2017: Matt Godbolt “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” A tour through some really fun compiler optimizations powered by the compiler explorer.
CppCon 2015: Marshall Clow “Type Traits - what are they and why should I use them?” An overview of C++ types and the pre-concepts way to manage them. SFINAE go brr.
CppCon 2015: Andrei Alexandrescu “Declarative Control Flow” Another chapter in the tale of exceptions: cleaning up resources through exceptions.
CppCon 2016: Herb Sutter “Leak-Freedom in C++... By Default.” A memory management system for C++.
Back to Basics: RAII and the Rule of Zero - Arthur O'Dwyer - CppCon 2019 RAII, why it's good, and how to use it properly.
Of course, CppCon isn't the only source of good systems programming talks, and there are other interesting topics. Here are a few from various other places:
Talk Topics
Rust: A Language for the Next 40 Years - Carol Nichols A good overview on how Rust was trending in mid 2019, outlining a lot of the philosophy and direction.
The Talk You've Been Await-ing for An overview of Rust async/await that explains the architecture of the whole system in Rust, with wakers, polling, tasks, and all that malarkey.
Zig: A programming language designed for robustness, optimality, and clarity – Andrew Kelley An overview of Zig, by its creator.
Simon Peyton-Jones: Escape from the ivory tower: the Haskell journey The tale of Haskell, as explained by one of its creators.
Adventure with Types in Haskell - Simon Peyton Jones (Lecture 1) and the three subsequent lectures. A tour through the implementation of Haskell's more complex type stuff. This is actually the first place that made me really think about dynamic dispatch as a small table of functions (I think that's in the second lecture somewhere).
Odin Programming Language: An Introduction - 2020-11-26 An overview of Odin, another "new C" programming language.
If there's something incorrect in my writing, it's not due to the videos. It may be from my own experience or misremembering. Either way, blame me, and just watch the video.