
Benchmarking Generics in Go

How will generics impact performance? Let us find out by benchmarking a few use cases.

by Percy Bolmér, February 21, 2022

Image by Percy Bolmér. Gopher by Takuya Ueda, Original Go Gopher by Renée French (CC BY 3.0)

Last week I wrote about the new features that we will see in Go 1.18. If you haven’t read it yet, you should: Go 1.18 Comes With Many Amazing Changes.

Many readers of that article have reached out to me on various social media platforms, and I want to say thank you to those readers. I love the engagement and discussions it has brought forth.

One of those discussions is a topic I want to write about: the performance impact that we will see from the introduction of generics. Many readers have raised a concern that generics will decrease performance. My thesis, however, is that generics will improve performance. My reasoning is that generics allow us to skip type conversions, assertions, and reflection at runtime, and instead rely on the compiler to resolve types at compile time.

In my article about Learning Generics, I explain the usage of generics, and the two major benefits were reducing duplicate functions based on data type and also avoiding the interface{}. Those are the use cases that we will benchmark in this article, to discover the performance of the changes.

Let me make a small note here: I am not a benchmarking wizard. I am but a humble benchmarking noob. Benchmarking is, in my opinion, incredibly hard.

To make a fair benchmark, we will set up a test case for each use case. This means that we will run three benchmarks:

Benchmark using Duplicate functions
Benchmark using Generics
Benchmark using interface{}

Preparing The Functions to Benchmark

We will reuse some code from Learning Generics, where we have a Subtract function that subtracts one value from another for three Subtractable data types.

We want to determine which of the Subtract functions performs best.

package functions

// SubtractInt will subtract the second value from the first
func SubtractInt(a, b int) int {
	return a - b
}

// SubtractInt64 will subtract the second value from the first
func SubtractInt64(a, b int64) int64 {
	return a - b
}

// SubtractFloat32 will subtract the second value from the first
func SubtractFloat32(a, b float32) float32 {
	return a - b
}

// SubtractTypeSwitch is used to subtract using interfaces
func SubtractTypeSwitch(a, b interface{}) interface{} {
	switch a.(type) {
	case int:
		return a.(int) - b.(int)
	case int64:
		return a.(int64) - b.(int64)
	case float32:
		return a.(float32) - b.(float32)
	default:
		return nil
	}
}

// Subtract will subtract the second value from the first
func Subtract[V int64 | int | float32](a, b V) V {
	return a - b
}

The Subtraction methods that we will benchmark. Try it at Playground

There we have the functions that we will benchmark. They should be fairly simple to understand, and they cover the possible solutions to subtraction: data type-specific, type-switched, and generic.
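One practical difference between the type-switched and generic versions is when a type mismatch is detected. The sketch below is an illustration and not part of the benchmarked code: `subtractChecked` and `errMismatch` are hypothetical names, and the function is a variant of SubtractTypeSwitch that uses the comma-ok form so that mixed operand types return an error instead of panicking at runtime.

```go
package main

import (
	"errors"
	"fmt"
)

// errMismatch is a hypothetical error used only in this sketch.
var errMismatch = errors.New("operands have different or unsupported types")

// subtractChecked is an illustrative variant of SubtractTypeSwitch.
// It binds the asserted value in the switch and checks b with the
// comma-ok form, so a mismatch becomes an error rather than a panic.
func subtractChecked(a, b interface{}) (interface{}, error) {
	switch x := a.(type) {
	case int:
		if y, ok := b.(int); ok {
			return x - y, nil
		}
	case int64:
		if y, ok := b.(int64); ok {
			return x - y, nil
		}
	case float32:
		if y, ok := b.(float32); ok {
			return x - y, nil
		}
	}
	return nil, errMismatch
}

// Subtract is the generic version from the article.
func Subtract[V int64 | int | float32](a, b V) V {
	return a - b
}

func main() {
	diff, err := subtractChecked(5, 3)
	fmt.Println(diff, err) // 2 <nil>

	// Mixed operand types are only discovered while the program runs.
	_, err = subtractChecked(5, float32(3))
	fmt.Println(err)

	// With generics the mismatch never reaches runtime. Given
	//   var i int = 5
	//   var f float32 = 3
	// the call Subtract(i, f) is rejected by the compiler.
	fmt.Println(Subtract(5, 3)) // 2
}
```

The extra checks add work at runtime, which is the kind of cost the benchmark is meant to expose.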

Preparing the Benchmarks

Create a regular test file where we can store the benchmarks. If you are unfamiliar with benchmarks in Go, you can read my tutorial on it.

At the top of the benchmark, I will generate two slices, one slice of random integers, and one with random float32s. These random slices will be used as input parameters to the subtract methods.

Then we create a b.Run for each function, which runs that function as many times as the benchmarker is told to with the -benchtime flag.

For this benchmark, I will force the benchmarker to run each function 1000000000 times. If you don’t specify the number of iterations, the benchmarker runs each function as many times as it can within a set duration. The functions would then not run the same number of operations, and I want them to.
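To make the fixed-iteration idea concrete, here is a minimal sketch that pins the iteration count from a regular program instead of the command line. It assumes that the -benchtime flag is registered by the testing package under the name `test.benchtime`; `runFixed` and `sink` are hypothetical names introduced for this example.

```go
package main

import (
	"flag"
	"fmt"
	"testing"
)

// SubtractInt mirrors the function from the article.
func SubtractInt(a, b int) int { return a - b }

// sink stores the result so the call is not simply discarded.
var sink int

// runFixed runs the benchmark body with a fixed iteration count,
// the programmatic counterpart of passing -benchtime=<count>x.
func runFixed(count int) testing.BenchmarkResult {
	testing.Init() // registers the test.* flags; no effect if already called
	if err := flag.Set("test.benchtime", fmt.Sprintf("%dx", count)); err != nil {
		panic(err)
	}
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = SubtractInt(i, 1)
		}
	})
}

func main() {
	r := runFixed(1000)
	// With a fixed count, every benchmark reports the same number of operations.
	fmt.Println("iterations:", r.N)
}
```

Without the fixed count, the same call would keep increasing b.N until the default duration is reached, so two functions of different speed would report different operation counts.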

This is what my final Benchmark looks like. I wish I could provide a Playground link to the benchmark, but only regular testing.T is allowed on the playground. So you will have to copy the gist if you want to try this yourself.

package functions

import (
	"math/rand"
	"testing"
	"time"
)
// Benchmark_Subtract is used to determine the most performant solution to subtraction
func Benchmark_Subtract(b *testing.B) {

	// Allocate the input slices. Each sub-benchmark runs 1000000000
	// iterations (set with -benchtime) and reads index i+1, so the
	// slices need one extra element.
	// b.N is 1 in this parent benchmark because it only calls b.Run,
	// so b.N cannot be used to size or fill the slices.
	numbers := make([]int, 1000000001)
	floatNumbers := make([]float32, 1000000001)

	// Create a random source seeded with the current time
	seed := rand.NewSource(time.Now().UnixNano())
	// Create a random generator from the source
	randomizer := rand.New(seed)
	for i := range numbers {
		// randomize numbers between 0-99
		numbers[i] = randomizer.Intn(100)
		floatNumbers[i] = float32(randomizer.Intn(100))
	}
	// run a benchmark for regular Ints
	b.Run("SubtractInt", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			SubtractInt(numbers[i], numbers[i+1])
		}
	})
	// run a benchmark for regular Floats
	b.Run("SubtractFloat", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			SubtractFloat32(floatNumbers[i], floatNumbers[i+1])
		}
	})
	// run a benchmark for TypeSwitched Ints
	b.Run("Type_Subtraction_int", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			SubtractTypeSwitch(numbers[i], numbers[i+1])
		}
	})
	// run a benchmark for TypeSwitched Floats
	b.Run("Type_Subtraction_float", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			SubtractTypeSwitch(floatNumbers[i], floatNumbers[i+1])
		}
	})

	// run a benchmark for Generic Ints
	b.Run("Generic_Subtraction_int", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			Subtract[int](numbers[i], numbers[i+1])
		}
	})
	// run a benchmark for Generic Floats
	b.Run("Generic_Subtraction_float", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			Subtract[float32](floatNumbers[i], floatNumbers[i+1])
		}
	})
	// run a benchmark where the generic type is inferred
	b.Run("Generic_Inferred_int", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			Subtract(numbers[i], numbers[i+1])
		}
	})

}

A test file for performing a Benchmark to determine the generic performance impact.

The benchmark will test the subtraction functions for both int and float32 in all use cases. In the generic benchmark, I have added a third option, the inferred data type. I also want to determine how much performance we lose, if any, when we let the compiler infer the data type as int.

To run the benchmark, use the following command. Note that the -count 5 parameter is used to run each benchmark 5 times. This is because a single run might give an unfair result: during one of the benchmarks, some other process may have been competing for CPU time on the computer.

go test -v -bench=Benchmark -benchtime=1000000000x -count 5

Analyzing The Result

The benchmark output starts with the name of the function that was run, which we can use to tell the results apart. The second value is the number of operations run; in our case we set that to a fixed number, so all of the rows should display the same value.

The third value is the interesting one: the nanoseconds per operation (ns/op). This metric shows the average time one call of the function takes.

goos: windows
goarch: amd64
pkg: programmingpercy/benchgeneric
cpu: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
Benchmark_Subtract
Benchmark_Subtract/SubtractInt
Benchmark_Subtract/SubtractInt-4                1000000000               0.9002 ns/op
Benchmark_Subtract/SubtractInt-4                1000000000               0.8904 ns/op
Benchmark_Subtract/SubtractInt-4                1000000000               0.8277 ns/op
Benchmark_Subtract/SubtractInt-4                1000000000               0.8290 ns/op
Benchmark_Subtract/SubtractInt-4                1000000000               0.8266 ns/op
Benchmark_Subtract/SubtractFloat
Benchmark_Subtract/SubtractFloat-4              1000000000               0.8591 ns/op
Benchmark_Subtract/SubtractFloat-4              1000000000               0.8033 ns/op
Benchmark_Subtract/SubtractFloat-4              1000000000               0.8108 ns/op
Benchmark_Subtract/SubtractFloat-4              1000000000               0.8168 ns/op
Benchmark_Subtract/SubtractFloat-4              1000000000               0.8040 ns/op
Benchmark_Subtract/Type_Subtraction_int
Benchmark_Subtract/Type_Subtraction_int-4               1000000000               1.597 ns/op
Benchmark_Subtract/Type_Subtraction_int-4               1000000000               1.711 ns/op
Benchmark_Subtract/Type_Subtraction_int-4               1000000000               1.607 ns/op
Benchmark_Subtract/Type_Subtraction_int-4               1000000000               1.570 ns/op
Benchmark_Subtract/Type_Subtraction_int-4               1000000000               1.588 ns/op
Benchmark_Subtract/Type_Subtraction_float
Benchmark_Subtract/Type_Subtraction_float-4             1000000000               1.320 ns/op
Benchmark_Subtract/Type_Subtraction_float-4             1000000000               1.311 ns/op
Benchmark_Subtract/Type_Subtraction_float-4             1000000000               1.323 ns/op
Benchmark_Subtract/Type_Subtraction_float-4             1000000000               1.424 ns/op
Benchmark_Subtract/Type_Subtraction_float-4             1000000000               1.321 ns/op
Benchmark_Subtract/Generic_Subtraction_int
Benchmark_Subtract/Generic_Subtraction_int-4            1000000000               0.8251 ns/op
Benchmark_Subtract/Generic_Subtraction_int-4            1000000000               0.8288 ns/op
Benchmark_Subtract/Generic_Subtraction_int-4            1000000000               0.8420 ns/op
Benchmark_Subtract/Generic_Subtraction_int-4            1000000000               0.8377 ns/op
Benchmark_Subtract/Generic_Subtraction_int-4            1000000000               0.8357 ns/op
Benchmark_Subtract/Generic_Subtraction_float
Benchmark_Subtract/Generic_Subtraction_float-4          1000000000               0.7952 ns/op
Benchmark_Subtract/Generic_Subtraction_float-4          1000000000               0.7987 ns/op
Benchmark_Subtract/Generic_Subtraction_float-4          1000000000               0.7877 ns/op
Benchmark_Subtract/Generic_Subtraction_float-4          1000000000               0.8037 ns/op
Benchmark_Subtract/Generic_Subtraction_float-4          1000000000               0.8283 ns/op
Benchmark_Subtract/Generic_Inferred_int
Benchmark_Subtract/Generic_Inferred_int-4               1000000000               0.8297 ns/op
Benchmark_Subtract/Generic_Inferred_int-4               1000000000               0.8283 ns/op
Benchmark_Subtract/Generic_Inferred_int-4               1000000000               0.8319 ns/op
Benchmark_Subtract/Generic_Inferred_int-4               1000000000               0.8366 ns/op
Benchmark_Subtract/Generic_Inferred_int-4               1000000000               0.8623 ns/op
PASS
ok      programmingpercy/benchgeneric   37.114s

The benchmarking result from the go test tooling.

From the results, we can determine that the type assertion functions were far slower, roughly 60–90% slower on average. In this test case that might seem ridiculous, since we are talking about half a nanosecond.

The generic functions performed about the same as the data type-specific ones, with a small increase in speed. This small increase is probably due to interference from other software running on my computer. My understanding is that after the compiler has done its job, the generic function calls should be the same as the regular ones.

One other takeaway we can see in the results is that int subtraction is more time-consuming than float32 subtraction. The average speed for regular int subtraction was 0.85478 ns/op, and the average for regular float32 subtraction was 0.8188 ns/op. That means float32 subtraction is about 4% faster in my benchmark.
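The averages and percentages can be recomputed from the printed output. The sketch below copies the ns/op values from the benchmark result above; `mean` and `percentSlower` are small helpers written for this example.

```go
package main

import "fmt"

// mean returns the arithmetic mean of the samples.
func mean(samples []float64) float64 {
	sum := 0.0
	for _, s := range samples {
		sum += s
	}
	return sum / float64(len(samples))
}

// percentSlower reports how much slower slow is relative to fast, in percent.
func percentSlower(slow, fast float64) float64 {
	return (slow - fast) / fast * 100
}

func main() {
	// ns/op values copied from the five runs of each benchmark.
	subInt := []float64{0.9002, 0.8904, 0.8277, 0.8290, 0.8266}
	subFloat := []float64{0.8591, 0.8033, 0.8108, 0.8168, 0.8040}
	switchInt := []float64{1.597, 1.711, 1.607, 1.570, 1.588}
	switchFloat := []float64{1.320, 1.311, 1.323, 1.424, 1.321}

	fmt.Printf("SubtractInt mean:     %.5f ns/op\n", mean(subInt))
	fmt.Printf("SubtractFloat32 mean: %.5f ns/op\n", mean(subFloat))
	fmt.Printf("type switch int:      %.1f%% slower\n",
		percentSlower(mean(switchInt), mean(subInt)))
	fmt.Printf("type switch float32:  %.1f%% slower\n",
		percentSlower(mean(switchFloat), mean(subFloat)))
	fmt.Printf("int vs float32:       %.1f%% slower\n",
		percentSlower(mean(subInt), mean(subFloat)))
}
```

With only five samples per benchmark the mean is sensitive to a single noisy run, so small differences should be read with care.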

So the key takeaway from this benchmark is that:

Type assertion / Type conversion solution is slowest, as per my thesis
Generics and Regular data types functions are equally performant
Float32 subtraction is faster than int

A Real-Life Scenario Benchmarked

Let us also compare a real-life scenario. In this use case, we have two structures, Person and Car, which can both Move. Both structures have a Move method that accepts the distance; however, the Person distance is passed as a float32 and the Car accepts an int.

Both of these structures are handled in the same workflow, so we will want to handle them in the same function.

The generic solution for this is to create generic structures for which we define the data type to use upon creation. The interface solution is to accept the structures as interface{} input, type assert them, and convert the distance to the correct data type. We can’t have a shared interface for them, as the data type of the Move parameter is not the same.

I won’t explain in detail how the generic solution works. If you want to understand that, you can check out Learning Generics in Go.

In the code examples, there is an implementation for both generics and the old type assertion solution. Everything that belongs to the type assertion solution is suffixed with Regular, so we can more easily tell which code belongs to which solution.


package benchmarking



// Subtractable is a type constraint that defines subtractable datatypes to be used in generic functions
type Subtractable interface {
	int | int64 | float32
}

// Moveable is the interface for moving an Entity
type Moveable[S Subtractable] interface {
	Move(S)
}

// Car is a Generic Struct with the type S to be defined
type Car[S Subtractable] struct {
	Name string
	DistanceMoved S
}

// Person is a Generic Struct with the type S to be defined
type Person[S Subtractable] struct {
	Name string
	DistanceMoved S
}

// Move adds meters to the distance the Person has moved.
// The data type of meters is the type S chosen when the Person is created
func (p *Person[S]) Move(meters S) {
	p.DistanceMoved += meters
}

// Move adds meters to the distance the Car has moved
func (c *Car[S]) Move(meters S) {
	c.DistanceMoved += meters
}

// Move is a generic function that takes in a Generic Moveable and moves it
func Move[S Subtractable, V Moveable[S]](v V, meters S) {
	v.Move(meters)
}

The generic solution for executing Move on Cars and Persons with different data types. Try it at Playground
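A short usage sketch may help to see how the type arguments are supplied. It repeats the declarations from the listing so that it can run on its own; the names and distances in main are made up for the example.

```go
package main

import "fmt"

type Subtractable interface {
	int | int64 | float32
}

type Moveable[S Subtractable] interface {
	Move(S)
}

type Car[S Subtractable] struct {
	Name          string
	DistanceMoved S
}

type Person[S Subtractable] struct {
	Name          string
	DistanceMoved S
}

func (p *Person[S]) Move(meters S) { p.DistanceMoved += meters }
func (c *Car[S]) Move(meters S)    { c.DistanceMoved += meters }

// Move is the generic function from the article.
func Move[S Subtractable, V Moveable[S]](v V, meters S) {
	v.Move(meters)
}

func main() {
	p := &Person[float32]{Name: "John"}
	c := &Car[int]{Name: "Ferrari"}

	// S is given explicitly as float32; V is inferred from p.
	Move[float32](p, 10.5)
	// Both S and V are inferred: the constant 10 defaults to int,
	// which matches Car[int].
	Move(c, 10)

	fmt.Println(p.DistanceMoved, c.DistanceMoved) // 10.5 10
}
```

The data type is fixed once, when the struct is created, and the compiler checks every later call against it.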


package benchmarking

// Below is the type assertion based solution
type CarRegular struct {
	Name          string
	DistanceMoved int
}

type PersonRegular struct {
	Name          string
	DistanceMoved float32
}

func (p *PersonRegular) Move(meters float32) {
	p.DistanceMoved += meters
}

func (c *CarRegular) Move(meters int) {
	c.DistanceMoved += meters
}

func MoveRegular(v interface{}, distance float32) {
	switch v.(type) {
	case *PersonRegular:
		v.(*PersonRegular).Move(distance)
	case *CarRegular:
		v.(*CarRegular).Move(int(distance))
	default:
		// Handle Unsupported types, not needed by Generic solution as Compiler does this for you
	}
}

The type-switched solution to Move. Try it at Playground
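Two properties of the type-switched solution are easy to miss: the distance for a Car is truncated by the int conversion, and unsupported types are silently ignored. The sketch below repeats the declarations so that it runs on its own and binds the asserted value in the switch, which is a slightly more idiomatic spelling with the same behavior. The inputs in main are made up for the example.

```go
package main

import "fmt"

type CarRegular struct {
	Name          string
	DistanceMoved int
}

type PersonRegular struct {
	Name          string
	DistanceMoved float32
}

func (p *PersonRegular) Move(meters float32) { p.DistanceMoved += meters }
func (c *CarRegular) Move(meters int)        { c.DistanceMoved += meters }

// MoveRegular dispatches on the dynamic type of v at runtime.
func MoveRegular(v interface{}, distance float32) {
	switch t := v.(type) {
	case *PersonRegular:
		t.Move(distance)
	case *CarRegular:
		t.Move(int(distance)) // the fractional part is dropped here
	default:
		// Unsupported types fall through without any signal to the caller.
	}
}

func main() {
	c := &CarRegular{Name: "Ferrari"}

	MoveRegular(c, 10.9)        // the Car only moves 10
	MoveRegular("not a car", 5) // compiles, but does nothing

	fmt.Println(c.DistanceMoved) // 10
}
```

With the generic version, both of these situations are caught by the compiler instead.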

Now that we have the solutions in place, it is time to set up the Benchmark. I will create the Persons and Cars before the benchmark and we will measure the performance of Move and MoveRegular.

package benchmarking

import "testing"

func Benchmark_Structures(b *testing.B) {

	// Init the structs
	p := &Person[float32]{Name: "John"}
	c := &Car[int]{Name: "Ferrari"}

	pRegular := &PersonRegular{Name: "John"}
	cRegular := &CarRegular{Name: "Ferrari"}

	// Run the test
	b.Run("Person_Generic_Move", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			// the untyped constant 10.2 would default to float64, so we state float32 explicitly
			Move[float32](p, 10.2)
		}
	})

	b.Run("Car_Generic_Move", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			Move(c, 10)
		}
	})

	b.Run("Person_Regular_Move", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			MoveRegular(pRegular, 10.2)
		}
	})

	b.Run("Car_Regular_Move", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			MoveRegular(cRegular, 10)
		}
	})
}

The benchmark that will run the Move and MoveRegular functions

I run the tests with the following command

go test -v -bench=Benchmark_Structures -benchtime=1000000000x -count 5

goos: windows
goarch: amd64
pkg: programmingpercy/benchgeneric
cpu: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
Benchmark_Structures
Benchmark_Structures/Person_Generic_Move
Benchmark_Structures/Person_Generic_Move-4              1000000000               4.690 ns/op
Benchmark_Structures/Person_Generic_Move-4              1000000000               4.668 ns/op
Benchmark_Structures/Person_Generic_Move-4              1000000000               4.727 ns/op
Benchmark_Structures/Person_Generic_Move-4              1000000000               4.664 ns/op
Benchmark_Structures/Person_Generic_Move-4              1000000000               4.699 ns/op
Benchmark_Structures/Car_Generic_Move
Benchmark_Structures/Car_Generic_Move-4                 1000000000               3.176 ns/op
Benchmark_Structures/Car_Generic_Move-4                 1000000000               3.188 ns/op
Benchmark_Structures/Car_Generic_Move-4                 1000000000               3.296 ns/op
Benchmark_Structures/Car_Generic_Move-4                 1000000000               3.144 ns/op
Benchmark_Structures/Car_Generic_Move-4                 1000000000               3.156 ns/op
Benchmark_Structures/Person_Regular_Move
Benchmark_Structures/Person_Regular_Move-4              1000000000               4.694 ns/op
Benchmark_Structures/Person_Regular_Move-4              1000000000               4.634 ns/op
Benchmark_Structures/Person_Regular_Move-4              1000000000               4.677 ns/op
Benchmark_Structures/Person_Regular_Move-4              1000000000               4.660 ns/op
Benchmark_Structures/Person_Regular_Move-4              1000000000               4.626 ns/op
Benchmark_Structures/Car_Regular_Move
Benchmark_Structures/Car_Regular_Move-4                 1000000000               2.560 ns/op
Benchmark_Structures/Car_Regular_Move-4                 1000000000               2.555 ns/op
Benchmark_Structures/Car_Regular_Move-4                 1000000000               2.553 ns/op
Benchmark_Structures/Car_Regular_Move-4                 1000000000               2.579 ns/op
Benchmark_Structures/Car_Regular_Move-4                 1000000000               2.560 ns/op
PASS
ok      programmingpercy/benchgeneric   75.830s

The results from running the Benchmark

I am a bit surprised to see that the type-asserted solution is faster than the generic solution. I ran the benchmark multiple times to make sure it wasn’t a one-off result.

We can see from the benchmark that the int-based Car solutions are both faster than the float32-based Person solutions.

The Person Move function has the same performance in both the generic and the regular solution. However, you can see a difference for the Cars, with the type-asserted Car being the fastest. The type-asserted Cars ran around 20% faster than their generic counterpart.

So the key takeaway from this benchmark is the following.

The float-based types share the same performance, while the type-asserted integer Cars are faster, contrary to my thesis
Float32 addition is slower than int

Conclusion

So, we now have tested some use cases in which I can see generics being helpful.

Let me be honest, I did hope for the second benchmark to also prove that generics were faster. That would have strengthened my claim that generics are more performant because types are resolved at compile time instead of at runtime.

We can see a pretty big performance gain in the first use case by using generics or data type-specific functions. I know a few nanoseconds may seem ridiculous, but there are use cases where these types of extreme optimizations are important. I once worked on a high-performance network sniffer that had to handle large amounts of network data in real time. Writing such software requires every optimization there is.

We have seen that selecting the correct data type can have a big impact on performance. However, I think we can say that the readers who expressed a fear of generics slowing down their software can rest easy. On the bright side, generic solutions allow us to swap data types more easily, which can even increase performance.

On the other hand, type assertion and type conversion in Go seem to be very fast.

As we have seen, many factors can play a role in the result, such as the arithmetic operator used, the data type, etc. There may be mistakes made in my benchmarks of which I am unaware.

If you have any ideas on how to improve the benchmarks or want to discuss them, feel free to reach out. You can find the full code at GitHub.