Rendering a Triangle with Apple's Metal API, using Go

Did you learn that the OpenGL API is being deprecated in macOS 10.14?
Are you interested in giving Apple's Metal API a try?
Are you a fan of Go and don't feel like switching to Swift or Objective-C for this?
Then this post is for you.

By the end of this post, we'll render a single frame with a colorful triangle using Metal. We'll render to an off-screen texture, and then copy its contents into an image.Image for further inspection.

I'll focus more on the Go code to make this happen. If you're not already familiar with the general principles of modern low-level GPU APIs such as Metal, Vulkan, and Direct3D 12, it's a good idea to learn more about them first. There are many easy-to-find resources on the topic. For Metal specifically, I can recommend watching the Metal for OpenGL Developers session from WWDC18, which gives a pretty good overview, especially for those who are familiar with OpenGL.

General approach

The Metal API is officially available in Objective-C and Swift variants. We'll use cgo, which allows calling C (and, by extension, Objective-C) code from Go, to use the Objective-C Metal API. That way, a wrapper Go package can expose Metal as a convenient Go API.

I've already started a Go package for this, and I'll be using it below. It's in very early stages of development, though, so its API is going to evolve over time. You're welcome to use it as is, as a starting point for your own version (i.e., fork it), or just for reference.

Hello Triangle high-level overview

At a high level, there are 10 steps we'll follow to render the triangle:

  1. Create a Metal device. (It needs to be available on the system.)
  2. Create a render pipeline state. (This includes vertex and fragment shaders.)
  3. Create a vertex buffer. (It will contain vertex data: the position and color of each triangle vertex.)
  4. Create an output texture. (To render into. We'll specify a storage mode, dimensions and pixel format.)
  5. Create a command buffer. (We'll encode all the commands for rendering a single frame into it.)
  6. Encode all render commands.
  7. Encode all blit commands. (This is to synchronize the texture from GPU memory into CPU-accessible memory.)
  8. Commit and wait. (Until all encoded commands have completed executing.)
  9. Read pixels from output texture. (Into an image.)
  10. Save the image. (As a PNG.)

Let's look at each of the steps in more detail.

1. Create a Metal device

This is the starting point.

It will work as long as your system has a Metal device on it; otherwise, the error will report that there isn't one. See the system requirements for Metal if you're not sure about your Mac.

device, err := mtl.CreateSystemDefaultDevice()
if err != nil {
	log.Fatalln(err)
}
2. Create a render pipeline state

Metal uses the Metal Shading Language. Here's a starting program containing very basic vertex and fragment shaders:

#include <metal_stdlib>

using namespace metal;

struct Vertex {
	float4 position [[position]];
	float4 color;
};

vertex Vertex VertexShader(
	uint vertexID [[vertex_id]],
	device Vertex * vertices [[buffer(0)]]
) {
	return vertices[vertexID];
}

fragment float4 FragmentShader(Vertex in [[stage_in]]) {
	return in.color;
}
The vertex shader emits vertices from the vertex buffer. Note that each vertex contains a float4 position and a float4 color; a float4 is four 32-bit floating-point values.

We'll come back to this in the next step, since the vertex data we supply from the Go side needs to align with how the vertex shader will interpret it.

For simplicity, we'll put the Metal Shading Language program source code in a const in our Go code, and have Metal compile the shaders from that source. Then, we can use them to create a render pipeline state. It looks like this:

// Create a render pipeline state.
const source = `#include <metal_stdlib> ...`
lib, err := device.MakeLibrary(source, mtl.CompileOptions{})
if err != nil {
	log.Fatalln(err)
}
vs, err := lib.MakeFunction("VertexShader")
if err != nil {
	log.Fatalln(err)
}
fs, err := lib.MakeFunction("FragmentShader")
if err != nil {
	log.Fatalln(err)
}
var rpld mtl.RenderPipelineDescriptor
rpld.VertexFunction = vs
rpld.FragmentFunction = fs
rpld.ColorAttachments[0].PixelFormat = mtl.PixelFormatRGBA8UNorm
rps, err := device.MakeRenderPipelineState(rpld)
if err != nil {
	log.Fatalln(err)
}

3. Create a vertex buffer

This is where our geometry data is described. Each vertex contains a 4D position and an RGBA color. We'll use the f32.Vec4 type, which maps 1:1 to the aforementioned float4 type in the vertex shader.

// Create a vertex buffer.
type Vertex struct {
	Position f32.Vec4
	Color    f32.Vec4
}
vertexData := [...]Vertex{
	{f32.Vec4{+0.00, +0.75, 0, 1}, f32.Vec4{1, 0, 0, 1}},
	{f32.Vec4{-0.75, -0.75, 0, 1}, f32.Vec4{0, 1, 0, 1}},
	{f32.Vec4{+0.75, -0.75, 0, 1}, f32.Vec4{0, 0, 1, 1}},
}
vertexBuffer := device.MakeBuffer(unsafe.Pointer(&vertexData[0]), unsafe.Sizeof(vertexData), mtl.ResourceStorageModeManaged)

To create a buffer, we need to give it some raw bytes. At this time, we'll use C-style unsafe code to pass a pointer to the beginning of the memory block and its size. It might be worth avoiding unsafe in favor of passing a safely converted []byte, but I'll leave that for future work.

4. Create an output texture

We'll specify an RGBA pixel format, with each component being a normalized uint8 ranging from 0 to 255. The output size will be 512×512.

// Create an output texture to render into.
td := mtl.TextureDescriptor{
	PixelFormat: mtl.PixelFormatRGBA8UNorm,
	Width:       512,
	Height:      512,
	StorageMode: mtl.StorageModeManaged,
}
texture := device.MakeTexture(td)

Importantly, we've used the managed storage mode. That means there are two copies of the texture data: one in GPU memory, and another in CPU-accessible memory. As a result, when one copy is modified, the resource needs to be synchronized before the other side can safely access the latest contents.

We'll deal with that in step 7 using a blit command encoder.

5. Create a command buffer

To render a frame, we need a command buffer to encode commands into. We can get a command queue from the device, and create a command buffer from it.

cq := device.MakeCommandQueue()
cb := cq.MakeCommandBuffer()

6. Encode all render commands

This step encodes all render commands into the command buffer.

// Encode all render commands.
var rpd mtl.RenderPassDescriptor
rpd.ColorAttachments[0].LoadAction = mtl.LoadActionClear
rpd.ColorAttachments[0].StoreAction = mtl.StoreActionStore
rpd.ColorAttachments[0].ClearColor = mtl.ClearColor{Red: 0.35, Green: 0.65, Blue: 0.85, Alpha: 1}
rpd.ColorAttachments[0].Texture = texture
rce := cb.MakeRenderCommandEncoder(rpd)
rce.SetRenderPipelineState(rps)
rce.SetVertexBuffer(vertexBuffer, 0, 0)
rce.DrawPrimitives(mtl.PrimitiveTypeTriangle, 0, 3)
rce.EndEncoding()

We've created a render command encoder that clears the color attachment on load using a light blue clear color, stores the results at the end, and uses texture as the render target.

We set the render pipeline state from step 2, the vertex buffer from step 3, and then issue a draw call with triangle primitive type, starting at index 0 and with 3 vertices in total.

EndEncoding indicates the end of encoding commands into the render command encoder.

7. Encode all blit commands

This step is necessary because the output texture we created in step 4 uses the managed storage mode, meaning there are two copies of the texture data: one in GPU memory, and another in CPU-accessible memory.

The commands we've encoded in the previous step ensure a triangle is rendered to the texture by the GPU, so now's the time to tell it to synchronize that texture for CPU access.

// Encode all blit commands.
bce := cb.MakeBlitCommandEncoder()
bce.Synchronize(texture)
bce.EndEncoding()

The Synchronize blit command does exactly that.

8. Commit and wait

By now, we've encoded all the render and blit commands we wanted. We commit the command buffer, which tells the GPU to start executing them, and wait for all the encoded commands to finish executing.

cb.Commit()
cb.WaitUntilCompleted()
If we didn't do the wait and moved on to the next step right away, we might get a partially-rendered triangle when reading the texture pixels.

9. Read pixels from output texture

We did wait, so we can now safely call texture.GetBytes to read the texture pixels. We'll copy them into an *image.NRGBA that we create with the same dimensions as the texture. Its pixel byte layout matches the texture's exactly, so we can copy directly into the img.Pix byte slice:

// Read pixels from output texture into an image.
img := image.NewNRGBA(image.Rect(0, 0, texture.Width, texture.Height))
bytesPerRow := 4 * texture.Width
region := mtl.RegionMake2D(0, 0, texture.Width, texture.Height)
texture.GetBytes(&img.Pix[0], uintptr(bytesPerRow), region, 0)

10. Save the image

We have the image.Image, let's write it to disk as a PNG with image/png package:

// Write output image to a PNG file.
err = writePNG("triangle.png", img)
if err != nil {
	log.Fatalln(err)
}

Where writePNG is a simple utility function:

// writePNG encodes the image m to a named file, in PNG format.
func writePNG(name string, m image.Image) error {
	f, err := os.Create(name)
	if err != nil {
		return err
	}
	defer f.Close()
	err = png.Encode(f, m)
	return err
}

Putting all the steps together gives us the hellotriangle command.

When we run it on a Metal-capable system, we get a beautiful triangle!

The triangle you render yourself will look even prettier to you, I promise. Good luck and have fun!


dmitshur commented 3 months ago

An update: there is now code to open a window with a Metal layer, and render a triangle that follows your mouse at 60+ FPS (vsync on).


