Hijacking PipeWire for Fun and Profit

One of the things Linux users nearly always complain about is audio. Whether it flat-out doesn't work or it doesn't get picked up in screen shares, audio seems to be behind most of the issues users cite with any distro (that and NVIDIA drivers, but I digress). Fortunately, the audio situation has gotten a ton better in recent years with the introduction of PipeWire.

PipeWire, a Crash Course

PipeWire, to put it simply, is an audio-visual processing and routing tool that allows different streams of content to be "wired" together and processed in various ways on their way to a target destination. It sits on top of various audio and video backends and acts as a common interface that can be driven in three ways: a set of programs provided by the project, config files, or libpipewire, a library that wraps the PipeWire socket itself.

Each of these methods has its advantages. The programs are high-level tools that get you pretty far without having to delve deep into the internals of media routing. Although this is arguably the simplest way to get started, it relegates you to more "script-like" approaches of interacting with the audio server and incurs the overhead of launching a program every time you need to do something. Config files, as the name implies, let the user configure the audio server and filter chain itself at startup. This works extremely well when a static configuration is enough but falls short when more complex runtime processing is required. The final method, libpipewire, is the most complex but also the most flexible way to interact with PipeWire and gives a large amount of control to the developer.

Although this gives a high-level overview of what PipeWire is and how we interact with it, we need to be a little more familiar with it before we start doing anything with it.

The Node Graph

Long explanations are all well and good but this is a case where a picture can speak a thousand words. Using the excellent tool Helvum (or qpwgraph), users can view (and edit!) their own node graphs for their system:

So what do these fancy boxes and lines mean? Well, each box represents a node in the graph and each line represents a link. Nodes represent things that emit, consume, or modify media, and they have any number of ports that input or output data. Links act as bridges between these ports, much like a display cable connecting your computer to your monitor. You might also notice some nodes with mixologist in their names. Ignore those for now; they're a surprise for later.

Loopback

Loopback in PipeWire-land is a simple yet powerful tool that gives users more control over their streams. In short, a loopback takes media from an input and forwards it to an output, allowing users to create virtual devices and modify the data going through them as they see fit. PipeWire ships a module that packages up all of that logic and lets us treat loopbacks as if they were any other node in the graph. Although there is a command-line tool for creating loopback devices (pw-loopback), one can also load the libpipewire-module-loopback module via libpipewire to work with them from code (without having to invoke subprocesses).

libpipewire

Let's get going on our first libpipewire application. Fortunately, the docs are relatively good and full program examples can be found by looking at the source code of tools like pw-cli.

Getting started is as simple as writing the following C program:

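#include <stdio.h>

#include <pipewire/pipewire.h>

int main(int argc, char *argv[]) {
    /* Initialize the library; it may consume PipeWire-specific arguments. */
    pw_init(&argc, &argv);

    fprintf(stdout, "Using pipewire library version %s\n"
        "with client name %s\n",
        pw_get_library_version(),
        pw_get_client_name());

    /* Release the resources acquired by pw_init. */
    pw_deinit();
    return 0;
}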

Bindings

Unfortunately, I am a masochist and an Odin user (jury is still out on whether those two are linked). That means that I need to either generate or write bindings. Although generation is a valid approach, writing them by hand should give me a better idea of how the library works and make it so that I can better ensure mapping of concepts from PipeWire to Odin (which will come in handy as we will see later).

To start, we need to write the following pipewire.odin in a pipewire/ directory:

package pipewire

import "core:c"

foreign import pipewire "system:pipewire-0.3"

@(default_calling_convention = "c", link_prefix = "pw_")
foreign pipewire {
	// C signature: void pw_init(int *argc, char **argv[])
	init :: proc(argc: ^c.int, argv: ^[^]cstring) ---
	deinit :: proc() ---
	get_library_version :: proc() -> cstring ---
	get_client_name :: proc() -> cstring ---
}

This is relatively simple, just telling the linker to link libpipewire and giving some procedures to link with, allowing us to write the following program (executable with odin run .):

package main

import pw "./pipewire"
import "core:fmt"

main :: proc() {
	pw.init(nil, nil)

	fmt.println(
		"Using PipeWire library version:",
		pw.get_library_version(),
		"with client name:",
		pw.get_client_name(),
	)
}

In this case, we are safe to pass nil to pw_init as we have no arguments that we need to pass to libpipewire itself.

The benefit to using Odin here might not yet be apparent but in short, it will allow us to use some nice language features like dynamic arrays, maps, explicit allocators, and the excellent core library.

This covers most of our bases for bindings: any time we need a new procedure, we can simply translate the types from C to Odin and mirror the procedures and structs on the Odin side. There are, however, a few glaring issues: our arch-nemeses, static inline procedures and macros (the reason binding generators shouldn't be trusted to "just work" with this project). PipeWire relies heavily on libspa, a header-only library that uses C macros in inventive ways (e.g. to implement interfaces in C). The issue with these procedures and macros is that they aren't exported by libpipewire itself and must instead be reimplemented on the binding side. Fortunately, clangd provides macro expansion on hover, so both static inline procedures and macros can be copied over and translated fairly mechanically (even if some look a bit ugly, as seen here):

registry_add_listener :: proc(
	registry: ^registry,
	listener: ^spa_hook,
	events: ^registry_events,
	data: rawptr,
) {
	_f := cast(^registry_methods)((cast(^spa_interface)registry).cb).funcs
	if _f != nil && _f.version >= VERSION_REGISTRY_METHODS && _f.add_listener != nil {
		_f.add_listener((&(cast(^spa_interface)registry).cb).data, listener, events, data)
	} else {
		panic("could not add listener")
	}
}

Users of the bindings don't have to write anything this gross.
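For anyone wondering what the macro soup is actually doing, here is a minimal C sketch of the pattern as I understand it: an object carries a pointer to a versioned table of function pointers plus a private data pointer, and every "method call" looks up the table, checks that the method exists (and is new enough), then calls it with the data pointer. The types and names below (callbacks, interface, counter_methods, counter_add, counter_reset) are simplified stand-ins invented for illustration, not the real libspa definitions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the libspa types (illustrative, not the real headers). */
struct callbacks {
    const void *funcs; /* points at a method table */
    void *data;        /* private state of the implementation */
};

struct interface {
    const char *type;
    uint32_t version;
    struct callbacks cb;
};

/* Hypothetical method table for a "counter" object. */
struct counter_methods {
    uint32_t version;
    int (*add)(void *object, int amount);
    int (*reset)(void *object); /* imagined as only available from version 1 */
};

/* Roughly what a method-call macro expands to: look up, check, call. */
int counter_add(struct interface *iface, int amount)
{
    const struct counter_methods *m = iface->cb.funcs;
    if (m == NULL || m->add == NULL)
        return -ENOTSUP;
    return m->add(iface->cb.data, amount);
}

int counter_reset(struct interface *iface)
{
    const struct counter_methods *m = iface->cb.funcs;
    if (m == NULL || m->version < 1 || m->reset == NULL)
        return -ENOTSUP;
    return m->reset(iface->cb.data);
}

/* A version 0 implementation that only knows how to add. */
static int impl_add(void *object, int amount)
{
    int *total = object;
    *total += amount;
    return *total;
}

static const struct counter_methods v0_methods = {
    .version = 0,
    .add = impl_add,
    .reset = NULL,
};
```

Calling counter_add on an interface backed by v0_methods works, while counter_reset reports that the method is not supported instead of jumping through a null pointer. That check is the same job the _f != nil and version tests do in the Odin translation.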

Fortunately, most of this work is rather trivial and can amount to a somewhat relaxing experience cranking out code without a ton of thinking. Additionally, we don't necessarily need to write bindings for all of libpipewire, just the parts of the API we will use which does tone down the workload a fair bit.

ChatMix

What is ChatMix

Remember a couple minutes earlier when I mentioned those mixologist nodes? Well, here's where they come into play: a few months ago a friend purchased a SteelSeries headset. This headset had a feature called ChatMix, accessible through the (half-working) Sonar software. ChatMix, in short, allows users to create virtual audio devices and use a wheel on the side of the headset to mix volumes between those audio devices.

This can be extremely useful in certain cases where you may be in a long running Discord call and also have audio playing from another program (say a browser running FoundryVTT). Rather than constantly popping in and out of the volume mixer to make sure the audio levels are good, it can be much nicer to have some sort of hardware control that allows you to adjust the volume on the fly.

Pitfalls of ChatMix

Of course, ChatMix does have a few shortcomings:

Relegated to SteelSeries headsets
No Linux support
No app filtering rules
Requires users to "plumb" programs to the proper virtual device

The last two points require some extra explanation: even though these virtual audio devices exist, users cannot have programs automatically routed to the correct virtual device by program name, instead requiring them to change the audio setting for each program to select the proper audio output. This can become tedious and sometimes doesn't play nice with Windows audio settings where devices will inexplicably shuffle around for no apparent reason, resetting all of that configuration.

A Solution?

This is where we get to use our previously-attained PipeWire knowledge:

What is a "virtual device" in the context of ChatMix?
- Loopback, where the volume can be adjusted within the loopback node, not the program
Is there a way to control the "plumbing" of applications?
- Rewiring links in the node graph based on program name is doable via libpipewire

So, in short, would it be that difficult to write a program that does what Sonar + ChatMix does with the added bonus of supporting app rules and automatic plumbing? Probably not.

Mixologist

So, let's think about what we might need to design such an application. First, a name (because we have priorities here). Mixologist sounds like a good name so we'll go with that. Next, we need to figure out what this program will actually do:

Hardware volume control
Configurable program names
Automatic routing of programs

Hardware volume control could be done via a complex device driver, but that seriously limits who could potentially use the program. Instead, we can rely on keyboard shortcuts that trigger commands via a cli of some sort. A cli does necessitate some form of IPC, however, so we will probably need something like an abstract socket. This in turn means that a daemon is needed to manage the socket and do the actual routing. Adding a socket also lets any program communicate with the daemon, opening up the possibility of a GUI in the future.
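If abstract sockets are new to you, here is a rough C sketch of how one is addressed on Linux. The trick is that the first byte of sun_path is a NUL, so the name never shows up on the filesystem and disappears when the last socket using it is closed. The function names, the stream socket type, and the backlog of 8 are my assumptions for illustration; the real daemon may well do this differently.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill `addr` for a Linux abstract socket called `name`.
 * Returns the address length to hand to bind()/connect(),
 * or 0 if the name is empty or does not fit. */
socklen_t abstract_addr(struct sockaddr_un *addr, const char *name)
{
    size_t len = strlen(name);
    if (len == 0 || len > sizeof(addr->sun_path) - 1)
        return 0;

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    /* sun_path[0] stays '\0': the leading NUL marks the name as abstract. */
    memcpy(addr->sun_path + 1, name, len);

    /* The length covers the family, the NUL, and the name bytes only. */
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);
}

/* Create a listening socket bound to the abstract name.
 * Returns the file descriptor, or -1 on failure. */
int listen_abstract(const char *name)
{
    struct sockaddr_un addr;
    socklen_t len = abstract_addr(&addr, name);
    if (len == 0)
        return -1;

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, len) < 0 || listen(fd, 8) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A client would build the same address with abstract_addr and pass it to connect() instead of bind().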

This gives us the following architecture:

mixd

So, let's get to the meat of the problem, the daemon. We'll call it mixd and think about what it should do.

IPC
Routing
Volume control
Config management

We can really contain all of the state we need for that in this struct:

Context :: struct {
	// config state
	config_file:       string,
	cache_file:        string,
	inotify_fd:        linux.Fd,
	inotify_wd:        linux.Wd,
	// pipewire required state
	main_loop:         ^pw.thread_loop,
	loop:              ^pw.loop,
	core:              ^pw.core,
	pw_context:        ^pw.pw_context,
	registry:          ^pw.registry,
	registry_listener: pw.spa_hook,
	pw_odin_ctx:       runtime.Context,
	// sinks
	default_sink:      Sink,
	aux_sink:          Sink,
	aux_rules:         [dynamic]string,
	device_inputs:     map[string]Link,
	passthrough_nodes: map[u32]Node,
	passthrough_ports: [dynamic]u32,
	vol:               f32,
	// allocations
	arena:             virtual.Arena,
	allocator:         mem.Allocator,
	// control flow/ipc state
	should_exit:       bool,
	ipc:               posix.FD,
	addr:              posix.sockaddr_un,
}

The first part of the struct handles the config file itself along with hot reloading via inotify. The next part holds the state that PipeWire itself requires. Note that we use a pw.thread_loop (as opposed to the standard pw.main_loop) so that we can still do IPC and config reloading on the main thread. Also note the registry and registry_listener fields; those will come in handy later. The third section holds the meat of the application state: the state for each "virtual device", the rules to use for application routing, the volume, and some state to handle passthrough (for things like screen sharing). Finally, the last section holds data for general control flow and IPC.

Although this gives an idea of the data we keep on hand, it doesn't explain how we actually do the routing. For this, we need to prepare ourselves for more PipeWire knowledge.

PipeWire Crash Course 2, Electric Boogaloo

mixd will handle PipeWire events by using a registry and a corresponding listener. The registry can be thought of as a global store where information about every item PipeWire knows about is kept. Although we can (and will) query things from the registry based on their id, the listener is much more important to us right now.

A listener can be set up with many different PipeWire objects and handlers, but the handler we care about is shown below:

// global
registry_events := pw.registry_events {
	version       = pw.VERSION_REGISTRY_EVENTS,
	global_add    = global_add,
	global_remove = global_destroy,
}

// in main
main :: proc() {
	// ...
	ctx.registry = pw.core_get_registry(ctx.core, pw.VERSION_REGISTRY, 0)
	pw.registry_add_listener(ctx.registry, &ctx.registry_listener, &registry_events, &ctx)
	// ...
}

After this setup, every time an element is added to or removed from the graph, the corresponding procedure will get called (note the "c" calling convention):

global_add :: proc "c" (
	data: rawptr,
	id: u32,
	permissions: u32,
	type: cstring,
	version: u32,
	props: ^pw.spa_dict,
) {}

global_destroy :: proc "c" (
	data: rawptr,
	id: u32,
) {}

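You may also notice that we register the listener with a pointer to the context struct. This is a fairly common practice in C APIs: the library stores the data pointer you hand it at registration and passes it back as the first argument of every callback, which is how global_add gets at our Context.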

Fun note: Did you notice that registry_events is actually attaching procedures to a struct? This is doing dynamic dispatch!

Once we enter the global_add procedure, we can do the following:

ctx := cast(^Context)data
context = ctx.pw_odin_ctx

switch type {
case "PipeWire:Interface:Node":
	node_handler(ctx, id, version, type, props)
case "PipeWire:Interface:Port":
	port_handler(ctx, id, version, props)
case "PipeWire:Interface:Link":
	link_handler(ctx, id, version, props)
}

rebuild_connections(ctx)
free_all(context.temp_allocator)

This allows us to match on the different types of elements being added and handle them individually. We then rebuild the graph after every element is added, since there is no guarantee as to the order in which events come in.

You might notice the free_all at the end of the procedure. This is something Odin allows on arena allocators (it is effectively a no-op on allocators that don't support it), and it frees everything allocated with that allocator in one go.
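If you haven't used arenas before, the idea is easy to show in C. This is a minimal bump allocator of my own, not Odin's implementation: allocating just advances an offset into a buffer, and "freeing everything" is resetting that offset to zero. The names (arena, arena_alloc, arena_free_all) and the fixed 16-byte alignment are assumptions made for the sketch, and the buffer passed to arena_init is assumed to be suitably aligned itself.

```c
#include <stddef.h>

struct arena {
    unsigned char *base; /* start of the backing buffer */
    size_t capacity;     /* size of the backing buffer in bytes */
    size_t used;         /* bytes handed out so far */
};

void arena_init(struct arena *a, void *buffer, size_t capacity)
{
    a->base = buffer;
    a->capacity = capacity;
    a->used = 0;
}

/* Bump-allocate `size` bytes at a 16-byte aligned offset; NULL when full. */
void *arena_alloc(struct arena *a, size_t size)
{
    const size_t align = 16;
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->capacity || size > a->capacity - start)
        return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Release every allocation at once by rewinding the offset. */
void arena_free_all(struct arena *a)
{
    a->used = 0;
}
```

This is why calling free_all at the end of every event is so cheap: no matter how many temporary strings a handler allocated, cleanup is a single assignment.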

Each of these handlers does what it says on the tin and records connections in the virtual graph that the Context keeps around for the rebuild procedure to then work on.
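The core decision the rebuild step has to make for each application node can be sketched in a few lines of C. This is not the actual mixd code: choose_sink and sink_choice are hypothetical names, and the exact, case-sensitive comparison against the rule list is my assumption about how matching could work.

```c
#include <stddef.h>
#include <string.h>

enum sink_choice { SINK_DEFAULT, SINK_AUX };

/* Decide which virtual sink a node belongs to.
 * A node goes to the aux sink when its program name exactly matches one of
 * the aux rules; everything else (including unnamed nodes) stays on default. */
enum sink_choice choose_sink(const char *program,
                             const char *const *aux_rules,
                             size_t rule_count)
{
    if (program == NULL)
        return SINK_DEFAULT;
    for (size_t i = 0; i < rule_count; i++) {
        if (strcmp(program, aux_rules[i]) == 0)
            return SINK_AUX;
    }
    return SINK_DEFAULT;
}
```

Because the decision depends only on the current rules and the node's name, it can be re-run after every event without caring about the order in which nodes, ports, and links showed up.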

mixcli

With the majority of the daemon at least explained, we now need to turn our attention to the cli. First and foremost, we need a message passing format which can be seen here:

Message :: union {
	Volume,
	Program,
}

Volume :: struct {
	act: enum {
		Set,
		Shift,
	},
	val: f32,
}

Program :: struct {
	act: enum {
		Add,
		Remove,
	},
	val: string,
}

This format is extremely simple: a tagged union with two kinds of messages. The first either sets the volume or shifts it by a given amount; the second adds or removes a program from the list of selected programs. To send a message over a socket, we first need to serialize it. My format of choice is CBOR, which bears similarities to JSON while being binary encoded. This also makes reading and writing the data as simple as:

// in mixcli
message, encoding_err := cbor.marshal(msg)

// in mixd
msg_err := cbor.unmarshal(string(buf[:bytes_read]), &msg)

CBOR is also in Odin's core library meaning that you can use it without installing anything extra. In fact, no external libraries have been needed for either the cli or daemon.
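For readers more at home in C, the Message union corresponds roughly to a struct with a kind tag and a union payload, and the difference between Set and Shift comes down to one line. The sketch below is illustrative only: the C names are mine, and clamping the result to the range -1 to 1 is an assumption I picked for the example, not something taken from the daemon.

```c
enum message_kind { MSG_VOLUME, MSG_PROGRAM };
enum volume_act { VOLUME_SET, VOLUME_SHIFT };
enum program_act { PROGRAM_ADD, PROGRAM_REMOVE };

/* C counterpart of the Odin tagged union: a tag plus a union payload. */
struct message {
    enum message_kind kind;
    union {
        struct { enum volume_act act; float val; } volume;
        struct { enum program_act act; const char *val; } program;
    } as;
};

/* Set replaces the volume, Shift adds to it.
 * The [-1, 1] clamp is a hypothetical range chosen for this sketch. */
float apply_volume(float current, enum volume_act act, float val)
{
    float next = (act == VOLUME_SET) ? val : current + val;
    if (next < -1.0f)
        next = -1.0f;
    if (next > 1.0f)
        next = 1.0f;
    return next;
}

/* Dispatch on the tag; program messages leave the volume untouched. */
float handle_message(float vol, const struct message *msg)
{
    if (msg->kind == MSG_VOLUME)
        return apply_volume(vol, msg->as.volume.act, msg->as.volume.val);
    return vol;
}
```

Odin's union does the tag bookkeeping for you, which is a large part of why the original definition is so short.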

Now that we can send messages, the rest of mixcli is just a simple cli (which is made extremely easy with Odin's "core:flags" library).

Flags:
	-add_program:<string>, multiple     | name of program to add to aux
	-remove_program:<string>, multiple  | name of program to remove from aux
	-set_volume:<f32>                   | volume to assign nodes
	-shift_volume:<f32>                 | volume to increment nodes

End Result

So where does all of this work put us? Well, the project can be found on GitHub and is something I run on my personal machine. Keyboard control can be set up in the keybinds section of any major desktop environment. Configuration is relatively simple: for now, it is a newline-separated list of program names in ~/.config/mixologist/mixologist.conf. Hot reloading means that users can also edit the config file directly to update the program list. Some additional work went into creating a systemd unit to start the program on login and an RPM package to make installing on RPM-based distributions simple.
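To give a feel for how little there is to a newline-separated config, here is a small C sketch of parsing one. It is not the daemon's parser: parse_program_list is a hypothetical name, and skipping blank lines is my own assumption about sensible behavior.

```c
#include <stddef.h>
#include <string.h>

/* Split `text` in place on '\n' and store up to `max` non-empty lines in
 * `out`. The stored pointers point into `text`, which is modified.
 * Returns the number of program names stored. */
size_t parse_program_list(char *text, char **out, size_t max)
{
    size_t count = 0;
    char *line = text;

    while (line != NULL && count < max) {
        char *newline = strchr(line, '\n');
        if (newline != NULL)
            *newline = '\0'; /* terminate the current line */
        if (*line != '\0')
            out[count++] = line; /* keep non-empty lines only */
        line = (newline != NULL) ? newline + 1 : NULL;
    }
    return count;
}
```

On a hot reload, the daemon would only need to read the file again, run something like this over its contents, and swap in the new rule list.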

Next Steps

What's left then?

A GUI that allows users to configure rules without having to touch the config file.
A physical peripheral that will act as a keyboard and allow for volume mixing.
A refactor to make the daemon something that can be embedded into any application, opening up the potential for a Flatpak.

To anyone who made it this far, thank you so much for reading!