Chapter 9: Audio

There are three ways to play sounds on the web, each with different tradeoffs in latency, file size, and control.

The HTMLAudioElement [MDN], which streams audio from a URL for playback as soon as enough data has buffered. This is best used for background music because most of the audio can download during playback rather than forcing a long loading time up front.
An AudioContext [MDN] with preloaded audio samples. This is ideal for short samples and sound effects where precise timing and low latency are critical. Sound effects tend to be short enough that preloading them all up front is practical.
An AudioContext using oscillators [MDN] or manually buffered bytes to build a virtual synth, which you can then use for both music and sound effects. It requires generating audio entirely in code, which adds significant complexity. In most cases, exporting samples from a dedicated audio program is both easier and more flexible.

`HTMLAudioElement`

HTML Audio Element

You can stream audio from any URL using the HTML Audio Element. Place an audio tag like the one in index.html below in an HTML file, and as long as the src points to an actual audio file, you'll see a mini player right in your browser.

Programmatic control

Grab a reference to this element's DOM node and you'll be able to control it programmatically.

You could even remove the controls attribute from the HTML element so that the audio UI doesn't show up on the page at all.
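
As a rough sketch of that programmatic control, the helper below toggles an audio element between playing and paused. The names togglePlayback and wireToggleButton are hypothetical, and the wiring assumes the page contains exactly one audio element and one button.

```typescript
// Play the element if it is paused, otherwise pause it.
function togglePlayback(player: HTMLAudioElement): void {
  if (player.paused) {
    // play() returns a promise that rejects if the browser blocks playback
    player.play().catch((err) => console.warn("Playback was blocked:", err));
  } else {
    player.pause();
  }
}

// Hypothetical wiring: assumes one <audio> and one <button> exist on the page.
function wireToggleButton(): void {
  const player = document.querySelector("audio")!;
  const button = document.querySelector("button")!;
  button.onclick = () => togglePlayback(player);
}
```

Call wireToggleButton() once the page has loaded. Because the click is a user gesture, playback started this way is not blocked by autoplay rules.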

new Audio()

The same result can be achieved without needlessly querying the DOM by using the new Audio(..) constructor.
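
A minimal sketch of the constructor approach follows. The function name startBackgroundMusic, the default volume, and the url you pass in are illustrative assumptions, not part of any API.

```typescript
// Start looping background music without adding anything to the DOM.
function startBackgroundMusic(url: string, volume = 0.5): HTMLAudioElement {
  const music = new Audio(url); // an HTMLAudioElement that is not attached to the page
  music.loop = true; // restart automatically when the track ends
  music.volume = volume; // 0 (silent) to 1 (full volume)
  // play() rejects if no user gesture has happened yet, so handle that case
  music.play().catch(() => console.warn("Music is blocked until the user interacts"));
  return music;
}
```

Keeping the returned element around lets you call pause() or adjust volume later.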

index.html

<audio src="./my-example-song.mp3" controls></audio>

This is the simplest way to get sound into your game and is great for background music, but it comes with a few caveats that you should keep in mind:

The audio isn't preloaded. The browser streams audio data as the track plays rather than downloading it all up front. And that's not always a bad thing. For background music, it's a better experience to load audio during playback than to force a very long wait time up front.
Playback timing is imprecise. .play() queues audio playback and returns a promise that resolves when the browser is ready to play, rather than playing instantly and synchronously. With this API there will always be some lag.
Playback requires a user gesture, such as a mouse click, tap, or key press, to start. Gamepad inputs unfortunately don't qualify. If you want music to begin as soon as the game starts, the most reliable approach is a dedicated Play button triggered by a mouse click or keypress. Once that first interaction unlocks audio, subsequent sounds can play freely.

`AudioContext`

To play audio with minimal latency, you need to pre-buffer it. This means:

You need to fetch the raw bytes from the URL where the audio file is stored.
Once downloaded, those bytes need to be decoded for playback.

main.ts

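// when using Vite, imports like this resolve to a url string
import shootUrl from "./assets/audio/shoot.wav";

// this provides the API to decode and work with audio data
const audioCtx = new AudioContext();

async function loadAudio(url: string) {
  // fetch the audio from the given url
  const response = await fetch(url);
  // then decode the data so that it can be played
  const audioBuffer = await audioCtx.decodeAudioData(
    await response.arrayBuffer(), // raw bytes
  );
  return audioBuffer;
}

const soundCache: Record<string, AudioBuffer> = {};
// hypothetical game state; substitute your own loading flag
const gameState = { audioLoaded: false };

// run this as soon as you want to begin loading audio
loadAudio(shootUrl).then((buffer) => {
  // optionally save the buffer to a cache here
  soundCache["shoot"] = buffer;
  // or update the game state to indicate loading success
  gameState.audioLoaded = true;
});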

Playing buffered audio

With the decoded audio loaded, we can then play it.

Playback speed

The playbackRate property speeds up or slows down the sample. A value of 2 plays at double speed (and an octave higher). Let's try quarter speed:
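
A minimal sketch of quarter-speed playback might look like the following. The helper name playAtRate is hypothetical, and the usage comment assumes the audioCtx and soundCache from the loading example.

```typescript
// Play a decoded buffer at a given speed multiple and return the source node.
function playAtRate(
  ctx: AudioContext,
  buffer: AudioBuffer,
  rate: number,
): AudioBufferSourceNode {
  const source = ctx.createBufferSource(); // source nodes are single-use
  source.buffer = buffer;
  source.playbackRate.value = rate; // 1 = normal, 2 = double speed, 0.25 = quarter speed
  source.connect(ctx.destination);
  source.start();
  return source;
}

// Usage (inside a click handler):
// playAtRate(audioCtx, soundCache["shoot"], 0.25);
```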

Pitch adjustment

Similar to playbackRate, the source node has a detune property (in cents [Wikipedia]). Randomizing it gives every shot a slightly different pitch. This is also useful to break up the pitches of footsteps and impact sounds.
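
One way to randomize the pitch is sketched below. The helper names and the default range of 200 cents are assumptions chosen for illustration; tune the range by ear.

```typescript
// Pick a random detune in cents between -rangeCents and +rangeCents.
// `random` is injectable so the function can be tested deterministically.
function randomDetune(
  rangeCents: number,
  random: () => number = Math.random,
): number {
  return (random() * 2 - 1) * rangeCents;
}

// Play a buffer with a slightly different pitch each time.
function playWithRandomPitch(
  ctx: AudioContext,
  buffer: AudioBuffer,
  rangeCents = 200,
): AudioBufferSourceNode {
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.detune.value = randomDetune(rangeCents); // 100 cents = 1 semitone
  source.connect(ctx.destination);
  source.start();
  return source;
}
```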

Detune is still just playback speed

Under the hood, detune speeds up or slows down the audio sample--just like playbackRate. The main difference is that playbackRate deals in speed multiples whereas detune operates in musical units.

If you use both together, they will stack.

rate = playbackRate * 2^(detune/1200)
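
The formula above translates directly into code. The function name effectiveRate is hypothetical; it only mirrors the arithmetic and does not touch the Web Audio API.

```typescript
// Combined playback rate when playbackRate and detune (in cents) are both set.
function effectiveRate(playbackRate: number, detuneCents: number): number {
  return playbackRate * 2 ** (detuneCents / 1200);
}

effectiveRate(1, 1200); // 2: one octave up doubles the speed
effectiveRate(2, -1200); // 1: an octave down cancels the doubled rate
effectiveRate(1, 100); // roughly 1.0595: one semitone up
```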

Volume

The Web Audio API wires together nodes in a graph. A source is where audio data enters the graph, and the destination is the output (your speakers). So far we've been connecting them directly:

To control volume, insert a GainNode between them:

Its gain.value scales how loud the signal is as it passes through: 0 is silent, 1 leaves the signal unchanged, and values above 1 amplify it (and may cause clipping).
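
The wiring described above can be sketched as follows. The helper name playAtVolume is hypothetical; the graph it builds is source, then gain, then destination.

```typescript
// Play a buffer through a GainNode: source -> gain -> destination.
function playAtVolume(
  ctx: AudioContext,
  buffer: AudioBuffer,
  volume: number,
): GainNode {
  const source = ctx.createBufferSource();
  source.buffer = buffer;

  const gainNode = ctx.createGain();
  gainNode.gain.value = volume; // 0 = silent, 1 = unchanged

  source.connect(gainNode);
  gainNode.connect(ctx.destination);
  source.start();
  return gainNode; // keep a reference if you want to adjust the volume later
}
```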

Looping

Set loop to true and the source repeats until you call source.stop().
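
A small sketch of looping, assuming a hypothetical helper named startLoop that hands back a function for stopping the sound later:

```typescript
// Start a looping sound and return a function that stops it.
function startLoop(ctx: AudioContext, buffer: AudioBuffer): () => void {
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.loop = true; // repeat from the start whenever the end is reached
  source.connect(ctx.destination);
  source.start();
  // a stopped source cannot be restarted; call startLoop again for a new one
  return () => source.stop();
}
```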

Stereo panning

A StereoPannerNode sits between the source and destination just like a GainNode. It allows you to pan between -1 (left) and 1 (right).
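
As an illustration, the sketch below pans a sound to match where it happened on screen. Both helper names are hypothetical, and mapping a horizontal pixel position to a pan value is just one possible design.

```typescript
// Map a horizontal position (0..width) to a pan value in [-1, 1].
function panFromX(x: number, width: number): number {
  const pan = (x / width) * 2 - 1;
  return Math.max(-1, Math.min(1, pan)); // clamp positions that are off-screen
}

// Play a buffer through a StereoPannerNode: source -> panner -> destination.
function playPanned(
  ctx: AudioContext,
  buffer: AudioBuffer,
  pan: number,
): StereoPannerNode {
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  const panner = ctx.createStereoPanner();
  panner.pan.value = pan; // -1 = hard left, 0 = center, 1 = hard right
  source.connect(panner);
  panner.connect(ctx.destination);
  source.start();
  return panner;
}
```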

const audioCtx = new AudioContext();
const button = document.querySelector("button")!;

button.onclick = () => {
  const source = audioCtx.createBufferSource();
  source.buffer = soundCache["shoot"];
  source.connect(audioCtx.destination);
  source.start();
};

Oscillators

An oscillator takes a wave (like a sine wave) and generates a tone from it directly--without needing an audio file. We'll walk through the bare essentials in this section, but oscillators are a deep topic, so this will only scratch the surface. In most cases, we recommend working with exported audio samples rather than procedurally generating sounds with Web Audio oscillators. That said, these are the core concepts you should know.

Basic oscillator

createOscillator() produces a sine wave. After beginning playback with a call to start(), we'll also schedule a stop() one second later so it doesn't run forever.

Wave types

The type property selects the waveform: "sine", "square", "sawtooth", or "triangle". Each has a different timbre.

Frequency

frequency is measured in Hz. The default frequency for an oscillator node is concert A at 440 Hz. 220 is an octave below.
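
The sketch below combines the waveform and frequency settings. The helper names noteFrequency and playTone are hypothetical, and computing pitches in semitones relative to concert A is an added convenience rather than something the oscillator API requires.

```typescript
// Frequency of the note `semitones` away from concert A (440 Hz).
function noteFrequency(semitones: number): number {
  return 440 * 2 ** (semitones / 12);
}

// Play a tone with the given waveform and frequency for `seconds`.
function playTone(
  ctx: AudioContext,
  type: OscillatorType,
  frequency: number,
  seconds = 1,
): OscillatorNode {
  const osc = ctx.createOscillator();
  osc.type = type; // "sine", "square", "sawtooth", or "triangle"
  osc.frequency.value = frequency; // in Hz
  osc.connect(ctx.destination);
  osc.start();
  osc.stop(ctx.currentTime + seconds);
  return osc;
}

// Usage: playTone(audioCtx, "square", noteFrequency(-12)); // 220 Hz, an octave below A
```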

const audioCtx = new AudioContext();
const button = document.querySelector("button")!;

button.onclick = () => {
  const osc = audioCtx.createOscillator();
  osc.connect(audioCtx.destination);
  osc.start();
  // schedule the stop one second from now
  osc.stop(audioCtx.currentTime + 1);
};

For a deeper look at oscillators, wave shaping, and building a synth, see the MDN Web Audio API guide.

Scheduling and ramps

You may have noticed that some properties are set directly, like oscillator.type = "square", but others have an intermediate object with a value property that gets set. For example, we previously used oscillator.frequency.value to set the frequency rather than just writing oscillator.frequency = 220. This intermediate object is an AudioParam [MDN] and it gives us the ability to not just directly set a value, but also ramp it over time.

So you could slide between octaves, for example, with code like this:

const osc = audioCtx.createOscillator();
osc.frequency.value = 220;
osc.frequency.linearRampToValueAtTime(
  440, // new target frequency
  audioCtx.currentTime + 1, // when the ramp should complete
);

But you can also ramp gain (volume), detune, pan, playbackRate, delay, and a bunch of more advanced filtering [MDN] and waveshaping [MDN] settings, too.

Ramps prevent clicking

In addition to being useful creatively, ramps solve a common audio problem. Abrupt shifts in certain audio settings cause a discontinuity that sounds like an unpleasant click or pop through your speakers. Using an imperceptibly fast ramp instead of an immediate change to the value fixes this problem.

Abrupt changes cause clicks

This poorly-implemented Mute button abruptly toggles the gain between 0 and 0.5. You should hear an audible popping sound when toggling.

The fix

Replace the instant assignment with a very short ramp. We've gone with 2ms, which is far too fast to hear as a fade, but is enough to smooth out the discontinuity.

setValueAtTime(currentValue, now) anchors the current value on the timeline so the ramp has a starting point. linearRampToValueAtTime then glides to the target over the given duration.
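
A sketch of the fixed Mute button under those assumptions is shown below. The helper names setSmoothly and toggleMute are hypothetical, and the 0.5 unmuted level matches the broken example.

```typescript
// Change an AudioParam without a click by ramping over a few milliseconds.
function setSmoothly(
  param: AudioParam,
  target: number,
  now: number,
  rampSeconds = 0.002,
): void {
  param.setValueAtTime(param.value, now); // anchor the ramp's starting point
  param.linearRampToValueAtTime(target, now + rampSeconds);
}

// Hypothetical mute toggle built on the helper above.
function toggleMute(ctx: AudioContext, gainNode: GainNode): void {
  const target = gainNode.gain.value === 0 ? 0.5 : 0;
  setSmoothly(gainNode.gain, target, ctx.currentTime);
}
```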

Fade in and out

The same pattern with a longer duration becomes a fade. We also need to call cancelScheduledValues first to clear previous ramps.
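
One possible shape for a fade helper is sketched here. The names fadeTo and fadeOutAndStop are hypothetical, and the one-second default is arbitrary.

```typescript
// Fade an AudioParam to `target` over `seconds`, replacing any ramp in progress.
function fadeTo(
  param: AudioParam,
  target: number,
  now: number,
  seconds: number,
): void {
  const current = param.value; // read the current value before cancelling
  param.cancelScheduledValues(now); // clear previously scheduled ramps
  param.setValueAtTime(current, now); // anchor the starting point
  param.linearRampToValueAtTime(target, now + seconds);
}

// Fade a playing source out, then stop it once the fade has finished.
function fadeOutAndStop(
  ctx: AudioContext,
  source: AudioScheduledSourceNode,
  gainNode: GainNode,
  seconds = 1,
): void {
  const now = ctx.currentTime;
  fadeTo(gainNode.gain, 0, now, seconds);
  source.stop(now + seconds);
}
```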

Ramping other params

gain is not the only AudioParam. frequency, playbackRate, detune, and others can be ramped the same way. Here the oscillator sweeps from 220Hz down to 10Hz over one second using exponentialRampToValueAtTime.

exponentialRampToValueAtTime works just like linearRampToValueAtTime, except it follows a curve instead of a straight line. Because human hearing is logarithmic, exponential ramps often sound more even than linear ones.
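
The sweep described above could be written roughly like this. The helper name playSweep is hypothetical; the 220 Hz and 10 Hz endpoints come from the text.

```typescript
// Sweep an oscillator's frequency along an exponential curve.
function playSweep(
  ctx: AudioContext,
  fromHz: number,
  toHz: number,
  seconds: number,
): OscillatorNode {
  const now = ctx.currentTime;
  const osc = ctx.createOscillator();
  osc.frequency.setValueAtTime(fromHz, now); // starting pitch
  osc.frequency.exponentialRampToValueAtTime(toHz, now + seconds); // target must not be 0
  osc.connect(ctx.destination);
  osc.start(now);
  osc.stop(now + seconds);
  return osc;
}

// Usage: playSweep(audioCtx, 220, 10, 1);
```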

Cannot ramp to zero

exponentialRampToValueAtTime throws if the target is 0. It also can't start from 0. When you need to ramp to zero, use a tiny value like 0.001 instead.
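
A sketch of that workaround for fading a gain to silence follows. The constant SILENCE_FLOOR and the helper name exponentialFadeOut are hypothetical; the final setValueAtTime call is an optional extra that snaps the last sliver of volume to true zero.

```typescript
const SILENCE_FLOOR = 0.001; // small enough to be inaudible, but not 0

// Exponentially fade a gain param to (effectively) silence.
function exponentialFadeOut(
  gain: AudioParam,
  now: number,
  seconds: number,
): void {
  const start = Math.max(gain.value, SILENCE_FLOOR); // the ramp cannot start from 0
  gain.cancelScheduledValues(now);
  gain.setValueAtTime(start, now);
  gain.exponentialRampToValueAtTime(SILENCE_FLOOR, now + seconds); // 0 would throw
  gain.setValueAtTime(0, now + seconds); // then drop to true silence
}
```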

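const audioCtx = new AudioContext();
const button = document.querySelector("button")!;

const osc = audioCtx.createOscillator();
const gain = audioCtx.createGain();
gain.gain.value = 0;
osc.connect(gain);
gain.connect(audioCtx.destination);
osc.start();

button.onclick = () => {
  // the context may start suspended until a user gesture, so resume it here
  audioCtx.resume();
  // abruptly toggling between 0 and 0.5 is what causes the click
  const previousValue = gain.gain.value;
  if (previousValue === 0) {
    gain.gain.value = 0.5;
  } else {
    gain.gain.value = 0;
  }
};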

Bonus 1: Manual buffers

Manual buffers give you direct access to sample values, making it easier to port existing audio algorithms and apply standard signal processing techniques. Web Audio's oscillator API, in contrast, abstracts away the sample values you'd need for those algorithms.

Working at this level opens up techniques the oscillator API cannot match. You can change the volume by scaling the buffered values. You can turn a sine wave into a square wave by rounding. And classic C audio code can be ported almost directly to JavaScript when you're just dealing with raw samples in a buffer. This is exactly the approach jsfxr takes for its web-based sound effect generator (source).

White noise in a buffer

Filling the buffer with random values between -1 and 1 creates white noise.

Warning for headphone users

This is a harsh, loud sound.

Sine wave

Here's what a 440 Hz sine wave looks like when manually buffered.
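
A minimal sketch of filling a buffer with a sine wave is shown below. The helper name fillSine, the default amplitude of 0.5, and the fixed 44.1 kHz sample rate are assumptions; in the browser you would pass buffer.getChannelData(0) and audioCtx.sampleRate instead.

```typescript
// Fill `data` with a sine wave. Samples are floats in [-1, 1].
function fillSine(
  data: Float32Array,
  frequency: number,
  sampleRate: number,
  amplitude = 0.5,
): void {
  for (let i = 0; i < data.length; i++) {
    const t = i / sampleRate; // time of this sample in seconds
    data[i] = amplitude * Math.sin(2 * Math.PI * frequency * t);
  }
}

// One second of a 440 Hz tone at an assumed 44.1 kHz sample rate.
const sineSamples = new Float32Array(44100);
fillSine(sineSamples, 440, 44100);
```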

Multiple tones at once

You can add as many tones together as you want. To avoid clipping, scale them down after the addition. Here's an A minor chord.
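
Mixing tones can be sketched as follows. The helper name fillChord and the 44.1 kHz sample rate are assumptions, and the chord uses rounded equal-tempered frequencies for A4, C5, and E5.

```typescript
// Sum several sine waves into `data`, scaled so the mix stays within [-1, 1].
function fillChord(
  data: Float32Array,
  frequencies: number[],
  sampleRate: number,
): void {
  for (let i = 0; i < data.length; i++) {
    const t = i / sampleRate; // time of this sample in seconds
    let sum = 0;
    for (const f of frequencies) {
      sum += Math.sin(2 * Math.PI * f * t);
    }
    data[i] = sum / frequencies.length; // scale down after adding to avoid clipping
  }
}

// A minor triad: A4, C5, E5.
const A_MINOR = [440, 523.25, 659.25];
const chordSamples = new Float32Array(44100);
fillChord(chordSamples, A_MINOR, 44100);
```

Dividing by the number of tones is the simplest safe scaling: each sine peaks at 1, so the sum can never exceed the tone count.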

const audioCtx = new AudioContext();
const button = document.querySelector("button")!;

const rate = audioCtx.sampleRate;
const duration = 1; // in seconds
const frameCount = rate * duration; // number of samples in the buffer
const buffer = audioCtx.createBuffer(1, frameCount, rate);
const data = buffer.getChannelData(0);

// fill the single channel with random samples between -1 and 1
for (let i = 0; i < data.length; i++) {
  data[i] = Math.random() * 2 - 1;
}

button.onclick = () => {
  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(audioCtx.destination);
  source.start();
};

Bonus 2: Spatial audio

With headphones on, compare the following raw audio...

...with the same clip in 3D space, positioned as though someone is walking in a circle around you, the listener:

This is possible with a PannerNode, the 3D counterpart to the simple StereoPannerNode from earlier. It positions audio in full 3D space relative to a listener, simulating how sound changes as it moves around a human head. However, this added realism comes at a significant performance cost, so 3D panning should be used where it matters most rather than everywhere by default.

MDN summarizes what's happening under the hood as:

basically a whole lotta cool maths to make audio appear in 3D space
- MDN, really

For an intuitive demo, check out this 3D boombox demo [MDN].

The 3D PannerNode is good in 2D, too!

Just because you have access to three dimensions does not mean you have to use all three. If you're building a 2D game, PannerNode is still the right tool to use for audio spatialization--just ignore the Z axis.
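
For a 2D game, the mapping might look like the sketch below. Everything here is an assumption made for illustration: the Point2D type, the helper names, screen-style coordinates where y grows downward, and a scale of 32 game pixels per audio unit.

```typescript
interface Point2D {
  x: number;
  y: number;
}

// Convert a 2D game position into panner coordinates relative to the listener.
function toPannerPosition(
  sprite: Point2D,
  listener: Point2D,
  pixelsPerUnit = 32,
): { x: number; y: number; z: number } {
  return {
    x: (sprite.x - listener.x) / pixelsPerUnit,
    y: -(sprite.y - listener.y) / pixelsPerUnit, // flip: audio Y points up
    z: 0, // the Z axis is ignored in 2D
  };
}

// Apply the converted position to a PannerNode.
function placeIn2D(panner: PannerNode, sprite: Point2D, listener: Point2D): void {
  const p = toPannerPosition(sprite, listener);
  panner.positionX.value = p.x;
  panner.positionY.value = p.y;
  panner.positionZ.value = p.z;
}
```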

The basic setup looks like this:

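const audioCtx = new AudioContext();

const panner = audioCtx.createPanner();
panner.panningModel = "HRTF"; // "Head-Related Transfer Function"

panner.positionX.value = -8; // 8 units to the left
panner.positionY.value = 0; // neither above nor below
panner.positionZ.value = -1; // 1 unit in front of the listener (who faces -Z by default)

// you can also change the direction the audio source is pointing.
// this is not YOUR orientation, but the orientation of the sound!
panner.orientationX.value = 1; // pointing to the right
panner.orientationY.value = 0;
panner.orientationZ.value = 0;

const source = audioCtx.createBufferSource();
source.connect(panner);
panner.connect(audioCtx.destination);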

HRTF stands for Head-Related Transfer Function [Wikipedia], which is used to simulate changes in sounds as they're positioned relative to a human head. These changes are deeper than simply adjusting left/right volume. Sounds in front of your head vs behind it would have identical left/right panning and yet be perceived differently. HRTF attempts to mimic these changes by also adjusting timbre, either boosting or attenuating frequencies in a physically realistic way. The alternative to HRTF is the more-efficient default value equalpower. Whereas HRTF relies on measured responses from human subjects, equalpower is more like stereo panning but in three dimensions.

To play audio in a circle around the listener:

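const audioCtx = new AudioContext();
const source = audioCtx.createBufferSource();
source.buffer = soundCache["footsteps"];

const radius = 4; // walk radius in meters
const segments = 24; // # of segments to approximate a circle

const panner = audioCtx.createPanner();
panner.panningModel = "HRTF";
// assumption: refDistance is set so the volume is not reduced at the walk radius
panner.refDistance = radius;

const now = audioCtx.currentTime;

// as AudioParams, all position and orientation values support
// the scheduling and ramping techniques we covered earlier.
panner.positionX.setValueAtTime(0, now);
panner.positionY.setValueAtTime(0, now);
panner.positionZ.setValueAtTime(-radius, now);

for (let i = 1; i <= segments; i++) {
  const time = now + (source.buffer.duration * i) / (2 * segments);
  const angle = (2 * Math.PI * i) / segments;
  panner.positionX.linearRampToValueAtTime(radius * Math.sin(angle), time);
  panner.positionZ.linearRampToValueAtTime(-radius * Math.cos(angle), time);
}

source.connect(panner);
panner.connect(audioCtx.destination);
source.start();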

There is considerably more to explore with spatial audio. We've shown how to position audio sources, but you can also position the listener, set distance scales, modify the shape of the cone of sound emanating from the source, and stack multiple 3D positioned audio sources together. For a deeper tour, see MDN’s Web Audio spatialization basics.

Bonus 3: Working around autoplay restrictions

Browsers prevent you from autoplaying audio without a user-initiated mouse, touch, or keyboard event, which browsers refer to as a gesture. But that gesture doesn't have to be the one that actually starts playback. The standard workaround is therefore to react to any gesture at all, then call .resume() on the AudioContext as soon as a supported event has fired. After that, the browser will play all subsequent audio--including sounds triggered by gamepads--without restriction.

You still won't be able to autoplay audio immediately on load, nor will you be able to rely solely on gamepad interactions to kick off playback, but this is the closest we can get to true autoplay in 2026.

Chrome has a code snippet in its developer blog that they recommend. If you intend to only have one AudioContext in your game, which is likely, here's a simpler alternative:

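const audioCtx = new AudioContext();
const controller = new AbortController();

const gestureEvents = [
  "click",
  "contextmenu",
  "auxclick",
  "dblclick",
  "mousedown",
  "mouseup",
  "pointerup",
  "touchend",
  "keydown",
  "keyup",
] as const;

const unlockAudio = () => {
  if (audioCtx.state === "suspended") {
    audioCtx.resume();
  }
  // remove every listener registered with this controller's signal
  controller.abort();
};

// fixes: "The AudioContext was not allowed to start. It must be resumed (or created) after a user gesture on the page."
gestureEvents.forEach((eventName) => {
  // assumption: listening on document in the capture phase
  document.addEventListener(eventName, unlockAudio, {
    capture: true,
    signal: controller.signal,
  });
});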

References

Web Audio API guide — a more technical introduction to web audio for games
Jsfxr Pro — retro 8-bit sound effects generator
Web Audio spatialization basics [MDN] — a deeper dive on web audio spatialization
HRTF [Wikipedia] — how audio spatialization works
Web Audio API best practices [MDN] — loading strategies, autoplay handling, and tips for AudioParams
Web Audio, Autoplay Policy and Games [Chrome dev blog] — Chrome's announcement explaining their updated autoplay policy and its impact on web games