Scale House
AI & Tooling · 4 min read · 11 April 2026

The voice in this post is fake.

ElevenLabs v3 lets you stage-direct an AI voice the way you'd stage-direct a human. Here's the voice we use, the one we shelved, and what the square brackets actually do.

Three buttons. One is Murray's voice clone, which we don't use because it gave him a British accent he doesn't have. One is the voice we actually use for narration. The third is the same voice, with stage directions in square brackets, generated through ElevenLabs v3. Press them in order. We'll wait.

What v3 actually changed

Until last year, generating an AI voice for a brand meant choosing one tone and hoping it fit every situation. You could nudge it with stability and style sliders, but the model itself didn't know what "excited" meant. It just knew "louder, faster". That's why most AI voiceovers from 2024 sound like a slightly nervous TED talk no matter what you ask for.

ElevenLabs v3 dropped that approach. The new model understands stage directions written in square brackets, the same way an actor reads them off a script. [whispers], [laughs], [excited], [sarcastic], [sighs]. You write them inline, the model performs them inline. The third button above is one voice doing three different deliveries inside a single ten-second clip. Same speaker, three different intentions.

The reason this matters for marketing is not that you can now make a clone laugh on command. It's that you can finally write copy the way you'd write copy for a human, with rhythm and breath and a tone shift mid-sentence, and the output stops sounding like a voicemail.
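To make the bracket syntax concrete, here is a minimal sketch of what a v3 request could look like. The endpoint shape follows the ElevenLabs REST text-to-speech API; the voice id and API key are placeholders, and `eleven_v3` is our assumption for the model id, so check the current docs before copying this. The point is that the stage directions live inline in the script text itself.

```python
ELEVEN_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

def build_tts_request(voice_id: str, api_key: str, script: str,
                      model_id: str = "eleven_v3") -> tuple[str, dict, dict]:
    """Compose the URL, headers, and JSON body for a v3 TTS call.

    Stage directions stay inline in `script` as square-bracket tags;
    the model performs them rather than reading them aloud.
    """
    url = ELEVEN_TTS_URL.format(voice_id=voice_id)
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    payload = {"text": script, "model_id": model_id}
    return url, headers, payload

# One speaker, three deliveries inside a single clip -- the third button above.
script = (
    "[whispers] The voice in this post is fake. "
    "[excited] And you just heard it change its mind mid-sentence. "
    "[sighs] Disclosure done."
)
url, headers, payload = build_tts_request("VOICE_ID_HERE", "API_KEY_HERE", script)
# resp = requests.post(url, headers=headers, json=payload)  # response body is audio
```

Nothing about the delivery lives in the request parameters; it all rides along in the text, which is why you can write the tone shift exactly where you want it in the copy.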

What we use it for

Three things, in order of how much we charge for them.

  1. Narrated pitch documents. When we send a deck to a brand, the founders read it on their laptop, on their phone, in the car. The car is the killer. A pitch you can press play on and listen to during a thirty-minute drive lands in a way a PDF never will. The voice we use is calm, low-stakes, and never tries to hard-sell.
  2. Ad voiceovers. The traditional unit economics of a video ad include a freelance voice actor and a turnaround of two days. With v3 it's two minutes and the model can do the take seven different ways while you decide which one fits. We'd still hire a human for a hero spot. Everything below the hero spot is a candidate for this.
  3. Multilingual dubbing. ElevenLabs will dub a clip into 29 languages while keeping the same vocal style. For brands selling into Europe or LATAM that's not a feature, it's a market entry strategy. You record once and ship in nine languages by Friday.
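The record-once, fan-out-per-language workflow from item 3 can be sketched as a simple batch. To be clear, the field names and language codes below are illustrative assumptions, not the documented ElevenLabs dubbing API; what the sketch shows is the shape of the job, one source clip mapped to one dubbing job per target market.

```python
# Nine illustrative European target markets -- placeholder codes, not a spec.
TARGET_LANGS = ["de", "fr", "es", "it", "pt", "nl", "pl", "sv", "da"]

def dubbing_jobs(source_clip: str, langs: list[str]) -> list[dict]:
    """One dubbing job per language, all pointing at the same source clip
    so the vocal style stays consistent across markets."""
    return [{"source": source_clip, "target_lang": lang} for lang in langs]

jobs = dubbing_jobs("hero_spot_v2.mp4", TARGET_LANGS)
```

Record once, queue nine jobs, ship by Friday: the loop is the whole market-entry strategy.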

What we don't use it for

Cloning a customer's voice without their knowledge. Cloning a celebrity. Cloning a competitor's founder. Replacing a creator who has built an audience around the sound of their actual voice. These are obvious. They're also the things that will get the whole category banned if enough people pretend they're not obvious.

Murray's clone is on the shelf because it's a bad clone, not because we're squeamish. The Scale House narrator voice was built from a custom voiceover session, not scraped, not cloned without consent. If that part of the workflow gets careless, the technology stops being a tool and becomes a liability.

The honest part

It's not free. ElevenLabs charges per character, and a long-form narration burns through credits faster than you'd think. The Creator plan covers what most small brands need, the Pro plan covers a small agency, and anything above that is for people running entire studio pipelines. Budget for it the same way you budget for stock footage.
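If you want to budget before you burn credits, a back-of-envelope character count is enough. The speech-rate numbers here are generic assumptions (roughly 150 words per minute, about six characters per word including spaces), not ElevenLabs figures, so treat the output as an order-of-magnitude estimate.

```python
def narration_characters(minutes: float, words_per_minute: int = 150,
                         chars_per_word: float = 6.0) -> int:
    """Rough character count for a narration of the given length.

    The defaults are generic speech-rate assumptions, not ElevenLabs
    pricing or measurement -- adjust them to your own scripts.
    """
    return round(minutes * words_per_minute * chars_per_word)

# A thirty-minute pitch narration is on the order of 27,000 characters,
# which is why long-form work eats a per-character quota fast.
chars = narration_characters(30)
```

Run your typical script lengths through this before picking a plan; the answer is usually a tier higher than you guessed.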

Also: a synthetic voice that's good enough to fool a casual listener should be disclosed. We don't pretend the Scale House narrator is a real person. The disclosure is the whole reason this post exists.

