Tale of the Transcription Tape

By Matt Seidel in Our Passions, Our Day Jobs — 06 Feb 2014

Sometimes a love scene calls for [WHIMPERS], sometimes it needs [YELPS], but knowing which one to use makes all the difference. The secret life of a professional closed captioner.

Daniel Arsham. Crystal Eroded Reel to Reel, 2013. Courtesy the artist and MCLEMOI GALLERY.

After being inadvertently zapped by a gamma ray, a young woman finds herself acutely attuned to the sounds of the surrounding forest. She pauses, stunned by the deafening clamor of birdsong. I paused the video and thought about how best to convey her newly enhanced auditory perception of the natural world, toggling between [HEIGHTENED WARBLINGS] and the Keatsian [FULL-THROATED AVIAN SINGING]. Suddenly aware of how ridiculous either would look on screen, I settled on the distinctly un-poetic [LOUD BIRD NOISES] and went on about my business. I had already wasted enough time, and who knew what other agonizing choices lay ahead?

I do freelance editing work for a captioning and transcription company. On my worst days as a transcriber, I’m an unthinking drone, a Bartleby who doesn’t even have a curious narrator to speculate over his inner workings. On my best days, I consider myself something of a medium, a modern-day oracle who crafts a garbled mess spit out by the voice-recognition software gods into a signifying whole. I don’t have a fancy stenographer’s keyboard or a shorthand system to make my life easier, just my wits and a pair of Beats By Dre headphones. (They’re a tax write-off.)

The best transcriptionist is one who doesn’t attract notice to his work. Unlike the translator, whose work is always a form of betrayal—traduttore, traditore [translator, traitor] goes the famous Italian saying—the transcriber aims for nothing short of absolute fidelity. And thus, given that my job is literally to reproduce the material as accurately as possible, I am only as good as my material. So what keeps me motivated? Well, it’s the hope that one day—maybe not today, maybe not tomorrow, but soon—I will transcribe an iconic line. I often think of those fast-typing legends of yore and wonder whether their hands trembled while captioning, commas and all, “Frankly, my dear, I don’t give a damn.”

The company I freelance for has its own voice recognition software. Its engineers run client audio files through the software to produce a rough transcript that can be quite accurate depending on the recording. After being thus processed, the files are placed on an online marketplace, at which point the editor logs on to choose one to transcribe. The editor can see information about each file: the client, running time, and price per minute. An audio and visual preview is also available so that one can avoid the most difficult files—faint recordings seemingly set in a wind tunnel and featuring multiple motor-mouthed speakers with accents that confound the voice-recognition technology, which is set to American English. These files lure in many a young transcriber by offering higher rates, but wily veterans know to search for the hidden gems that maximize one’s dollar-to-effort ratio (which reflects the real hourly rate more accurately than the dollar per audio minute does). My greatest such discovery was a documentary about a wandering yogi wherein three total words were spoken, two of which were subtitled and needed only a [SPEAKING FOREIGN LANGUAGE] tag. I measure every new project against this Platonic ideal.

After selecting a file, the editor follows along with the video, which is synched to the transcript, and fills in dropped words, corrects erroneous computer recognitions, and adds punctuation, paragraph breaks, and speaker IDs if necessary. Depending on the client’s wishes, a file is transcribed in one of two ways: “verbatim” or “clean-read.” With verbatim transcription, every “like,” “um,” “you know,” and dead-end introductory clause is preserved for posterity; the goal is to reproduce the listening experience as closely as possible, stutters and all. In clean-read transcription, yours truly rides in on his white horse to save the day, slashing through odious filler and jousting with the most dishonorably ungrammatical constructions. Or to put it less grandiosely, I take out every “like,” “um,” “you know,” and false start, perhaps correct an agreement problem now and then.

On any given day I can end up with files from disparate fields—a MOOC on electrical circuitry, a technology company’s conference call, a focus group on pasta sauce, a documentary on shark electroreception, a campy movie about a malevolent Gaelic fairy moonlighting as a DJ, or an Australian TV series about teenagers at a horse-riding camp.

I do mostly entertainment files now, given that they’re less dull and better remunerated than most, but every once in a while I dabble in MOOCs to keep sharp. In my brief experience with online education, I find the modules to resemble ransom videos more than anything else—certainly one way to keep a remote student’s attention. A beleaguered teacher stands awkwardly in a poorly lit room and intones a prepared text out into the void. I could plainly see the fear in one woman’s eyes as she demonstrated how to calculate body fat percentage with a caliper. The only thing missing was a newspaper close-up to confirm that the captive instructor was indeed still alive.

However, for every dry tutorial on mathematical order of operations there’s a delightful oddity, such as the demonstration of laughter therapy I recently transcribed. After cataloging the increasing intensity of the teacher’s mirth from [TITTERS] to [CHUCKLES] to [CHORTLES] to [GUFFAWS], I gave in to the moment and joined her in one of the most cathartic gut-busters of my life. It made my next file on database management slightly more bearable.

As for movie transcription, most of the films I work on are second-rate; online media providers have implemented a “No B-Movie Left Behind” policy for captioning their swelling catalogs. From my perspective, the great thing about bad movies is how sparse the dialogue usually is, especially after the standard first-act exposition speech. In middling dramas in particular, characters literally and figuratively don’t have much to say. There are long scenes of contemplation. Sometimes they go on walks and contemplate further, killing more time before the film’s 85 minutes elapse and the credits mercifully appear.

The type-A characters usually go on runs. I love it when a character in a bad movie is a runner because it guarantees at least three near-five-minute stretches of blissful quiet. I just slap a [MUSIC PLAYING] tag on there and relax for a few, maybe round off the scene with a [PANTING] if I’m feeling munificent. I don’t even pay attention to the inevitable [LOUD BIRD NOISES] in these scenes.

Comedies are a little tougher—a batch of old Saturday Night Live files quickly lost its allure after a few fast-talking, accented skits—though your garden-variety rom-com is pretty transcriber-friendly, save for a few madcap scenes and the inevitable presence of my arch-villain, the quirky best friend. This nuisance pops in every once in a while to interrupt the nice little rhythm I have going with the stolid and monosyllabic main character, who may or may not like to jog. Quirky best friends never go on jogs. They just talk and talk and talk, quickly and in different voices, until an indulgent eye-roll from the protagonist sends them on their way. On the transcriber’s shitlist, they rank somewhere between one of Aaron Sorkin’s voluble policy wonks and a Cockney street urchin in a British police procedural.

More than other types of transcription work, movie captioning allows me to cultivate my own voice. I feel most creatively fulfilled filling in nonverbal sound effects. Any hack can nail an off-screen [DOORBELL], but it takes an artist to convey the full range of the human emotional experience. My signature is the multiple descriptor. I like [GASPS AND BLUBBERS] over colorless [SOBS], [GURGLING CATERWAULING] over tepid [WAILS], and [CACKLING GIBBERISH] over jejune [UNINTELLIGIBLE], a tag for which we have a shortcut key. I feel these florid touches set me apart from the horde of doorbell-catalogers I call colleagues.

And then there is that most enticing creative challenge, the love scene. Call me old-fashioned, but I’m a [MOANS] guy. However, depending on the mood and the character, I can be persuaded to throw in a [GRUNTS] or a [GROANS] in exigent circumstances. And I confess that, just once, I added a gratuitous [ULULATES]. (In my defense, I had just done several episodes of Xena: Warrior Princess.) Generally, however, one should exercise maximum restraint. As such, I usually let the thunderous conclusions of love scenes pass without comment, with the exception of one tussle so histrionic that to deny its participants a [JOINT CLIMAXES] seemed downright petty.

Like sex, transcribing is not without its bloopers. Watching one spirited romp—not Fifty Shades of Grey territory, but getting close—I confidently typed in a [WHELPS] to describe the peculiar shrieks of the leading lady. I realized several days later that the scene, though admittedly edgy, had little to do with canine birthing. I had really wanted either [WHIMPERS] or [YELPS], and in my excitement had mistakenly combined the two. From then on, I resolved to stick to [MOANS], or maybe [HEIGHTENED WARBLINGS] for special occasions.

I also enjoy working on horror and action movies because I can translate my love-scene skills to fights. You can always tell a movie I’ve done because there are [MOANS] and [GRUNTS] scattered through the final, kinky action sequences.

Speaking of kinky, just last week I transcribed a 90-minute documentary on freight trains in East Anglia. From his roadside vantage point, the narrator breathlessly rattled off timetables, routes, technical specifications, and updates about the newly refurbished signal box near Ely Station. He was particularly ecstatic when the conductor of one glorious freighter acknowledged his presence with a piercing [WHISTLE]. I was tempted to add a [CLIMAXES] to the rail fan’s rhapsodic narration, but I couldn’t hear the definitive proof over the rumbling carriages. As the Bard says, in a line I would kill to have transcribed, discretion is the better part of valor.