I have recently started working on a series of multimedia pieces as part of a virtual, multimedia installation with the title Enough, Not Enough. The subject is a set of contradictions I cannot stop thinking about: a world that produces more than enough (food, technology, resources, capacity) yet runs on systems that guarantee scarcity for most. Not a malfunction. The mechanism itself. But this post isn’t really about the subject. It’s about a problem I walked into the moment I decided this work needed images as well as sound.
The problem is the eye. The eye is a fucking bully.
Soundtracks and videotracks
There is a default hierarchy in audiovisual work that rarely gets called out: the image leads, the sound follows. Film scores serve the cut. Music videos illustrate a song but defer to the screen’s grammar. Even in video art, where the rules are supposedly looser, the eye tends to win. Sound becomes atmosphere, underscore, emotional furniture arranged around whatever the image is doing.
It’s in the word itself: SOUND-TRACK. The sound is one layer in a stack, subordinate to the visual timeline. A composer scoring a film works to picture lock. Cuts already made, rhythms already established, the music’s job is to amplify what the eye already understands. Even when the score is extraordinary (and there’s plenty of that!), it operates within a frame someone else built.
I wanted to reverse that relationship. Not because scoring to picture is wrong (far from it), but because I am a composer first, and if this work is going to mean anything to me, the music has to be the structural foundation. Not accompaniment. Not atmosphere. The thing the rest is built around.
So the workflow is blunt: each piece is composed, recorded, and mixed before a single frame of video exists. The music is finished. It stands alone as a composition (it has to; that’s the test). Only then does the visual work begin, and when it does, the editing follows the musical structure: its sections, its pulse, its dynamics. The video is cut to the music the way a film score is written to picture, except inverted. The image becomes the videotrack. The music is not the soundtrack.
Easier said
This is harder than it sounds (pun perhaps intended). The moment an image appears on screen, attention migrates toward it, and the ear settles into a supporting role almost reflexively. You can compose the most structurally deliberate piece you’re capable of, but the second it sits alongside moving images, the listener’s brain reclassifies it as score. Background. Context for what the eyes are processing.
I put together a teaser for the series recently, and it demonstrates the problem with unapologetic clarity:
The images are just too strong. Plain and simple. The rapid sequences, the collaged text, the archival footage: they pull focus exactly the way I said they shouldn’t. The music underneath (slow piano, almost-sub-bass synth, distant cello textures) does what it’s supposed to do structurally, but the visual density overwhelms it. The ear concedes to the eye within seconds.
It’s a useful failure, honestly. The teaser works as a compressed announcement of the series’ themes, and in that role the visual intensity makes sense. But it also showed me where the actual pieces need to be different: sparser, more willing to leave the screen empty or near-static, more disciplined about letting silence (visual silence) do its work.
The music has to breathe on screen the way it breathes on its own. The image needs to know when to get out of the way.
Why it needs both
Some might wonder: if the music comes first and must stand alone, why add visuals at all?
Because the subject demands it. The sound can embody the tensions at the heart of this series (dissonance that refuses resolution, rhythmic layers that won’t agree, structures that drift rather than cadence). But the specific trace of the contradictions I’m interested in (the language of development, the rhetoric of progress, the data that quietly indicts) lives in a visual register. Archive footage of mid-century optimism. Corporate vocabulary repurposed until its emptiness shows. Statistics rendered as landscape.
These materials sharpen what the music is already doing, not by illustrating it but by completing it. The visual layer adds a critical dimension the sound alone can only imply: juxtaposition as method, image and sound placed alongside each other, neither explaining the other, both making the contradictions harder to look away from.
So that’s the challenge I’m sitting with: give the eye enough to stay, not so much that it takes over. Let the music be the architecture and the image be the wallpaper (good wallpaper, the kind you actually notice, but wallpaper nonetheless). I have no idea yet whether I’ll pull it off. More soon.