Lesson 09 · Foundations · ~6 min

AI Isn't Just for Writing: Images, Audio, Video & Code

Way back in the first lesson, a term slipped by almost unnoticed: generative AI — AI that creates new things, we said, like text, images, music, and video. Then we zoomed straight in on the language side, because words are the part you talk to every day. Now it's time to open the other doors.

Because here's the surprising part, and it's the whole point of this chapter: it's all the same trick. The thing that drafts your email also paints pictures, writes songs, and makes movies. Once you see why, those breathless "AI can now do that?!" headlines stop catching you off guard.

The other doors

Walk through them one at a time. For each, the move is the same — you describe what you want, it makes one.

  • Images. Describe a picture in plain words and get the picture back. Handy for an illustration, a quick logo for a side hustle, a mockup, or restyling a photo you already have.
  • Audio and voice. AI can turn written words into natural-sounding speech, create or copy a particular voice, and go the other way by turning a recording into text. A real example you may have already met: the narration on this very course is AI-generated.
  • Music. Describe a mood or a genre and get a finished track — background music for a video, a little jingle, a birthday song with someone's name woven in.
  • Video. Describe a scene and get moving footage, more and more often with sound to match. This is the newest and fastest-moving corner of the whole thing, so it changes month to month.
  • Code. Describe what you want a small app or script to do, and it writes the code for it. Even if you never plan to build a thing, knowing that AI makes software too is the piece that completes the picture.

How far it reaches now

And it keeps spreading past the creative stuff. It's starting to shape 3D objects for games and printing. Hand it your own messy spreadsheet and it'll summarize and chart it for you — that one's less "create something new" and more "make sense of what you've got," but it's a genuinely useful everyday win.

It also reaches well beyond making things to look at or listen to: it helps scientists predict the shapes of proteins and search for new medicines, and it even guides robots that pick up a task from a spoken instruction. You won't use most of that. It's here just so you feel the scale of where this has gone.

The thread tying it all together

Notice what every one of these has in common. Back when we looked at how AI writes, the picture was a brilliant improviser — someone astonishingly well-read who riffs the next word from everything they've ever taken in. Prediction, not lookup.

Widen that exact idea by one notch and the whole chapter clicks into place. Show a system enough pictures and it learns the patterns of how images tend to go together, then paints a fresh one. Feed it enough music and it learns how songs tend to move, then composes. Same with speech, with video, with code. It's the same improviser as before — only now it's well-watched and well-listened-to as well as well-read.
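If you're curious what "learn the patterns, then make a fresh one" looks like in miniature, here is a toy sketch. It is nowhere near how real systems are built (they use huge neural networks, not a lookup of what followed what), and the names learn_patterns and improvise are made up for this illustration. The point is only that the very same few lines handle words and musical notes alike.

```python
import random
from collections import defaultdict


def learn_patterns(sequence):
    """Record, for every item, which items followed it in the examples."""
    followers = defaultdict(list)
    for current, nxt in zip(sequence, sequence[1:]):
        followers[current].append(nxt)
    return followers


def improvise(followers, start, length, seed=0):
    """Build a new sequence by repeatedly picking a plausible next item."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    result = [start]
    while len(result) < length:
        options = followers.get(result[-1])
        if not options:  # nothing ever followed this item, so stop early
            break
        result.append(rng.choice(options))
    return result


# Toy training data (invented for this example): one sentence, one melody.
words = "the cat sat on the mat and the dog sat on the rug".split()
notes = ["C", "E", "G", "E", "C", "G", "E", "C"]

new_sentence = improvise(learn_patterns(words), "the", 8)
new_melody = improvise(learn_patterns(notes), "C", 8)

print(" ".join(new_sentence))
print(" ".join(new_melody))
```

Nothing in the code knows whether it is looking at language or music. It only sees which item tends to follow which, and that is the "same trick" in its smallest possible form.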

You've got the map

And that's the whole foundation. You can tell the key terms apart. You know how AI thinks and what it's for, how to ask well, where to go, and what it costs. And now you know the sheer range of what it can make. You came in as a driver, and a driver never needed to know how the engine is bolted together. You needed to know what these machines can do and how to steer them. You do now.

And there's a bonus round, if you want it. We keep leaning on one word — agent — that we've only ever met in passing: the AI that doesn't just talk, but goes off and actually does things. The next lesson is the most advanced one here, and completely optional. It cracks that word wide open, in a single simple picture. Curious? It's right next door.
