Fable Studio, a San Francisco-based startup that gained fame by demonstrating it could generate an episode of South Park from a brief prompt, is making headlines again. The company is launching Showrunner, a streaming platform that will let users create their own AI-prompted episodes of various shows.
“The vision is to be the Netflix of AI,” says chief executive Edward Saatchi. “Maybe you finish all of the episodes of a show you’re watching and you click the button to make another episode. You can say what it should be about or you can let the AI make it itself.”
The ability for users to conjure their own scenes, control characters and dialogue, and assemble full episodes from just a few prompts feels like science fiction come to life. In other words, you are essentially choosing your own adventure. Netflix experimented with the “choose your own adventure” concept a few years ago, but it lacked the corporate gumption to continue investing in its experiments.
Netflix was not mistaken: creating “choose your own adventure” television is expensive and limited. However, the concept serves as an ideal test case for AI tools to demonstrate their capabilities.
Let’s face it: there are going to be numerous issues. Legal issues, for starters. We don’t know how the models were trained, whether they absorbed copyrighted material, or how they might infringe on those copyrights. We also don’t know how people will use these tools to take their creations to what one might politely call “dark places.”
Even without using the service (there is a waiting list), my first reaction was that this is interesting. Given the company’s history, I hope Fable Studio actually ships this; it will be fertile ground for experimentation, and that is what is exciting.
It puts “AI” in people’s hands in a more tangible way: it goes beyond a “chat window” and instead provides a starting point. It struck me as a coloring book for the AI age.
Then it hit me: this takes the idea of “fan fiction” into the video realm. Outside the safe and demographically challenged confines of Silicon Valley, we have Webtoons, which are essentially a prehistoric version of what Fable Studio is proposing. In Asia, where anime and manga dominate, an offering built around regenerating stories is likely to be an even bigger opportunity.
There are one-minute soap operas that are popular in South Korea and have made their way to the U.S. Check out apps like DreameShort, ShortMax, and ReelShort, which describe themselves as platforms for mini-dramas. These are ripe for a “choose your own adventure” makeover. These products are built like “games” rather than Hollywood shows, and that is why we are likely to see experimentation. If you have paid attention to Character.ai, you know young people have a deep desire to engage with characters and create their own media. The Verge recently reported that Character.ai attracts 3.5 million daily users who spend an average of two hours a day on the app.
When I start to think about the idea of Showrunner (and, hopefully, many other such experimental competitors), I can draw parallels with early blogging platforms. Those platforms allowed individuals to publish, remix, share, and get involved in the web, making it more interactive, more two-way, and more social.
In the pre-internet era, we did many of these things, just differently. We wrote journals, talked about things in cafes with friends, and sometimes wrote letters to the editor. The internet gave all of us tools to publish without gatekeepers. I mean, what else are we doing on Facebook and Twitter? Same content, different platform. That was at the turn of the century, almost 24 years ago. The web then was mostly “textual,” which made sense given our bandwidth limitations.
Fast forward to today, and the web has changed. A decade ago, I wrote an essay about the rise of the visual web. I explored many converging trends: camera phones, photo-sharing services, the omnipresence of cameras, and how information was going to be extracted from these visual artifacts, much as we had done with text. Visual data was going to transform how we interact with information.
“Photos have always been tools of creative, artistic and personal satisfaction. But going forward, the real value creation will come from stitching together photos as a fabric, extracting information and then providing that cumulative information as a totally different package.”
I made these observations two years after some key computer vision breakthroughs, mostly because it was clear to me where we were headed. Since then, we have become a visual society. Today, Instagram, Snapchat, and TikTok wield gargantuan influence over our lives. They exhibit the same behavioral dynamics as blogging, which is essentially shorthand for social behaviors on the internet: publish, remix, and share. It is “visual blogging.”
What we don’t realize is that this visual tsunami on the web has been a key part of training what we call “AI” these days. Whether it is YouTube videos, photos on the web, or the unending heaps of textual web data, it has all ended up in the witches’ brew. If this is the new way we interact with data — and I do think it will be — then Showrunner, or a future variation of this idea, seems logical, at least to me.
Let me elaborate.
Perhaps not today, but in the near future, we will begin to interact with information through non-textual interfaces. Until now, we have primarily used computers via keyboards (both physical and virtual) and mice. With smartphones, we started to use “touch” as a method of interaction. Whether it’s capturing receipts for Expensify or taking photos of items to shop for later, we have begun using the “camera” to capture information. As the technology improves, we are increasingly using the camera as a conduit between ourselves and text.
Cameras are already acting as non-textual conduits. Facebook’s smart glasses are a good case study. We will continue to see proliferation of, and experimentation with, new devices. Vision Pro, Ai Pin, Rabbit, or “Enter Name” are just the beginning and may turn out to be failed experiments, but there is no turning back. Fast forward a decade from now, perhaps sooner, and one thing is certain: the share of “keyboards” as a way to interact will be much smaller. A good comparison is linear television’s share of viewing minutes in the age of streaming.
This future will indeed feature a larger presence of voice interfaces in our computing lives. The reason this hasn’t happened so far is that the technology wasn’t quite ready. The new AI technologies will act like Botox for voice interfaces. I discussed this topic recently on Howard Lindzon’s podcast. In my piece, “The Real Personal (AI) Computer,” I wrote:
What does the next step in personal computing mean? So far, we have used mobile apps to get what we want, but the next step is to just talk to the machine. Apps, at least for me, are workflows set up to do specific tasks. Tidal is a “workflow” to get us music. Calm or Headspace are workflows for getting “meditation content.” In the not-too-distant future, these workflows will leave the confines of an app wrapper and become executables, where our natural language will act as a scripting language for the machines to create highly personalized services (or apps), offered to us as an experience.
In this not-too-distant future, we won’t need apps to have their wrapper. Instead, we would interact with our digital services through an invisible interface. Do I need to create a playlist in my music service when I only want it to play a certain kind of music? (By the way, playing music was the number one use case on Amazon’s Alexa.) Alexa, Google Home, and Siri are some of the technologies that have set the stage for this interaction behavior. Our kids are growing up talking to machines; for them, it will be natural to use their voice to get machines to do things.
A significant shift will happen when today’s kids, who have grown up “chatting” with Alexa and Siri, become adults. They will expect to interact with their machines through voice and other interfaces.
Add the demographic trends to the directional drift of technology, and you start to see where we are heading: what seems like a ludicrous idea today will become mainstream. Imagine being on ShowPress (WordPress of shows, LOL!), watching something on your “Vision Pro 15,” and then telling it, “Hey, tell this same story from the perspective of one of its big players,” and the whole story is regenerated. Or you could recreate the “story” yourself and share it with your dozen friends.
It might seem far-fetched today, but it can happen. Once this technology is in the hands of people, who knows where we will end up? Most of us forget that technology is for people, as blogging and podcasting pioneer Dave Winer reminds us. He writes:
“AI is opening up creative expression to people who can’t draw, or aren’t good writers, or people who want to be better programmers, or who knows — this technology is the most powerful I’ve used in my long life in tech.”
Just as the high priests of journalism once scoffed at blogging (only to become bloggers themselves) and radio jocks mocked podcasts, Hollywood insiders are likely to be up in arms about this. They will label it the end of creativity. In reality, it represents the beginning of a new era for creators. It takes more than a tool to make a great blogger or a great podcaster.
Similarly, in the AI age, it will take much more than a tool to be a creator. The only difference is that with millions trying, Hollywood, like the news media and radio before it, will have its work cut out for it. It will need to be truly creative, not merely rely on gatekeepers, to remain successful.
Popcorn, please, hold the butter!
June 1, 2024. San Francisco