I’m writing this on a 30-foot screen on top of a 10,000-foot mountain in Hawaii, at a table in an Austin coffee shop where I’m pretty sure other people are taking photos of me to send to their friends so they can all call me a piece of shit. In the last week, life has gotten weird.
My journey to the Haleakalā shield volcano Austin coffee shop began more than 30 years ago, in 1990, when my parents brought me to something called a “virtual reality exhibit” at the Seaport World Trade Center in Boston. I stood on a little circular pedestal, and the guy handed me a plastic gun and put a big headset on me. Suddenly I was in some cartoon world, in a military uniform, holding a real gun. The person on the pedestal next to me was there, also a cartoon, also holding a gun. After some janky waving and shooting, they kicked me out for the next person in line.
I had recently read The Phantom Tollbooth, where a kid in the real world crosses a magic threshold and enters a cartoon world. This felt like that. I wanted more.
Then VR disappeared for 25 years. Throughout the ’90s and 2000s, “virtual reality” was a forgotten dream, a cool concept that never made it. But in the mid-2010s, VR made an unexpected comeback. 20-year-old Palmer Luckey’s duct-taped headset prototype had impressed enough investors for Oculus to become a real company. In 2014, Facebook bought Oculus. Google and Sony got involved. It was all finally happening.
In 2016, I decided to write about the VR revolution. I went around Silicon Valley, interviewing people at Google and Facebook to get the full scoop on VR. I even sat down with Mark Zuckerberg.
I demoed everything. It was mind-blowing. VR was about to take over the world. And I was gonna be the one to tell everyone.
Then two things happened:
- VR didn’t take over the world.
- I didn’t write a VR post because I fell into a six-year book hole instead.
From the bottom of my book hole, I kept following the story. At Facebook’s 2016 developer conference, Zuckerberg had demoed a new bleeding-edge kind of “standalone, inside out” headset. Up until then, there were two ways to do VR: The first was with a cheap headset, maybe using your phone as a screen, that could do primitive head-movement tracking but had no way to see the environment around you. The second was with external sensors on the walls and a headset that attached to a high-powered PC with a cord. “Standalone” meant the new headset would have the computer inside, with no need for a cable. “Inside out” meant the headset could see the room around you, so you didn’t need external sensors. In 2016, this was just a prototype. Three years later, Facebook launched Oculus Quest.
In 2020, standing around during Covid with my dick in my hand like everyone else, I got myself a Quest 2. It was amazing. I loved it. It was my daily post-writing reward activity. I made 3D art. I swam with whales. I went on cartoon vacations. I exercised by slashing music. I beat Trover.
Then, for some reason, I stopped. I can’t really explain why. I really loved being in the Quest 2. I recently dusted it off to give friends a demo and they were floored, reminding me how great it is. It just didn’t hook me. Maybe it was the solo aspect. I don’t have friends who do VR so there’s no one else to play with. Maybe it’s the friction. It’s minor, but charging the headset, putting it on, and creating a boundary is still a lot more friction than picking up my phone. Maybe my delight relied more on novelty than I realized.
It’s not just me. VR blows everyone away when they try it, but it seems to have a hard time hooking people for the long run. After a major wave of hype in the mid-2010s, VR receded into the land of subcultures.
And the question is: Is there some fatal flaw to the concept of VR that will always prevent it from achieving mass adoption? Or are we some tipping point away from VR exploding into the stratosphere like the computer and smartphone?
Enter Apple
Everyone remembers where they were when they learned that JFK was shot, a man had landed on the moon, or airplanes had flown into the Twin Towers. I remember where I was when I saw Steve Jobs unveil the first iPhone.
I didn’t always like Apple. My family’s first computer was an Apple IIGS. But then, like many early Apple computer users, I became a PC person. I used an IBM ThinkPad in college and thought Apple people were annoying.
Then Steve Jobs came back to Apple and started Making Apple Great Again. My post-college music composing path forced me to get a 2004 PowerBook G4. After getting used to the interface (why the fuck is there no start button?), I realized that Macs were amazing, and I’ve been an annoying Apple person ever since. But it wasn’t until 2007 that I became a fanboy.
In the presentation, when Jobs did the world’s first “swipe to unlock,” the audience made an audible gasp. A minute later, he brought up a list of artists in the phone’s “iPod” app and asked, “Well, how do I scroll through my list of artists? I just take my finger and scroll.” Another audible gasp. It’s weird that something so normal today was jaw-dropping 17 years ago.
The feeling I had watching that presentation had happened before. I felt it when I was five years old and tried Nintendo for the first time at a friend’s house (I can make something on the TV move by clicking this button??). I felt it in the early ’90s when my friend showed me how to send an email (You can type something on your computer, hit a button, and it shows up on mine??). I felt it the first time I test-drove a Tesla (Why is this car accelerating so futuristically?).
I’ve learned to see a lot of meaning in these holy shit moments. In most cases, they’ve been followed by an entirely new industry sweeping the world—like the smartphone, video game, internet, and electric car revolutions.
In June 2023, Apple announced the VR—sorry, spatial computing—headset that had been long rumored: the Vision Pro.
I watched the presentation, but it wasn’t quite like my experience in 2007. First, I had gotten excited about VR multiple times in the past and ended up disappointed. Second, unlike demoing an iPhone, watching a VR demo on a 2D screen just doesn’t show you what it’s actually like. Oh, also, it was $3,500. I happily shelled out $600 for the first iPhone. But $3,500? For a V1 product that will get way better (and cheaper) in the next few years? When I already have a Meta Quest? Nah. I might be a fanboy but I’m not a chump. It was the obvious grown-up decision to wait it out. Then I ordered one in the first minute after preorders started.
This Monday morning, I went to the Apple Store to put the Vision Pro on my chump face for the first time. The staff member guided me through a demo. And there it was: the holy shit moment.
But it was a holy shit moment with an asterisk. I had experienced full holy shit moments both in 1990 and in 2016 with VR, and these were the notable exceptions to the “holy shit moments are a surefire omen of an industry about to blow up” rule. Was this time different or would history repeat itself?
What I did know was that it was finally time to write a VR post. I wanted to post this week while everyone was hyped up about the Vision Pro. But I didn’t want to write about it before I had used it a lot, so I could experience not only the honeymoon phase but also what it was like to get thoroughly sick of it.
The plan was clear. I went home, told my wife that I would be deeply ignoring her and our baby for the week, and spent twelve hours a day in the headset for four straight days. I’m writing this on Thursday afternoon, having already logged over forty hours. Here are my thoughts.
Vision Pro, V1
There are three elements of any VR system: hardware, operating system, and applications. Let’s talk about each.
Hardware
Apple Vision Pro (AVP) is heavy, a decent amount heavier (650 grams) than Meta’s Quest 3 (515 grams). It comes with a fancy band that goes on easily and that you tighten with a little knob. It’s awesome. For 12 minutes. Then it started killing my face. With 3,500 regrets, I switched to the other band it comes with, which includes a loop that goes over the top of your head, and thank god for that because it was way better. It’s so good that I am shocked to say that even at the end of a full day wearing it, I didn’t feel a euphoric “ahhh” relief taking it off. At least right now, it seems only a little more uncomfortable than wearing over-ear headphones for long periods of time. This might not apply to everyone, but I have not felt nauseous once while wearing it.
That doesn’t mean there’s nothing that sucks about wearing it. The “field of view” isn’t great, meaning there are thick black walls where your peripheral vision is supposed to be, which is a bummer. I can’t imagine it’s great for your eyes. And there’s no way around the fact that you feel like an asshole when other people are in the room.
There’s an external battery pack that connects to the headset with a cord and typically lives in my pocket. The battery lasts about three hours, but you can plug in the battery to make it last forever, like a computer if the battery only lasted for three hours. (You’re often using it in conjunction with your computer, which makes it a non-issue because you can plug the battery into the computer.)
When you put the headset on, it does the AVP version of Face ID: scanning your irises. This is seamless and very futuristic. Then, you see exactly what you saw before putting the headset on. Lots of reviewers have marveled over AVP’s “pass-through” capabilities, and the second I put it on, I understood why. While it’s not perfect, it’s almost like you’re wearing a transparent snorkeling mask. The headset is in fact opaque—cameras on the outside transmit the world onto screens on the inside. But the screens are so good and the latency so low that it really seems transparent. Then there’s the much less successful attempt to make it look transparent from the outside as well, using cameras on the inside to broadcast your eyes onto the front of the headset. The goal is that if you’re talking to someone while wearing the headset, it feels to both people like you’re wearing a transparent snorkeling mask. But at least in V1, the eyes don’t show up nearly as well as advertised.
The internal screens save energy by doing something clever called “foveated rendering”—i.e. only putting the exact place you’re looking in perfect focus while making the rest of the view lower-res. This is what your actual eyes do, which is why your peripheral vision is blurry. If you watch this viewcast I made, you’ll see that most of it is blurry (the sharp part was where I happened to be looking while taking it)—but as the person in the headset, I only ever saw perfect sharpness.
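For the curious, here is a rough sketch of the foveated rendering idea in Python. It is a toy illustration, not how Apple actually does it: the function name `resolution_scale`, the radius, the fall-off distance, and the minimum scale are all made-up assumptions. The point is just that rendering detail depends on how far a spot on the screen is from where your eyes are pointed.

```python
import math

def resolution_scale(pixel, gaze, sharp_radius=100.0, falloff=400.0, min_scale=0.25):
    """Fraction of full resolution to render at `pixel`, given the gaze point.

    All parameters are hypothetical values chosen for illustration.
    """
    distance = math.dist(pixel, gaze)
    if distance <= sharp_radius:
        return 1.0  # inside the sharp zone: full detail
    # Past the sharp zone, detail drops off linearly until it hits the floor.
    t = min((distance - sharp_radius) / falloff, 1.0)
    return 1.0 - t * (1.0 - min_scale)

gaze = (960, 540)  # where the eyes are pointed, in screen pixels
print(resolution_scale((960, 540), gaze))   # right at the gaze point
print(resolution_scale((1260, 540), gaze))  # a bit off to the side
print(resolution_scale((0, 0), gaze))       # far corner of the view
```

Because the sharp zone follows the gaze point every frame, the wearer never catches the blurry part, while a recording made from outside the eye's perspective shows it plainly.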
The way Vision Pro does audio is also cool. There have always been two sound options for me while on my phone or computer: play from the speaker and everyone can hear it or put on headphones and no one can hear it. AVP speakers are somewhere in between. The speakers (which sound great) are small and right above your ears, and while people right next to you can hear what you’re hearing, people in the next room cannot. So in a coffee shop or on an airplane, you still need headphones, but I do a lot of my work in an office in our house with the door open, and it’s been nice to work both without headphones and without bothering anyone in the other room.
Operating System
This was the biggest holy shit of my holy shit moment. Apple is the king of simple intuitive interfaces. Part of what drew those gasps in 2007 was how natural the iPhone’s interface was. You scrolled down by pushing the page up, just like you would in real life. You zoomed by pinching with two fingers. It seemed like magic. AVP’s interface is gaspworthy for the same reason. The main gesture is what I’ve been calling the “eye pinch.”
When you press the button at the top of the headset, your apps come up, floating in the room in front of you, looking as real as any other object in the room. They’re fixed in space. You can walk right up to them, and the detail is amazing.
Vision Pro’s eye tracking is outrageously good. It knows precisely where you’re looking. So all you do to select an app is look at it and tap your thumb and index finger together. Your hand doesn’t need to move up to do this, just somewhere the headset can see it. Watching the ads, it seemed like this might be annoying to do, but it’s every bit as easy and intuitive as opening an app on a smartphone.
No matter what you’re doing, the eye pinch is the equivalent of touching a finger to a smartphone screen. To scroll, look anywhere in the window, pinch, and move your hand up. To move a window, look at the little bar below the window, pinch, and move it where you want to. To resize the window, look at the window’s corner, pinch, and resize.
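If it helps to see the logic spelled out, here is a minimal sketch of the eye pinch in Python. None of this is a real visionOS API: the `Region` class, the `ACTIONS` table, and the `eye_pinch` function are hypothetical names invented for illustration. The idea is simply that your eyes pick the target and the pinch triggers whatever that target does.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    kind: str      # "app_icon", "window_body", "window_bar", or "window_corner"
    x: float
    y: float
    width: float
    height: float

    def contains(self, gaze_x, gaze_y):
        return (self.x <= gaze_x <= self.x + self.width
                and self.y <= gaze_y <= self.y + self.height)

# What a pinch means depends on what you are looking at (assumed mapping).
ACTIONS = {
    "app_icon": "open",
    "window_body": "scroll",
    "window_bar": "move",
    "window_corner": "resize",
}

def eye_pinch(regions, gaze, pinching):
    """Return the action for a pinch, or None if nothing should happen."""
    if not pinching:
        return None  # looking alone never triggers anything
    gaze_x, gaze_y = gaze
    for region in regions:  # earlier regions are treated as being on top
        if region.contains(gaze_x, gaze_y):
            return ACTIONS[region.kind]
    return None

window = [
    Region("window_corner", 90, 90, 10, 10),
    Region("window_body", 0, 0, 100, 100),
]
print(eye_pinch(window, (95, 95), pinching=True))  # looking at the corner
print(eye_pinch(window, (50, 50), pinching=True))  # looking at the middle
```

In this sketch the hand only supplies the "click." Where the hand is never enters the picture, which matches the way the pinch works from your lap.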
As John Gruber put it in his review:
The fundamental interaction model in VisionOS feels like it will be copied by all future VR/AR headsets, in the same way that all desktop computers work like the Mac, and all phones and tablets now work like the iPhone. And when that happens, some will argue that of course they all work that way, because how else could they work? But personal computers didn’t have point-and-click GUIs before the Mac, and phones didn’t have “it’s all just a big touchscreen” interfaces before the iPhone. No other headset today has a “just look at a target, and tap your finger and thumb” interface today. I suspect in a few years they all will.
Then there’s the fact that everything you see in front of you is available desktop to work with. On my computer, I’m used to my applications being stacked, and I toggle between them. Or maybe I put a few vertical windows side by side. In AVP, I can put one eight-foot window in front of me, two more on either side of it, and a couple more above them in the sky. Then, if I get up to go to the other room, the windows all stay exactly where they are, waiting for me to come back. If I want to switch work spots, I just hold the headset button and the whole configuration jumps to the new location. This is all way cooler than I’m making it sound, so I made a video to show you how it works:
One thing you’ll notice in the video is that I routinely spin the digital crown on the headset to slide between being entirely in reality, partially in reality, and entirely in a virtual landscape. This is ridiculously fun to do. And it’s a general reminder that AR and VR being separate categories is a thing of the past. In the Vision Pro, the Quest 3, and any future headset, you can be 100% in the real world (when there’s nothing on the screen and it seems like you’re wearing a snorkeling mask). You can be mostly in the real world, except there’s a virtual game board on your kitchen table or a little virtual butterfly fluttering around. You can be halfway between reality and virtual when, say, portals open up in the walls around you during a game. Or you can go full virtual.
Apps
There are many categories of spatial computing apps—productivity, entertainment, social, gaming, creative, fitness—and for most of them today, you’ll need a Meta Quest or some other non-Apple headset. There are a small handful of astounding apps for AVP, but they’re more a sampling of what’s possible than an actual app store.
The most “you can absolutely not do this anywhere but a VR headset” thing I did was their little taster menu of immersive entertainment. Entertainment on a headset runs on a spectrum of immersion. The least immersive is watching a normal movie on a massive screen in a virtual space like the moon or a giant theater. Those movies you missed that everyone says are best seen on the big screen—you can see those on a big screen now.
Next are movies that are framed in a normal rectangle, but they’re 3D looking, like when we used to wear those stupid paper glasses but much, much better. Sometimes, these surprise you when something comes out of the screen to fly through the air or stand on the floor between you and the screen. The AVP comes with one of these, “Encounter Dinosaurs,” and it’s delightful.
Finally, there’s full immersion, where the scene entirely surrounds you and you actually feel like you’re there. These are better described as “experiences” than “entertainment.” I saw rhinos up close in person last year. Then, this week, I did one of the Vision Pro experiences that’s an up-close hang with rhinos. These two experiences were very similar. Another experience lets you sit in on an Alicia Keys rehearsal where she sings some songs standing two feet away from you. You can watch her for a while, then look over at what the drummer or keyboardist is doing for a while—just like you would if you were actually there.
Photos and videos are also cool. When you take a panoramic photo, you sweep your phone around in a C-shaped arc—but on a flat phone screen, the result is a flat photo. In AVP, panos are C-shaped, like the photo you actually took. The C wraps around you, which I quickly learned brings the memory back way better than the flat version. You can also turn the headset into a camera and record photos and videos, both of which are immersive. When you later view them in the headset, they’re 3D, putting you right back into the actual scene.
Then there are the infamous Vision Pro avatars. You get one of these by flipping the headset around and letting it take pictures of you from different angles. Then when you FaceTime someone, your avatar mimics whatever facial expressions you’re making. Here’s mine:
The first person I tested it out on was my wife, who immediately gasped in horror, saying I had “little uncanny valley snake eyes rolling around in my skull,” whatever the fuck that means.
The uncanny valley she’s upset about is this:
The idea is that we like faces that are somewhat humanlike, and we like faces that are totally humanlike, but we hate faces that are almost-but-not-totally humanlike. Faces that fall just short of being human give us the collective willies.
Avatars used to suck. Then they got better. Now they’ve gotten so good they’ve plunged into the uncanny valley. This was always gonna happen at some point on the road to perfect avatars and that time is now.
To test it out for myself, I FaceTimed my friend Jules Terpak, who also has a Vision Pro. First I put her across from me at this table while we sat around with each other’s uncanny valley faces for a few minutes.
One very cool thing is that when I moved her window to a different seat at the table, her voice shifted locations to that spot. We concluded that this activity was not actually an upgrade over FaceTime, but that if there were more than two people, it could feel like everyone was sitting around a table together, which would be better than talking to a group FaceTime or Zoom.
Then we shifted locations to Mount Hood.
This felt more like we were actually hanging out somewhere, which is an effect you can’t get on FaceTime.
When we started going into apps together, it felt even more like we were actually doing an activity together, in a way you normally can’t do without being in person.
It’s very crude right now, but it’s a primitive version of something we’ll probably all be doing constantly in the 2030s. It’s the next step in a centuries-long human mission to conquer long distance. First there were letters, then phone calls, then mobile phones and video calls. The next step is VR hangouts.
By far the thing I spent the most time doing in the Vision Pro was exactly what I normally do, but the AVP version. When you’re sitting down in front of your computer while wearing the headset, you can open your computer screen as a giant virtual window (which you still control with your normal keyboard and trackpad). Whatever screen you’re used to working on is now much, much bigger. It’s also much more mobile. I don’t usually work on the couch because I prefer my big monitor over my laptop screen. This week, I spent a lot of time working on my couch on a 100-inch monitor. I don’t normally work lying flat in bed because the laptop screen isn’t directly above me. This week I did, putting the screen up on the ceiling. I did some work outside on the porch and some more under a tree. Sometimes I saw the room around me, only with a big screen floating in it. Other times, I went fully immersive, writing on a mountain top, a sand dune, or the moon. And as I mentioned at the beginning of the post, I’m currently using the AVP in a coffee shop, which is officially embarrassing.
For some odd reason, you can’t open multiple desktops (yet), but you can open some of the things on your desktop as their own apps in separate windows. There’s an AVP iMessage app, so I closed iMessage on my desktop and opened it in an adjacent window. I often remotely cowork with Alicia (WBW’s Manager of Lots of Things), putting her in a little window in the corner of my screen. Now, she’s in her own window. If I’m willing to bite the bullet and switch from Chrome to Safari, I can pull my research and web browsing off the desktop too. The end result is that a single small, immobile computer screen has been replaced with a giant mosaic of screens, for the small price of having a snorkeling mask on my face all day. It kind of feels like you stepped into your computer screen, into the beautiful wallpaper landscape, amongst the windows. Very surreal. I wrote this entire post in the headset and have found myself enjoying writing more—and being more focused—than normal.
My overall feelings
The best way I can describe how I feel about the Vision Pro is a strange combination of utterly thunderstruck and mildly underwhelmed.
The magical interface, the giant screens, the immersive experiences—they’re just unfathomably cool and awe-inspiring. It feels like a sneak peek at the 2030s.
But after a couple of days, I found myself thinking, “Is that…it?” I had done the small handful of immersive experiences, played some of the small selection of games, looked at a bunch of my panoramic photos, and tried avatar FaceTime—and at the moment, there’s just not that much else to do in the Vision Pro.
The first iPhone left me feeling the same combination of blown away and bored. The phone and I had a torrid honeymoon, but after the novelty of the interface wore off, all it had to offer was the same 16 practical apps.
There was no app store yet, it dropped calls constantly, and the cellular internet (which you couldn’t use while on a call) was painfully slow. The iPhone wasn’t a world-changing device yet. It was the seed from which a world-changing device would grow.
If you zoom out on a story of technology, you usually see a big exponential curve.
But if you look at the curve up close, you see that it’s wavy, made of S-curves.
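Here is a small Python sketch of that picture, with numbers invented purely for illustration: each technology wave is a logistic S-curve, each new wave tops out ten times higher than the last, and "progress" is just their sum. The names `s_curve`, `progress`, and `waves` are hypothetical, and nothing here is fitted to real data.

```python
import math

def s_curve(t, midpoint, height, steepness=1.0):
    """A logistic S: slow start, steep middle, flat top."""
    return height / (1.0 + math.exp(-steepness * (t - midpoint)))

def progress(t, waves):
    """Total progress at time t is the sum of every wave so far."""
    return sum(s_curve(t, midpoint, height) for midpoint, height in waves)

# Four made-up waves, ten "years" apart, each ten times bigger than the last.
waves = [(10 * k, 10 ** k) for k in range(1, 5)]

for year in range(0, 51, 5):
    print(year, round(progress(year, waves), 1))
```

Printed at five-year steps, the totals climb roughly tenfold per decade, which is the big exponential. Look between the steps and each decade has its own slow start, steep middle, and plateau.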
The first iPhone was such a big deal because it launched a new S. Investors had a new place to pour their money. Developers had a new place to pour their efforts. Creators had a new place to pour their talents. As millions of human hours worked on the collective human project, the next five years were a whirlwind of innovation and excitement. Apple’s keynotes became a must-watch for anyone interested in tech, as each jump between the iPhone 1 > 3G > 3GS > 4 > 4S > 5 was a major leap in hardware and software. It was the steep part of the S.
Then, the keynotes got boring. The changes were incremental. Apple stopped innovating and started refining. This coincided with Tim Cook taking over, but it isn’t his fault. The steep part of the S-curve doesn’t go on forever, and companies often reap the biggest rewards in the boring, top part of the S once the industry matures.
Maybe the reason VR has been slow to take off isn’t because there’s something fundamentally wrong with VR. Maybe it’s because, for the last decade, we’ve been working our way through the very early part of the VR S-curve—the slow part where foundational technology is researched and built. My Vision Pro is highly imperfect—overpriced, heavy, slightly glitchy, very limited, creepy-avatared—because that’s exactly what products are like at the bottom of the S. Consumer products aren’t ready for mass adoption during this stage. But it’s the breakthroughs made during these years that set the stage for the explosive exponential phase of the curve.
The lesson from past VR hype cycles is to temper expectations. The VR S-Curve explosion may be many years away or never come at all. But the lesson from past Apple launches is don’t bet against Apple, and Apple’s bet is that the Vision Pro could be a seed like the first iPhone: a platform for innovation that kicks a new S-curve into high gear.
Vision Pro, V2 – V10
For someone to regularly use a piece of technology, the benefits have to outweigh the costs. Right now, the Vision Pro benefits are probably less than the costs.
I’ve already paid for mine, which removes one of the costs, and it’s still a question to what extent I’ll choose it over my computer in the long run. In that regard, the AVP might currently be more like those first cell phones you had to carry around with a briefcase than the first iPhone. Would you get a cell phone if the only way they came was attached to a briefcase? Maybe, but it’s a close call.
For VR to achieve mass adoption, the good needs to be better and the bad needs to be less bad. It’s easy to imagine a pathway to both.
The operating system will get better each year. The two-finger pinch is currently the only gesture. More will be added. Eventually, there may be dozens of ways to make gestures with our fingers, each one a different command, like today’s keyboard shortcuts. When you spend ten minutes setting up an elaborate configuration of windows, you’ll be able to save (and share) it.
Avatars will go from uncanny valley to indistinguishable from your normal face. When you go into immersive environments, you can currently see only your hands. In the future, you’ll be able to identify other objects to remain visible (like a coffee mug). The environments around you will expand from the six current options to hundreds, including delightful fantasy worlds, and they’ll be interactive, allowing you to change things like the weather.
The hardware will get continually smaller and more comfortable. The resolution and frame rate will become as advanced as the latency. The battery will get way better. So will the look from the outside: to people in the room, the headset will come to look totally transparent. (My personal fantasy: The computer itself becomes detachable, allowing the headset to be a light, sleek, cool-looking visor. The computer and battery snap together into something the size of a smartphone. You’ll be able to snap it to the back of your visor if you don’t want the cord, but most people will prefer the weight to be somewhere other than their heads. The computer/battery rectangle will also have a screen and function as a smartphone for times when you want to do something with the visor off. The visor will fold neatly onto the rectangle to make the whole thing a single compact object.)
Finally, the amount of content, applications, and experiences will multiply 1,000-fold, just like the apps in the app store did from 2008 to today. There will be a wide array of immersive games and entertainment. People will watch sports from one of many vantage points on the field, sideline, stands, or overhead, next to their friends, who will be able to look at each other and talk as well as if they were actually together in person. Pop stars will play in front of 50,000 people in person and 5 million people virtually. Fitness will become fun, interactive, and social. The best teachers and coaches will reach millions of people. Amazing AI teachers could reach billions. Distance will melt away, allowing people to spend high-quality time with their loved ones, no matter where they are. People who couldn’t dream of traveling the world today will get to enjoy vivid experiences anywhere on the globe. Of course, my silly 2024 imagination can’t scratch the surface any more than people in the briefcase phone days could have predicted Uber, TikTok, or Tinder.
Over time, the price will come down, with some companies making headsets dirt cheap the way they have for smartphones today. As the value proposition gets better and better, more people will have them, enhancing the social component and eradicating any stigma. Mass adoption seems like a very real future possibility.
I know what many of you are thinking: A world where everyone is in VR headsets (or visors, or glasses, or contact lenses) sounds dystopian and awful. And granted, this is coming from a guy who thought that world of glazed-over people in moving chairs in Wall-E looked like a great place to live, but I’m excited.
K can I take this thing off my face now?