Charles Borland

Express Yourself: How VTubing Can Connect Hollywood With Fans And Help Scale The Metaverse

Imagine it’s the year 2030 and you’ve just created your very own one-of-a-kind 3D avatar. It could be a digital replica of you or it could be a ‘citizen’ from your favorite film, anime, or game universe. The point is, you own it. It’s yours. It represents who you are. It’s your unique digital identity in the real-time user-owned Metaverse.

Technically speaking, the avatar might be a series of identical 3D files spanning glTF, USD, and FBX, allowing the user to port their identity seamlessly from one virtual world to another, across hardware platforms, and throughout extended realities (XR).
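One reason glTF in particular lends itself to this kind of portability is that a glTF 2.0 file is, at its core, plain JSON with a required `asset.version` field. As a minimal sketch (the document and field contents below are illustrative, not a real avatar), here is how an application might sanity-check a file before import:

```python
import json

# A minimal glTF 2.0 document is plain JSON; the spec requires an
# "asset" object with a "version" string, which makes a quick
# format check possible. This document is a made-up example.
MINIMAL_GLTF = """
{
  "asset": {"version": "2.0", "generator": "example"},
  "scenes": [{"nodes": [0]}],
  "nodes": [{"name": "AvatarRoot"}]
}
"""

def is_gltf2(document: str) -> bool:
    """Return True if the JSON document declares itself as glTF 2.0."""
    try:
        data = json.loads(document)
    except json.JSONDecodeError:
        return False
    return data.get("asset", {}).get("version") == "2.0"

print(is_gltf2(MINIMAL_GLTF))  # True
```

USD and FBX, by contrast, are richer binary/scene-description formats, which is part of why a single identity usually ends up as parallel files rather than one universal one.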

But porting your avatar around 3D worlds and across realities isn’t enough. If you can’t express yourself - your personality, your views, your knowledge, your style - in real-time in the virtual realm, just like you can in the physical world, then what’s the point of having a unique digital identity in the first place? 

Enter VTubing.

The Rise of VTubing

Short for ‘Virtual YouTubing’, VTubing is the practice in which an online human performer uses real-time motion capture (mocap) technology, including facial motion capture, to live stream as an avatar.

Note: In this article, the term ‘VTubing’ is used as a catch-all term for those who want to express themselves to others in the digital realm.

Source: Voltaku

Rooted in Japanese anime culture, the term ‘Virtual YouTuber’ was first coined on December 1st, 2016 by Kizuna AI, the first avatar live-streamer to self-identify as a VTuber.

Since then, VTubing has exploded across live-streaming social platforms like Twitch, YouTube Live, and, increasingly, TikTok. This growth is especially pronounced on the demand side of the VTuber equation, where VTuber viewership has snowballed at a staggering rate, catapulting entertainers like Usada Pekora to record-breaking heights.

Source: Voltaku

This increased engagement makes sense considering the seismic shift in attention away from passive video-only experiences, like TV and film, to the immersive and interactive experiences found in video games and on social media platforms.

Gen Z provides particularly vivid insight into why digital self-expression matters so much in the Metaverse via Roblox’s latest Digital Expression, Fashion & Beauty report:

  • 86% of GenZ respondents say it is “at least somewhat important that their avatar is able to express emotions in order to feel fully represented in the Metaverse”

  • 84% of GenZ respondents report that “their physical style is at least somewhat inspired by their avatar’s style, including 54% who say they are very or extremely inspired by what their avatar and other avatars wear”

  • 88% of GenZ respondents say that “expressing themselves in immersive spaces has likely helped them comfortably express themselves in the physical world”

Roblox’s survey provides support for a phenomenon called the ‘Proteus Effect’, in which a person’s behavior in a virtual world is altered by their avatar’s unique characteristics.

But this behavior isn’t new. In fact, it’s as old as the Ancient Greek theatre itself. Every classically trained actor in the Western world who has ever taken mask class will be intimately familiar with the findings in the Roblox report.

That’s because theatre masks are used as training tools for actors to free themselves from self-consciousness and break down inhibitions in order to release their creative impulses. When you look in the mirror and another face stares back at you, it’s almost impossible for the wearer’s behavior not to be affected too. It’s not much of a leap, therefore, to see how this same phenomenon would naturally migrate to the virtual world.

Source: The documentary film, Creating a Character: The Moni Yakim Legacy

VTubing, then, is the canary in the coal mine alerting the physical world to the behavioral trends shaping the digital realm. With this in mind, virtual self-expression acts as a sort of social glue for the Metaverse, bonding people and communities together, and real-time facial motion capture is the mechanism that facilitates that bond.

In its current form, VTubing is predominantly centered in anime-inspired idol/influencer culture. This focus on entertainment value helps explain why VTuber viewership and engagement has exploded. Both professional VTubers, like Gawr Gura, who is backed by a VTuber company called Hololive, and successful indie VTubers, like Nimu, can generate millions of dollars in revenue every year in tips, events, branding, and advertising. 

However, this emphasis on idol culture obscures the true opportunity VTubing presents. That’s because in its current form the VTubing economy is centered on performers generating income as entertainers. But what if you don’t care about making money as an influencer or pop idol and simply want to express yourself unencumbered in the digital realm?

We can see this behavior manifested in VRChat, an online virtual world platform where community members, using a combination of motion capture technology and IK Solvers, interact with each other in real-time as avatars.

The difference between VRChatting (made-up word) and VTubing boils down to two factors: 

  1. Incentives: Interacting with other avatars in a virtual world social setting (VRChat) vs live-streaming to an online audience on a social media platform (VTubing) 

  2. Technology: Players wear a VR headset and gear to interact with each other in a virtual world setting (VRChat) vs entertainers and creators using real-time motion capture camera tracking (desktop or mobile) to live stream on social media platforms (VTubing)

The emphasis in VTubing on communicating effectively to an audience via real-time facial camera tracking is key here as it allows for more expressiveness from the player/user. It’s the same technology used in movies such as Avatar, only on a smaller scale. 

However, if we peer into our Metaverse crystal ball we can see how these two experiences will begin to overlap as the internet itself evolves from a 2D video experience that incentivizes an entertainer/audience dynamic to a real-time rendered 3D experience that incentivizes an interactive community dynamic. 

The social media platforms of today will increasingly look like gaming platforms (many games are social platforms). As a result, avatars will not only need to seamlessly transcend platforms, realities, and devices but also be able to easily express the user's intentions and personality to others.

Unfortunately, the main bottleneck limiting this growth is the difficulty of creating, driving, and even finding an avatar.

VTubing’s Supply Problem

While the total number is difficult to pin down across the entire online landscape, there are approximately 49,500 VTubers on Twitch and YouTube Live, the two largest live-streaming platforms.

Source: Gamesight

That works out to roughly 0.4% of the total number of live streamers across both platforms. And there’s a good reason for that massive gulf: it’s really hard to live stream as an avatar both from a cost perspective and a technical perspective.

Let’s start with 2D avatars.

2D avatars tend to dominate the current VTubing landscape since they’re more cost-effective and easier to create and operate. They are also typically created in an anime style, a 2D form of animation beloved by early adopters.

Popularized by companies such as Hololive, some of the biggest names in VTubing employ a 2D avatar. Additionally, 2D VTubers are ubiquitous on platforms such as Twitch, where the act of live streaming gameplay as a 2D avatar is less cumbersome.

Source: Hololive

That said, there is still a healthy learning curve needed to live stream as a 2D avatar. It’s also not cheap. It can easily cost $1K to create and rig one of your own (think of rigging as akin to the hooks and strings needed to puppeteer a marionette).

Of course, these costs and complications scale exponentially once you enter the 3D realm.

To create a bespoke high-end 3D avatar, you have to hire a 3D modeler, a rigger, and a blendshape artist who knows their way around ARKit blendshapes, which are in turn based on FACS data. Meanwhile, most pro modelers and riggers have full-time jobs at leading tech, gaming, or VFX firms, making them hard to find.

If you do find and hire a moonlighting pro - and you’re really lucky - the modeler (or rigger) is also an expert facial blendshape artist. That’s because there’s a delicate balance between the facial topology of the avatar, which the modeler creates, and the performance of the avatar’s facial movements. If the underlying topology of the face is ‘off’, then the blendshapes that facilitate facial movement will move the face in odd ways, leading to an undesirable performance.
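A blendshape is, at bottom, a set of per-vertex offsets from the neutral face that get scaled by a tracked weight (ARKit, for instance, exposes 52 such coefficients with names like jawOpen, each in the 0.0–1.0 range). A toy sketch of the blending math, with made-up vertex data and illustrative shape names:

```python
# Each blendshape stores per-vertex deltas from the neutral mesh; the
# tracked weight (0.0-1.0) scales those deltas. The three vertices and
# their deltas below are made up purely for illustration.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

blendshapes = {
    # "jawOpen" mimics an ARKit-style coefficient name.
    "jawOpen": [(0.0, -0.2, 0.0), (0.0, -0.1, 0.0), (0.0, 0.0, 0.0)],
    "mouthSmileLeft": [(0.0, 0.0, 0.0), (0.1, 0.1, 0.0), (0.0, 0.0, 0.0)],
}

def pose_face(weights):
    """Blend the neutral mesh with weighted deltas: v' = v + sum(w_i * d_i)."""
    posed = []
    for vi, (x, y, z) in enumerate(neutral):
        for name, w in weights.items():
            dx, dy, dz = blendshapes[name][vi]
            x, y, z = x + w * dx, y + w * dy, z + w * dz
        posed.append((x, y, z))
    return posed

print(pose_face({"jawOpen": 0.5}))
# First vertex moves to (0.0, -0.1, 0.0): half of the full jawOpen delta.
```

This is also why topology matters so much: the deltas are authored against a specific vertex layout, so if the underlying mesh is ‘off’, every shape built on top of it inherits the problem.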

For instance, many modelers who come from the VFX industry in Hollywood create avatars with high polycounts used for offline rendering (3D models are created by joining thousands of 2D polygons - usually triangles - together, forming a 3D mesh with countless faces and vertices), whereas modelers who come from the gaming industry create avatars with lower polycounts used for real-time rendering. 

If you really want to differentiate yourself in the VTubing world, you might want to consider a higher polycount more realistic-looking avatar, but then you’re going to need to find a blendshape artist who can make that avatar’s high-polycount face perform well in real-time. That’s not easy (we faced this issue at Voltaku).

Pro tip: Good blendshape artists and riggers will know if the topology of an avatar face is ‘unsuitable’, give notes, and kick the work back to the modeler. Alternatively, a pro modeler who is also a great blendshape artist shouldn’t run into that problem in the first place (or if they do they’ll be able to fix it).

All that work means the cost to create and rig a single premium 3D VTuber avatar from scratch can run up to $15K, depending on factors like polycount, texturing, hair/cloth simulation, etc.

Ok. So, now that you have your very own premium avatar, it’s time to ‘puppeteer’ it. To do that you’ll most likely buy wearable motion capture hardware. A good inertia-based full-body mocap suit with hand-tracking gloves, plus the requisite software needed to run them, will cost over $25K (that’s if you’re an indie developer; it will be more if you’re a business).

Note: Camera-based motion capture AI solutions like Move AI are working to eliminate the need for expensive wearable motion capture hardware.

So, all-in, you’re looking at roughly $40K and weeks-to-months to set up and run an at-home high-end avatar system. Additionally, you should already have a powerful laptop and know your way around game engines like Unreal Engine or Unity, as everything needed to power the avatar mocap setup has to run through them, which means you’re tethered to your desk/studio. 

But what if all you want to do is walk down the street and live stream on YouTube or Twitch as your avatar? Well, then you need to go mobile. But that comes with its own set of challenges.

You’ll either have to: 

  1. Pay hundreds of dollars for a lightweight wearable mocap setup like Sony’s Mocopi, which comes with its own, albeit less steep, learning curve, or

  2. Download an indie 3D VTuber app, which, more often than not, suffers from poor UX and may not support high polycount avatars 

Of course, you can always create and purchase avatars from third-party avatar makers like Ready Player Me and VRoid Studio. However, while these are great options for creating an avatar, users are, by necessity, locked into a specific avatar style and topology, which limits the ability to create a unique avatar that truly expresses you. For instance, if you don’t want your avatar to have an anime-inspired appearance, then a VRoid avatar won’t be an option.

The key takeaway here is that 3D avatars present a much greater opportunity than 2D avatars over the long haul because of their ability to be used in so many different ways, from games to live streaming on social media to filmed entertainment to augmented reality. They not only represent true utility but also allow for a more lifelike experience in the virtual realm. 

However, the current barriers to creating and implementing these assets are a huge impediment to mass adoption.

The AI Solution

The obvious solution to VTubing’s avatar supply problem is artificial intelligence. Since it’s essentially a technological infrastructure layer, AI will permeate and infuse almost every aspect of the Metaverse.

For instance, the potential generative AI holds in being able to create, animate, and power avatars in an interoperable real-time 3D rendered setting is massive. We’re just not there yet.

To achieve scale in the Metaverse an avatar needs to be useful. And for an avatar to have utility, it has to be updateable, upgradable, portable, interoperable, and ownable. In other words, the avatar has to be film-ready, game-ready, XR-ready, and Web3-capable so that it can be used in any digital context the user/owner wants.

Additionally, avatar behavior should feel lifelike. Certain ‘traits’, like an avatar’s jacket and hair, for instance, should move independently of the body and limbs. This requires separate, layered elements to work in tandem when constructing the avatar. Each layered trait, in turn, needs to be rigged independently (if it moves), with requisite blendshapes as well.

This isn’t a problem for rigid objects like a fire hydrant, for example, which has a low polycount, hard surface, and no moving parts (and therefore no rigging required), but the complexity soars exponentially when it’s a human avatar with trait layers, a soft surface, and topology that continuously deforms as the model moves (meaning, each vertex of the avatar’s geometry needs to constantly be updated in real-time as its position in 3D space shifts).
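That continuous per-vertex update is typically driven by skinning: each vertex follows a weighted blend of the transforms of the bones that influence it. A deliberately tiny sketch of linear blend skinning, with made-up weights and translation-only ‘bone transforms’ so the math stays visible:

```python
# Linear blend skinning in miniature: each vertex is moved by a weighted
# blend of its influencing bones' transforms. Real rigs use full 4x4
# matrices per bone; here each "transform" is just a 2D translation,
# and the weights/offsets are invented for illustration.
def skin_vertex(vertex, influences):
    """influences: list of (weight, (tx, ty)) pairs; weights sum to 1.0."""
    x, y = vertex
    out_x, out_y = 0.0, 0.0
    for weight, (tx, ty) in influences:
        out_x += weight * (x + tx)
        out_y += weight * (y + ty)
    return (out_x, out_y)

# A vertex weighted half-and-half between two bones: one bone stays
# put, the other moves up by 2 units, so the vertex moves up by 1.
print(skin_vertex((0.0, 0.0), [(0.5, (0.0, 0.0)), (0.5, (0.0, 2.0))]))
# (0.0, 1.0)
```

Run this per vertex, per frame, for tens of thousands of vertices (plus blendshapes, cloth, and hair on top), and the gap between a fire hydrant and a human avatar becomes obvious.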

Like a great coder, great modelers and riggers are able to achieve optimal results by finding a clever balance between complexity and execution, depending on the requirements. Therefore, future AI systems need to be at least as elegant.

Current generative AI systems, however, simply aren’t able to construct avatars with this level of sophistication (yet). Many current 3D AI systems create 3D rendered scenes using a rasterization technique called Gaussian splatting, which renders 3D environments in real-time based on sample 2D images. Another method involves reconstructing 3D models from 2D portraits by training a system on large data sets.

Both methods show great promise for animating video but less so for navigating virtual worlds, where the user/player needs to interact (pick objects up, etc.) with their surroundings in real-time. That said, it is only a matter of time before these generative AI systems are able to get there.

When that happens, we can expect commerce in the digital realm to explode. 

Digital Identity and the Virtual Economy

Simply put, the Metaverse is an evolution of the web, from a 2D video experience to a 3D virtual experience. 

“The Metaverse is a massively scaled and interoperable network of real-time rendered 3D virtual worlds and environments which can be experienced synchronously and persistently by an effectively unlimited number of users with an individual sense of presence, and with continuity of data, such as identity, history, entitlements, objects, communications, and payments.” - Matthew Ball

Web3, by contrast, is a revolution in the backend of the internet, where data is distributed across a decentralized peer-to-peer network rather than funneled into a centralized server. 

In this world, persistent and immutable ownership of unique digital assets via blockchain-enabled digital native payment rails will form a pillar of the Virtual Economy that will underpin the Metaverse.

At its intellectual and inspirational core, Web3 is all about community ownership and community value creation that encourages a direct-to-consumer and peer-to-peer culture of exchange. 

Unfortunately, the current state of Web3 is a speculation-based economy tied to crypto ‘degen’ culture, which is a huge turn-off for most people.

To transcend this state of affairs, a robust transaction-based economy is required to bring more buyers and sellers into the Web3 fold, which will be dependent on first scaling the Metaverse.

And what will incentivize more businesses and brands to enter the Metaverse en masse? Users having fun rather than trying to get rich.

And fun requires utility. After all, where’s the fun in having an avatar if you can’t do anything with it? 

Nuanced self-expression in a real-time rendered 3D world, therefore, is vital to cultivating a true sense of digital identity. In the Metaverse, it won’t be enough to simply play your avatar in a game and interact with AI-driven NPCs (non-player characters). We already do that. Instead, your avatar will be an interoperable extension of you.

This is a potentially huge opportunity for Hollywood.

Direct Fan Engagement

Hollywood excels at telling stories but not at direct-to-consumer fan engagement.

As we wrote in our previous blog, Hollywood faces a constellation of challenges. And core to all of them is the flight of attention away from passive video-only content, like TV and film, toward immersive and interactive real-time experience sharing, like games and social media.

However, while Hollywood may find itself behind the proverbial eight ball when it comes to direct fan engagement, it still enjoys a tremendous advantage over other forms of entertainment with its stellar constellation of IP. 

Hollywood is the undisputed leader when it comes to getting audiences to fall in love with characters, and the universes those characters inhabit (though the games industry is catching up).

Still, the most valuable properties in all of entertainment are immersive, ever-expanding fantasy worlds, and Hollywood has developed the most valuable storyworlds in existence.

The question then becomes, “how can Hollywood bond with fans where they spend most of their time so that studios can leverage their IP more effectively?” The answer is two-fold:

  1. Create legal carve outs around the rights to IP to better incentivize fans to build directly on top of their favorite storyworlds

  2. Create 3D animated content and virtual assets purpose-built for social platforms, XR, and games by merging real-time virtual production and gaming pipelines

Let’s take each of these points in turn.

Currently, storyworlds are locked behind legal walled gardens that make leveraging direct fan participation extremely difficult, if not impossible. Fans who want to create user generated content (UGC) and build on top of their favorite stories are actively discouraged in today’s closed media rights landscape. 

In the Metaverse, these walls have to come down, at least partially.

For instance, Hollywood studios could retain the rights to the most vital aspects of a storyworld, such as the ‘named characters’ and the plotline/story itself, while allowing fans to partially ‘own’ the rights to 3D avatar citizens and various cultural assets, such as environments, props, fashion, gear, etc., from the larger story ‘universe’, so they can not only create content online but also monetize their efforts. 

Disclaimer: I am not a lawyer nor an expert in IP rights, which is a thorny wicket. The proposed example is meant to highlight how rights management could evolve in the Metaverse.

This may be difficult for existing stories, like Harry Potter or Spider-Man, where rights are dispersed across multiple stakeholders, but it could work for new stories Hollywood develops, especially 100% 3D animated content… if it plans ahead.

Let’s pretend Harry Potter is a new property and the key stakeholders are unified on how to allow fans to own certain elements of the story. 

In this context, a fan couldn’t own the rights to a Hermione Granger avatar, for example, nor call their story Harry Potter but they would be able to build their own avatar and create their own content using cultural assets (e.g., think of a KitBash3D version of Diagon Alley) from the larger Wizarding World universe, and even monetize their online efforts.

The overall point here is to encourage rather than discourage fans from actively developing lore. The network effects, free marketing, and myriad potential revenue streams gained from harnessing the love fans have for their favorite stories is a potentially massive opportunity.

Note: This strategy is optimized for 100% 3D rendered animated worlds, like ‘The Lion King (2019)’, ‘Avatar: The Way of Water’, and certain virtually produced episodes of ‘Love Death + Robots’, where digital assets can be reused, rather than live action filmed entertainment.

And what exactly would these ‘efforts’ look like? This is where we move to the second point.

Currently, when a 3D asset is created in Hollywood, it is only used once, for the explicit purpose of filming it. After the film or TV show is exhibited, the asset is rarely, if ever, reused.

Think of all the avatars, environments, fashion, props, weapons, and other digital goods created in recent years for various movies and TV series. Almost all of these 3D assets are created with software that can be used in tandem with a game engine. Yet, these assets are never utilized beyond the discrete purpose of capturing them on film.

By merging real-time virtual production and gaming pipelines to create interoperable 3D assets, Hollywood studios could leverage web3 network effects to much greater effect by generating recurring revenue streams prior to the release of a movie or TV series.

Note: In this model, the discrete horizontal/waterfall stages of content production (development, pre-production, photography, post-production) that Hollywood uses shift to a vertical/agile workflow. As such, the allocation of sunk costs shifts too. For more information on this topic, see our previous blog here.

Web3 in the Metaverse is a very different beast than the NFT craze of 2021 and 2022, where utility was primarily confined to a token granting certain perks to the owner, like privileged access to special live events or some secret web3 ‘alpha’.

Just like in games, utility in the Metaverse should first and foremost be about fun.

Imagine a major Hollywood studio that is developing a new 3D movie. Since this studio planned ahead, every asset (avatars, environments, props, etc.) it creates for that movie can be deployed by fans across realities and devices.

Since each asset is game-ready, VTube-ready, and ownable, every person who purchases an avatar from that storyworld can both play it in an open world game and use it to live stream on social platforms.

All this UGC activity is of course free marketing. But it doesn’t have to stop there. 

Using their own VTubers, like Hololive does, and virtual production workflows, studios can create short-form narrative content to drive storylines and prime the community ahead of the release of the movie. In this sense, the studio (the prime content creator) is the tip of the narrative spear, always out in front of the community, driving the narrative forward.

This gives the community the sturdy narrative trunk it needs to develop new narrative branches.

And this all circles back to VTubing and self-expression. Being able to both identify as a character from your favorite storyworld and then express yourself as that avatar in any digital context you want is a huge unlock for fans.

There are millions of Harry Potter fans across the globe, many of whom not only strongly identify with the Wizarding World but also derive significant meaning from that shared identity. How many of these fans, then, would jump at the chance to actually become a wizard?

The #OpenMetaverse is a parallel world to our own. A world that is being built and evolving in real-time.

A world where one’s imagination can finally become reality.
