The Metaverse Festival was intended to be a watershed moment in the platform's history, so the user experience had to be top-notch: improving avatar rendering performance was critical.
When the Metaverse Festival was first planned, only 20 avatars could be rendered around the user at once, and performance suffered dramatically if they were all drawn on screen at the same moment. That hard cap was imposed by the limits of web browser rendering and of the previous communications protocol, which was later enhanced with the new Archipelago solution. This would not give users a true social festival experience, so Decentraland contributors decided that for the festival to be successful, they needed to increase the number of users that could be shown on screen.
Contributors to Explorer started by putting a hypothesis to the test: that rendering (together with CPU skinning) was the main cause of the performance problems seen when several avatars are on screen. After measuring performance with 100-200 avatar bots in a controlled environment, the hypothesis was confirmed.
In light of these findings, the goal was to increase the number of avatars that could be displayed from 20 to 100. Three approaches were pursued to achieve this.
Combined, these changes boosted avatar rendering performance by roughly 180 percent in benchmark testing. This is the first of many steps in preparing the platform for large-scale events such as the Metaverse Festival, which drew over 20,000 attendees.
After implementing one tool for spawning avatar bots and another for profiling performance in the web browser, the contributors were ready to use the profiling data and develop the avatar impostor system.
While the main scenes for the event were being built, a scene was developed for testing purposes. It had a stadium-like structure, some continuously updated elements, and at least two separate video streaming sources, all of which were desired properties for the setting. Everything was built with the Decentraland SDK.
The proof of concept for avatar impostors was initiated once everything else was in place.
The contributors began working on the "visual portion" of the feature after the fundamental functionality was complete.
To begin with, each avatar's impostor was randomly picked from a sprite atlas of default impostors.
Later, several tests were run using pictures captured at runtime of each avatar posing towards the camera. However, manipulating textures in the browser at runtime proved to be extremely resource intensive, so that option was dropped.
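To illustrate why that option is costly, here is a minimal sketch of the kind of runtime capture that was tested, assuming a dedicated snapshot camera aimed at the avatar (this is not the Explorer code, just the general pattern); the GPU readback and re-upload are what make it so expensive on WebGL:

```csharp
using UnityEngine;

public static class ImpostorCapture
{
    // Hypothetical helper: render the avatar into a RenderTexture with a snapshot
    // camera, then read the pixels back into a Texture2D to use as the impostor.
    public static Texture2D CaptureImpostor(Camera snapshotCamera, int size)
    {
        RenderTexture rt = RenderTexture.GetTemporary(size, size, 16);
        snapshotCamera.targetTexture = rt;
        snapshotCamera.Render();                                 // draw the avatar into the RT

        RenderTexture previous = RenderTexture.active;
        RenderTexture.active = rt;
        var tex = new Texture2D(size, size, TextureFormat.RGBA32, false);
        tex.ReadPixels(new Rect(0, 0, size, size), 0, 0);        // GPU -> CPU copy, slow on WebGL
        tex.Apply();                                             // CPU -> GPU upload, slow again
        RenderTexture.active = previous;

        snapshotCamera.targetTexture = null;
        RenderTexture.ReleaseTemporary(rt);
        return tex;
    }
}
```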
In the end, the default sprite atlas was used for bots and users with no profile, while the body snapshot already present in the content servers was used as the impostor for everyone else.
Lastly, some finishing effects based on position and distance were applied and tuned.
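As a rough sketch of that logic (the type and method names here are hypothetical, not the actual Explorer API), the impostor texture is chosen from the profile snapshot when one exists, and the impostor is faded out with distance from the camera:

```csharp
using UnityEngine;

// Hypothetical profile type: only the field needed for the sketch.
public class AvatarProfile
{
    public Texture2D bodySnapshot;   // snapshot stored in the content servers, if any
}

public static class ImpostorUtils
{
    // Bots and users without a profile fall back to the default impostor atlas;
    // everyone else reuses their existing body snapshot.
    public static Texture SelectImpostorTexture(AvatarProfile profile, Texture2D defaultAtlas)
    {
        if (profile == null || profile.bodySnapshot == null)
            return defaultAtlas;
        return profile.bodySnapshot;
    }

    // Simple distance-based fade: fully opaque up to fadeStart, invisible at fadeEnd.
    public static float ImpostorAlpha(Vector3 avatarPos, Vector3 cameraPos, float fadeStart, float fadeEnd)
    {
        float distance = Vector3.Distance(avatarPos, cameraPos);
        return 1f - Mathf.Clamp01((distance - fadeStart) / (fadeEnd - fadeStart));
    }
}
```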
When testing with real users, highly frequented scenes like Wondermine proved very useful.
The Unity skinning code for the WebGL/WASM target forces the skinning computations to run on the main thread, and it also forgoes all of the SIMD benefits available on other platforms. This cost accumulates and becomes a performance issue when rendering numerous avatars, taking up 15% of frame time or more.
Most of the GPU skinning techniques documented on the internet pack the animation data into textures and then feed it to the skinning shader. This is good for performance on paper, but it has its own drawbacks: blending between animations, for example, becomes difficult, especially because you have to build a custom animator to manage the animation state.
Because the WASM target's skinning is so badly optimised, contributors discovered that a performance boost of 200 percent could be achieved even without packing the animation data into textures. As a result, a simple approach that merely uploads the bone matrices to a skinning shader every frame was enough. This was improved further by throttling the most distant avatars so that they do not upload their bone matrices every frame. Overall, this technique improved avatar performance while avoiding a rebuild of Unity's animation system and keeping support for animation state blending.
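A minimal sketch of this idea, assuming a custom vertex shader that consumes a per-bone matrix array and skins in world space (the component, field, and shader property names are illustrative, not the ones used in Explorer):

```csharp
using UnityEngine;

public class SimpleGpuSkinning : MonoBehaviour
{
    static readonly int BonesProp = Shader.PropertyToID("_BoneMatrices");

    public Transform[] bones;          // same order as the mesh bind poses
    public Matrix4x4[] bindPoses;      // copied from Mesh.bindposes at setup time
    public Renderer targetRenderer;    // MeshRenderer using the custom skinning shader
    public int throttleFrames = 1;     // 1 = update every frame; raised for far-away avatars

    Matrix4x4[] skinMatrices;
    MaterialPropertyBlock block;

    void Awake()
    {
        skinMatrices = new Matrix4x4[bones.Length];
        block = new MaterialPropertyBlock();
    }

    void Update()
    {
        // Distant avatars are throttled: they keep the last uploaded pose on skipped frames.
        if (Time.frameCount % throttleFrames != 0)
            return;

        // One matrix per bone; the per-vertex work happens in the vertex shader.
        for (int i = 0; i < bones.Length; i++)
            skinMatrices[i] = bones[i].localToWorldMatrix * bindPoses[i];

        block.SetMatrixArray(BonesProp, skinMatrices);
        targetRenderer.SetPropertyBlock(block);
    }
}
```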
These clips from the Metaverse Festival show the throttled GPU skinning in action on the farthest avatars:
Avatars were previously rendered as a collection of skinned mesh renderers that shared bone data. Some wearables also required a unique material to adjust for skin colour and emission, which meant two or three draw calls for those wearables. In this scenario, drawing a whole avatar could take ten or more draw calls. WebGL draw calls are quite expensive, easily comparable to mobile GPU draw call costs, if not worse. When testing with more than 20 avatars on screen, the frame rate began to drop significantly.
The new avatar rendering pipeline works by combining all the wearable primitives into a single mesh and encoding sampler and uniform data in the vertex stream, in such a way that all the wearable materials can be packed into a single one.
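The sketch below shows the general shape of such a combiner, heavily simplified (bone weights, bind poses, and material setup are omitted, and all names are hypothetical): wearable meshes are merged into one, and each vertex records in a spare UV channel which texture slot it should sample.

```csharp
using System.Collections.Generic;
using UnityEngine;

public static class WearableCombiner
{
    // Merges the given wearable meshes (already in avatar space) into one mesh and
    // writes the index of the texture slot each wearable uses into UV channel 2.
    public static Mesh Combine(IList<Mesh> wearableMeshes, IList<int> textureSlots)
    {
        var combine = new CombineInstance[wearableMeshes.Count];
        for (int i = 0; i < wearableMeshes.Count; i++)
        {
            combine[i].mesh = wearableMeshes[i];
            combine[i].transform = Matrix4x4.identity;
        }

        var combined = new Mesh();
        combined.CombineMeshes(combine, mergeSubMeshes: true, useMatrices: true);

        // Vertices are appended in combine order, so the per-wearable slot index
        // can be expanded into a per-vertex attribute.
        var samplerIndices = new List<Vector2>(combined.vertexCount);
        for (int i = 0; i < wearableMeshes.Count; i++)
        {
            for (int v = 0; v < wearableMeshes[i].vertexCount; v++)
                samplerIndices.Add(new Vector2(textureSlots[i], 0f));
        }
        combined.SetUVs(2, samplerIndices);

        return combined;
    }
}
```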
You might be wondering how the textures for wearables are packed. Because wearables are dynamic, they don't benefit from atlasing, and texture sharing between them is poor, almost non-existent. The most obvious approaches would be to copy the wearable textures into an atlas at runtime, or to pack them into a texture array.
The problem with these approaches is that, due to the CPU-GPU memory bottleneck, generating and transferring texture pixels at runtime is extremely expensive. The texture array approach has a further drawback: texture elements cannot simply reference other textures, they must be copied, and all of the textures in the array must be the same size.
Furthermore, creating a new texture for each avatar uses a lot of memory, and the current client heap is limited to 2 GB due to Emscripten limitations (it can be raised to 4 GB, but only from Unity 2021 onwards, where Emscripten was updated). This was not an option, because the contributors' intention was to support at least a hundred avatars at the same time.
A simpler and more efficient technique was adopted to avoid the texture copying issues entirely.
The avatar shader uses a texture sampler pool that takes advantage of the 12 available sampler slots for avatar textures. When rendering, special UV data is used to index the required sampler, and the data is organised so that the different UV channels can identify albedo and emission textures. This method allows for extremely efficient packing: for example, the same material could hold six albedo textures and six emissive textures, or eleven albedo textures and one emissive texture, and so on. The avatar combiner takes advantage of this by trying to pack all of the wearables in use in the most efficient way possible. The price is that branching must be used in the shader code, but fragment performance is unaffected, so the trade-off pays off nicely.
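The C# side of such a pool could look roughly like this (the slot limit follows the 12 samplers mentioned above, while the class and material property names are made up for the example): each texture is assigned to one of the 12 slots, and the resulting slot index is what gets encoded in the UV data.

```csharp
using System.Collections.Generic;
using UnityEngine;

public class AvatarSamplerPool
{
    const int MaxSamplers = 12;
    readonly List<Texture> textures = new List<Texture>(MaxSamplers);

    // Returns the slot index for a texture, binding it if there is still room;
    // -1 means the pool is exhausted and the caller has to split the work.
    public int GetOrAddSlot(Texture texture)
    {
        int index = textures.IndexOf(texture);
        if (index >= 0)
            return index;
        if (textures.Count >= MaxSamplers)
            return -1;
        textures.Add(texture);
        return textures.Count - 1;
    }

    // Uploads the pooled textures into numbered material properties
    // (_AvatarTex0 .. _AvatarTex11 here are illustrative names).
    public void Apply(Material avatarMaterial)
    {
        for (int i = 0; i < textures.Count; i++)
            avatarMaterial.SetTexture("_AvatarTex" + i, textures[i]);
    }
}
```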
Top image: before the improvement (120 ms / 8 FPS). Bottom image: after the improvement (50 ms / 20 FPS).
After these improvements, performance with 100 avatars on screen increased by 180 percent (from an average of 10 FPS to an average of 28 FPS).
Some of these enhancements, such as the throttled GPU skinning, may be extended to other animated meshes in Decentraland in the future.
These massive efforts to allow 5x more avatars on screen (from 20 to 100) for the Festival are now permanently embedded in the Decentraland explorer, and they will continue to bring value to in-world social interactions.