LLMR: Real-time Prompting of Interactive Worlds using Large Language Models

2Massachusetts Institute of Technology,
3Rensselaer Polytechnic Institute

*Indicates Equal Contribution



We present Large Language Model for Mixed Reality (LLMR), a framework for the real-time creation and modification of interactive Mixed Reality experiences using LLMs. LLMR leverages novel strategies to tackle difficult cases where ideal training data is scarce, or where the design goal requires the synthesis of internal dynamics, intuitive analysis, or advanced interactivity. Our framework relies on text interaction and the Unity game engine. By incorporating techniques for scene understanding, task planning, self-debugging, and memory management, LLMR achieves an average error rate 4x lower than that of standard GPT-4. We demonstrate LLMR's cross-platform interoperability with several example worlds, and evaluate it on a variety of creation and modification tasks to show that it can produce and edit diverse objects, tools, and scenes. Finally, we conducted a usability study (N=11) with a diverse set of participants, which revealed that they had positive experiences with the system and would use it again.


How does LLMR work?

LLMR is an orchestration of an ensemble of specialized GPTs. At its center is the BuilderGPT, which serves as the architect of the C# Unity code used to craft interactive scenes. However, virtual world creation spans so many tasks that a standalone coder is insufficient. For instance, meaningfully modifying an existing virtual world requires a deep semantic understanding of the scene. As humans, we can infer the properties of objects in the world and refer to objects in the environment using demonstratives. To simulate the benefits of this perceptual access, we incorporated the Scene Analyzer GPT. It generates a comprehensive summary of the objects in the scene and offers detailed information when requested, including size, color, and the functionalities of interactive tools previously generated by LLMR. We also implemented the Skill Library GPT, which determines the skills the Builder needs to accomplish the user's request. In addition, we observed that the code generated by the Builder lacks robustness and frequently contains bugs. To remedy this, we introduce the InspectorGPT, which evaluates the Builder's code against a predefined set of rules. This evaluation guards against compilation and run-time errors before the code is executed via the Roslyn Compiler.
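The flow described above can be pictured with a minimal Python sketch. It is not the LLMR implementation: every name here (`Pipeline`, `Verdict`, `demo_pipeline`, and the four callables) is hypothetical, each callable stands in for what would be an LLM call, and the retry loop that feeds Inspector feedback back to the Builder is an assumption about how such an inspection step might be wired up. The sketch stops where the approved code would be handed to the compiler.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    """Result of an inspection pass (hypothetical structure)."""
    ok: bool
    feedback: str = ""


@dataclass
class Pipeline:
    # Each callable is a stand-in for one specialized GPT.
    analyze_scene: Callable[[str, str], str]           # (scene, request) -> summary
    select_skills: Callable[[str], List[str]]          # request -> skill names
    build: Callable[[str, str, List[str], str], str]   # (request, summary, skills, feedback) -> code
    inspect: Callable[[str], Verdict]                  # code -> verdict
    max_attempts: int = 3                              # assumed retry budget

    def run(self, request: str, scene: str) -> Optional[str]:
        """Return code that passed inspection, or None if the budget runs out."""
        summary = self.analyze_scene(scene, request)
        skills = self.select_skills(request)
        feedback = ""
        for _ in range(self.max_attempts):
            code = self.build(request, summary, skills, feedback)
            verdict = self.inspect(code)
            if verdict.ok:
                return code  # a real system would compile and execute here
            feedback = verdict.feedback  # assumed: feedback goes back to the Builder
        return None


def demo_pipeline(max_attempts: int = 3) -> Pipeline:
    """Wire the pipeline with deterministic stubs instead of LLM calls."""

    def analyze_scene(scene: str, request: str) -> str:
        return f"objects: {scene}"

    def select_skills(request: str) -> List[str]:
        return ["spawn_object"] if "add" in request else []

    def build(request: str, summary: str, skills: List[str], feedback: str) -> str:
        body = "public class Spawn { }"
        # The stub only adds the using directive after it receives feedback.
        return "using UnityEngine;\n" + body if feedback else body

    def inspect(code: str) -> Verdict:
        # One illustrative rule; a real rule set would be far richer.
        if "using UnityEngine;" not in code:
            return Verdict(False, "missing 'using UnityEngine;'")
        return Verdict(True)

    return Pipeline(analyze_scene, select_skills, build, inspect, max_attempts)
```

With the stubs above, `demo_pipeline().run("add a red cube", "table, lamp")` fails inspection on the first attempt and passes on the second, while a budget of one attempt yields `None`. Keeping each role behind a plain callable makes it easy to swap a stub for a model-backed function without touching the control flow.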

Real-time Animation Generation and Control on Rigged Models via Large Language Models

We found that LLMR can be used to generate novel animations on a given 3D model using only natural language descriptions. Our method outputs structured strings encoding positional and rotational time series for each joint, which are parsed to produce animations on the rigged object. We showcase the generated animations on hierarchically distinct models with a variety of motions to underscore the robustness of our approach. Separately, LLMR can be used to program animation transitions on humanoid characters via the generation and execution of appropriate Unity C# scripts. Our approach is characterized by its flexibility, allowing for the seamless integration of pre-existing animations with custom game logic.
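To make the parsing step concrete, here is a small Python sketch that turns a structured string into per-joint keyframe tracks. The page does not specify the string format, so the one used here is purely an assumption for illustration: one keyframe per line, written as `joint; time; px py pz; rx ry rz`, with rotations treated as Euler angles in degrees. The names `Keyframe`, `parse_animation`, and `SAMPLE` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Keyframe:
    time: float      # seconds from the start of the clip
    position: Vec3   # joint position
    rotation: Vec3   # Euler angles in degrees (assumed convention)


def _vec3(text: str) -> Vec3:
    """Parse three whitespace-separated numbers."""
    parts = [float(p) for p in text.split()]
    if len(parts) != 3:
        raise ValueError(f"expected 3 components, got {text!r}")
    return (parts[0], parts[1], parts[2])


def parse_animation(text: str) -> Dict[str, List[Keyframe]]:
    """Group keyframes by joint name and sort each track by time."""
    tracks: Dict[str, List[Keyframe]] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = [f.strip() for f in line.split(";")]
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        joint, time, pos, rot = fields
        frame = Keyframe(float(time), _vec3(pos), _vec3(rot))
        tracks.setdefault(joint, []).append(frame)
    for frames in tracks.values():
        frames.sort(key=lambda k: k.time)
    return tracks


# Hypothetical model output: two joints, keyframes not necessarily in time order.
SAMPLE = """
Hips; 0.0; 0 1 0; 0 0 0
Hips; 0.5; 0 1.2 0; 0 0 0
LeftArm; 0.5; 0 0 0; 0 0 45
LeftArm; 0.0; 0 0 0; 0 0 0
"""
```

Calling `parse_animation(SAMPLE)` gives one time-sorted track per joint. In an engine, each track would then be converted into animation curves on the matching bone of the rig. Rejecting malformed lines early, rather than skipping them silently, keeps a bad model output from producing a partially animated character.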

One-shot and zero-shot generation of animations using Large Language Models



    title={LLMR: Real-time Prompting of Interactive Worlds using Large Language Models},
    author={Fernanda De La Torre and Cathy Mengying Fang and Han Huang and Andrzej Banburski-Fahey and Judith Amores Fernandez and Jaron Lanier},

    title={Real-time Animation Generation and Control on Rigged Models via Large Language Models},
    author={Huang, Han and De La Torre, Fernanda and Fang, Cathy Mengying and Banburski-Fahey, Andrzej and Amores, Judith and Lanier, Jaron},
    journal={arXiv preprint arXiv:2310.17838},