LLMR: Real-time Prompting of Interactive Worlds using Large Language Models

¹Microsoft,
²Massachusetts Institute of Technology,
³Rensselaer Polytechnic Institute
2023
*Indicates equal contribution

Abstract

We present Large Language Model for Mixed Reality (LLMR), a framework for the real-time creation and modification of interactive Mixed Reality experiences using LLMs. LLMR leverages novel strategies to tackle difficult cases where ideal training data is scarce, or where the design goal requires the synthesis of internal dynamics, intuitive analysis, or advanced interactivity. Our framework relies on text interaction and the Unity game engine. By incorporating techniques for scene understanding, task planning, self-debugging, and memory management, LLMR outperforms standard GPT-4 by 4x in average error rate. We demonstrate LLMR's cross-platform interoperability with several example worlds, and evaluate it on a variety of creation and modification tasks to show that it can produce and edit diverse objects, tools, and scenes. Finally, we conducted a usability study (N=11) with a diverse set of participants, which revealed that they had positive experiences with the system and would use it again.

How does LLMR work?

LLMR orchestrates an ensemble of specialized GPTs. At its center is BuilderGPT, which serves as the architect of C# Unity code for crafting interactive scenes. However, virtual world creation involves so many tasks that a standalone coder is insufficient. For instance, meaningfully modifying an existing virtual world requires a deep semantic understanding of the scene. As humans, we can infer the properties of objects in the world and refer to objects in the environment using demonstratives. To simulate the benefits of perceptual access, we incorporated the Scene Analyzer GPT. It generates a comprehensive summary of scene objects and offers detailed information on request, including size, color, and the functionality of interactive tools previously generated by LLMR. We also implemented the Skill Library GPT, which determines the skills the Builder needs to accomplish the user's request. Finally, we observed that code generated by the Builder is not robust and frequently contains bugs. To remedy this, we introduce InspectorGPT, which evaluates the Builder's code against a predefined set of rules. This evaluation guards against compilation and run-time errors before the code is executed via the Roslyn compiler.
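One way to picture how the Inspector gates the Builder's output before execution is the loop below. It is a minimal Python sketch under stated assumptions, not LLMR's implementation (which generates and runs C# inside Unity): `build_with_inspection`, the `Verdict` type, the `max_rounds` limit, and the prompt layout are hypothetical, and the `builder` and `inspector` arguments are placeholders for calls to the respective language models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    """Inspector result (hypothetical shape): pass/fail plus a reason."""
    approved: bool
    feedback: str = ""


def build_with_inspection(
    request: str,
    scene_summary: str,
    skills: List[str],
    builder: Callable[[str], str],
    inspector: Callable[[str], Verdict],
    max_rounds: int = 3,
) -> str:
    """Return the first draft the inspector approves.

    Raises RuntimeError if no draft is approved within max_rounds.
    """
    # Assumed prompt layout: the request plus the context supplied by
    # the Scene Analyzer and the Skill Library.
    prompt = (
        f"Request: {request}\n"
        f"Scene: {scene_summary}\n"
        f"Skills: {', '.join(skills)}"
    )
    for _ in range(max_rounds):
        code = builder(prompt)
        verdict = inspector(code)
        if verdict.approved:
            return code  # only approved code would go on to compilation
        # Feed the inspector's objection back into the next attempt.
        prompt += f"\nInspector feedback: {verdict.feedback}"
    raise RuntimeError("inspector rejected every draft")
```

Passing the two models in as plain callables keeps the control flow testable with stubs; in a real system each would wrap a chat-completion request with its own system prompt.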

Real-time Animation Generation and Control on Rigged Models via Large Language Models

We found that LLMR can be used to generate novel animations on a given 3D model using only natural language descriptions. Our method outputs structured strings encoding positional and rotational time series for each joint, which are parsed to produce animations on the rigged object. We showcase the generated animations on hierarchically distinct models with a variety of motions to underscore the robustness of our approach. Separately, LLMR can be used to program animation transitions on humanoid characters via the generation and execution of appropriate Unity C# scripts. Our approach is characterized by its flexibility, allowing for the seamless integration of pre-existing animations with custom game logic.
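To make the idea of "structured strings encoding positional and rotational time series for each joint" concrete, here is a small Python sketch of a parser. The page does not specify the string format, so the one used here is purely an assumption: one keyframe per line, written as `joint time px py pz rx ry rz`, with rotations taken to be Euler angles in degrees. The names `Keyframe` and `parse_animation` are likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Keyframe(NamedTuple):
    time: float
    position: Tuple[float, float, float]
    rotation: Tuple[float, float, float]  # assumed: Euler angles, degrees


def parse_animation(text: str) -> Dict[str, List[Keyframe]]:
    """Group keyframes by joint name and sort each track by time.

    Assumed line format: "<joint> <time> <px> <py> <pz> <rx> <ry> <rz>".
    Raises ValueError on a line with the wrong field count or a
    non-numeric value.
    """
    tracks: Dict[str, List[Keyframe]] = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in model output
        fields = line.split()
        if len(fields) != 8:
            raise ValueError(
                f"line {line_no}: expected 8 fields, got {len(fields)}"
            )
        joint = fields[0]
        t, px, py, pz, rx, ry, rz = map(float, fields[1:])
        tracks[joint].append(Keyframe(t, (px, py, pz), (rx, ry, rz)))
    for frames in tracks.values():
        frames.sort(key=lambda k: k.time)
    return dict(tracks)
```

Each resulting per-joint track is the kind of data that could then be turned into animation curves on the rigged model; that step depends on the engine and is not shown.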

Additional videos

Controlling existing animations using LLMR

BibTeX

@article{delatorre2023llmr,
  title={LLMR: Real-time Prompting of Interactive Worlds using Large Language Models},
  author={Fernanda De La Torre and Cathy Mengying Fang and Han Huang and Andrzej Banburski-Fahey and Judith Amores Fernandez and Jaron Lanier},
  year={2023},
  eprint={2309.12276},
  archivePrefix={arXiv},
  primaryClass={cs.HC}
}

@article{huang2023real,
  title={Real-time Animation Generation and Control on Rigged Models via Large Language Models},
  author={Huang, Han and De La Torre, Fernanda and Fang, Cathy Mengying and Banburski-Fahey, Andrzej and Amores, Judith and Lanier, Jaron},
  journal={arXiv preprint arXiv:2310.17838},
  year={2023}
}

Teaser video

Abstract

Examples of diverse use cases and functionalities enabled by LLMR.

LLMR can be used to make accessible interfaces from user prompts.

Cross-Platform and Cross-Scene Transferability made possible by LLMR.

Sketching objects into existence with LLMR.

How does LLMR work?

LLMR architecture for real-time interactive 3D scene generation.

The Planner and its role in breaking down a user's high-level request into a sequence of manageable subtasks.

The virtual scene is converted into a parsed scene hierarchy in JSON format. This, along with the user request, serves as input to the Scene Analyzer.
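As a rough illustration of what a parsed scene hierarchy in JSON might look like when paired with the user request, consider the Python sketch below. The field names (`name`, `position`, `children`, `request`, `scene`) and the helpers `SceneNode`, `to_dict`, and `analyzer_input` are assumptions made for this example; the actual schema LLMR uses is not given here.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class SceneNode:
    """Hypothetical stand-in for one object in the scene graph."""
    name: str
    position: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    children: List["SceneNode"] = field(default_factory=list)


def to_dict(node: SceneNode) -> Dict[str, Any]:
    """Recursively convert a node and its children to plain dicts."""
    return {
        "name": node.name,
        "position": list(node.position),
        "children": [to_dict(child) for child in node.children],
    }


def analyzer_input(root: SceneNode, request: str) -> str:
    """Bundle the user request with the serialized hierarchy."""
    return json.dumps({"request": request, "scene": to_dict(root)}, indent=2)
```

For example, `analyzer_input(SceneNode("Room", children=[SceneNode("Lamp", (1.0, 2.0, 0.5))]), "make the lamp red")` yields a JSON string with the request at the top level and the lamp nested under the room.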

Builder-Inspector paradigm in LLMR. This feedback loop significantly enhances the quality of the generated scripts.

Skill Library module workflow.

Object Retriever pipeline for generating a 3D scene.

Real-time Animation Generation and Control on Rigged Models via Large Language Models

One-shot and zero-shot generation of animations using Large Language Models
