The fragmented and technically demanding nature of augmented reality (AR) content development continues to limit its broader adoption in industrial settings. Current AR authoring methods often require specialized knowledge in areas such as 3D modeling, programming, computer vision, tracking, and rendering. While automated tools attempt to streamline content creation, they are typically rigid and unsuitable for dynamic or undefined processes.
Many AR development approaches depend on existing resources like CAD models or PDF manuals. However, in cases where these materials are outdated, incomplete, or unavailable—such as with legacy equipment or obsolete electronics—content creation becomes a significant bottleneck.
Although model and symbol libraries are emerging to address this, they require constant updates and rely on fiducial markers for registration. This introduces further challenges, including the need for infrastructure changes (e.g., marker placement and visibility), time-consuming setup, and ongoing maintenance. Natural feature recognition can eliminate the need for markers but instead demands complete, high-quality 3D models and clear visibility of all object features, which is often impractical in industrial environments. Furthermore, while depth sensors and image capture can help generate 3D models without CAD data, these methods may interrupt workflows, require invasive equipment, and struggle to capture small or occluded components.
To address these challenges, AR content creation methods must evolve to accommodate non-experts, operate without dependencies on pre-existing resources or infrastructure modifications, and adapt to changes in processes or tasks. The proposed method introduces a low-disruption, template-based approach that bridges traditional and AR interfaces by capturing expert knowledge—both explicit and tacit—through eye-tracking and structured information mapping.
A Multi-Modal Framework for AR Content Creation and Delivery
The framework supports two user roles:
- Content creators (e.g., trainers or experts), and
- Content consumers (e.g., trainees or task novices).
It comprises five main steps:
- Business Need Identification – Pinpoint a requirement for efficient knowledge transfer between experienced and inexperienced staff, particularly for hands-busy tasks.
- Task Recording via Eye Tracking – The expert performs the task while wearing eye-tracking glasses, generating visual and audio data that are then broken down into clear training steps. The resulting steps populate a Unity 3D-based AR content template.
- Iterative Refinement – Content is refined through usability testing and feedback loops.
- User Feedback Collection – Feedback during real-world use is captured to guide further optimization.
- Knowledge Preservation and Empowerment – Organizations can retain critical knowledge through archiving and moderation, while users gain autonomy through self-paced learning.
(a) Task Recording & Information Mapping: The initial stage involves capturing the training task using eye-tracking technology. The recorded content is then structured through information mapping to define clear instructional steps.
(b) Unity Development Environment: Unity is used as the core development platform, providing a visual interface for organizing and integrating multimedia elements into the AR training experience.
(c) Augmented Repair Training Application Template with Mixed Reality Toolkit: The template, built on Unity and enhanced with the Mixed Reality Toolkit, offers a pre-configured structure for rapidly building AR training applications with intuitive interaction components.
(d) Content Import: Trainers upload task-related media—such as video clips and images—into the template to visually support each instructional step.
(e) Step Customization: Trainers define and organize the number of training steps, and for each one, they add supporting images, descriptive text, and other guidance to enhance user comprehension.
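The data a trainer enters during content import and step customization can be pictured as a simple ordered structure. The sketch below is illustrative only: the class and field names (`TrainingStep`, `image_path`, and so on) are assumptions, not the actual Unity/Mixed Reality Toolkit template.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TrainingStep:
    """One instructional step in the AR repair training template."""
    title: str
    description: str = ""
    image_path: Optional[str] = None   # supporting image imported by the trainer
    video_path: Optional[str] = None   # clip cut from the eye-tracking recording

@dataclass
class TrainingTemplate:
    """Ordered container the trainer populates during step customization."""
    task_name: str
    steps: List[TrainingStep] = field(default_factory=list)

    def add_step(self, step: TrainingStep) -> int:
        """Append a step and return its 1-based menu number."""
        self.steps.append(step)
        return len(self.steps)

# Example: a trainer defines two steps for a repair task.
template = TrainingTemplate("Replace fuse")
first = template.add_step(
    TrainingStep("Power down", "Switch off and unplug the unit.")
)
template.add_step(TrainingStep("Open cover", image_path="media/cover.jpg"))
```

Keeping each step self-contained mirrors the information-mapping principle of chunking: every menu entry carries exactly the media and text needed for that step.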
The training content (video, audio, gaze data) is captured directly during task execution. Eye-tracking glasses combine a front-facing video feed with a gaze fixation marker and integrated audio. This method enables the transfer of explicit procedural steps and tacit expertise, such as visual attention and workflow patterns. The resulting media is edited using standard software (e.g., Apple iMovie) and mapped into structured instructional steps.
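One simple way to turn the continuous recording into discrete instructional steps is to slice it at boundaries the expert or trainer marks while reviewing the footage. This is a minimal sketch of that idea, assuming boundaries are given as start times in seconds; it is not part of any eye-tracking vendor's API.

```python
def slice_recording(boundaries, total_duration):
    """Split a recording into (start, end) segments, one per instructional step.

    boundaries: sorted step-start times in seconds, marked during review.
    total_duration: length of the eye-tracking recording in seconds.
    """
    if not boundaries or boundaries[0] != 0.0:
        boundaries = [0.0] + list(boundaries)  # the first step starts at t=0
    ends = list(boundaries[1:]) + [total_duration]
    return list(zip(boundaries, ends))

# A 90-second recording with step boundaries marked at 30 s and 65 s:
segments = slice_recording([30.0, 65.0], 90.0)
# → [(0.0, 30.0), (30.0, 65.0), (65.0, 90.0)]
```

Each resulting segment can then be exported as a clip in a standard editor and attached to the corresponding step of the template.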
Design of the AR Interface: Augmented Repair Training Application
The template organizes content using user-centric design principles. A numbered step menu allows users to track progress and navigate between instructions. Controls such as “home,” “back,” and “next” replicate traditional interface metaphors, easing the transition for new AR users. To enhance clarity, content is displayed with high-contrast text (white font with blue accents).
Instructional content is structured using information mapping, a method that includes:
- Chunking related steps and concepts
- Highlighting essential information
- Ensuring consistency in formatting and structure
- Integrating visuals effectively
- Presenting content hierarchically for intuitive learning
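The numbered step menu with its "home," "back," and "next" controls reduces to clamped index navigation. The following sketch shows that logic in isolation; the class and method names are illustrative, not taken from the actual application.

```python
class StepNavigator:
    """Tracks the user's position in a numbered step menu."""

    def __init__(self, n_steps: int):
        self.n_steps = n_steps
        self.index = 0  # 0-based; shown to the user as step index + 1

    def next(self) -> None:
        """Advance one step, stopping at the last step."""
        self.index = min(self.index + 1, self.n_steps - 1)

    def back(self) -> None:
        """Go back one step, stopping at the first step."""
        self.index = max(self.index - 1, 0)

    def home(self) -> None:
        """Return to the first step."""
        self.index = 0

# A five-step task: pressing "next" twice moves to step 3 (index 2),
# and "home" always returns to step 1.
nav = StepNavigator(5)
nav.next()
nav.next()
```

Clamping at both ends means a stray "back" on the first step or "next" on the last step does nothing, which matches the forgiving behavior of traditional menu interfaces.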
Verification of Learning and Task Completion
Verification of knowledge transfer can be achieved through interactive assessments or expert review. While quizzes may suit well-defined tasks with quantifiable outcomes, they lack flexibility for variable, real-world processes. For this application, expert review was chosen for its simplicity and alignment with the goal of maintaining learner autonomy while ensuring training effectiveness.
Use Case
The AR training assistance method was implemented to create shop floor training content for a small enterprise. While participants expressed enthusiasm about the innovative approach, many were unfamiliar with the tools involved—specifically Unity 3D, eye-tracking software, and the AR device itself. This unfamiliarity led to several usability challenges during the content editing and deployment stages. Technicians required support to navigate unexpected issues such as accidentally deleted scripts or misplaced files.
Despite these challenges, subjective feedback from stakeholders was highly positive. Senior management and the sales team viewed the AR training tool as a valuable and forward-thinking investment. When showcased at trade events and to existing clients, the tool generated strong interest and favorable responses. Shop floor staff were also enthusiastic about the interface, interpreting it as a sign that their company was embracing modern technologies and investing in employee development. However, a small number of staff expressed discomfort using certain features of the AR device, such as voice commands and air tap gestures.
In response to the challenges identified during deployment, it became evident that a more accessible content creation interface would improve usability and reduce technical barriers. To this end, the development of an "intermediary interface" is proposed for future work. This interface would offer a form-based editing environment, simplifying the interaction between non-expert users and the underlying AR content formatting system.
Key features of the proposed intermediary interface include:
- Form-style input fields for guided data entry (e.g., text, video, audio annotations)
- Built-in tools to edit and align eye-tracking outputs into AR-friendly formats
- Tooltips and help documentation embedded within each section to guide users step-by-step
- Automated conversion of form inputs into AR code via a secure back-end service
- Responsive layout using a Bootstrap-style grid system to accommodate various AR form factors
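At its simplest, the proposed automated conversion could validate each form submission and emit a machine-readable step description for the back-end service to compile into AR content. The sketch below is a hedged illustration of that idea; the field names (`step_number`, `title`, and so on) and the JSON target format are assumptions, not a specification of the actual intermediary interface.

```python
import json

REQUIRED_FIELDS = {"step_number", "title", "description"}

def form_to_step_config(form: dict) -> str:
    """Validate one form submission and serialize it for the back end.

    Raises ValueError on missing fields, so entry errors surface
    before any AR code generation is attempted.
    """
    missing = REQUIRED_FIELDS - form.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    config = {
        "step": int(form["step_number"]),
        "title": form["title"],
        "text": form["description"],
        # Optional media attachments are included only if provided.
        "media": [m for m in (form.get("image"), form.get("video")) if m],
    }
    return json.dumps(config)

# Example: one completed form for the first training step.
cfg = form_to_step_config({
    "step_number": "1",
    "title": "Power down",
    "description": "Switch off and unplug the unit.",
    "image": "media/off.jpg",
})
```

Validating at the form boundary keeps malformed input out of the back end, which is where the security and moderation layers described above would apply.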
The Integrated Development Environment (IDE) would be decoupled from the front end and managed in the back end, ensuring secure storage, script execution, and formatting integrity. Additional layers of security and content moderation can be introduced to prevent accidental corruption, unauthorized access, or unapproved deployments.
This redesigned workflow aims to empower non-technical users to contribute to AR content creation while preserving the integrity, flexibility, and scalability of the training system.