Challenges in Field Work

As a result, the burden on field managers is increasing, and the potential for delays in responding to accidents and incidents due to human error is growing.
In addition, analysis of vast amounts of data is time-consuming, preventing quick decision-making.
Field Work Support AI Agent

The reports generated automatically enable rapid reporting and accurate situational understanding and analysis to support efficient decision-making.
The agent also enables cause analysis based on long-duration videos.
The benefits of Field Work Support AI Agent
- Increase safety
  - Real-time incident detection enhances accident prevention and safety management
- Data-driven decision-making
  - Real-time data and automatically generated reports support more accurate and faster decision-making
- Increase productivity and work efficiency
  - Reduce work time and increase productivity by understanding field-work status and providing accurate instructions
- Reduce workload and cost
  - AI reduces the workload on workers and field managers by analyzing long-duration videos
  - Fewer losses from accidents and errors also contribute to lower costs
Technical Overview
Target Industry/Users
- Employees engaged in field work in manufacturing, logistics, public and road management, construction, etc.
- Field managers and workers
Challenges in Target Industry and Operations
With a shortage of workers and an aging skilled workforce in the manufacturing and logistics industries, it is necessary to ensure safe and secure field work while maintaining productivity and quality. AI agents that cooperate with people are showing great potential in desk work and conversation support, but further evolution is needed for field work support.
Technical Challenges
- Response to field-specific situations
  - Conventional multimodal LLMs are not good at recognizing field-specific events from video. There is a need for AI technology that can respond flexibly to diverse situations.
- Processing of long-duration videos
  - A multimodal LLM requires frame thinning (dropping frames) when a long-duration video is input. This reduces answer accuracy when analyzing video with time-series changes. Therefore, there is a need for technology that can efficiently process video and extract the necessary information.
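The effect of frame thinning can be illustrated with a minimal sketch. It assumes uniform sampling under a fixed frame budget, which is one common way to thin frames and not necessarily what any particular multimodal LLM does. The function names, the 64-frame budget, and the event timing are hypothetical values chosen for illustration.

```python
def thin_frames(num_frames: int, budget: int) -> list[int]:
    """Return the indices of `budget` frames sampled uniformly from a video."""
    if budget >= num_frames:
        return list(range(num_frames))
    step = num_frames / budget
    return [int(i * step) for i in range(budget)]


def event_is_sampled(sampled: list[int], start: int, end: int) -> bool:
    """True if at least one sampled frame falls inside the event [start, end)."""
    return any(start <= index < end for index in sampled)


# Hypothetical setup: a 2-hour video at 1 frame per second, 64-frame budget.
num_frames = 2 * 60 * 60
sampled = thin_frames(num_frames, budget=64)

# A 30-second event that falls between two sampled frames is never seen.
missed = not event_is_sampled(sampled, start=1020, end=1050)
print(f"frames kept: {len(sampled)} of {num_frames}, event missed: {missed}")
```

With these numbers the sampled frames are more than 100 seconds apart, so a short incident can fall entirely between two samples and never reach the model.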
Solutions
- Enhance field understanding ability
  - The multimodal LLM learns field-specific events that it cannot recognize from video alone by associating them with linguistic information in documents such as instructions and safety rules. This enhances the AI agent's ability to understand video.
- Memorize video context
  - By emulating the human “selective attention” mechanism, the AI agent memorizes video by focusing on the most important information. This reduces the amount of memory required for processing while retaining information on the subject, which allows the AI agent to handle even long video data.
Fujitsu's Technological Advantage
- Fine-tuning technology to enhance spatial understanding abilities
  - For events that the multimodal LLM cannot recognize from video alone, the model learns to associate them with textual information in documents, which enhances the AI agent's ability to understand the video. This makes it possible to add to AI agents a variety of abilities required for field work support, such as the ability to understand spatial relationships between workers and objects, field-specific object recognition, and recognition of individual worker tasks.
- Context memorizing technology to analyze video efficiently
  - With “selective attention,” only the features in a video frame that suit the subject are selected, compressed, and stored as video context memory. Video context memory allows a multimodal LLM to handle long-duration video without frame thinning. In question-answering benchmarks for long-duration videos, including videos over two hours in duration, the developed method achieved the world's highest answering accuracy with the smallest storage capacity compared to conventional video compression technologies for multimodal LLMs.
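The selection step behind video context memory can be sketched roughly as follows. This is an illustrative toy, not Fujitsu's implementation: it assumes each frame is already encoded as a list of feature vectors, scores relevance with a plain dot product against a subject vector, and omits the compression step. The class name `VideoContextMemory` and all values are hypothetical.

```python
import heapq


def dot(a: list[float], b: list[float]) -> float:
    """Dot product, used here as an assumed relevance score."""
    return sum(x * y for x, y in zip(a, b))


def select_features(frame_features, subject, keep):
    """Keep the `keep` feature vectors most similar to the subject vector."""
    return heapq.nlargest(keep, frame_features, key=lambda f: dot(f, subject))


class VideoContextMemory:
    """Hypothetical store that keeps a few selected features for every frame."""

    def __init__(self, subject, keep_per_frame):
        self.subject = subject
        self.keep_per_frame = keep_per_frame
        self.entries = []  # list of (frame_index, selected_features)

    def add_frame(self, frame_index, frame_features):
        selected = select_features(frame_features, self.subject, self.keep_per_frame)
        self.entries.append((frame_index, selected))

    def size(self):
        """Total number of feature vectors held in memory."""
        return sum(len(features) for _, features in self.entries)


# Toy data: two frames with three 2-dimensional features each.
subject = [1.0, 0.0]
frames = [
    [[0.9, 0.1], [0.1, 0.9], [0.2, 0.3]],
    [[0.0, 1.0], [0.8, 0.2], [0.3, 0.3]],
]
memory = VideoContextMemory(subject, keep_per_frame=1)
for index, features in enumerate(frames):
    memory.add_frame(index, features)

total = sum(len(features) for features in frames)
print(f"stored {memory.size()} of {total} features")
```

Every frame still contributes to the memory, so no frame is dropped, but the stored size grows with `keep_per_frame` and not with the full feature count per frame.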
The Benefits of Field Work Support AI Agent (Detailed version)
By analyzing video from cameras installed at manufacturing or logistics sites with spatial recognition, and by referencing document information such as work instructions and rules, the AI agent supports worker operations by autonomously proposing improvement plans and creating work reports.
Various abilities required for field work support, such as the ability to understand spatial relationships between workers and objects, field-specific object recognition, and recognition of individual worker tasks, can be realized by adding them to the AI agent.
FieldWorkArena is available for evaluating the performance of AI agents.
Demo Video
Related Information
- FieldWorkArena: benchmark suite for evaluating AI agents
- Fujitsu develops video analytics AI agent to support safe, secure, and efficient frontline workplaces (Press Release, December 12, 2024)
- An introduction to technologies that enhance spatial understanding abilities to realize "field work support agents" (Fujitsu TECH BLOG, December 12, 2024)