Update
- Feb. 27th, 2025: The factory dataset has been released for global use as FieldWorkArena V1.1.
- Evaluation software: GitHub
- Evaluation dataset: Application form
AI agent deployment and evaluation

By FieldWorkArena

The benefits of FieldWorkArena
- Objective AI Performance Assessment
- You can evaluate AI agents in a real-world environment and objectively measure their performance.
- Rapid AI development cycle
- Accelerate development of AI agents through efficient testing with benchmarks.
- Reliable AI deployment
- Reduce risk and increase success for AI deployments.
- Improvement of efficiency and safety in field operations
- Through the selection and development of high-performance AI agents, we will improve the efficiency and safety of on-site operations.
- Accelerating the evolution of AI technologies
- Accelerate research and development of AI technologies by providing standardized benchmarks.
Technical Overview
Target Industry/Users
The main targets are the manufacturing industry, including factories and warehouses, and the logistics industry. Users include developers of AI agents and companies seeking to improve efficiency and safety management in field operations.
Challenges in Target Industry and Operations
- Near-miss incidents related to safety and manufacturing occur daily in field operations, and they must be kept from developing into serious incidents.
- There is a huge amount of data including images and documents in the field, making it difficult to extract and analyze information.
- There is no way to link incidents to corporate systems.
Technical Challenges
AI technologies such as multimodal LLMs (for example, GPT-4o) and AI agents can be used to solve the above problems. However, full-scale introduction has not been achieved for the following reasons.
- The ability of existing AI technologies to handle current complex workflows is unclear.
- It is difficult to process the various data formats obtained in the field (text, images, video, logs, etc.) in an integrated way.
- No technology has been established for selecting appropriate sources and performing tasks autonomously, depending on the situation.
Solutions
FieldWorkArena from Fujitsu is a benchmark suite for AI agents that includes more than 40 types of data (images, operation manuals) from 2 real-world scenes, as well as around 500 field-specific tasks and their correct answers. You can quantitatively evaluate the extent to which existing multimodal LLMs and AI agents under research and development can support various tasks in the field. FieldWorkArena can be used to clarify the issues to be solved and to provide evidence when applying AI in the field.
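As an illustration of what "quantitatively evaluate" can mean in practice, the following is a minimal sketch that compares an agent's answers with the correct answers and reports accuracy per task category. It is not the FieldWorkArena evaluation software: the `TaskResult` record, the category labels, and the normalized exact-match scoring rule are all assumptions made for this example, and the actual benchmark may use different task formats and metrics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record for one benchmark task (not the real data format)."""
    task_id: str
    category: str   # assumed labels, e.g. "planning" or "reporting"
    expected: str   # correct answer supplied with the task
    predicted: str  # answer produced by the agent under evaluation


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def accuracy_by_category(results):
    """Return the fraction of exactly matching answers for each category."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in results:
        totals[r.category] += 1
        if normalize(r.expected) == normalize(r.predicted):
            correct[r.category] += 1
    return {category: correct[category] / totals[category] for category in totals}


# Made-up sample data, for illustration only.
results = [
    TaskResult("t1", "planning", "Wear a helmet", "wear a  helmet"),
    TaskResult("t2", "planning", "3", "4"),
    TaskResult("t3", "reporting", "Yes", "yes"),
]
scores = accuracy_by_category(results)
print(scores)
```

Reporting scores per category rather than as a single number makes it easier to see which kinds of field tasks an agent handles well and where it still falls short.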
Fujitsu's Technological Advantage
- No other company has a benchmark suite for evaluating AI agent performance that consists of real-world data and tasks from sites such as factories and warehouses (as of January 2025).
- Provides benchmarks that comprehensively address various types of field operations: work planning, action, and reporting.
- Collaboration with Carnegie Mellon University (CMU), the world leader in AI agent benchmarking.
The benefits of FieldWorkArena (Detailed version)
- Provide standard benchmarks for the development and evaluation of field support AI agents
- Contribute to improving the efficiency, safety and productivity of factory, warehouse and other manufacturing operations
- Stimulate research and development of AI agents for field work support in the research community
Use Cases
- End users:
- Existing AI agents and AI technologies such as multimodal LLMs can be validated.
- By browsing the leaderboard, the best AI technology can be selected.
- App developers:
- By evaluating AI technologies under research and development, such as AI agents and multimodal LLMs, on this benchmark, developers can claim superiority over existing technologies.
Case studies
- Evaluation of an AI agent that detects near-miss events from on-site camera footage and automatically reports them to the appropriate person
- Detection and reporting of health and safety violations in warehouse operations
- Confirmation of compliance with operating procedures in the assembly process of parts and materials
- Plans to offer retail scenes and tasks using CG data in the future
Program and Data
- Evaluation software: GitHub
- Evaluation dataset: Application form
- Leaderboard: coming soon
Related Information
- Video analytics AI agent to support safe, secure, and efficient frontline workplaces
- Fujitsu develops video analytics AI agent to support safe, secure, and efficient frontline workplaces (Press Release, December 12th, 2024)
- An Introduction of technologies to enhance spatial understanding abilities to realize "field work support agents" (Fujitsu TECH BLOG, December 12th, 2024)