
Fujitsu Auto Data Wrangling

In the workflow leading up to machine learning, AI now handles data pre-processing for model building, which has traditionally been the most labor-intensive task for humans.

Challenge in Machine Learning

Traditionally, leveraging machine learning required building an AI model first. Advances in AI have since made it possible to automate model building itself.
However, the data used for model building must still be prepared and formatted manually, and data pre-processing continues to consume significant time and effort.

Solution with This Technology

By leveraging AI in the data pre-processing stage, we can reduce the time traditionally required by approximately 90%.
The resulting time savings can be used for more advanced analysis and model development.
Furthermore, the technology automatically formats data to prevent errors when machine learning tools are applied, and enriches it in ways likely to improve prediction accuracy, shortening the model development period and improving accuracy.

The benefits of Fujitsu Auto Data Wrangling

Significant Time Savings	Reduces data pre-processing time by approximately 90%, allowing data scientists to focus on higher-value analysis and model development.
Improved Machine Learning Model Accuracy	Automated data cleansing and enrichment have been confirmed to improve machine learning model accuracy by up to 15%.
Utilization of Natural Language Data	Automatically processes data items written in natural language and converts them into a format usable for machine learning, greatly expanding the scope of data utilization.
Flexible Integration via API	The provided Web API enables seamless integration with existing systems and rapid deployment into various business processes.

Demo
A demo app is available to try.

Technical Overview

Target Industry/Users

Applicable to any industry, sector, or service that uses machine learning. It is designed to be especially effective in environments where data is poorly organized.

Challenges in Target Industry and Operations

In data analysis projects, a vast amount of time and effort goes into data pre-processing (cleaning, transformation, integration, etc.). Because data quality is often low and bringing data into an analyzable state is complex, this work requires specialized knowledge and skills. As a result, analysis takes a long time to start, leading to lost business opportunities. The shortage of specialized personnel such as data scientists is another major challenge.

Technical Challenges

- Applying machine learning to real-world tabular data from business operations requires significant pre-processing effort, such as data formatting.
- Conventional machine learning techniques cannot directly handle diverse free-text items, which limits accuracy improvements.

Solutions

- Automated data cleaning based on LLM-based feature type inference.
- Automated data enrichment, using LLMs to analyze text items and add new data items.
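The page does not describe how the LLM-based feature type inference works. Purely to illustrate the idea of type-driven cleaning, the minimal sketch below swaps in simple rule-based heuristics (not an LLM): it infers each column's type, then uses that type to decide how to clean the values (normalizing full-width characters, removing thousands separators, unifying date formats, and mapping missing-value tokens to `None`). All function names, the missing-value tokens, and the three-word threshold separating free text from categories are assumptions for this example.

```python
import unicodedata
from datetime import datetime

# Hypothetical set of tokens treated as missing values (an assumption).
MISSING_TOKENS = {"", "-", "?", "n/a", "na", "nan", "null", "none"}


def normalize(value):
    """NFKC-normalize (e.g. full-width '９８０' -> '980') and trim whitespace."""
    return unicodedata.normalize("NFKC", value).strip()


def is_missing(value):
    return normalize(value).lower() in MISSING_TOKENS


def parse_number(value):
    """Parse values such as '1,200' as floats; return None if not numeric."""
    try:
        return float(normalize(value).replace(",", ""))
    except ValueError:
        return None


def parse_date(value):
    """Accept YYYY-MM-DD or YYYY/MM/DD; return None if not a date."""
    try:
        return datetime.strptime(normalize(value).replace("/", "-"), "%Y-%m-%d").date()
    except ValueError:
        return None


def infer_type(values):
    """Rule-based stand-in for LLM-based feature type inference (hypothetical)."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return "empty"
    if all(parse_number(v) is not None for v in present):
        return "numeric"
    if all(parse_date(v) is not None for v in present):
        return "datetime"
    # Assumed threshold: short values are categories, longer ones are free text.
    avg_words = sum(len(normalize(v).split()) for v in present) / len(present)
    return "text" if avg_words >= 3 else "categorical"


def clean_value(value, col_type):
    if is_missing(value):
        return None
    if col_type == "numeric":
        return parse_number(value)
    if col_type == "datetime":
        return parse_date(value)
    if col_type == "categorical":
        return normalize(value).lower()  # unify 'Basic', 'basic ', 'BASIC'
    return normalize(value)


def clean_table(table):
    """table maps column name -> list of raw strings (e.g. read from a CSV)."""
    types = {name: infer_type(col) for name, col in table.items()}
    cleaned = {name: [clean_value(v, types[name]) for v in col]
               for name, col in table.items()}
    return types, cleaned


raw = {
    "price": ["1,200", "９８０", "N/A", "1500"],
    "joined": ["2024/01/05", "2023-12-31", "", "2024-02-29"],
    "plan": ["Basic", "basic ", "Premium", "BASIC"],
    "comment": ["Wants faster delivery next time", "Price is a bit high for me",
                "-", "Very satisfied with support team"],
}
types, cleaned = clean_table(raw)
print(types)  # {'price': 'numeric', 'joined': 'datetime', 'plan': 'categorical', 'comment': 'text'}
print(cleaned["price"])  # [1200.0, 980.0, None, 1500.0]
```

Separating type inference from cleaning means a better inference step (such as the LLM-based one described above) could replace `infer_type` without changing the cleaning rules.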

Fujitsu's Technological Advantage

- Reduced data preparation effort (a 90% reduction compared to manual data wrangling).
- Automated enrichment of text items, which other data wrangling tools cannot address.
- Machine learning accuracy improved by up to 15% with this technology applied (when using Fujitsu AutoML).

The benefits of Fujitsu Auto Data Wrangling (Detailed Version)

By automating the pre-processing of table data, data scientists and data engineers can focus on more advanced analytical tasks. This enables faster and more accurate business decision-making.

Data Cleaning: Formats input data to prevent errors when applying machine learning tools such as Fujitsu AutoML.
Data Enrichment: Analyzes the feature columns of input data and creates and adds new feature columns that are likely to improve prediction accuracy.
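The page does not say how the LLM is prompted or which new columns it produces. As a rough illustration only, the sketch below shows the general shape of text-column enrichment: each free-text value is passed to an extractor that returns structured fields, and each field becomes a new column. In Fujitsu's technology an LLM performs this analysis; here `keyword_extractor` is a hypothetical keyword-matching stand-in, and the derived column names (`comment_sentiment`, `comment_mentions_price`) are invented for the example.

```python
from typing import Callable, Dict, List, Optional

# An extractor turns one free-text value into structured fields.
Extractor = Callable[[str], Dict[str, object]]

# Hypothetical keyword lists used only by the stand-in extractor below.
POSITIVE_WORDS = {"satisfied", "great", "good", "fast"}
NEGATIVE_WORDS = {"high", "slow", "late", "broken"}


def keyword_extractor(text: str) -> Dict[str, object]:
    """Stand-in for an LLM call; a real system would prompt a model instead."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"sentiment": sentiment, "mentions_price": "price" in words}


def enrich(table: Dict[str, List[Optional[str]]], text_column: str,
           extract: Extractor) -> Dict[str, list]:
    """Return a copy of `table` with new columns derived from `text_column`."""
    # Missing text yields no fields, so its cells in the new columns become None.
    rows = [extract(text) if text else {} for text in table[text_column]]
    keys = sorted({key for row in rows for key in row})
    enriched: Dict[str, list] = dict(table)  # shallow copy; input is not modified
    for key in keys:
        enriched[f"{text_column}_{key}"] = [row.get(key) for row in rows]
    return enriched


table = {
    "plan": ["basic", "premium", "basic"],
    "comment": ["Price is a bit high for me", None, "Very satisfied with support team"],
}
enriched = enrich(table, "comment", keyword_extractor)
print(enriched["comment_sentiment"])       # ['negative', None, 'positive']
print(enriched["comment_mentions_price"])  # [True, None, False]
```

Because the extractor is passed in as a function, the same `enrich` loop could wrap an LLM-backed extractor; the resulting columns are ordinary categorical and boolean features that a separate machine learning tool can consume.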

Note: This technology does not perform machine learning itself. Use existing machine learning tools such as Fujitsu AutoML for machine learning tasks such as model training, classification, and prediction.

Use Cases

End users:
- Users in various departments and industries who hold manually entered data or data in inconsistent formats that cannot be effectively utilized.
App developers:
- Developers who want to clean and transform data that was previously unusable due to poor quality so that machine learning can be incorporated, or to improve the accuracy of existing machine learning implementations.

Case studies

PoCs are currently in progress.

Technical Trial

Demo App: The demo app is available to try.
Demo Video
A proof of concept (PoC) can be conducted.

Tech Blog, etc.
- Introducing Fujitsu Auto Data Wrangling
Related Technology
- Fujitsu AutoML

Documents

Document Title	Explanation
Technical manual	Fujitsu Auto Data Wrangling User Manual
OSS list	OSS list