Takane Enterprise LLM with Generative AI Reconstruction
We enhance our LLM "Takane" with generative AI reconstruction technology, an AI lightweighting technique that reduces the size and power consumption of large language models. The technology rests on two core innovations: the world's highest-precision quantization technology, which compresses to the extreme the weights assigned to connections between neurons (the foundation of AI reasoning), and the world's first specialized AI distillation technology, which achieves both lightweighting and accuracy exceeding that of the original AI model.

Quantization Technology

Quantization is a technology that reduces memory consumption by representing data typically expressed in 16-bit or 32-bit format as 8-bit or 4-bit. However, extreme quantization to 1-bit was previously considered impossible.
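
To make the idea concrete, here is a minimal sketch of symmetric per-tensor weight quantization in Python with NumPy. The matrix size, scaling scheme, and bit widths are illustrative assumptions, not details of Takane's implementation.

```python
# A minimal sketch of symmetric per-tensor quantization (illustrative only).
import numpy as np

def quantize(w: np.ndarray, bits: int):
    """Map float weights to signed integers plus one per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8-bit, 7 for 4-bit
    scale = np.abs(w).max() / qmax          # largest weight maps to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)

for bits in (8, 4):
    q, scale = quantize(w, bits)
    err = np.abs(dequantize(q, scale) - w).mean()
    # Note: 4-bit values would be packed two per byte in practice.
    print(f"{bits}-bit: weights {32 // bits}x smaller than FP32, "
          f"mean abs error {err:.4f}")
```

Each halving of the bit width halves weight memory, but the rounding error grows, which is why pushing toward 1 bit is hard.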

In deep neural networks such as large language models (LLMs), input data is processed through many layers to achieve high reasoning capability.

However, reducing the number of bits introduces quantization errors that accumulate layer by layer, causing accuracy to collapse. We achieve the world's highest-performance 1-bit quantization by combining QEP (Error Propagation Method), which uses sample data to choose each layer's quantized values so that the accumulated error from preceding layers is cancelled out, with QQA (Quasi-Quantum Annealing), a large-scale discrete optimization technology cultivated through our Digital Annealer.
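
As a loose illustration of the error-propagation idea, the sketch below compares naive layer-by-layer binarization against a version that re-fits each layer's weights to the already-quantized activations before binarizing, so accumulated error is partially cancelled. The least-squares compensation and the sign-times-scale binarization are simplifying assumptions for this sketch; the actual QEP/QQA pipeline solves the placement problem with large-scale discrete optimization.

```python
# Illustrative sketch of error-propagation-aware 1-bit quantization;
# not Fujitsu's algorithm, just the compensation idea in miniature.
import numpy as np

def binarize(w: np.ndarray) -> np.ndarray:
    """1-bit quantization: sign matrix times a per-column scale."""
    return np.sign(w) * np.abs(w).mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
X = rng.normal(size=(256, 64))                 # calibration samples
layers = [rng.normal(size=(64, 64)) / 8 for _ in range(4)]

x_fp, x_naive, x_qep = X, X, X
for W in layers:
    x_fp = x_fp @ W                            # full-precision reference
    x_naive = x_naive @ binarize(W)            # each layer quantized in isolation
    # Compensation step: re-fit this layer's weights to the already-erroneous
    # quantized activations so its output tracks the reference, then binarize
    # the compensated weights.
    W_adj, *_ = np.linalg.lstsq(x_qep, x_fp, rcond=None)
    x_qep = x_qep @ binarize(W_adj)

for name, x in [("naive", x_naive), ("error-compensated", x_qep)]:
    rel = np.linalg.norm(x - x_fp) / np.linalg.norm(x_fp)
    print(f"{name}: relative output error {rel:.3f}")
```

The naive path lets each layer's error compound, while the compensated path keeps re-anchoring the quantized network to the full-precision reference.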

Specialized AI Distillation Technology

Specialized distillation optimizes the structure of an AI model much as the brain strengthens necessary knowledge and clears away unnecessary memories. First, we generate diverse candidate models by pruning unnecessary knowledge from the foundation AI model and adding Transformer blocks that grant it new capabilities. Next, Neural Architecture Search (NAS) with our proprietary proxy evaluation technology automatically selects the optimal model, balancing customer requirements (GPU resources, speed) against accuracy. Finally, we distill knowledge from teacher models such as "Takane" into the selected structure. This unique approach not only compresses the model but also achieves accuracy exceeding that of the foundation generative AI model on specialized tasks.
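
The final distillation step can be pictured with the standard softened-logits loss shown below. This is a generic sketch assuming an ordinary teacher/student logit pair; the temperature and loss weighting are illustrative choices, not Takane's published settings, and the pruning and NAS stages are outside this snippet.

```python
# Generic knowledge-distillation loss (Hinton-style softened logits);
# illustrative of the final distillation step only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend soft-target KL (teacher -> student) with the hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2          # rescale gradients for softened targets
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: batch of 8 examples over a 100-token vocabulary.
student = torch.randn(8, 100, requires_grad=True)
teacher = torch.randn(8, 100)
labels = torch.randint(0, 100, (8,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
print(float(loss))
```

The soft term transfers the teacher's full output distribution (including relative probabilities of wrong answers), which is what lets a much smaller student match or exceed the teacher on a narrow task.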

Value Delivered by Takane Enterprise LLM with Generative AI Reconstruction

  1. Significant Memory Consumption Reduction
    • Enables AI agents to run on edge devices such as smartphones and factory machinery
    • Enables LLMs that previously required one high-performance GPU to run on one low-performance GPU, significantly reducing computational costs (see the rough calculation after this list)
  2. Significant Performance Improvement through Specialized AI Construction
    • Achieves accuracy exceeding the teacher model with student models 1/100th the parameter size
    • Reduces required GPU memory usage and operational costs by 70%
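
As a rough, back-of-the-envelope illustration of point 1, the following assumes a hypothetical 7-billion-parameter model and counts weight storage only (activations, KV cache, and packing overhead are ignored):

```python
# Weight-memory arithmetic for a hypothetical 7B-parameter model.
params = 7e9
for bits, label in [(16, "FP16 baseline"), (4, "4-bit"), (1, "1-bit")]:
    gb = params * bits / 8 / 1e9     # bits -> bytes -> gigabytes
    print(f"{label}: {gb:.2f} GB of weights")
# FP16 baseline: 14.00 GB  (data-center GPU territory)
# 4-bit:          3.50 GB
# 1-bit:          0.88 GB  (within reach of small edge GPUs)
```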

Technical Overview

Target Industries and Users

Quantization Technology

  • Users employing LLMs in on-premises or edge environments

Specialized AI Distillation Technology

  • Users seeking to leverage LLMs to process high-context information and provide decision-making support

Challenges in Target Industries and Operations

Quantization Technology

  • Power consumption and GPU costs are excessively high when using LLMs, or compact GPUs are needed for edge applications

Specialized AI Distillation Technology

  • While the potential for LLMs to solve diverse business challenges keeps expanding, most users face issues stemming from model scale, such as high usage costs and slow processing speeds

Technical Challenges

Quantization Technology

  • Previous quantization methods suffered exponential accumulation of quantization errors in neural networks with many layers, such as LLMs, making it impossible to maintain performance below 4-bit precision

Specialized AI Distillation Technology

  • Scaling up generative AI models poses urgent implementation challenges, including rising development and operational costs, serious environmental impact from massive power consumption, and the difficulty of running large-scale LLMs efficiently on edge devices

Value Delivered by Generative AI Reconstruction Technology for Takane (Detailed)

Quantization Technology

  • For example, an LLM that required 4 high-performance GPUs before quantization, or 1 high-performance GPU with a competitor's 4-bit quantization, can now run on 1 low-performance GPU. This offers advantages in power consumption and cost, with potential expansion to smart speakers and similar devices

Specialized AI Distillation Technology

  • Lightweight models using our proposed specialized AI distillation method achieve the following performance in deal revenue prediction:
    • 70% reduction in required GPU memory usage and operational costs
    • Improved reliability in deal revenue prediction

Fujitsu's Technical Advantages

Quantization Technology

  • 89% performance retention with 1-bit quantization and a 3x speed improvement represent world-class performance; competitor methods retain less than 50%

Specialized AI Distillation Technology

  • Focused on efficiently developing domain-specialized lightweight LLMs for diverse customers

Usage Scenarios

Quantization Technology

  • Application Developers
    • When seeking to reduce GPU costs and power consumption for running LLMs, or when developing LLMs for edge devices such as smart speakers

Specialized AI Distillation Technology

  • End Users
    • Situations where users want to process specific domain tasks using LLMs, for example processing high-context information for deal revenue prediction
  • Application Developers
    • When creating customer-specific models or building foundational environments for customers to perform model lightweighting themselves

Case Studies and Use Cases

  • Not yet published

Technology Trial

Documentation

  • Hugging Face Public Information: Model card on Hugging Face