14 Energy-Efficient Artificial Intelligence

AI 1.0 vs AI 2.0

AI 1.0

  • Task-specific models
  • Heavy feature engineering
  • Data–task tightly coupled
  • Limited generalization

AI 2.0

  • Foundation models (e.g., LLMs)
  • Unified token-based representation
  • Core paradigm: next-token prediction
  • Strong transfer and general intelligence

System I vs System II

  • System I

    • Fast, intuitive, automatic
    • Corresponds to perception intelligence
  • System II

    • Slow, analytical, reasoning-based
    • Corresponds to cognitive intelligence

Modern LLMs aim to combine both.

LLM Inference Pipeline

Pre-fill Stage

  • Processes the full prompt in parallel
  • Compute-bound (high arithmetic intensity)
  • Initializes the KV cache

Decoding Stage

  • Autoregressive, one token per step
  • Memory-bandwidth bound (KV-cache reads dominate)
  • Latency-critical

\begin{equation*} \text{Inference} = \text{Pre-fill} + \text{Decoding} \end{equation*}
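
The two stages can be sketched as a toy generation loop. This is a minimal illustration, not a real model: `next_token` is a hypothetical stand-in (a lookup table) for a forward pass, and the list `kv_cache` stands in for the real key/value tensors.

```python
def next_token(context):
    # Hypothetical stand-in for a model forward pass: a fixed bigram table.
    table = {"the": "cat", "cat": "sat", "sat": "down"}
    return table.get(context[-1], "<eos>")

def generate(prompt_tokens, max_new_tokens=4):
    kv_cache = list(prompt_tokens)   # pre-fill: process full prompt, build cache
    output = []
    for _ in range(max_new_tokens):  # decoding: one token per step
        tok = next_token(kv_cache)
        if tok == "<eos>":
            break
        kv_cache.append(tok)         # cache grows with each generated token
        output.append(tok)
    return output

print(generate(["the"]))  # → ['cat', 'sat', 'down']
```

The cache growth in the loop is why decoding becomes memory-bandwidth bound: each step re-reads an ever-larger KV cache to produce a single token.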

Tokens per Joule (Tokens/J)

  • New efficiency metric for AI 2.0
  • Replaces FLOPS as the primary system goal
  • Measures end-to-end inference efficiency

\begin{equation*} \text{Tokens/J} = \frac{\text{Generated Tokens}}{\text{Energy Consumption}} \end{equation*}
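
The metric is a simple ratio and can be computed directly; the numbers below are illustrative, not from the text.

```python
def tokens_per_joule(generated_tokens, energy_joules):
    """End-to-end efficiency: tokens generated per joule of energy consumed."""
    return generated_tokens / energy_joules

# e.g., 10,000 tokens generated while drawing 500 W for 4 s → 2,000 J
print(tokens_per_joule(10_000, 500 * 4))  # → 5.0
```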

Scaling Paradigms

Scaling Up

  • Larger models
  • More data
  • Higher capability ceiling

Scaling Down

  • Maintain performance at lower cost
  • Techniques:
    • Quantization
    • Pruning
    • Knowledge distillation
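
One of the techniques listed above, pruning, can be illustrated with a minimal unstructured magnitude-pruning sketch in pure Python (the function name and threshold rule are illustrative, not from the text):

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights (unstructured pruning).

    Note: ties at the threshold may prune slightly more than `sparsity`.
    """
    k = int(len(weights) * sparsity)  # number of weights to remove
    threshold = sorted(abs(w) for w in weights)[k - 1] if k else 0.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

print(magnitude_prune([0.9, -0.05, 0.4, -0.8, 0.1, 0.02], sparsity=0.5))
# → [0.9, 0.0, 0.4, -0.8, 0.0, 0.0]
```

Zeroed weights can be skipped at inference time, which reduces both memory traffic and compute, the two costs that dominate the decoding stage.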

Scaling Out

  • Distributed systems
  • Parallelism and system-level optimization

New Scaling

  • Multi-agent and collaborative LLM systems
  • Example: MetaGPT

Quantization

  • Lower numerical precision (FP32 → INT8 / INT4)
  • Reduce memory footprint and computation cost
  • Main challenge: preserving accuracy
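
A minimal sketch of the FP32 → INT8 step, assuming symmetric per-tensor quantization (one scale for the whole tensor, which is one common scheme; the text does not specify which variant is meant):

```python
def quantize_int8(x):
    """Symmetric per-tensor quantization: FP32 values → INT8 codes + one scale."""
    scale = max(abs(v) for v in x) / 127.0          # map the max magnitude to 127
    q = [max(-127, min(127, round(v / scale))) for v in x]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate FP32 values from the INT8 codes."""
    return [v * scale for v in q]

vals = [0.4, -1.0, 0.1]
q, scale = quantize_int8(vals)
approx = dequantize_int8(q, scale)  # close to vals, within scale/2 per entry
```

The rounding error per value is at most half the scale, which is the accuracy-preservation challenge the bullet above refers to: outliers inflate the scale and coarsen every other value.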

Hardware–Software Co-Design

  • Joint optimization of:
    • Algorithms
    • Systems
    • Hardware architecture
  • Enables orders-of-magnitude performance improvements

Examples:

  • AI accelerators
  • Memory-aware model design
  • Dataflow optimization

Golden Age of Computer Architecture (AI Era)

  • Moore’s Law slowing down
  • Performance gains come from:
    • Architectural innovation
    • Software–hardware interface redesign
    • Domain-specific accelerators

Multi-Agent Systems

  • Multiple LLMs with specialized roles
  • Mimic human organizational structure
  • Improve performance on complex tasks

Example:

  • MetaGPT
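
A role-specialized pipeline of this kind can be sketched with a stub model call; `call_llm` and the role list are hypothetical stand-ins, not MetaGPT's actual API.

```python
def call_llm(role, task):
    # Hypothetical stand-in for a real model call; returns a role-tagged string.
    return f"[{role}] output for: {task}"

ROLES = ["ProductManager", "Architect", "Engineer"]  # MetaGPT-style role chain

def run_pipeline(task):
    artifact = task
    for role in ROLES:  # each agent consumes and refines the previous output
        artifact = call_llm(role, artifact)
    return artifact

print(run_pipeline("build a todo app"))
```

Chaining specialized agents this way mirrors a human organization: each role sees only the artifact handed to it, which is what lets the system decompose complex tasks.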