Friday, September 5, 2025

How Does Today’s AI Work? China Won't Get Addicted to America's Chips


Today's AI primarily works through machine learning and neural networks, allowing systems to learn from massive amounts of data without explicit programming. While China has relied on U.S. chips for AI development, recent reports suggest it is actively pursuing indigenous alternatives and is unlikely to remain dependent on American technology.

 

“As Rush Doshi and Chris McGuire note, selling chips to China would liquidate U.S. leverage, turning scarcity into stockpiles and handing Beijing the parts and know-how to close the innovation gap (Letters, Sept. 2). The notion that we can get China addicted to our chips ignores Communist Party doctrine. Beijing's strategy seeks to develop indigenous and controllable tech to achieve self-reliance from the West. To that end, the state gate-keeps market access, data and content through licensing, cybersecurity and procurement rules, as well as other conditional, revocable permissions.

 

The recent H20 episode illustrates the point. After Washington licensed sales, Beijing told firms to make a priority of domestic alternatives. Companies may exploit brief windows to stockpile -- much as they bought A800/H800s in 2022-23 before that loophole closed -- but the policy direction is unchanged. Beijing will take what it needs today to reach self-reliance tomorrow.

 

Critics of export controls are right that some contraband chips will leak. But that proves that controls bite. What reaches the market are small, warranty-free, irregular lots -- not the steady, uniform, high-volume supply that large-scale training requires. When enforced, controls raise costs, stretch timelines and cap scale by constraining the chokepoints that matter: training-class accelerators [1], HBM [2], EDA tools [3] and lithography.

 

Some suggest that CUDA, Nvidia's software for programming chips, would "tether" China to the American AI ecosystem. Yet authorizing supported CUDA deployments would replace today's gray-market trickle with stable, vendor-backed systems that train Chinese engineers and accelerate domestic substitution. Beijing could then shut down access at will, trading our leverage for its own learning.

 

Craig Singleton

 

Fdn. for Defense of Democracies

 

Washington” [5]

1. AI training-class accelerators are specialized hardware components designed to speed up the process of training machine learning and deep learning models. These processors are essential for handling the massive computational and data-processing demands of complex AI tasks, which general-purpose CPUs cannot manage efficiently.

The main types of AI training accelerators include the following (a short device-selection sketch follows the list):

 

    Graphics Processing Units (GPUs): The most common type of accelerator, GPUs are highly effective for AI training due to their massive parallel processing capabilities. Originally developed for graphics rendering, their ability to execute thousands of operations simultaneously is ideal for the tensor and matrix computations common in deep learning. Key players include NVIDIA and AMD.

    Tensor Processing Units (TPUs): Developed by Google, TPUs are custom-designed Application-Specific Integrated Circuits (ASICs) built specifically for machine learning workloads, especially with the TensorFlow framework. They excel at the large-scale matrix multiplications that form the core of deep learning [4], offering high performance and energy efficiency. TPUs are primarily available through cloud services like Google Cloud.

    Field-Programmable Gate Arrays (FPGAs): FPGAs are reconfigurable integrated circuits that can be reprogrammed after manufacturing to perform specific tasks. This flexibility allows developers to create custom hardware logic for unique AI workloads, striking a balance between the speed of ASICs and the flexibility of GPUs.

    Wafer-Scale Engines (WSEs): Representing a more recent and powerful development, WSEs integrate an entire processor onto a single, massive silicon wafer. Companies like Cerebras have created WSEs that feature a large number of interconnected AI cores, providing a significant boost in computational power for training very large, complex models.

    Cloud-based custom silicon: In addition to Google's TPUs, major cloud providers offer their own custom chips for AI workloads. For example, Amazon Web Services (AWS) provides Trainium chips specifically to reduce the cost and accelerate the training of machine learning models.
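
To make the list above concrete, here is a minimal sketch of how training code typically selects whichever accelerator is available at runtime. It assumes a PyTorch environment (my assumption; neither the letter nor this note names a framework), and it omits TPUs and other cloud silicon, which are reached through separate libraries.

```python
import torch

def pick_device() -> torch.device:
    """Pick the fastest accelerator PyTorch can see, falling back to the CPU."""
    if torch.cuda.is_available():            # NVIDIA (or ROCm-built) GPUs
        return torch.device("cuda")
    if torch.backends.mps.is_available():    # Apple-silicon GPUs
        return torch.device("mps")
    return torch.device("cpu")               # general-purpose fallback

device = pick_device()
x = torch.randn(8, 1024, device=device)      # a small batch of activations
w = torch.randn(1024, 1024, device=device)   # a weight matrix
y = x @ w                                    # the matrix multiply these chips accelerate
print(device, y.shape)
```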

 

Key components of an AI training system

Beyond the core accelerator chip, a complete training system includes other components optimized for AI:

 

    Memory: Training massive models and processing large datasets requires a significant amount of high-speed memory.

        High-Bandwidth Memory (HBM): This specialized memory technology, used in high-end GPUs and other accelerators, provides extremely fast data transfer speeds.

        Video RAM (VRAM): Dedicated memory on GPUs that stores the large amounts of data and parameters needed for parallel processing.

    Networking: For training extremely large models, multiple accelerators must communicate rapidly. High-speed, high-bandwidth networking is critical for connecting thousands of chips within a data center.

    Cooling and power supply: The computational intensity of AI training generates enormous heat and consumes large amounts of power. High-performance systems require robust cooling systems and powerful, efficient power supplies to ensure reliable, sustained performance.

 

Training accelerators vs. inference accelerators

It is important to distinguish between accelerators designed for training and those for inference.

 

    Training accelerators: These are optimized for the intensive, large-batch, and highly parallel computations required for the initial training of a model.

    Inference accelerators: These are optimized for the lower-latency, smaller-batch computations needed for using a trained model to make predictions in a production environment. While training chips can perform inference, specialized inference chips are often more cost- and energy-efficient for that specific task.

 

2. HBM (High Bandwidth Memory) refers to a type of DRAM with a 3D-stacked architecture that provides significantly higher data transfer speeds and more memory capacity than traditional DRAM. For AI, HBM is crucial because it directly feeds the immense data demands of AI models, enabling faster and more efficient training and inference processes for applications like generative AI and large language models (LLMs). Major manufacturers of HBM include SK Hynix, Samsung, and Micron, with SK Hynix currently leading the market.

 

    3D Stacking: Instead of laying memory chips flat, HBM vertically stacks them, creating a dense, compact, and power-efficient memory solution.

    Silicon Interposer: The 3D stack is connected to the processor (like a GPU) via a silicon interposer, a platform etched with thousands of traces for rapid data transfer.

    High Bandwidth: The stacked design and short connections reduce the physical distance data travels, resulting in much faster data throughput (bandwidth) compared to traditional memory (see the rough calculation after this list).

    Reduced Power Consumption: Despite its high performance, the compact and efficient design of HBM can also lead to better power efficiency than some alternatives.
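
As a rough, back-of-the-envelope illustration of why the wide, stacked interface matters, the peak bandwidth of one HBM stack can be estimated from its interface width and per-pin data rate. The numbers below are representative HBM3 figures (a 1024-bit interface and about 6.4 Gb/s per pin), not a specification for any particular product:

```python
# Rough peak-bandwidth estimate for one HBM stack (illustrative numbers only).
interface_width_bits = 1024     # HBM exposes a very wide interface per stack
pin_rate_gbps = 6.4             # approximate per-pin data rate for HBM3

hbm_gb_per_s = interface_width_bits * pin_rate_gbps / 8    # bits -> bytes
print(f"~{hbm_gb_per_s:.0f} GB/s per HBM stack")           # ~819 GB/s

# For comparison, a conventional 64-bit DDR5 channel at the same pin rate:
ddr5_gb_per_s = 64 * pin_rate_gbps / 8
print(f"~{ddr5_gb_per_s:.0f} GB/s per DDR5 channel")       # ~51 GB/s
```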

 

Impact on AI

 

    Faster Training: AI models, especially large ones, require vast amounts of data during training. HBM's high bandwidth allows this data to be fed into the AI processors much faster, significantly reducing training time.

    Efficient Inference: Similarly, HBM helps to reduce bottlenecks during the inference stage, where the AI model is used for predictions or generating content, leading to more responsive AI applications.

    Enabling Advanced AI: The increased capabilities provided by HBM are essential for the continued advancement and scaling of complex AI applications, such as generative AI, which have redefined computational demands.

 

Key Manufacturers

 

    SK Hynix: Currently the market leader, having pioneered HBM technology.

    Samsung: A major player in the memory market.

    Micron: An increasing presence, focusing on efficiency and innovation in HBM products.

 

The Future of HBM in AI

 

    HBM4: The next generation of HBM, expected to significantly increase data rates and bandwidth, is already nearing standardization.

    Increased Customization: Future HBM products will likely include more customer-specific logic on the base die to manage and optimize memory for particular AI accelerators.

    Continued Growth: The market for HBM is projected to see significant growth, driven by the accelerating adoption of AI across various industries.

 

3. EDA (Electronic Design Automation) tools are software suites used by engineers to design, simulate, analyze, and verify complex electronic systems, integrated circuits (ICs), and printed circuit boards (PCBs). Key features include schematic capture, circuit simulation, layout design, and design verification, which automate parts of the design process to improve accuracy and efficiency.

 

Popular EDA tool vendors and their offerings include Cadence and Synopsys for IC design, Altium for PCB design, and Keysight for semiconductor modeling.

 

    Schematic Capture: Creating a graphical representation of the electronic circuit's components and their interconnections.

    Simulation: Predicting the behavior of a proposed circuit design before it's physically implemented.

    Verification: Ensuring the circuit design meets its intended functionality and performs correctly.

    Layout Design: Designing the physical arrangement of components on a PCB or integrated circuit.

    Analysis: Optimizing designs for performance, power consumption, and area.

    Library Management: Creating and managing libraries of electronic components and their associated data.

 

Examples of EDA Tools & Vendors

 

    Altium: Provides a comprehensive solution for PCB design and layout, with features for routing, placement, and manufacturing file generation.

    Cadence Design Systems: Offers a broad range of EDA tools for designing complex semiconductor chips, covering simulation, design, and verification processes.

    Synopsys: Known for its AI-powered EDA solutions that automate tasks, optimize chip performance, and facilitate design migration.

    Keysight: Specializes in end-to-end modeling solutions for semiconductor devices, including device model extraction and PDK validation.

 

Benefits of Using EDA Tools

 

    Increased Productivity: Automating complex tasks streamlines the design workflow.

    Improved Accuracy: Simulating and verifying designs before production reduces the risk of errors.

    Cost Reduction: Identifying and fixing design flaws early in the process minimizes expensive rework.

    Faster Time-to-Market: Accelerating the design and validation process gets products to market sooner.

    Complex Design Support: Enables the creation of highly intricate electronic systems that would be impossible to design manually.

 

4. Large-scale matrix multiplications form the core of deep learning because they are the fundamental mathematical operations used to transform data within neural networks during both training and inference. Every key component of a neural network can be broken down into matrix multiplications, which also enable the efficiency required for modern AI at scale. 

 

The core mechanism of a neural network 

 

A neural network is a function that transforms input data into a desired output, like transforming an image into a label ("cat") or text into a generated response. This process occurs through a series of neuron layers, and matrix multiplication is central to every step. 

 

Forward propagation: When data moves through a network, it passes from one layer to the next. In a dense or fully connected layer, this is a matrix multiplication.

 

Input and weights as matrices: The input data (a batch of images, for instance) is represented as a matrix, and the network's learned parameters, called "weights," are stored in a separate matrix.

 

The calculation: The output of a layer is calculated by multiplying the input matrix by the weight matrix, then adding a bias vector. This is often followed by a non-linear activation function.

 

Output = activation(input x weights + bias)
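
A minimal NumPy sketch of that formula, using illustrative shapes and a ReLU activation (both my choices, not specified in the note above):

```python
import numpy as np

rng = np.random.default_rng(0)

batch, d_in, d_out = 32, 784, 128            # e.g. 32 flattened 28x28 images
x = rng.standard_normal((batch, d_in))       # input matrix
W = rng.standard_normal((d_in, d_out))       # weight matrix (learned parameters)
b = np.zeros(d_out)                          # bias vector

def relu(z):
    """A common non-linear activation function."""
    return np.maximum(z, 0.0)

# Output = activation(input x weights + bias)
out = relu(x @ W + b)
print(out.shape)                             # (32, 128)
```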

 

Backpropagation: During training, the network adjusts its weights to minimize error. This process uses the chain rule of calculus to compute the gradient of the loss function with respect to the weights, and that calculation is expressed as a series of large matrix multiplications.

The chain rule is a calculus rule for finding the derivative of a composite function (a function within a function), such as f(g(x)). It states that the derivative of f(g(x)) is the derivative of the outer function f, evaluated at the inner function g(x), multiplied by the derivative of the inner function g(x). The formula is d/dx [f(g(x))] = f'(g(x)) * g'(x).
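
As a small worked example, the sketch below applies the chain rule to a single dense layer with a linear activation and a mean-squared-error loss (both simplifying assumptions of mine), so each gradient collapses into a matrix multiplication:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out = 32, 784, 128
x = rng.standard_normal((batch, d_in))          # input matrix
W = rng.standard_normal((d_in, d_out)) * 0.01   # weight matrix
b = np.zeros(d_out)                             # bias vector
y_true = rng.standard_normal((batch, d_out))    # targets (random, for illustration)

# Forward pass with a linear activation so the chain rule stays simple.
pred = x @ W + b
loss = np.mean((pred - y_true) ** 2)

# Backpropagation: the chain rule written as matrix multiplications.
d_pred = 2.0 * (pred - y_true) / pred.size      # d(loss)/d(pred)
grad_W = x.T @ d_pred                           # d(loss)/dW, shape (d_in, d_out)
grad_b = d_pred.sum(axis=0)                     # d(loss)/db

# One gradient-descent update of the weights.
lr = 0.1
W -= lr * grad_W
b -= lr * grad_b
print(round(loss, 4), grad_W.shape)
```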

Matrix multiplication is used for all major layer types 

 

Beyond simple, fully connected layers, matrix multiplications are the core engine for more advanced neural network architectures. 

 

Convolutional layers: In a Convolutional Neural Network (CNN), convolutions can be refactored into highly efficient matrix multiplication operations. This allows GPUs to accelerate the process of detecting patterns like edges and textures in image data.
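
One common way this refactoring is done is the "im2col" trick: every input patch is unrolled into a row, so the whole convolution becomes a single matrix multiplication. The sketch below is a simplified version (stride 1, no padding, a single-channel image; the helper name im2col is conventional but the code is mine):

```python
import numpy as np

def im2col(image, k):
    """Unroll every k x k patch of a 2-D image into a row (stride 1, no padding)."""
    H, W = image.shape
    rows = []
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            rows.append(image[i:i + k, j:j + k].ravel())
    return np.array(rows)                          # ((H-k+1)*(W-k+1), k*k)

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))                # a small single-channel image
kernels = rng.standard_normal((4, 3, 3))           # 4 learned filters of size 3x3

patches = im2col(image, 3)                         # (36, 9): one row per patch
filters = kernels.reshape(4, -1).T                 # (9, 4): one column per filter
feature_maps = (patches @ filters).T.reshape(4, 6, 6)  # the convolution as one matmul
print(feature_maps.shape)                          # (4, 6, 6)
```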

 

Transformer layers: The attention mechanism, which powers large language models, is built almost entirely on matrix multiplications. Queries, keys, and values are all represented as matrices, and multiplying them is how the model determines the relevance of different parts of the input sequence. 

 

In a Transformer's self-attention layer, the "key" is a vector representation derived from an input token that serves as a reference to identify relevant information from other parts of the sequence. The Transformer then compares this key with the "query" vectors from other tokens to calculate an "attention score," which determines how much importance each token should give to the other's "value" vector, effectively creating a weighted representation of the input sequence.

 

Each query vector is multiplied against every key vector (a dot product) to create attention scores, indicating the similarity between different parts of the input.

 

These attention scores are then used to compute a weighted sum of the "value" vectors, where the weights determine the contribution of each value to the final output. This process allows the Transformer to focus on the most relevant information when constructing its output for each token.
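
A compact sketch of this query/key/value computation as plain matrix multiplications, for a single attention head with illustrative sizes; the softmax and the 1/sqrt(d) scaling follow the standard scaled dot-product form, which the note above does not spell out:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16                          # 5 tokens, 16-dimensional embeddings
tokens = rng.standard_normal((seq_len, d_model))

# Learned projections (random here) turn each token into a query, key, and value.
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)       # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Attention scores: every query compared against every key -- one matrix multiply.
scores = Q @ K.T / np.sqrt(d_model)               # (seq_len, seq_len)
weights = softmax(scores)                         # how much each token attends to the others

# Weighted sum of the value vectors -- a second matrix multiply.
output = weights @ V                              # (seq_len, d_model)
print(output.shape)
```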

 

Hardware is built for matrix multiplication 

 

The reliance on matrix multiplication is not just a mathematical convenience; it aligns perfectly with modern hardware architecture. 

 

GPU optimization: Graphics Processing Units (GPUs), which power most deep learning, are specifically designed for massive parallel processing. The repeated, independent calculations in a matrix multiplication operation are a perfect match for a GPU's architecture, allowing millions of computations to happen simultaneously.

 

Tensor cores: NVIDIA GPUs include specialized components called "Tensor Cores" that are optimized to accelerate mixed-precision matrix operations. This directly targets and speeds up the core computational task of deep learning. 
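
As a hedged illustration of how that hardware path is usually reached from user code, the PyTorch sketch below runs a large matrix multiplication under automatic mixed precision; on an NVIDIA GPU this is the kind of lower-precision matmul that Tensor Cores accelerate, and on a machine without a GPU the same code simply falls back to the CPU:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

# Autocast runs matmuls in a lower-precision format (fp16 on GPU, bf16 on CPU),
# the workload that Tensor Cores are built to accelerate on NVIDIA hardware.
dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.autocast(device_type=device, dtype=dtype):
    c = a @ b

print(device, c.dtype, c.shape)
```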

 

Summary: The synergy of math and hardware 

 

In essence, matrix multiplication is the language of deep learning. It's the mathematical primitive that enables data transformation in neural network layers and is the basis for updating weights during training. This approach is highly efficient for modern GPUs, creating a powerful synergy that makes large-scale deep learning possible and practical. 

 

5. Singleton, Craig. "China Won't Get Addicted to America's Chips." Wall Street Journal, Eastern edition; New York, N.Y., 5 Sep 2025: A16.

