
Wednesday, July 16, 2025

U.S. Commerce Secretary Howard Lutnick Says Nvidia Chips in China Benefit U.S.

 

“Commerce Secretary Howard Lutnick on Tuesday explained the Trump administration's turnabout in assuring Nvidia that it can sell its H20 artificial-intelligence chip in China, saying the U.S. wants China dependent on American technology.

 

Lutnick said on CNBC that the U.S. wants to stay one step ahead of what China can build so China will continue to buy U.S. semiconductors.

 

"We want to keep having the Chinese using the American technology stack, because they still rely upon it," he said, adding that Nvidia is selling China an older AI chip.

 

Nvidia, based in Santa Clara, Calif., said it has been assured by the Trump administration that it can sell its H20 artificial-intelligence chip in China.

 

This came days after Chief Executive Jensen Huang met President Trump, and followed the Commerce Department's April move to restrict sales of the H20 chip in China.

 

The U.S. decision to allow more Nvidia chips to flow to China again was viewed in Beijing as a gesture of good faith in trade talks, said people close to official thinking. Access to chips and advanced technology has been a main priority for Chinese negotiators.

 

Nvidia last week became the first company valued at more than $4 trillion. Its shares closed Tuesday at $170.70, up 4%.

 

The administration said Nvidia would be allowed to sell the H20 chip after licenses are granted by the Commerce Department, according to the company. Nvidia said it would soon resume deliveries of the chip, which was designed for customers in China, where it has been a top seller since 2024.

 

In addition, Huang said Nvidia has developed a new AI chip for China that he said would be useful for factory automation and logistics. The chip is built on the Blackwell architecture [1] -- Nvidia's most advanced on the market -- but is downgraded in some features to address U.S. officials' concerns about exports to China, people familiar with the chip said.

 

Huang long preferred to stay out of politics but has emerged as a central player in U.S.-China tensions in recent months, hopping from Beijing to Washington and back in the hopes of maximizing Nvidia's access to China and global markets.

 

Last week, he met Trump and made the case that his company should be allowed to continue to do business with China and tap AI talent there, according to people familiar with the meeting.

 

Huang told Trump that Nvidia should be allowed to sell its technology freely to most parts of the world so American companies can dominate AI instead of Chinese companies such as Huawei, the people said. The CEO also discussed similar topics with Lutnick, they said.

 

In May, Trump called Huang "my friend" at an event in Saudi Arabia and praised Nvidia's market share dominance in chip design.

 

This week, the CEO is in Beijing, at least the third time this year he has visited China. In meetings with top officials, Huang aimed to assure Beijing that his company would continue to do business in China to the extent that U.S. regulations allow, the people said.

 

Nvidia chips are vital for cutting-edge data centers that train AI models and operate AI applications, making Huang a popular figure in world capitals.

 

Huang's latest China trip has already drawn scrutiny in Washington. Last week, a pair of U.S. senators -- Jim Banks (R., Ind.) and Elizabeth Warren (D., Mass.) -- sent a letter to Huang about his trip, asking him to abstain from meeting with companies that are working with military or intelligence bodies in China.

 

Such concerns mean geopolitical risk remains high for Nvidia. The company has spent years modifying designs to meet U.S. export rules for China, only for the goal posts to be moved repeatedly.

 

In May, Huang called U.S. export-control policy a failure.” [2]

 

1. The Blackwell architecture is NVIDIA's latest GPU architecture designed for generative AI and accelerated computing, succeeding the Hopper architecture.

 

It's built to handle large language models (LLMs) and other complex AI workloads with improved performance, efficiency, and scalability.

 

Blackwell features include a new class of AI superchip with 208 billion transistors, a custom TSMC 4NP process, and a high-speed chip-to-chip interconnect. It also incorporates features like the second-generation Transformer Engine, confidential computing, and advanced NVLink.

Key Features and Improvements:

 

    New AI Superchip:
    Blackwell GPUs are built with a custom TSMC 4NP process, featuring two reticle-limited dies connected by a 10 TB/s chip-to-chip interconnect, according to NVIDIA.

    Second-Generation Transformer Engine:
    Blackwell includes a more advanced version of the Transformer Engine, supporting new 4-bit floating point AI inference [3] capabilities and enabling larger models and faster processing.

    Fifth-Generation NVLink:
    This new NVLink technology delivers a groundbreaking 1.8 TB/s bidirectional throughput per GPU, facilitating communication between up to 576 GPUs for large-scale LLMs, says NVIDIA.

    Confidential Computing:
    Blackwell GPUs offer confidential computing capabilities, protecting sensitive data and AI models from unauthorized access.

    RAS Engine:
    A dedicated engine for reliability, availability, and serviceability is also included in the Blackwell architecture.

    5nm Process Technology:
    Blackwell is built on a 4NP process, which is a custom 5nm process, enabling higher transistor density and improved performance.

 

Impact and Applications:

 

    AI Factories:
    Blackwell is designed to power "AI factories," enabling efficient training and real-time inference for generative AI and LLMs.

    Reduced Cost and Energy Consumption:
    Blackwell is projected to enable organizations to run real-time inference on trillion-parameter LLMs at 25x less cost and energy consumption than its predecessor, notes Hyperstack Cloud.

    Scalability:
    The architecture is designed to handle the increasing scale of AI models, with features like the fifth-generation NVLink for improved GPU communication.

    Various Industries:
    Blackwell is expected to benefit industries like healthcare, cloud computing, and others that rely on large-scale AI and data processing.

 

Blackwell vs. Hopper:

 

    Blackwell succeeds the Hopper architecture, which was introduced in 2022.

    Blackwell offers significant performance improvements over Hopper in AI training and inference, particularly for large language models.

    Blackwell incorporates advancements like the second-generation Transformer Engine, enhanced NVLink, and strengthened confidential computing, going beyond what Hopper offered.

 

Availability:

 

    NVIDIA has not yet released specific timelines for the availability of all Blackwell-based products, but they are expected to begin shipping later this year.

    Some Blackwell-based products, like the HGX B200 and GB200 NVL72, are already available.

 

 

2. Hatcher, Nicholas; Huang, Raffaele. "U.S. News: Lutnick Says Nvidia Chips in China Benefit U.S." Wall Street Journal, Eastern edition, New York, N.Y., 16 July 2025: A2.

 

 

3. 4-bit floating point AI inference

4-bit floating point (FP4) inference in AI is a technique for running trained neural networks using numbers represented by only four bits, drastically reducing the computational and memory requirements compared to higher precision formats like FP32 or FP16.

How it works

Quantization, the core process here, involves mapping the original, high-precision floating-point values (weights and activations) to a limited set of 16 distinct values representable by four bits. This mapping typically uses a scaling factor to adjust the numbers to fit within the 4-bit range. Various methods exist for this mapping, including FP4 (standard floating point with sign, exponent, and mantissa) and NF4 (normalized float) which is optimized for normally distributed variables like neural network weights, according to Medium. The quantized values are then processed by the model during inference.
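
To make the mapping concrete, the sketch below quantizes a small weight tensor to 16 levels with a single scaling factor. It is a minimal Python illustration under simplifying assumptions, not NVIDIA's or any particular library's implementation: it uses a uniform signed 4-bit integer grid rather than a true FP4 (sign/exponent/mantissa) or NF4 format, and one scale for the whole tensor rather than per-block scales.

    import numpy as np

    def quantize_4bit(x):
        # Map float32 values onto the 16 signed integer levels -8..7
        # using one per-tensor scaling factor (simplifying assumption).
        scale = np.abs(x).max() / 7.0
        codes = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
        return codes, scale

    def dequantize_4bit(codes, scale):
        # Recover approximate float values for use during inference.
        return codes.astype(np.float32) * scale

    # Quantize a random weight matrix and check the error introduced.
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((4, 4)).astype(np.float32)
    codes, scale = quantize_4bit(weights)
    error = np.abs(weights - dequantize_4bit(codes, scale)).max()
    print("max abs error:", error)

Real FP4 and NF4 schemes replace the uniform integer grid with 16 non-uniformly spaced values and keep a scale per small block of weights, which reduces the accuracy loss discussed under "Challenges and considerations" below.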

Advantages

 

    Reduced Model Size: Significantly shrinks the size of the AI model, making it easier to store and deploy, especially on devices with limited memory. For example, a model requiring 32 GB of VRAM in FP32 might use less than 10 GB with FP4 quantization (a rough back-of-the-envelope calculation follows this list).

    Increased Inference Speed: By reducing the data size and computational complexity, FP4 inference can lead to faster predictions and lower latency, particularly important for real-time applications and edge devices.

    Energy Efficiency: Less computation means lower power consumption, which is critical for mobile and embedded systems.

    Scalability: Enables deployment of powerful AI models on consumer-grade hardware and makes cloud computing resources more efficient.
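
As a rough check on the memory figure mentioned in the first advantage above, the short calculation below assumes an 8-billion-parameter model and one FP16 scaling factor per 64-weight block; both numbers are illustrative assumptions, not figures from the article.

    # Back-of-the-envelope weight-memory estimate (illustrative assumptions).
    params = 8e9                           # assumed 8B-parameter model
    fp32_gb = params * 4 / 1e9             # 4 bytes per weight  -> 32.0 GB
    fp4_gb = params * 0.5 / 1e9            # 4 bits = 0.5 byte   -> 4.0 GB
    scales_gb = (params / 64) * 2 / 1e9    # FP16 scale per 64-weight block -> 0.25 GB
    print(f"FP32: {fp32_gb:.1f} GB, FP4 + scales: {fp4_gb + scales_gb:.2f} GB")

Activations, the KV cache, and framework overhead add to the weight-only figure, which is why practical FP4 deployments land above it while still staying well under the FP32 requirement.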

 

Challenges and considerations

 

    Accuracy Loss: The main drawback is a potential reduction in model accuracy, as representing numbers with fewer bits inherently sacrifices some precision. The extent of this loss can depend on the specific model, dataset, and quantization techniques used. For instance, 4-bit quantization typically results in a 2-5% accuracy drop compared to 8-bit quantization's less than 1% drop.

    Hardware Support: Efficient FP4 inference often requires specialized hardware support for low-precision operations, like Nvidia's Tensor Cores, according to zach's tech blog.

    Implementation Complexity: Quantization techniques can be complex to implement correctly and efficiently.

    Impact of Outliers: Extreme values in the original floating-point data can pose challenges for quantization, potentially requiring specific handling to minimize accuracy impact.

 

Applications

FP4 inference is particularly well-suited for scenarios where efficiency, speed, and deployment on resource-constrained devices are paramount. This includes:

 

    Edge devices: Mobile phones, IoT devices, and embedded systems where memory and processing power are limited.

    Large Language Models (LLMs): Significantly reducing the memory footprint of these models, allowing deployment on consumer GPUs.

    Computer Vision: Tasks like object classification, face recognition, and segmentation, where acceptable accuracy can be achieved with reduced precision.

    Speech and Natural Language Processing: Applications like keyword recognition and transformer-based models.

 

Conclusion

4-bit floating point inference is a rapidly evolving area of AI research and development. While it presents some challenges, especially in managing accuracy trade-offs, its potential to enable faster, more efficient, and more widely deployable AI models across various platforms makes it a highly promising and impactful technology.
