
Friday, April 24, 2026

DeepSeek’s V4 Sequel Set to Extend China’s Reach in Open-Source A.I.


Will a Mac Mini be enough for DeepSeek V4 to learn from videos of my actions controlling events on my iPhone screen?

 

Yes, a modern Mac Mini (M4 or M4 Pro with 64GB+ RAM) is capable of running and learning from video data using the DeepSeek V4 series (specifically the 284B Flash model) locally, but it requires specific configurations, such as 4-bit quantization, to function within its RAM constraints.

DeepSeek V4 models can process multi-modal input (text, image, and video), with the "Flash" variant (284B parameters) designed to be efficient enough for smaller-than-datacenter hardware.

 

Key Considerations for This Use Case

 

    Model Selection: You should use DeepSeek-V4-Flash (284B parameters, 13B activated) rather than the V4 Pro (1.6T parameters). The Pro model is too large for a single machine, while the Flash variant can run on high-RAM Apple Silicon with proper quantization.

    Hardware Requirements (Mac Mini):

        Unified Memory (RAM): A minimum of 64GB is required for 4-bit quantized (Q4) models. 128GB is strongly preferred for smoother processing of long video contexts (up to 1M tokens).

        Processor: Apple Silicon M4 or M4 Pro chip.

    Performance: Local execution on a Mac Mini with 64GB RAM can provide "usable" inference speeds (estimated ~10-15 tokens/sec), enabling AI-based visual analysis of your iPhone screen, though it will not match data-center speeds.

    Software Stack: You will likely need to use tools like Ollama or LM Studio to run GGUF-quantized versions of the model, which are optimized for Apple Silicon (Metal Performance Shaders).
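As a rough sanity check on the 1M-token context figure above, here is some back-of-the-envelope arithmetic; the tokens-per-frame cost and sampling rate are assumptions for illustration, not published DeepSeek numbers:

```python
# Back-of-the-envelope: how much screen video fits in a 1M-token context.
CONTEXT_TOKENS = 1_000_000
TOKENS_PER_FRAME = 250   # assumed cost of one encoded screen frame
FRAMES_PER_SECOND = 1    # assumed sampling rate: one frame per second

tokens_per_second = TOKENS_PER_FRAME * FRAMES_PER_SECOND
max_seconds = CONTEXT_TOKENS // tokens_per_second
print(f"~{max_seconds // 60} minutes of video per prompt")  # ~66 minutes
```

Even under these generous assumptions, a single prompt holds roughly an hour of screen recording, so longer sessions would need to be chunked.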

 

Limitations

 

    Learning Capability: While the model can "see" and analyze video frames of your actions, "learning" (retraining the model to adapt specifically to your actions, i.e., fine-tuning) is far more computationally demanding than simply running the model (inference). Full local fine-tuning of a model this size on a single Mac Mini is impractical; instead, either provide the video as context in the prompt ("inference with context") or use LoRA (Low-Rank Adaptation) techniques to make the fine-tuning process feasible on that hardware.

 

How to use LoRA (Low-Rank Adaptation) techniques to make the fine-tuning process feasible on Mac Mini?

Using LoRA (Low-Rank Adaptation) on a Mac Mini (Apple Silicon) is highly feasible: it can cut fine-tuning memory requirements by up to 65%. By freezing the base model and training only small, low-rank decomposition matrices (A and B), you can tune 7B+ parameter models within limited unified memory. MLX, Apple's machine learning framework, is the best tool for this, often paired with 4-bit quantization of the base model (QLoRA) for maximum efficiency.
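The core LoRA idea can be sketched in a few lines of NumPy; the dimensions, rank, and scaling factor here are illustrative, and real implementations such as MLX or Hugging Face PEFT apply this inside each transformer layer:

```python
import numpy as np

# Minimal illustration of LoRA: the base weight W stays frozen; only the
# small matrices A (r x d_in) and B (d_out x r) are trained.
d_in, d_out, r = 512, 512, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, low-rank
B = np.zeros((d_out, r))                    # trainable, initialized to zero

alpha = 16
W_effective = W + (alpha / r) * (B @ A)     # merged weight used at inference

full_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / full_params:.2%}")
```

Because B starts at zero, the model's behavior is unchanged at the beginning of training, and only about 3% of this layer's parameters ever receive gradients.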

 

Steps for LoRA Fine-Tuning on Mac Mini

 

    Environment Setup: Install the necessary libraries, ensuring you have the MLX framework installed for Apple Silicon optimization.

    Dataset Preparation: Prepare your training data in JSONL format, typically as prompt/response pairs.


    Configure LoRA: Apply LoRA to all linear layers (not just query/value) for the best performance. Set a low rank (e.g., r = 4, 8, or 16) to minimize trainable parameters.

    Run Training with MLX: Utilize MLX's native support for LoRA/QLoRA to train on your Mac’s unified memory.

    Merge and Export: After training, merge the LoRA adapters with the base model to use it for inference.
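The dataset-preparation step can be sketched as follows. The {"prompt": ..., "completion": ...} shape is one common JSONL layout; verify against your trainer's documentation, since some tools expect a {"text": ...} or chat-messages format instead:

```python
import json
import os
import tempfile

# Write a tiny JSONL training set for screen-action descriptions.
# The field names are an assumed layout, not a fixed standard.
examples = [
    {"prompt": "User tapped the Settings icon. What happens next?",
     "completion": "The Settings app opens on the main preferences list."},
    {"prompt": "User swiped up from the bottom edge.",
     "completion": "The app closes and the home screen appears."},
]

data_dir = tempfile.mkdtemp()
train_path = os.path.join(data_dir, "train.jsonl")
with open(train_path, "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity check: every line must parse back as JSON with both keys.
with open(train_path) as f:
    rows = [json.loads(line) for line in f]
print(len(rows), "examples written to", train_path)
```

Keeping one JSON object per line (rather than one big JSON array) is what lets training tools stream the file without loading it all into memory.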

 

Tips for Mac Mini Efficiency

 

    Use QLoRA: Apply 4-bit quantization to the base model via MLX to significantly reduce the memory footprint.

    Adjust Rank (r): Lower r values (e.g., 4 or 8) mean fewer parameters and less memory.

    All-Layer Adaptation: For best results, apply LoRA to all linear transformer block layers.

    Batch Size & Dropout: Set small batch sizes and increase lora_dropout (e.g., 0.1) to improve regularization.

    Check Resources: This article from Medium offers a detailed guide to implementing these steps.
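To see why a low rank keeps the footprint small, here is the trainable-parameter count as a function of r; the layer width and layer count are illustrative placeholders, not DeepSeek's actual dimensions:

```python
# For a d_out x d_in linear layer, LoRA adds r * (d_in + d_out) trainable
# parameters. Sizes below are assumed for illustration only.
d_in = d_out = 4096
n_layers = 32  # assumed number of adapted linear layers

def lora_params(r):
    return n_layers * r * (d_in + d_out)

for r in (4, 8, 16):
    print(f"r={r}: {lora_params(r):,} trainable parameters")
```

Doubling r doubles the trainable parameters (and the optimizer state that must fit in memory), which is why r = 4 or 8 is the usual starting point on constrained hardware.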


 

Note: For detailed practical tips and a deeper understanding of LoRA, you can read this article by Sebastian Raschka.


 

    Video Processing: Processing long videos requires high memory bandwidth to prevent bottlenecks.

 

Verdict: A 64GB-128GB M4 Mac Mini is an excellent choice for a private, local setup to analyze and learn from your screen action videos, provided you use the smaller V4 Flash model with 4-bit quantization.

 

How to do 4-bit quantization of the DeepSeek smaller V4 Flash model?

 

4-bit quantization of the DeepSeek-V4-Flash model (284B parameters) allows it to run on local hardware by shrinking its memory footprint from FP16 to a manageable size, typically using the NF4 (Normalized Float 4) or INT4 scheme. Given its 284B-parameter scale, this is best achieved with advanced loading techniques or specialized quantization toolkits such as bitsandbytes.

 

Here is the approach to doing 4-bit quantization on the DeepSeek-V4-Flash model:

 

1. Requirements for Quantization

  • VRAM/Memory: Even in 4-bit, a 284B model requires significant memory (~140-150 GB for the weights alone). A Mac Studio (192GB-256GB RAM) or multiple high-end GPUs (e.g., 2× H100 NVL with 94GB each) are recommended.
  • Libraries: transformers, accelerate, bitsandbytes, torch.
  • Checkpoint: DeepSeek-V4-Flash (standard Hugging Face format). 
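The memory figure above follows directly from the parameter count; a quick estimate of the weights-only footprint at each precision (KV cache and activations come on top):

```python
# Rough weight-memory estimate for a 284B-parameter model.
PARAMS = 284e9

def weights_gb(bits_per_param):
    """Gigabytes needed to store the weights at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

for label, bits in [("FP16", 16), ("INT8", 8), ("4-bit (NF4/INT4)", 4)]:
    print(f"{label}: ~{weights_gb(bits):.0f} GB")
```

At 4 bits per weight this lands at ~142 GB, consistent with the ~140-150 GB figure above; FP16 would need roughly four times that.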

2. Method A: 4-bit Loading with bitsandbytes (Easy Mode) 

 

You can load the model in 4-bit directly using Hugging Face's transformers library, which automatically applies bitsandbytes 4-bit quantization (NF4) during loading. 

 

python

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Configure 4-bit NF4 quantization and load the model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V4-Flash",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # required for the model's custom architecture
)

 

3. Method B: Using llmcompressor (For Specialized Formats)

For post-training quantization (GPTQ) to produce weights for engines like vLLM, llmcompressor can be used.

python

from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# Load the model, apply one-shot 4-bit GPTQ quantization (W4A16), and save
model = SparseAutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V4-Flash",
    device_map="auto",
)
recipe = GPTQModifier(targets="Linear", scheme="W4A16")
oneshot(model=model, recipe=recipe)
model.save_pretrained("DeepSeek-V4-Flash-4bit")

 

4. Important Considerations for V4-Flash

  • Performance: While DeepSeek-V4-Flash uses native mixed-precision, 4-bit quantization further reduces memory usage at a potential cost to accuracy.
  • Memory Management: Utilizing device_map="auto" is essential for splitting the 284B model across multiple GPUs.
  • Pre-quantized Models: Check the deepseek-ai Hugging Face repository for pre-quantized GGUF versions to avoid high local computational costs.

 

No wonder newspapers are excited by this achievement:

 

“When the Chinese start-up DeepSeek published details about one of its artificial intelligence models last year, it sent shock waves through the tech industry.

 

The company said it had built its system by spending far less on computer chips than American rivals like OpenAI and Anthropic. It marked the start of what became known as China’s “DeepSeek moment,” shorthand for the belief that Chinese A.I. companies were ready to showcase their technical capabilities to the world.

 

The DeepSeek moment reflected a shift in the global A.I. landscape. The change was not only about lower costs, but also openness in how the technology is shared.

 

DeepSeek released its models as open source, which means others can freely use and modify them. By contrast, OpenAI and Anthropic kept their leading models proprietary. The episode demonstrated that an open-source system could perform almost as well as closed versions. In the months that followed, Chinese firms released dozens of other open-source models. By the end of 2025, these models made up a significant share of global A.I. usage.

 

On Friday, DeepSeek released a preview of V4, its long-awaited follow-up model, which it intends to open source. The new model excels at writing computer code, an increasingly important skill for leading A.I. systems. It significantly outperformed every other open-source system at generating code, according to tests from Vals AI, a company that tracks the performance of A.I. technologies.

 

DeepSeek released its new model just days after Moonshot AI, another Chinese start-up, introduced its latest open-source model, Kimi 2.6. While these systems trail the coding capabilities of the leading U.S. models from Anthropic and OpenAI, the gap is narrowing.

 

The implications are meaningful. Using A.I. to write code is faster and frees up human programmers to focus on bigger issues. It also means people can use DeepSeek’s latest release to power A.I. agents, which are personal digital assistants that can use other software applications on behalf of office workers, including spreadsheets, online calendars and email services.

 

As A.I. systems improve at writing computer code, they are also getting better at finding security vulnerabilities in software — a skill that is fundamentally changing cybersecurity. That means tools like DeepSeek’s can be used to both attack and defend computer networks.

 

Across tasks, DeepSeek V4 is on par with Moonshot’s latest model. “They are basically neck-and-neck,” said Rayan Krishnan, the chief executive of Vals AI.

 

In the months leading up to DeepSeek’s latest release, foreign rivals moved to pre-empt another round of glowing headlines. Silicon Valley’s A.I. giants, Anthropic and OpenAI, said DeepSeek had unfairly piggybacked on their technology through distillation, a process in which engineers mimic a rival model by querying it millions of times and copying its behavior.

 

The competition to build the best-performing A.I. systems has transformed into a geopolitical power struggle. While Silicon Valley leaders at Anthropic and OpenAI warn that their technology would be dangerous in the hands of autocratic countries, China has invested billions to become an A.I. superpower, viewing the technology as a critical engine of economic growth.

 

DeepSeek’s open-source models are central to this strategy. While many Western companies guard their most valuable models, China has embraced open source and almost all of its top-performing systems are widely available.

 

Even so, Chinese A.I. firms face major hurdles. Three U.S. administrations have imposed export controls limiting access to advanced chips needed for cutting-edge A.I. systems. And firms in Silicon Valley continue to outspend Chinese rivals in the race for top A.I. talent.

 

China’s push into open-source A.I. has become a major economic advantage at home, according to a new study by a U.S. congressional advisory body. With few barriers to use, the systems have spread across industries such as robotics, logistics and manufacturing. The study found that these industrial applications generate real-world data that are used to improve A.I. systems.

 

This approach has allowed Chinese tech firms to capture global influence, as programmers and engineers around the world adopt their systems to build new products.

 

From Lagos to Kuala Lumpur, developers on tight budgets are turning to Chinese open-source models because they are cheaper to run and therefore easier to experiment with. Last May, Malaysia’s deputy minister of communications said the country’s sovereign A.I. infrastructure would be built on DeepSeek’s technology.

 

Chinese open-source models accounted for roughly one-third of global A.I. usage last year, according to a study by OpenRouter, an A.I. model marketplace. DeepSeek was the most widely used, followed by models from Alibaba, the Chinese internet company.

 

That reflects a broader strategy. As Chinese companies expand abroad, making their systems open-source helps them gain traction with coders by offering cheaper, more accessible tools.

 

“Open source is the soft power of technology of the future,” said Kevin Xu, the U.S.-based founder of Interconnected Capital, a hedge fund that invests in artificial intelligence technologies. Mr. Xu and his fund do not invest in DeepSeek.

 

Wei Sun, a principal analyst in A.I. at Counterpoint Research in Beijing, said DeepSeek’s success paved the way for China’s tech giants to release their A.I. systems publicly rather than closely guarding them.

 

Alibaba has since emerged as a leader. Its Qwen model family has surpassed 1 billion downloads. ByteDance, parent company of TikTok, has also shared some details of its technology after spending $11 billion on A.I. infrastructure in 2024.

 

“The A.I. generation of open-source builders from China was arguably the biggest A.I. story in 2025,” Mr. Xu said. “The progress of the models, the pace of the releases, and the number of A.I. labs that both compete with each other but also seem to cheer each other on came fast and furious with no signs of slowing down.”” [1]

 

1. Tobin, Meaghan; Metz, Cade. “DeepSeek’s Sequel Set to Extend China’s Reach in Open-Source A.I.” New York Times (Online), New York Times Company, Apr. 24, 2026.
