Friday, June 26, 2026

Anthropic Veterans at Mirendil and China's DeepSeek Are Trying to Help Scientists' AI Efforts


What is the closest Chinese competitor of Mirendil?

 

DeepSeek AI is widely considered the closest Chinese equivalent to Mirendil.

 

Both are recognized for their intense focus on autonomous AI research platforms (AI that builds AI) and state-of-the-art reasoning models trained under heavy compute constraints.

 

           Mirendil (US): A San Francisco-based "neo-lab" founded by ex-Anthropic and xAI researchers. Backed by a16z, they focus on self-accelerating AI R&D and autonomous modeling for scientific and medical research.

           DeepSeek (China): A Hangzhou-based infrastructure and foundation model lab. Like Mirendil, it is a pioneer in AI reasoning and focuses on optimizing models (such as DeepSeek-V4/R1) for high-performance deployment with fewer resources.

 

How did Mirendil come into existence?


“AI that can make smarter AI: That's the central bet of big artificial-intelligence labs to accelerate their own capabilities. But those labs restrict outsiders from using their tools to do the same thing.

 

Now two veterans of those big labs are wagering they can outflank their former employers by making and distributing AI that accelerates AI research for everyone.

 

Their startup Mirendil said on Wednesday that it raised $200 million in seed funding at a $1 billion valuation from venture firms Andreessen Horowitz and Kleiner Perkins and Nvidia, one of the larger seed valuations for a new AI company in recent years.

 

Mirendil's co-founders, Behnam Neyshabur and Harsh Mehta, say their hope is that building self-improving AI can help open-source AI developers keep pace with frontier labs. They say that accelerating AI research will allow scientists to develop their own specialized in-house AI models in fields like medicine or materials.

 

"What we are doing is kind of AI for AI for science, as opposed to AI for science," said Neyshabur, Mirendil's chief executive. He cited creating a model that can predict a person's risk of developing Alzheimer's disease as one way a customer might use Mirendil's future tools.

 

The two co-founders met in 2019 while working at Google. Mehta sent a cold email to Neyshabur, who just joined the company and was known for his work on understanding why AI models work. "He was kind of like a mini celebrity in the field already," Mehta said.

 

They had long been excited about the future possibilities of using AI to accelerate science, but "back then, the models were really bad," Mehta said.

 

They both moved to Anthropic in late 2024 and left in December 2025, shortly after the launch of the Claude Opus 4.5 model, which dramatically expanded the capabilities of so-called agents to perform complex tasks.

 

Mirendil's fundraising comes as top AI labs are themselves increasingly using AI to accelerate AI research. As of May, Anthropic said its Claude model wrote more than 80% of Anthropic's code.

 

But the company discourages other AI developers from using Claude to accelerate their own high-end AI development. Anthropic's terms of service prohibit using its tools to develop "any products or services that compete with our Services."

 

In a statement, Anthropic said its policies were standard among major model providers and help prevent foreign adversaries from eroding the U.S. lead in frontier AI.

 

When Anthropic recently released Fable 5, a safety-constrained version of its powerful Mythos model, it degraded responses to some questions about AI development without notifying users, a practice some critics called anticompetitive. The company said it subsequently made those safeguards visible to users. Shortly afterward, the company decided to suspend access to Fable 5 and Mythos 5 indefinitely after the Trump administration imposed export controls.

 

Anthropic has pointed to using AI to help build more advanced AI -- sometimes called recursive self-improvement -- as a potential danger. Some AI-safety researchers believe the ability of models to rewrite their own code without human oversight could lead to a scenario in which AI capabilities grow rapidly beyond human control. But Mehta and Neyshabur see recursive self-improvement as the "shortest path" to accelerating science, and believe it will be possible to safely supervise it. "I don't buy it when people just say, 'oh, this is not possible,'" Neyshabur said. "It's just a difficult problem."

 

Andreessen Horowitz investor Matt Bornstein said leading labs are simply being "rational economic actors" in denying customers the ability to supercharge their own models, creating a need for a startup like Mirendil. "Structurally, there has to be an independent company," Bornstein said.

 

The Information earlier reported some details of the funding.

 

Mirendil currently has about 20 technical staff, operating out of an office in downtown San Francisco. The founding team also includes Shayan Salehian, an early member of xAI, and Tara Rezaei, an MIT graduate.

 

It joins the legion of startups with a name referencing The Lord of the Rings -- in Elvish, Mirendil roughly means "friend of precious things." The startup plans in coming months to release a model and a product to get feedback from users.” [1]

 

How does Mirendil test whether a next-generation AI model is better than the previous one?

 

Mirendil tests whether a new AI model is better than its predecessor through recursive self-improvement loops. Instead of relying only on traditional static benchmarks, the company measures whether the new model can independently improve the R&D process: generating, debugging, and evaluating its own neural network architectures to advance machine learning faster.

The startup’s evaluation methodology focuses on how well the AI runs this entire research loop:

           Autonomous Experimentation: Testing whether the new model can independently design and conduct tests, evaluate the results, and iterate on complex scientific problems.

           Model Debugging & Coding: Assessing the model's capacity to identify and fix errors in its own programming and data preparation phases.

           Architecture Iteration: Using reinforcement learning sandboxes where neural networks interact to refine their skills and architecture.
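As a rough illustration of the idea of scoring a model by how well it runs a research loop, here is a minimal Python sketch. It is not Mirendil's method, which has not been published. Everything in it is a hypothetical stand-in: `run_experiment` is a toy objective, `old_model` and `new_model` are stub "researchers" that only propose a number, and `win_rate` is an assumed way to compare them under the same experiment budget.

```python
import random
from typing import Callable, Sequence

# A "researcher" proposes the next candidate given the best one found so far.
# Both researchers below are toy stubs, not real AI models.
Researcher = Callable[[float, random.Random], float]


def run_experiment(x: float) -> float:
    """Toy stand-in for one experiment's result; higher is better, best at x = 3."""
    return -(x - 3.0) ** 2


def run_research_loop(researcher: Researcher, budget: int, seed: int) -> float:
    """Propose, evaluate, keep improvements; return the best score found."""
    rng = random.Random(seed)
    best_x = 0.0
    best_score = run_experiment(best_x)
    for _ in range(budget):
        candidate = researcher(best_x, rng)
        score = run_experiment(candidate)
        if score > best_score:  # keep a candidate only if it improves the result
            best_x, best_score = candidate, score
    return best_score


def old_model(best_x: float, rng: random.Random) -> float:
    """Hypothetical previous generation: explores in very small steps."""
    return best_x + rng.uniform(-0.02, 0.02)


def new_model(best_x: float, rng: random.Random) -> float:
    """Hypothetical next generation: explores in larger steps."""
    return best_x + rng.uniform(-1.0, 1.0)


def win_rate(new: Researcher, old: Researcher, budget: int, seeds: Sequence[int]) -> float:
    """Fraction of seeds on which `new` ends with a better score than `old`."""
    wins = sum(
        run_research_loop(new, budget, s) > run_research_loop(old, budget, s)
        for s in seeds
    )
    return wins / len(seeds)


if __name__ == "__main__":
    print("new-vs-old win rate:", win_rate(new_model, old_model, 50, range(10)))
```

The point of the sketch is the shape of the comparison: both generations get the same number of experiments, and the better one is the one that ends up with better research results, not the one with the higher score on a fixed quiz.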

For more context on how this ex-Anthropic startup is approaching R&D, you can read the Andreessen Horowitz Mirendil Announcement.

 

More generally, next-generation AI models are usually compared with previous versions on agentic task completion (how well the AI functions autonomously) and practical utility (real-world work), rather than only on aggregate scores from static benchmarks.


Leading AI developers use several distinct testing methods to show that a newer model is better:

           Task-Based & Agentic Benchmarks: Developers use multi-step frameworks like SWE-bench or GAIA2 to see if the new model can successfully solve real-world engineering and research problems without hallucinating.

           "LLM-as-a-Judge" Evals: A newer model's outputs are often scored against an older model's output by deploying a third-party, specialized "judge" model (e.g., using a model like Prometheus) to determine quality, reasoning, and instruction-following.

           Long-Context Stress Tests: New models are tested for "context rot" by embedding specific facts at various depths in massive, 1-million+ token documents to ensure the model accurately retrieves the data without losing focus.

           Real-World Sandbox Simulations: Advanced developers put autonomous models into virtual simulated societies with real-time news and variables to test reasoning, safety, and goal-completion over extended sessions.
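Two of the checks in the list above, the long-context test and the LLM-as-a-judge comparison, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not any lab's actual harness: `old_model`, `new_model`, and `stub_judge` are hypothetical stubs standing in for real model calls, the filler text and the access code are made up, and the documents are a few thousand characters long rather than a million tokens.

```python
from typing import Callable, List

Model = Callable[[str], str]            # prompt -> answer
Judge = Callable[[str, str, str], str]  # (prompt, answer_a, answer_b) -> "A" or "B"

NEEDLE = "The access code is 7421."     # made-up fact hidden in the document
FILLER = "The sky was grey that day."   # made-up padding sentence


def build_prompt(depth: float, n_sentences: int = 200) -> str:
    """Hide NEEDLE at a relative depth (0.0 = start, 1.0 = end) and ask for it."""
    sentences = [FILLER] * n_sentences
    sentences.insert(round(depth * n_sentences), NEEDLE)
    return " ".join(sentences) + "\nQuestion: What is the access code?"


def retrieval_accuracy(model: Model, prompts: List[str]) -> float:
    """Share of prompts for which the answer contains the hidden code."""
    return sum("7421" in model(p) for p in prompts) / len(prompts)


def pairwise_win_rate(judge: Judge, new: Model, old: Model, prompts: List[str]) -> float:
    """Share of judge votes for `new`; each pair is judged in both orders
    so that a judge that favors the first answer cannot decide the result."""
    wins = 0
    for p in prompts:
        a_new, a_old = new(p), old(p)
        wins += judge(p, a_new, a_old) == "A"
        wins += judge(p, a_old, a_new) == "B"
    return wins / (2 * len(prompts))


def old_model(prompt: str) -> str:
    """Hypothetical stub: only 'remembers' the last 2,000 characters."""
    return "7421" if "7421" in prompt[-2000:] else "unknown"


def new_model(prompt: str) -> str:
    """Hypothetical stub: reads the whole prompt."""
    return "7421" if "7421" in prompt else "unknown"


def stub_judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Hypothetical judge: prefers an answer with the code, picks "A" on a tie."""
    if "7421" in answer_b and "7421" not in answer_a:
        return "B"
    return "A"


if __name__ == "__main__":
    prompts = [build_prompt(d) for d in (0.0, 0.25, 0.5, 0.75, 1.0)]
    print("old retrieval:", retrieval_accuracy(old_model, prompts))
    print("new retrieval:", retrieval_accuracy(new_model, prompts))
    print("judge win rate:", pairwise_win_rate(stub_judge, new_model, old_model, prompts))
```

Judging each pair in both orders is a design choice worth keeping in a real setup: when the two answers are equally good, the new model then gets half a win instead of a full one.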


1. Li, Tina. Anthropic Veterans Seek to Help AI Efforts of Scientists. Wall Street Journal, Eastern edition; New York, N.Y. 25 June 2026: B4.
