Yes, Moonshot AI's latest Kimi K2.5 model (released in early 2026) is designed to analyze screenshots and screen recordings to perform tasks. It is specifically built to understand visual layouts, user interfaces, and interactions, allowing it to translate what it "sees" into actions, code, or data analysis.
Capabilities
Visual Analysis: The model can interpret images and videos, including screenshots and screen recordings, and identify UI elements, layouts, and interactions in them.
Action Performance: Kimi K2.5 can act as an agent that carries out tasks on its own, for example generating code from a screenshot of a website layout or performing "visual debugging" by inspecting its own output.
Agent Swarm: It can coordinate up to 100 AI agents to work in parallel on complex, long-horizon tasks.
Cost
API Usage: Kimi K2.5 pricing is approximately $0.60 per 1M input tokens and $2.50 per 1M output tokens.
Subscription: Agent swarm capabilities for teams or heavy users require a paid subscription; reported prices range from about $31 to $159 per month.
Free Access: A free, usage-limited version is available on the website.
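Because API pricing is linear in token counts, the cost of a job is roughly input tokens × $0.60/1M plus output tokens × $2.50/1M. The minimal Python sketch below assumes the approximate rates quoted above (actual rates may change, and any other pricing tiers are not modeled). How many tokens a screenshot consumes is not stated here, so the 200k-input / 50k-output workload is a hypothetical placeholder.

```python
# Assumption: approximate per-million-token rates quoted in this post (USD).
# Check current pricing before relying on these numbers.
INPUT_PRICE_PER_M = 0.60
OUTPUT_PRICE_PER_M = 2.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated API cost in USD for a given token usage."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens (screenshots plus prompts)
    # and 50k output tokens (generated code).
    cost = estimate_cost(200_000, 50_000)
    print(f"${cost:.4f}")  # prints $0.2450
```

Output tokens cost roughly four times as much as input tokens at these rates, so code-heavy responses dominate the bill faster than large screenshot inputs do.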
Key Considerations
Target Audience: The visual-to-code and screen analysis features are heavily integrated into Kimi Code, a tool aimed at developers for building front-end interfaces, websites, and managing workflows.
Limitations: Although capable, the technology is geared toward generating and analyzing web interfaces and code from visual input; it is not a direct replacement for human input across all mobile touchscreen applications.