Perceptron AI has unveiled its flagship multimodal model, Mk1, designed for video understanding and embodied reasoning. Founded by former Meta FAIR researchers Armen Aghajanyan and Akshat Shrivastava, the 14-member team aims to compete with industry giants like Google and OpenAI by offering Mk1 at a lower cost. The model excels in video temporal reasoning, capable of generating structured timeline analyses and detecting specific events within videos.
Mk1's capabilities extend to image processing, supporting pixel-level pointing, dense object counting, and complex OCR. It can convert documents into HTML, JSON, or Markdown, making it suitable for industrial applications such as dashboard digitization. For robotics, Mk1 outputs spatial primitives for policy models and can annotate teleoperated video recordings, reducing the need for manual annotation. The model is available through the Perceptron API and OpenRouter.
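Since Mk1 is listed on OpenRouter, it should be reachable through OpenRouter's standard OpenAI-compatible chat-completions endpoint. The sketch below is illustrative only: the model slug `perceptron/mk1` is an assumption (check OpenRouter's model list for the actual identifier), and the prompt mirrors the dashboard-digitization use case described above.

```python
import json
import os
import urllib.request

# ASSUMPTION: hypothetical model slug -- verify against OpenRouter's model list.
MODEL = "perceptron/mk1"

def build_request(prompt: str, image_url: str) -> dict:
    """Build an OpenAI-compatible chat payload with an image attachment."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_request(
    "Convert this dashboard photo into JSON key/value pairs.",
    "https://example.com/dashboard.jpg",  # placeholder image URL
)

api_key = os.environ.get("OPENROUTER_API_KEY")
if api_key:  # only send the request when a key is configured
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

The same payload shape would work against the Perceptron API if it likewise exposes an OpenAI-compatible endpoint, though that is not confirmed by the article.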
Perceptron AI Launches Mk1 Model, Challenging Google and OpenAI
Disclaimer: Content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of information sourced from third-party articles. The content on this page does not constitute financial or investment advice. Always do your own research and consult a qualified financial professional before making any investment decisions.
