Microsoft Unveils Fara-7B, a 7B-Parameter Model for Computer Task Automation

Microsoft has introduced Fara-7B, a 7B-parameter small language model tailored for automating computer tasks. Built on a multimodal decoder architecture, Fara-7B takes screenshot images and textual context as input and predicts both a chain of thought and the next action to take. The model, based on Qwen 2.5-VL (7B), supports a 128k context length and was trained on 64 H100 GPUs over 2.5 days. Released under the MIT license, it can carry out tasks such as booking restaurants and planning trips by interpreting browser inputs and predicting the corresponding actions.

Fara-7B also employs safety measures, including post-training mitigations and critical-point recognition, which halt the agent at sensitive steps, such as when entering personal data, to avoid policy violations. The model is available for deployment via GitHub, vLLM, and the fara-cli tool, enabling automation of web-based tasks.
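The screenshot-in, action-out loop described above can be sketched as follows. This is an illustrative outline only: the action schema, field names, and `predict` callable here are hypothetical assumptions, not Fara-7B's actual API.

```python
import json

# Hypothetical action schema (not Fara-7B's real format): the model
# returns a thought plus one action, e.g.
# {"thought": "...", "action": {"type": "click", "x": 120, "y": 48}}.
def parse_action(model_output: str) -> dict:
    """Parse the model's JSON response into a thought and an action."""
    data = json.loads(model_output)
    return {"thought": data.get("thought", ""), "action": data["action"]}

def agent_step(task: str, screenshot_png: bytes, predict) -> dict:
    """One perception-action step: send the task text and a screenshot
    to the model (`predict` is any callable taking task and image) and
    parse the predicted action. Fara-7B couples screenshot and text
    context in a single multimodal call."""
    raw = predict(task=task, image=screenshot_png)
    return parse_action(raw)

# Usage with a stubbed model call standing in for the real model:
stub = lambda task, image: json.dumps(
    {"thought": "The search box is focused; type the query.",
     "action": {"type": "type", "text": "restaurants near me"}})
step = agent_step("Book a table for two", b"...png bytes...", stub)
print(step["action"]["type"])  # type
```

A real deployment would replace the stub with a call to the served model (e.g. via vLLM) and an executor that applies each predicted action to the browser before capturing the next screenshot.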
