Microsoft has introduced Fara-7B, a 7-billion-parameter small language model built for automating computer tasks. Based on Qwen 2.5-VL (7B) with a multimodal decoder architecture and a 128k context length, it takes browser screenshots and textual context as input and predicts a chain of thought followed by the next action. The model was trained on 64 H100 GPUs over 2.5 days and is released under the MIT license.

Fara-7B can carry out tasks such as booking a restaurant or planning a trip by interpreting browser inputs and emitting actions step by step. Its safety measures, applied during post-training, include avoiding policy violations and halting at critical points, such as before entering personal data, so that a user can intervene. The model is available for deployment via GitHub, can be served with vLLM, and can be driven through the fara-cli tool, making web-based task automation straightforward to set up.
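The perceive-predict-act loop described above can be sketched as follows. This is a minimal illustration, not Fara-7B's actual interface: `predict_action`, the action schema, and the critical-point list are all hypothetical stand-ins (a real deployment would call the model, e.g. through a vLLM server, and parse its output).

```python
import json
from dataclasses import dataclass


@dataclass
class Observation:
    screenshot_png: bytes  # current browser screenshot
    context: str           # task description plus action history


def predict_action(obs: Observation) -> dict:
    # Hypothetical stand-in for a Fara-7B inference call. A real call
    # would send the screenshot and textual context to the model and
    # parse the returned thought chain and action.
    return {
        "thought": "The search box is visible; type the query next.",
        "action": {"type": "type_text", "text": "restaurants near me"},
    }


def is_critical(action: dict) -> bool:
    # Illustrative critical-point check: halt before sensitive steps
    # such as entering personal data, mirroring the safety behavior
    # described in the article. The action names here are invented.
    return action["type"] in {"enter_personal_data", "submit_payment"}


def step(obs: Observation) -> dict:
    pred = predict_action(obs)
    if is_critical(pred["action"]):
        raise RuntimeError("Critical point reached: user confirmation required")
    return pred["action"]


obs = Observation(screenshot_png=b"", context="Book a table for two")
print(json.dumps(step(obs)))
```

In a real agent, this loop repeats: each executed action changes the browser state, a new screenshot is captured, and the updated context is fed back to the model until the task completes or a critical point halts execution.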