llama.cpp now generates tokens roughly 78% faster for local inference when using MTP (multi-token prediction), a speculative decoding method. The figure comes from a tweet by victormustar, who reported that generation speed for the dense Qwen3.6-27B model rose from about 25 to about 45 tokens per second on an A10G GPU. The speedup was obtained by running llama-server with the flags --spec-type draft-mtp and --spec-draft-n-max 2. The result was shared in a personal tweet, not an official announcement, and reflects a single model on a single GPU.
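
As a quick sanity check on the numbers, the relative gain can be recomputed from the two throughput figures. The sketch below is illustrative only: the helper name speedup_percent is hypothetical, and the inputs are the rounded values of 25 and 45 tokens per second. Those rounded values give 80%, slightly above the quoted 78%, which suggests (as an assumption, not something the tweet confirms) that the underlying measurements were not round numbers.

```python
def speedup_percent(before_tps: float, after_tps: float) -> float:
    """Relative throughput gain in percent (hypothetical helper, for illustration)."""
    if before_tps <= 0:
        raise ValueError("before_tps must be positive")
    return (after_tps / before_tps - 1.0) * 100.0


# Rounded throughput figures from the tweet, in tokens per second.
baseline_tps = 25.0  # without MTP speculative decoding
mtp_tps = 45.0       # with --spec-type draft-mtp --spec-draft-n-max 2

gain = speedup_percent(baseline_tps, mtp_tps)
print(f"{baseline_tps:.0f} -> {mtp_tps:.0f} tok/s is a {gain:.0f}% gain")
```

The same helper can be applied to any before/after pair measured locally, which is a more reliable guide than a single reported benchmark.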