Qwen 3.6 27B Model Achieves 40 Tokens/s on RTX 3090

The Qwen 3.6 27B dense model reached a generation speed of 40 tokens per second on an RTX 3090 GPU with 24 GB of memory, according to preliminary tests by user @sudoingX. The tests used plain Q4_K_M quantization via llama.cpp, with no fused kernels or further optimization techniques, and the model passed all 10 of 10 tests. A particle swarm benchmark was also developed to evaluate the model's performance.
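For context, the reported figures can be sanity-checked with some rough arithmetic. The sketch below is not taken from the original tests. It assumes an average of about 4.85 bits per weight for Q4_K_M (an approximate figure that varies by model) and treats the card's memory as a nominal 24 GB. It ignores the KV cache, activations, and runtime overhead, so the headroom it prints is an upper bound. The function names are hypothetical.

```python
# Back-of-the-envelope check of the reported setup.
# ASSUMED_BPW is an assumption (approximate average for Q4_K_M), not a
# number from the original tests; real GGUF file sizes vary by model.

N_PARAMS = 27e9           # 27B parameters, from the article
ASSUMED_BPW = 4.85        # assumed average bits per weight for Q4_K_M
VRAM_GB = 24.0            # nominal RTX 3090 memory, from the article
TOKENS_PER_SECOND = 40.0  # reported generation speed, from the article


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return n_params * bits_per_weight / 8 / 1e9


def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a constant decoding speed."""
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    weights = weight_memory_gb(N_PARAMS, ASSUMED_BPW)
    print(f"Estimated weight memory: {weights:.1f} GB")
    print(f"Headroom before KV cache and overhead: {VRAM_GB - weights:.1f} GB")
    print(f"Time for a 1,000-token reply: "
          f"{generation_seconds(1000, TOKENS_PER_SECOND):.0f} s")
```

Under these assumptions the weights alone take roughly 16 GB, which is consistent with the model fitting on a single 24 GB card, and a 1,000-token reply at the reported speed takes about 25 seconds.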


Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.