Alibaba's Qwen team has introduced Qwen-Image-Bench, an open-source benchmark for evaluating the text-to-image capabilities of large models. Accompanying the release is Q-Judger, a visual judge model built on Qwen3.6-27B, which assesses models across five dimensions: image quality, aesthetics, text-image alignment, real-world fidelity, and creative generation. The benchmark comprises 1,000 bilingual prompts and scores models on 56 fine-grained metrics. In initial evaluations, GPT Image 2 leads with a composite score of 64.69 and excels in all five dimensions. Other top performers include Nano Banana 2.0 and GPT Image 1.5, while Alibaba's own Qwen Image 2.0 Pro ranks fifth. The evaluation also highlights challenges that remain common in AI image generation, such as rendering anatomically correct human hands and depicting scenes that obey physical laws.