Perplexity Unveils Post-Training Method for Enhanced Web Search Agent

Perplexity has disclosed the post-training process behind its web search agent, built on the open-source models Qwen3.5-122B-A10B and Qwen3.5-397B-A17B. The process follows a two-stage approach: supervised fine-tuning (SFT) to establish deployment behaviors, followed by on-policy reinforcement learning (RL) to improve search accuracy and efficiency. The RL stage uses the GRPO algorithm, training on a synthetic multi-hop QA dataset mixed with general dialogue data to preserve instruction adherence and prevent behavioral degradation.

The post-trained Qwen3.5-397B-SFT-RL model performs strongly on search benchmarks, reaching 57.3% accuracy on FRAMES with a single tool call, ahead of GPT-5.4 and Sonnet 4.6. Under a moderate budget, its accuracy climbs to 73.9% at $0.02 per query, beating competitors on both accuracy and cost-efficiency.
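The article names GRPO but gives no implementation details. As a rough illustration only: GRPO replaces a learned value-function baseline with a group-relative one, sampling several responses per prompt and normalizing each response's reward by the group's mean and standard deviation. The sketch below shows just that normalization step; the function name and the 0/1 correctness rewards are illustrative assumptions, not from the article or Perplexity's code.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages, as in GRPO's critic-free baseline:
    normalize each sampled response's reward by the group's mean and
    (population) standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard: all-equal rewards give std 0
    return [(r - mean) / std for r in rewards]

# One prompt, a group of 4 sampled answers scored 0/1 for correctness:
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
# Correct answers get positive advantage, incorrect ones negative,
# and the advantages average to zero across the group.
```

These advantages then weight a clipped policy-gradient update, similar to PPO but without training a separate value model.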
Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.
