Meta has introduced MobileMoE, a Mixture of Experts (MoE) model optimized for mobile devices. On the iPhone 16 Pro's GPU, using the MLX backend, MobileMoE-S processed input up to 3.8x faster than dense models while keeping memory usage and accuracy comparable. The work is presented as the first efficient MoE inference on commercial smartphones, made practical by the growing DRAM capacity of recent phones.

MobileMoE keeps a decoder-only Transformer architecture but replaces the dense feed-forward layers with MoE layers. It is trained in four stages, which include pre-training on 6 trillion tokens and quantization-aware training. Quantization causes slight performance drops, but MobileMoE remains competitive and outperforms models such as OLMoE-1B-7B on some benchmarks. Future work will focus on improving instruction following and reducing memory usage on real-world inputs.
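The sketch below illustrates what "replacing dense feed-forward layers with MoE layers" means in general. A small router scores each token and sends it to only a few expert feed-forward networks. The outputs of those experts are then mixed using the router's weights. Only the selected experts run, so per-token compute stays close to that of one small dense FFN. All weights, however, must stay in memory, which is why DRAM capacity matters.

This is a generic top-k routed MoE layer in pure Python, not MobileMoE's actual implementation. MobileMoE's expert count, top-k value, activation function, and routing normalization are not given here. The sizes, the ReLU activation, and the softmax over the selected logits are all assumptions, as are the names `DenseFFN` and `MoELayer`.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # stored as rows x cols


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def softmax(v: Vector) -> Vector:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


class DenseFFN:
    """A plain two-layer feed-forward block (hypothetical sizes, ReLU assumed)."""

    def __init__(self, d_model: int, d_hidden: int, rng: random.Random):
        self.w_in = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
        self.w_out = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]

    def __call__(self, x: Vector) -> Vector:
        return matvec(self.w_out, relu(matvec(self.w_in, x)))


class MoELayer:
    """Drop-in replacement for DenseFFN: a router picks top_k of num_experts FFNs per token."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int, seed: int = 0):
        rng = random.Random(seed)
        self.experts = [DenseFFN(d_model, d_hidden, rng) for _ in range(num_experts)]
        # One router row per expert; its dot product with the token is that expert's score.
        self.router = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(num_experts)]
        self.top_k = top_k

    def route(self, x: Vector) -> Tuple[List[int], Vector]:
        logits = matvec(self.router, x)
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        # Assumption: renormalize with a softmax over the selected experts' logits only.
        return top, softmax([logits[i] for i in top])

    def __call__(self, x: Vector) -> Vector:
        chosen, weights = self.route(x)
        out = [0.0] * len(x)
        for idx, w in zip(chosen, weights):  # only the chosen experts are evaluated
            y = self.experts[idx](x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


layer = MoELayer(d_model=8, d_hidden=16, num_experts=4, top_k=2)
token = [0.5] * 8
chosen, weights = layer.route(token)
out = layer(token)
print(chosen, weights)
print(len(out))  # 8
```

In this toy setup each token evaluates 2 of the 4 experts, so it does about half the feed-forward compute of running all of them. All four experts' weights still have to be resident in memory.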