A recent paper from AMD and Pennsylvania State University argues that the instability in FP4 training stems from structural scaling errors, not insufficient randomness. The authors pre-trained the Llama 3.1-8B model on AMD's Instinct MI355X GPU in the MXFP4 format and report a 9–10% speedup over FP8 with only an 8–9% increase in token overhead. According to the paper, this is the first complete large-model pre-training experiment on native FP4 hardware. The research attributes the instability to structural errors that accumulate along sensitive gradient paths, particularly in weight-gradient computation. Conventional methods that inject randomness failed to stabilize training, whereas a deterministic Hadamard rotation reduced token overhead and kept convergence quality close to FP8. The result suggests that FP4 is viable for training, potentially doubling the training compute available on existing hardware.
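To make the mechanics concrete, here is a minimal pure-Python sketch of a deterministic Hadamard rotation applied before MXFP4-style block quantization. It is an illustration under stated assumptions, not the paper's implementation: the block size of 32, the E2M1 value grid, and the power-of-two shared scale follow the OCP Microscaling format, while the helper names (`hadamard`, `quantize_mxfp4`, `quantized_dot`), the simplified nearest-value rounding, and the choice to rotate both operands of a dot product are hypothetical. The summary does not say where the authors place the rotation or which scale rule they use.

```python
import math

# E2M1 (FP4) representable magnitudes, as in the OCP Microscaling format.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 32  # MXFP4 shares one power-of-two scale across 32 elements


def hadamard(block):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two).

    The transform is deterministic and is its own inverse.
    """
    out = list(block)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [v / norm for v in out]


def quantize_mxfp4(block):
    """Round a block to the FP4 grid under one shared power-of-two scale.

    Assumption: ties go to the smaller grid value; real hardware rounding
    rules may differ.
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Shared exponent: floor(log2(amax)) minus the largest E2M1 exponent (2).
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    result = []
    for v in block:
        mag = min(abs(v) / scale, FP4_GRID[-1])  # saturate at 6
        nearest = min(FP4_GRID, key=lambda g: abs(g - mag))
        result.append(math.copysign(nearest * scale, v))
    return result


def quantized_dot(x, y, rotate):
    """Dot product of two blocks after FP4 quantization, optionally rotated."""
    if rotate:
        # Rotating both operands leaves the exact dot product unchanged,
        # because the transform is orthonormal.
        x, y = hadamard(x), hadamard(y)
    qx, qy = quantize_mxfp4(x), quantize_mxfp4(y)
    return sum(a * b for a, b in zip(qx, qy))
```

Because the rotation is orthonormal, it changes only how values are distributed within a block before the shared scale is chosen; the exact product it feeds is untouched. Comparing `quantized_dot(x, y, rotate=False)` with `quantized_dot(x, y, rotate=True)` against the full-precision dot product shows the effect on a given pair of blocks, and whether the rotation helps depends on the data.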