Large language models typically perform so similarly that their differences can be measured in millimeters. But in some scenarios, these models are...