AI Model Evaluation and the Spectrum of Better | Library | Long Arc Research