Running Data-Driven Evaluations of AI Engineering Tools | Atlas Bench