What System Differences Matter? Using L1/L2 Regularization to Compare Dialogue Systems