We present the second part of the rigorous evaluation of state-of-the-art machine learning force fields (MLFFs) within the TEA Challenge 2023. This study provides an in-depth analysis of the performance of MACE, SO3krates, sGDML, SOAP/GAP, and FCHL19* in modeling molecules, molecule-surface interfaces, and periodic materials. We compare observables obtained from molecular dynamics (MD) simulations using different MLFFs under identical conditions. Where applicable, density-functional theory (DFT) or experiment serves as a reference to reliably assess the performance of the machine learning models. In the absence of DFT benchmarks, we conduct a comparative analysis based on results from various MLFF architectures. Our findings indicate that at the current stage of MLFF development, the choice of the ML model is becoming less critical, as the resulting simulations are largely independent of the MLFF architecture. Instead, emphasis should be placed on developing complete, reliable, and representative training datasets. Nonetheless, long-range noncovalent interactions remain challenging for all MLFF models, necessitating special caution in simulations of physical systems where such interactions are prominent, such as molecule-surface interfaces. The findings presented here reflect the state of MLFF models as of October 2023.