Automatic segmentation of brain anatomy is a key processing step in quantitative neuroimaging analyses. An extensive body of literature has relied on Freesurfer segmentations. Yet, in recent years, the multi-atlas segmentation framework has consistently achieved superior accuracy in various evaluations. We compared brain anatomy segmentations from Freesurfer, which uses a single probabilistic atlas strategy, against segmentations from Multi-atlas region Segmentation utilizing Ensembles of registration algorithms and parameters and locally optimal atlas selection (MUSE), a leading ensemble-based method that calculates a consensus segmentation by fusing anatomical labels from multiple atlases and registrations. The focus of our evaluation was twofold. First, using manual ground-truth hippocampus segmentations, we found that Freesurfer segmentations showed a bias towards over-segmentation of larger hippocampi and towards under-segmentation at older ages. This bias was more pronounced in Freesurfer-v5.3, which has been used in multiple previous studies of aging; it was mitigated, although still present, in the more recent Freesurfer-v6.0. Second, we evaluated inter-scanner segmentation stability using same-day scan pairs from ADNI acquired on 1.5T and 3T scanners. We found that MUSE obtained more consistent segmentations across scanners than Freesurfer, particularly in the deep structures.