Collecting a large number of pixel-level labeled training samples for tree species identification is time-consuming and costly in operational forestry applications. To address this problem, we investigate the effectiveness of explanation methods for deep neural networks in performing weakly supervised semantic segmentation using only image-level labels. Specifically, we consider four methods: i) class activation maps (CAM); ii) gradient-based CAM; iii) pixel correlation module; and iv) self-enhancing maps (SEM). We compare these methods using both quantitative and qualitative measures of segmentation accuracy, as well as their computational requirements. Experimental results obtained on an aerial image archive show that: i) the considered explanation techniques are highly relevant for the identification of tree species with weak supervision; and ii) SEM outperforms the other considered methods. The code for this paper is publicly available at this https URL.
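
To illustrate how an explanation method can yield a segmentation from image-level labels alone, the following is a minimal sketch of the standard CAM computation, not the implementation released with this paper. It assumes a classifier whose last convolutional feature maps are globally average pooled and fed to a single linear layer. The function names `class_activation_map` and `cam_to_mask`, the array shapes, and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Compute a normalized CAM for one class (illustrative sketch).

    features:   array of shape (C, H, W), last conv-layer feature maps.
    fc_weights: array of shape (num_classes, C), linear classifier weights
                applied after global average pooling (assumed architecture).
    """
    # Weight each feature channel by the classifier weight of the chosen class
    # and sum over channels, giving one (H, W) map.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    # Keep only positive evidence for the class.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] when the map is not all zeros.
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam


def cam_to_mask(cam, threshold=0.5):
    """Turn a normalized CAM into a binary mask (threshold is an assumption)."""
    return cam >= threshold


# Toy example: two channels, two classes, a 2x2 feature grid.
features = np.array([[[1.0, 0.0], [0.0, 0.0]],
                     [[0.0, 0.0], [0.0, 2.0]]])
fc_weights = np.array([[1.0, 0.0],
                       [0.0, 1.0]])
mask_class0 = cam_to_mask(class_activation_map(features, fc_weights, 0))
mask_class1 = cam_to_mask(class_activation_map(features, fc_weights, 1))
```

In practice the low-resolution map would be upsampled to the input image size before thresholding. Gradient-based CAM replaces the classifier weights with pooled gradients of the class score, which removes the dependence on this particular architecture.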