An important task in several wellness applications is the detection of emotional valence from speech. Two types of features of speech signals are used to detect valence: acoustic features, derived from short frames of speech, and text features, derived from the text transcription. In this paper, we investigate the effect of text on acoustic features. Some studies show that the acoustic features of phones carry specific emotion information. We also observe that emotion words and the emotional valence of the spoken sentence need not always match (e.g. the use of ‘not happy’). We therefore propose that the acoustic features of speech segments carrying emotion words be treated differently from those of segments that do not carry such words. Specifically, we exclude all speech segments carrying emotion words from the training set. Standard emotion words of a language, words from Plutchik’s wheel of emotions, and their synonyms are considered. We report performance results on the Elderly Emotion Sub-Challenge corpus of the Computational Paralinguistics Challenge 2020, and show that the exclusion of emotion words yields significant improvements for both openSMILE (p < 0.05) and openXBOW Bag-of-Audio-Words (p < 0.01) features.
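The sketch below illustrates the proposed training-set filtering, assuming utterances arrive as (audio_path, transcript) pairs. The lexicon, function names, and file names are illustrative rather than taken from the paper: the full lexicon would be built from standard emotion words, Plutchik’s wheel of emotions, and their synonyms, of which only a few seed words are shown here.

    import re

    # Illustrative seed lexicon: Plutchik's eight basic emotions plus a few
    # common variants; the paper's lexicon also includes synonyms.
    EMOTION_LEXICON = {
        "joy", "happy", "trust", "fear", "afraid", "surprise", "surprised",
        "sadness", "sad", "disgust", "anger", "angry", "anticipation",
    }

    def contains_emotion_word(transcript, lexicon=EMOTION_LEXICON):
        """True if any token of the transcript appears in the emotion lexicon."""
        tokens = re.findall(r"[a-z']+", transcript.lower())
        return any(tok in lexicon for tok in tokens)

    def filter_training_set(utterances, lexicon=EMOTION_LEXICON):
        """Drop segments whose transcripts carry emotion words; acoustic
        features are then extracted only from the retained segments."""
        return [(audio, text) for audio, text in utterances
                if not contains_emotion_word(text, lexicon)]

    # Example: the first segment is excluded because of 'happy', even though
    # the sentence's valence ('not happy') is negative -- exactly the mismatch
    # that motivates removing such segments from training.
    train = [
        ("seg_001.wav", "I am not happy about the delay"),
        ("seg_002.wav", "the weather was fine yesterday"),
    ]
    print(filter_training_set(train))  # keeps only seg_002.wav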