Age-related differences in processing of emotions in speech disappear with babble noise in the background

Received 12 Sep 2023, Accepted 26 Apr 2024, Published online: 19 May 2024

ABSTRACT

Older adults process emotional speech differently than young adults, relying less on prosody (tone) relative to semantics (words). This study aimed to elucidate the mechanisms underlying these age-related differences via an emotional speech-in-noise test. A sample of 51 young and 47 older adults rated spoken sentences with emotional content on both prosody and semantics, presented on the background of wideband speech-spectrum noise (sensory interference) or on the background of multi-talker babble (sensory/cognitive interference). The presence of wideband noise eliminated age-related differences in semantics but not in prosody when processing emotional speech. Conversely, the presence of babble eliminated age-related differences across all measures. The results suggest that both sensory and cognitive-linguistic factors contribute to age-related changes in emotional speech processing. Because real-world conditions typically involve noisy backgrounds, our results highlight the importance of testing under such conditions.

Effective social interactions are critical for health and well-being in older age. Communication difficulties can create barriers to social participation, prompting withdrawal and isolation. Understanding the factors affecting spoken communication abilities is thus vital for promoting healthy aging (Heinrich et al., 2016). A key aspect of social communication, and the focus of this study, is emotional speech comprehension. The literature points to age-related changes in the processing of emotions in speech. However, the mechanisms underlying these changes remain unclear (Ben-David et al., 2019). In the current study, we presented spoken emotional sentences to young and older adults under two types of background noise, generating either energetic or informational masking. The first type assesses a drop in performance due to sensory decrement, whereas the second assesses a decrease due to joint sensory and cognitive changes with age.

Two channels of processing emotion in speech

Emotional information is conveyed via two auditory speech channels: semantics (the meaning of the words or of the entire sentence heard) and prosody (variations in fundamental frequency [F0], duration, intensity, and voice quality; Ding & Zhang, 2023). Age-related differences in emotional speech comprehension can be attributed to changes in each channel separately or to changes in the relative dominance of each speech channel. Young adults were found to assign a larger relative weight to prosody over semantics, whereas older adults weighed prosody and semantics more equally. For example, the semantically happy sentence “I won the Lottery” spoken with an angry prosody was rated as angry by young adults (prosodic dominance), but as similarly happy and angry by older adults (Ben-David et al., 2019).

Sensation: the role of poorer sensory performance

According to the sensory degradation hypothesis, age-related changes in the auditory system degrade the quality of auditory information, thus impeding speech understanding. Sensory degradation could impair both the quality of acoustic features that serve as prosodic cues, such as F0, speech duration and synchrony, and the ability to correctly process semantics (Schneider et al., 2010). Recently, Dor et al. (2022a) directly compared recognition thresholds for semantic and for prosodic emotions presented on the background of different levels of steady-state noise. The higher (i.e. worse) recognition thresholds found for older adults suggested sensory degradation. However, the fact that both older and young listeners had lower (better) recognition thresholds for emotional prosody than for semantics (to a similar extent) led the authors to conclude that the reduced prosodic dominance in older adults cannot be solely attributed to sensory factors, suggesting a role for cognition as well. Similarly, Wright et al. (2018) proposed that specific impairments in the recognition of emotional prosody among individuals with neurological impairments (e.g. dementia) may stem from deficits in the perceptual processing of basic acoustic features, or from a cognitive difficulty in matching the sensory input to abstract representations.

Cognition: the role of cognitive changes

There is evidence that aging is associated with diminished cognitive resources available for processing, specifically working-memory capacity. This age-related cognitive change is associated with impaired speech perception, especially under adverse listening conditions (Schneider et al., 2010). Age-related cognitive changes can affect the processing of emotional stimuli, both in semantics and in prosody, as well as their integration. Recent studies have shown that individuals with a smaller working-memory span were slower to process spoken words (Nitsan et al., 2022). Cognition plays an even larger role in processing the semantic content of the full sentence, as listeners need to retain each of the words until they form a cohesive unit (Rönnberg et al., 2019). Accruing and maintaining acoustic components related to prosody (e.g. rhythm, frequency) also call for working-memory resources. Indeed, the processing of emotional prosody was related to older adults’ cognitive resources and was impaired by concurrent tasks that taxed working memory (Baglione et al., 2023).

In addition, the ability to successfully inhibit irrelevant information decreases with aging, often observed as an impairment in selective attention (Schneider et al., 2010). With respect to emotional speech processing, if inhibition is limited, older adults may not be able to process the emotion conveyed in one channel (e.g. semantic) separately from that conveyed by the other (prosodic). As a result, older adults may unwittingly integrate the emotional information provided by both channels even when they attempt to focus on only one. Indeed, when asked to focus on only one speech channel and ignore the other, larger failures of selective attention were documented for older than for young adults, both in quiet and in noise (Ben-David et al., 2019; Dor et al., 2022a).

Two maskers: energy-based and information-based

In the lab, speech is usually presented under optimal conditions, whereas in daily life it is masked by noise (Wu et al., 2018). Noise impedes speech perception both via energetic masking and via informational masking. For the former, noise can activate the same regions on the basilar membrane as the speech signal, masking its content (Heinrich et al., 2008). Several studies have found that adjusting the signal-to-noise ratio (SNR) of energetic masking to match the threshold differences between young and older adults can somewhat offset age-related changes in speech processing (Schneider et al., 2010). For informational masking, many real-life situations involve listening to speech on the background of “babble” – multiple overlapping conversations and indistinct voices. Babble noise interferes with many aspects of speech processing, especially at older age (Schneider et al., 2010). Successful segregation of speech under informational masking is attention-demanding, which may explain the unique challenge it poses for older adults (Heinrich et al., 2008). When speech is presented in babble, listeners may involuntarily engage in phonological and semantic processing of the background, interfering with the perception of the target speech and reducing available resources (Mattys et al., 2009). Babble was also shown to impair the processing of emotional prosody even when the semantic content was unavailable to the listeners (e.g. in a foreign language; Scharenborg et al., 2018).

To recap, energetic and informational masking present different challenges. The former involves mainly sensory interference, whereas the latter can involve cognitive and linguistic interference as well.

The current study

In the current study, we compared the processing of spoken emotions between young and older adults under different types of noise. We used the validated online Test for Rating Emotions in Speech (iT-RES; Ben-David et al., 2021). Older and young listeners performed the test under two types of masking: fluctuating speech-spectrum noise (SSN; energetic masker) and babble noise (BN; informational masker). To partially compensate for age-related sensory degradation, the SNR was larger by 4 dB for older adults under both types of noise. This difference is well documented in the literature, leading to similar rates of word recognition accuracy in both age groups (e.g. Ben-David et al., 2011). Performance in noise was compared to performance in quiet observed earlier using the same test (Dor et al., 2022b). To increase external validity, the tests were conducted at the participants’ homes.

Our pre-registered hypotheses referred to three types of effects.

  1. Age-related effects: We expected older adults, compared with young adults, to exhibit worse identification of emotions, larger failures of selective attention, and lower dominance of the prosodic over the semantic channel.

  2. Noise-related effects: Compared with performance in quiet, we expected both noise types to impair identification (for both age groups), to increase failures of selective attention (for both age groups), and to curtail prosodic dominance (for young adults).

  3. Differential noise effects: Compared to performance in quiet, we expected babble noise to affect both groups, but to affect older participants to a larger extent.

Materials and methods

Transparency and openness

Data sets generated for this study, the analytic code, and Supplemental materials are available (https://osf.io/jf4nh/). The study design, main hypotheses, and analytic plan were preregistered (https://aspredicted.org/cy89a.pdf).

Participants

Fifty-one young adults (YA), undergraduate students from Reichman University (36 women; age range 19–28 years, M = 23.65 years, SD = 1.62), and 42 older adults (OA) from the larger Herzliya community (29 women; range = 60–85 years, M = 68.93 years, SD = 5.34) were included in the study. The participants followed the recruitment and screening procedure described by Dor et al. (2022b) and met the pre-registered inclusion criteria (see Supplemental Appendix A for the power analysis and a detailed description of inclusion criteria). Participants were randomly assigned to one of two noise groups: wideband Speech-Spectrum Noise (SSN; N = 24 YA and 21 OA) or Babble Noise (BN; N = 27 YA and 21 OA). Performance in these two conditions was compared to the benchmark performance in quiet reported by Dor et al. (2022b; young: 35 women, 6 men; M = 23.10 years, SD = 1.74; older: 23 women, 15 men; M = 68.68 years, SD = 4.11). The participants did not differ across the three conditions (quiet, SSN and BN) in basic demographic characteristics (see Supplemental Table S1).

Apparatus and stimuli

Test for assessing emotion in speech

Our main tool, the online version of the T-RES in Hebrew (Ben-David et al., 2021), is composed of 30 spoken sentences. Each sentence can carry emotions via semantics, via prosody, or both (for a detailed description of the stimuli see Supplemental Appendix B). Three emotions were examined: anger, happiness, and sadness. Three types of sentences were presented in the test: congruent sentences, in which the same emotion appeared in both channels; incongruent sentences, which carried different emotional content in prosody and semantics; and baseline sentences, in which only one channel carried emotional content and the other was kept emotionally neutral.

The test included three tasks: Rating of emotion via prosody; Rating of emotion via semantics; Rating of emotion based on both speech channels (General-rating). Based on these tasks, the T-RES assesses four variables related to the processing of emotions in speech.

Identification: Detection of the specific emotion presented via prosody or via semantics of the spoken sentence.

Selective Attention: Interference from the unattended speech channel with performance in the attended channel. This also reflects the ability to inhibit one of the speech channels.

Integration: Detection of the emotion conveyed by the spoken sentence based on joint information from both channels.

Channel Dominance: Relative influence of prosody and semantics in determining the emotion conveyed by the full spoken sentence. If the former is greater than the latter, we witness prosodic dominance.

Supplemental Appendix B presents the full makeup of the test, stimuli and all related variables.
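To make these measures concrete, the following is a minimal sketch of how the four scores could be derived from trial-level ratings. It is not the authors' code: the file name, column names, and data layout are assumptions for illustration, following the definitions above and the contrasts described in Figure 1.

```python
import pandas as pd

# Hypothetical trial-level data, one row per rated sentence, with columns
# (assumed, not the actual T-RES data format):
#   task: 'prosody', 'semantics', or 'general' (which channel was rated)
#   trial_type: 'baseline_present', 'baseline_absent', 'congruent',
#               'incongruent', 'prosody_only', 'semantics_only'
#   rating: rating of the target emotion on the test's scale
df = pd.read_csv("t_res_ratings.csv")  # hypothetical file name

def identification(d: pd.DataFrame, task: str) -> float:
    """Rating difference between baseline sentences that contain the
    target emotion and baseline sentences that do not (Figure 1, A-B)."""
    t = d[d.task == task]
    return (t[t.trial_type == "baseline_present"].rating.mean()
            - t[t.trial_type == "baseline_absent"].rating.mean())

def selective_attention_failure(d: pd.DataFrame, task: str) -> float:
    """Rating difference between congruent and incongruent sentences
    (Figure 1, C-D); zero would indicate perfect inhibition of the
    unattended channel."""
    t = d[d.task == task]
    return (t[t.trial_type == "congruent"].rating.mean()
            - t[t.trial_type == "incongruent"].rating.mean())

def prosodic_dominance(d: pd.DataFrame) -> float:
    """In the General-rating task, rating difference between sentences
    carrying the target emotion only in prosody and only in semantics
    (Figure 1, E); positive values indicate prosodic dominance."""
    t = d[d.task == "general"]
    return (t[t.trial_type == "prosody_only"].rating.mean()
            - t[t.trial_type == "semantics_only"].rating.mean())
```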

Types of noise

Speech stimuli were presented on the background of either a fluctuating wideband speech-spectrum noise or an eight-female-talker babble noise (both taken from Mama et al., 2018). Both noise types were fluctuating and had comparable audiological characteristics (see Mama et al., 2018). This means that, energetically, both types of noise masked the target speaker to the same extent, posing an equal sensory challenge to the listener. In contrast, the babble noise contained fragments of meaningful speech (rather than meaningless noise), thus comprising an informational masker. As a result, the babble posed an additional cognitive (and/or linguistic) challenge to the listener. Sentences were presented to older and young participants at SNRs of 0 and −4 dB, respectively. Supplemental Appendix C and Figure S1 present the full details of both noise types and the SNRs used in the study.
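To illustrate the SNR manipulation, here is a generic sketch of mixing a sentence with background noise at a target SNR. This is a standard procedure, not the authors' published code; the array names and calibration details are assumptions.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so that the speech-to-noise power ratio equals
    `snr_db`, then add it to the speech (equal-length arrays assumed)."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Target noise power follows from SNR_dB = 10 * log10(P_speech / P_noise)
    target_noise_power = speech_power / (10 ** (snr_db / 10.0))
    return speech + noise * np.sqrt(target_noise_power / noise_power)

# Per the study's design, older adults heard sentences at a 4 dB more
# favorable SNR than young adults, e.g.:
# older_mix = mix_at_snr(sentence, babble, snr_db=0.0)
# young_mix = mix_at_snr(sentence, babble, snr_db=-4.0)
```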

Procedure

The remote use of the T-RES (iT-RES) was validated for young and older adults in a recent dedicated study (Dor et al., 2022b). The current study was approved by the ethics committee of Reichman University (protocol #2020507). After signing informed consent, the participants completed several preliminary questionnaires. Those who met the inclusion criteria received the link to the iT-RES and performed the test on their personal computer at home (see Supplemental Appendix B).

Analysis

Rating of emotion in the sentence was our dependent variable. The independent variables were age (young, older), noise type (SSN, BN, Quiet), and target emotion (anger, happiness, or sadness). The former two were manipulated between participants and the latter within participants. We conducted a series of multilevel model (MLM) analyses in SPSS V.20. The models also included a dedicated within-participant variable for each specific analysis (see Supplemental Appendix B for a detailed description and calculation of the three test-specific variables: identification, selective attention, and channel dominance). The models included all two-way interactions of the abovementioned variables, and the three-way interaction of age, noise type, and the test-specific variable.

The tasks of rating-via-prosody, rating-via-semantics and General-rating were analyzed separately. To test for integration of emotions in the General-rating task, we also included models that compared ratings on each type of trial separately (congruent trials, presenting the target emotion in both channels; prosodic trials, presenting the target emotion only in prosody; semantic trials, presenting the target emotion only in semantics; and target-emotion-absent trials, presenting the target emotion in neither channel). To test the specific effects of noise, these analyses were followed by models comparing quiet with SSN and quiet with BN, separately (see Supplemental Appendix D). We also tested the effects of age in each type of background noise (Quiet, SSN or BN), separately.
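The models were fit in SPSS; as a rough illustration of the model structure for the rating-via-prosody task, here is a Python analogue using statsmodels' MixedLM. The variable, column, and file names are assumptions, and the participant-level random intercept shown is a simplification of the reported models.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per trial, with the emotion
# rating as the dependent variable and a participant identifier.
df = pd.read_csv("prosody_task_ratings.csv")  # assumed file/column names

# All two-way interactions among age, noise type, emotion, and the
# test-specific variable (here, presence of the target emotion), plus
# the three-way interaction of age, noise type, and that variable.
model = smf.mixedlm(
    "rating ~ (age + noise_type + emotion + target_present)**2"
    " + age:noise_type:target_present",
    data=df,
    groups=df["participant"],  # random intercept per participant
)
print(model.fit().summary())
```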

Results

Our main results are depicted in Figure 1. The full MLM results and all tested effects are detailed in Supplemental Appendix D.

Figure 1. All effects of emotion are presented for young and older adults under three types of background noise: Quiet, SSN and BN. Error bars indicate 95% confidence intervals around the respective mean. Panels A and B: Shown are the differences in rating between baseline sentences containing the target emotion and sentences that do not. Panel A: Identification of emotion via prosody; Panel B: Identification of emotion via semantics. Panels C and D: Shown are the differences in rating of the same emotion between incongruent and congruent sentences. Panel C: Selective attention to prosody; Panel D: Selective attention to semantics. Panel E: Shown are the differences in rating between prosody (sentences that contain the target emotion only in prosody) and semantics (sentences that contain the target emotion only in semantics) in general rating of the sentence.

Identification of emotions

Prosody

As can be seen in Figure 1, Panel A, identification of emotion by prosody was higher for young than for older adults, F(1,168.15) = 35.6, p < .001 (means of 3.1 vs. 2.27, respectively). Identification was affected by the type of noise, F(2,165.81) = 36.46, p < .001 (means of 3.33, 2.79 and 1.87 for Quiet, SSN and BN, respectively). We also detected a three-way interaction between presence of target emotion, age and noise type, F(2,166) = 5.06, p = .007, indicating that noise affected identification differently for young and older adults. Post-hoc tests showed significant age-related differences in Quiet and SSN, F(1,77) = 34.89, p < .001; F(1,43) = 27.27, p < .001, respectively, but not in BN (p = .854) [see also Supplemental Tables S2A and S3].

Semantics

As can be seen in Figure 1, Panel B, identification of emotion by semantics was higher for young than for older adults, F(1,166.22) = 12.69, p < .001 (means of 3.46 vs. 2.89, respectively). Identification was also affected by the type of noise, F(2,166.06) = 3.69, p = .027 (means of 3.28, 3.39 and 2.86 for Quiet, SSN and BN, respectively). In post-hoc tests, significant age-related differences in Quiet, F(1,77) = 21.21, p < .001, vanished under SSN and BN (p = .237 and p = .369, respectively) [see also Supplemental Tables S2A and S4].

In summary, older adults exhibited poorer emotion identification in speech than young adults for both prosody and semantics (H1). The addition of noise impaired identification for both age groups (H2), but it impacted each age group in different ways (H3). Age-related differences vanished under SSN and BN for semantics and under BN only for prosody.

Selective attention

Prosody

As can be seen in Figure 1, Panel C, both age groups failed to selectively attend to prosody, F(1,161.07) = 84.33, p < .001. However, failures were larger for older than for young adults, F(1,167.13) = 14.47, p < .001 (means of 0.61 vs. 0.29, respectively). We recorded a three-way interaction of congruency, age, and noise type, F(2,166) = 3.47, p = .033. Post-hoc tests revealed age-related differences in Quiet, F(1,77) = 10.86, p = .001, and under SSN, F(1,43) = 8.85, p = .005, but not under BN (p = .772) [see also Supplemental Tables S2B and S5].

Semantics

As shown in Figure 1, Panel D, participants failed to selectively attend to semantics, F(1,162.48) = 4.80, p = .030. An interaction of congruency and noise, F(2,166.02) = 3.39, p = .036, indicated that failures of selectivity were impacted by noise type (means of 0.37, 0.11 and −0.02 for Quiet, SSN and BN, respectively). The three-way interaction of congruency, age, and noise type was significant, F(2,166) = 6.12, p = .003. Post-hoc tests found significantly larger failures of selective attention for older than for young adults in quiet, F(1,77) = 10.29, p = .002, but not under SSN or BN (p = .167 and p = .104, respectively) [see also Supplemental Tables S2B and S6].

In summary, both groups exhibited failures in attempts to selectively attend to emotion in prosody or semantics. However, larger failures of selective attention to prosody were found in older than in young adults (H1). Noise impacted selective attention to semantically conveyed emotion (H2), and it did so differently for young than for older adults (H3). SSN and BN removed age-related differences in selective attention to semantics, but only BN removed age-related differences in selective attention to prosody.

Integration and channel dominance

The Integration of emotional information from both semantics and prosody was tested via the ratings in the General-rating task. Results of dedicated analyses for each type of trial (congruent, prosodic, semantic and target-emotion-absent trials) are presented in Supplemental Appendix E, Tables S2C and S7.

Channel Dominance was tested by comparing the weights given to the prosodic versus the semantic channel (with a positive score indicating prosodic dominance). Inspection of Panel E of Figure 1 indicates overall prosodic dominance, F(1,166.46) = 118.09, p < .001. Prosodic dominance was still larger for young than for older adults, F(1,166.59) = 10.32, p = .002 (means of 1.30 and 0.71, respectively). Prosodic dominance decreased under noise, F(2,166.06) = 3.93, p = .021 (means of 1.30, 1.01, and 0.71 for Quiet, SSN, and BN, respectively). The three-way interaction was also significant, F(2,166) = 3.20, p = .043. Post-hoc tests revealed that age-related differences in prosodic dominance appeared in Quiet and SSN, F(1,77) = 14.46, p < .001; F(1,43) = 4.16, p = .048, respectively, but not under BN (p = .989) [see also Supplemental Tables S2D and S8 for channel dominance].

In summary, prosodic information was weighted higher than semantic information, especially by young adults (H1). This prosodic bias was reduced by noise (H2), but to a different extent for young and older adults (H3). Notably, age-related changes in the integration of emotional information evaporated under the cognitively challenging listening condition of BN.

To test the possibility of emotion-specific effects, Supplemental Appendix F includes tests conducted separately for the three tested emotions, for all age-related comparisons. Approximately 85% of these emotion-specific comparisons showed the same trend as the general results, suggesting that age-related differences were not based on one specific emotion.

Discussion

Older and young adults process emotions in speech differently. When speech is presented under ideal listening conditions, older adults (1) identify emotions conveyed via prosody or via semantics less well than young adults; (2) show larger failures than young adults when attempting to focus on one speech channel while inhibiting the other; and (3) rely on prosody over semantics to a lesser extent than young adults. The goal of the current study was to test the source of these age-related differences by manipulating the sensory input (presenting speech in noise vs. quiet) and the cognitive demand (presenting informational vs. energetic masking). We compared the performance of young and older adults when speech was presented on the background of speech-spectrum noise (SSN, energetic masker) or babble noise (BN, informational masker). Emotion processing was compared to benchmark performance in quiet. Our findings implicate both sensory and cognitive factors at the base of age-related differences.

Our first and second pre-registered hypotheses were generally confirmed. (1) Age-related effects: The main age-related effects were replicated across noise conditions. Older adults identified spoken emotions to a lesser extent than young adults (in both channels); exhibited larger failures of selective attention (for rating by prosody); and showed smaller prosodic dominance (for General-rating). (2) Noise-related effects: Across age groups, noise impaired identification of emotions (in both channels); affected selective attention (for rating by semantics); and reduced reliance on emotion in prosody. (3) Age-related interactions: We found that the type of noise interacted with age-related differences on most measures, with BN generally showing stronger interactions than SSN. However, as can be seen in Figure 1, BN affected the performance of young adults more than that of older adults (and not vice versa), inconsistent with our third pre-registered hypothesis. Table 1 summarizes the main findings of age-related differences under the three noise conditions.

Table 1. Summary of the main findings. Age-related differences under three types of background noise across the domains tested in the iT-RES.

An inspection of the three columns of Table 1 tells the story. Most salient is the difference between performance in quiet (left-hand column) and performance under BN (right-hand column). In quiet, we observed an age-related difference in almost all measures. In contrast, such differences vanished under BN on all measures. The fact that young-older differences collapsed under BN indicates the involvement of cognitive factors related to the processing of speech under informational masking, as this type of noise imposes attentional and linguistic challenges for the listener.

Performance under SSN (middle column) was somewhat in between. Under SSN, young-old differences vanished when emotion was conveyed through semantics but not when it was conveyed through prosody. The former finding is in line with previous findings in the literature, as in the current study the SNR was augmented by 4 dB for older adults. This 4 dB advantage was previously found to equate spoken word recognition accuracy between young and older adults (Ben-David et al., 2011). Hence, it stands to reason that age-related differences in processing the semantic meaning of the full sentence would also be eliminated given this 4 dB SNR difference.

In contrast to the processing of emotion conveyed via semantics, age-related differences remained intact under SSN when emotion was conveyed via prosody. Likely, the relatively low level of noise used in this study was not sufficient to offset the segregation and processing demands presented by the prosodic channel. Indeed, the literature suggests that prosodic processing is cognitively demanding for older adults, and that age-related differences in the comprehension of emotional prosody cannot be accounted for by merely sensory changes (see Baglione et al., 2023 for a recent systematic review). On the neural level, age-related differences in prosodic processing were related to both anatomical and functional changes in cortical areas linked with cognitive processes (Giroud et al., 2019).

BN, as opposed to SSN, negatively affected most measures (see Supplemental Appendix D). As stated in our third hypothesis, the addition of BN to quiet also had a differential effect on older and young adults, but in the opposite direction than expected. BN had a large effect on identification and integration performance for young adults, but not for older adults. Most important, as can be seen in the right column of Table 1, all age-related differences tested in this study – identification, failure of selective attention, integration, and prosodic dominance – vanished when performing under BN.

Notably, this result cannot be solely attributed to unrecognizable speech, as identification scores show that both groups could recognize the emotional content even in BN. In addition, variance in performance for older adults was highly similar to that for young adults (see, for example, the 95% CIs presented in Table S2). This does not support the claim that individual differences in auditory processing affected the results. A possible explanation may be that, as young adults exhibited higher baseline performance, they had more “room to decrease” in BN than older adults. Even if so, our conclusions still stand: When BN was imposed, age-related differences were eliminated. Mimicking the difficulty older adults experience when processing spoken emotions was thus best done with babble noise. With noise maskers matched for sensory interference, the difference in performance between conditions is mainly due to the competing information in the BN condition. Our results could be viewed in light of the trichotomy model (Ding & Zhang, 2023), proposing the interplay of cognitive (working memory), linguistic (semantic knowledge) and emotional processes in the recognition of spoken prosody. In that sense, the comparable performance found in BN hints that imposing a strain on the two former components for young adults can mimic older adults’ performance on the latter, emotional, component.

A key finding relates to prosodic dominance, typically observed in young adults. With a 4 dB SNR advantage, the age-related difference in prosodic dominance disappeared in BN but not in SSN. This finding may seem counterintuitive, as the literature points to the unique challenge that BN presents for older adults (Schneider et al., 2010). However, the current outcome can be viewed in light of the contemporary speech processing literature, suggesting that efficient processing of speech, specifically when the signal is distorted (e.g. under noise masking), is mainly based on available cognitive resources, task demands, and the listener’s effort. As available resources decrease (e.g. in aging) and/or task demands increase (e.g. BN), there is a need to exert more effort to complete the task efficiently (for relevant theoretical models see Pichora-Fuller et al., 2016; Rönnberg et al., 2019).

The smaller prosodic dominance for older adults in quiet might represent the processing of a cognitively demanding task. Efficient speech processing calls for the investment of much more cognitive effort by older adults than by young adults (Harel-Arbeli et al., 2023). This increased cognitive investment by older adults may direct the system to a more integrative pattern, increasing the reliance on multiple sources of information (prosody and semantics), thus decreasing the bias towards emotional prosody. Indeed, both older age and decreased cognitive resources were shown to be associated with a more integrative processing strategy (Benichov et al., 2012).

Young adults, on the other hand, invest less cognitive effort in the listening task. Consequently, the system may opt to focus more on one basic aspect of emotional speech, namely, prosody (Myers et al., 2019). When speech is presented under energetic masking, task demands for young adults increase only marginally, resulting in a non-significant change in prosodic dominance. However, informational masking calls for increased effort, even for young adults, leading to a decrease in prosodic dominance. Under informational masking, a 4 dB SNR advantage for older adults can match both the sensory and cognitive challenges. Matching the effort exerted to complete the task erases age-related differences in the processing of emotion in speech.

Ecological validity, caveats and future directions

In daily communication, speech is often presented on a multi-talker background. In our study we tested speech under BN, with noise levels corresponding to those measured in everyday environments (Wu et al., 2018). Our results suggest that age-related differences in spoken emotion processing may no longer be present in real-life situations. The difference in prosodic dominance found under ideal listening conditions in the laboratory vanished when the same stimuli were presented embedded in babble and the test was conducted at home. This finding aligns with the literature suggesting that age-related differences in emotion perception are mitigated when presenting real-life interpersonal situations or additional contextual information (Abo Foul et al., 2022).

While we tested a fluctuating speech-spectrum noise and an eight-female-talker babble noise, future studies could examine other noise types and auditory distortions (e.g. a steady-state noise, different kinds of babble) to widen the scope of our findings. Remote testing was based on a previous validation of the tool (see Dor et al., 2022b, for measures taken to deal with possible obstacles in remote administration). This increased ecological validity and the scope of the population tested (given mobility challenges), but we could not directly control for participants’ hearing thresholds, the equipment used, or the sound level. Rather, loudness was individually calibrated by the participants, and auditory abilities were examined with a self-report questionnaire. Previous research has found this method suitable for assessing hearing abilities under complex listening conditions and a better predictor of auditory functioning (Heinrich et al., 2019). However, future studies may wish to replicate our results in a controlled lab setting.

Summary and implications

This study explored the interaction of sensory and cognitive factors in the processing of emotional speech by young and older participants. The main message of our results is the demonstration that cognitive processes are involved in comprehending emotion in speech. Babble noise engages cognitive resources; under this type of noise, young and older participants performed on a par. The results also suggest that age-related decrements in performance, as tested in the lab, may be inflated. We highlight the importance of testing speech perception in realistic listening environments to understand the communicative challenges faced by older adults.

Authors’ note

Portions of this study were presented at the 38th Annual Meeting of the International Society for Psychophysics (Fechner Day 2022, Lund, Sweden) and at the 10th Conference on Cognition Research of the Israeli Society for Cognitive Psychology (ISCOP 2023, Akko, Israel). Daniel Algom was supported in part by an Israel Science Foundation Grant (ISF-543-19). Boaz Ben-David was supported in part by an Israel Science Foundation Grant (ISF 1726/22). The authors have no conflicts of interest to disclose.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

Daniel Algom was supported in part by an Israel Science Foundation Grant (ISF-543-19). Boaz Ben-David was supported in part by an Israel Science Foundation Grant (ISF 1726/22).

References

  • Abo Foul, Y., Eitan, R., Mortillaro, M., & Aviezer, H. (2022). Perceiving dynamic emotions expressed simultaneously in the face and body minimizes perceptual differences between young and older adults. The Journals of Gerontology: Series B, 77(1), 84–93. https://doi.org/10.1093/geronb/gbab064
  • Baglione, H., Coulombe, V., Martel-Sauvageau, V., & Monetta, L. (2023). The impacts of aging on the comprehension of affective prosody: A systematic review. Applied Neuropsychology: Adult, 1–16. https://doi.org/10.1080/23279095.2023.2245940
  • Ben-David, B. M., Chambers, C. G., Daneman, M., Pichora-Fuller, M. K., Reingold, E. M., & Schneider, B. A. (2011). Effects of aging and noise on real-time spoken word recognition: Evidence from eye movements. Journal of Speech, Language, and Hearing Research, 54(1), 243–262. https://doi.org/10.1044/1092-4388(2010/09-0233)
  • Ben-David, B. M., Gal-Rosenblum, S., van Lieshout, P. H. H. M., & Shakuf, V. (2019). Age-related differences in the perception of emotion in spoken language: The relative roles of prosody and semantics. Journal of Speech, Language, and Hearing Research, 62(4), 1188–1202. https://doi.org/10.1044/2018_JSLHR-H-ASCC7-18-0166
  • Ben-David, B. M., Mentzel, M., Icht, M., Gilad, M., Dor, Y. I., Ben-David, S., Carl, M., & Shakuf, V. (2021). Challenges and opportunities for telehealth assessment during COVID-19: iT-RES, adapting a remote version of the test for rating emotions in speech. International Journal of Audiology, 60(5), 319–321. https://doi.org/10.1080/14992027.2020.1833255
  • Benichov, J., Cox, L. C., Tun, P. A., & Wingfield, A. (2012). Word recognition within a linguistic context. Ear & Hearing, 33(2), 250. https://doi.org/10.1097/AUD.0b013e31822f680f
  • Ding, H., & Zhang, Y. (2023). Speech prosody in mental disorders. Annual Review of Linguistics, 9(1), 335–355. https://doi.org/10.1146/annurev-linguistics-030421-065139
  • Dor, Y. I., Algom, D., Shakuf, V., & Ben-David, B. M. (2022a). Age-related changes in the perception of emotions in speech: Assessing thresholds of prosody and semantics recognition in noise for young and older adults. Frontiers in Neuroscience, 16. https://doi.org/10.3389/fnins.2022.846117
  • Dor, Y. I., Algom, D., Shakuf, V., & Ben-David, B. M. (2022b). Detecting emotion in speech: Validating a remote assessment tool. Auditory Perception & Cognition, 5(3-4), 238–258. https://doi.org/10.1080/25742442.2022.2101841
  • Giroud, N., Keller, M., Hirsiger, S., Dellwo, V., & Meyer, M. (2019). Bridging the brain structure—brain function gap in prosodic speech processing in older adults. Neurobiology of Aging, 80, 116–126. https://doi.org/10.1016/j.neurobiolaging.2019.04.017
  • Harel-Arbeli, T., Palgi, Y., & Ben-David, B. M. (2023). Sow in tears and reap in joy: Eye tracking reveals age-related differences in the cognitive cost of spoken context processing. Psychology and Aging, 38(6), 534–547. https://doi.org/10.1037/PAG0000753
  • Heinrich, A., Gagné, J.-P., Viljanen, A., Levy, D. A., Ben-David, B. M., & Schneider, B. A. (2016). Social inquiry into well-being effective communication as a fundamental aspect of active aging and well-being: Paying attention to the challenges older adults face in noisy environments. Social Inquiry Into Well-Being, 2(1), 51–69. https://doi.org/10.13165/SIIW-16-2-1-05
  • Heinrich, A., Mikkola, T. M., Polku, H., Törmäkangas, T., & Viljanen, A. (2019). Hearing in real-life environments (HERE): Structure and reliability of a questionnaire on perceived hearing for older adults. Ear & Hearing, 40(2), 368–380. https://doi.org/10.1097/AUD.0000000000000622
  • Heinrich, A., Schneider, B. A., & Craik, F. I. M. (2008). Investigating the influence of continuous babble on auditory short-term memory performance. Quarterly Journal of Experimental Psychology, 61(5), 735–751. https://doi.org/10.1080/17470210701402372
  • Mama, Y., Fostick, L., & Icht, M. (2018). The impact of different background noises on the production effect. Acta Psychologica, 185, 235–242. https://doi.org/10.1016/j.actpsy.2018.03.002
  • Mattys, S. L., Brooks, J., & Cooke, M. (2009). Recognizing speech under a processing load: Dissociating energetic from informational factors. Cognitive Psychology, 59(3), 203–243. https://doi.org/10.1016/j.cogpsych.2009.04.001
  • Myers, B. R., Lense, M. D., & Gordon, R. L. (2019). Pushing the envelope: Developments in neural entrainment to speech and the biological underpinnings of prosody perception. Brain Sciences, 9(3), 70. https://doi.org/10.3390/brainsci9030070
  • Nitsan, G., Banai, K., & Ben-David, B. M. (2022). One size does not fit all: Examining the effects of working memory capacity on spoken word recognition in older adults using eye tracking. Frontiers in Psychology, 13, 841466. https://doi.org/10.3389/fpsyg.2022.841466
  • Pichora-Fuller, M. K., Kramer, S. E., Eckert, M. A., Edwards, B., Hornsby, B. W. Y., Humes, L. E., Lemke, U., Lunner, T., Matthen, M., Mackersie, C. L., Naylor, G., Phillips, N. A., Richter, M., Rudner, M., Sommers, M. S., Tremblay, K. L., & Wingfield, A. (2016). Hearing impairment and cognitive energy: The framework for understanding effortful listening (FUEL). Ear & Hearing, 37(1), 5S–27S. https://doi.org/10.1097/AUD.0000000000000312
  • Rönnberg, J., Holmer, E., & Rudner, M. (2019). Cognitive hearing science and ease of language understanding. International Journal of Audiology, 58(5), 247–261. https://doi.org/10.1080/14992027.2018.1551631
  • Scharenborg, O., Kakouros, S., & Koemans, J. (2018). The effect of noise on emotion perception in an unknown language. Proceedings of the 9th International Conference on Speech Prosody, 364–368. https://doi.org/10.21437/SpeechProsody.2018-74
  • Schneider, B. A., Pichora-Fuller, K., & Daneman, M. (2010). Effects of senescent changes in audition and cognition on spoken language comprehension. In S. Gordon-Salant, R. D. Frisina, A. N. Popper, & R. R. Fay (Eds.), The aging auditory system (pp. 167–210). Springer New York. https://doi.org/10.1007/978-1-4419-0993-0_7
  • Shehabi, A. M., Prendergast, G., Guest, H., & Plack, C. J. (2022). The effect of lifetime noise exposure and aging on speech-perception-in-noise ability and self-reported hearing symptoms: An online study. Frontiers in Aging Neuroscience, 14, 510. https://doi.org/10.3389/FNAGI.2022.890010/BIBTEX
  • Wright, A., Saxena, S., Sheppard, S. M., & Hillis, A. E. (2018). Selective impairments in components of affective prosody in neurologically impaired individuals. Brain and Cognition, 124, 29–36. https://doi.org/10.1016/j.bandc.2018.04.001
  • Wu, Y. H., Stangl, E., Chipara, O., Hasan, S. S., Welhaven, A., & Oleson, J. (2018). Characteristics of real-world signal to noise ratios and speech listening situations of older adults with mild to moderate hearing loss. Ear & Hearing, 39(2), 293–304. https://doi.org/10.1097/AUD.0000000000000486