Emotional Prosody
Emotional prosody is an individual's tone of voice in speech, conveyed through changes in pitch, loudness, timbre, speech rate, and pauses, and is distinct from linguistic and semantic information. It can be isolated from the verbal content of speech and also interacts with it (e.g., sarcasm). Emotional prosody is perceived or decoded slightly less accurately than facial expressions, and accuracy varies across emotions: anger and sadness are perceived most easily, followed by fear and happiness, with disgust being the most poorly perceived.[1]

Speech Acoustics

In the source-filter theory of speech production, speech sounds result from a combination of the energy created by the vibration of the vocal folds (vocal cords) and the filtering applied by the vocal tract above the larynx. Emotion research has focused on source-related acoustic cues rather than filter-related cues. Source-related cues are associated with vocal-fold vibration; the most commonly used measures are F0 (the fundamental frequency of speech, perceived as pitch), jitter (variability in the frequency of vocal-fold vibration), and shimmer (variability in the amplitude of vocal-fold vibration). Filter-related cues, such as the articulatory changes that accompany facial expressions, have been examined less often but can be crucial to understanding emotional speech because they influence the filtering of the source signal. For example, a sentence spoken while smiling sounds different from the same sentence spoken while frowning.[2]
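
The source-related measures above can be approximated directly from a recording. The following Python sketch is illustrative only: it assumes the librosa and numpy libraries, a hypothetical file name speech.wav, and rough frame-level approximations of jitter and shimmer rather than the exact cycle-to-cycle procedures used in the cited research.

    import numpy as np
    import librosa

    # Load a speech recording at its native sampling rate (file name is hypothetical).
    y, sr = librosa.load("speech.wav", sr=None)

    # Estimate the fundamental frequency (F0) per frame with probabilistic YIN.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    f0_voiced = f0[voiced_flag]  # keep only voiced frames

    # Jitter: variability in the period of vocal-fold vibration, approximated here as
    # the mean absolute change in period (1/F0) between consecutive voiced frames,
    # relative to the mean period.
    periods = 1.0 / f0_voiced
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)

    # Shimmer: variability in amplitude, approximated here from frame-level RMS energy.
    rms = librosa.feature.rms(y=y)[0]
    n = min(len(rms), len(voiced_flag))
    rms_voiced = rms[:n][voiced_flag[:n]]
    shimmer = np.mean(np.abs(np.diff(rms_voiced))) / np.mean(rms_voiced)

    print(f"mean F0: {np.nanmean(f0_voiced):.1f} Hz, "
          f"jitter: {jitter:.4f}, shimmer: {shimmer:.4f}")

Such measures can then be compared across emotion portrayals, as in the production studies described below.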

Production of Vocal Emotion

Emotion research has primarily focused on emotional speech produced by small numbers of actors or naive subjects. Portrayals of fear, joy, and anger are associated with a higher frequency (pitch) of speech, while portrayals of sadness are associated with lower frequencies, in comparison to neutral speech. Researchers have noted that vocally expressed emotion depends not only on the valence (i.e., the relative pleasantness or unpleasantness) of the emotion but also on the intensity with which it is expressed.[2]

Anger: Anger can be divided into two types: "anger" and "hot anger". In comparison to neutral speech, anger is produced with a lower pitch, higher intensity, more energy (500 Hz) across the vocalization, a higher first formant (the lowest vocal tract resonance), and faster attack times at voice onset (the start of speech). "Hot anger", in contrast, is produced with a higher, more varied pitch and even greater energy (2000 Hz).[3]

Disgust: In comparison to neutral speech, disgust is produced with a lower, downward-directed pitch, with energy (500 Hz), a lower first formant, and fast attack times similar to anger. Less variation and shorter durations are also characteristic of disgust.[3]

Fear: Fear can be divided into two types: "panic" and "anxiety". In comparison to neutral speech, fearful speech has a higher pitch, little variation, lower energy, and a faster speech rate with more pauses.[3]

Sadness: In comparison to neutral speech, sadness is produced with a higher pitch, less intensity but more vocal energy (2000 Hz), a longer duration with more pauses, and a lower first formant.[3]

Perception of Vocal Emotion

Decoding emotions in speech proceeds in three stages: determining the acoustic features of the utterance, forming meaningful connections between these features and emotional significance, and processing the resulting acoustic patterns in relation to the connections that have been established. In the processing stage, connections with basic emotional knowledge are stored in memory networks specific to those associations, which can then serve as a baseline for emotional expressions encountered in the future. Emotional meanings of speech are registered implicitly and automatically, once the circumstances, importance, and other surrounding details of an event have been analyzed.[4]

On average, listeners are able to identify intended emotions at rates significantly better than chance (chance is approximately 10%).[3] However, error rates are also high. This is partly because listeners are more accurate at inferring emotion from particular voices, and because some emotions are perceived better than others.[2] Vocal expressions of anger and sadness are perceived most easily, fear and happiness are only moderately well perceived, and disgust is perceived poorly.[1] Additionally, vocal emotions can be identified across cultures, indicating that they may possess discrete acoustic-perceptual properties in the voice. This idea is supported by evidence that vocal expressions of anger, disgust, fear, sadness, and happiness/joy can be accurately recognized even when listening to a foreign language.[4]
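
To illustrate what "significantly better than chance" means here: in a forced-choice task with about ten response alternatives, chance accuracy is roughly 10%, and an observed recognition rate can be compared against that baseline with a binomial test. The following sketch assumes scipy, and the counts (60 correct out of 100 trials) are hypothetical:

    from scipy.stats import binomtest

    # Hypothetical data: a listener labels 60 of 100 vocal portrayals correctly
    # in a 10-alternative forced-choice task, so chance accuracy is p = 0.10.
    result = binomtest(k=60, n=100, p=0.10, alternative="greater")
    print(f"observed accuracy: {60 / 100:.0%}, p-value vs. chance: {result.pvalue:.3g}")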

The Brain in Vocal Emotions

Language can be split into two components: the verbal and vocal channels. The verbal channel is the semantic content of the speaker's chosen words, which determines the literal meaning of a sentence. The way a sentence is spoken, however, can change its meaning; this is the vocal channel. It conveys the emotions felt by the speaker and gives listeners a better idea of the intended meaning. Nuances in this channel are expressed through intonation, intensity, and rhythm, which combine to form prosody (see Prosody). Usually these channels convey the same emotion, but sometimes they differ. Sarcasm and irony are two forms of humor based on this incongruent style.[5]

The neurological processes that integrate the verbal and vocal (prosodic) components are relatively unclear. However, it is assumed that verbal content and prosody are processed in different hemispheres of the brain. Verbal content, composed of syntactic and semantic information, is processed in the left hemisphere: syntactic information primarily in the frontal regions and a small part of the temporal lobe, and semantic information primarily in the temporal regions, with a smaller contribution from the frontal lobes. In contrast, prosody is processed primarily along the same pathway, but in the right hemisphere. Neuroimaging studies using functional magnetic resonance imaging (fMRI) provide further support for this hemispheric lateralization and temporo-frontal activation. Some studies, however, show evidence that the perception of prosody is not exclusively lateralized to the right hemisphere and may be more bilateral. There is also some evidence that the basal ganglia play an important role in the perception of prosody.[5]

Impairment of Emotion Recognition

With age, it becomes increasingly difficult to recognize vocal expressions of emotion. Older adults have slightly more difficulty than young adults in labeling vocal expressions of emotion (particularly sadness and anger), and much greater difficulty integrating vocal emotions with corresponding facial expressions. One possible explanation is that combining two sources of emotional information requires greater activation of the emotion-processing areas of the brain, in which older adults show decreased volume and activity. Another possible explanation is that hearing loss leads to mishearing of vocal emotional cues; high-frequency hearing loss is known to begin around the age of 50, particularly in men.[6]

Considerations

Most research on the vocal expression of emotion has used synthetic speech or portrayals of emotion by professional actors; little has been done with spontaneous, "natural" speech samples. Artificial speech samples are considered close to natural speech, but portrayals by actors in particular may be influenced by stereotypes of emotional vocal expression and may exhibit intensified characteristics that skew listeners' perceptions. Another consideration lies in listeners' individual perceptions: studies typically average across responses, but few examine individual differences in great depth, which may provide better insight into the vocal expression of emotion.[3]

References

  1. ^ a b "The Social and Emotional Voice" (PDF). Retrieved 29 March 2012.
  2. ^ a b c Bachorowski, Jo-Anne (1999). "Vocal Expression and Perception of Emotion". Current Directions in Psychological Science. 8 (2): 53–57.
  3. ^ a b c d e f Sauter, Disa A. (1 November 2010). "Perceptual cues in nonverbal vocal expressions of emotion". The Quarterly Journal of Experimental Psychology. 63 (11): 2251–2272. doi:10.1080/17470211003721642.
  4. ^ a b Pell, Marc D. (7 November 2011). "On the Time Course of Vocal Emotion Recognition". PLoS ONE. 6 (11): e27256. doi:10.1371/journal.pone.0027256. Retrieved 1 May 2012.
  5. ^ a b Berckmoes, Celine (2004). "Neural Foundations of Emotional Speech Processing". Current Directions in Psychological Science. 13 (5): 182–185.
  6. ^ Ryan, Melissa (2010). "Aging and the perception of emotion: Processing vocal emotions alone and with faces". Experimental Aging Research. 36 (1): 1–22.