[go: nahoru, domu]

Kwok et al., 2003 - Google Patents

Eigenvoice speaker adaptation via composite kernel principal component analysis

Kwok et al., 2003

View PDF
Document ID
4451613605683684189
Author
Kwok J
Mak B
Ho S
Publication year
Publication venue
Advances in Neural Information Processing Systems

External Links

Snippet

Eigenvoice speaker adaptation has been shown to be effective when only a small amount of adaptation data is available. At the heart of the method is principal component analysis (PCA) employed to find the most important eigenvoices. In this paper, we postulate that …
Continue reading at proceedings.neurips.cc (PDF) (other versions)

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/14Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
    • G10L15/142Hidden Markov Models [HMMs]
    • G10L15/144Training of HMMs
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065Adaptation
    • G10L15/07Adaptation to the speaker
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6217Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06K9/6232Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
    • G06K9/6247Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • G10L17/04Training, enrolment or model building
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6296Graphical models, e.g. Bayesian networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6267Classification techniques
    • G06K9/6268Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00268Feature extraction; Face representation
    • G06K9/00281Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • G10L17/06Decision making techniques; Pattern matching strategies
    • G10L17/08Use of distortion metrics or a particular distance between probe pattern and reference templates
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis

Similar Documents

Publication Publication Date Title
Mak et al. Kernel eigenvoice speaker adaptation
Kadyan et al. A comparative study of deep neural network based Punjabi-ASR system
Hazen A comparison of novel techniques for rapid speaker adaptation
Yücesoy et al. Gender identification of a speaker using MFCC and GMM
Guo et al. Deep neural network based i-vector mapping for speaker verification using short utterances
Omar et al. Training Universal Background Models for Speaker Recognition.
Kwok et al. Eigenvoice speaker adaptation via composite kernel principal component analysis
CA2260685C (en) Linear trajectory models incorporating preprocessing parameters for speech recognition
Mak et al. Kernel eigenspace-based MLLR adaptation
Woodland et al. Optimising hidden Markov models using discriminative output distributions
Mak et al. A study of various composite kernels for kernel eigenvoice speaker adaptation
Elnaggar et al. A new unsupervised short-utterance based speaker identification approach with parametric t-SNE dimensionality reduction
Mak et al. Embedded kernel eigenvoice speaker adaptation and its implication to reference speaker weighting
Kim et al. Maximum a posteriori adaptation of HMM parameters based on speaker space projection
Kim et al. Bayesian speaker adaptation based on probabilistic principal component analysis.
Siu et al. Parametric trajectory mixtures for LVCSR
Ager et al. Combined waveform-cepstral representation for robust speech recognition
Li et al. Multi-feature combination for speaker recognition
Mak et al. Improving eigenspace-based MLLR adaptation by kernel PCA
Miao et al. Learning discriminative basis coefficients for eigenspace MLLR unsupervised adaptation
Zheng et al. Improved maximum mutual information estimation training of continuous density HMMs.
Singh et al. Sparse representation classification over discriminatively learned dictionary for language recognition
Hong et al. Discriminative training for speaker identification based on maximum model distance algorithm
Hossa et al. An Effective Speaker Clustering Method using UBM and Ultra-Short Training Utterances
Mak et al. Various reference speakers determination methods for embedded kernel eigenvoice speaker adaptation