Kwok et al., 2003 - Google Patents

Eigenvoice speaker adaptation via composite kernel principal component analysis

Document ID: 4451613605683684189
Author: Kwok J; Mak B; Ho S
Publication year: 2003
Publication venue: Advances in Neural Information Processing Systems

Snippet

Eigenvoice speaker adaptation has been shown to be effective when only a small amount of adaptation data is available. At the heart of the method is principal component analysis (PCA) employed to find the most important eigenvoices. In this paper, we postulate that …
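The PCA step mentioned in the snippet can be illustrated with a minimal sketch. It covers only plain linear PCA over speaker supervectors, not the composite kernel PCA method of the paper. The function names, the random training data, and the dimensions are all hypothetical, and the adaptation weights are passed in directly rather than estimated from adaptation data.

```python
import numpy as np

def eigenvoices(supervectors, k):
    """Return the mean supervector and the top-k principal directions."""
    X = np.asarray(supervectors, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of vt are orthonormal principal
    # directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def adapt(mean, basis, weights):
    """Build an adapted model as mean + weighted sum of eigenvoices."""
    return mean + np.asarray(weights, dtype=float) @ basis

# Hypothetical data: 20 training speakers, 50-dimensional supervectors.
rng = np.random.default_rng(0)
train = rng.normal(size=(20, 50))

mean, basis = eigenvoices(train, k=5)
# Assumption: weights are given; with all-zero weights the adapted
# model is simply the mean supervector.
new_model = adapt(mean, basis, np.zeros(5))
```

Because only `k` weights describe a new speaker, far fewer parameters need to be set than in a full model, which is consistent with the snippet's point about small amounts of adaptation data.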

Continue reading at proceedings.neurips.cc (PDF) (other versions)

Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
            - G10L15/142—Hidden Markov Models [HMMs]
              - G10L15/144—Training of HMMs
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
        - G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
      - G10L17/00—Speaker identification or verification
        - G10L17/04—Training, enrolment or model building
        - G10L17/06—Decision making techniques; Pattern matching strategies
          - G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
            - G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
              - G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
          - G06K9/6296—Graphical models, e.g. Bayesian networks
          - G06K9/6267—Classification techniques
            - G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
        - G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
          - G06K9/00268—Feature extraction; Face representation
            - G06K9/00281—Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Similar Documents

Publication	Publication Date	Title
Mak et al.	2005	Kernel eigenvoice speaker adaptation
Kadyan et al.	2019	A comparative study of deep neural network based Punjabi-ASR system
Hazen	2000	A comparison of novel techniques for rapid speaker adaptation
Yücesoy et al.	2013	Gender identification of a speaker using MFCC and GMM
Guo et al.	2018	Deep neural network based i-vector mapping for speaker verification using short utterances
Omar et al.	2010	Training Universal Background Models for Speaker Recognition
Kwok et al.	2003	Eigenvoice speaker adaptation via composite kernel principal component analysis
CA2260685C (en)	2002-10-22	Linear trajectory models incorporating preprocessing parameters for speech recognition
Mak et al.	2007	Kernel eigenspace-based MLLR adaptation
Woodland et al.	1991	Optimising hidden Markov models using discriminative output distributions
Mak et al.	2004	A study of various composite kernels for kernel eigenvoice speaker adaptation
Elnaggar et al.	2019	A new unsupervised short-utterance based speaker identification approach with parametric t-SNE dimensionality reduction
Mak et al.	2006	Embedded kernel eigenvoice speaker adaptation and its implication to reference speaker weighting
Kim et al.	2004	Maximum a posteriori adaptation of HMM parameters based on speaker space projection
Kim et al.	2000	Bayesian speaker adaptation based on probabilistic principal component analysis
Siu et al.	1998	Parametric trajectory mixtures for LVCSR
Ager et al.	2011	Combined waveform-cepstral representation for robust speech recognition
Li et al.	2010	Multi-feature combination for speaker recognition
Mak et al.	2004	Improving eigenspace-based MLLR adaptation by kernel PCA
Miao et al.	2013	Learning discriminative basis coefficients for eigenspace MLLR unsupervised adaptation
Zheng et al.	2001	Improved maximum mutual information estimation training of continuous density HMMs
Singh et al.	2017	Sparse representation classification over discriminatively learned dictionary for language recognition
Hong et al.	2004	Discriminative training for speaker identification based on maximum model distance algorithm
Hossa et al.	2016	An Effective Speaker Clustering Method using UBM and Ultra-Short Training Utterances
Mak et al.	2005	Various reference speakers determination methods for embedded kernel eigenvoice speaker adaptation