US6542869B1 - Method for automatic analysis of audio including music and speech - Google Patents
- Publication number
- US6542869B1 (application US09/569,230)
- Authority
- US
- United States
- Prior art keywords
- matrix
- kernel
- audio signal
- points
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Links
- 238000000034 method Methods 0.000 title claims abstract description 77
- 238000004458 analytical method Methods 0.000 title claims description 19
- 239000013598 vector Substances 0.000 claims abstract description 41
- 230000005236 sound signal Effects 0.000 claims abstract description 32
- 239000011159 matrix material Substances 0.000 claims description 59
- 238000001228 spectrum Methods 0.000 claims description 38
- 238000005259 measurement Methods 0.000 claims description 16
- 238000005070 sampling Methods 0.000 claims description 4
- 238000012935 Averaging Methods 0.000 claims description 2
- 230000002596 correlated effect Effects 0.000 claims description 2
- 230000008859 change Effects 0.000 abstract description 13
- 238000012800 visualization Methods 0.000 description 16
- 238000013459 approach Methods 0.000 description 15
- 230000011218 segmentation Effects 0.000 description 11
- 230000003595 spectral effect Effects 0.000 description 10
- 230000006870 function Effects 0.000 description 9
- 239000011295 pitch Substances 0.000 description 9
- 230000033001 locomotion Effects 0.000 description 8
- 241001342895 Chorus Species 0.000 description 5
- 238000011524 similarity measure Methods 0.000 description 5
- 230000007704 transition Effects 0.000 description 5
- 239000000203 mixture Substances 0.000 description 4
- 230000004048 modification Effects 0.000 description 4
- 238000012986 modification Methods 0.000 description 4
- 230000003252 repetitive effect Effects 0.000 description 4
- 230000001020 rhythmical effect Effects 0.000 description 4
- 230000002459 sustained effect Effects 0.000 description 4
- 230000000875 corresponding effect Effects 0.000 description 3
- 238000000605 extraction Methods 0.000 description 3
- 230000001755 vocal effect Effects 0.000 description 3
- 230000015572 biosynthetic process Effects 0.000 description 2
- 230000006835 compression Effects 0.000 description 2
- 238000007906 compression Methods 0.000 description 2
- 230000007423 decrease Effects 0.000 description 2
- 230000033764 rhythmic process Effects 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
- 230000002123 temporal effect Effects 0.000 description 2
- 241000544061 Cuculus canorus Species 0.000 description 1
- 241001465754 Metazoa Species 0.000 description 1
- 230000002238 attenuated effect Effects 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 239000004020 conductor Substances 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 230000004907 flux Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 230000008520 organization Effects 0.000 description 1
- 230000000737 periodic effect Effects 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000010183 spectrum analysis Methods 0.000 description 1
- 238000013179 statistical model Methods 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
- 230000036962 time dependent Effects 0.000 description 1
- 238000012549 training Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/632—Query formulation
- G06F16/634—Query by example, e.g. query by humming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/041—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal based on mfcc [mel -frequency spectral coefficients]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/046—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for differentiation between music and non-music signals, based on the identification of musical parameters, e.g. based on tempo detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/061—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of musical phrases, isolation of musically relevant segments, e.g. musical thumbnail generation, or for temporal structure analysis of a musical piece, e.g. determination of the movement sequence of a musical work
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/121—Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
- G10H2240/131—Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
- G10H2240/135—Library retrieval index, i.e. using an indexing scheme to efficiently retrieve a music piece
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/131—Mathematical functions for musical analysis, processing, synthesis or composition
- G10H2250/215—Transforms, i.e. mathematical transforms into domains appropriate for musical signal processing, coding or compression
- G10H2250/235—Fourier transform; Discrete Fourier Transform [DFT]; Fast Fourier Transform [FFT]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
Definitions
- the present invention relates to a method for identifying changes in an audio signal which may include music, speech, or a combination of music and speech. More particularly, the present invention relates to the identification of changes in the audio for the purpose of indexing, summarizing, beat tracking, or retrieving.
- frame-to-frame differences provide a useful measure of overall changes or novelty in the video signal content.
- Frame-to-frame differences can be used for automatic segmentation and key frame extraction, as well as for other purposes.
- a typical approach to audio segmentation is to detect silences. Such a system is disclosed by Arons, B. in “SpeechSkimmer: A system for interactively skimming recorded speech.” ACM Trans. On Computer Human Interaction, 4(1):3-38, March 1997. A procedure for detecting silence works best for speech, even though silences in the speech signal may have little or no semantic significance. Much audio, such as popular music or reverberant sources, may contain no silences at all, and silence-based segmentation methods will fail.
- Auditory Scene Analysis tries to detect harmonically and temporally related components of sound.
- A. Bregman in “Auditory Scene Analysis: Perceptual Organization of Sound”, Bradford Books , 1994.
- the Auditory Scene Analysis procedure works only in a limited domain, such as a small number of sustained and harmonically pure musical notes.
- the Bregman approach looks for components in the frequency domain that are harmonically or temporally related.
- rules and assumptions are used to define what “related” means, and the rules typically work well only in a limited domain.
- Another approach uses speaker identification to segment audio by characteristics of an individual. Such a system is disclosed by Siu et al., “An Unsupervised Sequential Learning Algorithm For The Segmentation Of Speech Waveforms With Multiple Speakers”, Proc. ICASSP , vol. 2, pp. 189-192, March 1992. Though a speaker identification method could be used to segment music, it relies on statistical models that must be trained from a corpus of labeled data, or estimated by clustering audio segments.
- Another approach to audio segmentation operates using musical beat-tracking.
- beat tracking correlated energy peaks across sub-bands are used.
- Another approach depends on restrictive assumptions, such as that the music must be in 4/4 time and have a bass drum on the downbeat. See, Goto, M. and Y. Muraoka, “A Beat Tracking System for Acoustic Signals of Music,” in Proc. ACM Multimedia 1994, San Francisco, ACM.
- a method is provided to automatically find points of change in music or audio, by looking at local self-similarity.
- the method can identify individual note boundaries or natural segment boundaries such as verse/chorus or speech/music transitions, even in the absence of cues such as silence.
- the present invention works for any audio source regardless of complexity, does not rely on particular acoustic features such as silence, or pitch, and needs no clustering or training.
- the method of the present invention can be used in a wide variety of applications, including indexing, beat tracking, and retrieving and summarizing music or audio.
- the method works with a wide variety of audio sources.
- the method in accordance with the present invention finds points of maximum audio change by considering self-similarity of the audio signal. For each time window in the audio signal, a formula, such as a Fast Fourier Transform (FFT), is applied to determine a parameterization value vector.
- the self-similarity as well as cross-similarity between each of the parameterization values is determined for past and future windows.
- a significant point of novelty or change will have a high self-similarity in the past and future, and a low cross-similarity.
- the extent of the time difference between “past” and “future” can be varied to change the scale of the system so that, for example, individual notes can be found using a short time extent while longer events, such as musical themes, can be identified by considering windows further into the past or future. The result is a measure of how novel the source audio is at any time.
- Audio indexing/browsing: jump to segment points.
- Audio summarization: play only the start of significantly new segments.
- Indexing/browsing audio: link/jump to the next novel segment.
- the method in accordance with the present invention thus produces a time series that is proportional to the novelty of an acoustic source at any instant. High values and peaks correspond to large audio changes, so the novelty score can be thresholded to find instants which can be used as segment boundaries.
- FIG. 1 is a flow chart illustrating the steps of the method of analysis in accordance with the present invention
- FIG. 2 shows the musical score for the first bars of Bach's Prelude No. 1;
- FIGS. 3A and 3B provide a visualization of the distance matrix for the first few seconds taken from a performance of the bars of FIG. 2;
- FIG. 4 shows a 3-dimensional plot of a 64×64 checkerboard kernel with a radial Gaussian taper;
- FIG. 5 shows the novelty score computed on the similarity matrix S for the Gould performance from FIG. 3A;
- FIG. 6 shows the segment boundaries extracted from a 1/2 second kernel of the first 10 seconds of the Gould performance shown in FIG. 3A;
- FIG. 7 shows the similarity matrix for a 56-second segment of the soundtrack to the motion picture Raiders of the Lost Ark
- FIG. 8 shows an example of a beat spectrum B(l) computed for a range of 3 seconds in the Gould performance shown in FIG. 3A;
- FIG. 9 shows a beat spectrum computed from the first 10 seconds of the jazz composition Take 5 by the Dave Brubeck Quartet;
- FIGS. 10A and 10B show the self-similarity of the entire first movement of Beethoven's Symphony No. 5;
- FIG. 11 shows a similarity image of the Bach prelude for which the music bars are shown in FIG. 2, with data derived directly from MIDI data.
- FIG. 1 is a flow chart illustrating the steps of the method of analysis of audio in accordance with the present invention.
- the source audio 100 is sampled. Such sampling is done by windowing portions of the audio waveform. Variable window widths and overlaps can be used. For example, a window may be 256 samples wide, overlapping by 128 points. For audio sampled at 16 kHz, this results in a 16 ms frame width and a frame rate of 125 frames per second.
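The windowing described above can be sketched as follows; this is a minimal illustration, and the function name and frame-count arithmetic are assumptions, not from the patent:

```python
import numpy as np

def frame_signal(x, width=256, hop=128):
    """Split a 1-D audio signal into overlapping frames of `width`
    samples, advancing by `hop` samples each time (50% overlap here)."""
    n_frames = 1 + (len(x) - width) // hop
    return np.stack([x[i * hop: i * hop + width] for i in range(n_frames)])

# At 16 kHz, 256-sample frames hopped by 128 give a 16 ms frame
# width and roughly 125 frames per second.
frames = frame_signal(np.zeros(16128))
```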
- in a second step 102, the sampled audio signal is parameterized.
- Each frame is parameterized using an analysis function that provides a vector representation of the audio signal portion such as a Fourier transform, or a Mel-Frequency Cepstral Coefficients (MFCC) analysis.
- Other parameterization methods include ones based on linear prediction, psychoacoustic considerations, or potentially a combination of techniques, such as Perceptual Linear Prediction.
- each analysis frame is windowed with a 256-point Hamming window and a Fast Fourier transform (FFT) is used for parameterization to estimate the spectral components in the window.
- the logarithm of the magnitude of the result of the FFT is used as an estimate of the power spectrum of the signal in the window.
- High frequency components are discarded, typically those above one quarter of the sampling frequency (Fs/4), since the high frequency components are not as useful for similarity calculations as lower frequency components.
- the resulting vector characterizes the spectral content of a window.
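The parameterization steps above (Hamming window, FFT, log magnitude, discarding bins above Fs/4) can be sketched as follows; the spectral floor constant and the exact cutoff arithmetic are illustrative assumptions:

```python
import numpy as np

def parameterize(frames, floor=1e-10):
    """Log-magnitude spectrum of each Hamming-windowed frame,
    keeping only bins below roughly one quarter of the sampling
    rate (the lower half of the rfft bins, which span 0..Fs/2)."""
    win = np.hamming(frames.shape[1])
    spec = np.fft.rfft(frames * win, axis=1)
    logmag = np.log(np.abs(spec) + floor)  # floor avoids log(0) on silence
    keep = spec.shape[1] // 2              # bins up to about Fs/4
    return logmag[:, :keep]

# Ten 256-sample frames -> ten compact 64-dimensional parameter vectors.
V = parameterize(np.random.randn(10, 256))
```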
- MPEG (Moving Picture Experts Group) is a family of standards used for coding audio-visual information in a digital compressed format.
- MPEG Layer 3 uses a spectral representation similar to an FFT that can be used for a distance measurement, avoiding the need to decode the audio. Regardless of the parameterization used, the desired result is a compact vector of parameters for every frame.
- MFCC representation which preserves the coarse spectral shape while discarding fine harmonic structure due to pitch, may be appropriate for certain applications.
- a single pitch in the MFCC domain is represented by roughly the envelope of the harmonics, not the harmonics themselves.
- MFCCs will tend to match similar timbres rather than exact pitches, though single-pitched sounds will match if they are present.
- the method in accordance with the present invention is flexible and can subsume almost any existing audio analysis method for parameterizing.
- the parameterization step can be tuned for a particular task by choosing different parameterization functions, or for example by adjusting window size to maximize the contrast of a resulting similarity matrix as determined in subsequent steps.
- the parameters are embedded in a 2-dimensional representation.
- One way of embedding the audio is described by the present inventor J. Foote in “Visualizing Music and Audio Using Self-Similarity”, Proc. ACM Multimedia 99, Orlando, Fla., which is incorporated herein by reference.
- a key quantity is a measure of the similarity, or dissimilarity, D between two feature vectors v i and v j .
- the vectors v i and v j are determined in the parameterization step for audio frames i and j, discussed previously.
- Another measurement of vector similarity is a scalar dot product of vectors.
- the dot product of the vectors will be large if the vectors are both large and similarly oriented.
- the dot product can be represented as follows: D(i, j) = v i · v j
- the dot product can be normalized to give the cosine of the angle between the vector parameters.
- the cosine of the angle between vectors has the property that it yields a large similarity score even if the vectors are small in magnitude. Because of Parseval's relation, the norm of each spectral vector will be proportional to the average signal energy in a window to which the vector is assigned.
- the normalized dot product, which gives the cosine of the angle between the vectors, can be represented as follows: D(i, j) = (v i · v j ) / (‖v i ‖ ‖v j ‖)
- vectors with low energy, such as those containing silence, will be spectrally similar, which is generally desirable.
- feature vectors will occur at a rate much faster than typical musical events in a musical score, so a more desirable similarity measure can be obtained by computing the vector correlation over a larger window w.
- the larger window also captures an indication of the time dependence of the vectors. For a window to have a high similarity score, vectors in a window must not only be similar but their sequence must be similar as well.
- the scalar sequence (1,2,3,4,5) has a much higher cosine similarity score with itself than with the sequence (5,4,3,2,1).
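The claim above can be checked numerically with a small cosine-similarity helper (the function name is illustrative, not from the patent):

```python
import numpy as np

def cosine(a, b):
    """Normalized dot product: the cosine of the angle between vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

same = cosine([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])       # exactly 1.0
reversed_ = cosine([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])  # 35/55, about 0.636
```

The self-similarity score is maximal, while the reversed sequence scores noticeably lower even though it contains the same values.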
- the dot-product and cosine measures grow with increasing vector similarity while Euclidean distance approaches zero.
- to obtain a comparable similarity measure, the Euclidean distance can be inverted.
- Other reasonable distance measurements can be used for distance embedding, such as statistical measures or weighted versions of the metric examples disclosed previously herein.
- the distance measure D is a function of two frames, or instances in the source signal. It may be desirable to consider the similarity between all possible instants in a signal. This is done by embedding the distance measurements D in a two dimensional matrix representation S.
- the matrix S contains the similarity calculated for all frames, or instances, or for all the time indexes i and j such that the i,j element of the matrix S is D (i,j). In general, S will have maximum values on the diagonal because every window will be maximally similar to itself.
- the matrix S can be visualized as a square image such that each pixel i,j is given a gray scale value proportional to the similarity measure D(i,j) and scaled such that the maximum value is given the maximum brightness.
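Embedding the cosine measure into the matrix S can be sketched as follows (a minimal illustration; the function name and the epsilon guard against zero-norm vectors are assumptions):

```python
import numpy as np

def similarity_matrix(V, eps=1e-10):
    """S[i, j] = cosine similarity between feature vectors i and j,
    where the rows of V are the per-frame parameter vectors."""
    U = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), eps)
    return U @ U.T

V = np.random.randn(8, 16)
S = similarity_matrix(V)
```

As the text notes, S is maximal on the diagonal (every window is maximally similar to itself) and symmetric.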
- FIG. 2 shows the musical score for the first bars of Bach's Prelude No. 1 in C Major, from The Well-Tempered Clavier, BWV 846.
- FIG. 3A shows a visualization of a 1963 piano performance by Glenn Gould from the bars of the composition shown in FIG. 2 .
- FIG. 3A provides a visualization of the distance matrix for the first few seconds taken from a performance of the bars of FIG. 2 .
- image rather than matrix coordinate conventions are used, so the origin is at the lower left and time increases both with height and to the right.
- FIG. 3B shows an acoustic realization from Musical Instrument Digital Interface (MIDI) protocol data for the composition shown in FIG. 2 .
- the visualizations in FIGS. 3A and 3B make both the structure of the piece and details of the performance visible.
- the musical structure is clear from the repetitive motifs. Multiples of the repetition time can be seen in the off-diagonal stripes parallel to the main diagonal in FIGS. 3A and 3B. In the first few bars of the score shown in FIG. 2, the repetitive nature of the piece is made clear. In the visualization of FIG. 3A, approximately 34 notes can be seen as squares along the diagonal. Repetition of the notes is visible in the off-diagonal stripes parallel to the main diagonal, beginning at 0, 2, 4 and 6 seconds in FIG. 3A.
- FIG. 3B visualizes a similar excerpt, but from a MIDI realization using a passable piano sample with a strict tempo. Beginning silence is visible as a bright square at the lower left which has a high self-similarity, but low cross-similarity with remaining non-silent portions as can be seen by the dark squares projected both horizontally and vertically beginning at the origin of the visualization.
- all notes have exactly the same duration and articulation.
- the simplification offered by the slanted matrix is particularly useful in a number of applications presented herein later, where the similarity determination is only required for relatively small lags and not all combinations of i and j.
- Computing the matrix L only for small, non-negative values of l can result in substantial reductions in computation and storage.
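The slant (lag-domain) representation can be sketched as follows; the function name and the dense-to-slant conversion shown here are illustrative assumptions:

```python
import numpy as np

def lag_matrix(S, max_lag):
    """L[i, l] = S[i, i + l] for 0 <= l < max_lag, keeping only small
    non-negative lags and cutting storage from O(n^2) to O(n * max_lag)."""
    n = S.shape[0]
    L = np.zeros((n, max_lag))
    for l in range(max_lag):
        # The l-th superdiagonal of S has length n - l.
        L[: n - l, l] = np.diagonal(S, offset=l)
    return L

S = np.arange(36.0).reshape(6, 6)
Lm = lag_matrix(S, 3)
```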
- the next step 106 which may be taken, as illustrated in FIG. 1, is to determine the degree of change or novelty between distance measurements.
- the structure of the matrix S is a key in the determination of the degree of change or novelty measurement.
- to see how S can be used to determine novelty, consider a simple song having two successive notes of different pitch, for example a cuckoo call. When visualized, S for this two-note example will look like a 2×2 checkerboard. White squares on the diagonal correspond to the notes, which have high self-similarity. Black squares on the off-diagonals correspond to regions of low cross-similarity. Assuming we are using the normalized dot product, or cosine of the angle between vectors, for a difference determination, similar regions will be close to 1, while dissimilar regions will be closer to −1.
- Finding the instant when the notes change in S for the cuckoo call is as simple as finding the center of the checkerboard. This can be done by correlating S with a kernel that itself looks like a checkerboard, termed a “checkerboard” kernel.
- the simplest is the following 2×2 unit kernel:
  [  1  −1
    −1   1 ]
- the first “coherence” term measures the self-similarity on either side of the center point. This will be high when both regions of the matrix S are self-similar.
- the second “anticoherence” term measures the cross-similarity between the two regions. The cross-similarity will be high when the two regions are substantially similar, or when there is little difference across the center point. The difference between the self-similarity and the cross-similarity estimates the novelty of the signal at the center point. The difference value will be high when the two regions are self-similar, but different from each other.
- Kernels can be smoothed to avoid edge effects using windows, such as a Hamming, that taper toward zero at the edges.
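A tapered kernel like the 64×64 radial-Gaussian example of FIG. 4 can be built as sketched below; the Gaussian width parameter and the half-sample index offset are assumptions, not values from the patent:

```python
import numpy as np

def checkerboard_kernel(half_width, sigma_frac=0.4):
    """Checkerboard kernel of size 2*half_width x 2*half_width with a
    radial Gaussian taper that decays toward the edges."""
    # Offset by 0.5 so no index is exactly zero and every cell gets a sign.
    idx = np.arange(-half_width, half_width) + 0.5
    signs = np.sign(np.outer(idx, idx))      # +1 coherence, -1 cross quadrants
    r2 = np.add.outer(idx ** 2, idx ** 2)    # squared radius from the center
    taper = np.exp(-r2 / (2 * (sigma_frac * half_width) ** 2))
    return signs * taper

C = checkerboard_kernel(32)  # a 64x64 tapered checkerboard
```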
- the kernel C has a width (lag) of L and is centered on 0,0.
- S can be zero-padded to avoid undefined values, or, as in the present examples, only computed for the interior of the signal where the kernel overlaps S completely. Note that only regions of S with a lag of L or smaller are used, so the slant representation is particularly helpful. Also, typically both S and the kernel C are symmetric, so only one-half of the values under the double summation (those for m ≤ n) need be computed.
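Correlating the kernel along the diagonal of S can be sketched as follows, using the two-note "cuckoo" example from the text as a check; the kernel here is a simple unsmoothed checkerboard built inline, and all names are illustrative:

```python
import numpy as np

def novelty(S, C):
    """Slide the checkerboard kernel C along the main diagonal of S,
    summing the element-wise product where the kernel fits entirely."""
    half = C.shape[0] // 2
    n = S.shape[0]
    score = np.zeros(n)
    for i in range(half, n - half):
        score[i] = np.sum(S[i - half: i + half, i - half: i + half] * C)
    return score

# Two-note "cuckoo" example: S is a 2x2 block checkerboard of +1/-1,
# with the note change at frame 10.
S = np.ones((20, 20))
S[:10, 10:] = S[10:, :10] = -1
C = np.kron(np.array([[1.0, -1.0], [-1.0, 1.0]]), np.ones((4, 4)))
score = novelty(S, C)  # peaks exactly at the note change
```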
- the width of the kernel L directly affects the properties of the novelty measure.
- a small kernel detects novelty on a short time scale, such as beats or notes. Increasing the kernel size decreases the time resolution, and increases the length of novel events that can be detected. Larger kernels average over short-time novelty and detect longer structure, such as musical transitions like that between verse and chorus, key modulations, or symphonic movements.
- FIG. 5 shows the novelty score computed on the similarity matrix S for the Gould performance from FIG. 3A. Results using two kernel widths are shown. The plot for the 2-second kernel is offset slightly upward for clarity.
- the shorter 0.5-second kernel clearly picks out the note events, although some notes are slurred and are not as distinct. In particular, in Gould's idiosyncratic performance, the last three notes in each phrase are emphasized with a staccato attack. Note how this system clearly identifies the onset of each note without analyzing explicit features such as pitch, energy, or silence.
- the longer kernel yields peaks at the boundaries of 8-note phrases, at 2, 4 and 6 seconds. Each peak occurs exactly at the downbeat of the first note in each phrase. Note that although the present method has no a-priori knowledge of musical phrases or pitch, it is finding perceptually and musically significant points.
- extrema in the novelty score correspond to large changes in the audio characteristics. These novelty points often serve as good boundaries for segmenting the audio, as the audio will be similar within boundaries and significantly different across them. The novelty points also serve as useful indexes into the audio as they indicate points of significant change.
- Finding the segment boundaries is a simple matter of finding the peaks in the novelty score.
- a simple approach is to find points where the score exceeds a local or global threshold. This is illustrated in step 110 of FIG. 1, where thresholds in the novelty score 108 are determined to identify segment boundaries 112 . For time precision segmenting determinations, maximum or zero-slope points above the threshold locate peaks exactly.
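The thresholded peak picking described above can be sketched as follows (a simple illustration, not the patent's exact procedure):

```python
def find_segment_boundaries(score, threshold):
    """Return indices where the novelty score exceeds the threshold
    and is a local maximum (a zero-slope/maximum point)."""
    peaks = []
    for i in range(1, len(score) - 1):
        if score[i] > threshold and score[i - 1] <= score[i] > score[i + 1]:
            peaks.append(i)
    return peaks

# Peaks above the threshold 2 occur at indices 2 and 6.
boundaries = find_segment_boundaries([0, 1, 3, 1, 0, 2, 5, 2, 0], 2)
```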
- a useful way of organizing the index points is in a binary tree structure constructed by ranking all the index points by novelty score.
- the highest-scoring index point becomes the root of the tree, and divides the signal into left and right sections.
- the highest-scoring index point in the left and right sections become the left and right children of the root node, and so forth recursively until no more index points are left above the threshold.
- the tree structure facilitates navigation of the index points, making it easy to find the nearest index from an arbitrary point by walking the tree.
- the tree can be truncated at any threshold level to yield a desired number of index points, and hence segments.
- An enhancement to the tree approach is to reduce the size of the kernel as the tree is descended, so lower-level index points reveal increasingly fine time granularity.
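The ranking-and-recursion procedure above can be sketched as follows; the class and function names are illustrative, and `points` is assumed to be a list of (frame index, novelty score) pairs ordered by index:

```python
class IndexNode:
    """A node in the binary tree of index points."""
    def __init__(self, index, score):
        self.index, self.score = index, score
        self.left = self.right = None

def build_index_tree(points, threshold):
    """The highest-scoring point above the threshold becomes the
    root; points before and after it form the left and right
    subtrees, recursively, until no points remain above the
    threshold."""
    above = [p for p in points if p[1] > threshold]
    if not above:
        return None
    root_index, root_score = max(above, key=lambda p: p[1])
    node = IndexNode(root_index, root_score)
    node.left = build_index_tree([p for p in points if p[0] < root_index], threshold)
    node.right = build_index_tree([p for p in points if p[0] > root_index], threshold)
    return node

pts = [(10, 0.4), (25, 0.9), (40, 0.6), (55, 0.3)]
tree = build_index_tree(pts, 0.35)
print(tree.index, tree.left.index, tree.right.index)  # → 25 10 40
```

Raising the threshold simply prunes the tree, yielding fewer, coarser segments.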
- FIG. 6 shows the segment boundaries extracted with a ½-second kernel from the first 10 seconds of the Gould performance shown in FIG. 3A. Segment boundaries are marked in the figure. Individual notes are clearly isolated, except for the fourth note, which has been slurred with the third.
- segment boundaries identified as in FIG. 6 work well for musical notes, but cannot be expected to segment speech into words unless the words are spectrally distinct. Speech is different because words are often not well delineated acoustically. For example, the phrase "that's stupid" would be segmented into "that's-s" and "tupid" because there is little acoustic difference between the "s" sounds of the two words. Note that this would likely be the segmentation that a non-English speaker would choose.
- FIG. 7 shows the audio novelty score for the first minute of “Animals Have Young”, MPEG Requirements Group, video V14 from the MPEG-7 content set, Description of MPEG-7 Content Set, Doc. ISO/MPEG N2467, MPEG Atlantic City Meeting, October 1998.
- This segment contains 4 seconds of introductory silence, followed by a short musical segment with the production logo.
- the titles start, and very different theme music commences.
- the largest peak occurs directly on the speech/music transition at 35 seconds.
- the two other major peaks occur at the transitions between silence and music at 4 seconds and between the introduction and theme music at 17 seconds.
- a novelty score cannot, in general, discriminate between speakers unless they have markedly different vocal spectra, such as speakers of different genders. Often, however, there are enough differences for the similarity measure to find.
- a novelty score in accordance with the present invention can often be used to find differences in speaking style and not, as in previous approaches, differences between known vocal characteristics of particular speakers or speaker models.
- it is a straightforward matter to use a distance metric tuned to discriminate speakers by vocal characteristics for example based on cepstral coefficients, or using the methods disclosed in J. T. Foote and H. F. Silverman, “A Model Distance Measure For Talker Clustering And Identification”, Proc. ICASSP , vol. S1, pp. 317-320, Sydney, Australia, April 1994.
- beat tracking, as illustrated by step 114 of FIG. 1, can be provided as an alternative to performing a kernel correlation to obtain a novelty score.
- the result of this analysis is the beat spectrum B(l). Peaks in the beat spectrum correspond to periodicities in the audio.
- a simple estimate of the beat spectrum can be found by summing S along the diagonal as follows: B(l) ≈ Σ_{k∈R} S(k, k+l)
- B(0) is simply the sum along the main diagonal over some continuous range R
- B(1) is the sum along the first sub-diagonal, and so forth.
- the slant representation is especially convenient as diagonal sums are simply the sum across columns, or the projection onto the lag axis.
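The diagonal-sum estimate can be written directly; in the square (unslanted) matrix, each diagonal sum is a matrix trace at offset l. An illustrative sketch using a synthetic period-3 similarity matrix, not real audio:

```python
import numpy as np

def beat_spectrum(S, max_lag):
    """Estimate B(l) by summing the similarity matrix S along its
    diagonals: B(l) = sum over k of S(k, k+l)."""
    return np.array([np.trace(S, offset=lag) for lag in range(max_lag)])

t = np.arange(12)
feature = np.cos(2 * np.pi * t / 3.0)   # toy signal with period 3
S = np.outer(feature, feature)          # its similarity matrix
B = beat_spectrum(S, 6)
print(int(np.argmax(B[1:])) + 1)        # → 3, the periodicity in frames
```

As the text notes, the slant representation makes these sums even cheaper: each diagonal becomes a column, so the sum is a projection onto the lag axis.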
- FIG. 8 shows an example of a beat spectrum B(l) computed for a range of 3 seconds in the Gould performance illustrated in FIG. 3A.
- the periodicity of each note can be clearly seen, as well as the strong 8-note periodicity of the phrase with a sub-harmonic at 16 notes. Especially interesting are the peaks at notes 3 and 5 . These come from the three-note periodicity of the eight-note phrase. In each phrase, notes 3 and 6 , notes 4 and 7 , and notes 5 and 8 are the same.
- because B(k, l) is symmetrical, it is only necessary to sum over one variable, giving the one-dimensional result B(l).
- the beat spectrum B(l) provides good results across a range of musical genres, tempos and rhythmic structures.
- FIG. 9 shows a beat spectrum computed from the first 10 seconds of the jazz composition Take Five by the Dave Brubeck Quartet. Apart from being in an unusual 5/4 time signature, this rhythmically sophisticated piece requires some interpretation. First, note that no obvious periodicity occurs at the actual beat tempo, which is marked by the solid vertical lines in FIG. 9. Rather, there is a marked periodicity at 5 beats, and a corresponding subharmonic at 10. Jazz aficionados know that "swing" is the subdivision of beats into non-equal periods rather than "straight" equal eighth-notes. The beat spectrum clearly shows that each beat is subdivided into near-perfect triplets, indicated with dotted lines spaced 1/3 of a beat apart between the second and third beats. A clearer illustration of "swing" would be difficult to provide.
- beat spectrum in combination with the narrow kernel novelty score gives excellent estimates of musical tempo. Peaks in the beat spectrum give the fundamental rhythmic periodicity, while peaks in the novelty score give the precise downbeat time or phase. Correlating the novelty score with a comb-like function with a period from the beat spectrum yields a signal that has strong peaks at every beat. Strong off-beats and syncopations can be then deduced from secondary peaks in the beat spectrum.
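The comb correlation above can be sketched minimally: with the beat period taken from a beat-spectrum peak, the comb reduces to an impulse train, and the highest-scoring phase locates the downbeat. An illustrative sketch, not the patent's exact formulation:

```python
import numpy as np

def beat_phase(novelty, period):
    """Correlate the novelty score with an impulse train of the
    given period; the phase with the largest total novelty is the
    estimated downbeat offset."""
    scores = [novelty[phase::period].sum() for phase in range(period)]
    return int(np.argmax(scores))

nov = np.zeros(20)
nov[2::4] = 1.0                 # toy novelty peaks every 4 frames
print(beat_phase(nov, 4))       # → 2, the downbeat phase
```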
- Conventional approaches to beat tracking look for absolute acoustic attributes, for example energy peaks across particular sub-bands.
- the beat tracking method in accordance with the present invention is more robust since the only signal attribute needed is a repetitive change.
- the beat spectrum discards absolute timing information.
- the beat spectrum is introduced for analyzing rhythmic variation over time.
- a spectrogram images the Fourier analysis of successive windows to illustrate spectral variation over time.
- a beat spectrogram presents the beat spectrum over successive windows to display rhythmic variation over time.
- the beat spectrogram is an image formed from successive beat spectra, with time on the x axis and lag time on the y axis. Each pixel is colored with the scaled value of the beat spectrum at that time and lag, so that beat spectrum peaks are visible as bright bars in the beat spectrogram.
- the beat spectrogram shows how tempo varies over time. For example, an accelerating rhythm will be visible as bright bars that slope downward, as the lag time between beats decreases with time.
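The construction can be sketched by windowing the similarity matrix and stacking the per-window beat spectra as image columns; this is an illustrative sketch, and the window parameters are assumptions:

```python
import numpy as np

def beat_spectrogram(S, win, hop, max_lag):
    """Column t of the result is the beat spectrum of the window
    of the similarity matrix starting at frame t * hop; lag runs
    along the y axis, time along the x axis."""
    cols = []
    for start in range(0, S.shape[0] - win + 1, hop):
        block = S[start:start + win, start:start + win]
        cols.append([np.trace(block, offset=lag) for lag in range(max_lag)])
    return np.array(cols).T

S = np.eye(20)                              # trivial similarity matrix
bs = beat_spectrogram(S, win=8, hop=4, max_lag=4)
print(bs.shape)                             # → (4, 4): 4 lags x 4 windows
```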
- the beat spectrum has interesting parallels with the frequency spectrum.
- the beat spectrum is a frequency operator, and therefore does not commute with a time operator. In particular, if the tempo changes over the analysis window, it will “smear” the beat spectrum. Analogously, changing a signal's frequency over the analysis window will result in a smeared frequency spectrum.
- beat spectral analysis, just like frequency analysis, involves a trade-off between spectral and temporal resolution.
- the ability to reliably segment and beat-track audio has a number of useful applications, several of which are described below. Note that the method of the present invention can be used for any time-dependent media, such as video, where some measure of point-to-point similarity can be determined.
- the method in accordance with the present invention provides good estimates of audio segment boundaries.
- the segment boundary locations are useful for an application where one might wish to play back only a portion of an audio file.
- selection operations can be constrained to segment boundaries so that the selection region does not contain fractional notes or phrases.
- Such an audio editing tool would be similar to “smart cut and paste” in a text editor that constrains selection regions to entire units such as words or sentences.
- the segment size can be adjusted to the degree of zoom so that the appropriate time resolution is available. When zoomed in, higher resolution (perhaps from the lower levels of the index tree) would allow note-by-note selection, while a zoomed-out view would allow selection by phrase or section.
- segmenting audio greatly facilitates audio browsing: a “jump-to-next-segment” function allows audio to be browsed faster than real time. Because segments will be reasonably self-similar, listening to a small part of the segment will give a good idea of the entire segment.
- the audio segmentation and indexing approach can be extended to automatic summarization, for example by playing only the start of each segment, as in the "scan" feature on a CD player.
- segments can be clustered so that only significantly novel segments are included in the summary. Segments too similar to one already in the summary could be skipped without losing much information. For example, when summarizing a popular song, repeated instances of the chorus could be excluded from the summary.
- a further refinement of audio summarization could be audio “gisting”, or finding a short segment that best characterizes the entire work.
- a number of on-line music retailers offer a small clip of their wares for customers to audition. The clips are typically just a short interval taken near the beginning of each work and may not be representative of the entire piece.
- Simple processing of a similarity matrix in accordance with the present invention can find the segment most similar throughout the entire work. Averaging the similarity matrix over the interval corresponding to each segment gives a measure of how well the segment represents the entire work. The highest-scoring segment would thus be the best candidate for a sample clip.
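The scoring step above can be sketched as follows: average each segment's rows of the similarity matrix to measure how well that segment represents the whole work, and pick the best. The segment boundaries and matrix here are toy data:

```python
import numpy as np

def best_segment(S, boundaries):
    """Return the index of the segment whose average similarity
    to the entire work is highest."""
    scores = [S[a:b, :].mean() for a, b in zip(boundaries[:-1], boundaries[1:])]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
S = rng.uniform(0.0, 0.2, size=(12, 12))
S[4:8, :] += 0.7        # make segment 1 broadly similar to everything
print(best_segment(S, [0, 4, 8, 12]))  # → 1
```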
- Classification and retrieval, such as classification of advertisements in the audio portion of a TV broadcast, will work better when the method of the present invention is applied to shorter audio segments with uniform features rather than to long heterogeneous data.
- in streaming audio, such as a radio broadcast or a TV soundtrack, it is rarely clear when one segment begins and ends.
- each segment is guaranteed to be reasonably self-similar and hence homogeneous.
- segments determined in accordance with the method of the present invention will be good units to cluster by similarity, or to classify as in the methods disclosed by the present inventor J. Foote in "Content-Based Retrieval of Music and Audio", Multimedia Storage and Archiving Systems II, Proc. SPIE, Vol. 3229, Dallas, Tex., 1997.
- Visualizations in accordance with the method of the present invention show how acoustically similar passages can be located in an audio recording. Similarity can also be found across recordings as well as within a single recording.
- a classification procedure using the present invention to segment the audio would be immediately useful to identify known music or audio located in a longer file. For example, it would be a simple matter to find the locations of the theme music in a news broadcast, or the times that advertisements occur in a TV broadcast if the audio was available.
- the similarity measure could be computed between all frames of the source commercial and the TV broadcast resulting in a rectangular similarity matrix. Commercial onset times would be determined by thresholding the similarity matrix at some suitable value. Even if the commercials were not known in advance, they could be detected by their repetition. The structure of most music is sufficient to characterize the work.
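A sketch of the cross-similarity computation described above: frame feature vectors of the known clip and the broadcast are compared (cosine similarity is used here as one plausible frame-to-frame measure), and columns where similarity to the clip's first frame exceeds a threshold are candidate onset times. All names and data below are illustrative:

```python
import numpy as np

def cross_similarity(A, B):
    """Rectangular similarity matrix between clip frames A (m x d)
    and broadcast frames B (n x d), via cosine similarity."""
    An = A / np.linalg.norm(A, axis=1, keepdims=True)
    Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
    return An @ Bn.T

clip = np.array([[1.0, 0.0], [0.0, 1.0]])
broadcast = np.array([[1.0, 1.0]] * 10)
broadcast[5:7] = clip                     # embed the clip at frame 5
sim = cross_similarity(clip, broadcast)
onsets = np.flatnonzero(sim[0] > 0.99)    # threshold the first row
print(onsets)                             # → [5]
```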
- FIGS. 10A and 10B show the self-similarity of the entire first movement of Beethoven's Symphony No. 5. The two visualizations are each from a different performance: Herbert von Karajan and the Berlin Philharmonic in FIG. 10A, and Carlos Kleiber and the Vienna Philharmonic in FIG. 10B. Because the piece is more than seven minutes long, much fine detail is not observable.
- FIGS. 10A and 10B illustrate how visualizations capture both the essential structure of the piece as well as variations of individual performers.
- a benefit of the novelty score in accordance with the present invention is that it will be reasonably invariant across different realizations of the same music. Because the novelty score is based on self-similarity, rather than particular acoustic features, different performances of the same music should have similar novelty scores. Thus Bach's Air on a G String should result in a novelty score that is similar whether played on a violin or a kazoo.
- One application for novelty scores for different realizations of the same music is to use the scores to align the music.
- the time axis of one realization's novelty score can be warped to match another using dynamic programming.
- the warping function then serves as a tempo map.
- one piece could be played back with the tempo of the other.
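The alignment step above can be sketched with plain dynamic time warping on the two novelty sequences; the optimal path is the warping function, i.e. a tempo map between the realizations. This is a generic DTW sketch, not the patent's exact algorithm:

```python
import numpy as np

def dtw_path(a, b):
    """Return the minimum-cost alignment path between sequences
    a and b as a list of (i, j) index pairs."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    path, i, j = [], n, m        # trace the optimal path back
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

fast = [0.0, 1.0, 0.0, 1.0, 0.0]
slow = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
path = dtw_path(fast, slow)
print(path[0], path[-1])   # the path runs from (0, 0) to (4, 8)
```

Reading the path as a mapping from one realization's frames to the other's gives the tempo map mentioned above.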
- for audio time and pitch scaling methods, see S. Sprenger, "Time and Pitch Scaling of Audio Signals", www.dspdimension.com/html/timepitch.html.
- segment boundaries are useful landmarks for time-scale modification, just as “control points” are used for image “morphing”.
- Another application might be to play back an audio piece along with unpredictably timed events such as progress through a video game.
- Longer segments could be associated with particular stages, such as a game level or virtual environment location. As long as the user stayed at that stage, the segment would be looped. Moving to a different stage would cause another segment to start playing.
- Knowing the time location of landmarks allows synchronization of external events to the audio. For example, an animated character could beat time or dance to a musical tempo. Video clips could be automatically sequenced to an existing musical soundtrack. In another useful application, the lip movements of animated characters could be synchronized to speech or singing. Conversely, audio can be matched to an existing time sequence, by warping segments so that audio landmarks, or segment boundaries occur at given times. Examples include creating a soundtrack for an existing animation or video sequence. Another application might be to arrange songs by similar tempo so that the transition between them is smooth. This is a process that is done manually by expert DJs and also for vendors of “environmental” music, such as MuzakTM.
- FIG. 11 shows a similarity image of the Bach prelude for which the music bars are shown in FIG. 2, with data derived directly from MIDI data.
- in FIG. 11, no acoustic information was used.
- Matrix entries (i,j) were colored white if note i was the same pitch as note j, and left black otherwise. Comparing this image with the acoustic similarity images of FIG. 3A, clearly the structures of both visualizations are highly similar, indicating that they do indeed capture the underlying structure of the music.
- FIGS. 3A and 11 show two realizations of the same Bach piece: one by Glenn Gould in FIG. 3A, and a computer rendition of a MIDI file in FIG. 11.
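The MIDI-derived matrix described above is straightforward to construct; the note pitches below are hypothetical, not the Bach score:

```python
import numpy as np

def pitch_similarity(pitches):
    """Binary matrix: entry (i, j) is 1 (white) when notes i and j
    have the same MIDI pitch, 0 (black) otherwise."""
    p = np.asarray(pitches)
    return (p[:, None] == p[None, :]).astype(int)

notes = [60, 64, 67, 60, 64, 67]      # a toy repeating figure
M = pitch_similarity(notes)
print(M[0])   # → [1 0 0 1 0 0]: note 0 repeats at position 3
```

Repeated figures show up as off-main diagonals, just as in the acoustic similarity images.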
- Reliably segmenting music by note or phrase allows substantial compression of the audio. For example, a repeated series of notes can be represented by the first note and the repetition times. If the second chorus is nearly identical to the first, it need not be stored, only a code to indicate repetition of the first is needed.
- the MPEG-4 structured audio standard supports exactly this kind of representation, but previously there have been few reliable methods to analyze the structure of the existing audio.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title
--- | --- | --- | ---
US09/569,230 US6542869B1 (en) | 2000-05-11 | 2000-05-11 | Method for automatic analysis of audio including music and speech
JP2001140826A JP3941417B2 (en) | 2001-05-11 | 2001-05-11 | How to identify new points in a source audio signal
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title
--- | --- | --- | ---
US09/569,230 US6542869B1 (en) | 2000-05-11 | 2000-05-11 | Method for automatic analysis of audio including music and speech
Publications (1)
Publication Number | Publication Date
--- | ---
US6542869B1 (en) | 2003-04-01
Family
ID=24274595
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
--- | --- | --- | ---
US09/569,230 Expired - Lifetime US6542869B1 (en) | Method for automatic analysis of audio including music and speech | 2000-05-11 | 2000-05-11
Country Status (2)
Country | Link
--- | ---
US (1) | US6542869B1 (en)
JP (1) | JP3941417B2 (en)
US9292488B2 (en) | 2014-02-01 | 2016-03-22 | Soundhound, Inc. | Method for embedding voice mail in a spoken utterance using a natural language processing computer system |
US9336493B2 (en) | 2011-06-06 | 2016-05-10 | Sas Institute Inc. | Systems and methods for clustering time series data based on forecast distributions |
US9390167B2 (en) | 2010-07-29 | 2016-07-12 | Soundhound, Inc. | System and methods for continuous audio matching |
US9418339B1 (en) | 2015-01-26 | 2016-08-16 | Sas Institute, Inc. | Systems and methods for time series analysis techniques utilizing count data sets |
US9507849B2 (en) | 2013-11-28 | 2016-11-29 | Soundhound, Inc. | Method for combining a query and a communication command in a natural language computer system |
US9547715B2 (en) | 2011-08-19 | 2017-01-17 | Dolby Laboratories Licensing Corporation | Methods and apparatus for detecting a repetitive pattern in a sequence of audio frames |
US9564123B1 (en) | 2014-05-12 | 2017-02-07 | Soundhound, Inc. | Method and system for building an integrated user profile |
US9584940B2 (en) | 2014-03-13 | 2017-02-28 | Accusonus, Inc. | Wireless exchange of data between devices in live events |
US9653056B2 (en) | 2012-04-30 | 2017-05-16 | Nokia Technologies Oy | Evaluation of beats, chords and downbeats from a musical audio signal |
US20170287505A1 (en) * | 2014-09-03 | 2017-10-05 | Samsung Electronics Co., Ltd. | Method and apparatus for learning and recognizing audio signal |
US9798974B2 (en) | 2013-09-19 | 2017-10-24 | Microsoft Technology Licensing, Llc | Recommending audio sample combinations |
US9812150B2 (en) | 2013-08-28 | 2017-11-07 | Accusonus, Inc. | Methods and systems for improved signal decomposition |
US9892370B2 (en) | 2014-06-12 | 2018-02-13 | Sas Institute Inc. | Systems and methods for resolving over multiple hierarchies |
US9934259B2 (en) | 2013-08-15 | 2018-04-03 | Sas Institute Inc. | In-memory time series database and processing in a distributed environment |
US10121165B1 (en) | 2011-05-10 | 2018-11-06 | Soundhound, Inc. | System and method for targeting content based on identified audio and multimedia |
US10170090B2 (en) * | 2016-06-08 | 2019-01-01 | Visionarist Co., Ltd | Music information generating device, music information generating method, and recording medium |
US10169720B2 (en) | 2014-04-17 | 2019-01-01 | Sas Institute Inc. | Systems and methods for machine learning using classifying, clustering, and grouping time series data |
US20190014312A1 (en) * | 2017-07-05 | 2019-01-10 | Tektronix, Inc. | Video Waveform Peak Indicator |
US10255085B1 (en) | 2018-03-13 | 2019-04-09 | Sas Institute Inc. | Interactive graphical user interface with override guidance |
US10289916B2 (en) * | 2015-07-21 | 2019-05-14 | Shred Video, Inc. | System and method for editing video and audio clips |
US10297241B2 (en) * | 2016-03-07 | 2019-05-21 | Yamaha Corporation | Sound signal processing method and sound signal processing apparatus |
US10324899B2 (en) * | 2005-11-07 | 2019-06-18 | Nokia Technologies Oy | Methods for characterizing content item groups |
US10331490B2 (en) | 2017-11-16 | 2019-06-25 | Sas Institute Inc. | Scalable cloud-based time series analysis |
US10338994B1 (en) | 2018-02-22 | 2019-07-02 | Sas Institute Inc. | Predicting and adjusting computer functionality to avoid failures |
US10366121B2 (en) * | 2016-06-24 | 2019-07-30 | Mixed In Key Llc | Apparatus, method, and computer-readable medium for cue point generation |
US10460732B2 (en) * | 2016-03-31 | 2019-10-29 | Tata Consultancy Services Limited | System and method to insert visual subtitles in videos |
US10468036B2 (en) * | 2014-04-30 | 2019-11-05 | Accusonus, Inc. | Methods and systems for processing and mixing signals using signal decomposition |
US10560313B2 (en) | 2018-06-26 | 2020-02-11 | Sas Institute Inc. | Pipeline system for time-series data forecasting |
US20200151208A1 (en) * | 2016-09-23 | 2020-05-14 | Amazon Technologies, Inc. | Time code to byte indexer for partial object retrieval |
US10685283B2 (en) | 2018-06-26 | 2020-06-16 | Sas Institute Inc. | Demand classification based pipeline system for time-series data forecasting |
EP2201478B1 (en) * | 2007-09-18 | 2020-10-14 | Microsoft Technology Licensing, LLC | Synchronizing slide show events with audio |
US20200357369A1 (en) * | 2018-01-09 | 2020-11-12 | Guangzhou Baiguoyuan Information Technology Co., Ltd. | Music classification method and beat point detection method, storage device and computer device |
US20210064916A1 (en) * | 2018-05-17 | 2021-03-04 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Device and method for detecting partial matches between a first time varying signal and a second time varying signal |
US10957310B1 (en) | 2012-07-23 | 2021-03-23 | Soundhound, Inc. | Integrated programming framework for speech and text understanding with meaning parsing |
US10983682B2 (en) | 2015-08-27 | 2021-04-20 | Sas Institute Inc. | Interactive graphical user-interface for analyzing and manipulating time-series projections |
US11189277B2 (en) * | 2013-03-14 | 2021-11-30 | Amazon Technologies, Inc. | Dynamic gazetteers for personalized entity recognition |
US11264015B2 (en) | 2019-11-21 | 2022-03-01 | Bose Corporation | Variable-time smoothing for steady state noise estimation |
US11295730B1 (en) | 2014-02-27 | 2022-04-05 | Soundhound, Inc. | Using phonetic variants in a local context to improve natural language understanding |
US11315545B2 (en) | 2020-07-09 | 2022-04-26 | Raytheon Applied Signal Technology, Inc. | System and method for language identification in audio data |
US11374663B2 (en) * | 2019-11-21 | 2022-06-28 | Bose Corporation | Variable-frequency smoothing |
US11373657B2 (en) * | 2020-05-01 | 2022-06-28 | Raytheon Applied Signal Technology, Inc. | System and method for speaker identification in audio data |
US11443724B2 (en) * | 2018-07-31 | 2022-09-13 | Mediawave Intelligent Communication | Method of synchronizing electronic interactive device |
JP2022545342A (en) * | 2019-08-27 | 2022-10-27 | NEC Laboratories America, Inc. | Sequence model for audio scene recognition |
US20230282205A1 (en) * | 2022-03-01 | 2023-09-07 | Raytheon Applied Signal Technology, Inc. | Conversation diarization based on aggregate dissimilarity |
CN117636900A (en) * | 2023-12-04 | 2024-03-01 | 广东新裕信息科技有限公司 | Musical instrument playing quality evaluation method based on audio characteristic shape matching |
US12020697B2 (en) | 2020-07-15 | 2024-06-25 | Raytheon Applied Signal Technology, Inc. | Systems and methods for fast filtering of audio keyword search |
US12124498B2 (en) * | 2020-01-09 | 2024-10-22 | Amazon Technologies, Inc. | Time code to byte indexer for partial object retrieval |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005202014A (en) * | 2004-01-14 | 2005-07-28 | Sony Corp | Audio signal processor, audio signal processing method, and audio signal processing program |
KR100725018B1 (en) | 2005-11-24 | 2007-06-07 | 삼성전자주식회사 | Method and apparatus for summarizing music content automatically |
JP6257537B2 (en) * | 2015-01-19 | 2018-01-10 | 日本電信電話株式会社 | Saliency estimation method, saliency estimation device, and program |
TWI796955B (en) | 2021-02-17 | 2023-03-21 | 日商日本製鐵股份有限公司 | Non-oriented electrical steel sheet and manufacturing method thereof |
TWI809799B (en) | 2021-04-02 | 2023-07-21 | 日商日本製鐵股份有限公司 | Non-oriented electrical steel sheet and manufacturing method thereof |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5227892A (en) | 1990-07-06 | 1993-07-13 | Sony Broadcast & Communications Ltd. | Method and apparatus for identifying and selecting edit points in digital audio signals recorded on a record medium |
US5598507A (en) | 1994-04-12 | 1997-01-28 | Xerox Corporation | Method of speaker clustering for unknown speakers in conversational audio data |
US5655058A (en) | 1994-04-12 | 1997-08-05 | Xerox Corporation | Segmentation of audio data for indexing of conversational speech for real-time or postprocessing applications |
US5659662A (en) | 1994-04-12 | 1997-08-19 | Xerox Corporation | Unsupervised speaker clustering for automatic speaker indexing of recorded audio data |
US5828994A (en) | 1996-06-05 | 1998-10-27 | Interval Research Corporation | Non-uniform time scale modification of recorded audio |
US5918223A (en) * | 1996-07-22 | 1999-06-29 | Muscle Fish | Method and article of manufacture for content-based analysis, storage, retrieval, and segmentation of audio information |
US5986199A (en) * | 1998-05-29 | 1999-11-16 | Creative Technology, Ltd. | Device for acoustic entry of musical data |
US6185527B1 (en) * | 1999-01-19 | 2001-02-06 | International Business Machines Corporation | System and method for automatic audio content analysis for word spotting, indexing, classification and retrieval |
US6370504B1 (en) * | 1997-05-29 | 2002-04-09 | University Of Washington | Speech recognition on MPEG/Audio encoded files |
- 2000
- 2000-05-11 US US09/569,230 patent/US6542869B1/en not_active Expired - Lifetime
- 2001
- 2001-05-11 JP JP2001140826A patent/JP3941417B2/en not_active Expired - Fee Related
Non-Patent Citations (22)
Title |
---|
Tzanetakis et al., "Multifeature Audio Segmentation for Browsing and Annotation," Applications of Signal Processing to Audio and Acoustics, 1999 IEEE Workshop, pp. 103-106, Oct. 1999. |
Arons, B., "SpeechSkimmer: A System for Interactively Skimming Recorded Speech," ACM Trans. on Computer Human Interaction, Mar. 1997, vol. 4, No. 1, pp. 3-38. (http://www.media.mit.edu/~barons/tochi97.html). |
Bregman, A. S., Auditory Scene Analysis: Perceptual Organization of Sound, Bradford Books, 1990. |
Eckmann, J.P. et al., "Recurrence Plots of Dynamical Systems," Europhys. Lett., vol. 4 (9), pp. 973-977, Nov. 1, 1987. |
Foote, J., "Content-Based Retrieval of Music and Audio," SPIE, 1997, vol. 3229, pp. 138-147. |
Foote, J., "Visualizing Music and Audio Using Self-Similarity," ACM Multimedia '99 10/99, Orlando, Florida. |
Foote, J.T. and Silverman, H.F., "A Model Distance Measure for Talker Clustering and Identification," pp. I-317-I-320, Apr. 1994, IEEE. |
Gish et al., "Segregation of Speakers for Speech Recognition and Speaker Identification," IEEE, Jul. 1991, vol. 2, pp. 873-876. |
Goto, M. and Muraoka, Y., "A Beat Tracking System for Acoustic Signals of Music," Proc. ACM Multimedia 1994, San Francisco, ACM, pp. 365-372. |
Foote, "Automatic Audio Segmentation Using a Measure of Audio Novelty," ICME 2000, IEEE International Conference on Multimedia and Expo, 2000, vol. 1, pp. 452-455, Aug. 2000. |
Johnson, P., "sci.skeptic FAQ: The Frequently Questioned Answers," http://www.faqs.org/faqs/skeptic-faq/index.html, Apr. 21, 1996. |
Kimber, D. and Wilcox, L., "Acoustic Segmentation for Audio Browsers," in Proc. Interface Conference, Sydney, Australia, 1996, 10 pp. |
Rabiner, L. and Juang, B.-H., Fundamentals of Speech Recognition, PTR, Prentice Hall, Englewood Cliffs, New Jersey, 1993. |
Scheirer, E.D., "Tempo and Beat Analysis of Acoustic Musical Signals," J. Acoust. Soc. Am., vol. 103, No. 1, Jan. 1998, pp. 588-601. |
Scheirer, E.D., Using Musical Knowledge to Extract Expressive Performance Information from Audio Recordings, Sep. 30, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds (http://sound.media.mit.edu/~eds/papers/html/ijcai95/). |
Shepard, R., "Representation of Structure in Similarity Data: Problems and Prospects," Psychometrika, Dec. 1974, vol. 39, No. 4, pp. 373-421. |
Siu et al., "An Unsupervised, Sequential Learning Algorithm for the Segmentation of Speech Waveforms with Multiple Speakers," IEEE, Sep. 1992, pp. II-189-II-192. |
Slaney, M., "Auditory Toolbox," Jan. 19, 1999, (http://sound.media.mit.edu/dpwe-bin/mhmessage.cgi/Audiotry/postings/1999/21). |
Sprenger, S., "Time and Pitch Scaling of Audio Signals," Nov. 1999, http://www.dspdimension.com/html/timepitch.html. |
Sugiyama et al., "Speech Segmentation and Clustering Based on Speaker Features," IEEE, Apr. 1993, pp. II-395-II-398. |
Cited By (374)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8073193B2 (en) | 1994-10-21 | 2011-12-06 | Digimarc Corporation | Methods and systems for steganographic processing |
US20100008536A1 (en) * | 1994-10-21 | 2010-01-14 | Rhoads Geoffrey B | Methods and Systems for Steganographic Processing |
US20090080694A1 (en) * | 1995-05-08 | 2009-03-26 | Levy Kenneth L | Deriving Multiple Identifiers from Multimedia Content |
US7602978B2 (en) | 1995-05-08 | 2009-10-13 | Digimarc Corporation | Deriving multiple identifiers from multimedia content |
US7702014B1 (en) | 1999-12-16 | 2010-04-20 | Muvee Technologies Pte. Ltd. | System and method for video production |
US9049468B2 (en) | 2000-02-17 | 2015-06-02 | Audible Magic Corporation | Method and apparatus for identifying media content presented on a media playing device |
US7917645B2 (en) | 2000-02-17 | 2011-03-29 | Audible Magic Corporation | Method and apparatus for identifying media content presented on a media playing device |
US7500007B2 (en) | 2000-02-17 | 2009-03-03 | Audible Magic Corporation | Method and apparatus for identifying media content presented on a media playing device |
US10194187B2 (en) | 2000-02-17 | 2019-01-29 | Audible Magic Corporation | Method and apparatus for identifying media content presented on a media playing device |
US20050044189A1 (en) * | 2000-02-17 | 2005-02-24 | Audible Magic Corporation. | Method and apparatus for identifying media content presented on a media playing device |
US7756874B2 (en) * | 2000-07-06 | 2010-07-13 | Microsoft Corporation | System and methods for providing automatic classification of media entities according to consonance properties |
US20050097075A1 (en) * | 2000-07-06 | 2005-05-05 | Microsoft Corporation | System and methods for providing automatic classification of media entities according to consonance properties |
US6813600B1 (en) * | 2000-09-07 | 2004-11-02 | Lucent Technologies Inc. | Preclassification of audio material in digital audio compression applications |
US20050091053A1 (en) * | 2000-09-12 | 2005-04-28 | Pioneer Corporation | Voice recognition system |
US9781251B1 (en) | 2000-09-14 | 2017-10-03 | Network-1 Technologies, Inc. | Methods for using extracted features and annotations associated with an electronic media work to perform an action |
US9282359B1 (en) | 2000-09-14 | 2016-03-08 | Network-1 Technologies, Inc. | Method for taking action with respect to an electronic media work |
US10621227B1 (en) | 2000-09-14 | 2020-04-14 | Network-1 Technologies, Inc. | Methods for using extracted features to perform an action |
US10108642B1 (en) | 2000-09-14 | 2018-10-23 | Network-1 Technologies, Inc. | System for using extracted feature vectors to perform an action associated with a work identifier |
US10073862B1 (en) | 2000-09-14 | 2018-09-11 | Network-1 Technologies, Inc. | Methods for using extracted features to perform an action associated with selected identified image |
US10063940B1 (en) | 2000-09-14 | 2018-08-28 | Network-1 Technologies, Inc. | System for using extracted feature vectors to perform an action associated with a work identifier |
US10305984B1 (en) | 2000-09-14 | 2019-05-28 | Network-1 Technologies, Inc. | Methods for using extracted features to perform an action associated with selected identified image |
US10303713B1 (en) | 2000-09-14 | 2019-05-28 | Network-1 Technologies, Inc. | Methods for using extracted features to perform an action |
US10063936B1 (en) | 2000-09-14 | 2018-08-28 | Network-1 Technologies, Inc. | Methods for using extracted feature vectors to perform an action associated with a work identifier |
US10621226B1 (en) | 2000-09-14 | 2020-04-14 | Network-1 Technologies, Inc. | Methods for using extracted features to perform an action associated with selected identified image |
US10057408B1 (en) | 2000-09-14 | 2018-08-21 | Network-1 Technologies, Inc. | Methods for using extracted feature vectors to perform an action associated with a work identifier |
US9883253B1 (en) | 2000-09-14 | 2018-01-30 | Network-1 Technologies, Inc. | Methods for using extracted feature vectors to perform an action associated with a product |
US9832266B1 (en) | 2000-09-14 | 2017-11-28 | Network-1 Technologies, Inc. | Methods for using extracted features to perform an action associated with identified action information |
US9824098B1 (en) | 2000-09-14 | 2017-11-21 | Network-1 Technologies, Inc. | Methods for using extracted features to perform an action associated with identified action information |
US9805066B1 (en) | 2000-09-14 | 2017-10-31 | Network-1 Technologies, Inc. | Methods for using extracted features and annotations associated with an electronic media work to perform an action |
US9807472B1 (en) | 2000-09-14 | 2017-10-31 | Network-1 Technologies, Inc. | Methods for using extracted feature vectors to perform an action associated with a product |
US10552475B1 (en) | 2000-09-14 | 2020-02-04 | Network-1 Technologies, Inc. | Methods for using extracted features to perform an action |
US9558190B1 (en) | 2000-09-14 | 2017-01-31 | Network-1 Technologies, Inc. | System and method for taking action with respect to an electronic media work |
US9544663B1 (en) | 2000-09-14 | 2017-01-10 | Network-1 Technologies, Inc. | System for taking action with respect to a media work |
US9536253B1 (en) | 2000-09-14 | 2017-01-03 | Network-1 Technologies, Inc. | Methods for linking an electronic media work to perform an action |
US9538216B1 (en) | 2000-09-14 | 2017-01-03 | Network-1 Technologies, Inc. | System for taking action with respect to a media work |
US9529870B1 (en) | 2000-09-14 | 2016-12-27 | Network-1 Technologies, Inc. | Methods for linking an electronic media work to perform an action |
US9348820B1 (en) | 2000-09-14 | 2016-05-24 | Network-1 Technologies, Inc. | System and method for taking action with respect to an electronic media work and logging event information related thereto |
US10205781B1 (en) | 2000-09-14 | 2019-02-12 | Network-1 Technologies, Inc. | Methods for using extracted features to perform an action associated with selected identified image |
US9256885B1 (en) | 2000-09-14 | 2016-02-09 | Network-1 Technologies, Inc. | Method for linking an electronic media work to perform an action |
US10540391B1 (en) | 2000-09-14 | 2020-01-21 | Network-1 Technologies, Inc. | Methods for using extracted features to perform an action |
US8904465B1 (en) | 2000-09-14 | 2014-12-02 | Network-1 Technologies, Inc. | System for taking action based on a request related to an electronic media work |
US8904464B1 (en) | 2000-09-14 | 2014-12-02 | Network-1 Technologies, Inc. | Method for tagging an electronic media work to perform an action |
US8782726B1 (en) | 2000-09-14 | 2014-07-15 | Network-1 Technologies, Inc. | Method for taking action based on a request related to an electronic media work |
US8656441B1 (en) | 2000-09-14 | 2014-02-18 | Network-1 Technologies, Inc. | System for using extracted features from an electronic work |
US8640179B1 (en) | 2000-09-14 | 2014-01-28 | Network-1 Security Solutions, Inc. | Method for using extracted features from an electronic work |
US10521470B1 (en) | 2000-09-14 | 2019-12-31 | Network-1 Technologies, Inc. | Methods for using extracted features to perform an action associated with selected identified image |
US10303714B1 (en) | 2000-09-14 | 2019-05-28 | Network-1 Technologies, Inc. | Methods for using extracted features to perform an action |
US10521471B1 (en) | 2000-09-14 | 2019-12-31 | Network-1 Technologies, Inc. | Method for using extracted features to perform an action associated with selected identified image |
US10367885B1 (en) | 2000-09-14 | 2019-07-30 | Network-1 Technologies, Inc. | Methods for using extracted features to perform an action associated with selected identified image |
US7562012B1 (en) * | 2000-11-03 | 2009-07-14 | Audible Magic Corporation | Method and apparatus for creating a unique audio signature |
US8086445B2 (en) | 2000-11-03 | 2011-12-27 | Audible Magic Corporation | Method and apparatus for creating a unique audio signature |
US20090240361A1 (en) * | 2000-11-03 | 2009-09-24 | Wold Erling H | Method and apparatus for creating a unique audio signature |
US20040027369A1 (en) * | 2000-12-22 | 2004-02-12 | Peter Rowan Kellock | System and method for media production |
US8006186B2 (en) | 2000-12-22 | 2011-08-23 | Muvee Technologies Pte. Ltd. | System and method for media production |
US7373209B2 (en) * | 2001-03-22 | 2008-05-13 | Matsushita Electric Industrial Co., Ltd. | Sound features extracting apparatus, sound data registering apparatus, sound data retrieving apparatus, and methods and programs for implementing the same |
US20020172372A1 (en) * | 2001-03-22 | 2002-11-21 | Junichi Tagawa | Sound features extracting apparatus, sound data registering apparatus, sound data retrieving apparatus, and methods and programs for implementing the same |
US20040093354A1 (en) * | 2001-03-23 | 2004-05-13 | Changsheng Xu | Method and system of representing musical information in a digital representation for use in content-based multimedia information retrieval |
US7596582B2 (en) | 2001-03-26 | 2009-09-29 | Microsoft Corporation | Methods and systems for synchronizing visualizations with audio streams |
US7526505B2 (en) * | 2001-03-26 | 2009-04-28 | Microsoft Corporation | Methods and systems for synchronizing visualizations with audio streams |
US20050137861A1 (en) * | 2001-03-26 | 2005-06-23 | Microsoft Corporation | Methods and systems for synchronizing visualizations with audio streams |
US7599961B2 (en) | 2001-03-26 | 2009-10-06 | Microsoft Corporation | Methods and systems for synchronizing visualizations with audio streams |
US7620656B2 (en) * | 2001-03-26 | 2009-11-17 | Microsoft Corporation | Methods and systems for synchronizing visualizations with audio streams |
US20090077673A1 (en) * | 2001-04-05 | 2009-03-19 | Schmelzer Richard A | Copyright detection and protection system and method |
US8484691B2 (en) | 2001-04-05 | 2013-07-09 | Audible Magic Corporation | Copyright detection and protection system and method |
US20090328236A1 (en) * | 2001-04-05 | 2009-12-31 | Schmelzer Richard A | Copyright detection and protection system and method |
US20030037010A1 (en) * | 2001-04-05 | 2003-02-20 | Audible Magic, Inc. | Copyright detection and protection system and method |
US7680662B2 (en) * | 2001-04-05 | 2010-03-16 | Verizon Corporate Services Group Inc. | Systems and methods for implementing segmentation in speech recognition systems |
US7707088B2 (en) | 2001-04-05 | 2010-04-27 | Audible Magic Corporation | Copyright detection and protection system and method |
US7363278B2 (en) | 2001-04-05 | 2008-04-22 | Audible Magic Corporation | Copyright detection and protection system and method |
US7711652B2 (en) | 2001-04-05 | 2010-05-04 | Audible Magic Corporation | Copyright detection and protection system and method |
US7797249B2 (en) | 2001-04-05 | 2010-09-14 | Audible Magic Corporation | Copyright detection and protection system and method |
US7565327B2 (en) | 2001-04-05 | 2009-07-21 | Audible Magic Corporation | Copyright detection and protection system and method |
US8645279B2 (en) | 2001-04-05 | 2014-02-04 | Audible Magic Corporation | Copyright detection and protection system and method |
US8775317B2 (en) | 2001-04-05 | 2014-07-08 | Audible Magic Corporation | Copyright detection and protection system and method |
US9589141B2 (en) | 2001-04-05 | 2017-03-07 | Audible Magic Corporation | Copyright detection and protection system and method |
US20080141379A1 (en) * | 2001-04-05 | 2008-06-12 | Audible Magic Corporation | Copyright detection and protection system and method |
US20080155116A1 (en) * | 2001-04-05 | 2008-06-26 | Audible Magic Corporation | Copyright detection and protection system and method |
US20050209851A1 (en) * | 2001-04-05 | 2005-09-22 | Chang-Qing Shu | Systems and methods for implementing segmentation in speech recognition systems |
US20130179439A1 (en) * | 2001-05-16 | 2013-07-11 | Pandora Media, Inc. | Methods and Systems for Utilizing Contextual Feedback to Generate and Modify Playlists |
US8082150B2 (en) | 2001-07-10 | 2011-12-20 | Audible Magic Corporation | Method and apparatus for identifying an unknown work |
US10025841B2 (en) | 2001-07-20 | 2018-07-17 | Audible Magic, Inc. | Play list generation method and apparatus |
US20030018709A1 (en) * | 2001-07-20 | 2003-01-23 | Audible Magic | Playlist generation method and apparatus |
US7877438B2 (en) | 2001-07-20 | 2011-01-25 | Audible Magic Corporation | Method and apparatus for identifying new media content |
US8972481B2 (en) | 2001-07-20 | 2015-03-03 | Audible Magic, Inc. | Playlist generation method and apparatus |
US20030097640A1 (en) * | 2001-07-25 | 2003-05-22 | International Business Machines Corporation | System and method for creating and editing documents |
US8082279B2 (en) | 2001-08-20 | 2011-12-20 | Microsoft Corporation | System and methods for providing adaptive media property classification |
US20080195654A1 (en) * | 2001-08-20 | 2008-08-14 | Microsoft Corporation | System and methods for providing adaptive media property classification |
US20030135623A1 (en) * | 2001-10-23 | 2003-07-17 | Audible Magic, Inc. | Method and apparatus for cache promotion |
US20050190199A1 (en) * | 2001-12-21 | 2005-09-01 | Hartwell Brown | Apparatus and method for identifying and simultaneously displaying images of musical notes in music and producing the music |
US20030200134A1 (en) * | 2002-03-29 | 2003-10-23 | Leonard Michael James | System and method for large-scale automatic forecasting |
US20030205124A1 (en) * | 2002-05-01 | 2003-11-06 | Foote Jonathan T. | Method and system for retrieving and sequencing music by rhythmic similarity |
US20030208289A1 (en) * | 2002-05-06 | 2003-11-06 | Jezekiel Ben-Arie | Method of recognition of human motion, vector sequences and speech |
US7366645B2 (en) | 2002-05-06 | 2008-04-29 | Jezekiel Ben-Arie | Method of recognition of human motion, vector sequences and speech |
US20030231775A1 (en) * | 2002-05-31 | 2003-12-18 | Canon Kabushiki Kaisha | Robust detection and classification of objects in audio using limited training data |
AU2003204588B2 (en) * | 2002-05-31 | 2006-02-23 | Canon Kabushiki Kaisha | Robust Detection and Classification of Objects in Audio Using Limited Training Data |
US7263485B2 (en) | 2002-05-31 | 2007-08-28 | Canon Kabushiki Kaisha | Robust detection and classification of objects in audio using limited training data |
US20050228649A1 (en) * | 2002-07-08 | 2005-10-13 | Hadi Harb | Method and apparatus for classifying sound signals |
US20050238238A1 (en) * | 2002-07-19 | 2005-10-27 | Li-Qun Xu | Method and system for classification of semantic content of audio/video data |
US7383509B2 (en) | 2002-09-13 | 2008-06-03 | Fuji Xerox Co., Ltd. | Automatic generation of multimedia presentation |
US20040054542A1 (en) * | 2002-09-13 | 2004-03-18 | Foote Jonathan T. | Automatic generation of multimedia presentation |
US10056062B2 (en) | 2002-09-19 | 2018-08-21 | Fiver Llc | Systems and methods for the creation and playback of animated, interpretive, musical notation and audio synchronized with the recorded performance of an original artist |
US8637757B2 (en) | 2002-09-19 | 2014-01-28 | Family Systems, Ltd. | Systems and methods for the creation and playback of animated, interpretive, musical notation and audio synchronized with the recorded performance of an original artist |
US20090173215A1 (en) * | 2002-09-19 | 2009-07-09 | Family Systems, Ltd. | Systems and methods for the creation and playback of animated, interpretive, musical notation and audio synchronized with the recorded performance of an original artist |
US8633368B2 (en) | 2002-09-19 | 2014-01-21 | Family Systems, Ltd. | Systems and methods for the creation and playback of animated, interpretive, musical notation and audio synchronized with the recorded performance of an original artist |
US7851689B2 (en) | 2002-09-19 | 2010-12-14 | Family Systems, Ltd. | Systems and methods for the creation and playback of animated, interpretive, musical notation and audio synchronized with the recorded performance of an original artist |
US9472177B2 (en) | 2002-09-19 | 2016-10-18 | Family Systems, Ltd. | Systems and methods for the creation and playback of animated, interpretive, musical notation and audio synchronized with the recorded performance of an original artist |
US20060032362A1 (en) * | 2002-09-19 | 2006-02-16 | Brian Reynolds | System and method for the creation and playback of animated, interpretive, musical notation and audio synchronized with the recorded performance of an original artist |
US7423214B2 (en) * | 2002-09-19 | 2008-09-09 | Family Systems, Ltd. | System and method for the creation and playback of animated, interpretive, musical notation and audio synchronized with the recorded performance of an original artist |
US7752044B2 (en) * | 2002-10-14 | 2010-07-06 | Sony Deutschland Gmbh | Method for recognizing speech |
US20040122671A1 (en) * | 2002-10-14 | 2004-06-24 | Lam Yin Hay | Method for recognizing speech |
US7284004B2 (en) | 2002-10-15 | 2007-10-16 | Fuji Xerox Co., Ltd. | Summarization of digital files |
US20040073554A1 (en) * | 2002-10-15 | 2004-04-15 | Cooper Matthew L. | Summarization of digital files |
US8996146B2 (en) * | 2003-01-02 | 2015-03-31 | Catch Media, Inc. | Automatic digital music library builder |
US20090093899A1 (en) * | 2003-01-02 | 2009-04-09 | Yaacov Ben-Yaacov | Portable music player and transmitter |
US8332326B2 (en) | 2003-02-01 | 2012-12-11 | Audible Magic Corporation | Method and apparatus to identify a work received by a processing system |
US20040215447A1 (en) * | 2003-04-25 | 2004-10-28 | Prabindh Sundareson | Apparatus and method for automatic classification/identification of similar compressed audio files |
US8073684B2 (en) * | 2003-04-25 | 2011-12-06 | Texas Instruments Incorporated | Apparatus and method for automatic classification/identification of similar compressed audio files |
US7208669B2 (en) | 2003-08-25 | 2007-04-24 | Blue Street Studios, Inc. | Video game system and method |
US20050045025A1 (en) * | 2003-08-25 | 2005-03-03 | Wells Robert V. | Video game system and method |
US7752042B2 (en) | 2003-09-17 | 2010-07-06 | The Nielsen Company (Us), Llc | Methods and apparatus to operate an audience metering device with voice commands |
US20060203105A1 (en) * | 2003-09-17 | 2006-09-14 | Venugopal Srinivasan | Methods and apparatus to operate an audience metering device with voice commands |
US20080120105A1 (en) * | 2003-09-17 | 2008-05-22 | Venugopal Srinivasan | Methods and apparatus to operate an audience metering device with voice commands |
US7353171B2 (en) | 2003-09-17 | 2008-04-01 | Nielsen Media Research, Inc. | Methods and apparatus to operate an audience metering device with voice commands |
US20050065915A1 (en) * | 2003-09-23 | 2005-03-24 | Allen Wayne J. | Method and system to add protocol support for network traffic tools |
US7650616B2 (en) | 2003-10-17 | 2010-01-19 | The Nielsen Company (Us), Llc | Methods and apparatus for identifying audio/video content using temporal signal characteristics |
US8065700B2 (en) | 2003-10-17 | 2011-11-22 | The Nielsen Company (Us), Llc | Methods and apparatus for identifying audio/video content using temporal signal characteristics |
US20100095320A1 (en) * | 2003-10-17 | 2010-04-15 | Morris Lee | Methods and apparatus for identifying audio/video content using temporal signal characteristics |
US20060195861A1 (en) * | 2003-10-17 | 2006-08-31 | Morris Lee | Methods and apparatus for identifying audio/video content using temporal signal characteristics |
US20050091062A1 (en) * | 2003-10-24 | 2005-04-28 | Burges Christopher J.C. | Systems and methods for generating audio thumbnails |
US7379875B2 (en) * | 2003-10-24 | 2008-05-27 | Microsoft Corporation | Systems and methods for generating audio thumbnails |
US20050097120A1 (en) * | 2003-10-31 | 2005-05-05 | Fuji Xerox Co., Ltd. | Systems and methods for organizing data |
CN100461168C (en) * | 2004-02-24 | 2009-02-11 | 微软公司 | Systems and methods for generating audio thumbnails |
US20050234366A1 (en) * | 2004-03-19 | 2005-10-20 | Thorsten Heinz | Apparatus and method for analyzing a sound signal using a physiological ear model |
US8535236B2 (en) * | 2004-03-19 | 2013-09-17 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for analyzing a sound signal using a physiological ear model |
US7970618B2 (en) * | 2004-04-02 | 2011-06-28 | Kddi Corporation | Content distribution server for distributing content frame for reproducing music and terminal |
US20070203696A1 (en) * | 2004-04-02 | 2007-08-30 | Kddi Corporation | Content Distribution Server For Distributing Content Frame For Reproducing Music And Terminal |
US20050249080A1 (en) * | 2004-05-07 | 2005-11-10 | Fuji Xerox Co., Ltd. | Method and system for harvesting a media stream |
US20060020958A1 (en) * | 2004-07-26 | 2006-01-26 | Eric Allamanche | Apparatus and method for robust classification of audio signals, and method for establishing and operating an audio-signal database, as well as computer program |
US7580832B2 (en) * | 2004-07-26 | 2009-08-25 | M2Any Gmbh | Apparatus and method for robust classification of audio signals, and method for establishing and operating an audio-signal database, as well as computer program |
US8130746B2 (en) | 2004-07-28 | 2012-03-06 | Audible Magic Corporation | System for distributing decoy content in a peer to peer network |
US8031849B1 (en) | 2004-09-03 | 2011-10-04 | Confinement Telephony Technology, Llc | Telephony system and method with enhanced fraud control |
US8064580B1 (en) | 2004-09-03 | 2011-11-22 | Confinement Telephony Technology, Llc | Telephony system and method with improved fraud control |
US7881446B1 (en) | 2004-09-03 | 2011-02-01 | Confinement Telephony Technology, Llc | Telephony system and method with enhanced validation |
US8761353B1 (en) | 2004-09-03 | 2014-06-24 | Confinement Telephony Technology, Llc | Telephony system and method with enhanced call monitoring, recording and retrieval |
US8295446B1 (en) | 2004-09-03 | 2012-10-23 | Confinement Telephony Technology, Llc | Telephony system and method with enhanced call monitoring, recording and retrieval |
US20060058998A1 (en) * | 2004-09-16 | 2006-03-16 | Kabushiki Kaisha Toshiba | Indexing apparatus and indexing method |
US7451077B1 (en) | 2004-09-23 | 2008-11-11 | Felicia Lindau | Acoustic presentation system and method |
US20060080100A1 (en) * | 2004-09-28 | 2006-04-13 | Pinxteren Markus V | Apparatus and method for grouping temporal segments of a piece of music |
US7304231B2 (en) * | 2004-09-28 | 2007-12-04 | Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung Ev | Apparatus and method for designating various segment classes |
US7345233B2 (en) * | 2004-09-28 | 2008-03-18 | Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung Ev | Apparatus and method for grouping temporal segments of a piece of music |
DE102004047032A1 (en) * | 2004-09-28 | 2006-04-06 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for designating different segment classes |
US7282632B2 (en) * | 2004-09-28 | 2007-10-16 | Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung Ev | Apparatus and method for changing a segmentation of an audio piece |
US20060065106A1 (en) * | 2004-09-28 | 2006-03-30 | Pinxteren Markus V | Apparatus and method for changing a segmentation of an audio piece |
DE102004047069A1 (en) * | 2004-09-28 | 2006-04-06 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Device and method for changing a segmentation of an audio piece |
US20060080095A1 (en) * | 2004-09-28 | 2006-04-13 | Pinxteren Markus V | Apparatus and method for designating various segment classes |
WO2006034743A1 (en) * | 2004-09-28 | 2006-04-06 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Device and method for arranging in groups temporal segments of a piece of music |
US8521529B2 (en) * | 2004-10-18 | 2013-08-27 | Creative Technology Ltd | Method for segmenting audio signals |
US20060085188A1 (en) * | 2004-10-18 | 2006-04-20 | Creative Technology Ltd. | Method for Segmenting Audio Signals |
US20060107216A1 (en) * | 2004-11-12 | 2006-05-18 | Fuji Xerox Co., Ltd. | Video segmentation combining similarity analysis and classification |
US7783106B2 (en) * | 2004-11-12 | 2010-08-24 | Fuji Xerox Co., Ltd. | Video segmentation combining similarity analysis and classification |
US20060173746A1 (en) * | 2005-01-18 | 2006-08-03 | Fuji Xerox Co., Ltd. | Efficient methods for temporal event clustering of digital photographs |
US7640218B2 (en) | 2005-01-18 | 2009-12-29 | Fuji Xerox Co., Ltd. | Efficient methods for temporal event clustering of digital photographs |
US7617188B2 (en) | 2005-03-24 | 2009-11-10 | The Mitre Corporation | System and method for audio hot spotting |
US20060217966A1 (en) * | 2005-03-24 | 2006-09-28 | The Mitre Corporation | System and method for audio hot spotting |
US7953751B2 (en) | 2005-03-24 | 2011-05-31 | The Mitre Corporation | System and method for audio hot spotting |
US20100076996A1 (en) * | 2005-03-24 | 2010-03-25 | The Mitre Corporation | System and method for audio hot spotting |
US20060218505A1 (en) * | 2005-03-28 | 2006-09-28 | Compton Anthony K | System, method and program product for displaying always visible audio content based visualization |
US8010324B1 (en) | 2005-05-09 | 2011-08-30 | Sas Institute Inc. | Computer-implemented system and method for storing data analysis models |
US8014983B2 (en) | 2005-05-09 | 2011-09-06 | Sas Institute Inc. | Computer-implemented system and method for storing data analysis models |
US8005707B1 (en) | 2005-05-09 | 2011-08-23 | Sas Institute Inc. | Computer-implemented systems and methods for defining events |
US7716022B1 (en) | 2005-05-09 | 2010-05-11 | Sas Institute Inc. | Computer-implemented systems and methods for processing time series data |
US7534951B2 (en) * | 2005-07-27 | 2009-05-19 | Sony Corporation | Beat extraction apparatus and method, music-synchronized image display apparatus and method, tempo value detection apparatus, rhythm tracking apparatus and method, and music-synchronized display apparatus and method |
US20070022867A1 (en) * | 2005-07-27 | 2007-02-01 | Sony Corporation | Beat extraction apparatus and method, music-synchronized image display apparatus and method, tempo value detection apparatus, rhythm tracking apparatus and method, and music-synchronized display apparatus and method |
US7529659B2 (en) | 2005-09-28 | 2009-05-05 | Audible Magic Corporation | Method and apparatus for identifying an unknown work |
US20070074147A1 (en) * | 2005-09-28 | 2007-03-29 | Audible Magic Corporation | Method and apparatus for identifying an unknown work |
US8069036B2 (en) * | 2005-09-30 | 2011-11-29 | Koninklijke Philips Electronics N.V. | Method and apparatus for processing audio for playback |
US20080221895A1 (en) * | 2005-09-30 | 2008-09-11 | Koninklijke Philips Electronics, N.V. | Method and Apparatus for Processing Audio for Playback |
FR2891651A1 (en) * | 2005-10-05 | 2007-04-06 | Sagem Comm | Karaoke system for use with e.g. CD, has real time audio processing unit to deliver karaoke video stream carrying text information of input audiovisual stream voice part and storage unit to temporarily store input stream during preset time |
EP1772851A1 (en) * | 2005-10-05 | 2007-04-11 | Sagem Communication S.A. | Karaoke system for displaying the text corresponding to the vocal part of an audiovisual flux on a display screen of an audiovisual system |
US20080281590A1 (en) * | 2005-10-17 | 2008-11-13 | Koninklijke Philips Electronics, N.V. | Method of Deriving a Set of Features for an Audio Input Signal |
US8423356B2 (en) * | 2005-10-17 | 2013-04-16 | Koninklijke Philips Electronics N.V. | Method of deriving a set of features for an audio input signal |
US10324899B2 (en) * | 2005-11-07 | 2019-06-18 | Nokia Technologies Oy | Methods for characterizing content item groups |
EP1952639A4 (en) * | 2005-11-14 | 2008-12-31 | Mediaguide Inc | Method and apparatus for automatic detection and identification of unidentified broadcast audio or video signals |
WO2007059498A2 (en) | 2005-11-14 | 2007-05-24 | Mediaguide, Inc. | Method and apparatus for automatic detection and identification of unidentified broadcast audio or video signals |
EP1952639A2 (en) * | 2005-11-14 | 2008-08-06 | Mediaguide, inc. | Method and apparatus for automatic detection and identification of unidentified broadcast audio or video signals |
US9633111B1 (en) * | 2005-11-30 | 2017-04-25 | Google Inc. | Automatic selection of representative media clips |
US10229196B1 (en) | 2005-11-30 | 2019-03-12 | Google Llc | Automatic selection of representative media clips |
US7826911B1 (en) | 2005-11-30 | 2010-11-02 | Google Inc. | Automatic selection of representative media clips |
US8538566B1 (en) | 2005-11-30 | 2013-09-17 | Google Inc. | Automatic selection of representative media clips |
US7668610B1 (en) | 2005-11-30 | 2010-02-23 | Google Inc. | Deconstructing electronic media stream into human recognizable portions |
US8437869B1 (en) | 2005-11-30 | 2013-05-07 | Google Inc. | Deconstructing electronic media stream into human recognizable portions |
US10091266B2 (en) * | 2005-12-13 | 2018-10-02 | Audio Pod Inc. | Method and system for rendering digital content across multiple client devices |
US9954922B2 (en) * | 2005-12-13 | 2018-04-24 | Audio Pod Inc. | Method and system for rendering digital content across multiple client devices |
US20120317245A1 (en) * | 2005-12-13 | 2012-12-13 | Mccue John | Transmission of digital audio data |
US20170078357A1 (en) * | 2005-12-13 | 2017-03-16 | John McCue | Method and system for rendering content across multiple client devices |
US20160182589A1 (en) * | 2005-12-13 | 2016-06-23 | Audio Pod Inc. | Method and system for rendering digital content across multiple client devices |
US20080301318A1 (en) * | 2005-12-13 | 2008-12-04 | Mccue John | Segmentation and Transmission of Audio Streams |
US9930089B2 (en) * | 2005-12-13 | 2018-03-27 | Audio Pod Inc. | Memory management of digital audio data |
US9203884B2 (en) * | 2005-12-13 | 2015-12-01 | Audio Pod Inc. | Transmission of digital audio data |
US20190044993A1 (en) * | 2005-12-13 | 2019-02-07 | Audio Pod Inc. | Method of downloading digital content to be rendered |
US8285809B2 (en) * | 2005-12-13 | 2012-10-09 | Audio Pod Inc. | Segmentation and transmission of audio streams |
US10735488B2 (en) * | 2005-12-13 | 2020-08-04 | Audio Pod Inc. | Method of downloading digital content to be rendered |
US20140304374A1 (en) * | 2005-12-13 | 2014-10-09 | Audio Pod Inc. | Transmission of digital audio data |
US20160050250A1 (en) * | 2005-12-13 | 2016-02-18 | Audio Pod Inc. | Memory management of digital audio data |
US8738740B2 (en) * | 2005-12-13 | 2014-05-27 | Audio Pod Inc. | Transmission of digital audio data |
WO2007072394A3 (en) * | 2005-12-22 | 2007-10-18 | Koninkl Philips Electronics Nv | Audio structure analysis |
WO2007072394A2 (en) * | 2005-12-22 | 2007-06-28 | Koninklijke Philips Electronics N.V. | Audio structure analysis |
US20070239753A1 (en) * | 2006-04-06 | 2007-10-11 | Leonard Michael J | Systems And Methods For Mining Transactional And Time Series Data |
US7711734B2 (en) * | 2006-04-06 | 2010-05-04 | Sas Institute Inc. | Systems and methods for mining transactional and time series data |
US20070261537A1 (en) * | 2006-05-12 | 2007-11-15 | Nokia Corporation | Creating and sharing variations of a music file |
US8442816B2 (en) | 2006-05-31 | 2013-05-14 | Victor Company Of Japan, Ltd. | Music-piece classification based on sustain regions |
US8438013B2 (en) | 2006-05-31 | 2013-05-07 | Victor Company Of Japan, Ltd. | Music-piece classification based on sustain regions and sound thickness |
US20110132174A1 (en) * | 2006-05-31 | 2011-06-09 | Victor Company Of Japan, Ltd. | Music-piece classifying apparatus and method, and related computer program |
US20110132173A1 (en) * | 2006-05-31 | 2011-06-09 | Victor Company Of Japan, Ltd. | Music-piece classifying apparatus and method, and related computer program |
US7908135B2 (en) * | 2006-05-31 | 2011-03-15 | Victor Company Of Japan, Ltd. | Music-piece classification based on sustain regions |
US20080040123A1 (en) * | 2006-05-31 | 2008-02-14 | Victor Company Of Japan, Ltd. | Music-piece classifying apparatus and method, and related computer program |
US7739110B2 (en) * | 2006-06-07 | 2010-06-15 | Industrial Technology Research Institute | Multimedia data management by speech recognizer annotation |
US20070288237A1 (en) * | 2006-06-07 | 2007-12-13 | Chung-Hsien Wu | Method And Apparatus For Multimedia Data Management |
US20080046406A1 (en) * | 2006-08-15 | 2008-02-21 | Microsoft Corporation | Audio and video thumbnails |
US20080077263A1 (en) * | 2006-09-21 | 2008-03-27 | Sony Corporation | Data recording device, data recording method, and data recording program |
US20080113586A1 (en) * | 2006-10-02 | 2008-05-15 | Mark Hardin | Electronic playset |
US8062089B2 (en) | 2006-10-02 | 2011-11-22 | Mattel, Inc. | Electronic playset |
US8292689B2 (en) | 2006-10-02 | 2012-10-23 | Mattel, Inc. | Electronic playset |
US8112302B1 (en) | 2006-11-03 | 2012-02-07 | Sas Institute Inc. | Computer-implemented systems and methods for forecast reconciliation |
US8364517B2 (en) | 2006-11-03 | 2013-01-29 | Sas Institute Inc. | Computer-implemented systems and methods for forecast reconciliation |
US8145486B2 (en) | 2007-01-17 | 2012-03-27 | Kabushiki Kaisha Toshiba | Indexing apparatus, indexing method, and computer program product |
US20080215324A1 (en) * | 2007-01-17 | 2008-09-04 | Kabushiki Kaisha Toshiba | Indexing apparatus, indexing method, and computer program product |
WO2008100485A1 (en) * | 2007-02-12 | 2008-08-21 | Union College | A system and method for transforming dispersed data patterns into moving objects |
US7521622B1 (en) * | 2007-02-16 | 2009-04-21 | Hewlett-Packard Development Company, L.P. | Noise-resistant detection of harmonic segments of audio signals |
US20090006551A1 (en) * | 2007-06-29 | 2009-01-01 | Microsoft Corporation | Dynamic awareness of people |
US9268921B2 (en) | 2007-07-27 | 2016-02-23 | Audible Magic Corporation | System for identifying content of digital data |
US8732858B2 (en) | 2007-07-27 | 2014-05-20 | Audible Magic Corporation | System for identifying content of digital data |
US8006314B2 (en) | 2007-07-27 | 2011-08-23 | Audible Magic Corporation | System for identifying content of digital data |
US20090031326A1 (en) * | 2007-07-27 | 2009-01-29 | Audible Magic Corporation | System for identifying content of digital data |
US10181015B2 (en) | 2007-07-27 | 2019-01-15 | Audible Magic Corporation | System for identifying content of digital data |
US9785757B2 (en) | 2007-07-27 | 2017-10-10 | Audible Magic Corporation | System for identifying content of digital data |
US20090030651A1 (en) * | 2007-07-27 | 2009-01-29 | Audible Magic Corporation | System for identifying content of digital data |
US8112818B2 (en) | 2007-07-27 | 2012-02-07 | Audible Magic Corporation | System for identifying content of digital data |
US8200061B2 (en) | 2007-09-12 | 2012-06-12 | Kabushiki Kaisha Toshiba | Signal processing apparatus and method thereof |
US20090067807A1 (en) * | 2007-09-12 | 2009-03-12 | Kabushiki Kaisha Toshiba | Signal processing apparatus and method thereof |
EP2201478B1 (en) * | 2007-09-18 | 2020-10-14 | Microsoft Technology Licensing, LLC | Synchronizing slide show events with audio |
US20090132252A1 (en) * | 2007-11-20 | 2009-05-21 | Massachusetts Institute Of Technology | Unsupervised Topic Segmentation of Acoustic Speech Signal |
US20090151544A1 (en) * | 2007-12-17 | 2009-06-18 | Sony Corporation | Method for music structure analysis |
US8013230B2 (en) | 2007-12-17 | 2011-09-06 | Sony Corporation | Method for music structure analysis |
EP2088518A1 (en) * | 2007-12-17 | 2009-08-12 | Sony Corporation | Method for music structure analysis |
EP2093753A1 (en) * | 2008-02-19 | 2009-08-26 | Yamaha Corporation | Sound signal processing apparatus and method |
US20090216354A1 (en) * | 2008-02-19 | 2009-08-27 | Yamaha Corporation | Sound signal processing apparatus and method |
US8494668B2 (en) | 2008-02-19 | 2013-07-23 | Yamaha Corporation | Sound signal processing apparatus and method |
US20090216611A1 (en) * | 2008-02-25 | 2009-08-27 | Leonard Michael J | Computer-Implemented Systems And Methods Of Product Forecasting For New Products |
EP2096626A1 (en) | 2008-02-29 | 2009-09-02 | Sony Corporation | Method for visualizing audio data |
US20090228799A1 (en) * | 2008-02-29 | 2009-09-10 | Sony Corporation | Method for visualizing audio data |
US20150006411A1 (en) * | 2008-06-11 | 2015-01-01 | James D. Bennett | Creative work registry |
US20110160887A1 (en) * | 2008-08-20 | 2011-06-30 | Pioneer Corporation | Information generating apparatus, information generating method and information generating program |
US11778268B2 (en) | 2008-10-31 | 2023-10-03 | The Nielsen Company (Us), Llc | Methods and apparatus to verify presentation of media content |
US11070874B2 (en) | 2008-10-31 | 2021-07-20 | The Nielsen Company (Us), Llc | Methods and apparatus to verify presentation of media content |
US10469901B2 (en) | 2008-10-31 | 2019-11-05 | The Nielsen Company (Us), Llc | Methods and apparatus to verify presentation of media content |
US9124769B2 (en) | 2008-10-31 | 2015-09-01 | The Nielsen Company (Us), Llc | Methods and apparatus to verify presentation of media content |
US20100138010A1 (en) * | 2008-11-28 | 2010-06-03 | Audionamix | Automatic gathering strategy for unsupervised source separation algorithms |
US8433431B1 (en) | 2008-12-02 | 2013-04-30 | Soundhound, Inc. | Displaying text to end users in coordination with audio playback |
US8452586B2 (en) * | 2008-12-02 | 2013-05-28 | Soundhound, Inc. | Identifying music from peaks of a reference sound fingerprint |
US20100145708A1 (en) * | 2008-12-02 | 2010-06-10 | Melodis Corporation | System and method for identifying original music |
US20100174389A1 (en) * | 2009-01-06 | 2010-07-08 | Audionamix | Automatic audio source separation with joint spectral shape, expansion coefficients and musical state estimation |
US8199651B1 (en) | 2009-03-16 | 2012-06-12 | Audible Magic Corporation | Method and system for modifying communication flows at a port level |
US20110071824A1 (en) * | 2009-09-23 | 2011-03-24 | Carol Espy-Wilson | Systems and Methods for Multiple Pitch Tracking |
US8666734B2 (en) * | 2009-09-23 | 2014-03-04 | University Of Maryland, College Park | Systems and methods for multiple pitch tracking using a multidimensional function and strength values |
US10381025B2 (en) | 2009-09-23 | 2019-08-13 | University Of Maryland, College Park | Multiple pitch extraction by strength calculation from extrema |
US9640200B2 (en) | 2009-09-23 | 2017-05-02 | University Of Maryland, College Park | Multiple pitch extraction by strength calculation from extrema |
US8631040B2 (en) | 2010-02-23 | 2014-01-14 | Sas Institute Inc. | Computer-implemented systems and methods for flexible definition of time intervals |
US10657174B2 (en) | 2010-07-29 | 2020-05-19 | Soundhound, Inc. | Systems and methods for providing identification information in response to an audio segment |
US9563699B1 (en) | 2010-07-29 | 2017-02-07 | Soundhound, Inc. | System and method for matching a query against a broadcast stream |
US9047371B2 (en) | 2010-07-29 | 2015-06-02 | Soundhound, Inc. | System and method for matching a query against a broadcast stream |
US9390167B2 (en) | 2010-07-29 | 2016-07-12 | Soundhound, Inc. | System and methods for continuous audio matching |
US10055490B2 (en) | 2010-07-29 | 2018-08-21 | Soundhound, Inc. | System and methods for continuous audio matching |
US20120101606A1 (en) * | 2010-10-22 | 2012-04-26 | Yasushi Miyajima | Information processing apparatus, content data reconfiguring method and program |
US20120143610A1 (en) * | 2010-12-03 | 2012-06-07 | Industrial Technology Research Institute | Sound Event Detecting Module and Method Thereof |
US8655655B2 (en) * | 2010-12-03 | 2014-02-18 | Industrial Technology Research Institute | Sound event detecting module for a sound event recognition system and method thereof |
US10121165B1 (en) | 2011-05-10 | 2018-11-06 | Soundhound, Inc. | System and method for targeting content based on identified audio and multimedia |
US10832287B2 (en) | 2011-05-10 | 2020-11-10 | Soundhound, Inc. | Promotional content targeting based on recognized audio |
US12100023B2 (en) | 2011-05-10 | 2024-09-24 | Soundhound Ai Ip, Llc | Query-specific targeted ad delivery |
US9336493B2 (en) | 2011-06-06 | 2016-05-10 | Sas Institute Inc. | Systems and methods for clustering time series data based on forecast distributions |
US20120316886A1 (en) * | 2011-06-08 | 2012-12-13 | Ramin Pishehvar | Sparse coding using object extraction |
US9047559B2 (en) | 2011-07-22 | 2015-06-02 | Sas Institute Inc. | Computer-implemented systems and methods for testing large scale automatic forecast combinations |
US9547715B2 (en) | 2011-08-19 | 2017-01-17 | Dolby Laboratories Licensing Corporation | Methods and apparatus for detecting a repetitive pattern in a sequence of audio frames |
US9460736B2 (en) | 2011-08-19 | 2016-10-04 | Dolby Laboratories Licensing Corporation | Measuring content coherence and measuring similarity |
US9218821B2 (en) | 2011-08-19 | 2015-12-22 | Dolby Laboratories Licensing Corporation | Measuring content coherence and measuring similarity |
CN102956237B (en) * | 2011-08-19 | 2016-12-07 | 杜比实验室特许公司 | The method and apparatus measuring content consistency |
WO2013028351A3 (en) * | 2011-08-19 | 2013-05-10 | Dolby Laboratories Licensing Corporation | Measuring content coherence and measuring similarity of audio sections |
CN102956237A (en) * | 2011-08-19 | 2013-03-06 | 杜比实验室特许公司 | Method and device for measuring content consistency and method and device for measuring similarity |
US8713028B2 (en) * | 2011-11-17 | 2014-04-29 | Yahoo! Inc. | Related news articles |
US20130132401A1 (en) * | 2011-11-17 | 2013-05-23 | Yahoo! Inc. | Related news articles |
US8965766B1 (en) * | 2012-03-15 | 2015-02-24 | Google Inc. | Systems and methods for identifying music in a noisy environment |
US20130255473A1 (en) * | 2012-03-29 | 2013-10-03 | Sony Corporation | Tonal component detection method, tonal component detection apparatus, and program |
US8779271B2 (en) * | 2012-03-29 | 2014-07-15 | Sony Corporation | Tonal component detection method, tonal component detection apparatus, and program |
US9653056B2 (en) | 2012-04-30 | 2017-05-16 | Nokia Technologies Oy | Evaluation of beats, chords and downbeats from a musical audio signal |
US20130325853A1 (en) * | 2012-05-29 | 2013-12-05 | Jeffery David Frazier | Digital media players comprising a music-speech discrimination function |
US20160005387A1 (en) * | 2012-06-29 | 2016-01-07 | Nokia Technologies Oy | Audio signal analysis |
US9418643B2 (en) * | 2012-06-29 | 2016-08-16 | Nokia Technologies Oy | Audio signal analysis |
US9087306B2 (en) | 2012-07-13 | 2015-07-21 | Sas Institute Inc. | Computer-implemented systems and methods for time series exploration |
US9037998B2 (en) | 2012-07-13 | 2015-05-19 | Sas Institute Inc. | Computer-implemented systems and methods for time series exploration using structured judgment |
US9916282B2 (en) | 2012-07-13 | 2018-03-13 | Sas Institute Inc. | Computer-implemented systems and methods for time series exploration |
US9244887B2 (en) | 2012-07-13 | 2016-01-26 | Sas Institute Inc. | Computer-implemented systems and methods for efficient structuring of time series data |
US10025753B2 (en) | 2012-07-13 | 2018-07-17 | Sas Institute Inc. | Computer-implemented systems and methods for time series exploration |
US10037305B2 (en) | 2012-07-13 | 2018-07-31 | Sas Institute Inc. | Computer-implemented systems and methods for time series exploration |
US11776533B2 (en) | 2012-07-23 | 2023-10-03 | Soundhound, Inc. | Building a natural language understanding application using a received electronic record containing programming code including an interpret-block, an interpret-statement, a pattern expression and an action statement |
US10957310B1 (en) | 2012-07-23 | 2021-03-23 | Soundhound, Inc. | Integrated programming framework for speech and text understanding with meaning parsing |
US10996931B1 (en) | 2012-07-23 | 2021-05-04 | Soundhound, Inc. | Integrated programming framework for speech and text understanding with block and statement structure |
US9608824B2 (en) | 2012-09-25 | 2017-03-28 | Audible Magic Corporation | Using digital fingerprints to associate data with a work |
US9081778B2 (en) | 2012-09-25 | 2015-07-14 | Audible Magic Corporation | Using digital fingerprints to associate data with a work |
US10698952B2 (en) | 2012-09-25 | 2020-06-30 | Audible Magic Corporation | Using digital fingerprints to associate data with a work |
US20140185816A1 (en) * | 2013-01-02 | 2014-07-03 | Samsung Electronics Co., Ltd. | Apparatus and method for processing audio signal |
US9294855B2 (en) * | 2013-01-02 | 2016-03-22 | Samsung Electronics Co., Ltd. | Apparatus and method for processing audio signal |
US9147218B2 (en) | 2013-03-06 | 2015-09-29 | Sas Institute Inc. | Devices for forecasting ratios in hierarchies |
US11189277B2 (en) * | 2013-03-14 | 2021-11-30 | Amazon Technologies, Inc. | Dynamic gazetteers for personalized entity recognition |
US20140350923A1 (en) * | 2013-05-23 | 2014-11-27 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for detecting noise bursts in speech signals |
EP2816550A1 (en) * | 2013-06-18 | 2014-12-24 | Nokia Corporation | Audio signal analysis |
US9280961B2 (en) | 2013-06-18 | 2016-03-08 | Nokia Technologies Oy | Audio signal analysis for downbeats |
US9934259B2 (en) | 2013-08-15 | 2018-04-03 | Sas Institute Inc. | In-memory time series database and processing in a distributed environment |
US9812150B2 (en) | 2013-08-28 | 2017-11-07 | Accusonus, Inc. | Methods and systems for improved signal decomposition |
US11238881B2 (en) | 2013-08-28 | 2022-02-01 | Accusonus, Inc. | Weight matrix initialization method to improve signal decomposition |
US11581005B2 (en) | 2013-08-28 | 2023-02-14 | Meta Platforms Technologies, Llc | Methods and systems for improved signal decomposition |
US10366705B2 (en) | 2013-08-28 | 2019-07-30 | Accusonus, Inc. | Method and system of signal decomposition using extended time-frequency transformations |
US9372925B2 (en) * | 2013-09-19 | 2016-06-21 | Microsoft Technology Licensing, Llc | Combining audio samples by automatically adjusting sample characteristics |
US9798974B2 (en) | 2013-09-19 | 2017-10-24 | Microsoft Technology Licensing, Llc | Recommending audio sample combinations |
US20150081064A1 (en) * | 2013-09-19 | 2015-03-19 | Microsoft Corporation | Combining audio samples by automatically adjusting sample characteristics |
US20150094835A1 (en) * | 2013-09-27 | 2015-04-02 | Nokia Corporation | Audio analysis apparatus |
US9507849B2 (en) | 2013-11-28 | 2016-11-29 | Soundhound, Inc. | Method for combining a query and a communication command in a natural language computer system |
US9601114B2 (en) | 2014-02-01 | 2017-03-21 | Soundhound, Inc. | Method for embedding voice mail in a spoken utterance using a natural language processing computer system |
US9292488B2 (en) | 2014-02-01 | 2016-03-22 | Soundhound, Inc. | Method for embedding voice mail in a spoken utterance using a natural language processing computer system |
US11295730B1 (en) | 2014-02-27 | 2022-04-05 | Soundhound, Inc. | Using phonetic variants in a local context to improve natural language understanding |
US9918174B2 (en) | 2014-03-13 | 2018-03-13 | Accusonus, Inc. | Wireless exchange of data between devices in live events |
US9584940B2 (en) | 2014-03-13 | 2017-02-28 | Accusonus, Inc. | Wireless exchange of data between devices in live events |
US10169720B2 (en) | 2014-04-17 | 2019-01-01 | Sas Institute Inc. | Systems and methods for machine learning using classifying, clustering, and grouping time series data |
US10474968B2 (en) | 2014-04-17 | 2019-11-12 | Sas Institute Inc. | Improving accuracy of predictions using seasonal relationships of time series data |
US10468036B2 (en) * | 2014-04-30 | 2019-11-05 | Accusonus, Inc. | Methods and systems for processing and mixing signals using signal decomposition |
US11610593B2 (en) | 2014-04-30 | 2023-03-21 | Meta Platforms Technologies, Llc | Methods and systems for processing and mixing signals using signal decomposition |
US11030993B2 (en) | 2014-05-12 | 2021-06-08 | Soundhound, Inc. | Advertisement selection by linguistic classification |
US9564123B1 (en) | 2014-05-12 | 2017-02-07 | Soundhound, Inc. | Method and system for building an integrated user profile |
US10311858B1 (en) | 2014-05-12 | 2019-06-04 | Soundhound, Inc. | Method and system for building an integrated user profile |
US9892370B2 (en) | 2014-06-12 | 2018-02-13 | Sas Institute Inc. | Systems and methods for resolving over multiple hierarchies |
US20170287505A1 (en) * | 2014-09-03 | 2017-10-05 | Samsung Electronics Co., Ltd. | Method and apparatus for learning and recognizing audio signal |
US9208209B1 (en) | 2014-10-02 | 2015-12-08 | Sas Institute Inc. | Techniques for monitoring transformation techniques using control charts |
US9418339B1 (en) | 2015-01-26 | 2016-08-16 | Sas Institute, Inc. | Systems and methods for time series analysis techniques utilizing count data sets |
US10289916B2 (en) * | 2015-07-21 | 2019-05-14 | Shred Video, Inc. | System and method for editing video and audio clips |
US10983682B2 (en) | 2015-08-27 | 2021-04-20 | Sas Institute Inc. | Interactive graphical user-interface for analyzing and manipulating time-series projections |
US10297241B2 (en) * | 2016-03-07 | 2019-05-21 | Yamaha Corporation | Sound signal processing method and sound signal processing apparatus |
US10460732B2 (en) * | 2016-03-31 | 2019-10-29 | Tata Consultancy Services Limited | System and method to insert visual subtitles in videos |
US10170090B2 (en) * | 2016-06-08 | 2019-01-01 | Visionarist Co., Ltd | Music information generating device, music information generating method, and recording medium |
US10366121B2 (en) * | 2016-06-24 | 2019-07-30 | Mixed In Key Llc | Apparatus, method, and computer-readable medium for cue point generation |
US11354355B2 (en) * | 2016-06-24 | 2022-06-07 | Mixed In Key Llc | Apparatus, method, and computer-readable medium for cue point generation |
US20200151208A1 (en) * | 2016-09-23 | 2020-05-14 | Amazon Technologies, Inc. | Time code to byte indexer for partial object retrieval |
US20190014312A1 (en) * | 2017-07-05 | 2019-01-10 | Tektronix, Inc. | Video Waveform Peak Indicator |
US10587872B2 (en) * | 2017-07-05 | 2020-03-10 | Project Giants, Llc | Video waveform peak indicator |
CN109218818A (en) * | 2017-07-05 | 2019-01-15 | 特克特朗尼克公司 | video waveform peak indicator |
US10331490B2 (en) | 2017-11-16 | 2019-06-25 | Sas Institute Inc. | Scalable cloud-based time series analysis |
US20200357369A1 (en) * | 2018-01-09 | 2020-11-12 | Guangzhou Baiguoyuan Information Technology Co., Ltd. | Music classification method and beat point detection method, storage device and computer device |
US11715446B2 (en) * | 2018-01-09 | 2023-08-01 | Bigo Technology Pte, Ltd. | Music classification method and beat point detection method, storage device and computer device |
RU2743315C1 (en) * | 2018-01-09 | 2021-02-17 | Гуанчжоу Байгуоюань Информейшен Текнолоджи Ко., Лтд. | Method of music classification and a method of detecting music beat parts, a data medium and a computer device |
US10338994B1 (en) | 2018-02-22 | 2019-07-02 | Sas Institute Inc. | Predicting and adjusting computer functionality to avoid failures |
US10255085B1 (en) | 2018-03-13 | 2019-04-09 | Sas Institute Inc. | Interactive graphical user interface with override guidance |
US20210064916A1 (en) * | 2018-05-17 | 2021-03-04 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Device and method for detecting partial matches between a first time varying signal and a second time varying signal |
US11860934B2 (en) * | 2018-05-17 | 2024-01-02 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Device and method for detecting partial matches between a first time varying signal and a second time varying signal |
US10560313B2 (en) | 2018-06-26 | 2020-02-11 | Sas Institute Inc. | Pipeline system for time-series data forecasting |
US10685283B2 (en) | 2018-06-26 | 2020-06-16 | Sas Institute Inc. | Demand classification based pipeline system for time-series data forecasting |
US11443724B2 (en) * | 2018-07-31 | 2022-09-13 | Mediawave Intelligent Communication | Method of synchronizing electronic interactive device |
JP2022545342A (en) * | 2019-08-27 | 2022-10-27 | エヌイーシー ラボラトリーズ アメリカ インク | Sequence model for audio scene recognition |
US11374663B2 (en) * | 2019-11-21 | 2022-06-28 | Bose Corporation | Variable-frequency smoothing |
US11264015B2 (en) | 2019-11-21 | 2022-03-01 | Bose Corporation | Variable-time smoothing for steady state noise estimation |
US12124498B2 (en) * | 2020-01-09 | 2024-10-22 | Amazon Technologies, Inc. | Time code to byte indexer for partial object retrieval |
US11373657B2 (en) * | 2020-05-01 | 2022-06-28 | Raytheon Applied Signal Technology, Inc. | System and method for speaker identification in audio data |
US11315545B2 (en) | 2020-07-09 | 2022-04-26 | Raytheon Applied Signal Technology, Inc. | System and method for language identification in audio data |
US12020697B2 (en) | 2020-07-15 | 2024-06-25 | Raytheon Applied Signal Technology, Inc. | Systems and methods for fast filtering of audio keyword search |
US20230282205A1 (en) * | 2022-03-01 | 2023-09-07 | Raytheon Applied Signal Technology, Inc. | Conversation diarization based on aggregate dissimilarity |
CN117636900B (en) * | 2023-12-04 | 2024-05-07 | 广东新裕信息科技有限公司 | Musical instrument playing quality evaluation method based on audio characteristic shape matching |
CN117636900A (en) * | 2023-12-04 | 2024-03-01 | 广东新裕信息科技有限公司 | Musical instrument playing quality evaluation method based on audio characteristic shape matching |
Also Published As
Publication number | Publication date |
---|---|
JP2002014691A (en) | 2002-01-18 |
JP3941417B2 (en) | 2007-07-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6542869B1 (en) | Method for automatic analysis of audio including music and speech | |
Lerch | An introduction to audio content analysis: Music Information Retrieval tasks and applications | |
Brossier | Automatic annotation of musical audio for interactive applications | |
Foote | Automatic audio segmentation using a measure of audio novelty | |
Tzanetakis et al. | Marsyas: A framework for audio analysis | |
Tzanetakis | Manipulation, analysis and retrieval systems for audio signals | |
Tzanetakis et al. | Musical genre classification of audio signals | |
Peeters et al. | Toward automatic music audio summary generation from signal analysis | |
Kim et al. | MPEG-7 audio and beyond: Audio content indexing and retrieval | |
Pampalk et al. | On the evaluation of perceptual similarity measures for music | |
Dannenberg et al. | Music structure analysis from acoustic signals | |
US20100198760A1 (en) | Apparatus and methods for music signal analysis | |
US20030205124A1 (en) | Method and system for retrieving and sequencing music by rhythmic similarity | |
US10002596B2 (en) | Intelligent crossfade with separated instrument tracks | |
Marolt | A mid-level representation for melody-based retrieval in audio collections | |
Casey et al. | The importance of sequences in musical similarity | |
Hargreaves et al. | Structural segmentation of multitrack audio | |
Rocha et al. | Segmentation and timbre-and rhythm-similarity in Electronic Dance Music | |
Tzanetakis et al. | A framework for audio analysis based on classification and temporal segmentation | |
Lerch | Audio content analysis | |
Ellis | Extracting information from music audio | |
Klapuri | Pattern induction and matching in music signals | |
Foote | Methods for the automatic analysis of music and audio | |
Barthet et al. | Speech/music discrimination in audio podcast using structural segmentation and timbre recognition | |
Gómez Gutiérrez et al. | A quantitative comparison of different approaches for melody extraction from polyphonic audio recordings |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: FUJI XEROX CO., LTD., JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FOOTE, JONATHAN T.;REEL/FRAME:010796/0250 Effective date: 20000510 Owner name: XEROX CORPORATION, CONNECTICUT Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FOOTE, JONATHAN T.;REEL/FRAME:010796/0250 Effective date: 20000510 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: FUJI XEROX CO., LTD., JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FX PALO ALTO LABORATORY, INC.;REEL/FRAME:014462/0456 Effective date: 20030822 |
|
AS | Assignment |
Owner name: JPMORGAN CHASE BANK, AS COLLATERAL AGENT, TEXAS Free format text: SECURITY AGREEMENT;ASSIGNOR:XEROX CORPORATION;REEL/FRAME:015134/0476 Effective date: 20030625 |
|
FEPP | Fee payment procedure |
Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
SULP | Surcharge for late payment |
FPAY | Fee payment |
Year of fee payment: 8 |
|
AS | Assignment |
Owner name: XEROX CORPORATION, NEW YORK Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JPMORGAN CHASE BANK;REEL/FRAME:031319/0131 Effective date: 20061204 |
|
AS | Assignment |
Owner name: FX PALO ALTO LABORATORY, INC., CALIFORNIA Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNEE TO FX PALO ALTO LABORATORY, INC., 3400 HILLVIEW AVENUE, BUILDING 4, PALO ALTO, CALIFORNIA PREVIOUSLY RECORDED ON REEL 010796 FRAME 0250. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT OF ASSIGNOR'S INTEREST;ASSIGNOR:FOOTE, JONATHAN T.;REEL/FRAME:031922/0970 Effective date: 20000510 |
|
FPAY | Fee payment |
Year of fee payment: 12 |
|
AS | Assignment |
Owner name: XEROX CORPORATION, CONNECTICUT Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JPMORGAN CHASE BANK, N.A. AS SUCCESSOR-IN-INTEREST ADMINISTRATIVE AGENT AND COLLATERAL AGENT TO JPMORGAN CHASE BANK;REEL/FRAME:066728/0193 Effective date: 20220822 |