Lyra (codec): Difference between revisions
Citation bot (talk | contribs) Add: date. | Use this bot. Report bugs. | Suggested by Whoop whoop pull up | #UCB_webform 1575/2955 |
Miracle Pen (talk | contribs) add template |
||
(13 intermediate revisions by 7 users not shown) | |||
Line 1: | Line 1: | ||
⚫ | |||
{{Infobox file format |
{{Infobox file format |
||
| icon = Lyra codec logo.png |
| icon = Lyra codec logo.png |
||
| developer = Google |
| developer = Google |
||
| type = |
| type = [[speech codec]] |
||
| extension = .lyra |
| extension = .lyra |
||
| released = {{Start date|2021}} |
| released = {{Start date|2021}} |
||
| latest_release_version = 1.3.2 |
| latest_release_version = 1.3.2 |
||
| latest_release_date = {{ |
| latest_release_date = {{Start date and age|2022|12|20}} |
||
| free = Yes ([[Apache-2.0]]) |
| free = Yes ([[Apache-2.0]]) |
||
}} |
}} |
||
⚫ | |||
'''Lyra''' is a lossy [[audio codec]] developed by [[Google]] that is designed for compressing speech at very low bitrates. Unlike most other audio formats, it compresses data using a [[machine learning]]-based algorithm. |
'''Lyra''' is a lossy [[audio codec]] developed by [[Google]] that is designed for compressing speech at very low bitrates. Unlike most other audio formats, it compresses data using a [[machine learning]]-based algorithm. |
||
== Features == |
== Features == |
||
The Lyra codec is designed to transmit speech in real-time when bandwidth is severely restricted, such as over slow or unreliable network connections.<ref name=":0">{{Cite web |last=Buckley |first=Ian |date=2021-04-08 |title=Google Makes Its Lyra Low Bitrate Speech Codec Public |url=https://www.makeuseof.com/google-lyra-speech-codec-public/ |access-date=2022-07-21 |website=MakeUseOf |language=en-US}}</ref> It runs at fixed bitrates of 3.2, 6, and 9 |
The Lyra codec is designed to transmit speech in real-time when bandwidth is severely restricted, such as over slow or unreliable network connections.<ref name=":0">{{Cite web |last=Buckley |first=Ian |date=2021-04-08 |title=Google Makes Its Lyra Low Bitrate Speech Codec Public |url=https://www.makeuseof.com/google-lyra-speech-codec-public/ |access-date=2022-07-21 |website=MakeUseOf |language=en-US}}</ref> It runs at fixed bitrates of 3.2, 6, and 9 kbit/s and it is intended to provide better quality than codecs that use traditional waveform-based algorithms at similar bitrates.<ref name=":2">{{Cite web |title=Lyra: A New Very Low-Bitrate Codec for Speech Compression |url=http://ai.googleblog.com/2021/02/lyra-new-very-low-bitrate-codec-for.html |access-date=2022-07-21 |website=Google AI Blog |date=25 February 2021 |language=en}}</ref><ref name=":4" /> Instead, compression is achieved via a [[machine learning]] algorithm that encodes the input with feature extraction, and then reconstructs an approximation of the original using a generative model.<ref name=":0" /> This model was trained on thousands of hours of speech recorded in over 70 languages to function with various speakers.<ref name=":2" /> Because generative models are more computationally complex than traditional codecs, a simple model that processes different frequency ranges in parallel is used to obtain acceptable performance.<ref name=":1">{{Cite web |date=2021-04-09 |title=Google Duo uses a new codec for better call quality over poor connections |url=https://www.xda-developers.com/google-duo-lyra-codec-better-call-quality/ |access-date=2022-07-21 |website=XDA |language=en-US}}</ref> Lyra imposes 20 ms of latency due to its frame size.<ref name=":4" /> Google's reference implementation is available for [[Android (operating system)|Android]] and [[Linux]].<ref name=":1" /> |
||
{{Listen |
{{Listen |
||
Line 28: | Line 28: | ||
| title5 = Speex at 3 kbps |
| title5 = Speex at 3 kbps |
||
}} |
}} |
||
=== Quality === |
=== Quality === |
||
Lyra's initial version performed significantly better than traditional codecs at similar bitrates.<ref name=":0" /><ref name=":1" /><ref name=":3">{{Cite web |last=Levent-Levi |first=Tsahi |date=2021-04-19 |title=Lyra, Satin and the future of voice codecs in WebRTC |url=https://bloggeek.me/lyra-satin-webrtc-voice-codecs/ |access-date=2022-07-21 |website=BlogGeek.me |language=en-US}}</ref> Ian Buckley at MakeUseOf said, "It succeeds in creating almost eerie levels of audio reproduction with bitrates as low as 3 kbps." Google claims that it reproduces natural-sounding speech, and that Lyra at 3 |
Lyra's initial version performed significantly better than traditional codecs at similar bitrates.<ref name=":0" /><ref name=":1" /><ref name=":3">{{Cite web |last=Levent-Levi |first=Tsahi |date=2021-04-19 |title=Lyra, Satin and the future of voice codecs in WebRTC |url=https://bloggeek.me/lyra-satin-webrtc-voice-codecs/ |access-date=2022-07-21 |website=BlogGeek.me |language=en-US}}</ref> Ian Buckley at MakeUseOf said, "It succeeds in creating almost eerie levels of audio reproduction with bitrates as low as 3 kbps." Google claims that it reproduces natural-sounding speech, and that Lyra at 3 kbit/s beats Opus at 8 kbit/s.<ref name=":2" /> Tsahi Levent-Levi writes that [[Satin (codec)|Satin]], [[Microsoft|Microsoft's]] AI-based codec, outperforms it at higher bitrates.<ref name=":3" /> |
||
== History == |
== History == |
||
In December 2017, Google researchers published a preprint paper on replacing the [[Codec 2]] decoder with a WaveNet neural network. They found that a neural network is able to extrapolate features of the voice not described in the Codec 2 bitstream and give better audio quality, and that the use of conventional features makes the neural network calculation simpler compared to a purely waveform-based network. Lyra version 1 would reuse this overall framework of feature extraction, quantization, and neural synthesis.<ref>{{cite conference|author=Kleijn, W. B. |author2=Lim, F. S. |author3=Luebs, A. |author4=Skoglund, J. |author5=Stimberg, F. |author6=Wang, Q. |author7=Walters, T. C. |date=April 2018|title=Wavenet based low rate speech coding|conference=2018 IEEE international conference on acoustics, speech and signal processing (ICASSP)|pages=676–680|publisher=IEEE|arxiv=1712.01120 }}</ref> |
|||
⚫ | |||
⚫ | Lyra was first announced in February 2021,<ref name=":2" /> and in April, Google released the source code of their reference implementation.<ref name=":0" /> The initial version had a fixed bitrate of 3 kbit/s and around 90 ms latency.<ref name=":0" /><ref name=":2" /> The encoder calculates a [[mel scale|log mel spectrogram]] and performs [[vector quantization]] to store the spectrogram in a data stream. The decoder is a [[WaveNet]] neural network that takes the spectrogram and reconstructs the input audio.<ref name=":2"/> |
||
A second version (v2/1.2.0), released |
A second version (v2/1.2.0), released in September 2022, improved sound quality, latency, and performance, and permitted multiple bitrates. V2 uses a "SoundStream" structure where both the encoder and decoder are neural networks, a kind of [[autoencoder]]. A [[residual vector quantizer]] is used to turn the feature values into transferrable data.<ref name=":4">{{Cite web |title=Lyra V2 - a better, faster, and more versatile speech codec |url=https://opensource.googleblog.com/2022/09/lyra-v2-a-better-faster-and-more-versatile-speech-codec.html |access-date=2023-04-26 |website=Google Open Source Blog}}</ref> |
||
== Support == |
== Support == |
||
=== Implementations === |
=== Implementations === |
||
Google's implementation is available on [[GitHub |
Google's implementation is available on [[GitHub]] under the Apache License.<ref name=":0" /><ref>{{Cite web |last=((Google)) |date=2021 |title=Lyra: A Very Low-Bitrate Codec for Speech Compression |url=https://github.com/google/lyra |access-date=21 July 2022 |website=GitHub}}</ref> Written in [[C++]], it is optimized for 64-bit [[ARM architecture family|ARM]] but also runs on [[x86]], on either Android or Linux.<ref name=":1" /> |
||
=== Applications === |
=== Applications === |
||
Line 52: | Line 55: | ||
== See also == |
== See also == |
||
* [[Satin (codec)]], an AI-based codec developed by Microsoft |
* [[Satin (codec)]], an AI-based codec developed by Microsoft |
||
* [[Comparison of audio coding formats]] |
* [[Comparison of audio coding formats]] |
||
* [[Speech coding]] |
* [[Speech coding]] |
||
* [[Videotelephony]] |
* [[Videotelephony]] |
||
{{Compression formats}} |
|||
[[Category:Speech codecs]] |
[[Category:Speech codecs]] |
||
[[Category:Lossy compression algorithms]] |
[[Category:Lossy compression algorithms]] |
Latest revision as of 18:01, 15 May 2024
Field | Value |
---|---|
Filename extension | .lyra |
Developed by | Google |
Initial release | 2021 |
Latest release | 1.3.2 (December 20, 2022) |
Type of format | speech codec |
Free format? | Yes (Apache-2.0) |
Lyra is a lossy audio codec developed by Google that is designed for compressing speech at very low bitrates. Unlike most other audio formats, it compresses data using a machine learning-based algorithm.
Features
The Lyra codec is designed to transmit speech in real time when bandwidth is severely restricted, such as over slow or unreliable network connections.[1] It runs at fixed bitrates of 3.2, 6, and 9 kbit/s and is intended to provide better quality than codecs that use traditional waveform-based algorithms at similar bitrates.[2][3] Rather than coding the waveform directly, it compresses speech with a machine learning algorithm that encodes the input through feature extraction and then reconstructs an approximation of the original using a generative model.[1] This model was trained on thousands of hours of speech recorded in over 70 languages so that it works with a wide range of speakers.[2] Because generative models are more computationally complex than traditional codecs, a simple model that processes different frequency ranges in parallel is used to obtain acceptable performance.[4] Lyra imposes 20 ms of latency due to its frame size.[3] Google's reference implementation is available for Android and Linux.[4]
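To give a sense of how small these bitrates are, the following sketch works out the payload budget per frame. It assumes one packet per 20 ms frame, taking the frame duration from the latency figure above; the function name `bits_per_frame` is purely illustrative and is not part of Lyra's API.

```python
# Payload budget per frame, assuming one packet per 20 ms frame (an assumption
# taken from the latency figure quoted in the text, not from Lyra's source code).
FRAME_MS = 20
BITRATES_BPS = [3200, 6000, 9000]  # 3.2, 6 and 9 kbit/s


def bits_per_frame(bitrate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Number of payload bits a single frame may carry at the given bitrate."""
    return bitrate_bps * frame_ms // 1000


for rate in BITRATES_BPS:
    bits = bits_per_frame(rate)
    print(f"{rate / 1000:g} kbit/s -> {bits} bits ({bits / 8:g} bytes) per frame")
```

Under that assumption, each frame carries 64, 120, or 180 bits, that is, at most a few tens of bytes per 20 ms of speech.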
Quality
Lyra's initial version performed significantly better than traditional codecs at similar bitrates.[1][4][5] Ian Buckley at MakeUseOf said, "It succeeds in creating almost eerie levels of audio reproduction with bitrates as low as 3 kbps." Google claims that it reproduces natural-sounding speech, and that Lyra at 3 kbit/s beats Opus at 8 kbit/s.[2] Tsahi Levent-Levi writes that Satin, Microsoft's AI-based codec, outperforms it at higher bitrates.[5]
History
In December 2017, Google researchers published a preprint paper on replacing the Codec 2 decoder with a WaveNet neural network. They found that a neural network is able to extrapolate features of the voice not described in the Codec 2 bitstream and give better audio quality, and that the use of conventional features makes the neural network calculation simpler compared to a purely waveform-based network. Lyra version 1 would reuse this overall framework of feature extraction, quantization, and neural synthesis.[6]
Lyra was first announced in February 2021,[2] and in April, Google released the source code of their reference implementation.[1] The initial version had a fixed bitrate of 3 kbit/s and around 90 ms latency.[1][2] The encoder calculates a log mel spectrogram and performs vector quantization to store the spectrogram in a data stream. The decoder is a WaveNet neural network that takes the spectrogram and reconstructs the input audio.[2]
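The vector quantization step can be illustrated with a minimal sketch: each feature vector is replaced by the index of its nearest codebook entry, and the decoder looks the entry up again. The codebook, the two-dimensional features, and the function names below are hypothetical; Lyra's actual features are spectrogram frames and its codebooks are learned from data.

```python
import math


def nearest_index(vector, codebook):
    """Index of the codeword closest to `vector` (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))


# Hypothetical 4-entry codebook over 2-dimensional features (2 bits per vector).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def vq_encode(frames, codebook=CODEBOOK):
    """Replace every feature vector by the index of its nearest codeword."""
    return [nearest_index(frame, codebook) for frame in frames]


def vq_decode(indices, codebook=CODEBOOK):
    """Recover an approximation of the features by codebook lookup."""
    return [codebook[i] for i in indices]


frames = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (0.7, 0.9)]
indices = vq_encode(frames)
bits = len(indices) * math.ceil(math.log2(len(CODEBOOK)))
print(indices, f"{bits} bits in total")
print(vq_decode(indices))
```

Only the indices need to be transmitted, so the cost per vector is the base-2 logarithm of the codebook size, regardless of how many values the vector contains.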
A second version (v2/1.2.0), released in September 2022, improved sound quality, latency, and performance, and permitted multiple bitrates. V2 uses a "SoundStream" structure, a kind of autoencoder in which both the encoder and decoder are neural networks. A residual vector quantizer is used to turn the feature values into transferable data.[3]
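As a rough illustration of residual vector quantization in general, the sketch below quantizes a vector in stages, with each stage coding the error left by the previous one. The two small codebooks and all function names are hypothetical, and this is not Lyra's implementation, whose codebooks are learned together with the neural networks.

```python
def nearest_index(vector, codebook):
    """Index of the codeword closest to `vector` (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))


# Hypothetical codebooks: a coarse first stage and a finer second stage.
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    [(-0.25, -0.25), (0.25, -0.25), (-0.25, 0.25), (0.25, 0.25)],
]


def rvq_encode(vector, codebooks=CODEBOOKS):
    """Quantize stage by stage; each stage codes the residual of the previous one."""
    residual = list(vector)
    indices = []
    for codebook in codebooks:
        i = nearest_index(residual, codebook)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def rvq_decode(indices, codebooks=CODEBOOKS):
    """Sum the selected codewords; a shorter index list uses only the first stages."""
    out = [0.0] * len(codebooks[0][0])
    for i, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


indices = rvq_encode((0.8, 0.3))
print(indices)                  # one index per stage
print(rvq_decode(indices))      # reconstruction from both stages
print(rvq_decode(indices[:1]))  # coarser reconstruction from the first stage only
```

A general property of this kind of quantizer is that decoding from fewer stages still yields a usable, coarser approximation, so the number of stages is a natural knob for trading bits against accuracy.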
Support
Implementations
Google's implementation is available on GitHub under the Apache License.[1][7] Written in C++, it is optimized for 64-bit ARM but also runs on x86, on either Android or Linux.[4]
Applications
Google Duo uses Lyra to transmit sound for video chats when bandwidth is limited.[1][5]
References
1. Buckley, Ian (2021-04-08). "Google Makes Its Lyra Low Bitrate Speech Codec Public". MakeUseOf. Retrieved 2022-07-21.
2. "Lyra: A New Very Low-Bitrate Codec for Speech Compression". Google AI Blog. 2021-02-25. Retrieved 2022-07-21.
3. "Lyra V2 - a better, faster, and more versatile speech codec". Google Open Source Blog. Retrieved 2023-04-26.
4. "Google Duo uses a new codec for better call quality over poor connections". XDA. 2021-04-09. Retrieved 2022-07-21.
5. Levent-Levi, Tsahi (2021-04-19). "Lyra, Satin and the future of voice codecs in WebRTC". BlogGeek.me. Retrieved 2022-07-21.
6. Kleijn, W. B.; Lim, F. S.; Luebs, A.; Skoglund, J.; Stimberg, F.; Wang, Q.; Walters, T. C. (April 2018). "Wavenet based low rate speech coding". 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. pp. 676–680. arXiv:1712.01120.
7. Google (2021). "Lyra: A Very Low-Bitrate Codec for Speech Compression". GitHub. Retrieved 2022-07-21.
External links
- "Lyra: A New Very Low-Bitrate Codec for Speech Compression", a Google blog post with a demonstration comparing codecs
See also
- Satin (codec), an AI-based codec developed by Microsoft
- Comparison of audio coding formats
- Speech coding
- Videotelephony