KWS: A Taxonomy of Approaches

r囧r小猫 2023-02-10 03:37

> Original author: Lebhoryi@rt-thread.com
> Date: 2020/05/19

### Contents ###

* 1. LVCSR
* 2. HMM-GMM
* 3. Speech to text
    * 3.1 DNN
    * 3.2 CNN | modeling correlations in time and frequency
    * 3.3 CRNN
    * 3.4 RNN
    * 3.5 ResNet | larger receptive field
    * 3.6 DSConv (Depthwise Separable Convolutions)
    * 3.7 TDNN
* 4. Query-by-Example
* 5. Other
    * 5.1 Model optimization
    * 5.2 Loss
    * 5.3 Dataset
        * 5.3.1 Enhancement
        * 5.3.2 Small dataset
    * 5.4 Other

> My earlier way of reading papers was wrong, and I was even pleased with myself at the time. Let it be a warning: unless a paper is a classic or a must-read, do not read it cover to cover.
> Paper-reading guide: [Are your paper-reading skills up to par?][Paper]

# 1. LVCSR #

* LVCSR-based systems must generate rich lattices and demand substantial computational resources.

# 2. HMM-GMM #

# 3. Speech to text #

## 3.1 DNN ##

* 2014 | Small-footprint keyword spotting using deep neural networks
* **[arXiv:1709.03665][arXiv_1709.03665] | attention | DNN + CTC**
    * No frame-level alignment needed, but requires full-sized encoders
* [arXiv:1812.02802][arXiv_1812.02802] | Streaming + End-to-End
    * Uses singular value decomposition (SVD) to shrink the model
* 2019 | Multitask Learning of Deep Neural Network Based Keyword Spotting for IoT Devices | DNN + HMM
* 2019 | Time-Delayed Bottleneck Highway Networks Using a DFT Feature for Keyword Spotting

## 3.2 CNN | modeling correlations in time and frequency ##

* 2015 | Convolutional neural networks for small-footprint keyword spotting
* [arXiv:1907.01448][arXiv_1907.01448] | Sub-band CNN
* 2019 | A Small-Footprint End-to-End KWS System in Low Resources | CTC + End-to-End
* [arXiv:1811.07684][arXiv_1811.07684] | Dilated Convolutions | end-to-end

## 3.3 CRNN ##

* [arXiv:1703.05390][arXiv_1703.05390]
    * 1.5 s decoding window, so not truly real-time
* [arXiv:1911.01803][arXiv_1911.01803] | CRNN + temporal feedback connections

## 3.4 RNN ##

* [arXiv:1512.08903][arXiv_1512.08903] | LSTM + CTC
* [arXiv:1611.09405][arXiv_1611.09405] | RNN + CTC (End-to-End)
* [arXiv:1705.02411][arXiv_1705.02411] | LSTM + Max-Pooling loss
    * No phoneme-level alignment needed, but constrained by decoding
* [arXiv:1803.10916][arXiv_1803.10916] | Attention | Encoder-Decoder | End-to-End
* 2019 | Adversarial examples for improving end-to-end attention-based small-footprint keyword spotting (End-to-End)
* 2019
| [DenseNet-BiLSTM][DesenNet-BiLSTM]
* [arXiv:1912.07575][arXiv_1912.07575] | Encoder-Decoder
* [arXiv:2002.10851][arXiv_2002.10851] | Quantized LSTM + CTC

## 3.5 ResNet | larger receptive field ##

* [arXiv:1710.10361][arXiv_1710.10361]
* [arXiv:1904.03814][arXiv_1904.03814] | **TC-ResNet | [hyperconnect/TC-ResNet][hyperconnect_TC-ResNet]**
* [arXiv:1912.05124][arXiv_1912.05124] | CENet-GCN
* **[arXiv:2004.08531][arXiv_2004.08531] | MatchboxNet** (end-to-end)

## 3.6 DSConv (Depthwise Separable Convolutions) ##

* [arXiv:1711.07128][arXiv_1711.07128] | **Hello Edge** | [github][]
* [arXiv:1911.02086][arXiv_1911.02086] | SincConv + DSConv | raw audio
    > [arXiv:1808.00158][arXiv_1808.00158] | ASR + SincConv | [github][github 1]
* [arXiv:2004.12200][arXiv_2004.12200] | DSConv + ResNet

## 3.7 TDNN ##

* [2017 | Compressed time delay neural network for small-footprint keyword spotting][2017 _ Compressed time delay neural network for small-footprint keyword spotting] | SVD
* 2019 | A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting

# 4. Query-by-Example #

* LSTM
    * 2015 | Query-by-example keyword spotting using long short-term memory networks
    * [arXiv:1811.10736][arXiv_1811.10736] | DONUT | CTC | posterior probabilities
* RNN-T with attention
    * **[arXiv:1710.09617][arXiv_1710.09617] | Sequence-to-sequence | End-to-End | keyword/filler**
        * No frame-level alignment needed, but requires full-sized encoders
    * [arXiv:1910.05171][arXiv_1910.05171] | query enrollment and testing | user-specific queries

# 5. Other #

## 5.1 Model optimization ##

* Compression methods
    * [arXiv:1412.6115][arXiv_1412.6115] | widely used in computer vision
    * 2016 | Model compression applied to small-footprint keyword spotting
    * [arXiv:1712.05877][arXiv_1712.05877]
    * [arXiv:1902.05026][arXiv_1902.05026]
* Another optimization approach: converting a non-streaming model into a streaming one
    * [arXiv:1811.07684][arXiv_1811.07684]
    * **[arXiv:2005.06720][arXiv_2005.06720] | [google-research/kws\_streaming][google-research_kws_streaming]**
* Quantized distillation
    * 2016 |
[Knowledge Distillation for Small-footprint Highway Networks][]
    * 2018 | [Compression of End-to-End Models][]
    * [arXiv:1907.00873][arXiv_1907.00873]
* Low rank
    * 2016 | Model compression applied to small-footprint keyword spotting
    * 2017 | Compressed time delay neural network for small-footprint keyword spotting

## 5.2 Loss ##

* CTC loss
    * 2006 | Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
* Max-pooling loss
    * [arXiv:1705.02411][arXiv_1705.02411]
    * [arXiv:2001.09246][arXiv_2001.09246] | A smoothed max-pooling loss (end-to-end)

## 5.3 Dataset ##

### 5.3.1 Enhancement ###

* 2019 | Adversarial Examples for Improving End-to-end Attention-based Small-footprint Keyword Spotting (End-to-End)

### 5.3.2 Small dataset ###

* 2018 | Fast ASR-free and almost zero-resource keyword spotting using DTW and CNNs for humanitarian monitoring
    * uses DTW to augment the data
* 2019 | [Meta learning for few-shot keyword spotting][]
    * proposes a few-shot meta-learning approach

## 5.4 Other ##

* [arXiv:2005.03633][arXiv_2005.03633] | Far-field
* 2019 | Improving keyword spotting and language identification via neural architecture search at scale
* 2017 | Hey Siri: An On-device DNN-powered Voice Trigger for Apple's Personal Assistant

[Paper]: https://zhuanlan.zhihu.com/p/42419025
[arXiv_1709.03665]: https://arxiv.org/abs/1709.03665
[arXiv_1812.02802]: https://arxiv.org/abs/1812.02802
[arXiv_1907.01448]: https://arxiv.org/abs/1907.01448
[arXiv_1811.07684]: https://arxiv.org/abs/1811.07684
[arXiv_1703.05390]: https://arxiv.org/abs/1703.05390
[arXiv_1911.01803]: https://arxiv.org/abs/1911.01803
[arXiv_1512.08903]: https://arxiv.org/abs/1512.08903
[arXiv_1611.09405]: https://arxiv.org/abs/1611.09405
[arXiv_1705.02411]: https://arxiv.org/abs/1705.02411
[arXiv_1803.10916]: https://arxiv.org/abs/1803.10916
[DesenNet-BiLSTM]: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8607038
[arXiv_1912.07575]: https://arxiv.org/abs/1912.07575
[arXiv_2002.10851]: https://arxiv.org/abs/2002.10851
[arXiv_1710.10361]: https://arxiv.org/abs/1710.10361
[arXiv_1904.03814]: https://arxiv.org/abs/1904.03814
[hyperconnect_TC-ResNet]: https://github.com/hyperconnect/TC-ResNet
[arXiv_1912.05124]: https://arxiv.org/abs/1912.05124
[arXiv_2004.08531]: https://arxiv.org/abs/2004.08531
[arXiv_1711.07128]: https://arxiv.org/abs/1711.07128
[github]: https://github.com/ARM-software/ML-KWS-for-MCU
[arXiv_1911.02086]: https://arxiv.org/abs/1911.02086
[arXiv_1808.00158]: https://arxiv.org/abs/1808.00158
[github 1]: https://github.com/mravanelli/SincNet/
[arXiv_2004.12200]: https://pdfs.semanticscholar.org/8eda/5fa7103406403a14342336c684b666dfdfc8.pdf
[arXiv_2004.12200]: https://arxiv.org/abs/2004.12200
[2017 _ Compressed time delay neural network for small-footprint keyword spotting]: https://pdfs.semanticscholar.org/8eda/5fa7103406403a14342336c684b666dfdfc8.pdf
[arXiv_1811.10736]: https://arxiv.org/abs/1811.10736
[arXiv_1710.09617]: https://arxiv.org/abs/1710.09617
[arXiv_1910.05171]: https://arxiv.org/abs/1910.05171
[arXiv_1412.6115]: https://arxiv.org/abs/1412.6115
[arXiv_1712.05877]: https://arxiv.org/abs/1712.05877
[arXiv_1902.05026]: https://arxiv.org/abs/1902.05026
[arXiv_2005.06720]: https://arxiv.org/abs/2005.06720
[google-research_kws_streaming]: https://github.com/google-research/google-research/tree/master/kws_streaming
[Knowledge Distillation for Small-footprint Highway Networks]: https://arxiv.org/abs/1608.00892
[Compression of End-to-End Models]: https://www.isca-speech.org/archive/Interspeech_2018/abstracts/1025.html
[arXiv_1907.00873]: https://arxiv.org/abs/1907.00873
[arXiv_2001.09246]: https://arxiv.org/abs/2001.09246
[Meta learning for few-shot keyword spotting]: https://arxiv.org/pdf/1910.05171
[arXiv_2005.03633]: https://arxiv.org/abs/2005.03633
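Several of the compression entries above (the SVD note under arXiv:1812.02802 and the "Low rank" items in the optimization section) factor a dense weight matrix with a truncated SVD to shrink the model. A minimal NumPy sketch of the idea follows; the layer sizes and rank are illustrative, not taken from any of the listed papers:

```python
import numpy as np

# Hypothetical dense layer weight: 512 inputs -> 256 outputs.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))

# Truncated SVD: keep only the top-r singular components.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 32
A = U[:, :r] * s[:r]   # shape (256, r): left factor scaled by singular values
B = Vt[:r, :]          # shape (r, 512): right factor

# The layer y = W @ x is replaced by two thin layers y = A @ (B @ x),
# cutting parameters from 256*512 down to r*(256+512).
full_params = W.size            # 131072
low_rank_params = A.size + B.size  # 24576

# Relative reconstruction error; by Eckart-Young this is the best
# possible rank-r approximation in the Frobenius norm.
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(full_params, low_rank_params, err)
```

In the KWS papers the factorization is typically applied after pretraining and then fine-tuned, so the accuracy loss from the approximation error is partly recovered.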
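The DSConv section lists depthwise separable convolutions, the building block behind Hello Edge's DS-CNN models. The parameter saving is easy to see by counting weights. A sketch under assumed, illustrative layer sizes (the channel counts below are not from the papers):

```python
# Standard conv: every output channel mixes all input channels
# across a k x k spatial window.
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    return c_in * c_out * k * k

# Depthwise separable conv: one k x k filter per input channel
# (depthwise), then a 1x1 pointwise conv to mix channels.
def ds_conv_params(c_in: int, c_out: int, k: int) -> int:
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# Illustrative KWS-sized layer: 64 -> 64 channels, 3x3 kernel.
std = standard_conv_params(64, 64, 3)  # 36864 weights
dsc = ds_conv_params(64, 64, 3)        # 4672 weights
print(std, dsc, std / dsc)
```

The roughly 8x reduction here is why DSConv variants dominate the microcontroller-budget end of the KWS literature.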
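The max-pooling loss in the Loss section (arXiv:1705.02411) trains the network by back-propagating cross-entropy only through the frame with the highest keyword posterior inside a window known to contain the keyword, rather than through every frame. A toy NumPy illustration of the frame selection, with made-up posteriors rather than real model outputs:

```python
import numpy as np

# Made-up per-frame keyword posteriors over a 6-frame window
# that is labeled as containing the keyword.
posteriors = np.array([0.1, 0.2, 0.7, 0.9, 0.4, 0.1])

# Max-pooling loss: pick the single best-scoring frame and take
# the cross-entropy on that frame only.
best = np.argmax(posteriors)
loss = -np.log(posteriors[best])

# Contrast: averaging cross-entropy over all frames also penalizes
# early frames where the keyword is not yet complete.
avg_loss = -np.mean(np.log(posteriors))
print(best, loss, avg_loss)
```

This avoids needing a frame-level alignment: the network is free to decide which frame best represents the completed keyword.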