KWS: A Taxonomy of Approaches

r囧r小猫 2023-02-10 03:37

> Original author: Lebhoryi@rt-thread.com
> Date: 2020/05/19

### Contents ###

* 1. LVCSR
* 2. HMM-GMM
* 3. Speech to text
    * 3.1 DNN
    * 3.2 CNN | modeling correlations in time and frequency
    * 3.3 CRNN
    * 3.4 RNN
    * 3.5 ResNet | larger receptive field
    * 3.6 DSConv (Depthwise Separable Convolutions)
    * 3.7 TDNN
* 4. Query-by-Example
* 5. Other
    * 5.1 Model optimization
    * 5.2 Loss
    * 5.3 Dataset
        * 5.3.1 Enhancement
        * 5.3.2 Small dataset
    * 5.4 Other

> My earlier way of reading papers was wrong, and I was even pleased with myself at the time. Let it be a warning: unless a paper is a classic or a must-read, do not read it cover to cover.
> Paper-reading guide: [Are your paper-reading skills up to par?][Paper]

# 1. LVCSR #

* LVCSR-based systems must generate rich lattices and demand substantial computational resources.

# 2. HMM-GMM #

# 3. Speech to text #

## 3.1 DNN ##

* 2014 | Small-footprint keyword spotting using deep neural networks
* **[arXiv:1709.03665][arXiv_1709.03665] | attention | DNN + CTC**
    * No frame-level alignment needed, but requires full-sized encoders
* [arXiv:1812.02802][arXiv_1812.02802] | Streaming + End-to-End
    * Uses singular value decomposition (SVD) to shrink the model
* 2019 | Multitask Learning of Deep Neural Network Based Keyword Spotting for IoT Devices | DNN + HMM
* 2019 | Time-Delayed Bottleneck Highway Networks Using a DFT Feature for Keyword Spotting

## 3.2 CNN | modeling correlations in time and frequency ##

* 2015 | Convolutional neural networks for small-footprint keyword spotting
* [arXiv:1907.01448][arXiv_1907.01448] | Sub-band CNN
* 2019 | A Small-Footprint End-to-End KWS System in Low Resources | CTC + End-to-End
* [arXiv:1811.07684][arXiv_1811.07684] | Dilated Convolutions | end-to-end

## 3.3 CRNN ##

* [arXiv:1703.05390][arXiv_1703.05390]
    * 1.5 s decoding window, so not truly real-time
* [arXiv:1911.01803][arXiv_1911.01803] | CRNN + temporal feedback connections

## 3.4 RNN ##

* [arXiv:1512.08903][arXiv_1512.08903] | LSTM + CTC
* [arXiv:1611.09405][arXiv_1611.09405] | RNN + CTC (End-to-End)
* [arXiv:1705.02411][arXiv_1705.02411] | LSTM + Max-Pooling loss
    * No phoneme-level alignment needed, but constrained by decoding
* [arXiv:1803.10916][arXiv_1803.10916] | Attention | Encoder-Decoder | End-to-End
* 2019 | Adversarial examples for improving end-to-end attention-based small-footprint keyword spotting (End-to-End)
* 2019
| [DenseNet-BiLSTM][DesenNet-BiLSTM]
* [arXiv:1912.07575][arXiv_1912.07575] | Encoder-Decoder
* [arXiv:2002.10851][arXiv_2002.10851] | Quantized LSTM + CTC

## 3.5 ResNet | larger receptive field ##

* [arXiv:1710.10361][arXiv_1710.10361]
* [arXiv:1904.03814][arXiv_1904.03814] | **TC-ResNet | [hyperconnect/TC-ResNet][hyperconnect_TC-ResNet]**
* [arXiv:1912.05124][arXiv_1912.05124] | CENet-GCN
* **[arXiv:2004.08531][arXiv_2004.08531] | MatchboxNet** (end-to-end)

## 3.6 DSConv (Depthwise Separable Convolutions) ##

* [arXiv:1711.07128][arXiv_1711.07128] | **Hello Edge** | [github][]
* [arXiv:1911.02086][arXiv_1911.02086] | SincConv + DSConv | raw audio
    > [arXiv:1808.00158][arXiv_1808.00158] | ASR + SincConv | [github][github 1]
* [arXiv:2004.12200][arXiv_2004.12200] | DSConv + ResNet

## 3.7 TDNN ##

* [2017 | Compressed time delay neural network for small-footprint keyword spotting][2017 _ Compressed time delay neural network for small-footprint keyword spotting] | SVD
* 2019 | A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting

# 4. Query-by-Example #

* LSTM
    * 2015 | Query-by-example keyword spotting using long short-term memory networks
    * [arXiv:1811.10736][arXiv_1811.10736] | DONUT | CTC | posterior probabilities
* RNN-T with attention
    * **[arXiv:1710.09617][arXiv_1710.09617] | Sequence-to-sequence | End-to-End | keyword/filler**
        * No frame-level alignment needed, but requires full-sized encoders
    * [arXiv:1910.05171][arXiv_1910.05171] | query enrollment and testing | user-specific queries

# 5. Other #

## 5.1 Model optimization ##

* Compression methods
    * [arXiv:1412.6115][arXiv_1412.6115] | widely used in computer vision
    * 2016 | Model compression applied to small-footprint keyword spotting
    * [arXiv:1712.05877][arXiv_1712.05877]
    * [arXiv:1902.05026][arXiv_1902.05026]
* Another optimization approach: converting a non-streaming model into a streaming one
    * [arXiv:1811.07684][arXiv_1811.07684]
    * **[arXiv:2005.06720][arXiv_2005.06720] | [google-research/kws\_streaming][google-research_kws_streaming]**
* Quantized distillation
    * 2016 |
[Knowledge Distillation for Small-footprint Highway Networks][]
    * 2018 | [Compression of End-to-End Models][]
    * [arXiv:1907.00873][arXiv_1907.00873]
* Low rank
    * 2016 | Model compression applied to small-footprint keyword spotting
    * 2017 | Compressed time delay neural network for small-footprint keyword spotting

## 5.2 Loss ##

* CTC loss
    * 2006 | Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
* Max-pooling loss
    * [arXiv:1705.02411][arXiv_1705.02411]
    * [arXiv:2001.09246][arXiv_2001.09246] | A smoothed max-pooling loss (end-to-end)

## 5.3 Dataset ##

### 5.3.1 Enhancement ###

* 2019 | Adversarial Examples for Improving End-to-end Attention-based Small-footprint Keyword Spotting (End-to-End)

### 5.3.2 Small dataset ###

* 2018 | Fast ASR-free and almost zero-resource keyword spotting using DTW and CNNs for humanitarian monitoring
    * uses DTW to augment the data
* 2019 | [Meta learning for few-shot keyword spotting][]
    * proposes a few-shot meta-learning approach

## 5.4 Other ##

* [arXiv:2005.03633][arXiv_2005.03633] | Far-field
* 2019 | Improving keyword spotting and language identification via neural architecture search at scale
* 2017 | Hey Siri: An On-device DNN-powered Voice Trigger for Apple's Personal Assistant

[Paper]: https://zhuanlan.zhihu.com/p/42419025
[arXiv_1709.03665]: https://arxiv.org/abs/1709.03665
[arXiv_1812.02802]: https://arxiv.org/abs/1812.02802
[arXiv_1907.01448]: https://arxiv.org/abs/1907.01448
[arXiv_1811.07684]: https://arxiv.org/abs/1811.07684
[arXiv_1703.05390]: https://arxiv.org/abs/1703.05390
[arXiv_1911.01803]: https://arxiv.org/abs/1911.01803
[arXiv_1512.08903]: https://arxiv.org/abs/1512.08903
[arXiv_1611.09405]: https://arxiv.org/abs/1611.09405
[arXiv_1705.02411]: https://arxiv.org/abs/1705.02411
[arXiv_1803.10916]: https://arxiv.org/abs/1803.10916
[DesenNet-BiLSTM]: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8607038
[arXiv_1912.07575]: https://arxiv.org/abs/1912.07575
[arXiv_2002.10851]: https://arxiv.org/abs/2002.10851
[arXiv_1710.10361]: https://arxiv.org/abs/1710.10361
[arXiv_1904.03814]: https://arxiv.org/abs/1904.03814
[hyperconnect_TC-ResNet]: https://github.com/hyperconnect/TC-ResNet
[arXiv_1912.05124]: https://arxiv.org/abs/1912.05124
[arXiv_2004.08531]: https://arxiv.org/abs/2004.08531
[arXiv_1711.07128]: https://arxiv.org/abs/1711.07128
[github]: https://github.com/ARM-software/ML-KWS-for-MCU
[arXiv_1911.02086]: https://arxiv.org/abs/1911.02086
[arXiv_1808.00158]: https://arxiv.org/abs/1808.00158
[github 1]: https://github.com/mravanelli/SincNet/
[arXiv_2004.12200]: https://pdfs.semanticscholar.org/8eda/5fa7103406403a14342336c684b666dfdfc8.pdf
[arXiv_2004.12200]: https://arxiv.org/abs/2004.12200
[2017 _ Compressed time delay neural network for small-footprint keyword spotting]: https://pdfs.semanticscholar.org/8eda/5fa7103406403a14342336c684b666dfdfc8.pdf
[arXiv_1811.10736]: https://arxiv.org/abs/1811.10736
[arXiv_1710.09617]: https://arxiv.org/abs/1710.09617
[arXiv_1910.05171]: https://arxiv.org/abs/1910.05171
[arXiv_1412.6115]: https://arxiv.org/abs/1412.6115
[arXiv_1712.05877]: https://arxiv.org/abs/1712.05877
[arXiv_1902.05026]: https://arxiv.org/abs/1902.05026
[arXiv_2005.06720]: https://arxiv.org/abs/2005.06720
[google-research_kws_streaming]: https://github.com/google-research/google-research/tree/master/kws_streaming
[Knowledge Distillation for Small-footprint Highway Networks]: https://arxiv.org/abs/1608.00892
[Compression of End-to-End Models]: https://www.isca-speech.org/archive/Interspeech_2018/abstracts/1025.html
[arXiv_1907.00873]: https://arxiv.org/abs/1907.00873
[arXiv_2001.09246]: https://arxiv.org/abs/2001.09246
[Meta learning for few-shot keyword spotting]: https://arxiv.org/pdf/1910.05171
[arXiv_2005.03633]: https://arxiv.org/abs/2005.03633
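Several of the compression entries above (the SVD note under arXiv:1812.02802 and the "Low rank" items in the optimization section) factor a dense weight matrix with a truncated SVD to shrink the model. A minimal NumPy sketch of the idea follows; the layer sizes and rank are illustrative, not taken from any of the listed papers:

```python
import numpy as np

# Hypothetical dense layer weight: 512 inputs -> 256 outputs.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))

# Truncated SVD: keep only the top-r singular components.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 32
A = U[:, :r] * s[:r]   # shape (256, r): left factor scaled by singular values
B = Vt[:r, :]          # shape (r, 512): right factor

# The layer y = W @ x is replaced by two thin layers y = A @ (B @ x),
# cutting parameters from 256*512 down to r*(256+512).
full_params = W.size            # 131072
low_rank_params = A.size + B.size  # 24576

# Relative reconstruction error; by Eckart-Young this is the best
# possible rank-r approximation in the Frobenius norm.
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(full_params, low_rank_params, err)
```

In the KWS papers the factorization is typically applied after pretraining and then fine-tuned, so the accuracy loss from the approximation error is partly recovered.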
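The DSConv section lists depthwise separable convolutions, the building block behind Hello Edge's DS-CNN models. The parameter saving is easy to see by counting weights. A sketch under assumed, illustrative layer sizes (the channel counts below are not from the papers):

```python
# Standard conv: every output channel mixes all input channels
# across a k x k spatial window.
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    return c_in * c_out * k * k

# Depthwise separable conv: one k x k filter per input channel
# (depthwise), then a 1x1 pointwise conv to mix channels.
def ds_conv_params(c_in: int, c_out: int, k: int) -> int:
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# Illustrative KWS-sized layer: 64 -> 64 channels, 3x3 kernel.
std = standard_conv_params(64, 64, 3)  # 36864 weights
dsc = ds_conv_params(64, 64, 3)        # 4672 weights
print(std, dsc, std / dsc)
```

The roughly 8x reduction here is why DSConv variants dominate the microcontroller-budget end of the KWS literature.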
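The max-pooling loss in the Loss section (arXiv:1705.02411) trains the network by back-propagating cross-entropy only through the frame with the highest keyword posterior inside a window known to contain the keyword, rather than through every frame. A toy NumPy illustration of the frame selection, with made-up posteriors rather than real model outputs:

```python
import numpy as np

# Made-up per-frame keyword posteriors over a 6-frame window
# that is labeled as containing the keyword.
posteriors = np.array([0.1, 0.2, 0.7, 0.9, 0.4, 0.1])

# Max-pooling loss: pick the single best-scoring frame and take
# the cross-entropy on that frame only.
best = np.argmax(posteriors)
loss = -np.log(posteriors[best])

# Contrast: averaging cross-entropy over all frames also penalizes
# early frames where the keyword is not yet complete.
avg_loss = -np.mean(np.log(posteriors))
print(best, loss, avg_loss)
```

This avoids needing a frame-level alignment: the network is free to decide which frame best represents the completed keyword.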