The feature extraction network (FEN) improves on the temporal convolutional network (TCN) used in earlier work. Its goal is to capture long-term dependencies between action segments and to extract rich features from the input features $X=\left[x_{1}, \ldots, x_{T}\right] \in \mathbb{R}^{T \times D}$. ASRF uses a TCN in which every layer applies a dilated convolution, so higher layers have a large receptive field while lower layers still have a very small one. To overcome this issue, this paper employs a double-layer dilated convolution that combines two convolutions with different dilation factors. The first convolution has a small dilation factor in the lower layers, and the factor increases exponentially with layer depth. The second convolution starts with a large dilation factor in the lower layers, and the factor decreases exponentially with layer depth. The set of operations for each layer can be formally described as follows:
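As an informal illustration of the dilation schedule only (not the formal layer definition), the two opposing schedules can be sketched in plain Python. The base of 2, the kernel size of 3, and the function names `dual_dilation_schedule` and `layer_span` are assumptions made for this sketch; the text states only that one factor grows exponentially with depth and the other shrinks exponentially.

```python
def dual_dilation_schedule(num_layers, base=2):
    """Return one (increasing, decreasing) dilation pair per layer.

    Assumption: factors are powers of `base`; the text only says
    "exponential", so base 2 is a guess made for illustration.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    # First convolution: small dilation in low layers, growing with depth.
    increasing = [base ** l for l in range(num_layers)]
    # Second convolution: large dilation in low layers, shrinking with depth.
    decreasing = [base ** (num_layers - 1 - l) for l in range(num_layers)]
    return list(zip(increasing, decreasing))


def layer_span(dilation, kernel_size=3):
    """Number of time steps covered by a single dilated convolution."""
    return (kernel_size - 1) * dilation + 1


# Compare the per-layer span of a single increasing schedule with the
# span reached when both convolutions are available in the same layer.
for layer, (d1, d2) in enumerate(dual_dilation_schedule(4)):
    single = layer_span(d1)
    dual = max(layer_span(d1), layer_span(d2))
    print(f"layer {layer}: dilations=({d1}, {d2}) single={single} dual={dual}")
```

Under these assumptions, the lowest layer already reaches the widest span through its second convolution, which is the effect the paragraph above attributes to combining the two dilation factors.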

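Polish the following passage: The feature extraction network (FEN) improves on the earlier temporal convolutional network (TCN). The goal of the feature extraction network is to capture long-term dependencies between action segments and to extract rich features, where $X=\left[x_{1}, \ldots, x_{T}\right] \in \mathbb{R}^{T \times D}$ is the dimension of the input features. ASRF uses a temporal convolutional network (TCN) in which each layer applies a dilated convolution, so higher layers have a large receptive field while lower layers still have a very small one. To overcome this problem, this paper uses a double-layer dilated convolution that combines two convolutions with different dilation factors. The first convolution has a low dilation coefficient in the lower layers, and as the number of layers increases it …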

