Translate: This approach fine-tunes some or all the layers of the PLM and then adds one or two simple output layers known as prediction heads (Wolf et al., 2020). Typically these are feed-forward layers for classifi

Date: 2024-08-19

Tags: Technology

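This approach fine-tunes some or all of the layers of the PLM and then adds one or two simple output layers, known as prediction heads (Wolf et al., 2020). Typically, these are feed-forward layers for classification. The output layers and the PLM are trained together end to end, but most of the computation goes into fine-tuning the language model so that it produces the desired representation of the input. The output layers only have to condense the information carried by each token's embedding into the required number of classes. The word embeddings can come from the top layer alone, or from a concatenation or weighted average of the top n layers (typically n = 4) (Peters et al., 2018). Figure 2 (left) shows a schematic of this approach.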
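The forward pass described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation from the cited papers: the names `combine_top_layers` and `PredictionHead` are hypothetical, made-up numbers stand in for the PLM's per-layer hidden states, and the head's weights are random rather than learned. In an actual setup, the head and the PLM would be trained jointly by backpropagation.

```python
import math
import random


def combine_top_layers(layers, n=4, mode="concat", weights=None):
    """Build one embedding per token from the PLM's hidden states.

    layers: list of layers (bottom to top), each a list of token vectors.
    mode:   "top" (top layer only), "concat" or "weighted" (top n layers).
    """
    if mode == "top":
        return [list(vec) for vec in layers[-1]]
    top = layers[-n:]
    num_tokens = len(top[0])
    if mode == "concat":
        # Concatenate each token's vectors from the top n layers.
        return [[x for layer in top for x in layer[t]] for t in range(num_tokens)]
    if mode == "weighted":
        # Uniform weights are an assumption; in practice they may be learned.
        if weights is None:
            weights = [1.0 / len(top)] * len(top)
        hidden = len(top[0][0])
        return [
            [sum(w * layer[t][d] for w, layer in zip(weights, top))
             for d in range(hidden)]
            for t in range(num_tokens)
        ]
    raise ValueError(f"unknown mode: {mode}")


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class PredictionHead:
    """A single feed-forward (linear) layer followed by softmax."""

    def __init__(self, in_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # Random initial weights; training would update these.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)]
                       for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def __call__(self, token_embeddings):
        # Condense each token embedding into one probability per class.
        return [
            softmax([sum(w * x for w, x in zip(row, emb)) + b
                     for row, b in zip(self.weight, self.bias)])
            for emb in token_embeddings
        ]


# Stand-in hidden states: 6 layers, 3 tokens, hidden size 8 (made-up values).
rng = random.Random(1)
hidden_states = [[[rng.uniform(-1, 1) for _ in range(8)] for _ in range(3)]
                 for _ in range(6)]

embeddings = combine_top_layers(hidden_states, n=4, mode="concat")
head = PredictionHead(in_dim=len(embeddings[0]), num_classes=5)
class_probs = head(embeddings)
print(len(embeddings[0]), len(class_probs), len(class_probs[0]))
```

Note the trade-off between the two combination modes: concatenating the top n layers multiplies the head's input size by n, whereas the weighted average keeps it equal to the hidden size.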

Original URL: https://www.cveoy.top/t/topic/dnYu. Copyright belongs to the author. Do not reproduce or scrape.