Journal: Conference on Neural Information Processing Systems
Programming languages: Python
Project website: https://github.com/yitu-opensource/ConvBert
We present a novel span-based dynamic convolution operator and integrate it into the self-attention mechanism to form our mixed attention block for language pre-training. We also devise a bottleneck structure applied to the self-attention module and a grouped linear operation for the feed-forward module.
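
The two operators named above can be illustrated with a small sketch. It is a simplified reading of the abstract, not the released implementation: the helper names (`span_dynamic_conv`, `grouped_linear`, `windows`), the single-head layout, the zero padding, and the use of a plain depthwise convolution to build the span-aware key are all assumptions made for illustration. Plain Python lists are used so the example runs without any dependencies.

```python
import math
import random

def matmul(a, b):
    """Multiply an (n, m) list-of-lists by an (m, p) list-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def windows(x, k):
    """Zero-padded local windows of odd width k: (n, d) -> n windows of (k, d)."""
    pad, d = k // 2, len(x[0])
    xp = [[0.0] * d] * pad + x + [[0.0] * d] * pad
    return [xp[i:i + k] for i in range(len(x))]

def span_dynamic_conv(x, wq, wv, wf, dw, k):
    """Hypothetical single-head span-based dynamic convolution.

    x: (n, d) inputs; wq, wv: (d, d); wf: (d, k); dw: (k, d) depthwise kernel.
    """
    d = len(x[0])
    q, v = matmul(x, wq), matmul(x, wv)
    # Span-aware key: depthwise convolution of the input over each local span.
    ks = [[sum(win[j][c] * dw[j][c] for j in range(k)) for c in range(d)]
          for win in windows(x, k)]
    # One kernel of width k per position, generated from query * span-aware key.
    gate = [[qi * ki for qi, ki in zip(qr, kr)] for qr, kr in zip(q, ks)]
    kernels = [softmax(r) for r in matmul(gate, wf)]
    # Apply each position's kernel to its own window of values.
    return [[sum(kern[j] * win[j][c] for j in range(k)) for c in range(d)]
            for kern, win in zip(kernels, windows(v, k))]

def grouped_linear(x, w):
    """Hypothetical grouped linear layer.

    w is a list of g weight matrices, each (d_in / g, d_out / g); every group
    of input features is projected independently and the results concatenated.
    """
    size = len(w[0])
    out = []
    for row in x:
        y = []
        for gi, wg in enumerate(w):
            y.extend(matmul([row[gi * size:(gi + 1) * size]], wg)[0])
        out.append(y)
    return out

def rand(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

if __name__ == "__main__":
    rng = random.Random(0)
    n, d, k, g = 6, 4, 3, 2
    x = rand(n, d, rng)
    conv_out = span_dynamic_conv(x, rand(d, d, rng), rand(d, d, rng),
                                 rand(d, k, rng), rand(k, d, rng), k)
    ffn_out = grouped_linear(x, [rand(d // g, 8 // g, rng) for _ in range(g)])
    print(len(conv_out), len(conv_out[0]), len(ffn_out), len(ffn_out[0]))
```

In this sketch the grouped projection is equivalent to a dense layer with a block-diagonal weight matrix, so with g groups it holds roughly 1/g of the parameters of the dense layer it replaces. The convolution branch differs from a fixed-kernel convolution in that each position's kernel depends on the local span around that position rather than being shared across the sequence.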