Journal: Conference on Neural Information Processing Systems
Languages: All Languages
Programming languages: Python
Project website: https://github.com/google-research/bigbird
BigBird is a sparse-attention-based transformer that extends Transformer-based models, such as BERT, to much longer sequences. BigBird also comes with a theoretical analysis of which capabilities of a full-attention transformer the sparse model can retain.
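To make the idea of sparse attention concrete, here is a minimal pure-Python sketch of a token-level attention mask. It combines a sliding window, a few global tokens, and a few random keys per query. It is an illustration only, not the code in the project repository. The function names (`sparse_attention_mask`, `count_attended`) and the parameters (`window`, `num_global`, `num_random`, `seed`) are hypothetical, chosen for this example.

```python
import random


def sparse_attention_mask(seq_len, window=3, num_global=2, num_random=2, seed=0):
    """Build a seq_len x seq_len boolean mask.

    mask[i][j] is True when query position i may attend to key position j.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)  # local RNG so the sketch is reproducible
    half = window // 2
    mask = [[False] * seq_len for _ in range(seq_len)]

    for i in range(seq_len):
        # Sliding window: each query sees its nearest neighbours (and itself).
        for j in range(max(0, i - half), min(seq_len, i + half + 1)):
            mask[i][j] = True

        # Global tokens: the first num_global positions attend to every
        # position and are attended to by every position.
        for g in range(min(num_global, seq_len)):
            mask[i][g] = True
            mask[g][i] = True

        # Random keys: a few extra positions sampled per query.
        for j in rng.sample(range(seq_len), min(num_random, seq_len)):
            mask[i][j] = True

    return mask


def count_attended(mask):
    """Count the (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    n = 16
    mask = sparse_attention_mask(n)
    print(f"allowed pairs: {count_attended(mask)} of {n * n}")
```

With the window size and the global and random counts held fixed, the number of allowed pairs grows roughly linearly with sequence length rather than quadratically. That is what makes longer sequences affordable.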