Parameter counts for BERT, BART, GPT-2, XLNet, and other models are listed at:
https://huggingface.co/transformers/v2.2.0/pretrained_models.html
GPT-2 has about 1.5 billion parameters at its largest (gpt2-xl: 48 layers, 1600 hidden size, 25 attention heads, 1558M parameters).
GPT-3 has around 175 billion trainable parameters and 96 attention layers.
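
These headline numbers are easy to reproduce. Below is a minimal Python sketch, assuming the Hugging Face transformers library is installed; building gpt2-xl from its config alone avoids downloading the multi-gigabyte checkpoint while keeping the same tensor shapes:

from transformers import AutoConfig, AutoModel

# Fetch only the small config file, not the ~6 GB gpt2-xl weights.
config = AutoConfig.from_pretrained("gpt2-xl")   # n_layer=48, n_embd=1600, n_head=25
model = AutoModel.from_config(config)            # randomly initialized, same shapes

n_params = sum(p.numel() for p in model.parameters())
print(config.n_layer, config.n_embd, config.n_head)
print(f"{n_params / 1e6:.0f}M parameters")       # roughly 1558M for gpt2-xl

The same pattern works for any checkpoint name from the table linked above.
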
For Chinese pretrained language models, see:
https://github.com/lonePatient/awesome-pretrained-chinese-nlp-models
References:
GPT-3: https://www.springboard.com/blog/ai-machine-learning/machine-learning-gpt-3-open-ai/