Large Language Model (LLM) Development History and Model Summary (updated 2023-07-12)

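LLM timeline: the table below summarizes 58 large language models, from GPT-1 (2018-06-11) to Baichuan (2023-06-15). For each model it records the rank, model name, release date, parameter count, releasing organization, GitHub repository or official website, and paper.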

| Rank | Model | Release date | Parameters | Organization | GitHub / website | Paper |
|---|---|---|---|---|---|---|
| 57 | Baichuan-7B | 2023-06-15 | 7B | Baichuan Intelligence | github.com/baichuan-inc | |
| 56 | Aquila-7B | 2023-06-10 | 7B | BAAI | github.com/FlagAI-Open/ | |
| 55 | Falcon | 2023-05-24 | 40B | Technology Innovation Institute | falconllm.tii.ae/ | |
| 54 | Guanaco | 2023-05-23 | 7B~65B | University of Washington | github.com/artidoro/qlo | QLoRA: Efficient Finetuning of Quantized LLMs |
| 53 | RWKV | 2023-05-22 | 7B | RWKV Foundation | github.com/BlinkDL/RWKV | RWKV: Reinventing RNNs for the Transformer Era |
| 52 | CodeT5+ | 2023-05-13 | 16B | Salesforce | github.com/salesforce/C | CodeT5+: Open Code Large Language Models for Code Understanding and Generation |
| 51 | PaLM2 | 2023-05-10 | 1B~10B | Google | ai.google/static/docume | PaLM 2 Technical Report |
| 50 | RedPajamaINCITE | 2023-05-05 | 2.8B | TOGETHER | huggingface.co/together | Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models |
| 49 | MPT | 2023-05-05 | 7B | MosaicML | github.com/mosaicml/llm | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs |
| 48 | StarCoder | 2023-05-05 | 15.5B | Hugging Face | github.com/bigcode-proj | StarCoder: May the Source be With You! |
| 47 | OpenLLaMa | 2023-05-03 | 7B | Berkeley Artificial Intelligence Research | github.com/openlm-resea | OpenLLaMA: An Open Reproduction of LLaMA |
| 46 | StableLM | 2023-04-20 | 3B & 7B | Stability AI | stability.ai/blog/stabi | Stability AI Launches the First of its StableLM Suite of Language Models |
| 44 | Koala | 2023-04-03 | 13B | Berkeley Artificial Intelligence Research | github.com/young-geng/E | Koala: A Dialogue Model for Academic Research |
| 43 | Vicuna-13B | 2023-03-31 | 13B | LM-SYS | github.com/lm-sys/FastC | Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality |
| 42 | BloombergGPT | 2023-03-30 | 50B | Bloomberg | bloomberg.com/company/p | BloombergGPT: A Large Language Model for Finance |
| 41 | GPT4All | 2023-03-29 | 7B | Nomic AI | github.com/nomic-ai/gpt | GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo |
| 40 | Dolly | 2023-03-24 | 6B | Databricks | huggingface.co/databric | Hello Dolly: Democratizing the magic of ChatGPT with open models |
| 39 | ChatGLM-6B | 2023-03-14 | 6.2B | Tsinghua University | github.com/THUDM/ChatGL | ChatGLM-6B: An Open Bilingual Dialogue Language Model |
| 38 | GPT-4 | 2023-03-14 | Unknown | OpenAI | cdn.openai.com/papers/g | GPT-4 Technical Report |
| 37 | StanfordAlpaca | 2023-03-13 | 7B | Stanford | github.com/tatsu-lab/st | Alpaca: A Strong, Replicable Instruction-Following Model |
| 36 | LLaMA | 2023-02-24 | 7B~65B | Meta | github.com/facebookrese | LLaMA: Open and Efficient Foundation Language Models |
| 35 | GPT-3.5 | 2022-11-30 | 175B | OpenAI | platform.openai.com/doc | GPT-3.5 Model |
| 34 | BLOOM | 2022-11-09 | 176B | BigScience | huggingface.co/bigscien | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| 33 | BLOOMZ | 2022-11-03 | 176B | BigScience | github.com/bigscience-w | Crosslingual Generalization through Multitask Finetuning |
| 32 | mT0 | 2022-11-03 | 13B | BigScience | github.com/bigscience-w | Crosslingual Generalization through Multitask Finetuning |
| 31 | Flan-U-PaLM | 2022-10-20 | 540B | Google | github.com/google-resea | Scaling Instruction-Finetuned Language Models |
| 30 | Flan-T5 | 2022-10-20 | 11B | Google | github.com/google-resea | Scaling Instruction-Finetuned Language Models |
| 29 | WeLM | 2022-09-21 | 10B | WeChat | welm.weixin.qq.com/docs | WeLM: A Well-Read Pre-trained Language Model for Chinese |
| 28 | PLUG | 2022-09-01 | 27B | Alibaba DAMO Academy | github.com/alibaba/Alic | PLUG: Pre-training for Language Understanding and Generation |
| 27 | OPT | 2022-05-02 | 175B | Meta | github.com/facebookrese | OPT: Open Pre-trained Transformer Language Models |
| 26 | PaLM | 2022-04-05 | 540B | Google | github.com/lucidrains/P | PaLM: Scaling Language Modeling with Pathways |
| 25 | Chinchilla | 2022-03-29 | 70B | Google DeepMind | deepmind.com/blog/an-em | Training Compute-Optimal Large Language Models |
| 24 | CodeGen | 2022-03-25 | 16B | Salesforce | github.com/salesforce/c | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| 23 | GLM-130B | 2022-03-17 | 130B | Tsinghua University | github.com/THUDM/GLM-13 | GLM: General Language Model Pretraining with Autoregressive Blank Infilling |
| 22 | InstructGPT | 2022-03-04 | 175B | OpenAI | github.com/openai/follo | Training Language Models to Follow Instructions with Human Feedback |
| 21 | AlphaCode | 2022-02-08 | 41B | Google DeepMind | deepmind.com/blog/compe | Competition-Level Code Generation with AlphaCode |
| 20 | MT-NLG | 2022-01-28 | 530B | Microsoft | github.com/microsoft/De | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| 19 | LaMDA | 2022-01-20 | 137B | Google | github.com/conceptofmin | LaMDA: Language Models for Dialog Applications |
| 18 | WebGPT | 2021-12-17 | 175B | OpenAI | openai.com/research/web | WebGPT: Browser-assisted question-answering with human feedback |
| 17 | GLaM | 2021-12-13 | 1.2T | Google | ai.googleblog.com/2021/ | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| 16 | Gopher | 2021-12-08 | 280B | Google DeepMind | deepmind.com/blog/langu | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| 15 | T0 | 2021-10-15 | 11B | Hugging Face | github.com/bigscience-w | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| 14 | FLAN | 2021-09-03 | 137B | Google | github.com/google-resea | Finetuned Language Models Are Zero-Shot Learners |
| 13 | Codex | 2021-07-07 | 12B | OpenAI | github.com/openai/human | Evaluating large language models trained on code |
| 12 | ERNIE3.0 | 2021-07-05 | 10B | Baidu | github.com/PaddlePaddle | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| 11 | PanGu-Alpha | 2021-04-26 | 200B | Huawei | openi.pcl.ac.cn/PCL-Pla | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| 10 | SwitchTransformer | 2021-01-11 | 1.6T | Google | huggingface.co/google/s | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity |
| 9 | mT5 | 2020-10-22 | 13B | Google | huggingface.co/google/m | mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer |
| 8 | GShard | 2020-06-30 | 600B | Google | arxiv.org/pdf/2006.1666 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| 7 | GPT-3 | 2020-05-28 | 175B | OpenAI | github.com/openai/gpt-3 | Language Models are Few-Shot Learners |
| 6 | Turing-NLG | 2020-02-13 | 17B | Microsoft | microsoft.com/en-us/res | Turing-NLG: A 17-billion-parameter language model by Microsoft |
| 5 | T5 | 2019-10-23 | 11B | Google | github.com/google-resea | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| 4 | XLNet | 2019-06-19 | 340M | Google Brain | github.com/zihangdai/xl | XLNet: Generalized Autoregressive Pretraining for Language Understanding |
| 3 | Baidu-ERNIE | 2019-04-19 | 340M | Baidu | github.com/PaddlePaddle | ERNIE: Enhanced Representation through Knowledge Integration |
| 2 | GPT-2 | 2019-02-14 | 1.5B | OpenAI | github.com/openai/gpt-2 | Language Models are Unsupervised Multitask Learners |
| 1 | BERT | 2018-10-11 | 340M | Google | github.com/google-resea | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
| 0 | GPT-1 | 2018-06-11 | 117M | OpenAI | github.com/openai/finet | Improving Language Understanding by Generative Pre-Training |

Representative milestone works:

Neural machine translation by jointly learning to align and translate

Paper title: Neural Machine Translation by Jointly Learning to Align and Translate (2014)

Paper walkthrough: Paper notes on "Neural Machine Translation by Jointly Learning to Align and Translate"

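This paper introduced an attention mechanism to improve how well recurrent neural networks (RNNs) model long sequences. With it, RNNs could translate longer sentences more accurately. This result also motivated the later development of the original Transformer model.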
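The following is a minimal, dependency-free Python sketch of one additive (Bahdanau-style) attention step. It works in three stages:

1. Each encoder annotation h_j is scored against the previous decoder state s using v^T tanh(W s + U h_j).
2. A softmax turns the scores into alignment weights.
3. The weighted sum of the annotations becomes the context vector.

The function names, the toy 2-dimensional states, and the hand-picked weights are assumptions made for illustration. In the paper, W, U, and v are learned jointly with the RNN encoder-decoder.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def additive_attention(s_prev, encoder_states, W, U, v):
    """One decoder step of additive attention.

    s_prev: previous decoder hidden state
    encoder_states: list of encoder annotations h_j
    W, U: projection matrices; v: scoring vector (all assumed, not learned here)
    Returns (alignment weights alpha_j, context vector).
    """
    Ws = matvec(W, s_prev)
    scores = []
    for h in encoder_states:
        Uh = matvec(U, h)
        hidden = [math.tanh(a + b) for a, b in zip(Ws, Uh)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    alpha = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder annotations.
    context = [sum(a * h[k] for a, h in zip(alpha, encoder_states)) for k in range(dim)]
    return alpha, context

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    alpha, context = additive_attention(
        [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], identity, identity, [1.0, 1.0]
    )
    print(alpha, context)
```

Because the weights are recomputed at every decoder step, the decoder can focus on different source positions for each output word. A fixed-length encoder summary cannot do this, and that limitation is what hurt long sentences.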

Transformer: the attention mechanism

Paper title: Attention Is All You Need (2017)

Paper walkthrough: The Transformer explained (Attention Is All You Need)

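This paper introduced the architecture of the original Transformer. The model consists of two parts, an encoder and a decoder, which later models split into separate, standalone architectures. The paper also introduced scaled dot-product attention, multi-head attention blocks, and positional input encoding. These concepts remain the foundation of modern Transformer-based models.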
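The core formulas from the paper can be sketched in plain Python:

- Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
- Sinusoidal positional encoding: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)).

The `multi_head_attention` helper (a hypothetical name) is a deliberate simplification. It only splits the model dimension into heads, attends within each head, and concatenates the results. It omits the learned query/key/value and output projections that the real model applies.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with Q, K, V as lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        out.append([sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))])
    return out

def sinusoidal_positional_encoding(num_positions, d_model):
    """Even dimensions use sin, odd dimensions use cos, sharing the frequency 10000^(2i/d_model)."""
    pe = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def multi_head_attention(X, num_heads):
    """Simplified self-attention over num_heads slices of X (no learned projections)."""
    d_model = len(X[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        sl = [row[h * d_head:(h + 1) * d_head] for row in X]
        heads.append(scaled_dot_product_attention(sl, sl, sl))
    # Concatenate the per-head outputs for each position.
    return [sum((head[i] for head in heads), []) for i in range(len(X))]

if __name__ == "__main__":
    pe = sinusoidal_positional_encoding(4, 8)
    X = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
    print(pe[1])
    print(multi_head_attention(X, num_heads=2))
```

The 1/sqrt(d_k) factor keeps the dot products from growing with dimension. Without it, large dot products would push the softmax into regions with very small gradients.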

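BERT: deep bidirectional Transformer pre-training for language understanding

Paper title: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)

Paper walkthrough: [Explained] Understanding the BERT model in one article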

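After the original Transformer, large language model research split into two directions:

- Encoder-based Transformer models, used for predictive modeling tasks such as text classification. BERT belongs to this branch.
- Decoder-based Transformer models, used for generative modeling tasks such as translation, summarization, and other forms of text generation.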
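One way to see the split is through the attention mask each branch uses:

- An encoder such as BERT lets every token attend to every other token (bidirectional).
- A decoder such as GPT lets token i attend only to positions j <= i (causal), which is what allows it to be trained to predict the next token.

The helper names below are hypothetical. This is a sketch of the masking idea, not any library's API.

```python
import math

def bidirectional_mask(n):
    # Encoder-style: every position may attend to every position.
    return [[True] * n for _ in range(n)]

def causal_mask(n):
    # Decoder-style: position i may attend only to positions j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, allowed):
    """Softmax over the allowed positions; blocked positions get weight 0."""
    m = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    for row in causal_mask(4):
        print("".join("1" if ok else "0" for ok in row))
    # With equal scores, the first token of a causal model can only see itself,
    # while an encoder spreads attention over the whole sequence.
    print(masked_softmax([0.0, 0.0, 0.0], causal_mask(3)[0]))
    print(masked_softmax([0.0, 0.0, 0.0], bidirectional_mask(3)[0]))
```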

GPT-1: improving language understanding by generative pre-training

Paper title: Improving Language Understanding by Generative Pre-Training (2018)

Paper walkthrough: GPT-1 paper walkthrough: "Improving Language Understanding by Generative Pre-Training" (2018)

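Key findings:

- Transferring more of the pre-trained Transformer layers to the downstream task noticeably improves results.
- The model outperformed the previous state of the art on 9 of the 12 datasets studied, which suggests the architecture is well designed and worth further research.
- Adding language modeling as an auxiliary objective during fine-tuning improves generalization, and the benefit grows with the size of the dataset.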
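The auxiliary objective mentioned above can be written out using the objectives from the GPT-1 paper. The notation is as follows:

- U = {u_1, ..., u_n} is the unlabeled corpus and k is the context window.
- C is the labeled dataset of token sequences x^1, ..., x^m with labels y.
- λ weights the auxiliary language-modeling term. The paper reports λ = 0.5.

```latex
\begin{aligned}
L_1(\mathcal{U}) &= \sum_i \log P\left(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta\right) && \text{(pre-training)} \\
L_2(\mathcal{C}) &= \sum_{(x, y)} \log P\left(y \mid x^1, \ldots, x^m\right) && \text{(supervised fine-tuning)} \\
L_3(\mathcal{C}) &= L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C}) && \text{(fine-tuning with auxiliary LM objective)}
\end{aligned}
```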

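GPT-2:

Paper title: Language Models are Unsupervised Multitask Learners (2019)

GPT-2 still uses the Transformer decoder. Compared with GPT-1, both the training data and the model are much larger, roughly 10 times (1.5B versus 117M parameters). The focus also shifts to zero-shot tasks.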

GPT-3:

Paper title: Language Models are Few-Shot Learners (2020)

Paper walkthrough: GPT-3 reading notes: Language Models are Few-Shot Learners

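GPT-3 no longer pursues pure zero-shot learning, in which the model sees no examples at all. Instead, it learns from a small number of examples (few-shot). The motivation is that humans also do not learn from zero examples; a few examples are usually enough for them to generalize.

Because GPT-3 is so large, fine-tuning it for each downstream task would be expensive. GPT-3 is therefore applied to downstream tasks without any gradient updates or fine-tuning. The examples are supplied only as text in the prompt, which is known as in-context learning.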
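The difference between zero-, one-, and few-shot use comes down to how the prompt is assembled. The demonstrations are just text placed before the query, and the model weights are never touched. The sketch below builds such prompts in the format of the English-to-French example from the GPT-3 paper. `build_prompt`, `EXAMPLES`, and the parameter `k` are hypothetical names, and no model is called.

```python
# Solved demonstrations in the "source => target" format (from the GPT-3 paper's illustration).
EXAMPLES = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("plush girafe", "girafe peluche"),
]

def build_prompt(task_description, examples, query, k):
    """Assemble a prompt with k demonstrations (k=0 zero-shot, k=1 one-shot, k>1 few-shot).

    Only the input text changes between settings; no gradient update is involved.
    """
    if k > len(examples):
        raise ValueError("k exceeds the number of available examples")
    lines = [task_description]
    for source, target in examples[:k]:
        lines.append(f"{source} => {target}")
    # The model is expected to continue the text after the final "=>".
    lines.append(f"{query} =>")
    return "\n".join(lines)

if __name__ == "__main__":
    for k in (0, 1, 3):
        print(f"--- k = {k} ---")
        print(build_prompt("Translate English to French:", EXAMPLES, "cheese", k))
```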

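GPT-4: Generative Pre-trained Transformer

Paper title: GPT-4 Technical Report (2023)

Paper walkthrough: A hardcore breakdown of the GPT-4 model: read it and become half an expert

Paper walkthrough: Reading notes on the GPT series of papers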

The data above was compiled from publicly available online sources. Corrections are welcome.

