'''Generative Pre-trained Transformer 3''' ('''GPT-3''') is a [[large language model]] released by [[OpenAI]] in 2020.
Like its predecessor, [[GPT-2]], it is a decoder-only<ref name="OpenAI_Radford_20200611" /> [[transformer model]], a deep neural network architecture that supersedes recurrence- and convolution-based architectures with a technique known as "[[Attention (machine learning)|attention]]".<ref name="2018_Attention_Paper">{{cite journal |last1=Vaswani |first1=Ashish |author1-link= Ashish Vaswani |last2=Shazeer |first2=Noam |last3=Parmar |first3=Niki |last4=Uszkoreit |first4=Jakob |last5=Jones |first5=Llion |last6=Gomez |first6=Aidan N |author6-link= Aidan Gomez |last7=Kaiser |first7=Łukasz |last8=Polosukhin |first8=Illia |title=Attention is All you Need |journal=Advances in Neural Information Processing Systems |date=2017 |volume=30 |url=https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf |publisher=Curran Associates, Inc.}}</ref> This attention mechanism allows the model to selectively focus on the segments of input text it predicts to be most relevant.<ref name="jointly">{{cite arXiv |last1= Bahdanau |first1 = Dzmitry |last2 = Cho |first2= Kyunghyun |last3= Bengio |first3= Yoshua |eprint = 1409.0473 |title= Neural Machine Translation by Jointly Learning to Align and Translate |class= cs.CL |date= 1 September 2014}}</ref> GPT-3 has 175 billion [[Parameter (machine learning)|parameters]], each stored with 16-bit precision; at 2 bytes per parameter, the model requires 350 GB of storage. It has a [[context window]] of 2048 [[Lexical analysis|tokens]] and has demonstrated strong "[[zero-shot]]" and "[[Few-shot learning (natural language processing)|few-shot]]" learning abilities on many tasks.<ref name="OpenAI_Radford_20200611">{{Cite web| page = 12| access-date = July 31, 2020| date = June 11, 2018| last1 = Radford| first1 = Alec| last2 = Narasimhan| first2 = Karthik| last3 = Salimans| first3 = Tim| last4 = Sutskever| first4 = Ilya| title = Improving Language Understanding by Generative Pre-Training| url = https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf| archive-date = January 26, 2021| archive-url = https://web.archive.org/web/20210126024542/https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf| url-status = live}}</ref>
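The following minimal sketch illustrates scaled dot-product attention, the core operation behind the mechanism described above. It is a single attention head with no causal masking or learned projections, and the dimensions are illustrative toy values rather than GPT-3's actual ones:

<syntaxhighlight lang="python">
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention over a sequence of token vectors.

    Q, K, V: arrays of shape (seq_len, d) holding query, key, and value
    vectors. Each output row is a weighted mix of the value vectors,
    with weights reflecting how relevant each position is judged to be.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # pairwise relevance scores
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V

# Toy usage: 4 tokens with 8-dimensional vectors (GPT-3 itself uses a
# 2048-token context window and far larger vector dimensions).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(attention(x, x, x).shape)  # (4, 8)
</syntaxhighlight>

In a full transformer such as GPT-3, this operation is applied in parallel across many attention heads, with Q, K, and V produced by learned linear projections of the input and a causal mask preventing each token from attending to later positions.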
On September 22, 2020, [[Microsoft]] announced that it had licensed GPT-3 exclusively. Others can still receive output from its public API, but only Microsoft has access to the underlying model.<ref name="MSgotcode">{{Cite magazine |title=OpenAI is giving Microsoft exclusive access to its GPT-3 language model |url=https://www.technologyreview.com/2020/09/23/1008729/openai-is-giving-microsoft-exclusive-access-to-its-gpt-3-language-model/ |date=September 23, 2020 |last=Hao |first=Karen |access-date=2020-09-25 |magazine=[[MIT Technology Review]] |language=en |quote="The companies say OpenAI will continue to offer its public-facing [[API]], which allows chosen users to send text to GPT-3 or OpenAI's other models and receive its output. Only Microsoft, however, will have access to GPT-3's underlying code, allowing it to embed, repurpose, and modify the model as it pleases." |archive-date=February 5, 2021 |archive-url=https://web.archive.org/web/20210205121656/https://www.technologyreview.com/2020/09/23/1008729/openai-is-giving-microsoft-exclusive-access-to-its-gpt-3-language-model/ |url-status=live }}</ref>