Top GPT-3 Statistics
Some interesting GPT-3 statistics are listed below:
GPT-3 is far ahead of earlier models, with 175B trainable parameters.
GPT-3 has the largest training dataset, a whopping 45TB.
GPT-3 is being used by over 300 applications.
As of March 2021, an average of 4.5B words were generated per day.
Algolia tested GPT-3 on 2.1M news articles and achieved 91% precision.
GPT-3 is 117x more complex than GPT-2.
GPT-3 Models & Parameters
Several GPT-3 models are available, and the lineup keeps evolving. Prominent GPT-3 statistics for the various models are as follows:
The compute used to train deep learning models doubled every 3.4 months during the last decade.
A 300,000x increase in computational resources was observed between 2012 and 2018.
GPT-3 currently has the largest data corpus, 45TB, trained with 499 billion tokens.
The earlier T5 model was trained on only a 7TB dataset.
GPT-3 has 175B trainable parameters.
GPT-3's disruptive technology suggests that ~70% of software development can be automated.
Earlier NLP models were far smaller: ELMo had 94M parameters, BERT 340M, GPT-2 1.5B, and Turing NLG 17B.
BERT by Google has 470x fewer parameters than GPT-3.
GPT-3 contains 100x more parameters than its predecessor, GPT-2.
GPT-3 has 10x more parameters than Microsoft's Turing NLG model.
The capacity of the GPT-n models was enhanced by 3 orders of magnitude with GPT-3.
GPT-3 is 117x more complex than GPT-2.
GPT-3 outperformed the SOTA on the LAMBADA dataset with an 8% accuracy improvement.
Compared to the SOTA, which achieves 60% accuracy on two-digit addition and subtraction, the fine-tuned GPT-3 model reaches 100%.
GPT-3-based Algolia answers complex natural language questions 4x more accurately than BERT.
As of November 2021, Microsoft announced a larger model, Megatron-Turing NLG, with 530B parameters.
In 2018, OpenAI initiated the GPT-n series to advance NLP models toward human-like speech, text, and coding. A statistical comparison of the GPT-n models is provided below:
GPT-1 has 12-layers with 12 attention heads and a 768-dimensional state.
GPT-1’s training data, BooksCorpus, had almost 7000 unpublished books amounting to ~5GB of text.
GPT-1 performed well in 9 out of 12 tasks compared with supervised SOTA models along with decent zero-shot performance on various tasks.
GPT-2, a successor to GPT-1 launched in 2019, is trained on 10x the parameters and amount of data as GPT-1.
GPT-2 has 1.5B parameters and 40GB dataset, WebText, including 8M web pages.
GPT-2 improved on 7 out of 8 existing SOTA results and also performed well in the zero-shot setting.
GPT-3 outperforms prior language models with 100x the parameters of GPT-2.
GPT-3 has 175B trainable parameters and 12,288-dimensional word embeddings.
GPT-1: ~7,000 books (~5GB)
GPT-2: 8 million documents (~40GB)
GPT-3: multiple sources (~45TB)
GPT-3 Training Model Statistics
The statistics of multiple datasets used to train the model are as follows:
GPT-3 is trained on a total of 499B tokens, or 700GB.
Common Crawl, weighted at 60%, contains diverse data from years of web crawling.
WebText2, spanning 22%, includes text from outbound Reddit links.
Books1 and Books2, with a combined share of 16%, contain internet-based book corpora.
Wikipedia, weighted at 3%, includes data from English Wikipedia pages.
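As a rough illustration of how these mixture weights work, the sketch below samples training sources in proportion to the reported percentages. The dataset names and weights come from the list above; the sampling loop itself is only a toy illustration, not OpenAI's actual pipeline (which also controls how often each corpus is repeated per epoch).

```python
import random

# Training-mix weights reported for GPT-3 (fractions of tokens seen
# during training, not fractions of raw dataset size).
mixture = {
    "Common Crawl (filtered)": 0.60,
    "WebText2": 0.22,
    "Books1 + Books2": 0.16,
    "Wikipedia": 0.03,
}

def sample_source(rng=random):
    """Pick which dataset the next training example would be drawn from."""
    names = list(mixture)
    weights = [mixture[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Over many draws, the empirical frequencies approach the weights.
counts = {name: 0 for name in mixture}
for _ in range(100_000):
    counts[sample_source()] += 1
```

Note that the published weights sum to slightly more than 1 due to rounding; `random.choices` normalizes them, so the sketch still works as-is.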
(Source: https://arxiv.org/pdf/2005.14165.pdf)
GPT-3 Business Model Statistics
GPT-3 is not available as open source but only through a commercial API. Some of the astonishing stats on the company's status and GPT-3's running costs are as follows:
In 2015, OpenAI began as a nonprofit research lab.
In 2019, OpenAI switched from a non-profit organization to a for-profit company.
Microsoft partnered with OpenAI with a $1B investment.
Training GPT-3 requires 3.114×10²³ FLOPs (floating-point operations), which would cost $4.6M using Tesla V100 cloud instances at $1.5/hour and take 355 GPU-years.
GPT-3 cannot be trained on a single GPU; it requires a distributed system, which increases the cost of training the final model by 1.5x to 5x.
The R&D cost of GPT-3 ranges from $11.5M to $27.6M, excluding the overhead of parallel GPUs, salaries, and submodel costs.
For parallel operation, GPT-3 requires at least 11 Tesla V100 GPUs with 32GB of memory each, at $9,000 apiece, summing to $99,000 for the GPU cluster alone, excluding RAM, CPUs, SSD drives, and power supply.
The GPT-3 model, estimated to cost $12.6M, needs at least 350GB of VRAM at half precision (16 bits/parameter) just to load the model and run inference; in practice this puts VRAM requirements north of 400GB.
Hardware costs for running it would be $100,000 to $150,000, neglecting power supply, cooling, and backup costs.
A baseline Nvidia DGX-1 server, with 8×16GB of VRAM, costs around $130,000 including all other components, for solid performance on GPT-3.
If run in the cloud, GPT-3 requires at least Amazon's p3dn.24xlarge instance, packed with 8×Tesla V100 (32GB), 768GB of RAM, and 96 CPU cores; it costs $10 to $30/hour, or a minimum of about $87,000 yearly.
OpenAI may collaborate with Microsoft on specialized hardware, such as a supercomputer, leading to more cost-efficient solutions.
GPT-3 has a supercomputer hosted in Microsoft's Azure cloud, consisting of 285k CPU cores and 10k high-end GPUs.
The preliminary pricing plan gives OpenAI a near-6,000-percent profit margin, leaving room for much adjustment if the current business plan doesn't bring in customers.
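The $4.6M figure above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes roughly 28 TFLOPS of sustained mixed-precision throughput per V100 (an assumption, not a figure from this article) and the $1.5/GPU-hour rate quoted above:

```python
# Back-of-the-envelope reproduction of the reported training-cost estimate.
# Assumed: ~28 TFLOPS sustained per Tesla V100; $1.5 per GPU-hour as quoted.
TOTAL_FLOPS = 3.114e23          # total training compute for GPT-3
V100_FLOPS = 28e12              # assumed sustained FLOPS per GPU
PRICE_PER_GPU_HOUR = 1.5        # USD

gpu_seconds = TOTAL_FLOPS / V100_FLOPS
gpu_hours = gpu_seconds / 3600
gpu_years = gpu_hours / (24 * 365)
cost = gpu_hours * PRICE_PER_GPU_HOUR

print(f"{gpu_years:.0f} GPU-years, ${cost / 1e6:.1f}M")
```

Under these assumptions the arithmetic lands at roughly 353 GPU-years and $4.6M, matching the cited estimate to within rounding.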
OpenAI provides diverse pricing plans for its API. Some of the pricing stats are defined here:
GPT-3 has a free plan, 'Start for free', with $18 in free credit for the first 3 months.
Two other paid plans include a flexible 'Pay as you go' and a complex 'Choose your model'.
Billing is done per 1,000 tokens, i.e. about 750 words.
A token equals about 4 characters or 0.75 words of English text.
Every model has a predefined maximum context length, ranging from 1,500 to 2,048 tokens.
Based on the spectrum of capabilities and choices, GPT-3 provides 4 pricing tiers.
Ada, priced at $0.0008/1K tokens, performs the fastest at the cost of lesser capabilities.
Babbage, at $0.0012/1K tokens, is good for straightforward tasks.
Curie, charged at $0.0060/1K tokens, can handle nuanced tasks and works well as a general chatbot.
Davinci, priced at $0.0600/1K tokens, gives the best results for complex intent.
GPT-3 provides a customizable fine-tuned model billed at 50% of the base price and a more expensive embedding model for building advanced search [22,23].
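Putting the token rules of thumb and the per-model prices together, a rough cost estimator might look like the following sketch. The 4-characters-per-token heuristic is only approximate; actual billing uses the API's own tokenizer:

```python
# Rough API cost estimator based on the rules of thumb above:
# 1 token ~ 4 characters or ~0.75 words, billed per 1K tokens.
# Prices mirror the per-model rates listed above (USD per 1K tokens).
PRICES = {"ada": 0.0008, "babbage": 0.0012, "curie": 0.0060, "davinci": 0.0600}

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (1 token ~ 4 chars)."""
    return max(1, round(len(text) / 4))

def estimate_cost(text: str, model: str = "davinci") -> float:
    """Estimated USD cost to process `text` once with the given model."""
    return estimate_tokens(text) / 1000 * PRICES[model]

prompt = "word " * 750           # roughly 750 words, i.e. about 1K tokens
print(f"${estimate_cost(prompt):.4f}")   # around six cents with Davinci
```

The same text would cost roughly 75x less on Ada, which is why model choice dominates the bill for high-volume, simple tasks.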
Commercialization of GPT-3 has led several platforms utilizing the service to switch to paid mode:
PhilosopherAI declared a service cost of at least $4,000/month.
AI Dungeon introduced a premium Dragon Model for the GPT-3-based version, charging $10 monthly.
GPT-3 Tailored Model Statistics
Customers tailor GPT-3 models to their requirements and get stunning results. Here are some stats:
Fine-tuning improves accuracy on the Grade School Math problems dataset by 2 to 4 times.
One customer's correct outputs increased from 83% to 95%.
Another customer's error rate was reduced by 50% with a tailored model.
The frequency of unreliable outputs dropped from 17% to 5% for one customer.
The benefits of fine-tuning GPT-3 start to appear with fewer than 100 examples.
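For context, GPT-3 fine-tuning jobs consume training data as JSON Lines records with prompt and completion fields. The sketch below writes a minimal training file in that layout; the expense-classification examples and the filename are invented for illustration:

```python
import json

# Hypothetical training examples in the prompt/completion JSONL layout
# used by GPT-3 fine-tuning. Per the stats above, even a file with fewer
# than 100 such records can start to show measurable gains.
examples = [
    {"prompt": "Classify expense: 'Adobe subscription' ->", "completion": " software"},
    {"prompt": "Classify expense: 'Shell fuel stop' ->", "completion": " travel"},
]

with open("finetune_data.jsonl", "w") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")
```

One record per line is the whole format; the resulting file is uploaded to the API and referenced when creating the fine-tune job.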
The statistics of apps powered by customized GPT-3 show promising results:
Keeper Tax's performance improved from 85% to 93%, with 500 new training examples added once a week.
Viable reports accuracy improved from 66% to 90% in summarizing customer feedback.
Sana Labs' question and content generation yielded a 60% improvement, from generic grammatically correct responses to highly accurate ones.
Elicit observed an improvement of 24% in understandability of results, 17% in accuracy, and 33% overall.
GPT-3 Model Architecture
The transformer-based model has a massive architecture, offered in a range of sizes.
GPT-3 comes in 8 model sizes, with parameter counts ranging from 125M to 175B.
The attention-based architecture has attention heads ranging from 12 in the smallest model to 96 in the largest.
The transformer layers likewise range from 12 to 96.
Learning rates decrease with model size, from 6.0 × 10⁻⁴ for the smallest model to 0.6 × 10⁻⁴ for the largest; across the eight models they are 6.0, 3.0, 2.5, 2.0, 1.6, 1.2, 1.0, and 0.6 (all × 10⁻⁴).
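The reported parameter counts can be roughly reproduced from the layer and embedding sizes using a common transformer rule of thumb, params ≈ 12 × n_layers × d_model², which ignores embedding matrices and biases. The layer/dimension figures below are from the GPT-3 paper; the formula itself is an approximation, not how OpenAI reports its counts:

```python
# Rough transformer parameter count: each of the n_layers blocks holds
# ~12 * d_model^2 weights (4 * d_model^2 in attention projections plus
# 8 * d_model^2 in the feed-forward sublayer with its 4x expansion),
# ignoring embeddings and biases.
def approx_params(n_layers: int, d_model: int) -> float:
    return 12 * n_layers * d_model ** 2

largest = approx_params(96, 12288)   # GPT-3 175B: 96 layers, 12,288 dims
smallest = approx_params(12, 768)    # GPT-3 Small: 12 layers, 768 dims
print(f"{largest / 1e9:.0f}B, {smallest / 1e6:.0f}M")
```

The rule lands at ~174B for the largest model, matching the headline 175B. The gap between its ~85M estimate and the reported 125M for the smallest model is expected: the embedding matrices, which the rule ignores, make up a much larger share of small models.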
(Source: https://arxiv.org/pdf/2005.14165.pdf)
GPT-3 Performance and Accuracy
The performance and accuracy of GPT-3 are studied over various existing datasets. The interesting performance stats are as follows:
Significant performance improvement is shown on LAMBADA and PhysicalQA (PIQA).
A prominent gain of 8% over the SOTA is achieved by GPT-3 on LAMBADA in the zero-shot setting.
A substantial 4% accuracy improvement is shown for PIQA compared to the previous SOTA, a fine-tuned RoBERTa.
HellaSwag and StoryCloze showed respectable performance, but still lower than the SOTA.
HellaSwag results are lower than those of the fine-tuned multi-task model ALUM.
StoryCloze is 4.1% behind the SOTA, which uses a fine-tuned BERT model.
Winograd shows 88.3%, 89.7%, and 88.6% in the zero-shot, one-shot, and few-shot settings respectively, depicting strong results but below the SOTA.
The fine-tuned GPT-3 model shows 100% accuracy for two-digit addition and subtraction.
Short articles (~200 words) written by GPT-3 175B are identified as machine-written by human readers only ~52% of the time, barely above chance.
Articles written by GPT-3 125M are identified as machine-written 76% of the time.
(Source: https://www.springboard.com/blog/data-science/machine-learning-gpt-3-open-ai/)
GPT-3 Powered Platforms
Some of the businesses and applications utilizing GPT-3 were mentioned above in the Tailored Model and Pricing sections. Stats for some more platforms and apps powered by GPT-3 are stated below:
GPT-3 is being utilized by over 300 applications.
The platform has tens of thousands of developers around the globe.
As of March 2021, an average of 4.5B words were generated per day.
Algolia tested GPT-3 on 2.1M news articles and achieved 91% precision.
Duolingo, using GPT-3, observed a 12% improvement in prediction accuracy and user engagement.
DALL·E 2, based on a 12B-parameter version of GPT-3, is preferred by 71.7% of users for caption matching and by 88.8% for photorealism.
GPT-3 Use Cases
GPT-3 is a new artificial intelligence system that is said to be the most powerful AI system in the world. GPT-3 has many potential uses, including helping humans with their work, providing better customer service, and even becoming a personal assistant. Here are some of the common GPT-3 use cases:
GPT-3, the world’s largest artificial intelligence model, is now available to the public. And businesses are taking notice. Businesses are already using AI to improve customer service, create new products, and automate repetitive tasks.
Marketing
GPT-3 is a powerful tool for marketing. AI marketing tools can help you create better content, target your audience more effectively, and track your results. Additionally, GPT-3 can help you track your progress and analyze your results so that you can optimize your marketing strategies.
Customer Service
AI in customer service is revolutionizing how businesses interact with their customers. By automating routine tasks and providing instant answers to common questions, AI is helping businesses improve their customer service experience. In addition, GPT-3 powered chatbots can handle complex customer inquiries, freeing up human agents to provide more personalized service.
Data Analysis
AI can help identify patterns and correlations that humans might miss. It can also help automate the analysis process, making it faster and easier. Additionally, AI can provide insights that would not be possible without its help. For these reasons, AI is becoming an essential tool for data analysts.
AI content creation tools are being used to write articles, create videos, and even generate social media posts.
Design
AI design tools powered by GPT-3 have the potential to improve the efficiency and quality of the design process by automating repetitive tasks, providing personalized recommendations, and assisting in the exploration of design options.
GPT-3 Statistics: Final Words
The article presents the GPT-3 growth story through its most prominent statistics. The GPT-n models are growing substantially, and the research community is curious about GPT-4. According to a reviewer at Hacker News,