Top GPT-3 Statistics
Some interesting GPT-3 statistics are listed below:
GPT-3 is far ahead of earlier models, with 175B trainable parameters.
GPT-3 has the largest training dataset, a whopping 45TB.
GPT-3 is being used by over 300 applications.
As of March 2021, an average of 4.5B words were generated per day.
Algolia tested GPT-3 on 2.1M news articles and achieved 91% precision.
GPT-3 is 117x more complex than GPT-2.
GPT-3 Models & Parameters
Several GPT-3 models are available, and the lineup keeps evolving. Prominent GPT-3 statistics for the various models are as follows:
The compute used to train deep learning models doubled every 3.4 months during the last decade.
A 300,000x increase in computational resources was observed between 2012 and 2018.
GPT-3 currently has the largest data corpus, 45TB, trained with 499 billion tokens.
The earlier T5 model was trained on only a 7TB dataset.
GPT-3 has 175B trainable parameters.
GPT-3's disruptive technology suggests that ~70% of software development can be automated.
Earlier NLP models were far smaller: ELMo had 94M parameters, BERT 340M, GPT-2 1.5B, and Turing NLG 17B.
BERT by Google has 470x fewer parameters than GPT-3.
GPT-3 contains 100x more parameters than its predecessor, GPT-2.
GPT-3 has 10x more parameters than Microsoft's Turing NLG model.
The capacity of the GPT-n models was enhanced by 3 orders of magnitude with GPT-3.
GPT-3 is 117x more complex than GPT-2.
GPT-3 outperformed the SOTA on the LAMBADA dataset with an 8% accuracy improvement.
Compared to the SOTA, which achieves 60% accuracy on two-digit addition and subtraction, the fine-tuned GPT-3 model reaches 100%.
GPT-3-based Algolia answers complex natural language questions 4x more accurately than BERT.
As of November 2021, Microsoft announced a larger model, Megatron-Turing NLG, with 530B parameters.
In 2018, OpenAI initiated the GPT-n series to advance NLP models toward human-like speech, text, and coding. A statistical comparison of the GPT-n models is provided below:
GPT-1 has 12-layers with 12 attention heads and a 768-dimensional state.
GPT-1’s training data, BooksCorpus, had almost 7000 unpublished books amounting to ~5GB of text.
GPT-1 performed well in 9 out of 12 tasks compared with supervised SOTA models along with decent zero-shot performance on various tasks.
GPT-2, a successor to GPT-1 launched in 2019, is trained on 10x the parameters and amount of data as GPT-1.
GPT-2 has 1.5B parameters and 40GB dataset, WebText, including 8M web pages.
GPT-2 improved on 7 out of 8 existing SOTA results and also performed well in the zero-shot setting.
GPT-3 outperforms prior language models with 100x the parameters of GPT-2.
GPT-3 has 175B trainable parameters and 12,288-dimensional word embeddings.
GPT-1: ~7,000 books (~5GB)
GPT-2: 8 million documents (~40GB)
GPT-3: multiple sources (~45TB)
GPT-3 Training Model Statistics
The statistics of multiple datasets used to train the model are as follows:
GPT-3 is trained on a total of 499B tokens, or 700GB.
Common Crawl, weighted at 60%, contains diverse data from years of web crawling.
WebText2, spanning 22%, includes text from outbound Reddit links.
Books1 and Books2, with a combined share of 16%, contain internet-based book corpora.
Wikipedia, weighted at 3%, includes data from English Wikipedia pages.
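As a rough illustration of how these mixture weights work, the sketch below samples training sources in proportion to the reported percentages. The dataset names and weights come from the list above; the sampling loop itself is only a toy illustration, not OpenAI's actual pipeline (which also controls how often each corpus is repeated per epoch).

```python
import random

# Training-mix weights reported for GPT-3 (fractions of tokens seen
# during training, not fractions of raw dataset size).
mixture = {
    "Common Crawl (filtered)": 0.60,
    "WebText2": 0.22,
    "Books1 + Books2": 0.16,
    "Wikipedia": 0.03,
}

def sample_source(rng=random):
    """Pick which dataset the next training example would be drawn from."""
    names = list(mixture)
    weights = [mixture[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Over many draws, the empirical frequencies approach the weights.
counts = {name: 0 for name in mixture}
for _ in range(100_000):
    counts[sample_source()] += 1
```

Note that the published weights sum to slightly more than 1 due to rounding; `random.choices` normalizes them, so the sketch still works as-is.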
(Source: https://arxiv.org/pdf/2005.14165.pdf)
GPT-3 Business Model Statistics
GPT-3 is not available as open source but only through a commercial API. Some of the astonishing stats on the company's status and GPT-3's running costs are as follows:
In 2015, OpenAI began as a nonprofit research lab.
In 2019, OpenAI switched from a non-profit organization to a for-profit company.
Microsoft partnered with OpenAI with a $1B investment.
Training GPT-3 requires 3.114×10²³ FLOPs (floating-point operations), which would cost $4.6M using Tesla V100 cloud instances at $1.5/hour and take 355 GPU-years.
GPT-3 cannot be trained on a single GPU; it requires a distributed system, which increases the cost of training the final model by 1.5x to 5x.
The R&D cost of GPT-3 ranges from $11.5M to $27.6M, excluding the overhead of parallel GPUs, salaries, and submodel costs.
For parallel operation, GPT-3 requires at least 11 Tesla V100 GPUs with 32GB of memory each, at $9,000 apiece, summing to $99,000 for the GPU cluster alone, excluding RAM, CPUs, SSD drives, and power supply.
The GPT-3 model, estimated to cost $12.6M, needs at least 350GB of VRAM at half precision (16 bits/parameter) just to load the model and run inference; in practice this puts VRAM requirements north of 400GB.
Hardware costs for running it would be $100,000 to $150,000, neglecting power supply, cooling, and backup costs.
A baseline Nvidia DGX-1 server, with 8×16GB of VRAM, costs around $130,000 including all other components, for solid performance on GPT-3.
If run in the cloud, GPT-3 requires at least Amazon's p3dn.24xlarge instance, packed with 8×Tesla V100 (32GB), 768GB of RAM, and 96 CPU cores; it costs $10 to $30/hour, or a minimum of about $87,000 yearly.
OpenAI may collaborate with Microsoft on specialized hardware, such as a supercomputer, leading to more cost-efficient solutions.
GPT-3 has a supercomputer hosted in Microsoft's Azure cloud, consisting of 285k CPU cores and 10k high-end GPUs.
The preliminary pricing plan gives OpenAI a near-6,000-percent profit margin, leaving room for much adjustment if the current business plan doesn't bring in customers.
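The $4.6M figure above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes roughly 28 TFLOPS of sustained mixed-precision throughput per V100 (an assumption, not a figure from this article) and the $1.5/GPU-hour rate quoted above:

```python
# Back-of-the-envelope reproduction of the reported training-cost estimate.
# Assumed: ~28 TFLOPS sustained per Tesla V100; $1.5 per GPU-hour as quoted.
TOTAL_FLOPS = 3.114e23          # total training compute for GPT-3
V100_FLOPS = 28e12              # assumed sustained FLOPS per GPU
PRICE_PER_GPU_HOUR = 1.5        # USD

gpu_seconds = TOTAL_FLOPS / V100_FLOPS
gpu_hours = gpu_seconds / 3600
gpu_years = gpu_hours / (24 * 365)
cost = gpu_hours * PRICE_PER_GPU_HOUR

print(f"{gpu_years:.0f} GPU-years, ${cost / 1e6:.1f}M")
```

Under these assumptions the arithmetic lands at roughly 353 GPU-years and $4.6M, matching the cited estimate to within rounding.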
OpenAI provides diverse pricing plans for its API. Some of the pricing stats are defined here:
GPT-3 has a free plan, 'Start for free', with $18 in free credit for the first 3 months.
Two other paid plans include a flexible 'Pay as you go' and a complex 'Choose your model'.
Billing is done per 1,000 tokens, i.e. about 750 words.
A token equals about 4 characters or 0.75 words of English text.
Every model has a predefined maximum context length, ranging from 1,500 to 2,048 tokens.
Based on the spectrum of capabilities and choices, GPT-3 provides 4 pricing tiers.
Ada, priced at $0.0008/1K tokens, performs the fastest at the cost of lesser capabilities.
Babbage, at $0.0012/1K tokens, is good for straightforward tasks.
Curie, charged at $0.0060/1K tokens, can handle nuanced tasks and works well as a general chatbot.
Davinci, priced at $0.0600/1K tokens, gives the best results for complex intent.
GPT-3 provides a customizable fine-tuned model billed at 50% of the base price and a more expensive embedding model for building advanced search [22,23].
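Putting the token rules of thumb and the per-model prices together, a rough cost estimator might look like the following sketch. The 4-characters-per-token heuristic is only approximate; actual billing uses the API's own tokenizer:

```python
# Rough API cost estimator based on the rules of thumb above:
# 1 token ~ 4 characters or ~0.75 words, billed per 1K tokens.
# Prices mirror the per-model rates listed above (USD per 1K tokens).
PRICES = {"ada": 0.0008, "babbage": 0.0012, "curie": 0.0060, "davinci": 0.0600}

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (1 token ~ 4 chars)."""
    return max(1, round(len(text) / 4))

def estimate_cost(text: str, model: str = "davinci") -> float:
    """Estimated USD cost to process `text` once with the given model."""
    return estimate_tokens(text) / 1000 * PRICES[model]

prompt = "word " * 750           # roughly 750 words, i.e. about 1K tokens
print(f"${estimate_cost(prompt):.4f}")   # around six cents with Davinci
```

The same text would cost roughly 75x less on Ada, which is why model choice dominates the bill for high-volume, simple tasks.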
Commercialization of GPT-3 has led several platforms utilizing the service to switch to paid mode:
PhilosopherAI declared a service cost of at least $4,000/month.
AI Dungeon introduced a premium Dragon Model for the GPT-3-based version, charging $10 monthly.
GPT-3 Tailored Model Statistics
Customers tailor GPT-3 models to their requirements and get stunning results. Here are some stats:
Fine-tuning improves accuracy on the Grade School Math problems dataset by 2 to 4 times.
One customer's correct outputs increased from 83% to 95%.
Another customer's error rate was reduced by 50% with a tailored model.
The frequency of unreliable outputs dropped from 17% to 5% for one customer.
The benefits of fine-tuning GPT-3 start to appear with fewer than 100 examples.
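For context, GPT-3 fine-tuning jobs consume training data as JSON Lines records with prompt and completion fields. The sketch below writes a minimal training file in that layout; the expense-classification examples and the filename are invented for illustration:

```python
import json

# Hypothetical training examples in the prompt/completion JSONL layout
# used by GPT-3 fine-tuning. Per the stats above, even a file with fewer
# than 100 such records can start to show measurable gains.
examples = [
    {"prompt": "Classify expense: 'Adobe subscription' ->", "completion": " software"},
    {"prompt": "Classify expense: 'Shell fuel stop' ->", "completion": " travel"},
]

with open("finetune_data.jsonl", "w") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")
```

One record per line is the whole format; the resulting file is uploaded to the API and referenced when creating the fine-tune job.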
The statistics of apps powered by customized GPT-3 show promising results:
Keeper Tax's performance improved from 85% to 93%, with 500 new training examples added once a week.
Viable reports accuracy improved from 66% to 90% in summarizing customer feedback.
Sana Labs' question and content generation yielded a 60% improvement, from generic grammatically correct responses to highly accurate ones.
Elicit observed an improvement of 24% in understandability of results, 17% in accuracy, and 33% overall.
GPT-3 Model Architecture
The transformer-based model has a massive architecture, offered in a range of sizes.
GPT-3 comes in 8 model sizes, with parameter counts ranging from 125M to 175B.
The attention-based architecture has attention heads ranging from 12 in the smallest model to 96 in the largest.
The transformer layers likewise range from 12 to 96.
Learning rates decrease with model size, from 6.0 × 10⁻⁴ for the smallest model to 0.6 × 10⁻⁴ for the largest; across the eight models they are 6.0, 3.0, 2.5, 2.0, 1.6, 1.2, 1.0, and 0.6 (all × 10⁻⁴).
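The reported parameter counts can be roughly reproduced from the layer and embedding sizes using a common transformer rule of thumb, params ≈ 12 × n_layers × d_model², which ignores embedding matrices and biases. The layer/dimension figures below are from the GPT-3 paper; the formula itself is an approximation, not how OpenAI reports its counts:

```python
# Rough transformer parameter count: each of the n_layers blocks holds
# ~12 * d_model^2 weights (4 * d_model^2 in attention projections plus
# 8 * d_model^2 in the feed-forward sublayer with its 4x expansion),
# ignoring embeddings and biases.
def approx_params(n_layers: int, d_model: int) -> float:
    return 12 * n_layers * d_model ** 2

largest = approx_params(96, 12288)   # GPT-3 175B: 96 layers, 12,288 dims
smallest = approx_params(12, 768)    # GPT-3 Small: 12 layers, 768 dims
print(f"{largest / 1e9:.0f}B, {smallest / 1e6:.0f}M")
```

The rule lands at ~174B for the largest model, matching the headline 175B. The gap between its ~85M estimate and the reported 125M for the smallest model is expected: the embedding matrices, which the rule ignores, make up a much larger share of small models.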
(Source: https://arxiv.org/pdf/2005.14165.pdf)
GPT-3 Performance and Accuracy
The performance and accuracy of GPT-3 are studied over various existing datasets. The interesting performance stats are as follows:
Significant performance improvement is shown on LAMBADA and PhysicalQA (PIQA).
A prominent gain of 8% over the SOTA is achieved by GPT-3 on LAMBADA in the zero-shot setting.
A substantial 4% accuracy improvement is shown for PIQA compared to the previous SOTA, a fine-tuned RoBERTa.
HellaSwag and StoryCloze showed respectable performance, but still lower than the SOTA.
HellaSwag results are lower than those of the fine-tuned multi-task model ALUM.
StoryCloze is 4.1% behind the SOTA, which uses a fine-tuned BERT model.
Winograd shows 88.3%, 89.7%, and 88.6% in the zero-shot, one-shot, and few-shot settings respectively, depicting strong results but below the SOTA.
The fine-tuned GPT-3 model shows 100% accuracy for two-digit addition and subtraction.
Short articles (~200 words) written by GPT-3 175B are identified as machine-written by human readers only ~52% of the time, barely above chance.
Articles written by GPT-3 125M are identified as machine-written 76% of the time.
(Source: https://www.springboard.com/blog/data-science/machine-learning-gpt-3-open-ai/)
GPT-3 Powered Platforms
Some of the businesses and applications utilizing GPT-3 were mentioned above in the Tailored Model and Pricing sections. Stats for some more platforms and apps powered by GPT-3 are stated below:
GPT-3 is being utilized by over 300 applications.
The platform has tens of thousands of developers around the globe.
As of March 2021, an average of 4.5B words were generated per day.
Algolia tested GPT-3 on 2.1M news articles and achieved 91% precision.
Duolingo, using GPT-3, observed a 12% improvement in prediction accuracy and user engagement.
DALL·E 2, based on a 12B-parameter version of GPT-3, is preferred by 71.7% of users for caption matching and by 88.8% for photorealism.
GPT-3 Use Cases
GPT-3 is a new artificial intelligence system that is said to be the most powerful AI system in the world. GPT-3 has many potential uses, including helping humans with their work, providing better customer service, and even becoming a personal assistant. Here are some of the common GPT-3 use cases:
GPT-3, the world’s largest artificial intelligence model, is now available to the public. And businesses are taking notice. Businesses are already using AI to improve customer service, create new products, and automate repetitive tasks.
Marketing
GPT-3 is a powerful tool for marketing. AI marketing tools can help you create better content, target your audience more effectively, and track your results. Additionally, GPT-3 can help you track your progress and analyze your results so that you can optimize your marketing strategies.
Customer Service
AI in customer service is revolutionizing how businesses interact with their customers. By automating routine tasks and providing instant answers to common questions, AI is helping businesses improve their customer service experience. In addition, GPT-3 powered chatbots can handle complex customer inquiries, freeing up human agents to provide more personalized service.
Data Analysis
AI can help identify patterns and correlations that humans might miss. It can also help automate the analysis process, making it faster and easier. Additionally, AI can provide insights that would not be possible without its help. For these reasons, AI is becoming an essential tool for data analysts.
AI content creation tools are being used to write articles, create videos, and even generate social media posts.
Design
AI design tools powered by GPT-3 have the potential to improve the efficiency and quality of the design process by automating repetitive tasks, providing personalized recommendations, and assisting in the exploration of design options.
GPT-3 Statistics: Final Words
The article presents the GPT-3 growth story through its most prominent statistics. The GPT-n models are growing substantially, and the research community is curious about GPT-4. According to a reviewer at Hacker News,