26. Which of the following is a pre-processing step required for GPT models?
A) Stemming
B) Tokenization
C) Part-of-speech tagging
D) Dependency parsing
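GPT models tokenize raw text with byte-pair encoding (BPE) before it ever reaches the network. A toy sketch of the core BPE idea, repeatedly merging the most frequent adjacent symbol pair (the corpus and merge count here are purely illustrative):

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count adjacent symbol pairs across the token sequence
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair):
    # Replace every occurrence of `pair` with one merged symbol
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

text = "low lower lowest"
tokens = list(text)   # start from individual characters
for _ in range(3):    # apply a few BPE merges
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)
```

Real GPT tokenizers learn tens of thousands of such merges from a large corpus; this sketch only shows the merge mechanism.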
27. Which of the following architectures is GPT-3 based on?
A) LSTM
B) GRU
C) Transformer
D) CNN
28. What is the purpose of the “top-p” decoding technique used in GPT models?
A) To control the length of generated text
B) To limit the vocabulary used in generated text
C) To prioritize the most probable words for generation
D) To increase the diversity of generated text
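Top-p (nucleus) sampling keeps the smallest set of highest-probability tokens whose cumulative probability reaches p, then samples only within that set, which trades off between greedy decoding and full random sampling. A minimal sketch (the probability table is made up for illustration):

```python
import random

def top_p_sample(probs, p=0.9):
    # probs: dict mapping token -> probability (assumed to sum to 1).
    # Keep the smallest set of most-probable tokens whose cumulative
    # probability reaches p (the "nucleus"), then sample from it.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        total += prob
        if total >= p:
            break
    tokens, weights = zip(*nucleus)
    return random.choices(tokens, weights=weights)[0]

probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "xyz": 0.05}
print(top_p_sample(probs, p=0.9))  # only samples from {"the", "a", "cat"}
```

With p=0.9 the cumulative mass 0.5 + 0.3 + 0.15 = 0.95 closes the nucleus after three tokens, so the low-probability tail ("xyz") is never sampled.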
29. What does the “few-shot” capability of GPT-3 refer to?
A) The ability to generate text in a language the model has not been trained on
B) The ability to generate text for a specific task without fine-tuning the model
C) The ability to generate text with a limited vocabulary
D) The ability to perform a task accurately with very few examples
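In few-shot use, a handful of worked examples are placed directly in the prompt so the model can perform the task without any weight updates. A sketch of building such a prompt (the reviews and labels below are invented for illustration):

```python
# A few-shot prompt: in-context examples steer the model
# toward the task with no fine-tuning.
examples = [
    ("I loved this movie!", "positive"),
    ("Terrible service, never again.", "negative"),
    ("The plot was gripping.", "positive"),
]
query = "The food was cold and bland."

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"
print(prompt)
```

The model is then expected to continue the prompt with the label for the final review, mirroring the pattern of the examples.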
30. What is the purpose of the “output layer” in GPT models?
A) To generate text based on input
B) To learn features from input
C) To compute loss during training
D) To regularize the model during training
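The output layer of a GPT model projects the final hidden state to one logit per vocabulary entry, and a softmax turns those logits into a probability distribution over the next token. A minimal sketch with a hypothetical four-word vocabulary and made-up logits:

```python
import math

def softmax(logits):
    # Numerically stable softmax: shift by the max logit first
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical tiny vocabulary and output-layer logits
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 0.5, 0.1, -1.0]
probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]  # greedy pick
print(next_token)  # "the" has the highest logit
```

During training, the same probabilities feed a cross-entropy loss against the true next token; at inference, they feed a decoding strategy such as greedy, top-k, or top-p sampling.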