Introduction to Large Language Models

New to language models or large language models? Check out the resources below.

What is a language model?

A language model is a machine learning model that aims to predict and generate plausible language. Autocomplete is a language model, for example.

These models work by estimating the probability of a token or sequence of tokens occurring within a longer sequence of tokens. Consider the following sentence:

When I hear rain on my roof, I _______ in my kitchen.

If you assume that a token is a word, then a language model determines the probabilities of different words or sequences of words to replace that underscore. For example, a language model might determine the following probabilities:

cook soup: 9.4%
warm up a kettle: 5.2%
cower: 3.6%
nap: 2.5%
relax: 2.2%
...
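To make this concrete, here is a minimal sketch of next-token probability estimation by counting, using a toy corpus and a bigram assumption (one word of context). Real language models learn these probabilities with neural networks over much longer contexts; the corpus and code below are illustrative only.

from collections import Counter, defaultdict

# Toy corpus; real models train on billions of words.
corpus = [
    "when i hear rain on my roof i cook soup in my kitchen",
    "when i hear rain on my roof i nap in my kitchen",
    "when i hear rain on my roof i cook soup in my kitchen",
]

# Count how often each word follows each preceding word.
follower_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        follower_counts[prev][nxt] += 1

def next_token_probabilities(prev_token):
    """Estimate P(next token | previous token) from the counts."""
    counts = follower_counts[prev_token]
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

print(next_token_probabilities("i"))
# {'hear': 0.5, 'cook': 0.333..., 'nap': 0.166...}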

A "sequence of tokens" could be an entire sentence or a series of sentences.That is, a language model could calculate the likelihood of different entiresentences or blocks of text.

Estimating the probability of what comes next in a sequence is useful for all kinds of things: generating text, translating languages, and answering questions, to name a few.

What is a large language model?

Modeling human language at scale is a highly complex and resource-intensive endeavor. The path to reaching the current capabilities of language models and large language models has spanned several decades.

As models are built bigger and bigger, their complexity and efficacy increase. Early language models could predict the probability of a single word; modern large language models can predict the probability of sentences, paragraphs, or even entire documents.

The size and capability of language models have exploded over the last few years as computer memory, dataset size, and processing power have increased, and as more effective techniques for modeling longer text sequences have been developed.

How large is large?

The definition is fuzzy, but "large" has been used to describe BERT (110M parameters) as well as PaLM 2 (up to 340B parameters).

Parameters are the weights the model learned during training, used to predict the next token in the sequence. "Large" can refer either to the number of parameters in the model, or sometimes the number of words in the dataset.
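As a rough illustration of where those parameters live, the back-of-envelope sketch below approximates a parameter count from a hypothetical Transformer configuration. The formula ignores biases, layer norms, and other small terms, and the configuration values are assumptions chosen to land near BERT-base scale; real architectures vary.

def approx_params(vocab_size, d_model, n_layers, d_ff):
    """Very rough Transformer parameter estimate (ignores small terms)."""
    embedding = vocab_size * d_model    # token embedding table
    attention = 4 * d_model * d_model   # Q, K, V, and output projections
    feed_forward = 2 * d_model * d_ff   # two linear layers per block
    return embedding + n_layers * (attention + feed_forward)

# Hypothetical BERT-base-like configuration.
print(f"{approx_params(vocab_size=30_000, d_model=768, n_layers=12, d_ff=3_072):,}")
# 107,974,656 -- roughly the ~110M parameters cited for BERT above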

Transformers

A key development in language modeling was the introduction in 2017 of Transformers, an architecture designed around the idea of attention. This made it possible to process longer sequences by focusing on the most important part of the input, solving memory issues encountered in earlier models.

Transformers are the state-of-the-art architecture for a wide variety of language model applications, such as translators.

If the input is "I am a good dog.", a Transformer-based translator transforms that input into the output "Je suis un bon chien.", which is the same sentence translated into French.

Full Transformers consist of an encoder and a decoder. An encoder converts input text into an intermediate representation, and a decoder converts that intermediate representation into useful text.
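The sketch below wires up that encoder-decoder shape using PyTorch's built-in nn.Transformer module. The vocabulary sizes, dimensions, and dummy token IDs are placeholder assumptions; a real translator would also need tokenization, positional encodings, masking, and training.

import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, D_MODEL = 10_000, 10_000, 512

src_embed = nn.Embedding(SRC_VOCAB, D_MODEL)   # input tokens -> vectors
tgt_embed = nn.Embedding(TGT_VOCAB, D_MODEL)
transformer = nn.Transformer(d_model=D_MODEL, nhead=8,
                             num_encoder_layers=6, num_decoder_layers=6)
to_vocab = nn.Linear(D_MODEL, TGT_VOCAB)       # intermediate repr. -> token scores

# Dummy token IDs with shape (sequence_length, batch_size),
# the module's default layout.
src = torch.randint(0, SRC_VOCAB, (7, 1))      # e.g. "I am a good dog ."
tgt = torch.randint(0, TGT_VOCAB, (6, 1))      # decoder input so far

out = transformer(src_embed(src), tgt_embed(tgt))  # encoder + decoder pass
logits = to_vocab(out)                             # scores over target vocabulary
print(logits.shape)                                # torch.Size([6, 1, 10000])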

Self-attention

Transformers rely heavily on a concept called self-attention. The self part of self-attention refers to the "egocentric" focus of each token in a corpus. Effectively, on behalf of each token of input, self-attention asks, "How much does every other token of input matter to me?" To simplify matters, let's assume that each token is a word and the complete context is a single sentence. Consider the following sentence:

The animal didn't cross the street because it was too tired.

There are 11 words in the preceding sentence, so each of the 11 words is paying attention to the other ten, wondering how much each of those ten words matters to them. For example, notice that the sentence contains the pronoun it. Pronouns are often ambiguous. The pronoun it always refers to a recent noun, but in the example sentence, which recent noun does it refer to: the animal or the street?

The self-attention mechanism determines the relevance of each nearby word to the pronoun it.
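A minimal NumPy sketch of the computation behind this, scaled dot-product self-attention, is shown below. The random embeddings and projection matrices stand in for learned values; the point is the attention-weight matrix, which holds one relevance score for every pair of tokens.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token vectors; Wq/Wk/Wv: learned projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # token-to-token relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V, weights                     # mix values by relevance

rng = np.random.default_rng(0)
d = 16
X = rng.normal(size=(11, d))    # 11 tokens, as in the example sentence
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(attn.shape)               # (11, 11): every word scores every word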

What are some use cases for LLMs?

LLMs are highly effective at the task they were built for, which is generating the most plausible text in response to an input. They are even beginning to show strong performance on other tasks; for example, summarization, question answering, and text classification. These are called emergent abilities. LLMs can even solve some math problems and write code (though it's advisable to check their work).
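As a hedged example of trying such tasks in code, the sketch below uses the pipeline API from the Hugging Face transformers library (assuming it is installed and can download models); the models involved are library defaults or common examples, not anything specified by this article.

from transformers import pipeline

summarizer = pipeline("summarization")               # downloads a default model
generator = pipeline("text-generation", model="gpt2")

text = ("Large language models estimate the probability of a token or "
        "sequence of tokens occurring within a longer sequence of tokens.")
print(summarizer(text, max_length=20, min_length=5)[0]["summary_text"])
print(generator("When I hear rain on my roof, I",
                max_new_tokens=10)[0]["generated_text"])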

LLMs are excellent at mimicking human speech patterns. Among other things, they're great at combining information with different styles and tones.

However, LLMs can be components of models that do more than just generate text. Recent LLMs have been used to build sentiment detectors, toxicity classifiers, and image caption generators.

LLM Considerations

Models this large are not without their drawbacks.

The largest LLMs are expensive. They can take months to train, and as a result consume lots of resources.

They can also usually be repurposed for other tasks, a valuable silver lining.

Training models with upwards of a trillion parameters creates engineering challenges. Special infrastructure and programming techniques are required to coordinate the flow of data to the chips and back again.

There are ways to mitigate the costs of these large models. Two approaches are offline inference and distillation.
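As an illustration of the second approach, the sketch below shows one common formulation of a distillation loss: a small "student" model is trained to match the softened output distribution of a large "teacher". The temperature value and tensor shapes are illustrative assumptions.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2

# Dummy logits: scores over a 30,000-token vocabulary for 4 positions.
teacher_logits = torch.randn(4, 30_000)
student_logits = torch.randn(4, 30_000)
print(distillation_loss(student_logits, teacher_logits))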

Bias can be a problem in very large models and should be considered in trainingand deployment.

Because these models are trained on human language, numerous potential ethical issues can arise, including the misuse of language and bias involving race, gender, religion, and more.

It should be clear that as these models continue to get bigger and perform better, there is a continuing need to be diligent about understanding and mitigating their drawbacks. Learn more about Google's approach to responsible AI.
