After the Breakthroughs Come the Brakes
The Breakthrough
Research into building algorithms to “understand” language goes back many years. Thousands of researchers, 100K papers, endless experiments, and a prodigious number of failed trials led us to a deeper understanding and some foundational concepts to build on. But the gains were incremental and scattered, with no synthesis or conceptual cohesion.
Then in 2017, the BIG breakthrough arrived in the form of a paper cleverly named “Attention Is All You Need,” written by a brilliant team based largely at Google Brain and Google Research. The paper proposed the Transformer architecture, and the few who grasped its significance grabbed it with both hands and took the giant leap forward. This was a Nobel Prize-worthy moment and achievement.
The Transformer laid the foundations for language understanding and language generation.
The Nature of Language
Language would become unwieldy if we had to create one unique word to represent each object, action, subject, and their variations. The explosion in vocabulary and rules of usage would render the invention close to useless. The solution we came up with was simple and elegant: reuse the same term in different contexts and let the context suggest its meaning.
Take the word “Bank.” You can use the word in several contexts, and in each context, humans understand its use.
I withdrew money from the bank.
I swam to the other side of the river bank.
A bank of clouds built up.
I went to the blood bank.
The bank at the poker table went bust.
I banked my paycheck today.
The road banks to the right.
The plane banked to the left.
I bank on my team to win the championship.
Clues to context come from the words that immediately surround the term in question, but they can also come from words or phrases that sit 3 or 100+ words away from it.
Before 2017, algorithms in practice relied mostly on terms within 3 to 5 words of the term in question. Models that implemented the architecture articulated in the Google paper could use hundreds of words to the right or left of the term to understand the context. This was a game changer, and it triggered the renaissance in Language AI, which led to ChatGPT and its relatives.
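To make the idea of long-range context a little more concrete, here is a minimal sketch of the scaled dot-product attention weighting at the heart of the Transformer. It is an illustration only: the two-dimensional vectors, the 50 filler words, and the helper names `softmax` and `attention_weights` are all assumptions made up for this example, and a real model learns its vectors from data and uses separate query, key, and value projections.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over every key."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical hand-picked vectors: axis 0 ~ "water", axis 1 ~ "money".
# "river" sits 51 words before "bank", separated by uninformative filler.
sentence = ["river"] + ["the"] * 50 + ["bank"]
vectors = [[4.0, 0.0]] + [[0.0, 0.0]] * 50 + [[1.0, 1.0]]

# Use the vector for "bank" as the query and look at every word in the sentence.
weights = attention_weights(vectors[-1], vectors)
most_attended = sentence[weights.index(max(weights))]
print(most_attended)
```

In this toy setup the word that receives the largest weight is decided by how well its vector matches the query, not by how close it sits to “bank.” That is the property that lets distant words shape the meaning of a term.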
Now, to be clear, language models do NOT understand language the way humans do. They have no experiential knowledge of the world as discovered by our senses. What they have is a statistical understanding of the world as represented in the language that describes it and on which they have been trained.
This means that their understanding of the world is based on correlations between words and concepts rather than on a deep understanding based on sensory experience.
In summary
Statistical Inference: LLMs learn the statistical relationships between words, phrases, and concepts.
Pattern Recognition: LLMs become proficient at recognizing patterns in language usage. Through these patterns, they learn grammatical structures, syntactic rules, and common expressions.
Generation: Learning patterns enables LLMs to generate coherent and contextually relevant text based on the patterns they have ‘encountered’ through training.
Contextual Understanding: LLMs demonstrate contextual understanding even across complex multi-paragraph passages. This enables them to generate contextually appropriate responses.
Correlation-Based Knowledge: LLMs have a large repository of statistical correlations between words and concepts.
Biases: In generating text, LLMs are likely to express the same biases and prejudices present in the training data, which is text that humans originally created.
Accuracy: LLMs generate text based on the probability that the words generated have some relationship to the question posed. The probability a model assigns to its output can be read as a rough confidence score, and the lower the score, the less likely the response is to be accurate.
Humans and Accuracy
If you are writing a textbook on marine or aircraft engineering, you will try to generate text that is as close to 100% accurate as possible in the real world.
If you write a novel, you are less concerned about matching real-world accuracy, and literary license shields you from that scrutiny.
Accuracy hinges on assigning higher probabilities to words or phrases that are more likely to appear in a given context. Ask a question about the weather, and both a human and a model are likely to respond with words such as 'temperature,' 'forecast,' or 'conditions.'
For a model, accuracy is tied to these confidence scores, which reflect the degree of certainty behind the generated output.
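As a rough illustration of what a confidence score looks like, the sketch below turns raw scores for a handful of candidate words into a probability distribution and reads off the top probability. The candidate words, their scores, and the `softmax` helper are hypothetical values chosen for this example, not the output of any real model.

```python
import math

def softmax(scores):
    """Convert a dict of raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical raw scores for the next word after a question about the weather.
weather_scores = {
    "temperature": 3.0,
    "forecast": 2.5,
    "conditions": 2.0,
    "paycheck": -1.0,
}

probs = softmax(weather_scores)
best_word = max(probs, key=probs.get)
confidence = probs[best_word]  # probability assigned to the top candidate

for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word:12s} {p:.3f}")
print("top choice:", best_word)
```

Here the weather-related words share almost all of the probability and the unrelated word gets very little. Keep in mind that the score measures how certain the model is about its choice of words, which is not the same thing as the statement being true in the real world.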
We have all observed that LLMs often generate text that requires oversight and that their output calls for careful and judicious use. We are also starting to understand the inner workings of LLMs and are getting to know their strengths and obvious weaknesses.
We use tools, products, and services of all kinds in all walks of life and are equally aware of their strengths and weaknesses.
In all cases, we attempt to exercise our judgment as to the limits of the tool, product, or service and use them accordingly.
Tools don’t think for themselves, so their use and application always rest on human judgment.
The Brakes
Oversight is good when it is shaped by common sense. Historically, our success rate in implementing oversight has been about as good as our success rate in delivering projects or investing in startups.
While humans have a clear and decisive advantage over LLMs thanks to experiential learning, humans also suffer from the same problems of accuracy and bias.
If we are to impose oversight on LLMs, we should also consider how to impose oversight on humans. The past 5,000 years of human history right up to today clearly show that correctly “understanding” something and acting on it judiciously and beneficially is a skill that humans have yet to master.
That we still try holds out hope for modest progress.
There is a long history of developing oversight over industries. Oversight of LLMs could come from self-regulation by an industry group or from government. The trick is to strike a balance between protecting the innocent and not stifling innovation.