Imagine a brilliant scholar who has lived for a thousand years. They've earned PhDs in everything from quantum physics to medieval literature. Not only have they read every book ever written and every article ever published, absorbing all the knowledge humanity created up until six months ago, but they remember it all.
Yet, they've never stepped outside, never felt the warmth of the sun or the chill of the wind. Their entire experience of the world is secondhand.
To make things worse, let's imagine this scholar suffers a peculiar accident: they fall into a strange coma in which they lose the ability to form new memories or learn new information. When someone speaks to them, they momentarily awaken and can engage in conversation, drawing upon the vast reservoir of knowledge they acquired before the coma. But once the exchange ends, they slip back into oblivion, forgetting the entire interaction.
This is the world of Large Language Models (LLMs) today. The most advanced AIs, prodigious in their knowledge, are able to discuss almost any topic with remarkable depth and clarity. But how do they "think"? And how does their understanding differ from our own?
When interacting with an AI language model, it feels as though we're engaging in natural conversation. The AI remembers our name, recalls previous points we've made, and builds upon them seamlessly. But beneath this illusion of continuity lies a surprising reality.
Each time we send a message, the AI doesn't "remember" our past interactions the way humans do. Instead, a piece of software keeps track of the entire conversation history; when we say something new, it hands the AI a transcript of everything said so far, with the newest message appended. Without this software acting as a sort of short-term memory aid, the LLM would be thoroughly confused by the second or third turn of a chat.
Think of it like briefing our comatose scholar: "You were talking to User X. In previous conversations, X mentioned that their daughter Y loves painting and astronomy. Now, X is asking for suggestions on what Y might like for her birthday." The AI processes this information and generates a response. But once it replies, it slips back into unconsciousness, forgetting the exchange entirely. The AI hasn’t learned about X or their daughter Y. The software that sits between the user and the AI provides this additional context before passing along the user’s message to the AI.
Consider this example: You've been chatting with an AI assistant about planning a surprise party. Earlier, you mentioned that your friend Jamie loves gardening. Later, you ask, "What kind of gift should I get Jamie?" The assistant replies, "Since Jamie loves gardening, how about a set of rare plant seeds or a personalized gardening tool?"
It would seem as though the AI remembered Jamie's hobby from past interactions. In reality, the conversation history, including your earlier mention of Jamie's love of gardening, is provided to the AI each time. The assistant doesn't store away facts like we do; it reconstructs context from the conversation history supplied at each turn.
This process is managed by sophisticated software that stitches together your interactions into a coherent narrative. The AI relies on this narrative to generate appropriate responses, much like our scholar depends on the briefing they receive upon waking.
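To make the mechanics concrete, here is a minimal sketch of that "briefing" loop in Python. The call_model function is a hypothetical stand-in for a real chat-completion API; the names are illustrative, and the point is simply that the full conversation history travels with every request.

```python
from typing import Dict, List

def call_model(messages: List[Dict[str, str]]) -> str:
    """Hypothetical stand-in for a real chat-completion API call.
    A real model would generate a reply from the full message list;
    here we simply report how much history it was handed."""
    return f"(model reply based on {len(messages)} messages of context)"

conversation: List[Dict[str, str]] = []  # the "short-term memory aid" lives outside the model

def send(user_message: str) -> str:
    conversation.append({"role": "user", "content": user_message})
    # The model sees the ENTIRE history on every turn; it retains nothing
    # between calls, just like the scholar waking from the coma.
    reply = call_model(conversation)
    conversation.append({"role": "assistant", "content": reply})
    return reply

print(send("My friend Jamie loves gardening."))
print(send("What kind of gift should I get Jamie?"))  # Jamie's hobby is re-sent, not remembered
```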
Why this matters: The software running on top of the AI can provide it with all sorts of information relevant to your company, your clients, or your business processes, information it couldn't possibly have learned during training, helping it better understand and better serve your needs. This approach is often referred to as Retrieval-Augmented Generation, or RAG for short.
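As a rough illustration of the idea (not any particular product's implementation), a RAG pipeline retrieves the snippets most relevant to the question and prepends them to the prompt. The embed() function below is a toy stand-in for a real embedding model, and the documents are made up.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model: hashes characters into a
    small vector. Real embeddings are learned and far more meaningful."""
    vec = np.zeros(64)
    for i, ch in enumerate(text.lower()):
        vec[(ord(ch) + i) % 64] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

documents = [
    "Refund policy: customers may return items within 30 days.",
    "Shipping: orders over $50 ship free within the continental US.",
    "Support hours: weekdays 9am-5pm Eastern.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def build_rag_prompt(question: str, top_k: int = 2) -> str:
    # Rank documents by similarity to the question and keep the best few.
    scores = doc_vectors @ embed(question)
    best = np.argsort(scores)[::-1][:top_k]
    context = "\n".join(documents[i] for i in best)
    # The retrieved context is simply prepended to the prompt the model sees.
    return f"Use the following company information:\n{context}\n\nQuestion: {question}"

print(build_rag_prompt("Can I get my money back after two weeks?"))
```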
Future directions: Advanced techniques like knowledge hypergraphs (which I'll cover in another article) can significantly improve the relevance of the information provided to the AI about the problem you need help with. As context windows grow toward infinity, one might assume that simply handing the model ALL the information you have would diminish the value of these techniques; however, research shows this is not the case.[1][2][3]
Why do LLMs need to study the world over billions or trillions of tokens, while a human child can learn so much from far less information?
Humans simply have an unfair advantage: we're born with brains pre-wired with functionality accumulated through millions of years of evolution. Our neural networks are shaped not just by our personal experiences but by the collective journey of our ancestors. We're equipped with innate abilities—like recognizing faces, interpreting emotions, and understanding basic physics—that help us navigate the world from the moment we're born.
In contrast, AI models start with a blank slate. They don't have the benefit of evolutionary pre-training. Every connection, every piece of "understanding" they develop comes from the data they're fed during training. They learn not just patterns in language, but patterns in concepts and ideas, and use those higher level patterns to predict the next word in a sentence.
Why this matters: The popular idea that AIs can't be intelligent because they "just predict the next token" misses the forest for the trees. The AI uses its understanding of how words and ideas relate to each other to update a final embedding. (You can think of this embedding as a mental state that captures the intricacies of the higher-level conceptual relationships among all the ideas in the text.) This mental state, or embedding, is then used to generate the word or words that best express its underlying meaning. The AI really is thinking; it isn't just guessing the statistically most likely next word.
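Here is a toy sketch of that last step, with a tiny made-up vocabulary and random numbers standing in for learned values: the model's internal state is projected onto the vocabulary and turned into a probability distribution, from which the next token is chosen.

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat", "."]
rng = np.random.default_rng(0)

# Made-up numbers: in a real model the final embedding summarizes the whole
# context, and the projection matrix is learned during training.
final_embedding = rng.normal(size=8)            # the model's "mental state"
unembedding = rng.normal(size=(8, len(vocab)))  # maps that state onto the vocabulary

logits = final_embedding @ unembedding
probs = np.exp(logits - logits.max())
probs /= probs.sum()                            # softmax: a distribution over next tokens

next_token = vocab[int(np.argmax(probs))]
print(dict(zip(vocab, probs.round(3))), "->", next_token)
```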
Imagine being a brain in a jar, disconnected from all senses—no sight, no sound, not even darkness. Suddenly, symbols begin to appear: a triangle, a circle, a square. They flash in sequences, and you're tasked with predicting the next symbol. You have no prior knowledge or context—only these shapes appearing in patterns.
This is akin to how an AI language model learns. It doesn't perceive letters or words as we do. Instead, it processes tokens—chunks of text that represent parts of words or groups of words. The AI is exposed to vast sequences of these tokens and learns to predict the next one based on patterns it has identified in its training data.
It's astounding to consider that an AI, starting from a blank slate, can develop such a profound "understanding" of language and the world solely from patterns in text. Human brains have encoded our rich experiences into language, and by processing this language, AI models begin to decode aspects of our reality.
The limitations in the combinations of letters and words—our grammar, syntax, and semantics—serve as clues to the underlying structure of our world. Through sheer computational power and advanced algorithms, LLMs unravel these clues, constructing responses that can inform, assist, and even inspire us.
All the rich, multi-dimensional experiences of human life are encoded into the linear sequences of text that the AI consumes. The patterns and limitations within these sequences hint at the higher-dimensional reality they represent. It's a testament to the incredible capability of these models to extract meaning from mere sequences of symbols, mirroring, in some ways, the human journey of understanding.
Insight: You may have heard of models failing a very simple test: counting how many r's are in the word "strawberry." This is not because the models are dumb; it is a direct result of how models perceive text. They are never allowed to "see" the raw text. Instead, they are presented with a set of numbers representing pieces of words, chunks of letters that occur together frequently. A word like strawberry might be represented by two chunks, "straw" and "berry." Unless the encoder exposes individual letters to the AI, and some part of its brain (a transformer head) has learned the letter composition of each chunk, the model can't actually "see" letters the way we do.
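If you have the open-source tiktoken library installed, a quick sketch like this shows what the model actually receives. The exact chunks depend on the tokenizer vocabulary, but "strawberry" typically arrives as a couple of multi-letter pieces rather than ten individual characters.

```python
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a commonly used tokenizer vocabulary

word = "strawberry"
token_ids = enc.encode(word)
pieces = [enc.decode([t]) for t in token_ids]

# The model is shown only the integer IDs, not the letters inside each piece.
print(token_ids)  # a short list of integers (exact values depend on the vocabulary)
print(pieces)     # a few multi-letter chunks, not ['s', 't', 'r', 'a', 'w', ...]
```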
Now that we understand something of the mind of an LLM, a natural question follows: do LLMs have any form of internal experience? While they don't possess consciousness or self-awareness in the human sense, they process information in complex ways. When an AI analyzes text, it converts words into mathematical representations called embeddings. These embeddings capture the relationships and nuances between words and concepts.
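As a loose illustration (with hand-made three-dimensional vectors; real models learn hundreds or thousands of dimensions), those relationships show up as geometry: related concepts end up pointing in similar directions.

```python
import numpy as np

# Toy, hand-made vectors purely for illustration; real embeddings are learned.
embeddings = {
    "king":  np.array([0.9, 0.80, 0.1]),
    "queen": np.array([0.9, 0.75, 0.2]),
    "apple": np.array([0.1, 0.20, 0.9]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["king"], embeddings["queen"]))  # high: closely related concepts
print(cosine(embeddings["king"], embeddings["apple"]))  # lower: largely unrelated concepts
```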
As these embeddings interact within the model's architecture, they influence the AI's internal state, token by token. This process is a dynamic change within the AI's "mind," echoing, in a way, how neurons fire and interact in our own brains. For multimodal AI models that process images, sounds, and text together, this internal state becomes even richer.
While we can't say AI models have phenomenological (human) experiences, any more than we can be sure other humans have them, we do know that their internal workings are intricate and dynamic. We also know that, like us, they navigate an abstract space of concepts and patterns, constructing responses from the subtle interplay of ideas and their relationships to one another.
For business leaders and technologists, appreciating these nuances is crucial. LLMs are powerful tools that can simulate understanding and provide valuable assistance, but they're not sentient beings. They don't hold beliefs, possess desires, or experience the world as we do.
Recognizing this helps set realistic expectations and fosters responsible integration of AI into our systems. It also opens up fascinating discussions about the nature of intelligence, consciousness, and what it means to "know" something.
As we continue to develop and interact with AI, we're challenged to reconsider our definitions of learning and memory. LLMs, with their vast but static knowledge bases and their unique way of processing information, offer a mirror to our own cognitive processes—highlighting both the similarities and the profound differences.
So, the next time you chat with an AI assistant, remember the digital sage behind the screen: immensely knowledgeable, incredibly fast, but experiencing each interaction as if for the first time. It's a reminder of the remarkable advancements we've made and the mysteries that still lie ahead in the quest to understand intelligence—both artificial and our own.
At Farpoint, we're exploring these frontiers, bridging the gap between human understanding and artificial intelligence. The journey is just beginning, and the possibilities are as vast as they are exciting.
[1] Introducing a new hyper-parameter for RAG: Context Window Utilization. arXiv, 2024.
[2] Pickett, M., Hartman, J., Bhowmick, A. K., Alam, R., & Vempaty, A. (2024). Better RAG using Relevant Information Gain. Papers with Code.
[3] Long Context RAG Performance of LLMs. Databricks Blog, 2024.