Auto-Regressive Large Language Models (LLMs) like ChatGPT generate text by predicting one token at a time based on sequential data, aiding tasks such as coding assistance, though they struggle with consistent factual accuracy and generalisation.
What are LLMs?
In slightly more technical lingo, they’re sometimes referred to as AR-LLMs (Auto-Regressive Large Language Models).
The Auto-Regressive component is very important when we think of applications like ChatGPT. The term harks back to old-school statistical forecasting, where the AR part of a time series (sequential) model captures the longer-window trends by regressing the next value on its own past values. In classical statistical forecasting (though not so much in some of the current Machine Learning forecasting algos), that AR component is combined with a short-window Moving Average component, and the series is ‘integrated’ (differenced), to generate forecasts into the unseen future. The current Machine Learning/AI forecasting algos achieve accurate forecasts in different ways, though. But that’s a different article!
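For readers who like to see this in code, below is a minimal sketch of that kind of AR-based forecast, assuming Python’s statsmodels library and a synthetic series made up purely for illustration (it has nothing to do with any LLM internals):

```python
# A minimal sketch of classical AR-based forecasting with an ARIMA model.
# Assumes statsmodels is installed; the series here is synthetic, for illustration only.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic series: a gentle trend plus noise, standing in for real data
rng = np.random.default_rng(42)
t = np.arange(120)
series = pd.Series(10 + 0.5 * t + rng.normal(scale=2.0, size=t.size))

# order=(p, d, q): p = auto-regressive lags (the long-window AR component),
# d = differencing (the 'integrated' part), q = moving-average lags (the short window)
model = ARIMA(series, order=(2, 1, 1)).fit()

# Forecast 12 steps into the unseen future
print(model.forecast(steps=12))
```

The order=(p, d, q) tuple is where the long-window (AR), integration (differencing) and short-window (MA) pieces of the model meet.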
In a somewhat similar fashion, LLMs treat text (broken into units referred to as Tokens) as a sequential string. That sequence is used to generate output one token at a time, with each new token conditioned on everything that came before it. At a super high level, this is essentially how ChatGPT appears to ‘speak and converse’ with us when we prompt it with questions.
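To make the ‘one token at a time’ idea concrete, here is a minimal sketch of greedy auto-regressive generation, assuming the Hugging Face transformers library and the small public GPT-2 model as a stand-in for ChatGPT’s far larger model:

```python
# A minimal sketch of auto-regressive (one-token-at-a-time) text generation.
# Assumes the Hugging Face transformers library and the small GPT-2 model.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Encode the prompt into a sequence of token ids
input_ids = tokenizer.encode("The auto-regressive model predicts", return_tensors="pt")

# Greedily append one predicted token at a time, feeding the growing sequence back in
with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits           # scores for every token in the vocabulary
        next_id = logits[:, -1, :].argmax(dim=-1)  # most likely next token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Each pass through the loop feeds the entire sequence generated so far back into the model, which is exactly the ‘auto-regressive’ behaviour described above.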
Possible Implications for Technical Job Roles
Code writing assistance: In the blunt opinion of this Data Scientist, if you think that ChatGPT will take your role as an Engineer, Data Scientist, or whichever technical space you occupy, you’re probably not that great at it anyway.
Screenshot from https://www.youtube.com/@lexfridman/videos
ChatGPT is quite adept as a coding assistant, helping programmers find solutions to technical problems that might otherwise take a very long time for a human to solve. It can help debug error stacks and assist with getting through tedious boilerplate tasks.
This is where human programmers with good technical skills will really shine. With ChatGPT acting as a coding assistant, it helps us get through some of the more tedious, time-consuming and often frustrating tasks more efficiently and with little drama. From this Data Scientist’s viewpoint, all hail ChatGPT for helping with this aspect of the workflow! Anything that helps me get through tedious tasks and time-consuming error stacks, so that I have the time to work on the more advanced technical issues and build better predictive models, is aces in my book.
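As a deliberately hedged illustration, the sketch below shows one way of handing an error stack to a ChatGPT-style model programmatically via the OpenAI Python client; the model name and the example traceback are assumptions for illustration only, not a recommendation of any particular setup:

```python
# A minimal sketch of using a ChatGPT-style model as a debugging assistant via the
# OpenAI Python client (openai>=1.0). The model name and traceback are illustrative
# assumptions; an API key must be available in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

traceback_text = """
Traceback (most recent call last):
  File "train.py", line 42, in <module>
    model.fit(X_train, y_train)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; swap for whichever model you have access to
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": f"Explain this error and suggest a fix:\n{traceback_text}"},
    ],
)

print(response.choices[0].message.content)
```

The same pattern works for boilerplate: swap the traceback for a short description of the code you want scaffolded.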
Where ChatGPT won’t be able to help:
- Those professionals who mindlessly copy and paste code from blogs, Stack Overflow and public GitHub repos, and who are just looking to get past an error stack without really caring about or understanding what the code is doing, will become more exposed as frauds. In this respect, I would hope that the ChatGPT model exposes more bad actors in the field and helps the good folks get the respect they deserve.
- In a similar vein to the above dot point, ChatGPT can have really good domain knowledge, but it doesn’t have contextual knowledge. This is where the human programmer’s experience and ability to solve business problems is needed to arrive at a solution. Whilst it can generate code, if the human programmer lacks the knowledge to properly understand what that code means, ChatGPT will go tumbling down rabbit holes like Alice and generate unusable garbage.
Limitations
Some of the current limitations of LLMs are:
- Producing factually correct answers consistently. In other words, these things can hallucinate and produce incorrect facts, but write those incorrect facts with beautiful exposition. So, beware the charming but poorly informed AI assistant!
- Since these models are trained with billions of parameters, retraining them to account for new information is a costly and energy-intensive task. The CO2 emitted in training a large model is estimated to exceed the emissions from owning and running an average car in the US for a year! As a result, the models don’t get refreshed with fresh training data very often.
- The models don’t tend to generalise well in the real world. They essentially mimic what they’ve learned from their training sets but are not great at applying that knowledge to unseen situations. In the Data Science world, we tend to call this ‘overfitting’ (see the short sketch after this list).
- Unless specifically taught, they’re actually not great at solving mathematical equations either. This is a lesser problem, since mathematical axioms tend to be extremely well defined and can be demonstrated to the machine by way of proofs.
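To ground the overfitting point above, here is a small, self-contained sketch, assuming scikit-learn and a toy synthetic dataset, in which a model with too much capacity nails its training data but falls apart on unseen points:

```python
# A minimal sketch of overfitting: a very flexible model memorises noisy training
# points but generalises poorly to unseen data. Assumes scikit-learn; the data is synthetic.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X_train = np.sort(rng.uniform(0, 1, 15)).reshape(-1, 1)
y_train = np.sin(2 * np.pi * X_train).ravel() + rng.normal(scale=0.2, size=15)
X_test = np.linspace(0, 1, 200).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test).ravel()

for degree in (3, 14):  # a modest model vs. one with far too many degrees of freedom
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

The high-degree model reports near-zero training error and a much larger test error, which is the same failure to generalise, writ small, that the bullet points above describe.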
In more general terms, Auto-Regressive LLMs (at best) approximate the functions of Wernicke’s and Broca’s areas in the brain, that is, the auditory and language areas. However, as far as I’m aware, neither of these areas has any direct connections to the prefrontal cortex (where most cognitive functions are executed). In very general terms, it can be said that the language and auditory areas of the brain have little to do with human intelligence.
So, if that’s the case, we could also say that only a small fraction of our thinking is expressed in linguistic form (be it written or verbal). And even when we do express ourselves in language, it’s often in a ‘reduced form’. Ever think to yourself, ‘if only I could explain myself so that everyone could understand!’? Even for the most eloquent among us, words often fail to fully express the thoughts behind them. As such, language is an almost reductive way of expressing ourselves.
Given that language models are trained solely on, well, language, how can we deem them intelligent? In even broader terms, all neural networks have similar limitations within their respective domains. A separate example is the set of challenges that have plagued self-driving cars for several years: they’re nowhere near able to drive themselves without a ‘human in the loop’, even after Tesla’s years-long effort to make their cars fully self-driving.
By contrast, an average adolescent human can learn the basics of how to drive a car with roughly 20 hours of supervised practice (though I’m sure a few parents might find that number debatable!). Neural networks (also referred to as Deep Learning) require tremendous amounts of carefully curated training data to learn what is often a repeatable task. And when a system has too many degrees of freedom (too many free parameters relative to the information available to estimate them), it will often struggle to generalise its learnings, i.e. apply what it learned during training to the real world (sometimes described as making predictions on unseen data).
Vikram Pande is a passionate and driven Data Scientist with a breadth of experience that includes Telcos, Banking, and private Healthcare (https://www.linkedin.com/in/vikrampande).