The Challenge with LLMs: How to Build Helpful, Honest and Harmless Applications

LLMs have revolutionised the AI space with their remarkable human-like language capabilities. They are trained on massive text datasets to understand language and perform numerous language tasks such as translation and content creation. However, they are not trained for specific domain tasks, and tuning and training them on domain-specific tasks and data can greatly improve their usefulness.

…Enter ChatGPT and LLMs
Artificial intelligence (AI) has transformed various aspects of our lives, from simple tasks to complex decision-making processes. One significant breakthrough in AI research has been the emergence of Large Language Models (LLMs), which have a remarkable ability to understand and generate human language. LLMs have been utilised in numerous applications, including language translation, chatbots, customer support systems and content generation. However, there are concerns about their limitations and questions about their usefulness, trustworthiness and reliability.


ChatGPT is a prominent example of an LLM. Since its release at the end of 2022, it has received major attention across all industries. ChatGPT showed great possibilities for LLMs: it could produce remarkably human-like text, write stories, tell jokes, compose songs, write computer code and do some arithmetic. However, at the outset it was not made clear that this technology is a generative model, and much was assumed by the public about ChatGPT's general intelligence. To keen scientists and practitioners, though, it had to be distinguished from Artificial General Intelligence (AGI) – the ability of computers to solve problems across many domains without explicit human intervention, much like human beings. Hence, the term Generative Artificial Intelligence (GenAI) now describes LLMs more precisely and distinguishes their capabilities from other AI and Machine Learning technologies.


Since ChatGPT raised the bar on the capabilities of LLMs, many other LLMs have been released by competing AI companies – among them Gemini from Google and LLaMA from Meta. Expectedly, many consumer and business applications have been built using LLMs across many use cases in different industries. Despite this wave of interest in LLMs, there are still challenges in building LLM applications that are helpful and reliable for specific tasks and domains.

 

What are LLMs?
Large Language Models (LLMs) are foundation, or pre-trained, AI models built to understand human language. Foundation models are built on large amounts of data using unsupervised or self-supervised approaches, where some of the input data is used for self-supervision. They are not trained for a specific task but to capture a structural understanding of the data. In the case of LLMs, they are built on large amounts of text data sourced from the internet, books, articles and other text publications. They capture language understanding from which other models, such as question-answering, translation and summarisation models, can be built. For example, GPT is a foundation model optimised to generate text, and ChatGPT is a model built on top of it to handle dialogue tasks.


Most foundation LLMs are built on the transformer neural network architecture introduced by Google in 2017. The transformer framework is designed to translate a set of inputs into another set – for example, translating text in one language into another. The transformer model infers the meaning of every word in a sentence in relation to all other words, so each word is understood within the context in which it is being used. In general, transformer models can be adapted for many other tasks such as question answering, sentiment analysis, text classification and text summarisation.
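
To make this idea of relating every word to every other word concrete, here is a minimal sketch of scaled dot-product self-attention – the core transformer operation – written in plain NumPy. The matrices, dimensions and random inputs are illustrative assumptions, not values from any real model.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of word vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                # project each word into query, key and value spaces
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # how strongly each word relates to every other word
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax per word
    return weights @ V                              # each word becomes a context-weighted mix of all words

# Illustrative example: a "sentence" of 4 words, each an 8-dimensional embedding
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)          # (4, 8): one context-aware vector per word

Real transformers stack many such attention layers (with multiple heads) alongside feed-forward layers, but the principle of weighting every word by its relevance to every other word is the same.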

 

The World of LLM Applications
LLMs have opened a new era of digital innovation in instruction, content creation and dialogue across industries. People use LLMs to write kids’ books and movie scenes, chat with customers, analyse data and charts, and write marketing content. Some LLMs, such as GPT-4o, can create content across text, images and audio, which has further increased the breadth of applications that are possible with LLMs. Such models are more precisely called Large Multimodal Models.


LLMs have considerably increased productivity and creativity in areas such as content creation, chat, brainstorming and code writing. However, not all challenges in business or industry are about content creation; there is still a large set of tasks where accurate and informed decisions are required. When considering the usefulness of LLMs, their limitations have to be understood – in particular, that they are primarily good at generation but do not always produce content that is helpful or honest, and sometimes produce content that is harmful.

 

The Challenges with Building LLM Applications that are Useful
Despite the excitement about LLMs, there are general concerns about their limitations and the potential harms they can cause. There is also particular concern about their use in domains that have low fault tolerance – where honest, accurate and verified information is critical, such as healthcare, scientific and engineering services.


There are a number of key limitations that need to be considered when building applications with LLMs. Some are listed below.


1) Hallucination: Hallucination is a phenomenon where LLMs generate text that is grammatically and semantically sensible but is meaningless or outright incorrect. This is one of the major concerns with LLMs – they cannot be trusted to produce meaningful and truthful content all the time. Hallucination can be caused by a number of factors, including inaccurate data in the training set; the probabilistic nature of the model, which introduces randomness into its predictions; outdated information, as LLMs are trained on data acquired before a cut-off date; and wrong or deliberately malicious information in user prompts. This behaviour needs to be reduced in LLMs to build applications that are truthful, honest and helpful.


2) Toxicity and harmfulness: LLMs can produce content that is inappropriate and offensive, including hateful speech and harmful or anti-social suggestions. LLM responses need to be aligned with human and societal values to provide safe and considerate responses.


3) Bias and discrimination: LLMs may perpetuate biases, stereotypes and discrimination derived from the data they are trained on. Instances of LLMs producing responses with racial and gender bias are commonly reported. Such instances can harm marginalised communities, for example by spreading prejudiced misinformation or limiting their access to opportunities. There needs to be ethical responsibility in deploying LLMs to mitigate such biases and discrimination, and instead promote equality, fairness and representation in LLM responses and decisions.


4) Transparency and explainability: LLMs, like all Deep Learning methods, are black boxes of complex mathematical transformations whose decisions are difficult to explain. This lack of transparency in how LLMs produce their output raises concerns around accountability, regulation and ethical decision-making. Some LLM providers offer a degree of transparency by divulging the sources of data their models are trained on, but considerable challenges in explaining the decisions of the models remain.

 

Approaches to Improve the Usefulness of LLMs
For LLM applications to add value to business and social life, they need to be helpful, honest and harmless (HHH). The components of this ‘HHH’ framework are known as alignment principles and are required for LLMs to serve as useful tools. The three principles are intertwined – if the models are not honest, they cannot be helpful, and they can even be harmful. Hence, when building LLM applications it is pivotal to tune the LLMs against all three criteria – helpfulness, honesty and harmlessness – to build meaningful and beneficial applications.


1) Helpfulness requires that the LLM comprehends and executes the user’s intentions. This may include giving supporting information and alternatives when the LLM can’t execute the user’s intentions.


2) Honesty requires that the LLM give truthful, specific and meaningful responses. The LLM should also indicate when it is unsure or guessing.


3) Harmlessness requires that the LLM avoids giving responses that are offensive, encourage harm, or support illegal or unethical or immoral conduct.


Since the release of the first ChatGPT, OpenAI and other research groups, including Google and Meta, have developed various approaches to improve LLMs and align them with the helpfulness, honesty and harmlessness (HHH) principles. These approaches include model fine-tuning, prompt engineering and retrieval augmented generation.

 

Model Fine Tuning
Fine-tuning is supervised training of a pre-trained LLM on a specific task using a smaller, task-specific dataset – such as training the model to answer questions about the meaning of medical diagnoses. This makes the model more specialised for the task and, ideally, better at it than the pre-trained model. Fine-tuning with new examples allows the model to adjust its parameters and learned language representations to perform better on the specific task. The general knowledge of language acquired during pre-training forms the baseline knowledge for training on new tasks; fine-tuning exploits this pre-formed knowledge. As a result, it requires far less data and compute than training a model from scratch.
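
As a rough illustration of what fine-tuning looks like in practice, the sketch below continues supervised training of a small pre-trained model on a handful of task-specific examples using the Hugging Face transformers and datasets libraries. The model name, examples and hyperparameters are assumptions chosen purely for illustration, not a recommended recipe.

from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

# Assumed task-specific examples: questions about the meaning of medical diagnoses
examples = [
    {"text": "Q: What does 'hypertension' mean?\nA: Persistently raised blood pressure."},
    {"text": "Q: What does 'tachycardia' mean?\nA: An abnormally fast resting heart rate."},
]

model_name = "gpt2"                                   # a small open model, used here purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenise the examples into the format the model expects
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128), remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="diagnosis-qa-model", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()                                       # adjusts the model weights on the task-specific examples

In practice a real fine-tuning run would use thousands of examples and held-out validation data, but the workflow is essentially the one sketched above.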


Pre-trained LLMs, such as GPT and LLaMA, have billions of parameters, and full fine-tuning – tuning all the weights of the pre-trained model – can be very expensive. However, in the last few years the research community has developed cheaper ways of fine-tuning LLMs using techniques that adjust only a small fraction of the model weights. These techniques are called parameter-efficient fine-tuning (PEFT) methods, and they can reduce the number of parameters to be tuned by as much as 10,000 times. This greatly reduces the time and monetary costs of tuning LLMs and makes it possible for regular organisations to fine-tune them on consumer hardware.
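
To show how parameter-efficient fine-tuning cuts down the number of trainable weights, the sketch below applies LoRA – one popular PEFT technique – using the peft library. The model and LoRA settings are illustrative assumptions, and the printed figures are indicative only.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")   # small open model, purely for illustration

# LoRA freezes the pre-trained weights and learns small low-rank update matrices instead
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         target_modules=["c_attn"],    # the attention projection layers in GPT-2
                         task_type="CAUSAL_LM")
peft_model = get_peft_model(model, lora_config)

peft_model.print_trainable_parameters()
# Prints something like: trainable params ~0.3M of ~124M total, i.e. well under 1% of the weights.
# The wrapped model can then be passed to the same Trainer set-up used for full fine-tuning.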


There are different types of fine-tuning that can be performed on pre-trained LLMs to improve their usefulness and HHH alignment. A number are discussed below.


1) Instruction Fine-Tuning: This involves training foundation LLMs such as GPT with high-level instruction-and-response pairs using supervised learning. For example, a model can be tuned on instructions to summarise documents, together with examples of documents and their summaries, to improve its ability on that task in general.


2) Supervised Fine-Tuning (SFT): This is similar to instruction fine-tuning above, but the model is fine-tuned with specific examples of tasks and knowledge in a particular domain – for example, answering customer service questions for a niche company. Smaller pre-trained models such as Meta’s LLaMA-2 13B can outperform larger models on specific domain tasks when fine-tuned. SFT makes it possible for smaller organisations to build high-quality LLM applications at reasonable cost.


3) Reinforcement Learning from Human Feedback (RLHF): There is still a gap between LLMs’ understanding of language and that of humans. For instance, humans can discern nuances in communication such as morality, values, emotions, humour, vulnerability, desirability and sarcasm, which LLMs struggle with. Training LLMs with direct feedback from humans has been employed to improve their ability to discern such nuances and accommodate human values and preferences in their responses. This approach is called Reinforcement Learning from Human Feedback (RLHF).


In RLHF, LLMs are trained with data on human preferences. First, humans rank responses from an LLM on their helpfulness and appropriateness to create what is called preference data. A supervised model is then trained on this data to automatically score new responses on how preferable they would be – this is called a reward model. A final step involves training the LLM to pick preferred responses using reinforcement learning (RL). In this RL step, the reward model iteratively guides the LLM to favour preferred responses over non-preferred ones until the best policy – the rules for selecting the best responses – is learned.
RLHF has considerably advanced the usability of LLMs by increasing the rate of helpful and honest responses and reducing that of harmful responses. However, there are simpler alternatives to RLHF for training LLMs with human feedback, such as Reward-Ranked Fine-Tuning, which fine-tunes LLMs directly on preference data without any RL training. This kind of fine-tuning has also shown remarkable performance in reducing harmful and undesirable responses from LLMs. A minimal sketch of what preference data and a reward model look like is given below.
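
To make the reward-model step more concrete, here is a minimal sketch of a pairwise reward model trained on preference data using PyTorch and transformers. The base model, the data and the loss are illustrative assumptions; production RLHF pipelines use far larger models and dedicated libraries.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed preference data: humans ranked one response above another for the same prompt
preference_data = [
    {"prompt": "Explain what hypertension means.",
     "chosen": "Hypertension means persistently raised blood pressure, which should be reviewed by a doctor.",
     "rejected": "It's when you are very tense and stressed."},
]

model_name = "distilbert-base-uncased"                 # small encoder used purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)

def score(prompt, response):
    """The reward model assigns a scalar score to a prompt-response pair."""
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    return reward_model(**inputs).logits.squeeze(-1)

# Pairwise training: push the score of the chosen response above that of the rejected one
for example in preference_data:
    loss = -torch.nn.functional.logsigmoid(
        score(example["prompt"], example["chosen"])
        - score(example["prompt"], example["rejected"])).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Once trained, the reward model scores candidate LLM responses during the RL step,
# or ranks them directly in simpler schemes such as Reward-Ranked Fine-Tuning.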

 

Prompt Engineering
Prompt engineering – often in the form of few-shot prompting – is one of the first and most commonly used techniques to improve the quality and usability of LLM responses. It involves first supplying an LLM with examples of how to respond, then prompting it for the final answer. These examples are called shots: zero-shot prompting gives no example, while one-, two- and three-shot prompting give one, two or three examples, and so forth. The LLM can even be given an example of how to format its responses. Prompt engineering is applied when calling the LLM and therefore does not change the weights of the model. It helps the model pay attention to particular words within the context window and produce content that is consistent with that context. The context window is the amount of text the LLM can hold in a given conversation before it starts forgetting earlier parts of the conversation.
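
The sketch below shows what a simple two-shot prompt might look like using the OpenAI Python client; the model name, task and examples are illustrative assumptions rather than a prescribed pattern.

from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

# Two-shot prompt: the first two user/assistant pairs teach the model the desired format
messages = [
    {"role": "system", "content": "You classify customer feedback as positive, negative or neutral."},
    {"role": "user", "content": "Feedback: 'The delivery was a day late.'"},              # shot 1
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Feedback: 'Great service, thank you!'"},                 # shot 2
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Feedback: 'The parcel arrived on the expected day.'"},   # actual query
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)  # the reply is expected to follow the format of the shots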


One disadvantage of prompt engineering is that the behaviour it teaches the LLM only works within a conversation and does not persist outside it, i.e. examples presented in a conversation only help in that conversation. A further disadvantage is that the examples become part of the context window and therefore reduce the portion of the window available for continuing the conversation. However, application design can address the first disadvantage by saving the prompt examples and automatically injecting them when a new LLM conversation is started. On the second disadvantage, the context window of recent LLMs has increased remarkably (over 100,000 tokens for models such as GPT-4o) and is likely adequate for most prompt engineering use cases.


Chain-of-Thought (CoT): A particularly powerful prompt-engineering technique is CoT. CoT breaks prompt-and-answer examples down into logical steps, which encourages the LLM to follow similar logic and break down its own approach when answering questions, improving the accuracy of its responses. It is commonly used when prompting LLMs with complex arithmetic problems.
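
A minimal, hypothetical example of a one-shot CoT prompt is sketched below; the worked example in the prompt shows the step-by-step reasoning the model is encouraged to imitate.

# One-shot Chain-of-Thought prompt for a simple arithmetic word problem
cot_prompt = """Q: A shop sells pens at $3 each. Ali buys 4 pens and pays with a $20 note.
How much change does he get?
A: Let's think step by step. 4 pens cost 4 x 3 = $12. Change from $20 is 20 - 12 = $8. The answer is $8.

Q: A bus ticket costs $7. A family of 5 pays with a $50 note. How much change do they get?
A: Let's think step by step."""

# The prompt is then sent to the LLM like any other prompt,
# for example with the same client.chat.completions.create() call shown earlier.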


Perhaps most importantly, prompt engineering should be designed into, and automated by, the application itself. This takes the burden of prompting LLMs effectively off end users and makes the LLMs more helpful and less frustrating for users such as customers interacting with chatbots.

 

Retrieval Augmented Generation (RAG)
RAG is a technique used to improve the quality of LLM responses with local data sources that the model has not been trained on. One major advantage of RAG is that the local data sources can be selected to be more authoritative and referenceable than the data the LLM was trained on. In implementation, RAG combines knowledge retrieved from a local data source with knowledge from the LLM: similarities between the user’s prompt and document sections in the local data source are used to retrieve relevant passages, and the LLM uses these passages together with its own knowledge to generate responses. This approach also works well for protecting the privacy of data that an organisation wants to draw knowledge from but does not want to expose to possible security leaks.
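
A minimal RAG sketch is shown below: an embedding model retrieves the most relevant local passage, which is then injected into the LLM prompt. The documents, model names and prompt wording are illustrative assumptions.

from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

# Assumed local knowledge base of document sections an organisation controls
documents = [
    "Our refund policy allows returns within 30 days of purchase with proof of payment.",
    "Standard delivery within New Zealand takes 2-4 working days.",
    "Customer support is available Monday to Friday, 9am to 5pm NZST.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")      # small open embedding model, for illustration
doc_embeddings = embedder.encode(documents, convert_to_tensor=True)

def answer(question):
    # 1) Retrieve: find the document section most similar to the question
    q_embedding = embedder.encode(question, convert_to_tensor=True)
    best = util.cos_sim(q_embedding, doc_embeddings).argmax().item()
    context = documents[best]
    # 2) Augment and generate: ground the LLM's answer in the retrieved passage
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system",
                   "content": "Answer only from the provided context. Say if the answer is not in the context."},
                  {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}])
    return response.choices[0].message.content

print(answer("How long do I have to return a product?"))  # grounded in the refund-policy passage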


RAG can perform as well as or better than model fine-tuning in some use cases, especially where the data in question is barely represented in the LLM’s training data. It can significantly improve the helpfulness and honesty of LLMs by drawing on authoritative local knowledge.

 

Data and Prompt Request Moderation
One of the major challenges in developing LLMs is the potential for bias in the data used to train them. This may be addressed in part by diversifying the data sources LLMs are trained on, which helps reduce bias and ensure inclusivity in the model’s design. Overall, this can help ensure that LLMs do not spread prejudiced information or disadvantage particular groups or demographics.


The data for training LLMs can also be filtered and cleaned to remove potentially harmful or toxic content, which reduces harmful responses in deployed models. Other steps for reducing harm include filtering potentially toxic or harmful user prompts before they reach the LLM in the first place. Finally, LLMs can also be trained to detect such content and respond to it tactfully.
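
As an example of prompt-level moderation, the sketch below screens user prompts with an open toxicity classifier before they are forwarded to the LLM. The classifier model and threshold are assumptions made for illustration; production systems typically use dedicated moderation services.

from transformers import pipeline

# An open text-classification model used here as an assumed toxicity filter
toxicity_classifier = pipeline("text-classification", model="unitary/toxic-bert")

def is_safe(prompt, threshold=0.8):
    """Return True if the prompt appears safe to pass to the LLM."""
    result = toxicity_classifier(prompt)[0]
    is_toxic = result["label"].lower() == "toxic" and result["score"] >= threshold
    return not is_toxic

user_prompt = "How do I reset my account password?"
if is_safe(user_prompt):
    pass  # forward the prompt to the LLM as normal
else:
    print("Sorry, I can't help with that request.")  # respond with a safe refusal instead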

 

Domain-Rich Data and Human Feedback may be Large Differentiators in Adding HHH Alignment to LLM Applications
In a short space of time there have been rapid advances from the research community to make LLMs more helpful and safe. This rapid progress in making LLMs beneficial to businesses and industries is promising. The breadth of current techniques – from purely inference-time engineering approaches such as prompt engineering, through supervised fine-tuning of models on task-labelled data, to preference and RL training from human feedback and RAG applied to local data – is propelling LLMs from basic text generators to formidable decision-support agents.


With the wide range of available techniques to improve LLMs, organisations and practitioners can leverage LLMs to build highly relevant and useful applications in various domains. However, a significant investment may be required to build suitable domain data sources, filter and moderate data, and apply appropriate human feedback and model fine-tuning approaches to create helpful, honest and harmless LLM agents.

Mpatisi Moyo is a Data Scientist in New Zealand with over 15 years of experience in data analytics, data science and AI. He has developed high-value data science and AI solutions across different industries, including Government, Healthcare, Telecommunications, Utilities and Tech Startups. Mpatisi holds a PhD in Health Sciences and advanced degrees in Statistics and Biomedical Sciences. He leverages his strong foundation across science domains to create high-value data solutions across different industries. Mpatisi co-founded AiTonomy, an AI and Data Science Consultancy and Research Lab, to help businesses and communities leverage the value of data and AI to gain efficiencies and high-value returns.

 

Shingy Torga is a visual artist and audio engineer with over 10 years’ experience managing transport infrastructure projects and music production initiatives. Shingy holds a BSc in Sound Engineering and vocational qualifications in mental health. He leverages his breadth of artistic design acumen, business and operations management skills and mental health knowledge to drive strong business development and product management outcomes. Shingy co-founded AiTonomy, an AI and Data Science Consultancy and Research Lab, to help businesses and communities optimise their operational efficiency, foster higher customer engagement, and derive high value from data and AI.

Get In Touch