Jun 16th 2024

The Foundation Understanding LLMs and Prompt Engineering, and Why It All Matters

Let’s get down to basics and talk about how Large Language Models (LLMs) actually work. Think of them like prediction machines. There’s nothing factual; everything is statistical. It generates text, one word after another (well, technically it’s not a word, but it’s a token; multiple tokens may form a word), and then tries to guess what the next word should be. They’re trained on massive amounts of data, so they get pretty good at figuring out how words relate to each other.

This is the 2nd blog of the series: Prompt engineering for business applications. Prompt Engineering is complex and requires careful planning and refinement to achieve desired results from AI models. As a software engineer @Google with experience in prompt engineering for major businesses, I will share practical learnings in a blog series to help others unlock the power of AI beyond simple tasks.

Blog 1: Demystifying Prompt Engineering for the Enterprise

The Foundation: Understanding LLMs and Prompt Engineering, And Why It All Matters

When you give an LLM or AI Agent a prompt, you’re basically giving it instructions on how to do this word prediction thing. Good prompt engineering is all about crafting those instructions really well. It’s like giving someone directions - the more precise and more specific you are, the better the chances they’ll end up where you want them to.

In the world of GenAI and natural language processing, a prompt is the input you give to the model to get a response. You can use these prompts to make the LLM do all sorts of actions:

Summarizing large documents
Extracting key information from a speech
Answering and reasoning your questions from a contract
Classifying stuff (like, is this email spam or not?)
Translating languages of a document
Generating code or explaining code

The Challenges: It’s Not Always Simple

Anyone can write a prompt, but writing a good prompt? That’s where things get tricky. You’ve gotta consider a bunch of factors to get a top-notch response from your model:

Teamwork: Get a subject matter expert on board. They know the ins and outs of your topic and can help you evaluate and rate generated answers or provide examples of what “perfect” looks like.
Make configurations: You need to pick the right AI model and tweak its settings. Things like how creative (Temperature) it should be, how safe its answers need to be (Safety an filtering Settings), and how it samples which words to use (Top-K / Top-P), it all plays a role.
Prompt Perfection: The way you write your prompt matters - the words you choose, the order you put them in, how you phrase things, how you provide instructions, the role and style that you take, the context you pass in, the examples you give and the constraints and output expectations that you set. It influences the LLM response.

Plus, even with the same prompt, the response can sometimes be different. It’s different from a calculator, where you always get the same answer. So, you can’t just string compare the text of two responses to see if they’re the same.

And let’s not forget the technical challenges.

Sometimes, the LLM’s answer is too long and, therefore, breaks up the formatting. (This happens a lot when you work with an output format like JSON)
Responses might get blocked if they try to say something harmful, copyrighted, or inappropriate.
You can run into quota issues.
And all kinds of security challenges!

So yeah, there are many reasons why your output isn’t what you expected. For a consumer using a chat interface like Gemini or ChatGPT, that’s fine; they just type another single line question. For a business application, this can be a severe issue. A food ordering bot that takes your order wrong likely won’t be used a 2nd time. Worse, a medical summary wrongly summarized or a legal contract wrongly explained has serious consequences.

And this is why it’s super important to keep track of your prompts, test them thoroughly, and get feedback from real people like subject matter experts, other prompt engineers in your team or even another automated LLM.

Don’t worry; it’s not all doom and gloom. You can fine-tune your AI model, tweak your prompts, or even try a different model altogether. This guide is all about helping you master the art of prompt engineering, so stick around, and we’ll dive into the nitty-gritty details!