
Prompt Engineering

Hey everyone, I'm Shaw, and this is the fourth video in a larger series on using large language models in practice. Today I'm going to be talking about prompt engineering. Now, before all the technical folks come after me with their pitchforks, let's just address the elephant in the room. If you're a technical person, like Tony Stark here, you might be rolling your eyes at the idea of prompt engineering. You might say "prompt engineering is not engineering," or "prompt engineering is way overhyped," or even "prompt engineering is just a complete waste of time." When I first heard about the concept, I had a similar attitude. It didn't seem like something worth my time; I was more concerned with the model development side of things, like how to fine-tune a large language model. But after spending more time with it, my perspective on prompt engineering has changed. My goal with this talk is to give a sober and practical overview of prompt engineering, and if you're one of the technical people out there rolling their eyes like this version of Tony Stark, maybe by the end you'll be more like this version of Tony Stark, and prompt engineering will be another tool in your AI, data science, and software development arsenal.

Since this is a long session, I apologize in advance. First I'll talk about what prompt engineering is. Then I'll talk about two different levels of prompt engineering, what I call the easy way and the less easy way. Next, we'll talk about how you can build AI apps with prompt engineering. Then I'll cover seven tricks for prompt engineering, and finally we'll walk through a concrete example of how to create an automatic grader using Python and LangChain.

So what is prompt engineering? The way I like to define it, prompt engineering is any use of an LLM out of the box. But there's a lot more that can be said, and here are a few comments on prompt engineering that have stood out to me. The first comes from the paper by White et al., which defines prompt engineering as "the means by which LLMs are programmed with prompts." This raises the idea that prompt engineering is a new way to program computers, which was really eye-opening for me. When I first saw ChatGPT and heard the term prompt engineering, it felt like just a chatbot kind of thing, but as I dove deeper into that paper and other resources out there, a deeper picture emerged: large language models provide a path to making programming and computation as easy as asking a computer for what you want in natural language.

Another definition comes from a paper by Hu et al., which defines prompt engineering as "an empirical art of composing and formatting the prompt to maximize a model's performance on a desired task." The reason this one stood out to me is that it highlights how prompt engineering, at this point, is not really a science. It's a collection of heuristics: people throwing things against the wall, accidentally stumbling across techniques, and through that messy process some tricks and heuristics are starting to emerge. This might be part of the reason people are so put off by prompt engineering. It doesn't seem like a serious science, and that's because it isn't one yet. It's still way too early in this new paradigm of large language models we're operating in. It's going to take a while for us to understand what these models are actually doing and why they work, and with that understanding we'll have a better sense of how to manipulate them, how to throw stuff at them and get the desired results out.
The final comment I really liked about prompt engineering comes from Andrej Karpathy, in his "State of GPT" talk from Microsoft Build 2023, where he said that language models want to complete documents, so you can trick them into performing tasks just by arranging fake documents. I feel like this captures the essence of prompt engineering. Language models are not explicitly trained to do the vast majority of tasks we ask them to do; all a language model wants to do is predict the next token, and then the next one, and the next one. So I love this concept of tricking the AI into solving your problems, because that's essentially all prompt engineering is: constructing text that generates the desired outcome from the large language model.

The way I like to think about it is that there are two levels of prompt engineering. The first level is what I call the easy way, which is essentially ChatGPT or something similar. Google has Bard, Microsoft has Bing Chat, and all these applications provide a very user-friendly and intuitive interface for interacting with large language models. While this is the easiest and cheapest way to interact with them, it's a bit restrictive: you can't really use ChatGPT to build an app. Maybe it'll help you write some code, but you can't integrate ChatGPT into some piece of software or some larger application you want to build out. That's where the less easy way comes in. The less easy way is to interact with these large language models programmatically, using Python, JavaScript, or whatever programming language you like. The key upside of the less easy way is that you can fully customize how a large language model fits into a larger piece of software, and in many ways this unlocks a new paradigm for programming and software development.

That brings us to building AI apps with prompt engineering. Like I just said, the less easy way unlocks a new paradigm of software development, and to demonstrate this, let's look at a specific use case. Suppose we wanted to make an automatic grader for a high school history class. While this might be easy enough if the questions are multiple choice or true/false, it becomes more difficult when the answers are short-form or even long-form text responses. Here's an example: consider the question "Who was the 35th president of the United States?" You might think there's only one answer, John F. Kennedy, but there are many answers that are reasonable and could be considered correct. There's "John F. Kennedy," but "JFK," a very common abbreviation of his name, could also be considered correct. There's "Jack Kennedy," a common nickname for JFK; "John Fitzgerald Kennedy," his full name, from someone probably trying to get extra credit; and "John F. Kenedy," where the student may have just forgotten one of the n's in his last name.

Let's see how we could make a piece of software that does this grading process automatically. First we have the traditional paradigm, the way programming has always been done. Here it's on the developer to figure out the logic to handle all the variations and all the edge cases, and that's the hard part of programming: writing a robust piece of software that can handle all the different edge cases. This might require the user to input a list of all possible correct answers, and that's hard; with a homework of many questions, you can't anticipate every possible answer a student is going to write down. And traditionally, if you're trying to evaluate text against some target text, you'd probably use some kind of exact or fuzzy string-matching algorithm.
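To make the traditional paradigm concrete, here's a minimal sketch of a fuzzy-matching grader using Python's standard-library `difflib`. This is my own illustration, not code from the video: the accepted-answers list and the 0.8 threshold are arbitrary choices.

```python
from difflib import SequenceMatcher

def grade_by_fuzzy_match(student_answer, accepted_answers, threshold=0.8):
    """Return True if the student's answer is close enough to any accepted answer."""
    normalized = student_answer.strip().lower()
    return any(
        SequenceMatcher(None, normalized, accepted.lower()).ratio() >= threshold
        for accepted in accepted_answers
    )

# The developer has to anticipate every acceptable variation up front.
accepted = ["john f. kennedy", "jfk", "jack kennedy", "john fitzgerald kennedy"]

print(grade_by_fuzzy_match("John F. Kenedy", accepted))  # True: one letter off
print(grade_by_fuzzy_match("FDR", accepted))             # False: not close to anything
```

Notice the brittleness: "Jack Kennedy" only passes because we thought to enumerate it, and tuning the threshold trades misspelling tolerance against false matches. That's exactly the edge-case logic the developer is on the hook for.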
But now let's look at the new paradigm, where we can incorporate large language models into the logic of our software. Here you can use an LLM to handle all the logic of this automatic grading task using prompt engineering. Instead of coming in with code that does exact or fuzzy matching and figuring out the logic that gives you the desired outcome, you can just write a prompt. What this might look like: "You are a high school history teacher grading homework assignments. Based on the homework question indicated by 'Q:' and the correct answer indicated by 'A:', your task is to determine whether the student's answer is correct. Grading is binary; therefore, student answers can be correct or wrong. Simple misspellings are okay." Then we have a template where, as indicated by the prompt, curly brackets mark where the question will be placed and where the single correct answer will be placed, along with a place for the student's answer. All of this can be fed to a large language model, which will generate a completion saying the student's answer is correct or the student's answer is wrong, maybe with some reasoning for why an answer is wrong.

Taking a step back and comparing these two approaches: approach one was to manually sit down, think, and write out a string-matching algorithm that tries to handle all the different edge cases and variations of potentially correct answers. I'm an okay programmer at best, so it would probably take me a week or so to get a piece of software that did an okay job of that. Compare that to how long it took me to write this prompt, which was about two minutes. Think of the time savings: I could have spent a week trying to use string matching to solve this problem, or I could spend a couple of minutes writing a prompt. This is just the core logic of the application, not including all the peripherals, the user interfaces, the boilerplate code, and so on, but that's the cost savings we're talking about here: minutes versus days or weeks of software development. That's the power of prompt engineering and this new way of thinking about programming and software development.
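For concreteness, here's the two-minute version as actual code: the grader prompt above written as a plain Python template. The placeholder names (question, correct_answer, student_answer) are illustrative, and they'll reappear in the LangChain example later.

```python
# The grader prompt from above as a plain Python template.
grader_prompt = """You are a high school history teacher grading homework assignments. \
Based on the homework question indicated by "Q:" and the correct answer indicated by "A:", \
your task is to determine whether the student's answer is correct. \
Grading is binary; therefore, student answers can be correct or wrong. \
Simple misspellings are okay.

Q: {question}
A: {correct_answer}

Student Answer: {student_answer}"""

print(grader_prompt.format(
    question="Who was the 35th president of the United States of America?",
    correct_answer="John F. Kennedy",
    student_answer="FDR",
))
```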
So now let's talk about best practices for prompt engineering. Here I'm going to cover seven tricks you can use to write better prompts. This is definitely not a complete or comprehensive list, just a set of tricks I've extracted from comparing and contrasting a few resources. If you want to dive deeper into any one of these tricks, check out the blog published in Towards Data Science, where I talk more about them and point to resources where you can learn more.

Running through the list: trick one is to be descriptive. In a lot of writing tasks less is more, but when doing prompt engineering it's kind of the opposite: more is better. Trick two is to give examples. This is the idea of few-shot learning: you give a few demonstrations, for example questions and answers, in your prompt, and that tends to improve the LLM's performance. Trick three is to use structured text, and we'll see what that looks like later. Trick four is chain of thought, which is essentially having the LLM think step by step. Trick five is using chatbot personas, basically assigning a role or expertise to the large language model. Trick six is the flipped approach: instead of you asking the large language model questions, you prompt it to ask you questions, so it can extract information from you to generate a more helpful completion. Finally, trick seven is what I summarize as reflect, review, and refine, which is essentially having the large language model reflect on its past responses and refine them, either by improving them or by identifying errors.

Let's see what this looks like via a demo. Here I'm going to use ChatGPT, and it's important to know which large language model you're using, because optimal prompting strategies depend on the model. ChatGPT is a fine-tuned model, so you don't have to break your back too much on the prompt engineering to get reasonable responses. But if you're working with a base model like GPT-3, you're going to have to do a lot more work on the prompt engineering side to get useful responses. That's because GPT-3 is not a fine-tuned model; it only does word prediction, while ChatGPT was trained to take instructions, and on top of that they did reinforcement learning with human feedback to refine those responses even further.

Trick one is to be descriptive, so let's compare an example with and without this trick. Say I want to use ChatGPT to help me write a birthday message for my dad. The naive thing to do would be to type the following prompt into ChatGPT: "Write me a birthday message for my dad." While the result might be fine for some use cases, I don't write messages this verbose, and the response is a bit generic: "Dad, you've been my rock, my guide, my source of inspiration throughout my life. Your wisdom, kindness, and unwavering support have shaped me into the person I am today. For that, I am eternally grateful." That's very nice, but I tend to be a bit more cheeky when it comes to these kinds of birthday messages. So instead we can employ the trick of being descriptive, which might look like this: "Write me a birthday message for my dad no longer than 200 characters" (so now we don't want it to be as verbose). "This is a big birthday because he's turning 50" (more context). "To celebrate, I booked us a boys' trip to Cancun" (more context). "Be sure to include some cheeky humor; he loves that." I'm giving ChatGPT more to work with, to tailor the response to something closer to what I would actually write. The response is a lot more concise, which I like: "Happy 50th, Dad! Time to fiesta like you're 21 again in Cancun. Cheers to endless adventures ahead! #DadAndCancun." That's actually pretty funny. Maybe I wouldn't use it exactly as is, but I could see it as a starting point for writing an actual birthday message.
The second trick is to give examples. Let's compare prompts without and with this trick. Without giving examples, we might prompt ChatGPT as follows: "Given the title of a Towards Data Science blog article, write a subtitle for it." We put in the title "Prompt Engineering: How to Trick AI into Solving Your Problems," which is the title of the blog associated with this video, and leave the subtitle blank. The completion it spits out is "Unleash the power of clever prompts for more effective AI problem solving." Pretty nifty. Now let's see what happens if we give a few examples to capture the style of subtitle we're looking for. The prompt is pretty similar, but now I'm putting in the title and subtitle of preceding blogs in this larger series: "A Practical Introduction to LLMs" with "3 levels of using LLMs in practice," then "Cracking Open the OpenAI (Python) API" with "A complete beginner-friendly introduction with example code," and finally the same title as before. This time it spits out "Mastering the art of crafting effective prompts for AI-driven solutions." At face value this might not seem much different from the earlier completion, but I prefer it, simply because I don't like verbose text and this one is more concise. Maybe that's what ChatGPT picked up on: the example subtitles have roughly a certain number of tokens, so it made the next subtitle about the same length. I'm just speculating, but regardless, that's how you can incorporate examples into your prompt.
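If you're doing this programmatically rather than in the ChatGPT UI, a few-shot prompt is just string assembly: the instruction first, then the worked examples, then the new case left open for the model to complete. Here's a minimal sketch in plain Python, using the titles and subtitles above; the function name and layout are my own illustrative choices.

```python
def build_few_shot_prompt(examples, new_title):
    """Assemble a few-shot prompt: instruction, worked examples, then the new case."""
    lines = ["Given the title of a 'Towards Data Science' blog article, write a subtitle for it."]
    for title, subtitle in examples:
        lines.append(f"Title: {title}")
        lines.append(f"Subtitle: {subtitle}")
    lines.append(f"Title: {new_title}")
    lines.append("Subtitle:")  # left blank for the model to complete
    return "\n".join(lines)

examples = [
    ("A Practical Introduction to LLMs", "3 levels of using LLMs in practice"),
    ("Cracking Open the OpenAI (Python) API",
     "A complete beginner-friendly introduction with example code"),
]
print(build_few_shot_prompt(
    examples, "Prompt Engineering: How to Trick AI into Solving Your Problems"))
```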
The next trick is to use structured text. Let's see what this looks like in action. Suppose our prompt for ChatGPT has no structured text: we just ask it to "write me a recipe for chocolate chip cookies." It gives a pretty good response, with ingredients, instructions, and some tips. If ChatGPT were not fine-tuned, it may not have spit out this neat structure for a chocolate chip cookie recipe, so this is another indication of why the large language model you're working with matters. I could be happy with this response; there may not even be a need for structured text here. But still, let's see what it could look like if we did use structured text in our prompt. This time the prompt is a little different: "Create a well-organized recipe for chocolate chip cookies. Use the following formatting elements." The key difference is that we're now asking it specifically to follow a given format, with a description of each section we want. One subtle difference is that the completion using structured text gives just the title, the ingredients, and so on, which you could copy-paste onto a web page without any alterations. The other completion has no title, which could be fine, but it starts with "Certainly! Here's a classic chocolate chip cookie recipe for you." It's trying to be more conversational, which might require extra processing steps if this were feeding into a larger automated pipeline. Other than that, there doesn't seem to be much difference between the completions, though one interesting detail is that in the structured version the tips are clearer and bolded, while the other just has some quick bullet points.

Next we have trick four, chain of thought. The basic idea with chain of thought is to give the LLM time to think, which we achieve by breaking a complex task down into smaller pieces so it's easier for the model to give good completions. Without chain of thought, the prompt might look like: "Write me a LinkedIn post based on the following Medium blog," and then we copy-paste the Medium blog text in. It does a pretty good job, but the result feels way too long for a LinkedIn post, and it feels like it's just summarizing the text I threw in there. It's not bad, and it could be a good starting place, but now let's see what this looks like using chain of thought. Instead of just having it write the LinkedIn post from the text, I explicitly list out my personal process for turning a blog into a LinkedIn post and try to get the LLM to mimic it: "Write me a LinkedIn post based on the step-by-step process and Medium blog given below. Step 1: come up with a one-line hook relevant to the blog. Step 2: extract three key points from the article. Step 3: compress each point to less than 50 characters. Step 4: combine the hook, the compressed key points from step 3, and a call to action to generate the final output." Then we put the Medium text below. The result looks a lot more reasonable for a LinkedIn post: each line is just one sentence, and it's not a wall of text. No one likes reading a wall of text, or at least I don't, so this is much more helpful to me in making a LinkedIn post.

Trick five is to use chatbot personas. The idea here is to prompt the LLM to take on a particular persona. For a concrete example without the trick, let's say we want ChatGPT to make me a travel itinerary for a weekend in New York City. It spits out something that looks pretty good. Now let's see what this looks like with a persona. Instead of asking for an itinerary straight up, I say: "Act as an NYC native and cabbie who knows everything about the city. Please make me a travel itinerary for a weekend in New York City based on your experience. Don't forget to include your charming New York accent in your response." Comparing this response with the other one, there's a lot of overlap, and maybe there's no practical difference between the two, but it does feel like there are things here you don't get there. One version says "start your day with a classic New York breakfast at a local diner cafe," while the other says to start with a bagel, take a Central Park stroll, hit a museum, and grab a bagel again. Yep, it's just eat bagels every single day. That's funny; I like how it injected a bit of humor: "you guessed it, another bagel to fuel your final day." You may really have to read through these to get a sense of the subtle differences, but even just from this bagel example you get two different flavors of itinerary, and maybe one matches your interests a bit more than the other.
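As an aside, if you're using the API instead of the ChatGPT interface, personas map naturally onto the chat format's system message. Here's a hedged sketch, assuming the pre-1.0 `openai` Python package; the API key is a placeholder.

```python
import openai  # assumes the pre-1.0 `openai` package interface

openai.api_key = "sk-..."  # placeholder; use your own secret key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        # The persona goes in the system message.
        {"role": "system",
         "content": "Act as an NYC native and cabbie who knows everything about "
                    "the city. Don't forget your charming New York accent."},
        {"role": "user",
         "content": "Please make me a travel itinerary for a weekend in New York City."},
    ],
)
print(response["choices"][0]["message"]["content"])
```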
Trick number six is the flipped approach. Here, instead of you asking all the questions, you prompt the chatbot to ask you questions, to better help you with whatever you're trying to do. Without the trick, let's say you just want an idea for an LLM-based application. You give it that prompt, and it generates some idea for you: "EduBot Pro, an intelligent educational platform that harnesses the power of LLMs to offer personalized learning and tutoring experiences for students of all ages and levels." This could be a great product idea. The problem is that maybe it isn't something you're passionate about or really care about, or the idea isn't tailored to your interests and skill set as someone who wants to build an app. Let's see how the flipped approach can help. Instead of asking for an idea straight up, we can say: "I want you to ask me questions to help me come up with an LLM-based application idea. Ask me one question at a time to keep things conversational." Right off the bat it asks: "What are your areas of expertise and interest that you'd like to incorporate into your LLM-based application idea?" I didn't think to tell the chatbot what I know and what I'm interested in so it could better serve me, and there are probably a bunch of other questions critical to making a good app recommendation that I just wouldn't think of. That's where the flipped approach is helpful: the chatbot will ask you what it needs to know in order to give a good response, and those questions may or may not be things you could come up with all up front.

The seventh and final trick is reflect, review, and refine. This is where we prompt the chatbot to look back at previous responses and evaluate them, whether we're asking for improvements or for it to identify potential mistakes. What this might look like: we take the EduBot Pro response from before and prompt it to review that previous response. I say: "Review your previous response, pinpoint areas for enhancement, and offer an improved version. Then explain your reasoning for how you improved the response." I hadn't tried this before, so we're both seeing it for the first time. The result looks pretty similar, but since we asked it to explain how it improved the response, it gives us an extra section: reasoning for enhancements, covering clarity and conciseness, emphasizing personalization, enhanced language, and monetization strategies (the monetization section now provides more detail on viable strategies). I won't read through all of it, but you can basically copy-paste this prompt, or something like it, as needed to potentially improve any chat completion.
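This trick also translates directly to the API as a multi-turn exchange: you append the model's first completion to the message history and follow up with the review prompt. A minimal sketch, again assuming the pre-1.0 `openai` package:

```python
import openai  # again assumes the pre-1.0 `openai` package

openai.api_key = "sk-..."  # placeholder

messages = [{"role": "user", "content": "Give me an idea for an LLM-based application."}]
first = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
idea = first["choices"][0]["message"]["content"]

# Feed the first completion back in, followed by the review prompt.
messages += [
    {"role": "assistant", "content": idea},
    {"role": "user", "content": (
        "Review your previous response, pinpoint areas for enhancement, and offer "
        "an improved version. Then explain your reasoning for how you improved it."
    )},
]
refined = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(refined["choices"][0]["message"]["content"])
```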
I know that was a ton of content and I flew through it, but if you want to dive into any particular trick a bit more, check out the Towards Data Science blog, where I talk about each of these in more detail and cite resources where you can learn more.

Everything we've just talked about is applicable to both the easy way and the less easy way of prompt engineering, but now I want to focus on the less easy way. I'm going to try to demonstrate the power of prompt engineering the less easy way by building out the automatic grader example from before, using the LangChain Python library. First, as always, we do some imports: here we import from langchain, and since we'll be using the OpenAI API, we need a secret key. If you haven't worked with the OpenAI API before, check out the previous video, which covers all of that: what an API is, OpenAI's API, and some example Python code for using it. Here we just import our secret key, which allows us to make API calls.

Next, we make our first chain. The main utility of LangChain is that it provides a ton of boilerplate code that makes it easy to incorporate calls to large language models within your Python code, or within some larger piece of software you're developing. It does this through things called chains, which are essentially sets of steps modularized into reusable units. The first thing we need is our chat model; here we use OpenAI's GPT-3.5 Turbo. The next thing we need is a prompt template: essentially a chunk of text we can pass inputs into and dynamically update with new information. This is the same prompt we saw on the previous slide for the automatic grader; we'll be able to pass the arguments question, correct_answer, and student_answer into our chain, and it will dynamically update the prompt template, send it to the model, and get back the response. Putting the chain together is super simple. The syntax looks like this: you create an LLMChain, setting llm to chat_model, the OpenAI model we instantiated earlier, and prompt to the prompt template we just created, and you call the combined object chain.

What this looks like in action is as follows. We define the inputs: the question "Who was the 35th president of the United States of America?", the correct answer "John F. Kennedy," and the student's answer "FDR." We pass all these inputs to the chain as a dictionary using the question, correct_answer, and student_answer keywords, plugging in the values we defined. The large language model spits out "student's answer is wrong," so it correctly grades the student's answer as wrong, because FDR was not the 35th president of the United States.

However, there's a small problem with our chain right now: its output is a piece of text, which may or may not fit nicely into the larger data or software pipeline we're putting together. It would make a lot more sense if, instead of outputting text, the chain output a True or False indicating whether the student's answer was correct. With that Boolean output, it's much easier to process the information in some downstream task; maybe you want to sum up all the correct and incorrect answers on the homework and generate a final grade for the entire worksheet. We can do this via output parsers, another thing we can include in our chains, which take the output text of the large language model and format it, extract some piece of information, or convert it into another format. Here I define an output parser to determine whether the grade was correct or wrong, using a simple piece of logic: check whether the word "wrong" appears in the text completion, and return the Boolean opposite of that. For example, the earlier completion was "student's answer is wrong," so the check finds "wrong" in the completion and returns True, and the not flips that, making the parser output False, meaning an incorrect answer. As you can see, we haven't automated all the logic out of programming; you still need some problem-solving and programming skills. Once the parser is defined, we just add it into our chain: the llm is the same as before, the prompt template is the same as before, and we add the output parser we just defined. Then we can apply this chain in a for loop. We have the same question and correct answer as before, and now we define a list of student answers we might have received: John F. Kennedy; JFK; FDR; John F. Kenedy (with only one n); John Kennedy; Jack Kennedy; Jacqueline Kennedy; and Robert F. Kenedy (also with one n). We run through this list in a for loop, run our chain just like we did before, and print the result.
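Putting it all together, here's roughly what the notebook code looks like. This is a sketch assuming the 2023-era LangChain API (module paths have moved around in newer releases) and that OPENAI_API_KEY is set in your environment, so treat it as illustrative rather than a verbatim copy of the repo.

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.schema import BaseOutputParser

# Chat model: GPT-3.5 Turbo (temperature=0 is an illustrative choice
# for more deterministic grading).
chat_model = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Prompt template with three dynamic inputs.
prompt = PromptTemplate(
    input_variables=["question", "correct_answer", "student_answer"],
    template="""You are a high school history teacher grading homework assignments. \
Based on the homework question indicated by "Q:" and the correct answer indicated by "A:", \
your task is to determine whether the student's answer is correct. \
Grading is binary; therefore, student answers can be correct or wrong. \
Simple misspellings are okay.

Q: {question}
A: {correct_answer}

Student Answer: {student_answer}""",
)

class GradeOutputParser(BaseOutputParser):
    """Convert the text completion into a Boolean grade."""
    def parse(self, text: str) -> bool:
        # "wrong" in the completion signals an incorrect answer, so flip it with `not`.
        return "wrong" not in text.lower()

chain = LLMChain(llm=chat_model, prompt=prompt, output_parser=GradeOutputParser())

question = "Who was the 35th president of the United States of America?"
correct_answer = "John F. Kennedy"
student_answers = ["John F. Kennedy", "JFK", "FDR", "John F. Kenedy",
                   "John Kennedy", "Jack Kennedy", "Jacqueline Kennedy",
                   "Robert F. Kenedy"]

for student_answer in student_answers:
    grade = chain.run(question=question,
                      correct_answer=correct_answer,
                      student_answer=student_answer)
    print(student_answer, "-", grade)
```

Because the parser returns a Boolean, the chain's output drops straight into downstream logic, like summing the grades over a worksheet to get a final score.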
Here we can see the results: John F. Kennedy is True, indicating a correct response; JFK is True; FDR is False; the misspelled John F. Kenedy is True, because we specifically said misspellings are okay; John Kennedy is True, since it just drops the middle initial; Jack Kennedy is True, a common nickname; Jacqueline Kennedy is False, since that was his wife; and Robert F. Kenedy is False, because that was his brother. As always, the code is available at the GitHub repo for this video series, which is linked below. Feel free to take this code and adapt it, or just let it give you some ideas of what's possible with prompt engineering done this way.

I would be remiss if I did not talk about the limitations of prompt engineering, which are as follows. Like I said before, optimal prompt strategies are model-dependent: the optimal prompt for ChatGPT is going to be completely different from the optimal prompt for GPT-3. Another downside is that not all pertinent information may fit into the context window; only so much information can be passed into a large language model, and if you're dealing with a significantly large knowledge base, prompt engineering may not be able to handle it effectively. Another limitation is that the models we typically use for prompt engineering are huge general-purpose models, which for a particular use case can be cost-inefficient, or even overkill for the problem you're trying to solve. A related point is that smaller specialized models can outperform larger general-purpose models; an example of this was demonstrated by OpenAI when comparing their smaller InstructGPT model to a much larger version of GPT-3. This brings up the idea of model fine-tuning, which is the topic of the next video in this series. There we'll break down some key fine-tuning concepts, and I'll share some concrete example code for how you can fine-tune your very own large language model using the Hugging Face software ecosystem.

I hope this video was helpful to you. If you enjoyed it, please consider liking, subscribing, and sharing with others. If you have any questions or suggestions for future content, feel free to drop those in the comments section below. As always, thank you so much for your time,
and thanks for watching.

This post is licensed under CC BY 4.0 by the author.