Document Note: Background: I utilize LLMs for both personal and professional purposes, and I am seeking to enhance my understanding and effectiveness in using these models.
Top 3 Most Important Takeaways:
🧠 Explore Diverse Use Cases: Investigate various applications of LLMs in both personal and work contexts. This could include content generation, brainstorming ideas, or automating repetitive tasks to maximize efficiency.
🔍 Refine Prompt Engineering: Focus on improving your prompt crafting skills. Experiment with different phrasing and structures to elicit more accurate and relevant responses from LLMs, ensuring that you get the most out of your interactions.
📚 Stay Updated on Developments: Keep abreast of the latest advancements in LLM technology and best practices. Regularly read articles, attend webinars, or engage with communities to learn new strategies and features that can enhance your usage.
Summary: The author shares their experiences using large language models (LLMs), explains how LLMs help with various tasks, and highlights the benefits of using these advanced tools.
So here you can come to some ranking of different models and you can see sort of their strength or Elo score, and so this is one place where you can keep track of them. (View Highlight)
The way we can see those tokens is we can use an app like, for example, TikTokenizer. (View Highlight)
Note: can use tiktokenizer.vercel.app to see how tokens will be chunked from text input into an LLM
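To make the chunking concrete, here is a toy sketch of how a BPE-style tokenizer greedily breaks text into vocabulary pieces. The mini-vocabulary and its IDs below are invented for illustration; the real tokenizers shown on tiktokenizer.vercel.app learn their vocabularies from data and behave differently.

```python
# Toy illustration of BPE-style tokenization. The vocabulary and IDs are
# made up; real tokenizers (e.g. OpenAI's) learn merges from data.

# Tiny hand-made vocabulary; longer strings win via greedy longest match.
VOCAB = {" the": 1, "token": 2, "iz": 3, "er": 4, " chunk": 5, "s": 6,
         " text": 7, " ": 11}

def toy_tokenize(text):
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):  # try longest piece first
            piece = text[i:i + length]
            if piece in VOCAB:
                tokens.append((piece, VOCAB[piece]))
                i += length
                break
        else:
            # Unknown character: real tokenizers fall back to raw bytes.
            tokens.append((text[i], None))
            i += 1
    return tokens

print(toy_tokenize(" the tokenizer chunks text"))
```

Note how " tokenizer" splits into several sub-word pieces while common words like " the" are a single token, which mirrors what the real tool shows.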
What I want to get across here is that what looks to you and I like little chat bubbles going back and forth is, under the hood, us collaborating with the model: we're both writing into a token stream, and these two bubbles back and forth were a sequence of exactly 42 tokens under the hood. I contributed some of the first tokens, and then the model continued the sequence of tokens with its response. We could alternate and continue adding tokens here, and together we're building out a token window, a one-dimensional sequence of tokens. (View Highlight)
Note: Chats show up as a sequence of tokens under-the-hood. Each response adds to the sequence.
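A small sketch of that idea: the alternating chat turns flatten into one sequence. The role markers (`<|user|>`, `<|assistant|>`, `<|end|>`) are made up for illustration, and words stand in for real token IDs; actual chat templates vary per model.

```python
# Sketch: a chat is one flat token stream under the hood. Marker names are
# invented for illustration; real models use their own special tokens.

def render_stream(turns):
    """Flatten alternating (role, text) turns into one token-like sequence."""
    stream = []
    for role, text in turns:
        stream.append(f"<|{role}|>")
        stream.extend(text.split())   # a real model would use BPE token IDs
        stream.append("<|end|>")
    return stream

turns = [("user", "why is the sky blue"),
         ("assistant", "because of Rayleigh scattering")]
print(render_stream(turns))
```

Each new reply simply appends more tokens to the same one-dimensional sequence, which is why long chats grow the context window.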
This pre-training stage, we also saw, is fairly costly: this can be many tens of millions of dollars, say like three months of training, and so on. So this is a costly, long phase, and for that reason this phase is not done that often. (View Highlight)
Note: pre-training a model is very expensive. This is how they do things like "load the internet" into the models. And this is why models sometimes don't have up-to-date info. There is a cutoff from when they were pre-trained. Also, the pre-trained data is "lossy" and probabilistic. It's not actually the whole internet, it's a probabilistic neural network that has been trained on the internet to learn how to predict the next in a sequence of tokens.
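The "probabilistic, lossy" point can be sketched in a few lines: the model doesn't store the internet, it outputs a probability distribution over the next token and one is sampled. The distribution below is invented for illustration.

```python
import random

# Sketch: an LLM is a next-token predictor. Given a context, it produces a
# probability distribution over the vocabulary and a token is sampled.
# This distribution is hypothetical, for a context like
# "The capital of France is":
dist = {" Paris": 0.92, " the": 0.04, " a": 0.03, " Lyon": 0.01}

def sample_next(distribution, rng):
    """Sample one token from a {token: probability} distribution."""
    tokens = list(distribution)
    weights = [distribution[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)
samples = [sample_next(dist, rng) for _ in range(100)]
print(samples.count(" Paris"))  # most samples, but not all, pick " Paris"
```

This is why well-represented facts come out reliably while rare ones are shakier: the answer is a draw from a learned distribution, not a lookup.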
The post-training stage is really attaching a smiley face to this ZIP file, because we don't want to generate internet documents; we want this thing to take on the persona of an assistant that responds to user queries. That's done in a process of post-training, where we swap out the data set for a data set of conversations that are built out by humans. So this is basically where the model takes on this persona, so that we can ask questions and it responds with answers. (View Highlight)
Note: post-training is much cheaper than pre-training. Includes fine-tuning the model to give it a "persona" that we can interact with for chat purposes.
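As a rough sketch, one human-written training conversation from the post-training data set might look like the record below. The exact schema is an assumption; formats differ between labs.

```python
import json

# Sketch of one supervised fine-tuning example: a human-written conversation
# that the model learns to imitate. Schema is hypothetical.
example = {
    "messages": [
        {"role": "user", "content": "What is photosynthesis?"},
        {"role": "assistant",
         "content": "Photosynthesis is the process by which plants convert "
                    "light energy into chemical energy stored in sugars."},
    ]
}

# Training on many such conversations is what gives the model its
# helpful-assistant persona on top of the pre-trained internet knowledge.
print(json.dumps(example, indent=2))
```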
I don't always fully trust what's coming out here, right? This is just a probabilistic statistical recollection of the internet. (View Highlight)
Note: keep in mind that LLM answers are probabilistic, not definitive. So answers are likely to be more true for more commonly discussed topics on the internet.
Think of the tokens in the context window as a precious resource; think of that as the working memory of the model. Don't overload it with irrelevant information: keep it as short as you can, and you can expect that to work faster and slightly better. Of course, if the information actually is related to your task you may want to keep it in there, but I encourage you to, as often as you can, basically start a new chat whenever you are switching topic. (View Highlight)
Note: open new chats for new topics. Unrelated info in a chat can bog down the LLM, since it's making the token stream larger unnecessarily.
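A minimal sketch of treating the context window as a budget. Token counts are approximated as word counts here for illustration; real APIs count BPE tokens, and the budget number is made up.

```python
# Sketch: the context window as a working-memory budget. Word counts stand in
# for real token counts; the budget value is arbitrary for illustration.

def fits_in_context(messages, budget=100):
    """Return (within_budget, rough_token_count) for a list of messages."""
    used = sum(len(m.split()) for m in messages)
    return used <= budget, used

ok, used = fits_in_context(["why is the sky blue",
                            "because of Rayleigh scattering"])
print(ok, used)  # a short, on-topic chat stays well within the budget
```

Starting a fresh chat for a new topic resets this counter, which is the practical payoff of the advice above.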
I kind of refer to all these models as my LLM Council, so they're kind of like the council of language models. If I'm trying to figure out where to go on a vacation, I will ask all of them, and so you can also do that for yourself if that works for you. (View Highlight)
Note: worth chatting with various models about the same topic to see differences in responses. I've noticed Claude does way better than others when working with large amounts of text.
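The "LLM Council" idea can be sketched as a simple fan-out. The `ask()` stub and model names below are stand-ins; real code would call each provider's actual API.

```python
# Sketch of the "LLM Council": send the same prompt to several models and
# compare answers. ask() is a hypothetical stand-in for real API calls.

def ask(model, prompt):
    """Hypothetical stand-in for a real API call to the named model."""
    return f"[{model}'s answer to: {prompt}]"

COUNCIL = ["gpt", "claude", "gemini", "deepseek"]

def ask_council(prompt):
    """Fan the same prompt out to every council member, collect answers."""
    return {model: ask(model, prompt) for model in COUNCIL}

for model, answer in ask_council("Where should I go on vacation?").items():
    print(model, "->", answer)
```

Comparing the returned dict side by side is the whole trick: disagreements between models flag the claims worth double-checking.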
When you are using a thinking model, which will do additional thinking, you are using the model that has been additionally tuned with reinforcement learning. Qualitatively, what does this look like? Well, the model will do a lot more thinking, and what you can expect is that you will get higher accuracies, especially on problems that are, for example, math, code, and things that require a lot of thinking. (View Highlight)
Note: Reasoning models have been trained on "Reinforcement Learning" and can spend multiple minutes "thinking" about how to solve complex problems. These are great for prompts or problems that require deeper answers or multiple levels of thought.
For OpenAI, all of these models that start with "o" are thinking models: o1, o3-mini, o3-mini-high, and o1 pro mode are all thinking models, and they're not very good at naming their models. (View Highlight)
Note: OpenAI models starting with 'o' are the thinking/reasoning models
The reason I like Perplexity is because, when you go to the model dropdown, one of the models that they host is DeepSeek R1. So this has the reasoning with the DeepSeek R1 model, which is the model that we saw over here (this is the paper); Perplexity just hosts it and makes it very easy to use. (View Highlight)
Note: You can access a reasoning model for free (limited use daily) by signing in at https://www.perplexity.ai/
Deep Research is a combination of internet search and thinking, rolled out for a long time, so the model will go off and it will spend tens of minutes doing what Deep Research does. (View Highlight)