Contents
- 1 Why Do We Need to Change LLM Responses?
- 2 How LLM Responses Are Generated
- 3 Common Challenges With LLM Responses
- 4 Why Healthcare Is the Ideal Stress-Test for LLM Responses: Case Study
- 5 Starting Point: Our Unmodified Agent
- 6 Different Methods of Changing LLM Response
- 7 Side-by-Side Comparison: How Each Method Changes an LLM Response
- 8 How to Improve LLM Responses
- 9 How to Evaluate LLM Responses
- 10 When to Use Each Method in Healthcare Applications
- 11 Final Thoughts
Large Language Models (LLMs) have changed how organizations handle interactions, share information, and assist users. But straight out of the box, even the top LLMs often fall short in high-stakes areas, like healthcare. They hallucinate facts, provide generic or unsafe recommendations, or miss local context entirely.
This article explores how LLM responses can be changed, controlled, and improved. We illustrate practical approaches tested across multiple methods: Prompt Engineering, Retrieval-Augmented Generation (RAG), and Parameter-Efficient Fine-Tuning (PEFT). While the examples are drawn from a healthcare-focused case study, the principles apply to any domain that requires precise, reliable, and context-aware AI outputs.
Why Do We Need to Change LLM Responses?
LLM outputs don’t always line up with what a user needs or what an organization expects. Answers can be inaccurate, biased, too general, or inconsistent, which erodes trust and makes them less useful.
Changing how a model behaves makes its responses more accurate, more relevant, and better suited to the task at hand. That helps users, supports decisions, and prevents wrong or harmful answers.
| | Low Effort / Cost | High Effort / Cost |
|---|---|---|
| High Core Change | N/A | Full Fine-Tuning: Retrains all of the model's weights on domain data. Maximum control, but slow and expensive. |
| Low Core Change | Prompt Engineering: Guides the existing model in real-time with instructions. No training needed; instant results. Decoding & Output Control: Adjusts generation parameters like temperature. No training, real-time style control. | Retrieval-Augmented Generation (RAG): Connects the model to an external, trusted knowledge base at query time. Keeps answers current without retraining. |
How LLM Responses Are Generated
Understanding how LLMs produce output helps explain why interventions are necessary.
LLMs generate responses by combining:
- Patterns learned from massive text corpora during pretraining
- Context from the user’s input and prompt
- Decoding strategies (like temperature or top-k) that influence creativity versus precision
- Their internal world model, which is static unless adapted
By default, LLMs are generalists. They do not inherently know your domain, rules, or preferred style unless you provide that information explicitly through prompts or adapt the model’s knowledge.
This is why techniques like Prompt Engineering, RAG, and PEFT exist — to control, customize, and enhance responses.
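The decoding step above is the one lever you can pull without touching the model at all. A minimal sketch of how temperature and top-k shape token selection, using toy logits rather than a real model (the function name and values are illustrative):

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_k=None, rng=None):
    """Sample one token id from raw logits using temperature and top-k."""
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=np.float64)
    # Temperature scaling: <1.0 sharpens the distribution, >1.0 flattens it.
    logits = logits / max(temperature, 1e-8)
    # Top-k filtering: keep only the k most likely tokens.
    if top_k is not None:
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)
    # Softmax over the remaining candidates, then draw one token.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.5, -1.0]
# Low temperature makes the top token almost certain; top_k=1 forces it.
print(sample_token(logits, temperature=0.1))
```

Lowering the temperature trades creativity for precision; raising top-k widens the pool of candidate tokens and increases variety.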
Common Challenges With LLM Responses
Base models fall short for several reasons:
1. Limited Domain Knowledge
Models don’t know your internal documentation, product specifics, or up-to-date local data. They cannot answer location-specific or industry-specific questions reliably without external context.
2. Inconsistent Tone & Formatting
LLMs often drift away from brand voice, change structure unexpectedly, or fail to maintain strict output formats like JSON, tables, or templates.
3. Hallucinations
LLMs may confidently invent facts — a critical failure in domains like healthcare, compliance, or law.
4. Lack of Specialized Skills
They can perform many tasks passably, but struggle with niche, complex workflows that require deep expertise.
Because of these limitations, changing and controlling LLM responses becomes essential.
Why Healthcare Is the Ideal Stress-Test for LLM Responses: Case Study
Healthcare forces an LLM to demonstrate:
- Accuracy: Wrong facts can cause harm.
- Sensitivity: Tone matters in emotional or clinical conversations.
- Local Knowledge: Advice must reflect local laws, clinics, and insurance systems.
- Safety Compliance: Models must handle suicide/self-harm queries responsibly.
- Consistency: No hallucinations, no improvisation, no ambiguity.
In our test, we simulated a patient persona:
- Alex, recently diagnosed with Type 2 Diabetes, asking real questions about life, care, costs, and emotional support.
Starting Point: Our Unmodified Agent
| Alex's Question | Gemma's Response |
|---|---|
| Where can I find a diabetes support group in Orlando? | It’s understandable to want to connect with others facing similar challenges – finding a support group can be incredibly helpful. Unfortunately, I don’t have access to real-time local listings of support groups. However, here are some avenues you can explore to find a diabetes support group in Orlando: Local Hospitals, Diabetes Associations, and Community Centers |
| Can I get life or health insurance with type 2 diabetes? | Some companies specialize in coverage for individuals with chronic conditions. You can explore plans through a broker who can help you compare options. |
| I'm feeling very overwhelmed and depressed after my diagnosis. What support can I find? | If you are having thoughts of harming yourself, please reach out for help immediately. Here are some resources: SAMHSA National Helpline at 1-800-662-HELP (4357) The Crisis Text Line: Text HOME to 741741 The Trevor Project: 1-866-488-7386 |
| I was just diagnosed with diabetes 2, and I'm scared. What should I do? | Commonly, managing type 2 diabetes involves a combination of lifestyle changes, like regular physical activity and making healthy food choices. It’s important to work with a registered dietitian or certified diabetes educator to create a personalized plan. |
| How much does it cost to treat when I have Type 2 Diabetes in the USA? | Unfortunately, I can’t give you an exact cost estimate for treatment in the USA. Healthcare costs vary significantly based on several factors, including the specific clinic, the type of care needed, and your individual circumstances. |
Different Methods of Changing LLM Response
There are three dominant strategies for controlling LLM responses. Each solves different problems and requires different effort levels.
Method 1 — Prompt Engineering
Prompt engineering is the lowest-cost, fastest way to influence LLM behavior.
It changes responses without training the model.
Concept: Prompt Engineering is the discipline of crafting detailed inputs to guide the model’s response. It’s the fastest way to influence behavior without changing the model itself. We’ll look at two key techniques.
Key Technique 1: Conditional Directives (Safety Overrides)
What it is: Embedding explicit IF-THEN style rules into the prompt to handle high-stakes topics. This forces the model to abandon its normal response generation and follow a strict, predefined script when specific keywords or themes are detected.
What it solves: Enforces critical safety procedures. For example, it can override a generic, non-local response to a self-harm query with a specific, regional helpline, ensuring the user receives the most appropriate information.
IF the user mentions self-harm, THEN your entire response must be ONLY this…
This forces the LLM to use a predefined, safe, region-specific response.
For example, the model was instructed to say:
Please know there is immediate help available.
You can call SAMHSA National Helpline at 1-800-662-HELP (4357)…
The override replaced the model’s incorrect U.S. hotline numbers.
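A minimal sketch of how such a directive can be wired in. The keyword list, helpline text, and message format are illustrative assumptions; pairing the prompt rule with a deterministic client-side check is an extra safeguard, not a required part of the technique:

```python
# Conditional safety directive sketch. Keywords and helpline text are
# illustrative, not production-vetted.
CRISIS_KEYWORDS = ("self-harm", "suicide", "hurt myself")

SAFE_RESPONSE = (
    "Please know there is immediate help available. You can call the "
    "SAMHSA National Helpline at 1-800-662-HELP (4357)."
)

SAFETY_DIRECTIVE = (
    "IF the user mentions self-harm, THEN your entire response must be "
    f"ONLY this: {SAFE_RESPONSE}"
)

def build_messages(user_question):
    """Embed the IF-THEN rule in the system prompt on every call."""
    system = "You are a supportive diabetes-care assistant. " + SAFETY_DIRECTIVE
    return [{"role": "system", "content": system},
            {"role": "user", "content": user_question}]

def respond(user_question, llm_call):
    """Belt-and-suspenders: a deterministic check backs up the prompt rule."""
    if any(k in user_question.lower() for k in CRISIS_KEYWORDS):
        return SAFE_RESPONSE          # scripted override, no model call
    return llm_call(build_messages(user_question))

print(respond("I've been thinking about self-harm.", llm_call=lambda m: ""))
```

Because the override is plain text in the prompt, it can be audited and updated per region without any retraining.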
Key Technique 2: Few-Shot / One-Shot Examples
What it is: Including examples of the desired input and output directly within the user’s prompt.
What it solves: Teaches the model the exact format you want for the output. It’s incredibly effective for fixing unreliable formatting issues.
This teaches the LLM the desired output format and local accuracy by providing example Q&A pairs.
Example:
When asked about support groups in Orlando, the prompt’s examples instructed the model to mention organizations such as
- Orlando Health
- AdventHealth
- American Diabetes Association (ADA)
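A sketch of how such example pairs are folded into the prompt. The Q&A pair below is adapted from the article's Orlando example; the wrapper text is an assumption:

```python
# One-/few-shot prompting sketch: example Q&A pairs are prepended so the
# model imitates their format and local specificity.
EXAMPLES = [
    {
        "q": "Are there any diabetes support groups in Orlando?",
        "a": ("Yes. Local options include Orlando Health, AdventHealth, "
              "and the American Diabetes Association (ADA)."),
    },
]

def few_shot_prompt(question):
    parts = ["Answer in the same style as these examples.\n"]
    for ex in EXAMPLES:
        parts.append(f"Q: {ex['q']}\nA: {ex['a']}\n")
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)

print(few_shot_prompt("Where can I get diabetes education in Orlando?"))
```

Adding even one well-chosen example is often enough to lock in both the output structure and the expected level of local detail.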
| Alex's Question | Old Response | New Response |
|---|---|---|
| Are there any diabetes support groups in Orlando? | Unfortunately, I don’t have access to real-time local listings of support groups | There are several organizations and centers in Orlando that provide support to people with diabetes and their families: local hospitals like Orlando Health and AdventHealth, the American Diabetes Association (ADA), YMCA wellness programs, community health centers, and diabetes education programs offered through local clinics. Your doctor or diabetes educator may also know of nearby groups. |
| Is it possible to get life or health insurance when living with type 2 diabetes? | Some companies specialize in coverage for individuals with chronic conditions | Free government support. Insulin for people with diabetes on insulin therapy is provided free of charge through an endocrinologist in accordance with the National Diabetes Plan, which has been in effect since 1999. This is general information, and it’s very important to discuss a plan that’s right for you with your doctor. |
| I'm feeling very overwhelmed and down after my diagnosis. What support can I find? | The Trevor Project: 1-866-488-7386 | It sounds like you are going through a difficult time. In the USA, you can reach out to support lines and text services for guidance and emotional support: 988 Lifeline & Crisis Chat, text HOME to 741741. |
| I was just diagnosed with type 2 diabetes, and I'm scared. What should I do? | Commonly, managing type 2 diabetes involves a combination of lifestyle changes like regular physical activity | It’s completely understandable to feel scared after a diagnosis like type 2 diabetes. It’s a lot to take in! First, allow yourself to feel those emotions – it’s a normal reaction. Now, let's focus on taking things one step at a time. Learning as much as you can about type 2 diabetes is a great start. There are many reliable resources available. Common dietary approaches for managing blood sugar often include focusing on whole foods and controlling portion sizes. Regular physical activity is also really important. |
| How much does it cost to treat Type 2 Diabetes in the USA? | I can’t give you an exact cost estimate for treatment in the USA | Costs vary widely depending on insurance, treatment, and location. As a general reference: routine doctor visits can cost $100–$250 per visit, lab tests $50–$200 each, medications like metformin $4–$20 per month (generic) or insulin $150–$500 per month depending on type and dose. |
Results
Prompt engineering transformed vague answers into highly specific, structured ones without modifying the model.
| Pros (Advantages) | Cons (Disadvantages) |
|---|---|
| Rapid Implementation: Changes are instant. | Increased Token Cost: Longer prompts cost more to process with every API call. |
| No Training Required: Zero compute cost for setup. | Brittle for Complex Logic: Complex rules can become hard to manage and may break easily. |
| Highly Flexible: Easy to test and adapt on the fly. | Cannot Add New Knowledge: It can't make the model know things it wasn't trained on. |
| Transparent: The logic is human-readable in the prompt. | |
Method 2 — Retrieval-Augmented Generation (RAG)
RAG solves the “knowledge gap” by connecting the LLM to an external, trusted data source. Instead of relying on the model’s static, pre-trained memory, we give it the exact information it needs to answer a question, just in time.
How RAG Works (The 3-Step Process):
- Retrieve: When a user asks a question, the system first searches a private knowledge base (like a collection of documents or a database) for relevant information. This is often done using a Vector Database.
- Augment: The relevant text snippets found in the search are then combined with the user’s original question to create a new, much more detailed prompt.
- Generate: This “augmented” prompt is sent to the LLM. The model now has all the necessary context to generate a factual, accurate answer based on the provided data.
Analogy:
You’re not asking the model to remember the answer to a test question. You’re giving it the textbook and telling it which page to read.
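The three steps can be sketched end to end in a few lines. A real pipeline scores documents with dense embeddings in a vector database; plain word overlap stands in here, and the knowledge-base snippets are illustrative:

```python
# Toy retrieve-augment loop. Real systems use embeddings + a vector DB.
KNOWLEDGE_BASE = [
    "Orlando Health runs diabetes education and support programs.",
    "Generic metformin typically costs $4-$20 per month.",
    "The 988 Lifeline offers crisis support by phone and chat.",
]

def retrieve(question, k=1):
    """Step 1: rank documents by shared words with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def augment(question):
    """Step 2: fold the retrieved snippets into a grounded prompt."""
    context = "\n".join(retrieve(question))
    return f"Use only this context to answer.\nContext:\n{context}\nQ: {question}"

# Step 3 would send the augmented prompt to the LLM for generation.
print(augment("How much does metformin cost per month?"))
```

Swapping the word-overlap scorer for an embedding model changes nothing else in the flow, which is why RAG stacks are easy to upgrade incrementally.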
| Alex's Question | Old Response | New Response |
|---|---|---|
| Are there any diabetes support groups in Orlando? | Unfortunately, I don’t have access to real-time local listings of support groups | It’s understandable to want to connect with others facing similar challenges. There are several organizations and centers in Orlando that provide support to people with diabetes and their families. Local hospitals like Orlando Health are large, not-for-profit healthcare networks in Central Florida, offering hospitals, specialty institutes, outpatient clinics, and urgent care centers. It provides comprehensive services, including chronic disease management and diabetes care. |
| Is it possible to get life or health insurance when living with type 2 diabetes? | Some companies specialize in coverage for individuals with chronic conditions | It’s understandable to be concerned about insurance when managing type 2 diabetes. Currently, the government provides support like free insulin for people on insulin therapy through the National Diabetes Plan. |
| I'm feeling very overwhelmed and down after my diagnosis. What support can I find? | The Trevor Project: 1-866-488-7386 | It sounds like you are going through a difficult time. Please know that support is available. You can reach out to resources in the USA, such as 988 Lifeline & Crisis Chat or by texting HOME to 741741. |
| I was just diagnosed with type 2 diabetes, and I'm scared. What should I do? | Commonly, managing type 2 diabetes involves a combination of lifestyle changes like regular physical activity | First, take a deep breath. Many people find support groups and online communities helpful. Learning about your condition is a great first step – focus on understanding what diabetes is and how it affects your body. Common dietary approaches for managing blood sugar often include focusing on whole foods, limiting sugary drinks, and controlling portion sizes. Regular physical activity is also really important. |
| How much does it cost to treat Type 2 Diabetes in the USA? | I can’t give you an exact cost estimate for treatment in the USA | Examples of Treatment Costs: doctor visits can range from $100–$250, lab tests $50–$200 each, medications like metformin $4–$20/month (generic) or insulin $150–$500/month, and diabetes education programs $0–$200 per session. Checking with your insurance provider and healthcare team can give you a more accurate estimate for your situation. |
Results
RAG cut hallucinations dramatically and supported accurate, current responses for local healthcare situations.
| Pros (Advantages) | Cons (Disadvantages) |
|---|---|
| Reduces Hallucinations: Answers are grounded in real, provided data. | Depends on Retrieval Quality: "Garbage in, garbage out." If the search finds the wrong document, the answer will be wrong. |
| Easily Updatable Knowledge: Just add a new document to the database; no retraining needed. | Requires Infrastructure: You need to set up and maintain a vector database and an embedding pipeline. |
| Transparent & Citable: You know exactly which source document was used to generate an answer. | Increased Latency: The extra retrieval step can make the response slightly slower than a direct query. |
| Cost-Effective for Knowledge: Cheaper to update a database than to retrain a model. | Doesn't Teach New Skills: Cannot teach the model a new behavior or reasoning style, only provide it with new facts. |
Method 3 — Parameter-Efficient Fine-Tuning (PEFT)
PEFT teaches the model new skills by training only a small set of additional parameters (e.g., LoRA, QLoRA).
It is ideal for:
- Adding medical writing style
- Teaching local terminology
- Reducing generic responses
- Creating clinical persona consistency
Why PEFT?
- PEFT (Parameter-Efficient Fine-Tuning) is the modern, efficient way to do fine-tuning.
- Instead of retraining the entire model (which is slow and expensive), PEFT freezes the original model and only trains a tiny set of new parameters, like a small plugin or “adapter.”
- This is much faster, requires less data, and results in a tiny adapter file, not a whole new multi-gigabyte model.
| Full Fine-Tuning (Old Way) | PEFT (e.g., LoRA - New Way) |
|---|---|
| Changes all billions of parameters. | Freezes the original model, adds a few thousand new parameters. |
| Requires massive GPUs and is very expensive. | Can be done on a single, free Colab GPU. |
| Creates a completely new, large model file. | Creates a tiny, portable "adapter" file. |
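The size difference is easy to see numerically. A sketch of the LoRA idea, with illustrative layer dimensions and rank (the effective weight is the frozen matrix plus a low-rank update):

```python
import numpy as np

# LoRA sketch: freeze W, train only the low-rank update B @ A.
d_out, d_in, rank = 4096, 4096, 8          # illustrative layer sizes

W = np.zeros((d_out, d_in))                # frozen pretrained weight
A = np.random.randn(rank, d_in) * 0.01     # trainable, small init
B = np.zeros((d_out, rank))                # trainable, starts at zero

W_adapted = W + B @ A                      # effective weight at inference

full_params = W.size
adapter_params = A.size + B.size
print(f"full: {full_params:,}  adapter: {adapter_params:,}")
# The adapter is a fraction of a percent of the full layer's parameters.
```

Because B starts at zero, the adapted model is identical to the base model before training begins, and the saved artifact is just the small A and B matrices.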
What is QLoRA?
QLoRA (Quantized Low-Rank Adaptation) is a technique that makes fine-tuning more accessible. It follows a simple but powerful two-step process:
- Quantize: It takes a large, pre-trained model and shrinks it down to a 4-bit version.
- Adapt: It then performs the standard LoRA fine-tuning on this smaller, quantized model.
By training on the compressed model, QLoRA dramatically reduces the memory needed for fine-tuning, making it possible to adapt huge models on a single, consumer-grade GPU.
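The "quantize" step can be illustrated with a toy absmax scheme that maps weights onto a signed 4-bit range. Real QLoRA uses the NF4 data type with per-block scales, so treat this as a simplified sketch:

```python
import numpy as np

# Toy absmax 4-bit quantization: the "shrink" step applied before
# attaching LoRA adapters. Real QLoRA uses NF4 with blockwise scales.
def quantize_4bit(w):
    scale = np.abs(w).max() / 7.0          # map range onto integers [-7, 7]
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.42, -1.30, 0.07, 0.91], dtype=np.float32)
q, scale = quantize_4bit(w)
w_hat = dequantize(q, scale)
print(q, np.abs(w - w_hat).max())          # small reconstruction error
```

The quantized weights stay frozen; gradients flow only into the full-precision LoRA adapters, which is what keeps memory use low.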
Example Results
After QLoRA fine-tuning:
- Insurance answers became more detailed
- Local references improved
- Tone became more empathetic
- Responses were structured and action-oriented
- The model avoided irrelevant clinical references (e.g., dialysis)
Fine-Tuning (PEFT) pros and cons
| Pros (Advantages) | Cons (Disadvantages) |
|---|---|
| Highly Accessible: QLoRA makes it possible to fine-tune large models on a single, consumer-grade (or free Colab) GPU. | Slight Performance Trade-Off: Can sometimes be slightly less performant than a full fine-tune, though the difference is often negligible. |
| Small & Portable: Creates tiny adapter files (megabytes) instead of full model copies (gigabytes). | Inference Complexity: Requires loading the base model and the adapter, which can add a step to deployment. |
| Modular: You can have one base model and easily switch between many different skill "adapters." | Still Needs Quality Data: PEFT is efficient, but it can't fix a poorly constructed or noisy dataset. |
| Prevents Catastrophic Forgetting: By freezing the original model, you protect its vast general knowledge from being damaged during training. | "Adapter Soup": Managing dozens of different adapter files for different tasks can become a logistical challenge. |
Side-by-Side Comparison: How Each Method Changes an LLM Response
| Method | Prompt Engineering | Retrieval-Augmented Generation (RAG) | PEFT (Fine-Tuning) |
|---|---|---|---|
| What it Solves | Controls tone, style, and simple formatting. | Solves the "knowledge gap" by providing external data. | Teaches the model a new, reliable skill or behavior. |
| Core Mechanism | Detailed instructions in the prompt. | "Open-book exam": retrieves info, then answers. | "Training course": updates model weights on examples. |
| Best Use Case | Setting a persona, enforcing safety rules, quick formatting. | Answering questions based on private or recent documents. | Consistently producing structured output (e.g., JSON), adopting a complex writing style. |
| Effort Level | Low | Medium | High |
Each method progressively enhances control, accuracy, and user trust.
How to Improve LLM Responses
Whether you’re building a medical chatbot, a triage assistant, or a patient support system, improving each LLM response requires a layered strategy:
1. Start with Prompt Engineering
Set personality rules, add safety instructions, provide examples, and define structure, tone, and disclaimers.
2. Add RAG for Accuracy
Use embeddings and a vector database to pull in verified medical policies, clinic details, insurance info, and an always-updated knowledge base.
3. Use PEFT for Domain Behavior
Train the model on your organization’s communication style and real conversations to enhance accuracy in specialized areas like diabetes, oncology, or mental health.
The best systems combine all three layers.
How to Evaluate LLM Responses
Evaluation must be systematic, especially in healthcare.
Key Metrics
- Accuracy: Is the medical info correct and current?
- Safety: Does it flag emergencies or high-risk situations?
- Tone & Empathy: Is the language supportive and human?
- Relevance: Is content tailored to user context and region?
- Structure: Does output follow the expected format?
- Consistency: Are responses similar in quality across variations?
Testing Approach
- Persona-based scenarios
- Edge cases and contradictory questions
- Long multi-turn dialogues
- Localization checks (laws, clinics, cultural norms)
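Several of these metrics can be turned into automated checks that run over every test scenario. A minimal sketch, where the keyword lists and checks are illustrative and human review still covers tone and clinical accuracy:

```python
import json

# Illustrative automated checks for a few of the metrics above.
def evaluate_response(text):
    return {
        # Safety: crisis resources present when the topic is high-risk.
        "mentions_crisis_line": "988" in text or "741741" in text,
        # Structure: output parses as JSON when a structured format is required.
        "valid_json": _is_json(text),
        # Tone: at least one empathetic phrase appears.
        "empathetic": any(p in text.lower()
                          for p in ("understandable", "you're not alone")),
    }

def _is_json(text):
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

print(evaluate_response("It's understandable to feel scared. Call 988."))
```

Running checks like these across persona scenarios and edge cases turns evaluation from spot-checking into a regression suite you can re-run after every prompt or model change.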
When to Use Each Method in Healthcare Applications
| Situation | Best Method | Why |
|---|---|---|
| Need immediate behavior change | Prompt Engineering | Fast, zero training |
| Need factual accuracy tied to real documents | RAG | Uses verified medical data |
| Need reliable, consistent, domain-specific behavior | PEFT | Trained clinical skills |
| Need empathy, tone, and persona shaping | Prompt + PEFT | Tone & training |
| Need structured formats (JSON, reports) | PEFT | Highest consistency |
| Need local guidelines or dynamic content | RAG | Update knowledge instantly |
| Need safety-critical escalations | Prompt Engineering | Override rules |
Healthcare systems usually require a hybrid stack: Prompt Engineering + RAG + PEFT.
Final Thoughts
Changing and controlling LLM responses is not optional—it’s essential for safe, accurate, and trustworthy AI in healthcare.
Our case study shows that:
- Prompt Engineering delivers instant control over tone, format, and safety behavior.
- RAG grounds answers in verified, up-to-date local data and sharply reduces hallucinations.
- PEFT builds consistent, domain-specific behavior into the model itself.
Together, these methods form the foundation for AI systems in high-stakes environments. At DevCom, we combine structured methods, domain datasets, and robust evaluation to turn LLMs into specialized, reliable assistants. For support designing controlled LLM systems in healthcare, finance, or regulated industries, our team is ready to help.

