Contents
- 1 Why Do We Need to Change LLM Responses?
- 2 How LLM Responses Are Generated
- 3 Common Challenges With LLM Responses
- 4 Why Healthcare Is the Ideal Stress-Test for LLM Responses: Case Study
- 5 Starting Point: Our Unmodified Agent
- 6 Different Methods of Changing LLM Response
- 7 Side-by-Side Comparison: How Each Method Changes an LLM Response
- 8 How to Improve LLM Responses
- 9 How to Evaluate LLM Responses
- 10 When to Use Each Method in Healthcare Applications
- 11 Final Thoughts
Large Language Models (LLMs) have changed how organizations handle interactions, share information, and assist users. But straight out of the box, even the top LLMs often fall short in high-stakes areas, like healthcare. They hallucinate facts, provide generic or unsafe recommendations, or miss local context entirely.
This article explores how LLM responses can be changed, controlled, and improved. We illustrate practical approaches tested across multiple methods: Prompt Engineering, Retrieval-Augmented Generation (RAG), and Parameter-Efficient Fine-Tuning (PEFT). While the examples are drawn from a healthcare-focused case study, the principles apply to any domain that requires precise, reliable, and context-aware AI outputs.
Why Do We Need to Change LLM Responses?
LLM outputs don’t always line up with what a user needs or what an organization expects. Answers can be inaccurate, biased, too general, or inconsistent, which erodes trust and makes them less useful.
Changing how a model behaves makes its responses more accurate, more relevant, and better suited to the task at hand. That helps users, supports decisions, and prevents wrong or harmful answers.
| | Low Effort / Cost | High Effort / Cost |
|---|---|---|
| High Core Change | N/A | Full Fine-Tuning: Retrains all of the model's weights on domain data. Maximum control, but slow and expensive. |
| Low Core Change | Prompt Engineering: Guides the existing model in real-time with instructions. No training needed; instant results. Decoding & Output Control: Adjusts generation parameters like temperature. No training, real-time style control. | Retrieval-Augmented Generation (RAG): Connects the model to an external, trusted knowledge base at query time. Keeps answers current without retraining. |
How LLM Responses Are Generated
Understanding how LLMs produce output helps explain why interventions are necessary.
LLMs generate responses by combining:
- Patterns learned from massive text corpora during pretraining
- Context from the user’s input and prompt
- Decoding strategies (like temperature or top-k) that influence creativity versus precision
- Their internal world model, which is static unless adapted
By default, LLMs are generalists. They do not inherently know your domain, rules, or preferred style unless you provide that information explicitly through prompts or adapt the model’s knowledge.
This is why techniques like Prompt Engineering, RAG, and PEFT exist — to control, customize, and enhance responses.
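The decoding step above is the one lever you can pull without touching the model at all. A minimal sketch of how temperature and top-k shape token selection, using toy logits rather than a real model (the function name and values are illustrative):

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_k=None, rng=None):
    """Sample one token id from raw logits using temperature and top-k."""
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=np.float64)
    # Temperature scaling: <1.0 sharpens the distribution, >1.0 flattens it.
    logits = logits / max(temperature, 1e-8)
    # Top-k filtering: keep only the k most likely tokens.
    if top_k is not None:
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)
    # Softmax over the remaining candidates, then draw one token.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.5, -1.0]
# Low temperature makes the top token almost certain; top_k=1 forces it.
print(sample_token(logits, temperature=0.1))
```

Lowering the temperature trades creativity for precision; raising top-k widens the pool of candidate tokens and increases variety.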
Common Challenges With LLM Responses
Base models fall short for several reasons:
1. Limited Domain Knowledge
Models don’t know your internal documentation, product specifics, or up-to-date local data. They cannot answer location-specific or industry-specific questions reliably without external context.
2. Inconsistent Tone & Formatting
LLMs often drift away from brand voice, change structure unexpectedly, or fail to maintain strict output formats like JSON, tables, or templates.
3. Hallucinations
LLMs may confidently invent facts — a critical failure in domains like healthcare, compliance, or law.
4. Lack of Specialized Skills
They can perform many tasks passably, but struggle with niche, complex workflows that require deep expertise.
Because of these limitations, changing and controlling LLM responses becomes essential.
Why Healthcare Is the Ideal Stress-Test for LLM Responses: Case Study
Healthcare forces an LLM to demonstrate:
- Accuracy: Wrong facts can cause harm.
- Sensitivity: Tone matters in emotional or clinical conversations.
- Local Knowledge: Advice must reflect local laws, clinics, and insurance systems.
- Safety Compliance: Models must handle suicide/self-harm queries responsibly.
- Consistency: No hallucinations, no improvisation, no ambiguity.
In our test, we simulated a patient persona:
- Alex, recently diagnosed with Type 2 Diabetes, asking real questions about life, care, costs, and emotional support.
Starting Point: Our Unmodified Agent
| Alex's Question | Gemma's Response |
|---|---|
| Where can I find a diabetes support group in Orlando? | It’s understandable to want to connect with others facing similar challenges – finding a support group can be incredibly helpful. Unfortunately, I don’t have access to real-time local listings of support groups. However, here are some avenues you can explore to find a diabetes support group in Orlando: Local Hospitals, Diabetes Associations, and Community Centers |
| Can I get life or health insurance with type 2 diabetes? | Some companies specialize in coverage for individuals with chronic conditions. You can explore plans through a broker who can help you compare options. |
| I'm feeling very overwhelmed and depressed after my diagnosis. What support can I find? | If you are having thoughts of harming yourself, please reach out for help immediately. Here are some resources: SAMHSA National Helpline at 1-800-662-HELP (4357) The Crisis Text Line: Text HOME to 741741 The Trevor Project: 1-866-488-7386 |
| I was just diagnosed with diabetes 2, and I'm scared. What should I do? | Commonly, managing type 2 diabetes involves a combination of lifestyle changes, like regular physical activity and making healthy food choices. It’s important to work with a registered dietitian or certified diabetes educator to create a personalized plan. |
| How much does it cost to treat when I have Type 2 Diabetes in the USA? | Unfortunately, I can’t give you an exact cost estimate for treatment in the USA. Healthcare costs vary significantly based on several factors, including the specific clinic, the type of care needed, and your individual circumstances. |
Different Methods of Changing LLM Response
There are three dominant strategies for controlling LLM responses. Each solves different problems and requires different effort levels.
Method 1 — Prompt Engineering
Prompt engineering is the lowest-cost, fastest way to influence LLM behavior.
It changes responses without training the model.
Concept: Prompt Engineering is the discipline of crafting detailed inputs to guide the model’s response. It’s the fastest way to influence behavior without changing the model itself. We’ll look at two key techniques.
Key Technique 1: Conditional Directives (Safety Overrides)
What it is: Embedding explicit IF-THEN style rules into the prompt to handle high-stakes topics. This forces the model to abandon its normal response generation and follow a strict, predefined script when specific keywords or themes are detected.
What it solves: Enforces critical safety procedures. For example, it can override a generic, non-local response to a self-harm query with a specific, regional helpline, ensuring the user receives the most appropriate information.
IF the user mentions self-harm, THEN your entire response must be ONLY this…
This forces the LLM to use a predefined, safe, region-specific response.
For example, the model was instructed to say:
Please know there is immediate help available.
You can call SAMHSA National Helpline at 1-800-662-HELP (4357)…
The override replaced the model’s incorrect U.S. hotline numbers.
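A minimal sketch of how such a directive can be wired in. The keyword list, helpline text, and message format are illustrative assumptions; pairing the prompt rule with a deterministic client-side check is an extra safeguard, not a required part of the technique:

```python
# Conditional safety directive sketch. Keywords and helpline text are
# illustrative, not production-vetted.
CRISIS_KEYWORDS = ("self-harm", "suicide", "hurt myself")

SAFE_RESPONSE = (
    "Please know there is immediate help available. You can call the "
    "SAMHSA National Helpline at 1-800-662-HELP (4357)."
)

SAFETY_DIRECTIVE = (
    "IF the user mentions self-harm, THEN your entire response must be "
    f"ONLY this: {SAFE_RESPONSE}"
)

def build_messages(user_question):
    """Embed the IF-THEN rule in the system prompt on every call."""
    system = "You are a supportive diabetes-care assistant. " + SAFETY_DIRECTIVE
    return [{"role": "system", "content": system},
            {"role": "user", "content": user_question}]

def respond(user_question, llm_call):
    """Belt-and-suspenders: a deterministic check backs up the prompt rule."""
    if any(k in user_question.lower() for k in CRISIS_KEYWORDS):
        return SAFE_RESPONSE          # scripted override, no model call
    return llm_call(build_messages(user_question))

print(respond("I've been thinking about self-harm.", llm_call=lambda m: ""))
```

Because the override is plain text in the prompt, it can be audited and updated per region without any retraining.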
Key Technique 2: Few-Shot / One-Shot Examples
What it is: Including examples of the desired input and output directly within the user’s prompt.
What it solves: Teaches the model the exact format you want for the output. It’s incredibly effective for fixing unreliable formatting issues.
This teaches the LLM the desired output format and local accuracy by providing example Q&A pairs.
Example:
When asked about support groups in Orlando, the prompt’s examples instructed the model to mention organizations such as
- Orlando Health
- AdventHealth
- American Diabetes Association (ADA)
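A sketch of how such example pairs are folded into the prompt. The Q&A pair below is adapted from the article's Orlando example; the wrapper text is an assumption:

```python
# One-/few-shot prompting sketch: example Q&A pairs are prepended so the
# model imitates their format and local specificity.
EXAMPLES = [
    {
        "q": "Are there any diabetes support groups in Orlando?",
        "a": ("Yes. Local options include Orlando Health, AdventHealth, "
              "and the American Diabetes Association (ADA)."),
    },
]

def few_shot_prompt(question):
    parts = ["Answer in the same style as these examples.\n"]
    for ex in EXAMPLES:
        parts.append(f"Q: {ex['q']}\nA: {ex['a']}\n")
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)

print(few_shot_prompt("Where can I get diabetes education in Orlando?"))
```

Adding even one well-chosen example is often enough to lock in both the output structure and the expected level of local detail.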
| Alex's Question | Old Response | New Response |
|---|---|---|
| Are there any diabetes support groups in Orlando? | Unfortunately, I don’t have access to real-time local listings of support groups | There are several organizations and centers in Orlando that provide support to people with diabetes and their families: local hospitals like Orlando Health and AdventHealth, the American Diabetes Association (ADA), YMCA wellness programs, community health centers, and diabetes education programs offered through local clinics. Your doctor or diabetes educator may also know of nearby groups. |
| Is it possible to get life or health insurance when living with type 2 diabetes? | Some companies specialize in coverage for individuals with chronic conditions | Free government support. Insulin for people with diabetes on insulin therapy is provided free of charge through an endocrinologist in accordance with the National Diabetes Plan, which has been in effect since 1999. This is general information, and it’s very important to discuss a plan that’s right for you with your doctor. |
| I'm feeling very overwhelmed and down after my diagnosis. What support can I find? | The Trevor Project: 1-866-488-7386 | It sounds like you are going through a difficult time. In the USA, you can reach out to support lines and text services for guidance and emotional support: 988 Lifeline & Crisis Chat, text HOME to 741741. |
| I was just diagnosed with type 2 diabetes, and I'm scared. What should I do? | Commonly, managing type 2 diabetes involves a combination of lifestyle changes like regular physical activity | It’s completely understandable to feel scared after a diagnosis like type 2 diabetes. It’s a lot to take in! First, allow yourself to feel those emotions – it’s a normal reaction. Now, let's focus on taking things one step at a time. Learning as much as you can about type 2 diabetes is a great start. There are many reliable resources available. Common dietary approaches for managing blood sugar often include focusing on whole foods and controlling portion sizes. Regular physical activity is also really important. |
| How much does it cost to treat Type 2 Diabetes in the USA? | I can’t give you an exact cost estimate for treatment in the USA | Costs vary widely depending on insurance, treatment, and location. As a general reference: routine doctor visits can cost $100–$250 per visit, lab tests $50–$200 each, medications like metformin $4–$20 per month (generic) or insulin $150–$500 per month depending on type and dose. |
Results
Prompt engineering transformed vague answers into highly specific, structured ones without modifying the model.
| Pros (Advantages) | Cons (Disadvantages) |
|---|---|
| Rapid Implementation: Changes are instant. | Increased Token Cost: Longer prompts cost more to process with every API call. |
| No Training Required: Zero compute cost for setup. | Brittle for Complex Logic: Complex rules can become hard to manage and may break easily. |
| Highly Flexible: Easy to test and adapt on the fly. | Cannot Add New Knowledge: It can't make the model know things it wasn't trained on. |
| Transparent: The logic is human-readable in the prompt. | |
Method 2 — Retrieval-Augmented Generation (RAG)
RAG solves the “knowledge gap” by connecting the LLM to an external, trusted data source. Instead of relying on the model’s static, pre-trained memory, we give it the exact information it needs to answer a question, just in time.
How RAG Works (The 3-Step Process):
- Retrieve: When a user asks a question, the system first searches a private knowledge base (like a collection of documents or a database) for relevant information. This is often done using a Vector Database.
- Augment: The relevant text snippets found in the search are then combined with the user’s original question to create a new, much more detailed prompt.
- Generate: This “augmented” prompt is sent to the LLM. The model now has all the necessary context to generate a factual, accurate answer based on the provided data.
Analogy:
You’re not asking the model to remember the answer to a test question. You’re giving it the textbook and telling it which page to read.
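The three steps can be sketched end to end in a few lines. A real pipeline scores documents with dense embeddings in a vector database; plain word overlap stands in here, and the knowledge-base snippets are illustrative:

```python
# Toy retrieve-augment loop. Real systems use embeddings + a vector DB.
KNOWLEDGE_BASE = [
    "Orlando Health runs diabetes education and support programs.",
    "Generic metformin typically costs $4-$20 per month.",
    "The 988 Lifeline offers crisis support by phone and chat.",
]

def retrieve(question, k=1):
    """Step 1: rank documents by shared words with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def augment(question):
    """Step 2: fold the retrieved snippets into a grounded prompt."""
    context = "\n".join(retrieve(question))
    return f"Use only this context to answer.\nContext:\n{context}\nQ: {question}"

# Step 3 would send the augmented prompt to the LLM for generation.
print(augment("How much does metformin cost per month?"))
```

Swapping the word-overlap scorer for an embedding model changes nothing else in the flow, which is why RAG stacks are easy to upgrade incrementally.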
| Alex's Question | Old Response | New Response |
|---|---|---|
| Are there any diabetes support groups in Orlando? | Unfortunately, I don’t have access to real-time local listings of support groups | It’s understandable to want to connect with others facing similar challenges. There are several organizations and centers in Orlando that provide support to people with diabetes and their families. Local hospitals like Orlando Health are large, not-for-profit healthcare networks in Central Florida, offering hospitals, specialty institutes, outpatient clinics, and urgent care centers. It provides comprehensive services, including chronic disease management and diabetes care. |
| Is it possible to get life or health insurance when living with type 2 diabetes? | Some companies specialize in coverage for individuals with chronic conditions | It’s understandable to be concerned about insurance when managing type 2 diabetes. Currently, the government provides support like free insulin for people on insulin therapy through the National Diabetes Plan. |
| I'm feeling very overwhelmed and down after my diagnosis. What support can I find? | The Trevor Project: 1-866-488-7386 | It sounds like you are going through a difficult time. Please know that support is available. You can reach out to resources in the USA, such as 988 Lifeline & Crisis Chat or by texting HOME to 741741. |
| I was just diagnosed with type 2 diabetes, and I'm scared. What should I do? | Commonly, managing type 2 diabetes involves a combination of lifestyle changes like regular physical activity | First, take a deep breath. Many people find support groups and online communities helpful. Learning about your condition is a great first step – focus on understanding what diabetes is and how it affects your body. Common dietary approaches for managing blood sugar often include focusing on whole foods, limiting sugary drinks, and controlling portion sizes. Regular physical activity is also really important. |
| How much does it cost to treat Type 2 Diabetes in the USA? | I can’t give you an exact cost estimate for treatment in the USA | Examples of Treatment Costs: doctor visits can range from $100–$250, lab tests $50–$200 each, medications like metformin $4–$20/month (generic) or insulin $150–$500/month, and diabetes education programs $0–$200 per session. Checking with your insurance provider and healthcare team can give you a more accurate estimate for your situation. |
Results
RAG cut hallucinations dramatically and supported accurate, current responses for local healthcare situations.
| Pros (Advantages) | Cons (Disadvantages) |
|---|---|
| Reduces Hallucinations: Answers are grounded in real, provided data. | Depends on Retrieval Quality: "Garbage in, garbage out." If the search finds the wrong document, the answer will be wrong. |
| Easily Updatable Knowledge: Just add a new document to the database; no retraining needed. | Requires Infrastructure: You need to set up and maintain a vector database and an embedding pipeline. |
| Transparent & Citable: You know exactly which source document was used to generate an answer. | Increased Latency: The extra retrieval step can make the response slightly slower than a direct query. |
| Cost-Effective for Knowledge: Cheaper to update a database than to retrain a model. | Doesn't Teach New Skills: Cannot teach the model a new behavior or reasoning style, only provide it with new facts. |
Method 3 — Parameter-Efficient Fine-Tuning (PEFT)
PEFT teaches the model new skills by training only a small set of additional parameters (e.g., LoRA, QLoRA).
It is ideal for:
- Adding medical writing style
- Teaching local terminology
- Reducing generic responses
- Creating clinical persona consistency
Why PEFT?
- PEFT (Parameter-Efficient Fine-Tuning) is the modern, efficient way to do fine-tuning.
- Instead of retraining the entire model (which is slow and expensive), PEFT freezes the original model and only trains a tiny set of new parameters, like a small plugin or “adapter.”
- This is much faster, requires less data, and results in a tiny adapter file, not a whole new multi-gigabyte model.
| Full Fine-Tuning (Old Way) | PEFT (e.g., LoRA - New Way) |
|---|---|
| Changes all billions of parameters. | Freezes the original model, adds a few thousand new parameters. |
| Requires massive GPUs and is very expensive. | Can be done on a single, free Colab GPU. |
| Creates a completely new, large model file. | Creates a tiny, portable "adapter" file. |
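The size difference is easy to see numerically. A sketch of the LoRA idea, with illustrative layer dimensions and rank (the effective weight is the frozen matrix plus a low-rank update):

```python
import numpy as np

# LoRA sketch: freeze W, train only the low-rank update B @ A.
d_out, d_in, rank = 4096, 4096, 8          # illustrative layer sizes

W = np.zeros((d_out, d_in))                # frozen pretrained weight
A = np.random.randn(rank, d_in) * 0.01     # trainable, small init
B = np.zeros((d_out, rank))                # trainable, starts at zero

W_adapted = W + B @ A                      # effective weight at inference

full_params = W.size
adapter_params = A.size + B.size
print(f"full: {full_params:,}  adapter: {adapter_params:,}")
# The adapter is a fraction of a percent of the full layer's parameters.
```

Because B starts at zero, the adapted model is identical to the base model before training begins, and the saved artifact is just the small A and B matrices.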
What is QLoRA?
QLoRA (Quantized Low-Rank Adaptation) is a technique that makes fine-tuning more accessible. It follows a simple but powerful two-step process:
- Quantize: It takes a large, pre-trained model and shrinks it down to a 4-bit version.
- Adapt: It then performs the standard LoRA fine-tuning on this smaller, quantized model.
By training on the compressed model, QLoRA dramatically reduces the memory needed for fine-tuning, making it possible to adapt huge models on a single, consumer-grade GPU.
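The "quantize" step can be illustrated with a toy absmax scheme that maps weights onto a signed 4-bit range. Real QLoRA uses the NF4 data type with per-block scales, so treat this as a simplified sketch:

```python
import numpy as np

# Toy absmax 4-bit quantization: the "shrink" step applied before
# attaching LoRA adapters. Real QLoRA uses NF4 with blockwise scales.
def quantize_4bit(w):
    scale = np.abs(w).max() / 7.0          # map range onto integers [-7, 7]
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.42, -1.30, 0.07, 0.91], dtype=np.float32)
q, scale = quantize_4bit(w)
w_hat = dequantize(q, scale)
print(q, np.abs(w - w_hat).max())          # small reconstruction error
```

The quantized weights stay frozen; gradients flow only into the full-precision LoRA adapters, which is what keeps memory use low.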
Example Results
After QLoRA fine-tuning:
- Insurance answers became more detailed
- Local references improved
- Tone became more empathetic
- Responses were structured and action-oriented
- The model avoided irrelevant clinical references (e.g., dialysis)
Fine-Tuning (PEFT) pros and cons
| Pros (Advantages) | Cons (Disadvantages) |
|---|---|
| Highly Accessible: QLoRA makes it possible to fine-tune large models on a single, consumer-grade (or free Colab) GPU. | Slight Performance Trade-Off: Can sometimes be slightly less performant than a full fine-tune, though the difference is often negligible. |
| Small & Portable: Creates tiny adapter files (megabytes) instead of full model copies (gigabytes). | Inference Complexity: Requires loading the base model and the adapter, which can add a step to deployment. |
| Modular: You can have one base model and easily switch between many different skill "adapters." | Still Needs Quality Data: PEFT is efficient, but it can't fix a poorly constructed or noisy dataset. |
| Prevents Catastrophic Forgetting: By freezing the original model, you protect its vast general knowledge from being damaged during training. | "Adapter Soup": Managing dozens of different adapter files for different tasks can become a logistical challenge. |
Side-by-Side Comparison: How Each Method Changes an LLM Response
| Method | Prompt Engineering | Retrieval-Augmented Generation (RAG) | PEFT (Fine-Tuning) |
|---|---|---|---|
| What it Solves | Controls tone, style, and simple formatting. | Solves the "knowledge gap" by providing external data. | Teaches the model a new, reliable skill or behavior. |
| Core Mechanism | Detailed instructions in the prompt. | "Open-book exam": retrieves info, then answers. | "Training course": updates model weights on examples. |
| Best Use Case | Setting a persona, enforcing safety rules, quick formatting. | Answering questions based on private or recent documents. | Consistently producing structured output (e.g., JSON), adopting a complex writing style. |
| Effort Level | Low | Medium | High |
Each method progressively enhances control, accuracy, and user trust.
How to Improve LLM Responses
Whether you’re building a medical chatbot, a triage assistant, or a patient support system, improving each LLM response requires a layered strategy:
1. Start with Prompt Engineering
Set personality rules, add safety instructions, provide examples, and define structure, tone, and disclaimers.
2. Add RAG for Accuracy
Use embeddings and a vector database to pull in verified medical policies, clinic details, insurance info, and an always-updated knowledge base.
3. Use PEFT for Domain Behavior
Train the model on your organization’s communication style and real conversations to enhance accuracy in specialized areas like diabetes, oncology, or mental health.
The best systems combine all three layers.
How to Evaluate LLM Responses
Evaluation must be systematic, especially in healthcare.
Key Metrics
- Accuracy: Is the medical info correct and current?
- Safety: Does it flag emergencies or high-risk situations?
- Tone & Empathy: Is the language supportive and human?
- Relevance: Is content tailored to user context and region?
- Structure: Does output follow the expected format?
- Consistency: Are responses similar in quality across variations?
Testing Approach
- Persona-based scenarios
- Edge cases and contradictory questions
- Long multi-turn dialogues
- Localization checks (laws, clinics, cultural norms)
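Several of these metrics can be turned into automated checks that run over every test scenario. A minimal sketch, where the keyword lists and checks are illustrative and human review still covers tone and clinical accuracy:

```python
import json

# Illustrative automated checks for a few of the metrics above.
def evaluate_response(text):
    return {
        # Safety: crisis resources present when the topic is high-risk.
        "mentions_crisis_line": "988" in text or "741741" in text,
        # Structure: output parses as JSON when a structured format is required.
        "valid_json": _is_json(text),
        # Tone: at least one empathetic phrase appears.
        "empathetic": any(p in text.lower()
                          for p in ("understandable", "you're not alone")),
    }

def _is_json(text):
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

print(evaluate_response("It's understandable to feel scared. Call 988."))
```

Running checks like these across persona scenarios and edge cases turns evaluation from spot-checking into a regression suite you can re-run after every prompt or model change.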
When to Use Each Method in Healthcare Applications
| Situation | Best Method | Why |
|---|---|---|
| Need immediate behavior change | Prompt Engineering | Fast, zero training |
| Need factual accuracy tied to real documents | RAG | Uses verified medical data |
| Need reliable, consistent, domain-specific behavior | PEFT | Trained clinical skills |
| Need empathy, tone, and persona shaping | Prompt + PEFT | Tone & training |
| Need structured formats (JSON, reports) | PEFT | Highest consistency |
| Need local guidelines or dynamic content | RAG | Update knowledge instantly |
| Need safety-critical escalations | Prompt Engineering | Override rules |
Healthcare systems usually require a hybrid stack: Prompt Engineering + RAG + PEFT.
Final Thoughts
Changing and controlling LLM responses is not optional—it’s essential for safe, accurate, and trustworthy AI in healthcare.
Our case study shows that:
- Prompt Engineering delivers instant control over tone, format, and safety behavior.
- RAG grounds answers in verified, up-to-date local data and sharply reduces hallucinations.
- PEFT builds consistent, domain-specific behavior into the model itself.
Together, these methods form the foundation for AI systems in high-stakes environments. At DevCom, we combine structured methods, domain datasets, and robust evaluation to turn LLMs into specialized, reliable assistants. For support designing controlled LLM systems in healthcare, finance, or regulated industries, our team is ready to help.

