
Weekly Tuning Session Checklist

30-minute weekly review template: queue patterns, prompt adjustments, data quality, metrics.


Your AI systems drift. Prompts that worked last month produce mediocre outputs today. Response times creep up. Edge cases multiply. Without regular maintenance, you're burning tokens on subpar results.

This 30-minute weekly review catches degradation before it reaches clients. Use it every Monday morning or Friday afternoon - pick a slot and protect it. You'll review four areas: queue patterns, prompt performance, data quality, and cost metrics.

Pre-Session Setup (5 minutes)

Pull these reports before you start:

  • Last 7 days of API logs from OpenAI/Anthropic dashboard
  • Token usage by endpoint (available in usage reports)
  • Error rate by prompt template (track in your application logs)
  • Average response time by request type (from your monitoring tool)

If you don't have monitoring in place, set up basic logging this week. At minimum, log: timestamp, prompt template ID, token count, response time, and any error codes.
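If you're starting from zero, a minimal logger can be a single function that writes one JSON line per API call. This is a sketch, not a provider API: the field names (`template_id`, `error_code`) and the log path are illustrative, and you time the call yourself around your actual request.

```python
# Minimal per-request logger: one JSON line per API call with the fields
# needed for a weekly review. All names here are illustrative assumptions.
import json
import time
import uuid
from datetime import datetime, timezone

LOG_PATH = "api_requests.log"

def log_request(template_id, token_count, response_time_s, error_code=None):
    """Append one JSON record: timestamp, template ID, tokens, latency, error."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "template_id": template_id,
        "token_count": token_count,
        "response_time_s": round(response_time_s, 3),
        "error_code": error_code,  # None on success
    }
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Usage: time the call yourself, then log it.
start = time.monotonic()
# response = client.chat.completions.create(...)  # your actual API call here
log_request("client_intake_summary", token_count=812,
            response_time_s=time.monotonic() - start)
```

One JSON object per line keeps the log trivially parseable for the analysis steps below, with no database required.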

1. Queue Pattern Analysis (8 minutes)

What to check:

Open your API logs. Sort by volume. Identify your top 5 prompt types by request count.

Compare this week to last week:

  • Did any prompt type jump more than 30% in volume?
  • Did any new prompt category appear in your top 10?
  • Are requests clustering at specific times (indicating batch jobs or user behavior patterns)?
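The week-over-week comparison above is easy to automate once you have per-template request counts. A minimal sketch, assuming you can produce those counts from your logs (the template names and numbers here are illustrative):

```python
# Flag prompt templates whose volume moved more than 30% week-over-week,
# or appeared for the first time. Counts are illustrative assumptions.
from collections import Counter

def flag_volume_shifts(last_week, this_week, threshold=0.30, top_n=5):
    """Return (top templates by volume, templates with outsized volume change)."""
    top = Counter(this_week).most_common(top_n)
    flagged = []
    for template, count in this_week.items():
        prev = last_week.get(template, 0)
        if prev == 0:
            flagged.append((template, "new this week"))
        elif abs(count - prev) / prev > threshold:
            flagged.append((template, f"{(count - prev) / prev:+.0%}"))
    return top, flagged

last = {"intake_summary": 45, "email_draft": 60, "report": 20}
this = {"intake_summary": 127, "email_draft": 63, "report": 19, "faq_answer": 8}
top, flagged = flag_volume_shifts(last, this)
# intake_summary jumped +182% and faq_answer is new -- both get flagged.
```

Anything in `flagged` goes straight onto the action checklist below.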

Red flags:

  • Sudden spike in a single prompt type (indicates a user found a workaround or a process broke)
  • Requests timing out consistently between 2-4 AM (batch job needs optimization)
  • New prompt patterns you didn't design (users are improvising - capture and standardize these)

Action checklist:

  • [ ] Document the top 5 prompt types and their weekly volume
  • [ ] Flag any prompt type with >30% volume change
  • [ ] Identify 1-2 improvised prompts to convert into official templates
  • [ ] Note any time-of-day clustering for batch job review

Example finding: "Client intake summary" prompts jumped from 45/week to 127/week. Users are bypassing the manual intake form. Convert this into an official workflow and add structured output formatting.

2. Prompt Performance Review (10 minutes)

What to check:

Pull 5 random outputs from your highest-volume prompt template. Read them completely. Ask:

  • Does the output match the intended format?
  • Are there repeated phrases or filler content?
  • Does it hallucinate facts or make unsupported claims?
  • Would you send this to a client without editing?

Now check your lowest-performing prompt (highest error rate or longest response time). Run it 3 times with the same input. Compare outputs.
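"Compare outputs" can be rough-quantified. One lightweight approach (an assumption, not a rigorous metric) is average pairwise text similarity across the repeated runs; `call_model` below is a placeholder for your actual API call:

```python
# Rough consistency check: run the same prompt N times and score how similar
# the outputs are. Low scores suggest the prompt is too open-ended.
from difflib import SequenceMatcher
from itertools import combinations

def consistency_score(outputs):
    """Average pairwise similarity of the outputs, from 0 (unrelated) to 1 (identical)."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

# outputs = [call_model(prompt, input_text) for _ in range(3)]  # your API call
# if consistency_score(outputs) < 0.6:  # threshold is a judgment call
#     print("Outputs vary widely -- tighten the prompt's format constraints.")
```

Character-level similarity is crude (two paraphrases score low), so treat it as a tripwire that tells you which prompts deserve a manual read, not as a quality score.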

Red flags:

  • Outputs vary wildly between runs (prompt is too open-ended)
  • Model consistently ignores specific instructions (instruction is buried or unclear)
  • Response includes "As an AI language model" or similar hedging (system prompt needs work)
  • Output exceeds needed length by 2x (you're wasting tokens)

Action checklist:

  • [ ] Test your top prompt template with 5 random inputs
  • [ ] Identify 1 specific improvement for your worst-performing prompt
  • [ ] Check if any prompts can use a cheaper model (GPT-4 → GPT-3.5 or Claude Opus → Sonnet)
  • [ ] Update 1 prompt template based on findings

Specific fixes:

Replace vague instructions:

  • Bad: "Summarize this document professionally"
  • Good: "Create a 3-paragraph summary. P1: Key findings. P2: Methodology. P3: Recommendations. Use bullet points for lists. Max 200 words."

Add output constraints:

  • Bad: "Draft a client email"
  • Good: "Draft a client email. Subject line: [topic]. Body: 4 sentences max. Tone: direct, no apologies. End with a specific next step and deadline."

3. Data Quality Spot Check (5 minutes)

What to check:

If you're using RAG (retrieval-augmented generation) or fine-tuning:

Open your vector database or training dataset. Pull 10 random documents. Verify:

  • Are dates current (nothing older than 6 months unless historical reference)?
  • Do documents match your current service offerings?
  • Are there obvious errors (formatting issues, truncated text, OCR mistakes)?

Red flags:

  • Documents reference discontinued services
  • Pricing information is outdated
  • Legal disclaimers are from previous policy versions
  • Source documents have poor OCR quality (common with scanned PDFs)

Action checklist:

  • [ ] Review 10 random documents from your knowledge base
  • [ ] Flag any documents older than 6 months for review
  • [ ] Remove or update 1 outdated document
  • [ ] Add 1 new document if you launched a service or changed a process this week

Quick win: Set a document expiration policy. Tag every document with a "review by" date. Auto-flag anything past that date during your weekly session.

4. Cost and Performance Metrics (7 minutes)

What to check:

Open your billing dashboard. Compare this week to last week:

  • Total tokens used
  • Cost per request type
  • Average tokens per request
  • Percentage of requests using GPT-4 vs GPT-3.5 (or Claude Opus vs Sonnet)

Calculate your cost per output type:

  • Client deliverable: $X per document
  • Internal summary: $X per summary
  • Email draft: $X per email

Red flags:

  • Token usage increased but request volume stayed flat (prompts are getting bloated)
  • More than 40% of requests use your most expensive model (you're over-provisioning)
  • Cost per request type varies by more than 50% week-to-week (inconsistent inputs or prompt drift)
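Both the cost-per-output calculation and the 50% drift flag reduce to simple arithmetic. A sketch with illustrative dollar amounts:

```python
# Cost per output and a week-over-week drift flag. The dollar figures
# below are illustrative, not real provider pricing.
def cost_per_output(total_cost, request_count):
    """Average cost of one output of a given type."""
    return total_cost / request_count if request_count else 0.0

def drifted(last_cost, this_cost, threshold=0.50):
    """True if cost per request moved more than `threshold` week-over-week."""
    if last_cost == 0:
        return this_cost > 0
    return abs(this_cost - last_cost) / last_cost > threshold

last = cost_per_output(42.00, 60)   # $0.70 per deliverable last week
this = cost_per_output(98.00, 65)   # ~$1.51 this week
# drifted(last, this) is True: a >50% jump worth investigating for prompt bloat.
```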

Action checklist:

  • [ ] Calculate cost per output for your top 3 use cases
  • [ ] Identify 1 prompt that can move to a cheaper model
  • [ ] Set a token budget alert (most providers offer this)
  • [ ] Document your baseline metrics for next week's comparison

Model selection guide:

Use GPT-4/Claude Opus for:

  • Client-facing deliverables
  • Complex analysis requiring multi-step reasoning
  • Outputs where errors have high cost

Use GPT-3.5/Claude Sonnet for:

  • Internal summaries
  • Email drafts
  • Data extraction from structured documents
  • Anything with clear right/wrong answers

Cost optimization example: Moving internal meeting summaries from GPT-4 to GPT-3.5 Turbo saved one firm $340/month with no quality loss. Test this with 20 outputs before switching completely.
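Before running the 20-output test, a back-of-envelope estimate tells you whether a model switch is worth the effort. The per-million-token prices below are illustrative assumptions, not current provider pricing; check your provider's pricing page for real numbers:

```python
# Estimate monthly savings from moving a workload to a cheaper model.
# Prices per 1M tokens are illustrative assumptions only.
def monthly_cost(requests_per_month, tokens_per_request, price_per_million_tokens):
    return requests_per_month * tokens_per_request * price_per_million_tokens / 1_000_000

expensive = monthly_cost(1_200, 3_000, 30.00)  # e.g. a premium model
cheap     = monthly_cost(1_200, 3_000, 1.50)   # e.g. a smaller model
savings   = expensive - cheap
# expensive = $108.00/month, cheap = $5.40/month for this hypothetical workload.
```

If the estimated savings are trivial, skip the migration and spend the session elsewhere; if they're material, run the side-by-side quality test before switching.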

Weekly Action Summary Template

Copy this into your notes each week:

Week of: [DATE]

TOP FINDING:
[One sentence describing the biggest issue or opportunity]

CHANGES MADE:
1. [Specific prompt update, model change, or process fix]
2. [Second change]
3. [Third change]

METRICS:
- Total requests: [number] (vs [number] last week)
- Total cost: $[amount] (vs $[amount] last week)
- Avg response time: [seconds]
- Error rate: [percentage]

NEXT WEEK FOCUS:
[One specific thing to investigate or improve]

Bottom Line

This checklist prevents the slow degradation that kills AI ROI. Thirty minutes per week catches prompt drift, cost creep, and data staleness before they compound.

The firms that succeed with AI treat it like production software - they monitor, tune, and improve continuously. The firms that fail treat it like magic - they deploy once and wonder why results deteriorate.

Schedule your first session now. Put it on your calendar as a recurring meeting with yourself. Protect that time. Your AI systems will only be as good as the attention you give them.


Reviewed by Revenue Institute

This guide is actively maintained and reviewed by the implementation experts at Revenue Institute. As the creators of The AI Workforce Playbook, we test and deploy these exact frameworks for professional services firms scaling without new headcount.


Need help turning this guide into reality? Revenue Institute builds and implements the AI workforce for professional services firms.

RevenueInstitute.com