Prompting Technique for Browser Automation
How a Single Browser Automation Prompt Cost 175M Tokens — and How to Avoid It
Brief
I asked Claude Code to crawl a SaaS helpdesk platform, visit every page recursively, and save full-page screenshots. It was a reasonable ask — I wanted a visual inventory of every screen in our instance. Claude ran for 3 hours, made 898 tool calls, and consumed 174.8 million tokens. The actual useful output? About 108K tokens — Claude’s responses and the screenshots it saved. The other 99.94% was the system re-reading the same growing conversation history on every single turn.
This post breaks down exactly why this happened, what Claude inferred from my prompt, the strategy it adopted, and four alternative approaches that would have cut the cost by 74% to 99.96%.
The original prompt
use chrome cdp to visit all pages in https://[redacted].helpdesk.com/
and take screenshots of each page. Recursively watch for the pages.
If you have any confusion, ask. Dont fill anything, or "request demo"
or purchase anything. If your click navigates you away from
[redacted].helpdesk.com then ignore that page.
After a false start (Claude wasn’t saving the screenshots), I clarified:
yes, save the screenshots, otherwise whats the point?
Restart everything and save full DOM screenshots
And later expanded scope:
there are lots of action buttons in the settings pages as well.
are you clicking them as well? take two screenshots in this case.
one of the page, then second when the action buttons is clicked
What Claude inferred
From these prompts, Claude understood the goal as: exhaustively crawl every page within this helpdesk SaaS instance, capture full-page screenshots of each, and expand scope to include interactive elements like settings action buttons.
The key word was “recursively” — Claude interpreted this as a depth-first crawl of the entire site. It would visit a page, find all links, visit each linked page, find more links, and so on. For a SaaS platform with dashboards, ticket views, forums, knowledge base, admin settings, and sub-settings, this expanded into hundreds of unique pages.
The strategy Claude adopted
Claude chose the simplest possible approach: a single long-running session where it manually browsed every page one by one, clicking through the UI, taking screenshots, and saving them to disk.
The session flow looked like this, repeated hundreds of times:
Navigate to a page (or click a link)
Wait for load
Take a GIF screenshot via the `gif_creator` tool
Save the screenshot to disk via Bash
Look for more links on the page
Navigate to the next one
This produced 898 assistant turns and 683 tool calls:
| Tool | Calls | Purpose |
| ----------------- | ----- | --------------------------------------- |
| `computer` | 235 | Clicking/interacting with page elements |
| `gif_creator` | 212 | Recording screenshots/GIFs |
| `Bash` | 98 | Saving files to disk |
| `navigate` | 93 | Navigating to pages |
| `javascript_tool` | 30 | DOM manipulation |
| Other             | 15    | Tabs, tasks, reads, writes              |

Why this strategy was expensive
The quadratic context problem
Every turn in a Claude conversation re-sends the entire conversation history; that is simply how the API works. The full message list is sent on each request, and prompt caching reduces the cost of re-reading unchanged portions, but those tokens still count.
As Claude browsed more pages, each tool call result (including screenshot data, page content, navigation confirmations) accumulated in the conversation. The context grew steadily:
| Turn | Context size per turn |
| ---- | --------------------- |
| 1 | ~20K tokens |
| 100 | ~65K tokens |
| 300 | ~134K tokens |
| 500 | ~213K tokens |
| 700 | ~290K tokens |
| 898  | ~370K tokens          |

The total cost is the sum of all context sizes across all turns. This grows quadratically: it's not 898 × 370K (the final size), it's 898 × ~195K (the average size). That's how you get to 175M tokens.
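The total in the table can be reproduced with a few lines of arithmetic. A sketch, assuming context grows roughly linearly from ~20K tokens at turn 1 to ~370K at turn 898:

```javascript
// Sketch: total session cost = sum of the per-turn context sizes.
// Assumes linear growth between the first and last turns in the table.
function totalContextTokens(turns, firstCtx, lastCtx) {
  let total = 0;
  for (let t = 0; t < turns; t++) {
    // context size at turn t, interpolated from the table's endpoints
    total += firstCtx + ((lastCtx - firstCtx) * t) / (turns - 1);
  }
  return total;
}

const total = totalContextTokens(898, 20_000, 370_000);
console.log((total / 1e6).toFixed(1)); // "175.1" million tokens, matching the ~175M figure
```

Doubling the number of turns at the same growth rate roughly quadruples this sum, which is why long sessions get expensive so fast.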
Token breakdown
| Type | Tokens | % | Note |
| ------------ | ------ | ----- | ---------------------------------- |
| Cache read | 170.9M | 97.8% | Re-reading prior context each turn |
| Cache create | 3.8M | 2.2% | New content added to context |
| Output | 108K | 0.06% | Claude’s actual responses |
| Input        | 1.2K   | ~0%   | Uncached inputs                    |

97.8% of tokens were the system re-reading conversation history. Claude’s actual work — deciding what to click, where to navigate, what to save — was 108K tokens of output. The rest was overhead.
Why this task specifically triggers the problem
Crawling a website is a high-turn, low-interdependence task. Each page visit is essentially independent — Claude doesn’t need to remember what page 47 looked like to screenshot page 48. But the single-session approach forced Claude to carry the full history of all previous pages on every turn, as if it needed that context. It didn’t.
A better strategy
The fundamental problem was that Claude treated a parallelizable, stateless task as a sequential, stateful one. The prompt didn’t give it a reason to do otherwise — “visit all pages recursively” naturally reads as “start browsing and keep going.”
Here are four alternative approaches, ranked by token savings, along with the prompt you’d use to steer Claude toward each one.
Strategy 1: Script-first — have Claude write a crawler, not be the crawler
Instead of Claude manually browsing 898 times, ask it to write a script that does the browsing. The script runs outside Claude with zero token cost.
The prompt
Write a Node.js Playwright script that:
1. Opens https://[redacted].helpdesk.com/ (I'll handle login manually first)
2. Crawls all internal links recursively, up to depth 3
3. Takes a full-page screenshot of each unique page
4. Saves screenshots to screenshots/ with descriptive filenames based on the URL path
5. Writes a manifest.json mapping URLs to screenshot filenames
6. For settings pages, also clicks each action button and captures the resulting state
7. Skips any links that navigate away from [redacted].helpdesk.com
8. Doesn't fill forms, click "request demo", or purchase anything
Run the script after writing it. If it errors, fix and retry.
What happens
Claude writes ~200 lines of Playwright code in one turn. Maybe spends 3-5 turns refining it after test runs. The script then crawls the entire site in 10-20 minutes with zero token cost. Total session: maybe 10 turns, ~50K tokens.
If you want Claude to analyze the screenshots afterward, start a new session:
Read manifest.json and review the screenshots in screenshots/.
Summarize all the pages and features you find.
Token comparison
Original: 174.8M tokens
Script approach: ~70K tokens (writing + debugging the script)
Savings: 99.96%
When this works best
When the task is mechanical and predictable. Crawling a standard SaaS dashboard with known navigation patterns is exactly this. There’s no visual judgment needed mid-crawl — Claude doesn’t need to see a page to decide whether to screenshot it.
When this doesn’t work
If the site has unpredictable UI that requires visual judgment (e.g., “screenshot anything that looks like a bug”), or if anti-bot protections make Playwright difficult to use. But for internal SaaS tools behind a login, scripts work perfectly.
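To make the script-first idea concrete, the crawl logic itself is small. Below is a sketch of the recursive same-origin walk in plain Node; the helpdesk URLs and the `getLinks` fetcher are hypothetical stand-ins (in the real Playwright script, `getLinks` would come from `page.$$eval('a[href]', ...)` and each visit would also take a full-page screenshot):

```javascript
// Sketch: breadth-first, same-origin crawl with a depth limit and dedup.
// getLinks(url) is injected; Playwright would supply it in the real script.
function crawl(startUrl, getLinks, maxDepth = 3) {
  const origin = new URL(startUrl).origin;
  const seen = new Set();
  const queue = [{ url: startUrl, depth: 0 }];
  const order = [];
  while (queue.length > 0) {
    const { url, depth } = queue.shift();
    if (seen.has(url)) continue;
    seen.add(url);
    order.push(url); // in the real script: screenshot + manifest entry here
    if (depth >= maxDepth) continue;
    for (const href of getLinks(url)) {
      const abs = new URL(href, url);
      abs.hash = ""; // treat /page and /page#anchor as the same page
      if (abs.origin === origin && !seen.has(abs.href)) {
        queue.push({ url: abs.href, depth: depth + 1 });
      }
    }
  }
  return order;
}

// Toy site map standing in for live pages (hypothetical URLs):
const site = {
  "https://example.helpdesk.com/": ["/a/tickets", "/a/admin", "https://other.com/x"],
  "https://example.helpdesk.com/a/tickets": ["/a/tickets/filters/all", "/"],
  "https://example.helpdesk.com/a/admin": ["/a/admin/email"],
};
const pages = crawl("https://example.helpdesk.com/", (url) => site[url] || []);
// pages now lists the five internal URLs once each; other.com is excluded
```

The visited-set and origin check carry over directly to the Playwright version; only `getLinks` and the per-page screenshot step change.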
Strategy 2: Top agent + sub-agents — coordinator maps the site, workers capture pages
Split the work into a lightweight coordinator that collects URLs, then independent workers that each handle a small batch with fresh context.
The prompt
I need screenshots of every page in https://[redacted].helpdesk.com/.
Do this in two phases:
Phase 1: Navigate the site and extract all internal URLs using JavaScript
(document.querySelectorAll('a[href]')). Visit each top-level section to find
sub-pages. Write the full deduplicated URL list to urls.json grouped by section.
Don't take screenshots in this phase.
Phase 2: Read urls.json. For each section, spawn a sub-agent that visits those
URLs, takes full-page screenshots via Bash (use Playwright CLI), and saves them
to screenshots/{section}/. Run sub-agents in parallel where possible.
For settings pages with action buttons, spawn additional sub-agents that click
each button and capture the result.
What happens
Phase 1 — Claude browses the site with JavaScript-only link extraction. No screenshots, no GIFs. Maybe 50 turns, context stays under 30K. Outputs a structured urls.json:
{
"pages": [
{"url": "/a/dashboard", "section": "dashboard"},
{"url": "/a/tickets", "section": "tickets"},
{"url": "/a/tickets/filters/all", "section": "tickets"},
{"url": "/a/admin/general", "section": "settings"},
...
]
}
Phase 2 — Claude spawns sub-agents via the Agent tool. Each sub-agent gets a batch of 5-10 URLs. Each starts with a fresh, empty context — no history from the coordinator or other agents. Multiple sub-agents can run in parallel.
Each sub-agent’s context stays small: ~25K tokens per turn on average for ~40 turns, so roughly 1M tokens per agent.
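The hand-off between phases can stay mechanical. Here is a sketch of how the coordinator might slice urls.json into per-section sub-agent batches; the field names follow the urls.json example above, and the batch size of 8 is an arbitrary choice in the 5-10 range:

```javascript
// Sketch: group discovered pages by section, then chunk each section into
// fixed-size batches so every sub-agent starts with a bounded, fresh workload.
function planBatches(pages, batchSize = 8) {
  const bySection = new Map();
  for (const page of pages) {
    if (!bySection.has(page.section)) bySection.set(page.section, []);
    bySection.get(page.section).push(page.url);
  }
  const batches = [];
  for (const [section, urls] of bySection) {
    for (let i = 0; i < urls.length; i += batchSize) {
      batches.push({ section, urls: urls.slice(i, i + batchSize) });
    }
  }
  return batches;
}

// Using the urls.json shape from Phase 1:
const pages = [
  { url: "/a/dashboard", section: "dashboard" },
  { url: "/a/tickets", section: "tickets" },
  { url: "/a/tickets/filters/all", section: "tickets" },
  { url: "/a/admin/general", section: "settings" },
];
console.log(planBatches(pages, 2).length); // 3 batches: one per section here
```

Each batch object is exactly what one sub-agent prompt needs: a section name for the output directory and a short URL list to visit.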
Token math
Original: sum of (context_at_turn_i) for 898 turns
≈ 195K average × 898 turns = 175M tokens
With sub-agents:
Discovery: 25K avg × 50 turns = 1.25M
Agent A: 25K avg × 40 turns = 1.0M
Agent B: 25K avg × 40 turns = 1.0M
...x10 agents...
Total: ~1.25M + (10 × 1.0M) = ~11M tokens
Savings: ~94%
The key insight: 10 short conversations are dramatically cheaper than 1 long conversation, even if the total number of turns is the same, because context doesn’t accumulate across agents.
Limitations
Tool access: Sub-agents spawned via the `Agent` tool don't have access to Chrome MCP tools by default; they use Bash, Read, Write, Grep, etc. The sub-agents would need to use `Bash` to run Playwright commands, or the top agent would need to do the Chrome work itself in batches.
Auth: If the site requires login and cookies, each sub-agent may need its own auth flow (or you share cookies via a file).
Coordination: Coordinating between agents requires writing intermediate files (`urls.json`, progress tracking). This is simple but needs to be explicit in your prompts.
Strategy 3: Batched sessions with state file — you are the scheduler
Keep Claude as the browser operator, but break the work across multiple independent sessions that share progress via a file on disk.
The prompts
Session 1:
Navigate https://[redacted].helpdesk.com/. List every top-level section and
sub-page you can find. Write them to crawl-state.json with status "pending".
Don't take screenshots yet.
Session 2 (new claude session):
Read crawl-state.json. Visit the first 15 pages with status "pending".
Take a full-page screenshot of each and save to screenshots/.
Update each page's status to "done" in the JSON file.
Session 3, 4, 5... (repeat):
Read crawl-state.json. Visit remaining pages with status "pending".
Take screenshots, save to screenshots/, update status to "done".
What happens
Each session starts with a fresh context. The state file provides continuity without the token cost of carrying conversation history. Claude reads crawl-state.json at the start, does its batch of work, updates the file, and exits.
{
"pages": [
{"url": "/a/dashboard", "status": "done", "section": "main"},
{"url": "/a/tickets", "status": "done", "section": "main"},
{"url": "/a/admin/general", "status": "pending", "section": "settings"},
{"url": "/a/admin/email", "status": "pending", "section": "settings"},
...
]
}
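Each capture session then reduces to two small operations on that file. A sketch, using the crawl-state.json shape above (function names are illustrative):

```javascript
// Sketch: the two operations each capture session performs on the state file.
function nextBatch(state, n = 15) {
  // pages still waiting for screenshots, capped at the batch size
  return state.pages.filter((p) => p.status === "pending").slice(0, n);
}

function markDone(state, urls) {
  // flip completed pages so the next session skips them
  const done = new Set(urls);
  for (const p of state.pages) {
    if (done.has(p.url)) p.status = "done";
  }
}

// Round-trip using the crawl-state.json shape above:
const state = {
  pages: [
    { url: "/a/dashboard", status: "done", section: "main" },
    { url: "/a/admin/general", status: "pending", section: "settings" },
    { url: "/a/admin/email", status: "pending", section: "settings" },
  ],
};
const batch = nextBatch(state, 15); // the two pending settings pages
markDone(state, batch.map((p) => p.url));
// a real session would then persist the updated state back to crawl-state.json
```

Marking pages done only after their screenshots are saved means a crash leaves them pending; re-running just repeats that batch instead of silently skipping it.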
Token math
Original: 1 session × 898 turns × 195K avg context = 175M tokens
Batched: 1 session × 50 turns × 25K avg context = 1.25M (mapping)
+ 5 sessions × 60 turns × 35K avg context = 10.5M (capture)
Total: ≈ 12M tokens
Savings: ~93%
Trade-offs
Manual orchestration: You have to start each session yourself and tell it what batch to work on. You are the scheduler.
Login state: If the site requires login, you may need to log in at the start of each session (or keep the browser open so cookies persist).
State file can get out of sync: If a session crashes mid-batch, some pages might be visited but not marked “done.” You’d need to check for existing screenshots before re-visiting.
When to use this
When you want Claude doing the browsing (maybe you need visual judgment or the site is tricky) but you’re willing to manage multiple sessions. A good middle ground between full automation and script-first.
Strategy 4: Reduce per-turn bloat — optimize the single-session approach
If you want to stick with a single session, you can still cut the cost significantly by controlling what enters the conversation context.
The prompt
Use chrome CDP to visit all pages in https://[redacted].helpdesk.com/ and
take screenshots. Rules:
- Save screenshots by running Playwright CLI commands via Bash, not via
gif_creator. The screenshots should go to disk directly, not through
our conversation.
- Navigate to pages by URL whenever possible. Don't click through menus
to reach pages you already know the URL for.
- Don't summarize progress. Don't list what you've already done. Only
speak if you hit an error or need my input.
- After every 50 pages, use /compact to reset the conversation context.
- Write visited URLs to visited.txt so you can resume after compacting.
What each rule does
4a. Bash screenshots instead of GIF recording
The original session made 212 gif_creator calls. GIF recording captures multiple frames and embeds image data into the conversation context. Every subsequent turn replays all that image data.
Instead, running a Playwright screenshot command via Bash:
npx playwright screenshot --full-page "https://example.com/dashboard" screenshots/dashboard.png
This saves the image to disk without it ever entering Claude’s conversation. Claude only sees “Screenshot saved” — a few tokens instead of thousands.
Estimated savings: Removing GIF data from context could reduce per-turn context by 30-50%, which compounds across 898 turns. ~50M tokens saved.
4b. Direct URL navigation instead of UI clicking
The session made 235 computer clicks and 93 navigate calls. Clicking through menus costs multiple turns:
Clicking through a menu:
Turn 1: Click "Admin" in sidebar → +500 tokens to context
Turn 2: Wait for menu to expand → +300 tokens
Turn 3: Click "Email Settings" → +500 tokens
Turn 4: Wait for page to load → +300 tokens
= 4 turns, ~1,600 tokens added
Direct navigation:
Turn 1: navigate to /admin/email → +400 tokens to context
= 1 turn, ~400 tokens added
4x fewer turns, 4x less context growth per page. ~30M tokens saved across the session.
4c. Suppress progress reporting
Every time Claude says “I’ve now visited 15 pages, here’s what I found so far...” that text enters the conversation and is replayed on every subsequent turn. A 500-token progress report at turn 100 costs 500 × 798 = 399K tokens in cache reads over the remaining turns.
~10M tokens saved.
4d. Periodic context reset via /compact
If the conversation gets long, /compact summarizes the history and resets the context. Even one mid-session reset at turn 450 would save roughly:
Without reset: turns 451-898 replay ~190K-370K context each
With reset: turns 451-898 replay ~20K-190K context each
Savings: ~40M tokens from this one reset alone
Combined impact
| Optimization | Estimated savings |
| ----------------------------------- | -------------------------------- |
| Bash screenshots instead of GIFs | ~50M tokens |
| Direct navigation instead of clicks | ~30M tokens |
| Suppress progress reports | ~10M tokens |
| One mid-session `/compact` | ~40M tokens |
| **Combined**                        | **~130M tokens (74% reduction)** |

Even without restructuring the approach, being deliberate about what enters the conversation context could have cut the cost from 175M to ~45M tokens.
Recommendation matrix
| Strategy | Token savings | Effort | Best when... |
| ------------------------- | ------------- | ---------------------------------- | --------------------------------------------- |
| 1. Script-first | ~99.96% | Low — one prompt | Task is mechanical and predictable |
| 2. Top agent + sub-agents | ~94% | Medium — structured prompts | You want Claude browsing but need parallelism |
| 3. Batched sessions | ~93% | Medium — manual session management | You want interactive control per batch |
| 4. Reduce bloat | ~74% | Low — prompt changes only | You want the simplest change |
For mechanical tasks (crawling, scraping, bulk screenshots): Strategy 1 is the clear winner. There’s no reason for Claude to be in the loop during the crawl. Write the script once, run it, done.
For tasks requiring judgment mid-browse (e.g., “explore this site and identify UX issues”): Strategy 2 gives you the best balance of intelligence and efficiency.
The general rule: If Claude doesn’t need to see page N to decide what to do on page N+1, the pages are independent work units. Don’t process them sequentially in a single growing conversation. Use a script, spawn parallel agents, or batch into separate sessions. The cost of a long conversation isn’t linear — it’s quadratic. That’s the trap.