Context Management

How Grok One-Shot manages conversation context and documentation loading.

Overview

Grok One-Shot uses an efficient on-demand context loading system that balances comprehensive documentation access with token efficiency.

Context Loading Strategy

Traditional Approach (Old System)

Problem with auto-loading everything:

Startup context:
- GROK.md: ~6,400 bytes
- docs-index.md: ~7,600 bytes
- All 49 docs: ~65,000-85,000 tokens

Result: 65k-85k tokens consumed before the user sends their first message

Issues:

  • Massive token waste on unused documentation
  • Slower startup
  • Higher API costs
  • Context limit reached quickly

Current Approach (Efficient System)

On-demand loading:

Startup context:
- GROK.md: ~6,400 bytes (1,600 tokens)
- docs-index.md: ~7,600 bytes (1,900 tokens)
Total: ~3,500 tokens (95% reduction!)

Runtime:
- AI reads specific docs as needed via Read tool
- Only loads relevant documentation
- User queries load minimal context

Benefits:

  • 94.6-95.9% token reduction at startup (~3,500 tokens vs 65k-85k)
  • Faster startup
  • Lower initial costs
  • Context budget available for actual work

How It Works

Startup Phase

What's loaded:

// src/hooks/use-claude-md.ts
import { readFileSync } from 'fs';

export function useClaudeMd() {
  const claudeMd = readFileSync('GROK.md', 'utf-8');
  const docsIndex = readFileSync('docs-index.md', 'utf-8');
  const systemPrompt = `${claudeMd}\n\n${docsIndex}`;

  return {
    systemPrompt,
    // ~3,500 tokens for these two files, using the 1 token ≈ 4 characters heuristic
    tokenCount: Math.ceil(systemPrompt.length / 4),
  };
}
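
For illustration only, a hypothetical call site (the surrounding startup code is not shown in this doc):

// Hypothetical call site, illustrating the returned shape
const { systemPrompt, tokenCount } = useClaudeMd();
console.log(`Startup context: ~${tokenCount} tokens`); // ≈ 3,500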

Result:

  • AI knows project structure (GROK.md)
  • AI knows available documentation (docs-index.md)
  • AI can read specific docs when needed

Runtime Phase

When AI needs specific information:

  1. User asks a question:
> How do I configure MCP servers?
  2. AI checks docs-index.md:
AI sees:
- configuration/settings.md (covers MCP configuration)
- build-with-claude-code/mcp.md (detailed MCP guide)
  3. AI uses the Read tool:
await Read({
  file_path: '.agent/docs/claude-code/configuration/settings.md'
});
  4. AI responds with accurate info:
To configure MCP servers, edit ~/.grok/settings.json...
[provides information from settings.md]
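
The same flow can be sketched in code. This is a minimal illustration of on-demand loading, not the actual tool implementation; readDoc and DOCS_ROOT are hypothetical names:

// Minimal sketch of on-demand doc loading (hypothetical helper)
import { readFileSync } from 'fs';
import { join } from 'path';

const DOCS_ROOT = '.agent/docs/claude-code'; // path taken from the Read example above

function readDoc(relativePath: string): string {
  // Load a single doc only when the AI decides it is relevant,
  // instead of preloading all 49 docs at startup.
  return readFileSync(join(DOCS_ROOT, relativePath), 'utf-8');
}

// e.g. triggered after the AI finds a match in docs-index.md:
const settingsDoc = readDoc('configuration/settings.md');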

Context in Sessions

Session Context Accumulation

Each message adds context:

User message: +tokens (your prompt)
AI response: +tokens (AI's reply)
Tool calls: +tokens (file contents, command outputs)

Example session growth:

Initial: 3,500 tokens (GROK.md + docs-index.md)
After message 1: 5,000 tokens (+1,500)
After message 5: 12,000 tokens
After message 20: 45,000 tokens
After message 50: 90,000 tokens (approaching limit)
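
Using the 1 token ≈ 4 characters heuristic from the Technical Details section, the growth above can be sketched as a running total (illustrative only):

// Rough session-growth accounting, using the 1 token ≈ 4 chars heuristic
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

let sessionTokens = 3_500; // startup: GROK.md + docs-index.md

function recordTurn(userMessage: string, aiResponse: string, toolOutput = ''): number {
  // Every turn adds the prompt, the reply, and any tool output to the total
  sessionTokens += estimateTokens(userMessage)
    + estimateTokens(aiResponse)
    + estimateTokens(toolOutput);
  return sessionTokens;
}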

Context Limits

Model context window: 128,000 tokens

Practical considerations:

Good session: 10,000-50,000 tokens
- Enough context for coherent conversation
- Room for file reading and analysis

Large session: 50,000-100,000 tokens
- Still functional but getting expensive
- Consider if all context is needed

Excessive: >100,000 tokens
- Approaching model limit
- Very expensive
- Should start new session
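
These bands can be expressed as a simple check. A sketch, with thresholds taken from the table above:

// Classify a session against the bands above (thresholds from this doc)
function classifyContext(totalTokens: number): string {
  if (totalTokens > 100_000) return 'excessive: start a new session';
  if (totalTokens > 50_000) return 'large: functional, but getting expensive';
  return 'good: room for file reading and analysis';
}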

Monitoring Context

Check token usage:

# During session
Press Ctrl+I

Output:
Token Usage:
Input: 45,230 tokens
Output: 12,450 tokens
Total: 57,680 tokens

From session files:

cat ~/.grok/sessions/latest-session.json | jq '.tokenUsage'
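
The same data can be read programmatically. A sketch, assuming the session JSON contains a tokenUsage object with input, output, and total fields (the exact schema isn't documented here):

// Sketch: read token usage from a session file (field names assumed)
import { readFileSync } from 'fs';
import { homedir } from 'os';
import { join } from 'path';

const sessionPath = join(homedir(), '.grok', 'sessions', 'latest-session.json');
const session = JSON.parse(readFileSync(sessionPath, 'utf-8'));
console.log(session.tokenUsage); // e.g. { input: 45230, output: 12450, total: 57680 }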

Context Optimization

Start New Sessions

When to start fresh:

  • Unrelated task
  • Context > 50k tokens and slowing down
  • No longer need old conversation
  • Want clean slate

How:

# Exit current session
/exit

# Start new
grok

Headless Mode for Simple Queries

Avoid session accumulation:

# Each query is independent
grok -p "list TypeScript files"
grok -p "find TODO comments"
grok -p "check for console.log"

# No context carries over between queries

Be Specific

Bad (loads lots of context):

> Tell me everything about this codebase
[AI reads many files, context explodes]

Good (targeted context):

> Explain how authentication works in src/auth/
[AI reads specific files, context stays manageable]

Advanced Context Techniques

Incremental Exploration

Build context gradually:

Step 1: "What is the overall architecture?"
[AI reads GROK.md, provides overview]

Step 2: "How does the agent system work?"
[AI reads specific agent docs]

Step 3: "Show me the GrokAgent implementation"
[AI reads src/agent/grok-agent.ts]

Benefits:

  • Only loads what's needed
  • Builds understanding progressively
  • Avoids context explosion

Context Pruning (Manual)

Current state: Manual

  • No automatic context pruning yet
  • User must start new session when context is large
  • Future enhancement: automatic context compression

How to prune manually:

# Save important findings
> Summarize what we've learned so far
[Copy summary]

# Start new session
/exit
grok

# Resume with summary
> Continuing from previous session:
[Paste summary]
Now let's...

Implementation Status

Implemented

Efficient startup:

  • On-demand doc loading
  • Minimal initial context
  • Fast session start

Context monitoring:

  • Ctrl+I shows token usage
  • Session files track usage
  • Manual inspection available

Session management:

  • Save/restore sessions
  • Session history in ~/.grok/sessions/
  • Manual session control

Partially Implemented

Context awareness:

  • AI understands when context is large
  • Manual pruning via new session
  • No automatic warnings at thresholds

Multi-session workflows:

  • Can start multiple sessions
  • No session linking or merging
  • No cross-session context sharing

Planned Features

Automatic context management:

  • Auto-prune old messages when threshold reached
  • Intelligent context summarization
  • Keep most relevant parts, summarize old parts

Context caching:

  • Cache common docs (settings, quickstart)
  • Reduce repeated API calls
  • Faster responses for frequent questions

Smart context loading:

  • Predict which docs user will need
  • Pre-load related documentation
  • Balance prediction vs token cost

Best Practices

DO

Monitor token usage:

Press Ctrl+I regularly to check context size

Start new sessions for unrelated tasks:

/exit # End current task
grok # Fresh start for new task

Use headless mode for simple queries:

grok -p "quick query" # No session accumulation

Be specific in prompts:

"Analyze authentication in src/auth/"
vs
"Analyze everything"

DON'T

Let sessions grow indefinitely:

# Check tokens
Ctrl+I
# If >50k, consider new session

Load unnecessary files:

# Avoid: "Read all files"
# Better: "Read src/auth/middleware.ts"

Repeat context unnecessarily:

# Session remembers previous messages
# No need to re-explain context

Troubleshooting

High Token Usage

Symptom: Ctrl+I shows >50k tokens

Causes:

  • Long conversation
  • AI read many files
  • Repeated context

Solutions:

# Start new session
/exit
grok

# Or use summary technique
> Summarize findings, then start new session

Slow Responses

Symptom: AI takes long to respond

Possible cause: Large context

Check:

Ctrl+I to see the token count
If >80k tokens, the large context is the likely cause

Solution:

# Start fresh session
/exit
grok

Context Confusion

Symptom: AI confuses current task with earlier messages

Cause: Too much context mixing different topics

Solution:

# Start new session for new topic
/exit
grok

# Be explicit
> Focusing on [NEW TOPIC], ignoring previous discussion about [OLD TOPIC]

Technical Details

Implementation

Context loading hook:

// src/hooks/use-claude-md.ts
import { readFileSync } from 'fs';
import * as path from 'path';

const cwd = process.cwd();

export function useClaudeMd(): string {
  const grokMd = readFileSync(path.join(cwd, 'GROK.md'), 'utf-8');
  const docsIndex = readFileSync(path.join(cwd, 'docs-index.md'), 'utf-8');
  return `${grokMd}\n\n${docsIndex}`;
}

Session context:

// src/agent/grok-agent.ts
const messages = [
  { role: 'system', content: systemPrompt }, // GROK.md + docs-index.md
  ...conversationHistory,                    // Previous messages
  { role: 'user', content: userMessage }     // Current message
];

Token counting:

// Approximate: 1 token ≈ 4 characters
const estimatedTokens = text.length / 4;
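
Wrapped as a helper, this becomes (a sketch; the real implementation may use a proper tokenizer):

// Heuristic token estimator; a real tokenizer would be more accurate
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // 1 token ≈ 4 characters
}

// Matches the startup numbers above: a 6,400-byte GROK.md → ~1,600 tokens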

Future Enhancements

Automatic compaction:

// Planned
if (totalTokens > COMPACTION_THRESHOLD) {
  const summary = await compactOldMessages(messages);
  messages = [systemPrompt, summary, ...recentMessages];
}

Context caching:

// Planned
let cachedDocs = cache.get('common-docs');
if (!cachedDocs) {
  cachedDocs = await loadDocs();
  cache.set('common-docs', cachedDocs, TTL);
}


Status: Core functionality implemented; advanced features in progress

Efficient context management ensures fast, cost-effective AI interactions.