Error Handling and Recovery

How Grok One-Shot handles errors and recovers from failures.

Overview

Grok One-Shot is designed with robust error handling and automatic recovery. The AI can detect errors, diagnose causes, and attempt corrections autonomously.

Error Categories

1. API Errors

X.AI API failures:

Common errors:
- 401 Unauthorized (invalid API key)
- 429 Too Many Requests (rate limit)
- 500 Internal Server Error (API issue)
- Network timeout
- Connection refused

Handling:

Automatic retry for transient errors (3 attempts)
Exponential backoff (2s, 4s, 8s)
Clear error messages to user
Graceful degradation
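A minimal bash sketch of the retry policy above. It wraps a command that prints an HTTP status code, such as `curl -s -o /dev/null -w '%{http_code}'`. Several details are assumptions for illustration, not Grok One-Shot's actual implementation: the function names, which statuses count as transient, and reading "3 attempts" as three retries after the first request (one retry for each of the 2s, 4s and 8s waits).

```bash
#!/usr/bin/env bash
# Sketch only: retry transient API failures with exponential backoff.
# Assumptions: 429, 5xx and curl's "000" (no response: timeout or connection
# refused) are transient; 401 and other 4xx fail immediately.

# Return 0 if an HTTP status code is worth retrying.
is_transient_status() {
  case "$1" in
    000|429|500|502|503|504) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the wait before retry number N: base * 2^(N-1), i.e. 2s, 4s, 8s for base 2.
backoff_delay() {
  local n=$1 base=${2:-2}
  echo $(( base * (1 << (n - 1)) ))
}

# retry_request MAX_RETRIES BASE_DELAY CMD [ARGS...]
# Runs CMD, which must print an HTTP status code, until it returns 2xx,
# returns a non-transient status, or MAX_RETRIES retries have been used.
# Prints the final status code; returns 0 only on 2xx.
retry_request() {
  local max_retries=$1 base=$2
  shift 2
  local retry=0 status delay
  while true; do
    status=$("$@")
    if [[ $status == 2?? ]]; then
      echo "$status"
      return 0
    fi
    if ! is_transient_status "$status" || (( retry >= max_retries )); then
      echo "Request failed with HTTP $status after $retry retries." >&2
      echo "$status"
      return 1
    fi
    retry=$(( retry + 1 ))
    delay=$(backoff_delay "$retry" "$base")
    echo "HTTP $status. Retrying in ${delay} seconds ($retry/$max_retries)..." >&2
    sleep "$delay"
  done
}
```

For example, `retry_request 3 2 curl -s -o /dev/null -w '%{http_code}' "$URL"`, where `URL` is a placeholder for whichever endpoint is being called. A 401 is returned after one request, because retrying an invalid key cannot succeed.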

Example:

User: "Analyze the codebase"

API Error: 429 Too Many Requests

AI Response:
"Rate limit exceeded. Retrying in 2 seconds..."
[Waits 2s]
"Retrying request..."
[Success]

2. File Operation Errors

Read errors:

Common errors:
- File not found
- Permission denied
- File too large
- Invalid encoding

AI recovery:

User: "Read auth.ts"

Error: File not found

AI Recovery:
1. Glob("**/*auth*/*.ts")
→ Finds: src/auth/middleware.ts
2. Read(src/auth/middleware.ts)
→ Success
3. "Found the file at src/auth/middleware.ts"
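The recovery above comes down to one rule. When the exact path is missing, search the tree for files with a similar name, and only proceed automatically when exactly one file matches. A bash sketch of that idea, using `find` in place of the Glob tool. The function name, the `node_modules` exclusion and the single-match rule are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Sketch only: fall back from an exact path to a name search, like
# Glob("**/*user-service*.ts") followed by Read on the single match.

# resolve_path NAME [ROOT]
# Prints NAME if it exists. Otherwise searches ROOT (default .) for files whose
# name contains NAME's stem and has the same extension.
# Returns 0 on a unique match, 1 if nothing matched, 2 if several matched.
resolve_path() {
  local name=$1 root=${2:-.}
  if [[ -e $name ]]; then
    echo "$name"
    return 0
  fi

  local base=${name##*/} pattern
  if [[ $base == *.* ]]; then
    pattern="*${base%.*}*.${base##*.}"   # user-service.ts -> *user-service*.ts
  else
    pattern="*${base}*"
  fi

  local matches count
  matches=$(find "$root" -type f -name "$pattern" ! -path '*/node_modules/*' | sort)
  count=$(printf '%s' "$matches" | grep -c . || true)

  if (( count == 0 )); then
    echo "File not found: $name" >&2
    return 1
  elif (( count > 1 )); then
    echo "Several candidates for $name:" >&2
    printf '%s\n' "$matches" >&2
    return 2
  fi
  echo "$matches"
}
```

Returning a distinct status for "several matches" lets the caller ask the user instead of guessing, which is the same judgment call the Edit recovery makes for non-unique strings.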

Write/Edit errors:

Common errors:
- Permission denied
- Disk full
- File already exists (Write)
- old_string not found or not unique (Edit)

AI recovery:

Error: old_string not unique in Edit

AI Recovery:
1. Read file to see content
2. Find unique surrounding context
3. Retry Edit with larger old_string
4. Success
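The uniqueness rule behind this recovery can be sketched in bash. The function counts literal occurrences of old_string and only writes the file when there is exactly one, so the caller knows to retry with more surrounding context. The function name and exit codes are illustrative assumptions; this is not Grok One-Shot's Edit tool.

```bash
#!/usr/bin/env bash
# Sketch only: an Edit that refuses ambiguous replacements.
# edit_unique FILE OLD_STRING NEW_STRING
# Returns 0 after replacing the single occurrence, 1 if OLD_STRING is absent,
# 2 if it occurs more than once (retry with a larger OLD_STRING),
# 3 for invalid input (unreadable file or empty OLD_STRING).
edit_unique() {
  local file=$1 old=$2 new=$3
  if [[ -z $old || ! -r $file ]]; then
    echo "Need a readable file and a non-empty old_string" >&2
    return 3
  fi

  # Read the whole file; the trailing "x" stops $(...) from dropping final newlines.
  local content
  content=$(cat "$file"; printf x)
  content=${content%x}

  # Count non-overlapping literal occurrences ("$old" is quoted, so no globbing).
  local rest=$content count=0
  while [[ $rest == *"$old"* ]]; do
    rest=${rest#*"$old"}
    count=$(( count + 1 ))
  done

  if (( count == 0 )); then
    echo "old_string not found in $file" >&2
    return 1
  elif (( count > 1 )); then
    echo "old_string not unique in $file (found $count times)" >&2
    return 2
  fi
  printf '%s' "${content/"$old"/"$new"}" > "$file"
}
```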

3. Command Execution Errors

Bash tool failures:

Common errors:
- Command not found
- Exit code non-zero
- Timeout
- Permission denied

AI recovery:

User: "Run the tests"

Error: npm: command not found

AI Recovery:
1. Check for yarn: which yarn
2. Try: yarn test
3. Or: bun test
4. Or: node ./test.js
5. Report findings to user
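The fallback order above (npm, then yarn, then bun, then a plain Node script) can be expressed with `command -v`, which reports whether a command is available on PATH. A minimal bash sketch; the function name is hypothetical and the order simply mirrors the list above.

```bash
#!/usr/bin/env bash
# Sketch only: choose how to run tests when the first tool is missing.
# pick_test_runner prints the first usable command, or returns 1 so the
# caller can report what was tried.
pick_test_runner() {
  local tool
  for tool in npm yarn bun; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool test"
      return 0
    fi
  done
  if [[ -f ./test.js ]] && command -v node >/dev/null 2>&1; then
    echo "node ./test.js"
    return 0
  fi
  echo "No test runner found (tried npm, yarn, bun, node ./test.js)" >&2
  return 1
}
```

For example, `runner=$(pick_test_runner) && $runner`. Checking availability first avoids running each candidate and parsing "command not found" output.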

4. Tool Limit Errors

MAX_TOOL_ROUNDS exceeded:

Error: Reached maximum tool calls (400)

AI Response:
"I've reached the tool limit. Here's what I've completed:
- Analyzed 47 files
- Found 12 issues
- Fixed 8 of them

Remaining tasks:
- 4 issues need review
- Would you like me to continue in a new session?"
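The behavior above can be sketched as a bash loop: stop at a fixed budget of tool rounds, then report what was finished and what is left. `process_item` is a hypothetical stand-in for one tool call. Only the `MAX_TOOL_ROUNDS` name and the limit of 400 shown above come from this page.

```bash
#!/usr/bin/env bash
# Sketch only: spend at most MAX_TOOL_ROUNDS rounds, then summarize.

# Hypothetical stand-in for one tool call on one item.
process_item() {
  echo "processing $1" >&2
}

# run_with_budget ITEM...
# Processes one item per round. Once the budget is spent, the rest are
# listed as remaining and the function returns 1.
run_with_budget() {
  local budget=${MAX_TOOL_ROUNDS:-400} rounds=0 item
  local -a completed=() remaining=()
  for item in "$@"; do
    if (( rounds >= budget )); then
      remaining+=("$item")
      continue
    fi
    process_item "$item"
    rounds=$(( rounds + 1 ))
    completed+=("$item")
  done
  echo "Completed ${#completed[@]}: ${completed[*]}"
  if (( ${#remaining[@]} > 0 )); then
    echo "Reached the limit of $budget tool rounds. Remaining: ${remaining[*]}"
    return 1
  fi
}
```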

Error Handling Strategies

Implemented

Automatic retry:

API calls retry on transient errors
Exponential backoff
Max 3-4 retry attempts
Clear error reporting

Graceful degradation:

Continue on non-critical failures
Report partial results
Suggest manual alternatives

Error recovery:

AI adapts to errors automatically
Tries alternative approaches
Corrects file paths, commands
Adjusts strategy based on failures

Error reporting:

Clear, actionable error messages
Context about what went wrong
Suggestions for fixes
Logs available for debugging

Partially Implemented

Transaction rollback:

No atomic multi-file operations
No automatic undo on failure
Manual recovery needed

Error prevention:

No pre-flight validation
No dry-run mode
No change preview

Advanced recovery:

No checkpoint/resume
No partial state save
Limited context recovery

Planned Features

Atomic operations:

Multi-file transactions
Automatic rollback on failure
Checkpoint and resume
Change preview before apply

Enhanced validation:

Pre-flight checks
Dry-run mode
Impact analysis
Confirmation with preview

Smart recovery:

Learn from previous errors
Suggest preventive measures
Auto-fix common issues
Recovery strategies library

Error Recovery Examples

Example 1: File Not Found

Scenario:

User: "Fix the bug in user-service.ts"

AI:
1. Edit(user-service.ts)
Error: File not found

Recovery:
2. Glob("**/*user-service*")
→ Found: src/services/user-service.ts

3. Edit(src/services/user-service.ts)
Success: Bug fixed

Response: "Fixed the bug in src/services/user-service.ts"

Example 2: Command Not Found

Scenario:

User: "Run the linter"

AI:
1. Bash("eslint .")
Error: eslint: command not found

Recovery:
2. Bash("npm run lint")
Success: Linter runs via npm script

Response: "Ran linter via npm run lint. Found 5 issues..."

Example 3: Edit String Not Found

Scenario:

User: "Change the timeout to 5000"

AI:
1. Edit(config.ts, old_string="timeout: 3000", new_string="timeout: 5000")
Error: old_string not unique (found 3 times)

Recovery:
2. Read(config.ts)
3. Find unique context around each occurrence
4. Ask user which one to change
OR auto-select based on context

Response: "Found 3 timeout values. Changed the API timeout to 5000."

Example 4: API Rate Limit

Scenario:

User: "Analyze all files in src/"

AI:
1. Processing files...
2. API Error: 429 Too Many Requests

Recovery:
3. Wait 2 seconds
4. Retry request
5. Continue processing

Response: "Analyzed 47 files (with brief rate limit pause)..."

Example 5: Permission Denied

Scenario:

User: "Create a log file"

AI:
1. Write(/var/log/app.log)
Error: Permission denied

Recovery:
2. Write(./logs/app.log)
Check: ./logs/ exists?
3. Bash("mkdir -p ./logs")
4. Write(./logs/app.log)
Success

Response: "Created log file at ./logs/app.log"
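A bash sketch of this fallback. It tries the requested path first. If that write fails for any reason (permission denied, missing directory, and so on), it creates a project-local directory and writes there instead. The function name and argument order are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: write to a preferred path, falling back to a local directory.
# write_with_fallback PATH FALLBACK_DIR CONTENT
# Prints the path actually written; returns 1 if both attempts fail.
write_with_fallback() {
  local preferred=$1 fallback_dir=$2 content=$3

  # The braces let 2>/dev/null hide the shell's own "Permission denied" message.
  if { printf '%s\n' "$content" > "$preferred"; } 2>/dev/null; then
    echo "$preferred"
    return 0
  fi

  local target="$fallback_dir/${preferred##*/}"
  mkdir -p "$fallback_dir" && printf '%s\n' "$content" > "$target" || return 1
  echo "Could not write $preferred; wrote $target instead." >&2
  echo "$target"
}
```

For example, `write_with_fallback /var/log/app.log ./logs "started"` prints `./logs/app.log` for a user without write access to `/var/log`.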

Error Prevention

Best Practices

DO:

Use confirmations for risky operations
Review changes before approving
Test in safe environment first
Keep backups (git commit often)
Monitor error logs

DON'T:

Disable confirmations for untrusted code
Ignore repeated errors
Run with elevated privileges unnecessarily
Modify production without testing

Pre-Execution Checks

Manual checks:

Before major changes:
> What files will this modify?
> Show me a summary of planned changes
Review and approve
> Proceed with changes

After changes:
> Run tests to verify
> Show me what changed (git diff)
> Any issues found?

Safe Practices

Use version control:

# Before AI makes changes
git add -A
git commit -m "Before AI refactoring"

# AI makes changes
[Changes applied]

# If something goes wrong
git diff # Review changes
git restore . # Undo changes to tracked files if needed

Test in isolation:

# Create test branch
git checkout -b test/ai-changes

# Let AI make changes
grok "refactor authentication"

# Review and test
npm test
git diff

# If good: merge
git checkout main
git merge test/ai-changes

# If bad: discard
git checkout main
git branch -D test/ai-changes

Debugging Errors

Enable Debug Mode

See detailed error information:

export GROK_DEBUG=true
grok

Debug output includes:

API request/response details
Tool call parameters
Error stack traces
Retry attempts
Internal state

Check Logs

Startup log:

cat xcli-startup.log

Contains:

Environment configuration
Loaded settings
MCP server status
Startup errors

Session files:

# View errors in session
jq '.messages[] | select(.error)' ~/.grok/sessions/latest-session.json

Common Error Patterns

API key issues:

Error: 401 Unauthorized

Check:
1. Is GROK_API_KEY set?
echo $GROK_API_KEY
2. Is key valid?
Try in X.AI console
3. Is key in settings.json correct?
cat ~/.grok/settings.json
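The first check can be scripted so that the key itself is never echoed to the terminal or captured in scrollback. A bash sketch; `check_api_key` is a hypothetical helper. It only reports whether `~/.grok/settings.json` exists, because the layout of that file is not described here.

```bash
#!/usr/bin/env bash
# Sketch only: report where an API key might come from without printing it.
# Returns 0 if GROK_API_KEY is set and non-empty, 1 otherwise.
check_api_key() {
  if [[ -n ${GROK_API_KEY:-} ]]; then
    echo "GROK_API_KEY is set (${#GROK_API_KEY} characters)."
    return 0
  fi
  echo "GROK_API_KEY is not set."
  if [[ -f $HOME/.grok/settings.json ]]; then
    echo "Found $HOME/.grok/settings.json; check the key stored there."
  else
    echo "No $HOME/.grok/settings.json either. Set GROK_API_KEY or use -k."
  fi
  return 1
}
```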

File operation issues:

Error: Permission denied

Check:
1. File permissions
ls -la <file>
2. Directory permissions
ls -ld <directory>
3. Ownership
ls -la <file>

Network issues:

Error: ECONNREFUSED

Check:
1. Internet connection
ping api.x.ai
2. Proxy settings
echo "$HTTPS_PROXY" "$HTTP_PROXY"
3. Firewall rules

Error Recovery Workflows

Workflow 1: Investigate and Fix

1. Error occurs
2. Enable debug mode
export GROK_DEBUG=true
3. Reproduce the error
4. Review the debug output
5. Identify the root cause
6. Apply a fix
7. Verify success

Workflow 2: Retry with Different Approach

1. Error: Initial approach fails
2. AI tries alternative
- Different file path
- Different command
- Different tool
3. Success or escalate to user

Workflow 3: Partial Success

1. Task: "Fix 10 bugs"
2. AI fixes 7 successfully
3. Error on 8th bug
4. AI reports:
"Fixed 7 of 10 bugs. Encountered error on bug #8.
Here's what was fixed: [...list...]
Remaining: Bug #8 needs manual review due to [reason]"
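Workflow 3 depends on not stopping at the first failure. A bash sketch of that loop; `fix_item` is a hypothetical placeholder for whatever fixes one bug, and the report wording is illustrative.

```bash
#!/usr/bin/env bash
# Sketch only: attempt every item, keep going after failures, report both lists.

# Hypothetical placeholder: fix one item, returning non-zero on failure.
fix_item() {
  echo "fixing $1" >&2
}

# fix_all ITEM...
# Returns 1 if any item still needs manual review.
fix_all() {
  local item
  local -a fixed=() failed=()
  for item in "$@"; do
    if fix_item "$item"; then
      fixed+=("$item")
    else
      failed+=("$item")
    fi
  done
  echo "Fixed ${#fixed[@]} of $#."
  if (( ${#fixed[@]} > 0 )); then
    echo "Fixed: ${fixed[*]}"
  fi
  if (( ${#failed[@]} > 0 )); then
    echo "Needs manual review: ${failed[*]}"
    return 1
  fi
}
```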

Error Messages

User-Friendly Messages

Good error messages:

"API key not found. Set GROK_API_KEY environment variable or use -k flag."
"Rate limit exceeded. Waiting 5 seconds before retry..."
"File not found: user-service.ts. Did you mean src/services/user-service.ts?"

Poor error messages:

"Error 401"
"Request failed"
"Unknown error"

Error Context

Grok One-Shot provides context:

Error: Edit failed

Context provided:
- What was being edited
- What change was attempted
- Why it failed
- What to try instead

Example:
"Failed to edit src/auth/middleware.ts:
Could not find exact match for 'const timeout = 3000'
Found similar: 'const apiTimeout = 3000'
Should I try editing that instead?"
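One way to keep messages on the "good" side is to map each failure class to a sentence that says what happened and what to do next. A bash sketch covering the status codes listed under API Errors. The function name and exact wording are illustrative assumptions. `000` is how `curl -w '%{http_code}'` reports that no response arrived.

```bash
#!/usr/bin/env bash
# Sketch only: turn a raw HTTP status into an actionable message.
explain_status() {
  local code=$1
  case "$code" in
    401) echo "401 Unauthorized: the API key was rejected. Check GROK_API_KEY or pass a key with -k." ;;
    429) echo "Rate limit exceeded. Waiting before retrying; reduce request frequency if this repeats." ;;
    5??) echo "X.AI API error ($code). This is usually temporary; the request will be retried." ;;
    000) echo "No response from the API (timeout or connection refused). Check your network and proxy settings." ;;
    *)   echo "Request failed with HTTP $code. Run with GROK_DEBUG=true for details." ;;
  esac
}
```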

Troubleshooting Guide

Error: "No API key found"

Cause: GROK_API_KEY not set

Solution:

export GROK_API_KEY="your-key"
# or
grok -k "your-key"

Error: "Rate limit exceeded"

Cause: Too many API requests

Solution:

Wait a few minutes
Reduce request frequency
Upgrade API plan

Error: "Too many tool rounds"

Cause: Hit MAX_TOOL_ROUNDS limit

Solution:

export MAX_TOOL_ROUNDS=500

Error: "File not found"

Cause: Incorrect file path

Solution:

AI will search for correct path
Or provide full path to AI

Error: "Permission denied"

Cause: No write permission

Solution:

# Fix permissions
chmod +w <file>

# Or use different location
