Optimizing DeepSeek V4 Pro for Coding Agents

howardchen.substack.com
Thursday, June 18, 2026
  • Howard Chen developed 'cwcode,' a Go-based terminal harness enabling V4 Pro to perform at 90% of Claude-level quality.
  • Hash-anchored editing and caching optimizations allow V4 Pro to operate at 5% of the cost of competing models.
  • The harness uses 'Plan mode' and 'Rewind' checkpoints to support multi-hour autonomous coding loops on production codebases.

Developer Howard Chen details the implementation of 'cwcode,' a Go-based terminal harness built to get better coding performance out of the DeepSeek V4 Pro model. With 'hash-anchored' editing, the author reports a 30–40% reduction in output tokens and fewer retries, which makes V4 Pro more practical in autonomous loops. The harness compensates for V4 Pro's weaknesses in planning and ambiguity tolerance with two features: 'Plan mode,' which restricts the agent to read-only tools, and 'Rewind,' a checkpoint system that lets users restore file states identified by SHA-256 hashes.
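A 'Rewind'-style checkpoint can be pictured as a content-addressed store keyed by SHA-256. The sketch below is illustrative only and is not taken from cwcode: the names Workspace, Checkpoint, Store, Snapshot, and Rewind are hypothetical, and an in-memory map stands in for the real filesystem.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Workspace is an in-memory stand-in for the files an agent may edit
// (path -> contents). A real harness would read and write the filesystem.
type Workspace map[string]string

// Checkpoint records, for each path, the SHA-256 digest of its contents
// at the moment the checkpoint was taken.
type Checkpoint map[string]string

// Store keeps file contents addressed by digest, so contents shared by
// many checkpoints are stored only once. (Hypothetical type.)
type Store struct {
	blobs map[string]string
}

// NewStore returns an empty blob store.
func NewStore() *Store { return &Store{blobs: make(map[string]string)} }

// digest returns the hex-encoded SHA-256 of s.
func digest(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// Snapshot saves every file in ws and returns a checkpoint describing it.
func (s *Store) Snapshot(ws Workspace) Checkpoint {
	cp := make(Checkpoint, len(ws))
	for path, content := range ws {
		d := digest(content)
		s.blobs[d] = content
		cp[path] = d
	}
	return cp
}

// Rewind restores ws to the state recorded in cp. Files created after the
// checkpoint are removed; changed or deleted files get their old contents.
func (s *Store) Rewind(ws Workspace, cp Checkpoint) error {
	// Validate first so a missing blob cannot leave ws half-restored.
	for path, d := range cp {
		if _, ok := s.blobs[d]; !ok {
			return fmt.Errorf("rewind: no stored contents for %s (%s)", path, d)
		}
	}
	for path := range ws {
		if _, ok := cp[path]; !ok {
			delete(ws, path)
		}
	}
	for path, d := range cp {
		ws[path] = s.blobs[d]
	}
	return nil
}
```

In this sketch a harness would call Snapshot before each agent turn that may write files and keep the returned Checkpoint; if the turn goes wrong, the user picks a checkpoint and Rewind puts the workspace back.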

According to the author, DeepSeek V4 Pro costs approximately 5% of what Claude Sonnet 4 does. The quoted input prices are $0.435 per million tokens for V4 Pro and $3 for Claude, a ratio of about 14.5% on input list price alone. The author reports a session-cumulative cache hit ratio of 85% or higher, achieved by keeping prompt prefixes byte-stable: tool schemas are sorted and reasoning content is stripped from outbound requests. With these optimizations, a 50-turn autonomous task reportedly costs between $0.40 and $0.80. The harness has been used to develop radiotherapy dose-prediction models and financial research agents, and the agent itself is used to iterate on cwcode's own codebase.
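The byte-stable prefix idea can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not cwcode's actual code: the Tool, Message, and Request types, the JSON field names (including reasoning_content), and the BuildRequest function are all hypothetical. It relies on one documented property of Go's encoding/json: struct fields are emitted in declaration order and map keys are sorted.

```go
package main

import (
	"encoding/json"
	"sort"
)

// Tool is a hypothetical tool schema; the field names are illustrative.
type Tool struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	Parameters  map[string]any `json:"parameters"`
}

// Message is a hypothetical chat message. Reasoning holds the model's
// reasoning text from an earlier turn, if any.
type Message struct {
	Role      string `json:"role"`
	Content   string `json:"content"`
	Reasoning string `json:"reasoning_content,omitempty"`
}

// Request places the tools first so the stable part of the payload
// forms the prefix that a prompt cache can match.
type Request struct {
	Tools    []Tool    `json:"tools"`
	Messages []Message `json:"messages"`
}

// BuildRequest serializes a request so that the same logical input always
// yields the same bytes: tools are sorted by name, and reasoning text is
// dropped from every outbound message. The inputs are not modified.
func BuildRequest(tools []Tool, history []Message) ([]byte, error) {
	sorted := append([]Tool(nil), tools...) // copy before sorting
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	msgs := make([]Message, len(history))
	for i, m := range history { // m is a copy of history[i]
		m.Reasoning = ""
		msgs[i] = m
	}
	return json.Marshal(Request{Tools: sorted, Messages: msgs})
}
```

Two calls that register the same tools in a different order produce identical bytes here, so an order change on its own would no longer alter the prefix between turns.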

The author emphasizes that agent failures are often harness problems rather than model problems. Key design principles include editing code by reference through 'hashlines,' serializing tools deterministically so that ordering differences do not invalidate the prompt cache, and replacing silent aborts with synthesized assistant messages when the model enters a failure loop. cwcode is implemented in 12k lines of Go and uses Bubbletea for its terminal user interface; a decoupled 'Sink' interface keeps the agent loop agnostic to the rendering environment. Structured harness logic of this kind compensates for model weaknesses, which is what lets developers treat V4 Pro as a reliable daily driver for coding tasks.
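The summary does not spell out how 'hashlines' work, so the following Go sketch is one plausible reading rather than cwcode's implementation. It assumes each line is shown to the model with a short content hash, and that an edit names the line range and its hashes instead of reproducing the old text. The anchor format, the Edit type, and the Render and Apply functions are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// anchor returns a short tag derived from a line's content
// (the first 6 hex characters of its SHA-256 digest).
func anchor(line string) string {
	sum := sha256.Sum256([]byte(line))
	return hex.EncodeToString(sum[:])[:6]
}

// Render formats a file for the model, prefixing every line with
// "N:tag|" so an edit can refer to lines without quoting them.
func Render(lines []string) string {
	var b strings.Builder
	for i, l := range lines {
		fmt.Fprintf(&b, "%d:%s|%s\n", i+1, anchor(l), l)
	}
	return b.String()
}

// Edit replaces lines Start..End (1-based, inclusive). StartTag and EndTag
// are the anchors the model saw for the first and last line of the range.
type Edit struct {
	Start, End       int
	StartTag, EndTag string
	Replacement      []string
}

// Apply returns a new slice with the edit applied. It refuses the edit when
// the anchors no longer match, which means the file changed after the model
// read it. The input slice is left untouched.
func Apply(lines []string, e Edit) ([]string, error) {
	if e.Start < 1 || e.End < e.Start || e.End > len(lines) {
		return nil, fmt.Errorf("edit range %d-%d out of bounds", e.Start, e.End)
	}
	if anchor(lines[e.Start-1]) != e.StartTag || anchor(lines[e.End-1]) != e.EndTag {
		return nil, fmt.Errorf("stale anchor: lines %d-%d changed since they were read", e.Start, e.End)
	}
	out := make([]string, 0, len(lines)-(e.End-e.Start+1)+len(e.Replacement))
	out = append(out, lines[:e.Start-1]...)
	out = append(out, e.Replacement...)
	out = append(out, lines[e.End:]...)
	return out, nil
}
```

Under this reading, the model emits only the anchors and the replacement lines, which would fit the reported drop in output tokens, and a stale or mistaken edit is rejected with an explicit error instead of being applied to the wrong place.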

Read original (English)·Jun 16, 2026
#deepseek #coding agent #harness #prompt caching #hashlines #go #autonomous loop