<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Anthropic Releases Opus 4.7: Four Core Upgrades, Token Costs Up 35%]]></title><description><![CDATA[<p dir="auto">Anthropic has officially released Opus 4.7, currently the most capable publicly available Claude model. (Internally, Anthropic also has a higher-tier model, Mythos, which is not yet open to the public.)<br />
This release focuses on four areas:</p>
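<p dir="auto">Task self-verification: after finishing a long task, the model automatically checks its output before returning the result, which significantly reduces hallucinations.<br />
Token budget control: you can set a token cap, and the model decides on its own how to split that budget between thinking and tool calls, avoiding wasted tokens.<br />
Adaptive thinking depth: the model adjusts how long it reasons according to task complexity, with no manual configuration required.<br />
High-resolution image input: high-resolution images are supported natively as input.</p>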
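<p dir="auto">Note: Opus 4.7 uses a new tokenizer, so the same content consumes roughly 35% more tokens than on the previous generation (a prompt that used to come to 1,000 tokens now comes to about 1,350). At unchanged per-token prices, costs rise by the same proportion, so re-evaluate your cost budget before adopting it.</p>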
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>基准测试</th>
<th>Opus 4.7</th>
<th>Opus 4.6</th>
<th>GPT-5.4</th>
<th>Gemini 3.1 Pro</th>
<th>Mythos Preview</th>
</tr>
</thead>
<tbody>
<tr>
<td>Agentic coding (SWE-bench Pro)</td>
<td>64.3%</td>
<td>53.4%</td>
<td>57.7%</td>
<td>54.2%</td>
<td><strong>77.8%</strong></td>
</tr>
<tr>
<td>Agentic coding (SWE-bench Verified)</td>
<td>87.6%</td>
<td>80.8%</td>
<td>—</td>
<td>80.6%</td>
<td><strong>93.9%</strong></td>
</tr>
<tr>
<td>Agentic terminal coding (Terminal-Bench 2.0)</td>
<td>69.4%</td>
<td>65.4%</td>
<td>75.1% <em>(self-reported harness)</em></td>
<td>68.5%</td>
<td><strong>82.0%</strong></td>
</tr>
<tr>
<td>Multidisciplinary reasoning - Humanity’s Last Exam (no tools)</td>
<td>46.9%</td>
<td>40.0%</td>
<td>42.7% <em>(Pro)</em></td>
<td>44.4%</td>
<td><strong>56.8%</strong></td>
</tr>
<tr>
<td>Multidisciplinary reasoning - Humanity’s Last Exam (with tools)</td>
<td>54.7%</td>
<td>53.3%</td>
<td>58.7% <em>(Pro)</em></td>
<td>51.4%</td>
<td><strong>64.7%</strong></td>
</tr>
<tr>
<td>Agentic search (BrowseComp)</td>
<td>79.3%</td>
<td>83.7%</td>
<td><strong>89.3%</strong> <em>(Pro)</em></td>
<td>85.9%</td>
<td>86.9%</td>
</tr>
<tr>
<td>Scaled tool use (MCP-Atlas)</td>
<td><strong>77.3%</strong></td>
<td>75.8%</td>
<td>68.1%</td>
<td>73.9%</td>
<td>—</td>
</tr>
<tr>
<td>Agentic computer use (OSWorld-Verified)</td>
<td>78.0%</td>
<td>72.7%</td>
<td>75.0%</td>
<td>—</td>
<td><strong>79.6%</strong></td>
</tr>
<tr>
<td>Agentic financial analysis (Finance Agent v1.1)</td>
<td><strong>64.4%</strong></td>
<td>60.1%</td>
<td>61.5% <em>(Pro)</em></td>
<td>59.7%</td>
<td>—</td>
</tr>
<tr>
<td>Cybersecurity vulnerability reproduction (CyberGym)</td>
<td>73.1%</td>
<td>73.8%</td>
<td>66.3%</td>
<td>—</td>
<td><strong>83.1%</strong></td>
</tr>
<tr>
<td>Graduate-level reasoning (GPQA Diamond)</td>
<td>94.2%</td>
<td>91.3%</td>
<td>94.4% <em>(Pro)</em></td>
<td>94.3%</td>
<td><strong>94.6%</strong></td>
</tr>
<tr>
<td>Visual reasoning - CharXiv Reasoning (no tools)</td>
<td>82.1%</td>
<td>69.1%</td>
<td>—</td>
<td>—</td>
<td><strong>86.1%</strong></td>
</tr>
<tr>
<td>Visual reasoning - CharXiv Reasoning (with tools)</td>
<td>91.0%</td>
<td>84.7%</td>
<td>—</td>
<td>—</td>
<td><strong>93.2%</strong></td>
</tr>
<tr>
<td>Multilingual Q&amp;A (MMMLU)</td>
<td>91.5%</td>
<td>91.1%</td>
<td>—</td>
<td><strong>92.6%</strong></td>
<td>—</td>
</tr>
</tbody>
</table>
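<p dir="auto">Opus 4.7 improves on Opus 4.6 on 12 of the 14 benchmarks. The exceptions are BrowseComp (search), where it drops 4.4 points (83.7% to 79.3%), and CyberGym (cybersecurity), where it slips 0.7 points (73.8% to 73.1%). Opus 4.7 posts the top score on only two benchmarks, MCP-Atlas (scaled tool use, 77.3%) and Finance Agent v1.1 (64.4%), and both by modest margins: 1.5 points ahead of Opus 4.6 on MCP-Atlas and 2.9 points ahead of GPT-5.4 Pro on Finance Agent. The MCP-Atlas result may reflect the value of the new token budget control mechanism. Mythos leads on nearly every benchmark where it has a score (the exception is BrowseComp, where GPT-5.4 Pro leads with 89.3%), but it has no published score for MCP-Atlas, Finance Agent or MMMLU, which looks like selective disclosure.</p>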
]]></description><link>https://welinux.com//topic/10/anthropic-发布-opus-4.7-四项核心升级-token-成本上涨-35</link><generator>RSS for Node</generator><lastBuildDate>Sun, 03 May 2026 03:06:43 GMT</lastBuildDate><atom:link href="https://welinux.com//topic/10.rss" rel="self" type="application/rss+xml"/><pubDate>Sat, 18 Apr 2026 13:55:02 GMT</pubDate><ttl>60</ttl></channel></rss>