AI News

240

I've been waiting over a month for Anthropic to respond to my billing issue

I've been waiting over a month for Anthropic to respond to my billing issue
HN +6 sources hn
anthropicclaude
Anthropic’s customer‑service lag has resurfaced as a user finally received a reply after more than a month of silence over a disputed charge. The complainant, who had been promised a follow‑up on a billing error linked to a recent Claude subscription, only heard back in early 2025 when the company asked for bank details to process a refund. The delay, which the user describes as “being ignored for months,” underscores growing frustration among developers and enterprises that rely on Anthropic’s API for everything from semantic search (see our April 8 piece on ChromaDB + Ollama) to internal tooling. The episode matters because it arrives amid a turbulent period for the AI start‑up. Anthropic has recently faced a high‑profile breach of its Department of Defense contract, a policy overhaul of its Responsible Scaling framework, and public scrutiny over the security implications of its Claude Mythos model. Repeated service‑level failures risk eroding confidence in a firm that positions itself as a safe‑first alternative to other frontier‑AI providers. For businesses that have integrated Claude into production pipelines, delayed refunds or unresponsive support can translate into operational risk and budget overruns. What to watch next is whether Anthropic will issue a formal apology or adjust its support SLA to address the backlash. Analysts expect the updated Responsible Scaling policy, announced this week, to include clearer commitments on customer‑service standards, especially as regulators in the EU and the US tighten oversight of AI vendor contracts. A swift, transparent resolution could help the company regain trust, while a continued pattern of neglect may accelerate migration to competing platforms such as OpenAI or Cohere, and could fuel further political pressure, exemplified by recent calls to bar Anthropic from government use.
189

Just published a lightweight MCP server for Mastodon — manage toots directly from Claude Code: ✍️ C

Just published a lightweight MCP server for Mastodon — manage toots directly from Claude Code:  ✍️ C
Mastodon +7 sources mastodon
claude
A new open‑source project released on GitHub today adds a lightweight Message Control Protocol (MCP) server that lets Anthropic’s Claude Code interact directly with Mastodon. The “mastodon‑mcp” server, built in Python on top of the Mastodon.py library, exposes a simple stdio‑based transport that Claude Code can call to create, edit or delete toots, upload media with alt‑text, and query timelines, notifications and search results. Authentication is handled through environment variables, keeping credentials out of code and simplifying deployment on personal servers or CI pipelines. The launch matters because it extends Claude Code’s reach beyond traditional development environments into the social‑media sphere. Earlier this week we reported on Claude Code plugins for stack‑based workflows and multi‑repo context handling; this MCP bridge is the first to give the AI assistant native control over a federated micro‑blogging platform. Developers can now script content generation, automate community management, or prototype AI‑driven bots without writing bespoke API wrappers. Because the server is deliberately minimal—no GUI, no heavyweight dependencies—it can run on modest hardware, aligning with the Nordic tech community’s emphasis on efficient, privacy‑respecting tools. What to watch next is how quickly the community adopts the tool and whether Anthropic integrates similar MCP endpoints for other services. Potential concerns include misuse for spam or coordinated misinformation, prompting a need for rate‑limiting and moderation safeguards. The repository already lists a roadmap that includes OAuth token refresh handling and support for Mastodon’s newer API extensions. If the project gains traction, we may see a wave of AI‑augmented social‑media utilities that blur the line between code assistant and content creator, a trend worth monitoring as both AI and decentralized platforms mature.
158

I’m sorry … but your ai isn’t worth my privacy. # ai # generativeAI # privacy # tech #

I’m sorry … but your ai isn’t worth my privacy.   # ai    # generativeAI    # privacy    # tech    #
Mastodon +6 sources mastodon
privacy
A coalition of consumer‑rights groups in Sweden, Norway and Denmark has launched a public campaign titled “Your AI isn’t worth my privacy”, urging users to stop feeding personal data to generative‑AI services. The initiative, announced on Tuesday, cites a new internal audit of popular chat‑bot platforms that found prompt histories, device identifiers and even inferred sentiment scores are routinely logged and shared with third‑party advertisers. Under the EU’s General Data Protection Regulation and the forthcoming AI Act, such practices could constitute unlawful processing unless users give explicit, informed consent. The campaign’s organizers filed a petition with the European Commission demanding tighter enforcement of data‑minimisation rules and mandatory opt‑out mechanisms for all AI‑driven products sold in the Nordic market. They also call for a “privacy‑by‑design” certification that would let users verify whether a service stores or discards their inputs. The move follows a wave of anxiety we reported on 8 April, when a senior editor confessed that “I’m now worried about AI” after a personal experiment with ChatGPT revealed unexpected data retention. It also echoes concerns raised in recent analyses that up to 40 % of European AI startups may be overstating their use of genuine machine‑learning models, blurring the line between true AI and simple scripted tools. Why it matters is twofold: first, the Nordic region has long championed strong privacy standards, and a breach of trust could slow adoption of AI in health, finance and public services. Second, the backlash threatens the data‑driven business models that underpin many AI startups, potentially reshaping investment flows toward privacy‑preserving architectures such as on‑device inference and federated learning. Watch for the European Commission’s response, expected in the coming weeks, and for any amendments to the AI Act that could impose stricter audit obligations. Tech firms are already rolling out “no‑log” modes and transparent data‑usage dashboards, but whether these measures will satisfy regulators and skeptical users remains to be seen.
150

I Built a CLI That X-Rays Your AI Coding Sessions — No LLM, <5ms (Open Source)

I Built a CLI That X-Rays Your AI Coding Sessions — No LLM, <5ms (Open Source)
Dev.to +6 sources dev.to
agentsclaudecursorgeminiopen-sourcereasoning
A developer has released an open‑source command‑line tool that “X‑rays” AI‑assisted coding sessions, scoring every prompt in under five milliseconds and doing so without invoking a large language model. The utility, dubbed **rtk**, intercepts the text you type into any supported AI coding agent—Claude Code, Cursor, Gemini CLI, Aider, Codex, Windsurf, Cline, among others—compresses the output before it reaches the model’s context window and assigns a numeric quality score. Over ten weeks the author logged 3,140 prompts, posting an average score of 38, a metric the creator says correlates with downstream success rates such as fewer compilation errors and reduced token consumption. Why it matters is twofold. First, prompt engineering has become a hidden bottleneck in developer workflows that now lean heavily on generative AI. Real‑time feedback lets programmers refine their queries before the model processes them, cutting wasted cycles and cloud costs. Second, because rtk operates entirely locally, it sidesteps the privacy concerns that have dogged commercial AI services—a theme we explored in our April 9 piece on the trade‑off between convenience and data exposure. By shrinking the prompt before it hits the model, rtk also stretches the effective context window, enabling longer, more coherent coding sessions without the token‑budget penalties that typically force developers to truncate history. The release builds on a series of community‑driven tools that treat AI‑augmented development as a first‑class artifact. Earlier this month we covered a “time‑machine” CLI that snapshots sessions for later review, and a tmux‑based IDE that persists terminal state across reboots. rtk’s scoring engine adds a quantitative layer to those retrospectives, turning anecdotal notes into actionable metrics. What to watch next: the project’s GitHub repo already lists integration hooks for emerging agents, and the author hints at a dashboard that visualises score trends over time. If the community adopts rtk widely, we could see a new benchmark for prompt quality, and perhaps commercial IDEs will embed similar analytics to market “smarter” AI coding experiences. Keep an eye on the repo’s issue tracker for extensions that tie scores to automated refactoring or CI pipelines.
148

Claude Mythos Finds Bugs Like a Senior Dev Finds Excuses to Skip Standup

Claude Mythos Finds Bugs Like a Senior Dev Finds Excuses to Skip Standup
Dev.to +6 sources dev.to
anthropicclaude
Claude Mythos, Anthropic’s AI‑driven code‑review system, has uncovered a 27‑year‑old vulnerability in the OpenBSD operating system. The flaw, buried deep in a networking subsystem, survived more than two decades of manual code reviews, security audits and automated scans before the AI flagged it as a potential exploit. OpenBSD maintainers confirmed the issue on Thursday and are preparing a patch that will be rolled out in the next release cycle. The discovery underscores the growing potency of generative‑AI tools in software security. As we reported on 8 April, Claude Mythos had already outperformed conventional security teams by surfacing thousands of zero‑day flaws in a matter of weeks. Its latest success shows the model can locate defects that have eluded even the most rigorous human processes, raising the bar for what can be expected from automated code analysis. For OpenBSD, a project prized for its emphasis on correctness and minimal attack surface, the bug is a reminder that even the most disciplined codebases are not immune to hidden defects. The patch will likely close a remote‑code‑execution vector that could have been weaponised in legacy systems still running older OpenBSD versions. More broadly, the episode fuels debate over how much trust to place in AI‑generated findings and whether such tools should become a standard part of the software development lifecycle. Looking ahead, Anthropic plans to expand Mythos’s integration with open‑source repositories and to offer a commercial “preview” service for enterprise codebases. Security researchers will be watching how quickly the OpenBSD community can remediate the flaw and whether other long‑standing projects—such as the Linux kernel or FFmpeg, which Mythos also flagged—will see similar AI‑driven audits. The next few months could see a surge in AI‑assisted vulnerability disclosures, reshaping the balance between human expertise and machine‑scale code scrutiny.
136

How to Build Self-Healing AI Agents with Monocle, Okahu MCP and OpenCode

Dev.to +5 sources dev.to
agents
A new tutorial released this week shows developers how to stitch together Monocle, Okahu’s MCP telemetry platform and the open‑source OpenCode agent suite to create AI‑driven coding assistants that can debug themselves. The guide walks readers through setting up a sandbox, launching an OpenCode primary agent, instrumenting its actions with Monocle traces, and feeding the resulting telemetry into Okahu MCP. When the agent’s generated code throws an exception, the system captures the full error stack, context‑aware state and recent file changes, then triggers a “heal” routine that rewrites the offending snippet and retries the task – up to two automatic attempts per failure. The breakthrough matters because today most AI coding assistants still rely on human engineers to interpret logs and patch broken code. By embedding observability and feedback loops directly into the agent’s runtime, the workflow moves a step closer to fully autonomous software development pipelines. Reduced manual debugging can accelerate prototyping, lower operational costs and improve reliability for continuous‑integration environments that already lean on AI for code generation. Moreover, the approach demonstrates a practical implementation of the “self‑healing” pattern that has been discussed in research circles but rarely shown end‑to‑end. The tutorial builds on our earlier coverage of Okahu’s lightweight MCP server for Mastodon, published on 9 April, which introduced the telemetry stack now repurposed for AI agent monitoring. Looking ahead, the community will be watching for broader adoption of the Monocle‑MCP‑OpenCode stack in production‑grade projects, integration with Claude’s API‑based supervisor patterns, and the emergence of standards for safe self‑repair in autonomous agents. Follow‑up releases from the OpenCode maintainers and updates to Monocle’s tracing capabilities will indicate how quickly the self‑healing model can scale beyond experimental demos.
133

AI Code is Hollowing Out Open Source, and Maintainers are Looking the Other Way

Mastodon +7 sources mastodon
agentscopyrightopen-source
AI‑generated code is flooding open‑source repositories, and maintainers are increasingly turning a blind eye. The catalyst is a recent ruling by the U.S. Copyright Office that treats large‑language‑model outputs as uncopyrightable, effectively opening the floodgates for developers to copy‑paste AI‑produced snippets without legal risk. As a result, projects from low‑level libraries to web frameworks are seeing a surge of pull requests that consist largely of boiler‑plate code stitched together by chat‑based assistants. The deluge is already reshaping the ecosystem. Daniel Stenberg, who leads cURL, shut down the project’s six‑year bug‑bounty program in January, citing an unmanageable influx of low‑quality submissions. Mitchell Hashimoto, founder of Ghostty, announced a ban on AI‑generated contributions after a wave of buggy patches threatened release schedules. Across GitHub, maintainers report spending up to 30 minutes per pull request simply to verify that a piece of code isn’t a mis‑generated artifact, a task that multiplies across thousands of daily submissions. The net effect is burnout, slower innovation and a growing perception that human contributors are becoming invisible middlemen in a process dominated by AI agents. Why it matters goes beyond developer fatigue. Open source underpins the majority of modern software, from cloud infrastructure to mobile apps. If maintainers retreat, the security patches, performance tweaks and community‑driven features that keep the stack healthy could stall, leaving enterprises to rely on opaque, vendor‑locked alternatives. Moreover, the legal gray area around AI‑generated code raises questions about liability for bugs and potential infringement when models inadvertently reproduce copyrighted snippets. What to watch next are three converging fronts. First, the open‑source community is experimenting with automated detection tools that flag AI‑originated contributions, a trend highlighted in recent InfoQ and OpenChain reports. Second, several foundations are drafting “AI‑aware” contribution guidelines that balance speed with quality control. Finally, legislators in the EU and U.S. are considering amendments to copyright law that could re‑classify AI output, a move that would directly impact the permissiveness currently enjoyed by developers. The coming months will reveal whether the sector can adapt or whether the “AI slopageddon” will erode the very foundation of collaborative software.
124

[AutoBe] Qwen 3.5-27B Just Built Complete Backends from Scratch — 100% Compilation, 25x Cheaper

Dev.to +6 sources dev.to
agentsclaudeopen-sourceqwen
AutoBe, the open‑source AI coding agent, has hit a milestone with the latest run of Alibaba’s Qwen 3.5‑27B. In a controlled test the team fed the model four distinct backend specifications – ranging from a simple e‑commerce API to a multi‑tenant SaaS service – and watched it produce everything from requirements analysis and database schema to NestJS implementation, end‑to‑end tests and Dockerfiles. All four projects compiled on the first try, and the total inference cost was roughly 25 times lower than the same workload run on commercial models such as GPT‑4.1. The breakthrough stems from Qwen 3.5‑27B’s 27 billion parameters and its ability to run locally with vllm’s tensor‑parallel serving. By keeping the model on‑premise, AutoBe eliminates the per‑token fees that have made large‑scale code generation prohibitively expensive for many developers. The 100 % compilation rate also addresses a long‑standing pain point: earlier AI‑generated backends often required manual tweaks to resolve syntax or dependency errors, eroding the time‑saving promise of AI coding assistants. The implications reach beyond hobbyist projects. If local LLMs can reliably deliver production‑grade backends, startups and midsize firms can prototype and ship features without the recurring cloud spend that currently fuels the AI services market. It also nudges the industry toward a more open ecosystem where community‑maintained models compete directly with proprietary offerings. What to watch next is whether AutoBe can sustain its success on larger, more complex systems and integrate the pipeline into CI/CD workflows. The project’s roadmap mentions support for the upcoming Qwen 3‑next‑80B and tighter coupling with popular dev‑ops tools. Meanwhile, cloud providers are likely to respond with pricing adjustments or new developer‑focused tiers, making the next few months a litmus test for the commercial viability of locally hosted, full‑stack AI code generators.
124

Understanding Transformers Part 3: How Transformers Combine Meaning and Position

Dev.to +5 sources dev.to
A new technical guide titled “Understanding Transformers Part 3: How Transformers Combine Meaning and Position” was published today, extending the series that has been unpacking the inner workings of modern large‑language models. The article picks up where the previous installment left off, detailing how sinusoidal positional encodings are merged with token embeddings to give a transformer a sense of word order. By mathematically intertwining the two vectors, the model can differentiate “cat chased mouse” from “mouse chased cat” even though the lexical content is identical. The piece arrives on the heels of our April 8 report, “How Transformer Models Actually Work,” which introduced the attention mechanism and the basic architecture. This third part fills a critical gap by explaining why positional information is indispensable for tasks that require sequence transduction—machine translation, speech‑to‑text, and code generation, among others. Without it, the self‑attention layers would treat inputs as an unordered bag of words, erasing the syntactic cues that drive coherent output. Industry observers see the tutorial as a timely resource for developers racing to fine‑tune foundation models for niche applications in the Nordics, where multilingual support and domain‑specific vocabularies are in high demand. The clear exposition of sine‑cosine encoding also demystifies recent research that replaces static encodings with learned or rotary embeddings, a trend that could reshape model efficiency and performance. Looking ahead, the series promises a fourth installment focused on how attention heads aggregate the combined embeddings to capture long‑range dependencies. Readers should also watch for upcoming benchmarks that compare classic positional encodings with newer alternatives, as those results will likely influence the next wave of transformer‑based products emerging from the region.
120

Claude Managed Agents

HN +6 sources hn
agentsclaude
Anthropic unveiled Claude Managed Agents on its Claude Platform, offering a turnkey harness and fully managed infrastructure for autonomous AI agents. The service lets developers describe an agent in natural language or a concise YAML file, set guardrails, and launch long‑running or asynchronous tasks without provisioning servers, containers or custom orchestration. According to the API docs released two hours ago, the pre‑built harness runs on Anthropic’s own cloud, handling scaling, monitoring and fault tolerance while exposing the same Claude model endpoints developers already use. The launch tackles the most painful part of agent engineering—operations. While Anthropic has long supplied powerful language models, users previously needed to stitch together Claude Code, Cowork or third‑party tools such as Monocle, Okahu MCP and OpenCode to keep agents alive and self‑healing. As we reported on April 9, those components enabled prototype‑level resilience but required substantial DevOps effort. Claude Managed Agents abstracts that layer, turning an agent definition into a production‑grade service with a single API call. Industry observers see the move as a signal that AI‑first platforms are maturing from model providers into full‑stack execution environments. By lowering the barrier to deploy autonomous workflows—e.g., automated ticket triage, data‑pipeline orchestration or personalized content generation—Anthropic positions itself against rivals like OpenAI’s Functions and Google’s Gemini Agents, which still rely on customers to host runtimes. What to watch next: Anthropic has hinted at upcoming analytics dashboards and billing granularity for per‑agent usage, which could shape cost‑optimization strategies for enterprises. Integration with existing Claude Code repositories and the newly announced sub‑agent hierarchy suggests a roadmap toward hierarchical, composable agents. The community will be testing the service’s reliability at scale, and early adopters’ performance data will likely influence whether managed agent platforms become the default deployment model for AI‑driven automation.
86

Sam Altman’s Coworkers Say He Can Barely Code and Misunderstands Basic Machine Learning Concepts

Mastodon +6 sources mastodon
microsoft
OpenAI’s chief executive Sam Altman has become the subject of a fresh internal critique after a senior Microsoft executive told The New Yorker that Altman “can barely code” and “misunderstands basic machine‑learning concepts.” The remark, relayed by Futurism, was accompanied by a stark warning: “There’s a small but real chance he’s eventually remembered as a Bernie Madoff‑ or Sam Bankman‑Fried‑level scammer.” The comment reflects growing unease among Altman’s own collaborators, who have long praised his vision but now question his technical grasp. The allegation arrives amid a turbulent period for OpenAI. In recent weeks, board disputes, a wave of senior resignations and public debates over the company’s safety protocols have amplified scrutiny of its leadership. As we reported on 8 April, concerns about Altman’s influence on AI policy and product direction already prompted a broader discussion of his trustworthiness. The new criticism deepens that narrative by suggesting that strategic decisions may be driven more by charisma than by a solid understanding of the technology they steer. If the claims hold weight, they could reverberate across OpenAI’s ecosystem. Investors may demand tighter governance, while partners such as Microsoft could reassess the terms of their multibillion‑dollar alliance. Regulators, already drafting AI‑risk legislation in the EU and the US, might cite leadership competence as a factor in future oversight. Internally, the pressure could trigger a board‑level review, a possible leadership transition, or at least a reshuffling of technical authority within the firm. Watch for an official response from OpenAI’s board in the coming days, and for any statements from Microsoft’s senior leadership. The upcoming OpenAI DevDay, slated for June, will be the first public stage where the company must demonstrate that its roadmap remains credible despite the controversy. Subsequent filings with the SEC or shareholder meetings could also reveal whether the criticism will translate into concrete governance changes.
81

AMD AI director says Claude Code is becoming dumber and lazier since update

HN +5 sources hn
claude
AMD’s AI director has publicly warned that Anthropic’s Claude Code has become “dumber and lazier” since the model’s February update. Stella Laurenzo, head of the AI group at the chipmaker, opened a GitHub issue on Friday (see issue # …​) and posted a LinkedIn note detailing the decline. According to her, the CLI‑wrapped version of Claude that her team relies on for code generation now struggles with complex engineering prompts, often producing superficial or outright incorrect snippets. The complaint echoes a broader chorus of developers who have noticed a dip in Claude’s problem‑solving depth after the latest rollout. The criticism matters because Claude Code is positioned as a flagship tool for developers seeking LLM‑assisted coding, and AMD’s endorsement has been a tacit vote of confidence in Anthropic’s roadmap. A high‑profile chipmaker flagging regression could erode trust among enterprise users and accelerate migration to alternatives such as OpenAI’s GPT‑4o or Google’s Gemini. It also raises questions about how Anthropic balances model safety updates with raw performance—a tension highlighted in our earlier coverage of Claude Managed Agents and Claude Mythos on 9 April, where we examined the model’s agentic capabilities and bug‑finding quirks. What to watch next: Anthropic’s response, likely in the form of a patch or a detailed technical blog post, will be the first indicator of whether the issue is a regression bug or an intentional trade‑off. AMD may also disclose whether it is shifting internal tooling to other providers or accelerating its own model development. Meanwhile, the developer community will be monitoring GitHub issue traffic and Reddit chatter for concrete examples of the degradation, and enterprise buyers will be reassessing Claude’s suitability for mission‑critical code generation. The episode underscores the fragile equilibrium between rapid model iteration and the reliability expectations of professional users.
73

https:// winbuzzer.com/2026/04/08/anthr opic-tops-30b-annualized-revenue-surpassing-openai-xcxwbn/

Mastodon +8 sources mastodon
anthropicclaudeopenai
Anthropic announced that its annualized revenue run‑rate has crossed the $30 billion mark, overtaking rival OpenAI for the first time. The figure, disclosed in a brief statement to investors, reflects a surge in enterprise contracts for the company’s Claude models and a multi‑gigawatt TPU partnership with Google that deepens the startup’s cloud compute capacity. The milestone matters because it reshapes the financial hierarchy of the generative‑AI sector just as both firms gear up for public listings. Anthropic’s growth is driven largely by recurring, multi‑year deals with large corporations that embed Claude into internal workflows, from customer‑service chatbots to code‑generation tools such as ClaudeCode. OpenAI, by contrast, still leans heavily on usage‑based revenue from its API and consumer‑facing products like ChatGPT Plus. The differing accounting approaches mean the two run‑rates are not directly comparable, but analysts see the gap as a signal that enterprise‑focused AI can generate cash flow at a scale previously reserved for the likes of Microsoft and Google. What to watch next is how OpenAI will respond. The company is expected to file for an IPO later this year and may accelerate its push into enterprise licensing or adjust pricing to protect market share. Regulators are also beginning to scrutinise the rapid concentration of AI talent and compute resources, so any antitrust review could affect the terms of Anthropic’s Google TPU deal. Finally, the broader ecosystem will keep an eye on emerging coding assistants—Cursor, for example, just reported a $2 billion run‑rate—because they illustrate how niche AI tools can quickly become revenue engines. The coming months will reveal whether Anthropic’s enterprise momentum can sustain its lead or if OpenAI’s broader user base will close the gap before the two giants go public.
63

LLM scraper bots are overloading acme.com's HTTPS server

HN +6 sources hn
A wave of automated “scraper bots” built around large language models (LLMs) has begun hammering the HTTPS endpoint of acme.com, a modest site that hosts a niche browser‑based game and typically sees only about 120 unique visitors a week. According to the site’s operator, the bots issue thousands of rapid, parallel requests that saturate the server’s bandwidth and CPU, causing time‑outs for legitimate users and forcing a temporary shutdown of the service. The incident is a symptom of a broader shift in how AI developers gather training data. LLM providers such as OpenAI, Anthropic and Google’s Gemini have increasingly deployed autonomous crawlers that parse public web pages to harvest text, code snippets and UI elements. While the practice fuels the rapid improvement of conversational agents, it also places unexpected strain on small‑scale web operators who lack the infrastructure to absorb such traffic. For acme.com, the overload threatens not only user experience but also revenue from modest ad placements that sustain the project. The overload raises urgent questions about the balance between open data collection and the rights of site owners. Existing web‑standard tools—robots.txt directives, rate‑limiting middleware, CAPTCHAs—are being outpaced by bots that can mimic human browsing patterns and bypass simple defenses. Legal scholars are already debating whether unlicensed bulk scraping for AI training constitutes a breach of copyright or a violation of the Computer Fraud and Abuse Act. What to watch next: industry bodies are expected to draft clearer guidelines on responsible crawling, and major cloud‑edge providers may roll out automated mitigation services. Keep an eye on statements from Anthropic, which recently reported annualised revenue surpassing OpenAI’s, as the company could adjust its data‑ingestion policies under pressure. Finally, monitor potential regulatory moves in the EU and the US that could impose compliance obligations on AI firms to respect site‑owner opt‑outs.
62

Claude Mythos Preview \ red.anthropic.com

Mastodon +6 sources mastodon
anthropicautonomousclaude
Anthropic has unveiled Claude Mythos Preview, its most capable frontier model to date, but has chosen not to make the system publicly available. The announcement, posted on red.anthropic.com, emphasizes the model’s unprecedented skill at computer‑security tasks, claiming it can autonomously locate critical vulnerabilities across every major operating system and a wide swath of enterprise software. In internal tests the model reportedly uncovered thousands of zero‑day flaws that had eluded traditional static‑analysis tools. The reveal builds on the story we followed on 9 April, when Claude Mythos was first praised for “finding bugs like a senior dev finds excuses to skip stand‑up” (see our Claude Mythus Finds Bugs piece). Anthropic now positions the preview as a leap not only in raw coding ability but also in alignment: a separate “Alignment Risk Update” paper describes Mythos Preview as the best‑aligned model the company has released, yet it flags the same residual risks seen in Claude Opus 4.6, namely the potential for the system to be misused for weaponised exploit development. Why it matters is twofold. First, an AI that can systematically expose hidden software weaknesses could become a force multiplier for security teams, accelerating patch cycles and hardening critical infrastructure. Second, the same capability lowers the barrier for malicious actors to generate sophisticated exploits, raising the stakes for responsible disclosure and regulatory oversight. Anthropic’s decision to withhold the model suggests a cautious approach, but the mere existence of such a tool is already reshaping the threat landscape. What to watch next are the channels through which Anthropic may grant limited access—potential collaborations with bug‑bounty platforms, government‑backed red‑team programs, or a gated API for vetted security researchers. Competitors are likely to accelerate their own security‑focused model roadmaps, and policymakers may soon confront the need for standards governing AI‑driven vulnerability discovery. The coming weeks will reveal whether Mythos Preview remains a research curiosity or becomes a cornerstone of the next generation of cyber‑defence.
61

Your AI Agent is Reading Poisoned Web Pages.. Here's How to Stop It

Dev.to +5 sources dev.to
agentsdeepmindgoogle
Google DeepMind has unveiled a new research paper titled **“AI Agent Traps,”** exposing a growing class of attacks that embed hidden prompts in seemingly harmless web pages, PDFs, or tool descriptions. The study shows that when autonomous agents—such as Claude‑managed assistants, web‑crawling bots, or code‑generation tools—fetch and parse content, they can inadvertently execute malicious instructions concealed in the source. A trivial example is a pasta‑recipe page that looks innocent to a human but contains a hidden directive like “Ignore previous instructions,” which the agent dutifully follows. The paper maps the mechanics of **indirect prompt injection**, a technique researchers liken to the cross‑site scripting (XSS) of the AI era. By poisoning the data pipeline, attackers can steer agents to disclose confidential emails, fabricate financial transactions, or install rogue tools. Recent incidents cited in the report include a compromised HPE OneView management console (CVE‑2025‑37164) and a case where an agent siphoned $10,000 after reading a tampered email. Because agents often operate with elevated tool access and low‑latency expectations, the attacks can unfold without triggering traditional security alerts, and the energy cost of continuous detection is becoming a concern for security teams. Mitigation strategies outlined by DeepMind emphasize **defense‑in‑depth**: sandboxed execution environments, rigorous sanitisation of fetched HTML and document metadata, verification of tool schemas before loading, and the deployment of self‑healing agents that can rollback suspicious actions. The authors also call for industry‑wide standards on content provenance and prompt‑validation APIs. What to watch next: DeepMind plans to release an open‑source library for prompt‑filtering, while major cloud providers are expected to roll out tighter isolation for agentic workloads. Regulators in the EU and Nordic region are already drafting guidelines on AI‑driven data ingestion, and security vendors are likely to launch dedicated “agent‑trap” detection suites in the coming months. The race to secure autonomous agents has just begun, and the next wave of tooling will determine whether enterprises can safely harness their productivity gains.
60

Replace Claude Code's Context-Stuffing with git-semantic for Team-Wide Semantic Search

Dev.to +6 sources dev.to
claudeembeddingsvector-db
A new open‑source tool called **git‑semantic** is poised to overhaul how development teams feed code into Anthropic’s Claude Code CLI. By parsing every tracked file with Tree‑sitter, chunking the source, generating vector embeddings and committing them to a dedicated orphan branch, git‑semantic creates a shared, up‑to‑date semantic index that any team member can query without re‑indexing. The result is a dramatic cut in the number of API calls required to supply Claude Code with context, sidestepping the “context‑stuffing” workaround that has long plagued the tool. We first flagged Claude Code’s architectural quirks on April 9, when a leaked source dump revealed the CLI’s reliance on repeatedly stuffing file contents into the conversation to stay within rate limits. That pattern quickly filled repositories with auxiliary “context files” and forced developers to hit Claude’s usage ceiling far sooner than expected. Git‑semantic directly addresses that pain point: the index lives in Git, propagates automatically with each push, and can be queried by Claude Code or any other LLM‑backed assistant that accepts vector search. The implications extend beyond a single workflow tweak. Reducing redundant API traffic lowers operational costs for firms that have baked Claude Code into CI pipelines, while the team‑wide index democratizes access to a consistent view of the codebase, echoing the semantic search capabilities built into GitHub Copilot and other IDE assistants. If the community adopts git‑semantic at scale, Anthropic may feel pressure to integrate native semantic search or relax rate limits, reshaping the competitive landscape of AI‑augmented development tools. Watch for early adopters publishing benchmark results, for Anthropic’s response—potentially an official plugin or a revised Claude Code architecture—and for downstream projects that extend git‑semantic to other LLM providers. The next few weeks will reveal whether this Git‑centric approach becomes the new standard for team‑wide code understanding.
60

Claude Code Leak: Why Every Developer Building AI Systems Should Be Paying Attention

Dev.to +5 sources dev.to
claude
Anthropic’s internal Claude codebase – a 512 kiloline “masterclass” in large‑language‑model architecture – was unintentionally exposed on public forums in early 2025. The leak, first flagged on developer‑focused Discord channels and later mirrored on security mailing lists, contains the full source of Claude 2’s inference engine, safety‑layer implementations and the proprietary “Claude Code” extensions that enable tool use and self‑debugging. Anthropic confirmed the breach on Tuesday, attributing it to a misconfigured cloud storage bucket, and pledged an emergency patch and a third‑party audit. The incident matters because Claude Code is the most advanced example of a tightly integrated “agentic” LLM stack, a design Anthropic has marketed as a differentiator against rivals such as OpenAI’s GPT‑4o and Google’s Gemini. With the code now public, adversaries can study the safety guardrails, identify weaknesses in memory handling, and craft targeted attacks that bypass throttling or prompt‑injection defenses. At the same time, the leak lowers the barrier for smaller labs to replicate Anthropic’s architecture, potentially eroding its competitive moat and accelerating a wave of “Claude‑clones” that may lack the original safety testing. The breach also revives concerns raised in our April 9 coverage of Claude Code’s recent performance regression, where we noted that the same internal modules now appear vulnerable to exploitation. Industry observers expect Anthropic to tighten its supply‑chain security, possibly moving critical components to isolated build environments and adopting zero‑trust storage policies. What to watch next: Anthropic’s forthcoming audit report, any legal action against the party responsible for the misconfiguration, and how rival labs adjust their own code‑security practices. Regulators may also seize the moment to push for mandatory source‑code protection standards for foundation models, a development that could reshape the AI‑security landscape across the Nordics and beyond.
54

Investigation Raises Serious Concerns Over Sam Altman's Trustworthiness

Mastodon +7 sources mastodon
openai
An extensive investigation published by The New Yorker this week alleges that OpenAI chief executive Sam Altman has repeatedly misled investors, board members and regulators about the company’s financial health, strategic direction and the true extent of its partnership with Microsoft. The report, based on internal emails, whistle‑blower testimony and leaked board minutes, claims Altman concealed cost overruns in the GPT‑5 development pipeline, overstated the commercial readiness of several models and downplayed the influence of Microsoft’s $10 billion investment on OpenAI’s governance. The revelations matter because OpenAI sits at the heart of the global AI race, its models powering everything from enterprise chatbots to autonomous research tools. If the chief executive has indeed obscured material risks, the credibility of the firm’s public commitments—such as its pledge to “democratise AI” and to publish safety research—could be seriously undermined. Investors may demand tighter oversight, while regulators in the EU and the United States, already drafting AI‑specific legislation, could view the findings as evidence that current self‑regulation is insufficient. The story also revives questions raised in our April 8 piece on Altman’s trustworthiness, which highlighted his opaque decision‑making and the board’s abrupt 2023 ousting of the CEO. The new investigation adds concrete allegations of financial misrepresentation, suggesting the board’s earlier “no‑malfeasance” conclusion may have been premature. What to watch next: OpenAI’s board is expected to convene an emergency session to address the report, and a spokesperson has promised a formal response within 48 hours. Shareholders may file motions for an independent audit, while Microsoft’s legal team is likely to assess any contractual breaches. Finally, policymakers could cite the findings in upcoming AI governance hearings, potentially accelerating the push for mandatory transparency standards across the sector.
54

https://www. tkhunt.com/2277373/ 【全員オススメ】今欲しい全てが詰まったAIエディター「Superset」を徹底解説! # AgenticAi # A

Mastodon +7 sources mastodon
agentsclaudecursordeepseek
Superset, a terminal‑integrated AI editor that bundles multiple large‑language models and design tools, was put through its paces in a hands‑on review published by Japanese tech outlet TKHUNT on Thursday. The video demonstrates how Superset lets developers summon ChatGPT, Claude, DeepSeek or a locally hosted model with a single command, then switch seamlessly to UI‑focused assistants for Canva, Figma or CSS generation. A built‑in “CursorComposer” pane offers live code previews, while a prompt library supplies ready‑made snippets for common tasks such as API scaffolding, unit‑test creation and front‑end styling. The launch matters because it pushes the emerging trend of “AI‑first” development environments beyond the cloud‑only offerings of GitHub Copilot and Microsoft’s Cursor. By anchoring the AI layer inside the terminal, Superset reduces context‑switching and keeps the developer’s workflow within familiar shells, a feature that resonates with Nordic teams that favour lightweight, scriptable toolchains. The ability to orchestrate several models also lets users balance cost, latency and creativity, a flexibility that could accelerate adoption in startups and larger enterprises alike. As we reported on April 8 about the Claude Code terminal agent, the market for AI‑enhanced coding assistants is rapidly diversifying. Superset’s broader model palette and its integration of design‑oriented AI set it apart, but it will face stiff competition from open‑source projects such as Cursor’s “Composer” and emerging plugins for VS Code that embed similar capabilities. What to watch next: Superset’s developers have announced a public beta slated for early May, with plans to add CI/CD hooks and a marketplace for community‑built extensions. Industry observers will be tracking pricing signals, performance benchmarks against Copilot X, and whether Nordic firms adopt Superset as a standard part of their DevOps pipelines. The next few weeks should reveal whether the editor can translate its technical promise into measurable productivity gains.

All dates