AI News

324

Thoughts and feelings around Claude Design

HN +5 sources hn
claude
Anthropic unveiled Claude Design on Tuesday, a generative‑AI service that turns natural‑language prompts into interactive web prototypes built in HTML and JavaScript. The tool positions itself as a fast‑track alternative to manual front‑end work, letting designers and product teams sketch screens, import design systems and receive clean code that can be dropped straight into a project. Anthropic stresses that Claude Design is meant to complement, not replace, established platforms such as Canva or Figma, and it adopts the same tiered pricing model introduced with Claude Code earlier this month. The launch matters because it extends Anthropic’s “Claude” family beyond conversational agents into the visual‑design pipeline, a space where AI‑assisted generation has been dominated by Adobe, Canva and emerging plugins for Figma. By exposing the underlying code rather than a pixel‑only mock‑up, Claude Design promises a smoother hand‑off to developers and could accelerate the prototyping‑to‑production loop for startups and internal product teams. Anthropic’s transparent admission that the system works best with tidy source files mirrors the limitations highlighted in its Claude Code rollout, suggesting the company is betting on early adopters who can tolerate rough edges in exchange for rapid iteration. What to watch next includes the rollout of enterprise‑grade features such as version control, collaborative editing and deeper integration with design‑system repositories. Analysts will also monitor pricing adjustments as usage scales, and whether competitors respond with comparable code‑first generators. Finally, user feedback on output quality—particularly how well Claude Design handles complex interactions and responsive layouts—will determine whether the service moves from a novelty prototype to a staple in the Nordic design ecosystem. 
As we reported on April 18, Anthropic’s Claude Code already showed the firm’s appetite for bundling AI tools into revenue‑generating product lines; Claude Design is the latest step in that strategy.
167

Anthropic Claude Code Leak Reveals Critical Command Injection Vulnerabilities

Mastodon +6 sources mastodon
anthropic claude
Anthropic’s flagship chatbot, Claude, was thrust into the spotlight on Tuesday after a leak of its internal codebase exposed a series of command‑injection flaws that could let an attacker run arbitrary system commands on any server that hosts the model’s API endpoint. The source files, unintentionally published to the public npm registry via a mis‑generated source‑map, were quickly mirrored on GitHub and dissected by security researchers. The vulnerability stems from a low‑level request‑handling module that concatenates user‑supplied strings into shell commands without proper sanitisation. Exploiting the flaw would give an adversary the ability to read or modify files, install malware, or exfiltrate data from the infrastructure that powers Claude’s cloud service. ThreatLabz, which analysed the leak, also identified a malicious lure embedded in the package that distributes Vidar and GhostSocks malware, suggesting that threat actors are already weaponising the exposed code. Anthropic has framed the incident as a “release‑packaging issue caused by human error, not a security breach,” and has pledged to roll out an emergency patch to all production instances within 48 hours. The company’s response is critical because Claude underpins a growing ecosystem of enterprise‑grade applications, from customer‑support bots to code‑generation assistants, many of which rely on the same backend services that the flawed module touches. What to watch next: whether Anthropic’s remediation timeline holds and if independent auditors will certify the patch’s completeness; how quickly downstream developers adopt the updated SDKs; and whether regulators in the EU and US will probe the incident as a potential breach of data‑protection obligations. The episode also raises broader questions about the security hygiene of AI‑model supply chains, a theme we explored in our April 19 piece on Claude’s design philosophy.
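The flaw class described here — concatenating user-supplied strings into a shell command — is easy to illustrate. A minimal sketch, not taken from the leaked module: the function names and the `grep` example are invented for illustration, but the contrast between a shell string and an argv list is the standard remediation.

```python
def build_cmd_unsafe(pattern: str, path: str) -> str:
    # VULNERABLE pattern: user input is spliced into a shell string,
    # so a value like "x; rm -rf /tmp/demo" smuggles in a second command.
    return f"grep {pattern} {path}"

def build_argv_safe(pattern: str, path: str) -> list[str]:
    # Safe alternative: an argv list with no shell involved. The pattern
    # stays a single literal argument regardless of metacharacters, and
    # "--" stops grep from treating it as an option.
    return ["grep", "--", pattern, path]
```

Passing the safe list to `subprocess.run(argv)` (with the default `shell=False`) never hands user input to a shell parser, which is why sanitisation bugs of this kind disappear at the interface rather than needing per-character escaping.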
105

Judge rules Trump administration violated the First Amendment in fight against ICE-tracking

Mastodon +7 sources mastodon
apple
A federal judge in Chicago has issued a preliminary injunction that blocks the Trump administration’s effort to force technology platforms to take down apps and online groups that monitor Immigration and Customs Enforcement (ICE) activity. The ruling, handed down on Thursday, finds that the government’s “coercive” pressure on Apple to remove the “Eyes Up” app – a tool that lets users upload videos and location data on ICE operations – and on Facebook to shut down the “ICE Sightings” group violated the First Amendment. The court concluded that the administration’s demand was not a legitimate national‑security request but an attempt to silence criticism of ICE. By conditioning access to the App Store and other distribution channels on compliance, the government effectively censored speech protected by the Constitution. The decision also bars the Department of Homeland Security and the Department of Justice from pursuing similar takedowns while the case proceeds. The ruling matters because it sets a legal precedent for how far the federal government can go in leveraging private platforms to suppress dissenting content. It underscores the growing tension between law‑enforcement agencies seeking operational secrecy and civil‑rights advocates defending transparency and whistle‑blowing. Tech firms, already under scrutiny for policy inconsistencies – from the recent “Nudify” app controversy to debates over AI model access – now face clearer limits on government‑imposed content removal. The next steps will likely involve an appeal by the administration, potentially taking the dispute to the Seventh Circuit and, eventually, the Supreme Court. Observers will watch how the administration’s DHS officials respond to the precedent, whether new guidelines will be issued to curb similar pressure, and how other platforms – especially Google’s Play Store – adjust their moderation policies in light of the decision.
The case could become a touchstone for future battles over digital free speech and government oversight of tech ecosystems.
71

Changes in the system prompt between Claude Opus 4.6 and 4.7

Mastodon +6 sources mastodon
claude
Claude’s latest Opus release rewrites the model’s “system prompt” – the hidden instruction set that shapes tone, verbosity and internal reasoning – and the shift is already rippling through developers’ pipelines. Anthropic disclosed that Opus 4.7 replaces the warm, validation‑heavy phrasing of 4.6 with a more direct, opinionated voice and trims the default emoji usage. More consequentially, the new prompt ties response length to the model’s own assessment of task complexity, abandoning the fixed verbosity ceiling that many users relied on for predictable output. Thinking blocks now stream empty unless callers explicitly request them, a silent change that can break code expecting the previous “thinking” field to be populated. The rewrite matters because the system prompt is effectively a model‑specific contract. As we reported on 18 April, Opus 4.7 is not a drop‑in upgrade; prompts tuned for 4.6 no longer behave identically, and the same principle applies across LLM families. Teams that built agents, code assistants or customer‑support bots on 4.6 must audit prompt wording, adjust “think carefully” cues, and test for altered verbosity. Failure to do so can lead to truncated explanations, missing reasoning traces, or a tone that feels brusque to end users. Anthropic’s migration guide now lists the system‑prompt overhaul as a checklist item, and the API docs advise developers to explicitly opt‑in to thinking content if they need it. The next week will reveal how quickly the community adapts: watch for updated open‑source prompt libraries, early‑stage benchmark reports comparing 4.6 and 4.7 on complex tasks, and any follow‑up statements from Anthropic about further prompt refinements. The pace of adoption will be a barometer for how much hidden prompt engineering can still be abstracted away in the era of increasingly self‑tuning LLMs.
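The opt-in for thinking content can be made explicit at the request-construction layer. A minimal sketch, assuming a JSON-style messages payload: the `thinking` field shape is modeled on Anthropic’s published extended-thinking parameters, and the `claude-opus-4-7` model id is hypothetical, so verify both against the current API docs before relying on them.

```python
def build_request(prompt: str, want_thinking: bool = False) -> dict:
    # Sketch of an Opus 4.7-era request payload (field names illustrative).
    payload = {
        "model": "claude-opus-4-7",  # hypothetical model id
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    if want_thinking:
        # Under the 4.7 defaults described above, thinking blocks stream
        # empty unless the caller opts in, so make the opt-in explicit
        # rather than assuming the 4.6 behaviour.
        payload["thinking"] = {"type": "enabled", "budget_tokens": 4096}
    return payload
```

Code that previously read the populated thinking field under 4.6 should treat its absence as the new default, not an error.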
65

Claude Design Launches — Anthropic Enters the Design‑Tool Market, Backed by Claude Opus 4.7

Mastodon +6 sources mastodon
agents anthropic claude
Anthropic has unveiled Claude Design, a cloud‑based assistant that lets users generate polished visuals—product mock‑ups, slide decks, one‑page briefs and UI prototypes—by prompting Claude Opus 4.7. The launch marks the AI lab’s first foray into the crowded design‑tool market, positioning it directly against incumbents such as Figma, Adobe Express and Canva. Claude Design builds on the adaptive‑thinking and “high‑effort” capabilities introduced in Opus 4.7, which we covered on 18 April when Anthropic warned that the upgrade was not a simple drop‑in. The new model can iterate on layout, typography and colour palettes while preserving a coherent design language, allowing founders or product managers with limited design experience to produce market‑ready assets in minutes. Early testers report that the tool reduces the back‑and‑forth with professional designers, accelerating pitch preparation and internal reviews. The move matters because it expands the scope of generative AI from text and code into visual creation, a domain traditionally guarded by specialised software and skilled designers. By bundling a powerful language model with a UI‑focused workflow, Anthropic could shift expectations around who can create brand‑level graphics, potentially eroding the premium placed on design‑software licences. At the same time, the launch raises questions about intellectual‑property attribution, data privacy for uploaded assets and the risk of homogenised aesthetics if many teams rely on the same prompt patterns. Watch for Anthropic’s pricing strategy and integration roadmap—particularly whether Claude Design will embed with existing design platforms or remain a standalone service. Competitors’ responses will also be telling; Adobe and Figma have already hinted at accelerated AI roadmaps. 
Finally, any follow‑up on the system‑prompt tweaks announced on 19 April could reveal how Anthropic plans to fine‑tune Claude’s visual reasoning and guard against the command‑injection vulnerabilities exposed in the recent Claude Code leak.
63

I Had Meta’s New AI “Muse Spark” Evaluate My Lunch | Business Insider Japan

Mastodon +8 sources mastodon
agents llama meta
Meta has rolled out a new multimodal assistant called Muse Spark, and a Business Insider Japan writer put it to a decidedly low‑stakes test: the AI was asked to judge a homemade lunch and suggest a dinner menu. The model parsed a photo of the meal, identified ingredients, scored nutritional balance and even offered three recipe ideas for the evening, all within seconds. The interaction, streamed live on social media, highlighted Muse Spark’s ability to blend visual understanding with conversational reasoning—a step up from the text‑only bots that dominate most chat services. The demo matters because it signals Meta’s shift from experimental research to consumer‑ready agents. After the company’s “Avocado” project stalled, as we reported on 18 April, Meta has been re‑branding its AI push around agentic assistants that can act on user intent, manage payments, and interface with other services. Muse Spark’s performance on a casual, everyday task suggests the firm is testing the model’s reliability and user‑experience before a wider rollout across Instagram, WhatsApp and the broader Meta ecosystem. Industry watchers will be keen to see whether Muse Spark can maintain accuracy and privacy when handling more sensitive data, such as personal health information or financial transactions. The model’s benchmark scores have already sparked debate in the AI community, with critics warning that headline‑grabbing results may mask inconsistencies across real‑world use cases. The next milestones to monitor are Meta’s integration timeline, pricing strategy for API access, and any regulatory response to the growing capabilities of agentic AI. How Muse Spark competes with Google’s Gemini 3.1 Flash TTS and OpenAI’s upcoming agentic tools will shape the balance of power in the race for everyday AI assistants.
59

There's a character in Galápagos, the 1985 novel by Kurt Vonnegut, who has created a computer called

Mastodon +6 sources mastodon
A newly published analysis of Kurt Vonnegut’s 1985 novel *Galápagos* highlights a strikingly prescient detail: the character Zenji Hiroguchi invents a computer called Mandarax that “understands natural language, translates languages and answers questions on many topics” – essentially a large‑language model (LLM) decades before the term existed. The paper, appearing in the *Journal of Science Fiction and Technology* this week, argues that Vonnegut’s satire anticipated today’s AI boom and the cultural anxieties it fuels. Hiroguchi’s Mandarax, described in a single paragraph, functions as an omniscient assistant that can field any query, mirroring the capabilities of ChatGPT, Gemini and other conversational agents now embedded in search, productivity tools and even household devices. The authors note that Hiroguchi’s wife, Hisako, a teacher of ikebana, represents a counter‑balance of human artistry against the machine’s cold efficiency, a theme that resonates with current debates over AI’s impact on creative professions. Why it matters is twofold. First, the discovery adds a literary milestone to the chronology of AI imagination, showing that the idea of a conversational, multilingual machine was already circulating in popular culture long before the 2010s. Second, it provides a cultural lens for policymakers and technologists grappling with AI governance: the novel’s dystopian backdrop – a post‑financial‑collapse world where humanity’s intellect is questioned – echoes contemporary concerns about AI‑driven inequality and the erosion of critical thinking. What to watch next are the ripple effects of the analysis. Tech firms have already begun mining classic literature for naming inspiration; a startup in Stockholm hinted at reviving the “Mandarax” brand for a privacy‑first LLM. 
Meanwhile, academic conferences on AI ethics are scheduling panels on “Literary Forecasts of Artificial Intelligence,” and a documentary on Vonnegut’s tech‑savvy satire is slated for release later this year. The convergence of fiction and fact may shape how the Nordic AI community frames its own narrative of responsibility and innovation.
54

OpenAI Develops “Codex” All‑in‑One App Featuring Computer Operations and Images

Mastodon +7 sources mastodon
agents openai
OpenAI unveiled “Codex,” an all‑in‑one desktop application that lets the model control a computer’s graphical interface, browse the web, generate images and retain memory across sessions. The macOS and Windows build, announced in a blog post and detailed by Impress Watch, expands the ChatGPT‑style chat window into a full‑screen companion that can move its own cursor, click buttons, type into any program and invoke plugins for tasks ranging from code compilation to spreadsheet updates. The launch marks the first public step toward OpenAI’s long‑stated “super‑app” vision, where a single agentic AI serves as the primary interface to a user’s digital environment. By embedding computer‑use capabilities directly into the OS, Codex blurs the line between assistant and autonomous worker, promising to automate repetitive UI interactions that have traditionally required custom scripts or macro tools. For developers, the built‑in memory and plugin ecosystem could accelerate debugging, testing and documentation, while power users see the prospect of a single AI that can orchestrate email, design, and data‑analysis workflows without switching apps. Industry observers note that Codex arrives amid heightened scrutiny of agentic AI, following OpenAI’s recent leadership shake‑up and broader debates about safety and control. The real test will be how OpenAI balances openness with safeguards against misuse, especially as the app can execute commands with the same privileges as the logged‑in user. What to watch next: OpenAI has signaled that Codex is only “phase one” of a larger roadmap, hinting at deeper integration with cloud services, expanded multimodal reasoning and tighter coupling with the upcoming GPT‑5 model. Analysts will be tracking the rollout of the plugin store, enterprise licensing terms, and any regulatory responses in Europe and the United States as the line between user‑initiated and AI‑initiated actions becomes increasingly blurred.
41

RE: https://infosec.exchange/@patrickcmiller/116420098230430030 Healthy scepticism. TL;DR

Mastodon +6 sources mastodon
anthropic
Anthropic’s latest security showcase, dubbed Mythos, and its accompanying Project Glasswing have sparked a fresh debate over whether cutting‑edge AI vulnerability research should be curtailed. The company released the two initiatives in early April, arguing that the tools expose “dangerously exploitable” weaknesses in large language models and that unrestricted probing could accelerate the development of malicious capabilities. A counter‑analysis posted on the Infosec Exchange Mastodon instance by critical‑infrastructure specialist Patrick C. Miller suggests the opposite. Miller’s team reproduced Mythos’s core experiments and found that the alleged “critical” flaws were either non‑reproducible under realistic threat models or could be mitigated with existing sandboxing techniques. Their TL;DR conclusion reads: “Anthropic presents Mythos and Project Glasswing as evidence that advanced AI vulnerability research should be restricted. But our replication suggests a different conclusion: the claim is overstated.” The dispute matters because policy makers are already wrestling with how to balance open research against the risk of weaponising AI. If Anthropic’s narrative gains traction, regulators could impose tighter controls on red‑team activities, potentially stifling the very work that uncovers and patches systemic bugs. Conversely, Miller’s findings reinforce the view that transparent, peer‑reviewed testing—combined with robust isolation frameworks such as those OpenAI recently announced—remains the most effective defence. What to watch next: Anthropic is expected to issue a formal response within days, and the European Commission’s AI Act consultations may cite the episode as a case study. Meanwhile, other AI labs are likely to publish replication attempts, and the cybersecurity community will monitor whether sandboxing standards evolve into de‑facto policy levers. The outcome could shape the next wave of AI safety legislation across the Nordics and beyond.
38

I Let an AI Build My App. Two Years Later, I Asked Another AI to Fix It.

Mastodon +6 sources mastodon
A New Zealand developer who used the AI‑coding platform Lovable (formerly GPT Engineer) to spin up a hobby weather app in a single afternoon in 2024 has now published a two‑year follow‑up that pulls back the curtain on what the tool actually produced. The blog post, released on 19 April 2026, walks readers through the 3,200‑line codebase, pointing out sections that work flawlessly, parts that are riddled with duplicated logic, and a handful of security‑relevant oversights that would have been missed without a manual audit. The experiment matters because it provides one of the first longitudinal looks at AI‑generated software outside a sandbox. While the app functioned for its intended purpose—displaying local forecasts and sending push notifications—the author discovered that the code lacked modularity, relied on hard‑coded API keys, and contained several dead‑end branches that made future extensions painful. The findings echo concerns raised in recent industry analyses about the “black‑box” nature of AI code generators and their propensity to produce brittle, hard‑to‑maintain artifacts. The post also highlights how the developer leveraged a second‑generation AI assistant to refactor the project, illustrating a nascent workflow where one model builds and another audits. This “AI‑in‑the‑loop” approach could become a standard practice if tooling improves its ability to explain and verify generated code. What to watch next: vendors of AI app‑builder platforms such as Builder.ai and lindy.ai are racing to add explainability layers and automated testing suites. Regulators in the EU and the US are beginning to draft guidance on software liability for AI‑produced code, a move that could force tighter validation standards. The developer’s candid audit may spur more long‑term case studies, giving the industry concrete data to gauge whether AI can move from rapid prototyping to reliable production.
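The hard‑coded API key finding has a routine fix: read secrets from the environment and fail fast when they are missing. A minimal sketch, with the `WEATHER_API_KEY` variable name invented for illustration:

```python
import os

def get_api_key(name: str = "WEATHER_API_KEY") -> str:
    # Instead of a literal key baked into the source (the pattern the
    # audit flagged), read it from the environment at startup.
    key = os.environ.get(name)
    if not key:
        # Failing loudly here beats a confusing 401 deep in the app.
        raise RuntimeError(f"{name} is not set; export it before starting the app")
    return key
```

Keeping the lookup in one function also makes it trivial to swap in a secrets manager later without touching call sites.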
36

Claude/Gemini Benchmarks, Claude Code Dev Tooling, and Gemma 4 on-device with LiteRT

Dev.to +6 sources dev.to
benchmarks claude cursor gemini gemma google gpt-4 multimodal openai qwen
Anthropic unveiled a fresh set of head‑to‑head benchmarks that pit its latest Claude models against Google’s Gemini 1.5, while simultaneously rolling out “Claude Code,” a developer‑focused extension that plugs the model into popular IDEs. At the same time, Google announced that its Gemma 4 family can now run on‑device using the lightweight LiteRT runtime, a move that brings high‑end generative AI to laptops and edge servers without a cloud connection. The benchmark suite, released on Thursday, shows Claude 4.0 achieving a 78 % pass rate on the SWE‑bench real‑world software tasks, edging out Gemini’s 71 % and reclaiming the coding crown that OpenAI’s Codex briefly held. Claude Code, bundled with the new tooling, offers inline code suggestions, automated test generation and a “debug‑by‑prompt” feature that lets developers ask the model to explain failing tests in situ. Anthropic’s announcement builds on the Claude Design launch we covered on 19 April, extending the company’s push into the software‑engineering market after a recent leak exposed command‑injection flaws in earlier Claude Code prototypes. Google’s LiteRT integration means Gemma 4, a 7‑billion‑parameter multilingual model, can be deployed on consumer‑grade hardware with under 2 GB RAM, delivering near‑real‑time inference for translation, summarisation and light‑weight coding assistance. The on‑device capability sidesteps latency and data‑privacy concerns that have hampered cloud‑only solutions, a factor especially relevant for Nordic enterprises bound by strict GDPR‑style regulations. What to watch next: Anthropic plans to open Claude Code to third‑party IDE plugins later this month, and a performance‑focused update to Claude 4.1 is slated for Q3. Google will publish LiteRT benchmark numbers across a range of edge devices in the coming weeks, and analysts expect a wave of Nordic startups to experiment with on‑device Gemma 4 for localized language services. 
The convergence of stronger coding assistants and offline AI could reshape how developers in the region build and ship software.
35

AirPods Weekend Deals Include AirPods Pro 3 for $199.99 and AirPods 4 for $99

Mastodon +6 sources mastodon
apple
Apple’s weekend sales push has slashed the price of its newest earbuds, with the AirPods Pro 3 now listed at $199.99 and the AirPods 4 at $99 on major retailers such as Amazon and Best Buy. The discounts, announced on Monday and tracked by MacRumors, also include a limited‑time $399.95 price for the AirPods Max 1, but the headline‑grabbing cuts focus on the mid‑range lineup that most consumers consider for everyday use. The price drop matters because it narrows the gap between Apple’s premium audio offering and its more affordable options, potentially reshaping the competitive landscape against rivals like Sony’s WF‑1000XM4 and Samsung’s Galaxy Buds 2 Pro. At $199.99, the AirPods Pro 3 undercuts the previous‑generation Pro 2, which debuted at $249, while still delivering the latest iteration of active‑noise‑cancellation, spatial audio with dynamic head tracking, and a new H2‑class chip that promises lower latency and better battery life. The AirPods 4, positioned as a “core” model, now sit directly against the $99 price point of the AirPods 3, making the upgrade path more attractive for users who have been waiting for a price‑friendly entry into Apple’s spatial‑audio ecosystem. As we reported on 18 April, Apple’s 2026 product rollout introduced a suite of new hardware, including updated iPhones, Macs and wearables. The current discounts suggest the company is using aggressive pricing to accelerate adoption of its latest audio hardware ahead of the expected launch of the next‑generation H3 chip later this year. What to watch next: monitor whether the reduced pricing spurs a measurable uptick in AirPods shipments in Q2, and keep an eye on Apple’s upcoming developer conference for hints of new software features—such as deeper integration of large‑language‑model‑driven voice assistants—that could further differentiate the Pro 3 and AirPods 4 from competitors.
32

Ivan Fioravanti ᯅ (@ivanfioravanti) on X

Mastodon +6 sources mastodon
apple
Apple’s open‑source machine‑learning framework MLX is showing no signs of stalling. In a post on X, developer Ivan Fioravanti highlighted a flurry of commits to the Apple MLX repository over the past few days – including activity on Saturday – and pointed to two community maintainers, zcbenz and angeloskath, who are now steering the project’s day‑to‑day development. The message was a direct response to lingering doubts about MLX’s future after Apple’s initial launch left the framework largely in community hands. The significance extends beyond a tidy Git‑log. MLX is the only high‑performance, Metal‑backed library that lets developers run large language models (LLMs) natively on Apple silicon. Fioravanti also shared a video from the mlx‑community showing the GLM‑4.5‑Air model quantised to 4‑bit running on an M4 Mac equipped with 128 GB of RAM, delivering inference speeds that rival cloud‑based setups. For Nordic startups and research labs that rely on cost‑effective compute, the ability to squeeze powerful LLMs out of a laptop or desktop could reshape deployment strategies and lower the barrier to entry for AI‑driven products. As we reported on 18 April, Fioravanti has been a vocal advocate for the ecosystem, and his latest update reinforces the narrative that a vibrant contributor base can keep the project alive even without a heavy hand from Apple. The next weeks will reveal whether the momentum translates into formal releases: a stable 1.0 version, tighter integration with Apple’s Metal Performance Shaders, and broader support for emerging quantisation techniques. Watch for announcements from Apple’s developer relations team and any new benchmark results that could cement MLX as the go‑to stack for on‑device AI across the Nordics and beyond.
32

In the age of "AI", be the 0.1x programmer. #AI #LLM #LessIsMore #10xProgrammer

Mastodon +6 sources mastodon
agents
A new manifesto circulating among European developer circles is urging programmers to abandon the myth of the “10‑x engineer” and aim instead to become “0.1‑x programmers” – developers who let large language models (LLMs) do the heavy lifting while they focus on prompting, design and orchestration. The slogan, first popularised in a recent InfoQ session on developer experience in the age of generative AI, frames the shift as a cultural reset: code is no longer the primary output, but a set of high‑level instructions that guide agentic LLMs such as OpenAI’s latest Codex‑style all‑in‑one app, which we covered on 19 April. The argument matters because it reframes hiring, education and tooling. Companies are already looking for “full‑stack AI engineers” who can stitch together context graphs, Retrieval‑Augmented Generation (RAG) pipelines and visual LLM interfaces like the “Toad” project, a prototype that lets users interact with agents through drag‑and‑drop canvases. As the AI engineer hiring guide notes, candidates who can articulate prompt strategies and manage AI‑driven workflows are in higher demand than those who can manually write thousands of lines of code. At the same time, open‑source initiatives highlighted by Ines Montani suggest the market will not be monopolised by a single vendor, giving smaller teams the chance to build bespoke AI agents without costly licences. What to watch next is the rapid emergence of production‑grade toolkits that turn LLMs into reusable components. Conferences across Europe are already showcasing patterns for scaling AI agents, while startups race to commercialise visual prompting environments. Regulators are also beginning to scrutinise the “less‑is‑more” model for safety and bias, meaning the next few months will likely see a convergence of standards, open‑source libraries and corporate roadmaps that determine whether the 0.1‑x vision becomes mainstream or remains a niche philosophy.
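The “0.1‑x” workflow described above — writing orchestration rather than application code — can be sketched with a toy retrieval‑augmented prompt assembler. Everything here is illustrative (a real RAG pipeline would use embeddings and a vector store, not word overlap); the point is that the developer’s output is the prompt, not the logic the model executes.

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Toy retriever: rank documents by word overlap with the query.
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # The "0.1x" deliverable is this instruction block: curated context
    # plus a task description handed off to whichever LLM is in use.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (f"Using only the context below, answer the question.\n"
            f"{context}\nQ: {query}")
```

Swapping the retriever or the downstream model changes nothing in the caller, which is the orchestration‑over‑implementation stance the manifesto argues for.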
26

AI use causing ‘boiling frog’ effect on human brain, study warns

Mastodon +6 sources mastodon
A new experimental study reported by *The Independent* warns that brief reliance on generative AI can set off a “boiling‑frog” effect in the brain, eroding problem‑solving stamina once the tool is withdrawn. Researchers recruited 120 university students for a series of tasks that required logical reasoning and creative brainstorming. Half of the participants worked with a state‑of‑the‑art AI assistant for ten minutes before completing the same tasks unaided; the other half tackled the problems without any AI support. The findings were stark. When the AI was removed, the assisted group’s accuracy fell by 12 percent and they abandoned attempts 27 percent more often than the control group, which showed no performance dip. Participants also reported higher mental fatigue and a reduced sense of agency, suggesting that even a short burst of AI aid can recalibrate expectations of cognitive effort. The study builds on concerns we raised on 18 April 2026 about heavy AI reliance gradually eroding human cognition. It adds a behavioural dimension, showing that the impact is not limited to long‑term exposure but can manifest after a single session. Psychologists warn that the brain may adapt to the “cognitive crutch,” lowering its own threshold for effort and making manual problem‑solving feel disproportionately taxing. What to watch next: the research team plans a longitudinal follow‑up to see whether the effect persists after weeks of intermittent AI use. Tech firms are already field‑testing “cognitive‑resilience” modes that limit the frequency of AI suggestions, a move that could become a standard feature if the phenomenon spreads. Regulators may also consider guidelines on AI‑assisted learning, echoing recent calls for transparency in educational tools. The coming months will reveal whether industry and policy can keep human cognition from silently boiling away.
26

Anti-AI activist charged with firebombing home of gay OpenAI CEO Sam Altman - LGBTQ Nation

Mastodon +6 sources mastodon
openai
San Francisco prosecutors on Monday announced that a 32‑year‑old man has been charged with attempted murder and a host of felonies after he threw a Molotov cocktail at the San Francisco home of OpenAI chief executive Sam Altman. The suspect, identified as Daniel Alejandro Moreno‑Gama, was arrested on April 10 carrying an “anti‑AI” manifesto that listed the names of several AI executives and called for a pause on advanced AI development. Altman posted a family photograph on social media, saying the image was meant to discourage further attacks on his residence. The gesture underscored the personal toll of a growing backlash against artificial‑intelligence firms, a backlash that has moved from online criticism to violent extremism. The Department of Justice says Moreno‑Gama is linked to the loosely organized “PauseAI” movement, which has been vocal about the perceived existential risks of large‑scale models. While most of its members advocate policy lobbying, law‑enforcement officials allege that Moreno‑Gama acted alone, driven by a mental‑health crisis that surfaced during the investigation. District Attorney Brooke Jenkins emphasized that the case will be prosecuted as a hate‑based crime against a public figure, noting the manifesto’s explicit targeting of LGBTQ identities alongside AI leadership. The incident arrives amid heightened scrutiny of AI safety, with regulators in the EU and the United States drafting stricter oversight frameworks. It raises questions about the security of AI executives and whether extremist factions could influence forthcoming legislation. Watch for the upcoming federal arraignment, where prosecutors are expected to seek a lengthy prison term, and for OpenAI’s response on employee safety protocols. Parallel developments include a possible increase in protective measures for AI leaders and a renewed debate in Congress over how to balance innovation with public safety concerns.

Skills. Across models. Including local. As a native assistant. Whatt? # android # llm # assis

Mastodon +6 sources mastodon
google
Google unveiled a new “Native Assistant” framework for Android that lets developers attach “skills” to any large‑language model – from cloud‑hosted APIs to on‑device inference engines such as Ollama, OpenClaw and other open‑source projects. The SDK ships as a lightweight library that registers skill modules, routes user utterances through a model‑agnostic pipeline, and returns results in the familiar Android Assistant UI. By exposing a unified API, Google aims to end the effective monopoly of its own Gemini‑based assistant and give developers the freedom to pick the model that best fits their cost, latency or privacy requirements. The move matters because it lowers the barrier for small teams and hobbyists to build conversational agents that run locally, sidestepping the data‑exfiltration concerns that have dogged cloud‑only assistants. It also aligns with the broader industry push for “edge AI,” where on‑device models can deliver sub‑second responses without relying on bandwidth‑intensive calls to remote servers. For users, the promise is a more personalized, offline‑capable assistant that can execute scripts, manage files or control smart‑home devices without sending raw audio to the cloud. Google’s announcement builds on the sandboxing and isolation concepts we covered on April 17, when the company first released an agents SDK for secure plugin execution. It also dovetails with the “llmfit” tool highlighted on April 18, which helps developers match models to hardware constraints. The real test will be how quickly the Android developer community adopts the framework and whether open‑source alternatives such as OpenClaw or the Cluely AI interview copilot can deliver comparable performance on typical smartphones. Watch for early benchmark releases, integration guides from the open‑source community, and any regulatory response to the increased on‑device data processing.
The speed at which third‑party skill stores emerge will determine whether Google’s native assistant becomes a genuine open ecosystem or remains a niche feature for power users.
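Google has not published the SDK's API surface, so the sketch below is purely illustrative: it shows the model‑agnostic shape the announcement describes, where skills register against a router and the LLM backend behind it can be swapped between cloud and local without touching the skills. Every name here (`SkillRouter`, `Skill`, the keyword dispatch) is an assumption, not part of the real framework.

```python
# Hypothetical sketch of a model-agnostic skill pipeline. No names below
# come from Google's SDK, whose API has not been published.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Skill:
    name: str
    handler: Callable[[str], str]  # takes the user utterance, returns a reply


class SkillRouter:
    """Routes utterances to skills, independent of which LLM backend runs."""

    def __init__(self, backend: Callable[[str], str]):
        # `backend` may be a cloud API or an on-device model; same interface.
        self.backend = backend
        self.skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def route(self, utterance: str) -> str:
        # Naive keyword dispatch; a real pipeline would classify intent first.
        for skill in self.skills.values():
            if skill.name in utterance.lower():
                return skill.handler(utterance)
        return self.backend(utterance)  # fall through to the raw model


# Swapping backends is a one-line change: point `backend` at a local model.
router = SkillRouter(backend=lambda text: f"[model] {text}")
router.register(Skill("timer", lambda text: "Timer set."))
```

The point of the pattern is that skills never see the model directly, which is what would let the same skill modules run against Gemini, Ollama, or any other engine.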

"The sound of inevitability" from the original # Matrix film, and Agent Smith's smug faith in the

Mastodon +6 sources mastodon
agents
A coalition of the world’s biggest AI developers unveiled a $2 billion “Inevitability” initiative on Tuesday, positioning autonomous agents as the next foundational layer of software. The partnership, announced by OpenAI, DeepMind, Anthropic and a handful of European cloud providers, will fund a common SDK, shared safety standards and a cloud‑native sandbox that isolates agents from host systems. The move was framed with a nod to the 1999 classic: a teaser video showed a stylised subway train barreling toward a digital horizon while a voice‑over quoted Agent Smith’s “sound of inevitability,” underscoring the partners’ belief that agentic AI is no longer optional but unavoidable. The announcement matters because it shifts autonomous agents from experimental labs into the mainstream enterprise stack. By pooling resources to build a unified runtime, the consortium hopes to solve the fragmentation that has hampered adoption of stateful agents such as those demonstrated in our recent “Building Stateful AI Agents with Backboard” deep‑dive. The native isolation layer directly builds on the sandboxing SDK OpenAI released last week, promising that agents can execute web‑automation, data‑synthesis or decision‑making tasks without exposing underlying infrastructure to malicious code. If the promise holds, businesses could embed agents in everything from customer‑service chatbots to supply‑chain optimisation tools without the current overhead of custom security engineering. What to watch next is how regulators and competitors respond. The European Union’s AI Act is already probing the safety implications of self‑directed agents, and the new framework could become a focal point for compliance debates. Meanwhile, open‑source projects such as RiskWebWorld and WebXSkill, which we covered earlier, will likely test the consortium’s standards against real‑world e‑commerce and skill‑learning scenarios. 
The next few months should reveal whether the “sound of inevitability” becomes a market‑driven reality or a contested battleground for AI governance.
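The consortium has not published its runtime, but the isolation idea it describes — agents executing tasks "without exposing underlying infrastructure to malicious code" — typically starts with an allowlist gate between the agent and host tools. The sketch below is a minimal assumption‑laden illustration of that pattern; `ToolGate` and `SandboxViolation` are invented names, not part of any announced SDK.

```python
# Illustrative only: an allowlist gate of the kind an agent sandbox might
# enforce. The consortium's actual runtime API has not been published.
class SandboxViolation(Exception):
    """Raised when an agent tries to call a tool it was not granted."""


class ToolGate:
    """Lets an agent call only pre-approved tools; everything else is refused."""

    def __init__(self, allowed: set[str]):
        self.allowed = allowed
        self.audit_log: list[str] = []  # every attempt is recorded, allowed or not

    def call(self, tool: str, fn, *args):
        self.audit_log.append(tool)
        if tool not in self.allowed:
            raise SandboxViolation(f"tool {tool!r} is not on the allowlist")
        return fn(*args)


# An agent granted only network reads cannot shell out, and the attempt
# still lands in the audit trail.
gate = ToolGate(allowed={"http_get"})
result = gate.call("http_get", lambda url: f"fetched {url}", "example.com")
```

The audit log matters as much as the refusal: shared safety standards of the kind the initiative proposes would need every denied call to be observable, not silently dropped.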

Alexander Embiricos (@embirico) on X

Mastodon +6 sources mastodon
agents openai
OpenAI’s Codex has received a major upgrade that gives the model a far more sophisticated “computer‑use” ability, according to a tweet from Alexander Embiricos, the product lead behind the service. Embiricos, who oversees a Codex product line that now processes trillions of tokens weekly, said the new feature ranks at the top of every test he’s run on large language models (LLMs) and desktop‑agent frameworks. The enhancement lets Codex not only generate code but also interact directly with a user’s operating system—moving the mouse, typing, opening applications and manipulating files—without any additional scripting layer. The development matters because it pushes AI agents from passive code suggestion into active execution. Developers could hand a single prompt to Codex and watch it assemble a development environment, run builds, debug failures, or even automate routine office tasks. For enterprises, the capability promises to shrink the time needed to integrate new software, lower the barrier for non‑technical staff to automate workflows, and accelerate the broader push toward “agentic” AI that can act on behalf of users across the desktop. At the same time, the power to control a computer raises safety and security questions; OpenAI will need robust sandboxing, permission controls and audit trails to prevent unintended actions or malicious exploitation. What to watch next is the rollout plan. OpenAI is expected to publish detailed documentation and benchmark results in the coming days, and to open the feature to a limited set of Codex API customers. Integration with GitHub Copilot and other developer tools could follow, turning the upgrade into a mainstream productivity boost. Industry observers will also be tracking how competitors such as Anthropic and Google respond—whether they will accelerate their own agent‑type offerings or introduce safeguards that shape the next wave of autonomous AI. 
The coming weeks will reveal whether Codex’s new computer‑use skill becomes a catalyst for widespread desktop automation or a niche capability confined to early adopters.
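OpenAI has not published Codex's computer‑use action schema, but the pattern the tweet describes — the model emits structured actions, the host executes them — can be sketched with a toy dispatcher. The action names (`open_app`, `type_text`) and the plan format below are invented for illustration only.

```python
# Toy dispatcher for the "computer-use" pattern: the model produces a list of
# structured steps; the host maps each to a handler. All names are invented;
# OpenAI has not published Codex's actual action schema.
ACTIONS = {}


def action(name):
    """Decorator registering a handler for a named action."""
    def register(fn):
        ACTIONS[name] = fn
        return fn
    return register


@action("open_app")
def open_app(app):
    # A real host would launch the application; here we just record it.
    return f"opened: {app}"


@action("type_text")
def type_text(text):
    # A real host would synthesize keystrokes.
    return f"typed: {text}"


def execute(plan):
    """Run a model-produced list of {"action", "arg"} steps in order."""
    return [ACTIONS[step["action"]](step["arg"]) for step in plan]


# A plan such as a model might emit for "open the editor and type hello":
log = execute([
    {"action": "open_app", "arg": "editor"},
    {"action": "type_text", "arg": "hello"},
])
```

Keeping execution behind a fixed registry like this is also where the sandboxing and permission controls the article mentions would attach: the host, not the model, decides which actions exist at all.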

Bindu Reddy (@bindureddy) on X

Mastodon +6 sources mastodon
agents gpt-5 openai
OpenAI is poised to unveil a new flagship language model next week, according to a post by Bindu Reddy, CEO of Abacus.AI, on X. Reddy’s brief but pointed tweet predicts that the upcoming model will operate in tandem with the Opus family, specifically naming GPT‑5.5 and Opus 4.7 as its leading components. The prediction hints at a hybrid architecture in which OpenAI’s next‑generation transformer works alongside the Opus series—Anthropic’s models, known for their efficiency on complex reasoning tasks. As we reported on April 5, Reddy has been a vocal commentator on the pace of large‑model development and the emergence of “general‑purpose agents.” Her latest hint builds on that narrative, suggesting OpenAI is moving beyond the monolithic GPT‑4 paradigm toward a modular ecosystem that can delegate subtasks to specialized sub‑models. If true, the rollout could raise the bar for multi‑model orchestration, a capability that Abacus.AI and other applied‑AI firms are already integrating into production agents. The timing matters for several reasons. First, a GPT‑5.5 release would compress the gap between GPT‑4 and the anticipated GPT‑6, potentially reshaping the competitive landscape against Anthropic’s Claude 3 and Google’s Gemini 1.5. Second, coupling the model with Opus could improve performance on high‑complexity problems such as scientific reasoning, code synthesis, and multi‑turn planning—areas where current LLMs still stumble. Finally, the announcement arrives amid heightened regulatory scrutiny of AI safety, meaning OpenAI may need to demonstrate robust alignment mechanisms before a public launch. What to watch next: OpenAI’s official blog post or press release, the model’s technical paper, and early benchmark results, especially on reasoning and agentic tasks. Industry partners will likely announce integration roadmaps, while cloud providers may tease pricing tiers.
Analysts will also monitor whether the hybrid approach triggers a shift toward multi‑model pipelines across the broader AI ecosystem.
