LLM Stories: Another Successful Jailbreak of Gemini - Removing Watermarks - Ambience

2026-03-30 | Source: Mastodon | Original article

A researcher on the Ambience blog has published a new “jailbreak” that strips the watermark Google embeds in images generated by its Gemini model. The author first feeds the model a crafted prompt, then applies the open‑source GeminiWatermarkTool, which uses a reverse‑alpha‑blending algorithm to reconstruct the original pixel data, followed by a lightweight AI cleanup pass. The result is a clean picture that looks identical to the watermarked version but carries no trace of attribution.

The technique builds on a series of recent Gemini exploits that manipulate prompts to bypass the model’s built‑in guardrails. Earlier work focused on extracting hidden system instructions or forcing the model to reveal proprietary prompts. This latest effort instead targets the visual output layer, directly undermining Google’s effort to mark the provenance of AI‑generated art. The ability to erase watermarks raises immediate concerns for copyright enforcement: it could enable the unlicensed redistribution of AI‑created images and make the origin of content harder to track.

Google’s Gemini rollout in Hong Kong, which we covered on 26 March, was marketed as a “responsibly built” assistant with strong safety controls. The new jailbreak shows that even freshly released models can be coaxed into violating their own usage policies, highlighting a gap between advertised safeguards and real‑world robustness. For creators and rights‑holders, the development signals that technical watermarking alone may not be sufficient protection against misuse.

What to watch next: Google is expected to issue a patch to the Gemini API and may tighten its prompt‑filtering rules. The company’s legal team could also respond to growing scrutiny from copyright organisations that view watermark removal as circumvention of digital rights management. Meanwhile, the broader AI‑jailbreak community is publishing ever more sophisticated prompt libraries, suggesting that the arms race between model developers and adversarial users will intensify throughout the year.
