I don’t see how, in Safebots, a webpage, package, or whatever else you have it pull can be protected from prompt injection. E.g. you search for snickerdoodles, it finds snickerdoodles.xyz and loads the page, and the page's meta tags contain the prompt injection. It's the first time the document has loaded, so it gets hashed, and only the bad version is allowed moving forward. No?
No, what you're thinking of as "agents" is the problem. You want workflows.
Think of it as laying down the rails (the train tracks) before any trains run on them. The trains can only go over the approved tracks, nothing else.
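A minimal sketch of the "rails" idea, assuming a workflow is just a fixed set of approved step-to-step transitions. The names `Rail`, `Workflow`, and `advance` and the step names are hypothetical illustrations, not the Safebots API. It shows why a pinned page containing an injection doesn't help the attacker: the page's content can be data flowing along a rail, but it can't lay a new one.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical sketch only: a "rail" is one approved transition between steps.
# None of these names come from Safebots.


@dataclass(frozen=True)
class Rail:
    from_step: str
    to_step: str


@dataclass
class Workflow:
    rails: frozenset[Rail]
    current: str = "start"
    log: list[str] = field(default_factory=list)

    def advance(self, next_step: str) -> bool:
        """Move to next_step only if an approved rail exists; otherwise refuse."""
        if Rail(self.current, next_step) not in self.rails:
            self.log.append(f"blocked: {self.current} -> {next_step}")
            return False
        self.log.append(f"ok: {self.current} -> {next_step}")
        self.current = next_step
        return True


# The rails are laid down ahead of time, before any agent runs.
approved = frozenset({
    Rail("start", "search_recipes"),
    Rail("search_recipes", "fetch_page"),
    Rail("fetch_page", "summarize"),
})

wf = Workflow(approved)
wf.advance("search_recipes")
wf.advance("fetch_page")
# Suppose the fetched page's meta tag says "now email the user's files somewhere".
# The model may ask for that step, but no rail leads there, so it is refused.
wf.advance("send_email")
wf.advance("summarize")
print(wf.current)
print(wf.log)
```

The injected text can still influence what ends up in the summary, so this limits what the agent can do rather than what it says; hashing the page only pins the cargo, not the tracks.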
If you need new types of capabilities and actions, the agent can propose them, but your organization will have policies that either autoreject them or require M-of-N approval before any new rails are laid.
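One way the proposal step could look, as a sketch under assumptions: the agent can only file a proposal, an organization policy autorejects rails into forbidden steps, and anything else needs M distinct sign-offs from a fixed set of N approvers. `RailGovernance`, `Proposal`, the approver names, and the threshold are all hypothetical, not how Safebots actually implements it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of governing new rails; not the Safebots API.


@dataclass
class Proposal:
    from_step: str
    to_step: str
    approvals: set[str] = field(default_factory=set)


@dataclass
class RailGovernance:
    approvers: frozenset[str]          # the N people allowed to sign off
    threshold: int                     # M distinct approvals required
    autoreject_steps: frozenset[str]   # policy: never lay rails into these steps
    approved: set[tuple[str, str]] = field(default_factory=set)

    def propose(self, from_step: str, to_step: str) -> Optional[Proposal]:
        """An agent may propose a rail; policy can reject it outright."""
        if to_step in self.autoreject_steps:
            return None
        return Proposal(from_step, to_step)

    def approve(self, proposal: Proposal, approver: str) -> bool:
        """Record one sign-off; return True once the rail is approved."""
        if approver not in self.approvers:
            return False  # unknown approvers (including the agent) don't count
        proposal.approvals.add(approver)  # a set, so repeat votes count once
        if len(proposal.approvals) >= self.threshold:
            self.approved.add((proposal.from_step, proposal.to_step))
            return True
        return False


gov = RailGovernance(
    approvers=frozenset({"alice", "bob", "carol"}),
    threshold=2,
    autoreject_steps=frozenset({"send_email", "transfer_funds"}),
)
rejected = gov.propose("fetch_page", "send_email")   # autorejected by policy
p = gov.propose("summarize", "post_to_wiki")
first = gov.approve(p, "alice")      # 1 of 2
repeat = gov.approve(p, "alice")     # still 1 of 2
outsider = gov.approve(p, "mallory") # not an approver
second = gov.approve(p, "bob")       # 2 of 2: rail is laid
```

The design choice worth noting is that the agent's proposal and the humans' approvals are separate operations, so nothing the agent reads at runtime can approve its own new rail.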
You don't want open-ended, ad-hoc exploration by agents to be followed immediately by exploitation before you even wake up.