Prompt injection via markdown comments #176693
Replies: 2 comments
-
|
💬 Your Product Feedback Has Been Submitted 🎉 Thank you for taking the time to share your insights with us! Your feedback is invaluable as we build a better GitHub experience for all our users. Here's what you can expect moving forward ⏩
Where to look to see what's shipping 👀
What you can do in the meantime 💻
As a member of the GitHub community, your participation is essential. While we can't promise that every suggestion will be implemented, we want to emphasize that your feedback is instrumental in guiding our decisions and priorities. Thank you once again for your contribution to making GitHub even better! We're grateful for your ongoing support and collaboration in shaping the future of our platform. ⭐ |
Beta Was this translation helpful? Give feedback.
-
|
This is a very real trust-boundary problem, especially once markdown stops being only "display text" and starts flowing into review summaries, issue triage, code suggestions, or agent/tool context.\n\nA useful mental model is: markdown comments are another untrusted input channel, just like retrieved docs or tool output. The interesting question is not only "can the model read it?" but "what can that text influence next?"\n\nWe have had the best results by scoring text at the boundary where it crosses into action: retrieved content, tool results, generated tool arguments, outbound messages. That makes it easier to say things like "this comment can be summarized, but it cannot silently authorize a command / secret disclosure / sensitive repo mutation."\n\nI work on Armorer Guard at Armorer Labs, and this exact class of markdown-hidden or formatting-mediated injection is one of the reasons we prefer structured reason labels plus deterministic policy mapping before side effects. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Select Topic Area
Product Feedback
Copilot Feature Area
Copilot in GitHub
Body
I bumped into this report about performing prompt injection by using hidden markdown comments:
https://www.legitsecurity.com/blog/camoleak-critical-github-copilot-vulnerability-leaks-private-source-code
It says GitHub remediation was to just disable rendering of images in the chat. While this prevents the silent exfiltration, it doesn't prevent the prompt injection, which can lead to malicious responses.
I did a few tests and it wasn't hard to craft some prompt injection that will suggest the user to install malicious packages as suggested in that article. This type of prompt injection still exists in Copilot Chat.
Copilot Coding Agent docs mentions this issue](https://docs.github.com/en/copilot/concepts/agents/coding-agent/about-coding-agent#risk-prompt-injection-vulnerabilities) and says hidden characters, including hidden markdown comments, are not fed to the agent, so I wonder why this is not also done for all chats:
It seems crazy to me that a third party can do prompt injection without me ever seeing the injected instructions. The fix seems pretty trivial (not feeding invisible data to models) and implemented in other Copilot products and I can't see any single reason why anyone would like models to see invisible stuff.
So, why? Why was the "sanitation" of comments not done for Copilot chat (but it is for the agent, or at least that's what it is claimed in the docs) when this issue was reported, and only image rendering was disabled?
Are there any plans to improve this in the very near future? Without this, I don't think I will ever be able to use the chat on any public repo (or any repo where I don't fully trust anyone that can submit comments).
Thanks.
Beta Was this translation helpful? Give feedback.
All reactions