I have to harden my web app against code injection. I find that the problem requires looking at how an input string is used in several different contexts:
- In HTML/XML as text.
- In HTML/XML as an attribute value inside quotes.
- In a URL as a query parameter.
Each context has its own escaping rules. Since data can move from one context to another, it has to be escaped properly for every context it enters.
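To make the three contexts concrete, here is a minimal sketch using only the Python standard library. The helper names (`as_html_text`, `as_html_attr`, `as_query_value`) and the sample string are made up for illustration; your framework probably has its own equivalents.

```python
from html import escape
from urllib.parse import quote


def as_html_text(value: str) -> str:
    """Escape for use as text between tags: only &, < and > need escaping."""
    return escape(value, quote=False)


def as_html_attr(value: str) -> str:
    """Escape for use inside a quoted attribute: quotes must be escaped too."""
    return escape(value, quote=True)


def as_query_value(value: str) -> str:
    """Percent-encode for use as a single URL query parameter value."""
    return quote(value, safe="")


# Illustrative sample only.
sample = 'a<b & "c"=d\'e'

print(as_html_text(sample))    # a&lt;b &amp; "c"=d'e
print(as_html_attr(sample))    # a&lt;b &amp; &quot;c&quot;=d&#x27;e
print(as_query_value(sample))  # a%3Cb%20%26%20%22c%22%3Dd%27e
```

The same input produces three different outputs, which is why one "sanitize" step applied at input time cannot be right for every place the string ends up.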
To help test for proper escaping, I have come up with a string that has lots of special characters below. Put it in your test database and paste it into your input fields, then watch whether it causes problems anywhere. In a properly escaped system, the string should be transferred and reconstructed verbatim.
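The "reconstructed verbatim" check can be sketched as a round trip in Python. The `TORTURE` string here is a hypothetical stand-in that I am assuming for the example, not the test string referred to in this post, and the function names are made up as well.

```python
from html import escape, unescape
from urllib.parse import quote, unquote

# Hypothetical stand-in test string (an assumption for this sketch).
TORTURE = '<b>"quoted" & \'single\' %41 + = ? # / \\ ; -- 三國志'


def round_trip_html(value: str) -> str:
    """Escape once for HTML, then decode the way a browser would."""
    return unescape(escape(value, quote=True))


def round_trip_url(value: str) -> str:
    """Percent-encode once for a URL, then decode the way a server would."""
    return unquote(quote(value, safe=""))


def double_escaped_html(value: str) -> str:
    """Simulate a bug: escaped twice on the way out, decoded only once."""
    return unescape(escape(escape(value, quote=True), quote=True))


# Correct escaping gives back exactly the original string.
assert round_trip_html(TORTURE) == TORTURE
assert round_trip_url(TORTURE) == TORTURE

# Double escaping shows up as a visible change, e.g. "&lt;b&gt;" on the page.
assert double_escaped_html(TORTURE) != TORTURE
```

A string with many special characters is useful precisely because any missing or doubled escaping step changes it in a way you can see.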
A related issue is whether your code supports Unicode correctly. I find it helpful to put the string below into the test data so that this is tested right from the beginning.
\u4e09\u570b\u5fd7 or 三國志
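As a quick sanity check, the two spellings above are the same string, and it should survive a UTF-8 round trip unchanged. This is a small Python sketch; the variable names are only for illustration.

```python
literal = "三國志"
escaped = "\u4e09\u570b\u5fd7"

# Both spellings denote the same three code points.
assert literal == escaped

# UTF-8 uses three bytes for each of these characters.
encoded = literal.encode("utf-8")
print(len(literal), len(encoded))  # 3 9

# Decoding with the same encoding restores the string verbatim.
assert encoded.decode("utf-8") == literal

# Decoding with the wrong encoding yields mojibake instead of an error,
# which is why this kind of bug is easy to miss without test data.
assert encoded.decode("latin-1") != literal

# Recover the escaped spelling from the literal one.
print(literal.encode("ascii", "backslashreplace").decode("ascii"))  # \u4e09\u570b\u5fd7
```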
2012.05.01