Several years ago I read an article by Ben Nadel on his site www.bennadel.com about escaping posted form values. In the comments, some discussion came up about allowing a limited set of HTML tags for paragraphs, bold text, and so on. I needed to do exactly this for forum comments on a site I was working on. The site was written in ColdFusion, so I was looking at some of the same options mentioned in Ben's article. I ended up doing something a little different, though.
We were using TinyMCE for the forum comments, and TinyMCE produces XHTML, so I was able to use ColdFusion's XML functions for this task. Each comment is checked with the XMLValidate function against an XML schema that was modified to accept only a small list of tags and attributes, which limits comments to that markup. Here is the relevant portion of the code, followed by an explanation of how it works.
The first line takes the content from the form and wraps it in a content tag. This is necessary because well-formed XML must have a single root element. The name content was an arbitrary choice; any element name would work. Before parsing, non-breaking spaces in the content are also escaped. If I remember correctly, the non-breaking spaces were causing the XMLFormat function to throw an error.
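To illustrate this first step outside ColdFusion, here is a minimal Python sketch (not the original code). It assumes the escaping replaces the named entity &nbsp; with the numeric reference &#160;. &nbsp; is an HTML entity rather than one of XML's five predefined entities, so a standard XML parser rejects it unless it is declared. The function name wrap_comment is hypothetical.

```python
import xml.etree.ElementTree as ET


def wrap_comment(html: str) -> str:
    """Wrap submitted XHTML in an arbitrary root element so it parses as one document."""
    # &nbsp; is not predefined in XML, so swap it for the equivalent
    # numeric character reference before parsing (an assumed approach).
    escaped = html.replace("&nbsp;", "&#160;")
    return "<content>" + escaped + "</content>"


comment = "<p>Hello&nbsp;world</p><p>Second <strong>para</strong></p>"

# Two sibling <p> elements alone would not be well-formed XML;
# wrapped in <content> they parse as a single document.
root = ET.fromstring(wrap_comment(comment))
print(root.tag, len(root))
```

Without the wrapper, the parser would reject the second p element as content after the document element, and without the escaping it would reject &nbsp; as an undefined entity.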
The next section is a cfxml tag containing the XML schema. I built this schema from some existing examples and then modified it to include the desired tags.
The schema allows the following tags:
- any number of br, strong, em, ul, ol, u, strike, li, spans with style and class attributes,
- a tags with href, title, target, and class attributes,
- img tags with src, alt, height, and width attributes,
- p tags with align, class, and style attributes,
- and finally content tags
As mentioned above, the content tag is just a container that acts as the root element of the document. The schema also restricts the href attribute on a tags to URLs, so javascript: links are rejected.
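Python's standard library has no XML Schema validator, so as a rough stand-in for the schema, this sketch walks the parsed tree and checks every element against a whitelist of the tags and attributes listed above. It assumes that "with style and class attributes" applies only to span, and that a safe href is either relative or uses http or https. The names ALLOWED, is_safe_href and find_violations are hypothetical. Unlike a real schema, it does not restrict which tags may be nested inside which.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical whitelist: tag name -> set of permitted attribute names.
ALLOWED = {
    "content": set(),
    "br": set(), "strong": set(), "em": set(), "ul": set(), "ol": set(),
    "u": set(), "strike": set(), "li": set(),
    "span": {"style", "class"},
    "a": {"href", "title", "target", "class"},
    "img": {"src", "alt", "height", "width"},
    "p": {"align", "class", "style"},
}


def is_safe_href(value: str) -> bool:
    # Drop tabs/newlines and surrounding whitespace so tricks like
    # " JavaScript:..." cannot hide the scheme.
    cleaned = "".join(ch for ch in value if ch not in "\t\r\n").strip()
    scheme = urlsplit(cleaned).scheme  # urlsplit lowercases the scheme
    return scheme in ("", "http", "https")  # "" means a relative URL


def find_violations(root: ET.Element) -> list[str]:
    """Return a list of whitelist violations; an empty list means acceptable."""
    errors = []
    for el in root.iter():  # iter() includes the root element itself
        allowed_attrs = ALLOWED.get(el.tag)
        if allowed_attrs is None:
            errors.append(f"tag not allowed: {el.tag}")
            continue
        for name, value in el.attrib.items():
            if name not in allowed_attrs:
                errors.append(f"attribute not allowed: {el.tag}@{name}")
            elif el.tag == "a" and name == "href" and not is_safe_href(value):
                errors.append(f"unsafe href: {value}")
    return errors
```

For example, a comment containing a script tag, an onclick attribute, or an href of javascript:alert(1) produces one entry per problem, and the caller can reject the comment whenever the list is non-empty.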
Next, XMLValidate is called with the content and the schema. It returns a struct describing whether the document is valid. Finally, the struct's status key determines whether the content is accepted or rejected.
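The accept/reject decision can be sketched the same way. This Python version returns a dictionary shaped like the result struct ColdFusion documents for XMLValidate (status, errors, fatalerrors, warnings), treating a parse failure as a fatal error and anything reported by a pluggable check function as an ordinary error. The functions validate and only_paragraphs are hypothetical; only_paragraphs is a deliberately tiny stand-in for the real schema.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def validate(xml_text: str, check: Callable[[ET.Element], list[str]]) -> dict:
    """Return a dict mimicking the shape of XMLValidate's result struct."""
    result = {"status": False, "errors": [], "fatalerrors": [], "warnings": []}
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Not well-formed XML, so there is nothing to check against the rules.
        result["fatalerrors"].append(str(exc))
        return result
    result["errors"] = check(root)
    result["status"] = not result["errors"]  # valid only if no errors
    return result


def only_paragraphs(root: ET.Element) -> list[str]:
    # Hypothetical stand-in rule set: allow only <content> and <p>.
    return [f"tag not allowed: {el.tag}" for el in root.iter()
            if el.tag not in ("content", "p")]


result = validate("<content><p>ok</p></content>", only_paragraphs)
if result["status"]:
    print("accept")
else:
    print("reject")
```

Keeping the parse step separate from the rule check mirrors the distinction between a comment that is not XML at all and one that is XML but uses markup the site does not allow.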
This post is a rewrite of an article I wrote several years ago on a different blog, which is no longer online. I think the information is still useful, though. I plan to re-publish a few other articles from that old site.