Sanitising User Input from Browser. part 1

I was working on a web based project recently where there was no security thought about when designing, developing it.
The following outlines my experience with retrofitting security.
It’s my hope that someone will find it useful for their own implementation.

We’ll be focussing on the client side in this post (part 1) and the server side in part 2.
We’ll also cover some preliminary discussion that will set the stage for this series.

The application consists of a WCF service delivering up content to some embedding code on any page in the browser.
The content is stored as Xml in the database and transformed into Html via Xslt.

The first activity I find useful is to go through the process of Threat Modelling the Application.
This process can be quite daunting for those new to it.
Here’s a couple of references I find quite useful to get started:

Actually this ones not bad either.

There is no single right way to do this.
The more you read and experiment, the more equipped you will be.
The idea is to think like an attacker thinks.
This may be harder for some than others, but it is essential, to cover as many potential attack vectors as possible.
Remember, there is no secure system, just varying levels of insecurity.
It will always be much harder to discover the majority of security weaknesses in your application as the person or team creating/maintaining it,
than for the person attacking it.
The Threat Modelling topic is large and I’m not going to go into it here, other than to say, you need to go into it.

Threat Agents

Work out who your Threat Agents are likely to be.
Learn how to think like they do.
Learn what skills they have and learn the skills your self.
Sometimes the skills are very non technical.
For example walking through the door of your organisation in the weekend because the cleaners (or any one with access) forgot to lock up.
Or when the cleaners are there and the technical staff are not (which is just as easy).
It happens more often than we like to believe.

Defense in Depth

To attempt to mitigate attacks, we need to take a multi layered approach (often called defence in depth).

What made me decide to start with sanitising user input from the browser anyway?
Well according to the OWASP Top 10, Injection and Cross Site Scripting (XSS) are still the most popular techniques chosen to compromise web applications.
So it makes sense if your dealing with web apps, to target the most common techniques exploited.

Now, in regards to defence in depth when discussing web applications;
If the attacker gets past the first line of defence, there should be something stopping them at the next layer and so forth.
The aim is to stop the attack as soon as possible.
This is why we focus on the UI first, and later move our focus to the application server, then to the database.
Bear in mind though, that what ever we do on the client side, can be circumvented relatively easy.
Client side code is out of our control, so it’s best effort.
Because of this, we need to perform the following not only in the browser, but as much as possible on the server side as well.

  1. Minimising the attack surface
  2. Defining maximum field lengths (validation)
  3. Determining a white list of allowable characters (validation)
  4. Escaping untrusted data, especially where you know it’s going to endup in an execution context. Even where you don’t think this is likely, it’s still possible.
  5. Using Stored Procedures / parameterised queries (not covered in this series).
  6. Least Privilege.
    Minimising the privileges assigned to every database account (not covered in this series).

Minimising the attack surface

input fields should only allow certain characters to be input.
Text input fields, textareas etc that are free form (anything is allowed) are very hard to constrain to a small white list.
input fields where ever possible should be constrained to well structured data,
like dates, social security numbers, zip codes, e-mail addresses, etc. then the developer should be able to define a very strong validation pattern, usually based on regular expressions, for validating such input. If the input field comes from a fixed set of options, like a drop down list or radio buttons, then the input needs to match exactly one of the values offered to the user in the first place.
As it was with the existing app I was working on, we had to allow just about everything in our free form text fields.
This will have to be re-designed in order to provide constrained input.

Defining maximum field lengths (validation)

This was currently being done (sometimes) in the Xml content for inputs where type="text".
Don’t worry about the inputType="single", it gets transformed.

<input id="2" inputType="single" type="text" size="10" maxlength="10" />

And if no maxlength specified in the Xml, we now specify a default of 50 in the xsl used to do the transformation.
This way we had the input where type="text" covered for the client side.
This would also have to be validated on the server side when the service received values from these inputs where type="text".

    <xsl:template match="input[@inputType='single']">
      <xsl:value-of select="@text" />
        <input name="a{@id}" type="text" id="a{@id}" class="textareaSingle">
          <xsl:attribute name="value">
              <xsl:when test="key('response', @id)">
                <xsl:value-of select="key('response', @id)" />
                <xsl:value-of select="string(' ')" />
          <xsl:attribute name="maxlength">
              <xsl:when test="@maxlength">
                <xsl:value-of select="@maxlength"/>

For textareas we added maxlength validation as part of the white list validation.
See below for details.

Determining a white list of allowable characters (validation)

See bottom of this section for Update

Now this was quite an interesting exercise.
I needed to apply a white list to all characters being entered into the input fields.
A user can:

  1. type the characters in
  2. [ctrl]+[v] a clipboard full of characters in
  3. right click -> Paste

To cover all these scenarios as elegantly as possible, was going to be a bit of a challenge.
I looked at a few JavaScript libraries including one or two JQuery plug-ins.
None of them covered all these scenarios effectively.
I wish they did, because the solution I wasn’t totally happy with, because it required polling.
In saying that, I measured performance, and even bringing the interval right down had negligible effect, and it covered all scenarios.

setupUserInputValidation = function () {

  var textAreaMaxLength = 400;
  var elementsToValidate;
  var whiteList = /[^A-Za-z_0-9\s.,]/g;

  var elementValue = {
    textarea: '',
    textareaChanged: function (obj) {
      var initialValue = obj.value;
      var replacedValue = initialValue.replace(whiteList, "").slice(0, textAreaMaxLength);
      if (replacedValue !== initialValue) {
        this.textarea = replacedValue;
        return true;
      return false;
    inputtext: '',
    inputtextChanged: function (obj) {
      var initialValue = obj.value;
      var replacedValue = initialValue.replace(whiteList, "");
      if (replacedValue !== initialValue) {
        this.inputtext = replacedValue;
        return true;
      return false;

  elementsToValidate = {
    textareainputelements: (function () {
      var elements = $('#page' + currentPage).find('textarea');
      if (elements.length > 0) {
        return elements;
      return 'no elements found';
    } ()),
    textInputElements: (function () {
      var elements = $('#page' + currentPage).find('input[type=text]');
      if (elements.length > 0) {
        return elements;
      return 'no elements found';
    } ())

  // store the intervals id in outer scope so we can clear the interval when we change pages.
  userInputValidationIntervalId = setInterval(function () {
    var element;

    // Iterate through each one and remove any characters not in the whitelist.
    // Iterate through each one and trim any that are longer than textAreaMaxLength.

    for (element in elementsToValidate) {
      if (elementsToValidate.hasOwnProperty(element)) {
        if (elementsToValidate[element] === 'no elements found')

        $.each(elementsToValidate[element], function () {
          $(this).attr('value', function () {
            var name = $(this).prop('tagName').toLowerCase();
            name = name === 'input' ? name + $(this).prop('type') : name;
            if (elementValue[name + 'Changed'](this))
              this.value = elementValue[name];
  }, 300); // milliseconds

Each time we change page, we clear the interval and reset it for the new page.



Update 2013-06-02:

Now with HTML5 we have the pattern attribute on the input tag, which allows us to specify a regular expression that the text about to be received is checked against. We can also see it here amongst the new HTML5 attributes . If used, this can make our JavaScript white listing redundant, providing we don’t have textareas which W3C has neglected to include the new pattern attribute on. I’d love to know why?

Escaping untrusted data

Escaped data will still render in the browser properly.
Escaping simply lets the interpreter know that the data is not intended to be executed,
and thus prevents the attack.

Now what we do here is extend the String prototype with a function called htmlEscape.

if (typeof Function.prototype.method !== "function") {
  Function.prototype.method = function (name, func) {
    this.prototype[name] = func;
    return this;

String.method('htmlEscape', function () {

  // Escape the following characters with HTML entity encoding to prevent switching into any execution context,
  // such as script, style, or event handlers.
  // Using hex entities is recommended in the spec.
  // In addition to the 5 characters significant in XML (&, <, >, ", '), the forward slash is included as it helps to end an HTML entity.
  var character = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    // Double escape character entity references.
    // Why?
    // The XmlTextReader that is setup in XmlDocument.LoadXml on the service considers the character entity references () to be the character they represent.
    // All XML is converted to unicode on reading and any such entities are removed in favor of the unicode character they represent.
    // So we double escape character entity references.
    // These now get read to the XmlDocument and saved to the database as double encoded Html entities.
    // Now when these values are pulled from the database and sent to the browser, it decodes the & and displays #x27; and/or #x2F.
    // This isn't what we want to see in the browser.
    "'": '&amp;#x27;',    // &apos; is not recommended
    '/': '&amp;#x2F;'     // forward slash is included as it helps end an HTML entity

  return function () {
    return this.replace(/[&<>"'/]/g, function (c) {
      return character[c];

This allows us to, well, html escape our strings.


In looking through here,
The only untrusted data we are capturing is going to be inserted into an Html element

tag by way of insertion into a textarea element,
or the attribute value of input elements where type="text".
I initially thought I’d have to:

  1. Html escape the untrusted data which is only being captured from textarea elements.
  2. Attribute escape the untrusted data which is being captured from the value attribute of input elements where type="text".

RULE #2 – Attribute Escape Before Inserting Untrusted Data into HTML Common Attributes of here,
“Properly quoted attributes can only be escaped with the corresponding quote.”
So I decided to test it.
Created a collection of injection attacks. None of which worked.
Turned out we only needed to Html escape for the untrusted data that was going to be inserted into the textarea element.
More on this in a bit.

Now in regards to the code comments in the above code around having to double escape character entity references;
Because we’re sending the strings to the browser, it’s easiest to single decode the double encoded Html on the service side only.
Now because we’re still focused on the client side sanitisation,
and we are going to shift our focus soon to making sure we cover the server side,
we know we’re going to have to create some sanitisation routines for our .NET service.
Because the routines are quite likely going to be static, and we’re pretty much just dealing with strings,
lets create an extensions class in a new project in a common library we’ve already got.
This will allow us to get the widest use out of our sanitisation routines.
It also allows us to wrap any existing libraries or parts of them that we want to get use of.

namespace My.Common.Security.Encoding
    /// <summary>
    /// Provides a series of extension methods that perform sanitisation.
    /// Escaping, unescaping, etc.
    /// Usually targeted at user input, to help defend against the likes of XSS attacks.
    /// </summary>
    public static class Extensions
        /// <summary>
        /// Returns a new string in which all occurrences of a double escaped html character (that's an html entity immediatly prefixed with another html entity)
        /// in the current instance are replaced with the single escaped character.
        /// </summary>
        /// The new string.
        public static string SingleDecodeDoubleEncodedHtml(this string source)
            return source.Replace("&amp;#x", "&#x");

Now when we run our xslt transformation on the service, we chain our new extension method on the end.
Which gives us back a single encoded string that the browser is happy to display as the decoded value.

return Transform().SingleDecodeDoubleEncodedHtml();

Now back to my findings from the test above.
Turns out that “Properly quoted attributes can only be escaped with the corresponding quote.” really is true.
I thought that if I entered something like the following into the attribute value of an input element where type="text",
then the first double quote would be interpreted as the corresponding quote,
and the end double quote would be interpreted as the end quote of the onmouseover attribute value.

 " onmouseover="alert(2)

What actually happens, is during the transform…

xslCompiledTransform.Transform(xmlNodeReader, args, writer, new XmlUrlResolver());

All the relevant double quotes are converted to the double quote Html entity ‘”‘ without the single quotes.


And all double quotes are being stored in the database as the character value.

Libraries and useful code

Microsoft Anti-Cross Site Scripting Library

OWASP Encoding Project
This is the Reform library. Supports Perl, Python, PHP, JavaScript, ASP, Java, .NET

Online escape tool supporting Html escape/unescape, Java, .NET, JavaScript

The characters that need escaping for inserting untrusted data into Html element content

JavaScript The Good Parts: pg 90 has a nice ‘entityify’ function

OWASP Enterprise Security API Used for JavaScript escaping (ESAPI4JS)

JQuery plugin

Changing encoding on html page

Cheat Sheets and Check Lists I found helpful

If any of this is unclear, let me know and I’ll do my best to clarify. Maybe you have suggestions of how this could have been improved? Let’s spark a discussion.


Tags: , , , , , ,

2 Responses to “Sanitising User Input from Browser. part 1”

  1. Ernesto Says:

    Every weekend i used to pay a visit this site, as i wish for enjoyment, since this this web page conations actually nice funny data too.

  2. Up and Running with Kali Linux and Friends | Binarymist Says:

    […] Top 10 List (unchanged from 2010). I’d encourage any serious web developers to look at my Sanitising User Input From Browser. Part 1” and Part 2 Part 1 deals with the client side (untrused) code. Part 2 deals with the server side […]

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s