Archive for the ‘Architecture’ Category

Sanitising User Input from Browser. part 2

November 16, 2012

Untrusted data (data entered by a user), should always be treated as though it contains attack code.
This data should not be sent anywhere without taking the necessary steps to detect and neutralise the malicious code.
With applications becoming more interconnected, attacks being buried in user input and decoded and/or executed by a downstream interpreter is becoming all the more common.
Input validation, that’s restricting user input to allow only certain white listed characters and restricting field lengths are only two forms of defence.
Any decent attacker can get around client side validation, so you need to employ defence in depth.
validation and escaping also needs to be performed on the server side.

Leveraging existing libraries

  1. Microsofts AntiXSS is not extensible,
    it doesn’t allow the user to define their own whitelist.
    It didn’t allow me to add behaviour to the routines.
    I want to know how many instances of HTML encoded values there were.
    There was certainly a lot of code in there, but I didn’t find it very useful.
  2. The OWASP encoding project (Reform)(as mentioned in part 1 of this series).
    This is quite a useful set of projects for different technologies.
  3. System.Net.WebUtility from the System.Web.dll.
    Now this did most of what I needed other than provide me with fine grained information of what had been tampered with.
    So I took it and extended it slightly.
    We hadn’t employed AOP at this stage and it wasn’t considered important enough to invest the time to do so.
    So it was a matter of copy past modify.

What’s the point in client side validation if the server has to do it again anyway?

Now there are arguments both ways for this.
My current take on this for the project in question was:
If you only have server side validation, the client side is less responsive and user friendly.
If you only have client side validation, it’s out of our control.
This also gives fuel to the argument of using JavaScript on the client and server side (with the likes of node.js).
So the same code can be used both sides without having to code the same validation in two different languages.
Personally I find writing validation code easier using JavaScript than C#.
This maybe just because I’ve been writing considerably more JavaScript than C# lately though.

The code

I drew a sequence diagram of how this should work, but it got lost in a move.
So I wasn’t keen on doing it again, as the code had already been done.
In saying that, the code has reasonably good documentation (I think).
Code is king, providing it has been written to be read.
If you notice any of the escaping isn’t quite making sense, it could be the blogging engine either doing what it’s meant to, or not doing what it’s meant to.
I’ve been over the code a few times, but I may have missed something.
Shout out if anything’s not clear.

First up, we’ll look at the custom exceptions as we’ll need those soon.

using System;

namespace Common.WcfHelpers.ErrorHandling.Exceptions
{
    public abstract class WcfException : Exception
    {
        /// <summary>
        /// In order to set the message for the client, set it here, or via the property directly in order to over ride default value.
        /// </summary>
        /// <param name="message">The message to be assigned to the Exception's Message.</param>
        /// <param name="innerException">The exception to be assigned to the Exception's InnerException.</param>
        /// <param name="messageForClient">The client friendly message. This parameter is optional, but should be set.</param>
        public WcfException(string message, Exception innerException = null, string messageForClient = null) : base(message, innerException)
        {
            MessageForClient = messageForClient;
        }

        /// <summary>
        /// This is the message that the service's client will see.
        /// Make sure it is set in the constructor. Or here.
        /// </summary>
	    public string MessageForClient
        {
            get { return string.IsNullOrEmpty(_messageForClient) ? "The MessageForClient property of WcfException was not set" : _messageForClient; }
            set { _messageForClient = value; }
        }
        private string _messageForClient;
    }
}

And the more specific SanitisationWcfException

using System;
using System.Configuration;

namespace Common.WcfHelpers.ErrorHandling.Exceptions
{
    /// <summary>
    /// Exception class that is used when the user input sanitisation fails, and the user needs to be informed.
    /// </summary>
    public class SanitisationWcfException : WcfException
    {
        private const string _defaultMessageForClient = "Answers were NOT saved. User input validation was unsuccessful.";
        public string UnsanitisedAnswer { get; private set; }

        /// <summary>
        /// In order to set the message for the client, set it here, or via the property directly in order to over ride default value.
        /// </summary>
        /// <param name="message">The message to be assigned to the Exception's Message.</param>
        /// <param name="innerException">The Exception to be assigned to the base class instance's inner exception. This parameter is optional.</param>
        /// <param name="messageForClient">The client friendly message. This parameter is optional, but should be set.</param>
        /// <param name="unsanitisedAnswer">The user input string before service side sanitisatioin is performed.</param>
        public SanitisationWcfException
        (
            string message,
            Exception innerException = null,
            string messageForClient = _defaultMessageForClient,
            string unsanitisedAnswer = null
        )
            : base(
                message,
                innerException,
                messageForClient + " If this continues to happen, please contact " + ConfigurationManager.AppSettings["SupportEmail"] + Environment.NewLine
                )
        {
            UnsanitisedAnswer = unsanitisedAnswer;
        }
    }
}

Now as we define whether our requirements are satisfied by way of executable requirements (unit tests(in their rawest form))
Lets write some executable specifications.

using NUnit.Framework;
using Common.Security.Sanitisation;

namespace Common.Security.Encoding.UnitTest
{
    [TestFixture]
    public class ExtensionsTest
    {

        private readonly string _inNeedOfEscaping = @"One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.";
        private readonly string _noNeedForEscaping = @"One x2F two amp three x27 four lt five quot six gt       .";

        [Test]
        public void SingleDecodeDoubleEncodedHtml_ShouldSingleDecodeDoubleEncodedHtml()
        {
            string doubleEncodedHtml = @"";               // between the ""'s we have a string of Html with double escaped values like &amp;#x27; user entered text &amp;#x2F.
            string singleEncodedHtmlShouldLookLike = @""; // between the ""'s we have a string of Html with single escaped values like ' user entered text &#x2F.
            // In the above, the bloging engine is escaping the sinlge escaped entity encoding, so all you'll see is the entity it self.
            // but it should look like the double encoded entity encodings without the first &amp->;


            string singleEncodedHtml = doubleEncodedHtml.SingleDecodeDoubleEncodedHtml();
            
            Assert.That(singleEncodedHtml, Is.EqualTo(singleEncodedHtmlShouldLookLike));
        }

        [Test]
        public void Extensions_CompliesWithWhitelist_ShouldNotComply()
        {
            Assert.That(_inNeedOfEscaping.CompliesWithWhitelist(whiteList: @"^[\w\s\.,]+$"), Is.False);
        }

        [Test]
        public void Extensions_CompliesWithWhitelist_ShouldComply()
        {
            Assert.That(_noNeedForEscaping.CompliesWithWhitelist(whiteList: @"^[\w\s\.,]+$"), Is.True);
            Assert.That(_inNeedOfEscaping.CompliesWithWhitelist(whiteList: @"^[\w\s\.,#/&'<"">]+$"), Is.True);
        }
    }
}

Now the code that satisfies the above executable specifications, and more.

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

namespace Common.Security.Sanitisation
{
    /// <summary>
    /// Provides a series of extension methods that perform sanitisation.
    /// Escaping, unescaping, etc.
    /// Usually targeted at user input, to help defend against the likes of XSS and other injection attacks.
    /// </summary>
    public static class Extensions
    {

        private const int CharacterIndexNotFound = -1;

        /// <summary>
        /// Returns a new string in which all occurrences of a double escaped html character (that's an html entity immediatly prefixed with another html entity)
        /// in the current instance are replaced with the single escaped character.
        /// </summary>
        /// <param name="source">The target text used to strip one layer of Html entity encoding.</param>
        /// <returns>The singly escaped text.</returns>
        public static string SingleDecodeDoubleEncodedHtml(this string source)
        {
            return source.Replace("&amp;#x", "&#x");
        }
        /// <summary>
        /// Filter a text against a regular expression whitelist of specified characters.
        /// </summary>
        /// <param name="target">The text that is filtered using the whitelist.</param>
        /// <param name="alternativeTarget"></param>
        /// <param name="whiteList">Needs to be be assigned a valid whitelist, otherwise nothing gets through.</param>
        public static bool CompliesWithWhitelist(this string target, string alternativeTarget = "", string whiteList = "")
        {
            if (string.IsNullOrEmpty(target))
                target = alternativeTarget;
            
            return Regex.IsMatch(target, whiteList);
        }
        /// <summary>
        /// Takes a string and returns another with a single layer of Html entity encoding replaced with it's Html entity literals.
        /// </summary>
        /// <param name="encodedUserInput">The text to perform the opperation on.</param>
        /// <param name="numberOfEscapes">The number of Html entity encodings that were replaced.</param>
        /// <returns>The text that's had a single layer of Html entity encoding replaced with it's Html entity literals.</returns>
        public static string HtmlDecode(this string encodedUserInput, ref int numberOfEscapes)
        {
            const int NotFound = -1;

            if (string.IsNullOrEmpty(encodedUserInput))
                return string.Empty;

            StringWriter output = new StringWriter(CultureInfo.InvariantCulture);
            
            if (encodedUserInput.IndexOf('&') == NotFound)
            {
                output.Write(encodedUserInput);
            }
            else
            {
                int length = encodedUserInput.Length;
                for (int index1 = 0; index1 < length; ++index1)
                {
                    char ch1 = encodedUserInput[index1];
                    if (ch1 == 38)
                    {
                        int index2 = encodedUserInput.IndexOfAny(_htmlEntityEndingChars, index1 + 1);
                        if (index2 > 0 && encodedUserInput[index2] == 59)
                        {
                            string entity = encodedUserInput.Substring(index1 + 1, index2 - index1 - 1);
                            if (entity.Length > 1 && entity[0] == 35)
                            {
                                ushort result;
                                if (entity[1] == 120 || entity[1] == 88)
                                    ushort.TryParse(entity.Substring(2), NumberStyles.AllowHexSpecifier, NumberFormatInfo.InvariantInfo, out result);
                                else
                                    ushort.TryParse(entity.Substring(1), NumberStyles.AllowLeadingWhite | NumberStyles.AllowTrailingWhite | NumberStyles.AllowLeadingSign, NumberFormatInfo.InvariantInfo, out result);
                                if (result != 0)
                                {
                                    ch1 = (char)result;
                                    numberOfEscapes++;
                                    index1 = index2;
                                }
                            }
                            else
                            {
                                index1 = index2;
                                char ch2 = HtmlEntities.Lookup(entity);
                                if ((int)ch2 != 0)
                                {
                                    ch1 = ch2;
                                    numberOfEscapes++;
                                }
                                else
                                {
                                    output.Write('&');
                                    output.Write(entity);
                                    output.Write(';');
                                    continue;
                                }
                            }
                        }
                    }
                    output.Write(ch1);
                }
            }
            string decodedHtml = output.ToString();
            output.Dispose();
            return decodedHtml;
        }
        /// <summary>
        /// Escapes all character entity references (double escaping where necessary).
        /// Why? The XmlTextReader that is setup in XmlDocument.LoadXml on the service considers the character entity references (&#xxxx;) to be the character they represent.
        /// All XML is converted to unicode on reading and any such entities are removed in favor of the unicode character they represent.
        /// </summary>
        /// <param name="unencodedUserInput">The string that needs to be escaped.</param>
        /// <param name="numberOfEscapes">The number of escapes applied.</param>
        /// <returns>The escaped text.</returns>
        public static unsafe string HtmlEncode(this string unencodedUserInput, ref int numberOfEscapes)
        {
            if (string.IsNullOrEmpty(unencodedUserInput))
                return string.Empty;

            StringWriter output = new StringWriter(CultureInfo.InvariantCulture);
            
            if (output == null)
                throw new ArgumentNullException("output");
            int num1 = IndexOfHtmlEncodingChars(unencodedUserInput);
            if (num1 == -1)
            {
                output.Write(unencodedUserInput);
            }
            else
            {
                int num2 = unencodedUserInput.Length - num1;
                fixed (char* chPtr1 = unencodedUserInput)
                {
                    char* chPtr2 = chPtr1;
                    while (num1-- > 0)
                        output.Write(*chPtr2++);
                    while (num2-- > 0)
                    {
                        char ch = *chPtr2++;
                        if (ch <= 62)
                        {
                            switch (ch)
                            {
                                case '"':
                                    output.Write(""");
                                    numberOfEscapes++;
                                    continue;
                                case '&':
                                    output.Write("&amp;");
                                    numberOfEscapes++;
                                    continue;
                                case '\'':
                                    output.Write("&amp;#x27;");
                                    numberOfEscapes = numberOfEscapes + 2;
                                    continue;
                                case '<':
                                    output.Write("<");
                                    numberOfEscapes++;
                                    continue;
                                case '>':
                                    output.Write(">");
                                    numberOfEscapes++;
                                    continue;
                                case '/':
                                    output.Write("&amp;#x2F;");
                                    numberOfEscapes = numberOfEscapes + 2;
                                    continue;
                                default:
                                    output.Write(ch);
                                    continue;
                            }
                        }
                        if (ch >= 160 && ch < 256)
                        {
                            output.Write("&#");
                            output.Write(((int)ch).ToString(NumberFormatInfo.InvariantInfo));
                            output.Write(';');
                            numberOfEscapes++;
                        }
                        else
                            output.Write(ch);
                    }
                }
            }
            string encodedHtml = output.ToString();
            output.Dispose();
            return encodedHtml;
        }

 

        private static unsafe int IndexOfHtmlEncodingChars(string searchString)
        {
            int num = searchString.Length;
            fixed (char* chPtr1 = searchString)
            {
                char* chPtr2 = (char*)((UIntPtr)chPtr1);
                for (; num > 0; --num)
                {
                    char ch = *chPtr2;
                    if (ch <= 62)
                    {
                        switch (ch)
                        {
                            case '"':
                            case '&':
                            case '\'':
                            case '<':
                            case '>':
                            case '/':
                                return searchString.Length - num;
                        }
                    }
                    else if (ch >= 160 && ch < 256)
                        return searchString.Length - num;
                    ++chPtr2;
                }
            }
            return CharacterIndexNotFound;
        }

        private static char[] _htmlEntityEndingChars = new char[2]
        {
            ';',
            '&'
        };
        private static class HtmlEntities
        {
            private static string[] _entitiesList = new string[253]
            {
                "\"-quot",
                "&-amp",
                "'-apos",
                "<-lt",
                ">-gt",
                " -nbsp",
                "¡-iexcl",
                "¢-cent",
                "£-pound",
                "¤-curren",
                "¥-yen",
                "¦-brvbar",
                "§-sect",
                "¨-uml",
                "©-copy",
                "ª-ordf",
                "«-laquo",
                "¬-not",
                "\x00AD-shy",
                "®-reg",
                "¯-macr",
                "°-deg",
                "±-plusmn",
                "\x00B2-sup2",
                "\x00B3-sup3",
                "´-acute",
                "µ-micro",
                "¶-para",
                "·-middot",
                "¸-cedil",
                "\x00B9-sup1",
                "º-ordm",
                "»-raquo",
                "\x00BC-frac14",
                "\x00BD-frac12",
                "\x00BE-frac34",
                "¿-iquest",
                "À-Agrave",
                "Á-Aacute",
                "Â-Acirc",
                "Ã-Atilde",
                "Ä-Auml",
                "Å-Aring",
                "Æ-AElig",
                "Ç-Ccedil",
                "È-Egrave",
                "É-Eacute",
                "Ê-Ecirc",
                "Ë-Euml",
                "Ì-Igrave",
                "Í-Iacute",
                "Î-Icirc",
                "Ï-Iuml",
                "Ð-ETH",
                "Ñ-Ntilde",
                "Ò-Ograve",
                "Ó-Oacute",
                "Ô-Ocirc",
                "Õ-Otilde",
                "Ö-Ouml",
                "×-times",
                "Ø-Oslash",
                "Ù-Ugrave",
                "Ú-Uacute",
                "Û-Ucirc",
                "Ü-Uuml",
                "Ý-Yacute",
                "Þ-THORN",
                "ß-szlig",
                "à-agrave",
                "á-aacute",
                "â-acirc",
                "ã-atilde",
                "ä-auml",
                "å-aring",
                "æ-aelig",
                "ç-ccedil",
                "è-egrave",
                "é-eacute",
                "ê-ecirc",
                "ë-euml",
                "ì-igrave",
                "í-iacute",
                "î-icirc",
                "ï-iuml",
                "ð-eth",
                "ñ-ntilde",
                "ò-ograve",
                "ó-oacute",
                "ô-ocirc",
                "õ-otilde",
                "ö-ouml",
                "÷-divide",
                "ø-oslash",
                "ù-ugrave",
                "ú-uacute",
                "û-ucirc",
                "ü-uuml",
                "ý-yacute",
                "þ-thorn",
                "ÿ-yuml",
                "Œ-OElig",
                "œ-oelig",
                "Š-Scaron",
                "š-scaron",
                "Ÿ-Yuml",
                "ƒ-fnof",
                "\x02C6-circ",
                "˜-tilde",
                "Α-Alpha",
                "Β-Beta",
                "Γ-Gamma",
                "Δ-Delta",
                "Ε-Epsilon",
                "Ζ-Zeta",
                "Η-Eta",
                "Θ-Theta",
                "Ι-Iota",
                "Κ-Kappa",
                "Λ-Lambda",
                "Μ-Mu",
                "Ν-Nu",
                "Ξ-Xi",
                "Ο-Omicron",
                "Π-Pi",
                "Ρ-Rho",
                "Σ-Sigma",
                "Τ-Tau",
                "Υ-Upsilon",
                "Φ-Phi",
                "Χ-Chi",
                "Ψ-Psi",
                "Ω-Omega",
                "α-alpha",
                "β-beta",
                "γ-gamma",
                "δ-delta",
                "ε-epsilon",
                "ζ-zeta",
                "η-eta",
                "θ-theta",
                "ι-iota",
                "κ-kappa",
                "λ-lambda",
                "μ-mu",
                "ν-nu",
                "ξ-xi",
                "ο-omicron",
                "π-pi",
                "ρ-rho",
                "ς-sigmaf",
                "σ-sigma",
                "τ-tau",
                "υ-upsilon",
                "φ-phi",
                "χ-chi",
                "ψ-psi",
                "ω-omega",
                "ϑ-thetasym",
                "ϒ-upsih",
                "ϖ-piv",
                " -ensp",
                " -emsp",
                " -thinsp",
                "\x200C-zwnj",
                "\x200D-zwj",
                "\x200E-lrm",
                "\x200F-rlm",
                "–-ndash",
                "—-mdash",
                "‘-lsquo",
                "’-rsquo",
                "‚-sbquo",
                "“-ldquo",
                "”-rdquo",
                "„-bdquo",
                "†-dagger",
                "‡-Dagger",
                "•-bull",
                "…-hellip",
                "‰-permil",
                "′-prime",
                "″-Prime",
                "‹-lsaquo",
                "›-rsaquo",
                "‾-oline",
                "⁄-frasl",
                "€-euro",
                "ℑ-image",
                "℘-weierp",
                "ℜ-real",
                "™-trade",
                "ℵ-alefsym",
                "←-larr",
                "↑-uarr",
                "→-rarr",
                "↓-darr",
                "↔-harr",
                "↵-crarr",
                "⇐-lArr",
                "⇑-uArr",
                "⇒-rArr",
                "⇓-dArr",
                "⇔-hArr",
                "∀-forall",
                "∂-part",
                "∃-exist",
                "∅-empty",
                "∇-nabla",
                "∈-isin",
                "∉-notin",
                "∋-ni",
                "∏-prod",
                "∑-sum",
                "−-minus",
                "∗-lowast",
                "√-radic",
                "∝-prop",
                "∞-infin",
                "∠-ang",
                "∧-and",
                "∨-or",
                "∩-cap",
                "∪-cup",
                "∫-int",
                "∴-there4",
                "∼-sim",
                "≅-cong",
                "≈-asymp",
                "≠-ne",
                "≡-equiv",
                "≤-le",
                "≥-ge",
                "⊂-sub",
                "⊃-sup",
                "⊄-nsub",
                "⊆-sube",
                "⊇-supe",
                "⊕-oplus",
                "⊗-otimes",
                "⊥-perp",
                "⋅-sdot",
                "⌈-lceil",
                "⌉-rceil",
                "⌊-lfloor",
                "⌋-rfloor",
                "〈-lang",
                "〉-rang",
                "◊-loz",
                "♠-spades",
                "♣-clubs",
                "♥-hearts",
                "♦-diams"
            };
            private static Dictionary<string, char> _lookupTable = GenerateLookupTable();

            private static Dictionary<string, char> GenerateLookupTable()
            {
                Dictionary<string, char> dictionary = new Dictionary<string, char>(StringComparer.Ordinal);
                foreach (string str in _entitiesList)
                    dictionary.Add(str.Substring(2), str[0]);
                return dictionary;
            }

            public static char Lookup(string entity)
            {
                char ch;
                _lookupTable.TryGetValue(entity, out ch);
                return ch;
            }
        }
    }
}

You may also notice that I’ve mocked the OperationContext.
Thanks to WCFMock, a mocking framework for WCF services.
I won’t include this code, but you can get it here.
I’ve used the popular NUnit test framework and RhinoMocks for the stubbing and mocking.
Both pulled into the solution using NuGet.
Most useful documentation for RhinoMocks:
http://ayende.com/Wiki/Rhino+Mocks+3.5.ashx
http://ayende.com/wiki/Rhino+Mocks.ashx

For this project I used NLog and wrapped it.
Now you start to get an idea of how to use the sanitisation.

using System;
using System.ServiceModel;
using System.ServiceModel.Channels;
using NUnit.Framework;
using System.Configuration;
using Rhino.Mocks;
using Common.Wrapper.Log;
using MockedOperationContext = System.ServiceModel.Web.MockedOperationContext;
using Common.WcfHelpers.ErrorHandling.Exceptions;

namespace Sanitisation.UnitTest
{
    [TestFixture]
    public class SanitiseTest
    {
        private const string _myTestIpv4Address = "My.Test.Ipv4.Address";
        private readonly int _maxLengthHtmlEncodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlEncodedUserInput"]);
        private readonly int _maxLengthHtmlDecodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlDecodedUserInput"]);
        private readonly string _encodedUserInput_thatsMaxDecodedLength = @"One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.";
        private readonly string _decodedUserInput_thatsMaxLength = @"One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.";

        [Test]
        public void Sanitise_UserInput_WhenGivenNull_ShouldReturnEmptyString()
        {
            Assert.That(new Sanitise().UserInput(null), Is.EqualTo(string.Empty));
        }

        [Test]
        public void Sanitise_UserInput_WhenGivenEmptyString_ShouldReturnEmptyString()
        {
            Assert.That(new Sanitise().UserInput(string.Empty), Is.EqualTo(string.Empty));
        }

        [Test]
        public void Sanitise_UserInput_WhenGivenSanitisedString_ShouldReturnSanitisedString()
        {
            // Open the whitelist up in order to test the encoding without restriction.
            Assert.That(new Sanitise(whiteList: @"^[\w\s\.,#/&'<"">]+$").UserInput(_encodedUserInput_thatsMaxDecodedLength), Is.EqualTo(_encodedUserInput_thatsMaxDecodedLength));
        }
        [Test]
        [ExpectedException(typeof(SanitisationWcfException))]
        public void Sanitise_UserInput_ShouldThrowExceptionIfEscapedInputToLong()
        {
            string fourThousandAndOneCharacters = "Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand character";
            string expectedError = "The un-modified string received from the client with the following IP address: " +
                   '"' + _myTestIpv4Address + "\" " +
                   "exceeded the allowed maximum length of an escaped Html user input string. " +
                   "The maximum length allowed is: " +
                   _maxLengthHtmlEncodedUserInput +
                   ". The length was: " +
                   (_maxLengthHtmlEncodedUserInput+1) + ".";

            using(new MockedOperationContext(StubbedOperationContext))
            {
                try
                {
                    new Sanitise().UserInput(fourThousandAndOneCharacters);
                }
                catch(SanitisationWcfException e)
                {
                    Assert.That(e.Message, Is.EqualTo(expectedError));
                    Assert.That(e.UnsanitisedAnswer, Is.EqualTo(fourThousandAndOneCharacters));
                    throw;
                }
            }
        }
        [Test]
        [ExpectedException(typeof(SanitisationWcfException))]
        public void Sanitise_UserInput_DecodedUserInputShouldThrowException_WhenMaxLengthHtmlDecodedUserInputIsExceeded()
        {
            char oneCharOverTheLimit = '.';
            string expectedError =
                           "The string received from the client with the following IP address: " +
                           "\"" + _myTestIpv4Address + "\" " +
                           "after Html decoding exceded the allowed maximum length of an un-escaped Html user input string." +
                           Environment.NewLine +
                           "The maximum length allowed is: " + _maxLengthHtmlDecodedUserInput + ". The length was: " +
                           (_decodedUserInput_thatsMaxLength + oneCharOverTheLimit).Length + oneCharOverTheLimit;

            using(new MockedOperationContext(StubbedOperationContext))
            {
                try
                {
                    new Sanitise().UserInput(_encodedUserInput_thatsMaxDecodedLength + oneCharOverTheLimit);
                }
                catch(SanitisationWcfException e)
                {
                    Assert.That(e.Message, Is.EqualTo(expectedError));
                    Assert.That(e.UnsanitisedAnswer, Is.EqualTo(_encodedUserInput_thatsMaxDecodedLength + oneCharOverTheLimit));
                    throw;
                }
            }
        }
        [Test]
        public void Sanitise_UserInput_ShouldLogAndSendEmail_IfNumberOfDecodedHtmlEntitiesDoesNotMatchNumberOfEscapes()
        {
            string encodedUserInput_with6HtmlEntitiesNotEscaped = _encodedUserInput_thatsMaxDecodedLength.Replace("&amp;#x2F;", "/");
            string errorWeAreExpecting =
                "It appears as if someone has circumvented the client side Html entity encoding." + Environment.NewLine +
                "The requesting IP address was: " +
                "\"" + _myTestIpv4Address + "\" " +
                "The sanitised input we receive from the client was the following:" + Environment.NewLine +
                "\"" + encodedUserInput_with6HtmlEntitiesNotEscaped + "\"" + Environment.NewLine +
                "The same input after decoding and re-escaping on the server side was the following:" + Environment.NewLine +
                "\"" + _encodedUserInput_thatsMaxDecodedLength + "\"";
            string sanitised;
            // setup _logger
            ILogger logger = MockRepository.GenerateMock<ILogger>();
            logger.Expect(lgr => lgr.logError(errorWeAreExpecting));

            Sanitise sanitise = new Sanitise(@"^[\w\s\.,#/&'<"">]+$", logger);

            using (new MockedOperationContext(StubbedOperationContext))
            {
                // Open the whitelist up in order to test the encoding etc.
                sanitised = sanitise.UserInput(encodedUserInput_with6HtmlEntitiesNotEscaped);
            }

            Assert.That(sanitised, Is.EqualTo(_encodedUserInput_thatsMaxDecodedLength));
            logger.VerifyAllExpectations();
        }        

        private static IOperationContext StubbedOperationContext
        {
            get
            {
                IOperationContext operationContext = MockRepository.GenerateStub<IOperationContext>();
                int port = 80;
                RemoteEndpointMessageProperty remoteEndpointMessageProperty = new RemoteEndpointMessageProperty(_myTestIpv4Address, port);
                operationContext.Stub(oc => oc.IncomingMessageProperties[RemoteEndpointMessageProperty.Name]).Return(remoteEndpointMessageProperty);
                return operationContext;
            }
        }
    }
}

Now the API code that we can use to do our sanitisation.

using System;
using System.Configuration;
// Todo : KC We need time to implement DI. Should be using something like ninject.extensions.wcf.
using OperationContext = System.ServiceModel.Web.MockedOperationContext;
using System.ServiceModel.Channels;
using Common.Security.Sanitisation;
using Common.WcfHelpers.ErrorHandling.Exceptions;
using Common.Wrapper.Log;

namespace Sanitisation
{

    public class Sanitise
    {
        private readonly string _whiteList;
        private readonly ILogger _logger;
        

        private string RequestingIpAddress
        {
            get
            {
                RemoteEndpointMessageProperty remoteEndpointMessageProperty = OperationContext.Current.IncomingMessageProperties[RemoteEndpointMessageProperty.Name] as RemoteEndpointMessageProperty;
                return ((remoteEndpointMessageProperty != null) ? remoteEndpointMessageProperty.Address : string.Empty);
            }
        }
        /// <summary>
        /// Provides server side escaping of Html entities, and runs the supplied whitelist character filter over the user input string.
        /// </summary>
        /// <param name="whiteList">Should be provided by DI from the ResourceFile.</param>
        /// <param name="logger">Should be provided by DI. Needs to be an asynchronous logger.</param>
        /// <example>
        /// The whitelist can be obtained from a ResourceFile like so...
        /// <code>
        /// private Resource _resource;
        /// _resource.GetString("WhiteList");
        /// </code>
        /// </example>
        public Sanitise(string whiteList = "", ILogger logger = null)
        {
            _whiteList = whiteList;
            _logger = logger ?? new Logger();
        }
        /// <summary>
        /// 1) Check field lengths.         Client side validation may have been negated.
        /// 2) Check against white list.	Client side validation may have been negated.
        /// 3) Check Html escaping.         Client side validation may have been negated.

        /// Generic Fail actions:	Drop the payload. No point in trying to massage and save, as it won't be what the user was expecting,
        ///                         Add full error to a WCFException Message and throw.
        ///                         WCF interception reads the WCFException.MessageForClient, and sends it to the user. 
        ///                         On return, log the WCFException's Message.
        ///                         
        /// Escape Fail actions:	Asynchronously Log and email full error to support.


        /// 1) BA confirmed 50 for text, and 400 for textarea.
        ///     As we don't know the field type, we'll have to go for 400."
        ///
        ///     First we need to check that we haven't been sent some huge string.
        ///     So we check that the string isn't longer than 400 * 10 = 4000.
        ///     10 is the length of our double escaped character references.
        ///     Or, we ask the business for a number."
        ///     If we fail here, perform Generic Fail actions and don't complete the following steps.
        /// 
        ///     Convert all Html Entity Encodings back to their equivalent characters, and count how many occurrences.
        ///
        ///     If the string is longer than 400, perform Generic Fail actions and don't complete the following steps.
        /// 
        /// 2) check all characters against the white list
        ///     If any don't match, perform Generic Fail actions and don't complete the following steps.
        /// 
        /// 3) re html escape (as we did in JavaScript), and count how many escapes.
        ///     If count is greater than the count of Html Entity Encodings back to their equivalent characters,
        ///     Perform Escape Fail actions. Return sanitised string.
        /// 
        ///     If we haven't returned, return sanitised string.
        
        
        /// Performs checking on the text passed in, to verify that client side escaping and whitelist validation has already been performed.
        /// Performs decoding, and re-encodes. Counts that the number of escapes was the same, otherwise we log and send email with the details to support.
        /// Throws exception if the client side validation failed to restrict the number of characters in the escaped string we received.
        ///     This needs to be intercepted at the service.
        ///     The exceptions default message for client needs to be passed back to the user.
        ///     On return, the interception needs to log the exception's message.
        /// </summary>
        /// <param name="sanitiseMe"></param>
        /// <returns></returns>
        public string UserInput(string sanitiseMe)
        {
            if (string.IsNullOrEmpty(sanitiseMe))
                return string.Empty;

            ThrowExceptionIfEscapedInputToLong(sanitiseMe);

            int numberOfDecodedHtmlEntities = 0;
            string decodedUserInput = HtmlDecodeUserInput(sanitiseMe, ref numberOfDecodedHtmlEntities);

            if(!decodedUserInput.CompliesWithWhitelist(whiteList: _whiteList))
            {
                string error = "The answer received from client with the following IP address: " +
                    "\"" + RequestingIpAddress + "\" " +
                    "had characters that failed to match the whitelist.";
                throw new SanitisationWcfException(error);
            }

            int numberOfEscapes = 0;
            string sanitisedUserInput = decodedUserInput.HtmlEncode(ref numberOfEscapes);

            if(numberOfEscapes != numberOfDecodedHtmlEntities)
            {
                AsyncLogAndEmail(sanitiseMe, sanitisedUserInput);
            }

            return sanitisedUserInput;
        }
        /// <note>
        /// Make sure the logger is setup to log asynchronously
        /// </note>
        private void AsyncLogAndEmail(string sanitiseMe, string sanitisedUserInput)
        {
            // no need for SanitisationException

            _logger.logError(
                "It appears as if someone has circumvented the client side Html entity encoding." + Environment.NewLine +
                "The requesting IP address was: " +
                "\"" + RequestingIpAddress + "\" " +
                "The sanitised input we receive from the client was the following:" + Environment.NewLine +
                "\"" + sanitiseMe + "\"" + Environment.NewLine +
                "The same input after decoding and re-escaping on the server side was the following:" + Environment.NewLine +
                "\"" + sanitisedUserInput + "\""
                );
        }

        /// <summary>
        /// This procedure may throw a SanitisationWcfException.
        /// If it does, ErrorHandlerBehaviorAttribute will need to pass the "messageForClient" back to the client from within the IErrorHandler.ProvideFault procedure.
        /// Once execution is returned, the IErrorHandler.HandleError procedure of ErrorHandlerBehaviorAttribute
        /// will continue to process the exception that was thrown in the way of logging sensitive info.
        /// </summary>
        /// <param name="toSanitise"></param>
        private void ThrowExceptionIfEscapedInputToLong(string toSanitise)
        {
            int maxLengthHtmlEncodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlEncodedUserInput"]);
            if (toSanitise.Length > maxLengthHtmlEncodedUserInput)
            {
                string error = "The un-modified string received from the client with the following IP address: " +
                    "\"" + RequestingIpAddress + "\" " +
                    "exceeded the allowed maximum length of an escaped Html user input string. " +
                    "The maximum length allowed is: " +
                    maxLengthHtmlEncodedUserInput +
                    ". The length was: " +
                    toSanitise.Length + ".";
                throw new SanitisationWcfException(error, unsanitisedAnswer: toSanitise);
            }
        }

        private string HtmlDecodeUserInput(string doubleEncodedUserInput, ref int numberOfDecodedHtmlEntities)
        {
            string decodedUserInput = doubleEncodedUserInput.HtmlDecode(ref numberOfDecodedHtmlEntities).HtmlDecode(ref numberOfDecodedHtmlEntities) ?? string.Empty;
            
            // if the decoded string is longer than MaxLengthHtmlDecodedUserInput throw
            int maxLengthHtmlDecodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlDecodedUserInput"]);
            if(decodedUserInput.Length > maxLengthHtmlDecodedUserInput)
            {
                throw new SanitisationWcfException(
                    "The string received from the client with the following IP address: " +
                    "\"" + RequestingIpAddress + "\" " +
                    "after Html decoding exceded the allowed maximum length of an un-escaped Html user input string." +
                    Environment.NewLine +
                    "The maximum length allowed is: " + maxLengthHtmlDecodedUserInput + ". The length was: " +
                    decodedUserInput.Length + ".",
                    unsanitisedAnswer: doubleEncodedUserInput
                    );
            }
            return decodedUserInput;
        }
    }
}

As you can see, there’s a lot more work in the server side sanitisation than the client side.

Sanitising User Input from Browser. part 1

November 4, 2012

I was working on a web based project recently where there was no security thought about when designing, developing it.
The following outlines my experience with retrofitting security.
It’s my hope that someone will find it useful for their own implementation.

We’ll be focussing on the client side in this post (part 1) and the server side in part 2.
We’ll also cover some preliminary discussion that will set the stage for this series.

The application consists of a WCF service delivering up content to some embedding code on any page in the browser.
The content is stored as Xml in the database and transformed into Html via Xslt.

The first activity I find useful is to go through the process of Threat Modelling the Application.
This process can be quite daunting for those new to it.
Here’s a couple of references I find quite useful to get started:

https://www.owasp.org/index.php/Application_Threat_Modeling

https://www.owasp.org/index.php/Threat_Risk_Modeling#Decompose_Application

Actually this ones not bad either.

There is no single right way to do this.
The more you read and experiment, the more equipped you will be.
The idea is to think like an attacker thinks.
This may be harder for some than others, but it is essential, to cover as many potential attack vectors as possible.
Remember, there is no secure system, just varying levels of insecurity.
It will always be much harder to discover the majority of security weaknesses in your application as the person or team creating/maintaining it,
than for the person attacking it.
The Threat Modelling topic is large and I’m not going to go into it here, other than to say, you need to go into it.

Threat Agents

Work out who your Threat Agents are likely to be.
Learn how to think like they do.
Learn what skills they have and learn the skills your self.
Sometimes the skills are very non technical.
For example walking through the door of your organisation in the weekend because the cleaners (or any one with access) forgot to lock up.
Or when the cleaners are there and the technical staff are not (which is just as easy).
It happens more often than we like to believe.

Defense in Depth

To attempt to mitigate attacks, we need to take a multi layered approach (often called defence in depth).

What made me decide to start with sanitising user input from the browser anyway?
Well according to the OWASP Top 10, Injection and Cross Site Scripting (XSS) are still the most popular techniques chosen to compromise web applications.
So it makes sense if your dealing with web apps, to target the most common techniques exploited.

Now, in regards to defence in depth when discussing web applications;
If the attacker gets past the first line of defence, there should be something stopping them at the next layer and so forth.
The aim is to stop the attack as soon as possible.
This is why we focus on the UI first, and later move our focus to the application server, then to the database.
Bear in mind though, that what ever we do on the client side, can be circumvented relatively easy.
Client side code is out of our control, so it’s best effort.
Because of this, we need to perform the following not only in the browser, but as much as possible on the server side as well.

  1. Minimising the attack surface
  2. Defining maximum field lengths (validation)
  3. Determining a white list of allowable characters (validation)
  4. Escaping untrusted data, especially where you know it’s going to endup in an execution context. Even where you don’t think this is likely, it’s still possible.
  5. Using Stored Procedures / parameterised queries (not covered in this series).
  6. Least Privilege.
    Minimising the privileges assigned to every database account (not covered in this series).

Minimising the attack surface

input fields should only allow certain characters to be input.
Text input fields, textareas etc that are free form (anything is allowed) are very hard to constrain to a small white list.
input fields where ever possible should be constrained to well structured data,
like dates, social security numbers, zip codes, e-mail addresses, etc. then the developer should be able to define a very strong validation pattern, usually based on regular expressions, for validating such input. If the input field comes from a fixed set of options, like a drop down list or radio buttons, then the input needs to match exactly one of the values offered to the user in the first place.
As it was with the existing app I was working on, we had to allow just about everything in our free form text fields.
This will have to be re-designed in order to provide constrained input.

Defining maximum field lengths (validation)

This was currently being done (sometimes) in the Xml content for inputs where type="text".
Don’t worry about the inputType="single", it gets transformed.

<input id="2" inputType="single" type="text" size="10" maxlength="10" />

And if no maxlength specified in the Xml, we now specify a default of 50 in the xsl used to do the transformation.
This way we had the input where type="text" covered for the client side.
This would also have to be validated on the server side when the service received values from these inputs where type="text".

    <xsl:template match="input[@inputType='single']">
      <xsl:value-of select="@text" />
        <input name="a{@id}" type="text" id="a{@id}" class="textareaSingle">
          <xsl:attribute name="value">
            <xsl:choose>
              <xsl:when test="key('response', @id)">
                <xsl:value-of select="key('response', @id)" />
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="string(' ')" />
              </xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <xsl:attribute name="maxlength">
            <xsl:choose>
              <xsl:when test="@maxlength">
                <xsl:value-of select="@maxlength"/>
              </xsl:when>
              <xsl:otherwise>50</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
        </input>
        <br/>
    </xsl:template>

For textareas we added maxlength validation as part of the white list validation.
See below for details.

Determining a white list of allowable characters (validation)

See bottom of this section for Update

Now this was quite an interesting exercise.
I needed to apply a white list to all characters being entered into the input fields.
A user can:

  1. type the characters in
  2. [ctrl]+[v] a clipboard full of characters in
  3. right click -> Paste

To cover all these scenarios as elegantly as possible, was going to be a bit of a challenge.
I looked at a few JavaScript libraries including one or two JQuery plug-ins.
None of them covered all these scenarios effectively.
I wish they did, because the solution I wasn’t totally happy with, because it required polling.
In saying that, I measured performance, and even bringing the interval right down had negligible effect, and it covered all scenarios.

setupUserInputValidation = function () {

  var textAreaMaxLength = 400;
  var elementsToValidate;
  var whiteList = /[^A-Za-z_0-9\s.,]/g;

  var elementValue = {
    textarea: '',
    textareaChanged: function (obj) {
      var initialValue = obj.value;
      var replacedValue = initialValue.replace(whiteList, "").slice(0, textAreaMaxLength);
      if (replacedValue !== initialValue) {
        this.textarea = replacedValue;
        return true;
      }
      return false;
    },
    inputtext: '',
    inputtextChanged: function (obj) {
      var initialValue = obj.value;
      var replacedValue = initialValue.replace(whiteList, "");
      if (replacedValue !== initialValue) {
        this.inputtext = replacedValue;
        return true;
      }
      return false;
    }
  };

  elementsToValidate = {
    textareainputelements: (function () {
      var elements = $('#page' + currentPage).find('textarea');
      if (elements.length > 0) {
        return elements;
      }
      return 'no elements found';
    } ()),
    textInputElements: (function () {
      var elements = $('#page' + currentPage).find('input[type=text]');
      if (elements.length > 0) {
        return elements;
      }
      return 'no elements found';
    } ())
  };

  // store the intervals id in outer scope so we can clear the interval when we change pages.
  userInputValidationIntervalId = setInterval(function () {
    var element;

    // Iterate through each one and remove any characters not in the whitelist.
    // Iterate through each one and trim any that are longer than textAreaMaxLength.

    for (element in elementsToValidate) {
      if (elementsToValidate.hasOwnProperty(element)) {
        if (elementsToValidate[element] === 'no elements found')
          continue;

        $.each(elementsToValidate[element], function () {
          $(this).attr('value', function () {
            var name = $(this).prop('tagName').toLowerCase();
            name = name === 'input' ? name + $(this).prop('type') : name;
            if (elementValue[name + 'Changed'](this))
              this.value = elementValue[name];
          });
        });
      }
    }
  }, 300); // milliseconds
};

Each time we change page, we clear the interval and reset it for the new page.

clearInterval(userInputValidationIntervalId);

setupUserInputValidation();

Update 2013-06-02:

Now with HTML5 we have the pattern attribute on the input tag, which allows us to specify a regular expression that the text about to be received is checked against. We can also see it here amongst the new HTML5 attributes . If used, this can make our JavaScript white listing redundant, providing we don’t have textareas which W3C has neglected to include the new pattern attribute on. I’d love to know why?

Escaping untrusted data

Escaped data will still render in the browser properly.
Escaping simply lets the interpreter know that the data is not intended to be executed,
and thus prevents the attack.

Now what we do here is extend the String prototype with a function called htmlEscape.

if (typeof Function.prototype.method !== "function") {
  Function.prototype.method = function (name, func) {
    this.prototype[name] = func;
    return this;
  };
}

String.method('htmlEscape', function () {

  // Escape the following characters with HTML entity encoding to prevent switching into any execution context,
  // such as script, style, or event handlers.
  // Using hex entities is recommended in the spec.
  // In addition to the 5 characters significant in XML (&, <, >, ", '), the forward slash is included as it helps to end an HTML entity.
  var character = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    // Double escape character entity references.
    // Why?
    // The XmlTextReader that is setup in XmlDocument.LoadXml on the service considers the character entity references () to be the character they represent.
    // All XML is converted to unicode on reading and any such entities are removed in favor of the unicode character they represent.
    // So we double escape character entity references.
    // These now get read to the XmlDocument and saved to the database as double encoded Html entities.
    // Now when these values are pulled from the database and sent to the browser, it decodes the & and displays #x27; and/or #x2F.
    // This isn't what we want to see in the browser.
    "'": '&amp;#x27;',    // &apos; is not recommended
    '/': '&amp;#x2F;'     // forward slash is included as it helps end an HTML entity
  };

  return function () {
    return this.replace(/[&<>"'/]/g, function (c) {
      return character[c];
    });
  };
}());

This allows us to, well, html escape our strings.

element.value.htmlEscape();

In looking through here,
The only untrusted data we are capturing is going to be inserted into an Html element

tag by way of insertion into a textarea element,
or the attribute value of input elements where type="text".
I initially thought I’d have to:

  1. Html escape the untrusted data which is only being captured from textarea elements.
  2. Attribute escape the untrusted data which is being captured from the value attribute of input elements where type="text".

RULE #2 – Attribute Escape Before Inserting Untrusted Data into HTML Common Attributes of here,
mentions
“Properly quoted attributes can only be escaped with the corresponding quote.”
So I decided to test it.
Created a collection of injection attacks. None of which worked.
Turned out we only needed to Html escape for the untrusted data that was going to be inserted into the textarea element.
More on this in a bit.

Now in regards to the code comments in the above code around having to double escape character entity references;
Because we’re sending the strings to the browser, it’s easiest to single decode the double encoded Html on the service side only.
Now because we’re still focused on the client side sanitisation,
and we are going to shift our focus soon to making sure we cover the server side,
we know we’re going to have to create some sanitisation routines for our .NET service.
Because the routines are quite likely going to be static, and we’re pretty much just dealing with strings,
lets create an extensions class in a new project in a common library we’ve already got.
This will allow us to get the widest use out of our sanitisation routines.
It also allows us to wrap any existing libraries or parts of them that we want to get use of.

namespace My.Common.Security.Encoding
{
    /// <summary>
    /// Provides a series of extension methods that perform sanitisation.
    /// Escaping, unescaping, etc.
    /// Usually targeted at user input, to help defend against the likes of XSS attacks.
    /// </summary>
    public static class Extensions
    {
        /// <summary>
        /// Returns a new string in which all occurrences of a double escaped html character (that's an html entity immediatly prefixed with another html entity)
        /// in the current instance are replaced with the single escaped character.
        /// </summary>
        ///
        /// The new string.
        public static string SingleDecodeDoubleEncodedHtml(this string source)
        {
            return source.Replace("&amp;#x", "&#x");
        }
    }
}

Now when we run our xslt transformation on the service, we chain our new extension method on the end.
Which gives us back a single encoded string that the browser is happy to display as the decoded value.

return Transform().SingleDecodeDoubleEncodedHtml();

Now back to my findings from the test above.
Turns out that “Properly quoted attributes can only be escaped with the corresponding quote.” really is true.
I thought that if I entered something like the following into the attribute value of an input element where type="text",
then the first double quote would be interpreted as the corresponding quote,
and the end double quote would be interpreted as the end quote of the onmouseover attribute value.

 " onmouseover="alert(2)

What actually happens, is during the transform…

xslCompiledTransform.Transform(xmlNodeReader, args, writer, new XmlUrlResolver());

All the relevant double quotes are converted to the double quote Html entity ‘”‘ without the single quotes.

onmouseover

And all double quotes are being stored in the database as the character value.

Libraries and useful code

Microsoft Anti-Cross Site Scripting Library

OWASP Encoding Project
This is the Reform library. Supports Perl, Python, PHP, JavaScript, ASP, Java, .NET

Online escape tool supporting Html escape/unescape, Java, .NET, JavaScript

The characters that need escaping for inserting untrusted data into Html element content

JavaScript The Good Parts: pg 90 has a nice ‘entityify’ function

OWASP Enterprise Security API Used for JavaScript escaping (ESAPI4JS)

JQuery plugin

Changing encoding on html page

Cheat Sheets and Check Lists I found helpful

https://www.owasp.org/index.php/Input_Validation_Cheat_Sheet

https://www.owasp.org/index.php/OWASP_Validation_Regex_Repository

https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet

https://www.owasp.org/index.php/DOM_based_XSS_Prevention_Cheat_Sheet

https://www.owasp.org/index.php/OWASP_AJAX_Security_Guidelines

If any of this is unclear, let me know and I’ll do my best to clarify. Maybe you have suggestions of how this could have been improved? Let’s spark a discussion.

C#.NET Coding Standards and Guidelines

August 12, 2012

This is the current set of coding standards and guidelines I use when I’m coding in the C#.NET language.
I thought it would be good to share so others could get use out of them also, and maybe start a discussion as to amendments / changes they see that could be useful?

Naming Conventions

  • Do not use Hungarian notation, I.E. a boolian variable may have the name MyBool, but shouldn’t be called bMyBool.
  • Do prefix member variables with the underscore ‘_’. Do not prefix member variables with “this”. also use camelCasing for member variables. The underscore is easy to see, is one key stroke.
  • Do prefix interfaces names with “I”
  • Do not prefix enums, classes, or delegates with any letter.

Key:

c” = camelCase
P” = PascalCase
“_” = Prefix with _Underscore
“x” = Not Applicable.

Identifier Public Protected Internal Private Notes
Project File P x x x Match Assembly and Root Namespace
Project Folder P x x x Match Project File
Source File P x x x Match contained class
Test Source File P x x x Append the word Test if it contains tests
Image File c x x x
Other Files P x x x Apply where possible
Namespace P  x x x Partial Project/Assembly match.(Also see the namespace section)
Solution File P x x x CompanyNameSolutionDescription
Solution Folder P x x x CompanyNameSolutionDescription (if multiple solutions in repository). Source (if single solution)
SpecFlow Feature File P x x x USn_BriefUserStoryName where n is the user story number
oject Folder P  x x x Same as Project file
Class or Struct P P P P Add suffix of subclass.
Interface P P P P Prefix with a capital I.
Generic Class P P P P Use T (type) or K (key) as Type identifier.
Method P P P P Use a Verb or Verb-Object pair.
Test Method P x x x MemberUnderTest_StateUnderTest_ExpectedBehavior . StateUnderBehavior can be leftout if not applicable.
Property P P P P Do not prefix with Get or Set.
Field P P P _c Only use Private fields.
Constant P P P _c
Static Field P P P _c Only use Private fields.
Enum P P P P Options are also PascalCase.
Delegate P P P P See under Events, Delegates for naming Dot NET
Event P P P P See under Events, Delegates for naming Dot NET
Inline Variable x x x c Avoid single-character and enumerated names.
Parameter x x x c

Coding Style

Commenting

Comment Style

Block comments should usually be avoided

/* Line 1
* Line 2
* Line 3
*/
/* … */

Begin comment text with an upper case character. End comment text with a period.

If you have to comment your code, consider refactoring, so that it is easier to read.
Prefer not to use inline-comments to explain obvious code. Well written code is self-documenting.
Rather fix or clean up code now, than put a // Todo in.

You can access // Todo‘s in Visual Studio via

View menu -> Task List
The Tokens can be setup in Tools -> Options… -> Environment->Task List

or for ReSharper

ReSharper menu -> Tools -> To-do Items (or use the key shortcuts)

Use the following tokens:

  • Todo
  • Note
  • Bug
  • Not Implemented

XML Documentation

  • Always apply C# comment-blocks (///) to public, protected, and internal declarations.
  • Only use C# comment-blocks for documenting the API I.E the interface.
  • include <summary> comments. Include <param>, <return>, and <exception> comment
    sections where applicable.
  • Include <see cref=””/> and <seeAlso cref=””/> where possible.
  • Always add CDATA tags to comments containing code and other embedded markup in order to avoid
    encoding issues.
    Example:

    ///
    /// Add the following key to the appSettings” section of your config:
    /// <code><![CDATA[
    ///   <configuration>
    ///     <appSettings>
    ///       <add key=”mySetting” value=”myValue”/>
    ///     </appSettings>
    ///   </configuration>
    /// ]]></code>
    ///
    

File Organisation

Group internal class implementation by type in the following order:

  1. Member variables.
  2. Constructors & Finalizers.
  3. Nested Enums, Structs, and Classes.
  4. Properties
  5. Methods

Sequence declarations within type groups based upon access modifier and visibility:

  1. Public
  2. Protected
  3. Internal
  4. Private
  • Do not use #region statements
  • Always match class name and file name where ever possible. Avoid including more than one class per file.

Formatting

Bracing

  • Place first brace of the block at the end of the line preceded with a space.
    In languages like C, C++, C#, Java, it doesn’t matter where you put the first curly brace, it’s just personal preference or based on vote.
    In languages like JavaScript, it does matter. I use quite a bit of JavaScript, so just find it easier to use the same convention. Although at work, we use the “opening brace on a new line convention, simply because it won the vote”.
  • Always use curly braces ({ and }) in conditional statements. Unless there is a very simple statement, like return bla.
  • Recursively indent all code blocks contained within braces.

Spacing

Use white space (CR/LF, Tabs, etc) liberally to separate and organize code.

Only declare related attribute declarations on a single line, otherwise stack each attribute as a separate declaration.

Example:

// Bad!
[Attrbute1, Attrbute2, Attrbute3]
public class MyClass {
   …
}

// Good!
[Attrbute1, RelatedAttribute2]
[Attrbute3]
[Attrbute4]
public class MyClass {
   …
}

Tabs and Indenting

Tab characters (x09) should not be used in code. All indentation should be done with 3 space characters.

Language Usage

Access Modifiers

Do not omit access modifiers.
Explicitly declare all identifiers with the appropriate access modifier instead of allowing the default.
Example:

// Bad!
Void WriteEvent(string message) {
   …
}

// Good!
private Void WriteEvent(string message) {
   …
}
Prefer explicit to implicit Both the above definitions are private.
Prefer explicit to implicit.

Calling Routines

When calling a routine that takes a bool or a number.
Don’t pass litterals, as it’s unclair what they represent.
Instead create a variable with a meaningful name.

// calling MethodTakingExampleArgs
MethodTakingExampleArgs(true, 12);

// instead do the following

bool temperatureHasChanged = true;
int temperatureInCelcius = 12;

// calling MethodTakingExampleArgs
MethodTakingExampleArgs(temperatureHasChanged, temperatureInCelcius);

The intent becomes clearer, thus making for code that’s easier to read, thus we work faster.

If a routine call has its parameters spread over more than a single line due to being to long, place each parameter on its own line.
Also consider how many arguments are being passed, if it’s over 5, consider other ways to pass the information needed.

Class

Avoid putting multiple classes in a single file.

Events, Delegates

The delegate type should be prefixed with “Handler”.
The name of the procedure that does the work should be a verb.

public class MyDelegateExample {
   delegate void ChangeHandler();
   event ChangeHandler _change;

   private void OnChange() {
      if (_change != null)
         _change();
   }
}
Prefer explicit to implicit Rather than checking for null, you can add an empty delegate to your _change event
so that you don’t have to check the event for null before you raise it.

The traditional null check followed by the next action is not atomic, so not thread safe. Discussed in more depth here.

public class MyDelegateExample {

   delegate void ChangeHandler();
   event ChangeHandler _change = delegate{};

   public void Attach(ChangeHandler update) {
      Change += update;
   }

   public void Detach(ChangeHandler update) {
      Change -= update;
   }

   private void OnChange() {
      _change();
   }
}

Exceptions

  • Do not use try/catch blocks for flow-control. Only use for exceptional cases.
  • Only catch exceptions that you can handle.
  • Never declare an empty catch block.
  • Avoid nesting a try/catch within a catch block.
  • Always catch the most derived exception via exception filters.
  • Order exception filters from most to least derived exception type.
  • Avoid re-throwing an exception. Allow it to bubble-up instead.
  • If re-throwing an exception, preserve the original call stack by omitting the exception argument from the throw statement.Example:
    // Bad!
    catch(Exception e) {
       Log(e);
       throw e;
    }
    
    // Good!
    catch(Exception e) {
       Log(e);
       throw;
    }
    
  • Only use the finally block to release resources from a try statement.
  • Always use validation to avoid exceptions.
    Example:

    // Bad!
    try {
       conn.Close();
    }
    Catch(Exception ex) {
       // handle exception if already closed!
    }
    
    // Good!
    if(conn.State != ConnectionState.Closed) {
       conn.Close();
    }
    
  • Always set the innerException property on thrown exceptions so the exception chain & call stack are maintained.
  • Avoid defining custom exception classes if there is an existing Exception derived class available in the .NET library.
  • Always suffix exception class names with the word “Exception”.
  • Always add the SerializableAttribute to exception classes.
  • Always implement the standard “Exception Constructor Pattern”:
    public MyCustomException ();
    public MyCustomException (string message);
    public MyCustomException (string message, Exception innerException);
    
    Prefer explicit to implicit Or better… if using .NET 4.0 or greater, use optional parameters.
  • Always implement the deserialization constructor:
    protected MyCustomException(SerializationInfo info, treamingContext contxt);
    

Flow Control

Case Statements

  • Only use switch/case statements for simple operations with parallel conditional logic.
  • Prefer nested if/else over switch/case for short conditional sequences and complex conditions.
  • Prefer polymorphism over switch/case to encapsulate and delegate complex operations.
    Don’t fall into the trap of writing procedural code in an OO language.

Conditions

Avoid evaluating Boolean conditions against true or false.
Example:

// Bad!
if(isValid == true) {
   …
}

// Good!
if(isValid) {
   …
}

Use braces {} as shown above in all situations but for the most simple.
If you have more than a single line statement in a conditional, surround it with braces.

Implicit typing using the var keyword

Some background on var:

The compiler simply takes the compile time
type of the initialization expression and makes the variable have that type too.
An example:

var stringVariable = "Hello, world."
stringVariable = 0;
The above code is invalid.

The var keyword should only be used with LINQ and Anonymous types.
Unless there’s a significant gain in code simplicity, use explicit typing.

It is recommended to use var only when it is necessary, that is, when the variable will be used to store an anonymous type or a collection of anonymous types.
See Microsofts reference on var.

Sometimes you actually want the code to break when a type is changed.

Consider a control system. Many elements have On() and Off() methods. there are many cases where there is no relationship between the types (i.e. no common base classes or interfaces), there is only the similarity that both have methods with those signatures.

Writting code:

var thing = SomeFactory.GetThing();  // Returns something that is safe to turn off...
thing.Off();

Then later a change is made to the Factory and that method now returns something completely different, which happens to have severe consequences if it is arbitrarily turned off having such a design is debatable for many reasons.

By using var, the previous code will compile without complaint. Even though the return type may have changed from ReadingLamp to LifeSupportSystem.

I believe that there are more times when there is the possibility of an “unintended side-effect” caused by a change in the type than there are times where the change in type has no bearing on the code that consumes it. As a result, I very rarely use var. Even when the return type is obvious (such as the LHS of a new), I find it easier to be consistent.

Namespace

  • CompanyName.SolutionDescription.AssemblyDescription
  • Never declare more than 1 namespace per file.
  • Append folder-name to namespace for source files within sub-folders.
  • Also see the Naming conventions table.
  • Place namespace “using” statements together at the top of file. Group .NET namespaces above custom namespaces.
  • Followed by grouping of external namespaces.
  • Followed by grouping of organisation namespaces.
  • Order namespace “using” statements alphabetically.

Variables and Types

  • Declare and preferably initialize local variables at the same point and as close to where you first use them.
  • Always choose the simplest data type, list, or object required.
  • Always use the built-in C# data type aliases, not the .NET common type system (CTS).
    Example:
short NOT System.Int16
int NOT System.Int32
long NOT System.Int64
string NOT System.String
  • Declare one variable per line.
  • Only declare member variables as private. Use properties to provide access to them with public, protected, or internal access modifiers.
  • Prefer to use the as operator and check for null, rather than directly casting, and having to handle potential InvalidCastException.
    object dataObject = LoadData();
    DataSet ds = dataObject as DataSet;
    if(ds != null) {
       …
    }
    
  • Avoid boxing and unboxing value types.
    Especially in loops, or where performance matters.Example:

    int count = 1;
    object refCount = count; // Implicitly boxed.
    int newCount = (int)refCount; // Explicitly unboxed.
    

Strings

  • Use the “@” prefix for string literals instead of escaped strings.
  • Prefer String.Format() or StringBuilder over string concatenation.
    StringBuilder performs many times faster (thousands in fact)
  • Never concatenate strings inside a loop. Remember, string’s are immutable. Each time you concatenate, a new instance of string is created.
  • Checking whether a string is empty?
    String.Length == 0 or “” is faster than String.Empty, but… beware of null strings, if null when you perform a String.Length, you’ll get a NullReferenceException.
    The safest technique is to use the static IsNullOrEmpty function on string.
    Using “” does not create a new object. Due to string interning, it will be created either once per assembly or once per AppDomain.

A Handful of Singletons in C#

July 14, 2012

Recently I was involved in an interview where I was queried on the Singleton Creational design pattern.
I thought I’d share what I came up with.
In order of preference from most to least used.

Most used:
System.Lazy introduced in .Net 4.0
Sealing the class can help the Just In Time (JIT) compilation to optimise the IL.
Of course you also don’t want your singletons being extended,
but the fact that your constructor is private and default (takes no arguments),
guards against instantiation and sub-classing

Example 1

public sealed class KimsSingleton {

   // System.Lazy guarantees lazyness and thread safety
   private static readonly Lazy<KimsSingleton> _instance = new Lazy<KimsSingleton>(() => new KimsSingleton());

   // private, preventing any other class's from instantiating.
   // Also prevents creating child class's... which would create another instance, thus violating the pattern.
   private KimsSingleton() {
   }

   // static so client code can call Instance property from class.
   public static KimsSingleton Instance {
      get {
         return _instance.Value;
      }
   }
}

.Net guarantees lazy initialisation if the type is not marked with beforefieldinit.
Although it could be more lazy. See example 3 for one way to do this.
Marking the types constructor as static tells the compiler not to mark the type with beforefieldinit in the IL,
thus giving us laziness.
This is also thread safe.
In C#, static constructor will execute only once (per AppDomain),
either on instantiation, or when a static member is referenced for the first time.

Example 2

public sealed class KimsSingleton {
   private static readonly KimsSingleton _instance = new KimsSingleton();

   static KimsSingleton() {
   }

   private KimsSingleton() {
   }

   public static KimsSingleton Instance {
      get {
         return _instance;
      }
   }
}

Example 3

public sealed class KimsSingleton {
   private KimsSingleton() {
   }

   public static KimsSingleton Instance {
      get {
         return Nested._instance;
      }
   }

   private class Nested {
      static Nested() {
      }
      // This gives us more laziness than than example 2,
      // because the only static member that can initialise is the static instance member in Nested.
      internal static readonly KimsSingleton _instance = new KimsSingleton();
   }
}

One more way that I’ve seen used quite a few times that starts to fall apart.
Even the GoF guys do this example.
Head First Design Patterns do it as well (not good!),
although they use the volatile keyword which helps.
Lets look at where it falls apart.
If performance is an issue, stay away from this way.
If you fail to declare your instance member as volatile,
the exact position of the read / write may be changed by the compiler.
I don’t use this method.

Example 4

public sealed class Singleton {
   private volatile static Singleton _instance;
   private static readonly object _lock = new object();

   private Singleton() {
   }

   public static Singleton getInstance() {
      if (_instance == null) {
         // Lock area where instance is created
         lock(_lock) {
            if (_instance == null) {
               _instance = new Singleton();
            }
         }
      }
      return _instance;
   }
}

There are quite a few other ways of implementing the singleton.
Many of which are flawed to some degree.
So I generally stick with the above implementations.

How to optimise your testing effort

March 24, 2012

I recently wrote a post for the company I currently work for around the joys of doing TDD.
You can check it out here.

What is your current approach to testing?
How can you spend the little time you have on the most important areas?

I thought I’d share some thoughts around where I see the optimal areas to invest your test effort.
I got to thinking last night, and when I was asleep.
We are putting too much effort into our UI, UA and system tests.
We are writing to many of them, thus we’re creating a top heavy test structure that will sooner or later topple.
These tests have their sweet spot, but they are slow, fragile and time consuming to write.

We should have a small handful for each user story to provide some UA, and the rest should be without the UI and database (the slow and fragile bits).
We need to get our mind sets lower down the test triangle.

test triangle

I’ll try and explain why we should be doing less Manual tests, followed by GUI tests, followed by UA tests, followed by integration tests, followed by Unit tests.

Try not to test the UI with the lower architectural layers included in the tests.
UI tests should have the lower layers mocked and / or stubbed.
Check out Dummy vs Fake vs Stub vs Mock
Full end to end system tests are not required to validate UI field constraints.
Dependency injection really helps us here.

When you are explicitly testing the upper levels of the test triangle, the lower / immediate lower layers are implicitly being tested.
So you might think, cool, if we invest in the upper layers, we implicitly cover the lower layers.
That’s right, but the disadvantages of the higher level tests outweigh the advantages.
UI tests and especially ones that go from end to end, should be avoided, or very few in number,
as they are fragile and incur high maintenance costs.
If we create to many of these, confidence in their value diminishes.
Read on and you’ll find out why.

Lets look at cost vs value to the business.

Some tests cost a lot to create and modify.
Some cost little to create and modify.
Some yield high value.
Some yield low value.
We only have so much time for testing,
so lets use it in the areas that provide the greatest value to the business.
Greatest value of course, will be measured differently for each feature.
There is no stock standard answer here, only guidelines.
What we’re aiming for is to spend the minimum effort (cost) and get the maximum benefit (value).
Not the other way around…
With the following set of scales, we’ve spent to much in the wrong areas, yielding suboptimal value.

cost verse business value

It’s worth the effort to get under the UI layer and do the required setup incl mocking the layers below.
It’s also not to hard to get around the likes of the HttpContext hierarchy of classes (HttpRequest, HttpResponse, and so on) encountered in ASP.NET Web Forms and MVC.

Beware

  • the higher level tests get progressively more expensive to create and maintain.
  • They are slower to run, which means they don’t run as part of CI, but maybe the nightly build.
    Which means there is more latency in the development cycle.
    Developers are less likely to run them manually.
  • When  they break, it takes longer to locate the fault, as you have all the layers below to go through.

Unreliable tests are a major cause for teams ignoring or losing confidence in automated tests.
UI, Acceptance, followed by integration tests are usually the culprits for causing this.
Once confidence is lost, the value initially invested in the automated tests is significantly reduced.
Fixing failing tests and resolving issues associated with brittle tests should be a priority to remove false positives.

Planning the test effort

This is usually the first step we do when starting work on a user story,
or any new feature.
We usually create a set of Test Conditions (Given/When/Then)

Given When Then
There are no items in the shopping cart Customer clicks “Purchase” button for a book which is in stock 1 x book is added to shopping cart. Book is held – preventing selling it twice.
Customer clicks “Purchase” button for a book which is not in stock Dialog with “Out of stock” message is displayed and offering customer option of putting book on back order.

for Product Backlog items where there are enough use cases for it to be worth doing.
Where we don’t create Test Conditions, we have a Test Condition workshop.
In the workshop we look at the What, How, Who and Why in that order.
The test quadrant (pictured below) assists us in this.
In the workshop, we write the previously recorded Acceptance Criteria on a board (the What) and discuss the most effective way to verify that the conditions are meet (the How)
With the how we look at the test triangle and the test quadrant and decide where our time is most effectively spent.

Test condition workshop

With the test condition workshop,
when we start on a user story (generally a feature in the sprint backlog),
we plan where we are going to spend our test resource.
Think about What, and sometimes Who, but not How.
The How comes last.

Unit tests are the developers bread and butter.
They are cheap to create and modify,
and consistently yield not only good value to the developers,
but implicitly good value to most / all other areas.
This is why they sit at the bottom of the test triangle.
This is why TDD is as strong as it is today.
test quadrant

The hierarchy of criteria that we use to help us

  1. Release Criteria
    Ultimately controlled by the Product Owner or release manager.
  2. Acceptance Criteria
    Also owned by the Product Owner.
    Attached to each user story, or more correctly… product backlog item.
    The Development team must meet these in order to fulfill the Definition of Done.
  3. Test Conditions
    When executable, confirm the development team have satisfied the requirements of the product backlog item.

Write your tests first

TDD is  not about testing, it’s about creating better designs.
This forces us to design better software. “Testable”, “Modular”, separating concerns, Single responsibility principle.
This forces us down the path of SOLID Principles.

red green refactor

  1. Write a unit test
    Run it and watch it fail (because the production code is not yet written)
  2. Write just enough production code to make the test pass
  3. Re-run the test and watch it pass

This podcast around TDD has lots of good info.

Continuous Integration

Realise the importance of setting up CI and nightly builds.
The benefit of having your unit (fast running) tests automatically executed regularly are great.
You get rapid feedback, which is crucial to an agile team completing features on time.
Tests that are not being run regularly have the risk that they may be failing.
The sooner you find a failing test, the easier it is to fix the code.
The longer it’s left unattended, the more technical debt you accrue and the more effort is required to hunt down the fault.
Make the effort to get your tests running on each commit or push.

Nightly Builds

The slower running tests (that’s all the automated tests above unit tests on the triangle), need to be run as part of a nightly build.
We can’t have these running as part of the CI because they are just too slow.
If something gets in the way of a developers work flow, it won’t get done.

Pair Review

Don’t forget to pair review all code written.
In my current position we’ve been requesting reviews verbally and responding with emails, comments on paper.
This is not ideal and we’re currently evaluating review software, of which there are many offerings.

Keeping your events thread safe

March 11, 2012

An area I’ve noticed where engineers often forget to think about synchronization is where firing events.
Now I’m going to go over a little background on C# delegates quickly just to refresh what we learnt or should have learnt years ago at the beginnings of the C# language.

It seems to be a common misconception, that all that is needed to keep synchronisation,
is to check the delegate (technically a MulticastDelegate, or in architectural terms the publisher of the publish-subscribe pattern (more commonly known as the observer pattern)) for null.

Defining the publisher without using the event keyword

public class Publisher {
   // ...

   // Define the delegate data type
   public delegate void MyDelegateType();

   // Define the event publisher
   public MyDelegateType OnStateChange {
      get{ return _onStateChange;}
      set{ _onStateChange = value;}
   }
   private MyDelegateType _onStateChange;

   // ...
}

When you declare a delegate, you are actually declaring a MulticastDelegate.
The delegate keyword is an alias for a type derived from System.MulticastDelegate.
When you create a delegate, the compiler automatically employs the System.MulticastDelegate type rather than the System.Delegate type.
When you add a method to a multicast delegate, the MulticastDelegate class creates a new instance of the delegate type, stores the object reference and the method pointer for the added method into the new instance, and adds the new delegate instance as the next item in a list of delegate instances.
Essentially, the MulticastDelegate keeps a linked list of Delegate objects.

It’s possible to assign new subscribers to delegate instances, replacing existing subscribers with new subscribers by using the = operator.
Most of the time what is intended is actually the += operator (implemented internally by using System.Delegate.Combine()).
System.Delegate.Remove() is what’s used when you use the -+ operator on a delegate.

class Program {
   public static void Main() {

      Publisher publisher = new Publisher();
      Subscriber1 subscriber1 = new Subscriber1();
      Subscripber2 subscripber2 = new Subscripber2();

      publisher.OnStateChange = subscriber1.OnStateChanged;

      // Bug: assignment operator overrides previous assignment.
      // if using the event keyword, the assignment operator is not supported for objects outside of the containing class.
      publisher.OnStateChange = subscriber2.OnStateChanged;

   }
}

Another short coming of the delegate is that delegate instances are able to be invoked outside of the containing class.

class Program {
   public static void Main() {
      Publisher publisher = new Publisher();
      Subscriber1 subscriber1 = new Subscriber1();
      Subscriber2 subscriber2 = new Subscriber2();

      publisher.OnStateChange += subscriber1.OnStateChanged;
      publisher.OnStateChange += subscriber2.OnStateChanged;

      // lack of encapsulation
      publisher.OnStateChange();
   }
}

C# Events come to the rescue

in the form of the event keyword.
The event keyword address’s the above problems.

The modified Publisher looks like the following:

public class Publisher {
   // ...

   // Define the delegate data type
   public delegate void MyDelegateType();

   // Define the event publisher
   public event MyDelegateType OnStateChange;

   // ...
}

Now. On to synchronisation

The following is an example from the GoF guys with some small modifications I added.
You’ll also notice, that the above inadequacies are taken care of.
Now if the Stock.OnChange is not accessed by multiple threads, this code is fine.
If it is accessed by multiple threads, it’s not fine.
Why I hear you ask?
Well, between the time the null check is performed on the Change event
and when Change is fired, Change could be set to null, by another thread.
This will of course produce a NullReferenceException.

The code on lines 59,60 is not atomic.

using System;
using System.Collections.Generic;

namespace DoFactory.GangOfFour.Observer.NETOptimized {
    /// <summary>
    /// MainApp startup class for .NET optimized
    /// Observer Design Pattern.
    /// </summary>
    class MainApp {
        /// <summary>
        /// Entry point into console application.
        /// </summary>
        static void Main() {
            // Create IBM stock and attach investors
            var ibm = new IBM(120.00);

            // Attach 'listeners', i.e. Investors
            ibm.Attach(new Investor { Name = "Sorros" });
            ibm.Attach(new Investor { Name = "Berkshire" });

            // Fluctuating prices will notify listening investors
            ibm.Price = 120.10;
            ibm.Price = 121.00;
            ibm.Price = 120.50;
            ibm.Price = 120.75;

            // Wait for user
            Console.ReadKey();
        }
    }

    // Custom event arguments
    public class ChangeEventArgs : EventArgs {
        // Gets or sets symbol
        public string Symbol { get; set; }

        // Gets or sets price
        public double Price { get; set; }
    }

    /// <summary>
    /// The 'Subject' abstract class
    /// </summary>
    abstract class Stock {
        protected string _symbol;
        protected double _price;

        // Constructor
        public Stock(string symbol, double price) {
            this._symbol = symbol;
            this._price = price;
        }

        // Event
        public event EventHandler<ChangeEventArgs> Change;

        // Invoke the Change event
        private void OnChange(ChangeEventArgs e) {
            // not thread safe
            if (Change != null)
                Change(this, e);
        }

        public void Attach(IInvestor investor) {
            Change += investor.Update;
        }

        public void Detach(IInvestor investor) {
            Change -= investor.Update;
        }

        // Gets or sets the price
        public double Price {
            get { return _price; }
            set {
                if (_price != value) {
                    _price = value;
                    OnChange(new ChangeEventArgs { Symbol = _symbol, Price = _price });
                    Console.WriteLine("");
                }
            }
        }
    }

    /// <summary>
    /// The 'ConcreteSubject' class
    /// </summary>
    class IBM : Stock {
        // Constructor - symbol for IBM is always same
        public IBM(double price)
            : base("IBM", price) {
        }
    }

    /// <summary>
    /// The 'Observer' interface
    /// </summary>
    interface IInvestor {
        void Update(object sender, ChangeEventArgs e);
    }

    /// <summary>
    /// The 'ConcreteObserver' class
    /// </summary>
    class Investor : IInvestor {
        // Gets or sets the investor name
        public string Name { get; set; }

        // Gets or sets the stock
        public Stock Stock { get; set; }

        public void Update(object sender, ChangeEventArgs e) {
            Console.WriteLine("Notified {0} of {1}'s " +
                "change to {2:C}", Name, e.Symbol, e.Price);
        }
    }
}

At least we don’t have to worry about the += and -= operators. They are thread safe.

Ok. So how do we make it thread safe?
Now I’ll do my best not to make your brain hurt.
We can assign a local copy of the event and then check that instead.
How does that work you say?
The Change delegate is a reference type.
You may think that  threadSafeChange references the same location as Change,
thus any changes to Change would also be reflected in threadSafeChange.
That’s not the case though.
Change += investor.Update does not add a new delegate to Change, but assigns it a new MulticastDelegate,
which has no effect on the original MulticastDelegate that threadSafeChange also references.

The reference part of reference type local variables is stored on the stack.
A new stack frame is created for each thread with every method call
(whether its an instance or static method).
All local variables are safe…
so long as they are not reference types being passed to another thread or being passed to another thread by ref.
So, only one thread can access the threadSafeChange instance.

private void OnChange(ChangeEventArgs e) {
   // assign reference to heap allocated memory to stack allocated implements thread safety
   EventHandler<ChangeEventArgs> threadSafeChange = Change;
   if ( threadSafeChange != null)
      threadSafeChange(this, e);
}

Now for a bit of error handling

If one subscriber throws an exception, any subscribers later in the chain do not receive the publication.
One way to get around this problem, is to semantically override the enumeration of the subscribers.
Thus providing the error handling.

private void OnChange(ChangeEventArgs e) {
   // assign reference to heap allocated memory to stack allocated implements thread safety
   EventHandler<ChangeEventArgs> threadSafeChange = Change;
   if ( threadSafeChange != null) {
      foreach(EventHandler<ChangeEventArgs> handler in Change.GetInvocationList()) {
         try {
            //if subscribers delegate methods throw an exception, we'll handle in the catch and carry on with the next delegate
            handler(this, e);
            // if we only want to allow a single subscriber
            if (Change.GetInvocationList().Length > 1)
               throw new Exception("Too many subscriptions to the Stock.Change" /*, provide a meaningful inner exception*/);
         }
         catch (Exception exception) {
            // what we do here depends on what stage of development we are in.
            // if we're in early stages, pre-release, fail early and hard.
         }
      }
   }
}

LSP / DbC and .NET’s support

October 12, 2010
Part 2

Some Examples

Bear in mind, there are additional overloads for most/all of the following procedures.
Be sure to check them out.

Examples of Preconditions

Contract.Assert

Ensures that a condition is true when in debug mode.

 

Contract.Assert(MyInput != null);

Contract.Assume

Instructs code analysis tools to assume that the condition supplied as an argument is true, even if it can not be statically proven to always be true.
Good example of how this can be used.

 

Contract.Assume(MyInput != null);

Contract.Requires

Must be at the beginning of a method or property. Ensures that a condition is true before any other code.
The following example ensures that MyInput parameter is valid (not null in this case) before any other code is run on the same thread.

 

Contract.Requires(MyInput != null);

Contract.EndContractBlock

Is used to inform the compiler to treat all code before it as if it was a code contract.
This is useful for legacy code.
For example if you have parameter validation code that does not use Code Contract’s.
The contract tools recognise if/then/throw statements as preconditions when the statements appear first inside a method or property,
and the set of such statements are followed by a Contract method call, such as Requires, Ensures, EnsuresOnThrow or EndContractBlock.

 

if(MyInput == null)
    throw new NullReferenceException("The input is null");
Contract.EndContractBlock();

Examples of Postconditions

Contract.Ensures

Ensures that a condition is true on exit of a method or property.

 


Contract.Ensures(0 <= speed && speed <= 800);

Contract.EnsuresOnThrow

This method call must be at the beginning of a method or property, before any other code.
If an exception is thrown, the specified condition must be true and the exception being thrown is of the specified type parameter.

 

public bool AddAndProcessItem(T item)
{
	Contract.EnsuresOnThrow(!item.Added);

	try
	{
		 // add item to a collection and set it's Added property to true
	}
	catch (AdditionException)
	{
		 item.Added = false;
		 throw;
	}

	// ... other operations
	return true;
}

Contract.ForAll

Allows the iteration through a collection to ensure all members meet a specific requirement.

 

// first param MyCollection is of type IEnumerable
// second param is a delegate, or lambda (often called a predicate when used like this)
Contract.ForAll(MyCollection, x=>x!=null);

Object Invariants

Used to mark methods that contain object invariant specifications.
Defined by decorating a method with the ContractInvariantMethodAttribute.
ContractInvariantMethodAttribute can only be used on a single method per class.
The method must return void and carry out no operations other than a sequence of calls to Contract.Invariant

 

[ContractInvariantMethod]
void ObjectInvariant ()
{
   Contract. Invariant ( _y >= 0 );
   Contract. Invariant ( _x > _y );
   ...
}

Code Contract Values

There are also some useful pseudo variables that can be used when writing postconditions.

Contract.Result

A methods or properties return value can be refered to by Contract.Result<T>()
Contract.Result<T>() can be used only in the conditional expression for the Ensures contract.
Methods with a return type of void cannot refer to Contract. Result<T>() within their postconditions for the obvious reason.

 


Contract.Ensures(0 < Contract.Result());

Contract.OldValue

Contract.OldValue<T>(T value) can be used only in the conditional expression for the Ensures contract.
The generic type argument may be omitted whenever the compiler is able to infer its type.
There are a few restrictions around the use of OldValue. Have a look at the User Manual document. Listed at the end of the post.

 

Contract.Ensures(MyInput != Contract.OldValue(MyInput));

Pure

Using the PureAttribute indicates that a type, method, property, etc is pure, that is, It has no visible side-effects for callers.
You can only use a method/property/etc in a contract if it is declared Pure.
If the method/property/etc is not decorated with this attribute when called from within a contract, a warning will be given “Detected call to impure method”.

See this post, it has a good example.

Interface Contracts

Contracts for interfaces must be defined in a separate class decorated with the ContractClassForAttribute.
The interface sporting the contract must be decorated with the ContractClassAttribute specifying the type that has the interfaces contract.
The class that implements the interfaces contract is expected to be abstract.
The User Manual listed at the end of this post has some good examples of how to set up Interface Contracts.

 

[ContractClass(typeof(ContractForInteface))]
interface IDoSomething
{
	int DoSomething(int value);
}

[ContractClassFor(typeof(IDoSomething))]
sealed class ContractForInteface : IDoSomething
{
	int IDoSomething.DoSomething(int value)
	{
		Contract.Requires( value != 0);
		//contracts require a dummy value
		return default(int);
	}
}

There’s plenty of good documentation around Code Contracts

System.Diagnostics.Contracts Namespace
http://msdn.microsoft.com/en-us/library/system.diagnostics.contracts.aspx
The Contract Class
http://msdn.microsoft.com/en-us/library/system.diagnostics.contracts.contract.aspx
Brief description of Code Contracts and how to use them
http://msdn.microsoft.com/en-us/library/dd264808.aspx
The User Manual
http://download.microsoft.com/download/C/2/7/C2715F76-F56C-4D37-9231-EF8076B7EC13/userdoc.pdf
Code Contracts on DevLabs
http://msdn.microsoft.com/en-us/devlabs/dd491992.aspx
DimeCasts has a few web casts on Code Contracts
http://dimecasts.net/Casts/ByTag/Code%20Contracts

LSP / DbC and .NET’s support

October 11, 2010
Part 1

Two design principle’s I believe go hand in hand are

  1. Dr. Barbara Liskov’s Liskov Substitution Principle (LSP)
  2. Dr. Bertrand Meyer’s Design by Contract (DbC)

Lets keep our Object Oriented relationships adhering to the LSP and the DbC.
The LSP and the DbC provide both theory and practical guidelines of how we as software engineers can design…

  • Hierarchical components that behave the same across all layers.
  • Systems that produce predictable results on a layer, from changes made on another layer (providing architecture that can be reasoned about).
    As programmers, we need to be able to make correct assumptions about the behaviour of abstract data types and their derivatives.
    LSP and DbC provide us with the ability to create intuitive class hierarchies that support the object oriented architectural reasoning process.
  • Extensible code. Ability to add functionality without having to change existing code, but rather extending it.

LSP

The LSP states that you shouldn’t inherit from a base class unless the derived class truly “is a” more specific version of the base class (Liskov 1988).
Subtype objects must be behaviourally substitutable for supertype objects.
Programmers must be able to reason correctly about and rely upon the behaviour of subtypes using only the specification of the supertype behaviour.

Robert C. Martin wrote this excellent article on LSP.
Which I found here along with many other great articles.

Andy Hunt and Dave Thomas summarise LSP like this: “Subclasses must be usable through the base class interface without the need for the user to know the difference.”
In other words, all the routines defined in the base class should mean the same thing when they’re used in each of the derived classes.

As Steve McConnell in Code Complete puts it:
If you have a base class of Account and derived classes of CheckingAccount, SavingsAccount, and AutoLoanAccount.
A programmer should be able to invoke any of the routines derived from Account on any of Account’s subtypes without caring about which subtype a specific account object is.

If a program has been written so that the LSP is true, inheritance is a powerful tool for reducing complexity because a programmer can focus on the generic attributes of an object without worrying about the details.
If a programmer must be constantly thinking about semantic differences in a subclass implementation, then inheritance is increasing complexity rather than reducing it.
This goes against software’s primary technical imperative of managing complexity.
Suppose a programmer has to think this
“If I call the InterestRate() routine on CheckingAccount or SavingsAccount,
it returns the interest the bank pays,
but if I call InterestRate() on AutoLoanAccount I have to change the sign because it returns the interest the consumer pays to the bank.”
According to LSP, AutoLoanAccount should not inherit from the Account base class in this example
because the semantics of the InterestRate() routine are not the same as the semantics of the base class’s InterestRate() routine.

DbC

From my understanding. DbC provides constraints which enforce the LSP.
LSP is notional. DbC provides us with the practical approach.

.Net 4.0 gives us Code Contracts which directly support the DbC principle.
In looking at what the .Net framework 4.0 provides us with in the System.Diagnostics.Contracts namespace.
Bear in mind, this is not a language extension or a runtime extension. It’s just a library.
I think that using the static class’s in your code, that System.Diagnostics.Contracts provides would be a prime candidate for Aspect Oriented Programming.
It looks like the .Net CLR has support for AOP, although it doesn’t sound like it has been implemented very cleanly.
Have a look at this.
IMHO, when using AOP, the actual business code utilising the aspects, should not show any signs of the aspects (Code Contracts (in our case)).
Although it looks like there are some open source alternatives that do the job quite nicely.
Anyway, without diving into AOP, lets save it for another post and move on.

DbC is actually a registered trademark of Eiffel Software.
Bertrand Meyer when designing the Eiffel programming language came up with this term.
DbC provides us with class invariants, preconditions and postconditions.

C# for artists by Rick Miller has a good explanation of these.

Class Invariant

A class invariant is an assertion about an object property that must hold true for all valid states the object can assume.
For example, suppose an airplane object has a speed property that can be set to a range of integer values between 0 and 800.
This rule should be enforced for all valid states an airplane object can assume.
All methods that can be invoked on an airplane object must ensure they do not set the speed property to less than 0 or greater than 800.

Precondition

A precondition is an assertion about some condition that must be true before a method can be expected to perform it’s operation correctly.
For example, suppose the airplane object’s speed property can be incremented by some value and there exists in the set of airplane’s public interface methods one that increments the speed property any where from 1 to 5 depending on the value of the argument supplied to the method.
For this method to perform correctly, it must check that the argument is in fact a valid increment value of 1,2,3,4, or 5.
If the increment value tests valid then the precondition holds true and the increment method should perform correctly.
The precondition must be true before the method is called, therefore it is the responsibility of the caller to make the precondition true, and the responsibility of the called method to enforce the truth of the precondition.

Postcondition

A postcondition is an assertion that must hold true when a method completes its operations and returns to the caller.
For example, the airplane’s speed increment method should ensure that the class invariant speed property being 0<=speed<=800 holds true when the increment method completes its operations.

Code Contracts can run on .Net 4.0 or previous versions.

Code Contracts as a whole include

1)   Static methods from the Contract class

  • Express assumptions and constraints, I.E. the object invariants, pre and post conditions.

2)   Static analyzer or static checker

  • Provide compile-time analysis.
  • Checks for implicit contracts, such as null dereferences and checking array bounds, as well as the explicit developer provided contracts.
  • Only available in Microsoft’s Premium edition or above of Visual Studio.

3)   Runtime analyzer or binary rewriter

  • modifies the IL by injecting the contracts.
  • performs runtime checking.

Once you’ve installed either the Standard or Premium Edition of the managed code contracts from DevLabs.
Projects within Visual Studio will now have an extra property pane entitled “Code Contracts”.
This pane provides configuration options for both static and runtime contract checking.

I think the idea is that if your project is targeting:

  1. A .Net framework version prior to 4.0, the System.Diagnostics.Contracts namespace is in the Microsoft.Contracts.dll.
  2. The .Net framework version is 4.0, the System.Diagnostics.Contracts namespace is in the mscorlib.dll.

Discussion on Class Construction Techniques

October 10, 2010

I had a discussion with some work colleges a short while ago,
around a couple of different techniques of constructing a class object.
The two approaches involved in the discussion where…

  1. Should we prefer constructing an object by providing public access to its members and initialising them external to the class, like the Builder pattern does?
  2. Or by initialising the objects members within its constructor, I.E. like the Factory Method and Abstract Factory does?

My take on this, was that it would be best object oriented practice to keep the initialisation of the objects members within the constructor if possible.
As far as I’m aware, there seems to be more support for the “keeping initialisation within the constructor”.

Some of my supporting arguments were the following

From Steve McConnell’s Code Complete

Chapter 6: Working Classes
Initialise all member data in all constructors, if possible.
Initialising all data members in all constructors is an inexpensive defensive programming practice.

Chapter 10: General Issues in Using Variables
Initialise a class’s member data in its constructor.
Just as a routine’s variables should be initialised within each routine, a class’s data should be initialised within its constructor.

GOF
Builder pattern verses the likes of the Factory Method and Abstract Factory.
Notice the Frequency of use for the Builder verses the other two?

Examples of class’s that know how to populate themselves

 

You can get code examples here if you’re not familiar with the patterns.
Any feedback on what people think about the before mentioned approaches would be great.