Posts Tagged ‘XSS’

Up and Running with Kali Linux and Friends

March 29, 2014

When it comes to measuring the security posture of an application or network, the best defence against an attacker is offence. What does that mean? It means your best defence is to have someone with your best interests (generally employed by you), if we’re talking about your asset, assess the vulnerabilities of your asset and attempt to exploit them.

In the words of Offensive Security (Creators of Kali Linux), Kali Linux is an advanced Penetration Testing and Security Auditing Linux distribution. For those that are familiar with BackTrack, basically Kali is a new creation based on Debian rather than Ubuntu, with significant improvements over BackTrack.

When it comes to actually getting Kali on some hardware, there is a multitude of options available.

All externally listening services by default are disabled, but very easy to turn on if/when required. The idea being to reduce chances of detecting the presence of Kali.

I’ve found the Kali Linux documentation to be of a high standard and plentiful.

In this article I’ll go over getting Kali Linux installed and set-up. I’ll go over a few of the packages in a low level of detail (due to the share number of them) that come out of the box. On top of that I’ll also go over a few programmes I like to install separately. In a subsequent article I’d like to continue with additional programmes that come with Kali Linux as there are just to many to cover in one go.

System Requirements

  1. Minimum of 8 GB disk space is required for the Kali install
  2. Minimum RAM 512 MB
  3. CD/DVD Drive or USB boot support

Supported Hardware

Officially supported architectures

i386, amd64, ARM (armel and armhf)

Unofficial (but maintained) images

You can download official Kali Linux images for the following, these are maintained on a best effort basis by Offensive Security.

  • VMware (pre-made vm with VMware tools installed)

ARM images

  • rk3306 mk/ss808CPU: dual-core 1.6 GHz A9

    RAM: 1 GB

  • Raspberry Pi
  • ODROID U2CPU: quad-core 1.7 GHz

    RAM: 2GB

    Ethernet: 10/100Mbps

  • ODROID X2CPU: quad-core Cortex-A9 MPCore

    RAM: 2GB

    USB 2: 6 ports

    Ethernet: 10/100Mbps

  • MK802/MK802 II
  • Samsung Chromebook
  • Galaxy Note 10.1
  • CuBox
  • Efika MX
  • BeagleBone Black

Create a Customised Kali Image

Kali also provides a simple way to create your own ISO image from the latest source. You can include the packages you want and exclude the ones you don’t. You can customise the kernel. The options are virtually limitless.

The default desktop environment is Gnome, but Kali also provides an easy way to configure which desktop environment you use before building your custom ISO image.

The alternative options provided are: KDE, LXDE, XFCE, I3WM and MATE.

Kali has really embraced the Debian ethos of being able to be run on pretty well any hardware with extreme flexibility. This is great to see.

Installation

You should find most if not all of what you need here. Just follow the links specific to your requirements.

As with BackTrack, the default user is “root” without the quotes. If your installing, make sure you use a decent password. Not a dictionary word or similar. It’s generally a good idea to use a mix of upper case, lower case characters, numbers and special characters and of a decent length.

I’m not going to repeat what’s already documented on the Kali site, as I think they’ve done a pretty good job of it already, but I will go over some things that I think may not be 100% clear at first attempt. Also just to be clear, I’ve done this on a Linux box.

Now once you have down loaded the image that suites your target platform,

you’re going to want to check its validity by verifying the SHA1 checksums. Now this is where the instructions can be a little confusing. You’ll need to make sure that the SHA1SUMS file that contains the specific checksum you’re going to use to verify the checksum of the image you downloaded, is in fact the authentic SHA1SUMS file. instructions say “When you download an image, be sure to download the SHA1SUMS and SHA1SUMS.gpg files that are next to the downloaded image (i.e. in the same directory on the server).”. You’ve got to read between the lines a bit here. A little further down the page has the key to where these files are. It’s buried in a wget command. Plus you have to add another directory to find them. The location was here. Now that you’ve got these two files downloaded in the same directory, verify the SHA1SUMS.gpg signature as follows:

$ gpg --verify SHA1SUMS.gpg SHA1SUMS
gpg: Signature made Thu 25 Jul 2013 08:05:16 NZST using RSA key ID 7D8D0BF6
gpg: Good signature from "Kali Linux Repository <devel@kali.org>

You’ll also get a warning about the key not being certified with a trusted signature.

Now verify the checksum of the image you downloaded with the checksum within the (authentic) SHA1SUMS file

Compare the output of the following two commands. They should be the same.

# Calculate the checksum of your downloaded image file.
$ sha1sum [name of your downloaded image file]
# Print the checksum from the SHA1SUMS file for your specific downloaded image file name.
$ grep [name of your downloaded image file] SHA1SUMS

Kali also has a live USB Install including persistence to your USB drive.

Community

IRC: #kali-linux on FreeNode. Stick to the rules.

What’s Included

> 300 security programmes packaged with the operating system:

Before installation you can view the tools included in the Kali repository.

Or once installed by issuing the following command:

# prints complete list of installed packages.
dpkg --get-selections | less

To find out a little more about the application:

dpkg-query -l '*[some text you think may exist in the package name]*'

Or if you know the package name your after:

dpkg -l [package name]

Want more info still?

man [package name]

Some of the notable applications installed by default

Metasploit

Framework that provides the infrastructure to create, re-use and automate a wide variety of exploitation tasks.

If you require database support for Metasploit, start the postgresql service.

# I like to see the ports that get opened, so I run ss -ant before and after starting the services.
ss -ant
service postgresql start
ss -ant

ss or “socket statistics” which is a new replacement programme for the old netstat command. ss gets its information from kernel space via Netlink.

Start the Metasploit service:

ss -ant
service metasploit start
ss -ant

When you start the metasploit service, it will create a database and user, both with the names msf3, providing you have your database service started. Now you can run msfconsole.

Start msfconsole:

msfconsole

The following is an image of terminator where I use the top pane for stopping/starting services, middle pane for checking which ports are opened/closed, bottom pane for running msfconsole. terminator is not installed by default. It’s as simple as apt-get install terminator

metasploit

You can find full details of setting up Metasploits database and start/stopping the services here.

You can also find the Metasploit frameworks database commands simply by typing help database at the msf prompt.

# Print the switches that you can run msfconsole with.
msfconsole -h

Once your in msf type help at the prompt to get yourself started.

There is also a really easy to navigate all encompassing set of documentation provided for msfconsole here.

You can also set-up PostgreSQL and Metasploit to launch on start-up like this:

update-rc.d postgresql enable
update-rc.d metasploit enable

Offensive Security also has a Metasploit online course here.

Armitage

Just as it was included in BackTrack, which is no longer supporting Armitage, you’ll also find Armitage comes installed out of the box in version 1.0.4 of Kali Linux. Armitage is a GUI to assist in metasploit visualisation. You can find the official documentation here. Offensive Security has also done a good job of providing their own documentation for Armitage over here. To get started with Armitage, just make sure you’ve got the postgresql service running. Armitage will start the metasploit service for you if it’s not already running. Armitage allows your red team to collaborate by using a single instance of Metasploit. There is also a commercial offering developed by Raphael Mudge’s company “Strategic Cyber LLC” which also created Armitage, called Cobalt Strike. Cobalt Strike currently costs $2500 per user per year. There is a 21 day trial though. Cobalt Strike offers a bunch of great features. Check them out here. Armitage can connect to an existing instance of Metasploit on another host.

NMap

Target use is network discovery and auditing. Provides host information for anything it can access from a network. Also now has a scripting engine that can execute arbitrary custom tasks.

I’m guessing we’ve probably all used NMap? ZenMap which Kali Linux also provides out of the box Is a gui for NMap. This was also included in BackTrack.

Intercepting Web Proxies

Burp Suite

I use burp quite regularly and have a few blog posts where I’ve detailed some of it’s use. In fact I’ve used it to reverse engineer the comms between VMware vSphere and ESXi to create a UPS solution that deals with not only virtual hosts but also the clients.

WebScarab

I haven’t really found out what webscarab’s sweet spot is if it has one. I’d love to know what it does better than burp, zap and w3af combined? There is also a next generation version which according to the google code repository hasn’t had any work done on it since March 2011, where as the classic version is still receiving fixes. The documentation has always seemed fairly minimalistic also.

In terms of web proxy/interceptors I’ve also used fiddler which relies on the .NET framework and as mono is not installed out of the box on Kali, neither is fiddler.

OWASP Zed Attack Proxy (ZAP)

Which is an OWASP flagship project, so it’s free and open source. Cross platform. It was forked from the Paros Proxy project which is not longer supported. Includes automated, passive, brute force and port scanners. Traditional and AJAX spiders. Can even find unlinked files. Provides fuzzing, port scanning. Can be run without the UI in headless mode and can be accessed via a REST API. Supports Anti CSRF tokens. The Script Console that is one of the add-ons supports any language that JSR (Java Specification Requests) 223 supports. That’s languages such as JavaScript Groovy, Python, Ruby and many more. There is plenty of info on the add-ons here. OWASP also provide directions on how to write your own extensions and they provide some sample templates. Following is the list of current extensions, which can also be managed from within Zap. “Manage Add-ons” menu → Marketplace tab. Select and click “Install Selected”

OWASP Zap

The idea is to first set Zap up as a proxy for your browser. Fetch some web pages (build history). Zap will create a history of URLs. You then right click the item of interest and click Attack->[one of the spider options], then click the play button and watch the progress bar. which will crawl all the pages you have access to according to your permissions. Then under the Analyse menu → Scan Policy… Setup your scan policy so your only scanning what you want to scan. Then hit Scan to assess your target application. Out of the box, you’ve got many scan options. Zap does a lot for you. I’m really loving this tool OWASP!

As usual with OWASP, zap has a wealth of documentation. If zap doesn’t provide enough out of the box, extend it. OWASP also provide an API for zap.

You can find the user group here (also accessible from the ZAP ‘Online’ menu.), which is good for getting help if the help file (which can also be found via ZAP itself) fails to yeild. There is also a getting started guide which is a work in progress. There is also the ZAP Blog.

FoxyProxy

Although nothing to do with Kali Linux and could possibly be in the IceWeasel add-ons section below, I’ve added it here instead as it really reduces friction with web proxy interception. FoxyProxy is a very handy add-on for both firefox and chromium. Although it seems to have more options for firefox, or at least they are more easily accessible. It allows you to set-up a list of proxies and then switch between them as you need. When I run chromium as a non root user I can’t change the proxy settings once the browser is running. I have to run the following command in order to set the proxy to my intermediary before run time like this:

chromium-browser --temp-profile –proxy-server=localhost:3001

Firefox is a little easier, but neither browsers allow you to build up lists of proxies and then switch them in mid flight. FoxyProxy provides a menu button, so with two clicks you can disable the add-on completely to revert to your previous settings, or select any or your predefined proxies. This is a real time saver.

Vulnerability Scanners

Open Vulnerability Assessment System (OpenVAS)

Forked from the last free version (closed in 2005) of Nessus. OpenVAS plugins are written in the same language that Nessus uses. OpenVAS looks for known misconfigurations and vulnerabilities common in out of date software. In fact it covers the following OWASP Top 10 items:

  • No.5 Security Misconfiguration
  • No.7 Missing Function Level Access Control (formerly known as “failure to restrict URL access”)
  • No.9 Using Components with Known Vulnerabilities.

OpenVAS also has some SQLi and other probes to test application input, but it’s primary purpose is to scan networks of machines with out of date software and bad configurations.

Tests continue to be added. Now currently at 32413 Network Vulnerability Tests (NVTs) details here.

OpenVAS

Greenbone Security Desktop (gsd) who’s package is a GUI that uses the Greenbone Security Manager, OpenVAS Manager or any other service that offers the OpenVAS Management Protocol (omp) protocol. Currently at version 1.2.2 and licensed under the GPLv2. The Greenbone Security Assistant (gsad) is currently at version 4.0.0. The Germany government also sponsor OpenVAS.

From the menu: Kali Linux → Vulnerability Analysis → OpenVAS, we have a couple of short-cuts visible. openvas-gsd is actually just the gsd package and openvas-setup which is the set-up script.

Before you run openvas-gsd, you can either:

  1. Run openvas-setup which will do all the setup which I think is already done on Kali. At the end of this, you will be prompted to add a password for a user to the Admin role. The password you add here is for a new user called “admin” (of course it doesn’t say that, so can be a little confusing as to what the password is for).
  2. Or you can just run the following command, which is much quicker because you don’t run the set-up procedure:
openvasad -c 'add_user' -n [a new administrative username of your choosing] -r Admin

You’ll be prompted to add a new password. Make sure you remember it.

Check out the man page for further options. For example the -c switch is a shortened –command and it lists a selection of commands you can use.

I think -n is for –name although not listed in the man page. -r switch is –role. Either User or Admin.

The user you’ve just added is used to connect the gsd to the:

  1. openvasmd (OpenVAS Manager daemon) which listens on port 9390
  2. openvassd (OpenVAS Scanner daemon) which listens on port 9391
  3. gsad (Greenbone Security Assistant daemon) which listens on port 9392. This is a web app, which also listens on port 443
  4. openvasad (OpenVAS Administrator daemon) which listens on 9393

The core functionality is provided by the scanner and the manager. The manager handles and organises scan results. The gsad or assistant connects to the manager and administrator to provide a fully featured user interface. There is also a CLI (omp) but I haven’t been able to get this going on Kali Linux yet. You’ll also find that the previous link has links to all the man pages for OpenVAS. You can read more about the architecture and how the different components fit together.

I’ve also found that sometimes the daemons don’t automatically start when gsd starts. So you have to start them manually.

openvasmd && openvassd && gsad && openvasad

You can also use the web app https://127.0.0.1/omp

Then try logging in to the openvasmd. When your finished with gsd you can kill the running daemons if you like. I like to keep an eye on the listening ports when I’m done to keep things as quite as possible.

Check the ports.

ss -anp

Optional to see the processes running, but not necessary.

ps -e
kill -9 <PID of openvasad> <PID of gsad> <PID of openvassd> <PID of openvasmd>

There are also plenty of options when it comes to the report. This can be output in HTML, PDF, XML, Emailed and quite a few others. The reports are colour coded and you can choose what to have put in them. The vulnerabilities are classified by risk: High, Medium, Low, OpenVAS can take quite a while to scan as it runs so many tests.

This is how to get started with gsd.

Web Vulnerability Scanners

This is the generally accepted criteria of a tool to be considered a Web Application Security Scanner.

SkipFish

A high performance active reconnaissance tool written in C. From the documentation “Multiplexing single-thread, fully asynchronous network I/O and data processing model that eliminates memory management, scheduling, and IPC inefficiencies present in some multi-threaded clients.”. OK. So it’s fast.

which prepares an interactive sitemap by carrying out a recursive crawl and probes based on existing dictionaries or ones you build up yourself. Further details in the documentation linked below.

Doesn’t conform to most of the criteria outlined in the above Web Application Security Scanner criteria.

SkipFish v2.05 is the current version packaged with Kali Linux.

SkipFish v2.10b (released Dec 2012)

Free and you can view the source code. Apache license 2.0

Performs a similar role to w3af.

Project details can be found here.

You can find the tests here.

How do you use it though? This is a good place to start. Instead of reading through the non-existent doc/dictionaries.txt, I think you can do as well by reading through /usr/share/skipfish/dictionaries/README-FIRST.

The other two documentation sources are the man page and skipfish with the -h option.

Web Application Attack and Audit Framework (w3af)

Andres Riancho has created a masterpiece. The main behavior of this application is to assess and identify vulnerabilities in a web application by sending customised HTTP requests. Results can be output in quite a few formats including email. It can also proxy, but burp suite is more focused on this role and does it well.

Can be run with a gui: w3af_gui or from the terminal: w3af_console. Written in Python and Runs on Linux BSD or Mac. Older versions used to work on Windows, but it’s not currently being tested on Windows. Open source on GitHub and released under the GPLv2 license.

You can write your own plug-ins, but check first to make sure it doesn’t already exist. The plugins are listed within the application and on the w3af.org web site along with links to their source code, unit tests and descriptions. If it doesn’t appear that the plug-in you want exists, contact Andres Riancho to make sure, write it and submit a pull request. Also looks like Andres Riancho is driving the development TDD style, which means he’s obviously serious about creating quality software. Well done Andres!

w3af provides the ability to inject your payloads into almost every part of the HTTP request by way of it’s fuzzing engine. Including: query string, POST data, headers, cookie values, content of form files, URL file-names and paths.

There’s a good set of documentation found here and you can watch the training videos. I’m really looking forward to using this in anger.

w3af

Nikto

Is a web server scanner that’s not overly stealthy. It’s built on “Rain Forest Puppies” LIbWhisker2 which has a BSD license.

Nikto is free and open source with GPLv3 license. Can be run on any platform that runs a perl interpreter. It’s source can be found here. The first release of Nikto was in December of 2001 and is still under active development. Pull requests encouraged.

Suports SSL. Supports HTTP proxies, so you can see what Nikto is actually sending. Host authentication. Attack encoding. Update local databases and plugins via the -update argument. Checks for server configuration items like multiple index files and HTTP server options. Attempts to identify installed web servers and software.

Looks like the LibWhisker web site no longer exists. Last release of LibWhisker was at the beginning of 2010.

Nikto v2.1.4 (Released Feb 20 2011) is the current version packaged with Kali Linux. Tests for multiple items, including > 6400 potentially dangerous files/CGIs. Outdated versions of > 1200 servers. Insecurities of specific versions of > 270 servers.

Nikto v2.1.5 (released Sep 16 2012) is the latest version. Tests for multiple items, including > 6500 potentially dangerous files/CGIs. Outdated versions of > 1250 servers. Insecurities of specific versions of > 270 servers.

Just spoke with the Kali developers about the old version. They are now building a package of 2.1.5 as I write this. So should be an apt-get update && apt-get upgrade away by the time you read this all going well. Actually I can see it in the repo now. Man those guys are responsive!

Most of the info you will need can be found here.

SQLNinja

sqlninja: Targets Microsoft SQL Servers. Uses SQL injection vulnerabilities on a web app. Focuses on popping remote shells on the target database server and uses them to gain a foothold over the target network. You can set-up graphical access via a VNC server injection. Can upload executables by using HTTP requests via vbscript or debug.exe. Supports direct and reverse bindshell. Quite a few other methods of obtaining access. Documentation here.

Text Editors

  1. Vim. Shouldn’t need much explanation.
  2. Leafpad. This is a very basic graphical text editor. A bit like Windows Notepad.
  3. Gvim. This is the Graphical version of Vim. I’ve mostly used sublime text 2 & 3, gedit on Linux, but Gvim is really quite powerful too.

Note Keeping

  1. KeepNote. Supported on Linux, Windows and MacOS X. Easy to transport notes by zipping or copying a folder. Notes stored in HTML and XML.
  2. Zim Desktop Wiki.

Other Notable Features

  • Offensive Securities Kali Linux is free and always will be. It’s also completely open (as it’s based on debian) to modification of it’s OS or programmes.
  • FHS compliant. That means the file system complies to the Linux Filesystem Hierarchy Standard
  • Wireless device support is vast. Including USB devices.
  • Forensics Mode. As with BackTrack 5, the Kali ISO also has an option to boot into the forensic mode. No drives are written to (including swap). No drives will be auto mounted upon insertion.

Customising installed Kali

Wireless Card

I had a little trouble with my laptop wireless card not being activated. Turned out to be me just not realising that an external wi-fi switch had to be turned on. I had wireless enabled in the BIOS. The following where the steps I took to resolve it:

Read Kali Linux documentation on Troubleshooting Wireless Drivers  and found the card listed with lspci. Opened /var/log/dmesg with vi. Searched for the name of the card:

#From command mode to make search case insensitive:
:set ic
#From command mode to search
/[name of my wireless card]

There were no errors. So ran iwconfig (similar to ifconfig but dedicated to wireless interfaces). I noticed that the card was definitely present and the Tx-Power was off. I then thought I’d give rfkill a spin and it’s output made me realise I must have missed a hardware switch somewhere.

rfkill

Found the hard switch and turned it on and we now have wireless.

Adding Shortcuts to your Panel

[Alt]+[right click]->[Add to Panel…]

Or if your Kali install is on VirtualBox:

[Windows]+[Alt]+[right click]->[Add to Panel…]

Caching Debian Packages

If you want to:

  1. save on bandwidth
  2. have a large number of your packages delivered at your network speed rather than your internet speed
  3. have several debian based machines on your network

I’d recommend using apt-cacher-ng. If not already, you’ll have to set this up on a server and add the following file to each of your debian based machines.

/etc/apt/apt.conf with the following contents and set it’s permissions to be the same as your sources.list:

Acquire::http::Proxy “http://[ip address of your apt-cacher server]:3142”;

IceWeasel add-ons

  • Firebug
  • NoScript
  • Web Developer
  • FoxyProxy (more details mentioned above)
  • HackBar. Somewhat useful for (en/de)coding (Base64, Hex, MD5, SHA-(1/256), etc), manipulating and splitting URLs

SQL Inject Me

Nothing to do with Kali Linux, but still a good place to start for running a quick vulnerability assessment. Open source software (GPLv3) from Security Compass Labs. SQL Inject Me is a component of the Exploit-Me suite. Allows you to test all or any number of input fields on all or any of a pages forms. You just fill in the fields with valid data, then test with all the tools attacks or with the top that you’ve defined in the options menu. It then looks for database errors which are rendered into the returned HTML as a result of sending escape strings, so doesn’t cater for blind injection. You can also add remove escape strings and resulting error strings that SQL Inject Me should look for on response. The order in which each escape string can be tried can also be changed. All you need to know can be found here.

XSS Me

Nothing to do with Kali Linux, but still a good place to start for running a quick vulnerability assessment. Open source software (GPLv3) from Security Compass Labs. XSS Me is also a component of the Exploit-Me suite. This tool’s behaviour is very similar to SQL Inject Me (follows the POLA) which makes using the tools very easy. Both these add-ons have next to no learning curve. The level of entry is very low and I think are exactly what web developers that make excuses for not testing their own security need. The other thing is that it helps developers understand how these attacks can be carried out. XSS Me currently only tests for reflected XSS. It doesn’t attempt to compromise the security of the target system. Both XSS Me and SQL Inject Me are reconnaissance tools, where the information is the vulnerabilities found. XSS Me doesn’t support stored XSS or user supplied data from sources such as cookies, links, or HTTP headers. How effective XSS Me is in finding vulnerabilities is also determined by the list of attack strings the tool has available. Out of the box the list of XSS attack strings are derived from RSnakes collection which were donated to OWASP who now maintains it as one of their cheatsheets.. Multiple encodings are not yet supported, but are planned for the future. You can help to keep the collection up to date by submitting new attack strings.

Chromium

Because it’s got great developer tools that I’m used to using. In order to run this under the root account, you’ll need to add the following parameter to /etc/chromium/default between the quotes for CHROMIUM_FLAGS=””

--user-data-dir

I like to install the following extensions: Cookies, ScriptSafe

Terminator

Because I like a more powerful console than the default. Terminator adds split screen on top of multi tabs. If you live at the command line, you owe it to yourself to get the best console you can find. So far terminator still fits this bill for me.

KeePass

The password database app. Because I like passwords to be long, complex, unique for everything and as secure as possible.

Exploits

I was going to go over a few exploits we could carry out with the Kali Linux set-up, but I ran out of time and page space. In fact there are still many tools I wanted to review, but there just isn’t enough time or room in this article. Feel free to subscribe to my blog and you’ll get an update when I make posts. I’d like to extend on this by reviewing more of the tools offered in Kali Linux

Input Sanitisation

This has been one of my pet topics for a while. Why? Because the lack of it is so often abused. In fact this is one of the primary techniques for No.1 (Injection) and No.3 (XSS) of this years OWASP Top 10 List (unchanged from 2010). I’d encourage any serious web developers to look at my Sanitising User Input From Browser. Part 1” and Part 2

Part 1 deals with the client side (untrused) code.

Part 2 deals with the server side (trusted) code.

I provide source code, sources and discuss the following topics:

  1. Minimising the attack surface
  2. Defining maximum field lengths (validation)
  3. Determining a white list of allowable characters (validation)
  4. Escaping untrusted data
  5. External libraries, cheat sheets, useful code and sites, I used. Also discuss the less useful resources and why.
  6. The point of validating client side when the server side is going to do it again anyway
  7. Full set of server side tests to test the sanitisation is doing what is expected

Sanitising User Input from Browser. part 2

November 16, 2012

Untrusted data (data entered by a user), should always be treated as though it contains attack code.
This data should not be sent anywhere without taking the necessary steps to detect and neutralise the malicious code.
With applications becoming more interconnected, attacks being buried in user input and decoded and/or executed by a downstream interpreter is becoming all the more common.
Input validation, that’s restricting user input to allow only certain white listed characters and restricting field lengths are only two forms of defence.
Any decent attacker can get around client side validation, so you need to employ defence in depth.
validation and escaping also needs to be performed on the server side.

Leveraging existing libraries

  1. Microsofts AntiXSS is not extensible,
    it doesn’t allow the user to define their own whitelist.
    It didn’t allow me to add behaviour to the routines.
    I want to know how many instances of HTML encoded values there were.
    There was certainly a lot of code in there, but I didn’t find it very useful.
  2. The OWASP encoding project (Reform)(as mentioned in part 1 of this series).
    This is quite a useful set of projects for different technologies.
  3. System.Net.WebUtility from the System.Web.dll.
    Now this did most of what I needed other than provide me with fine grained information of what had been tampered with.
    So I took it and extended it slightly.
    We hadn’t employed AOP at this stage and it wasn’t considered important enough to invest the time to do so.
    So it was a matter of copy past modify.

What’s the point in client side validation if the server has to do it again anyway?

Now there are arguments both ways for this.
My current take on this for the project in question was:
If you only have server side validation, the client side is less responsive and user friendly.
If you only have client side validation, it’s out of our control.
This also gives fuel to the argument of using JavaScript on the client and server side (with the likes of node.js).
So the same code can be used both sides without having to code the same validation in two different languages.
Personally I find writing validation code easier using JavaScript than C#.
This maybe just because I’ve been writing considerably more JavaScript than C# lately though.

The code

I drew a sequence diagram of how this should work, but it got lost in a move.
So I wasn’t keen on doing it again, as the code had already been done.
In saying that, the code has reasonably good documentation (I think).
Code is king, providing it has been written to be read.
If you notice any of the escaping isn’t quite making sense, it could be the blogging engine either doing what it’s meant to, or not doing what it’s meant to.
I’ve been over the code a few times, but I may have missed something.
Shout out if anything’s not clear.

First up, we’ll look at the custom exceptions as we’ll need those soon.

using System;

namespace Common.WcfHelpers.ErrorHandling.Exceptions
{
    public abstract class WcfException : Exception
    {
        /// <summary>
        /// In order to set the message for the client, set it here, or via the property directly in order to over ride default value.
        /// </summary>
        /// <param name="message">The message to be assigned to the Exception's Message.</param>
        /// <param name="innerException">The exception to be assigned to the Exception's InnerException.</param>
        /// <param name="messageForClient">The client friendly message. This parameter is optional, but should be set.</param>
        public WcfException(string message, Exception innerException = null, string messageForClient = null) : base(message, innerException)
        {
            MessageForClient = messageForClient;
        }

        /// <summary>
        /// This is the message that the service's client will see.
        /// Make sure it is set in the constructor. Or here.
        /// </summary>
	    public string MessageForClient
        {
            get { return string.IsNullOrEmpty(_messageForClient) ? "The MessageForClient property of WcfException was not set" : _messageForClient; }
            set { _messageForClient = value; }
        }
        private string _messageForClient;
    }
}

And the more specific SanitisationWcfException

using System;
using System.Configuration;

namespace Common.WcfHelpers.ErrorHandling.Exceptions
{
    /// <summary>
    /// Exception class that is used when the user input sanitisation fails, and the user needs to be informed.
    /// </summary>
    public class SanitisationWcfException : WcfException
    {
        private const string _defaultMessageForClient = "Answers were NOT saved. User input validation was unsuccessful.";
        public string UnsanitisedAnswer { get; private set; }

        /// <summary>
        /// In order to set the message for the client, set it here, or via the property directly in order to over ride default value.
        /// </summary>
        /// <param name="message">The message to be assigned to the Exception's Message.</param>
        /// <param name="innerException">The Exception to be assigned to the base class instance's inner exception. This parameter is optional.</param>
        /// <param name="messageForClient">The client friendly message. This parameter is optional, but should be set.</param>
        /// <param name="unsanitisedAnswer">The user input string before service side sanitisatioin is performed.</param>
        public SanitisationWcfException
        (
            string message,
            Exception innerException = null,
            string messageForClient = _defaultMessageForClient,
            string unsanitisedAnswer = null
        )
            : base(
                message,
                innerException,
                messageForClient + " If this continues to happen, please contact " + ConfigurationManager.AppSettings["SupportEmail"] + Environment.NewLine
                )
        {
            UnsanitisedAnswer = unsanitisedAnswer;
        }
    }
}

Now as we define whether our requirements are satisfied by way of executable requirements (unit tests(in their rawest form))
Lets write some executable specifications.

using NUnit.Framework;
using Common.Security.Sanitisation;

namespace Common.Security.Encoding.UnitTest
{
    [TestFixture]
    public class ExtensionsTest
    {

        private readonly string _inNeedOfEscaping = @"One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.";
        private readonly string _noNeedForEscaping = @"One x2F two amp three x27 four lt five quot six gt       .";

        [Test]
        public void SingleDecodeDoubleEncodedHtml_ShouldSingleDecodeDoubleEncodedHtml()
        {
            string doubleEncodedHtml = @"";               // between the ""'s we have a string of Html with double escaped values like &amp;#x27; user entered text &amp;#x2F.
            string singleEncodedHtmlShouldLookLike = @""; // between the ""'s we have a string of Html with single escaped values like ' user entered text &#x2F.
            // In the above, the bloging engine is escaping the sinlge escaped entity encoding, so all you'll see is the entity it self.
            // but it should look like the double encoded entity encodings without the first &amp->;


            string singleEncodedHtml = doubleEncodedHtml.SingleDecodeDoubleEncodedHtml();
            
            Assert.That(singleEncodedHtml, Is.EqualTo(singleEncodedHtmlShouldLookLike));
        }

        [Test]
        public void Extensions_CompliesWithWhitelist_ShouldNotComply()
        {
            Assert.That(_inNeedOfEscaping.CompliesWithWhitelist(whiteList: @"^[\w\s\.,]+$"), Is.False);
        }

        [Test]
        public void Extensions_CompliesWithWhitelist_ShouldComply()
        {
            Assert.That(_noNeedForEscaping.CompliesWithWhitelist(whiteList: @"^[\w\s\.,]+$"), Is.True);
            Assert.That(_inNeedOfEscaping.CompliesWithWhitelist(whiteList: @"^[\w\s\.,#/&'<"">]+$"), Is.True);
        }
    }
}

Now the code that satisfies the above executable specifications, and more.

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

namespace Common.Security.Sanitisation
{
    /// <summary>
    /// Provides a series of extension methods that perform sanitisation.
    /// Escaping, unescaping, etc.
    /// Usually targeted at user input, to help defend against the likes of XSS and other injection attacks.
    /// </summary>
    public static class Extensions
    {

        private const int CharacterIndexNotFound = -1;

        /// <summary>
        /// Returns a new string in which all occurrences of a double escaped html character (that's an html entity immediatly prefixed with another html entity)
        /// in the current instance are replaced with the single escaped character.
        /// </summary>
        /// <param name="source">The target text used to strip one layer of Html entity encoding.</param>
        /// <returns>The singly escaped text.</returns>
        public static string SingleDecodeDoubleEncodedHtml(this string source)
        {
            return source.Replace("&amp;#x", "&#x");
        }
        /// <summary>
        /// Filter a text against a regular expression whitelist of specified characters.
        /// </summary>
        /// <param name="target">The text that is filtered using the whitelist.</param>
        /// <param name="alternativeTarget"></param>
        /// <param name="whiteList">Needs to be be assigned a valid whitelist, otherwise nothing gets through.</param>
        public static bool CompliesWithWhitelist(this string target, string alternativeTarget = "", string whiteList = "")
        {
            if (string.IsNullOrEmpty(target))
                target = alternativeTarget;
            
            return Regex.IsMatch(target, whiteList);
        }
        /// <summary>
        /// Takes a string and returns another with a single layer of Html entity encoding replaced with it's Html entity literals.
        /// </summary>
        /// <param name="encodedUserInput">The text to perform the opperation on.</param>
        /// <param name="numberOfEscapes">The number of Html entity encodings that were replaced.</param>
        /// <returns>The text that's had a single layer of Html entity encoding replaced with it's Html entity literals.</returns>
        public static string HtmlDecode(this string encodedUserInput, ref int numberOfEscapes)
        {
            const int NotFound = -1;

            if (string.IsNullOrEmpty(encodedUserInput))
                return string.Empty;

            StringWriter output = new StringWriter(CultureInfo.InvariantCulture);
            
            if (encodedUserInput.IndexOf('&') == NotFound)
            {
                output.Write(encodedUserInput);
            }
            else
            {
                int length = encodedUserInput.Length;
                for (int index1 = 0; index1 < length; ++index1)
                {
                    char ch1 = encodedUserInput[index1];
                    if (ch1 == 38)
                    {
                        int index2 = encodedUserInput.IndexOfAny(_htmlEntityEndingChars, index1 + 1);
                        if (index2 > 0 && encodedUserInput[index2] == 59)
                        {
                            string entity = encodedUserInput.Substring(index1 + 1, index2 - index1 - 1);
                            if (entity.Length > 1 && entity[0] == 35)
                            {
                                ushort result;
                                if (entity[1] == 120 || entity[1] == 88)
                                    ushort.TryParse(entity.Substring(2), NumberStyles.AllowHexSpecifier, NumberFormatInfo.InvariantInfo, out result);
                                else
                                    ushort.TryParse(entity.Substring(1), NumberStyles.AllowLeadingWhite | NumberStyles.AllowTrailingWhite | NumberStyles.AllowLeadingSign, NumberFormatInfo.InvariantInfo, out result);
                                if (result != 0)
                                {
                                    ch1 = (char)result;
                                    numberOfEscapes++;
                                    index1 = index2;
                                }
                            }
                            else
                            {
                                index1 = index2;
                                char ch2 = HtmlEntities.Lookup(entity);
                                if ((int)ch2 != 0)
                                {
                                    ch1 = ch2;
                                    numberOfEscapes++;
                                }
                                else
                                {
                                    output.Write('&');
                                    output.Write(entity);
                                    output.Write(';');
                                    continue;
                                }
                            }
                        }
                    }
                    output.Write(ch1);
                }
            }
            string decodedHtml = output.ToString();
            output.Dispose();
            return decodedHtml;
        }
        /// <summary>
        /// Escapes all character entity references (double escaping where necessary).
        /// Why? The XmlTextReader that is setup in XmlDocument.LoadXml on the service considers the character entity references (&#xxxx;) to be the character they represent.
        /// All XML is converted to unicode on reading and any such entities are removed in favor of the unicode character they represent.
        /// </summary>
        /// <param name="unencodedUserInput">The string that needs to be escaped.</param>
        /// <param name="numberOfEscapes">The number of escapes applied.</param>
        /// <returns>The escaped text.</returns>
        public static unsafe string HtmlEncode(this string unencodedUserInput, ref int numberOfEscapes)
        {
            if (string.IsNullOrEmpty(unencodedUserInput))
                return string.Empty;

            StringWriter output = new StringWriter(CultureInfo.InvariantCulture);
            
            if (output == null)
                throw new ArgumentNullException("output");
            int num1 = IndexOfHtmlEncodingChars(unencodedUserInput);
            if (num1 == -1)
            {
                output.Write(unencodedUserInput);
            }
            else
            {
                int num2 = unencodedUserInput.Length - num1;
                fixed (char* chPtr1 = unencodedUserInput)
                {
                    char* chPtr2 = chPtr1;
                    while (num1-- > 0)
                        output.Write(*chPtr2++);
                    while (num2-- > 0)
                    {
                        char ch = *chPtr2++;
                        if (ch <= 62)
                        {
                            switch (ch)
                            {
                                case '"':
                                    output.Write(""");
                                    numberOfEscapes++;
                                    continue;
                                case '&':
                                    output.Write("&amp;");
                                    numberOfEscapes++;
                                    continue;
                                case '\'':
                                    output.Write("&amp;#x27;");
                                    numberOfEscapes = numberOfEscapes + 2;
                                    continue;
                                case '<':
                                    output.Write("<");
                                    numberOfEscapes++;
                                    continue;
                                case '>':
                                    output.Write(">");
                                    numberOfEscapes++;
                                    continue;
                                case '/':
                                    output.Write("&amp;#x2F;");
                                    numberOfEscapes = numberOfEscapes + 2;
                                    continue;
                                default:
                                    output.Write(ch);
                                    continue;
                            }
                        }
                        if (ch >= 160 && ch < 256)
                        {
                            output.Write("&#");
                            output.Write(((int)ch).ToString(NumberFormatInfo.InvariantInfo));
                            output.Write(';');
                            numberOfEscapes++;
                        }
                        else
                            output.Write(ch);
                    }
                }
            }
            string encodedHtml = output.ToString();
            output.Dispose();
            return encodedHtml;
        }

 

        private static unsafe int IndexOfHtmlEncodingChars(string searchString)
        {
            int num = searchString.Length;
            fixed (char* chPtr1 = searchString)
            {
                char* chPtr2 = (char*)((UIntPtr)chPtr1);
                for (; num > 0; --num)
                {
                    char ch = *chPtr2;
                    if (ch <= 62)
                    {
                        switch (ch)
                        {
                            case '"':
                            case '&':
                            case '\'':
                            case '<':
                            case '>':
                            case '/':
                                return searchString.Length - num;
                        }
                    }
                    else if (ch >= 160 && ch < 256)
                        return searchString.Length - num;
                    ++chPtr2;
                }
            }
            return CharacterIndexNotFound;
        }

        private static char[] _htmlEntityEndingChars = new char[2]
        {
            ';',
            '&'
        };
        private static class HtmlEntities
        {
            private static string[] _entitiesList = new string[253]
            {
                "\"-quot",
                "&-amp",
                "'-apos",
                "<-lt",
                ">-gt",
                " -nbsp",
                "¡-iexcl",
                "¢-cent",
                "£-pound",
                "¤-curren",
                "¥-yen",
                "¦-brvbar",
                "§-sect",
                "¨-uml",
                "©-copy",
                "ª-ordf",
                "«-laquo",
                "¬-not",
                "\x00AD-shy",
                "®-reg",
                "¯-macr",
                "°-deg",
                "±-plusmn",
                "\x00B2-sup2",
                "\x00B3-sup3",
                "´-acute",
                "µ-micro",
                "¶-para",
                "·-middot",
                "¸-cedil",
                "\x00B9-sup1",
                "º-ordm",
                "»-raquo",
                "\x00BC-frac14",
                "\x00BD-frac12",
                "\x00BE-frac34",
                "¿-iquest",
                "À-Agrave",
                "Á-Aacute",
                "Â-Acirc",
                "Ã-Atilde",
                "Ä-Auml",
                "Å-Aring",
                "Æ-AElig",
                "Ç-Ccedil",
                "È-Egrave",
                "É-Eacute",
                "Ê-Ecirc",
                "Ë-Euml",
                "Ì-Igrave",
                "Í-Iacute",
                "Î-Icirc",
                "Ï-Iuml",
                "Ð-ETH",
                "Ñ-Ntilde",
                "Ò-Ograve",
                "Ó-Oacute",
                "Ô-Ocirc",
                "Õ-Otilde",
                "Ö-Ouml",
                "×-times",
                "Ø-Oslash",
                "Ù-Ugrave",
                "Ú-Uacute",
                "Û-Ucirc",
                "Ü-Uuml",
                "Ý-Yacute",
                "Þ-THORN",
                "ß-szlig",
                "à-agrave",
                "á-aacute",
                "â-acirc",
                "ã-atilde",
                "ä-auml",
                "å-aring",
                "æ-aelig",
                "ç-ccedil",
                "è-egrave",
                "é-eacute",
                "ê-ecirc",
                "ë-euml",
                "ì-igrave",
                "í-iacute",
                "î-icirc",
                "ï-iuml",
                "ð-eth",
                "ñ-ntilde",
                "ò-ograve",
                "ó-oacute",
                "ô-ocirc",
                "õ-otilde",
                "ö-ouml",
                "÷-divide",
                "ø-oslash",
                "ù-ugrave",
                "ú-uacute",
                "û-ucirc",
                "ü-uuml",
                "ý-yacute",
                "þ-thorn",
                "ÿ-yuml",
                "Œ-OElig",
                "œ-oelig",
                "Š-Scaron",
                "š-scaron",
                "Ÿ-Yuml",
                "ƒ-fnof",
                "\x02C6-circ",
                "˜-tilde",
                "Α-Alpha",
                "Β-Beta",
                "Γ-Gamma",
                "Δ-Delta",
                "Ε-Epsilon",
                "Ζ-Zeta",
                "Η-Eta",
                "Θ-Theta",
                "Ι-Iota",
                "Κ-Kappa",
                "Λ-Lambda",
                "Μ-Mu",
                "Ν-Nu",
                "Ξ-Xi",
                "Ο-Omicron",
                "Π-Pi",
                "Ρ-Rho",
                "Σ-Sigma",
                "Τ-Tau",
                "Υ-Upsilon",
                "Φ-Phi",
                "Χ-Chi",
                "Ψ-Psi",
                "Ω-Omega",
                "α-alpha",
                "β-beta",
                "γ-gamma",
                "δ-delta",
                "ε-epsilon",
                "ζ-zeta",
                "η-eta",
                "θ-theta",
                "ι-iota",
                "κ-kappa",
                "λ-lambda",
                "μ-mu",
                "ν-nu",
                "ξ-xi",
                "ο-omicron",
                "π-pi",
                "ρ-rho",
                "ς-sigmaf",
                "σ-sigma",
                "τ-tau",
                "υ-upsilon",
                "φ-phi",
                "χ-chi",
                "ψ-psi",
                "ω-omega",
                "ϑ-thetasym",
                "ϒ-upsih",
                "ϖ-piv",
                " -ensp",
                " -emsp",
                " -thinsp",
                "\x200C-zwnj",
                "\x200D-zwj",
                "\x200E-lrm",
                "\x200F-rlm",
                "–-ndash",
                "—-mdash",
                "‘-lsquo",
                "’-rsquo",
                "‚-sbquo",
                "“-ldquo",
                "”-rdquo",
                "„-bdquo",
                "†-dagger",
                "‡-Dagger",
                "•-bull",
                "…-hellip",
                "‰-permil",
                "′-prime",
                "″-Prime",
                "‹-lsaquo",
                "›-rsaquo",
                "‾-oline",
                "⁄-frasl",
                "€-euro",
                "ℑ-image",
                "℘-weierp",
                "ℜ-real",
                "™-trade",
                "ℵ-alefsym",
                "←-larr",
                "↑-uarr",
                "→-rarr",
                "↓-darr",
                "↔-harr",
                "↵-crarr",
                "⇐-lArr",
                "⇑-uArr",
                "⇒-rArr",
                "⇓-dArr",
                "⇔-hArr",
                "∀-forall",
                "∂-part",
                "∃-exist",
                "∅-empty",
                "∇-nabla",
                "∈-isin",
                "∉-notin",
                "∋-ni",
                "∏-prod",
                "∑-sum",
                "−-minus",
                "∗-lowast",
                "√-radic",
                "∝-prop",
                "∞-infin",
                "∠-ang",
                "∧-and",
                "∨-or",
                "∩-cap",
                "∪-cup",
                "∫-int",
                "∴-there4",
                "∼-sim",
                "≅-cong",
                "≈-asymp",
                "≠-ne",
                "≡-equiv",
                "≤-le",
                "≥-ge",
                "⊂-sub",
                "⊃-sup",
                "⊄-nsub",
                "⊆-sube",
                "⊇-supe",
                "⊕-oplus",
                "⊗-otimes",
                "⊥-perp",
                "⋅-sdot",
                "⌈-lceil",
                "⌉-rceil",
                "⌊-lfloor",
                "⌋-rfloor",
                "〈-lang",
                "〉-rang",
                "◊-loz",
                "♠-spades",
                "♣-clubs",
                "♥-hearts",
                "♦-diams"
            };
            private static Dictionary<string, char> _lookupTable = GenerateLookupTable();

            private static Dictionary<string, char> GenerateLookupTable()
            {
                Dictionary<string, char> dictionary = new Dictionary<string, char>(StringComparer.Ordinal);
                foreach (string str in _entitiesList)
                    dictionary.Add(str.Substring(2), str[0]);
                return dictionary;
            }

            public static char Lookup(string entity)
            {
                char ch;
                _lookupTable.TryGetValue(entity, out ch);
                return ch;
            }
        }
    }
}

You may also notice that I’ve mocked the OperationContext.
Thanks to WCFMock, a mocking framework for WCF services.
I won’t include this code, but you can get it here.
I’ve used the popular NUnit test framework and RhinoMocks for the stubbing and mocking.
Both pulled into the solution using NuGet.
Most useful documentation for RhinoMocks:
http://ayende.com/Wiki/Rhino+Mocks+3.5.ashx
http://ayende.com/wiki/Rhino+Mocks.ashx

For this project I used NLog and wrapped it.
Now you start to get an idea of how to use the sanitisation.

using System;
using System.ServiceModel;
using System.ServiceModel.Channels;
using NUnit.Framework;
using System.Configuration;
using Rhino.Mocks;
using Common.Wrapper.Log;
using MockedOperationContext = System.ServiceModel.Web.MockedOperationContext;
using Common.WcfHelpers.ErrorHandling.Exceptions;

namespace Sanitisation.UnitTest
{
    [TestFixture]
    public class SanitiseTest
    {
        private const string _myTestIpv4Address = "My.Test.Ipv4.Address";
        private readonly int _maxLengthHtmlEncodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlEncodedUserInput"]);
        private readonly int _maxLengthHtmlDecodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlDecodedUserInput"]);
        private readonly string _encodedUserInput_thatsMaxDecodedLength = @"One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.";
        private readonly string _decodedUserInput_thatsMaxLength = @"One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.";

        [Test]
        public void Sanitise_UserInput_WhenGivenNull_ShouldReturnEmptyString()
        {
            Assert.That(new Sanitise().UserInput(null), Is.EqualTo(string.Empty));
        }

        [Test]
        public void Sanitise_UserInput_WhenGivenEmptyString_ShouldReturnEmptyString()
        {
            Assert.That(new Sanitise().UserInput(string.Empty), Is.EqualTo(string.Empty));
        }

        [Test]
        public void Sanitise_UserInput_WhenGivenSanitisedString_ShouldReturnSanitisedString()
        {
            // Open the whitelist up in order to test the encoding without restriction.
            Assert.That(new Sanitise(whiteList: @"^[\w\s\.,#/&'<"">]+$").UserInput(_encodedUserInput_thatsMaxDecodedLength), Is.EqualTo(_encodedUserInput_thatsMaxDecodedLength));
        }
        [Test]
        [ExpectedException(typeof(SanitisationWcfException))]
        public void Sanitise_UserInput_ShouldThrowExceptionIfEscapedInputToLong()
        {
            string fourThousandAndOneCharacters = "Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand character";
            string expectedError = "The un-modified string received from the client with the following IP address: " +
                   '"' + _myTestIpv4Address + "\" " +
                   "exceeded the allowed maximum length of an escaped Html user input string. " +
                   "The maximum length allowed is: " +
                   _maxLengthHtmlEncodedUserInput +
                   ". The length was: " +
                   (_maxLengthHtmlEncodedUserInput+1) + ".";

            using(new MockedOperationContext(StubbedOperationContext))
            {
                try
                {
                    new Sanitise().UserInput(fourThousandAndOneCharacters);
                }
                catch(SanitisationWcfException e)
                {
                    Assert.That(e.Message, Is.EqualTo(expectedError));
                    Assert.That(e.UnsanitisedAnswer, Is.EqualTo(fourThousandAndOneCharacters));
                    throw;
                }
            }
        }
        [Test]
        [ExpectedException(typeof(SanitisationWcfException))]
        public void Sanitise_UserInput_DecodedUserInputShouldThrowException_WhenMaxLengthHtmlDecodedUserInputIsExceeded()
        {
            char oneCharOverTheLimit = '.';
            string expectedError =
                           "The string received from the client with the following IP address: " +
                           "\"" + _myTestIpv4Address + "\" " +
                           "after Html decoding exceded the allowed maximum length of an un-escaped Html user input string." +
                           Environment.NewLine +
                           "The maximum length allowed is: " + _maxLengthHtmlDecodedUserInput + ". The length was: " +
                           (_decodedUserInput_thatsMaxLength + oneCharOverTheLimit).Length + oneCharOverTheLimit;

            using(new MockedOperationContext(StubbedOperationContext))
            {
                try
                {
                    new Sanitise().UserInput(_encodedUserInput_thatsMaxDecodedLength + oneCharOverTheLimit);
                }
                catch(SanitisationWcfException e)
                {
                    Assert.That(e.Message, Is.EqualTo(expectedError));
                    Assert.That(e.UnsanitisedAnswer, Is.EqualTo(_encodedUserInput_thatsMaxDecodedLength + oneCharOverTheLimit));
                    throw;
                }
            }
        }
        [Test]
        public void Sanitise_UserInput_ShouldLogAndSendEmail_IfNumberOfDecodedHtmlEntitiesDoesNotMatchNumberOfEscapes()
        {
            string encodedUserInput_with6HtmlEntitiesNotEscaped = _encodedUserInput_thatsMaxDecodedLength.Replace("&amp;#x2F;", "/");
            string errorWeAreExpecting =
                "It appears as if someone has circumvented the client side Html entity encoding." + Environment.NewLine +
                "The requesting IP address was: " +
                "\"" + _myTestIpv4Address + "\" " +
                "The sanitised input we receive from the client was the following:" + Environment.NewLine +
                "\"" + encodedUserInput_with6HtmlEntitiesNotEscaped + "\"" + Environment.NewLine +
                "The same input after decoding and re-escaping on the server side was the following:" + Environment.NewLine +
                "\"" + _encodedUserInput_thatsMaxDecodedLength + "\"";
            string sanitised;
            // setup _logger
            ILogger logger = MockRepository.GenerateMock<ILogger>();
            logger.Expect(lgr => lgr.logError(errorWeAreExpecting));

            Sanitise sanitise = new Sanitise(@"^[\w\s\.,#/&'<"">]+$", logger);

            using (new MockedOperationContext(StubbedOperationContext))
            {
                // Open the whitelist up in order to test the encoding etc.
                sanitised = sanitise.UserInput(encodedUserInput_with6HtmlEntitiesNotEscaped);
            }

            Assert.That(sanitised, Is.EqualTo(_encodedUserInput_thatsMaxDecodedLength));
            logger.VerifyAllExpectations();
        }        

        private static IOperationContext StubbedOperationContext
        {
            get
            {
                IOperationContext operationContext = MockRepository.GenerateStub<IOperationContext>();
                int port = 80;
                RemoteEndpointMessageProperty remoteEndpointMessageProperty = new RemoteEndpointMessageProperty(_myTestIpv4Address, port);
                operationContext.Stub(oc => oc.IncomingMessageProperties[RemoteEndpointMessageProperty.Name]).Return(remoteEndpointMessageProperty);
                return operationContext;
            }
        }
    }
}

Now the API code that we can use to do our sanitisation.

using System;
using System.Configuration;
// Todo : KC We need time to implement DI. Should be using something like ninject.extensions.wcf.
using OperationContext = System.ServiceModel.Web.MockedOperationContext;
using System.ServiceModel.Channels;
using Common.Security.Sanitisation;
using Common.WcfHelpers.ErrorHandling.Exceptions;
using Common.Wrapper.Log;

namespace Sanitisation
{

    public class Sanitise
    {
        private readonly string _whiteList;
        private readonly ILogger _logger;
        

        private string RequestingIpAddress
        {
            get
            {
                RemoteEndpointMessageProperty remoteEndpointMessageProperty = OperationContext.Current.IncomingMessageProperties[RemoteEndpointMessageProperty.Name] as RemoteEndpointMessageProperty;
                return ((remoteEndpointMessageProperty != null) ? remoteEndpointMessageProperty.Address : string.Empty);
            }
        }
        /// <summary>
        /// Provides server side escaping of Html entities, and runs the supplied whitelist character filter over the user input string.
        /// </summary>
        /// <param name="whiteList">Should be provided by DI from the ResourceFile.</param>
        /// <param name="logger">Should be provided by DI. Needs to be an asynchronous logger.</param>
        /// <example>
        /// The whitelist can be obtained from a ResourceFile like so...
        /// <code>
        /// private Resource _resource;
        /// _resource.GetString("WhiteList");
        /// </code>
        /// </example>
        public Sanitise(string whiteList = "", ILogger logger = null)
        {
            _whiteList = whiteList;
            _logger = logger ?? new Logger();
        }
        /// <summary>
        /// 1) Check field lengths.         Client side validation may have been negated.
        /// 2) Check against white list.	Client side validation may have been negated.
        /// 3) Check Html escaping.         Client side validation may have been negated.

        /// Generic Fail actions:	Drop the payload. No point in trying to massage and save, as it won't be what the user was expecting,
        ///                         Add full error to a WCFException Message and throw.
        ///                         WCF interception reads the WCFException.MessageForClient, and sends it to the user. 
        ///                         On return, log the WCFException's Message.
        ///                         
        /// Escape Fail actions:	Asynchronously Log and email full error to support.


        /// 1) BA confirmed 50 for text, and 400 for textarea.
        ///     As we don't know the field type, we'll have to go for 400."
        ///
        ///     First we need to check that we haven't been sent some huge string.
        ///     So we check that the string isn't longer than 400 * 10 = 4000.
        ///     10 is the length of our double escaped character references.
        ///     Or, we ask the business for a number."
        ///     If we fail here, perform Generic Fail actions and don't complete the following steps.
        /// 
        ///     Convert all Html Entity Encodings back to their equivalent characters, and count how many occurrences.
        ///
        ///     If the string is longer than 400, perform Generic Fail actions and don't complete the following steps.
        /// 
        /// 2) check all characters against the white list
        ///     If any don't match, perform Generic Fail actions and don't complete the following steps.
        /// 
        /// 3) re html escape (as we did in JavaScript), and count how many escapes.
        ///     If count is greater than the count of Html Entity Encodings back to their equivalent characters,
        ///     Perform Escape Fail actions. Return sanitised string.
        /// 
        ///     If we haven't returned, return sanitised string.
        
        
        /// Performs checking on the text passed in, to verify that client side escaping and whitelist validation has already been performed.
        /// Performs decoding, and re-encodes. Counts that the number of escapes was the same, otherwise we log and send email with the details to support.
        /// Throws exception if the client side validation failed to restrict the number of characters in the escaped string we received.
        ///     This needs to be intercepted at the service.
        ///     The exceptions default message for client needs to be passed back to the user.
        ///     On return, the interception needs to log the exception's message.
        /// </summary>
        /// <param name="sanitiseMe"></param>
        /// <returns></returns>
        public string UserInput(string sanitiseMe)
        {
            if (string.IsNullOrEmpty(sanitiseMe))
                return string.Empty;

            ThrowExceptionIfEscapedInputToLong(sanitiseMe);

            int numberOfDecodedHtmlEntities = 0;
            string decodedUserInput = HtmlDecodeUserInput(sanitiseMe, ref numberOfDecodedHtmlEntities);

            if(!decodedUserInput.CompliesWithWhitelist(whiteList: _whiteList))
            {
                string error = "The answer received from client with the following IP address: " +
                    "\"" + RequestingIpAddress + "\" " +
                    "had characters that failed to match the whitelist.";
                throw new SanitisationWcfException(error);
            }

            int numberOfEscapes = 0;
            string sanitisedUserInput = decodedUserInput.HtmlEncode(ref numberOfEscapes);

            if(numberOfEscapes != numberOfDecodedHtmlEntities)
            {
                AsyncLogAndEmail(sanitiseMe, sanitisedUserInput);
            }

            return sanitisedUserInput;
        }
        /// <note>
        /// Make sure the logger is setup to log asynchronously
        /// </note>
        private void AsyncLogAndEmail(string sanitiseMe, string sanitisedUserInput)
        {
            // no need for SanitisationException

            _logger.logError(
                "It appears as if someone has circumvented the client side Html entity encoding." + Environment.NewLine +
                "The requesting IP address was: " +
                "\"" + RequestingIpAddress + "\" " +
                "The sanitised input we receive from the client was the following:" + Environment.NewLine +
                "\"" + sanitiseMe + "\"" + Environment.NewLine +
                "The same input after decoding and re-escaping on the server side was the following:" + Environment.NewLine +
                "\"" + sanitisedUserInput + "\""
                );
        }

        /// <summary>
        /// This procedure may throw a SanitisationWcfException.
        /// If it does, ErrorHandlerBehaviorAttribute will need to pass the "messageForClient" back to the client from within the IErrorHandler.ProvideFault procedure.
        /// Once execution is returned, the IErrorHandler.HandleError procedure of ErrorHandlerBehaviorAttribute
        /// will continue to process the exception that was thrown in the way of logging sensitive info.
        /// </summary>
        /// <param name="toSanitise"></param>
        private void ThrowExceptionIfEscapedInputToLong(string toSanitise)
        {
            int maxLengthHtmlEncodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlEncodedUserInput"]);
            if (toSanitise.Length > maxLengthHtmlEncodedUserInput)
            {
                string error = "The un-modified string received from the client with the following IP address: " +
                    "\"" + RequestingIpAddress + "\" " +
                    "exceeded the allowed maximum length of an escaped Html user input string. " +
                    "The maximum length allowed is: " +
                    maxLengthHtmlEncodedUserInput +
                    ". The length was: " +
                    toSanitise.Length + ".";
                throw new SanitisationWcfException(error, unsanitisedAnswer: toSanitise);
            }
        }

        private string HtmlDecodeUserInput(string doubleEncodedUserInput, ref int numberOfDecodedHtmlEntities)
        {
            string decodedUserInput = doubleEncodedUserInput.HtmlDecode(ref numberOfDecodedHtmlEntities).HtmlDecode(ref numberOfDecodedHtmlEntities) ?? string.Empty;
            
            // if the decoded string is longer than MaxLengthHtmlDecodedUserInput throw
            int maxLengthHtmlDecodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlDecodedUserInput"]);
            if(decodedUserInput.Length > maxLengthHtmlDecodedUserInput)
            {
                throw new SanitisationWcfException(
                    "The string received from the client with the following IP address: " +
                    "\"" + RequestingIpAddress + "\" " +
                    "after Html decoding exceded the allowed maximum length of an un-escaped Html user input string." +
                    Environment.NewLine +
                    "The maximum length allowed is: " + maxLengthHtmlDecodedUserInput + ". The length was: " +
                    decodedUserInput.Length + ".",
                    unsanitisedAnswer: doubleEncodedUserInput
                    );
            }
            return decodedUserInput;
        }
    }
}

As you can see, there’s a lot more work in the server side sanitisation than the client side.

Sanitising User Input from Browser. part 1

November 4, 2012

I was working on a web based project recently where there was no security thought about when designing, developing it.
The following outlines my experience with retrofitting security.
It’s my hope that someone will find it useful for their own implementation.

We’ll be focussing on the client side in this post (part 1) and the server side in part 2.
We’ll also cover some preliminary discussion that will set the stage for this series.

The application consists of a WCF service delivering up content to some embedding code on any page in the browser.
The content is stored as Xml in the database and transformed into Html via Xslt.

The first activity I find useful is to go through the process of Threat Modelling the Application.
This process can be quite daunting for those new to it.
Here’s a couple of references I find quite useful to get started:

https://www.owasp.org/index.php/Application_Threat_Modeling

https://www.owasp.org/index.php/Threat_Risk_Modeling#Decompose_Application

Actually this ones not bad either.

There is no single right way to do this.
The more you read and experiment, the more equipped you will be.
The idea is to think like an attacker thinks.
This may be harder for some than others, but it is essential, to cover as many potential attack vectors as possible.
Remember, there is no secure system, just varying levels of insecurity.
It will always be much harder to discover the majority of security weaknesses in your application as the person or team creating/maintaining it,
than for the person attacking it.
The Threat Modelling topic is large and I’m not going to go into it here, other than to say, you need to go into it.

Threat Agents

Work out who your Threat Agents are likely to be.
Learn how to think like they do.
Learn what skills they have and learn the skills your self.
Sometimes the skills are very non technical.
For example walking through the door of your organisation in the weekend because the cleaners (or any one with access) forgot to lock up.
Or when the cleaners are there and the technical staff are not (which is just as easy).
It happens more often than we like to believe.

Defense in Depth

To attempt to mitigate attacks, we need to take a multi layered approach (often called defence in depth).

What made me decide to start with sanitising user input from the browser anyway?
Well according to the OWASP Top 10, Injection and Cross Site Scripting (XSS) are still the most popular techniques chosen to compromise web applications.
So it makes sense if your dealing with web apps, to target the most common techniques exploited.

Now, in regards to defence in depth when discussing web applications;
If the attacker gets past the first line of defence, there should be something stopping them at the next layer and so forth.
The aim is to stop the attack as soon as possible.
This is why we focus on the UI first, and later move our focus to the application server, then to the database.
Bear in mind though, that what ever we do on the client side, can be circumvented relatively easy.
Client side code is out of our control, so it’s best effort.
Because of this, we need to perform the following not only in the browser, but as much as possible on the server side as well.

  1. Minimising the attack surface
  2. Defining maximum field lengths (validation)
  3. Determining a white list of allowable characters (validation)
  4. Escaping untrusted data, especially where you know it’s going to endup in an execution context. Even where you don’t think this is likely, it’s still possible.
  5. Using Stored Procedures / parameterised queries (not covered in this series).
  6. Least Privilege.
    Minimising the privileges assigned to every database account (not covered in this series).

Minimising the attack surface

input fields should only allow certain characters to be input.
Text input fields, textareas etc that are free form (anything is allowed) are very hard to constrain to a small white list.
input fields where ever possible should be constrained to well structured data,
like dates, social security numbers, zip codes, e-mail addresses, etc. then the developer should be able to define a very strong validation pattern, usually based on regular expressions, for validating such input. If the input field comes from a fixed set of options, like a drop down list or radio buttons, then the input needs to match exactly one of the values offered to the user in the first place.
As it was with the existing app I was working on, we had to allow just about everything in our free form text fields.
This will have to be re-designed in order to provide constrained input.

Defining maximum field lengths (validation)

This was currently being done (sometimes) in the Xml content for inputs where type="text".
Don’t worry about the inputType="single", it gets transformed.

<input id="2" inputType="single" type="text" size="10" maxlength="10" />

And if no maxlength specified in the Xml, we now specify a default of 50 in the xsl used to do the transformation.
This way we had the input where type="text" covered for the client side.
This would also have to be validated on the server side when the service received values from these inputs where type="text".

    <xsl:template match="input[@inputType='single']">
      <xsl:value-of select="@text" />
        <input name="a{@id}" type="text" id="a{@id}" class="textareaSingle">
          <xsl:attribute name="value">
            <xsl:choose>
              <xsl:when test="key('response', @id)">
                <xsl:value-of select="key('response', @id)" />
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="string(' ')" />
              </xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <xsl:attribute name="maxlength">
            <xsl:choose>
              <xsl:when test="@maxlength">
                <xsl:value-of select="@maxlength"/>
              </xsl:when>
              <xsl:otherwise>50</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
        </input>
        <br/>
    </xsl:template>

For textareas we added maxlength validation as part of the white list validation.
See below for details.

Determining a white list of allowable characters (validation)

See bottom of this section for Update

Now this was quite an interesting exercise.
I needed to apply a white list to all characters being entered into the input fields.
A user can:

  1. type the characters in
  2. [ctrl]+[v] a clipboard full of characters in
  3. right click -> Paste

To cover all these scenarios as elegantly as possible, was going to be a bit of a challenge.
I looked at a few JavaScript libraries including one or two JQuery plug-ins.
None of them covered all these scenarios effectively.
I wish they did, because the solution I wasn’t totally happy with, because it required polling.
In saying that, I measured performance, and even bringing the interval right down had negligible effect, and it covered all scenarios.

setupUserInputValidation = function () {

  var textAreaMaxLength = 400;
  var elementsToValidate;
  var whiteList = /[^A-Za-z_0-9\s.,]/g;

  var elementValue = {
    textarea: '',
    textareaChanged: function (obj) {
      var initialValue = obj.value;
      var replacedValue = initialValue.replace(whiteList, "").slice(0, textAreaMaxLength);
      if (replacedValue !== initialValue) {
        this.textarea = replacedValue;
        return true;
      }
      return false;
    },
    inputtext: '',
    inputtextChanged: function (obj) {
      var initialValue = obj.value;
      var replacedValue = initialValue.replace(whiteList, "");
      if (replacedValue !== initialValue) {
        this.inputtext = replacedValue;
        return true;
      }
      return false;
    }
  };

  elementsToValidate = {
    textareainputelements: (function () {
      var elements = $('#page' + currentPage).find('textarea');
      if (elements.length > 0) {
        return elements;
      }
      return 'no elements found';
    } ()),
    textInputElements: (function () {
      var elements = $('#page' + currentPage).find('input[type=text]');
      if (elements.length > 0) {
        return elements;
      }
      return 'no elements found';
    } ())
  };

  // store the intervals id in outer scope so we can clear the interval when we change pages.
  userInputValidationIntervalId = setInterval(function () {
    var element;

    // Iterate through each one and remove any characters not in the whitelist.
    // Iterate through each one and trim any that are longer than textAreaMaxLength.

    for (element in elementsToValidate) {
      if (elementsToValidate.hasOwnProperty(element)) {
        if (elementsToValidate[element] === 'no elements found')
          continue;

        $.each(elementsToValidate[element], function () {
          $(this).attr('value', function () {
            var name = $(this).prop('tagName').toLowerCase();
            name = name === 'input' ? name + $(this).prop('type') : name;
            if (elementValue[name + 'Changed'](this))
              this.value = elementValue[name];
          });
        });
      }
    }
  }, 300); // milliseconds
};

Each time we change page, we clear the interval and reset it for the new page.

clearInterval(userInputValidationIntervalId);

setupUserInputValidation();

Update 2013-06-02:

Now with HTML5 we have the pattern attribute on the input tag, which allows us to specify a regular expression that the text about to be received is checked against. We can also see it here amongst the new HTML5 attributes . If used, this can make our JavaScript white listing redundant, providing we don’t have textareas which W3C has neglected to include the new pattern attribute on. I’d love to know why?

Escaping untrusted data

Escaped data will still render in the browser properly.
Escaping simply lets the interpreter know that the data is not intended to be executed,
and thus prevents the attack.

Now what we do here is extend the String prototype with a function called htmlEscape.

if (typeof Function.prototype.method !== "function") {
  Function.prototype.method = function (name, func) {
    this.prototype[name] = func;
    return this;
  };
}

String.method('htmlEscape', function () {

  // Escape the following characters with HTML entity encoding to prevent switching into any execution context,
  // such as script, style, or event handlers.
  // Using hex entities is recommended in the spec.
  // In addition to the 5 characters significant in XML (&, <, >, ", '), the forward slash is included as it helps to end an HTML entity.
  var character = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    // Double escape character entity references.
    // Why?
    // The XmlTextReader that is setup in XmlDocument.LoadXml on the service considers the character entity references () to be the character they represent.
    // All XML is converted to unicode on reading and any such entities are removed in favor of the unicode character they represent.
    // So we double escape character entity references.
    // These now get read to the XmlDocument and saved to the database as double encoded Html entities.
    // Now when these values are pulled from the database and sent to the browser, it decodes the & and displays #x27; and/or #x2F.
    // This isn't what we want to see in the browser.
    "'": '&amp;#x27;',    // &apos; is not recommended
    '/': '&amp;#x2F;'     // forward slash is included as it helps end an HTML entity
  };

  return function () {
    return this.replace(/[&<>"'/]/g, function (c) {
      return character[c];
    });
  };
}());

This allows us to, well, html escape our strings.

element.value.htmlEscape();

In looking through here,
The only untrusted data we are capturing is going to be inserted into an Html element

tag by way of insertion into a textarea element,
or the attribute value of input elements where type="text".
I initially thought I’d have to:

  1. Html escape the untrusted data which is only being captured from textarea elements.
  2. Attribute escape the untrusted data which is being captured from the value attribute of input elements where type="text".

RULE #2 – Attribute Escape Before Inserting Untrusted Data into HTML Common Attributes of here,
mentions
“Properly quoted attributes can only be escaped with the corresponding quote.”
So I decided to test it.
Created a collection of injection attacks. None of which worked.
Turned out we only needed to Html escape for the untrusted data that was going to be inserted into the textarea element.
More on this in a bit.

Now in regards to the code comments in the above code around having to double escape character entity references;
Because we’re sending the strings to the browser, it’s easiest to single decode the double encoded Html on the service side only.
Now because we’re still focused on the client side sanitisation,
and we are going to shift our focus soon to making sure we cover the server side,
we know we’re going to have to create some sanitisation routines for our .NET service.
Because the routines are quite likely going to be static, and we’re pretty much just dealing with strings,
lets create an extensions class in a new project in a common library we’ve already got.
This will allow us to get the widest use out of our sanitisation routines.
It also allows us to wrap any existing libraries or parts of them that we want to get use of.

namespace My.Common.Security.Encoding
{
    /// <summary>
    /// Provides a series of extension methods that perform sanitisation.
    /// Escaping, unescaping, etc.
    /// Usually targeted at user input, to help defend against the likes of XSS attacks.
    /// </summary>
    public static class Extensions
    {
        /// <summary>
        /// Returns a new string in which all occurrences of a double escaped html character (that's an html entity immediatly prefixed with another html entity)
        /// in the current instance are replaced with the single escaped character.
        /// </summary>
        ///
        /// The new string.
        public static string SingleDecodeDoubleEncodedHtml(this string source)
        {
            return source.Replace("&amp;#x", "&#x");
        }
    }
}

Now when we run our xslt transformation on the service, we chain our new extension method on the end.
Which gives us back a single encoded string that the browser is happy to display as the decoded value.

return Transform().SingleDecodeDoubleEncodedHtml();

Now back to my findings from the test above.
Turns out that “Properly quoted attributes can only be escaped with the corresponding quote.” really is true.
I thought that if I entered something like the following into the attribute value of an input element where type="text",
then the first double quote would be interpreted as the corresponding quote,
and the end double quote would be interpreted as the end quote of the onmouseover attribute value.

 " onmouseover="alert(2)

What actually happens, is during the transform…

xslCompiledTransform.Transform(xmlNodeReader, args, writer, new XmlUrlResolver());

All the relevant double quotes are converted to the double quote Html entity ‘”‘ without the single quotes.

onmouseover

And all double quotes are being stored in the database as the character value.

Libraries and useful code

Microsoft Anti-Cross Site Scripting Library

OWASP Encoding Project
This is the Reform library. Supports Perl, Python, PHP, JavaScript, ASP, Java, .NET

Online escape tool supporting Html escape/unescape, Java, .NET, JavaScript

The characters that need escaping for inserting untrusted data into Html element content

JavaScript The Good Parts: pg 90 has a nice ‘entityify’ function

OWASP Enterprise Security API Used for JavaScript escaping (ESAPI4JS)

JQuery plugin

Changing encoding on html page

Cheat Sheets and Check Lists I found helpful

https://www.owasp.org/index.php/Input_Validation_Cheat_Sheet

https://www.owasp.org/index.php/OWASP_Validation_Regex_Repository

https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet

https://www.owasp.org/index.php/DOM_based_XSS_Prevention_Cheat_Sheet

https://www.owasp.org/index.php/OWASP_AJAX_Security_Guidelines

If any of this is unclear, let me know and I’ll do my best to clarify. Maybe you have suggestions of how this could have been improved? Let’s spark a discussion.