How to Increase Software Developer Productivity

March 2, 2013

Is your organisation:

  • Wanting to get more out of your Software Developers?
  • Wanting to increase RoI?
  • Spending too much money fixing bugs?
  • Development team not releasing business value fast enough?
  • Maybe you’re a software developer and you want to lift your game to the next level?

If any of these points are of concern to you… read on.

There are many things we can do to lift a software developer’s productivity and thus the total output of The Development Team. I’m going to address some quick and cheap wins, followed by items that may take a little longer to implement but nonetheless will in many cases provide even greater results.

Whatever it takes to remove friction and empower your software developers to work with the fewest interruptions, do it.
Allow them to create a space that they love working in. I know when I work from home my days are far more productive than when working for a company that insists on cramming as many workers as possible into a small space. Chitter chatter from behind, both sides and in front of you does not help anyone get their mind into a state of deep thought.

I have included thoughts from Nicholas C. Zakas’ post to reiterate the common fallacies uttered by non-engineers.

  • I don’t understand why this is such a big deal. Isn’t it just a few lines of code? (Technically, everything is a few lines of code. That doesn’t make it easy or simple.)
  • {insert name here} says it can be done in a couple of days. (That’s because {insert name here} already has perfect knowledge of the solution. I don’t, I need to learn it first.)
  • What can we do to make this go faster? Do you need more engineers? (Throwing more engineers at a problem frequently makes it worse. The only way to get something built faster is to build a smaller thing.)

Screen real estate

When writing code, a software developer’s work requires a lot of time spent deep in thought, holding multiple layers of complexity within immediately accessible memory.
One of the big wins I’ve found that helps with continuity is maximising your screen real estate.
I’ve now moved up to 3 x 27″ 2560×1440 IPS flat panels. These are absolutely gorgeous to look at/work with.
Software development generally requires a large number of applications to be running at any one time.
For example in any average session for me, I generally have somewhere around 30 windows open.
The more screen real estate a developer has, the less he/she has to fossick around for what he/she needs and switch between applications.
Also, the fewer brain cycles he/she has to spend locating that next running application, the more cycles are left for the real work.
So, the smaller the gap when switching between, say, one code editor and another, the easier it is for a developer to keep the big picture in memory.
We’re looking at:

  1. physical screen size
  2. total pixel count

The greater the real estate available (physical screen size and pixel count), the more information you can have instant access to, which means:

  • less waiting
  • less memory loss
  • less time spent rebuilding structures in your head
  • greater continuity

Which then gives your organisation and developers:

  • greater productivity
  • greater RoI

These screens are cheaper than many realise. I set these up 4 months ago. They continue to drop in price.

  1. FSM-270YG 27″ PC Monitor LED S-IPS WIDE 2560×1440 16:9 WQHD DVI-D $470.98 NZD
  2. [QH270-IPSMS] Achieva ShiMian HDMI DVI D-Sub 27″ LG LED 2560×1440 $565.05 NZD
  3. [QH270-IPSMS] Achieva ShiMian HDMI DVI D-Sub 27″ LG LED 2560×1440 $565.05 NZD

It’s simply not worth holding off on upgrading to these types of panels.

korean monitors

In this setup, I’m running Linux Mint Maya. Besides the IPS panels, I’m using the following hardware.

  • Video card: 1 x Gigabyte GV-N650OC-2GI GTX 650 PCIE
  • PSU: 1200w Corsair AX1200 (Corsair AX means no more PSU troubles (7 yr warranty))
  • CPU: Intel Core i7 3820 3.60GHz (2011)
  • Mobo: Asus P9X79
  • HDD: 1TB Western Digital WD10EZEX Caviar Blue
  • RAM: Corsair 16GB (2x8GB) Vengeance Performance Memory Module DDR3 1600MHz

One of the ShiMian panels is using the VGA port on the video card, as the FSM-270YG only supports DVI.
The other ShiMian and the FSM-270YG are hooked up to the 2 DVI-D (dual link) ports on the video card. The two panels feeding on the dual link are obviously a lot clearer than the panel feeding on the VGA. Also, I can reduce the size of the text considerably, giving me greater clarity while reading and enabling me to fit a lot more information on the screens.

With this development box, I’m never left waiting for the machine to catch up with my thought process.
So don’t skimp on hardware. It just doesn’t make sense any way you look at it.

Machine Speed

The same goes for your machine speed. If you have to wait for your machine to do what you’ve commanded it to do, and at the same time try to keep a complex application structure in your head, the likelihood of losing part of that picture increases. Plus your brain has to work harder to hold the image in memory while you’re trying to maintain continuity of thought, again spending precious cycles on something that shouldn’t be required rather than on the essential work. When a developer loses part of this picture, they have to rebuild it again when the machine finishes executing the last command given. This is re-work that should not be necessary.

An interesting observation from Joel Spolsky:

“The longer it takes to task switch, the bigger the penalty you pay for multitasking.
OK, back to the more interesting topic of managing humans, not CPUs. The trick here is that when you manage programmers, specifically, task switches take a really, really, really long time. That’s because programming is the kind of task where you have to keep a lot of things in your head at once. The more things you remember at once, the more productive you are at programming. A programmer coding at full throttle is keeping zillions of things in their head at once: everything from names of variables, data structures, important APIs, the names of utility functions that they wrote and call a lot, even the name of the subdirectory where they store their source code. If you send that programmer to Crete for a three week vacation, they will forget it all. The human brain seems to move it out of short-term RAM and swaps it out onto a backup tape where it takes forever to retrieve.”

Many of my posts so far have been focused on productivity enhancements. Essentially increasing RoI. This list will continue to grow.

Coding Standards and Guidelines

Agreeing on a set of Coding Standards and Guidelines and policing them (generally by way of code reviews and check-in commit scripts) means software developers get to spend less time thinking about things that they don’t need to and get to throw more time at the real problems.

For example:
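
A check-in commit script could be as simple as the following minimal sketch of a Git pre-commit hook written for Node (the JSHint command, the hook location and the file filtering are assumptions for illustration, not a specific setup from this post):

#!/usr/bin/env node
// Hypothetical .git/hooks/pre-commit script.
// Lints the staged JavaScript files and aborts the commit if any violation is found.
var exec = require('child_process').exec;

exec('git diff --cached --name-only --diff-filter=ACM', function (error, stdout) {
    var jsFiles = stdout.split('\n').filter(function (name) {
        return /\.js$/.test(name);
    });

    if (jsFiles.length === 0) {
        return process.exit(0); // nothing to lint
    }

    exec('jshint ' + jsFiles.join(' '), function (lintError, lintOutput) {
        if (lintError) {
            console.error(lintOutput);
            process.exit(1); // a non-zero exit code aborts the commit
        }
        process.exit(0);
    });
});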

Better Tooling

Improving tool sets has huge gains in productivity. In most cases many of the best tools are free. For example, moving from the likes of non-distributed source control systems to best-of-breed distributed ones.

There are many more that should be considered.

Wiki

Implementing an excellent wiki that is easy to use. I’ve put a few wikis in place now and have used even more. My current pick of the bunch would have to be Atlassian’s Confluence. I’ve installed this on a local server and also migrated the instance to their cloud. There are varying plans, all very reasonably priced, with excellent support. If the wiki you’re planning on using is not as intuitive as it could be, developers just won’t use it. So don’t settle for anything less.

Improving Processes

Code Reviews

Also a very important step in all successful development teams, and often a discipline that must be satisfied as part of Scrum’s Definition of Done (DoD). What this gives us is high-quality designs and code, conforming to the coding standards. This reduces defects and duplicate code (DRY) and enforces easily readable code, as the reviewer has to understand it. It saves a lot of money in re-work.

Cost of Change

Scott Ambler’s Cost of Change curve

Definition of Done (DoD)

Get The Team together and decide on what it means to have each Product Backlog Item that’s pulled into the Sprint Done.
Here’s an example of a DoD that one of my previous Development Teams compiled:

Definition of Done

What does Done actually mean?

Come Sprint Review on the last day of the Sprint, everyone knows what it means to be done. There is no “well I thought it was Done because I’ve written the code for it, but it’s not tested yet”.

Continuous Integration (CI)

There are many tools and ways to implement CI. What does CI give you? Visibility of code quality, adherence to standards, reports on cyclomatic complexity, predictability and quite a number of other positive side effects. You’ll know as soon as the code fails to build and/or your fast running tests (unit tests) fail. This means The Development Team don’t keep writing code on top of faulty code, thus reducing technical debt by not having to undo changes on changes later down the track.
I’ve used a number of these tools and have carried out extensive research and evaluation spikes on a number of the most popular offerings. In order of preference, the following are my candidates.

  1. Jenkins (free and open source, with a great community)
  2. TeamCity
  3. Atlassian Bamboo

Release Plans

Make sure you have these. This will reduce confusion and provide a clear definition of the steps involved to get your software out the door. This will reduce the likelihood of screwing up a release and re-work being required. You’ll definitely need one of these for the next item.

Here’s an example of a release notes guideline I wrote for one of the previous companies I worked for.

release notes

Continuous Deployment

If using Scrum, The Scrum Team will be forecasting a potentially releasable Increment (the sum of all the Product Backlog items completed during a Sprint and all previous Sprints).
You may decide to actually release this. When you do, you can look at the possibility of automating this deployment, thus reducing the workload of the release manager or whoever usually deploys (often The Development Team in a Scrum environment). This has the added benefit of consistency, predictability, reliability and of course happy customers. I’ve also been through this process of research and evaluation on the tools available and the techniques to implement.

Here’s a good podcast that got me started. I’ve got a collection of other resources if you need them and can offer you my experience in this process. Just leave a comment.

Implement Scrum (and not the Flaccid flavour)

I hope this goes without saying?
Implementing Scrum to provide ultimate visibility

Get maximum quality out of the least money spent

How to get the most out of your limited QA budget

Driving your designs with tests, thus creating maintainable code, thus reducing technical debt.

Hold Retrospectives

Scrum is big on continual inspection and adaptation, self-organisation and fostering innovation. The military have another term for inspection and adaptation. It’s called the OODA Loop.
The Retrospective is just one of the Scrum Events that enable The Scrum Team to continually inspect the way they are doing things and improve the way they develop and deliver business value.

Invest a little into your servant leaders

Empowering the servant leaders.

Context Switching

Don’t do it. This is a real killer.
This is hard. What you need to do is be aware of how much productivity is killed with each switch. Then do everything in your power to make sure your Development Team is sheltered from as much of it as possible. There are many ways to do this. For starters, you’re going to need as much visibility as possible into how much this is currently happening. Track ad hoc requests and any other types of interruptions that steal the developers’ concentration. In the last Scrum Team that I was Scrum Master of, The Development Team decided to add another metric to the burn down chart that was on the middle of the wall, clearly visible to all. Every time one of the developers was interrupted during a Sprint, they would record the time, the reason and who interrupted them, on the burn down chart. The Scrum Team would then address this during the Retrospective, empirically examine why it happened, and work out how to stop it happening every Sprint. Jeff Atwood has an informative post on why and how context switching/multitasking kills productivity. Be sure to check it out.

As always, if anything I’ve mentioned isn’t completely clear, or you have any questions, please leave a comment 🙂

Establishing your SSH Server’s Key Fingerprint

February 16, 2013

When you connect to a remote host via SSH that you haven’t established a trust relationship with before,
you’re going to be told that the authenticity of the host you’re attempting to connect to can’t be established.

me@mybox ~ $ ssh me@10.1.1.40
The authenticity of host '10.1.1.40 (10.1.1.40)' can't be established.
RSA key fingerprint is 23:d9:43:34:9c:b3:23:da:94:cb:39:f8:6a:95:c6:bc.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no':

Do you type yes to continue without actually knowing that it is the host you think it is? Well, if you do, you should be more careful. The fingerprint that’s being put in front of you could be that of a Man In The Middle (MITM). You can query the target (from its shell of course) for the fingerprint of its key easily. On Debian you’ll find the keys in /etc/ssh/

Running

ls /etc/ssh/

you should get a listing that reveals the private and public keys. Run the following command on the appropriate key to reveal its fingerprint. For example, if SSH is using RSA:

ssh-keygen -lf ssh_host_rsa_key.pub

For example, if SSH is using DSA:

ssh-keygen -lf ssh_host_dsa_key.pub

If you try the command on either the private or public key you’ll be given the public key’s fingerprint, which is exactly what you need for verifying the authenticity from the client side.

Sometimes you may need to force the fingerprint hash algorithm used for the output, as ssh-keygen may be displaying the fingerprint in a different form than the one shown when you try to SSH for the first time. The default when using ssh-keygen to show the key fingerprint is sha256, but in order to compare apples with apples you may need to specify md5 if that’s what’s being shown when you attempt to login. You would do that like the following:

ssh-keygen -lE md5 -f ssh_host_dsa_key.pub

Details on the man page for the options.

Do not connect remotely and then run the above command, as the machine you’re connected to is still untrusted. The command could be dishing you up any string replacement if it’s an attacker’s machine. You need to run the command on the physical box or get someone you trust (your network admin) to do this and hand you the fingerprint.

Now when you try to establish your SSH connection for the first time, you can check that the remote host is actually the host you think it is by comparing the output of one of the previous commands with what SSH on your client is telling you the remote host’s fingerprint is. If it’s different, it’s time to start tracking down the origin of the host masquerading as the address you’re trying to hook up with.

Now, when you get the following message when attempting to SSH to your server, due to something or somebody changing the host’s key fingerprint:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
23:d9:43:34:9c:b3:23:da:94:cb:39:f8:6a:95:c6:bc.
Please contact your system administrator.
Add correct host key in /home/me/.ssh/known_hosts to get rid of this message.
Offending RSA key in /home/me/.ssh/known_hosts:6
  remove with: ssh-keygen -f "/home/me/.ssh/known_hosts" -R 10.1.1.40
RSA host key for 10.1.1.40 has changed and you have requested strict checking.
Host key verification failed.

The same applies. Check that the fingerprint is indeed the intended target host’s key fingerprint. If it is, run the specified command.

Painless git diff

February 2, 2013

I’ve been using Node.js quite a bit lately and decided it was time to start using Git for my projects.

I’m used to using Mercurial (Hg) for DVCS, but have only used it on Windows and a little on Linux via command line.

I was looking for a similar experience to the one Windows gave me for Hg (file explorer integration with TortoiseHg), but for Linux. I had created a repository using TortoiseHg. When I attempted to add files to the repository using TortoiseHg or straight from the command line, I was getting a few errors. TortoiseHg’s Nautilus integration is broken on my distro at the time of writing this, too. So this encouraged me to invest a little more time in Git. I had done a bit of reading and listened to a few good podcasts on Git, so I felt it was a good time.
Think Like a Git is also a good read.

As I was creating repositories, dealing with remote repositories, cloning, setting up all the config files, adding, committing, pulling, pushing, viewing status and diffing, what I quickly came to realise was that the Git commands were very extensive, made more sense to me than Hg, and there is a lot of good documentation around. In saying that, it’s been a while since I used Hg from the command line, and most of my work has been through the GUI tools.

One area I was struggling with was the diffing of files and directories on the command line. There are a couple of good ways to make this experience a lot more pleasurable.
I like using meld on Linux for my file and directory comparisons, so I already had that installed.

git diff

Create a bash file in the /bin directory.
I called it git-meld, and it looks like the following:

#!/bin/bash
meld $2 $5

Turn the executable bit on, so it can be executed.

chmod +x git-meld

Now modify your ~/.gitconfig file

git config --global diff.external git-meld

To make sure you’ve added git-meld as the script that’ll run meld with the correct parameters:

cat ~/.gitconfig

and you should see at least the following:

[diff]
external = git-meld

Now that should be all you need to get git to pop meld on diff.

git diff [options] <commit> <commit> [<path_to_file_to_compare>]

If you have a stack of files (rather than just one, as shown in my above example) that were changed between these commits, diff will pop each file open in meld, one at a time, until you’ve finished with each one.

meld

git difftool

git also comes with difftool. I found this really nice to use. There is no setting up for it. All you do is replace the diff command with difftool. Optionally you can specify the GUI diff tool you want to use, simply by appending -t [your_GUI_diff_tool], like this if you like using meld:

git difftool -t meld <commit> <commit> [<path_to_file_to_compare>]

If you do this without specifying the file you want to compare, you are prompted if you want to view each file, rather than how diff works by just opening every one.

Launchmeld

If you choose to leave the -t option out, difftool will give you the option of all the possible tools able to perform the diff (some of which may need installation).

multiple diff tools

So using difftool is a better diff IMHO. This is how git difftool behaves whether or not you set up your ~/.gitconfig file with your preferred diff tool.

A Decent Console for Windows

January 19, 2013

On *nix we’re kind of spoilt when it comes to the CLI experience.
The console I use most in a GUI environment is the great terminator.

terminator

No, not that one.
This one

terminator

Multi tab, split screen, transparency, the works.
Then we’ve also got tmux (and a comparison between terminator and tmux).
Taking things further, we’ve got awesome

Well I’ve been looking for something similar for Windows for a while.
I’ve tried terminator on Cygwin, but it’s just not the same, plus it only supports a single shell.

Meet Console2

Console2 PS

With PowerShell as the currently active tab.
It’s a standalone executable and, crucially, it’s free.
Console2 is just that, a console or terminal that seems to be able to host any shell that’s thrown at it.
As you can see in the image above, I’ve set up Console2 to host the following shells:

  1. Windows Command shell
  2. The Visual Studio Command Prompt (which is just the Windows Command shell (with some paths and variables added?))
  3. PowerShell
  4. The node.js Read-Eval-Print-Loop (REPL)
  5. VMware’s vSphere PowerCLI
  6. And of course the bash shell we all know and love.
    Cygwin required

Although project activity looks minimal to non-existent currently.

How I setup Console2

Once running, right click the console -> Edit -> Settings…

  • Setup the hot keys under the Hotkeys node to behave like the terminal I use on Linux (currently terminator).
    • Select the specific command, put your cursor in the Hotkey text box, press your preferred key combination, press the Assign key.
    • For opening new tabs I use Ctrl+Shift+T
    • Change the Copy selection node to what it should be: Ctrl+C
    • Change the Paste to what it should be: Ctrl+V
  • Under the Console node, enter the directory to have each shell start in
  • Under the Appearance/More… node, I deselect the Show menu, Show toolbar and Show status bar
    • I make sure the Window transparency is set to None, as it just distracts me being able to see stuff behind the surface I’m concentrating on.
      It looks cool to turn it on, but I personally find it harder to read the text when you’ve got lots of text overlapping.
    • Under the Behavior node, I turn on Copy on select, as this is on in Linux by default
  • Now under the Tabs node is where we set up all of our shells.
    • Click the Add button
    • Change the name you want the shell tab to appear as in the Main tab under the Title text box
    • Now for the icons, I just got the images I wanted, opened them in GIMP, changed the size to 32×32 pixels and saved them as .ico files to the same directory that Console.exe runs from
    • I then select them here
    • Under the Shell section I just copy the shortcuts from the likes of the start menu and paste them in there
    • You can then override the default startup dir by specifying your path in the Startup dir text box
    • You can also specify if you want the shell to run as a specific user. Administrator for example.
      When you run this shell, you’ll be prompted for the users credentials if it’s not you.

As I was working through the Console2 set up, I ran into another offering…

Meet ConEmu

The actively maintained ConEmu lives here.
I had a quick play with this and a flick through the documentation.
The simple task of setting up different shells as presets seemed to evade me.
There seem to be a lot more configuration options too.
As I’d just set up Console2 and it seemed to be doing everything I needed for now, I decided to call it quits with ConEmu.
I think it’s worth checking out though if you need more power than Console2.
Scott Hanselman’s post on ConEmu.

Generic Coding Standards and Guidelines

January 5, 2013

Merging Conventions to Aid Readability thus Reducing Development Time

When programming in a mixed-language environment,
the naming conventions, formatting conventions, documentation conventions and other conventions
can be optimised for overall consistency and readability.
This may mean going against convention for one or more of the languages that are part of the mix.

For Example…

In many classical Object Oriented and procedural languages,
routine names have an initial capital letter (PascalCase).
The convention in JavaScript is for routine names to have an initial lower case letter (camelCase),
unless the routine is a constructor (intended to be used with the new prefix).
When a constructor is invoked without the new prefix,
the constructor’s this will be bound to the global object,
rather than where it should be:
the newly constructed object.
When invoked with the new prefix as it should be,
a new object will be created with a hidden link to the value of the constructor’s prototype property,
and the constructor’s this value will be bound to that new object (where it should be).
Because this convention exists for a very important reason,
your team may decide to carry that convention across the other languages you use.
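
A small sketch of what goes wrong when the new prefix is forgotten (the Point constructor is invented for the example, and non-strict mode is assumed):

function Point(x, y) {
    this.x = x;
    this.y = y;
}

var good = new Point(1, 2); // this is bound to the newly created object; good.x is 1

var oops = Point(1, 2);     // new forgotten: this is bound to the global object,
                            // so oops is undefined and globals x and y have been created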

Refactor or Document Short, Hard to Read Names

I don’t know how many times I see code that uses very short names which make readability difficult.
What’s worse, is that so often there are many different names that mean the same thing sprinkled across the project/s.
Short, hard to read, pronounce, or understand names are rarely needed with the programming languages of today.
Use easily and quickly readable names where ever possible.
If you have to use short names or abbreviations, keep them consistent.
Translation tables are good for this.
You can have a commented translation table at the beginning of a file,
or at the project level if the names are wider spread.
Names should be specific to the domain you’re working in, rather than to the programming language.
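
For example, a commented translation table at the top of a file might look something like this (the abbreviations are invented for the illustration):

/*
 * Abbreviation translation table for this file:
 *   acct -> account
 *   txn  -> transaction
 *   qty  -> quantity
 */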

Meaningful Loop Index Names

If your loop is more than a couple of lines long or you have nested loops,
make your loop index name something meaningful,
rather than i, j, k etc.
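
A small sketch of the difference (the team/member structure and notify function are invented for the example):

var teams = [{ members: ['Sam', 'Alex'] }, { members: ['Kim'] }];

function notify(member) {
    console.log('Notifying ' + member);
}

// Hard to follow once the loops are nested:
for (var i = 0; i < teams.length; i += 1) {
    for (var j = 0; j < teams[i].members.length; j += 1) {
        notify(teams[i].members[j]);
    }
}

// Easier to read:
for (var teamIndex = 0; teamIndex < teams.length; teamIndex += 1) {
    for (var memberIndex = 0; memberIndex < teams[teamIndex].members.length; memberIndex += 1) {
        notify(teams[teamIndex].members[memberIndex]);
    }
}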

Additional Thoughts

  • Code is read many more times than it is written.
    Make sure the names you choose favour read-time over write-time convenience.
  • If you have names that are general or vague enough to be used for multiple purposes,
    refactor your code, maybe create additional entities that have more specific names.
  • Don’t leave the meaning of the name to guess work.
    This taxes the programmer’s mind unnecessarily.
    There are better uses of our cycles.
  • Agree on and adopt a set of coding standards and guidelines.
    It’s more important to have standards than to not have them because you can’t agree on the “right” way.
    They will save wasted time and arguments during coding, and code reviewing.

JavaScript Coding Standards and Guidelines

December 19, 2012

This is the current set of coding standards and guidelines I use when I’m coding in the JavaScript language.
I thought it would be good to share so others could get some use out of them too, and maybe start a discussion as to any amendments / changes that could be useful.

Naming Conventions

Names should be formed from the 26 upper and lower case letters (A .. Z, a .. z), the 10 digits (0 .. 9), and _ (underbar).
Avoid use of international characters because they may not read well or be understood everywhere.
Do not use $ (dollar sign) or \ (backslash) in names.

I think jQuery would have to be an exception to this.

Do not use _ (underbar) as the first character of a name.
It is sometimes used to indicate privacy, but it does not actually provide privacy.
If privacy is important, use the forms that provide private members.

Most variables and functions should start with a lower case letter.

Constructor functions which must be used with the new prefix should start with a capital letter.
JavaScript issues neither a compile-time warning nor a run-time warning if a required new is omitted.
Bad things can happen if new is not used, so the capitalization convention is the only defence we have.
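
A quick sketch of these conventions in action (the names are invented for the example):

function Customer(name) {                     // constructor: initial capital, always used with new
    this.name = name;
}

var defaultCustomer = new Customer('guest');  // variable: initial lower case

function formatCustomer(customer) {           // ordinary function: initial lower case
    return 'Customer: ' + customer.name;
}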

Global variables should be in all caps.
JavaScript does not have macros or constants, so there isn’t much point in using all caps to signify features that JavaScript doesn’t have.
Same with Enum names. There is no native ability to create constant variables. Although… you can create read-only properties.
With the advent of ES5 we now have a couple of well-known techniques to enforce that our property values cannot be altered once initialised.

When you define a property using one of the ES5 techniques: (1) setting the writable property attribute to false means the value of the value attribute can no longer be altered; (2) using an accessor property with only a get function (and no set) achieves the same read-only effect.

var objWithMultipleProperties;
var objWithMultiplePropertiesDescriptor;

objWithMultipleProperties = Object.defineProperties({}, {
   x: { value: 1, writable: true, enumerable:true, configurable:true }, // Changing writable to false enables read-only semantics.
   y: { value: 1, writable: true, enumerable:true, configurable:true }, // Changing writable to false enables read-only semantics.
   r: {
      get: function() {
         return Math.sqrt(this.x*this.x + this.y*this.y);
      },
      enumerable:true,
      configurable:true
   }
});

objWithMultiplePropertiesDescriptor = Object.getOwnPropertyDescriptor(objWithMultipleProperties, 'r');
// objWithMultiplePropertiesDescriptor {
//    configurable: true,
//    enumerable: true,
//    get: function () {
//       // other members in here
//    },
//    set: undefined,
//   // ...
// }

See here for complete coverage of property attributes.

Coding Style

Commenting

Use comments where needed.

If the script needs commenting due to complexity, consider revising before commenting.
Comments easily rot (can be left behind after future code changes).
It is important that comments be kept up-to-date. Erroneous comments can make programs even harder to read and understand.
In this case they cause more damage than they originally gave benefit.
Comments should be well-written and clear, just like the code they are annotating.

Make comments meaningful. Focus on what is not immediately visible. Don’t waste the reader’s time with stuff like

i = 0; // Set i to zero.

Comment Style

Block comments are not safe for
commenting out blocks of code. For example:

/*
var rm_a = /a*/.match(s);
*/

causes a syntax error. So, it is recommended that /* */ comments be avoided and //
comments be used instead.

Don’t use HTML comments in script blocks. In the ancient days of javascript (1995), some browsers like Netscape 1.0 didn’t have any support or knowledge of the script tag. So when javascript was first released, a technique was needed to hide the code from older browsers so they wouldn’t show it as text in the page. The ‘hack’ was to use HTML comments within the script block to hide the code.

<script language="javascript">
<!--
   // code here
   //-->
</script>

No browsers in common use today are ignorant of the <script> tag, so hiding of javascript source is no longer necessary. Thanks Matt Kruse for re-iterating this.

File Organization

JavaScript programs should be stored in and delivered as .js files.

  • JavaScript should not be in HTML
  • JavaScript should not be in CSS
  • CSS should not be in JavaScript
  • HTML should not be in JavaScript

Code in HTML adds significantly to page weight with no opportunity for mitigation by caching and compression. There are many other reasons to keep the UI layers separate.

<script src=filename.js> tags should be placed as late in the body as possible.
This reduces the effects of delays imposed by script loading on other page components.

There is no need to use the language or type attributes. It is the server, not the script tag, that determines the MIME type.

Formatting

Bracing

Use same line opening brace.
Brace positioning is more or less a holy war without any right answer — except in JavaScript, where same-line braces are right and you should always use them. Here’s why:

return
{
  ok: false
};

return {
  ok: true
};

What’s the difference between these two snippets? Well, in the first one, you silently get something completely different than what you wanted.
The lone return gets mangled by the semicolon insertion process and becomes return; and returns nothing.
The rest of the code becomes a plain old block statement, with ok: becoming a label (of all things)! Having a label there might make sense in C, where you can goto, but in JavaScript, it makes no sense in this context.
And what happens to false? It gets evaluated and completely ignored.
Finally, the trailing semicolon — what about that?
Do we at least get a syntax error there? Nope: empty statement, like in C.

Spacing

Blank lines improve readability by setting off sections of code that are logically related.

Blank spaces should be used in the following circumstances (a combined sketch follows this list):

  • A keyword followed by ( (left parenthesis) should be separated by a space.
    while (true) {
    
  • A blank space should not be used between a function value and its ( (left parenthesis). This helps to distinguish between keywords and function invocations.
  • All binary operators except . (period) and ( (left parenthesis) and [ (left bracket) should be separated from their operands by a space.
  • No space should separate a unary operator and its operand except when the operator is a word such as typeof.
  • Each ; (semicolon) in the control part of a for statement should be followed with a space.
  • Whitespace should follow every , (comma).
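
Here’s the combined sketch referred to above, pulling those spacing rules together (the variables are invented for the example):

var items = [3, 5, 8];                   // whitespace follows every comma
var total = 0;
var i;

for (i = 0; i < items.length; i += 1) {  // a space follows each ; in the control part of the for
    if (typeof items[i] === 'number') {  // a keyword, or a word operator such as typeof, is followed by a space
        total = total + items[i];        // binary operators are separated from their operands by spaces
    }
}

total = -total;                          // no space between a unary - and its operand
alert(total);                            // no space between a function value and its (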

Tabs and Indenting

The unit of indentation is four spaces.
Use of tabs should be avoided because (as of this writing in the 21st Century) there still is not a standard for the placement of tab stops.
The use of spaces can produce a larger file size, but the size is not significant over local networks, and the difference is eliminated by minification.

Line Length

Avoid lines longer than 80 characters. When a statement will not fit on a single line, it may be necessary to break it.
Place the break after an operator, ideally after a comma.
A break after an operator decreases the likelihood that a copy-paste error will be masked by semicolon insertion.
The next line should be indented six spaces.
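
For example, breaking after the + operator and indenting the continuation six spaces (the message text is invented for the example):

var message = 'This is a long message that will not fit on one line, so it is ' +
      'broken after the + operator and the continuation line is indented six spaces.';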

Language Usage

Access To Members

In JavaScript we don’t have access modifiers like we do in classical languages.

In saying that, we can and should still control the accessibility of our members, and keep as much of our objects’ information secret from outsiders as possible.
We can apply public, private and privileged access.

public

Public members are created in the Constructor using the dot notation:

function Container(param) {
    this.member = param;
}

So, if we construct a new object

var myContainer = new Container('abc');

then myContainer.member contains 'abc'.

In the prototype we can add a public method.
To add a public method to all objects made by a constructor, add a function to the constructor’s prototype:

Container.prototype.stamp = function (string) {
    return this.member + string;
}

We can then invoke the method

myContainer.stamp('def')

which produces 'abcdef'.

private

Ordinary vars and parameters of the constructor become the private members.

function Container(param) {
    this.member = param;
    var secret = 3;
    var that = this;
}

param, secret, and that are private member variables.
They are not accessible to the outside.
They are not even accessible to the object’s own public methods.

They are accessible to private methods.
Private methods are inner functions of the constructor.

function Container(param) {

    this.member = param; // param is private, member is public
    var secret = 3;      // secret is private
    var that = this;     // that is private
    function dec() {     // dec is private
        var innerFunction = function () {
            that.value = 'someCrazyValue';  // when dec is called, value will be a member of the newly constructed Container instance.
        };

        if (secret > 0) {
            secret -= 1;
            return true;
        } else {
            return false;
        }
    }
}

A method (which is a function that belongs to an object) cannot employ an inner function to help it do its work because the inner function does not
share the method’s access to the object as its this is bound to the global object. This was a mistake in the design of the language.
Had the language been designed correctly, when the inner function is invoked, this would still be bound to the this variable of the outer function.
The workaround for this is to define a variable and assign it the value of this.
By convention, the name of that variable I use is that.

  • Private methods cannot be called by public methods. To make private methods useful, we need to introduce a privileged method.

privileged

A privileged method is able to access the private variables and methods, and is itself accessible to the public methods and the outside. It is possible to delete or replace a privileged method, but it is not possible to alter it, or to force it to give up its secrets.

Privileged methods are assigned with this within the constructor.

function Container(param) {

    this.member = param;
    var secret = 3;
    var that = this;

    function dec() {
        if (secret > 0) {
            secret -= 1;
            return true;
        } else {
            return false;
        }
    }

    this.service = function () { // prefix with this to assign a function to be a privileged method
        if (dec()) {
            return that.member;
        } else {
            return null;
        }
    };
}

service is a privileged method. Calling myContainer.service() will return 'abc' the first three times it is called. After that, it will return null. service calls the private dec method which accesses the private secret variable. service is available to other objects and methods, but it does not allow direct access to the private members.

eval is Evil

The eval function is the most misused feature of JavaScript. Avoid it.
eval has aliases. Do not use the Function constructor. Do not pass strings to setTimeout or setInterval.
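
One common misuse of eval is parsing JSON text; the native ES5 JSON.parse does this without executing arbitrary code. A small sketch:

var text = '{ "name": "Kim", "roles": ["developer", "scrum master"] }';

// Avoid: eval runs whatever code the string happens to contain
// var data = eval('(' + text + ')');

// Prefer: JSON.parse only parses data
var data = JSON.parse(text);
alert(data.roles[0]); // 'developer'

// Similarly, pass a function to setTimeout rather than a string
setTimeout(function () {
    alert('tick');
}, 1000);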

Global Abatement

JavaScript makes it easy to define global variables that can hold all of the assets of your application.
Unfortunately, global variables weaken the resiliency of programs and should be avoided.

One way to minimize the use of global variables is to create a single global variable
for your application:

var KimsGlobal = {};

That variable then becomes the container for your application:

KimsGlobal.Calculate = {
    calculatorOutPutArray: [
    $('.step1i1'),
    $('.step1i2'),
    $('.step1r1'),
    ...
    ],
    jQueryObjectCounter: 0,
    jQueryDOMElementCounter: 0
    ...
};

KimsGlobal.NavigateCalculator = {
    currentStepId: stepId,
    step: 'step' + stepId,
    stepSelector: '#' + step,
    ...
};

By reducing your global footprint to a single name, you significantly reduce the chance of bad interactions with other applications, widgets, or libraries.
Your program also becomes easier to read because it is obvious that KimsGlobal.Calculate refers to a top-level structure.

Using closure for information hiding is another effective global abatement technique.

var digit_name = (function() {
  var names = ['zero', 'one', 'two', ...];
  return function (n) {
    return names[n];
  };
})();

The Module pattern (covered in the Patterns section below) explains this in detail.

Scoping

JavaScript scoping is different to classical languages, and can take some getting used to for programmers used to languages such as C, C++, C#, Java.
Classical languages like those mentioned before have block scope.
JavaScript has function scope. Although ES6 is bringing in block scoping with the let keyword.

In the following example “10” will be alerted.

var foo = 1; // foo is defined in global scope.
function bar() {
    if (!foo) { // The foo variable of the bar scope has been hoisted directly above this if statement, but the assignment has not. So it is unassigned (undefined).
        var foo = 10;
    }
    alert(foo);
}
bar();

In the following example “1” will be alerted.

var a = 1;
function b() {
    a = 10;
    return;
    function a() {}
}
b();
alert(a);

In the following example Firebug will show 1, 2, 2.

var x = 1;
console.log(x); // 1
if (true) {
    var x = 2;
    console.log(x); // 2
}
console.log(x); // 2

In JavaScript, blocks such as if statements, do not create new scope. Only functions create new scope.

There is a workaround though 😉
JavaScript has Closure.
If you need to create a temporary scope within a function, do the following.

function foo() {
    var x = 1;
    if (x) {
        (function () {
            var x = 2;
            // some other code
        }());
    }
    // x is still 1.
}

The anonymous inner function creates a closure, and it invokes itself immediately with ().
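
For comparison, the ES6 let keyword mentioned above gives block scope directly. A small sketch, assuming an ES6-capable environment:

function foo() {
    var x = 1;
    if (x) {
        let y = 2;          // block scoped: y only exists inside this if block
        console.log(x + y); // 3
    }
    // y is not defined here; x is still 1.
}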

Hoisting

Terminology

function declaration or function statement are the same thing.
function expression or variable declaration with function assignment are the same thing.

A function statement looks like the following:

function foo( ) {}

A function expression looks like the following:

var foo = function foo( ) {};

A statement must not start with the word “function”, otherwise it will be parsed as a function declaration; this is why self-invoking function expressions are wrapped in parentheses.

//anonymous function expression
var a = function () {
    return 3;
};

//named function expression
var a = function bar() {
    return 3;
};

//self invoking named function expression. This is also a closure
(function sayHello() {
    alert('hello!');
})();

//self invoking anonymous function expression. This is also a closure
(function ( ) {
    var hidden_variable;
    // This function can have some impact on
    // the environment, but introduces no new
    // global variables.
}() );

In JavaScript, a name enters a scope in one of four basic ways:

  1. Language-defined: All scopes are, by default, given the names this and arguments.
  2. Formal parameters: Functions can have named formal parameters, which are scoped to the body of that function.
  3. Function declarations: These are of the form function foo() {}.
  4. Variable declarations: These take the form var foo;.

Function declarations and variable declarations are always hoisted invisibly to the top of their containing scope by the JavaScript interpreter.
Function parameters and language-defined names are, obviously, already there. This means that code like this:

function foo() {
    bar();
    var x = 1;
}

Is actually interpreted like this:

function foo() {
    var x;
    bar();
    x = 1;
}

It turns out that it doesn’t matter whether the line that contains the declaration would ever be executed. The following two functions are equivalent:

function foo() {
    if (false) {
        var x = 1;
    }
    return;
    var y = 1;
}
function foo() {
    var x, y;
    if (false) {
        x = 1;
    }
    return;
    y = 1;
}

The assignment portion of the declaration is not hoisted.
Only the identifier is hoisted.
This is not the case with function declarations, where the entire function body will be hoisted as well,
but remember that there are two normal ways to declare functions. Consider the following JavaScript:

function test() {
    foo(); // TypeError 'foo is not a function'
    bar(); // 'this will run!'
    var foo = function () { // function expression assigned to local variable 'foo'
        alert('this won\'t run!');
    };
    function bar() { // function declaration, given the name 'bar'
        alert('this will run!');
    }
}
test();

In this case, only the function declaration has its body hoisted to the top. The name ‘foo’ is hoisted, but the body is left behind, to be assigned during execution.

Name Resolution Order

The most important special case to keep in mind is name resolution order. Remember that there are four ways for names to enter a given scope. The order I listed them above is the order they are resolved in. In general, if a name has already been defined, it is never overridden by another property of the same name. This means that a function declaration takes priority over a variable declaration. This does not mean that an assignment to that name will not work, just that the declaration portion will be ignored. There are a few exceptions:

  • The built-in name arguments behaves oddly. It seems to be declared following the formal parameters, but before function declarations. This means that a formal parameter with the name arguments will take precedence over the built-in, even if it is undefined. This is a bad feature. Don’t use the name arguments as a formal parameter.
  • Trying to use the name this as an identifier anywhere will cause a Syntax Error. This is a good feature.
  • If multiple formal parameters have the same name, the one occurring latest in the list will take precedence, even if it is undefined.
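
Coming back to the general rule, here is a small sketch of a function declaration taking priority over a var declaration of the same name (the names are invented for the example):

function example() {
    console.log(typeof foo); // 'function': the function declaration wins over the var declaration
    var foo = 'bar';         // the declaration part is ignored, but the assignment still runs
    function foo() {}
    console.log(typeof foo); // 'string'
}
example();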

Additional hoisting examples on my blog

Now that you understand scoping and hoisting, what does that mean for coding in JavaScript?
The most important thing is to always declare your variables with var statements.
Declare your variables at the top of the scope (as already mentioned JavaScript only has function scope). See the Variable Declarations section.
If you force yourself to do this, you will never have hoisting-related confusion.
However, doing this can make it hard to keep track of which variables have actually been declared in the current scope.
I recommend using strict mode which will inform you if you have tried to use a variable without declaring it with var. If you’ve done all of this, your code should look something like this:

function foo(a, b, c) {
    'use strict';
    var x = 1;
    var bar;
    var baz = 'something';
    // other non hoistable code here
}

Inheritance

Always prefer Prototypal inheritance to Pseudo classical.

There are quite a few reasons why we shouldn’t use Pseudo classical inheritance in JavaScript.
JavaScript The Good Parts explains why.

In a purely prototypal pattern, we dispense with classes.
We focus instead on the objects.
Prototypal inheritance is conceptually simpler than classical inheritance.

I’ll show you three examples of prototypal inheritance, and explain the flaws and why the third attempt is the better way.

Example 1

function object(o) {
    function F() {}
    F.prototype = o;
    return new F();
}

The object function takes an existing object as a parameter and returns an empty new object that inherits from the old one.
The problem with the object function is that it is global, and globals are clearly problematic.

Example 2

Object.prototype.begetObject = function () {
    function F() {}
    F.prototype = this;
    return new F();
};

newObject = oldObject.begetObject();

The problem with Object.prototype.begetObject is that it trips up incompetent programs, and it can produce unexpected results when begetObject is overridden.

Example 3

if (typeof Object.create !== 'function') {
    Object.create = function (o) {
        function F() {}
        F.prototype = o;
        return new F();
    };
}
newObject = Object.create(oldObject);

Example 3 overcomes the problems with the previous prototypical examples.
This is how Object.create works in ES5.

New

Use {} instead of new Object(). Use [] instead of new Array(). There are instances where using new allows compiler optimisations to be performed. Learn what happens when you use new in different scenarios and test.
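
A quick sketch of the difference, and one of the pitfalls of the Array constructor:

// Prefer the literal forms
var point = { x: 0, y: 0 };
var names = [];

// Over the constructor forms
var point2 = new Object();
var names2 = new Array();

// One reason: the Array constructor changes meaning with a single numeric argument
var a = new Array(3); // an empty array with length 3
var b = [3];          // an array containing the number 3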

Operators

Plus Minus

Be careful to not follow a + with + or ++. This pattern can be confusing. Insert parens between them to make your intention clear.

total = subtotal + +myInput.value;

is better written as

total = subtotal + (+myInput.value);

so that the + + is not misread as ++.

Equality
By default use the === operator rather than the == operator.
By default use the !== operator rather than the != operator.

JavaScript has two sets of equality operators: === and !==, and their (as Douglas Crockford puts it) evil twins == and
!=. The === and !== ones work the way you would expect. If the two operands are of the
same type and have the same value, then === produces true and !== produces false.
The other ones do the right thing when the operands are of the same type, but if they
are of different types, they attempt to coerce the values.
Make sure this is what you need rather than just blindly using the shorter form.
These are some of the interesting cases:

'' == '0'          // false
0 == ''            // true
0 == '0'           // true
false == 'false'   // false
false == '0'       // true
false == undefined // false
false == null      // false
null == undefined  // true
' \t\r\n ' == 0    // true

Patterns

Module

The Module pattern presents an interface but hides its state and implementation.
It takes advantage of function scope and closure to create relationships that are binding and private,
and it eliminates the use of global variables. It promotes information hiding and other good design practices.

Global Import

JavaScript has a feature known as implied globals.
Whenever a name is used, the interpreter walks the scope chain backwards looking for a var statement for that name.
If none is found, that variable is assumed to be global.
If it’s used in an assignment, the global is created if it doesn’t already exist.
This means that using or creating global variables in an anonymous closure is easy.
Unfortunately, this leads to hard-to-manage code, as it’s not obvious (to humans) which variables are global in a given file.

Luckily, our anonymous function provides an easy alternative.
By passing globals as parameters to our anonymous function, we import them into our code, which is both clearer and faster than implied globals.

(function ($, YAHOO) {
    // now have access to globals jQuery (as $) and YAHOO in this code
}(jQuery, YAHOO));

Module Export

Sometimes you don’t just want to use globals, but you want to declare them. We can easily do this by exporting them, using the anonymous function’s return value.

var MODULE = (function () {
    var my = {},
        privateVariable = 1;

    function privateMethod() {
        // ...
    }

    my.moduleProperty = 1;
    my.moduleMethod = function () {
        // ...
    };

    return my;
}());

Notice that we’ve declared a global module named MODULE, with two public properties:
a method named MODULE.moduleMethod and a variable named MODULE.moduleProperty.
In addition, it maintains private internal state using the closure of the anonymous function.
Also, we can easily import needed globals, using the pattern we learned above.

Augmentation

One limitation of the module pattern so far is that the entire module must be in one file.
Anyone who has worked in a large code-base understands the value of splitting among multiple files.
Luckily, we have a nice solution to augment modules.
First, we import the module, then we add properties, then we export it.
Here’s an example, augmenting our MODULE from above:

var MODULE = (function (my) {
    my.anotherMethod = function () {
        // added method...
    };

    return my;
}(MODULE));

Use the var keyword again for consistency, even though it isn’t strictly necessary here.
After this code has run, our module will have gained a new public method named MODULE.anotherMethod.
This augmentation file will also maintain its own private internal state and imports.

Loose Augmentation

While our example above requires our initial module creation to be first, and the augmentation to happen second, that isn’t always necessary.
One of the best things a JavaScript application can do for performance is to load scripts asynchronously.
We can create flexible multi-part modules that can load themselves in any order with loose augmentation.
Each file should have the following structure:

var MODULE = (function (my) {
    // add capabilities...

    return my;
}(MODULE || {}));

The import will create the module if it doesn’t already exist.
This means you can use a library like require.js and load all of your module files in parallel, without needing to block.
Preferably load not more than 2 to 3 at a time, else performance will degrade.

Tight Augmentation

While loose augmentation is great, it does place some limitations on your module.
Most importantly, you cannot override module properties safely.
You also cannot use module properties from other files during initialization (but you can at run-time after initialization).
Tight augmentation implies a set loading order, but allows overrides.
Here is a simple example (augmenting our original MODULE):

var MODULE = (function (my) {
    var old_moduleMethod = my.moduleMethod;

    my.moduleMethod = function () {
        // method override, has access to old through old_moduleMethod...
    };

    return my;
}(MODULE));

Here we’ve overridden MODULE.moduleMethod, but maintain a reference to the original method, if needed.

Cloning and Inheritance

var MODULE_TWO = (function (old) {
    var my = {},
        key;

    for (key in old) {
        if (old.hasOwnProperty(key)) {
            my[key] = old[key];
        }
    }

    var super_moduleMethod = old.moduleMethod;
    my.moduleMethod = function () {
        // override method on the clone, access to super through super_moduleMethod
    };

    return my;
}(MODULE));

This pattern is perhaps the least flexible option. It does allow some neat compositions, but that comes at the expense of flexibility.
As I’ve written it, properties which are objects or functions will not be duplicated, they will exist as one object with two references.
Changing one will change the other.
This could be fixed for objects with a recursive cloning process, but probably cannot be fixed for functions, except perhaps with eval.

Sub-modules

MODULE.sub = (function () {
    var my = {};
    // ...

    return my;
}());

Statements

Simple Statements

Each line should contain at most one statement.
Put a ; (semicolon) at the end of every simple statement.
Note that an assignment statement which is assigning a function literal or object literal is still an assignment statement and must end with a semicolon.
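
For example (the names are invented for the illustration):

var add = function (a, b) {
    return a + b;
};  // still an assignment statement, so it ends with a semicolon

var point = {
    x: 0,
    y: 0
};  // the same applies to an object literal assignment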

JavaScript allows any expression to be used as a statement.
This can mask some errors, particularly in the presence of semicolon insertion.
The only expressions that should be used as statements are assignments and invocations.

Compound Statements

Compound statements are statements that contain lists of statements enclosed in { } (curly braces).

  • The enclosed statements should be indented four more spaces.
  • The { (left curly brace) should be at the end of the line that begins the compound statement.
  • The } (right curly brace) should begin a line and be indented to align with the beginning of the line containing the matching { (left curly brace).
  • Braces should be used around all statements, even single statements, when they are part of a control structure, such as an if or for statement. This makes it easier to add statements without accidentally introducing bugs.

Labels

Say no to labelling. Statement labels are optional. Only these statements should be labelled: while, do, for, switch. Or better still, don’t use labelling at all.

return Statement

A return statement with a value should not use ( ) (parentheses) around the value.
The return value expression must start on the same line as the return keyword in order to avoid semicolon insertion.

if Statement

The if class of statements should have the following form:

if (condition) {
    statements
}

if (condition) {
    statements
} else {
    statements
}

if (condition) {
    statements
} else if (condition) {
    statements
} else {
    statements
}

Avoid doing assignments in the condition part of if and while statements.

Is

if (a = b) {

a correct statement? Or was

if (a === b) {

intended? Avoid constructs that cannot easily be determined to be correct.

Also see the Equality section

for Statement

A for class of statements should have the following form:

for (initialisation; condition; update) {
    statements
}

for (variable in object) {
    if (filter) {
        statements
    }
}

The first form should be used with arrays and with loops of a predetermined number of iterations.

The second form should be used with objects.
Be aware that members that are added to the prototype of the object will be included in the enumeration.
It is wise to program defensively by using the hasOwnProperty method to distinguish the true members of the object:

for (variable in object) {
    if (object.hasOwnProperty(variable)) {
        statements
    }
}

while Statement

A while statement should have the following form:

while (condition) {
    statements
}

do Statement

A do statement should have the following form:

do {
    statements
} while (condition);

Unlike the other compound statements, the do statement always ends with a ; (semicolon).

switch Statement

A switch statement should have the following form:

switch (expression) {
case expression:
    statements
default:
    statements
}

Each case is aligned with the switch. This avoids over-indentation.
Each group of statements (except the default) should end with break, return, or throw. Do not fall through.

Or better, use the following form from Angus Croll’s blog post:

Procedural way to do the construct

var whatToBring;
switch(weather) {
    case 'Sunny':
        whatToBring = 'Sunscreen and hat';
        break;
    case 'Rain':
        whatToBring = 'Umbrella and boots';
        break;
    case 'Cold':
        whatToBring = 'Scarf and Gloves';
        break;
    default:
        whatToBring = 'Play it by ear';
}

OO way to do the construct

var whatToBring = {
    'Sunny' : 'Sunscreen and hat',
    'Rain' : 'Umbrella and boots',
    'Cold' : 'Scarf and Gloves',
    'Default' : 'Play it by ear'
};

var gear = whatToBring[weather] || whatToBring['Default'];
try Statement

The try class of statements should have the following form:

try {
    statements
} catch (variable) {
    statements
}

try {
    statements
} catch (variable) {
    statements
} finally {
    statements
}
continue Statement

Avoid use of the continue statement. It tends to obscure the control flow of the function.

with Statement

Why it shouldn’t be used.

Why it should be used.

Both points are valid. Understand how it works before you use it. It isn’t going to work in strict mode anyway, for good reason IMHO.
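
The core problem (shown here with a made-up settings object) is that you cannot tell, just by reading the code, which object a variable inside the with block belongs to:

with (settings) {
    timeout = 5000;        // settings.timeout? An outer variable? A new global? It depends on runtime state.
}

settings.timeout = 5000;   // Unambiguous, and works in strict mode.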

Function Declarations

  • All functions should be declared before they are used.
  • Inner functions should follow the var statement. This helps make it clear what variables are included in its scope.
  • There should be no space between the name of a function and the ( (left parenthesis) of its parameter list.
  • There should be one space between the ) (right parenthesis) and the { (left curly brace) that begins the statement body.
    The body itself is indented four spaces.
    The } (right curly brace) is aligned with the line containing the beginning of the declaration of the function.

This convention works well with JavaScript because in JavaScript, functions and object literals can be placed anywhere that an expression is allowed.
It provides the best readability with inline functions and complex structures.

function getElementsByClassName(className) {
    var results = [];
    walkTheDOM(document.body, function (node) {
        var a; // array of class names
        var c = node.className; // the node's classname
        var i; // loop counter

// If the node has a class name, then split it into a list of simple names.
// If any of them match the requested name, then append the node to the set of results.

        if (c) {
            a = c.split(' ');
            for (i = 0; i < a.length; i += 1) {
                if (a[i] === className) {
                    results.push(node);
                    break;
                }
            }
        }
    });
    return results;
}

If a function literal is anonymous, there should be one space between the word function and the ( (left parenthesis).
If the space is omitted, then it can appear that the function’s name is function, which is an incorrect reading.

div.onclick = function (e) {
   return false;
};

that = {
    method: function () {
         return this.datum;
    },
    datum: 0
};

When a function is to be invoked immediately,
the entire invocation expression should be wrapped in parens so that it is clear that the value being produced is the result of the function and not the function itself.

var collection = (function () {
    var keys = [], values = [];

    return {
        get: function (key) {
            var at = keys.indexOf(key);
            if (at >= 0) {
                return values[at];
            }
        },
        set: function (key, value) {
            var at = keys.indexOf(key);
            if (at < 0) {
                at = keys.length;
            }
            keys[at] = key;
            values[at] = value;
        },
        remove: function (key) {
            var at = keys.indexOf(key);
            if (at >= 0) {
                keys.splice(at, 1);
                values.splice(at, 1);
            }
        }
    };
}());

Variable Declarations

All variables should be declared before used.
JavaScript does not require this, but doing so makes the program easier to read and makes it easier to detect undeclared variables that become implied globals.
If you get into the habit of using strict mode, you’ll get pulled up on this anyway.
Implied global variables should never be used.

The var statements should be the first statements in the function body.

It is preferred that each variable be given its own line. They should be listed in alphabetical order.

var currentSelectedTableEntry;
var indentationLevel;
var sizeOfTable;

JavaScript does not have block scope (see Scope section), so defining variables in blocks can confuse programmers who are experienced with other C family languages.

  • Define all variables at the top of the function.
  • Use of global variables should be minimized.
  • Implied global variables should never be used.

References

http://javascript.crockford.com/code.html

http://www.webreference.com/programming/javascript/prototypal_inheritance/

http://javascript.crockford.com/prototypal.html

http://www.crockford.com/javascript/inheritance.html

http://www.adequatelygood.com/2010/3/JavaScript-Module-Pattern-In-Depth

Moving to TDD

December 1, 2012

My last employer’s software development team recently took up the challenge of writing their tests before writing the functionality those tests specify. In software development, this is known as Test Driven Development, or TDD.

TDD is a hard concept to get developers to embrace. It’s often as much of a paradigm shift as persuading a procedural programmer to start creating Object Oriented designs. Some never get it. Fortunately we had a very talented bunch of developers, and they’ve taken to it like fish to water.

The first thing to clear up is that TDD is not primarily about testing, but rather it forces the developer to write code that is testable (the fact the code has tests written for it and running regularly is a side effect, albeit a very positive one).

This is why there is often some confusion about TDD and the fact it or its derivatives (BDD, ATDD, AAT, etc.) are primarily focused on creating well designed software. Code that is testable must be modular, which provides good separation of concerns.

  • Testing is about measuring where the quality is currently at.
  • TDD and its derivatives are about building the quality in from the start.

Red, green, refactor.

TDD concentrates on writing a unit test for a routine before the routine itself is created. The developer writes code that acts as a low-level specification (the test), which is run against the routine to confirm that it does what we expect it to do.
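
As a minimal sketch (the FareCalculator class and its behaviour are invented purely for illustration), the specification is written first and fails until the routine exists and behaves as expected:

using NUnit.Framework;

namespace Fares.UnitTest
{
    [TestFixture]
    public class FareCalculatorTest
    {
        [Test]
        public void CalculateFare_WhenDistanceIsTenKm_ShouldChargeBaseRatePlusPerKmRate()
        {
            // Written before FareCalculator exists; it won't even compile yet (red).
            var calculator = new FareCalculator(baseRate: 3.00m, perKmRate: 2.50m);

            decimal fare = calculator.CalculateFare(distanceInKm: 10m);

            Assert.That(fare, Is.EqualTo(28.00m)); // 3.00 + 10 * 2.50 (green once implemented, then refactor).
        }
    }
}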

To unit test a routine, we must be able to break out the routine’s dependencies and separate the routine from them. If we don’t do this, the hierarchy of calls often grows exponentially.

Thus:

  1. We end up testing far more than we want or need to.
  2. The complexity gets out of hand.
  3. The test takes longer to execute than it needs to.
  4. The tests don’t get run as often as they should, because we developers have to wait, and we live in an instant society.

Breaking out the dependencies allows us to ignore how they behave and concentrate on a single routine. There are a number of concepts we can use to help with this.

We can use test doubles: stubs, mocks, fakes and spies, that stand in for the real dependencies.
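
A hand-rolled stub is enough to show the idea (the interface and classes below are invented for illustration): the routine under test depends on an abstraction, and the test substitutes a canned implementation so nothing slow or external is exercised.

using NUnit.Framework;

public interface IExchangeRateProvider
{
    decimal GetRate(string fromCurrency, string toCurrency); // The real implementation would call a web service.
}

public class PriceConverter
{
    private readonly IExchangeRateProvider _rateProvider;

    public PriceConverter(IExchangeRateProvider rateProvider)
    {
        _rateProvider = rateProvider;
    }

    public decimal Convert(decimal amount, string fromCurrency, string toCurrency)
    {
        return amount * _rateProvider.GetRate(fromCurrency, toCurrency);
    }
}

[TestFixture]
public class PriceConverterTest
{
    private class StubExchangeRateProvider : IExchangeRateProvider
    {
        public decimal GetRate(string fromCurrency, string toCurrency)
        {
            return 2m; // Canned answer; no web service involved.
        }
    }

    [Test]
    public void Convert_ShouldMultiplyByTheProvidedRate()
    {
        var converter = new PriceConverter(new StubExchangeRateProvider());

        Assert.That(converter.Convert(10m, "NZD", "USD"), Is.EqualTo(20m));
    }
}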

Although TDD isn’t primarily about testing, its purpose is to create solid, well designed, extensible and scalable software. TDD encourages, and in some cases forces, us down the path of the SOLID principles, because to test each routine, each routine must be able to stand on its own.

SOLID principles

So what does SOLID give us? SOLID stands for:

  • Single Responsibility Principle
  • Open Closed Principle
  • Liskov Substitution Principle
  • Interface Segregation Principle
  • Dependency Inversion Principle

Single Responsibility Principle

  • Each class should have one and only one reason to change.
  • Each class should do one thing and do it well.


Just because you can, doesn’t mean you should.

Open Closed Principle

  • A class’s behaviour should be able to be extended without modifying it.
  • There are several ways to achieve this. Some of which are polymorphism via inheritance, aggregation, wrapping.

Liskov Substitution Principle

  • Subtypes must be substitutable for their base types without altering the correctness of the program.

Interface Segregation Principle

  • When an interface consists of too many members, it should be split into smaller and more specific (to the client’s needs) interfaces, so that clients using the interface only use the members applicable to them.
  • A client should not have to know about all the extra interface members they don’t use.
  • This encourages systems to be decoupled and thus more easily re-factored, extended and scaled.

Dependency Inversion Principle

  • Often implemented in the form of the Dependency Injection Pattern via the more specific Inversion of Control Principle (IoC). In some circles, known as the Hollywood Principle… Don’t call us, we’ll call you.
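
As a rough sketch of constructor injection (the types below are made up), the high-level class depends only on an abstraction, and the concrete dependency is handed to it from the outside, typically by an IoC container at the composition root:

public interface INotifier
{
    void Notify(string message);
}

public class EmailNotifier : INotifier
{
    public void Notify(string message)
    {
        // Send an email...
    }
}

public class OrderProcessor
{
    private readonly INotifier _notifier;

    // The dependency is injected; OrderProcessor never news up a concrete notifier itself.
    public OrderProcessor(INotifier notifier)
    {
        _notifier = notifier;
    }

    public void Process()
    {
        // ... process the order ...
        _notifier.Notify("Order processed.");
    }
}

// At the composition root (or via an IoC container such as Ninject):
// var processor = new OrderProcessor(new EmailNotifier());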

TDD assists in the monitoring of technical debt and streamlines the path to quality design.

Additional info on optimizing your team’s testing effort can be found here.

Sanitising User Input from Browser. part 2

November 16, 2012

Untrusted data (data entered by a user), should always be treated as though it contains attack code.
This data should not be sent anywhere without taking the necessary steps to detect and neutralise the malicious code.
With applications becoming more interconnected, attacks being buried in user input and decoded and/or executed by a downstream interpreter is becoming all the more common.
Input validation, that is, restricting user input to a whitelist of allowed characters and restricting field lengths, provides only two forms of defence.
Any decent attacker can get around client side validation, so you need to employ defence in depth.
Validation and escaping also need to be performed on the server side.

Leveraging existing libraries

  1. Microsoft’s AntiXSS is not extensible;
    it doesn’t allow the user to define their own whitelist,
    and it didn’t allow me to add behaviour to the routines
    (I want to know how many instances of HTML encoded values there were).
    There was certainly a lot of code in there, but I didn’t find it very useful.
  2. The OWASP encoding project (Reform), as mentioned in part 1 of this series.
    This is quite a useful set of projects for different technologies.
  3. System.Net.WebUtility from the System.Web.dll.
    This did most of what I needed, other than provide me with fine grained information about what had been tampered with.
    So I took it and extended it slightly.
    We hadn’t employed AOP at this stage and it wasn’t considered important enough to invest the time to do so,
    so it was a matter of copy, paste, modify.

What’s the point in client side validation if the server has to do it again anyway?

Now there are arguments both ways for this.
My current take on this for the project in question was:
If you only have server side validation, the client side is less responsive and user friendly.
If you only have client side validation, it’s out of our control.
This also gives fuel to the argument for using JavaScript on both the client and server side (with the likes of node.js),
so the same code can be used on both sides without having to write the same validation in two different languages.
Personally I find writing validation code easier in JavaScript than in C#.
That may just be because I’ve been writing considerably more JavaScript than C# lately though.

The code

I drew a sequence diagram of how this should work, but it got lost in a move.
So I wasn’t keen on doing it again, as the code had already been done.
In saying that, the code has reasonably good documentation (I think).
Code is king, providing it has been written to be read.
If you notice any of the escaping isn’t quite making sense, it could be the blogging engine either doing what it’s meant to, or not doing what it’s meant to.
I’ve been over the code a few times, but I may have missed something.
Shout out if anything’s not clear.

First up, we’ll look at the custom exceptions as we’ll need those soon.

using System;

namespace Common.WcfHelpers.ErrorHandling.Exceptions
{
    public abstract class WcfException : Exception
    {
        /// <summary>
        /// In order to set the message for the client, set it here, or via the property directly in order to over ride default value.
        /// </summary>
        /// <param name="message">The message to be assigned to the Exception's Message.</param>
        /// <param name="innerException">The exception to be assigned to the Exception's InnerException.</param>
        /// <param name="messageForClient">The client friendly message. This parameter is optional, but should be set.</param>
        public WcfException(string message, Exception innerException = null, string messageForClient = null) : base(message, innerException)
        {
            MessageForClient = messageForClient;
        }

        /// <summary>
        /// This is the message that the service's client will see.
        /// Make sure it is set in the constructor. Or here.
        /// </summary>
	    public string MessageForClient
        {
            get { return string.IsNullOrEmpty(_messageForClient) ? "The MessageForClient property of WcfException was not set" : _messageForClient; }
            set { _messageForClient = value; }
        }
        private string _messageForClient;
    }
}

And the more specific SanitisationWcfException

using System;
using System.Configuration;

namespace Common.WcfHelpers.ErrorHandling.Exceptions
{
    /// <summary>
    /// Exception class that is used when the user input sanitisation fails, and the user needs to be informed.
    /// </summary>
    public class SanitisationWcfException : WcfException
    {
        private const string _defaultMessageForClient = "Answers were NOT saved. User input validation was unsuccessful.";
        public string UnsanitisedAnswer { get; private set; }

        /// <summary>
        /// In order to set the message for the client, set it here, or via the property directly in order to over ride default value.
        /// </summary>
        /// <param name="message">The message to be assigned to the Exception's Message.</param>
        /// <param name="innerException">The Exception to be assigned to the base class instance's inner exception. This parameter is optional.</param>
        /// <param name="messageForClient">The client friendly message. This parameter is optional, but should be set.</param>
        /// <param name="unsanitisedAnswer">The user input string before service side sanitisation is performed.</param>
        public SanitisationWcfException
        (
            string message,
            Exception innerException = null,
            string messageForClient = _defaultMessageForClient,
            string unsanitisedAnswer = null
        )
            : base(
                message,
                innerException,
                messageForClient + " If this continues to happen, please contact " + ConfigurationManager.AppSettings["SupportEmail"] + Environment.NewLine
                )
        {
            UnsanitisedAnswer = unsanitisedAnswer;
        }
    }
}

Now, as we define whether our requirements are satisfied by way of executable requirements (unit tests in their rawest form),
let’s write some executable specifications.

using NUnit.Framework;
using Common.Security.Sanitisation;

namespace Common.Security.Encoding.UnitTest
{
    [TestFixture]
    public class ExtensionsTest
    {

        private readonly string _inNeedOfEscaping = @"One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.";
        private readonly string _noNeedForEscaping = @"One x2F two amp three x27 four lt five quot six gt       .";

        [Test]
        public void SingleDecodeDoubleEncodedHtml_ShouldSingleDecodeDoubleEncodedHtml()
        {
            string doubleEncodedHtml = @"";               // Between the ""'s we have a string of Html with double escaped values like &amp;#x27; user entered text &amp;#x2F;.
            string singleEncodedHtmlShouldLookLike = @""; // Between the ""'s we have a string of Html with single escaped values like &#x27; user entered text &#x2F;.
            // In the above, the blogging engine may escape the single escaped entity encoding, so all you'll see is the entity itself,
            // but it should look like the double encoded entity encodings without the leading &amp;.


            string singleEncodedHtml = doubleEncodedHtml.SingleDecodeDoubleEncodedHtml();
            
            Assert.That(singleEncodedHtml, Is.EqualTo(singleEncodedHtmlShouldLookLike));
        }

        [Test]
        public void Extensions_CompliesWithWhitelist_ShouldNotComply()
        {
            Assert.That(_inNeedOfEscaping.CompliesWithWhitelist(whiteList: @"^[\w\s\.,]+$"), Is.False);
        }

        [Test]
        public void Extensions_CompliesWithWhitelist_ShouldComply()
        {
            Assert.That(_noNeedForEscaping.CompliesWithWhitelist(whiteList: @"^[\w\s\.,]+$"), Is.True);
            Assert.That(_inNeedOfEscaping.CompliesWithWhitelist(whiteList: @"^[\w\s\.,#/&'<"">]+$"), Is.True);
        }
    }
}

Now the code that satisfies the above executable specifications, and more.

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

namespace Common.Security.Sanitisation
{
    /// <summary>
    /// Provides a series of extension methods that perform sanitisation.
    /// Escaping, unescaping, etc.
    /// Usually targeted at user input, to help defend against the likes of XSS and other injection attacks.
    /// </summary>
    public static class Extensions
    {

        private const int CharacterIndexNotFound = -1;

        /// <summary>
        /// Returns a new string in which all occurrences of a double escaped html character (that's an html entity immediately prefixed with another html entity)
        /// in the current instance are replaced with the single escaped character.
        /// </summary>
        /// <param name="source">The target text used to strip one layer of Html entity encoding.</param>
        /// <returns>The singly escaped text.</returns>
        public static string SingleDecodeDoubleEncodedHtml(this string source)
        {
            return source.Replace("&amp;#x", "&#x");
        }
        /// <summary>
        /// Filter a text against a regular expression whitelist of specified characters.
        /// </summary>
        /// <param name="target">The text that is filtered using the whitelist.</param>
        /// <param name="alternativeTarget"></param>
        /// <param name="whiteList">Needs to be assigned a valid whitelist, otherwise nothing gets through.</param>
        public static bool CompliesWithWhitelist(this string target, string alternativeTarget = "", string whiteList = "")
        {
            if (string.IsNullOrEmpty(target))
                target = alternativeTarget;
            
            return Regex.IsMatch(target, whiteList);
        }
        /// <summary>
        /// Takes a string and returns another with a single layer of Html entity encoding replaced with its Html entity literals.
        /// </summary>
        /// <param name="encodedUserInput">The text to perform the operation on.</param>
        /// <param name="numberOfEscapes">The number of Html entity encodings that were replaced.</param>
        /// <returns>The text that's had a single layer of Html entity encoding replaced with its Html entity literals.</returns>
        public static string HtmlDecode(this string encodedUserInput, ref int numberOfEscapes)
        {
            const int NotFound = -1;

            if (string.IsNullOrEmpty(encodedUserInput))
                return string.Empty;

            StringWriter output = new StringWriter(CultureInfo.InvariantCulture);
            
            if (encodedUserInput.IndexOf('&') == NotFound)
            {
                output.Write(encodedUserInput);
            }
            else
            {
                int length = encodedUserInput.Length;
                for (int index1 = 0; index1 < length; ++index1)
                {
                    char ch1 = encodedUserInput[index1];
                    if (ch1 == 38)
                    {
                        int index2 = encodedUserInput.IndexOfAny(_htmlEntityEndingChars, index1 + 1);
                        if (index2 > 0 && encodedUserInput[index2] == 59)
                        {
                            string entity = encodedUserInput.Substring(index1 + 1, index2 - index1 - 1);
                            if (entity.Length > 1 && entity[0] == 35)
                            {
                                ushort result;
                                if (entity[1] == 120 || entity[1] == 88)
                                    ushort.TryParse(entity.Substring(2), NumberStyles.AllowHexSpecifier, NumberFormatInfo.InvariantInfo, out result);
                                else
                                    ushort.TryParse(entity.Substring(1), NumberStyles.AllowLeadingWhite | NumberStyles.AllowTrailingWhite | NumberStyles.AllowLeadingSign, NumberFormatInfo.InvariantInfo, out result);
                                if (result != 0)
                                {
                                    ch1 = (char)result;
                                    numberOfEscapes++;
                                    index1 = index2;
                                }
                            }
                            else
                            {
                                index1 = index2;
                                char ch2 = HtmlEntities.Lookup(entity);
                                if ((int)ch2 != 0)
                                {
                                    ch1 = ch2;
                                    numberOfEscapes++;
                                }
                                else
                                {
                                    output.Write('&');
                                    output.Write(entity);
                                    output.Write(';');
                                    continue;
                                }
                            }
                        }
                    }
                    output.Write(ch1);
                }
            }
            string decodedHtml = output.ToString();
            output.Dispose();
            return decodedHtml;
        }
        /// <summary>
        /// Escapes all character entity references (double escaping where necessary).
        /// Why? The XmlTextReader that is setup in XmlDocument.LoadXml on the service considers the character entity references (&#xxxx;) to be the character they represent.
        /// All XML is converted to unicode on reading and any such entities are removed in favor of the unicode character they represent.
        /// </summary>
        /// <param name="unencodedUserInput">The string that needs to be escaped.</param>
        /// <param name="numberOfEscapes">The number of escapes applied.</param>
        /// <returns>The escaped text.</returns>
        public static unsafe string HtmlEncode(this string unencodedUserInput, ref int numberOfEscapes)
        {
            if (string.IsNullOrEmpty(unencodedUserInput))
                return string.Empty;

            StringWriter output = new StringWriter(CultureInfo.InvariantCulture);
            
            if (output == null)
                throw new ArgumentNullException("output");
            int num1 = IndexOfHtmlEncodingChars(unencodedUserInput);
            if (num1 == -1)
            {
                output.Write(unencodedUserInput);
            }
            else
            {
                int num2 = unencodedUserInput.Length - num1;
                fixed (char* chPtr1 = unencodedUserInput)
                {
                    char* chPtr2 = chPtr1;
                    while (num1-- > 0)
                        output.Write(*chPtr2++);
                    while (num2-- > 0)
                    {
                        char ch = *chPtr2++;
                        if (ch <= 62)
                        {
                            switch (ch)
                            {
                                case '"':
                                    output.Write("&quot;");
                                    numberOfEscapes++;
                                    continue;
                                case '&':
                                    output.Write("&amp;");
                                    numberOfEscapes++;
                                    continue;
                                case '\'':
                                    output.Write("&amp;#x27;");
                                    numberOfEscapes = numberOfEscapes + 2;
                                    continue;
                                case '<':
                                    output.Write("&lt;");
                                    numberOfEscapes++;
                                    continue;
                                case '>':
                                    output.Write("&gt;");
                                    numberOfEscapes++;
                                    continue;
                                case '/':
                                    output.Write("&amp;#x2F;");
                                    numberOfEscapes = numberOfEscapes + 2;
                                    continue;
                                default:
                                    output.Write(ch);
                                    continue;
                            }
                        }
                        if (ch >= 160 && ch < 256)
                        {
                            output.Write("&#");
                            output.Write(((int)ch).ToString(NumberFormatInfo.InvariantInfo));
                            output.Write(';');
                            numberOfEscapes++;
                        }
                        else
                            output.Write(ch);
                    }
                }
            }
            string encodedHtml = output.ToString();
            output.Dispose();
            return encodedHtml;
        }

 

        private static unsafe int IndexOfHtmlEncodingChars(string searchString)
        {
            int num = searchString.Length;
            fixed (char* chPtr1 = searchString)
            {
                char* chPtr2 = (char*)((UIntPtr)chPtr1);
                for (; num > 0; --num)
                {
                    char ch = *chPtr2;
                    if (ch <= 62)
                    {
                        switch (ch)
                        {
                            case '"':
                            case '&':
                            case '\'':
                            case '<':
                            case '>':
                            case '/':
                                return searchString.Length - num;
                        }
                    }
                    else if (ch >= 160 && ch < 256)
                        return searchString.Length - num;
                    ++chPtr2;
                }
            }
            return CharacterIndexNotFound;
        }

        private static char[] _htmlEntityEndingChars = new char[2]
        {
            ';',
            '&'
        };
        private static class HtmlEntities
        {
            private static string[] _entitiesList = new string[253]
            {
                "\"-quot",
                "&-amp",
                "'-apos",
                "<-lt",
                ">-gt",
                " -nbsp",
                "¡-iexcl",
                "¢-cent",
                "£-pound",
                "¤-curren",
                "¥-yen",
                "¦-brvbar",
                "§-sect",
                "¨-uml",
                "©-copy",
                "ª-ordf",
                "«-laquo",
                "¬-not",
                "\x00AD-shy",
                "®-reg",
                "¯-macr",
                "°-deg",
                "±-plusmn",
                "\x00B2-sup2",
                "\x00B3-sup3",
                "´-acute",
                "µ-micro",
                "¶-para",
                "·-middot",
                "¸-cedil",
                "\x00B9-sup1",
                "º-ordm",
                "»-raquo",
                "\x00BC-frac14",
                "\x00BD-frac12",
                "\x00BE-frac34",
                "¿-iquest",
                "À-Agrave",
                "Á-Aacute",
                "Â-Acirc",
                "Ã-Atilde",
                "Ä-Auml",
                "Å-Aring",
                "Æ-AElig",
                "Ç-Ccedil",
                "È-Egrave",
                "É-Eacute",
                "Ê-Ecirc",
                "Ë-Euml",
                "Ì-Igrave",
                "Í-Iacute",
                "Î-Icirc",
                "Ï-Iuml",
                "Ð-ETH",
                "Ñ-Ntilde",
                "Ò-Ograve",
                "Ó-Oacute",
                "Ô-Ocirc",
                "Õ-Otilde",
                "Ö-Ouml",
                "×-times",
                "Ø-Oslash",
                "Ù-Ugrave",
                "Ú-Uacute",
                "Û-Ucirc",
                "Ü-Uuml",
                "Ý-Yacute",
                "Þ-THORN",
                "ß-szlig",
                "à-agrave",
                "á-aacute",
                "â-acirc",
                "ã-atilde",
                "ä-auml",
                "å-aring",
                "æ-aelig",
                "ç-ccedil",
                "è-egrave",
                "é-eacute",
                "ê-ecirc",
                "ë-euml",
                "ì-igrave",
                "í-iacute",
                "î-icirc",
                "ï-iuml",
                "ð-eth",
                "ñ-ntilde",
                "ò-ograve",
                "ó-oacute",
                "ô-ocirc",
                "õ-otilde",
                "ö-ouml",
                "÷-divide",
                "ø-oslash",
                "ù-ugrave",
                "ú-uacute",
                "û-ucirc",
                "ü-uuml",
                "ý-yacute",
                "þ-thorn",
                "ÿ-yuml",
                "Œ-OElig",
                "œ-oelig",
                "Š-Scaron",
                "š-scaron",
                "Ÿ-Yuml",
                "ƒ-fnof",
                "\x02C6-circ",
                "˜-tilde",
                "Α-Alpha",
                "Β-Beta",
                "Γ-Gamma",
                "Δ-Delta",
                "Ε-Epsilon",
                "Ζ-Zeta",
                "Η-Eta",
                "Θ-Theta",
                "Ι-Iota",
                "Κ-Kappa",
                "Λ-Lambda",
                "Μ-Mu",
                "Ν-Nu",
                "Ξ-Xi",
                "Ο-Omicron",
                "Π-Pi",
                "Ρ-Rho",
                "Σ-Sigma",
                "Τ-Tau",
                "Υ-Upsilon",
                "Φ-Phi",
                "Χ-Chi",
                "Ψ-Psi",
                "Ω-Omega",
                "α-alpha",
                "β-beta",
                "γ-gamma",
                "δ-delta",
                "ε-epsilon",
                "ζ-zeta",
                "η-eta",
                "θ-theta",
                "ι-iota",
                "κ-kappa",
                "λ-lambda",
                "μ-mu",
                "ν-nu",
                "ξ-xi",
                "ο-omicron",
                "π-pi",
                "ρ-rho",
                "ς-sigmaf",
                "σ-sigma",
                "τ-tau",
                "υ-upsilon",
                "φ-phi",
                "χ-chi",
                "ψ-psi",
                "ω-omega",
                "ϑ-thetasym",
                "ϒ-upsih",
                "ϖ-piv",
                " -ensp",
                " -emsp",
                " -thinsp",
                "\x200C-zwnj",
                "\x200D-zwj",
                "\x200E-lrm",
                "\x200F-rlm",
                "–-ndash",
                "—-mdash",
                "‘-lsquo",
                "’-rsquo",
                "‚-sbquo",
                "“-ldquo",
                "”-rdquo",
                "„-bdquo",
                "†-dagger",
                "‡-Dagger",
                "•-bull",
                "…-hellip",
                "‰-permil",
                "′-prime",
                "″-Prime",
                "‹-lsaquo",
                "›-rsaquo",
                "‾-oline",
                "⁄-frasl",
                "€-euro",
                "ℑ-image",
                "℘-weierp",
                "ℜ-real",
                "™-trade",
                "ℵ-alefsym",
                "←-larr",
                "↑-uarr",
                "→-rarr",
                "↓-darr",
                "↔-harr",
                "↵-crarr",
                "⇐-lArr",
                "⇑-uArr",
                "⇒-rArr",
                "⇓-dArr",
                "⇔-hArr",
                "∀-forall",
                "∂-part",
                "∃-exist",
                "∅-empty",
                "∇-nabla",
                "∈-isin",
                "∉-notin",
                "∋-ni",
                "∏-prod",
                "∑-sum",
                "−-minus",
                "∗-lowast",
                "√-radic",
                "∝-prop",
                "∞-infin",
                "∠-ang",
                "∧-and",
                "∨-or",
                "∩-cap",
                "∪-cup",
                "∫-int",
                "∴-there4",
                "∼-sim",
                "≅-cong",
                "≈-asymp",
                "≠-ne",
                "≡-equiv",
                "≤-le",
                "≥-ge",
                "⊂-sub",
                "⊃-sup",
                "⊄-nsub",
                "⊆-sube",
                "⊇-supe",
                "⊕-oplus",
                "⊗-otimes",
                "⊥-perp",
                "⋅-sdot",
                "⌈-lceil",
                "⌉-rceil",
                "⌊-lfloor",
                "⌋-rfloor",
                "〈-lang",
                "〉-rang",
                "◊-loz",
                "♠-spades",
                "♣-clubs",
                "♥-hearts",
                "♦-diams"
            };
            private static Dictionary<string, char> _lookupTable = GenerateLookupTable();

            private static Dictionary<string, char> GenerateLookupTable()
            {
                Dictionary<string, char> dictionary = new Dictionary<string, char>(StringComparer.Ordinal);
                foreach (string str in _entitiesList)
                    dictionary.Add(str.Substring(2), str[0]);
                return dictionary;
            }

            public static char Lookup(string entity)
            {
                char ch;
                _lookupTable.TryGetValue(entity, out ch);
                return ch;
            }
        }
    }
}

You may also notice that I’ve mocked the OperationContext.
Thanks to WCFMock, a mocking framework for WCF services.
I won’t include this code, but you can get it here.
I’ve used the popular NUnit test framework and RhinoMocks for the stubbing and mocking.
Both pulled into the solution using NuGet.
Most useful documentation for RhinoMocks:
http://ayende.com/Wiki/Rhino+Mocks+3.5.ashx
http://ayende.com/wiki/Rhino+Mocks.ashx

For this project I used NLog and wrapped it.
Now you start to get an idea of how to use the sanitisation.

using System;
using System.ServiceModel;
using System.ServiceModel.Channels;
using NUnit.Framework;
using System.Configuration;
using Rhino.Mocks;
using Common.Wrapper.Log;
using MockedOperationContext = System.ServiceModel.Web.MockedOperationContext;
using Common.WcfHelpers.ErrorHandling.Exceptions;

namespace Sanitisation.UnitTest
{
    [TestFixture]
    public class SanitiseTest
    {
        private const string _myTestIpv4Address = "My.Test.Ipv4.Address";
        private readonly int _maxLengthHtmlEncodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlEncodedUserInput"]);
        private readonly int _maxLengthHtmlDecodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlDecodedUserInput"]);
        private readonly string _encodedUserInput_thatsMaxDecodedLength = @"One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt &lt; five quot &quot; six gt &gt;.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt &lt; five quot &quot; six gt &gt;.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt &lt; five quot &quot; six gt &gt;.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt &lt; five quot &quot; six gt &gt;.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt &lt; five quot &quot; six gt &gt;.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt &lt; five quot &quot; six gt &gt;.";
        private readonly string _decodedUserInput_thatsMaxLength = @"One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.";

        [Test]
        public void Sanitise_UserInput_WhenGivenNull_ShouldReturnEmptyString()
        {
            Assert.That(new Sanitise().UserInput(null), Is.EqualTo(string.Empty));
        }

        [Test]
        public void Sanitise_UserInput_WhenGivenEmptyString_ShouldReturnEmptyString()
        {
            Assert.That(new Sanitise().UserInput(string.Empty), Is.EqualTo(string.Empty));
        }

        [Test]
        public void Sanitise_UserInput_WhenGivenSanitisedString_ShouldReturnSanitisedString()
        {
            // Open the whitelist up in order to test the encoding without restriction.
            Assert.That(new Sanitise(whiteList: @"^[\w\s\.,#/&'<"">]+$").UserInput(_encodedUserInput_thatsMaxDecodedLength), Is.EqualTo(_encodedUserInput_thatsMaxDecodedLength));
        }
        [Test]
        [ExpectedException(typeof(SanitisationWcfException))]
        public void Sanitise_UserInput_ShouldThrowExceptionIfEscapedInputToLong()
        {
            string fourThousandAndOneCharacters = "Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. 
Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand character";
            string expectedError = "The un-modified string received from the client with the following IP address: " +
                   '"' + _myTestIpv4Address + "\" " +
                   "exceeded the allowed maximum length of an escaped Html user input string. " +
                   "The maximum length allowed is: " +
                   _maxLengthHtmlEncodedUserInput +
                   ". The length was: " +
                   (_maxLengthHtmlEncodedUserInput+1) + ".";

            using(new MockedOperationContext(StubbedOperationContext))
            {
                try
                {
                    new Sanitise().UserInput(fourThousandAndOneCharacters);
                }
                catch(SanitisationWcfException e)
                {
                    Assert.That(e.Message, Is.EqualTo(expectedError));
                    Assert.That(e.UnsanitisedAnswer, Is.EqualTo(fourThousandAndOneCharacters));
                    throw;
                }
            }
        }
        [Test]
        [ExpectedException(typeof(SanitisationWcfException))]
        public void Sanitise_UserInput_DecodedUserInputShouldThrowException_WhenMaxLengthHtmlDecodedUserInputIsExceeded()
        {
            char oneCharOverTheLimit = '.';
            string expectedError =
                           "The string received from the client with the following IP address: " +
                           "\"" + _myTestIpv4Address + "\" " +
                           "after Html decoding exceeded the allowed maximum length of an un-escaped Html user input string." +
                           Environment.NewLine +
                           "The maximum length allowed is: " + _maxLengthHtmlDecodedUserInput + ". The length was: " +
                           (_decodedUserInput_thatsMaxLength + oneCharOverTheLimit).Length + oneCharOverTheLimit;

            using(new MockedOperationContext(StubbedOperationContext))
            {
                try
                {
                    new Sanitise().UserInput(_encodedUserInput_thatsMaxDecodedLength + oneCharOverTheLimit);
                }
                catch(SanitisationWcfException e)
                {
                    Assert.That(e.Message, Is.EqualTo(expectedError));
                    Assert.That(e.UnsanitisedAnswer, Is.EqualTo(_encodedUserInput_thatsMaxDecodedLength + oneCharOverTheLimit));
                    throw;
                }
            }
        }
        [Test]
        public void Sanitise_UserInput_ShouldLogAndSendEmail_IfNumberOfDecodedHtmlEntitiesDoesNotMatchNumberOfEscapes()
        {
            string encodedUserInput_with6HtmlEntitiesNotEscaped = _encodedUserInput_thatsMaxDecodedLength.Replace("&amp;#x2F;", "/");
            string errorWeAreExpecting =
                "It appears as if someone has circumvented the client side Html entity encoding." + Environment.NewLine +
                "The requesting IP address was: " +
                "\"" + _myTestIpv4Address + "\" " +
                "The sanitised input we receive from the client was the following:" + Environment.NewLine +
                "\"" + encodedUserInput_with6HtmlEntitiesNotEscaped + "\"" + Environment.NewLine +
                "The same input after decoding and re-escaping on the server side was the following:" + Environment.NewLine +
                "\"" + _encodedUserInput_thatsMaxDecodedLength + "\"";
            string sanitised;
            // setup _logger
            ILogger logger = MockRepository.GenerateMock<ILogger>();
            logger.Expect(lgr => lgr.logError(errorWeAreExpecting));

            Sanitise sanitise = new Sanitise(@"^[\w\s\.,#/&'<"">]+$", logger);

            using (new MockedOperationContext(StubbedOperationContext))
            {
                // Open the whitelist up in order to test the encoding etc.
                sanitised = sanitise.UserInput(encodedUserInput_with6HtmlEntitiesNotEscaped);
            }

            Assert.That(sanitised, Is.EqualTo(_encodedUserInput_thatsMaxDecodedLength));
            logger.VerifyAllExpectations();
        }        

        private static IOperationContext StubbedOperationContext
        {
            get
            {
                IOperationContext operationContext = MockRepository.GenerateStub<IOperationContext>();
                int port = 80;
                RemoteEndpointMessageProperty remoteEndpointMessageProperty = new RemoteEndpointMessageProperty(_myTestIpv4Address, port);
                operationContext.Stub(oc => oc.IncomingMessageProperties[RemoteEndpointMessageProperty.Name]).Return(remoteEndpointMessageProperty);
                return operationContext;
            }
        }
    }
}

Now the API code that we can use to do our sanitisation.

using System;
using System.Configuration;
// Todo : KC We need time to implement DI. Should be using something like ninject.extensions.wcf.
using OperationContext = System.ServiceModel.Web.MockedOperationContext;
using System.ServiceModel.Channels;
using Common.Security.Sanitisation;
using Common.WcfHelpers.ErrorHandling.Exceptions;
using Common.Wrapper.Log;

namespace Sanitisation
{

    public class Sanitise
    {
        private readonly string _whiteList;
        private readonly ILogger _logger;
        

        private string RequestingIpAddress
        {
            get
            {
                RemoteEndpointMessageProperty remoteEndpointMessageProperty = OperationContext.Current.IncomingMessageProperties[RemoteEndpointMessageProperty.Name] as RemoteEndpointMessageProperty;
                return ((remoteEndpointMessageProperty != null) ? remoteEndpointMessageProperty.Address : string.Empty);
            }
        }
        /// <summary>
        /// Provides server side escaping of Html entities, and runs the supplied whitelist character filter over the user input string.
        /// </summary>
        /// <param name="whiteList">Should be provided by DI from the ResourceFile.</param>
        /// <param name="logger">Should be provided by DI. Needs to be an asynchronous logger.</param>
        /// <example>
        /// The whitelist can be obtained from a ResourceFile like so...
        /// <code>
        /// private Resource _resource;
        /// _resource.GetString("WhiteList");
        /// </code>
        /// </example>
        public Sanitise(string whiteList = "", ILogger logger = null)
        {
            _whiteList = whiteList;
            _logger = logger ?? new Logger();
        }
        /// <summary>
        /// 1) Check field lengths.         Client side validation may have been negated.
        /// 2) Check against white list.	Client side validation may have been negated.
        /// 3) Check Html escaping.         Client side validation may have been negated.

        /// Generic Fail actions:	Drop the payload. No point in trying to massage and save, as it won't be what the user was expecting,
        ///                         Add full error to a WCFException Message and throw.
        ///                         WCF interception reads the WCFException.MessageForClient, and sends it to the user. 
        ///                         On return, log the WCFException's Message.
        ///                         
        /// Escape Fail actions:	Asynchronously Log and email full error to support.


        /// 1) BA confirmed 50 for text, and 400 for textarea.
        ///     As we don't know the field type, we'll have to go for 400.
        ///
        ///     First we need to check that we haven't been sent some huge string.
        ///     So we check that the string isn't longer than 400 * 10 = 4000.
        ///     10 is the length of our double escaped character references.
        ///     Or, we ask the business for a number.
        ///     If we fail here, perform Generic Fail actions and don't complete the following steps.
        /// 
        ///     Convert all Html Entity Encodings back to their equivalent characters, and count how many occurrences.
        ///
        ///     If the string is longer than 400, perform Generic Fail actions and don't complete the following steps.
        /// 
        /// 2) check all characters against the white list
        ///     If any don't match, perform Generic Fail actions and don't complete the following steps.
        /// 
        /// 3) re html escape (as we did in JavaScript), and count how many escapes.
        ///     If count is greater than the count of Html Entity Encodings back to their equivalent characters,
        ///     Perform Escape Fail actions. Return sanitised string.
        /// 
        ///     If we haven't returned, return sanitised string.
        
        
        /// Performs checking on the text passed in, to verify that client side escaping and whitelist validation has already been performed.
        /// Performs decoding, and re-encodes. Counts that the number of escapes was the same, otherwise we log and send email with the details to support.
        /// Throws exception if the client side validation failed to restrict the number of characters in the escaped string we received.
        ///     This needs to be intercepted at the service.
        ///     The exceptions default message for client needs to be passed back to the user.
        ///     On return, the interception needs to log the exception's message.
        /// </summary>
        /// <param name="sanitiseMe"></param>
        /// <returns></returns>
        public string UserInput(string sanitiseMe)
        {
            if (string.IsNullOrEmpty(sanitiseMe))
                return string.Empty;

            ThrowExceptionIfEscapedInputToLong(sanitiseMe);

            int numberOfDecodedHtmlEntities = 0;
            string decodedUserInput = HtmlDecodeUserInput(sanitiseMe, ref numberOfDecodedHtmlEntities);

            if(!decodedUserInput.CompliesWithWhitelist(whiteList: _whiteList))
            {
                string error = "The answer received from client with the following IP address: " +
                    "\"" + RequestingIpAddress + "\" " +
                    "had characters that failed to match the whitelist.";
                throw new SanitisationWcfException(error);
            }

            int numberOfEscapes = 0;
            string sanitisedUserInput = decodedUserInput.HtmlEncode(ref numberOfEscapes);

            if(numberOfEscapes != numberOfDecodedHtmlEntities)
            {
                AsyncLogAndEmail(sanitiseMe, sanitisedUserInput);
            }

            return sanitisedUserInput;
        }
        /// <note>
        /// Make sure the logger is setup to log asynchronously
        /// </note>
        private void AsyncLogAndEmail(string sanitiseMe, string sanitisedUserInput)
        {
            // no need for SanitisationException

            _logger.logError(
                "It appears as if someone has circumvented the client side Html entity encoding." + Environment.NewLine +
                "The requesting IP address was: " +
                "\"" + RequestingIpAddress + "\" " +
                "The sanitised input we receive from the client was the following:" + Environment.NewLine +
                "\"" + sanitiseMe + "\"" + Environment.NewLine +
                "The same input after decoding and re-escaping on the server side was the following:" + Environment.NewLine +
                "\"" + sanitisedUserInput + "\""
                );
        }

        /// <summary>
        /// This procedure may throw a SanitisationWcfException.
        /// If it does, ErrorHandlerBehaviorAttribute will need to pass the "messageForClient" back to the client from within the IErrorHandler.ProvideFault procedure.
        /// Once execution is returned, the IErrorHandler.HandleError procedure of ErrorHandlerBehaviorAttribute
        /// will continue to process the exception that was thrown in the way of logging sensitive info.
        /// </summary>
        /// <param name="toSanitise"></param>
        private void ThrowExceptionIfEscapedInputToLong(string toSanitise)
        {
            int maxLengthHtmlEncodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlEncodedUserInput"]);
            if (toSanitise.Length > maxLengthHtmlEncodedUserInput)
            {
                string error = "The un-modified string received from the client with the following IP address: " +
                    "\"" + RequestingIpAddress + "\" " +
                    "exceeded the allowed maximum length of an escaped Html user input string. " +
                    "The maximum length allowed is: " +
                    maxLengthHtmlEncodedUserInput +
                    ". The length was: " +
                    toSanitise.Length + ".";
                throw new SanitisationWcfException(error, unsanitisedAnswer: toSanitise);
            }
        }

        private string HtmlDecodeUserInput(string doubleEncodedUserInput, ref int numberOfDecodedHtmlEntities)
        {
            string decodedUserInput = doubleEncodedUserInput.HtmlDecode(ref numberOfDecodedHtmlEntities).HtmlDecode(ref numberOfDecodedHtmlEntities) ?? string.Empty;
            
            // if the decoded string is longer than MaxLengthHtmlDecodedUserInput throw
            int maxLengthHtmlDecodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlDecodedUserInput"]);
            if(decodedUserInput.Length > maxLengthHtmlDecodedUserInput)
            {
                throw new SanitisationWcfException(
                    "The string received from the client with the following IP address: " +
                    "\"" + RequestingIpAddress + "\" " +
                    "after Html decoding exceeded the allowed maximum length of an un-escaped Html user input string." +
                    Environment.NewLine +
                    "The maximum length allowed is: " + maxLengthHtmlDecodedUserInput + ". The length was: " +
                    decodedUserInput.Length + ".",
                    unsanitisedAnswer: doubleEncodedUserInput
                    );
            }
            return decodedUserInput;
        }
    }
}
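
To give an idea of how this gets called, here is an invented service method (in the real project the whitelist would come from a resource file via DI, and the exception would be handled by the WCF error-handling behaviour):

using Common.Wrapper.Log;
using Sanitisation;

public class AnswerService
{
    public string SaveAnswer(string answerFromBrowser)
    {
        // Word characters, whitespace and basic punctuation only; purely an example whitelist.
        var sanitise = new Sanitise(whiteList: @"^[\w\s\.,]+$", logger: new Logger());

        // Throws SanitisationWcfException if the length limits or the whitelist checks fail.
        string sanitisedAnswer = sanitise.UserInput(answerFromBrowser);

        // ... persist sanitisedAnswer ...
        return sanitisedAnswer;
    }
}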

As you can see, there’s a lot more work in the server side sanitisation than the client side.

Sanitising User Input from Browser. part 1

November 4, 2012

I was recently working on a web based project where security had not been thought about during design or development.
The following outlines my experience with retrofitting security.
It’s my hope that someone will find it useful for their own implementation.

We’ll be focussing on the client side in this post (part 1) and the server side in part 2.
We’ll also cover some preliminary discussion that will set the stage for this series.

The application consists of a WCF service delivering up content to some embedding code on any page in the browser.
The content is stored as Xml in the database and transformed into Html via Xslt.

The first activity I find useful is to go through the process of Threat Modelling the Application.
This process can be quite daunting for those new to it.
Here are a couple of references I find quite useful to get started:

https://www.owasp.org/index.php/Application_Threat_Modeling

https://www.owasp.org/index.php/Threat_Risk_Modeling#Decompose_Application

Actually, this one’s not bad either.

There is no single right way to do this.
The more you read and experiment, the more equipped you will be.
The idea is to think like an attacker thinks.
This may be harder for some than others, but it is essential in order to cover as many potential attack vectors as possible.
Remember, there is no secure system, just varying levels of insecurity.
It will always be much harder for the person or team creating/maintaining an application to discover the majority of its security weaknesses
than it is for the person attacking it.
The Threat Modelling topic is large and I’m not going to go into it here, other than to say, you need to go into it.

Threat Agents

Work out who your Threat Agents are likely to be.
Learn how to think like they do.
Learn what skills they have, and learn those skills yourself.
Sometimes the skills are very non-technical.
For example, walking through the door of your organisation on the weekend because the cleaners (or anyone else with access) forgot to lock up.
Or walking in while the cleaners are there and the technical staff are not (which is just as easy).
It happens more often than we like to believe.

Defense in Depth

To attempt to mitigate attacks, we need to take a multi layered approach (often called defence in depth).

What made me decide to start with sanitising user input from the browser anyway?
Well, according to the OWASP Top 10, Injection and Cross Site Scripting (XSS) are still the most popular techniques chosen to compromise web applications.
So it makes sense, if you’re dealing with web apps, to target the most commonly exploited techniques.

Now, in regards to defence in depth when discussing web applications:
if the attacker gets past the first line of defence, there should be something stopping them at the next layer, and so forth.
The aim is to stop the attack as soon as possible.
This is why we focus on the UI first, and later move our focus to the application server, then to the database.
Bear in mind though, that whatever we do on the client side can be circumvented relatively easily.
Client side code is out of our control, so it’s best effort.
Because of this, we need to perform the following not only in the browser, but as much as possible on the server side as well.

  1. Minimising the attack surface
  2. Defining maximum field lengths (validation)
  3. Determining a white list of allowable characters (validation)
  4. Escaping untrusted data, especially where you know it’s going to end up in an execution context. Even where you don’t think this is likely, it’s still possible.
  5. Using Stored Procedures / parameterised queries (not covered in this series).
  6. Least Privilege.
    Minimising the privileges assigned to every database account (not covered in this series).

Minimising the attack surface

Input fields should only allow certain characters to be entered.
Text inputs and textareas that are free form (anything is allowed) are very hard to constrain to a small white list.
Wherever possible, input fields should be constrained to well structured data,
like dates, social security numbers, zip codes, e-mail addresses, etc. The developer can then define a very strong validation pattern, usually based on regular expressions, for validating such input.
If the input field comes from a fixed set of options, like a drop down list or radio buttons, then the input needs to match exactly one of the values offered to the user in the first place.
As it was with the existing app I was working on, we had to allow just about everything in our free form text fields.
This will have to be re-designed in order to provide constrained input.
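
For the well structured cases, a strong pattern is straightforward. Here’s a minimal sketch (the formats and field names are illustrative examples only, not from the project):

// Hypothetical patterns for well structured inputs (client side sketch only;
// the same checks still need to happen on the server).
var fieldPatterns = {
  date: /^\d{2}\/\d{2}\/\d{4}$/,        // e.g. 31/12/2012
  zipCode: /^\d{5}(-\d{4})?$/,          // US zip or zip+4
  email: /^[^\s@]+@[^\s@]+\.[^\s@]+$/   // deliberately simple, not RFC complete
};

var isFieldValid = function (fieldName, value) {
  return fieldPatterns[fieldName].test(value);
};

isFieldValid('date', '31/12/2012');     // true
isFieldValid('zipCode', '90210');       // true
isFieldValid('email', 'not an email');  // false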

Defining maximum field lengths (validation)

This was already being done (sometimes) in the Xml content for inputs where type="text".
Don’t worry about the inputType="single", it gets transformed.

<input id="2" inputType="single" type="text" size="10" maxlength="10" />

If no maxlength is specified in the Xml, we now specify a default of 50 in the xsl used to do the transformation.
This way we had the inputs where type="text" covered on the client side.
This would also have to be validated on the server side when the service received values from these inputs where type="text".

    <xsl:template match="input[@inputType='single']">
      <xsl:value-of select="@text" />
        <input name="a{@id}" type="text" id="a{@id}" class="textareaSingle">
          <xsl:attribute name="value">
            <xsl:choose>
              <xsl:when test="key('response', @id)">
                <xsl:value-of select="key('response', @id)" />
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="string(' ')" />
              </xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <xsl:attribute name="maxlength">
            <xsl:choose>
              <xsl:when test="@maxlength">
                <xsl:value-of select="@maxlength"/>
              </xsl:when>
              <xsl:otherwise>50</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
        </input>
        <br/>
    </xsl:template>

For textareas we added maxlength validation as part of the white list validation.
See below for details.

Determining a white list of allowable characters (validation)

See bottom of this section for Update

Now this was quite an interesting exercise.
I needed to apply a white list to all characters being entered into the input fields.
A user can:

  1. type the characters in
  2. [ctrl]+[v] a clipboard full of characters in
  3. right click -> Paste

Covering all these scenarios as elegantly as possible was going to be a bit of a challenge.
I looked at a few JavaScript libraries including one or two JQuery plug-ins.
None of them covered all these scenarios effectively.
I wish they did, because I wasn’t totally happy with the solution I ended up with, as it required polling.
In saying that, I measured performance, and even bringing the interval right down had a negligible effect, and it covered all scenarios.

setupUserInputValidation = function () {

  var textAreaMaxLength = 400;
  var elementsToValidate;
  var whiteList = /[^A-Za-z_0-9\s.,]/g;

  var elementValue = {
    textarea: '',
    textareaChanged: function (obj) {
      var initialValue = obj.value;
      var replacedValue = initialValue.replace(whiteList, "").slice(0, textAreaMaxLength);
      if (replacedValue !== initialValue) {
        this.textarea = replacedValue;
        return true;
      }
      return false;
    },
    inputtext: '',
    inputtextChanged: function (obj) {
      var initialValue = obj.value;
      var replacedValue = initialValue.replace(whiteList, "");
      if (replacedValue !== initialValue) {
        this.inputtext = replacedValue;
        return true;
      }
      return false;
    }
  };

  elementsToValidate = {
    textareainputelements: (function () {
      var elements = $('#page' + currentPage).find('textarea');
      if (elements.length > 0) {
        return elements;
      }
      return 'no elements found';
    } ()),
    textInputElements: (function () {
      var elements = $('#page' + currentPage).find('input[type=text]');
      if (elements.length > 0) {
        return elements;
      }
      return 'no elements found';
    } ())
  };

  // store the intervals id in outer scope so we can clear the interval when we change pages.
  userInputValidationIntervalId = setInterval(function () {
    var element;

    // Iterate through each one and remove any characters not in the whitelist.
    // Iterate through each one and trim any that are longer than textAreaMaxLength.

    for (element in elementsToValidate) {
      if (elementsToValidate.hasOwnProperty(element)) {
        if (elementsToValidate[element] === 'no elements found')
          continue;

        $.each(elementsToValidate[element], function () {
          $(this).attr('value', function () {
            var name = $(this).prop('tagName').toLowerCase();
            name = name === 'input' ? name + $(this).prop('type') : name;
            if (elementValue[name + 'Changed'](this))
              this.value = elementValue[name];
          });
        });
      }
    }
  }, 300); // milliseconds
};

Each time we change page, we clear the interval and reset it for the new page.

clearInterval(userInputValidationIntervalId);

setupUserInputValidation();

Update 2013-06-02:

Now with HTML5 we have the pattern attribute on the input tag, which allows us to specify a regular expression that the text about to be received is checked against. We can also see it amongst the new HTML5 attributes. If used, this can make our JavaScript white listing redundant, provided we don’t have textareas, which the W3C has neglected to include the new pattern attribute on. I’d love to know why.
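
A minimal sketch of how it could be used, assuming a browser that supports the HTML5 constraint validation API (the whitelist mirrors the one used in the JavaScript above):

var patternInput = document.createElement('input');
patternInput.type = 'text';
patternInput.setAttribute('maxlength', '50');
patternInput.setAttribute('pattern', '[A-Za-z_0-9\\s.,]*'); // the pattern must match the entire value

patternInput.value = '<script>alert(1)</script>';
patternInput.checkValidity(); // false - angle brackets are not in the whitelist

patternInput.value = 'plain text, digits 123 and underscores_';
patternInput.checkValidity(); // true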

Escaping untrusted data

Escaped data will still render in the browser properly.
Escaping simply lets the interpreter know that the data is not intended to be executed,
and thus prevents the attack.

Now what we do here is extend the String prototype with a function called htmlEscape.

if (typeof Function.prototype.method !== "function") {
  Function.prototype.method = function (name, func) {
    this.prototype[name] = func;
    return this;
  };
}

String.method('htmlEscape', function () {

  // Escape the following characters with HTML entity encoding to prevent switching into any execution context,
  // such as script, style, or event handlers.
  // Using hex entities is recommended in the spec.
  // In addition to the 5 characters significant in XML (&, <, >, ", '), the forward slash is included as it helps to end an HTML entity.
  var character = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    // Double escape character entity references.
    // Why?
    // The XmlTextReader that is setup in XmlDocument.LoadXml on the service considers the character entity references (&#x27; and &#x2F;) to be the characters they represent.
    // All XML is converted to unicode on reading and any such entities are removed in favor of the unicode character they represent.
    // So we double escape character entity references.
    // These now get read to the XmlDocument and saved to the database as double encoded Html entities.
    // Now when these values are pulled from the database and sent to the browser, the browser decodes the &amp; and displays &#x27; and/or &#x2F; as literal text.
    // This isn't what we want to see in the browser.
    "'": '&amp;#x27;',    // &apos; is not recommended
    '/': '&amp;#x2F;'     // forward slash is included as it helps end an HTML entity
  };

  return function () {
    return this.replace(/[&<>"'/]/g, function (c) {
      return character[c];
    });
  };
}());

This allows us to, well, html escape our strings.

element.value.htmlEscape();

In looking through the OWASP XSS (Cross Site Scripting) Prevention Cheat Sheet,
the only untrusted data we are capturing is going to be inserted into an Html element by way of insertion into a textarea element,
or into the value attribute of input elements where type="text".
I initially thought I’d have to:

  1. Html escape the untrusted data which is only being captured from textarea elements.
  2. Attribute escape the untrusted data which is being captured from the value attribute of input elements where type="text".

RULE #2 – Attribute Escape Before Inserting Untrusted Data into HTML Common Attributes of the OWASP XSS (Cross Site Scripting) Prevention Cheat Sheet mentions:
“Properly quoted attributes can only be escaped with the corresponding quote.”
So I decided to test it.
Created a collection of injection attacks. None of which worked.
Turned out we only needed to Html escape for the untrusted data that was going to be inserted into the textarea element.
More on this in a bit.
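
To give an idea, the probes were along these lines (hypothetical examples of the classic patterns, not the actual collection used):

// Hypothetical examples of the kind of XSS probes worth throwing at the inputs.
var xssProbes = [
  '<script>alert(1)</script>',
  '" onmouseover="alert(2)',
  "'; alert(3); //",
  '<img src=x onerror=alert(4)>',
  'javascript:alert(5)'
];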

Now in regards to the code comments in the above code around having to double escape character entity references;
Because we’re sending the strings to the browser, it’s easiest to single decode the double encoded Html on the service side only.
Now because we’re still focused on the client side sanitisation,
and we are going to shift our focus soon to making sure we cover the server side,
we know we’re going to have to create some sanitisation routines for our .NET service.
Because the routines are quite likely going to be static, and we’re pretty much just dealing with strings,
let’s create an extensions class in a new project in a common library we’ve already got.
This will allow us to get the widest use out of our sanitisation routines.
It also allows us to wrap any existing libraries or parts of them that we want to get use of.

namespace My.Common.Security.Encoding
{
    /// <summary>
    /// Provides a series of extension methods that perform sanitisation.
    /// Escaping, unescaping, etc.
    /// Usually targeted at user input, to help defend against the likes of XSS attacks.
    /// </summary>
    public static class Extensions
    {
        /// <summary>
        /// Returns a new string in which all occurrences of a double escaped html character (that's an html entity immediately prefixed with another html entity)
        /// in the current instance are replaced with the single escaped character.
        /// </summary>
        /// <returns>The new string.</returns>
        public static string SingleDecodeDoubleEncodedHtml(this string source)
        {
            return source.Replace("&amp;#x", "&#x");
        }
    }
}

Now when we run our xslt transformation on the service, we chain our new extension method on the end.
Which gives us back a single encoded string that the browser is happy to display as the decoded value.

return Transform().SingleDecodeDoubleEncodedHtml();

Now back to my findings from the test above.
Turns out that “Properly quoted attributes can only be escaped with the corresponding quote.” really is true.
I thought that if I entered something like the following into the attribute value of an input element where type="text",
then the first double quote would be interpreted as the corresponding quote,
and the end double quote would be interpreted as the end quote of the onmouseover attribute value.

 " onmouseover="alert(2)

What actually happens, is during the transform…

xslCompiledTransform.Transform(xmlNodeReader, args, writer, new XmlUrlResolver());

All the relevant double quotes are converted to the double quote Html entity &quot; (without the single quotes), so the attempted injection ends up as harmless text:

 &quot; onmouseover=&quot;alert(2)

And all double quotes are being stored in the database as the plain double quote character.

Libraries and useful code

Microsoft Anti-Cross Site Scripting Library

OWASP Encoding Project
This is the Reform library. Supports Perl, Python, PHP, JavaScript, ASP, Java, .NET

Online escape tool supporting Html escape/unescape, Java, .NET, JavaScript

The characters that need escaping for inserting untrusted data into Html element content

JavaScript The Good Parts: pg 90 has a nice ‘entityify’ function

OWASP Enterprise Security API Used for JavaScript escaping (ESAPI4JS)

JQuery plugin

Changing encoding on html page

Cheat Sheets and Check Lists I found helpful

https://www.owasp.org/index.php/Input_Validation_Cheat_Sheet

https://www.owasp.org/index.php/OWASP_Validation_Regex_Repository

https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet

https://www.owasp.org/index.php/DOM_based_XSS_Prevention_Cheat_Sheet

https://www.owasp.org/index.php/OWASP_AJAX_Security_Guidelines

If any of this is unclear, let me know and I’ll do my best to clarify. Maybe you have suggestions of how this could have been improved? Let’s spark a discussion.

JavaScript Properties

October 2, 2012

In ECMAScript 5 we now have two distinct kinds of properties.

  1. Data properties
  2. Accessor properties

A property is a named collection of attributes:

  • value: any JavaScript value (Data properties only)
  • writable: boolean (Data properties only)
  • configurable: boolean, common to both Data and Accessor properties
  • enumerable: boolean, common to both Data and Accessor properties
  • get: a function that returns a value (Accessor properties only)
  • set: a function that takes an argument as its value (Accessor properties only)

configurable

Any attempt to delete the property or change its writable, configurable, or enumerable attributes will fail if configurable is set to false.
If using strict mode, we get a runtime error (a TypeError).
If not using strict mode, the behaviour is as it was with ES3: the deletion attempt is silently ignored.
Once configurable is set to false:
  • it can not be re-set to true.
  • we can still change the value and writable attributes, but writable only from true to false.
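
A quick sketch of this in practice:

var lockedObj = {};
Object.defineProperty(lockedObj, 'locked', { value: 1, writable: true, configurable: false });

delete lockedObj.locked; // returns false - silently ignored outside strict mode
lockedObj.locked;        // 1

// Still allowed while configurable is false: changing the value, and flipping writable from true to false.
Object.defineProperty(lockedObj, 'locked', { value: 2, writable: false });
lockedObj.locked;        // 2

// Not allowed: trying to make it configurable or enumerable again.
// Object.defineProperty(lockedObj, 'locked', { configurable: true }); // TypeError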

enumerable

If set to true, the property will be enumerated over when a for-in loop is encountered.
If set to false, it’s as if the property doesn’t exist for enumeration purposes; for-in loops and Object.keys ignore it (although it can still be accessed directly).
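
For example:

var person = {};
Object.defineProperty(person, 'secret', { value: 'hidden', enumerable: false });
person.name = 'kim';

for (var key in person) {
  console.log(key);               // only 'name' is logged
}
Object.keys(person);              // ['name']
'secret' in person;               // true - the property still exists, it just isn't enumerated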

In ES5:

  • if a property is defined the old fashioned way (assignment or an object literal), without using Object.defineProperty,
    the boolean attributes will all default to true.
  • if a property is defined using Object.defineProperty and the boolean attribute values are not specified,
    the boolean attributes will all default to false.

I was wondering about this, as I had heard conflicting stories.
IMO this follows the Principle of Least Astonishment (POLA).

var obj1 = {};
var obj1PropertyDesc;
var obj2 = {};
var obj2PropertyDesc;

Object.defineProperty(obj1, 'propOnObj1', {
   value: 'value of propOnObj1' //,
   // writable: false,
   // enumerable: false,
   // configurable: false,
});

obj1PropertyDesc = Object.getOwnPropertyDescriptor(obj1, 'propOnObj1');

// obj1PropertyDesc {
//    configurable: false,
//    enumerable: false,
//    value: "value of propOnObj1",
//    writable: false
// }

obj2.propOnObj2 = 'value of propOnObj2';

obj2PropertyDesc = Object.getOwnPropertyDescriptor(obj2, 'propOnObj2');

// obj2PropertyDesc {
//    configurable: true,
//    enumerable: true,
//    value: "value of propOnObj2",
//    writable: true
// }

So in general

Properties declared the old ES3 way are configurable (can be deleted).
Properties declared using Object.defineProperty are, by default, not configurable (can not be deleted).
See edge cases below.

The delete operator is used to remove a property from an object.
It does not touch properties in the prototype chain.
If you have a prototype that has a property with the same name, it will now be used when your code references the derived object’s property that no longer exists.

var objLiteral = {
   aProperty: 'value of super property'
}

var anObject = Object.create(objLiteral); // create is an ES5 method, but easy enough to replicate for ES3 implementations

anObject.aProperty = 'value of derived property';

anObject.aProperty  // 'value of derived property'
delete anObject.aProperty;
anObject.aProperty  // 'value of super property'

Edge cases

JavaScript Patterns pg 12 states “Implied globals created without var (regardless if created inside functions) can be
deleted.”
Thanks to Angus Croll for pointing this out as untrue.

obj1 = 'kims global property';
var obj1PropertyDesc;
var obj2PropertyDesc;

obj1PropertyDesc = Object.getOwnPropertyDescriptor(this, 'obj1');

// obj1PropertyDesc {
//    configurable: true,
//    enumerable: true,
//    value: "kims global property",
//    writable: true,
// }

(function (){
   obj2 = 'kims global property declared within function scope';
}());

obj2PropertyDesc = Object.getOwnPropertyDescriptor(this, 'obj2');

// obj2PropertyDesc {
//    configurable: false,
//    enumerable: true,
//    value: "kims global property declared within function scope",
//    writable: true
// }

delete obj2;
// Nope, obj2 was not deleted.
// turn strict mode on and we get the following error:
// Uncaught SyntaxError: Delete of an unqualified identifier in strict mode.

When you declare a global,
you are actually defining a property of the global object.
If you use the var keyword on that global, you are still creating a property.
That property is non-configurable (can not be deleted with the delete operator).
Only object properties with the configurable option set to true can be deleted.
Nothing else can be deleted.
Variables that are properties whose property descriptor we can’t access can never be deleted.

var obj1 = {};
var obj1PropertyDesc;

obj1PropertyDesc = Object.getOwnPropertyDescriptor(this, 'obj1');

// obj1PropertyDesc {
//    configurable: false
//    enumerable: true
//    value: Object
//    writable: true
// }

There are a couple of notable internal properties that are found on all ES3 and ES5 objects.
[[Get]] and [[Put]].
The Ecma specs enclose internal properties in double square brackets as a convention only.
In ES3 [[Get]] and [[Put]] are used to return and set the internal [[Value]] property.
According to the Ecma5 spec, the internal [[Get]] and [[Put]] properties appear to do the same thing, although it’s not stated explicitly.
This may just be an oversight of the spec.

Accessor Properties

All the examples so far have been showing data properties.
By default properties are data properties unless they define a getter and/or setter,
in which case they are defined as accessor properties.
There are two attributes that are distinct to accessor properties.
get and set.
Both of which allow a method (and only a method) to be assigned to them to get or set respectively.

[Image: JavaScript getter error]

  • Internally, the getter calls the function’s internal [[Call]] method with no arguments.
  • Internally, the setter calls the function’s internal [[Call]] method with an arguments list containing the assigned value as its sole argument.
    The setter may, but is not required to, have an effect on the value returned by subsequent calls to the property’s internal [[Get]] method.

So these attributes may or may not leverage the internal [[Get]] and [[Put]] properties that are found in ES3 and ES5 on all objects.
You can in fact define only a getter (readonly), or only a setter (write-only) accessor if you so choose.

Defining accessor properties literally:

var testObj = {
   // An ordinary data property
   dataProp: 'value',

   // An accessor property defined as a pair of functions
   // get accessorProp() { return this.dataProp; },
   set accessorProp(value) { this.dataProp = value; }
};

testObj.accessorProp = 'an updated string';
alert(testObj.accessorProp); // undefined
alert(testObj.dataProp);  // an updated string

Can we create a data (default) property and then change it to be an accessor property?

var testObj = {}; // Start with no properties at all
// Add a nonenumerable data property x with value 1.
Object.defineProperty(testObj, 'x', { value : 1,
writable: true,
enumerable: false,
configurable: true});

// Check that the property is there but is non-enumerable
alert(testObj.x); // 1

// check that we can't enumerate the testObj
alert(Object.keys(testObj)); // returns an empty array of strings

// Now modify the property x so that it is read-only
Object.defineProperty(testObj, 'x', { writable: false });

// Try to change the value of the property
testObj.x = 2;
// Fails silently or throws TypeError in strict mode
alert(testObj.x); // 1

// The property is still configurable, so we can change its value like this:
Object.defineProperty(testObj, 'x', { value: 2 });

alert(testObj.x); // 2

// what happens if we change configurable to false?
Object.defineProperty(testObj, 'x', { configurable: false });
Object.defineProperty(testObj, 'x', { value: 2.5 }); // Uncaught TypeError: Cannot redefine property: x

// Now change x from a data property to an accessor property
// providing we haven't set configurable to false as above.
Object.defineProperty(testObj, 'x', {
   get: function() {
      return 0;
   }
});

alert(testObj.x); // 0

Yip.

Ok, so what does a property descriptor of an Accessor Property look like?

var objWithMultipleProperties;
var objWithMultiplePropertiesDescriptor;

objWithMultipleProperties = Object.defineProperties({}, {
   x: { value: 1, writable: true, enumerable:true, configurable:true },
   y: { value: 1, writable: true, enumerable:true, configurable:true },
   r: {
      get: function() {
         return Math.sqrt(this.x*this.x + this.y*this.y)
      },
      enumerable:true,
      configurable:true
   }
});

objWithMultiplePropertiesDescriptor = Object.getOwnPropertyDescriptor(objWithMultipleProperties, 'r');
// objWithMultiplePropertiesDescriptor {
//    configurable: true,
//    enumerable: true,
//    get: function () {
//       // other members in here
//    },
//    set: undefined,
//   // ...
// }

The Global Object

When the JavaScript interpreter starts (or whenever a web browser loads a new page),
it creates a new global object and gives it an initial set of properties that define:
• global properties like undefined, Infinity, and NaN
• global functions like isNaN(), parseInt(), and eval()
• constructor functions like Date(), RegExp(), String(), Object(), and Array()
• global objects like Math and JSON

delete undefined;  // not deleted
delete Infinity;   // not deleted
delete NaN;        // not deleted
delete isNaN;      // deleted
delete parseInt;   // deleted
delete eval;       // deleted
delete Date;       // deleted
delete RegExp;     // deleted
delete String;     // deleted
delete Object;     // deleted
delete Array;      // deleted
delete Math;       // deleted
delete JSON;       // deleted
undefined = 'kims undefined'; // nonassignable
Infinity = 'kims infinity';   // nonassignable
NaN = 'kims nan';             // nonassignable
  1. Why are undefined, Infinity and NaN not removed?
  2. Are they non-configurable?
  3. Why are they non-assignable?
  4. How do we test this?
  5. Are they constants?
  6. Are Infinity, NaN and undefined reserved words?

I’ll answer these questions shortly.

ES3 properties

According to the standard
8.6 “Each property consists of a name, a value and a set of attributes”.
8.6.1 A property can have zero or more attributes from the following set:

These attributes along with others (see ES3 spec) are reserved for internal use.

  • ReadOnly: The property is a read-only property.
    Attempts by ECMAScript code to write to the property will be ignored.
    (Note, however, that in some cases the value of a property with the ReadOnly attribute may change over time because of actions taken by the host environment; therefore “ReadOnly” does not mean “constant and unchanging”!)
  • DontEnum: The property is not to be enumerated by a for-in enumeration.
  • DontDelete: Attempts to delete the property will be ignored.
  • Internal: Internal properties have no name and are not directly accessible via the property accessor operators, so they are not accessible to the ECMAScript program.
    How these properties are accessed is implementation specific.
    How and when some of these properties are used is specified by the language specification.

These property attribute values can not be changed.
An interesting internal property is [[Prototype]].
There are a number of ways to access the internal [[Prototype]] property indirectly.
I’ve detailed them in my post on prototypes here.
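
For example (Object.getPrototypeOf is ES5; __proto__ is non-standard but widely supported):

var base = { greet: function () { return 'hi'; } };
var derived = Object.create(base);

// Indirect ways to get at the internal [[Prototype]]:
Object.getPrototypeOf(derived) === base; // true (ES5)
derived.__proto__ === base;              // true (non-standard)
base.isPrototypeOf(derived);             // true (walks the prototype chain)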

More on ES5 properties

writable, enumerable and configurable replace the ES3 property attributes: ReadOnly, DontEnum, DontDelete.
The property attributes and their values define the property descriptor object (including Data or Accessor properties and those that apply to both (enumerable and configurable)).

The property attributes can be manually managed with:
  • the Object.defineProperty and Object.defineProperties methods
  • Object.getOwnPropertyDescriptor

var myObj = {};

Object.defineProperty(myObj, 'propOnMyObj', {
   value: 'property descriptor',
   writable: true,    // ReadOnly = false in ES3
   enumerable: false, // DontEnum = true in ES3
   configurable: true // DontDelete = false in ES3
});

console.log(myObj.propOnMyObj); // 'property descriptor'

// getOwnPropertyDescriptor is the only way to get the properties attributes.
// They don't exist as visible properties on the property (other than for setting them as above),
// they're stored internally in the ECMAScript engine.
var myPropertyDescriptor = Object.getOwnPropertyDescriptor(myObj, 'propOnMyObj');

console.log(myPropertyDescriptor.enumerable); // false
console.log(myPropertyDescriptor.writable);   // true
// etc.

getOwnPropertyDescriptor

There are lots of new methods defined in ES5.
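
A small (far from exhaustive) sample:

// A handful of the methods that are new in ES5.
var frozen = Object.freeze({ name: 'kim' });

Object.isFrozen(frozen);                            // true
Object.keys(frozen);                                // ['name']
Object.getOwnPropertyNames(frozen);                 // ['name']
Object.getPrototypeOf(frozen) === Object.prototype; // true
Array.isArray([1, 2, 3]);                           // true
'  trim me  '.trim();                               // 'trim me'
[1, 2, 3].map(function (n) { return n * 2; });      // [2, 4, 6]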

Now, back to the six questions we had above.

  1. Why are undefined, Infinity and NaN not removed?
    Because their property descriptors’ configurable attribute is set to false.
  2. Are they non-configurable?
    Yes, as above.
  3. Why are they non-assignable?
    Because their property descriptors’ writable attribute is set to false.
  4. How do we test this?
    By inspecting their property descriptors on the global object (see NaN, Number, the global Infinity property and undefined); there’s a sketch just after this list.
  5. Are they constants?
    Effectively, yes.
  6. Are Infinity, NaN and undefined reserved words?
    No. Avoid using their names, to remove ambiguity.
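
Here’s a minimal check, run at global scope (where this refers to the global object):

var globalObject = this; // at global scope; window in a browser

['undefined', 'NaN', 'Infinity'].forEach(function (name) {
  console.log(name, Object.getOwnPropertyDescriptor(globalObject, name));
  // In ES5 each descriptor reports writable: false, enumerable: false, configurable: false.
});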

ES3

  • Infinity: read/write (the value can be changed). Holds positive infinity.
  • Number: a property on the global object, which has read-only Infinity and NaN properties.
  • NaN: read/write (the value can be changed).
  • undefined

ES5

  • Infinity (well… POSITIVE_INFINITY) is a property on the global Number property, with the value Infinity.
    We now also have NEGATIVE_INFINITY, with the value -Infinity.
  • Infinity is also declared directly on the global object.
  • NaN is a property on the global object, with the value NaN.
  • undefined is a property on the global object, with the value (you guessed it) undefined.

These are all constants now.

Important differences between Properties and Variables

Variables are properties, but not vice versa.

The VariableObject in ES3 is called the VariableEnvironment in ES5.
Can be seen in the specs.
Not sure why they changed what they called it.

Each execution context (be it global or any function) has an associated VariableObject.
Variables (and functions) created within a given context are bound as properties of that context’s VariableObject.
Even function parameters are added as properties of the VariableObject.
Discussed in depth in:
ECMAScript3 spec under “10 Execution Contexts”
ECMAScript5 spec under “10.3 Execution Contexts” onwards
This is why we can access global variables as properties of the global object…
Because that’s what they are.
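
For example, at global scope in a browser:

var globalVariable = 'I am a property of the global object';

this.globalVariable;   // 'I am a property of the global object'
window.globalVariable; // same again - window is the global object in the HTML DOM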

  • The global object is created before control enters any execution context.
  • The global object is the same as the global contexts VariableObject.
  • In the HTML DOM; the window property of the global object is the global object.

Now variables of functions are similar, but we can’t access them as properties.
Why?…
ECMAScript has an Activation Object.
When control enters the execution context of a function, an activation object is created and associated with the execution context.
The activation object is initialised with:

  1. The this value
  2. an arguments property (referred to as a binding in ES5 spec) that has the DontDelete attribute (configurable set to false in ES5).

The activation object is then used as the VariableObject.
We can access members of the activation object but not the activation object itself,
which is why we can’t access the members as properties.
Further details in:
ECMAScript3 spec under “10.2 Entering An Execution Context”
ECMAScript5 spec under “10.4 Establishing an Execution Context”
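
A small sketch of why a function’s variables can’t be reached as properties:

function demo() {
  var localVariable = 'bound to the activation object';

  // Normal identifier lookup finds it on the activation object (the VariableObject for this call).
  console.log(localVariable);

  // But there is no way to reference the activation object itself,
  // so there is no object on which to look localVariable up as a property.
  console.log(this.localVariable); // undefined - in non-strict code 'this' is the global object, not the activation object
}

demo();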

Feature Detection (Yes, Including JavaScript)

I know this is not property specific, but it was something I thought noteworthy.

There’s a library that looks to have potential for JavaScript feature detection.
“has.js”
This should be useful for detecting what your users’ browsers are capable of, EcmaScript wise.
The project lead is Peter Higgins (Dojo Toolkit project lead).
Has a good sized group of committers.
May have potential to be a better Modernizr.
The source is here.
Explanation of has.js features here.

Additional References:

Succinct explanation of Variables vs Properties in JavaScript

EcmaScript5 Objects and Properties

Slideshow by Doug Crockford on ES5’s new parts

Dmitry Soshnikov’s elaborations on the Ecma standards:

http://dmitrysoshnikov.com/ecmascript/es5-chapter-1-properties-and-property-descriptors/

http://dmitrysoshnikov.com/ecmascript/chapter-7-2-oop-ecmascript-implementation/

http://dmitrysoshnikov.com/ecmascript/chapter-2-variable-object/

