Archive for the ‘JavaScript’ Category

Software Engineer Interview Quick Question Set

May 11, 2013

Ice breakers

  • Tell us a little bit about yourself and what drives you?
  • Ask a question from their CV that is positive, ‘what was your greatest success in your current or last role’
  • What’s your ideal job?
  • Can you give us one thing you really enjoyed in your last job?
  • What about one thing that you didn’t enjoy as much?
    How did you solve that?

Testing

  • How can you implement unit testing when there are dependencies between a business layer and a data layer, or the presentation layer and the business layer?
  • The development team is getting near release date. They start saying things like, we’re going to need a sprint to test. What would your reaction be?

Maintenance

  • What measures have you taken to make your software products more easily maintainable?
  • What is the most expensive part of the SDLC?
    (hint: reading others code)

Design and architecture

  • Can you explain some design patterns, and where you have used them?

Scrum

  • Have you used scrum before? (If the answer is no, move on)
  • If you were taken on as a team member and the team was failing Sprint after Sprint. What would you do?
  • What would you do if you were part of a Scrum Team and your manager asked you to do a piece of work not in the Scrum Backlog?
    (hint: manager needs to consult PO. Something has to be removed from Sprint backlog in order for something to be added)

Construction

  • When do you use an abstract class and when do you use an interface?
  • How do you make sure that your code is both safe and fast?
  • Can you describe the process you use for writing a piece of code, from requirements to delivery?

Software engineering questions

  • What are the benefits and drawbacks of Object Orientated Design?
    (hint: polymorphism inheritance encapsulation)
  • What books have you read on software engineering that you thought were good?
  • Explain the terms YAGNI, DRY, SOLID?
    (hint You Aint Gonna Need It. Build what you need as you need it, aggressively refactoring as you go along; don’t spend a lot of time planning for grandiose, unknown future scenarios. Good software can evolve into what it will ultimately become. Every piece of code is code we have to test. If the code is not needed, why are we spending time on it?)

Functional design questions

  • Which controls would you use when a user must select multiple items from a big list, in a minimal amount of space?
  • How would you design editing twenty fields for a list of 10 items? And editing 3 fields for a list of 1000 items?

Specific technical requirements

  • When, where and how do you optimize code?

Web questions

  • How would you mitigate SQL injection?
    (hint: looking for multi layered sanitisation. parameterised SQL. Least privileged account for data access)
  • Have you used XSS and can you provide us an example?
  • What JavaScript libraries have you used?
  • What are some of the irritating limitations of CSS?
  • How would you remove the ASP.NET_SessionId cookie from a MVC controllers Response?
    (hint: Response.Cookies["ASP.NET_SessionId"].Expires = DateTime.Now;)

JavaScript

  • How does JavaScript implement inheritance?
    (hint: via Object’s prototype property)

Service Oriented

  • What are the 3 things a WCF end point must have, or what is the ABC of a WCF service?
    (hint:
    Address – where the WCF service is hosted.
    Binding – that specifies the protocol and its myriad of options.
    Contract – service contract defines what service operations are available to the client for consumption.
    )

C# / .Net questions

  • What’s the difference between public, private, protected and internal modifiers?
  • What are the main differences between the .NET 2.0 and 4.0 garbage collector?
    (hint: background GC was introduced)
  • Describe the different ways arguments can be passed in C#
    (hint: pass val by val, pass val by ref, pass ref by val, pass ref by ref)
  • We have a Base class, we have a child class that inherits BaseClass. Does the child class inherit the base class’s private members?
    (hint: this is normally good for a laugh)
  • Have you ever worked with a deadlock and how did it occur?
  • When should locks be used in concurrent programming?
    (hint:
    when synchronization cannot be performed in any other way. This is rare. With careful thought and planning, there is just about always a better way. There are many ways to synchronise without using locks. System.Threading.Interlocked class generally supported by the processor
    )
  • What are some of your favourite .NET features?

Finally, this question is from Google; can you quickly tell us something that we don’t know anything about? It can be anything.

Advertisement

Software Engineer Interview Process and Questions

April 27, 2013

A short time ago, I was tasked with finding the right software engineer/s for the organisation I was working for. I settled on a process, a set of background questions,  a set of practical programming exercises and a set of verbal questions. Later on I cut the set of verbal questions down to a quicker set. In this post, I’ll be going over the process and the full set of verbal questions. In a subsequent post I’ll go over the quicker set.

The Process

  1. We sent them an email with a series of questions.
    Technical and non-technical.
    They have two days to reply with answers.
    The programming exercises are not covered here.
    If they passed this…
  1. We would get them in for an interview.
    Technical and non-technical questions would be asked.
    They would be put on the spot and asked to speak to the development team about a technical subject that they were familiar with.
    The development team would quiz them on whatever comes to mind.
    Once the candidate had left, the development team would collaborate on what they thought of the candidate and whether or not they would be a good fit for the team.
    The team would take this feedback and discuss whether the candidate should be given a trial. 
    Step 2 could be broken into two parts depending on how many questions and their intensity, you wanted to drill the candidate with.

The following set of tests will confirm whether the candidate satisfies the points we have asked for in the job description.

The non functional (soft) qualities listed on the Job add would need to be kept in mind during the interview events.

Qualities such as:

  • Quality focus
  • Passion
  • Personality
  • Commitment to the organisations needs
  • A genuine sense of excitement about the technologies we work with

Email test

  1. Send Screening.pdf
  2. Send InterviewQuestions.doc

Now with the following questions, with many of them there is not necessarily a right or wrong answer. Many of them are just to gauge how the candidate thinks and whether or not they hold the right set of values.

Ice breakers

  • Would you like to be the team leader or team member?
  • Tell me about a conflict at a previous job and how you resolved it.
  • (Summary personality item: Think to yourself, “If we hire this person, would I want to spend four hours driving in a car with them?”)

Design and architecture

  • What’s the difference between TDD and BDD and why do they matter?
  • What is Technical Debt. How do you deal with it once in it? How do you stay out of it?
  • How would you deal with a pair when reviewing their code, when they have not followed good design principles?
  • What would you do if a fellow team member reviewed your code and suggested you change something you had designed that followed good design principles, to something inferior?
  • Can you explain how the Composite pattern works and where you would use it?
  • Can you describe several class construction techniques?
    What are two design patterns that are focused on class construction, and how do they work?
    (hint: Builder, Factory Method).
  • How would you model the animal kingdom (with species and their behaviour) as a class system?
    (hint GoF design pattern. Abstract Factory)
  • Can you name a number of non-functional (or quality) requirements?
  • What is your advice when a customer wants high performance, high usability and high security?
  • What is your advice when a customer wants high performance, Good design, Cheap?
    (hint: pick 2)
  • What do low coupling and high cohesion mean? What does the principle of encapsulation mean to you?
  • Can you think of some concurrency patterns?
    (hint: Asynchronous Results, Background Worker, Compare/Exchange pattern via Interlocked.CompareExchange)
  • How would you manage conflicts in a web application when different people are editing the same data?
  • Where would you use the Command pattern?
  • Do you know what a stateless business layer is? Where do long-running transactions fit into that picture?
    (hint: if you have long-running transactions, you are going to have to manage state somehow. How would you do this?)
  • What kinds of diagrams have you used in designing parts of an architecture, or a technical design?
  • Can you name the different tiers and responsibilities in an N-tier architecture?
    (hint: presentation, business, data)
  • Can you name different measures to guarantee correctness and robustness of data in an architecture?
    (hint: for example transactions, thread synchronisation)
  • What does the acronym ACID stand for in relation to transactions?
    (hint: atomicity, consistency, isolation, durability)
  • Can you name any differences between object-oriented design and component-based design?
    (hint: objects vs services or documents)
  • How would you model user authorization, user profiles and permissions in a database?(hint: Membership API)

Scrum questions

  • Have you used Scrum before? (If the answer is no, not much point in asking the rest of these questions).
  • If you were taken on as a team member and the team was failing Sprint after Sprint. What would you do?
  • What are the Scrum events and the purpose of them?
    (hint: Daily Scrum, Sprint Planning Meetings 1 & 2, Sprint Review and Sprint Retrospective)
  • What would you do if you were part of a Scrum Team and your manager asked you to do a piece of work not in the Scrum Backlog?
  • Who decides what Product Backlog Items should be pulled into a Sprint?
  • What is the DoD and what is it useful for?
  • Where and how do changing requirements fit into scrum?

Construction questions

  • How do you make sure that your code can handle different kinds of error situations?
    (hint: TDD, BDD, testing…)
  • How do you make sure that your code is both safe and fast?
  • When would you use polymorphism and when would you use delegates?
  • When would you use a class with static members and when would you use a Singleton class?
  • Can you name examples of anticipating changing requirements in your code?
  • Can you describe the process you use for writing a piece of code, from requirements to delivery?
  • Explain DI / IoC. Are there any differences between the two? If so, what are they?
    (hint: DI is one method of following the Dependency Inversion Principle (DIP) or IoC)

Software engineering skills

  • What is Object Oriented Design? What are the benefits and drawbacks?
    (hint: polymorphism inheritance encapsulation)
  • What is the role of interfaces in design?
  • What books have you read on software engineering that you thought were good?
  • What are important aspects of GUI design?
  • What Object Relational Mapping tools have you used?
  • What are the differences between Model-View-Controller, Model-View-Presenter and Model-View-ViewModel
    Can you draw MVC and MVP?
    (hint: doted lines are pub/sub)

MVCM-V-VM

  • What is the difference between Mocks, Stubs, Fakes and Dummies?
  • (hint:
    Mocks are objects pre-programmed with expectations which form a specification of the calls they are expected to receive. Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what’s programmed in for the test.
    Stubs may also record information about calls, such as an email gateway stub that remembers the messages it ‘sent’, or maybe only how many messages it ‘sent’.
    Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in memory database is a good example).
    Dummy objects are passed around but never actually used. Usually they are just used to fill parameter lists.)
  • Describe the process you would take in setting up CI for our company?
  • We’re going to design the new IMDB.
    On the whiteboard, what would the table that holds the movies look like?
    Every movie has actors, how would the Actors table look?
    Actors star in many movies, any adjustments?
    We need to track Characters also. Any adjustments to the schema?

Relational Database

  • What metrics, like cyclomatic complexity, do you think are important to track in code?

Functional design questions

  • What are metaphors used for in functional design? Can you name some successful examples?
    (hint: Partial Function Application, Currying)
  • How can you reduce the user’s perception of waiting when some routines take a long time?
  • Which controls would you use when a user must select multiple items from a big list, in a minimal amount of space?
  • How would you design editing twenty fields for a list of 10 items? And editing 3 fields for a list of 1000 items?
  • Can you name some limitations of a web environment vs. a Windows environment?

Specific technical requirements

  • What software have you used for bug tracking and version control?
  • Which branching models have you used?
    (hint: No Branches, Release, Maintenance, Feature, Team)
  • What have you used for unit testing, integration testing, UA testing, UI testing?
  • What build tools are you familiar with?
    (hint: Nant, Make, Rake, PSake)

Web questions

  • Would you use a black list or white list? Why?
  • Can you explain XSS and how it works?
  • Can you explain CSRF? and how it works?
  • What is the difference between GET and POST in web forms? How do you decide which to use?
  • What do you know about HTTP.
    (hint: Application Layer of OSI model (layer 7), stateless)
  • What are the HTTP methods sometimes called verbs?
    (hint: there are 9 of them. HEAD, GET, POST, PUT, DELETE, TRACE, OPTIONS, CONNECT, PATCH)
  • How do you get the current users name from an MVC Controller?
    (hint: The controller has a User property which is of type IPrinciple which has an Identity property of type IIdentity, which has a Name property)
  • What JavaScript libraries have you used?
  • What is the advantage of using CSS?
  • What are some of the irritating limitations of CSS?

JavaScript questions

  • How does JavaScript implement inheritance?
    (hint: via Object’s prototype property)
  • What is the difference between "==" and "===", "!=" and "!=="?
    (hint: If the two operands are of the same type and have the same value, then “===” produces true and “!==” produces false. The evil twins do the right thing when the operands are of the same type, but if they are of different types, they attempt to coerce the values. The rules by which they do that are complicated and unmemorable.
    If you want to use "==", "!=" be sure you know how it works and test well.
    By default use “===” and “!==“. )
    These are some of the interesting cases:
'' == '0'          // false
0 == ''            // true
0 == '0'           // true
false == 'false'   // false
false == '0'       // true
false == undefined // false
false == null      // false
null == undefined  // true
' \t\r\n ' == 0    // true
  • On the whiteboard, could you show us how to create a function that takes an object and returns a child object?
if (typeof Object.create !== ‘function’) {
   Object.create = function (o) {
      var F = function () {};
      F.prototype = o;
      return new F();
   };
}
var child = Object.create(parent);
  • When is “this” bound to the global object?
    (hint: When the function being invoked is not the property of an object)
  • With the following code, how does myObject.pleaseSetValue set myObject.value?
var myObject = {
	value: 0
};

myObject.setValue = function () {
	var that = this; // don’t show this

	var pleaseSetValue = function () {
		that.value = 10; // don’t show this
	};
	pleaseSetValue ();
}
myObject.setValue();
document.writeln(myObject.value); // 10

Service Oriented questions

  • Can you think of any Advantages and Disadvantages in using SOA over an object oriented n-tier model?
  • What’s the simplest way to make a service call from within a web page and how many lines could you do this in?
  • What scales better, per-call services or per-session and why?
    (hint: maintaining service instances (maintaining state) in memory or any entities for that matter quickly blows out memory and other resources.)
  • What is REST’s primary objective?
  • How many ways can you create a WCF proxy?
    (hint:
    Add Service Reference via Visual Studio project
    Using svcutil.exe
    Create proxy on the fly with… new ChannelFactory<IMyContract>().CreateChannel();
    )
  • What do you need to turn on on the service in order to create a proxy?
    (hint: enable an HTTP-GET behaviour, or MEX endpoint)

C# / .Net questions

  • What’s the difference between public, private, protected and internal modifiers?
    Which ones can be used together?
  • What’s the difference between static and non-static methods?
  • What’s the most obvious difference in IL with static constructors?
    (hint: static method causes compiler to not mark type with beforefieldinit, thus giving lazy initialisation.)
  • How have you used Reflection?
  • What does the garbage collector clean up?
    (hint: managed resources, not unmanaged resources. Such as files, streams and handles)
  • Why would you implement the the IDisposable interface?
    (hint: clean up resources deterministically. Clean up unmanaged resources.)
  • Where should the Dispose function be called from?
    (hint: the objects finalizer)
  • Where is an objects finalizer called from?
    (hint: the GC)
  • If you call an objects Dispose method, what System method should you also make sure is called?
    (hint: System.GC.SuppressFinalize)
  • Why should System.GC.SuppressFinalize be called?
    (hint: finalization is expensive)
  • Are strings mutable or immutable?
    (hint: immutable)
  • What’s the most significant difference between struct’s and class’s?
    (hint: struct : value type, class : reference type)
  • What are the other differences between struct’s and class’s?
    (hint: struct’s don’t support inheritance (all value types are sealed) or finalizers)
    (hint: struct’s can have the same fields, methods, properties and operators)
    (hint: struct’s can implement interfaces)
  • Where are reference types stored? Where are value types stored?
    (hint:
    bit of a trick question. Ref on the heap, val on the stack (generally)
    The reference part of reference type local variables is stored on the stack.
    Value type local variables also on the stack.
    Content of reference type variables is stored on the heap.
    Member variables are stored on the heap.
    )
  • Where is the yield key word used?
    (hint: within an iterator)
  • What are some well known interfaces in the .net library that iterators provide implementation for?
    (hint: IEnumerable<T> )
  • Are static methods thread safe?
    (hint: a new stack frame is created with every method call. All local variables are safe… so long as they are not reference types being passed to another thread or being passed to another thread by ref.)
  • What is the TPL used for?
    (hint: a set of API’s in the System.Threading and System.Threading.Tasks namespaces simplifying the process of adding parallelism and concurrency to applications.)
  • What rules would you consider when choosing a lock object?
    (hint: keep the scope as tight as possible (private), so other threads cannot change its value, thus causing the thread to block.
    Declare as readonly, as its value should not be changed.
    Must not be a value type.
    If the lock keyword is used on a value type, the compiler will report an error.
    If used with System.Threading.Monitor, an exception will occur at runtime, because Monitor.Exit receives a boxed copy of the original variable.
    Never lock on “this”.)
  • Why would you declare a field as volatile?
    (hint: So that the order of the operations performed on the variable are not optimised to a different order.)
  • Are reads and writes to a long (System.Int64) atomic? Are reads and writes to a int (System.Int32) atomic?
    (hint: The runtime guarantees that a type whose size is no bigger than a native integer will not be read or written only partially. This is in the CLI spec and the C# 4.0 spec.)
  • Before invoking a delegate instance just before the null check is performed, What’s a good way to make sure no other threads can set your delegate to null between when the check occurs and when you invoke it?
    (hint:
    assign reference to heap allocated memory to stack allocated implements thread safety.
    Assign your delegate instance to a second local delegate variable.
    This ensures that if subscribers to your delegate instance are removed (by a different thread) between checking for null and firing the invocation, you won’t fire a NullReferenceException.)
void OnCheckChanged(EventArgs e) {
	// assign reference to heap allocated memory to
	// stack allocated implements thread safety

	// CheckChanged is a member declared as…  public event EventHandler CheckChanged;
	EventHandler threadSafeCheckChanged = CheckChanged;
	if (threadSafeCheckChanged != null)  {
		// fire the event off
		foreach(EventHandler handler in threadSafeCheckChanged.GetInvocationList()) {
			try {
				handler(this, e);
			}
			catch(Exception e) {
				// handling code
			}
		}
	}
}
  • What is a deadlock and how does one occur? Can you draw it on the white board?
    (hint: two or more threads wait for each other to release a synchronization lock.
    Example:
    Thread A requests a lock on _sync1, and then later requests a lock on _sync2 before releasing the lock on _sync1.
    At the same time,
    Thread B requests a lock on _sync2, followed by a lock on _sync1, before releasing the lock on _sync2.
    )
  • How many ways are there to implement an interface member, and what are they?
    (hint: two. Implicit and explicit member implementation)
  • How do I declare an explicit interface member?
    (hint: prefix the member name with the interface name)
public class MyClass : SomeBaseClass ,IListable, IComparable {
    // …
    public intCompareTo(object obj) {
        // …
    }

    #region IListable Members
    string[] Ilistable.ColumnValues {

        get {
            // …
            return values;
        }
    }
    #endregion
}
  • Write the above on a white board, then ask the following question. If I want to make a call to an explicit member implementation like the above, How do I do it?
string[] values;
    MyClass obj1, obj2;

    // ERROR:  Unable to call ColumnValues() directly on a contact
    // values = obj1.ColumnValues;

    // First cast to IListable.
    values = ((IListable)obj2).ColumnValues;
  • What is wrong with the following snippet?
    (hint: possibility of race condition.
    If two threads in the program both call GetNext simultaneously, two threads might be given the same number. The reason is that _curr++ compiles into three separate steps:
    1. Read the current value from the shared _curr variable into a processor register.
    2. Increment that register.
    3. Write the register value back to the shared _curr variable.
    Two threads executing this same sequence can both read the same value from _curr locally (say, 42), increment it (to, say, 43), and publish the same resulting value. GetNext thus returns the same number for both threads, breaking the algorithm. Although the simple statement _curr++ appears to be atomic, this couldn’t be further from the truth.)
// Each call to GetNext should hand out a new unique number
static class Counter {
    internal static int _curr = 0;
    internal static int GetNext() {
        return _curr++;
    }
}
  • What are some of your favourite .NET features?

Data structures

  • How would you implement the structure of the London underground in a computer’s memory?
    (hint: how about a graph. The set of vertices would represent the stations. The edges connecting them would be the tracks)
  • How would you store the value of a colour in a database, as efficiently as possible?
    (hint: assuming we are measuring efficiency in size and not retrieval or storage speed, and the colour is 16^6 (FFFFFF), store it as an int)
  • What is the difference between a queue and a stack?
  • What is the difference between storing data on the heap vs. on the stack?
  • What is the number 21 in binary format? And in hex?
    (hint: 10101, 15)
  • What is the last thing you learned about data structures from a book, magazine or web site?
  • Can you name some different text file formats for storing unicode characters?
  • How would you store a vector in N dimensions in a datatable?

Algorithms

  • What type of language do you prefer for writing complex algorithms?
  • How do you find out if a number is a power of 2? And how do you know if it is an odd number?
  • How do you find the middle item in a linked list?
  • How would you change the format of all the phone numbers in 10,000 static html web pages?
  • Can you name an example of a recursive solution that you created?
  • Which is faster: finding an item in a hashtable or in a sorted list?
  • What is the last thing you learned about algorithms from a book, magazine or web site?
  • How would you write a function to reverse a string? And can you do that without a temporary string?
  • In an array with integers between 1 and 1,000,000 one value is in the array twice. How do you determine which one?
  • Do you know about the Traveling Salesman Problem?

Testing questions

  • It’s Monday and we’ve just finished Sprint Planning. How would you organize testing?
  • How do you verify that new changes have not broken existing features?
    (hint: regression test)
  • What can you do reduce the chance that a customer finds things that he doesn’t like during acceptance testing?
  • Can you tell me something that you have learned about testing and quality assurance in the last year?
  • What sort of information would you not want to be revealed via Http responses or error messages?
    (hint: Critical info about the likes of server name, version, installed program versions, etc)
  • What would you make sure you turned off on an app or web server before deployment?
    (hint: directory listing?)

Maintenance questions

  • How do you find an error in a large file with code that you cannot step through?
  • How can you make sure that changes in code will not affect any other parts of the product?
  • How can you debug a system in a production environment, while it is being used?

Configuration management questions

  • Which items do you normally place under version control?
  • How would you manage changes to technical documentation, like the architecture of a product?

Project management

  • How many of the three variables scope, time and cost can be fixed by the customer?
  • Who should make estimates for the effort of a project? Who is allowed to set the deadline?
  • Which kind of diagrams do you use to track progress in a project?
  • What is the difference between an iteration and an increment?
  • Can you explain the practice of risk management? How should risks be managed?
  • What do you need to be able to determine if a project is on time and within budget?
    (hint: Product Backlog burn-down)
  • How do you agree on scope and time with the customer, when the customer wants too much?

Candidate displays how they communicate / present to a group of people about a technical topic they are passionate and familiar about.

References I used

If any of these questions or answers are not clear, or you have other great ideas for questions, please leave comments.

Running Wireshark as non-root user

April 13, 2013

As part of my journey with Node.js I decided I wanted to see exactly what was happening on the wire. I decided to use Burp Suite as the Http proxy interceptor and Wireshark as the network sniffer (not an interceptor). Wireshark can’t alter the traffic, it can’t decrypt SSL traffic unless the encryption key can be provided and Wireshark is compiled against GnuTLS.

This post is targeted at getting Wireshark running on Linux. If you’re a windows user, you can check out the Windows notes here.

When you first install Wireshark and try to start capturing packets, you will probably notice the error “You didn’t specify an interface on which to capture packets.”

When you try to specify an interface from which to capture, you will probably notice the error “There are no interfaces on which a capture can be done.”

You can try running Wireshark as root: gksudo wireshark

Wireshark as root

This will work, but of course it’s not a good idea to run a comprehensive tool like Wireshark (over 1’500’000 lines of code) as root.

So what’s actually happening here?

We have dumpcap and we have wireshark. dumpcap is the executable responsible for the low level data capture of your network interface. wireshark uses dumpcap. Dumpcap needs to run as root, wireshark does not need to run as root because it has Privilege Separation.

If you look at the above suggested “better way” here, this will make a “little” more sense. In order for it to make quite a lot more sense, I’ll share what I’ve just learnt.

Wireshark has implemented Privilege Separation which means that the Wireshark GUI (or the tshark CLI) can run as a normal user while the dumpcap capture utility runs as root. Why can’t this just work out of the box? Well there is a discussion here on that. It doesn’t appear to be resolved yet. Personally I don’t think that anybody wanting to use wireshark should have to learn all these intricacies to “just use it”. As the speed of development gets faster, we just don’t have time to learn everything. Although on the other hand, a little understanding of what’s actually happening under the covers can help in more ways than one. Anyway, enough ranting.

How do we get this to all “just work”

from your console:

sudo dpkg-reconfigure wireshark-common

You’ll be prompted:

Configuring wireshark-common

Respond yes.

The wireshark group will be added

If the Linux Filesystem Capabilities are not present at the time of installing wireshark-common (Debian GNU/kFreeBSD, Debian GNU/Hurd), the installer will fall back to set the set-user-id bit to allow non-root users to capture packets. Custom built kernels may lack Linux Capabilities.

The help text also warns about a security risk which isn’t an issue because setuid isn’t used. Rather what actually happens is the following:

addgroup --quiet --system wireshark
chown root:wireshark /usr/bin/dumpcap
setcap cap_net_raw,cap_net_admin=eip /usr/bin/dumpcap

You will then have to manually add your user to the wireshark group.

sudo adduser kim wireshark # replacing kim with your user

or

usermod -a -G wireshark kim # replacing kim with your user

log out then back in again.

I wanted to make sure that what I thought was happening was actually happening. You’ll notice that if you run the following before and after the reconfigure:

ls -liah /usr/bin/dumpcap | less

You’ll see:

-rwxr-xr-x root root /usr/bin/dumpcap initially
-rwxr-xr-x root wireshark /usr/bin/dumpcap after

And a before and after of my users and groups I ran:

cat /etc/passwd | cut -d: -f1
cat /etc/group | cut -d: -f1

Alternatively to using the following as shown above, which gives us a nice abstraction (if that’s what you like):

sudo dpkg-reconfigure wireshark-common

We could just run the following:

addgroup wireshark
sudo chgrp wireshark /usr/bin/dumpcap
sudo chmod 750 /usr/bin/dumpcap
sudo setcap cap_net_raw,cap_net_admin+eip /usr/bin/dumpcap

The following will confirm the capabilities you just set.

getcap /usr/bin/dumpcap

What’s with the setcap?

For full details, run:

man setcap
man capabilities

setcap sets the capabilities of each specified filename to the capabilities specified (thank you man ;-))

For sniffing we need two of the capabilities listed in the capabilities man page.

  1. CAP_NET_ADMIN Perform various network-related operations (e.g., setting privileged socket options, enabling multicasting, interface configuration, modifying routing tables). This allows dumpcap to set interfaces to promiscuous mode.
  2. CAP_NET_RAW Use RAW and PACKET sockets. Gives dumpcap raw access to an interface.

For further details check out Jeremy Stretch’s explanation on Linux Filesystem Capabilities and using setcap. There’s also some more info covering the “eip” in point 2 here and the following section.

man capabilities | grep -A24 "File Capabilities"

Lets run Wireshark as our usual low privilege user

Now that you’ve done the above steps including the log off/on, you should be able to run wireshark as your usual user and configure your listening interfaces and start capturing packets.

Also before we forget… Ensure Wireshark works only from root and from a user in the “wireshark” group. You can add a temp user (command shown above).

Log in as them and try running wireshark. You should have the same issues as you had initially. Remove the tempuser:

userdel -r tempuser

Setup of Chromium, Burp Suite, Node.js to view HTTP on the wire

March 30, 2013

As part of my Node.js development I really wanted to see what was going over the wire from chromium-browser to my Node.js web apps.

I have node.js installed globaly, express installed locally, a very simple express server listening on port 3000

var express = require('express');
var app = express();

app.get('/', function (request, response) {
   response.send('Welcome to Express!');
});

app.listen(3000);

Burp Suite setup in my main menu. Added the command via System menu -> Preferences -> Main Menu

Burp Suite Command

The Command string looks like the following.

java -jar -Xmx1024m /WhereTheBurpSuiteLives/burpsuite_free_v1.5.jar

Setting up Burp Suite configuration details are found here. I’ve used Burp Suite before several times. Most notably to create my PowerOffUPSGuests library which I discuss here. In that usage I reverse engineered how the VMware vSphere client shuts down it’s guests and replicated the traffic in my library code. For a simple setup, it’s very easy to use. You can spend hours exploring Burps options and all the devious things you can use it for, but to get started it’s simple. Set it up to listen on localhost and port 3001 for this example.

Burp Suite Proxy Listeners

Run the web app

to start our express app from the directory where our above server is located, from a console, run:

node index.js

Where index.js is the name of the file that contains our JavaScript.

To test that our express server is active. We can browse to http://localhost:3000/ or we can curl it:

curl -i  http://localhost:3000/

Should give us something in return like:


HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: text/html; charset=utf-8
Content-Length: 19
Date: Sun, 24 Mar 2013 07:53:38 GMT
Connection: keep-alive

Welcome to Express!

Now for the Proxy interception (Burp Suite)

Now that we’ve got end to end comms, lets test the interceptor.

Run burpsuite with the command I showed you above.

Fire the Http request at your web app via the proxy:

curl -i --proxy http://localhost:3001 http://localhost:3000/

Now you should see burps interceptor catch the request. On the Intercept tab, press the Forward button and curl should show a similar response to above.

Burp Suite Proxy Intercept

If you look at the History tab, you can select the message curl sent and also see the same Response that curl received.

Burp Suite Proxy History

Now you can also set Burp to intercept the server responses too. In fact Burp is extremely configurable. You can also pass the messages to different components of Burp to process how ever you see fit. As you can see in the above image looking at all the tabs that represent burp tools. These can be very useful for penetration testing your app as you develop it.

I wanted to be able to use chromium normally and also be able to open another window for browsing my express apps and viewing the HTTP via Burp Suite. This is actually quite simple. Again with your app running locally on port 3000 and burp listening on port 3001, run:

chromium-browser --temp-profile --proxy-server=localhost:3001

For more chromium options:

chromium-browser -help

Now you can just browse to your web app and have burp intercept your requests.

chromium proxied via burp

You may also want to ignore requests to your search provider, because as your typing in URL’s chromium will send searches when you pause. Under Proxy->Options tab you can do something like this:

Ignore Client Requests

Generic Coding Standards and Guidelines

January 5, 2013

Merging Conventions to Aid Readability thus Reducing Development Time

When programming in a mixed-language environment,
the naming conventions, formatting conventions, documentation conventions, and other conventions,
can be optimised for overall consistency and readability.
This may mean going against convention for one or more of the languages that’s part of the mix.

For Example…

in many classical Object Oriented and procedural languages,
routine names have an initial capital letter (PascalCase).
The convention in JavaScript is for routine names to have an initial lower case letter (camelCase),
unless the routine is a constructor (intended to be used with the new prefix).
When a constructor is invoked without the new prefix,
the constructors this will be bound to the global object,
rather than where it should be…
The functions execution context.
When invoked with the new prefix as it should be,
the function object will be created with a hidden link to the value of the functions prototype,
and the functions this value will be bound to the function object (where it should be).
Because this convention has a very important reason,
your team may decide to carry that convention across the other languages you use.

Refactor or Document Short, Hard to Read Names

I don’t know how many times I see code that uses very short names which make readability difficult.
What’s worse, is that so often there are many different names that mean the same thing sprinkled across the project/s.
Short, hard to read, pronounce, or understand names are rarely needed with the programming languages of today.
Use easily and quickly readable names where ever possible.
If you have to use short names or abbreviations, keep them consistent.
Translation tables are good for this.
You can have a commented translation table at the beginning of a file,
or at the project level if the names are wider spread.
Names should be specific to the domain your working in, rather than to the programming language.

Meaningful Loop Index Names

If your loop is more than a couple of lines long or you have nested loops,
make your loop index name something meaningful,
rather than i, j, k etc.

Additional Thoughts

  • Code is read many more times than it is written.
    Make sure the names you choose favour read-time over write-time convenience.
  • If you have names that are general or vague enough to be used for multiple purposes,
    refactor your code, maybe create additional entities that have more specific names.
  • Don’t leave the meaning of the name to guess work.
    This taxes the programmers mind unnecessarily.
    There are better uses of our cycles.
  • Agree on and adopt a set of coding standards and guidelines.
    It’s more important to have standards than to not have them because you can’t agree on the “right” way.
    They will save wasted time and arguments during coding, and code reviewing.

JavaScript Coding Standards and Guidelines

December 19, 2012

This is the current set of coding standards and guidelines I use when I’m coding in the JavaScript language.
I thought it would be good to share so others could get use out of them also, and maybe start a discussion as to amendments / changes they see that could be useful?

Naming Conventions

Names should be formed from the 26 upper and lower case letters (A .. Z, a .. z), the 10 digits (0 .. 9), and _ (underbar).
Avoid use of international characters because they may not read well or be understood everywhere.
Do not use $ (dollar sign) or \ (backslash) in names.

I think jQuery would have to be an exception to this

Do not use _ (underbar) as the first character of a name.
It is sometimes used to indicate privacy, but it does not actually provide privacy.
If privacy is important, use the forms that provide private members.

Most variables and functions should start with a lower case letter.

Constructor functions which must be used with the new prefix should start with a capital letter.
JavaScript issues neither a compile-time warning nor a run-time warning if a required new is omitted.
Bad things can happen if new is not used, so the capitalization convention is the only defence we have.

Global variables should be in all caps.
JavaScript does not have macros or constants, so there isn’t much point in using all caps to signify features that JavaScript doesn’t have.
Same with Enum names. There is no native ability to create constant variables. Although… you can create read-only properties.
With the advent of ES5 we now have a couple of well known techniques to enforce that our property values can not be altered once initialised.

When you define a property using one of the ES5 techniques, (1) when you set the writable property attribute to false the value of the value attribute can no longer be altered. (2) Using an accessor property with only a get function

var objWithMultipleProperties;
var objWithMultiplePropertiesDescriptor;

objWithMultipleProperties = Object.defineProperties({}, {
   x: { value: 1, writable: true, enumerable:true, configurable:true }, // Change writable to false enables read-only semantics.
   y: { value: 1, writable: true, enumerable:true, configurable:true }, // Change writable to false enables read-only semantics.
   r: {
      get: function() {
         return Math.sqrt(this.x*this.x + this.y*this.y)
      },
      enumerable:true,
      configurable:true
   }
});

objWithMultiplePropertiesDescriptor = Object.getOwnPropertyDescriptor(objWithMultipleProperties, 'r');
// objWithMultiplePropertiesDescriptor {
//    configurable: true,
//    enumerable: true,
//    get: function () {
//       // other members in here
//    },
//    set: undefined,
//   // ...
// }

See here for complete coverage of property attributes.

Coding Style

Commenting

Use comments where needed.

If the script needs commenting due to complexity, consider revising before commenting.
Comments easily rot (can be left behind after future code changes).
It is important that comments be kept up-to-date. Erroneous comments can make programs even harder to read and understand.
In this case they cause more damage than they originally gave benefit.
Comments should be well-written and clear, just like the code they are annotating.

Make comments meaningful. Focus on what is not immediately visible. Don’t waste the reader’s time with stuff like

i = 0; // Set i to zero.

Comment Style

Block comments are not safe for
commenting out blocks of code. For example:

/*
var rm_a = /a*/.match(s);
*/

causes a syntax error. So, it is recommended that /* */ comments be avoided and //
comments be used instead.

Don’t use HTML comments in script blocks. In the ancient days of javascript (1995), some browsers like Netscape 1.0 didn’t have any support or knowledge of the script tag. So when javascript was first released, a technique was needed to hide the code from older browsers so they wouldn’t show it as text in the page. The ‘hack’ was to use HTML comments within the script block to hide the code.

<script language="javascript">
<!--
   // code here
   //-->
</script>

No browsers in common use today are ignorant of the <script> tag, so hiding of javascript source is no longer necessary. Thanks Matt Kruse for re-iterating this.

File Organization

JavaScript programs should be stored in and delivered as .js files.

  • JavaScript should not be in HTML
  • JavaScript should not be in CSS
  • CSS should not be in JavaScript
  • HTML should not be in JavaScript

Code in HTML adds significantly to page weight with no opportunity for mitigation by caching and compression. There are many other reasons to keep the UI layers separate.

<script src=filename.js>; tags should be placed as late in the body as possible.
This reduces the effects of delays imposed by script loading on other page components.

There is no need to use the language or type attributes. It is the server, not the script tag, that determines the MIME type.

Formatting

Bracing

Use same line opening brace.
Brace positioning is more or less a holy war without any right answer — except in JavaScript, where same-line braces are right and you should always use them. Here’s why:

return
{
  ok: false;
};

return {
  ok: true;
};

What’s the difference between these two snippets? Well, in the first one, you silently get something completely different than what you wanted.
The lone return gets mangled by the semicolon insertion process and becomes return; and returns nothing.
The rest of the code becomes a plain old block statement, with ok: becoming a label (of all things)! Having a label there might make sense in C, where you can goto, but in JavaScript, it makes no sense in this context.
And what happens to false? it gets evaluated and completely ignored.
Finally, the trailing semicolon — what about that?
Do we at least get a syntax error there? Nope: empty statement, like in C.

Spacing

Blank lines improve readability by setting off sections of code that are logically related.

Blank spaces should be used in the following circumstances:

  • A keyword followed by ( (left parenthesis) should be separated by a space.
    while (true) {
    
  • A blank space should not be used between a function value and its ( (left parenthesis). This helps to distinguish between keywords and function invocations.
  • All binary operators except . (period) and ( (left parenthesis) and [ (left bracket) should be separated from their operands by a space.
  • No space should separate a unary operator and its operand except when the operator is a word such as typeof.
  • Each ; (semicolon) in the control part of a for statement should be followed with a space.
  • Whitespace should follow every , (comma).

Tabs and Indenting

The unit of indentation is three spaces.
Use of tabs should be avoided because (as of this writing in the 21st Century) there still is not a standard for the placement of tab stops.
The use of spaces can produce a larger file size, but the size is not significant over local networks, and the difference is eliminated by minification.

Line Length

Avoid lines longer than 80 characters. When a statement will not fit on a single line, it may be necessary to break it.
Place the break after an operator, ideally after a comma.
A break after an operator decreases the likelihood that a copy-paste error will be masked by semicolon insertion.
The next line should be indented six spaces.

Language Usage

Access To Members

In JavaScript we don’t have access modifiers like we do in classical languages.

In saying that, we can and should still control the accessibility of our members, and keep as much of our objects information secret to outsiders.
We can apply public, private and Privileged access.

public

Members created in the Constructor using the dot notation

function Container(param) {
    this.member = param;
}

So, if we construct a new object

var myContainer = new Container('abc');

then myContainer.member contains 'abc'.

In the prototype we can add a public method.
To add a public method to all objects made by a constructor, add a function to the constructor’s prototype:

Container.prototype.stamp = function (string) {
    return this.member + string;
}

We can then invoke the method

myContainer.stamp('def')

which produces 'abcdef'.

private

Ordinary vars and parameters of the constructor becomes the private members

function Container(param) {
    this.member = param;
    var secret = 3;
    var that = this;
}

param, secret, and that are private member variables.
They are not accessible to the outside.
They are not even accessible to the objects own public methods.

They are accessible to private methods
Private methods are inner functions of the constructor.

function Container(param) {

    this.member = param; // param is private, member is public
    var secret = 3;      // secret is private
    var that = this;     // that is private
    function dec() {     // dec is private
        var innerFunction = function () {
            that.value = someCrazyValue;  // when dec is called, value will be a member of the newly constructed Container instance.
        };

        if (secret > 0) {
            secret -= 1;
            return true;
        } else {
            return false;
        }
    }
}

A method (which is a function that belongs to an object) cannot employ an inner function to help it do its work because the inner function does not
share the method’s access to the object as its this is bound to the global object. This was a mistake in the design of the language.
Had the language been designed correctly, when the inner function is invoked, this would still be bound to the this variable of the outer function.
The work around for this is to define a variable and assign it the value of this.
By convention, the name of that variable I use is that.

  • Private methods cannot be called by public methods. To make private methods useful, we need to introduce a privileged method.
privileged

A privileged method is able to access the private variables and methods, and is itself accessible to the public methods and the outside. It is possible to delete or replace a privileged method, but it is not possible to alter it, or to force it to give up its secrets.

Privileged methods are assigned with this within the constructor.

function Container(param) {

    this.member = param;
    var secret = 3;
    var that = this;

    function dec() {
        if (secret > 0) {
            secret -= 1;
            return true;
        } else {
            return false;
        }
    }

    this.service = function () { // prefix with this to assign a function to be a privileged method
        if (dec()) {
            return that.member;
        } else {
            return null;
        }
    };
}

service is a privileged method. Calling myContainer.service() will return 'abc' the first three times it is called. After that, it will return null. service calls the private dec method which accesses the private secret variable. service is available to other objects and methods, but it does not allow direct access to the private members.

eval is Evil

The eval function is the most misused feature of JavaScript. Avoid it.
eval has aliases. Do not use the Function constructor. Do not pass strings to setTimeout or setInterval.

Global Abatement

JavaScript makes it easy to define global variables that can hold all of the assets of your application.
Unfortunately, global variables weaken the resiliency of programs and should be avoided.

One way to minimize the use of global variables is to create a single global variable
for your application:

var KimsGlobal = {};

That variable then becomes the container for your application:

KimsGlobal.Calculate = {
    calculatorOutPutArray: [
    $('.step1i1'),
    $('.step1i2'),
    $('.step1r1'),
    ...
    ],
    jQueryObjectCounter: 0,
    jQueryDOMElementCounter: 0
    ...
};

KimsGlobal.NavigateCalculator = {
    currentStepId: stepId,
    step: 'step' + stepId,
    stepSelector: = '#' + step,
    ...
};

By reducing your global footprint to a single name, you significantly reduce the chance of bad interactions with other applications, widgets, or libraries.
Your program also becomes easier to read because it is obvious that KimsGlobal.Calculate refers to a top-level structure.

Using closure for information hiding is another effective global abatement technique.

var digit_name = (function() {
  var names = ['zero', 'one', 'two', ...];
  return function (n) {
    return names[n];
  };
})();

Using the Module patterns explains this in detail.

Scoping

JavaScript scoping is different to classical languages, and can take some getting used to for programmers used to languages such as C, C++, C#, Java.
Classical languages like the before mentioned have block scope.
JavaScript has function scope. Although ES6 is bringing in block scoping with the let keyword.

In the following example “10” will be alerted.
var foo = 1; // foo is defined in global scope.
function bar() {
    if (!foo) { // The foo variable of the bar scope has been hoisted directly above this if statement, but the assignment has not. So it is unassigned (undefined).
        var foo = 10;
    }
    alert(foo);
}
bar();
In the following example “1” will be alerted.
var a = 1;
function b() {
    a = 10;
    return;
    function a() {}
}
b();
alert(a);
In the following example Firebug will show 1, 2, 2.
var x = 1;
console.log(x); // 1
if (true) {
    var x = 2;
    console.log(x); // 2
}
console.log(x); // 2

In JavaScript, blocks such as if statements, do not create new scope. Only functions create new scope.

There is a workaround though 😉
JavaScript has Closure.
If you need to create a temporary scope within a function, do the following.

function foo() {
    var x = 1;
    if (x) {
        (function () {
            var x = 2;
            // some other code
        }());
    }
    // x is still 1.
}

Line 3: begins a closure
Line 6: the closure invokes itself with ()

Hoisting

Terminology

function declaration or function statement are the same thing.
function expression or variable declaration with function assignment are the same thing.

A function statement looks like the following:

function foo( ) {}

A function expression looks like the following:

var foo = function foo( ) {};

A function expression must not start with the word “function”.

//anonymous function expression
var a = function () {
    return 3;
}

//named function expression
var a = function bar() {
    return 3;
}

//self invoking named function expression. This is also a closure
(function sayHello() {
    alert('hello!');
})();

//self invoking anonymous function expression. This is also a closure
(function ( ) {
    var hidden_variable;
    // This function can have some impact on
    // the environment, but introduces no new
    // global variables.
}() );

In JavaScript, a name enters a scope in one of four basic ways:

  1. Language-defined: All scopes are, by default, given the names this and arguments.
  2. Formal parameters: Functions can have named formal parameters, which are scoped to the body of that function.
  3. Function declarations: These are of the form function foo() {}.
  4. Variable declarations: These take the form var foo;.

Function declarations and variable declarations are always hoisted invisibly to the top of their containing scope by the JavaScript interpreter.
Function parameters and language-defined names are, obviously, already there. This means that code like this:

function foo() {
    bar();
    var x = 1;
}

Is actually interpreted like this:

function foo() {
    var x;
    bar();
    x = 1;
}

It turns out that it doesn’t matter whether the line that contains the declaration would ever be executed. The following two functions are equivalent:

function foo() {
    if (false) {
        var x = 1;
    }
    return;
    var y = 1;
}
function foo() {
    var x, y;
    if (false) {
        x = 1;
    }
    return;
    y = 1;
}

The assignment portion of the declaration is not hoisted.
Only the identifier is hoisted.
This is not the case with function declarations, where the entire function body will be hoisted as well,
but remember that there are two normal ways to declare functions. Consider the following JavaScript:

function test() {
    foo(); // TypeError 'foo is not a function'
    bar(); // 'this will run!'
    var foo = function () { // function expression assigned to local variable 'foo'
        alert('this won't run!');
    }
    function bar() { // function declaration, given the name 'bar'
        alert('this will run!');
    }
}
test();

In this case, only the function declaration has its body hoisted to the top. The name ‘foo’ is hoisted, but the body is left behind, to be assigned during execution.

Name Resolution Order

The most important special case to keep in mind is name resolution order. Remember that there are four ways for names to enter a given scope. The order I listed them above is the order they are resolved in. In general, if a name has already been defined, it is never overridden by another property of the same name. This means that a function declaration takes priority over a variable declaration. This does not mean that an assignment to that name will not work, just that the declaration portion will be ignored. There are a few exceptions:

  • The built-in name arguments behaves oddly. It seems to be declared following the formal parameters, but before function declarations. This means that a formal parameter with the name arguments will take precedence over the built-in, even if it is undefined. This is a bad feature. Don’t use the name arguments as a formal parameter.
  • Trying to use the name this as an identifier anywhere will cause a Syntax Error. This is a good feature.
  • If multiple formal parameters have the same name, the one occurring latest in the list will take precedence, even if it is undefined.

Additional hoisting examples on my blog

Now that you understand scoping and hoisting, what does that mean for coding in JavaScript?
The most important thing is to always declare your variables with var statements.
Declare your variables at the top of the scope (as already mentioned JavaScript only has function scope). See the Variable Declarations section.
If you force yourself to do this, you will never have hoisting-related confusion.
However, doing this can make it hard to keep track of which variables have actually been declared in the current scope.
I recommend using strict mode which will inform you if you have tried to use a variable without declaring it with var. If you’ve done all of this, your code should look something like this:

function foo(a, b, c) {
    'use strict';
    var x = 1;
    var bar;
    var baz = 'something';
    // other non hoistable code here
}

Inheritance

Always prefer Prototypal inheritance to Pseudo classical.

There are quite a few reasons why we shouldn’t use Pseudo classical inheritance in JavaScript.
JavaScript The Good Parts explains why.

In a purely prototypal pattern, we dispense with classes.
We focus instead on the objects.
Prototypal inheritance is conceptually simpler than classical inheritance.

I’ll show you three examples of prototypal inheritance, and explain the flaws and why the third attempt is the better way.

Example 1
function object(o) {
    function F() {}
    F.prototype = o;
    return new F();
}

The object function takes an existing object as a parameter and returns an empty new object that inherits from the old one.
The problem with the object function is that it is global, and globals are clearly problematic.

Example 2
Object.prototype.begetObject = function () {
    function F() {}
    F.prototype = this;
    return new F();
};

newObject = oldObject.begetObject();

The problem with Object.prototype.begetObject is that it trips up incompetent programs, and it can produce unexpected results when begetObject is overridden.

Example 3
if (typeof Object.create !== 'function') {
    Object.create = function (o) {
        function F() {}
        F.prototype = o;
        return new F();
    };
}
newObject = Object.create(oldObject);

Example 3 overcomes the problems with the previous prototypical examples.
This is how Object.create works in ES5

New

Use {} instead of new Object(). Use [] instead of new Array(). There are instances where using new allows compiler optimisations to be performed. Learn what happens when you use new in different scenarios and test.

Operators

Plus Minus

Be careful to not follow a + with + or ++. This pattern can be confusing. Insert parens between them to make your intention clear.

total = subtotal + +myInput.value;

is better written as

total = subtotal + (+myInput.value);

so that the + + is not misread as ++.

Equality
By default use the === operator rather than the == operator.
By default use the !== operator rather than the != operator.

JavaScript has two sets of equality operators: === and !==, and their (as Douglas Crockford puts it) evil twins == and
!=. The === and !== ones work the way you would expect. If the two operands are of the
same type and have the same value, then === produces true and !== produces false.
The other ones do the right thing when the operands are of the same type, but if they
are of different types, they attempt to coerce the values.
Make sure this is what you need rather than just blindly using the shorter form.
These are some of the interesting cases:

'' == '0'          // false
0 == ''            // true
0 == '0'           // true
false == 'false'   // false
false == '0'       // true
false == undefined // false
false == null      // false
null == undefined  // true
' \t\r\n ' == 0    // true

Patterns

Module

Module presents an interface but hides its state and implementation.
Takes advantage of function scope and closure to create relationships that are binding and private.
Eliminate the use of global variables. It promotes information hiding and other good design practices.

Global Import

JavaScript has a feature known as implied globals.
Whenever a name is used, the interpreter walks the scope chain backwards looking for a var statement for that name.
If none is found, that variable is assumed to be global.
If it’s used in an assignment, the global is created if it doesn’t already exist.
This means that using or creating global variables in an anonymous closure is easy.
Unfortunately, this leads to hard-to-manage code, as it’s not obvious (to humans) which variables are global in a given file.

Luckily, our anonymous function provides an easy alternative.
By passing globals as parameters to our anonymous function, we import them into our code, which is both clearer and faster than implied globals.

(function ($, YAHOO) {
    // now have access to globals jQuery (as $) and YAHOO in this code
}(jQuery, YAHOO));
Module Export

Sometimes you don’t just want to use globals, but you want to declare them. We can easily do this by exporting them, using the anonymous function’s return value.

var MODULE = (function () {
    var my = {},
        privateVariable = 1;

    function privateMethod() {
        // ...
    }

    my.moduleProperty = 1;
    my.moduleMethod = function () {
        // ...
    };

    return my;
}());

Notice that we’ve declared a global module named MODULE, with two public properties:
a method named MODULE.moduleMethodand a variable named MODULE.moduleProperty.
In addition, it maintains private internal state using the closure of the anonymous function.
Also, we can easily import needed globals, using the pattern we learned above.

Augmentation

One limitation of the module pattern so far is that the entire module must be in one file.
Anyone who has worked in a large code-base understands the value of splitting among multiple files.
Luckily, we have a nice solution to augment modules.
First, we import the module, then we add properties, then we export it.
Here’s an example, augmenting our MODULE from above:

var MODULE = (function (my) {
    my.anotherMethod = function () {
        // added method...
    };

    return my;
}(MODULE));

Use the var keyword again.
After this code has run, our module will have gained a new public method named MODULE.anotherMethod.
This augmentation file will also maintain its own private internal state and imports.

Loose Augmentation

While our example above requires our initial module creation to be first, and the augmentation to happen second, that isn’t always necessary.
One of the best things a JavaScript application can do for performance is to load scripts asynchronously.
We can create flexible multi-part modules that can load themselves in any order with loose augmentation.
Each file should have the following structure:

var MODULE = (function (my) {
    // add capabilities...

    return my;
}(MODULE || {}));

The import will create the module if it doesn’t already exist.
This means you can use a library like require.js and load all of your module files in parallel, without needing to block.
preferably not more than 2 to 3 at a time, else performance will degrade

Tight Augmentation

While loose augmentation is great, it does place some limitations on your module.
Most importantly, you cannot override module properties safely.
You also cannot use module properties from other files during initialization (but you can at run-time after intialization).
Tight augmentation implies a set loading order, but allows overrides.
Here is a simple example (augmenting our original MODULE):

var MODULE = (function (my) {
    var old_moduleMethod = my.moduleMethod;

    my.moduleMethod = function () {
        // method override, has access to old through old_moduleMethod...
    };

    return my;
}(MODULE));

Here we’ve overridden MODULE.moduleMethod, but maintain a reference to the original method, if needed.

Cloning and Inheritance
var MODULE_TWO = (function (old) {
    var my = {},
    var key;

    for (key in old) {
        if (old.hasOwnProperty(key)) {
            my[key] = old[key];
        }
    }

    var super_moduleMethod = old.moduleMethod;
    my.moduleMethod = function () {
        // override method on the clone, access to super through super_moduleMethod
    };

    return my;
}(MODULE));

This pattern is perhaps the least flexible option. It does allow some neat compositions, but that comes at the expense of flexibility.
As I’ve written it, properties which are objects or functions will not be duplicated, they will exist as one object with two references.
Changing one will change the other.
This could be fixed for objects with a recursive cloning process, but probably cannot be fixed for functions, except perhaps with eval.

Sub-modules
MODULE.sub = (function () {
    var my = {};
    // ...

    return my;
}());

Statements

Simple Statements

Each line should contain at most one statement.
Put a ; (semicolon) at the end of every simple statement.
Note that an assignment statement which is assigning a function literal or object literal is still an assignment statement and must end with a semicolon.

JavaScript allows any expression to be used as a statement.
This can mask some errors, particularly in the presence of semicolon insertion.
The only expressions that should be used as statements are assignments and invocations.

Compound Statements

Compound statements are statements that contain lists of statements enclosed in { } (curly braces).

  • The enclosed statements should be indented four more spaces.
  • The { (left curly brace) should be at the end of the line that begins the compound statement.
  • The } (right curly brace) should begin a line and be indented to align with the beginning of the line containing the matching { (left curly brace).
  • Braces should be used around all statements, even single statements, when they are part of a control structure, such as an if or for statement. This makes it easier to add statements without accidentally introducing bugs.
Labels
Say no to labelling Statement labels are optional. Only these statements should be labelled: while, do, for, switch. Or better still don’t use labelling.
return Statement

A return statement with a value should not use ( ) (parentheses) around the value.
The return value expression must start on the same line as the return keyword in order to avoid semicolon insertion.

if Statement

The if class of statements should have the following form:

if (condition) {
    statements
}

if (condition) {
    statements
} else {
    statements
}

if (condition) {
    statements
} else if (condition) {
    statements
} else {
    statements
}

Avoid doing assignments in the condition part of if and while statements.

Is

if (a = b) {

a correct statement? Or was

if (a === b) {

intended? Avoid constructs that cannot easily be determined to be correct.

Also see the Equality section

for Statement

A for class of statements should have the following form:

for (initialisation; condition; update) {
    statements
}

for (variable in object) {
    if (filter) {
        statements
    }
}

The first form should be used with arrays and with loops of a predetermined number of iterations.

The second form should be used with objects.
Be aware that members that are added to the prototype of the object will be included in the enumeration.
It is wise to program defensively by using the hasOwnProperty method to distinguish the true members of the object:

for (variable in object) {
    if (object.hasOwnProperty(variable)) {
        statements
    }
}
while Statement

A while statement should have the following form:

while (condition) {
    statements
}
do Statement

A do statement should have the following form:

do {
    statements
} while (condition);

Unlike the other compound statements, the do statement always ends with a ; (semicolon).

switch Statement

A switch statement should have the following form:

switch (expression) {
case expression:
    statements
default:
    statements
}

Each case is aligned with the switch. This avoids over-indentation.
Each group of statements (except the default) should end with break, return, or throw. Do not fall through.

Or better, use the following form from Angus Crolls blog post

Procedural way to do the construct

var whatToBring;
switch(weather) {
    case 'Sunny':
        whatToBring = 'Sunscreen and hat';
        break;
    case 'Rain':
        whatToBring  ='Umbrella and boots';
        break;
    case 'Cold':
        whatToBring = 'Scarf and Gloves';
        break;
    default : whatToBring = 'Play it by ear';
}

OO way to do the construct

var whatToBring = {
    'Sunny' : 'Sunscreen and hat',
    'Rain' : 'Umbrella and boots',
    'Cold' : 'Scarf and Gloves',
    'Default' : 'Play it by ear'
}

var gear = whatToBring[weather] || whatToBring['Default'];
try Statement

The try class of statements should have the following form:

try {
    statements
} catch (variable) {
    statements
}

try {
    statements
} catch (variable) {
    statements
} finally {
    statements
}
continue Statement

Avoid use of the continue statement. It tends to obscure the control flow of the function.

with Statement

Why it shouldn’t be used.

Why it should be used.

with statement Both points are valid. Understand how it works before you use it. It aint going to work with strict mode anyway, for good reason IMHO.

Function Declarations

  • All functions should be declared before they are used.
  • Inner functions should follow the var statement. This helps make it clear what variables are included in its scope.
  • There should be no space between the name of a function and the ( (left parenthesis) of its parameter list.
  • There should be one space between the ) (right parenthesis) and the { (left curly brace) that begins the statement body.
    The body itself is indented four spaces.
    The } (right curly brace) is aligned with the line containing the beginning of the declaration of the function.

This convention works well with JavaScript because in JavaScript, functions and object literals can be placed anywhere that an expression is allowed.
It provides the best readability with inline functions and complex structures.

function getElementsByClassName(className) {
    var results = [];
    walkTheDOM(document.body, function (node) {
        var a; // array of class names
        var c = node.className; // the node's classname
        var i; // loop counter

// If the node has a class name, then split it into a list of simple names.
// If any of them match the requested name, then append the node to the set of results.

        if (c) {
            a = c.split(' ');
            for (i = 0; i < a.length; i += 1) {
                if (ai === className) {
                    results.push(node);
                    break;
                }
            }
        }
    });
    return results;
}

If a function literal is anonymous, there should be one space between the word function and the ( (left parenthesis).
If the space is omitted, then it can appear that the function’s name is function, which is an incorrect reading.

div.onclick = function (e) {
   return false;
};

that = {
    method: function () {
         return this.datum;
    },
    datum: 0
};

When a function is to be invoked immediately,
the entire invocation expression should be wrapped in parens so that it is clear that the value being produced is the result of the function and not the function itself.

var collection = (function () {
    var keys = [], values = [];

    return {
    get: function (key) {
        var at = keys.indexOf(key);
        if (at >= 0) {
             return values[at];
        }
    },
    set: function (key, value) {
        var at = keys.indexOf(key);
        if (at < 0) {
            at = keys.length;
        }
        keysat = key;
        valuesat = value;
    },
    remove: function (key) {
        var at = keys.indexOf(key);
        if (at >= 0) {
            keys.splice(at, 1);
            values.splice(at, 1); }
        }
    };
}());

Variable Declarations

All variables should be declared before used.
JavaScript does not require this, but doing so makes the program easier to read and makes it easier to detect undeclared variables that become implied globals.
If you get into the habit of using strict mode, you’ll get pulled up on this anyway.
Implied global variables should never be used.

The var statements should be the first statements in the function body.

It is preferred that each variable be given its own line. They should be listed in alphabetical order.

var currentSelectedTableEntry;
var indentationlevel;
var sizeOfTable;

JavaScript does not have block scope (see Scope section), so defining variables in blocks can confuse programmers who are experienced with other C family languages.

  • Define all variables at the top of the function.
  • Use of global variables should be minimized.
  • Implied global variables should never be used.

References

http://javascript.crockford.com/code.html

http://www.webreference.com/programming/javascript/prototypal_inheritance/

http://javascript.crockford.com/prototypal.html

http://www.crockford.com/javascript/inheritance.html

http://www.adequatelygood.com/2010/3/JavaScript-Module-Pattern-In-Depth

Sanitising User Input from Browser. part 2

November 16, 2012

Untrusted data (data entered by a user), should always be treated as though it contains attack code.
This data should not be sent anywhere without taking the necessary steps to detect and neutralise the malicious code.
With applications becoming more interconnected, attacks being buried in user input and decoded and/or executed by a downstream interpreter is becoming all the more common.
Input validation, that’s restricting user input to allow only certain white listed characters and restricting field lengths are only two forms of defence.
Any decent attacker can get around client side validation, so you need to employ defence in depth.
validation and escaping also needs to be performed on the server side.

Leveraging existing libraries

  1. Microsofts AntiXSS is not extensible,
    it doesn’t allow the user to define their own whitelist.
    It didn’t allow me to add behaviour to the routines.
    I want to know how many instances of HTML encoded values there were.
    There was certainly a lot of code in there, but I didn’t find it very useful.
  2. The OWASP encoding project (Reform)(as mentioned in part 1 of this series).
    This is quite a useful set of projects for different technologies.
  3. System.Net.WebUtility from the System.Web.dll.
    Now this did most of what I needed other than provide me with fine grained information of what had been tampered with.
    So I took it and extended it slightly.
    We hadn’t employed AOP at this stage and it wasn’t considered important enough to invest the time to do so.
    So it was a matter of copy past modify.

What’s the point in client side validation if the server has to do it again anyway?

Now there are arguments both ways for this.
My current take on this for the project in question was:
If you only have server side validation, the client side is less responsive and user friendly.
If you only have client side validation, it’s out of our control.
This also gives fuel to the argument of using JavaScript on the client and server side (with the likes of node.js).
So the same code can be used both sides without having to code the same validation in two different languages.
Personally I find writing validation code easier using JavaScript than C#.
This maybe just because I’ve been writing considerably more JavaScript than C# lately though.

The code

I drew a sequence diagram of how this should work, but it got lost in a move.
So I wasn’t keen on doing it again, as the code had already been done.
In saying that, the code has reasonably good documentation (I think).
Code is king, providing it has been written to be read.
If you notice any of the escaping isn’t quite making sense, it could be the blogging engine either doing what it’s meant to, or not doing what it’s meant to.
I’ve been over the code a few times, but I may have missed something.
Shout out if anything’s not clear.

First up, we’ll look at the custom exceptions as we’ll need those soon.

using System;

namespace Common.WcfHelpers.ErrorHandling.Exceptions
{
    public abstract class WcfException : Exception
    {
        /// <summary>
        /// In order to set the message for the client, set it here, or via the property directly in order to over ride default value.
        /// </summary>
        /// <param name="message">The message to be assigned to the Exception's Message.</param>
        /// <param name="innerException">The exception to be assigned to the Exception's InnerException.</param>
        /// <param name="messageForClient">The client friendly message. This parameter is optional, but should be set.</param>
        public WcfException(string message, Exception innerException = null, string messageForClient = null) : base(message, innerException)
        {
            MessageForClient = messageForClient;
        }

        /// <summary>
        /// This is the message that the service's client will see.
        /// Make sure it is set in the constructor. Or here.
        /// </summary>
	    public string MessageForClient
        {
            get { return string.IsNullOrEmpty(_messageForClient) ? "The MessageForClient property of WcfException was not set" : _messageForClient; }
            set { _messageForClient = value; }
        }
        private string _messageForClient;
    }
}

And the more specific SanitisationWcfException

using System;
using System.Configuration;

namespace Common.WcfHelpers.ErrorHandling.Exceptions
{
    /// <summary>
    /// Exception class that is used when the user input sanitisation fails, and the user needs to be informed.
    /// </summary>
    public class SanitisationWcfException : WcfException
    {
        private const string _defaultMessageForClient = "Answers were NOT saved. User input validation was unsuccessful.";
        public string UnsanitisedAnswer { get; private set; }

        /// <summary>
        /// In order to set the message for the client, set it here, or via the property directly in order to over ride default value.
        /// </summary>
        /// <param name="message">The message to be assigned to the Exception's Message.</param>
        /// <param name="innerException">The Exception to be assigned to the base class instance's inner exception. This parameter is optional.</param>
        /// <param name="messageForClient">The client friendly message. This parameter is optional, but should be set.</param>
        /// <param name="unsanitisedAnswer">The user input string before service side sanitisatioin is performed.</param>
        public SanitisationWcfException
        (
            string message,
            Exception innerException = null,
            string messageForClient = _defaultMessageForClient,
            string unsanitisedAnswer = null
        )
            : base(
                message,
                innerException,
                messageForClient + " If this continues to happen, please contact " + ConfigurationManager.AppSettings["SupportEmail"] + Environment.NewLine
                )
        {
            UnsanitisedAnswer = unsanitisedAnswer;
        }
    }
}

Now as we define whether our requirements are satisfied by way of executable requirements (unit tests(in their rawest form))
Lets write some executable specifications.

using NUnit.Framework;
using Common.Security.Sanitisation;

namespace Common.Security.Encoding.UnitTest
{
    [TestFixture]
    public class ExtensionsTest
    {

        private readonly string _inNeedOfEscaping = @"One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.";
        private readonly string _noNeedForEscaping = @"One x2F two amp three x27 four lt five quot six gt       .";

        [Test]
        public void SingleDecodeDoubleEncodedHtml_ShouldSingleDecodeDoubleEncodedHtml()
        {
            string doubleEncodedHtml = @"";               // between the ""'s we have a string of Html with double escaped values like &amp;#x27; user entered text &amp;#x2F.
            string singleEncodedHtmlShouldLookLike = @""; // between the ""'s we have a string of Html with single escaped values like ' user entered text &#x2F.
            // In the above, the bloging engine is escaping the sinlge escaped entity encoding, so all you'll see is the entity it self.
            // but it should look like the double encoded entity encodings without the first &amp->;


            string singleEncodedHtml = doubleEncodedHtml.SingleDecodeDoubleEncodedHtml();
            
            Assert.That(singleEncodedHtml, Is.EqualTo(singleEncodedHtmlShouldLookLike));
        }

        [Test]
        public void Extensions_CompliesWithWhitelist_ShouldNotComply()
        {
            Assert.That(_inNeedOfEscaping.CompliesWithWhitelist(whiteList: @"^[\w\s\.,]+$"), Is.False);
        }

        [Test]
        public void Extensions_CompliesWithWhitelist_ShouldComply()
        {
            Assert.That(_noNeedForEscaping.CompliesWithWhitelist(whiteList: @"^[\w\s\.,]+$"), Is.True);
            Assert.That(_inNeedOfEscaping.CompliesWithWhitelist(whiteList: @"^[\w\s\.,#/&'<"">]+$"), Is.True);
        }
    }
}

Now the code that satisfies the above executable specifications, and more.

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

namespace Common.Security.Sanitisation
{
    /// <summary>
    /// Provides a series of extension methods that perform sanitisation.
    /// Escaping, unescaping, etc.
    /// Usually targeted at user input, to help defend against the likes of XSS and other injection attacks.
    /// </summary>
    public static class Extensions
    {

        private const int CharacterIndexNotFound = -1;

        /// <summary>
        /// Returns a new string in which all occurrences of a double escaped html character (that's an html entity immediatly prefixed with another html entity)
        /// in the current instance are replaced with the single escaped character.
        /// </summary>
        /// <param name="source">The target text used to strip one layer of Html entity encoding.</param>
        /// <returns>The singly escaped text.</returns>
        public static string SingleDecodeDoubleEncodedHtml(this string source)
        {
            return source.Replace("&amp;#x", "&#x");
        }
        /// <summary>
        /// Filter a text against a regular expression whitelist of specified characters.
        /// </summary>
        /// <param name="target">The text that is filtered using the whitelist.</param>
        /// <param name="alternativeTarget"></param>
        /// <param name="whiteList">Needs to be be assigned a valid whitelist, otherwise nothing gets through.</param>
        public static bool CompliesWithWhitelist(this string target, string alternativeTarget = "", string whiteList = "")
        {
            if (string.IsNullOrEmpty(target))
                target = alternativeTarget;
            
            return Regex.IsMatch(target, whiteList);
        }
        /// <summary>
        /// Takes a string and returns another with a single layer of Html entity encoding replaced with it's Html entity literals.
        /// </summary>
        /// <param name="encodedUserInput">The text to perform the opperation on.</param>
        /// <param name="numberOfEscapes">The number of Html entity encodings that were replaced.</param>
        /// <returns>The text that's had a single layer of Html entity encoding replaced with it's Html entity literals.</returns>
        public static string HtmlDecode(this string encodedUserInput, ref int numberOfEscapes)
        {
            const int NotFound = -1;

            if (string.IsNullOrEmpty(encodedUserInput))
                return string.Empty;

            StringWriter output = new StringWriter(CultureInfo.InvariantCulture);
            
            if (encodedUserInput.IndexOf('&') == NotFound)
            {
                output.Write(encodedUserInput);
            }
            else
            {
                int length = encodedUserInput.Length;
                for (int index1 = 0; index1 < length; ++index1)
                {
                    char ch1 = encodedUserInput[index1];
                    if (ch1 == 38)
                    {
                        int index2 = encodedUserInput.IndexOfAny(_htmlEntityEndingChars, index1 + 1);
                        if (index2 > 0 && encodedUserInput[index2] == 59)
                        {
                            string entity = encodedUserInput.Substring(index1 + 1, index2 - index1 - 1);
                            if (entity.Length > 1 && entity[0] == 35)
                            {
                                ushort result;
                                if (entity[1] == 120 || entity[1] == 88)
                                    ushort.TryParse(entity.Substring(2), NumberStyles.AllowHexSpecifier, NumberFormatInfo.InvariantInfo, out result);
                                else
                                    ushort.TryParse(entity.Substring(1), NumberStyles.AllowLeadingWhite | NumberStyles.AllowTrailingWhite | NumberStyles.AllowLeadingSign, NumberFormatInfo.InvariantInfo, out result);
                                if (result != 0)
                                {
                                    ch1 = (char)result;
                                    numberOfEscapes++;
                                    index1 = index2;
                                }
                            }
                            else
                            {
                                index1 = index2;
                                char ch2 = HtmlEntities.Lookup(entity);
                                if ((int)ch2 != 0)
                                {
                                    ch1 = ch2;
                                    numberOfEscapes++;
                                }
                                else
                                {
                                    output.Write('&');
                                    output.Write(entity);
                                    output.Write(';');
                                    continue;
                                }
                            }
                        }
                    }
                    output.Write(ch1);
                }
            }
            string decodedHtml = output.ToString();
            output.Dispose();
            return decodedHtml;
        }
        /// <summary>
        /// Escapes all character entity references (double escaping where necessary).
        /// Why? The XmlTextReader that is setup in XmlDocument.LoadXml on the service considers the character entity references (&#xxxx;) to be the character they represent.
        /// All XML is converted to unicode on reading and any such entities are removed in favor of the unicode character they represent.
        /// </summary>
        /// <param name="unencodedUserInput">The string that needs to be escaped.</param>
        /// <param name="numberOfEscapes">The number of escapes applied.</param>
        /// <returns>The escaped text.</returns>
        public static unsafe string HtmlEncode(this string unencodedUserInput, ref int numberOfEscapes)
        {
            if (string.IsNullOrEmpty(unencodedUserInput))
                return string.Empty;

            StringWriter output = new StringWriter(CultureInfo.InvariantCulture);
            
            if (output == null)
                throw new ArgumentNullException("output");
            int num1 = IndexOfHtmlEncodingChars(unencodedUserInput);
            if (num1 == -1)
            {
                output.Write(unencodedUserInput);
            }
            else
            {
                int num2 = unencodedUserInput.Length - num1;
                fixed (char* chPtr1 = unencodedUserInput)
                {
                    char* chPtr2 = chPtr1;
                    while (num1-- > 0)
                        output.Write(*chPtr2++);
                    while (num2-- > 0)
                    {
                        char ch = *chPtr2++;
                        if (ch <= 62)
                        {
                            switch (ch)
                            {
                                case '"':
                                    output.Write(""");
                                    numberOfEscapes++;
                                    continue;
                                case '&':
                                    output.Write("&amp;");
                                    numberOfEscapes++;
                                    continue;
                                case '\'':
                                    output.Write("&amp;#x27;");
                                    numberOfEscapes = numberOfEscapes + 2;
                                    continue;
                                case '<':
                                    output.Write("<");
                                    numberOfEscapes++;
                                    continue;
                                case '>':
                                    output.Write(">");
                                    numberOfEscapes++;
                                    continue;
                                case '/':
                                    output.Write("&amp;#x2F;");
                                    numberOfEscapes = numberOfEscapes + 2;
                                    continue;
                                default:
                                    output.Write(ch);
                                    continue;
                            }
                        }
                        if (ch >= 160 && ch < 256)
                        {
                            output.Write("&#");
                            output.Write(((int)ch).ToString(NumberFormatInfo.InvariantInfo));
                            output.Write(';');
                            numberOfEscapes++;
                        }
                        else
                            output.Write(ch);
                    }
                }
            }
            string encodedHtml = output.ToString();
            output.Dispose();
            return encodedHtml;
        }

 

        private static unsafe int IndexOfHtmlEncodingChars(string searchString)
        {
            int num = searchString.Length;
            fixed (char* chPtr1 = searchString)
            {
                char* chPtr2 = (char*)((UIntPtr)chPtr1);
                for (; num > 0; --num)
                {
                    char ch = *chPtr2;
                    if (ch <= 62)
                    {
                        switch (ch)
                        {
                            case '"':
                            case '&':
                            case '\'':
                            case '<':
                            case '>':
                            case '/':
                                return searchString.Length - num;
                        }
                    }
                    else if (ch >= 160 && ch < 256)
                        return searchString.Length - num;
                    ++chPtr2;
                }
            }
            return CharacterIndexNotFound;
        }

        private static char[] _htmlEntityEndingChars = new char[2]
        {
            ';',
            '&'
        };
        private static class HtmlEntities
        {
            private static string[] _entitiesList = new string[253]
            {
                "\"-quot",
                "&-amp",
                "'-apos",
                "<-lt",
                ">-gt",
                " -nbsp",
                "¡-iexcl",
                "¢-cent",
                "£-pound",
                "¤-curren",
                "¥-yen",
                "¦-brvbar",
                "§-sect",
                "¨-uml",
                "©-copy",
                "ª-ordf",
                "«-laquo",
                "¬-not",
                "\x00AD-shy",
                "®-reg",
                "¯-macr",
                "°-deg",
                "±-plusmn",
                "\x00B2-sup2",
                "\x00B3-sup3",
                "´-acute",
                "µ-micro",
                "¶-para",
                "·-middot",
                "¸-cedil",
                "\x00B9-sup1",
                "º-ordm",
                "»-raquo",
                "\x00BC-frac14",
                "\x00BD-frac12",
                "\x00BE-frac34",
                "¿-iquest",
                "À-Agrave",
                "Á-Aacute",
                "Â-Acirc",
                "Ã-Atilde",
                "Ä-Auml",
                "Å-Aring",
                "Æ-AElig",
                "Ç-Ccedil",
                "È-Egrave",
                "É-Eacute",
                "Ê-Ecirc",
                "Ë-Euml",
                "Ì-Igrave",
                "Í-Iacute",
                "Î-Icirc",
                "Ï-Iuml",
                "Ð-ETH",
                "Ñ-Ntilde",
                "Ò-Ograve",
                "Ó-Oacute",
                "Ô-Ocirc",
                "Õ-Otilde",
                "Ö-Ouml",
                "×-times",
                "Ø-Oslash",
                "Ù-Ugrave",
                "Ú-Uacute",
                "Û-Ucirc",
                "Ü-Uuml",
                "Ý-Yacute",
                "Þ-THORN",
                "ß-szlig",
                "à-agrave",
                "á-aacute",
                "â-acirc",
                "ã-atilde",
                "ä-auml",
                "å-aring",
                "æ-aelig",
                "ç-ccedil",
                "è-egrave",
                "é-eacute",
                "ê-ecirc",
                "ë-euml",
                "ì-igrave",
                "í-iacute",
                "î-icirc",
                "ï-iuml",
                "ð-eth",
                "ñ-ntilde",
                "ò-ograve",
                "ó-oacute",
                "ô-ocirc",
                "õ-otilde",
                "ö-ouml",
                "÷-divide",
                "ø-oslash",
                "ù-ugrave",
                "ú-uacute",
                "û-ucirc",
                "ü-uuml",
                "ý-yacute",
                "þ-thorn",
                "ÿ-yuml",
                "Œ-OElig",
                "œ-oelig",
                "Š-Scaron",
                "š-scaron",
                "Ÿ-Yuml",
                "ƒ-fnof",
                "\x02C6-circ",
                "˜-tilde",
                "Α-Alpha",
                "Β-Beta",
                "Γ-Gamma",
                "Δ-Delta",
                "Ε-Epsilon",
                "Ζ-Zeta",
                "Η-Eta",
                "Θ-Theta",
                "Ι-Iota",
                "Κ-Kappa",
                "Λ-Lambda",
                "Μ-Mu",
                "Ν-Nu",
                "Ξ-Xi",
                "Ο-Omicron",
                "Π-Pi",
                "Ρ-Rho",
                "Σ-Sigma",
                "Τ-Tau",
                "Υ-Upsilon",
                "Φ-Phi",
                "Χ-Chi",
                "Ψ-Psi",
                "Ω-Omega",
                "α-alpha",
                "β-beta",
                "γ-gamma",
                "δ-delta",
                "ε-epsilon",
                "ζ-zeta",
                "η-eta",
                "θ-theta",
                "ι-iota",
                "κ-kappa",
                "λ-lambda",
                "μ-mu",
                "ν-nu",
                "ξ-xi",
                "ο-omicron",
                "π-pi",
                "ρ-rho",
                "ς-sigmaf",
                "σ-sigma",
                "τ-tau",
                "υ-upsilon",
                "φ-phi",
                "χ-chi",
                "ψ-psi",
                "ω-omega",
                "ϑ-thetasym",
                "ϒ-upsih",
                "ϖ-piv",
                " -ensp",
                " -emsp",
                " -thinsp",
                "\x200C-zwnj",
                "\x200D-zwj",
                "\x200E-lrm",
                "\x200F-rlm",
                "–-ndash",
                "—-mdash",
                "‘-lsquo",
                "’-rsquo",
                "‚-sbquo",
                "“-ldquo",
                "”-rdquo",
                "„-bdquo",
                "†-dagger",
                "‡-Dagger",
                "•-bull",
                "…-hellip",
                "‰-permil",
                "′-prime",
                "″-Prime",
                "‹-lsaquo",
                "›-rsaquo",
                "‾-oline",
                "⁄-frasl",
                "€-euro",
                "ℑ-image",
                "℘-weierp",
                "ℜ-real",
                "™-trade",
                "ℵ-alefsym",
                "←-larr",
                "↑-uarr",
                "→-rarr",
                "↓-darr",
                "↔-harr",
                "↵-crarr",
                "⇐-lArr",
                "⇑-uArr",
                "⇒-rArr",
                "⇓-dArr",
                "⇔-hArr",
                "∀-forall",
                "∂-part",
                "∃-exist",
                "∅-empty",
                "∇-nabla",
                "∈-isin",
                "∉-notin",
                "∋-ni",
                "∏-prod",
                "∑-sum",
                "−-minus",
                "∗-lowast",
                "√-radic",
                "∝-prop",
                "∞-infin",
                "∠-ang",
                "∧-and",
                "∨-or",
                "∩-cap",
                "∪-cup",
                "∫-int",
                "∴-there4",
                "∼-sim",
                "≅-cong",
                "≈-asymp",
                "≠-ne",
                "≡-equiv",
                "≤-le",
                "≥-ge",
                "⊂-sub",
                "⊃-sup",
                "⊄-nsub",
                "⊆-sube",
                "⊇-supe",
                "⊕-oplus",
                "⊗-otimes",
                "⊥-perp",
                "⋅-sdot",
                "⌈-lceil",
                "⌉-rceil",
                "⌊-lfloor",
                "⌋-rfloor",
                "〈-lang",
                "〉-rang",
                "◊-loz",
                "♠-spades",
                "♣-clubs",
                "♥-hearts",
                "♦-diams"
            };
            private static Dictionary<string, char> _lookupTable = GenerateLookupTable();

            private static Dictionary<string, char> GenerateLookupTable()
            {
                Dictionary<string, char> dictionary = new Dictionary<string, char>(StringComparer.Ordinal);
                foreach (string str in _entitiesList)
                    dictionary.Add(str.Substring(2), str[0]);
                return dictionary;
            }

            public static char Lookup(string entity)
            {
                char ch;
                _lookupTable.TryGetValue(entity, out ch);
                return ch;
            }
        }
    }
}

You may also notice that I’ve mocked the OperationContext.
Thanks to WCFMock, a mocking framework for WCF services.
I won’t include this code, but you can get it here.
I’ve used the popular NUnit test framework and RhinoMocks for the stubbing and mocking.
Both pulled into the solution using NuGet.
Most useful documentation for RhinoMocks:
http://ayende.com/Wiki/Rhino+Mocks+3.5.ashx
http://ayende.com/wiki/Rhino+Mocks.ashx

For this project I used NLog and wrapped it.
Now you start to get an idea of how to use the sanitisation.

using System;
using System.ServiceModel;
using System.ServiceModel.Channels;
using NUnit.Framework;
using System.Configuration;
using Rhino.Mocks;
using Common.Wrapper.Log;
using MockedOperationContext = System.ServiceModel.Web.MockedOperationContext;
using Common.WcfHelpers.ErrorHandling.Exceptions;

namespace Sanitisation.UnitTest
{
    [TestFixture]
    public class SanitiseTest
    {
        private const string _myTestIpv4Address = "My.Test.Ipv4.Address";
        private readonly int _maxLengthHtmlEncodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlEncodedUserInput"]);
        private readonly int _maxLengthHtmlDecodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlDecodedUserInput"]);
        private readonly string _encodedUserInput_thatsMaxDecodedLength = @"One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.
One #x2F &amp;#x2F; two amp &amp; three #x27 &amp;#x27; four lt < five quot " six gt >.";
        private readonly string _decodedUserInput_thatsMaxLength = @"One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.
One #x2F / two amp & three #x27 ' four lt < five quot "" six gt >.";

        [Test]
        public void Sanitise_UserInput_WhenGivenNull_ShouldReturnEmptyString()
        {
            Assert.That(new Sanitise().UserInput(null), Is.EqualTo(string.Empty));
        }

        [Test]
        public void Sanitise_UserInput_WhenGivenEmptyString_ShouldReturnEmptyString()
        {
            Assert.That(new Sanitise().UserInput(string.Empty), Is.EqualTo(string.Empty));
        }

        [Test]
        public void Sanitise_UserInput_WhenGivenSanitisedString_ShouldReturnSanitisedString()
        {
            // Open the whitelist up in order to test the encoding without restriction.
            Assert.That(new Sanitise(whiteList: @"^[\w\s\.,#/&'<"">]+$").UserInput(_encodedUserInput_thatsMaxDecodedLength), Is.EqualTo(_encodedUserInput_thatsMaxDecodedLength));
        }
        [Test]
        [ExpectedException(typeof(SanitisationWcfException))]
        public void Sanitise_UserInput_ShouldThrowExceptionIfEscapedInputToLong()
        {
            string fourThousandAndOneCharacters = "Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand characters. Four thousand character";
            string expectedError = "The un-modified string received from the client with the following IP address: " +
                   '"' + _myTestIpv4Address + "\" " +
                   "exceeded the allowed maximum length of an escaped Html user input string. " +
                   "The maximum length allowed is: " +
                   _maxLengthHtmlEncodedUserInput +
                   ". The length was: " +
                   (_maxLengthHtmlEncodedUserInput+1) + ".";

            using(new MockedOperationContext(StubbedOperationContext))
            {
                try
                {
                    new Sanitise().UserInput(fourThousandAndOneCharacters);
                }
                catch(SanitisationWcfException e)
                {
                    Assert.That(e.Message, Is.EqualTo(expectedError));
                    Assert.That(e.UnsanitisedAnswer, Is.EqualTo(fourThousandAndOneCharacters));
                    throw;
                }
            }
        }
        [Test]
        [ExpectedException(typeof(SanitisationWcfException))]
        public void Sanitise_UserInput_DecodedUserInputShouldThrowException_WhenMaxLengthHtmlDecodedUserInputIsExceeded()
        {
            char oneCharOverTheLimit = '.';
            string expectedError =
                           "The string received from the client with the following IP address: " +
                           "\"" + _myTestIpv4Address + "\" " +
                           "after Html decoding exceded the allowed maximum length of an un-escaped Html user input string." +
                           Environment.NewLine +
                           "The maximum length allowed is: " + _maxLengthHtmlDecodedUserInput + ". The length was: " +
                           (_decodedUserInput_thatsMaxLength + oneCharOverTheLimit).Length + oneCharOverTheLimit;

            using(new MockedOperationContext(StubbedOperationContext))
            {
                try
                {
                    new Sanitise().UserInput(_encodedUserInput_thatsMaxDecodedLength + oneCharOverTheLimit);
                }
                catch(SanitisationWcfException e)
                {
                    Assert.That(e.Message, Is.EqualTo(expectedError));
                    Assert.That(e.UnsanitisedAnswer, Is.EqualTo(_encodedUserInput_thatsMaxDecodedLength + oneCharOverTheLimit));
                    throw;
                }
            }
        }
        [Test]
        public void Sanitise_UserInput_ShouldLogAndSendEmail_IfNumberOfDecodedHtmlEntitiesDoesNotMatchNumberOfEscapes()
        {
            string encodedUserInput_with6HtmlEntitiesNotEscaped = _encodedUserInput_thatsMaxDecodedLength.Replace("&amp;#x2F;", "/");
            string errorWeAreExpecting =
                "It appears as if someone has circumvented the client side Html entity encoding." + Environment.NewLine +
                "The requesting IP address was: " +
                "\"" + _myTestIpv4Address + "\" " +
                "The sanitised input we receive from the client was the following:" + Environment.NewLine +
                "\"" + encodedUserInput_with6HtmlEntitiesNotEscaped + "\"" + Environment.NewLine +
                "The same input after decoding and re-escaping on the server side was the following:" + Environment.NewLine +
                "\"" + _encodedUserInput_thatsMaxDecodedLength + "\"";
            string sanitised;
            // setup _logger
            ILogger logger = MockRepository.GenerateMock<ILogger>();
            logger.Expect(lgr => lgr.logError(errorWeAreExpecting));

            Sanitise sanitise = new Sanitise(@"^[\w\s\.,#/&'<"">]+$", logger);

            using (new MockedOperationContext(StubbedOperationContext))
            {
                // Open the whitelist up in order to test the encoding etc.
                sanitised = sanitise.UserInput(encodedUserInput_with6HtmlEntitiesNotEscaped);
            }

            Assert.That(sanitised, Is.EqualTo(_encodedUserInput_thatsMaxDecodedLength));
            logger.VerifyAllExpectations();
        }        

        private static IOperationContext StubbedOperationContext
        {
            get
            {
                IOperationContext operationContext = MockRepository.GenerateStub<IOperationContext>();
                int port = 80;
                RemoteEndpointMessageProperty remoteEndpointMessageProperty = new RemoteEndpointMessageProperty(_myTestIpv4Address, port);
                operationContext.Stub(oc => oc.IncomingMessageProperties[RemoteEndpointMessageProperty.Name]).Return(remoteEndpointMessageProperty);
                return operationContext;
            }
        }
    }
}

Now the API code that we can use to do our sanitisation.

using System;
using System.Configuration;
// Todo : KC We need time to implement DI. Should be using something like ninject.extensions.wcf.
using OperationContext = System.ServiceModel.Web.MockedOperationContext;
using System.ServiceModel.Channels;
using Common.Security.Sanitisation;
using Common.WcfHelpers.ErrorHandling.Exceptions;
using Common.Wrapper.Log;

namespace Sanitisation
{

    public class Sanitise
    {
        private readonly string _whiteList;
        private readonly ILogger _logger;
        

        private string RequestingIpAddress
        {
            get
            {
                RemoteEndpointMessageProperty remoteEndpointMessageProperty = OperationContext.Current.IncomingMessageProperties[RemoteEndpointMessageProperty.Name] as RemoteEndpointMessageProperty;
                return ((remoteEndpointMessageProperty != null) ? remoteEndpointMessageProperty.Address : string.Empty);
            }
        }
        /// <summary>
        /// Provides server side escaping of Html entities, and runs the supplied whitelist character filter over the user input string.
        /// </summary>
        /// <param name="whiteList">Should be provided by DI from the ResourceFile.</param>
        /// <param name="logger">Should be provided by DI. Needs to be an asynchronous logger.</param>
        /// <example>
        /// The whitelist can be obtained from a ResourceFile like so...
        /// <code>
        /// private Resource _resource;
        /// _resource.GetString("WhiteList");
        /// </code>
        /// </example>
        public Sanitise(string whiteList = "", ILogger logger = null)
        {
            _whiteList = whiteList;
            _logger = logger ?? new Logger();
        }
        /// <summary>
        /// 1) Check field lengths.         Client side validation may have been negated.
        /// 2) Check against white list.	Client side validation may have been negated.
        /// 3) Check Html escaping.         Client side validation may have been negated.

        /// Generic Fail actions:	Drop the payload. No point in trying to massage and save, as it won't be what the user was expecting,
        ///                         Add full error to a WCFException Message and throw.
        ///                         WCF interception reads the WCFException.MessageForClient, and sends it to the user. 
        ///                         On return, log the WCFException's Message.
        ///                         
        /// Escape Fail actions:	Asynchronously Log and email full error to support.


        /// 1) BA confirmed 50 for text, and 400 for textarea.
        ///     As we don't know the field type, we'll have to go for 400."
        ///
        ///     First we need to check that we haven't been sent some huge string.
        ///     So we check that the string isn't longer than 400 * 10 = 4000.
        ///     10 is the length of our double escaped character references.
        ///     Or, we ask the business for a number."
        ///     If we fail here, perform Generic Fail actions and don't complete the following steps.
        /// 
        ///     Convert all Html Entity Encodings back to their equivalent characters, and count how many occurrences.
        ///
        ///     If the string is longer than 400, perform Generic Fail actions and don't complete the following steps.
        /// 
        /// 2) check all characters against the white list
        ///     If any don't match, perform Generic Fail actions and don't complete the following steps.
        /// 
        /// 3) re html escape (as we did in JavaScript), and count how many escapes.
        ///     If count is greater than the count of Html Entity Encodings back to their equivalent characters,
        ///     Perform Escape Fail actions. Return sanitised string.
        /// 
        ///     If we haven't returned, return sanitised string.
        
        
        /// Performs checking on the text passed in, to verify that client side escaping and whitelist validation has already been performed.
        /// Performs decoding, and re-encodes. Counts that the number of escapes was the same, otherwise we log and send email with the details to support.
        /// Throws exception if the client side validation failed to restrict the number of characters in the escaped string we received.
        ///     This needs to be intercepted at the service.
        ///     The exceptions default message for client needs to be passed back to the user.
        ///     On return, the interception needs to log the exception's message.
        /// </summary>
        /// <param name="sanitiseMe"></param>
        /// <returns></returns>
        public string UserInput(string sanitiseMe)
        {
            if (string.IsNullOrEmpty(sanitiseMe))
                return string.Empty;

            ThrowExceptionIfEscapedInputToLong(sanitiseMe);

            int numberOfDecodedHtmlEntities = 0;
            string decodedUserInput = HtmlDecodeUserInput(sanitiseMe, ref numberOfDecodedHtmlEntities);

            if(!decodedUserInput.CompliesWithWhitelist(whiteList: _whiteList))
            {
                string error = "The answer received from client with the following IP address: " +
                    "\"" + RequestingIpAddress + "\" " +
                    "had characters that failed to match the whitelist.";
                throw new SanitisationWcfException(error);
            }

            int numberOfEscapes = 0;
            string sanitisedUserInput = decodedUserInput.HtmlEncode(ref numberOfEscapes);

            if(numberOfEscapes != numberOfDecodedHtmlEntities)
            {
                AsyncLogAndEmail(sanitiseMe, sanitisedUserInput);
            }

            return sanitisedUserInput;
        }
        /// <note>
        /// Make sure the logger is setup to log asynchronously
        /// </note>
        private void AsyncLogAndEmail(string sanitiseMe, string sanitisedUserInput)
        {
            // no need for SanitisationException

            _logger.logError(
                "It appears as if someone has circumvented the client side Html entity encoding." + Environment.NewLine +
                "The requesting IP address was: " +
                "\"" + RequestingIpAddress + "\" " +
                "The sanitised input we receive from the client was the following:" + Environment.NewLine +
                "\"" + sanitiseMe + "\"" + Environment.NewLine +
                "The same input after decoding and re-escaping on the server side was the following:" + Environment.NewLine +
                "\"" + sanitisedUserInput + "\""
                );
        }

        /// <summary>
        /// This procedure may throw a SanitisationWcfException.
        /// If it does, ErrorHandlerBehaviorAttribute will need to pass the "messageForClient" back to the client from within the IErrorHandler.ProvideFault procedure.
        /// Once execution is returned, the IErrorHandler.HandleError procedure of ErrorHandlerBehaviorAttribute
        /// will continue to process the exception that was thrown in the way of logging sensitive info.
        /// </summary>
        /// <param name="toSanitise"></param>
        private void ThrowExceptionIfEscapedInputToLong(string toSanitise)
        {
            int maxLengthHtmlEncodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlEncodedUserInput"]);
            if (toSanitise.Length > maxLengthHtmlEncodedUserInput)
            {
                string error = "The un-modified string received from the client with the following IP address: " +
                    "\"" + RequestingIpAddress + "\" " +
                    "exceeded the allowed maximum length of an escaped Html user input string. " +
                    "The maximum length allowed is: " +
                    maxLengthHtmlEncodedUserInput +
                    ". The length was: " +
                    toSanitise.Length + ".";
                throw new SanitisationWcfException(error, unsanitisedAnswer: toSanitise);
            }
        }

        private string HtmlDecodeUserInput(string doubleEncodedUserInput, ref int numberOfDecodedHtmlEntities)
        {
            string decodedUserInput = doubleEncodedUserInput.HtmlDecode(ref numberOfDecodedHtmlEntities).HtmlDecode(ref numberOfDecodedHtmlEntities) ?? string.Empty;
            
            // if the decoded string is longer than MaxLengthHtmlDecodedUserInput throw
            int maxLengthHtmlDecodedUserInput = int.Parse(ConfigurationManager.AppSettings["MaxLengthHtmlDecodedUserInput"]);
            if(decodedUserInput.Length > maxLengthHtmlDecodedUserInput)
            {
                throw new SanitisationWcfException(
                    "The string received from the client with the following IP address: " +
                    "\"" + RequestingIpAddress + "\" " +
                    "after Html decoding exceded the allowed maximum length of an un-escaped Html user input string." +
                    Environment.NewLine +
                    "The maximum length allowed is: " + maxLengthHtmlDecodedUserInput + ". The length was: " +
                    decodedUserInput.Length + ".",
                    unsanitisedAnswer: doubleEncodedUserInput
                    );
            }
            return decodedUserInput;
        }
    }
}

As you can see, there’s a lot more work in the server side sanitisation than the client side.

Sanitising User Input from Browser. part 1

November 4, 2012

I was working on a web based project recently where there was no security thought about when designing, developing it.
The following outlines my experience with retrofitting security.
It’s my hope that someone will find it useful for their own implementation.

We’ll be focussing on the client side in this post (part 1) and the server side in part 2.
We’ll also cover some preliminary discussion that will set the stage for this series.

The application consists of a WCF service delivering up content to some embedding code on any page in the browser.
The content is stored as Xml in the database and transformed into Html via Xslt.

The first activity I find useful is to go through the process of Threat Modelling the Application.
This process can be quite daunting for those new to it.
Here’s a couple of references I find quite useful to get started:

https://www.owasp.org/index.php/Application_Threat_Modeling

https://www.owasp.org/index.php/Threat_Risk_Modeling#Decompose_Application

Actually this ones not bad either.

There is no single right way to do this.
The more you read and experiment, the more equipped you will be.
The idea is to think like an attacker thinks.
This may be harder for some than others, but it is essential, to cover as many potential attack vectors as possible.
Remember, there is no secure system, just varying levels of insecurity.
It will always be much harder to discover the majority of security weaknesses in your application as the person or team creating/maintaining it,
than for the person attacking it.
The Threat Modelling topic is large and I’m not going to go into it here, other than to say, you need to go into it.

Threat Agents

Work out who your Threat Agents are likely to be.
Learn how to think like they do.
Learn what skills they have and learn the skills your self.
Sometimes the skills are very non technical.
For example walking through the door of your organisation in the weekend because the cleaners (or any one with access) forgot to lock up.
Or when the cleaners are there and the technical staff are not (which is just as easy).
It happens more often than we like to believe.

Defense in Depth

To attempt to mitigate attacks, we need to take a multi layered approach (often called defence in depth).

What made me decide to start with sanitising user input from the browser anyway?
Well according to the OWASP Top 10, Injection and Cross Site Scripting (XSS) are still the most popular techniques chosen to compromise web applications.
So it makes sense if your dealing with web apps, to target the most common techniques exploited.

Now, in regards to defence in depth when discussing web applications;
If the attacker gets past the first line of defence, there should be something stopping them at the next layer and so forth.
The aim is to stop the attack as soon as possible.
This is why we focus on the UI first, and later move our focus to the application server, then to the database.
Bear in mind though, that what ever we do on the client side, can be circumvented relatively easy.
Client side code is out of our control, so it’s best effort.
Because of this, we need to perform the following not only in the browser, but as much as possible on the server side as well.

  1. Minimising the attack surface
  2. Defining maximum field lengths (validation)
  3. Determining a white list of allowable characters (validation)
  4. Escaping untrusted data, especially where you know it’s going to endup in an execution context. Even where you don’t think this is likely, it’s still possible.
  5. Using Stored Procedures / parameterised queries (not covered in this series).
  6. Least Privilege.
    Minimising the privileges assigned to every database account (not covered in this series).

Minimising the attack surface

input fields should only allow certain characters to be input.
Text input fields, textareas etc that are free form (anything is allowed) are very hard to constrain to a small white list.
input fields where ever possible should be constrained to well structured data,
like dates, social security numbers, zip codes, e-mail addresses, etc. then the developer should be able to define a very strong validation pattern, usually based on regular expressions, for validating such input. If the input field comes from a fixed set of options, like a drop down list or radio buttons, then the input needs to match exactly one of the values offered to the user in the first place.
As it was with the existing app I was working on, we had to allow just about everything in our free form text fields.
This will have to be re-designed in order to provide constrained input.

Defining maximum field lengths (validation)

This was currently being done (sometimes) in the Xml content for inputs where type="text".
Don’t worry about the inputType="single", it gets transformed.

<input id="2" inputType="single" type="text" size="10" maxlength="10" />

And if no maxlength specified in the Xml, we now specify a default of 50 in the xsl used to do the transformation.
This way we had the input where type="text" covered for the client side.
This would also have to be validated on the server side when the service received values from these inputs where type="text".

    <xsl:template match="input[@inputType='single']">
      <xsl:value-of select="@text" />
        <input name="a{@id}" type="text" id="a{@id}" class="textareaSingle">
          <xsl:attribute name="value">
            <xsl:choose>
              <xsl:when test="key('response', @id)">
                <xsl:value-of select="key('response', @id)" />
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="string(' ')" />
              </xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <xsl:attribute name="maxlength">
            <xsl:choose>
              <xsl:when test="@maxlength">
                <xsl:value-of select="@maxlength"/>
              </xsl:when>
              <xsl:otherwise>50</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
        </input>
        <br/>
    </xsl:template>

For textareas we added maxlength validation as part of the white list validation.
See below for details.

Determining a white list of allowable characters (validation)

See bottom of this section for Update

Now this was quite an interesting exercise.
I needed to apply a white list to all characters being entered into the input fields.
A user can:

  1. type the characters in
  2. [ctrl]+[v] a clipboard full of characters in
  3. right click -> Paste

To cover all these scenarios as elegantly as possible, was going to be a bit of a challenge.
I looked at a few JavaScript libraries including one or two JQuery plug-ins.
None of them covered all these scenarios effectively.
I wish they did, because the solution I wasn’t totally happy with, because it required polling.
In saying that, I measured performance, and even bringing the interval right down had negligible effect, and it covered all scenarios.

setupUserInputValidation = function () {

  var textAreaMaxLength = 400;
  var elementsToValidate;
  var whiteList = /[^A-Za-z_0-9\s.,]/g;

  var elementValue = {
    textarea: '',
    textareaChanged: function (obj) {
      var initialValue = obj.value;
      var replacedValue = initialValue.replace(whiteList, "").slice(0, textAreaMaxLength);
      if (replacedValue !== initialValue) {
        this.textarea = replacedValue;
        return true;
      }
      return false;
    },
    inputtext: '',
    inputtextChanged: function (obj) {
      var initialValue = obj.value;
      var replacedValue = initialValue.replace(whiteList, "");
      if (replacedValue !== initialValue) {
        this.inputtext = replacedValue;
        return true;
      }
      return false;
    }
  };

  elementsToValidate = {
    textareainputelements: (function () {
      var elements = $('#page' + currentPage).find('textarea');
      if (elements.length > 0) {
        return elements;
      }
      return 'no elements found';
    } ()),
    textInputElements: (function () {
      var elements = $('#page' + currentPage).find('input[type=text]');
      if (elements.length > 0) {
        return elements;
      }
      return 'no elements found';
    } ())
  };

  // store the intervals id in outer scope so we can clear the interval when we change pages.
  userInputValidationIntervalId = setInterval(function () {
    var element;

    // Iterate through each one and remove any characters not in the whitelist.
    // Iterate through each one and trim any that are longer than textAreaMaxLength.

    for (element in elementsToValidate) {
      if (elementsToValidate.hasOwnProperty(element)) {
        if (elementsToValidate[element] === 'no elements found')
          continue;

        $.each(elementsToValidate[element], function () {
          $(this).attr('value', function () {
            var name = $(this).prop('tagName').toLowerCase();
            name = name === 'input' ? name + $(this).prop('type') : name;
            if (elementValue[name + 'Changed'](this))
              this.value = elementValue[name];
          });
        });
      }
    }
  }, 300); // milliseconds
};

Each time we change page, we clear the interval and reset it for the new page.

clearInterval(userInputValidationIntervalId);

setupUserInputValidation();

Update 2013-06-02:

Now with HTML5 we have the pattern attribute on the input tag, which allows us to specify a regular expression that the text about to be received is checked against. We can also see it here amongst the new HTML5 attributes . If used, this can make our JavaScript white listing redundant, providing we don’t have textareas which W3C has neglected to include the new pattern attribute on. I’d love to know why?

Escaping untrusted data

Escaped data will still render in the browser properly.
Escaping simply lets the interpreter know that the data is not intended to be executed,
and thus prevents the attack.

Now what we do here is extend the String prototype with a function called htmlEscape.

if (typeof Function.prototype.method !== "function") {
  Function.prototype.method = function (name, func) {
    this.prototype[name] = func;
    return this;
  };
}

String.method('htmlEscape', function () {

  // Escape the following characters with HTML entity encoding to prevent switching into any execution context,
  // such as script, style, or event handlers.
  // Using hex entities is recommended in the spec.
  // In addition to the 5 characters significant in XML (&, <, >, ", '), the forward slash is included as it helps to end an HTML entity.
  var character = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    // Double escape character entity references.
    // Why?
    // The XmlTextReader that is setup in XmlDocument.LoadXml on the service considers the character entity references () to be the character they represent.
    // All XML is converted to unicode on reading and any such entities are removed in favor of the unicode character they represent.
    // So we double escape character entity references.
    // These now get read to the XmlDocument and saved to the database as double encoded Html entities.
    // Now when these values are pulled from the database and sent to the browser, it decodes the & and displays #x27; and/or #x2F.
    // This isn't what we want to see in the browser.
    "'": '&amp;#x27;',    // &apos; is not recommended
    '/': '&amp;#x2F;'     // forward slash is included as it helps end an HTML entity
  };

  return function () {
    return this.replace(/[&<>"'/]/g, function (c) {
      return character[c];
    });
  };
}());

This allows us to, well, html escape our strings.

element.value.htmlEscape();

In looking through here,
The only untrusted data we are capturing is going to be inserted into an Html element

tag by way of insertion into a textarea element,
or the attribute value of input elements where type="text".
I initially thought I’d have to:

  1. Html escape the untrusted data which is only being captured from textarea elements.
  2. Attribute escape the untrusted data which is being captured from the value attribute of input elements where type="text".

RULE #2 – Attribute Escape Before Inserting Untrusted Data into HTML Common Attributes of here,
mentions
“Properly quoted attributes can only be escaped with the corresponding quote.”
So I decided to test it.
Created a collection of injection attacks. None of which worked.
Turned out we only needed to Html escape for the untrusted data that was going to be inserted into the textarea element.
More on this in a bit.

Now in regards to the code comments in the above code around having to double escape character entity references;
Because we’re sending the strings to the browser, it’s easiest to single decode the double encoded Html on the service side only.
Now because we’re still focused on the client side sanitisation,
and we are going to shift our focus soon to making sure we cover the server side,
we know we’re going to have to create some sanitisation routines for our .NET service.
Because the routines are quite likely going to be static, and we’re pretty much just dealing with strings,
lets create an extensions class in a new project in a common library we’ve already got.
This will allow us to get the widest use out of our sanitisation routines.
It also allows us to wrap any existing libraries or parts of them that we want to get use of.

namespace My.Common.Security.Encoding
{
    /// <summary>
    /// Provides a series of extension methods that perform sanitisation.
    /// Escaping, unescaping, etc.
    /// Usually targeted at user input, to help defend against the likes of XSS attacks.
    /// </summary>
    public static class Extensions
    {
        /// <summary>
        /// Returns a new string in which all occurrences of a double escaped html character (that's an html entity immediatly prefixed with another html entity)
        /// in the current instance are replaced with the single escaped character.
        /// </summary>
        ///
        /// The new string.
        public static string SingleDecodeDoubleEncodedHtml(this string source)
        {
            return source.Replace("&amp;#x", "&#x");
        }
    }
}

Now when we run our xslt transformation on the service, we chain our new extension method on the end.
Which gives us back a single encoded string that the browser is happy to display as the decoded value.

return Transform().SingleDecodeDoubleEncodedHtml();

Now back to my findings from the test above.
Turns out that “Properly quoted attributes can only be escaped with the corresponding quote.” really is true.
I thought that if I entered something like the following into the attribute value of an input element where type="text",
then the first double quote would be interpreted as the corresponding quote,
and the end double quote would be interpreted as the end quote of the onmouseover attribute value.

 " onmouseover="alert(2)

What actually happens, is during the transform…

xslCompiledTransform.Transform(xmlNodeReader, args, writer, new XmlUrlResolver());

All the relevant double quotes are converted to the double quote Html entity ‘”‘ without the single quotes.

onmouseover

And all double quotes are being stored in the database as the character value.

Libraries and useful code

Microsoft Anti-Cross Site Scripting Library

OWASP Encoding Project
This is the Reform library. Supports Perl, Python, PHP, JavaScript, ASP, Java, .NET

Online escape tool supporting Html escape/unescape, Java, .NET, JavaScript

The characters that need escaping for inserting untrusted data into Html element content

JavaScript The Good Parts: pg 90 has a nice ‘entityify’ function

OWASP Enterprise Security API Used for JavaScript escaping (ESAPI4JS)

JQuery plugin

Changing encoding on html page

Cheat Sheets and Check Lists I found helpful

https://www.owasp.org/index.php/Input_Validation_Cheat_Sheet

https://www.owasp.org/index.php/OWASP_Validation_Regex_Repository

https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet

https://www.owasp.org/index.php/DOM_based_XSS_Prevention_Cheat_Sheet

https://www.owasp.org/index.php/OWASP_AJAX_Security_Guidelines

If any of this is unclear, let me know and I’ll do my best to clarify. Maybe you have suggestions of how this could have been improved? Let’s spark a discussion.

JavaScript Properties

October 2, 2012

In ECMAScript 5 we now have two distinct kinds of properties.

  1. Data properties
  2. Accessor properties

A property is a named collection of attributes.
value: any JavaScript value
writable: boolean
configurable: boolean, common for both Data and Accessor
enumerable: boolean, common for both Data and Accessor
get: a function that returns a value
set: a function that takes an argument as its value

configurable

Any attempts to delete the property or change its (writable, configurable, or enumerable) attributes will fail if set to false.
if using strict mode, we get a run time error.
if not using strict mode, the behaviour is as it was with ES3,
the deletion attempt is ignored.
If set to false:
-It can not be re-set to true.
-We can change the value and writable attributes, but writable only from true to false.

enumerable

The property will be enumerated over when a for-in loop is encountered if set to true.
if using strict mode, it’s as if the property doesn’t exist, it’s ignored.

In ES5,

  • a default property descriptor; if the property is defined the old fashioned way, without using Object.defineProperty,
    the boolean attributes will all default to true.
  • A default property descriptor; if the property is defined using Object.defineProperty and the boolean attribute values not specified,
    the boolean attributes will all default to false.

I was wondering about this, as I had heard conflicting stories.
IMO this follows the Principle of least astonishment (POLA)

var obj1 = {};
var obj1PropertyDesc;
var obj2 = {};
var obj2PropertyDesc;

Object.defineProperty(obj1, 'propOnObj1', {
   value: 'value of propOnObj1' //,
   // writable: false,
   // enumerable: false,
   // configurable: false,
});

obj1PropertyDesc = Object.getOwnPropertyDescriptor(obj1, 'propOnObj1');

// obj1PropertyDesc {
//    configurable: false,
//    enumerable: false,
//    value: "value of propOnObj1",
//    writable: false
// }

obj2.propOnObj2 = 'value of propOnObj2';

obj2PropertyDesc = Object.getOwnPropertyDescriptor(obj2, 'propOnObj2');

// obj2PropertyDesc {
//    configurable: true,
//    enumerable: true,
//    value: "value of propOnObj2",
//    writable: true
// }

So in general

Properties declared the old ES3 way are configurable (can be deleted).
Properties declared using Object.defineProperty; by default are not configurable (can not be deleted).
See edge cases below.

The delete operator is used to remove a property from an object.
It does not touch properties in the prototype chain.
If you have a prototype that has a property with the same name, it will now be used when your code references the derived object’s property that no longer exists.

var objLiteral = {
   aProperty: 'value of super property'
}

var anObject = Object.create(objLiteral); // create is an ES5 method, but easy enough to replicate for ES3 implementations

anObject.aProperty = 'value of derived property';

anObject.aProperty  // 'value of derived property'
delete anObject.aProperty;
anObject.aProperty  // 'value of super property'

Edge cases

JavaScript Patterns pg 12 states “Implied globals created without var (regardless if created inside functions) can be
deleted.”
Thanks to Angus Croll for pointing this out as untrue.

obj1 = 'kims global property';
var obj1PropertyDesc;
var obj2PropertyDesc;

obj1PropertyDesc = Object.getOwnPropertyDescriptor(this, 'obj1');

// obj1PropertyDesc {
//    configurable: true,
//    enumerable: true,
//    value: "kims global property",
//    writable: true,
// }

(function (){
   obj2 = 'kims global property declared within function scope';
}());

obj2PropertyDesc = Object.getOwnPropertyDescriptor(this, 'obj2');

// obj2PropertyDesc {
//    configurable: false,
//    enumerable: true,
//    value: "kims global property declared within function scope",
//    writable: true
// }

delete obj2;
// Nope, obj2 was not deleted.
// turn strict mode on and we get the following error:
// Uncaught SyntaxError: Delete of an unqualified identifier in strict mode.

When you declare a global,
you are actually defining a property of the global object.
If you use the var keyword on that global, you are still creating a property.
That property is non-configurable (can not be deleted with the delete operator).
Only object properties with the configurable option set to true can be deleted.
Nothing else can be deleted.
Variables which are properties that we can’t access their property descriptor, can never be deleted.

var obj1 = {};
var obj1PropertyDesc;

obj1PropertyDesc = Object.getOwnPropertyDescriptor(this, 'obj1');

// obj1PropertyDesc {
//    configurable: false
//    enumerable: true
//    value: Object
//    writable: true
// }

There are a couple of notable internal properties that are found on all ES3 and ES5 objects.
[[Get]] and [[Put]].
The Ecma specs enclose internal properties in double square brackets as a convention only.
In ES3 [[Get]] and [[Put]] are used to return and set the internal [[Value]] property.
According to the Ecma5 spec, the internal [[Get]] and [[Put]] properties appear to do the same thing, although it’s not stated explicitly.
This may just be an oversight of the spec.

Accessor Properties

All the examples so far have been showing data properties.
By default properties are data properties unless they define a getter and/or setter,
in which case they are defined as accessor properties.
There are two attributes that are distinct to accessor properties.
get and set.
Both of which allow a method (and only a method) to be assigned to them to get or set respectively.

JavaScript getter error

  • Internally the getter calls the functions internal [[Call]] method with no arguments.
  • Internally the setter calls the functions internal [[Call]] method with an arguments list containing the assigned value as its sole argument.
    The setter may but is not required to have an effect on the value returned by subsequent calls to the properties internal [[Get]] method.

So these attributes may or may not leverage the internal [[Get]] and [[Put]] properties that are found in ES3 and ES5 on all objects.
You can in fact define only a getter (readonly), or only a setter (write-only) accessor if you so choose.

Defining accessor properties literally:

var testObj = {
   // An ordinary data property
   dataProp: 'value',

   // An accessor property defined as a pair of functions
   // get accessorProp() { return this.dataProp; },
   set accessorProp(value) { this.dataProp = value; }
};

testObj.accessorProp = 'an updated string';
alert(testObj.accessorProp); // undefined
alert(testObj.dataProp);  // an updated string

Can we create a data (default) property and then change it to be an accessor property?

var testObj = {}; // Start with no properties at all
// Add a nonenumerable data property x with value 1.
Object.defineProperty(testObj, 'x', { value : 1,
writable: true,
enumerable: false,
configurable: true});

// Check that the property is there but is non-enumerable
alert(testObj.x); // 1

// check that we can't enumerate the testObj
alert(Object.keys(testObj)); // returns an empty array of strings

// Now modify the property x so that it is read-only
Object.defineProperty(testObj, 'x', { writable: false });

// Try to change the value of the property
testObj.x = 2;
// Fails silently or throws TypeError in strict mode
alert(testObj.x); // 1

// The property is still configurable, so we can change its value like this:
Object.defineProperty(testObj, 'x', { value: 2 });

alert(testObj.x); // 2

// what happens if we change configurable to false?
Object.defineProperty(testObj, 'x', { configurable: false });
Object.defineProperty(testObj, 'x', { value: 2.5 }); // Uncaught TypeError: Cannot redefine property: x

// Now change x from a data property to an accessor property
// providing we haven't set configurable to false as above.
Object.defineProperty(testObj, 'x', {
   get: function() {
      return 0;
   }
});

alert(testObj.x); // 0

Yip.

Ok, so what does a property descriptor of an Accessor Property look like?

var objWithMultipleProperties;
var objWithMultiplePropertiesDescriptor;

objWithMultipleProperties = Object.defineProperties({}, {
   x: { value: 1, writable: true, enumerable:true, configurable:true },
   y: { value: 1, writable: true, enumerable:true, configurable:true },
   r: {
      get: function() {
         return Math.sqrt(this.x*this.x + this.y*this.y)
      },
      enumerable:true,
      configurable:true
   }
});

objWithMultiplePropertiesDescriptor = Object.getOwnPropertyDescriptor(objWithMultipleProperties, 'r');
// objWithMultiplePropertiesDescriptor {
//    configurable: true,
//    enumerable: true,
//    get: function () {
//       // other members in here
//    },
//    set: undefined,
//   // ...
// }

The Global Object

When the JavaScript interpreter starts (or whenever a web browser loads a new page),
it creates a new global object and gives it an initial set of properties that define:
• global properties like undefined, Infinity, and NaN
• global functions like isNaN(), parseInt(), and eval()
• constructor functions like Date(), RegExp(), String(), Object(), and Array()
• global objects like Math and JSON

delete undefined;  // not deleted
delete Infinity;   // not deleted
delete NaN;        // not deleted
delete isNaN;      // deleted
delete parseInt;   // deleted
delete eval;       // deleted
delete Date;       // deleted
delete RegExp;     // deleted
delete String;     // deleted
delete Object;     // deleted
delete Array;      // deleted
delete Math;       // deleted
delete JSON;       // deleted
undefined = 'kims undefined'; // nonassignable
Infinity = 'kims infinity';   // nonassignable
NaN = 'kims nan';             // nonassignable
  1. Why are undefined, Infinity and NaN not removed?
  2. Are they non-configurable?
  3. Why are they non-assignable?
  4. How do we test this?
  5. Are they constants?
  6. Are Infinity, NaN and undefined reserved words?

I’ll answer these questions shortly.

ES3 properties

According to the standard
8.6 “Each property consists of a name, a value and a set of attributes”.
8.6.1 A property can have zero or more attributes from the following set:

These attributes along with others (see ES3 spec) are reserved for internal use.

Attribute Description
ReadOnly The property is a read-only property.
Attempts by ECMAScript code to write to the property will be ignored.
(Note, however, that in some cases the value of a property with the ReadOnly attribute may change over time because of actions taken by the host environment; therefore “ReadOnly” does not mean “constant and unchanging”!)
DontEnum The property is not to be enumerated by a for-in enumeration
DontDelete Attempts to delete the property will be ignored.
Internal Internal properties have no name and are not directly accessible via the property accessor operators. This means the property is not accessible to the ECMAScript program.
How these properties are accessed is implementation specific.
How and when some of these properties are used is specified by the language specification.

These property attribute values can not be changed
An interesting Internal property is the [[Prototype]]
There are a number of ways to access the internal [[Prototype]] property indirectly.
I’ve detailed them in my post on prototypes here.

More on ES5 properties

writable, enumerable and configurable replace the ES3 property attributes: ReadOnly, DontEnum, DontDelete.
The property attributes and their values define the property descriptor object (including Data or Accessor properties and those that apply to both (enumerable and configurable)).

The property attributes can be manually managed by the:
Object.defineProperty and Object.defineProperties methods
Object.getOwnPropertyDescriptor

var myObj = {};

Object.defineProperty(myObj, 'propOnMyObj', {
   value: 'property descriptor',
   writable: true,    // ReadOnly = false in ES3
   enumerable: false, // DontEnum = true in ES3
   configurable: true // DontDelete = false in ES3
});

console.log(myObj.propOnMyObj); // 'property descriptor'

// getOwnPropertyDescriptor is the only way to get the properties attributes.
// They don't exist as visible properties on the property (other than for setting them as above),
// they're stored internally in the ECMAScript engine.
var myPropertyDescriptor = Object.getOwnPropertyDescriptor(myObj, 'propOnMyObj');

console.log(myPropertyDescriptor.enumerable); // false
console.log(myPropertyDescriptor.writable);   // true
// etc.

getOwnPropertyDescriptor

There’s lots of new methods defined in ES5.

Now, back to the six questions we had above.

  1. Why are undefined, Infinity and NaN not removed?
    Because  their property descriptors configurable attribute is set to false.
  2. Are they non-configurable?
    Yes, as above.
  3. Why are they non-assignable?
    Because their property descriptors writable attribute is set to false.
  4. How do we test this?
    NaN
    Number
    global Infinity property
    undefined
  5. Are they constants?
    Effectively, yes.
  6. Are Infinity, NaN and undefined reserved words?
    No. Avoid using their names to remove ambiguity.

ES3
Infinity read/write (the value can be changed). Holds positive infinity.
Number is a property on the global object, which has readonly properties Infinity and NaN.
NaN read/write (the value can be changed).
undefined

ES5
Infinity (well… POSITIVE_INFINITY) is a property on the global Number property with the value Infinity
We now also have NEGATIVE_INFINITY with the value –Infinity
Number
Infinity is also declared directly on the global object.
NaN is a property on the global object with the value NaN
undefined is a property on the global object with the value (you guessed it) undefined.
These are all constants now.

Important differences between Properties and Variables

Variables are properties, but not vice versa.

The VariableObject in ES3 is called the VariableEnvironment in ES5.
Can be seen in the specs.
Not sure why they changed what they called it.

Each execution context (be it global or any function) has an associated VariableObject.
Variables (and functions) created within a given context are bound as properties of that context’s VariableObject.
Even function parameters are added as properties of the VariableObject.
Discussed in depth in:
ECMAScript3 spec under “10 Execution Contexts”
ECMAScript5 spec under “10.3 Execution Contexts” onwards
This is why we can access global variables as properties of the global object…
Because that’s what they are.

  • The global object is created before control enters any execution context.
  • The global object is the same as the global contexts VariableObject.
  • In the HTML DOM; the window property of the global object is the global object.

Now variables of functions are similar, but we can’t access them as properties.
Why?…
ECMAScript has an Activation Object.
When control enters the execution context of a function, an activation object is created and associated with the execution context.
The activation object is initialised with:

  1. The this value
  2. an arguments property (referred to as a binding in ES5 spec) that has the DontDelete attribute (configurable set to false in ES5).

The activation object is then used as the VariableObject.
We can access members of the activation object but not the activation object itself,
which is why we can’t access the members as properties.
Further details in:
ECMAScript3 spec under “10.2 Entering An Execution Context”
ECMAScript5 spec under “10.4 Establishing an Execution Context”

Feature Detection (Yes, Including JavaScript)

I know this is not property specific, but it was something I thought noteworthy.

There’s a library that looks to have potential for JavaScript feature detection.
“has.js”
This should be useful for detecting what your users browsers are capable of EcmaScript wise.
The project lead is Peter Higgins (Dojo Toolkit project lead).
Has a good sized group of committers.
May have potential to be a better Modernizr.
The source is here.
Explanation of has.js features here.

Additional References:

Succinct explanation of Variables vs Properties in JavaScript

EcmaScript5 Objects and Properties

Slideshow by Doug Crockford on ES5’s new parts

Dmitry Soshnikov’s elaborations on the Ecma standards:

http://dmitrysoshnikov.com/ecmascript/es5-chapter-1-properties-and-property-descriptors/

http://dmitrysoshnikov.com/ecmascript/chapter-7-2-oop-ecmascript-implementation/

http://dmitrysoshnikov.com/ecmascript/chapter-2-variable-object/

Extending, Currying and Monkey Patching. part 3

May 27, 2012

Monkey Patching

Or sometimes known as Duck Punching.

So what we’ve got here is an arbitrary object kimsObject (line 01) thrown in the BINARYMIST global, with a public method kimsMethod (line 03).
The trace method is passed an object and a method name (line 10). It replaces the specified
method with a new method that “wraps” additional functionality around the original method.
The call is made from line 23

var BINARYMIST = {};
BINARYMIST.kimsObject = (function () {
  return {
    kimsMethod: function () {
      alert("Now inside kimsMethod.");
    }
  };
}());

BINARYMIST.trace = (function () {
  return function (targetObject, targetMethod) {
    var originalMethod = targetObject[targetMethod];
    var that = this;
    (function () {
      console.log(new Date(), "Entering:", targetMethod);
      var result = originalMethod.apply(that, arguments);
      console.log(new Date(), "Exiting:", targetMethod);
      return result;
    }(/*execute me!*/));
  };
}());

(function () {
  BINARYMIST.trace(BINARYMIST.kimsObject, 'kimsMethod')
}());

I believe if used with care, and in moderation.
Monkey Patching can support and provide implementation for
Dr. Barbara Liskov’s Liskov Substitution Principle (LSP)

Consider the developer that uses a function he/she thinks does something, but it actually does something entirely unexpected.
Be sure to make known to the team and those what will use your code what you have done, and what they can expect.
Make sure this is documented in code around the API, whether it be with comments or simply in the way you write your code.

monkey patching

Jeff Atwood has a good post on the dangers of using Monkey Patching.
Be sure to check it out.

You may or may not need the Object.create function. It’s part of the ECMA Standard 262
which is the vendor-neutral standard for what was originally Netscape’s JavaScript.
This allows us to specify the object we want to be the prototype for our new object, I.E. our base class.
Properties added to an object‘s prototype are shared, through inheritance, by all objects sharing the prototype.
Alternatively, a new object may be created with an explicitly specified
prototype by using the Object.create built-in function.

This also makes me think of what we try to achieve with Aspect Orientation in .NET.
Below I’ve had an attempt at trying to make the BINARYMIST.createAspectedObjectWithTracing as generic as possible.
So that on the last line you can instantiate a BINARYMIST.kimsObject and call the kimsMethod method.
The way I’ve setup the BINARYMIST.createAspectedObjectWithTracing, you should be able to instantiate any object and call which ever method on it that you desire.
You get the tracing functionality for free.
Using this sort of approach, you should be able to apply any functionality you want to any method you want.

I’m keen on hearing how this could be improved.
Making the call even more transparent.

BINARYMIST.kimsObject = (function () {
  return {
    kimsMethod: function () {
      alert("Now inside kimsMethod.");
    }
  };
}());

BINARYMIST.trace = (function () {
  return function (targetObject, targetMethod) {
    var originalMethod = targetObject[targetMethod];
    var that = this;
    (function () {
      console.log(new Date(), "Entering:", targetMethod);
      // We can pass any number of args to the original method we wanted to call.
      // We also handle any return value. So hopefully this should be able to be used for any method.
      var result = originalMethod.apply(that, arguments);
      console.log(new Date(), "Exiting:", targetMethod);
      return result;
    }(/*execute me!*/));
  };
}());

// Add create method to the Object function
// Gives us dynamic control of what a base class should look like.
// method (create) that takes an object argument, returns a new object that has the parameter assigned to it's prototype.
// Creates a function on Object (create) that takes an object argument.
// Creates a function (Func) and assigns the object parameter to the function definition's prototype.
// Instantiates and assigns the new function (Func) to the "create" function definition on Object.
(function () {
  if (typeof Object.create !== 'function') {
    Object.create = function (o) {
      var Func = function () {};
      Func.prototype = o;
      return new Func();
    };
  }
}());

BINARYMIST.createAspectedObjectWithTracing = (function (anyObject, anyFunction) {
  var localObject = Object.create(anyObject);
  localObject[anyFunction] = function () {
    return BINARYMIST.trace(anyObject, anyFunction);
  };
  return localObject;
});

// Instantiate your object (any object) and call a method on it.
BINARYMIST.createAspectedObjectWithTracing(BINARYMIST.kimsObject, 'kimsMethod').kimsMethod();

Now as you can see again, all of the objects I’ve defined are sitting tidily out of the global object.

javascript global abatement

Now as you can see below, our BINARYMIST.kimsObject is now part of the new object’s (that localObject references) prototype object.
and it’s method is clearly visible.

javascript prototype

On line 42 above, we set the localObject‘s kimsMethod  to reference a wrapped version of BINARYMIST.kimsObject.kimsMethod.
The wrapped version has the tracing added.

We then return the localObject which references BINARYMIST.kimsObject and call the most specific kimsMethod
which has the functionality we just set.

return BINARYMIST.trace(anyObject, anyFunction);

and as you can see below, we’re now ready to start playing with our target method (kimsMethod).

javascript tracing

We then assign the this. We then execute the  anonymous closure.
In doing so, this is bound to the global object.
So to access the outer scope we need the that.
We then apply the originalMethod to that (BINARYMIST in our case).