Tuesday, 18 November 2014

Messaging and transient issues.

A small post but something worth considering. 

Recently, when discussing a situation where a MongoDB replica set was in the process of failing over, concern was raised about writing data whilst a new primary was being elected.

This is a transient issue, and issues similar to it - such as temporary server outages and routing failures - are too.

They take a little time but should resolve fairly quickly. In the meantime, there are a few options available whilst the transient issue sorts itself out:

  1. Do nothing. In this case, give up and find another job, you lazy hacker.
  2. Let the process fall over, report the failure to users and let them try again via a button click. A user's experience might be sullied - in the opinion of some - but it could still be a reasonable way to recover (this really depends on what stakeholders/the business think). It might not be reasonable for important information which must be stored and cannot optionally be retried.
  3. Employ a retry mechanism: simply loop a few (reasonable) times until we get success, or employ an exponential back-off to give reasonable time for recovery. I have done the former using, as someone I know put it, some funky "AOP" shit. However, I wouldn't recommend the AOP approach for transaction management or bounded retries, because eventual consistency may still never occur, and AOP, certainly in a lot of the frameworks I have used, is complete magic. I had trouble explaining to some of the less experienced - and even seasoned - members of my team what that "Transaction" attribute above my DAL method was, how it was set up on the IoC container, and how code was executed pre and post the method in an execution pipeline.
  4. Use a durable messaging framework like NServiceBus. The command or event (message) which was sent will fail and, if second-level retries are enabled, will be retried a specified number of times (with exponential back-off). If unsuccessful within that number of retries, the message will be placed on an error queue, and relevant administrators will be notified or at least be able to report on it. The exception and/or problem will be noted and hopefully fixed, and the message in the error queue replayed, bringing everything back into a consistent state. And all of this with the user completely unaware that four data centres were nuked.
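Option 3 can be sketched roughly like this - a minimal illustration only, assuming a bounded attempt count and a base delay of my own choosing (the method name and numbers are not from any particular framework):

```csharp
// Bounded retries with exponential back-off: 200ms, 400ms, 800ms, ...
// Illustrative values - tune maxAttempts and the base delay to your situation.
public static async Task<T> RetryAsync<T>(Func<Task<T>> operation, int maxAttempts = 5)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await operation();
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // Transient failure - back off and try again.
            await Task.Delay(TimeSpan.FromMilliseconds(200 * Math.Pow(2, attempt - 1)));
        }
    }
}
```

The exception filter lets the final failure bubble up untouched, so callers still see the original error once the retries are exhausted.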

That is all. 



As I am sure you are aware, this is not a new feature in .NET - it has been around for over a year as part of .NET 4.5. There have been plenty of posts about it, so I'm not going to go into a great deal of depth; I am not an expert, and there are people who have covered it in more depth than I could. I just want to get a couple of the key concepts down and provide a layman's tilt on it.

I've been using the language feature as part of a site I am developing as a platform to testbed a few technologies including:

  • Knockout.js
  • The continuous integration and continuous deployment capabilities of  Visual Studio Online (formerly Team Foundation Service)
  • WebAPI and integrating with 3rd party web services
  • Some of the newer C# language features 
  • Bootstrap templates 
  • Stanford Core NLP for .NET 
The site is a work in progress and is available @

The main purpose of the site is as a front end for a simple aggregation service for searching for development-related articles, plumbing in information from a number of sources such as YouTube, MSDN, Pluralsight and Stack Overflow. I've got a few ideas about leveraging the Stanford Core NLP for .NET, ported by Sergey Tihon, so that I can perhaps perform slightly more accurate content matching, but it's a bit of a mess at the moment.

What is Async/Await? 


A C# language feature making it almost trivial to perform asynchronous work. There are already a number of ways to do this in C#/.NET, including the Asynchronous Programming Model (APM) - think IAsyncResult - and the Event-based Asynchronous Pattern (EAP) - think BackgroundWorker. I'm not going to discuss these much here, but suffice it to say async/await is probably a better and easier way to do async work, and is also now the preferred way (I believe there are some low-level optimisations to take advantage of here, provided by the TPL (Task Parallel Library), which async/await sits atop).

In order to identify a method as something which will initiate asynchronous work,  we let the compiler know by adding the async keyword to the signature like so:
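The code snippet this refers to appears to be missing from the post; based on the description that follows, it would have looked something like this (the ResultSet construction at the end is my own guess):

```csharp
public async Task<ResultSet> Evaluate(string searchTerm)
{
    // The await below lets Evaluate return to its caller while results are pending.
    var feedResults = await _dataFeedRetriever.GetDataFeedResults(searchTerm);

    // Execution resumes here once the results are available.
    return new ResultSet(feedResults);
}
```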

Here the Evaluate method is going to initiate some asynchronous process. It also makes this method awaitable. That is to say, a call to this method  will not block and can work in an asynchronous manner too. 

The Evaluate method will return a Task<ResultSet>. The return type here is important. We are not returning something that will necessarily have the value immediately, but rather something that signifies that the result of some work will appear at some point in the future. I think I've heard this same kind of concept being called a promise in other languages. Another thing to note is that the actual return statement inside the method body will wrap the return type in Task<T>. So here ResultSet is wrapped in Task<ResultSet> implicitly.

Inside the method, a call to _dataFeedRetriever.GetDataFeedResults(searchTerm) is prefixed with await. This means that when the aforementioned method does work which does not yield a result immediately, the Evaluate method itself will return immediately, but crucially the position in the Evaluate method where execution got to will be retained/saved, so that when results are available, normal synchronous execution will resume inside the Evaluate method.

Interestingly prefixing a method call with await, has the effect of unwrapping the Task<T>  type the method returns and pulling out the results when they are available.  Conversely, prefixing the return type in the signature means that the return type of the method is wrapped in a Task<T> implicitly. 

If awaitable methods are nested, then at the first await where a delay is encountered, execution returns immediately, and then again at each await back up the call stack - we back out upstream, if you like. When a result is available at the innermost await, execution resumes there and then continues back up the call stack, perhaps without any further delay at the outer awaits.
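A sketch of this backing-out behaviour, with illustrative method names (httpClient here is an assumed HttpClient field):

```csharp
public async Task<string> Outer()
{
    // If Inner() is delayed, Outer() also returns to *its* caller at this point.
    var text = await Inner();

    // Resumes here once Inner() has produced a result.
    return text.ToUpper();
}

public async Task<string> Inner()
{
    // Innermost await where a delay occurs - execution backs out upstream from here.
    var response = await httpClient.GetStringAsync("http://example.com/feed");
    return response.Trim();
}
```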

A misconception of async is that it utilises additional threads to do its bidding, but this is not always the case. A single thread can be used to call out to an async operation, such as file IO (i.e. I/O-bound work). Whilst the IO work is being done and a response/result is being waited upon, the thread used to make the request can be freed and returned to the thread pool to perform other work, such as serving incoming web requests. Once the IO is completed, a thread can be scheduled to continue execution where it left off.

This kind of pattern allows for more efficient use of system resources and threads, and potentially reduces the need to create new threads, helping to save on all of the associated costs (thread kernel object, TEB, user-mode stack and context switching).

Facilitating the above is a state machine which the compiler generates where the async/await keywords are encountered. It is framework magic, but I think it is safe to say this is no fly-by-night 3rd-party library but a C# language feature which, although not infallible, is pretty consistent.

Coincidentally, the state machine generated for async/await is not too dissimilar to the state machine used for the yield keyword, where the position in a sequence is maintained whilst enumerating it. Jeffrey Richter leveraged the yield feature to provide an async library before async became part of the C# language. If you are interested, Jon Skeet has both an excellent tutorial series, EduAsync, where he dissects async/await, and a Pluralsight video (which is probably a little easier to follow) going into the innards of the state machine used by async/await.

Putting it all together


Using async/await in the website mentioned earlier allows me to query the 4 feeds I mentioned at the same time, and to wait only as long as the longest call to a feed, as opposed to the sum of the calls to all the feeds. That this is done while tying up no more than one thread at a time, in a nice, clean, easy-to-understand manner, is a great argument for using the feature. I'll add the rest of the code to this post or the next.
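Fanning out to the feeds can be sketched like so (the feed field and method names are illustrative, not the site's actual code):

```csharp
// Start all four feed searches without awaiting them individually...
var youTube       = _youTubeFeed.SearchAsync(searchTerm);
var msdn          = _msdnFeed.SearchAsync(searchTerm);
var pluralSight   = _pluralSightFeed.SearchAsync(searchTerm);
var stackOverflow = _stackOverflowFeed.SearchAsync(searchTerm);

// ...then await them together: the total wait is roughly the slowest feed,
// not the sum of all four calls.
var results = await Task.WhenAll(youTube, msdn, pluralSight, stackOverflow);
```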

Sunday, 16 November 2014



MongoDB training, provided by MongoDB themselves, was up for grabs recently, so I put my name in the hat to get a keener insight into MongoDB and NoSQL databases, having used them on only a few occasions. The training was attended by a near 50/50 split of devs and DBAs, and this led to some interesting debates and reactions.

Like lots of the stuff I've been introduced to/have started looking at recently (I am late to nearly every technology party there is) they (Mongo and NoSQL) have been around for a while and have become established as a viable alternative persistence solution to the likes of MSSQL. 

The main features:
  • It's schemaless
  • Document oriented/No support for JOINs 
  • Querying performed using JS/JSON
  • Indexing
  • Replication/Redundancy/High availability
  • Scaling out via Sharding 
  • Aggregation (unfortunately we ran out of time so this is not covered here)
  • Authorisation (again ran out of time on this one) 

It's schemaless

You'll no doubt be aware that relational databases use a schema to determine the shape and type of data that is going to be stored in them. MongoDB, quite simply, doesn't. You take an object from some application (most likely), and it is serialised (typically) from your application into JSON, then into BSON, and then stored on disk.

MongoDB does not care about the type of data used or its shape, and when a document is added, a field with a particular datatype used in one commit may be of a different type in another (I'm not sure how this would impact indexing). The look of sheer horror and gasps from the DBAs was priceless (more on this in a minute).
In practice, data-type changes like this would probably be as rare in the database as they are in an application; how often (apart from during development) do you go and change a fundamental data type (that is to say, not its precision or max size) in production code/databases?
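For example, nothing stops two documents in the same collection having different shapes, or even different types for the same field (an illustrative shell session, not one from the course):

```javascript
// Same collection, different shapes - MongoDB accepts both without complaint.
db.customers.insert({ customer_name: "bob", age: 42 })
db.customers.insert({ customer_name: "alice", age: "forty-two", vip: true })
```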

Document oriented

As there is no schema,  things we think of as objects in applications (e.g. a Customer object)  can be serialised into JSON and then as a MongoDB document added to a collection of customers.  A collection is roughly analogous to a table and therefore a collection can have a number of documents like a table can have a number of rows. 

Documents can be entire object graphs with other embedded documents (e.g. a customer with a collection of addresses).  Access to documents can be expressed through dot notation when querying and projecting.  
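For instance, a customer document might embed its addresses, with dot notation reaching inside them (a hypothetical example):

```javascript
// A customer with an embedded collection of address documents.
db.customers.insert({
    customer_name: "bob",
    addresses: [ { city: "London", postcode: "E1 6AN" } ]
})

// Dot notation queries into the embedded documents.
db.customers.find({ "addresses.city": "London" })
```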

With this arrangement, the notion of relationships expressed through queries over many sets of tables (like in SQL) disappears; relationships are instead expressed by virtue of the parent-child relationships you see in a typical object graph. Joins are killed off at the DB layer and would generally then be performed in the application (if at all). However, there is the ability to define foreign-key-style references to other documents in the database. There is a price to this, though, as referential integrity will come at some cost to query performance.

With document orientation, though, I think there is a good opportunity to embrace and leverage small transactional boundaries, perhaps closely aligned to an aggregate root/entities as in DDD. Each document commit is atomic, and so getting the aggregate root right would, I'm sure, certainly improve concurrency and reduce the size of working sets.

Querying performed using JS/JSON

One of the particularly great features of MongoDB is the simplicity of its querying. 
For example finding all of the documents with a customer whose name is bob on a reports database (db context is reports below) within the customers collection looks like:
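The query screenshot appears to be missing from this post; from the description below it would have been along these lines:

```javascript
use reports
db.customers.find({ customer_name : "bob" })
```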

The find method is simple JavaScript, and the first argument is JSON - straightforward. The predicate is expressed by a key and value pair, with the key being the field in a document, matches being documents where that field equals the specified value. Just like in SQL, complex relational expressions can be built up using built-in operators such as $lt, $gt, $or and $in.
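For example (an illustrative query of my own, not one from the course):

```javascript
// Customers named bob or alice, or anyone younger than 65.
db.customers.find({
    $or: [
        { customer_name: { $in: ["bob", "alice"] } },
        { age: { $lt: 65 } }
    ]
})
```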

Taking it a little further, we can determine which parts of the document we would like to view.
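Again the screenshot is missing; judging by the explanation that follows, the query looked like:

```javascript
db.customers.find({ customer_name : "bob" }, { customer_name : 1 })
```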

The second argument to the find method, {customer_name : 1}, is a filter saying which fields in the document to return in any results found, analogous to the select list in SQL. Here customer_name will be returned. We could invert the statement by passing 0 for customer_name, and all of the fields EXCEPT customer_name would be returned.

As queries get more complicated, there can be a tendency to unbalance the brackets in the expression, so be mindful of this. A good tip was to add in all of the sub-expressions first and then expand them out. Alternatively, use a text editor which has highlighting for brackets/curly braces etc.


Indexing

Indexing is supported via single or compound indexes on document fields. Indexes can be configured differently on primary and secondary nodes, dependent on needs, with a bit of jiggery-pokery (taking nodes offline and configuring them differently).
I can't speak much for the absolute speed and performance of the indexes, but relative to not having them, they are much quicker. 
Under the hood they are implemented using B-Trees, whether this is optimal would be for some performance analysis and comparisons to answer. 
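Creating them is a one-liner per index (field names illustrative; ensureIndex was the shell helper of the day, later renamed createIndex):

```javascript
// A single-field index, ascending.
db.customers.ensureIndex({ customer_name: 1 })

// A compound index across a top-level field and an embedded one.
db.customers.ensureIndex({ customer_name: 1, "addresses.city": 1 })
```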

Replication/Redundancy/High availability

Apart from development sandboxing and running in standalone mode, the recommendation is to run MongoDB databases in a replica set, which must contain at least 3 nodes to be effective, and always an odd number for larger replica sets (explained in a moment). A replica set consists of a minimum of 3 nodes in total: a primary and 2 secondaries.

All writes and reads are, by default, performed at the primary, and the instructions and data are then replicated to the secondaries.

Primary node failure and subsequent failover are handled by an election process to determine which secondary should become the new primary. The odd number of nodes in a replica set is crucial here, as it ensures that a majority can be obtained by participating nodes and a new primary elected. If an even number of nodes were configured in a replica set, there could be a shared vote for leader, and all of the nodes would go into a read-only state, as there would be no primary. I'm not entirely sure if MongoDB enforces the odd-number rule; we didn't try it out in the examples.

In the 3-node replica set arrangement, if the primary node were to fail, an election would take place between the secondaries, and whichever node obtained the majority vote first would become primary. Obviously, nodes cannot vote for themselves. In this instance there is a 50/50 chance of either one of the secondaries becoming the new primary, as long as there are no priorities configured.

Using secondaries here as foils for the primary makes this arrangement more attuned to mirroring in MSSQL, where exact copies of the primary data are farmed out to slaves (in essence), ensuring redundancy. And so replication here is not necessarily the same as replication used in MSSQL.

However, a slight tweak to the configuration of the secondaries means that all reading need not occur through the primary, and specific secondaries can be read from directly. This provides an element of scaling out, where writes and reads are performed through different servers. Indeed, indexing can be tuned separately on any of the nodes to help with optimal query execution.

The configuration of replica sets is an absolute doddle and this was the general consensus amongst both devs and DBAs.
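As a flavour of how little is involved, initiating a three-node set from the shell of one member looks roughly like this (the set name and host names are illustrative):

```javascript
rs.initiate({
    _id: "rs0",
    members: [
        { _id: 0, host: "db1:27017" },
        { _id: 1, host: "db2:27017" },
        { _id: 2, host: "db3:27017" }
    ]
})

// Check which node is primary and which are secondaries.
rs.status()
```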

Scaling out via Sharding

This got the trainer hot under the collar and is one of the more prominent features, particularly for the larger software projects you may find in an enterprise. Sharding only requires a tad more work to get going than is required for adding a replica set.

Analogous to partitioning in MSSQL, sharding is, in essence, a way of making use of a distributed index.   It also helps distribute storage and retrieval across MongoDB instances, thereby performing a kind of load balancing. 

Sharding occurs at a collection level and an appropriate shard key is used to shard upon. So, for example, if we were to take a customer collection we could shard it on the email address of customers.

Consequently, documents with email addresses starting A through S and S through Z could end up being placed into 2 different shards, with all reads and writes for data with email addresses falling within the respective ranges being routed appropriately (a mongos server and configuration servers are required for this).
Shards contain chunks of a pre-defined size. If data in a shard exceeds this threshold, then splitting occurs and data is moved between shards in an attempt to maintain a close-to-equal split of data amongst the shards. This process occurs in the background; it will most likely impact performance, but it's not something the user has to worry about (that is to say, this would be a non-functional requirement). This allows auto-scaling to a degree, until you realise, or plan for, the need for more shards as your data grows.
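Setting up the customer example above would go along these lines (run against a mongos; the database and field names are illustrative):

```javascript
// Enable sharding for the database, then shard the collection on email.
sh.enableSharding("reports")

// The shard key needs an index on the collection first.
db.customers.ensureIndex({ email: 1 })
sh.shardCollection("reports.customers", { email: 1 })
```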

DBA Horror

I think the developers found using and configuring MongoDB a more edifying experience than the DBAs did. I would hope the DBAs involved would agree I am fair in saying they seemed to find the use of the shell, command prompt and JSON a little challenging. Higher-level concepts similar in the MongoDB and SQL worlds were easily digested by attendees from both sides. The general consensus, though, was that MongoDB is a low-friction route for developers into administering and working with a highly scalable and available data persistence mechanism! Having only been a SQL hacker and never been involved in enterprise persistence choices, I'm not best placed to say whether MongoDB would be a better or replacement solution for SQL, but it does seem bloody easy to use and administer.

To this end, I would recommend attending the MongoDB course if it wings its way into your organisation, or taking the MongoDB intro courses on Pluralsight or MongoDB University, to help make an informed decision before defaulting to SQL.