I've had an interesting year in terms of learning new things, or learning about the changes to old things.

At my day job, we've been building microservices and the infrastructure necessary to manage them. Lots of Docker and schedulers and orchestration and distributed configuration and things of that nature. Also, Node.js.

On my side project, I've been migrating a monolithic ASP.NET MVC 5 application to ASP.NET 5 / MVC 6 and getting it running in Docker containers. As I've split things into smaller and smaller services, I've rewritten some bits in Node.js, either because it's better at certain things or because there are better libraries available (a particular hat-tip to Passport). (It's nice that the things I've been learning at the day job have helped with this.)

Oh, and I've spent 99% of my time working in Linux on my laptop.

All this has contributed to a bit of a shift in the way I think about building solutions, which I'm going to try to write down here.

What I've learned

The thing Linux and Node.js have in common is the idea of doing one thing, and doing it well.

Linux

Linux has a couple of things that push developers in this direction. At the kernel level, inter-process communication is incredibly easy thanks to Unix Domain Sockets, which are like Named Pipes but more flexible and easier to work with, and which have very low overhead in terms of system resources and latency. That makes it much easier to build small services that communicate with each other.
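A Unix Domain Socket is essentially a stream socket bound to a filesystem path instead of an IP address and port, so talking to another local service looks exactly like TCP minus the network stack. Here's a minimal C# sketch, assuming a .NET runtime that exposes System.Net.Sockets.UnixDomainSocketEndPoint; the socket path and the one-line request "protocol" are made up for illustration.

```csharp
using System;
using System.Net.Sockets;
using System.Text;

// Minimal sketch: call another local service over a Unix domain socket.
// Assumes a runtime with UnixDomainSocketEndPoint support; the socket path
// and the request format are placeholders for illustration.
var endpoint = new UnixDomainSocketEndPoint("/tmp/config-service.sock");

using var socket = new Socket(AddressFamily.Unix, SocketType.Stream, ProtocolType.Unspecified);
await socket.ConnectAsync(endpoint);

// Send a request and read the reply, just as you would over TCP,
// but without ever touching the network stack.
await socket.SendAsync(new ArraySegment<byte>(Encoding.UTF8.GetBytes("get timeout\n")), SocketFlags.None);

var buffer = new byte[1024];
var read = await socket.ReceiveAsync(new ArraySegment<byte>(buffer), SocketFlags.None);
Console.WriteLine(Encoding.UTF8.GetString(buffer, 0, read));
```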

The other thing that is common to all Linux distros at a fundamental level is the package manager: APT for Debian/Ubuntu, yum for Fedora, pacman for Arch. Package managers are used to distribute both libraries and applications, with dependency management built right in, so you can take a dependency on, say, libcurl and have it installed alongside your package rather than having to bundle it.

An indication of how strongly this "do one thing and do it well" ethos is held can be seen in part of the community's reaction to systemd, the core process of most current Linux distributions. Systemd does lots of things. Whether it does them well or not is less important than the fact that it has pulled into itself a lot of functionality that was previously handled by multiple cooperating processes. This made some people quite cross.

Node.js

The core of Node.js has very constrained functionality out of the box. The highest-level thing in it is probably the basic HTTP handling, which is done in C/C++ to make it as fast as possible. When you need higher-level functionality, you install a package from NPM, which in turn installs a bunch of other packages from NPM, and so on. If you take a look at the actual code for some of these packages, it might only be a couple of dozen lines, but they're a couple of dozen lines you don't have to write and maintain.

An example. I use Gulp as my build system for all my Node.js or browser projects: building TypeScript; pre-processing CSS; linting code; running tests; minifying JavaScript; and so on. When I run a build, there is a "clean" step, which just deletes all the outputs of the previous build (like nuking the wwwroot folder in my ASP.NET 5 projects). For that, I use an NPM package called del, which just deletes files and directories. Take a look at the entire source code for del. 72 lines, including whitespace and closing braces.

Back to .NET

So, to bring this back to what I still think of as my primary domain, .NET.

This is the web page I was on right before starting this post — it's an issue for the Azure Storage SDK for .NET wherein somebody has suggested that the library/package is overly complex, and many people have added a +1 (because GitHub still doesn't have a vote-for-issue feature).

The WindowsAzure.Storage NuGet package includes functionality for working with KeyVault, and depends on Microsoft.Data.OData, which in turn depends on System.Spatial, which contains "classes and methods that facilitate geography and geometry spatial operations".

I would submit that the vast majority of consumers of the Storage package are not going to use the KeyVault functionality, and probably aren't working on problems involving geography or geometry. But to be able to simply put some data into blob storage, for example, their solution must take an indirect dependency on all these packages.
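For context, here's roughly all that most of those consumers actually want to do. This is a sketch against the WindowsAzure.Storage API; the connection string source and the container and blob names are placeholders.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// Sketch: the whole "put some data into blob storage" scenario.
// The connection string source and the container/blob names are placeholders.
var connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
var account = CloudStorageAccount.Parse(connectionString);
var client = account.CreateCloudBlobClient();

var container = client.GetContainerReference("documents");
await container.CreateIfNotExistsAsync();

var blob = container.GetBlockBlobReference("hello.txt");
await blob.UploadTextAsync("Hello, blob storage!");
```

Nothing in there touches OData, System.Spatial or KeyVault, yet all of those packages come along for the ride anyway.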

Why is this a problem?

There are two reasons why this is a bigger problem heading into 2016 than it was at the start of this year.

Resource overhead

These packages and their assemblies have to be installed for each application, which in the old world of monolithic applications running in VMs was less of a problem (disk space is cheap, after all), but in a world of microservices and Docker images starts to have more of an impact. Plus, the .NET runtime has to at least partially load these assemblies into memory. These microservices become less micro with this additional – and often unnecessary – content.

Agility

We have a whole new .NET now, .NET Core, with some new APIs, and with some APIs that had been deprecated for ages finally removed altogether. Migrating a project to run on this new platform is a non-trivial task. The bigger and more complex a project is, and the more dependencies it has, the bigger that task becomes.

If you can identify and isolate the core functionality of a library, you can iterate on it more quickly, and provide value to the majority of your users without frustrating them by making them wait while you work on the edge cases that they don't even need. Push those edge cases into add-on or plug-in packages that can be updated on a different cadence. Decide whether a particular area of your library that has a dependency on a third-party package is 100% necessary for all use cases, and if not, maybe you can factor it out instead of waiting for that package to be updated.

Adopting this approach also makes it easier for the community to contribute to your project. Digging into a project and finding tens of thousands of lines of code covering dozens of different types of functionality is daunting. A project with maybe a few hundred lines of code (covered by a decent test suite) is much more approachable.

Kudos to the ASP.NET 5 guys, by the way, for breaking MVC 6 down into multiple packages and providing a Core package for those who don't need the full feature set (e.g. the Razor view engine). There is still a Microsoft.AspNet.Mvc NuGet package, but it's a meta-package that just pulls in all the smaller packages.

Suggested action

Personally, I've been working on a plug-in pipeline for Simple.Data 2, but now I'm thinking of breaking out some of the less-used built-in functionality into sub-packages that can be installed when needed (probably with some helpful exceptions that tell people which package to install). I could also break out some of the lower-level functionality into packages that could be used by other libraries.
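To make that a bit more concrete, the sort of exception I have in mind would look something like this. This is purely illustrative: the MissingExtensionException type and the Simple.Data.BulkInsert package name are hypothetical, nothing like this exists yet.

```csharp
using System;

// Purely illustrative: the kind of helpful exception the core package could throw
// when a feature has been moved out into an optional sub-package.
// The type name and the Simple.Data.BulkInsert package are hypothetical.
public class MissingExtensionException : NotSupportedException
{
    public MissingExtensionException(string feature, string package)
        : base($"{feature} has been moved to the optional '{package}' package. " +
               $"Run 'Install-Package {package}' and try again.")
    {
    }
}

// Usage somewhere in the core library:
// throw new MissingExtensionException("Bulk insert", "Simple.Data.BulkInsert");
```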

And I look at other packages (such as the ubiquitous JSON.NET) and wonder if they could move in this direction as well. Maybe Newtonsoft.Json.Bson, or some of the more obscure type handling such as dictionaries and numerics, could be refactored into sub-packages; there could be a Newtonsoft.Json.Core package with the basic functionality, and Newtonsoft.Json could become a meta-package to install everything and avoid breaking existing projects.
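To illustrate how small the common case is, this is the entire round-trip most projects need from JSON.NET, and it's the kind of surface a hypothetical Newtonsoft.Json.Core would have to cover. (The Core/meta-package split is just my suggestion above, not anything that exists today; the Widget type is made up for the example.)

```csharp
using System;
using Newtonsoft.Json;

public class Widget
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class JsonDemo
{
    public static void Main()
    {
        // The basic serialize/deserialize round-trip that covers the vast majority
        // of JSON.NET usage; BSON and the more obscure dictionary and numeric
        // type handling never come into it.
        var json = JsonConvert.SerializeObject(new Widget { Id = 1, Name = "Sprocket" });
        var widget = JsonConvert.DeserializeObject<Widget>(json);
        Console.WriteLine($"{widget.Id}: {widget.Name}");
    }
}
```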

Disclaimer

This was kind of a brain-dump post, and I haven't thought through all the technical challenges that these ideas might present in the statically-typed, assembly-dependent world of .NET, but I welcome any feedback in the comments section, or whole blog posts explaining why I'm misguided or confused. :)