Bas Geertsema

I have been playing around with LINQ for the last couple of weeks. It is really nice stuff to work with: it changes the way you work with sets of data and the process of transforming them into other sets of data. I don’t want to focus on the utility of LINQ here, though, but on the underlying paradigm shift that we might encounter and that I have been thinking about lately: drawing the analogy between application tiers and a supply chain, and giving the power back to the interface tier.

Typically it is considered good practice to split up your application into multiple tiers. This enhances reusability, flexibility and scalability, all good qualities to strive for, obviously. But it comes at a cost: to enable a strict separation of concerns, all tiers combined will be larger in code size and more complex. For example, we have all created applications in which the event handler of a button directly prepared a SQL query string and sent it straight to the database server at hand. Not good practice in many ways, but very straightforward and effective. Now suppose you introduce a method in which the SQL preparation and subsequent database fetching take place. This method is more flexible and can be used from multiple locations (maybe two or three other button click event handlers) in your application code. The multi-purpose versatility of the method is reflected in the method itself by some added if or switch statements. A bit more complex, but no real problem here.

We take the next step and really separate the interface and the data access layer. So we set up a separate project that only concerns itself with fetching and updating the data in the database. It is unaware of the front-end and therefore creates many abstractions for the front-end to use. This might lead to some redundancy. For example, it might check data that has already been checked in the front-end, or it might make calls to the database to retrieve data that was already retrieved before without the data layer being aware of it. This might be a slightly bigger problem: the performance hit of re-fetching data from the database can be considerable. But CPU and memory are cheap, so you don’t really need to worry about it these days.

The application grows bigger and bigger, and the decision is made to implement some kind of middleware, or a business layer, at the boundary between your interface tier and your data tier. This middle tier must be quite flexible, because it abstracts away both the data retrieval and the interface functionality. Since it knows little about how either side will actually use it, the middle tier can do nothing else than work at the level of the lowest common denominator: operate on whole business entities and perform all validations, all authentication, all authorization, etc. Good practice in itself, because it is the only way you can reliably say that your business logic and integrity checks will be executed. But again you might face the problem of even more redundant functionality across the interface tier, the business logic tier and your data access tier.

But is this really necessary?! The data access layers and business layers are often designed with this question in mind: ‘which information do we supply?’. But should the emphasis not instead be on the question ‘what information does the interface want?’. Do we push the data? Or do we explicitly pull the data?

For example, in our original application we decided to fetch the name field of an entity (e.g. a user) using inline SQL code. We ask for the name, and that is what we get. Nothing more, nothing less. Now consider the 3-tier application. We ask for the name, but there is no such method in the middleware, so we retrieve the whole entity instead. The middleware decides to check whether we have the proper authorization to fetch the entity, and then whether we have the authorization to read the name field. The data access layer is unaware of these validity checks and, just to make sure, performs them again. The middle tier is also aware of the whole entity hierarchy, so for convenience it implicitly figures out the exact (derived) type of the entity, even though the name field we are interested in is only specified in the root entity. Maybe a bit exaggerated, but I hope you get my point: there is a lot of unnecessary overhead involved. Surely we needed all this functionality when we wanted to retrieve the whole entity, using polymorphic fetches, etc. But we _do not need it in this case_. We just want the name of the entity. Of course proper validation and authorization are needed, but only the kind that is directly related to the data we want. We don’t want any data we don’t need, and surely no functionality acting upon that unrequired data.

Now consider the three tiers working together as a typical supply chain seen in the retail industry. Within the field of operations research, the effect of supply outgrowing the original demand is well known as the bullwhip effect. This effect occurs when every manufacturer in the supply chain adds a ‘safety’ margin to the amount of products that was requested. This accumulates up the chain and ends up in supply orders that overestimate the original demand. So even though the end customer may have requested only ten products, by the time the order reaches the raw materials manufacturer it has grown to a potential of twenty products. You can clearly see the problem here: resources are wasted due to the mismatch between supply and actual demand. The solution in this field of research has been a focus on demand-driven supply, or to put it differently, the original demand by the customer must stay intact as far upstream in the supply chain as possible. This makes the supply most likely to be equal to the actual demand.
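
To put rough numbers on that accumulation, here is a minimal sketch in Python. It assumes every link in the chain pads the incoming order by the same fraction, which is a simplification; the function name, the 25% margin and the three links are illustrative choices of mine, not figures from a real supply chain.

```python
def orders_up_the_chain(demand, margin, links):
    """Return the order size after each link pads the incoming order by `margin`."""
    orders = []
    order = demand
    for _ in range(links):
        # Each supplier adds its own 'safety' margin on top of what it was asked for.
        order = order * (1 + margin)
        orders.append(order)
    return orders


# Ten products requested, three links, each padding by 25% (assumed values).
chain = orders_up_the_chain(demand=10, margin=0.25, links=3)
print(chain)  # [12.5, 15.625, 19.53125]
```

Three modest margins are enough to nearly double the original demand of ten by the time the order reaches the last link.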

I see analogies in the software systems we are building. But we may have an even bigger problem. Whereas in the retail industry the focus has been on the customer instead of the supplier for a while now, the software industry still considers the ‘customer’ (in this case the interface tier or the actual user) as just an executor of supplied services and information, not as the _creator of demand_ for specific services and information. Who is in charge of your information retrieval?! Is it truly the business layer that can decide which information to supply? Does the interface tier merely make convenient use of all the entities the middleware provides? Or is it the interface tier that specifies exactly which data it is interested in? I believe it should be the latter.

There might be some improvements to be made on this topic. Yes, it is bad practice to directly access the database from your button click event handler. But this does not mean it is equally wrong to specify what you want in a query format and hand it over to the other tiers. E.g. if I wrote this in my event handler, would it really be wrong?

SELECT Name FROM User WHERE User.Id = 123

Is this not exactly what it needs?! How can it be more terse and self-explanatory? Still, most programmers would shudder at the mere sight of code like this in interface code. But what if I could just hand this query over to the middle tier for further processing? The middle tier would then add some business logic and validation logic to the query definition and hand it over to the data tier. This tier would in turn add some database-specific filters to the query and execute it. The resulting data (a single string representing the name of the entity) would be returned to the middle tier and then back to the interface tier. The amount of overhead is minimal, since the original request for information stays _intact_ all the way up to the data tier. There is nothing more and nothing less than what we need.
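
The hand-over described above can be sketched roughly as follows, in Python with an in-memory SQLite database. Everything here is hypothetical and only for illustration: the `Query` class, the three tier functions, and the `TenantId` and `IsDeleted` columns standing in for "business logic" and "database-specific filters". It is one possible shape of the idea, not a prescribed design.

```python
import sqlite3
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Query:
    """A query definition that tiers can extend before anyone executes it."""
    table: str
    columns: tuple
    conditions: tuple = ()  # SQL fragments using ? placeholders
    params: tuple = ()

    def where(self, condition, *params):
        # Return a new Query with one more condition; the original is untouched.
        return replace(self,
                       conditions=self.conditions + (condition,),
                       params=self.params + params)

    def to_sql(self):
        # Table and column names come from trusted code, values go in as parameters.
        sql = f'SELECT {", ".join(self.columns)} FROM "{self.table}"'
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql


def interface_tier(user_id):
    # The interface states exactly what it wants: the name of one user.
    return Query("User", ("Name",)).where("Id = ?", user_id)


def middle_tier(query, tenant_id):
    # Assumed business rule: callers may only see users of their own tenant.
    return query.where("TenantId = ?", tenant_id)


def data_tier(conn, query):
    # Assumed database-specific filter: skip soft-deleted rows, then execute.
    final = query.where("IsDeleted = 0")
    return conn.execute(final.to_sql(), final.params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "User" '
             '(Id INTEGER PRIMARY KEY, Name TEXT, TenantId INTEGER, IsDeleted INTEGER)')
conn.execute('INSERT INTO "User" VALUES (123, ?, 1, 0)', ("Alice",))

query = middle_tier(interface_tier(123), tenant_id=1)
rows = data_tier(conn, query)
print(rows)  # [('Alice',)]
```

Only the `Name` column ever leaves the database, and each tier contributes just the conditions it is responsible for.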

To come back to the technology I started with, I believe LINQ is a path to enabling a strategy like this. Its deferred execution makes it possible to adjust the query definition in all the tiers before it is executed, and thereby to inject validation logic and authorization logic into the query. Or to inspect the query just before executing it, instead of making assumptions about the final use of the supplied data.
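
LINQ itself lives in .NET, so the following is only a Python analogue of deferred execution, not LINQ's API. The `LazyQuery` class, its methods and the `load_users` source are names I made up for the sketch. It shows the two properties mentioned above: the data source is not touched while tiers are still adjusting the query, and the definition can be inspected right before it runs.

```python
class LazyQuery:
    """Holds a query definition; the source is only read when iterated."""

    def __init__(self, source, predicates=(), projection=None):
        self._source = source                 # callable returning the rows
        self._predicates = tuple(predicates)  # (label, function) pairs
        self._projection = projection

    def where(self, label, predicate):
        return LazyQuery(self._source,
                         self._predicates + ((label, predicate),),
                         self._projection)

    def select(self, projection):
        # The projection is always applied last, after every filter.
        return LazyQuery(self._source, self._predicates, projection)

    def describe(self):
        # Lets a tier inspect the pending filters without executing anything.
        return [label for label, _ in self._predicates]

    def __iter__(self):
        for row in self._source():
            if all(predicate(row) for _, predicate in self._predicates):
                yield self._projection(row) if self._projection else row


reads = []  # records every time the data source is actually touched


def load_users():
    reads.append("read")
    return [
        {"id": 123, "name": "Alice", "active": True},
        {"id": 456, "name": "Bob", "active": False},
    ]


# Interface tier: asks for exactly one name.
query = (LazyQuery(load_users)
         .where("id == 123", lambda u: u["id"] == 123)
         .select(lambda u: u["name"]))

# Middle tier: injects its own rule into the same definition.
query = query.where("active users only", lambda u: u["active"])

print(reads)             # [] : nothing has been fetched yet
print(query.describe())  # ['id == 123', 'active users only']

names = list(query)      # execution happens here
print(names)             # ['Alice']
```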

Empower your interface, it knows best, but make sure your data and business layers are your law-enforcement troops.
