
Boiling the Ocean (or Attempting to Keep Up With Tech)

I was asked by a fellow developer how I keep up to date on the variety of technologies that I’m expected to understand when doing my job. I decided to write him a short answer and turn the longer answer into a blog post. I got this idea from Scott Hanselman, who mentions saving keystrokes in his talks about productivity: when you answer a question for someone, you should share the answer in a way that is useful to you and to others. With that in mind, here’s the advice I would give if you find yourself taking on a job that requires broad knowledge of technology and software.

Question: Somebody in your position obviously needs to understand a lot of technologies to be able to pick the best solution for a project. How do you approach learning and understanding all the new stuff that is constantly coming out? Obviously you can’t sit down and learn every tiny detail, but do you just get a high-level understanding of how things work and then flesh out the details once you start a project in a technology that you’ve never used before?

I use Google Reader a lot; I just checked, and I have 121 RSS/Atom feeds going into it. They are not all work related, and I don’t read every article. But I star the items I want to read, or save them to Instapaper, and then go back to them. I try to read a mix of updates on what I already know and interesting things about what I don’t.

As for how to learn, retain, and relate technologies, that is a harder question. I’d love to tell you that you just get high-level exposure first and then fill in the details when needed, but my mind doesn’t work that way. In college I hated classes like Systems Analysis that only talked about systems in generic terms, because I couldn’t relate those terms to specific examples.

My own take is that at the beginning of your career, you just have to learn practical skills and learn the tools you work with well. When you want to start learning about systems well enough to evaluate, compare, select, and recommend them, you can build on that foundation as follows:

Let’s say I’ve never used a database, and I’m assigned to a project that will use Ruby on Rails and PostgreSQL. Use the on-the-job time to learn the tool as well as you can, focusing on the aspects you need for your job. Where possible, try to separate product names and features from conceptual names and features. In this example, that means understanding what database and schema mean in generic terms, and why they mean different things in PostgreSQL than they do in some other products (like Oracle or SQL Server). Next, spend a little off-hours time playing with an easily obtainable (usually open source) alternative: spend a night or two doing those same types of tasks in MySQL. You’re not trying to become a DBA, just doing tasks similar to the ones you do at work. Finally, talk to people who use other products and compare. This works especially well for products that are hard to get hold of because of cost, etc. In this example, talk to an Oracle DBA or application developer. What features do they like about their product that would keep them from switching? If you don’t know one, ask the question online. Quora, LinkedIn, and Twitter are all great places for this kind of question.

At this point, your out-of-work time (not counting your project time with PostgreSQL, because you would have spent that anyway) is 4-6 hours on MySQL and maybe another hour in conversation. But you should have a really good idea of the concepts surrounding relational databases. And you should know what the different products compete on in that arena, and what some of their strengths and weaknesses are. If you spend time away from that field and come back to it, you should still have the conceptual understanding and just need to brush up on the implementation details and latest trends.

Two other quick pieces of advice. First: after getting practical experience with a type of tool, read its Wikipedia page. Most pages are at most a 15-minute read; they lay out the purpose and strategy behind that type of tool, and usually list the major players in the area. Second: try to keep personal opinion from having too much sway. We all prefer different tools, and every tool can be criticized. Criticism is important, and you should know a tool’s weaknesses, but don’t dismiss the tool outright. It was built for a reason, in a context. And many of its weaknesses relate to some tradeoff the programmers or vendor made, with a good reason behind it.

On Terminology: “Single Source of the Truth”

According to Wikipedia, Single Source of the Truth “refers to the practice of structuring information models and associated schemata, such that every data element is stored exactly once” (emphasis is mine). This would mean, for example, that a customer’s first name is stored in one repository, not in every system that refers to the customer.

First, it’s a concept that is both difficult to implement and subject to various interpretations and implementations. The Wikipedia page does a nice job of mentioning the difficult parts, like dealing with the schemata of vendor products. As for the variety of implementations, you can enforce it dogmatically, so that data is truly stored in only one place. Or you can implement it through policy: each piece of data has a location that is considered the master, changes are published from that master, and other systems holding a copy are responsible for updating periodically from that source. Either way, this is clearly a strategy to choose judiciously.
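To make the policy-based variant concrete, here is a minimal Python sketch, assuming a single in-memory master and in-process subscribers. The class and method names (`CustomerMaster`, `LocalCopy`, `update_first_name`, `refresh`) are hypothetical and not taken from any product. Writes go only to the master, which publishes each change, and each copy can also do a periodic full refresh to catch anything it missed.

```python
from typing import Callable, Dict, List

# Hypothetical illustration of "single source of the truth" enforced by policy:
# one master owns each data element; other systems keep copies that it feeds.


class CustomerMaster:
    """The designated master for customer first names (hypothetical CRM)."""

    def __init__(self) -> None:
        self._first_names: Dict[str, str] = {}
        self._subscribers: List[Callable[[str, str], None]] = []

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        self._subscribers.append(callback)

    def update_first_name(self, customer_id: str, first_name: str) -> None:
        # Writes are only accepted here; every change is published to subscribers.
        self._first_names[customer_id] = first_name
        for notify in self._subscribers:
            notify(customer_id, first_name)

    def snapshot(self) -> Dict[str, str]:
        return dict(self._first_names)


class LocalCopy:
    """A secondary copy held by another system, kept in sync by policy."""

    def __init__(self, master: CustomerMaster) -> None:
        self._master = master
        self.first_names: Dict[str, str] = master.snapshot()
        master.subscribe(self._on_change)

    def _on_change(self, customer_id: str, first_name: str) -> None:
        self.first_names[customer_id] = first_name

    def refresh(self) -> None:
        # Periodic full refresh from the master catches any missed publications.
        self.first_names = self._master.snapshot()


crm = CustomerMaster()
billing = LocalCopy(crm)
crm.update_first_name("c-42", "Dana")
print(billing.first_names["c-42"])  # Dana
```

In the dogmatic variant, `LocalCopy` would not exist at all, and every reader would call the master directly.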

Additionally, choosing this strategy requires careful consideration of its effects on performance, reliability, and caching. If secondary storage is allowed, then stale data and concurrency issues arise. If secondary storage is prohibited, then you now have a single point of failure for many applications. Using the example of a CRM system as the single source of a customer’s first name, imagine the impact of that CRM system going down if other applications are not allowed to store that data.
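The tradeoff can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical `FlakyCrm` that can be switched off to simulate an outage; the function names are invented for the example. Reading only from the master is always current but fails whenever the master does, while reading through a local copy survives the outage at the cost of possibly serving a stale value.

```python
from typing import Dict, Tuple


class CrmUnavailable(Exception):
    """Raised when the (hypothetical) master CRM cannot be reached."""


class FlakyCrm:
    """Hypothetical master store that can be taken down to simulate an outage."""

    def __init__(self, first_names: Dict[str, str]) -> None:
        self.first_names = dict(first_names)
        self.up = True

    def get_first_name(self, customer_id: str) -> str:
        if not self.up:
            raise CrmUnavailable("CRM is down")
        return self.first_names[customer_id]


def read_master_only(crm: FlakyCrm, customer_id: str) -> str:
    # No secondary storage: never stale, but a CRM outage becomes this app's outage.
    return crm.get_first_name(customer_id)


def read_with_local_copy(
    crm: FlakyCrm, local_copy: Dict[str, str], customer_id: str
) -> Tuple[str, bool]:
    # Secondary storage allowed: returns (value, possibly_stale).
    try:
        value = crm.get_first_name(customer_id)
    except CrmUnavailable:
        return local_copy[customer_id], True
    local_copy[customer_id] = value
    return value, False


crm = FlakyCrm({"c-42": "Dana"})
cache: Dict[str, str] = {}
print(read_with_local_copy(crm, cache, "c-42"))  # ('Dana', False)

crm.first_names["c-42"] = "Danielle"  # the master changes...
crm.up = False                        # ...and then goes down
print(read_with_local_copy(crm, cache, "c-42"))  # ('Dana', True): stale but available
try:
    read_master_only(crm, "c-42")
except CrmUnavailable as exc:
    print("master-only read failed:", exc)
```

Neither function is “right”; each one simply picks which of the two failure modes described above an application is willing to live with.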

So why this post? Why all this time and effort to define the term and discuss some of its nuances? In a variety of workplaces, I’ve seen the term catch on as part of the lingua franca between business and IT workers, but used carelessly. And the number one problem is that I’ve seen Type A managers use this term to justify their oversimplified view of information management.

Notice that I said information management, and that I emphasized data in the definition: “refers to the practice of structuring information models and associated schemata, such that every data element is stored exactly once.” Information is data within a context, and that’s the key problem when you get sloppy with the concept of “Single Source of the Truth.”

In one example, a particular manager had a problem with the fact that weather data was being stored in many different systems across the enterprise. Because of his goal of having a single source of the truth, I was part of a team tasked with creating a single consolidated data store and import program for all weather data across the company. Shortly after we looked into the other systems, it was clear that he didn’t grasp the ramifications of the concept.

Weather is a key factor in demand for this client, so it is the basis of historical analysis, contract bidding, and countless other aspects of their business. To our anonymous manager, that meant it was crucial to consolidate this information and have only one source. He was certain that people out there were using inconsistent sources that were causing efficiency problems, among other things.

Let’s start with the different types of weather data. There are forecasts and actuals. There is daily weather and hourly weather (and a daily value means the peak for some uses and the average for others). Finally, it’s worth noting that weather data is often corrected later, when the real-time value was measured incorrectly or some other type of error occurred.

So let’s assume that we’re trying to consolidate hourly actual data. All applications should use this source. And let’s look at a couple of those uses:

  1. A bid for service is based on historical data: the agent writing the bid used that weather data to evaluate the customer’s demand sensitivity to weather, and to evaluate trends in the company’s supply relative to weather.
  2. A report on the effect of weather on supply is regularly delivered to operations managers.
  3. The accuracy of this data, which is supplied by an outside vendor, is regularly audited by Supply Chain.

Now, let’s assume those activities have taken place for the month of April, and it’s the middle of May. The vendor then delivers correction data for the middle of April. For the first purpose (the contract), I want to store the information that was used to write the contract at the time. It’s the only fair way to evaluate the agent, since he wrote the bid based on the best information available.

Because supply and production are naturally affected by weather, for the second purpose (the operations report), I want to rerun those reports with the new, more accurate information.

Even more disruptive is the third purpose: to evaluate the variances in the accuracy of the vendor data, the company must store both the original and the corrected values.
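A minimal Python sketch of what storing both values could look like, assuming a single temperature reading per hour stands in for “the weather” and that the vendor delivers at most one correction per hour. `HourlyActual`, `value_for_bid`, `value_for_operations`, and `mean_vendor_error` are hypothetical names, and the readings are invented for illustration. Each of the three uses picks a different value from the same record, and only the audit needs both.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class HourlyActual:
    """One hour of actual weather for one location (hypothetical schema)."""

    location: str
    hour: datetime
    original_temp_f: float                    # value as first delivered by the vendor
    corrected_temp_f: Optional[float] = None  # filled in if the vendor corrects it


def value_for_bid(r: HourlyActual) -> float:
    # Purpose 1: the bid is judged on what the agent could see at the time.
    return r.original_temp_f


def value_for_operations(r: HourlyActual) -> float:
    # Purpose 2: operations reports are rerun with the best available value.
    return r.corrected_temp_f if r.corrected_temp_f is not None else r.original_temp_f


def mean_vendor_error(records: Iterable[HourlyActual]) -> float:
    # Purpose 3: auditing vendor accuracy is only possible because both values are kept.
    errors = [
        abs(r.corrected_temp_f - r.original_temp_f)
        for r in records
        if r.corrected_temp_f is not None
    ]
    return sum(errors) / len(errors) if errors else 0.0


april = [
    HourlyActual("Cleveland", datetime(2011, 4, 14, 13), 61.0, 58.0),
    HourlyActual("Cleveland", datetime(2011, 4, 14, 14), 60.0, 60.5),
    HourlyActual("Cleveland", datetime(2011, 4, 14, 15), 59.0),
]
print([value_for_bid(r) for r in april])         # [61.0, 60.0, 59.0]
print([value_for_operations(r) for r in april])  # [58.0, 60.5, 59.0]
print(mean_vendor_error(april))                  # 1.75
```

If the corrected value simply overwrote the original, `value_for_bid` and `mean_vendor_error` would both silently return wrong answers.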

This leaves you at a decision point: do you handle this by treating these as different information, or by versioning the information? With the first approach, the bid history is linked to the uncorrected vendor data, and the updates are used to create a separate, corrected data source for the operations purpose.

The alternative is that corrections create a new set of data, but all sets are retained. Sets are distinguished by a version number or timestamp, and all the above problems are solved. While this sounds simple, versioned data grows quickly and is difficult to query and understand.

Thanks to the timestamp, each record can now be called the single source of weather data for a given location, occurrence date and time (when the weather happened), and import date and time (when the value was provided). But for each location and time, there are multiple potential values as corrections are entered. And there is still forecast versus actual data.

So to be precise, I still can’t say “give me Cleveland’s weather for March 7, 2011.” I would have to say “give me the actual weather value for Cleveland on March 7, 2011 that was available when I wrote a bid on April 5th.” Or in the case of an operations manager, they would request “the latest value of actual weather for Cleveland on March 7, 2011.”
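The two requests above map onto an “as of” query over versioned records. Here is a minimal Python sketch, assuming an in-memory list instead of a real database and daily values rather than hourly ones; `WeatherVersion`, `value_as_of`, and `latest_value` are hypothetical names, and the temperatures are invented. Every correction is appended as a new version with its import time, nothing is overwritten, and the query picks the newest version imported on or before the requested moment.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class WeatherVersion:
    """One delivered value; corrections are new rows, never updates (hypothetical schema)."""

    location: str
    weather_date: date     # when the weather occurred
    kind: str              # "actual" or "forecast"
    imported_at: datetime  # when this value was provided to us
    temp_f: float


def value_as_of(
    versions: Iterable[WeatherVersion],
    location: str,
    weather_date: date,
    kind: str,
    as_of: datetime,
) -> Optional[WeatherVersion]:
    # Newest version that had already been imported at `as_of`, or None.
    candidates = [
        v
        for v in versions
        if v.location == location
        and v.weather_date == weather_date
        and v.kind == kind
        and v.imported_at <= as_of
    ]
    return max(candidates, key=lambda v: v.imported_at, default=None)


def latest_value(
    versions: Iterable[WeatherVersion], location: str, weather_date: date, kind: str
) -> Optional[WeatherVersion]:
    # "The latest value": an as-of query with no upper bound on import time.
    return value_as_of(versions, location, weather_date, kind, datetime.max)


history = [
    WeatherVersion("Cleveland", date(2011, 3, 7), "forecast", datetime(2011, 3, 6, 6), 45.0),
    WeatherVersion("Cleveland", date(2011, 3, 7), "actual", datetime(2011, 3, 8, 6), 41.0),
    WeatherVersion("Cleveland", date(2011, 3, 7), "actual", datetime(2011, 5, 16, 9), 38.0),
]

bid_view = value_as_of(history, "Cleveland", date(2011, 3, 7), "actual", datetime(2011, 4, 5))
ops_view = latest_value(history, "Cleveland", date(2011, 3, 7), "actual")
print(bid_view.temp_f, ops_view.temp_f)  # 41.0 38.0
```

Even this tiny version needs a location, a kind, and an as-of time on top of the date before it can return a single value.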

Those are different pieces of data. But I don’t think that’s what our manager had in mind when he requested a single source of the truth for weather data, because he was thinking of weather information. The context, the details, and the reality didn’t fit the mental model he had of weather data.

In the case of this project, we were able to slightly reduce the amount of weather data stored. And we certainly reduced the number of batch jobs involved in fetching that data from external sources. But we also created a performance and reliability bottleneck. That may or may not have been the right decision. My point is that it is worth taking some time to think through and understand the terms you use. Sometimes simple answers are great, but sometimes they are really just a sign of naivety.

On Terminology

I should have written this a long time ago. One of the most universal lessons I’ve learned as a consultant is to be careful with terminology and jargon. Context, industry, and experience all play a part in how people will interpret what you say.

At one client in particular, it got so bad that I regularly threatened to create a swear jar for ambiguous terms instead of profanity. This was inspired by a long argument about control tables, at the end of which we learned that everyone in the room had a different definition of control table.

Currently html 5 is a hot term, and the source of a lot of argument. Many people mean a whole host of technologies when they say html 5, including css 3, etc. So much so that the standards people have moved to using just “html” when referring specifically to html 5.

I often find that developers misunderstand the term n-tier application (or n-tier architecture), and this is part of a larger problem of sloppy use of “layer” and “tier”. Architects are generally more precise: they use tier to indicate a boundary (process or network), while a layer is a component that runs in process. In the .Net world, anything using a remote call (even to a process on the same box), like WCF, SqlServer, MSMQ, etc., would be a change of tiers, while multiple DLLs in the same process are just layers. Often, when discussing an n-tier application, developers will respond as if you are talking about a 3-layer business application with a UI Layer, Business Logic Layer, and Data Layer. What’s confusing about this is that such a 3-layer architecture in the ASP.Net world is actually part of a 3-tier application, where the html, javascript, and css on the client are considered Tier 1, the ASP.Net web/application server is considered Tier 2, and the database server is Tier 3 (even if it is running on the same physical machine, because it is out of process). And the two layers (UI and Data) that developers confuse with tiers are in fact responsible for managing the interaction with the other tiers (web content and database interaction).

When I started writing this post, I asked for suggestions on Twitter. The first response mentioned agile, which is certainly a bombshell of terminology. Agile is a methodology with practices in multiple areas of a project. How many teams do you know that say they are agile, by which they mean they don’t have a process (and don’t document)? A work contact of mine regularly describes a group of cowboy coders we are both familiar with as Agile because their boss came over and yelled requirements at them, and they worked all weekend to implement them. Agile is a reaction to the realities that caused that environment, but I doubt that any real Agile practitioner would describe that as an Agile team. And how many teams have started calling their morning meetings a scrum without any real implementation of Scrum?

What To Do?
In order to manage this better in my own day-to-day work, I make a point of regularly looking up terms that I feel I understand at a high level but lack some certainty about. You don’t need to read entire books or long articles on every topic you come across, but definitions, combined with articles or posts that show how a term is used in context, do help.

And the next time you feel that terminology could be the issue, just ask the person to step back and clarify what they mean by the term. You don’t have to come off as ignorant of the subject matter; just make a simple statement like, “I think the implication and meaning of the term X is at the heart of what we are talking about. Can we take a step back and make sure that we’re all talking about the same thing?”

The Really Hard Cases:
I think examples like html 5 and n-tier, where one bad definition has almost universally taken hold, are the hardest to deal with. Which term do you use? It seems obvious to say that you want to use the correct term, but do you want to spend all of your time explaining that you mean the lesser-used, more accurate definition?

In these cases, I simply try to avoid the terms, or add some kind of qualifier to the statement. For example, let’s say I was recommending training on the new and upcoming Web Standards (html 5, css 3, javascript). Rather than say “I’d like our developers to train in html 5”, I’d say “I’d like our developers to train in the html 5 way of doing sites.” In extreme scenarios, you can stretch that out to explicitly say “html 5, css 3, and javascript.”

I could write a book on the number of times I’ve heard people misuse the term Service Oriented Architecture during my time as a consultant. I’ll sum up by saying this: SOA has nothing to do with Web Services. Web Services are merely one of many transports you could choose to implement SOA. The mere presence of Web Services is no more an indication that you are in an SOA environment than an engine is evidence that you have a car.

Finally, take care when using terms that people are sensitive about. Telling someone who has made a point of learning and practicing Agile that your team was “basically Agile” because you ran around changing direction on a whim is offensive to the time and investment that person made in learning it.

More Examples
Do you feel confident that everyone on your team would define these terms in compatible ways? How sure are you that your definition is right?

  • Agile
  • Unit Test
  • Functional Test
  • Integration Test
  • System Test
  • Specification
  • Requirement
  • Layer
  • Tier
  • Physical Architecture
  • Logical Architecture
  • Instrumentation
  • Testable Code
  • Document
  • Entity
  • Messaging
  • Workflow
  • Automated Workflow
  • System Workflow
  • Document Workflow
  • Service
  • Pattern
  • Framework