Understanding CPUs and the Business of CPUs Better

July 26th, 2011

I’ve been reading Jon Stokes’ Inside the Machine, and it’s a very good read. In particular I was struck by a couple of simple aspects of how CPUs work.

ISA

First, let’s discuss ISAs (instruction set architectures). x86 is a famous one created by Intel. POWER is an ISA created by IBM. PowerPC was created by IBM, Motorola and Apple. ISAs may evolve, but they stay relatively consistent (usually backwards compatible) as new CPU designs that use that ISA are created. For developers, think of the ISA as the API (application programming interface) of the chip, because the implementation behind it can vary drastically. For example, many x86 processors take the complicated instructions x86 allows for and execute them as a series of simpler, RISC-like sub-instructions. As a programmer in any language (including assembly), you only care that the ISA is still the same, not how the work is done.

ISAs are not tied to a single manufacturer. They can be licensed. While Intel comes to mind when you think of x86, AMD produces x86 chips as well. ARM designs are licensed and produced by all kinds of manufacturers, including Qualcomm, Apple and more.

So what happens when a device or platform manufacturer changes processors? Let’s give a couple of examples. In an environment where backwards compatibility is paramount, it’s very hard to change ISAs. Microsoft has yet to do it, although the upcoming Windows 8 will support ARM. They will deal with the issue that Apple dealt with when the Macintosh line switched from PowerPC to x86/x64 chips. Apple had to provide a software compatibility layer, appropriately named Rosetta, which translated the low-level PowerPC instructions to x86/x64 instructions. Apple also made its development tools optionally support “Universal” binaries. These are so-called “fat binaries”: they contain the instructions for both ISAs, and the build of the operating system for each ISA knows how to select the correct portion of the binary for itself. Microsoft appears to be trying the simpler route of not providing translation for legacy applications to run on ARM. Still, its tool-chain going forward will have to provide builds for both ISAs. Presumably, with some foresight, the installer could contain both binaries and install only the correct ones. This would be valuable considering the ARM devices are presumably tablets, where space considerations still matter. Regardless of the path forward, developers need to recompile all code to support the migration, including 3rd party or shared libraries.
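To make the fat binary idea a little more concrete, here is a minimal sketch in javascript. It is only a toy model: the object layout, the slice names and the selectSlice function are all made up for illustration, and real universal binaries are a binary file format handled by the operating system’s loader, not a javascript object.

```javascript
// Hypothetical model of a "fat" binary: one code slice per ISA.
// The names and placeholder strings are invented for this sketch.
const universalApp = {
  name: "SampleApp",
  slices: {
    ppc: "<PowerPC machine code>",
    x86_64: "<x86-64 machine code>",
  },
};

// Returns the slice matching the ISA of the running system,
// or throws if the binary was never built for that ISA.
function selectSlice(binary, hostIsa) {
  if (!Object.prototype.hasOwnProperty.call(binary.slices, hostIsa)) {
    throw new Error(binary.name + " has no slice for " + hostIsa);
  }
  return binary.slices[hostIsa];
}
```

The point of the sketch is the lookup: the same file ships everywhere, and each system picks only the part it can execute. An ISA that was never compiled for (the throw above) is exactly the case where a translation layer like Rosetta is needed.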

What about a more controlled platform, like a game console? For example, the original Sony PSP used a MIPS chip, while the PS Vita uses an ARM chip. In this case, the clear line between product generations makes the transition easier. Any code has to be recompiled, just like before, but that is more of an expected result among software makers for consumer devices. New devices come with new high-level APIs and operating system calls, and that is the real adjustment for a programmer making software on such devices. If Sony does choose to support downloadable PSP games on the Vita, it will be on them to provide the compatibility layer.

Microarchitecture and Processor Lines

Now that we understand that the ISA doesn’t dictate implementation, it’s worth explaining that the actual implementation is called a microarchitecture. Changes in microarchitecture do not change the ISA. As a counter-example, when the x86 ISA got MMX extensions, those resulted in new instructions. That is not a microarchitecture change, but an ISA change. The chip can execute the new instructions any way it sees fit; MMX support just means it handles those instructions. An example of a microarchitecture change is when Intel’s microarchitecture started using out-of-order execution of instructions to make better use of the processor’s execution resources (and reduce bubbles, but that’s a longer topic for another time).

Microarchitecture changes can result in real performance differences. Various clever tricks like pipelining, branch prediction and more can drastically improve the throughput of a processor without affecting its clock speed. When one chip vendor seems to be in the lead in benchmarks, but the processor numbers (like clock speed and cache size) are the same, it’s usually a sign that said vendor has a better microarchitecture at the moment.
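As a rough illustration of why pipelining helps at the same clock speed, here is a small back-of-the-envelope sketch in javascript. It assumes an idealized pipeline where every stage takes one cycle and there are no stalls or mispredicted branches, which real processors never quite achieve. The function names are mine, not from any library.

```javascript
// Idealized model: each instruction passes through `stages` one-cycle stages.

// Without pipelining, an instruction must finish before the next one starts.
function cyclesUnpipelined(instructions, stages) {
  return instructions * stages;
}

// With pipelining, the first instruction takes `stages` cycles to fill the
// pipeline, and every later instruction completes one cycle after the last.
function cyclesPipelined(instructions, stages) {
  return stages + (instructions - 1);
}

console.log(cyclesUnpipelined(1000, 5)); // 5000
console.log(cyclesPipelined(1000, 5));   // 1004
```

Under those assumptions, 1000 instructions on a 5-stage design drop from 5000 cycles to 1004, close to a 5x throughput gain with no change in clock speed. Bubbles and stalls eat into that in practice, which is where tricks like branch prediction come in.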

With that in mind, it is much easier to decode the processor lines than it would first seem. Product names change a lot, but the microarchitectures stick around for a while. If you look at that info, you’ll find that the product lines that use the same microarchitecture differ by cost, cache size(s), clock speed, power consumption, transistor density, etc. It helps to look through a list like this of microarchitectures released by a company. Just be aware that some of the codenames are really just smaller versions of earlier microarchitectures. You’ll see a power/heat change in that case, but it’s largely just a manufacturing change.

Summary

So what does all this mean? Hopefully, when you see benchmarks, or discussions about major platform or tooling changes based on chip changes, it will make a bit more sense. And processor shopping should be a little easier if you understand that once you zero in on a microarchitecture that you prefer, you can slide up and down the cost scale a bit based on clock speed, cache size, etc. Certainly this basic understanding has emphasized to me that clock speed isn’t everything. One need only see the benchmarks of two different microarchitectures to see how big the differences can be. For example, see this comparison of an Intel Core i7 (Nehalem microarchitecture) and AMD Phenom II (K10 microarchitecture). You can see real differences in there.

And finally, as you read about various hardware configurations, you should begin to recognize where certain ISAs fit as the best tool for the job. The pure efficiency and power of IBM’s Power ISA is the reason it still has such a stronghold in super-computing and other big-iron applications, while ARM’s low-power efficiency and flexibility make it the clear leader in portable devices. Take the iPad: Apple’s A4 and A5 chips may sound like a new invention, but each is just a new implementation of the ARM ISA with an on-chip GPU. Finally, x86’s desktop software library and price / performance balance have kept it king of the desktop computer for a long time running.

Interesting speculation to think about: With Windows 8 and OS X Lion both heading in a tablet friendly direction, and Windows 8 and Apple tablets running ARM, you have to wonder if Apple and Microsoft won’t move away from x86/x64 in order to simplify their developer tools by getting rid of the need to compile to two different ISAs.

Popup Overlay Auto-Applied to an Anchor with jqModal

July 25th, 2011

Working on a relatively straightforward page, I wanted to have some of the simpler static content come up as an overlay style of popup. Following web standards, it would be nice to have this code automatically work based on a CSS class. This would provide two benefits: it is easy to add more such links without writing new code, and the links still work without javascript.

I worked through a solution using a variety of samples. The project uses html5, jquery, and the jqModal plugin.

The footer code where I have the link I want to change gets the “popup” css class added to it.

          <a href="terms.html" class="popup">Terms</a>

terms.html itself is a simple static html page with a header and some paragraphs, so I won’t show it here. I’ll only mention that it can still have html and body tags of its own.

In css, I have some simple styling in place for jqModal to use.

.jqmWindow {
    display: none;

    position: fixed;
    top: 17%;
    left: 50%;

    margin-left: -300px;
    width: 600px;
    height: 600px;
    overflow:auto;

    background-color: #EEE;
    color: #333;
    border: 1px solid black;
    padding: 12px;
}

.jqmOverlay { background-color: #000; }

Finally, here’s the javascript that sets it all up. If you were going to use this on multiple pages, you might want to move it into its own file.

      <script type="text/javascript"
            src="http://ajax.aspnetcdn.com/ajax/jQuery/jquery-1.6.2.min.js">
      </script>
      <script type="text/javascript" src="jqModal.js"></script>

      <script type="text/javascript">
        var dialogs = {}; // cache for dialogs keyed by url

        $(document).ready(function() {
          $('a.popup').click(function(event){
            event.preventDefault(); // don't navigate to the linked page
            var url = $(this).attr('href');
            if(!(url in dialogs)) {
              // first click for this url: build the dialog div once
              dialogs[url] =
                $("<div class='jqmWindow'>Loading content</div>");
              dialogs[url].jqm({ajax:url}); // content comes from the url via ajax
              $('body').append(dialogs[url]);
            }
            dialogs[url].jqmShow();
          });
        });
      </script>

First, you’ll notice the code references jquery via the Microsoft CDN. It also pulls in a local copy of jqModal.

Then, I declare a page level variable. “Global” variables are not something to use often, but as a page level cache, this is actually an appropriate use. If this were off in its own file, you may want to use namespacing tricks in javascript to avoid name collisions (outside the scope of this example).

Next, in jquery’s document ready function, we add a click handler for any anchor (link) with the class popup. It stops the default behavior (which for a link is to navigate to the href with a get request). Then, the function tests to see if it has already set up a dialog div for that url. If not, it creates the html for the div with the proper jqModal class, sets it up to load the url from the associated anchor via ajax, and adds it to the body.

The final step, after either creating or fetching the dialog div, is to call its jqmShow() method.
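If you do move this into its own file, one way to avoid the page level variable is to hide the cache inside a closure. The following is only a sketch of that idea in plain javascript with no jquery in it: makeDialogCache and createDialog are hypothetical names, and the object returned by createDialog stands in for the jqModal div built in the real click handler.

```javascript
// Wraps the "create once, reuse afterwards" logic around any factory function.
// The cache lives in the closure, so nothing leaks into the global scope.
function makeDialogCache(createDialog) {
  const dialogs = new Map(); // cache for dialogs keyed by url
  return function getDialog(url) {
    if (!dialogs.has(url)) {
      dialogs.set(url, createDialog(url));
    }
    return dialogs.get(url);
  };
}

// Stand-in factory: counts how often a dialog is actually built.
let created = 0;
const getDialog = makeDialogCache(function (url) {
  created += 1;
  return { url: url, content: "Loading content" };
});
```

Calling getDialog("terms.html") twice returns the same object and only runs the factory once, which is the same behavior the click handler gets from its url check.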

Rake Tasks For NuGet

July 7th, 2011

If you use NuGet, and only check in your packages.config files to source control, then your source control repository will stay smaller, and check out faster. Checking in binaries is usually a nice thing to avoid. However, new developers need to be able to get those libraries locally with little effort, and your build server (continuous integration or otherwise) needs to keep its libraries up to date.

After all, that’s one of the advantages of NuGet: you only download a library when you first need it for that solution, or when you change versions.

In order to solve this problem, I use a rake task, as we use rake for other setup tasks (creating or seeding databases, continuous integration, etc). Rake may sound like an odd choice for a .Net environment, but it’s very good for customizing command-line tasks, and all the .Net building and setup we have run into can be done from the command line. Anyway, assuming the directory structure below, I thought I’d share the Rake tasks…

MySampleProject
|-Rakefile
|-tools
 |-NuGet.exe
|-Source
 |-Packages
  |-NHibernate
  |-MassTransit
 |-SampleProject.Web
  |-packages.config
  |-SampleProject.Web.csproj
  |-other files, etc
 |-SampleProject.ServiceBus
  |-packages.config
  |-SampleProject.ServiceBus.csproj
  |-other files, etc

The relevant tasks in the Rakefile are below. Note that the ci task runs the nuget tasks before it builds.

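# Installs the packages listed in a project's packages.config
# into the shared Source\Packages directory.
def nuget_for_project(project_dir)
  sh "tools\\NuGet.exe " +
    "i Source\\#{project_dir}\\packages.config " +
    "-o Source\\Packages"
end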

namespace :nuget do
  desc "nuget for servicebus"
  task "ServiceBus" do
    nuget_for_project "SampleProject.ServiceBus"
  end    

  desc "nuget for web"
  task "Web" do
    nuget_for_project "SampleProject.Web"
  end  

  desc "nuget for all"
  task "all" => ["nuget:ServiceBus", "nuget:Web"]
end

desc "continuous integration task"
task "ci" => ["clean", "nuget:all", "build", "test"]

NHibernate Named SQL Queries with Parameters

June 23rd, 2011

I had to create a stored procedure to be called from NHibernate. You could use Session.Connection to execute with ADO.Net, but I like the idea of staying in NHibernate for consistency. Anyway, I found a lot of documentation on how to call one, but not with a parameter, so I thought I’d document that here.

This will be a simple and contrived example, that in no way justifies not just using NH linq, Criteria, or HQL to query. But let’s say you have a book table, and you want to query by author and for some reason you need to do this in a stored proc, because there is some aspect of the code or optimization that you only can do in the db.

Create your stored proc:

Use [MyDatabase]
Go

if OBJECT_ID('[dbo].sp_BooksByAuthor') is not null
begin
	drop proc [dbo].sp_BooksByAuthor
end
go

create proc [dbo].sp_BooksByAuthor
   @author_id bigint
as
begin
Set NOCOUNT on
Select b.*
From Books b
Where AuthorId = @author_id
end
Go

*Note the alias for books is optional, but if you do it, you need to specify it in the query mapping (see below).

Then map the query. Somewhere in one of your mapping files, but outside of a class put:

<sql-query name="MyBookByAuthorQuery">
  <return class="Book" alias="b" />
  exec sp_BooksByAuthor :AuthorId
</sql-query>

Finally, your NHibernate query would look as follows:

public IList<Book> GetBooksByAuthor(Author author)
{
   var session = SessionFactory.GetCurrentSession();
   var qry = session.GetNamedQuery("MyBookByAuthorQuery");
   qry.SetParameter("AuthorId", author.Id);
   return qry.List<Book>();
}

Your data access may vary a lot from a simple method like that, but you get the idea.

One thing worth noting, if you get an error about clazz_, it’s related to polymorphism. Go read this post for how to write your sql to account for it: http://www.methodicmadness.com/2009/01/nhibernate-what-is-heck-clazz.html

Source Control Considerations for ConnectionStrings in .Net

June 23rd, 2011

For .Net projects that have multiple developers, configuration differences can be a real problem. For this post, we’ll use the example of connection strings, but it could just as easily be a directory location, or some other difference. Let’s say that we have two developers, one is using SQL Server, while the other is using SQL Server Express. A minor difference in configuration. The first developer needs to list a DataSource of “.” or “localhost”, while the second developer’s DataSource is something like “.\SqlExpress” or “localhost\SqlExpress”.

Note, for the purposes of this discussion, I’ll refer to some git terminology, as it’s my preferred source control tool, but this could just as easily apply to TFS, subversion, mercurial, etc.

Some developers will put the most common version in their web.config, and the developers in the minority have to change their file without checking in, and regularly undo and redo that process in order to get latest if there are other web.config (or app.config) changes.

To avoid this, some teams don’t check in the web.config file. They check in something like web.config.sample which has the default settings, and you make a local copy called web.config that is not registered with source control. You can make any appropriate changes without causing conflicts. Anytime you get latest and see a web.config change, you’ll want to use your favorite merge tool to pull any necessary changes into your web.config. Less work than the previous version, but more manual merging.

I like to recommend a 3rd method that uses a lesser known piece of web.config capability. Most config sections allow a configSource attribute to specify an external config file.

So in the case of the connection string differences, the web.config would get checked into source control, and its connection string section would look like the following:

<connectionStrings configSource="connection.config"/>

And then the connection.config file is treated like the second method above. There is a connection.config.sample file in source control that shows all the connection strings you need to have, with sample connection info, as follows:

<?xml version="1.0" encoding="utf-8" ?>
<connectionStrings>
    <add name="SomeDB"
         connectionString="Data Source=.;Initial Catalog=SomeDB;Integrated Security=True;"
         providerName="System.Data.SqlClient" />
</connectionStrings>

Additionally, we have a rake task (could be msbuild, maven, .bat, etc) that helps you initialize the project upon first checkout. One of the things it does is copy connection.config.sample to connection.config. Our .gitignore file on the project tells git to ignore connection.config. Now you only need to change your DataSource and you are ready to go. The web.config can change without the need to change your connection.config file. The only time that file should change is when you add, remove, or rename a connection string.
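Our actual task is in rake, but the copy step is simple enough to sketch in any scripting tool. Here is a rough equivalent as a Node script, purely as an illustration: the initConnectionConfig function name and the idea of passing in the project directory are my assumptions, not part of our Rakefile.

```javascript
const fs = require("fs");
const path = require("path");

// Copies connection.config.sample to connection.config inside projectDir,
// unless a local connection.config already exists.
// Returns true if the file was copied, false if it was left alone.
function initConnectionConfig(projectDir) {
  const sample = path.join(projectDir, "connection.config.sample");
  const local = path.join(projectDir, "connection.config");
  if (fs.existsSync(local)) {
    return false; // keep the developer's existing settings
  }
  fs.copyFileSync(sample, local);
  return true;
}
```

The existence check matters: it makes the setup task safe to run again later without overwriting the DataSource a developer has already customized.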

This same idea can be used for Elmah config, SMTP config, etc. For example, in development, we have email go to file, and different developers use different directories to store the mail. This scheme handles it.

Lastly, this same scheme can largely handle the differences we have with staging and production environments. Since the differences in web.config from the development branch to the stage and production branch are minimal, merging is pretty painless. And those external config files are a one-time setup on the server.

Consider this pattern for your projects, and let me know what you think. Either in the comments below or at @thoolihan. And if you find any room for improvement or have other feedback, please pass it along.

Fixing Canon MP Navigator with Windows 7 x64

June 18th, 2011

I installed a scanner / printer directly via usb on a windows machine at my new home. At my previous home, it was hooked to a time capsule and I would plug it into my mac to scan. Canon has some software “MP Navigator” that does the scanning and handily puts the scans into documents. I was able to use it to print via any program, and scan via windows fax & scan, but the MP Navigator for Windows 7 wouldn’t work.

Why does it matter? Windows fax and scan is nice for sending faxes via a modem, but its scanning is very limited. MP Navigator, for instance, will allow you to scan multi-page documents in and save them as a single pdf file.


After googling around a bit, it appears many other users were having this problem on Windows 7, and x64 in particular. I went to the good old compatibility tab of the icon, and voila, things work. This works 9 times out of 10 when a program doesn’t play well with Windows 7.

The settings are as follows:
[Image: compatibility tab settings]

On Terminology: “Single Source of the Truth”

May 24th, 2011

According to Wikipedia, Single Source of the Truth “refers to the practice of structuring information models and associated schemata, such that every data element is stored exactly once” (emphasis is mine). This would mean, for example, that a customer’s first name is stored in one repository, not in every system that refers to the customer.

First, it’s a concept that is both difficult and subject to various interpretations and implementations. The Wikipedia page does a nice job of mentioning the difficult parts, like dealing with the schemata of vendor products, etc. As for the variety of implementations, you can enforce this in a dogmatic way where data is truly only stored in one place. Or you can implement it with policy, having a location for each piece of data that is considered the master, with other stores responsible for publishing changes to that source and updating periodically from it. Either way, it is clear that this is a strategy to choose judiciously.

Additionally, choosing this strategy requires strong consideration of the effects on performance, reliability, and caching. If secondary storage is allowed, then stale data and concurrency issues arise. If secondary storage is prohibited, then you now have a single point of failure for many applications. Using the example of a CRM system being the single source of a customer’s first name, imagine the impact of that CRM system being down if other applications are not allowed to store that data.

So why this post? Why all this time and effort to define the term and discuss some of its nuances? In a variety of work places, I’ve seen this catch on as part of the lingua franca between business and IT workers, but used carelessly. And the number one problem is that I’ve seen Type A managers use this term to justify their oversimplified view of information management.

Notice the emphasis on information, and that I emphasized data in the definition “refers to the practice of structuring information models and associated schemata, such that every data element is stored exactly once.” Information is data within a context, and that’s the key problem when you get sloppy with the concept of “Single Source of the Truth.”

In one example, a particular manager had a problem with the fact that weather data was being stored in many different systems across the enterprise. I was part of a team tasked with creating a single consolidated data store and import program for all weather data across the company, because of his goal of having a single source of the truth. Shortly after looking into the other systems, it was clear that he didn’t grasp the ramifications of the concept.

Weather is a key factor in the demand for this customer, and so it is the basis of historical analysis, contract bidding, and countless other aspects of their business. To our anonymous manager, that meant it was crucial to consolidate this information and have only one source. He was certain that people were out there using inconsistent sources that were causing efficiency problems, among other things.

Let’s start with the different types of weather data. There are forecasts and actuals. There is daily weather and hourly (and daily sources are peak for some uses, average for others). Finally, it’s worth noting that weather data is often corrected later, when the real-time value provided was measured incorrectly, or some other type of error occurred.

So let’s assume that we’re trying to consolidate hourly actual data. All applications should use this source. And let’s look at a couple of those uses:

  1. A bid for service is based on historical data, where the agent writing the bid used that weather data to evaluate the customer’s demand sensitivity to weather, and to evaluate the company’s supply as a function of weather.
  2. A report on the effect of weather on supply is regularly supplied to operations managers.
  3. The accuracy of this data supplied by an outside vendor is to be regularly audited by Supply Chain.

Now, let’s assume those activities have taken place for the month of April, and it’s the middle of May. Now the vendor comes in with correction data for the middle of April. For the first purpose (the contract), I want to store what information was used to write the contract at the time. It’s the only fair way to evaluate the agent, as he wrote the bid based on the best available information.

Because supply and production is naturally affected by weather, for the second purpose (operations evaluation), I want to rerun those reports based on new, more accurate information.

Even more disruptive is the fact that in order to evaluate the variances in the accuracy of the vendor data, the company should be storing both values.

This leaves you at a decision point: do you handle this by declaring these as different information, or do you version the information? In the first case, the bid history is linked to uncorrected vendor data, and the updates are used to create a corrected data source that can be used for the operations purpose.

The alternative is that corrections cause the creation of a new set, but all sets are retained. Differentiation is handled with a version number or timestamp, and all the above problems are solved. While this sounds simple, versioned data grows quickly, and is difficult to query and understand.

Due to the timestamp, each record can now be referred to as the single source of weather data, for that location, occurrence date and time (date of the weather), as provided on said date and time (import time). But for each location and time, there are multiple potential values as corrections are entered. And there is forecast vs actual data.

So to be precise, I still can’t say “give me Cleveland’s weather for March 7, 2011.” I would have to say “give me the actual weather value for Cleveland on March 7, 2011 that was available when I wrote a bid on April 5th.” Or in the case of an operations manager, they would request “the latest value of actual weather for Cleveland on March 7, 2011.”
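Those two requests can be sketched as a single “as of” lookup over versioned records. This is a minimal illustration in javascript, not the system we built: the record shape, the valueAsOf function, and the temperature values and import dates are all invented for the example.

```javascript
// Made-up versioned records: every correction is kept as a new version,
// distinguished by the date it was imported (ISO dates compare as strings).
const readings = [
  { location: "Cleveland", date: "2011-03-07", kind: "actual", value: 41, importedAt: "2011-03-08" },
  { location: "Cleveland", date: "2011-03-07", kind: "actual", value: 38, importedAt: "2011-04-20" },
];

// Returns the value of the newest version imported on or before asOf,
// or null if nothing had been imported yet at that point.
function valueAsOf(records, location, date, kind, asOf) {
  let best = null;
  for (const r of records) {
    const matches = r.location === location && r.date === date && r.kind === kind;
    if (matches && r.importedAt <= asOf && (best === null || r.importedAt > best.importedAt)) {
      best = r;
    }
  }
  return best === null ? null : best.value;
}

// What the agent saw when writing a bid on April 5th:
console.log(valueAsOf(readings, "Cleveland", "2011-03-07", "actual", "2011-04-05")); // 41
// What an operations manager sees after the correction arrived:
console.log(valueAsOf(readings, "Cleveland", "2011-03-07", "actual", "2011-05-15")); // 38
```

Both answers come from the same store, but neither is “the” weather for Cleveland on March 7. The asOf argument is the context that turns the stored data into the information each person actually needs.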

Those are different pieces of data. But I don’t think that’s what our manager had in mind when he requested a single source of the truth for weather data. Because he meant weather information. Context / details / reality didn’t fit the mental model he had of weather data.

In the case of this project, we were able to slightly reduce the amount of weather data stored. And we certainly reduced the number of batch jobs involved in fetching that data from external sources. But we also created a performance and reliability bottleneck. That may or may not have been the right decision. My point is that it is worth taking some time to think through and understand the terms you are using. Sometimes simple answers are great, but sometimes they are really just a sign of naivety.

Interesting Assignment Related to Rich Internet Controls

May 19th, 2011

One of my recent projects was to work with a client to create a Silverlight control for reporting purposes that has some general flashiness, and is reusable across several sections of the site. It has brushed into some interesting areas. I have used Silverlight for some media and imaging applications before, but this was the first time that the control had to integrate with the authorized user’s profile and other security related server side settings.

In particular, the first thing to keep in mind is that as long as the Silverlight Enabled WCF endpoint is within the web application on IIS, the service code has access to the user’s profile info. It’s just like making a jquery request from a page: there is no need for the Silverlight control to re-authenticate.

The really interesting part is where the project goes from here. The client has asked me to create an equivalent control using HTML 5 technologies. There will be some lunch and learn sessions on how both of these projects were done, and comparing and contrasting both the user experience and developer experience.

I’m excited about doing this type of work for a variety of reasons. First, the training aspect really helps our company differentiate from other consulting companies, especially from rent-a-coder staff augmentation type shops. And I enjoy the challenges and rewards that come with helping other developers work with tools that are new to them. Finally, the very question we’re addressing (plugin vs standards based RIA controls) is on the forefront of a lot of minds, given the current state of browser and mobile applications.

Anyway, this is just a general update of what I’m up to, but I hope this work will spin off some presentations I can do locally, and some blog posts.

Silverlight Application Code 2103

April 20th, 2011

When working with Silverlight, if you get an application code error 2103 and go looking for answers, you’ll find posts like this that suggest it’s a namespace problem. If that’s not the problem, you’ll find yourself stuck without many other suggestions.

Here’s a simple check for another common problem, permissions to download the Silverlight xap. Type into your browser http://localhost:8080/ClientBin/MySLControl.xap (based on your dev server and Silverlight control name) and see if you can save the file. FYI – It will show up as a zip.

If this is an asp.net site and you do have permissions issues, just add the following to your web.config:

    <location path="ClientBin">
        <system.web>
            <authorization>
                <allow users="?"/>
            </authorization>
        </system.web>
    </location>

Agile Muffins: A Simple Justification of Iterative Methodologies

March 28th, 2011

I find myself more and more involved in the sales process with my company. And while our primary focus is on expertise around technology and information, the team I sell for most does work using Agile methodologies. It’s not the primary aspect of our sale though, as people want to hear about our proposed solution, our work history etc. Rarely do they want a long explanation of how we do our work. But we do briefly explain it. So what is my Agile elevator speech?

I stumbled into this explanation on a particular sales call. I wanted to explain / justify an iterative approach, as opposed to a phased (big design up front) approach to a firm that had a background in mechanical engineering. After all, big design up front comes out of the kinds of engineering found in manufacturing and construction.

When building a bridge, the bridge is the end product. It is tangible and defined. Its state may change, and it requires maintenance and monitoring, but it is a big noun. Creating software is more like making a muffin recipe (say for a book or something). In other words, it’s verbs. The muffin isn’t the product, the recipe is. After all, software is instructions for a computer, like a recipe is instructions for a baker.

In either case, you want to invest your time and money in the product. So with a bridge, you want all the data and research you can get, in order to have it right before building anything that commits you to all or part of a design. The cost of change is high. And the process is just a necessary evil. There is no value in the process, the value is in the bridge. And before finishing, you will test and refine the bridge, not the construction process.

With a recipe, you are investing in the process. The cost of change (throwing out the muffins) is low. And you are testing the process, not the muffins. Yes, you test by eating the muffins, but they are a proxy for evaluating the process. After all, if the muffins are bad, you don’t try to fix that batch of muffins, you rework the process and make a new batch.

Your mileage may vary, but for me, this analogy helps me keep the core values of agile in mind. I think Sports Management is another good one. Imagine planning everything for a season before it starts, and then refusing to adapt during the season.
