Saturday, November 14, 2009

In the future, my laptop is a dumb terminal

I have a computer problem: I think the computer I want doesn't exist.

Or maybe it's that Sun had it right, that The Network Is The Computer, and I want an imaginary network.

I do most of my work on a laptop. It's a fairly recent model (MacBook Pro), with a modern CPU (2.5GHz), a fast hard drive (7200RPM), and what would have been, not too long ago, an incredible amount of memory (4GB).

It just isn't enough, though. I easily exceed what the machine can do with my day-to-day work, often waiting around while it pages, and much longer when I actually have it doing Real Work. On an average day I have the following applications running at the same time:
  • a web browser (Firefox)
  • a WYSIWYG document editor (OpenOffice)
  • 1 or 2 IDEs (Eclipse)
  • 2 Java-based Application Servers (JBoss, Tomcat)
  • a relational database (MySQL)
  • an instant messaging application (Skype)
On a bad day I might also have to fire up a virtual machine (VMware Fusion to run Windows).

There simply isn't a laptop that can keep up with what I want to do. As I see it there are only a few categories of solutions: (a) run less software, (b) distribute the software to multiple machines, or (c) get a better machine.

I'm inclined to consider (c) only because it's an interesting thought experiment in a world where the latest laptops aren't much better. Here's a survey of what I could come up with.

  1. Get a powerful desktop at each location I work from.

    Pros:
    • Fast

    Cons:
    • Software licensing. I don't want to buy software all over again for each machine.
    • Synchronization. There are lots of ways to sync, from network home directories on a server to software like Windows Live Mesh, Dropbox, MobileMe, etc. I'm not sure using any of these is realistic.
    • Multiple powerful desktops are expensive


  2. Run all software on a powerful central server. My laptop, and any other machine, is used simply as a terminal to the server.

    Pros:
    • Synchronization is not an issue, as all work is done on the central server

    Cons:
    • Only works if the first three fallacies of distributed computing actually hold: the network is reliable, latency is zero, and bandwidth is infinite
    • Remote desktop software doesn't usually work all that well. (VNC - slow, limited capabilities; X - sssslllllooooowwww; RDC - not an option for Mac OS X)
    • Server can't access internal network resources without VPN or advanced remote desktop solutions (like Windows' RDC)

  3. A portable disk drive, with eSATA or Firewire 800 interfaces, that is carried around and booted from on powerful desktops at each location.

    Pros:
    • Solves the synchronization issue of option #1

    Cons:
    • All machines must be close -- or identical -- in hardware
    • A single drive is vulnerable (but encryption and backups can solve that)
    • Requires option #1 - fast desktops at each location - and therefore expensive.


  4. A portable disk drive containing a virtual machine. Similar to the above option, but the disk contains a virtual machine image. The VM is then run on a desktop at each location, and the host desktop contains no software other than the virtualization platform.

    Pros:
    • No common hardware requirement as was required by option #3
    • Solves the synchronization issue of option #1

    Cons:
    • Performance hit due to virtualization
    • Single drive is vulnerable (but encryption and backups can solve that)
    • Requires option #1 - fast desktops at each location - and therefore expensive.

  5. A luggable. Since I run Mac OS X, this would probably have to be a Mac Pro taken apart and refitted for a small case.

    Pros:
    • Least expensive of all options

    Cons:
    • Likely heavy and awkward to carry. Keeping some items, like the power supply, external (and duplicated) could help with that.
    • No hope of warranty service.

It doesn't seem like any of these options are very practical. Maybe the real solution is to figure out how to run less software.

Tuesday, June 23, 2009

One-line consistency test

I once had to come up with a quick way of sending commands to a server to see if we could reproduce a sporadic error. Perl/netcat one-liner to the rescue:

perl -e 'open(INP, "<data.txt"); $|=1; while(1) { while(<INP>) { sleep(3); print; }; seek(INP,0,0); }' | nc -i 1 localhost 43210

...where the data.txt file had a list of commands, one per line. This worked because we weren't looking for multi-threaded testing and because the server took pretty simple linefeed-delimited commands. If we needed to do anything more complicated, it probably would have been time to get out JMeter.
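For anyone who would rather not pipe through netcat, here is a rough Python sketch of the same replay loop. It is not what we ran at the time; the function names (`replay_commands`, `replay_file`) and the `max_commands` cap are made up for illustration, and it assumes the server accepts plain linefeed-terminated commands over TCP as described above.

```python
import itertools
import socket
import time


def replay_commands(commands, sock, delay=3.0, max_commands=None):
    """Send commands one per line, cycling through the list repeatedly.

    Loops forever unless max_commands is given (a hypothetical safety cap
    that the one-liner does not have). Returns the number of lines sent.
    """
    sent = 0
    for command in itertools.cycle(commands):
        if max_commands is not None and sent >= max_commands:
            break
        time.sleep(delay)  # same role as sleep(3) in the one-liner
        line = command.rstrip("\r\n") + "\n"  # linefeed-delimited
        sock.sendall(line.encode("utf-8"))
        sent += 1
    return sent


def replay_file(path, host, port, delay=3.0, max_commands=None):
    """Read commands from a file and replay them to host:port."""
    with open(path, encoding="utf-8") as handle:
        # Unlike the one-liner, blank lines are skipped here.
        commands = [line for line in handle if line.strip()]
    with socket.create_connection((host, port)) as sock:
        return replay_commands(commands, sock, delay, max_commands)
```

Calling `replay_file("data.txt", "localhost", 43210)` would approximate the one-liner, minus the extra one-second pacing that `nc -i 1` adds.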

Monday, June 22, 2009

Google Chart Example

About a year and a half ago, Google released their Charts API. I took a look at it then, and here are the notes I took at the time. You can pretend you're reading this in December 2007 if you want.

  1. Data ranges. The range of data that is plottable is limited in that it should be zero based. If you want to graph data that isn't zero-based, you'll have to massage your data first.
  2. The limited resolution of the graphs means more data massaging. For large data sets, you'll have to apply some numerical methods for shrinking the set to something that will fit the resolution. There isn't any fantastic way around this for Google except to allow for larger images or to accept more data and do their own massaging the way a spreadsheet application would.
  3. Request length limits mean that you also have to limit your data set and possibly limit your range so you can use the cheapest (simple) encoding. Supporting POSTs would solve this.
Here's a sample chart of the temperature in a small server room I monitor, as measured by the air inlet temperature in a racked machine. The call to the Google charting component used is http://tinyurl.com/2tlsvu


In case you are wondering, we had some interesting events in that room on and off for a few months. There were AC failures where the breaker would trip and leave only half the AC running, and then there were responses where we'd power down a lot of the equipment. We were also receiving more equipment and were close to a break-even point where adding more heat would cause the temperature to run away. Since we were that close, even leaving a CRT monitor on, the door open for a little too long, or people spending a lot of time in there could impact the measured temperature. All electrical energy in a server room is converted to waste heat, with the possible exception of some escaping electrons and photons (via fiber optics, charge migration through data lines, and possibly ionization of the air).

The source data is temperature, so the scale could be anything (0-100 would make sense), but the interesting range is somewhere between 60-100. To generate this from the source data, the data set is reduced in the x axis by
  • breaking the samples up into buckets, the number of which is the # of raw data points divided by the number of points I can display;
  • taking the average of the bucket;
  • and then using the average as the displayed data point.
The y data is then mapped so that the interesting range fills the chart's 0-100 range, which makes it look more dynamic (it'd be rather flat plotted directly on a 0-100 F scale). The points are then encoded using Google's simple encoding to limit the request length.

For comparison, here is an image generated using Perl's GD::Graph. Since there is essentially no limit on what you can do when you control the whole API, I was free to use a lot more data.



In the chart above you can see a lot more detail, which seems like noise. I played around with different filtering methods to try smoothing out the chart, such as an inflection filter and a moving average. For this writeup I left a lot of the "noise" in so you could see the increased resolution, and also see that the general shape is the same as the Google Chart.
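As a reference point, a trailing moving average of the kind mentioned above might look like this in Python. It is a generic sketch, not the filter actually used for these charts; the function name and the choice to average over a shorter window at the start of the series are assumptions.

```python
def moving_average(samples, window):
    """Smooth samples with a trailing moving average.

    Each output point is the mean of the current sample and up to
    window - 1 samples before it, so the output has the same length
    as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(samples):
        running += value
        if i >= window:
            running -= samples[i - window]  # drop the sample leaving the window
        count = min(i + 1, window)
        smoothed.append(running / count)
    return smoothed
```

A larger `window` gives a smoother line but hides more of the short-term fluctuation.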

One could argue that the limitations imposed by the Google Chart API actually force you to produce a smoother, easier to understand chart. That is probably a good thing in the general case, although in the engineering world overfiltering the data might mask too much meaning. For example, this chart tells me more than just the average temperature over the period of a few days. The very-short-term fluctuations in the graph (the "noise") could be just as informative as the overall trends. It tells me that the temperature isn't stable in small intervals, which could be because of thermal sensor noise, problems capturing or filtering the data, the AC units cycling very frequently, and so on.