Tech articles

Saturday, February 23, 2013

Git and Mac OS 'brew' case study.

Summary

The Git Distributed Version Control System (DVCS) is one of my current favorite computer science models.  And I am ecstatic to see it starting to find its way into everyday tools I use.

This morning I was reminded that the Mac OS 'brew' third-party open-source (Linux wanna-be) package manager needs only a 'git pull' command to update its local database.  That brought back all the complexities of configuration ownership, and in an instant my mind was free of frustration (albeit only briefly).

What is Git?


My summary of Git is that it solves 3 problems.
  1. How do I represent an arbitrary file in an arbitrary state? (e.g. maintain changes)
  2. How do I share these files and associated changes with someone else?
  3. How do I integrate files and changes from someone else into one or more files?
Now, you might think.. Oh but X,Y,Z already does this... Perhaps.. And perhaps very well (the Adobe Photoshop suite has an excellent integrated collaboration system).  Microsoft SharePoint integrates office documents pretty decently.  Subversion + apache is an excellent centralized poor-man's publication system. Then of course there are formal publication tools.

But all the above are special-case, proprietary, and require central control (and thus present bureaucratic delays in mental work-flows).  In order to 'save' a file, I have to publish the file... In order to save the file, I have to get a sys-admin to create a repository.  In order to get a file, I have to 'ask' my peer to publish their file.  In order to reconcile, there can be only 1 canonical representation.. If a casual user were to go looking for my file, they'd probably pick the canonical one (even though multiple tags, branches, versions exist), and thus everybody has to agree on what that is (which means you are FORCED to use a bureaucracy)...

Now don't get me wrong, bureaucracy is a good thing.. But it's not applicable to every situation.  And, as I mentioned, it does interrupt the creative process, imposing possibly weeks of delay.. (Company A and Company B spend 2 weeks agreeing on a file exchange protocol like ftp, Kerberos + subversion, etc).  Or, what is the absolute worst - to avoid the bureaucracy - they just email each other files.  I hope I don't have to explain the horrors of this process and its likely outcome over time.

How is Git Different?

  • Git uses cryptographic SHA-1 signatures of 'Objects' (really just files, but some file formats are special, like directories, tags, commits).
    • This means all objects can be independently verified by comparing to their signatures
    • All differing objects are universally unique (a vanishingly low probability that two objects will EVER share the same signature, even with billions of objects in a working-set)
  • Git uses the SHA-1 as the object-ID (see the sketch after this list)
    • All IDs are the same size 
      • allows efficient database indexing
    • IDs are relatively small (20 bytes binary / 40 bytes hex)
      • allows efficient storage / network transmission
    • IDs from different locations can safely be merged together (thanks to global uniqueness)
  • Objects can stand alone
    • You can inject any object into, or delete any object from, any Git system
    • You can create, split, merge, archive, delete objects independently from git repositories
    • They are unique eternal representations of an object (file)
    • They are independent of their storage format (raw, prefixed, gzip'd, xdelta'd, future)
  • Objects themselves are NOT versioned (versioning is externalized)
    • versioning is external to a file (unlike RCS, subversion, office-documents, etc)
    • Allows SHA-1 to never need changing
    • Allows alternate version-control systems to be applied to the same logical change-sets
    • Allows multiple parallel histories of the same file (below)
  • Non-trivial files have a canonical representation
    • Directories, tags, and change-sets are all sorted with a well-formed structure
    • This allows two independent authors to produce identical object-SHAs for coincidentally identical content bundles.
  • Efficient work-space representation
    • Entire history is stored locally for rapid analysis and version-mutation
    • Entire history is generally gzip + xdelta'd and naturally de-dupped (due to SHA-IDs)
    • Fully fault-tolerant (due to SHA-IDs, and infinite number of work-space copies)
    • A single checked-out work-space representation
      • Rapid local delta-ing when switching between points in the version history (switching to a tag, branch, or arbitrary change-set point)
    • 'hard link' based cloned work-spaces minimize (though do not eliminate) overhead
      • Works on most modern file-systems, ntfs, ufs, ext, hfs, xfs, zfs, btrfs, etc
    • network-copies are rsync deltas of object-bundles.
  • Minimal "central" server load (a central server is not even required)
    • Due to laptop proliferation (transient up-time), central 'push' servers are valuable for collaboration.
    • The central server is nothing but a dumb file-system with whatever convenient copy-in, copy-out protocols are available (DAV, 'git', rsync, SMB, NFS, ssh+scp, etc).
      • Since you're just transferring 'compacted' object bundles and an index of ID to bundle-offset, virtually anything can work; you just want to avoid concurrent over-writing of some index files
  • Change-set Parallel history universes (below)
  • Promotes Social collaboration
    • Does not enforce a methodology on others
    • Is "pull" centric instead of "push" centric.
      • You look at everyone else's changes and choose what you want from them
        • Fetching is getting remote changes without integration (merging)
      • You only ever push to YOUR repository; separate from everyone else
    • Encourages 'asking' people to publish their incomplete changes
      • Empowers content authors
      • Emboldens communication by content browsers / contributors
    • Disagreements can be resolved via publication namespaces (URLs)
      • Owner of project owns canonical namespace for publication
      • Others can use different [temporary] name-spaces
      • Until resolved, participants can choose to use whichever namespace is most appropriate for continued productivity
      • Volatile / politically dangerous changes can be hidden in private namespaces until/unless a time is appropriate.
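
To make the object-ID idea concrete, here is a minimal sketch (Python, standard library only; 'blob' is the simplest object type) of how Git derives a blob's ID: the SHA-1 of a small header ("blob <size>\0") followed by the raw contents.  The file name plays no part, which is why identical content always yields an identical ID on any machine.

import hashlib
import zlib

def git_blob_id(content: bytes) -> str:
    """Compute the object-ID Git assigns to 'content' stored as a blob."""
    store = b"blob " + str(len(content)).encode() + b"\x00" + content
    return hashlib.sha1(store).hexdigest()

def loose_blob_bytes(content: bytes) -> bytes:
    """What the same blob looks like on disk as a loose object: header+content, zlib-compressed."""
    store = b"blob " + str(len(content)).encode() + b"\x00" + content
    return zlib.compress(store)

# Two authors independently writing the same shopping list get the same ID:
print(git_blob_id(b"eggs\nmilk\n"))

You can check the printed ID against 'git hash-object <file>' for the same content.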

What is a file?

What is a file?  If I rename it, is it the same file?  What if I wanted to swap two file names?  What if I deleted a file, then later created a new file that happened to share the same name?  What if a file's name and contents stayed similar, but a subtle change in the contents means a human obviously would not consider it the same file anymore (e.g. president-accommodations.doc)?

Systems like subversion, sharepoint, etc, know what a file is... The file-name... If you wanted to rename the file, you'd use 'mv' type commands, and it's subversion's job to keep track of that.  If you wanted to replace a file, you'd delete it, then create a new one.  Easy right?.....

But how do I communicate those 'verbs' across a federated universe of human actors?  If the bureaucracy breaks down and lines of communication are lost.. If we had to restore our VCS from backup and annotate it back to life..   Was the history of those VERBs maintained?  If Joe 'moves' shopping-list.txt to 'joe-shopping-list.txt' and Bob overwrites it with his own needs, what happens?

The issue is that while you CAN get clever, you're going to go wrong eventually.. And with federation in particular, that wrong is going to be the norm.   Consider the following:

Husband: Honey, I need eggs.
Wife: Ok, I'll make a note.  *edits android app to include eggs in shopping list; publishes in cloud*
Husband: {thinks} hmm, she's going to forget, let me edit my shopping list to add it..  *publishes to cloud*
Wife: Honey, while I'm shopping can you publish your list of things you needed?  I already have eggs.
Husband: *adds unrelated shopping list fragments to the file and publishes*
Wife: *updates local shopping list*


Now what's the expectation when the wife reconciles?  There are LOTS of changes, but more importantly, the same change was made twice: 'eggs'.  What is the logical human expectation?

Well, one could argue that they should be alerted to the fact that there is a conflict.  One could argue that any modification to the list that needs reconciliation should alert the user.  But in this case, there is no conflict... Someone didn't say "I need eggs" and someone else said "Eggs are bought"... They're both saying "I need eggs".

The point is not that git happens to solve this particular problem better than any of the other SCMs (except perhaps real-time systems like google-docs; which are not always practical, and solve a different problem entirely), but that there are potentially DIFFERENT ways that this reconciliation can be applied.

Git's strength is that it doesn't enforce versioning patterns on the user.  It provides a default which can sometimes be confusing.  But not only can you reconcile in different ways, for different situations, but, YOU can CHANGE the history of a file. An administrator is not needed.  And if someone doesn't like that history, they can apply an alternate history in a different namespace (e.g. repository).

The KEY is that the ONLY thing that mattered was the contents of the file.  If, at the end of the day, the wife is able to see what she needs to buy, then how it got there (its history and merge process) is irrelevant.

This isn't always a true statement.  Government audit records are not histories to be trifled with.  A single canonical representation is critical.  Even customer-presented feature-change-lists should be immutable - they want to see version 1.2.3 followed by 1.3.5 followed by 2.0.9.  If that history got rearranged in the change-list of the next release, they'd be unhappy... And that's what canonical lists are for.  Lists that are intended to never have their histories changed, and where bureaucracy is mandatory for submission of changes.

BUT, this is only the final publication.. There are intermediate publications that go on all the time in life.. We chat verbally with our peers.. We exchange emails.. We email files.. We ftp/share files.  All outside of the canonical publication.. The only real question is whether a publication system can track those histories as well - since tracking things brings order to the chaos.

Thus, I'd argue that Git is a file representation that logically exists in a multi-verse of potential states-of-being, with different pasts and different futures.  And I'd apply Occam's Razor: if two descriptions of a system produce identical measurable outcomes, then the simpler of the two is more likely correct.  At the end of the day, you get the same published PDF, word-doc, HTML-file, source-code.  The challenge is in the process that gets from thoughts to final publication.

Basic Git 'things'

  • Directory tree of objects
    • Every object goes in a file named by its SHA-1 (with the first 2 hex chars as a directory level)
    • Object files are zlib-compressed, with a small header giving the object type and size
  • Compacted Object bundle
    • 'Compacting' takes 99% of the free-form file-objects (above) and throws them into a single file
      • The file is SHA-1'd and its name is that of the SHA
    • A second 'index' file is a simple fixed DB which maps SHA-IDs to offsets within the compacted bundle
    • This is the basis for all network transfers.
      • Any push operation first compacts the objects and transfers an xdelta from an existing known remote bundle
  • ASCII text starting points
    • .git/refs/heads/master - file with 40 bytes; hex representation of SHA-1 containing head (root) of main-line tree
    • .git/refs/heads/feature1_branch - file with 40 hex bytes; representing SHA-1 of the head of a branch (which might temporarily be same as master)
    • .git/refs/tags/release_1_0_0 - file with 40 hex bytes; representing SHA-1 of an arbitrary local tag
    • .git/refs/remotes/bobs_computer/master - file with 40 bytes; hex .... of the starting point of an object pulled in from bob's computer representing HIS current head.  Note, YOU might already have had this object and thus rsync didn't copy it.. YOU might have created the same set of files and it happened to match his representation (independently authored)
    • .git/HEAD - ascii file naming the current 'head', e.g. 'ref: refs/heads/master' or 'ref: refs/heads/feature1_branch' (or a bare SHA-1 when checked out detached at a tag or remote branch)
    • .git/config - simple 10+ line win.ini type config-file (arguably should have been XML).  Denotes remote URLs and any special settings.
While not as elegant as it could be.. It's idiot simple and easy for 3rd party tools (like ruby-based brew) to extend.
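
As a rough illustration of how little machinery there is, here is a minimal Python sketch (assuming a '.git' directory in the current path and an object that is still 'loose', i.e. not yet compacted into a bundle) that resolves HEAD to a SHA-1 and then reads the object file it names:

import os
import zlib

def resolve_head(repo=".git"):
    """Follow .git/HEAD -> refs/heads/<branch> -> 40-hex SHA-1."""
    head = open(os.path.join(repo, "HEAD")).read().strip()
    if head.startswith("ref: "):                 # symbolic ref, e.g. 'ref: refs/heads/master'
        return open(os.path.join(repo, head[5:])).read().strip()
    return head                                  # detached HEAD is already a SHA-1

def read_loose_object(sha, repo=".git"):
    """Loose objects live at .git/objects/<first 2 hex chars>/<remaining 38 chars>."""
    path = os.path.join(repo, "objects", sha[:2], sha[2:])
    raw = zlib.decompress(open(path, "rb").read())
    header, _, body = raw.partition(b"\x00")     # header is e.g. b'commit 243'
    return header.decode(), body

sha = resolve_head()
kind, body = read_loose_object(sha)
print(sha, kind, len(body))

If the object has already been compacted into a bundle, you would instead look it up via the index file described above; that part is omitted here.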

What's the point of github?

So given how every work-space is a full backup.
Given that anybody can pull from anyone else's repository directly (drop-box, google-drive, sky-drive, SMB, http).
Given that you can email patches directly from git commands "git format-patch -M origin/master -o outgoing/; git send-email outgoing/*".
Given that there are thousands of collaboration tools.

What's the point of github?

Github represents a popular canonical name-space for a multitude of projects..  Like an always on closet server (to front your transient-on laptop), it represents a universally accessible collaboration point.

Most any git project will be on github - EVEN IF it's already on Atlassian's 'stash'... Why? Because people that collaborate on open-source already have a github account, and have local tools which expedite working on multiple platforms (windows, linux, mac).. They have passwords memorized.  They understand the work-flow because they've contributed to a project or two.. It isn't necessarily the be-all-end-all... But they're familiar with it.

The same COULD have been said about code.google.com or sourceforge.net.  Had they been 'git' based.  github was just first and sufficient (and pretty darn pretty).

github gave a useful online editing tool, so you can edit from any 'chrome-book' or possibly even android device. (obviously only micro-edits on such devices).

github exemplifies the social-hierarchy and culture of:
  • Give credit to original authors
  • While you can't wrest ownership, you can 'fork'
    • Someone else forking doesn't mean they hate you; it's the ONLY way for them to contribute because they don't have write-access (so you're less likely to take offense)
  • You 'respectfully submit' a 'pull request' to the owning author from your fork's change-list.
  • Authors can accept with a single button-click (visually seeing the diff), reject, or request amendments.
    • This fosters collaboration, enforces consistent formatting rules, and reduces the bureaucratic review process while FORCING some minimal review
  • The original author is in no way required to comply or accept.. If they reject or delay.. You still have a completely valid ALTERNATE canonical representation of the original project.
  • Due to alternate version-histories, pull requests are required to be trivial-merges...
    • This promotes a simple, linear canonical version-tree (like subversion)
    • This avoids conflict resolution (allows reliable 1-button UI merges)
    • This forces the submitter to continuously vet their changes against the master/trunk (least surprise)
    • This parallelizes the work.. The project owner is not constantly pestered with maintenance tasks.. He can delegate work to peers, and each peer is naturally responsible for keeping their fork merged with the latest central history (otherwise no trivial merge is possible).

What is Brew?


  • To me, brew represents a replacement for 'apt' and 'yum' and 'rpm'.  It is a package manager which gives a linuxy-feel to mac..
  • It owns the '/usr/local' directory space (though technically it could go anywhere).
  • It gives non-root ownership of said repository (something rpm doesn't)
  • It makes '/usr/local' a git-repository.
    • Updating apt-cache or yum-listings is simply 'git pull'
  • All libraries are downloaded and locally compiled
    • Minimizes dependence on central compiler-farms (though I do miss apt/rpm pre-compilation)
  • All libraries are stored in localized directory structures (most libraries support this, but package managers hate it for some reason)
    • /usr/local/Cellar/ffmpeg/1.1/  bin , lib , share , etc 
    • Can install multiple simultaneous versions of same library
  • All libraries are sym-linked to central location
    • /usr/local/bin/ffmpeg -> ../Cellar/ffmpeg/1.1/bin/ffmpeg
    • /usr/local/lib/libavcodec.54.86.100.dylib -> ../Cellar/ffmpeg/1.1/lib/libavcodec.54.86.100.dylib
    • This allows you to swap out a library AND trace back every artifact to its owning project and version (slightly more reliably than rpm/apt); see the sketch after this list
  • A single /usr/local/bin/brew ruby-script
  • The brew hosting is on github.
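
As a small illustration of that trace-back property, here is a hedged Python sketch (the paths and the ffmpeg/1.1 version are just the examples from the list above) that resolves a /usr/local symlink back to its owning Cellar formula and version:

import os

def owning_formula(artifact, cellar="/usr/local/Cellar"):
    """Resolve a /usr/local symlink and report which Cellar formula/version owns it."""
    real = os.path.realpath(artifact)          # e.g. /usr/local/Cellar/ffmpeg/1.1/bin/ffmpeg
    if not real.startswith(cellar + os.sep):
        return None                            # not managed by brew's Cellar
    formula, version = real[len(cellar) + 1:].split(os.sep)[:2]
    return formula, version

print(owning_formula("/usr/local/bin/ffmpeg"))   # ('ffmpeg', '1.1') on the layout above
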
So the exciting thing for me is to see a community-driven trivial-to-extend database of source-code-projects.

Conclusion

There is nothing special about git, github or brew... All are somewhat hackish toys... Initiatives whose authors were at the right place at the right time.  But they, for me, represent a trend that I hope continues... Open-Collaboration, Open-Contribution, Open-Attribution.  And from a science perspective, ever more elegant models of federation, data-distribution, de-centralization (or at least limited dependence on central choke points).

In our 'cloud' era, this is something we are starting to forget... We cite google, apple, android, iOS as enabling tool providers... But they're really monolithic appliances (like a car)..  They're monumental achievements, to be sure.  But ultimately they will last only so long as some venture capitalist (or centrally planned government agency) is willing to subsidize the cost (as most such services are not intrinsically/directly funded).  The cloud-model (including app-stores, rpm-repositories) is very main-frame era in nature.  Even with distributed cloud, you are still reliant on a single vendor, a single chief administrator, a single attack-point of a virus.

Distribution, on the other hand, exemplifies a cell-network.. Something which is resilient to faults.. Resilient to adverse-interests.  It is something that can scale because you have both big-iron vendors and their fully engineered (and funded) projects, sitting next to a transient laptop (providing content 'seeds' to trusted peers).


Saturday, April 7, 2012

Callbacks coming back


Historic Context
pun intended

Callbacks are a wonderful thing.  It's the original object-oriented code.

Code A makes a function call to Code B along with a context.

OOP sort of took us off the straight-and-narrow when it generalized this.. The problem became that A started to know a LOT about B, tightening the coupling to the point that code was nearly impossible to make polymorphic (ironically).. If A wants a list of "Employee" objects, so that it can sum up their salary to produce a company cost.. I can't reuse that same "sum-up" algorithm and apply it to, say, matrix-math on arbitrary data-structures.  I'd need to write adaptor classes/functions to produce common data-structures that both the algorithm (the sum-up function) and the classes themselves conform to.  Now do this in such a way that two vendors would have chosen the EXACT same signature.  And your chances of code-reuse go out the door.

The same can be said for execution pipelines.  Do A, then B, then C... BUT if A, B or C fail, do error-code E.  Normally we'd think procedurally and write function-calls DIRECTLY to E everywhere.. OOP gave us the notion of exception-handling, which has its merits.. But if we're linking to 3rd party code, we have similar problems.. How do I differentiate exactly what each exception means generically enough that I can link to any 3rd party code?  I can't, I have to read the documentation and import their specific exception-class.. And God help us if they handle this poorly.

But, if our code was instead a sequence of callbacks that had minimalist signatures and data-hidden contexts, we could be more versatile.

a( callback-b ( callback-c ( { DONE }, error-cb-E ), error-cb-E ), error-cb-E)

Here A, B and C do their thing, then call either the success callback (the continuation forward-path) or the error callback (the exit-path).  The void-main decides the linkage order, and here's the beauty.. Instead of A hard-coded calling B, it can be swapped out at CALL-time to do any number of alternate things.

If you want specialization, you can write a function which you call which links A to B, but leaves C as a callback; you could write a function which takes A, B, C but defaults a common E, etc.
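
To make the wiring concrete, here is a minimal sketch (Python; the step names and payloads are made up) of the a/b/c pipeline above, where the caller - not the steps - decides the linkage and a shared error callback is threaded through:

def step_a(on_success, on_error):
    # do A's work, then hand off via a callback instead of a return value
    on_success("a-result")

def step_b(data, on_success, on_error):
    if "a-result" not in data:
        on_error(ValueError("B got unexpected input"))
        return
    on_success(data + " -> b-result")

def step_c(data, on_success, on_error):
    on_success(data + " -> c-result")

def report_error(e):
    print("pipeline failed:", e)

# The caller decides the linkage, mirroring a( callback-b( callback-c( DONE, E ), E ), E ):
step_a(
    lambda r: step_b(r, lambda r2: step_c(r2, print, report_error), report_error),
    report_error,
)

Swapping step_b for an alternate implementation is now purely a call-site decision.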

The main obstacle, I would imagine, is readability of code.  And here come the strengths / weaknesses of the underlying language.

Who's got my back?
The caller does

We'll assume a virtually stateless zero-arg input, zero-arg output for these examples

C func-signatures
=============
typedef void (*voidfunc)(void* data);
void runme(void* data, voidfunc cb) { cb(data); }
void mycallback(void* data) { ... }
int x = 5;
runme(&x, mycallback);

The definition of ugly could reference this.

C++  Objects
==========
class Callback {
public:
  virtual void operator()() = 0;
};
void runme(Callback& cb) { cb(); }


class MyCB: public Callback {
public:
   void operator()() { .. }
};
MyCB cb;
runme(cb);

More code, more assembly-setup (constructors/destructors), and only slightly more readable than C (you could replace operator() with something more readable, but then what's the likelihood other vendors would have chosen your name too?).

C++ 1998 templates
===============
template<typename T>
void runme(T& cb) { cb(); }


class MyCB {
public:
   void operator()() { .. }
};


MyCB cb;
runme(cb);
void foo() { .. }
runme(foo);

Getting there, still have boiler-plate.  Sadly the templates need to live in .h files.

C++ 2011 lambdas / closures
==============
#include <functional>
using std::function;

void runme(function<void()> cb) { cb(); }

int x = 5;
int y = 6;
runme([](){ printf("Hello\n"); });
runme([=](){ printf("Hello %i\n", x); });
runme([x](){ printf("Hello %i\n", x); });
runme([&](){ x++; });
printf("x=%i\n", x);
runme([&x](){ x++; });
printf("x=%i\n", x);
runme([&x, y](){ x+=y; /* y++; error */});
printf("x=%i, y=%i\n", x, y);

Ok, I think they got the point.. You can statically specify the closure (what is copied, what is referenced).  Other than the initial shock of seeing square-brackets, it does stand out as a lambda.. And the rest seems rather intuitive: parens for callback params, brackets for code.  The one remaining oddity is the ret-val signature..

GCC and visual studio should support some subset of this syntax. GCC didn't like =x for example.

For GCC you have to use --std=c++0x
More at:
http://candrews.net/blog/2011/07/understanding-c-0x-lambda-functions/

Apple LLVM C/C++ extension; call-blocks
======================
typedef void (^callback)();
void runme(callback cb) { cb(); }
int x = 5;
runme(^{ printf("%i", x); });

Now we're really on the right track.  Though it gets nasty if you want a reference closure.

__block int x;
runme(^{ x++; });

And let's not get started with the possible bugs introduced from their explicit cloning and reference counting of blocks.

Happiness does not derive from here... Moving on.. (hate mail welcome)

Java Objects
=========
void runme(Runnable r) { r.run(); }


final int x= 5;
final AtomicInteger ai = new AtomicInteger(0);
runme(new Runnable() {
   public void run() { System.out.printf("Hi %d", x);   ai.set(x); }
});

Slightly better than classic C++ Objects (due to anonymity).  But hopefully peers have chosen Runnable and Callable as their signatures.  But what if we wanted 2 input params?  What, no TupleN data-type?

language-fail


Javascript anonymous functions with full closure
=======================
function runme(cb) { cb(); }
var x = 5;
runme(function() { x++; ...  })

Pretty much the definition of simplicity and readability.  Sadly, you don't always want closure.. Say in

for (var i = 0; i < 10; i++ ) runme(function() { i--; }) // REALLY BAD


Basically java mandates one way which is safe, javascript mandates the other way which is unsafe.

perl anonymous functions
==================
sub runme($) { my $cb =$_[0]; &$cb(); }
my $x = 5;
runme(sub { $x++ })

Invocation is pretty good, but the dispatch code is nasty looking. Moral of the story.. Perl Rule #1, be a library USER, not a library writer in perl. :)

I will say, however, that the generic stack-structure of perl makes it EXCELLENT for callback oriented code.. You can write a function which doesn't know, nor care about, the number of arguments.. So, for example, sort-routines and counting routines are natural fits

sub count(@) {
   my $tot = 0;
   $tot += $_ for @_;
   return $tot
}

and the type-coercion means if you can stringify something, then numerify that string, the count func will work on it.  But let's not forget perl rule #1, shall we.. Moving on..

ruby lambdas
==========
def runme(cb)
   cb.call();
end
runme lambda { ... }

Their general lambda syntax is excellent.

lambda { |a1, a2| ... }  # closure + local code, inferred return type

If only it weren't for the rest of ruby syntax. :)  I mean, these guys took BASIC, python and C, and said, eh, I think you missed a combination.  Clearly they were not vi fans.

python lambdas
===========
def runme(cb): cb()


runme(lambda: ... )

Hard to complain, yet somehow I've never bothered.  And I'm one of the rare people that like their tab-oriented syntax.  I white-board pseudo-code in python.. I just never want to have to use it for some reason.

groovy closures
============
def runme(cb) { cb() }

runme( {  ... } );

Groovy has a lot of merit.  It seems to be driving syntactic changes in java itself.  I just fear that it's compromising on a lot of java's strengths: type-safety, tight deterministic execution paths (groovy compiled code can be highly reflective for something that otherwise would look efficient), and it's harder to statically analyze in an editor (e.g. unsafe refactor operations, or 'who calls this code?' queries).

But if you look at it as a language unto itself, it's certainly in the same domain as jython, jruby.

google dart
========
runme(cb) { cb(); }


runme(() => expression);
or
mylocal() { expressions }
runme(mylocal);

I think Dart has merit as a replacement for javascript, being that it's written as a forwardly compatible language. It's definitely more system-language than javascript and thus more amenable to large-scale 'million-line-apps'.  I think it has some quirks because most feedback in the dart-forum is, "we agree such and such would be better, but we're really trying to maintain the legacy language feel, to help invite new coders".  My response, which is echoed by others, is that you're going to shock them with the deviations anyway, so just because you can make it smell like C++ isn't going to win anyone's heart once they delve into it.

google go closure
=======
func runme(f func()) { f() }

runme(func() { ... });


scala functions with full closure
===========
def runme(cb: => Unit) = cb
var x = 5 // mutable
val y = 6 // immutable
runme { x += 1; println(x + y) }

Basically a safer javascript, since you are very prone to lock down almost all variables by default (don't have to lift an extra finger to type "const" or "final").

Almost all traditional syntax is optional (semi-colons, parens).  So we could have been explicit:

def runme(cb: () => Unit): Unit = { cb(); };
...
runme(() => { ... });

I'm becoming prone to using scala as my prototyping language because it's faster to type, more compact than most other languages (including ruby), yet has the full explicit power of the JVM library (threading, async-IO, JNI, video-processing, etc).  Further, they nicely try to mitigate several of the language's bug classes, like null-pointers.  And the power to which they've extended callbacks is mind blowing (literally hurts the brain).

def createfunc(x: Int)(y:Int):Int = x + y


val cb = createfunc(5)_; // nasty trailing "_" to denote incomplete execution


print(cb(6)); // prints 11

This is a lambda-factory which generates a callback that wraps "x". The callback will take "y" and add it to the wrapped x.

I only elaborate because the next level of callback functions is the synthesis of callback functions


Lisp / clojure / etc
==========
(define (runme cb) (cb))
(runme (lambda () ...))

Or even the crazy lambda factory syntax:
(define (create-foo x) (lambda (y) (+ x y)))

With the mandatory xkcd link:
http://xkcd.com/224/
or (http://timbunce.files.wordpress.com/2008/07/perl-myths-200807-noteskey.pdf)

Summary
Callback oriented code, IF the underlying syntax promotes it, allows you to almost fully decouple your code.  You can do this with objects, functions, function+data pairs, functions-with-closure.  But the key is that such coding style promotes cross-code-use.  You can scaffold your execution pipeline differently in different contexts within the same executable.

Javascript AJAX and apple grand-central are promoting the hell out of callbacks.  They don't leave you any choice.
Java's spring-framework gets you half-way-there with the notion of dependency-injection... Forcing you at least to interface your code for testability.  It's a half-measure.

The real power is that once you've callbackified your code, asynchronous programming models become second nature.  If you were going to break up the error-handling or steps 1, 2 and 3 of your code anyway.. Now you have the option to use a dispatch system (grand-central, java executor-pools, javascript worker-threads) to leverage multiple CPUs.  But this isn't always a positive (inter-CPU cache-coherence can actually slow down performance, and will certainly burn more power for the same unit of work).  Still, it's great for UI stall-mitigation.. backgrounding all non-critical-path work-flows.  If nothing else, the explicit concurrency can be flattened into a single thread of execution.  That's just a run-time setting.
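
As a rough sketch of that last point (Python here, with its executor pools standing in for grand-central or java executor-pools), the same callback chain can be dispatched onto a worker pool or flattened onto a single thread purely by the caller's choice of dispatcher:

import threading
from concurrent.futures import ThreadPoolExecutor

def run_inline(fn, *args):
    fn(*args)                         # flattened: everything on one thread

def make_pool_dispatcher(pool):
    def dispatch(fn, *args):
        pool.submit(fn, *args)        # same work-flow, now backgrounded on the pool
    return dispatch

def step(name, data, done, dispatch):
    dispatch(done, data + " -> " + name)

def pipeline(dispatch, finish):
    # step one -> step two -> finish; the dispatcher is the caller's run-time choice
    step("one", "start", lambda r: step("two", r, finish, dispatch), dispatch)

pipeline(run_inline, print)           # synchronous, single-threaded

finished = threading.Event()
with ThreadPoolExecutor(max_workers=2) as pool:
    pipeline(make_pool_dispatcher(pool), lambda r: (print(r), finished.set()))
    finished.wait()                   # let the chain drain before the pool shuts down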

Thursday, November 3, 2011



The US Cell Phone Market

or how I learned to play whack-a-mole


Looking to buy a new phone?

  • What carrier should I choose?
  • What phone should I choose?
  • How future proof are these choices?
  • Can I switch carriers?
  • Will my phone work overseas?
If you're a techie like me, it can be maddening looking at the history and current state-of-affairs of world-wide mobile phones.

A brief overview

Cell phones have gone through 4 generations numbered 1G through 4G loosely representing the past 4 decades (since 1980).
  • 1G was the original analog, which used circuit switched networks. This had low-latency (quick response) and connection reliability but very little spectrum efficiency (e.g. downloads were less efficient per MHz).  The US standard was AMPS (Advanced Mobile Phone System).
  • 2G added digital downloads and uploads which leveraged packet switched networks.  Europe made a concerted effort to standardize radio frequencies and protocols via GSM (Global System for Mobile Communications), while most of the US carriers went their own ways. Common GSM protocol names were GPRS (General Packet Radio Service) and EDGE (Enhanced Data Rates for GSM Evolution).
  • 3G started in 2000 in Europe as a general specification, UMTS (Universal Mobile Telecommunications System), which stated 3G will be anything that is 200kbps or faster and must be all digital.  This included HSPA (High Speed Packet Access), later faster downloads with HSDPA (High Speed Downlink Packet Access), and still later HSUPA (High Speed Uplink Packet Access).  And finally HSPA+ (between 5Mbps and 48Mbps).  A major competing standard was CDMA2000, which included 1xRTT and EV-DO (Evolution-Data Only, later rebranded as Evolution-Data Optimized).  And finally WiMAX (Worldwide Interoperability for Microwave Access).
  • 4G was guided by IMT Advanced (International Mobile Telecommunications-Advanced) in 2008 to define the most recent generation.  This included a minimum bandwidth of 1Gbps, and a requirement to be pure IP based.  It basically leveraged OFDM (Orthogonal frequency-division multiplexing) - a technique leveraged in short-range WiFi since 802.11a.  It also leveraged the latest WiFi techniques (802.11n) of multi-antenna / multi-path, often called MIMO (multiple-input and multiple-output).  However, there were major challenges, given existing radio frequency spectrum allocations, costs, and handset requirements.  So a practical evolutionary approach was decided on called LTE (Long Term Evolution) and subsequently LTE Advanced. Separately, higher speed WiMAX solutions have been adopted by some vendors.  LTE peak speeds are currently on the order of 50Mbps - far short of the 4G requirement.  More importantly, there is tremendous overlap of LTE and HSPA+ deployments.
The above is certainly not an authoritative reference - it represents the culmination of my frustrated attempts at learning bit-by-bit from various articles including wikipedia, anandtech, and carrier marketing sites.

The basic problem

Radio frequencies have the following characteristics:
  • Frequency is measured in Hz (after Heinrich Hertz), and it represents one full rotation (similar to a rotating tire - but more specifically a complete sinusoidal oscillation between two orthogonal states).
  • Radio frequencies are split up in a continuous spectrum much like color shades from red through violet (as can be seen through a prism).
    • This means there are a very large number of frequencies between even 1Hz and 2Hz (though quantum physics quickly gets in the way).
  • Radio Frequencies are categorized as those between 3Hz and 300GHz.
  • The frequencies commonly attributed to cell phones are UHF (Ultra High Frequency), 300MHz - 3GHz.
  • What is interesting about Radio Frequencies is that they can emit off copper wires and be captured again by a different copper wire miles away.
  • Some radio frequencies can pass through walls, while others have a hard time.  Still others are subject to natural interference.
  • For a given power level, some frequencies can travel great distances with little signal loss; others can only travel a few inches.  For those shorter range frequencies, you can usually increase the power of the transmitter to increase the range, but sometimes there are consequences - an Arc-welder, for example, is a high powered radio transmitter.
  • Radio transmissions are composed of oscillating multi-dimensional bundles of energy called photons that continuously transition back and forth between the electric and magnetic fields, while simultaneously traveling through space at the speed of light.
    • The process of transmitting radio frequencies wirelessly through space is very similar to how they are transmitted through a waveguide such as a copper wire.
      • As a consequence, wired ethernet and fiber-optics have very similar limitations as wireless - but have the advantage that multiple parallel cables have very little interference with one another - So you can achieve massive bandwidth with very little power consumption.
  • Antennas are basically echo-chambers that can absorb 50% of the power in a given radio photon. The remaining 50% is echoed/reflected back out.
    • This means radio transmission is never more than 50% power efficient
    • Given a long-lasting coherent frequency, the echoes resonate within the antenna's echo-chamber and can coherently be amplified (in the case of analog music radios like FM/AM) or measured / sampled (in digital systems).
  • Minor obstructions in the wires of an antenna have massive effects on which frequencies the echo chamber can resonate effectively.  This is more-or-less how we can tune a wire to accept ONLY a small frequency-range.
  • We call a pre-defined frequency-range a channel (VHF channel 13 is 156.65 MHz +/- 25 KHz)
    • A GSM cell tower may define 124 channels centered about 850MHz in 200 KHz increments.
  • There is bleeding of one frequency into the next due to noise, photon-scattering (e.g. through air), and the basic physics involved in antennas.  So while a given photon is precisely 1 frequency (for example 30,153.112212 Hz), the resonation-amplified signal will not be determinable beyond an accuracy of, say, 5kHz at the 30MHz level.  Thus an entire contiguous band needs to be allocated/reserved.
  • Using mathematical techniques on digitally sampled measurements of the antenna, we can effectively encode between 0.1 and 20 bits per Hz of a given contiguous frequency-band.
    • Thus if a cell phone was allocated a 1 MHz band somewhere in the UHF range of 300MHz to 3GHz, then it may be able to encode between 100Kbps and 20Mbps.  Note, there would only be 2,700 such bands in the UHF range.  So for a given region, you could not support more than 2,700 cell phones with such a partitioning scheme (see the sketch below).
  • The measurement of bits per preciously scarce Hz is called spectral efficiency.
  • Given the ever increasing cost of frequency-ranges in given geographic regions (such as high population areas), there is an ever growing need to increase spectral efficiency - all else being equal.
  • When you sell a device that uses a frequency, you have to wait until all such devices have been retired before you can effectively re-purpose that frequency for a different device or protocol.
    • Thus, historically the frequency spectrum is full of ranges that you are not allowed to use because of spectrum squatters.
  • In the US, the FCC (Federal Communications Commission) defines what devices are allowed to use which frequency-ranges; what protocols to use in those ranges, and at what max-power-levels (so as to define a max-range/distance of interference)
  • You can generally separate two-way sends/receives in frequency or time (or some combination thereof).
    • TDMA - Time Division Multiple Access
    • FDMA - Frequency Division Multiple Access
    • CDMA - Code Division Multiple Access (spreads each signal across a range of frequencies over time using a unique code, so multiple transmitters can share the same band and still be separated at the receiver)
So to sum it up, we've got a limited supply of useful over-the-air spectrum - though it can all be reused in each city.  Much of it is full of legacy devices (such as analog radio (AM / FM), satellite, analog TV).  The dollar value of each MHz of spectrum skyrockets yearly, and thus legacy systems are quickly being retired so as to allow their spectrum to be re-distributed to the highest dollar-value market.
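
A back-of-the-envelope sketch of the arithmetic above (Python; the 0.1 - 20 bits/Hz figures and the 1 MHz band are the illustrative numbers from the list, not measured values):

UHF_SPAN_HZ = 3_000_000_000 - 300_000_000    # the 2.7 GHz-wide UHF range
BAND_HZ = 1_000_000                          # a 1 MHz allocation per phone

print("bands available in UHF:", UHF_SPAN_HZ // BAND_HZ)   # 2,700
for name, bits_per_hz in [("low spectral efficiency", 0.1), ("high spectral efficiency", 20)]:
    print(name, ":", BAND_HZ * bits_per_hz / 1e6, "Mbps per 1 MHz band")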

Today that market is the cellphone industry.

BUT, much of that spectrum is already allocated for cell-phone use.  And there are over 1 billion cell phone devices world-wide. There is a very high cost in upgrading older cell phone devices that have less spectral efficiency, or in consolidating which frequencies are used for which protocols so a given phone can be useful in different cities around the world.  So instead, most carriers will opt to continue to fragment the world-wide market for short-term cost-savings.

So let's say we have 2 cities, A and B, that each implemented their own 3G phone: Phone 1 and Phone 2.  Let's say they used frequencies 1 GHz and 2 GHz respectively.  Let's say there was some legacy satellite blocking 1 GHz in city B.
Now let's say they both wanted to create a 4G network.   Let's say in City A, the only available frequency is 3 GHz.  So they create a Phone 3 (that is backwardly compatible with Phone 1's frequencies).  But in City B, it was cheaper to decommission that satellite and reuse the 1 GHz space.  Let's also say 3 GHz is NOT currently cost effective to re-purpose.  So City B KNOWS that if it chooses 1 GHz it will be in conflict with City A. But it would take many years and a lot of money to do something else - so instead they create a Phone 4.  So now Phones 1 and 3 work in City A ONLY, and Phones 2 and 4 ONLY work in City B.

Now take this situation and take 20 frequencies and 15 protocols, many of which require special dedicated hardware to work properly.  Depending on the details, there could be upwards of 100 specialized pieces of hardware required to actually work on all possible frequencies and protocols.  While technically possible, the cost-effectiveness of making a LOW-POWER portable cell phone this way is questionable. Now throw in patents / royalties for a given protocol, and a handset manufacturer has to think seriously about whether it's worthwhile making a true world phone (one that can operate in any major city around the world with at least voice connectivity).


The breakdown

Europe:
While Europe is full of conflicting standards, it did champion the GSM standard. This allocated the following frequencies:
  • 900MHz (890-915 uplink, 935-960 downlink) GSM / EDGE / GPRS / 2G with 124 channel-pairs.  
  • 1.8GHz (1.710-1.785 uplink, 1.805-1.880 downlink) GSM / EDGE / GPRS / 2G with 374 channel-pairs.
  • 1.9GHz / 2.1GHz IMT (1920–1980 uplink, 2110–2170 downlink) UMTS / 3G / W-CDMA (2004 - )
  • 1.8GHz DCS (1710–1785 uplink, 1805–1880 downlink) UMTS / 3G / W-CDMA (alternative / migration)
  • 900MHz GSM (880-915 uplink, 925-960 downlink) UMTS / 3G / W-CDMA (alternative / migration)
US:
Due to conflicts, the US allocated these (and other) frequencies:
  • 800MHz (825 - 894) for 1G AMPS (FDMA) then incrementally upgraded to 2G D-AMPS (TDMA) in ATT / Verizon / On-Star (started in 1982 and discontinued in 2008).  
  • 850MHz (824-849 uplink, 869-894 downlink) GSM / EDGE / GPRS / 2G with 124 channel-pairs.
  • 850MHz T-Mobile for GSM (1996-) ROAMING ONLY
  • 850MHz (824-849 uplink, 869-894 downlink) ATT UMTS / HSPA / 3G (HSDPA 2005) (HSUPA in 2009) - slowly replacing GSM
  • 850MHz Verizon CDMA / 3G
  • 1.9GHz Verizon CDMA / 3G
  • 1.9GHz T-Mobile GSM / 2G (1994-) (GPRS 2002) (EDGE 2004)
  • 1.9GHz ATT GSM /  2G (GPRS 2002 ) (EDGE 2004 )
  • 1.9GHz (1.85-1.99) Sprint 2G CDMA / GSM custom (1995 - 2000)
  • 1.9GHz PCS Sprint CDMA / EV-DO
  • 1.7GHz / 2.1GHz AWS (1.71-1.755 uplink, 2.11-2.155 downlink) T-Mobile UMTS / 3G (W-CDMA 3.6Mbps in 2006) (HSPA 7.2Mbps in 2010) (HSPA+ 42Mbps in 2011) 
  • 1.9GHz PCS (1.85-1.91 uplink, 1.93-1.99 downlink) ATT UMTS / HSPA / 3G (HSDPA 2005) (HSUPA in 2009) - slowly replacing GSM
  • 700MHz ATT  LTE / 4G (2011 - )
  • 700MHz (777-787 uplink, 746-756 downlink) Verizon LTE / 4G
  • 2.5-2.7GHz Sprint XOHM WiMAX / 4G
Spectral efficiency and bandwidth
  • AMPS  (0.03 bits/Hz)
  • D-AMPS   (1.62 bits/Hz) Each channel-pair is 30KHz wide in 3 time slots (TDMA). Supports 94 channel-pairs.
  • GSM Each channel is 200KHz wide. 
  • GSM / GPRS (?? ) ( 56Kbps to 154Kbps)
  • GSM / EDGE (1.92 bits/Hz) (400Kbps to 1Mbps)
  • CDMA2000 / EV-DO (2.5 bits/Hz) (2.4Mbps to 3.1Mbps) 1.25MHz 
  • UMTS / WCDMA / HSDPA (8.4 bits/Hz) (1.4Mbps to 14Mbps) 5MHz channel-pairs
  • UMTS / HSPA+ (42Mbps)
  • LTE Advanced (16 bits/Hz) (6Mbps normal, peak 300Mbps) 1.25MHz .. 20 MHz channel-bundles
  • WiMAX (1.75 to 20 bits/Hz)
  • V.92 modem (18.1 bits/Hz) 
  • 802.11g (20 bits/Hz)
  • 802.11n (20 bits/Hz)
Phones:
  • Apple's iPhone 4 contains a quadband chipset operating on 850/900/1900/2100 MHz, allowing usage in the majority of countries where UMTS-FDD is deployed. Note, this doesn't support the 1.7GHz uplink for T-Mobile.
  • Samsung Galaxy S Vibrant (SGH-T959) T-Mobile GSM / 2G  850, 900, 1800, and 1900. UMTS 1700/2100 (US, Tmobile only) and UMTS 1900/2100 (Europe). It does NOT support the 850 band as used by AT&T 3G.
  • Samsung Galaxy S Captivate (SGH-i897) ATT -GSM / 2G 850, 900, 1800 and 1900. UMTS / 3G 850/1900 (US, ATT) and UMTS / 3G 1900/2100 (Europe).
  • Samsung Galaxy S II (SGH-I777) ATT - GSM / 2G 850/1900 (US) 900/1800 (Europe)
    UMTS / 3G 850/1900 (US, ATT) and UMTS / 3G 1900/2100 (Europe) [1.2 GHz, Dual Core Exynos 4210 + Mali-400 MP GPU]
  • Samsung Galaxy S II (SGH-T989)  T-Mobile - HSPA+ [1.5 GHz, Dual Core Qualcomm Snapdragon S3]
  • Samsung Galaxy S II Skyrocket (SGH-I727) ATT HSDPA / 3G 850 / 1900 / LTE 700MHz [1.5 GHz dual-core Snapdragon S3]
Why am I missing Verizon/Sprint?
Frankly, the information for Verizon/Sprint was less abundant - and this being a personal handset research project, I just gave up looking.

The VAST majority of the data was garnered from wikipedia and general google searches. If I come across more detailed information in my spare time, I'd love to update the data.


Saturday, June 11, 2011

Cloud Computing

Intro
I recently had the chance to go to a cloud computing expo in NYC. I didn't think there would be much to learn, and indeed, most of the presentations were very high level. But if you paid attention, there were many little gems.

Where we've been
Virtualization has been around for a while. A LONG while. IBM has been doing this since the 70's. The idea was that a business needed a really really really reliable system, so it needed 3 way redundancy for every aspect of the computing framework. The devices were self-healing; on failure, the hardware would reroute to working chips, drives, etc.

Now that you've got this multi-million dollar system, it would be kind of nice to not let it sit idle all the time. So we invent time-sharing. This basically allows multiple users to perform their tasks (possibly in isolated operating system slices). This has the effect of making the whole system slower, but the goal was data reliability and price - not performance.

Then came the cheap commodity DOS and windows hardware. This meant smaller businesses could solve the same basic problems without 'big iron'. And with the manual backup process, you could have as much fault tolerance as you could eat. Big corporations now represented dis-economies of scale: by being so large that they couldn't survive down-time, or so large they couldn't fit on floppies, their ONLY option was expensive hardware.

Over time various UNIX flavors started replacing big-iron.. Now 'cheap' $20,000 servers could handle thousands of users in time-sharing environments, and could utilize 'cheap' SCSI disks in a newfangled Berkeley-described RAID configuration. Five 200 Meg SCSI disks could get you a whopping 800 Meg of reliable disk storage! A new era of mid-size companies was taking over. And the 90's tech bubble was fueled by incremental yet massively scaling compute capabilities. Herein the IT staff was king. The business models were predicated around bleeding edge data or compute capabilities, and thus it was up to the IT division to make it happen, and the company was made or broken on its ability to meet the business challenges... Of course, as we all know, not every business model made any sense at all, but that's another story.

The next phase was the introduction of free operating systems, namely Linux. The key was that now we could make cost effective corporate appliances that were single-functioned.. And in the world of network security, there was a constant need to isolate and regularly upgrade security patches.

Enter VMware..

While a mostly academic endeavor, it had long been possible to 'simulate' a computer inside a computer. I personally worked on one such project. Many companies, like Apple and DEC, had strong needs to 'migrate' or expand over applications written for different operating systems, and they sometimes found that it was easier to just emulate the hardware. IBM once tried software service emulation with their OS/2 to support Windows 3.0 style software, and likewise Linux had a windows 3.0 software-stack emulator called WINE. But both of these endeavors were HIGHLY unreliable, in that if they missed something, it wouldn't be visible until somebody crashed. And further, it couldn't be as efficient as running on the native OS - begging the question of whether full hardware emulation might still be better.

So now VMware had found some techniques to not emulate the CPU itself, but instead only emulate the privileged instructions that represent OS calls. This is presumably a much smaller percentage of emulation, and thus faster than CPU-emulation. You can then act as a proxy, routing those OS calls to a virtualized OS. So the application runs natively, the transition to the OS is proxied, and the OS runs in some combination of native and expensive CPU emulation. Namely if it's just some resource allocation management algorithm that requires no special CPU instructions, then that'll run natively in the OS, but if the OS call is actually manipulating IO-ports, virtual memory mappings, etc, then each of those instructions will be heavily trapped (proxied). Later, VMware type solutions found they could outright replace specific sections of OS code with proxied code, minimizing the expensive back-and-forth between the VM and the target OS.

Thus VMware was notoriously slow for IO operations, but decently fast for most other operations, and, like virtual CPU emulation, it was 100% compatible with a target OS. You could run Linux, Windows, OS/2, and theoretically Mac OS all side by side on the same machine.. This meant if you had software that only ran on one OS, you could have access to it.

Eventually people realized that this could solve more than just software access... What if I knew that FTP sites around the world were being hacked.. Sure I'll patch it, but what if they discover a new flaw before I can patch? They'll get to my sensitive data... So, let's put an FTP server inside its own OS.. But that means I'd have to buy a new machine, a new UPS, a new network switch (if my current one has limited ports). Why not just create a VMware instance and run a very small Linux OS (with 64 Meg of RAM) which just runs the FTP service. Likewise with sendmail, or ssh servers, or telnet servers, or NFS shares, etc, etc etc. Note these services are NOT fast, and you have to have redundant copies of the OS and memory caches of common OS resources. But that's still cheaper than allocating dedicated hardware.

So technically VMware didn't save you any money.. You're running more RAM, more CPU and slower apps than if you ran a single machine. But the 'fear' of viruses, exploitation, infiltration, etc made you WANT to spend more money. So assuming that your CTO gave you that extra money, you've saved it by instead going to VMware. See free MONEY!!

If you were concerned about exploitation, then there was one glaring hole here.. The VM ran on some OS.. that might have its own flaws... Further, the path is from guest app trapping to VMware (running as a guest app on a host OS), which then delegates the CPU instruction to the guest OS. And the host OS may or may not allow the most efficient way for VMware to make this transition. Obviously VMware needed special proprietary hooks into the host OS anyway.. Sooo.

VMware eventually wrote their own OS... Called ESXi. This was called a hyper-visor. A 'bare metal' OS that did nothing but support the launching of real OS's in a virtualized environment.

In theory this gave the OS near native speeds if it was the only thing running on the machine, since there was only a single extra instruction proxy call needed in many cases.

So now we can start innovating and finding more problems to solve (and thus more services to charge the customer). So we come up with things like:

  • Shared storage allows shut-down on machine 1 and boot up on machine 2. This allows hardware maintenance with only the wasted time to shutdown/boot.
  • vMotion allows pretending that you're out of RAM, forcing the OS to swap to disk, which, in reality is forcing those disk writes to an alternate machine's RAM.. When the last page is swapped, the new OS takes over and the original OS is killed. This is near instant fail-over (dependent on the size of RAM).
  • toggling RAM/num-CPUs per node.
  • Using storage solutions which give 'snapshot' based backups.
  • Launching new OS images based on snapshots.
  • Rollback to previous snapshots within an OS.
  • This all allows you to try new versions of the software before you decide to migrate over to it. And allows undoing installations (though with periodic down-time).

The reason Linux was valuable during this period was that making these OS snapshots, mirrored images, etc, had no licensing restrictions. You were free to make as many instances / images as you saw fit.. With a Windows environment, you had to be VERY careful not to violate your licensing agreement.. In some cases the OS would detect duplicate use of the license key and deactivate/cripple that instance. Something that can be overcome, but not by the casual user/developer.

Note that VMware was by far, not the only player in this market. Citrix's XEN, Red Hat's KVM, VirtualBox, and others had their own directions.

Hosted Solutions:

In parallel with all of this came the hosted web-site solutions. Build-a-web-site with WYSIWYG such that grandma can build/publish online with ease.

Next to that were leased hardware solutions.. Server Beach, Rack Space, etc. Also were the good old time-sharing solutions, where you'd just be granted a login; literally next to hundreds of concurrent users.

Whatever the need, the idea was that, 'why manage your own hardware and datacenter and internet connection'? That's an economy of scale sort of thing.. Someone will buy 1Gbps connection and 500 machines and associated large scale backup / switches / storage. They then lease it out to you for more than cost. You avoid up-front costs, and they make out a decent business plan.. It's a win-win... Sometimes.

The problem is that installation on these solutions leaves you with very little to build a business over. You can download and install free software; but commercial software is largely difficult to deploy (especially if hardware has no guarantees and from month to month you may be forced to switch physical hardware (which would auto-invalidate your licenses)).

Free software was still fledgling.. Databases were somewhat slow / unreliable.. Authentication systems were primitive. Load balancing techniques were unreliable/unscalable. And, of course, the network bandwidth that most providers gave was irregular at best.

Enter Amazon AWS

So some companies decided that it would make sense to try and solve ALL the building block needs.. Networking, load balancing, storage, data-store, relational data-store, messaging, emailing, monitoring, fast fault tolerance, fast scaling...

But more importantly, these building blocks happen without a phone-call.. Without a scale-negotiated pricing agreement.. Without an SLA. It is a la carte. Credit-card end-of-month 'charge-back'. You pay for what you use on an hourly or click basis. This means if you are small, you're cheap.. If you have bursts, you pay only for that burst (which is presumably profitable and thus worth it). And if you need to grow, you can. And you can always switch providers at the drop of a hat.

It's this instant provisioning and a la carte solution pricing that's innovative here. But provisioning that gives you commercial grade reliability (e.g. alternatives to oracle RAC, Cisco / F5 load-balancers, netapp).

Along with this came the proliferation of online add-on service stacks. The now classic example is salesforce.com. Something where there is an end solution that naturally can be extended with custom needs. This extension allows the opportunities for secondary markets and business partnership opportunities.

So today, apparently we use the following buzz words to categorize all the above.

SaaS - Software as a Service
This is an ebay, salesforce.com. Some end software solution (typically website / web-service) but that can be built upon (the service aspect). The key is charge per-usage volume, and pluggability. A cnn.com is not pluggable and thus not categorized SaaS.

PaaS - Platform as a Service
This is the Amazon AWS (language neutral), the google app engine (Java/Python/Go), the Microsoft Azure (.NET). These are a-la-carte charge-back micro-services. The end company makes money (or, as in google's case, recoups costs) and provides highly-scalable solutions, so long as you stay within their sandbox. Currently there is vendor lock-in, in so far as you can't swap the techniques of vendor A with vendor B. And this isn't likely to change... If, for no other reason than the languages themselves are differentiated between these various platforms. Thus even a common SDK is unlikely to provide abstraction layers.

IaaS - Infrastructure as a Service
This is the classic hosted hardware with only the ability to provision/decommission hardware. Amazon EC2, Rackspace, Terremark, Server Beach, etc are all in this model.
You are charged per use.. You have no visibility into the hardware itself.. Only RAM-size, number of CPUs, some benchmark representing minimum quality-of-CPU-compute-capability, and disk-space-size.

In some instances you can mix and match.. In others, you are given natural divisions of a base super-unit of resources.. Namely, you can halve / quarter / split 8, 16, or 32 ways the basic compute unit (as with Rackspace). The needs of your business may dictate one vendor's solution vs. another, including whether they properly amortize valid Windows licenses.
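
A quick sketch of what "you only pick the abstract size" looks like in practice, again using boto3 as an illustration; the AMI id is a placeholder and the instance type is just an example of buying a fraction of the compute unit:

    # IaaS provisioning: you choose a size class, never the physical hardware.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    resp = ec2.run_instances(
        ImageId="ami-xxxxxxxx",      # placeholder image id
        InstanceType="m1.small",     # the compute-unit fraction you are buying
        MinCount=1,
        MaxCount=1,
    )
    print(resp["Instances"][0]["InstanceId"])

Notice there is nowhere to say which rack, which motherboard, or which disk array you land on; that opacity is the whole IaaS bargain.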

The reaction:

So, of course, the corporate buzz became: leased hardware, leased software stacks, less efficient but more scalable solutions. Every CTO is being asked by their board, "What about cloud computing?". And thus there was a push-back.. In the analysis, there were some major concerns:

  • Vendor-lock-in
  • Sensitive data
  • Security
  • Uncontrolled Outages
  • Lack of SLAs
  • Lack of realized cost savings
  • Latency
So vendors started stepping in, charging fees on top of fees, purporting to solve the various issues. New phrases were coined:

Virtual Private Cloud - The idea that on-premises or in a shared data-center, you can guarantee hardware and network isolation from peer clients. This guards against government search warrants that have nothing to do with you. It also guards against potential exploits at the VM layer (client A is exploited, the hacker finds a way out of the VM, and then gets in-memory access to every other OS running on that hardware; including yours). Companies like Terremark advertise this.

Hierarchical Resource Allocation - The IT staff is responsible for serving all divisions of a corporation. But they don't necessarily know the true resource needs - only the apparent, requested needs (via official request tickets).

Thus with cloud-in-a-box on-premises appliances, the IT staff can purchase a compute farm (say 60 CPUs and 20TB of redundant disk space). It then divides this among 3 divisions based on preliminary projections of resource consumption, and grants each division head "ownership" of a subset of the resources.. This allocation is not physical, and in fact allows over-allocation (meaning you could have allocated 200 CPUs even though you only have 60). The divisions then start allocating OS instances with the desired RAM/CPU/disk based on their perceived needs.. They could run a custom FTP site, a custom compiler server, a custom SharePoint, or a shared Windows machine with Visio accessed one-at-a-time over remote desktop. The key is that the latency between the end user and the division head is far lower than the latency to joe-sys-admin, who's over-tasked and under-budgeted. There is 'no phone call necessary' to provision a service.

Now the central IT staff monitors the load of the over-committed box. There might be 50 terabytes of block devices allocated on only 20TB of physical disk arrays, BUT 90% of those OS images are untouched. Note that defrag would be a VERY bad thing to run on such machines, because it would commit all those logical blocks into physical ones. Likewise, each OS might report that it has 2GB of RAM, while in reality a 'balloon' driver is handing 1,200MB back to the host hypervisor (so running graphically rich screen-savers is a bad, bad, bad thing).
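
Here's a toy model of that thin-provisioning arithmetic, with invented numbers (matching the 50TB-on-20TB example above, not any real appliance), just to show why IT watches actual usage rather than allocation:

    # Toy thin-provisioning ledger; all figures are illustrative.
    PHYSICAL_TB = 20

    vms = {                      # (logical TB allocated, TB actually written)
        "division-a": (20, 1.5),
        "division-b": (18, 2.0),
        "division-c": (12, 0.5),
    }

    logical = sum(alloc for alloc, _ in vms.values())
    used    = sum(written for _, written in vms.values())

    print(f"allocated {logical} TB on {PHYSICAL_TB} TB physical "
          f"({logical / PHYSICAL_TB:.1f}x over-committed)")
    print(f"actually written: {used} TB ({used / PHYSICAL_TB:.0%} of physical)")
    # The appliance only hurts when 'used' approaches PHYSICAL_TB -- which is
    # exactly why a guest-side defrag (touching every logical block) is dangerous.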

As the load of the cloud appliance reaches 70%, they can purchase a second node and vMotion or cold-restart individual heavy VMs over to the new hardware. They then re-apportion any remaining unused reserve for the sub-divisions to come from the new cloud farm instead. Rinse and repeat..
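
The monitor-and-migrate loop is simple enough to sketch; the 70% threshold mirrors the text, while the VM names, load figures, and migrate() stub are all hypothetical:

    # Hypothetical rebalancing loop for an over-committed cloud appliance.
    THRESHOLD = 0.70
    CAPACITY_GHZ = 60 * 2.0          # e.g. 60 cores at 2 GHz each

    vms = {"build-server": 48.0, "fileshare": 22.0, "visio-rdp": 14.0, "ftp": 6.0}

    def migrate(vm_name, target):
        # stand-in for a vMotion / cold restart onto the newly purchased node
        print(f"migrating {vm_name} -> {target}")

    load = sum(vms.values()) / CAPACITY_GHZ
    while load > THRESHOLD:
        heaviest = max(vms, key=vms.get)     # move the single heaviest VM first
        migrate(heaviest, "cloud-node-2")
        del vms[heaviest]                    # no longer consuming this node
        load = sum(vms.values()) / CAPACITY_GHZ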

At the end of each quarter, divisions pay back their actual resource consumption to the central IT budget.

The stated goal is that, previously, you'd need 1 IT staff member per 50 machines.. Now you can have 1 central IT staff member per 5,000 "virtual" machines. Note, they're still dealing with desktop/laptop/iPad integration issues left and right. They're still pulling bad hard disks left and right. They're still negotiating purchase-order agreements and dealing with network outages. But the turn-around for provisioning is practically eliminated, and the cost of provisioning is amortized.

Solutions like abiquo provide corporate multi-level hierarchical resource sub-division: from division, to department, to office floor, to a 3-man development team. The only request upstream is for more generic resources (some combination of CPU, RAM, and disk). You manage the specific OS/environment needs - and by 'manage', the majority of OS deployments are stock services.. For example, a fully licensed Windows 7 OS with Visio installed. A fully configured Hudson continuous-integration build machine. A fully configured shared file-system.

Availability Zones

Of course, network issues, DDoS attacks, power issues, and geographic network/power failures all exist. The remaining issue for client-facing services is whether to host your own data centers, or go to a virtual private cloud, or even a public cloud for a subset of your business needs/offerings. It quickly becomes too expensive to host Tokyo, Ireland, California, and NY data-center presences yourself, so it may be more cost effective to just leverage existing hosted solutions. Moreover, some vendors (Akamai through Rackspace, CloudFront through Amazon AWS, etc) offer a thousand data-center 'edge' points which host static content.. These massively decrease world-wide average load latencies.. Something nearly impossible to achieve on your own.

Many hosted solutions offer explicit Availability Zones (AZs), even within a given data-center. Obviously it is pointless to have a MySQL slave node on the same physical machine as the master node: if one drive goes down, you lose both data nodes. Of course, with private cloud products, a given vendor will give you assurances that they only see 0.001% return rates.. Meaning, 'just trust their hardware appliance'.. You don't need to buy 2 $50,000 NetApp appliances.. It's like its own big iron. But let me assure you.. Ethernet ports go bad. Motherboard connectors go bad. Drives have gone bad such that they push high voltage onto a common bus, obliterating everything on that shared connector. ANY electrical coupling is a common failure point. I personally see little value in scaling vertically at cost, and instead find greater comfort in scaling horizontally. Some solutions do focus on electrically isolated fault tolerance, but most focus (for performance reasons) on vertical integration; and most, stupidly, share electrical connections with no surge isolation (which makes them sub-mainframe quality).
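
As a hedged sketch of what "use the AZs" means in practice, here's the master/slave separation expressed with boto3; the AMI id and zone names are placeholders, and the tagging of roles is omitted for brevity:

    # Spread a master and a slave across availability zones so they never
    # share a failure domain (rack, switch, power bus).
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    for role, zone in [("mysql-master", "us-east-1a"), ("mysql-slave", "us-east-1b")]:
        resp = ec2.run_instances(
            ImageId="ami-xxxxxxxx",                # placeholder image id
            InstanceType="m1.large",
            MinCount=1,
            MaxCount=1,
            Placement={"AvailabilityZone": zone},  # the isolation guarantee
        )
        print(role, resp["Instances"][0]["Placement"]["AvailabilityZone"])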

Business Needs

The conference did help me shift my perception in this one important way. My excitement about solving 'web scale' problems is just that, excitement.. It does not directly translate into a business need. Over and over, speakers expressed that at the end of the day, all of this cloud 'stuff' tries to solve just one problem: 'lower cost'. I was initially offended by this assertion.. But put on the economics-101 cap: all short-term fixed costs eventually become long-term variable costs. Any short-term problem can be designed for economies of scale in the long run. If I can't 'scale' today, I can over-purchase and then rent out the excess tomorrow in 1,000 data centers. I can partner with Akamai directly, at sufficiently high volume, tomorrow, for cheaper than I can purchase or even lease today. If my software doesn't scale today, I can engineer an assembly-language, low-latency, custom-FPGA design that fixes my bottlenecks tomorrow. ALL problems are solvable in the long run. So the idea that clouds uniquely solve ANY problem is fundamentally flawed. I can do it better, cheaper, faster... tomorrow. The cloud only solves a small subset of problems today. And so the question any board of directors needs to consider when investing in private vs. public infrastructure is time-to-market, and the risk of capital costs. If I know a product will have 3 years of return and will need a particular ramp-up in deferred capital costs, then I'm pretty sure I can engineer a purchase plan that is cheaper than the equivalent Amazon AWS solution. It'll run faster, cheaper, and more reliably. BUT, what if those projections are wrong? What if the project is a failure? What if we have spikes earlier than projected? Cloud computing provides SOME degree of risk mitigation, presumably by reducing those costs. Really, it just changes the equation radically enough that old problems are replaced by new ones.

The single biggest beneficiaries of cloud computing are inexperienced divisions and startups. Those who don't have talented, seasoned IT staff (DBAs, certified Cisco engineers, telco contractors on call, etc). Those that can't afford the up-front costs. Those whose business risks are massive. Those whose challenge is raising money until they start showing revenue - yet they can't earn revenue until some hardware/software stack is in place.

For this category, there simply is no alternative. You have zero up-take until you go viral.. Then no amount of hardware is sufficient to meet your needs.. That exponential growth period is highly unpredictable, and also the one-shot make-or-break moment.

For this category, cloud computing is a no-brainer. At least in the beginning.. But even here, growth stories have shown the 'cloud' doesn't solve their problems once they achieve a sustainable business model. GitHub, for example, uses Rackspace.. BUT only in a traditional datacenter model. They have massive storage needs, which is in contrast to Rackspace's cloud business model.. So GitHub doesn't leverage any public managed cloud services.. They own their own hardware/software stack. The only thing Rackspace provides is a subset of 'IaaS', which includes Akamai's edge CDN network. And ultimately this 'hybrid' approach is probably where most medium-plus businesses will be forced to live.

Security
It was good to hear that best practices at the conference included single-sign-on services. We're already familiar with Facebook Connect, OpenID, and Google's single sign-on. But these are obviously highly proprietary. The more interesting solutions for me were the SAML-based, standards-compliant trusted-authority solutions.. Those where your corporate environment can leverage an Active Directory / LDAP metadata store of user+password+roles, then transmit trusted tokens to a Google Apps, Salesforce, etc to access their services with little fear of password hacking; and, of course, you get the value of single sign-on. Each tier is layered, so the 'password' part can be replaced with a fingerprint scanner, or an RFID-card+finger+PIN combo, etc. I personally like these hybrids, as the Sony exploit has shown that people are dumber than dumb.. Associating simple dictionary words of 6 characters or less with credit-card info. People simply can't be trusted to remember complex pass phrases that aren't biographically linked to metadata easily discoverable about them.
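
A real SAML exchange is too big to show here, but the core 'trusted token' idea fits in a few lines. This is a conceptual sketch only: real SAML uses signed XML assertions and public-key trust, whereas this toy uses a shared HMAC secret, and every name and value below is invented:

    # Toy trusted-token flow: the service provider never sees a password.
    import base64, hashlib, hmac, json, time

    SHARED_SECRET = b"rotate-me"     # placeholder trust anchor between IdP and SP

    def issue_assertion(user, roles, ttl=300):
        # Identity provider: user already authenticated against AD/LDAP.
        body = json.dumps({"user": user, "roles": roles, "exp": time.time() + ttl})
        sig = hmac.new(SHARED_SECRET, body.encode(), hashlib.sha256).hexdigest()
        return base64.b64encode(body.encode()).decode() + "." + sig

    def accept_assertion(token):
        # Service provider (e.g. a SaaS app): verify signature and expiry.
        body_b64, sig = token.rsplit(".", 1)
        body = base64.b64decode(body_b64)
        expected = hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            raise ValueError("bad signature")
        claims = json.loads(body)
        if claims["exp"] < time.time():
            raise ValueError("expired")
        return claims

    print(accept_assertion(issue_assertion("alice", ["sales", "admin"])))

The 'password' (or fingerprint, or RFID+PIN) lives only at the identity provider; everything downstream trusts the signed token.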

Separately, there are all the best practices that SHOULD be honored. Don't ever pass sensitive data that you don't directly need through your network. Don't allow relay of public information through you (e.g. don't attach generic blogs unless necessary and unless monitored). Don't use encryption where an assertion can solve the same problem (a stolen assertion/digest is useless to a hacker, whereas stolen encrypted data can be attacked for a single master key).
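
If I read 'assertion' here as storing a one-way digest you can later verify against, rather than reversible ciphertext, a minimal sketch looks like this; the salt size and iteration count are illustrative, not a recommendation:

    # Store a salted one-way digest when all you ever need is "does this match?".
    import hashlib, hmac, os

    def make_assertion(secret_value, salt=None):
        salt = salt or os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", secret_value.encode(), salt, 100_000)
        return salt, digest               # safe to store; cannot be reversed

    def verify_assertion(candidate, salt, digest):
        probe = hashlib.pbkdf2_hmac("sha256", candidate.encode(), salt, 100_000)
        return hmac.compare_digest(probe, digest)

    salt, digest = make_assertion("correct horse battery staple")
    print(verify_assertion("correct horse battery staple", salt, digest))  # True
    print(verify_assertion("guess", salt, digest))                         # False

There is no master key to steal, which is the whole point of preferring an assertion over encryption when a match is all you need.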

Conclusions
I think that businesses HAVE to keep up with the Joneses. But they should do so pragmatically, and in a venture-capital sort of way.. Fund initiatives to see if they have practical ROI. Do they solve more problems than they cause? Keep your company and team agile (fast turn-around time and the ability to shift directions). Keep them apprised of possible solutions that solve problems more quickly, efficiently, cheaply. But remember that in 5 years, we will lament the whole 'cloud' era and laugh at people that still use centralized data, much like we did in the early 90s. iPad/Android peer-to-peer apps hiding data from 1984-style oppressive government eyes may matter more than consistent, up-to-date data. Who knows what tomorrow's critical challenges will be. So I wouldn't put too much stock (literally) in moving existing solutions over to public clouds.. But high-risk projects with large hardware needs and potentially short lifetimes make a lot of sense for getting your corporate feet wet.
