First of all, I'm dumping a block of MATLAB source code. This ought to interest the right people and lose everyone else. PM me and I'll happily email sample .mat files from Extended Worlds 2009. This is open source, I'm not bluffing, and anyone with a copy of MATLAB can reproduce my results.
function node=analyzeExtended()
% Build a classification tree over decklists by recursive PCA.
% Expects names.mat (card names, indexed like the columns of matrix) and
% matrix.mat (one row per deck: column 1 is a win metric, the remaining
% columns are maindeck card counts) on the MATLAB path.
tol=5;                       % stop splitting once a node's spread along PC1 is under tol "cards"
load names;                  % card names (not used in this function, but handy for labeling)
load matrix;
scores=matrix(:,1);          % win metric, kept around for per-archetype regression
matrix=matrix(:,2:end);      % card counts only
siz=size(matrix);
ave=mean(matrix,1);
cmatrix=bsxfun(@minus,matrix,ave);  % mean-centered copy (unused below; princomp centers internally)
node(1).parent=0;            % treeplot expects the root's parent to be 0
node(1).umbrella=1:siz(1);   % the root node covers every deck
nodes=1;
[node,nodes]=branch(node,nodes,1,matrix,tol);
figure(1)
treeplot([node.parent])      % draw the tree of archetypes
end
function [node, nodes]=branch(node, nodes, target, matrix, tol)
% Split a node along its first principal component, then recurse on each half.
[coeff,score]=princomp(matrix);   % princomp mean-centers matrix internally
node(target).coeff=coeff;
node(target).vals=score(:,1);
[trash,m]=min(score(:,1));        % the two decks at the extremes of PC1
[trash,M]=max(score(:,1));
% How different are the extreme decks, weighted by the component itself?
spread=norm((matrix(m,:)-matrix(M,:)).*coeff(:,1)');
if spread>tol
    f=find(score(:,1)>0);         % decks on the positive side of PC1
    g=find(score(:,1)<=0);        % decks on the negative side
    nodes=nodes+2;
    next1=nodes;                  % remember the second child's index across the recursion
    node(nodes-1).parent=target;
    node(nodes).parent=target;
    node(nodes-1).umbrella=node(target).umbrella(f);   % map back to original deck rows
    node(nodes).umbrella=node(target).umbrella(g);
    % Positive half: project out PC1 so the child splits on fresh structure.
    cmatrix=matrix(f,:);
    for k=1:size(cmatrix,1)
        cmatrix(k,:)=cmatrix(k,:)-dot(cmatrix(k,:),coeff(:,1))*coeff(:,1)';
    end
    [node,nodes]=branch(node,nodes,next1-1,cmatrix,tol);
    % Negative half: same deflation, recursing into the saved child index.
    cmatrix=matrix(g,:);
    for k=1:size(cmatrix,1)
        cmatrix(k,:)=cmatrix(k,:)-dot(cmatrix(k,:),coeff(:,1))*coeff(:,1)';
    end
    [node,nodes]=branch(node,nodes,next1,cmatrix,tol);
end
end
*phew*
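If you want to run it yourself, here's a minimal usage sketch. It assumes names.mat and matrix.mat (the sample files mentioned above) are on your MATLAB path and formatted the way the code expects: one row per deck, win metric in column 1, card counts in the rest.

node = analyzeExtended();   % builds the archetype tree and draws it with treeplot
disp(node(end).umbrella)    % deck rows (of the original matrix) falling in one leaf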
Now let's get to what I did and why. I've been talking about it forever, but it was high time to implement it. The idea here is that I recursively build a tree by performing principal component analysis on a fairly large dataset. I build a nearly minimum-entropy classifier for deck *types* (but not decklists) by creating branches from a node that reflect either a positive or negative dot product with the largest principal component at that node. Example: the first principal component in Vintage is Workshops vs. Drains, so if my name vector is ["Mana Drain", "Island", "Mishra's Workshop", "Tangle Wire"] the principal component would look like [1, 1, -1, -1]. For a Drain deck, the vector is [4 2 0 0]...but we have to subtract off the means (roughly [2 1 2 2]), which leaves us with [2 1 -2 -2]. Take the dot product with the component: 1*2 + 1*1 + (-1)*(-2) + (-1)*(-2) = 2+1+2+2 = 7. And 7>0, telling us that we have Drains and not Workshops. This is a drastic simplification; the real components appear to have 20+ significant cards.
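To make that toy example concrete, here it is as runnable MATLAB; the numbers are the illustrative ones from the paragraph above, not real tournament data.

pc1  = [1 1 -1 -1];            % simplified first principal component
deck = [4 2 0 0];              % a Drain deck's counts of the four cards
ave  = [2 1 2 2];              % approximate metagame means
proj = dot(deck - ave, pc1)    % 1*2 + 1*1 + (-1)*(-2) + (-1)*(-2) = 7
if proj > 0
    disp('positive side of PC1: Drains')
else
    disp('negative side of PC1: Workshops')
end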
The algorithm needs to be told when to stop splitting. We do this by supplying a tolerance: the bottom-most nodes on the tree must differ by at least X cards (weighted by the component). If you pick it intelligently, the bottom-most nodes will reflect either archetypes or variations within an archetype. Then you can automatically run regression on individual card choices against winningness WITHIN AN ARCHETYPE for all *tested* cards (sketched just below).
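That regression step isn't in the code dump above, but it's short once you have the leaves. A sketch, under the assumption that matrix, scores, names (a cell array of card names), and node are in the workspace, and leaf is the index of some bottom-most node:

rows = node(leaf).umbrella;                   % decks classified into this archetype
X    = [ones(numel(rows),1) matrix(rows,:)];  % intercept plus maindeck card counts
b    = X \ scores(rows);                      % least squares: win metric vs. card counts
[trash,best] = max(b(2:end));                 % card whose count best predicts winning
fprintf('best-correlated card: %s\n', names{best});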
The result is this gorgeousness:

[figure: treeplot of the resulting classification tree]
tl;dr: Decks are classifiable. I have an algorithm that not only does the classifying but produces fairly small trees; this one suggests only seven real archetypes in a given Extended metagame. Further, I can tell you, within an archetype, which maindeck choices best correlate with winning.