First of all, I'm dumping a block of MATLAB source code. This ought to interest the right people and lose everyone else. PM me and I'll happily email sample .mat files from Extended Worlds 2009. This is open source, I'm not bluffing, and anyone with a copy of MATLAB can reproduce my results.
function node=analyzeExtended()
% Build a classification tree over decklists by recursive PCA.
% Expects names.mat (card names, indexed like the columns of matrix) and
% matrix.mat (one row per deck: column 1 is a win metric, the remaining
% columns are maindeck card counts) on the MATLAB path.
tol=5;                       % stop splitting once a node's spread along PC1 is under tol "cards"
load names;                  % card names (not used in this function, but handy for labeling)
load matrix;
scores=matrix(:,1);          % win metric, kept around for per-archetype regression
matrix=matrix(:,2:end);      % card counts only
siz=size(matrix);
ave=mean(matrix,1);
cmatrix=bsxfun(@minus,matrix,ave);  % mean-centered copy (unused below; princomp centers internally)
node(1).parent=0;            % treeplot expects the root's parent to be 0
node(1).umbrella=1:siz(1);   % the root node covers every deck
nodes=1;
[node,nodes]=branch(node,nodes,1,matrix,tol);
figure(1)
treeplot([node.parent])      % draw the tree of archetypes
end
function [node, nodes]=branch(node, nodes, target, matrix, tol)
% Split a node along its first principal component, then recurse on each half.
[coeff,score]=princomp(matrix);   % princomp mean-centers matrix internally
node(target).coeff=coeff;
node(target).vals=score(:,1);
[trash,m]=min(score(:,1));        % the two decks at the extremes of PC1
[trash,M]=max(score(:,1));
% How different are the extreme decks, weighted by the component itself?
spread=norm((matrix(m,:)-matrix(M,:)).*coeff(:,1)');
if spread>tol
    f=find(score(:,1)>0);         % decks on the positive side of PC1
    g=find(score(:,1)<=0);        % decks on the negative side
    nodes=nodes+2;
    next1=nodes;                  % remember the second child's index across the recursion
    node(nodes-1).parent=target;
    node(nodes).parent=target;
    node(nodes-1).umbrella=node(target).umbrella(f);   % map back to original deck rows
    node(nodes).umbrella=node(target).umbrella(g);
    % Positive half: project out PC1 so the child splits on fresh structure.
    cmatrix=matrix(f,:);
    for k=1:size(cmatrix,1)
        cmatrix(k,:)=cmatrix(k,:)-dot(cmatrix(k,:),coeff(:,1))*coeff(:,1)';
    end
    [node,nodes]=branch(node,nodes,next1-1,cmatrix,tol);
    % Negative half: same deflation, recursing into the saved child index.
    cmatrix=matrix(g,:);
    for k=1:size(cmatrix,1)
        cmatrix(k,:)=cmatrix(k,:)-dot(cmatrix(k,:),coeff(:,1))*coeff(:,1)';
    end
    [node,nodes]=branch(node,nodes,next1,cmatrix,tol);
end
end
*phew*
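If you want to run it yourself, here's a minimal usage sketch. It assumes names.mat and matrix.mat (the sample files mentioned above) are on your MATLAB path and formatted the way the code expects: one row per deck, win metric in column 1, card counts in the rest.

node = analyzeExtended();   % builds the archetype tree and draws it with treeplot
disp(node(end).umbrella)    % deck rows (of the original matrix) falling in one leaf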
Now let's get to what I did and why. I've been talking about it forever, but it was high time to implement it. The idea here is that I recursively build a tree by performing principal component analysis on a fairly large dataset. I build a nearly minimum-entropy classifier for deck *types* (but not decklists) by creating branches from a node that reflect either a positive or negative dot product with the largest principal component at that node. Example: the first principal component in Vintage is Workshops vs. Drains, so if my name vector is ["Mana Drain", "Island", "Mishra's Workshop", "Tangle Wire"] the principal component would look like [1, 1, -1, -1]. For a Drain deck, the vector is [4 2 0 0]...but we have to subtract off the means (roughly [2 1 2 2]), which leaves us with [2 1 -2 -2]. Take the dot product with the component: 1*2 + 1*1 + (-1)*(-2) + (-1)*(-2) = 2+1+2+2 = 7. And 7>0, telling us that we have Drains and not Workshops. This is a drastic simplification; the real components appear to have 20+ significant cards.
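To make that toy example concrete, here it is as runnable MATLAB; the numbers are the illustrative ones from the paragraph above, not real tournament data.

pc1  = [1 1 -1 -1];            % simplified first principal component
deck = [4 2 0 0];              % a Drain deck's counts of the four cards
ave  = [2 1 2 2];              % approximate metagame means
proj = dot(deck - ave, pc1)    % 1*2 + 1*1 + (-1)*(-2) + (-1)*(-2) = 7
if proj > 0
    disp('positive side of PC1: Drains')
else
    disp('negative side of PC1: Workshops')
end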
The algorithm needs to be told when to stop splitting. We do this by supplying a tolerance: the bottom-most nodes on the tree must differ by at least X cards (weighted by the component). If you pick it intelligently, the bottom-most nodes will reflect either archetypes or variations within an archetype. Then you can automatically run regression on individual card choices against winningness WITHIN AN ARCHETYPE for all *tested* cards (sketched just below).
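That regression step isn't in the code dump above, but it's short once you have the leaves. A sketch, under the assumption that matrix, scores, names (a cell array of card names), and node are in the workspace, and leaf is the index of some bottom-most node:

rows = node(leaf).umbrella;                   % decks classified into this archetype
X    = [ones(numel(rows),1) matrix(rows,:)];  % intercept plus maindeck card counts
b    = X \ scores(rows);                      % least squares: win metric vs. card counts
[trash,best] = max(b(2:end));                 % card whose count best predicts winning
fprintf('best-correlated card: %s\n', names{best});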
The result is this gorgeousness:

[figure: treeplot of the resulting classification tree]
tl;dr: Decks are classifiable. I have an algorithm that not only does the classifying but produces fairly small trees; this one suggests only seven real archetypes in a given Extended metagame. Further, I can tell you, within an archetype, which maindeck choices best correlate with winning.