TheManaDrain.com
November 17, 2025, 09:56:32 am *
Welcome, Guest. Please login or register.

Login with username, password and session length
News:
 
   Home   Help Search Calendar Login Register  
Pages: [1]
  Print  
Author Topic: PCA-Based deck classification  (Read 3201 times)
AmbivalentDuck
Tournament Organizers
Basic User
**
Posts: 2807

Exile Ancestral and turn Tiago sideways.

ambivalentduck ambivalentduck ambivalentduck
View Profile
« on: June 17, 2010, 05:37:16 pm »

First of all, I'm dumping a block of MATLAB source code.  This ought to interest and lose the right people.  PM me and I'll happily email sample .mat files from Extended Worlds 2009.  This is open source, I'm not bluffing, and anyone with a copy of Matlab can reproduce my results.
Quote
function node=analyzeExtended()

tol=5;

load names;
load matrix;

scores=matrix(:,1);
matrix=matrix(:,2:end);

siz=size(matrix);
ave=mean(matrix,1);

cmatrix=matrix;
for k=1:siz(2)
    cmatrix(:,k)=cmatrix(:,k)-ave(k);
end

node(1).parent=0;
node(1).umbrella=1:siz(1);
nodes=1;

[node,nodes]=branch(node,nodes,1,matrix,tol);

figure(1)
treeplot([node.parent])
end

function [node, nodes]=branch(node, nodes, target, matrix, tol)
[coeff,score,latent]=princomp(matrix);
node(target).coeff=coeff;
node(target).vals=score(:,1);
[trash,m]=min(score(:,1));
[trash,M]=max(score(:,1));

norm((matrix(m,:)-matrix(M,:)).*coeff(:,1)')
if norm((matrix(m,:)-matrix(M,:)).*coeff(:,1)')>tol
    f=find(score(:,1)>0);
    g=find(score(:,1)<=0);
    nodes=nodes+2;
    next1=nodes;
    node(nodes-1).parent=target;
    node(nodes).parent=target;
    node(nodes-1).umbrella=node(target).umbrella(f);
    node(nodes).umbrella=node(target).umbrella(g);
    cmatrix=matrix(f,:);
    for k=1:size(cmatrix,1);
        cmatrix(k,:)=cmatrix(k,:)-dot(cmatrix(k,:),coeff(:,1))*coeff(:,1)';
    end
    [node,nodes]=branch(node,nodes,nodes-1,cmatrix,tol);
    cmatrix=matrix(g,:);
    for k=1:size(cmatrix,1);
        cmatrix(k,:)=cmatrix(k,:)-dot(cmatrix(k,:),coeff(:,1))*coeff(:,1)';
    end
    [node,nodes]=branch(node,nodes,next1,cmatrix,tol);
end

end
*phew*
Now let's get to what I did and why.  I've been talking about it forever, but it was high time to implement it.  The idea here is that I recursively build a tree by performing principle component analysis on a fairly large dataset.  I build a nearly minimum entropy classifier for deck *types* (but not decklists) by creating branches from a node that reflect either a positive or negative dot product with the largest principle component at that node. Example: the first principle component in vintage is workshops vs drains, so if my name vector is ["Mana Drain", "Island", "Mishra's Workshop","Tangle Wire"] the principle component would look like [1, 1, -1, -1].  For a Drain deck, the vector is [4 2 0 0]...but we have to subtract off the means (about-ish [2 1 2 2]) which would leave us with [2 1 -2 -2].  Take the dot product: 2*2 + 1*1 + -1*-2 + -1*-2 = 4+1+2+2 = 9. And 9>0 telling us that we have Drains and not Workshops.  This is a drastic simplification and the real components appear to have 20+ significant cards.

The algorithm needs to be told when to stop splitting.  We do this by supplying a tolerance: the bottom-most nodes on the graph must be different by at least X cards.  If you pick it intelligently, the bottom-most nodes will reflect either archetypes or variations within an archetype.  Then you can do regression on individual card choices against winningness WITHIN AN ARCHETYPE for all *tested* cards automatically.

The result is this gorgeousness:


tl;dr: Decks are classifiable.  I have an algorithm that not only does it but produces fairly small trees.  This one suggests only seven real archetypes in a given extended metagame.  Further, I can tell you within an archetype what maindeck choices best correlate with winning.
« Last Edit: June 21, 2010, 09:23:11 am by AmbivalentDuck » Logged

A link to the GitHub project where I store all of my Cockatrice decks.
Team TMD - If you feel that team secrecy is bad for Vintage put this in your signature
Any interest in putting together/maintaining a Github Git project that hosts proven decks of all major archetypes and documents their changes over time?
Smmenen
2007 Vintage World Champion
Adepts
Basic User
****
Posts: 6392


Smmenen
View Profile WWW
« Reply #1 on: June 17, 2010, 08:29:53 pm »

I'm confused.  Are you trying to correlate cards with winning, or are you trying to classify decks?  Those, it seems to me, are different goals.

Archetype taxonomy is one of the most difficult things in Magic analysis, and if you've solved it, that would be great.  But is that what this is about?  
« Last Edit: June 17, 2010, 09:23:32 pm by Smmenen » Logged

AmbivalentDuck
Tournament Organizers
Basic User
**
Posts: 2807

Exile Ancestral and turn Tiago sideways.

ambivalentduck ambivalentduck ambivalentduck
View Profile
« Reply #2 on: June 18, 2010, 09:00:21 am »

I'm excited because I've classified *archetypes.*  This is most definitely archetype taxonomy.  And the figure shows a classification tree with surprisingly few steps.

Once you have archetypes, you can regress card *difference* from the principle components that define the archetype and correlate those with winning.

They're one and the same goal because you need a mathematical definition of a given archetype to calculate the relative impact of 3 vs 4 Crucible of Worlds in Stax instead of the relative impact of playing Crucibles (Stax) vs not (the rest of the meta).  Further, the value of Crucible might actually scale with whether or not it's in a cluster of cards.  Ie. Time Vault is useless without Key or Tez.  You can regress against the whole cluster using this "PCA" technique.

I've also produced the tree for a tolerance of 1 card (Ie. Build a taxonomy of all 102 decks that were in the PTQ leaving only card-for-card identical builds sharing final nodes).  You can do this by adjusted the "tol" paramater to 1.
« Last Edit: June 18, 2010, 09:06:38 am by AmbivalentDuck » Logged

A link to the GitHub project where I store all of my Cockatrice decks.
Team TMD - If you feel that team secrecy is bad for Vintage put this in your signature
Any interest in putting together/maintaining a Github Git project that hosts proven decks of all major archetypes and documents their changes over time?
Diakonov
Full Members
Basic User
***
Posts: 758


Hey Now


View Profile
« Reply #3 on: June 18, 2010, 12:31:29 pm »

This is awesome.  I remember discussing it with you a bit--purely in a hypothetical sense--months ago, but I didn't think you were actually going to create it. 

Could you label the nodes in the first Extended chart you presented?
Logged

VINTAGE CONSOLES
VINTAGE MAGIC
VINTAGE JACKETS

Team Hadley

vassago
Basic User
**
Posts: 581


phesago
View Profile Email
« Reply #4 on: June 18, 2010, 06:50:56 pm »

Could you label the nodes in the first Extended chart you presented?

This would be helpful.

Not that it would be too relevant, but what PTQ was it?
Logged

Quote from: M.Solymossy
.... "OMGWTFElephantOnMyFace".
AmbivalentDuck
Tournament Organizers
Basic User
**
Posts: 2807

Exile Ancestral and turn Tiago sideways.

ambivalentduck ambivalentduck ambivalentduck
View Profile
« Reply #5 on: June 19, 2010, 05:58:48 am »

This is awesome.  I remember discussing it with you a bit--purely in a hypothetical sense--months ago, but I didn't think you were actually going to create it. 

Could you label the nodes in the first Extended chart you presented?
That's what I'd like to do next.  It's actually a pain in the ass since matlab doesn't support automatically annotating tree plots.  The other issue is that to "label" a node, the "best" description is really a sum of the preceding principle components.  Those are annoyingly large labels...to the point where maybe it would be best to just write a PHP script that can act as a node browser.

Not that it would be too relevant, but what PTQ was it?
The dataset I've been working with is extended Worlds 2009. 
Logged

A link to the GitHub project where I store all of my Cockatrice decks.
Team TMD - If you feel that team secrecy is bad for Vintage put this in your signature
Any interest in putting together/maintaining a Github Git project that hosts proven decks of all major archetypes and documents their changes over time?
evouga
Basic User
**
Posts: 537


View Profile Email
« Reply #6 on: July 04, 2010, 12:16:33 am »

I think the idea of bringing rigorous statistical analysis to bear on the problem of deck classification is a very intriguing one. But I have a concern.
The meaningfulness of this kind of classification will very much depend on the inner product used to measure the "distance" between deck lists. The ordinary dot product is unsuitable for several reasons:

  • It doesn't take into account functional similarity of cards. If I take an Oath list with 4x Spell Pierce and instead put in 4x Mana Drain, I've not changed the deck nearly as much as if I replace 4x Oath with 4x Mana Drain. Yet the dot product will say both resulting decks are equally distant from the starting one.
  • It gives undue weight to similarity/difference in mana bases, since decks can share large numbers of cards with other decks that happen to have a similar mana base but play very differently.

What are you thoughts on this?
Logged
AmbivalentDuck
Tournament Organizers
Basic User
**
Posts: 2807

Exile Ancestral and turn Tiago sideways.

ambivalentduck ambivalentduck ambivalentduck
View Profile
« Reply #7 on: July 07, 2010, 07:36:37 am »

I agree completely.  It would be nice to put the distance metric in some abstract resource space.   Ie.  By using a card's (likely) impact on specific resources such as the colors of mana, creature butts, hand size, etc.
Logged

A link to the GitHub project where I store all of my Cockatrice decks.
Team TMD - If you feel that team secrecy is bad for Vintage put this in your signature
Any interest in putting together/maintaining a Github Git project that hosts proven decks of all major archetypes and documents their changes over time?
Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.21 | SMF © 2015, Simple Machines Valid XHTML 1.0! Valid CSS!
Page created in 0.039 seconds with 19 queries.