Semi-automated Vintage trend analysis

AmbivalentDuck

Tournament Organizers
Basic User

Posts: 2807

Exile Ancestral and turn Tiago sideways.

Semi-automated Vintage trend analysis

« on: May 08, 2009, 05:15:39 pm »

Right now, we're busily fighting about restrictions and unrestrictions, and card choices in a variety of decks. Unfortunately, everyone is short on numbers...so let's remedy that.

I wrote a script to parse out all of the PT: Kyoto qualifier data to prove a point to Tom LaPille. You can see me cited in his article here. Point is, someone (possibly me) needs to write a parser for Morphling.de and all of the other major tourney report sites.

We need a process similar to what I used for the obelisks but more expansive:
1. Implement a parser in a language with good string support. PHP, Perl, or Python. For obvious reasons, it must be open source.
a. Stick everything into a database. MySQL or PostgreSQL. Put a web front-end out for public inspection (and modification?).
2. Implement a Python, R, Matlab, or SPSS script to do descriptive stats.
a. Most prevalent cards.
b. Archetype recognition (e.g. Tezz -> Tezz, Welder + Thirst - Tezz -> Slaver, etc.)
c. Winningness correlations with deviations from the protolist (as determined by principal component analysis). Matlab or...what else can do this?
d. Multi-variate correlations with winningness by archetype, generic distance from nearest protolist, and region.
e. Make pretty graphs. I like Python and Matlab for this.
3. Maintain a community effort to keep archetype recognition complete and up-to-date. Maintain a community effort to determine parser algorithms.
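
As a rough illustration of steps 1 and 2a, a parser plus a prevalence count might look like the sketch below. The one-card-per-line input format, the regular expression, and the names parse_decklist and card_prevalence are assumptions made for illustration; the actual page layout of Morphling.de is not described in this thread, so a real parser would need its own extraction step first.

```python
import re
from collections import Counter

# Assumed line format: "<quantity> <card name>", e.g. "4 Mana Drain" or "4x Mana Drain".
LINE_RE = re.compile(r"^\s*(\d+)x?\s+(.+?)\s*$")


def parse_decklist(text):
    """Return a Counter mapping card name -> quantity for one decklist."""
    deck = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # lines without a leading quantity (blank, "Sideboard") are skipped
            quantity, name = match.groups()
            deck[name] += int(quantity)
    return deck


def card_prevalence(decks):
    """Count how many decks run at least one copy of each card."""
    prevalence = Counter()
    for deck in decks:
        prevalence.update(deck.keys())
    return prevalence


# Two tiny made-up lists, only to exercise the functions.
sample_a = "4 Mana Drain\n4 Force of Will\n1 Tinker\n"
sample_b = "4x Mishra's Workshop\n4 Smokestack\n1 Tinker\n"
decks = [parse_decklist(sample_a), parse_decklist(sample_b)]
prevalence = card_prevalence(decks)
print(prevalence.most_common(3))
```

Counting decks rather than copies keeps a 4-of from outweighing a restricted 1-of; summing the Counters instead would give total copies played.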

Coding this up isn't a big deal. Figuring out all of the rules is.

If 3+ other people volunteer to take parts of this and maintain them, we could do something really cool. This project is easily much larger than my free time. Oh, and no programming experience required. Just figuring out all of the logic for what defines an archetype would be a huge contribution to the project.

Progress so far:
FlyFlySideOfFry seems to have volunteered to work on some of the archetype recognition logic.
LennonMarx and Meadbert have offered to help.

Time to get started.


« Last Edit: May 10, 2009, 08:29:56 am by AmbivalentDuck »	Logged

A link to the GitHub project where I store all of my Cockatrice decks.
Team TMD - If you feel that team secrecy is bad for Vintage put this in your signature
Any interest in putting together/maintaining a Github Git project that hosts proven decks of all major archetypes and documents their changes over time?

FlyFlySideOfFry

Full Members
Basic User

Posts: 412

Re: Semi-automated Vintage trend analysis

« Reply #1 on: May 08, 2009, 05:34:07 pm »

Just posting to verify that I'm willing to work on figuring out what defines a specific archetype and look over decklists to assign them if necessary. If this thread gets moved could I please get PMed a link so I can keep track of this? Thank you. Smile


	Logged

Quote from: Norm4eva on April 30, 2010, 11:06:40 pm

Mickey Mouse is on a Magic card. Your argument is invalid.

Smmenen

2007 Vintage World Champion
Adepts
Basic User

Posts: 6392

Re: Semi-automated Vintage trend analysis

« Reply #2 on: May 08, 2009, 06:44:45 pm »

I would contest a starting premise of your project: we are not short on numbers. The Vintage community actually has a very rich dataset of Vintage tournaments, centrally located. That doesn't mean your project isn't worthwhile, but I'm just correcting one of your starting assumptions.


	Logged

@SMenendian on Twitter

Check out my podcast!

My Eternal Central Article Archive (new articles)

My Star City Games Article Archive (300+ Vintage articles since 2002)

AmbivalentDuck

Tournament Organizers
Basic User

Posts: 2807

Exile Ancestral and turn Tiago sideways.

Re: Semi-automated Vintage trend analysis

« Reply #3 on: May 08, 2009, 06:47:57 pm »

Yes and no. You haven't weighted top 8 showings by pilot skill and tourney size, for starters. We have numbers...but we could have many more. Since I know you'll end up contesting archetype definitions (and everything else) later, any interest in joining when/if I get a critical mass to start on this?


« Last Edit: May 08, 2009, 07:21:30 pm by AmbivalentDuck »	Logged

tito del monte

Basic User

Posts: 377

Re: Semi-automated Vintage trend analysis

« Reply #4 on: May 09, 2009, 08:29:18 am »

Interesting idea. I'm afraid I have virtually no skills to offer, but I'd be interested to see the data separated into proxy and non-proxy. I don't have premium, so I don't read Smmenen's metagame reports, but it seems that the data they are built on (and subsequent discussions of restrictions/unrestrictions) don't take this into account - and as far as I was aware, Wizards are interested in the state of what to them is 'real' Vintage - i.e. sanctioned Vintage.


	Logged

Author of the e-book SO DO YOU WEAR A CAPE? - The Unofficial Story of Magic: The Gathering

meadbert

Adepts
Basic User

Posts: 1341

Re: Semi-automated Vintage trend analysis

« Reply #5 on: May 09, 2009, 01:58:51 pm »

I am a professional C programmer, but I do not know any of the languages mentioned well. I could learn one to do one part. Let me know how I can help.


	Logged

T1: Arsenal

Rubik_3x3x3

Basic User

Posts: 38

Re: Semi-automated Vintage trend analysis

« Reply #6 on: May 09, 2009, 06:14:55 pm »

I agree with Stephen that we have lots of numbers, but any more in-depth analysis obtainable from those numbers has the potential to be extremely useful. It could also save quite a bit of time and research, which I'm sure many players don't love having to do (shame on them). I can't help with anything too useful, but this looks like a good idea and I just wanted you to know you have support. Who doesn't love information (and pretty graphs!)? However, as pleasant an idea as this seems, don't kill yourself trying to get it done, because Vintage players are entirely competent and can get the majority of the relevant numbers for themselves if they want them.

Thanks, and good luck!


	Logged

LennonMarx

Basic User

Posts: 32

Re: Semi-automated Vintage trend analysis

« Reply #7 on: May 10, 2009, 04:37:49 am »

Hey there, this does sound like an interesting project. I'm a CS student, and am fairly adept at Prolog style logical programming, if that will be helpful for the archetype definition stuff. I'm mostly a Java programmer, but like bert, I could learn one of the languages you wish to use for the project.

~Lennon


	Logged

"There is no such thing as a good play. There is the right play and then there is the mistake" -Jon Finkel

"We are the religious wackjobs of (ostensibly) competitive Magic." -AngryPheldagrif

Team Masquerade

AmbivalentDuck

Tournament Organizers
Basic User

Posts: 2807

Exile Ancestral and turn Tiago sideways.

Re: Semi-automated Vintage trend analysis

« Reply #8 on: May 10, 2009, 09:12:26 am »

Awesome, that's enough people.

I'm not a professional programmer, btw. My research requires me to program in Python, Matlab, and C++ to do analysis on human movements. Anyways, I'm not a pro in any sense at formalizing algorithms or designing data structures that other people will have to use. So, Meadbert, much help would be appreciated there.

Two parallel projects to start with.

1. We need a parser for Morphling.de. I'll volunteer to write it unless someone else has parsing experience?

Meadbert, can you help me figure out how to organize the database we're sticking this into? Something like:

A database of decks:
-Each deck is a table (fields varchar cardname, int quantity?)
-An additional table of decks (fields varchar archetype, pilot, tourney, date dateplayed, some unique serial # or hash, see below)
-An additional table of deck metadata. Once we get done doing all of the PCA analysis and such, we'll want to save the results. A unique hash for each deck makes joins easier?

2. Someone (FlyFlySideOfFry, LennonMarx, or both) start defining archetypes below. People will naturally tear into it and that's a good thing. Sticking to boolean logic is probably good. We'll formalize a language (so it can be read automagically) later. Oh, and is there any benefit to fuzzy definitions?

Just a super quick overview of principal component analysis. Each card that actually sees play in Vintage is a dimension. So, just as width, height, and depth form a basis for defining movements in the physical world, each card that sees play forms part of the basis for defining decks. PCA finds groups of cards that are usually together. So, an example principal component will probably be Workshop, Tangle Wire, Welder, and Trinisphere + some other cards I can't predict offhand. So, kind of like Google Sets.

Where this is going is that you can define archetypes by their principal components and actually say that maybe Slaver (and this is total bs) is 90% Tezzeret and 10% Stax. PCA has the cute property that the basis sets it creates will describe the entire dataset and will rank the relative contribution of each set to all of the data. So, if there's a 4-Drain Tezzeret component that actually accounts for 40% of the data, it raises alarms.

So, pros:
We're going to do it anyways.
It's powerful.

Cons:
It's complicated and only statisticians and engineers already know about it.
It doesn't help your argument when nobody knows what you're saying.
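
To make the PCA overview above a little more concrete, here is a toy sketch. It is not the analysis proposed for the project: the four decks and their card counts are invented, and it finds only the first principal component by power iteration using the standard library, where a real run would more likely use an SVD routine from Octave, Matlab, or a Python numerics library. The names center_columns, covariance_matrix, and first_component are hypothetical.

```python
import math

CARDS = ["Mishra's Workshop", "Smokestack", "Mana Drain", "Force of Will"]
# Made-up decks: each row holds the number of copies of each card in CARDS.
DECKS = [
    [4, 4, 0, 0],
    [4, 3, 0, 0],
    [0, 0, 4, 4],
    [0, 0, 4, 4],
]


def center_columns(rows):
    """Subtract each card's mean count so PCA measures deviation from the average deck."""
    means = [sum(column) / len(rows) for column in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance between every pair of card columns."""
    columns = list(zip(*centered))
    scale = len(centered) - 1
    return [[sum(a * b for a, b in zip(ci, cj)) / scale for cj in columns]
            for ci in columns]


def first_component(cov, iterations=200):
    """Power iteration: returns the dominant eigenvector and its eigenvalue."""
    size = len(cov)
    vector = [1.0 / (i + 1) for i in range(size)]  # arbitrary non-zero start
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    product = [sum(cov[i][j] * vector[j] for j in range(size)) for i in range(size)]
    eigenvalue = sum(v * p for v, p in zip(vector, product))
    return vector, eigenvalue


cov = covariance_matrix(center_columns(DECKS))
component, eigenvalue = first_component(cov)
# Share of total variance carried by this one component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
for card, weight in zip(CARDS, component):
    print(f"{card:20s} {weight:+.2f}")
print(f"share of variance explained: {explained:.0%}")
```

In this toy data the Workshop cards get weights of one sign and the Drain cards the other, which is the "cards that are usually together" grouping described above; the explained share is the ranking of each component's contribution.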


« Last Edit: May 10, 2009, 09:23:35 am by AmbivalentDuck »	Logged

meadbert

Adepts
Basic User

Posts: 1341

Re: Semi-automated Vintage trend analysis

« Reply #9 on: May 10, 2009, 11:39:43 am »

Quote from: AmbivalentDuck on May 10, 2009, 09:12:26 am

Meadbert, can you help me figure out how to organize the database we're sticking this into? Something like:

A database of decks:
-Each deck is a table (fields varchar cardname, int quantity?)
-An additional table of decks (fields varchar archetype, pilot, tourney, date dateplayed, some unique serial # or hash, see below)
-An additional table of deck metadata. Once we get done doing all of the PCA analysis and such, we'll want to save the results. A unique hash for each deck makes joins easier?

So I have a bit of database experience, but my wife uses SQL professionally. She suggests 3 tables.

The first Table is the card table where each row represents a card.
There would be a card_id(sequence number) column and a card name column.

The second table would be the deck table.
It would also contain a unique deck_id.
It might also contain a deck name.

The contents of decks would be stored in a deck configuration table.
This would contain a deck configuration id as well as a card id and a deck id.
Basically this table creates a many-to-many relationship between cards and decks. My wife used the phrase "cross reference table."
The question here is how we represent multiple Mana Drains in one deck. My wife suggests we simply use the same row multiple times.
This may not optimize performance, but she says it will make future queries much easier to write, and since a card rarely shows up more than 4 times in a deck it should not significantly impact performance.
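
A minimal sketch of the three-table layout described above, using Python's built-in sqlite3 module purely so it runs without a server (the thread itself proposes MySQL or PostgreSQL). The table and column names are assumptions based on the description, and the deck contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE card (
    card_id   INTEGER PRIMARY KEY,
    card_name TEXT UNIQUE NOT NULL
);
CREATE TABLE deck (
    deck_id   INTEGER PRIMARY KEY,
    deck_name TEXT
);
-- Cross reference table: one row per physical copy of a card in a deck.
CREATE TABLE deck_configuration (
    deck_configuration_id INTEGER PRIMARY KEY,
    deck_id INTEGER NOT NULL REFERENCES deck(deck_id),
    card_id INTEGER NOT NULL REFERENCES card(card_id)
);
""")

conn.execute("INSERT INTO card (card_id, card_name) VALUES (1, 'Mana Drain'), (2, 'Tinker')")
conn.execute("INSERT INTO deck (deck_id, deck_name) VALUES (1, 'Example Drain deck')")
# Four Mana Drains become four rows; the restricted Tinker is a single row.
conn.executemany(
    "INSERT INTO deck_configuration (deck_id, card_id) VALUES (?, ?)",
    [(1, 1)] * 4 + [(1, 2)],
)

rows = conn.execute("""
    SELECT c.card_name, COUNT(*) AS copies
    FROM deck_configuration AS dc
    JOIN card AS c ON c.card_id = dc.card_id
    WHERE dc.deck_id = ?
    GROUP BY c.card_name
    ORDER BY c.card_name
""", (1,)).fetchall()
print(rows)
```

With one row per copy, the number of copies falls out of a plain COUNT(*), which is the "easier queries" trade-off mentioned above.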

The other way to do this project (the way pedantic C users would do it) is to simply open source both the code and the text database on SourceForge. Then anyone could access either, and anyone with permission could modify them. The whole application could be written in C. To use it you would just need to do an svn up followed by make. This is all extremely convenient for those who use Linux, since make and gcc generally come installed, but my wife has insisted that for real people who still use Windows this would pose an extreme headache, and thus your method is much better.


	Logged


Zygon

Basic User

Posts: 11

Re: Semi-automated Vintage trend analysis

« Reply #10 on: May 10, 2009, 12:02:13 pm »

I program Java/C professionally (I also know some Python and I'm learning Lisp; I've only toyed with Prolog briefly in a grad class) and have had some db experience. I don't really have a ton of time to devote to outside programming due to working all day and having a class on top of that, but this summer I could potentially help out. I've already had in mind (and started) making some analysis tools for my own purposes, as my personal interests have been shifting towards machine learning in the past couple of years.
The first thing I would recommend doing if you're designing a db schema is to research database normalization: http://databases.about.com/od/specificproducts/a/normalization.htm (as just an example) and shoot for at least 3rd normal form. Otherwise, you might end up shooting yourself in the foot down the road if you have to migrate to a more flexible schema. Other than that, for now I'd be happy to review any design documents and provide feedback (PM me for contact info). And I do recommend making a detailed design doc if you're serious about getting down and dirty and want to have a project with developers who aren't co-located.

Sounds like fun! Good luck folks.

-Zygon


	Logged

"They told me that you had gone totally insane, and that your methods were unsound."

AmbivalentDuck

Tournament Organizers
Basic User

Posts: 2807

Exile Ancestral and turn Tiago sideways.

Re: Semi-automated Vintage trend analysis

« Reply #11 on: May 10, 2009, 02:03:31 pm »

I use linux, myself. I'm typing this on an older XP64 machine, but I ssh into a beefy linux machine for all of my development.

Open-sourcing everything on SourceForge is wise, even if the project gets mixed across languages. GNU Octave can handle SVD (and therefore PCA), and it's open source. PHP and Python are open source for parsing (which is unnecessarily annoying to debug in C). Even just picking BLAS and LAPACK implementations for the math is a major debate if we don't use a preset package like Matlab or Octave. Oh, and the code overhead of doing linear algebra in C is enormous. PCRE in C isn't horrible, but I vastly prefer languages built for string processing.

Using make to just stick scripts in the correct places is perfectly possible, no? There are some annoying aspects...like not everyone having an Apache server up and running to stick PHP scripts into. With a project this modular, it may make sense to distribute source code for each module individually?

Also, what additional descriptors do we want? A table of tourneys with turnout and location in a coordinate grid? Do we track pilots and associate them with decks?

@Zygon...any input is greatly appreciated. In particular, I have no experience with design docs since I'm the only user and maintainer of most of the software I write. I'll happily defer to you and Meadbert for that.


	Logged

FlyFlySideOfFry

Full Members
Basic User

Posts: 412

Re: Semi-automated Vintage trend analysis

« Reply #12 on: May 10, 2009, 04:45:35 pm »

Quote from: AmbivalentDuck on May 10, 2009, 09:12:26 am

2. Someone (FlyFlySideOfFry, LennonMarx, or both) start defining archetypes below. People will naturally tear into it and that's a good thing. Sticking to boolean logic is probably good. We'll formalize a language (so it can be read automagically) later. Oh, and is there any benefit to fuzzy definitions?

Just to be sure before I start tossing a few hours into this would I just be defining archetypes in terms of plain English (like CS is a Mana Drain deck based on the synergy between Goblin Welder, expensive artifacts, and Thirst for Knowledge) and looking over decklists to assign lists to an archetype or do I have to know something about programming for it to be useful? Should it be the latter I'm afraid I can't really help and I misunderstood what you were asking for, but as the former I can still gladly help.


	Logged


LennonMarx

Basic User

Posts: 32

Re: Semi-automated Vintage trend analysis

« Reply #13 on: May 10, 2009, 06:55:04 pm »

Quote from: FlyFlySideOfFry on May 10, 2009, 04:45:35 pm

Quote from: AmbivalentDuck on May 10, 2009, 09:12:26 am

I think what he's looking for is something along the lines of (though this may be too formal):
For all X, Uses_Thirsts(X), Has_Welders(X), Has_Robots(X) <--> ControlSlaver(X)

where Uses_Thirsts(), Has_Welders(), Has_Robots(), and ControlSlaver() are relations such that Uses_Thirsts(X) means deck X uses Thirsts and so on (kind of self-explanatory now that I type it out). Please correct me if I am wrong, AD.

~Lennon


	Logged

FlyFlySideOfFry

Full Members
Basic User

Posts: 412

Re: Semi-automated Vintage trend analysis

« Reply #14 on: May 10, 2009, 07:12:46 pm »

Quote from: LennonMarx on May 10, 2009, 06:55:04 pm

Quote from: FlyFlySideOfFry on May 10, 2009, 04:45:35 pm

Quote from: AmbivalentDuck on May 10, 2009, 09:12:26 am

Ah I was under the impression that it would be a manual thing rather than just plugging decklists into a program. Well good luck I look forward to seeing how it turns out. Smile


	Logged


LennonMarx

Basic User

Posts: 32

Re: Semi-automated Vintage trend analysis

« Reply #15 on: May 10, 2009, 08:47:37 pm »

Quote from: FlyFlySideOfFry on May 10, 2009, 07:12:46 pm

Quote from: LennonMarx on May 10, 2009, 06:55:04 pm

Quote from: FlyFlySideOfFry on May 10, 2009, 04:45:35 pm

Quote from: AmbivalentDuck on May 10, 2009, 09:12:26 am

Ah I was under the impression that it would be a manual thing rather than just plugging decklists into a program. Well good luck I look forward to seeing how it turns out. Smile

That is manual, just in formal-logic notation. Again, I'm not even sure if that is what AD was looking for, it's just how I read it.


	Logged

AmbivalentDuck

Tournament Organizers
Basic User

Posts: 2807

Exile Ancestral and turn Tiago sideways.

Re: Semi-automated Vintage trend analysis

« Reply #16 on: May 10, 2009, 09:15:22 pm »

It is kind of manual, but notated in logic.

Ie. Tezzeret has Tezz, Vault, Drains, and no Oath of Druids main.
Fatespinner Ichorid has Fatespinners, Bazaar of Baghdad, and Bridge from Below

All decks that meet the description are of the archetype. All decks of the archetype fit the description.
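
Rules of this kind translate fairly directly into code. The sketch below is only an illustration: the full card names are an assumed expansion of the shorthand above (Tezz, Vault, Drains), the Slaver rule follows the "Welder + Thirst - Tezz" shorthand from the first post, and the names has, ARCHETYPE_RULES, and classify are hypothetical. A deck here is a dict of maindeck card name to quantity.

```python
def has(deck, card, minimum=1):
    """True if the maindeck runs at least `minimum` copies of `card`."""
    return deck.get(card, 0) >= minimum


# Each archetype is a boolean rule over a maindeck (card name -> quantity).
ARCHETYPE_RULES = {
    "Tezzeret": lambda deck: (
        has(deck, "Tezzeret the Seeker")
        and has(deck, "Time Vault")
        and has(deck, "Mana Drain")
        and not has(deck, "Oath of Druids")
    ),
    "Slaver": lambda deck: (
        has(deck, "Goblin Welder")
        and has(deck, "Thirst for Knowledge")
        and not has(deck, "Tezzeret the Seeker")
    ),
}


def classify(deck):
    """Return the first archetype whose rule matches, else 'Unclassified'."""
    for archetype, rule in ARCHETYPE_RULES.items():
        if rule(deck):
            return archetype
    return "Unclassified"


# Made-up partial lists, only to exercise the rules.
tezz_list = {"Tezzeret the Seeker": 2, "Time Vault": 1, "Mana Drain": 4}
oath_list = {"Tezzeret the Seeker": 1, "Time Vault": 1, "Mana Drain": 4, "Oath of Druids": 4}
print(classify(tezz_list), classify(oath_list))
```

Because the rules are plain data, the community could argue over and edit the definitions without touching the classification loop.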


	Logged

patat

Basic User

Posts: 16

Re: Semi-automated Vintage trend analysis

« Reply #17 on: May 10, 2009, 11:38:50 pm »

Correct me if I'm wrong, but I feel that if the decks are categorized into their own specific archetypes one by one, you may get a lot of unsorted, hard-to-deal-with data, no?

It seems to me that if decks were to be categorized by engines, parsing and categorizing archetypes might be much easier.

I'm obviously not much of an avid poster, but quoting nataz from the vintage tiers thread:

Quote

...the "engines" are Drain, Bazaar, Workshop, Dark Ritual, and Null Rod (not really an engine, so perhaps a different word like core strategies would be better)...

It's not quite in context, but it contains the core of what I am trying to say. So from this point, you can define most decks in terms of a couple different key factors:

1) Main engines (Drain, Bazaar, etc.)
2) Key components defining an archetype (Remora has a different draw engine than a Tezz-based deck, and thus different key cards)
3) Tweaks to that specific archetype

I guess even win conditions could be an important category, maybe in between 1 and 2 in there. I just wanted to mention I think it might be easier to parse through that data if you were able to sort through a tree-like structure before creating a label/archetype name for that deck.
Also, I feel it's pretty important to have a location category to the decks being played. Possibly by country/region (in the case of the US, the east and west coast metas are quite different.. =P) I wouldn't know how specific or broad to have a category like that, so I'll leave that thought to others to decide on.

Just the two cents from a TMD newbie with a little bit of coding knowledge. Hope it helps out. Also, I really like the idea of what it is you're trying to do. I spend a good amount of time on TMD and other sites learning what I can to become more knowledgeable about playing Type 1, and it's not a simple task. Having something like this accessible to all players, new and old, is something that I feel could bring a good amount of knowledge to a community trying to recruit new Vintage players, especially with it being said that Vintage is the hardest format to get into. Good luck with your task, and if you have any grunt work needing to be done, I may be able to help. =)


	Logged

Diakonov

Full Members
Basic User

Posts: 758

Hey Now

Re: Semi-automated Vintage trend analysis

« Reply #18 on: May 11, 2009, 09:03:18 am »

Patat has a good point, and to go further on that path, we have to remember that there is sort of a cascading effect of sets and subsets in the categorization of "archetypes." It's rather unfortunate that the general Vintage community uses that term so loosely (but it would be unrealistic not to), because it is such a gray concept at the surface to begin with. In other words, there are hybrids between archetypes that don't automatically fit right in one or the other exclusively. Many of these archetype-sets are intersecting.

To get any useful analysis, though, we have to draw solid lines in some places, even though they might be a tiny bit off/arbitrary.

I think that when you look at any deck, the first and most important thing about it is general strategy. For example, at the first level of categorization, Stax and Fish might together be in the same set, which is "play permanents that disrupt and prevent your opponent from winning quickly." Then we might expect Ichorid and TPS to be in the same set at this level, even though they use different engines. AFTER that, you can start to break down by engine.

This is where it can get weird: Ichorid and Stax might both use the Bazaar engine but for totally different strategies. For a perfect model you would need to look at it from an additional dimension, supposing you wanted to somehow be able to classify those two decks under the same set title of "Bazaar decks." This is still important for analysis, because sometimes it is the environment that allows a certain strategy to dominate (i.e. times when fast combo in general is just better) as compared to times when it is truly the engine at fault, where you might theoretically see Stax and Ichorid dominating strictly because of the power of Bazaar. More complicated still, sometimes it is specifically a composition of a strategy and an engine that is the problem. (This is probably the more frequent one.)

I don't have a lot to offer on how to organize this. Sorry. Just another thing to consider when you do start the classification system.


	Logged

VINTAGE CONSOLES
VINTAGE MAGIC
VINTAGE JACKETS

Team Hadley

Smmenen

2007 Vintage World Champion
Adepts
Basic User

Posts: 6392

Re: Semi-automated Vintage trend analysis

« Reply #19 on: May 11, 2009, 09:36:18 am »

With respect to the classification question, the greatest difficulty I have encountered is classifying various Fish and Workshop archetypes. If you can cut this gordian knot, then I think you can probably classify anything else.

For example, I think the most obvious distinction is Workshop Prison and Workshop Beatdown. In the vast majority of cases, you can perceive an emphasis. For example, if the deck has 4 Smokestack, it usually doesn't have 4 Juggernaut. And if it has 3-4 Arcbound Ravager and 4 Juggernaut, it usually doesn't have 4 Smokestack. The problem is the hybrids, like Aggro MUD. Or the Trinisphere era lists that ran 4 Stack and 4 Juggernaut. Also, how do you differentiate between the various colors of Workshop builds? Should we distinguish between UR Stax and 5c? Uba v. 5c? More importantly, what about mono-brown builds with Metalworker versus the other versions?


	Logged


hundredpercentjuice

Basic User

Posts: 2

Re: Semi-automated Vintage trend analysis

« Reply #20 on: May 11, 2009, 11:16:58 am »

Hey guys-
I'm at work right now, so I can't look this over very closely, but I have extensive experience in this sort of project.

For my own edification, I've done similar analysis/scraping in the past for MODO events and Deckcheck.net.

If someone will take the design lead, I will loan out my coding experience. -js

(also, the schema noted here is pretty similar to the one I used, if Meadbert's wife wouldn't mind helping with that part, PM me and I'll drop you my models.py from the django interface I use for this sort of thing)


	Logged

FlyFlySideOfFry

Full Members
Basic User

Posts: 412

Re: Semi-automated Vintage trend analysis

« Reply #21 on: May 11, 2009, 12:22:24 pm »

I think decks should be categorized by their goals, and then there should be an additional list for just plain specific card analysis. It really becomes quite simple categorizing decks if you look at the goals, because hybrids become their own archetype. For example, Shop Aggro's goal is to use Mishra's Workshop to power out huge creatures and enough soft locks to stall with them, while the goal of Stax is to use Mishra's Workshop to power out enough soft locks to create a hard lock and wrap it up with a finisher. Thus it isn't the presence of something like Juggernaut that separates the archetypes; it becomes something like a specific look at non-Welder creature density. Blue is blue, water is wet, and Stax is Stax regardless of the colors or presence of Uba Mask. However, once you incorporate the two-tiered analysis that I suggest, you still don't lose the specific card statistics that would be able to show a clear distinction between the dominance of one decklist over another within a select archetype. Of course this would be a lot harder than just a one-tiered analysis, but it would definitely be more useful in my opinion.


	Logged


Smmenen

2007 Vintage World Champion
Adepts
Basic User

Posts: 6392

Re: Semi-automated Vintage trend analysis

« Reply #22 on: May 11, 2009, 12:58:48 pm »

Quote from: FlyFlySideOfFry on May 11, 2009, 12:22:24 pm

That's all easier said than done. What if the deck has multiple goals? What if it can kill you by either locking you out or attacking with Juggernaut? That's the problem. It's that many Vintage decks are hybrids. There are Painter-Remora Decks, Tez-Remora decks, Bomberman-Tez decks, etc. How do we categorize these when they split right down the middle?


	Logged


meadbert

Adepts
Basic User

Posts: 1341

Re: Semi-automated Vintage trend analysis

« Reply #23 on: May 11, 2009, 01:06:57 pm »

Quote from: Smmenen on May 11, 2009, 12:58:48 pm

Quote from: FlyFlySideOfFry on May 11, 2009, 12:22:24 pm

Categorizing decks can be very difficult. My suggestion would be to do something simple, like listing an example deck for each archetype and then categorizing each deck by which example deck it has the most cards in common with.
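
A minimal sketch of that nearest-example idea, assuming decks are stored as card name to quantity counts. The example lists are made-up fragments rather than real tournament lists, and cards_in_common and nearest_archetype are hypothetical names.

```python
from collections import Counter


def cards_in_common(deck, example):
    """Number of shared cards, counting copies (4 vs 3 Smokestack shares 3)."""
    return sum((deck & example).values())


def nearest_archetype(deck, examples):
    """Name of the example deck sharing the most cards with `deck`."""
    return max(examples, key=lambda name: cards_in_common(deck, examples[name]))


# Made-up fragments standing in for full example lists.
EXAMPLES = {
    "Stax": Counter({"Mishra's Workshop": 4, "Smokestack": 4, "Goblin Welder": 4}),
    "Tezzeret": Counter({"Mana Drain": 4, "Force of Will": 4, "Time Vault": 1}),
}

unknown = Counter({"Mishra's Workshop": 4, "Smokestack": 3, "Juggernaut": 4})
print(nearest_archetype(unknown, EXAMPLES))
```

Ties go to whichever example is listed first, so an even hybrid would still need a tie-breaking rule or an "unclassified" threshold.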


	Logged


AmbivalentDuck

Tournament Organizers
Basic User

Posts: 2807

Exile Ancestral and turn Tiago sideways.

Re: Semi-automated Vintage trend analysis

« Reply #24 on: May 11, 2009, 01:25:24 pm »

Well, I got at this earlier...we're stuck choosing between heuristics and principal components. The heuristics will make more sense, but the principal components (by definition) will more thoroughly describe the system.

I intend to do both, but I want heuristics first. My reasoning is that getting this project going will take individual steps that yield results the community can use. Even if after applying heuristics for the easiest 15 decks, we have some 'unclassified,' so what? A database and a trivial knowledge of SQL will let anyone repeat Steve's most recent analysis of drain dominance whenever they want almost on a whim. Huge first step by itself. And heuristics are the faster path there than PCA since I don't know how long that kind of analysis will even take and whether I'll need to code it up in CUDA to get it done in a sane amount of time.

Let's ignore the hard cases to start with (and leave them for PCA). And just define the archetypes that most matter and have a decent chance of t8ing.


« Last Edit: May 11, 2009, 02:07:26 pm by AmbivalentDuck »	Logged

Smmenen

2007 Vintage World Champion
Adepts
Basic User

Posts: 6392

Re: Semi-automated Vintage trend analysis

« Reply #25 on: May 11, 2009, 01:27:31 pm »

One of the most useful things for Vintage is that the four major engines: Drains, Rituals, Workshops, and Bazaars almost never overlap. That's almost always going to be your starting point. Force of Wills, however, can be found with all four.
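
As a sketch of that starting point, a first-pass split by engine could look like this. The mapping from engine to a single marker card is a simplification assumed for illustration, as are the names ENGINE_CARDS, engines, and primary_engine; Force of Will is deliberately left out because it appears alongside all four engines.

```python
# Assumed marker card for each of the four engines named above.
ENGINE_CARDS = {
    "Drain": "Mana Drain",
    "Ritual": "Dark Ritual",
    "Workshop": "Mishra's Workshop",
    "Bazaar": "Bazaar of Baghdad",
}


def engines(deck):
    """Sorted list of engines whose marker card appears in the deck."""
    return sorted(name for name, card in ENGINE_CARDS.items() if deck.get(card, 0) > 0)


def primary_engine(deck):
    """Engine name, 'hybrid' if several overlap, 'other' if none is present."""
    found = engines(deck)
    if len(found) == 1:
        return found[0]
    return "hybrid" if found else "other"


# Made-up partial lists (card name -> quantity).
stax = {"Mishra's Workshop": 4, "Smokestack": 4}
bazaar_stax = {"Mishra's Workshop": 4, "Bazaar of Baghdad": 4}
print(primary_engine(stax), primary_engine(bazaar_stax))
```

The rare overlaps land in a "hybrid" bucket for manual review rather than being forced into one engine.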


	Logged


Harlequin

Full Members
Basic User

Posts: 1860

Re: Semi-automated Vintage trend analysis

« Reply #26 on: May 11, 2009, 02:51:49 pm »

As far as structure goes, I work with Databases all the time. And across the few systems I help maintain, I've seen both the benefits of good structure ... and the headaches caused by poorly designed structure.

I would suggest something like this:

Table Deck_Report

Key - Date
Key - Event_Name/ID
Key - Person_ID
Field - Deck_ID

Table Deck_Master

Key - Deck_ID
Field - Architype_ID
Field - ?
Field - ?? *whatever stats you want to gather at a deck-id level.

Table Deck_Reg

Key - Deck_ID
Key - Card_ID
Key - MainDeck_TF (or some other indicator for maindeck vs. sideboard)
Field - Card_Quantity

Table BRList_Hist

Key - Card_ID
Key - Effective Date
Field - BR_Code (so for example: 1 = became restricted, -1 = became banned, 0 = became unrestricted/unbanned aka a 4-of)

Table Card_Description

Key - Card_ID
Field - Card_Name
Field - Ori_Printed_Date
Field - ? (again you could include whatever other stats help your argument)

=====================================================================

As for the key to the card_id, I would suggest using the first printing of that card and then just concatenating its [Set][Card Number], similar to MWS.

These tables should give you everything you need for gathering your stats, as well as the ability to easily and quickly check the legality of deck lists: not only simple sum(qty) >= 60 checks, but also making sure that cards met the B/R lists of that day (by building an old-school select max subquery, or using some of the SQL analytics stuff).
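
Here is one way the "select max subquery" check could be written, sketched with Python's sqlite3 so it runs standalone. The card ID X001, the deck ID D1, and all dates are placeholders rather than real B/R history, and the column Effective_Date is the "Effective Date" key with an underscore added. The copy limits (1 if restricted, 0 if banned, 4 otherwise) are the usual reading of the BR_Code values; cards with no B/R history rows are not checked at all in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BRList_Hist (
    Card_ID        TEXT NOT NULL,
    Effective_Date TEXT NOT NULL,     -- ISO dates compare correctly as text
    BR_Code        INTEGER NOT NULL,  -- 1 restricted, -1 banned, 0 unrestricted
    PRIMARY KEY (Card_ID, Effective_Date)
);
CREATE TABLE Deck_Reg (
    Deck_ID       TEXT NOT NULL,
    Card_ID       TEXT NOT NULL,
    MainDeck_TF   INTEGER NOT NULL,   -- 1 maindeck, 0 sideboard
    Card_Quantity INTEGER NOT NULL,
    PRIMARY KEY (Deck_ID, Card_ID, MainDeck_TF)
);
-- Placeholder history: X001 starts unrestricted, later becomes restricted.
INSERT INTO BRList_Hist VALUES ('X001', '2000-01-01', 0), ('X001', '2005-01-01', 1);
-- Deck D1 runs 2 copies main and 1 in the sideboard.
INSERT INTO Deck_Reg VALUES ('D1', 'X001', 1, 2), ('D1', 'X001', 0, 1);
""")

VIOLATIONS_SQL = """
SELECT r.Card_ID, SUM(r.Card_Quantity) AS copies, b.BR_Code
FROM Deck_Reg AS r
JOIN BRList_Hist AS b ON b.Card_ID = r.Card_ID
WHERE r.Deck_ID = :deck_id
  AND b.Effective_Date = (
        -- latest B/R change for this card on or before the day it was played
        SELECT MAX(h.Effective_Date)
        FROM BRList_Hist AS h
        WHERE h.Card_ID = r.Card_ID AND h.Effective_Date <= :played_on
  )
GROUP BY r.Card_ID, b.BR_Code
HAVING (b.BR_Code = 1 AND SUM(r.Card_Quantity) > 1)
    OR (b.BR_Code = -1 AND SUM(r.Card_Quantity) > 0)
    OR (b.BR_Code = 0 AND SUM(r.Card_Quantity) > 4)
"""


def violations(deck_id, played_on):
    """Rows of (Card_ID, copies, BR_Code) that break the list in force on that date."""
    params = {"deck_id": deck_id, "played_on": played_on}
    return conn.execute(VIOLATIONS_SQL, params).fetchall()


print(violations("D1", "2003-06-01"))  # 3 copies while unrestricted
print(violations("D1", "2006-06-01"))  # 3 copies after the restriction
```

Main and sideboard rows are summed before the comparison, since restriction applies to the combined 75.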

You'd also want a de-duping job that would go into the deck list table and try to find exact replicas, then change all instances of one ID to the other and remove the duplicate list from the deck list table. So if Jer and I happen to run the same exact list (right down to the sideboard) we would have two entries on the Deck_Report (one for each person), but each entry would reference the same Deck_ID.
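
One possible shape for that de-duping job is to fingerprint each list in a canonical order so that identical lists always hash the same. This is a sketch under assumptions: the card IDs other than DI133 are invented, the choice of SHA-256 is arbitrary, and deck_fingerprint and assign_deck_ids are hypothetical names.

```python
import hashlib


def deck_fingerprint(main, side):
    """Hash a canonical text form of the list so identical lists get the same value."""
    lines = []
    for zone, cards in (("main", main), ("side", side)):
        for card_id in sorted(cards):  # sorting removes any dependence on entry order
            if cards[card_id] > 0:
                lines.append(f"{zone}|{card_id}|{cards[card_id]}")
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def assign_deck_ids(reports):
    """Map each pilot to a deck ID, reusing the ID when two lists are identical."""
    ids_by_fingerprint = {}
    deck_id_by_pilot = {}
    for pilot, main, side in reports:
        fingerprint = deck_fingerprint(main, side)
        deck_id = ids_by_fingerprint.setdefault(fingerprint, len(ids_by_fingerprint) + 1)
        deck_id_by_pilot[pilot] = deck_id
    return deck_id_by_pilot


# Made-up reports: (pilot, maindeck, sideboard) with card ID -> quantity.
reports = [
    ("Jeff", {"DI133": 2, "C001": 4}, {"C002": 2}),
    ("Jer", {"C001": 4, "DI133": 2}, {"C002": 2}),    # same list, entered in another order
    ("Third pilot", {"C001": 4, "DI133": 2}, {"C003": 2}),  # different sideboard
]
print(assign_deck_ids(reports))
```

Storing the fingerprint alongside Deck_ID would also give the unique hash suggested earlier in the thread for joining metadata.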

As for archetypes, you ~could~ use archetype as part of the Deck_ID key. For example, say Jer and I run very similar lists, but I run 2 Shattering Sprees in the sideboard and he runs 2 Ingot Chewers. You could have them start with the same archetype ID, but then have a unique identifier as another part of the key. So for example:

Jeff - Deck_ID = 12347123 (UBR Tezz with sideboard Shattering Spree)
Jer - Deck_ID = 12348128 (UBR Tezz with sideboard Ingot Chewer)

Or:

Jeff - Arch_ID = 1234, Deck_ID = 0001
Jer - Arch_ID = 1234, Deck_ID = 0003
(where 1234 = "UBR TEZZ" for example, and each Deck_ID is an example of that archetype)

You would still need the full two-part key to find each person's deck, but it makes for simpler queries like "Give me every list that falls under the heading of 'UBR TEZZ'" (Arch_ID = 1234, as opposed to Deck_ID Like '1234*'). It's sort of semantics at that point; you could make a 15-part key if you needed 15 levels of separation.
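
A small sketch of the two-part key version. The Deck_List table name and its Description column are hypothetical (the post only specifies Arch_ID and Deck_ID), and the third row is made-up filler so the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Deck_List (          -- hypothetical table name
    Arch_ID     INTEGER NOT NULL,
    Deck_ID     INTEGER NOT NULL,
    Description TEXT,
    PRIMARY KEY (Arch_ID, Deck_ID)
);
INSERT INTO Deck_List VALUES (1234, 1, 'UBR Tezz, Shattering Spree sideboard');
INSERT INTO Deck_List VALUES (1234, 3, 'UBR Tezz, Ingot Chewer sideboard');
INSERT INTO Deck_List VALUES (5678, 1, 'some other archetype');
""")


def decks_in_archetype(conn, arch_id):
    """Every list under one archetype: a plain equality test, no LIKE needed."""
    rows = conn.execute(
        "SELECT Deck_ID FROM Deck_List WHERE Arch_ID = ? ORDER BY Deck_ID",
        (arch_id,),
    )
    return [r[0] for r in rows]


print(decks_in_archetype(conn, 1234))
```

Note that Deck_ID 1 appears under two archetypes without conflict, which is why the full two-part key is still needed to identify a single list.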

With the B/R table you could also quickly reference which cards were legal at the time a deck was played. For example, if you were looking at a deck with 1 Gifts Ungiven in it, you may not immediately be able to tell whether that card was restricted at the time. This could help carve out better statistics quickly, especially when things change. You could also analyze how many restricted cards each archetype runs on average, and whether that number goes up or down as the list changes. I think if one of the main purposes of this database is research and insight into B/R changes, leaving out this type of historical table would make analysis of the raw stats overly complex.
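
The "was this card restricted when the deck was played?" lookup can also be sketched outside SQL. This assumes the BRList_Hist rows have been loaded into a dict of date-sorted lists, uses ISO date strings, and treats a card with no history as unrestricted (code 0). The names are illustrative only.

```python
from bisect import bisect_right


def br_status(history, card_id, on_date):
    """Return the BR_Code in force for card_id on on_date.

    history: {card_id: [(effective_date_iso, br_code), ...]} sorted by date.
    """
    entries = history.get(card_id, [])
    dates = [d for d, _ in entries]
    i = bisect_right(dates, on_date)  # number of entries effective by on_date
    return entries[i - 1][1] if i else 0


def restricted_count(deck_cards, history, played):
    """How many distinct cards in the deck were restricted (code 1) when played."""
    return sum(1 for c in set(deck_cards) if br_status(history, c, played) == 1)
```

Averaging restricted_count over every list in an archetype, grouped by period, would give the "does this number go up or down" trend.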

Also, in the Card_Description table you could include the date the card was originally printed/legal in Vintage. Again, this makes certain types of inquiries much easier (for example, how many 'new cards' are played in the 12 months after a set is released, on average across all sets).

I wouldn't run a separate row for each 'card' in a deck. It overcomplicates the index for that table for no gain. So for example:
Table Deck_Reg

Key - Deck_ID
Key - Card_ID
Key - MainDeck_TF (or some other indicator for maindeck vs. sideboard)
Field - Card_Quantity

You'd have:
Deck-ID | Card-ID | MD_TF | qty
11349971 | DI133 | True | 2
11349971 | DI133 | False | 1
So that's how I would show a deck that ran 2 maindeck Trygon Predators (first printed as card 133 in Dissension), and 1 in the sideboard. No key violation.

The other way would look like...
Deck-ID | Card_In_Deck | Card-ID
11349971 | MD_27 | DI133
11349971 | MD_28 | DI133
...
11349971 | SB_14 | DI133

Without Card_In_Deck, you'd end up with a key violation even if you made the entire table the key.

That gives you two complications I can think of off the top of my head...
#1 Comparing two decks that are exact replicas of one another. So for example again, Jer and I are playing identical decks, but in my list I put Trygon Predator as the 4th card on my reg sheet, and Jer puts it as his 14th. If we don't use some sort of sorting algorithm when the decks are entered, we end up with two identical lists that don't appear identical in the database.
#2 You always have to do some sort of aggregating on Deck-ID / Card-ID to get anything useful. So for example, if you wanted to know "How many decks run any card in the sideboard that they also run maindeck?", in the first structure it's a fairly simple self-join where the MD_TF values differ. In the other structure, you have to do a little more grouping before you join.
EDIT: a 3rd I thought of is if you wanted to do something like "Show me every deck that ran more than 1 maindeck In the Eyes of Chaos". It would be a simple where clause in the first method. But the 2nd method would need a more complex where clause whose results would have to be de-duped (probably with select distinct).
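
To make the comparison concrete, here is a sketch of those queries against both layouts, using the Trygon Predator rows from the example above. Deck_Slots is a made-up name for the one-row-per-card table, and MainDeck_TF is stored as 1/0. For the slot layout I used GROUP BY / HAVING, which is one of several ways to collapse the per-slot rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Deck_Reg (Deck_ID INTEGER, Card_ID TEXT, MainDeck_TF INTEGER,
                       Card_Quantity INTEGER,
                       PRIMARY KEY (Deck_ID, Card_ID, MainDeck_TF));
CREATE TABLE Deck_Slots (Deck_ID INTEGER, Card_In_Deck TEXT, Card_ID TEXT,
                         PRIMARY KEY (Deck_ID, Card_In_Deck));
INSERT INTO Deck_Reg VALUES (11349971, 'DI133', 1, 2);
INSERT INTO Deck_Reg VALUES (11349971, 'DI133', 0, 1);
INSERT INTO Deck_Slots VALUES (11349971, 'MD_27', 'DI133');
INSERT INTO Deck_Slots VALUES (11349971, 'MD_28', 'DI133');
INSERT INTO Deck_Slots VALUES (11349971, 'SB_14', 'DI133');
""")

# Quantity layout: decks running the same card main and side, via a self-join.
BOTH_QTY = """
SELECT DISTINCT m.Deck_ID
FROM Deck_Reg m
JOIN Deck_Reg s ON s.Deck_ID = m.Deck_ID AND s.Card_ID = m.Card_ID
WHERE m.MainDeck_TF = 1 AND s.MainDeck_TF = 0
"""

# Quantity layout: "more than 1 maindeck copy" is a plain WHERE clause.
MULTI_QTY = """
SELECT Deck_ID FROM Deck_Reg
WHERE Card_ID = ? AND MainDeck_TF = 1 AND Card_Quantity > 1
"""

# Slot layout: the same question has to count rows per deck first.
MULTI_SLOT = """
SELECT Deck_ID FROM Deck_Slots
WHERE Card_ID = ? AND Card_In_Deck LIKE 'MD%'
GROUP BY Deck_ID
HAVING COUNT(*) > 1
"""

both = conn.execute(BOTH_QTY).fetchall()
multi_qty = conn.execute(MULTI_QTY, ("DI133",)).fetchall()
multi_slot = conn.execute(MULTI_SLOT, ("DI133",)).fetchall()
```

All three queries return the same single deck here; the difference is only in how much grouping the slot layout forces on every question.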


« Last Edit: May 11, 2009, 03:29:27 pm by Harlequin »

Member of Team ~ R&D ~

AmbivalentDuck

Tournament Organizers
Basic User

Posts: 2807

Exile Ancestral and turn Tiago sideways.

Re: Semi-automated Vintage trend analysis

« Reply #27 on: May 12, 2009, 09:46:13 am »

@Harlequin
You've just completely lost me. Want to volunteer to handle the SQL end of this since you seem to have done most of the design work in your head already?



Harlequin

Full Members
Basic User

Posts: 1860

Re: Semi-automated Vintage trend analysis

« Reply #28 on: May 12, 2009, 02:59:36 pm »

I can help with the design of it no problem.

But the problem is that I work basically 100% on mainframe stuff. I'm way behind the times when it comes to internet front-ends, portals, etc. So I can mock up the tables and put in examples in something like an MS Access DB, but someone would have to take that and port it over to MySQL or some other net-friendly platform, so we can access it through that series of tubes known as "the internets" ;)



Member of Team ~ R&D ~

emidln

Basic User

Posts: 437

Re: Semi-automated Vintage trend analysis

« Reply #29 on: May 12, 2009, 04:34:25 pm »

I can help out. I use Python daily at work. I'd be willing to work up a frontend (it'd be using something like Apache/Django to save time unless someone objects). If someone is more familiar with another language I could probably use that too (we consultants end up very adaptable).

Edit: Moving between databases is pretty easy these days. I'm partial to an abstraction layer called SQLAlchemy, which is what we use to keep our code portable (mostly) between Oracle, Access, MySQL, SQL Server 2000, and SQLite at work. There are several others for Python, and similar packages exist for most environments.


« Last Edit: May 12, 2009, 04:42:29 pm by emidln »

BZK! - The Vintage Lightning War
