[Premium Article] The hottest deck in Vintage, GWB Beatdown, and Other Vintage

Yeah, I don't know why I didn't think of doing polls years ago. The results are fascinating.

Here is my response to Kevin Binswanger's SCG questions:

Kevin Binswanger wrote:

Quote

Then, last June, the DCI restricted five cards: Merchant Scroll, Brainstorm, Ponder, Gush, and Flash. At a minimum, it is apparent that neither Gush nor Flash needs to be restricted with Merchant Scroll, Brainstorm, and Ponder all restricted.

I don't think this has been proven at all.

Kevin, I do not mind responding to your feedback and addressing your points. But I would appreciate the courtesy of more than bald, directly contrary assertions. This is pretty much the responsive equivalent of saying: “I think you are wrong.” It would be one thing if this were a minor point or a tangent, but the argument that the June 2008 restrictions were overbroad is one of the major arguments I take pains to develop over the course of a 23-page article. Large swaths of that article were dedicated to unpacking the proposition that Gush and Flash were unnecessary restrictions. If you did not find those arguments persuasive, then you could at least do me the courtesy of explaining why, rather than simply asserting that I am wrong.

Not to mention, addressing your point actually requires that I re-state arguments already developed in the article. So, responding to your claims amounts to nothing less than an article re-write.

The arguments I presented were: a) the DCI has a habit of over-restricting; b) this is most likely when they restrict multiple cards with the same objective in mind; c) Gush, Scroll, Brainstorm, Ponder, and Flash were cards that were used together, and their restrictions were aimed at the same objectives; and d) any single one of those restrictions would have had a profound impact on the respective decks. These arguments, and their fuller elaboration throughout my article last week and in earlier articles, constitute very strong evidence and make a powerful case that at least some of the June 20 restrictions were unnecessary. I’m not sure what you consider to be ‘proof.’ It might not rise to the level of scientific proof, but it’s enough to take to a jury. Is it that hard to believe that restricting just Merchant Scroll and Brainstorm would have stunted both Flash and Gush decks to such a degree that additional restrictions were unnecessary?

But putting the content of my argument aside for a moment, the standard you’ve set is unreasonable if by ‘proof’ you are asking for more than reasonably strong arguments. Sure, my case is not 100% airtight. It’s theoretically possible that Flash and Gush both deserve restriction even with the other three restrictions in place, but anything is theoretically possible in Magic. This is not a hard science. This is easy to see if you flip the script. When was the last time any DCI or Wizards decision met that standard of proof?

You are holding my argument that Gush and Flash shouldn’t have been restricted to a high level of proof, but the evidence I presented, by any reasonable measure, makes a stronger case than the evidence presented by the DCI in support of their decision. If I have not proven that these cards did not deserve restriction, has the DCI proven that they needed restriction? They have not once advanced arguments that would explain why Gush and Flash need to be restricted if Brainstorm, Merchant Scroll, and Ponder were each restricted.

Their first attempt (Explanation 1) at explaining these restrictions was so unbelievably wanting in any logic whatsoever that the DCI was shamed into issuing a more detailed explanation, hardly something that inspires confidence. The second (Explanation 2) was more detailed, but left major questions unanswered, such as the timing of the decisions and their scope. For example, why restrict Flash on June 20 instead of March 20 or September 20? Why restrict Flash if Brainstorm and Scroll were restricted? Not only were the restrictions unconnected in any clear way to actual data, their analysis never even acknowledged that the various restrictions would have an effect on the need for the others. Tom LaPille tried to make the case a third time (Explanation 3), and while there was even more detail this time, the reasoning was somewhat astonishing. Tom LaPille stated that they were pursuing two particular objectives. The first objective was to balance the metagame by engine. Yet, predictably, the restrictions had precisely the opposite effect (http://img8.imageshack.us/img8/1205/metagamechartbyengine.png). It’s not just that the restrictions are overbroad and questionably timed (especially considering the data), it’s that the objectives don’t even match the means chosen! Regardless, these brief excerpts, with numerous gaps in reasoning and no answer to the most critical questions regarding the restrictions, amount to far less than the standard of proof to which you are holding my arguments. At a minimum, the DCI has failed to make the case that each of these cards deserved restriction, despite a record three attempts to do so. It’s obvious why: because, as I’ve shown, the DCI tends to over-restrict. Not once did they explain why Gush and Flash needed to be restricted if Ponder, Brainstorm, or Scroll were restricted.

Quote

there is a fundamental limit on how fast you can make a two card combo in Vintage, even if every single Vintage card were unrestricted. That limit is roughly 70-80% turn one goldfish

Out of curiosity, where does this number come from? Running out statistics with no idea where they come from makes it look like you're making stuff up, whether or not you are.

The sentences immediately following the quote you excerpted contain not only a citation for that statistic, but also an explanation of it.

Quote

If it's just because of your experiments, I'm sort of concerned; your personal experiments don't necessarily represent the truth of the format.

Isn’t that true of anything I say, taking you at your word? You can apply that to anything that isn’t a rule of the game or deck construction. My testing doesn’t mean that my decks are good. My tournament victory doesn’t mean that my deck was good. My analysis doesn’t mean that I’m right. That’s why I try to support my contentions with reasons to think that they are true.

To recap:

First, probability governs the chances of seeing two of any given 4-ofs in the opening seven of a 60-card deck. Second, tutors can functionally increase the number of ways in which a two-card combo can be assembled. Third, however, drawing only 7 cards in your opening hand ensures that your mana supply is similarly limited, which constrains the degree to which tutors can functionally stand in for a combo part. Fourth, there remains a limited card pool from 15 years of printings, and only so many tutors that can be used. These four limits, three theoretical and one contextual, can be roughly measured.
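
The first of these limits can be checked with a quick calculation. The sketch below is illustrative only: it assumes a 60-card deck, exactly four copies of each of two combo pieces, a seven-card opening hand, and no tutors, mana requirements, or mulligans. The function name and its parameters are hypothetical and are not taken from the experiment described here.

```python
from math import comb


def p_both_pieces(deck=60, hand=7, copies_a=4, copies_b=4):
    """Chance that an opening hand holds at least one copy of each piece.

    Uses inclusion-exclusion over the hands that miss piece A, miss
    piece B, or miss both. All defaults are illustrative assumptions.
    """
    total = comb(deck, hand)
    miss_a = comb(deck - copies_a, hand) / total
    miss_b = comb(deck - copies_b, hand) / total
    miss_both = comb(deck - copies_a - copies_b, hand) / total
    return 1 - miss_a - miss_b + miss_both


if __name__ == "__main__":
    # Natural draw only: four copies of each piece, no tutors.
    print(f"{p_both_pieces():.1%}")
```

With the default assumptions this comes to roughly 14.5%. Treating tutors as extra copies of a piece (a larger copies_a or copies_b) pushes the figure up, while the mana needed to cast those tutors, which this sketch ignores, pushes the practical figure back down.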

In my unrestricted Vintage experiment, I put together a Flash deck designed on the premise that every restricted Vintage card was unrestricted. That allowed me to run the maximum number of mana accelerants and tutors. After tweaking the Flash deck, I discovered that there was an upper bound of about 80% on achieving a turn one goldfish. I think there are good reasons to believe that my experiment gave me an approximate measure of the four constraints set out in the paragraph above: 1) Flash either is, or is tied for, the most mana-efficient game-winning two-card combo in the game, since it requires only a blue mana and a colorless mana to execute. 2) Flash combo is supported by the greatest number of efficient tutors available in the card pool relative to other two-card combos, primarily because of Merchant Scroll and Summoner’s Pact. Summoner’s Pact is a zero-mana tutor for one combo part, and Scroll is a two-mana tutor for the other, as efficient as Demonic Tutor, which I also ran. If those two reasons are correct, then they constitute very compelling grounds to think that the simulation I ran is pretty accurate.

You should use your own judgment as to whether you find my claims persuasive, but I think you should first at least take a look at my reasoning and what my experiments demonstrated before writing them off as simply ‘my personal experiments.’ They might not be the ‘truth of the format’, but there might nevertheless be truths there.

Quote

RE: Belcher, I strongly doubt that an optimized 4 Channel Belcher list will be running Banefire or Kaervek's Torch. Empty the Warrens seems like a much more powerful card, since it doesn't require channel. I'm glad you came to the right decision, but I feel like you're sort of missing the point there. EtW itself is basically immune to counters.

Take another read through the section analyzing Belcher. The objective was not to build an optimal tournament Belcher list. Rather, I explained that I wanted to ‘stress-test’ Belcher to see how fast it could goldfish if it were designed specifically for that purpose. If Belcher could goldfish above a certain threshold, then I could reject the proposition of unrestricting Channel at the outset without having to inquire further into tournament viability.

Quote

Neither list has things like Yawgmoth's Will, Necropotence, DT, VT; you know, staples.

It’s interesting that you assert that black cards are staples in Belcher when the highest placing Belcher list in a recent American tournament did not run those cards. On what basis do you assert that those cards are staples for the archetype?

Quote

And not running the black tutors or card advantage spells is a large reason why the deck is as good as it is. If you want to talk Belcher, I strongly suggest you talk to David Kleppinger.

Again, my intent was not to build the optimal tournament list. Rather, I was testing to see how fast I could make Belcher. If I were interested in pursuing this inquiry beyond the goldfish rate, I would take up your suggestion, but it is unnecessary.

Quote

I'm glad you came to the right decision,

It’s interesting that you feel that I made the right decision, especially since you seem to hold me to such a high standard of proof with respect to my claim that Flash and Gush didn’t need to be restricted. Given that you hold me to such a high standard of proof, how do you know I came to the correct conclusion? My analysis was primarily limited to evaluating goldfish speed, which is far from the only factor relevant to evaluating a card’s tournament viability.

Quote

Did you goldfish the Flash deck the way you did the Belcher deck?

Yes, I did. If you must know, the goldfish rate was about 15% on turn one.

Quote

The numbers aren't there, which makes it look like you didn't.

Take another look at the analysis and the way in which I’ve structured my inquiries for both cards. The analysis for Belcher and Flash was very different. For Belcher, I was primarily interested, as a threshold inquiry, in its speed. For Flash, there was no serious question that it would be a much slower goldfish, for many obvious reasons. The more important inquiry for Flash was how strong the deck would be overall. It was more important to see not just how much slower Flash was, but to gauge its power, flexibility, and resilience as well.

Quote

As for having 2 cards; you can discard a creature to Body Snatcher and put another back with Brainstorm.

That’s my point. This deck has many, many problems that unrestricted Flash, with 4 Brainstorm and 4 Scroll, did not. Brainstorm is now restricted, so the Flash deck will have a great deal of difficulty winning if it draws two of the late combo creatures.

Quote

I'm concerned with Gush that you still have your blinders on. Many of the Gush decks weren't GAT at all, but more things like combo.

You say I have blinders on, yet I explicitly pointed this fact out within the body of the article:

Quote

The restriction of Merchant Scroll would have seriously handicapped all Gush decks. At a minimum, the ability to no longer combo out in a single turn by chaining Gushes into more Gushes via Scroll meant that Storm decks could no longer use the Gush-bond engine.

A visible, but small, percentage of Gush decks were Doomsday or Storm combo, such as Tropical Storm. Rather than undermining my essential contention that Gush need not be restricted, they are a case in point. It was precisely these decks that were least likely to continue to use Gush after the restriction of Scroll and Brainstorm. Unlike GAT or Tyrant Oath, which used Gush both as a combo engine and for other purposes, such as a multi-turn draw engine or a combo with Tyrant, Storm combo relied much more heavily on being able to find Fastbond and chain Brainstorms, Scrolls, Gushes, and Ponders into a lethal turn. Remember, Storm combo is interested in winning in a single turn. Unrestricted Gush is no longer a reliable means to that end. With Scroll and Brainstorm restricted, your chances of stalling are too high. The fact that Gush was used by Storm decks is not evidence that I have blinders on. In fact, it’s a point that I rely on, as it supports my general contention that restricting Gush was unnecessary with Scroll and Brainstorm also restricted.

The issue of whether Gush and Flash were unnecessary restrictions, in light of the other three restrictions, comes up again and again in my articles. But too many times, people act as if the other three restrictions are not part of the overall issue. For just once I would like someone to explain why Gush and Flash deserved restriction if Merchant Scroll, Brainstorm, and Ponder were also restricted.

Quote

Plus, how can you say "Unrestrict Gush to fight Drains" when the Gush lists run Drains? (Yours runs 2, most Gush-era builds had 3).

This is factually inaccurate. While some Gush decks did run Drains, very few ran more than 2, and most ran none after the printing of Lorwyn. The typical Gush decks of the era (Tropical Storm, Tyrant Oath, and GAT) each ran 0-2 Drains, and mostly zero after Lorwyn. The main exception was Probasco’s SCG-winning Painter deck. For proof, take a look at the May/June metagame breakdown (http://www.starcitygames.com/magic/vintage/16161_So_Many_Insane_Plays_The_MayJune_Vintage_Metagame_Report.html). There were 32 Gush decks in top 8s, and only 2-3 had more than 2 Drains, all of them Painter variants modeled after Andy Probasco’s SCG-winning list from Richmond in May. In fact, some of the ones modeled after Andy’s list, such as the Annecy winner, had only 2. Many of the subsequent Painter lists actually went into control mode and dropped the Gush engine altogether.

Conceptually though, as I explained last month (see point (6) of the post-script), there is a major difference between the decks that start:

4 Force of Will

4 Brainstorm

4 Gush

4 Merchant Scroll

1 Fastbond

And decks that start:

4 Force of Will

4 Brainstorm

4 Mana Drain

4 Thirst For Knowledge

X artifacts

The latter are indicative of a very different breed of archetype. The latter is Drain Tendrils, Control Slaver, etc. The former represents GAT, Tyrant Oath, Tropical Storm/Doomsday combo, etc. While there is some degree of intersection, especially since a couple of Gush decks use Mana Drains, almost no Gush decks ran Thirst. They represent different draw engines, different modes of play, different strategies and tactics, and a very different approach to the metagame.

The two types of decks I just described are as different from each other as they are from decks that start like this:

4 Force of Will

4 Brainstorm

4 Duress

4 Dark Ritual

The fact that they share two blue spells does not make them the same deck, any more than TPS and Control Slaver are the same deck because both run 4 Force of Will and many of the same blue and black spells.

The difference is critical as the Drain/Thirst decks were strategically advantaged against the Workshop decks, but lost to the Gush decks. However, the Gush decks, while enjoying a strategic advantage over the Drain/Thirst decks, were disadvantaged when facing the Workshop decks. This is, in fact, one of the key reasons I believe Gush should be unrestricted. Our metagame is too Drain heavy.

Quote

What about Tezzeret with Gush to back up Thirst for Knowledge?

Why didn’t Control Slaver run Gush to back up Thirst For Knowledge? Same answer. The Drain/Thirst decks are not optimal homes for unrestricted Gush.

Quote

One fair deck using a card doesn't prove that the card is fair.

Of course one fair deck using Gush doesn’t prove it’s fair. It’s misleading to suggest that my GAT list is the only reason I advanced for thinking that unrestricted Gush is fair.

Quote

Specifically, your Gush list doesn't use Time Vault.

I’m curious as to how this pertains to the objective of rebalancing the Mana Drain-dominated field, which was the point of my article last week.

Quote

I concede that some proportion of the field will be "stubborn" and play the same deck regardless of their performance, until after a long enough period they either change or quit trying to compete. However, we know that enough players do switch to make a difference in terms of Top 8s proportions, as the Gush and Trinisphere and other eras prove. If it was simply player preference projected into Top 8s, then those eras would have had different results.

I'm not sure the data are there to support this claim, and even if they are I don't feel like you're presenting the data.

If you look at my metagame reports and those compiled by Phil Stanton before me, the data speaks for itself. Gush was not legal before June 2007, and then it grew to a huge proportion of Top 8s. This point is provable as a matter of mathematical logic. Large numbers of players had to switch in order for this to occur. It’s just math. If people hadn’t switched from other decks, then this couldn’t have happened.

Also, as I explained, the data collects proportions of top 8s. One archetype per top 8 represents 12.5% of top 8s. It only takes one person in a field (of at least 33 players) to help a deck reach 12.5% of top 8s! That’s a tiny percentage of the field: about 3% in a 33-player field, and as little as 1% in a field of 100. That is, a change of 1% to 3% of the field can produce a 12.5% change in the percentage-of-top-8s stat. And if 2 players make top 8 with the new archetype, you have huge changes in the overall top 8 pie.
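
The arithmetic behind this point can be laid out in a few lines. This is only a sketch: the 33-player figure is the cutoff used for the metagame reports, while the 64- and 100-player fields and both function names are hypothetical, chosen purely for illustration.

```python
def top8_share(decks_in_top8, top8_size=8):
    """Fraction of a single top 8 held by one archetype."""
    return decks_in_top8 / top8_size


def field_share(players_on_deck, field_size):
    """Fraction of the whole tournament field playing that archetype."""
    return players_on_deck / field_size


if __name__ == "__main__":
    # One pilot who makes top 8, in fields of different (assumed) sizes.
    for field_size in (33, 64, 100):
        print(
            f"field of {field_size}: "
            f"{field_share(1, field_size):.1%} of the field, "
            f"{top8_share(1):.1%} of the top 8"
        )
```

The larger the field, the smaller the slice of players that has to switch in order to move the top 8 numbers by the same 12.5%.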

But I explained all of this last week in more detail:

Quote

It only takes a tiny number of people to make the switch in any field to make a difference. For example, only one player per archetype in a tournament has to make top 8 on average with a different archetype for that archetype to make up over 10% of top 8s in total. Tournament top 8s are very sensitive to small changes in player deck choices, provided that those deck choices are high performers. The data reflects this.

Even if 90% of players play the underperforming archetype, it only takes the remaining 10% to switch to totally transform what top 8s look like in the aggregate. This suggests, contrary to the underlying assumption, that other decks are probably not better choices for the tournament.

Thus, the point is made: if Necro decks were 60% of the field, but 40% of top 8s, then they wouldn’t be a problem because they are underperforming. However, the question would be: why would the large proportion of Necro decks persist, even though it has been shown that in terms of raw proportions, they are only the third best performing deck? The answer is very simple: they may still be the best deck choice.

Most obviously, the Necro players may still believe that the Necro decks give them the best chance to win the tournament, even if they make top 8 in slightly lower proportions than a few other deck options. One could imagine how this might be the case. The fact that people haven’t switched away should not suggest that they are stubborn; rather, it suggests that they believe those decks give them the best chance to do well in the tournament.

Consider Mana Drains. Even if Mana Drain decks put up lower proportions of Top 8s relative to the field as compared to, say, Ichorid decks, Mana Drain decks give you the absolute best chance of winning a tournament once you are in a top 8. Mana Drain decks, for example, were only 42.5% of Top 8s, but they were 66% of tournament wins in March and April. As another example, Mana Drain decks were 30% of the Day 1 Waterbury field, but they were 43.75% of the top 16, and the eventual tournament victor. Mana Drain decks may be the optimal game theory option, even if they underperform relative to their proportions in the field, for this reason and more.

In short, if Mana Drains were underperforming the field, then we would know it because it would manifest in changes in the proportions within top 8s, since it requires such a very, very small number of people to effectuate that change. It’s just math.
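
One rough way to read the Mana Drain figures quoted above is as a ratio between a deck’s share of results and its share of the pool it came from, where a value above 1 means the deck is over-represented. The sketch below uses only the percentages already cited; the function name and the idea of reporting a single ratio are illustrative assumptions, not the method used in the metagame reports.

```python
def representation_ratio(share_of_results, share_of_baseline):
    """Share of results divided by share of the baseline pool.

    Above 1.0: over-represented. Below 1.0: under-represented.
    """
    return share_of_results / share_of_baseline


# Waterbury: 30% of the Day 1 field, 43.75% of the top 16.
waterbury = representation_ratio(0.4375, 0.30)

# March/April: 42.5% of Top 8s, 66% of tournament wins.
wins_vs_top8 = representation_ratio(0.66, 0.425)

if __name__ == "__main__":
    print(f"Waterbury top 16 vs. field: {waterbury:.2f}")
    print(f"Wins vs. Top 8s: {wins_vs_top8:.2f}")
```

Both ratios come out well above 1, which is consistent with the claim that Mana Drain decks convert field presence into top finishes, and top finishes into wins, at a better than proportional rate.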

Quote

Plus, we're not just dealing with random players. We're dealing with specific players with blue deck biases. People like Rich Shay and Andrew Probasco, as an example, are both better than 1/N to T8 a tournament and also strongly prefer playing blue decks. And they bring with them their support network (aka barns).

This is really an important point, and one of the most critical errors that Vintage players, particularly those with ties to the community, make.

The fact is that we are dealing with random players, at least in the sense that the top 8 data does not reflect the preferences of particular players. Rather, the top 8 data is aggregate data that compiles the results of well over a hundred players in tournaments that represented hundreds and hundreds of players.

It’s very important that the Vintage community understand the data. In fact, your reference to Andy Probasco and Rich Shay allows me to underscore a critical point. Virtually every single tournament in which these players competed in the last 6 months is excluded from, or not represented in, my data sets, for two reasons. First, roughly two-thirds of the tournaments in my metagame reports are non-American. Second, virtually zero New England tournaments make it into my data sets, since they tend to be under the cutoff of 33 players. For example, in the last data set, only two of the fifteen tournaments were from New England (the two Waterburys) and only four of the fifteen were from the US.

So, in fact, the datasets do not reflect Rich Shay, Andy Probasco, or their barns.

Your assertion to the contrary reflects what is known in psychology as the ‘fundamental attribution error,’ in which people attribute system outcomes to the preferences or behavior of a particular number of individuals. It’s a very common bias. It has to do with the way that we encode information, and our conceptual frameworks for interpreting data. There is a misperception, fostered by the fundamental attribution error and by the Vintage community being relatively small and somewhat insular, that particular individuals carry far more weight than they do. This perception is heightened by the visibility of these players on the community forums or within small tournaments in local areas. But it’s simply an inaccurate one, at least insofar as my metagame reports are concerned. You’ve been logging too many hours in IRC (j/k).

But, as I explained above in my discussion of the simple math regarding what it takes to change top 8 statistics, even if there were particular deck biases, performance ultimately determines top 8 data in the long run, not preferences. That’s because, again, it only takes a teeny-weeny fraction of the field to pick up the higher performing deck to change top 8 data. We would have to assume virtually 100% stubbornness for top 8s not to change, and we know that this is empirically untrue, not to mention silly.