TheManaDrain.com
September 08, 2025, 06:17:48 am *
Author Topic: Amount of Games Required For an Effective Testing Experience  (Read 9120 times)
JarofFortune
Basic User
**
Posts: 356



View Profile
« on: July 30, 2015, 12:46:56 pm »

When you are testing a deck, how many games/matches of each matchup do you play to become proficient in the matchup? How many games should you play as the opponent's deck? This isn't about win percentages so much as being comfortable with how your deck and the opponent's deck interact in the matchup, and knowing what lines of play to take.
Logged

The Auriok have fought the metal hordes for so long now that knowing how to cripple them has become an instinct. -Metal Fatigue
vaughnbros
Basic User
**
Posts: 1574


View Profile Email
« Reply #1 on: July 30, 2015, 02:23:15 pm »

It really depends on how much variance is involved in the match up, how familiar you are with each deck, and what kind of resources you have in terms of primers/play test partners.

If you are getting a lot of crazy games where, for instance, a resolved Ingot Chewer is winning the game in a blue vs blue MU, you will probably need a lot of games.  Likewise, if you've never piloted a Gush Storm deck it'll take longer to understand exactly what to do.  If you have a play test partner who knows all this stuff it can accelerate things for you.
Logged
mmcgeach
Basic User
**
Posts: 318


View Profile
« Reply #2 on: July 30, 2015, 03:26:26 pm »

On a related note, I'm pretty confident that 7-10 games is enough to get an idea of how a particular matchup goes, and whether or not it's in my favor.  After 7-10 games I can usually tell what cards are important in the matchup and what I want to side out and in; or possibly make changes to the sideboard.  Of course, if those are significant changes, it takes another 7-10 games to decide how *those* impact the matchup.
Logged
TheBrassMan
Administrator
Basic User
*****
Posts: 692


AndyProbasco
View Profile
« Reply #3 on: July 30, 2015, 04:23:56 pm »

I'd echo what other posters are saying here. Vintage is particularly hard to test because of variance, and because of how swingy certain cards are. If you test 10 games and find that "the player who resolved Ancestral won" each time, you can't use that information to add more Ancestrals. (you can add MisDs or something, but you get the idea). There's a bit of an art to it.

I usually try to test in batches of 8-10, as mentioned, but you really want to pay attention to more than just the results. Try to pay attention to cards you draw that didn't help, even when you won, or cards that would have been outs, even if you lost. Pay extra attention to games that you lose where you drew or saw sideboard cards. It's a whole skill, completely aside from playskill, to be able to extract good data from test games, and be honest with yourself about the likelihood of tournament games playing out the same way.

If you're testing to build a deck, rather than testing an existing deck to learn to play it, you can alter the way you play to give you better results in fewer games.

If you want to test a card that you don't want to run multiples of (maybe it's restricted, or maybe your sideboard space is tight, or maybe it's useless in multiples), you could play 10 games without ever drawing it, which gets you nowhere. One technique you can use is seeding the card into your opening hand. Just set the card aside, draw 6 cards and add the test card as the 7th. It'll mess with your mulligans, but you can get valuable data fast. If you seed Virulent Plague in your opener against dredge, and you quickly lose 6 games in a row, you can dump the idea and move on to the next card. If you just ran 2 Virulent Plague/4 Cage/1 Yixlid Jailer and played fair, you might play dozens of games without learning anything about the card ... you might even WIN most of those games (on the back of the other cards) and never realize that the Plague is bad for your deck.

A similar technique for testing a new card is running more copies than you expect to be correct. This is useful when the card is one you don't necessarily want in your opening hand, or one that gets better when your opponent doesn't know you have it. Remember this is a test game, so you don't have to stop at 4. If you want to see how good Flusterstorm is in a matchup, you can run 5 or 6. You have to be smart about this though. If you lose with Flusterstorm in hand, but the first Flusterstorm you played was strong, maybe you just have too many - but if you lose with Flusterstorm in hand and the first one was mediocre, the card has to go. Again, if you only ran the 1 or 2 you were planning on adding, it might take you much longer to figure it out.
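
To put rough numbers on why seeding or overloading a card speeds things up, here is a minimal Python sketch. It assumes a 60-card deck, a random 7-card opener, and no mulligans, tutors, or draw spells; the function name is made up for illustration. Under those assumptions a 1-of is missing from all ten opening hands about 29% of the time.

```python
from math import comb

def p_at_least_one(copies, cards_seen, deck_size=60):
    """Chance that at least one of `copies` is among `cards_seen` random cards.

    Plain hypergeometric calculation; the 60-card default is an assumption.
    """
    miss = comb(deck_size - copies, cards_seen) / comb(deck_size, cards_seen)
    return 1.0 - miss

# A 1-of in a random opening seven.
p_opener = p_at_least_one(copies=1, cards_seen=7)

# Chance that the 1-of shows up in none of ten independent opening hands.
p_never_in_10 = (1.0 - p_opener) ** 10

# How the opener odds change as you run extra copies in a test list.
for copies in (1, 2, 4, 6):
    print(copies, "copies:", round(p_at_least_one(copies, 7), 3))
print("1-of missing from all 10 openers:", round(p_never_in_10, 3))
```

This only counts opening hands, so it understates how often you eventually draw the card in a long game, but it gives a feel for how many "fair" games can pass before a 1-of is tested at all.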

It works both ways - you can take cards out of your deck that you expect to run (e.g. how good is this matchup when I DON'T draw Tinker?), or you can seed cards into your opponent's hand (e.g. can this deck consistently beat Chalice on the draw?). I know people who mostly test games against Workshops on the draw, and Dredge players who mostly test post-board games.

Of course those are great ways to answer questions about a deck faster, but you need good questions to ask. The only way to get those is just getting in lots of games.


Logged

Team GGs:  "Be careful what you flash barato, sooner or later we'll bannano"
"Demonic Tutor: it takes you to the Strip Mine Cow."
Smmenen
2007 Vintage World Champion
Adepts
Basic User
****
Posts: 6392


Smmenen
View Profile WWW
« Reply #4 on: July 30, 2015, 05:11:48 pm »

Unfortunately, the question in the OP really depends on what is meant by "effective" or "proficient."  

It's my view that performance, generally, scales to skill, and skill has no real cap.  There may be diminishing returns on skill past a certain point on a skill scale, but it still makes a difference.  At the extreme end, look at some of the games played by LSV and Efro in the VSL.  So, when playing a matchup, you can't really untangle the strategy and tactics (the cards used) from the opponent you are facing.  It's typical for pro players to talk in terms of matchup %ages, but those are really just heuristics, and aren't actually empirical assertions.  

Also, you can't really untangle specific skills with a strategy from general Vintage or even Magic skill.  There are a lot of tactics that cut across the format that aren't deck specific - such as how to use Vampiric Tutor, time Ancestral Recall, and use countermagic.  Someone who has very high skill level with a specific deck (like Landstill), but very low experience and skill in Vintage in general, may be good strategically, but only medium strength tactically.  Performance outcomes depend on all of those factors, and there is no way to isolate the contribution of each on ultimate outcomes.

In general, I don't think it's possible to play a matchup and definitively define a matchup's percentages for the reasons just mentioned.

That said, I think the function of testing is less to assess %ages, and more to look for patterns and develop a conceptual model or understanding of the matchup.  You should test until you can see clear patterns in the data and develop at least a basic causal structure for matchup outcomes - such as what cards or circumstances seemed to most closely correlate with victory or defeat.  Once you have played enough games to discern clear patterns, then that should be the basis for understanding the matchup.  That number of games will vary greatly based upon a number of factors, including experience with the matchup, experience in the format, ability to perceive patterns, and much more.  

Logged

Elric
Full Members
Basic User
***
Posts: 213



View Profile
« Reply #5 on: July 30, 2015, 06:27:08 pm »

Quote from: mmcgeach
On a related note, I'm pretty confident that 7-10 games is enough to get an idea of how a particular matchup goes, and whether or not it's in my favor.  After 7-10 games I can usually tell what cards are important in the matchup and what I want to side out and in; or possibly make changes to the sideboard.  Of course, if those are significant changes, it takes another 7-10 games to decide how *those* impact the matchup.

As Smmenen says above, there's no such thing as a matchup percentage because skill matters so much. But even taking your skill and the skill of your opponent as a constant, unless a matchup is very lopsided, 10 games isn't enough to tell who is favored based on the results alone. For example, if you win 6 of 10 games, the 95% confidence interval for your win percentage is 31% to 83% (for stats nerds, this is using the Wilson method: http://vassarstats.net/prop1.html).  If you win 8 of 10 games, the 95% confidence interval is 49% to 94%. This ignores sideboarded and unsideboarded games being different, which increases the number of games needed. To get a sense of which deck is favored from a small number of games, you need a lot of intuition or knowledge separate from the actual results.
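
For anyone who wants to run these numbers on their own test results, here is a minimal Python sketch of the plain Wilson score interval (no continuity correction), which reproduces the 31% to 83% and 49% to 94% figures above. The function name is just an illustration, and the 60-of-100 row is an invented extra case to show how the interval narrows with more games.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(wins, games, confidence=0.95):
    """Wilson score interval for a win rate, without continuity correction."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half

# 6/10 and 8/10 are the cases from the post; 60/100 is a made-up comparison.
for wins, games in [(6, 10), (8, 10), (60, 100)]:
    lo, hi = wilson_interval(wins, games)
    print(f"{wins}/{games}: {lo:.0%} to {hi:.0%}")
```

Like the original calculation, this treats every game as an independent coin flip with a fixed win chance, so it says nothing about play/draw, sideboarding, or skill changes over a session.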
Logged
JarofFortune
Basic User
**
Posts: 356



View Profile
« Reply #6 on: July 31, 2015, 12:00:11 am »

Thanks for the responses. There are a lot of good ideas in here. I should clarify that I wasn't really thinking about win percentages so much as the number of matches you should play for a given matchup in order to prepare for an event (obviously this should vary depending on the expected popularity of each deck in your gauntlet). To be even more specific, after you have learned the core interactions and recurring patterns in a matchup, how much more should you test it if you are preparing for an event and have limited time? At what point do the diminishing returns become big enough that jamming more games isn't worth it? For example, is there a value of x where after x games there is not much to be learned until you have played several hundred more?
« Last Edit: July 31, 2015, 12:03:38 am by JarofFortune » Logged

TheWhiteDragon
Basic User
**
Posts: 1644


ericdm69@hotmail.com MrMiller2033 ericdm696969
View Profile WWW
« Reply #7 on: July 31, 2015, 12:30:59 am »

As a rough rule of thumb, I'd say (given your deck is as ironed out as you want it pre-tourney) 100 games per matchup to get a decent understanding of the way the matchup plays out in general (effective strategies, problematic cards, etc.).
Logged

"I know to whom I owe the most loyalty, and I see him in the mirror every day." - Starke of Rath
serracollector
Basic User
**
Posts: 1359

serracollector@hotmail.com
View Profile Email
« Reply #8 on: July 31, 2015, 01:21:08 am »

I agree with the above. When I tested my Standstill deck for a tournament, I played a total of 133 games on Cockatrice against well-known friends and players, across a varied gauntlet, and it helped out tremendously: I got a first of sixteen players and a top four in a thirty-one-player tournament. Learning sideboard options and mulligan choices was clutch. Also, playing Landstill, obviously knowing the matchups definitely helped with when to use counters or not.
Logged

B/R discussions are not allowed outside of Vintage Issues, and that includes signatures.
Guli
Basic User
**
Posts: 1763


View Profile
« Reply #9 on: July 31, 2015, 01:32:39 am »

500 games, in various match ups, most of them post board. Then you can start talking and writing :)
Logged

Shax
Basic User
**
Posts: 247


0TonyMontana0 =twittername add me!

Braveheart+Shax
View Profile Email
« Reply #10 on: July 31, 2015, 01:07:45 pm »

I would say until you feel comfortable enough to flash your hand on the table with a Zing! type of moment.
Logged

Jesus Christ the King of Kings!

Vintage Changes: Unrestricted Ponder

Straight OG Ballin' shuffle em up tool cause you lookin' like mashed potatoes from my Tatergoyf. Hater whats a smurf? You lucksack? I OG. You make plays? I own deez. You win Tourneys? I buy locks. You double down? I triple up. Trojan Man? Latex. ClubGangster? I own it.Sexy mop? Wii U. Shax 4 President?
-Hypnotoa
Bluediamonds
Basic User
**
Posts: 23


View Profile
« Reply #11 on: July 31, 2015, 01:37:44 pm »

Since Vintage variance is highly dependent on the appearance of restricted cards in the first 3 turns or so, has anyone considered testing permutations of restricted card appearances?

You can do this by testing forced hands of 1 to 2 restricted cards plus 5 to 6 drawn cards,

e.g. Deck A with Recall & Walk vs Deck B without, on both play and draw
      Deck A with Lotus vs Deck B with Lotus openings, on both play and draw
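
A quick Python sketch of what that scenario list could look like if you wrote it out. The three-card shortlist, the cap of two seeded cards per hand, and the function names are all assumptions for illustration, not a recommended gauntlet.

```python
from itertools import combinations, product

# Hypothetical shortlist of restricted cards to seed into openers.
SEED_CARDS = ["Ancestral Recall", "Time Walk", "Black Lotus"]

def seeded_openers(cards, max_seeded=2):
    """Every forced opener of 0..max_seeded cards; the rest of the hand is random."""
    openers = []
    for k in range(max_seeded + 1):
        openers.extend(combinations(cards, k))
    return openers

def scenario_matrix(cards, max_seeded=2, hand_size=7):
    """Cross every Deck A opener with every Deck B opener, on the play and on the draw."""
    openers = seeded_openers(cards, max_seeded)
    return [
        {
            "deck_a_seeded": a,
            "deck_b_seeded": b,
            "deck_a_on_the": seat,
            "deck_a_random_cards": hand_size - len(a),
            "deck_b_random_cards": hand_size - len(b),
        }
        for a, b, seat in product(openers, openers, ("play", "draw"))
    ]

matrix = scenario_matrix(SEED_CARDS)
print(len(matrix), "scenarios")
for row in matrix[:3]:
    print(row)
```

Even with only three cards on the shortlist the list gets long fast, so in practice you would probably pick the handful of scenarios you actually have questions about rather than play the whole grid.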
Logged
MTGFan
Basic User
**
Posts: 273


View Profile
« Reply #12 on: August 03, 2015, 09:11:39 am »

Quote from: Smmenen
Unfortunately, the question in the OP really depends on what is meant by "effective" or "proficient."

It's my view that performance, generally, scales to skill, and skill has no real cap.  There may be diminishing returns on skill past a certain point on a skill scale, but it still makes a difference.  At the extreme end, look at some of the games played by LSV and Efro in the VSL.  So, when playing a matchup, you can't really untangle the strategy and tactics (the cards used) from the opponent you are facing.  It's typical for pro players to talk in terms of matchup %ages, but those are really just heuristics, and aren't actually empirical assertions.  

Also, you can't really untangle specific skills with a strategy from general Vintage or even Magic skill.  There are a lot of tactics that cut across the format that aren't deck specific - such as how to use Vampiric Tutor, time Ancestral Recall, and use countermagic.  Someone who has very high skill level with a specific deck (like Landstill), but very low experience and skill in Vintage in general, may be good strategically, but only medium strength tactically.  Performance outcomes depend on all of those factors, and there is no way to isolate the contribution of each on ultimate outcomes.

In general, I don't think it's possible to play a matchup and definitively define a matchup's percentages for the reasons just mentioned.

That said, I think the function of testing is less to assess %ages, and more to look for patterns and develop a conceptual model or understanding of the matchup.  You should test until you can see clear patterns in the data and develop at least a basic causal structure for matchup outcomes - such as what cards or circumstances seemed to most closely correlate with victory or defeat.  Once you have played enough games to discern clear patterns, then that should be the basis for understanding the matchup.  That number of games will vary greatly based upon a number of factors, including experience with the matchup, experience in the format, ability to perceive patterns, and much more.  

Separating player skill from deck/matchup positioning has always seemed to be the most challenging aspect of testing for me. The best way to remedy this concern is to ensure that the skill level of both testers is as similar as possible. When doing serious testing, seek someone you consider to be your equal in terms of skill. Note that it might actually be detrimental, if you solely want to test match-up percentages in particular, to find a play-testing partner who exceeds your level of skill (when in most other situations playing against someone of a higher skill level will eventually raise your own skill level over time), just as it will be equally detrimental to secure a play-testing partner who falls short of your level of skill.

One particular way to ensure that the skill level of both play-testers is equal is simply to use one person in both roles! I like to test certain match-ups sometimes by opening a one-player local game in Cockatrice and playing both sides as optimally as I think they can be played. I specifically focus on the mechanics of the interactions of both decks to determine match-up peculiarities and develop a personal framework for that match-up exclusive of player idiosyncrasies that may arise if both play-testers were instead different people.

An added benefit to doing this "one player" type of testing is that you can step through each chain of interactions and even rewind that chain to compare different lines of play (because you are playing the roles of both players and know what each side wanted to do when), whereas in more traditional two-player matches it is harder to be cognizant of the exact thought process of each player and the exact decision tree traversed in any given turn.

You can do this type of testing (and many have) in paper, but software such as Cockatrice automates tedious mechanical motions such as card shuffling and deck assembly, so it allows you to do rapid-fire testing and focus efficiently and sharply on exactly the dynamic you wish to test.
« Last Edit: August 03, 2015, 09:18:50 am by MTGFan » Logged
TheBrassMan
Administrator
Basic User
*****
Posts: 692


AndyProbasco
View Profile
« Reply #13 on: August 03, 2015, 10:03:34 am »

I'd be real careful about playing games against yourself. It's a very dangerous way to confirm your own biases - a lot of players have defended truly awful decks on the back of "two-fisted testing", and had those decks completely fall apart in tournament play (or never even played in a tournament at all). It's great for getting SOME information without needing a partner, but be careful not to rely on it too heavily.

Another easy way to account for disparity in skill between players is just to switch decks. Play 10 games as deck A and 10 games as deck B; if the two sets of games have very different results, you have your info.
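
A rough Python sketch of one way the deck-swap results could be tallied. The function name and the 7 and 5 win counts are made up for illustration; averaging the two legs is just one simple way to read the results.

```python
def swap_summary(a_wins_when_you_pilot_a, a_wins_when_partner_pilots_a, games_per_leg=10):
    """Summarise a deck-swap test from deck A's point of view.

    Averaging the two legs cancels a constant skill edge for either pilot;
    the gap between the legs hints at how much the pilot mattered.
    """
    rate_you = a_wins_when_you_pilot_a / games_per_leg
    rate_partner = a_wins_when_partner_pilots_a / games_per_leg
    return {
        "deck_a_win_rate": (rate_you + rate_partner) / 2,
        "pilot_gap": rate_you - rate_partner,
    }

# Invented example: deck A wins 7 of 10 with you on it, 5 of 10 with your partner on it.
summary = swap_summary(a_wins_when_you_pilot_a=7, a_wins_when_partner_pilots_a=5)
print(summary)
```

With only 10 games per leg both numbers are very noisy, so a small gap between the legs doesn't prove much either way.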

It's not super critical to match playskill in the first place. You're looking for the relative value of your options, not the absolute value. If you test Delver vs Mentor and get around 40%, then test Shops vs Mentor and get around 60%, you have useful information about which deck to play, even if your opponent is more or less skilled than you. The real problem is skill disparity between DECKS. If you and your opponent are both good at playing Mentor but bad at playing Shops, you're going to get bad numbers no matter how you split up the games.
Logged

Smmenen
2007 Vintage World Champion
Adepts
Basic User
****
Posts: 6392


Smmenen
View Profile WWW
« Reply #14 on: August 03, 2015, 04:37:00 pm »

I actually think two-fisted testing is fine, as far as it goes.  But it has the same pitfalls as regular testing, in terms of skill differential, static list usage, etc.  No testing prepares for a tournament where people can play tweaked decks rather than stock lists. 
Logged

vaughnbros
Basic User
**
Posts: 1574


View Profile Email
« Reply #15 on: August 04, 2015, 11:57:30 am »

Quote from: TheBrassMan
I'd be real careful about playing games against yourself. It's a very dangerous way to confirm your own biases - a lot of players have defended truly awful decks on the back of "two-fisted testing", and had those decks completely fall apart in tournament play (or never even played in a tournament at all). It's great for getting SOME information without needing a partner, but be careful not to rely on it too heavily.

Well, it's certainly difficult to do, but I wouldn't say that it confirms bias if you are doing it right.  If you can't play one of the decks correctly and disconnect yourself from the fact that you know your opponent's hand, then it is not going to be productive.  You have to set rules for how each deck is going to play that particular game and then potentially allow for these rules to be adaptable from one game to the next.  It also has the advantage that you are seeing the effect of each spell on both players.  For example, when I play Chalice of the Void, I have no idea what impact it is having on my opponent without seeing their hand or playing from their side.

Quote
No testing prepares for a tournament where people can play tweaked decks rather than stock lists.

I second this, and add that you can't replicate the experience of a tournament.  The combination of how misplays are handled, the interaction with your opponent, the physical environment you are playing in, etc. will all be different in play testing than during the tournament.
Logged
Varal
Basic User
**
Posts: 165


View Profile Email
« Reply #16 on: August 04, 2015, 01:47:09 pm »

How do you treat missed triggers and small mistakes, e.g. casting Sol Ring into Chalice of the Void, casting Mana Drain paying 1U, in playtesting? If you play hardcore like in a tournament, you might get skewed results, but if all mistakes get corrected it might create bad habits that can be really punishing in a real tournament.
Logged
vaughnbros
Basic User
**
Posts: 1574


View Profile Email
« Reply #17 on: August 04, 2015, 02:05:05 pm »

Quote from: Varal
How do you treat missed triggers and small mistakes, e.g. casting Sol Ring into Chalice of the Void, casting Mana Drain paying 1U, in playtesting? If you play hardcore like in a tournament, you might get skewed results, but if all mistakes get corrected it might create bad habits that can be really punishing in a real tournament.

If I'm testing a brew against an established deck, I do not allow the brew to get a take back, but do allow the established deck to.  This allows you to skew the results to the established deck, making the brew have to outperform the handicap in order to be considered a success.
Logged
diopter
I voted for Smmenen!
Basic User
**
Posts: 1049


View Profile
« Reply #18 on: August 04, 2015, 02:15:00 pm »

Two-fisted and seeded games are a quick way of testing a large number of ideas.

Once you land on a great idea then you should probably do a lot more "real" testing but honestly the key is probably weeding out crap ideas in the first place.
Logged
Space_Stormy
Basic User
**
Posts: 187

Trinket Mage or bust!


View Profile Email
« Reply #19 on: August 06, 2015, 12:00:49 am »

Quote from: Smmenen
I actually think two-fisted testing is fine, as far as it goes.  But it has the same pitfalls as regular testing, in terms of skill differential, static list usage, etc.  No testing prepares for a tournament where people can play tweaked decks rather than stock lists.

I have to agree. While you might run into a bias of wanting your deck to win, and might have the stock list make slightly inferior plays, you can also get yourself into situations where both sides make the correct choices, since there is perfect information, and have really informative testing.
Logged

Tune in to coverage of The Mana Drain Vintage League! Sundays @ 9est/6pst: www.twitch.tv/hammybone
-Samuel Alaimo-
MTGFan
Basic User
**
Posts: 273


View Profile
« Reply #20 on: August 10, 2015, 11:22:57 am »

Quote from: Varal
How do you treat missed triggers and small mistakes, e.g. casting Sol Ring into Chalice of the Void, casting Mana Drain paying 1U, in playtesting? If you play hardcore like in a tournament, you might get skewed results, but if all mistakes get corrected it might create bad habits that can be really punishing in a real tournament.

I think it's best to ensure making the "optimal" play for the deck you're testing *against*. So this way, your test list gets the roughest treatment possible. If it survives "optimal" play, you know you have a better-than-average chance against the suboptimal play you will inevitably run into during the course of a tournament.
Logged
TheWhiteDragon
Basic User
**
Posts: 1644


ericdm69@hotmail.com MrMiller2033 ericdm696969
View Profile WWW
« Reply #21 on: August 10, 2015, 11:09:14 pm »

Quote from: Varal
How do you treat missed triggers and small mistakes, e.g. casting Sol Ring into Chalice of the Void, casting Mana Drain paying 1U, in playtesting? If you play hardcore like in a tournament, you might get skewed results, but if all mistakes get corrected it might create bad habits that can be really punishing in a real tournament.

Quote from: vaughnbros
If I'm testing a brew against an established deck, I do not allow the brew to get a take back, but do allow the established deck to.  This allows you to skew the results to the established deck, making the brew have to outperform the handicap in order to be considered a success.

I think that is a bad approach.  I allow takebacks for both decks.  If you do something dumb and cast Sol Ring into Chalice@1, and could have won later when you cast that topdecked Hurkyl's if only you had 1 extra mana (from Sol Ring), it is not showing your brew deck to be inferior - just the pilot in that game.  I could have a great brew that is a far superior deck, but constantly misplay it (because it's new, after all) and lose to an established deck.  If I didn't allow takebacks to both sides, I'd have to conclude the brew was inferior, which is faulty.  Even if it isn't a mistake, there are games where I actually rewind the game several turns to pivotal plays (I have perfect knowledge the whole time anyway) and make a replay to see how the game would play out differently.  If I'm trying to get to my combo quicker on turn 1 and still need lands to boot, should I turn 1 that Serum Visions or Sleight of Hand?  Play them both and ride it out for a few turns to find out the outcomes. The goal isn't to pilot a new brew perfectly while giving the established deck slack.  That doesn't help.  Giving both decks takebacks and rewinding turns allows you to learn which plays are optimal in which situations (since you see the results of both play paths).  The goal is to learn which plays are the best, not make them off the bat.  Giving the established deck rewinds is also important, because you want to face the best version of the opponent to really know the brew's strengths and weaknesses.
Logged

"I know to whom I owe the most loyalty, and I see him in the mirror every day." - Starke of Rath
vaughnbros
Basic User
**
Posts: 1574


View Profile Email
« Reply #22 on: August 11, 2015, 06:47:23 am »

Quote from: Varal
How do you treat missed triggers and small mistakes, e.g. casting Sol Ring into Chalice of the Void, casting Mana Drain paying 1U, in playtesting? If you play hardcore like in a tournament, you might get skewed results, but if all mistakes get corrected it might create bad habits that can be really punishing in a real tournament.

Quote from: vaughnbros
If I'm testing a brew against an established deck, I do not allow the brew to get a take back, but do allow the established deck to.  This allows you to skew the results to the established deck, making the brew have to outperform the handicap in order to be considered a success.

Quote from: TheWhiteDragon
I think that is a bad approach.  I allow takebacks for both decks.  If you do something dumb and cast Sol Ring into Chalice@1, and could have won later when you cast that topdecked Hurkyl's if only you had 1 extra mana (from Sol Ring), it is not showing your brew deck to be inferior - just the pilot in that game.  I could have a great brew that is a far superior deck, but constantly misplay it (because it's new, after all) and lose to an established deck.  If I didn't allow takebacks to both sides, I'd have to conclude the brew was inferior, which is faulty.  Even if it isn't a mistake, there are games where I actually rewind the game several turns to pivotal plays (I have perfect knowledge the whole time anyway) and make a replay to see how the game would play out differently.  If I'm trying to get to my combo quicker on turn 1 and still need lands to boot, should I turn 1 that Serum Visions or Sleight of Hand?  Play them both and ride it out for a few turns to find out the outcomes. The goal isn't to pilot a new brew perfectly while giving the established deck slack.  That doesn't help.  Giving both decks takebacks and rewinding turns allows you to learn which plays are optimal in which situations (since you see the results of both play paths).  The goal is to learn which plays are the best, not make them off the bat.  Giving the established deck rewinds is also important, because you want to face the best version of the opponent to really know the brew's strengths and weaknesses.

I see this testing method is yielding great results for your dark times deck.  As none of my brews are successful...  Thank you for the input.
Logged
TheBrassMan
Administrator
Basic User
*****
Posts: 692


AndyProbasco
View Profile
« Reply #23 on: August 11, 2015, 11:23:39 am »

It's important to realize that we can get a lot of different things out of testing, and it helps to know what your goals are. If you're trying to improve your decklist - (whether it's a new brew or you're tuning something established), you get a lot more mileage out of allowing takebacks, and backing up the game state like TheWhiteDragon mentioned - you're trying to get information about how the deck could play out if you played it well.

If you're trying to improve your play for tournaments, obviously taking back plays is going to give you misleading results. Running mini tournaments with friends, with actual REL and something (something small) on the line is a fantastic way to improve play, but you'll get deck data less quickly.

There's no one perfect way to test, but if you know what your goals are, you get better results in less time.

That's the whole trick, right? Obviously more testing is better, but nobody in vintage says "I'm just going to test until my deck is perfect and my play is perfect" - trying to be a professional full-time vintage player is a terrible idea (I say this as someone who did it for a few years). You have a finite amount of time to dedicate to testing, and that time is almost definitely going to be smaller than the amount it takes to hit serious diminishing returns. So the question isn't "how many games are required before I get good results?" but, "how do I get the best results out of the time I have?"
Logged

vaughnbros
Basic User
**
Posts: 1574


View Profile Email
« Reply #24 on: August 11, 2015, 12:31:24 pm »

Quote from: TheBrassMan
It's important to realize that we can get a lot of different things out of testing, and it helps to know what your goals are. If you're trying to improve your decklist - (whether it's a new brew or you're tuning something established), you get a lot more mileage out of allowing takebacks, and backing up the game state like TheWhiteDragon mentioned - you're trying to get information about how the deck could play out if you played it well.

The problem with the take back method is that it can lead to misleading results and is difficult to execute.  A line that is actually the optimal line can seem terrible in retrospect if your opponent has their singleton out.  Treating the game as though there is no skill involved is also inaccurate.  Some cards frankly are just worse because they are difficult to play (e.g. Cabal Therapy).  Some cards are better because they are easy to play (e.g. Mental Misstep).  If you are getting perfect hindsight and perfect play on everything that is happening in the game, the power level of certain cards (and strategies) can dramatically change.

Furthermore, the focus of testing shouldn't be on wins/losses in the first place.  If it's a brand new idea, the focus should be on whether you were competitive enough in those games that the idea is something that should have more time put into it.  If I'm getting blown out every game, the idea is probably not worth exploring.  After this, the focus should be on improving your skill with the deck and deciding what cards are worth playing and what are not worth playing.  Again, the barometer has to be how competitive you were in those games.  This measure may be somewhat subjective, but it's also a lot more informative than a win/loss.

The focus should be on improving your skill with the deck and deciding what cards are worth playing.

No, there is no perfect way, but there are ways to test that are better than others.  Of course you need to factor in some individual skill when considering testing as well.  For instance, some people are just better at two-fisted testing than others.  Other people are better at testing with partners.  And some are just frankly bad at testing in general and need the rigors of tournament play to properly evaluate things.
Logged
TheWhiteDragon
Basic User
**
Posts: 1644


ericdm69@hotmail.com MrMiller2033 ericdm696969
View Profile WWW
« Reply #25 on: August 11, 2015, 05:18:35 pm »

How do you treat missed triggers and small mistakes in playtesting, e.g. casting Sol Ring into Chalice of the Void, or casting Mana Drain paying 1U? If you play hardcore like in a tournament, you might get skewed results, but if all mistakes get corrected it might create bad habits that can be really punishing in a real tournament.

If I'm testing a brew against an established deck, I do not allow the brew to get a takeback, but do allow the established deck to.  This skews the results toward the established deck, so the brew has to outperform the handicap in order to be considered a success.

I think that is a bad approach.  I allow takebacks for both decks.  If you do something dumb and cast Sol Ring into Chalice @ 1, and could have won later when you cast that topdecked Hurkyl's if only you had 1 extra mana (from Sol Ring), it is not showing your brew deck to be inferior - just the pilot in that game.  I could have a great brew that is a far superior deck, but constantly misplay it (because it's new, after all) and lose to an established deck.  If I didn't allow takebacks to both sides, I'd have to conclude the brew was inferior, which is faulty.  Even if it isn't a mistake, there are games where I actually rewind the game several turns to pivotal plays (I have perfect knowledge the whole time anyway) and make a replay to see how the game would play out differently.  If I'm trying to get to my combo quicker on turn 1 and still need lands to boot, should I turn 1 that Serum Visions or Sleight of Hand?  Play them both and ride it out for a few turns to find out the outcomes.  The goal isn't to pilot a new brew perfectly while giving the established deck slack.  That doesn't help.  Giving both decks takebacks and rewinding turns allows you to learn which plays are optimal in which situations (since you see the results of both play paths).  The goal is to learn which plays are the best, not make them off the bat.  Giving the established deck rewinds is also important, because you want to face the best version of the opponent to really know the brew's strengths and weaknesses.

I see this testing method is yielding great results for your dark times deck.  As none of my brews are successful...  Thank you for the input.

Umm...okay, that was out of left field.  My method doesn't help a deck that doesn't play in tournaments to win tournaments, I agree.  The next time I see a Vintage tourney within an hour of San Antonio, I'll let you know how it does.  As for now, I'm relegated to table games and the occasional Cockatrice match for Vintage.  I always used the same method back when I did play Vintage often, and it did yield great results (see SCGP9s 2001-2004).  The metagame has changed greatly, but effective methods of deck testing are always the same.  There may be different ways to test, but any method should result in valuable information.  Making misplays and not backing up doesn't reward you with any information other than "that was a bad play".  Something obvious like forgetting a Chalice and casting stuff into it is just a dumb misplay, but has zero to do with the strength of your brew.

I never said your brews weren't successful - I actually don't know who you are or any decks you play.  My method does yield results in determining optimal lines of play, however, and I don't understand what info you gain from the method you described.  If it works for you, fine, but please explain what info you actually gain from casting a Sol Ring into Chalice @ 1 mistakenly and not allowing a takeback.

I don't see how anything with dark times invalidates my points though - and I was speaking of playtesting any deck in general. Punishing your brew deck for your own play errors doesn't tell you anything about the strength of your brew - it only shows you how bad misplays can be.  My brews do much better as I test and I learn which plays are better by rewinding and allowing take backs to both sides.  If I just make a dumb play error like cast a 1cc into chalice@1 and don't allow the takeback, please tell me how that helps me gain any knowledge of the brew deck other than learning "don't make stupid plays".
« Last Edit: August 11, 2015, 05:21:59 pm by TheWhiteDragon » Logged

"I know to whom I owe the most loyalty, and I see him in the mirror every day." - Starke of Rath
TheWhiteDragon
Basic User
**
Posts: 1644


ericdm69@hotmail.com MrMiller2033 ericdm696969
View Profile WWW
« Reply #26 on: August 11, 2015, 05:47:59 pm »

The problem with the takeback method is that it can lead to misleading results and is difficult to execute.  A line that is actually optimal can seem terrible in retrospect if your opponent has their singleton out.  Treating the game as though there is no skill involved is also inaccurate.  Some cards frankly are just worse because they are difficult to play (e.g. Cabal Therapy).  Some cards are better because they are easy to play (e.g. Mental Misstep).  If you are getting perfect hindsight and perfect play on everything that is happening in the game, the power level of certain cards (and strategies) can dramatically change.

The takeback method leads to bad results only if you are allowing takebacks until your deck always wins and then say "wow, my deck wins every time".  That'd be bad.  But if you take back something that is just a misplay (like a Sol Ring into a Chalice @ 1 already on the table), you are just removing "dumb play mistakes" from evaluating the strength of your deck.  I'm not talking about casting a Sol Ring when the opposing deck has Misstep, so you take back casting the Sol Ring.  That play should follow through, because you wouldn't have known your opponent had Misstep and you need to see how your deck can carry on with mana getting countered.  A derp play into a Chalice on board just staring you in the face....that's different.  Takebacks can help with things like Cabal Therapy, too.  If you're trying to protect a creature (let's say Dark Times, since you insist on bringing my pet deck into the convo) and have Depths and Hexmage and name Plowshares, you might whiff.  If you name FoW (which would stop Hexmage) you might hit.  You may whiff naming FoW and they might be holding Plowshares.  But over time, allowing takebacks and doing several tests, you can learn that FoW is in their hand far more often than Plowshares and naming FoW on turn 1 is the far better play.  Similarly, if I have Ponder and Brainstorm and want to chain spells for storm, which do I play first?  Well, one way to gain info faster is to try them both.  Play them in one order, then rewind and play them in the other order to see which has the better outcome.  You may learn that it's always better to Ponder first when you are digging, but always better to Brainstorm first when you are trying to filter your hand.  The point is, takebacks allow you to see how different lines play out much faster than just running one line out, losing, and playing a hundred more hands waiting for that exact situation to occur again in another test.
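For what it's worth, the "FoW is in their hand more often than Plowshares" part can also be sanity-checked with simple arithmetic rather than reps. Below is a minimal sketch, not anything from the posts above: it assumes a 60-card deck, a random 7-card opening hand, 4 copies of Force of Will, and a purely hypothetical 2 copies of Swords to Plowshares (the thread never gives actual counts). The function name `p_at_least_one` is made up for illustration.

```python
from math import comb


def p_at_least_one(copies: int, deck_size: int = 60, hand_size: int = 7) -> float:
    """Chance that a random hand holds at least one of `copies` identical cards.

    Hypergeometric complement: 1 - P(hand contains zero copies).
    """
    misses = comb(deck_size - copies, hand_size)  # hands with no copy at all
    total = comb(deck_size, hand_size)            # every possible hand
    return 1 - misses / total


# Assumed counts, for illustration only; swap in the real decklist numbers.
fow = p_at_least_one(4)         # 4x Force of Will (assumption)
plowshares = p_at_least_one(2)  # 2x Swords to Plowshares (hypothetical)

print(f"FoW in opening 7:        {fow:.1%}")
print(f"Plowshares in opening 7: {plowshares:.1%}")
```

This only covers the opening hand and ignores mulligans, draws, and whether the opponent would actually keep or have already used the card, so it complements rewinding and replaying rather than replacing it.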

A great example that I found myself doing a lot recently was in Modern.  I run a fun mill deck.  If I fetch a Swamp on my first fetch and then topdeck Hedron Crab, I would have to use my second fetch to fetch blue and cast Crab and have no land drop.  If I use my first fetch for blue (even if I have no Crab in hand), then if I topdeck a Crab, I can cast it and mill 6 with my second fetch.  After testing I learned that, absent anything else to do with that first mana, fetching a blue source first was ALWAYS the right call.  But I didn't just play a million games to run across multiple instances of me having 2 fetches and topdecking a Crab when I had none in hand.  I backed up the game state and fetched blue, then saw how milling that extra 6 helped in more wins.  I didn't count that game where I made a takeback as a "win" for the tally's sake, but it showed me which was the right play to make henceforth.  Had I just said "derp, that was dumb to fetch for black first" and kept playing, I might not have seen how that 6 card difference could affect the game overall.  And had I waited for that situation to occur again, I'd be testing until the cows come home.  But by backing up, I got to see how the game changed overall due to one decision.  Something like "always Gitaxian Probe before playing your land", while perhaps intuitive to an experienced player, could be something learned with takebacks.  The situation may also depend on the opponent's hand and game state too, but in general, trends occur when cards are played in certain sequences.  You'll see how the result comes out differently by trying different paths.

Furthermore, the focus of testing shouldn't be on wins/losses in the first place.  If it's a brand new idea, the focus should be on whether you were competitive enough in those games that the idea deserves more time.  If I'm getting blown out every game, the idea is probably not worth exploring.  After this, the focus should be on improving your skill with the deck and deciding which cards are worth playing and which are not.  Again, the barometer has to be how competitive you were in those games.  This measure may be somewhat subjective, but it's also a lot more informative than a win/loss.

That's exactly my point - you don't count a game with a ton of takebacks as a win.  You learn from the individual play which plays are optimal in certain situations.  You learn and see results faster with takebacks rather than losing and waiting for the same situation in another match.  There are also times where a strategy can be really great, but playing it out in the wrong sequence causes the strategy to always fail.  If you don't automatically know the right sequence, you will make the wrong play often and abandon your strategy with your method (as you've described it), but by replaying, you instantly see the different outcomes and realize "oh, if I do this instead, the combo is 10x more successful/resilient/etc."  I find this particularly true of tutors and draw spells or things like vampiric tutor and confidant.  I've learned it's often better, when hunting for a specific card, to draw off bob first, then vamp if I didn't draw what I wanted as opposed to vamping before the trigger and drawing an extra card afterward.  Why?  Because sometimes the card you want is already on top and you wasted an otherwise useful spell.  To players that have been in the game for years and have this level of practice, you can intuit that same realization based on what the cards do.  But for any player, and especially newer players, you'll learn that fact quicker by allowing takebacks and replays (put the vamp back in your hand and just reveal to bob the card you wanted that was on the topdeck anyway) as opposed to making a bad play and living with it and seeing the different result only  when that situation happens again.
« Last Edit: August 11, 2015, 05:58:58 pm by TheWhiteDragon » Logged

"I know to whom I owe the most loyalty, and I see him in the mirror every day." - Starke of Rath
vaughnbros
Basic User
**
Posts: 1574


View Profile Email
« Reply #27 on: August 11, 2015, 05:52:04 pm »

Umm...okay, that was out of left field.  My method doesn't help a deck that doesn't play in tournaments to win tournaments, I agree.

Honestly, what is the point of testing then?  If you are just trying to have some fun, testing is irrelevant.

My method does yield results in determining optimal lines of play

Does it?  Rewinding the game multiple turns is a massive change in information that can make an earlier play seem wrong when it in fact was not wrong, and vice versa.  It's the Monday morning quarterback situation.  "Why did Russell Wilson throw that pass?"

[Playing a card into a chalice also isn't always wrong (What if I needed to fill my graveyard for a pending Delve spell?), nor is even letting a spell resolve through a chalice (if I wanted my opponent to have to win Mana Crypt rolls?  Or wanted a sol ring around to steal it with Dack Fayden?).  These types of grand generalizations don't carry a lot of weight]

I don't see how anything with dark times invalidates my points though

You are calling my approach bad in a comparison to your approach.  Am I not allowed to compare the results that we have gotten with these methods?

Punishing your brew deck for your own play errors doesn't tell you anything about the strength of your brew - it only shows you how bad misplays can be.

It can tell you a lot about your brew.  As I just stated above, play skill is a thing, and certain decks (and cards) also have higher levels of skill involved with them.  By removing play errors, you are removing a necessary component of success in Magic.  An "if I had perfect plays" scenario is not realistic and will yield skewed results.


All the takeback scenarios you've described still need to be played out to get the full result as they are very subtle differences and again not always correct.  You are going to spend just as much time (if not more) rewinding the game as getting an entirely new rep in.  
« Last Edit: August 11, 2015, 05:57:33 pm by vaughnbros » Logged
TheWhiteDragon
Basic User
**
Posts: 1644


ericdm69@hotmail.com MrMiller2033 ericdm696969
View Profile WWW
« Reply #28 on: August 11, 2015, 06:07:57 pm »

Umm...okay, that was out of left field.  My method doesn't help a deck that doesn't play in tournaments to win tournaments, I agree.

Honestly, what is the point of testing then?  If you are just trying to have some fun, testing is irrelevant.
 

To try to make the best deck possible - so when I do play (on cockatrice, table games, local shop, wherever) my deck has the best chance of winning.  Playing something god-awful that is untuned and losing all the time isn't fun.  Testing to tune a deck and make it good vs other competitive decks allows you to play fun, competitive games, wherever that may be.  And if I do take it to a tourney, I'd want the best tuned version of my deck.  I'm sure your deck that came in Xth place has a better showing than my pet deck that I've never played in a tourney - but that has nothing to do with my logic on playtesting strategy in general.
Logged

"I know to whom I owe the most loyalty, and I see him in the mirror every day." - Starke of Rath
TheWhiteDragon
Basic User
**
Posts: 1644


ericdm69@hotmail.com MrMiller2033 ericdm696969
View Profile WWW
« Reply #29 on: August 11, 2015, 06:18:41 pm »

Umm...okay, that was out of left field.  My method doesn't help a deck that doesn't play in tournaments to win tournaments, I agree.

Honestly, what is the point of testing then?  If you are just trying to have some fun, testing is irrelevant.

My method does yield results in determining optimal lines of play

Does it?  Rewinding the game multiple turns is a massive change in information that can make an earlier play seem wrong when it in fact was not wrong, and vice versa.  It's the Monday morning quarterback situation.  "Why did Russell Wilson throw that pass?"

That's only true if you're gaining new information after the fact, like if the opposing deck had a counterspell.  If all things would remain the same otherwise, you can see how different plays play out to their conclusion, then back up and try different paths - like the sequence of land drops, which spell to lead with, etc.

[Playing a card into a chalice also isn't always wrong (What if I needed to fill my graveyard for a pending Delve spell?), nor is even letting a spell resolve through a chalice (if I wanted my opponent to have to win Mana Crypt rolls?  Or wanted a sol ring around to steal it with Dack Fayden?).  These types of grand generalizations don't carry a lot of weight]

I never said it was ALWAYS wrong.  I'm saying when it is just a derp play error - you didn't intend to cast into a chalice but forgot it was on the table - it pays to take it back because all you learn otherwise is "don't make stupid play errors."

I don't see how anything with dark times invalidates my points though

You are calling my approach bad in a comparison to your approach.  Am I not allowed to compare the results that we have gotten with these methods?

Maybe I shouldn't have said your approach was "bad".  I just don't see the value in it as much as the takeback method when trying to learn quickly the consequences of certain plays vs others in a similar situation.

Punishing your brew deck for your own play errors doesn't tell you anything about the strength of your brew - it only shows you how bad misplays can be.

It can tell you a lot about your brew.  As I just stated above, play skill is a thing, and certain decks (and cards) also have higher levels of skill involved with them.  By removing play errors, you are removing a necessary component of success in Magic.  An "if I had perfect plays" scenario is not realistic and will yield skewed results.

It yields skewed results if you're counting those games in a win/loss type of way, but not in farming info on individual plays.  Someone can have awful play skills and totally botch playing a deck that won the last Vintage Worlds.  Does that mean the deck sucks, or the person playing it?  By living with "play errors", you are evaluating the strength of a deck on the strength of the pilot.  You may learn that you have bad play skill with an archetype.  How does that help you understand how good the brew is in general?  And by not using takebacks to learn which are the better plays, you slow down how quickly you learn the optimal way to pilot the deck.

All the takeback scenarios you've described still need to be played out to get the full result as they are very subtle differences and again not always correct.  You are going to spend just as much time (if not more) rewinding the game as getting an entirely new rep in.  

I didn't say you don't play them out....you DO play them out, then rewind and play it out a different way.  Sometimes I don't even finish a game...I just carry out a scenario 5 or 6 turns to see where it is leading, back up, and try the other way...set up the same scenario with a different board state, play it out 5 turns, back up, try another way.  You learn what is the correct play in a variety of situations quickly that way.
Logged

"I know to whom I owe the most loyalty, and I see him in the mirror every day." - Starke of Rath