GDC 2010: Design in Detail XIX


And there it is! The big detail!

It’s a large change. It’s very easy to convince yourself that you can feel tiny changes, but you will be fooling yourself. The balance never hinges on a 2% difference in a single value!

It was a smaller change than we tried initially. I think originally we changed it to 0.9, which broke the flow and wrecked the weapon, but did fix the problem, so we knew we were on the right track. In general, you want to overshoot and then come back. You have to make sure your change accomplishes your goal, and then dial it back.


There are lots of ways to verify that a change was successful. In this case, the Sniper Rifle didn’t get any less popular; people still use it whenever they can get it. But Optimizers stopped using it exclusively.


We could also tell it was balanced by comparing the behavior we were seeing in playtests with the desired role we described in our paper design. It’s not quite an objective test, but it helps. And most importantly, I no longer got nervous when I watched people use the Sniper Rifle. You should verify a change with both your brains and your guts.


Ship it!

At this point during my GDC talk, the audience started clapping, cluing me in to the fact that I was completely out of time.
So I just went with the flow and ended it there, which was fine. I had walked people through the balancing process, brought up the important principles, and applied them to the sniper rifle change. But I had also intended to mention a couple of things about the last stages of balancing, which I will get into next week!


GDC 2010: Design in Detail XVII


This is the point in development where we finally changed the Sniper Rifle. Now I will try to describe how all the work from previous passes informed this decision…


The Sniper Rifle was overpowered (that’s what we intended, remember), but it made the other aspects of the game feel weaker.  We couldn’t make the rest of the arsenal strong enough to keep the Sniper Rifle in line.  One way we could tell was that the players we had picked out as Optimizers were using it exclusively.  Role Players, on the other hand, were still not using it, and suffering for it.


Worse, the Sniper Rifle was being used at close quarters, which is clearly outside of its role.  And nothing the targeted player could do would allow them to avoid being sniped.


When something affects us emotionally, we say we were “moved”.  Emotions, not graphs and data, are what compel you to act.  Use your Sense of Balance to feel when something is wrong and trust those instincts.


Against Statistical Design

Statistical Design

I suggested that a good way of improving one’s design sense is by staring at Rorschach Tests, and here is a practical example of the importance of practicing pattern-avoidance.

To me it looks like a designer's brain in a vice

Stop seeing patterns!

This image is a heatmap showing where people most often die on Assembly, a Halo multiplayer map.  These heatmaps were first used by the Halo design team to analyze maps during testing, but were so interesting looking they became part of the bungie.net statistics pages.  This data is so rich — so detailed and specific — it must be useful to a designer in some way, right?  The problem-loving brain of the game designer latches on to this as The Solution and immediately starts searching for The Problem.  It is tempting, given a powerful tool like statistical analysis, to incorporate it into the design process somehow — especially since design is often stranded in a world of abstraction and uncertainty.  Having concrete numbers is a rare treat.

However, what does this data mean?  Are red areas bad?  Should dark areas be eliminated?  Does a well-designed multiplayer map have a symmetrical shape?  What percentage of a map should be yellow?  Something about high contrast feels unbalanced, so perhaps the map should be revised so that the gradient from safe to dangerous is more continuous.  And areas where nobody dies seem wasteful, so maybe they should be removed.  And obviously the red areas will be frustrating, so they should be made safer by limiting line-of-sight and adding cover.  Pretty soon we have a completely yellow multiplayer map that we have tricked ourselves into believing is balanced because our data looks pretty.  We have fallen victim to statistical design.

Players Aren’t Statistical

Statistics are powerful tools because they aggregate a large number of unique instances into a manageable form so they can be analyzed.  It would be impossible to watch every death of every player across thousands of games and have any cohesive understanding of how often players were dying in a given area.  Given enough examples, we would develop an emotional feeling of dread or security associated with certain spots, but the brain uses a very unscientific method to determine these attachments.  Exciting experiences are weighted much too heavily, which is why the impartiality of statistics is useful in discovering imbalances.  Using statistics to find problems is fine; designers go wrong when they use statistics to evaluate solutions.
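To make the aggregation step concrete, here is a minimal sketch of how a death heatmap could be built by binning positions into grid cells. The telemetry format (a list of `(x, y)` coordinates), the function names, and the cell size are all assumptions for illustration, not how the Halo tools actually work.

```python
from collections import Counter

def death_heatmap(deaths, cell_size):
    """Count deaths per square grid cell.

    deaths: iterable of (x, y) map coordinates (hypothetical telemetry format).
    cell_size: edge length of one grid cell, in the same units as the coordinates.
    """
    counts = Counter()
    for x, y in deaths:
        # Floor division maps every position to the cell that contains it.
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts

def hottest_cells(counts, n=3):
    """Return the n cells with the most deaths: places to go watch, not verdicts."""
    return counts.most_common(n)

# Made-up sample data, only to show the shape of the result.
deaths = [(1.0, 1.5), (1.2, 1.9), (8.0, 3.0), (1.9, 0.1), (8.5, 3.5)]
heat = death_heatmap(deaths, cell_size=2.0)
print(hottest_cells(heat, n=2))
```

The output only says where to look. Whether a hot cell is a problem still depends on what the players who died there actually experienced.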

Players don’t engage with the game statistically; they experience it personally.  It doesn’t matter if more players are killed standing in a specific spot than anywhere else on the map.  What matters is the unique experience of a player killed in that spot.  If they realize that they shouldn’t have crested the hill with no cover that is right below where the Sniper Rifle spawns, vow not to do that again and move on, there is nothing wrong with the map.  Even if they do it over and over, growing more and more frustrated at their repeated mistake and creating a bright red dot on the heatmap, the map is not unbalanced.  However, if players are forced to expose themselves at a single chokepoint, or get sniped through a hidden line-of-sight in an otherwise safe area, it doesn’t matter if it is a rare experience and there is no red; the map ought to be fixed.  Neither of these situations can be found through statistical analysis, and neither is fixed by a solution that merely addresses the probability of being killed in a given area.

Avoiding Statistical Design

Some systems can only be balanced statistically.  If there are three factions in the game, and one faction wins 43% of the time (against an even share of about 33%), the factions are not balanced.  If a map is intended to be used for two-flag CTF, but the bases aren’t mirror images of one another, then the two sides had better be perfectly fair.  The necessity of reverting to statistical methods is inherent in the design of the system itself.  The designer will be forced to make changes that do not change the unique player experience, or may even harm it, in order to fix a statistical imbalance.  Worse still, players are skilled at detecting when a system must be balanced statistically, but since they do not have access to hard numbers, their personal experience will tell them that it isn’t balanced, even when the data says that it is!
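For a system like the three-faction example, the check itself is simple arithmetic. The sketch below compares each faction's win count against an equal share using a normal approximation to the binomial. The function name, the sample numbers, and the idea of treating a z-score beyond about 3 as worth investigating are assumptions for illustration, and the approximation only holds for a reasonably large number of matches.

```python
import math

def win_share_z_scores(wins):
    """z-score of each faction's win count against an equal share.

    wins: dict mapping faction name -> number of matches won.
    Assumes every match has exactly one winner.
    """
    total = sum(wins.values())
    expected_share = 1.0 / len(wins)
    # Standard deviation of a binomial count with n=total, p=expected_share.
    std = math.sqrt(total * expected_share * (1.0 - expected_share))
    return {
        name: (count - total * expected_share) / std
        for name, count in wins.items()
    }

# Made-up results: faction A wins 43% of 1000 matches.
wins = {"A": 430, "B": 300, "C": 270}
z = win_share_z_scores(wins)
for name, score in z.items():
    print(f"{name}: {wins[name] / sum(wins.values()):.0%} of wins, z = {score:+.1f}")
```

With these numbers, faction A sits far outside what chance alone would produce, which matches the intuition that 43% in a three-way contest is not balanced.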

Nerf Herders?

Nerf Paladin?

Well-designed systems do not need to be balanced through data-manipulation.  If there are 10 weapons in the game, and one weapon is responsible for 20% of the kills, there is probably not a problem.  If the unique player experience isn’t negatively impacted, the statistical difference isn’t a balance issue.  So, the easiest way to avoid the trap of statistical design is to avoid systems that must be balanced mathematically in favor of those that can be balanced behaviorally.  If a system requires a large amount of instrumentation and is extremely sensitive to tiny value changes, instead of obsessing over statistical patterns, try revisiting the system’s design and making it less brittle.

GDC 2010: Design in Detail XV


Without anyone getting kicked in the face…


You always need to listen when people don’t like something. You are too close to the game; you probably already fixed all the things you didn’t like, so you should value a fresh perspective. Keep in mind that you can always trust someone’s emotional reactions, because they are always authentic and valuable, but never just blindly take their advice. The designer’s job is to separate emotional feedback from thoughtful suggestions and treat them appropriately.


Before you can interpret someone’s feedback, you need to understand the source. Feedback means “the game in my head is different”, and often your response to feedback should be to probe about what kind of game they are imagining. You don’t necessarily need to agree on the game you are making to benefit from their feedback; they probably represent some portion of your audience.

You see Development Bias a lot with the public when the development process is very open. Playtesters know the game isn’t finished, they know you expect them to provide constructive criticism, so they become a lot more sensitive and more likely to complain. Once the game is on the shelves, those small problems fade into the background and players rarely notice them.


You also need to understand the source of feedback; if you can categorize someone’s play style, it will help you understand how to react to their feedback. You can weight their comments appropriately.
Here are some examples:
(The names have been changed to protect the guilty)


I used to balance “Easy” by playing with my nose (true story) but Steve still couldn’t beat it. I miss that guy; he was incredibly useful for balancing.


Even more important than categorizing other players, you need to understand your own playstyle. For instance, I’m a “role-player”, so I tend to ignore small balance problems if the results are still dramatic. I have to recruit “pros” that are more sensitive to useless or underpowered elements.

GDC 2010: Design in Detail XIV


If you were disciplined in writing your paper design, and stayed firm while setting up the rough balance, this stage should be very rewarding and exciting.  If not, it is going to be disappointing and frustrating.


The timing for this stage is tricky.  If you start too early, your balance changes will be swallowed up by the churn of new features coming online.  If you wait too long, the rough balance will become entrenched and the team will object to changes.  Generally, this coincides with a “First Playable” build where everything is at least in the game and functioning.

It’s crucially important to communicate this new phase to the rest of the team, so they know what to expect and understand that now is the time for them to give the feedback they have been patiently waiting to deliver. One way to do that is to implement a controlled opportunity for them to play the latest build and provide their feedback in a structured format.  Make sure you tell them what you are currently working on, so their responses will be relevant, but don’t tell them exactly what has changed or you may bias their opinions.


So how do you balance a Sniper Rifle? It is not by adding weaknesses!  Don’t undo the work you did in making it powerful!  Balance it by narrowing its role through limitations.


The best way to detect which elements need to be limited is by watching for the game to become predictable.  If the same strategy is being used in a variety of different situations, to the point where players are no longer required to think about which strategy to choose, it means an element is too useful outside of its designated role.  If the Sniper Rifle is not only the best weapon at long range, but players are carrying it indoors and using it against vehicles, it needs to be constrained.  Give it some time first, because the playtesters might just not have figured out the new balance yet, but if it is consistent for a few tests, start looking for ways to limit the dominant element.

On the other hand, if the game is completely unpredictable, it is a sign that the elements are not effective enough at their roles.  A truly random strategy should never be as good as intentionally selecting an element that is strong in the desired role.  It may also be a symptom of a role going unfulfilled.  If there is no Sniper Rifle, the Shotgun and the SMG are equally terrible at long range combat, so it doesn’t matter which one you choose.
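Both warning signs can be roughed out from playtest notes, if you want a number to point you at what to watch. The sketch below assumes a hypothetical log of `(situation, weapon)` picks and reports, per situation, the top weapon, its share, and how evenly choices are spread (normalized Shannon entropy). The 0.7 threshold and all names are illustrative assumptions. Treat this as a prompt for observation, not a verdict.

```python
import math
from collections import Counter, defaultdict

def choice_profile(picks):
    """Summarize weapon choice per situation.

    picks: iterable of (situation, weapon) pairs (hypothetical playtest log).
    Returns {situation: (top_weapon, top_share, evenness)} where evenness is
    the Shannon entropy normalized to [0, 1] over the weapons actually observed.
    """
    by_situation = defaultdict(Counter)
    for situation, weapon in picks:
        by_situation[situation][weapon] += 1

    profile = {}
    for situation, counts in by_situation.items():
        total = sum(counts.values())
        shares = [c / total for c in counts.values()]
        entropy = sum(p * math.log2(1.0 / p) for p in shares)
        # Maximum entropy for this many observed options (1.0 avoids dividing by zero).
        max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
        top_weapon, top_count = counts.most_common(1)[0]
        profile[situation] = (top_weapon, top_count / total, entropy / max_entropy)
    return profile

def dominant_everywhere(profile, share_threshold=0.7):
    """Weapons that are the clear top pick in more than one situation."""
    tops = Counter(w for w, share, _ in profile.values() if share >= share_threshold)
    return sorted(w for w, n in tops.items() if n > 1)

# Made-up picks: the sniper rifle dominates long range AND indoors.
picks = (
    [("long", "sniper")] * 8 + [("long", "smg")] * 2
    + [("indoor", "sniper")] * 9 + [("indoor", "shotgun")] * 1
    + [("vehicle", "sniper")] * 5 + [("vehicle", "rocket")] * 5
)
profile = choice_profile(picks)
print(dominant_everywhere(profile))
```

A weapon that shows up as dominant across several situations is a candidate for limiting. An evenness near 1 in a situation suggests nothing is clearly good at that role.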