November 15, 2010

The Value of Metrics

Written by: Vixsin
Back in the way back when, when killing Prince Malchezaar was a personal accomplishment, or even before, when all I wanted in the world could be found in Quagmirran’s Eye, I was content in my ignorance about performance. Fun and output were loosely correlated at best. But as I progressed into the world of competitive raiding, and then further still into a world where teams feel like failures if they fall out of the US top 20, I started to recognize the edge that comes from continual performance review. And when, about 2 years ago, I started with a firm that prides itself on the ability to evaluate projects based solely on key data points, the value of metrics finally solidified itself in my mind.

Over the years, I’ve read, heard and participated in a number of arguments about the pros and cons of meters. And nothing is more sure to get a response from a complete WoW stranger than linking a meter in chat and commenting on some perceived shortcoming. Even Ghostcrawler himself has weighed in on the topic saying:

Trying to evaluate how awesome a healer you are by looking at healing meters is extremely dangerous. Heck, it’s even dangerous to compare dps if you don’t look at what’s really going on. (Source)

And this oldie but goodie (wonderfully enough, from a Shaman thread about Ulduar):

The moral of the story is meters are very useful, but like any tool, their ability to measure what happens in reality has limitations. In my experience, players put too much emphasis on them, especially for healing. (Source)

Now I’ll credit GC with being an incredibly intelligent individual and certainly one of sufficient patience to put up with the attacks that frustrated and angry players level at him day in and day out. And I’d bet that at one point or another he’s had to delve into some class performance data to evaluate how much “nerfed to the ground” actually is. But on this point, on the question of whether meters can truly give you an accurate picture of a player’s performance, I have to respectfully disagree with him, and likely, many of you. Meters, and the metrics they present, absolutely matter. They’re not perfect at present, but I have some ideas of how they can get there.

(Do note: when I say “meter” in this post, I am using it as a general term encompassing every combat log parser out there—from real-time addons like Recount and Skada, to detailed online tools like WoL and WMO.)


Why do we need meters?

I am, quite unashamedly, an obsessive tinkerer. I am constantly looking for new ways to improve how I do things, from repetitive data handling (I love you, Visual Basic!) to casually tracking the fastest way to get home in rush hour traffic (bah, traffic!). In WoW, this means I generally spend a good deal of time and energy thinking about and exploring ways to improve my performance. (Hell, I even come up with “optimal” farming routes, for those times when I want to zone out and still be productive.)

But I understand that this perspective isn’t one that’s shared by everyone. And in the 9th month of running ICC, even I abandoned the idea of process improvement long ago, content to simply enjoy the ride until Cata’s release. And yet I will still argue that meters are valuable to everyone, regardless of whether they are casual or uber-hardcore. Why?

The large majority of people think they’re above average.

Of course, you’re above average, I’m just talking about those other fools who don’t read my blog. Just like I’m talking about those other baddies who are the root of all evil in WoW and do things like behave badly, afk during trash, or ninja log during a bad LFD group. You and I would never do that … *cough* My point here is, you and I are both guilty of largely overestimating our performance and our own skill. It’s why each and every one of us can recount a tale of absolute baddies but casually dismiss our own bad behavior and choices. It’s also the same reason players point to meters and say, “that’s not representative of my contributions to the team”, because how we feel about our performance and how the numbers actually play out, are two separate things entirely.

This effect (Illusory Superiority) provides an explanation for something we all have trouble acknowledging—meters are important because we stink at gauging how good, or bad, we actually are. (For more interesting reading, check out: Why we overestimate our competence and Inaccurate Beliefs About Learning and Memory)

(Now let me stop you for one second before you race to the comments section to advise me just how wrong I am—this is not a post about the use or misuse of meters. This post is not about how capable we are of objectivity. This is about why and how we can use meters to help us become better players. Do read on, I’ve got some ideas in that regard.)


So what do meters tell us?

Aside from the obvious benefits to epeen enlargement, meters actually provide a good window into player performance, more than they’re really given credit for. Even if you’re not willing to delve too deep in your analysis and break out the comparison spreadsheets and data charts, most meters out there can still give you a good selection of information to look at. Aside from the “Damage Done” and “Healing Done” options, meters also generally display the following (in some way or another):

  • Damage taken and healing taken (by raid member, by spell)
  • Enemy damage done (to raid members, by spell, by percent)
  • Dispels (by raid member)
  • Interrupts (ohai, did we all forget about this one? Even you former RoS rogues?)
  • Activity (by raid member)
  • Overhealing done (by raid member)
  • Targets healed (by raid member)

Now, I absolutely concede that these meters, as they currently exist, make it hard for players to see the full picture of a fight or instance, because it’s easy to get lost within all of the twists, turns and nuances of the average parse. But to say that the above metrics are useless because “they don’t tell the whole story” is simply a load of hogwash.

Meters, and combat logs, are the raw data of an encounter, without any of our personal bias thrown in. “Vixsin totally made the heal that saved the tank and prevented a wipe” isn’t appended on any of the lines, nor will my epic dodge of a Shadow Trap be recorded as a +1 Skill anywhere on the Skill Meter. But instead of seeing that as a bad thing—as players often argue—I instead see that as something positive. Yes, winning as a team is about group performance, and there is no I in a healing team, but if that were the long and short of the story, then there’d be no reason to track sports statistics, right?

So when GC or anyone else says that meters are dangerous, I instead take it to mean that he and they are talking about times where someone links a dps meter (with himself at the top, naturally) and concludes that he wins at the internet. One meter, one metric for that matter, does not tell an entire picture, this much is true. But why can’t we create a series of indicators that do?


A better perspective

As I see it, the problem with current combat log parsers centers around a couple key flaws:

  • Problem: Meters contain little hierarchy of information. At their most detailed, they are like a full-blown, detailed, cross-referenced, and avidly footnoted thesis, with absolutely no abstract or summary, no chapters, and no conclusion. In a business environment, this would be like lacking an Executive Summary, so named because company executives rarely have the time to read your 120-page report on the forensic analysis of costs on the new private performing arts center. They need the facts, your findings and your conclusions, and they need them in as few words as possible.
  • Problem: The established gauges of excellence “DPS” and “HPS” are insufficient standards. DPS and HPS are not latched on to as “good” metrics because players understand the values they represent (and they represent a limited world view at best). They are latched onto because they are easy to understand (more dps kills bosses faster) and easy to use as a point of comparison (HPS = healing version of dps). But they are exactly that, one point.
  • Problem: There is little way to cross-compare personal or guild performance. Oftentimes, looking at your own performance from week-to-week or looking at several guilds’ performance on a specific encounter leaves you struggling to identify similarities and differences within a multi-faceted parse. This means that tracking your own performance gains is boiled down, once again, to the easy and fast metrics of HPS and DPS, disregarding the supplemental factors that influenced that single number.

So what’s the solution? New metrics, and more of them.

Ultimately, the goal of any metric is to provide a system which allows raid leaders and raiders themselves to evaluate performance relative to their teammates, data groups, class, etc. But this data lacks context unless it can be compared to other data sets in an attempt to quantify improvements or slippages. So, in order to do this, we need to establish common, yet influential, performance elements, which apply in just about every boss fight, like the following:


Activity

  • % activity / 100%
  • Type: player-to-player comparison
  • Benefit: An easy metric to understand how good a player is at staying alive (or how loved he/she is by healers). This is already something shown on WoL and WMO, but currently can’t be compared on a parse-to-parse basis.
  • Best Score: 100 (for surviving the whole encounter)

Resource Consumption

  • Total resources expended / 12 seconds
  • Type: class-to-class
  • Benefit: Although I’m not sure this would be a point of comparison that would be valid for death knights (maybe on a limited basis for runic power), this would allow most players to evaluate their resource management against other players of their class. Was one rogue able to eke out more dps because of his energy use? Was one healer’s regen far surpassing another healer of the same class? This metric would ultimately identify which players are doing more for less (a vital find at least for mana users in Cata).
  • Best Score: (no limit)


Focus

  • [(your damage to target1 / total damage to target1) + … (your damage to targetn / total damage to targetn)] / total number of targets to which damage was done (for healers, replace damage with healing)
  • Type: player-to-player
  • Benefit: It’s an oftentimes forgotten aspect of dps and healing performance, but the number of times you switch targets directly correlates to a reduction in throughput for both dps and hps. So, a focus metric not only allows you to easily determine whether or not you’re comparing a raid-healing pally to a tank-healing pally, but also evaluate on a basic level whether or not your healers and dps are doing the jobs that they should be doing.
  • Best Score: (depends on assignments)
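
As a rough sketch of how the focus formula above might be computed, here is a small Python example. The event format (source, target, amount) and the function name are hypothetical, not anything a real parser exposes, and it assumes "total number of targets" means the targets this particular player actually hit or healed.

```python
from collections import defaultdict


def focus_score(events, player):
    """Average share of each target's damage (or healing) owned by `player`.

    `events` is an iterable of (source, target, amount) tuples. This shape
    is an assumption made for illustration, not a real combat log format.
    Only targets the player actually touched are counted (also an assumption).
    """
    total_by_target = defaultdict(float)
    mine_by_target = defaultdict(float)
    for source, target, amount in events:
        total_by_target[target] += amount
        if source == player:
            mine_by_target[target] += amount

    # One share per target the player touched, skipping zero-total targets.
    shares = [
        mine_by_target[t] / total_by_target[t]
        for t in mine_by_target
        if total_by_target[t] > 0
    ]
    if not shares:
        return 0.0
    return sum(shares) / len(shares)


# Made-up healing events: healer_a mostly heals the tank.
sample_events = [
    ("healer_a", "tank", 900),
    ("healer_b", "tank", 100),
    ("healer_a", "mage", 50),
    ("healer_b", "mage", 150),
]
print(focus_score(sample_events, "healer_a"))
```

A tank healer should land near 1.0 on a single target, while a raid healer will show smaller shares spread across many targets, which is why the "best" score depends on assignments.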

Support activities

  • (your # of interrupts, CC, cleanses) / (total # of interrupts, CC, cleanses)
  • Type: player-to-player and player-to-team
  • Benefit: It goes without saying, but sometimes performance of a simple task makes all the difference in the world (interrupts on Vezax, for example). And when these types of mechanics crop up in Cata, and they will, it will behoove you to know which members of the team are only working to further their own numbers and which are managing to multitask.
  • Best Score: 100 (representing that you did all of the interrupts in an encounter)
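
A minimal Python sketch of the support ratio above, scaled to 100 to match the stated best score. The input (a dictionary of per-player counts of interrupts, CC and cleanses combined) and the function name are assumptions for illustration.

```python
def support_share(counts, player):
    """Percent of the raid's support actions performed by `player`.

    `counts` maps player name -> interrupts + CC + cleanses for one
    encounter (a hypothetical, pre-aggregated input).
    """
    total = sum(counts.values())
    if total == 0:
        # No support actions happened, so nobody gets credit.
        return 0.0
    return 100.0 * counts.get(player, 0) / total


# Made-up counts for one encounter.
sample_counts = {"rogue": 3, "shaman": 1, "mage": 0}
print(support_share(sample_counts, "rogue"))  # prints 75.0
```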

Incoming Damage

  • your damage taken / (total damage taken – outliers)
  • Type: player-to-player
  • Benefit: In an environment where mana matters, the damage your partners and team take will start to have a pretty sizable impact on the success of the team. So, while this metric might not be valuable when evaluated for a single raid night, when looking at weeks of raids, you’ll be able to quickly see who has a knack for standing in fire and who is really worthy of the healers’ adoration.
  • Best Score: 0 (you took no incoming damage during an encounter)
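
The incoming damage ratio could be sketched in Python along these lines. The post does not define "outliers", so this example assumes they are a named set of players (tanks, most likely) whose damage is removed from the raid total. The function name and input shape are hypothetical.

```python
def incoming_damage_share(damage_taken, player, outliers=frozenset()):
    """Player's damage taken as a fraction of the raid total minus outliers.

    `damage_taken` maps player name -> total damage taken in an encounter.
    `outliers` is assumed to be a set of players (e.g. tanks) whose damage
    is expected and therefore excluded from the denominator.
    """
    baseline = sum(
        amount for name, amount in damage_taken.items() if name not in outliers
    )
    if baseline == 0:
        return 0.0
    return damage_taken.get(player, 0) / baseline


# Made-up numbers: the tank is excluded from the comparison pool.
sample_damage = {"tank": 900_000, "mage": 30_000, "priest": 10_000}
print(incoming_damage_share(sample_damage, "mage", outliers={"tank"}))  # prints 0.75
```

Tracked over several weeks, a consistently high share for a non-tank is the "knack for standing in fire" signal described above.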

Personal and Team Values (at least for Healing)

  • Personal value: activity / effective healing (Max of 100)
  • Team value: (effective healing) x (% of total healing done) x (activity) x (normalization factor) so that ∑ team values = 1
  • Type: player-to-player
  • Benefit: Playing for a team is a delicate balance, and oftentimes when you start looking at “top parses” it’s easy to forget that personal accomplishment comes at the price of team accomplishment. The DPS that was Power Infusion’ed to the top of WoL got there because he also stood in fire, and ignored adds, not because he was particularly adept at his rotation. Likewise, the healer who deviated from a tank-healing assignment to raid-snipe (and thus caused other healers to have to supplement tank heals), was working more for himself than the team. By creating complementary metrics for personal and team “worth” you can not only see performance gains within your team, but between players and between parses from week-to-week.
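
To make the team value formula a little more concrete, here is a hedged Python sketch. It assumes "effective healing" is healing done minus overhealing, that "% of total healing done" is each healer's share of the team's effective healing, and that activity is a fraction between 0 and 1. The normalization factor is simply whatever makes the values sum to 1. All names are hypothetical.

```python
def team_values(healers):
    """Normalized team value per healer, summing to 1 across the team.

    `healers` maps name -> (effective_healing, activity), where activity
    is assumed to be a fraction in [0, 1]. The raw value follows the
    formula above: effective healing x share of total healing x activity.
    """
    total_healing = sum(eff for eff, _ in healers.values())
    if total_healing == 0:
        return {name: 0.0 for name in healers}

    raw = {
        name: eff * (eff / total_healing) * activity
        for name, (eff, activity) in healers.items()
    }
    raw_sum = sum(raw.values())
    if raw_sum == 0:
        return {name: 0.0 for name in healers}

    # Dividing by the sum plays the role of the normalization factor.
    return {name: value / raw_sum for name, value in raw.items()}


# Made-up parse: two healers with different output and uptime.
sample_healers = {"shaman": (600.0, 1.0), "priest": (400.0, 0.5)}
print(team_values(sample_healers))
```

Because effective healing appears twice in the product (once directly, once as a share), this version rewards high-output healers quite steeply. Whether that weighting is desirable is a design choice the formula leaves open.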

Example Healer Metric Calculations

Ultimately, I think establishment of the above metrics could contribute to a better analysis of performance, and provide a clearer picture of where teams are succeeding and where they’re falling short. Inclusion of each of the above, on a summary page, would allow a player to quickly and easily assess themselves and others. True, you’ll never be able to see the value of that “one ZOMG EPIC clutch heal” but then, who’d want to reduce their healing contributions down to one twitch moment anyways?


Any takers?

So, I guess after all that spiel, the only thing left to say is … so who knows Lua?


  1. Sekul

    Great post as usual.

    I’m glad to see some of my opinions regarding meters are validated. I’ve always used meters and log parses to evaluate performance of myself and raid members. And, I’ve always had to defend that against the many people that are of the belief that meters are bad for a raid.

    While I do agree that the DPS/Damage total and healing total linking are in fact bad for raids, I don’t use meters or logs for that purpose. It’s never good for a DPS to pad the meters at the expense of fight mechanics just to see their name at the top, we can all agree on that.

    The things I look for on meters are things like: what percent of your damage was on the adds you were supposed to be killing as compared to similar classes (aka were you doing your job or just nuking the boss to pad the meter)? Were you dispelling/CC when you were supposed to? What damage did you take and was it avoidable? Etc…

    Notice that the things I look for typically center around fight mechanics. I personally believe that an 8K mage that performs the mechanics of a fight correctly is more valuable than a 12K mage that ignores adds/CC/etc., so that is what I evaluate performance on.

    Now, of course if you have someone that is very very below the average in terms of healing/dps then you need to look more closely at their individual performance with regard to spell selection, activity and possibly gemming/spec, but I’ve found that to be pretty rare.

    I’m a big fan of “What killed you?” Usually the meter I have running live in a fight is “Deaths”. When someone dies in a raid, I want to know why. If they stood in fire then I can see that and address it immediately, if they died because they were taking raid wide damage and didn’t receive a heal for 6 seconds, then I can address that as well.

    The moral of my story is something I’ve seen many people state before and what you say so eloquently in your blog. Meters are good and bad, it depends on how they are used. If used correctly, they are an invaluable tool for evaluating raid performance and will greatly contribute to a raid’s success. If used improperly they will report the epeen numbers you put up on your 15 sec wipe.

  2. Qat

    This is perhaps the best breakdown of holistic meter-use I’ve ever read. My endgame experience (11/12 icc 25 hm, BT/Sunwell in BC) has been full of healers who use the healing done/HPS meters as bragging points (ref: CoH spam priests in BT or the holy pally who refused to do JoW even when JoL wasn’t needed) or else refuse to acknowledge meters as useful… neither extreme is helpful.

    I really liked your point about target switching. I don’t normally think of target switching as a throughput loss (having generally been a raid-only healer), but it’s given me something to think about.

    I also heartily agree with using metrics that include support activities like dispels and interrupts, as well as incoming damage taken. I’ve been in raids where the resto druid has been in meter trouble because he has to stop HoT-ing to try and save a tunnel-vision Chain Healing shaman who won’t get out of the fire.

    Another issue you didn’t really address but I think is a major symptom of meter-whoring is healers who will crush everyone else on easier fights where awareness checks are at a minimum and no one is ever at risk of dying, and use those numbers to justify their poor performance on progression or difficult healing fights. This type of person was often a HoT spammer (holy priests and druids alike) who would soak up all the incidental minor raid damage with rolling hots.

    My healing background is mostly priest, and I’m changing to shaman for cataclysm, so I’m new to the shaman game. I’m also new to this blog, but I’ll be putting it on my list of regularly read resources. Keep up the good work.

  3. Narci

    Excellent post – I’m always impressed by your ability to integrate interdisciplinary information in such a dense and objective way.

    I agree with the poster above that as a RL/Individual, the stat I want to know RIGHT AWAY is “Deaths”. Determining why people are dying, and how to fix it, is the surest way to make progress. It’s also the #1 thing I can control as a healer. Once we hit an enrage with 10/25 people alive, I’ll worry about DPS.

    One tool you might not know about that starts to allow comparisons like this is Kamigami CompareBot, which allows you to compare performance between two individuals’ WoL parses. The coding on this was complicated enough to let me see why more comprehensive tools aren’t readily available. I’d imagine the author would be interested to hear your thoughts on it, though.

  4. Patrick

    Thanks for another well-written and thought-provoking post. At a broad level, I agree with the direction you’re moving in: healing meters can be a valuable tool for evaluation, and some of their limitations can be overcome by designing better metrics. On the other hand, there are some real challenges to the concept of numeric evaluation that you don’t really deal with. It seems like you are engaging this issue at a deep level, so I thought it would be useful to challenge you on the premise of this post.

    I want to clarify what we’re actually trying to measure. Success in an encounter in WoW is effectively binary. As a result, the underlying measure of a player is how they affect the team’s probability of success. The binary nature of a boss fight makes this type of evaluation vastly easier than many similar situations in the real world, but it’s still a hopelessly complex problem. In a boss fight, we face a continuous stream of decisions with multiple available actions, and at each point in time, our choice will affect the probability of success.

    Any metric is therefore going to be an imperfect proxy of your true contribution. As you identify, for damage dealers it is much easier to relate a metric we can measure (DPS) to our probability of success than it would be for healers. There is a readily apparent argument that making the fight shorter will increase our probability of success. This same argument is less clear for healers. Metrics such as effective healing or active time are much more loosely related to success.

    As a result of these factors, the selection of metrics is an inherently subjective process. For instance, I’m pretty sure that most Disc Priests are more likely to place a higher value on absorbs than other classes. More generally, you identify that people tend to overrate their own performance, but metrics aren’t inherently a solution to that issue because people select the metrics that make them look good. As you expand the number of ways to evaluate a player, you make rating a more subjective process because there are more dimensions in which someone can excel, and there are no objective weights for the various measures.

    In addition, anytime you create a metric, you face the temptation to tailor your play to maximize your score, rather than maximizing your chance of succeeding on the encounter. The more you invest in the metric, the greater this temptation will be. Snipe healing is an example of playing to maximize HPS rather than chance of success. As we move towards limited mana pools in Cata, a lot of healers are going to go OOM while outperforming on the meters because they use their expensive heals to top people off who don’t immediately need a heal. Any metric you devise will have this issue though. A more complex metric will just hide these factors better. If we universally adopt a complex metric, and that metric happens to overweight effective healing, then over time we will gravitate too far towards low overhealing percentages and underweight other things. Your argument is basically that a better healing metric will encourage better healing, and I agree with this up to a point. But as you invest more heavily in a metric, you weigh the results of it even more, and you become more susceptible to its limitations.

    The other side of the coin is empirical analysis. Your metrics are firmly rooted in theory: you think it’s important to keep effective healing high, so you put it into your calculations. At the extreme other end of the spectrum, in theory you could build statistical models that look at the outcome of a large number of fights and seek to identify which factors are most strongly correlated with success. Most people exist somewhere in the middle ground. They build mental models based on their past experience (empirical), and then intellectually construct theories to explain this experience (theoretical). Metrics assist in this process, but they also can contribute to laziness. My prior guildleader and MT would glance at the healer meters every so often and make broad statements about one healer being better than another one. He consistently undervalued our holy pally (who ironically spent basically the entire fight healing him).

    I’ve rambled on too long, but my basic point is that metrics will always be subjective and imperfect, and developing better metrics will never obviate the need to use them intelligently. In the end, quality evaluation will still depend on the creativity and attention of the evaluator.

  5. Pruritis

    Love the post. It’s interesting to see how people feel about meters. As a long time discipline priest (new to this shaman thing) I’ve hated healing meters for a long time. Absorbs are rarely looked at when evaluating a ‘healing team.’ I’m glad the new recount has included an “absorbs” section in it but people tend not to look there.

    A lot of people only look at two sections of their meter (one section in most cases): healing done and damage done. While it makes logical sense that the more damage or healing someone does the better they are at the internet, like you said, it’s not the whole story. A smart person can look at meters intelligently and understand different components of a fight. Some classes just do better in different fights, and a lot of people understand that and take that into account while others don’t. The bottom line is you can’t fix stupid. People will always look at the mage pulling top dps who stood in fire the whole fight and say WOW look at that, while shaking their finger at the priest who had 43 dispels but lowest healing of Faction Champs.

    Another thing about meters (particularly with healing meters) is that they vary so much based on how much damage is going out in a fight. For example it’s really easy to put out crazy HPS and healing done on a fight like Blood Queen whereas for Lady Deathwhisper there just isn’t a whole lot of damage going around. So a shaman number 1 on healing done for Deathwhisper isn’t an accurate representation of what they can do on a high damage fight. So just looking at that simply doesn’t equate to how much ‘skill’ they have as a healer.

  6. Eso

    Great post as always and as a raid leader I couldn’t agree more with Narci.

    Thanks for the link to comparebot I will definitely have to check that out.

    On the topic of performance metrics (sort of), I have been working on an addon to track the benefit of our mastery. i.e. for any given fight track how much of the mastery healing was effective healing and its benefit to overall effective healing.

    Anyone know of any similar projects that would save me time of developing it any further, or would anyone be interested in a copy of it?

  8. Cadence

    Eso: I’d definitely be interested in it. I’m pretty curious about how mastery will work into our overall healing.

    Also, Vixsin, thank you. I’ve always found it frustrating that meters don’t really apply to evaluate my ability as a healer. This gives me another tool, and ideas towards new ones.

