Pitfalls in measuring organizational performance
The human resources director at a former employer of mine was reviewing the results of the annual employee satisfaction survey. The survey measures various aspects of employee engagement, such as how well employees understand company strategy, how fulfilling the work is, and whether they think of interviewing elsewhere. The organization is scored on each factor, and the current year’s scores are compared with previous years’ to determine whether employee satisfaction is improving. She touted the survey’s high response rate and was satisfied that it measured how employees actually felt.
On a different day, the CEO was celebrating the past quarter’s gains in Daily Active Users (DAUs) and Monthly Active Users (MAUs) as a testament to the growing popularity of our online publications.
However, these methods of measuring performance can be misleading, and unwary organizations can be left with a false impression of how they are doing. There are two kinds of bias in these measurements that we need to correct for. The first arises from ignoring changes in the population mix: individuals are of different types, and the proportions of those types can vary over time. The second is survivorship bias, which arises from considering only the individuals who are currently in the population.
How do these biases affect organizational measurements? For illustration, consider two types of employees based on tenure: those with less than one year in the organization and those with more. New employees, having just chosen to join the organization, are likely to be enthusiastic about it and give higher scores in the survey. Employees who have been around a while have a better understanding of the organizational culture, performance, and leadership, and could give lower scores. A change in the mix over time (an influx of new employees, for example) could skew the overall scores higher, even though the scores from longer-tenured employees may have dropped.
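To make the mix effect concrete, here is a small sketch with made-up numbers. New hires keep scoring 4.5 and tenured employees actually drop from 3.6 to 3.4, yet a hiring spree shifts the mix and the overall average rises:

```python
# Hypothetical scores illustrating how a shift in employee mix can lift
# the overall average even as tenured employees grow less satisfied.

def overall_score(groups):
    """Weighted average of group scores, weighted by headcount share."""
    total = sum(n for n, _ in groups.values())
    return sum(n * score for n, score in groups.values()) / total

# (headcount, average survey score) per group -- numbers are invented
year1 = {"new": (20, 4.5), "tenured": (80, 3.6)}
year2 = {"new": (50, 4.5), "tenured": (50, 3.4)}  # hiring spree, morale dip

print(round(overall_score(year1), 2))  # 3.78
print(round(overall_score(year2), 2))  # 3.95 -- looks like an improvement
```

The aggregate moved up by 0.17 points while every tenured employee's experience got worse, which is exactly the distortion the text describes.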
Survivorship bias can similarly distort the measurement. Employees leave the organization, sometimes out of dissatisfaction with their jobs and sometimes for better prospects elsewhere, and the most dissatisfied tend to leave first. Satisfaction surveys miss the opinions of exactly these employees, which lifts the organization’s scores: performance may appear to improve over time even as it actually gets worse.
Similar biases confound customer engagement metrics such as Daily Active Users (DAU) and Monthly Active Users (MAU) that online businesses commonly use. As measured, these metrics hide the degree of user churn the business is experiencing. If churn is moderate to high, at some point the stock of new users will diminish and the DAU and MAU figures will drop; the business will then assume some event must have caused the drop even though nothing has changed. DAU and MAU are important measures that should be tracked, but another metric deserves equal importance: the proportion of previous visitors (customers) who return.
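A return-rate metric of this kind is straightforward to compute when users can be identified per period. The following is a minimal sketch with invented user IDs, showing how MAU can grow while half of the previous period's users have churned:

```python
# Sketch of a return-rate metric to track alongside MAU, assuming user
# IDs are available per period. Names and data are illustrative.

def return_rate(prev_users, curr_users):
    """Share of the previous period's users who came back this period."""
    if not prev_users:
        return 0.0
    return len(prev_users & curr_users) / len(prev_users)

march = {"u1", "u2", "u3", "u4"}
april = {"u3", "u4", "u5", "u6", "u7"}

print(len(april))                 # MAU for April: 5, up from 4 in March
print(return_rate(march, april))  # 0.5 -- half of March's users churned
```

Tracked together, a rising MAU with a falling return rate signals that growth is being propped up by new users who are not sticking around.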
How then should company performance over time be measured? The right way is to use an index. Think of stock market indices such as the Dow Jones and the Standard & Poor’s 500 (S&P 500), or the consumer price index (CPI) used to measure price changes in the economy. These are designed specifically to measure changes over time. So what is an index? An index is a fixed set of data points whose aggregate changes are measured over time. The index should be representative: in the case of employees, there should be good representation across gender, experience level, race/ethnic background, department, and role or position in the organization. Once the categories are defined, an average score can be computed for each category, and the overall score is a weighted sum of these category scores.

What to do about employees who have voluntarily left? There are two approaches. A conservative one is to assume that a departed employee would have scored the organization poorly and use low scores (the lowest scores across all responding employees); this may be appropriate when the number of such employees is small. When the numbers are sufficiently large, departing employees can be asked to fill out the survey as part of the exit interview. They can complete the survey anonymously (since many employees fill it out, it would be difficult to identify individual responders). On top of anonymity, surveys can also be made confidential by using randomized response methods: a coin toss decides whether the true employee response or a default response is recorded. The correct aggregate statistics can still be computed in spite of the randomization.
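Both ideas can be sketched in a few lines. The category weights, the yes/no satisfaction question, the fair coin, and the "default answer is yes" convention below are all assumptions for illustration, not a prescribed design:

```python
import random

# Part 1: a fixed-weight index. Weights are chosen once, when the index
# is constructed, and held constant so mix shifts cannot move the score.
def index_score(weights, scores):
    """Weighted sum of per-category average scores."""
    return sum(weights[g] * scores[g] for g in weights)

weights = {"new": 0.3, "tenured": 0.7}  # fixed at index creation
print(index_score(weights, {"new": 4.5, "tenured": 3.6}))  # year 1
print(index_score(weights, {"new": 4.5, "tenured": 3.4}))  # year 2: falls

# Part 2: randomized response for a yes/no question. Heads records the
# true answer; tails records the default answer "yes".
def randomized_response(truth, rng):
    return truth if rng.random() < 0.5 else True

def estimate_true_rate(responses):
    """Observed yes-rate q satisfies q = 0.5*pi + 0.5, so pi = 2q - 1."""
    q = sum(responses) / len(responses)
    return 2 * q - 1

rng = random.Random(42)
true_answers = [rng.random() < 0.6 for _ in range(100_000)]  # 60% truly yes
recorded = [randomized_response(t, rng) for t in true_answers]
print(round(estimate_true_rate(recorded), 2))  # roughly 0.6
```

Because any individual "yes" may be a coin-forced default, no single recorded answer reveals what the employee actually said, yet the aggregate rate is recoverable by inverting the mixing equation.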
The methods I have described certainly add complexity to the survey process, and in many cases it may be unnecessary to take that complexity on. If the organization is fairly stable in its composition, hiring, and attrition rates, the simple survey process may be sufficient. That should be a deliberate choice, however.
DAU and MAU are important metrics for online publishers, but user retention and engagement should also be tracked. Again, the measurement should be based on an index: a fixed set of users chosen at a point in time and tracked over time. This is easier to do if users are required to log in and can be identified. Measuring site performance by user category is valuable for understanding differences in user behavior. Categories should be chosen carefully; avoid categories that are correlated with the variable being measured (i.e., don’t introduce endogeneity bias). I have seen organizations define categories such as fans and visitors and then track their activity over time. These category definitions suffer from endogeneity bias, since they are correlated with the very variable being measured. It is fine, however, to define the categories based on activity over a period in the past and measure their performance over a non-overlapping future period.
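The non-overlapping-window idea can be sketched as follows. The visit counts and the "fan" threshold of 10 visits are invented for illustration; the point is only that cohort membership is fixed from the past window before any future behavior is observed:

```python
# Sketch of endogeneity-safe cohorts: classify users by activity in a
# past window, then measure them over a later, non-overlapping window.
# The >= 10 "fan" threshold and all counts are assumed for illustration.

past = {"u1": 14, "u2": 2, "u3": 11, "u4": 1}    # visits, Jan-Mar
future = {"u1": 9, "u2": 0, "u3": 12, "u4": 3}   # visits, Apr-Jun

fans = {u for u, v in past.items() if v >= 10}   # defined on the past only
casual = set(past) - fans

def avg_future_visits(cohort):
    """Average visits in the measurement window for a fixed cohort."""
    return sum(future.get(u, 0) for u in cohort) / len(cohort)

print(avg_future_visits(fans))    # 10.5
print(avg_future_visits(casual))  # 1.5
```

Because membership in each cohort was settled before the measurement window began, a fan whose activity collapses still counts against the fan cohort's score, which is exactly what an unbiased comparison requires.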