The Return Problem

12 Jun

Under real financial strain, the higher education sector is being asked a deceptively simple question: what is the return on all of this? It is one of the hardest questions in all of evaluation - and watching the sector answer it in public exposes a confusion that runs well beyond universities.

Start with the strain, because it is real. The Office for Students projects that, without mitigating action, around 45 per cent of higher education providers in England (some 124 institutions) could be in deficit in 2025/26.¹ Recruitment recovered somewhat, but came in below what institutions had banked on, leaving a tuition-income shortfall against forecast. Courses are closing. Posts are going. And in that climate, "what's the return?" stops being a rhetorical flourish and starts driving decisions.

It is a fair question. It is also a trap, and the trap is this: when budgets tighten, organisations rarely cut what is least valuable. They cut what is hardest to measure. The two are not the same thing - and in a university, they are very often opposites.

Consider what a university is actually in the business of producing.

We have the outcome data. HESA's Graduate Outcomes survey (the largest annual social survey in the country) found that 88 per cent of 2022/23 graduates were in work or further study fifteen months on, and that 76 per cent of UK graduates working in the UK were in high-skilled roles.² Good numbers. But notice what they cannot tell you. They cannot tell you what those same people would have done without the degree. There is no control group of un-educated versions of the same graduates, living out the counterfactual. The honest account of that 88 per cent involves selection (who goes to university in the first place), signalling (what the credential tells an employer regardless of what was learned) and many other confounds.

This is not a data-quality problem that better measurement would fix. It is the attribution problem, and it is structural. You can observe the outcome. You cannot observe the cause in isolation. Even the current argument over whether AI is hollowing out the graduate labour market remains unresolved - partly because the same difficulty applies: the market is moving for many reasons at once, and disentangling them is genuinely hard.

So the demand for a clean return on a degree (a single multiple, a number for the board paper) is, at root, a category error. It asks a question the phenomenon cannot answer in those terms.

I have spent enough time inside the evaluation of complex systems to recognise where this confusion comes from. It is, in the end, a paradigm problem - a subject I have been writing about, and one I first worked through in my MA dissertation on complexity and sociological paradigms.³

The demand for return-on-investment is a functionalist demand. It assumes outcomes are objective facts, that valid knowledge comes from measurement, and (most tellingly) that what cannot be counted does not count. That is one legitimate way of reading organisational reality. It is not the only one. Burrell and Morgan mapped the competing paradigms half a century ago, and the point that has stayed with me is that much of what a university produces (the slow formation of judgement, the capacity to think, the change in how a person sees) does not live in the functionalist world at all. It is constituted by meaning, and it emerges over time. Held up against a metric, it mostly disappears.

This does not mean measurement is worthless, or that we should retreat into the comfortable position that the important things can never be assessed. I have never believed paradigms are sealed boxes - the boundaries are porous, and there are grounds for choosing between them. The point is narrower and more practical. You have to know which questions a metric can answer and which it cannot, before you build a decision on top of it. The functionalist question (did the number move?) is necessary. It was never sufficient.

I could be writing about my own profession, and the mirror is uncomfortable.

Executive coaches are often asked about ROI. Many buyers want the figure - and the coaching literature itself concedes how little ground there is beneath it. Two decades ago, Carter, Wolfe and Kerrin observed that there is no logical causal link between coaching and improved business results; the connections are interpretative, and multiple other things (a new line manager, a shifting market) move the numbers at the same time.⁴ You never get to observe the un-coached version of the same leader. And the outcomes that matter most (a steadier nerve under pressure, a poor decision not made, a resignation quietly averted, a culture that shifts by half a degree) are precisely the ones least willing to be reduced to a number.

None of this means the change isn't real. Done well, executive coaching produces change a leader can feel and colleagues can see - sharper judgement, steadier decisions, a different quality of attention in the room. The difficulty was never whether the change happens. It is that the change refuses the kind of accounting being demanded of it.

And yet the industry has a standard answer, and it repays examination precisely because it is so widely used. The best-known ROI methodology for executive coaching reaches the moment where it must isolate the effect of the coaching from everything else acting on the same leader - and resolves it like this: it asks the participant to estimate the percentage of the improvement they attribute to the coaching, and then multiplies that figure by their own stated confidence in the guess.⁵ A self-report, weighted by a second self-report, presented as a number with a decimal point. That is not a solution to the attribution problem. It is the attribution problem, wearing the costume of measurement. The confident return that follows was never discovered. It was elicited.
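To make the arithmetic concrete, here is a minimal sketch of the adjustment as described above. Every figure is hypothetical and invented purely for illustration, and the final step (net benefit over cost, expressed as a percentage) is the conventional ROI ratio, assumed here rather than taken from any particular published worksheet.

```python
def adjusted_benefit(claimed_benefit, attribution, confidence):
    """Benefit credited to coaching after the two self-reported adjustments."""
    return claimed_benefit * attribution * confidence


def roi_percent(benefit, cost):
    """Conventional ROI: net benefit as a percentage of cost."""
    return (benefit - cost) / cost * 100


# Hypothetical figures, not drawn from any real engagement.
claimed = 100_000  # value the leader places on the improvement
cost = 20_000      # fee for the coaching

# Three equally plausible pairs of self-reports: (attribution, confidence).
for attribution, confidence in [(0.5, 0.8), (0.4, 0.7), (0.6, 0.9)]:
    credited = adjusted_benefit(claimed, attribution, confidence)
    roi = roi_percent(credited, cost)
    print(f"{attribution:.0%} x {confidence:.0%}: ROI {roi:.1f}%")
```

On these made-up inputs the same engagement returns anything from roughly 40 per cent to 170 per cent, depending only on how the participant happens to answer two questions. The precision of the output is a property of the multiplication, not of the evidence.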

I do not say this from the outside. When I worked through coaching evaluation for my own Level 7 qualification, I went looking for something better than ROI, and what I reached for was the balanced scorecard - the instinct that if a single monetised figure distorts, the remedy is to measure across several domains at once.⁶ I still think that is a real improvement on a lone number. But I have come to think it does not go far enough, and it is worth being honest about why. A scorecard widens the dashboard; it does not touch the attribution problem. Measuring four things instead of one tells you more about what happened. It tells you no more about what the coaching caused. The deeper difficulty was never how many things we count. It is that counting, however broadly, cannot settle a causal question on its own.

There is a better answer, and it is the one serious evaluators have been using for years. You stop trying to prove attribution, and you start building a case for contribution.

This is the heart of John Mayne's contribution analysis, and it is more honest than the alternative without being any weaker.⁷ You cannot prove that the intervention caused the outcome. But you can set out the theory of change explicitly (the causal chain you believe is operating) and then test it. Are the intermediate links actually showing up? Does the sequence hold? Have you taken the rival explanations seriously and shown why they do not account for what you see? What you are left with is not a fabricated number but a credible, evidenced, defeasible argument. It is the same logic that underpins the government's own evaluation guidance, the Magenta Book, refreshed this year: match the method to the question, and be honest about what each design can and cannot establish.⁸ Tellingly, that guidance now frames theory-based methods not as proving an intervention caused a result, but as building the case that it contributed to one.

In a coaching engagement this is less abstract than it sounds. It means agreeing at the outset what change we are actually working towards and what would count as evidence of it; attending through the work to the intermediate signals that it is taking hold; and ending with an honest account of what the coaching contributed, and what it did not. That is a more demanding standard than a headline number, not a softer one - it asks more of me, not less.

Metrics still matter in this picture. They are necessary. They were simply never sufficient - and pretending otherwise, in a complex system, is not rigour. It is the opposite of it.

Which brings me to the part that is easiest to miss.

In a market saturated with inflated return claims (whether from a university defending its value or a coach defending a fee) the most credible position available is honesty about what can and cannot be known. The serious evaluator and the serious coach end up saying the same thing: here is what I can show you, here is what I genuinely cannot, and here is the case for contribution I am willing to stand behind.

That candour is not a weakness in the argument. In a complex system it is the only intellectually defensible position there is.

And it is, in the end, the more reassuring offer, not the less. A coach who hands you a number is selling a certainty he does not have. A coach who tells you plainly what can and cannot be shown, and builds the case for contribution anyway, gives you something that holds when you are asked to justify it.

The sector will come through its financial reckoning. The open question is whether it does so by learning to evaluate what it actually does - or by counting what is easy and quietly cutting the rest.

References

1. Office for Students (2025) Financial sustainability of higher education providers in England: November 2025 update.
2. Higher Education Statistics Agency / Jisc (2025) Graduate Outcomes 2022/23: Summary Statistics, released 17 July 2025.
3. Walker, A. A. (2026) The Paradigm Problem. Drawing on the author's MA dissertation, Understanding and Exploring Signatures of Complexity, University of Hull, 2001 (supervised by Dr Norma Romm).
4. Carter, A., Wolfe, H. and Kerrin, M. (2005) 'Employers and Coaching Evaluation', International Journal of Coaching in Organizations, 3(4).
5. Phillips, J. and Phillips, P. (2005) 'Measuring ROI in Executive Coaching', International Journal of Coaching in Organizations, 3(1).
6. Kaplan, R. and Norton, D. (1992) 'The Balanced Scorecard: Measures That Drive Performance', Harvard Business Review.
7. Mayne, J. (2008) Contribution Analysis: An Approach to Exploring Cause and Effect, ILAC Brief 16.
8. HM Treasury (2026) The Magenta Book: Central Government Guidance on Evaluation, updated 15 May 2026.

Andrew Walker
