University Report Cards, Ratings, Rankings, and Performance Measures

November 6, 2006

President’s Statement to Governing Council, Fall 2006

In the last few months, the Vice-President and Provost and I have briefed governance bodies on our view of some of the university rankings published by for-profit media outlets and non-profit agencies. The Vice-President and Provost has also kept governors apprised of our substantial array of institutional performance measures. More recently, a number of colleagues have asked about my views on the 2006 Globe and Mail University Report Card, released earlier this week (October 31st). This statement accordingly offers some reflections on the Globe Report Card, and a brief recapitulation of where we stand more generally on university rankings and performance measures, including the Maclean’s rankings.

I. THE GLOBE AND MAIL UNIVERSITY REPORT CARD

A) New Format, Strong Messages

First, I welcome the Globe’s new magazine format. The stories in the 2006 University Report Card show that the Globe and Mail has assumed a leadership role in the Canadian debate about higher education. One striking example is an article entitled ‘Can our schools become world-class?’, by Alex Usher, Vice President of the non-profit Educational Policy Institute [EPI]. Usher offers a trenchant summary of the challenges facing this institution and Canada’s other research-intensive universities. His analysis should be required reading for every Member of the Parliament of Canada and every Member of the Provincial Parliament of Ontario.

B) Fair Data-Gathering – U of T’s Approach

Second, the Globe and Mail has added some new fields of analysis to its previous student survey data. It has done so in partnership with EPI, using publicly-available data sources. This, too, is welcome. Maclean’s has insisted for many years that institutions do extensive and customized data analysis on a pro bono basis for the magazine’s university issues. In so doing, institutions not only used public funds to subsidize a for-profit enterprise, but tacitly legitimized one media source above others.

The University of Toronto’s approach to all these exercises is straightforward. We have been the national leader in institutional performance measurement. We intend to continue developing and posting on-line not only more information, but more partially-processed data for others to analyze as they see fit.

In this respect, to facilitate inter-institutional comparisons, the University has been working closely with other Ontario universities to develop a Common University Dataset for Ontario (CUD-O). CUD-O will use standardized definitions for a substantial array of key variables, and promote consistent analysis of each university’s operations. By defining these variables ourselves, we achieve two goals. We can show how different definitions affect the results and explain any interpretive pitfalls. We can also mitigate the ‘gaming’ that may become prevalent when data are processed for direct consumption by the media.

The first edition of CUD-O should be public in a matter of days. We anticipate adding more variables to the dataset in the months ahead, and we have begun encouraging universities outside of Ontario to participate. This initiative will help to level the playing field among media outlets and agencies that want to publish competing assessments of universities.

More generally, the University of Toronto does not intend to provide a blanket endorsement of any of these analyses by media outlets or other third parties. The University reserves the right to work with any outlet or agency that is committed to the responsible generation and interpretation of information for our diverse stakeholders. We have a healthy degree of skepticism about the rhetoric of accountability and transparency from those who, quite reasonably, have a certain interest in headline font sizes, circulation numbers, or corporate balance sheets. Our professoriate has very substantial expertise in institutional performance analysis, and its members, along with some of us in the administration, will remain free to comment on the reliability and validity of any of these efforts to grade or rank institutions.

C) An Opportunity to Hear from Undergraduates

Third, the main component of the Globe and Mail Report Card consists of the results of a large-scale undergraduate survey, undertaken with the help of a market research firm, The Strategic Counsel. The Globe and Mail acquired survey data reflecting the opinions of about 33,000 full- or part-time undergraduates registered as members of the studentawards.com database. Unfortunately, we do not have an appropriately detailed description of the methodology. It would be very useful to see full response rates by institution, to know the potential impact of question-framing effects on responses, and to understand the extent of potential sampling biases. However, on the positive side, starting in 2005 the Globe and Mail and The Strategic Counsel rejected numerical rankings as uninformative, and instead adopted letter grades to reflect the mean scores that students gave their university for each question on the survey.

With the response template allowing a maximum score of 5.0, institutions received a grade of A+ if their mean score was 4.6 and above for a particular question, A for 4.4, A- for 4.2, and so on in decrements of 0.2 down to C- for 3.0. A grade of D was assigned for a score less than 3.0. Because mean scores varied for each question, scores were standardized. To its credit, the Globe and Mail cautions that “there may not be statistically significant differences separating universities that receive different letter grades, although their mean scores are different”. A full summary of our grades is reproduced in Table 1.
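As a rough illustration of how such a score-to-grade mapping works, the sketch below converts a question’s mean score into a letter grade using the thresholds described above (A+ at 4.6 and above, decrements of 0.2 down to C- at 3.0, and D below 3.0). The function name and the handling of boundary values are my own assumptions, not the published method of the Globe and Mail or The Strategic Counsel.

```python
# Illustrative sketch only: maps a mean survey score (out of 5.0) to a letter
# grade using the thresholds described in the text. The treatment of exact
# boundary values and of standardization is assumed, not taken from the survey.

GRADE_THRESHOLDS = [
    (4.6, "A+"), (4.4, "A"), (4.2, "A-"),
    (4.0, "B+"), (3.8, "B"), (3.6, "B-"),
    (3.4, "C+"), (3.2, "C"), (3.0, "C-"),
]

def letter_grade(mean_score: float) -> str:
    """Return the letter grade for a question's mean score (maximum 5.0)."""
    for cutoff, grade in GRADE_THRESHOLDS:
        if mean_score >= cutoff:
            return grade
    return "D"  # any mean score below 3.0

if __name__ == "__main__":
    for score in (4.65, 4.41, 3.95, 2.9):
        print(score, letter_grade(score))  # A+, A, B, D
```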

Some colleagues have downplayed these results because they reflect perceptions or opinions, rather than reasonably obvious facts about the operations of our university. Certainly there are some striking anomalies. For example, our library resources, both in physical and on-line holdings, are unmatched in Canada, and the latest ranking of our library puts it 3rd in holdings in North America, bettered only by Harvard and Yale. Scores ranging from A- to B+ seem hard to fathom for this dimension of our performance.

Another anomaly is a score of B for “Number of courses to choose from”. University of Toronto undergraduates have by far the widest selection of courses of any institution in Canada. “Diversity of extra-curricular activities” also draws a B, even as U of T students have access to over 300 registered clubs, with arguably the largest selection in the country. The “Attractiveness of the Campus” is graded B, similar to many universities with less favoured environments, notwithstanding the beautiful settings of UTM and UTSC, and the remarkable blend of historic and award-winning new buildings on the St. George campus. Last, on “Availability of needs-based scholarships”, we scored C-. U of T commits approximately $45M per annum to needs-based bursaries, and was the first institution in Canada to institute a guarantee that registered students should not be prevented from completing their degrees owing to financial exigency.

While these and other grades may seem inconsistent with reality on our three campuses, any misperceptions among our students represent an opportunity for improvement simply through enhanced communication. To that end, we have already discussed these results at the executive table, and they will inform our ongoing investments in internal communications targeting a range of audiences, not least students.

On the other hand, I must emphasize that many of the grades are perfectly consistent with the results of the preceding year’s Globe and Mail survey and the last two years’ results from the National Survey of Student Engagement (NSSE). Other Ontario institutions, most notably the University of Western Ontario, have achieved higher grades from their undergraduate students with per-student funding no different from ours. We have lessons to learn from these and other survey results, and from our peers.

D) Lessons from our Undergraduates

Here are a few lessons that come to mind from the survey grades:

Many of those who are most knowledgeable about universities view U of T as the best research-intensive university in Canada. Usher, for one, describes U of T as the only Canadian institution that is “genuinely world-class across a wide range of fields”. Table 2 summarizes the recent international rankings of the overall standing of U of T and other highly-ranked Canadian institutions. While I personally view these overall ranking exercises as reductionist and arbitrary, the consistency of the results is apparent. In contrast, the University did not receive an A+ even for areas of obvious excellence, such as “Overall academic reputation of your university” or “Reputation for conducting leading-edge research”. It seems that the institution’s academic strengths are currently lost on meaningful numbers of undergraduates. Why? Perhaps it is again a matter of internal communication. Perhaps our tough grading practices and high academic standards create some friction, and our students are responding with some ‘grade deflation’ of their own. Perhaps too few undergraduates have an opportunity to interact personally with some of the renowned scholars who exemplify the international standing of our collegium. Understanding these perceptions may require internal ‘market research’ with our undergraduates.

Other lessons are more concrete. It appears we need to bolster front-line library services and hours of operation, rather than investing solely to defend our acquisitions budget against the ongoing erosion of library purchasing power. We need more quiet study space (not a surprise) and better classrooms and lecture halls (also not a surprise). The range of food services is limited on all three campuses, and more choices and better facilities will be required if we are to migrate from the current D to a more reasonable grade. The concept of expanding merit-based scholarships is now widely accepted, but to my frustration, it has taken months for a consensus to emerge on implementation. We received a D for “Availability of Merit-Based Scholarships” and frankly, it is what we deserve.

In short, many of these findings simply reinforce what we know, and offer a useful reminder that, as funding permits, new investments must be made urgently in facilities and services that our students value and need.

II. OVERALL RANKINGS (including Maclean’s)

I turn next to the question of overall university rankings. This matter is timely given the publication today of the Maclean’s universities issue, which attempts, among other goals, to rank Canadian universities overall in three categories — primarily undergraduate, comprehensive, and medical-doctoral. For clarity, the latter category is distinguished from the second by the presence of large-scale health science complexes, illustrating from the outset the difficulty with broad-brush comparisons.

A) Ranking and Data Sharing

Maclean’s ranking exercise is different in various respects from the Globe and Mail’s “report card” exercise. The Globe has analyzed third-party data, and displayed the results numerically, rather than in ranks. As well, the survey data from the Globe are used for grading on an A to D scale using the methods outlined earlier. This allows individuals to assess universities on the basis of criteria that matter to them. Maclean’s, to its considerable credit, is publishing more and more actual numerical information, and has developed a web-based tool that allows individuals to assess universities on a personalized basis. Unfortunately, Maclean’s remains committed to global rankings, and this feature, together with its requests for customized data, caused a large number of institutions to decline to participate fully in the ranking exercise.

As governors know, the University of Toronto was among the 26 institutions that did not generate customized data for Maclean’s this year. I have already explained that, in past years, meeting the magazine’s data needs had been very time-consuming, and essentially gave one media outlet a monopoly in the field. Our preferred strategy is therefore the one outlined above, wherein data will be made available using standardized definitions for a variety of media outlets and other agencies to use. I attach for reference a letter sent to Maclean’s by eleven university presidents in August, outlining our reasons for declining to submit customized data this year (http://www.news.utoronto.ca/bin6/060814-2502.asp). Our collective position reflects growing skepticism, worldwide, about reductionist rankings of universities by commercial enterprises.

B) International Skepticism about Rankings

A number of media outlets and non-profit agencies and institutes have attempted to generate assessments of the global performance of post-secondary institutions over the last 20 years. National reports include the US News and World Report rankings of American universities and, obviously, the Maclean’s rankings of Canadian universities. As Table 2 shows, world university rankings are produced by the Institute for Higher Education of Shanghai Jiao Tong University, the Times Higher Education Supplement and, starting in 2006, Newsweek International.

For a decade now, there has been growing skepticism about these rankings in the US post-secondary education system. The criticisms are essentially identical to those that have been raised about Maclean’s in Canada. In the late 1990s, for example, the majority of American law deans signed a declaration warning prospective students against commercial ranking systems “that purport to reduce a wide array of information about law schools to one simple number that compares all 192 ABA-approved law schools with each other. These ranking systems are inherently flawed because none of them can take your special needs and circumstances into account when comparing law schools.” The deans went on to decry the “arbitrary weighting of numerical factors” in these rankings, taking particular aim at the US News and World Report rankings. They commented: “The idea that all law schools can be measured by the same yardstick ignores the qualities that make you and law schools unique, and is unworthy of being an important influence on the choice you are about to make.”

In the last two years a handful of American universities have declined altogether to participate in ranking exercises. Others have objected publicly to the perverse effects of rankings on institutional decision-making. A number of highly selective liberal arts colleges, including Amherst, Swarthmore, and Williams, agreed this summer to reduce their early admissions offers and associated merit-based scholarships, and to expand needs-based support with a view to enhanced opportunities for lower-income applicants. The presidents of these institutions publicly acknowledged ‘gaming’, whereby they sought to increase the total number of applications, thereby accepting a lower proportion and appearing more selective in the US News and World Report rankings.

The New York Times reported in September 2006 that these same institutions were also “considering creating a new set of statistics to measure their educational standing. The proposed standards would be available to the public, but the individual measurements would not be combined to produce an overall score, as in the ranking guides.” Again, the parallels with CUD-O are obvious.

Inside Canada, there has been similar controversy. Both President Birgeneau and Interim President Iacobucci discussed withdrawal from the Maclean’s rankings with their executive teams. President Birgeneau was particularly concerned that Maclean’s would not modify its global scoring system in a way that better reflected international research excellence. As one example, Maclean’s does not capture international prizes for scholarly excellence, of which U of T wins dramatically more than its per-capita share. In 2002 President Birgeneau wrote formally to request changes in the scoring system, but no changes were made. As a new president, I was surprised both by the methodological weaknesses of all these ranking systems, and by the depth of antipathy to them among other university presidents.

I have already commented publicly on some of the conceptual problems with overall rankings of universities. These commentaries include the President’s Column in the Spring 2006 University of Toronto Magazine (http://www.magazine.utoronto.ca/06spring/prez.asp) and a commentary for the editorial pages of the Ottawa Citizen on April 22, 2006 (http://www.canada.com/ottawacitizen/news/opinion/story.html?id=adc4a1df-d148-484f-a569-80c18039f7c6). To repeat a caveat from the U of T magazine: “Imagine a hospital that was superb at heart surgery but had a mediocre obstetrics program. The combined rating for those two programs would be useless for heart patients and expectant women alike! It’s much the same when complex universities are reduced to a single score.”

Media outlets since the 1990s have attempted to defend themselves against this latter criticism with the counter-argument that universities assign students a cumulative GPA. With respect, that analogy is nonsensical. Here is a more accurate classroom analogy: the editor of a newsmagazine works hard to complete a course, but then is assigned an average grade based on the performance of the entire class, with twice as much weight given to the grades of those who happen to sit on the east side of the classroom. So it is when complex institutions, with fifteen or twenty faculties, scores of programs, and hundreds of courses, are represented by one global ‘ranking’.

Let us be clear that rankings are not going to disappear. The appeal of any ranking exercise rests on basic human psychology. Humans clearly have an instinct to orchestrate competitions with first-past-the-post results. Humans also like heuristics — convenient interpretive frameworks to manage complexity. We talk about the “Top 10” as if number eleven did not exist, and we invent categories such as the “Top 25” or “Top 100”. One wonders what would have happened had our species been blessed with six digits on each hand.

The Dean of the Wharton Business School, Patrick Harker, has been a particularly outspoken critic of the way that institutions use rankings for marketing purposes. Both Harvard and Wharton recently declined to facilitate surveys of their business school alumni for ranking purposes, and Dean Harker wrote as follows in May 2004: “There is a very strong consensus among all of the parties I’ve consulted that the ranking methodologies are severely flawed. Some people who agree with that also ask, ‘But if the rankings help us, who cares if they are flawed or give only a limited view of the school?’ But we can’t have it both ways. We either endorse a defective, inconsistent practice, or we speak out, offer better alternatives for information, and work with the media to enable them to report with more useful, objective data.” I support those sentiments. The University of Toronto will need to be careful about how it presents and uses any ranking results in the months and years ahead.

C) Some Limits to the Maclean’s Methodology

I shall comment only briefly on Maclean’s because the magazine is making positive changes, and the problems with the Maclean’s ranking system are similar to those that have been identified with other ranking exercises. The Maclean’s rankings of Canadian universities have been published for 15 years, and U of T ranked first in the medical-doctoral category for 12 of those years. However, as a telling indicator of the limitations of first-past-the-post rankings, it is clear that U of T and McGill have received extremely similar scores for most of that period. Small wonder: the weights in the original system were almost certainly designed to reflect conventional wisdom about the two oldest and best-known research-intensive universities.

Those weights, as a corollary, are entirely arbitrary. The Maclean’s score for medical-doctoral institutions combines 24 metrics, weighted in categories such as Average Entering Grade (11%), Faculty/Research (17%), Library (12%), or Reputational Survey (16%). As one example of the fallacy of these metrics, there is no consistency in how Average Entering Grade is determined across Canada. Alberta is penalized because its high school students write standardized provincial examinations that lead to sharply lower grade 12 averages. Moreover, rewarding universities that recruit students with the highest entering averages presupposes that we are uninterested in non-academic attributes, or that universities should be encouraged to skim off the students who are easiest to educate, leaving the bigger challenges for other institutions.
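To make the arbitrariness concrete, here is a minimal sketch of how such a composite score is assembled. Only the four weights quoted above come from the Maclean’s categories cited in the text; the catch-all “other metrics” bucket and the assumption that each metric is first normalized to a common 0–100 scale are placeholders of my own for illustration.

```python
# Minimal sketch of a weighted composite ranking score. Only the four named
# weights (Average Entering Grade 11%, Faculty/Research 17%, Library 12%,
# Reputational Survey 16%) come from the text; the "other_metrics" weight and
# the 0-100 normalization are assumptions made for this illustration.

WEIGHTS = {
    "average_entering_grade": 0.11,
    "faculty_research": 0.17,
    "library": 0.12,
    "reputational_survey": 0.16,
    "other_metrics": 0.44,  # placeholder for the remaining metrics
}

def composite_score(normalized: dict[str, float]) -> float:
    """Combine metrics (each pre-normalized to 0-100) into a single score."""
    return sum(WEIGHTS[name] * normalized[name] for name in WEIGHTS)

# Example use with purely illustrative, made-up normalized values.
example_institution = {
    "average_entering_grade": 88.0,
    "faculty_research": 95.0,
    "library": 98.0,
    "reputational_survey": 92.0,
    "other_metrics": 85.0,
}
print(round(composite_score(example_institution), 1))
```

The arithmetic itself is trivial; the point is that the final ordering of institutions depends entirely on a weight vector for which no principled justification is offered.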

As well, much of the data that underpin the Maclean’s rankings are not subject to audit. They are self-reported data based on definitions developed in consultation with the universities in the early years of the publication. The interpretation of definitions and their applicability to each university’s unique circumstances is largely in the hands of those preparing the submission. This has led to apprehensions that ‘gaming’ occurs as institutions seek to put the best possible construction on their data.

A further weakness, shared by Maclean’s and the Times Higher Education Supplement, is heavy reliance on survey data where response rates vary, leading to obvious risks of bias. I should repeat that similar concerns arise with the student survey data, including those reported by the Globe and Mail. We have not studied Maclean’s 2006 university report in detail, given that it is just on newsstands today, but response rates on the reputational survey dropped from 11.1% last year to 10.2% this year. These very low proportions open the door wide to dramatic shifts arising purely from response biases.

All these and many other concerns led a number of university presidents to open discussions in 2006 about the ranking system, with each other and with the new editorial team for the universities issues at Maclean’s. The results have already been described. My understanding today is that, among the universities in the Maclean’s medical-doctoral category, at most four — Laval, McGill, Sherbrooke and Saskatchewan — submitted customized data. Maclean’s is accordingly issuing its rankings using last year’s data for a number of fields, and attempting to argue that, since the numbers vary minimally from year to year, this is legitimate.

I do remain optimistic that a more sensible relationship with Maclean’s is emerging from this year’s disagreements. The magazine has indeed been making many positive changes, and I look forward to working with the Maclean’s universities team in the years ahead.

D) Limits to the International Ranking Systems

In fairness to Maclean’s, let me emphasize that each of the international rankings shown in Table 2 shares the general weakness of arbitrarily combining data from variables that may be badly measured from the outset. Each of them has particular flaws. The Times creates a composite score by combining reputation ratings, research outputs, proportion of international students and faculty, student-faculty ratios, and survey data from employers or recruiters. In my view, it relies far too heavily on reputational survey data, so that the results for some institutions vary from year to year in rather startling fashion, depending on geographic response biases. The Shanghai scoring system relies on objective research performance measures, but underestimates the role of the humanities and social sciences, and assesses the success of an institution’s educational mission largely on the basis of a handful of graduates who happen to win very prestigious research awards. It says almost nothing about the educational strengths of an institution or the experience of the students who attend it. Newsweek International has tried to balance off the weaknesses of these two systems by simply combining them, and adding an additional 10% weight to reflect the comprehensiveness of research library holdings. Again, however, reasonable people can disagree radically about what weights should be given to any of these elements, as well as about the utility of any overall ranking of complex institutions.
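As a rough illustration of the Newsweek-style combination just described, the sketch below blends two existing composite scores and adds a 10% weight for research library holdings. Only the 10% library weight comes from the text; the even split of the remaining 90% between the Shanghai and Times components, and the assumption that all inputs sit on a common 0–100 scale, are my own simplifications, since the text says only that the two systems are “simply combined”.

```python
# Illustrative only: blend two existing ranking scores with a 10% library
# component, in the spirit of the Newsweek International approach described
# above. The 45/45 split of the remaining weight and the common 0-100 scale
# are assumptions, not the magazine's published methodology.

def blended_score(shanghai: float, times: float, library: float,
                  split: float = 0.45) -> float:
    """Combine two composite scores plus a library score (all 0-100)."""
    return split * shanghai + split * times + 0.10 * library

print(round(blended_score(shanghai=92.0, times=88.0, library=97.0), 1))
```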

III. CONCLUSION: PERFORMANCE MEASUREMENT IN HIGHER EDUCATION

I shall add only one closing reflection and a commitment. As someone who spent much of his academic career in the field of institutional performance measurement in health care, I must say that the state of performance measurement in higher education is deplorable. Inputs are confused with process indicators, processes are confused with outputs, and outputs are confused with outcomes. The current obsession with rankings only adds to the prevalence of misinformation, in part because institutions — not least our own — have been too quick to advertise those rankings that put them in a favourable light, and criticize or downplay those that are less advantageous. As I mentioned at the outset, Toronto has led Canadian institutions in the volume of data that we post on-line (http://www.utoronto.ca/aboutuoft/accountabilityreports.htm) and in the way we analyze and interpret our institutional performance measures. The Vice-President and Provost and I, together with our remarkably capable team of institutional analysts and academic executives, are committed to maintaining that leadership position in the years ahead.