GLOBAL: University rankings meaningless

The central criticism of whole-of-institution rankings concerns a methodology that addresses quality superficially while projecting an image of complexity. Most rankings rely on two types of data: data supplied by institutions, which may not be validated, and data obtained from opinion polls in the name of 'expert opinion'. With both components on shaky ground, the use of complex formulae, weights and indicators serves only to lend a pseudo-scientific gloss to outcomes that may be statistically meaningless.

Rankings rely on quantification: weights are assigned to a set of indicators, and the weighted scores are used to rank institutions. This forces the multidimensional nature of quality onto a single linear scale, and in the process aspects of an institution that cannot be captured by weightings and numbers are distorted.
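
As a rough sketch of the mechanics, the fragment below shows how a typical league table collapses several indicators into one ordered list. The indicator names, weights and scores are invented for illustration and do not come from any actual ranking.

```python
# Hypothetical weights and indicator scores -- not drawn from any real ranking.
weights = {"research": 0.5, "reputation": 0.3, "staff_ratio": 0.2}

institutions = {
    "Univ A": {"research": 82, "reputation": 70, "staff_ratio": 55},
    "Univ B": {"research": 78, "reputation": 74, "staff_ratio": 66},
    "Univ C": {"research": 90, "reputation": 60, "staff_ratio": 40},
}

def composite(scores, weights):
    """Weighted sum of indicator scores -- the core of most league tables."""
    return sum(weights[k] * scores[k] for k in weights)

# Three dimensions of 'quality' collapse into a single ordered list.
ranked = sorted(institutions, key=lambda u: composite(institutions[u], weights),
                reverse=True)
for position, uni in enumerate(ranked, start=1):
    print(position, uni, round(composite(institutions[uni], weights), 1))
```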

The 'health warning' in the introduction to the UK's University Management Statistics and Performance Indicators includes comments such as: "Unfortunately [combined] rankings do attract a great deal of attention and mislead more than they enlighten." The authors of the Shanghai Jiao Tong University ranking have themselves published extensive accounts of the criticisms and limitations of their own ranking.

Rankings do not take into account important qualities of an educational institution that cannot be measured with weightings and numbers. As one critic observes, "Seven of the 10 [rankings] do not include an indicator for teaching quality... Obtaining independent, objective measures of teaching quality is difficult, expensive and time-consuming."

The researchers behind the Melbourne Institute Index themselves acknowledge that it "offers few insights into which institution has the best teachers or who provides the most value-added experience for students".

The Shanghai Jiao Tong rankings are heavily weighted towards the sciences. In Williams and Van Dyke (2004), laboratory-based disciplines receive a greater weighting than other disciplines. Other defects include the failure to treat undergraduate equity as a separate factor, a misleading calculation of revenue per student for dual-sector institutions, the compressed scale applied to undergraduate students, and the overlap between the research publication factors.

In other words, rankings are largely based on what can be measured rather than on what is relevant and important. Furthermore, rankings are sensitive to relatively small changes in the weightings used: modest adjustments to the weights attached to indicators can alter the results from year to year without any tangible change in the institutions themselves.
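
The toy example below makes the point concrete: with the same two invented indicator scores for two hypothetical institutions, a five-point shift in the weights is enough to reverse their order.

```python
scores = {                       # hypothetical indicator scores, unchanged between 'years'
    "Univ A": (72, 88),          # (research score, teaching-related score)
    "Univ B": (80, 79),
}

def rank(weights):
    """Order institutions by a weighted sum of the two indicator scores."""
    total = lambda s: weights[0] * s[0] + weights[1] * s[1]
    return sorted(scores, key=lambda u: total(scores[u]), reverse=True)

print(rank((0.50, 0.50)))   # ['Univ A', 'Univ B'] -- A ahead, 80.0 vs 79.5
print(rank((0.55, 0.45)))   # ['Univ B', 'Univ A'] -- same data, order reversed
```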

The vastly different weights given to the same factor in different rankings demonstrate their arbitrariness. The "choice of weights is subjective and arbitrary, with little or no theoretical or empirical basis", writes one critic. The honest and rational conclusion is that any overall ranking or clustering is meaningless.

Reputation is used too often as a measure of academic quality in rankings. For example, the THE ranking allots a 50% weighting to a survey of academics who were asked to name the top institutions in the areas and subjects on which they felt able to make an informed judgement. It does not say what 'top' might mean. Marginson (2006) observes that the THE ranking "is a shabby survey and perceived as such".

'Expert opinions' suffer from three major flaws. First, the halo effect: the reputation of the one department an expert knows well may indiscriminately colour the rating of the whole institution. Second, so-called experts may not be well informed about all the institutions they are rating. Third, there is a question over the seriousness with which respondents are likely to treat an opinion poll. Together, these flaws make the reliability, validity and objectivity of reliance on expert opinion (as distinct from professional judgement) highly questionable.

The great majority of league tables fail the normal tests of reliability and validity, including statistical validity, that one would expect of any serious social science enterprise. Differences between the scores of two institutions may be statistically insignificant, yet they are presented on a linear scale that exaggerates them.

Using a Bayesian latent variable model, Guarino et al. (2005) analysed a Times Higher Education ranking of UK universities and showed how little of it can be read as meaningful difference. For example, University College London was listed 10th, so the reader is invited, explicitly or implicitly, to conclude that UCL is worse than the ninth-listed university and better than the 11th.

The Bayesian analysis suggests that it is only possible to state reliably that (in terms of the parameters under consideration) UCL is 'worse' than the first two entries on the list and 'better' than those from 22nd place onwards.
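
The following toy calculation, which is not the Guarino et al. model and uses invented scores and an invented uncertainty band, illustrates the general point: once each published score carries an error band, only a handful of pairwise comparisons can be made with any confidence.

```python
# Toy illustration only -- not the Guarino et al. model. Each institution gets an
# invented score and the same invented uncertainty band; a comparison is treated
# as reliable only when the two bands do not overlap.

scores = {f"Univ {i:02d}": 90.0 - 1.5 * i for i in range(1, 26)}   # 25 invented scores
half_width = 6.0                                                   # invented +/- band

focal = "Univ 10"                                                  # tenth on the list
low, high = scores[focal] - half_width, scores[focal] + half_width

reliably_above = [u for u, s in scores.items() if s - half_width > high]
reliably_below = [u for u, s in scores.items() if s + half_width < low]

print("reliably above", focal, ":", reliably_above)   # only the very top of the table
print("reliably below", focal, ":", reliably_below)   # only those well down the table
```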

What may be useful is a middle ground: placing a large number of institutions into a few groupings, which allows differentiation while preserving a wide range of institutional types with scope for improvement. This line of thinking lies behind the Australian Good Universities Guide and has led many quality assurance agencies to opt for reports and grades. The Guide's use of a five-star rating system to group institutions reduces the problem of making statistically insignificant distinctions, as 'proximate' universities are likely to get the same rating (although there can still be 'injustice' at the grade boundaries).
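
A minimal sketch of that grouping idea follows; the scores and band boundaries are invented, and the point is simply that near-identical scores usually fall into the same band while boundary cases do not.

```python
def stars(score, band_width=10.0, top=100.0):
    """Map a 0-100 composite score to a 1-5 star band (boundaries are arbitrary)."""
    return max(1, min(5, 5 - int((top - score) // band_width)))

scores = {"Univ A": 91.2, "Univ B": 90.4, "Univ C": 78.9, "Univ D": 89.9}   # invented

for uni, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(uni, s, "->", stars(s), "stars")
# Univ A and Univ B share a band despite a 0.8-point gap, while Univ D,
# only 0.5 below Univ B, falls just across a boundary -- the residual
# 'injustice at the grade boundaries' noted above.
```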

So, given these fundamental defects, why do rankings have any traction beyond the human fascination with lists? There appear to be two reasons. One is that they are a ready reference for institutions seeking collaborators, pointing to another institution's international popularity (THE) or some aspects of its research achievements (SJTU). The other is a paucity of other public information, which matters most to prospective students.

Rankings are not essential for the first reason, as institutions have long had international collaborations and many international networks of similar institutions have been set up without the benefit of rankings. However, the appearance of an institution in a ranking might bring it to the attention of a potential partner. The defects of rankings mean that this should be used only to create awareness and should be followed by an investigation to determine whether the potential partner is suitable along the desired dimensions.

In relation to the second reason, what widely available information is really needed? We need to know that institutions are doing what they say, for example, in terms of enabling students to achieve the specified graduate attributes, both generic and discipline-specific.

We also need to know that there is something common to all qualifications with the same name (all BEcon degrees, for example); that all departments, schools and programmes in all institutions are achieving threshold standards; that institutions have the opportunity to demonstrate points of difference; and that they have the opportunity to show how well they are performing, not just that they exceed the threshold.

These can be addressed under the following headings:

* Generic attributes

When employers are asked what characteristics they look for in graduates, they tend to list the things we call 'generic attributes'. However, employers have shown little interest in having a university's claimed generic attributes listed on the recently designed Australian Higher Education Graduation Statement, because universities generally cannot report how well each student has achieved these attributes.

All universities have statements of the characteristics that they expect graduates to demonstrate. It is the task of the Australian Universities Quality Agency (AUQA) to hold institutions accountable for the statements they make and, through the first cycle of audits, AUQA has asked each university how it goes about achieving the attributes it claims.

In a few universities the course approval process requires proposers to show how the relevant generic attributes will be incorporated into the curriculum, inculcated through the teaching and measured along the way; more often, however, it is simply assumed that they will appear because they are 'inescapable concomitants of university study'. One pressing task, therefore, is to produce indicators of generic graduate attributes and reliable ways of measuring their achievement.

* Discipline - common attributes

Australia has a national Qualifications Framework to ensure that qualifications with the same name have some commonality of level. For example, there should be something common to all bachelor degrees, over and above the generic attributes just mentioned. We should be able to go further: it is reasonable to expect that two degrees with the same name (BEcon or BEng, say) have something in common that relates to that name. This was an expectation set out in the Labor White Paper on Education in 2006.

The UK's Quality Assurance Agency has overseen the development by the higher education sector of 'subject benchmark statements' in about 70 disciplines. These set out what professionals in the field believe should be the nature and scope of a bachelor degree in that field. It could be valuable for Australia to develop something similar, concentrating on what should be common to all bachelor degrees in a field: we might call these 'subject descriptors'.

Such descriptors could specify threshold standards for the degree in the field. Professional associations should be involved in the process as they have experience in checking at the threshold level to ensure that graduates of a programme have the ability to perform as professionals in the field.

There could be a national assessment of the areas agreed to be common; by definition, this would not detract from institutional autonomy or diversity. It would permit comparison of the same programme across different institutions, in just the same way that we can now compare performance on Australia's Course Experience Questionnaire across institutions. Such 'measurement' would be more meaningful than trying to attach a single ordinal number to each institution.

* Discipline - different attributes

Each UK institution is required to write a specification for each programme that aligns with the corresponding subject benchmark statement. A different implementation of this idea, which we might find useful, is a statement that builds on our subject descriptors: we might call these 'programme descriptors'.

These would bring out the distinctive characteristics of a specific programme, the programme-level equivalent of institutions stating that they have different characters (eg research-focused, technology-focused or regionally focused). Just as universities in self-selected groupings such as the ATN and Go8 compare themselves with each other, so programmes with particular characteristics could collaborate, benchmark effectively and see how well they are performing in their chosen area.

Some prospective students (though not as many as is often claimed) turn to rankings in the absence of other information. Yet Australia has an excellent collection of data, currently gathered together in the Institutional Assessment Framework (IAF). The non-confidential data in the IAF could be made publicly available through a flexible electronic query system that allows users to specify their own indicators and weights and hence produce personalised rankings.
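
A sketch of how such a query might work is given below; the indicator names and figures are invented for illustration and are not actual IAF fields.

```python
# Invented indicator data standing in for published, non-confidential figures.
data = {
    "Univ A": {"graduate_satisfaction": 0.81, "research_income": 0.62, "low_ses_share": 0.25},
    "Univ B": {"graduate_satisfaction": 0.77, "research_income": 0.88, "low_ses_share": 0.12},
    "Univ C": {"graduate_satisfaction": 0.85, "research_income": 0.41, "low_ses_share": 0.33},
}

def personalised_ranking(data, user_weights):
    """Rank institutions by the user's own weighted combination of indicators."""
    total = lambda ind: sum(user_weights.get(k, 0) * v for k, v in ind.items())
    return sorted(data, key=lambda u: total(data[u]), reverse=True)

# A student who cares mainly about satisfaction and equity of access:
print(personalised_ranking(data, {"graduate_satisfaction": 0.6, "low_ses_share": 0.4}))
# A student who cares mainly about research intensity:
print(personalised_ranking(data, {"research_income": 1.0}))
```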

A similar system has had some success in Germany, and the US is embarking on a 'Voluntary System of Accountability' that will present learning outcomes, costs, graduation rates, graduate destinations and student engagement.

A federal government working group has recently proposed an Australian higher education graduation statement. At present, it looks primarily like an augmented transcript because it sets out what is currently possible. If we were to develop subject and programme descriptors, these could be added to the statement.

When AUQA was instructed by the ministers of education to pay more attention to reporting on standards achieved in the second cycle of audits, we convened a small, broadly representative reference group to propose a 'standards framework'. This framework is now in the AUQA Audit Manual, and institutions report that it is useful. It does not specify standards but outlines the areas in which standards could be reported, and the relevant evidence.

References

Clarke, M. (2005), 'Quality Assessment Lessons from Australia and New Zealand', Higher Education in Europe, vol. 30, no. 2.
Guarino, C. et al. (2005), 'Latent Variable Analysis: a New Approach to University Ranking', Higher Education in Europe, vol. 30, no. 2.
Hazelkorn, E. (2008), 'The rising popularity of university rankings: Lessons and implications', Public Lecture, University of Melbourne, 7 April.
Marginson, S. (2006), 'Australian Universities in the Global Context', Campus Review, 22 March, pp. 8-9.
Nian, C.L. & Cheng, Y. (2005), 'The Academic Ranking of World Universities', Higher Education in Europe, vol. 30, no. 2.
Stella, A. & Woodhouse, D. (2006), 'Ranking of Higher Education Institutions', Australian Universities Quality Agency, AUQA Occasional Publication, no. 6.
Stella, A. & Woodhouse, D. (2008), 'Promoting Quality Literacy: Undoing the Damage of Rankings', Presentation to Australian Universities Quality Forum, Canberra, July.
Van Dyke, N. (2005), 'Twenty Years of University Report Cards', Higher Education in Europe, vol. 30, no. 2.
Williams, R. & Van Dyke, N. (2004), 'The International Standing of Australian Universities', Melbourne Institute Report No. 4, The University of Melbourne, 24 November 2004.

* David Woodhouse is executive director of the Australian Universities Quality Agency