
GERMANY: Another system of ranking universities

Talking about rankings usually means talking about league tables. Scores are calculated from weighted indicators, which are then added up and turned into one overall value, often indexed to 100 for the best institution and counting down from there. Moreover, in many cases entire universities are compared and the scope of indicators is somewhat limited. We at the Centre for Higher Education Development are highly sceptical about this approach.
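To make the mechanics concrete, here is a minimal sketch of that weighted-sum approach. The institutions, indicator values and weights are invented for illustration only and are not taken from any actual league table:

```python
# A minimal sketch of the weighted-sum league-table approach described above.
# Institutions, indicator values and weights are hypothetical.

universities = {
    "A": {"publications": 80, "citations": 60, "reputation": 90},
    "B": {"publications": 70, "citations": 85, "reputation": 55},
    "C": {"publications": 60, "citations": 70, "reputation": 65},
}
weights = {"publications": 0.4, "citations": 0.4, "reputation": 0.2}

# Weighted sum per institution, then rescaled so the best institution scores 100.
raw = {u: sum(vals[i] * w for i, w in weights.items()) for u, vals in universities.items()}
best = max(raw.values())
index = {u: round(100 * s / best, 1) for u, s in raw.items()}

for u, s in sorted(index.items(), key=lambda item: -item[1]):
    print(u, s)   # A 100.0, B 98.6, C 87.8
```

Every number in that table depends on the weights chosen at the start, which is precisely the objection developed below.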

For more than 10 years we have been running our own ranking system, one so different that some experts have argued it might not be a ranking at all, which is not true. Just because the Toyota Prius uses a very different technology to produce energy does not exclude it from the species of automobiles. What, then, are the differences?

First, we do not believe in ranking entire higher education institutions, mainly because such a ranking necessarily blurs the differences within an institution. For us, the target group has to be the starting point of any ranking exercise. Thus, one can fairly argue that it does not help a student looking for a physics department to learn that university A is average overall when in fact its physics department is outstanding, its sociology department appalling and the rest mediocre.

It is the old problem of the man with his head in the fire and his feet in the freezer. A doctor would diagnose that the man is in a serious condition, while a statistician might claim that overall he is doing fine.

So instead we always rank at the subject level. Given the results of the first ExcellenceRanking, which focused on the natural sciences and mathematics in European universities with a clear target group of prospective master's and PhD students, we think this proves the point: only four institutions excelled in all four subjects, another four in three, while most excelled in only one subject. And this was within a set of quite closely related fields.

Second, we do not create scores by weighting indicators and then calculating an overall value. Why is that? The main reason is that any weight is necessarily arbitrary or, in other words, political: the person doing the weighting decides which weight each indicator gets, and by doing so pre-decides the outcome of the ranking. You make it even worse when you then add the different values together into one overall value, because this blurs the differences between the individual indicators.

Say a discipline publishes a lot but nobody reads it. If you give publications a weight of two and citations a weight of one, the department will look very strong. If you do it the other way round, it will look pretty weak. And if you add the values together, you make it even worse because you blur the difference between the two performances.
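A toy calculation makes this concrete. The figures below are invented scores on a 0-100 scale, chosen only to show how swapping the two weights flips the verdict while any single summed figure hides the imbalance:

```python
# Hypothetical indicator scores for one department (0-100 scale).
publications, citations = 90, 20   # publishes a lot, but is rarely cited

pubs_heavy  = 2 * publications + 1 * citations   # 200: the department looks very strong
cites_heavy = 1 * publications + 2 * citations   # 130: the same department looks rather weak

# Either combined figure conceals that one indicator is high and the other low;
# the imbalance simply disappears into the sum.
print(pubs_heavy, cites_heavy)
```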

And those two indicators are even rather closely related. If you then lump research indicators together with reputation indicators, the overall figure becomes entirely meaningless. Instead, we let the indicator results stand on their own and let the user decide what is important for his or her personal decision-making process. In the classical ranking we allow users to create "my ranking", so they can choose the indicators they want to look at and the order in which they appear.

Third, we strongly object to the idea of league tables. If the values which create the table are technically arbitrary (because of the weighting and the accumulation), the league table positions create the even worse illusion of distinct and decisive differences between places.

They then create the impression of an existing difference in quality (no time or space here to argue the tricky issue of what quality might be) which is measurable to the percentage point: in other words, that there is a qualitative, objectively recognisable and measurable difference between place number 12 and place number 15. That is normally not the case.

Moreover, small mathematical differences can create huge differences in league table positions. Take the THES QS: even in the social sciences subject cluster you find a mere difference of 4.3 points on a 100-point scale between league rank 33 and rank 43. In the overall university rankings, there is a meagre 6.7-point difference between rank 21 and rank 41, and further down the table a slim 15.3-point difference between rank 100 and rank 200.

That is to say, the scores behind neighbouring league table positions might differ by much less than a single point, or less than 1% (of an arbitrarily set figure). The score therefore tells us much less than the league position suggests.
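A small illustration with synthetic scores (not drawn from any published ranking) shows how a tightly packed field turns into seemingly dramatic differences in rank:

```python
# Invented overall scores for eleven institutions, listed in descending order.
scores = [68.0, 67.6, 67.3, 67.1, 66.9, 66.6, 66.4, 66.1, 65.8, 65.5, 65.2]

# Assign consecutive league positions starting at rank 30.
for rank, score in enumerate(scores, start=30):
    print(rank, score)

# Ten league positions apart, yet separated by under three points on a 100-point scale.
print("spread between rank 30 and rank 40:", round(scores[0] - scores[-1], 1))
```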

Our approach, therefore, is to create groups (top, middle, bottom) which refer to the performance of each institution relative to the others. This means our rankings are not as easily read as the others but we strongly believe in the cleverness of the users. Moreover, we try to communicate at every possible level that every ranking (and therefore also ours) is based on indicators which are chosen by the ranking institution.
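For readers who want to see the grouping idea spelled out, here is a simplified sketch. The tercile split and the data are assumptions made purely for illustration, not necessarily the exact procedure we apply:

```python
# Simplified sketch: assign institutions to top, middle and bottom groups for
# one indicator instead of giving them individual league positions.
# The equal-thirds (tercile) split is an illustrative assumption.
def group_by_terciles(values: dict[str, float]) -> dict[str, str]:
    ordered = sorted(values, key=values.get, reverse=True)
    n = len(ordered)
    groups = {}
    for position, name in enumerate(ordered):
        if position < n / 3:
            groups[name] = "top"
        elif position < 2 * n / 3:
            groups[name] = "middle"
        else:
            groups[name] = "bottom"
    return groups

# Hypothetical citations-per-paper values for six departments.
citations_per_paper = {"A": 9.1, "B": 8.9, "C": 5.2, "D": 4.8, "E": 2.1, "F": 1.9}
print(group_by_terciles(citations_per_paper))
```

Note how departments A and B, separated by a trivial margin, land in the same group rather than in two different league positions.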

Consequently, the results of the respective ranking can tell you something about how an institution performs within the framework of what the ranker considers interesting, necessary, relevant and so on. Rankings therefore never tell you who is the best, but at most (depending on the methodology) who performs best, or in our case better than average, in the aspects the ranker considers relevant.

A small but highly relevant aspect might be added here. Rankings (in higher education as well as in other areas of life) might suggest that a result in an indicator proves that an institution is performing well in the area measured by the indicator. Well, it does not.

All an indicator does is hint that, provided the data are robust and relevant, the result gives some idea of how close the institution's performance comes to the best possible result (if such a benchmark exists at all).

The important word is "hint" because "indicare" - from which the word "indicator" derives - means exactly this: a hint, not a proof. And in the case of many quantitative indicators, the "best" or "better" is again a political decision if the indicator stands alone (for example, are more international students better? Are more exchange agreements better?).

This is why we argue that rankings serve a useful function in creating transparency if they are properly used: that is, if users are aware of the limitations, the purpose, the target groups and the agenda of the ranking organisation, and if the ranking is understood as one instrument among several for making whatever decision relates to an institution (study, cooperation, funding, etc.).

Finally, modesty is perhaps what a ranker should have in abundance. Having run the excellence ranking through three different phases (the initial round in 2007, a second phase with new subjects right now, and a repetition of the natural sciences just starting), I am certain of at least one thing.

However strongly we aim to be sound and coherent, and however intensely we re-evaluate our efforts, there is always the chance of missing something, of overlooking an excellent institution. For the world of rankings, Einstein's conclusion holds a lot of truth:

Not everything that can be counted counts, and not everything that counts can be counted.
* Uwe Brandenburg is project manager at the Centre for Higher Education Development and CHE Consult, a think tank and consultancy focusing on higher education reform. This article was first published on the GlobalHigherEd weblog which is edited by Kris Olds at the University of Wisconsin-Madison and Susan Robertson at the University of Bristol.