Rankings mislead with unstandardised scores

Recent research into university ranking methodologies has uncovered serious problems within the best-known systems currently being produced. One key problem is the summation of unstandardised indicator scores to form the total scores used in rankings; the resulting discrepancy between intended and actual weights can misinform and hence mislead.

University rankings have existed for the past decade, since the first appearance in 2003 of the Shanghai Academic Ranking of World Universities (ARWU), followed by the Quacquarelli Symonds World University Rankings (QS) and then the Times Higher Education (THE) World University Rankings.

A cursory internet search clearly shows the overwhelming response from almost all corners of the world every time a set of rankings is released. Universities ranked high gloriously proclaim their victories; those ranked less high claim merit with self-congratulation by shrinking the scope to a region or even a country, so that they can proudly announce themselves “top in X region” or “top in Y country”.

Those that are unranked naturally are silent so as not to draw attention to the painful fact. With the recent appearances of the THE 100 Under 50 and QS Top 50 Under 50, some disappointed losers are given a second chance to become promising winners of a kind. Glaringly common among the responses is the absence of any discourse on the methodology affecting the validity of the rankings results.

The rankings results are accepted with blind faith although there are critical methodological problems, including spurious precision, arbitrary weighting, the meaning of total scores, and a discrepancy between nominal and attained weights. The last of these is the most disturbing. (See the note at the end for an explanation of nominal and attained weights.)

When a ranking system assigns different weights to the indicators, it explicitly tells the rank-users (mostly university leaders and occasionally policy-makers or politicians) that a particular indicator is x times as important as another indicator. Believing the ranker has produced an accurate ranking, the user then takes action in the hope of achieving a higher rating in the next round.

But if there are gross discrepancies between the nominal weights and the attained weights, the rank-users are misinformed and misled. In a sense, there is a breach of trust between the rankers and those using the rankings, because the consumers have no way of knowing that what they get is not what they were promised.

Discrepancies with serious implications have been found in all three of the most widely reported systems. Re-analyses of recent years’ data, publicly available from the three systems’ websites, have revealed that the relative strengths of the indicators are not what was assigned or promised.

For instance, the Shanghai ARWU assigns equal weights to its second indicator, staff, and its fifth, sciences. But the staff indicator is almost twice as strong as sciences, with a ratio of 1.8. This means a university administrator would believe the two indicators are equally important – when, in fact, the staff indicator is more influential than sciences in determining the university’s position in the ranking table.

Another example is in the THE ranking, which assigns teaching a weight six times that for international mix – when the actual ratio is only 3.8 times. Without knowledge of such hidden discrepancies, rank-users assume the ranking results to be accurate. They then make conclusions and decisions according to what appears to be true; or they might go further and plan for improvements but on the wrong aspects of performance.

How much time and resources have been wasted by universities because of the impressions created by the biased ranking results can only be guessed at. Obviously, this is one area of research worthy of effort and resources if the THE World University Ranking is to continue to be taken as seriously as it has been.

The THE 100 Under 50 ranking released last year was based on the same data as the 2011-12 rankings, with a slight adjustment to the reputation measures and a focus on universities aged 50 years or younger. Because the same methodology was employed, the same bias was repeated.

Although teaching, research and citations have the same nominal weights of 30% each (ie a ratio of 1:1:1) in the rankings, the attained ratio of teaching to research is 0.9, while the ratios of teaching and of research to citations are 0.5 and 0.6 respectively. These figures indicate that citations has around twice as much influence on the overall result as either teaching or research.

Furthermore, citations is supposed to be 12 times as influential on the overall ranking as industry income according to the nominal weights (ie 30% versus 2.5%), whereas in fact it is about 16 times as influential. Other discrepancies between the nominal and the attained weights can likewise be found.
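Such discrepancies arise whenever one indicator’s scores are spread much more widely than another’s. A minimal sketch in Python, using invented scores for five hypothetical universities and a simple spread-based proxy for attained weight (the numbers and the proxy are illustrative assumptions, not the rankers’ actual data or method):

```python
import statistics

# Hypothetical indicator scores for five universities (0-100 scale).
# "citations" is widely spread; "industry income" is tightly bunched.
citations = [95, 70, 50, 30, 10]
industry = [62, 60, 59, 58, 57]

# Nominal weights as in the THE scheme (30% versus 2.5%).
nominal = {"citations": 0.30, "industry": 0.025}
spreads = {"citations": statistics.pstdev(citations),
           "industry": statistics.pstdev(industry)}

# Rough proxy for attained weight: nominal weight scaled by the
# indicator's spread, renormalised so the weights sum to one.
raw = {k: nominal[k] * spreads[k] for k in nominal}
attained = {k: v / sum(raw.values()) for k, v in raw.items()}

print(attained)  # citations swamps industry income
```

With these made-up spreads the widely spread indicator dominates the total far beyond its nominal 12:1 advantage; the published rankings show the same effect in milder form.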

In a reassessment that I performed of the data for the 100 ‘young’ universities, six universities did not change their ranks, 33 gained from one to five positions and 15 rose six positions or more. On the other side of the revised scale, 25 universities lost from one to five positions and 21 were down six or more positions. Tables showing these changes can be seen in the full paper.

More ranking systems with new perspectives, such as ranking nations instead of institutions, or ranking ‘young’ universities apart from their more aged peers, have appeared and more can be expected. At the same time, though slowly, the limitations and flaws of university ranking have also begun to surface as more rigorous research is done.

The question is: where do we go from here? Some may see the need for standardisation as an insignificant technical (statistical) refinement. But it is a crucial small step to take before weighting and summing the indicator scores if university ranking is to be trustworthy and rank-users are not to be misinformed.

If the same methodology is continued and the problem of weight discrepancy is not properly dealt with, universities and their supporters will continue to be misinformed and misled. Acknowledging that university ranking can be done more appropriately, and determining to do it the proper way, is simply a matter of intellectual honesty. An immediate step is to prevent the weight discrepancy problem from recurring.

Several conceptual and methodological issues have been identified in recent years and they warrant careful research with a view to perfecting the systems. It is wasteful to merely repeat the exercises every one to three years, even with new perspectives, such as ranking at the national instead of the institutional level as is done for the Universitas 21 ranking, or focusing on younger universities as is done in the THE 100 Under 50 and QS Top 50 Under 50.

By using the same conceptual frameworks and methodologies, inherent problems are inevitably perpetuated and rank-users will forever be misinformed and misled. To move ahead, university ranking has to be raised to a level of more rigorous scientific research and must not stay at the level of sensationalised surveys.

Rankers may have to consider possible collaborations instead of working separately with polite but subtle competition. They may even need to work jointly towards a unified system to replace the currently competing systems that confuse the rank-users.

Perhaps the ultimate goal is a global common data set for university ranking, for the common good. At the same time, rank-users need some form of consumer education to equip them to be informed and discerning consumers, able to draw valid conclusions and make wise decisions.


Nominal and attained weights: Typically, those devising a ranking system decide on a set of indicators and assign a certain weight to each of them. The indicator scores are weighted accordingly and then summed to give the overall total used to rank the universities. The total is made up of the indicator scores weighted as the ranker intends, thus maintaining the nominal weights. But this is like adding two different currencies, such as US$50 and SG$50, and claiming the sum is $100.

Problems arise when the indicator scores have different spreads (ie they are on different metrics). In this case, the difference between the highest and lowest scores for one indicator may be very much greater (or smaller) than the difference for another indicator. When this happens (and it does!), the actual attained weights for the two indicators are not the same as intended by the rankers.

In other words, the indicator scores may in fact contribute much more (or less) than intended to the overall result because of the difference in metric. To prevent this from happening and misleading readers, the indicator scores need to be transformed to the same metric before weighting and summing; this has evidently not been done by the rankers.
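A minimal sketch of the cure, assuming z-score standardisation as the common metric (the scores below are invented for illustration; the paper’s own transformation may differ):

```python
import statistics

def standardise(scores):
    """Rescale to mean 0 and standard deviation 1 (z-scores),
    so every indicator has the same spread before weighting."""
    mu = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mu) / sd for s in scores]

# Invented scores for three universities on two indicators.
research = [90, 60, 30]   # wide spread
teaching = [71, 70, 69]   # narrow spread
w_research, w_teaching = 0.5, 0.5  # nominally equal importance

# Naive total: research dominates simply because its spread is larger.
naive = [w_research * r + w_teaching * t
         for r, t in zip(research, teaching)]

# Standardised total: each indicator now contributes according to
# its nominal weight.
fair = [w_research * r + w_teaching * t
        for r, t in zip(standardise(research), standardise(teaching))]
```

In the naive totals the gaps between universities are driven almost entirely by research; after standardisation the two indicators pull with equal force, as the nominal 50:50 weights promise.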

Analogously, one would convert the two monies to the same currency before adding them. SG$50=US$39.37 and therefore the total is US$89.37; or US$50=SG$63.50 and the total is SG$113.50.

* Dr Kaycheng Soh was formerly head of the centre for applied research in education at the National Institute of Education in Singapore and is now an independent consultant, having worked on programme evaluation and action research with the Singaporean Ministry of Education and with the Hong Kong Educational Bureau.

This article is an edited extract from “Misleading university rankings: Cause and cure for discrepancies between nominal and attained weights”, published in the Journal of Higher Education Policy and Management. Read the full paper here.