Replacing the REF assessment of research outputs by a process of collecting peer-provided ratings

Posted on December 10, 2015

There are several countries where the performance of research institutions is assessed periodically, typically for allocating institutional funding. The UK’s Research Excellence Framework (REF), formerly named the Research Assessment Exercise, is a prototypical example of such assessments. A key part of the REF is the peer review, by about 900 panel members, of a large number of research outputs (191,150 in the last REF) submitted by the assessed higher education institutions (HEIs).

The Metric Tide, a recent report commissioned by the Higher Education Funding Council for England to investigate the role of metrics in research assessment, analysed, among other things, their potential role in the next REF and found that traditional metrics cannot supplant the peer review component of the REF. In a previous blog post, however, I argued that aggregating ratings provided through post-publication peer review, by the scientists who read the publications for their own research, leads to metrics that are superior to the traditional, citation-based ones.

Could these new metrics based on ratings replace the peer review part of the REF? I will argue that research assessment can be achieved better, and at a much lower cost, with rating-based indicators than with the current organization of the REF. This argument probably generalizes to other national assessments of research.

A new proposed process

Let us assume that the REF organizers, instead of requesting HEIs to select and submit information about their staff and research outputs, request HEIs to ask their research-performing academic staff to publish a rating for each scientific publication that they read in its entirety and with great care, that has at least one UK-based author, and that has been published in the last 6 years (the period covered by the last REF). Let us also assume that the number of staff in this category equals the number of staff submitted to the 2014 REF. According to the computation detailed in the Appendix below, this leads, over 6 years, to about 113,450 articles with UK authors receiving at least 3 ratings, 71,586 articles receiving 2 ratings and 156,187 articles receiving one rating. The number of articles with at least 2 ratings, 185,036, is 18% higher than the number of articles that were submitted as outputs to the REF, 157,021, and the analysis can be extrapolated proportionally to other types of research outputs.

Advantages of the process based on the aggregation of ratings

If two assessors read each output in the REF, this is equivalent to the information provided by two independent ratings. According to the available official information, at least two assessors per output were used by Main Panel A and two assessors per output were used by Sub-Panel 17; I could not find official information on this issue for the other panels or sub-panels. Assuming that, on average, two REF assessors read each output, the proposed process of aggregating ratings obtains equivalent evaluative information for 18% more outputs than the REF did. Moreover, for about 72% of the number of outputs reviewed in the REF, the aggregation of ratings will yield at least 3 ratings instead of 2, i.e. more than 50% additional evaluative information per output.

Another source of extra evaluative information is the format of the ratings: while the REF classifies outputs into 5 categories, ratings can be expressed on a scale of 100 percentile ranks, and the reviewer’s uncertainty is also collected, in the form of an interval of percentile ranks.

The most important improvement brought by the aggregation of ratings is the involvement of many more scientists than the assessors used by the REF (e.g., more than 50,000 instead of 898 or 934). It has been previously argued that REF panel members do not necessarily have sufficient expertise in the core field of the assessed outputs, and that “the mechanisms through which panelists are recruited are tailor-made for the sponsored replication of disciplinary elites”. In the proposed process, ratings will be given by the scientists who use the outputs for their own research needs, and they will therefore rate outputs that belong to their core field of expertise. There are also concerns about the REF’s bias against interdisciplinarity, which the proposed process would eliminate.

Therefore, the quantity and quality of the evaluative information about research outputs obtained by aggregating ratings will be much higher than that provided by the last REF.

Moreover, instead of the assessment exercise taking place once every 6 years, with the proposed process the evaluative information will be available in real time, as scientists read new publications. The allocation of funds could thus be adapted on a finer time scale (e.g., yearly) in response to changes in research quality and importance over the considered time interval (e.g., the last 6 years).

Significant cost decreases

According to official estimates, the REF cost £246M. Out of this amount, £78M represented output-related costs supported by the HEIs (including the costs of panelists, central management and coordination costs incurred in relation to research outputs, costs of reviewing / negotiating the selection of staff and publications, and costs of validating / extending bibliographic records for submitted research outputs; see the Appendix). These costs would not be incurred under the proposed process, because no panelists are needed to review publications and HEIs neither have to select outputs nor incur costs for selecting staff. The rated publications are implicitly selected by scientists while they look for publications that are useful for their own work. The staff eligible to submit ratings could be selected broadly, according to criteria that require no selection effort, e.g. their academic position within an HEI.

With the proposed process, the extra time that scientists will spend providing the ratings over 6 years is estimated to cost less than £4M (see the Appendix). The net savings relative to the REF would therefore amount to more than £74M.

Towards an international process for aggregating ratings

If the ratings are shared publicly, through a platform such as Epistemio, then the same process of aggregating ratings could be used by funders from many countries. Processes similar to the REF are currently used, e.g., in Australia, France, Hong Kong, Italy, the Netherlands, New Zealand, Portugal, Romania, and Spain. If scientists agree to publish ratings not only of publications with authors from their own country, but of all recent publications that they read thoroughly, perhaps as a consequence of agreements between national organizers of assessment exercises, then the quantity of evaluative information will increase significantly. As already mentioned above, if each scientist published one rating weekly, 52% of publications would get at least 10 ratings.

The aggregated ratings could be used not only for the allocation of institutional funding, but also for the assessment of individuals who apply for jobs or promotions. For example, 10 ratings for each of the 10 best papers of the assessed individual, given by 50-100 international experts in the core field of each publication, could provide much deeper and more accurate evaluative information than that typically available from the traditional 3 reference letters, or from a typical hiring committee of 5-10 members, who may lack the time to read all 10 publications thoroughly and may not always have expertise in the core fields of the assessed publications, similarly to the case of the REF panel members.

Other considerations

The calibration of the assessments within the panels was an important component of the REF assessment process. Such a calibration was needed because, e.g., categorizing one research output as “world leading” vs. “internationally excellent” is not obvious. The Epistemio rating scale uses the set of all publications read by the reviewer as a reference. Calibration between reviewers is implicitly achieved if the reviewers, through their training and research experience, have read a large sample of the relevant research in their fields that highly overlaps with the sample read by other experts in these fields. Automated normalization methods, such as the one used by the Computer Science and Informatics REF panel, may also be used.

The possibility of gaming is an important concern for such a process. Rings or cartels that engage in abnormal mutual exchanges of positive ratings can be detected automatically and eliminated from the analysis after further investigation. Obvious conflicts of interest can be detected automatically, given information about the present and former institutional affiliations of scientists and about their co-authorships. It is also the duty of scientists to conduct themselves ethically and to refrain from rating publications when they have a conflict of interest. The organizers of national assessment exercises could obtain the agreement of participating scientists to disclose their identity to the organizers, while preserving their anonymity towards other parties. In this case, the organizers could use typical processes for screening conflicts of interest, such as those used by funding agencies for the assessment of proposals. The use of ORCIDs and of institutional email addresses that can be linked to publications and institutional affiliations can prevent fraud by identity theft.
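As a rough sketch of what such automated screening could look like, the following Python fragment flags a conflict of interest when a rater shares an affiliation or a recent co-authorship with an author of the rated publication; the data structures are hypothetical and this is not Epistemio’s actual implementation.

```python
# A rough sketch (not Epistemio's actual implementation) of automatic
# conflict-of-interest screening based on shared institutional affiliations
# and recent co-authorships. The data structures are hypothetical.

def has_conflict_of_interest(rater, publication, recent_coauthorships):
    """rater: dict with 'id' and a set of 'affiliations' (present and former).
    publication: dict with 'authors', each a dict with 'id' and 'affiliations'.
    recent_coauthorships: set of frozensets {author_id, author_id} for
    recently co-authored papers."""
    for author in publication["authors"]:
        # Shared present or former institutional affiliation
        if rater["affiliations"] & author["affiliations"]:
            return True
        # Recent co-authorship between the rater and one of the authors
        if frozenset({rater["id"], author["id"]}) in recent_coauthorships:
            return True
    return False

rater = {"id": "r1", "affiliations": {"University A"}}
publication = {"authors": [{"id": "a1", "affiliations": {"University A"}}]}
print(has_conflict_of_interest(rater, publication, set()))  # True
```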

Appendix

According to a study of a sample of US faculty members, a faculty member reads, on average, 252 scholarly articles per year. Out of these, 30.8% (at least 77) are read in their entirety and with great care. About 78% of these (at least 60) are articles published in the last 6 years. I will thus consider that each member of the UK research-performing academic staff can reliably rate at least 60 articles per year. This is a conservative extrapolation, because it does not include the articles that are read partly with great care (an extra 33.4% of readings).

I consider that the number of UK research-performing academic staff who will provide ratings equals the number of staff submitted to the 2014 REF, i.e. 52,061 (about 27% of the academic staff employed in UK HEIs). I also consider that the share of publications with at least one UK author among the publications that they read equals the share of publications with at least one UK author in the world’s scientific publications. This is a conservative estimate, because UK publications could be read more, owing to their higher-than-average quality and to better local dissemination.

Adding the number of documents published in each journal listed by SCImagoJR in 2014 results in a total number of 2,351,806 documents, out of which 160,935 (6.84%) are attributed to the UK.

The number of ratings of articles from the last 6 years and with UK authors that can be provided by the UK research-performing academic staff, per year, is 52,061 x 60 x 6.84% ≈ 213,658. The average number of ratings per article with UK authors is 213,658 / 160,935 ≈ 1.33.
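This arithmetic can be checked directly, using the figures quoted above:

```python
# A quick check of the arithmetic above, using the figures quoted in this Appendix.

staff = 52_061            # UK research-performing academic staff (as in the 2014 REF)
ratings_per_year = 60     # articles read thoroughly, and thus rateable, per person per year
uk_share = 0.0684         # share of documents with UK authors (SCImagoJR, 2014)
uk_articles_per_year = 160_935

uk_ratings_per_year = staff * ratings_per_year * uk_share
print(round(uk_ratings_per_year))                             # ~213,658
print(round(uk_ratings_per_year / uk_articles_per_year, 2))   # ~1.33
```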

The ratings are distributed unevenly across articles: some are not read (and rated) at all, while some are read (and rated) multiple times. Considering that the distribution of readings across articles is similar to the distribution of citations, I used a previously published model to compute that about 12% of articles (i.e., 113,450 in 6 years) will get at least 3 ratings, about 7% (71,586) will get 2 ratings and about 16% (156,187) will get one rating.

To estimate the time spent by scientists reading the articles that are rated, I considered that the longest reading durations reported in the study of a sample of US faculty members correspond to the articles read in their entirety and with great care, i.e. those that will be rated. Then, 28% of articles (8.7% / 30.8%) would be rated after being read for more than one hour, 68% (20.8% / 30.8%) after being read for between half an hour and one hour, and the remaining 4% after being read for between 11 and 30 minutes. These estimates are similar to the estimate of less than one hour spent by one REF assessor reading a submitted research output, if each output was read by two assessors. It has been previously argued that this time per output is much less than the time spent by reviewers who assess a manuscript prior to publication; indeed, a pre-publication review takes, on average, 8.5 h (median 5 h) for a typical scientist, and 6.8 h for an active reviewer. However, this difference probably reflects the extra effort needed to devise improvements to the current form of the manuscript and to put them in writing, while, as described above, the average time needed to read an article thoroughly for one’s own research is around one hour.

To estimate the extra time spent by a scientist on providing a rating, I assume that one rating takes an extra 5 minutes. Taking into account, as above, that a scientist can rate 60 articles per year, out of which 6.84% are from the UK; assuming that a full-time job comprises 1,950 work hours per year; and overestimating the average yearly salary of reviewers at the level of a senior-grade academic, £69,410, the total cost of providing ratings over 6 years is 60 x 6.84% x 5 / 60 / 1950 x 69,410 x 52,061 x 6 ≈ £3.80M.
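The same estimate, spelled out step by step with the figures quoted above:

```python
# The cost estimate above, spelled out step by step (figures as quoted in the text).

ratings_per_year = 60        # articles rateable per scientist per year
uk_share = 0.0684            # share of those articles with UK authors
minutes_per_rating = 5       # extra time per rating
work_hours_per_year = 1950
yearly_salary = 69_410       # senior-grade academic, GBP (an overestimate)
staff = 52_061
years = 6

hours_per_scientist_per_year = ratings_per_year * uk_share * minutes_per_rating / 60
total_cost = hours_per_scientist_per_year / work_hours_per_year * yearly_salary * staff * years
print(f"£{total_cost / 1e6:.2f}M")   # ~£3.80M
```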

According to data from the official REF Accountability Review, the output-related costs of the REF amount to £78.04M (a quick arithmetic check follows the list below), given that:

  • The cost of panelists (excluding costs related to impact assessment) was £19M;
  • Out of the £44M reported for central management and coordination costs (within the HEIs), an average of 40% is reported to be incurred in relation to research outputs, i.e. £17.60M;
  • Out of the £112M reported for costs at the unit-of-assessment level, excluding costs for impact statements and case studies, an average of 55% was spent on reviewing / negotiating the selection of staff and publications, and an average of 12% on validating / extending bibliographic records for submitted research outputs. The total is 37%, i.e. £41.44M.
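The three components add up as follows:

```python
# A quick arithmetic check of the three output-related cost components.

panelists = 19.0                # £M, panelists, excluding impact assessment
central = 44.0 * 0.40           # £M, 40% of central management and coordination costs
unit_level = 112.0 * 0.37       # £M, 37% of unit-of-assessment level costs
print(round(panelists + central + unit_level, 2))   # 78.04
```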

Aggregating ratings leads to peer-review-based metrics

Posted on December 9, 2015

Metrics (quantitative indicators) and peer review are often seen as two opposing paradigms in research assessment, e.g. in The Metric Tide, a recent report commissioned by the Higher Education Funding Council for England to investigate the role of metrics in research assessment. Peer review is considered the gold standard of research assessment. However, decision makers need quantitative indicators, because these provide the data required for allocating resources to research optimally and for establishing whether those resources were spent efficiently.

In fact, peer review can lead to quantitative indicators that can be used by decision makers. For example, in the UK’s Research Excellence Framework (REF), peer review assigned research outputs to categories, and the percentage of outputs for each category defined a numerical indicator.

The REF, however, is costly, requires a complex organization, and its results are not optimal. A better way to define indicators based on expert judgement, which can substantiate decisions for allocating resources to research institutions and to scientists, is to aggregate ratings of the scientific publications that scientists read for the purpose of their own work.

Every scientist reads thoroughly an average of about 77 scientific articles per year for his or her own research. However, the evaluative information they could provide about these articles is currently lost. Aggregating, in an online database, ratings of the publications that scientists read provides important information and can revolutionize the evaluation processes that support funding decisions. I have previously estimated that, if each scientist published one rating weekly and one post-publication peer review monthly, 52% of publications would get at least 10 ratings, and 46% of publications would get at least 3 reviews. The publications that would get the most ratings and reviews would be the ones most read by scientists during their typical research activities.

Online-aggregated ratings are now a major factor in the decisions made by consumers when choosing hotels, restaurants, movies and many other types of services or products. It is paradoxical that in science, a field for which peer review is a cornerstone, rating and reviewing publications on dedicated online platforms, after publication, is not yet a common behaviour.

To achieve this kind of rating, an appropriate rating scale must be defined. Online ratings typically take the form of a five-star or ten-star discrete scale: this standard has been adopted by major players such as Amazon, Yelp, TripAdvisor and IMDb, and also by the REF. However, these types of scales cannot accurately measure the quality and importance of scientific publications, due to the high skewness of the distribution of these values across publications. As with the distributions of other scientometric indicators, the maximum value could be about 3 to 5 orders of magnitude larger than the median value. Therefore, a scale of 5, 10 or even 100 discrete categories cannot represent this variability well if the values that the scale represents vary linearly across categories. A solution to this conundrum is to ask experts to assess not the absolute value of quality and importance, but its percentile rank. Since raters should be able to express their uncertainty, the rating should be given as an interval of percentile ranks. I presented the resulting rating scale at this year’s International Society of Scientometrics and Informetrics Conference.
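As a minimal illustration of such an interval rating, the class below captures a lower and an upper percentile-rank bound, with the reviewer’s uncertainty expressed by the interval width; it is purely illustrative and not Epistemio’s actual data model.

```python
# A minimal illustration (not Epistemio's actual data model) of a rating
# expressed as an interval of percentile ranks; the interval width captures
# the reviewer's uncertainty.

from dataclasses import dataclass

@dataclass
class PercentileRating:
    low: float    # lower bound of the percentile rank (0-100)
    high: float   # upper bound of the percentile rank (0-100)

    def __post_init__(self):
        if not (0 <= self.low <= self.high <= 100):
            raise ValueError("bounds must satisfy 0 <= low <= high <= 100")

    @property
    def midpoint(self) -> float:
        return (self.low + self.high) / 2

# "This paper is somewhere in the top 2-5% of what I have read."
rating = PercentileRating(low=95, high=98)
print(rating.midpoint)  # 96.5
```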

This scale can be used for rating any scientific publication on Epistemio, thereby making metrics based on peer review a reality. Any scientist can publish an assessment of a publication that she / he has read lately in less than one minute, by going to epistemio.com, searching for the publication, and adding a rating. About five extra minutes are needed, once, to sign up at the first use of the website. Ratings and reviews can be either anonymous or signed, according to the authors’ choice. Epistemio hosts these ratings and reviews free of charge and provides them under an open access licence. The copyright for reviews remains with the authors.

Ratings of one publication given by multiple experts can be aggregated into a distribution. Individual publications can be ranked according to their rating distributions. Distributions corresponding to the publications in a set can be aggregated into a distribution characterizing the set. Sets of publications (and, implicitly, the entities defining a set: scientists, units, institutions) can be ranked by using these distributions directly. The public results of the REF are similar distributions. Such usage is reasonable if each set includes the same number of top publications of an entity, relative to the entity’s size, and differences between the typical numbers of publications per scientist among disciplines are taken into account. The latter condition is implicitly fulfilled if rankings are performed within disciplines, as in the REF. One may also define a function mapping ratings to absolute values, in order to specify, e.g., the equivalence between several low-rated publications and one high-rated publication. In this case, selecting a number of top publications per entity is not necessary. An example of such a function is the relative amount of funding allocated by the UK funding councils to the various REF output categories.
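A minimal sketch of this aggregation is shown below; interval ratings are summarized here by their midpoints, which is an illustrative choice rather than necessarily the method used in practice.

```python
# A minimal sketch of the aggregation described above. Interval ratings are
# summarized here by their midpoints; this representation is illustrative
# only, not necessarily the aggregation method used in practice.

from statistics import median

def publication_distribution(ratings):
    """ratings: list of (low, high) percentile-rank intervals -> midpoints."""
    return [(low + high) / 2 for low, high in ratings]

def set_distribution(distributions):
    """Pool the distributions of all publications in a set (e.g. a unit)."""
    return [value for dist in distributions for value in dist]

paper_a = publication_distribution([(90, 96), (85, 95), (92, 99)])
paper_b = publication_distribution([(40, 60), (55, 70)])
unit = set_distribution([paper_a, paper_b])

# Rank publications (or whole sets) by a summary of their distributions,
# e.g. the median percentile rank.
ranking = sorted([("A", median(paper_a)), ("B", median(paper_b))],
                 key=lambda item: item[1], reverse=True)
print(ranking)        # paper A ranks above paper B
print(median(unit))   # a summary value for the whole set
```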

Rating-based indicators solve the most crucial problems typically associated with traditional bibliometrics:

  • The coverage of citation databases is uneven across disciplines and publication types; in particular, the coverage of the arts and humanities is limited. This limits the applicability of citation-based indicators across disciplines and publication types. With rating-based indicators, any type of publication from any field, including the arts and humanities, can be assessed on equal grounds. Here, “publication” refers to any type of research output that is publicly available and can be uniquely identified through a reference, including journal articles, conference papers, book chapters, books, reports, preprints, patents, datasets, software, videos, sounds, recordings of exhibitions and performances, digital or digitized artefacts, and so on.
  • Citation-based indicators, when used across fields, need to be field-normalized, with all the associated problems of defining fields. Such normalization is not needed for rating-based indicators if the sets associated with rated entities have an equally small number of publications. There is no bias against interdisciplinary work.
  • Citations need time to accumulate: the first citations of a new publication appear after a full publication cycle, which may take several months, up to one or two years. The first ratings of a new publication may appear much faster, within days or weeks of publication, as soon as the publication is read by specialists in the field.

In a following post, I argue that assessment of research outputs can be achieved better and at a much lower cost using rating-based indicators rather than the current organization of the REF.

A code of conduct for post-publication peer review

Posted on December 6, 2015

At Epistemio, we believe that post-publication peer review will take an increasingly important role in research assessment. For example, aggregating ratings will lead to peer-review-based metrics of the quality and importance of individual publications, eliminating the problems of current indicators identified, for example, in the San Francisco Declaration on Research Assessment (DORA).

Post-publication peer review will achieve its potential only if it is performed responsibly and ethically. While there are various codes of conduct for traditional pre-publication peer review, there have been no clear guidelines for post-publication peer review.

We have recently developed a code of conduct for post-publication peer review by adapting the Committee on Publication Ethics (COPE) Ethical Guidelines for Peer Reviewers. This code of conduct has already been included in our recently updated Terms of Use, which should be observed by all of our users, including those who post ratings and reviews on Epistemio of the publications they read.

Here is this code of conduct:

Scientists should publish ratings or reviews of a publication only if all of the following apply:

  • they have the subject expertise required to carry out a proper assessment of the publication;
  • they do not have any conflict of interest;
  • they have read the publication thoroughly and with great care.

Situations of conflict of interest include, but are not limited to, any of the following:

  • working at the same institution as any of the authors of the publication (or planning to join that institution or to apply for a job there);
  • having recently (e.g. within the past 3 years) been a mentor, mentee, close collaborator or joint grant holder with any of the authors of the publication;
  • having a close personal relationship with any of the authors of the publication.

Additionally, all of the following should be observed:

  • the assessment should be based on the merits of the publication and not be influenced, either positively or negatively, by its origins, by the nationality, religious or political beliefs, gender or other characteristics of the authors, by commercial considerations, by any personal, financial, or other conflicting considerations or by intellectual biases;
  • the assessment should be honest, fair, and reflect the reviewer’s own views;
  • the review should be objective and constructive;
  • the reviewer should refrain from being hostile or inflammatory, from making libelous or derogatory personal comments, and from making unfounded accusations or criticisms;
  • the reviewer should be specific in her/his criticisms, and provide evidence with appropriate references to substantiate critical statements;
  • the reviewer should be aware of the sensitivities surrounding language issues that are due to the authors writing in a language that is not their own, and phrase the feedback appropriately and with due respect;
  • if the review or comment is anonymous, the reviewer should not write it in a way that suggests that it has been written by another identifiable person.

Publishing on Epistemio a rating of a scientific publication that you have read lately takes no more than one minute. Here is how you can do it:

  • Log in or sign up;
  • Search the publication you would like to rate, for example by typing its title;
  • Add the rating;
  • Optionally, add a review that supports your rating.

Celebrating peer review week

Posted on September 28, 2015


Peer Review Week (September 28th - October 4th) is an occasion to celebrate an activity that is a keystone of science. We invite scientists to join us in this celebration by publishing on Epistemio a post-publication peer review or rating of one of the scientific publications they have read lately. Publishing a rating takes no more than one minute:

  • Log in or sign up;
  • Search the publication you would like to rate, for example by typing its title;
  • Add the rating;
  • Optionally, add a review that supports your rating.

Ratings and reviews may be either anonymous or signed.

By providing such ratings, you contribute to building peer-review-based metrics of the quality and importance of individual publications, eliminating the problems of current indicators identified, for example, in the San Francisco Declaration on Research Assessment (DORA).

A new scale for rating scientific publications

Posted on July 23, 2015

We are officially announcing the launch of a new scale for rating scientific publications, which scientists may use to contribute to the assessment of the publications they read. The rating scale was presented at the 15th International Society of Scientometrics and Informetrics Conference, recently held in Istanbul, Turkey (Florian, 2015). Scientists can now use this scale to rate publications on the Epistemio website.

The use of metrics in research assessment is widely debated. Metrics are often seen as antagonistic to peer review, which remains the primary basis for evaluating research. Nevertheless, metrics can actually be based on peer review, by aggregating ratings provided by peers. This requires an appropriate rating scale.


Online ratings typically take the form of a five-star or ten-star discrete scale: this standard has been adopted by major players such as Amazon, Yelp, TripAdvisor and IMDb. However, these types of scales do not measure the quality and importance of scientific publications well, because of the likely high skewness of the actual distribution of values of this target variable. Extrapolating from the distributions of bibliometric indicators, the maximum value of the target variable is likely to be 3 to 5 orders of magnitude larger than the median value.

A solution to this conundrum is to ask reviewers to assess not the absolute value of quality and importance, but its relative value, on a percentile ranking scale. On such a scale, the best paper is not represented by a number several orders of magnitude larger than the number representing the median paper, but by one just 2 times larger (100% for the best paper vs. 50% for the median paper).
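A small simulation illustrates the point; the lognormal distribution of “quality” values below is an arbitrary assumption, chosen only to mimic the skewness of scientometric indicators.

```python
# A small simulation of the argument above. The lognormal distribution and
# its parameters are arbitrary assumptions, chosen only to mimic the heavy
# skewness of scientometric indicators.

import random

random.seed(0)
values = sorted(random.lognormvariate(0, 2.5) for _ in range(100_000))

median_value = values[len(values) // 2]
max_value = values[-1]

# On the raw "value" scale, the best paper is orders of magnitude above the median.
print(f"max / median value: {max_value / median_value:,.0f}")

# On the percentile-rank scale, the same two papers are 100 vs. 50.
print("max / median percentile rank:", 100 / 50)
```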

It is typically possible to estimate the percentile ranking of high-quality papers with better precision than that of lower-quality papers (e.g., it is easier to discriminate between top 1% papers and top 2% papers than between top 21% papers and top 22% papers). Therefore, the precision of assessing the percentile ranking of a publication varies across the scale. Reviewers may also have varying levels of familiarity with the field of the assessed publication. Thus, it is useful for them to be able to express their uncertainty. The solution adopted for the new scale is to allow reviewers to provide the rating as an interval of percentile rankings, rather than a single value. Scientists can additionally publish on Epistemio reviews that support their ratings.

The aggregated ratings could provide evaluative information about scientific publications that is much better than what is available through current methods. Importantly, if ratings are provided voluntarily by scientists for publications they read for the purpose of their own research, publishing such ratings entails a minor effort from scientists, of about 2 minutes per rating. Each scientist reads thoroughly, on average, about 88 scientific articles per year, and the evaluative information that scientists can provide about these articles is currently lost. If each scientist provided one rating weekly, it can be estimated that 52% of publications would get 10 ratings or more (Florian, 2012). This would be a significant enhancement of the evaluative information needed by users of scientific publications and by decision makers who allocate resources to scientists and research organizations.

Indicators that aggregate peer-provided ratings solve some of the most important problems of bibliometric indicators:

  • normalizing citation-based indicators across fields is necessary, due to differences in common practices across fields (e.g., the median impact factor or the median number of citations is larger in biology than in mathematics), but widely-available bibliometric indicators are not normalized by their providers;
  • in some fields, publishing in scientific journals is not the only relevant channel for publishing results, but the coverage of other types of publications (books, conference papers) in the commercially-available databases is poorer; this may be unfair for these fields, or requires arbitrary comparisons between different types of indicators.

Indicators that aggregate peer-provided ratings make possible the unbiased comparison of publications from any field and of any type (journal papers, whether or not they are present in the major databases; conference papers; books; chapters; preprints; software; data), regardless of the publication’s age and of whether it has received citations.

References

Florian, R. V. (2012). Aggregating post-publication peer reviews and ratings. Frontiers in Computational Neuroscience, 6, 31.

Florian, R. V. (2015). A new scale for rating scientific publications. In Proceedings of ISSI 2015: 15th International Society of Scientometrics and Informetrics Conference (pp. 419-420). Istanbul, Turkey: Boğaziçi University.

Confusing Nature article on peer-review scams

Posted on December 7, 2014

Nature has recently published a news feature in which the authors, all associated with the Retraction Watch blog, discuss some cases where the peer review system has been abused. The article includes a range of confusing statements that, instead of exposing the real flaws in the review processes in order to help others avoid them, hide these flaws under a smokescreen of statements about alleged vulnerabilities of publishing software.

The article begins by describing a case where a scientist called Hyung-In Moon “provided names, sometimes of real scientists and sometimes pseudonyms, often with bogus e-mail addresses that would go directly to him or his colleagues” when asked by a journal to provide suggestions for reviewers of his papers. The article then says: “Moon’s was not an isolated case. In the past 2 years, journals have been forced to retract more than 110 papers in at least 6 instances of peer-review rigging. What all these cases had in common was that researchers exploited vulnerabilities in the publishers’ computerized systems to dupe editors into accepting manuscripts, often by doing their own reviews.” This suggests that Moon, too, was exploiting a vulnerability in the publisher’s computerized system.

The article then presents another case: “[…] Ali Nayfeh, then editor-in-chief of the Journal of Vibration and Control, received some troubling news. An author who had submitted a paper to the journal told Nayfeh that he had received e-mails about it from two people claiming to be reviewers. Reviewers do not normally have direct contact with authors, and — strangely — the e-mails came from generic-looking Gmail accounts rather than from the professional institutional accounts that many academics use […]”. This led to an investigation that found 130 suspicious-looking accounts in the publication management system, which were both reviewing and citing each other at an anomalous rate, and 60 articles with evidence of peer-review tampering, involvement in the citation ring, or both, with one author at the centre of the ring.

Is the software to blame?

The article explains these cases as follows: “Moon and Chen both exploited a feature of ScholarOne’s automated processes. When a reviewer is invited to read a paper, he or she is sent an e-mail with login information. If that communication goes to a fake e-mail account, the recipient can sign into the system under whatever name was initially submitted, with no additional identity verification.”

In fact, Moon was not exploiting a vulnerability of some computer software, but vulnerabilities in the publisher’s process, which were independent of the software used by the publisher to manage the review process. These two vulnerabilities were that the publisher (through the editors) asked authors to suggest reviewers, and that the publisher did not properly check the credentials of reviewers or the association between the emails used by the system and the actual persons selected as reviewers or their publications.

The quoted explanation of the feature of the ScholarOne software does not explain how the process was flawed, but only confuses the reader. How can the invitation sent by email to a reviewer go to a fake email account? Was there a redirection of the email through some tampering of the network, or was the email address wrong from the start? What makes an email account fake, as opposed to just another email account? Who initially submitted a name?

What the Nature article seems to describe is a situation where the editors want to invite a particular scientist by sending an email to a particular address, but this address is not actually used by the selected scientist. If this is the case, what made the editors use a wrong address, and what is the blame of the ScholarOne software? If the editors get a wrong email address for some scientist, it is irrelevant whether the invitation is transmitted through ScholarOne or through any other software capable of sending an email. In the Moon case, it appears that Moon introduced wrong email addresses while suggesting reviewers. In the Chen case, who introduced the wrong email addresses? Does ScholarOne provide, independently of the editors, a database of email addresses of scientists, and does it suggest that editors trust that the email addresses actually belong to the scientists in the database, possibly identified by name and affiliation? If not, then the responsibility for using a particular email address belongs to the editor or the publisher. A ScholarOne user guide (see pp. 24-25) suggests that the software has such a database, but it is not clear whether the information in this database is provided by ScholarOne independently of the editors of a particular journal or publisher, or whether it is just what the editors saved there. Since ScholarOne is provided by the same company that manages Web of Science (Thomson Reuters), does it crosscheck the emails of potential reviewers with the emails of corresponding authors of articles in Web of Science? If not, why not? What additional identity verification should be performed by users of ScholarOne? The Nature article does not explain any of these issues.

An extra source of confusion is the story about alleged reviewers contacting an author. This issue seems unrelated to that of falsifying the identity of reviewers. What was the purpose of the alleged reviewers in doing this? Were they trying to recruit a real person into the review ring? Review rings are an issue independent of fake identities, because they might be composed entirely of real persons. Again, the Nature article does not shed any light on any of these issues.

The Nature article continues by telling how Elsevier is taking steps to prevent reviewer fraud by consolidating accounts across journals and by integrating the use of ORCIDs. But how is the risk of fraud reduced by consolidating accounts across journals? Wouldn’t consolidated accounts expose editors to using wrong emails introduced by other people, by trusting data in these accounts without knowing how reliable it is, as in the putative case of the ScholarOne database discussed above? How does the use of ORCIDs decrease the risk of fraud, compared to the use of emails as IDs of persons? (A short answer: not much; see below.) As in the case of Thomson Reuters, Elsevier also has a database associating the emails of authors with scientific publications (Scopus); is it using this database when selecting reviewers? Again, the Nature article does not explain any of this.

Trusting email addresses other than those included in publications requires careful analysis

Science is a global enterprise, and scientific publishers typically interact remotely with their authors and reviewers. For efficiency, the transactions between publishers and reviewers are typically performed online. This leads to challenges in vetting reviewers. Checking traditional forms of identification, such as government-issued IDs, is not practical to include in the typical workflows of publishers. What mechanisms, then, should be used for identifying reviewers?

Peer review of scientific publications implies the assessment of publications by other experts. What defines somebody as an expert suitable for reviewing other publications is the expert’s own publications. Publications typically include the email address of the corresponding author, thereby creating an association between the publication, the name of the corresponding author, her/his affiliation, and her/his email address. If the set of publications associated with an email address is relevant enough to establish the author supposedly associated with that email as a relevant expert, then this email address can be used to identify a potential reviewer, because the associations create a link between the email and the sought expertise.

If the email address that is about to be used by an editor for inviting a reviewer cannot be associated with a set of relevant publications, then the editors must carefully analyse the available information in order to assess the probability that the new email address belongs to the putative person and to the publications putatively authored by this person. If the editors or the publishers do not perform this analysis, the responsibility is entirely theirs and not that of the publishing software.

How software can help avoid misuse

In fact, software can help by automatically suggesting reviewers, given information about the publication to review (its references and text). This avoids asking authors to suggest reviewers, as in the Moon case, which obviously creates a conflict of interest.

Software can also help by searching for newly-introduced email addresses of potential reviewers in publication databases that include the email addresses of authors, thereby validating the association between an email address and the publications authored by somebody who used that address.
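A minimal sketch of such a check follows; the index of corresponding-author email addresses is a hypothetical data source, e.g. one built from a publication database.

```python
# A minimal sketch of the check described above. The index of
# corresponding-author email addresses is a hypothetical data source,
# e.g. built from a publication database.

def publications_for_email(email, author_email_index):
    """author_email_index: dict mapping corresponding-author emails to
    lists of publication identifiers (e.g. DOIs)."""
    return author_email_index.get(email.strip().lower(), [])

def is_plausible_reviewer(email, author_email_index, min_publications=3):
    """Accept an address only if it is linked to enough published work."""
    return len(publications_for_email(email, author_email_index)) >= min_publications

index = {"j.smith@university.edu": ["10.1000/a1", "10.1000/a2", "10.1000/a3"]}
print(is_plausible_reviewer("j.smith@university.edu", index))  # True
print(is_plausible_reviewer("reviewer123@gmail.com", index))   # False
```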

The focus on fake identities overlooks a proper discussion of review and citation rings, which can also be composed of real persons who unethically agree to support each other reciprocally. If a review and citation ring including some non-existent persons was able to publish at least 60 papers, as in the Chen case, then the same can happen with rings composed of real persons. Again, software based on the current state of the art in network science and machine learning is able to pinpoint potential unethical review and citation rings, given information about citation networks and reviewers.
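As a rough illustration, and far from a state-of-the-art method, even a simple reciprocity count over review or citation events can surface candidate rings for manual investigation; the threshold below is arbitrary.

```python
# A rough illustration, not a state-of-the-art method: count reciprocal
# review/citation events between pairs of authors and flag pairs that
# exceed an arbitrary threshold, as candidates for manual investigation.

from collections import Counter

def reciprocal_pairs(events, threshold=5):
    """events: iterable of (source_author, target_author) review or citation
    events. Returns pairs with at least `threshold` events in each direction."""
    counts = Counter(events)
    flagged = []
    for a, b in {tuple(sorted(pair)) for pair in counts}:
        if counts[(a, b)] >= threshold and counts[(b, a)] >= threshold:
            flagged.append((a, b, counts[(a, b)], counts[(b, a)]))
    return flagged

events = [("A", "B")] * 7 + [("B", "A")] * 6 + [("A", "C")] * 2
print(reciprocal_pairs(events))  # [('A', 'B', 7, 6)]
```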

Thus, although the Nature article blames software for the peer-review scams that were discussed, software can in fact help prevent such scams. The scams described in the article were caused, in fact, by the negligence of publishers and editors.

Would ORCID help?

The use of ORCID would not improve the situation much, at least in the short term. For ORCIDs to be used instead of email addresses for identifying potential reviewers, there must be a certified association between ORCIDs and publications, similar to how email addresses are currently published within publications. Many publishers now allow authors to associate their ORCIDs with their publications; however, the percentage of publications having associated ORCIDs is currently very small. Then, there is the challenge of associating ORCIDs with actual persons. Anyone can create an ORCID account with a made-up email address and by hijacking the name of somebody else, similarly to how email accounts can be created using the name of somebody else, as in the case of the scams discussed here.

ORCID will allow organizations to associate themselves with the ORCIDs of individuals actually employed by them, and this will help identify individuals as long as the organizations creating these associations can be trusted. Again, this is something that has to gain wider adoption before it can be used on a large scale when selecting reviewers. It remains to be seen how large the adoption of this mechanism will become; it is unlikely to generalize, because organizations have to pay ORCID to participate.

Easily managing institutional lists of publications

Posted on June 5, 2014

Scientific publications are typically the final result of the work pursued in basic research. Although research is an important mission for many universities and the core purpose of research institutes, many institutions are not able to showcase on their websites complete, up-to-date lists of the publications authored by their scientists. Lists of publications are not typically available on the webpages of departments, either. Individual scientists and laboratories typically have such lists on their webpages, but in most cases they are maintained manually, requiring tedious work to keep them up to date. This is why a significant percentage of laboratory publication lists are outdated. In a sample of labs that have recently received grants amounting to more than one million dollars from the US National Science Foundation, about 38% had outdated or no lists of publications, and among those that had lists that seemed to be up to date, about 73% appeared to be manually maintained.

Maintaining such publication lists is indeed quite cumbersome if appropriate software is not used. For each publication, the person maintaining the list must ensure that all the information is in place and properly formatted; this means either tediously applying formatting and HTML tags, or filling in, by copy/paste, lots of fields in some database that is later processed by a script to generate the formatted text and the links. If a scientist gets a new publication, some process must be implemented to ensure that the person maintaining the webpage is notified, gets all the details of the publication from the authors, and updates the webpage in a timely manner.

Many research institutions subscribe to databases such as Web of Science or Scopus, where the publications belonging to a particular institution can be searched. However, this is not a reliable way for an institution to collect the list of publications authored by its scientists, because: searching by the name of the institution yields incomplete results, as there are typically many variants of the name in these databases (up to hundreds of variants for institutions with thousands of publications); the institutional profiles in the databases are automatically generated and also contain errors, such as including publications not authored by scientists within the institution; and the databases are limited in coverage and do not include all the publications authored by the scientists. The providers of these two databases, Thomson Reuters and Elsevier, also sell specialized software for institutions to collect validated information about their publication lists, but their systems are quite complex and expensive. They typically require several months to set up, and fewer than 1% of the world's universities can currently afford them. Being so complex, they are not suitable for smaller units such as laboratories, individual departments or small research institutes.

This is why we have developed Epistemio Outcomes: a service that allows units of all sizes (from laboratories to universities and even national research systems) to easily collect and manage the lists of publications authored by their scientists, while being possible to set up in as little as 24 hours and affordable to a wide range of institutions, including ones in emerging countries.

Epistemio Outcomes is based upon our database of more than 56 million publications, i.e. about the same size as Web of Science or Scopus. This means that most publications to be included by someone on their list are already in our database, and no tedious manual typing or copying/pasting is necessary when adding most new publications. This is a major advantage over solutions developed in-house by universities or institutes, which typically lack such a database. Maintaining such a database requires significant effort and computational resources, much larger than what is possible or reasonable for a university or institute to allocate. But such a database is required if we want a system that does not waste the time of scientists and administrators by requiring them to fill in the details of publications manually. This is why adopting a commercial solution based upon a database, such as Outcomes, is an optimal way for an institution to manage its publications.

However, it is not technically feasible today to associate publications with institutions and their sub-units entirely automatically, so the stakeholders must confirm that the publications found really belong to the scientists within the units using the service. We expect that, in most cases, scientists themselves will confirm their publications. With our service, they are motivated to do so because, by doing it, they also update their list of publications for their own needs, such as having it available for export to a CV or for updating their personal web page, if they use our embedding feature. Confirming publications can also be delegated to administrative personnel.

When setting up Epistemio Outcomes, the administrator can define the organizational structure of the institution (e.g., departments, labs, etc.) and add other administrators for the various sub-units, as needed. The administrators can invite scientists belonging to the institution to log on to Epistemio. Scientists log in and confirm the publications found by our intelligent search. Administrators may also add or edit publications if the scientists are too busy to do it themselves. Publications are automatically aggregated upwards in the organizational hierarchy, with automated deduplication. The aggregated lists of publications, for all units within the institution, are available to be embedded on the web pages of those units (code must be added to the web pages just once, and thereafter the embedding ensures that the lists stay up to date as scientists add new publications). The lists can also be exported at any time, e.g. for inclusion in annual reports.
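To give an idea of the kind of deduplication involved when lists are aggregated upwards, here is an illustrative sketch (not the actual Outcomes implementation), keyed on DOI with a fall-back to a normalized title.

```python
# A minimal sketch (not the actual Outcomes implementation) of aggregating
# publication lists from sub-units with deduplication, keyed on DOI with a
# fall-back to a normalized title.

import re

def dedup_key(publication):
    doi = publication.get("doi")
    if doi:
        return ("doi", doi.lower())
    title = re.sub(r"\W+", " ", publication["title"]).lower().strip()
    return ("title", title)

def aggregate(unit_lists):
    """unit_lists: one list of publication dicts per sub-unit."""
    merged = {}
    for publications in unit_lists:
        for publication in publications:
            merged.setdefault(dedup_key(publication), publication)
    return list(merged.values())

lab_a = [{"doi": "10.1000/xyz", "title": "A study"}]
lab_b = [{"doi": "10.1000/XYZ", "title": "A Study"}, {"title": "Another paper"}]
print(len(aggregate([lab_a, lab_b])))  # 2 unique publications
```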

When a scientist writes a new publication, the scientist may go to Epistemio and click on a checkbox to confirm the newly found publication, and the lists of publications of all the units to which the scientist belongs (research group, laboratory, department, institute, university), as well as the individual list of the scientist, will be immediately updated. This also immediately and automatically updates all web pages of these units on which the Epistemio-managed publication lists have been embedded, with no extra involvement of any webmaster.

The service is suitable for units of all sizes, from single labs to national research systems. Even large collaborative projects, such as the European Horizon 2020 ones, could use this system for easily managing the project’s publications webpage.

The system can also serve as a simple, lightweight institutional repository. Epistemio Outcomes allows authors to add links to pre-prints/post-prints/archived PDFs. One of the simplest ways to publish such a PDF is to place it in a Dropbox folder, ask Dropbox to create a link for it, and then add the link to the publication on Epistemio. This simple method can rapidly enable an institution to create a repository of its publications.

Later edit: Epistemio Outcomes is now discontinued.

Talking at the conference “Technologies Transforming Research Assessment”

Posted on March 13, 2014

Next week I will talk about research assessment in Romania at the conference “Technologies Transforming Research Assessment”, organized at the Parliament of Lithuania.

Here is the abstract:

In 2011, the higher education and research systems of Romania undertook major reforms that were praised by the European Commission and led to what Nature editorialists characterized as “exemplary laws and structures for science”. Research assessment was a key focus of these reforms. This included: introducing a habilitation process, for evaluating an individual’s research achievements in order to become eligible to apply for full professorship jobs in the universities; minimal scientometric standards for an individual’s eligibility for the various levels of faculty jobs in universities, and for the eligibility for submitting grant applications for the major research funding programmes; the assessment of grant applications, which started to use mostly foreign reviewers; a national assessment exercise for the classification of universities and for the ranking of the universities’ study programmes, for which research was a major component; and a national assessment of research institutions. I present the background and the constraints that led to the design of these research assessment processes, and I discuss the choices that have been made. I also discuss some new tools and processes for research assessment that were designed to solve some technical problems encountered during these processes.

To some extent, Epistemio’s features were informed by the issues encountered during the 2011 reforms of the higher education and research systems of Romania, when I was an adviser to the minister of education and research. For example, the data about scientific publications submitted by universities for the national assessment exercise included so many errors that the ministry had to request a re-submission of the data. This happened because universities lacked suitable research information systems that would allow them to maintain accurate information about their scientific publications. Epistemio Outcomes, which we launched in 2014, solves this problem and helps universities easily aggregate the lists of publications authored by their scientists.

The minimal standards that were introduced in 2011 in Romania were much discussed, and an important issue was to find suitable standards for scientific domains where citations-based metrics, such as the article influence score, were not available or not applicable. Such domains are computer science and, to some extent, some areas of engineering where conferences, rather than journals, are the main vehicle, or an important one, for publishing original research results; and humanities and some social sciences, where there are few citations, and books are the main vehicle of publication, or an important one. The research councils that designed the minimal standards spent much time trying to find suitable equivalents, for these domains, of the article influence score that was used for natural sciences. For example, because there was no article influence score for conferences, an equivalent has been established by using a classification into three categories established by the Australian Research Council. Because there was no citation information for books, an equivalent has been established by the National Research Council by counting the number of WorldCat libraries where the books were available.

The problem of establishing ad-hoc equivalents between inherently distinct metrics would not have arisen had a common metric for all types of publications been available. An obvious common metric is the ratings given by peers. Peer review is the foundation of assessment in science, and metrics based directly on peer review are likely to be much more relevant than other types of scientometric indicators, which are only weakly connected, through proxy intermediaries, to peer review. This is why Epistemio aims to aggregate ratings and reviews provided by peers, especially by those who already read the publications to be rated for their own research.