María del Carmen Calatrava
Interdisciplinarity has become a major topic in discussions of higher education structures, knowledge production and research funding. The demand for criteria and tools for its evaluation is consequently increasing. Interdisciplinary research can be evaluated according to many different aspects, including collaboration, integration of disciplines, generation of new areas of research or solutions to complex problems (Wagner et al., 2011), using both qualitative and quantitative methods of analysis.
Most quantitative measures of the output of interdisciplinary research rely on bibliometric methods. Such methods present two very important advantages: (1) they deliver an objective measure of interdisciplinarity, and (2) in combination with computational tools, they allow large datasets to be analyzed efficiently. They are increasingly being used to inform policy in science and technology. A recent example is a review of interdisciplinary research conducted by Elsevier and commissioned by the UK higher education funding bodies and the Medical Research Council (Pan & Katrenko, 2015). To be representative, however, measurements of interdisciplinarity must be based on reliable indicators.
Citation analysis based on a taxonomy of disciplines
Since interdisciplinary research is often conceptualized as the integration of knowledge, one of the most common methods for its measurement is citation analysis, in which an exchange or integration among fields is captured via discipline-specific citations referring to other fields. In other words, a publication is considered interdisciplinary when it references publications from more than one field. Such an approach requires a taxonomy of disciplines that classifies publications into disciplinary fields (Leydesdorff, Carley, & Rafols, 2013; Porter & Rafols, 2009; Rafols, Leydesdorff, O'Hare, Nightingale, & Stirling, 2012). Although there is no consensus as to which taxonomy is best (National Research Council, 2010; Rafols & Leydesdorff, 2009), the one used by Web of Science is the most widely adopted (Bensman & Leydesdorff, 2009; Pudovkin & Garfield, 2002). The data for the analysis is gathered from Web of Science. This particularly convenient bibliographic resource provides three essential features: it indexes journals in different disciplines, it provides citation records for indexed publications, and it categorizes journals into disciplines within the taxonomy. Once the references of a publication are categorized into one or more disciplines of the taxonomy, its interdisciplinarity can be measured by calculating the number of referenced fields, their proportion, and their similarity, all of which are the basis of widely used indicators of interdisciplinarity (Porter & Rafols, 2009).
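To make the three ingredients (number of referenced fields, their proportions, and their similarity) concrete, the following is a minimal sketch rather than the implementation used in any of the cited studies. The discipline labels and similarity values are invented for illustration, and the function names are hypothetical. The last function follows the general form of a Rao-Stirling-style diversity measure, in which pairs of dissimilar disciplines contribute more than pairs of similar ones.

```python
from collections import Counter
from math import log


def proportions(ref_categories):
    """Share of a publication's references that fall into each discipline."""
    counts = Counter(ref_categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def variety(p):
    """Number of distinct disciplines referenced."""
    return len(p)


def shannon_entropy(p):
    """One common summary of how evenly references spread over disciplines."""
    return -sum(x * log(x) for x in p.values())


def rao_stirling(p, similarity):
    """Sum over ordered pairs i != j of p_i * p_j * (1 - s_ij).

    `similarity` maps frozenset({i, j}) to a value in [0, 1];
    pairs that are not listed are treated as completely dissimilar (s = 0).
    """
    total = 0.0
    for i in p:
        for j in p:
            if i != j:
                s = similarity.get(frozenset((i, j)), 0.0)
                total += p[i] * p[j] * (1.0 - s)
    return total


# Hypothetical publication with four categorized references.
refs = ["Computer Science", "Computer Science", "Education", "Economics"]
# Hypothetical similarity values between disciplines (assumed, not measured).
sim = {
    frozenset(("Computer Science", "Economics")): 0.4,
    frozenset(("Education", "Economics")): 0.2,
}

p = proportions(refs)
print(variety(p))            # number of referenced fields
print(shannon_entropy(p))    # spread of the proportions
print(rao_stirling(p, sim))  # diversity weighted by dissimilarity
```

A publication whose references all fall in one discipline gets a variety of 1 and a diversity of 0 under this sketch, which matches the intuition that it is purely disciplinary.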
Missing data affects the accuracy of interdisciplinarity measurements
While analytical indicators and tools to measure interdisciplinarity have been refined over time, their results should be understood only as a proxy. The accuracy of interdisciplinarity measurements is directly related to the quality of the underlying bibliographic data, which needs to be not only correct but also complete. Unfortunately, gathering a correct and complete bibliographic dataset is almost impossible, because the digital libraries from which the data is typically gathered are themselves rarely complete. Although this problem can be mitigated by gathering publication data from different bibliographic sources, it will persist because no bibliographic source indexes all existing scientific publications. For example, Web of Science and Scopus do not cover books, book chapters or many regional non-English journals. Even conference proceedings, which are common publication venues in many applied, fast-changing fields such as computer science, are often not indexed.
For our most recent bibliometric analysis, we gathered 1,746 publications from Web of Science and Scopus. Even after combining the data from both digital libraries, the extraction of references was possible for only 1,068 of them (Calatrava Moreno, Auzinger, & Werthner, 2016). Another source of inaccuracy arises when publications are incorrectly categorized or are not categorized at all into disciplines. The 1,746 publications of our dataset had a total of 12,243 references, of which only 5,310 were categorized into disciplines. This poses a serious obstacle when conducting citation analysis because each citation needs to be categorized into at least one discipline. If citations remain uncategorized, they will not be taken into account in the analysis. The more citations that remain uncategorized, the less accurate the measurement will be.
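Expressed as shares, the figures above can be worked out with a few lines of arithmetic. The counts are taken directly from the text; the variable and function names are only illustrative.

```python
# Counts reported in the text.
publications_total = 1746
publications_with_references = 1068
references_total = 12243
references_categorized = 5310


def share(part, whole):
    """Fraction of `whole` represented by `part`."""
    return part / whole


pub_coverage = share(publications_with_references, publications_total)
ref_coverage = share(references_categorized, references_total)

print(f"Publications with extractable references: {pub_coverage:.1%}")  # 61.2%
print(f"References categorized into disciplines:  {ref_coverage:.1%}")  # 43.4%
```

In other words, references could be extracted for roughly three in five publications, and fewer than half of all references could be assigned to a discipline.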
How much missing data should we allow in a bibliometric analysis?
To decrease the amount of unreliable data, previous studies have computed an index of interdisciplinarity only for publications whose proportion of categorized references lies above a threshold value (Rafols, Leydesdorff, O'Hare, Nightingale, & Stirling, 2012). This approach, however, does not take into account that uncategorized references affect the measurement of disciplinary and interdisciplinary publications in different ways. While the uncategorized references of a disciplinary publication are likely to come from the same discipline as its categorized ones, those of an interdisciplinary publication are likely to span multiple disciplines. Therefore, missing data in highly interdisciplinary publications leads to an underestimation of the extent of their interdisciplinarity.
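The threshold-based selection described above might look roughly like the following sketch. The threshold of 0.5, the publication identifiers and the discipline labels are all assumptions made for illustration; the text does not state which value earlier studies used. Uncategorized references are represented as `None`.

```python
def categorized_share(refs):
    """Fraction of references that carry a discipline label (None = uncategorized)."""
    if not refs:
        return 0.0
    return sum(r is not None for r in refs) / len(refs)


def filter_by_threshold(publications, threshold):
    """Keep only publications whose categorized share is above the threshold."""
    return {
        pub_id: refs
        for pub_id, refs in publications.items()
        if categorized_share(refs) > threshold
    }


# Hypothetical dataset: publication id -> discipline label of each reference.
publications = {
    "A": ["Computer Science", "Computer Science", "Computer Science", None],
    "B": ["Computer Science", "Education", "Economics", None],
    "C": ["Computer Science", None, None, None],
}

kept = filter_by_threshold(publications, threshold=0.5)
print(sorted(kept))
```

Note that publications A and B pass the filter with the same categorized share, even though the one missing reference is more likely to hide an additional discipline in B than in A. A single threshold cannot capture that difference.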
We have developed a method that addresses this problem. Given a publication and its references (both categorized and uncategorized), our method estimates the uncertainty caused by the uncategorized references. It acts as a confidence indicator that can be used to assess the reliability of bibliographic data and thereby discard unreliable publications from the bibliometric analysis.
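To give a feel for what "uncertainty caused by uncategorized references" can mean, here is a deliberately simple illustration. It is not the method described above, whose details are in Calatrava Moreno, Auzinger, & Werthner (2016). It only bounds one indicator, the number of referenced disciplines, by asking what the best and worst cases would be if the uncategorized references (marked `None`) were resolved. The function name and the `taxonomy_size` parameter are hypothetical.

```python
def variety_bounds(refs, taxonomy_size):
    """Lowest and highest possible number of referenced disciplines.

    Lower bound: every uncategorized reference belongs to a discipline
    that is already observed (or, if none is observed, all share one).
    Upper bound: every uncategorized reference belongs to a different,
    not yet observed discipline, capped by the size of the taxonomy.
    """
    observed = {r for r in refs if r is not None}
    missing = sum(r is None for r in refs)
    lower = len(observed) if observed else min(1, missing)
    upper = min(len(observed) + missing, taxonomy_size)
    return lower, upper


# Hypothetical publications; taxonomy_size=10 is an arbitrary assumption.
complete = ["Computer Science", "Education"]
incomplete = ["Computer Science", "Education", None, None]

print(variety_bounds(complete, taxonomy_size=10))    # (2, 2)
print(variety_bounds(incomplete, taxonomy_size=10))  # (2, 4)
```

The wider the gap between the two bounds, the less the measured value can be trusted. A publication with a wide gap would be a candidate for exclusion under this crude criterion.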
Our contribution is a first approach to measuring interdisciplinarity that takes the incompleteness of bibliographic data into account. Further work will be needed to tackle other problems that still affect the results of indicators of interdisciplinary research.
María del Carmen Calatrava is in the final year of her PhD at Vienna University of Technology, Austria. She has an interdisciplinary background in computer science, innovation and education science. She has two master’s degrees, one in computer science and one in innovation in computer science. Her main research interest is data analysis applied to the field of higher education. She is currently analyzing the production of interdisciplinary research within the context of new doctoral structures after the Bologna Process with both qualitative and quantitative methods. Her interest in technology has led her to contribute to the field of business informatics as well.
Bensman, S. J., & Leydesdorff, L. (2009). Definition and identification of journals as bibliographic and subject entities: Librarianship versus ISI Journal Citation Reports methods and their effect on citation measures. Journal of the American Society for Information Science and Technology, 60(6), 1097-1117.
Calatrava Moreno, M. C., Auzinger, T., & Werthner, H. (2016). On the uncertainty of interdisciplinarity measurements due to incomplete bibliographic data. Scientometrics, 107(1), 213-232.
Leydesdorff, L., Carley, S., & Rafols, I. (2013). Global maps of science based on the new Web-of-Science categories. Scientometrics, 94(2), 589-593.
Moed, H., Burger, W., Frankfort, J., & Van Raan, A. F. (1985). The application of bibliometric indicators: Important field- and time-dependent factors to be considered. Scientometrics, 8(3-4), 177-203.
National Research Council. (2010). Data on federal research and development: A pathway to modernization. Washington, DC: The National Academies Press.
Pan, L., & Katrenko, S. (2015). A review of the UK’s interdisciplinary research using a citation-based approach. Report to the UK HE funding bodies and MRC by Elsevier. Elsevier.
Porter, A. L., & Rafols, I. (2009). Is science becoming more interdisciplinary? Measuring and mapping six research fields over time. Scientometrics, 81(3), 719-745.
Pudovkin, A. I., & Garfield, E. (2002). Algorithmic procedure for finding semantically related journals. Journal of the American Society for Information Science and Technology, 53(13), 1113-1119.
Rafols, I., & Leydesdorff, L. (2009). Content-based and algorithmic classifications of journals: Perspectives on the dynamics of scientific communication and indexer effects. Journal of the American Society for Information Science and Technology, 60(9), 1823-1835.
Rafols, I., Leydesdorff, L., O'Hare, A., Nightingale, P., & Stirling, A. (2012). How journal rankings can suppress interdisciplinary research: A comparison between innovation studies and business & management. Research Policy, 41(7), 1262-1282.
Wagner, C. S., Roessner, J. D., Bobb, K., Klein, J. T., Boyack, K. W., Keyton, J., . . . Börner, K. (2011). Approaches to understanding and measuring interdisciplinary scientific research (IDR): A review of the literature. Journal of Informetrics, 5(1), 14-26.