UNIVERSIDAD DE COSTA RICA SISTEMA DE ESTUDIOS DE POSGRADO MINING SOFTWARE REPOSITORIES TO AUTOMATICALLY MEASURE DEVELOPER CODE CONTRIBUTIONS Tesis sometida a la consideración de la Comisión del Programa de Estudios de Posgrado en Computación e Informática para optar por el grado y título de Maestría Académica en Computación e Informática SIVANA HAMER Ciudad Universitaria Rodrigo Facio, Costa Rica 2023 Dedication To all who have helped me. Acknowledgements Throughout this thesis, I have received insurmountable support from many. Luck has been on my side, everything considered. I thank anyone who has helped me directly and indirectly throughout my journey. Though unusual, I would like to thank myself. Despite adversity, I have persevered with constant dedication. Like anyone, there is a lot that can still be improved on, yet I am happy with what I have achieved and excited for what is to come. Hopefully, I can learn to be more kind and patient with myself, which has been an insurmountable personal quest. I am immensely grateful to my father, mother, and sister, in no particular order, for their constant motivation and support. Without them, I would not be the person I am today or have the opportunity to do what I love. My dogs, Poker and Cherry, also deserve special thanks as they have emotionally supported me through my journey by being so cute, adorable, and lovable. I am also extremely grateful to Dr. Christian Quesada-López for all his invaluable advice, continuous guidance, and unyielding support. I am extremely thankful that he convinced me to consider research as a career that I adore. He has truly shaped my career by being such an excellent role model. Through thick and thin, he has been there for me and guided me throughout my process. I also especially thank Dr. Marcelo Jenkins for his professional support of my career. His invaluable experience has been immensely useful. I also thank Dr. Allan Berrocal and Dr.
Alexandra Martínez for their help during this thesis. I would also like to thank anyone who has helped me improve professionally as a researcher. I want to especially thank Dr. Bogdan Vasilescu and his lab, the Socio-Technical Research Using Data Excavation Lab (STRUDEL), for being so kind and motivational to me during my visit to Carnegie Mellon University (CMU). This thesis is part of the research project No. 834-C1-011 “Procedimiento automatizado de medición de contribuciones a partir de repositorios de proyectos de desarrollo de software” of the Universidad de Costa Rica (UCR). This work was supported by the Centro de Investigaciones en Tecnologías de la Información y Comunicación (CITIC), Sistema de Estudios de Posgrado (SEP), and Escuela de Ciencias de la Computación e Informática (ECCI). Table of Contents Dedication . . . . . . . . . . . . . . . . ii Acknowledgements . . . . . . . . . . . . . . . . iii Approval Sheet . . . . . . . . . . . . . . . . v Table of Contents . . . . . . . . . . . . . . . . vi Resumen . . . . . . . . . . . . . . . . ix Abstract . . . . . . . . . . . . . . . . x List of Tables . . . . . . . . . . . . . . . . xi List of Figures . . . . . . . . . . . . . . . . xii List of Acronyms . . . . . . . . . . . . . . . . xiv 1 Introduction 1 1.1 Objectives . . . . . . . . . . . . . . . . 4 1.2 Methodology . . . . . . . . . . . . . . . . 5 1.3 Contributions . . . . . . . . . . . . . . . . 8 1.4 Document structure . . . . . . . . . . . . . . . . 9 2 Background 11 2.1 Continuous software engineering . . . . . . . . . . . . . . . .
11 2.2 Software metrology and metrics . . . . . . . . . . . . . . . . 16 2.2.1 Measurement context model . . . . . . . . . . . . . . . . 16 2.2.2 Goal Question Metric . . . . . . . . . . . . . . . . 18 2.2.3 Classifying software measures . . . . . . . . . . . . . . . . 20 2.3 Mining software repositories . . . . . . . . . . . . . . . . 20 2.3.1 Metrics from repositories . . . . . . . . . . . . . . . . 23 2.3.2 Software traceability . . . . . . . . . . . . . . . . 23 2.4 Design science . . . . . . . . . . . . . . . . 25 2.5 Empirical software engineering . . . . . . . . . . . . . . . . 27 2.5.1 Systematic mapping studies . . . . . . . . . . . . . . . . 28 2.5.2 Case studies . . . . . . . . . . . . . . . . 29 2.5.3 Controlled experiments . . . . . . . . . . . . . . . . 31 2.5.4 Surveys . . . . . . . . . . . . . . . . 31 3 Characterizing developer contribution research in software engineering 34 3.1 Study design . . . . . . . . . . . . . . . . 35 3.2 Results and discussion . . . . . . . . . . . . . . . . 39 3.3 Summary . . . . . . . . . . . . . . . . 42 4 Developing the automated code contribution measurement procedure 43 4.1 Measurement procedure design . . . . . . . . . . . . . . . . 44 4.2 Tool implementation . . . . . . . . . . . . . . . . 47 4.3 Summary . . . . . . . . . . . . . . . . 52 5 Evaluating the effectiveness of the procedure 53 5.1 Characterizing the process with contributions . . . . . . . . . . . . . . . . 55 5.2 Recovering the code traces automatically . . . . . . . . . . . . . . . . 57 5.3 Measuring the contribution quality . . . . . . . . . . . . . . . . 59 5.4 Integrating the procedure in software engineering projects . . . . . . . . . . . . . . . . 60 5.5 Classifying the value of contributions . . .
. . . . . . . . . . . . . 62 5.6 Summary . . . . . . . . . . . . . . . . 64 6 Discussion and conclusions 65 6.1 Summary . . . . . . . . . . . . . . . . 65 6.2 Discussion and future work . . . . . . . . . . . . . . . . 68 Bibliography 76 Appendix A How have we researched developers’ contributions in software engineering? A systematic mapping study 97 Appendix B Measuring students’ contributions in software development projects using Git metrics 142 Appendix C Using git metrics to measure students’ and teams’ contributions in software development projects 153 Appendix D Automatically recovering students’ missing trace links between commits and user stories 183 Appendix E Measuring students’ source code quality in software development projects through commit-impact analysis 198 Appendix F Students projects’ source code changes impact on software quality through static analysis 209 Appendix G Students’ perceptions of integrating contribution measurement tools in software engineering projects 221 Appendix H Development perceptions and behaviors of continuously measuring software contributions 232 Appendix I Classifying the value of code contributions: An exploratory study 245 Resumen Las personas desarrolladoras contribuyen a los proyectos en una variedad de formas y actividades diferentes. La evaluación de las contribuciones puede ayudar a los procesos, productos, desarrolladores y proyectos de software en la educación, investigación, industria y proyectos de software abierto. Los procedimientos actuales típicamente extraen medidas de repositorios de software. Se necesitan procedimientos y herramientas de medición para capturar mejor la naturaleza compleja y multidimensional de las contribuciones objetivamente, ayudando en la adopción.
Por lo tanto, el objetivo de esta tesis es desarrollar un procedimiento automatizado para medir las contribuciones de código de las personas desarrolladoras mediante la minería de repositorios de software. Para lograr esto, seguimos las guías de la ciencia del diseño para desarrollar la herramienta de procedimiento de medición por medio de tres ciclos principales. Primero, se realizó un mapeo sistemático de 166 estudios para caracterizar cómo se han investigado las contribuciones de las personas desarrolladoras en la ingeniería de software. Segundo, se propuso e implementó un procedimiento automatizado de tres fases que extrae datos de repositorios que miden seis dimensiones de las contribuciones de las personas desarrolladoras. Finalmente, la efectividad del procedimiento fue evaluada en ocho estudios empíricos. Analizamos 13 proyectos distintos de ingeniería de software educativo, con un total de 246 estudiantes desarrolladores. A lo largo de nuestras evaluaciones empíricas, encontramos evidencia de la efectividad, aceptación, aplicabilidad y utilidad del enfoque. La investigación puede aprovechar el procedimiento automatizado y los conocimientos adquiridos para trabajos futuros. Abstract Developers contribute to projects in a variety of ways and through different activities. Assessment of contributions can help education, research, industry, and open-source software processes, products, developers, and projects. Current procedures typically mine software repositories for measures. Measurement procedures and tools are needed to more objectively capture the complex and multi-dimensional nature of contributions, aiding in their adoption. Therefore, the objective of this thesis is to develop an automated procedure to measure developer code contributions by mining software repositories. To achieve this, we followed design science guidelines to develop the measurement procedure and tool through the following three main cycles.
First, a systematic mapping study of 166 studies was conducted to characterize how developer contributions have been researched in software engineering. Second, an automated three-phase procedure was proposed and implemented that mines data from repositories measuring six dimensions of developer contributions. Finally, the procedure’s effectiveness was evaluated in eight empirical studies. We analyzed 13 distinct educational software engineering projects, totaling 246 student developers. Throughout our empirical evaluations, we found evidence of the effectiveness, acceptance, applicability, and utility of the approach. Research can take advantage of the automated procedure and insights gained in future work. Keywords: software contributions, automated measurement procedure, continuous software engineering, software measures, mining software repositories, software engineering education, empirical software engineering List of Tables 2.1 The seven wastes . . . . . . . . . . . . . . . . 14 3.1 Mapping study research questions with their motivation . . . . . . . . 36 3.2 The inclusion (I) and exclusion (E) criteria . . . . . . . . . . . . 37 3.3 Data extraction fields with their dimensions . . . . . . . . . . . . 38 3.4 Study quality assessment criteria . . . . . . . . . . . . . . . . 39 4.1 The procedure contribution dimensions with measures . . . . . . . . 46 4.2 The current measures visualized by the tool . . . . . . . . . . . . 52 5.1 Summary of the approaches of the evaluations . . . . . . . . . . . . 54 List of Figures 1.1 Research subject . . . . . . . . . . . . . . . . 5 1.2 Research design science framework . . . . . . . . . . . . . . . . 6 1.3 Design science process . . . . . . . . . . . . . . . . 7 1.4 Research methodology summary . . . . . . . . . . . . . . . . 8 1.5 Detailed contributions of the work . . . . . . . . . . . . . . . .
10 2.1 The “Stairway to Heaven” model . . . . . . . . . . . . . . . . 13 2.2 Continuous* model . . . . . . . . . . . . . . . . 15 2.3 Measurement context model . . . . . . . . . . . . . . . . 17 2.4 GQM+ strategies model . . . . . . . . . . . . . . . . 19 2.5 Mining software repository . . . . . . . . . . . . . . . . 23 2.6 Relationship between trace artifacts and trace links . . . . . . . . 24 2.7 Traceability process model . . . . . . . . . . . . . . . . 25 2.8 Summary of the design science approach . . . . . . . . . . . . 26 2.9 Systematic mapping process . . . . . . . . . . . . . . . . 28 2.10 Case study process . . . . . . . . . . . . . . . . 32 2.11 Experiment process . . . . . . . . . . . . . . . . 32 3.1 Mapping study process . . . . . . . . . . . . . . . . 35 4.1 Automated measurement procedure for developer contribution . . . . 45 4.2 Tool gathering code contributions with their relationship to user stories from software repositories . . . . . . . . . . . . . . . . 48 4.3 The tool’s main user interfaces . . . . . . . . . . . . . . . . 51 5.1 General characteristics of the empirical studies . . . . . . . . . . . . 55 5.2 Methodology of the process evaluation . . . . . . . . . . . . . . . . 56 5.3 Methodology of the traceability evaluation . . . . . . . . . . . . 58 5.4 Methodology of the quality evaluation . . . . . . . . . . . . . . . . 60 5.5 Methodology of the integration evaluation . . . . . . . . . . . . 61 5.6 Methodology of the value evaluation . . . . . . . . . . . . . . . .
63 List of Acronyms API: Application Programming Interface CSE: Continuous Software Engineering DC: Design Cycles EC: Empirical Cycles GQM: Goal Question Metric ITS: Issue Tracking Systems KQ: Knowledge Question MSR: Mining Software Repositories OSS: Open Source Software RQ: Research Question SE: Software Engineering SO: Specific Objective VCS: Version Control Systems VSBE: Value-Based Software Engineering Chapter 1 Introduction The contribution of developers to projects is a central notion of software engineering [1]. A contribution is defined as the act of giving or supplying something [2]. Therefore, software contributions are defined as any action on the software project performed by anyone involved in the software engineering process. While constructing software, developers participate in diverse technical and non-technical tasks; hence, contributions are also varied.
Different types of contribution have been investigated, including code development and review [1, 3–12], bug reporting and fixing [1, 10, 11, 13–15], communication through messages [1, 12], and models [16]. Contributions are assessed to help software engineering products, people, processes, and projects [1, 3, 10, 13, 14]. Assessment is defined here as the evaluation, calculation, or valuation of a contribution. Works commonly assess contributions by measuring data mined from software repositories [10], the artifacts produced and archived during the software development cycle [17]. Measurement in software engineering provides valuable information helping quantitative decision-making and aiding in understanding, controlling, and improving software products, processes, resources, and projects [18,19]. Assessing software contributions provides benefits for research, industry, open-source projects, and education: • In software engineering research, creating and understanding phenomena through constructs, “things” that are indirectly measured, is core to knowledge acquisition [20]. For example, size, coupling, and cohesion are software constructs [21]. Along the same line, contributions are another software engineering construct. For example, when we determine project health, we can utilize software contribution measures that serve as indicators representing the concept [22]. As such, improved operationalizations of our measured indicators can generate more empirical evidence of software engineering practices. Hence, we can better understand software engineering phenomena. • In software projects, there are rising demands for rapidness, frequency, fluidity, adaptability, and customer-centricity, with trends such as agile processes and continuous software engineering emerging with increased momentum and prominence [23–25]. Continuous improvement through measurement is a crucial orthogonal aspect for software projects [26].
As such, contribution assessments based on measurement can also be used to help monitor development, plan future projects, identify risks, recognize developers, improve behaviors, increase efficiency, and make informed business decisions [1,3,10,13,14]. Management can thus use contribution data to improve current projects through increased insight and future projects through improved planning. • For developers, ensuring their involvement in developing software is needed for the sustainability of projects. Specifically, the sustainability of open-source projects is vital as most participants are volunteers [27–29]. As such, papers have extensively studied which factors help or interfere with the health of projects [30–33]. Research has also focused on determining measures that detect the health of communities and ecosystems [34, 35]. Less work has focused on creating interventions that help community health, usually through tooling [36]. As recognition is a success and motivational factor for developers [37–39], adequately recognizing all developer contributions could help the health of software projects. At the same time, developers can benefit from tracking their progress and gaining additional information to improve their skills. • In software engineering education, there are gaps between industry needs for newcomer developers and what is taught in university classrooms [40, 41]. Automated approaches can help train future professionals by providing valuable feedback on their contribution to improve continuously. At the same time, contribution assessment approaches can be utilized by educators to aid in grade assignment [42, 43], which is still considered challenging for instructors [44]. Additionally, educators can use quantitative information to aid in teaching by determining learning challenges. This way, discussion opportunities are created and refined within the classroom to further comprehension of software practices.
Based on previous course data, improvement opportunities can also be found. Though the assessment of contributions is beneficial, there are still several challenges with the effectiveness and usability of assessment approaches. • Software contributions are diverse. Assessment approaches need to consider many aspects, including both technical and non-technical contributions [10, 45]. At the same time, research consensus on the definition of contributions has not yet been achieved [1, 10, 11]. Even recently, studies have mentioned that works do not explicitly detail the types of contributions considered [46]. This is further exacerbated by the difficulty of objectively, fairly, and accurately discerning individual contributions in team projects [43]. • When assessing contributions, an in-depth characterization is needed. Software is too complex to be quantified by a single metric [47]. Approaches not only need to quantify size, which may itself be difficult to measure, but must also consider other aspects such as quality [6, 48], complexity [49], or value [5]. Hence, multiple dimensions must be considered when assessing software contributions. Otherwise, assessment models may be unrepresentative, rendering certain contributions to projects invisible [50]. • Assessment approaches need to be easy to integrate for adopters. Software measures require a considerable upfront investment of time [51]. Software tools help developers in producing software [52]. Examples include integrated development environments that aid in the development of code [53, 54], version control systems that help manage software versions [55, 56], bots that automate software tasks [57–59], and artificial intelligence models that suggest code [60, 61]. Thus, specialized tools can help adopters benefit from objective contribution assessment insights while reducing adoption costs. Yet, few such tools have been implemented in research and integrated into projects.
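To make the idea of mining repositories for contribution measures concrete, the sketch below aggregates per-developer size measures from `git log --numstat` style output. This is an illustrative fragment, not the thesis tool: the `author:` line prefix comes from a custom `--pretty` format, and the measure names (`commits`, `added`, `deleted`) are assumptions for the example.

```python
from collections import defaultdict

def contribution_measures(log_text):
    """Aggregate per-author measures from `git log --numstat` style output.

    Expects each commit to start with an `author:<name>` line (as produced by
    `git log --pretty=format:'author:%an' --numstat`), followed by numstat
    rows of the form `<added>\t<deleted>\t<path>`.
    """
    measures = defaultdict(lambda: {"commits": 0, "added": 0, "deleted": 0})
    author = None
    for line in log_text.splitlines():
        if line.startswith("author:"):
            author = line[len("author:"):].strip()
            measures[author]["commits"] += 1
        elif line.strip() and author is not None:
            added, deleted, _path = line.split("\t")
            # Binary files appear as '-' in numstat; count them as zero lines.
            measures[author]["added"] += int(added) if added != "-" else 0
            measures[author]["deleted"] += int(deleted) if deleted != "-" else 0
    return dict(measures)

# Hardcoded sample output standing in for a real `git log` invocation.
sample = (
    "author:alice\n"
    "10\t2\tsrc/app.py\n"
    "3\t0\tREADME.md\n"
    "\n"
    "author:bob\n"
    "-\t-\tlogo.png\n"
    "5\t5\tsrc/app.py\n"
)
print(contribution_measures(sample))
```

In practice, such an aggregation would run over log text collected with `subprocess`, and capturing further contribution dimensions (e.g., quality, traceability, or value) requires additional analyses beyond line counts.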
Due to the previous challenges, automated measurement procedures of software contributions can be created. The procedure mines different measures from software repositories to account for and acknowledge the diverse and multi-dimensional nature of software contributions, thus aiding software engineering projects, people, products, and processes. 1.1 Objectives This thesis, therefore, aims to achieve the following: Main objective. Develop an automated procedure to measure developer code contributions by mining software repositories. Hence, the research question of the work is: How can developer code contributions be measured by mining software repositories? To achieve our research goal and question, three specific objectives were proposed. Specific objective 1. Characterize software engineering research of developer contributions. The motivation of this first specific objective (SO1) is to systematically aggregate software engineering research of developer contributions. We characterize the contribution types, research topics, research designs, measurements, assessment approaches, contexts, threats to validity, and challenges. Therefore, the findings can help empirically standardize terms, consolidate current findings, and determine gaps in developer contribution research. Specific objective 2. Design an automated procedure to measure developer contributions by mining software repositories. The second specific objective (SO2) develops an automated procedure that measures code contributions from software repositories. This includes the design of the procedure and the implementation of the tool. The measurement procedure characterizes six dimensions of developer contributions in a three-step process. Meanwhile, our implementation for code contributions has three main phases: setup, extraction, and usage of information. The creation of tools helps with technology transfer [62]. Specific objective 3.
Evaluate the effectiveness of the automated procedure to measure developer code contributions by mining software repositories. Finally, the third specific objective (SO3) evaluates the effectiveness of the measurement procedure. Hence, we determine whether the tool achieved the desired results and is perceived as relevant. To achieve this, we conducted five different types of empirical studies: characterizing the process, recovering traces, measuring quality, integrating the procedure, and classifying the value. Researching the measurement procedure helps formalize and generate more evidence of engineering practices. Insights from the evaluations provide improvement opportunities for contribution assessment approaches. 1.2 Methodology The methodology of this work followed the guidelines of design science in information systems and software engineering, studying artifacts within a context [63]. Fig. 1.1 shows our research subject and its interactions. The research artifact studied was the proposed automated procedure to measure developer code contributions by mining software repositories, while the context studied was software engineering projects developed by students. Validating a solution in academia can also serve as the first step for empirical technological transfer to industry [62]. Figure 1.1: Research subject The design science framework is shown in Fig. 1.2. The social context of this thesis includes the stakeholders, who may affect the project or be affected by it, and the sponsors. Therefore, the stakeholders are instructors and students. The design science context corresponds to the design problems and knowledge questions of the automated procedure to measure developer code contributions.
Finally, the knowledge context is the existing theories, specifications, and designs. In our case, these include the fields of continuous software engineering, software metrology, mining software repositories, empirical software engineering, and software contributions. The background of continuous software engineering, software metrology, mining software repositories, and empirical software engineering is explained in Chapter 2. Meanwhile, the research of developer contributions is characterized in Chapter 3. Figure 1.2: Research design science framework In design science, design cycles solve engineering design problems, while empirical cycles answer knowledge questions. The design cycles (DC) and empirical cycles (EC) of our work are shown in Fig. 1.3. 1. The main design cycle of the thesis, named DC1, focuses on the design of the automated procedure to measure developer code contributions artifact. To achieve this, first, the research problem was defined (T1). 2. Then, the empirical cycle, called EC1, characterizes the software engineering research of developer contributions through a mapping study. In this cycle, the goals for the mapping study were defined (T2), the protocol designed (T3) and validated (T4), the study conducted (T5), and the results analyzed (T6). This cycle was achieved by carrying out a systematic mapping study [64,65]. 3. After EC1, the main design cycle DC1 was continued.
The treatment was designed (T7), inspired by the measurement context model [66], validated (T8), and implemented (T9) using iterative development [23,24]. 4. Finally, the automated procedure developed in DC1 was evaluated in the empirical cycle, called EC2, through empirical studies [62]. In this cycle, for each study the goals were defined (T10), the protocol designed (T11) and validated (T12), the study executed (T13), and the results analyzed (T14). 5. The development of DC1 and the evaluation of EC2 were repeated in multiple iterations. Figure 1.3: Design science process Technical research questions (RQ) for the design cycles and knowledge questions (KQ) for the empirical cycles were also defined. RQs focus on creating artifacts to achieve a goal. Meanwhile, KQs gather knowledge about the world. Based on our objectives, the KQ and RQ are: KQ1: How are developer contributions researched in software engineering?
RQ2: How to design an automated procedure to measure developer contributions by mining software repositories?

KQ3: What is the effectiveness of the automated procedure to measure developer code contributions by mining software repositories?

KQ1 contributes to SO1, RQ2 satisfies SO2, and KQ3 responds to SO3. The relationships between the thesis objectives, cycles, questions, methods, and products are shown in Fig. 1.4. The detailed methodology of each cycle is covered in the respective chapters of this work. Their location is provided in Section 1.4.

[Figure 1.4: Research methodology summary — columns: Objectives, Cycle, Questions, Methods, Products. SO1 (characterize software engineering research of developer contributions) is addressed by EC1 and KQ1 through a mapping study (Kitchenham & Charters 2007; Petersen 2008), producing a mapping study characterizing software engineering research of developer contributions. SO2 (design an automated procedure to measure developer contributions by mining software repositories) is addressed by DC1 and RQ2 through the measurement context model and iterative development (Abran 2010; Sommerville 2016; Pressman 2010), producing the design and implementation of the automated procedure to measure developer contributions. SO3 (evaluate the effectiveness of the automated procedure to measure developer code contributions by mining software repositories) is addressed by EC2 and KQ3 through empirical studies (Wohlin et al. 2012), producing empirical evaluations of the automated procedure that measures developer code contributions.]

1.3 Contributions

This thesis, therefore, makes the following contributions.

• We characterized software engineering studies of developer contributions, summarizing the works to standardize terms, consolidate findings, and determine research gaps.

• We developed an automated procedure that measures multiple dimensions of developer contributions by mining software repositories, implementing the measurement tool for code contributions, aiding software engineering researchers and adopters of continuous measurement.

• We evaluated the automated measurement procedure in empirical studies, determining the effectiveness, acceptance, applicability, and utility of the approach.

A representation of the contributions with their relationship to the research questions is shown in Fig. 1.5.

1.4 Document structure

The structure of this research thesis is as follows:

• Chapter 2 describes the background of continuous software engineering, metrology and metrics, mining software repositories, design science, and empirical methodologies.

• Chapter 3 presents the systematic mapping study that systematically characterizes software engineering research of developer contributions.

• Chapter 4 explains the developed automated measurement procedure and tool that automatically measures developer code contributions.

• Chapter 5 presents the empirical evaluation of the effectiveness of the procedure and tool, focused on five topics: process characterization, traceability recovery, quality measurement, perception integration, and value classification.

• Chapter 6 summarizes the main findings, contributions, and future work of the thesis.

[Figure 1.5: Detailed contributions of the work — the goal (develop an automated procedure to measure developer code contributions by mining software repositories) is addressed through KQ1, RQ2, and KQ3, yielding general contributions GC1 (characterization of software engineering studies of developer contributions; products: types of contributions, methodological approaches, measures, topics, contexts, challenges, threats to validity), GC2 (automated procedure measuring multiple dimensions of developer contributions by mining software repositories, with a measurement tool for code contributions; products: GQM model, measurement procedure, tool), and GC3 (empirical evaluations determining effectiveness, acceptance, applicability, and utility; products: process, traceability, product, integrations, value).]

Chapter 2

Background

This chapter presents the background. Section 2.1 describes the field of continuous software engineering, detailing trends that motivate and situate our work within the field. Section 2.2 provides foundations of software metrology and metrics, used for the design of the automated measurement procedure. Section 2.3 details the mining software repositories field and its general processes, utilized for the extraction of data. Section 2.4 explains design science, the methodology of this thesis. Finally, Section 2.5 explains the empirical software engineering methodologies utilized throughout the empirical studies.
2.1 Continuous software engineering

Software engineering is founded on software processes: activities, actions, and tasks leading to the development of software. Though the applicable activities are adaptable for each project, a set of activities is always included: communication, planning, modeling, construction, and deployment [23]. High-level, abstract descriptions of software processes are represented in software process models [24].

Software engineering processes are constantly evolving to respond to the industry's needs of dealing with market changes and providing more accurate customer solutions [26]. These business demands have led to a paradigm shift in software development, learning from the customer's software usage after delivery and deployment [67]. The software process evolution from traditional development towards continuous development is represented in the “Stairway to Heaven” model, shown in Fig. 2.1. The model's five steps are described below [67,68].

Traditional development. Companies start with traditional development. Traditional software models are based on the waterfall model, the first published software process model. The model presents sequential, separate steps of requirement generation, analysis, design, coding, testing, and operations. It is an example of a plan-driven model, where planning and scheduling are done before starting implementation. However, this model has several limitations in accommodating change, managing uncertainty, and providing fast, workable software versions [23,24,67,69].

R&D Organization All Agile. To tackle many of the challenges of traditional processes, companies adopt agile practices [68]. These software processes interleave activities in feedback cycles to produce software incrementally and rapidly for customers. Adopting agile practices improves the software's fluidity, adaptability, and rapidness [23,24,70].
Different agile software development processes have been proposed, such as Extreme Programming (XP) [71] and Scrum [72]. Nonetheless, project management and system verification still follow traditional development approaches [68].

Continuous integration. As agile benefits materialize, verification gets involved in continuous practices [68]. Continuous integration is a development practice in which developers frequently integrate their work. Each integration is automatically built and verified with tests [73]. This leads to development teams producing software faster and reducing bugs [74].

Continuous deployment. Once continuous integration is incorporated into the process, project management gets involved in the agile development cycle and requests faster development cycles. Continuous deployment is adopted, constantly and automatically pushing out code changes to production. This allows for continuous customer feedback and eliminates the waste of products that are not valuable to the customer [67,68,74].

R&D as an Innovation System. Finally, deployed functionality is treated as an experiment. Customer feedback is analyzed, with techniques such as A/B testing, and used to determine customer needs. The delivered functionality is treated as a starting point that is further tuned [67,68].

[Figure 2.1: The “Stairway to Heaven” model [68] — five steps: Traditional Development, R&D Organization All Agile, Continuous Integration, Continuous Deployment, R&D as an Innovation System.]

Continuous software engineering (CSE) is a process in which software is developed continuously and incrementally, allowing for continuous learning and improvement [75]. Lean software engineering is a relevant and useful lens to understand CSE [25]. Lean software development (a.k.a. lean software engineering) seeks to create value for the customer as rapidly as possible [76]. Value can mean many different things and even has its own research field, denominated Value-Based Software Engineering (VBSE) [77].
VBSE is concerned with incorporating value into development projects to create products that are more useful for stakeholders [78]. In agile, value is providing the customer what they want and require [77]. In lean software engineering, value is providing customers with their unmet needs to delight them, and it is achieved only when the customer receives the product [76,79]. Finally, in VBSE, value is “relative utility, worth, or importance” [78].

Lean is characterized by its principles: broadly applicable ideas and insights. These principles are described below [76].

Eliminate waste. Waste is anything that does not add value to the customer. Development should spend time only on what adds customer value. Eliminating waste requires iteratively learning to see the waste, uncovering its source, and eliminating it. Seven types of waste are defined in manufacturing and can be translated to software development, as shown in Table 2.1. For example, one waste in software engineering is motion: the motion within the process when focus needs to be re-established and artifacts are reassigned.

Table 2.1: The seven wastes [76]

  The seven wastes in manufacturing | The seven wastes of software development
  Inventory                         | Partially done work
  Extra processing                  | Extra process
  Overproduction                    | Extra features
  Transportation                    | Task switching
  Waiting                           | Waiting
  Motion                            | Motion
  Defects                           | Defects

Amplify learning. Compared to manufacturing, software development is a creative process where learning is expected and should be amplified. While developing, perfecting the solution requires trial and error. This leads to a process where quality is focused on solving a problem in an easy-to-use and cost-effective manner; variability is expected, as different solutions are provided to each unique customer; and iterations are desirable, as they are the most effective way to generate knowledge in ill-defined problems.

Decide as late as possible. Options should be kept open for as long as possible.
This provides insurance against uncertainty: decisions are made when more knowledge is available and outcomes are easier to predict. This leads to the need to build the capacity for change into the system.

Deliver as fast as possible. Rapid delivery provides customers with what they need now, providing a competitive advantage. Speed makes it possible to delay decisions until more is known, obtain reliable feedback, and increase learning.

Empower the team. Frontline workers equipped with expertise and guided by a leader are better positioned than anyone to make technical and process decisions. As decisions are made late and delivery is rapid, management by a central authority is not possible. Therefore, workers have decision-making responsibility and design authority for satisfying customer needs.

Build integrity in. Systems have to focus, from day one, on fulfilling customers' needs, being cohesive as a whole, and maintaining usefulness over time. To achieve this, excellent information flows between customers and developers must be adopted.

See the whole. Improving the software process requires considering the whole process from end to end. Optimizing only parts of the process makes sub-optimization likely to occur.

The Continuous* model, shown in Fig. 2.2, displays the end-to-end, holistic process of continuous software engineering [25]. The process is divided into three main activities: business, development, and operations. The business focuses on adaptable planning and budgeting based on the business environment. Development concentrates on frequent and rapid aspects of software creation and management, such as integration, delivery, and verification. Finally, operations monitor system use to detect problems as early as possible. The foundation of these activities is continuous improvement, and continuous innovation and experimentation. They focus on small and big changes, respectively, to improve processes based on data-driven decision-making.
[Figure 2.2: Continuous* model [25] — Business strategy (continuous planning, continuous budgeting), Development (continuous integration, continuous deployment, continuous delivery, continuous verification/testing, continuous compliance, continuous evolution), and Operations (continuous use, continuous trust, continuous runtime monitoring), connected through BizDev and DevOps and founded on continuous improving and continuous experimentation and innovation.]

A critical, orthogonal aspect of the evolution of software processes is organizational performance metrics [26]. In this work, we therefore define continuous evaluation or measurement as continuously collecting software metrics. We consider continuous evaluation as part of the continuous improvement, and continuous innovation and experimentation, activities.

2.2 Software metrology and metrics

Metrology is the science of measurement, including its theoretical and practical aspects [80]. As software engineering is the “application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software; that is, the application of engineering to software...” [81], measurement is fundamental to the discipline [66]. Furthermore, measurement is part of the foundational engineering topics in the Software Engineering Body of Knowledge (SWEBOK); thus it is part of the generally accepted knowledge in software engineering [82].

There are three vital terms in metrology: measurement, measure, and metric.

Measurement. The process where numbers or symbols are assigned to attributes of real-world entities based on clearly defined rules [19].

Measure. The number or symbol assigned to the entity to characterize an attribute [19].

Metric. A quantitative measure relating to the degree to which an object possesses a given attribute [81].

Software measures exist for different aspects such as code [83] and quality [84–86]. Software measures are also commonly utilized to assess software engineering contributions [10].
Notably, many works count commits as contributions [50]. To gather measurement results, a process must be defined. This process is denominated the measurement context model and is described in Sub-section 2.2.1. An approach to selecting the measures must also be defined; specifically, in this project, the goal-oriented approach Goal Question Metric is used, described in Sub-section 2.2.2. Finally, different types of software measures exist; a classification and description of specific software measures is presented in Sub-section 2.2.3.

2.2.1 Measurement context model

The process to gather and exploit measurement results is defined by measurement context models. They comprise three steps, shown in Fig. 2.3: design of the measurement method, application of the measurement method, and exploitation of the measurement results.

[Figure 2.3: Measurement context model [66] — Step 1, design of the measurement method: determination of the objectives; characterization of the concept to be measured (entity and attribute); design or selection of the meta-model (measurable construct: relationships across entity and attribute); definition of the numerical assignment rules. Step 2, application of the measurement method: software documentation gathering, construction of the software model, application of the assignment rules, measurement results, audit. Step 3, exploitation of the measurement results (examples): quality model, budgeting model, productivity model, estimation model, estimation process.]

A measurement method is a general sequence of logical operations used to obtain a measurement. Meanwhile, a measurement procedure is a specific set of operations to obtain particular measurements according to a given method [66]. Each step of the measurement context model is detailed below.

Design the measurement method. A measurement method must be designed or selected from previous methods.
To design the measurement method, four sub-steps must be followed. First, the measurement objectives are defined based on what we want to measure, the measurement point of view, and the intended users. Then, the measured concept is characterized by defining the measured entity, the measurable attributes, and the detailed empirical definition of the attributes. Concurrently, the representation of the entities and attributes, with their relationships, is abstractly described in a meta-model. The meta-model must also describe how to recognize the measured attributes. Finally, the numeric assignment rules are defined with their measurement unit.

Apply the measurement method. In this step, the measurement method is applied to a specific context (i.e., as a measurement procedure). To apply the measurement procedure, five sub-steps are carried out. Firstly, information about the software is gathered. Secondly, the software model is built according to the proposed meta-model. Thirdly, the numeric assignment rules are applied. Fourthly, the measurement results are documented with details such as the measurement unit, measurement process, and measurers. Finally, the results are verified and audited to ascertain their correctness.

Exploitation of the results. Measurement results are exploited for both quantitative and qualitative analysis using models, such as evaluation, budgeting, and estimation models.

2.2.2 Goal Question Metric

An approach to determine what to measure is Goal Question Metric (GQM) [19]. It is a goal-oriented method that assumes that purposeful measures for organizations must be defined from project and organizational goals, that the collected data must be traceable to those goals, and that a framework must be provided to interpret the data [62,87]. The result of the approach is the specification of a measurement system targeting specific issues, and rules to interpret the measurement data.
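Such a GQM specification can be illustrated with a small, hypothetical example. The goal, questions, and metric names below are invented for illustration only and are not the GQM model of this thesis:

```python
# A minimal sketch of a GQM specification: one measurement goal,
# refined into questions, each answered by one or more metrics.
# All names here are hypothetical examples.
gqm_model = {
    "goal": "Characterize developer code contributions "
            "from the project manager's point of view",
    "questions": {
        "How much code does each developer contribute?": [
            "commits per developer",
            "lines of code added per developer",
        ],
        "How is contribution distributed across the team?": [
            "share of total commits per developer",
        ],
    },
}

# Every metric stays traceable to the goal through its question,
# which is the property GQM relies on when interpreting data.
for question, metrics in gqm_model["questions"].items():
    for metric in metrics:
        print(f"{metric!r} answers {question!r}")
```

The nesting makes the goal-to-metric traceability explicit: interpreting any metric value requires walking back up through its question to the goal.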
The methodology has three main components: goals, questions, and metrics [62,87].

Conceptual level (Goal). Goals are defined for objects, for multiple purposes, from distinct focuses, from various points of view, relative to the environment [88].

Operational level (Question). Questions characterize objects to assess a goal based on characterization models [62,87]. Questions bridge the subjectivity of goals with quantitative measurements [87].

Quantitative level (Metric). Objective or subjective data is associated with questions to answer them quantitatively [62,87].

GQM+ Strategies extends the GQM methodology, adding the capability of aligning the measurement program with business goals and strategies, software goals, and measurement goals [88]. The GQM+ Strategies approach is composed of the following elements [88,89]:

Business goals. Organizational goals to achieve strategic objectives. Goals comprise the activity performed to achieve the goal, its focus, the object under consideration, the quantified magnitude, the timeframe in which it must be achieved, the scope, constraints, and relationships with other goals.

[Figure 2.4: GQM+ Strategies model [89] — a Goal+Strategies element (goal, strategy, context/assumption) is realized by a set of lower-level elements and made measurable through a GQM graph (a GQM goal made measurable through questions and metrics, with an interpretation model), which measures the achievement of the goal.]

Context factors. Organizational environment variables that affect the used models and data.

Assumptions. Estimated unknowns that may affect data interpretation.

Strategies. Possible approaches to achieve goals, refined by activities.

Lower-level goals. Set of lower-level goals inherited from the strategy of upper-level goals.

Interpretation models. Models that help interpret data to determine if goals have been achieved.

Goal+Strategies element. A goal with its strategies, activities, and assumptions.

GQM graph.
A GQM goal with the corresponding questions, metrics, and interpretation models. It is associated with a Goal+Strategies element.

The model of GQM+ Strategies is shown in Fig. 2.4. There are multiple goal levels, allowing for strategies at each level. Goals may be realized through strategies that may also define other goals. Context information influences the definition of goals and strategies. At every level, GQM plans are defined by measurement goals, questions, metrics, and interpretation models to measure the achievement of the respective goal and strategy [88,89].

2.2.3 Classifying software measures

All measurement methods must define the software entities and attributes they describe. As such, measures can be classified according to software entities into process, product, and resource measures. Furthermore, within each entity, attributes can be further categorized as internal or external. Internal attributes can be measured by examining the entity itself, while external attributes are measured in terms of the behavior of the entity in its environment. Below, software process, product, and resource measures are detailed, respectively [19].

Process. These measures are related to software activities and are usually associated with time. Process attributes include cost, controllability, observability, and stability. Only a limited number of internal aspects can be measured, such as the duration, effort, or number of incidents of a specified type. Popular process metrics are the number of changes, time, and effort.

Product. These measures focus on analyzing the attributes of software artifacts. Some attributes measured include size, integrity, usability, portability, testability, complexity, and maintainability. Common product metrics include Lines of Code (LOC), Function Points (FP) [90], Cyclomatic Complexity [91], defects, and the number of features.

Resources.
These measures examine entities required in the software process, such as personnel, materials, tools, and methods. They can help determine the magnitude, cost, and quality of resources. Some resource measures are the number of developers, the hardware requirements, and the cost of personnel.

2.3 Mining software repositories

Software configuration items started out being physically stored as paper documents, so managing the information was difficult, error-prone, time-consuming, and complex. Consequently, repositories were used as centers for the accumulation of knowledge. Repositories started out as people, which was challenging; nowadays, databases are used as software repositories [23].

Software repositories are artifacts produced and archived while developing software [17]. Mining software repositories (MSR) is the field that analyzes and cross-links interesting and actionable information from software repositories about software products and projects [92]. The field was consolidated in 2004 with the organization of the workshop on Mining Software Repositories at the International Conference on Software Engineering (ICSE) [93]. Software repositories can be classified as: historical repositories, recording the evolution of software artifacts; run-time repositories, recording the execution and usage of applications; and code repositories, containing the source code of developed applications [92]. Examples of historical repositories include source control repositories (i.e., source code management systems), bug repositories (i.e., issue tracking systems), and archived project communications [92,94].

Source code management (SCM). These repositories record and maintain changes to source code artifacts [94]. They are also known as version control systems (VCS), revision control systems (RCS), software configuration management, or source code control [95]. Examples of this type of repository are Git [56], Subversion [96], and CVS [97].
Issue tracking systems (ITS). These repositories track bugs, features, and inquiries from their creation to their final state. Bugs represent defects, features are new functionality or enhancements, and inquiries are questions or technical support for customers [98]. They are also known as requirement tracking systems and bug tracking systems (BTS) [17,98]. Examples include Bugzilla (https://www.bugzilla.org/) and Jira (https://www.atlassian.com/software/jira).

Archived project communications (APC). These repositories track discussions about any aspect of software development [92]. Examples include mailing lists, bulletin boards, question and answer forums, and microblogs storing discussions [99–101].

There is a wide spectrum of techniques and purposes used in MSR [17].

i. Metadata analysis gathers the metadata stored in software repositories using methodologies such as regular expressions, heuristics, and common-sequence matching.

ii. Static source code analysis extracts facts from versions of a software system using techniques for parsing, processing, and extracting facts from source code.

iii. Source code differencing and analysis focuses on the changes between versions of source code, using methods to express both syntactic and semantic changes.

iv. Software metrics quantify aspects of software products, projects, and processes, such as size, effort, cost, functionality, quality, and complexity.

v. Visualizations use information-visualization techniques to represent data, amplifying cognition.

vi. Clone-detection methods find source code with similar textual, structural, and semantic compositions, applying text-based and token-based techniques or code abstractions like abstract syntax trees and program dependency graphs.

vii. Frequent-pattern mining discovers trends, patterns, and rules utilizing metadata, source code data, and difference data with techniques such as itemset and sequential-pattern mining.

viii.
Information retrieval is a methodology used to classify and cluster textual units by various similarity concepts based on metadata.

ix. Classification with supervised learning is based on machine learning techniques that use metadata and historical data to acquire intricate knowledge to improve tasks.

x. Social network analysis considers techniques to derive and measure invisible relationships between social entities.

A typical MSR process has the following four steps, shown in Fig. 2.5: data extraction, data modeling, synthesis, and analysis [100]. First, the raw data is extracted from the software repositories. Then, the data may be preprocessed before it is used. Next, the data can be synthesized by applying data mining or learning techniques. Finally, the data is analyzed and interpreted.

[Figure 2.5: Mining software repository process [100] — data extraction, data modeling, synthesis, analysis.]

Metrics, described in Section 2.2, are one of the many types of data that can be extracted. Further details of some metrics mined from repositories are presented in Sub-section 2.3.1. Furthermore, to model the information mined from repositories, data must be linked; a full description of traceability is given in Sub-section 2.3.2.

2.3.1 Metrics from repositories

The metrics mined from software repositories are very diverse. Studies focus on quality, developers, activities, and other categories. Quality metrics focus on topics such as defects, vulnerabilities, bugs, evolution, anti-patterns, and merges [48,102–107]. Some measures used in this category are bounties [105], code smells [48], and bug density [105]. Developer-focused studies have examined contributions, activity, and productivity [1,108,109]. Developer studies have used metrics such as code owned [109], inequality indexes [108], and developer contribution [1].
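As a minimal sketch of how one such developer metric could be mined, the example below counts commits per author from version control log data. The log snippet is fabricated for illustration; in practice it might be produced by a command such as `git log --pretty=format:%ae`, which prints one author email per commit:

```python
from collections import Counter

# Fabricated stand-in for the output of `git log --pretty=format:%ae`:
# one author email per line, one line per commit.
log_output = """\
alice@example.com
bob@example.com
alice@example.com
alice@example.com
carol@example.com"""

# Count commits per developer: a simple contribution measure.
commits_per_developer = Counter(log_output.splitlines())

for author, count in commits_per_developer.most_common():
    print(author, count)
```

Real studies refine such raw counts, for example by merging author aliases and filtering bot accounts, before treating them as contribution measures.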
Some other metric topics include energy consumption [110], semantic similarities [111], and the number of daily stars [112].

Furthermore, these metrics are extracted from a variety of software repository types. Version control systems, such as Git, CVS, and Subversion, are widely used software repositories [1,48,102,104–106,108–113]. Issue tracking systems are also used, including Jira, Bugzilla, and Google Code [1,102–105,108]. Finally, other repositories have also been used to extract measures, including release blogs, vulnerability databases, mailing lists, and wikis [1,105,111].

2.3.2 Software traceability

Even though it is typical for MSR studies to focus on only one repository, using and linking data between repositories can improve the quality of the data and provide a more complete view to practitioners [92]. The activity of establishing links between and within software artifacts is called traceability [114]. Traceability can focus on tracing requirement artifacts (i.e., requirement traceability), software engineering artifacts (i.e., software traceability), and software artifacts with system-level components (i.e., system traceability) [115].

[Figure 2.6: Relationship between trace artifacts and trace links [116] — a trace link connects a source artifact to a target artifact, with a primary trace link direction and a reverse trace link direction.]

The building blocks of traceability are the trace artifacts and the trace links. Trace artifacts are the artifacts being traced in a project and can be classified as either source artifacts or target artifacts. Source artifacts are the origin of the trace, while target artifacts are the destination of the trace. Trace links are directional associations between a pair of artifacts. The direction of the association is called the primary trace link direction, and the opposite direction is denominated the reverse trace link direction. As trace links can be associated in both directions, they are bidirectional.
The relationship between the trace artifacts and the trace links is shown in Fig. 2.6. Based on these definitions, a trace can either mean the triplet of source artifact, trace link, and target artifact, or the act of following a trace link. Furthermore, traces can be atomic or chained. Atomic traces have only one source artifact and one target artifact, while a chained trace is a chain of linked traces. Traceability is the potential of establishing and using traces [116].

The traceability process model, shown in Fig. 2.7, abstractly defines the activities to establish and use traceability. These activities are traceability strategy, creation, use, and maintenance. First, the traceability strategy is planned and managed: stakeholder and system traceability requirements are determined, designed, and implemented. Then, traceability links between artifacts are created either manually, automatically, or semi-automatically. The links can be created in real time (trace capture) or later (trace recovery). Updating and creating traceability links to keep the traceability information current is performed in the traceability maintenance step. The maintenance can be either continuous, immediately after changes to the artifacts, or on-demand, when it is requested. Finally, these traceability links are used both short-term and long-term. Some uses are requirement validation, impact analysis, verification, validation, and change management [116,117].

[Figure 2.7: Traceability process model [117] — planning and managing the traceability strategy leads to trace creation, maintenance, and use, with feedback loops between the activities, from the point traceability is required until the project is archived and traces are retired.]

The activities required to create or use traces are called tracing. Tracing can be: manual, established by a human; automatic, established using tools or techniques; or semi-automatic, established using both manual and automatic tracing [116]. Manual trace links are frequently incomplete, inaccurate, and untrustworthy [118–120]. Even with the ubiquity of software repositories, developers often forget or fail to link the artifacts [118]. Hence, automatic approaches to creating and maintaining software links have been proposed and developed, including heuristics, information retrieval, machine learning, and artificial intelligence [121].

2.4 Design science

Design science is a methodology that studies artifacts within a context. The artifacts interact with the context to improve something in the context. The full approach of design science is shown in Fig. 2.8 [63].

There are two main parts of design science: design and investigation. Design problems require a change in the real world, so one of many possible solutions is designed to achieve a goal. Such problems can also be represented as technical research problems or technical research questions. Knowledge questions ask about the world as it is, assuming there is only one answer to the question. Furthermore, the problem context of an artifact can be extended to contain the social and knowledge context of the artifact.

[Figure 2.8: Summary of the design science approach [63] — Part I, the framework for design science, distinguishes design problems (improve a problem context by (re)designing an artifact that satisfies some requirements, in order to help stakeholders achieve some goals) from knowledge questions (descriptive questions: what, when, where, who, how many, how often; explanatory questions: why, causes, mechanisms, reasons). Part II, the design cycle, comprises problem investigation, treatment design, and treatment validation. Part III covers theories: conceptual frameworks and theoretical generalizations, which improve our capability to describe, explain, predict, and design. Part IV, the empirical cycle, comprises problem analysis, research setup design, inference design (descriptive, statistical, abductive, and analogical inferences), validation of inferences against the research setup, research execution, and data analysis.]

The design science problem iterates over two problem-solving cycles: the design cycle and the empirical cycle. Design problems are treated by the design cycle, and knowledge questions can be answered in the empirical cycle. The design cycle iterates over problem investigation, treatment design, and treatment validation and is part of the engineering cycle, but it is restricted to the first three phases of the engineering cycle. The design cycle's outcome is a validated artifact design, not an implementation. The empirical cycle analyzes the problem, designs the research and inference, validates them, executes the research, and analyzes the data. This thesis utilizes different empirical software engineering methodologies, explained in Section 2.5.
2.5 Empirical software engineering

As software engineering is a human-centered discipline, empiricism allows us to gather evidence of socio-technical phenomena by observing real-world projects [62,122,123]. There exists a wide variety of research methods, such as laboratory experiments, surveys, and field studies, with distinct research objectives and focuses [124,125]. Empirical standards for the software engineering field have also been developed to improve the transparency and quality of peer-reviewed studies [126].

Research can be classified depending on the source of information from which the data was generated. Primary studies gather information from primary data sources [62], that is, by observing the studied phenomena directly; examples of such approaches include experiments and case studies. Meanwhile, when the source of information is other research works, the study is considered a secondary study [64]; examples include systematic literature reviews and mapping studies. Finally, if the source of information is other secondary studies, the work is considered a tertiary study [127]. In software engineering, tertiary studies tend to follow the same guidelines as secondary studies. Still, such approaches face their own threats to validity, such as double counting [128].

Studies can also gather quantitative or qualitative data [62]. Quantitative data is a numerical representation of phenomena, for example, counting the number of commits created by developers; this information is thus usually analyzed with statistical techniques. Meanwhile, qualitative data is non-numerical information; for example, in a survey, we can gather the perceptions of developers utilizing a tool. Such information is analyzed with qualitative research synthesis approaches such as narrative synthesis, thematic synthesis, and grounded theory [129].
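The quantitative example above, counting commits per developer, can be sketched as follows; the log text is a hypothetical stand-in for the output of `git log --format=%an` on a real repository:

```python
from collections import Counter

# Hypothetical output of `git log --format=%an`: one author name per commit.
log_output = """\
Alice
Bob
Alice
Alice
Carol
Bob
"""

# Count commits per author across the whole history.
commits_per_author = Counter(log_output.splitlines())
print(commits_per_author.most_common())
# [('Alice', 3), ('Bob', 2), ('Carol', 1)]
```

In practice, author names would need identity merging (the same developer may commit under several names or emails), one of the data-accuracy threats discussed later in Chapter 3.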
This research uses two principal empirical software engineering methodologies for the empirical cycles. First, we utilize mapping studies, a type of secondary research, to gather information about the research area (Section 2.5.1) in Chapter 3. Furthermore, case studies (Section 2.5.2) and controlled experiments (Section 2.5.3) are utilized in our evaluations of Chapter 5. Surveys are also used to gather data for our evaluations (Section 2.5.4).

2.5.1 Systematic mapping studies

Systematic mapping is a type of secondary study (i.e., a study that analyzes previous research) that provides an overview of a broad topic area, identifying and classifying all the research in that area [130], thus focusing on answering more general questions [131]. The main goal of mapping studies is to provide an overview of the research area by identifying the quantity and type of research in a field. The mapping study process, shown in Fig. 2.9, has five steps, each with an outcome [65].

Figure 2.9: Systematic mapping process [65]

Definition of research questions. First, the research questions are defined, establishing the research scope. Often, mapping studies ask questions related to frequencies over time or publication forums.

Conduct search. In this step, primary studies are gathered. Strategies to search for primary studies include snowballing, manual search, and database search. Additionally, strategies to develop the search include defining the terms based on the PICO (Population, Intervention, Comparison, and Outcome) model, deriving keywords from known papers, and consulting librarians or experts. The search can be evaluated using a test set of known papers, expert evaluation, checking authors' websites, and utilizing test-retest [132].
It is a good strategy to utilize a quasi-gold standard of manually selected papers to tune the performance of the search string [133].

Screening of papers. Papers are included or excluded based on criteria to remove studies that do not answer the research questions. Strategies for deciding whether to include or exclude papers comprise decision rules, resolving disagreements among multiple researchers, and identifying objective criteria so the evaluation stays objective [132].

Keywording using abstracts. The classification scheme is defined to map the studies. A technique to create the classification scheme is keywording, which has two main steps. First, keywords and concepts are gathered by reviewers reading abstracts, though introductions and conclusions may also be used. Then, the keywords are combined to gather a high-level understanding of the research and help identify representative categories. Keywording is one of many ways to analyze data from literature reviews or mapping studies [129].

Data extraction and mapping process. Finally, the data is extracted and classified based on the scheme, though the scheme can further evolve during the extraction. This produces the results of the systematic mapping study.

2.5.2 Case studies

Case studies are a research methodology that studies contemporary phenomena that are hard to study in isolation. The case study process, shown in Fig. 2.10, has five major steps [134,135].

Case study design. First, the reasons for studying the case are defined. Based on this, the objective (what we expect to achieve) is established and further refined into research questions, propositions, and hypotheses. Furthermore, the case and the units of analysis are selected. A case is any contemporary software engineering phenomenon in its real-life setting (e.g., software projects, individuals, processes, or technologies). The case can then be further composed of subunits (i.e., units of analysis).
Both the case and units of analysis are selected intentionally to find typical, critical, revelatory, or unique information. To refine the context, the theoretical frame of reference or related work is defined. The general decisions on how the data is collected are also defined in the design, taking into account the data source, quantity, and type. Finally, where the data will be collected is selected, ensuring there is enough coverage to enhance the validity and reliability of the findings.

Preparation for data collection. Which data will be collected, and how, is defined. Data can be collected directly from the source with the interaction of the researcher (first degree), indirectly where the researcher does not interact with the source (second degree), or independently from artifacts that are already available (third degree). Furthermore, using multiple data sources is advantageous to limit the effects of analyzing only one data source (i.e., triangulation). Methods to collect data include interviews, focus groups, observations, archival data, and metrics.

Collecting evidence. In this step, the previously defined procedures and protocols are executed on the studied case, gathering the results.

Analysis of collected data. With the information gathered, the data is analyzed to understand what happened and to seek patterns within the data. Data can be analyzed both qualitatively and quantitatively. Qualitative data analysis uses techniques such as hypothesis generation and hypothesis confirmation on non-numerical data. Meanwhile, quantitative data analysis focuses on working with numbers, including techniques such as descriptive statistics, correlation analysis, predictive models, and hypothesis testing.

Reporting. Lastly, the findings are reported, tailoring the report to the audience.
In the case of research, the case study report should introduce the work, describe related work, detail the case study design, state the results with their analysis, and present the conclusions.

2.5.3 Controlled experiments

Experiments or controlled experiments in software engineering are empirical studies in which variables are varied as part of the studied setting based on randomized treatments [62]. Quasi-experiments are similar to experiments, yet they do not fully randomize treatments. Experiments are usually done in laboratory-type settings to better control the treatments and variables. There are five main steps in the experimental process, presented in Fig. 2.11.

Scoping. In the first activity, the goal, objectives, and hypothesis of the experiment must be defined clearly.

Planning. Based on the goal of the study, the foundations of the experiment design are defined. This includes defining the context, variables, subjects, and instrumentation. The previously defined hypothesis is refined. Additionally, depending on the objective and resources, the design type is chosen between an experiment and a quasi-experiment. Finally, threats to validity must be analyzed and then mitigated, reduced, or reported.

Operation. Based on the design, the experiment is put into operation. The subjects are prepared if needed, the design is executed, and the data is collected while ensuring its validity.

Analysis & interpretation. With the experimental results, we can now analyze and interpret the data. The same data analysis techniques as in case studies can be used, though experiments tend to gather more quantitative data.

Presentation & package. Finally, the experimental results are reported. This step is similar to the last step of case studies.

2.5.4 Surveys

Surveys are a research technique to gather information from or about people [62,136].
Surveys can be used as a primary source of information for a study or to supplement other empirical software engineering strategies. Surveys intend to understand phenomena based on a sample of a population in order to generalize findings. There are two main types of data collection for surveys: questionnaires and interviews. In the first type, questionnaires, respondents answer physical or digital questions from forms, which are given back as evidence. In the second type, interviews, questions are asked by interviewers and answered by respondents; the answers are recorded to be later transcribed for analysis.

Figure 2.10: Case study process adapted from [134,135]

Figure 2.11: Experiment process [62]

There are three different types of surveys: descriptive, explanatory, and exploratory. Descriptive surveys focus on understanding the distribution of the population to enable assertions. Explanatory surveys focus on exploring certain claims in populations, for example, understanding why certain phenomena become present. Lastly, exploratory surveys are used before the main study execution to test the instruments and refine the study design.

Surveys, as with any empirical methodology, need to be designed [137]. Though survey instruments can be reutilized, this is rare in software engineering [138]. Hence, survey instruments must be valid (i.e., measure what they set out to measure) and reliable (i.e., produce reproducible data) [139]. Sampling considerations to gather representative subsets of a population are vital while conducting surveys [140]. Surveys can be analyzed with techniques similar to those of case studies and experiments, though the type of analysis depends on the data [141].
Some common pitfalls and considerations exist, such as validating the correctness or completeness of the survey, partitioning data into segments based on demographics, and transforming scales.

Chapter 3
Characterizing developer contribution research in software engineering

This chapter presents a summary of the design and results of the systematic mapping study that characterizes how software development contributions are researched in software engineering. The specific objective tackled is:

SO1. Characterize software engineering research of developer contributions.

The work thus provided a synthesis of the state of the art. To achieve this aim, we conducted a systematic mapping study [64,65] that characterizes developer contribution research in software engineering. This work therefore complements prior primary works that characterize the types of contributions using practitioner guidelines and information. Understanding the literature can help standardize terms, consolidate findings, and identify gaps for future work. At the same time, the approaches and measures helped inspire the tool design (Chapter 4). Furthermore, the empirical design approaches and challenges motivated the empirical evaluations (Chapter 5).

The work of this chapter is based on the following paper, published as part of this thesis [142]. In Section 3.1 we present the design of the study. Then, in Section 3.2, a synthesis of the main results is provided. Finally, in Section 3.3 we summarize and discuss the main findings. The full work is in Appendix A.

3.1 Study design

An overview of our approach is shown in Fig. 3.1. The process had four main steps: definition of the research questions, conducting the search, screening papers, and data extraction and analysis.

Figure 3.1: Mapping study process

We started the process by defining eight research questions. They are shown with their motivation in Table 3.1.
Thus, we classify the contribution types, research topics, research design practices, measurement constructs, assessment approaches, contexts under study, threats to validity, and challenges.

Table 3.1: Mapping study research questions with their motivation

RQ1. What types of developer contributions have been investigated in software engineering studies? (Motivation: to construct a classification of the types of contributions in software engineering research.)
RQ2. What topics are addressed in software contribution works? (Motivation: to discover the areas in which developer contributions have been researched.)
RQ3. What are the research design practices in developer contribution studies? (Motivation: to discover the design of the studies.)
RQ4. What measures are detailed in developer contribution research? (Motivation: to collect which measures studies have proposed and employed.)
RQ5. What are the assessment approaches presented in developer contribution research? (Motivation: to find the assessment approaches used to assess developer contributions.)
RQ6. What contexts are investigated in developer contribution studies in software engineering? (Motivation: to assemble the research settings of the works, summarizing the usage scenarios of the research.)
RQ7. What threats to validity are described in the software engineering literature? (Motivation: to gather the reported threats to validity of the works to serve as a checklist in the design of future studies.)
RQ8. What contribution assessment challenges are reported by the researchers? (Motivation: to assemble the prevalent challenges of assessing developer contributions indicated by the literature.)

Then, we searched digital libraries to gather the set of potentially relevant papers to achieve our goal and answer our questions. This required specifying two main components: digital databases and search strings. We gathered the potentially relevant papers utilizing three digital databases: Scopus, IEEE Xplore, and Web of Science. They were chosen as they have been used previously in software engineering secondary studies and provide deterministic results; thus, the selected databases are in line with previous work [143]. For the query construction, we utilized a set of 11 control papers that acted as a quasi-gold standard to validate our search string [133]. The resulting search query was: (“software engineering” OR “software development” OR “software system” OR “software project”) AND contribution AND (assess* OR evaluat* OR measur* OR examin*) AND (developer* OR team* OR student*). We added terms that specified the subject of the papers (e.g., contribution), delimited the context (e.g., software engineering), and were adaptable to different terms (e.g., student, team), inspired by common words found in our control studies. The final query hence retrieved all the control papers, achieving a sensitivity of 100%. This surpassed the 80% threshold, so the query had acceptable performance [133]. The search resulted in 1,112 potentially relevant papers; after removing duplicates, 828 distinct potentially relevant studies remained.

Papers were then screened to synthesize only the studies relevant to our work. First, we collaboratively and iteratively defined a set of inclusion and exclusion criteria as a basis for our screening, shown in Table 3.2. Then, works were screened in a two-phase process. In the first phase, the two first authors of the study screened works based on the title, abstract, and keywords, resulting in the selection of 287 studies; in case of any doubt about the relevance of a work, it was included to be checked in the full-text screening. Subsequently, a full-text screening was conducted, selecting 166 relevant papers. The final precision of the string was 20%, hence the performance of the string was within acceptable ranges as indicated by software engineering guidelines [133].
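The search-performance figures reported above can be reproduced with a small sketch (the function names are ours):

```python
def sensitivity(control_found, control_total):
    """Share of the quasi-gold-standard papers retrieved by the search string."""
    return control_found / control_total

def precision(relevant, retrieved):
    """Share of retrieved papers that survived screening as relevant."""
    return relevant / retrieved

# Figures from this study: all 11 control papers were retrieved,
# and 166 of the 828 distinct retrieved papers were relevant.
print(f"sensitivity: {sensitivity(11, 11):.0%}")  # sensitivity: 100%
print(f"precision:   {precision(166, 828):.0%}")  # precision:   20%
```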
Validation of the screening showed high inter-coder agreement (90.1%), with acceptable inter-rater performance (Cohen's kappa of k = 0.794 with p < 0.001, and Krippendorff's alpha of α = 0.793).

Table 3.2: The inclusion (I) and exclusion (E) criteria

E1 (Exclusion): Not written in English.
E2 (Exclusion): Unrelated to software engineering.
E3 (Exclusion): Without full text available.
E4 (Exclusion): Not a primary study.
I1 (Inclusion): Studies software developer contributions.
I2 (Inclusion): Assesses software developer contributions.
I3 (Inclusion): Investigates software engineering projects.

Finally, the extraction and analysis phase was carried out. First, data extraction fields, shown in Table 3.3, were defined to answer each research question. The number of themes, codes, and occurrences of the codes is shown for each data extraction item. Then, the selected relevant papers' information was synthesized through thematic analysis [129,144], also inspired by the keywording approach [65]. We constructed the categories collaboratively with an iterative integrated approach that combined deductive codes from previous works [17,20,21,23,24,62,65,82,92,99,100,145–153] and inductive codes that emerged from the data. We therefore extracted the data, created codes, translated codes into classifications, and, when appropriate, created themes for the data. Works could be classified into multiple categories for the same data extraction item; for example, a study could consider contributions to both the code and the communication in a project. Additionally, we gathered general information from the studies and other data from online sources such as GitHub or the tools themselves. Lastly, for each study, quality assessment criteria were evaluated to quantify the level of detail of the reports, describing whether they had enough information to answer our research questions. Scores ranged between 0 and 10. The criteria are shown in Table 3.4.
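The chance-corrected agreement statistic used to validate the screening above, Cohen's kappa for two raters, can be computed with a minimal sketch (the function name and example labels are ours):

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(labels_a)
    # Observed agreement: share of items both raters labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence of the raters' label distributions.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Hypothetical include (1) / exclude (0) decisions by two screeners.
rater_a = [1, 1, 0, 0, 1, 0]
rater_b = [1, 1, 0, 0, 0, 0]
print(round(cohens_kappa(rater_a, rater_b), 3))  # prints 0.667
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement; the study's value of 0.794 is commonly read as substantial agreement.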
Table 3.3: Data extraction fields with their dimensions

Question | Data extraction field | Themes | Codes | Occurrences
RQ1 | Type of contribution | 4 | 11 | 210
RQ1 | Development activity | 3 | 9 | 245
RQ2 | Topics | 4 | 35 | 232
RQ3 | Research contribution | - | 5 | 252
RQ3 | Research method | - | 4 | 195
RQ3 | Research type | - | 5 | 168
RQ3 | Analysis type | - | 2 | 177
RQ3 | Analysis techniques | - | 5 | 327
RQ4 | Measures | 12 | 295 | 1391
RQ5 | Extraction techniques | 2 | 9 | 292
RQ5 | Software repositories | 9 | 26 | 197
RQ5 | Software artifacts | 9 | 23 | 470
RQ5 | Tools | - | 79 | 133
RQ5 | Datasets | - | 53 | 84
RQ6 | Context | - | 6 | 185
RQ6 | Project | - | 9127 | 11838
RQ7 | Threats to validity | 3 + 4 | 64 | 943
RQ8 | Challenges | 3 | 23 | 342
Total | | 53 | 9781 | 17681
Total without projects | | 53 | 654 | 5843

Table 3.4: Study quality assessment criteria

Q1: Does the study explicitly mention the goal? (1 = mentions the goal; 0 = there are no goals)
Q2: Does the study apply the assessment? (2 = applied in at least two projects; 1 = applied in one project; 0 = not applied in any project)
Q3: Does the study describe how the contributions are assessed? (2 = describes the assessment explicitly; 1 = describes the assessment implicitly; 0 = no description of the assessment)
Q4: Does the study describe how the contribution assessment data was extracted? (2 = describes the data extraction explicitly; 1 = describes the data extraction implicitly; 0 = the data extraction is not described)
Q5: Does the study analyze the assessed contribution data? (1 = the data is analyzed; 0 = the data is not analyzed)
Q6: Does the study explicitly describe the threats to validity? (2 = describes threats to validity explicitly; 1 = describes threats to validity implicitly; 0 = there are no described threats to validity)

3.2 Results and discussion

As for the results, the oldest publication was from 1989 and the most recent was from 2021 (the year in which the query was executed). Since 2008, at least one study assessing developer contributions has been published every year.
Additionally, more works were published as conference papers (122 studies) than as journal articles (44 studies). Though we found 105 distinct publication venues, only 26 venues had at least two studies. The average quality score of the studies was 7.2; hence, the studies were, in general, comprehensive reports. For the following results, it is important to note that studies could be classified into multiple categories. For example, studies can consider multiple types of contribution or utilize multiple approaches.

Starting with the types of contributions (RQ1), we found four main types studied in the literature. First, contributions can be related to the development of the software products (130 studies), such as code contributions (111 studies), issues (16 studies), and tasks (2 studies). Another common type of contribution investigated is involvement within the community (23 studies), contributing through communication (23 studies) and providing attention (1 study). Contributions that support software projects were rare (6 studies); these focused on documentation for the project (4 studies) and administration (3 studies). Furthermore, some publications were ambiguous in their definition of a contribution (26 studies). A total of 11 different types of contributions were identified. The most studied types of contributions were related to the code (111 studies); the least investigated were related to design (2 studies) and attention (1 study).

The topics of the works were focused on four themes (RQ2). The most studied area aimed to comprehend software development phenomena (107 studies), for example, understanding contribution inequality between team members (4 studies). These comprehension works focused on people (53 studies), artifacts (50 studies), or systems (28 studies). This was followed by works that focused on training and teaching students (34 studies).
Then, another topic of study focused on constructing models and artifacts (29 studies). Finally, the least studied topic was proposing contribution assessment approaches (22 studies).

For the research design (RQ3), we classified the research contribution, research method, research type, analysis type, and analysis techniques. The most common research contribution was models (91 studies), and the most common research type was validation research (87 studies). Studies mostly analyze quantitative data (105 studies), and statistical analysis techniques are very common (131 studies).

We identified 297 distinct measures mentioned in 161 works, which we classified into 12 construct types (RQ4). The most utilized type of measure relates to repository activities (120 studies); in particular, the number of commits has been the most used measure (54 studies). This is followed by demographic information (73 studies), for example, the number of contributors to a project (23 studies). Meanwhile, performance (64 studies) and quality (57 studies) metrics were also frequent. The least common measured constructs are those related to significance (14 studies) and purpose (6 studies). Additionally, we also classified the data type (quantitative or qualitative) and extraction type (objective or subjective) of each mentioned measure. Almost all measured constructs relate to quantitative, objective data.

Regarding the assessment approaches (RQ5), we gathered the extraction techniques, software repositories, software artifacts, tools, and datasets. Publications have mostly focused on utilizing mining software repository techniques (131 studies), notably by mining the metadata of the projects (121 studies), for example, gathering the number of lines of code changed in a commit. Code repositories are the most prominent type of repository used (110 studies); as such, code artifacts are also frequently used (124 studies).
Still, human perception is also widely used to assess contributions (45 studies), for example, through experts, peers, or self-evaluation of the contributions. We found no tools or datasets specifically proposed for contribution assessment that were used in more than three works.

As for the context (RQ6), more than half of the works were investigated within the open-source software domain (109 studies). Student education (40 studies) and practitioner-industry (20 studies) projects were also mentioned as domains. We additionally gathered the specific open-source projects mentioned and mined popularity and activity information from their repositories. We found that Java (39 studies) is the most investigated language and that Apache owns the most projects (26 studies). Additionally, few studied projects had fewer than 100 commits (7 studies).

We also gathered the threats to validity reported in the studies (RQ7). Threats to validity were found in most works (144 studies). These were classified as internal (118 studies), construct (117 studies), external (98 studies), or conclusion (73 studies) threats to validity. We found 64 different threats to validity in the studies. The most mentioned threat was generalization across selection or settings (83 studies). This was followed by the threat of mono or unbalanced operationalization (78 studies), where there may be issues due to not correctly representing a construct; notably, some studies mention this threat in their contribution assessment. Another widely mentioned threat was related to inaccurate data (48 studies), for example, issues with user identities or manipulated commits.

Finally, regarding the challenges of assessing contributions (RQ8), 23 were mentioned across 72 papers and grouped into three categories. The most mentioned challenge relates to how usable the approaches are (57 studies).
Examples of such pain points include the personalization of approaches (28 studies) and techniques being simple to use (28 studies). The effectiveness of the works was also commonly mentioned (47 studies); the most reported issue within this category relates to how comprehensive the approaches are (37 studies). The least mentioned theme was the dependability of the results (45 studies); its most mentioned pain points relate to the fairness of the results (24 studies) and encouragement for adopters (24 studies).

3.3 Summary

In this chapter, we characterized the research on developer contributions in software engineering with a mapping study. The results show that the notion of a contribution in software engineering is broad. Yet, we need to provide more usable, effective, and dependable assessments of contributions. More focus should be given to considering more than code contribution activity measures. Work is also needed to support the reusability and replicability of results through tools and datasets. Another opportunity is to evaluate the approaches with adopters.

The findings indicate that software contribution measurement tools are needed, as assessment approaches need to be usable, effective, and dependable. Despite the large amount of work, fewer than 30 studies proposed tools, and none have been extensively utilized in research. Additionally, we have to consider more than just the number of commits as a software engineering contribution, including its quality and value. For the code, we can also consider the relationship of these contributions with others through traces. Finally, implementing and evaluating the approaches in practice is also needed.

In this thesis, we help tackle these issues by proposing a measurement procedure model that considers multiple dimensions of a software contribution. With such procedures, we can improve the comprehensiveness of the operationalization of software contributions.
Additionally, we implemented the procedure in a tool, aiding adoption. The classifications and insights gained from this study inspired both the measurement procedure design (Chapter 4) and the empirical evaluations (Chapter 5).

Chapter 4
Developing the automated code contribution measurement procedure

In this chapter, we detail the design and development of an automated software measurement procedure to collect developer code contributions by mining software repositories. The following specific objective is addressed:

SO2. Design an automated procedure to measure developer contributions by mining software repositories.

This measurement procedure helped automatically measure developer contributions. To achieve this aim, we designed a measurement procedure inspired by the measurement context model [66]. This model clarifies the distinct steps for designing measurement methods, specifically for software engineering. Additionally, we instantiated the procedure by developing a tool for code contributions following iterative development [23,24]. This chapter is based on the paper published as part of this thesis [154]. We detail the measurement procedure design in Section 4.1. The development of the tool is described in Section 4.2. We finalize by summarizing the contributions of this chapter in Section 4.3. The full work is in Appendix G.

4.1 Measurement procedure design

Our measurement procedure, shown in Fig. 4.1, has three main steps: design, application, and exploitation. In the following, we describe each step.

The first phase of the measurement procedure is the design, where the measurement method is proposed. First, we defined the goals and measures of the measurement procedure with a generic Goal Question Metric model [19]. The goal of the measurement procedure is to automatically measure developer contributions by mining software repositories.
Based on this, we proposed six different questions based on different dimensions of the contributions with four types of questio