"You must be the change you wish to see in the world." – Mahatma Gandhi

I have been a PhD student at the University of Lille (France) and the University of Mons (Belgium), and a member of the Spirals research team at Inria Lille, since October 2017.
My major research interests are Mining Software Repositories, Process Mining, and Machine Learning applications in Software Engineering.
My supervisors are Laurence Duchien and Tom Mens.

My PhD subject

Distributed open source software development teams use specific processes to support their daily activities, such as coding, testing, bug fixing, issue tracking, code reviewing and continuous integration. Each of these activities relies on dedicated tools and processes that may vary from one development community to another and from one time period to another. A typical example is bug fixing, which is supported by bug tracking tools such as Bugzilla and tends to follow a dedicated bug fixing process.
According to the ISO/IEC 14764:2006 standard on “Software Engineering - Software Life Cycle Processes - Maintenance”, there are four main types of software maintenance: corrective, adaptive, perfective and preventive. Bug fixing falls within the category of corrective maintenance, defined as the “Reactive modification of a software product performed after delivery to correct discovered problems” [1]. Bug fixing is an essential activity to ensure software quality, and it is estimated that 80% of software development effort is spent on software maintenance [2].
My PhD research project consists of empirically studying the evolution of bug fixing processes in open source software and identifying possible inefficiencies in these processes. Based on these insights, we aim to propose recommendation tools that help individual developers and developer communities improve their practices. This paper only focuses on the empirical part of the project, which is a prerequisite for the second part. We will follow a mixed-methods research approach, combining quantitative and qualitative socio-technical analyses of the bug fixing process. These analyses may be used to compare the effectiveness of different developer communities of software ecosystems.
To achieve the above goals, we propose to combine techniques and tools from the field of Software Process Mining with empirical and statistical approaches used in the field of Software Repository Mining. While process mining aims at discovering, analyzing and improving processes, repository mining aims at analyzing the rich historical data available in software repositories (such as version control repositories, bug and issue trackers, mailing lists or Q&A websites) to uncover interesting and actionable information about software projects. By combining techniques and results from both fields, we intend to improve the existing state of the art in research and practice for large-scale distributed open source software development.
  • [1] ISO/IEC/IEEE International Standard for Software Engineering - Software Life Cycle Processes - Maintenance. ISO/IEC 14764:2006 (E) IEEE Std 14764-2006 (Revision of IEEE Std 1219-1998), pages 1–58, Sept 2006.
  • [2] Gregory Tassey. The economic impacts of inadequate infrastructure for software testing. National Institute of Standards and Technology, RTI Project, 7007(011), 2002.

Master's Thesis

Title: Configuration rule mining for supporting variability in business process models.

Objective: Different organizations can share one customizable process model, called a “configurable process model”. Such a model can be configured by (de)selecting (ir)relevant features in order to derive new variants. I’m interested in supporting stakeholders in deriving variants from the configurable process model by means of “configuration rules”. In short, these rules should reflect the best practices that help in selecting the configuration options in the process model. I’m using automated techniques (data mining & process mining) to learn these rules from past experience, using the execution logs of existing variants.

Our Work:

  • Collect event logs and configurable process models.
  • Establish a mapping between the configurable process model and the existing event logs:
    • which activity in the model corresponds to which event in the log, and how the configurations are manifested in the event logs (this step will be performed using process mining techniques, e.g. alignment).
  • Extract the frequently selected configurations and their correlations using data mining techniques (e.g. association rule mining).
  • Implement the approach as a plugin in ProM, an academic open source tool for process mining techniques.
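
To give a rough idea of the rule extraction step, here is a minimal sketch of pairwise association rule mining in plain Python. It is only an illustration under assumptions, not the ProM plugin itself: the function name `mine_pairwise_rules`, the thresholds, and the option names in `variants` are all hypothetical, and each variant is assumed to have already been reduced to the set of configuration options it selected.

```python
from itertools import permutations


def mine_pairwise_rules(configurations, min_support=0.5, min_confidence=0.8):
    """Return rules (antecedent, consequent, support, confidence).

    `configurations` is a list of sets; each set holds the configuration
    options selected in one process variant (hypothetical input format).
    Only rules between two single options are considered.
    """
    n = len(configurations)
    if n == 0:
        return []
    options = sorted(set().union(*configurations))
    rules = []
    for a, b in permutations(options, 2):
        count_a = sum(1 for c in configurations if a in c)
        count_ab = sum(1 for c in configurations if a in c and b in c)
        support = count_ab / n            # share of variants selecting both options
        confidence = count_ab / count_a   # share of variants with `a` that also select `b`
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical configurations derived from the event logs of four variants.
variants = [
    {"credit_check", "manual_approval"},
    {"credit_check", "manual_approval", "notify_customer"},
    {"credit_check", "manual_approval"},
    {"notify_customer"},
]

for antecedent, consequent, support, confidence in mine_pairwise_rules(variants):
    print(f"{antecedent} -> {consequent} "
          f"(support={support:.2f}, confidence={confidence:.2f})")
```

With this toy data, the only rules that pass both thresholds link `credit_check` and `manual_approval` in both directions. A full approach would also consider rules with several options on either side, which is what Apriori-style algorithms are designed for.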