Dealing with large changes in Open-Source projects
A few weeks back, I was approached by Asmir about some changes he wanted to make to the Doctrine Migrations library. The changes sounded simple enough: to add support for multiple migration namespaces. This has been on our wishlist for a while, but we’ve never managed to get to it and also didn’t have contributors do the work. Since he has a few customers where this is a repeated pain point, he decided to invest the time to work on this. We talked a little bit about how to approach this, and when I assured him that this is definitely something we want to have in the library, he started working on it.
Since this feature would touch some of the most basic concepts in the library, it became clear relatively soon that it couldn’t just be added in without refactoring a good portion of the code base. This was when he approached me again, saying that it’s going to be a large pull request, because splitting up the work into multiple pull requests wasn’t feasible. One of the reasons for the large number of changes is that the architecture of the library is quite old, with a few core classes handling things they shouldn’t be handling. So before he could add the feature he wanted, Asmir needed to refactor the configuration system, version handling, and some of the classes that track executed migrations.
Now, in an ideal work environment, these changes would be extracted to a separate, smaller pull request for easier reviewing. Once the pull request has been merged, the next changes can be made, again in a smaller pull request, and so on. However, our Open-Source world is far from ideal: maintainers often don’t have the time to quickly review pull requests, so dealing with changes that build on several pull requests can be a tedious, time consuming process. This slows down development and can be a big source of frustration for the contributor.
Together, we decided to not do that, and instead keep changes in a single pull request to avoid unnecessary friction when splitting changes into multiple pull requests, reviewing them, getting them merged, and then repeating that process multiple times until the feature we wanted to build was done. When everything was said and done, we ended up with a single pull request consisting of more than 100 commits and a combined diff of almost 20000 lines.
Dealing with large pull requests is a difficult endeavour in any setting: it takes more time to review them and reviewers tend to get sloppy, sometimes reducing the review to skipping over files, just catching a quick glimpse at the changes without thoroughly reviewing the code and thinking about its impact. The resulting large number of comments in such a pull request can also be very demotivating to a contributor.
In this specific instance, Asmir and I were able to meet up at a conference in Berlin and took some time to go through the changes face to face. Instead of just looking at the code and trying to figure out the reason for some changes, we sat down and went over the architecture without looking at the existing code. Asmir explained the idea behind the new classes and the new architecture, and highlighted the reason for some changes that I didn’t think were necessary. This is something that we often forget when working on Open-Source software: talking to the people you work with is a lot more productive than just writing comments on a pull request. You don’t even need to be in the same place to do this: working via video chat and screen sharing will do just fine and helps resolving issues quickly.
Even then, at some point in time the pull request will have to be reviewed. This is where GitHub shines, as they have introduced features to make large pull requests easier to review. When Asmir gave me the go-ahead to review the code, I started going through the PR from top to bottom. As I got done with a file, I marked it as “viewed”, which collapses the file so I no longer see it. That way, when reviewing the code review over multiple sessions, I don’t have to think about where I left off, but I can just scroll down to the first uncollapsed file and go from there. Using the “Jump to file” box at the top allows you to quickly jump to other files to cross-reference code or double-check something you may already have reviewed. It isn’t optimal, but it makes such large-scale changes to projects easier to handle for both contributors and maintainers.
Looking past this specific example, there are some things both contributors and maintainers can do to make working with large pull requests easier and more enjoyable for everyone involved. Let’s start with the contributors’ perspective.
For one, I wouldn’t suggest taking a project, rewriting large portions of it, and then presenting the maintainers with a large pull request: chances are it will either be closed without merging, or outright being ignored, which is even worse. Instead, create an issue in the project talking about the changes you want to make and discuss it with maintainers and users of the library alike. This will help judge interest in the changes you want to make, while also opening a discussion channel around how to make the changes and to gather feedback.
Discussion doesn’t necessarily have to happen in the open, it can also be done in a closed setting like Slack, Discord, or any other tool the project may be using. Be careful though as different maintainers of a project might have different opinions, so you may want to talk to more than just one maintainer. In the example above, Asmir and I talked through our 1-on-1 chat on the Doctrine Slack, which lead to loss of information when communicating with other maintainers. Thus, I would always recommend to discuss this in an open channel to allow others to join the conversation. If possible, try to have video chats or a face-to-face sessions with one or more maintainers to discuss your changes, explain the architecture, or get general feedback.
When creating your pull request, don’t be afraid to create it while you’re still working on your code. Mark it as a work in progress, and let maintainers know when you want to have feedback. This can be extremely valuable to maintainers to see where the work is heading and to let you know when you’re taking a wrong turn before a lot of work goes to waste.
When you are done with your pull request, do a code review on it yourself. Add comments for any complex or controversial changes, explain complicated sections, or highlight key areas of the code. Besides making it easier for others to understand your changes, this can help you spot errors that you may have missed while writing the code, improving the review experience for yourself and others.
An important part of your pull request are going to be tests. If you make a large number of changes, make sure they are backed up by tests and other code quality tools. This is the best way to avoid future issues, and to ensure that future changes don’t break the functionality you’ve provided. It also gives maintainers a sense of security if they see that you’ve thought about testing your changes. It’s important that you try to cover all new functionality, even if the project previously didn’t have good test coverage. This greatly contributes to the project’s stability.
From a maintainers perspective, it’s important to be available to a contributor that makes such large changes. This means keeping a feedback channel open, checking in on the contributor, and asking them if they need help with anything. Be open to the contributor about the time you’re able to invest in helping them or reviewing code. There’s no worse feeling for a contributor to make large amounts of changes, then waiting for a long time to get the code reviewed.
When reviewing, don’t focus too much on code style. This is true for any pull request, but the larger a pull request is, the more likely people are to engage in bike shedding. You don’t want to leave tens of comments asking contributors to change names of classes or their members, while only leaving a handful of comments on the architecture or algorithms involved. Whenever possible, use the “suggested change” functionality to suggest new code. This allows the contributor to quickly apply review feedback to the code.
When reviewing the pull request over multiple days, consider submitting the review in batches or keep the contributor informed via other means. It’s important that you take your time to review the changes in detail, but a contributor won’t know if you’re actually reviewing their code or if you’re busy doing other things and won’t get to their PR for a while.
Last but not least, keep in mind that a pull request isn’t a gift from a contributor to a project. As a contributor, you are providing the project with your most valuable resource: time. At the same time, you’re asking maintainers to invest their future time to maintain your code, making sure it keeps working, and fixing bugs in the process. If you are contributing large amounts of code to a project, talk about future maintenance. For contributors, this means staying active in the project to help out with any issues involving your code, or even joining the project team. For maintainers, don’t be afraid to ask past contributors for help if you can’t figure out an issue with code they provided.
The suggestions I’ve made above won’t work for every project, every person, or every time. In our day jobs, we have meetings, discussions, a regular feedback about the work we do and problems we face. It is important to remember this when working on Open-Source, because it may not seem obvious due to geographical separation or time constraints. Communication between contributors and maintainers is key in making sure that working on Open-Source is productive and rewarding for everyone involved.