Three Components for Reviewing a Pull Request

Overview

Mechanics
Social Aspects
Technical Aspects

Thomas J. Fan

This talk on Github: thomasjpfan/data-umbrella-2021-reviewing-prs

Glossary

PR: Pull request
Reviewer: Person reviewing the pull request
Contributor: Person who create the pull request
Merged: Pull request is included into the main branch

Why code review? 🔎 💻

Correctness
Maintainability
Knowledge sharing

1. Mechanics

Mechanics of Reviewing a Pull Request on GitHub 🛠

Demo

Browsing code in GitHub

t: Activate the file finder
l: Jump to a line in your code
b: Open blame view

GitHub CLI

GitHub On The Command Line

gh pr checkout 20165

Demo

Overview of Mechanics

Markdown with some extras
Use suggestions when possible
Check and Navigate the CI
GitHub Keyboard shortcuts for browsing code

Collaboration
Conversational
Contributor In the Loop

Ask, Don’t Tell

✅

This would cause a performance regression. We can cache the results and use it again for sequential calls. What do you think?

❌

Never do this. It is very slow.

Offer Alternative Implementations

✅

This leads to a memory regression. May we try using batched calls to reduce the memory overhead? Something like this:

Insert code snippet

❌

This leads to a memory regression.

Explain why code should be changed

✅

We have similar logic in multiple places in this PR. Can we refactor this into its own function to be more DRY?

❌

Please refactor this into its own function.

Avoid using derogatory terms

✅

We can improve this function by adding more input validation and tests on the following edge cases:

List edge cases

❌

This whole function is stupid.

Be humble

Assume everyone is intelligent and well-meaning

✅

I checked the reference and it uses the following equation:

Insert Equation

I do not think this matches our implementation. What do you think?

❌

I have 20 years of experience and this is clearly wrong.

Ask for clarification

✅

This is part of the code is unclear. What was your reasoning behind using a queue here?

❌

This is confusing.

Communicate which ideas you do not feel strongly about

Communicate which ideas you feel strongly about

Try to leave a positive comment

✅

This is a very clever usage of defaultdict!

Thank contributors for their work

✅

Thank you for the PR @thomasjpfan!

Be explicit
Do not be sarcastic
Keep the discussion around the code
Self-directed commentary: “I feel”, and " I think"

Disagreement

Inevitable and good: People are engaged
Keep it positive
When code gets to a certain level of quality, disagreement is more about tradeoffs.
Get more opinions: Ping other contributors that are familiar with the code or the subject

Talk synchronously

If there are too many “I do not understand” or “alternative solutions”
Keep meeting notes and comment on the PR about the discussion

Collaboration
Conversational
Contributor In the Loop

3. Technical aspects

Presentation
Engineering
Testing

Presentation

3. Technical aspects

Does the description describe why the change is being made?

Description matches the contents of the PR
Links to information issues related to PR

Familiarize yourself with the context of the issue and why the Pull request exists.

Do we want this new feature? 🤔

Make sure that the team and the community wants to build the feature
Every line of code we add means more code size and more maintenance

Was unrelated code changed? 🛠

Only include files that are directly related to the change
If you use VSCode:

{
    "editor.formatOnSave": false,
}

Is the PR easy to review?

Difficulty of reviewing a PR scales non-linearly with size
Suggest how to split the PR into smaller PRs

Is the PR easy to review?

Some PRs must be large because they require major design changes

Engineering

3. Technical aspects

Every change should be intentional

Offer ways to simplify or improve code

Was a core component changed?

Needs to be discussed
Regressions?

Does the change maintain backwards compatibility?

Deprecation?

Could the change be more generally applicable elsewhere in the codebase?

Small changes: encourage the author to make the changes
Big changes: open another issue to track it

Does the change use idiomatic or existing functionally in the codebase?

There could be helper functions already implemented

Will the change break something planned in the future?

It may directly contradict planned future work

Performance optimizations

Require Benchmarks

Is there documentation?

User guide and documentation
Audience

Application (Examples)

Understanding (User Guide)

Details (API docs)

Code comments

Subtle changes
Are comments out of date?

Was there a hack?

Needs to be discussed
Code comments

Testing

3. Technical aspects

Are there tests?

Prevents breaking something in the future
Checks that the PR is correct
Bug fixes require non-regression tests
- Failed on main branch
- Does not fail with the PR