Lauryn's Open Source Journey

Join me as I blog about my experience as a self-taught programmer & open source contributor :)


22 December 2021

Learning git-grep

by Lauryn Menard

One common hurdle when starting to contribute to many open-source software projects is navigating a large and complex codebase. I know that when I first looked at the cloned Zulip git repository on my local machine, I simply felt overwhelmed.

Image from 1999 film '10 Things I Hate about You' of Chastity asking Bianca if you can be 'whelmed'

Where do I start?

When new contributors ask where to start on an issue, an oft-repeated answer from experienced Zulip folks is to use git-grep.

Before Zulip, I hadn’t heard of this git command-line tool, and learning to use it has been a huge help in gaining confidence as a contributor and becoming more knowledgeable about Zulip.

Here’s my intro to using git-grep to navigate a large and complex repository like Zulip’s.

The basics

In short, git-grep (run as git grep) is a command built into git that searches the tracked files in a repository’s working directory for a string or regular expression. It’s fast, simple and very useful.
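Because the pattern can be a regular expression and not just a plain string, one search can cover several related names at once. Here’s a small sketch you can try in a throwaway repository; the file and its contents are made up for illustration (they’re not Zulip’s), and -E tells git-grep to use extended regular expressions.

```shell
# Throwaway repository with one hypothetical settings file (not from Zulip).
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
printf 'MAX_MESSAGE_LENGTH = 10000\nMAX_STREAM_NAME_LENGTH = 60\nMAX_TOPIC_NAME_LENGTH = 60\n' > settings.py
git add settings.py

# Plain string: matches only the first line.
git grep MAX_MESSAGE_LENGTH

# Extended regular expression (-E): matches the message and stream name
# settings, but not the topic name one.
git grep -E 'MAX_(MESSAGE|STREAM_NAME)_LENGTH'
```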

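The main benefit of git-grep over plain grep is that it searches only the files tracked by the git repository, skipping untracked and ignored files such as build output or installed dependencies. This usually makes it faster, and it doesn’t return results from unrelated files.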
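To see that difference for yourself, here’s a minimal sketch in a throwaway repository. The node_modules/ directory is a hypothetical stand-in for ignored files like installed dependencies; none of these files come from Zulip.

```shell
# Throwaway repository: one tracked file plus an ignored directory
# (all names here are hypothetical).
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
echo 'max_message_length = 10000' > settings.py
mkdir node_modules
echo 'var max_message_length = 1;' > node_modules/vendored.js
echo 'node_modules/' > .gitignore
git add settings.py .gitignore

# Plain recursive grep walks every file under the directory, so it also
# reports the copy inside the ignored node_modules/ directory.
grep -rl max_message_length .

# git grep only searches files that git tracks, so only settings.py is listed.
git grep -l max_message_length
```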

$ git grep max_message_length
frontend_tests/node_tests/compose_validate.js:    page_params.max_message_length = 10000;
frontend_tests/node_tests/compose_validate.js:    page_params.max_message_length = 10000;
static/js/compose_validate.js:    const max_length = page_params.max_message_length;
static/js/compose_validate.js:    if (compose_state.message_content().length > page_params.max_message_length) {
templates/zerver/api/  and `max_message_length`, and renamed `max_stream_name_length` and
zerver/lib/        state["max_message_length"] = settings.MAX_MESSAGE_LENGTH
zerver/openapi/zulip.yaml:                      max_message_length:
zerver/tests/        "max_message_length",

In the codeblock above, I’ve used git-grep to match max_message_length, which refers to a setting for messages in Zulip.

Each line of the terminal output shows the file path followed by the matching line, with the matched pattern highlighted in bold.

Now, if I want to work on some code that relates to this setting, I have a manageable list of files to start looking at in the repository!

Going beyond the basics

Just using git-grep is worthwhile, but here are a few ways to make your searches more effective and fruitful.

Knowing what you’re looking for

Most of the time, using git-grep on a general project-related word or phrase, for example ‘message’ or ‘stream_id’ in Zulip’s repository, isn’t going to be very helpful.

But when you have something concrete to look for, like a variable (e.g. max_message_length), a function (e.g. do_update_message) or a setting (e.g. mandatory_topics) from the project, you’ll get a good idea of where to start reading and potentially working.
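One extra trick for concrete names: when the name you’re after is also the start of other names, as do_update_message is for do_update_message_flags, the -w (or --word-regexp) option only matches it as a whole word. Here’s a small sketch in a throwaway repository with a hypothetical file (not Zulip’s actual code).

```shell
# Throwaway repository with one hypothetical Python file.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
cat > actions.py <<'EOF'
def do_update_message(message):
    pass

def do_update_message_flags(user, flags):
    pass
EOF
git add actions.py

# Without -w: both function definitions match.
git grep -n do_update_message

# With -w: the match must be a whole word, so the "_flags" function is skipped.
git grep -n -w do_update_message
```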

I’ll admit my first few attempts at using git-grep were not successful. I ended up with a few terminal outputs that were seemingly endless lists of matches in the Zulip codebase.

So, before you start using git-grep, it’s worth the time to do a little orientation about the task or issue you’re interested in via your project’s resources.

If you’re considering contributing to Zulip, I’d recommend:

Case insensitivity

Remember my example above with max_message_length? Here it is again:

$ git grep max_message_length
frontend_tests/node_tests/compose_validate.js:    page_params.max_message_length = 10000;
frontend_tests/node_tests/compose_validate.js:    page_params.max_message_length = 10000;
static/js/compose_validate.js:    const max_length = page_params.max_message_length;
static/js/compose_validate.js:    if (compose_state.message_content().length > page_params.max_message_length) {
templates/zerver/api/  and `max_message_length`, and renamed `max_stream_name_length` and
zerver/lib/        state["max_message_length"] = settings.MAX_MESSAGE_LENGTH
zerver/openapi/zulip.yaml:                      max_message_length:
zerver/tests/        "max_message_length",

If you look closely, you’ll see my call to git grep max_message_length didn’t match the constant MAX_MESSAGE_LENGTH in zerver/lib/ due to the difference in capitalization.

By adjusting my call to include the option for a case-insensitive search (-i, or --ignore-case), the output below now includes the matches for when my pattern is a variable (lowercase) and when it’s a constant (uppercase).

$ git grep -i max_message_length
frontend_tests/node_tests/compose_validate.js:    page_params.max_message_length = 10000;
frontend_tests/node_tests/compose_validate.js:    page_params.max_message_length = 10000;
static/js/compose_validate.js:    const max_length = page_params.max_message_length;
static/js/compose_validate.js:    if (compose_state.message_content().length > page_params.max_message_length) {
templates/zerver/api/  and `max_message_length`, and renamed `max_stream_name_length` and
zerver/lib/    if len(content) > settings.MAX_MESSAGE_LENGTH:
zerver/lib/        state["max_message_length"] = settings.MAX_MESSAGE_LENGTH
zerver/lib/markdown/        MAX_MESSAGE_LENGTH = settings.MAX_MESSAGE_LENGTH
zerver/lib/markdown/        if len(rendering_result.rendered_content) > MAX_MESSAGE_LENGTH * 100:
zerver/lib/markdown/                f"Rendered content exceeds {MAX_MESSAGE_LENGTH * 100} characters (message {logging_message_id})"
zerver/lib/    return truncate_content(body, settings.MAX_MESSAGE_LENGTH, "\n[message truncated]")
zerver/    content: str = models.TextField()  # Length should not exceed MAX_MESSAGE_LENGTH
zerver/openapi/zulip.yaml:                      max_message_length:
zerver/tests/        "max_message_length",
zerver/tests/    @override_settings(MAX_MESSAGE_LENGTH=10)
zerver/tests/        """A rendered message with an ultra-long length (> 100 * MAX_MESSAGE_LENGTH)
zerver/tests/        msg = "mock rendered message\n" * 10 * settings.MAX_MESSAGE_LENGTH
zerver/tests/    @override_settings(MAX_MESSAGE_LENGTH=25)
zerver/tests/        MAX_MESSAGE_LENGTH = settings.MAX_MESSAGE_LENGTH
zerver/tests/        long_message = "A" * (MAX_MESSAGE_LENGTH + 1)
zerver/tests/            sent_message.content, "A" * (MAX_MESSAGE_LENGTH - 20) + "\n[message truncated]"
zproject/ = 10000
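If you’d like to try the difference on something smaller than Zulip, here’s a minimal sketch with two made-up files in a throwaway repository. The file names and contents only mimic the lines above; they aren’t taken from Zulip.

```shell
# Throwaway repository with two hypothetical files.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
printf 'MAX_MESSAGE_LENGTH = 10000\n' > settings.py
printf 'state["max_message_length"] = settings.MAX_MESSAGE_LENGTH\n' > state.py
git add settings.py state.py

# Case-sensitive (the default): only the lowercase key in state.py matches.
git grep -l max_message_length

# Case-insensitive (-i / --ignore-case): the uppercase constant in
# settings.py matches too.
git grep -l -i max_message_length
```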

When less is more

Here are the results I got when I was looking for the function do_update_message recently:

$ git grep do_update_message
analytics/tests/    do_update_message_flags,
analytics/tests/        do_update_message_flags(user1, client, "add", "read", [message])
zerver/lib/    number_changed = do_update_message(
zerver/lib/ do_update_message_flags(
zerver/lib/ do_update_message(
zerver/tests/    do_update_message,
zerver/tests/    do_update_message_flags,
zerver/tests/            lambda: do_update_message(
zerver/tests/            lambda: do_update_message(
zerver/tests/            lambda: do_update_message(
zerver/tests/            lambda: do_update_message(
zerver/tests/            lambda: do_update_message_flags(
zerver/tests/            lambda: do_update_message_flags(
zerver/tests/                lambda: do_update_message_flags(
zerver/tests/    do_update_message,
zerver/tests/        def do_update_message_topic_success(
zerver/tests/            do_update_message(
zerver/tests/        do_update_message_topic_success(
zerver/tests/        do_update_message_topic_success(
zerver/tests/        do_update_message_topic_success(cordelia, message, "Another topic", users_to_be_notified)
zerver/tests/        do_update_message_topic_success(hamlet, message, "Change again", users_to_be_notified)
zerver/tests/                # Since edit history is being generated by do_update_message,
zerver/tests/    do_update_message,
zerver/tests/        do_update_message(
zerver/tests/    do_update_message_flags,
zerver/tests/            do_update_message_flags(
zerver/views/    do_update_message_flags,
zerver/views/    count = do_update_message_flags(user_profile, request_notes.client, operation, flag, messages)

However, what I really needed was a list of the files that contain this function name, and all the repeated matches within the same files made it difficult for me to see that information quickly and clearly.

Luckily, a quick read-through of the docs (by calling git help grep in the terminal) revealed a way to return a simplified view by using: git grep -c (or git grep --count).

$ git grep -c do_update_message

This time, the terminal output gives me a list of 7 files to look at, and each line includes a count of how many lines in that file matched my pattern.
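Here’s a small sketch of what that output looks like, using a throwaway repository with two hypothetical files (not Zulip’s). Each line has the form path:count, where the count is the number of matching lines in that file.

```shell
# Throwaway repository with two hypothetical files.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
mkdir lib tests
printf 'def do_update_message():\n    pass\n' > lib/actions.py
printf 'do_update_message()\ndo_update_message()\ndo_update_message()\n' > tests/test_edit.py
git add lib tests

# One line per matching file, as <path>:<number of matching lines>.
git grep -c do_update_message
# lib/actions.py:1
# tests/test_edit.py:3
```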

And if you don’t want or need the count in your output, then you can use: git grep -l (or git grep --files-with-matches).

$ git grep -l do_update_message
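Because -l prints nothing but file paths, its output is also easy to pipe into other commands. As a minimal sketch, again in a throwaway repository with hypothetical files, here’s how you could count the matching files instead of counting them by eye:

```shell
# Throwaway repository: two hypothetical files that match and one that doesn't.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
mkdir lib tests
printf 'def do_update_message():\n    pass\n' > lib/actions.py
printf 'do_update_message()\ndo_update_message()\n' > tests/test_edit.py
printf 'nothing to see here\n' > README.md
git add lib tests README.md

# Only the paths of files with at least one match (README.md is left out).
git grep -l do_update_message

# Bare paths pipe neatly into other tools, e.g. counting the matching files.
git grep -l do_update_message | wc -l
```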

Limit yourself to what you need

After working on Zulip for a little while, I’ve learned that the frontend implementation of the application’s features is in the static/js directory and that the help center documentation files are in the templates/zerver/help directory.

So, for even more targeted searches, I can add a path to my git-grep calls to see only the matches in those specific directories.

Here’s a template call for specifying a directory path:
git grep <pattern> -- <path>

$ git grep -c default_view -- static/js

In the codeblock above, I’ve limited my search to where default_view appears in the frontend implementation of Zulip’s web application.

$ git grep -l "default view" -- templates/zerver/help

Similarly, the call above lists the help center Markdown files that contain the string “default view”.
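Everything after -- is treated as a pathspec, and you can pass more than one. Here’s a minimal sketch in a throwaway repository whose directory names mimic Zulip’s but whose files are made up. It shows a directory path, a quoted glob (so the shell doesn’t expand it), and git’s ‘:!’ exclude pathspec, which searches everywhere except the given path.

```shell
# Throwaway repository with hypothetical files in Zulip-like directories.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
mkdir -p static/js templates/zerver/help
printf 'const default_view = "recent";\n' > static/js/settings.js
printf 'Change your default view.\n' > templates/zerver/help/default-view.md
printf 'default_view = models.TextField()\n' > models.py
git add static templates models.py

# Only files under static/js are searched.
git grep -l default_view -- static/js

# Two pathspecs at once: static/js plus every Markdown file. The regex
# "default.view" matches both "default_view" and "default view".
git grep -l "default.view" -- static/js '*.md'

# Exclude pathspec: search the whole repository except static/js.
git grep -l default_view -- . ':!static/js'
```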

A little bit of formatting goes a long way

Most of the time, I want to read through code in my preferred IDE. But sometimes I want to format my git-grep search results to get a little glimpse of the code in the terminal.

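Here are some helpful options I’ve found:

- --break prints an empty line between matches from different files.
- --heading shows each file name once, above its matches, instead of at the start of every matching line.
- -n (or --line-number) adds the line number to each matching line.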

Combining these options returns slightly ‘formatted’ output to the terminal. Below is the terminal output with these three git-grep options for the admin setting mandatory_topics, which is used in the Zulip tutorial for adding a new application feature.

$ git grep --break --heading -n mandatory_topics
186:boolean field, `mandatory_topics`, to the Realm model in
196:+    mandatory_topics: bool = models.BooleanField(default=False)
216:+        mandatory_topics=bool,
239:`git add zerver/migrations/`
263:  Applying zerver.NNNN_realm_mandatory_topics... OK
444:+    mandatory_topics: Optional[bool] = REQ(json_validator=check_bool, default=None),
548:+        realm_mandatory_topics: page_params.mandatory_topics,
563:  message, message edit history etc. Our `mandatory_topics` feature
573:in. For example in this case of `mandatory_topics` it will lie in
604:`mandatory_topics`, since this setting only has an effect on the
623:+            mandatory_topics: noop,

242:    page_params.realm_mandatory_topics = true;
329:    page_params.realm_mandatory_topics = false;

22:    realm_mandatory_topics: $t({defaultMessage: "Require topics in stream messages"}),
121:        realm_mandatory_topics: page_params.realm_mandatory_topics,

489:    if (page_params.realm_mandatory_topics) {

208:                mandatory_topics: noop,

151:                  setting_name="realm_mandatory_topics"
153:                  is_checked=realm_mandatory_topics
154:                  label=admin_settings_label.realm_mandatory_topics}}

279:                ("mandatory_topics", models.BooleanField(default=False)),

259:    mandatory_topics: bool = models.BooleanField(default=False)
653:        mandatory_topics=bool,

3612:                                    mandatory_topics:
10478:                      realm_mandatory_topics:

154:        "realm_mandatory_topics",

81:    mandatory_topics: Optional[bool] = REQ(json_validator=check_bool, default=None),
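To see exactly how those three options shape the output, here’s a small sketch with two made-up files in a throwaway repository (not Zulip’s). Each file name appears once as a heading, every match gets its line number, and a blank line separates the files.

```shell
# Throwaway repository with two hypothetical files.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
printf 'realm_mandatory_topics = True\n' > a.py
printf 'x = 1\nif realm_mandatory_topics:\n    pass\n' > b.py
git add a.py b.py

git grep --break --heading -n mandatory_topics
# a.py
# 1:realm_mandatory_topics = True
#
# b.py
# 2:if realm_mandatory_topics:
```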

Summing up

This post only scratches the surface of using git-grep, but hopefully it’s enough to get some people started and curious to learn more.

Definitely check out the official git-grep documentation here.

And here are two blog posts I found useful while writing this post that go into more options for using git-grep than I do here:

Heath Ledger dancing on high school stadium seats in 1999 film '10 Things I Hate about You'

On a non-programming note, the two GIFs included in this post are from the 1999 film ‘10 Things I Hate about You’, a retelling of Shakespeare’s ‘The Taming of the Shrew’ set in an imaginary, uber-rich US high school in the late 90s. It’s both delightful and ridiculous, with a fantastic soundtrack. I highly recommend it.
