22 December 2021

Learning git-grep

by Lauryn Menard

One common hurdle when starting to contribute to many open-source software projects is navigating a large and complex codebase. I know that when I first looked at the cloned Zulip git repository on my local machine, I simply felt overwhelmed.

Image from the 1999 film '10 Things I Hate about You' of Chastity asking Bianca if you can be 'whelmed'

Where do I start?

When new contributors ask where to start on an issue, experienced Zulip folks often reply with the same advice: use git-grep.

Prior to Zulip, I hadn’t heard of this git command-line tool, and learning to use it has been a huge help in gaining confidence as a contributor and becoming more knowledgeable about Zulip.

Here’s my intro to using git-grep to navigate a large and complex repository like Zulip’s.

The basics

In short, git-grep (run as git grep) is a command built into git that searches the tracked files in a repository’s working tree for a string or regular expression. It’s fast, simple and very useful.

The main benefit of git-grep over plain grep is that it searches only the files tracked by git, skipping untracked files and the contents of the .git directory. On a large repository this makes it faster, and it keeps results from unrelated files out of your output.

lauryn@leeloo2:~/Zulip/zulip
$ git grep max_message_length
frontend_tests/node_tests/compose_validate.js:    page_params.max_message_length = 10000;
frontend_tests/node_tests/compose_validate.js:    page_params.max_message_length = 10000;
static/js/compose_validate.js:    const max_length = page_params.max_message_length;
static/js/compose_validate.js:    if (compose_state.message_content().length > page_params.max_message_length) {
templates/zerver/api/changelog.md:  and `max_message_length`, and renamed `max_stream_name_length` and
zerver/lib/events.py:        state["max_message_length"] = settings.MAX_MESSAGE_LENGTH
zerver/openapi/zulip.yaml:                      max_message_length:
zerver/tests/test_home.py:        "max_message_length",

In the code block above, I’ve used git-grep to match max_message_length, which refers to the setting for the maximum length of a message in Zulip.

Each line of the terminal output shows the file path followed by the line where the pattern was found, with the match itself highlighted in bold.

Now, if I want to work on some code that relates to this setting, I have a manageable list of files to start looking at in the repository!
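If you’d like to check the tracked-files-only behaviour without a Zulip checkout, here is a minimal shell sketch. It builds a throwaway repository in a temporary directory (the file names and contents are made up for illustration), stages one file and leaves another untracked. By default only the staged file should match; git grep’s --untracked option widens the search to untracked files as well.

```shell
#!/bin/sh
# Throwaway demo repo; file names and contents are hypothetical.
demo_repo=$(mktemp -d)
cd "$demo_repo" || exit 1
git init -q

echo 'max_message_length = 10000' > tracked.py
echo 'max_message_length = 42' > scratch.txt
# Staging is enough for git grep to treat the file as tracked; no commit needed.
git add tracked.py

# Default: only tracked files are searched, so scratch.txt is skipped.
git grep max_message_length
# --untracked also searches files git doesn't track (ignored files stay excluded).
git grep --untracked max_message_length
```

When you’re done experimenting, remove the demo directory with rm -rf "$demo_repo".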

Going beyond the basics

Using git-grep on its own is worthwhile, but here are a few ways to make your searches more effective.

Knowing what you’re looking for

Most of the time, using git-grep on a general project-related word or phrase, for example ‘message’ or ‘stream_id’ in Zulip’s repository, returns far too many matches to be very helpful.

But when you have something concrete to look for, like a variable (e.g. max_message_length), function (e.g. do_update_message) or setting (e.g. mandatory_topics) from the project, the results will give you a good idea of where to start reading and potentially working.

I’ll admit my first few attempts at using git-grep were not successful. I ended up with a few terminal outputs that were seemingly endless lists of matches in the Zulip codebase.

So, before you start using git-grep, it’s worth taking the time to orient yourself on the task or issue you’re interested in via your project’s resources.

If you’re considering contributing to Zulip, I’d recommend:

Case insensitivity

Remember my example above with max_message_length? Here it is again:

lauryn@leeloo2:~/Zulip/zulip
$ git grep max_message_length
frontend_tests/node_tests/compose_validate.js:    page_params.max_message_length = 10000;
frontend_tests/node_tests/compose_validate.js:    page_params.max_message_length = 10000;
static/js/compose_validate.js:    const max_length = page_params.max_message_length;
static/js/compose_validate.js:    if (compose_state.message_content().length > page_params.max_message_length) {
templates/zerver/api/changelog.md:  and `max_message_length`, and renamed `max_stream_name_length` and
zerver/lib/events.py:        state["max_message_length"] = settings.MAX_MESSAGE_LENGTH
zerver/openapi/zulip.yaml:                      max_message_length:
zerver/tests/test_home.py:        "max_message_length",

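If you look closely, you’ll see that git grep max_message_length only matched the lowercase variable: the constant MAX_MESSAGE_LENGTH on the same line of zerver/lib/events.py wasn’t matched due to the difference in capitalization, and files that only use the constant, like zproject/default_settings.py, didn’t show up at all.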

Adding the option for a case-insensitive search (-i, or --ignore-case) fixes this: the output below now includes matches both where my pattern appears as a variable (lowercase) and where it appears as a constant (uppercase).

lauryn@leeloo2:~/Zulip/zulip
$ git grep -i max_message_length
frontend_tests/node_tests/compose_validate.js:    page_params.max_message_length = 10000;
frontend_tests/node_tests/compose_validate.js:    page_params.max_message_length = 10000;
static/js/compose_validate.js:    const max_length = page_params.max_message_length;
static/js/compose_validate.js:    if (compose_state.message_content().length > page_params.max_message_length) {
templates/zerver/api/changelog.md:  and `max_message_length`, and renamed `max_stream_name_length` and
zerver/lib/actions.py:    if len(content) > settings.MAX_MESSAGE_LENGTH:
zerver/lib/events.py:        state["max_message_length"] = settings.MAX_MESSAGE_LENGTH
zerver/lib/markdown/__init__.py:        MAX_MESSAGE_LENGTH = settings.MAX_MESSAGE_LENGTH
zerver/lib/markdown/__init__.py:        if len(rendering_result.rendered_content) > MAX_MESSAGE_LENGTH * 100:
zerver/lib/markdown/__init__.py:                f"Rendered content exceeds {MAX_MESSAGE_LENGTH * 100} characters (message {logging_message_id})"
zerver/lib/message.py:    return truncate_content(body, settings.MAX_MESSAGE_LENGTH, "\n[message truncated]")
zerver/models.py:    content: str = models.TextField()  # Length should not exceed MAX_MESSAGE_LENGTH
zerver/openapi/zulip.yaml:                      max_message_length:
zerver/tests/test_home.py:        "max_message_length",
zerver/tests/test_markdown.py:    @override_settings(MAX_MESSAGE_LENGTH=10)
zerver/tests/test_markdown.py:        """A rendered message with an ultra-long length (> 100 * MAX_MESSAGE_LENGTH)
zerver/tests/test_markdown.py:        msg = "mock rendered message\n" * 10 * settings.MAX_MESSAGE_LENGTH
zerver/tests/test_message_send.py:    @override_settings(MAX_MESSAGE_LENGTH=25)
zerver/tests/test_message_send.py:        MAX_MESSAGE_LENGTH = settings.MAX_MESSAGE_LENGTH
zerver/tests/test_message_send.py:        long_message = "A" * (MAX_MESSAGE_LENGTH + 1)
zerver/tests/test_message_send.py:            sent_message.content, "A" * (MAX_MESSAGE_LENGTH - 20) + "\n[message truncated]"
zproject/default_settings.py:MAX_MESSAGE_LENGTH = 10000
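To try the effect of -i in isolation, here is a small sketch on a throwaway repository. The two files and their contents are hypothetical, loosely modelled on the Zulip matches above: one file only contains the uppercase constant, so it should appear only once the search ignores case.

```shell
#!/bin/sh
# Throwaway demo repo; contents are hypothetical, modelled on the Zulip output above.
demo_repo=$(mktemp -d)
cd "$demo_repo" || exit 1
git init -q

echo 'MAX_MESSAGE_LENGTH = 10000' > settings.py
echo 'state["max_message_length"] = settings.MAX_MESSAGE_LENGTH' > events.py
git add settings.py events.py

# Case-sensitive (the default): settings.py only has the uppercase constant, so it's missed.
git grep -l max_message_length
# -i / --ignore-case: both files now match.
git grep -l -i max_message_length
```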

When less is more

Here are the results I got when I was looking for the function do_update_message recently:

lauryn@leeloo2:~/Zulip/zulip
$ git grep do_update_message
analytics/tests/test_counts.py:    do_update_message_flags,
analytics/tests/test_counts.py:        do_update_message_flags(user1, client, "add", "read", [message])
zerver/lib/actions.py:    number_changed = do_update_message(
zerver/lib/actions.py:def do_update_message_flags(
zerver/lib/actions.py:def do_update_message(
zerver/tests/test_events.py:    do_update_message,
zerver/tests/test_events.py:    do_update_message_flags,
zerver/tests/test_events.py:            lambda: do_update_message(
zerver/tests/test_events.py:            lambda: do_update_message(
zerver/tests/test_events.py:            lambda: do_update_message(
zerver/tests/test_events.py:            lambda: do_update_message(
zerver/tests/test_events.py:            lambda: do_update_message_flags(
zerver/tests/test_events.py:            lambda: do_update_message_flags(
zerver/tests/test_events.py:                lambda: do_update_message_flags(
zerver/tests/test_message_edit.py:    do_update_message,
zerver/tests/test_message_edit.py:        def do_update_message_topic_success(
zerver/tests/test_message_edit.py:            do_update_message(
zerver/tests/test_message_edit.py:        do_update_message_topic_success(
zerver/tests/test_message_edit.py:        do_update_message_topic_success(
zerver/tests/test_message_edit.py:        do_update_message_topic_success(cordelia, message, "Another topic", users_to_be_notified)
zerver/tests/test_message_edit.py:        do_update_message_topic_success(hamlet, message, "Change again", users_to_be_notified)
zerver/tests/test_message_edit.py:                # Since edit history is being generated by do_update_message,
zerver/tests/test_message_fetch.py:    do_update_message,
zerver/tests/test_message_fetch.py:        do_update_message(
zerver/tests/test_push_notifications.py:    do_update_message_flags,
zerver/tests/test_push_notifications.py:            do_update_message_flags(
zerver/views/message_flags.py:    do_update_message_flags,
zerver/views/message_flags.py:    count = do_update_message_flags(user_profile, request_notes.client, operation, flag, messages)

However, what I really needed was a list of the files that matched this function name, and the repeated matches in test_events.py and test_message_edit.py made it difficult for me to see that information quickly and clearly.

Luckily, a quick read through the docs (run git help grep in the terminal) turned up a way to get a simplified view: git grep -c (or git grep --count).

lauryn@leeloo2:~/Zulip/zulip
$ git grep -c do_update_message
analytics/tests/test_counts.py:2
zerver/lib/actions.py:3
zerver/tests/test_events.py:9
zerver/tests/test_message_edit.py:8
zerver/tests/test_message_fetch.py:2
zerver/tests/test_push_notifications.py:2
zerver/views/message_flags.py:2

This time, the terminal output above gives me a list of 7 files to look at, and each line includes a count of how many lines in that file matched my pattern.

And if you don’t want or need the counts in your output, you can use git grep -l (or git grep --files-with-matches) to list just the file names.

lauryn@leeloo2:~/Zulip/zulip
$ git grep -l do_update_message
analytics/tests/test_counts.py
zerver/lib/actions.py
zerver/tests/test_events.py
zerver/tests/test_message_edit.py
zerver/tests/test_message_fetch.py
zerver/tests/test_push_notifications.py
zerver/views/message_flags.py
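One detail worth knowing about -c is that it counts matching lines, not individual occurrences, which is why test_events.py shows 9 above (nine matching lines). The sketch below shows this on a throwaway repository with hypothetical contents: actions.py contains the pattern three times, but on only two lines, and a file with no matches is left out entirely.

```shell
#!/bin/sh
# Throwaway demo repo; file names and contents are hypothetical.
demo_repo=$(mktemp -d)
cd "$demo_repo" || exit 1
git init -q

# actions.py has three occurrences of the pattern spread over two lines.
printf 'do_update_message(a); do_update_message(b)\ndo_update_message_flags(c)\n' > actions.py
printf 'nothing to see here\n' > other.py
git add actions.py other.py

# -c / --count: one line per matching file with the number of matching *lines*,
# so actions.py is reported as 2, not 3. other.py has no match and isn't listed.
git grep -c do_update_message
# -l / --files-with-matches: just the names of the matching files.
git grep -l do_update_message
```

Note that the pattern also matches do_update_message_flags, just as it did in the Zulip output, because git grep matches substrings by default.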

Limit yourself to what you need

After working on Zulip for a little while, I’ve learned that the frontend code for the application’s features lives in the static/js directory and that the help center documentation files are in the templates/zerver/help directory.

So, if I want even more targeted git-grep searches, I can add a path to my call so that only the files in a specific directory are searched.

Here’s a template call for specifying a directory path (the -- marks the end of the options and pattern, and everything after it is treated as a path):
git grep <pattern> -- <path>

lauryn@leeloo2:~/Zulip/zulip
$ git grep -c default_view -- static/js
static/js/admin.js:1
static/js/hashchange.js:10
static/js/hotkey.js:2
static/js/info_overlay.js:1
static/js/realm_user_settings_defaults.ts:2
static/js/server_events_dispatch.js:3
static/js/settings.js:1
static/js/settings_config.ts:2
static/js/settings_display.js:1
static/js/user_settings.ts:2

In the code block above, I’ve limited my search for default_view to the frontend implementation of Zulip’s web application.

lauryn@leeloo2:~/Zulip/zulip
$ git grep -l "default view" -- templates/zerver/help
templates/zerver/help/configure-default-view.md
templates/zerver/help/include/sidebar_index.md
templates/zerver/help/keyboard-shortcuts.md

Similarly, the code block above lists the help center Markdown files that match the string “default view”. The quotes around the pattern are needed because it contains a space.
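Here is a small sketch of path-limited searches on a throwaway repository that mimics Zulip’s directory layout; the files themselves are made up. It also uses a quoted glob as the path, which goes slightly beyond the examples above: git pathspecs accept wildcards, and a pattern like '*.md' matches Markdown files in any subdirectory.

```shell
#!/bin/sh
# Throwaway demo repo mimicking Zulip's layout; the files themselves are made up.
demo_repo=$(mktemp -d)
cd "$demo_repo" || exit 1
git init -q

mkdir -p static/js templates/zerver/help
echo 'const default_view = "recent_topics";' > static/js/settings.js
echo 'The default_view setting controls your default view.' > templates/zerver/help/configure-default-view.md
git add static templates

# No path: the whole repository is searched, so both files match.
git grep -l default_view
# Everything after "--" is a pathspec, so only static/js is searched.
git grep -l default_view -- static/js
# Quote a pattern containing spaces, and quote a glob so the shell leaves it for git.
git grep -l "default view" -- '*.md'
```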

A little bit of formatting goes a long way

Most of the time, I want to read through code in my preferred IDE. But sometimes I want to format my git-grep search results to get a little glimpse of the code in the terminal.

Here are some helpful options I’ve found:

--break, which will print an empty line between matches from different files.
--heading, which will show the file name on the line above its matches instead of at the start of each line.
-n or --line-number, which will prefix each matching line with its line number in the file.

Combining these options returns slightly ‘formatted’ output to the terminal. Below you can see the terminal output with these three git-grep options for the admin setting mandatory_topics, which is used in the Zulip tutorial for adding a new application feature.

lauryn@leeloo2:~/Zulip/zulip
$ git grep --break --heading -n mandatory_topics
docs/tutorials/new-feature-tutorial.md
186:boolean field, `mandatory_topics`, to the Realm model in
196:+    mandatory_topics: bool = models.BooleanField(default=False)
216:+        mandatory_topics=bool,
239:`git add zerver/migrations/NNNN_realm_mandatory_topics.py`
263:  Applying zerver.NNNN_realm_mandatory_topics... OK
444:+    mandatory_topics: Optional[bool] = REQ(json_validator=check_bool, default=None),
548:+        realm_mandatory_topics: page_params.mandatory_topics,
563:  message, message edit history etc. Our `mandatory_topics` feature
573:in. For example in this case of `mandatory_topics` it will lie in
604:`mandatory_topics`, since this setting only has an effect on the
623:+            mandatory_topics: noop,

frontend_tests/node_tests/compose_validate.js
242:    page_params.realm_mandatory_topics = true;
329:    page_params.realm_mandatory_topics = false;

static/js/admin.js
22:    realm_mandatory_topics: $t({defaultMessage: "Require topics in stream messages"}),
121:        realm_mandatory_topics: page_params.realm_mandatory_topics,

static/js/compose_validate.js
489:    if (page_params.realm_mandatory_topics) {

static/js/server_events_dispatch.js
208:                mandatory_topics: noop,

static/templates/settings/organization_settings_admin.hbs
151:                  setting_name="realm_mandatory_topics"
153:                  is_checked=realm_mandatory_topics
154:                  label=admin_settings_label.realm_mandatory_topics}}

zerver/migrations/0001_initial.py
279:                ("mandatory_topics", models.BooleanField(default=False)),

zerver/models.py
259:    mandatory_topics: bool = models.BooleanField(default=False)
653:        mandatory_topics=bool,

zerver/openapi/zulip.yaml
3612:                                    mandatory_topics:
10478:                      realm_mandatory_topics:

zerver/tests/test_home.py
154:        "realm_mandatory_topics",

zerver/views/realm.py
81:    mandatory_topics: Optional[bool] = REQ(json_validator=check_bool, default=None),
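To experiment with these three formatting options on something smaller than the Zulip repository, here is a sketch using a throwaway repository with two hypothetical files. The output should have each file name on its own line, its matching lines prefixed with line numbers, and a blank line between the two files.

```shell
#!/bin/sh
# Throwaway demo repo; file names and contents are hypothetical.
demo_repo=$(mktemp -d)
cd "$demo_repo" || exit 1
git init -q

printf 'class Realm:\n    mandatory_topics = False\n' > models.py
printf 'mandatory_topics: noop,\n' > dispatch.js
git add models.py dispatch.js

# --heading: file name on its own line above its matches
# --break:   blank line between matches from different files
# -n:        prefix each matching line with its line number
git grep --break --heading -n mandatory_topics
```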

Summing up

This post only scratches the surface of using git-grep, but hopefully it’s enough to get some people started and curious to learn more.

Definitely check out the official git-grep documentation, which you can also read in your terminal with git help grep.

And here are two blog posts I found useful while writing this post that go into more options for using git-grep than I do here:

Heath Ledger dancing on high school stadium seats in the 1999 film '10 Things I Hate about You'

On a non-programming note, the two GIFs included in this post are from the 1999 film ‘10 Things I Hate about You’, which is a retelling of Shakespeare’s ‘The Taming of the Shrew’ set in an imaginary, uber-rich US high school in the late 90s. It’s both delightful and ridiculous, with a fantastic soundtrack. I highly recommend it.
