Join me as I blog about my experience as a self-taught programmer & open source contributor :)
by Lauryn Menard
One common hurdle when starting to contribute to many open-source software projects is navigating a large and complex codebase. I know that when I first looked at the cloned Zulip git repository on my local machine, I simply felt overwhelmed.
An oft-repeated refrain from experienced Zulip folks in their replies to new contributors asking where to start on addressing an issue is to use git-grep.
Prior to Zulip, I hadn’t heard of this git command-line tool, and learning to use it has been a huge help in gaining confidence contributing and becoming more knowledgeable about Zulip.
Here’s my intro to using git-grep to navigate a large and complex repository like Zulip’s.
In short, git-grep is a command built into git that searches the tracked files in the working directory of a git repository for a string or regular expression. It’s fast, simple and very useful.
The main benefit of git-grep is that, unlike grep, it searches only the files tracked in the git repository. This means it’s faster and doesn’t return results from unrelated files, such as untracked or ignored ones.
lauryn@leeloo2:~/Zulip/zulip $ git grep max_message_length
frontend_tests/node_tests/compose_validate.js: page_params.max_message_length = 10000;
frontend_tests/node_tests/compose_validate.js: page_params.max_message_length = 10000;
static/js/compose_validate.js: const max_length = page_params.max_message_length;
static/js/compose_validate.js: if (compose_state.message_content().length > page_params.max_message_length) {
templates/zerver/api/changelog.md: and `max_message_length`, and renamed `max_stream_name_length` and
zerver/lib/events.py: state["max_message_length"] = settings.MAX_MESSAGE_LENGTH
zerver/openapi/zulip.yaml: max_message_length:
zerver/tests/test_home.py: "max_message_length",
In the codeblock above, I’ve used git-grep to match max_message_length, which refers to a setting for messages in Zulip.
On each line of the terminal output, you see the file path followed by the line that matched; in the terminal, the matched pattern itself is shown in bold.
Now, if I want to work on some code that relates to this setting, I have a manageable list of files to start looking at in the repository!
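To see the difference between git-grep and plain grep for yourself, here’s a minimal sketch you can run outside of Zulip. It builds a throwaway repository in a temporary directory; the file names and contents (settings.py, scratch_notes.txt) are made up for illustration and don’t come from Zulip’s codebase.

```shell
# Sketch only: a throwaway repository with hypothetical files.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

echo 'max_message_length = 10000' > settings.py
echo 'max_message_length = 1' > scratch_notes.txt

# Only settings.py is tracked (added to the index); scratch_notes.txt is not.
git add settings.py

# git grep looks only at tracked files.
git_grep_result=$(git grep max_message_length)
echo "$git_grep_result"

# Plain recursive grep looks at everything under the directory.
grep_result=$(grep -r --exclude-dir=.git max_message_length . | sort)
echo "$grep_result"
```

The first search should report only settings.py, while plain grep also reports the untracked scratch file.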
Just using git-grep is worthwhile, but here are a few ways to make your searches more effective and fruitful.
Most of the time, using git-grep on a general project-related word or phrase, for example ‘message’ or ‘stream_id’ in Zulip’s repository, isn’t going to be very helpful.
But, when you have something concrete to look for like a variable (e.g. max_message_length), function (e.g. do_update_message) or setting (e.g. mandatory_topics) from the project, you’ll get a good idea of where to start reading and potentially working.
I’ll admit my first few attempts at using git-grep were not successful. I ended up with a few terminal outputs that were seemingly endless lists of matches in the Zulip codebase.
So, before you start using git-grep, it’s worth the time to do a little orientation about the task or issue you’re interested in via your project’s resources.
If you’re considering contributing to Zulip, I’d recommend:
Remember my example above with max_message_length? Here it is again:
lauryn@leeloo2:~/Zulip/zulip $ git grep max_message_length
frontend_tests/node_tests/compose_validate.js: page_params.max_message_length = 10000;
frontend_tests/node_tests/compose_validate.js: page_params.max_message_length = 10000;
static/js/compose_validate.js: const max_length = page_params.max_message_length;
static/js/compose_validate.js: if (compose_state.message_content().length > page_params.max_message_length) {
templates/zerver/api/changelog.md: and `max_message_length`, and renamed `max_stream_name_length` and
zerver/lib/events.py: state["max_message_length"] = settings.MAX_MESSAGE_LENGTH
zerver/openapi/zulip.yaml: max_message_length:
zerver/tests/test_home.py: "max_message_length",
If you look closely, you’ll see my call of git grep max_message_length didn’t match the constant MAX_MESSAGE_LENGTH due to the difference in capitalization. The line in zerver/lib/events.py only shows up because it also contains the lowercase string.
By adjusting my call to include the option for a case-insensitive search (-i), the output below now includes the matches for when my pattern is a variable (lowercase) and when it’s a constant (uppercase).
lauryn@leeloo2:~/Zulip/zulip $ git grep -i max_message_length
frontend_tests/node_tests/compose_validate.js: page_params.max_message_length = 10000;
frontend_tests/node_tests/compose_validate.js: page_params.max_message_length = 10000;
static/js/compose_validate.js: const max_length = page_params.max_message_length;
static/js/compose_validate.js: if (compose_state.message_content().length > page_params.max_message_length) {
templates/zerver/api/changelog.md: and `max_message_length`, and renamed `max_stream_name_length` and
zerver/lib/actions.py: if len(content) > settings.MAX_MESSAGE_LENGTH:
zerver/lib/events.py: state["max_message_length"] = settings.MAX_MESSAGE_LENGTH
zerver/lib/markdown/__init__.py: MAX_MESSAGE_LENGTH = settings.MAX_MESSAGE_LENGTH
zerver/lib/markdown/__init__.py: if len(rendering_result.rendered_content) > MAX_MESSAGE_LENGTH * 100:
zerver/lib/markdown/__init__.py: f"Rendered content exceeds {MAX_MESSAGE_LENGTH * 100} characters (message {logging_message_id})"
zerver/lib/message.py: return truncate_content(body, settings.MAX_MESSAGE_LENGTH, "\n[message truncated]")
zerver/models.py: content: str = models.TextField() # Length should not exceed MAX_MESSAGE_LENGTH
zerver/openapi/zulip.yaml: max_message_length:
zerver/tests/test_home.py: "max_message_length",
zerver/tests/test_markdown.py: @override_settings(MAX_MESSAGE_LENGTH=10)
zerver/tests/test_markdown.py: """A rendered message with an ultra-long length (> 100 * MAX_MESSAGE_LENGTH)
zerver/tests/test_markdown.py: msg = "mock rendered message\n" * 10 * settings.MAX_MESSAGE_LENGTH
zerver/tests/test_message_send.py: @override_settings(MAX_MESSAGE_LENGTH=25)
zerver/tests/test_message_send.py: MAX_MESSAGE_LENGTH = settings.MAX_MESSAGE_LENGTH
zerver/tests/test_message_send.py: long_message = "A" * (MAX_MESSAGE_LENGTH + 1)
zerver/tests/test_message_send.py: sent_message.content, "A" * (MAX_MESSAGE_LENGTH - 20) + "\n[message truncated]"
zproject/default_settings.py:MAX_MESSAGE_LENGTH = 10000
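If you’d like to try the case-insensitive option without cloning Zulip, here’s a small sketch in a throwaway repository. The two files are hypothetical stand-ins I made up (one with the lowercase variable, one with the uppercase constant), not copies of the real Zulip files.

```shell
# Sketch only: hypothetical files in a temporary repository.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

printf 'MAX_MESSAGE_LENGTH = 10000\n' > default_settings.py
printf 'const max_length = page_params.max_message_length;\n' > compose_validate.js
git add .

# Default search is case sensitive: only the lowercase variable matches.
case_sensitive=$(git grep max_message_length)
echo "$case_sensitive"

# With -i, the uppercase constant matches as well.
case_insensitive=$(git grep -i max_message_length)
echo "$case_insensitive"
```

The second call should list both files, while the first lists only compose_validate.js.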
Here are the results I got when I was looking for the function do_update_message recently:
lauryn@leeloo2:~/Zulip/zulip $ git grep do_update_message
analytics/tests/test_counts.py: do_update_message_flags,
analytics/tests/test_counts.py: do_update_message_flags(user1, client, "add", "read", [message])
zerver/lib/actions.py: number_changed = do_update_message(
zerver/lib/actions.py:def do_update_message_flags(
zerver/lib/actions.py:def do_update_message(
zerver/tests/test_events.py: do_update_message,
zerver/tests/test_events.py: do_update_message_flags,
zerver/tests/test_events.py: lambda: do_update_message(
zerver/tests/test_events.py: lambda: do_update_message(
zerver/tests/test_events.py: lambda: do_update_message(
zerver/tests/test_events.py: lambda: do_update_message(
zerver/tests/test_events.py: lambda: do_update_message_flags(
zerver/tests/test_events.py: lambda: do_update_message_flags(
zerver/tests/test_events.py: lambda: do_update_message_flags(
zerver/tests/test_message_edit.py: do_update_message,
zerver/tests/test_message_edit.py: def do_update_message_topic_success(
zerver/tests/test_message_edit.py: do_update_message(
zerver/tests/test_message_edit.py: do_update_message_topic_success(
zerver/tests/test_message_edit.py: do_update_message_topic_success(
zerver/tests/test_message_edit.py: do_update_message_topic_success(cordelia, message, "Another topic", users_to_be_notified)
zerver/tests/test_message_edit.py: do_update_message_topic_success(hamlet, message, "Change again", users_to_be_notified)
zerver/tests/test_message_edit.py: # Since edit history is being generated by do_update_message,
zerver/tests/test_message_fetch.py: do_update_message,
zerver/tests/test_message_fetch.py: do_update_message(
zerver/tests/test_push_notifications.py: do_update_message_flags,
zerver/tests/test_push_notifications.py: do_update_message_flags(
zerver/views/message_flags.py: do_update_message_flags,
zerver/views/message_flags.py: count = do_update_message_flags(user_profile, request_notes.client, operation, flag, messages)
However, what I really needed was a list of the files with matches to this function name, and the repetition of the match in test_events.py and test_message_edit.py made it difficult for me to see that information quickly and clearly.
Luckily, a quick read-through of the docs (by calling git help grep in the terminal) revealed a way to return a simplified view by using: git grep -c (or git grep --count).
lauryn@leeloo2:~/Zulip/zulip $ git grep -c do_update_message
analytics/tests/test_counts.py:2
zerver/lib/actions.py:3
zerver/tests/test_events.py:9
zerver/tests/test_message_edit.py:8
zerver/tests/test_message_fetch.py:2
zerver/tests/test_push_notifications.py:2
zerver/views/message_flags.py:2
This time in the terminal output above, I’ve got a list of 7 files to look at, and each line includes a count of how many lines matched my pattern in that file.
And if you don’t want or need the count in your output, then you can use: git grep -l (or git grep --files-with-matches).
lauryn@leeloo2:~/Zulip/zulip $ git grep -l do_update_message
analytics/tests/test_counts.py
zerver/lib/actions.py
zerver/tests/test_events.py
zerver/tests/test_message_edit.py
zerver/tests/test_message_fetch.py
zerver/tests/test_push_notifications.py
zerver/views/message_flags.py
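Here’s a small sketch that puts -c and -l side by side in a throwaway repository. The directory layout and file contents are invented for the example and only loosely modeled on the output above.

```shell
# Sketch only: hypothetical files in a temporary repository.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
mkdir -p lib tests

printf 'def do_update_message():\n    pass\nnumber_changed = do_update_message()\n' > lib/actions.py
printf 'do_update_message()\ndo_update_message()\ndo_update_message()\n' > tests/test_events.py
git add .

# -c prints each file with the number of matching lines in it.
counts=$(git grep -c do_update_message)
echo "$counts"

# -l prints only the names of files that contain a match.
files=$(git grep -l do_update_message)
echo "$files"
```

Both calls list the same two files; only the -c output carries the per-file count after the colon.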
After working on Zulip for a little while, I’ve learned that the frontend implementation of the application’s features is in the static/js directory and that the help center documentation files are in the templates/zerver/help directory.
So, if I want even more targeted searches with git-grep in my working directory, I can include a path in my calls to see only the matches from files in those specific directories.
Here’s a template call for specifying a directory path:
git grep <pattern> -- <path>
lauryn@leeloo2:~/Zulip/zulip $ git grep -c default_view -- static/js
static/js/admin.js:1
static/js/hashchange.js:10
static/js/hotkey.js:2
static/js/info_overlay.js:1
static/js/realm_user_settings_defaults.ts:2
static/js/server_events_dispatch.js:3
static/js/settings.js:1
static/js/settings_config.ts:2
static/js/settings_display.js:1
static/js/user_settings.ts:2
In the codeblock above, I’ve limited my search to where default_view appears in the frontend implementation of Zulip’s web application.
lauryn@leeloo2:~/Zulip/zulip $ git grep -l "default view" -- templates/zerver/help
templates/zerver/help/configure-default-view.md
templates/zerver/help/include/sidebar_index.md
templates/zerver/help/keyboard-shortcuts.md
Similarly, the output above shows the matches for the string “default view” in the help center markdown files.
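To play with path-limited searches on something smaller than Zulip, here’s a sketch using a throwaway repository. The directories and file contents (static/js/settings.js, templates/help/configure.md) are hypothetical and only mimic the shape of Zulip’s layout.

```shell
# Sketch only: hypothetical directories and files in a temporary repository.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
mkdir -p static/js templates/help

echo 'const default_view = "recent_topics";' > static/js/settings.js
echo 'Configure the default_view setting.' > templates/help/configure.md
git add .

# Without a path, every tracked file is searched.
everywhere=$(git grep -l default_view)
echo "$everywhere"

# With "-- <path>", only files under that directory are searched.
frontend_only=$(git grep -l default_view -- static/js)
echo "$frontend_only"
```

The second call should drop the help file and keep only the match under static/js.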
Most of the time, I want to read through code in my preferred IDE. But sometimes I want to format my git-grep search results to get a little glimpse of the code in the terminal.
Here are some helpful options I’ve found:
--break, which will print an empty line between matches from different files.
--heading, which will show the file name on the line above the matches instead of at the start of each line.
-n or --line-number, which will prefix the line number to the matching lines in the file.
Combining these options will return a slightly ‘formatted’ output to the terminal. Below you can see the terminal output with these three git-grep options for the admin setting mandatory_topics, which is used in the Zulip tutorial for adding a new application feature.
lauryn@leeloo2:~/Zulip/zulip $ git grep --break --heading -n mandatory_topics
docs/tutorials/new-feature-tutorial.md
186:boolean field, `mandatory_topics`, to the Realm model in
196:+ mandatory_topics: bool = models.BooleanField(default=False)
216:+ mandatory_topics=bool,
239:`git add zerver/migrations/NNNN_realm_mandatory_topics.py`
263: Applying zerver.NNNN_realm_mandatory_topics... OK
444:+ mandatory_topics: Optional[bool] = REQ(json_validator=check_bool, default=None),
548:+ realm_mandatory_topics: page_params.mandatory_topics,
563: message, message edit history etc. Our `mandatory_topics` feature
573:in. For example in this case of `mandatory_topics` it will lie in
604:`mandatory_topics`, since this setting only has an effect on the
623:+ mandatory_topics: noop,

frontend_tests/node_tests/compose_validate.js
242: page_params.realm_mandatory_topics = true;
329: page_params.realm_mandatory_topics = false;

static/js/admin.js
22: realm_mandatory_topics: $t({defaultMessage: "Require topics in stream messages"}),
121: realm_mandatory_topics: page_params.realm_mandatory_topics,

static/js/compose_validate.js
489: if (page_params.realm_mandatory_topics) {

static/js/server_events_dispatch.js
208: mandatory_topics: noop,

static/templates/settings/organization_settings_admin.hbs
151: setting_name="realm_mandatory_topics"
153: is_checked=realm_mandatory_topics
154: label=admin_settings_label.realm_mandatory_topics}}

zerver/migrations/0001_initial.py
279: ("mandatory_topics", models.BooleanField(default=False)),

zerver/models.py
259: mandatory_topics: bool = models.BooleanField(default=False)
653: mandatory_topics=bool,

zerver/openapi/zulip.yaml
3612: mandatory_topics:
10478: realm_mandatory_topics:

zerver/tests/test_home.py
154: "realm_mandatory_topics",

zerver/views/realm.py
81: mandatory_topics: Optional[bool] = REQ(json_validator=check_bool, default=None),
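For a smaller version of that formatted output, here’s a sketch in a throwaway repository. The two files and their contents are invented for illustration; they just borrow the mandatory_topics name from the example above.

```shell
# Sketch only: hypothetical files in a temporary repository.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

printf 'class Realm:\n    mandatory_topics = False\n' > models.py
printf 'if (page_params.realm_mandatory_topics) {\n}\n' > compose_validate.js
git add .

# --heading puts the file name on its own line, -n prefixes line numbers,
# and --break separates the files with an empty line.
formatted=$(git grep --break --heading -n mandatory_topics)
echo "$formatted"
```

Each file name should appear once as a heading, followed by its numbered matching lines, with a blank line between the two files.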
This post only scratches the surface of using git-grep, but hopefully it’s enough to get some people started and curious to learn more.
Definitely check out the official git-grep documentation here.
And here are two blog posts I found useful while writing this post that go into more options for using git-grep than I do here:
On a non-programming related note, the two GIFs included in this post are from the 1999 film ‘10 Things I Hate About You’, which is a retelling of Shakespeare’s ‘The Taming of the Shrew’ set in an imaginary, uber-rich, US high school in the late 90s. It’s both delightful and ridiculous with a fantastic soundtrack. I highly recommend it.