Code review is often a boring job, and many developers tend to gloss over it, but it is still an important part of software development. Thanks to recent advances in AI, part of this job can be delegated. In this article, I’ll show you how to automate certain types of code review using OpenAI’s API and a Bitbucket webhook.
The process
- When a PR is ready to be reviewed, the developer adds a comment containing a command keyword to the PR (in this case, we use the command “review now”).
- A Bitbucket webhook is called whenever there’s a new PR comment. This webhook is a call to our remote code-review API.
- Our code-review API then examines the newly added comment, recognises the command keyword, retrieves the diffs from Bitbucket, and calls OpenAI to do the assessment.
- Finally, the assessment is posted back to the PR as an inline comment for each file changed in the PR.
What do we ask OpenAI to assess?
At the moment, we’re only considering the following, which is already more than a bucketful, imo:
- Security threats
- Code optimization
- Code documentation
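One way to organise these three checks is a small table of prompts, one per category, that gets turned into chat messages for each file diff. The sketch below is only an illustration: the prompt wording, the system message, and the names `REVIEW_PROMPTS` and `build_messages` are assumptions, not the prompts used in this project.

```python
# Hypothetical prompt table: the wording is an assumption, tune it for your project.
REVIEW_PROMPTS = {
    "security": "List any security threats introduced by this diff.",
    "optimization": "Suggest optimizations for the code added in this diff.",
    "documentation": "Assess whether the code added in this diff is adequately commented.",
}


def build_messages(category, diff):
    """Build the chat messages for one review category and one file diff."""
    if category not in REVIEW_PROMPTS:
        raise ValueError(f"Unknown review category: {category}")
    return [
        {"role": "system", "content": "You are a careful code reviewer."},
        {"role": "user", "content": f"{REVIEW_PROMPTS[category]}\n\n{diff}"},
    ]
```

Keeping the prompts in one place makes it easier to refine them later without touching the code that calls OpenAI.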
Set up a Bitbucket App Password
We use App Password credentials for our API call to BitBucket. Steps to generate:
- Log in to your Bitbucket account.
- Click on the Settings icon in the top-right corner, then click on “Personal Bitbucket settings” in the pop-up menu.
- In the settings menu, under the “Access Management” section, click on “App passwords”.
- Click on the “Create app password” button.
- In the “Label” field, enter a meaningful name for this app password so you can remember what it is used for.
- Select the permissions you want to give to this app password. We’ll need Pull Request Read and Write, because we need to be able to post comments.
- Click on the “Create” button.
- The newly created app password will be displayed. Make sure to copy it and store it in a secure place as you will not be able to see it again. If you lose it, you will have to create a new one.
Remember, treat your app passwords with the same level of security as you would with your main account password. They can provide extensive access to your Bitbucket account.
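To keep the app password out of the source code, it can be read from the environment at start-up. The following is a minimal sketch, assuming the credentials are exposed as environment variables; the variable names `BITBUCKET_USERNAME` and `BITBUCKET_APP_PASSWORD` and the helper `load_bitbucket_auth` are hypothetical.

```python
import os


def load_bitbucket_auth(environ=os.environ):
    """Return a (username, app_password) tuple suitable for HTTP basic auth.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    username = environ.get("BITBUCKET_USERNAME")
    app_password = environ.get("BITBUCKET_APP_PASSWORD")
    if not username or not app_password:
        raise RuntimeError("Bitbucket credentials are not configured")
    return username, app_password
```

The returned tuple can be passed directly as the `auth` argument of `requests.get` or `requests.post`.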
Understand BitBucket APIs
We use the following REST APIs of BitBucket:
- Get PR info
- Get PR diff: the URL of this endpoint can be extracted from the JSON response of the Get PR info endpoint above.
- Add new comment
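As a rough sketch, the three endpoints can be expressed as small helpers. The URL patterns and the `links.diff.href` field are the ones used by the main.py in this article; the helper names themselves are assumptions made for illustration.

```python
API_ROOT = "https://api.bitbucket.org/2.0"


def pr_url(workspace, repo_slug, pr_id):
    """URL of the 'Get PR info' endpoint."""
    return f"{API_ROOT}/repositories/{workspace}/{repo_slug}/pullrequests/{pr_id}"


def comments_url(workspace, repo_slug, pr_id):
    """URL used to add a new comment to the PR."""
    return f"{pr_url(workspace, repo_slug, pr_id)}/comments"


def diff_url_from_pr_info(pr_info):
    """Pull the diff endpoint out of the JSON returned by 'Get PR info'."""
    return pr_info["links"]["diff"]["href"]
```

Note that the diff endpoint returns plain text rather than JSON, so its response should be read as text.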
The Code-review API
Our code-review API is written with Python’s FastAPI. It has several key functions:
- A webhook endpoint processes each new Bitbucket comment and picks up the key phrase.
- The webhook then calls the run() function, which calls the Bitbucket API to retrieve information about the PR, including all of the diff content.
- Several calls are then made to OpenAI for the assessment.
- Finally, we call the Bitbucket REST API to post our assessment as a comment on the PR.
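The run() function relies on a few small diff helpers whose bodies are not shown in this article. The sketch below shows one way they might look, assuming each per-file diff is in standard unified (git) format; the function names match those used in main.py, but the bodies are assumptions and not the project's actual implementation. In particular, counting words is only a very rough stand-in for a token count.

```python
import re


def get_file_path_from_diff(diff):
    """Return the new-file path from the '+++ ' header, or None for /dev/null."""
    for line in diff.splitlines():
        if line.startswith("+++ "):
            path = line[4:].strip()
            if path == "/dev/null":
                return None
            # Drop the 'a/' or 'b/' prefix that git adds.
            if path.startswith(("a/", "b/")):
                path = path[2:]
            return path
    return None


def get_line_number_from_diff(diff):
    """Return the first new-file line number of the first hunk, or 0 if none."""
    match = re.search(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@", diff, re.MULTILINE)
    return int(match.group(1)) if match else 0


def has_additions(diff):
    """True if any line is an added line (not the '+++ ' file header)."""
    return any(
        line.startswith("+") and not line.startswith("+++ ")
        for line in diff.splitlines()
    )


def count_words(diff):
    """Very rough size estimate: number of whitespace-separated words."""
    return len(diff.split())
```

Returning 0 when no hunk header is found fits the `line > 0` check that decides whether a comment is posted inline.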
Check out a brief summary of our main.py:
import json

import requests
from dotenv import dotenv_values
import openai
from fastapi import FastAPI, Request, BackgroundTasks

# Load environment variables from .env file
env = dotenv_values(".env")
username = env["username"]
password = env["password"]
workspace_id = env["workspace_id"]
repo_slug = env["repo_slug"]
openai.api_key = env["openai_key"]

app = FastAPI()


@app.post("/webhook")
async def handle_webhook(request: Request, background_tasks: BackgroundTasks):
    """Bitbucket will call this API whenever a new comment is added to a PR."""
    payload = await request.json()

    # Extract the pull request ID from the payload
    pullrequest_id = payload['pullrequest']['id']

    # Get the comment content
    comment_content = payload['comment']['content']['raw']

    # If the "review now" command is given in the comment, start the reviewing process
    if "review now" in comment_content.lower():
        # Run the review in the background because it may take more than 10s,
        # and the Bitbucket webhook timeout is 10s.
        background_tasks.add_task(run, pullrequest_id)
        return {"message": "Review in progress. Comments will be posted to PR."}

    return {"message": "Failed. Check log."}


def run(pullrequest_id):
    # Get the diff content of this PR
    diff_content = get_pr_diff(pullrequest_id)

    # Process the diff of EACH file of this PR one at a time. Otherwise, it'll be too large.
    # Each diff starts with 'diff --git', so we can use that string as a delimiter.
    diffs = diff_content.split('diff --git')

    # Remove any leading or trailing whitespace from each diff
    diffs = [diff.strip() for diff in diffs if diff.strip()]

    # Now process each diff
    for i, diff in enumerate(diffs, start=1):
        # Get the file path of the current diff, so that we can comment inline on the PR
        file_path = get_file_path_from_diff(diff)
        est_tokens = count_words(diff)

        # Only analyze a diff that is not too long and that has addition(s)
        if est_tokens < 3500 and has_additions(diff):
            analysis_result = get_openai_analysis(diff)
            optimization_result = get_openai_optimization(diff)
            documentation_result = get_openai_documentation_assessment(diff)
            first_diff_line_no = get_line_number_from_diff(diff)

            # Post to Bitbucket
            comment_content = f"File: {file_path} \n\n" \
                              f"- Security analysis: {analysis_result} \n" \
                              f"- Optimization: {optimization_result} \n" \
                              f"- Documentation (code comments): {documentation_result}\n"
            post_comment_to_bitbucket(
                pullrequest_id,
                file_path,
                first_diff_line_no,  # 1st line of diff
                comment_content)

    print("\n\nAll Analysis Done.")
    return True


def get_pr_diff(pullrequest_id):
    """
    Get the diff content of the whole PR. All diffs are returned,
    not just the diff of a single file (if there's more than 1 file).
    :param: pullrequest_id int id of the pull request
    :return: diff content (all diffs in 1 return). You will need to extract the diff for each file.
    """
    # Get PR details
    url = f"https://api.bitbucket.org/2.0/repositories/{workspace_id}/{repo_slug}/pullrequests/{pullrequest_id}"
    response = requests.get(url, auth=(username, password))
    response_json = response.json()

    # Get the endpoint for the PR's diff
    diff_url = response_json["links"]["diff"]["href"]

    # Call this endpoint to get the diff content. The return will be text, not JSON.
    response_diff = requests.get(diff_url, auth=(username, password))
    return response_diff.text


def get_file_path_from_diff(diff):
    """
    Retrieve the path of the diff file from the diff text.
    A preceding a/ or b/ will be removed. If the path is /dev/null, return None.
    """
    ...
    return path


def get_line_number_from_diff(diff):
    ...
    return start_line_number


def post_comment_to_bitbucket(pullrequest_id, file_path, line, content):
    url = f"https://api.bitbucket.org/2.0/repositories/{workspace_id}/{repo_slug}/pullrequests/{pullrequest_id}/comments"
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    raw = {
        "content": {
            "raw": "**OpenAI bot (beta):**\n\n " + content + "\n\n--- This comment is automated by bot."
        }
    }

    # Post the comment inline if a line is provided
    if line > 0 and file_path is not None:
        raw["inline"] = {
            "from": line,
            "path": file_path
        }

    payload = json.dumps(raw)
    response = requests.request(
        "POST",
        url,
        data=payload,
        headers=headers,
        auth=(username, password)
    )
    return True


def get_openai_analysis(code_lines):
    """Get security analysis."""
    messages = [
        {"role": "system", "content": "You are a software security analyst with excellent coding skills. "},
    ]
    messages.append({
        "role": "user",
        "content": "<insert your prompt here that is suitable for your project...>:\n\n" \
                   "" + code_lines})
    # print(messages)

    # Note: openai.ChatCompletion is the interface of the pre-1.0 openai package
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages
    )

    # Extract the AI response text
    assistant_text = response['choices'][0]['message']['content']
    print(assistant_text)
    return assistant_text


def get_openai_optimization(code_lines):
    """
    Get optimization analysis from OpenAI for the diff.
    :code_lines: string, might be just the diff section or the whole file with the diff section
    """
    ...


def get_openai_documentation_assessment(code_lines):
    """Get assessment of code documentation."""
    ...
Set up the Bitbucket webhook
Finally, set up the webhook on Bitbucket. Whenever there’s a new PR comment, Bitbucket will call our code-review API, which checks whether a code review command is given in the comment.
- Log into your Bitbucket account.
- Navigate to the repository where you want to add the webhook.
- Click on the “Settings” option in the left-hand navigation panel.
- Scroll down to the “Workflow” section and click on “Webhooks”.
- Click on the “Add webhook” button.
- A new screen will appear. Here you’ll need to provide the following information:
– Title: This is a descriptive name for your webhook.
– URL: This is the server endpoint where Bitbucket will send the webhook POST requests. This should be the URL of your application that will handle the webhook data.
– Status: You can choose to enable or disable the webhook. For now, you should keep it enabled.
– Triggers: Choose when you want Bitbucket to call your webhook. In our case, choose when a new PR comment is added/created.
- After you’ve filled in all the details, click on “Save”.
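Before pointing Bitbucket at a live server, it can help to check the command detection against a hand-written payload. The sketch below is a simplified illustration: the sample payload contains only the two fields that the webhook handler reads (`pullrequest.id` and `comment.content.raw`), and the helper name `extract_review_request` is hypothetical.

```python
COMMAND = "review now"


def extract_review_request(payload):
    """Return the PR id if the comment asks for a review, otherwise None."""
    comment = payload.get("comment", {}).get("content", {}).get("raw", "")
    pr_id = payload.get("pullrequest", {}).get("id")
    if pr_id is not None and COMMAND in comment.lower():
        return pr_id
    return None


# Trimmed-down sample payload; a real Bitbucket payload has many more fields.
sample_payload = {
    "pullrequest": {"id": 42},
    "comment": {"content": {"raw": "Looks good, please Review Now"}},
}
```

Using `.get()` with defaults means an unexpected payload is ignored rather than raising a KeyError inside the handler.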
That’s it!
We’ve now successfully automated part of our job. There’s still more to be done, such as refining the prompts to improve the accuracy and depth of the results. Meanwhile, check out an example screenshot below: