Periodically Updating GitHub Profile's README.md using Python & GitHub Actions

My GitHub Profile README.md

The GitHub Profile README.md is a special Markdown file that can be created and displayed on your GitHub user profile to provide a customized introduction and showcase your work to visitors of your profile page. It allows you to personalize your GitHub profile and give a brief overview of your skills, interests, and the projects you're working on.

For example, my README.md has a section that lists my 5 most recent blog posts from this site. Unfortunately, every time I publish a post here, I am too lazy to update the README.md just to add a link to it.

To help with that, I created a Python script that will run within GitHub Actions periodically to update the README.md file.

First, I created a simple Python script using BeautifulSoup4 and python-requests to scrape the index page of my blog and fetch the first 5 blog posts' titles and links.

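import requests
from bs4 import BeautifulSoup

URL = "https://hunj.dev"

# Fetch the blog index; fail early on an HTTP error instead of parsing an error page.
req = requests.get(URL, timeout=10)
req.raise_for_status()
soup = BeautifulSoup(req.content, "html.parser")
posts = []

# Keep only the first 5 posts listed on the index page.
for post in soup.select('article.post')[:5]:
    title = post.select_one('h2.post-title').text.strip()
    path = post.select_one('a.post-title-link')['href']
    # Each entry becomes a Markdown list item: - [title](link)
    text = f"- [{title}]({URL}{path})"
    posts.append(text)

# The trailing '\n' entry leaves a blank line after the list.
posts.append('\n')
posts_text = '\n'.join(posts)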

Since creating a link in Markdown is easy and I want each post to be a list item, I format each line as - [title](link).
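The formatting step can also be pulled out into a small function, which makes it easy to try without hitting the network. This is only a sketch: `format_post_links` and its `limit` parameter are names I made up for illustration, and it uses `urljoin` from the standard library so that both relative paths and absolute hrefs produce a valid link.

```python
from urllib.parse import urljoin


def format_post_links(posts, base_url, limit=5):
    """Render (title, href) pairs as a Markdown bullet list."""
    lines = []
    for title, href in posts[:limit]:
        # urljoin copes with relative paths ("/my-post/") and absolute URLs alike.
        lines.append(f"- [{title}]({urljoin(base_url, href)})")
    return "\n".join(lines)


sample = [
    ("ufw vs. iptables", "/ufw-vs-iptables/"),
    ("Working with iptables", "/working-with-iptables/"),
]
print(format_post_links(sample, "https://hunj.dev"))
```

Note that a title containing square brackets would need escaping before it goes into the link text; the sketch leaves that out.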

This mini script can be extended with the PyGithub module to read your repository's README, make the change, and push a new commit:

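import os
import re

from github import Github, Auth

# In GitHub Actions the workflow passes the token in as GH_TOKEN;
# for a local test run, replace the fallback with a personal access token.
TOKEN = os.environ.get('GH_TOKEN', 'your_github_token')
REPO = 'your/repo'
HEADER = "### Recent Blog Posts"
END = "Read more"
# Lazily match from the header up to (but not including) the END marker.
REGEX = rf"{HEADER}[\s\S]*?(?={END})"

github_client = Github(auth=Auth.Token(TOKEN))
repo = github_client.get_repo(REPO)
readme = repo.get_readme()
old_content = readme.decoded_content.decode()

# The match includes the header, so put it back in front of the new list.
# A function replacement keeps backslashes in post titles literal.
new_content = re.sub(REGEX, lambda _: f"{HEADER}\n{posts_text}", old_content)
if new_content != old_content:
    # Commit the updated README through the GitHub API.
    repo.update_file(readme.path, 'Update recent blog posts', new_content, readme.sha)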

Using a regular expression, the script replaces the content between ### Recent Blog Posts and Read more in a README section like the following:

### Recent Blog Posts
- [ufw vs. iptables](https://hunj.dev/ufw-vs-iptables/)
- [Working with iptables](https://hunj.dev/working-with-iptables/)
- ["뭐든지 배워서 하겠습니다"](https://hunj.dev/sinib-junieo-gaebaljaga-cwihaeya-hal-jase/)
- [Dev Machine Setup](https://hunj.dev/dev-machine-setup/)
- [Fixing error: no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"](https://hunj.dev/no-matches-for-kind-poddisruptionbudget-in-version-policy-v1beta1/)

Read more at [hunj.dev](https://hunj.dev)
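To see the substitution in isolation, here is a minimal sketch that runs it against a sample string instead of a real repository. The `replace_section` helper and the sample README text are made up for illustration. Because the pattern starts matching at the header, this variant writes the header back as part of the replacement, and it uses a function as the replacement so that backslashes in a post title are not interpreted as regex escapes.

```python
import re

HEADER = "### Recent Blog Posts"
END = "Read more"
# Lazily match everything from the header up to (but not including) the end marker.
SECTION = re.compile(rf"{re.escape(HEADER)}[\s\S]*?(?={re.escape(END)})")


def replace_section(readme_text, posts_text):
    # Re-emit the header so only the list between the two markers changes.
    return SECTION.sub(lambda _: f"{HEADER}\n{posts_text}\n\n", readme_text, count=1)


old = (
    "# Hi\n\n"
    "### Recent Blog Posts\n"
    "- [Old post](https://hunj.dev/old/)\n\n"
    "Read more at [hunj.dev](https://hunj.dev)\n"
)
new = replace_section(old, "- [New post](https://hunj.dev/new/)")
print(new)
```

Running the replacement a second time with the same list leaves the text unchanged, which is what lets the script skip the commit when nothing new has been posted.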

Note that to test the script in your local environment, TOKEN must be set to a personal access token. In GitHub Actions, however, you don't need to create and provide one: the workflow can pass in the automatically generated GITHUB_TOKEN secret.
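One way to keep the token out of the source file in both cases is to read it from an environment variable. The sketch below assumes the variable is named GH_TOKEN; `resolve_token` is a hypothetical helper, not part of PyGithub.

```python
import os


def resolve_token(env=None):
    """Return the GitHub token from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    token = env.get("GH_TOKEN")
    if not token:
        raise RuntimeError(
            "GH_TOKEN is not set; export a personal access token for local runs"
        )
    return token


# Demonstration with an explicit mapping instead of the real environment.
print(resolve_token({"GH_TOKEN": "example-token"}))
```

For a local run you would export GH_TOKEN in your shell first and then call `resolve_token()` with no arguments.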

Using Poetry, I created a simple package dependency file for BeautifulSoup4, python-requests, and PyGithub, then created a GitHub Actions workflow called update_recent_blog_posts.yml that installs the dependencies and runs this script periodically:

on:
  schedule:
    - cron: '0 0 * * 0' # Run once a week at 00:00 (midnight) on Sunday
  workflow_dispatch:

jobs:
  update_posts:
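    runs-on: ubuntu-latest
    # Assumption: the default GITHUB_TOKEN may be read-only in this repository,
    # so grant write access for the README commit.
    permissions:
      contents: write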

    steps:
    - name: Check out repository
      uses: actions/checkout@v3

    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.11'

    - name: Install dependencies
      run: |
        pip install poetry
        poetry install

    - name: Scrape posts and update README
      run: poetry run python ./.github/scripts/update_posts.py
      env:
        GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

The directory structure in my repository looks like this:

.github
├── scripts
│   └── update_posts.py
└── workflows
    └── update_recent_blog_posts.yml

You can find the full Python script here.