Wordpress to Hugo Migration Process
By EricMesa
As there are many people who are currently looking for alternatives to Wordpress in light of a little…. instability… I decided I would document my migration process. I figure my case is one of the more extreme cases, as I ran a self-hosted Wordpress instance for the past 19 years (since Feb 2005) and have ~4000 posts that needed to be migrated. I also have lots of photos, videos, and other media. Finally, I have made heavy use of many Wordpress features.
Getting your data into the right format
The Hugo Docs list 4 scripts for moving from Wordpress to Hugo. I went with wp2hugo as the listed features seemed to cover all my needs. I can also, happily, say that the developer is responsive to issues. I created 2 issues for things that affected my site and he responded within a day or so. (They both also got fixed, although we shouldn’t expect every volunteer dev to cater to every issue.) At first I used the script in the mode in which it downloads images, but it had issues every time it came across a broken link, and it also seemed to want to download images from other sites (like Goodreads). So, in the end, I just went without. I mention this only because it might affect whether you have to take the next step.
Post Data Freedom Tasks
Images
All of my image, video, etc links in my new markdown files still pointed to /wp-content/uploads. So, at first I ran
grep -lr --include='*.md' 'wp-content' > /tmp/hugo_needs_image_fix
and I thought perhaps I would just take my time and do it manually over the next few months. But then I realized I was being stupid. This is SO what computers are good at that it was why the Perl programming language was invented. I already back up my Wordpress site via rsync, so I had all my images on disk. Therefore I wrote the following Python script to take care of this for me:
from pathlib import Path
import re
import shutil

if __name__ == '__main__':
    DIRECTORY = "/tmp/generated-2024-10-11-16-36-01/content/en/posts/"
    IMAGE_SRC = "/media/Archive/server/DigitalOcean/ericsbinaryworld/wordpress/wp-content/uploads/"
    IMAGE_DEST = "/tmp/generated-2024-10-11-16-36-01/static/images"
    # Matches the year/month/filename part of a Wordpress upload link.
    compiled = re.compile(r'wp-content/uploads/(\d*/\d*/[\d\w\-%]*\.(jpg|png|PNG|mp3|mp4|jpeg|epub|webp|gif|pdf|webm))')
    p = Path(DIRECTORY)
    all_blog_posts = list(p.glob('*.md'))
    for post in all_blog_posts:
        with open(post, "rt") as file:
            content = file.read()
        if "wp-content/uploads" in content:
            matches = compiled.findall(content)
            for match in matches:
                try:
                    # match[0] is the full relative path; match[1] is only the extension.
                    file_path = match[0]
                    if "%5F" in file_path:
                        # The links encode underscores as %5F; the files on disk use "_".
                        file_path = file_path.replace("%5F", "_")
                    shutil.copy2(f"{IMAGE_SRC}{file_path}", f"{IMAGE_DEST}/{file_path}")
                except FileNotFoundError as error:
                    print(error)
            # Note: the links get rewritten even if a copy failed.
            content = content.replace('wp-content/uploads', 'images')
            with open(post, 'w') as file:
                file.write(content)
This was an 80% solution. There were a few things it missed and I should technically not have had it do the replacement if the image, video, etc wasn’t copied over. I’ll have to write another script to find dead image links and fix that.
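A minimal sketch of what that dead-link finder could look like. It assumes the links now look like /images/YEAR/MONTH/filename (the layout the script above produces) and that the files live under static/images; the function name and the regex are my own illustration, not something wp2hugo or Hugo provides. It would also flag links to other sites that happen to contain /images/YEAR/MONTH/, so treat the output as a list of things to check rather than the final word.

```python
from pathlib import Path
import re
from urllib.parse import unquote

# Assumed link layout: /images/<year>/<month>/<filename>, ending on a word character
# so that a trailing sentence period is not swallowed.
IMAGE_LINK = re.compile(r"/images/(\d+/\d+/[\w\-%.]*\w)")


def find_dead_links(posts_dir, static_images_dir):
    """Return {post filename: [relative paths]} for links with no matching file on disk."""
    dead = {}
    for post in sorted(Path(posts_dir).glob("*.md")):
        content = post.read_text(encoding="utf-8")
        missing = [
            rel
            for rel in IMAGE_LINK.findall(content)
            # unquote turns %5F back into "_" before checking the disk
            if not (Path(static_images_dir) / unquote(rel)).is_file()
        ]
        if missing:
            dead[post.name] = missing
    return dead
```

Calling find_dead_links("content/en/posts", "static/images") from the site root (hypothetical paths, adjust to your layout) returns a dictionary you can print or loop over.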
Fixing Categories and Tags
I don’t know if the wp2hugo script was written for an older way of doing things in Hugo or if it’s just the way the PaperMod theme that wp2hugo uses wants things (more on this later), but I ended up with my categories and tags in the “front matter” (the metadata part of a Hugo md file) in the singular form. But Hugo expects the plural form. This was a source of consternation for me over the course of a couple hours as I was trying to figure out why categories, tags, and related posts weren’t showing up in the theme I chose. So….time for another script!
from pathlib import Path

if __name__ == '__main__':
    DIRECTORY = "/ItsABinaryWorld2-Hugo/content/en/posts/"  # I'd moved my blog folder at this point.
    p = Path(DIRECTORY)
    all_blog_posts = list(p.glob('*.md'))
    for post in all_blog_posts:
        with open(post, "rt") as file:
            content = file.read()
        # Plain string replacement: this touches the whole file, not just the front matter.
        content = content.replace('category:', 'categories:')
        content = content.replace('tag:', 'tags:')
        with open(post, 'w') as file:
            file.write(content)
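One caveat with a plain string replace is that it also rewrites any "category:" or "tag:" (or "hashtag:") that appears in the body of a post. If that worries you, here is a hedged sketch of a variant that only touches keys at the start of a line inside the leading "---" block. It assumes YAML front matter delimited by "---" lines with Unix line endings; the function name and the RENAMES mapping are hypothetical.

```python
import re

# Singular keys written by the conversion, mapped to the plural keys Hugo expects.
RENAMES = {"category": "categories", "tag": "tags"}


def rename_front_matter_keys(content, renames=RENAMES):
    """Rename keys inside the leading '---' block only; the post body is left alone."""
    if not content.startswith("---\n"):
        return content  # no YAML front matter, nothing to do
    end = content.find("\n---", 3)  # start of the closing delimiter line
    if end == -1:
        return content
    front, body = content[:end], content[end:]
    for old, new in renames.items():
        # ^ with MULTILINE only matches a key at the very start of a line
        front = re.sub(rf"^{re.escape(old)}:", f"{new}:", front, flags=re.MULTILINE)
    return front + body
```

Running it twice is harmless, since "categories:" and "tags:" no longer match the singular patterns.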
Code Snippets
No need to do anything here. In the few posts I checked, the wp2hugo script did a great job converting things over.
Videos
The theme I chose does not have a shortcode for displaying videos, so I had to create my own. If you host your own videos (as opposed to using Vimeo or YouTube), you can take the following code:
<video width="600" preload="metadata" controls>
  <source src="{{ .Get "src" }}" type="video/{{ replace (path.Ext (.Get "src")) "." "" }}">
  Your browser does not support the video element.
</video>
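And put it in the file $HUGO_ROOT/layouts/shortcodes/video.html
(where $HUGO_ROOT means the directory that has all the files and directories for your Hugo site). You can then call it from a post with something like {{< video src="/images/my-video.mp4" >}} (that path is just an example).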
Why not host them on YouTube or Vimeo? Because sometimes stuff that is clearly fair use can get unfairly targeted by their ContentID system and I don’t feel like fighting copyright strikes.
Archetypes
Speaking of custom stuff, to have a blog post writing process that’s closer to Wordpress and requires less remembering what goes in the “front matter”, create a file in the $HUGO_ROOT/archetypes folder.
Here’s one I made with the filename post.md:
---
title: "{{ replace .Name "-" " " | title }}"
author: EricMesa
type: post
date: {{ .Date }}
featured_image: myimage.jpg
draft: true
categories:
- A
- B
- C
tags:
- Hugo
- Game Development
- Internet of Things (IoT)
- Linux
- ...
description: xxx
---
It reminds me of most of the fields I want to fill in. Then, to make a new post, I use the command:
hugo new -k post content/en/posts/name-of-post.md
It will fill in the date and title for me.
Still left ToDo
There are a few things I still haven’t fixed because this is just a personal blog and I’m not making money off of it so I’m taking my time.
- Menus - a few menus were generated automatically for me with this theme, but I had a bunch of menu items in my Wordpress site (mostly revolving around my Pages). I have to figure out how to either automate that or how I will manually structure it.
- For some reason, most of my video embeds did not get converted correctly by the wp2hugo script. I have to manually fix those. Thankfully it’s only about 20 or so blog posts.
- write a script to convert the “front matter” field of “cover” to “featured_image”. This is another example of where the PaperMod theme wanted the featured image to be named “cover” but my current theme wants it to be named “featured_image”.
- a script to get rid of “&amp;amp;” which replaced a few ampersands in my post titles.
- an aforementioned script to figure out which images didn’t get ported over correctly by my first script.
- gaining a true understanding of Hugo’s image pipeline to get smaller image files and/or converting them to WebP, AVIF, or JPEG-XL. This allows you to have the full-sized images around as original backups, but only upload optimized files.
- Most of my YouTube embeds were not successfully converted. Need to write a script to convert those to shortcodes. Or at least a script to find those posts so I can fix them manually (a script might be more regex than I care to deal with)
- figure out how to add previous/next for blog posts - although there is already an issue for this in the theme github repo.
- create a shortcode for Mastodon embeds - this should be incredibly easy
- figure out how to turn on search
- get Google to find my new sitemap.xml
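For the YouTube item, the "at least find those posts" option could be sketched roughly like this. It assumes the unconverted embeds still contain a plain youtube.com/watch, youtube.com/embed, or youtu.be URL and that video IDs are 11 characters long; I haven't checked what wp2hugo actually left behind, so the regex and the function name are guesses to adapt (it won't catch youtube-nocookie.com embeds, for example).

```python
from pathlib import Path
import re

# Common YouTube URL shapes (assumed); the capture group is the 11-character video ID.
YOUTUBE_ID = re.compile(
    r"(?:youtube\.com/(?:watch\?v=|embed/)|youtu\.be/)([A-Za-z0-9_-]{11})"
)


def find_youtube_posts(posts_dir):
    """Return {post filename: [video IDs]} for posts that still contain raw YouTube URLs."""
    found = {}
    for post in sorted(Path(posts_dir).glob("*.md")):
        ids = YOUTUBE_ID.findall(post.read_text(encoding="utf-8"))
        if ids:
            # dict.fromkeys drops duplicates while keeping first-seen order
            found[post.name] = list(dict.fromkeys(ids))
    return found
```

Each ID it reports can then go into Hugo's built-in youtube shortcode by hand, e.g. {{< youtube VIDEO_ID >}}. Posts that already use the shortcode contain no URL, so they are skipped.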
Tips for others following in my footsteps
If you are switching from Wordpress to Hugo you’re probably a technical or technical-adjacent person. You probably also know that if you have far fewer images and videos than I do (I’ve got 8GB worth) you can host your site from Github or Gitlab, or use those to push to Digital Ocean or Netlify. So you’ll want to use git to manage changes to your site. If you’re going to try any of my scripts, I strongly suggest first committing the results of wp2hugo as an initial commit in git. Then create a branch to test out the scripts. If they work well, you can then merge the branch into main and keep going.
One of the neat things about Hugo (that works better in my mind than Wordpress) is that if you launch the local server (with the “hugo serve” command) you can get live previews as you save your file. I happen to have a multi-monitor setup so I can have the page open on one monitor while I type in the other. So you can run the script, see what it does to your site and then keep going.
This brings me to something important I learned! Do not run hugo (which builds or compiles your site) while you have hugo serve running. You will end up generating a site that has links to localhost rather than the correct URL.
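If you want a safety net before uploading, here is a small sketch that scans the generated site for links back to the dev server. It assumes the default output directory (public) and that the dev server address shows up as //localhost: or //127.0.0.1: in the generated HTML or XML; the function name is hypothetical. A post that legitimately mentions a localhost URL will also be flagged, so read the results before panicking.

```python
from pathlib import Path

# Address prefixes that suggest a page was built while "hugo serve" was running (assumed).
DEV_MARKERS = ("//localhost:", "//127.0.0.1:")


def find_localhost_links(public_dir):
    """Return generated .html/.xml files under public_dir that point at the dev server."""
    offenders = []
    for page in sorted(Path(public_dir).rglob("*")):
        if page.suffix not in {".html", ".xml"} or not page.is_file():
            continue
        text = page.read_text(encoding="utf-8", errors="ignore")
        if any(marker in text for marker in DEV_MARKERS):
            offenders.append(page)
    return offenders
```

Run find_localhost_links("public") after a build; an empty list means nothing obviously points at localhost.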
Things you might miss / Slight Annoyances with Hugo as compared to Wordpress
- being able to search for images/videos - how easy this is will depend on your operating system’s search abilities rather than the nice Wordpress media view. That said, with the amount of media I had, I found the Wordpress site to be incredibly slow when searching for a particular image anyway.
- An easy set of fields for filling out image metadata like alt tags, captions, etc which are important for accessibility (even though I was never as perfect at that as I wish I was).
- having the links fill in for you if you reference your other pages - you’ll have to either remember the page you want to link to or need to search your site to find it.
- themes being interchangeable as long as you don’t have a complicated setup - over the past 19 years, with very few exceptions, I could bounce between Wordpress themes (and often did). In fact, I track that on my Blog Post Series page (I’ll probably switch to using a Hugo series taxonomy for this). This almost never led to issues (except in how they would treat cover/featured images - depending on whether they were page width or the same width as the post). This is NOT the case in Hugo. Each theme can expect your pages to have different front matter, and the options page will potentially require many changes. If you like to change your themes often, Hugo might not be for you. (Unless you love making scripts to fix everything)
- complete control of comments together with your blog or software - although I haven’t really had readers writing comments on my blog in years (with a very, very small set of exceptions), there are lots of comments on my early pages (back when blogs were more in fashion than social media) which I haven’t figured out how to port yet. Hugo, like most (all?) static site generators, does not handle comments. (Makes sense; otherwise they would be dynamic sites, not static sites.) So you need to use another service (open source and/or commercial) to handle your comments.
- spell check - you can probably configure vim, emacs, kate, notepad++, etc to use spell check. But it’s not built-in as it is in Wordpress (via the browser)
- building the site can take a long time if you have tons of categories, tags, and images. Even with the cache-dir it takes 69623 ms (about 70 seconds) to build my site.
- config file not documented in the easiest way - there’s a relatively large cognitive load to figuring out the settings file. It’s documented, but it’s not always clear which settings are universal and which are theme-dependent.