Migrating WordPress

tl;dr Migrating an ActivityPub-enabled WordPress instance using export and import of XML files while maintaining ActivityPub subscribers:

  • Don’t try this
  • If you get it wrong, it will appear that you kept your followers, but in fact you have silently lost them all
  • The author ID numbers have to line up exactly
  • Then you can copy over the ActivityPub author key
  • Posts then do get delivered to Mastodon and Friendica
  • Replies from Mastodon don’t seem to work, but they do from Friendica

Purpose

About a year ago I decided to start a blog. Actually two blogs. At the time I maintained a small, idle WordPress site for a friend as a favour. I thought it would make sense to set up my new blogs, plus the existing site, as a multisite WordPress on a new server. I couldn’t be bothered finding a good host, so I used AWS. I reasoned that it would be easy to move to something else later.

Recently I discovered Yunohost, thanks to the excellent articles from Elena Rossini. This is definitely the way to self-host stuff. One thing it does particularly well is migration: make an archive, download it, upload it, restore it, flip the DNS, and it works. This process also works very well for splitting and merging services, which would be very useful if my friend ever wants to take back control of her website. I immediately wanted to migrate to it. At the same time I also wanted to move to Hetzner, since it’s much cheaper than AWS.

But Yunohost’s WordPress package doesn’t work well with WordPress multisite. You can use multisite, but it can’t handle different sites on different domains. Instead, the recommendation is to install several separate WordPress instances on Yunohost.

So my purpose was to extract each site from the multisite and establish it as a separate WordPress on Yunohost.

As far as possible I wanted to do this in a naïve way, without third-party plugins. I had already moved my friend’s site once before, using the standard WordPress export and import tools. I wanted to know if the same thing would work here. And I certainly wasn’t going to pay for a solution when I’m perfectly capable of migrating a blog myself.

The Snag

The problem is that the whole reason I set up blogs in the first place was to promote ActivityPub on WordPress. I don’t have anything interesting to say, but nevertheless I have picked up a couple of followers. It doesn’t speak well of ActivityPub as a solution if after only a year I break those subscriptions.

I know enough about ActivityPub to guess that this is likely to be a problem. At the very least there is the question of signing keys. In practice, every ActivityPub message that one server delivers to another is signed with the private key of the actor who initiated it. Would these private keys be correctly exported and imported?

First Attempt

To check that this worked I set up yet another site on my multisite, called test.exon.name, and subscribed to a user from my real Friendica account. This was intended to prove that the system works, without flooding my tiny number of subscribers with irritating test posts.

At first I simply did the export and import as I had before. Then I published a test post. And, no big shock, it did not arrive at Friendica.

Doing It Properly

So I moved away from using live services and set up a suite of test servers to investigate properly:

  1. test1 with a WordPress multisite, built to imitate my real site
  2. test2 with Yunohost hosting either a Friendica or Mastodon instance, with a follower following test1 just like the followers of my real blogs
  3. test3 with Yunohost hosting a new destination WordPress site holding just one of the multisite’s blogs

Naturally this required quite a few iterations, but eventually I proved that it could be forced to work.

Author IDs

I had a suspicion that author IDs would be an issue. When you create a user in WordPress, they get a sequential number as their author ID, the first user being 1. Then the URL of that author looks like https://blog.example/?author=1. And that’s where the ActivityPub profile information lives. You can see that information by fetching it on the command line (and piping it to jq to format it nicely):

$ curl -s -H "Accept:application/activity+json" 'https://blog.example/?author=1' | jq
{
  "@context": [
    ...
  ],
  "id": "https://blog.example/?author=1",
  "type": "Person",
  "attachment": [
    ...
  ],
  "name": "Alice Blogger",
  ...
  "url": "https://blog.example/author/alice/",
  "publicKey": {
    "id": "https://blog.example/?author=1#main-key",
    "owner": "https://blog.example/?author=1",
    ...
  },
  ...
  "alsoKnownAs": [
    "https://blog.example/?author=1",
    "https://blog.example/author/alice/",
    "https://blog.example/@alice"
  ],
  ...
  "webfinger": "alice@blog.example"
}

There’s a lot going on there. One thing to note is that the url field is https://blog.example/author/alice/. This is a much better URL. Because it contains a “human” name instead of a number determined by how many users were created before, it’s much easier to move it to another site and keep the same URL. The id field, however, is https://blog.example/?author=1, and the id is what identifies the actor to other servers. That could cause some problems.
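To make the id/url distinction concrete, here is a minimal Python sketch (standard library only) that checks whether an actor’s id depends on a numeric ?author= parameter. The function names and the trimmed-down actor dictionary are my own illustrations, not anything from the plugin.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def numeric_author_id(actor_url: str) -> Optional[int]:
    """Return N if the URL carries a numeric ?author=N parameter, else None."""
    values = parse_qs(urlsplit(actor_url).query).get("author", [])
    if len(values) == 1 and values[0].isdigit():
        return int(values[0])
    return None


def describe_actor(actor: dict) -> str:
    """Say whether an actor's id is tied to a WordPress user number."""
    number = numeric_author_id(actor["id"])
    if number is None:
        return f"id {actor['id']} does not depend on a user number"
    return (f"id {actor['id']} depends on user number {number}; "
            f"the url field is {actor.get('url')}")


# Trimmed copy of the actor document fetched above.
actor = {
    "id": "https://blog.example/?author=1",
    "url": "https://blog.example/author/alice/",
}
print(describe_actor(actor))
```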

I don’t think those results are really definitive, though. ActivityPub implementations generally find an actor’s URL via webfinger, and you can see the webfinger address in the results above. You can query webfinger with curl on the command line too.

$ curl -s -H "Accept:application/activity+json" 'https://blog.example/.well-known/webfinger?resource=acct:alice@blog.example' | jq .
{
  "subject": "acct:alice@blog.example",
  "aliases": [
    "https://blog.example/?author=1",
    "https://blog.example/author/alice/",
    "https://blog.example/@alice"
  ],
  "links": [
    {
      "rel": "self",
      "type": "application/activity+json",
      "href": "https://blog.example/?author=1"
    },
    {
      "rel": "http://webfinger.net/rel/profile-page",
      "type": "text/html",
      "href": "https://blog.example/?author=1"
    },
    {
      "rel": "http://ostatus.org/schema/1.0/subscribe",
      "template": "https://blog.example/wp-json/activitypub/1.0/interactions?uri={uri}"
    }
  ]
}

There you can see that the self link points to the bad URL, https://blog.example/?author=1, and so does the profile-page link.
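For illustration, this minimal Python sketch shows roughly what a remote server does with a handle. It builds the webfinger query URL, then picks the link with rel "self" and type application/activity+json out of the response. The function names are hypothetical. Real implementations wrap this in fetching, caching and error handling, all of which I have left out.

```python
import json
from typing import Optional
from urllib.parse import quote


def webfinger_url(handle: str) -> str:
    """Build the WebFinger query URL for a handle such as @alice@blog.example."""
    handle = handle.lstrip("@")
    _user, domain = handle.rsplit("@", 1)
    resource = quote(f"acct:{handle}", safe=":@")
    return f"https://{domain}/.well-known/webfinger?resource={resource}"


def actor_url_from_jrd(jrd_text: str) -> Optional[str]:
    """Return the href of the 'self' link that points at ActivityPub JSON."""
    for link in json.loads(jrd_text).get("links", []):
        if (link.get("rel") == "self"
                and link.get("type") == "application/activity+json"):
            return link.get("href")
    return None
```

Some servers advertise the self link with the type application/ld+json; profile="https://www.w3.org/ns/activitystreams" instead, so a more forgiving lookup would accept that too.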

So in this tangled mess of recursive pointers to different lumps of JSON, which URL will our followers’ hosts use to verify my identity when I publish something?

At this point I should probably dive into the ActivityPub spec. But I already know that the spec doesn’t specify how to identify an author’s inbox and outbox by name. The widely used alice@blog.example (or @alice@blog.example) format does not appear, and clients are expected to just know where those URLs live. So webfinger was tacked on as a solution for that problem. But in theory, how exactly should that be implemented? That’s not quite clear.

In practice it was easy to check: does it work? And the answer is that it doesn’t. When I imported into a WordPress where my user had a different ID from the original, the follower relationships simply didn’t import. When I gave the user the same ID, by carefully adding users in the same order, the import worked fine. And I can see the bad author ID URLs inside the XML items representing my follow relationships.
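Before importing, you may want to see which user numbers an export depends on. This minimal Python sketch counts the ?author=N references. It deliberately uses a regular expression rather than assuming anything about how the plugin structures its XML. As a result, it may also pick up unrelated links that happen to carry an author parameter.

```python
import re
from collections import Counter

# Matches ?author=N or &author=N, including the XML-escaped &amp;author=N.
AUTHOR_PARAM = re.compile(r"[?&](?:amp;)?author=(\d+)")


def author_ids_in_export(text: str) -> Counter:
    """Count how often each numeric author ID appears in an export file."""
    return Counter(int(n) for n in AUTHOR_PARAM.findall(text))
```

Running it over the whole export file shows at a glance which user IDs the destination has to reproduce.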

So that’s that. Give your authors the same IDs, or don’t bother. This alone rules the approach out for large established multisites. If you have 1000 authors overall and you want to split out a single site with just 10, you may need to create up to 990 dummy accounts, and carefully intersperse them with the real accounts.
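The bookkeeping itself is mechanical. This minimal Python sketch lists the accounts to create, in order. It assumes that the destination hands out user IDs strictly one at a time, and that you know which ID it will hand out next. Check both against your own database before relying on it. The placeholder-N login names are hypothetical.

```python
from typing import Dict, List


def creation_plan(keep: Dict[int, str], next_id: int = 1) -> List[str]:
    """List user logins in the order they should be created.

    keep maps each original user ID to the login that must end up with it.
    next_id is the ID the destination will hand out next; IDs are assumed to
    be allocated one at a time in sequence. Gaps get placeholder accounts
    with hypothetical names like 'placeholder-3'.
    """
    if keep and min(keep) < next_id:
        raise ValueError("an ID to preserve is already taken on the destination")
    last = max(keep, default=next_id - 1)
    return [keep.get(user_id, f"placeholder-{user_id}")
            for user_id in range(next_id, last + 1)]
```

My single author was user 2. Assuming the destination’s admin account already occupies ID 1, creation_plan({2: "mat"}, next_id=2) gives just ["mat"].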

Fortunately for me, I only ever created a single ActivityPub author, myself, who was the second account created. So this is easy to replicate.

Author Keys

The next problem is fairly clear. All ActivityPub messages are signed with a private key, so that the receiving server can verify that the message is authentic. The keys are held on the sending server on behalf of each user: this is not end-to-end authentication. When you export an ActivityPub site, does the resulting dump include the private keys?

Answer: no, it doesn’t.
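For a sense of what the receiving server actually checks, here is a minimal Python sketch of the draft-cavage HTTP Signatures scheme that Mastodon uses for server-to-server delivery. The sender signs a string built from selected headers and names its key with a keyId URL. Only the standard library is used, so the RSA-SHA256 signing step itself is left out. The inbox path, host and date are made-up values.

```python
import base64
import hashlib
from typing import Dict, List


def digest_header(body: bytes) -> str:
    """Value of the Digest header: SHA-256 of the body, base64-encoded."""
    return "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


def signing_string(method: str, path: str, headers: Dict[str, str],
                   signed: List[str]) -> str:
    """Join the signed pseudo-header and headers, one 'name: value' per line."""
    lines = []
    for name in signed:
        if name == "(request-target)":
            lines.append(f"(request-target): {method.lower()} {path}")
        else:
            lines.append(f"{name}: {headers[name]}")
    return "\n".join(lines)


def signature_header(key_id: str, signed: List[str], signature_b64: str) -> str:
    """Assemble the Signature header.

    signature_b64 would come from signing signing_string(...) with the
    actor's RSA private key (RSA-SHA256), which needs a crypto library and
    is not shown here.
    """
    names = " ".join(signed)
    return (f'keyId="{key_id}",algorithm="rsa-sha256",'
            f'headers="{names}",signature="{signature_b64}"')
```

The receiving server rebuilds the same string from the request it received. It then obtains the public key named by keyId, either by fetching it or from a cached copy, and verifies the signature. If the plugin follows the usual convention of using the publicKey id as keyId, that is the ?author=1#main-key URL from the actor document above.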

The WordPress export tool is a fairly odd beast, and I’m not quite sure why it’s built the way it is. In particular, there’s no way to include the media files themselves, so the export in no way constitutes a backup. Instead, when you import into another server, the importing server connects to the original server and downloads the full-size version of each image. This means the original server must still be live on the original domain name. That rules out a whole pile of use cases, such as any migration involving a conflict with the previous administrators. In my case, of course, that’s fine: I’ll just do it that way.

It’s also somewhat clear from the export form that users are not included, and there’s no way to export the user list. I would have guessed that private keys would arrive with such a user export file, but since users aren’t included at all, it’s no surprise this doesn’t work here.

Screenshot of the WordPress export form. Apart from the option "all content", there are options for posts, pages, followers, outbox, extra fields, inexplicably a second extra fields option, and finally media. There is no option to export users.

So, the only alternative is to hack the database. I use mysql on the command line because I’m a relic. The user keys are kept in the options table, rather awkwardly with a suffix indicating which user the key is for. The complication here is that on a multisite, there is a separate options table for each site. So you need to establish which site you’re talking about first. One way to do this is just guess the number and select the blogname to check. For example, here I establish that my test blog is site number 5:

MariaDB [blog]> select option_value from wp_5_options where option_name = 'blogname';
+----------------------+
| option_value         |
+----------------------+
| Matthew Exon Testing |
+----------------------+
1 row in set (0.000 sec)
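Rather than guessing one site at a time, you can ask for every site’s blogname in one query. The network-wide wp_blogs table also lists each blog_id with its domain and path. This minimal Python sketch only builds the SQL text. It assumes the default wp_ table prefix, and it relies on the main site (blog ID 1) using the unnumbered wp_options table. Only pass IDs that actually exist, because a missing table makes the whole query fail.

```python
from typing import Iterable


def options_table(blog_id: int, prefix: str = "wp_") -> str:
    """Name of a site's options table on a WordPress multisite."""
    if blog_id == 1:
        return f"{prefix}options"  # the main site uses the unnumbered tables
    return f"{prefix}{blog_id}_options"


def blognames_query(blog_ids: Iterable[int], prefix: str = "wp_") -> str:
    """One query that lists every site's blogname next to its blog ID."""
    parts = [
        f"SELECT {int(blog_id)} AS blog_id, option_value FROM "
        f"{options_table(int(blog_id), prefix)} WHERE option_name = 'blogname'"
        for blog_id in blog_ids
    ]
    return "\nUNION ALL\n".join(parts) + ";"
```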

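Then you can select the key itself. In the next step we need to enter it inside a double-quoted SQL string, with \n instead of literal newlines. The mysql client already takes care of the newlines. With -s it prints in a non-tabular format that writes newlines as \n and backslashes as \\. So the sed only needs to escape the double quotes. (This is not my actual key; I took it from my test server.)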

$ sudo mysql blog -sN -e 'select option_value from wp_3_options where option_name = "activitypub_keypair_for_mat"' | sed -e 's/"/\\"/g'
a:2:{s:11:\"private_key\";s:1704:\"-----BEGIN PRIVATE KEY-----\nMIIEvgIBADAN...QAB\n-----END PUBLIC KEY-----\n\";}
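The stored value is PHP’s serialize() format. Here a:2:{...} is an array with two entries, and each s:N:"..." is a string whose length N is counted in bytes. If copying changes the content, N no longer matches and PHP refuses to unserialize the value. That happens, for example, if a real newline becomes the two characters \n, or the trailing newline goes missing. This minimal Python sketch parses just this string-to-string subset of the format and checks the lengths. It is a hypothetical helper, not part of the plugin. It expects the raw stored value with real newlines, for example as printed by mysql with --raw, not the escaped output above.

```python
from typing import Dict


def parse_php_string_array(data: bytes) -> Dict[str, str]:
    """Parse a PHP-serialized array whose keys and values are all strings.

    Raises ValueError if a declared byte length or the entry count is wrong.
    """
    pos = 0

    def expect(token: bytes) -> None:
        nonlocal pos
        if not data.startswith(token, pos):
            raise ValueError(f"expected {token!r} at byte {pos}")
        pos += len(token)

    def read_int(terminator: bytes) -> int:
        nonlocal pos
        end = data.index(terminator, pos)  # ValueError if missing
        value = int(data[pos:end])         # ValueError if not a number
        pos = end + len(terminator)
        return value

    def read_string() -> str:
        nonlocal pos
        expect(b"s:")
        length = read_int(b":")  # declared length, counted in bytes
        expect(b'"')
        value = data[pos:pos + length]
        pos += length
        expect(b'";')  # fails if the declared length does not match
        return value.decode("utf-8")

    expect(b"a:")
    count = read_int(b":")
    expect(b"{")
    result = {}
    for _ in range(count):
        key = read_string()
        result[key] = read_string()
    expect(b"}")
    if pos != len(data):
        raise ValueError(f"unexpected trailing data at byte {pos}")
    return result
```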

Then update the key on the new single-site install:

MariaDB [wordpress]> update wp_options set option_value = "a:2:{s:11:\"private_key\";s:1704:\"-----BEGIN PRIVATE KEY-----\nMIIEvgIBADAN...QAB\n-----END PUBLIC KEY-----\n\";}" where option_name = "activitypub_keypair_for_mat";
Query OK, 1 row affected (0.005 sec)
Rows matched: 1  Changed: 1  Warnings: 0
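If you would rather not rely on the mysql client’s escaping plus sed, the same UPDATE can be generated from the raw option value. This minimal Python sketch escapes the characters that mysql_real_escape_string() escapes and builds a single-quoted literal. It assumes the server is not running in the NO_BACKSLASH_ESCAPES SQL mode, and the table and option names are the ones from my example. Also note that an UPDATE only changes an existing row. The plugin on the new site must therefore already have created its own keypair option, as the "Rows matched: 1" above shows it had.

```python
# Characters that mysql_real_escape_string() escapes, and their replacements.
_MYSQL_ESCAPES = {
    "\\": "\\\\",
    "'": "\\'",
    '"': '\\"',
    "\0": "\\0",
    "\n": "\\n",
    "\r": "\\r",
    "\x1a": "\\Z",
}


def mysql_string_literal(value: str) -> str:
    """Quote value as a single-quoted MySQL/MariaDB string literal."""
    return "'" + "".join(_MYSQL_ESCAPES.get(ch, ch) for ch in value) + "'"


def update_keypair_sql(option_name: str, raw_value: str,
                       table: str = "wp_options") -> str:
    """Build the UPDATE that copies a keypair option onto the new site."""
    return (f"UPDATE {table} SET option_value = {mysql_string_literal(raw_value)} "
            f"WHERE option_name = {mysql_string_literal(option_name)};")
```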

You have to do that for every user that has followers. You can check that the key is correct by retrieving the public key on the command line for both the old and new servers:

$ curl -s -H "Accept:application/activity+json" https://blog.example/author/mat/ | jq .publicKey
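The same comparison can be scripted. This minimal Python sketch fetches an actor document with the ActivityPub Accept header. It then compares the publicKeyPem values with whitespace removed, so that line-ending differences don’t cause false alarms. The function names are my own; pass it the old and new author URLs.

```python
import json
import urllib.request


def fetch_actor(url: str) -> dict:
    """Fetch an actor document, asking for ActivityPub JSON."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/activity+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def public_key_pem(actor: dict) -> str:
    """Extract the PEM public key with all whitespace removed."""
    return "".join(actor["publicKey"]["publicKeyPem"].split())


def same_public_key(old_actor: dict, new_actor: dict) -> bool:
    """True if both actor documents publish the same public key."""
    return public_key_pem(old_actor) == public_key_pem(new_actor)
```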

Results

I found that with these two fixes (using the same author ID and copying over the keys) I was able to publish from the new WordPress and have the posts picked up by followers on both Friendica and Mastodon.

However, while on Friendica I could like and reply to WordPress posts, on Mastodon those likes and replies never made it to WordPress. I could debug this, but at that point I decided it was good enough for me. No-one ever replies to me anyway.

The worst thing I found was that if you don’t fix the signing keys, the results are very misleading. The followers list appears to import correctly, even including the avatars of the people following. But when you publish a new post, it is silently not delivered.

I think this is pretty bad. It’s fair enough that the default export and import tools aren’t supported as an account migration solution. But it’s entirely reasonable for a user to guess that this would work. And it really does appear to work at first glance! In fact, it appears to work for long enough that a user might shut down their old server permanently. There is a real possibility of this failure leading to a user losing their private keys and a possibly very large follower list.

Implementations

Developers have quite the love/hate relationship with ActivityPub. There are now hundreds of implementations. The major competitor protocol is AT, and while it has far more users, it has far fewer implementations. So there must be something in there that developers particularly like. But developers also love complaining about it.

One of the major complaints is how difficult all of the authentication handshaking is to get right, particularly given how loosely specified it is. We already saw that translating alice@blog.example into a URL is just missing, and webfinger had to be retrofitted for that purpose. And we also saw how that leads to confusingly contradictory URLs – without an overarching spec, how is a developer supposed to understand the relationships between fields in two different protocols?

More than this, though: it’s not specified which fields should be cached, which should be held permanently, and which are ephemeral. For my purposes, that means some remote servers likely hold on to https://blog.example/?author=1 and use it to retrieve https://blog.example/author/alice/, and some likely do the opposite. Others may hold on to both permanently, or even cache the IP address too! Who knows?

And bear in mind that both Friendica and Mastodon predate the ActivityPub standard (Mastodon originally federated over OStatus), and many other implementations have ActivityPub grafted on as an afterthought, often via separate plugins. Of course, they all have their own peculiarities in how they approach federation.

It’s therefore no shock to me that while both Friendica and Mastodon receive updates from the migrated WordPress fine, only Mastodon has problems replying. This is exactly the kind of implementation-specific madness I have come to expect. I already know that WordPress just doesn’t bother verifying the signatures of replies at all. And while I could dive down the rabbit-hole of debugging the difference between Friendica and Mastodon, it’s unlikely to lead to any important new knowledge for me. It will more likely just turn out to be a random gotcha.

This is why I’m a keen supporter of the FediTest project. This is an attempt to build an automated test system that can plug into any ActivityPub implementation and check how it really behaves. This includes testing strictly against the spec, while also checking optional or conventional behaviours. That could eventually include testing what happens when an account identity moves to a different server. Unfortunately the technical challenges are immense, they only seem to have scratched the surface of the problem so far, and their funding from NLnet appears to have run out.

Account Migration

Ultimately account migration is a feature that WordPress just doesn’t support. When in a conspiratorial frame of mind I might speculate about vendor lock-in, but that way madness lies. Not every feature exists, that’s life.

Perhaps more seriously, ActivityPub itself also just does not support account migration, even though this is explicitly given as a core motivation for federated social networks in the first place. Some implementations may include features to automate the process, but these are outside the core spec, and the results are known to be unreliable. Some federated services have “nomadic identity” built-in, but this is not a mainstream feature. The elephant in the room for the Fediverse was going to be Threads, who promised full integration including account portability by 2025, but here we are and Threads is barely even a presence in the Fediverse, and account portability seems to have fallen off the roadmap.

I knew all of this when I set up my blog, and half-expected to have to do manual hackery as above when the time came. So I’m only mildly disappointed that these things haven’t improved much. Still. Mildly disappointed.

Improvements

I would suggest some improvements to the WordPress ActivityPub implementation. Although I have messed around with the internals of the plugin a little, I can’t say if these are actually good ideas, or if they would solve any of the problems I outlined. So I’m not even going to officially submit these as feature requests. I’m just noting them here as ideas.

  1. The webfinger implementation should as far as possible use URLs with user logins instead of user IDs.
  2. There should be an option to export user data in the built-in WordPress export tool.
  3. When exporting followers, the public key of the account that was followed should be included.
  4. When importing followers, if the public key of the selected account does not match the key of the account that was originally followed, that follow should be dropped and an error printed.
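To make the third and fourth ideas a little more concrete: suppose each exported follower carried the public key of the account it followed. The importer could then split the list like this. This is a hypothetical Python sketch. As far as I know the export has no such field, and ExportedFollower and followed_key are names I made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExportedFollower:
    actor_id: str       # the follower's actor URL
    followed_key: str   # hypothetical: PEM key of the account they followed


def normalise_pem(pem: str) -> str:
    """Drop all whitespace so line-ending differences don't matter."""
    return "".join(pem.split())


def split_followers(followers: List[ExportedFollower], local_public_key: str
                    ) -> Tuple[List[ExportedFollower], List[ExportedFollower]]:
    """Keep follows made to this key; return the rest so they can be reported."""
    local = normalise_pem(local_public_key)
    kept: List[ExportedFollower] = []
    dropped: List[ExportedFollower] = []
    for follower in followers:
        target = kept if normalise_pem(follower.followed_key) == local else dropped
        target.append(follower)
    return kept, dropped
```

Reporting the dropped follows, rather than quietly keeping them, is what would turn the silent failure described above into a visible one.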
