When cross-posting content to other domains, we're taught to reference duplicate content with a canonical link. But why has Google Lighthouse been penalising us for it? I set out to solve this conundrum once and for all.
I cross-post blog content regularly on DEV.to, and I like to cross-post articles I write for companies to my personal blog site. This means that after a post is published in its original location, I publish it on another domain, word for word. This helps me get my content to more people. Cross-posting is perfectly legitimate, and when you do cross-post, you need to let Google know that your content is a duplicate of the original.
How to reference duplicate content
When cross-posting content to other domains, reference duplicate content with a canonical link in the head tag of the web page, like so.
<link rel="canonical" href="https://originaldomain.com/link/to/original-post">
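If your cross-posted pages are generated from templates, the tag above can be produced by a small helper. This is a minimal sketch, not part of any framework: the function names `escapeAttribute` and `canonicalLinkTag` are made up for illustration, and it assumes a runtime with the standard `URL` class (modern browsers and Node.js).

```typescript
// Hypothetical helper: escape characters that would break an HTML attribute value.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: build the canonical link tag for a cross-posted page.
// `originalUrl` must be the absolute URL of the post in its original location.
function canonicalLinkTag(originalUrl: string): string {
  // new URL() throws a TypeError for relative or malformed URLs,
  // which catches the mistake of passing a bare path.
  const url = new URL(originalUrl);
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Canonical URL must be http(s): ${originalUrl}`);
  }
  return `<link rel="canonical" href="${escapeAttribute(url.href)}">`;
}

console.log(canonicalLinkTag("https://originaldomain.com/link/to/original-post"));
// <link rel="canonical" href="https://originaldomain.com/link/to/original-post">
```

Requiring an absolute URL is deliberate here: on a cross-post, a relative path would resolve against the wrong domain.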
If you run regular Google Lighthouse checks, you might have noticed that you were penalised for pointing your canonical link to a different domain where the original content lives! But hold up. Isn't this what canonical links are for?
Canonical links on the same domain
Now, there are perfectly valid reasons to include canonical links that point to the same domain. Google Search Console tells us:
A canonical URL is the URL of the best representative page from a group of duplicate pages, according to Google.
Why might you have duplicate pages on your site? Take for example an e-commerce site, where your search page URLs might exist in duplicate forms. Without a specified canonical link, Google will choose one (at random, maybe) as canonical.
For the following URL examples, you should set your canonical link to https://shop.com/search.
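As a rough illustration of the idea, here is a sketch that maps several duplicate forms of a search URL to one canonical URL. The variant URLs and the function name `canonicalSearchUrl` are hypothetical, and the sketch assumes the duplicates differ only by query string, fragment, or trailing slash. Your own site may need different rules.

```typescript
// Hypothetical helper: collapse duplicate forms of a URL to a single canonical form
// by dropping the query string, the fragment, and any trailing slashes.
function canonicalSearchUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  // Strip trailing slashes, but keep "/" for the site root.
  const path = url.pathname.replace(/\/+$/, "") || "/";
  return url.origin + path;
}

// Made-up duplicate forms of the same search page.
const variants = [
  "https://shop.com/search?q=shoes",
  "https://shop.com/search?q=shoes&sort=price",
  "https://shop.com/search/#results",
];

for (const variant of variants) {
  console.log(canonicalSearchUrl(variant)); // https://shop.com/search
}
```

Each of these pages would then carry the same canonical link in its head, so the choice of the representative page is yours and not left to the crawler.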
The investigation begins
I was confused. And I was ashamed that my Lighthouse SEO scores were lower than I thought they should have been! (Oh, gamification, how you taunt me!) And so, I took to Twitter to investigate. Here's a thread started by Tamas, reaching out to Martin, a Developer Advocate at Google.
Martin suggests that Yahoo and Bing don't like cross-domain canonical links, a claim that was also referenced in the source code for Lighthouse. I wasn't happy with this! And so I continued down the rabbit hole, and found a light at the end of the tunnel.
Bing Webmaster to the rescue
I found the Bing Webmaster Guidelines, and the guidance for canonical links in 2021 stated:
Do not reuse content from other sources. It is critical that content on your page must be unique in its final form. If you choose to host content from a third party, either use the canonical tag (rel="canonical") to identify the original source or use the alternate tag (rel="alternate").
Given that Bing was recommending rel="canonical", and Yahoo uses results crawled from Bing, I opened an issue on the Google Lighthouse repo with my findings.
I opened a PR to Google Lighthouse
After a few months of discussion on the issue, we concluded that the advice for canonical links was indeed outdated, and that all the major indexers began to support cross-origin canonical URLs in 2009. This was great news! And so, I opened a pull request to remove the cross-origin check for canonical URLs, which was merged to main in November 2021.
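To make "cross-origin" concrete, here is a sketch of the kind of comparison such a check performs. This is not Lighthouse's actual code: the function name `isCrossOriginCanonical` and the example blog URL are made up for illustration, and it simply compares the origin (scheme, host, and port) of the page with the origin of its canonical URL.

```typescript
// Hypothetical helper: does the canonical URL live on a different origin than the page?
function isCrossOriginCanonical(pageUrl: string, canonicalHref: string): boolean {
  const page = new URL(pageUrl);
  // Resolve the canonical href against the page URL so relative hrefs also work.
  const canonical = new URL(canonicalHref, page.href);
  return canonical.origin !== page.origin;
}

// A cross-posted article pointing back to the original domain.
console.log(
  isCrossOriginCanonical(
    "https://myblog.example/posts/canonical-links",
    "https://originaldomain.com/link/to/original-post",
  ),
); // true

// A page pointing to a canonical URL on its own domain.
console.log(
  isCrossOriginCanonical("https://shop.com/search?q=shoes", "/search"),
); // false
```

A `true` result here is exactly the cross-posting case that canonical links are meant for, which is why treating it as a failure made little sense.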
Your scores are now improved
A few more months of waiting in excitement, and the code change is now available to everyone in Chromium browsers! Your SEO scores for pages that use canonical links that point to a different domain are now improved.
What's my one piece of advice after all of this? Question everything. You never know what it might lead to. You could end up improving the web for everyone!
I'm a live streamer, software engineer, and developer educator. I help developers build cool stuff with blog posts, tutorial videos, live coding and open source projects.