Duplicate content is multiple pages containing the same or very similar text. Duplicates exist in two forms:
- Internal: When the same domain hosts identical or similar content.
- External: When you syndicate content from other sites (or allow the syndication of your content).
In both cases, duplicate content splits link authority and thus diminishes a page’s ability to rank in organic search results.
Say a website has two identical pages, each with 10 external, inbound links. That site could have harnessed the strength of 20 links to boost the ranking of a single page. Instead, the site has two pages with 10 links. Neither would rank as highly.
Duplicate content also wastes crawl budget and forces Google to choose which version to rank, a decision that is seldom wise to leave to the search engine.
While Google states that there is no duplicate content penalty, getting rid of such content is a good way to consolidate your link equity and improve your rankings.
Here are two good ways to remove duplicate content from search engine indexes — and eight to avoid.
2 Ways to Remove
To correct indexed duplicate content, consolidate link authority into a single page and prompt the search engines to remove the duplicate version from their index. There are two good ways to do this.
- 301 redirects are the best option. They consolidate link authority, prompt de-indexation, and redirect users to the new page. Google has stated that it assigns all link authority to the new page with a 301 redirect.
- Canonical tags point search engines to the main page, prompting them to transfer link equity to it. The tags work as suggestions to search engines — not commands like 301 redirects — and they don’t redirect users to the main page. Search engines typically respect canonical tags for truly duplicate content (i.e., when the canonicalized page has a lot of similarities to the main page). Canonical tags are the best option for external duplicate content, such as republishing an article from your site to a platform such as Medium.
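As a rough illustration of the two methods above, here is a minimal Python sketch. It is not tied to any particular platform: the REDIRECTS map, the example paths, and the function names redirect_app and canonical_tag are all hypothetical, and a real site would usually configure 301s in its web server or CMS instead.

```python
from html import escape

# Hypothetical map of duplicate paths to the single page that should rank.
REDIRECTS = {
    "/shoes-old": "/shoes",
    "/shoes/index.html": "/shoes",
}


def redirect_app(environ, start_response):
    """Minimal WSGI app: answer known duplicates with a 301, everything else with a 200."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # A permanent redirect sends both users and bots to the main page.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>Main page</body></html>"]


def canonical_tag(main_url):
    """Build the tag to place in the <head> of a duplicate page.

    Unlike a 301, this is only a hint to search engines and does not move users.
    """
    return f'<link rel="canonical" href="{escape(main_url, quote=True)}">'
```

For a quick local check, the app can be served with wsgiref.simple_server.make_server("", 8000, redirect_app). The canonical tag is the piece you would ask a syndication partner to include when it republishes your article.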
8 Methods to Avoid
Some options that attempt to remove duplicate content from search engine indexes are not advisable in my experience.
- 302 redirects signal a temporary move rather than a permanent one. While Google has stated that it treats 302 redirects the same as 301s, the latter is the best way to permanently redirect a page.
- Meta refreshes (executed by client-side web browsers) are visible to users as a brief blip on their screen before the browser loads a new page. Your visitors and Google may be confused by these redirects, and there’s no reason to prefer them over 301s.
- 404 error codes reveal that the requested file isn’t on the server, prompting search engines to deindex that page. But 404s also remove the page’s associated link authority. There’s no reason to use 404s unless you want to erase low-quality link signals pointing to a page.
- Soft 404 errors occur when the server uses a 302 redirect to send a bad URL to what looks like an error page, which then returns a 200 OK server header response. Soft 404 errors are confusing to Google, so it is best to avoid them.
- Search engine tools. Google and Bing provide tools to remove a URL. However, since both require the submitted URL to return a valid 404 error, the tools are a backup step after removing the page from your server.
- Meta robots noindex tags tell bots not to index the page. Link authority dies with the engines’ inability to index the page. Moreover, search engines must continue to crawl a page to verify the noindex attribute, wasting crawl budget.
- Robots.txt disallow does not prompt de-indexation. Search engine bots no longer crawl disallowed pages that have been indexed, but the pages may remain indexed, especially if links are pointing to them.
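To check which of these signals a duplicate URL is currently sending, a small audit helper can be useful. The sketch below is only a heuristic under stated assumptions: the function name audit_response, the finding labels, and the regular expressions are illustrative, and the patterns will miss some markup (for example, a meta tag whose content attribute comes before its name attribute). It takes a status code and HTML body that you have already fetched.

```python
import re

# Simple, deliberately loose patterns; these are assumptions, not a full HTML parser.
META_REFRESH = re.compile(r'<meta[^>]+http-equiv=["\']?refresh', re.I)
NOINDEX = re.compile(
    r'<meta[^>]+name=["\']?robots["\']?[^>]*content=["\'][^"\']*noindex', re.I
)
ERROR_TEXT = re.compile(r"page not found|error 404", re.I)


def audit_response(status, body):
    """Return a list of labels for signals worth replacing with a 301 or canonical tag."""
    findings = []
    if status == 302:
        findings.append("302 temporary redirect")
    elif status == 404:
        findings.append("404 (link authority is lost)")
    elif status == 200:
        if META_REFRESH.search(body):
            findings.append("meta refresh")
        if NOINDEX.search(body):
            findings.append("meta robots noindex")
        if ERROR_TEXT.search(body):
            # The page reads like an error page but the server says 200 OK.
            findings.append("possible soft 404")
    return findings
```

An empty list means none of these patterns matched. It does not confirm that the URL is set up correctly.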
Avoiding Duplicate Content
In its official documentation, Google recommends avoiding duplicate content by:
- Minimizing boilerplate repetition. For example, instead of repeating the same terms of service on each page, publish them on a separate page and link to it sitewide.
- Not using placeholders that attempt to make automatically generated pages more unique. Instead, spend the effort to create one-of-a-kind content on each page.
- Understanding your ecommerce platform to prevent it from creating duplicate or near-duplicate content. For example, some platforms minimize product-page snippets on category pages, making each page unique.
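Before applying any of this, it helps to know which pages are near-duplicates in the first place. One common way to estimate that is to compare overlapping word sequences (shingles) using Jaccard similarity. The sketch below is an illustration only, not Google's method: the function names, the three-word shingle size, and any threshold you choose are assumptions.

```python
import re


def shingles(text, size=3):
    """Split text into a set of overlapping word sequences of the given size."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of two texts: shared shingles divided by all shingles."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are trivially identical
    return len(a & b) / len(a | b)
```

A score of 1.0 means the two texts share every shingle, and 0.0 means they share none. Pages scoring above a cutoff you pick for your own site are candidates for consolidation or rewriting.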