How to Find Your Orphan Pages
For Google and other search engines to index your pages, they need to know that the pages exist and where to find them.
This is usually accomplished in one of two ways:
- The crawler follows a link from another page.
- The crawler finds the URL listed in your XML sitemap.
A page without any links to it is called an orphan page.
Because search engines can’t reach an orphan page by following links from other pages, orphan pages often go unindexed and never show up in search results.
Even if your orphan pages are listed in your XML sitemap, they are still a problem for SEO.
With no internal links, no authority is passed to the pages, and search engines have no semantic or structural context in which to evaluate them.
Without any way of knowing where a page fits into your site as a whole, search engines have a harder time determining which queries the page is relevant for.
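To make the idea concrete, here is a minimal Python sketch of the comparison behind orphan detection: any URL you know about (for example, from your XML sitemap) that a link-following crawl never reached is an orphan candidate. The function name and the example.com URLs are hypothetical and used only for illustration.

```python
def find_orphan_candidates(known_urls, crawled_urls):
    """Return known URLs that the link-following crawl never reached."""
    return sorted(set(known_urls) - set(crawled_urls))


# Hypothetical data: URLs listed in a sitemap vs. URLs reached by a crawl.
sitemap_urls = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/old-landing-page",
]
crawled_urls = [
    "https://example.com/",
    "https://example.com/about",
]

orphans = find_orphan_candidates(sitemap_urls, crawled_urls)
print(orphans)
```

A plain set difference like this only works if both lists use the same URL form (same protocol, same hostname, same trailing-slash style), so it is worth normalizing the URLs before comparing them.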
In this post, we’ll explore how to find orphan pages on your site.
1. Identify Your Crawlable Pages
First, you’ll need a list of all of the URLs that can currently be reached by crawling your site’s links.
You will need an SEO spider to do this. I recommend Screaming Frog.
Whatever crawler you use, make sure it is set to crawl only pages that are indexable by search engines, meaning that it should not crawl pages that are noindexed or pages that are hidden from search engines by robots.txt.
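Most SEO spiders handle this through their own settings, but if you want to spot-check individual pages yourself, the two conditions can be tested roughly as follows. This is a sketch using only Python's standard library, not how any particular crawler works internally. The helper names, the sample robots.txt rules, and the sample HTML are assumptions made for illustration, and the noindex check only looks at the robots meta tag (it ignores the X-Robots-Tag HTTP header).

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def is_noindexed(html):
    """True if the page's robots meta tag contains a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def is_allowed_by_robots(robots_txt, url, user_agent="*"):
    """True if the given robots.txt text lets user_agent fetch the URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


# Hypothetical robots.txt and page markup.
robots_txt = "User-agent: *\nDisallow: /private/\n"
page_html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(is_allowed_by_robots(robots_txt, "https://example.com/blog/post"))
print(is_allowed_by_robots(robots_txt, "https://example.com/private/report"))
print(is_noindexed(page_html))
```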
Start the crawl from the homepage of the site, making sure to use the canonical URL, including the proper https or http and www versus non-www.
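If you are unsure whether a URL matches the canonical form, a small helper can rewrite it before you hand it to the crawler. The sketch below assumes the canonical version of the site is https with www, which is only an example; substitute whatever your site actually uses. The function name is hypothetical, and it assumes URLs without embedded credentials.

```python
from urllib.parse import urlsplit, urlunsplit


def to_canonical(url, scheme="https", use_www=True):
    """Rewrite a URL to the chosen protocol and www/non-www hostname."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Strip any existing "www." so it can be re-added (or left off) consistently.
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if use_www else bare
    # An empty path means the homepage, so normalize it to "/".
    return urlunsplit((scheme, host, parts.path or "/", parts.query, parts.fragment))


print(to_canonical("http://example.com"))
print(to_canonical("https://WWW.Example.com/blog?page=2", use_www=False))
```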
Once you have crawled your site, export the URLs to a spreadsheet.
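Your crawler's built-in export is usually the easiest route. If you are instead working from a raw list of URLs, a one-column CSV file opens in any spreadsheet program. Here is a minimal sketch; the function name, the "url" column header, and the sample URLs are assumptions, and it writes to an in-memory buffer where a real script would write to a file.

```python
import csv
import io


def write_url_list(urls, out):
    """Write de-duplicated, sorted URLs as a one-column CSV with a header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url"])
    for url in sorted(set(urls)):
        writer.writerow([url])


# Hypothetical crawl results, including one duplicate.
crawled = [
    "https://example.com/about",
    "https://example.com/",
    "https://example.com/about",
]

buffer = io.StringIO()
write_url_list(crawled, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, pass a file object opened with `open("crawled_urls.csv", "w", newline="")` in place of the buffer.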
2. Resolve 2 Common Causes of Orphan Pages
Before checking any tools or sources to find orphan pages, address two common causes of orphan pages right away.
Both causes come down to duplicate versions of a page that should all redirect automatically and consistently to a single URL.
If they don’t, it’s likely that some versions of the page are not linked to and…