Let’s start by getting back to search engine 101 – what is crawling? Every search engine, be it Google, Bing or Yahoo, sends out crawlers (sometimes known as spiders, robots or simply bots).
The job of these crawlers is to follow links across the web. Every page they reach by following a link gets recorded in the search engine’s index (with a number of exceptions).
When a user searches using a particular search term, the engine shows them results from its index that the crawlers found on their journey across the web.
So simply put – the results you see in the search engine result pages (SERPs) are not a real time view of the web. They are a snapshot of pages from the last time a crawler visited them.
Therefore it is not only vital that your pages are crawled in the first place but also that they are crawled regularly to keep up-to-date with any changes. Crawl efficiency is all about ensuring this happens as seamlessly as possible.
The good news is that for small to medium websites – it’s pretty manageable. For sites with thousands of pages it can become much more challenging and all the more important to stay on top of.
There are a number of tools, such as Deep Crawl and Screaming Frog, that help you understand whether and how your pages get crawled. However, Google Webmaster Tools (GWT) is completely free and gives you all the detail you’ll need for a small to medium sized site.
If you’re having problems getting GWT set up for your site then don’t panic – just give us a call! Bing also provide a similar tool and if you’ve got the time, then it’s worth utilising this as well.
Once you’ve got your tools ready, it’s time to start improving your crawl efficiency. Here are five ways you can do exactly that.
What is a sitemap? It’s typically an XML file that sits on your web server and tells the search engine about the URLs and content of your website. It can also hint at how important particular pages are and how often they change.
So if you have an important page that you update on a regular basis, let’s say your home page, you’d want Google to crawl it as often as possible. A sitemap helps to encourage it to do just that.
Sitemaps can also tell a search engine when a page was last modified, helping it to judge how fresh the page is. Freshness is one of many factors Google uses to rank pages in the SERPs, so it’s well worth flagging when you’ve updated a page.
It’s easy to submit a sitemap using GWT, and if you’d like to create or manage one, Google is happy to tell you how.
If you’re not fortunate enough to have a CMS that automatically creates a sitemap for you, like the Intergage one, there are websites that can generate one. However, it’s important to update your sitemap and resubmit it whenever you make changes to the site.
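For a sense of what a sitemap actually contains, here is a minimal sketch in Python that builds one with the standard library’s xml.etree.ElementTree. It follows the sitemaps.org format: a <urlset> of <url> entries, each with a <loc> plus the optional <lastmod>, <changefreq> and <priority> hints mentioned above. The build_sitemap name, the page list and the example.com domain are all hypothetical; a CMS or sitemap generator would normally produce this file for you.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a string) for a list of page dicts.

    Each dict needs a 'loc' key; 'lastmod', 'changefreq' and 'priority'
    are optional hints and are only written when present.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for field in ("lastmod", "changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    xml_body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + xml_body


if __name__ == "__main__":
    # Hypothetical pages on a placeholder domain.
    pages = [
        {"loc": "https://www.example.com/", "lastmod": "2016-05-01",
         "changefreq": "daily", "priority": 1.0},
        {"loc": "https://www.example.com/about", "lastmod": "2015-11-20",
         "changefreq": "yearly", "priority": 0.3},
    ]
    print(build_sitemap(pages))
```

Note that search engines treat changefreq and priority as hints rather than instructions, which is why the sketch makes them optional.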
Google assigns a ‘crawl budget’ to every website, meaning only so many of its pages will be crawled in a given period. So you don’t want the spiders wasting their time on less important pages and missing the juicy ones.
It’s likely there are some URLs you just wouldn’t want appearing in SERPs, such as pages from a customer portal.
There are a number of ways to influence where crawlers go and what they index. If you’d like them not to index a specific page, you can add a robots <meta> tag with the attribute content="noindex" to the <head> of the page’s HTML.
It would look something like this…
<meta name="robots" content="noindex">
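There may well be a number of URLs on your site that you don’t want crawlers to visit at all. In this case you can create a robots.txt file that sits at the root of your web server, like a sitemap, and lists the paths crawlers should stay away from. Bear in mind that robots.txt controls crawling rather than indexing: a blocked URL can still turn up in SERPs if other pages link to it, and crawlers can’t see a noindex tag on a page they aren’t allowed to fetch.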
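A robots.txt file is plain text made up of User-agent and Disallow lines. The sketch below uses a hypothetical file that keeps all crawlers out of a customer portal and internal search results (the /portal/ and /search paths and the example.com domain are placeholders), then checks a couple of URLs with Python’s built-in urllib.robotparser. That parser follows the original prefix-matching rules, so it may not reproduce every extension Google supports, such as wildcards; treat it as a quick sanity check rather than a definitive answer to what Googlebot will do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of a customer portal
# and internal search results. The paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /portal/
Disallow: /search

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text, no network

for url in ("https://www.example.com/services",
            "https://www.example.com/portal/invoices"):
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "allowed" if allowed else "blocked")
```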
You can check which pages are being blocked in the ‘Blocked URLs’ section of GWT, or request that a URL be removed from SERPs using the ‘Remove URL’ section.
Google likes to show its searchers sites that are user friendly, and clicking a link that takes you to a page that doesn’t exist is anything but. That’s called a 404 error, and when Google finds one by crawling your links, it doesn’t like it one bit.
Lots of 404 errors won’t directly harm your rankings. However, they’re bad for site visitors, and it isn’t useful for the robots to crawl lots of empty pages when they could be crawling relevant content.
Head to the ‘Crawl Errors’ section of GWT to identify any problems. A 404 can be resolved by simply redirecting the missing URL to a relevant existing one.
Ask your webmaster to configure your server to perform a ‘301’ (permanent) redirect.
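Your webmaster will usually set up 301 redirects in the web server’s own configuration, but the logic is simple enough to sketch. The minimal Python WSGI app below assumes a hypothetical REDIRECTS map of removed paths to their closest relevant live pages. It answers known old URLs with ‘301 Moved Permanently’ and a Location header naming the new page, and returns a genuine 404 for anything else; on a real site this would sit alongside whatever serves your normal pages.

```python
# Hypothetical map of removed URLs to their closest relevant live pages.
REDIRECTS = {
    "/old-services": "/services",
    "/news/2014-launch": "/news",
}


def app(environ, start_response):
    """Minimal WSGI app: 301 for known moved URLs, 404 for anything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 = moved permanently; the Location header names the new URL.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    # No known replacement: return a genuine 404 rather than a fake page.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]


if __name__ == "__main__":
    # Call the app directly with a stub start_response instead of a server.
    def show_status(status, headers):
        print(status, headers)

    app({"PATH_INFO": "/old-services"}, show_status)
    # prints: 301 Moved Permanently [('Location', '/services')]
```

Keeping the map in one place makes it easy to review that every old URL points somewhere genuinely relevant.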
The ‘internal links’ and ‘links to your site’ sections of GWT tell you how many links point to each of your pages. Pages with more links pointing to them tend to get crawled more often, because they typically have a higher PageRank.
So check that your internal and external links point to your most important pages. If they don’t, you’ll need to review your external link building strategy and/or the structure of your internal links.
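If you want to cross-check those GWT figures against your own pages, counting internal links is straightforward. This sketch assumes you already have each page’s HTML in a dictionary keyed by URL (the pages, the example.com host and the count_inbound_links helper are hypothetical). It parses every <a href> with Python’s html.parser, resolves relative links, ignores external links and self-links, and counts how many distinct pages link to each URL. It only counts links; it makes no attempt to reproduce PageRank.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def count_inbound_links(pages, site_host):
    """Count, for each URL, how many distinct pages on the site link to it.

    pages maps a page URL to its HTML. Links to other hosts and links
    from a page to itself are ignored; #fragments are stripped.
    """
    counts = Counter()
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        targets = set()
        for href in collector.hrefs:
            target, _fragment = urldefrag(urljoin(page_url, href))
            if urlparse(target).netloc == site_host and target != page_url:
                targets.add(target)
        counts.update(targets)  # one count per linking page
    return counts


if __name__ == "__main__":
    # Three hypothetical pages on a placeholder domain.
    pages = {
        "https://www.example.com/":
            '<a href="/services">Services</a> <a href="/contact">Contact</a>',
        "https://www.example.com/services":
            '<a href="/">Home</a> <a href="/contact#form">Contact us</a>',
        "https://www.example.com/contact":
            '<a href="https://twitter.com/example">Twitter</a>',
    }
    for url, n in count_inbound_links(pages, "www.example.com").most_common():
        print(n, url)
```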
One of the best things you can do with GWT is the simplest – check how many of your pages have been indexed and at what frequency.
If you have 200 pages and 190 have been indexed then things are probably ticking along nicely. But if you’re getting low indexation numbers then perhaps you need to make some adjustments.
Are you blocking pages in your robots.txt file? Do you have nofollow attributes on links, or noindex or nofollow directives in your robots <meta> tags?
There could be a number of reasons why your pages are not getting indexed – head to the ‘Index status’ section of GWT to start your investigation!
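Some of the on-page causes listed above can be spotted automatically. The sketch below, built on Python’s html.parser, reports a robots <meta> tag containing noindex or nofollow and any links marked rel="nofollow". The audit_page and RobotsDirectiveAudit names are hypothetical, and the check only covers the HTML itself; robots.txt rules and the other possible causes still need checking separately.

```python
from html.parser import HTMLParser


class RobotsDirectiveAudit(HTMLParser):
    """Records robots <meta> directives and rel="nofollow" links in a page."""

    def __init__(self):
        super().__init__()
        self.meta_directives = set()
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # e.g. content="noindex, nofollow" -> {"noindex", "nofollow"}
            for directive in (attrs.get("content") or "").split(","):
                self.meta_directives.add(directive.strip().lower())
        elif tag == "a":
            rel_values = (attrs.get("rel") or "").lower().split()
            if "nofollow" in rel_values:
                self.nofollow_links.append(attrs.get("href"))


def audit_page(html):
    """Return a list of human-readable warnings for one page's HTML."""
    audit = RobotsDirectiveAudit()
    audit.feed(html)
    warnings = []
    if "noindex" in audit.meta_directives:
        warnings.append("robots meta tag says noindex")
    if "nofollow" in audit.meta_directives:
        warnings.append("robots meta tag says nofollow (links not followed)")
    for href in audit.nofollow_links:
        warnings.append(f"link to {href} has rel=nofollow")
    return warnings


if __name__ == "__main__":
    # Hypothetical page that would struggle to get indexed.
    page = ('<html><head><meta name="robots" content="noindex, nofollow">'
            '</head><body><a href="/partners" rel="nofollow">Partners</a>'
            '</body></html>')
    for warning in audit_page(page):
        print(warning)
```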
If your crawl frequency (found in the ‘crawl stats’ section) is low then you may need to update your site with new content. Consider also resubmitting an up-to-date sitemap.
Another key factor in improving your crawl efficiency is your page load times. However, that’s a blog for another day.
Crawl data often gets ignored by marketers and website owners who focus all their attention on traffic and conversion data.
However, how search engines find your site is key to generating traffic and subsequent conversions, so it forms a fundamental part of the process.
There’s a whole host of useful data in GWT. If you’d like to learn more about it – come along to one of my training courses and I’ll show you how!
Copyright © 2016 Intergage Ltd | All Rights Reserved | Registered in England | Company No. 03989761 | VAT No. 754 8431 12