October 27th, 2008
One question I see a lot of people asking is how to increase the rate at which Google crawls your website. The higher your Google crawl rate, the more important Google thinks your website is. That said, all you have to do to increase the crawl rate is increase the importance of your website, right? Well, yes, but here are a few more actions you can take to try to get your site crawled more often.
- Most importantly, update your content regularly and ping Google when you do. Try to update your website at least 3 times per week, if not more. Make sure Google knows there is new content each time, so it can tell that your site provides fresh content on a regular basis.
- Make sure all of your webpages and links are working. Finding broken links on a website is a big turnoff for Googlebot, so it is well worth your time to make sure everything is working on a regular basis.
- Keep your sitemap updated appropriately. The faster Googlebot can get through your site, the more pages it will get to, so make sure all of your important pages are on the sitemap and that unnecessary links are left off.
- Do your best to get backlinks from regularly crawled websites. Social bookmarking sites are good for this purpose. Generally, PageRank can serve as an indicator of how often Google crawls a website.
- Make sure each page of your website has unique title and meta tags.
These are just a few suggestions that might help. If you continue to add good content to your website on a regular basis, your crawl rate will increase over time. Making sure that you do the five items listed above should help get you there a little more quickly, though.
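To make the sitemap suggestion a little more concrete, here is a minimal sketch in Python of how a sitemap file could be generated from a list of pages. The example.com addresses, the dates, and the build_sitemap helper are all made up for illustration; only the XML namespace comes from the sitemaps.org protocol. Treat it as a starting point, not as the one right way to do it.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for a list of (url, last_modified) pairs.

    build_sitemap is a hypothetical helper written for this example.
    last_modified is expected as a YYYY-MM-DD string.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages: list only the important ones, leave the rest off.
important_pages = [
    ("https://www.example.com/", "2008-10-27"),
    ("https://www.example.com/products.html", "2008-10-20"),
]

sitemap_xml = build_sitemap(important_pages)
print(sitemap_xml)
```

Save the output as sitemap.xml (in UTF-8) and regenerate it whenever you add or remove an important page, so the lastmod dates stay accurate.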
Tags: crawl rate, google, googlebot, increase crawl rate, seo
Posted in Tips and Information
July 27th, 2008
Despite the importance of the robots.txt file in getting your website indexed with the major search engines, many webmasters don’t offer one on their site. What is the robots.txt file, you ask? If you don’t know, you are far from alone. The robots.txt file is a simple text file (no HTML) that is placed in your website’s root directory in order to tell the search engines which pages to index and which to skip.
When a search engine sends its webcrawler to your site, one of the first things the webcrawler will do is search the root directory for the robots.txt file. A correctly formatted robots.txt file consists of one or more records, each providing instructions for a particular search-bot. A record generally has two components. The first is the user-agent line, where the name of the search-bot is listed. The second is one or more “disallow” lines, which tell the webcrawler which files or folders should not be indexed (e.g., a cgi-bin folder).
If you currently have a website and do not have a robots.txt file, you can create one easily. As mentioned earlier, the files are plain text, so just open up Notepad and save the file as robots.txt. Most webmasters can use one record that will apply to all of the search engine crawlers. Once you have opened Notepad, enter the following:
User-agent: *
Disallow:
The “*” applies this rule to all bots. In this example, there is nothing listed in the disallow line. This tells the robot to index the entire site. You can also enter a folder path here, such as “/private”, if there is a folder that shouldn’t be indexed. This can be very useful if you are still testing a portion of your website or if a section is still under construction.
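If you want to see how a crawler might read these rules before you upload the file, Python’s standard library includes a robots.txt parser. The sketch below feeds it the allow-everything record from above and a second record that blocks “/private”. The bot name “ExampleBot” and the page paths are made up for illustration, and real search engines may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The record from the post: an empty Disallow line blocks nothing.
ALLOW_ALL = """User-agent: *
Disallow:
"""

# The same record with a folder path on the Disallow line.
BLOCK_PRIVATE = """User-agent: *
Disallow: /private
"""

open_site = build_parser(ALLOW_ALL)
restricted = build_parser(BLOCK_PRIVATE)

# "ExampleBot" is a hypothetical crawler name; "*" applies to any bot.
print(open_site.can_fetch("ExampleBot", "/private/page.html"))   # True
print(restricted.can_fetch("ExampleBot", "/private/page.html"))  # False
print(restricted.can_fetch("ExampleBot", "/index.html"))         # True
```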
Now that you know what should go into your robots.txt file, here are several common mistakes people make when creating these files. Do not enter free-form notes into the file, as they can confuse the webcrawler; if you must leave a comment, start it with “#”, which the robots.txt standard treats as a comment marker. Also, each record should always list the user-agent on the first line, followed by the disallow line(s). Do not reverse the order. Another common mistake is using the incorrect case. If the disallowed folder is /private, make sure your robots.txt file does not list the folder as /Private. It seems like a very minor issue, but paths are case-sensitive, so the wrong case means the rule will not match. Finally, the original robots.txt standard has no Allow command. Some crawlers, including Googlebot, accept Allow as an extension, but if you want a file that works for every webcrawler, tell it only what not to look at.
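Two of these mistakes, stray notes and a disallow line that comes before any user-agent line, are easy to check for automatically. Here is a rough sketch of such a check in Python. The lint_robots function is a hypothetical helper written for this post, not part of any library, and it assumes that blank lines separate records and that anything after a “#” is a comment. It does not catch wrong-case folder names, since it has no way of knowing what your folders are called.

```python
def lint_robots(text):
    """Return a list of (line_number, message) problems found in robots.txt text.

    Hypothetical helper: it only checks for lines that are not in
    'field: value' form and for Disallow lines with no User-agent above them.
    """
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            # A blank line ends the current record.
            seen_user_agent = False
            continue
        # Drop anything after "#", which marks a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # comment-only line
        field, separator, _value = line.partition(":")
        field = field.strip().lower()
        if not separator:
            problems.append((number, "not a 'field: value' line"))
        elif field == "user-agent":
            seen_user_agent = True
        elif field == "disallow" and not seen_user_agent:
            problems.append((number, "Disallow appears before any User-agent"))
    return problems


# A deliberately broken file: reversed order plus a stray note.
broken = """Disallow: /private
User-agent: *
remember to remove this folder later
"""

for line_number, message in lint_robots(broken):
    print(f"line {line_number}: {message}")
```

Run against the broken sample, this reports line 1 (the disallow line comes first) and line 3 (the stray note).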
If you are still curious about the robots.txt file, you can find many more complex examples online. Just try one of your favorite websites and look for their robots.txt file. For example, you can go to http://www.cnn.com/robots.txt. If you need help creating a robots.txt file for your site, there are plenty of places online that will create the file for you for free. One example is http://www.seochat.com/seo-tools/robots-generator/. Despite its apparent simplicity, this file can make or break your site’s chances with the search engines. Make sure you have your robots.txt file in place and correctly formatted today.
Tags: google, googlebot, search engine marketing, seo, webcrawlers
Posted in Rants and Updates