“So can you make sure that my new website appears in Google’s search results, on the first page?” It’s what clients tend to ask. They seem to think that one picks up the phone to have a little chat with Google to see if they can be persuaded to bump the site’s classification up a bit. No can do.

Nevertheless, there are ways and means to help Google classify your site properly – and higher. In the old days, it was commonplace to add a bunch of keywords to the META tags of a site so that search engines would believe the site was relevant to a lot of searches (the inclusion of porn-related keywords being a much-practised solution). Nowadays, however, Google all but completely ignores the keywords META tag. It’s better to leave out the META keywords altogether, since including irrelevant keywords will damage your reputation with Google rather than improve it.

In lieu of keywords, Google checks the actual content of your site (we’ll get to that in depth in a moment). It also checks the number of sites that link to yours (so-called backlinks). This number isn’t something you can improve overnight. Some social engineering is necessary to get other webmasters to link to your site. Link trading is commonplace: you link to my site, I link to yours. This is the most powerful way to get your site noticed by Google.

Indexation checks

To find out whether Google has actually included your site in its index (and which pages are included), you can have Google do the following search:

site:mywebsite.com

This performs a search within the domain of your site. It’ll show you the total number of pages indexed, and links to these pages. Note that for a new site, Google may take a while to get around to indexing it. You may need to be patient for a week or two. If your site still isn’t indexed, you may have been banned from Google for violation of policy. Also, you’ll have to let Google know that your site actually exists. This is something you can do through Google Webmaster Tools.
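If you check several domains regularly, the site: query can be turned into a ready-made search URL. The snippet below is a minimal sketch: the function name site_query_url is made up for this example, and it assumes the usual https://www.google.com/search?q=... address format. It only builds the URL; you still open it in a browser yourself.

```python
from urllib.parse import urlencode


def site_query_url(domain):
    """Return a Google search URL for the query site:<domain>.

    Assumption: Google accepts the query in the 'q' parameter of /search.
    """
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


# Example domain taken from the article
print(site_query_url("mywebsite.com"))
# prints https://www.google.com/search?q=site%3Amywebsite.com
```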

Brand check

If your site actually appears in Google, that’s a good start. But more specifically, you’ll want your site to be included on the first page of search results (preferably in position 1-3) when people search for your brand name. To check this, simply enter your company’s name in Google search. Unless your brand is a common name, you should be in a top position. If not, Google may have penalized you for violating its policy (check the Google Webmaster Guidelines!).

Accessibility

If you’ve followed best accessibility practices during the design of your website, then the site is probably accessible to a variety of browsers, vision impaired users, users without Flash, users without JavaScript, colorblind users etc. However, this does not mean that the site is accessible to search engines. When trying to index your site, Google will have a script (a so-called crawler) open your site and look at its contents. It’s your job to make sure the crawler can read all content and follow all links on your site. If JavaScript is required to access part of your site, then you’re out of luck: the crawler will ignore all JavaScript. Also, a bit of a no-brainer here, if you lock part of your site from public access, Google will obviously not be able to index that part.

One important part of your site that must absolutely be accessible to Google is its top navigation menu. If your menu is built with Flash, Google won’t follow any of the links it contains. The use of Ajax and frames is also not recommended for the same reason. If you must have a Flash menu, be sure to offer a secondary navigation menu with plain text links (maybe at the bottom of the page). If you are using images for your links, make sure that they have alt attributes. Even when the images have proper alt attributes, a secondary menu is recommended.
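The alt attribute advice is easy to check automatically. Here is a rough sketch using only Python’s standard html.parser module; the class and function names are invented for this example, and it simply reports every image inside a link that has no (or an empty) alt attribute. It makes no claim about how Google’s crawler itself reads the page.

```python
from html.parser import HTMLParser


class LinkImageAltChecker(HTMLParser):
    """Collects images inside <a> elements that lack a non-empty alt attribute."""

    def __init__(self):
        super().__init__()
        self._open_links = []  # href values of the <a> tags currently open
        self.missing_alt = []  # (link href, image src) pairs to fix

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._open_links.append(attrs.get("href"))
        elif tag == "img" and self._open_links:
            alt = attrs.get("alt")
            if not alt or not alt.strip():
                self.missing_alt.append((self._open_links[-1], attrs.get("src")))

    def handle_endtag(self, tag):
        if tag == "a" and self._open_links:
            self._open_links.pop()


def find_linked_images_without_alt(html):
    """Return (href, src) pairs for linked images without usable alt text."""
    checker = LinkImageAltChecker()
    checker.feed(html)
    checker.close()
    return checker.missing_alt


# Hypothetical menu markup: the second link image has no alt attribute
menu = (
    '<a href="/home"><img src="home.png" alt="Home"></a>'
    '<a href="/about"><img src="about.png"></a>'
)
print(find_linked_images_without_alt(menu))
# prints [('/about', 'about.png')]
```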

Content

In order for Google’s crawler to properly read all content of your site, static HTML is best. Nevertheless, you’ll probably want to use some spiffy techniques to spice up the visitors’ experience when visiting your site. There’s probably a part where Ajax is used to speed up navigation. If so, it’s no problem – as long as you provide a non-Ajax way that gets users (and crawlers) to the same page. This can be harder than it sounds and may require some heavy recoding…

robots.txt

Well-behaved search engines will check for the existence of the file robots.txt on your web server. Although not required for your site to function, it is good practice to include this file, as it tells the search engine (or robot) which pages it may visit and which pages should be skipped. The robots.txt file is a plaintext file. Typically, its contents are:

User-agent: *
Disallow:
Sitemap: http://www.mysite.com/sitemap.xml

This means: the rules apply to all user agents (that is, to all crawlers), and since the Disallow line is empty, no pages of my site are off limits. In addition, a sitemap (an overview of the structure of the site – see below) may be found in sitemap.xml. There’s a lovely site named robotstxt.org that explains all the ins and outs of the file, with a bunch of examples.
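To see how a crawler might interpret these rules, you can feed the file to Python’s standard urllib.robotparser module. This is only an illustration: the parser is Python’s, not Google’s, the helper name load_rules is invented here, and the second rule set with its /private/ path is a made-up example. The site_maps() call needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the article
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: http://www.mysite.com/sitemap.xml
"""

# Hypothetical variant that keeps crawlers out of one directory
RESTRICTED_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def load_rules(robots_txt):
    """Parse robots.txt text and return the ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
print(rules.can_fetch("Googlebot", "http://www.mysite.com/any/page.html"))  # True
print(rules.site_maps())  # ['http://www.mysite.com/sitemap.xml']

restricted = load_rules(RESTRICTED_ROBOTS_TXT)
print(restricted.can_fetch("Googlebot", "http://www.mysite.com/private/a.html"))  # False
```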

Sitemap

A sitemap is a document containing links to all or most pages of your website. It is a good idea to include a sitemap with your site, as search engines may use it to discover pages that they would otherwise not find (in particular, if these pages can only be reached through JavaScript or Flash). For optimal indexation, your sitemap should be in XML format and robots.txt should provide the URL of the sitemap. Although sitemaps are typically generated automatically by content management systems, you can easily create one yourself. Here’s one place to do it. Of course, auto-generated sitemaps are kept up to date automatically, while a one-off sitemap may get out of date quickly.
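For a small site, a one-off XML sitemap can also be produced with a few lines of Python. The sketch below assumes the sitemaps.org 0.9 format with only the required loc element per page; the function name build_sitemap and the page URLs are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages of the example site
pages = [
    "http://www.mysite.com/",
    "http://www.mysite.com/about.html",
]
sitemap_xml = build_sitemap(pages)

# Save as sitemap.xml so that it matches the URL given in robots.txt
with open("sitemap.xml", "w", encoding="utf-8") as handle:
    handle.write(sitemap_xml)
```

Remember that a file generated this way is a snapshot: rerun the script whenever pages are added or removed.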

The road ahead

Now that your site is properly found and indexed by the search engines, the next step would be to improve its content so that the site classification goes up. That will be the subject for a future article.