SEO best practices and requirements for modern sites | John Mueller

All right. So initially, I was contacted by the Angular team because we hear these questions from them all the time: what about search? And what do Angular sites need to do for search? Because there are lots of weird misconceptions out there, from our point of view, specifically around search. It starts with things like this; we see it all the time: JavaScript SEO. If you're using Angular, of course you need special SEO, right? According to the headlines. It gets even worse: is Angular even compatible with SEO? So it's good to watch out for this.

And this is one that particularly hurts; it's, like, these both come from the same company, how come they don't work together, right? And some people are getting really close to the truth, and they're asking, maybe this is due to those curly brackets. So, I think, from a search point of view, if you all just got rid of the curly brackets, then I'm sure SEO would just work. Well, I guess there are a few other things to look into. So, just briefly about me: I've been with Google for over nine years now. I work mostly within search. I'm not an Angular developer, so most of the things we'll be going through here are focused more on search in general and on JavaScript sites, JavaScript frameworks in particular, and not specific to Angular.

There's a brief section on Angular as well, though. And this is Googlebot, who will be joining us for the rest of the presentation. All right. So I'll be looking at, in general, what search needs; a bit about rendering, because this is a really big and really important topic for us; quickly, what you could do to fix issues that you've run into so far; and some useful tools along the way. So, looking at the minimum requirements for search, one of the things that I thought would be kind of useful is to understand briefly how search actually works: how we work with indexing, how we pick up the content that we can show in the search results. And all of that essentially starts with a bunch of URLs. We go off to the Internet to render these URLs, kind of like a browser, and the content we pick up there, we take for indexing. So these steps, not in any particular detail, but the steps themselves, give you a little bit of an understanding of what you might need to watch out for when it comes to search.

So in particular, search, like I mentioned, starts with the URLs. That's kind of the most important thing for us. We really need to have unique, indexable URLs for each piece of content that you have. And when you talk with SEOs, they always say, oh, you need to put this on your page and that on your page, but from our point of view, the URLs are the most important part.

It's not even so much what's on the page; if we can crawl the URLs, then we can still show them in search. So we crawl the URLs to pick up content and links. Links in particular are important for Google, because that's how we discover the rest of your website, the rest of your web app. So we start with your home page, see all of these links on the page, and try to follow them to each individual piece of content.

So, since we're talking about URLs, this is one thing that I see as being problematic with a lot of single-page apps, and with some Angular apps as well, in that it's really important to understand what's a good URL and what's a bad URL for Google. We essentially look at the path and the file name; parameters are fine if you have parameters as well.

It's important that you have one URL per piece of content. So if you have multiple states within your app, within your website, then all of those states need to have individual URLs. If you have translated content, say in English, German, and French, then all of that needs to have separate URLs as well.

Bad URLs: this is kind of an antipattern that we see a lot with Angular sites, and sometimes with single-page apps in general, where either you have one URL for multiple states within your website, within your content, or you use a hash as a way of identifying pieces of content within the website. For us, if we see the hash sign there, that means the rest is probably irrelevant, and for the most part we'll drop it when we try to index the content. So if you're developing locally and you use hash-based URLs to do that, that's fine. But when you push it to production, when you make it live, and if you want that content actually visible in search, then it's important that you use the more static-looking URLs.
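
As a minimal sketch of what that looks like in an Angular app (the routes and component names here are hypothetical), you would configure the router to use regular path-based URLs rather than the hash-based location strategy:

```typescript
// Hypothetical routing module: path-based ("HTML5") URLs instead of hash URLs.
import { NgModule } from '@angular/core';
import { RouterModule, Routes } from '@angular/router';

import { ProductListComponent } from './product-list.component';
import { ProductDetailComponent } from './product-detail.component';

const routes: Routes = [
  { path: 'products', component: ProductListComponent },       // /products
  { path: 'products/:id', component: ProductDetailComponent }, // /products/123
];

@NgModule({
  // useHash: false (the default) produces URLs like /products/123 instead of
  // /#/products/123, so every state gets its own indexable URL.
  imports: [RouterModule.forRoot(routes, { useHash: false })],
  exports: [RouterModule],
})
export class AppRoutingModule {}
```

With path-based URLs like this, the server also needs to return the app shell for deep links such as /products/123 rather than a 404, so that those URLs work when requested directly.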

And another antipattern that we see a lot is irrelevant URL parameters. Session IDs are something that used to be really common. Something we still sometimes see is URLs that have login information, for example, or any kind of private information. It's not that we'll go out and try to find these URLs, but they may show up in search, and suddenly they'll be in an article somewhere, like, oh, if you do this weird Google search, you can find all of the login information for this website.

So that's something to avoid. Okay. So, canonical URLs. Has anyone heard of canonicalization? Some of you have; some of you have been reading the SEO-type information out there. So this is one of the main things that we focus on when it comes to indexing content in search, in that it's really common for websites to have multiple versions of their content available under multiple URLs. And for us, it's important to find that one unique URL that is kind of the canonical, the main key for this piece of content. We try to do that automatically, and for the most part we do it really well. But you can take control of that as well on your site, if you know exactly what types of URLs you want to have in there.
So you can use link rel=canonical elements; there are some other things here that you can do as well. It's really important for us that you're consistent with your canonicals. So if you choose one URL as canonical, or one type of URL, make sure you use that exact type of URL across your whole website, so that we don't run into multiple references to the same content in different ways.
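
As one way of doing that in an Angular app, here's a minimal sketch (the service name is hypothetical, and details vary between Angular versions); the same thing can also simply be a static link element in the page head:

```typescript
// Hypothetical helper that sets <link rel="canonical"> for the current page.
import { Inject, Injectable } from '@angular/core';
import { DOCUMENT } from '@angular/common';

@Injectable({ providedIn: 'root' })
export class CanonicalService {
  constructor(@Inject(DOCUMENT) private doc: Document) {}

  setCanonical(url: string): void {
    // Reuse an existing canonical link element if there already is one.
    let link = this.doc.querySelector<HTMLLinkElement>('link[rel="canonical"]');
    if (!link) {
      link = this.doc.createElement('link');
      link.setAttribute('rel', 'canonical');
      this.doc.head.appendChild(link);
    }
    // Always point at the one URL format you've chosen for this content.
    link.setAttribute('href', url);
  }
}
```
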
All right. Hash-bang URLs, Ajax crawling. Has anyone heard of this? Implemented it, perhaps? Some hands. Okay. Great. So this was, I think, a really great idea back in the day. We came up with it around 2008, 2009, when essentially Googlebot didn't have the capability to render pages.

So we thought we would come up with a special scheme where, if we recognized kind of a hash-bang in a URL, we would crawl it in a different way: we would expect you to render the pages yourself, and we would take that rendered version and use it for indexing. Back in the day, this was a great thing to do, because there were no real other options for JavaScript-based sites. But in the meantime, things have gotten a lot better, and there are better ways to handle this.

And additionally, over the years, we noticed that this caused a lot of pain for webmasters, because they essentially had two versions of their site that they had to maintain separately. And if there was an issue in the Googlebot version of the site, then that could cause problems in search that they would never notice when they looked at the site themselves. So Googlebot has gotten better at rendering pages. Essentially, in the meantime, Googlebot has gotten to be pretty smart about rendering pages, kind of like a browser. It has a fairly modern browser; it can, for the most part, render most modern websites well. But there are some exceptions, which are kind of listed here. On the one hand, there are specific features that are currently not supported, where you could use polyfills to solve that. That includes promises, service workers, the Fetch API, and local storage. These are things that, to some extent, don't really make sense for Googlebot to use either.

So it's not that Googlebot would be able to run a service worker for every website out there and keep it updated; we kind of expect you to function well without service worker support. So polyfills would work here, or progressive enhancement and graceful degradation techniques. We also don't fully support ES6, so if you're transpiling your code, make sure you transpile it to ES5 if you need Googlebot to actually render the pages for you. Finally, there's the robots.txt standard, which tells search engines which URLs they're allowed to crawl and which they're not allowed to crawl.
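
As a purely illustrative example (all paths here are hypothetical), a robots.txt that keeps page URLs and the resources they need crawlable might look like this:

```
# Hypothetical robots.txt: keep the scripts and API responses the pages need
# for rendering crawlable, and only block genuinely private endpoints.
User-agent: *
Allow: /assets/
Allow: /api/products/
Disallow: /api/internal/
```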

And for us to be able to render your pages properly, we need to be able to crawl all of the embedded content on your pages. And finally, one last thing that we sometimes see is this pattern of making clickable elements on a website that aren't really links. For us, that's kind of problematic, because we don't recognize those as links, and so we can't follow them to the rest of your content. So if you're making links between pieces of your website, make sure that they are rendered as <a> elements.
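
As a small sketch of the difference in an Angular template (component and route names are hypothetical): the first element renders as a real anchor with an href that Googlebot can follow, while the second is just a click handler it will never discover.

```typescript
import { Component } from '@angular/core';

@Component({
  selector: 'app-nav',
  template: `
    <!-- Crawlable: routerLink on an <a> renders href="/products/42" -->
    <a [routerLink]="['/products', 42]">Product 42</a>

    <!-- Not crawlable: no <a>, no href, only a JavaScript click handler -->
    <span (click)="open(42)">Product 42</span>
  `,
})
export class NavComponent {
  open(id: number): void {
    // Imperative navigation would go here; hypothetical placeholder.
  }
}
```
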
So, as I mentioned before, we try to render pretty much all of the pages that we crawl and index, which is a ton of pages, but sometimes that doesn't work, which could be due to some of the things that I mentioned before. And in those cases where we can't render the content, what will happen is we'll fall back to the raw HTML version that you serve us as well.

So for single-page apps, that could be problematic, in that if we fall back to the HTML page, for the most part there's no real content on those pages. So what will happen there is that we'll recognize all of these URLs as being duplicates, because it's the same HTML that we receive, and over time we'll fold them into one URL to try to solve the problem for the webmaster, which is probably not what you're trying to do.

So this is one of those things where, if you notice your pages dropping out of Google's index, it might be that we're just not able to render them anymore and that we're falling back to the raw HTML version that we receive from your website.

Another really important aspect when it comes to search engines is meta tags. For the most part, we don't need meta tags for ranking, but sometimes you need to tell us how you want these pages actually used. So you can restrict whether or not we show a cache link in the search results, or whether or not this URL should be indexed at all. That's all stuff you can do in the head of the page with meta tags.

You can also link different language versions together, and you can also link your desktop site and your mobile site if you have separate URLs for those. And for all of these, we require the meta tags to be in the head of the page. And sometimes we see sites that have an HTML page with a script tag at the top of the head that injects some content, and after rendering the page, essentially the head is kind of broken. And then the meta tags that we would otherwise see in the head slip into the body, and we can't take them into account for indexing.
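
For the robots-type meta tags, one way to manage them in an Angular app is the Meta head service; here's a minimal sketch (the component and the tag values are hypothetical, and the import path can differ between Angular versions):

```typescript
// Hypothetical component that sets head meta tags for its page.
import { Component } from '@angular/core';
import { Meta } from '@angular/platform-browser';

@Component({
  selector: 'app-terms',
  template: `<h1>Terms and conditions</h1>`,
})
export class TermsComponent {
  constructor(meta: Meta) {
    // Keep this page out of the index and don't show a cached copy.
    meta.updateTag({ name: 'robots', content: 'noindex, noarchive' });
    meta.updateTag({ name: 'description', content: 'Our terms and conditions.' });
  }
}
```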

So this is something to kind of keep in mind and think about as you build up your site; maybe test some of the templates that you have to make sure that they don't break the head of the pages.

Hidden content is a topic that comes up all the time. We call it hidden text; for the most part it's not someone trying to do something malicious and actively hide something on a page, it's a UI element. So this could be a tab interface, for example, or it could be a click-to-expand element on a page. And for us, when we render a page and we notice that some of the content is not visible by default, we assume that it's not primary content, and we'll kind of devalue it in search. So if there's something that someone has to click on to make visible, make sure it's visible by default, or move it to a separate URL so that it can be indexed separately. And additionally, if you have something that's not even loaded by default, that requires an event, some kind of interaction from the user, then that's something we might not pick up at all, because Googlebot isn't going to go around and click on various parts of your site just to see if anything new happens.

So if you have something that you have to click on, like this 'click to read more', and when someone clicks on it there's actually an Ajax call that pulls in the content and displays it, that's probably something we won't be able to use for indexing at all. Again, if this is important content for you, move it to a visible part of the page, or maybe move it to a separate URL.
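
As a small sketch of the difference in an Angular template (the component and text are hypothetical): content that is merely hidden is still in the rendered page, while content behind *ngIf plus an on-click fetch never is.

```typescript
import { Component } from '@angular/core';

@Component({
  selector: 'app-description',
  template: `
    <!-- Content is always in the DOM; [hidden] only toggles visibility.
         Googlebot can see it, though non-visible text may get less weight. -->
    <div [hidden]="!expanded">{{ fullText }}</div>

    <!-- With *ngIf the collapsed content is not rendered at all; if it is
         also only fetched on click, Googlebot will likely never see it. -->
    <div *ngIf="expanded">{{ fullText }}</div>

    <button (click)="expanded = !expanded">Read more</button>
  `,
})
export class DescriptionComponent {
  expanded = false;
  fullText = 'The full product description lives here.'; // hypothetical content
}
```
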
All right. So I mentioned that Googlebot is really good at rendering. But sometimes having Googlebot handle rendering can be a bit tricky, because it's really hard to debug what Googlebot is actually doing.

So some sites render just for Googlebot: they serve Googlebot prerendered content, which is similar to the Ajax crawling scheme. From our point of view, this is sometimes tricky, and it leads to regular problems, in that the indexing team will come to us and say, hey, you need to contact this big website, they're serving us rendered content, but it's totally broken. So this is sometimes really tricky to debug and to maintain. So our recommendation is to serve prerendered content to all users, which is essentially isomorphic JavaScript, and which is something Angular Universal kind of lets you do as well. The idea here is that you serve the prerendered content to all users when they come to your page for the first time. Googlebot can see the prerendered content immediately, other web services that interact with your website can see the prerendered content, and everything else is then handled on the client side afterwards.
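
A minimal sketch of what that server side can look like with Angular Universal, assuming a server-build AppServerModule and a built index.html (the file names, module names, and import paths here are assumptions that vary by Angular version; real setups typically use the @nguniversal helper packages):

```typescript
// Sketch: render each requested route to HTML on the server for the first
// request; the client-side app takes over in the browser afterwards.
import 'zone.js/node'; // zone import path differs in older Angular versions
import * as express from 'express';
import { readFileSync } from 'fs';
import { renderModule } from '@angular/platform-server';
import { AppServerModule } from './app/app.server.module';

const app = express();
const indexHtml = readFileSync('dist/browser/index.html', 'utf8');

app.get('*', async (req, res) => {
  const html = await renderModule(AppServerModule, {
    document: indexHtml,   // template to render into
    url: req.originalUrl,  // route requested by the user or by Googlebot
  });
  res.send(html);
});

app.listen(4000);
```
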
All right. So fixing these issues is kind of easy once you've been able to figure out what those issues are. The first one I'd mention here is to really double-check the non-Google dependencies as well, because this is something that we often see: people say, oh, I'll just fix all of these issues, and then they don't realize that maybe their share widget is completely broken after they fix things for Google. So really check that, across the board, you're not breaking other things while fixing things for Google.

Bring up the clean version, and make sure that the code actually renders. If you're changing URLs, then you need to set up some kind of redirects as well. With regards to redirects, for normal websites it's fairly easy, because you can just do server-side redirects; with JavaScript sites, if you have a hash in the URL, then you need to do that redirect on the client side with JavaScript. And for that, it's really important from our point of view that you do the redirect as quickly as possible: don't have interstitials there, and don't have a timeout after a certain period of time.

Because what might then happen is that we don't recognize that redirect, and we might say, oh, this is actually the content on the page, and so we'll pick up the interstitial for indexing.
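
A minimal sketch of that kind of immediate client-side redirect, run as early as possible on page load (the URL pattern is hypothetical):

```typescript
// Redirect old hash-based URLs such as https://example.com/#/products/42
// to the new clean URL /products/42, with no interstitial and no delay.
(function redirectLegacyHashUrl(): void {
  const hash = window.location.hash; // e.g. "#/products/42"
  if (hash.indexOf('#/') === 0) {
    // replace() so the old hash URL doesn't stay in the browser history.
    window.location.replace(hash.slice(1));
  }
})();
```
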
The sitemap file is also important. A sitemap file is a way to notify search engines of the URLs on your website and when they last changed. So if you're redirecting, make sure to update the sitemap file for the old and new URLs. And of course, double-check the robots.txt file. One last thing that we regularly see people have trouble with: make sure you update all of the hidden URL references as well. These are things that are not in the immediate navigation within the website; for example, if you link the different language versions together, make sure that those versions also reflect the new URL structure.

Or if you have a separate mobile site, make sure that it also refers to the right URLs within your website. All right. Some useful tools. Search Console. I imagine a lot of you use Search Console; can I get a quick show of hands? Search Console users? Okay, a bunch. That's good. And I guess the others are just hungry and waiting for lunch. But Search Console is a really great way to get insight into how Google sees your website: to understand how the crawling process is working, how we're able to render content, and how indexing is working. So I strongly recommend, if you're trying to get something working in search, that you set up Search Console and that you understand roughly what all is involved there. One good reason to be prepared with Search Console is also urgent removals.

We see this all the time, in that people will accidentally put private content online, Googlebot will pick that up fairly quickly, and then you kind of have to scramble to get it removed. If you have everything verified in Search Console, it's a matter of a couple of minutes to submit that for removal, and then it's out of the index.

So the most important tool for you is probably Fetch and Render in Search Console, where we send Googlebot to a URL that you specify, and Googlebot will try to render the page and show you what it looks like when it's rendered by Googlebot. So this is a great way to confirm that your prerendering is working well, if you're doing that. You can test with smartphone and desktop user agents to make sure that the mobile version is being picked up properly. It also lists the URLs that are blocked by robots.txt, so it's a lot easier to figure out if you're blocking maybe an API on your server or some other server response, or if you're using an external API that's blocked by their robots.txt.

That's something to watch out for. The problematic part, maybe for some of you, is that for rendering we show you a screenshot; we don't show you the full DOM of the page. So if there's something in the meta tags that's critical for you, you might need to do some hacks to actually make sure that Googlebot is picking it up properly.

All right. The sitemap file is another one of those things that comes up regularly when it comes to search. In the sitemap file, you list the URLs and the last change dates of those URLs. And I recommend separating that out by route, or by template, or by some logical structure that you have within your website; that makes it a lot easier for you to debug issues that are systematic to one particular part of your site. So if you have product detail pages, put all of those detail pages into one sitemap file, and then if there's a systematic issue with that template, you'll see it fairly quickly.
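
As a sketch of that idea (the domain, routes, and file names are hypothetical), you could generate one sitemap file per template as part of your build, each listing the URLs and their last change dates:

```typescript
// Build-step sketch: write one sitemap per template/route type, so
// systematic issues with a single template are easy to spot.
import { writeFileSync } from 'fs';

interface SitemapEntry {
  loc: string;      // absolute URL of the page
  lastmod: string;  // last change date, e.g. '2016-09-27'
}

function buildSitemap(entries: SitemapEntry[]): string {
  const urls = entries
    .map(e => `  <url><loc>${e.loc}</loc><lastmod>${e.lastmod}</lastmod></url>`)
    .join('\n');
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>\n`;
}

// One file per logical section of the site.
writeFileSync('sitemap-products.xml', buildSitemap([
  { loc: 'https://example.com/products/1', lastmod: '2016-09-26' },
  { loc: 'https://example.com/products/2', lastmod: '2016-09-27' },
]));
writeFileSync('sitemap-articles.xml', buildSitemap([
  { loc: 'https://example.com/articles/angular-and-seo', lastmod: '2016-09-20' },
]));
```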

This shows us when a URL last changed, and we'll recrawl it as quickly as possible. The other attributes in the sitemap file you can probably ignore.
All right. So we've kind of made it to the end. A very quick summary: in general, with regards to URLs, I think URLs are kind of the most critical part here, especially when you're looking at single-page apps and Angular apps. Make sure that you're using clean URLs, with no hashes in the URL. Understand how Googlebot does rendering, and understand how you could do rendering on your side; maybe check out Universal to see how that works. For each route or template that you have set up on your website, separate those out logically so that you can recognize systematic issues. And then everything should fall into place, and you shouldn't have to worry about all of those, I don't know, bad headlines around JavaScript sites and search. So with regards to Angular: if you're using Angular 1, make sure that you have HTML5 mode set so that you don't have the hash in the URL.
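
A minimal sketch of that Angular 1.x setting (the module name is hypothetical, and the page also needs a base href for HTML5 mode to work):

```typescript
// Enable HTML5 mode so routes use /products/42 instead of /#/products/42.
declare const angular: any; // provided globally by the angular.js script

angular.module('app').config(['$locationProvider', ($locationProvider: any) => {
  $locationProvider.html5Mode(true);
}]);
```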

With Angular 2, I believe it's set up properly by default. And, again, check out Universal; that might be something to look into as well. All right. With that, I think we've kind of come to the end. If you have more questions, I will be at the office hours this afternoon. I also do regular office-hours hangouts, which you can join on YouTube or via Google+ Hangouts. And feel free to ping me on Twitter or Google+ or wherever if you have more questions afterwards. [Applause]
>> Thanks very much, John, that was really helpful.

So it's going to be lunch now. Just before you go, I've got a couple of housekeeping notes. Just to remind you that there are office hours later this afternoon and panel Q&As; make sure you check your schedules for the information there. There are also the IET mini-workshops, so we're doing a workshop taking place this afternoon at 2:40. If you have special dietary requirements, check outside the door there and they'll help you. And there's plenty of food; you can come back later, they are going to be replacing the food throughout lunchtime. So there's plenty of time: go to the games and the chill-out space, just relax, and get your food later.

Thanks very much. Enjoy lunch. Captions provided by @chaselfrazier and @whitecoatcapxg.
