Explore XML Sitemaps
After completing this unit, you’ll be able to:
- Describe what sitemaps are.
- List four elements contained in a sitemap.
- List three sitemap attributes.
- Explain how the Salesforce B2C Commerce APIs process a sitemap request.
What About Sitemaps?
Brandon Wilson is a senior merchandiser for Cloud Kicks, a company that specializes in high-end custom sneakers.
Lately, Brandon has paid a lot of attention to driving shoppers to his company’s storefront. A new area of focus is sitemaps. Sitemaps are XML files that provide search engines, such as Google, Bing, Baidu, and Yahoo, with info about a website so their crawlers can index that site more efficiently. They contain details that help search engines construct links to a site and control the ranking of links within their search results. The details include:
- A list of URLs available for indexing
- When a page was last updated
- Page update frequency
- Page relevance
- Page language (hreflang, optional)
- Related objects of a page (images, optional)
Here’s what you can improve with sitemaps.
- New page awareness: You can make bots aware of new pages as you update your storefront. This is a major usage of this feature!
- Site crawler coverage: You can expose dynamic content that's not referenced by the site’s static content and can't be found by the standard crawling process.
- Search results: You can keep your content fresh in search indexes by telling search engines when to reindex a page.
- Site planning: You can examine Google reports on page visibility, searches that result in traffic to the site, and how crawlers index each page, and then use that information to improve your site’s visibility.
What’s in a Sitemap?
A sitemap set contains one index file (sitemap_index.xml) and one or more sitemap files (sitemap_0.xml, sitemap_1.xml, ...).
The index file is the file that you register with the search engine, and it contains the locations of all the sitemap files. Its only purpose is to point to the actual sitemaps. The number of sitemap files is determined by the configurable number of URLs per sitemap file (for example, 50,000) or by a maximum size of 10 MB per sitemap file, whichever limit is reached first.
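To make the role of the index file concrete, here is a minimal Python sketch that builds an index pointing at a set of sitemap files. It is an illustration only, not how B2C Commerce generates the file: the function name build_sitemap_index and the example.com base URL are hypothetical, and the element names follow the public sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the public sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Serialize the sitemap namespace as the default (unprefixed) namespace.
ET.register_namespace("", SITEMAP_NS)


def build_sitemap_index(base_url: str, sitemap_count: int) -> str:
    """Return index XML that points to sitemap_0.xml .. sitemap_{n-1}.xml.

    Hypothetical helper for illustration; file names mirror the ones in this unit.
    """
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for i in range(sitemap_count):
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        loc = ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc")
        loc.text = f"{base_url.rstrip('/')}/sitemap_{i}.xml"
    return ET.tostring(root, encoding="unicode")


# Placeholder domain; a real index uses the storefront's hostname.
index_xml = build_sitemap_index("https://www.example.com", 3)
print(index_xml)
```

The index carries nothing except the location of each sitemap file, which matches its single purpose of pointing to the actual sitemaps.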
Each sitemap can contain from 0 to 50,000 links. When a file, such as sitemap_1.xml, reaches the number of links that Brandon specifies, B2C Commerce creates another file, such as sitemap_2.xml. If his site has 25,000 links and he sets this to 5,000, B2C Commerce generates five sitemaps. If he sets it to 25,000, B2C Commerce generates one file.
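The arithmetic behind Brandon's two scenarios can be sketched in a few lines of Python. The function name sitemap_file_count is hypothetical, and the sketch considers only the link-count limit; it ignores the 10 MB size limit, which can force an earlier split.

```python
import math


def sitemap_file_count(total_links: int, urls_per_file: int) -> int:
    """Estimate how many sitemap files a link-count limit produces.

    Assumption: only the URLs-per-file setting is modeled, not the 10 MB cap.
    """
    if not 1 <= urls_per_file <= 50_000:
        raise ValueError("urls_per_file must be between 1 and 50,000")
    if total_links < 0:
        raise ValueError("total_links cannot be negative")
    # Every full or partial batch of links needs its own file.
    return math.ceil(total_links / urls_per_file)


print(sitemap_file_count(25_000, 5_000))   # 5
print(sitemap_file_count(25_000, 25_000))  # 1
```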
B2C Commerce sitemap files include an entry for each URL asset for the supported site locales.
Here are some sample entries.
For each entry, locales can be listed as alternate links. URL assets can be URLs for products, categories, content, folders, controllers, home pages, and product images. You configure settings for these assets, which are represented by attributes when included in a feed.
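As an illustration of what one entry might look like, the following Python sketch builds a sitemap with a single URL that has alternate locale links and one image. It is a sketch under assumptions rather than actual B2C Commerce output: the function name, URLs, and values are made up, the element names and the allowed changefreq values come from the public sitemaps.org protocol, and the hreflang and image elements use the xhtml:link and image:image extensions that Google documents. The options that Business Manager offers may differ.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# changefreq values allowed by the sitemaps.org protocol.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)
ET.register_namespace("image", IMAGE_NS)


def build_urlset(entries: list) -> str:
    """Serialize a list of entry dicts into sitemap XML (hypothetical helper)."""
    root = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for e in entries:
        if e["changefreq"] not in CHANGEFREQ_VALUES:
            raise ValueError(f"unsupported changefreq: {e['changefreq']}")
        if not 0.0 <= e["priority"] <= 1.0:
            raise ValueError("priority must be between 0.0 and 1.0")
        url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = e["loc"]
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = e["lastmod"]
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = e["changefreq"]
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = f"{e['priority']:.1f}"
        # Optional: one alternate link per locale.
        for lang, href in e.get("alternates", {}).items():
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          rel="alternate", hreflang=lang, href=href)
        # Optional: images related to the page.
        for image_url in e.get("images", []):
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(root, encoding="unicode")


# All values below are placeholders.
entry_xml = build_urlset([{
    "loc": "https://www.example.com/en/high-top-sneaker.html",
    "lastmod": "2024-01-15",
    "changefreq": "hourly",
    "priority": 0.8,
    "alternates": {
        "en": "https://www.example.com/en/high-top-sneaker.html",
        "fr": "https://www.example.com/fr/high-top-sneaker.html",
    },
    "images": ["https://www.example.com/images/high-top-large.jpg"],
}])
print(entry_xml)
```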
This table shows how attributes relate to asset settings.
Business Manager Setting
- Change frequency: Tells search engines how frequently the page might change. Can be one of these: always (default), hourly, or never.
- Priority: Used by search engines to determine which pages to crawl first. One (1) has the highest priority.
- Products: You can specify if you want to include available products only, available and orderable products only, or all products. You can also choose to include or exclude non-searchable products. By default, products must be online, available, and searchable to be included in a sitemap.
- Image View Types: You can select the image view type you want to include. Each catalog has its own view types, for example, large, medium, small, and swatch. When you include the view type, B2C Commerce adds the image location, title, and alternative text for the image to the sitemap.
If you have multiple sites, you might need multiple sitemaps. It depends on how you use catalogs and how you configure your hostname aliases. Brandon’s sites share the same standard catalog. That means that even though his sites have different locales, they don’t require separate sitemaps. However, you can create a sitemap set for every hostname (domain).
If your sites don’t share the same standard catalog, they need different sitemaps. You must register and generate your sitemaps separately.
Google uses Googlebot to crawl the Cloud Kicks site and collect content for its search results. Google learns which pages on the site are new from the sitemap Brandon provides. When Google requests the sitemap, the B2C Commerce API gets involved. Brandon turns to his coworker, Vijay Lahiri, a Cloud Kicks senior developer, to learn how the classes, methods, and pipelets (included in the B2C Commerce API) work together to make this happen.
This diagram illustrates how Brandon uses Business Manager and Google to process sitemaps.
Brandon uses Business Manager to create and manage sitemap and sitemap index files. B2C Commerce uses the SitemapMgr class to manage and process sitemaps.
Next comes the Google part of the equation. Brandon uses search engine webmaster tools (such as Google Search Console) to:
- Register the site with the search engine.
- Verify site ownership.
- Analyze site traffic.
Put It All Together
So what does B2C Commerce do to synchronize the Business Manager configuration with the Google account? It starts by checking if a URL asset can be added to the generated sitemap, with these conditions.
- The URL must be included with the sitemap via sitemap settings.
- The URL asset must be online.
- For products, the URL must have the sitemapIncluded flag set to yes or be assigned to a category that has the sitemapIncluded flag set to yes (or its parents).
- For content assets, the URL must have the sitemapIncluded flag set to yes.
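The conditions above can be sketched as a small eligibility check in Python. This is a simplified model, not the B2C Commerce implementation: the Category and UrlAsset classes, the kind strings, and the enabled_kinds parameter are hypothetical, and the sketch assumes that "or its parents" means the flag can be set on any ancestor of an assigned category.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    sitemap_included: bool
    parent: Optional["Category"] = None


@dataclass
class UrlAsset:
    kind: str                       # for example "product", "content", "category"
    online: bool
    sitemap_included: bool = False
    categories: tuple = ()          # categories a product is assigned to


def category_or_parent_included(category: Optional[Category]) -> bool:
    """Walk up the category tree until a flagged category is found."""
    while category is not None:
        if category.sitemap_included:
            return True
        category = category.parent
    return False


def is_eligible(asset: UrlAsset, enabled_kinds: set) -> bool:
    """Apply the listed conditions in order (assumed model)."""
    if asset.kind not in enabled_kinds:   # included via sitemap settings
        return False
    if not asset.online:                  # asset must be online
        return False
    if asset.kind == "product":
        return asset.sitemap_included or any(
            category_or_parent_included(c) for c in asset.categories
        )
    if asset.kind == "content":
        return asset.sitemap_included
    return True


# Usage: a product with no flag of its own, in a category whose parent is flagged.
root = Category(sitemap_included=True)
sneakers = Category(sitemap_included=False, parent=root)
product = UrlAsset(kind="product", online=True, categories=(sneakers,))
print(is_eligible(product, {"product", "content"}))  # True
```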
B2C Commerce is now ready to generate the sitemap and include custom sitemaps using the getCustomSitemapFiles() method. The SendGoogleSiteMap pipelet copies the generated sitemap set into the request output stream to make it accessible to Googlebot.
In this unit, you learned what sitemaps are used for, what’s contained in a sitemap, and how the B2C Commerce APIs process a Google sitemap request. Next, you learn how to generate sitemaps.