add global table uswds component + pages documentation #2595

Merged · 2 commits · Feb 25, 2025
18 changes: 18 additions & 0 deletions _includes/content-table.html
@@ -0,0 +1,18 @@
<table class="usa-table usa-table--borderless">
{% if include.caption %}
<caption>
{{ include.caption }}
</caption>
{% endif %}
<thead>
<tr>
<th scope="col">{{ include.header1 }}</th>
<th scope="col">{{ include.header2 }}</th>
<th scope="col">{{ include.header3 }}</th>
<th scope="col">{{ include.header4 }}</th>
</tr>
</thead>
<tbody>
{{ include.content }}
</tbody>
</table>
9 changes: 8 additions & 1 deletion _pages/pages/documentation/previews.md
@@ -31,4 +31,11 @@ forked repositories.
## Builds and Logs
Build history and logs for every build are available in the Pages web application. Note: build logs will only be available for **180** days after the build completes.

![Build logs screenshot]({{site.baseurl}}/assets/images/pages/buildlogs.png)

**Absolute URL management**

Although Pages automatically sets `BASEURL`, it is best to define your production URL in the site config file (`site.yaml`), for example `url: "https://agency-production-url.gov"`, so that absolute URLs can be constructed throughout an Eleventy site. This allows the sitemap to build proper absolute URLs from `site.url` and `page.url` instead of the `BASEURL` value, maintaining consistency across builds.
{% raw %}
`<loc>{{ site.url }}{{ page.url }}</loc>`
{% endraw %}
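
For illustration, a minimal Eleventy sitemap template built on this pattern might look like the sketch below. It assumes `site.url` is exposed from a global data file (for example the `site.yaml` mentioned above) and iterates over Eleventy's built-in `collections.all`:

{% raw %}
```
---
permalink: /sitemap.xml
eleventyExcludeFromCollections: true
---
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  {% for page in collections.all %}
  <url>
    <loc>{{ site.url }}{{ page.url }}</loc>
  </url>
  {% endfor %}
</urlset>
```
{% endraw %}

Because `site.url` is defined once in the config, the template produces the same absolute URLs on every build, regardless of the preview `BASEURL`.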
46 changes: 46 additions & 0 deletions _pages/pages/documentation/search.md
Expand Up @@ -13,3 +13,49 @@ We recommend using [Search.gov][], a free site search and search analytics servi
If you'd prefer another solution, you can configure a tool like [lunrjs](https://lunrjs.com/), which creates a search function that runs in the client browser. An example of this is the [18F blog](https://18f.gsa.gov/blog/). This avoids any dependency on another service, but the search results are not as robust.

[Search.gov]: https://search.gov/

**Crawl/Index Pages sites**

Pages automatically handles search engine visibility for preview URLs via the Pages proxy. For traffic served through a preview site, the Pages proxy automatically serves the appropriate HTTP robots header, `robots: none`; preview URLs are not crawlable or indexable by design. Only webpages on the production domain are served with the `robots: all` directive, which tells crawlers and bots such as Search.gov to index the site and enable search capabilities.

{% capture search_table_content %}
<tr>
<th scope="row">1</th>
<td><p> <strong>robots.txt in your Pages site</strong> <br> <br> Discourages robots from crawling the page or pages listed. Webpages that aren’t crawled generally can’t be indexed.</p></td>
<td><code>User-agent: *</code><br><code>Disallow: /directory</code></td>
<td>N/A, crawling is allowed by default</td>
</tr>
<tr>
<th scope="row">2</th>
<td><p> <strong>X-Robots-Tag HTTP header (served by Pages via the Pages proxy)</strong> <br> <br> Encourages or discourages robots to read and index the content on this page or use it to find more links to crawl.</p></td>
<td><code>robots: none</code> (this is automatically served to visitors of all Pages preview builds)</td>
<td><code>robots: all</code> (this is automatically served to visitors of custom/production domains)</td>
</tr>
<tr>
<th scope="row">3</th>
<td><p> <strong>&lt;meta name="robots"&gt; in your Pages site webpage HTML</strong> <br> <br> Discourages robots from crawling the page or pages listed. Webpages that aren’t crawled generally can’t be indexed.</p></td>
<td><code>content="noindex, nofollow"</code></td>
<td>N/A, indexing is allowed by default</td>
</tr>
{% endcapture %}

{% include content-table.html
caption="Search with Pages"
header1="Priority"
header2="Method to manage robot behavior"
header3="How to <u>prevent</u> indexing/crawling"
header4="How to <u>allow</u> indexing/crawling"
content=search_table_content %}

If you want to disable crawling and indexing for specific pages of your production site, you can include the `noindex, nofollow` meta tag in the `<head>` of those pages, or list those folders in your `robots.txt`, if your site generates one.
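
For example (a sketch only; the folder name is a placeholder), the meta tag and a corresponding `robots.txt` entry might look like this:

```
<!-- In the <head> of a page that should not be indexed or followed -->
<meta name="robots" content="noindex, nofollow">
```

```
# robots.txt: discourage crawling of a placeholder folder
User-agent: *
Disallow: /internal-reports/
```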

**Conditionally set robots - Eleventy (11ty)**

Take advantage of Pages-provided environment variables to enable environment-specific functionality. Hardcode the condition and meta tags to check the branch from the `process.env` environment variable. This differs from how it is handled on a Jekyll site; in Eleventy you can add specificity with `process.env.BRANCH`.
You can use this code sample:
```
{% unless process.env.BRANCH == "main" %}
<meta name="robots" content="noindex, nofollow">
{% endunless %}
```
See additional documentation on [build environment variables](https://cloud.gov/pages/documentation/env-vars-on-pages-builds/).