cronjobs: inject canonical URLs into older manual pages (SEO) #1241

Open
wants to merge 3 commits into
base: grass8
from

Conversation

neteler
Member

@neteler neteler commented Nov 12, 2024

For a long time, the GRASS GIS manual pages of the different versions have been published under a hard-to-follow scheme in which some versions are invisible, some redirected and some shown. This also strongly affects search engine ranking.

SEO: Without "canonical" URLs, the different versions wipe each other out in search engines. Canonical tags help consolidate duplicate or similar content by specifying the preferred version of a page, so that search engines index and rank the desired URL and duplicate content issues are avoided.

This PR changes the cronjob scripts to

  • inject "grass-stable" as the "canonical" into older manual pages under versioned URLs
  • inject "grass-devel" as the "canonical" into the development manual pages under versioned URLs

This way, no "duplicate content" should occur from an SEO perspective.
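
For illustration, the injection step could look roughly like the Python sketch below. This is not the code from the cronjob scripts in this PR; the function names, the `BASE_URL` constant and the assumed URL layout (`/grassNN/manuals/<page>` mapping to `/grass-stable/manuals/<page>`) are assumptions made for the example.

```python
import re

# Assumption for this sketch: manual pages live under
# <BASE_URL>/<version-or-alias>/manuals/<page_name>
BASE_URL = "https://grass.osgeo.org"


def canonical_url(page_name: str, target: str = "grass-stable") -> str:
    """Build the preferred URL of a manual page (hypothetical helper)."""
    return f"{BASE_URL}/{target}/manuals/{page_name}"


def inject_canonical(html: str, page_name: str, target: str = "grass-stable") -> str:
    """Insert a canonical <link> right after the opening <head> tag.

    The page is returned unchanged if it already carries a canonical link,
    so running the job repeatedly does not add duplicate tags.
    """
    if re.search(r'<link[^>]+rel=["\']canonical["\']', html, flags=re.IGNORECASE):
        return html
    tag = f'<link rel="canonical" href="{canonical_url(page_name, target)}">'
    # Match <head> or <head ...>, but not <header>.
    return re.sub(
        r"(<head(?:\s[^>]*)?>)",
        lambda m: m.group(1) + "\n" + tag,
        html,
        count=1,
        flags=re.IGNORECASE,
    )
```

Calling `inject_canonical(html, "r.info.html")` on a page from an older versioned directory would add a link pointing at the `grass-stable` copy of the same page. A page that has no counterpart in the target version would still get a canonical link to a non-existent URL, so such cases need separate handling.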

Also robots.txt is updated to reactivate the manual pages of old GRASS GIS versions (which now contain "grass-stable" as the canonical).
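
A quick way to check the effect of such a `robots.txt` change is Python's standard `urllib.robotparser`. The two rule sets below are hypothetical and only illustrate the idea of blocking versus re-allowing a versioned manual directory; they are not the actual `robots.txt` contents of grass.osgeo.org.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule sets, for illustration only.
BEFORE = """\
User-agent: *
Disallow: /grass78/manuals/
"""

AFTER = """\
User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With these assumed rules, a versioned manual URL is rejected under `BEFORE` and accepted under `AFTER`. Crawlers can then reach the old pages and read the canonical link they now contain.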

Fixes OSGeo/grass#4579

@neteler neteler added manual Documentation related issues CI Continuous integration labels Nov 12, 2024
@neteler neteler self-assigned this Nov 12, 2024
@neteler
Member Author

neteler commented Nov 12, 2024

Note: these files are now deployed on grass.osgeo.org for testing.

@echoix
Member

echoix commented Nov 12, 2024

Wouldn't the grass-devel and grass-stable primary content be rather similar, thus being potentially penalized as duplicates?

How does this method handle pages that don't exist in later versions, or are renamed/moved?
It doesn't happen often, but we had some every now and then.

… to point to "stable" manual (rather than "devel")
@neteler
Member Author

neteler commented Nov 13, 2024

> Wouldn't the grass-devel and grass-stable primary content be rather similar, thus being potentially penalized as duplicates?

Very good point.
I have changed this in 9824486: the "canonical" injected into the versioned 8.5 manual pages and into the "grass-devel" manual pages now points to "stable" rather than to "devel".

Deployed update on grass.osgeo.org, triggered cronjob and told Google Search about it.

> How does this method handle pages that don't exist in later versions, or are renamed/moved? It doesn't happen often, but we had some every now and then.

A few of them are handled with redirects in Apache. I don't know of any other method.

@neteler
Member Author

neteler commented Nov 19, 2024

Too bad, now the building with cron_grass_preview_build_binaries.sh on the Debian grass.osgeo.org server is broken after the MD merge:

...
Parsing <v.what.strds.timestamp>... SUCCESS
Parsing <wx.metadata>... FAILED
Parsing <wx.mwprecip>... FAILED
Parsing <wx.stream>... FAILED
Parsing <wx.wms>... FAILED
+ cp /home/neteler/.grass8/addons/modules.xml /var/www/code_and_data/addons/grass8/modules.xml
+ export ARCH
+ export ARCH_DISTDIR=/home/neteler/src//main/dist.x86_64-pc-linux-gnu
+ export GISBASE=/home/neteler/src//main/dist.x86_64-pc-linux-gnu
+ export VERSION_NUMBER=8.5
+ python3 /home/neteler/src//main/man/build_keywords.py /var/www/code_and_data/grass85/manuals/ /var/www/code_and_data/grass85/manuals/addons/
Traceback (most recent call last):
  File "/home/neteler/src//main/man/build_keywords.py", line 202, in <module>
    build_keywords("md")
  File "/home/neteler/src//main/man/build_keywords.py", line 68, in build_keywords
    from build_md import (
  File "/home/neteler/src/main/man/build_md.py", line 264, in <module>
    man_dir = os.path.join(os.environ["MDDIR"], "source")
  File "/usr/lib/python3.9/os.py", line 679, in __getitem__
    raise KeyError(key) from None
KeyError: 'MDDIR'
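
For context, the traceback shows `os.environ["MDDIR"]` being evaluated at import time of `build_md.py`, so any caller that does not export `MDDIR` fails as soon as it imports the module. A minimal sketch of a more defensive pattern follows. The helper names are hypothetical, and this is not necessarily how the problem is fixed upstream.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable or fail with a descriptive message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it before running the build script"
        )
    return value


def md_source_dir() -> str:
    """Same expression as in the traceback, but evaluated on demand
    instead of at module import time."""
    return os.path.join(require_env("MDDIR"), "source")
```

Deferring the lookup into a function means that importing the module no longer fails, and a missing variable is reported by name when the directory is actually needed. On the calling side, the cron script would still have to export `MDDIR` alongside `ARCH`, `ARCH_DISTDIR`, `GISBASE` and `VERSION_NUMBER`.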

Can @landam help?

@neteler
Member Author

neteler commented Nov 22, 2024

> Too bad, now the building with cron_grass_preview_build_binaries.sh on the Debian grass.osgeo.org server is broken after the MD merge:

Bugfix PR: OSGeo/grass#4739

@echoix
Member

echoix commented Nov 25, 2024

Is everything clear for this one now?

neteler added a commit to neteler/grass-website that referenced this pull request Nov 27, 2024
So far https://grass.osgeo.org/sitemap.xml has listed the versioned manual pages, which is unhelpful for consolidating search engine results for the manuals.
In the past months we were penalized for "duplicate content".

For an overview, see OSGeo/grass#4579

For efforts to address this situation, see

- OSGeo/grass-addons#1168
- OSGeo/grass-addons#1241

This PR changes the URLs in `sitemap.xml` from versioned manual URLs to grass-stable/grass-devel in order to complete the other PRs.
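
As a rough sketch of that rewrite (not the actual change made to `sitemap.xml`), versioned manual URLs can be mapped to their `grass-stable` counterparts and de-duplicated. The URL pattern and function names are assumptions for the example, and the sketch maps everything to `grass-stable` only.

```python
import re

# Assumed pattern for versioned manual URLs, e.g. .../grass84/manuals/...
VERSIONED_MANUAL = re.compile(r"(https://grass\.osgeo\.org)/grass\d+/manuals/")


def to_stable(url: str) -> str:
    """Map a versioned manual URL to the grass-stable alias; leave others as-is."""
    return VERSIONED_MANUAL.sub(r"\1/grass-stable/manuals/", url, count=1)


def consolidate(urls: list[str]) -> list[str]:
    """Rewrite all URLs and drop duplicates while keeping first-seen order."""
    seen: set[str] = set()
    result: list[str] = []
    for url in map(to_stable, urls):
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

Several versioned entries for the same page collapse into a single `grass-stable` entry, which matches the canonical links injected into the pages themselves.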
neteler added a commit to OSGeo/grass-website that referenced this pull request Nov 27, 2024
Development

Successfully merging this pull request may close these issues.

[Bug] docs: Fix search engine ranking of manual pages (SEO)
2 participants