Add checkpoint support to Pulp core and file
Introduce a checkpoint field for Publication and Distribution models.
Handle serving checkpoint Publications via checkpoint Distributions.
Protect checkpoint Publications' RepositoryVersions from cleanup.
Enable checkpoint support in pulp_file.

closes #6244
Moustafa-Moustafa committed Feb 12, 2025
1 parent f72af09 commit 6f24ed7
Showing 20 changed files with 760 additions and 24 deletions.
1 change: 1 addition & 0 deletions CHANGES/6244.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Added support to create and distribute checkpoint publications in Pulp.
3 changes: 3 additions & 0 deletions CHANGES/plugin_api/6244.feature
@@ -0,0 +1,3 @@
Added support to create and distribute checkpoint publications in Pulp.
Plugins can choose to enable this feature by exposing the checkpoint field in their inherited PublicationSerializer and DistributionSerializer.
Checkpoint publications and distributions can be created by passing checkpoint=True when creating them.
1 change: 1 addition & 0 deletions CHANGES/pulp_file/6244.feature
@@ -0,0 +1 @@
Added support to create checkpoint file publications and distribute them through checkpoint file distributions.
1 change: 1 addition & 0 deletions docs/admin/reference/tech-preview.md
@@ -5,3 +5,4 @@ The following features are currently being released as part of tech preview:
- [Support for Open Telemetry](site:pulpcore/docs/admin/learn/architecture/#telemetry-support)
- Upstream replicas
- Domains - Multi-Tenancy
- [Checkpoint](site:pulpcore/docs/user/guides/checkpoint.md)
64 changes: 64 additions & 0 deletions docs/dev/learn/subclassing/checkpoint.md
@@ -0,0 +1,64 @@
# Checkpoint

!!! warning
    This feature is provided as a tech preview and could change in backwards-incompatible
    ways in the future.

Pulp's checkpoint feature offers a robust way to manage and access historical versions of
repositories. By integrating checkpoints into your plugins, you enable users to recreate
environments from specific points in time, which is invaluable for identifying when changes or
regressions were introduced. This feature supports reproducible deployments, helps track changes in
package behavior, and facilitates a structured update workflow.

!!! warning
    The checkpoint feature is only supported for plugins using publications.

Plugin writers need to expose the `checkpoint` field on their distribution and publication
serializers to allow users to create checkpoint publications and create checkpoint distributions to
serve these publications. The `checkpoint` field is already present on the base distribution and
publication models, so no new migration is needed.

Example: enabling the checkpoint feature in the pulp_file plugin.
```python
class FileDistributionSerializer(DistributionSerializer):
    """
    Serializer for File Distributions.
    """
    publication = DetailRelatedField(
        required=False,
        help_text=_("Publication to be served"),
        view_name_pattern=r"publications(-.*/.*)?-detail",
        queryset=models.Publication.objects.exclude(complete=False),
        allow_null=True,
    )
    checkpoint = serializers.BooleanField(default=False)

    class Meta:
        fields = DistributionSerializer.Meta.fields + ("publication", "checkpoint")
        model = FileDistribution
```

```python
class FilePublicationSerializer(PublicationSerializer):
    """
    Serializer for File Publications.
    """
    distributions = DetailRelatedField(
        help_text=_("This publication is currently hosted as defined by these distributions."),
        source="distribution_set",
        view_name="filedistributions-detail",
        many=True,
        read_only=True,
    )
    manifest = serializers.CharField(
        help_text=_("Filename to use for manifest file containing metadata for all the files."),
        default="PULP_MANIFEST",
        required=False,
        allow_null=True,
    )
    checkpoint = serializers.BooleanField(default=False)

    class Meta:
        model = FilePublication
        fields = PublicationSerializer.Meta.fields + ("distributions", "manifest", "checkpoint")
```
119 changes: 119 additions & 0 deletions docs/user/guides/checkpoint.md
@@ -0,0 +1,119 @@
# Create and Distribute Checkpoints

!!! warning
    This feature requires plugin support to work correctly.

!!! warning
    This feature is provided as a tech preview and could change in backwards-incompatible
    ways in the future.

## Overview

Checkpoints in Pulp provide a way to access and manage historical versions of repositories. This
feature allows users to view and install packages as they existed at specific points in time. By
using checkpoints, you can recreate environments from any given date/time, which is particularly
useful for tracking down when changes or regressions were introduced.

Checkpoints support reproducible deployments, help identify changes in package behavior over time,
and facilitate a structured update workflow. This ensures that a validated environment can be
consistently replicated across different stages of development and production.

For a similar concept, you can refer to [Debian's snapshot archive](https://snapshot.debian.org/),
which offers access to old snapshots of the repositories based on timestamps.

## Enabling Checkpoints

Checkpoint is a plugin-dependent feature. It needs to be enabled in a plugin before you can start
using it.

## Creating Checkpoints

The first step in using checkpoints is to create a checkpoint distribution, which will be used
to distribute checkpoint publications. A checkpoint distribution serves all the checkpoint
publications of the related repository.

```bash
pulp file distribution create \
--name <distro_name> \
--repository <repo_name> \
--base-path <distro_base_path> \
--checkpoint
```

The next step is to create checkpoint publications. Only publications marked as checkpoint will be
served from the checkpoint distribution. Checkpoint publications can only be created using the
repository's latest version. Repository versions of the distributed checkpoint publications will be
protected from the `retain_repo_versions` cleanup.

```bash
pulp file publication create \
--repository <repo_name> \
--checkpoint
```
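
The same two steps can also be driven through the REST API. The following is a minimal sketch that
only builds the two POST requests and does not send them. The host and port, the repository href,
and the helper name `build_checkpoint_requests` are assumptions made for illustration, and the
endpoint paths assume the usual pulp_file API layout; check them against your deployment.

```python
import json
from urllib.request import Request

# Assumed API root; adjust host and port for your installation.
API_ROOT = "http://localhost:24817/pulp/api/v3"


def build_checkpoint_requests(repo_href, distro_name, base_path):
    """Build (but do not send) the POST requests for a checkpoint distribution
    and a checkpoint publication. Hypothetical helper, not part of Pulp."""

    def post(path, payload):
        return Request(
            f"{API_ROOT}{path}",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    # A checkpoint distribution points at the repository, not at one publication.
    distribution = post(
        "/distributions/file/file/",
        {
            "name": distro_name,
            "base_path": base_path,
            "repository": repo_href,
            "checkpoint": True,
        },
    )
    # A checkpoint publication is created from the repository's latest version.
    publication = post(
        "/publications/file/file/",
        {"repository": repo_href, "checkpoint": True},
    )
    return distribution, publication
```

Sending the requests (for example with `urllib.request.urlopen`) additionally requires
credentials, which are left out of this sketch.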

## Accessing Checkpoints

### Listing All Checkpoints
You can list all of a repository's available checkpoint publications by accessing the
base path of any of the repository's checkpoint distributions.

```bash
http :24816/pulp/content/checkpoint/myfile
```

```html
<html>
<head><title>Index of checkpoint/myfile/</title></head>
<body bgcolor="white">
<h1>Index of checkpoint/myfile/</h1>
<hr><pre><a href="../">../</a>
<a href="20250130T203000Z/">20250130T203000Z/</a> 30-Jan-2025 20:30
<a href="20250130T205000Z/">20250130T205000Z/</a> 30-Jan-2025 20:50
</pre><hr></body>
</html>
```
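
A script can extract the checkpoint timestamps from such a listing. The sketch below assumes the
index has the shape shown above (one `<a>` element per checkpoint whose `href` is the timestamp
followed by a slash); the names `CheckpointIndexParser` and `list_checkpoints` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches hrefs such as "20250130T203000Z/" and captures the timestamp.
CHECKPOINT_HREF = re.compile(r"^(\d{8}T\d{6}Z)/$")


class CheckpointIndexParser(HTMLParser):
    """Collect checkpoint timestamps from the anchors of an index page."""

    def __init__(self):
        super().__init__()
        self.timestamps = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = CHECKPOINT_HREF.match(href)
        if match:  # skips "../" and any other non-checkpoint link
            self.timestamps.append(match.group(1))


def list_checkpoints(index_html):
    """Return the checkpoint timestamps found in the listing, oldest first."""
    parser = CheckpointIndexParser()
    parser.feed(index_html)
    return sorted(parser.timestamps)


SAMPLE_INDEX = """
<html><body><pre><a href="../">../</a>
<a href="20250130T203000Z/">20250130T203000Z/</a> 30-Jan-2025 20:30
<a href="20250130T205000Z/">20250130T205000Z/</a> 30-Jan-2025 20:50
</pre></body></html>
"""

print(list_checkpoints(SAMPLE_INDEX))
# → ['20250130T203000Z', '20250130T205000Z']
```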

### Accessing a Specific Checkpoint
To access a specific checkpoint, suffix the checkpoint distribution's path with a timestamp in the format
`yyyyMMddTHHmmssZ` (e.g. `20250130T205339Z`). If a checkpoint was created at this time, it will be
served. Otherwise, you will be redirected to the latest checkpoint created before this timestamp.
Trying to access a checkpoint using a timestamp in the future or before the first checkpoint's
timestamp will result in a 404 response.

Assuming the checkpoints from the above example, the table below shows the responses for sample requests:
<table>
<tr>
<th>Request path</th>
<th>Response</th>
</tr>
<tr>
<td>checkpoint/myfile/20250130T203000Z/</td>
<td>200</td>
</tr>
<tr>
<td>checkpoint/myfile/20250130T204000Z/</td>
<td>
301 <br>
Location: checkpoint/myfile/20250130T203000Z/
</td>
</tr>
<tr>
<td>checkpoint/myfile/20250130T210000Z/</td>
<td>
301 <br>
Location: checkpoint/myfile/20250130T205000Z/
</td>
</tr>
<tr>
<td>checkpoint/myfile/20250130T202000Z/</td>
<td>
404
</td>
</tr>
<tr>
<td>checkpoint/myfile/29250130T203000Z/</td>
<td>
404
</td>
</tr>
</table>
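
The lookup rules described in this section can be sketched in a few lines of Python. This is an
illustration of the documented behaviour, not Pulp's actual implementation: the function name
`resolve_checkpoint`, its parameters, and the `(status, location)` return shape are assumptions.
How Pulp responds to a malformed timestamp is not covered here; the sketch simply lets
`datetime.strptime` raise `ValueError`.

```python
from bisect import bisect_right
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y%m%dT%H%M%SZ"  # yyyyMMddTHHmmssZ


def parse_timestamp(segment):
    """Parse a path segment such as '20250130T203000Z' as a UTC datetime."""
    return datetime.strptime(segment, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def resolve_checkpoint(base_path, requested, checkpoints, now):
    """Return (status, location) for a requested checkpoint timestamp.

    checkpoints: timestamps of the existing checkpoint publications.
    now: the current time, passed in so the function is deterministic.
    """
    requested_dt = parse_timestamp(requested)
    if requested_dt > now:
        return 404, None  # timestamps in the future are rejected

    ordered = sorted(parse_timestamp(c) for c in checkpoints)
    # Number of checkpoints created at or before the requested time.
    index = bisect_right(ordered, requested_dt)
    if index == 0:
        return 404, None  # earlier than the first checkpoint

    match = ordered[index - 1]
    if match == requested_dt:
        return 200, None  # exact hit, served directly
    # Otherwise redirect to the latest checkpoint before the requested time.
    return 301, f"{base_path}/{match.strftime(TIMESTAMP_FORMAT)}/"
```

For example, with checkpoints at `20250130T203000Z` and `20250130T205000Z`, a request for
`20250130T204000Z` resolves to a 301 pointing at `checkpoint/myfile/20250130T203000Z/`.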
2 changes: 1 addition & 1 deletion docs/user/guides/update-repo-retention.md
@@ -15,7 +15,7 @@ Setting retain_repo_versions to 1 effectively disables repository versioning sin
store the latest version.

Cleanup will ignore any repo versions that are being served directly via a distribution or via a
publication.
publication. This includes repo versions of distributed checkpoint publications.

To update this field for a file Repository called myrepo, simply call:

6 changes: 4 additions & 2 deletions pulp_file/app/serializers.py
@@ -115,10 +115,11 @@ class FilePublicationSerializer(PublicationSerializer):
required=False,
allow_null=True,
)
checkpoint = serializers.BooleanField(default=False)

class Meta:
model = FilePublication
fields = PublicationSerializer.Meta.fields + ("distributions", "manifest")
fields = PublicationSerializer.Meta.fields + ("distributions", "manifest", "checkpoint")


class FileDistributionSerializer(DistributionSerializer):
@@ -133,9 +134,10 @@ class FileDistributionSerializer(DistributionSerializer):
queryset=models.Publication.objects.exclude(complete=False),
allow_null=True,
)
checkpoint = serializers.BooleanField(default=False)

class Meta:
fields = DistributionSerializer.Meta.fields + ("publication",)
fields = DistributionSerializer.Meta.fields + ("publication", "checkpoint")
model = FileDistribution


6 changes: 4 additions & 2 deletions pulp_file/app/tasks/publishing.py
@@ -19,7 +19,7 @@
log = logging.getLogger(__name__)


def publish(manifest, repository_version_pk):
def publish(manifest, repository_version_pk, checkpoint=False):
"""
Create a Publication based on a RepositoryVersion.
@@ -37,7 +37,9 @@ def publish(manifest, repository_version_pk):
)

with tempfile.TemporaryDirectory(dir="."):
with FilePublication.create(repo_version, pass_through=True) as publication:
with FilePublication.create(
repo_version, pass_through=True, checkpoint=checkpoint
) as publication:
publication.manifest = manifest
if manifest:
manifest = Manifest(manifest)
7 changes: 6 additions & 1 deletion pulp_file/app/viewsets.py
@@ -433,11 +433,16 @@ def create(self, request):
serializer.is_valid(raise_exception=True)
repository_version = serializer.validated_data.get("repository_version")
manifest = serializer.validated_data.get("manifest")
checkpoint = serializer.validated_data.get("checkpoint")

result = dispatch(
tasks.publish,
shared_resources=[repository_version.repository],
kwargs={"repository_version_pk": str(repository_version.pk), "manifest": manifest},
kwargs={
"repository_version_pk": str(repository_version.pk),
"manifest": manifest,
"checkpoint": checkpoint,
},
)
return OperationPostponedResponse(result, request)

@@ -0,0 +1,23 @@
# Generated by Django 4.2.18 on 2025-01-30 19:14

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ("core", "0127_remove_upstreampulp_pulp_label_select"),
    ]

    operations = [
        migrations.AddField(
            model_name="distribution",
            name="checkpoint",
            field=models.BooleanField(default=False),
        ),
        migrations.AddField(
            model_name="publication",
            name="checkpoint",
            field=models.BooleanField(default=False, editable=False),
        ),
    ]
17 changes: 15 additions & 2 deletions pulpcore/app/models/publication.py
@@ -73,6 +73,7 @@ class Publication(MasterModel):
pass_through (models.BooleanField): Indicates that the publication is a pass-through
to the repository version. Enabling pass-through has the same effect as creating
a PublishedArtifact for all of the content (artifacts) in the repository.
checkpoint (models.BooleanField): Indicates a checkpoint publication.
Relations:
repository_version (models.ForeignKey): The RepositoryVersion used to
@@ -98,12 +99,13 @@ class Publication(MasterModel):

complete = models.BooleanField(db_index=True, default=False)
pass_through = models.BooleanField(default=False)
checkpoint = models.BooleanField(default=False, editable=False)

repository_version = models.ForeignKey("RepositoryVersion", on_delete=models.CASCADE)
pulp_domain = models.ForeignKey("Domain", default=get_domain_pk, on_delete=models.PROTECT)

@classmethod
def create(cls, repository_version, pass_through=False):
def create(cls, repository_version, pass_through=False, checkpoint=False):
"""
Create a publication.
@@ -125,7 +127,11 @@ def create(cls, repository_version, pass_through=False):
Adds a Task.created_resource for the publication.
"""
with transaction.atomic():
publication = cls(pass_through=pass_through, repository_version=repository_version)
publication = cls(
pass_through=pass_through,
repository_version=repository_version,
checkpoint=checkpoint,
)
publication.save()
resource = CreatedResource(content_object=publication)
resource.save()
@@ -159,6 +165,10 @@ def delete(self, **kwargs):
# It's possible for errors to occur before any publication has been completed,
# so we need to handle the case when no Publication exists.
try:
if self.checkpoint:
base_paths |= Distribution.objects.filter(
checkpoint=self.checkpoint, repository=self.repository_version.repository
).values_list("base_path", flat=True)
versions = self.repository.versions.all()
pubs = Publication.objects.filter(repository_version__in=versions, complete=True)
publication = pubs.latest("repository_version", "pulp_created")
@@ -629,6 +639,7 @@ class Distribution(MasterModel):
pulp_labels (HStoreField): Dictionary of string values.
base_path (models.TextField): The base (relative) path component of the published url.
hidden (models.BooleanField): Whether this distribution should be hidden in the content app.
checkpoint (models.BooleanField): Whether this distribution serves checkpoint publications.
Relations:
content_guard (models.ForeignKey): An optional content-guard.
@@ -649,6 +660,7 @@ class Distribution(MasterModel):
base_path = models.TextField()
pulp_domain = models.ForeignKey("Domain", default=get_domain_pk, on_delete=models.PROTECT)
hidden = models.BooleanField(default=False, null=True)
checkpoint = models.BooleanField(default=False)

content_guard = models.ForeignKey(ContentGuard, null=True, on_delete=models.SET_NULL)
publication = models.ForeignKey(Publication, null=True, on_delete=models.SET_NULL)
@@ -706,6 +718,7 @@ def content_headers_for(self, path):
"remote",
"repository",
"repository_version",
"checkpoint",
],
has_changed=True,
)
10 changes: 9 additions & 1 deletion pulpcore/app/models/repository.py
@@ -328,7 +328,15 @@ def protected_versions(self):
publication__pk__in=Distribution.objects.values_list("publication_id")
)

if distro := Distribution.objects.filter(repository=self.pk).first():
# Protect repo versions of distributed checkpoint publications.
if Distribution.objects.filter(repository=self.pk, checkpoint=True).exists():
qs |= self.versions.filter(
publication__pk__in=Publication.objects.filter(checkpoint=True).values_list(
"pulp_id"
)
)

if distro := Distribution.objects.filter(repository=self.pk, checkpoint=False).first():
if distro.detail_model().SERVE_FROM_PUBLICATION:
# if the distro serves publications, protect the latest published repo version
version = self.versions.filter(