RFC655: Enhanced L1s #657

# Enhanced L1 constructs in CDK

* **Original Author(s):** @aaythapa
* **Tracking Issue**: #655
* **API Bar Raiser**: @alias

## Background

In the CDK, Level 1 constructs or "L1s" represent constructs that directly correspond to AWS CloudFormation resources. These constructs are automatically generated, primarily from the CloudFormation resource provider schema, and provide a one-to-one mapping with the corresponding CloudFormation resource. Since they are automatically generated, these constructs are updated on a ~weekly basis with new features and services as they are released in CloudFormation.

## Problem

A current gap with the CDK L1 constructs is their disconnect from, and missing features relative to, CDK L2 constructs. L2s offer more sophisticated interfaces for AWS services, with built-in defaults, validation, enum definitions, and more. Additionally, L2s can easily reference other L2 constructs wherever a resource is required, using object references rather than attribute references. This provides developers with a richer and more intuitive experience.

In contrast, L1 constructs rely solely on primitive types for properties. Developers must hard-code all string and number properties, even when only specific values are permitted. There is also no built-in validation with L1s. Many properties in CloudFormation must follow specific patterns if they're a string or fall within certain ranges if they're a number, but developers using L1s only discover these constraints when their applications fail to deploy.
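For instance (a hypothetical sketch; the 256-character limit on a Lambda function's `Description` comes from the registry schema, as discussed later in this RFC), an out-of-range value passes `cdk synth` and only fails once CloudFormation evaluates the template:

```
declare const stack: Stack; // assume an existing stack
new lambda.CfnFunction(stack, 'fn', {
  code: { zipFile: 'exports.handler = async () => {};' },
  role: 'arn:aws:iam::123456789012:role/example-role', // hypothetical role ARN
  // exceeds CloudFormation's 256-character limit for Description,
  // but today's L1 only validates that this is a string
  description: 'x'.repeat(300),
});
```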

Furthermore, CDK L1s and L2s are not interoperable. L2 constructs are equipped with resource interfaces (e.g., IBucket, IRole, IKey, etc.), allowing them to be referenced seamlessly in other L2s. L1s lack these resource interfaces and can only be referenced using attributes. This becomes problematic when an L2 construct expects a resource interface rather than an attribute string, preventing L1s from being referenced within L2s.

For example, when working with an L2 Lambda function, it's straightforward to pass in an L2 IAM role:

```
const lambdaRoleL2 = new iam.Role(this, 'lambdaRole', {
  ...
});
const lambdaFunctionL2 = new lambda.Function(this, 'lambdaFunction', {
  ...
  role: lambdaRoleL2,
  ...
});
```

In this case, the developer doesn't need to perform additional steps or know specific attributes to pass. However, if they attempt to pass an L1 IAM role, it becomes more complex. They'd need to manually create an `I<Resource>` object using the L1 role's attribute.
```
const lambdaRoleL1 = new iam.CfnRole(this, 'lambdaRole', {
  ...
});
const lambdaFunctionL2 = new lambda.Function(this, 'lambdaFunction', {
  ...
  role: iam.Role.fromRoleArn(this, 'importedRole', lambdaRoleL1.attrArn),
  ...
});
```
This disconnect between L1s and L2s creates challenges for developers trying to use both in the same CDK application. Bridging this gap in a maintainable and backward-compatible way would significantly enhance the utility of L1s in CDK.

## Proposed Solution

To resolve these issues we propose enhancing the current L1s by:
1. Expanding the current list of data sources from which we generate L1s.
2. Introducing enums for relevant properties.
3. Implementing validation for relevant properties.
4. Adding resource interfaces to L1s to make them interoperable with L2s.

These changes will be implemented in a backward-compatible way to ensure they don't break existing customers.

### Adding enums and validation

This section describes the implementation approach for incorporating enum types and input validation into the L1 construct generation process.

#### Getting the data

The `aws-service-spec` currently integrates various data sources (listed [here](https://github.com/cdklabs/awscdk-service-spec?tab=readme-ov-file#data-sources)), combining them into a single JSON file that serves as the source of truth for L1 generation. This JSON file is then published to NPM. The CDK repository uses this file in conjunction with the `spec2cdk` script ([here](https://github.com/aws/aws-cdk/tree/main/tools/%40aws-cdk/spec2cdk)) to generate L1s. While we can enhance the CDK L1s with enums and validations by adjusting our generation code, we first need to ensure the necessary data for this generation is available.

Some enum and validation information can already be found within our current data sources. The registry schema, which forms the base of our source of truth, includes some of it; we just need to modify the service spec to incorporate this information into the published JSON file. We can then expand our data sources to improve coverage. `cfn-lint` is a command-line tool that validates CFN templates for syntax and best practices. It maintains schemas that build on the CFN Resource Provider Schemas by adding information from various SDKs, then repackages these schemas for distribution. We can include the information provided by `cfn-lint` in our source of truth to achieve better coverage.
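As an illustration, the registry-style schemas express these constraints as standard JSON Schema keywords; a hypothetical fragment (not copied from any real schema) might look like:

```
{
  "properties": {
    "Runtime": { "type": "string", "enum": ["python3.10", "python3.11", "nodejs18.x"] },
    "Description": { "type": "string", "maxLength": 256 },
    "Timeout": { "type": "integer", "minimum": 1, "maximum": 900 }
  }
}
```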
> **Contributor:** Where does cfn-lint get the data from? Any reason not to include this data source directly?

> **Reply:** cfn-lint already has a process built that pulls the data from the AWS SDK (specifically boto3). The idea is to not re-implement that ourselves. cfn-lint has similar data requirements/usage, so (imo) it makes more sense to re-use. Ideally this data would just be in the CFN schemas per service directly, which is something we can work with cfn-lint and CFN on directly. That is outside of this, but helps with the full picture.


In cases where a schema doesn't contain the required information, it doesn't matter whether we retrieve this data from a JSON file or by executing an AWS CLI command. For instance, this PR updates the enum values available for a property based on the output of an AWS CLI command defined in the docs for that property. We can build workflows that run these commands and update specific properties with new data. These workflows should support all necessary AWS CLI commands and allow flexible specification of which command to run and which property to update.
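A minimal sketch of what one such workflow step could look like (assuming a Node.js script; the CLI command, query, and property path below are placeholder examples, not an actual mapping):

```
import { execSync } from 'child_process';
import * as fs from 'fs';

// hypothetical mapping of a spec property to the CLI command that lists its allowed values
const update = {
  cliCommand: 'aws rds describe-db-engine-versions --query "DBEngineVersions[].Engine" --output json',
  propertyPath: ['AWS::RDS::DBInstance', 'Properties', 'Engine'],
};

// run the command and de-duplicate the returned values
const values: string[] = [...new Set<string>(JSON.parse(execSync(update.cliCommand).toString()))];
const spec = JSON.parse(fs.readFileSync('service-spec.json', 'utf8'));
// ... walk `spec` along update.propertyPath and merge `values` into that property's enum list ...
fs.writeFileSync('service-spec.json', JSON.stringify(spec, null, 2));
```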
> **Contributor:** AFAIK the AWS CLI is based on the Smithy service definitions. Would it make sense to pull these as a data source and cross-reference with the CFN service spec?

> **Reply:** This is (more or less) what cfn-lint is doing (see the above comment), but packaging it as CFN schemas that we can just leverage instead of doing it ourselves.


#### Representing enums in L1s

To represent enums in L1s we can follow the same approach as enums in L2s, which is to create a new type in the generation process. This will allow developers to use the static enum variables from the schema or pass in a hard-coded value if needed.

For example, a `Runtime` class would be generated:
```
export class Runtime {
  public static readonly PYTHON310 = new Runtime('python3.10');
  public static readonly PYTHON311 = new Runtime('python3.11');
  ...

  public readonly name: string;

  protected constructor(name: string) {
    this.name = name;
  }

  public static of(value: string): Runtime {
    return new Runtime(value);
  }
}
...
runtime: Runtime.of('python3.10'), // allowed
// OR
runtime: Runtime.PYTHON310, // allowed
```

This implementation offers greater flexibility than a built-in enum type, which would not allow custom values. It ensures that developers aren't restricted when a CFN enum value is available but not yet included in the CDK.

### Validating properties

Currently, L1 constructs in the CDK perform minimal validation on properties, mostly ensuring type correctness. We can extend these validations by leveraging additional data from our sources. Depending on the property type, various validation rules can be applied, as defined in the schema:

1. String
   1. `minLength`: the minimum length the string can be
   2. `maxLength`: the maximum length the string can be
   3. `pattern`: the pattern the string must match (commonly used for ARNs)
   4. `enum`: the values the string can take; this likely won't be part of the validation as we'll have an enum type
   5. `const`: the constant the string must equal
2. Array
   1. `minItems`: the minimum number of items in the array
   2. `maxItems`: the maximum number of items in the array
3. Integer
   1. `minimum`
   2. `maximum`
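As a sketch of how these schema constraints could translate into generated checks (a hypothetical helper, not the actual `spec2cdk` output):

```
// hypothetical shape of the constraint data read from the schema
interface StringConstraints {
  minLength?: number;
  maxLength?: number;
  pattern?: string;
}

function validateStringProperty(name: string, value: string, c: StringConstraints): string[] {
  const errors: string[] = [];
  if (c.minLength !== undefined && value.length < c.minLength) {
    errors.push(`${name} must be at least ${c.minLength} characters long`);
  }
  if (c.maxLength !== undefined && value.length > c.maxLength) {
    errors.push(`${name} must be at most ${c.maxLength} characters long`);
  }
  if (c.pattern !== undefined && !new RegExp(c.pattern).test(value)) {
    errors.push(`${name} must match the pattern ${c.pattern}`);
  }
  return errors;
}
```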

> **Comment:** This will be a huge help. It's immensely frustrating to have so many checks in your CDK code only to have a simple validation check fail in CloudFormation. Thank you for recommending this!



E.g., right now the `Description` property of the Lambda Function L1 only undergoes string-type validation:
```
errors.collect(cdk.propertyValidator("description", cdk.validateString)(properties.description));
```
In the registry schema, `Description` has a `maxLength` value of `256`. We can extend the validation to include this:
```
// add a validator function
errors.collect(cdk.propertyValidator("description", CfnFunctionDescriptionPropertyValidator)(properties.description));
...

// corresponding validator function
function CfnFunctionDescriptionPropertyValidator(prop: any): cdk.ValidationResult {
  ...
  const maxLength = 256;
  if (prop.length > maxLength) {
    errors.collect(new cdk.ValidationResult(<Error message>));
  }
  ...
}
```
### Delivering in a backwards-compatible way

The challenge lies in delivering these changes in a backward-compatible way. We can't simply overwrite the current types, nor can we add the new types as unions with the existing ones. Doing so would break backward compatibility by violating the original type guarantees. For example, changing a property type from `string` to `string | number` would remove the certainty that the property is always a string, which could prevent developers from safely performing string-specific operations. Additionally, unions could create complications in languages other than TypeScript, such as Java, C#, or Go, where unions might be represented as objects, leading to a loss of typing information.
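To illustrate the union problem in TypeScript terms (a hypothetical property, for illustration only):

```
// before: the property is guaranteed to be a string, so string operations are safe
declare const descriptionBefore: string;
descriptionBefore.toUpperCase(); // compiles

// after a hypothetical widening to string | number, the same call no longer compiles
declare const descriptionAfter: string | number;
// descriptionAfter.toUpperCase(); // error: 'toUpperCase' does not exist on type 'number'
```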

The best approach is to deprecate the older types/files and generate new ones with distinct names. There are several ways we could implement this:

1. Add new properties to existing L1 files: differentiate the new properties by adding a suffix, e.g., `runtime` vs. `runtime_v2`. This follows an established pattern in the CDK, where certain properties have versioned variants (e.g., [here](https://github.com/aws/aws-cdk/blob/b6ad97f4b4e1c185ddc53f60e15b0dabd8022694/packages/aws-cdk-lib/aws-lambda/lib/function.ts#L571-L596)). The old properties would be tagged as `@deprecated`.
   1. Drawback: the UX may be confusing, as developers have to decide which version of a property to use. Additionally, properties that were required before get new optional versions; e.g., `runtime` used to be required, but now both `runtime` and `runtime_v2` have to be optional. While we can add validation to ensure at least one is set, this approach sacrifices strong typing.
2. Create new L1 files in the existing library: here, the suffix would be added to the file name rather than the properties, e.g., `aws_lambda` vs. `aws_lambda_v2`. The older file would be tagged as `@deprecated`.
   1. Drawback: this would overload the current library and may be redundant in some cases; not all properties get a `_v2`, so we would essentially be copying the same content into the new `_v2` file, and for some libraries the `_v2` file would differ only minimally.
3. Generate a new library with the new L1s: the suffix would be applied to the library name itself, e.g., `aws-cdk-lib` vs. `aws-cdk-lib-v2`.
> **Contributor:** If there's enough "meat" for CDK v3, then I would recommend choosing this approach, but bear in mind that a major version is a big project (v2 took us almost two years to release).
>
> It is also okay to punt this particular enhancement until you are ready to release v3, given its breaking nature.

1. Drawback (of option 3): while this would provide a clean break between the old L1s and the new, we would have to maintain both versions of the library, adding more overhead. It would also present an issue every time we have new type changes: we'd have to make multiple `_v<number>` libraries.

### Recommended approach

Adding new properties to the existing L1s (option 1) would be the recommended approach. The `_v2` suffix should make it clear which property to use and alleviate any confusion. We can also deprecate the older properties to signal to developers to use the new ones. By deprecating the older properties rather than removing them, we ensure that existing code continues to function without breaking. Developers can still use the original properties while being encouraged to adopt the new `_v2` versions at their own pace.
> **Comment:** I love the more thorough validations added with the enhanced L1s!
>
> One question: how does this work when the original prop is required? For example, in `CfnTableProps` the `keySchema` property is required, something like below:
>
> ```
> interface CfnTableProps {
>   keySchema: FooType
> }
> ```
>
> After enhancing the L1, we cannot make both the original and the v2 prop required, because users only need to set one:
>
> ```
> // does NOT work
> interface CfnTableProps {
>   keySchema: FooType
>   keySchema_v2: FooType
> }
> ```
>
> So I think we have to make the interface something like this:
>
> ```
> interface CfnTableProps {
>   keySchema?: FooType
>   keySchema_v2?: FooType
> }
> ```
>
> But imho this is actually worse than the original L1, because it loses the information that `keySchema` is a required property. (Of course we can do runtime validation, but type checking is faster.)
>
> This is another reason why I'm not a big fan of option 1. I like option 3 better, with the upgrade of aws-cdk-lib to v3 (and possibly aws-cdk v3) as @johnf said. With CDK v3, we can also remove all the feature flags that have been added since the v2 release, which have been a burden for construct library maintainers validating their code with the various possible flags.

> **Author (@aaythapa):** This is a good example. My thought was we could do runtime validation (making sure one of these is set), but as you mentioned, type checking is faster. I'll add this example to the cons for option 1.

> **Contributor:** Damn, this stuff is hard!
>
> For option 1, I agree that the loss of strong typing (turning required properties optional) is a major drawback.
>
> But option 1 is tricky for another reason: you will likely want to eventually deprecate the legacy properties and keep only the modern versions. In the above example, that means you will need to delete `keySchema` and only leave `keySchema_v2`, which is not a good outcome. If you end up modernizing `keySchema`, you'll need to remove `keySchema_v2`, and then you will penalize the users who migrated to the modern version earlier by breaking or deprecating their code.
>
> I would say the best approach would be to go with option 3 and couple this feature with a new major version (v3) of the CDK, which will allow you to break the existing APIs. When we planned v2 we had a GitHub issue where we accumulated these major breaking changes, and at some point we got to a critical mass that justified a new major version.
>
> If v3 is too far away and there's a very strong push to introduce these capabilities sooner, I think option 2 is a reasonable tradeoff, albeit with a chance that it will cause the library to be too large.
>
> I am wondering if perhaps, as an intermediate step (before a major version is released), you can just add better documentation and runtime (synth-time) checks that the value is valid. This would go a long way to shift-left errors without breaking anything. In TypeScript, you could even use a type union like `"foo" | "bar" | "baz"` to improve typings without impacting JSII languages.

> **Author (@aaythapa):**
>
> > If you end up modernizing keySchema, you'll need to remove keySchema_v2 and then you will penalize the users who migrated to the modern version earlier by breaking or deprecating their code.
>
> By modernizing `keySchema`, do you mean if/when we go to CDK v3 we'd have to go back to `keySchema`?
>
> > I am wondering if perhaps as an intermediate step (before a major version is released), you can just add better documentation and runtime (synth-time) checks that the value is valid.
>
> Updating the docs is a great idea and the implementation should be fairly straightforward. For the runtime validation, is it possible to do this without introducing a new property and without breaking compatibility?
>
> > When we planned v2 we had a GitHub issue where we accumulated these major breaking changes.
>
> Do you have a link to this GitHub issue? Couldn't find it through a quick search 😅

> **Contributor:**
>
> > By modernizing keySchema do you mean if/when we go to CDK v3 we'd have to go back to keySchema?
>
> Yes. In the long run that's the "right" name.
>
> > For the runtime validation, is it possible to do this without introducing a new property and without breaking compatibility?
>
> I believe it should be possible to only add runtime validations instead of introducing new types.
>
> > Do you have a link to this GitHub issue?
>
> That's the relevant RFC: https://github.com/aws/aws-cdk-rfcs/blob/main/text/0079-cdk-2.0.md

E.g.:
```
const func = new lambda.CfnFunction(this, 'test', {
  code: {
    ...
  },
  handler: 'index.handler',
  role: r.attrArn,
  runtime_v2: lambda.CfnFunction.Runtime.PYTHON310, // new V2 property
  // if both runtime and runtime_v2 are set we can throw an error
  timeout: 30,
});
```

> **Comment (on the snippet above):** I can't think of a better way to manage the backwards-compatibility issues here, but I'm not crazy about the possibility of having `runtime_v2` defined but not `runtime`. I can't think of specific examples, but I get the impression validation becomes a little more difficult: if you must have a property defined, now you have to account for that property and its `_v2` variant, which could introduce issues for certain use cases.

One potential drawback would be the mismatch between L1 properties and their corresponding L2 properties. For instance, an L1 might have a `_v2` property while the L2 still uses the older version. Since L2 properties are typically added with validations and enums in mind, there's no immediate need to switch, but updates should be made when feasible. While updating L2 properties would need to be handled manually, we could implement a notification system that flags whenever an L2 property requires updating.

### POC

We created an initial internal POC, which involved adding the new data sources to the service spec and tweaking the generation script to include validations and create new properties when new types were introduced. We can continue that work by making the necessary changes to `spec2cdk` and generating CDK code from the new data.

## Resource interface

One of the main pain points of working with L1s is that they can't be used where L2s are expected. This creates a disconnect between L1s and L2s that makes using both in one CDK app difficult. To resolve this we need a way to pass L1s in where L2s are expected and vice versa, enhancing flexibility and allowing developers to more easily adopt L1s.

### Current behavior

Most L2s currently only accept object references. This is done by hand-writing resource interfaces in all L2 constructs; that interface is then used whenever the construct is referenced anywhere. This gives developers a richer interaction with the referenced object, as they don't have to worry about which attribute to reference and can also utilize logic encapsulated by the target object.

E.g., in the Bucket L2 construct:
```
export interface IBucket extends IResource {
  readonly bucketName: ...
  readonly bucketArn: ...
  ...
  addToResourcePolicy(permission: iam.PolicyStatement): ...
  grantRead(identity: iam.IGrantable, objectsKeyPattern?: any): ...
  ...
}
```
```
Anytime the bucket needs to be referenced in another construct, it will take in `IBucket`:
```
export interface CodePipelineProps {
  ...
  readonly artifactBucket?: s3.IBucket;
}
```

L1s, on the other hand, don't have any object references and just take strings to refer to other objects (names, ARNs, IDs, etc.). They also don't implement resource interfaces, and thus cannot be passed where L2s are expected.

### Proposed solution

To make L1s and L2s interoperable, we first need to emit a minimal interface in each L1 file via a change to the generation script. This interface should include the ref attributes for that resource (marked as `primaryIdentifiers` in the schema) and any ARN attributes.

E.g., for the S3 Bucket L1:
```
export interface ICfnBucket {
  readonly attrBucketName: ...
  readonly attrArn: ...
}
```
In the L2's resource interface, we can extend this L1 interface. We can then modify references of the L2 interface to take in the new L1 interface instead.

E.g., in the Bucket L2:
```
export interface IBucket extends ICfnBucket, IResource {
  ...
}
```
Any location that accepted `IBucket` would now accept `ICfnBucket`:
```
export interface CodePipelineProps {
  ...
  readonly artifactBucket?: s3.ICfnBucket;
}
```
If those locations use functionality from the L2 interface, we can upconvert to an `IBucket` by implementing an API that converts an `ICfn<Resource>` into an `I<Resource>`. We do a similar operation when referencing a resource that isn't created in the current CDK stack. This needs to be done to maintain backwards compatibility and not break existing customers.
> **Contributor:** I am wondering if perhaps it would be sufficient to just add `fromCfnXxx` to all L2s instead of juggling all of these interfaces.
>
> These conversion static methods would be straightforward to implement:
>
> ```
> public static fromCfnBucket(bucket: CfnBucket): IBucket {
>   return this.fromBucketArn(bucket.attrBucketArn);
> }
> ```
>
> Indeed, it won't allow users to directly pass an L1 everywhere an `IBucket` is needed, but it feels like this would be good enough to improve the ergonomics without too many breaking changes.

> **Author (@aaythapa, Dec 27, 2024):** Bucket L2 already has a `fromCfnBucket` that implements what you mentioned, but I'm not sure how many other L2s have something similar or what the effort to add the missing ones would be. It could be a worthwhile alternative, but I don't think it fixes the root issue (and it becomes another thing we have to make manually rather than having it be auto-generated :/ )

> **Contributor:** It should be easy to enforce the `fromCfnXxx` static methods using awslint, so it's semi-manual.


E.g.:
```
// takes in ICfnBucket and returns IBucket
public static fromCfnBucket(bucket: ICfnBucket): IBucket {
  ...
}

// use case
...
private readonly bucket: s3.IBucket;

constructor(bucket: s3.ICfnBucket) {
  super();
  this.bucket = s3.Bucket.fromCfnBucket(bucket);
}
...
```
Note that we won't be able to change `I<Resource>` to `ICfn<Resource>` everywhere. Public properties with the type `I<Resource>` can't be changed, as developers might be using an `I<Resource>`-specific property. We can work on deprecating these properties and creating new `_v2` versions that accept the CFN interface.
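A hedged sketch of what such a deprecation could look like (hypothetical names, reusing the `_v2` convention proposed earlier):

```
export class SomeConstruct extends Construct {
  /** @deprecated use `bucket_v2` instead */
  public readonly bucket: s3.IBucket;

  // hypothetical replacement typed against the generated L1 interface,
  // so both L1 and L2 buckets satisfy it
  public readonly bucket_v2: s3.ICfnBucket;
  ...
}
```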

### POC

There were some internal experiments done around this idea before. We concluded that generating `ICfn<Resource>` is fairly straightforward, but actually backporting the interface to our existing L2 API is manual and runs the risk of accidentally creating breaking changes, e.g., turning a public property of type `IBucket` into `ICfnBucket` (since users could be referencing `IBucket`-specific properties), or renaming an API parameter, which can cause breaking changes in Python. If we move forward with this, we'll continue the work from the experiments while keeping these findings in mind.

> **Comment:** Cautiously optimistic about this approach, and definitely curious about the findings.

> **Author (@aaythapa):** You can look at this branch for a deeper look at this POC: https://github.com/aws/aws-cdk/tree/conroy/generate

