Design Document: One-Click CloudFormation Stack Updater
Overview
The One-Click CFN Stack Updater is a Python CLI tool that automates the rolling update of all CL-AppPipe-* CloudFormation stacks in an audit account. It validates IAM permissions, discovers stacks dynamically by name prefix, and updates each stack with its existing parameter values, so the only change is the template itself, fetched from a nested template URL that resolves to the latest version. The tool supports concurrency control, dry-run mode, and exponential backoff on throttling, and it produces a structured summary report.
The tool is implemented as a single Python package using boto3 for AWS interactions. Python is chosen because it is the standard language for AWS automation tooling, boto3 provides first-class CloudFormation support, and the operator audience is already familiar with Python-based AWS scripts.
Architecture
The system follows a pipeline architecture with four sequential phases:
```mermaid
flowchart LR
    A[Permission\nValidation] --> B[Stack\nDiscovery]
    B --> C[Stack\nUpdate Engine]
    C --> D[Report\nGenerator]
```
- Permission Validation: verifies that the executing role has the required IAM permissions before any work begins.
- Stack Discovery: lists all CloudFormation stacks whose names start with the `CL-AppPipe-` prefix and marks each one as updatable or not based on its status.
- Stack Update Engine: updates stacks concurrently (bounded by `Concurrency_Limit`), retrying on throttling errors.
- Report Generator: aggregates the per-stack results and produces the final summary.
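A minimal sketch of how the four phases could be chained, with each phase passed in as a callable so the ordering and the exit codes can be exercised without AWS. The parameter names `validate`, `discover`, and `update_all` are hypothetical seams standing in for `validate_permissions`, `discover_stacks`, and `update_all_stacks`; stacks are simplified to `(name, status)` tuples, and the exit codes follow the Exit Codes table.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, Tuple

# Exit codes taken from the Exit Codes table
EXIT_OK, EXIT_STACK_FAILED, EXIT_PERMISSIONS = 0, 1, 2

Stack = Tuple[str, str]  # (stack name, current status); simplified stand-in


def run_pipeline(
    validate: Callable[[], Sequence[str]],
    discover: Callable[[], Sequence[Stack]],
    update_all: Callable[[Sequence[Stack]], Awaitable[Sequence[str]]],
    dry_run: bool = False,
) -> int:
    """Run the phases in order and return the process exit code."""
    missing = validate()  # Phase 1: stop before touching any stack
    if missing:
        print("Missing IAM permissions:", ", ".join(missing))
        return EXIT_PERMISSIONS

    stacks = discover()  # Phase 2
    if not stacks:
        print("Warning: no matching stacks found")
        return EXIT_OK
    if dry_run:  # preview only: list every stack, never call the updater
        for name, status in stacks:
            print(f"[dry-run] {name} ({status})")
        return EXIT_OK

    statuses = asyncio.run(update_all(stacks))  # Phase 3
    failed = sum(1 for s in statuses if s == "failed")  # Phase 4: summarize
    print(f"{len(stacks)} stacks found, {failed} failed")
    return EXIT_STACK_FAILED if failed else EXIT_OK
```

Passing the phases in keeps the early exits (permission failure, no stacks, dry-run) easy to verify: a test can supply an updater that raises if it is ever reached.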
Concurrency Model
The update engine uses a semaphore-based concurrency model with asyncio to run up to Concurrency_Limit stack updates in parallel. Each update is an independent coroutine that:
- Fetches current stack parameters
- Calls `UpdateStack` with the existing parameters and the nested template URL
- Polls `DescribeStacks` until the update completes or fails
- Records the result
```mermaid
flowchart TD
    S[Semaphore: Concurrency_Limit] --> U1[Update Stack 1]
    S --> U2[Update Stack 2]
    S --> U3[Update Stack N]
    U1 --> R[Result Collector]
    U2 --> R
    U3 --> R
    R --> Report[Summary Report]
```
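Because boto3 clients are synchronous, calling them directly inside a coroutine would block the event loop and serialize every update. A minimal sketch of the bounded fan-out, assuming each blocking per-stack update is offloaded with `asyncio.to_thread` (Python 3.9+). `update_all_bounded` and `update_one` are hypothetical names; `update_one` stands in for the fetch/update/poll sequence above, and the peak counter exists only to make the `Concurrency_Limit` invariant (Property 3) observable.

```python
import asyncio
from typing import Callable, List, Tuple


async def update_all_bounded(
    stack_names: List[str],
    update_one: Callable[[str], str],
    concurrency: int = 5,
) -> Tuple[List[str], int]:
    """Run update_one for every stack, at most `concurrency` at a time.

    Returns the results in input order plus the peak number of updates
    that were in flight simultaneously.
    """
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def worker(name: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                # update_one blocks (boto3 is synchronous), so run it in a
                # worker thread instead of stalling the event loop.
                return await asyncio.to_thread(update_one, name)
            finally:
                in_flight -= 1

    # gather preserves input order in its result list
    results = await asyncio.gather(*(worker(n) for n in stack_names))
    return list(results), peak
```

For the polling step, boto3's CloudFormation `stack_update_complete` waiter (`cfn_client.get_waiter("stack_update_complete").wait(StackName=...)`) is one alternative to a hand-written `DescribeStacks` loop; it raises botocore's `WaiterError` when the update ends in a failure state. Note also that the default thread pool has `min(32, os.cpu_count() + 4)` workers, so a concurrency value above that would be capped by the pool rather than by the semaphore.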
Components and Interfaces
CLI Entry Point (cli.py)
Parses command-line arguments and orchestrates the pipeline.
```python
def main(
    prefix: str = "CL-AppPipe-",
    concurrency: int = 5,
    dry_run: bool = False,
    region: str | None = None,
) -> int:
    """
    Entry point. Returns 0 on success (including dry-run and no stacks found),
    1 if any stack failed, and 2 if permission validation failed.
    """
```
Arguments:
| Flag | Type | Default | Description |
|---|---|---|---|
| `--prefix` | str | `CL-AppPipe-` | Stack name prefix to match |
| `--concurrency` | int | 5 | Max parallel updates |
| `--dry-run` | bool | False | Preview mode; no updates are made |
| `--region` | str | SDK default | AWS region override |
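The flags map directly onto the standard library's `argparse`. A minimal sketch; `build_parser` and `positive_int` are hypothetical helpers. `positive_int` rejects a concurrency below 1, because `asyncio.Semaphore(0)` would never admit a single update and the run would hang.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type: accept integers >= 1 (Semaphore(0) would block forever)."""
    number = int(value)  # a ValueError here is reported by argparse as invalid
    if number < 1:
        raise argparse.ArgumentTypeError(f"must be at least 1, got {value}")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Rolling update of CloudFormation stacks matching a name prefix."
    )
    parser.add_argument("--prefix", default="CL-AppPipe-",
                        help="Stack name prefix to match")
    parser.add_argument("--concurrency", type=positive_int, default=5,
                        help="Max parallel updates")
    parser.add_argument("--dry-run", action="store_true",
                        help="Preview mode; no updates are made")
    parser.add_argument("--region", default=None,
                        help="AWS region override (default: SDK resolution)")
    return parser
```

`main()` could then pass `args.region` to `boto3.client("cloudformation", region_name=args.region)`, where `None` falls back to the SDK's normal region resolution. One thing to reconcile: `argparse` exits with status 2 on invalid arguments, which is the same code the Exit Codes table assigns to permission validation failure.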
Permission Validator (permissions.py)
```python
def validate_permissions(cfn_client) -> list[str]:
    """
    Checks required IAM permissions by making low-impact probe API calls.
    Returns a list of missing permission names. An empty list means all OK.
    """
```
Validates by attempting:
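- `cloudformation:ListStacks`: calls `list_stacks` with a narrow filter.
- `cloudformation:DescribeStacks`: calls `describe_stacks` with a non-existent stack name and expects a specific error (a "does not exist" `ValidationError` shows the call was authorized; `AccessDenied` shows the permission is missing).
- `cloudformation:UpdateStack`: validated implicitly during updates. The pre-check uses IAM policy simulation via `iam:SimulatePrincipalPolicy` if available; otherwise the check is deferred.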
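A minimal sketch of the two probe-based checks, assuming the error shape of botocore's `ClientError` (the code lives at `exc.response["Error"]["Code"]`; it is read duck-typed here so the sketch runs without botocore). The probe stack name and the `error_code` helper are hypothetical, and the `UpdateStack` check is left out because the design either simulates it or defers it.

```python
from typing import List, Optional

# CloudFormation reports "AccessDenied"; some AWS services use "AccessDeniedException".
DENIED_CODES = frozenset({"AccessDenied", "AccessDeniedException"})
PROBE_STACK_NAME = "cfn-updater-permission-probe"  # hypothetical; must not exist


def error_code(exc: BaseException) -> Optional[str]:
    """Return the AWS error code carried by a botocore ClientError, if any."""
    response = getattr(exc, "response", None)
    if isinstance(response, dict):
        return response.get("Error", {}).get("Code")
    return None


def validate_permissions(cfn_client) -> List[str]:
    """Return the permissions whose probe call was denied (empty list = all OK)."""
    probes = {
        # A narrow status filter keeps the response small.
        "cloudformation:ListStacks": lambda: cfn_client.list_stacks(
            StackStatusFilter=["UPDATE_COMPLETE"]
        ),
        # "Stack ... does not exist" (ValidationError) means the call was authorized.
        "cloudformation:DescribeStacks": lambda: cfn_client.describe_stacks(
            StackName=PROBE_STACK_NAME
        ),
    }
    missing: List[str] = []
    for permission, probe in probes.items():
        try:
            probe()
        except Exception as exc:
            code = error_code(exc)
            if code in DENIED_CODES:
                missing.append(permission)
            elif code != "ValidationError":
                raise  # unexpected failure: surface it rather than guess
    return missing
```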
Stack Discovery (discovery.py)
```python
from dataclasses import dataclass


@dataclass
class DiscoveredStack:
    name: str
    status: str
    updatable: bool


def discover_stacks(cfn_client, prefix: str) -> list[DiscoveredStack]:
    """
    Lists all stacks matching the prefix. Paginates through all results.
    Marks each stack as updatable or not based on its status.
    """
```
Non-updatable statuses:
- ROLLBACK_COMPLETE
- ROLLBACK_IN_PROGRESS
- UPDATE_ROLLBACK_IN_PROGRESS
- UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS
- DELETE_IN_PROGRESS
- DELETE_COMPLETE
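ListStacks has no server-side name filter, so discovery has to page through every stack summary and filter by prefix on the client. A minimal sketch using boto3's `list_stacks` paginator; `classify` is a hypothetical helper split out so the prefix filtering (Property 1) and the updatable flag can be tested without AWS.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping

# Copied from the Configuration Constants section
NON_UPDATABLE_STATUSES = frozenset({
    "ROLLBACK_COMPLETE",
    "ROLLBACK_IN_PROGRESS",
    "UPDATE_ROLLBACK_IN_PROGRESS",
    "UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS",
    "DELETE_IN_PROGRESS",
    "DELETE_COMPLETE",
})


@dataclass
class DiscoveredStack:
    name: str
    status: str
    updatable: bool


def classify(summaries: Iterable[Mapping[str, str]], prefix: str) -> List[DiscoveredStack]:
    """Keep prefix matches and flag each one as updatable or not."""
    return [
        DiscoveredStack(
            name=s["StackName"],
            status=s["StackStatus"],
            updatable=s["StackStatus"] not in NON_UPDATABLE_STATUSES,
        )
        for s in summaries
        if s["StackName"].startswith(prefix)
    ]


def discover_stacks(cfn_client, prefix: str) -> List[DiscoveredStack]:
    """Page through ListStacks; the paginator follows NextToken for us."""
    summaries = []
    for page in cfn_client.get_paginator("list_stacks").paginate():
        summaries.extend(page["StackSummaries"])
    return classify(summaries, prefix)
```

ListStacks also returns stacks deleted within the last 90 days, so deleted stacks with a matching name would be reported as skipped through the `DELETE_COMPLETE` entry; passing a `StackStatusFilter` that omits it would keep them out of the report instead. Statuses outside the set (for example `UPDATE_IN_PROGRESS`) are treated as updatable, and CloudFormation would reject the `UpdateStack` call, which the error-handling table records as a failure.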
Stack Updater (updater.py)
```python
from dataclasses import dataclass
from typing import Literal

from .discovery import DiscoveredStack  # defined in discovery.py


@dataclass
class StackUpdateResult:
    stack_name: str
    status: Literal["succeeded", "failed", "skipped", "no-update-needed"]
    error: str | None = None
    duration_seconds: float = 0.0


async def update_stack(
    cfn_client,
    stack_name: str,
    template_url: str,
    max_retries: int = 3,
) -> StackUpdateResult:
    """
    Updates a single stack. Handles the 'No updates' response, throttling
    retries, and non-updatable state detection.
    """


async def update_all_stacks(
    cfn_client,
    stacks: list[DiscoveredStack],
    template_url: str,
    concurrency: int = 5,
    max_retries: int = 3,
) -> list[StackUpdateResult]:
    """
    Updates all stacks with bounded concurrency using asyncio.Semaphore.
    """
```
Retry logic:
- Triggered on `Throttling` or `RequestLimitExceeded` error codes
- Exponential backoff: `base_delay * 2^attempt` (`base_delay` = 1 s)
- Maximum of 3 retries per stack
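A minimal sketch of the request behind Property 4 and the error mapping behind Property 5, assuming `stack` is one entry of `describe_stacks()["Stacks"]`. The helper names are hypothetical. As an assumption beyond this design, the sketch also forwards the stack's existing `Capabilities`, since `UpdateStack` fails with `InsufficientCapabilitiesException` when the template creates IAM resources and the capability is not acknowledged.

```python
from typing import Any, Dict

NO_UPDATES_MESSAGE = "No updates are to be performed"


def build_update_request(stack: Dict[str, Any], template_url: str) -> Dict[str, Any]:
    """Build UpdateStack keyword arguments from one DescribeStacks entry."""
    request: Dict[str, Any] = {
        "StackName": stack["StackName"],
        "TemplateURL": template_url,
        # Keep every existing value; only the template changes (Property 4).
        "Parameters": [
            {"ParameterKey": p["ParameterKey"], "UsePreviousValue": True}
            for p in stack.get("Parameters", [])
        ],
    }
    # Assumption beyond the design: re-acknowledge the stack's capabilities.
    if stack.get("Capabilities"):
        request["Capabilities"] = list(stack["Capabilities"])
    return request


def classify_update_error(code: str, message: str) -> str:
    """Map an UpdateStack ClientError to a StackUpdateResult status (Property 5)."""
    if code == "ValidationError" and NO_UPDATES_MESSAGE in message:
        return "no-update-needed"
    return "failed"
```

With boto3 the call would be `cfn_client.update_stack(**build_update_request(stack, TEMPLATE_URL))`, and on a `ClientError` the code and message would be passed to `classify_update_error`.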
Report Generator (report.py)
```python
from dataclasses import dataclass
from datetime import datetime

from .updater import StackUpdateResult  # defined in updater.py


@dataclass
class UpdateRunReport:
    start_time: datetime
    end_time: datetime
    total_found: int
    succeeded: int
    failed: int
    skipped: int
    no_update_needed: int
    results: list[StackUpdateResult]


def generate_report(
    results: list[StackUpdateResult],
    total_found: int,
    start_time: datetime,
    end_time: datetime,
) -> UpdateRunReport:
    """
    Aggregates results into a summary report.
    """


def format_report(report: UpdateRunReport) -> str:
    """
    Formats the report as a human-readable string for console output.
    """
```
Data Models
DiscoveredStack
| Field | Type | Description |
|---|---|---|
| name | str | CloudFormation stack name |
| status | str | Current stack status (e.g., CREATE_COMPLETE) |
| updatable | bool | Whether the stack is in an updatable state |
StackUpdateResult
| Field | Type | Description |
|---|---|---|
| stack_name | str | Name of the stack |
| status | Literal["succeeded", "failed", "skipped", "no-update-needed"] | Outcome of the update attempt |
| error | str \| None | Error message if failed |
| duration_seconds | float | Time taken for this stack's update |
UpdateRunReport
| Field | Type | Description |
|---|---|---|
| start_time | datetime | When the Update_Run started |
| end_time | datetime | When the Update_Run ended |
| total_found | int | Total Target_Stacks discovered |
| succeeded | int | Count of successfully updated stacks |
| failed | int | Count of failed stacks |
| skipped | int | Count of skipped (non-updatable) stacks |
| no_update_needed | int | Count of stacks with no changes |
| results | list[StackUpdateResult] | Per-stack results |
Configuration Constants
```python
TEMPLATE_URL = "https://s3.amazonaws.com/solutions-reference/centralized-logging-with-opensearch/latest/AppLogS3Buffer.template"
DEFAULT_PREFIX = "CL-AppPipe-"
DEFAULT_CONCURRENCY = 5
MAX_RETRIES = 3
BASE_RETRY_DELAY = 1.0  # seconds

NON_UPDATABLE_STATUSES = frozenset({
    "ROLLBACK_COMPLETE",
    "ROLLBACK_IN_PROGRESS",
    "UPDATE_ROLLBACK_IN_PROGRESS",
    "UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS",
    "DELETE_IN_PROGRESS",
    "DELETE_COMPLETE",
})
```
Correctness Properties
A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.
Property 1: Discovery returns exactly prefix-matched stacks with correct count
For any list of CloudFormation stacks with arbitrary names, the discovery function should return exactly those stacks whose names start with the configured prefix, and the reported count should equal the length of that filtered list.
Validates: Requirements 1.1, 1.3
Property 2: All updatable stacks are attempted
For any set of discovered stacks marked as updatable, the update engine should produce exactly one update result per updatable stack — no stack is silently dropped and no stack is attempted twice.
Validates: Requirements 2.2
Property 3: Concurrency limit invariant
For any positive concurrency limit and any list of stacks, at no point during an Update_Run should the number of concurrently in-progress stack updates exceed the specified concurrency limit.
Validates: Requirements 2.3, 3.3
Property 4: Update call preserves existing parameters and uses correct template URL
For any stack with any set of existing parameters, the UpdateStack API call should include exactly those same parameter keys with UsePreviousValue=True, and the TemplateURL argument should equal the configured Nested_Template_URL.
Validates: Requirements 3.1, 3.2
Property 5: "No updates" response maps to no-update-needed status
For any stack where CloudFormation returns a "No updates are to be performed" error, the resulting StackUpdateResult should have status no-update-needed (not failed).
Validates: Requirements 3.4
Property 6: Non-updatable stacks are skipped
For any stack whose CloudFormation status is in the set of non-updatable statuses (e.g., ROLLBACK_COMPLETE, DELETE_IN_PROGRESS), the result should have status skipped and no UpdateStack API call should be made for that stack.
Validates: Requirements 4.2
Property 7: Fault isolation — failures do not block remaining stacks
For any list of N updatable stacks where K of them fail (including after retry exhaustion), the update engine should still produce results for all N stacks, and the number of attempted updates should equal N.
Validates: Requirements 4.1, 4.4
Property 8: Throttling triggers exponential backoff retries
For any stack that receives throttling errors, the system should retry up to MAX_RETRIES times, and the delay between attempt i and attempt i+1 (counting attempts from i = 0) should be at least BASE_RETRY_DELAY * 2^i seconds.
Validates: Requirements 4.3
Property 9: Report aggregation and exit code correctness
For any list of StackUpdateResult values, the generated report's succeeded, failed, skipped, and no_update_needed counts should equal the actual counts of each status in the input list, and the exit code should be non-zero if and only if failed > 0.
Validates: Requirements 5.2, 5.3
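A minimal sketch of report aggregation that satisfies Property 9 by construction: counting with `collections.Counter` keeps the four counters consistent with the input list. It restates simplified versions of `StackUpdateResult` and `UpdateRunReport` so it runs standalone; `exit_code` is a hypothetical helper encoding the rule that the exit code is non-zero if and only if `failed > 0`.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class StackUpdateResult:
    stack_name: str
    status: str  # "succeeded" | "failed" | "skipped" | "no-update-needed"
    error: Optional[str] = None
    duration_seconds: float = 0.0


@dataclass
class UpdateRunReport:
    start_time: datetime
    end_time: datetime
    total_found: int
    succeeded: int
    failed: int
    skipped: int
    no_update_needed: int
    results: List[StackUpdateResult]


def generate_report(results: List[StackUpdateResult], total_found: int,
                    start_time: datetime, end_time: datetime) -> UpdateRunReport:
    counts = Counter(r.status for r in results)  # missing statuses count as 0
    return UpdateRunReport(
        start_time=start_time,
        end_time=end_time,
        total_found=total_found,
        succeeded=counts["succeeded"],
        failed=counts["failed"],
        skipped=counts["skipped"],
        no_update_needed=counts["no-update-needed"],
        results=list(results),
    )


def format_report(report: UpdateRunReport) -> str:
    elapsed = (report.end_time - report.start_time).total_seconds()
    lines = [
        f"Update run finished in {elapsed:.0f}s",
        f"  Found:            {report.total_found}",
        f"  Succeeded:        {report.succeeded}",
        f"  No update needed: {report.no_update_needed}",
        f"  Skipped:          {report.skipped}",
        f"  Failed:           {report.failed}",
    ]
    # List each failure with its error so the operator can follow up.
    lines += [f"    {r.stack_name}: {r.error}" for r in report.results if r.status == "failed"]
    return "\n".join(lines)


def exit_code(report: UpdateRunReport) -> int:
    return 1 if report.failed > 0 else 0
```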
Property 10: Dry-run performs no updates and lists all discovered stacks
For any set of discovered stacks, when dry-run mode is enabled, zero UpdateStack API calls should be made, and the output should contain the name and current status of every discovered stack.
Validates: Requirements 6.2, 6.3
Property 11: Permission validation correctness
For any subset of required permissions that are missing, the permission validator should return exactly those missing permissions, and when any permissions are missing, the Update_Run should terminate without making any UpdateStack API calls.
Validates: Requirements 7.1, 7.2
Error Handling
Error Categories and Responses
| Error | Source | Response |
|---|---|---|
| Missing IAM permissions | Permission validation phase | Report missing permissions, exit with non-zero code, no updates attempted |
| No stacks found | Discovery phase | Log warning, exit with code 0 (not an error) |
| Stack in non-updatable state | Update phase | Skip stack, log warning, record as skipped |
| "No updates to be performed" | CloudFormation UpdateStack API | Treat as success, record as no-update-needed |
| Throttling / RequestLimitExceeded | CloudFormation API | Retry with exponential backoff (max 3 retries) |
| Throttling after max retries | CloudFormation API | Mark stack as failed, continue with remaining stacks |
| UpdateStack failure (other) | CloudFormation API | Log error details, mark as failed, continue with remaining stacks |
| Boto3 connection error | Network / SDK | Mark stack as failed, log error, continue |
| Invalid CLI arguments | Argument parsing | Print usage, exit with non-zero code |
Retry Strategy
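```python
import asyncio

from botocore.exceptions import ClientError


async def retry_with_backoff(func, max_retries=3, base_delay=1.0):
    """Await func(), retrying throttling errors with exponential backoff.

    func must return a fresh awaitable on every call (a coroutine object
    cannot be awaited twice).
    """
    for attempt in range(max_retries + 1):
        try:
            return await func()
        except ClientError as e:
            code = e.response["Error"]["Code"]
            # Retry only throttling errors, and only while retries remain
            if code in ("Throttling", "RequestLimitExceeded") and attempt < max_retries:
                delay = base_delay * (2 ** attempt)  # 1 s, 2 s, 4 s with the defaults
                await asyncio.sleep(delay)
            else:
                raise
```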
Exit Codes
| Code | Meaning |
|---|---|
| 0 | No stack failed (every stack succeeded, needed no update, or was skipped), no stacks were found, or dry-run |
| 1 | One or more stacks failed to update |
| 2 | Permission validation failed |
Testing Strategy
Testing Framework
- Unit tests: `pytest`
- Property-based tests: `hypothesis` (the de facto standard property-based testing library for Python)
- Mocking: `unittest.mock` and `botocore.stub.Stubber` for AWS API mocking
Property-Based Tests
Each correctness property from the design maps to a single property-based test. All property tests run a minimum of 100 iterations using Hypothesis settings.
| Property | Test Description | Key Generators |
|---|---|---|
| P1 | Discovery prefix filtering | Random stack name lists (some with prefix, some without) |
| P2 | All updatable stacks attempted | Random lists of DiscoveredStack with mixed updatable flags |
| P3 | Concurrency limit invariant | Random concurrency values (1–20), random stack counts (1–50) |
| P4 | Parameter preservation and template URL | Random parameter key-value dicts |
| P5 | "No updates" status mapping | Random stacks with mocked "no updates" responses |
| P6 | Non-updatable stack skipping | Random stacks with statuses drawn from updatable and non-updatable sets |
| P7 | Fault isolation | Random stack lists with random failure injection |
| P8 | Exponential backoff retries | Random retry counts (0–3), verify delay sequence |
| P9 | Report aggregation | Random lists of StackUpdateResult with random statuses |
| P10 | Dry-run no-op | Random discovered stacks, verify zero update calls |
| P11 | Permission validation | Random subsets of required permissions marked as missing |
Each test must be tagged with a comment:
```python
# Feature: one-click-cfn-stack-updater, Property 9: Report aggregation and exit code correctness
```
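The suite itself uses Hypothesis; the tagged check below swaps in the standard library's `random` module so it runs without extra dependencies, and restates `retry_with_backoff` from the Retry Strategy section with a hypothetical `sleep` parameter so Property 8 can be checked without real waiting. `FakeThrottle` is a stand-in for botocore's `ClientError`. The check asserts the exact delay sequence, which is stronger than the "at least" bound in the property.

```python
# Feature: one-click-cfn-stack-updater, Property 8: Throttling triggers exponential backoff retries
import asyncio
import random
from typing import List

THROTTLING_CODES = ("Throttling", "RequestLimitExceeded")


class FakeThrottle(Exception):
    """Mimics the .response shape of botocore's ClientError for a Throttling error."""

    def __init__(self) -> None:
        super().__init__("Rate exceeded")
        self.response = {"Error": {"Code": "Throttling"}}


async def retry_with_backoff(func, sleep, max_retries=3, base_delay=1.0):
    """Same policy as the Retry Strategy section, with an injectable sleep."""
    for attempt in range(max_retries + 1):
        try:
            return await func()
        except Exception as exc:
            code = getattr(exc, "response", {}).get("Error", {}).get("Code")
            if code in THROTTLING_CODES and attempt < max_retries:
                await sleep(base_delay * 2 ** attempt)
            else:
                raise


def check_backoff_property(throttles: int, max_retries: int = 3, base_delay: float = 1.0) -> None:
    """Throttle `throttles` times, then succeed; check retries and delays."""
    delays: List[float] = []
    calls = 0

    async def fake_sleep(seconds: float) -> None:
        delays.append(seconds)  # record the delay instead of waiting

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls <= throttles:
            raise FakeThrottle()
        return "ok"

    try:
        outcome = asyncio.run(retry_with_backoff(flaky, fake_sleep, max_retries, base_delay))
    except FakeThrottle:
        outcome = "gave up"

    expected_retries = min(throttles, max_retries)
    assert calls == expected_retries + 1
    assert delays == [base_delay * 2 ** i for i in range(expected_retries)]
    assert outcome == ("ok" if throttles <= max_retries else "gave up")


# 100 random cases, mirroring the suite's minimum iteration count
rng = random.Random(0)
for _ in range(100):
    check_backoff_property(throttles=rng.randint(0, 5), base_delay=rng.choice([0.5, 1.0, 2.0]))
```

In the Hypothesis version, `throttles` and `base_delay` would come from strategies such as `st.integers(0, 5)` under `@given`, and Hypothesis would shrink any failing case to a minimal example.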
Unit Tests
Unit tests complement property tests by covering:
- Specific examples: Known stack names, known parameter sets, expected API responses
- Edge cases: Empty stack list (Req 1.4), concurrency of 1, all stacks failing, all stacks already up-to-date
- Integration points: CLI argument parsing, boto3 Stubber-based API interaction tests
- Error conditions: Malformed API responses, unexpected exception types
Test Organization
```
tests/
├── test_discovery.py    # P1, P6 property tests + unit tests
├── test_updater.py      # P2, P3, P4, P5, P7, P8 property tests + unit tests
├── test_report.py       # P9 property tests + unit tests
├── test_dry_run.py      # P10 property tests + unit tests
├── test_permissions.py  # P11 property tests + unit tests
└── test_cli.py          # CLI argument parsing unit tests
```