
4.2: Application State Replication and Disaster Recovery Strategy

✂️ Tl;dr 🥷

Discusses replicating and recovering the stateful components of ArcGIS Enterprise (Portal, Server, Data Store) within an automated DevOps framework. Esri’s webgisdr utility is central to capturing and restoring dynamic application state, including configurations, user data and service definitions. Infrastructure provisioning via OpenTofu and baseline software installation via configuration management tools are decoupled from state replication, enabling immutable infrastructure principles. Automated backups integrate with Azure services: credentials are securely managed via Key Vault, configuration parameters via App Configuration and backups stored in geo-redundant Azure Blob Storage. Disaster recovery orchestration rebuilds infrastructure in a secondary region using IaC, then restores the latest application state via webgisdr. The process emphasises end-to-end automation, zero-trust security practices and alignment with RPO/RTO targets through scheduled backups and rigorous testing. A conceptual Python script demonstrates secure credential handling, dynamic configuration generation and integration with Azure’s SDKs.

This section outlines the approach for replicating and recovering the core ArcGIS Enterprise application state. This encompasses the stateful components: Portal for ArcGIS, ArcGIS Server and the ArcGIS Data Store. A central element of this strategy is the utilisation of Esri's webgisdr utility. Given this architecture's adherence to a fully automated, DevOps and Infrastructure as Code (IaC) approach, it is crucial to detail how webgisdr, a tool that produces a single backup file representing the application's state, integrates with these modern principles.

4.2.1 webgisdr in an Automated DevOps Setup

The new eMap platform is centred on a cloud-native architecture where infrastructure is immutable, defined by code and system configurations are applied consistently via the designated Configuration Management tool.

  1. Infrastructure Provisioning (IaC - OpenTofu): OpenTofu is exclusively responsible for the provisioning of all Azure infrastructure resources. This includes Virtual Machines (VMs) as well as Azure PaaS services.
  2. Base Software Installation and Configuration (CM Tool): The designated Configuration Management tool will be responsible for installing and configuring the ArcGIS Enterprise software components (Portal for ArcGIS, ArcGIS Server, ArcGIS Data Store) to a baseline, operational state on the infrastructure provisioned by OpenTofu. This includes applying licences, defining initial primary site administrator (PSA) accounts and performing basic component registrations.
  3. Application State Management (webgisdr): As a stateful application, ArcGIS Enterprise does not support many of the modern cloud-native replication and DR patterns. To bridge this gap, Esri provides webgisdr, a utility for managing the ArcGIS Enterprise application state. It is designed to capture and restore the dynamic content, configurations and inter-component relationships that constitute the operational state of an ArcGIS Enterprise deployment. This state information includes:

    • Portal for ArcGIS items (e.g., maps, applications, layers), users, groups and organisational settings.
    • ArcGIS Server service definitions, site configurations and security settings.
    • ArcGIS Data Store content (e.g., hosted feature layers, specific tile caches managed by the Data Store).
    • Federation relationships between Portal for ArcGIS and ArcGIS Server sites, including the hosting server designation.

By treating the webgisdr utility as the designated tool for application state backup and recovery—distinct from infrastructure provisioning and base software installation—a clean and effective integration with automated workflows can be achieved.

4.2.2 Automation of Backup Processes

For robust High Availability (Silver Stage) and Disaster Recovery (Gold Stage), regular and automated backups of the ArcGIS Enterprise application state are paramount. The webgisdr utility must be fully integrated into these automated processes, ensuring consistent and reliable capture of the application state.

The key components for automating webgisdr backups are:

  1. Trigger Mechanism:

    • Backup operations MUST be initiated automatically based on a predefined schedule. The recommended pattern for the new eMap platform involves scheduled triggers within GitHub Actions CI/CD pipelines. These triggers will invoke the Configuration Management tool to execute a script on the primary Portal for ArcGIS VM.
    • Alternative mechanisms, such as direct cron jobs or systemd timers on the Portal VM, or Azure Automation Runbooks, are possible but not recommended in this architecture.
  2. Secure Credential Management:

    • The webgisdr utility requires Portal for ArcGIS primary site administrator (PSA) credentials.
    • In accordance with security best practices and the Zero Trust Security Model, these credentials (username and password) MUST be stored securely as secrets within Azure Key Vault. Automation scripts, executed by the Configuration Management tool, MUST retrieve these credentials at runtime using the Managed Identity of the Portal VM. Hardcoding credentials is strictly prohibited.
  3. webgisdr.properties File Management:

    • The webgisdr.properties file content MUST be created dynamically at runtime using Azure App Configuration and Azure Key Vault, rather than relying on static template files on the VM.
    • Azure App Configuration: Non-sensitive parameters for webgisdr (e.g., PORTAL_ADMIN_URL, SHARED_LOCATION, AZURE_STORAGE_ACCOUNT_NAME, AZURE_BLOB_CONTAINER_NAME, boolean flags such as RESTORE_RELATIONAL_DATA) should be stored in an Azure App Configuration store. Environment-specific values should be managed using labels within App Configuration (e.g., for DEV, UAT, PROD-Melbourne).
    • Azure Key Vault Integration: Sensitive values, specifically the PORTAL_ADMIN_USERNAME and PORTAL_ADMIN_PASSWORD, MUST be stored in Azure Key Vault. Azure App Configuration will store references to these Key Vault secrets. The Managed Identity of the Portal VM (acting on behalf of the CM tool) requires permissions to read from both App Configuration and the referenced Key Vault secrets.
    • Dynamic Generation: The automation script executed by the CM tool will:
      1. Authenticate to Azure App Configuration using the VM's Managed Identity.
      2. Fetch all necessary configuration parameters. App Configuration will resolve Key Vault references to retrieve the actual secret values.
      3. Construct the content of the webgisdr.properties file in memory.
      4. Securely write this dynamically generated content to a temporary file in a restricted location on the VM (e.g., /tmp/webgisdr_runtime.properties on Linux, with permissions set to be readable only by the execution context).
      5. This temporary properties file MUST be deleted immediately after the webgisdr command execution completes, regardless of success or failure (e.g., within a finally block in the script). This approach centralises configuration, enhances security by minimising the on-disk presence of sensitive information and aligns with immutable infrastructure principles.
  4. Backup Locations (SHARED_LOCATION and BACKUP_STORE_PROVIDER/BACKUP_LOCATION):

    • SHARED_LOCATION: As defined in App Configuration, this parameter defines a temporary staging area on a file system accessible by the machine executing the webgisdr utility (typically the active Portal for ArcGIS VM). An Azure Files share, mounted to the Portal VM, serves as an appropriate and resilient choice for this staging location.
      • For the PROD environment, this Azure Files share should be configured with Zone-Redundant Storage (ZRS). This ensures intra-region resilience for the staging area.
    • BACKUP_STORE_PROVIDER: This should be set to AzureBlob (value retrieved from App Configuration).
    • AZURE_STORAGE_ACCOUNT_NAME / AZURE_BLOB_CONTAINER_NAME: These specify the Azure Blob Storage container where the final .webgissite backup file will be stored, with values sourced from App Configuration. For Disaster Recovery purposes (Gold Stage), the Azure Storage Account hosting this container for the PROD environment should be configured with Geo-Redundant Storage (GRS). This ensures backups are asynchronously replicated from the Melbourne region to the designated DR region (Sydney). Container and storage account names should be parameterised per environment via App Configuration.
  5. Execution and Output:

    • The automation script will invoke the webgisdr command (e.g., webgisdr --export --file /path/to/temporary_properties_file).
    • The utility creates the backup archive in the SHARED_LOCATION (the Azure Files share) and subsequently uploads it automatically to the specified Azure Blob Storage container when BACKUP_STORE_PROVIDER=AzureBlob is configured.
  6. Post-Backup Operations (Logging and Cleanup):

    • The automation script MUST comprehensively log the success or failure of each backup operation and integrate with Azure Monitor.
    • The script should implement logic for cleaning up older backup files from the local SHARED_LOCATION (Azure Files share) to manage staging space; a minimal cleanup sketch follows after this list.
    • The temporary webgisdr.properties file MUST be deleted.
    • Retention of backup files within Azure Blob Storage MUST be managed using Azure Storage lifecycle management policies.
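
To make the staging cleanup concrete, the following minimal sketch shows one way the automation script could prune aged .webgissite archives from the SHARED_LOCATION share after a successful export. This is a conceptual example: the mount path and retention window are assumptions and would in practice be sourced from Azure App Configuration.

```python
# Conceptual staging-area cleanup, run after a successful webgisdr export.
# Both values below are assumptions; the real path comes from SHARED_LOCATION
# in Azure App Configuration and the retention window is an operational choice.
import time
from pathlib import Path

STAGING_DIR = Path("/mnt/webgisdr-staging")  # assumed Azure Files mount point
MAX_AGE_DAYS = 7                             # assumed local retention window


def prune_staging_backups(staging_dir: Path, max_age_days: int) -> None:
    """Delete .webgissite archives in the staging share older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    for archive in staging_dir.glob("*.webgissite"):
        if archive.stat().st_mtime < cutoff:
            archive.unlink()
            print(f"Removed expired staging backup: {archive.name}")


if __name__ == "__main__":
    prune_staging_backups(STAGING_DIR, MAX_AGE_DAYS)
```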

4.2.3 Automated Disaster Recovery Orchestration

In a Disaster Recovery scenario, the primary objective is to restore service in the secondary Azure region (Sydney) with minimal downtime and data loss. The webgisdr utility is integral to restoring the ArcGIS Enterprise application state onto infrastructure that has been rebuilt by IaC and CM processes.

The DR restoration process is orchestrated in phases:

  1. Phase 1: Infrastructure and Base Software Provisioning (IaC and CM Driven)

    • Trigger: This phase is initiated either by a manual declaration of a disaster or by an automated trigger from the Monitoring & Observability framework (Section 3.7) detecting a severe and prolonged outage in the primary region (Melbourne).
    • OpenTofu Execution: IaC scripts (OpenTofu) are executed to provision all necessary Azure resources in the DR region (Sydney), establishing the "pilot light" infrastructure. This encompasses VMs for Portal for ArcGIS, ArcGIS Server and ArcGIS Data Store; Azure App Services for Web Adaptors; networking components; storage accounts (which would already contain replicated data via GRS/GZRS if used for webgisdr backups and other shared storage); and the Azure Database for PostgreSQL instance (which is failed over from the primary region's replica).
    • Configuration Management Tool Execution: The CM tool runs on the newly provisioned VMs in Sydney to:
      • Apply OS hardening configurations (Ubuntu 24.04 LTS).
      • Install the ArcGIS Enterprise software components (Portal, Server, Data Store) to a "clean" or "default site" state.
      • Configure the ArcGIS Web Adaptors on the App Service instances to point to these new, unconfigured backend components.
    • Outcome: At the end of this phase, a functional, but essentially empty and unconfigured, ArcGIS Enterprise deployment is operational in the DR region (Sydney).
  2. Phase 2: Application State Restore (Scripted webgisdr Import)

    • This phase is orchestrated by a dedicated DR automation script, managed within the CI/CD pipeline.
    • Retrieve DR Configuration: The script securely fetches all necessary configuration parameters (including PSA credentials for the newly created, clean Portal instance in the DR region, SHARED_LOCATION for DR, target AZURE_STORAGE_ACCOUNT_NAME and AZURE_BLOB_CONTAINER_NAME where replicated backups reside) from Azure App Configuration (using a DR-specific label, e.g., "PROD-Sydney") and Azure Key Vault, similar to the backup process.
    • Access Backup File:
      • The script identifies the latest valid .webgissite backup file from the geo-replicated Azure Blob Storage container (the PROD backup storage account configured with GRS/GZRS), as illustrated in the sketch following this list.
      • The chosen backup file is downloaded from Azure Blob Storage to the SHARED_LOCATION (e.g., a mounted ZRS Azure Files share) accessible by the Portal VM in the DR region.
    • Prepare DR webgisdr.properties File: Using the configuration retrieved from App Configuration and Key Vault, the script dynamically generates a temporary webgisdr.properties file tailored for the DR environment and the specific backup file. This temporary file is securely written to the DR Portal VM and deleted post-execution.
    • Execute webgisdr Import: The script invokes the webgisdr --import --file /path/to/temporary_dr_webgisdr.properties command on the Portal VM in the DR region.
    • Outcome: The complete ArcGIS Enterprise application state—including Portal items, users, groups, Server services, ArcGIS Data Store content and federation settings—is restored onto the newly provisioned DR infrastructure.
  3. Phase 3: Post-Restore Finalisation (Scripted and Manual Steps)

    • DNS/GSLB Update: As detailed in Section 4.1.4, automated scripts update DNS records or the Global Server Load Balancer (GSLB) configuration to redirect user traffic to the now-active DR environment in Sydney.
    • Validation: Automated smoke tests and validation scripts are executed to confirm the health and functionality of the restored services.
    • Notifications: Relevant stakeholders are alerted that the DR failover process is complete and services are operational from the Sydney region.
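
The "Access Backup File" step in Phase 2 can be sketched as below. This is conceptual only: the storage account, container and mount path are placeholder values that would, in the real workflow, be resolved from Azure App Configuration (PROD-Sydney label) and accessed via the DR Portal VM's Managed Identity.

```python
# Conceptual sketch: locate the most recent .webgissite archive in the
# geo-replicated container and download it to the DR staging share.
# Account, container and mount path values below are illustrative placeholders.
from pathlib import Path

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

STORAGE_ACCOUNT = "stemapprodbackups"               # assumed; from App Configuration (PROD-Sydney)
CONTAINER_NAME = "webgisdr-backups"                 # assumed; from App Configuration (PROD-Sydney)
DR_SHARED_LOCATION = Path("/mnt/webgisdr-staging")  # assumed DR Azure Files mount


def download_latest_backup() -> Path:
    """Download the newest .webgissite backup into the DR SHARED_LOCATION."""
    credential = DefaultAzureCredential()  # DR Portal VM Managed Identity at runtime
    service = BlobServiceClient(
        account_url=f"https://{STORAGE_ACCOUNT}.blob.core.windows.net",
        credential=credential,
    )
    container = service.get_container_client(CONTAINER_NAME)

    # Pick the most recently modified backup archive in the container.
    candidates = [b for b in container.list_blobs() if b.name.endswith(".webgissite")]
    if not candidates:
        raise RuntimeError("No .webgissite backups found in the replicated container.")
    latest = max(candidates, key=lambda blob: blob.last_modified)

    target = DR_SHARED_LOCATION / Path(latest.name).name
    with target.open("wb") as fh:
        container.download_blob(latest.name).readinto(fh)
    return target


if __name__ == "__main__":
    print(f"Downloaded latest backup to {download_latest_backup()}")
```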

The webgisdr.properties file for a DR import will have its parameters (such as DR Portal URL, DR PSA credentials and the specific BACKUP_FILE_NAME to restore) dynamically sourced from Azure App Configuration (with a DR label) and Azure Key Vault by the DR automation script.

  • Recovery Point Objective (RPO) and Recovery Time Objective (RTO) Considerations:
    • The RPO for content managed by webgisdr (such as Portal items and ArcGIS Data Store content) is directly determined by the frequency of the automated backup operations. Frequent, scheduled backups to geo-replicated Azure Blob Storage help minimise potential data loss.
    • The RTO related to webgisdr restoration can be a significant component of the overall DR RTO. However, by fully automating the preceding IaC (OpenTofu) and CM steps for infrastructure rebuild and base software installation and by automating the webgisdr import process itself, the overall DR RTO can be significantly optimised and made more predictable. The "pilot light" DR infrastructure strategy, where minimal resources are pre-provisioned in the DR region (Sydney) and scaled upon failover, further reduces the time spent on infrastructure provisioning during a DR event.

Key to Success: Comprehensive Automation and Rigorous Testing

The successful integration of the webgisdr utility into a modern DevOps operational model is contingent upon comprehensive automation of both backup and restore processes, leveraging cloud-native configuration management as detailed. Equally crucial is regular, rigorous testing of Disaster Recovery procedures (Gold Stage). This includes end-to-end testing of the full orchestration, from infrastructure rebuild in the DR region to application state restoration and final service validation via the GSLB.
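
As one building block of such testing, a post-restore smoke test can be as simple as the sketch below. The paths shown are the standard Portal for ArcGIS and ArcGIS Server health check endpoints; the hostname and web adaptor names are illustrative placeholders for the restored DR environment.

```python
# Minimal post-restore smoke test sketch. The hostname and web adaptor names
# are illustrative placeholders; the health check paths are the standard
# Portal for ArcGIS and ArcGIS Server health check endpoints.
import json
import sys
import urllib.request

HEALTH_CHECK_URLS = [
    "https://emap-dr.example.org/portal/portaladmin/healthCheck?f=json",  # assumed Portal URL
    "https://emap-dr.example.org/server/rest/info/healthCheck?f=json",    # assumed Server URL
]


def check(url: str) -> bool:
    """Return True if the endpoint responds with HTTP 200 and no error payload."""
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            body = json.loads(response.read().decode("utf-8"))
            return response.status == 200 and "error" not in body
    except Exception as exc:  # DNS, TLS, HTTP or timeout failures
        print(f"FAIL {url}: {exc}")
        return False


if __name__ == "__main__":
    results = {url: check(url) for url in HEALTH_CHECK_URLS}
    for url, ok in results.items():
        print(f"{'OK  ' if ok else 'FAIL'} {url}")
    sys.exit(0 if all(results.values()) else 1)
```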

Defining these distinct roles and leveraging the appropriate tools and services for each aspect of the platform—IaC/CM for infrastructure and base software, webgisdr for ArcGIS application state (with configuration managed via Azure App Configuration and Key Vault and backups to GRS/GZRS Blob storage for PROD) and Azure-native replication for user-managed PaaS data stores—establishes a comprehensive, automated and efficient High Availability and Disaster Recovery strategy for the new eMap platform.

4.2.4 Conceptual Backup Implementation

The following code is a conceptual implementation of the backup strategy discussed in this section, illustrating the patterns for running webgisdr from a Python script.

The script shows integration with Azure App Configuration and Key Vault using the Azure SDK and Managed Identities, configuration validation using Pydantic and asynchronous operation.

webgisdr_backup.py
#!/usr/bin/env python3

import argparse
import asyncio # (1)
import json
import logging
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict, Any, TypedDict
from urllib.parse import urlparse


from pydantic import BaseModel, SecretStr, HttpUrl, Field, FilePath  # (2)

from azure.identity.aio import DefaultAzureCredential
from azure.appconfiguration.aio import AzureAppConfigurationClient
from azure.keyvault.secrets.aio import SecretClient
from azure.core.exceptions import AzureError, ResourceNotFoundError
from azure.appconfiguration import ConfigurationSetting


from pythonjsonlogger import jsonlogger

# --- Global Constants ---
WEBGISDR_TIMEOUT_SECONDS = 7200  # 2 hours

# (3)
class AppConfigKeyMap(TypedDict):
    portal_admin_url: str
    webgisdr_tool_path: str
    shared_location_path: str
    backup_container_name: str
    storage_account_name: str
    portal_admin_username_ref: str # Key for AppConfig Key Vault reference for username
    portal_admin_password_ref: str # Key for AppConfig Key Vault reference for password


AZURE_APP_CONFIG_KEYS: AppConfigKeyMap = { # (4)
    "portal_admin_url": "emap:webgisdr:PortalAdminUrl",
    "webgisdr_tool_path": "emap:webgisdr:ToolPath",
    "shared_location_path": "emap:webgisdr:SharedLocationPath",
    "backup_container_name": "emap:webgisdr:AzureBlobContainerName",
    "storage_account_name": "emap:webgisdr:AzureStorageAccountName",
    "portal_admin_username_ref": "emap:webgisdr:PortalAdminUsernameRef",
    "portal_admin_password_ref": "emap:webgisdr:PortalAdminPasswordRef",
}

class ScriptArguments(BaseModel):
    app_config_endpoint: HttpUrl

class WebGISDRProperties(BaseModel):
    portal_admin_url: HttpUrl
    portal_admin_username: str
    portal_admin_password: SecretStr # (5)
    webgisdr_tool_path: FilePath
    shared_location_path: Path
    backup_container_name: str
    storage_account_name: str
    # (11)
    # restore_relational_data: bool = Field(default=True, alias="RESTORE_RELATIONAL_DATA")

    class Config:
        arbitrary_types_allowed = True

# --- Logging Setup (JSON Logging) ---
logger = logging.getLogger("arcgis_webgisdr_backup")
logger.setLevel(os.getenv("LOG_LEVEL", "INFO").upper())
log_handler = logging.StreamHandler(sys.stdout)
formatter = jsonlogger.JsonFormatter(
    '%(asctime)s %(levelname)s %(name)s %(module)s %(funcName)s %(lineno)d %(message)s',
    timestamp=True
)
log_handler.setFormatter(formatter)
logger.addHandler(log_handler)
logger.propagate = False # (12)


async def _resolve_key_vault_reference( # (6)
    setting: ConfigurationSetting, credential: DefaultAzureCredential
) -> str:
    """Helper to resolve a Key Vault reference from an App Configuration setting."""
    if not setting.value:
        raise ValueError(f"Key Vault reference for {setting.key} has no value (URI).")

    try:
        kv_ref_data = json.loads(setting.value)
        secret_uri = kv_ref_data.get("uri")
        if not secret_uri:
            raise ValueError(f"Key Vault reference for {setting.key} is missing URI in JSON value.")
    except json.JSONDecodeError:
        secret_uri = str(setting.value)
        logger.warning(
            f"Key Vault reference for {setting.key} was not JSON. "
            f"Assuming the value is the URI directly: {secret_uri}"
        )

    parsed_uri = urlparse(secret_uri)
    vault_url = f"{parsed_uri.scheme}://{parsed_uri.netloc}"
    secret_name = parsed_uri.path.strip("/").split("/")[1]  # (13)

    logger.info(f"Fetching secret '{secret_name}' from Key Vault: {vault_url}")
    async with SecretClient(vault_url=vault_url, credential=credential) as kv_client:
        secret_bundle = await kv_client.get_secret(secret_name)
        if secret_bundle.value is None:
            raise ValueError(f"Secret '{secret_name}' from {vault_url} has no value.")
        return secret_bundle.value


async def fetch_and_validate_config(app_config_endpoint: HttpUrl) -> WebGISDRProperties:
    """
    Fetches configuration from Azure App Configuration, resolves Key Vault references,
    and validates using Pydantic.
    """
    logger.info(f"Connecting to Azure App Configuration: {app_config_endpoint}")
    raw_config_values: Dict[str, Any] = {}

    async with DefaultAzureCredential() as credential:
        async with AzureAppConfigurationClient(
            endpoint=str(app_config_endpoint), credential=credential
        ) as app_config_client:

            async def fetch_setting(internal_key: str, azure_key: str):
                logger.info(f"Fetching App Configuration key: '{azure_key}' (for internal '{internal_key}')")
                try:
                    setting = await app_config_client.get_configuration_setting(key=azure_key)
                except ResourceNotFoundError:
                    logger.error(f"Azure App Configuration key '{azure_key}' not found.")
                    raise

                if setting.content_type == "application/vnd.azure.appconfiguration.keyvaultreference+json":
                    logger.info(f"Key '{azure_key}' is a Key Vault reference. Resolving...")
                    resolved_value = await _resolve_key_vault_reference(setting, credential)
                    target_key = internal_key.replace("_ref", "")
                    raw_config_values[target_key] = resolved_value
                    logger.info(f"Successfully resolved Key Vault reference for '{azure_key}'.")
                else:
                    raw_config_values[internal_key] = setting.value
                logger.debug(f"Fetched '{azure_key}': '{str(raw_config_values.get(internal_key, 'N/A'))[:30]}...'")
            try: # (14)
                async with asyncio.TaskGroup() as tg:
                    for internal_name, azure_name in AZURE_APP_CONFIG_KEYS.items():
                        tg.create_task(fetch_setting(internal_name, azure_name))
                logger.info("All App Configuration settings fetched.")
            except* AzureError as eg:
                for error in eg.exceptions:
                    logger.error(f"Azure SDK error during concurrent fetch: {error}", exc_info=error)
                raise RuntimeError("Failed to fetch one or more configurations from Azure.") from eg
            except* Exception as eg:
                for error in eg.exceptions:
                    logger.error(f"Unexpected error during concurrent fetch: {error}", exc_info=error)
                raise RuntimeError("Unexpected error during configuration fetching.") from eg

    try: # (15)
        if "webgisdr_tool_path" in raw_config_values:
            tool_path = Path(raw_config_values["webgisdr_tool_path"])
            if not tool_path.is_file():
                 logger.warning(f"webgisdr_tool_path '{tool_path}' does not exist or is not a file. Pydantic validation might fail if FilePath is strict.")


        validated_config = WebGISDRProperties(**raw_config_values)
        logger.info("Configuration successfully fetched and validated.")
        return validated_config
    except Exception as e: # (16)
        logger.error(f"Configuration validation failed: {e}", exc_info=True)
        raise


def create_properties_file_content(config: WebGISDRProperties) -> str:
    """Constructs the content for the webgisdr.properties file."""
    password = config.portal_admin_password.get_secret_value() # (17)

    content = f"""
PORTAL_ADMIN_URL={config.portal_admin_url}
PORTAL_ADMIN_USERNAME={config.portal_admin_username}
PORTAL_ADMIN_PASSWORD={password}
PORTAL_ADMIN_PASSWORD_ENCRYPTED=false
SHARED_LOCATION={config.shared_location_path}
BACKUP_STORE_PROVIDER=AzureBlob
AZURE_STORAGE_ACCOUNT_NAME={config.storage_account_name}
AZURE_BLOB_CONTAINER_NAME={config.backup_container_name}
    """.strip()
    logger.info("webgisdr.properties content generated.")
    return content


async def run_webgisdr_export(tool_path: Path, properties_file_path: Path) -> None:
    """Executes the webgisdr export command asynchronously."""
    cmd = [str(tool_path), "--export", "--file", str(properties_file_path)]
    logger.info(f"Executing webgisdr command: {' '.join(cmd)}")
    if not os.access(str(tool_path), os.X_OK): # (18)
        logger.warning(f"webgisdr tool at '{tool_path}' might not be executable by current user. Attempting to set execute bit.")
        try:
            current_mode = tool_path.stat().st_mode
            tool_path.chmod(current_mode | 0o111) # (19)
            logger.info(f"Execute permission set for '{tool_path}'.")
        except OSError as e:
            logger.error(f"Failed to set execute permission for '{tool_path}': {e}. Proceeding anyway.")

    def blocking_subprocess_run(): # (7)
        return subprocess.run(cmd, capture_output=True, text=True, check=False, timeout=WEBGISDR_TIMEOUT_SECONDS)

    try:
        result = await asyncio.to_thread(blocking_subprocess_run)
        if result.stdout: # (20)
            logger.info(f"webgisdr stdout:\n{result.stdout.strip()}")
        if result.stderr: # (21)
            log_level = logging.ERROR if result.returncode != 0  else logging.INFO
            logger.log(log_level, f"webgisdr stderr:\n{result.stderr.strip()}")


        if result.returncode == 0:
            logger.info("webgisdr export completed successfully.")
        else:
            logger.error(f"webgisdr export failed with exit code {result.returncode}.")
            raise subprocess.CalledProcessError(result.returncode, cmd, output=result.stdout, stderr=result.stderr) # (22)

    except subprocess.TimeoutExpired:
        logger.error(f"webgisdr export timed out after {WEBGISDR_TIMEOUT_SECONDS} seconds.", exc_info=True)
        raise
    except subprocess.CalledProcessError:
        raise # (23)
    except Exception as e:
        logger.error(f"An unexpected error occurred during webgisdr execution: {e}", exc_info=True)
        raise


async def main() -> None:  # (8)
    parser = argparse.ArgumentParser(description="Automated ArcGIS Enterprise webgisdr backup using modern Python.")
    parser.add_argument(
        "--app-config-endpoint",
        required=True,
        type=str,
        help="Azure App Configuration store endpoint (e.g., https://appconfig-name.azconfig.io).",
    )

    parsed_args_dict = vars(parser.parse_args())
    try:
        script_args = ScriptArguments(**parsed_args_dict) # (2)
    except Exception as e:
        logger.error(f"Invalid script arguments: {e}", exc_info=True)
        sys.exit(2)

    logger.info(
        "Starting webgisdr backup process with cloud-native configuration.",
        extra={"app_config_endpoint": str(script_args.app_config_endpoint)}
    )

    temp_properties_file_path: Path | None = None
    exit_code = 0

    try:
        # 1. Fetch and validate configuration
        config = await fetch_and_validate_config(script_args.app_config_endpoint)

        # 2. Create properties file content
        properties_content = create_properties_file_content(config)

        # 3. Write to a temporary, secure properties file
        with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".properties", prefix="webgisdr_") as tf:
            temp_properties_file_path = Path(tf.name)
            tf.write(properties_content)

        # Set restrictive permissions
        temp_properties_file_path.chmod(0o600)
        logger.info(f"Runtime properties file securely created at {temp_properties_file_path}")

        # 4. Execute webgisdr export
        await run_webgisdr_export(config.webgisdr_tool_path, temp_properties_file_path)

    except Exception as e:
        # This is a catch-all
        logger.critical(f"A critical error halted the backup process: {e}", exc_info=True)
        exit_code = 1
        # (9)
    finally:
       # (10)
        if temp_properties_file_path and temp_properties_file_path.exists():
            logger.info(f"Cleaning up temporary properties file: {temp_properties_file_path}")
            try:
                temp_properties_file_path.unlink()
                logger.info("Temporary properties file successfully removed.")
            except OSError as e:
                logger.error(
                    f"Error removing temporary properties file '{temp_properties_file_path}': {e}",
                    exc_info=True
                )

        logger.info(f"Backup process finished with exit code {exit_code}.")
        sys.exit(exit_code)

if __name__ == "__main__":
    asyncio.run(main()) # (8)
  1. Using asyncio for concurrent execution of I/O-bound tasks.
  2. Pydantic models (ScriptArguments, WebGISDRProperties) are used to define the expected structure, types and validation rules.
  3. A TypedDict describing the internal configuration keys the script resolves from Azure App Configuration.
  4. Maps user-friendly internal configuration names to the specific key names used in the Azure App Configuration store.
  5. Using SecretStr type from Pydantic for sensitive values.
  6. Configuration settings in Azure App Configuration are references to secrets stored in Azure Key Vault. The script resolves these references at runtime, fetching the actual secret value directly from Key Vault.
  7. webgisdr is a synchronous, potentially long-running process. To avoid blocking the asyncio event loop, subprocess.run is executed in a separate thread using asyncio.to_thread.
  8. Primary entry point and orchestrates the entire backup workflow: parsing arguments, fetching configuration, creating the properties file, running the webgisdr tool and cleaning up.
  9. Logic for failure notifications (e.g., sending alerts to Azure Monitor) can go here.
  10. Ensures that critical cleanup operations are performed regardless of success or failure.
  11. Add other webgisdr properties here.
  12. Avoid duplicate logs if root logger is also configured.
  13. /secrets/SECRET_NAME[/VERSION] -> SECRET_NAME.
  14. Use asyncio.TaskGroup for concurrent fetching.
  15. Validate and structure the configuration using Pydantic.
  16. In a real system, catch Pydantic ValidationError specifically.
  17. Access SecretStr value securely.
  18. Ensure executable.
  19. Add execute for user, group, other.
  20. Log stdout/stderr regardless of success for better diagnostics.
  21. webgisdr often outputs progress to stderr, so log as info unless error code.
  22. Raise an error to be caught by the main try/except.
  23. Already logged; just re-raise.
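
For reference, the Configuration Management tool would invoke this script on the active Portal VM with only the App Configuration endpoint as an argument, for example `python3 webgisdr_backup.py --app-config-endpoint https://appconfig-name.azconfig.io` (endpoint value illustrative, matching the help text above); every other parameter and secret is resolved at runtime via the VM's Managed Identity.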