4.3: Application Tier Implementation
✂️ Tl;dr 🥷
The Application Tier implementation for the eMap platform outlines the phased deployment of core ArcGIS Enterprise components (Portal for ArcGIS, ArcGIS Server, ArcGIS Data Store) across development (DEV/UAT) and production (PROD) environments. The MVP stage establishes foundational Azure VM-based deployments with Ubuntu 24.04 LTS, system hardening and integration with Azure storage services. Bronze introduces auto-scaling for ArcGIS Server via Azure VM Scale Sets (VMSS) in PROD, using CPU-based metrics to dynamically adjust capacity. Silver enhances resilience through high availability (HA) configurations: active-passive Portal pairs, VMSS minimum instances and primary-standby Data Store deployments, all leveraging Azure Availability Sets in Melbourne. Gold implements cross-region disaster recovery (DR) to Sydney using 'pilot light' infrastructure, Availability Zones and state restoration via webgisdr
backups. Each stage employs infrastructure-as-code (OpenTofu) and configuration management automation, with PROD progressively achieving scalability, fault tolerance and geo-resilience while DEV/UAT environments maintain simplified setups for cost efficiency.
Following the detailed configuration of the Web Tier (Section 4.1) and the strategy for application state replication using webgisdr
(Section 4.2), this section focuses on the Application Tier. This tier encompasses the core ArcGIS Enterprise software components which provide the geospatial capabilities of the new eMap platform. The implementation guidelines detail the deployment, configuration and scaling of these components across the various project stages (MVP, Bronze, Silver, Gold).
4.3. Application Tier
The Application Tier forms the core of the ArcGIS Enterprise deployment, housing the primary software components that deliver geospatial services and capabilities. It is responsible for data processing, analysis, content management and user interaction logic. This tier directly interacts with the Web Tier (receiving requests via Web Adaptors) and the Data Tier (accessing and managing data). The key components constituting the Application Tier are Portal for ArcGIS (user interface, content management, collaboration), ArcGIS Server (service publishing, geoprocessing, hosting server) and ArcGIS Data Store (supporting Portal's hosted layers and analysis tools). These components are deployed on dedicated Azure VMs.
```mermaid
graph TD
subgraph "Web Tier (Frontend - Simplified)"
direction LR
WA_P["📱<br>Portal Web Adaptor<br>(App Service)"] --> AppTier
WA_S["🖥️<br>Server Web Adaptor<br>(App Service)"] --> AppTier
end
subgraph AppTier ["Application Tier"]
direction TB
P4A[("🔑<br>Portal for ArcGIS VM(s)")]
AGS[("⚙️<br>ArcGIS Server VM / VMSS")]
ADS[("💾<br>ArcGIS Data Store VM(s)")]
P4A <-->|Federation, Internal Comms| AGS
AGS <-->|Hosting Server Role| P4A
AGS -->|Registers| ADS
P4A -->|Manages Hosted Layers via Hosting Server| ADS
end
subgraph "Data Tier (Backend - Simplified)"
direction LR
AppTier --> DTEG[("🗄️<br>Enterprise Geodatabase<br>(Azure PostgreSQL)")]
AppTier --> DTSP[("☁️<br>Cloud Storage<br>(Blob, ADLS Gen2, Files)")]
end
classDef webtier fill:#e3f2fd,stroke:#0b5ed7,stroke-width:2px;
classDef apptier fill:#e6ffed,stroke:#198754,stroke-width:2px;
classDef datatier fill:#fff8e1,stroke:#f57c00,stroke-width:2px;
class WA_P,WA_S webtier;
class P4A,AGS,ADS apptier;
class DTEG,DTSP datatier;
```
Diagram: Conceptual overview of the Application Tier components and their primary interactions.
Portal for ArcGIS: Application Tier vs. Web Tier Focus
Section 4.1 (Web Tier) showed how Portal for ArcGIS is accessed and exposed through the web infrastructure, along with some of its web-related configuration. This section (Application Tier) details the deployment, installation, core configuration and backend resilience of Portal for ArcGIS.
4.3.1 Application Tier MVP
The Minimum Viable Product (MVP) for the Application Tier focuses on establishing the foundational ArcGIS Enterprise components in all environments (DEV, UAT and the initial PROD deployment in Melbourne). This stage prioritises core functionality, base configuration and adherence to Esri's recommendations.
4.3.1.1 VM Provisioning and Common Configuration
The core ArcGIS Enterprise components (Portal for ArcGIS, ArcGIS Server, ArcGIS Data Store) will be deployed on Azure Virtual Machines (VMs). All VMs will use Ubuntu 24.04 LTS as the standard operating system. Cost optimisation for DEV and UAT environments is achieved through careful Azure VM SKU selection and the omission of High Availability (HA) and Disaster Recovery (DR) features, as detailed in Section 8.
- Portal for ArcGIS VM:
- PROD: A single dedicated Azure VM sized according to Esri recommendations for production loads (typically 16GB+ RAM, e.g., Standard_D4s_v3, or higher if initial load testing indicates the need). Premium SSD for the OS disk.
- DEV/UAT: A single dedicated Azure VM per environment, utilising lower Azure VM SKUs (e.g., meeting Esri minimums of 8GB RAM, such as Standard_B2ms). Standard HDD for OS disks.
- ArcGIS Server VM:
- PROD: A single dedicated Azure VM, using a custom "golden image" based on Ubuntu 24.04 LTS, sized for initial production GIS server roles (e.g., Standard_D4s_v3 or similar). Premium SSD for the OS disk. This will be transitioned to a VMSS in the Bronze Stage.
- DEV/UAT: A single dedicated Azure VM per environment, utilising lower Azure VM SKUs. Standard HDD for OS disks.
- ArcGIS Data Store (Relational) VM:
- PROD: A single dedicated Azure VM sized according to Esri recommendations for a relational data store supporting the hosting server (e.g., Standard_D4s_v3 or similar). Premium SSD for the OS disk and a dedicated managed data disk (Premium SSD recommended) for Data Store software and its internal database.
- DEV/UAT: A single dedicated Azure VM per environment, utilising lower Azure VM SKUs meeting Esri minimums. Standard HDD for OS and data disks.
- Common VM Specifications & Configuration (Automated via Configuration Management Tool):
- Operating System: Ubuntu 24.04 LTS.
- OS Hardening: Apply baseline security hardening.
- Prerequisite OS Packages: Install essential packages (e.g., `gettext-base`, `fontconfig`, and the PostgreSQL client libraries on the ArcGIS Server VM). Avoid Wayland, the X Window System and other graphical tools and libraries; the command line or REST API should be used instead.
- Service Account: A dedicated non-root service account (e.g., `arcgis`) created with appropriate, minimal permissions for software installation and operation.
- System Limits Configuration:
- File Handles: Maximum open file descriptors (`nofile`) for the `arcgis` account set to 65,535.
- Process Limits: Maximum processes (`nproc`) for the `arcgis` account set to 25,059.
- Persistence: Limits made persistent via `/etc/security/limits.conf` and systemd service overrides (e.g., in `/etc/systemd/system/arcgisserver.service.d/override.conf` or `/etc/systemd/system/arcgisportal.service.d/override.conf`). Example systemd override:
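A minimal sketch of such an override file, using the limits listed above (the drop-in path shown is the ArcGIS Server one named earlier; the Portal unit would use the equivalent path):

```ini
# /etc/systemd/system/arcgisserver.service.d/override.conf
# Conceptual example only - raises the per-service resource limits to the
# values required above; reload systemd and restart the service after editing.
[Service]
LimitNOFILE=65535
LimitNPROC=25059
```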
Importance of System Limits
Failure to correctly configure these file handle and process limits can lead to unpredictable ArcGIS Enterprise service failures, particularly under load. Symptoms may include refused connections, incomplete geoprocessing tasks and general component instability.
- Hostname and DNS Requirements:
- Machine hostnames must not contain underscores (`_`).
- All VMs must have resolvable Fully Qualified Domain Names (FQDNs) using internal Azure DNS (e.g., `portal-prod-mel.ffmvic.internal`).
- Azure Accelerated Networking: Enabled on all VMs where the selected SKU supports it.
- VM-Level Firewall Configuration (`ufw`):
- Configure `ufw` to allow only necessary inbound traffic on specific ports from designated internal sources (e.g., the Web Adaptor App Service VNet-integrated subnet, other Application Tier VMs).
- Default deny for all other inbound traffic.
- Managed Identities: Assign Azure Managed Identities to VMs for secure authentication to Azure Key Vault (licences, secrets) and Azure Storage (Cloud Stores, Portal content directory), granting least-privilege RBAC roles (see the OpenTofu sketch after the table below).
- Software Installation & Storage:
- ArcGIS software binaries (Portal, Server, Data Store) installed on a local disk (e.g., `/opt/arcgis/portal`, `/opt/arcgis/server`, `/opt/arcgis/datastore`). Installation of binaries on shared network storage is not supported.
- ArcGIS Data Store software and its internal database files should be on its dedicated local data disk.
- Minimum 10GB free disk space on the installation drive per component, plus space for logs and data.
- Shared Configuration Storage: Critical shared directories (`config-store` and `system`) should be hosted on Azure Files. The mounting and permissioning of these shares on VMs should be automated by the Configuration Management tool.
Item | Details | Notes |
---|---|---|
OS | Ubuntu 24.04 LTS | No graphical tools; CLI/REST API only |
System Limits | nofile=65,535, nproc=25,059 via /etc/security/limits.conf and systemd overrides | Critical for ArcGIS stability; failure causes service crashes under load |
Hostname/DNS | FQDN without underscores (e.g., portal-prod-mel.ffmvic.internal ) | Required for federation and internal communication |
Firewall (ufw ) | Allow inbound traffic from Web Adaptor App Service subnet and other Application Tier VMs | Default deny policy |
Managed Identities | RBAC roles for Key Vault (licences/secrets) and Azure Storage (Cloud Stores) | Least privilege access model |
Shared Directories | Azure Files for config-store /system ; mounted via Configuration Management | Automate permissions and mount points |
Table: Common VM Configuration
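As flagged in the Managed Identities item above, the following conceptual OpenTofu sketch shows least-privilege role assignments for a VM's system-assigned identity. The resource names (azurerm_linux_virtual_machine.portal, azurerm_key_vault.emap, azurerm_storage_account.portal_content) are illustrative placeholders, not the project's actual definitions, and the VM is assumed to be declared elsewhere with `identity { type = "SystemAssigned" }`.

```hcl
# Conceptual sketch only - resource names are placeholders.

# Allow the Portal VM's managed identity to read licences and secrets.
resource "azurerm_role_assignment" "portal_kv_secrets" {
  scope                = azurerm_key_vault.emap.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azurerm_linux_virtual_machine.portal.identity[0].principal_id
}

# Allow the same identity to read/write blobs in the Portal content-directory
# storage account (Cloud Stores follow the same pattern on their own accounts).
resource "azurerm_role_assignment" "portal_content_blob" {
  scope                = azurerm_storage_account.portal_content.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_linux_virtual_machine.portal.identity[0].principal_id
}
```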
4.3.1.2 Portal for ArcGIS - MVP Configuration
- Software Installation: Portal for ArcGIS 11.4 installed silently by the Configuration Management tool using the `arcgis` service account.
- Initial Site Creation: Create the Portal site, applying the licence (from Key Vault) and configuring the Primary Site Administrator (PSA) account (credentials managed via Key Vault).
- Content Directory: Configure the Portal `content` directory on Azure Blob Storage (ZRS for PROD MVP, LRS for DEV/UAT) via the Portal Administrator API, ensuring soft delete, versioning and resource locks are applied.
- Security Configuration:
- Implement SSO by configuring SAML or OpenID Connect against the enterprise Identity Provider (IdP).
- Define initial administrative users and roles within Portal.
- Validation:
- Verify Portal accessibility via its direct URL and through the Web Adaptor/ADC URL.
- Confirm PSA login and basic Portal functionality (e.g., item creation, user management).
4.3.1.3. ArcGIS Server (Hosting Server) - MVP Configuration
- Software Installation: ArcGIS Server 11.4 installed silently by the Configuration Management tool using the `arcgis` service account.
- Initial Site Creation: Create the ArcGIS Server site and apply the licence (from Key Vault).
- Federation and Hosting Server Designation:
- Federate the ArcGIS Server site with the Portal for ArcGIS instance.
- Designate this federated ArcGIS Server site as Portal's primary hosting server.
- Data Store Registration:
- Register the Azure Database for PostgreSQL (Flexible Server) as an Enterprise Geodatabase.
- Register Azure Blob Storage containers as Cloud Stores for the `arcgiscache`, `jobs` and `output` directories. Ensure the `arcgiscache` container has a subfolder named `arcgiscache`, as required by ArcGIS Server.
- Register Azure Data Lake Storage Gen2 (ADLSg2) as the Raster Store (Cloud Store).
- Validation:
- Verify ArcGIS Server Manager accessibility and site health.
- Confirm successful federation and hosting server designation in Portal settings.
- Test publishing a simple service from the registered Enterprise Geodatabase and accessing it via Portal.
4.3.1.4 ArcGIS Data Store - MVP Configuration
The ArcGIS Data Store (Relational type) is essential for supporting the hosting server.
- Role and Scope:
- Strictly limited to supporting Portal for ArcGIS internal operations for the hosting server, as defined in Section 2.b.8 and Section 3.4. This includes storing data for Hosted Feature Layers and outputs from Portal's built-in Spatial Analysis tools.
- It is NOT intended for primary enterprise data storage. Data governance policies must be enforced to limit its use.
- Software Installation: ArcGIS Data Store 11.4 software installed silently by the Configuration Management tool using the `arcgis` service account. Binaries on the OS disk (e.g., `/opt/arcgis/datastore`), data on the dedicated local data disk (e.g., `/opt/arcgis/datastore_data`).
- Initial Configuration (Data Store Configuration Wizard or `configuredatastore` utility):
- Run the configuration process on the Data Store VM.
- Provide the ArcGIS Server Admin URL (of the hosting server) and PSA credentials (from Key Vault).
- Specify the ArcGIS Data Store content directory (on the dedicated data disk).
- Select Relational data store type.
- Registration with Hosting Server: The configuration process registers the relational data store with the hosting ArcGIS Server site.
- Networking: Ensure NSG rules on the Data Store VM allow inbound traffic from the Portal and ArcGIS Server VMs on the necessary ports: 2443 for Data Store configuration and 9876 for relational data store communication with the server. (Project to-do: verify whether ports 29080 and 29081 are also needed or whether they are used only for the internal tile cache.)
- Validation:
- Verify registration status in ArcGIS Server Manager (Data Stores tab).
- Confirm Portal can publish hosted feature layers (e.g., by uploading a CSV to Portal).
- Use the `describedatastore` utility to check the health and properties of the relational data store.
- Backup (via `webgisdr`): The content within the ArcGIS Data Store is backed up as part of the `webgisdr` utility process (Section 4.2), which captures the entire ArcGIS Enterprise state. For MVP, this is the primary DR mechanism for its content. Internal automated backups of the relational data store (to its local backup directory, default: `<ArcGIS Data Store directory>/arcgisdatastore/backup/relational`) should also function, but the `webgisdr` backup is the key enterprise-level backup. The default local backup location should be monitored for disk space.
4.3.2. Application Tier Bronze Stage
The Bronze Stage for the Application Tier focuses on enabling and validating automatic scaling for the ArcGIS Server component within the Production (PROD) environment in Melbourne. This enhancement is designed to optimise resource utilisation and maintain performance under varying user and service loads. DEV and UAT environments continue with single ArcGIS Server VMs; auto-scaling is a PROD-specific feature.
4.3.2.1. Portal for ArcGIS - Bronze Stage
- Monitoring and Performance: Continuously monitor the Portal VM's key performance indicators (CPU, memory, disk I/O, application response times) using Azure Monitor.
- Manual Vertical Scaling Strategy: If performance monitoring indicates the Portal VM is a bottleneck, a manual vertical scaling procedure (resizing the VM to a larger SKU) will be followed.
- Triggers: Define specific metric thresholds (e.g., sustained CPU > 85% for 1 hour during peak load, Portal page load times consistently exceeding X seconds for 95th percentile users) that initiate a review.
- Procedure (Documented in Operational Runbooks):
- Planning: Assess impact (downtime required for VM resize/reboot), schedule during a maintenance window, communicate to stakeholders.
- Execution: Update the VM size in the OpenTofu configuration (the `size` property of the `azurerm_linux_virtual_machine` resource) and re-apply, as shown in the sketch below. The VM will typically reboot.
- Validation: Verify Portal functionality and confirm performance metrics have improved post-scaling.
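A minimal sketch of that change, assuming the Portal VM's SKU is driven by a variable (names are illustrative and the remaining VM arguments are elided here); scaling up then becomes a one-line change followed by a plan/apply during the maintenance window:

```hcl
variable "portal_vm_size" {
  description = "Azure VM SKU for the Portal for ArcGIS VM (PROD Melbourne)."
  type        = string
  default     = "Standard_D4s_v3" # change to e.g. "Standard_D8s_v3" to scale up
}

resource "azurerm_linux_virtual_machine" "portal" {
  # ...image, disk, identity and networking arguments as per the MVP definition...
  size = var.portal_vm_size
}
```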
4.3.2.2. ArcGIS Server (Hosting Server) - Bronze Stage
The ArcGIS Server in PROD transitions from a single VM to an Azure Virtual Machine Scale Set (VMSS) to enable automatic horizontal scaling.
- VMSS Deployment:
- PROD ArcGIS Server deployed as an Azure VMSS using the custom "golden image" (Ubuntu 24.04 LTS with ArcGIS Server pre-requisites/configurations).
- Flexible Orchestration Mode is recommended for the VMSS.
- Azure Files for `config-store`/`system` and Cloud Stores for `arcgiscache`/`jobs`/`output`/Raster Store remain essential.
- Auto-Scaling Configuration (OpenTofu - `azurerm_monitor_autoscale_setting`):
- Metric-Based Rules:
- Scale-Out Condition: e.g., If average "Percentage CPU" across VMSS instances > 70% for 10 minutes, increase instance count by 1.
- Scale-In Condition: e.g., If average "Percentage CPU" < 30% for 20 minutes, decrease instance count by 1.
- Instance Limits (Profile Capacity):
- Minimum Instances: 1 (May increase to 2 in Silver Stage for HA baseline).
- Maximum Instances: Initially 4 (to be refined based on load testing and budget).
- Default Instances: 1.
- Cooldown Period: Configure to prevent rapid "flapping" (e.g., 10 mins for scale-out, 20 mins for scale-in).
- Metric-Based Rules:
- Configuration Management for Scaled Instances:
- The Configuration Management tool, using VMSS Custom Script Extension, must ensure new VMSS instances automatically join the existing ArcGIS Server site.
- Site-joining credentials (PSA username/password) are retrieved securely from Azure Key Vault by the new instance using its Managed Identity.
- Testing and Validation:
- Conduct load tests (Azure Load Testing, JMeter, k6) to simulate realistic user loads.
- Verify scaling rules trigger as expected and the platform remains stable.
- Monitor KPIs (service response times, error rates, resource utilization) during scaling.
- Validate new instances join the site and serve traffic.
- ArcGIS Server VMSS Auto-Scaling Configuration:
- The ArcGIS Server Virtual Machine Scale Set (VMSS) allows the number of ArcGIS Server instances to dynamically adjust based on real-time demand or a predefined schedule.
- Metric-Based Scaling Rules: Azure Monitor Autoscale settings should be configured to govern the scaling behaviour of the VMSS. This leverages host-based metrics, requiring no additional agents on the VM instances for basic CPU/memory metrics.
- Instance Limits (Profile Capacity):
- Minimum Instances: A minimum number of instances should be maintained to ensure baseline availability and responsiveness, even during low-load periods. For Bronze, this should be 1, with 2 becoming the minimum in Silver.
- Maximum Instances: An upper limit on the number of instances should be defined based on anticipated peak load, performance targets and budget considerations (e.g., 4 instances).
- Default Instances: The count the VMSS will revert to if no rules are met or upon initial deployment.
- Cooldown Period: A cooldown period (e.g., 5-10 minutes for scale-out, 10-20 minutes for scale-in) will be configured after each scaling action. This prevents rapid, successive scaling events ("flapping") by allowing metrics to stabilise before further scaling decisions are made.
- Automation (IaC): These autoscale settings (rules, instance limits, cooldown periods) should be defined declaratively and managed through OpenTofu scripts, ensuring version control and consistent application. The `azurerm_monitor_autoscale_setting` resource in OpenTofu should be used for this.
- Configuration Management for Scaled Instances:
- The Configuration Management tool, typically invoked via VMSS Custom Script Extensions or by using a pre-configured "golden image", will ensure that any new ArcGIS Server instances provisioned by auto-scaling automatically and correctly join the existing ArcGIS Server site.
- Credentials required for joining the site (e.g., primary site administrator username and password) should be securely retrieved from Azure Key Vault at runtime by the new instance, using its Managed Identity. This avoids hardcoding sensitive information.
- The `az vmss extension set` command can be used to configure extensions such as the Custom Script Extension on the VMSS, and `az vmss run-command invoke` can be used to execute scripts on instances, which the CM tool might leverage (a conceptual OpenTofu sketch follows).
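The following conceptual OpenTofu sketch illustrates that pattern with a Custom Script Extension attached to the ArcGIS Server scale set so that new instances run a join-site script at provisioning time. The resource names, script URL and command are illustrative assumptions only; with Flexible orchestration the extension may instead be declared inline on the scale set resource.

```hcl
# Conceptual sketch only - names, URLs and the bootstrap script are placeholders.
resource "azurerm_virtual_machine_scale_set_extension" "arcgis_join_site" {
  name                         = "arcgis-join-site"
  virtual_machine_scale_set_id = azurerm_linux_virtual_machine_scale_set.arcgis_server.id
  publisher                    = "Microsoft.Azure.Extensions"
  type                         = "CustomScript"
  type_handler_version         = "2.1"

  # The bootstrap script uses the instance's Managed Identity to fetch the PSA
  # credentials from Key Vault before calling the ArcGIS Server join-site
  # operation, so no secrets are embedded in the template.
  settings = jsonencode({
    fileUris = ["https://stemapbootstrap.blob.core.windows.net/scripts/join-arcgis-site.sh"]
  })

  protected_settings = jsonencode({
    commandToExecute = "bash join-arcgis-site.sh --keyvault kv-emap-prod-mel"
  })
}
```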
```mermaid
flowchart TB
subgraph "Azure Monitor"
A[📊 Collects Metrics e.g., CPU Usage from VMSS] --> B{🔄 Autoscale Rules Engine}
end
subgraph C["ArcGIS Server VMSS (PROD - Melbourne)"]
direction LR
VM1["💻 VMSS Instance 1"]
VM2["💻 VMSS Instance 2"]
VM3["💻 VMSS Instance N"]
LB["⚖️ Azure Load Balancer"] --> VM1
LB --> VM2
LB --> VM3
end
B -- "If CPU > 70%" --> S_OUT["📤 Scale Out Action"]
B -- "If CPU < 30%" --> S_IN["📥 Scale In Action"]
S_OUT -->|"Provisions New Instances"| VMSS_API["⚙️ VMSS Management API"]
S_IN -->|"Deallocates Instances"| VMSS_API
VMSS_API -->|"Manages Instance Count: Min/Max/Default"| C
classDef default fill:#f9f9f9,stroke:#333,stroke-width:1px
```
Diagram: Conceptual overview of ArcGIS Server VMSS auto-scaling in the Bronze Stage, triggered by Azure Monitor metrics.
Type | Metric | Threshold | Time Window | Cooldown | Action | Notes |
---|---|---|---|---|---|---|
Scale-Out | Percentage CPU | >70% | 10 min | 10 min | Add 1 instance | Prevents overloading during sustained load |
Scale-In | Percentage CPU | <30% | 20 min | 20 min | Remove 1 instance | Conservative to avoid premature scaling |
Table: ArcGIS Server VMSS Auto-Scaling Rules
More Responsive Scaling
Other scale-out rules, for example based on memory pressure, can also be defined to make scaling more robust and responsive. For simplicity, only a single CPU-based rule is covered here.
The following conceptual OpenTofu code illustrates how auto-scaling might be defined for the ArcGIS Server VMSS in the PROD environment:
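(The block below is a reconstruction of that conceptual example; the scale set reference, resource group and notification address are illustrative placeholders.)

```hcl
# Conceptual example only - resource references and values are illustrative.
resource "azurerm_monitor_autoscale_setting" "arcgis_server_vmss" {
  name                = "autoscale-arcgisserver-prod-mel"
  resource_group_name = azurerm_resource_group.prod_mel.name
  location            = azurerm_resource_group.prod_mel.location
  target_resource_id  = azurerm_linux_virtual_machine_scale_set.arcgis_server.id
  enabled             = true

  profile {
    name = "default-cpu-profile"

    capacity {
      default = 1
      minimum = 1 # raised to 2 in the Silver Stage
      maximum = 4
    }

    # Scale out: average CPU > 70% over 10 minutes adds one instance.
    rule {
      metric_trigger {
        metric_name        = "Percentage CPU"
        metric_resource_id = azurerm_linux_virtual_machine_scale_set.arcgis_server.id
        time_grain         = "PT1M"
        statistic          = "Average"
        time_window        = "PT10M"
        time_aggregation   = "Average"
        operator           = "GreaterThan"
        threshold          = 70
      }
      scale_action {
        direction = "Increase"
        type      = "ChangeCount"
        value     = "1"
        cooldown  = "PT10M"
      }
    }

    # Scale in: average CPU < 30% over 20 minutes removes one instance.
    rule {
      metric_trigger {
        metric_name        = "Percentage CPU"
        metric_resource_id = azurerm_linux_virtual_machine_scale_set.arcgis_server.id
        time_grain         = "PT1M"
        statistic          = "Average"
        time_window        = "PT20M"
        time_aggregation   = "Average"
        operator           = "LessThan"
        threshold          = 30
      }
      scale_action {
        direction = "Decrease"
        type      = "ChangeCount"
        value     = "1"
        cooldown  = "PT20M"
      }
    }
  }

  # Notify platform administrators whenever a scaling event occurs.
  notification {
    email {
      custom_emails = ["emap-platform-admins@example.org"]
    }
  }
}
```

The notes below explain the key elements of this configuration: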
- A unique and descriptive name for this autoscale setting resource within Azure.
- Specifies the Azure Resource Group where the target VM Scale Set is located.
- The resource ID of the specific ArcGIS Server VM Scale Set that this autoscale setting will manage.
- A boolean flag to enable or disable this autoscale setting. `true` means it's active.
- Defines a set of scaling rules. Multiple profiles can exist (e.g., for different times of day or schedules), but a default profile is common.
- The instance count the VMSS will revert to if no scaling rules are met, or the initial count upon deployment if different from the minimum.
- The absolute minimum number of running instances the VMSS must maintain, ensuring baseline availability.
- The maximum number of instances the VMSS can scale out to. This acts as a cap to control costs and prevent runaway scaling. It should initially be set to 4 and revisited by the project based on actual load, traffic patterns and budget.
- The source of the metric being monitored, which is the VMSS itself, using Azure's host-based metrics.
- The threshold for the scale-out rule: if the average CPU percentage across all instances is greater than 70% for the `time_window`, a scale-out action is triggered.
- The number of instances to add when the scale-out rule is triggered.
- The cooldown period after a scale-out action, during which this rule won't trigger again. This allows the system and metrics to stabilise.
- A longer observation window for scale-in decisions to avoid reducing capacity prematurely based on temporary lulls.
- The threshold for the scale-in rule: if the average CPU percentage is less than 30%, a scale-in action is considered.
- The number of instances to remove when the scale-in rule is triggered.
- A longer cooldown period for scale-in actions to prevent "flapping".
- Configures notifications to be sent when autoscale events occur, ensuring administrators are aware of scaling activities.
4.3.2.3. ArcGIS Data Store (Relational) - Bronze Stage
The ArcGIS Data Store remains a single VM in PROD during the Bronze stage.
- Monitoring and Performance: Continuously monitor the Data Store VM's key performance indicators (CPU, memory, disk I/O for its internal database, query response times for hosted layers) using Azure Monitor and ArcGIS Data Store utilities (e.g., `describedatastore`).
- Manual Vertical Scaling Strategy: If the Data Store VM becomes a performance bottleneck (identified through monitoring), the same manual vertical scaling procedure documented for the Portal VM (update the OpenTofu `size` property, re-apply, validate) will be followed. This procedure must be documented in the Operational Runbooks.
Testing and Validation of Auto-Scaling
Comprehensive testing is essential to confirm that the auto-scaling mechanisms function correctly.
- Load Testing Strategy: Utilise tools such as Azure Load Testing, JMeter, or k6 to simulate realistic and peak user loads against the services hosted on the PROD environment. Test scenarios should cover various request types (map rendering, feature queries, geoprocessing tasks).
- Verification Steps:
- Confirm that scaling rules (both scale-out and scale-in) trigger accurately based on the defined metric thresholds (CPU utilisation).
- Observe that the number of VMSS instances increases under sustained load and decreases appropriately when the load subsides.
- Monitor key performance indicators (KPIs) such as service response times, error rates and resource utilisation across all tiers during scaling events.
- Ensure the platform remains stable and responsive throughout the scaling operations, with no service degradation or failures.
- Validate that new ArcGIS Server VMSS instances successfully join the site (via Configuration Management automation) and begin processing requests distributed by the load balancer.
4.3.3. Application Tier - Silver Stage
The Silver Stage focuses on implementing High Availability (HA) for all critical components within the primary Production (PROD) region (Melbourne). This ensures resilience against single points of failure, significantly improving uptime and reliability of the new eMap platform. DEV and UAT environments continue with single-instance VMs for core components.
No Availability Zones in Melbourne (Azure Australia Southeast)
As the Azure Australia Southeast (Melbourne) region currently lacks Availability Zones, High Availability for VM-based components (Portal for ArcGIS, ArcGIS Data Store) will be achieved using Azure Availability Sets. This distributes VMs across different fault domains (protecting against hardware failures such as server rack power/network issues) and update domains (protecting against planned Azure maintenance events) within the data centre.
4.3.3.1. Portal for ArcGIS - Silver Stage
An active-passive HA configuration for Portal for ArcGIS will be implemented according to Esri's guidelines.
- HA Architecture: Two Portal for ArcGIS VMs deployed within an Azure Availability Set.
- VM Provisioning for Standby: OpenTofu scripts will be updated to deploy the second Portal VM, ensuring it's placed in a different fault and update domain (see the OpenTofu sketch after this list).
- Software Installation and HA Configuration (Automated via CM Tool):
- Install Portal for ArcGIS software on the second VM.
- Configure Esri's native HA for Portal. This involves:
- Shared Content Directory: The Portal `content` directory, hosted on Azure Blob Storage (already configured with ZRS from MVP/Bronze), is critical and is inherently shared and accessible by both Portal VMs.
- Internal State Replication: Portal for ArcGIS includes built-in mechanisms for replicating its internal system database (stores users, groups, items, security settings) and synchronising its search index from the active machine to the passive machine.
- Automatic Failover: The Portal HA configuration manages automatic failover. If the active Portal VM becomes unavailable, the passive VM is promoted to active status. Failover properties (e.g., monitoring intervals, frequency) are configurable in the `portal-ha-config.properties` file on each Portal VM.
- ADC Integration: The highly available ADC routes traffic destined for `/portal/*` to the Portal Web Adaptor App Service. The Web Adaptor is configured with the URL that correctly resolves to the active Portal machine. The Portal's health check endpoint (e.g., `https://<portal.domain.com>:7443/arcgis/portaladmin/system/healthcheck`) can be used by monitoring systems, or indirectly by the ADC assessing Web Adaptor App Service health.
- Testing HA: Simulate failure of the active Portal VM and verify automatic failover to the passive VM, measuring RTO and ensuring data consistency.
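A conceptual OpenTofu sketch of the Availability Set placement for the Portal HA pair described above. Resource names, the SSH key path and the image reference are illustrative placeholders, not the project's actual definitions.

```hcl
# Conceptual sketch only - names and values are placeholders.
resource "azurerm_availability_set" "portal" {
  name                         = "avset-portal-prod-mel"
  location                     = azurerm_resource_group.prod_mel.location
  resource_group_name          = azurerm_resource_group.prod_mel.name
  platform_fault_domain_count  = 2
  platform_update_domain_count = 5
  managed                      = true
}

# Active and passive Portal VMs share the Availability Set, so Azure places
# them in different fault and update domains within the Melbourne region.
resource "azurerm_linux_virtual_machine" "portal_ha" {
  count               = 2
  name                = "vm-portal-prod-mel-${count.index + 1}"
  resource_group_name = azurerm_resource_group.prod_mel.name
  location            = azurerm_resource_group.prod_mel.location
  size                = "Standard_D4s_v3"
  availability_set_id = azurerm_availability_set.portal.id
  admin_username      = "azureadmin"

  network_interface_ids = [azurerm_network_interface.portal[count.index].id]

  admin_ssh_key {
    username   = "azureadmin"
    public_key = file("~/.ssh/emap_admin.pub")
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Premium_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "ubuntu-24_04-lts"
    sku       = "server"
    version   = "latest"
  }

  identity {
    type = "SystemAssigned"
  }
}
```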
4.3.3.2. ArcGIS Server (Hosting Server) - Silver Stage
The ArcGIS Server VMSS, already deployed in Bronze, inherently supports HA through instance distribution and Azure's automatic replacement of unhealthy instances.
- Availability Set Deployment for VMSS: The VMSS should be explicitly configured to distribute its instances across the fault and update domains within an Availability Set.
- Minimum Instance Count: The `minimum` capacity in the VMSS autoscale profile should be increased to 2. This ensures at least two instances are always running, providing a baseline for HA even during low load.
- Shared Resources Resilience:
- Azure Files (`config-store`, `system`): Must use Premium tier with ZRS replication.
- Azure Blob Storage (`arcgiscache`, `jobs`, `output`) and ADLS Gen2 (Raster Store): Must use ZRS replication.
- Azure Database for PostgreSQL (Enterprise Geodatabase): Must be configured with Azure's native High Availability option (zone-redundant standby replica).
- Health Monitoring & Automatic Replacement:
- Azure Load Balancer health probes (targeting an ArcGIS Server health endpoint) ensure traffic is routed only to healthy VMSS instances.
- The VMSS Application Health extension can be used for more granular health reporting.
- Azure automatically attempts to replace unhealthy instances based on these health signals and the VMSS 'Automatic instance repairs' policy.
Parameter | Bronze Stage Value | Silver Stage Value | Notes |
---|---|---|---|
Minimum | 1 | 2 | Baseline HA introduced in Silver |
Maximum | 4 | 8 | Adjust based on load testing |
Default | 1 | 2 | Fallback if no rules apply |
Table: ArcGIS Server VMSS Instance Limits (Bronze vs Silver)
4.3.3.3. ArcGIS Data Store - Silver Stage
Esri's native primary-standby HA configuration will be implemented for the Relational ArcGIS Data Store.
- HA Architecture: Two ArcGIS Data Store VMs deployed within an Azure Availability Set.
- VM Provisioning for Standby: OpenTofu scripts updated to deploy the second Data Store VM across the fault and update domains within an Availability Set. Ensure dedicated local data disks for each VM.
- Software Installation and HA Configuration (Automated via CM Tool):
- Install ArcGIS Data Store software on the second (standby) VM.
- The Configuration Management tool should configure the standby Data Store using the `configuredatastore` command-line utility or the REST API.
- Register it with the same hosting ArcGIS Server site as the primary Data Store.
- The internal data is replicated from the primary to the standby machine.
- Automatic Failover: ArcGIS Data Store manages automatic failover. If the primary Data Store VM fails, the standby is promoted to primary.
- Backup Location for Internal Backups:
- While `webgisdr` is the strategic enterprise backup, ArcGIS Data Store also performs its own internal scheduled backups (default location: `<ArcGIS Data Store directory>/arcgisdatastore/backup/relational`).
- For an HA setup, it's recommended to change this default internal backup location to a shared network location accessible by both primary and standby VMs. An Azure Files share (configured with ZRS for PROD) is suitable. This can be configured using the `configurebackuplocation` utility with the `--operation change` argument or the REST API.
- Testing HA: Simulate failure of the primary Data Store VM. Verify automatic promotion of the standby, measure RTO and confirm data consistency by attempting to publish/access hosted feature layers.
Component | HA Mechanism | Azure Resource | Notes |
---|---|---|---|
Portal for ArcGIS | Active-Passive (2 VMs) | Availability Set | Shared content directory on ZRS Blob Storage; internal replication |
ArcGIS Server | VMSS + Auto-Scaling | Availability Set | Minimum 2 instances in Silver; ZRS Azure Files for shared directories |
ArcGIS Data Store | Primary-Standby (2 VMs) | Availability Set | Backup location on ZRS Azure Files; automatic failover |
Table: Silver Stage (High Availability)
High Availability in Melbourne
By deploying critical components (ArcGIS Server VMSS instances, Portal for ArcGIS HA pair, ArcGIS Data Store HA pair) across Azure Availability Sets and ensuring that all supporting shared PaaS resources (Azure Files, Azure Blob Storage, Azure ADLS Gen2, Azure Database for PostgreSQL) are configured with Zone-Redundant Storage (ZRS), the PROD environment in Melbourne achieves robust intra-region high availability. This significantly mitigates the risk of downtime due to failures.
4.3.4. Application Tier - Gold Stage
The Gold Stage for the Application Tier focuses on establishing inter-region Disaster Recovery (DR) capabilities. This involves enabling the failover of Portal for ArcGIS, ArcGIS Server and ArcGIS Data Store from the primary region (Melbourne) to the secondary DR region (Sydney), building upon the "pilot light" infrastructure model.
Availability Zones in Sydney (Azure Australia East)
Unlike Melbourne, the Azure Australia East (Sydney) region supports Availability Zones. For the DR deployment in Sydney, all VM-based components (Portal for ArcGIS HA pair, ArcGIS Data Store HA pair, ArcGIS Server VMSS instances) MUST be deployed across different Availability Zones. This provides higher resilience within the DR region itself, should it become the active region.
Disaster Recovery Configuration for Application Tier Components
The DR strategy for the application tier combines Infrastructure as Code (IaC) for activation of "pilot light" resources in Sydney, data replication for Azure PaaS storage locations (Section 4.4) and application-specific state restoration using the `webgisdr` utility (detailed in Section 4.2).
- Portal for ArcGIS DR (PROD Sydney):
- Infrastructure (Pilot Light): Two Portal VMs provisioned by OpenTofu in Sydney, distributed across different Availability Zones. These VMs may be kept in a stopped state or provisioned with smaller SKUs during normal operations, to be started and/or scaled up during a DR event. This mirrors the HA setup in Melbourne but leverages AZs.
- Application State Restoration: The primary method for restoring Portal's state (items, users, groups, configurations, federated server information) is the `webgisdr` utility.
- Regularly scheduled `webgisdr` backups from the PROD Melbourne environment are stored in an Azure Blob Storage container configured with Geo-Redundant Storage (GRS).
- During a DR event, the DR automation scripts (IaC and CM) will orchestrate the restoration of the latest `webgisdr` backup to the Portal VMs activated in Sydney.
- Alternative (Azure Site Recovery - ASR): ASR for VM replication remains an optional secondary consideration for potentially faster RTO of the VM state, but `webgisdr` is key for application-level consistency.
- ArcGIS Server (Hosting Server) DR (PROD Sydney):
- Infrastructure (Pilot Light): The ArcGIS Server VMSS in Sydney is provisioned by OpenTofu, with instances distributed across Availability Zones, and is configured with a minimum instance count (e.g., 1) for the pilot light model (see the OpenTofu sketch after the stage summary table below).
- Auto-scaling rules identical to Melbourne's configuration are defined, allowing the VMSS to scale out automatically if load increases post-failover.
- The custom "golden image" used for VMSS instances in Melbourne must be replicated to an Azure Compute Gallery accessible in the Sydney region.
- Configuration and Service Restoration:
- The ArcGIS Server `config-store` and `system` directories, hosted on Azure Files (configured with GRS), will be accessible in Sydney due to storage replication.
- ArcGIS Server service definitions are part of Portal's configuration and thus are included in the `webgisdr` backup. Restoring the Portal configuration using `webgisdr` also restores the service definitions to the ArcGIS Server site in Sydney.
- The ArcGIS Server site in Sydney will then connect to the failed-over data sources (e.g., promoted Azure PostgreSQL read replica, replicated Cloud Stores on Blob/ADLS Gen2).
- ArcGIS Data Store DR:
- Infrastructure (Pilot Light): Two ArcGIS Data Store VMs provisioned by OpenTofu in Sydney, distributed across Availability Zones. These VMs can be stopped or use minimal SKUs. This setup mirrors the HA configuration in Melbourne but leverages AZs for the DR site.
- Application State Restoration: The internal content of the Relational ArcGIS Data Store is included in the `webgisdr` backups. The DR process involves restoring the `webgisdr` backup to the Data Store VMs activated in Sydney, orchestrated by the DR automation scripts as part of the overall `webgisdr` import.
- Alternative (ASR): Similar to Portal, ASR is an optional secondary consideration for VM-level replication of the Data Store VMs. The primary DR mechanism for its content remains `webgisdr`.
Feature | MVP | Bronze | Silver | Gold |
---|---|---|---|---|
Portal HA | Single VM | Single VM | 2 VMs (AS) | 2 VMs (AZ) + DR |
Server Scaling | Manual | VMSS (1-4) | VMSS (2-8) | Cross-region |
Data Store Resilience | Single VM | Single VM | 2 VMs (AS) | 2 VMs (AZ) + DR |
Table: HA DR Stage Summary
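The Availability Zone placement and pilot-light sizing described above for the Sydney scale set might be expressed along these lines in OpenTofu. This is a conceptual sketch: resource names, the variable, the image data source and the subnet are illustrative, and with Flexible orchestration the equivalent azurerm_orchestrated_virtual_machine_scale_set resource would be used instead.

```hcl
# Conceptual sketch only - names and values are placeholders.
variable "sydney_pilot_light_instances" {
  description = "Minimum ArcGIS Server instances kept running in Sydney during normal operations."
  type        = number
  default     = 1
}

resource "azurerm_linux_virtual_machine_scale_set" "arcgis_server_syd" {
  name                = "vmss-arcgisserver-prod-syd"
  resource_group_name = azurerm_resource_group.prod_syd.name
  location            = "australiaeast"
  sku                 = "Standard_D4s_v3"
  instances           = var.sydney_pilot_light_instances

  # Australia East supports Availability Zones; spread instances across all three.
  zones        = ["1", "2", "3"]
  zone_balance = true

  # Golden image replicated to an Azure Compute Gallery accessible in Sydney.
  source_image_id = data.azurerm_shared_image_version.arcgis_server_golden.id

  admin_username = "azureadmin"
  admin_ssh_key {
    username   = "azureadmin"
    public_key = file("~/.ssh/emap_admin.pub")
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Premium_LRS"
  }

  network_interface {
    name    = "nic-arcgisserver-syd"
    primary = true
    ip_configuration {
      name      = "internal"
      primary   = true
      subnet_id = azurerm_subnet.app_tier_syd.id
    }
  }
}
```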
- Automation and Runbooks for DR:
- OpenTofu and Configuration Management scripts must be fully parameterised for multi-region DR deployments, capable of provisioning and configuring the DR environment in Sydney.
- Detailed DR runbooks are critical, outlining step-by-step procedures for:
- Declaring a disaster.
- Failing over (activating pilot light resources, promoting the PostgreSQL replica, restoring the `webgisdr` backup, validating services in Sydney).
- Failing back to Melbourne once the primary region is restored (data synchronisation, traffic redirection).
- Testing DR:
- Conduct comprehensive DR tests (e.g., annually).
- Simulate a full Melbourne region outage.
- Verify GSLB failover, promotion of the PostgreSQL replica, `webgisdr` restoration, VMSS scale-out in Sydney and full application functionality in the Sydney DR environment.
- Test and validate failback procedures.
Component | Pilot Light Infrastructure (Sydney) | State Restoration Method | Testing Validation Steps |
---|---|---|---|
Portal for ArcGIS | 2 VMs in AZs (stopped/small SKU) | webgisdr backup from GRS Blob | Verify SSO, item access, hosted layers |
ArcGIS Server | VMSS (min 1 instance) in AZs | config-store /system on GRS | Load test services; validate auto-scaling |
ArcGIS Data Store | 2 VMs in AZs (stopped) | webgisdr + internal backups | Publish hosted layers; check relational DB sync |
Table: Gold Stage (Disaster Recovery) Component Summary
This phased implementation ensures that the Application Tier evolves systematically from a functional MVP to a fully scalable, highly available and disaster-recoverable system, aligning with the enterprise's operational and resilience requirements. Continuous automation and rigorous testing at each stage are key to the success of this approach.