4.9: VertiGIS Studio Workflows
TL;DR
VertiGIS Studio Workflows development for the eMap platform follows guidelines that ensure data integrity and resilience. Enterprise data resides in Azure PostgreSQL and is accessed via validated feature services, while transient data uses the ArcGIS Data Store with automated cleanup. Workflows adopt a PaaS-first approach, use scalable Azure resources and avoid hardcoded endpoints to support disaster recovery. Design patterns such as query federation optimise efficiency. DevOps integration includes workflow-as-code in Git, CI/CD pipelines and externalised configuration. Rigorous testing and avoidance of anti-patterns (e.g. direct database writes) ensure reliability. Data validation, lineage tracking and audit logs enforce quality, capturing the provenance of edits across all operations.
This section details the guidelines and best practices for developing VertiGIS Studio Workflows within the new eMap platform.
4.9.1 Development Guidelines
The following development guidelines and best practices are recommended for VertiGIS Studio Workflows.
Core Principles
The following principles provide the foundational context for all VertiGIS Studio Workflow development activities.
Data Sovereignty & Stewardship
- Enterprise Data Resides in Azure PostgreSQL: All authoritative and versioned enterprise datasets are managed in Azure Database for PostgreSQL.
- ArcGIS Data Store for Transient Content: The Esri-managed ArcGIS Data Store should be used for temporary outputs, staging data or user-generated content that is not classified as authoritative enterprise data. Workflows creating content in the ArcGIS Data Store should implement Time-To-Live (TTL) policies or automated cleanup mechanisms.
- Data Quality as a Shared Responsibility: Workflows that create or modify enterprise data should incorporate validation gates to check for schema conformance, attribute domain validity, spatial integrity and business rule adherence (a sketch of such a gate follows this list).
- Trackable Data Lineage: Modifications to enterprise data orchestrated by workflows should capture provenance metadata. This includes user identity, timestamp of the edit, the specific workflow and version performing the operation and the context or reason for the change.
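To make the validation-gate idea concrete, the following minimal Python sketch uses the ArcGIS API for Python to check proposed attribute values against a layer's fields and coded value domains before any commit. The function name and calling pattern are illustrative, not part of the platform; spatial and business-rule gates would be layered on top.

```python
from arcgis.features import FeatureLayer

def validate_against_schema(layer: FeatureLayer, attrs: dict) -> list[str]:
    """Return validation errors for proposed attribute values.

    Covers schema conformance and coded-value-domain membership only.
    """
    errors = []
    fields = {f.name: f for f in layer.properties.fields}
    for name, value in attrs.items():
        field = fields.get(name)
        if field is None:
            errors.append(f"Unknown field: {name}")
            continue
        domain = getattr(field, "domain", None)
        coded = getattr(domain, "codedValues", None) if domain else None
        if coded and value not in {cv.code for cv in coded}:
            errors.append(f"{name}={value!r} is outside its coded domain")
    return errors
```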
Architectural Alignment
- PaaS-First Approach: Design workflows to leverage the platform's Azure PaaS-centric architecture. This means interacting with data primarily through services published from Azure Database for PostgreSQL and using Azure Blob Storage or ADLS Gen2 for file-based inputs/outputs.
- Content Directory Awareness: Developers should be aware of the backend storage configurations. While workflows interact via services, understanding the underlying storage can inform performance considerations for large data operations.
- Resource Scalability Awareness: Design workflows to be efficient and mindful of the scalable nature of the platform. For example, long-running or computationally intensive geoprocessing tasks should be executed asynchronously to avoid tying up ArcGIS Server instances.
- Region-Aware Design for Disaster Recovery (No Hardcoded Endpoints): Workflows should not contain hardcoded region-specific service URLs or connection parameters. All service endpoints consumed by workflows must be configurable to allow for seamless operation during Disaster Recovery (DR) failover to Sydney. This requires referencing Portal items or using environment variables managed by the CI/CD pipeline to resolve service URLs dynamically (a resolution sketch follows this list).
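As one way to satisfy the no-hardcoded-endpoints rule, the sketch below resolves a service URL from an environment variable (set per environment by the CI/CD pipeline) with a Portal item as the fallback. The variable name and item ID are hypothetical.

```python
import os
from arcgis.gis import GIS

def resolve_service_url(gis: GIS, env_var: str, fallback_item_id: str) -> str:
    """Resolve a service URL at runtime rather than hardcoding a region.

    The pipeline sets env_var per environment (including the Sydney DR
    region); the Portal item URL is used when the variable is absent,
    since item URLs are updated centrally during a DR failover.
    """
    item_id = os.environ.get(env_var, fallback_item_id)
    item = gis.content.get(item_id)
    if item is None or not item.url:
        raise RuntimeError(f"Cannot resolve a service URL for item {item_id}")
    return item.url

# Hypothetical usage:
# gis = GIS("https://portal.example.com/portal")
# parcels_url = resolve_service_url(gis, "EMAP_PARCELS_ITEM_ID", "0123abcd...")
```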
Workflow Design Patterns
Employing consistent design patterns enhances the maintainability, performance and resilience of workflows developed for the platform.
Data Access Strategies
| Pattern | Implementation Guidance | Benefits |
|---|---|---|
| Query Federation | Design workflows to query multiple disparate data sources (via their respective services) but perform joins and complex processing in memory in the workflow or on the client side where possible (see the sketch below). | Reduces load on individual databases, increases flexibility, avoids complex cross-database service configurations. |
| Materialised View Approach | For complex analytical outputs that are frequently accessed, consider creating registered views or materialised views within Azure Database for PostgreSQL. Publish these as queryable services for workflows to consume. | Enables formal data governance of derived products, improves query performance for workflows, centralises complex logic. |
| Event Sourced Updates | For critical enterprise data where a full audit trail of changes is required, workflows should use services that log all changes. This can complement geodatabase versioning. | Enhances auditability, provides a comprehensive history of changes, supports temporal analysis. |
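A minimal sketch of the Query Federation pattern, assuming two hypothetical feature services and field names: each service filters its own data, and the small result sets are joined in memory with pandas rather than through a cross-database service configuration.

```python
from arcgis.features import FeatureLayer

# Hypothetical service URLs and fields, for illustration only.
assets = FeatureLayer(
    "https://services.example.com/arcgis/rest/services/Assets/FeatureServer/0")
inspections = FeatureLayer(
    "https://services.example.com/arcgis/rest/services/Inspections/FeatureServer/0")

# Push the filtering down to each service...
assets_df = assets.query(where="STATUS = 'ACTIVE'",
                         out_fields="ASSET_ID,ASSET_TYPE", as_df=True)
insp_df = inspections.query(where="RESULT = 'FAIL'",
                            out_fields="ASSET_ID,INSPECTED", as_df=True)

# ...then perform the join in memory, keeping load off the databases.
federated = assets_df.merge(insp_df, on="ASSET_ID", how="left")
```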
Enterprise Geodatabase Integration
Workflows interacting with data must differentiate their approach based on the data classification and the intended operation.
```mermaid
flowchart TD
    A["🚀 Workflow Start"] --> B{"❓ Data Operation Type"}
    B -->|Query| C["📖 Read-Only Service"]
    B -->|Update/Edit| D["✏️ Edit Operation"]
    B -->|Analyse| E["⚙️ Processing"]
    B -->|Create New| F["🆕 New Content"]
    C --> G["🏛️ Enterprise Feature Service"]
    D --> H{"❓ Enterprise Data?"}
    H -- Yes --> I["🏛️ Versioned Editing<br>(Feature Service)"]
    H -- No --> J["☁️ Hosted Layer Edits<br>(Portal REST)"]
    E --> K{"❓ Result Lifetime?"}
    K -- Temporary --> L["⏳ ArcGIS Data Store"]
    K -- Permanent --> M["🗄️ Azure PostgreSQL"]
    F --> N{"❓ Classification?"}
    N -- Enterprise --> O["🗄️ Azure PostgreSQL<br>(Validated)"]
    N -- Temporary --> P["⏳ ArcGIS Data Store"]
    I --> Q["✅ Tracked Edits"]
    J --> R["📱 Portal Interaction"]
    classDef default fill:#fff,stroke:#333,stroke-width:1px;
    classDef decision fill:#e3f2fd,stroke:#0b5ed7;
    classDef enterprise fill:#e8f5e9,stroke:#1e8e3e;
    classDef transient fill:#fff8e1,stroke:#f57c00;
    class B,H,K,N decision;
    class C,G,I,M,O,Q enterprise;
    class J,L,P,R transient;
```
Diagram: Decision logic for VertiGIS Studio Workflows interacting with data stores and services, guiding data placement and interaction methods based on data classification and operation type.
Implementation Guidelines
Authoritative Datasets (Enterprise Geodatabase - Azure PostgreSQL)
- When to Use: Core business assets, regulatory data, systems of record.
- Implementation Requirements (a combined sketch follows this list):
- All access (read and write) should be through feature services published from registered enterprise geodatabases.
- For editing workflows, services should be configured with version management capabilities enabled. Unversioned edits to enterprise datasets via workflows are strongly discouraged.
- Workflows performing edits should incorporate validation steps (schema, attribute, spatial, business rules) before committing changes.
- Metadata related to edits (user, timestamp, workflow ID, change reason) should be captured through automated edit tracking features of the feature service or custom attributes populated by the workflow.
- Consider using PostgreSQL native capabilities (e.g., constraints, triggers), managed by Database Administrators, to enforce data integrity at the database level, supplementing workflow-level validation.
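A combined sketch of the requirements above, assuming editor tracking is enabled on the service and that the layer carries hypothetical custom audit fields (WF_ID, WF_REASON); validate_against_schema is the gate sketched earlier in this section.

```python
from arcgis.features import FeatureLayer

def apply_tracked_edit(layer: FeatureLayer, oid: int, new_attrs: dict,
                       workflow_id: str, reason: str) -> dict:
    """Validate, stamp and apply one update through the feature service.

    Editor tracking on the service records user and timestamp; the
    hypothetical WF_ID / WF_REASON fields capture workflow provenance.
    """
    errors = validate_against_schema(layer, new_attrs)
    if errors:
        raise ValueError(f"Edit not committed, validation failed: {errors}")

    attrs = {"OBJECTID": oid, **new_attrs,
             "WF_ID": workflow_id, "WF_REASON": reason}
    return layer.edit_features(updates=[{"attributes": attrs}])
```

The returned dictionary should then be checked for per-record success, as shown in the post-edit verification sketch in 4.9.2.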
Operational Datasets (ArcGIS Data Store - Esri-Managed PostgreSQL)
- When to Use: Temporary features, intermediate analysis results, user-generated content explicitly not classified as enterprise data.
- Implementation Requirements:
- Workflows creating hosted feature layers should implement or trigger automated Time-To-Live (TTL) policies.
- Adopt clear naming conventions for hosted layers created by workflows to indicate their transient or specific-purpose nature (e.g., `temp_wf_analysis_[workflow_id]_[timestamp]`).
- Design workflows with clear pathways for promoting valuable data from a hosted layer to an enterprise geodatabase if it meets the criteria for enterprise data. This promotion process should involve validation and registration.
- Automate cleanup processes for hosted layers, triggered upon workflow completion or on a schedule, to prevent accumulation of orphaned or obsolete data (see the cleanup sketch below).
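A cleanup sketch under the naming convention above, assuming a seven-day TTL; the query string and search limit are illustrative.

```python
import time
from arcgis.gis import GIS

TTL_SECONDS = 7 * 24 * 3600  # assumed retention for workflow outputs

def cleanup_temp_layers(gis: GIS) -> int:
    """Delete expired workflow-created hosted layers; return the count.

    Relies on the temp_wf_ naming convention; Item.created is reported
    in epoch milliseconds by the ArcGIS API for Python.
    """
    cutoff_ms = (time.time() - TTL_SECONDS) * 1000
    items = gis.content.search(query="title:temp_wf_*",
                               item_type="Feature Layer", max_items=500)
    deleted = 0
    for item in items:
        if item.created < cutoff_ms and item.delete():
            deleted += 1
    return deleted
```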
Reference Data (Mixed Sources)
- When to Use: Basemaps, lookup tables, contextual information layers.
- Implementation Requirements:
- Prioritise consumption of cached map services or vector tile services for basemaps and frequently accessed, static reference layers to optimise performance.
- Implement client-side caching strategies within VertiGIS Studio Workflows for infrequently changing lookup data (the caching idea is sketched below).
- If a CDN has been deployed, use CDN-cached services where available for widely distributed basemaps or large reference datasets.
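Workflows execute largely client-side in JavaScript, so the following Python sketch only illustrates the shape of a TTL cache for lookup data; the same idea translates directly to a workflow expression or activity.

```python
import time

_cache: dict[str, tuple[float, object]] = {}

def cached_lookup(key: str, fetch, ttl_seconds: float = 3600):
    """Serve lookup data from a local cache, refetching after the TTL.

    'fetch' is any zero-argument callable, e.g. a query against a
    rarely changing lookup table published as a feature service.
    """
    now = time.monotonic()
    hit = _cache.get(key)
    if hit and now - hit[0] < ttl_seconds:
        return hit[1]
    value = fetch()
    _cache[key] = (now, value)
    return value
```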
Modern DevOps Integration for Workflows
Integrating workflow development into modern DevOps practices is essential for quality, consistency and manageability.
Workflow as Code
- All VertiGIS Studio Workflow definitions (.json or .xml files) should be stored in the designated git repository.
- Each workflow should reside in its own appropriately named directory structure within the repository.
- Atomic commits with clear messages should be used to track changes to workflow logic.
CI/CD for Workflows
- CI/CD pipelines using GitHub Actions should be used to deploy workflows to the respective ArcGIS Enterprise Portal environments (DEV, UAT, PROD).
- Pipelines should handle:
- Fetching workflow definitions from git.
- Publishing or updating workflow items in Portal using the ArcGIS API for Python or the ArcGIS REST API (see the publish sketch after this list).
- Managing sharing permissions for workflow items.
- Environment-specific configurations.
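A sketch of the publish step such a pipeline might run, assuming the ArcGIS API for Python; the item type string and tags are hypothetical and depend on the VertiGIS Studio Workflow item schema in your Portal.

```python
from pathlib import Path
from arcgis.gis import GIS

def publish_workflow(gis: GIS, definition: Path, item_id: str | None = None):
    """Create or update a workflow item in Portal from a git-tracked file."""
    if item_id:
        # Update the existing item for this environment (DEV/UAT/PROD).
        item = gis.content.get(item_id)
        item.update(data=str(definition))
    else:
        # First deployment: add a new item. Type and tags are illustrative.
        item = gis.content.add(
            item_properties={"title": definition.stem,
                             "type": "Workflow",
                             "tags": "emap,workflow"},
            data=str(definition))
    return item
```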
Configuration Management for Workflows
- Workflows should not contain hardcoded environment-specific values such as service URLs, item IDs or credentials.
- Such configurations should be externalised and managed:
- Preferably through workflow inputs that are populated by the calling application or dynamically resolved at runtime; or alternatively through configuration files or lookup lists stored as Portal items, with environment-specific versions managed by the CI/CD pipeline.
- Sensitive details (API keys, service credentials if unavoidable for specific integrations) should be retrieved securely from Azure Key Vault at runtime by the workflow. If direct Key Vault access from the workflow definition is not possible, the services consumed by the workflow should handle secure credential management (a retrieval sketch follows this list).
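A minimal retrieval sketch using the Azure SDK for Python; the vault URL and secret name are illustrative. DefaultAzureCredential resolves to a managed identity in Azure and to developer credentials locally.

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Illustrative vault URL and secret name.
vault = SecretClient(vault_url="https://emap-kv.vault.azure.net",
                     credential=DefaultAzureCredential())
api_key = vault.get_secret("external-api-key").value  # fetched at runtime
```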
Observability in Workflows
- Incorporate standardised logging steps within workflows at critical decision points, service calls and error occurrences.
- Workflow execution logs should be accessible for troubleshooting. Consider patterns for routing key workflow events or errors to Azure Monitor Log Analytics (one such pattern is sketched below).
- Implement error handling within workflows to provide meaningful feedback to users.
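One logging pattern, sketched in Python: emit a consistent JSON shape per event so the records can be routed into Azure Monitor Log Analytics by an agent or ingestion pipeline. Logger and event names are illustrative.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("emap.workflows")  # illustrative logger name

def log_workflow_event(workflow_id: str, step: str,
                       level: int = logging.INFO, **context) -> None:
    """Emit one structured JSON log line per workflow event."""
    logger.log(level, json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "workflow_id": workflow_id,
        "step": step,
        **context,
    }))

# Hypothetical usage at a critical decision point:
# log_workflow_event("wf-parcel-split", "validation", outcome="passed")
```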
Data Lifecycle Management in Workflows
Workflows can play an active role in managing the lifecycle of data they create or use.
Lifecycle Transitions
- Design workflows that can automate parts of the data lifecycle, such as:
- Flagging datasets for archival based on age or usage criteria.
- Applying retention policies to temporary data created by the workflow.
- Initiating data quality review processes.
Event-Driven Workflows
- Utilise webhooks to trigger workflows based on events in other systems or within ArcGIS Enterprise itself (e.g., an item update in Portal, a new record in a database).
- Consider publishing events to Azure Service Bus when specific conditions are met, enabling orchestration of larger business processes (see the publishing sketch below).
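A publishing sketch using the azure-servicebus package; the queue name and message body are illustrative, and the connection string would itself come from Key Vault rather than source code.

```python
from azure.servicebus import ServiceBusClient, ServiceBusMessage

def publish_event(conn_str: str, queue: str, body: str) -> None:
    """Publish a workflow event for downstream business-process orchestration."""
    with ServiceBusClient.from_connection_string(conn_str) as client:
        with client.get_queue_sender(queue_name=queue) as sender:
            sender.send_messages(ServiceBusMessage(body))

# publish_event(conn_str, "emap-workflow-events",
#               '{"event": "dataset_promoted", "workflow_id": "wf-042"}')
```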
Common Anti-Patterns to Avoid
- Permanent Storage in ArcGIS Data Store: Using hosted feature layers for long-lived, authoritative enterprise data.
- Direct Database Connections for Writes: Bypassing registered ArcGIS feature services for modifying enterprise data in Azure PostgreSQL.
- Unmanaged Data Duplication: Creating redundant copies of data across different systems or storage tiers without a clear synchronisation and governance strategy.
- Schema Drift: Allowing the schema of hosted feature layers to evolve independently of the enterprise data models, leading to integration issues.
- Unconstrained Growth of Temporary Data: Creating hosted feature layers or temporary outputs without implementing TTL policies or automated cleanup mechanisms.
- Hardcoded Regional Endpoints: Embedding URLs specific to one Azure region (e.g., Melbourne) directly into workflows, preventing seamless DR failover.
- Unlogged Data Migrations: Performing data movement between storage tiers (e.g., from ArcGIS Data Store to Azure PostgreSQL) via workflows without adequate logging.
4.9.2 Data Quality, Validation, and Lineage in Workflows
Ensuring data quality, robust validation and clear lineage is paramount when workflows use or create data.
Pre-Edit Validation
Before any data is committed to an enterprise geodatabase (via a feature service) or a new dataset is finalised, workflows should perform:
- Schema Conformance Checks: Verify that input data or proposed changes adhere to the target layer's field definitions, data types and constraints.
- Attribute Domain Validation: Ensure attribute values fall within predefined coded value domains or ranges.
- Spatial Integrity and Topology Validation: For relevant datasets, use ArcGIS geometry operations or calls to geoprocessing services to check for valid geometries (e.g., no self-intersections in polygons, correct orientation) and topological consistency (e.g., no unintended overlaps or gaps for certain feature types). A geometry-validity sketch follows this list.
- Business Rule Verification: Implement checks to ensure data conforms to specific organisational or operational business rules not covered by basic schema or topology.
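As one option for the spatial-integrity gate, the sketch below uses Shapely on a GeoJSON-style geometry; ArcGIS geometry operations or a geoprocessing service are equally valid choices.

```python
from shapely.geometry import shape
from shapely.validation import explain_validity

def check_polygon_integrity(geojson_geometry: dict) -> str | None:
    """Return a readable reason if a polygon is invalid, else None."""
    geom = shape(geojson_geometry)
    if not geom.is_valid:  # catches self-intersections, bowties, etc.
        return explain_validity(geom)
    return None

# Hypothetical usage with a self-intersecting "bowtie" polygon:
# check_polygon_integrity({"type": "Polygon",
#     "coordinates": [[(0, 0), (2, 2), (0, 2), (2, 0), (0, 0)]]})
# -> 'Self-intersection[...]' (wording varies by GEOS version)
```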
Post-Edit Validation
After data modifications are submitted:
- Commit Verification: Confirm that the changes were successfully applied to the target feature service.
- Propagation Confirmation (if applicable): If edits trigger updates in related datasets or systems, verify this propagation.
- Workflows should include conditional paths to handle data that fails validation (e.g., log the error, notify a data steward, route the data for manual correction, prevent commit); a verification sketch follows this list.
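A commit-verification sketch over the response of FeatureLayer.edit_features(); failed records are logged so a conditional path can route them to a data steward rather than dropping them silently.

```python
import logging

logger = logging.getLogger("emap.workflows")

def verify_commit(edit_result: dict) -> bool:
    """True only if every submitted add/update/delete was applied."""
    failed = [r for results in edit_result.values()
              if isinstance(results, list)
              for r in results if not r.get("success")]
    for record in failed:
        logger.error("Edit rejected by feature service: %s", record)
    return not failed
```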
Metadata Management and Data Lineage
Workflows should contribute to maintaining correct metadata and data lineage.
- Metadata Population: When workflows create new datasets or significantly transform existing ones, they should populate or update relevant metadata fields (e.g., description, source, processing steps applied).
- Lineage Tracking: For workflows that modify enterprise data, ensure that sufficient information is logged or captured (e.g., within feature service edit tracking, separate audit logs, or workflow execution logs) to trace the origin of the change. This includes:
  - The workflow ID and version.
  - Timestamp of the operation.
  - A summary of the data affected and the changes made.
- This information is crucial for data audits, troubleshooting and understanding the provenance of datasets.
Workflow Testing and Quality Assurance

| Test Type | Required Coverage | Tools/Methods Recommended |
|---|---|---|
| Unit Tests | Focus on critical individual workflow steps or complex expressions. Test discrete logic components. | Manual execution of specific activities within VertiGIS Studio Designer; review of expression logic. |
| Integration Tests | Verify interactions with all external services (ArcGIS feature services, geoprocessing services, external APIs). Test connectivity, authentication and data exchange. | Test execution within VertiGIS Studio apps; Postman/Newman for direct service endpoint testing. |
| End-to-End Tests | Validate complete user journeys through the workflow, covering main success paths and key error handling scenarios. | Manual walkthroughs in target applications (e.g., Web AppBuilder, VertiGIS Studio Web); automated UI testing using Playwright for complex workflows. |
| Load Tests | Workflows anticipated to have high concurrency or perform resource-intensive server-side operations. | Simulate concurrent user execution; use tools such as JMeter or k6 to stress underlying services consumed by the workflow. |
| DR Scenario Tests | Test workflow behaviour during simulated Disaster Recovery failover. | Manually trigger workflows in a test environment configured to point to DR endpoints; verify graceful error handling or successful operation with DR services. |
By following these guidelines, VertiGIS Studio Workflow development will contribute directly to the robustness and maintainability of the new eMap platform.