3.1: Data Governance & Lifecycle Management
✂️ Tl;dr 🥷
Effective data governance is fundamental to the project's success: it ensures that geospatial assets are managed consistently and securely, in line with organisational objectives. This section introduces a structured framework that defines core governance principles, key operational roles and specific responsibilities across the entire data lifecycle. The principles cover user control over enterprise data in Azure, adherence to open standards, a clear distinction between enterprise and user-generated data, and data quality as a shared responsibility. Key roles such as Data Owners, Data Stewards and Data Custodians, along with specialist Data Engineers and GIS Engineers, provide oversight and execution. The framework details responsibilities for each lifecycle stage, from data creation and maintenance through quality control and sharing to archival and eventual retirement, promoting robust, compliant data practices throughout.
Data Governance Principles
The recommended data governance strategy is built upon the following core principles:
- User-Managed Enterprise Data: Authoritative enterprise geospatial data resides within user-managed Azure PaaS services, primarily Azure Database for PostgreSQL with PostGIS. This ensures the organisation retains full control over its critical data assets, with lifecycle management performed using native DBMS tools and standard database administration practices.
- Open Standards & Decoupling for Inputs/Outputs (Principle 2.1.3): Data exchange and service exposure will prioritise open, standardised formats and cloud-native formats (e.g., CRF for rasters). This promotes interoperability with other enterprise systems, reduces vendor lock-in and enhances data accessibility.
- Clear Data Classification and Purposeful Storage: A fundamental distinction is made between "Enterprise Data" (authoritative, curated, long-lived datasets) and "User-Generated/Temporary Data" (ad-hoc datasets, project-specific content, outputs from self-service analysis). This classification influences storage allocation (Enterprise Geodatabase vs. ArcGIS Data Store), management practices, security controls and lifecycle policies.
- Data Quality as a Shared Responsibility: Ensuring high-quality data is not the responsibility of a single team. All stakeholders, from data creators and workflow developers to data stewards and consumers, play a role in maintaining data accuracy, completeness, consistency, currency and lineage. Proactive quality checks and validation gates, particularly within automated workflows, are essential.
- Comprehensive Data Lifecycle Management: All data assets, regardless of classification, are subject to defined lifecycle management policies. This includes standardised procedures for creation, registration, active use, archival and eventual retirement or deletion, ensuring data is managed appropriately throughout its existence.
- Data Lineage and Provenance: The origin, transformations and dependencies of key datasets must be documented and traceable. This is crucial for understanding data reliability, supporting audits and managing change. Workflows modifying enterprise data are expected to contribute to this lineage.
- Security and Compliance by Design: Data security and compliance with organisational and regulatory requirements are integral to all data governance processes. This includes robust access controls, encryption (in transit and at rest) and adherence to data residency policies.
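The lineage principle above asks workflows that modify enterprise data to record where a dataset came from and what was done to it. The following is a minimal sketch of what such a record could look like, not a prescribed implementation: the `LineageEvent` fields, the in-memory list used as a log and the dataset names in the usage note are all hypothetical, and a real deployment would more likely persist these entries in a database table or data catalog.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    """One transformation applied to a dataset (all field names are hypothetical)."""
    dataset: str    # dataset that was created or modified
    sources: tuple  # upstream datasets the change was derived from
    operation: str  # short description of the transformation
    actor: str      # workflow or user that made the change
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_event(log: list, event: LineageEvent) -> None:
    """Append the event to the log as a plain dict (stand-in for a real store)."""
    log.append(asdict(event))


def upstream_of(log: list, dataset: str) -> set:
    """Return every dataset that `dataset` depends on, directly or transitively."""
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for entry in log:
            if entry["dataset"] != current:
                continue
            for source in entry["sources"]:
                if source not in seen:  # also guards against cyclic entries
                    seen.add(source)
                    stack.append(source)
    return seen
```

As a usage illustration, a workflow that clips `raw.parcels` into `cadastre.parcels` would call `append_event` once with those names, and an auditor could later call `upstream_of(log, "cadastre.parcels")` to list every dataset the result was derived from.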
Key Roles and Responsibilities
The successful implementation of data governance relies on clearly defined roles and responsibilities. The following key roles are established for the new eMap platform:
- Data Owners:
    - Definition: Senior stakeholders or executives who hold ultimate accountability for specific datasets or data domains (e.g., cadastral, environmental, asset data).
    - Responsibilities:
        - Sponsoring and championing data governance initiatives for their respective data domains.
        - Approving data policies, standards and access rights related to their datasets.
        - Ensuring resources are available for data stewardship activities.
        - Making final decisions on data classification, criticality and disposition.
- Data Stewards:
    - Definition: Individuals or teams with day-to-day responsibility for managing the quality, integrity, security and usability of specific datasets within their assigned domain. They are often subject matter experts.
    - Responsibilities:
        - Implementing and enforcing data governance policies and standards for their datasets.
        - Defining and maintaining metadata for their datasets.
        - Monitoring data quality, and identifying and resolving data quality issues.
        - Managing data access requests in accordance with approved policies.
        - Guiding data users on the appropriate use of datasets.
        - Participating in data lifecycle management processes, including archival and retirement decisions for their datasets.
        - Collaborating with Data Owners and Data Custodians.
        - Reviewing and approving changes to datasets, particularly those originating from automated workflows.
- Data Custodians:
    - Definition: Includes Cloud Infrastructure (DevOps) Engineers and Cloud Security Engineers, responsible for the technical infrastructure, operational environment and security of the systems where data is stored and processed.
    - Responsibilities:
        - Implementing and managing the Azure infrastructure (PaaS services, VMs, networks) that hosts the data.
        - Ensuring data security controls (e.g., encryption, access controls at the infrastructure level, network security) are in place and effective.
        - Managing data backup, recovery and disaster recovery procedures for the underlying storage systems.
        - Monitoring system performance and availability.
        - Implementing technical aspects of data lifecycle management policies (e.g., storage tiering automation).
- Data Engineers & Database Administrators (DBAs):
    - Definition: Specialists responsible for the design, implementation and maintenance of enterprise geodatabases and GIS services.
    - Responsibilities:
        - Designing and managing the schema of enterprise geodatabases.
        - Implementing and optimising GIS data models and database performance.
        - Publishing and managing ArcGIS services from authoritative data sources.
        - Providing technical support for data integration and ETL processes.
        - Working closely with Data Stewards to ensure data quality and integrity within the enterprise geodatabases.
- GIS Engineers:
    - Definition: Specialists responsible for developing applications, scripts and workflows (e.g., using VertiGIS Studio Workflow) that interact with the new eMap platform.
    - Responsibilities:
        - Building performant workflows that meet the agreed requirements.
        - Adhering to defined data governance principles and workflow development patterns.
        - Implementing data validation gates within workflows that create or modify data, ensuring schema conformance, spatial integrity and business rule adherence.
        - Collaborating with Data Stewards to maintain data requirements and quality standards.
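The validation gates that GIS Engineers are expected to build check three things: schema conformance, spatial integrity and business rule adherence. The sketch below illustrates the idea on a single GeoJSON-like feature using only the Python standard library. It is an assumption-laden example rather than the platform's actual gate: the attribute names, the allowed status values and the longitude/latitude bounds (which assume WGS 84 coordinates) are hypothetical, and the spatial check is limited to ring closure and coordinate range. A production gate would typically add topology checks in PostGIS or a geometry library.

```python
# Hypothetical schema and business rule, for illustration only.
REQUIRED_FIELDS = {"asset_id": str, "status": str}
ALLOWED_STATUS = {"active", "retired"}
LON_RANGE = (-180.0, 180.0)  # assumes WGS 84 longitude/latitude
LAT_RANGE = (-90.0, 90.0)


def validate_feature(feature: dict) -> list:
    """Return a list of validation errors; an empty list means the gate passes."""
    errors = []
    props = feature.get("properties") or {}

    # 1. Schema conformance: required attributes exist and have the right type.
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in props:
            errors.append(f"missing attribute: {name}")
        elif not isinstance(props[name], expected_type):
            errors.append(f"wrong type for attribute: {name}")

    # 2. Spatial integrity (simplified): closed polygon rings, coordinates in range.
    geom = feature.get("geometry") or {}
    rings = geom.get("coordinates") or []
    if geom.get("type") != "Polygon":
        errors.append("geometry must be a Polygon")
    elif not rings:
        errors.append("polygon has no rings")
    else:
        for ring in rings:
            # A linear ring needs at least four positions, first equal to last.
            if len(ring) < 4 or ring[0] != ring[-1]:
                errors.append("polygon ring is not closed")
            for position in ring:
                lon, lat = position[0], position[1]
                in_range = (LON_RANGE[0] <= lon <= LON_RANGE[1]
                            and LAT_RANGE[0] <= lat <= LAT_RANGE[1])
                if not in_range:
                    errors.append("coordinate outside longitude/latitude bounds")
                    break  # report at most one range error per ring

    # 3. Business rule: status must come from the controlled vocabulary.
    status = props.get("status")
    if isinstance(status, str) and status not in ALLOWED_STATUS:
        errors.append(f"status not allowed: {status}")

    return errors
```

A workflow would call `validate_feature` before writing to the enterprise geodatabase and reject or quarantine any feature for which the returned list is not empty. Returning all errors at once, rather than stopping at the first, gives Data Stewards a complete picture when they review a rejected change.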
Responsibilities Across the Data Lifecycle
- Data Creation and Registration:
    - Data Stewards & GIS Professionals: Ensure new datasets are created according to defined standards, including appropriate data models and quality benchmarks.
    - Data Stewards: Create and maintain comprehensive metadata for new enterprise datasets (e.g., following the ISO 19115-1 standard where applicable).
    - Data Stewards: Oversee the registration of new authoritative datasets in the data catalog for discoverability.
    - GIS Engineers: Implement validation gates in automated data creation processes to ensure integrity.
- Data Maintenance:
    - Data Stewards: Oversee processes for keeping datasets current, accurate and complete.
    - Data Engineers & DBAs: Manage updates and geodatabase versioning, and implement conflict resolution procedures where applicable.
    - Data Custodians: Ensure the underlying infrastructure supports efficient data maintenance operations.
- Data Quality Control:
    - Data Stewards: Define data quality rules and thresholds for their respective datasets. Regularly monitor data quality metrics and initiate remediation activities.
    - GIS Engineers: Implement automated validation rules (schema, attribute, spatial, topological, business rules) within data processing and editing workflows.
    - All Users: Report observed data quality issues.
- Data Sharing and Accessibility:
    - Data Owners & Data Stewards: Define data sharing policies and approve access requests based on data classification and sensitivity.
    - GIS Engineers: Publish services in line with the open, standardised formats prioritised in the governance principles.
    - Data Custodians (including Cloud Security Engineers): Implement and manage technical access controls (e.g., user permissions, NSG rules, RBAC on Azure resources).
- Data Archival and Retention:
    - Data Owners & Data Stewards: Define criteria for transitioning datasets from active use to archived status, based on factors like age, access frequency and project completion.
    - Data Custodians: Implement Azure Storage lifecycle management policies to automatically transition data between Hot, Cool and Archive tiers for services like Azure Blob Storage and ADLS Gen2.
    - GIS Engineers & DBAs: Develop and execute procedures for archiving data from enterprise geodatabases while ensuring integrity and long-term discoverability.
- Data Retirement and Deletion:
    - Data Owners: Authorise the final retirement and deletion of datasets.
    - Data Stewards: Ensure that data retirement is conducted in accordance with organisational retention schedules and any applicable regulatory requirements.
    - Data Custodians: Securely execute approved data deletion procedures on the Azure infrastructure, ensuring that deleted data cannot be recovered.
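The archival responsibilities above mention Azure Storage lifecycle management policies that move data between the Hot, Cool and Archive tiers. Such a policy is a JSON document of rules, each with a filter and a set of tiering actions. The sketch below builds one rule in Python so the thresholds can be validated before the policy is applied. The rule name, the `geodata/imagery/` container prefix and all day counts are hypothetical placeholders; the actual values should come from the criteria defined by Data Owners and Data Stewards.

```python
import json


def build_lifecycle_policy(prefix, cool_after_days, archive_after_days,
                           delete_after_days=None):
    """Build an Azure Storage lifecycle management policy with a single rule.

    Blobs under `prefix` move to Cool, then Archive, based on days since last
    modification. Deletion is optional and must come after archival.
    """
    if not 0 <= cool_after_days < archive_after_days:
        raise ValueError("expected 0 <= cool_after_days < archive_after_days")

    base_blob = {
        "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
        "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
    }
    if delete_after_days is not None:
        if delete_after_days <= archive_after_days:
            raise ValueError("delete_after_days must exceed archive_after_days")
        base_blob["delete"] = {"daysAfterModificationGreaterThan": delete_after_days}

    return {
        "rules": [
            {
                "enabled": True,
                "name": "archiveAgedGeodata",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": [prefix],  # container name plus path prefix
                    },
                    "actions": {"baseBlob": base_blob},
                },
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical thresholds: Cool after 30 days, Archive after 180 days.
    policy = build_lifecycle_policy("geodata/imagery/", 30, 180)
    print(json.dumps(policy, indent=2))
```

The resulting JSON can be applied to a storage account through the Azure portal, the Azure CLI or an infrastructure-as-code template. Leaving `delete_after_days` unset by default keeps automated tiering separate from deletion, which under this framework requires explicit authorisation from the Data Owner.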