SQL Server 2025 Maximum Availability: Complete Guide to Always On, Failover Clustering, and High Availability Architecture

SQL Server 2025 High Availability has reached a new frontier with Microsoft's most advanced release to date. Whether you are a database administrator architecting enterprise workloads, a cloud engineer designing multi-region deployments, or a DevOps practitioner integrating SQL Server into Kubernetes, understanding SQL Server 2025 Maximum Availability is no longer optional — it is a business imperative. Downtime costs enterprises an average of $5,600 per minute, and SQL Server 2025 introduces a portfolio of high availability technologies designed to bring that number as close to zero as possible. In this comprehensive guide, we explore every pillar of SQL Server 2025 High Availability architecture — from Always On Availability Groups and Failover Cluster Instances to Distributed Availability Groups, Contained Availability Groups, and Kubernetes-native HA. We walk through real-world T-SQL configurations, best practices for RPO/RTO optimization, and architectural patterns used by MinervaDB engineers in production environments.

Table of Contents

What Is SQL Server 2025 Maximum Availability?

Maximum Availability in SQL Server 2025 refers to the comprehensive set of features, configurations, and architectural patterns that together eliminate or minimize unplanned and planned downtime. Microsoft defines Maximum Availability across four dimensions:
  • High Availability (HA): Automated local failover within a datacenter — primarily via Always On Availability Groups (AAG) and Failover Cluster Instances (FCI).
  • Disaster Recovery (DR): Geo-redundant failover across datacenters and cloud regions using Distributed Availability Groups and Log Shipping.
  • Business Continuity (BC): Keeping reads and secondary workloads alive during maintenance using readable secondaries and Availability Group offloading.
  • Online Operations: Zero-downtime schema changes, online index rebuilds, and Accelerated Database Recovery (ADR) — greatly enhanced in SQL Server 2025.

SQL Server 2025 High Availability: New Features Overview

SQL Server 2025 builds on the solid HA foundation of SQL Server 2022 and introduces game-changing capabilities that redefine what is possible for enterprise database availability:
  • Contained Availability Groups with Full msdb Integration: System databases (msdb, SQL Agent jobs, linked servers, logins) are now fully contained inside the AG, eliminating the manual synchronization that plagued earlier releases.
  • Automatic Seeding Enhancements: Faster parallel seeding with bandwidth throttling and progress monitoring via sys.dm_hadr_automatic_seeding.
  • Intelligent Failover Policy: Machine-learning-based health detection that reduces spurious failovers caused by transient I/O spikes.
  • Kubernetes HA Operator (GA): The SQL Server Operator for Kubernetes reaches general availability in 2025, enabling StatefulSet-based AG deployments with automated failover on pod and node failure.
  • Cross-Region Azure Availability Zones Support: Native multi-zone synchronous commit replicas that survive AZ-level outages with zero data loss.
  • Accelerated Database Recovery (ADR) v3: Transaction log-based version store improvements reduce recovery time after a crash to under 10 seconds for most workloads.

Always On Availability Groups in SQL Server 2025

Always On Availability Groups (AAG) remain the cornerstone of SQL Server 2025 High Availability. An AG is a set of user databases — the availability databases — that fail over together. Each AG has one primary replica that accepts read-write connections, and up to eight secondary replicas (nine total) that can serve read-only queries.

Synchronous vs. Asynchronous Commit Mode

The choice of commit mode directly determines your Recovery Point Objective (RPO):
  • Synchronous Commit: The primary waits for the secondary to harden the log before committing. RPO = 0 (zero data loss). Ideal for replicas within a low-latency LAN or the same Azure region.
  • Asynchronous Commit: The primary commits without waiting. RPO equals some data loss corresponding to the replication lag. Ideal for cross-region DR replicas where latency is high.

Configuring an Always On Availability Group in SQL Server 2025

The following T-SQL demonstrates creating an AG with one synchronous and one asynchronous replica, using the new SQL Server 2025 CONTAINED option and SEEDING_MODE = AUTOMATIC:
-- Step 1: Enable Always On on each node (run on each replica)
-- Requires Windows Server Failover Cluster (WSFC) or Pacemaker on Linux

-- Step 2: Create the Availability Group (run on the primary replica)
CREATE AVAILABILITY GROUP [AG_Production_2025]
WITH (
    AUTOMATED_BACKUP_PREFERENCE     = SECONDARY,
    DB_FAILOVER                     = ON,
    DTC_SUPPORT                     = PER_DB,
    CLUSTER_TYPE                    = WSFC,
    REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 1,
    CONTAINED                       = YES,
    REUSE_SYSTEM_DATABASES          = YES
)
FOR
    DATABASE [SalesDB],
    DATABASE [OrdersDB],
    DATABASE [InventoryDB]
REPLICA ON
    N'SQLNODE01' WITH (
        ENDPOINT_URL              = N'TCP://SQLNODE01.corp.local:5022',
        FAILOVER_MODE             = AUTOMATIC,
        AVAILABILITY_MODE         = SYNCHRONOUS_COMMIT,
        BACKUP_PRIORITY           = 50,
        SECONDARY_ROLE (
            ALLOW_CONNECTIONS     = READ_ONLY,
            READ_ONLY_ROUTING_URL = N'TCP://SQLNODE01.corp.local:1433'
        ),
        SEEDING_MODE              = AUTOMATIC
    ),
    N'SQLNODE02' WITH (
        ENDPOINT_URL              = N'TCP://SQLNODE02.corp.local:5022',
        FAILOVER_MODE             = AUTOMATIC,
        AVAILABILITY_MODE         = SYNCHRONOUS_COMMIT,
        BACKUP_PRIORITY           = 50,
        SECONDARY_ROLE (
            ALLOW_CONNECTIONS     = READ_ONLY,
            READ_ONLY_ROUTING_URL = N'TCP://SQLNODE02.corp.local:1433'
        ),
        SEEDING_MODE              = AUTOMATIC
    ),
    N'SQLNODE03-DR' WITH (
        ENDPOINT_URL              = N'TCP://SQLNODE03.corp.local:5022',
        FAILOVER_MODE             = MANUAL,
        AVAILABILITY_MODE         = ASYNCHRONOUS_COMMIT,
        BACKUP_PRIORITY           = 80,
        SECONDARY_ROLE (
            ALLOW_CONNECTIONS     = READ_ONLY,
            READ_ONLY_ROUTING_URL = N'TCP://SQLNODE03.corp.local:1433'
        ),
        SEEDING_MODE              = AUTOMATIC
    )
LISTENER N'AG_Listener_Prod' (
    WITH IP (
        (N'10.0.1.100', N'255.255.255.0'),
        (N'10.0.2.100', N'255.255.255.0')
    ),
    PORT = 1433
);
GO

-- Grant CREATE ANY DATABASE on secondaries for automatic seeding
ALTER AVAILABILITY GROUP [AG_Production_2025]
    GRANT CREATE ANY DATABASE;
GO

Monitoring Always On Availability Groups

After deploying your AG, continuous monitoring is essential. SQL Server 2025 introduces expanded DMVs for deeper observability into replication lag and estimated data loss:
-- Monitor AG health, synchronization state, and estimated data loss
SELECT
    ag.name                         AS availability_group_name,
    ar.replica_server_name          AS replica_server,
    ar.availability_mode_desc       AS commit_mode,
    ar.failover_mode_desc           AS failover_mode,
    ars.role_desc                   AS current_role,
    ars.synchronization_health_desc AS sync_health,
    ars.connected_state_desc        AS connection_state,
    adc.database_name               AS database_name,
    drs.synchronization_state_desc  AS db_sync_state,
    drs.log_send_queue_size         AS log_send_queue_kb,
    drs.log_send_rate               AS log_send_rate_kbps,
    drs.redo_queue_size             AS redo_queue_kb,
    drs.redo_rate                   AS redo_rate_kbps,
    drs.secondary_lag_seconds       AS estimated_lag_seconds,
    drs.estimated_data_loss_time    AS estimated_data_loss_sec
FROM sys.availability_groups                    ag
JOIN sys.availability_replicas                  ar  ON ag.group_id  = ar.group_id
JOIN sys.dm_hadr_availability_replica_states    ars ON ar.replica_id = ars.replica_id
JOIN sys.availability_databases_cluster         adc ON ag.group_id  = adc.group_id
JOIN sys.dm_hadr_database_replica_states        drs
     ON  adc.group_database_id = drs.group_database_id
     AND ar.replica_id         = drs.replica_id
ORDER BY ag.name, ars.role_desc, adc.database_name;
GO

Windows Server Failover Clustering (WSFC) with SQL Server 2025

SQL Server 2025 Failover Cluster Instances (FCI) use Windows Server Failover Clustering to provide instance-level high availability. Unlike AGs (which protect specific databases), an FCI protects the entire SQL Server instance — all databases, logins, SQL Agent jobs, linked servers, and instance-level configuration.

Key FCI Architecture Decisions

  • Shared Storage: All nodes access the same storage (SAN, Azure Shared Disks, or Storage Spaces Direct). The storage is only mounted on the active node at any given time.
  • Virtual Network Name (VNN): Clients always connect to the VNN, which floats to the active node after a failover event.
  • Quorum Configuration: SQL Server 2025 on WSFC supports Disk Witness, File Share Witness, and Cloud Witness (Azure Blob Storage) for quorum voting.

Checking FCI Health with T-SQL

-- Check Failover Cluster Instance node status
SELECT
    NodeName,
    Status,
    StatusDescription,
    IsDrainStopped
FROM sys.dm_os_cluster_nodes;
GO

-- Check cluster network health
SELECT
    network_subnet_ip,
    network_subnet_ipv4_mask,
    is_public,
    is_ipv4
FROM sys.dm_hadr_cluster_networks;
GO

-- Verify FCI quorum state
SELECT
    quorum_type_desc,
    quorum_state_desc
FROM sys.dm_hadr_cluster;
GO

Distributed Availability Groups for Multi-Site Disaster Recovery

A Distributed Availability Group (DAG) connects two independent AGs — potentially on different WSFC clusters, different operating systems (Windows and Linux), or even in different Azure regions. This is the premier SQL Server 2025 architecture for active-active and active-passive multi-datacenter disaster recovery deployments.

Creating a Distributed Availability Group

-- Run on the PRIMARY replica of AG1 (primary datacenter)
CREATE AVAILABILITY GROUP [DAG_GlobalProd]
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON
    N'AG_Datacenter1' WITH (
        LISTENER_URL    = N'TCP://AG_DC1_Listener.corp.local:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE   = MANUAL,
        SEEDING_MODE    = AUTOMATIC
    ),
    N'AG_Datacenter2' WITH (
        LISTENER_URL    = N'TCP://AG_DC2_Listener.corp.local:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE   = MANUAL,
        SEEDING_MODE    = AUTOMATIC
    );
GO

-- Run on the PRIMARY replica of AG2 (DR datacenter) to join the DAG
ALTER AVAILABILITY GROUP [DAG_GlobalProd]
JOIN AVAILABILITY GROUP ON
    N'AG_Datacenter1' WITH (
        LISTENER_URL    = N'TCP://AG_DC1_Listener.corp.local:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE   = MANUAL,
        SEEDING_MODE    = AUTOMATIC
    );
GO

-- Monitor DAG synchronization and lag
SELECT
    ag.name                         AS dag_name,
    ar.replica_server_name          AS replica_server,
    drs.synchronization_state_desc  AS sync_state,
    drs.secondary_lag_seconds       AS lag_seconds,
    drs.log_send_queue_size         AS send_queue_kb,
    drs.estimated_data_loss_time    AS est_data_loss_sec
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar         ON ag.group_id  = ar.group_id
JOIN sys.dm_hadr_database_replica_states drs ON ar.replica_id = drs.replica_id
WHERE ag.is_distributed = 1;
GO

Contained Availability Groups: The SQL Server 2025 Game Changer

The most significant SQL Server 2025 High Availability innovation is the production-hardening of Contained Availability Groups. In a Contained AG, the system metadata — SQL Agent jobs, credentials, linked servers, database mail, and logins — is replicated inside the AG itself rather than living only at the Windows or OS level. This solves the number one operational headache with traditional AGs: the manual work of keeping SQL Agent jobs and logins synchronized across replicas. With Contained AGs in SQL Server 2025, failover is truly seamless with no post-failover manual intervention required.

Managing Contained AG System Objects

-- Verify Contained AG configuration
SELECT
    ag.name                         AS ag_name,
    ag.is_contained                 AS is_contained,
    ag.reuse_system_databases       AS reuse_system_dbs
FROM sys.availability_groups ag
WHERE ag.is_contained = 1;
GO

-- Create a SQL Agent job inside a Contained AG
-- (This job automatically replicates to all secondary replicas)
USE msdb;
GO

EXEC sp_add_job
    @job_name    = N'AG_Nightly_Index_Rebuild',
    @description = N'Nightly index rebuild - executes on AG primary only';

EXEC sp_add_jobstep
    @job_name       = N'AG_Nightly_Index_Rebuild',
    @step_name      = N'Rebuild Indexes on SalesDB',
    @subsystem      = N'TSQL',
    @command        = N'EXEC SalesDB.dbo.sp_RebuildIndexes;',
    @database_name  = N'SalesDB';

EXEC sp_add_schedule
    @schedule_name     = N'Nightly_2AM',
    @freq_type         = 4,
    @active_start_time = 020000;

EXEC sp_attach_schedule
    @job_name      = N'AG_Nightly_Index_Rebuild',
    @schedule_name = N'Nightly_2AM';

EXEC sp_add_jobserver
    @job_name = N'AG_Nightly_Index_Rebuild';
GO

Accelerated Database Recovery (ADR) in SQL Server 2025

Accelerated Database Recovery (ADR) is a transformative SQL Server feature that separates the version store from the transaction log, enabling near-instant database recovery regardless of the number of active transactions at the time of a crash. In SQL Server 2025, ADR v3 introduces several key improvements for high availability workloads:
  • Persistent Version Store (PVS) Auto-Cleanup: Background cleanup is more aggressive and configurable, preventing PVS bloat even under heavy OLTP workloads with millions of short transactions per second.
  • ADR with In-Memory OLTP: Full compatibility with memory-optimized tables (Hekaton), enabling sub-second recovery for in-memory workloads.
  • Cross-Database Transaction Recovery: Recovery of distributed transactions spanning multiple databases within a Contained AG is handled atomically, ensuring consistency after failover.
-- Enable ADR on production databases
ALTER DATABASE SalesDB
SET ACCELERATED_DATABASE_RECOVERY = ON
(PERSISTENT_VERSION_STORE_FILEGROUP = [PRIMARY]);
GO

ALTER DATABASE OrdersDB
SET ACCELERATED_DATABASE_RECOVERY = ON
(PERSISTENT_VERSION_STORE_FILEGROUP = [PRIMARY]);
GO

-- Monitor ADR Persistent Version Store usage (SQL Server 2025)
SELECT
    DB_NAME(pvs.database_id)                         AS database_name,
    pvs.pvs_filegroup_name                           AS pvs_filegroup,
    pvs.persistent_version_store_size_kb / 1024.0    AS pvs_size_mb,
    pvs.online_index_version_store_size_kb / 1024.0  AS online_idx_store_mb,
    pvs.current_aborted_transaction_count            AS active_aborted_txns,
    pvs.oldest_active_transaction_id                 AS oldest_active_txn_id,
    pvs.min_transaction_timestamp                    AS oldest_txn_start_time
FROM sys.dm_tran_persistent_version_store_stats pvs
ORDER BY pvs.persistent_version_store_size_kb DESC;
GO

SQL Server 2025 High Availability on Linux and Kubernetes

SQL Server 2025 expands its Linux HA capabilities significantly. Pacemaker combined with Corosync remains the standard for Failover Cluster Instances on Linux, but the SQL Server Operator for Kubernetes — now generally available in 2025 — provides a cloud-native alternative optimized for containerized database environments.

Pacemaker-based AG on Linux (RHEL 9 / Ubuntu 24.04)

#!/bin/bash
# Configure Pacemaker for SQL Server 2025 AG on RHEL 9

# Install required packages
sudo dnf install -y pacemaker pcs corosync fence-agents-all

# Authenticate cluster nodes
sudo pcs host auth sqllinux01 sqllinux02 sqllinux03   -u hacluster -p 'SecureClusterPass!'

# Create and start the Pacemaker cluster
sudo pcs cluster setup SQLCluster2025   sqllinux01 sqllinux02 sqllinux03   --start --enable

# Create the AG resource agent
sudo pcs resource create AG_Production_2025   ocf:mssql:ag   ag_name="AG_Production_2025"   meta failure-timeout=60s   op start   timeout=60s   op stop    timeout=60s   op promote timeout=60s   op monitor interval=10s timeout=60s on-fail=restart   op monitor interval=11s timeout=60s role="Master" on-fail=restart   op monitor interval=12s timeout=60s role="Slave"  on-fail=restart

# Make AG resource promotable (primary/secondary)
sudo pcs resource promotable AG_Production_2025   promoted-max=1 promoted-node-max=1   clone-max=3    clone-node-max=1

# Verify cluster status
sudo pcs status

SQL Server 2025 AG on Kubernetes (Operator GA)

apiVersion: sql.microsoft.com/v1
kind: SqlManagedInstance
metadata:
  name: sql-ag-primary
  namespace: sql-server-ha
spec:
  replicas: 3
  scheduler:
    defaultSchedulerName: default-scheduler
  settings:
    sqlinstance:
      licenseType: LicenseIncluded
      edition: Enterprise
  storage:
    datalogs:
      volumes:
      - volume:
          persistentVolumeClaim:
            claimName: sql-data-pvc
        name: sql-data
    logs:
      volumes:
      - volume:
          persistentVolumeClaim:
            claimName: sql-logs-pvc
        name: sql-logs
  services:
    primary:
      type: LoadBalancer
    secondary:
      type: LoadBalancer
  haServices:
    primaryEndpoint: 5022
  backup:
    retentionPeriodInDays: 7

Read-Scale Secondaries and Workload Offloading

One of the most powerful SQL Server 2025 High Availability features for enterprise environments is the ability to offload read-intensive workloads to secondary replicas. This improves both HA (primaries are less loaded) and analytical performance by routing reporting queries to secondaries.

Read-Only Routing Configuration

-- Configure read-only routing so read-intent connections route
-- automatically to the best available secondary replica

-- Step 1: Configure read-only routing URLs on each replica
ALTER AVAILABILITY GROUP [AG_Production_2025]
MODIFY REPLICA ON N'SQLNODE01' WITH (
    SECONDARY_ROLE (
        READ_ONLY_ROUTING_URL = N'TCP://SQLNODE01.corp.local:1433'
    )
);
GO

ALTER AVAILABILITY GROUP [AG_Production_2025]
MODIFY REPLICA ON N'SQLNODE02' WITH (
    SECONDARY_ROLE (
        READ_ONLY_ROUTING_URL = N'TCP://SQLNODE02.corp.local:1433'
    )
);
GO

-- Step 2: Configure read-only routing list on the primary
ALTER AVAILABILITY GROUP [AG_Production_2025]
MODIFY REPLICA ON N'SQLNODE01' WITH (
    PRIMARY_ROLE (
        READ_ONLY_ROUTING_LIST = (
            (N'SQLNODE02', N'SQLNODE03-DR'),  -- Load-balanced group
            N'SQLNODE02'                       -- Fallback
        )
    )
);
GO

-- Step 3: Connect with read-intent from application
-- Connection string: ApplicationIntent=ReadOnly;MultiSubnetFailover=True
-- Verify routing is working
SELECT
    SERVERPROPERTY('ServerName')    AS connected_server,
    SERVERPROPERTY('IsHadrEnabled') AS hadr_enabled,
    sys.fn_hadr_is_primary_replica(DB_NAME()) AS is_primary_replica;
GO

SQL Server 2025 High Availability Best Practices

Based on MinervaDB's experience managing SQL Server high availability across hundreds of enterprise deployments, here are the most impactful best practices for SQL Server 2025 Maximum Availability:

1. RPO and RTO Planning

Define your Recovery Point Objective (RPO) and Recovery Time Objective (RTO) before selecting your HA technology. Use synchronous commit (RPO=0) for mission-critical OLTP, and asynchronous commit for DR replicas where network latency makes synchronous commit impractical.
-- Measure current replication latency to determine feasibility
-- of synchronous commit for each replica
SELECT
    ar.replica_server_name,
    ar.availability_mode_desc,
    AVG(drs.secondary_lag_seconds)   AS avg_lag_seconds,
    MAX(drs.secondary_lag_seconds)   AS max_lag_seconds,
    AVG(drs.log_send_rate)           AS avg_send_rate_kbps,
    AVG(drs.redo_rate)               AS avg_redo_rate_kbps
FROM sys.availability_replicas                  ar
JOIN sys.dm_hadr_database_replica_states        drs ON ar.replica_id = drs.replica_id
WHERE drs.is_local = 0
GROUP BY ar.replica_server_name, ar.availability_mode_desc
ORDER BY avg_lag_seconds DESC;
GO

2. Health Check Tuning to Prevent Spurious Failovers

The SQL Server 2025 Intelligent Failover Policy uses machine learning, but you can also tune the traditional health check timeout and failure conditions to avoid unnecessary failovers:
-- Tune health check timeout and failure condition level
-- Failure condition level 3 (default): failover on critical server errors
-- Failure condition level 4: also failover on moderate server errors
ALTER AVAILABILITY GROUP [AG_Production_2025]
SET (
    HEALTH_CHECK_TIMEOUT              = 30000,  -- 30 seconds (default: 30000ms)
    FAILURE_CONDITION_LEVEL           = 3       -- Balanced: critical errors only
);
GO

-- For Pacemaker on Linux, adjust timeout values accordingly
-- (in the Pacemaker resource agent configuration)

3. Backup Strategy with Availability Groups

-- Check which replica is preferred for backups
SELECT
    ag.name                      AS availability_group,
    ar.replica_server_name       AS replica,
    ar.backup_priority           AS backup_priority,
    sys.fn_hadr_backup_is_preferred_replica(DB_NAME()) AS is_preferred_backup_replica
FROM sys.availability_groups    ag
JOIN sys.availability_replicas  ar ON ag.group_id = ar.group_id
ORDER BY ar.backup_priority DESC;
GO

-- Use COPY_ONLY for ad-hoc backups on the primary to avoid breaking the log chain
BACKUP DATABASE [SalesDB]
TO DISK = N'/backup/SalesDB_adhoc.bak'
WITH COPY_ONLY, COMPRESSION, STATS = 10;
GO

-- Automate backup selection: run this on all replicas,
-- backup only executes on the preferred replica
IF sys.fn_hadr_backup_is_preferred_replica(N'SalesDB') = 1
BEGIN
    BACKUP DATABASE [SalesDB]
    TO DISK = N'/backup/SalesDB_full.bak'
    WITH COMPRESSION, CHECKSUM, STATS = 10;
    PRINT 'Backup completed on preferred replica.';
END
ELSE
BEGIN
    PRINT 'Skipping backup: this replica is not preferred for SalesDB.';
END
GO

SQL Server 2025 Maximum Availability Architecture: Reference Topology

A production-ready SQL Server 2025 Maximum Availability architecture for a large enterprise typically combines multiple HA technologies in layers:
  • Layer 1 — Local HA (Primary Datacenter): Two-node synchronous Always On AG with automatic failover and a Cloud Witness for quorum. Both nodes participate in a Contained AG with SQL Server 2025.
  • Layer 2 — Regional DR: A third asynchronous replica in a secondary datacenter, configured as a Distributed AG forwarding node for cross-region failover.
  • Layer 3 — Cloud DR: An Azure SQL Managed Instance or SQL Server on Azure VM participating as a DAG endpoint, providing cloud-based DR with native Azure Availability Zone support.
  • Layer 4 — Read Scale-Out: Read-only routing configured to distribute reporting and analytics workloads across secondary replicas, reducing load on the primary.
  • Layer 5 — Observability: SQL Server 2025 Extended Events, Intelligent Query Processing telemetry, and third-party monitoring (Prometheus + Grafana or Datadog) for real-time AG health dashboards.

Performing a Planned Manual Failover in SQL Server 2025

Planned failovers are used for maintenance windows, OS patching, and hardware upgrades. A manual failover with synchronous commit results in zero data loss:
-- Step 1: Verify AG is in SYNCHRONIZED state before failover
SELECT
    ar.replica_server_name,
    ars.role_desc,
    ars.synchronization_health_desc,
    drs.synchronization_state_desc,
    drs.is_suspended
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
JOIN sys.dm_hadr_database_replica_states     drs ON ar.replica_id = drs.replica_id
WHERE drs.is_local = 0;
GO

-- Step 2: Connect to the TARGET secondary and initiate failover
-- (Run on SQLNODE02 — the intended new primary)
ALTER AVAILABILITY GROUP [AG_Production_2025] FAILOVER;
GO

-- Step 3: Verify the new primary after failover
SELECT
    SERVERPROPERTY('ServerName')                       AS server_name,
    sys.fn_hadr_is_primary_replica('SalesDB')          AS is_primary_for_SalesDB,
    sys.fn_hadr_is_primary_replica('OrdersDB')         AS is_primary_for_OrdersDB
GO

Troubleshooting SQL Server 2025 High Availability Issues

Even the best-configured HA environments encounter issues. Here are the most common SQL Server 2025 AG problems and their diagnostic queries:

Issue 1: AG Replica Stuck in SYNCHRONIZING State

-- Identify redo thread bottlenecks on secondary replicas
SELECT
    DB_NAME(drs.database_id)         AS database_name,
    drs.redo_queue_size              AS redo_queue_kb,
    drs.redo_rate                    AS redo_rate_kbps,
    drs.log_send_queue_size          AS send_queue_kb,
    drs.secondary_lag_seconds        AS lag_seconds,
    drs.last_redone_lsn,
    drs.last_received_lsn,
    drs.last_hardened_lsn
FROM sys.dm_hadr_database_replica_states drs
WHERE drs.is_local = 1
ORDER BY drs.redo_queue_size DESC;
GO

Issue 2: Automatic Failover Not Triggering

-- Check failover eligibility: both replicas must be SYNCHRONIZED
-- and the AG must have AUTOMATIC failover configured
SELECT
    ag.name                              AS ag_name,
    ar.replica_server_name               AS replica,
    ar.failover_mode_desc                AS failover_mode,
    ars.synchronization_health_desc      AS sync_health,
    ars.operational_state_desc           AS op_state,
    ars.recovery_health_desc             AS recovery_health
FROM sys.availability_groups                    ag
JOIN sys.availability_replicas                  ar  ON ag.group_id  = ar.group_id
JOIN sys.dm_hadr_availability_replica_states    ars ON ar.replica_id = ars.replica_id
ORDER BY ag.name, ar.replica_server_name;
GO

-- Check Windows Cluster health (run in PowerShell / CMD)
-- Get-ClusterNode | Select Name, State
-- Get-ClusterGroup | Where-Object {$_.GroupType -eq "AvailabilityGroup"}

SQL Server 2025 High Availability on Azure: Azure SQL Managed Instance

For organizations running SQL Server 2025 workloads in Microsoft Azure, Azure SQL Managed Instance (SQL MI) provides a fully managed HA layer built directly into the platform:
  • Business Critical Tier: Built on Always On AG technology with 4 replicas (3 synchronous), SLA of 99.99% uptime, automatic failover in under 30 seconds.
  • General Purpose Tier: Uses Azure Service Fabric with replicated remote storage, 99.99% SLA with RTO under 30 minutes.
  • Auto-Failover Groups: Cross-region active geo-replication built on Distributed AGs, enabling seamless failover between Azure regions.
  • Zone Redundant Configuration: Synchronous replicas spread across Azure Availability Zones within a region for zone-level resilience.

Conclusion: Building a Maximum Availability SQL Server 2025 Environment

SQL Server 2025 Maximum Availability is a multi-layered strategy, not a single feature. The combination of Contained Always On Availability Groups, Accelerated Database Recovery v3, Distributed Availability Groups, Pacemaker on Linux, and the new Kubernetes Operator provides unprecedented flexibility for achieving sub-second failover and zero data loss across on-premises, hybrid, and cloud environments. For organizations serious about SQL Server 2025 High Availability, MinervaDB provides expert database consulting services. Our team of SQL Server DBAs has designed and operated Maximum Availability architectures for global enterprises across financial services, healthcare, retail, and SaaS industries. Key Takeaways:
  • Always On Availability Groups with Contained AG is the recommended HA architecture for SQL Server 2025 enterprise deployments.
  • Use synchronous commit for RPO=0 within the same datacenter or Azure region; use asynchronous commit for geographically distributed DR replicas.
  • Accelerated Database Recovery (ADR) v3 dramatically reduces crash recovery time and should be enabled on all production databases.
  • Distributed Availability Groups extend your HA topology across datacenters and cloud regions without requiring a shared WSFC cluster.
  • Contained AGs eliminate the manual synchronization of SQL Agent jobs and logins, making post-failover operations seamless.
  • For Kubernetes and containerized deployments, the SQL Server 2025 Operator (GA) brings enterprise-grade HA to cloud-native environments.
Need help architecting your SQL Server 2025 High Availability environment? Contact MinervaDB for a complimentary architecture review with our SQL Server experts.
About MinervaDB Corporation 305 Articles
Full-stack Database Infrastructure Architecture, Engineering and Operations Consultative Support(24*7) Provider for PostgreSQL, MySQL, MariaDB, MongoDB, ClickHouse, Trino, SQL Server, Cassandra, CockroachDB, Yugabyte, Couchbase, Redis, Valkey, NoSQL, NewSQL, SAP HANA, Databricks, Amazon Resdhift, Amazon Aurora, CloudSQL, Snowflake and AzureSQL with core expertize in Performance, Scalability, High Availability, Database Reliability Engineering, Database Upgrades/Migration, and Data Security.