MySQL Query Optimization & EXPLAIN: A Complete Guide for DBAs and Developers

MySQL query optimization is one of the most critical skills a database administrator or developer can possess. Whether you are managing a high-traffic e-commerce platform, a data warehouse with billions of rows, or a transactional OLTP system, poorly optimized queries are the leading cause of performance degradation, increased I/O, excessive CPU usage, and frustrated end users. At the heart of MySQL's query optimization toolkit lies the EXPLAIN statement — a powerful diagnostic command that reveals how the MySQL query optimizer intends to execute a given SQL statement. In this comprehensive guide, we will explore MySQL query optimization from the ground up: understanding the query execution lifecycle, dissecting every column of the EXPLAIN and EXPLAIN ANALYZE output, identifying common anti-patterns, and applying proven optimization strategies that MySQL DBAs and developers rely on in production environments every day. By the end of this article, you will be equipped with the knowledge to analyze execution plans, eliminate slow queries, and design indexes that drive maximum throughput.

Understanding the MySQL Query Optimizer

Before diving into EXPLAIN, it is essential to understand what the MySQL query optimizer does. The optimizer is a cost-based component within the MySQL server that evaluates multiple possible execution plans for a given query and selects the one with the lowest estimated cost. This cost is calculated from table and index statistics maintained by the storage engine; for InnoDB, persistent statistics live in the mysql.innodb_table_stats and mysql.innodb_index_stats tables and are exposed through the Information Schema. The optimizer considers factors such as row estimates, index selectivity, join order, and available access methods before producing an execution plan. However, the optimizer is not perfect: it relies on statistics that may be stale or inaccurate, which is why understanding EXPLAIN and knowing how to guide the optimizer with hints is an indispensable skill for any serious MySQL DBA or developer.
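
One way to see the cost model at work, sketched here against the orders table from the sample schema used in this guide: the session status variable Last_query_cost reports the optimizer's estimated cost for the most recently compiled query, and EXPLAIN FORMAT=JSON exposes an estimate under cost_info. The customer_id value is an arbitrary example, and Last_query_cost is only meaningful for simple queries.

```sql
-- Run a query, then read the optimizer's cost estimate for it
SELECT order_id, total_amount FROM orders WHERE customer_id = 1001;
SHOW SESSION STATUS LIKE 'Last_query_cost';

-- The estimate also appears as "query_cost" inside cost_info
EXPLAIN FORMAT=JSON
SELECT order_id, total_amount FROM orders WHERE customer_id = 1001;

-- Cost constants used by the model (a NULL cost_value means the built-in default applies)
SELECT cost_name, cost_value FROM mysql.server_cost;
SELECT engine_name, cost_name, cost_value FROM mysql.engine_cost;
```

The absolute numbers matter less than how they change between two candidate forms of the same query.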

The EXPLAIN Statement: Syntax and Variants

MySQL provides several variants of the EXPLAIN statement, each offering different levels of detail about query execution. Understanding when to use each variant is key to efficient query diagnostics.
-- Basic EXPLAIN
EXPLAIN SELECT * FROM orders WHERE customer_id = 1001;

-- EXPLAIN with FORMAT=JSON for richer, structured output
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE customer_id = 1001;

-- EXPLAIN ANALYZE (MySQL 8.0.18+) - executes query and returns real metrics
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 1001;

-- EXPLAIN for DML statements
EXPLAIN UPDATE orders SET status = 'shipped' WHERE order_date < '2024-01-01';
EXPLAIN DELETE FROM audit_log WHERE created_at < NOW() - INTERVAL 90 DAY;
EXPLAIN INSERT INTO archive_orders SELECT * FROM orders WHERE status = 'closed';
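
Two further variants are worth knowing, shown here as a short sketch. FORMAT=TREE prints the plan in the same iterator layout that EXPLAIN ANALYZE uses, but without executing the statement, and EXPLAIN FOR CONNECTION shows the plan of a statement that is already running in another session. The connection id 42 is a placeholder.

```sql
-- Tree format (MySQL 8.0.16+): iterator-style plan, query is not executed
EXPLAIN FORMAT=TREE
SELECT * FROM orders WHERE customer_id = 1001;

-- Plan of a statement currently running in another session
-- (take the id from the Id column of SHOW PROCESSLIST; 42 is a placeholder)
SHOW PROCESSLIST;
EXPLAIN FOR CONNECTION 42;
```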

Sample Schema for Practical Examples

Throughout this guide, we use a realistic e-commerce schema to demonstrate every optimization technique hands-on.
CREATE TABLE customers (
    customer_id    INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    email          VARCHAR(255) NOT NULL,
    country_code   CHAR(2) NOT NULL,
    created_at     DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status         TINYINT(1) NOT NULL DEFAULT 1,
    UNIQUE KEY uk_email (email),
    KEY idx_country_status (country_code, status),
    KEY idx_created_at (created_at)
) ENGINE=InnoDB;

CREATE TABLE orders (
    order_id       BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    customer_id    INT UNSIGNED NOT NULL,
    order_date     DATE NOT NULL,
    total_amount   DECIMAL(12,2) NOT NULL,
    status         ENUM('pending','processing','shipped','delivered','cancelled') NOT NULL,
    KEY idx_customer_id (customer_id),
    KEY idx_order_date_status (order_date, status),
    CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
) ENGINE=InnoDB;

CREATE TABLE order_items (
    item_id        BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    order_id       BIGINT UNSIGNED NOT NULL,
    product_id     INT UNSIGNED NOT NULL,
    quantity       SMALLINT UNSIGNED NOT NULL,
    unit_price     DECIMAL(10,2) NOT NULL,
    KEY idx_order_id (order_id),
    KEY idx_product_id (product_id),
    CONSTRAINT fk_items_order FOREIGN KEY (order_id) REFERENCES orders(order_id)
) ENGINE=InnoDB;

CREATE TABLE products (
    product_id     INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    sku            VARCHAR(50) NOT NULL,
    category_id    INT UNSIGNED NOT NULL,
    price          DECIMAL(10,2) NOT NULL,
    stock_qty      INT NOT NULL DEFAULT 0,
    UNIQUE KEY uk_sku (sku),
    KEY idx_category_id (category_id)
) ENGINE=InnoDB;

Dissecting the EXPLAIN Output: Column by Column

The id Column

The id column is the sequential identifier of each SELECT within the query. Simple queries have a single id of 1. Subqueries and unions produce multiple rows with different id values. Rows with the same id are joined together, in the order listed; rows with higher id values usually represent subqueries or derived tables that are evaluated before the row that consumes their result. The id is NULL for a UNION RESULT row, which represents the combined result of a UNION.

The select_type Column

The select_type column describes the type of SELECT involved. Key values include: SIMPLE (no subqueries or unions), PRIMARY (the outermost SELECT), SUBQUERY (a subquery in SELECT or WHERE), DERIVED (a subquery in the FROM clause), UNION (subsequent SELECT in a UNION), and DEPENDENT SUBQUERY (a correlated subquery — a critical performance red flag indicating the subquery re-evaluates for each outer row).
-- SIMPLE: No subqueries or unions
EXPLAIN SELECT customer_id, email FROM customers WHERE country_code = 'US';

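-- PRIMARY + SUBQUERY: Subquery in WHERE clause
-- (the optimizer often rewrites IN subqueries as semijoins; both rows then show SIMPLE, or the subquery shows MATERIALIZED)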
EXPLAIN
SELECT order_id, total_amount FROM orders
WHERE customer_id IN (
    SELECT customer_id FROM customers WHERE country_code = 'DE'
);

-- PRIMARY + DERIVED: Subquery in FROM clause (derived table)
EXPLAIN
SELECT d.country_code, COUNT(*) AS order_count
FROM (
    SELECT c.country_code, o.order_id
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.status = 'delivered'
) d
GROUP BY d.country_code;

-- UNION: Multiple SELECT statements combined
EXPLAIN
SELECT customer_id, 'active' AS label FROM customers WHERE status = 1
UNION ALL
SELECT customer_id, 'inactive' AS label FROM customers WHERE status = 0;
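
Because the optimizer may rewrite a subquery before planning it, the select_type values you see do not always match the way the query was written. A minimal way to check, assuming the same session is used for both statements, is to run SHOW WARNINGS right after EXPLAIN:

```sql
EXPLAIN
SELECT order_id, total_amount FROM orders
WHERE customer_id IN (
    SELECT customer_id FROM customers WHERE country_code = 'DE'
);

-- Immediately afterwards, in the same session:
SHOW WARNINGS;
-- The Note (code 1003) contains the statement as the optimizer rewrote it;
-- "semi join" in that text means the IN subquery was converted to a join
```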

The type Column: The Most Critical Field in EXPLAIN

The type column, also called the join type or access type, is the most important field in the entire EXPLAIN output. It tells you how MySQL accesses rows in a table. The most common values, from best to worst performance:
  • system — The table has only one row. A special case of const.
  • const — Exactly one matching row via PRIMARY KEY or UNIQUE index. Ideal for primary key lookups.
  • eq_ref — For each row from the preceding table, exactly one row is read via PRIMARY KEY or UNIQUE NOT NULL index. The best possible join access type.
  • ref — Multiple rows may match. Occurs with non-unique indexes or leftmost prefix matches.
  • range — Only rows within a given range are retrieved using an index (BETWEEN, IN, >, <, LIKE with prefix).
  • index — A full index scan. Faster than ALL but potentially a bottleneck on large indexes.
  • ALL — A full table scan. The worst case for large tables — must be eliminated in performance-critical paths.
-- const: Primary key lookup
EXPLAIN SELECT * FROM customers WHERE customer_id = 42;
-- type: const, rows: 1

-- eq_ref: Unique index join (best for joins)
EXPLAIN
SELECT c.email, o.order_id, o.total_amount
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.order_date = '2024-06-01';
-- type for customers: eq_ref (primary key join)

-- ref: Non-unique index lookup
EXPLAIN SELECT order_id, order_date, status FROM orders WHERE customer_id = 1001;
-- type: ref

-- range: Index range scan
EXPLAIN SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';
-- type: range

-- ALL: Full table scan (must be fixed for large tables!)
EXPLAIN SELECT * FROM orders WHERE total_amount > 5000;
-- type: ALL if no index on total_amount
-- Solution: CREATE INDEX idx_total_amount ON orders(total_amount);
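
The index access type from the list above has no example of its own, so here is a small sketch. The expected values in the comments assume that orders has only the indexes from the sample schema; with additional indexes the optimizer may choose differently.

```sql
-- index: every entry of a secondary index is read, but no table rows are touched
EXPLAIN SELECT customer_id FROM orders;
-- likely type: index, key: idx_customer_id, Extra: Using index

-- Requesting a column that no secondary index contains forces a scan of the rows themselves
EXPLAIN SELECT total_amount FROM orders;
-- likely type: ALL
```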

The possible_keys, key, key_len, ref, rows, filtered, and Extra Columns

The possible_keys column lists the indexes MySQL considered; key shows the index actually chosen. When key is NULL despite candidates in possible_keys, MySQL chose a full table scan, often because statistics suggest too many rows match. Run ANALYZE TABLE to refresh statistics. The key_len column shows how many bytes of the chosen index are used; for composite indexes, this reveals how many leading columns are utilized. The ref column shows which constant or column from a preceding table is compared against the index. The rows column is MySQL's estimate of how many rows it must examine; in a join, keep the product of the rows values across tables as small as possible. The filtered column is the estimated percentage of examined rows that survive the table's conditions. The Extra column contains the most actionable diagnostic signals: Using index (covering index, ideal), Using temporary (internal temporary table, investigate), Using filesort (sort performed without an index; consider an index whose column order matches the ORDER BY), Using index condition (Index Condition Pushdown active, good), and Using MRR (Multi-Range Read active, good for range scans).
-- Using index: Covering index (zero table row access)
ALTER TABLE orders ADD INDEX idx_cust_covering
    (customer_id, order_id, order_date, total_amount, status);

EXPLAIN
SELECT order_id, order_date, total_amount, status
FROM orders WHERE customer_id = 1001;
-- Extra: Using index

-- Using temporary + Using filesort: Performance red flag
EXPLAIN
SELECT country_code, COUNT(*) AS cnt
FROM customers GROUP BY country_code ORDER BY cnt DESC;
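-- idx_country_status already lets MySQL group by country_code in index order;
-- the sort on the aggregate cnt cannot use an index, so a filesort over the small grouped result remains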

-- Using filesort on non-indexed ORDER BY
EXPLAIN SELECT order_id, total_amount FROM orders
ORDER BY total_amount DESC LIMIT 20;
-- Fix: CREATE INDEX idx_total_amount ON orders(total_amount);

-- Using index condition: ICP optimization
EXPLAIN SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
  AND status = 'shipped';
-- Extra: Using index condition
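
To make key_len concrete, here is a worked example on idx_country_status (country_code, status) from the customers table. It assumes the table uses the utf8mb4 character set (the MySQL 8.0 default), where each character reserves up to 4 bytes in the index.

```sql
-- country_code CHAR(2) NOT NULL -> 2 characters * 4 bytes = 8 bytes
-- status       TINYINT NOT NULL -> 1 byte

EXPLAIN SELECT customer_id FROM customers WHERE country_code = 'US';
-- expected key: idx_country_status, key_len: 8 (first column only)

EXPLAIN SELECT customer_id FROM customers
WHERE country_code = 'US' AND status = 1;
-- expected key: idx_country_status, key_len: 9 (both columns)
```

A nullable column adds one byte to its share of key_len, which is worth remembering when the arithmetic seems off by one.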

EXPLAIN ANALYZE: Real Execution Metrics in MySQL 8.0

EXPLAIN ANALYZE, introduced in MySQL 8.0.18, executes the query and returns both estimated and actual metrics for each node in the execution plan tree. This is critical for identifying cardinality estimation errors — cases where the optimizer's row estimates diverge wildly from reality, leading to suboptimal plan selection.
EXPLAIN ANALYZE
SELECT
    c.country_code,
    COUNT(DISTINCT o.order_id)       AS total_orders,
    SUM(oi.unit_price * oi.quantity) AS total_revenue
FROM customers c
JOIN orders o     ON o.customer_id = c.customer_id
JOIN order_items oi ON oi.order_id = o.order_id
WHERE c.status = 1
  AND o.order_date >= '2024-01-01'
  AND o.status = 'delivered'
GROUP BY c.country_code
ORDER BY total_revenue DESC;
-> Sort: total_revenue DESC  (actual time=142.5..142.7 rows=48 loops=1)
    -> Aggregate using temporary table  (actual time=142.2..142.2 rows=48 loops=1)
        -> Nested loop inner join  (cost=18540.23 rows=9820)
                                   (actual time=0.8..138.6 rows=87342 loops=1)
            -> Nested loop inner join  (cost=5421.12 rows=3240)
                                       (actual time=0.5..22.4 rows=28918 loops=1)
                -> Filter: (c.status = 1)  (cost=1240.80 rows=8400)
                   (actual time=0.3..8.7 rows=71230 loops=1)
                    -> Index scan on c using idx_country_status
                       (cost=1240.80 rows=84000)
                       (actual time=0.2..6.9 rows=84000 loops=1)
                -> Filter: (o.order_date >= '2024-01-01') and (o.status='delivered')
                   (cost=0.25 rows=1) (actual time=0.00019..0.00019 rows=0 loops=71230)
                    -> Index lookup on o using idx_customer_id
                       (customer_id=c.customer_id)  (cost=0.25 rows=1)
                       (actual time=0.00017..0.00017 rows=1 loops=71230)
            -> Index lookup on oi using idx_order_id (order_id=o.order_id)
               (cost=1.12 rows=3) (actual time=0.003..0.004 rows=3 loops=28918)
Key analysis points: compare the estimated rows against the actual rows. When these diverge by orders of magnitude, consider running ANALYZE TABLE or increasing innodb_stats_persistent_sample_pages. In actual time=first..last, the two values are the time in milliseconds to return the first row and to return all rows; both time and rows are averages per loop, so multiply by loops to get a node's total. The loops value shows how many times each node executed; high loop counts on expensive inner operations are the primary target for optimization.
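
In the plan above, the filter on c.status = 1 was estimated at 8,400 rows but produced 71,230. One option for a skewed column like this, offered as a sketch rather than a guaranteed fix, is a column histogram (MySQL 8.0+). The bucket count of 16 is an arbitrary choice, and whether the optimizer consults the histogram depends on the plan it builds.

```sql
-- Build a histogram describing the real distribution of customers.status
ANALYZE TABLE customers UPDATE HISTOGRAM ON status WITH 16 BUCKETS;

-- Inspect what was stored
SELECT schema_name, table_name, column_name,
       histogram->>'$."histogram-type"' AS histogram_type
FROM information_schema.COLUMN_STATISTICS
WHERE table_name = 'customers';

-- Remove it again if it does not improve the estimates
ANALYZE TABLE customers DROP HISTOGRAM ON status;
```

Re-run EXPLAIN ANALYZE afterwards and compare the estimated and actual row counts on the filter node.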

Common Query Anti-Patterns and How to Fix Them

Anti-Pattern 1: Functions on Indexed Columns in WHERE Clauses

Wrapping an indexed column inside a function prevents MySQL from using the index, forcing a full table scan. This is one of the most common and damaging anti-patterns found in production SQL workloads — and the fix is almost always straightforward.
-- BAD: Function prevents index usage
EXPLAIN SELECT * FROM orders
WHERE YEAR(order_date) = 2024 AND MONTH(order_date) = 6;
-- type: ALL (full table scan on potentially millions of rows)

-- GOOD: Rewrite as range condition (uses index)
EXPLAIN SELECT * FROM orders
WHERE order_date >= '2024-06-01' AND order_date < '2024-07-01';
-- type: range, Extra: Using index condition

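-- BAD: LIKE with leading wildcard (a B-tree index cannot be used)
EXPLAIN SELECT * FROM products WHERE sku LIKE '%ABC%';
-- A default FULLTEXT index matches whole tokens, not arbitrary substrings.
-- The ngram parser indexes fixed-length character sequences (ngram_token_size, default 2),
-- which lets a boolean-mode search approximate a substring match
ALTER TABLE products ADD FULLTEXT INDEX ft_sku (sku) WITH PARSER ngram;
SELECT * FROM products WHERE MATCH(sku) AGAINST('ABC' IN BOOLEAN MODE);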

-- GOOD: LIKE with trailing wildcard (uses index prefix scan)
EXPLAIN SELECT * FROM products WHERE sku LIKE 'ABC%';
-- type: range

-- BAD: Function on indexed column breaks index usage
EXPLAIN SELECT * FROM customers WHERE LOWER(email) = 'user@example.com';

-- GOOD: Functional index (MySQL 8.0.13+) preserves index access
ALTER TABLE customers ADD INDEX idx_email_lower ((LOWER(email)));
EXPLAIN SELECT * FROM customers WHERE LOWER(email) = 'user@example.com';
-- type: ref, key: idx_email_lower

Anti-Pattern 2: The N+1 Query Problem

The N+1 problem occurs when an application executes one query to retrieve N records and then fires an additional query for each record — N+1 total round trips. This is catastrophic at scale and entirely preventable with proper JOIN usage or batch fetching.
-- BAD: N+1 pattern (500 pending orders = 501 queries!)
-- Query 1: SELECT order_id FROM orders WHERE status = 'pending';
-- Then for each order_id:
-- Queries 2..501: SELECT * FROM order_items WHERE order_id = ?;

-- GOOD: Single JOIN eliminates N+1 completely
EXPLAIN
SELECT
    o.order_id, o.order_date, o.total_amount,
    oi.item_id, oi.product_id, oi.quantity, oi.unit_price
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.status = 'pending'
ORDER BY o.order_id, oi.item_id;
-- type for orders: ref (assuming an index on orders(status), e.g. idx_status; the sample schema does not define one)
-- type for order_items: ref (idx_order_id)
-- One query, complete result set
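
When a JOIN is awkward for the application (for example, when parents and children are mapped to separate objects), batch fetching gives the same benefit with two queries instead of N+1. A minimal sketch, where the ids 101, 102, 103 are placeholders for the values collected from the first query:

```sql
-- Step 1: one query for the parent rows
SELECT order_id FROM orders WHERE status = 'pending';

-- Step 2: one query for all children of those parents
-- (build the IN list from bound parameters in the application)
SELECT item_id, order_id, product_id, quantity, unit_price
FROM order_items
WHERE order_id IN (101, 102, 103)
ORDER BY order_id, item_id;
```

For very large parent sets, splitting the id list into chunks keeps each statement at a manageable size.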

Anti-Pattern 3: SELECT * Instead of Column Projection

Using SELECT * prevents covering index usage, transfers unnecessary data across the network, and makes execution plans less predictable as schemas evolve. Always project only the columns your application actually needs.
-- BAD: SELECT * forces table row access even when index could cover query
EXPLAIN SELECT * FROM orders WHERE customer_id = 1001;

-- GOOD: Projecting only the needed columns enables a covering index
-- (same column list as idx_cust_covering shown earlier; create only one of the two)
ALTER TABLE orders ADD INDEX idx_cust_cover
    (customer_id, order_id, order_date, total_amount, status);

EXPLAIN
SELECT order_id, order_date, total_amount, status
FROM orders WHERE customer_id = 1001;
-- type: ref, Extra: Using index (all data from index - zero table access)

Advanced Indexing Strategies for MySQL Query Optimization

Composite Index Design: The Left-Prefix Rule

Composite indexes follow the left-prefix rule: MySQL can only use an index starting from the leftmost column. A composite index on (A, B, C) supports queries on A, A+B, or A+B+C, but not on B or C alone. Place equality columns first, followed by the range or ORDER BY / GROUP BY column. Once a column is used for a range condition, the columns after it can no longer narrow the scan or supply sort order, so a filesort is avoided only when the sort column is the range column itself or directly follows the equality columns.
-- Query: WHERE status = 'shipped' AND order_date BETWEEN x AND y ORDER BY order_date
-- Optimal: equality first, range second, ORDER BY aligned with range column
ALTER TABLE orders ADD INDEX idx_status_date_opt (status, order_date);

EXPLAIN
SELECT order_id, customer_id, total_amount
FROM orders
WHERE status = 'shipped'
  AND order_date BETWEEN '2024-01-01' AND '2024-06-30'
ORDER BY order_date;
-- type: range, key: idx_status_date_opt
-- Extra: Using index condition  (NO filesort! ORDER BY uses index)

-- Verify index columns being used via key_len
-- status ENUM NOT NULL = 1 byte
-- order_date DATE NOT NULL = 3 bytes
-- key_len = 4 means BOTH columns are utilized

-- Covering composite index for aggregate queries
ALTER TABLE orders ADD INDEX idx_grp_covering
    (status, order_date, customer_id, total_amount);

EXPLAIN
SELECT status, order_date, COUNT(*) AS cnt, SUM(total_amount) AS revenue
FROM orders
WHERE status IN ('shipped', 'delivered')
  AND order_date >= '2024-01-01'
GROUP BY status, order_date;
-- Extra: Using index (full covering index - no table access whatsoever)
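
The "not B or C alone" half of the left-prefix rule can be observed directly with idx_country_status (country_code, status) on customers. The plan values in the comments are what one would typically expect, not guaranteed output.

```sql
-- Leftmost column present: index lookup
EXPLAIN SELECT customer_id FROM customers WHERE country_code = 'US';
-- expected type: ref, key: idx_country_status

-- Only the second column: no lookup is possible from the left edge of the index
EXPLAIN SELECT customer_id FROM customers WHERE status = 1;
-- typically type: index (full scan of the index) or ALL;
-- MySQL 8.0.13+ may instead report "Using index for skip scan" in Extra
```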

Invisible Indexes: Safe Index Testing Without Dropping

MySQL 8.0 introduced invisible indexes, which the optimizer ignores while InnoDB continues maintaining them. This allows DBAs to safely validate the impact of removing an index before permanently dropping it — an indispensable tool for production index lifecycle management.
-- Make an index invisible to test the impact of removing it
-- (assumes orders has an index idx_status on (status); the sample schema does not define one)
ALTER TABLE orders ALTER INDEX idx_status INVISIBLE;

-- EXPLAIN now shows the optimizer ignoring this index
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
-- idx_status no longer appears in possible_keys or key

-- Re-enable the index
ALTER TABLE orders ALTER INDEX idx_status VISIBLE;

-- Allow session to see invisible indexes for targeted testing
SET SESSION optimizer_switch = 'use_invisible_indexes=on';
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
SET SESSION optimizer_switch = 'use_invisible_indexes=off';

-- Check visibility status of all indexes
SELECT index_name, is_visible
FROM information_schema.STATISTICS
WHERE table_schema = 'ecommerce' AND table_name = 'orders'
GROUP BY index_name, is_visible;
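
To decide which indexes are candidates for the invisible test, the sys schema offers a view of indexes with no recorded use. This sketch assumes the Performance Schema is enabled and that the schema is named ecommerce, as in the other examples.

```sql
-- Indexes with no recorded use since the server last started
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes
WHERE object_schema = 'ecommerce';
```

The underlying counters reset at server restart, so judge the result only after the server has seen a full business cycle, including monthly or quarterly reports.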

Index Hints and MySQL 8.0 Optimizer Hints

When the MySQL optimizer makes a poor index selection — often due to outdated statistics or unusual data distributions — index hints and optimizer hints allow targeted intervention. Use them sparingly and always validate with EXPLAIN, as they bypass the optimizer's cost model.
-- FORCE INDEX: Restricts the optimizer to this index and treats a table scan as very expensive
EXPLAIN SELECT * FROM orders FORCE INDEX (idx_order_date_status)
WHERE order_date >= '2024-01-01' AND status = 'delivered';

-- USE INDEX: Restricts the optimizer to the listed indexes (a table scan can still be chosen)
EXPLAIN SELECT * FROM orders USE INDEX (idx_customer_id)
WHERE customer_id = 1001;

-- IGNORE INDEX: Prevents use of a specific index (idx_status is assumed to exist on orders(status))
EXPLAIN SELECT * FROM orders IGNORE INDEX (idx_status)
WHERE status = 'pending' AND order_date >= '2024-01-01';

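-- Optimizer hints (MySQL 8.0+ preferred method)
-- NO_BNL disables block nested loop and, from MySQL 8.0.20, hash join for these tables
-- (HASH_JOIN / NO_HASH_JOIN only take effect in MySQL 8.0.18)
SELECT /*+ NO_BNL(o, c) */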
    o.order_id, c.email, o.total_amount
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.status = 'pending';

-- SET_VAR hint: Change variable scope for a single query
SELECT /*+ SET_VAR(sort_buffer_size=4194304) */
    customer_id, SUM(total_amount) AS revenue
FROM orders
GROUP BY customer_id
ORDER BY revenue DESC
LIMIT 100;
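
A few more optimizer hints that are often useful, shown as a sketch against the sample schema. The 2000 ms limit and the filter values are arbitrary examples.

```sql
-- JOIN_ORDER: read customers first, then orders
EXPLAIN
SELECT /*+ JOIN_ORDER(c, o) */ c.email, o.order_id
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE c.country_code = 'DE';

-- MAX_EXECUTION_TIME: abort this SELECT if it runs longer than 2000 ms
SELECT /*+ MAX_EXECUTION_TIME(2000) */ customer_id, SUM(total_amount) AS revenue
FROM orders
GROUP BY customer_id;

-- Index-level optimizer hint (MySQL 8.0.20+), the hint-comment counterpart of FORCE INDEX
EXPLAIN
SELECT /*+ INDEX(o idx_customer_id) */ order_id, order_date
FROM orders o
WHERE customer_id = 1001;
```

Unlike FORCE INDEX, a misspelled or inapplicable optimizer hint produces a warning rather than an error, so check SHOW WARNINGS when a hint appears to have no effect.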

Subquery Optimization and Common Table Expressions

Subqueries can be highly efficient or devastating for performance depending on how they are written. The pattern to watch for is the correlated subquery, reported with a select_type of DEPENDENT SUBQUERY, which is re-evaluated for every row of the outer query. MySQL 8.0's Common Table Expressions (CTEs) are planned like equivalent derived tables, so they generally perform on par with the JOIN form while improving readability for complex multi-step queries.
-- BAD: Correlated subquery (re-evaluated N times for N outer rows)
EXPLAIN
SELECT o.order_id, o.total_amount,
    (SELECT SUM(oi.unit_price * oi.quantity)
     FROM order_items oi
     WHERE oi.order_id = o.order_id) AS calculated_total
FROM orders o
WHERE o.order_date >= '2024-01-01';
-- select_type: DEPENDENT SUBQUERY (executed once per outer row!)

-- GOOD: JOIN with aggregation (single pass over order_items)
-- LEFT JOIN keeps orders that have no items, matching the NULL the subquery version returns
EXPLAIN
SELECT o.order_id, o.total_amount, oi_agg.calculated_total
FROM orders o
LEFT JOIN (
    SELECT order_id, SUM(unit_price * quantity) AS calculated_total
    FROM order_items GROUP BY order_id
) oi_agg ON oi_agg.order_id = o.order_id
WHERE o.order_date >= '2024-01-01';

-- BEST: CTE for readability with equivalent performance (MySQL 8.0+)
WITH order_totals AS (
    SELECT order_id, SUM(unit_price * quantity) AS calculated_total
    FROM order_items GROUP BY order_id
)
SELECT o.order_id, o.total_amount, ot.calculated_total
FROM orders o
LEFT JOIN order_totals ot ON ot.order_id = o.order_id
WHERE o.order_date >= '2024-01-01';

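-- Recursive CTE: Hierarchical queries (category trees, org charts)
-- Assumes a categories(category_id, parent_id, name) table, which is not part of the sample schema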
WITH RECURSIVE category_tree AS (
    SELECT category_id, parent_id, name, 0 AS depth
    FROM categories WHERE parent_id IS NULL
    UNION ALL
    SELECT c.category_id, c.parent_id, c.name, ct.depth + 1
    FROM categories c
    JOIN category_tree ct ON ct.category_id = c.parent_id
)
SELECT category_id, CONCAT(REPEAT('  ', depth), name) AS indented_name
FROM category_tree ORDER BY category_id;

Optimizing Pagination: Escaping the LIMIT/OFFSET Trap

Naive pagination using high OFFSET values is a classic performance trap. As OFFSET grows, MySQL must scan and discard increasingly large numbers of rows before returning the requested page, a problem known as deep pagination. For large datasets, cursor-based (keyset) pagination that seeks past the last seen primary key keeps the cost of each page roughly constant regardless of page depth.
-- BAD: High offset forces full scan of 1,000,100 rows
EXPLAIN SELECT order_id, order_date, total_amount
FROM orders ORDER BY order_id
LIMIT 100 OFFSET 1000000;
-- rows: 1000100 (scans and discards 1,000,000 rows)

-- GOOD: Cursor-based (keyset) pagination - constant performance
-- First page:
SELECT order_id, order_date, total_amount
FROM orders WHERE order_id > 0
ORDER BY order_id LIMIT 100;

-- Next page (pass last_order_id from previous result set):
SELECT order_id, order_date, total_amount
FROM orders
WHERE order_id > :last_order_id
ORDER BY order_id LIMIT 100;
-- type: range, rows: 100 (reads exactly what is needed)

-- Alternative: Late row lookup for complex multi-column sort
SELECT o.*
FROM orders o
JOIN (
    SELECT order_id FROM orders
    ORDER BY total_amount DESC, order_id
    LIMIT 100 OFFSET 50000
) ids ON ids.order_id = o.order_id
ORDER BY o.total_amount DESC, o.order_id;
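-- The inner query reads only index entries when an index matches the sort,
-- e.g. (total_amount DESC, order_id) as a MySQL 8.0 descending index; otherwise it still needs a filesort.
-- The outer query then fetches just 100 full rows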
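
Keyset pagination also works when the sort key is not unique, as long as the primary key is added as a tiebreaker. The sketch below pages through orders by order_date. The index idx_order_date is hypothetical (it is not part of the sample schema), and the user variables stand in for values the application would carry over from the previous page.

```sql
-- Hypothetical supporting index; InnoDB appends the primary key (order_id) to it
ALTER TABLE orders ADD INDEX idx_order_date (order_date);

-- Sort key and tiebreaker from the last row of the previous page (placeholders)
SET @last_date = '2024-06-01', @last_id = 123456;

SELECT order_id, order_date, total_amount
FROM orders
WHERE order_date > @last_date
   OR (order_date = @last_date AND order_id > @last_id)
ORDER BY order_date, order_id
LIMIT 100;
```

Confirm with EXPLAIN that the plan uses a range scan on the index and shows no filesort; if it does not, the index and the ORDER BY are not aligned.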

Statistics Management and the Query Optimizer

The MySQL optimizer's decisions are only as good as the statistics it uses. Stale or inaccurate statistics lead to poor plan choices — wrong join orders, missed index usage, and cardinality estimation errors. As a MySQL DBA, proactively managing statistics is a core operational responsibility, especially after bulk data loads or large DELETE operations.
-- Refresh table statistics
ANALYZE TABLE orders, customers, order_items, products;

-- View table statistics and sizes
SELECT table_name,
    table_rows,
    ROUND(data_length / 1024 / 1024, 2)  AS data_mb,
    ROUND(index_length / 1024 / 1024, 2) AS index_mb,
    update_time
FROM information_schema.TABLES
WHERE table_schema = 'ecommerce'
ORDER BY data_length DESC;

-- Check index cardinality (higher = more selective = better)
SELECT index_name, column_name, seq_in_index, cardinality, nullable
FROM information_schema.STATISTICS
WHERE table_schema = 'ecommerce' AND table_name = 'orders'
ORDER BY index_name, seq_in_index;

-- Increase sample pages for better statistics on large tables
ALTER TABLE orders STATS_SAMPLE_PAGES = 50;
ANALYZE TABLE orders;

-- InnoDB persistent statistics settings
SHOW VARIABLES LIKE 'innodb_stats%';
-- innodb_stats_persistent = ON (recommended for production)
-- innodb_stats_persistent_sample_pages = 20 (increase for accuracy)

-- Check when InnoDB table statistics were last updated
SELECT * FROM mysql.innodb_table_stats
WHERE database_name = 'ecommerce';
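
The per-index statistics that feed cardinality estimates can be read in the same way. A short sketch, again assuming the schema is named ecommerce:

```sql
SELECT index_name, stat_name, stat_value, sample_size, last_update
FROM mysql.innodb_index_stats
WHERE database_name = 'ecommerce' AND table_name = 'orders'
ORDER BY index_name, stat_name;
-- n_diff_pfxNN: estimated distinct values over the first NN columns of the index
-- n_leaf_pages and size: index size in pages
```

An n_diff_pfx value far from the true distinct count is a direct sign that the sample size is too small for that table.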

Performance Schema: Identifying the Highest-Impact Slow Queries

MySQL's Performance Schema provides comprehensive instrumentation tables for real-time query performance monitoring. For MySQL DBAs, mastering the Performance Schema is essential for identifying the highest-impact optimization targets in production — revealing far more than the slow query log alone.
-- Top 10 slowest queries by total execution time
SELECT
    DIGEST_TEXT                                 AS query_template,
    COUNT_STAR                                  AS exec_count,
    ROUND(SUM_TIMER_WAIT / 1e12, 3)            AS total_time_sec,
    ROUND(AVG_TIMER_WAIT / 1e12, 6)            AS avg_time_sec,
    ROUND(MAX_TIMER_WAIT / 1e12, 6)            AS max_time_sec,
    SUM_ROWS_EXAMINED                           AS total_rows_examined,
    ROUND(SUM_ROWS_EXAMINED / COUNT_STAR, 0)   AS avg_rows_examined,
    SUM_NO_INDEX_USED                           AS full_scans
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = 'ecommerce'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

-- Queries performing full table scans in production
SELECT
    DIGEST_TEXT,
    COUNT_STAR,
    SUM_NO_INDEX_USED,
    ROUND(AVG_TIMER_WAIT / 1e12, 6) AS avg_sec
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = 'ecommerce' AND SUM_NO_INDEX_USED > 0
ORDER BY SUM_NO_INDEX_USED DESC LIMIT 10;

-- sys schema: Simplified top-level performance view
SELECT * FROM sys.statement_analysis
WHERE db = 'ecommerce'
ORDER BY total_latency DESC LIMIT 10;

-- sys schema: All queries doing full table scans
SELECT * FROM sys.statements_with_full_table_scans
WHERE db = 'ecommerce'
ORDER BY no_index_used_count DESC;
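
Digest statistics accumulate from server start, which can hide the effect of a recent change. One approach, sketched here, is to reset the summary table before a test window and read it again afterwards. Resetting affects everyone who relies on these counters, so coordinate before doing this on a shared server.

```sql
-- Start a fresh measurement window
TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;

-- ...let the workload run, then look at what accumulated
SELECT DIGEST_TEXT, COUNT_STAR, FIRST_SEEN, LAST_SEEN, QUERY_SAMPLE_TEXT
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = 'ecommerce'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 5;
```

QUERY_SAMPLE_TEXT holds one concrete statement per digest, which can be pasted straight into EXPLAIN.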

The Optimizer Trace: Deep-Dive Plan Analysis

When EXPLAIN and EXPLAIN ANALYZE do not provide sufficient insight, the Optimizer Trace delivers a complete JSON log of every decision the optimizer made — including all alternative plans considered and their cost estimates. This is the ultimate diagnostic instrument for resolving the most difficult query optimization problems.
-- Enable optimizer trace
SET SESSION optimizer_trace = 'enabled=on';
SET SESSION optimizer_trace_max_mem_size = 1048576;

-- Run the query to analyze
SELECT order_id, customer_id, total_amount
FROM orders
WHERE status = 'shipped'
  AND order_date BETWEEN '2024-01-01' AND '2024-06-30'
ORDER BY total_amount DESC
LIMIT 50;

-- Retrieve the trace (JSON format)
SELECT QUERY, TRACE
FROM information_schema.OPTIMIZER_TRACE\G

-- Key JSON sections to examine:
-- "considered_execution_plans": All plans evaluated
-- "best_access_path": Index chosen and why
-- "rows_estimation": Cardinality estimates per table
-- "cost_info": read_cost, eval_cost, prefix_cost per plan

-- Disable optimizer trace
SET SESSION optimizer_trace = 'enabled=off';
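
Traces of complex queries can exceed the memory limit and be cut off silently. The following sketch checks for truncation; the 4 MB value is an arbitrary example, and the query is a shortened stand-in for the statement under investigation.

```sql
SET SESSION optimizer_trace = 'enabled=on';

SELECT order_id FROM orders WHERE status = 'shipped' LIMIT 50;

-- A non-zero value means the trace was truncated
SELECT MISSING_BYTES_BEYOND_MAX_MEM_SIZE
FROM information_schema.OPTIMIZER_TRACE;

-- If it was, raise the limit and run the traced query again
SET SESSION optimizer_trace_max_mem_size = 4 * 1024 * 1024;

SET SESSION optimizer_trace = 'enabled=off';
```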

Key MySQL Variables for Query Performance Tuning

Beyond index design, several MySQL server variables directly influence query execution performance. Understanding and tuning these variables is a critical complement to query-level optimization in production environments.
-- Sort buffer: used when ORDER BY/GROUP BY cannot use an index
SHOW VARIABLES LIKE 'sort_buffer_size';          -- Default: 256KB
SET SESSION sort_buffer_size = 4 * 1024 * 1024; -- 4MB for heavy sorts

-- Join buffer: used for joins that cannot use an index (Block Nested Loop before MySQL 8.0.18, hash join from 8.0.18)
SHOW VARIABLES LIKE 'join_buffer_size';          -- Default: 256KB
SET SESSION join_buffer_size = 2 * 1024 * 1024; -- 2MB for large joins

-- Temporary table memory thresholds (exceeding causes disk spill)
SHOW VARIABLES LIKE 'tmp_table_size';            -- Default: 16MB
SHOW VARIABLES LIKE 'max_heap_table_size';       -- Default: 16MB
-- In-memory MEMORY temp tables are capped by the smaller of the two, so keep them equal;
-- temp tables that outgrow the limit spill to disk

-- InnoDB buffer pool: the single most impactful performance variable
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';   -- Common target on a dedicated server: 70-80% of RAM

-- Enable slow query log for continuous production monitoring
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;                  -- Capture queries > 1 second
SET GLOBAL log_queries_not_using_indexes = ON;   -- Capture queries without indexes
SHOW VARIABLES LIKE 'slow_query_log_file';       -- Check log file location

-- Read buffer: sequential scan performance
SHOW VARIABLES LIKE 'read_buffer_size';          -- Default: 128KB
SHOW VARIABLES LIKE 'read_rnd_buffer_size';      -- Default: 256KB
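
Before changing any of these buffers, it helps to check whether the workload actually hits their limits. A small sketch using global status counters, which are cumulative since server start:

```sql
-- Internal temporary tables created, and how many of them went to disk
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';

-- Sorts that did not fit in sort_buffer_size and needed merge passes
SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';

-- Joins that scanned a table because no usable index existed
SHOW GLOBAL STATUS LIKE 'Select_full_join';
```

Sample the counters twice, some minutes apart, and compare the differences; a steadily rising Select_full_join usually points to a missing index rather than an undersized buffer.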

MySQL Query Optimization Checklist for DBAs and Developers

The following checklist provides a systematic approach to diagnosing and resolving slow queries in MySQL production environments. Apply these steps in order for every optimization engagement.
  1. Capture the slow query — Use the slow query log, performance_schema.events_statements_summary_by_digest, or sys.statement_analysis to identify the highest-impact queries by total execution time and examination count.
  2. Run EXPLAIN and EXPLAIN ANALYZE — Review every column starting with type (eliminate ALL and index scans), rows (minimize the cross-join product), and Extra (eliminate Using filesort and Using temporary where feasible).
  3. Verify index design — Confirm indexes exist on all columns used in WHERE, JOIN ON, GROUP BY, and ORDER BY. Design composite indexes following the left-prefix rule: equality conditions first, range conditions second.
  4. Eliminate anti-patterns — Remove functions on indexed columns in WHERE, replace SELECT * with projection, convert correlated subqueries to JOINs, and eliminate N+1 patterns entirely.
  5. Update table statistics — Run ANALYZE TABLE after bulk data changes to ensure the optimizer works with accurate cardinality estimates.
  6. Validate with EXPLAIN ANALYZE — After applying changes, re-run EXPLAIN ANALYZE to confirm actual row counts match optimizer estimates and execution time has improved measurably.
  7. Test with production-scale data — Always benchmark optimizations against data volumes comparable to production. An effective index on 10,000 rows may not scale to 100,000,000 rows.
  8. Monitor continuously — Use Performance Schema and the sys schema to continuously monitor query performance and proactively identify regressions before they impact users.

Conclusion

MySQL query optimization is both a science and an art. The science lies in understanding how the cost-based optimizer works, how indexes are structured and accessed internally by InnoDB, and how to interpret every field of the EXPLAIN and EXPLAIN ANALYZE output with precision. The art lies in applying this knowledge pragmatically — knowing when to add a composite index, when to rewrite a correlated subquery as a JOIN, when to refresh statistics, and when to override the optimizer with targeted hints. Mastering the techniques in this guide — from dissecting EXPLAIN columns and eliminating full table scans, to designing optimal composite and covering indexes, avoiding deep pagination traps, leveraging invisible indexes for safe lifecycle management, and using the Performance Schema for continuous monitoring — equips you to build MySQL-backed systems that scale confidently to hundreds of millions of rows and thousands of concurrent connections. The return on investment in MySQL query optimization skills is exceptional: reduced infrastructure costs, dramatically improved user experience, fewer on-call incidents, and a more resilient, predictable database tier. Every millisecond shaved from a high-frequency query executed millions of times daily translates directly into meaningful savings and competitive advantage. Start every optimization engagement with EXPLAIN, follow the evidence rigorously, and let the data guide every decision you make.
About the author: MinervaDB Corporation