Query Expert

Master database queries across SQL and NoSQL systems. Generate optimized queries, analyze performance with EXPLAIN plans, design effective indexes, and troubleshoot slow queries.

What This Skill Does

Helps you write efficient, performant database queries:

Generate Queries - SQL, MongoDB, GraphQL queries Optimize Queries - Performance tuning and refactoring Design Indexes - Index strategies for faster queries Analyze Performance - EXPLAIN plans and query analysis Troubleshoot - Debug slow queries and bottlenecks Best Practices - Query patterns and anti-patterns Supported Databases SQL Databases PostgreSQL - Advanced features, CTEs, window functions MySQL/MariaDB - InnoDB optimization, replication SQLite - Embedded database optimization SQL Server - T-SQL, execution plans, DMVs Oracle - PL/SQL, partitioning, hints NoSQL Databases MongoDB - Aggregation pipelines, indexes Redis - Key-value queries, Lua scripts Elasticsearch - Full-text search queries Cassandra - CQL, partition keys Query Languages SQL - Standard and vendor-specific MongoDB Query Language - Find, aggregation GraphQL - Efficient data fetching Cypher - Neo4j graph queries SQL Query Patterns SELECT Queries Basic SELECT -- ✅ Select only needed columns SELECT user_id, email, created_at FROM users WHERE status = 'active' AND created_at > NOW() - INTERVAL '30 days' ORDER BY created_at DESC LIMIT 100;

-- ❌ Avoid SELECT * SELECT * FROM users; -- Wastes resources

JOINs -- INNER JOIN (most common) SELECT o.order_id, o.total, c.name AS customer_name, c.email FROM orders o INNER JOIN customers c ON o.customer_id = c.customer_id WHERE o.created_at >= '2024-01-01';

-- LEFT JOIN (include all left rows) SELECT c.customer_id, c.name, COUNT(o.order_id) AS order_count, COALESCE(SUM(o.total), 0) AS total_spent FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.name;

-- Multiple JOINs SELECT o.order_id, c.name AS customer_name, p.product_name, oi.quantity, oi.price FROM orders o INNER JOIN customers c ON o.customer_id = c.customer_id INNER JOIN order_items oi ON o.order_id = oi.order_id INNER JOIN products p ON oi.product_id = p.product_id WHERE o.status = 'completed';

Subqueries -- Subquery in WHERE SELECT name, email FROM customers WHERE customer_id IN ( SELECT DISTINCT customer_id FROM orders WHERE total > 1000 );

-- Correlated subquery SELECT c.name, (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS order_count FROM customers c;

-- ✅ Better: Use JOIN instead SELECT c.name, COUNT(o.order_id) AS order_count FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.name;

Aggregation -- GROUP BY with aggregates SELECT category, COUNT() AS product_count, AVG(price) AS avg_price, MIN(price) AS min_price, MAX(price) AS max_price, SUM(stock_quantity) AS total_stock FROM products GROUP BY category HAVING COUNT() > 5 ORDER BY avg_price DESC;

-- Multiple GROUP BY columns SELECT DATE_TRUNC('month', created_at) AS month, category, SUM(total) AS monthly_sales FROM orders GROUP BY DATE_TRUNC('month', created_at), category ORDER BY month DESC, monthly_sales DESC;

-- ROLLUP for subtotals SELECT COALESCE(category, 'TOTAL') AS category, COALESCE(brand, 'All Brands') AS brand, SUM(sales) AS total_sales FROM products GROUP BY ROLLUP(category, brand);

Window Functions (PostgreSQL, SQL Server, MySQL 8+) -- ROW_NUMBER SELECT customer_id, order_date, total, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY order_date DESC ) AS order_rank FROM orders;

-- Running totals SELECT order_date, total, SUM(total) OVER ( ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_total FROM orders;

-- RANK vs DENSE_RANK SELECT product_name, sales, RANK() OVER (ORDER BY sales DESC) AS rank, DENSE_RANK() OVER (ORDER BY sales DESC) AS dense_rank, NTILE(4) OVER (ORDER BY sales DESC) AS quartile FROM products;

-- LAG and LEAD SELECT order_date, total, LAG(total, 1) OVER (ORDER BY order_date) AS prev_total, LEAD(total, 1) OVER (ORDER BY order_date) AS next_total, total - LAG(total, 1) OVER (ORDER BY order_date) AS change FROM orders;

CTEs (Common Table Expressions) -- Simple CTE WITH active_customers AS ( SELECT customer_id, name, email FROM customers WHERE status = 'active' ) SELECT ac.name, COUNT(o.order_id) AS order_count FROM active_customers ac LEFT JOIN orders o ON ac.customer_id = o.customer_id GROUP BY ac.customer_id, ac.name;

-- Multiple CTEs WITH monthly_sales AS ( SELECT DATE_TRUNC('month', order_date) AS month, SUM(total) AS sales FROM orders GROUP BY DATE_TRUNC('month', order_date) ), avg_monthly AS ( SELECT AVG(sales) AS avg_sales FROM monthly_sales ) SELECT ms.month, ms.sales, am.avg_sales, ms.sales - am.avg_sales AS variance FROM monthly_sales ms CROSS JOIN avg_monthly am ORDER BY ms.month;

-- Recursive CTE (hierarchies) WITH RECURSIVE org_tree AS ( -- Base case SELECT employee_id, name, manager_id, 1 AS level, ARRAY[employee_id] AS path FROM employees WHERE manager_id IS NULL

UNION ALL

-- Recursive case
SELECT
    e.employee_id,
    e.name,
    e.manager_id,
    ot.level + 1,
    ot.path || e.employee_id
FROM employees e
INNER JOIN org_tree ot ON e.manager_id = ot.employee_id

) SELECT * FROM org_tree ORDER BY path;

Query Optimization 1. Use Indexes Effectively -- Create index on frequently queried columns CREATE INDEX idx_users_email ON users(email); CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

-- Composite index (order matters!) CREATE INDEX idx_orders_composite ON orders(status, customer_id, order_date);

-- ✅ This query uses the index SELECT * FROM orders WHERE status = 'pending' AND customer_id = 123 AND order_date > '2024-01-01';

-- ❌ This doesn't use the index (skips first column) SELECT * FROM orders WHERE customer_id = 123;

-- Partial/Filtered index (smaller, faster) CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';

-- Covering index (includes all needed columns) CREATE INDEX idx_users_covering ON users(email) INCLUDE (name, created_at);

Avoid SELECT * -- ❌ Bad: Retrieves all columns SELECT * FROM users;

-- ✅ Good: Select only needed columns SELECT user_id, email, name FROM users;

-- ✅ Good: More efficient for joins SELECT u.user_id, u.email, o.order_id, o.total FROM users u INNER JOIN orders o ON u.user_id = o.user_id;

Optimize JOINs -- ❌ Bad: Filtering after JOIN SELECT u.name, o.total FROM users u LEFT JOIN orders o ON u.user_id = o.user_id WHERE o.status = 'completed';

-- ✅ Good: Filter before JOIN SELECT u.name, o.total FROM users u INNER JOIN ( SELECT user_id, total FROM orders WHERE status = 'completed' ) o ON u.user_id = o.user_id;

-- ✅ Even better: Use WHERE with INNER JOIN SELECT u.name, o.total FROM users u INNER JOIN orders o ON u.user_id = o.user_id WHERE o.status = 'completed';

Use EXISTS Instead of IN -- ❌ Slower: IN with subquery SELECT name FROM customers WHERE customer_id IN ( SELECT customer_id FROM orders WHERE total > 1000 );

-- ✅ Faster: EXISTS SELECT name FROM customers c WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id AND o.total > 1000 );

Avoid Functions on Indexed Columns -- ❌ Bad: Function prevents index usage SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

-- ✅ Good: Use functional index CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Or use case-insensitive collation SELECT * FROM users WHERE email = 'john@example.com' COLLATE utf8_general_ci;

Limit Result Sets -- ✅ Use LIMIT/TOP for pagination SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 0;

-- ✅ Use WHERE to reduce rows early SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days' ORDER BY created_at DESC;

Batch Operations -- ❌ Bad: Multiple single inserts INSERT INTO users (name, email) VALUES ('User1', 'user1@example.com'); INSERT INTO users (name, email) VALUES ('User2', 'user2@example.com');

-- ✅ Good: Batch insert INSERT INTO users (name, email) VALUES ('User1', 'user1@example.com'), ('User2', 'user2@example.com'), ('User3', 'user3@example.com');

-- ✅ Good: Batch update UPDATE products SET price = price * 1.1 WHERE category IN ('Electronics', 'Computers');

EXPLAIN Plans PostgreSQL -- Simple EXPLAIN EXPLAIN SELECT * FROM orders WHERE customer_id = 123;

-- EXPLAIN ANALYZE (actually runs query) EXPLAIN ANALYZE SELECT c.name, COUNT(o.order_id) AS order_count FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.name;

-- Look for: -- - Seq Scan (bad, needs index) -- - Index Scan (good) -- - Bitmap Heap Scan (good for multiple rows) -- - Hash Join vs Nested Loop -- - High cost numbers

MySQL -- EXPLAIN EXPLAIN SELECT * FROM orders WHERE customer_id = 123;

-- EXPLAIN ANALYZE (MySQL 8.0.18+) EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;

-- Look for: -- - type: ALL (table scan, bad) -- - type: index (index scan, good) -- - type: ref (index lookup, great) -- - Extra: Using filesort (may need index) -- - Extra: Using temporary (may need optimization)

Indexing Strategies When to Index

✅ Index these columns:

Primary keys (automatic) Foreign keys Columns in WHERE clauses Columns in JOIN conditions Columns in ORDER BY Columns in GROUP BY

❌ Don't index:

Small tables (< 1000 rows) Columns with low cardinality (few distinct values) Frequently updated columns Large text/blob columns Index Types -- B-Tree (default, most common) CREATE INDEX idx_users_email ON users(email);

-- Hash index (equality only, PostgreSQL) CREATE INDEX idx_users_email_hash ON users USING HASH(email);

-- GIN (full-text search, arrays, JSONB) CREATE INDEX idx_posts_content_gin ON posts USING GIN(to_tsvector('english', content));

-- GiST (geometric, full-text) CREATE INDEX idx_locations_gist ON locations USING GIST(coordinates);

-- Partial index (filtered) CREATE INDEX idx_orders_pending ON orders(customer_id) WHERE status = 'pending';

-- Expression index CREATE INDEX idx_users_email_domain ON users((email ~~ '%@gmail.com%'));

Composite Index Order -- Index column order matters! CREATE INDEX idx_orders_search ON orders(status, customer_id, created_at);

-- ✅ Uses index (left-most column) WHERE status = 'completed'

-- ✅ Uses index (left-most columns) WHERE status = 'completed' AND customer_id = 123

-- ✅ Uses full index WHERE status = 'completed' AND customer_id = 123 AND created_at > '2024-01-01'

-- ❌ Doesn't use index (skips first column) WHERE customer_id = 123

-- ❌ Doesn't use index (skips first column) WHERE created_at > '2024-01-01'

MongoDB Queries Find Queries // Basic find db.users.find({ status: 'active' })

// Find with projection db.users.find( { status: 'active' }, { name: 1, email: 1, _id: 0 } )

// Find with operators db.orders.find({ total: { $gt: 100, $lt: 1000 }, status: { $in: ['pending', 'processing'] }, 'customer.city': 'New York' })

// Find with sort and limit db.products.find({ category: 'Electronics' }) .sort({ price: -1 }) .limit(10)

// Count db.users.countDocuments({ status: 'active' })

Aggregation Pipeline // Group and count db.orders.aggregate([ { $match: { status: 'completed' } }, { $group: { _id: '$customer_id', total_orders: { $sum: 1 }, total_spent: { $sum: '$total' }, avg_order: { $avg: '$total' } }}, { $sort: { total_spent: -1 } }, { $limit: 10 } ])

// Lookup (JOIN) db.orders.aggregate([ { $lookup: { from: 'customers', localField: 'customer_id', foreignField: '_id', as: 'customer' }}, { $unwind: '$customer' }, { $project: { order_id: 1, total: 1, 'customer.name': 1, 'customer.email': 1 }} ])

// Complex aggregation db.sales.aggregate([ // Filter { $match: { date: { $gte: ISODate('2024-01-01') } }},

// Add computed fields
{ $addFields: {
    month: { $month: '$date' },
    year: { $year: '$date' }
}},

// Group by month
{ $group: {
    _id: { year: '$year', month: '$month' },
    total_sales: { $sum: '$amount' },
    order_count: { $sum: 1 },
    avg_sale: { $avg: '$amount' }
}},

// Sort
{ $sort: { '_id.year': 1, '_id.month': 1 } },

// Reshape
{ $project: {
    _id: 0,
    date: {
        $concat: [
            { $toString: '$_id.year' },
            '-',
            { $toString: '$_id.month' }
        ]
    },
    total_sales: 1,
    order_count: 1,
    avg_sale: { $round: ['$avg_sale', 2] }
}}

])

MongoDB Indexes // Single field index db.users.createIndex({ email: 1 })

// Compound index db.orders.createIndex({ customer_id: 1, created_at: -1 })

// Unique index db.users.createIndex({ email: 1 }, { unique: true })

// Partial index db.orders.createIndex( { customer_id: 1 }, { partialFilterExpression: { status: 'active' } } )

// Text index db.products.createIndex({ name: 'text', description: 'text' })

// TTL index (auto-delete after time) db.sessions.createIndex( { created_at: 1 }, { expireAfterSeconds: 3600 } )

// List indexes db.users.getIndexes()

// Analyze query performance db.orders.find({ customer_id: 123 }).explain('executionStats')

GraphQL Queries

Basic query

query { users { id name email } }

Query with arguments

query { user(id: "123") { name email orders { id total status } } }

Query with variables

query GetUser($userId: ID!) { user(id: $userId) { name email orders(limit: 10, status: COMPLETED) { id total createdAt } } }

Fragments (reusable fields)

fragment UserFields on User { id name email createdAt }

query { user(id: "123") { ...UserFields orders { id total } } }

Avoid N+1 queries with DataLoader

query { orders { id total customer { # Batched by DataLoader name email } } }

Common Anti-Patterns ❌ N+1 Query Problem -- Bad: N+1 queries SELECT * FROM customers; -- 1 query -- Then for each customer: SELECT * FROM orders WHERE customer_id = ?; -- N queries

-- Good: Single JOIN query SELECT c.customer_id, c.name, o.order_id, o.total FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id;

❌ Using OR on Different Columns -- Bad: Can't use indexes effectively SELECT * FROM products WHERE name = 'iPhone' OR category = 'Electronics';

-- Good: Use UNION SELECT * FROM products WHERE name = 'iPhone' UNION SELECT * FROM products WHERE category = 'Electronics';

❌ Implicit Type Conversion -- Bad: '123' is string, user_id is integer SELECT * FROM users WHERE user_id = '123';

-- Good: Use correct type SELECT * FROM users WHERE user_id = 123;

Query Performance Checklist Select only needed columns (no SELECT *) Add indexes to WHERE/JOIN/ORDER BY columns Use EXPLAIN to analyze query plan Avoid functions on indexed columns Use EXISTS instead of IN for subqueries Batch INSERT/UPDATE operations Use appropriate JOIN types Filter early (WHERE before JOIN) Use LIMIT for large result sets Monitor slow query logs Update statistics regularly Avoid SELECT DISTINCT when possible Use covering indexes when appropriate Resources PostgreSQL: https://www.postgresql.org/docs/current/performance-tips.html MySQL: https://dev.mysql.com/doc/refman/8.0/en/optimization.html MongoDB: https://docs.mongodb.com/manual/core/query-optimization/ Use The Index, Luke: https://use-the-index-luke.com/

"Premature optimization is the root of all evil, but slow queries are the root of all frustration."

query-expert

安装

Basic query

Query with arguments

Query with variables

Fragments (reusable fields)

Avoid N+1 queries with DataLoader