如何才能发挥数据库的真正作用？

bisal

发布于 2026-03-12 16:36:18

1110

文章被收录于专栏：bisal的个人杂货铺bisal的个人杂货铺

点击标题下「蓝色微信名」可快速关注

技术社群的这篇文章《那个几乎搞崩我们数据库的巨大错误，你可能正在犯……》给我们讲解了一些软件开发过程中如何用好数据库的观点，用好了，"事半功倍"，如果方向错了，就可能"事倍功半"。

文章的核心，倡导我们不要将数据库当作愚蠢的存储桶，而是开始就将它当作强大的计算引擎来使用，

用数据库做它擅长的事
不要传输和处理你不需要的数据
当有意义时编写SQL
让数据库执行基于集合的操作

数据库和应用，需要平衡，有些逻辑属于应用程序代码，

复杂的业务工作流
与外部服务的集成
任何需要分支到多个系统的操作
用户特定的格式化和呈现

由于对数据库的基本原理不了解，很可能开发过程中就将数据库看做一个黑盒，这就导致可能用到的只是数据库的一些皮毛功能（简单的CRUD），或者过度使用（复杂的存储过程）。数据库和应用之间应该是一种平衡，无论是资源，还是逻辑设计上，这就要求我们能了解数据库的原理功能，才能针对不同的场景采用合适的方案。

我是付出了惨痛代价才明白这一点的！接下来，让我告诉你我们是如何让应用程序濒临崩溃，以及我们如何通过减少代码中的工作、增加数据库中的工作来修复它。

仪表板的缓慢退出

我们构建了很普遍的一个分析仪表板：显示不同时间段、区域和产品类别的收入指标（基本的企业智能内容）。

我们最初的实现很简单，获取所有数据，在Node.js中处理，发送到前端：

async function getRevenueDashboard(filters) {
  // Fetch all orders from the last year
  const orders = await db.query(`
    SELECT 
      id, 
      amount, 
      created_at, 
      product_id, 
      region,
      customer_id
    FROM orders
    WHERE created_at > NOW() - INTERVAL '1 year'
  `);
  
  // Fetch all products
  const products = await db.query(`
    SELECT id, name, category
    FROM products
  `);
  
  // Create product lookup map
  const productMap = products.reduce((acc, p) => {
    acc[p.id] = p;
    return acc;
  }, {});
  
  // Process in JavaScript
  const result = {
    totalRevenue: 0,
    byCategory: {},
    byRegion: {},
    byMonth: {}
  };
  
  orders.forEach(order => {
    const product = productMap[order.product_id];
    const category = product?.category || 'Unknown';
    const month = order.created_at.toISOString().substring(0, 7);
    
    // Total revenue
    result.totalRevenue += order.amount;
    
    // By category
    result.byCategory[category] = (result.byCategory[category] || 0) + order.amount;
    
    // By region
    result.byRegion[order.region] = (result.byRegion[order.region] || 0) + order.amount;
    
    // By month
    result.byMonth[month] = (result.byMonth[month] || 0) + order.amount;
  });
  
  return result;
}

顿悟

WITH monthly_revenue AS (
  SELECT 
    DATE_TRUNC('month', o.created_at) as month,
    p.category,
    o.region,
    SUM(o.amount) as revenue
  FROM orders o
  JOIN products p ON o.product_id = p.id
  WHERE o.created_at > NOW() - INTERVAL '1 year'
  GROUP BY DATE_TRUNC('month', o.created_at), p.category, o.region
)
SELECT 
  -- Total revenue
  (SELECT SUM(revenue) FROM monthly_revenue) as total_revenue,
  
  -- By category
  (
    SELECT json_object_agg(category, category_revenue)
    FROM (
      SELECT category, SUM(revenue) as category_revenue
      FROM monthly_revenue
      GROUP BY category
    ) cat
  ) as by_category,
  
  -- By region
  (
    SELECT json_object_agg(region, region_revenue)
    FROM (
      SELECT region, SUM(revenue) as region_revenue
      FROM monthly_revenue
      GROUP BY region
    ) reg
  ) as by_region,
  
  -- By month
  (
    SELECT json_object_agg(month, month_revenue)
    FROM (
      SELECT month, SUM(revenue) as month_revenue
      FROM monthly_revenue
      GROUP BY month
      ORDER BY month
    ) mon
  ) as by_month;

async function getRevenueDashboard(filters) {
  const result = await db.query(`
    -- That whole SQL query above
  `);
  
  return result.rows[0];
}

就这样，数据库完成所有工作，Node只返回结果。

为什么会出现这种情况

真实的错误例子

例子1：N+1查询问题

我经常看到这种情况：

// Bad: N+1 queries
const users = await db.query('SELECT * FROM users');
for (const user of users) {
  const orders = await db.query(
    'SELECT * FROM orders WHERE user_id = ?',
    [user.id]
  );
  user.orderCount = orders.length;
  user.totalSpent = orders.reduce((sum, o) => sum + o.amount, 0);
}

如果有1000个用户，这就执行了1001次数据库查询。

// Good: One query
const users = await db.query(`
  SELECT 
    u.*,
    COUNT(o.id) as order_count,
    COALESCE(SUM(o.amount), 0) as total_spent
  FROM users u
  LEFT JOIN orders o ON u.id = o.user_id
  GROUP BY u.id
`);

一次查询，数据库完成所有工作。

例子2：在代码中进行复杂过滤

// Bad: Fetch everything, filter in code
const products = await db.query('SELECT * FROM products');
const available = products.filter(p => 
  p.stock > 0 && 
  p.price < 100 && 
  p.category === 'electronics' &&
  !p.discontinued
);

你刚刚通过网络拉取了50000个产品，只为了找到12个。

-- Good: Let the database filter
SELECT * FROM products
WHERE stock > 0
  AND price < 100
  AND category = 'electronics'
  AND discontinued = false;

数据库有索引，它知道如何高效过滤，利用它。

例子3：计算衍生值

// Bad: Compute in application
const orders = await db.query(`
  SELECT amount, tax_rate FROM orders
`);
orders.forEach(order => {
  order.taxAmount = order.amount * order.tax_rate;
  order.total = order.amount + order.taxAmount;
});
-- Good: Compute in database
SELECT 
  amount,
  tax_rate,
  amount * tax_rate as tax_amount,
  amount + (amount * tax_rate) as total
FROM orders;

SQL为此做了优化，JavaScript没有。

这在什么时候真正重要

-- This query processes millions of rows
-- but returns only a small summary
SELECT 
  product_id,
  COUNT(*) as order_count,
  SUM(quantity) as total_quantity,
  AVG(price) as avg_price,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY price) as median_price
FROM order_items
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY product_id
HAVING COUNT(*) > 10
ORDER BY total_quantity DESC
LIMIT 100;

尝试在应用程序代码中做这个？你会把数百万行加载到内存里。数据库在毫秒内完成。

场景2：带排序的分页

// Bad: Fetch everything, sort in app, paginate in app
const allUsers = await db.query('SELECT * FROM users');
const sorted = allUsers.sort((a, b) => b.created_at - a.created_at);
const page = sorted.slice(offset, offset + limit);

你获取了100000个用户只为了显示20个。

-- Good: Database handles it
SELECT * FROM users
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;

场景3：复杂的业务逻辑

人们认为业务逻辑必须存在于应用程序代码中，但数据库可以处理它：

-- Calculate customer lifetime value with complex rules
WITH customer_orders AS (
  SELECT 
    customer_id,
    SUM(total_amount) as total_spent,
    COUNT(*) as order_count,
    MAX(created_at) as last_order_date
  FROM orders
  WHERE status = 'completed'
  GROUP BY customer_id
),
customer_segments AS (
  SELECT 
    c.id,
    c.email,
    co.total_spent,
    co.order_count,
    co.last_order_date,
    CASE 
      WHEN co.total_spent > 10000 THEN 'vip'
      WHEN co.total_spent > 5000 THEN 'premium'
      WHEN co.total_spent > 1000 THEN 'regular'
      ELSE 'new'
    END as segment,
    CASE
      WHEN co.last_order_date > NOW() - INTERVAL '30 days' THEN 'active'
      WHEN co.last_order_date > NOW() - INTERVAL '90 days' THEN 'at_risk'
      ELSE 'churned'
    END as status
  FROM customers c
  LEFT JOIN customer_orders co ON c.id = co.customer_id
)
SELECT * FROM customer_segments
WHERE segment = 'vip' AND status = 'at_risk';

这种逻辑写成SQL，通常比写成带有多个循环和条件判断的应用程序代码，更具可读性和可维护性。

ORM怎么样？

“但是我用ORM！那不是正确的方法吗？”

ORM对于CRUD操作来说很棒。但对于复杂查询来说它们很糟糕，因为它们强迫你从对象的角度思考，而不是集合的角度。

// Typical ORM approach
const users = await User.findAll({
  include: [{
    model: Order,
    required: false
  }]
});
// Now you have users with orders nested
// But you still need to process this in JS to get totals

// Use ORM for this
const user = await User.findByPk(userId);
user.email = newEmail;
await user.save();
// Use raw SQL for this
const stats = await db.query(`
  SELECT 
    COUNT(*) as total_users,
    COUNT(*) FILTER (WHERE last_login > NOW() - INTERVAL '7 days') as active_users,
    AVG(order_count) as avg_orders_per_user
  FROM users
`);

你没在用的数据库功能

现代数据库拥有令人难以置信的功能，但大多数开发者从未接触过：

窗口函数

-- Running total and ranking in one query
SELECT 
  date,
  revenue,
  SUM(revenue) OVER (ORDER BY date) as running_total,
  RANK() OVER (ORDER BY revenue DESC) as revenue_rank
FROM daily_revenue;

试试在应用程序代码中做这个，我等着你！

公共表表达式（CTEs）

将复杂查询拆分成可读的部分：

WITH high_value_customers AS (
  SELECT customer_id, SUM(amount) as total
  FROM orders
  GROUP BY customer_id
  HAVING SUM(amount) > 10000
),
recent_orders AS (
  SELECT *
  FROM orders
  WHERE created_at > NOW() - INTERVAL '30 days'
)
SELECT 
  c.name,
  hvc.total as lifetime_value,
  COUNT(ro.id) as recent_orders
FROM customers c
JOIN high_value_customers hvc ON c.id = hvc.customer_id
LEFT JOIN recent_orders ro ON c.id = ro.customer_id
GROUP BY c.id, c.name, hvc.total;

这比用多个循环编写的等效应用程序代码更具可读性。

JSON操作

PostgreSQL可以直接处理JSON：

-- Query JSON fields efficiently
SELECT 
  id,
  metadata->>'name' as name,
  metadata->'preferences'->>'theme' as theme
FROM users
WHERE metadata @> '{"status": "active"}';

不需要在你的应用中反序列化JSON。数据库处理它。

生成列

让数据库维护衍生值：

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  price DECIMAL(10,2),
  tax_rate DECIMAL(3,2),
  price_with_tax DECIMAL(10,2) GENERATED ALWAYS AS (price * (1 + tax_rate)) STORED
);

永远不要再在代码中计算 price_with_tax 了。它总是正确的。

约束

使用数据库来强制执行业务规则：

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  quantity INTEGER CHECK (quantity > 0),
  status VARCHAR(20) CHECK (status IN ('pending', 'processing', 'completed', 'cancelled')),
  completed_at TIMESTAMP,
  CONSTRAINT completed_orders_must_have_date 
    CHECK (status != 'completed' OR completed_at IS NOT NULL)
);

你的应用程序代码无法意外违反这些规则。

思维模式的转变