# 数据模型

> 版本: v0.4.0 | 作者: Arch AI | 基于 PRD v0.4.0 + 旧架构合并

---

## 1. 实体关系图 (ER)

```
User ──< UserRelation (邀请树: inviter → invitee)
  │
User ──< ErrorItem >── Subject
  │         │
  │         ├──< CorrectionLog (AI 值 vs 用户修正)
  │         └── KnowledgePoint (多对多)
  │
  └──< AnalysisReport

ErrorItem >──< KnowledgePoint (error_knowledge_points)
PracticeRecommendation >──< KnowledgePoint
  └──< Question (题库题目, 多对多)

Question ──< KnowledgePoint (question_knowledge_points)
Question ── Subject
```

## 2. 表定义

### 2.1 users

| 列 | 类型 | 约束 | 说明 |
|----|------|------|------|
| id | UUID | PK, DEFAULT gen_random_uuid() | 用户 ID |
| wx_openid | VARCHAR(128) | UNIQUE, NOT NULL | 微信 OpenID |
| nickname | VARCHAR(64) | | 微信昵称 |
| avatar_url | VARCHAR(512) | | 头像 URL |
| grade | VARCHAR(16) | | 年级，如"初中二年级" |
| role | VARCHAR(16) | NOT NULL, DEFAULT 'student' | 角色: student/parent/teacher/admin/super_admin (Phase 3 启用后台角色) |
| invitation_code | VARCHAR(16) | UNIQUE | 个人邀请码（6 位字母数字，注册时生成） |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |

### 2.2 user_relations

用户邀请树结构。记录邀请链，支持树状用户群。

| 列 | 类型 | 约束 | 说明 |
|----|------|------|------|
| id | UUID | PK | |
| inviter_id | UUID | FK → users.id, NOT NULL | 邀请人 |
| invitee_id | UUID | FK → users.id, UNIQUE, NOT NULL | 被邀请人（一个用户只能被一个人邀请） |
| relation_type | VARCHAR(16) | NOT NULL, DEFAULT 'student' | student/parent/colleague |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |

**索引**: `(inviter_id)` — 查询某用户邀请的所有人；`(invitee_id)` UNIQUE — 确保一对一邀请链

**树查询**: 通过递归 CTE 查询某用户下的完整子树（老师查看全班学生、机构查看所有老师）

```sql
-- 查询邀请人的一级下线
SELECT * FROM user_relations WHERE inviter_id = $1;

-- 递归查询完整子树（所有下级）
WITH RECURSIVE tree AS (
  SELECT invitee_id, inviter_id, 1 AS depth FROM user_relations WHERE inviter_id = $1
  UNION ALL
  SELECT ur.invitee_id, ur.inviter_id, t.depth + 1
  FROM user_relations ur JOIN tree t ON ur.inviter_id = t.invitee_id
  WHERE t.depth < 10
) SELECT * FROM tree;
```

**典型结构**:
```
机构负责人 (invitation_code: ABC123)
  ├── 老师A (受邀)
  │   ├── 学生1 (受邀)
  │   └── 学生2 (受邀)
  └── 老师B (受邀)
      ├── 学生3 (受邀)
      └── 学生4 (受邀)
```

### 2.3 subjects

| 列 | 类型 | 约束 | 说明 |
|----|------|------|------|
| id | SERIAL | PK | 学科 ID |
| name | VARCHAR(32) | UNIQUE, NOT NULL | 数学/英语/语文/... |
| icon | VARCHAR(32) | | 图标标识 |
| sort_order | INT | DEFAULT 0 | 排序 |

**预置数据**: 数学、英语（首发）、语文、物理、化学、生物、地理、历史、政治（后续扩展）

### 2.4 knowledge_points

| 列 | 类型 | 约束 | 说明 |
|----|------|------|------|
| id | SERIAL | PK | 知识点 ID（内部关联用） |
| code | VARCHAR(32) | UNIQUE | 业务编码，如 `G5-MATH-0201`（跨环境稳定，API 对外暴露） |
| name | VARCHAR(128) | NOT NULL | 如"二次函数顶点式" |
| subject_id | INT | FK → subjects.id, NOT NULL | 所属学科 |
| parent_id | INT | FK → knowledge_points.id | 父级知识点（树形结构） |
| level | SMALLINT | NOT NULL, DEFAULT 1 | 层级深度 |
| sort_order | INT | DEFAULT 0 | 同级排序 |

**索引**: `(subject_id, parent_id)`, `(name)` GIN trigram（模糊搜索）

**示例数据（数学 + 英语双学科首发）**:

**编码规则**: `{Grade}-{Subject}-{Category}{Detail}`，如 `G5-MATH-0201` = 五年级·数学·02 大类·01 知识点。ID 用于内部关联，code 跨环境稳定，API 对外暴露。

```
数学 (id=1)
├── 代数 (id=10, code=G5-MATH-0100, parent=NULL)
│   ├── 一次函数 (id=101, code=G8-MATH-0101, parent=10)
│   │   ├── 斜率与截距 (id=1011, code=G8-MATH-0101-1, parent=101)
│   │   └── 一次函数应用 (id=1012, code=G8-MATH-0101-2, parent=101)
│   └── 二次函数 (id=102, code=G9-MATH-0102, parent=10)
│       ├── 顶点坐标 (id=1021, code=G9-MATH-0102-1, parent=102)
│       └── 图像性质 (id=1022, code=G9-MATH-0102-2, parent=102)
└── 几何 (id=20, code=G5-MATH-0200, parent=NULL)
    ├── 三角形 (id=201, code=G7-MATH-0201, parent=20)
    └── 圆 (id=202, code=G9-MATH-0202, parent=20)

英语 (id=2)
├── 语法 (id=200, code=G7-ENG-0100, parent=NULL)
│   ├── 时态 (id=2001, code=G7-ENG-0101, parent=200)
│   │   ├── 一般现在时 (id=20011, code=G7-ENG-0101-1, parent=2001)
│   │   └── 现在完成时 (id=20012, code=G8-ENG-0101-2, parent=2001)
│   ├── 从句 (id=2002, code=G8-ENG-0102, parent=200)
│   │   ├── 定语从句 (id=20021, code=G9-ENG-0102-1, parent=2002)
│   │   └── 状语从句 (id=20022, code=G8-ENG-0102-2, parent=2002)
│   └── 被动语态 (id=2003, code=G8-ENG-0103, parent=200)
├── 词汇 (id=300, code=G7-ENG-0200, parent=NULL)
│   ├── 词义辨析 (id=3001, code=G7-ENG-0201, parent=300)
│   └── 固定搭配 (id=3002, code=G8-ENG-0202, parent=300)
└── 阅读 (id=400, code=G7-ENG-0300, parent=NULL)
    ├── 主旨大意 (id=4001, code=G7-ENG-0301, parent=400)
    └── 细节理解 (id=4002, code=G7-ENG-0302, parent=400)
```

### 2.6 error_items

| 列 | 类型 | 约束 | 说明 |
|----|------|------|------|
| id | UUID | PK | 错题 ID |
| user_id | UUID | FK → users.id, NOT NULL | 所属用户 |
| subject_id | INT | FK → subjects.id | 学科 |
| image_url | VARCHAR(512) | NOT NULL | 原始图片 URL |
| thumbnail_url | VARCHAR(512) | | 缩略图 URL |
| question_text | TEXT | | AI 提取的题目文本 |
| wrong_answer | TEXT | | 错误答案 |
| correct_answer | TEXT | | 正确答案（可选） |
| error_type | VARCHAR(32) | | 错误类型 |
| difficulty | VARCHAR(8) | | 难度: basic/medium/advanced |
| verification_status | VARCHAR(16) | NOT NULL, DEFAULT 'raw' | raw/reviewed/corrected/stale |
| ai_confidence | JSONB | | AI 各字段置信度 |
| note | TEXT | | 学生备注 |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |

**索引**: `(user_id, created_at DESC)`, `(user_id, subject_id)`, `(user_id, error_type)`, `(user_id, verification_status)`

**verification_status 枚举**:
- `raw` — AI 原始结果，用户尚未确认，不计入分析
- `reviewed` — 用户已确认（一键确认或查看后确认）
- `corrected` — 用户修正了至少一个 AI 字段
- `stale` — 30 天未确认，系统标记，可恢复为 raw

**ai_confidence JSONB 结构**:
```json
{
  "question_text": 0.92,
  "subject_id": 0.88,
  "knowledge_points": { "1021": 0.95, "1022": 0.73 },
  "error_type": 0.81,
  "correct_answer": 0.55
}
```

**error_type 枚举**:
- `knowledge_gap` — 知识点欠缺
- `careless` — 粗心失误
- `misread` — 审题偏差
- `concept_confusion` — 概念混淆

### 2.7 error_knowledge_points

错题与知识点的多对多关联表。

| 列 | 类型 | 约束 | 说明 |
|----|------|------|------|
| id | SERIAL | PK | |
| error_item_id | UUID | FK → error_items.id, NOT NULL | 错题 |
| knowledge_point_id | INT | FK → knowledge_points.id, NOT NULL | 知识点 |
| relevance | SMALLINT | DEFAULT 100 | 关联度 (0-100)，主关联=100 |

**唯一约束**: `(error_item_id, knowledge_point_id)`

### 2.8 correction_logs

用户修正 AI 识别结果的记录。P02 阶段用于微调自有模型。

| 列 | 类型 | 约束 | 说明 |
|----|------|------|------|
| id | UUID | PK | |
| error_item_id | UUID | FK → error_items.id, NOT NULL | 所属错题 |
| field_name | VARCHAR(32) | NOT NULL | 修正的字段名 |
| ai_value | JSONB | NOT NULL | AI 原始值 |
| user_value | JSONB | NOT NULL | 用户修正值 |
| ai_confidence | REAL | NOT NULL | 该字段 AI 置信度 |
| corrected_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | 修正时间 |

**索引**: `(error_item_id)`, `(field_name)`（P02 阶段按字段统计 AI 薄弱项）

**示例数据**:
```json
{
  "error_item_id": "uuid",
  "field_name": "knowledge_points",
  "ai_value": [1021],
  "user_value": [1022],
  "ai_confidence": 0.72,
  "corrected_at": "2026-05-26T10:30:00Z"
}
```

### 2.9 analysis_reports

| 列 | 类型 | 约束 | 说明 |
|----|------|------|------|
| id | UUID | PK | 报告 ID |
| user_id | UUID | FK → users.id, NOT NULL | |
| period_start | DATE | NOT NULL | 报告周期开始 |
| period_end | DATE | NOT NULL | 报告周期结束 |
| weak_points | JSONB | NOT NULL | 薄弱点数据 |
| error_type_distribution | JSONB | NOT NULL | 错误类型分布 |
| trend | VARCHAR(8) | | up/flat/down（与上周期对比） |
| generated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |

**weak_points JSONB 结构**:

```json
[
  {
    "knowledge_point_id": 1021,
    "name": "二次函数顶点坐标",
    "error_count": 5,
    "weight": 0.85,
    "trend": "up"
  }
]
```

**error_type_distribution JSONB 结构**:

```json
{
  "knowledge_gap": 12,
  "careless": 5,
  "misread": 3,
  "concept_confusion": 2
}
```

**注意**: AnalysisReport 仅统计 `verification_status != 'raw'` 的错题，确保分析基于用户确认过的数据。

### 2.10 question_bank（题库抽象层）

支持多题库源的统一抽象。自有题库（PDF 录入）和第三方题库（作业帮 API）通过统一接口接入。

**2.10.1 questions（题库题目）**

| 列 | 类型 | 约束 | 说明 |
|----|------|------|------|
| id | UUID | PK | 题目 ID |
| source | VARCHAR(16) | NOT NULL | 来源: self_built / zuoyebang / future_source |
| external_id | VARCHAR(128) | | 外部题库的原始 ID（自建为空） |
| subject_id | INT | FK → subjects.id, NOT NULL | 所属学科 |
| question_type | VARCHAR(16) | NOT NULL, DEFAULT 'choice' | 题型: choice/fill/calculation/word_problem/geometry/composite |
| question_text | TEXT | NOT NULL | 题目文本 |
| options | JSONB | | 选项（如 ABCD） |
| answer | TEXT | NOT NULL | 正确答案 |
| analysis | TEXT | | 解析 |
| difficulty | SMALLINT | DEFAULT 3 | 难度 1-5（1 基础 → 5 综合创新） |
| cognitive_level | SMALLINT | | 认知层次 1-6（布鲁姆: 记忆/理解/应用/分析/评价/创造，预留） |
| grade | VARCHAR(16) | | 适用年级 |
| variation_params | JSONB | | 变式参数（数字替换、条件变换，预留，Phase 3 启用） |
| status | VARCHAR(16) | DEFAULT 'active' | active/inactive |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |

**索引**: `(subject_id, knowledge_point_id)`, `(source)`, `(grade)`

**2.10.2 question_knowledge_points**

题目与知识点的多对多关联（同 error_knowledge_points 模式）。

**2.10.3 pdf_import_tasks（PDF 导入任务）**

自有题库 PDF 导入的异步任务管理。

| 列 | 类型 | 约束 | 说明 |
|----|------|------|------|
| id | UUID | PK | 任务 ID |
| uploaded_by | UUID | FK → users.id, NOT NULL | 上传者 |
| file_url | VARCHAR(512) | NOT NULL | PDF 文件 URL |
| subject_id | INT | FK → subjects.id | 目标学科 |
| status | VARCHAR(16) | NOT NULL, DEFAULT 'pending' | pending/parsing/ai_extracting/review/complete/failed |
| parsed_count | INT | DEFAULT 0 | 解析出的题目数 |
| imported_count | INT | DEFAULT 0 | 成功导入的题目数 |
| error_log | JSONB | | 错误信息 |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |

**PDF 导入管线**:
```
PDF 上传 → OCR 解析 → AI 结构化提取（题目/选项/答案/知识点）
  → 人工审核（校验解析结果） → 入库（source=self_built）
```

### 2.11 print_tasks（打印/PDF 输出任务）[P0]

| 列 | 类型 | 约束 | 说明 |
|----|------|------|------|
| id | UUID | PK | 任务 ID |
| user_id | UUID | FK → users.id, NOT NULL | 创建者 |
| error_item_ids | UUID[] | NOT NULL | 选中的错题 ID 列表 |
| output_mode | VARCHAR(8) | NOT NULL, DEFAULT 'pdf' | pdf / image |
| status | VARCHAR(16) | NOT NULL, DEFAULT 'pending' | pending/generating/complete/expired/failed |
| file_url | VARCHAR(512) | | 生成的 PDF/图片下载链接 |
| expires_at | TIMESTAMPTZ | | 下载链接过期时间（24h） |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |

**清晰度优先级**: 结构化内容（题库匹配）> 增强图片（经图像预处理）> 原始图片

### 2.12 audit_logs（操作审计，预留 Phase 3）

Phase 3 引入完整数据主权方案时建表。字段预留: id, user_id, action, resource_type, resource_id, detail (JSONB), ip_address, created_at。

### 2.13 practice_recommendations (P1)

| 列 | 类型 | 约束 | 说明 |
|----|------|------|------|
| id | UUID | PK | 推荐记录 ID |
| user_id | UUID | FK → users.id, NOT NULL | |
| knowledge_point_ids | INT[] | NOT NULL | 目标知识点 |
| question_refs | JSONB | NOT NULL | 推荐题目引用 |
| completed | BOOLEAN | DEFAULT false | 是否完成 |
| score | SMALLINT | | 得分 |
| generated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |

## 3. Drizzle Schema 示例

```typescript
// src/server/src/db/schema.ts

import { pgTable, uuid, varchar, text, integer, smallint, date, jsonb, timestamp, boolean, uniqueIndex, index } from 'drizzle-orm/pg-core';

export const users = pgTable('users', {
  id: uuid('id').defaultRandom().primaryKey(),
  wxOpenid: varchar('wx_openid', { length: 128 }).unique().notNull(),
  nickname: varchar('nickname', { length: 64 }),
  avatarUrl: varchar('avatar_url', { length: 512 }),
  grade: varchar('grade', { length: 16 }),
  role: varchar('role', { length: 16 }).default('student').notNull(),
  invitationCode: varchar('invitation_code', { length: 16 }).unique(),
  createdAt: timestamp('created_at', { withTimezone: true }).defaultNow().notNull(),
  updatedAt: timestamp('updated_at', { withTimezone: true }).defaultNow().notNull(),
});

export const errorItems = pgTable('error_items', {
  id: uuid('id').defaultRandom().primaryKey(),
  userId: uuid('user_id').references(() => users.id).notNull(),
  subjectId: integer('subject_id').references(() => subjects.id),
  imageUrl: varchar('image_url', { length: 512 }).notNull(),
  thumbnailUrl: varchar('thumbnail_url', { length: 512 }),
  questionText: text('question_text'),
  wrongAnswer: text('wrong_answer'),
  correctAnswer: text('correct_answer'),
  errorType: varchar('error_type', { length: 32 }),
  difficulty: varchar('difficulty', { length: 8 }),
  verificationStatus: varchar('verification_status', { length: 16 }).default('raw').notNull(),
  aiConfidence: jsonb('ai_confidence'),
  note: text('note'),
  createdAt: timestamp('created_at', { withTimezone: true }).defaultNow().notNull(),
  updatedAt: timestamp('updated_at', { withTimezone: true }).defaultNow().notNull(),
}, (table) => ({
  userIdCreatedIdx: index('idx_error_items_user_created').on(table.userId, table.createdAt.desc()),
  userIdSubjectIdx: index('idx_error_items_user_subject').on(table.userId, table.subjectId),
  userIdStatusIdx: index('idx_error_items_user_status').on(table.userId, table.verificationStatus),
}));

export const correctionLogs = pgTable('correction_logs', {
  id: uuid('id').defaultRandom().primaryKey(),
  errorItemId: uuid('error_item_id').references(() => errorItems.id).notNull(),
  fieldName: varchar('field_name', { length: 32 }).notNull(),
  aiValue: jsonb('ai_value').notNull(),
  userValue: jsonb('user_value').notNull(),
  aiConfidence: real('ai_confidence').notNull(),
  correctedAt: timestamp('corrected_at', { withTimezone: true }).defaultNow().notNull(),
}, (table) => ({
  errorItemIdIdx: index('idx_correction_logs_error_item').on(table.errorItemId),
  fieldNameIdx: index('idx_correction_logs_field').on(table.fieldName),
}));
```

## 4. 数据量预估

| 表 | MVP 年末预估 | 增长速度 |
|----|-------------|----------|
| users | 10K | 线性（含邀请裂变） |
| user_relations | ~10K | 每用户 1 条邀请关系 |
| error_items | 500K | 每用户日均 2-3 道 |
| knowledge_points | ~5K (预置) | 版本更新追加 |
| analysis_reports | 40K | 每用户每周 1 份 |
| error_knowledge_points | 1M | 每错题 1-3 条关联 |
| correction_logs | ~200K | 每错题平均修正 0.5-1 个字段 |
| questions | 50K+ | 自有 PDF 导入 + 作业帮 API 同步 |
| question_knowledge_points | 100K | 每题 1-3 条关联 |

MVP 单表最大 500K 行，PostgreSQL 单实例完全可承载，无需分库分表。

---

*关联: 模块设计.md → 总体架构.md*