ADR 0042: Anomaly hypothesis automation pipeline (LLM context + verification loop + opsKnowledge RAG)

Date: 2026-05-08
Status: Accepted
Tier: A (Architecture · ops copilot core)
Scope: apps/ops/app/(ops)/dashboard/ (anomaly slot) · packages/business-logic/src/ops-anomaly/ (new) · apps/web/lib/ai/ (LLM calls, via server action) · Firestore opsKnowledge collection (new)

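Decision statement

We introduce a 4-layer LLM pipeline that fills the anomaly slot defined in ADR 0040: signal collection → 너홀로프로-aware context builder → LLM hypothesis generation → closed-loop verification. This answers the core of the user's request, worded as "how should I improve this?" ("어떻게 개선해야지"). What sets this ADR apart is that it delivers operational insight produced by self-RAG over our own code, ADRs, post-mortems, and memory, rather than simple threshold alerts.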

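Background

The dashboard v2 from ADR 0040 PR-1 left the anomaly slot as a placeholder and shows only static if-else hypotheses (recall < 0.05 → "publicDocuments statute coverage insufficient"). As the case corrected in PR-5 shows (statute coverage insufficient ❌ → sourceType filtering not applied ✅), static rule-based hypotheses carry a high risk of diverging from what is actually measured.

Differentiators that only 너홀로프로 can offer:

33 ADRs + numerous post-mortems and retrospectives + 50+ memory entries: a rich operational prior of our own
ADR 0021 Platform RAG and ADR 0032 RAG evaluation infrastructure have shipped: every self-RAG component is ready
14 ragEvalCases + the ragMergeQuality time series: rich input signals for anomaly detection

Other SaaS ops consoles cannot do this because they do not self-RAG their own prior. We already have the infrastructure.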

Decision: 4-layer design

Layer 1: Signal collection (multi-source)

One source for each of the 4 confidence-decomposition axes, plus cross-cutting sources:

Signal source	Collection	What it measures
Learning regression pass rate	`tenants/{tid}/ragMergeQuality/{date}`	recall · precision baseline trend
Edit rate	`tenants/{tid}/draftDiffs/*`	jaccardSimilarity · editIntensity 7d distribution
docgen success rate	`tenants/{tid}/docgenEvents/*`	failureReason · elapsedMs · outcome
exemplar selection	`tenants/{tid}/refinementFeedback/*`	up/down ratio · 90-day half-life
Functions health	gcloud Logging API	severity≥ERROR · memoryMb · cold start
Firestore index	`firestore.indexes.json` + static grep	missing composite index · range-after-orderby
cross-tenant baseline	aggregateDocgenStats · aggregateRelatedMemoriesStats	average across other firms (sliding window)

packages/business-logic/src/ops-anomaly/signals.ts hosts the unified collector. Each source is a pure async function; on failure it silently returns an empty result, so the other sources are unaffected.
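
The failure-isolation rule for the collector can be sketched as follows. This is a minimal illustration, not the contents of signals.ts: the `OpsSignal` and `OpsSignalSource` types and the `collectSignals` name are assumptions made for the example.

```typescript
// Hypothetical shapes: the ADR names signals.ts but does not define its API.
type OpsSignal = { source: string; metric: string; value: number };
type OpsSignalSource = {
  name: string;
  collect: (tenantId: string) => Promise<OpsSignal[]>;
};

// Runs every source in parallel. A source that throws or rejects contributes
// an empty list, so one failing backend cannot block the others.
async function collectSignals(
  tenantId: string,
  sources: OpsSignalSource[],
): Promise<OpsSignal[]> {
  const perSource = await Promise.all(
    sources.map((source) =>
      Promise.resolve()
        .then(() => source.collect(tenantId))
        .catch((): OpsSignal[] => []),
    ),
  );
  return perSource.reduce<OpsSignal[]>((all, part) => all.concat(part), []);
}
```

Swallowing errors keeps the pipeline running, but it also hides outages; a real collector would probably log the rejected source name before returning the empty list.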

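Layer 2: LLM Context Builder (너홀로프로-aware self-RAG)

A new opsKnowledge collection is created at the Firestore root. Its embedding sources are ADRs + post-mortems + memory + past resolved hypotheses. When a new LLM hypothesis is generated, findNearest is queried with the anomaly fingerprint for the top 5 matches, and those priors are cited in the context: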

opsKnowledge/{docId} schema:
{
  source: "adr" | "retrospective" | "memory" | "past-hypothesis",
  title: string,
  content: string,                  // ADR decision statement or memory body
  embedding: { version, dims, vector },
  embeddedAt: Timestamp,
  metadata: {
    adrNumber?: string,
    fileName?: string,
    pastHypothesisFingerprint?: string,
  },
}
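
To make the retrieval step concrete, here is an in-memory stand-in for the findNearest lookup. It is not the Firestore query itself: it assumes the anomaly has already been turned into a query vector, uses cosine similarity as the distance measure, and the `KnowledgeDoc` type and `topKPriors` name are hypothetical.

```typescript
// Hypothetical in-memory stand-in for the Firestore findNearest query.
type KnowledgeDoc = { docId: string; title: string; vector: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction; treat it as unrelated to everything.
  return denom === 0 ? 0 : dot / denom;
}

// Returns the k documents most similar to the query vector, best first.
function topKPriors(query: number[], docs: KnowledgeDoc[], k = 5): KnowledgeDoc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(query, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.doc);
}
```

Keeping the docId on every returned entry matters later, because the Layer 4 citation check matches cited priors against opsKnowledge docIds.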

Input context structure (LLM prompt):

Firm {id} · 24h signals:
  Learning regression: recall 2.4% (normal >5%, threshold baseline)
  Edit rate: 18% (target <15%, 7d ↑ 6 pp)
  docgen failures: 28% (normal 5%, lawsuit-loan 4 of 5 TIMEOUT)

cross-tenant comparison:
  Other firms with the same docKind average a 9% failure rate
  abc joined 30 days ago, 12 cases

Self-RAG (top-3 priors, opsKnowledge findNearest):
  [1] ADR 0015 PR-7: memory recommendation for scan batches of ≤5 items
  [2] 2026-05-03 sweep: memoryMb<512 P0
  [3] past-hypothesis 2026-05-09 (correct 5/7): "insufficient memory for image OCR"

→ Return the 3 most likely hypotheses + a ranking of recommended actions as JSON validated by a Zod schema

LLM calls go through a server action (apps/web/lib/ai/generate-anomaly-hypothesis.ts or packages/business-logic/src/ops-anomaly/generate.ts). Direct calls from the client are prohibited (no PII propagation, and each call deducts weight 1 from the assist counter).
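
The prompt layout above could be assembled by a small pure function on the server side. The sketch below is illustrative only; `buildHypothesisPrompt`, `SignalLine`, and `PriorCitation` are hypothetical names, and the actual LLM call and Zod validation are left out.

```typescript
// Hypothetical input shapes for the context builder.
type SignalLine = { label: string; observed: string; expectation: string };
type PriorCitation = { docId: string; summary: string };

// Renders the three context sections (signals, cross-tenant, self-RAG priors)
// followed by the output instruction, in the order shown in the ADR.
function buildHypothesisPrompt(
  firmId: string,
  signals: SignalLine[],
  crossTenant: string[],
  priors: PriorCitation[],
): string {
  const lines: string[] = [`Firm ${firmId} · 24h signals:`];
  for (const s of signals) {
    lines.push(`  ${s.label}: ${s.observed} (${s.expectation})`);
  }
  lines.push("", "cross-tenant comparison:");
  for (const c of crossTenant) {
    lines.push(`  ${c}`);
  }
  lines.push("", `Self-RAG (top-${priors.length} priors, opsKnowledge findNearest):`);
  priors.forEach((p, i) => {
    lines.push(`  [${i + 1}] ${p.docId}: ${p.summary}`);
  });
  lines.push("", "Return the 3 most likely hypotheses and a ranked list of recommended actions as JSON.");
  return lines.join("\n");
}
```

Because the function takes only plain values, it can be unit-tested without touching Firestore or the model, and the docIds it prints are the same ones a citation check can look up afterwards.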

Layer 3: Hypothesis verification closed-loop

The operator executes a recommended action → the signals are re-measured 1 hour later → the outcome is recorded automatically:

tenants/{tid}/anomalyHypotheses/{id} schema:
{
  fingerprint: string,                // hash(firm + signal source + threshold-violating axis)
  generatedAt: Timestamp,
  signals: { sourceMetric, value, threshold, status },
  hypothesis: { description, confidence, evidenceFields, ragSources },
  proposedActions: [{ label, autoExecutable, scenarioCardId?, estimatedImpactMs? }],
  executedActionLabel?: string,
  outcome?: "pending" | "resolved" | "persisted" | "false_positive",
  resolvedAt?: Timestamp,
  resolutionLagMs?: number,
  operatorFeedback?: { match: boolean, comment?: string },
}
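
The fingerprint field is described only as a hash over the firm, the signal source, and the violated axis. One possible construction is sketched below; SHA-256, the JSON canonical form, and the `anomalyFingerprint` name are assumptions, since the ADR does not fix a hash function.

```typescript
import { createHash } from "crypto";

// Assumption: SHA-256 over a canonical JSON array. The ADR only says "hash(...)".
function anomalyFingerprint(
  tenantId: string,
  signalSource: string,
  violatedAxes: string[],
): string {
  // Sort a copy so the same set of axes hashes identically regardless of order.
  const axes = [...violatedAxes].sort();
  const canonical = JSON.stringify([tenantId, signalSource, axes]);
  return createHash("sha256").update(canonical).digest("hex");
}
```

Serializing through JSON.stringify rather than plain string concatenation avoids collisions such as ("ab", "c") versus ("a", "bc").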

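Hypotheses that are frequently correct (accuracy ≥ 70% over at least 5 accumulated outcomes) are automatically promoted to past-hypothesis entries in opsKnowledge and serve as few-shot examples in subsequent LLM contexts.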
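
The promotion rule reduces to a small predicate. In this sketch the `OutcomeTally` type and `shouldPromote` name are hypothetical, and how an outcome is counted as correct (for example, outcome "resolved" or operatorFeedback.match) is assumed to be decided by the caller.

```typescript
// Hypothetical per-fingerprint tally maintained by the verification loop.
type OutcomeTally = { correct: number; total: number };

const MIN_OUTCOMES = 5;   // at least 5 accumulated outcomes
const MIN_ACCURACY = 0.7; // accuracy of 70% or higher

// True when a fingerprint has enough history and is accurate enough
// to be promoted to a past-hypothesis entry.
function shouldPromote(tally: OutcomeTally): boolean {
  if (tally.total < MIN_OUTCOMES) return false;
  return tally.correct / tally.total >= MIN_ACCURACY;
}
```

For instance, a tally of 5 correct out of 7 (about 71%) qualifies, while 3 out of 4 does not because it has fewer than 5 outcomes.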

Layer 4: Meta-confidence + failure-mode guards

Risk	Guard
LLM hallucination (citing a nonexistent ADR)	RAG citation verify (reuses verifyCitationExistence from PR-1): every citation must match an opsKnowledge docId
Free-form command recommendations	proposedActions are limited to registered scenario cards and whitelisted toggles (`packages/business-logic/src/ops-anomaly/action-registry.ts`)
Frequent false positives	Accuracy is accumulated per fingerprint; below 60%, auto-dismiss and show the Anomaly badge "low confidence"
Cost blow-up	The LLM is called only when an anomaly is detected, with 24h fingerprint caching (until resolved/false_positive)
Low-confidence hypotheses	Not shown below 0.6; only the "manual check" copy is displayed
cross-tenant mutation	`autoExecutable=true` in proposedActions is limited to reversible actions (memory toggle, flag toggle, etc.). payment and data delete are forced to `autoExecutable=false`
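
Several of these guards can be applied in one pass over a candidate hypothesis before it is stored or shown. The sketch below covers the confidence floor, citation matching, the action whitelist, and the reversible-only rule. All names are hypothetical; rejecting the whole hypothesis on a bad citation and dropping (rather than rejecting on) unregistered actions are assumptions, and the registry is assumed to be keyed by scenarioCardId.

```typescript
type ProposedAction = { label: string; scenarioCardId?: string; autoExecutable: boolean };
type CandidateHypothesis = {
  description: string;
  confidence: number;
  ragSources: string[]; // opsKnowledge docIds cited by the model
  proposedActions: ProposedAction[];
};
// Hypothetical registry entry; the ADR only names action-registry.ts.
type RegisteredAction = { reversible: boolean };

const MIN_CONFIDENCE = 0.6;

// Returns the sanitized hypothesis, or null when it must not be shown.
function applyGuards(
  candidate: CandidateHypothesis,
  knownDocIds: Set<string>,
  registry: Map<string, RegisteredAction>,
): CandidateHypothesis | null {
  if (candidate.confidence < MIN_CONFIDENCE) return null;
  // Any citation without a matching docId is treated as a hallucination.
  if (!candidate.ragSources.every((id) => knownDocIds.has(id))) return null;

  const actions: ProposedAction[] = [];
  for (const action of candidate.proposedActions) {
    const registered = action.scenarioCardId
      ? registry.get(action.scenarioCardId)
      : undefined;
    if (!registered) continue; // free-form or unknown action: drop it
    actions.push({
      ...action,
      // Only reversible actions may keep autoExecutable=true.
      autoExecutable: action.autoExecutable && registered.reversible,
    });
  }
  return { ...candidate, proposedActions: actions };
}
```

Returning null maps naturally onto the "manual check" copy in the dashboard, so the UI does not need to know which guard fired.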

opsKnowledge seed strategy

Phase 1 (the PR-1 starter of this ADR):

Embed all 4 sources: 33 ADRs + 5 retrospectives + 200 lines of the memory file MEMORY.md + context for the new ragEvalCases
scripts/embed-ops-knowledge.ts (new): reuses the existing embedding pattern from ADR 0021
After a one-time batch seed, embed incrementally whenever a new ADR is added (detected through a change to frontmatter.sidebar_position in the ADR file)

Phase 2 (follow-up in PR-2):

Automatic promotion of past-hypothesis entries (once Layer 3 outcome accuracy has accumulated)
Memory change tracking (a new MEMORY.md commit hash triggers an embedding refresh)

Dashboard slot wiring

The AnomalySlot component from ADR 0040 evolves as follows:

// Before (PR-1, static if-else)
if (rag.avgRecall < 0.05) {
  return <StaticHypothesisCard ... />;
}

// After (ADR 0042)
const { hypothesis, loading, error } = useLatestAnomalyHypothesis({ tenantId, fingerprint });
if (loading) return <Skeleton />;
// Treat a subscription error like a missing or low-confidence hypothesis.
if (error || !hypothesis || hypothesis.confidence < 0.6) return <ManualCheckCard />;
return (
  <HypothesisCard
    hypothesis={hypothesis}
    onActionExecute={handleActionExecute}
    onFeedback={handleFeedback}
  />
);

The useLatestAnomalyHypothesis hook is an onSnapshot subscription to the latest document per fingerprint in tenants/{tid}/anomalyHypotheses. Hypothesis generation happens in the server action generateAnomalyHypothesisAction (Layer 2 + 3 combined).

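Verification on a synthetic corpus (applying the memory `close_the_learning_loop`)

Ship criteria for this decision:

Generate 5 LLM hypotheses on demo-firm-memory (each with a different fingerprint)
Each hypothesis has confidence ≥ 0.6
citation verify (opsKnowledge docId match) at 100%
Every scenarioCardId in proposedActions exists (action-registry whitelist)

If these checks do not pass, the work is not marked as shipped (consistent with ADR 0030).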
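
The four ship criteria lend themselves to an automated gate. The following is a sketch under stated assumptions: `checkShipGate` and `GeneratedHypothesis` are hypothetical names, and "5 hypotheses" is interpreted as at least 5 distinct fingerprints.

```typescript
// Hypothetical flattened view of a generated hypothesis for gate checking.
type GeneratedHypothesis = {
  fingerprint: string;
  confidence: number;
  citedDocIds: string[];
  scenarioCardIds: string[];
};

// Returns the list of failed criteria; an empty list means the gate passes.
function checkShipGate(
  hypotheses: GeneratedHypothesis[],
  knownDocIds: Set<string>,
  registeredCardIds: Set<string>,
): string[] {
  const failures: string[] = [];
  const fingerprints = new Set(hypotheses.map((h) => h.fingerprint));
  if (fingerprints.size < 5) {
    failures.push("fewer than 5 distinct fingerprints");
  }
  if (hypotheses.some((h) => h.confidence < 0.6)) {
    failures.push("confidence below 0.6");
  }
  if (hypotheses.some((h) => h.citedDocIds.some((id) => !knownDocIds.has(id)))) {
    failures.push("citation without a matching opsKnowledge docId");
  }
  if (hypotheses.some((h) => h.scenarioCardIds.some((id) => !registeredCardIds.has(id)))) {
    failures.push("scenarioCardId missing from action-registry");
  }
  return failures;
}
```

Returning every failed criterion at once, instead of stopping at the first, makes the verification run more useful as a report.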

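Risks and mitigations

Risk	Mitigation
Gemini cost blow-up	Call only when an anomaly is detected, with a 24h fingerprint cache. Expected average is < 5 calls per firm per day
Stale opsKnowledge embeddings	Re-run embed-ops-knowledge.ts when a new ADR or memory entry is added, or run it weekly via Cloud Scheduler
Wrong recommended action auto-executed	Only reversible actions get autoExecutable=true. Every action has an audit log + 1-click rollback
past-hypothesis learning bias	Dismiss based on the accuracy metric. Operators can label false positives
ADR 0040 static slot and LLM slot coexisting	In Phase 1, keep the static fallback (the PR-5 copy) when no LLM hypothesis exists. Migrate gradually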
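
The 24h fingerprint cache used to bound Gemini cost could be decided by a predicate like the one below. It is a sketch only: `shouldCallLlm` and `CachedHypothesis` are hypothetical, and it assumes that a resolved or false_positive outcome releases the cache entry early, as the Layer 4 table suggests.

```typescript
type HypothesisOutcome = "pending" | "resolved" | "persisted" | "false_positive";
// Hypothetical cache entry: the latest hypothesis stored for a fingerprint.
type CachedHypothesis = { generatedAtMs: number; outcome?: HypothesisOutcome };

const CACHE_TTL_MS = 24 * 60 * 60 * 1000;

// True when a new LLM call is warranted for this fingerprint.
function shouldCallLlm(
  fingerprint: string,
  cache: Map<string, CachedHypothesis>,
  nowMs: number,
): boolean {
  const hit = cache.get(fingerprint);
  if (!hit) return true;
  // A closed hypothesis no longer blocks regeneration.
  if (hit.outcome === "resolved" || hit.outcome === "false_positive") return true;
  return nowMs - hit.generatedAtMs >= CACHE_TTL_MS;
}
```

Passing the current time in as a parameter keeps the function deterministic, which makes the cache window easy to test.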

Non-goal

Autonomous mutation (cross-tenant writes without user approval): confirmation is always required
Real-time streaming LLM (Gemini SSE): single-shot calls only
Merging opsKnowledge with the publicDocuments of ADR 0021 Platform RAG: intentionally kept separate (different audience, different embedding corpus)
multi-language hypotheses: Korean only
Retiring the static fallback from ADR 0040: to be reviewed in Phase 2

Implementation order (per PR)

PR	Deliverable	Estimated time (h:mm)
1	This ADR (doc-only)	0:30
2	opsKnowledge collection schema + scripts/embed-ops-knowledge.ts (seed batch)	1:00
3	Layer 1 signal collector (`packages/business-logic/src/ops-anomaly/signals.ts`) + Zod schema	1:00
4	Layer 2 LLM context builder + generateAnomalyHypothesisAction server action	1:30
5	Layer 3 verification loop (anomalyHypotheses collection writer + outcome measurement)	1:00
6	Dashboard AnomalySlot shows LLM hypotheses (static fallback kept)	0:30
7	Layer 4 meta-confidence + action-registry whitelist	1:00
8	Synthetic corpus verification + memory update	0:30

About 7 hours in total. Conservative estimate: 6 PRs · stretch estimate: 8 PRs.

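Measurable success metrics

All 5 generated LLM hypotheses have confidence ≥ 0.6 (Phase 1 verification)
citation verify at 100% (zero missing opsKnowledge docId matches)
Every recommended action is a scenarioCardId or a whitelisted toggle (zero free-form)
past-hypothesis entries are auto-promoted after 5 accumulated outcomes and then cited in the LLM context (Phase 2)
Operator click depth: 1 click for the anomaly drill-down + 1 click to execute the action (2 clicks total)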

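Follow-up ADRs

ADR 0043 (tentative): time-series analysis of anomaly hypotheses (time-of-day patterns · firm cohorts)
ADR 0044 (tentative): cross-pollination infrastructure ("qrs best-practice patterns → other firms", sharing exemplars and prompts)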

Change history

Version	Date	Change
1.0	2026-05-08	Initial draft: 4 layers + opsKnowledge self-RAG + closed-loop verification + meta-confidence
