# valoswiss-anomaly-detection

**Macro-category**: 🔍 COMPLIANCE/AUTH (dual relevance: fraud detection + portfolio outliers)
**Scope**: Multi-domain anomaly detection with a 5+ detector ensemble and a false-positive feedback-learning loop
**Reference repos**:
- `yzhao062/PyOD` — 60+ anomaly detectors (KNN, LOF, IsolationForest, OCSVM, AutoEncoder, COPOD, ECOD, ABOD, AnoGAN, DeepSVDD), agentic interface
- `arundo/adtk` — time-series rule-based detectors (outlier, level shift, volatility shift, seasonal anomaly, persist anomaly)
- `linkedin/luminol` — TS anomaly detection via Bitmap algorithm
- `facebook/prophet` — anomaly detection via residual decomposition
**Owner downstream**: COMPLIANCE_OFFICER (fraud AML) · ADVISOR (portfolio outlier) · OPS/SRE (system anomaly)
**Last aligned**: 2026-05-03 V20

---

## §0 · Pre-flight check

Before any intervention, verify in this order:

1. **Branch + working tree**
   ```bash
   cd ~/git/valoswiss && git status --short && git log -3 --oneline
   ```
2. **Sidecar Python health (PyOD + ADTK)**
   ```bash
   curl -s http://127.0.0.1:8895/healthz | jq .
   ```
   Expected: `{"status":"ok","detectors":{"pyod":true,"adtk":true,"luminol":true,"prophet":true},"useReal":true|false,"ensembleSize":5}`. If it returns 502 → check PM2: `pm2 list | grep anomaly-py`.
3. **NestJS proxy health**
   ```bash
   curl -s http://127.0.0.1:4010/api/anomaly/health -H "Cookie: valo_token=<dev-token>"
   ```
   Expected: `{ sidecar:{status:'ok'}, circuitBreaker:{state:'closed', failures:0}, queues:{ transaction:N, portfolio:N, market:N } }`.
4. **Prisma schema sync (AnomalyAlert / AnomalyDetector / AnomalyEnsemble)**
   ```bash
   cd apps/api && npx prisma migrate status
   ```
   Verify the 3 models plus the enums `AnomalyDomain` / `AnomalySeverity` / `AnomalyStatus` / `DetectorKind`.
5. **Tenant configs**: `tenants/ws.json` and `tenants/az.json` must have `"anomalyDetection": true`.
6. **Persona pack**: `apps/api/src/common/persona-packs/persona-packs.constants.ts` must include `'anomalyDetection'` in `defaultModules` for `COMPLIANCE_OFFICER` + `ADVISOR` + `RELATIONSHIP_MANAGER` + `SUPERVISOR` + `ADMIN`.
7. **Module registry**: `apps/web/src/lib/module-registry.ts` must expose an `anomalyDetection` entry with `sidebarSection: 'COMPLIANCE'`, `requiredRole: 'ADVISOR'`, `personaHint: 'compliance'`, icon `🔍`.
8. **R-Audit gate (MAJOR, weight 8)**: before any commit touching CRITICAL files (see §3), run `npx tsx scripts/r-audit.ts <file> --validate-business-logic --rule ANOMALY-DETECTION-NOT-WIRED`.

If any one of the 8 checks fails, **stop and record the deviation** before proceeding — the 3-Point Registration is a non-negotiable invariant (see `feedback_new_module_registration.md` + `_CROSS-AGENT-TOOLS.md §8`).
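The two health payloads from checks 2–3 can be gated mechanically before proceeding. A minimal sketch, assuming the response shapes shown above (`preflight_deviations` and its field accesses are illustrative, not part of the module):

```python
def preflight_deviations(sidecar: dict, proxy: dict) -> list:
    """Return the deviations to record; an empty list means checks 2-3 pass."""
    deviations = []
    if sidecar.get("status") != "ok":
        deviations.append("sidecar status=%s" % sidecar.get("status"))
    # every detector backend (pyod, adtk, luminol, prophet) must be loaded
    for name, loaded in sidecar.get("detectors", {}).items():
        if not loaded:
            deviations.append("detector backend not loaded: %s" % name)
    cb = proxy.get("circuitBreaker", {})
    if cb.get("state") != "closed":
        deviations.append("circuit breaker %s (%s failures)" % (cb.get("state"), cb.get("failures")))
    return deviations
```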

---

## §1 · Areas of competence

### 1.1 ValoSwiss detection domains

| Domain | Source | Cron freq | Detector ensemble | Severity escalation |
|---|---|---|---|---|
| **TRANSACTION_FRAUD** | banking-integration `Transaction` stream + AML | 5min | IsolationForest + LOF + COPOD + custom rule (round amount, high freq, unusual time) + ECOD | HIGH→Telegram immediate, MEDIUM→queue review |
| **PORTFOLIO_DRIFT** | `Portfolio` holdings vs IPS target allocation | 15min | KNN + ABOD + custom drift (`abs(current% - target%) > tolerance`) + ADTK level shift | MEDIUM→advisor inbox, HIGH→supervisor escalation |
| **MARKET_PRICE_ANOMALY** | Yahoo `/quotes` 1min + Polymarket | 1min | ADTK volatility shift + Prophet residual + Luminol Bitmap + IsolationForest + OCSVM | HIGH→market-scanner amplify + briefing inject |
| **SYSTEM_METRIC** | PM2 metrics, latency p95, error rate, queue depth | 1min | AutoEncoder + DeepSVDD + ECOD + ADTK persist anomaly + custom threshold | HIGH→Telegram ops + observability trace correlate |
| **CLIENT_BEHAVIOR** | login pattern, IP geo, device fingerprint, withdrawal frequency | event-driven | OCSVM + KNN + custom rule (impossible travel, new device + high withdrawal) | HIGH→auth lockout 24h + compliance review |
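The custom rules in the TRANSACTION_FRAUD row (round amount, high frequency, unusual time) reduce to pure predicates. A sketch under that reading — function names and thresholds are illustrative, not the module's actual rule set:

```python
def rule_round_amount(amount: float) -> bool:
    """Flag 'round' amounts (5000.00) or just-under-threshold ones (9999.99)."""
    cents = round(amount * 100)
    return cents % 100_000 == 0 or cents % 100_000 == 99_999

def rule_unusual_hour(hour_of_day: int) -> bool:
    """Execution in the 00:00-05:00 night window."""
    return 0 <= hour_of_day <= 5

def rule_high_freq(txn_count_24h: int, limit: int = 5) -> bool:
    """More transactions in 24h than the baseline allows."""
    return txn_count_24h > limit

def custom_rule_score(amount: float, hour: int, freq_24h: int) -> float:
    """Fraction of triggered rules, used as the CUSTOM_RULE detector's 0-1 score."""
    hits = [rule_round_amount(amount), rule_unusual_hour(hour), rule_high_freq(freq_24h)]
    return sum(hits) / len(hits)
```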

### 1.2 Detection pipeline (5 stages)

1. **Data ingestion** — fetch the source (transaction stream, portfolio snapshot, market quote, system metric) + windowing (sliding window, N=100 default).
2. **Feature engineering** — domain-specific feature extraction (transaction: log amount, hour of day, recipient frequency, currency / portfolio: weight %, asset class, sector concentration / market: log-return, vol, gap %, volume ratio / system: latency p95, error rate, queue depth).
3. **Ensemble detection** — 5+ detectors in parallel, each emitting a normalized anomaly score in [0,1].
4. **Majority voting + weighted ensemble** — if ≥3 detectors flag (vote threshold 0.6 by default) → AnomalyAlert with an aggregated `ensembleScore`.
5. **Feedback loop calibration** — user false-positive feedback → ensemble re-training + per-detector threshold adjustment.
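Stages 3–4 reduce to a weighted sum plus a vote count. A minimal sketch (function name and input shape are illustrative), using the detector weights from the §1.3 example:

```python
def aggregate(detector_scores: dict, vote_threshold: float = 0.6, min_votes: int = 3) -> dict:
    """detector_scores maps name -> (normalized score in [0,1], weight); weights sum to 1."""
    ensemble_score = sum(s * w for s, w in detector_scores.values())
    votes = sum(1 for s, _w in detector_scores.values() if s > vote_threshold)
    return {
        "ensembleScore": round(ensemble_score, 4),
        "voteCount": votes,
        "ensembleSize": len(detector_scores),
        "isAnomaly": votes >= min_votes,   # majority voting gate
    }
```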

### 1.3 Structured Pydantic output (sidecar :8895)

```jsonc
{
  "alertId": "uuid",
  "domain": "TRANSACTION_FRAUD",         // TRANSACTION_FRAUD | PORTFOLIO_DRIFT | MARKET_PRICE_ANOMALY | SYSTEM_METRIC | CLIENT_BEHAVIOR
  "entityId": "txn-uuid-123",            // transaction.id, portfolio.id, ticker, metric_name, user.id
  "tenantSlug": "ws",
  "detectedAt": "2026-05-03T14:23:11Z",
  "severity": "HIGH",                    // CRITICAL | HIGH | MEDIUM | LOW | INFO
  "ensembleScore": 0.84,                 // weighted aggregate [0,1]
  "voteCount": 5,                        // detectors that flagged (out of the full ensemble)
  "ensembleSize": 5,
  "detectorScores": [
    { "kind": "ISOLATION_FOREST", "score": 0.91, "voted": true, "weight": 0.25 },
    { "kind": "LOF",              "score": 0.78, "voted": true, "weight": 0.20 },
    { "kind": "COPOD",            "score": 0.82, "voted": true, "weight": 0.20 },
    { "kind": "ECOD",             "score": 0.71, "voted": true, "weight": 0.15 },
    { "kind": "CUSTOM_RULE",      "score": 0.95, "voted": true, "weight": 0.20 }
  ],
  "features": {
    "amount": 9999.99,
    "currency": "CHF",
    "recipientCountry": "RU",
    "hourOfDay": 3,
    "amountIsRound": true,
    "frequencyLast24h": 7
  },
  "rationale": "Round amount 9999.99 + 3am execution + RU recipient + 7 txns in the last 24h",
  "suggestedAction": "FREEZE_TRANSACTION_REVIEW",
  "feedbackStatus": null                 // populated via /anomaly/feedback (TRUE_POSITIVE | FALSE_POSITIVE | INVESTIGATING)
}
```
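The `domain` and `severity` fields mirror the Prisma enums from §0 check 4. A minimal stdlib sketch of the validation the sidecar's Pydantic models would perform (`validate_alert` is illustrative, not the actual model):

```python
from enum import Enum

class AnomalyDomain(str, Enum):
    TRANSACTION_FRAUD = "TRANSACTION_FRAUD"
    PORTFOLIO_DRIFT = "PORTFOLIO_DRIFT"
    MARKET_PRICE_ANOMALY = "MARKET_PRICE_ANOMALY"
    SYSTEM_METRIC = "SYSTEM_METRIC"
    CLIENT_BEHAVIOR = "CLIENT_BEHAVIOR"

class AnomalySeverity(str, Enum):
    CRITICAL = "CRITICAL"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"
    INFO = "INFO"

def validate_alert(alert: dict) -> dict:
    """Raise ValueError on unknown enum values or out-of-range ensemble fields."""
    AnomalyDomain(alert["domain"])        # ValueError if not a known domain
    AnomalySeverity(alert["severity"])    # ValueError if not a known severity
    if not 0.0 <= alert["ensembleScore"] <= 1.0:
        raise ValueError("ensembleScore out of [0,1]")
    if alert["voteCount"] > alert["ensembleSize"]:
        raise ValueError("voteCount exceeds ensembleSize")
    return alert
```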

### 1.4 Detector ensemble presets (`runner.py:ENSEMBLE_PRESETS`)

| Domain | Default ensemble | Custom rules | Weighted aggregator |
|---|---|---|---|
| `TRANSACTION_FRAUD` | IsolationForest + LOF + COPOD + ECOD + CUSTOM_RULE | round_amount, high_freq_24h, unusual_hour, geo_mismatch | weighted average + min_votes=3 |
| `PORTFOLIO_DRIFT` | KNN + ABOD + ADTK_LEVEL_SHIFT + CUSTOM_DRIFT_IPS + ECOD | drift_ips_5pct, sector_concentration_30pct, currency_concentration_50pct | weighted + min_votes=2 |
| `MARKET_PRICE_ANOMALY` | ADTK_VOLATILITY + PROPHET_RESIDUAL + LUMINOL_BITMAP + IsolationForest + OCSVM | gap_open_3pct, volume_spike_5x, after_hours_move | majority + min_votes=3 |
| `SYSTEM_METRIC` | AutoEncoder + DeepSVDD + ECOD + ADTK_PERSIST + CUSTOM_THRESHOLD | latency_p95_2x_baseline, error_rate_5pct, queue_depth_100 | weighted + min_votes=2 |
| `CLIENT_BEHAVIOR` | OCSVM + KNN + ECOD + CUSTOM_RULE_IMPOSSIBLE_TRAVEL + CUSTOM_RULE_NEW_DEVICE | impossible_travel_500km_h, new_device_high_amount, withdrawal_velocity | weighted + min_votes=2 |
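The `drift_ips_5pct` rule from the PORTFOLIO_DRIFT row is the `abs(current% - target%) > tolerance` check from §1.1, applied per asset class. A sketch with an illustrative signature (weights as fractions, tolerance 5%):

```python
def drift_breaches(current: dict, target: dict, tolerance: float = 0.05) -> dict:
    """Return {asset_class: signed drift} for classes beyond the IPS tolerance."""
    return {
        k: round(current[k] - target.get(k, 0.0), 4)
        for k in current
        if abs(current[k] - target.get(k, 0.0)) > tolerance
    }
```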

### 1.5 Persona visibility

- **COMPLIANCE_OFFICER** (ws+az): full access to TRANSACTION_FRAUD + CLIENT_BEHAVIOR alerts + feedback authority
- **ADVISOR** (ws+az): scoped to own portfolios (PORTFOLIO_DRIFT alerts filtered by `portfolioId in advisorPortfolios`)
- **RELATIONSHIP_MANAGER**: same as ADVISOR
- **OPS/SRE/SUPERVISOR**: SYSTEM_METRIC alerts + cross-domain dashboard
- **ADMIN**: cross-tenant + tier override + full feedback authority
- **CLIENT/PROSPECT**: absolute DENY (fraud detection is internal-only, MiFID + GDPR)
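The visibility rules above can be sketched as a role→domain map plus a portfolio-scope filter. The map and function names are illustrative, not the actual NestJS guard:

```python
# hypothetical role -> visible-domain map, derived from the list above
ROLE_DOMAINS = {
    "COMPLIANCE_OFFICER": {"TRANSACTION_FRAUD", "CLIENT_BEHAVIOR"},
    "ADVISOR": {"PORTFOLIO_DRIFT"},
    "RELATIONSHIP_MANAGER": {"PORTFOLIO_DRIFT"},
    "SUPERVISOR": {"SYSTEM_METRIC"},
}

def visible_alerts(role: str, alerts: list, advisor_portfolios=frozenset()) -> list:
    if role in ("CLIENT", "PROSPECT"):
        return []  # internal-only: fraud detection never reaches clients
    if role == "ADMIN":
        return list(alerts)  # cross-tenant, full visibility
    visible = []
    for alert in alerts:
        if alert["domain"] not in ROLE_DOMAINS.get(role, set()):
            continue
        # advisors and RMs only see alerts on portfolios they own
        if role in ("ADVISOR", "RELATIONSHIP_MANAGER") \
                and alert.get("portfolioId") not in advisor_portfolios:
            continue
        visible.append(alert)
    return visible
```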

---

## §2 · Code patterns

### 2.1 PyOD 5-detector ensemble (TRANSACTION_FRAUD)

```python
# services/anomaly-py/detectors/pyod_ensemble.py
import numpy as np
from pyod.models.iforest import IForest
from pyod.models.lof import LOF
from pyod.models.copod import COPOD
from pyod.models.ecod import ECOD
from pyod.models.knn import KNN

class PyODEnsemble:
    def __init__(self, contamination: float = 0.05):
        self.detectors = {
            'ISOLATION_FOREST': IForest(contamination=contamination, random_state=42, n_estimators=100),
            'LOF':              LOF(contamination=contamination, n_neighbors=20),
            'COPOD':             COPOD(contamination=contamination),       # parameter-free, copula-based
            'ECOD':              ECOD(contamination=contamination),        # parameter-free, empirical CDF
            'KNN':               KNN(contamination=contamination, n_neighbors=10, method='largest'),
        }
        self.weights = {'ISOLATION_FOREST': 0.25, 'LOF': 0.20, 'COPOD': 0.20, 'ECOD': 0.15, 'KNN': 0.20}

    def fit(self, X: np.ndarray):
        for name, detector in self.detectors.items():
            detector.fit(X)

    def predict_with_scores(self, X: np.ndarray) -> dict:
        """Per detector, return: anomaly_score in [0,1] + voted (score > 0.6)."""
        results = {}
        weighted_sum = 0.0
        vote_count = 0
        for name, detector in self.detectors.items():
            # decision_function returns raw scores; normalize them
            raw_scores = detector.decision_function(X)
            # min-max normalize against the training distribution
            normalized = self._normalize(raw_scores, detector)
            voted = bool(normalized[-1] > 0.6)             # last point = the point being scored
            results[name] = {
                'score': float(normalized[-1]),
                'voted': voted,
                'weight': self.weights[name],
            }
            weighted_sum += normalized[-1] * self.weights[name]
            if voted:
                vote_count += 1
        return {
            'detectorScores': results,
            'ensembleScore': float(weighted_sum),
            'voteCount': vote_count,
            'ensembleSize': len(self.detectors),
            'isAnomaly': vote_count >= 3,                  # majority voting
        }

    def _normalize(self, scores: np.ndarray, detector) -> np.ndarray:
        """Min-max normalize against the training-score distribution."""
        train_scores = detector.decision_scores_
        s_min, s_max = train_scores.min(), train_scores.max()
        if s_max - s_min < 1e-9:
            return np.zeros_like(scores)
        return np.clip((scores - s_min) / (s_max - s_min), 0, 1)
```

### 2.2 ADTK level-shift time-series detector (PORTFOLIO_DRIFT)

```python
# services/anomaly-py/detectors/adtk_runner.py
import pandas as pd
from adtk.detector import LevelShiftAD, VolatilityShiftAD, SeasonalAD, PersistAD
from adtk.data import validate_series

def detect_level_shift(series: pd.Series, c: float = 6.0, side: str = 'both') -> pd.Series:
    """
    LevelShiftAD: detects permanent shifts (e.g. permanent portfolio drift
    from a 30% equity target to 45% equity).
    c=6.0 = 6× IQR threshold (higher = less sensitive)
    """
    series = validate_series(series)
    detector = LevelShiftAD(c=c, side=side, window=20)
    anomalies = detector.fit_detect(series)
    return anomalies  # boolean Series, True = anomaly point

def detect_volatility_shift(series: pd.Series, c: float = 6.0) -> pd.Series:
    """Volatility regime change (low-vol → high-vol)"""
    series = validate_series(series)
    detector = VolatilityShiftAD(c=c, side='positive', window=20)
    return detector.fit_detect(series)

def detect_seasonal(series: pd.Series, freq: int = 5, c: float = 3.0) -> pd.Series:
    """Seasonal anomaly (e.g. an atypical week vs the historical profile)"""
    series = validate_series(series)
    detector = SeasonalAD(c=c, freq=freq)
    return detector.fit_detect(series)

def detect_persist(series: pd.Series, window: int = 5, c: float = 3

…[truncated — open the MD file for the full text]