# Import System

Anwal Growth Platform supports two Excel import paths aligned with market database workflows: **simple** (single sheet, user-mapped columns) and **processed** (multi-sheet GPT/ChatGPT output).

Implementation: `src/server/services/*-import.service.ts`, APIs under `/api/imports/*`, UI at `/imports`.

---

## Import lifecycle

```mermaid
flowchart LR
  Upload[Upload file] --> Detect[Detect type + mapping]
  Detect --> Preview[Preview counts]
  Preview --> Execute[POST execute 202]
  Execute --> Poll[GET status poll]
  Poll --> Log[ImportLog progress + summary]
  Log --> CRM[Companies / People / Contacts]
```

1. **Upload** — `.xlsx`, `.xls`, or `.csv` (max 15MB)
2. **Detect** — simple vs processed; auto column mapping for simple
3. **Preview** — row counts, issues, no DB writes
4. **Execute** — returns `importLogId` immediately (`202`); import runs in background
5. **Poll** — `GET /api/imports/status/:importLogId` for stage, percent, summary counters
6. **Summary** — UI shows created/updated/skipped counts (not full record dumps)

### Deduplication (simple + processed)

**Companies:** website/domain → LinkedIn → exact name → fuzzy name (≥ 0.9)

**People:** email → mobile → company + full name → fuzzy name (≥ 0.9)

**Contacts:** `contactType` + `normalizedValue`

On match: enrich missing fields only; skip duplicate if nothing to add.

### ImportLog progress stages

`uploaded` → `parsing` → `importing_companies` → `importing_people` → `importing_contacts` → `importing_service_matches` → `finalizing` → `completed` / `failed`

Stored fields: `progressStage`, `progressPercent`, `progressMessage`, `processedRows`, `totalRows`, plus granular `*Created`, `*Updated`, `*SkippedDuplicate` counters and `durationMs`.

---

## Simple import

### Overview

- One sheet with columns such as company name, person name, email, mobile, phone, city, sector.
- **Smart auto-mapping** infers column meaning (English, Arabic, mixed headers) — no manual JSON required.
- User may **review or override** mapping in the UI before preview/execute.
- Creates/updates **Company**, **Person**, **ContactPoint** records.

### Smart mapping (`import-mapping.service.ts`)

On detect, preview, or execute (when mapping JSON is empty or missing required fields):

1. Normalize headers (trim, Arabic alef forms, punctuation).
2. Layered detection: exact aliases → contains → token overlap → Arabic hints → value patterns (email, Saudi mobile, URL, names).
3. Confidence threshold **≥ 0.6** for automatic assignment.
4. Conflict resolution: highest-confidence field/header pairs win; unmapped columns listed in `unmappedHeaders`.

**Required mapping:** at least one of `companyName`, `personName`, `email`, `mobile` must be mapped before simple import proceeds.

### Mapping fields (`SimpleImportColumnMapping`)

| System field | Description |
|--------------|-------------|
| `companyName` | Arabic/primary company name → `nameAr` |
| `personName` | Full name |
| `email`, `mobile`, `phone`, `website` | Contact points |
| `city`, `region`, `sector`, `activity`, `jobTitle`, `notes` | Metadata |

### Example mappings (auto-detected)

English:

```json
{
  "companyName": "company_name",
  "personName": "person_name",
  "mobile": "mobile"
}
```

Arabic:

```json
{
  "companyName": "اسم الجهة",
  "personName": "اسم المسؤول",
  "mobile": "رقم الجوال"
}
```

Mixed:

```json
{
  "companyName": "Organization",
  "personName": "Contact Person",
  "mobile": "WhatsApp"
}
```

---

## Processed import

### Overview

Multi-sheet workbook prepared outside the app (GPT/ChatGPT). Sheets map to MongoDB models. External IDs (`entity_id`, `person_id`) are resolved to MongoDB ObjectIds during import via in-memory maps.

### Required sheets

| Sheet | Model |
|-------|--------|
| `Entities` | `Company` |
| `People` | `Person` |
| `Contact_Points` | `ContactPoint` |

### Optional sheets

| Sheet | Model / behavior |
|-------|------------------|
| `Service_Matching` | `ServiceMatch` |
| `Data_Quality` | Ignored (handled in GPT preprocessing) |
| `Duplicate_Review` | Ignored (handled in GPT preprocessing) |
| `Campaign_Ready` | Preview only (leads — future task) |
| `Import_Summary` | Reference only (stats not overwritten automatically) |

### Expected columns (processed)

**Entities:** `entity_id`, `source_file`, `entity_name_ar`, `entity_name_en`, `entity_type`, `sector`, `sub_sector`, `activity`, `city`, `region`, `country`, `website`, `linkedin_url`, `address`, `description`, `data_confidence`, `notes` (legacy aliases `name_ar`, `name_en` still accepted)

**People:** `person_id`, `entity_id`, `company_name_ar`, `full_name`, `job_title`, `department`, `seniority_level`, `is_decision_maker`, `email`, `mobile`, `phone`, `linkedin_url`, `data_confidence`, `notes`

**Contact_Points:** `entity_id`, `person_id`, `entity_type`, `contact_type`, `contact_value`, `is_primary`, `is_verified`, `data_confidence`, `notes`

**Service_Matching:** `entity_id`, `company_name_ar`, `sector`, `activity`, `recommended_service`, `fit_score`, `fit_reason`, `priority`

**Data_Quality:** `issue_type`, `issue_description`, `severity`, `recommended_fix`, `record_reference`

**Duplicate_Review:** `group_id`, `entity_type`, `entity_name_1`, `entity_name_2`, `similarity_reason`, `confidence`, `recommended_action`

Headers are matched flexibly (case/spacing/underscore insensitive).

---

## Deduplication strategy

| Entity | Match keys | Action |
|--------|------------|--------|
| Company | `nameAr`, normalized `website` | Update existing when obvious match |
| Person | `email`, `mobile`, `fullName` + `companyId` | Update/skip create |
| Contact point | `contactType` + `normalizedValue` | Skip duplicate create |
| Ambiguous company | Conflicting `entity_id` maps | Dedup to existing company; logged in import `issues` (no `DuplicateReview` record) |

Uncertain merges do **not** auto-merge — conflicts are noted in the import result only.

---

## Error handling

- Row-level try/catch: one bad row does not abort the batch.
- `ImportLog.status`: `completed`, `partially_completed`, or `failed`.
- Issues collected in API response; high-severity sheet gaps fail validation before execute.
- API returns safe messages only (no stack traces).

---

## Data quality

- Cleaning and duplicate resolution are expected **before** import (GPT preprocessing).
- Simple import: missing identity → import `issues` array only (no `DataQualityIssue` record).
- `Data_Quality` and `Duplicate_Review` sheets are **not** imported into CRM collections.
- Normalization: `src/lib/normalize.ts` (email, phone, URL, confidence, priority).

---

## Detection rules

- Sheets include `Entities`, `People`, `Contact_Points` → **processed** (high confidence).
- Otherwise → **simple**.
- UI can force type via `importType` form field.

---

## API summary

| Method | Path | Auth |
|--------|------|------|
| POST | `/api/imports/detect` | Yes |
| POST | `/api/imports/preview` | Yes |
| POST | `/api/imports/execute` | Yes |
| GET | `/api/imports/history` | Yes |

See [03-api-specification.md](./03-api-specification.md).

---

## Future support

- Visual column mapper (replace JSON textarea)
- `Campaign_Ready` → `Lead` creation
- Async import jobs (Redis/queue) for large files
- Downloadable error CSV per `ImportLog`
- Optional re-import of legacy `Data_Quality` / `Duplicate_Review` sheets (if needed)

---

## Testing with processed file

Use **`Anwal_CRM_Processed_Import.xlsx`** (or any workbook with the sheets above):

1. Log in → `/imports`
2. Upload file → **كشف النوع** (should detect `processed`)
3. **معاينة** — verify sheet row counts
4. **تنفيذ الاستيراد** — verify `ImportLog` in history and data in MongoDB

Ensure `MONGODB_URI` and auth env vars are set in `.env.local`.
