Development Guide
1. Backend Architecture
DblpService combines query APIs and build pipeline control in one FastAPI service.
Core layers:
- API layer (`app.py`)
  - Bootstrap page and REST endpoints.
  - Coauthor pair query APIs for CoAuthors.
- Pipeline orchestration layer (`PipelineManager` in `app.py`)
  - Start/stop/reset state machine.
  - Threaded execution and in-memory status snapshots.
- Build pipeline layer (`dblp_builder/pipeline.py`)
  - Download DTD/XML.GZ.
  - Decompress XML.
  - Parse XML and rebuild SQLite + FTS indexes.
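As a rough illustration of the orchestration layer, the sketch below shows one way a start/stop/reset state machine with a worker thread and in-memory snapshots could be structured. The class name `PipelineManagerSketch`, the status strings, and the signature of the `work` callable are assumptions made for this example; they are not taken from the actual `PipelineManager` in `app.py`.

```python
import threading
from typing import Callable, Dict, Optional


class PipelineManagerSketch:
    """Hypothetical start/stop/reset state machine; not the real PipelineManager."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self._state: Dict[str, object] = {"status": "idle", "step": "", "progress": 0.0}

    def snapshot(self) -> Dict[str, object]:
        # Return a copy so pollers never observe a half-updated dict.
        with self._lock:
            return dict(self._state)

    def _update(self, **fields: object) -> None:
        with self._lock:
            self._state.update(fields)

    def start(self, work: Callable[[threading.Event, Callable[..., None]], None]) -> bool:
        with self._lock:
            if self._state["status"] == "running":
                return False  # refuse a second concurrent run
            self._state.update(status="running", step="", progress=0.0)
            self._stop_event.clear()
            self._thread = threading.Thread(target=self._run, args=(work,), daemon=True)
            self._thread.start()
        return True

    def _run(self, work: Callable[[threading.Event, Callable[..., None]], None]) -> None:
        try:
            work(self._stop_event, self._update)
            self._update(status="stopped" if self._stop_event.is_set() else "done")
        except Exception as exc:  # surface failures to the poller instead of dying silently
            self._update(status="error", step=str(exc))

    def stop(self) -> None:
        # Cooperative stop: the worker is expected to poll the event between steps.
        self._stop_event.set()

    def reset(self) -> bool:
        with self._lock:
            if self._state["status"] == "running":
                return False  # a running pipeline must be stopped first
            self._state = {"status": "idle", "step": "", "progress": 0.0}
        return True

    def join(self, timeout: Optional[float] = None) -> None:
        thread = self._thread
        if thread is not None:
            thread.join(timeout)
```

In this sketch a route handler would only call `start`, `stop`, `reset`, or `snapshot`, which keeps the handler thin and leaves all state transitions behind a single lock.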
2. API Surface and Responsibilities
Query APIs
- `GET /api/health`: schema readiness check.
- `GET /api/stats`: publication/author counters and data date.
- `GET /api/pc-members`: optional reviewer list.
- `POST /api/coauthors/pairs`: coauthor matrix + pair publication details.
Build/control APIs
- `GET /api/config`: default build parameters.
- `GET /api/state`: pipeline runtime snapshot.
- `GET /api/files`: managed data files status.
- `POST /api/start`, `/api/stop`, `/api/reset`: pipeline lifecycle control.
3. Coauthor Query Path (`/api/coauthors/pairs`)
Execution flow:
- Normalize/deduplicate left/right author entries.
- Resolve candidate author IDs via exact match -> FTS -> LIKE fallback.
- Join `pub_authors` twice to compute intersections.
- Read publication metadata from `publications`.
- Return matrix and per-pair publication lists.
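The intersection step can be pictured as a self-join on `pub_authors`. The following is a minimal sketch using the standard `sqlite3` module and an in-memory table; the function name `shared_publications` and the integer column types are assumptions for illustration, and the service's actual SQL may differ.

```python
import sqlite3
from typing import List


def shared_publications(conn: sqlite3.Connection, left_id: int, right_id: int) -> List[int]:
    """Return the pub_ids that list both authors, via a self-join on pub_authors."""
    rows = conn.execute(
        """
        SELECT DISTINCT a.pub_id
        FROM pub_authors AS a
        JOIN pub_authors AS b ON a.pub_id = b.pub_id
        WHERE a.author_id = ? AND b.author_id = ?
        ORDER BY a.pub_id
        """,
        (left_id, right_id),
    ).fetchall()
    return [row[0] for row in rows]


# Tiny in-memory fixture: author 10 and author 20 share publications 1 and 3.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pub_authors (pub_id INTEGER, author_id INTEGER);
    INSERT INTO pub_authors VALUES (1, 10), (1, 20), (2, 10), (3, 10), (3, 20), (3, 30);
    """
)
print(shared_publications(conn, 10, 20))  # [1, 3]
```

Running this query once per left/right pair yields the cells of the matrix; the returned IDs can then be looked up in `publications` for the per-pair detail lists.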
Safety controls:
- Maximum authors per side (`MAX_ENTRIES_PER_SIDE`).
- Author resolve cap (`MAX_AUTHOR_RESOLVE`).
- Optional pair result limit and year filter.
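To make the normalization step and the per-side cap concrete, here is a small sketch. The limit value of 50, the case-insensitive deduplication rule, and the choice to reject oversized input with an error (rather than truncate it) are all assumptions; the source names `MAX_ENTRIES_PER_SIDE` but does not state its value or how a violation is handled.

```python
from typing import Iterable, List

MAX_ENTRIES_PER_SIDE = 50  # assumed value; the real constant is defined by the service


def normalize_side(entries: Iterable[str], limit: int = MAX_ENTRIES_PER_SIDE) -> List[str]:
    """Trim, collapse whitespace, drop blanks and duplicates, then enforce the cap."""
    seen = set()
    result: List[str] = []
    for raw in entries:
        name = " ".join(raw.split())  # strip ends and collapse inner whitespace
        key = name.casefold()         # assumed: duplicates are compared case-insensitively
        if not name or key in seen:
            continue
        seen.add(key)
        result.append(name)
    if len(result) > limit:
        raise ValueError(f"too many authors on one side: {len(result)} > {limit}")
    return result


print(normalize_side(["  Ada  Lovelace", "ada lovelace", "", "Alan Turing"]))
# ['Ada Lovelace', 'Alan Turing']
```

Applying the cap after deduplication means repeated names do not count against the limit, which is one reasonable reading of "maximum authors per side".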
4. Build Pipeline Internals (`dblp_builder/pipeline.py`)
Pipeline phases:
- URL validation and trusted-host download.
- XML decompression.
- SQLite rebuild (optional cleanup of existing db/wal/shm).
- XML iterparse with secure DTD resolver.
- Batch insert into:
  - `publications`
  - `authors`
  - `pub_authors`
  - `title_fts`
  - `author_fts`
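The parse-and-insert phases can be sketched with the standard library as a streaming parse that flushes rows in batches. This is a simplified illustration, not the builder's code: it uses `xml.etree.ElementTree.iterparse` without any DTD resolver, so it only works on entity-free input such as the sample below, and the record tag names, reduced column set, and batch size are assumptions.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

RECORD_TAGS = {"article", "inproceedings"}  # assumed subset of record element names
BATCH_SIZE = 2  # deliberately tiny so the sample exercises a flush

SAMPLE = b"""<dblp>
<article key="a/1"><author>A</author><title>First</title><year>2001</year></article>
<inproceedings key="c/2"><author>B</author><title>Second</title><year>2002</year></inproceedings>
<article key="a/3"><title>Third</title></article>
</dblp>"""

INSERT_SQL = "INSERT INTO publications (title, year) VALUES (?, ?)"


def load_publications(xml_stream, conn: sqlite3.Connection, batch_size: int = BATCH_SIZE) -> int:
    """Stream records out of the XML and insert them in batches; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS publications "
        "(id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
    )
    batch, total = [], 0
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag not in RECORD_TAGS:
            continue  # child elements such as <title> end before their parent record
        year_text = elem.findtext("year")
        batch.append((elem.findtext("title"), int(year_text) if year_text else None))
        elem.clear()  # drop the record's children once their values are in the batch
        if len(batch) >= batch_size:
            conn.executemany(INSERT_SQL, batch)
            total += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany(INSERT_SQL, batch)
        total += len(batch)
    conn.commit()
    return total
```

Calling `load_publications(io.BytesIO(SAMPLE), sqlite3.connect(":memory:"))` inserts the three sample records. A real run over the full dump would also need entity resolution against the DTD, which this sketch omits.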
PipelineManager updates status, step, progress, and log buffers for frontend polling.
5. Data Model
Main DB tables:
- `publications(id, title, year, venue, pub_type, raw_xml)`
- `authors(id, name)`
- `pub_authors(pub_id, author_id)`
- `title_fts`, `author_fts` (FTS5 virtual tables)
SQLite tuning includes WAL journaling, a `busy_timeout`, and in-memory temp storage.
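A hedged sketch of how the tables and tuning settings above might be set up with `sqlite3` follows. The column types, the FTS5 column names, and the 5000 ms timeout are assumptions; only the table names, column names, and the three tuning areas come from this guide.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS publications (
    id INTEGER PRIMARY KEY, title TEXT, year INTEGER,
    venue TEXT, pub_type TEXT, raw_xml TEXT
);
CREATE TABLE IF NOT EXISTS authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE IF NOT EXISTS pub_authors (pub_id INTEGER, author_id INTEGER);
"""

# Assumed FTS5 layouts: one indexed text column per virtual table.
FTS_SCHEMA = """
CREATE VIRTUAL TABLE IF NOT EXISTS title_fts USING fts5(title);
CREATE VIRTUAL TABLE IF NOT EXISTS author_fts USING fts5(name);
"""


def open_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database with the tuning pragmas and schema applied."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # write-ahead logging (file-backed DBs only)
    conn.execute("PRAGMA busy_timeout=5000")   # assumed value, in milliseconds
    conn.execute("PRAGMA temp_store=MEMORY")   # keep temporary tables and indexes in RAM
    conn.executescript(SCHEMA)
    try:
        conn.executescript(FTS_SCHEMA)
    except sqlite3.OperationalError:
        # This SQLite build lacks FTS5; a real builder would probably fail loudly here.
        pass
    return conn
```

Note that `journal_mode=WAL` only takes effect for a database stored in a file; an in-memory connection keeps reporting `memory` as its journal mode.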
6. Extensibility Notes
- Keep heavy build logic inside `dblp_builder/pipeline.py`; keep route handlers thin.
- Add new pipeline phases through progress callbacks so UI can observe them.
- If schema changes, update both `_ensure_fullmeta_schema()` checks and builder initialization.
- For new frontend controls, expose defaults in `/api/config` first, then bind in UI.
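The note about progress callbacks could look roughly like the sketch below, in which each phase is announced through a callback before it runs. The `run_phases` helper, the `(step, fraction)` callback signature, and the phase names are hypothetical; the builder's real callback interface is not shown in this guide.

```python
from typing import Callable, List, Tuple

ProgressCallback = Callable[[str, float], None]  # (step name, fraction completed)


def run_phases(
    phases: List[Tuple[str, Callable[[], None]]],
    on_progress: ProgressCallback,
) -> None:
    """Run phases in order, reporting each one before it starts."""
    total = len(phases)
    for index, (name, func) in enumerate(phases):
        on_progress(name, index / total)  # fraction of phases already finished
        func()
    on_progress("done", 1.0)


# Usage: adding a phase is just one more (name, callable) entry in the list.
events = []
run_phases(
    [("download", lambda: None), ("verify", lambda: None)],
    lambda step, fraction: events.append((step, fraction)),
)
print(events)  # [('download', 0.0), ('verify', 0.5), ('done', 1.0)]
```

Because the callback is the only channel to the outside, a new phase becomes visible to a polling UI without any change to the route handlers.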