The user's water source declaration trumps EPA's map. EPA's national CWS service-area layer has documented coverage gaps, and established city addresses can sit in the holes between polygons. So when a user says during onboarding that they're on city water, Hearth runs a nearest-polygon fallback against a 500-meter buffer before it ever falls through to “we couldn't pinpoint your utility.” Trusting what the user told us over a map with known holes gives a more honest answer than quietly treating a polygon miss as a private well.
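The decision order above can be sketched as a small pure function. This is a minimal illustration, not Hearth's actual code: the type names, `resolveUtility`, and the idea of passing in precomputed distances are all assumptions, and the geometry itself (point-in-polygon, distance to polygon) is left to whatever spatial layer is in use.

```typescript
// Hypothetical types; Hearth's real schema is not shown in the text.
type DeclaredSource = "city" | "well" | "unknown";

interface UtilityCandidate {
  utilityId: string;
  // Distance from the address to the service-area polygon; 0 means the point is inside it.
  distanceMeters: number;
}

type Resolution =
  | { kind: "matched"; utilityId: string; via: "polygon" | "nearest" }
  | { kind: "private-well" }
  | { kind: "unresolved" };

// The 500-meter buffer comes from the text above.
const FALLBACK_BUFFER_METERS = 500;

function resolveUtility(
  declared: DeclaredSource,
  candidates: UtilityCandidate[],
): Resolution {
  // Assumption: a declared well is taken at face value, since the declaration trumps the map.
  if (declared === "well") return { kind: "private-well" };

  const nearest = [...candidates].sort(
    (a, b) => a.distanceMeters - b.distanceMeters,
  )[0];

  // A direct polygon hit needs no fallback.
  if (nearest !== undefined && nearest.distanceMeters === 0) {
    return { kind: "matched", utilityId: nearest.utilityId, via: "polygon" };
  }

  // Polygon miss, but the user said city water: try the nearest polygon within the buffer.
  if (
    declared === "city" &&
    nearest !== undefined &&
    nearest.distanceMeters <= FALLBACK_BUFFER_METERS
  ) {
    return { kind: "matched", utilityId: nearest.utilityId, via: "nearest" };
  }

  // A miss is never silently reinterpreted as a private well.
  return { kind: "unresolved" };
}
```

In this sketch a polygon miss can only end in a nearest-polygon match or in `unresolved`; the `private-well` outcome is reachable only through the user's own declaration.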
Serial-number decoding runs on a separate reasoning model. The streaming Research-this-model pipeline uses a fast non-reasoning model to keep latency down. Decoding a manufacture date from a serial number is a different kind of problem, because it demands determinism. In testing, non-reasoning models would invent a fresh, plausible-looking decoding rule on each call and then confidently apply it to the same serial. Moving the decode into a parallel call on a reasoning model, under a strict protocol (name the rule, apply it, verify internal consistency, and return null if any step fails), eliminated the hallucinated dates while leaving the rest of the pipeline just as fast.
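One way to picture the "return null if any step fails" half of that protocol is an application-side guard over the model's structured answer. The sketch below is an assumption throughout: the `DecodeAttempt` shape, the `acceptDecode` name, and the plausibility bounds (including the 1950 floor) are illustrative and are not taken from Hearth's prompt or schema.

```typescript
// Hypothetical shape of what the reasoning model is asked to return.
interface DecodeAttempt {
  rule: string | null;        // the named rule, e.g. "chars 3-6 are YYMM"
  sourceChars: string | null; // the serial characters the rule consumed
  year: number | null;
  month: number | null;
}

interface ManufactureDate {
  year: number;
  month: number;
  rule: string;
}

// Assumed lower bound for a plausible manufacture year; purely illustrative.
const EARLIEST_PLAUSIBLE_YEAR = 1950;

function acceptDecode(
  serial: string,
  attempt: DecodeAttempt,
  currentYear: number,
): ManufactureDate | null {
  const { rule, sourceChars, year, month } = attempt;

  // Step 1: a rule must be named.
  if (rule === null || rule.trim() === "") return null;

  // Step 2: the rule must have been applied to characters that exist in this serial.
  if (sourceChars === null || sourceChars === "" || !serial.includes(sourceChars)) {
    return null;
  }

  // Step 3: the result must be internally consistent.
  if (year === null || month === null) return null;
  if (!Number.isInteger(year) || !Number.isInteger(month)) return null;
  if (year < EARLIEST_PLAUSIBLE_YEAR || year > currentYear) return null;
  if (month < 1 || month > 12) return null;

  return { year, month, rule };
}
```

Passing `currentYear` in, rather than reading the clock inside the function, keeps the guard deterministic and easy to test.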
Soft-fail at every level. Every external data source can fail, whether it's Mapbox, EPA Envirofacts, FEMA NFHL, EPA's drinking-water APIs, or the AI Gateway. Hearth is built so that when one does, the user-visible result degrades to “we don't know yet” instead of a confidently wrong answer, and unrelated features keep working. The activity log on each habitat finding records which sources answered and which didn't, so the user can see exactly what was checked.
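A minimal sketch of that degradation pattern, assuming each external call is wrapped individually. The names `softFail`, `SourceResult`, `toActivityLog`, and `displayValue` are hypothetical, and the real activity log presumably carries more detail than two lists of source names.

```typescript
// Hypothetical result type: every source either answered or failed, never threw.
type SourceResult<T> =
  | { source: string; status: "answered"; value: T }
  | { source: string; status: "failed"; error: string };

// Wrap one external call so a failure becomes data instead of an exception.
async function softFail<T>(
  source: string,
  call: () => Promise<T>,
): Promise<SourceResult<T>> {
  try {
    return { source, status: "answered", value: await call() };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { source, status: "failed", error };
  }
}

interface ActivityLog {
  answered: string[];
  failed: string[];
}

// Record which sources answered and which did not.
function toActivityLog(results: SourceResult<unknown>[]): ActivityLog {
  return {
    answered: results.filter((r) => r.status === "answered").map((r) => r.source),
    failed: results.filter((r) => r.status === "failed").map((r) => r.source),
  };
}

// The user-visible value degrades to "we don't know yet" rather than a guess.
function displayValue(result: SourceResult<string>): string {
  return result.status === "answered" ? result.value : "we don't know yet";
}
```

Because each source is wrapped on its own, a failure in one (say, the flood-zone lookup) leaves the others' results untouched.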
JSONB columns earn promotion to real columns only when a query pattern demands it. The inventory and document tables both carry a `metadata` JSONB column for subtype-specific fields that don't need cross-row queries yet: things like vehicle VINs and plate states, pet microchip numbers, or a receipt's vendor and line items. When a query pattern finally needs one of those fields as a real column (say, aggregating values for an insurance valuation), that field migrates out of JSONB into a column of its own. Until then, JSONB keeps the schema small and the path to a new feature short.
Workflow errors are sorted into terminal vs. retryable. Hearth's durable jobs run on Vercel Workflows, where a naive policy retries every failure as if it were transient. Hearth instead marks its orchestrator-internal errors as non-retryable: constraint violations, missing entities, a failed write. Those surface a deterministic bug in about two seconds rather than grinding through roughly thirty seconds of pointless retries, while genuinely transient errors like a flaky EPA call or a gateway hiccup still retry. That one distinction turns the workflow layer into a fast feedback loop during development instead of a latency tax.
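The terminal-versus-retryable split can be illustrated with a plain retry loop. This sketch deliberately does not use the Vercel Workflows API; `TerminalError`, `isTerminal`, and `runWithRetries` are hypothetical names, the loop is synchronous, and backoff is omitted for brevity.

```typescript
// Hypothetical marker for orchestrator-internal failures that retrying cannot fix
// (constraint violations, missing entities, a failed write).
class TerminalError extends Error {
  readonly terminal = true;
}

function isTerminal(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { terminal?: unknown }).terminal === true
  );
}

function runWithRetries<T>(
  step: () => T,
  maxAttempts: number,
): { value: T; attempts: number } {
  if (maxAttempts < 1) throw new Error("maxAttempts must be at least 1");
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { value: step(), attempts: attempt };
    } catch (err) {
      // Terminal errors surface immediately instead of burning the retry budget.
      if (isTerminal(err)) throw err;
      // Anything else is treated as transient and retried.
      lastError = err;
    }
  }
  throw lastError;
}
```

A step that throws `TerminalError` runs exactly once, while a step that fails transiently twice and then succeeds returns on its third attempt.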
Data shared across homes is ingested once and cached for everyone. Some facts aren't per-house. A water utility publishes one annual quality report, and it's identical for every home on that system. Rather than ingest and summarize it once per user, Hearth keys the report by utility and year and serves a single cached row to every house the utility covers. The same pattern fits any data shared across entities, and it saves redundant ingestion, duplicate LLM cost, and needless load on slow government APIs.
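As a rough illustration of keying by utility and year, the sketch below uses an in-memory `Map` standing in for the cached table row. `QualityReport`, `getSharedReport`, and the `utilityId:year` key format are assumptions made for the example.

```typescript
// Hypothetical report shape; the real row presumably holds the ingested summary.
interface QualityReport {
  utilityId: string;
  year: number;
  summary: string;
}

type Ingest = (utilityId: string, year: number) => QualityReport;

// Look up the shared report, ingesting it only the first time any home asks for it.
function getSharedReport(
  cache: Map<string, QualityReport>,
  utilityId: string,
  year: number,
  ingest: Ingest,
): QualityReport {
  // The key contains no home or user id, so every home on the utility shares one row.
  const key = `${utilityId}:${year}`;
  const cached = cache.get(key);
  if (cached !== undefined) return cached;

  const fresh = ingest(utilityId, year);
  cache.set(key, fresh);
  return fresh;
}
```

Two homes on the same utility in the same year trigger a single call to `ingest`; a new report year produces a new key and a second ingestion.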
Findings go stale-but-labeled instead of eagerly regenerating. When upstream data changes, Hearth ingests it, works out which homes are affected, and writes a notification, but it stops short of regenerating any findings. The existing finding stays put, clearly labeled as stale, until the user decides to reanalyze. There's a cost asymmetry at play: regenerating findings nobody is looking at is wasted money, and the notification on its own already tells the user the system is watching, without any silent background churn.
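The stale-but-labeled flow can be sketched as two separate functions: one that reacts to upstream changes without regenerating anything, and one that regenerates only on request. The `Finding` and `Notification` shapes, the function names, and the notification wording are all hypothetical.

```typescript
interface Finding {
  homeId: string;
  summary: string;
  stale: boolean;
}

interface Notification {
  homeId: string;
  message: string;
}

// On an upstream change: label affected findings as stale and notify. No regeneration here.
function applyUpstreamChange(
  findings: Finding[],
  affectedHomeIds: ReadonlySet<string>,
): { findings: Finding[]; notifications: Notification[] } {
  const updated = findings.map((f) =>
    affectedHomeIds.has(f.homeId) ? { ...f, stale: true } : f,
  );
  const notifications = findings
    .filter((f) => affectedHomeIds.has(f.homeId))
    .map((f) => ({
      homeId: f.homeId,
      message: "New data is available for this finding. Reanalyze to update it.",
    }));
  return { findings: updated, notifications };
}

// Only an explicit user action pays the regeneration cost.
function reanalyze(
  finding: Finding,
  regenerate: (homeId: string) => string,
): Finding {
  return {
    homeId: finding.homeId,
    summary: regenerate(finding.homeId),
    stale: false,
  };
}
```

Keeping `regenerate` out of `applyUpstreamChange` entirely is what enforces the cost asymmetry: the expensive call is only reachable through `reanalyze`.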