Gainesville public data in a DuckDB file
fl.ddb is a static Gainesville/Alachua public-data dashboard.
The new dashboard is /data/fl/. It loads a checked-in DuckDB file, /data/fl.ddb, through DuckDB-Wasm after the page button is clicked. There is also a /data/meta/ dashboard for build and refresh metadata.
The current file is about 45 MiB: 14 public source pulls, 693,382 sanitized staged rows, and 46 DuckDB tables. The first version is mostly Gainesville and Alachua County: service requests, police/fire/crash activity, permits, code cases, transit, business licenses, public meetings, and legislative matters.
This follows the static-dashboard path from how the dashboards work and Zorto dashboards after DuckDB’s Quack protocol: build the data locally, commit the .ddb, serve it as a static asset, and let the browser run read-only SQL.
sources #
The dashboard currently pulls three families of public data:
| provider | feeds | examples |
|---|---|---|
| dataGNV / Socrata | 8 | 311/myGNV requests, crime responses, traffic crashes, building permits, code cases, fire/rescue responses, RTS ridership, business licenses |
| Legistar API | 4 | Gainesville and Alachua County public meeting archives and legislative matters |
| eSCRIBE | 2 | current Gainesville and Alachua County meeting calendars |
The database includes source_catalog, source_freshness, source_cache_status, source_fetch_log, and schema_notes. The dashboards read those tables for provider links, row counts, freshness labels, cache behavior, and table notes.
pipeline #
The public command is still:
bin/build-data
That wrapper now runs the master data pipeline:
bin/prediction builduv run --script pipelines/build_all_data.py, includingruffandtybin/prediction check-public
The master script runs pipelines/build_site_data.py for site.ddb and pipelines/build_fl_data.py for fl.ddb. Both builders are self-contained uv scripts. build_fl_data.py connects with Ibis on DuckDB and uses SQL for the larger transform blocks.
flowchart TB
socrata["dataGNV / Socrata<br/>311, safety, permits, code, transit, businesses"] --> fetch["pipelines/build_fl_data.py"]
legistar["Legistar API<br/>events + matters"] --> fetch
escribe["eSCRIBE calendars<br/>current city/county meetings"] --> fetch
fetch <--> cache["private/data/fl-source-cache.ddb<br/>ignored local source cache"]
fetch --> staging["staging tables<br/>public source rows"]
staging --> modeled["modeled tables<br/>service_requests, traffic_crashes,<br/>civic_items, public_safety_daily"]
modeled --> rollups["rollups + metadata<br/>monthly_activity, source_freshness,<br/>source_cache_status, dashboard_metrics"]
rollups --> ddb["static/data/fl.ddb"]
ddb --> page["/data/fl/ static shell"]
ddb --> meta["/data/meta/ static shell"]
page --> loadButton["load dashboard button"]
loadButton --> wasm["DuckDB-Wasm"]
meta --> wasm
wasm --> charts["Plotly charts + tables + SQL lab"]
The FL builder writes to a temporary database, runs privacy checks, then replaces static/data/fl.ddb only after the build succeeds. The public file is still rebuilt from scratch, but source ingest now has an ignored local cache. Socrata feeds use watermark lookbacks, Legistar refreshes bounded archive windows, and eSCRIBE refreshes recent calendar years after the first seed. source_cache_status and source_fetch_log make the cache behavior visible in the database.
Sanitization rules:
- raw addresses, owner names, contractor fields, phone numbers, and emails are not selected for the public tables
- latitude/longitude values are rounded to three decimals
- email-like strings, local file URLs, and absolute user paths are scrubbed from scalar values
- the build fails if forbidden byte or text patterns show up in the generated database
inline sample #
A small subset of the same database:
inline fl.ddb sample
query a few public-data rollups
The full dashboard has filters, tabbed views, freshness checks, and a SQL lab. The meta dashboard shows build duration, source fetches, cache state, and refresh history. This embed is just enough to prove the file is live.
what is in the file #
The public dashboard uses these table groups:
- source metadata:
meta_info,source_catalog,source_freshness,source_cache_status,source_fetch_log,schema_notes,build_runs - local services:
service_requests,service_request_type_rollup,service_request_status_rollup,service_request_district_rollup - safety:
crime_responses,fire_responses,traffic_crashes,public_safety_daily,safety_hourly_profile,traffic_crash_risk_monthly,traffic_crash_yearly_summary - growth:
building_permits,code_cases,businesses,development_monthly,business_type_rollup - civic/transit:
legistar_events,legistar_matters,escribe_meetings,civic_items,meeting_type_rollup,upcoming_meetings,rts_ridership - dashboard glue:
monthly_activity,dashboard_metrics
Stale feeds stay visible. For example, 311 and permits are useful historically, but they are not current in dataGNV right now. source_freshness shows that in the UI.
no live server yet #
This version is only a checked-in .ddb: no public Quack server, no database credentials in the browser, and no production scheduler. A batch job can refresh the local source cache, rebuild the same public file, and leave the website contract alone.