TorontoBot

TorontoBot is a tool for querying Toronto Open Data. It answers questions either on the command line or as a Discord bot.

To help explain how it works, here are some slides.

We're looking for Go developers interested in working with LLMs to help us turn TorontoBot into a richly featured civic assistant.

Usage

Join our Discord. In the "Open Data" channel, use the slash command:

/torontobot <your-query-here>

This bot is brand new, so go easy on it if it doesn't get things right. It can answer questions about the operating budget surprisingly well!

To run queries locally you'll need an OpenAI API token. Get one at openai.com.

As of right now, TorontoBot can answer questions about:

  • The City of Toronto operating budget
  • 311 service requests
  • Speed camera ticket information
  • StatsCan apartment price index

There are plenty of examples in the Discord. Don't hesitate to ask TorontoBot how it can help!

Dependencies

This project requires the Go programming language and sqlite3 to be installed on your local system. Both are easy to install.
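
If you're not sure whether both are already in place, you can check from any terminal with the standard version commands:

 $~/code/torontobot> go version
 $~/code/torontobot> sqlite3 --version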

Ingest

The first thing to do is initialize the database. We're building our proof-of-concept around a single data source: the City of Toronto approved operating budget. This dataset has good-quality data going back to 2014, and because it's one big flat table covering the whole city budget, it's useful for a lot of different queries.

We need to create a local database file containing the target data so we can query it locally. Bear in mind that if you want to include the 311 request data, you're in for a wait.

This project uses golang-migrate/migrate to manage migrations, and you can install it like so:

 $~/code/torontobot/db> go install -tags 'sqlite3' github.com/golang-migrate/migrate/v4/cmd/migrate

That will add a migrate binary to your Go binary path.
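
If your shell can't find migrate afterwards, check that your Go bin directory (typically $HOME/go/bin) is on your PATH. And if Go refuses to install without a version suffix, which happens when you run the command outside a Go module, appending @latest to the package path is the usual fix:

 $~/code/torontobot/db> go install -tags 'sqlite3' github.com/golang-migrate/migrate/v4/cmd/migrate@latest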

Next, you'll need to initialize a fresh, empty database. Do it like this (in the db dir):

 $~/code/torontobot/db> migrate -path db/migrations -database "sqlite3://db/toronto.db" up 

That will create toronto.db, the sqlite3 database we'll use to store our data.
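
If you want to confirm the migration ran, you can open the new database with the sqlite3 shell and list its tables. The path below mirrors the database URL passed to migrate above; adjust it if your toronto.db ended up somewhere else:

 $~/code/torontobot/db> sqlite3 db/toronto.db ".tables"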

Then, from the ingest directory, run the ingest script:

 $~/code/torontobot/ingest> go run .

Over the course of the next several minutes, this script will download City of Toronto operating budget data for the years 2014 through 2023, collating and storing every entry in our database file.

Then, over several more hours, it will load each of the 311 service request files for the last ~13 years and write those to the database as well.
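
Once the budget ingest finishes, a quick sanity check is to count rows per year in the operating_budget table (its name appears in the example query further down). The relative path here is an assumption, so point it at wherever your toronto.db lives:

 $~/code/torontobot/ingest> sqlite3 ../db/toronto.db "SELECT year, COUNT(*) FROM operating_budget GROUP BY year;"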

Running locally

Start the bot like so:

 $~/code/torontobot> go run . --openai-token <your-openai-token-here>

Now you can do something like this:

>> What are the 8 most expensive programs?
% sending request to openai...
Torontobot: The operating_budget table has columns for id, program, service, activity, entry_type, category, subcategory, item, year, and amount. We will be using the program and amount columns.

We will be looking at the amount column to determine program costs, and sorting by descending order to find the 8 most expensive programs.

Executing query "WITH program_costs AS (SELECT program, SUM(amount) AS total_cost FROM operating_budget WHERE entry_type = 'expense' GROUP BY program) SELECT program, total_cost FROM program_costs ORDER BY total_cost DESC LIMIT 8;"

Query result:
+-------------------------------------------+-------------------+
| PROGRAM (TEXT)                            | TOTAL_COST ()     |
+-------------------------------------------+-------------------+
| Toronto Transit Commission - Conventional | $2,237,543,963.48 |
| Toronto Police Service                    | $1,330,625,706.88 |
| Capital & Corporate Financing             | $1,204,852,706.30 |
| Toronto Employment & Social Services      | $1,153,609,587.01 |
| Children's Services                       | $1,108,471,252.28 |
| Housing Secretariat                       | $845,624,329.06   |
| Shelter, Support & Housing Administration | $707,949,472.87   |
| Non-Program Expenditures                  | $673,064,907.78   |
+-------------------------------------------+-------------------+

>>  

Adding a new dataset

There are three steps required to add a new dataset.

  1. Figure out what schema to use. (easy)
  2. Write an ingest script for the dataset. (hard)
  3. Add a table description to tables.json5. (very easy)

Don't hesitate to give a sample of the data to GPT-4 and ask for a draft schema.
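
If you stick with the golang-migrate setup described above, one way to add the table is to scaffold a fresh pair of migration files and drop your draft CREATE TABLE into the generated .up.sql file. The dataset name here is purely illustrative; point -dir at the same migrations directory you used with migrate up:

 $~/code/torontobot/db> migrate create -ext sql -dir db/migrations -seq add_my_dataset

The -seq flag gives you sequentially numbered add_my_dataset.up.sql and .down.sql files alongside the existing migrations.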

The hard part is figuring out how to ingest the data. There are two examples in the current ingest script, but every dataset is unique, and it's important to give it proper consideration during ingest.

tables.json5 will let you add both hints and special instructions for the table. This can improve the query experience dramatically.

Inspiration

This project is inspired by the work being done on textSQL; we've retooled the idea to better fit the Torontoverse ecosystem.

Support

Join our Discord.
