- You can create anything you want but it must involve computatoinal science.
- You will be judgged primarially on innovation, inginuity and creativity, as well as the other criteria mentioned here like your final oral presentation.
A loose collection of example workflows to kickstart hackathon projects. Each of these are just starting points that need significant work and focus to make real.
- Enron dataset task
- Enron email dataset in kaggle https://www.kaggle.com/datasets/wcukierski/enron-email-dataset/data . 1.5GB dataset. Not a very clean dataset
- Parse and clean dataset
- Tasks :
- Get the count of user sent emails per year and plot it
- NetworkX graph from emails
- RAG pipeline
- Find the most important person and summarize all the emails
-
Letterbox dataset
- Letterboxd dataset is a more cleaner and elaborate IMDb dataset https://www.kaggle.com/datasets/gsimonx37/letterboxd/data
- Has posters (image data) and multiple csv tabular data
- Can devise some tasks that might combine multiple data?
-
Kaggle dataset for computer vision
- https://www.kaggle.com/datasets?tags=13207-Computer+Vision
- Can pick one task. These datasets will need some preprocessing.
-
Astronomical Image processing workflow
- Most astronomical deep-sky imagery needs some processing.
- Multiple images, filters, denoising.
- Pixinsight is a software that people use to process FITS images https://pixinsight.com/
- LSST datasets, some tools and some tutorials https://github.com/lsst/dp0-2_lsst_io/tree/main
- European Org for Astronomical Research Data processing pipeline https://www.eso.org/sci/software/edps.html https://ftp.eso.org/pub/dfs/pipelines/libraries/edps/edps_tutorial0.9.pdf
-
DocVQA
- massive dataset of visual docs for QA. https://rrc.cvc.uab.es/?ch=17&com=introduction
- Can define some tasks based on this dataset
-
RE_MAT project
NCSA has a new project to host LLMs that are directly compatible with the OpenAI API.
- API Docs: https://docs.ncsa.ai/
- Playground (experimental: no guarentee that all features work): https://ncsa.ai/
Come see a hackathon organizer and we can provide Azure OpenAI API keys. These are generously subsidised by Microsoft Research. This has access to GPT-4 Turbo, etc. We only have the Azure version, not the regular OpenAI version.
3. UIUC.chat - RAG llm API
The UIUC.chat API allows you to upload many types of documents and chat with them. The API will return "answers that are grounded in your documents" much like Perplexity.ai.
Please email me ([email protected]) if you have any questions or problems! Just a quick casual, email is great, low pressure.
- UIUC.chat https://www.uiuc.chat/
- API docs: https://docs.uiuc.chat/uiuc.chat-api/api-keys
- Tutorial & highlights: https://www.youtube.com/watch?v=IIMCrIoz7LM&ab_channel=KastanDay
Usage:
- Make an account with your Illinois email.
- Create a new project by uploading documents
- This requires supplying an Azure OpenAI key (see above). Enter it on the "Materials" page under Project-wide OpenAl key before continuing.
- Try chatting with your documents on the website, then try via the API.
This is favorite LLM provider, they have a generous free tier, high rate limits, and leading-class features like function calling and json mode.
You'll have to create your own account. I recommend using the models Mistral
and Mixtral
.
- Function calling blog / explainer: https://www.anyscale.com/blog/anyscale-endpoints-json-mode-and-function-calling-features
- Function calling docs: https://docs.endpoints.anyscale.com/text-generation/function-calling
- JSON mode docs: https://docs.endpoints.anyscale.com/text-generation/json-mode/
- OpenAI docs (good to read important notes!): https://platform.openai.com/docs/guides/text-generation/json-mode