feat(jsonish): Add support for curly quotes in JSON parsing #1249

Open
wants to merge 4 commits into
base: canary
Changes from 3 commits
28 changes: 23 additions & 5 deletions engine/baml-lib/jsonish/src/jsonish/parser/entry.rs
@@ -12,7 +12,22 @@ use crate::jsonish::{

use super::ParseOptions;

- pub fn parse(str: &str, mut options: ParseOptions) -> Result<Value> {
+
+ /// Normalizes curly double quotes in a string to ASCII double quotes.
+ ///
+ /// This function handles the following conversions:
+ /// - Left double quotation mark (U+201C) → quotation mark (U+0022)
+ /// - Right double quotation mark (U+201D) → quotation mark (U+0022)
+ ///
+ /// This normalization is necessary because LLMs may emit JSON with curly quotes,
+ /// which would be valid JSON if it used standard quotes instead.
+
+ fn normalize_quotes(s: &str) -> String {
+ // Convert both left (U+201C) and right (U+201D) curly quotes to straight quotes (U+0022)
+ s.replace('\u{201C}', "\u{0022}").replace('\u{201D}', "\u{0022}")
+ }
+
+ pub fn parse<'a>(str: &'a str, mut options: ParseOptions) -> Result<Value> {
log::debug!("Parsing:\n{:?}\n-------\n{}\n-------", options, str);

options.depth += 1;
@@ -22,15 +37,18 @@ pub fn parse(str: &str, mut options: ParseOptions) -> Result<Value> {
));
}

- match serde_json::from_str(str) {
+ // First normalize any curly quotes
+ let normalized = normalize_quotes(str);
+
+ match serde_json::from_str(&normalized) {
Ok(v) => return Ok(Value::AnyOf(vec![v], str.to_string())),
Err(e) => {
log::debug!("Invalid JSON: {:?}", e);
}
};

if options.allow_markdown_json {
- match markdown_parser::parse(str, &options) {
+ match markdown_parser::parse(&normalized, &options) {
Ok(items) => match items.len() {
0 => {}
1 => {
@@ -103,7 +121,7 @@ pub fn parse(str: &str, mut options: ParseOptions) -> Result<Value> {
}

if options.all_finding_all_json_objects {
- match multi_json_parser::parse(str, &options) {
+ match multi_json_parser::parse(&normalized, &options) {
Ok(items) => match items.len() {
0 => {}
1 => {
@@ -136,7 +154,7 @@ pub fn parse(str: &str, mut options: ParseOptions) -> Result<Value> {
}

if options.allow_fixes {
- match fixing_parser::parse(str, &options) {
+ match fixing_parser::parse(&normalized, &options) {
Ok(items) => {
match items.len() {
0 => {}
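For reviewers, the normalization step in this file can be exercised in isolation. The following is a minimal std-only sketch, not the code under review: it keeps the `normalize_quotes` name from the diff but, as an assumption of this sketch, replaces both characters in one pass with a char-array pattern instead of two chained `replace` calls.

```rust
/// Replaces curly double quotes (U+201C, U+201D) with the ASCII quote (U+0022).
/// Same result as the helper in the diff, but with a single pass and a
/// single allocation (an illustrative variation, not the PR's code).
fn normalize_quotes(s: &str) -> String {
    // A char array is a valid `str::replace` pattern; it matches either character.
    s.replace(['\u{201C}', '\u{201D}'], "\"")
}
```

One behavior worth checking during review: the replacement is applied to the whole input, so curly quotes inside a string value are rewritten as well. An input such as `{"text": "she said “hi”"}` is valid JSON as written, but after normalization it becomes `{"text": "she said "hi""}`, which the strict `serde_json::from_str(&normalized)` attempt would reject.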
3 changes: 3 additions & 0 deletions engine/language_client_python/Cargo.toml
@@ -27,7 +27,9 @@ internal-baml-codegen.workspace = true
env_logger.workspace = true
futures.workspace = true
indexmap.workspace = true
+ libc = "0.2"
log.workspace = true
+ ctrlc = "3.4"
# Consult https://pyo3.rs/main/migration for migration instructions
pyo3 = { version = "0.23.3", default-features = false, features = [
"abi3-py38",
@@ -44,6 +46,7 @@ regex.workspace = true
serde.workspace = true
serde_json.workspace = true
tokio = { version = "1", features = ["full"] }
+ tokio-util = { version = "0.7", features = ["full"] }
tracing-subscriber = { version = "0.3.18", features = [
"json",
"env-filter",
46 changes: 46 additions & 0 deletions engine/language_client_python/src/lib.rs
@@ -7,9 +7,55 @@ use pyo3::prelude::{pyfunction, pymodule, PyAnyMethods, PyModule, PyResult};
use pyo3::types::PyModuleMethods;
use pyo3::{wrap_pyfunction, Bound, Python};
use tracing_subscriber::{self, EnvFilter};
+ use ctrlc;

#[pyfunction]
fn invoke_runtime_cli(py: Python) -> PyResult<()> {
+ // SIGINT (Ctrl+C) handling, based on an approach from @revidious
+ //
+ // Background:
+ // When BAML runs through Python, Python's default SIGINT handling
+ // can interfere with graceful shutdown. This is because:
+ // 1. Python has its own signal handlers that may conflict with Rust's
+ // 2. The PyO3 runtime can sometimes mask or delay interrupt signals
+ // 3. We need to ensure clean shutdown across the Python/Rust boundary
+ //
+ // Solution:
+ // We implement a custom signal handling mechanism using Rust's ctrlc crate that:
+ // 1. Bypasses Python's signal handling entirely
+ // 2. Provides consistent behavior across platforms
+ // 3. Ensures graceful shutdown with proper exit codes
+ // Note: Eliminating the root cause of the SIGINT handling conflicts would be ideal,
+ // but the source appears to be deeply embedded in BAML's architecture and PyO3's runtime.
+ // A proper fix would require extensive changes to how BAML handles signals across the
+ // Python/Rust boundary. For now, this workaround provides reliable interrupt handling
+ // without requiring major architectural changes, though it remains a hack.
+
+ // Create a channel for communicating between the signal handler and the watcher thread
+ // spawned below. This is necessary because the handler runs in a separate context and
+ // needs a safe way to communicate with the rest of the program
+ let (interrupt_send, interrupt_recv) = std::sync::mpsc::channel();
+
+ // Install our custom Ctrl+C handler
+ // This will run in a separate thread when SIGINT is received
+ ctrlc::set_handler(move || {
+ println!("\nShutting Down BAML...");
+ // Notify the watcher thread through the channel
+ // Using ok() to ignore send errors if the receiver is already dropped
+ interrupt_send.send(()).ok();
+ }).expect("Error setting Ctrl-C handler");
+
+ // Monitor for interrupt signals in a separate thread
+ // This is necessary because we can't directly exit from the signal handler.
+
+ std::thread::spawn(move || {
+ if interrupt_recv.recv().is_ok() {
+ // Exit with code 130 (128 + SIGINT's signal number 2)
+ // This is the standard Unix convention for processes terminated by SIGINT
+ std::process::exit(130);
+ }
+ });
+
baml_cli::run_cli(
py.import("sys")?
.getattr("argv")?
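The hand-off described in the comments above (handler sends on a channel, a watcher thread receives and exits with 130) can be sketched with the standard library alone. This is an illustrative sketch, not the PR's code: `spawn_interrupt_watcher` and `SIGINT_EXIT_CODE` are hypothetical names, the `ctrlc::set_handler` registration is left out so the block has no external dependencies, and the exit action is passed in as a callback so the flow can be tested without terminating the process.

```rust
use std::sync::mpsc;
use std::thread;

/// Conventional exit status for a process interrupted by SIGINT:
/// 128 plus the signal number (2).
const SIGINT_EXIT_CODE: i32 = 128 + 2;

/// Spawns a watcher thread that blocks until an interrupt is reported on the
/// channel, then calls `on_interrupt` with the exit code.
/// Returns the sender a signal handler would own, plus the watcher's handle.
fn spawn_interrupt_watcher<F>(on_interrupt: F) -> (mpsc::Sender<()>, thread::JoinHandle<()>)
where
    F: FnOnce(i32) + Send + 'static,
{
    let (interrupt_send, interrupt_recv) = mpsc::channel::<()>();
    let watcher = thread::spawn(move || {
        // recv() returns Err once every sender is dropped without sending;
        // in that case no interrupt happened and the thread simply ends.
        if interrupt_recv.recv().is_ok() {
            on_interrupt(SIGINT_EXIT_CODE);
        }
    });
    (interrupt_send, watcher)
}
```

In the setting of this diff, the returned sender would be moved into the closure given to `ctrlc::set_handler`, and the callback would be something like `|code| { std::process::exit(code); }`. Keeping the exit behind a callback is only a testing convenience of this sketch.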