In #1880 I discussed applying test-time compute to reduce hallucinations in topic categorization, in particular using multi-sample and/or semantic entropy (I’m drafting another issue about other approaches). Obviously, we would want to get similar benefits for summarization too!
At first glance summarization seems quite different from topic categorization: much longer answers, with multiple clauses, as opposed to a single precise answer from a pre-defined set of topics. The good news is that we can apply many of the same ideas by decomposing the summary into multiple simple clauses, as (Farquhar et al. 2024) explain:
Naively, one might simply regenerate each sentence (conditioned on the text so far) and then compute semantic entropy over these regenerations. However, the resampled sentences often target different aspects of the biography: for example, one time describing family and the next time profession. This is analogous to the original problem semantic entropy was designed to resolve: the model is uncertain about the right ordering of facts, not about the facts themselves. To address this, we break down the entire paragraph into factual claims and reconstruct questions which might have been answered by those claims. Only then do we apply semantic entropy.
In more visual form, (Farquhar et al. 2024) proceed as follows:
1. First, they decompose full paragraphs into a set of factoids.
2. Then, for each factoid:
    a. They use an LLM to generate 3 questions on that factoid.
        i. Then, for each question (and passing the paragraph as context), they use the LLM to generate multiple answers.
    b. Then they use semantic entropy to check the answers for hallucinations (see the code sketch below).
Or, even more visually, see Figure 1 of their article.
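To make these steps concrete, here is a minimal TypeScript sketch of the per-factoid loop. This is a sketch under assumptions, not their implementation: `complete` and `entails` are hypothetical stand-ins for whatever LLM calls we would actually use, and the entropy threshold is an arbitrary placeholder.

```typescript
// Hypothetical LLM helpers; these would be backed by our actual model calls.
type LLMComplete = (prompt: string) => Promise<string>;
// Bidirectional entailment check: true iff a entails b AND b entails a.
type EntailmentCheck = (a: string, b: string) => Promise<boolean>;

// Cluster answers into semantic-equivalence classes via bidirectional
// entailment, then compute the discrete entropy over cluster frequencies
// (natural log), as in the paper's semantic-entropy estimator.
async function semanticEntropy(
  answers: string[],
  entails: EntailmentCheck,
): Promise<number> {
  const clusters: string[][] = [];
  for (const ans of answers) {
    let placed = false;
    for (const cluster of clusters) {
      // Greedy: compare only against the cluster's first member.
      if (await entails(ans, cluster[0])) {
        cluster.push(ans);
        placed = true;
        break;
      }
    }
    if (!placed) clusters.push([ans]);
  }
  const n = answers.length;
  return -clusters
    .map((cluster) => cluster.length / n)
    .reduce((h, p) => h + p * Math.log(p), 0);
}

// Steps 2.a-2.b for a single factoid: generate questions it answers, sample
// several answers per question (conditioned on the paragraph), and flag the
// factoid if the mean semantic entropy across questions is high.
async function checkFactoid(
  factoid: string,
  paragraph: string,
  complete: LLMComplete,
  entails: EntailmentCheck,
  { nQuestions = 3, nAnswers = 5, threshold = 1.0 } = {},
): Promise<{ factoid: string; entropy: number; suspect: boolean }> {
  const entropies: number[] = [];
  for (let q = 0; q < nQuestions; q++) {
    const question = await complete(
      `Write a question that is answered by this claim:\n${factoid}`,
    );
    const answers = await Promise.all(
      Array.from({ length: nAnswers }, () =>
        complete(`Context:\n${paragraph}\n\nQuestion: ${question}\nAnswer:`),
      ),
    );
    entropies.push(await semanticEntropy(answers, entails));
  }
  const entropy = entropies.reduce((a, b) => a + b, 0) / entropies.length;
  return { factoid, entropy, suspect: entropy > threshold };
}
```

The greedy single-pass clustering above is the simplest possible version of the bidirectional-entailment grouping the paper describes; in practice we would probably want something more careful.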
It gets even better for our case. The JSON format created by @colin and @tim for summarization explicitly separates the summary into clauses:

polis/server/src/prompts/report_experimental/subtasks/common/jsonSchema.xml (lines 1 to 22 in 19adf7c)

And the corresponding TypeScript types:

polis/server/src/prompts/report_experimental/subtasks/common/typesReference.xml (lines 1 to 26 in 19adf7c)
This simplifies the separation into factoids: each clause (and the associated comments) is a factoid!
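In code, step 1 of the pipeline then reduces to a straight mapping. The shapes below are illustrative only, not the actual schema from typesReference.xml; the field names are hypothetical.

```typescript
// Illustrative shapes only: the real types live in
// server/src/prompts/report_experimental/subtasks/common/typesReference.xml.
interface SummaryClause {
  text: string;        // one simple clause of the summary
  citations: number[]; // IDs of the comments supporting this clause
}

// Decomposition into factoids is nearly a no-op, since the summary
// is already clause-structured.
function clausesToFactoids(clauses: SummaryClause[]): string[] {
  return clauses.map((clause) => clause.text);
}
```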
Some questions are still open that I have not yet fully answered:
I need to double-check some of the details: for example, in step 2.a.i, do we pass the original generated paragraph in addition to the question when generating the answer, or not? It’s quite a different exercise either way, and amounts to checking the entropy of two different conditional distributions. I need to think more about this.
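To spell that out: writing s for the generated summary paragraph, q for a generated question, and a for a sampled answer, the two exercises estimate

```math
H\!\left[p(a \mid q, s)\right] \quad \text{versus} \quad H\!\left[p(a \mid q)\right]
```

Roughly, the first asks whether the model can consistently read the fact back out of the paragraph, while the second asks whether the model consistently asserts the fact from its own knowledge.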
Unlike for topic categorization, and if I understand correctly, this algorithm only detects hallucinations; it does not generate a new, correct summary. It is a procedure to check the clauses and knock out wrong ones, but it does not generate multiple entire summaries. Could we extend it to get better clauses?
Maybe we could take the multiple answers to the multiple generated questions (on which we already check entropy) and recombine them into a new clause to replace the hallucinated one?
But there might then be issues with keeping coherence with the surrounding clauses. Worth investigating nevertheless!
I think this is a promising direction to keep digging :)
Reference:
Farquhar, Sebastian, Jannik Kossen, Lorenz Kuhn, and Yarin Gal. 2024. ‘Detecting Hallucinations in Large Language Models Using Semantic Entropy’. Nature 630 (8017): 625–30. https://doi.org/10.1038/s41586-024-07421-0.