What does AI believe is true?
Samuel Albanie

Published on Jul 11, 2023

How can I infer the beliefs of a neural network in an unsupervised way?

That question motivates Burns et al. to propose the Contrast-Consistent Search (CCS) method in their paper "Discovering Latent Knowledge in Language Models Without Supervision".

Timestamps:
00:00 - Discovering Latent Knowledge in Language Models Without Supervision
02:08 - Problem definition: Discovering Latent Knowledge
03:30 - Method: Contrast-Consistent Search (CCS)
08:35 - Experimental Setup
09:37 - Finding: CCS is robust to misleading prompts
11:52 - Finding: Truth is a salient feature
12:28 - Related work on zero-shot prompting and truthfulness
13:15 - Limitations
13:52 - A few closing thoughts
14:51 - Further resources

Topics: #interpretability #ai #llm #liedetection #artificialintelligence

Link to the paper: https://arxiv.org/abs/2212.03827
Notebook on GitHub: https://github.com/collin-burns/disco...
Forum post: https://www.alignmentforum.org/posts/...
Critique: https://arxiv.org/abs/2307.00175

For related content:
- Twitter:   / samuelalbanie  
- Research lab: https://caml-lab.com/
- Personal webpage: https://samuelalbanie.com/
- YouTube:    / @samuelalbanie1  
- TikTok:   / samuelalbanie  
- Instagram:   / samuelalbanie  
- LinkedIn:   / samuel-albanie  
- Threads: https://www.threads.net/@samuelalbanie
- Discord server for filtir:   / discord  

(Optional) if you'd like to support the channel:
- https://www.buymeacoffee.com/samuelal...
-   / samuel_albanie  
