Paul Christiano (researcher)

Source: Wikipedia, the free encyclopedia.
{{Short description|American AI safety researcher}}
{{Use mdy dates|date=November 2023}}
{{Infobox scientist
| workplaces = {{ubl|[[OpenAI]]|
[[Alignment Research Center]]}}
| education = {{ubl|[[Massachusetts Institute of Technology]] (BS)|
[[University of California, Berkeley]] (PhD)}}
| known_for = {{ubl|[[AI alignment]]|
[[Reinforcement learning from human feedback]]}}
| website = {{url|https://paulfchristiano.com/}}
| thesis_title = Manipulation-resistant online learning
| thesis_year = 2017
| thesis_url = https://escholarship.org/content/qt0w22c86t/qt0w22c86t.pdf
| doctoral_advisor = [[Umesh Vazirani]]
}}
'''Paul Christiano''' is an American researcher in the field of [[artificial intelligence]] (AI), with a specific focus on [[AI alignment]], which is the subfield of [[AI safety]] research that aims to steer AI systems toward human interests.<ref name=":0" /> He formerly led the language model alignment team at [[OpenAI]] and became founder and head of the non-profit [[Alignment Research Center]] (ARC), which works on theoretical AI alignment and evaluations of [[machine learning]] models.<ref>{{Cite web |last=Piper |first=Kelsey |date=2023-03-29 |title=How to test what an AI model can — and shouldn't — do |url=https://www.vox.com/future-perfect/2023/3/29/23661633/gpt-4-openai-alignment-research-center-open-philanthropy-ai-safety |access-date=2023-08-04 |website=Vox |language=en}}</ref><ref name=":1">{{Cite magazine |last=Henshall |first=Will |date=September 7, 2023 |title=Paul Christiano – Founder, Alignment Research Center |magazine=[[TIME (magazine)|TIME magazine]] |url=https://time.com/collection/time100-ai/6309030/paul-christiano/ |access-date=2023-11-16}}</ref> In 2023, Christiano was named as one of the [[Time 100|''TIME'' 100]] Most Influential People in AI (''TIME''100 AI).<ref name=":1" /><ref>{{Cite magazine |last=Sibley |first=Jess |date=September 10, 2023 |title=The Future Is Now |volume=202 |magazine=[[Time (magazine)|Time magazine]] |issue=11/12 |url=https://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=172374416&lang=en-gb&site=eds-live&scope=site |access-date=2023-11-16 |via=[[EBSCOHost]]}}</ref>


In September 2023, Christiano was appointed to the UK government's Frontier AI Taskforce advisory board.<ref>{{Cite news |last=Skelton |first=Sebastian Klovig |date=7 September 2023 |title=Government AI taskforce appoints new advisory board members |work=ComputerWeekly.com |url=https://www.computerweekly.com/news/366551256/Government-AI-taskforce-appoints-new-advisory-board-members |access-date=2023-11-16}}</ref> He is also an initial trustee on [[Anthropic]]'s Long-Term Benefit Trust.<ref name=":3">{{Cite news |last=Matthews |first=Dylan |date=25 September 2023 |title=The $1 billion gamble to ensure AI doesn't destroy humanity |work=[[Vox (website)|Vox]] |url=https://www.vox.com/future-perfect/23794855/anthropic-ai-openai-claude-2 |access-date=2023-11-16}}</ref>


==Education==
Christiano attended the [[Harker School]] in San Jose, California.<ref name=":4">{{Cite web |last=Kehoe |first=Elaine |date=October 2008 |title=Mathematics People – 2008 International Mathematical Olympiad |url=https://www.ams.org/notices/200810/tx081001284p.pdf |access-date=2023-11-16 |website=American Mathematical Society}}</ref> He competed on the U.S. team and won a silver medal at the 49th [[International Mathematical Olympiad]] (IMO) in 2008.<ref name=":4" /><ref>{{Cite journal |last1=Feng |first1=Zumin |last2=Gelca |first2=Razvan |last3=Le |first3=Ian |last4=Dunbar |first4=Steven R. |date=June 2009 |title=NEWS AND LETTERS: 49th International Mathematical Olympiad |url=https://www.jstor.org/stable/27765911 |journal=Mathematics Magazine |volume=82 |issue=e |pages=235–238 |doi=10.1080/0025570X.2009.11953629 |jstor=27765911}}</ref>

In 2012, Christiano graduated from the [[Massachusetts Institute of Technology]] (MIT) with a degree in mathematics.<ref>{{Cite web |title=Paul F. Christiano |url=https://dl.acm.org/profile/81485658302 |access-date=2023-11-16 |website=Association for Computing Machinery Digital Library}}</ref><ref name=":2" /> At MIT, he researched data structures, quantum cryptography, and combinatorial optimization.<ref name=":2">{{cite web |title=About the Authors: Theory of Computing: An Open Access Electronic Journal in Theoretical Computer Science |url=https://theoryofcomputing.org/articles/v009a009/about.html |access-date=2023-11-16}}</ref>

He then went on to complete a [[Doctor of Philosophy|PhD]] at the [[University of California, Berkeley]].<ref>{{Cite web |last= |first= |title=Paul Christiano – Research Associate |url=http://www.fhi.ox.ac.uk/ |access-date=2023-08-04 |website=The Future of Humanity Institute |language=en-GB}}</ref> While at Berkeley, Christiano collaborated with researcher [[Katja Grace]] on AI Impacts, co-developing a preliminary methodology for comparing supercomputers to brains, using traversed edges per second (TEPS).<ref>{{Cite news |last=Hsu |first=Jeremy |date=26 August 2015 |title=Estimate: Human Brain 30 Times Faster than Best Supercomputers |work=[[IEEE Spectrum]] |url=https://spectrum.ieee.org/estimate-human-brain-30-times-faster-than-best-supercomputers |access-date=2023-11-16}}</ref> He also experimented with putting [[Carl Shulman]]'s donor lottery theory into practice, raising nearly $50,000 in a pool to be donated to a single charity.<ref>{{Cite news |last=Paynter |first=Ben |date=January 31, 2017 |title=Take A Chance With Your Charity And Try A Donor Lottery |work=[[Fast Company]] |url=https://www.fastcompany.com/3067596/take-a-chance-with-your-charity-and-try-a-donor-lottery |access-date=2023-11-16}}</ref>


==Career==
At OpenAI, Christiano co-authored the paper "Deep Reinforcement Learning from Human Preferences" (2017) and other works developing [[reinforcement learning from human feedback]] (RLHF).<ref>{{Cite journal |last1=Christiano |first1=Paul F |last2=Leike |first2=Jan |last3=Brown |first3=Tom |last4=Martic |first4=Miljan |last5=Legg |first5=Shane |last6=Amodei |first6=Dario |date=2017 |title=Deep Reinforcement Learning from Human Preferences |url=https://proceedings.neurips.cc/paper_files/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=30}}</ref><ref>{{Cite journal |last1=Ouyang |first1=Long |last2=Wu |first2=Jeffrey |last3=Jiang |first3=Xu |last4=Almeida |first4=Diogo |last5=Wainwright |first5=Carroll |last6=Mishkin |first6=Pamela |last7=Zhang |first7=Chong |last8=Agarwal |first8=Sandhini |last9=Slama |first9=Katarina |last10=Ray |first10=Alex |last11=Schulman |first11=John |last12=Hilton |first12=Jacob |last13=Kelton |first13=Fraser |last14=Miller |first14=Luke |last15=Simens |first15=Maddie |date=2022-12-06 |title=Training language models to follow instructions with human feedback |url=https://proceedings.neurips.cc/paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html |journal=Advances in Neural Information Processing Systems |language=en |volume=35 |pages=27730–27744|arxiv=2203.02155 }}</ref> He is considered one of the principal architects of RLHF,<ref name=":1" /><ref name=":3" /> which in 2017 was "considered a notable step forward in AI safety research", according to ''[[The New York Times]]''.<ref>{{Cite news |last=Metz |first=Cade |date=August 13, 2017 |title=Teaching A.I. Systems to Behave Themselves |work=[[The New York Times]] |url=https://www.nytimes.com/2017/08/13/technology/artificial-intelligence-safety-training.html |url-access=subscription |access-date=2023-11-16}}</ref> Other works such as "AI safety via debate" (2018) focus on the problem of ''scalable oversight'' – supervising AIs in domains where humans would have difficulty judging output quality.<ref>{{Cite arXiv |last1=Irving |first1=G. |last2=Christiano |first2=P. |last3=Amodei |first3=Dario |date=2018-05-02 |title=AI safety via debate |class=stat.ML |eprint=1805.00899 }}</ref><ref>{{Cite arXiv |last1=Wu |first1=Jeff |last2=Ouyang |first2=Long |last3=Ziegler |first3=Daniel M. |last4=Stiennon |first4=Nissan |last5=Lowe |first5=Ryan |last6=Leike |first6=J. |last7=Christiano |first7=P. |date=2021-09-22 |title=Recursively Summarizing Books with Human Feedback |class=cs.CL |eprint=2109.10862}}</ref><ref>{{Cite arXiv |last1=Christiano |first1=P. |last2=Shlegeris |first2=Buck |last3=Amodei |first3=Dario |date=2018-10-19 |title=Supervising strong learners by amplifying weak experts |class=cs.LG |eprint=1810.08575}}</ref>
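The central mechanism behind RLHF, as developed in "Deep Reinforcement Learning from Human Preferences", is to fit a reward model to human judgments about which of two model outputs is better, and then optimize the policy against that learned reward. The following is a minimal, self-contained sketch of only the preference-fitting step, using a Bradley–Terry-style loss over a hypothetical linear scorer; it illustrates the general technique and is not code from the cited papers.

<syntaxhighlight lang="python">
import math
import random

# Toy reward model: a linear scorer over hand-made feature vectors.
# In real RLHF the scorer is a neural network over model outputs.
def reward(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def preference_loss(weights, preferred, rejected):
    # Bradley-Terry style loss: -log P(preferred beats rejected),
    # where P is the logistic function of the reward difference.
    diff = reward(weights, preferred) - reward(weights, rejected)
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# Hypothetical human comparison data: (preferred_features, rejected_features).
comparisons = [
    ([1.0, 0.2], [0.3, 0.9]),
    ([0.8, 0.1], [0.2, 0.7]),
]

# Fit the scorer with plain stochastic gradient descent on the preference loss.
weights = [0.0, 0.0]
learning_rate = 0.1
random.seed(0)
for _ in range(200):
    preferred, rejected = random.choice(comparisons)
    diff = reward(weights, preferred) - reward(weights, rejected)
    p = 1.0 / (1.0 + math.exp(-diff))  # P(preferred beats rejected)
    # Gradient of -log(p) with respect to the weights.
    grad = [-(1.0 - p) * (a - b) for a, b in zip(preferred, rejected)]
    weights = [w - learning_rate * g for w, g in zip(weights, grad)]

print("learned reward weights:", weights)
print("loss on first comparison:", preference_loss(weights, *comparisons[0]))
</syntaxhighlight>

In actual RLHF pipelines the scorer is a neural network evaluated on language-model outputs, and the fitted reward is then used as the objective for reinforcement learning.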


Christiano left OpenAI in 2021 to work on more conceptual and theoretical issues in AI alignment and subsequently founded the [[Alignment Research Center]] to focus on this area.<ref name=":0">{{Cite web |title=A.I. has a '10 or 20% chance' of conquering humanity, former OpenAI safety researcher warns |url=https://fortune.com/2023/05/03/openai-ex-safety-researcher-warns-ai-destroy-humanity/ |access-date=2023-06-04 |website=Fortune |language=en}}</ref> One subject of study is the problem of ''eliciting latent knowledge'' from advanced machine learning models.<ref>{{Cite arXiv|last1=Burns |first1=Collin |last2=Ye |first2=Haotian |last3=Klein |first3=Dan |last4=Steinhardt |first4=Jacob |date=2022 |title=Discovering Latent Knowledge in Language Models Without Supervision |class=cs.CL |eprint=2212.03827}}</ref><ref>{{Cite web |last1=Christiano |first1=Paul |last2=Cotra |first2=Ajeya |last3=Xu |first3=Mark |date=December 2021 |title=Eliciting Latent Knowledge: How to tell if your eyes deceive you |url=https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit?usp=embed_facebook |access-date=2023-04-16 |website=Google Docs |publisher=Alignment Research Center |language=en}}</ref> ARC also develops techniques to identify and test whether an AI model is potentially dangerous.<ref name=":1" /> In April 2023, Christiano told ''[[The Economist]]'' that ARC was considering developing an industry standard for AI safety.<ref>{{Cite news |date=2023-04-19 |title=How generative models could go wrong |newspaper=[[The Economist]] |url=https://www.economist.com/science-and-technology/2023/04/19/how-generative-models-could-go-wrong |access-date=2023-11-16}}</ref>


As of April 2024, Christiano was listed as the head of AI safety for the US AI Safety Institute at [[NIST]].<ref>{{Cite web | title= Paul Christiano|url=https://www.nist.gov/people/paul-christiano |access-date=2024-05-22 |website=NIST.gov |date=April 17, 2024 |language=en-US}}</ref> One month earlier, in March 2024, staff members and scientists at the institute had threatened to resign upon being informed of Christiano's pending appointment to the role, stating that his ties to the [[effective altruism]] movement could jeopardize the AI Safety Institute's objectivity and integrity.<ref>{{Cite news |last=Goldman |first=Sharon |date=March 7, 2024 |title=NIST staffers revolt against expected appointment of 'effective altruist' AI researcher to US AI Safety Institute |url=https://venturebeat.com/ai/nist-staffers-revolt-against-potential-appointment-of-effective-altruist-ai-researcher-to-us-ai-safety-institute/ |newspaper=[[VentureBeat]] |access-date=2024-05-22}}</ref>

=== Views on AI risks ===
He is known for his views on the potential risks of advanced AI. In 2017, [[Wired (magazine)|''Wired'' magazine]] stated that Christiano and his colleagues at OpenAI were not worried about the destruction of the human race by "evil robots", explaining that "[t]hey're more concerned that, as AI progresses beyond human comprehension, the technology's behavior may diverge from our intended goals."<ref>{{Cite magazine |last=Newman |first=Lily Hay |date=September 2017 |title=Should We Worry? – Will AI Turn Against Me? |magazine=[[Wired (magazine)|Wired]] |url=https://www.wired.com/2017/08/dont-worry-be-happy/ |access-date=2023-11-16}}</ref>

However, in a widely quoted interview with ''[[Business Insider]]'' in 2023, Christiano said that there is a "10–20% chance of AI takeover, [with] many [or] most humans dead." He also conjectured a "50/50 chance of doom shortly after you have AI systems that are human level."<ref>{{Cite web |last=Nolan |first=Beatrice |title=Ex-OpenAI researcher says there's a 50% chance AI development could end in 'doom' |url=https://www.businessinsider.com/openai-researcher-ai-doom-50-chatgpt-2023-5 |access-date=2023-06-04 |website=Business Insider |language=en-US}}</ref><ref name=":0" />

== Personal life ==
Christiano is married to Ajeya Cotra of [[Open Philanthropy]].<ref>{{Cite news |last=Piper |first=Kelsey |date=June 2023 |title=A Field Guide to AI Safety |work=Asterisk Magazine |issue=3 |url=https://asteriskmag.com/issues/03/a-field-guide-to-ai-safety |access-date=2023-11-16}}</ref>


== References ==
{{Reflist}}


==External links==

* [https://paulfchristiano.com/ Personal website]

{{Authority control}}
[[Category:American theoretical computer scientists]]
[[Category:Year of birth missing (living people)]]
[[Category:Living people]]
