Cognitive Biases and AI Value Alignment: An Interview with Owain Evans

How can we teach artificial intelligence systems to act in accordance with human goals and values?

Many researchers interact with AI systems to teach them human values, using techniques like inverse reinforcement learning (IRL). In theory, with IRL, an AI system can learn what humans value and how to best assist them by observing human behavior and receiving human feedback.
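To make the idea concrete, here is a minimal sketch of preference inference in the spirit of IRL. The food example, the candidate weights, and the softmax-rationality assumption are all illustrative choices of mine, not taken from any particular system:

```python
import numpy as np

# Toy preference inference: guess how much an agent values "healthy" vs.
# "tasty" food from observed choices, assuming the agent picks options with
# probability proportional to exp(utility) (softmax / Boltzmann rationality).

options = {                       # feature vectors: (healthiness, tastiness)
    "salad":  np.array([1.0, 0.2]),
    "burger": np.array([0.1, 1.0]),
}
observed = ["salad", "burger", "salad", "salad"]   # hypothetical behavior log

# Candidate value functions: weight w on healthiness, (1 - w) on tastiness.
thetas = [np.array([w, 1 - w]) for w in np.linspace(0, 1, 101)]

def log_likelihood(theta):
    """Log-probability of the observed choices under softmax rationality."""
    utils = {name: theta @ feats for name, feats in options.items()}
    log_z = np.log(sum(np.exp(u) for u in utils.values()))
    return sum(utils[choice] - log_z for choice in observed)

best = max(thetas, key=log_likelihood)
print(f"inferred weight on healthiness: {best[0]:.2f}")
```

Because the log shows salad chosen three times out of four, maximum likelihood pushes the inferred weight toward healthiness. Note what the sketch cannot do: it has no way to tell whether the one burger choice was a genuine preference or a stress-induced bias, which is exactly the problem discussed below.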

But human behavior doesn’t always reflect human values, and human feedback is often biased. We say we want healthy food when we’re relaxed, but then we demand greasy food when we’re stressed. Not only do we often fail to live according to our values, but many of our values contradict each other. We value getting eight hours of sleep, for example, but we regularly sleep less because we also value working hard, caring for our children, and maintaining healthy relationships.

AI systems may be able to learn a lot by observing humans, but because of our inconsistencies, some researchers worry that systems trained with IRL will be fundamentally unable to distinguish between value-aligned and misaligned behavior. This could become especially dangerous as AI systems become more powerful: inferring the wrong values or goals from observing humans could lead these systems to adopt harmful behavior.

Distinguishing Biases and Values

Owain Evans, a researcher at the Future of Humanity Institute, and Andreas Stuhlmüller, president of the research non-profit Ought, have explored the limitations of IRL in teaching human values to AI systems. In particular, their research exposes how cognitive biases make it difficult for AIs to learn human preferences through interactive learning.

Evans elaborates: “We want an agent to pursue some set of goals, and we want that set of goals to coincide with human goals. The question then is, if the agent just gets to watch humans and try to work out their goals from their behavior, how much are biases a problem there?”

In some cases, AIs will be able to understand patterns of common biases. Evans and Stuhlmüller discuss the psychological literature on biases in their paper, Learning the Preferences of Ignorant, Inconsistent Agents, and in their online book, agentmodels.org. An example of a common pattern discussed in agentmodels.org is “time inconsistency.” Time inconsistency is the idea that people’s values and goals change depending on when you ask them. In other words, “there is an inconsistency between what you prefer your future self to do and what your future self prefers to do.”

Examples of time inconsistency are everywhere. For one, if you ask people before bed, most will say they value waking up early to exercise. But come morning, when it’s cold and dark out and they didn’t get those eight hours of sleep, they often value the comfort of their sheets and the virtues of relaxation. From waking up early to avoiding alcohol, eating healthy, and saving money, humans tend to expect more from their future selves than their future selves are willing to do.
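One standard way to model this pattern, used throughout agentmodels.org, is hyperbolic discounting: rewards are divided by (1 + k × delay), so immediate rewards loom disproportionately large. The reward values and discount rate below are made up for illustration, but they reproduce the preference reversal described above:

```python
def discounted(reward, delay, k=2.0):
    """Hyperbolic discounting: value falls off as 1 / (1 + k * delay)."""
    return reward / (1 + k * delay)

sleep_in, exercise = 6.0, 10.0   # hypothetical rewards (hours are in the future)

# The night before, both options are hours away, so exercising looks better:
night = (discounted(sleep_in, delay=8), discounted(exercise, delay=9))

# At 6 a.m., sleeping in is immediate and wins, even though nothing about
# the underlying rewards has changed:
morning = (discounted(sleep_in, delay=0), discounted(exercise, delay=1))

print(night)    # exercise has the higher discounted value
print(morning)  # sleeping in has the higher discounted value
```

The same two rewards produce opposite choices depending on when the agent evaluates them, which is precisely the inconsistency an observing AI would have to untangle.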

With systematic, predictable patterns like time inconsistency, IRL could make progress with AI systems. But often our biases aren’t so clear. According to Evans, deciphering which actions coincide with someone’s values and which actions spring from biases is difficult or even impossible in general.

“Suppose you promised to clean the house but you get a last minute offer to party with a friend and you can’t resist,” he suggests. “Is this a bias, or your value of living for the moment? This is a problem for using only inverse reinforcement learning to train an AI — how would it decide what are biases and values?”


Learning the Correct Values

Despite this conundrum, understanding human values and preferences is essential for AI systems, and developers have a very practical interest in training their machines to learn these preferences.

Already today, popular websites use AI to learn human preferences. With YouTube and Amazon, for instance, machine-learning algorithms observe your behavior and predict what you will want next. But while these recommendations are often useful, they have unintended consequences.

Consider the case of Zeynep Tufekci, an associate professor at the School of Information and Library Science at the University of North Carolina. After watching videos of Trump rallies to learn more about his voter appeal, Tufekci began seeing white nationalist propaganda and Holocaust denial videos on her “autoplay” queue. She soon realized that YouTube’s algorithm, optimized to keep users engaged, predictably suggests more extreme content as users watch more videos. This led her to call the website “The Great Radicalizer.”
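The drift Tufekci observed can be reproduced in a toy simulation. This is my own illustration of a greedy engagement maximizer, not a description of YouTube's actual system: if viewers are assumed to stay slightly longer on content a notch more intense than what they just watched, a recommender that maximizes predicted watch time ratchets toward the extreme end of the catalog:

```python
# Hypothetical engagement model: viewers watch longest when the next video is
# slightly more intense than the last one they saw.
def predicted_watch_time(current, candidate):
    return 1.0 - abs(candidate - (current + 0.1))

position = 0.0                          # 0 = mainstream, 1 = most extreme
candidates = [i / 100 for i in range(101)]

for _ in range(10):                     # ten autoplay steps
    # Greedily recommend whatever maximizes predicted watch time.
    position = max(candidates, key=lambda c: predicted_watch_time(position, c))

print(f"content extremeness after 10 videos: {position:.1f}")
```

Each individual recommendation is locally optimal for engagement, yet the session as a whole ends at the most extreme content available. No step in the loop consults the viewer's actual values.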

This value misalignment in YouTube algorithms foreshadows the dangers of interactive learning with more advanced AI systems. Instead of optimizing advanced AI systems to appeal to our short-term desires and our attraction to extremes, designers must be able to optimize them to understand our deeper values and enhance our lives.

Evans suggests that we will want AI systems that can reason through our decisions better than humans can, understand when we are making biased decisions, and “help us better pursue our long-term preferences.” However, this will sometimes entail that AIs suggest things that seem bad to humans at first blush.

One can imagine an AI system suggesting a brilliant, counterintuitive modification to a business plan, and the human just finds it ridiculous. Or maybe an AI recommends a slightly longer, stress-free driving route to a first date, but the anxious driver takes the faster route anyway, unconvinced.

To help humans understand AIs in these scenarios, Evans and Stuhlmüller have researched how AI systems could reason in ways that are comprehensible to humans and can ultimately improve upon human reasoning.

One method (invented by Paul Christiano) is called “amplification,” where humans use AIs to help them think more deeply about decisions. Evans explains: “You want a system that does exactly the same kind of thinking that we would, but it’s able to do it faster, more efficiently, maybe more reliably. But it should be a kind of thinking that if you broke it down into small steps, humans could understand and follow.”

A second concept is called “factored cognition”: the idea of breaking sophisticated tasks into small, understandable steps. According to Evans, it’s not clear how broadly factored cognition can succeed. Sometimes humans can break their reasoning down into small steps, but we often rely on intuition, which is much harder to decompose.
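The structure of factored cognition can be sketched with a deliberately simple stand-in task. Here the "hard question" is summing a long list, and each "worker" step is kept small enough to audit at a glance; the decomposition scheme is my own toy choice, not a proposal from Evans or Stuhlmüller:

```python
# Factored-cognition-style decomposition: a task too large for one
# comprehensible step is split into subtasks, each solved the same way,
# and the sub-answers are combined.

def solve(task):
    if len(task) <= 2:                  # small enough for one auditable step
        return sum(task)
    mid = len(task) // 2                # otherwise, decompose...
    return solve(task[:mid]) + solve(task[mid:])   # ...and combine answers

print(solve(list(range(100))))          # the whole answer via tiny steps
```

Every intermediate step here is individually checkable by a human, which is the property that matters; the open question is whether tasks that lean on intuition can be decomposed this cleanly at all.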
