Computer Science / Software Engineering Notes Network

Intelligent Agents

Matthew Barnes

Overview
Introduction to Intelligent Agents
Agents
Intelligent Agents
Multi-Agent System (MAS)
Agent-Based Negotiation
Part 1
Ultimatum Game
Alternating Offers
Monotonic Concession Protocol
Divide and Choose
Part 2
Utility functions
Utility space
Fairness
Part 3
Basic negotiation strategies
Uncertainty about the opponent
Preference uncertainty
Utility Theory
Decision under certainty
Decision under uncertainty
Pure Strategy Games
Strategic-form Games
Strictly dominated strategies
Weakly dominated strategies
Pure Nash Equilibria
Mixed Strategy Games
Mixed Nash Equilibria
Computing Mixed Nash Equilibria
Best Response Method
Indifference Principle
Extensive-Form Games
Strategic-Form Transformation
Equilibria
Backward Induction
Linear Programming
Computing Nash Equilibrium with LP
Structure of linear programs
Transforming linear programs
Beyond linear programming
Preference Elicitation
Preference ordering
Preference uncertainty
Multi Criteria Ranking using Multi-Attribute Additive Functions
Ordinal Regression via Linear Programming (UTA)
UTAGMS - Robust Ordinal Regression
Auctions
Types of auctions
Analysing Auctions
First-price auction expected utility
Vickrey Auction: Discrete bids expected utility
Vickrey Auction: Continuous bids expected utility
Strategy-proofness
Bayes-Nash Framework
Revenue Equivalence
Trading Agents
Voting
The Social Choice Problem
Plurality Vote
Copeland Method
Borda Count
Properties of voting procedures
Social Welfare Functions
Social Choice Functions


Overview

Introduction to Intelligent Agents

Agents

“A computer system capable of autonomous action in some environment, in order to achieve its delegated goals”

A pretty plain-looking thermostat. It doesn’t even have RGB.

Intelligent Agents

Multi-Agent System (MAS)

A multi-agent system of smol robots.

You could even teach a bunch of robot agents to play football with each other.

Agent-Based Negotiation

Part 1

To enable auctioning within agents, you must

3D-print a small gavel for them to use.

Voting: a core component of a civilised society,

according to Lord of the Flies.

An illustration of a negotiation;

not to be confused with TCP/IP.

Ultimatum Game

  1. Agent 1 suggests a division of the cake.
  2. Agent 2 can choose to accept or reject the division.
  3. If agent 2 rejects, nobody gets any cake.

Agent 1 is not a very reasonable agent.

Alternating Offers

  1. Agent 1 starts off with an offer.
  2. Agent 2 can accept or reject. If agent 2 rejects, agent 2 can make a counter-offer.
  3. Agent 1 can accept or reject the counter-offer. If agent 1 rejects, agent 1 can make a counter-counter-offer.
  4. This keeps on looping until an agreement is made.

How can they negotiate how much cake to eat

if they don’t have hands to eat with?

  1. The vendor suggests ¥1,000 for five kebabs.
  2. Joseph makes a counter-offer of ¥250.
  3. The vendor makes a counter-counter-offer for ¥700.
  4. They converge with the offers ¥300, then ¥600, then ¥350, then ¥550, then ¥400, then ¥450, until eventually they reach a consensus with ¥425.
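A toy sketch of this alternating-offers loop, assuming (purely for illustration) that each side concedes by a fixed step and they split any remaining gap. The real haggle above concedes by varying amounts, so its meeting point differs:

```python
def alternating_offers(buyer_start, seller_start, step=50):
    """Toy alternating-offers haggle: the buyer and seller take turns
    conceding by a fixed step until their offers meet or cross."""
    buyer, seller = buyer_start, seller_start
    buyers_turn = False  # the seller opened, so the buyer responds first
    while buyer < seller:
        if buyers_turn:
            seller -= step          # seller concedes downwards
        else:
            buyer += step           # buyer concedes upwards
        buyers_turn = not buyers_turn
    return (buyer + seller) // 2    # split whatever gap remains

print(alternating_offers(buyer_start=250, seller_start=1000, step=50))
```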

Monotonic Concession Protocol

Round 1 — Buyer: £5, Seller: £30

The buyer wants to buy something, and the seller is selling it at a haggled price. The buyer makes an offer of £5, and the seller makes an offer of £30. These are shown simultaneously.

The buyer thinks their offer of £5 is way better for them than the seller's deal of £30, and the seller thinks their offer of £30 is way better for them than the buyer's deal of £5. Therefore, another round begins.

Round 2 — Buyer: £10, Seller: £20

Both agents concede and make new offers. The prices converge. The buyer still thinks their offer is better for them than the seller's offer, and vice versa. Another round!

Round 3 — Buyer: £10 + 🐐, Seller: £20

The seller does not concede and sticks with the same offer they had before; they're certain that their product is worth £20 at the least. The buyer is still making an offer of £10, but is also throwing their best goat into the deal.

The seller thinks. Yesterday, at the market, the cheapest goat went for about £12. That means the buyer is basically paying £22, with the value of the goat included. That's even more than the seller is asking!

Right now, the seller rates the buyer's offer at least as highly as the seller's own offer. That means an agreement can be made.

The seller walks away with £10 and a new goat, and the buyer walks away with whatever it was the seller was selling (use your imagination).
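Below is a minimal sketch of the agreement rule this protocol uses: each round, an agent accepts as soon as it rates the opponent's offer at least as highly as its own. The goat valuations are invented to reproduce the story above.

```python
# Hypothetical valuations: the goat is worth £12 to the seller but only £8
# to the buyer (numbers made up to match the story above).
GOAT_VALUE_TO_SELLER = 12
GOAT_VALUE_TO_BUYER = 8

buyer_offers  = [{"cash": 5},  {"cash": 10}, {"cash": 10, "goat": True}]
seller_offers = [{"cash": 30}, {"cash": 20}, {"cash": 20}]

def seller_gain(offer):
    """What the seller receives if this offer is the one accepted."""
    return offer["cash"] + (GOAT_VALUE_TO_SELLER if offer.get("goat") else 0)

def buyer_cost(offer):
    """What accepting this offer costs the buyer."""
    return offer["cash"] + (GOAT_VALUE_TO_BUYER if offer.get("goat") else 0)

for round_no, (b, s) in enumerate(zip(buyer_offers, seller_offers), start=1):
    # Agreement: one side rates the opponent's offer at least as highly as its own.
    if seller_gain(b) >= seller_gain(s):
        print(f"Round {round_no}: seller accepts the buyer's offer", b)
        break
    if buyer_cost(s) <= buyer_cost(b):
        print(f"Round {round_no}: buyer accepts the seller's offer", s)
        break
else:
    print("No agreement (conflict deal)")
```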

Divide and Choose

Mmmm, cake 🍰

Part 2

Utility functions

Utility space

Is it worth putting icing on an ice cake?

I just wanted to be different.

Fairness

Part 3

Basic negotiation strategies

  1. β → 0 - Hard-headed
  2. β = 1 - Linear time-dependent concession
  3. β < 1 - Boulware
  4. β > 1 - Conceder (see the sketch below)
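These β values usually enter through a time-dependent concession function such as f(t) = k + (1 − k)(t/t_max)^(1/β), the usual Faratin-style tactic (the exact formula in the slides may differ). A small sketch:

```python
def concession(t, t_max, beta, k=0.0):
    """Fraction of the way from the best value to the reservation value at time t.
    beta < 1: Boulware (concede late); beta > 1: Conceder (concede early);
    beta = 1: linear; beta -> 0: hard-headed (barely concedes until t_max)."""
    return k + (1 - k) * (min(t, t_max) / t_max) ** (1 / beta)

def target_utility(t, t_max, beta, u_best=1.0, u_reserve=0.0):
    """Utility the agent demands at time t: starts at u_best, ends at u_reserve."""
    return u_best - concession(t, t_max, beta) * (u_best - u_reserve)

for beta in (0.1, 0.5, 1.0, 5.0):
    print(beta, [round(target_utility(t, 10, beta), 2) for t in range(0, 11, 2)])
```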

Uncertainty about the opponent

If you really did know everything,

you probably wouldn’t be reading these notes.

Preference uncertainty

Utility Theory

An example of a game. Although the kind of

games we’re reasoning with are a bit simpler.

Decision under certainty

A utility function u represents the preference relation ⪰ when, for all outcomes o and o′, o ⪰ o′ iff u(o) ≥ u(o′).

Utility values are not legal tender and cannot be used in shops.

Under certainty, the agent simply picks an outcome o* ∈ argmax u(o), where the maximisation is over the set of possible outcomes O.

Decision under uncertainty

Mmmm, ice cream 🍨

              Ordinal                                 Cardinal
Outcomes      Preference relation over outcomes       Utility functions
Lotteries     Preference relation over lotteries      Expected utilities

  1. ⪰ satisfies continuity and independence.
  2. There exists a utility function u over the outcomes such that, for all lotteries L and L′, L ⪰ L′ iff EU(L) ≥ EU(L′).

The equivalence runs in both directions:

  1. If ⪰ satisfies continuity and independence, then L ⪰ L′ exactly when EU(L) ≥ EU(L′).
  2. If L ⪰ L′ exactly when EU(L) ≥ EU(L′), then ⪰ satisfies continuity and independence.
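A quick sketch of comparing two lotteries by expected utility; the outcomes, probabilities and utility numbers are made up for illustration:

```python
# A lottery is a list of (probability, outcome) pairs.
utility = {"ice cream": 10, "cake": 6, "nothing": 0}  # illustrative values

def expected_utility(lottery):
    assert abs(sum(p for p, _ in lottery) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * utility[outcome] for p, outcome in lottery)

L1 = [(0.5, "ice cream"), (0.5, "nothing")]   # risky lottery
L2 = [(1.0, "cake")]                          # safe lottery

# Under the vNM axioms, the agent prefers whichever lottery has higher EU.
print(expected_utility(L1), expected_utility(L2))  # 5.0 vs 6.0 -> prefer L2
```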

Pure Strategy Games

Strategic-form Games

Strictly dominated strategies

Weakly dominated strategies

  1. You miss out on utility compared to what the other strategy would have given you, or
  2. The utility you get is no different from what the other strategy would have given you.

Formally, a strategy sᵢ is weakly dominated by another strategy sᵢ′ if (see the check sketched below):

  1. For every strategy vector s₋ᵢ of the other players, uᵢ(sᵢ, s₋ᵢ) ≤ uᵢ(sᵢ′, s₋ᵢ), and
  2. There exists a strategy vector s₋ᵢ of the other players such that uᵢ(sᵢ, s₋ᵢ) < uᵢ(sᵢ′, s₋ᵢ).
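A small check of these dominance conditions on a made-up payoff matrix for the row player (the numbers are invented purely to exercise the definitions):

```python
import numpy as np

# Row player's payoffs: payoff[i][j] = u_row(row strategy i, opponent strategy j).
payoff = np.array([[3, 1, 4],
                   [3, 0, 4],
                   [2, 0, 1]])

def strictly_dominates(payoff, better, worse):
    """True if row `better` gives strictly more than row `worse` against every column."""
    return np.all(payoff[better] > payoff[worse])

def weakly_dominates(payoff, better, worse):
    """True if row `better` is never worse than row `worse`, and strictly better somewhere."""
    return np.all(payoff[better] >= payoff[worse]) and np.any(payoff[better] > payoff[worse])

print(strictly_dominates(payoff, 0, 2))  # True: row 0 beats row 2 in every column
print(weakly_dominates(payoff, 0, 1))    # True: ties in two columns, strictly better in one
print(strictly_dominates(payoff, 0, 1))  # False: rows 0 and 1 tie against columns 0 and 2
```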

Pure Nash Equilibria

Mixed Strategy Games

I wonder what the mixed Nash equilibrium of poker is?

(more on mixed Nash equilibrium later!)

Mixed Nash Equilibria

Hype for competitive Matching Pennies at the next Olympics

Computing Mixed Nash Equilibria

Best Response Method

Indifference Principle

Extensive-Form Games

Ever noticed how Computer Science tables and trees

look nothing like their real-life counterparts?

Strategic-Form Transformation

Rows are P1's strategies, columns are P2's strategies, and each cell reads P1's payoff / P2's payoff.

            (C,E)     (C,F)     (D,E)     (D,F)
(A,G)       3 / 8     3 / 8     8 / 3     8 / 3
(A,H)       3 / 8     3 / 8     8 / 3     8 / 3
(B,G)       5 / 5     2 / 10    5 / 5     2 / 10
(B,H)       5 / 5     1 / 0     5 / 5     1 / 0

For example, reading the table: against P2's strategy (C,E), P1's strategy (A,G) gives payoffs 3 / 8, and against (D,F), the strategy (B,G) gives 2 / 10.

Equilibria

            (C,E)     (C,F)     (D,E)     (D,F)
(A,G)       3 / 8     3 / 8     8 / 3     8 / 3
(A,H)       3 / 8     3 / 8     8 / 3     8 / 3
(B,G)       5 / 5     2 / 10    5 / 5     2 / 10
(B,H)       5 / 5     1 / 0     5 / 5     1 / 0

Reading off best responses, the pure Nash equilibria of this table are ((A,G),(C,F)), ((A,H),(C,F)) and ((B,H),(C,E)).

Some of these choices are fine, like when P2 picks C over D because it maximises their utility.

However, look at P1 picking H over G. Why would they do that? They’re getting a utility of 1 instead of 2. They’re minimising their own utility.

I know that choice doesn’t affect things in this strategy, but is that the sort of behaviour you’d expect from a Nash equilibrium?

This is where subgame perfect equilibria come in: strategy profiles that are a Nash equilibrium in every subgame of the game.

Yes, the game is its own subgame.

What about ((B,H),(C,E)) and ((A,H),(C,F))?

No, these strategies are not subgame perfect.

Why? Because in the subgame with just G and H, P1 picks H and gets a utility of 1 instead of picking G and getting a utility of 2.

P1 can maximise their utility by picking a different pure strategy, so in this subgame, it’s not a Nash equilibrium.

((A,G),(C,F))

Let’s go through each subgame and see if this strategy is a Nash equilibrium in all of them.

G     2 / 10
H     1 / 0

G is a Nash equilibrium in this subgame.

In our strategy, P1 plays G in (A,G), so our strategy is a Nash equilibrium in this subgame.

C     3 / 8
D     8 / 3

C is a Nash equilibrium in this subgame.

In our strategy, P2 plays C in (C,F), so our strategy is a Nash equilibrium in this subgame too.

      E        F
G     5 / 5    2 / 10
H     5 / 5    1 / 0

(G,F) and (H,E) are Nash equilibria in this subgame.

In our strategy, P1 plays G in (A,G) and P2 plays F in (C,F), so our strategy is a Nash equilibrium in this subgame.

Backward Induction

It’s like mathematical induction, but instead of dominos, it’s, uh... trees.

  1. Take the tree representing an extensive-form game with n players.
  2. Starting from the smallest subgames (i.e. those rooted at nodes whose children are all terminal nodes), compute the Nash equilibrium for the player making the choice at that node.
  3. Replace the subgame with the utility vector obtained, i.e. make the root node of the subgame a terminal node and label it with the utility of the equilibrium.
  4. Repeat the process for every subgame rooted in a node leading only to terminal nodes, i.e. compute the Nash equilibrium and replace the subgame with the result of its equilibrium.
  5. If at any step a subgame has multiple Nash equilibria (i.e. when the utilities of different choices are the same), select any of them.
  6. The algorithm terminates when we reach the root node of the whole game, and runs in time polynomial in the size of the tree.
  7. For every player i, let sᵢ be the vector whose components are the choices made by player i when selecting the equilibria of the subgames.
  8. The strategy profile s = (s₁, …, sₙ) is a subgame perfect equilibrium, and the utility vector at the root node is the value of the equilibrium (sketched in code below).
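A minimal sketch of this procedure in code. The tree below is reverse-engineered from the strategic-form table above, so treat its exact layout as an assumption; payoffs are written (P1, P2):

```python
# Each decision node: (player, node_name, {action: child}); terminals: payoff tuples (u1, u2).
tree = (1, "root", {
    "A": (2, "after A", {"C": (3, 8), "D": (8, 3)}),
    "B": (2, "after B", {"E": (5, 5),
                         "F": (1, "after B,F", {"G": (2, 10), "H": (1, 0)})}),
})

def backward_induction(node, choices):
    """Return the equilibrium payoff of the subgame rooted at `node`, filling
    `choices` with the action picked at every decision node (on- and off-path)."""
    if len(node) == 2:                      # terminal node: a payoff tuple (u1, u2)
        return node
    player, name, children = node
    best_payoff, best_action = None, None
    for action, child in children.items():
        payoff = backward_induction(child, choices)
        # The player to move maximises their own component of the payoff vector.
        if best_payoff is None or payoff[player - 1] > best_payoff[player - 1]:
            best_payoff, best_action = payoff, action
    choices[name] = (f"P{player}", best_action)
    return best_payoff

choices = {}
print(backward_induction(tree, choices))   # (3, 8): the subgame perfect outcome
print(choices)  # {'after A': ('P2', 'C'), 'after B,F': ('P1', 'G'), 'after B': ('P2', 'F'), 'root': ('P1', 'A')}
```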

((_,G),(_,_))

First, let’s look at this subgame on the right. P1 is playing, and naturally, they’d want to maximise their utility.

That means P1 would want to pick the left choice ‘G’, right? They’d get a utility of 2 instead of 1.

Therefore, P1’s choice at this node is G, and we replace this non-terminal subgame root node with a terminal node of utility (2,10).

((_,G),(_,F))

Now, let’s look at the next subgame on the right. P2 is playing, so they’d want to maximise their utility.

P2 would pick the right option ‘F’, so they’d get a utility of 10 instead of 5.

So, like before, we remember that P2 should pick F at this node, and replace this subgame with a terminal node of utility (2,10).

((_,G),(C,F))

Let’s look at the subgame on the left. Again, P2 is playing.

P2 should pick ‘C’, as they’ll get a utility of 8 instead of 3. This subgame will be replaced with (3,8).

((A,G),(C,F))

This is the final subgame! P1 is playing.

P1 should pick ‘A’, as they’ll get 3 instead of 2.

Since this is the final subgame, we have now compiled our entire strategy. This strategy is subgame perfect, and will yield P1 and P2 utilities of 3 and 8, respectively.

I wanted to put a picture of a centipede here, but I couldn’t find a good or cute picture of a centipede; only disgusting ones, so here’s a clipart caterpillar instead.

Linear Programming

Computing Nash Equilibrium with LP
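The slides' exact formulation isn't reproduced here, but for a two-player zero-sum game a standard approach is to solve the row player's maximin linear program. A sketch using scipy.optimize.linprog on Matching Pennies:

```python
import numpy as np
from scipy.optimize import linprog

# Row player's payoffs in a zero-sum game (Matching Pennies as an example).
A = np.array([[ 1, -1],
              [-1,  1]])
m, n = A.shape

# Variables: x_1..x_m (mixed strategy) and v (the game value). Maximise v
# subject to  sum_i x_i * A[i, j] >= v  for every column j, and sum_i x_i = 1.
c = np.zeros(m + 1)
c[-1] = -1                                   # linprog minimises, so minimise -v
A_ub = np.hstack([-A.T, np.ones((n, 1))])    # v - sum_i x_i * A[i, j] <= 0
b_ub = np.zeros(n)
A_eq = np.append(np.ones(m), 0).reshape(1, -1)
b_eq = [1.0]
bounds = [(0, None)] * m + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("mixed strategy:", res.x[:m])          # ~[0.5, 0.5]
print("game value:", res.x[-1])              # ~0.0
```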

Structure of linear programs

Unique solutions

Multiple solutions

Unbounded solutions

Transforming linear programs

You can introduce slack variables,

but you can’t procrastinate from work with them.
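As a generic illustration (not a specific example from the notes), an inequality constraint becomes an equality once a non-negative slack variable is added:

```latex
% Adding a slack variable s >= 0 turns an inequality into an equality:
\[
  a_1 x_1 + a_2 x_2 \le b
  \quad\Longrightarrow\quad
  a_1 x_1 + a_2 x_2 + s = b, \qquad s \ge 0
\]
```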

Beyond linear programming

Preference Elicitation

Preference ordering

Preference uncertainty

I wanted to reference some other flavours,

but there weren’t enough unique text colours.

Multi Criteria Ranking using Multi-Attribute Additive Functions

Flavours       Sweetness (g1)    Texture (g2)    Richness (g3)
Strawberry     70                60              20
Vanilla        60                60              30
Chocolate      40                50              80
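A minimal sketch of an additive multi-attribute value function over this table. The weights and the (linear) marginal value functions are invented for illustration, not taken from the notes:

```python
# Evaluations g_i from the table above.
flavours = {
    "Strawberry": {"sweetness": 70, "texture": 60, "richness": 20},
    "Vanilla":    {"sweetness": 60, "texture": 60, "richness": 30},
    "Chocolate":  {"sweetness": 40, "texture": 50, "richness": 80},
}

# Hypothetical weights and linear marginal value functions u_i (scaled to [0, 1]).
weights = {"sweetness": 0.5, "texture": 0.2, "richness": 0.3}
scale = {"sweetness": 100, "texture": 100, "richness": 100}

def U(alternative):
    """Additive model: U(x) = sum_i w_i * u_i(g_i(x)), with u_i linear here."""
    return sum(weights[i] * alternative[i] / scale[i] for i in weights)

for name, g in sorted(flavours.items(), key=lambda kv: -U(kv[1])):
    print(f"{name}: {U(g):.3f}")
```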

Ordinal Regression via Linear Programming (UTA)

Contrary to popular belief, the UTA method was not invented in Utah.

A visual representation of linear interpolation.

There's no way I’m typing all that.

  1. This constraint makes sure that whenever c ≻ d in the reference ranking, the model gives U(c) > U(d).
  2. This constraint makes sure that whenever c ∼ d, the model gives U(c) = U(d).
  3. This constraint makes sure that the marginal value at each breakpoint is bigger than at the preceding breakpoint, so e.g. it makes sure that u1(75) is bigger than u1(50).
  4. This constraint makes sure that the marginal value at the worst evaluation is always 0 (in other words, the marginal value functions always start at 0).
  5. This constraint ensures that the ui's are chosen so that U is normalised.
  6. This constraint defines the range of the error values (0 or greater).
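Since UTA evaluates an alternative by linearly interpolating each marginal value function between its breakpoints (the linear interpolation mentioned above), here is a small sketch of that step; the breakpoints and their marginal values are made up:

```python
# Hypothetical breakpoints (evaluation levels) and the marginal values the LP
# assigned to them for one criterion, e.g. sweetness.
breakpoints = [0, 25, 50, 75, 100]
u_at_break  = [0.0, 0.05, 0.15, 0.30, 0.40]

def marginal_value(g, xs=breakpoints, us=u_at_break):
    """Piecewise-linear interpolation of u_i between consecutive breakpoints."""
    if g <= xs[0]:
        return us[0]
    if g >= xs[-1]:
        return us[-1]
    for (x0, x1), (u0, u1) in zip(zip(xs, xs[1:]), zip(us, us[1:])):
        if x0 <= g <= x1:
            return u0 + (g - x0) / (x1 - x0) * (u1 - u0)

print(marginal_value(60))  # 0.15 + (10/25) * 0.15 = 0.21
```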

UTAGMS - Robust Ordinal Regression

  1. Whenever c ≻ d in the reference ranking, the model must give U(c) > U(d).
  2. Whenever c ∼ d, the model must give U(c) = U(d).
  3. The marginal value of an outcome, with respect to an issue, should be at least as great as the marginal value of another outcome whose evaluation on that issue is lower. E.g. given that gi(strawberry) = 70 and gi(vanilla) = 60, this checks that ui(gi(strawberry)) ≥ ui(gi(vanilla)).
  4. This checks the range of the marginal values of outcomes. It checks that the marginal value of the outcome with the smallest evaluation doesn't go below 0, and that the marginal value of the outcome with the biggest evaluation isn't above the maximum.
  5. This sets the minimum marginal value to 0.
  6. This normalises the value function U.

Save your breath asking me, I have no idea why it’s called that.

Not even the slides explain.

  1. Whenever c ≻ d in the reference ranking, the model must give U(c) > U(d).
  2. Whenever c ∼ d, the model must give U(c) = U(d).
  3. The marginal value at each characteristic point is at least as great as the marginal value at the characteristic point that came before it. For example, if gi(strawberry) = 70 and gi(bubblegum) = 65, this constraint checks that ui(gi(strawberry)) ≥ ui(gi(bubblegum)).
  4. The marginal value at the smallest characteristic point should be 0.
  5. This normalises the value function U.

Auctions

I’m auctioning off a froggy chair.

Types of auctions

English auction:

  1. Auctioneer starts at a low “reservation price”
  2. Bidders shout ascending prices, with some minimum increment set by the auctioneer.
  3. Auction ends once bidders stop bidding
  4. Highest bidder wins the item and pays their bid.

Dutch auction:

  1. Auctioneer starts at a high bidding price
  2. Auctioneer lowers the price until someone bids
  3. The item is allocated to the first bidder at that current price

Sealed-bid auction:

  1. Each bidder submits a bid in a sealed envelope
  2. Bidders do not see each other’s bids
  3. Bids are collected by the auctioneer
  4. The auctioneer determines the winner and the price to pay
  • This depends on the type of sealed-bid auction.
  • In a first-price sealed-bid auction, the highest bidder wins and pays the price they bid.

Vickrey (second-price sealed-bid) auction:

  • A type of sealed-bid auction where the highest bidder wins, but they pay the second highest bid (or reserve price, whichever is higher).
  • Named after William Vickrey.
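A small sketch of the two payment rules side by side, with made-up bids:

```python
def first_price_outcome(bids):
    """First-price rule: the winner pays their own bid."""
    winner = max(bids, key=bids.get)
    return winner, bids[winner]

def vickrey_outcome(bids, reserve=0):
    """Vickrey rule: the winner pays the second-highest bid (or the reserve, if higher)."""
    winner = max(bids, key=bids.get)
    others = [b for name, b in bids.items() if name != winner]
    return winner, max(max(others, default=reserve), reserve)

bids = {"alice": 120, "bob": 95, "carol": 110}   # made-up bids
print(first_price_outcome(bids))   # ('alice', 120)
print(vickrey_outcome(bids))       # ('alice', 110)
```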

Analysing Auctions

Alongside Bayesian games, we’ll also be covering FNAF theories.

First-price auction expected utility

Vickrey Auction: Discrete bids expected utility

Vickrey Auction: Continuous bids expected utility

Strategy-proofness

If we bid higher than our true valuation:

  • If we would have won anyway: equal utility. We still win, and pay the same amount.
  • If we would have lost: equal or lower utility. We either still lose and pay nothing, or we win but pay more than our valuation (negative utility).

If we bid lower than our true valuation:

  • If we would have won: equal or lower utility. We either win and pay the same amount, or lose and get nothing.
  • If we would have lost: equal utility. We still lose and pay nothing.
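A tiny brute-force check of this case analysis, with made-up valuations and opposing bids:

```python
def vickrey_utility(my_bid, my_value, other_bids, reserve=0):
    """Utility in a second-price auction: win if we have the strictly highest
    bid, and pay the larger of the highest opposing bid and the reserve."""
    highest_other = max(max(other_bids), reserve)
    if my_bid > highest_other:
        return my_value - highest_other
    return 0

my_value = 10
for other_bids in ([4, 8], [4, 8, 13]):   # first we'd win truthfully, then we'd lose
    truthful = vickrey_utility(my_value, my_value, other_bids)
    for deviation in range(0, 21):        # try every integer bid from 0 to 20
        assert vickrey_utility(deviation, my_value, other_bids) <= truthful
print("No deviation beats bidding our true valuation in these scenarios.")
```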

Bayes-Nash Framework

Revenue Equivalence

Trading Agents

If you feel like procrastinating,

here’s a Drawception I found about robot auctioneers.

Voting

If you don’t like this example, then you won’t like the rest of this section.

The Social Choice Problem

You go to Sprinkles, with 14 other people.

There is a world shortage of ice-cream, so you can only pick one scoop of one flavour to share.

Each voter has a numerical ID, and a numerical preference shown on the right, with 1 being the preferred option and 3 being the least preferred.

For voter 1, their preference would be:

strawberry ≻ vanilla ≻ chocolate

Voter    Strawberry    Vanilla    Chocolate
1        1             2          3
2        1             2          3
3        1             2          3
4        1             2          3
5        1             2          3
6        1             2          3
7        3             2          1
8        3             2          1
9        3             2          1
10       3             2          1
11       3             2          1
12       3             2          1
13       2             3          1
14       2             1          3
15       2             1          3

Plurality Vote

Copeland Method

Borda Count

Voter          Strawberry    Vanilla    Chocolate
1              1 (3)         2 (2)      3 (1)
2              1 (3)         2 (2)      3 (1)
3              1 (3)         2 (2)      3 (1)
4              3 (1)         1 (3)      2 (2)
5              3 (1)         1 (3)      2 (2)
Borda count    11            12         7
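A small sketch that recomputes these Borda scores from the five-voter table above, using the same rank-to-points convention (1st = 3 points, 2nd = 2, 3rd = 1):

```python
# Ranks from the table above: 1 = most preferred, 3 = least preferred.
ballots = [
    {"Strawberry": 1, "Vanilla": 2, "Chocolate": 3},
    {"Strawberry": 1, "Vanilla": 2, "Chocolate": 3},
    {"Strawberry": 1, "Vanilla": 2, "Chocolate": 3},
    {"Strawberry": 3, "Vanilla": 1, "Chocolate": 2},
    {"Strawberry": 3, "Vanilla": 1, "Chocolate": 2},
]

def borda(ballots, points={1: 3, 2: 2, 3: 1}):
    """Sum the points each candidate earns from its rank on every ballot."""
    scores = {}
    for ballot in ballots:
        for candidate, rank in ballot.items():
            scores[candidate] = scores.get(candidate, 0) + points[rank]
    return scores

print(borda(ballots))  # {'Strawberry': 11, 'Vanilla': 12, 'Chocolate': 7} -> Vanilla wins
```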

Properties of voting procedures

Social Welfare Functions

Social Choice Functions

In a manipulable voting system,

it’s possible to lie to get a better outcome.
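As a made-up illustration (not an example from the notes), here is plurality being manipulated: two voters who prefer a hopeless candidate misreport their first choice and swing the winner to their second choice:

```python
from collections import Counter

def plurality_winner(ballots):
    """Each ballot is a ranking (best first); only the top choice counts."""
    tally = Counter(ballot[0] for ballot in ballots)
    return tally.most_common(1)[0][0]

# 4 voters: A > B > C, 3 voters: B > C > A, 2 voters: C > B > A.
honest = [("A", "B", "C")] * 4 + [("B", "C", "A")] * 3 + [("C", "B", "A")] * 2
print(plurality_winner(honest))        # A wins 4-3-2

# The two C-supporters lie and report B first; B now wins, which they prefer to A.
manipulated = [("A", "B", "C")] * 4 + [("B", "C", "A")] * 3 + [("B", "C", "A")] * 2
print(plurality_winner(manipulated))   # B wins 5-4
```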