multiagent reasoning — Jemoka Knowledge Base

simple games

constituents

agent i \in X, the set of agents
joint action space: A = A^{1} \times A^{2} \times … \times A^{k}
joint action, one action per agent: \vec{a} = (a^{1}, …, a^{k})
joint reward function: R(\vec{a}) = (R^{1}(\vec{a}), …, R^{k}(\vec{a}))

additional information

prisoner’s dilemma (row agent’s reward listed first)

            Cooperate   Defect
Cooperate   -1, -1      -4, 0
Defect      0, -4       -3, -3

traveler’s dilemma

two people write down the price of their luggage, between 2-100
the person who wrote the lower amount gets that value plus 2
the person who wrote the higher amount gets the lower amount minus 2

joint policy

agent utility for agent number i under the joint policy \vec{\pi}:

\begin{equation} U^{i} (\vec{\pi}) = \sum_{\vec{a} \in A} R^{i}(\vec{a}) \prod_{j} \pi^{j}(a^{j}) \end{equation}
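The utility sum can be checked numerically on the prisoner’s dilemma rewards given earlier. This is a minimal sketch, not a reference implementation: the names `ACTIONS`, `REWARD`, and `utility` are hypothetical, and policies are assumed to be plain dictionaries mapping each action to its probability.

```python
from itertools import product

# Prisoner's dilemma rewards, indexed by joint action.
# Each value is (reward to agent 0, reward to agent 1).
ACTIONS = ["cooperate", "defect"]
REWARD = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-4, 0),
    ("defect", "cooperate"): (0, -4),
    ("defect", "defect"): (-3, -3),
}

def utility(i, joint_policy):
    """Expected reward of agent i when agent j samples from joint_policy[j]."""
    total = 0.0
    # Sum over every joint action in A
    for joint_action in product(ACTIONS, repeat=len(joint_policy)):
        # Probability of the joint action: product of per-agent probabilities
        prob = 1.0
        for j, action in enumerate(joint_action):
            prob *= joint_policy[j][action]
        total += REWARD[joint_action][i] * prob
    return total

uniform = {"cooperate": 0.5, "defect": 0.5}
always_defect = {"cooperate": 0.0, "defect": 1.0}
print(utility(0, [uniform, always_defect]))  # -3.5
```

Here agent 0 mixes uniformly against an agent that always defects, so its utility is 0.5 \cdot (-4) + 0.5 \cdot (-3) = -3.5.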

The utility U^{i}(\vec{\pi}) is essentially the expected reward agent i receives when every agent draws its action from its own policy.

response model

how would other agents respond to our system?

a^{-i}: joint action of all agents except agent i
\vec{a} = (a^{i}, a^{-i}), so R(a^{i}, a^{-i}) = R(\vec{a})

best-response

deterministic best response model for agent i:

\begin{equation} \arg\max_{a^{i} \in A^{i}} U^{i}(a^{i}, \pi^{-i}) \end{equation}

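where the response of agent i is selected deterministically. For the prisoner’s dilemma, this results in both parties defecting: whatever the other agent does, defecting yields the higher reward (0 instead of -1 if the other cooperates, -3 instead of -4 if the other defects).

softmax response

it’s like the Softmax Method: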

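\begin{equation} \pi^{i}(a^{i}) \propto \exp\left(\lambda U^{i}(a^{i}, \pi^{-i})\right) \end{equation}

where \lambda \geq 0 is a precision parameter: \lambda = 0 gives a uniform policy, and \lambda \to \infty recovers the deterministic best response.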
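Both response models can be compared side by side on the prisoner’s dilemma. The following is a minimal sketch restricted to two agents; `action_utility`, `best_response`, and `softmax_response` are hypothetical names, and the other agent’s policy is assumed to be a dictionary from action to probability.

```python
import math

ACTIONS = ["cooperate", "defect"]
# REWARD[(a0, a1)] = (reward to agent 0, reward to agent 1)
REWARD = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-4, 0),
    ("defect", "cooperate"): (0, -4),
    ("defect", "defect"): (-3, -3),
}

def action_utility(i, action, other_policy):
    """U^i(a^i, pi^-i) for a two-agent game: agent i plays `action` for sure."""
    total = 0.0
    for other_action, prob in other_policy.items():
        joint = (action, other_action) if i == 0 else (other_action, action)
        total += prob * REWARD[joint][i]
    return total

def best_response(i, other_policy):
    """Deterministic best response: the arg max over agent i's own actions."""
    return max(ACTIONS, key=lambda a: action_utility(i, a, other_policy))

def softmax_response(i, other_policy, lam):
    """Stochastic response with probabilities proportional to exp(lam * U)."""
    weights = {
        a: math.exp(lam * action_utility(i, a, other_policy)) for a in ACTIONS
    }
    normalizer = sum(weights.values())
    return {a: w / normalizer for a, w in weights.items()}

other = {"cooperate": 0.5, "defect": 0.5}
print(best_response(0, other))          # defect
print(softmax_response(0, other, 1.0))  # most of the mass on defect
```

Raising `lam` concentrates the softmax response on the best-response action, while `lam = 0` ignores utilities and returns a uniform policy.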

fictitious play

play the game repeatedly: each agent keeps counts of the other agents’ past actions, treats the normalized counts as an estimate of their policies, and plays a best response to that estimate.

Dominant Strategy Equilibrium

A dominant strategy is a policy that is a best response to every possible policy of the other agents; a Dominant Strategy Equilibrium is a joint policy in which every agent plays a dominant strategy. Not all games have a Dominant Strategy Equilibrium, because there are games in which the best response always depends on the others’ strategies (rock paper scissors).

Nash Equilibrium

A Nash Equilibrium is a joint policy \vec{\pi} in which every agent is following a best response to the others: i.e. no one has an incentive to unilaterally change their policy. One exists for every game with a finite action space. In general, a Nash Equilibrium is very hard to compute: the problem is PPAD-complete (a class whose relationship to NP-complete is unclear).
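Fictitious play and the Nash condition can be illustrated on the two-agent prisoner’s dilemma. This is a rough sketch under stated assumptions: counts start at 1 for every action (an assumed uniform prior), both agents update simultaneously, and `is_pure_nash` only checks deterministic joint actions rather than general mixed policies. All function names are hypothetical.

```python
ACTIONS = ["cooperate", "defect"]
# REWARD[(a0, a1)] = (reward to agent 0, reward to agent 1)
REWARD = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-4, 0),
    ("defect", "cooperate"): (0, -4),
    ("defect", "defect"): (-3, -3),
}

def action_utility(i, action, other_policy):
    """Expected reward of agent i playing `action` against other_policy."""
    total = 0.0
    for other_action, prob in other_policy.items():
        joint = (action, other_action) if i == 0 else (other_action, action)
        total += prob * REWARD[joint][i]
    return total

def best_response(i, other_policy):
    return max(ACTIONS, key=lambda a: action_utility(i, a, other_policy))

def fictitious_play(n_rounds):
    """Return counts[i][a]: how often agent i was observed playing a."""
    # Assumption: every action starts with a count of 1
    counts = [{a: 1 for a in ACTIONS} for _ in range(2)]
    for _ in range(n_rounds):
        # Normalized counts serve as the estimate of each agent's policy
        estimates = [
            {a: c / sum(counts[j].values()) for a, c in counts[j].items()}
            for j in range(2)
        ]
        # Each agent best-responds to its estimate of the other agent
        joint = [best_response(i, estimates[1 - i]) for i in range(2)]
        for i, action in enumerate(joint):
            counts[i][action] += 1
    return counts

def is_pure_nash(joint_action):
    """True if no agent gains by unilaterally switching its action."""
    for i in range(2):
        other = {a: 1.0 if a == joint_action[1 - i] else 0.0 for a in ACTIONS}
        current = action_utility(i, joint_action[i], other)
        if any(action_utility(i, a, other) > current for a in ACTIONS):
            return False
    return True

counts = fictitious_play(100)
print(counts)
print(is_pure_nash(("defect", "defect")))
```

Because defecting is a dominant strategy here, every round of fictitious play adds to the defect counts, and the joint action (defect, defect) passes the Nash check while (cooperate, cooperate) does not.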