Suppose you have a dataset \left(x^{(1)}, y^{(1)}\right), \dots, \left(x^{(n)}, y^{(n)}\right), where y^{(i)} \in \left\{1,2,3,4\right\}. We can learn a model of this by maximizing the likelihood:
\begin{align} \max_{\theta} L\left(\theta\right) &= \prod_{i=1}^{n} p\left(y^{(i)} \mid x^{(i)}; \theta\right) \\ &= \prod_{i=1}^{n}\theta_{1}^{1\left\{y^{(i)} = 1\right\}} \dots \theta_{4}^{1\left\{y^{(i)} = 4\right\}} \end{align}
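As a short worked step (a sketch in the same notation, where \ell is a name introduced here for the log of L), taking logs turns the product into a sum over examples and classes:
\begin{equation} \ell\left(\theta\right) = \log L\left(\theta\right) = \sum_{i=1}^{n} \sum_{k=1}^{4} 1\left\{y^{(i)} = k\right\} \log \theta_{k} \end{equation}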
The derivative of this objective ends up being nice.

Derivation

Consider a multinomial distribution over 4 outcomes. Let’s write this in terms of an exponential family. Consider:
\begin{equation} \begin{cases} T\left(1\right) = \mqty(1 & 0 & 0) \\ T\left(2\right) = \mqty(0&1&0) \\ T\left(3\right) = \mqty(0&0&1) \\ T\left(4\right) = \mqty(0&0&0) \end{cases} \end{equation}
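And, given class probabilities \phi_{1},\phi_{2},\phi_{3} (with \phi_{4} = 1 - \phi_{1} - \phi_{2} - \phi_{3}), writing T\left(y\right)_{k} for the k-th component of T\left(y\right):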
\begin{equation} p\left(y\right) = \phi_{1}^{T\left(y\right)_{1}} \phi_{2}^{T\left(y\right)_{2}}\phi_{3}^{T\left(y\right)_{3}}\phi_{4}^{1-\left(T\left(y\right)_{1}+T\left(y\right)_{2}+T\left(y\right)_{3}\right)} \end{equation}
Applying \exp\left(\log\left(\cdot\right)\right) to the above, we obtain:
\begin{equation} p\left(y\right) = \exp \left(T\left(y\right)_{1}\log \frac{\phi_{1}}{\phi_{4}} + T\left(y\right)_{2} \log \frac{\phi_{2}}{\phi_{4}} + T\left(y\right)_{3} \log \frac{\phi_{3}}{\phi_{4}} + \log\left(\phi_{4}\right)\right) \end{equation}
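To make the intermediate step explicit (a worked expansion using only the definitions above), take the log of p\left(y\right) and collect the \log \phi_{4} terms:
\begin{align} \log p\left(y\right) &= \sum_{k=1}^{3} T\left(y\right)_{k} \log \phi_{k} + \left(1 - \sum_{k=1}^{3} T\left(y\right)_{k}\right) \log \phi_{4} \\ &= \sum_{k=1}^{3} T\left(y\right)_{k} \left(\log \phi_{k} - \log \phi_{4}\right) + \log \phi_{4} \\ &= \sum_{k=1}^{3} T\left(y\right)_{k} \log \frac{\phi_{k}}{\phi_{4}} + \log \phi_{4} \end{align}
Exponentiating both sides recovers the expression for p\left(y\right) above.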
We can now match this against the standard form of an exponential family, p\left(y; \eta\right) = b\left(y\right) \exp\left(\sum_{k=1}^{3} \eta_{k} T\left(y\right)_{k} - a\left(\eta\right)\right), which gives:
\begin{equation} \eta = \mqty(\log \frac{\phi_{1}}{\phi_{4}} \\ \log \frac{\phi_{2}}{\phi_{4}} \\ \log \frac{\phi_{3}}{\phi_{4}}) \end{equation}
and
\begin{equation} a\left(\eta\right) = -\log \left(\phi_{4}\right) \end{equation}
\begin{equation} b\left(y\right) = 1 \end{equation}
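As a quick sanity check, plugging these choices back into b\left(y\right) \exp\left(\sum_{k=1}^{3} \eta_{k} T\left(y\right)_{k} - a\left(\eta\right)\right) should recover the class probabilities. For y = 1 and y = 4 (taking T\left(4\right) to be the zero vector, as the exponent of \phi_{4} in p\left(y\right) implies):
\begin{align} p\left(1\right) &= \exp\left(\eta_{1} - a\left(\eta\right)\right) = \exp\left(\log \frac{\phi_{1}}{\phi_{4}} + \log \phi_{4}\right) = \phi_{1} \\ p\left(4\right) &= \exp\left(0 - a\left(\eta\right)\right) = \exp\left(\log \phi_{4}\right) = \phi_{4} \end{align}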
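Solving for \phi_{i} in terms of \eta (using \phi_{1}+\phi_{2}+\phi_{3}+\phi_{4} = 1), and then taking each \eta_{i} to be linear in the input, \eta_{i} = \theta_{i}^{T}x, we obtain for i \in \left\{1,2,3\right\}: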
\begin{align} \phi_{i} &= \frac{e^{\eta_{i}}}{1 + e^{\eta_{1}}+ e^{\eta_{2}}+ e^{\eta_{3}}} \\ &= \frac{e^{\theta_{i}^{T}x}}{1+\sum_{j=1}^{3} e^{\theta_{j}^{T} x}} \end{align}
and we have:
\begin{equation} \phi_{4} = \frac{1}{1+\sum_{j=1}^{3} e^{\theta_{j}^{T}x}} \end{equation}
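The inversion can be spelled out in two steps (a sketch using only the definition of \eta above). Exponentiating \eta_{i} = \log \frac{\phi_{i}}{\phi_{4}} gives \phi_{i} = \phi_{4} e^{\eta_{i}}, and requiring the four probabilities to sum to 1 fixes \phi_{4}:
\begin{align} 1 &= \phi_{4} + \sum_{j=1}^{3} \phi_{j} = \phi_{4}\left(1 + \sum_{j=1}^{3} e^{\eta_{j}}\right) \\ \phi_{4} &= \frac{1}{1 + \sum_{j=1}^{3} e^{\eta_{j}}} \\ a\left(\eta\right) &= -\log \phi_{4} = \log\left(1 + \sum_{j=1}^{3} e^{\eta_{j}}\right) \end{align}
Substituting \phi_{4} back into \phi_{i} = \phi_{4} e^{\eta_{i}} gives the expression for \phi_{i}.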
You may notice:
\begin{equation} \frac{e^{\eta_{j}}}{ 1+ e^{\eta_{1}}+ e^{\eta_{2}}+ e^{\eta_{3}}} = \phi_{j} \end{equation}
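for j \in \left\{1,2,3\right\}, and \frac{1}{1 + e^{\eta_{1}}+ e^{\eta_{2}}+ e^{\eta_{3}}} = \phi_{4}. This is the softmax function applied to \left(\eta_{1}, \eta_{2}, \eta_{3}, 0\right).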
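A small numeric example may help (the values of \eta here are chosen purely for illustration). Take \eta = \left(\log 2, \log 3, \log 4\right), so that e^{\eta_{1}} = 2, e^{\eta_{2}} = 3, e^{\eta_{3}} = 4 and the shared denominator is 1 + 2 + 3 + 4 = 10:
\begin{align} \phi_{1} &= \frac{2}{10} = 0.2, & \phi_{2} &= \frac{3}{10} = 0.3, \\ \phi_{3} &= \frac{4}{10} = 0.4, & \phi_{4} &= \frac{1}{10} = 0.1 \end{align}
These sum to 1, and \log \frac{\phi_{1}}{\phi_{4}} = \log 2 = \eta_{1}, as expected.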