Two main dialogue-system architectures:

- **Frame-based systems**: talk to users and accomplish specific tasks
- **LLM-based systems**: reason and act as agents

## Dialogue Systems vs. Chatbots

Previously, "chatbot" meant systems that just chat with humans, as opposed to task-based systems. Humans tend to think of dialogue systems as human-like even when they know they are not; this makes users more prone to share private information and to worry less about its disclosure.

For an early example, see ELIZA.

## LLM Chatbots

Training corpora:

- **C4** (Colossal Clean Crawled Corpus): patents, Wikipedia, news
- Chatbot corpora: EmpatheticDialogues, SaFeRDialogues
- Pseudo-conversations: Reddit, Twitter, Weibo

### Fine-tuning

- **Quality**: improve how sensible and interesting responses are
- **Safety**: prevent the system from suggesting harmful actions
- **Instruction fine-tuning (IFT)**: add positive examples as fine-tuning data as part of the instruction-finetuning step
- **Filtering**: build a filter that classifies responses as safe or unsafe

### Retrieval-Augmented Generation (RAG)

Call a search engine, get back retrieved passages, and put them into the prompt: "Based on these passages, answer: ...". A chatbot can use RAG by having the system add the retrieved passages to the conversation as turns from a "pseudo-participant".

## Evaluation

- **Task-based systems**: measure task performance
- **Chatbots**: measure how enjoyable humans find them

We evaluate chatbots by asking a human to assign a score, either as a **participant** in the conversation or as an **observer**, a third party who assigns a score from a transcript of the conversation.

Participant scoring: interact for 6 turns, then score the system on:

- avoiding repetition
- interestingness
- making sense
- fluency
- listening
- inquisitiveness
- humanness
- engagingness

**ACUTE-EVAL**: an observer compares two conversations and chooses which system they would rather speak to.

**Adversarial evaluation**: train a human-vs-bot classifier and use the inverse of its score as the chatbot's metric (the better the bot fools the classifier, the better the bot).

**Task evaluation**: measure overall task success, or measure the slot error rate.

## Design

System design steps:

- Don't build Frankenstein: ensure **safety** (e.g., make sure people aren't crashing cars), limit **representational harm** (don't demean social groups), and protect **privacy**
- Study the users and the task: what are their values? how do they interact?
- Build simulations: in a **Wizard-of-Oz study**, observe users interacting with a *human* pretending to be the chatbot
- Test the design on users
- Watch for **information leakage**: accidental (e.g., an always-on microphone) or intentional (e.g., sharing data for advertising)
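The RAG recipe above (retrieve passages, put them into the prompt, ask the model to answer based on them) can be sketched as a prompt-assembly step. This is a minimal sketch: `build_rag_prompt` is a hypothetical helper, and the passages stand in for what a real search-engine call would return.

```python
# Assemble a RAG prompt: number the retrieved passages and ask the
# model to answer based on them. The passages here are a stand-in
# for results returned by a real search engine.

def build_rag_prompt(question: str, passages: list[str]) -> str:
    context = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(passages, 1))
    return (
        "Based on the following passages, answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

passages = [
    "ELIZA was an early chatbot built at MIT in the 1960s.",
    "ELIZA imitated a Rogerian psychotherapist using pattern matching.",
]
print(build_rag_prompt("Who built ELIZA, and how did it work?", passages))
```

The assembled string would then be sent to the LLM as its prompt; a chatbot variant would instead append the passages as "pseudo-participant" turns in the dialogue history.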
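The slot error rate mentioned under task evaluation is just the fraction of reference slots the system filled incorrectly or missed. A minimal sketch, using hypothetical flight-booking slot names:

```python
# Slot error rate for a task-based (frame-based) dialogue system:
# fraction of reference slots whose predicted value is wrong or missing.

def slot_error_rate(predicted: dict, reference: dict) -> float:
    errors = sum(
        1 for slot, value in reference.items()
        if predicted.get(slot) != value  # missing slots count as errors too
    )
    return errors / len(reference)

reference = {"origin": "SFO", "destination": "BOS", "date": "2024-05-01"}
predicted = {"origin": "SFO", "destination": "BOS", "date": "2024-05-02"}
print(slot_error_rate(predicted, reference))  # 0.3333333333333333 (1 of 3 slots wrong)
```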
