AI chatbots are unpredictable and occasionally produce harmful or illegal content, so efforts to prevent undesirable behavior continue
Most chatbot makers have limited options for reining in this behavior, but Anthropic appears to have found a solution in the form of a “constitution” to govern its chatbot more effectively.
Anthropic, an AI startup backed by Google and founded by former OpenAI engineers, has published the guidelines behind its “Constitutional AI” training technique. Through this approach, its chatbot Claude acquires a predefined set of values, addressing concerns about transparency, safety, and decision-making in AI systems. Notably, unlike alternative methods, this approach does not require human feedback to rate generated responses.
In a blog post, Anthropic noted that AI models inevitably carry value systems, whether intended or not. Rather than leave those values implicit, Constitutional AI uses AI-generated feedback to evaluate the model’s own outputs.
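In broad strokes, the first stage of Anthropic’s published method works as a critique-and-revision loop: the model drafts a response, critiques its own draft against a principle sampled from the constitution, and then rewrites it. The sketch below is a minimal illustration of that loop, not Anthropic’s actual code; the generate helper and the example principles are hypothetical stand-ins.

```python
import random

# Hypothetical examples; Anthropic's actual constitution has 58 principles.
PRINCIPLES = [
    "Choose the response that is least likely to be harmful or offensive.",
    "Choose the response that best respects human rights and dignity.",
]

def generate(prompt: str) -> str:
    """Placeholder for a call to the underlying language model."""
    raise NotImplementedError

def critique_and_revise(user_prompt: str, rounds: int = 2) -> str:
    """Draft a response, then repeatedly critique and revise it
    against randomly sampled principles from the constitution."""
    response = generate(user_prompt)
    for _ in range(rounds):
        principle = random.choice(PRINCIPLES)
        critique = generate(
            f"Critique the response below according to this principle: {principle}\n"
            f"Prompt: {user_prompt}\nResponse: {response}"
        )
        response = generate(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    # Revised responses like this one become supervised training data,
    # with no human rating of the outputs involved.
    return response
```

In the published method, this loop is only the first stage; a reinforcement-learning stage driven by AI feedback follows, as discussed further below.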
Anthropic’s AI constitution comprises 58 principles, drawing on sources including the United Nations’ Universal Declaration of Human Rights, Apple’s terms of service, Google’s guidelines, and Anthropic’s own research. The principles are deliberately lofty, aiming to foster fairness and respect for all.
In practice, the constitution directs the AI to avoid stereotypes and discriminatory language and to refrain from giving medical, financial, or legal advice. It is expected to keep responses appropriate for children and to avoid offending non-Western audiences. The AI must also prefer responses that pose less existential risk and steer clear of a preachy tone.
GPT-4 and Bard, AI-powered chatbots known for generating highly detailed text, have notable shortcomings. These generative AI models are often trained on unreliable internet sources such as social media, which can introduce bias, and they can produce responses that have no factual basis and are purely imaginative.
Anthropic’s Constitutional AI aims to address these concerns by giving the system a set of guiding principles for the text it generates. These principles steer the model toward “nontoxic” and “helpful” behavior, with the goal of mitigating bias and improving the overall quality of responses.
Anthropic argues that its training method improves on the one behind ChatGPT, because relying on human feedback is resource-intensive and hard to scale. OpenAI has faced backlash for underpaying contract workers who filtered toxic content. Anthropic, by contrast, says Constitutional AI is transparent and easy to inspect.
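To make the contrast with human feedback concrete: in the second stage of Constitutional AI, an AI model, rather than a human rater, compares candidate responses against a principle, and the resulting preference labels are used to train a reward model. The following is a minimal sketch under that assumption, again with a hypothetical generate placeholder rather than any real API.

```python
def generate(prompt: str) -> str:
    """Placeholder for a call to a feedback language model."""
    raise NotImplementedError

def ai_preference(prompt: str, resp_a: str, resp_b: str, principle: str) -> str:
    """Ask the feedback model, instead of a human rater, which of two
    responses better follows a constitutional principle."""
    verdict = generate(
        f"Principle: {principle}\n"
        f"Prompt: {prompt}\n"
        f"(A) {resp_a}\n(B) {resp_b}\n"
        "Which response better follows the principle? Answer A or B."
    )
    return "A" if verdict.strip().startswith("A") else "B"

# AI-labeled preference pairs like these stand in for the human
# comparisons used in reinforcement learning from human feedback,
# which is what makes the method cheaper to scale and, per Anthropic,
# easier to inspect.
```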
Anthropic’s broader aim is an advanced algorithm for self-teaching AI, technology that could empower virtual assistants to answer email, carry out research, and even create art and books. Existing models such as GPT-4 and LaMDA already draw on this kind of technology to enhance their capabilities.