Challenge on alignment of Large Language Models

When exploring the alignment problem in large language models (LLMs) like ChatGPT, what sorts of moral and value theories should be employed?

Microsoft Research Asia (MSR) is interested in projects that explore the alignment problem of Large Language Models (LLMs) like ChatGPT. As LLMs become more widespread in society, so do risks such as social bias. The practice of alignment seeks to ensure that LLMs adhere to human intentions and values, promoting responsible AI use and uptake. However, challenges persist (challenge code: MSR-C4).

Existing methods primarily align LLMs with human instructions and preferences so that they better accomplish users’ tasks, and they address only specific ethical concerns such as gender bias. In contrast, we prioritize aligning LLMs with high-level ethics (e.g., Moral Foundations Theory) and core human values (e.g., Schwartz’s Theory of Basic Human Values) using an interdisciplinary approach.

Traditional methods manually define a few ethical rules (e.g., do not harm) and rely on crowd workers for data annotation, which leads to a limited ethical scope and to crowd bias. We want to explore a social alignment strategy in which LLMs act as agents, shaping their ethics and values through interactions with humans and with other LLMs.

Additional Context

We are interested in projects that help establish a comprehensive value alignment framework that addresses the intentions of LLMs, not merely their behaviour. This approach seeks to move LLMs from merely avoiding harmful actions to proactively intending benevolent outcomes.

The Microsoft Research Asia lab has deep expertise in natural language processing and a commitment to tackling the responsible and ethical considerations associated with LLM development and deployment. We seek complementary expertise in understanding human values and ethics, ideally drawn from disciplines such as philosophy, philosophy and theory of law, and applied linguistics, with cross-disciplinary interests in areas such as sociology and cognitive psychology.

Working Arrangements

How it will work

If the application is successful, MSR will work with the fellow to refine the project plan, agreeing on shared goals and outcomes, a timeline of collaboration milestones, and a cadence for virtual and in-person meetings, as appropriate.
MSR will support the fellow with onboarding, provide an MSR research contact, and hold regular meetings with the fellow.
MSR is set up for hybrid working, and BRAID fellows are warmly welcome to visit MSR Asia in Beijing if they are interested and available. We are committed to offering every support to ensure a productive and rewarding experience.
We expect the fellow to include in their budget proposal travel, accommodation and subsistence costs, as well as any specific research costs they anticipate.