- Author: Emily M. Bender
- Full Title: ChatGP-why: When, if ever, is synthetic text safe, appropriate, and desirable?
- Slides: Bender-GRAILE-2023.pdf
- Talk: YouTube
Notes
- There are many examples where the use of ChatGPT has not turned out the way people expected.
- Language form does not contain meaning, which explains why language models don't understand.
- Large language models are corpus models
- Claude Shannon worked on early language models
- A unigram language model models the frequency of single words
- A bigram language model models the frequency of a word given the previous word
- A trigram language model models the frequency of a word given the previous two words (see the sketch below)
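A minimal sketch of how a bigram model works, assuming a made-up toy corpus and illustrative function names (nothing here is from the talk itself):

```python
from collections import Counter, defaultdict

# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat . the cat ate".split()

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def bigram_probability(prev, word):
    """Estimate P(word | prev) from raw counts, with no smoothing."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][word] / total if total else 0.0

print(bigram_probability("the", "cat"))  # ~0.67: "the" is followed by "cat" 2 of 3 times
```

A unigram model would drop the conditioning on the previous word entirely; a trigram model would condition on the previous two words instead of one.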
- Good uses for language models include:
    - Spell checkers
    - Machine transcription
    - Machine translation
    - Text input
- Neural networks are made of perceptrons
- A perceptron is a simplified model of a neuron (see the sketch below)
- Transformer architecture is an arrangement of perceptrons
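A minimal sketch of a single perceptron: a weighted sum of inputs plus a bias, passed through a hard threshold. The inputs, weights, and bias below are made-up values chosen only to show the arithmetic:

```python
def perceptron(inputs, weights, bias):
    """Simplified neuron: weighted sum of inputs plus bias, then a hard threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if activation > 0 else 0

# Illustrative values only.
print(perceptron([1.0, 0.0], weights=[0.6, -0.4], bias=-0.5))  # 0.6 - 0.5 = 0.1 > 0, so 1
```

Stacking many such units in layers, with learned weights and smoother activation functions, gives the networks that architectures like the transformer are built from.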
- Language models use word embeddings, which map each word to a vector of numbers (see the sketch below)
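A minimal sketch of a word-embedding table. The vocabulary, dimensionality, and random values below are illustrative assumptions; real models learn these vectors from training data:

```python
import random

random.seed(0)

# Illustrative vocabulary and embedding size.
vocabulary = ["the", "cat", "sat", "mat"]
embedding_dim = 4

# One vector per word; random here, learned from co-occurrence patterns in practice.
embeddings = {
    word: [random.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
    for word in vocabulary
}

print(embeddings["cat"])  # the word form "cat" represented as a point in vector space
```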
- The number of words in training data determines the size of a language model
- Generative AI is a misuse of a classification and ranking tool
- Generative AI produces plausible output not intelligence
- In order to determine whether a machine can understand and infer meaning, we need definitions of understanding and meaning.
- Language competency makes it hard to separate form from meaning
- Form refers to the marks on a page for language, the arrangement of pixels for images or video, etc.
- Language meaning is the relationship between form and something external
- Understanding is the recovery of communicative intent from form
- Virtual assistants can understand limited instructions
- Language models exposed only to form can never learn meaning
- Language models do not learn the same way as babies
- Babies learn the relationship between form (sound, mouth movement) and meaning by forming connections with external cues that hint at communicative intent.
- The Octopus Paper shows that form does not contain meaning
- Large language models have a significant environmental impact
- Environmental cost of large language models impacts marginalised communities
- The contents of the internet do not represent a balanced view of humanity
- The young and those from developed countries are more likely to have contributed to the volume of work available on the internet.
- Sampling the internet without bias is hard
- Large language models are too big
- Generative AI output does not contain communicative intent
- We bring our own understanding to language form
- When reading generative text, it is important to remember that the inference of meaning is our own.
- A Stochastic Parrot refers to the stitching together of form without meaning
- Coherence is in the eye of the beholder
- Synthetic text lacks accountability
- There is no Who behind generative text
- Generative AI pollutes the information ecosystem
- Information retrieval is a terrible use-case for a large language model
- The more accurate generative text becomes, the more dangerous it is
- Chatbots hide the sources of the information they regurgitate
- Criteria for responsible use of generative AI include:
    - the only thing that matters is the form of the output
    - the text cannot be mistaken for the words of a person
    - biases are clearly articulated
    - labor practices are considered
    - data theft is considered
- Good use-cases for generative AI include:
    - a dialogue partner for language learning
    - a non-player character (NPC)
    - writing support
- Good use-cases for generative text must consider the costs
- Be a critical consumer of AI
- We need to understand how the AI technology was evaluated in the context in which it is being used.
- We need to understand who benefits when AI is used instead of a human.
- You are responsible for your use of generative text
- We must insist on transparency of source material in the training data.
- Talk to students about what generative AI is
- Use of generative AI in education is a missed learning opportunity
- Use of generative AI by students indicates a broader problem
Further Information
Three podcasts worth subscribing to on AI: