What is a "dataset" in AI?
A dataset in AI refers to the data that was used to train an AI model. What the dataset consists of exactly depends on the purpose of the AI model and of the program that will use the AI model.
For example, in image recognition software and Stable Diffusion (an AI-based image generation tool), a dataset consists of both images and text labels for the images. You could have a photo of a cat labelled "cat." That's because these programs need to know what the object in the image is called in order to recognize it or generate it.
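To make the idea of "images plus text labels" concrete, here is a minimal sketch in Python. The class name `LabelledImage`, the helper `images_with_label`, and the file paths are all made up for illustration; real tools store their datasets in their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledImage:
    """One dataset entry: where the image lives and what it shows."""
    image_path: str  # hypothetical path, for illustration only
    label: str       # the text label that goes with the image


# A tiny, made-up dataset: each photo is paired with its label.
dataset = [
    LabelledImage("photos/0001.jpg", "cat"),
    LabelledImage("photos/0002.jpg", "dog"),
    LabelledImage("photos/0003.jpg", "cat"),
]


def images_with_label(data, label):
    """Return the paths of all images carrying the given label."""
    return [item.image_path for item in data if item.label == label]


print(images_with_label(dataset, "cat"))
```

The point is only that every image travels together with its label; the image alone would not tell the software what it is looking at.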
It's not really possible to build image recognition software without labels. If the software is only supposed to recognize birds in photos, for example, a dataset containing nothing but bird photos won't be enough to train a reliable AI. We need to make sure the AI doesn't start recognizing random things as birds, so we need photos WITHOUT birds as well. The simplest labels we could have in this case are "bird" and "no bird," or "yes" and "no." In either case, the dataset must include this extra data alongside each photo.
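The bird example can be sketched the same way. The snippet below assumes a dataset stored as (path, label) pairs and checks that it contains both "bird" and "no bird" photos before anyone tries to train on it. The file names and the helper functions `label_counts` and `has_both_classes` are invented for this illustration.

```python
from collections import Counter

# Made-up entries: photos with birds AND photos without birds.
bird_dataset = [
    ("photos/sparrow.jpg", "bird"),
    ("photos/robin.jpg", "bird"),
    ("photos/street.jpg", "no bird"),
    ("photos/teacup.jpg", "no bird"),
]


def label_counts(data):
    """Count how many photos carry each label."""
    return Counter(label for _, label in data)


def has_both_classes(data):
    """True only if the dataset has at least one photo of each kind."""
    counts = label_counts(data)
    return counts["bird"] > 0 and counts["no bird"] > 0


print(label_counts(bird_dataset))
print(has_both_classes(bird_dataset))
```

A dataset with only "bird" entries would fail this check, which is exactly the situation described above: the AI would never see an example of what is not a bird.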