pretty_name: Conceptual Captions
annotations_creators:
language_creators:
languages:
licenses: custom
multilinguality:
size_categories: 10M<n<100M
source_datasets:
task_categories:
task_ids:
paperswithcode_id:
Paper: Conceptual Captions Paper
Licenses: Attribution
Pairs of images and captions.
The dataset can be loaded directly via the squirrel Catalog API. Make sure that squirrel-datasets-core is installed via pip; installing it registers this dataset in the catalog. Use the following code to load the data:
from squirrel.catalog import Catalog

plugin_catalog = Catalog.from_plugins()  # catalog of all datasets registered by installed plugins
it = plugin_catalog["conceptual-captions-12m"].get_driver().get_iter()  # iterator over the samples
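Because the loader yields a stream of roughly 12M samples, it can help to look at only the first few without materializing the whole dataset. The sketch below does this with itertools.islice. It assumes the object returned by get_iter() is an ordinary Python iterable; the helper head and the generator fake_stream are hypothetical stand-ins so the example runs without squirrel installed.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def head(samples: Iterable[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Return the first n samples, consuming only n items from the stream."""
    return list(islice(samples, n))


def fake_stream() -> Iterator[Dict[str, Any]]:
    # Hypothetical stand-in for the catalog iterator: yields endless dummy samples.
    i = 0
    while True:
        yield {"url": f"https://example.com/{i}.jpg", "error": False, "image": None, "caption": f"caption {i}"}
        i += 1


first = head(fake_stream(), n=2)
print([s["caption"] for s in first])  # ['caption 0', 'caption 1']
```

With squirrel installed, the same helper could be applied to the real iterator, e.g. head(it, n=5).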
A sample from the training set is provided below:
{
    'url': 'https://i.pinimg.com...',
    'error': False,
    'image': array(...),
    'caption': 'Peterbilt 359 custom built show me how to find this Large Cars kits...'
}
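The sample above has four fields. A minimal sketch of that record as a typed structure, with a simple shape check, is shown below. The class name CC12MSample and the function is_well_formed are hypothetical; the type of image is left as Any because the sample only shows array(...), which is presumably a NumPy array but is not confirmed here.

```python
from typing import Any, TypedDict


class CC12MSample(TypedDict):
    url: str      # source URL of the image
    error: bool   # True if there was an HTTP error accessing the data
    image: Any    # shown as array(...) in the sample; exact type is an assumption
    caption: str  # caption text paired with the image


def is_well_formed(sample: dict) -> bool:
    """Check that a sample has exactly the four keys shown above with plausible types."""
    return (
        set(sample) == {"url", "error", "image", "caption"}
        and isinstance(sample["url"], str)
        and isinstance(sample["error"], bool)
        and isinstance(sample["caption"], str)
    )


example: CC12MSample = {"url": "https://example.com/x.jpg", "error": False, "image": None, "caption": "a truck"}
print(is_well_formed(example))  # True
```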
error: True if there was an HTTP error when accessing the data.
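Since some samples carry error set to True, a common step is to drop them before training. The sketch below does this lazily with a plain generator so the stream is never fully loaded into memory. The function name drop_failed and the two in-memory samples are hypothetical placeholders for the streamed data.

```python
from typing import Any, Dict, Iterable, Iterator


def drop_failed(samples: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Lazily skip samples whose 'error' flag is True (HTTP error while fetching)."""
    for sample in samples:
        if not sample["error"]:
            yield sample


# Hypothetical in-memory samples standing in for the streamed dataset; images are placeholders.
samples = [
    {"url": "https://example.com/a.jpg", "error": False, "image": None, "caption": "a red truck"},
    {"url": "https://example.com/b.jpg", "error": True, "image": None, "caption": "a broken link"},
]
kept = list(drop_failed(samples))
print([s["caption"] for s in kept])  # ['a red truck']
```

If the iterator returned by get_iter() supports squirrel's chaining methods, the same predicate (lambda s: not s["error"]) could likely be passed to its filter method instead; check the squirrel documentation before relying on this.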
name
CC12M
12M