H²O: Human-to-Human-or-Object Interaction Dataset

H²O is an image dataset annotated for Human-to-Human-or-Object interaction detection.

Dataset description


H²O comprises the 10,301 images of the V-COCO dataset, plus 3,635 additional images that mostly contain interactions between people.

Extra-image sources:


All H²O images have been annotated with a new taxonomy of verbs covering both human-to-object and human-to-human interactions.

This taxonomy is composed of 51 verbs divided into 5 categories:


Data were annotated with the open-source tool Pixano.

Dataset download

Please download and unzip the H2O.zip file.

To get the images, download the V-COCO images and split them into a trainval set and a test set as defined in the V-COCO split files. Then rename each image as HO[vcoco_id.zfill(10)].jpg.
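The renaming step above can be sketched as follows. This is a minimal helper, not the official script: the function names are ours, and it assumes V-COCO images keep the usual COCO naming with a zero-padded 12-digit id suffix.

```python
import os
import shutil

def h2o_name(vcoco_id):
    """Build the H2O file name from a V-COCO image id: HO + id zero-padded to 10 digits."""
    return "HO" + str(vcoco_id).zfill(10) + ".jpg"

def rename_split(src_dir, dst_dir, split_ids):
    """Copy each V-COCO image listed in split_ids into dst_dir under its H2O name.

    Assumes the usual COCO naming (e.g. COCO_val2014_000000000042.jpg);
    adjust the matching pattern if your files are named differently.
    """
    os.makedirs(dst_dir, exist_ok=True)
    for vcoco_id in split_ids:
        matches = [f for f in os.listdir(src_dir)
                   if f.endswith("%012d.jpg" % vcoco_id)]
        if matches:
            shutil.copy(os.path.join(src_dir, matches[0]),
                        os.path.join(dst_dir, h2o_name(vcoco_id)))
```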

Then launch the download_HH_images.py script to get the additional images. The script first downloads the datasets from which the extra images are taken, and then copies or downloads all images into a new H2O directory next to the script.

			"sourceId":		# Image name
			"category":		# Object category
				"geometrytype": 1,
				"isNormalized": true,
				"vertices": [xmin, ymin, xmax, ymax]	# Normalized coordinates
			"id":			# A unique object Id
					"value":		# Interaction verb
					"targetId":		# Target object Id / entity Id if the interaction has no target
					"instrumentId":	# Instrument object Id used to achieve the interaction
									# / entity Id if the interaction has no instrument

- `trainval_vcoco` and `test_vcoco`, which follow the V-COCO annotation file structure.


To run the evaluation and compute the agent AP and role AP, get the V-COCO evaluation code and replace its vsrl_eval.py file with the h2o_vsrl_eval.py file provided in H2O.zip.

This new version allows the evaluation of a list of target objects for a given interaction.

As for the V-COCO evaluation, store your predictions as a pickle file (detections.pkl) in the following format:

		'image_id':			# The H²O image name
		'person_box':		# [xmin, ymin, xmax, ymax] the box prediction for the person
		'[action]_agent':  	# The score of the action for this person prediction
							# [action] is a verb from the list provided in the `H2O_verb_list.json` file
		'[action]_role':  	# [[x1, y1, x2, y2, s]], list of the predicted boxes for role and 
                      		# associated score for the action-role pair
                      		# [action] is a verb from the list provided in `H2O_verb_list.json` file						
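As a concrete sketch, a minimal detections.pkl could be written as below. All values are hypothetical: the verb `hold` must actually appear in `H2O_verb_list.json`, and boxes and scores come from your own model.

```python
import pickle
import numpy as np

# One hypothetical detection entry; a real file holds one dict per detection.
det = {
    'image_id': 'HO0000000042',                       # H2O image name
    'person_box': np.array([50., 40., 200., 380.]),   # [xmin, ymin, xmax, ymax]
    'hold_agent': 0.92,                               # agent score for the (assumed) verb "hold"
    'hold_role': np.array([[210., 60., 260., 120., 0.85]]),  # [[x1, y1, x2, y2, s]]
}

# The evaluation code expects a list of such dicts, pickled to disk.
with open('detections.pkl', 'wb') as f:
    pickle.dump([det], f)
```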

To launch the evaluation, run:

from h2o_vsrl_eval import VCOCOeval
vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)

	# For the original scenario
	# vsrl_annot_file:	vcocolike_annotations/test_vcoco/interactions_test.json
	# coco_file: 		vcocolike_annotations/test_vcoco/instances_test.json
	# split_file: 		images_test.ids

	# For the objectness scenario
	# vsrl_annot_file:	other_annotations/test_vcoco/interactions_test.json
	# coco_file: 		other_annotations/test_vcoco/instances_test.json
	# split_file: 		images_trainval.ids

vcocoeval._do_eval('detections.pkl', ovr_thresh=0.5)


Data annotations are under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license (see the LICENSE file in H2O.zip).

Evaluation code is under MIT license.


A. Orcesi, R. Audigier, F. Poka Toukam and B. Luvison, "Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO," 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), 2021, pp. 1-8, doi: 10.1109/FG52635.2021.9667005


@inproceedings{orcesi2021detecting,
	author={Orcesi, Astrid and Audigier, Romaric and Poka Toukam, Fritz and Luvison, Bertrand},
	booktitle={2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)},
	title={Detecting Human-to-Human-or-Object (H$^2$O) Interactions with DIABOLO},
	year={2021},
	pages={1-8},
	doi={10.1109/FG52635.2021.9667005}
}


If you have any questions about this dataset, you can contact us by email at: h2o@cea.fr