# H²O: Human-to-Human-or-Object Interaction Dataset
H²O is an image dataset annotated for Human-to-Human-or-Object interaction detection.
## Dataset description

### Images

H²O is composed of the 10,301 images of the V-COCO dataset, to which 3,635 additional images, mostly containing interactions between people, are added.
Sources of the additional images:
- MS-COCO dataset
- Human-Interaction-Images dataset
- BIT-Interaction dataset
- SBU-Kinect-Interaction dataset
- TV-Human-interaction dataset
- Pascal-Voc dataset
- ShakeFive2 dataset
- UT-Interaction dataset
- Web images
### Annotations

All H²O images have been annotated with a new taxonomy of verbs covering both human-to-object and human-to-human interactions.
This taxonomy is composed of 51 verbs divided into 5 categories:
- Verbs describing the general posture of the subject
- Verbs related to the way the subject is moving
- Verbs used for interactions with objects
- Verbs describing human-to-human interactions
- Verbs of interactions involving strength or violence
Data were annotated with the open-source tool Pixano.
## Dataset download

Please download and unzip the `H2O.zip` file.

### Images

To get the images, please download the V-COCO images and split them into a trainval set and a test set, as defined in the V-COCO split files. Then rename each image as `HO[vcoco_id.zfill(10)].jpg`.

Then launch the `download_HH_images.py` script to get the additional images. The script first downloads the datasets from which the extra images are taken, then copies or downloads all images into a new `H2O` directory next to the script.
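The renaming convention above can be sketched in Python (the helper name `h2o_image_name` is ours, for illustration; the scheme itself, `HO` plus the V-COCO id zero-padded to 10 digits, is the one stated above):

```python
def h2o_image_name(vcoco_id):
    """Build the H²O file name for a V-COCO image id:
    'HO' + the id zero-padded to 10 digits + '.jpg'."""
    return "HO" + str(vcoco_id).zfill(10) + ".jpg"

print(h2o_image_name(391895))  # → HO0000391895.jpg
```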
### Annotations

In H²O, each interacting object is annotated even if its category is not among the 80 COCO classes.
We provide 3 annotation folders:
- `initial_annotations/`, where objects outside the 80 COCO classes are labeled with their real label.
- `other_annotations/`, where objects outside the 80 COCO classes are grouped under the label "other".
- `vcocolike_annotations/`, where objects outside the 80 COCO classes are not annotated: interactions with such objects are thus treated as if they had no target object.
In each of these folders, you can find 4 subfolders:

- `trainval` and `test`, where each annotation file is structured as follows:

  ```
  {
    "entities": [
      {
        "sourceId":   # Image name
        "category":   # Object category
        "geometry": {
          "geometrytype": 1,
          "isNormalized": true,
          "vertices": [xmin, ymin, xmax, ymax]  # Normalized coordinates
        },
        "id":         # A unique object id
        "actions": [
          {
            "value":        # Interaction verb
            "targetId":     # Target object id, or the entity id if the interaction has no target
            "instrumentId": # Instrument object id used to achieve the interaction,
                            # or the entity id if the interaction has no instrument
          }
        ]
      }
    ]
  }
  ```
- `trainval_vcoco` and `test_vcoco`, which follow the V-COCO annotation file structure.
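As a sketch, the `trainval`/`test` annotation files above can be consumed like this (the sample dict is hand-made to match the schema; the ids and the verb "hold" are illustrative values, not taken from the dataset):

```python
def iter_interactions(annotation, img_w, img_h):
    """Yield (subject id, verb, target id, pixel box) tuples from an
    H²O annotation dict following the structure shown above."""
    for entity in annotation["entities"]:
        xmin, ymin, xmax, ymax = entity["geometry"]["vertices"]
        # "vertices" are normalized; convert them to pixel coordinates.
        box = (xmin * img_w, ymin * img_h, xmax * img_w, ymax * img_h)
        for action in entity.get("actions", []):
            yield entity["id"], action["value"], action["targetId"], box

# Hand-made sample matching the schema (illustrative values only).
sample = {"entities": [{
    "sourceId": "HO0000000042.jpg",
    "category": "person",
    "geometry": {"geometrytype": 1, "isNormalized": True,
                 "vertices": [0.1, 0.2, 0.5, 0.9]},
    "id": "e1",
    "actions": [{"value": "hold", "targetId": "e2", "instrumentId": "e1"}],
}]}

for subject, verb, target, box in iter_interactions(sample, 640, 480):
    print(subject, verb, target, box)  # e1 hold e2 (64.0, 96.0, 320.0, 432.0)
```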
## Evaluation

To run the evaluation and compute the agent AP and role AP, get the V-COCO evaluation code and replace the `vsrl_eval.py` file with the `h2o_vsrl_eval.py` file provided in `H2O.zip`.
This new version allows the evaluation of a list of target objects for a given interaction.

As for the V-COCO evaluation, store your predictions as a pickle file (`detections.pkl`) in the following format:

```
[
  {
    'image_id':        # The H²O image name
    'person_box':      # [xmin, ymin, xmax, ymax], the predicted box for the person
    '[action]_agent':  # The score of the action for this person prediction;
                       # [action] is a verb from the list provided in `H2O_verb_list.json`
    '[action]_role':   # [[x1, y1, x2, y2, s]], list of the predicted role boxes with the
                       # associated score for the action-role pair;
                       # [action] is a verb from the list provided in `H2O_verb_list.json`
  }
]
```
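A minimal sketch of writing such a file (the verb `hold` and all numeric values are illustrative placeholders; real verbs come from `H2O_verb_list.json`):

```python
import pickle

# One prediction entry; 'hold' stands in for any verb of the taxonomy.
detection = {
    "image_id": "HO0000000042.jpg",
    "person_box": [48.0, 32.0, 210.0, 400.0],
    "hold_agent": 0.87,                                  # agent score for 'hold'
    "hold_role": [[120.0, 150.0, 180.0, 220.0, 0.81]],   # role box + pair score
}

# The evaluation code expects a pickled list of such dicts.
with open("detections.pkl", "wb") as f:
    pickle.dump([detection], f)
```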
To launch the evaluation, run:

```python
from h2o_vsrl_eval import VCOCOeval

vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)
# For the original scenario:
#   vsrl_annot_file: vcocolike_annotations/test_vcoco/interactions_test.json
#   coco_file:       vcocolike_annotations/test_vcoco/instances_test.json
#   split_file:      images_test.ids
# For the objectness scenario:
#   vsrl_annot_file: other_annotations/test_vcoco/interactions_test.json
#   coco_file:       other_annotations/test_vcoco/instances_test.json
#   split_file:      images_trainval.ids
vcocoeval._do_eval('detections.pkl', ovr_thresh=0.5)
## License

Data annotations are under the Creative Commons Attribution-NonCommercial 4.0 license (see the LICENSE file in `H2O.zip`).
The evaluation code is under the MIT license.
## Citation
A. Orcesi, R. Audigier, F. Poka Toukam and B. Luvison, "Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO," 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), 2021, pp. 1-8, doi: 10.1109/FG52635.2021.9667005
https://arxiv.org/abs/2201.02396
```bibtex
@INPROCEEDINGS{h2o_dataset_2021,
  author={Orcesi, Astrid and Audigier, Romaric and Poka Toukam, Fritz and Luvison, Bertrand},
  booktitle={2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)},
  title={Detecting Human-to-Human-or-Object (H$^2$O) Interactions with DIABOLO},
  year={2021},
  month={dec},
  pages={1-8},
  doi={10.1109/FG52635.2021.9667005},
  url={https://doi.org/10.1109%2Ffg52635.2021.9667005}}
```
## Contact

If you have any questions about this dataset, you can contact us by email at h2o@cea.fr.