Multiview Aerial Visual Recognition (MAVREC):

Can Multi-view Improve Aerial Visual Perception?

Aritra Dutta1, Srijan Das2, Jacob Nielsen3, Rajatsubhra Chakraborty2, and Mubarak Shah1

1 University of Central Florida, 2 University of North Carolina at Charlotte, 3 University of Southern Denmark



TL;DR: MAVREC (Multiview Aerial Visual RECognition) is a dataset of synchronized videos recorded from different perspectives, covering rural and urban pastures in European geographies.

🎉 MAVREC was accepted to CVPR 2024 🎉


12 Multi-modal Scenes

0.5M Frames

1.1M Bounding Boxes

10 Object Classes

2.5 hours of 2.7K resolution video


Abstract

Despite the commercial abundance of UAVs, aerial data acquisition remains challenging, and the existing open-source UAV datasets, which are centered on Asia and North America, are small-scale or low-resolution and lack diversity in scene contextuality. Additionally, the color content of the scenes, the solar-zenith angle, and the population density of different geographies influence data diversity. Together, these factors lead to suboptimal aerial visual perception in deep neural network (DNN) models trained primarily on ground-view data, including open-world foundation models.

To pave the way for a transformative era of aerial detection, we present Multiview Aerial Visual RECognition, or MAVREC, a video dataset in which we record synchronized scenes from two perspectives: a ground camera and a drone-mounted camera. MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes. This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets across all modalities and tasks. Through our extensive benchmarking on MAVREC, we find that augmenting object detectors with ground-view images from the corresponding geographical location is a superior pre-training strategy for aerial detection. Building on this strategy, we benchmark MAVREC with a curriculum-based semi-supervised object detection approach that leverages labeled (ground and aerial) and unlabeled (aerial only) images to enhance aerial detection.


Sample Frames


Sample scenes (with annotations) from our dataset. The first row shows the aerial view, and the second row shows the same scenes from a ground camera. Similarly, the third row shows the aerial view, and the fourth row shows the same scenes from a ground camera. Some scenes have dense object annotations, while others have very few. This high variance in object distribution across scenes in MAVREC is complementary to datasets like VisDrone, where object detection is relatively straightforward because of a biased (dense) object distribution that reflects the dataset's demographic characteristics.



10 Object Classes


Dominant colors in MAVREC and other datasets

Dominant colors in sample frames of MAVREC


Dominant colors in sample frames of other state-of-the-art drone datasets



MAVREC Toy Dataset

We provide a small low-resolution toy dataset of MAVREC consisting of 100 images from each view.

Download Toy Dataset


Annotation Format

We adopt the MS COCO annotation format. We extend the image entries by adding a scene identifier ("scene") and a frame identifier ("frameID"). We provide aligned annotation files for the corresponding ground and aerial views.

              
                {
                  "id": 1,
                  "file_name": "scene_12_sdu_30Sec_droneView_6_000826.PNG",
                  "height": 337,
                  "width": 600.0,
                  "scene": 12,
                  "frameID": 826
                },
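As a rough illustration of how the extended image entries might be used, the sketch below pairs aerial and ground frames through the "scene" and "frameID" fields. It assumes that aligned annotation files share the same (scene, frameID) values for corresponding frames, which the description above suggests but does not state outright. The helper names, the ground-view file names, and the bounding boxes are hypothetical and exist only for this example.

```python
from collections import defaultdict


def index_by_scene_frame(coco):
    """Map (scene, frameID) -> image entry for one COCO-style annotation dict."""
    return {(img["scene"], img["frameID"]): img for img in coco["images"]}


def boxes_per_image(coco):
    """Map image id -> list of COCO [x, y, width, height] boxes."""
    boxes = defaultdict(list)
    for ann in coco.get("annotations", []):
        boxes[ann["image_id"]].append(ann["bbox"])
    return boxes


def pair_views(aerial, ground):
    """Pair aerial and ground images that share a (scene, frameID) key.

    Assumption: aligned files use identical scene/frameID values per frame.
    """
    aerial_idx = index_by_scene_frame(aerial)
    ground_idx = index_by_scene_frame(ground)
    shared = sorted(aerial_idx.keys() & ground_idx.keys())
    return [(key, aerial_idx[key], ground_idx[key]) for key in shared]


# Tiny in-memory stand-ins for the two annotation files.
# The aerial image entry mirrors the example above; everything else is made up.
aerial = {
    "images": [
        {"id": 1, "file_name": "scene_12_sdu_30Sec_droneView_6_000826.PNG",
         "height": 337, "width": 600.0, "scene": 12, "frameID": 826},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
    ],
}
ground = {
    "images": [
        # Hypothetical ground-view file names, not taken from the dataset.
        {"id": 7, "file_name": "ground_example_000826.PNG",
         "height": 337, "width": 600.0, "scene": 12, "frameID": 826},
        {"id": 8, "file_name": "ground_example_000827.PNG",
         "height": 337, "width": 600.0, "scene": 12, "frameID": 827},
    ],
    "annotations": [
        {"id": 1, "image_id": 7, "category_id": 1, "bbox": [50, 60, 20, 20]},
        {"id": 2, "image_id": 7, "category_id": 2, "bbox": [5, 5, 15, 25]},
    ],
}

pairs = pair_views(aerial, ground)
aerial_boxes = boxes_per_image(aerial)
ground_boxes = boxes_per_image(ground)
for key, a_img, g_img in pairs:
    print(key, a_img["file_name"], len(aerial_boxes[a_img["id"]]),
          g_img["file_name"], len(ground_boxes[g_img["id"]]))
```

With real data, the two dictionaries would come from `json.load` on the aerial and ground annotation files. Frames that appear in only one view (frameID 827 here) are simply dropped from the pairing.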
              
            

Qualitative Results

DVD - Dual-View Drone Dataset Qualitative Results

Citation


@InProceedings{Dutta_2024_CVPR,
    author = {Dutta, Aritra and Das, Srijan and Nielsen, Jacob and Chakraborty, Rajatsubhra and Shah, Mubarak},
    title = {Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2024},
    pages = {22678-22690}
}


Usage Licence

The dataset is released under the Creative Commons CC-BY license, which allows users to distribute, remix, adapt, and build upon the material in any medium or format, as long as the creator is attributed. The license permits commercial use of MAVREC. As the authors of this manuscript and collectors of this dataset, we reserve the right to distribute the data.