Remote distance measurement from a single image by automatic detection and perspective correction

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)


This paper proposes a novel method for locating objects in real space from a single remote image and measuring the actual distances between them by automatic detection and perspective transformation. The dimensions of the real space are known in advance. First, the corner points of the region of interest are detected in the image using deep learning. Then, based on these corner points, the region of interest (ROI) is extracted and made proportional to the real space by applying a warp-perspective transformation. Finally, the objects are detected and mapped to their real-world locations. Removing distortion from the image via camera calibration improves accuracy in most cases. The deep learning framework Darknet is used for detection, with the necessary modifications made to integrate perspective transformation, camera calibration, and undistortion. Experiments are performed with two types of cameras, one with barrel distortion and the other with pincushion distortion. The results show that the differences between the calculated distances and those measured in real space with measuring tapes are very small, approximately 1 cm on average. Furthermore, automatic corner detection allows the system to be used with any camera, whether in a fixed pose or in motion, and using more points significantly enhances the accuracy of real-world mapping even without camera calibration. Perspective transformation also increases object detection efficiency by normalizing the sizes of all objects.
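The core of the pipeline described above, mapping detected image points into real-world coordinates through a perspective (homography) transformation fixed by the four corner points of a region with known dimensions, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the corner pixel coordinates, the 4 m × 3 m region size, and the object detections are hypothetical, and the Darknet-based detection and camera undistortion steps from the paper are omitted.

```python
import numpy as np

def homography_from_corners(img_pts, world_pts):
    """Solve the 8-unknown direct linear transform (DLT) system for the
    homography H mapping image pixels to real-world coordinates, given
    four corner correspondences (h33 is fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(img_pts, world_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def to_world(H, pt):
    """Map one pixel coordinate to real-world coordinates via H."""
    u, v, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([u / w, v / w])

# Hypothetical detected corners of a 4 m x 3 m floor region (pixels),
# paired with the known real-space dimensions (metres).
img_corners = [(100, 50), (700, 80), (650, 500), (120, 470)]
world_corners = [(0, 0), (4, 0), (4, 3), (0, 3)]
H = homography_from_corners(img_corners, world_corners)

# Two hypothetical object detections (pixel centres) mapped to metres;
# their separation is the measured real-world distance.
p1 = to_world(H, (300, 250))
p2 = to_world(H, (500, 300))
dist = np.linalg.norm(p1 - p2)
```

By construction, each image corner maps exactly onto its known real-space corner, so any point inside the ROI is interpolated into metric coordinates and inter-object distances follow from ordinary Euclidean distance.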

Original language: English
Pages (from-to): 3981-4004
Number of pages: 24
Journal: KSII Transactions on Internet and Information Systems
Issue number: 8
Publication status: Published - 2019


  • Deep learning
  • Real-world mapping
  • Object detection
  • Object locating
  • Remote measurement


