Research:
I'm interested in computational photography and inverse problems that span the whole imaging
pipeline, from signal collection to scene reconstruction. Whether the device is an MRI scanner, a
modulated light source, or a mobile phone, I love working with real devices and real data.
Over the course of my research I've written a number of open-source data collection apps (a sketch
of how one might load their outputs follows the list):
- Pani (Android, camera2): An all-in-one camera app for continuous recording of Bayer RAWs,
accelerometer values, gyroscope measurements, and a metric ton of device metadata from multiple
camera configurations (main, ultrawide, telephoto). I am actively using this app in my current
work, and so plan to continue expanding its features over time.
- SoaP-App (iOS, AVFoundation): A "long-burst" capture app for recording sequences of up to 42
frames of Bayer RAWs, depth maps, accelerometer values, gyroscope measurements, and metadata.
- HNDR-App (iOS, ARKit): A "long-burst" capture app for recording sequences of up to 120 frames of
processed RGB images, depth maps, and pose estimates (from ARKit world tracking).
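These apps write their captures to disk for offline processing. As a rough illustration of what
working with this kind of data looks like, here is a minimal Python sketch for loading a long-burst
capture; the folder layout, filenames, resolution, and CSV columns are hypothetical placeholders,
so check each app's repository for the formats it actually writes.

```python
# Minimal sketch: loading a long-burst capture folder. The file layout
# (frame_*.raw at a fixed resolution, plus a gyro.csv) is HYPOTHETICAL;
# see each app's README for the real formats.
from pathlib import Path
import numpy as np

def load_capture(folder, height=3024, width=4032):
    """Load 16-bit Bayer RAW frames and gyroscope samples from a capture folder."""
    folder = Path(folder)
    frames = [
        np.fromfile(f, dtype=np.uint16).reshape(height, width)
        for f in sorted(folder.glob("frame_*.raw"))
    ]
    # Assumed CSV columns: timestamp_ns, wx, wy, wz (rad/s).
    gyro = np.loadtxt(folder / "gyro.csv", delimiter=",", skiprows=1)
    return np.stack(frames), gyro

frames, gyro = load_capture("capture_001")
print(frames.shape, gyro.shape)  # e.g. (42, 3024, 4032), (N, 4)
```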
"If you try and take a cat apart to see how it works, the first thing you have on your hands is a non-working cat."
- Douglas Adams
Publications:
Neural Light Spheres for Implicit Image Stitching and View Synthesis
Ilya Chugunov,
Amogh Joshi,
Kiran Murthy,
Francois Bleibel,
Felix Heide
SIGGRAPH Asia, 2024
We design a spherical neural light field model for implicit panoramic image stitching and re-rendering, capable of handling depth parallax, view-dependent lighting, and scene motion. Our compact model decomposes the scene into view-dependent ray offset and color components and, as it requires no volume sampling, achieves real-time 1080p rendering.
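To make the no-volume-sampling point concrete, here is a toy PyTorch sketch of the idea as
described above: each camera ray is intersected with a sphere, and small networks predict a
view-dependent offset to that intersection plus a color, so rendering costs one network evaluation
per ray rather than many samples along it. The network sizes and exact parameterization are
placeholders, not the paper's architecture.

```python
# Toy sketch of a spherical neural light field (not the paper's architecture).
# One MLP evaluation per ray -- no volume sampling.
import torch
import torch.nn as nn

def mlp(d_in, d_out, width=64):
    return nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                         nn.Linear(width, width), nn.ReLU(),
                         nn.Linear(width, d_out))

class NeuralLightSphere(nn.Module):
    def __init__(self):
        super().__init__()
        self.offset = mlp(6, 3)  # view-dependent ray offset component
        self.color = mlp(6, 3)   # view-dependent color component

    def forward(self, origins, dirs, radius=1.0):
        # Intersect each ray with a sphere of the given radius (camera inside).
        b = (origins * dirs).sum(-1, keepdim=True)
        c = (origins * origins).sum(-1, keepdim=True) - radius**2
        t = -b + torch.sqrt(b * b - c)
        hit = origins + t * dirs
        x = torch.cat([hit, dirs], dim=-1)
        # Offset the sphere intersection to model parallax, then shade.
        shifted = hit + self.offset(x)
        return torch.sigmoid(self.color(torch.cat([shifted, dirs], dim=-1)))

model = NeuralLightSphere()
o = torch.zeros(8, 3)
d = torch.nn.functional.normalize(torch.randn(8, 3), dim=-1)
rgb = model(o, d)  # (8, 3), one forward pass per ray
```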
Split-Aperture 2-in-1 Computational Cameras
Zheng Shi*,
Ilya Chugunov*,
Mario Bijelic,
Geoffroi Côté,
Jiwoon Yeom,
Qiang Fu,
Hadi Amata,
Wolfgang Heidrich,
Felix Heide
SIGGRAPH, 2024
Split-aperture 2-in-1 computational cameras encode half the aperture with a diffractive optical
element to simultaneously capture optically coded and conventional images in a single device. Using
a dual-pixel sensor, our camera separates the wavefronts, retaining high-frequency content and
enabling single-shot high-dynamic-range, hyperspectral, and depth imaging.
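As a crude illustration of the dual-pixel idea, the numpy sketch below simulates the two sub-images
under a simple convolutional image-formation assumption: each dual-pixel half integrates light from
one half of the aperture, so one sub-image sees a DOE-coded point spread function and the other a
conventional one. The PSFs here are stand-ins; the paper uses a learned DOE and a full wave-optics
forward model.

```python
# Crude simulation of a split-aperture dual-pixel capture (convolutional
# approximation; the paper models wave optics and a learned DOE).
import numpy as np
from scipy.signal import fftconvolve

def gaussian_psf(size=15, sigma=2.0):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return psf / psf.sum()

rng = np.random.default_rng(0)
scene = rng.uniform(size=(256, 256))

psf_conventional = gaussian_psf()                         # open aperture half
psf_coded = gaussian_psf() * rng.uniform(size=(15, 15))   # stand-in coded PSF
psf_coded /= psf_coded.sum()

# Each dual-pixel half integrates light from one half of the aperture,
# so the two sub-images see different PSFs of the same scene.
left = fftconvolve(scene, psf_conventional, mode="same")
right = fftconvolve(scene, psf_coded, mode="same")
```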
Neural Spline Fields for Burst Image Fusion and Layer Separation
Ilya Chugunov,
David Shustin,
Ruyu Yan,
Chenyang Lei,
Felix Heide
CVPR, 2024
We propose neural spline fields, coordinate networks trained to map input 2D points to vectors of
spline control points, as a versatile representation of pixel motion during burst photography. This
flow model can fuse images during test-time optimization with only a photometric loss, no
regularization required. By layering these representations, we can separate effects such as
occlusions, reflections, shadows, and more.
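The core mechanism is compact enough to sketch. Below, a small PyTorch MLP (sizes are illustrative,
and I use a piecewise-linear spline where the paper may use a higher-order one) maps each pixel
coordinate to control points, and evaluating the spline at a frame's timestamp yields that pixel's
flow:

```python
# Sketch of a neural spline field: an MLP maps a pixel coordinate to K control
# points, and a spline evaluated at time t gives that pixel's 2D flow.
import torch
import torch.nn as nn

class NeuralSplineField(nn.Module):
    def __init__(self, n_ctrl=8, width=128):
        super().__init__()
        self.n_ctrl = n_ctrl
        self.net = nn.Sequential(
            nn.Linear(2, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, 2 * n_ctrl),  # K control points, each a 2D offset
        )

    def forward(self, xy, t):
        """xy: (N, 2) coords in [0,1]^2; t: (N,) times in [0,1] -> (N, 2) flow."""
        ctrl = self.net(xy).view(-1, self.n_ctrl, 2)
        # Piecewise-linear spline evaluation at time t.
        u = t * (self.n_ctrl - 1)
        i0 = u.floor().long().clamp(max=self.n_ctrl - 2)
        w = (u - i0.float()).unsqueeze(-1)
        idx = torch.arange(len(xy))
        return (1 - w) * ctrl[idx, i0] + w * ctrl[idx, i0 + 1]

nsf = NeuralSplineField()
xy = torch.rand(1024, 2)
flow = nsf(xy, torch.full((1024,), 0.5))  # flow for every queried pixel at t=0.5
```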
Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized Photography
Ilya Chugunov,
Yuxuan Zhang,
Felix Heide
CVPR, 2023
In a “long-burst” (forty-two 12-megapixel RAW frames captured over a two-second sequence), there is
enough parallax information from natural hand tremor alone to recover high-quality scene depth. We
fit a neural RGB-D model directly to this long-burst data to recover depth and camera motion with no
LiDAR, no external pose estimates, and no disjoint preprocessing steps.
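A stripped-down sketch of what "fitting a model directly to the long-burst" can look like: a
coordinate network predicts depth, per-frame pose offsets are free parameters, and both are
optimized against a purely photometric objective. The pinhole warp, small-angle rotation, network
sizes, and random stand-in frames below are simplifications for illustration, not the paper's
model.

```python
# Sketch of fitting a neural RGB-D model to a long-burst with photometric loss
# alone (toy pinhole warp and small-angle pose; not the paper's architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

depth_net = nn.Sequential(nn.Linear(2, 128), nn.ReLU(),
                          nn.Linear(128, 128), nn.ReLU(),
                          nn.Linear(128, 1), nn.Softplus())
poses = nn.Parameter(torch.zeros(42, 6))  # per-frame rotation + translation

def warp(xy, depth, pose, f=1.0):
    # Back-project, apply a small rigid transform, re-project (pinhole model).
    X = torch.cat([xy * depth / f, depth], dim=-1)
    w, t = pose[:3], pose[3:]
    X = X + torch.cross(w.expand_as(X), X, dim=-1) + t  # first-order rotation
    return f * X[:, :2] / X[:, 2:3]

opt = torch.optim.Adam([*depth_net.parameters(), poses], lr=1e-3)
frames = torch.rand(42, 3, 512, 512)  # stand-in for the RAW long-burst
for _ in range(1000):
    xy = torch.rand(4096, 2) * 2 - 1  # sample pixels in [-1,1]^2
    ref = F.grid_sample(frames[:1], xy.view(1, 1, -1, 2), align_corners=True)
    loss = 0.0
    for i in range(1, 42):
        xy_i = warp(xy, depth_net(xy), poses[i])
        tgt = F.grid_sample(frames[i:i+1], xy_i.view(1, 1, -1, 2),
                            align_corners=True)
        loss = loss + (ref - tgt).abs().mean()  # photometric-only objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```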
GenSDF: Two-Stage Learning of Generalizable Signed Distance Functions
Gene Chou,
Ilya Chugunov,
Felix Heide
NeurIPS, 2022 (Featured)
Signed distance fields (SDFs) can be a compact and convenient way of representing 3D objects, but
state-of-the-art learned methods for SDF estimation struggle to fit more than a few shapes at a
time. This work presents a two-stage semi-supervised meta-learning approach that learns generic
shape priors to reconstruct over a hundred unseen object classes.
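For context, here is a minimal sketch of the kind of SDF network and self-supervision that make
semi-supervised training on unlabeled point clouds possible (not GenSDF's two-stage pipeline
itself): the network should vanish on observed surface points and have unit-norm gradients
elsewhere.

```python
# Minimal SDF coordinate network with self-supervised losses: zero distance on
# surface samples, plus the eikonal constraint |∇f| = 1 in free space.
import torch
import torch.nn as nn

sdf = nn.Sequential(nn.Linear(3, 256), nn.ReLU(),
                    nn.Linear(256, 256), nn.ReLU(),
                    nn.Linear(256, 1))

surface = torch.randn(2048, 3)        # points sampled on a scanned surface
space = torch.rand(2048, 3) * 2 - 1   # random points in the volume
space.requires_grad_(True)

loss_surf = sdf(surface).abs().mean()  # SDF should vanish on the surface
d = sdf(space)
(grad,) = torch.autograd.grad(d.sum(), space, create_graph=True)
loss_eik = ((grad.norm(dim=-1) - 1) ** 2).mean()  # unit-gradient constraint
loss = loss_surf + 0.1 * loss_eik
```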
Centimeter-Wave Free-Space Time-of-Flight Imaging
Seung-Hwan Baek,
Noah Walsh,
Ilya Chugunov,
Zheng Shi,
Felix Heide
SIGGRAPH, 2022
Modern AMCW time-of-flight (ToF) cameras are limited to modulation frequencies of several hundred
MHz by the absorption properties of silicon. In this work we leverage electro-optic modulators to
build the first free-space GHz ToF imager. To resolve the resulting high-frequency phase
ambiguities, we also introduce a segmentation-inspired neural phase unwrapping network.
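The phase ambiguity is simple to quantify: an AMCW camera measures depth as d = c·φ/(4πf), which
wraps every c/(2f). The frequencies in the snippet below are illustrative round numbers, not the
exact ones used in the paper.

```python
# Why GHz modulation needs phase unwrapping: depth from an AMCW ToF phase is
# d = c * phi / (4 * pi * f), unambiguous only up to c / (2 * f).
c = 3e8  # speed of light, m/s

for f in [200e6, 10e9]:  # a typical silicon ToF frequency vs. a GHz modulator
    print(f"f = {f/1e9:g} GHz -> unambiguous range = {100 * c / (2 * f):.1f} cm")
# f = 0.2 GHz -> unambiguous range = 75.0 cm
# f = 10 GHz -> unambiguous range = 1.5 cm
```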
The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement
Ilya Chugunov,
Yuxuan Zhang,
Zhihao Xia,
Xuaner (Cecilia) Zhang,
Jiawen Chen,
Felix Heide
CVPR, 2022 (Oral)
Modern smartphones can stream multi-megapixel RGB images, high-quality 3D pose information, and
low-resolution depth estimates at 60Hz. In tandem, the natural shake of a phone photographer's hand
provides us with dense micro-baseline parallax depth cues during viewfinding. This work explores how
we can combine these data streams to get a high-fidelity depth map from a single snapshot.
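A toy sketch of one way these streams can be combined (a simplification for illustration; see the
paper for the actual model): a coordinate network predicts full-resolution depth, anchored to the
phone's coarse depth stream, with the posed RGB frames supplying a reprojection term that sharpens
it.

```python
# Sketch of multi-frame depth refinement with known poses: a coordinate MLP
# predicts full-resolution depth, anchored to the phone's low-res depth and
# sharpened by reprojection consistency across posed viewfinder frames.
import torch
import torch.nn as nn
import torch.nn.functional as F

depth_mlp = nn.Sequential(nn.Linear(2, 128), nn.ReLU(),
                          nn.Linear(128, 1), nn.Softplus())

xy = torch.rand(4096, 2) * 2 - 1            # query pixels in [-1,1]^2
d = depth_mlp(xy)                           # predicted high-res depth

lowres = torch.rand(1, 1, 24, 32)           # phone's coarse depth stream
d_coarse = F.grid_sample(lowres, xy.view(1, 1, -1, 2),
                         align_corners=True).view(-1, 1)
loss_anchor = (d - d_coarse).abs().mean()   # stay close to the coarse depth
# ...plus a photometric term warping posed RGB frames with d (omitted here).
```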
Mask-ToF: Learning Microlens Masks for Flying Pixel Correction in Time-of-Flight Imaging
Ilya Chugunov,
Seung-Hwan Baek,
Qiang Fu,
Wolfgang Heidrich,
Felix Heide
CVPR, 2021
Flying pixels are pervasive depth artifacts in time-of-flight imaging, formed by light paths from
both an object and its background connecting to the same sensor pixel. Mask-ToF jointly learns a
microlens-level occlusion mask and refinement network to respectively encode and decode geometric
information in device measurements, helping reduce these artifacts while remaining light efficient.
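The encode/decode structure can be sketched as a joint optimization: a learnable mask (kept
differentiable through a sigmoid) modulates simulated measurements, and a refinement network
decodes them, with a bonus term rewarding light efficiency. The forward model and network below are
toy stand-ins for the paper's ToF simulator and architecture.

```python
# Sketch of the Mask-ToF encode/decode idea: a learnable per-microlens mask
# modulates simulated ToF measurements, a refinement network decodes them,
# and both are optimized jointly. Simulator and network are toy stand-ins.
import torch
import torch.nn as nn

mask_logits = nn.Parameter(torch.zeros(1, 1, 64, 64))   # one value per microlens
refine = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(32, 1, 3, padding=1))
opt = torch.optim.Adam([mask_logits, *refine.parameters()], lr=1e-3)

def simulate(depth, mask):
    # Placeholder forward model: masked measurement with flying-pixel-style blur.
    blurred = nn.functional.avg_pool2d(depth, 3, stride=1, padding=1)
    return mask * blurred

depth_gt = torch.rand(8, 1, 64, 64)
for _ in range(100):
    mask = torch.sigmoid(mask_logits)  # soft mask in (0,1), kept differentiable
    meas = simulate(depth_gt, mask)
    loss = (refine(meas) - depth_gt).abs().mean() \
         - 0.01 * mask.mean()          # reward light efficiency (open mask)
    opt.zero_grad()
    loss.backward()
    opt.step()
```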
Self-Contained Jupyter Notebook Labs Promote Scalable Signal Processing Education
Dominic Carrano,
Ilya Chugunov,
Jonathan Lee,
Babak Ayazifar
6th International Conference on Higher Education Advances (HEAd), 2020
Jupyter Notebook labs can offer an experience similar to in-person lab sections while remaining
self-contained, with the relevant resources embedded directly in their cells. They interactively
demonstrate real-life applications of signal processing while reducing overhead for course staff.
Multiscale Low-Rank Matrix Decomposition for Reconstruction of Accelerated Cardiac CEST MRI
Ilya Chugunov,
Wissam AlGhuraibawi,
Kevin Godines,
Bonnie Lam,
Frank Ong,
Jonathan Tamir,
Moriel Vandsburger
28th Annual Meeting of the International Society for Magnetic Resonance in Medicine (ISMRM), 2020
Leveraging sparsity in the Z-spectrum domain, multiscale low-rank reconstruction of cardiac
chemical exchange saturation transfer (CEST) MRI enables 4-fold acceleration of scans while
preserving accurate Lorentzian line-fit analysis.
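To give a flavor of the reconstruction machinery, here is a single-scale numpy sketch of low-rank
recovery from undersampled k-space via iterative singular value thresholding; the paper's method
decomposes the data into multiple scales of low-rank blocks, and the sizes, sampling rate, and
threshold below are arbitrary illustrations.

```python
# Single-scale sketch of low-rank MRI reconstruction via singular value
# thresholding (the paper uses a multiscale low-rank decomposition).
import numpy as np

rng = np.random.default_rng(0)
x_true = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 32))  # low-rank
mask = rng.uniform(size=x_true.shape) < 0.4          # undersampling pattern
y = mask * np.fft.fft2(x_true)                       # measured k-space

def svt(X, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt   # soft-threshold spectrum

x = np.zeros_like(x_true, dtype=complex)
for _ in range(100):
    grad = np.fft.ifft2(mask * (np.fft.fft2(x) - y))  # data-consistency step
    x = svt(x - grad, tau=0.5)
print(np.linalg.norm(x.real - x_true) / np.linalg.norm(x_true))
```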
Duodepth: Static Gesture Recognition Via Dual Depth Sensors
Ilya Chugunov,
Avideh Zakhor
IEEE International Conference on Image Processing (ICIP), 2019
Implicitly registering point clouds from two structured light sensors with a 3D spatial transform
network, trained end-to-end for gesture recognition, can outperform explicit iterative closest
point (ICP) registration of the same clouds.
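The "implicit" half of that comparison can be sketched quickly: a small permutation-invariant
network predicts a transform (affine here for simplicity) that aligns the second sensor's cloud
before classification, so alignment is learned from the recognition loss rather than computed by
ICP. The architecture below is a generic stand-in, not the paper's network.

```python
# Sketch of implicit registration: a small network predicts a transform
# (affine here for simplicity) aligning the second sensor's point cloud,
# trained end-to-end with the recognition loss instead of an explicit ICP step.
import torch
import torch.nn as nn

class SpatialTransform3D(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
        self.head = nn.Linear(64, 12)  # 3x3 linear part + 3D translation

    def forward(self, pts):                     # pts: (B, N, 3)
        feat = self.net(pts).max(dim=1).values  # permutation-invariant pooling
        theta = self.head(feat)
        A = theta[:, :9].view(-1, 3, 3) + torch.eye(3)  # init near identity
        t = theta[:, 9:].unsqueeze(1)
        return pts @ A.transpose(1, 2) + t

stn = SpatialTransform3D()
cloud_b = torch.randn(4, 1024, 3)  # second sensor's point cloud
aligned = stn(cloud_b)             # fed with the first cloud into the classifier
```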