Lecture 15¶
Announcements¶
- P3 due! P4 out! P2 grading is (slowly) happening!
Goals¶
- Discuss several different ways 3D reconstructions can be represented
- Explain why positional encodings are necessary for MLP neural networks in contexts like fitting 2D images or 3D scene representations.
- Be prepared to implement NeRF (Project 4).
- Know how to perform volume rendering along camera rays to predict a pixel color from the model
3D Representations¶
- SfM: given images, get camera pose and (sparse) 3D scene geometry
- Large-scale SfM result examples:
- Multiview stereo / 3D reconstruction: given SfM outputs, recover 3D model of the world
- Interesting question: how do you represent your 3D model?
Brainstorm¶
How would you reconstruct a 3D model of the world, given images, camera poses, and a sparse point cloud of the world?
How would you even represent a 3D model of the world?
These questions are clearly interrelated. Let's brainstorm.
- Voxel grid
- Polygon mesh
- Continuous density volume
- Point clouds (denser!)
- Patch clouds
- Signed distance field
- Neural network!?
# boilerplate setup
%load_ext autoreload
%autoreload 2
%matplotlib inline
import os
import sys
src_path = os.path.abspath("../src")
if src_path not in sys.path:
    sys.path.insert(0, src_path)
# Library imports
import numpy as np
import imageio.v3 as imageio
import matplotlib.pyplot as plt
import skimage as skim
import cv2
import torch
import torch.nn as nn
import torch.nn.functional as F
# codebase imports
import util
import filtering
import features
import geometry
import ML
3D Representations - Some ideas¶
Source for some relevant visuals: https://courses.cs.washington.edu/courses/cse455/10wi/lectures/multiview.pdf
- Depth maps; multi-camera: depth map fusion
- Voxel grids
- Point clouds
- Patch clouds (surfels)
- Polygon mesh
- SDF
- Neural network!?
ML review: 0 to MLP¶
Review by example the anatomy, care, and feeding of an MLP
MLP: a stack of linear layers, each followed by a nonlinear activation function
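For concreteness, here's a minimal sketch of what such a network can look like in PyTorch. This is an illustration only, not the course codebase's ML.MLP, whose details may differ:
# Hypothetical minimal MLP, for illustration only (ML.MLP may differ)
class TinyMLP(nn.Module):
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),  # linear layer
            nn.ReLU(),                  # nonlinearity between layers
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),       # scalar output (e.g., a class score)
        )

    def forward(self, x):
        return self.net(x)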
import sklearn
import sklearn.datasets
moons = ML.scale_split(sklearn.datasets.make_moons(n_samples=1000, noise=0.1, random_state=0))
X, Xva, y, yva, xx, yy = moons
ML.plot_dataset(X, y)
Code tour:
- look at
ML.MLP()
- look at training routine below
- flag me down if something's not familiar!
def train(model, X, y, train_iters=1000):
    optimizer = torch.optim.Adam(model.parameters())
    for i in range(train_iters):
        optimizer.zero_grad()
        # sample a random minibatch (with replacement)
        batch_indices = torch.randint(0, X.shape[0], (1000,))
        batch_X, batch_y = X[batch_indices, :], y[batch_indices]
        outputs = model(batch_X).squeeze()
        loss = F.mse_loss(outputs, batch_y)  # mean squared error vs. labels
        loss.backward()
        optimizer.step()
    return model
def plot_trained_model(model, X, y, xx, yy, encode=lambda x: x):
    with torch.no_grad():
        h, w = xx.shape
        # evaluate the model on a dense grid of points
        dense_X = encode(torch.vstack([xx.flatten(), yy.flatten()]).T)
        dense_ypred = model(dense_X).reshape((h, w)).flip([0])
        plt.gca().imshow(dense_ypred, extent=[xx.min(), xx.max(), yy.min(), yy.max()])
    # overlay the training points on the prediction heatmap
    ML.plot_dataset(X, y)
model = ML.MLP(2)
model = train(model, X, y, train_iters=1000)
plot_trained_model(model, X, y, xx, yy)
X, y = ML.make_stripes(100, 4, 0.05)
ML.plot_dataset(X, y)
X, y = ML.make_stripes(500, 10, 0.0001)
xx, yy = np.meshgrid(np.arange(0, 1, 0.01), np.arange(0, 1, 0.01))
xx = torch.Tensor(xx)
yy = torch.Tensor(yy)
model = train(ML.MLP(2), X, y)
plot_trained_model(model, X, y, xx, yy)
Conclusion: high-frequency stuff is hard for the MLP to learn!
Question: We need to go deeper; will more layers fix this?
Try MLP_N
X, y = ML.make_stripes(5000, 15, 0.001)
xx, yy = np.meshgrid(np.arange(0, 1, 0.01), np.arange(0, 1, 0.01))
xx = torch.Tensor(xx)
yy = torch.Tensor(yy)
model = train(ML.MLP_N(2, 10, 128), X, y)
plot_trained_model(model, X, y, xx, yy)
To a point, but it's going to get expensive...
Alternative: "positional encoding"
Very handwavy intuition: "smear" the input signal across more input channels to allow the network to learn high-frequency stuff.
pi = torch.pi
# very barebones positional encoding:
def positional_encoding(X):
    return torch.hstack([
        torch.sin(2*pi * X),
        torch.sin(4*pi * X),
        torch.sin(8*pi * X),
        torch.sin(16*pi * X),
        torch.cos(2*pi * X),
        torch.cos(4*pi * X),
        torch.cos(8*pi * X),
        torch.cos(16*pi * X)])
Xpe = positional_encoding(X)
model = train(ML.MLP(Xpe.shape[1]), Xpe, y)
plot_trained_model(model, X, y, xx, yy, encode=positional_encoding)
Neural Radiance Fields¶
Paper with helpful visuals: https://arxiv.org/pdf/2003.08934.pdf
Representation: continuous volume with color and density¶
- Basic idea: Parameterize a volumetric representation with an MLP
- Color and density are functions of 3D location and view direction: $f(x, y, z, \phi, \theta) = (r, g, b, \sigma)$
- Detail: density is constrained to depend only on location, not direction.
MLP Architecture:¶
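A minimal sketch of the structure (illustrative only; see the paper for the actual architecture, which is deeper and uses a skip connection):
# Sketch of a NeRF-style MLP (illustrative; not the paper's exact architecture)
class TinyNeRF(nn.Module):
    def __init__(self, pos_dim, dir_dim, hidden=256):
        super().__init__()
        # trunk sees only the (encoded) 3D position
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)  # density: position only
        # color head additionally sees the (encoded) view direction
        self.color_head = nn.Sequential(
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3),
        )

    def forward(self, pos, viewdir):
        feat = self.trunk(pos)
        sigma = F.relu(self.sigma_head(feat))  # keep density nonnegative
        rgb = torch.sigmoid(self.color_head(torch.cat([feat, viewdir], dim=-1)))
        return rgb, sigma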
Volume Rendering¶
Given a magic color-density-producing machine, how do you make an image?
(notes)
HW Problem 2¶
The somewhat opaque equation for weighting samples along a volume rendering ray is, in its continuous form: $$ C(\mathbf{r})=\int_{t_n}^{t_f}T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t),\mathbf{d})\,dt $$ and the discretized quadrature equation is: $$ \begin{align*} \hat{C}(\mathbf{r}) &= \sum_{i=1}^N w_i \mathbf{c}_i \\ &=\sum_{i=1}^{N}T_i(1-\exp(-\sigma_i\delta_i))\,\mathbf{c}_i \end{align*} $$ where $N$ is the number of samples, $T_i=\exp\left(-\sum_{j=1}^{i-1}\sigma_j\delta_j\right)$ is the accumulated transmittance, and $\delta_i=t_{i+1}-t_i$ is the distance between adjacent samples. This boils down to a weighted sum of the colors ($\mathbf{c}_i$) along the sample ray.
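In code, the weight computation might look like the following sketch; the names and shapes here are hypothetical, not Project 4's required API:
# Sketch: compute w_i = T_i * (1 - exp(-sigma_i * delta_i)) for one ray
def render_weights(sigmas, deltas):
    # alpha_i = 1 - exp(-sigma_i * delta_i): probability the ray
    # terminates within sample interval i
    alphas = 1.0 - torch.exp(-sigmas * deltas)
    # T_i = exp(-sum_{j<i} sigma_j delta_j) = prod_{j<i} (1 - alpha_j):
    # probability the ray survives to reach sample i
    trans = torch.cat([torch.ones(1), torch.cumprod(1.0 - alphas, dim=0)])[:-1]
    return trans * alphas  # w_i

# the predicted pixel color is then the weighted sum of sample colors:
# C_hat = (render_weights(sigmas, deltas)[:, None] * colors).sum(dim=0)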
To get some intuition for this, let's plug in a simple case and plot the weights. Take samples at $t = 1..10$ and assume the density is 0 everywhere except for a constant-density object with density $\sigma = 0.4$ ranging between $t=4$ and $t=6$ inclusive. Using software of your choice, plug this situation into the above equation to compute the weights $w_{1..10}$, and plot these to show the weights on the 10 different sample points.
Positional Encoding¶
High frequencies aren't learned well by the naive implementation, so NeRF applies a positional encoding to its inputs:
$$ \gamma(p)=\bigl(\sin(2^0\pi p),\ \cos(2^0\pi p),\ \sin(2^1\pi p),\ \cos(2^1\pi p),\ \cdots,\ \sin(2^{L-1}\pi p),\ \cos(2^{L-1}\pi p)\bigr) $$
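A general implementation parameterized by the number of frequency bands $L$ might look like the sketch below, generalizing the barebones encoding from earlier (the ordering of terms here is one choice; Project 4 may expect a different layout):
# Sketch: positional encoding gamma(p) with L frequency bands, applied
# independently to each input coordinate
def gamma(p, L=10):
    freqs = (2.0 ** torch.arange(L)) * torch.pi  # 2^0 * pi, ..., 2^(L-1) * pi
    angles = p.unsqueeze(-1) * freqs             # (..., D, L)
    enc = torch.stack([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(-3)                       # (..., D * 2L)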
More Recently: Gaussian Splats¶
Look ma, no MLP!
Just a giant cloud of 3D Gaussians that are "learned" (optimized) to minimize reprojection error!
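Concretely, each splat carries a handful of learnable parameters, roughly like the sketch below (illustrative only; the paper's actual representation stores color as spherical harmonics, among other details):
# Rough per-splat learnable parameters (illustrative layout, not the paper's)
splat = {
    "mean": torch.zeros(3, requires_grad=True),       # 3D center
    "log_scale": torch.zeros(3, requires_grad=True),  # anisotropic extent
    "rotation": torch.tensor([1.0, 0.0, 0.0, 0.0], requires_grad=True),  # orientation quaternion
    "opacity": torch.zeros(1, requires_grad=True),
    "color": torch.zeros(3, requires_grad=True),      # paper: spherical harmonics
}
Rendering projects ("splats") each Gaussian into the image and alpha-composites them; gradients of the reprojection error then flow back into these parameters.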
Shiny results: https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
Paper with some helpful visuals: https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/3d_gaussian_splatting_low.pdf