James' Blog: 2013

Sunday, December 8, 2013

Matlab/MEX utility for loading VTK Rectilinear Grid files (.vtr files)

The title says it all, I've written some basic code for loading VTK rectilinear grid files into Matlab. The code supports uniformly spaced meshes in up to four dimensions for both point and cell data.

You can download the code from my Github repository:
https://github.com/jamesgregson/matlab_vtr

Tuesday, October 22, 2013

Index transformation between bounding boxes and uniform grids

I end up rewriting this code all the time. It transforms from points in space to grid coordinates world_to_grid() and back grid_to_world(), such that p = grid_to_world( aabb, dim, world_to_grid( aabb, dim, p ) ). It's simple, but bugs here can mess up lots of things in ways that are hard to detect. No range checking is performed in order to make it easy to apply custom boundary conditions.

    template< typename real, typename index, typename real3 >
    inline real3 _world_to_grid( const real *aabb, const index *dim, const real3 &pos ){
        return real3( real(dim[0]-1)*(pos[0]-aabb[0])/(aabb[1]-aabb[0]), real(dim[1]-1)*(pos[1]-aabb[2])/(aabb[3]-aabb[2]), real(dim[2]-1)*(pos[2]-aabb[4])/(aabb[5]-aabb[4]) );
    }
    
    template< typename real, typename index, typename real3 >
    inline real3 _grid_to_world( const real *aabb, const index *dim, const real3 &pos ){
        return real3( aabb[0]+real(pos[0])*(aabb[1]-aabb[0])/real(dim[0]-1), aabb[2]+real(pos[1])*(aabb[3]-aabb[2])/real(dim[1]-1), aabb[4]+real(pos[2])*(aabb[5]-aabb[4])/real(dim[2]-1) );
    }

The code assumes that the dim variable holds the number of points in each direction (e.g. int dim[] = { nx, ny, nz };) and that the first point is coincident with the minimum value of the axis-aligned bounding box with the last point coincident with the maximum value of the bounding box. The aabb parameter is the axis-aligned bounding box defining the grid in world space, e.g. int aabb[] = { xmin, xmax, ymin, ymax, zmin, zmax };

Thursday, October 10, 2013

A follow-up to fluid simulation on non-uniform grids

In my last post, I discussed preliminary results for fluid simulation on non-uniform Cartesian grids. In that post I showed some preliminary results, but there were some bugs that added disturbing artifacts.

I have fixed a number of bugs and now have a solver based on BFECC advection using min-max limited cubic interpolation for non-uniform and often highly anisotropic meshes for the velocity and pressure solve, with high-resolution uniform density fields for the density field. The results are fairly impressive:

This image shows a volume rendering (in Paraview) of a simulation computed using the grid in the background. Near the area of interest, the grid is uniform, but it grows very quickly (geometrically, with growth rate ~1.5) outside this region. The velocity/pressure grid is 133x127x134, but covers nearly a 10x6x10 m cubic volume, with 2cm cells in the fine region. The density field is 1x6x1 m with 1cm uniform resolution.

Being able to run different resolutions and gradings for the velocity and density fields is extremely helpful: fine fluid details distract from a poor velocity solution, and high-resolution densities help avoid diffusion in the physics. The image above shows the density as resolved by the fluid grid. It is terrible. However the density as resolved by the density grid is -way- better:

It's still not perfect, but given the cell size and anisotropy, I think it does extremely well. Although there are definite artifacts, the payoff is in the memory usage and runtime. The whole simulation is 5 seconds in real time and takes approximately 22 seconds per output frame, meaning my 5 second simulation completes in under an hour. These simulations used to take on the order of 4-5 hours.

The results look pretty good. I think the grading is too steep to get really nice results, but it's an excellent proof-of-concept.

Tuesday, October 8, 2013

Fluid Simulation for Graphics on Non-Uniform Structured Grids

I've been playing around with fluid simulation on non-uniform structured grids. They have some charms in that it is easy to have very large domains with isolated regions of interest; e.g. near the camera or salient fluid features. One of the big advantages is that it makes far-field boundaries easy, you simply extend the mesh far away from the domain.

My solver pretty run-of-the-mill; only the pressure solver was updated in order to handle anisotropic cells. I've used a finite-volume formulation to derive the pressure-correction equation for this case, but it suffices to say that only the weights used in forming the Poisson equation change.

Here is an example, a grid that is roughly 10x10x5 meters, with 2cm fine cells for roughly 100x100x100 cells:

The aspect ratios get quite high, 100:1 is not uncommon. You can see a buoyant flow simulation that I'm running as the contour values. The simulation takes about 20 seconds per frame, of which 7 seconds is writing the VTK output files that I use to post-process. In spite of that, the visual detail in the region of interest is excellent (given the resolution):

I find that it helps to advect a uniform resolution density field that is roughly 2X the fine cell resolution. Writing the code with general-purpose fields that can be point-sampled arbitrarily makes this a trivial feature to implement, simply allocate different grids for density and velocity.

Finally here is a rendering of the final simulation in Paraview, total simulation time ~35 minutes. There are still some bugs to track down, but the results are pretty promising:

Sunday, September 8, 2013

Updated C Mathematical Expression Parser

I've updated and posted my recursive descent mathematical expression parsing code. The new code is available from my Github repository:

https://github.com/jamesgregson/expression_parser

The original post describing the library is at: http://jamesgregson.blogspot.ca/2012/06/mathematical-expression-parser-in-c.html

New features include Boolean operations and a callback interface for custom named variables and functions. You can optionally disable the Boolean operations (and the 10 levels of parsing they trigger) if not needed.

Updated 4x4 transformation matching OpenGL

I've updated the code from my previous post: 4x4 transformation matching OpenGL to remove the requirement for a separate linear algebra library by introducing some 4x4 inverse code from Rodolphe Vaillant (originally at http://www.irit.fr/~Rodolphe.Vaillant/?e=7) along with some other minor improvements. You can still use Eigen or GMM to perform matrix inverses, but it is no longer a requirement.

As before, the code re-implements the main OpenGL matrix functions and includes code for testing the output against the system OpenGL. I use the code for a number of my imaging projects where it is desirable (but optional) to have an OpenGL preview.

The code is now also hosted at Github which should simply future updates:

https://github.com/jamesgregson/transformation

C++/Mex Image Deblurring using ADMM

I've posted some sample code on Github for performing image deblurring in Matlab using Mex. I wrote it as a way to play around with the ADMM algorithm for sparse signal reconstruction, as described in Stephen Boyd's ADMM paper, as well as to get some experience using C++ code from Matlab.

The code uses matrix-free linear operators to evaluate the forward and reverse measurement (blurring) process, solving the data-term subproblem with gradient descent. While you can do this much faster using FFTs for deconvolution, the code should generalize quite well to tomography with non-orthographic projections. I've tried to comment it well so that I can figure out what I did if I ever need to pick it up again.

In the image above, the left plot is the ground-truth image and the right is blurred by a point-spread function and corrupted by Gaussian noise. After running the demo, the following results are obtained:

The reconstructed image is shown on the right, while a slice through the center scanline shows the reconstruction (blue dots), ground-truth (solid black line) and input (red line). From the results it's clear that the method does well at preserving discontinuities even for fairly large blurs and noise levels.

The demo is pretty quick to run and is helpful for choosing parameters since you can see how the behavior of the method changes with the regularizer weight and the penalty term weight.

You can download the code from https://github.com/jamesgregson/MexADMMDeblurDemo

Wednesday, June 19, 2013

FLIP fluid simulation with sample code

I've been continuing to play around with incompressible fluid simulation and have implemented the FLIP method (FLuid Implicit Particle). The FLIP algorithm represents the fluid concentration and velocity field using a large set of particles. At every iteration these particles are transferred to an auxiliary grid using kernel density estimation which defines a grid-based velocity field which is made incompressible by pressure projection and then used to advect the particles. Finally the velocity of each particle is corrected using the difference between the projected velocity field and the initial velocity field obtained by kernel density estimation.

The primary benefit of this approach is that it has extremely low numerical viscosity, making simulating fluids such as smoke and water possible at sane grid resolutions. The downside is considerably more complexity than a purely grid-based solver; you need to track particles, perform the mapping between particles and grid (and vice versa) and have some sensible scheme to reseed particles when either there are too few or too many in a region.

To play with the method I implemented it in C++, largely following Robert Bridson's Fluid Simulation for Computer Graphics. It is a canned example demonstrating variable density flow under the influence of gravity using an body force proportional to the fluid 'concentration', a sort of poor-man's Boussinesq approximation. The simulation below was performed on a 100x100 auxiliary grid with 16 particles per grid-cell. This is more than the 2x2 typically used, but helps get around reseeding. As this is not free-surface flow, there are particles everywhere which allows causes the smoke to form nice instabilities and finally turbulent mixing (although I did not run it particularly far).

In order to have an unmangled version of the code available to myself, I am posting it here. It is uncommented and unstructured, but (hopefully) fairly understandable once you understand the basic algorithm. The code is bundled with the GMM++ sparse matrix library for the pressure solves and required VTK and CMake for output and as a build-system respectively.

You can download the code here: FLIP2.zip

Thursday, June 13, 2013

Latex formulas as images using Python

I find it endlessly frustrating to incorporate math into vector-graphics documents and Powerpoint presentations. While Powerpoint does allow importing equations from the equation editor, they usually look terrible. However there is an online Latex equation editor run by codecogs.com that allows you to generate images of expressions. This works great, but it's inconvenient to save and regenerate images after using it.

To that end, I've written a python function that will generate the URL for the Latex image-generation script, get the image data using HTTP and write it to disk as a file of your choosing. It uses the python requests module (available via pip) and ImageMagick:

import os, requests

def formula_as_file( formula, file, negate=False ):
    tfile = file
    if negate:
        tfile = 'tmp.png'
    r = requests.get( 'http://latex.codecogs.com/png.latex?\dpi{300} \huge %s' % formula )
    f = open( tfile, 'wb' )
    f.write( r.content )
    f.close()
    if negate:
        os.system( 'convert tmp.png -channel RGB -negate -colorspace rgb %s' %file )

The code is appallingly simple for how much time it has saved me; it allows me to regenerate formulas for conference posters by simply running a python script and since most vector-graphics packages simply link to bitmap images allows the resulting layouts to update automatically. Even better is that CodeCogs image generation script produces transparent PNGs so you can drop them on top of lightly colored backgrounds. By setting the negate flag you will get a white on transparent PNG that can be used over dark backgrounds.

Here are some examples:

formula_as_file( r'\Gamma_{Levin}(x) = \| \nabla p(x) \|_2^{0.8} + \sum_i |\frac{\partial^2 p(x)}{\partial x_i^2}|^{0.8}', 'reg_levin.png', True )

formula_as_file( r'\Gamma_{MTV}(x) = \| \sum_{i=1}^3 \left( \| \nabla p^{(i)}(x)\|_2^2 \right) \|_2', 'reg_MTV.png' )

Finally no more ugly power-point presentations!

Friday, May 31, 2013

Efficiently Sample from Normal (Gaussian) Distribution

The following code snippet samples from a standard normal distribution using the Box-Muller transform, which avoids nastiness like cropping to a fixed number of standard-deviations or rejection sampling. I have shamelessly cribbed this from Wikipedia (the previous link), but use it all the time, so I'm putting it up as a snippet.

void utils_sample_normal( float &x, float &y ){
    float u=drand48(), v=drand48();
    float lnu = sqrt( -2.0*log(u) );
    x = lnu*cos(2.0*M_PI*v);
    y = lnu*sin(2.0*M_PI*v);
}

pyPolyCSG library updated

I have updated the pyPolyCSG python constructive solid geometry library to use the most recent version of the Carve CSG library. The updates to Carve appear to make it significantly more robust as well as much faster.

The update to pyPolyCSG:

Improves the robustness, at least on my existing scripts
Improves the speed of performing Boolean operations
Fixes a bug where vertex normals and texture coordinates were treated as vertices (thanks to Ryan Rix for finding this)
Adds the ability to transform polyhedra by a 3x3 or 4x4 matrix (again thanks to Ryan Rix!)

You can get the updated version from github: https://github.com/jamesgregson/pyPolyCSG

Sunday, May 26, 2013

Smooth Feedrate Envelopes for Motion Control, Part II

In the previous post, I derived equations for smooth feedrate control of stepper motors, claiming that by using a smoother feedrate envelope that the motors could be driven faster with less chance of skipping steps and losing position.

In this post, I demonstrate this with a real stepper motor and show that it actually does work: using the envelopes does actually prevent the motors from losing steps. My test setup is a single NEMA 17 stepper, driven by one of my A4988 driver breakouts, which is controlled by an Arduino sketch running on the Arduino Due. I'm using half-stepping on the motors, driven by a 200KHz timer interrupt step callback which decides whether or not to step based on the interpolated supplied delays for the start and end of each move. The move itself approximates a square wave, first accelerating from a slow feedrate, then performing a constant speed portion, then decelerating back to the initial feedrate.

The video below shows the stepper being driven using a constant acceleration profile, which causes the kinks in the feedrate graph in my previous post. You can clearly see it moving around on the table and stalling frequently before it reaches the top speed.

In contrast, here is the result using the third-order cubic feedrate envelope for the same set of moves. The stepper is easily able to handle the top speed and jerks around considerably less on the table. Of course this comes at a price, a higher pulse-frequency must be used to resolve the acceleration profile.

You can get the code I used for this from following link: https://sites.google.com/site/jamesgregson/tmp/linear_move.zip, it includes a multi-axis DDA implementation suitable for use with timer-interrupts as well as the code for evaluating the feedrate envelopes.

Thursday, May 23, 2013

Smooth Feedrate Envelopes for Motion Control

When running motors it is desirable to run them as smoothly as possible to minimize vibrations and possible missed steps. This is why controllers for 3D printers and CNC machines typically incorporate some notion of acceleration rather than instantly switching from one feedrate to another. Often this is done with a simple ramp of the feedrate, i.e. using a constant acceleration profile.

As an example, consider a machine starting at feedrate $f_0$ and then performing a very long linear move at a constant feedrate $f_1$. For this example, If $f(t)$ is the machine feedrate over time and $a$ is a constant acceleration, this profile would be defined mathematically as follows:

\begin{equation}
f(t) = \begin{cases}
f_0 & \mbox{if } t < t_0 \\
f_0 + a (t - t_0) \hspace{0.5cm} & \mbox{if } 0 \leq t - t_0 \leq \frac{f_1-f_0}{a} \\
f_1 & \mbox{otherwise}
\end{cases}
\end{equation}

Graphically, here's a plot of the feedrate over time for a move starting at rest at $t=0$ and accelerating up to a feedrate of 2 over one unit of time:

The problem with a constant acceleration profile is that there are sharp kinks in the feedrate plotted over time. These kinks imply instantaneous changes in acceleration, which in turn imply infinite forces for infinitely short periods of time. Of course, there is no mechanical way to produce these forces, so what actually happens is the machine overshoots very slightly and averages the forces out over a short time. For low feedrates with a light machine this actually works okay, but for a heavy machine at high-feedrates, the overshoot can be more than a motor step which causes the machine to lose position. In an open-loop design, once the machine loses position, it never recovers and in all likelihood, the part is ruined.

There are a few ways to address this problem:

Use lower feedrates
Use higher torque motors
Use a closed-loop control scheme, e.g. with encoders on the motors
Make the acceleration smooth

The first is clearly not an option because it wastes time and feedrates may be chosen specifically for valid reasons such as minimizing local part heating or reducing machining time. In an ideal world we'd do the remaining three items, but options two and three are expensive, particularly for hobby gear. However the fourth option can be tackled in firmware with minimal hardware overhead.

In order to smoothly transition between accelerations we can simply use a different curve to interpolate the feedrates. The conditions needed are that the feedrates match the desired rates at the beginning and end of the curve and that the slope of the feedrate curves (i.e. the acceleration) is zero at the endpoints. In between the endpoints we want the curve to be smooth.

The one of the simplest classes of functions that meet these requirements are cubic polynomials. These are defined by four coefficients $a$, $b$, $c$ and $d$ using the following equation, where $\tau$ is the fraction of the total time spent accelerating:

\begin{equation}
f(\tau) = a \tau^3 + b \tau^2 + c \tau + d
\end{equation}

We now want to solve for the coefficients needed to reproduce the move. There are four coefficients so we need four equations. Two come from the requirement that we match the feedrates at the curve endpoints:

\begin{eqnarray}
f(\tau=0) = a 0^3 + b 0^2 + c 0 + d &=& f_0 \\
f(\tau=1) = a 1^3 + b 1^2 + c 1 + d &=& f_1
\end{eqnarray}

From these, we see that $d=f_0$ and $a+b+c=f_1-f_0$. The remaining two equations can be found using the requirements that the slope of the feedrate curve is zero at the endpoints. To enforce these constraints we need the derivative of the cubic function:

\begin{equation}
f'(\tau) = 3 a \tau^2 + 2 b \tau + c
\end{equation}

The constraints can now be enforced by requiring that:

\begin{eqnarray}
f'(\tau=0) = 3 a 0^2 + 2 b 0 + c &=& 0 \\
f'(\tau=1) = 3 a 1^2 + 2 b 1 + c &=& 0
\end{eqnarray}

These equations make it clear that $c=0$ and $3 a + 2 b = 0$. Combining these with the previous conditions leaves two equations and two unknowns:

\begin{eqnarray}
a + b &=& f_1 - f_0 \\
3 a + 2b &=& 0
\end{eqnarray}

So $a = -\frac{2 b}{3}$ which means that $b = 3 (f_1-f_0)$ and $a = -2 (f_1 - f_0)$. This gives the following equation for the interpolating curve:

\begin{equation}
f(\tau) = -2(f_1-f_0)\tau^3 + 3(f_1-f_0)\tau^2 + f_0
\end{equation}

The only remaining thing is to define $\tau$ in terms of $t$. This is a simple linear interpolation from the start of the acceleration $t_0$ to the end of the acceleration $t_1=\frac{f_1-f_0}{a}$:

\begin{equation}
\tau = \frac{t-t_0}{t_1-t_0} = \frac{t-t_0}{\frac{f_1-f_0}{a}-t_0}
\end{equation}

Plotting this for the same parameters as before gives a smooth, kink-free curve that considerably reduces the time-rate-of-change of acceleration:

In the post-to-come I will demonstrate applying this to a real stepper motor being driven aggressively. Although seemingly complicated, for a cost of only a few operations per step, it is possible to switch from the linear acceleration profile to the cubic one derived here and get considerably smoother operation.

Matlab Signal Deblurring & Denoising Example

To date my research has been largely focused on inverse problem such as tomography or image deblurring. These problems are often highly under-determined and so must include strong priors to obtain good solutions and finding efficient solvers for these priors is challenging. A labmate recently pointed me towards the ADMM method, which splits the full problem into coupled sub-problems. It has a number of advantages:

Flexibility - ADMM handles a number of non-trivial problems in a common framework
Efficiency - Often the subproblems have efficient, often embarassingly-parallel, solvers

An excellent introduction to the method can be found in the paper: Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers by Boyd et al.. It's a pretty gentle and practical introduction, with numerous example problems.

I've quickly implemented ADMM for combined deblurring and denoising of 1D input signals using the total-variation regularization in a Generalized-Lasso problem definition. Unlike the Boyd paper, I've chosen to use Landweber iterations to solve the data subproblem as these are commonly used in large-scale deblurring and tomography. My well-commented sample implementation (see the end of this post) allows all the method parameters to be tweaked to see the effect of using inexact solves, different penalty parameters and different regularization weights.

As an example, the image below shows a 100 sample square-wave signal, blurred by a Gaussian with standard deviation of 3 and corrupted by Gaussian noise with a sigma of 5%. Only two inner Landweber iterations were used for each outer iteration. The red line shows the original blurred and noisy input, while the blue line shows the reconstructed result after 100 iterations.

Overall I find that the method works as advertised. It's fairly easy to implement, either in Matlab or in C/C++, can handle 1-norms of general linear regularizers and converges quickly. It also allows matrix-free solvers to be used for the data-subproblem, while the solve for the prior is embarrassingly-parallel, so it should scale quite well too.

You can get the implementation that produced this plot below. Note that you may get slightly different results since the noise is generated randomly.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%                                                                     %%%
%%% Sample signal denoising & deblurring using ADMM                 %%%
%%% code written by James Gregson (james.gregson@gmail.com), 2013       %%%
%%% Use the code for whatever you'd like                                %%%
%%%                                                                     %%%
%%% Demonstrates performing Total-Variation (TV) denoising and          %%%
%%% deblurring of a 1D input signal using the ADMM iterative scheme     %%%
%%% applied to a Generalized-Lasso problem.  More information on the    %%%
%%% approach can be found in [1].  The approach from [1] is modified    %%%
%%% slightly by using gradient-descent (Landweber iterations) to solve  %%%
%%% the first subproblem in place of a direct solver or iterative       %%%
%%% method such as conjugate gradient.  Matrix-free Landweber           %%%
%%% iterations are commonly used for large-scale linear inverse         %%%
%%% problems and this sample code allows the accuracy of the            %%%
%%% subproblem solves to be adjusted to see the effect on the final     %%%
%%% reconstructions.                                                    %%%
%%%                                                                     %%%
%%% [1] Boyd et al., Distributed Optimization and Statistical Learning  %%%
%%%     via the Alternating Direction Method of Multipliers, 2010       %%%
%%%                                                                     %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

close all;
clear all;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Problem Parameters %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

N           = 100;     % Signal sample count
lambda      = 0.1;     % Total-variation weight
sigma       = 3.0;     % PSF sigma for generating blurred input
noise       = 0.05;    % Gaussian-noise sigma

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% ADMM Parameters %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

rho         = 1.0;     % ADMM constraint weight 0 < rho <= 2
outer_iters = 100;     % Number of ADMM iterations
inner_iters = 2;       % Number of Landweber steps per ADMM iteration
relax       = 1.9;     % Under-relaxation factor for Landweber steps [0,2]

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Construct System PSF & Image Formation Model %%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

off = -ceil(sigma*3):ceil(sigma*3);  % psf pixel offsets
psf = exp( -off.^2/sigma^2 );        % psf values for offsets
psf = psf/sum(psf);                  % normalize to 1.0

% generate psf matrix by setting diagonals based on 1D psf above
M = zeros( N, N );                   
for i=1:numel(psf),
    M = M + diag( psf(i)*ones(N-abs(off(i)),1), off(i) );
end
M = sparse( M );

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Generate noisy and blurred synthetic input %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

input = zeros( N, 1 );                  % start with zeros
input(floor(N/4):ceil(3*N/4),:) = 1.0;  % make a center region at 1.0
blur = M * input + noise*(randn(N,1));  % blur the input by the psf


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Construct the difference matrix that computes image gradients %%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

A = sparse( -diag( ones(N,1), 0 ) + diag( ones(N-1,1), 1 ) );
A(N,N-1)=1; % use a backward difference for the final point

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Setup ADMM and perform iterations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% compute the eigenvalues of the first subproblem
% system matrix and use the largest to define a
% Landweber step size, for large system a a power-
% iteration should be used instead.
tmp = eigs(M'*M + rho*A'*A);
step = relax/tmp(1);

% initialize the ADMM variables
x = blur;             % intrinsic (sharp) signal
z = zeros( N, 1 );    % splitting variable
u = zeros( N, 1 );    % scaled Lagrange multipliers

% define an anonymous shrinkage operator to implement
% the second sub-problem solve
shrink = @(kappa,x) max( abs(x)-kappa, 0 ).*sign(x);

% perform outer_iters outer iterations, plot the
% solver progress as the iterations proceed
for k=1:outer_iters,
    fprintf( 1, 'iteration %d of %d\n', k, outer_iters );
    plot( x, 'b-+' );
    drawnow;
 
    % define an anonymous function returning the gradient
    % of the first subproblem w.r.t. x, holding z and u
    % fixed, then perform gradient-descent (Landweber 
    % iterations).
    gradF = @(x) M'*(M*x) - M'*blur + rho*A'*( A*x - z + u );
    x = gradient_descent( gradF, x, step, inner_iters );
    
    % solve the second sub-problem using the anonymous
    % shrinkage operator
    z = shrink( lambda/rho, A*x + u );
    
    % update the scaled Lagrange multipliers
    u = u + A*x - z;
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Setup ADMM and perform iterations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

hold on;
plot( blur, 'r-o' );
plot( x, 'b-o' );

Monday, April 29, 2013

Light-in-Flight Paper accepted to Siggraph 2013!

A recent project focused on reducing the cost of Transient Imaging. Transient images are effectively videos that depict light-transport at the timescales of light-transport; they allow you to see light 'in-flight' as it bounces around the scene.

This has been done in the past with extremely expensive optical setups costing hundreds of thousands of dollars that involve very short laser pulses and so-called streak-cameras. However, it is possible to accomplish much the same effect with modifications to off-the-shelf hardware for on the order of only a $1000. The paper describing this has been accepted to SIGGRAPH 2013 and there will be an accompanying technical demo in the Emerging Technologies area.

You can get an idea what the project does from the following video. If you happen to be attending SIGGRAPH this year, please stop by the booth to find out more!

Stochastic Deconvolution Paper Accepted to CVPR2013

My most recent submission on image deblurring has been accepted as a poster at CVPR2013, to be held in Portland Oregon. The approach is an adaptation of the stochastic random walk used in my earlier Stochastic Tomography paper to image deblurring and handles saturated pixels and boundary conditions naturally. The submission video gives an overview of the method:

More info is available at the Stochastic Deconvolution website. The site also includes sample C++ code illustrating the basic method.

Sunday, March 24, 2013

Minimal HTTP Server Example with WiFly RN-XV and Teensy3.0

The following code is a minimal example for serving content from a Teensy3.0 using the RN-XV WiFly module. The code assumes that the WiFly has been set up to use a baudrate of 115200 bps and has also been set up to receive connections on port 80 via the commands:

set uart baudrate 115200
set ip localport 80

The basic setup procedure that I used is detailed in a previous post covering how to set up an ad-hoc network to configure the WiFly along with some example code make a Teensy3.0 and WiFly operate as an echo server.

Assuming this is done, the following code will serve up a simple HTML page that alternately displays "Welcome!" or "Booyah!" when the page is reloaded.

int read_message( const char *msg, int len ){
  for( int i=0; i<len; i++ ){
    while( !Serial3.available() ){ }
    if( Serial3.read() != msg[i] )
     return 0; 
  }
  return 1;
}

int read_line( char *line ){
  int pos = 0;
  char c = '\0';
  while( c != '\n' ){
    if( Serial3.available() ){
      c = Serial3.read();
      line[pos++] = c;
    } 
  }
  line[pos] = '\0';
  return pos;
}

void send_response( const char *data ){
  Serial3.print( "HTTP/1.1 200 OK\r\n");
  Serial3.print( "Content-Type: text/html\r\n" );
  Serial3.print( "Content-Length: " );
  Serial3.print( strlen( data )+1 );
  Serial3.print( "\r\n" );
  Serial3.print( "Connection: Close\r\n" );
  Serial3.print( "\r\n" );
  Serial3.print( data ); 
  Serial3.write( (byte)0 );
}

void handle_connection( int val ){
  char cmd[128], line[128];
  if( !read_message( "*OPEN*", 6 ) ) 
    return;
  Serial.println("client connected!");

  read_line( cmd );
  Serial.println( cmd );
  while( read_line( line ) > 1 ){
    Serial.print( line );
    if( line[0] == '\r' )
      break;
  }
  
  const char *page[] = { 
    "<html><body>booyah!</body></html>\n",
    "<html><body>welcome!</body></html>\n"
  };  
  Serial.println("responding with" );
  Serial.println( page[val%2] );
  send_response( page[val%2] );
 
  read_message( "*CLOS*", 6 );  
}

void setup(){
  Serial.begin(115200);
  Serial3.begin(115200);
}

int id = 0;

void loop(){
 handle_connection(id++);
}

This is obviously some pretty brittle and stripped down code, there is no bounds checking on input, nor are the input requests parsed to see what they actually are. Parsing the cmd string in the handle_connection function would handle this, however it does demonstrate serving pages from the Teensy. Output is as expected when the IP for the RN-XV is entered into Firefox, alternating "Welcome!" and "Booyah!" as the page is reloaded.

With some simple input parsing, this would make it easy to do basic querying of the state of the Teensy.

Setting up the WiFly RN-XV with a Teensy 3.0

A recent order from Sparkfun arrived, including a 3.3V Serial LCD and a Roving Networks RN-XV WiFly module. The RN-XV module is intended to be a drop-in replacement for an XBee, except that it operates over WiFi. At about $35, it is just about the cheapest way to make your project wireless enabled.

The module is 3.3V, meaning some form of level shifting is needed with a 5V system like an Arduino. You can use this module with an Arduino via an XBee shield pretty easily. However it is even easier to use with the 3.3V Teensy 3.0 ARM board, provided you have a breakout for small-pitch XBee module footprint. The Teensy is also nice for this application because it has multiple serial ports, so you don't need to use the SoftwareSerial library, or program the board, then disconnect to use the wireless.

Setting everything up was pretty easy once I knew what to do, but this post summarizes the process should I ever need to do it again.

The setup that I am using is shown below:

Only four connections are needed once you're set up, 3.3V, GND and two data connections. DOUT from the Teensy3.0 Serial3 connects to DIN of the RN-XV and DIN from the Teensy to DOUT to the RN-XV. This makes the module operate as just a serial port, making it pretty easy to interface with. The remaining orange wire connects the Serial LCD display, more on this later.

To get started, I found it was easiest to set the RN-XV in ad-hoc mode. This can be done by connecting pin 8 to 3.3V and will cause the module to create its own wireless network. When this happens you will see the status LEDs blinking green, orange and red; they're doing it, but you can't really see in this picture. Note the additional green wire to 3.3V connected to the 8th pin.

You can then look for the network. On a Mac it's pretty easy, it just shows up in the list of networks in the status bar:

The WiFly shows up towards the bottom as WiFly-GSX-a8 or something similar. If you connect to this network, you can then telnet to the module using the IP address: 169.254.1.1, port 2000. The module should then respond with a *HELLO* string, at which point you type $$$ to enter command mode. Command mode allows you to set up the module for your network.

When the module is ready, it will respond with the CMD message to indicate that you're in command mode. To set up your network you can issue the commands:

set wlan phrase (password);
set lan ssid (your network name);
save
reboot

You can also issue commands to assign a static IP address to the module, but I didn't do this. For more information, see this excellent introduction http://www.tinkerfailure.com/2012/02/setting-up-the-wifly-rn-xv/

I found that sometimes the module would respond with a confirmation and sometimes would not. I repeated the process a few times in the hopes that some combination would stick. After this process, remove the power and and connection from pin 8 to 3.3V. This will cause the device to try to connect to your wireless network.

You should now be able to telnet to the device, but this time with your computer and it connected to your normal WiFi network rather than the ad-hoc network that the device creates. However first you need to find the IP address of the module. To do this, I went into my router configuration page:

Conveniently the WiFly module had an entry: 192.168.1.106. Depending on your router, you should be able to set up a specific IP address for the router to assign to the module based on the MAC address. However my POS router does not allow this.

I could then telnet to the module's IP address, again using port 2000. This module responds with the same *HELLO* prompt, indicating that everything was successful and the module is on the network and communicating.

With the connections above the Teensy should now see the module as just another serial port. To test this, I attached the Serial LCD and uploaded the following code to the Teensy:

#include<stdio.h>

void setup(){
  Serial.begin(9600);
  Serial2.begin(9600);
  Serial3.begin(9600);
}

void write_lines( const char *L0, const char *L1 ){
  
  Serial2.write( 0xFE );
  Serial2.write( 0x01 );
  delay(10);
  Serial2.write( 0xFE );
  Serial2.write( 128 );
  delay(10);
  Serial2.print( L0 );
  Serial2.write( 0xFE );
  Serial2.write( 192 );
  delay(10);
  Serial2.print( L1 );
}


void loop(){
  if( Serial3.available() ){
    char L0[17];
    char L1[17];
    int pos = 0;

    L0[0] = '\0';
    L1[0] = '\0';

    while( Serial3.available() ){
      char c = Serial3.read();
      if( c == '\n' ){
        pos = 0;
        Serial.print('\n');
      } else if( c == '\r' ){
        
      } else {
        if( pos < 16 ){
          L0[pos] = c;
          pos++;
          L0[pos] = '\0';
        } else if( pos < 32 ){
          L1[pos-16] = c; 
          pos++;  
          L1[pos-16] = '\0'; 
        }
        Serial.print( (char)c );
      }
    }
    write_lines( L0, L1 );
  }
  delay(100);
}

My LCD is a 2x16 character display. The code above just polls for available data on the third serial port and, when a newline is encountered, prints it out onto the display. Lo and behold, after the following session:

Jamess-MacBook-Pro:~ jgregson$ telnet 192.168.1.106 2000
Trying 192.168.1.106...
Connected to 192.168.1.106.
Escape character is '^]'.
*HELLO*
This is James

The result on the display is below:

Hooray! An utterly useless internet thingy!

Putting CMake binaries in a specific directory

I've started using CMake for pretty much all my development work and have been quite happy except for a few minor issues. One of those is that the build-products are put in Debug and Release directories by default, instead of single output directory. You can add an install target to do this, but XCode won't apply it through the UI. Ditto goes for a post-build copy step.

It took a while to find, but it's actually simple to get binaries placed into a directory of your choosing. In the example project, code and the CMakeLists.txt file is in the code/ subdirectory, I build using CMake/XCode in the build/ subdirectory and I want the build products placed in the bin/ subdirectory, regardless of whether they are built as debug or release.

Here's the CMakeLists.txt file:

cmake_minimum_required( VERSION 2.6 )

project( entropy )

add_executable( entropy main.cpp )
target_link_libraries( entropy ${LIBS} )
set_target_properties( entropy PROPERTIES
  RUNTIME_OUTPUT_DIRECTORY_DEBUG   ${CMAKE_SOURCE_DIR}/../bin 
  RUNTIME_OUTPUT_DIRECTORY_RELEASE ${CMAKE_SOURCE_DIR}/../bin
)

The set_target_properties() command allows you to set the runtime output directory. You'll see this a fix similar to this online in many places, but it will put the built executables in bin/Debug or bin/Release, instead of just bin/. What was not immediately obvious is that it can be set per-build-configuration, causing all build products to be put in bin/.

Friday, March 22, 2013

Symmetric Discretization of Linear Boundary Conditions

When solving large-scale PDEs, it's generally preferable (where possible) to have a symmetric discretization so that the conjugate gradient method can be used. Conjugate gradient uses relatively little memory compared to more general purpose methods (like GMRES) and can also take advantage of symmetric preconditioners (like incomplete LDL^T) that are faster and use lower memory than preconditioners such as incomplete LU.

However applying boundary conditions to the problem can be an issue, since it is easy to end up with a non-symmetric matrix unless the constrained variables are eliminated from the system matrix. This means that solvers like GMRES have to be used, which is costly in terms of computation and memory.

The following Matlab script gives an example for how to apply linear boundary conditions via an auxiliary boundary condition matrix suggested by a labmate of mine. It shows both pin (Dirichlet) constraints and gradient (Neumann) boundary conditions at the domain boundary and in the interior. The only downside is that Neumann constraints are a bit peculiar to specify since one of the variables must be eliminated from the boundary condition matrix. However this can usually be specified fairly easily, and certainly more easily than eliminating the variable from the system matrix.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Example for showing symmetric discretization  %%
%% of general linear boundary conditions.        %%
%% (c) James Gregson 2013                        %%
%% Please send questions or comments to:         %%
%% james.gregson@gmai.com                        %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% solve a 1D Laplace equaiton with 8 mesh points
% subject to the constraint that the leftmost 
% grid point has value 1.0 and the rightmost 
% grid point has value 2.0 and the fifth is 
% constrained to the value of the fourth minus 0.2
%
% then solves the same problem, but with a biharmonic
% operator, to show that the result is different 
% from just pinning the values; i.e. the biharmonic
% energy is applied to the constrained grid points. 
% If this weren't the case, both problems would have 
% the same solution.

% the approach is to use a standard symmetric 
% discretization of the Laplace operator A for all
% grid points (including those constrained by
% boundary conditions) and a second boundary
% condition matrix that enforces the constraints
% specifically, the boundary conditions are
% represented as:
%
% u = B v + c
%
% where u has the same dimensions as the Laplace
% operator and B has one column for every
% unconstrained grid-point.  c is a vector of
% constants. this allows the problem: 
%
% A u = f
%
% to be written as:
%
% A*B*v = f - A*c
%
% however this is system is rectangular and 
% non-symmetric.  This can be remedied by 
% multiplying through by B', giving:
%
% B'*A*B*v = B'*(f-A*c)
% 
% which is square and symmetric.

% define the objective function, in this case a 
% 6-variable 1D Laplace equation. The Laplacian
% is discretized with a standard 3-point stencil
A = [  1, -1,  0,  0,  0,  0,  0,  0;
      -1,  2, -1,  0,  0,  0,  0,  0;
       0, -1,  2, -1,  0,  0,  0,  0;
       0,  0, -1,  2, -1,  0,  0,  0;
       0,  0,  0, -1,  2, -1,  0,  0;
       0,  0,  0,  0, -1,  2, -1,  0;
       0,  0,  0,  0,  0, -1,  2, -1;
       0,  0,  0,  0,  0,  0, -1,  1 ];

% we are solving the Laplace equation, so the
% right hand side is zero
f = [ 0, 0, 0, 0, 0, 0, 0, 0 ]';

% the boundary condition matrix.  
B = [  0,  0,  0,  0,  0; % first grid point, set to c[1]
       1,  0,  0,  0,  0;
       0,  1,  0,  0,  0;
       0,  0,  1,  0,  0;
       0,  0,  1,  0,  0; % fifth grid point, set to c[4]-0.2
       0,  0,  0,  1,  0;
       0,  0,  0,  0,  1;
       0,  0,  0,  0,  0 ]; % last grid point, set to c[8]

% the boundary condition constant matrix
c = [ 1, 0, 0, 0, -0.2, 0, 0, 2 ]';

% form the system with boundary conditions
M = B'*A*B
rhs = B'*( f - A*c );

% solve for the primary variables
x = M\rhs;

% use the primary variables to solve for the
% grid values
u1 = B * x + c;

% now solve a biharmonic problem with the same constraints
% in the same manner, note the A' factor in the  system
% matrix and right-hand-side.
M = B'*A'*A*B;
rhs = B'*( f - A'*A*c );
x = M\rhs;
u2 = B * x + c;

% plot the results
figure
hold on
plot( u1, '-ro' );
plot( u2, '-b+' );
legend( 'Laplace', 'Biharmonic' );

The plot produced by this script is shown below:

Both solves reproduce the boundary conditions correctly, but the biharmonic solve finds a solution that is smoother than the Laplace solution around the interior constraints.

Wednesday, March 13, 2013

Playing with Some Fluid Simulation

A recent project has me playing around with some fluid simulation. I got some nice looking results and thought I would post a few.

The first is a buoyancy-driven flow simulated using a Marker-And-Cell (MAC) grid with the Boussinesq approximation. This is simply a source term applied to the vertical velocity based on the fluid density, making it trivial to add to a basic solver. Although I have no explicitly simulated viscosity, numerical viscosity is enough to cause some interesting unsteady flow effects

The second is a simulation of smoke in box, using a 3D 'Stable Fluids' solver with the first order advection replaced by the second-order 'BFECC' scheme. This gives -way- better results than standard first order semi-Lagrangian discretizations:

The results compared to the first order scheme (shown below) are pretty impressive:

That's it..

Reducing Warping/Shrinkage in Large 3D Prints

I've been doing a bit more 3D printing lately and having problems with large 3D prints warping due to uneven cooling. Most of the prints I work on are fairly large, roughly 10x8x4 cm or larger, so this can be a big problem, particularly if the dimensions change enough to mess up the fit of parts. Basically the problem is that as layers cool, they contract and pull off the build-surface. Generally I'm printing PLA on painter's tape, so this will just pull up the tape. An example is shown in the photo below:

You can see how the tape is pulled up at the near edge; this is about 2mm of shrinkage which is not too bad. But this is also a relatively small part, perhaps 5x5x4 cm, and the problem gets worse as the parts get bigger. A partial fix is to use a heated bed, but the Gen6 electronics for my printer do not support controlling a bed.

However a more practical solution was suggested by another member of the Vancouver Hack Space. He's working on a very large printer for very large prints; I highly suggest visiting his site ottersoft.ca if you're interested. Anyway, his suggestion is to space the part up from the bed using some dummy geometry and then print with support material. The support material forms less connected layers than the object, which gives it more...'give' when printing. Here's an example:

The little block is some dummy geometry added to lower the bottom of the geometry about 1.5 cm. When printed with support, the extra flex of the support lets the object stay attached to the bed without the terrible shrinking.

Part of this seems to be the extra flex, but I suspect that another part is less direct contact with the large metal build-plate, which acts like a heat sink, so the whole part cools more uniformly.

After pulling it off the build plate, you can see that the support material on both sides is in contact with the table, indicating little to no warping. And this is a big part, roughly 10x8x3 cm.

The downside is that you now have to deal with the support material. If this is the same as the print material, it can be difficult to clean off without leaving a nasty surface finish or changing the part accuracy. For example, filing the support out of the horizontal holes without accidentally changing the hole diameter. But once done carefully, the part is remarkably accurate: the height as measured at all four corners only differs by about 0.15 mm (with a layer height of 0.3 mm), and the holes are actually round and not elliptical.

I wish that I could take credit for this trick, but that belongs to Loial (ottersoft.ca) of VHS (hackspace.ca)

Sunday, February 10, 2013

Prototype 5-Axis CNC Board

Since I'm working on some CNC firmware, I thought I needed a test platform. I didn't want to rip-apart the CNC so I decided to build a new 5-axis board using the new Pololu stepper drivers. These are nice little cheap boards and their advantage over the ubiquitous A4988 drivers is that they don't need cooling up to 1.5A per phase. This makes things way easier, since those tiny little heatsinks for the A4988 chips are scattered all over my apartment, double-sided tape all linted up so they won't stick, just waiting to be stepped on. They hurt even more than stepping on a Lego block, little buggers.

Anyway, I figured I should just go to five axes, which would allow me to control the 3D printer eventually (1X, 1Y, 2Z, 1E). So I built a protoboard test board:

I laid out the board and soldered it up. It was pretty easy except for having to drill a few extra holes in the power-rails to get the spacing I wanted. After electrically testing everything I decided not to blow all 5 drivers simultaneously on a meticulously copied wiring error, opting instead for only two.

The MCU controlling the whole mess is an Arduino Uno. I have the step pins on PortD and the direction pins on PortB, which allows fast bitwise operations to be used for stepping. While picking up the stuff for the prototype board I also bought a proto-screw-shield, wanting a more resilient mounting option for the Uno:

I have the early version of my CNC firmware loaded onto the Arduino. After firing up the serial terminal, I had the thing running!

Not super crazy, but still rewarding nonetheless.

CNC Firmware running on Arduino Uno

In an earlier post I described some prototype CNC firmware that I had gotten running on the Teensy3.0. The Teensy3.0 is a 32-bit ARM board and is pretty nice, but I wanted to see if the code would also run on an Arduino Uno. This is a much more restrictive platform; only 2k of ram and 32k of program space. However, after a bit of porting of the hardware layer, it compiles and seem to run no problem. Here's a screen-capture from a (brief) session, the pulse-ticks stuff is debugging the motion control module.

Note that this is running the full GCode interpreter, with spec-compliant command sorting and an 8-move lookahead buffer. The step timer is running off a 2KHz Timer1 interrupt. I'm hoping to increase this to 8+ KHz, at least on other platforms, which should make smoother motion control possible, but at the moment I'm just happy that my ANSI C + POSIX code that I've developed on on a 64bit quad core machine with 16Gb ram ports easily to an 8bit system with 2k or ram and only interrupts after changing less than 50 hardware-specific lines of code.

I've written the code this way for just this reason; by keeping all the hardware specific stuff isolated in a hardware abstraction layer I'm hoping to make a code base that stands the test of time, from small embedded platforms to boards like the Raspberry Pi, to full-blown PCs. So far so good, it's running on an 8bit AVR, a 32 bit embedded ARM and a full X86_64 PC.

Wednesday, January 30, 2013

Start of CNC Firmware Running on the Teensy 3.0

I've been intermittently working on some CNC firmware with the goal of getting it to run on everything from an Arduino to a standard PC. It's a fun project because it's very resource constrained but involves a number of different parts, like GCode parsing, asynchcronous programming, portability and motion control for non-Cartesian machines.

In the interests of portability I've been writing the various components in plain ANSI C (except the comments, I like my double slashes!). By doing this I expect the code to be portable across a wide variety of platforms, from AVRs to the Propoeller child to ARM to PCs, with minimal effort.

My hopes are that the firmware will conform to the NIST GCode specification as closely as is manageable. That said, some sacrifices will have to made as the spec calls for around 20k of addressable space for expressions, which exceeds the total RAM of Arduinos and even the Teensy 3.0. However I hope to get mostly spec-compliant in terms of order of operations, expressions (if not the addressable space) and features. Where possible I am also allowing the specific capabilities to be determined by preprocessor macros.

I've done a bit of work on the GCode interpreter, getting some test code together that parses GCode expressions (this led to my Mathematical Expression Parser in C) as well as an arbitrary dimension DDA implementation. To date these have been just simple tests, nothing actually running on real hardware. I've also done some early tests on doing feedrate optimization and

However today I made some good progress on actually getting some real firmware started. I got my early GCode interpreter stripped down and abstracted out all the hardware-specific stuff into a hardware layer. I then wrote the hardware abstraction layer for the Teensy 3.0 version of the Arduino environment and fired up the code, using the following string to test with:

N1229.0(This is a comment)G01x110.0   y330.0 M7 F20.0(MSG: another comment)

And it worked!

The screenshot above shows the GCode interpreter code loaded up in the Arduino environment, the Teensy loader application that programs the Teensy and the serial output of the parser. Note that this is not just reading the commands linearly, it is fully parsing the input, splitting it into distinct commands, sorting the commands by operation precedence (as described in the NIST spec) and finally calling back to the Arduino sketch with each command.

Parsing the string and calling back takes about 250 us (the remaining time taken is up by serial communications) so more than fast enough to fill up a lookahead buffer. Total RAM used is about 5k, so this won't run on an Uno but does run handily on the Teensy 3.0. Ditto goes for the flash at 33Kb currently. This is pretty hefty compared to other firmware, but I think portability and cleanness makes up for that.

The next steps will be to get the kinematic/motion control stuff fleshed out. I think I should be able to use by DDA implementation running with the Teensy timers pretty easily. Then it's just tweaking and adding features...

Thursday, January 10, 2013

Periodic Interrupt Timers on the Teensy 3.0 (Freescale MK20DX128)

I recently ordered a Teensy 3.0 and today it finally arrived! It definitely is teensy. Anyway, after soldering on headers and popping it in a breadboard, I got it running with the blink example. It took some time, but my issue was downloading directly from the Teensy loader page (which does not support Teensy3.0) rather than from the PJRC Teensy forums. To be fair, the Teensy loader page does have a notice at the top about this, but I missed it.

After getting the software, I was able to write a standard Arduino sketch to blink the LED connected to pin 13. The code is below:

void setup(){
   pinMode(13, OUTPUT);
}

void loop(){
   digitalWrite( 13, HIGH );
   delay( 100 );
   digitalRead( 13, LOW );
   delay( 100 )
}

It's pretty nice that the Teensy3 works like a regular Arduino, but with more pins, at a higher clock rate, in 32 bits and with heaps of extra peripherals. However my end goal is to use the board as a CNC controller for my ongoing firmware project. This will make extensive use of timed interrupts to control steppers, so I thought I'd try to get timers working. Of course, the Teensy3 is not a Atmel uC, so everything changes at this point and, with the help of this forum post to get started, I had to dive into the manual (available from here) for the Freescale MK20DX128 that the Teensy3 is based upon.

According to the forum post, the timer to use is one of the (4?) Periodic Interrupt Timers (PITs). As expected, these have a number of control registers. Registers listed with an [N], e.g. PIT_LDVAL[N] should have an appropriate timer index substituted, like PIT_LDVAL2. Here are the registers:

SIM_SCGC6 - Enables/disables clock used by PIT timers, not exactly clear on the details, set to SIM_SCGC6_PIT in the forum post example.
PIT_MCR - Enables and disables the PIT timers. Writing zero enables the timers and writing 1 disables them.
PIT_LDVAL[N] - Sets the timer count value. Apparently the timer runs at 50MHz, so toggling timer 2 every second should set PIT_LDVAL2 to 0x2fa080 (hex for 50,000,000). Visually, this appears to be around a second.
PIT_TCTRL[N] - Bit zero (TEN in the manual) enables (set to 1) or disables (set to zero) the timer. Bit one (TIE in the manual) enables (set to 1) or disables (set to zero) interrupts that can be generated by the timer.
PIT_TFLG[N] - Flag to indicate timer waiting. Set to one to start timer and at the end of every called interrupt routine, otherwise interrupts will stop.

Finally, interupts must be enabled. Again I'm not clear on the details, but calling NVIC_ENABLE_IRQ( IRC_PIT_CH[N] ) results in the interupt "void pit[N]_isr(void){}" being called. Although it seems like the chip should have four timers, I only succeeded in getting timers 0, 1, and 2 working properly with interrupts, testing with index 3 gave a linker error in the Arduino software.

Anyway, here's the code for my tests:

#define TIE 0x2
#define TEN 0x1

void pit0_isr(void){
  digitalWrite( 13, !digitalRead(13) );
  PIT_TFLG0 = 1; 
}

void setup(){
  pinMode(13,OUTPUT);
  SIM_SCGC6 |= SIM_SCGC6_PIT;
  PIT_MCR = 0x00;
  NVIC_ENABLE_IRQ(IRQ_PIT_CH0);
  PIT_LDVAL0 = 0x2faf080;
  PIT_TCTRL0 = TIE;
  PIT_TCTRL0 |= TEN;
  PIT_TFLG0 |= 1;
}

void loop(){
  delay(2000);
}

Hope this helps someone get up to speed, and perhaps serves as a reference for me later on.

Thursday, January 3, 2013

CMake add_subdirectory link errors

This is probably something that everyone already knows, but it took me a few hours to figure out...

I'm currently working on a project that uses CMake to build several libraries and executables all in the same project. Everything worked fine on OS-X, but when I tried to build on Linux I kept getting linker errors when compiling executable using functions and objects from interdependent libraries that I thought were being linked correctly via the target_link_libraries() command. Each library was compiled by CMake using the add_subdirectory() command.

I found out that the issue is that you need to specify dependencies between the libraries themselves, not just between executables and libraries. This was a bit of a surprise, but just adding the target_link_libraries() command to the CMakeLists.txt file for each library and listing the dependencies fixed the problem.

Simplified DDA/Bresenham-like algorithm for line drawing in 2D and 3D

Implementing Bresenham's line drawing algorithm is a pain and has some drawbacks for motion control applications. In particular, it relies on swapping endpoints of the line-segments to achieve specific preconditions and has eight configurations (in 2D alone!) that must be implemented to draw arbitrarily oriented lines. This has big disadvantages for implementation in high dimension (6D in my case) and even bigger disadvantages for motion control, where you want to move from point A to point B, not arbitrarily (and incorrectly) assume you're already at B and then move backward to A because, hey, order doesn't matter!

I recently ran across Cris Luengo's blog post on a Bresenham-equivalent algorithm in arbitrary dimension based on floating point rounding. This is likely to be faster on modern machines than the integer only version, simply because i) floating point ops and casting are cheap and ii) branches are expensive, which is effectively the opposite of the case when the original Bresenham algorithm was developed. He claims the method reproduces the original Bresenham algorithm's plotted pixels, although in my tests I seem to get occasional errant pixels even with double precision. It may well be an error on my part.

Regardless, this post gave me the idea of just implementing Bresenham's algorithm in a parametric form, accumulating error and advancing pixels in both (all) coordinates in the same manner. This is considerably more costly in 2D, where you double the operations required, but the relative cost decreases as the dimension increases. Since I'm targeting 6D, the relative cost is only 1/5th, and the simplicity of implementation and lack of endpoint switching more than makes up for it.

I've put up simple C++ code for 2D and 3D integer-only line drawing using a test implementation that is freely available for any use. It draws 50 random lines with the integer only algorithm described above into the green channel and the same 50 lines with the floating point rounding method (in double precision) into the red channel. The results are below, if the algorithms matched exactly, all the lines would be yellow, with no red or green pixels anywhere.

It looks okay at first, but if you zoom in and look carefully you can see a few red and green pixels around. Regardless, they both do a good job of plotting lines, and if my code is off by a step or two occasionally it's not the end of the world.

syntax-highlighter