Nipype BrainsAutoWorkup Demonstration
Here we will document the process for the creation and debugging of a nipype version of BrainsAutoWorkup.
Before we can run this program, we must have a fully installed copy of Nipype and either the original or the modified (attached) version of generate_classes.py/generate_class.py.
generate_classes.py is a script designed to help facilitate the creation of nipype helper classes from programs which support the Slicer-style "--xml" flag.
generate_class.py is a script based on generate_classes.py, designed to mitigate some of its weaknesses:
* It is not designed to run on a preset list of classes, but a single, command-line-passed class, thus making it batchable from the shell.
* It runs its commands directly instead of requiring that they be run through slicer.
* It supports many more argument types
* It makes use of the Directory type (note: this may cause errors sometimes; nipype can be weird)
It takes one argument: the program whose xml is to be parsed into a wrapper .py file.
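As a rough illustration of what such a generator has to do first, the sketch below reads Slicer-style "--xml" output and lists the parameters a wrapper class would need. It is a minimal sketch, not the code of generate_class.py: the sample XML, the program name and the function name list_parameters are assumptions made up for this example, and in real use the XML text would come from running the program with its "--xml" flag.

```python
import xml.etree.ElementTree as ET

# Hypothetical "--xml" output, trimmed to two parameters for illustration.
# In practice this text would be captured from: <program> --xml
SAMPLE_XML = """
<executable>
  <title>SomeExternalProgram</title>
  <parameters>
    <label>Inputs</label>
    <image>
      <name>inputVolume</name>
      <longflag>inputVolume</longflag>
      <channel>input</channel>
    </image>
    <integer>
      <name>iterations</name>
      <longflag>iterations</longflag>
      <default>10</default>
    </integer>
  </parameters>
</executable>
"""


def list_parameters(xml_text):
    """Return one dict per parameter found in the XML description."""
    root = ET.fromstring(xml_text.strip())
    params = []
    for group in root.findall("parameters"):
        for elem in group:
            name = elem.findtext("name")
            if name is None:
                # Elements such as <label> or <description> describe the
                # group itself and carry no <name>, so they are skipped.
                continue
            params.append({
                "name": name,
                "type": elem.tag,  # e.g. "image", "integer"
                "flag": elem.findtext("longflag"),
                "default": elem.findtext("default"),
            })
    return params


for param in list_parameters(SAMPLE_XML):
    print(param)
```

Each dictionary holds what a generated wrapper would need to declare one input: the name, the XML type to map to a trait, the command-line flag and any default.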
Putting together the pieces: Framework
* Create nipype node wrappers for all of your external programs that you want to be pipelined as nodes.
* Convert the TCL into rough python. Note that this code does not need to work! It will simply be the framework for further conversion to nipype.
* "Flatten" your python code. You can't have "if" statements in the middle of a pipe, nor function calls, nor loops, etc. Note that some of this behavior can be approximated. Loops can be approximated by iterables/mapnodes, for example. The best way to deal with conditionals is to either put them before the pipeline starts being assembled or into the code for a given node.
* Now you should have a linear flow of your code, with a bunch of external calls to wrapped program nodes. Now you need to structure those as nodes within your nipype script. Instead of:
SomeExternalProgramCall $param1_value $param2_value $param3_value ...
... you will instead have:
SomeExternalProgram_node = pe.Node(interface=SomeExternalProgramWrapperType(), name="SomeExternalProgram_node")
SomeExternalProgram_node.inputs.param1 = param1_value
SomeExternalProgram_node.inputs.param2 = param2_value
SomeExternalProgram_node.inputs.param3 = param3_value
...
* Now you need to identify which outputs of one stage get linked into the next so that you can pipeline them properly. I recommend adopting a special comment or variable-naming convention at this point; it should identify the node that created the value and the name of that node's output. During this phase you will discover, to your chagrin, that your previous code did lots of "tweaking" of the outputs of previous programs to get the proper inputs for the next stage, such as "I passed $directory to the previous program to store its results, and now I'm passing "$directory/SomeSubFile.nii.gz" to the next step of the process." Since "$directory/SomeSubFile.nii.gz" wasn't a listed output of the previous node, you'll have to create a new helper node simply to generate that filename. :P
* Now that all of those problems are corrected and out of the way, you can start building up your pipeline.connect line. For each variable that you've flagged to pipeline, take its ".inputs.paramX" line and bring it down to the end where you've started your pipeline.connect statement. The order you put them in doesn't matter. Now rewrite them in the form:
(NodeFrom, NodeTo, [('VarFrom', 'VarTo')]),
* Once you finish this, make sure you've got all of your imports at the top, and you're ready to debug. Your imports may look something like:
import nipype.interfaces.io as nio           # Data i/o
import nipype.interfaces.spm as spm          # spm
import nipype.interfaces.matlab as mlab      # how to run matlab
import nipype.interfaces.fsl as fsl          # fsl
import nipype.interfaces.utility as util     # utility
import nipype.pipeline.engine as pe          # pypeline engine
import nipype.algorithms.rapidart as ra      # artifact detection
import nipype.algorithms.modelgen as model   # model specification
import enthought.traits.api as traits
from nipype.interfaces.base import BaseInterface, TraitedSpec
from AutoTalairachParameters import *
from BRAINSABC import *
from BRAINSApplySurfaceLabels import *
from BRAINSClassify import *
from BRAINSClassPlugs import *
from BRAINSConstellationDetector import *
from BRAINSCut import *
from BRAINSDemonWarp import *
from BRAINSDiscreteClass import *
from BRAINSFit import *
from BRAINSMeasureSurface import *
from BRAINSMush import *
from BRAINSResample import *
from BRAINSROIAuto import *
from BRAINSTalairachMask import *
from BRAINSTalairach import *
from ClassTalairachVolumes import *
from ClipAndAverageTwo import *
from CreateAutoLabelBrainSurface import *
from CreateBrainSurface import *
from CreateGenusZeroBrainSurface import *
from DicomToNrrdConverter import *
from DtiSkullStripB0 import *
from extractNrrdVectorIndex import *
from GenerateSummedGradientImage import *
from gtractAnisotropyMap import *
from gtractConcatDwi import *
from gtractCoregBvalues import *
from gtractTensor import *
from itkAndImage import *
from itkBinaryImageMorphology import *
from itkBinaryThresholdImage import *
from itkConstantImageMath import *
from itkMaskImage import *
from itkNaryMaximumImageFilter import *
from itkObjectMorphology import *
from itkOrImage import *
from itkRelabelComponentImage import *
from N4ITK import *
from PickBloodPlugsFromMargin import *
from QuadMeshDecimation import *
from QuadMeshSmoothing import *
from runBRAINSCut import *
from ScalarTalairachMeasures import *
from StandardizeImageIntensity import *
import os, sys, string, shutil, glob, re
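The helper node mentioned in the steps above, the one that only turns "$directory" into "$directory/SomeSubFile.nii.gz", can be as small as one function. The sketch below is an assumption-laden illustration rather than the BrainsAutoWorkup code: the names make_sub_file_path, SubFilePath_node, NextProgram_node, outputDirectory and inputVolume are hypothetical. The nipype wiring is left in comments because it needs nipype and real nodes; it assumes the utility Function interface is used to wrap the helper.

```python
def make_sub_file_path(directory, filename="SomeSubFile.nii.gz"):
    """Return the path of a file that an earlier stage wrote into `directory`."""
    # Imported inside the function on the assumption that the function may be
    # wrapped by nipype's Function interface, which runs it in isolation.
    import os
    return os.path.join(directory, filename)


# Hypothetical wiring (node and field names are made up for illustration):
#
#   SubFilePath_node = pe.Node(
#       interface=util.Function(input_names=["directory"],
#                               output_names=["path"],
#                               function=make_sub_file_path),
#       name="SubFilePath_node")
#   pipeline.connect([
#       (SomeExternalProgram_node, SubFilePath_node, [("outputDirectory", "directory")]),
#       (SubFilePath_node, NextProgram_node, [("path", "inputVolume")]),
#   ])

print(make_sub_file_path("results"))
```

Keeping the path logic in a plain function means it can be tested on its own before it is placed in the pipeline.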
Common debugging pitfalls
* The nipype docs are unreliable because they are out of date.
* Datasinks should not be trusted. As of our last conversations, datasinks (which are currently awkward) are likely to be changed in how they work. For now, simply hard-code your paths for where you want files to go instead of relying on autogeneration.
* Absolute paths may or may not be supported in the current version if there's a symlink in the path structure, due to the use of realpath(). Relative paths are always supported.
* Nipype makes extensive use of caching. If a node does something wrong in the debugging phase, you may have to blow away your entire cache/results directory. This can make it sloooooooow!
* Some TCL code may be difficult or impossible to translate to python and put into the pipeline, such as ITK calls. These sections should be grouped together and wrapped as a continuous block of TCL. Warning, though: this process itself can be tricky. For example, one piece of code I wrapped needed a multidimensional array to run. How do you pass something a multidimensional array on the command line? I had to write code to first encode, then decode, a multidimensional array into/from a string, using a different separator for each dimension.
* Do not confuse iterables and maps! You must read this:
In general, MapNode is what you want to use for loops, not iterables. Each has its own specific purpose.
* Likewise, MergeNode and the other "helper nodes" can be useful in trying to linearize complex logic flows which may be present in your original TCL.
* This piece of advice from Chris Gorgolewski:
node = pe.MapNode(interface=SomeInterface(), name="node", iterfield=['file', 'number'])
node.inputs.file = ['file', 'another_file', 'different_file']
node.inputs.number = [1, 2, 3]
This will run SomeInterface with the following pairs of arguments: ('file', 1), ('another_file', 2), ('different_file', 3)
* And this one:
> Picture this situation:
>
> -------
> incr = 1
> for file in FilesToProcess:
>     process(file, incr)
>     incr += 1
> -------
>
> We hit a new snag: we don't know in advance of the pipeline how many files will be in FilesToProcess. But I think that should be easy enough to deal with -- we simply make our incr mapping way larger than would ever be needed, such as [1, 2, 3, 4, ..., 99998, 99999].
You can achieve this in a similar way as in the example I have written above. In case the number of files is not known beforehand, you can use a special connect feature which evaluates a function before setting the inputs. Imagine that process2 is a MapNode with two inputs (which are iterfields): file and incr; process1 outputs a list of files.
def count_files(list_of_files):
    return range(len(list_of_files))
my_workflow.connect([(process1, process2, [('output_files', 'file'), (('output_files', count_files), 'incr')])])
* The "order of strictness" in terms of traits is "Directory">"File">"Str". That is, in general, a string can be used anywhere a file or directory can be used; a file can be used anywhere a directory can; and a directory can only be used for directories. File and Directory as inputs default to requiring that their value already exist on the filesystem, although this may be controllable by the exists parameter.
* You'll frequently (in the early phases of debugging) run into cases where you mess up your linkages between nodes and variables in the pipeline phase. A particularly insidious one for me involved a space (since variables passed are strings, and thus handled at runtime). I wrote "VbFile " instead of "VbFile".
* If you see paths like, "NiftiImage..raid0..homes..kpease..temp..itkWrapping..site-032..0060..20221..ANONRAW..0060_20221_T2_COR.nii.gz", that's not a bug! That's actually a directory name, and is how nipype mangles a full path to create a similarly-named subdirectory.
* Debugging statements in your nodes will never fire! Nipype eats them. If you need to have them, write to a file.
* The failure "raise NotImplementedError" means that you need to define _run_interface or _list_outputs in your node. Really obscure, I know.
* The autogenerated getattr call may fail with an exception; you may want to wrap it in a try/except.
* List types only seem to work if they take in string arguments. If you get an error at just under line 1000 in _format_arg that says "TypeError: float argument required, not str" or similar, that's probably the problem. Change your type to %s instead of %f or whatnot, even if you really want to pass floats.
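The string round-trip mentioned above for passing a multidimensional array on the command line can be sketched as follows. This is a minimal sketch for a two-dimensional array, assuming ";" between rows and "," between values; the separators and the function names encode_matrix and decode_matrix are choices made for this example, not the ones used in the original wrapper.

```python
def encode_matrix(rows, row_sep=";", col_sep=","):
    """Flatten a list of lists into one string, e.g. for a command-line flag."""
    return row_sep.join(col_sep.join(str(value) for value in row) for row in rows)


def decode_matrix(text, row_sep=";", col_sep=",", cast=float):
    """Rebuild the list of lists produced by encode_matrix."""
    if not text:
        return []
    return [[cast(cell) for cell in row.split(col_sep)]
            for row in text.split(row_sep)]


matrix = [[1.0, 0.0, 2.5], [0.0, 1.0, -3.0]]
encoded = encode_matrix(matrix)
print(encoded)
print(decode_matrix(encoded))
```

Because the encoded value is a single string, it can be declared as a plain string argument in the wrapper, which also sidesteps the list-of-floats formatting problem described in the last pitfall. The separators must not occur inside the values themselves.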