.\"
.\" indexamajig man page
.\"
.\" Copyright © 2012-2013 Deutsches Elektronen-Synchrotron DESY,
.\" a research centre of the Helmholtz Association.
.\"
.\" Part of CrystFEL - crystallography with a FEL
.\"
.TH INDEXAMAJIG 1
.SH NAME
indexamajig \- bulk indexing and data reduction program
.SH SYNOPSIS
.PP
.BR indexamajig
\fB-i\fR \fIfilename\fR \fB-o\fR \fIoutput.stream\fR \fB-g\fR \fIdetector.geom\fR \fB-b\fR \fIbeamline.beam\fR \fB--peaks=\fR\fImethod\fR \fB--indexing=\fR\fImethod\fR
[\fBoptions\fR] \fB...\fR
.PP
\fBindexamajig --help\fR
.SH DESCRIPTION
\fBindexamajig\fR takes a list of diffraction snapshots from crystals in random orientations and attempts to find peaks, index and integrate each one. The input is a list of diffraction image files in HDF5 format and some auxiliary files and parameters. The output is a long text file ('stream') containing the results from each image in turn.
.PP
For basic use, you need to provide the list of diffraction patterns, the method which will be used to index, a file describing the geometry of the detector, and a PDB file which contains the unit cell which will be used for the indexing. Here is what a minimal invocation might look like on the command line:
.IP \fBindexamajig\fR
.PD
-i mypatterns.lst -j 10 -g mygeometry.geom --indexing=mosflm,dirax --peaks=hdf5 -b myxfel.beam -o test.stream -p mycell.pdb
.PP
More typical use includes all the above, but might also include extra parameters to modify the behaviour. The HDF5 files might be in some
folder a long way from the current directory, so you might want to specify a
full pathname to be added in front of each filename. You'll probably want to
run more than one indexing job at a time (-j <n>).
.PP
You can include a table of saturation values in the HDF5 file, if you have
a method for estimating the intensities of saturated peaks. It goes in
/processing/hitfinder/peakinfo_saturated, and should be an n*3 two dimensional
array, where the first two columns contain fast scan and slow scan coordinates
(in that order) and the third contains the value which should belong in a peak
at the given location. The value will be spread in a small cross centred on
that location.
.PP
See \fBman crystfel_geometry\fR for information about how to create a geometry description file and a beam parameters file.
.SH PEAK DETECTION
You can control the peak detection on the command line. Firstly, you can choose the peak detection method using \fB--peaks=\fR\fImethod\fR. Currently, two values for "method" are available. \fB--peaks=hdf5\fR will take the peak locations from the HDF5 file. It expects a two dimensional array where the size in the first dimension is the number of peaks and the size in the second dimension is three. The first two columns contain the fast scan and slow scan coordinates, and the third contains the intensity. However, the intensity will be ignored, since the pattern will always be re-integrated using the unit cell provided by the indexer on the basis of the peaks. You can tell indexamajig where to find this table inside each HDF5 file using \fB--hdf5-peaks=\fR\fIpath\fR.
.PP
If you use \fB--peaks=zaef\fR, indexamajig will use a simple gradient search after Zaefferer (2000). You can control the overall threshold and minimum squared gradient for finding a peak using \fB--threshold\fR and \fB--min-gradient\fR. The threshold has arbitrary units matching the pixel values in the data, and the minimum gradient has the equivalent squared units.
.PP
Peaks will be rejected if the 'foot point' is further from the 'summit' of the peak than the inner integration radius (see below). They will also be rejected if they are closer than twice the inner integration radius to another peak.
.PP
You can suppress peak detection altogether for a panel in the geometry file by specifying the "no_index" value for the panel as non-zero.
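.PP
The proximity check described above can be illustrated with a short Python sketch. This is not the code used by indexamajig: the function name \fIreject_close_peaks\fR is hypothetical, peaks are assumed to be (fast scan, slow scan) pairs in pixels, and the sketch drops every peak which has a neighbour closer than twice the inner integration radius, which may differ from how the program itself resolves such pairs.
.PP
.nf
import math

def reject_close_peaks(peaks, inner_radius):
    """Keep only peaks with no other peak closer than 2 * inner_radius."""
    min_dist = 2.0 * inner_radius
    kept = []
    for i, (fs, ss) in enumerate(peaks):
        too_close = any(
            math.hypot(fs - ofs, ss - oss) < min_dist
            for j, (ofs, oss) in enumerate(peaks)
            if j != i
        )
        if not too_close:
            kept.append((fs, ss))
    return kept

# Hypothetical peak list; 4 is the default inner radius (--int-radius=4,5,7)
peaks = [(10.0, 10.0), (13.0, 10.0), (100.0, 50.0)]
isolated = reject_close_peaks(peaks, 4)
.fi
.PP
Here the first two peaks are only three pixels apart, so only the third survives the check.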
.SH INDEXING METHODS
You can choose between a variety of indexing methods. You can choose more than one method, in which case each method will be tried in turn until one of them reports that the pattern has been successfully indexed. Choose from:
.IP \fBdirax\fR
.PD
Invoke DirAx, check linear combinations of the resulting cell axes for agreement with your cell, and then check that the cell accounts for at least half of the peaks from the peak search.
.sp
To use this option, 'dirax' must be in your shell's search path. If you see the DirAx version and copyright information when you run \fBdirax\fR on the command line, things are set up correctly.
.IP \fBmosflm\fR
.PD
As \fBdirax\fR, but invoke MOSFLM instead. If you provide a PDB file (with \fB-p\fR), the lattice type and centering information will be passed to MOSFLM, which will then return solutions which match. Note that the lattice parameter information will \fBnot\fR be given to MOSFLM, because it has no way to make use of it.
.sp
To use this option, 'ipmosflm' must be in your shell's search path. If you see the MOSFLM version and copyright information when you run \fBipmosflm\fR on the command line, things are set up correctly.
.IP \fBreax\fR
.PD
Run the DPS algorithm, looking for the axes of your cell.
.IP \fBgrainspotter\fR
.PD
Invoke GrainSpotter, which will use your cell parameters to find multiple crystals in each pattern.
.sp
To use this option, 'GrainSpotter.0.93' must be in your shell's search path. If you see the GrainSpotter version information when you run \fBGrainSpotter.0.93\fR on the command line, things are set up correctly.
.IP \fBxds\fR
.PD
Invoke XDS, and use its REFIDX procedure to attempt to index the pattern.
.PP
You can add one or more of the following to the above indexing methods:
.IP \fB-raw\fR
.PD
Do not check the resulting unit cell. This option is useful when you need to determine the unit cell ab initio. Use with 'dirax' and 'mosflm' - the other indexing methods need the unit cell as input in any case, and cannot determine the unit cell ab initio. See \fB-comb\fR and \fB-axes\fR.
.IP \fB-axes\fR
.PD
Check permutations of the axes for correspondence with your cell, but do not check linear combinations. This is useful to avoid a potential problem when one of the unit cell axis lengths is close to a multiple of one of the others. Can be used with \fBdirax\fR and \fBmosflm\fR. See \fB-raw\fR and \fB-comb\fR.
.IP \fB-comb\fR
.PD
Check linear combinations of the unit cell basis vectors to see if a cell can be produced which looks like your unit cell. This is the default behaviour for \fBdirax\fR and \fBmosflm\fR. See \fB-raw\fR and \fB-axes\fR.
.IP \fB-bad\fR
.PD
Do not check that the cell accounts for any of the peaks as described in \fBdirax\fR above. Might be useful to debug initial indexing problems, or if there are many multi-crystal patterns and the indexing method has no concept of multiple crystals per pattern (which, at the moment, means all of them). Can be used with any indexing method, but is generally a bad idea.
.IP \fB-nolatt\fR
.PD
Do not use the lattice type information from the PDB file to help guide the indexing. Use with \fBmosflm\fR, which is the only indexing method which can optionally take advantage of this information. This is the default behaviour for \fBdirax\fR. This option makes no sense for \fBreax\fR, which is intrinsically based on using known lattice information.
.IP \fB-latt\fR
.PD
This is the opposite of \fB-nolatt\fR, and is the default behaviour for \fBmosflm\fR, \fBxds\fR and \fBgrainspotter\fR. It is the only behaviour for \fBreax\fR.
.IP \fB-cell\fR
.PD
Provide your unit cell parameters to the indexing algorithm. This is the default for \fBxds\fR and \fBgrainspotter\fR, and the only behaviour for \fBreax\fR. This option makes no sense for \fBdirax\fR and \fBmosflm\fR, neither of which can make use of this information.
.IP \fB-nocell\fR
.PD
Do not provide your unit cell parameters to the indexing algorithm. This is the only behaviour for \fBmosflm\fR and \fBdirax\fR, both of which cannot make use of the information. Can be used with \fBgrainspotter\fR and \fBxds\fR, and makes no sense for \fBreax\fR, which is intrinsically based on using known cell parameters.
.PP
The default indexing method is 'none', which means no indexing will be done. This is useful if you just want to check that the peak detection is working properly.
.PP
Your indexing methods will be checked for validity, incompatible flags removed, and warnings given about duplicates. For example, \fBmosflm\fR and \fBmosflm-comb-latt\fR represent the same indexing method, because \fB-comb\fR and \fB-latt\fR are the default behaviour for \fBmosflm\fR. The 'long version' of each of your indexing methods will be listed in the output, and the stream will contain a record of which indexing method successfully indexed each pattern.
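.PP
The equivalence of \fBmosflm\fR and \fBmosflm-comb-latt\fR can be illustrated with a small Python sketch. It is only an illustration of the defaults described on this page, not the parser used by indexamajig: the names \fIDEFAULTS\fR and \fIexpand_method\fR are hypothetical, only \fBdirax\fR and \fBmosflm\fR are covered, and the \fB-cell\fR and \fB-nocell\fR flags are left out.
.PP
.nf
# Default settings for dirax and mosflm, as described on this page
DEFAULTS = {
    "dirax": {"check": "comb", "latt": False, "bad": False},
    "mosflm": {"check": "comb", "latt": True, "bad": False},
}

def expand_method(method):
    """Split e.g. 'mosflm-raw-nolatt' into (base, settings)."""
    base, *flags = method.split("-")
    settings = dict(DEFAULTS[base])
    for flag in flags:
        if flag in ("raw", "axes", "comb"):
            settings["check"] = flag
        elif flag in ("latt", "nolatt"):
            # Only mosflm can use the lattice type information
            if base == "mosflm":
                settings["latt"] = (flag == "latt")
        elif flag == "bad":
            settings["bad"] = True
        else:
            raise ValueError("unknown flag: " + flag)
    return base, settings

same = expand_method("mosflm") == expand_method("mosflm-comb-latt")
.fi
.PP
In this sketch \fIsame\fR is true, because the explicit flags only repeat the defaults for \fBmosflm\fR.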
.PP
It's risky to use \fBmosflm-nolatt\fR in conjunction with either \fB-comb\fR or \fB-axes\fR when you have a rhombohedral cell. This would be an odd thing to do anyway: why withhold the lattice information from MOSFLM if you know what it is, and want to use it to check the result? It's risky because MOSFLM will by default return the "H centered" lattice for your rhombohedral cell, and it's not completely certain that MOSFLM consistently uses one or other of the two possible conventions for the relationship between the "H" and "R" cells. It is, however, very likely that it does.
.PP
Examples of indexing methods: 'dirax,mosflm,reax', 'dirax-raw,mosflm-raw', 'dirax-raw-bad'.
.SH PEAK INTEGRATION
If the pattern could be successfully indexed, peaks will be predicted in the pattern and their intensities measured. You have a choice of integration methods, and you specify the method using \fB--integration\fR. Choose from:
.IP \fBrings\fR
.PD
Use three concentric rings to determine the peak, buffer and background estimation regions. The radius of the smallest circle sets the peak region. The radii of the middle and outer circles describe an annulus from which the background will be estimated. You can set the radii of the rings using \fB--int-radius\fR (see below). By default, the peak will first be centered iteratively on the actual peak location.
.IP \fBprof2d\fR
.PD
Integrate the peaks using 2D profile fitting with a planar background, close to the method described by Rossmann (1979) J. Appl. Cryst. 12 p225.
.PP
You can add one or more of the following to the above integration methods:
.IP \fB-nocen\fR
.PD
Skip the peak centering step. The opposite is \fB-cen\fR, which is the default.
.IP \fB-sat\fR
.PD
Normally, reflections which contain one or more pixels above max_adu (defined in the detector geometry file) will not be integrated and written to the stream. Using this option skips this check, and allows saturated reflections to be passed to the later merging stages. This is not usually a good idea, but might be your only choice if there are many saturated reflections. The opposite is \fB-nosat\fR, which is the default.
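.PP
The three-ring scheme used by \fBrings\fR can be illustrated with a short Python sketch. It rests on assumptions which are not stated on this page: the background is taken to be the mean pixel value in the annulus between the middle and outer radii, that mean is subtracted from every pixel in the peak region, and no centering is done. The function name \fIintegrate_rings\fR is hypothetical and the image is a plain list of rows.
.PP
.nf
def integrate_rings(image, cx, cy, inner, middle, outer):
    """Sum the peak region minus a mean background from the annulus."""
    peak_sum = 0.0
    n_peak = 0
    bg_sum = 0.0
    n_bg = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            r2 = (x - cx) ** 2 + (y - cy) ** 2
            if r2 <= inner ** 2:
                # Inside the smallest circle: peak region
                peak_sum += value
                n_peak += 1
            elif middle ** 2 <= r2 <= outer ** 2:
                # Between middle and outer circles: background annulus
                bg_sum += value
                n_bg += 1
            # Pixels between inner and middle form the unused buffer
    background = bg_sum / n_bg if n_bg else 0.0
    return peak_sum - n_peak * background

# Flat background of 10 with a single pixel raised by 100
image = [[10.0] * 21 for _ in range(21)]
image[10][10] += 100.0
intensity = integrate_rings(image, 10, 10, 4, 5, 7)
.fi
.PP
The radii in the last line are the defaults of \fB--int-radius\fR.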
.SH OPTIMISING THE INTEGRATION RADII
To determine appropriate values for the integration radii, index some patterns with the default values and view the results using \fBcheck-near-bragg\fR (in the scripts folder). Set the binning in \fBhdfsee\fR to 1, and adjust the ring radius until none of the rings overlap for any of the patterns. This ring radius is the outer radius to use. Then reduce the radius until the circles match the sizes of the peaks as closely as possible. This value is the inner radius. The middle radius should be between the two, ideally between two and three pixels smaller than the outer radius.
.PP
If it's difficult to do this without setting the middle radius to the
same value as the inner radius, then the peaks are too close together to be
accurately integrated. Perhaps you got greedy with the resolution and put the
detector too close to the interaction region?
.SH OPTIONS
.PD 0
.IP "\fB-i\fR \fIfilename\fR"
.IP \fB--input=\fR\fIfilename\fR
.PD
Read the list of images to process from \fIfilename\fR. The default is \fB--input=-\fR, which means to read from stdin.
.PD 0
.IP "\fB-o\fR \fIfilename\fR"
.IP \fB--output=\fR\fIfilename\fR
.PD
Write the output data stream to \fIfilename\fR. The default is \fB--output=-\fR, which means to write to stdout.
.PD 0
.IP \fB--peaks=\fR\fImethod\fR
.PD
Find peaks in the images using \fImethod\fR. See the section titled \fBPEAK DETECTION\fR (above) for more information.
.PD 0
.IP \fB--indexing=\fR\fImethod\fR
.PD
Index the patterns using \fImethod\fR. See the section titled \fBINDEXING METHODS\fR (above) for more information.
.PD 0
.IP "\fB-g\fR \fIfilename\fR"
.IP \fB--geometry=\fR\fIfilename\fR
.PD
Read the detector geometry description from \fIfilename\fR. See \fBman crystfel_geometry\fR for more information.
.PD 0
.IP "\fB-b\fR \fIfilename\fR"
.IP \fB--beam=\fR\fIfilename\fR
.PD
Read the beam description from \fIfilename\fR. See \fBman crystfel_geometry\fR for more information.
.PD 0
.IP "\fB-p\fR \fIfilename\fR"
.IP \fB--pdb=\fR\fIfilename\fR
.PD
Read the unit cell for comparison from the CRYST1 line of the PDB file called \fIfilename\fR.
.PD 0
.IP "\fB-e\fR \fIpath\fR"
.IP \fB--image=\fR\fIpath\fR
.PD
Get the image data to display from \fIpath\fR inside the HDF5 file. For example: \fI/data/rawdata\fR. If this is not specified, the default behaviour is to use the first two-dimensional dataset with both dimensions greater than 64.
.PD 0
.IP \fB--int-radius=\fR\fIinner,middle,outer\fR
.PD
Set the inner, middle and outer radii for three-ring integration. See the
section titled \fBOPTIMISING THE INTEGRATION RADII\fR, above, for details of
how to determine these. The defaults are probably not appropriate for your
situation.
.PD
The default is \fB--int-radius=4,5,7\fR.
.PD 0
.IP \fB--basename\fR
.PD
Remove the directory parts of the filenames taken from the input file. If \fB--prefix\fR or \fB-x\fR is also given, the directory parts of the filename will be removed \fIbefore\fR adding the prefix.
.PD 0
.IP "\fB-x\fR \fIprefix\fR"
.IP \fB--prefix=\fR\fIprefix\fR
.PD
Prefix the filenames from the input file with \fIprefix\fR. If \fB--basename\fR is also given, the filenames will be prefixed \fIafter\fR removing the directory parts of the filenames.
.PD 0
.IP \fB--hdf5-peaks=\fR\fIpath\fR
.PD
When using \fB--peaks=hdf5\fR, read the peak locations from a table in the HDF5 file located at \fIpath\fR.
.PD 0
.IP \fB--tolerance=\fR\fItol\fR
.PD
Set the tolerances for unit cell comparison. \fItol\fR takes the form \fIa\fR,\fIb\fR,\fIc\fR,\fIang\fR. \fIa\fR, \fIb\fR and \fIc\fR are the tolerances, in percent, for the respective direct space axes when using \fB-axes\fR in the indexing method (see \fBINDEXING METHODS\fR, above). \fIang\fR is the tolerance in degrees for the angles. When \fBnot\fR using \fB-axes\fR, they represent the respective \fIreciprocal\fR space parameters. Sorry for the horrifying inconsistency.
.PD
The default is \fB--tolerance=5,5,5,1.5\fR.
.PD 0
.IP \fB--median-filter=\fR\fIn\fR
.PD
Apply a median filter with box "radius" \fIn\fR to the image. The median of the values from a \fI(n+1)\fRx\fI(n+1)\fR square centered on the pixel will be subtracted from each pixel. This might help with peak detection if the background is high and/or noisy. The \fIunfiltered\fR image will be used for the final integration of the peaks. If you also use \fB--filter-noise\fR, the median filter will be applied first.
.PD 0
.IP \fB--filter-noise\fR
.PD
Apply a noise filter to the image which checks 3x3 squares of pixels and sets all of them to zero if any of the nine pixels have a negative value. This filter may help with peak detection under certain circumstances. The \fIunfiltered\fR image will be used for the final integration of the peaks, because the filter destroys a lot of information from the pattern. If you also use \fB--median-filter\fR, the median filter will be applied first.
.PD 0
.IP \fB--no-sat-corr\fR
.PD
This option is here for historical purposes only, to disable a correction which is done if certain extra information is included in the HDF5 file.
.PD 0
.IP \fB--threshold=\fR\fIthres\fR
.PD
Set the overall threshold for peak detection using \fB--peaks=zaef\fR to \fIthres\fR, which has the same units as the detector data. The default is \fB--threshold=800\fR.
.PD 0
.IP \fB--min-gradient=\fR\fIgrad\fR
.PD
Set the gradient threshold for peak detection using \fB--peaks=zaef\fR to \fIgrad\fR, which has units of "detector units per pixel". The default is \fB--min-gradient=100000\fR.
.PD 0
.IP \fB--min-snr=\fR\fIsnr\fR
.PD
Set the minimum I/sigma(I) for peak detection when using \fB--peaks=zaef\fR. The default is \fB--min-snr=5\fR.
.PD 0
.IP \fB--copy-hdf5-field=\fR\fIpath\fR
.PD
Copy the information from \fIpath\fR in the HDF5 file into the output stream. The information must be a single scalar value. This option is sometimes useful to allow data to be separated after indexing according to some condition such as the presence of an optical pump pulse. You can give this option as many times as you need to copy multiple bits of information.
.PD 0
.IP "\fB-j\fR \fIn\fR"
.PD
Run \fIn\fR analyses in parallel. Default: 1.
.PD 0
.IP \fB--no-check-prefix\fR
.PD
Don't attempt to correct the prefix (see \fB--prefix\fR) if it doesn't look correct.
.PD 0
.IP \fB--closer-peak\fR
.PD
If you use this option, indexamajig will integrate around the location of a detected peak instead of the predicted peak location, if a detected peak is found within ten pixels of the predicted position. \fBDon't use this option\fR, because
there is currently no way to set the definition of 'close' to be appropriate
for your data.
.PD 0
.IP \fB--no-closer-peak\fR
.PD
This is the opposite of \fB--closer-peak\fR, and is provided for compatibility
with old scripts because this option selects the behaviour which is now the
default.
.PD 0
.IP \fB--use-saturated\fR
.PD
Normally, peaks which contain one or more pixels above max_adu (defined in the detector geometry file) will not be used for indexing. Using this option skips this check, possibly improving the indexing rate if there is a large proportion of saturated peaks.
.PD 0
.IP \fB--no-revalidate\fR
.PD
When using \fB--peaks=hdf5\fR, the peaks will be put through the same checks as if you were using \fB--peaks=zaef\fR. These checks reject peaks which are too close to panel edges, are saturated (unless you use \fB--use-saturated\fR), fall short of the minimum SNR value given by \fB--min-snr\fR, have other nearby peaks (closer than twice the inner integration radius, see \fB--int-radius\fR), or have any part in a bad region. Using this option skips this validation step, and uses the peaks directly.
.PD 0
.IP \fB--no-peaks-in-stream\fR
.PD
Do not record peak search results in the stream. You won't be able to check that the peak detection was any good, but the stream will be around 30% smaller.
.PD 0
.IP \fB--no-refls-in-stream\fR
.PD
Do not record integrated reflections in the stream. The resulting output won't be usable for merging, but will be a lot smaller. This option might be useful if you're only interested in things like unit cell parameters and orientations.
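.PP
The order in which \fB--basename\fR and \fB--prefix\fR are applied, as described under those options above, can be illustrated with a short Python sketch. The function name \fIresolve_filename\fR and the file names are hypothetical, and the sketch assumes the prefix is simply placed in front of the filename. It does not reproduce the prefix correction which \fB--no-check-prefix\fR disables.
.PP
.nf
import os.path

def resolve_filename(name, prefix="", basename=False):
    """Apply --basename first, then --prefix, to one input line."""
    if basename:
        # Remove the directory parts before the prefix is added
        name = os.path.basename(name)
    return prefix + name

path = resolve_filename("/old/place/run1/img_0001.h5",
                        prefix="/data/xfel/", basename=True)
.fi
.PP
With both options given, the old directory parts are discarded and only the prefix remains in front of the file name.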
.SH BUGS
ReAx indexing is experimental. It works very nicely for some people, and crashes for others. In a future version, it will be improved and fully supported.
.SH AUTHOR
This page was written by Thomas White.
.SH REPORTING BUGS
Report bugs to <taw@physics.org>, or visit <http://www.desy.de/~twhite/crystfel>.
.SH COPYRIGHT AND DISCLAIMER
Copyright © 2012-2013 Deutsches Elektronen-Synchrotron DESY, a research centre of the Helmholtz Association.
.P
indexamajig, and this manual, are part of CrystFEL.
.P
CrystFEL is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
.P
CrystFEL is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
.P
You should have received a copy of the GNU General Public License along with CrystFEL. If not, see <http://www.gnu.org/licenses/>.
.SH SEE ALSO
.BR crystfel (7),
.BR crystfel_geometry (5),
.BR process_hkl (1),
.BR partialator (1)