==========================================
Real-time data processing with indexamajig
==========================================

Instead of reading from files, indexamajig can receive its data from an
"online" streaming system.  In this version, two streaming systems are
implemented in CrystFEL: ZeroMQ and ASAP::O.

When using streamed data, most things work as they normally would.  However,
there are some limitations and "gotchas".  Read this document closely for
details.

Streamed data can be in HDF5, MsgPack or Seedee format.  CBF (as well as
gzipped CBF) may be added in the future.  To specify the format of the data,
use ``--data-format=hdf5``, ``--data-format=seedee`` or
``--data-format=msgpack``.

You can use all of the usual peak search methods for streaming data, but
note that HDF5/CXI peak lists are not currently supported for streamed data.
This may also be added in future.

The option ``--no-image-data`` will be honoured, if given.  This makes it
possible to quickly check streaming data for "indexability", which is more
likely to be useful with streamed data than with files.  You will be able
to do almost all of the usual downstream analysis operations on the resulting
stream, except that attempting to merge it using partialator or process_hkl
will result in zeroes everywhere.


Streaming data over ZeroMQ
==========================

To tell indexamajig to receive data over a ZeroMQ socket, use ``--zmq-input``
instead of ``--input`` or ``-i``.  An error will be generated if you use
``--zmq-input`` together with ``--input`` or ``-i``.

Indexamajig can use either a SUB (subscriber) or a REQ (request) socket.  The
SUB socket type can be used for receiving data from OnDA/OM via the same
mechanism that the OnDA/OM GUI uses.  In this case, you will also need to
specify which message prefixes to subscribe to using ``--zmq-subscribe``::

  indexamajig --zmq-input=tcp://127.0.0.1:5002 \
              --zmq-subscribe=ondaframedata \
              -o output.stream -g Eiger.geom ...

You can use ``--zmq-subscribe`` multiple times to subscribe to multiple message
prefixes.
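
ZeroMQ matches a subscription against the leading bytes of each message, so
``--zmq-subscribe=ondaframedata`` selects every message that starts with those
bytes.  The following Python sketch imitates that rule for illustration only;
``is_delivered`` is a hypothetical helper (not part of CrystFEL or ZeroMQ), and
the second prefix and the message contents are invented::

  def is_delivered(message, prefixes):
      """Return True if the message starts with any subscribed prefix."""
      return any(message.startswith(prefix) for prefix in prefixes)


  # Like giving --zmq-subscribe twice; "otherprefix" is a made-up name
  subscriptions = [b"ondaframedata", b"otherprefix"]

  print(is_delivered(b"ondaframedata:payload", subscriptions))  # True
  print(is_delivered(b"unrelated:payload", subscriptions))      # False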

Note that this mode of operation does not combine well with multi-threading
in indexamajig: all worker processes will receive the same data!  For anything
more than "taking a peek" at the data, use a REQ socket by giving
``--zmq-request`` instead of ``--zmq-subscribe``.  The argument to this option
is the string which should be sent in the request message::

  indexamajig --zmq-input=tcp://127.0.0.1:5002 \
              --zmq-request=next \
              -o output.stream -g Eiger.geom ...

Because they represent completely different modes of operation, the two options
``--zmq-request`` and ``--zmq-subscribe`` are mutually exclusive.
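
For testing the request mode without a real detector, the other end of the
connection can be mimicked with a small reply server.  The following Python
sketch assumes that the data source binds a REP socket at the address given to
``--zmq-input``, that the request arrives as exactly the bytes of the
``--zmq-request`` string, and that each reply carries one serialised frame.  It
uses the third-party ``pyzmq`` package; ``pick_reply`` and ``serve`` are
hypothetical names, and what a real data source does when it runs out of frames
is not specified here::

  def pick_reply(request, expected, frames):
      """Return the next frame for a matching request, else None."""
      if request != expected:
          return None
      return next(frames, None)


  def serve(endpoint, expected, frames):
      """Answer requests on a REP socket until the frames run out."""
      import zmq  # pyzmq (third-party), imported only when serving

      context = zmq.Context()
      socket = context.socket(zmq.REP)
      socket.bind(endpoint)
      try:
          while True:
              reply = pick_reply(socket.recv(), expected, frames)
              if reply is None:
                  break  # assumption: simply stop; real sources may differ
              socket.send(reply)
      finally:
          socket.close()
          context.term()

Calling ``serve("tcp://127.0.0.1:5002", b"next", iter(payloads))``, where
``payloads`` is a list of already-encoded frames, would pair with the
``--zmq-request=next`` command shown above.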


Streaming data using ASAP::O
============================

To tell indexamajig to receive data via ASAP::O, use ``--asapo-endpoint``.
You must additionally specify ``--asapo-token`` (giving the ASAP::O
authentication token), ``--asapo-beamtime`` and ``--asapo-source`` (specifying
the ASAP::O beamtime ID and data source, respectively), and ``--asapo-group``
with the ASAP::O consumer group ID.  If you run multiple copies of
``indexamajig`` on the same data stream, you should make sure that they all use
the same consumer group ID.

Tip: Since the ASAP::O token is a long text string, put it in a separate file
and use ``cat`` in backticks, as follows::

   indexamajig \
       --asapo-endpoint=my-endpoint.facility.de:8400 \
       --asapo-token=`cat /path/to/asapo-token.txt` \
       --asapo-beamtime=mybeamtime1234 \
       --asapo-source=eiger \
       --asapo-group=online

ASAP::O will remember the last retrieved frame in the stream for the given
consumer group ID.  Therefore, if you run ``indexamajig`` again, it will start
from where it left off.  To start again from the beginning of a stream, you
will need to reset the 'last read' location separately.  Instructions and tools
for this are pending...
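
When ``indexamajig`` is launched from a script rather than an interactive
shell, the token tip above still applies: read the token from its file and pass
it on the command line.  A minimal Python sketch follows; ``asapo_args`` is a
hypothetical helper, and only the option names are taken from this document::

  from pathlib import Path


  def asapo_args(token_file, endpoint, beamtime, source, group):
      """Build the ASAP::O options for indexamajig as an argument list."""
      # Strip surrounding whitespace such as the trailing newline
      token = Path(token_file).read_text().strip()
      return [
          f"--asapo-endpoint={endpoint}",
          f"--asapo-token={token}",
          f"--asapo-beamtime={beamtime}",
          f"--asapo-source={source}",
          f"--asapo-group={group}",
      ]

The returned list can be appended to a list starting with ``"indexamajig"`` and
the usual options, then passed to ``subprocess.run``.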


MsgPack data format
===================

For data in MessagePack format, the following assumptions are made:

* The data consists of either a single MsgPack 'map' object, or an array of
  maps.  If there are multiple map objects in the array, only the first one
  will be used.  The others will be ignored.
* The image data is given as a two-dimensional array (i.e. no 3D arrays with
  'panel number' as one of the dimensions).
* The image data itself is given as a MsgPack 'map' object representing a
  serialised NumPy array.  That is, it should contain ``type``, ``data`` and
  ``shape`` keys.
* The data ``type`` field should contain either ``<i4`` (if the data is in
  little-endian 32-bit signed integer format) or ``<f4`` for 32-bit (IEEE754
  single precision) floating-point.
* The data ``shape`` field should be a 1D array of two values.  The first
  element is the slow-scan size, the second is the fast-scan size.
* The data array within the NumPy map should be in a binary object called
  ``data``.

Note that *all* of these assumptions are 'open for negotiation' and will be
relaxed in future CrystFEL versions, as new online data formats arise.
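
As an illustration of the layout described above, the following Python sketch
builds such a map for a small image using only the standard library.  The key
names ``detector_data``, ``beam_energy`` and ``detector_distance`` are chosen
for illustration and must match whatever the geometry file refers to;
``make_frame`` is a hypothetical helper, and row-major ordering of the pixel
values (fast-scan index varying fastest) is an assumption based on NumPy's
default layout::

  import struct


  def make_frame(pixels, n_ss, n_fs, energy_ev, distance_mm):
      """Build a map holding one image as a serialised-NumPy-style entry."""
      if len(pixels) != n_ss * n_fs:
          raise ValueError("pixel count does not match shape")
      # "<i" packs little-endian 32-bit signed integers, matching "<i4"
      raw = struct.pack(f"<{len(pixels)}i", *pixels)
      return {
          "beam_energy": energy_ev,
          "detector_distance": distance_mm,
          "detector_data": {
              "type": "<i4",
              "shape": [n_ss, n_fs],  # slow-scan size first, then fast-scan
              "data": raw,            # to be sent as a binary object
          },
      }


  frame = make_frame([1, 2, 3, 4, 5, 6], 2, 3, 9000.0, 100.0)
  print(len(frame["detector_data"]["data"]))  # 24 bytes: 6 pixels x 4 bytes

The map still has to be serialised before sending.  With the third-party
``msgpack`` package for Python, ``msgpack.packb(frame, use_bin_type=True)``
would encode the ``data`` bytes as a MsgPack binary object.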

You can specify in the geometry file which keys of the map object to look up.
The following example will get the incident photon energy (in eV) and detector
distance (in mm) by looking up the ``beam_energy`` and ``detector_distance``
keys, respectively, in the MsgPack map object.  It will then look up
``detector_data`` to find the image data itself.  See the next section for an
explanation of ``peak_list``::

  photon_energy = beam_energy eV
  adu_per_photon = 1
  clen = detector_distance mm
  res = 5814.0
  peak_list = peak_list
  
  thepanel/data = detector_data
  thepanel/min_fs = 0
  thepanel/max_fs = 2067
  thepanel/min_ss = 0
  thepanel/max_ss = 2161
  thepanel/corner_x = -1034
  thepanel/corner_y = -1081
  thepanel/fs = x
  thepanel/ss = y


You can use ``--peaks=msgpack`` to get the peak locations from the MsgPack
data.  In this case, the ``peak_list`` directive in the geometry file specifies
the key for the peak information in the MsgPack map object. The peak
information itself is expected to be a map object with three keys: ``fs``,
``ss`` and ``intensity``.  Each of these keys should correspond to an array
containing (respectively) the fast scan and slow scan coordinates of each peak,
and their intensities.  Obviously, the three arrays must have equal sizes.
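
The peak map can be assembled from a list of per-peak tuples, which makes the
equal-length requirement hold by construction.  The following is a minimal
Python sketch; ``make_peak_map`` is a hypothetical helper, the coordinates and
intensities are invented, and the key ``peak_list`` is only correct if the
geometry file contains ``peak_list = peak_list``::

  def make_peak_map(peaks):
      """Convert (fs, ss, intensity) tuples into three parallel arrays."""
      fs, ss, intensity = [], [], []
      for peak_fs, peak_ss, peak_intensity in peaks:
          fs.append(peak_fs)
          ss.append(peak_ss)
          intensity.append(peak_intensity)
      return {"fs": fs, "ss": ss, "intensity": intensity}


  # Two made-up peaks: (fast scan, slow scan, intensity)
  found = [(101.5, 220.0, 3500.0), (840.2, 15.7, 1200.0)]
  frame = {"peak_list": make_peak_map(found)}
  print(frame["peak_list"]["fs"])  # [101.5, 840.2]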

Note that there is no way, in this structure, to communicate which detector
panel contains a peak, in the case where different detector panels cover the
same pixel ranges (in this case, the pixel data would come from multiple data
blocks).  In practice, this means that the ``data`` directives for all panels
need to be the same when using ``--peaks=msgpack``.

Note also that the options ``--no-revalidate`` and ``--check-hdf5-snr`` apply
to the peak lists from ``--peaks=msgpack``.