Exchange
Calling for more exchange about (meta)data in the field
The proprietary software package IVAS (now APSuite) by AMETEK/Cameca is the workhorse
for data acquisition and analysis in the atom probe community. Several of the file formats
that this software uses are proprietary, with the .apt file format as one exception.
The idea behind the .apt file format is to open up atom probe data from Cameca instruments,
i.e., Local Electrode Atom Probe (LEAP) instruments, for scientific data analysis and
public dissemination. The International Field Emission Society (IFES) and Cameca have
worked together to publish documentation for the format, which enabled the
community to develop open-source readers, such as the apt module of the
ifes_apt_tc_data_modeling library.
While .apt is becoming more widely accepted, traditional text and binary file formats
are still commonly used in daily atom probe research practice. Formal specifications
do not exist for all of these formats. This makes working with them
in software tools other than those from AMETEK/Cameca trickier and more error-prone.
A practical way to at least raise awareness of this problem has been for scientists to collect examples (instances) of files in the respective formats. Pieces of information about the content and formatting of atom probe file formats have been reported in the literature (e.g., in the books by D. Larson et al. and B. Gault et al.). Atom probers like Daniel Haley have contributed substantially by raising awareness of the issue within the community. Consequently, individual members of the community have invested in reverse engineering what these formats store and how they can be parsed using open-source software developed within the atom probe community and beyond.
The ifes_apt_tc_data_modeling library bundles this knowledge, while also highlighting
that there are still gaps in our understanding. From an academic point of view,
these gaps should be closed so that, whenever possible, atom probe data and metadata can
be communicated clearly with respect to what specific numbers mean, i.e.,
what the semantics and concepts behind the numbers and data items are.
As an example, the .pos file format stores a table of number quadruples, which are mostly
interpreted as reconstructed positions and mass-to-charge-state ratio values.
Often, though, the last column is hijacked to report conceptually different quantities,
such as identifiers used to distinguish clusters of atoms. Which specific input data were
used, and which parameterization of the reconstruction algorithm produced the results
stored in that .pos file? These questions pertaining to the workflow and
provenance along the data lifecycle remain unaddressed. Other technical issues exist
with file formats like .pos and .epos: they do not provide a magic number
that identifies a file as, e.g., a true .pos file, so neither software tools nor humans
can make substantiated assumptions about the content.
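To make the missing magic number concrete, the following is a minimal sketch of a .pos reader. It assumes the layout that is conventionally reported for this format: one record per ion, made of four big-endian 32-bit floats (x, y, z, mass-to-charge-state ratio). The function names `looks_like_pos` and `read_pos` are hypothetical and are not the API of the ifes_apt_tc_data_modeling library.

```python
import os
import struct

# Assumed record layout: x, y, z, mass-to-charge-state ratio,
# each a big-endian 32-bit float, i.e. 16 bytes per ion.
RECORD = struct.Struct(">4f")


def looks_like_pos(path):
    """Heuristic only: a non-empty file whose size is a multiple of 16 bytes."""
    size = os.path.getsize(path)
    return size > 0 and size % RECORD.size == 0


def read_pos(path):
    """Return a list of (x, y, z, m) tuples, assuming the layout above."""
    if not looks_like_pos(path):
        raise ValueError(f"{path} is not a whole number of 16-byte records")
    with open(path, "rb") as fp:
        return list(RECORD.iter_unpack(fp.read()))
```

The file-size test is the only check available without a magic number, and any unrelated binary file whose size happens to be a multiple of 16 bytes passes it. The reader also cannot tell whether the fourth value of each record is a mass-to-charge-state ratio or, for instance, a cluster identifier.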
Needs for improvement also exist for file formats that store ranging definitions, like the commonly used .rrng, .rng,
and .env formats: these merely store the resulting ranging definitions but do not record which
peak finding algorithm, or even which mass-to-charge-state ratio value array, they were defined with.
A more detailed discussion of these limitations is provided in the literature.
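As an illustration of how little a ranging file carries, here is a rough sketch that extracts the range definitions from .rrng-style text. It assumes the commonly seen INI-like layout with a `[Ranges]` section and lines of the form `Range1=26.96 27.05 Vol:0.01661 Al:1 Color:33FFFF`. Files found in practice may deviate from this (e.g., decimal commas or additional tokens), and the function name `parse_rrng_ranges` is hypothetical.

```python
import re

# Assumed line shape: Range<N>=<min> <max> followed by key:value tokens
RANGE_LINE = re.compile(r"^Range\d+=(\S+)\s+(\S+)\s+(.*)$")

# Tokens that do not describe the ion composition and are skipped here
NON_ELEMENT_KEYS = ("vol", "color", "name")


def parse_rrng_ranges(text):
    """Return a list of dicts with the interval bounds and ion composition."""
    ranges = []
    in_ranges = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Track whether we are inside the [Ranges] section
            in_ranges = line.lower() == "[ranges]"
            continue
        match = RANGE_LINE.match(line) if in_ranges else None
        if match is None:
            continue
        ion = {}
        for token in match.group(3).split():
            key, _, value = token.partition(":")
            if key.lower() in NON_ELEMENT_KEYS or not value.isdigit():
                continue
            ion[key] = int(value)  # element symbol -> multiplicity
        ranges.append(
            {"min": float(match.group(1)), "max": float(match.group(2)), "ion": ion}
        )
    return ranges
```

Everything the sketch can recover is an interval and an ion composition per range. Nothing in the parsed result points back to the mass-to-charge-state ratio array or the peak finding procedure from which the ranges were derived.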
The ifes_apt_tc_data_modeling library was developed after observing that
many researchers in atom probe use custom-written code for reading atom probe
data from classical file formats. While for several formats this is a rather
simple programming exercise, it has led to parallel developments and many
implementations that target only specific use cases instead of a sufficiently general
implementation with functionality for all possible elements, ion types, and
edge cases.