The filex Command
The filex command manages chemical structure and reaction file I/O modules. In many cases, actively loading I/O modules is not required because of the built-in auto-load mechanism. If the toolkit encounters a file of unknown type, it attempts to load a suitable module by constructing the name of the module from the file suffix. However, that mechanism fails if the file has no suffix or a non-standard suffix, or if the data source is not a file but some other stream, such as a network connection, a pipe, or a standard I/O channel. In these cases, explicit management of I/O modules is required.
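The suffix-based lookup can be pictured with the following minimal sketch. It is not the toolkit's implementation: the function name guess_module_name is hypothetical, and the assumption that the candidate module name is simply the lower-cased suffix is made only for illustration. The sketch shows why a file without a suffix, or a data source which is not a file at all, leaves the auto-loader nothing to work with.

```python
import os


def guess_module_name(source):
    """Return a candidate I/O module name derived from a file suffix.

    Hypothetical helper: assumes the candidate name is the lower-cased
    suffix without the leading dot. Returns None if no suffix exists.
    """
    if not isinstance(source, str):
        # Pipes, sockets and standard channels carry no file name.
        return None
    suffix = os.path.splitext(os.path.basename(source))[1]
    if not suffix or suffix == ".":
        # No suffix at all, or only a trailing dot.
        return None
    return suffix[1:].lower()
```

A non-standard suffix still yields a candidate name in this sketch, but no module of that name would be found, which corresponds to the failure case described above.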
The
filex
command has the following subcommands:
filex defined
filex defined format
Filex.Defined(format)
A check to determine whether the specified format is supported by an I/O module. In case the appropriate handler is not yet loaded, an attempt at auto-loading is made. For the equivalent command without auto-loading, see
filex exists.
The result value is a boolean status code.
filex exists
filex exists format
Filex.Exists(format)
A check to determine whether an I/O module for the specified format is currently loaded. This command variant does not attempt auto-loading. The format name may be either the primary name of a loaded module, or any of the format name aliases the module recognizes. For the equivalent command with auto-loading, see
filex defined
. The result value is a boolean status code.
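The difference between the two checks can be illustrated with a toy model. The class ToyModuleTable and the format names used with it are hypothetical and do not correspond to toolkit code. Whether the real auto-loader also resolves alias names is not stated here, so the sketch auto-loads by primary name only.

```python
class ToyModuleTable:
    """Toy model of an I/O module table (not the toolkit's code)."""

    def __init__(self, loadable):
        # 'loadable' maps primary names to alias lists of modules which
        # could be auto-loaded. Nothing is loaded initially.
        self._loadable = dict(loadable)
        self._loaded = {}

    def exists(self, fmt):
        """Check loaded modules only, by primary name or alias."""
        for name, aliases in self._loaded.items():
            if fmt == name or fmt in aliases:
                return True
        return False

    def defined(self, fmt):
        """Check loaded modules first, then attempt auto-loading."""
        if self.exists(fmt):
            return True
        if fmt not in self._loadable:
            return False
        # Simulated auto-load: move the module into the loaded table.
        self._loaded[fmt] = self._loadable[fmt]
        return True
```

With a table created as ToyModuleTable({"demo": ["demo_alias"]}), exists("demo") is false until defined("demo") has triggered the simulated auto-load. Afterwards, exists("demo_alias") is also true.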
filex get
filex get format attribute
fx.get(attribute)
fx.attribute
fx[attribute]
Filex.Get(format,attribute)
Query the value of an attribute of the I/O module. The list of attributes is detailed in the paragraph on the
filex set
command.
In case the format argument cannot be resolved by an active module, an attempt to auto-load a suitable module is made.
filex list
filex list ?pattern?
Filex.List(?pattern=?)
List the names of all currently loaded I/O modules. A string match pattern may be used to filter the result list. The variant
filex modules
is an alias to this command.
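The effect of the optional pattern can be sketched as follows, assuming that the pattern follows the glob-style rules of the Tcl string match command. These are approximated here with the Python standard library function fnmatchcase. The helper filter_module_names and the module names in the usage note are illustrative only.

```python
from fnmatch import fnmatchcase


def filter_module_names(names, pattern=None):
    """Return the names matching a glob-style pattern, or all names."""
    if pattern is None:
        return list(names)
    # fnmatchcase compares case-sensitively on every platform.
    return [name for name in names if fnmatchcase(name, pattern)]
```

For a name list such as ["mdl", "mdlrxn", "smiles"], the pattern mdl* selects the first two entries. Omitting the pattern returns the full list.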
filex load
filex load format ?objectfile?
filex load all
Filex.Load(format,?objectfile?,...)
Filex.Load("all")
Explicitly load an I/O module. If the module is already loaded, the current version is unloaded first. If no specific object file (a shared library on Unix/Linux, a DLL on Windows, a bundle file on OS X) is specified, the standard name of the module file is automatically constructed from the format name, and the file is then searched for in the directories of the I/O module path. The module path can be customized in the control variable
::cactvs(filexpath)
.
For
Tcl
, the return value of the command is the slot in the module table the module has been loaded into. This corresponds to the value of the
slot
attribute which can be queried via
filex get
. For
Python
, the return value is a module reference.
The second form of the command scans the currently set I/O module extension search path and loads all accessible modules which are not yet in memory. Modules which are already active in the running application are not unloaded, and only a single instance of each I/O module, even if present under various alias names in the module directories, is loaded. This form of the command does not return a value.
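For the first form of the command, the lookup of the object file can be pictured with the sketch below. It is an illustration under assumptions, not the toolkit's code: the helper find_module_file is hypothetical, and the naming scheme (format name plus a platform extension such as .so) is assumed because the actual construction rule is not given here.

```python
import os


def find_module_file(fmt, search_path, objectfile=None,
                     extension=".so", is_file=os.path.isfile):
    """Return the object file to load for a format, or None.

    An explicitly specified object file is used as is. Otherwise a
    file name is built from the format name (assumed scheme) and the
    directories of the search path are tried in order.
    """
    if objectfile is not None:
        return objectfile
    filename = fmt + extension
    for directory in search_path:
        candidate = os.path.join(directory, filename)
        if is_file(candidate):
            return candidate
    return None
```

The is_file parameter exists only so that the lookup can be exercised without touching the file system. The first directory which contains a matching file wins in this sketch.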
filex modules
filex modules ?pattern?
Filex.Modules(?pattern=?)
This is an alias for
filex list
.
filex ref
Filex.Ref(format)
Python
-only method to get a reference to the module, which allows terser attribute retrieval commands and other operations.
filex reload
filex reload format ?objectfile?
Filex.Reload(format,?objectfile?,...)
A variant of the
filex load
command which fails if the I/O module was not previously loaded. There is no all variant of this command.
filex set
filex set format ?attribute value?...
filex set format dict
fx.set(?attribute,value?,...)
fx.set(dict)
fx.attribute = value
fx[attribute] = value
Filex.Set(format,?attribute,value?,...)
Filex.Set(format,dict)
Set attributes of the I/O module. Compared to other classes of modules, there are rather few attributes in a module which can be set in a meaningful manner. Some of the listed attributes are read-only. They are included in this section because it is cross-referenced from the
filex get
command. These are the supported attributes:
-
address_city
The city part of the author contact address.
-
address_country
The country part of the author contact address, following the ISO 3166 standard.
-
address_state
The state part of the author contact address. Empty if not applicable.
-
address_street
The street address part of the author contact address. Includes floor, house number, etc.
-
address_zip
The
ZIP
code or other applicable postal code of the author contact address.
-
affiliation
The institution the author of the module works for.
-
affiliationduns
The
DUNS
registration ID of the affiliated institution. This is primarily useful for US government projects.
-
affiliationurl
The
URL
of the affiliated institution.
-
aliases
A list of alternative names for the formats the module supports.
-
author
The author of the module.
-
authorization
An authorization string, for example a service login
URL
. This is for example used in the
dropbox
meta I/O module. In that case, it is a Web
URL
generated by the module from the compiled-in application secret. Using that
URL
, the user must log into a
Dropbox
account and approve access to the files of that account by the application. Only after this has been performed, opening
Dropbox
files with a
molfile open
command succeeds.
-
authorurl
A
URL
with information on the author, or an empty string if unset.
-
builtin
A read-only boolean flag indicating whether the module is built-in.
-
capabilities
A list of features and behaviors the I/O module supports. Only a few of the flags which can be found here can be changed in a productive fashion. These include:
disabled
Temporarily disable this module, without unloading it
nommap
Never attempt to memory-map files of this format
-
category
A category string to be used if the module is stored in a repository.
-
classuuid
The base class
UUID
of this module.
-
comment
A free-form string comment on the module.
-
date
The date the module source was last modified.
-
doi
A digital object identifier for the module, if defined.
-
email
The email address of the author of an I/O module.
-
ensproperty
The name of a property which is used to store structure information in the file. This is only used for file formats where storing structure data is a minor objective, not for standard chemical structure exchange formats.
-
functions
This attribute is a read-only list of the classes of available functions in the function table of the module. Developers can use this information to determine whether a module is input-only or output-only, or supports acceleration methods for scanning structure files.
-
id
The internal format ID of the module in the current program run. This is usually identical to the slot in the extension table the module was loaded or compiled into.
-
infourl
A
URL
with information on the module, or an empty string if unset.
-
keywords
A list of keywords associated with the module.
-
license
The license class associated with this module. Setting the license to a standard type updates the associated
URL
with a standard location.
-
licenseurl
A
URL
with details about the module license.
-
literature
A free-form literature reference.
-
mimetype
The
MIME
type associated with the file format, for example
chemical/x-mdl-molfile
. This information is used for constructing
HTTP
headers for data transfer in Web environments and similar tasks.
-
name
The primary name of the format the I/O module handles.
-
nitrostyle
The style of nitro groups and similar groups in the file, i.e. whether these are preferably encoded with pentavalent nitrogen or a charge pair. Possible values are
asis
(does not matter, or unknown),
ionic
,
neutral
,
xionic
and
xneutral
. If this value is not
asis
, structures written to the file are automatically adapted. This is performed on duplicates of the output structures, so the objects used in a
molfile write
or similar command do not change. On the other hand, the requirement to duplicate the object, manipulate the duplicate, and destroy it after it has been used can be time-consuming.
-
objectfile
The full path name of the loaded object file or dynamic library. This attribute is read-only.
-
orcid
The
ORCID
code of the author (see www.orcid.org).
-
parameters
A dictionary of format-specific keyword/value pairs which are not represented as a general
molfile
object attribute. When a file of a specific format is opened, the data from the corresponding I/O module is copied to the parameters attribute of the
molfile
object, where it may be further customized by
molfile set
commands before an input or output operation. Changing this attribute in the I/O module modifies the initial content of the parameters attribute of all
molfile
objects associated with this format created in the future. Explicitly changing the format of a
molfile
object refreshes the parameter set.
-
path
The repository path for displaying hierarchical repository trees. This attribute is independent of any file system paths.
-
phone
A contact phone number of the author.
-
reactionproperty
The name of a property which is used to store reaction information in the file. This is only used for file formats where storing reaction data is a minor objective, not for standard chemical structure exchange formats.
-
readflags
A list of flags to adjust input behavior. Not all flags are supported for all I/O modules. Unsupported flags are silently ignored. The flag set is copied as default to any
molfile
object which uses the I/O handler module. The flag set is the same as for the
molfile
readflags
attribute, but only a subset of these flags make sense as presets. The flags can be modified on the I/O module level if desired:
-
none
The same as an empty list; no flags are set.
-
aroresolver
If set, resolve bonds marked in the file as
aromatic
into a Kekulé system. This includes resolution of bonds which are explicitly marked as query bonds (i.e. bond type 4 in
MDL
Molfiles
). This is very useful to fix frequently seen MDL Molfiles which encode structures, not queries, but nevertheless use an aromatic bond type in violation of the file format specification. Aromatic system resolution works much more robustly for structures with a complete set of hydrogens. It is advisable to combine this flag with automatic hydrogen addition.
-
autowrap
If set, the file is automatically rewound if the end of the file is reached, and the start record of the operation has not yet been encountered again. This behavior only applies to the
molfile scan
command, not to normal record input. Wrapping is not possible on data sources which cannot be rewound.
-
basiconly
Only read basic connectivity information, but not additional properties. Supported only on formats which use the native
Cactvs
structure data storage system (
cbin, cbs, bdb
).
-
chargebalancer
If set, perform a charge balancing step after reading, in an attempt to obtain a neutral structure.
-
chargecombiner
If set, perform a charge combination step after reading, in an attempt to obtain a neutral structure.
-
complexresolver
If set, try to resolve a purely VB-based structure representation into a representation which utilizes
complex
bonds for bonds between ligands and metal centers which cannot be described well with electron-counted VB bonds.
-
fixdoublespace
If set, this flag instructs I/O modules with support for this feature to read structure files which contain one spurious empty line after each data line, which unfortunately appears to happen sometimes when
DOS
-encoded files are transferred to Apple systems. This is not the same as reading
CR/LF
files on
CR
-only or
NL
-only platforms, or vice versa, which is always possible and fully automatic. This flag addresses the problem that, due to mishandling by obscure transfer software, duplicated
EOL
-markers are introduced in the file (two identical
CR/LF
, or
CR
, or
NL
pairs after each data line).
-
fixstereo
If set, remove spurious atom and bond stereo descriptors assigned to non-stereogenic centers.
-
fixwedges
If set, invert wedge bonds encoded with the base at the stereo center to the IUPAC-conforming style with the tips at the stereo center.
-
ignoreempty
If set, records which do not contain any atoms are silently ignored and the next record with atoms is returned instead.
-
ignoreerrors
If set, ignore records which raise errors on file input. Instead, silently attempt to re-synchronize the read pointer and proceed with the next record, until the end of the file has been reached, or an undamaged record could be read successfully.
-
ignorevisibility
Ignore any object visibility information in the file and read all data as visible objects.
-
ignorecr
Allow an isolated carriage return (
ASCII
13) character without following
NL
(
ASCII
10) character as data content instead of examining it as a potential line break symbol. This flag is necessarily ignored on Mac-style input files which only use
CR
as
EOL
markers.
-
ignoreeitherdb
If set, ignore the
either
flag for double bonds when reading
MDL
Molfiles
. The default is to translate it into the
crossed
bit of the
B_FLAGS
property.
-
keepcoords
If set, always keep atomic 2D layout coordinates, even if they are, for example in reactions, overlapping on the reagent and product sides. By default the coordinates of molecules are adjusted if necessary to be non-overlapping. This is done by moving molecules only, not by scaling the coordinates, and never by recomputing any coordinates.
-
latehprocessing
If set, hydrogen modification (addition, deletion) is performed after standardization operations (see various
resolver
attributes). By default, hydrogen addition is performed before these routines are called.
-
mergedata
If this flag is set, multi-line input from SD file data lines into a simple string property is merged into a single string value, with tab characters indicating the newlines in the file. By default, in such cases every line of a multi-line data item is stored as a new property instance. This is equivalent to the property attribute
mergedata
(see
prop set
command).
-
multibondcheck
If set, attempt to intelligently resolve any atoms with excessive multiple bonds consuming bond electrons in excess of the available number by recoding such bonds as charge pairs.
-
nocoordinatecheck
If set, no attempt is made to add missing coordinates, for example for automatically added hydrogen atoms, to the 2D and 3D coordinate sets, if such coordinates were present in the original record.
-
noimplicith
Do not add implicit hydrogen. This flag only applies to file formats which exactly define a default number of hydrogens (for example,
SMILES
) as an implicit part of the structure. It has no effect in file formats which just tend to omit hydrogen (for example,
MDL
Molfiles
).
-
nometal
Assert to the file input routine that the input does not contain any metal atoms. In that case ambiguous atom symbols, for example
CA
or
CD in
PDB
files,
are interpreted as carbon (in alpha and delta position), and not as calcium or cadmium.
-
nometalh
Assert that none of the metal atoms in the structure has any missing hydrogen ligands. If set, hydrogen addition, if selected, skips the processing of these atoms.
-
noorigin
By default, every ensemble or reaction read from a file is augmented with a property
E_FILE
or
X_FILE
, indicating the origin of the record by recording the file name, record number and other information in the automatically attached property. If this information is not of interest, this wasteful step can be suppressed by setting the flag.
-
noradicals
Assert that the file does not contain any radicals. This can for example be helpful in the resolution of aromatic systems (see
aroresolver
attribute).
-
pedantic
Strictly adhere to the format specification and flag any deviation as an error. This feature is only well implemented for
MDL
Molfiles
. It is intended to be used for strict format checking.
-
radicalcharger
Edit radicals which are typically formed by reading a file without formal atomic charge information by adding standard formal charges, for example replacing NR4 with N(+)R4 and OR with O(-)R. This only works reasonably well if the file contains a complete hydrogen set.
-
readas2d
Force interpretation of the atomic coordinates in the record as 2D display coordinates (property
A_XY
), even if syntax or data items in the file indicate the presence of 3D coordinates. This is useful for simple reading of records where 3D coordinate fields were abused for storing display coordinates.
-
readparity
For
MDL
Molfiles
, read the parity information. By default, as recommended by
MDL
, this information is not read and parity is instead computed from wedges if needed.
-
simpleradicals
Assume that any radical encountered is a singlet, and not anything more complex such as triplets etc., regardless of what the file encodes.
-
tautoresolver
Perform a tautomer standardization on the read structure. This operation invalidates numerous atom and bond properties, such as coordinates, but in this special case all ensemble properties which were attached to the processed structure are retained, regardless of their sensitivity toward atom and bond changes. Tautomer resolution requires a complete hydrogen set, so either these must be present in the input file, or a suitable hydrogen addition mode must have been set on the file handle. The processing behind this input option is comparatively expensive. For normal input, when speedy input and maximum fidelity of the data to the original file is desired, this flag should not be set.
-
regid
A numerical registration ID assigned to registered modules.
-
references
Cross references of the module. This is a nested list of class
UUID
s and reference type tags.
-
slot
The slot the module was loaded into. This attribute is read-only.
-
sourcefile
The name of the source file of the module. This attribute is read-only.
-
suffixes
A list of the file suffixes this module recognizes as typical for the implemented format. If a file with a suffix is opened for writing without specifying an explicit format, the last loaded module which has the suffix in its list determines the automatically assigned format. Suffixes are ignored as format identifiers for file input and updates. In these cases, the file contents are analyzed to determine the format. This attribute is read-only.
-
version
The version of the module. This is a string in a 1.2.3 (or shortened) style.
-
versionuuid
The version
UUID
associated with this module version.
In case the format argument cannot be resolved by an active module, an attempt to auto-load a suitable module is made.
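The kind of repair addressed by the fixdoublespace flag of the readflags attribute can be sketched as follows. This is a simplified illustration, not the algorithm used by any I/O module. The function strip_double_spacing is hypothetical and assumes that the input has already been split into lines.

```python
def strip_double_spacing(lines):
    """Drop the spurious empty line which follows each data line.

    The input is only changed if every second line is empty. In all
    other cases it is returned unchanged, because empty lines may be
    genuine data.
    """
    lines = list(lines)
    # Lines at odd indices are the candidates for spurious lines.
    odd = lines[1::2]
    if not odd or any(line.strip() for line in odd):
        return lines
    return lines[0::2]
```

Empty lines which are part of the original data survive the repair, because they sit at even positions in the double-spaced file.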
filex subcommands
filex subcommands
dir(Filex)
List all supported subcommands of the
filex
command in an installation.
filex unload
filex unload ?format?...
fx.unload()
Filex.Unload(?format?,...)
Unload zero or more I/O modules. It is an error to specify the name of a module which is not loaded.
Built-in I/O modules cannot be unloaded. If the use of one of these needs to be switched off, it is possible to set the
disabled
flag of the
capabilities
module attribute via the
filex set
command.
The command returns the number of unloaded I/O modules.