The dataset Command

The dataset command is the generic command used to manipulate datasets. The syntax of this command follows the standard schema of command/subcommand/majorhandle. Datasets are major objects and thus do not need any minor object labels for identification.

Example:

dataset get $dhandle D_SIZE

As explained in the introductory section on datasets, a normal persistent dataset handle may be substituted as the third argument of the dataset command by an arbitrary list of dataset, ensemble, reaction, table and network handles. Substitution is only allowed in that argument position, not in cases where a dataset handle is part of the command arguments of another object command, and not in a different argument position in the context of a dataset command. Such an object list is transformed into a transient dataset for the duration of the command execution. After the command has completed, the elements of the transient dataset are in most cases restored to their original state with respect to dataset membership and position, except in a few documented exceptional circumstances.

As a means to access an embedded dataset object, its handle may be replaced by the handle of the parent object where this is unambiguous, e.g.

ens move $eh $thandle

moves the ensemble into the embedded dataset of the table, while

dataset count $thandle

treats the table argument as part of a transient dataset as described above.

This is the list of currently officially supported subcommands:

dataset add

dataset add dhandle objhandle ?position?
d.add(object=,?position=?)
d += object

Add an object to the dataset, relocating it from a current dataset if it exists. If no position is specified, the object is appended to the rear of the dataset object list. The position can either be a numerical zero-based index, or any string beginning with ‘e’ to indicate the end position.

If the object handle identifies a (local) dataset, and the target dataset does not accept datasets as members, all objects in the source dataset are instead moved to the new dataset, and then the source dataset is destroyed. If ensembles, reactions, tables or networks are moved, they are unlinked from any current datasets, but these original datasets themselves persist.

This dataset command is equivalent to issuing a move command from the object.

The command returns the dataset handle for Tcl, or the dataset reference for Python. The numerical operator shortcut for Python adds the object to the end of the dataset.

Example:

dataset add $dh $eh end
ens move $eh $dh end

These two commands are equivalent.
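The position rule described for dataset add can be modelled in a few lines of plain Python. This is only a sketch of the documented behaviour on an ordinary list, not toolkit code: resolve_position is a hypothetical helper, and the clamping of indices beyond the end of the list is an assumption, since the text does not say how such values are handled.

```python
def resolve_position(position, size):
    """Return the insertion index for a dataset-style position argument.

    Hypothetical helper, not part of the toolkit. A missing position or any
    string beginning with 'e' means "append"; anything else is read as a
    zero-based numerical index.
    """
    if position is None:
        return size
    if isinstance(position, str) and position.startswith("e"):
        return size
    index = int(position)
    if index < 0:
        raise ValueError("position must be a non-negative index")
    # Assumption: an index past the end is treated like an append.
    return min(index, size)


# A plain list stands in for the dataset object list.
members = ["ens0", "ens1", "ens2"]
members.insert(resolve_position("end", len(members)), "ens3")  # appended
members.insert(resolve_position(1, len(members)), "ens4")      # second slot
```

After both insertions the stand-in list holds ens4 at index 1 and ens3 at the rear.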

dataset addthread

dataset addthread dhandle ?body?
dataset addthread dhandle count body
dataset addthread dhandle count substitutiondict body
d.addthread(?count=?,?dictionary=?,?script=?)

Add one or more Tcl script threads to the dataset. By default, a single thread is added, but by setting the count parameter to a higher number multiple threads with the same script body can be added simultaneously, up to a maximum of 32 threads per dataset. It is possible to use this command to add additional threads to a dataset which already has attached threads. These older threads remain active.

The thread script code is always Tcl code, even if the command is issued from a Python interpreter. This is due to limitations in the Python thread model and is described in more detail in the general Python scripting introduction.

The optional substitution dictionary contains a set of percent-prefixed keys and replacement values, following the Tk event procedure model. All such replacements are made before the script is passed to the thread interpreters. A single default substitution replacing the character sequence %D with the handle of the current dataset is always predefined and cannot be redefined. Replacement token keys (but not necessarily their values) are single case-dependent characters, ignoring an optional percent prefix character. Within the script, percent signs which should be preserved as such must be doubled, just like in Tk event substitution commands.
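The substitution step can be illustrated with a small plain-Python model. It is a sketch of the rules stated above, not the toolkit implementation: the function name substitute is hypothetical, and two details are assumptions, namely that an attempt to redefine %D raises an error and that percent sequences without a matching key are left untouched.

```python
def substitute(script, dataset_handle, table=None):
    """Expand %X tokens in a thread script body (illustrative model only)."""
    mapping = {}
    for key, value in (table or {}).items():
        # The percent prefix on dictionary keys is optional.
        token = key[1:] if key.startswith("%") else key
        if len(token) != 1:
            raise ValueError(f"token key must be a single character: {key!r}")
        if token == "D":
            # Assumption: redefinition is rejected rather than ignored.
            raise ValueError("%D is predefined and cannot be redefined")
        mapping[token] = str(value)
    mapping["D"] = dataset_handle  # the always-present default substitution

    out = []
    i = 0
    while i < len(script):
        ch = script[i]
        if ch == "%" and i + 1 < len(script):
            nxt = script[i + 1]
            if nxt == "%":            # doubled percent -> literal percent
                out.append("%")
                i += 2
                continue
            if nxt in mapping:        # case-sensitive single-character key
                out.append(mapping[nxt])
                i += 2
                continue
        out.append(ch)                # assumption: unknown tokens pass through
        i += 1
    return "".join(out)


body = "table addens %T [dataset pop %D] ;# 100%% done"
expanded = substitute(body, "dataset0", {"%T": "table0"})
```

Here dataset0 and table0 are placeholder handle strings. The expanded text is what a thread interpreter would receive under this model.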

The dataset threads are compatible with those of the standard Tcl threads package. Dataset-associated threads are automatically created in preserved state, and a thread::wait command is automatically appended at the end of the script, so they can be sent additional tasks via the thread::send commands. If no script body is specified, the initial script consists only of the wait command. Threads can be canceled or joined only if they are stopped at the thread::wait statement.

When a dataset is deleted, all threads associated with this dataset need first to be joined, and this can only happen if they have finished processing the main body script and are all in their idle state in the thread::wait command. Object deletion is postponed until this condition is met. A global join on all currently executing dataset threads is automatically performed when the program exits, before any object clean-up tasks are run. An application where dataset threads are stuck and do not reach their thread::wait cancellation points cannot be cleanly exited.

Duplicating datasets does not duplicate any associated threads.

The presence of threads on a dataset has consequences for the behavior of the dataset wait and dataset pop commands, as well as object insertion commands associated with other major object classes (e.g. ens move, or molfile read). Please refer to the respective paragraphs for details. The size control mechanism of datasets in the auto mode is also dependent on the presence or absence of linked dataset threads.

Example:

dataset addthread $dh 1 [dict create %T $th] {
	while {1} {
		set eh [dataset pop %D]
		if {$eh==""} break
		if {[catch {ens get $eh E_CANONIC_TAUTOMER} eh_canonic]} {
			ens delete $eh
			continue
		}	
		if {[catch {ens get $eh_canonic E_DESCRIPTORS}]} {
			ens delete $eh
			continue
		}
		table addens %T $eh_canonic
		ens delete $eh
	}
}

This code creates a processing thread on the dataset which computes properties on newly arriving ensembles, stores the data in a table (note the table handle substitution via the replacement dictionary) and then deletes the ensemble. The dataset pop command returns an empty string when it is known no more data will arrive, and otherwise blocks until an object for popping is available. This is managed by setting the eod dataset attribute from feeder threads.
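The interplay between a blocking pop and the end-of-data flag can be mimicked with the Python standard library. The following is a toy model under stated assumptions, not the toolkit mechanism: FeedQueue, its methods and the worker function are hypothetical names, the polling interval is arbitrary, and strings stand in for ensemble handles.

```python
import queue
import threading


class FeedQueue:
    """Toy stand-in for a dataset that is consumed by a worker thread."""

    def __init__(self):
        self._items = queue.Queue()
        self._eod = threading.Event()

    def add(self, item):
        self._items.put(item)

    def set_eod(self):
        # Feeder signals that no more data will arrive.
        self._eod.set()

    def pop(self):
        """Block until an item is available; return "" once data has ended."""
        while True:
            try:
                return self._items.get(timeout=0.05)
            except queue.Empty:
                if self._eod.is_set() and self._items.empty():
                    return ""


def worker(feed, results):
    while True:
        item = feed.pop()
        if item == "":
            break                      # same exit test as in the Tcl loop
        results.append(item.upper())   # placeholder for the real processing


feed = FeedQueue()
results = []
thread = threading.Thread(target=worker, args=(feed, results))
thread.start()
for item in ["co", "cn"]:
    feed.add(item)
feed.set_eod()
thread.join()
```

The worker leaves its loop only after the feeder has both delivered all items and raised the end-of-data flag, which mirrors the role of the eod attribute in the Tcl example.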

The return value of the command is a list of the Tcl thread IDs of the newly created threads. These are suitable for use in the dataset jointhreads command or any standard Tcl thread package command.

dataset append

dataset append dhandle ?property value?...
d.append({?property:value,?...})
d.append(?property,value,?...)

Standard data manipulation command for appending property data. It is explained in more detail in the section about setting property data.

The command returns the first data value.

Example:

dataset append $dhandle D_NAME "_new"
dataset append $dhandle eod 1

dataset assign

dataset assign dhandle srcproperty dstproperty
d.assign(srcproperty=,dstproperty=)

Assign property data to another property on the same object. Both properties must be associated with the same object class. This process is more efficient than going through a pair of dataset get/dataset set commands, because in most cases no string or Tcl/Python script object representations of the property data need to be created.

Both source and destination properties may be addressed with field specifications. A data conversion path must exist between the data types of the involved properties. If any data conversion fails, the command fails. For example, it is possible to assign a string property to a numeric property - but only if all property values can be successfully converted to that numeric type. The reverse example case always succeeds, out-of-memory errors and similar global events excluded.
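The conversion requirement can be pictured with a plain-Python sketch in which dictionaries stand in for objects and their property data. The helper assign is hypothetical, and the model converts every value before writing anything. That all-or-nothing behaviour is an assumption made for the sketch, because the text only states that the command fails when a conversion fails.

```python
def assign(records, src, dst, convert):
    """Copy records[i][src] to records[i][dst] through a converter."""
    # Convert everything first, so a failure leaves the records untouched.
    converted = [convert(record[src]) for record in records]
    for record, value in zip(records, converted):
        record[dst] = value
    return records


# String data that is fully convertible to a numeric type: succeeds.
good = [{"label": "1.5"}, {"label": "2"}]
assign(good, "label", "value", float)

# One value is not numeric: the whole operation fails.
bad = [{"label": "1"}, {"label": "abc"}]
try:
    assign(bad, "label", "value", float)
    failed = False
except ValueError:
    failed = True

# The reverse direction (number to string) cannot fail in this model.
assign(good, "value", "text", str)
```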

The original property data remains valid. The command variant dataset rename directly exchanges the property name without any data duplication or conversion, if that is possible. In any case, the original property data is no longer present after the execution of this command variant.

If the properties are not associated with datasets (prefix D_ ), the operation is performed on all dataset member objects.

The command returns the object handle for Tcl, or object reference for Python.

Example:

dataset assign $dhandle A_XY A_XY%

This code snippet creates backup atomic 2D layout coordinates on all dataset ensembles or reactions.

dataset biologics

dataset biologics dhandle ?filterset? ?filtermode? ?recursive?
d.biologics(?filters=?,?mode=?,?recursive=?)

Return a list of all the handles or references of the biologics in the dataset. Other objects in the dataset (ensembles, reactions, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the result further modified by a standard filter mode argument.

If the recursive flag is set, and the dataset contains other datasets as objects, biologics in these nested datasets are also listed.

Example:

set n [dataset biologics $dhandle {} count]

dataset cancelthreads

dataset cancelthreads ?all?
dataset cancelthreads dhandle ?all?
dataset cancelthreads dhandle threadid...
Dataset.Cancelthreads()
d.cancelthreads("all")
d.cancelthreads()
d.cancelthreads(?threadid?,...)

Cancel (or more precisely, wait for and join) one or more threads associated with the dataset. Dataset threads can only be canceled when they are idle, executing the implicitly added thread::wait command at the end of their script. Therefore, this command is not just used for clean-up, but also useful for ascertaining that the threads have finished their tasks. The IDs of the threads associated with a dataset can be retrieved as the threads dataset attribute, or saved from the return value of the original dataset addthread command. The special all thread ID value can be used to cancel all threads of the dataset. This can also be achieved by setting an empty thread ID parameter, or omitting it altogether. If a dataset does not possess threads, this command does nothing. If a thread marked for cancellation has not yet finished, the cancellation command is suspended until it has.

This command can also be invoked without specifying an explicit or transient dataset argument, or passing it as all. In that case, the thread join cleanup is run on all threads of all currently defined datasets. This function is also implicitly run when a script exits, before performing other application cleanup operations.

Thread cancellation for all dataset threads is implicitly invoked when a dataset is deleted, so an explicit clean-up is not required. However, this also means that a dataset deletion blocks if there are still active threads. It is not possible to forcefully cancel a thread which has entered an infinite loop, so careful programming is required.

The command returns the number of canceled threads.

dataset jointhreads is an alias to this command.

Example:

dataset jointhreads $dh
dataset cancelthreads $dh [lindex [dataset get $dh threads] 0]
dataset jointhreads

The first command waits for all threads on the specified dataset to finish. The second command waits for the completion of one specific thread, and the last command waits for all threads on all currently defined datasets.

dataset cast

dataset cast dhandle dataset/ens/reaction/table ?propertylist?
d.cast(objectclass=,?properties=?)

Transform the dataset into a different object. Depending on the target object class, the result is as follows:

If the optional property list is specified, an attempt is made to compute the listed properties before the cast operation, so that they may become a part of the new object. No error is raised if a computation fails.

The command returns the handle (reference for Python) of the new object, or the input object in case of mode dataset.

dataset clear

dataset clear dhandle
d.clear()

Delete all objects in the dataset, but keep the dataset object. The return value is the number of deleted objects.

dataset count

dataset count dhandle|remotehandle ?filterlist?
d.count(?filters=?)
Dataset.Count(dataset=,?filters=?)

Get the number of objects in the dataset. If the filter parameter is specified, only those objects which pass the filter are counted.

Example:

dataset count $dhandle astereogenic

counts the number of ensembles or reactions in the dataset with one or more potential atom stereo centers.

dataset size is an alias to this command.

This command can be used with remote datasets. In the case of Python, this requires the use of the class method.

In case a simple count on a local dataset is required, without any filters, the dataset size can also be queried as attribute, as in

set n [dataset get $dhandle size]

dataset create

dataset create ?objecthandle/objectlist?...
Dataset(?objectref/objectsequence?,...)
Dataset.Create(?objectref/objectsequence?,...)

This command creates a new dataset and returns the handle of the new dataset. If the optional object handle lists are provided as arguments, the specified objects (in case of ensemble, reaction, network or table handles), or elements of the object (for a dataset handle, with default accept flags) are moved to the new dataset. In case the accept flags of the target dataset are configured to allow datasets as primary dataset objects, the source dataset argument is not implicitly replaced by its content objects but added as a single object, retaining its objects as content. Otherwise, the source dataset is emptied but remains a valid object.

Besides handles of ensembles, reactions, networks, tables, molfiles and of other datasets, which are identified with priority, any string which can be decoded in an ens create statement is also allowed as member initialization identifier.

If the dataset create statement references objects which are not usually accepted by the default settings of the accept dataset attribute, that attribute is automatically adjusted to allow for these objects. The accept flag modification is persistent.

Molfile objects in the object handle list are treated differently from other objects. The latter are directly moved into the dataset. In the case of molfile objects, the file is read from the current position to the end (or until a termination condition configured on the molfile handle is met), and the newly read objects are moved into the dataset.

The command always returns the handle of the new dataset (or a reference for Python), never the handles of any objects which may have been placed into the dataset.

Examples:

dataset create [list $eh1 $eh2] $dh1

creates a new dataset and moves the two specified ensembles $eh1 and $eh2, as well as everything contained in the dataset $dh1, into the new dataset.

dataset create [molfile open myfile.sdf r hydrogens add]

creates a dataset from the file contents, with hydrogen addition configured on the molfile handle.

dataset create VXPBDCBTMSKCKZ

The command above treats its argument as a partial InChI key and puts all structures from the NCI resolver that match it in the non-stereo/isotope-specific part of their full InChI key into the new dataset.

set ::cactvs(lookupmode) "name_pattern"
dataset create [list "+morphine +methyl"]

This command performs a name pattern lookup and puts all structures from the NCI resolver which contain both name fragments in one of their known names into the dataset. The name pattern string needs to be explicitly packed into a list, because otherwise it would be split into two independent list elements.

dataset dataset

dataset dataset dhandle ?filterlist?
d.dataset(?filters=?)

Get the handle (or, for Python, a reference) of the container dataset the dataset is a member of. If the dataset is not itself a dataset member, or does not pass the optional filters, an empty string is returned, or None for Python.

This command is not equivalent to dataset datasets!

dataset datasets

dataset datasets dhandle ?filterset? ?filtermode? ?recursive?
d.datasets(?filters=?,?mode=?,?recursive=?)

Return a list of all the handles or references of the datasets that are members in the dataset identified by the command argument handle. Other objects (ensembles, reactions, tables, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.

If the recursive flag is set, and the dataset contains other datasets as objects, datasets in these nested datasets are also listed.

This command is not equivalent to the dataset dataset command!

Example:

set dlist [dataset datasets $dhandle]

dataset defined

dataset defined dhandle property
d.defined(property)

This command checks whether a property is defined for the dataset. This is explained in more detail in the section about property validity checking. Note that this is not a check for the presence of property data! The dataset valid command is used for this purpose.

The command returns a boolean result.

dataset delete

dataset delete ?datasethandle/datasethandlelist/all?...
d.delete()
Dataset.Delete("all")
Dataset.Delete(?dref/drefsequence/dhandle?,...)

This command destroys datasets and everything contained therein. The special handle value all may be used to delete all datasets in the application at once.

The command returns the number of datasets which were successfully deleted.

Transient datasets cannot be used with this command. Neither can datasets which are a component of another object, e.g. the internal datasets of tables or factories. These are only and automatically deleted when their parent object is destroyed. Datasets which are a property value are also undeletable by this command.

It is a common programming error to delete a dataset, or its parent object if one exists, without protecting its current member ensembles or reactions. If they are still needed in later processing they need to be explicitly transferred into another dataset or outside of it.

Examples:

dataset delete all
dataset move $dhandle {}; dataset delete $dhandle

The first example destroys all datasets defined in the current script and everything contained in them. The second example shows how to delete a dataset and preserve its contents by moving all dataset elements out prior to deletion.

dataset dget

dataset dget dhandle propertylist ?filterset? ?parameterdict?
d.dget(property=,?filters=?,?parameters=?)
Dataset.Dget(items,property=,?filters=?,?parameters=?)

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset dget is that the latter does not attempt computation of property data, but rather initializes the property values to the default and returns that default if the data is not yet available. For data already present, dataset get and dataset dget are equivalent.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

dataset dup

dataset dup dhandle ?targethandle? ?cleartarget?
d.dup(?target=?,?cleartarget=?)

If the optional arguments are not supplied, the dataset with all data attached to the dataset and all objects which are contained in it are duplicated. The command returns a new dataset handle for Tcl, or reference for Python. All duplicated objects in the new datasets also are assigned handles which can be obtained by commands such as dataset list $dhandle.

It is possible to specify a target dataset as an optional argument. In that case, no new dataset is created, and dataset-level property data on the source dataset is not copied. All objects in the source dataset are duplicated and appended to the end of the target dataset. In case the boolean target clearance flag is set, which is also the default if the parameter is omitted, the target dataset is cleared before the new objects from the source dataset are added. In this command variant, the return value of the command is the target dataset handle or reference.

Examples:

dataset dup $dhandle
dataset dup [list $eh1 $eh2] $dtarget 0
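The two calling modes of the duplication command, with and without a target, can be summarised in a plain-Python model in which lists stand in for datasets. This is a sketch of the described semantics only: dup_into is a hypothetical name, deep copies stand in for object duplication, and dataset-level property data is not modelled.

```python
import copy


def dup_into(source, target=None, cleartarget=True):
    """Duplicate all members of source, optionally into an existing target."""
    clones = copy.deepcopy(source)   # duplicates are independent objects
    if target is None:
        return clones                # no target: a new container is returned
    if cleartarget:
        target.clear()               # default: old target content is dropped
    target.extend(clones)            # duplicates go to the end of the target
    return target


source = [{"name": "ens0"}, {"name": "ens1"}]

fresh = dup_into(source)

kept = [{"name": "old"}]
dup_into(source, kept, cleartarget=False)

replaced = [{"name": "old"}]
dup_into(source, replaced)
```

With cleartarget switched off, as in the second Tcl example, the previous target content stays in front of the appended duplicates.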

dataset ens

dataset ens dhandle ?filterset? ?filtermode? ?recursive?
d.ens(?filters=?,?mode=?,?recursive=?)

Return a list of all the handles or references of the ensembles in the dataset. Other objects (reactions, tables, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.

If the optional boolean recursive argument is set, ensembles which are a component of a reaction in the dataset are also listed. Furthermore, if the dataset contains datasets as elements, these are recursively traversed, and ensembles in these, as well as ensembles in reactions in these datasets, are listed. If the output mode of the command is a handle list, items found by recursion are appended to the result list in a straight fashion, without the creation of nested lists. By default the recursion flag is off. Regardless of the flag value, ensembles which are associated with rows of a table in the dataset, but are not themselves dataset members, are not output.

Example:

set elist [dataset ens $dhandle astereogenic]

lists those ensembles in the dataset which have one or more atoms which are potential atom stereo centers.

set cnt [dataset ens $dhandle {} count 1]

returns a count of all ensembles which are either direct members of the dataset, or indirect members as component objects of reactions in the dataset, or which are contained in datasets which are themselves members of the primary dataset.
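The effect of the recursive flag can be sketched with nested Python dictionaries standing in for the object tree. The data layout, the handle strings and the function list_ens are all hypothetical, and the order in which recursively found items are appended is an assumption of this model.

```python
def list_ens(dataset, recursive=False):
    """Return a flat list of ensemble handles found in a dataset model."""
    found = []
    for obj in dataset["members"]:
        kind = obj["kind"]
        if kind == "ens":
            found.append(obj["handle"])
        elif recursive and kind == "reaction":
            # Ensembles that are components of a reaction.
            found.extend(obj["ens"])
        elif recursive and kind == "dataset":
            # Nested datasets are traversed; results stay in one flat list.
            found.extend(list_ens(obj, recursive=True))
        # Tables are skipped: ensembles tied to table rows are never listed.
    return found


dataset = {
    "kind": "dataset",
    "members": [
        {"kind": "ens", "handle": "ens0"},
        {"kind": "reaction", "handle": "reaction0", "ens": ["ens1", "ens2"]},
        {"kind": "dataset", "members": [{"kind": "ens", "handle": "ens3"}]},
        {"kind": "table", "handle": "table0"},
    ],
}

direct = list_ens(dataset)
everything = list_ens(dataset, recursive=True)
```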

dataset exists

dataset exists dhandle ?filterlist?
d.exists(?filters=?)
Dataset.Exists(dref,?filters=?)

Check whether a dataset handle or reference is valid. The command returns boolean 0 or 1. Optionally, the dataset may be filtered by a standard filter list, and if it does not pass the filter, it is reported as not valid. This command cannot be used with transient datasets.

Example:

dataset exists $dhandle

dataset expr

dataset expr dhandle expression
d.expr(expression)

Compute a standard SQL-style property expression for the dataset. This is explained in detail in the chapter on property expressions.

dataset extract

dataset extract dhandle propertylist ?filterset? ?filterprocs?
d.extract(property=,?filters=?,?filterfunctions=?)

This command is rather complex and closely related to the dataset xlabel command. It was designed for the efficient extraction of major or minor object data for filtered subsets of the dataset.

The property list parameter determines the property data which is extracted. Multiple properties may be specified, but they can only be associated with major objects and one arbitrary minor object class. So it is possible to simultaneously extract an ensemble and an atom property, but not an atom and a bond property.

The return value is a nested list of data items for every object which is encountered while traversing the dataset on the level of the minor object associated with the extraction property, or just ensembles or other major objects if no such property is selected. Every list element is itself a list which contains the extracted property values in the order they are named in the property list parameter.

The objects for which data is returned can further be filtered by a standard filter set, and additionally by a list of filter procedures (for Tcl, specified as procedure names) or functions (for Python, specified as function names or function references). These procedures or functions are called with the respective object handles/references and object labels as arguments. For example, a callback function used in an atom retrieval context would be called for each atom with its ensemble handle or reference and the atom label as arguments. If major objects without a label are checked, such as complete ensembles, 1 is passed as the label. The callback procedures are expected to return a boolean value. If it is false or 0, the object is not added to the returned list, and the other check procedures are no longer called.
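The callback protocol described above, including the early exit after the first rejecting callback, can be modelled in plain Python. This is a sketch and not toolkit code: the tuples stand in for atoms, and extract, only_first_ens and odd_labels are hypothetical names.

```python
def extract(atoms, callbacks):
    """atoms: iterable of (ens_handle, atom_label, value) tuples."""
    kept = []
    for handle, label, value in atoms:
        # all() stops at the first callback that returns a false value,
        # so later callbacks are not called for a rejected atom.
        if all(callback(handle, label) for callback in callbacks):
            kept.append(value)
    return kept


calls = []


def only_first_ens(handle, label):
    calls.append(("only_first_ens", handle, label))
    return handle == "ens0"


def odd_labels(handle, label):
    calls.append(("odd_labels", handle, label))
    return label % 2 == 1


atoms = [("ens0", 1, "C"), ("ens0", 2, "O"), ("ens1", 1, "C")]
symbols = extract(atoms, [only_first_ens, odd_labels])
```

The calls list records every invocation, which makes it visible that the second callback is never consulted for the atom of ens1.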

The command currently only works on ensembles in the dataset, ignoring any reactions, tables, datasets or networks which may be present.

Because this command is primarily intended for numerical data display, the returned values are formatted as with the nget command, i.e. instead of enumerated values the underlying numerical values are returned.

Example:

set dhandle [dataset create [ens create CO] [ens create CN]]
dataset extract $dhandle [list E_NAME A_SYMBOL] !hydrogen

This example first creates a dataset with methanol and methylamine. The second line performs the actual extraction and returns

{CH4O C} {CH4O O} {CH5N C} {CH5N N}

This kind of extracted data is useful for the display of filtered atomic (and other minor object) property values.

dataset filter

dataset filter dhandle filterset
d.filter(filters)

Check whether the dataset passes a filter list. The return value is boolean 1 for success and 0 for failure. Note that only filters operating on dataset objects are applicable, not any filter for objects contained in the dataset (such as ensembles or reactions).

dataset find

dataset find dhandle objecthandle
d.find(objectref)

Get the index of the dataset object. If it cannot be found in the dataset, the result is minus one.

dataset forget

dataset forget dhandle ?objectclass?
d.forget(?objectclass=?)

This command is essentially the same as the ens forget (or reaction forget, etc.) command. It is applied to all objects in the dataset.

If the object class is dataset, all dataset-level property data is deleted.

The command returns the dataset handle or reference, or, for Tcl only, an empty string if the dataset was transient.

dataset get

dataset get dhandle propertylist ?filterset? ?parameterdict?
dataset get dhandle attribute
d.get(property=,?filters=?,?parameters=?)
d.get(attribute)
d[property/attribute]
d.property/attribute
Dataset.Get(items,property=,?filters=?,?parameters=?)
Dataset.Get(items,attribute)

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

In addition to retrieving property data, it can also be used to query dataset attributes. The set of supported attributes is detailed in the paragraph on the dataset set command.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

Examples:

dataset get $dhandle {D_NAME D_SIZE}

yields the name and size of the dataset as a list. If the information is not yet available, an attempt is made to compute it. If the computation fails, an error results.

dataset get $dhandle [list E_FORMULA E_WEIGHT]

gives the formula and molecular weight of all dataset ensembles. The result is delivered as a nested list. The first list contains the formulas, the second list contains the weights.
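Because the nested result holds one list per property rather than one list per ensemble, it often needs to be regrouped before further use. The following plain-Python sketch shows one way to do that. The result value is hand-typed sample data for a dataset with methanol and methylamine, not captured toolkit output.

```python
# Shape of the nested result: one inner list per requested property.
result = [
    ["CH4O", "CH5N"],   # formulas, in dataset order
    [32.04, 31.06],     # molecular weights, in the same order
]

# Regroup into one (formula, weight) tuple per ensemble.
rows = list(zip(*result))

# Or build a lookup keyed by the first property.
weight_by_formula = dict(rows)
```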

Currently, it is not possible to use filters with this command (and the other retrieval command variants) which are not operating directly on the dataset object, but on objects lower in the hierarchy such as ensembles or atoms.

For the use of the optional property parameter list argument, refer to the documentation of the ens get command.

Variants of the dataset get command are dataset new, dataset dget, dataset jget, dataset jnew, dataset jshow, dataset nget, dataset show, dataset sqldget, dataset sqlget, dataset sqlnew, and dataset sqlshow.

dataset getparam

dataset getparam dhandle property ?key? ?default?
d.getparam(property=,?key=?,?default=?)

Retrieve a named computation parameter from valid property data. If the key is not present in the parameter list, an empty string is returned (None for Python). If the default argument is supplied, that value is returned in case the key is not found.

If the key parameter is omitted, a complete set of the parameters used for computation of the property value is returned in dictionary format.

This command does not attempt to compute property data. If the specified property is not present, an error results.
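The lookup rules of this command can be condensed into a small plain-Python model that follows the Python-side return conventions. The function getparam below is a hypothetical stand-in operating on an ordinary dictionary, the parameter names in the sample data are invented for illustration, and the exception type used for missing property data is an assumption.

```python
def getparam(params, key=None, default=None):
    """Model of the lookup rules; params is None if the property is absent."""
    if params is None:
        # No computation is attempted: missing property data is an error.
        raise LookupError("property data is not present")
    if key is None:
        return dict(params)            # complete parameter set as a dictionary
    return params.get(key, default)    # None unless a default was supplied


# Invented sample parameters for an image-type property.
image_params = {"format": "png", "width": 200}

fmt = getparam(image_params, "format")
missing = getparam(image_params, "height")
fallback = getparam(image_params, "height", 150)
everything = getparam(image_params)
```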

Example:

dataset getparam $dhandle E_GIF format

returns the actual format of the image, which could be GIF, PNG, or various bitmap formats.

dataset hadd

dataset hadd dhandle ?filterset? ?flags? ?changeset?
d.hadd(?filters=?,?flags=?,?changeset=?)

Add a standard set of hydrogens to all ensembles and reactions in the dataset. If the filterset parameter is specified, only those atoms which pass the filter set are processed.

Additional operation flags may be activated by setting the flags parameter to a list of flag names, or a numerical value representing the bit-ored values of the selected flags. By default, the flag set is empty, corresponding to the use of an empty string or none as parameter value. These flags are currently supported:

Adding hydrogens with this command is less destructive to the property data set of the ensembles or reactions than adding them with individual atom create/bond create commands, because many properties are defined to be indifferent to explicit hydrogen status changes, but are invalidated if the structure is changed in other ways.

If the effects of the hydrogen addition step to the validity of the property data set should not be handled with this standard procedure, it is possible to explicitly generate additional property invalidation events by specifying a list as the optional last parameter, for example a list of atom and bond to trigger both the atom change and bond change events.

The command returns the total number of hydrogens added to all ensembles and reactions in the dataset.

Example:

dataset hadd $dhandle

dataset hdup

dataset hdup dhandle ?targethandle? ?cleartarget?
d.hdup(?target=?,?cleartarget=?)

If the optional arguments are not supplied, the dataset with all data attached to the dataset and all objects which are contained in it are duplicated with hydrogen addition. The command returns a new dataset handle for Tcl, or reference for Python. All duplicated objects in the new datasets also are assigned handles which can be obtained by commands such as dataset list $dhandle.

It is possible to specify a target dataset as an optional argument. In that case, no new dataset is created, and dataset-level property data on the source dataset is not copied. All objects in the source dataset are duplicated with hydrogen addition and appended to the end of the target dataset. In case the boolean target clearance flag is set, which is also the default if the parameter is omitted, the target dataset is cleared before the new objects from the source dataset are added. In this command variant, the return value of the command is the target dataset handle or reference.

Examples:

dataset hdup $dhandle
dataset hdup [list $eh1 $eh2] $dtarget 0

dataset hierarchies

dataset hierarchies dhandle ?filterset? ?filtermode? ?recursive?
d.hierarchies(?filters=?,?mode=?,?recursive=?)

Return a list of all the handles or references of the hierarchies in the dataset. Other objects in the dataset (ensembles, reactions, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the result further modified by a standard filter mode argument.

If the recursive flag is set, and the dataset contains other datasets as objects, hierarchies in these nested datasets are also listed.

This is not the same as dataset hierarchy - the latter reports the hierarchy the dataset is a member of. This command lists the hierarchies in the dataset.

Example:

set n [dataset hierarchies $dhandle {} count]

dataset hierarchy

dataset hierarchy dhandle ?filterlist? ?root?
d.hierarchy(?filters=?,?root=?)

Return the hierarchy handle or reference of the hierarchy the dataset is part of. If the dataset is not a member of a hierarchy, or does not pass all of the optional filters, an empty string (or None for Python ) is returned. By default, the hierarchy object which directly contains the dataset is returned. If the root flag is set, the root hierarchy object is reported instead, which is the same only if the hierarchy has only a single level.

This command is not the same as dataset hierarchies , which reports hierarchies in the dataset.

Example:

dataset hierarchy $dhandle

dataset hread

dataset hread dhandle ?datasethandle|enshandle? ?#recs|batch|all?
d.hread(?target=?,?limit=?)

This command provides the same functionality as dataset read , but additionally adds a standard set of hydrogen atoms to the read duplicate objects.

The command arguments are explained in the section on dataset read .

dataset hstrip

dataset hstrip dhandle ?flags? ?changeset?
d.hstrip(?flags=?,?changeset=?)

This command removes hydrogens from the dataset ensembles and reactions. By default, all hydrogen atoms in the dataset ensembles or reactions are removed.

The flags parameter can be used to make the operation more selective. It may be a list of the following flags:

If the flags parameter is an empty string, or none , it is ignored. The default flag value is wedgetransfer , but this default is overridden as soon as any flags are set.

If the changeset parameter is given, all property change events listed in the parameter are triggered.

Hydrogen stripping is not as disruptive to the ensemble or reaction data content as normal atom deletion. The system assumes that this operation is done as part of some file output or visualization preparation. However, if any new data is computed after stripping, the computation functions see the stripped structure, and proceed to work on that reduced structure without knowledge that there are implicit hydrogens.

Example:

dataset hstrip $dhandle [list keeporiginal wedgetransfer]

dataset index

dataset index dhandle
dataset index dhandle position
d.index(?position=?)

This command comes in two variants. The three-word version is the generic command to check dataset membership, which is the same for all objects which can be dataset members. The second version is specific to dataset objects and retrieves object references from this dataset.

This first version gets the position of the dataset in the object list of its parent dataset. If the dataset is not part of a parent dataset, -1 is returned. This is the generic dataset membership test command variant.

This second command variant obtains the object handle or reference of the object at the specified position in this dataset. Position counting begins with zero. If the index is outside the object position range, an empty string is returned. The special value end may be used to address the last object. The indexed object remains in the dataset.

Note that this index command is not equivalent to the standard index command on minor objects which is used to obtain the position of the minor object in the minor object list of the controlling major object. This kind of functionality is not needed for major objects, because they are not contained in any minor object list.

Example:

dataset index $dhandle end
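
The position rules of the second command variant can be illustrated with a small plain-Python model. This is only a sketch and does not use the toolkit: the list of strings stands in for the dataset object list, the function name index_at is hypothetical, and treating negative numbers as out of range is an assumption of this sketch.

```python
def index_at(objects, position):
    """Model of position lookup: zero-based, 'end' addresses the last object."""
    if position == "end":
        return objects[-1] if objects else ""
    # Out-of-range positions yield an empty string instead of an error.
    # Assumption of this sketch: negative numbers count as out of range.
    if not isinstance(position, int) or position < 0 or position >= len(objects):
        return ""
    return objects[position]


members = ["ens0", "ens1", "reaction0"]  # stand-in handles, not real objects
print(index_at(members, 0))
print(index_at(members, "end"))
print(repr(index_at(members, 7)))
```

As in the real command, the lookup does not remove anything: the members list is unchanged after the calls.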

dataset intersect

dataset intersect dhandle1 dhandle2 ?property?...
d.intersect(dref2,?pref?...)

Perform an intersection check between two datasets. The result is a list of zero-based dataset index pairs (as in dataset index ) of all identical corresponding dataset entries in both datasets, as judged by the value of the comparison property. The default comparison property is E_ISOTOPE_STEREO_HASH for full structural identity check of ensembles.

If the first dataset contains duplicates, the index of the matching second dataset element is identical for all of them. If the second dataset also contains corresponding duplicates, a (pseudo-)random element from among these duplicates is reported as the match, and the other duplicates in the second dataset are reported as not matched in the dataset intersect3 command variant (see below).

The comparison property object class must match the class of the compared dataset objects (i.e. the default property is only suitable for comparison of ensembles in the datasets, but not for reactions, etc.). Objects of mismatching classes in the datasets are ignored.

Example:

set dh1 [dataset create CC CCC CCCC]
set dh2 [dataset create CCC CCCC CCCCC]
dataset intersect $dh1 $dh2

The result is {1 0} {2 1} , meaning the second (if we start counting with 1) element of the first dataset corresponds to the first element in the second, and the third element to the second.

dataset intersect3

dataset intersect3 dhandle1 dhandle2 ?property?...
d.intersect3(dref2,?pref?...)

This command is an extended variant of dataset intersect . The return value is a three-element list consisting of a simple list of the element indices in the first dataset which are not matched, the match pair list of the equivalent elements as in dataset intersect , and a simple list containing the element indices of the second dataset which are not matched.

Example:

set dh1 [dataset create CC CCC CCCC]
set dh2 [dataset create CCC CCCC CCCCC]
dataset intersect3 $dh1 $dh2

The result is 0 {{1 0} {2 1}} 2 . The middle element of the result list is the same as in the example for the dataset intersect command. The first element indicates that the first (starting the count with 1) element of the first dataset was not matched, and the third element indicates that the third element of the second dataset was not matched.
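
The structure of the three-part result can be reproduced with a short plain-Python model. It is a sketch under stated assumptions and not the toolkit implementation: plain strings stand in for the values of the comparison property, the function name intersect3 is reused only for readability, and where the second list contains duplicates the model always picks the first one, whereas the command is documented to pick a (pseudo-)random one.

```python
def intersect3(first, second):
    """Return (unmatched_first, match_pairs, unmatched_second) as index lists."""
    # Map each comparison value to the first index where it occurs in `second`.
    # Assumption: the first duplicate wins (the command picks pseudo-randomly).
    lookup = {}
    for j, value in enumerate(second):
        lookup.setdefault(value, j)

    pairs = []
    unmatched_first = []
    for i, value in enumerate(first):
        if value in lookup:
            pairs.append((i, lookup[value]))
        else:
            unmatched_first.append(i)

    # Every index of `second` that never appears in a pair is unmatched.
    matched_second = {j for _, j in pairs}
    unmatched_second = [j for j in range(len(second)) if j not in matched_second]
    return unmatched_first, pairs, unmatched_second


dh1 = ["CC", "CCC", "CCCC"]
dh2 = ["CCC", "CCCC", "CCCCC"]
print(intersect3(dh1, dh2))  # ([0], [(1, 0), (2, 1)], [2])
```

The middle element alone corresponds to the result of dataset intersect for the same input.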

dataset jget

dataset jget dhandle propertylist ?filterset? ?parameterdict?
d.jget(property=,?filters=?,?parameters=?)
Dataset.Jget(items,property=,?filters=?,?parameters=?)

This is a variant of dataset get which returns the result data as a JSON formatted string instead of Tcl or Python interpreter objects. The command is usable only for property data, not attribute retrieval.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

dataset jointhreads

dataset jointhreads ?all?
dataset jointhreads dhandle ?all?
dataset jointhreads dhandle threadid...
Dataset.Jointhreads()
d.jointhreads("all")
d.jointhreads()
d.jointhreads(?threadid?,...)

This is an alias for the dataset cancelthreads command. Please refer to its documentation.

dataset jnew

dataset jnew dhandle propertylist ?filterset? ?parameterdict?
d.jnew(property=,?filters=?,?parameters=?)
Dataset.Jnew(items,property=,?filters=?,?parameters=?)

This is a variant of dataset new which returns the result data as a JSON formatted string instead of Tcl or Python interpreter objects.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

dataset jshow

dataset jshow dhandle propertylist ?filterset? ?parameterdict?
d.jshow(property=,?filters=?,?parameters=?)
Dataset.Jshow(items,property=,?filters=?,?parameters=?)

This is a variant of dataset show which returns the result data as a JSON formatted string instead of Tcl or Python interpreter objects.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

dataset ldup

dataset ldup ?dhandlelist?...
Dataset.Ldup(?dref/drefsequence?,...)

Duplicate all datasets in the handle list(s) in default mode.

The return value is a single list (even if multiple source lists are used) of the duplicated dataset handles or references. If an argument list element is an empty string (or None for Python ), it indicates a missing object, and the output list also receives an empty string element (for Tcl ) or None (for Python ) at its position, without raising an error.

This command cannot be used with transient datasets.

dataset lhdup

dataset lhdup ?dhandlelist?...
Dataset.Lhdup(?dref/drefsequence?,...)

Duplicate all datasets in the handle list(s) in default mode, and add hydrogens.

The return value is a single list (even if multiple source lists are used) of the duplicated dataset handles or references. If an argument list element is an empty string (or None for Python ), it indicates a missing object, and the output list also receives an empty string element (for Tcl ) or None (for Python ) at its position, without raising an error.

This command cannot be used with transient datasets.

dataset list

dataset list ?dhandle?
Dataset.List(?filters=?)
d.list()

Without a handle argument (for Tcl ), or called as the class method (for Python ) the command returns a list of the handles of all existing datasets.

If (in Tcl ) a dataset handle or transient dataset is passed as third argument, or the object method is used (for Python ) the command returns a list of all major objects in the dataset. In the Tcl case, this function is different from the behavior of the list subcommand for other major object classes, where the optional argument is a filter list. In Python , the filter list variant is supported.

Examples:

dataset list
dataset list $dhandle

dataset lock

dataset lock dhandle propertylist/dataset/all ?compute?
d.lock(property=,?compute=?)

Lock property data of the dataset handle, meaning that it is no longer subject to the standard data consistency manager control. The data consistency manager deletes specific property data if anything is done to the dataset handle which would invalidate the information. Property data remains locked until it is explicitly unlocked.

The property data to lock can be selected by providing a list of the following identifiers:

A lock can be released by a dataset unlock command.

This command does not recurse into the objects contained in the dataset.

The return value is the dataset handle (for Tcl ) or reference (for Python ) or, if the dataset was transient, an empty string (for Tcl only).

dataset loop

dataset loop dhandle objvar ?maxrec? ?offset? body
d.loop(function=,?maxloop=?,?offset=?,?variable=?)
for obj in d:

Loop over the elements in a dataset. This command is similar to molfile loop . On each iteration, the variable is set to the handle of the current member object, and then the body code is executed. The variable refers to the original dataset element, not a duplicate. This is different from dataset read.

All operations on the current loop item are allowed, including deletion. However, the next object after the current item must not be deleted or moved, because it is needed for the iteration process.

If a maximum record count is set, the loop terminates after the specified number of iterations. If the maximum record argument is set to an empty string, a negative value, or all , the loop covers all dataset elements. This is also the default.

For Tcl scripts, within the loop, the standard Tcl break and continue commands work as expected. If the body script generates an error, the loop is exited.

If no offset is specified, the loop starts at the first element. Within the loop body, the dataset attribute record is continuously updated to indicate the current loop position. Its value starts with one, like file records in the molfile loop command.

The Python version of the loop method intentionally has a different argument sequence for convenience. The function argument may either be a multi-line string (similar to the Tcl construct), or a function reference. Functions are called with the reference of the current loop object as single argument, and have their own context frame, so that the specification of a reference variable is not generally useful in that call style, though it is allowed. For string function blocks the code is executed in the local call frame, and the variable with the current object reference is visible locally. Script code blocks must be written with an initial indentation level of zero. Within the Python functions, the normal break and continue commands cannot be used due to scope limitations. Instead, the custom exceptions BreakLoop and ContinueLoop can be raised. These are automatically caught and processed in the loop body handler code.

In Python , there is also an object iterator so that simple loops over dataset elements can be written with a for statement. The dataset object iterator is of the self style (i.e. there is one per dataset, these are not independent objects), so nesting them is not possible on the same dataset.

Python object loop constructs and their peculiarities are discussed in more detail in the general chapter on Python scripting.

Example:

dataset loop $dh eh {
	puts "[ens get $eh E_NAME] at position [ens index $eh]"
}
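
The exception-based loop control described for Python function callbacks can be sketched in plain Python. The following model does not use the toolkit: the two exception classes are local stand-ins that merely share the names BreakLoop and ContinueLoop, the helper function loop is hypothetical, and interpreting offset as a zero-based start index is an assumption of this sketch.

```python
class BreakLoop(Exception):
    """Local stand-in for the toolkit exception of the same name."""


class ContinueLoop(Exception):
    """Local stand-in for the toolkit exception of the same name."""


def loop(items, function, maxloop=None, offset=0):
    """Call function(item) per element; return the number of iterations started."""
    count = 0
    for item in items[offset:]:
        # A missing or negative maximum means that all elements are visited.
        if maxloop is not None and maxloop >= 0 and count >= maxloop:
            break
        count += 1
        try:
            function(item)
        except ContinueLoop:
            continue  # the callback skipped the rest of this iteration
        except BreakLoop:
            break     # the callback asked to leave the loop early
    return count


seen = []


def visit(name):
    if name == "skip":
        raise ContinueLoop
    if name == "stop":
        raise BreakLoop
    seen.append(name)


loop(["a", "skip", "b", "stop", "c"], visit)
print(seen)  # ['a', 'b']
```

Because the callback runs in its own frame, a plain break or continue statement inside visit would be a syntax error, which is why the control flow is signalled through exceptions that the loop handler catches.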

dataset match

dataset match dhandle ss_ehandle ?matchflags? ?ignoreflags? 
d.match(substructure=,?matchflags=?,?ignoreflags=?)

Perform a substructure match on all eligible objects in the dataset. The return value is the match count.

The arguments are the same as with ens match . The specification of variables to capture match locations is not possible in this command variant.

dataset max

dataset max dhandle propertylist ?filterset?
d.max(property=,?filters=?)

Get the maximum value of one or more properties from the elements in the dataset. The property argument may be any property attached to dataset members, or minor objects thereof. If the filterset argument is specified, the maximum value is searched only for objects which pass the filter set.

Examples:

dataset max $dhandle E_WEIGHT
dataset max [list $ehandle1 $ehandle2] A_SIGMA_CHARGE carbon

The first example finds the highest molecular weight in the dataset. The second example finds the largest (most positive) Gasteiger partial charge on any carbon atom in the two argument ensembles, which form a transient dataset.

dataset metadata

dataset metadata dhandle property ?field ?value??
d.metadata(property=,?field=?,?value=?)

Obtain property metadata information, or set it. The handling of property metadata is explained in more detail in its own introductory section. The related commands dataset setparam and dataset getparam can be used for direct manipulation of specific keys in the computation parameter field. Metadata can only be read from or set on valid property data.

Valid field names are bounds , comment , info , flags , parameters and unit .

Examples:

array set gifparams [dataset metadata $dhandle D_GIF parameters]
dataset metadata $dhandle D_QUALITY comment "This value looks suspicious to me"

The first line retrieves the computation parameters of the property D_GIF as keyword/value pairs. These are read into the array variable gifparams , and may subsequently be accessed as $gifparams(format) , $gifparams(height) , etc. The second example shows how to attach a comment to a property value.

dataset min

dataset min dhandle propertylist ?filterset?
d.min(property=,?filters=?)

Get the minimum value of one or more properties from the elements in the dataset. The property argument may be any property attached to dataset sub-elements, or minor objects thereof. If the filterset argument is specified, the minimum value is searched only for objects which pass the filter set.

Examples:

dataset min $dhandle E_WEIGHT
dataset min [list $ehandle1 $ehandle2] A_SIGMA_CHARGE carbon

The first example finds the smallest molecular weight in the dataset. The second example finds the smallest (most negative, or smallest positive) Gasteiger partial charge on any carbon atom in the two argument ensembles, which form a transient dataset.

dataset molfile

dataset molfile dhandle ?filterset?
d.molfile(?filters=?)

Return the handle or reference of the molfile object associated with the dataset as backing page file. If no such file object exists, an empty string (for Tcl ) or None (for Python ) is returned.

Example:

set fh [dataset molfile $dh]
set fh [dataset get $dh pagefile]

The two commands are equivalent.

dataset move

dataset move dhandle datasethandle|remotehandle ?position?
d.move(target=,?position=?)

Move, depending on the acceptance flags of the destination dataset, either the objects in the dataset or transient dataset into another local or remote dataset, or move the dataset itself. If the destination dataset handle is an empty string (or None for Python ), the dataset objects are removed from the original dataset, but not moved into any other dataset. If the destination dataset accepts datasets as members, which is not the default (see the accept attribute in the section on dataset set ) the dataset is directly moved as object. Otherwise, its contained objects are moved, under preservation of the object order from the source dataset, and the source dataset is emptied, but not deleted.

Optionally, a position in the new dataset for the first moved object may be specified. This parameter is either an index (beginning with 0), or end , which is the default. If the contents of a dataset are spliced into another at a specific position, objects after the first element of the source dataset follow as a block.

Another special position value is random or rnd . This value moves the dataset, or the dataset contents, to a random position in the target dataset. Use of this mode with remote datasets is currently not supported.

In case of a transient command dataset the original dataset memberships of the dataset objects are not restored when the command completes.

The return value of the command is the dataset of the ensemble prior to the move operation. It is either a dataset handle/reference, or an empty string ( Tcl ) or None ( Python ) if it was not a member of a dataset.

A dataset cannot be moved into itself.

Examples:

dataset move $dhandle $dhandle2 0
dataset move $dhandle {}
dataset move [ens list] [dataset create]

The first line moves all objects in the source dataset into the first (and following) positions in the destination dataset. The second example removes all elements from the dataset. This is often useful in order to avoid dataset member destruction with the dataset delete command. The final example shows how to move a set of ensembles (here: all ensembles currently defined in the application) into a newly created dataset via an intermediate, transient dataset.

dataset move $dhandle vioxx@server55:10001

This command moves all objects in the first dataset to the remote dataset on host server55 , which listens on port 10001 and requires the pass phrase vioxx for access.
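
The splice behavior for the default case, where the destination does not accept datasets as members, can be modelled with plain Python lists. This is a sketch, not toolkit code: the lists stand in for dataset object lists, the function name move_contents is hypothetical, and clamping a too-large numeric position to the end is an assumption.

```python
def move_contents(source, target, position="end"):
    """Splice all objects of source into target as a block; empty the source."""
    if source is target:
        raise ValueError("a dataset cannot be moved into itself")
    # Assumption of this sketch: positions beyond the end behave like 'end'.
    if position == "end" or position >= len(target):
        position = len(target)
    # Objects after the first follow as a block, in their original order.
    target[position:position] = source
    source.clear()  # the source is emptied, but it continues to exist


src = ["e1", "e2"]
dst = ["a", "b", "c"]
move_contents(src, dst, 1)
print(dst)  # ['a', 'e1', 'e2', 'b', 'c']
print(src)  # []
```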

dataset mutex

dataset mutex dhandle mode
d.mutex(mode)

Manipulate the object mutex.

During the execution of a script command, the mutex of the major object(s) associated with the command are automatically locked and unlocked, so that the operation of the command is thread-safe. This applies to toolkit builds that support multi-threading, either by allowing multiple parallel script interpreters in separate threads or by supporting helper threads for the acceleration of command execution or background information processing.

Going beyond this automatic per-statement protection, this command locks major objects for a period of time that exceeds a single command. A lock on the object can only be released from the same interpreter thread that set the lock. Any other threaded interpreters, or auxiliary threads, block until a mutex release command has been executed when accessing a locked command object. This command supports the following modes:

There is no trylock command variant because the command already needs to be able to acquire a transient object mutex lock for its execution.

The command returns the current lock count.

dataset need

dataset need dhandle propertylist ?mode? ?parameterdict?
d.need(property=,?mode=?,?parameters=?)

Standard command for the computation of property data, without immediate retrieval of results. In the common case of threaded computation, this starts a compute thread whose results or error status can be collected later. This command is explained in more detail in the section about retrieving property data.

If the dataset is not transient, the return value is the original dataset handle or reference.

Example:

dataset need $dhandle D_GIF recalc

dataset networks

dataset networks dhandle ?filterset? ?filtermode? ?recursive?
d.networks(?filters=?,?mode=?,?recursive=?)

Return a list of the handles or references of all the networks in the dataset. Other objects (ensembles, reactions, datasets, tables) are ignored. The object list may optionally be filtered by the filter list, and the result further modified by a standard filter mode argument.

If the recursive flag is set, and the dataset contains other datasets as objects, networks in these nested datasets are also listed.

Example:

set n [dataset networks $dhandle {} count]

dataset new

dataset new dhandle propertylist ?filterset? ?parameterdict?
d.new(property=,?filters=?,?parameters=?)
Dataset.New(items,property=,?filters=?,?parameters=?)

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset new is that the latter forces the re-computation of the property data, regardless whether it is present and valid, or not.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

dataset nget

dataset nget dhandle propertylist ?filterset? ?parameterdict?
d.nget(property=,?filters=?,?parameters=?)
Dataset.Nget(items,property=,?filters=?,?parameters=?)

Standard data manipulation command for reading object data and attributes. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset nget is that the latter always returns numeric data, even if symbolic names for the values are available.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

dataset nnew

dataset nnew dhandle propertylist ?filterset? ?parameterdict?
d.nnew(property=,?filters=?,?parameters=?)
Dataset.Nnew(items,property=,?filters=?,?parameters=?)

Standard data manipulation command for reading object data and attributes. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset nnew is that the latter always returns numeric data, even if symbolic names for the values are available, and that property data re-computation is enforced.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

dataset nitrostyle

dataset nitrostyle dhandle style
d.nitrostyle(style=)

Change the internal encoding of nitro groups and similar functional groups in the ensembles and reactions in the dataset. Possible values for the style parameter are:

The command returns the dataset handle or reference.

dataset objects

dataset objects dhandle ?pattern?
d.objects(?pattern=?)

This is a non-standard cross-referencing command. The result is a list of all the objects in the dataset, where each result list element is a list or tuple consisting of the object type (ens, reaction, table, network, dataset), and the object handle or reference. Optionally, the dataset objects may be filtered by the pattern argument which applies to the object handle.

Example:

dataset objects $dhandle ens*

is roughly equivalent to

dataset ens $dhandle

except that the latter only lists the ensemble handles, not pairs of object class name and handle.

dataset pack

dataset pack dhandle ?maxsize? ?requestprops? ?suppressedprops? ?compressionlib?
d.pack(?maxsize=?,?requestprops=?,?suppressedprops=?,?compressionlib=?)

Pack the dataset and all objects it contains into a base-64 encoded, compressed string as a serialized object. The string does not contain any non-printing characters, quotation marks or other problematic characters and is thus well suited for storage in database tables and similar applications. These packed strings are portable and platform-independent.

The maximum size of the object string (default -1, meaning unlimited) can be configured by the optional maxsize parameter. The size is specified in bytes. If the pack string would be longer than the maximum size, an error results.

The two optional parameter lists make it possible to request a specific property set to be part of the package, even if it normally would not be included, and to explicitly omit properties from the dump. No property computation is performed, and suppressed properties are not purged from the source ensemble.

The default compression library is zlib . Other useful variants include lzo and gzip (and there are other internal types), but these may not be available on all builds due to license issues, and you need to specify the compression library when a dataset is unpacked. It is generally recommended to stay with zlib .

The return value of this command is the packed string.

In Python , datasets support the standard pickle / unpickle protocol.

Example:

dataset pack $dhandle
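
The general idea of a compressed, base-64 encoded object string with an optional size limit can be illustrated with the Python standard library. The sketch below is not the toolkit's serialization format, which is not described here: JSON is used as an arbitrary stand-in payload encoding, and the function names pack and unpack are hypothetical helpers.

```python
import base64
import json
import zlib


def pack(obj, maxsize=-1):
    """Serialize obj to a printable string: JSON, then zlib, then base-64."""
    raw = json.dumps(obj).encode("utf-8")
    packed = base64.b64encode(zlib.compress(raw)).decode("ascii")
    # A negative maxsize means unlimited; otherwise oversize strings are errors.
    if maxsize >= 0 and len(packed) > maxsize:
        raise ValueError("packed string exceeds maxsize")
    return packed


def unpack(packed):
    """Reverse of pack(); the same compression library must be used."""
    return json.loads(zlib.decompress(base64.b64decode(packed)))


data = {"name": "demo", "members": ["CC", "CCC"]}
s = pack(data)
print(unpack(s) == data)  # True
```

The resulting string only contains characters from the base-64 alphabet, so it has no quotation marks or non-printing characters, which is the property that makes such strings convenient for database storage.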

dataset pop

dataset pop dhandle|remotehandle ?position? ?timeout?
d.pop(?position=?,?timeout=?)
Dataset.Pop(dref/remotehandle,?position=?,?timeout=?)

Remove an object from a dataset. The handle or reference of the selected object is returned, and the object is no longer a member of the dataset when the command completes. If a timeout is specified, it is transferred to the dataset attribute of the same name before the command is executed, as with a dataset set command.

By default the first object in the dataset, at index zero, is returned. A different object can be selected by means of the optional position argument. It can be a numerical index, end for the last object, or rnd / random for a random selection. If the object index is larger than the maximum index of any object, it is silently rewritten to end . Random pops are not supported on remote datasets.

This command works with remote datasets. In that case, the object is transferred via an intermediate serialized object representation over the network. It is unpacked on the local interpreter, and deleted on the remote interpreter.

If the desired dataset object cannot be found, and a timeout is set, including a negative value for an unlimited wait time, the command suspends execution until the object appears in the dataset, for example from a different script thread or as result of a remote object insertion. If a wait would be executed, but the eod/targeteod parameter pair of the dataset indicate that no further data can be expected, the command returns an empty string (for Tcl ) or None (for Python ) instead of the object handle or reference, but does not trigger an error. Otherwise, if the object cannot be delivered immediately or after the timeout, an error results.

Example:

set eh [dataset pop $dhandle end]
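
The position handling of a local pop can be modelled with a plain Python list. This is a sketch under assumptions and not toolkit code: the function name pop is a hypothetical helper, the list stands in for the dataset object list, and neither timeouts nor remote datasets are modelled, so an empty list simply raises an error.

```python
import random


def pop(objects, position=0):
    """Remove and return one object; position is an index, 'end', 'rnd' or 'random'."""
    if not objects:
        raise LookupError("no object available")  # no timeout handling in this model
    if position in ("rnd", "random"):
        position = random.randrange(len(objects))
    elif position == "end" or position > len(objects) - 1:
        position = len(objects) - 1  # too-large indices are rewritten to end
    return objects.pop(position)


members = ["ens0", "ens1", "ens2"]
print(pop(members))      # ens0, the first object by default
print(pop(members, 99))  # ens2, an index beyond the last object acts like 'end'
print(members)           # ['ens1']
```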

dataset properties

dataset properties dhandle ?pattern? ?intersectionmode?
d.properties(?pattern=?,?intersectionmode=?)

Get a list of valid properties of the dataset proper and the dataset objects. By default, both dataset properties (prefix D_ ) as well as the properties of the objects in the dataset (prefix E_ for ensembles, X_ for reactions, T_ for tables, N_ for networks, D_ for datasets as members) and the properties of their minor objects (atoms, bonds, etc.) are listed. Property subsets may be selected by specifying a string filter pattern. In case of dataset element properties which are not present in all dataset members, the default intersect mode is union, meaning that all properties are reported for which at least a single instance in any member exists. The alternative mode intersect only lists those dataset member properties which are present on all dataset members.

This command may also be invoked as dataset props or d.props() .

Example:

dataset properties $dhandle D_*
dataset props $dhandle E_* intersect

The first example returns a list of the currently valid dataset-level properties. The second example lists ensemble properties which are present in all dataset objects.
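
The difference between the union and intersect modes can be shown with Python sets. The sketch below is a plain-Python model, not toolkit code: each set stands in for the valid property names on one dataset member, the function name member_properties is hypothetical, glob-style pattern matching is an assumption based on the D_* and E_* examples, and the result is sorted only to make the output deterministic.

```python
from fnmatch import fnmatchcase


def member_properties(members, pattern="*", mode="union"):
    """members: list of sets of property names, one set per dataset member."""
    if not members:
        return []
    if mode == "union":
        # Report a property if at least one member carries it.
        names = set().union(*members)
    elif mode == "intersect":
        # Report a property only if every member carries it.
        names = set.intersection(*members)
    else:
        raise ValueError("mode must be 'union' or 'intersect'")
    return sorted(n for n in names if fnmatchcase(n, pattern))


members = [{"E_NAME", "E_WEIGHT"}, {"E_NAME", "E_IDENT"}]
print(member_properties(members, "E_*"))               # ['E_IDENT', 'E_NAME', 'E_WEIGHT']
print(member_properties(members, "E_*", "intersect"))  # ['E_NAME']
```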

dataset purge

dataset purge dhandle propertylist ?emptyonly?
d.purge(?properties=?,?emptyonly=?)

Delete property data from the dataset. The properties may be both dataset properties (prefix D_ ) or properties of the dataset members, such as ensemble or atom properties. If a property marked for deletion is not present on an object, it is silently ignored.

If an object class name, such as ens or atom , is used instead of a property name, all properties of that class set on the objects in the dataset are deleted, if they are not locked, or filtered out by the optional empty-only flag.

Besides normal property names, a few convenient alias names for common property deletion tasks of ensembles in a dataset, or the reaction ensembles of reactions in the dataset, are defined and can be used as a replacement for the property list. These include:

The optional boolean flag emptyonly restricts the deletion to those properties where all the values for a property associated with a major object (such as on all atoms in an ensemble for atom properties, or just the single ensemble property value for ensemble properties) are set to the default property value.

The return value is the original dataset handle or reference.

Examples:

dataset purge $dhandle D_GIF
dataset purge [ens list] E_IDENT 1
dataset purge $dhandle stereochemistry

The first example deletes the property data D_GIF for the selected dataset if it is present. The second example deletes property E_IDENT from all ensembles in the current application if their property value is equal to the default value of E_IDENT . The third example removes stereochemistry from all dataset ensembles.

dataset reactions

dataset reactions dhandle ?filterset? ?filtermode? ?recursive?
d.reactions(?filters=?,?mode=?,?recursive=?)

Return a list of all the handles or references of the reactions in the dataset. Other objects (ensembles, tables, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.

If the optional boolean recursive argument is set, reactions of which ensembles in the dataset are a component are also listed. Furthermore, if the dataset contains datasets as elements, these are recursively traversed, and reactions in these, as well as reactions as components of ensembles in these datasets, are listed. If the output mode of the command is a handle list, items found by recursion are appended in a straight fashion, without the creation of nested lists. By default the recursion flag is off. Regardless of the flag value, reactions which are associated with rows of a table in the dataset, but are not themselves dataset members, are not output.

Example:

set xlist [dataset reactions $dhandle]

Return a list of the handles of the reactions in the dataset.

set cnt [dataset reactions $dhandle {} count 1]

returns a count of all reactions which are either directly members of the dataset, or indirectly because ensembles in the dataset are part of a reaction, or which are contained in datasets which are themselves members of the primary dataset.

dataset read

dataset read dhandle ?datasethandle/enshandle? ?#recs|batch|all?
d.read(?target=?,?limit=?)

This command returns handles or references of duplicates of one or more objects from the current dataset iterator position ( record attribute). Its arguments mimic those of the molfile read command. The iterator record attribute is automatically incremented. When the end of the dataset is reached, an empty result is returned, but no error is raised.

The return value is usually the handle or reference of the object duplicated from the dataset member at the current read position. If an optional target dataset has been specified, the object is appended to that dataset, and the return value is the target dataset handle. It is also possible to use the magic dataset handles new or #auto, which create a new receptor dataset.

If instead of a target dataset an existing target ensemble is specified, the recipient ensemble is cleared, and the read dataset object placed into its hull without changing its handle. This requires that the read object is an ensemble, and not a reaction, table, dataset or network, and that only a single item is read. It is also possible to use an empty argument to skip these options.

By default, a single object is duplicated and the iterator record attribute of the dataset incremented by one. With the optional third argument, a different number of objects can be selected for reading as a block. The special value all reads all remaining objects, and batch copies a number of objects corresponding to the batchsize dataset attribute. If there are insufficient objects in the dataset to read all requested records, only the available set is returned, and no error results.

The dataset contents are not changed by this command. All extracted items are object duplicates. In order to fetch original objects from the dataset, use the dataset pop command, or the various object move commands.

The command variant dataset hread provides the same functionality as this command, but additionally adds a standard set of hydrogen atoms to the duplicates.
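The iterator behavior described above can be summarized in a short pure-Python model. It is a sketch under stated assumptions, not toolkit code: the class DatasetModel and its fields are hypothetical, and for simplicity the model always returns a list, whereas the real command returns a single handle when a single object is read.

```python
import copy


class DatasetModel:
    """Toy stand-in for a dataset: a member list plus a 1-based record cursor."""

    def __init__(self, items, batchsize=2):
        self.items = list(items)
        self.record = 1            # iterator position, 1-based
        self.batchsize = batchsize

    def read(self, limit=1):
        """Return duplicates of up to `limit` members, starting at the cursor."""
        remaining = len(self.items) - (self.record - 1)
        if limit == "all":
            count = remaining
        elif limit == "batch":
            count = self.batchsize
        else:
            count = int(limit)
        # A short read is not an error: only the available members are returned.
        count = max(0, min(count, remaining))
        start = self.record - 1
        chunk = [copy.deepcopy(x) for x in self.items[start:start + count]]
        self.record += count
        return chunk

    def rewind(self):
        """Reset the cursor to the first record."""
        self.record = 1


ds = DatasetModel([{"name": "a"}, {"name": "b"}, {"name": "c"}])
first = ds.read()        # one duplicate, cursor moves to record 2
rest = ds.read("all")    # the two remaining members
at_end = ds.read()       # empty result, no exception
```

The member list of the model is never modified, which mirrors the statement that all extracted items are duplicates.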

dataset ref

Dataset.Ref(identifier)

Python only method to get a dataset reference from a handle or another identifier. For datasets, other recognized identifiers are dataset references, integers encoding the numeric part of the handle string, the dataset UUID or name, or a table handle (which returns the dataset embedded in the table).

dataset remove

dataset remove dhandle ?handle?...
d.remove(?handle/ref?,...)

Remove objects from a dataset. The removal objects must be in the dataset.

If the dataset is not virtual, the command returns the dataset handle or reference.

dataset rename

dataset rename dhandle srcproperty dstproperty
d.rename(srcproperty=,dstproperty=)

This is a variant of the dataset assign command. Please refer to the command description in that paragraph.

dataset request

dataset request dhandle propertylist ?reload? ?modelist?

Request property data for a dataset when the dataset is not maintained locally, but is a partial shadow copy of a remotely managed dataset. It is assumed to have been only partially transferred via RPC to a slave from a master controller application, for example for display purposes, but without the full data content, which resides on the master.

If the requested property data is already present on the slave, and the reload flag is not set, this command is equivalent to a dataset need command and does not invoke communication with the master. Otherwise, the master is asked to provide the information, which may be calculated on the master only after receiving the request, or even delegated by the master to another remote server for computation.

Once the requested data has been received by the slave, it is added to the property data set of the local dataset copy. The optional modelist parameter is the same as in the dataset need command. This command is used to guarantee that critical or non-computable property data is obtained from the master. Local, unsynchronized data may still be computed by the slave using standard property data access commands. It is currently not possible to send data back to the master.

This command is only available on toolkit versions which have been compiled with RPC support.

In the absence of errors, the command returns a boolean status code. If it is zero, the request failed in a non-critical way. This for example happens in case the dataset is not under control of a remote application.

Example:

if {![dataset request $dhandle A_XY]} {
	dataset need $dhandle A_XY
}

is a bulletproof method of guaranteeing that correct atomic 2D display coordinates are present for the dataset structures even if the script is run in a master/slave context.

This command is not supported in the Python interface.

dataset rewind

dataset rewind dhandle
d.rewind()

Reset the dataset iterator record. This is equivalent to setting the record attribute to one.

dataset scan

dataset scan dhandle expression/queryhandle ?mode? ?parameterdict?
d.scan(query=,?resultmode=?,?parameters=?)
Dataset.Scan(items,query=,?resultmode=?,?parameters=?)

Perform a query on the dataset or transient dataset. The syntax of the query expression is the same as that of the molfile scan command and explained in more detail in its section on query expressions. Essentially, this command behaves like an in-memory data file version of the molfile scan command. However, currently queries work on ensembles and reactions as dataset members only. Any table, network or other object which is a member of a scanned dataset is skipped. Skipped items still count as records for positioning and query result output. In the absence of a specified scan record list (order parameter), dataset scans begin at the current position of the iterator record attribute that is shared with the dataset read/hread commands.

The optional parameter dictionary is the same as for molfile scan , but not all parameters are actually used. At this time, only the matchcallback, maxhits, maxscan, order, progresscallback, progresscallbackfrequency, sscheckcallback, startposition and target parameters have an effect. If result ensembles or reactions are transferred to a remote dataset via the target parameter, they are not deleted from the local dataset but duplicates are created instead. This is because the original objects are members of the dataset which, just like a structure file would, should remain unchanged as result of a scan. In contrast, in file scans, the transferred ensembles and reactions were read from file and created as new objects during the scan, and sending these does not change the underlying file. In case a progress callback function is used, the dataset handle is passed as argument in place of the molfile handle in molfile scan .

The return value depends on the mode. The default mode is enslist . The following modes are supported for dataset queries:

If requested property data is not present on the matched dataset objects, an attempt is made to compute it. If this fails, the table object in retrieval mode table contains NULL cells, and property retrieval as list data produces empty list elements, but no errors. For minor object properties, the property list retrieval modes produce lists of all object property values instead of a single value. In table mode, only the data for the first object is retrieved, which makes this mode less suitable for direct minor object property retrieval.

The following pseudo properties can be retrieved in addition to normal properties:

These pseudo properties are identical to those available for structure file queries. However, structure file queries support a couple of additional pseudo properties which are not available for dataset queries.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

Examples:

dataset scan $dhandle {E_WEIGHT < 200} recordlist
dataset scan $dhandle "structure >= c1ccccc1" {table E_NAME E_LOPG record}
dataset scan $dhandle "structure >~ $sshnd 90" {cmpvalue E_REACTION_ROLE X_IDENT}

The first example returns the record numbers (dataset member indices plus one) of all structures in the dataset which have a molecular weight of less than 200.

The second example generates a table with columns for name, logP and record number. The table is filled with data from all structures which contain a phenyl ring as substructure.

The final example returns a nested list of the properties of all dataset structures which have a Tanimoto similarity of 90% or more to the structure which is represented by its handle stored in the variable $sshnd . In this example, the ensembles are expected to be also part of a reaction, which is possible since reaction and dataset membership are completely unrelated. Each result list element contains the actual similarity value (which is the only comparison result value with a threshold evaluated in the query, so there is no ambiguity which comparison result cmpvalue refers to), the role of the ensemble in the reaction ( reagent , product , catalyst , etc.) from property E_REACTION_ROLE , and the reaction ID in X_IDENT . The scan mode is here automatically set to propertylist , because the mode list consists exclusively of names of properties and pseudo properties.

Another example:

set is_chno [dataset scan $ehandle {formula = C0-H0-N0-O0-} count]

This command checks whether the ensemble (which is, for the duration of the command, embedded into a transient dataset) contains only elements C, H, N and O.
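The record numbering used by the recordlist mode, including the rule that skipped members still count as records, can be modeled in a few lines of plain Python. This is an illustrative sketch only: the function scan_recordlist, the member dictionaries and the weight field are hypothetical and do not correspond to toolkit calls.

```python
def scan_recordlist(members, predicate, start_record=1):
    """Return the 1-based record numbers of scannable members that match."""
    hits = []
    for index, member in enumerate(members):
        record = index + 1                  # record number = member index + 1
        if record < start_record:
            continue                        # before the iterator position
        if member["kind"] not in ("ens", "reaction"):
            continue                        # skipped, but it still used a record number
        if predicate(member):
            hits.append(record)
    return hits


members = [
    {"kind": "ens", "weight": 180.2},
    {"kind": "table"},
    {"kind": "ens", "weight": 250.0},
    {"kind": "ens", "weight": 78.1},
]

light = scan_recordlist(members, lambda m: m["weight"] < 200)
later = scan_recordlist(members, lambda m: m["weight"] < 200, start_record=2)
```

The table at index 1 is never tested, yet the last ensemble is still reported as record 4 rather than record 3.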

dataset set

dataset set dhandle ?property value?...
d.set(?property,value?,...)
d.set({property:value...})
d.property = value
d[property] = value

Standard data manipulation command. It is explained in more detail in the section about setting property data.

In addition to property data, the dataset object possesses a few attributes, which can be retrieved with the get command (but not by its related sister subcommands like dget , sqlget , etc.). Many of them are also modifiable via dataset set. These attributes are:

Examples:

dataset set $dhandle D_NAME "New lead structures"
dataset set $dhandle E_NAME "Lead (metal)"

The first line is a simple set operation for a dataset property. The second line shows how to set properties of multiple ensembles in one step. The same property value is assigned to all ensembles.

dataset set $dhandle port 10001 passphrase blockbuster

Set up a listener thread on port 10001 which accepts connections from remote interpreters which need to present the pass phrase as credential. Remote interpreters can add ( ens move , reaction move , table move ) or remove ( dataset pop ) objects to or from this dataset, as well as query the dataset object count ( dataset count ). Objects are transferred over the network connection as serialized objects to and from the remote interpreters.

dataset setparam

dataset setparam dhandle property ?key value?...
dataset setparam dhandle property dictionary
d.setparam(property,?key,value?...)
d.setparam(property,dict)

Set or update a property computation parameter in the parameter list of a valid property. This command is described in the section about retrieving property data.

The return value is the updated property computation parameter dictionary.

Example:

dataset setparam $dhandle D_GIF comment "Top Secret"

dataset show

dataset show dhandle propertylist ?filterset? ?parameterdict?
d.show(property=,?filters=?,?parameters=?)
Dataset.Show(items,property=,?filters=?,?parameters=?)

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset show is that the latter does not attempt computation of property data, but raises an error if the data is not present and valid. For data already present, dataset get and dataset show are equivalent.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

dataset sort

dataset sort dhandle {property ?direction ?cmpflags ?cmpvalue???}...
d.sort((property,?direction,?cmpflags,?cmpvalue???),...)

Sort a dataset according to property values of the objects in the dataset. If no sort property set is specified, the default sort properties are E_NATOMS (number of atoms) and, for breaking ties, E_WEIGHT (molecular weight) and finally E_HASHISY (stereo isotope hash code).

Every sort item is interpreted as a nested list/tuple and can have from one to four elements. The first, mandatory element is the sort property, or one of the magic names record (or #record) or random (#random). The next optional element is the sort direction, specified as up (or ascending) or down (descending). The default sorting order is ascending. The optional third element, the comparison flags parameter, can be set to a combination of any of the values allowed with the prop compare command. The default is an empty flag set. Properties in the sort list have precedence in the order they are specified in. Object property values of comparison list entries to the right in this list are only considered if the comparison of all data values of list elements to the left results in a tie.

If a comparison value is supplied as fourth argument, the sort utilizes the comparison results of dataset object property values against this value for ranking, not the direct comparison result between the dataset object property values. This is for example useful when sorting according to a bitvector similarity value to an external structure.

The magic property name record sorts by the object index in the dataset. Sorting upwards on this property does not change the object sequence in the dataset, and sorting downwards reverses it. This pseudo property is always added as a final implicit criterion, so that the sequence order of objects tied in all explicit comparisons is preserved. The other magic property name random assigns a random value to all dataset objects and sorts on this value, yielding a random object sequence.

The command returns a list of the handles of the objects controlled by the dataset in the newly sorted order. Simultaneously, the objects are physically moved within the dataset, so the sort has a persistent effect. The same result list may later be obtained by a dataset objects command.

It is possible to sort transient datasets, but this makes sense only if the object list sequence returned as command result is captured and used later, because the sort effect is not persistent since there exists no permanent dataset object.

Examples:

dataset sort $dhandle {E_NAME up {ignorecase lazy}}

The example sorts the dataset according to the compound name (property E_NAME , data type string) in alphabetic order, using a lazy (ignoring whitespace and punctuation) and case-insensitive comparison mode.

dataset sort $dhandle {E_NATOMS down} {E_NRINGS up}

Sort the dataset in such a way that the ensembles with the largest number of atoms, and among these those with the smallest number of rings, come first.

dataset sort $dhandle random

This command randomizes the object order in the dataset.

dataset sort $dhandle {*}$sortlist

This is the recommended construct when using a sort property list stored in a Tcl variable as command argument. Older versions of the dataset sort command used a single sort argument parameter instead of a variable-size argument set.
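The precedence rules for multiple sort items, and the implicit final record criterion, can be reproduced with a stable sort in plain Python. The following sketch is a model, not toolkit code: the function sort_model and the dictionary fields are hypothetical, and comparison flags and comparison values are left out.

```python
def sort_model(items, *criteria):
    """Sort by (property, direction) pairs; the leftmost pair has highest priority."""
    ordered = list(items)
    # Apply the least significant criterion first. Python's sort is stable,
    # also with reverse=True, so members tied on every key keep their
    # original sequence, which plays the role of the implicit record criterion.
    for prop, direction in reversed(criteria):
        ordered.sort(key=lambda item: item[prop], reverse=(direction == "down"))
    return ordered


ensembles = [
    {"id": "e1", "natoms": 12, "nrings": 2},
    {"id": "e2", "natoms": 20, "nrings": 3},
    {"id": "e3", "natoms": 20, "nrings": 1},
    {"id": "e4", "natoms": 12, "nrings": 2},
]

result = sort_model(ensembles, ("natoms", "down"), ("nrings", "up"))
order = [e["id"] for e in result]
```

Here e3 precedes e2 because both have 20 atoms and e3 has fewer rings, while e1 and e4 are tied on both keys and therefore stay in their original order.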

dataset sqldget

dataset sqldget dhandle propertylist ?filterset? ?parameterdict?
d.sqldget(property=,?filters=?,?parameters=?)
Dataset.Sqldget(items,property=,?filters=?,?parameters=?)

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The differences between dataset get and dataset sqldget are that the latter does not attempt computation of property data, but initializes the property value to the default and returns that default, if the data is not present and valid; and that the SQL command variant formats the data as SQL values rather than for Tcl or Python script processing.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

dataset sqlget

dataset sqlget dhandle propertylist ?filterset? ?parameterdict?
d.sqlget(property=,?filters=?,?parameters=?)
Dataset.Sqlget(items,property=,?filters=?,?parameters=?)

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset sqlget is that the SQL command variant formats the data as SQL values rather than for Tcl or Python script processing.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

dataset sqlnew

dataset sqlnew dhandle propertylist ?filterset? ?parameterdict?
d.sqlnew(property=,?filters=?,?parameters=?)
Dataset.Sqlnew(items,property=,?filters=?,?parameters=?)

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The differences between dataset get and dataset sqlnew are that the latter forces re-computation of the property data, and that the SQL command variant formats the data as SQL values rather than for Tcl or Python script processing.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

dataset sqlshow

dataset sqlshow dhandle propertylist ?filterset? ?parameterdict?
d.sqlshow(property=,?filters=?,?parameters=?)
Dataset.Sqlshow(items,property=,?filters=?,?parameters=?)

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The differences between dataset get and dataset sqlshow are that the latter does not attempt computation of property data, but raises an error if the data is not present and valid, and that the SQL command variant formats the data as SQL values rather than for Tcl or Python script processing.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

dataset statistics

dataset statistics dhandle property
d.statistics(property)

Get basic statistics on the property values of the objects in the dataset. The property can be a basic property or a property field, but its element data type needs to be cast-able to a simple numeric type. In addition, it must be directly attached to any of the objects which can be members of a dataset, e.g. an ensemble property, but not an atom property.

If the property data is not present on an object, an attempt is made to compute it. In case that fails, or a dataset member object is not of a type matching the property, these objects are silently skipped.

The return value is a dictionary containing the number of objects in the dataset which were used for the statistics (key n), the sum of property values (sum), the property value average (avg) and the property data standard deviation (stddev). The latter three values are floating point numbers, regardless of the property data type. In case any of these values are not computable, for example because there was an insufficient number of objects, the reported value is zero.

The command verb can be abbreviated as stats .

Example:

set d [dataset statistics $dh E_WEIGHT]
puts "Avg: [dict get $d avg]"
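For orientation, the four reported values can be computed as in the following plain-Python sketch. It is not the toolkit implementation: the function statistics_model is hypothetical, None stands for an object whose property value could not be obtained, and the use of the sample standard deviation (divisor n-1) is an assumption, since the text does not state which variant is reported.

```python
import math


def statistics_model(values):
    """Return n, sum, avg and stddev; objects without a value are skipped."""
    usable = [float(v) for v in values if v is not None]
    n = len(usable)
    total = sum(usable)
    avg = total / n if n else 0.0          # not computable: report zero
    if n > 1:
        # Assumption: sample standard deviation with divisor n - 1.
        stddev = math.sqrt(sum((v - avg) ** 2 for v in usable) / (n - 1))
    else:
        stddev = 0.0                       # too few objects: report zero
    return {"n": n, "sum": total, "avg": avg, "stddev": stddev}


stats = statistics_model([2, 4, None, 6])
empty = statistics_model([])
```

The integer inputs yield floating point results for sum, avg and stddev, in line with the description of the return dictionary.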

dataset subcommands

dataset subcommands
dir(Dataset)

Lists all subcommands of the dataset command. Note that this command does not require a dataset handle.

dataset tables

dataset tables dhandle ?filterset? ?filtermode? ?recursive?
d.tables(?filters=?,?mode=?,?recursive=?)

Return a list of all the handles or references of the tables in the dataset. Other objects in the dataset (ensembles, reactions, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the result further modified by a standard filter mode argument.

If the recursive flag is set, and the dataset contains other datasets as objects, tables in these nested datasets are also listed.

Example:

set n [dataset tables $dhandle {} count]

dataset taint

dataset taint dhandle propertylist/changeset ?purge?
d.taint(property=,?purge=?)

Trigger a property data tainting event which acts on the dataset data, and all objects and their data contained in the dataset.

The parameters of this command are the same as for ens taint and explained there.

Example:

dataset taint $dhandle A_XYZ

All property data on the dataset and the dataset members is invalidated if it directly or indirectly depends on the 3D atomic coordinates.

The command returns the original object handle or reference.

dataset threadexec

dataset threadexec ?maxthreads? ?substitutiondict? scriptbody

Execute a script on the objects in the dataset in parallel in multiple threads. The number of threads is by default the lesser of 16 or the number of objects in the dataset, but this can be configured. If there are more dataset objects than threads, threads are started in a groupwise fashion. In the script body, standard Tcl/Tk percent substitution is performed. The default substitutions are %D for the dataset handle, and %O for the thread-specific dataset object. Other custom substitutions can be configured in the optional substitution dictionary, in a letter/value (no percent prefix) format.

There are some limitations on what the object threads can do. They are allowed to delete their own current object, or move it outside the dataset, but not other objects in the dataset. Additional objects may be appended to the dataset (they are not subject to processing by the original command), but not inserted in random positions. Computation in the script body must reach the end of the script, or be ended by return or break statements. An error in any of the threads stops the command. All threads of a group must have finished before a new group is started.

The command returns the dataset handle if the dataset is not virtual.

Because of multi-threading issues, there is no Python version of the command.
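The groupwise scheduling and the percent substitution can nevertheless be pictured with a small pure-Python model. This sketch does not execute Tcl and does not use the toolkit. It only expands the script text per object and runs one thread per object in groups. The function threadexec_model, the handle strings and the custom letter T are hypothetical.

```python
import threading


def threadexec_model(handle, objects, body, maxthreads=16, substitutions=None):
    """Expand the script body once per object, running threads group by group."""
    expanded = {}
    lock = threading.Lock()
    table = {"D": handle}                  # default substitution for %D
    table.update(substitutions or {})      # custom letter/value pairs

    def worker(obj):
        text = body.replace("%O", obj)     # %O is specific to each thread
        for letter, value in table.items():
            text = text.replace("%" + letter, value)
        with lock:
            expanded[obj] = text

    group_size = max(1, min(maxthreads, len(objects)))
    groups = 0
    for start in range(0, len(objects), group_size):
        threads = [threading.Thread(target=worker, args=(obj,))
                   for obj in objects[start:start + group_size]]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()                  # a group must finish before the next starts
        groups += 1
    return expanded, groups


scripts, ngroups = threadexec_model(
    "dataset0", ["ens0", "ens1", "ens2"],
    "ens get %O E_WEIGHT ;# from %D, tag %T",
    maxthreads=2, substitutions={"T": "run1"})
```

With three objects and a limit of two threads, the model needs two groups, and the second group starts only after both threads of the first have been joined.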

dataset transfer

dataset transfer dhandle propertylist ?targethandle? ?targetpropertylist?
d.transfer(properties=,?target=?,?targetproperties=?)

Copy property data from one dataset to another dataset or other major object, without going through an intermediate scripting language object representation, or alternatively dissociate property data from the dataset. If a property in the argument property list is not already valid on the source dataset, an attempt is made to compute it.

If a target object is specified, the return value is the handle or reference of the target object. The source dataset and the target object cannot be the same object.

If a target property list is given, the data from the source is stored as content of a different property on the target. For this, the data types of the properties must be compatible, and the object class of the target property that of the target object. No attempt is made to convert data of mismatched types. In case of multiple properties, the source property list and the target property list are stepped through in parallel. If there is no target property list, or it is shorter than the source list, unmatched entries are stored as original property values, and this implies that the object class of the source and target objects are the same.

If no target object is specified, or it is spelled as an empty string or Python None, the visible effect of the command is the same as a simple dataset get, i.e. the result is the property data value or value list. The property data is then deleted from the source object. In case the data type of the deleted property was that of a major object (i.e. an ensemble, reaction, table, dataset or network), it is only unlinked from the source object, but not destroyed. This means that the object handles returned by the command can henceforth be used as independent objects. They can be deleted by a normal object deletion command, and are no longer managed by the source object.

Example:

dataset transfer $dh D_SVG_IMAGE $lh L_1DPATTERN_SVG_IMAGE

This command performs a data transfer between different object classes, with change of the property under which the content is stored.
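The parallel stepping through the source and target property lists, and the behavior without a target object, can be modeled with dictionaries in plain Python. This is a sketch only: the function transfer_model and the property names P_A, P_B and Q_A are hypothetical placeholders, and type compatibility checks are omitted.

```python
def transfer_model(source, properties, target=None, target_properties=()):
    """Copy property values to a target, or dissociate them from the source."""
    if target is None:
        # No target: return the values and remove them from the source.
        return [source.pop(prop) for prop in properties]
    for position, prop in enumerate(properties):
        # Unmatched source entries keep their original property name.
        if position < len(target_properties):
            name = target_properties[position]
        else:
            name = prop
        target[name] = source[prop]
    return target


source = {"P_A": "<svg/>", "P_B": 42, "P_C": "keep"}
target = {}

transfer_model(source, ["P_A", "P_B"], target, ["Q_A"])
removed = transfer_model(source, ["P_B"])
```

After the first call the target holds the first value under the new name Q_A and the second under its original name P_B. The second call returns the value of P_B and leaves the source without that property.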

dataset transform

dataset transform dhandle SMIRKSlist ?direction? ?reactionmode? ?selectionmode? ?flags? ?overlapmode? ?{?exclusionmode? excludesslist}? ?maxstructures? ?timeout? ?maxtransforms? ?niterations? ?statusvariable?
d.transform(transforms=,?direction=?,?reactionmode=?,?selectionmode=?,?flags=?,?overlapmode=?,?excludess=?,?maxstructures=?,?timeout=?,?maxtransforms=?,?iterations=?)
Dataset.Transform(items,transforms=,?direction=?,?reactionmode=?,?selectionmode=?,?flags=?,?overlapmode=?,?excludess=?,?maxstructures=?,?timeout=?,?maxtransforms=?,?iterations=?)

This command is complex, but very similar to the ens transform command. Please refer to that command for a full description of the command arguments.

The major difference of dataset transform is that the start structure set is not a single ensemble, but rather the set of all ensembles in the dataset. Any dataset items which are not ensembles are ignored. The return value is, just as with the ens transform command, a list of result ensembles. These do not become part of the input dataset.

Example:

dataset transform [ens get $ehandle E_KEKULESET] $trafolist bidirectional \
	multistep all {preservecharges checkaro setname}

This command first expands an ensemble object into a set of Kekulé structures. The property data type of the E_KEKULESET property is a dataset, so its handle is returned, and this dataset is then submitted for further transformation, which in this case involves manipulations of bonds in aromatic systems and thus is dependent on the Kekulé structures of the input ensembles.

The dataset variant of the transform command does not allow the use of marked or unmarked atom or bond specifications in the exclusion substructure list. Normal substructures are supported, and are applied to all start structures.

The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.

dataset unique

dataset unique dhandle {property ?direction? ?cmpflags?}...
d.unique((property,?direction,?cmpflags,?cmpvalue???),...)

This command removes duplicate objects from the dataset and destroys them. Object equivalence is determined by pair-wise comparison of one or more properties. If all these properties are identical for any two objects, one of them is deleted. If no properties are specified, the default is the single property E_HASHISY , the standard isotope- and stereo-aware ensemble hash code.

The command returns labels or references of the ordered list of objects remaining in the dataset after deletion. The command is closely related to the dataset sort command, and the same restrictions on usable sort properties apply. Internally, the command performs a sort first, in order to avoid a quadratic growth of pair-wise comparisons. This has the side effect that the object order in the dataset is not preserved. Instead, the surviving objects are listed in ascending (by default) or descending (if the corresponding optional sort direction argument is set accordingly) values of the sort properties. The interpretation of the optional comparison flags and sort direction arguments, as well as the priority of the properties, and the special considerations when working on transient datasets, are the same as for the command dataset sort .

Example:

molfile read $fh $dh all
dataset unique $dh

This example first reads a complete file into a dataset, and then discards duplicates, using the default isotope- and stereo-aware structure hash code.
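The sort-then-compare strategy mentioned above, and its side effect on the object order, can be shown with a short plain-Python model. It is a sketch, not toolkit code: the function unique_model and the hash field are hypothetical, and the choice of which of two equal objects survives (here the one that comes first after sorting) is an assumption.

```python
def unique_model(items, key):
    """Sort first, then drop every item equal to its predecessor."""
    ordered = sorted(items, key=key)       # avoids quadratic pair-wise comparison
    survivors = []
    for item in ordered:
        if survivors and key(survivors[-1]) == key(item):
            continue                       # duplicate of the previous survivor
        survivors.append(item)
    return survivors


records = [
    {"id": "e1", "hash": "c3"},
    {"id": "e2", "hash": "a1"},
    {"id": "e3", "hash": "c3"},
    {"id": "e4", "hash": "b2"},
]

kept = [r["id"] for r in unique_model(records, key=lambda r: r["hash"])]
```

The survivors come out in ascending key order rather than in the original sequence, which corresponds to the documented loss of the original object order.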

dataset unlock

dataset unlock dhandle propertylist/dataset/all
d.unlock(property=)

Unlock property data for the dataset object, meaning that they are again under the control of the standard data consistency manager.

The property data to unlock can be selected by providing a list of the following identifiers:

Property data locks are obtained by the dataset lock command.

This command does not recurse into the objects contained in the dataset.

The return value is the original dataset handle or reference. If the argument was a transient dataset (only possible for Tcl ), the result is an empty string.

dataset unpack

dataset unpack string ?compressionlib?
Dataset.Unpack(data=,?compressionlib=?)

Generate a dataset complete with all elements it contains from a packed, base64-encoded serialized object string, as it is generated by the complementary dataset pack command.

The return value is the handle or reference of the new dataset. All objects in the new dataset also are assigned standard handles, which can be retrieved with the usual commands such as dataset ens and dataset reactions .

The default compression library is zlib . For more options, see dataset pack .

Note that this command does not take a dataset handle as argument, but a pack string.

Example:

dataset unpack [dataset pack $dhandle]

This example is effectively the same as a dataset dup operation, but of course less efficient, because the objects have to be serialized, compressed, and base64-encoded and the same sequence of operations run backward again.
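The sequence of serialization, compression and base64 encoding, and its reversal, can be sketched with the Python standard library. This model is not the toolkit's pack format: JSON is used as a stand-in serializer, and the functions pack_model and unpack_model as well as the sample data are hypothetical. Only the use of zlib as default compression library is taken from the text.

```python
import base64
import json
import zlib


def pack_model(obj):
    """Serialize, compress with zlib, then base64-encode to a text string."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def unpack_model(packed):
    """Run the same steps backward: decode, decompress, deserialize."""
    raw = zlib.decompress(base64.b64decode(packed))
    return json.loads(raw.decode("utf-8"))


dataset = {"name": "leads",
           "members": [{"smiles": "c1ccccc1"}, {"smiles": "CCO"}]}

packed = pack_model(dataset)
clone = unpack_model(packed)
```

The round trip produces an equal but independent copy, which is why the example above behaves like a duplication, at the cost of three extra processing steps in each direction.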

dataset valid

dataset valid dhandle propertylist
d.valid(property/propertysequence)

Returns a list of boolean values indicating whether values for the named properties are currently set for the dataset. No attempt at computation is made. For Python , where single-item lists are syntactically not the same as a single value, the return value is a single boolean if the argument was a string or a property reference, and only a single property was decoded.

Example:

dataset valid $dhandle D_NAME

reports whether the dataset is named (has a valid D_NAME property) or not.

dataset has is an alias to this command.

dataset verify

dataset verify dhandle property
d.verify(property)

Verify the values of the specified property on the dataset. The property data must be valid, and a dataset property. If the data can be found, it is checked against all constraints defined for the property, and, if such a function has been defined, is tested with the value verification function of the property.

If all tests are passed, the return value is boolean 1. If the data could be found but fails a test, the return value is 0. In all other cases, an error is raised.

dataset wait

dataset wait dhandle ?size|query? ?script?
d.wait(?query=?,?size=?,?function=?)

Suspend the interpreter until the number of objects in the dataset has reached a threshold, or an object which satisfies a query expression can be found. The syntax of query expressions is the same as in the dataset scan command. Query parsing is attempted if the argument is not a simple integer. If no explicit size or query expression is specified, or an empty string (or None for Python) is passed as this parameter, the command uses the value of the highwatermark dataset attribute as default value for an implicit size threshold condition.

Another dataset attribute which has an influence on the execution of the command is the timeout attribute. If the dataset size has not grown to the required size, or no object which satisfies the query expression was added to the dataset after waiting for the timeout number of seconds, an error is raised. By default, the maximum wait period is indefinite, which corresponds to a negative timeout value. If the timeout value is set to zero, the wait condition must be met immediately, or an error results. However, no error is raised if the eod/targeteod dataset parameter pair indicates that no more data can be expected to be added in the dataset. In that case, the result is an empty string, or None for Python.

If no script function parameter is used, the return value of the command is the number of objects the dataset holds in case of an explicit or implicit size condition, or the handle/reference of the first matching object in case of a query expression.

If the object count already exceeds the threshold, or a matching object can be found at the moment the command is executed, the command returns immediately.
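The interplay of the size threshold, the timeout value and the end-of-data state can be sketched with a plain-Python model built on threading.Condition. This is an illustration under stated assumptions, not the toolkit implementation: the class WaitableList and its add and close methods are hypothetical stand-ins for a dataset, and the eod/targeteod parameter pair is reduced to a single flag.

```python
import threading
import time


class WaitableList:
    """Toy stand-in for a dataset: a list guarded by a condition variable."""

    def __init__(self, timeout=-1.0):
        self.items = []
        self.timeout = timeout  # negative: wait indefinitely, 0: do not wait
        self.eod = False        # simplified end-of-data flag
        self._cond = threading.Condition()

    def add(self, item):
        with self._cond:
            self.items.append(item)
            self._cond.notify_all()

    def close(self):
        """Signal that no more data can be expected."""
        with self._cond:
            self.eod = True
            self._cond.notify_all()

    def wait(self, size):
        deadline = None if self.timeout < 0 else time.monotonic() + self.timeout
        with self._cond:
            while len(self.items) < size:
                if self.eod:
                    return None  # no error once no more data can arrive
                if deadline is None:
                    self._cond.wait()
                else:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        raise TimeoutError("wait condition not met")
                    self._cond.wait(remaining)
            # Condition met (possibly immediately): report the item count.
            return len(self.items)
```

With a timeout of zero, the model either returns at once or raises an error, and after close has been called an unmet condition yields None instead of an error.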

In the Tcl case, and in the presence of a script body parameter, the script is executed whenever the wait condition is met. If the script is ended with a continue statement, or simply reaches the end of the code block, the wait loop is automatically restarted. If the script reports an error, or is left via a break or return statement, the loop is terminated.

For Python, a function name or reference can be used instead of the script body. This function is called in local scope with a single argument, which is either the current dataset item count in case of a simple threshold condition, or the reference of the object matching the query expression. Within the Python functions, the normal break and continue loop control statements cannot be used due to scope limitations. Instead, the custom exceptions BreakLoop and ContinueLoop can be raised. These are automatically caught and processed in the loop body handler code.
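The exception-based loop control can be modeled in plain Python as follows. This is a sketch and not the toolkit code: the BreakLoop and ContinueLoop classes defined here are local stand-ins for the toolkit exceptions of the same name, and run_wait_loop is a hypothetical driver which replaces the real wait loop with a fixed sequence of trigger values.

```python
class BreakLoop(Exception):
    """Local stand-in for the toolkit exception which ends the wait loop."""


class ContinueLoop(Exception):
    """Local stand-in for the toolkit exception which restarts the wait."""


def run_wait_loop(trigger_values, callback):
    """Call callback once per trigger until it raises BreakLoop."""
    calls = 0
    for value in trigger_values:
        calls += 1
        try:
            callback(value)
        except ContinueLoop:
            continue  # same effect as a normal return: wait again
        except BreakLoop:
            break     # leave the loop, no further callbacks
    return calls


seen = []


def on_count(count):
    if count < 2:
        raise ContinueLoop  # nothing to do yet
    seen.append(count)
    if count >= 3:
        raise BreakLoop     # enough items processed


calls = run_wait_loop([1, 2, 3, 4], on_count)
```

In this run the callback is invoked three times, records the counts 2 and 3, and the fourth trigger value is never delivered because the loop has already been left.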

This command is mostly useful when running multi-threaded scripts, or when the dataset has an active remote command listener on a port. Under these circumstances, new objects may arrive in the dataset without participation of the local interpreter, which is stopped while it waits, and these objects can then be processed once the wait condition is met.

While a dataset wait command is pending, the dataset cannot be deleted. Since it is possible that other threads or port monitors further update the dataset between the time the wait condition is met and script processing commences, action scripts should be prepared to see more or fewer items in the dataset than there were immediately after the trigger event.

Example:

loop n 1 $nrecs {
	set eh [dataset wait $dh "E_FILE(startrec) = $n"]
	molfile write $fh $eh
	ens delete $eh
}

This is part of a simple writer thread which writes processed ensembles back in the same order as they were read from an input file. If there are multiple processing threads, the computation on an ensemble read from a later input file record may well finish before that on an ensemble with a smaller record number, so the ensembles arrive in the output queue out of sequence. Waiting for the ensembles in input record order preserves the original order. More robust versions of such a script should handle the case of ensembles from a specific input record never appearing in the dataset, and similar sources of disruption.
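The re-sequencing idea behind this writer thread can be sketched in plain Python without any toolkit calls. The function write_in_record_order and its data layout are hypothetical: arrivals is a list of (record number, payload) pairs in the order in which the processing threads delivered them, and items are held back until all lower record numbers have been written.

```python
def write_in_record_order(arrivals):
    """Return (written, held_back) for items arriving in arbitrary order."""
    pending = {}   # record number -> payload, waiting for its turn
    next_rec = 1   # record numbers start at 1, as in the loop above
    written = []
    for rec, payload in arrivals:
        pending[rec] = payload
        # Flush every item whose turn has come.
        while next_rec in pending:
            written.append((next_rec, pending.pop(next_rec)))
            next_rec += 1
    # Items still pending are blocked by a record that never arrived.
    return written, sorted(pending)


written, held_back = write_in_record_order([(2, "b"), (1, "a"), (4, "d")])
```

Here records 1 and 2 are written in order, while record 4 remains held back because record 3 never arrived. This is the situation a more robust writer has to detect, for example by means of a timeout.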

dataset weed

dataset weed dhandle keywords
d.weed(keywordsequence)
d.weed(?keyword?,...)

This command performs standard clean-up operations on all ensembles and reactions in the dataset. The supported operations are described in more detail in the section on the equivalent ens weed command.

The return value of this command is the dataset handle or reference.

dataset xlabel

dataset xlabel dhandle propertylist ?filterset? ?filterprocs?
d.xlabel(property=,?filters=?,?filterfunctions=?)

This command is rather complex and closely related to the dataset extract command. Its purpose is to extract handle/reference and label information for selected subsets of the dataset. The return value is a nested list. The sublists consist of the object handle or reference, the object label (if the object does not have a label, 1 is substituted), and the dataset object index. The dataset object index starts with zero.

The selection of the class of objects which are extracted is performed indirectly via the property list. For practical purposes, this list should be a single property. Its object association type determines the class of objects selected. For example, A_LABEL or A_SYMBOL returns atom labels, while B_ORDER returns bond labels and E_NAME selects complete ensembles, with 1 as pseudo ensemble label.

The objects for which data is returned can further be filtered by a standard filter set, and additionally by a list of filter procedures (for Tcl, specified as procedure names) or functions (for Python, specified as function names or function references). These procedures or functions are called with the respective object handles/references and object labels as arguments. For example, a callback function used in an atom retrieval context would be called for each atom with its ensemble handle or reference and the atom label as arguments. If major objects without a label are checked, such as complete ensembles, 1 is passed as the label. The callback procedures are expected to return a boolean value. If the value is false or 0, the object is not added to the returned list, and the remaining check procedures are no longer called for that object.
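The way a list of filter functions is evaluated for a single object can be modeled in plain Python. This is a sketch of the short-circuit behavior only: passes_filters, reject_all and accept_all are hypothetical names, and the handle is represented by a plain string instead of a toolkit reference.

```python
def passes_filters(handle, label, filter_functions):
    """Return True only if every filter function accepts the object."""
    for check in filter_functions:
        if not check(handle, label):
            return False  # remaining filter functions are not called
    return True


calls = []


def reject_all(handle, label):
    calls.append("reject_all")
    return False


def accept_all(handle, label):
    calls.append("accept_all")
    return True


# The first callback rejects the object, so the second one is never called.
result = passes_filters("ens1", 1, [reject_all, accept_all])
```

After this call, result is False and calls contains only the entry for reject_all, which mirrors the rule that no further check functions are called once one of them has rejected the object.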

The command currently only works on ensembles in the dataset, ignoring any reactions, tables, datasets or networks which may be present.

This command is primarily useful for the display of filtered minor object data from datasets, such as atom property values for specific types of atoms.

Example:

set dhandle [dataset create [ens create O] [ens create C=C]]
dataset xlabel $dhandle A_LABEL !hydrogen
dataset xlabel $dhandle B_ORDER doublebond

First, a dataset with two ensembles (water and ethene) is created. This dataset is then queried. The first query is for all atoms in it which are not hydrogen. The returned list is

{ens0 1 0} {ens1 1 1} {ens1 2 1}

In object ens0, which is the first object in the dataset, atom 1 passes the filter. In object ens1, which is the second object in the dataset, the atoms with labels 1 and 2 pass. The second query asks for the labels of double bonds in the dataset. The use of property B_ORDER is arbitrary; any other bond property would do as well. The return value of this command is

{ens1 1 1}

which indicates that only the bond with label 1 in object ens1, which is the second object in the dataset, fulfills this condition.