The dataset Command

The dataset command is the generic command used to manipulate datasets. The syntax of this command follows the standard schema of command/subcommand/majorhandle. Datasets are major objects and thus do not need any minor object labels for identification.

Example:

dataset get $dhandle D_SIZE

As explained in the introductory section on datasets, a normal persistent dataset handle in the third argument position of the dataset command may be replaced by an arbitrary list of dataset, ensemble, reaction, table and network handles. This substitution is only allowed in that argument position. It is not allowed where a dataset handle is part of the arguments of another object command, nor in any other argument position of a dataset command. Such an object list is transformed into a transient dataset for the duration of the command execution. After the command has completed, the elements of the transient dataset are in most cases restored to their original state with respect to dataset membership and position, except in a few documented circumstances.

As a means to access an embedded dataset object, its handle may be replaced by the handle of the parent object where this is unambiguous, e.g.

ens move $eh $thandle

moves the ensemble into the embedded dataset of the table, while

dataset count $thandle

treats the table argument as part of a transient dataset as described above.

This is the list of currently officially supported subcommands:

dataset add

dataset add dhandle objhandle ?position?

Add an object to the dataset, removing it from its current dataset if it is a member of one. If no position is specified, the object is appended to the end of the dataset object list. The position can either be a numerical zero-based index, or any string beginning with ‘e’ to indicate the end position.

If the object handle identifies a (local) dataset, and the target dataset does not accept datasets as members, all objects in the source dataset are instead moved to the new dataset, and then the source dataset is destroyed. If ensembles, reactions, tables or networks are moved, they are unlinked from any current datasets, but these original datasets themselves persist.

This dataset command is equivalent to issuing a move command from the object.

Example:

dataset add $dh $eh end
ens move $eh $dh end

These two commands are equivalent.
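The following pure Tcl sketch illustrates the position rules above by mapping a position argument onto a zero-based insertion index. The procedure resolve_position is hypothetical and is not a toolkit command. It deliberately leaves out range checking of numerical indices.

```tcl
# Hypothetical helper, not a toolkit command: maps a dataset add position
# argument onto a zero-based insertion index for a member list of size $len.
proc resolve_position {pos len} {
    # No position given: the object is appended at the end.
    if {$pos eq ""} {
        return $len
    }
    # Any string beginning with "e" (such as "end") selects the end position.
    if {[string index $pos 0] eq "e"} {
        return $len
    }
    # Otherwise a zero-based numerical index is required.
    if {![string is integer -strict $pos]} {
        error "bad position \"$pos\": expected an index or end"
    }
    return $pos
}

# Usage: insert a handle into a plain list standing in for the member list.
set members {ens1 ens2 ens3}
set members [linsert $members [resolve_position end [llength $members]] ens4]
```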

dataset addthread

dataset addthread dhandle ?body?
dataset addthread dhandle count body
dataset addthread dhandle count substitutiondict body

Add one or more script threads to the dataset. By default, a single thread is added. Setting the count parameter to a higher number adds multiple threads with the same script body at once, up to a maximum of 32 threads per dataset. This command can also add threads to a dataset which already has attached threads. These older threads remain active.

The optional substitution dictionary contains a set of percent-prefixed keys and replacement values, following the Tk event procedure model. All such replacements are made before the script is passed to the thread interpreters. A single default substitution, which replaces the character sequence %D with the handle of the current dataset, is always predefined and cannot be redefined. Replacement token keys (but not necessarily their values) are single case-sensitive characters, and an optional percent prefix character is ignored. Within the script, percent signs which should be preserved as such must be doubled, just like in Tk event substitution commands.
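The substitution rules can be sketched in plain Tcl with string map. Like Tk event substitution, string map works in a single pass and tries keys in list order. The procedure substitute_script is a hypothetical model, not the toolkit implementation. It silently skips any attempt to redefine %D, whereas the real command may report an error; that behavior is an assumption.

```tcl
# Hypothetical sketch, not the toolkit implementation: applies Tk-style
# percent substitution to a thread script before it would be handed to the
# thread interpreter.
proc substitute_script {script dhandle substdict} {
    # "%%" is listed first so a doubled percent sign becomes a literal "%"
    # instead of starting a token; %D is always the dataset handle.
    set map [list %% % %D $dhandle]
    dict for {key value} $substdict {
        # Keys are single case-sensitive characters; a leading "%" is optional.
        set char [string index [string trimleft $key %] 0]
        # %D cannot be redefined; this sketch simply skips such a key.
        if {$char eq "D"} {
            continue
        }
        lappend map "%$char" $value
    }
    # string map works in a single pass and tries keys in list order.
    return [string map $map $script]
}

puts [substitute_script {table addens %T [dataset pop %D]; puts 100%%} dataset1 {%T table7}]
# prints: table addens table7 [dataset pop dataset1]; puts 100%
```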

The dataset threads are compatible with those of the standard Tcl threads package. Dataset-associated threads are automatically created in preserved state, and a thread::wait command is automatically appended at the end of the script, so they can be sent additional tasks via the thread::send facility. If no script body is specified, the initial script consists only of the wait command. Threads can only be canceled or joined while they are stopped in the wait statement.

When a dataset is deleted, all threads associated with this dataset must first be joined. This can only happen once they have finished processing the main body script and are all idle in the thread::wait command. Object deletion is postponed until this condition is met. A global join on all currently executing dataset threads is automatically performed when the program exits, before any object clean-up tasks are run. An application whose dataset threads are stuck and never reach their thread::wait cancellation points cannot be cleanly exited.

Duplicating datasets does not duplicate any associated threads.

The presence of threads on a dataset affects the behavior of the dataset wait and dataset pop commands. It also affects object insertion commands associated with other major object classes (e.g. ens move or molfile read). Please refer to the respective paragraphs for details. The size control mechanism of datasets in the auto mode also depends on the presence or absence of linked dataset threads.

Example:

dataset addthread $dh 1 [dict create %T $th] {
	while {1} {
		set eh [dataset pop %D]
		if {$eh eq ""} break
		if {[catch {ens get $eh E_CANONIC_TAUTOMER} eh_canonic]} {
			ens delete $eh
			continue
		}
		if {[catch {ens get $eh_canonic E_DESCRIPTORS}]} {
			ens delete $eh
			continue
		}
		table addens %T $eh_canonic
		ens delete $eh
	}
}

This code creates a processing thread on the dataset. The thread computes properties on newly arriving ensembles, stores the data in a table and then deletes the ensemble. Note that the table handle is passed in via the replacement dictionary. The dataset pop command returns an empty string once it is known that no more data will arrive. Otherwise, it blocks until an object is available for popping. This is managed by setting the eod dataset attribute from feeder threads.

The return value of the command is a list of the Tcl thread IDs of the newly created threads. These are suitable for use in the dataset jointhreads command or any standard Tcl thread package command.

dataset append

dataset append dhandle property value ?property value?

Standard data manipulation command for appending property data. It is explained in more detail in the section about setting property data.

Example:

dataset append $dhandle D_NAME "_new"
dataset append $dhandle eod 1

dataset assign

dataset assign dhandle srcprop dstprop

Copy data from one property to another. Both properties must be associated with the same object class. The source property (but currently not the destination property) may be specified as an indexed property subfield. There must be a conversion path between the data types of the two properties or property subfields involved for the operation to succeed. For example, assigning a string property to a numeric property succeeds only if the string data items contain suitable numbers.

The original property data remains valid. The command variant dataset rename directly exchanges the property name without any data duplication or conversion, if that is possible. In any case, the original property data is no longer present after the execution of this command variant.

If the properties are not associated with datasets (prefix D_), the operation is performed on all dataset member objects.

Example:

dataset assign $dhandle A_XY A_XY%

This code snippet creates backup atomic 2D layout coordinates on all dataset ensembles or reactions.
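The difference between dataset assign and dataset rename can be sketched with a Tcl dict that stands in for the property data of a single object. The procedures model_assign and model_rename and the key backup are hypothetical. They ignore the data type conversion that the real commands perform.

```tcl
# Hypothetical model, not the toolkit implementation: the property data of one
# object is a Tcl dict keyed by property name. Type conversion is not modeled.
proc model_assign {dataVar src dst} {
    upvar 1 $dataVar data
    # assign: copy the value; the source property remains valid.
    dict set data $dst [dict get $data $src]
}

proc model_rename {dataVar src dst} {
    upvar 1 $dataVar data
    dict set data $dst [dict get $data $src]
    # rename: the original property data is gone afterwards.
    dict unset data $src
}

set data [dict create A_XY {1.0 2.0}]
model_assign data A_XY A_XY%
model_rename data A_XY% backup
```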

dataset cancelthreads

dataset cancelthreads ?all?
dataset cancelthreads dhandle ?all?
dataset cancelthreads dhandle threadid..

Cancel (or more precisely, wait for and join) one or more threads associated with the dataset. Dataset threads can only be canceled when they are idle, executing the implicitly added thread::wait command at the end of their script. Therefore, this command is not just used for clean-up, but also useful for ascertaining that the threads have finished their tasks. The IDs of the threads associated with a dataset can be retrieved as the threads dataset attribute, or saved from the return value of the original dataset addthread command. The special all thread ID value can be used to cancel all threads of the dataset. This can also be achieved by setting an empty thread ID parameter, or omitting it altogether. If a dataset does not possess threads, this command does nothing. If a thread marked for cancellation has not yet finished, the cancellation command is suspended until it has.

This command can also be invoked without an explicit or transient dataset argument, or with all as the dataset argument. In that case, the thread join cleanup is run on all threads of all currently defined datasets. This function is also implicitly run when a script exits, before other application cleanup operations are performed.

Thread cancellation for all dataset threads is implicitly invoked when a dataset is deleted, so an explicit clean-up is not required. However, this also means that a dataset deletion blocks while there are still active threads. It is not possible to forcefully cancel a thread which has entered an infinite loop, so careful programming is required.

The command returns the number of canceled threads.

dataset jointhreads is an alias to this command.

Example:

dataset jointhreads $dh
dataset cancelthreads $dh [lindex [dataset get $dh threads] 0]
dataset jointhreads

The first example waits for all threads on the specified dataset to finish. The second command waits for the completion of one specific thread, and the last command waits for all threads on all currently defined datasets.

dataset cast

dataset cast datasethandle dataset/ens/reaction/table ?propertylist?

Transform the dataset into a different object. Depending on the target object class, the result is as follows:

If the optional property list is specified, an attempt is made to compute the listed properties before the cast operation, so that they may become a part of the new object. No error is raised if a computation fails.

The command returns the handle of the new object. If the target class is dataset, it returns the handle of the input dataset.

dataset clear

dataset clear dhandle

Delete all objects in the dataset, but keep the dataset object. The return value is the number of deleted objects.

dataset count

dataset count dhandle|remotehandle ?filterlist?

Get the number of objects in the dataset. If the filter parameter is specified, only those objects which pass the filter are counted.

Example:

dataset count $dhandle pstereoatom

counts the number of ensembles or reactions in the dataset with one or more potential atom stereo centers.

dataset size is an alias to this command.

This command can be used with remote datasets.

In case a simple count on a local dataset is required, without any filters, the dataset size can also be queried as attribute, as in

set n [dataset get $dhandle size]

dataset create

dataset create ?objecthandlelist?...

This command creates a new dataset and returns the handle of the new dataset. If the optional object handle lists are provided as arguments, the specified objects (in case of ensemble, reaction, network or table handles), or elements of the object (for a dataset handle, with default accept flags) are moved to the new dataset. In case the accept flags of the target dataset are configured to allow datasets as primary dataset objects, the source dataset argument is not implicitly replaced by its content objects but added as a single object, retaining its objects as content. Otherwise, the source dataset is emptied but remains a valid object.

Besides handles of ensembles, reactions, networks, tables and datasets, which are identified with priority, any string which can be decoded in an ens create statement is also allowed as member initialization identifier.

If the create statement references objects which are not usually accepted by the default settings of the accept table attribute, that attribute is automatically adjusted to allow for these objects.

The command always returns the handle of the new dataset, never the handles of any objects which may have been placed into the dataset.

Examples:

dataset create [list $eh1 $eh2] $dh1

creates a new dataset and moves the two specified ensembles $eh1 and $eh2, as well as everything contained in the dataset $dh1, into the new dataset.

dataset create VXPBDCBTMSKCKZ

The command above matches a partial InChI key. It puts into the new dataset all structures from the NCI resolver whose full InChI key matches it in the non-stereo/isotope-specific part.

set ::cactvs(lookupmode) "name_pattern"
dataset create [list "+morphine +methyl"]

This command performs a name pattern lookup and puts all structures from the NCI resolver which contain both name fragments in one of their known names into the dataset. The name pattern string needs to be explicitly packed into a list, because otherwise it would be split into two independent list elements.

dataset dataset

dataset dataset dhandle ?filterlist?

Get the handle of the container dataset the dataset is a member of. If the dataset is not itself a dataset member, or does not pass all of the optional filters, an empty string is returned.

This command is not equivalent to dataset datasets!

dataset datasets

dataset datasets dhandle ?filterset? ?filtermode? ?recursive?

Return a list of all the datasets that are members in the dataset identified by the command argument handle. Other objects (ensembles, reactions, tables, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.

If the recursive flag is set, and the dataset contains other datasets as objects, datasets in these nested datasets are also listed.

This command is not equivalent to the dataset dataset command!

Example:

set dlist [dataset datasets $dhandle]

dataset defined

dataset defined dhandle property

This command checks whether a property is defined for the dataset. This is explained in more detail in the section about property validity checking. Note that this is not a check for the presence of property data! The dataset valid command is used for this purpose.

dataset delete

dataset delete ?datasethandlelist/all?...

This command destroys datasets and everything contained therein. The special handle value all may be used to delete all datasets in the application at once.

The command returns the number of datasets which were successfully deleted.

Transient datasets cannot be used with this command. Neither can datasets which are a component of another object, e.g. the internal datasets of tables or factories. These are deleted only, and automatically, when their parent object is destroyed. Datasets which are a property value cannot be deleted by this command either.

It is a common programming error to delete a dataset, or its parent object if one exists, without protecting its current member ensembles or reactions. If they are still needed in later processing, they must be explicitly moved into another dataset, or out of the dataset, before deletion.

Examples:

dataset delete all
dataset move $dhandle {}; dataset delete $dhandle

The first example destroys all datasets defined in the current script and everything contained in them. The second example shows how to delete a dataset and preserve its contents by moving all dataset elements out prior to deletion.

dataset dget

dataset dget dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset dget is that dataset dget does not attempt to compute property data. If the data is not yet available, it initializes the property values to the default and returns that default. For data already present, dataset get and dataset dget are equivalent.
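Here is a minimal sketch of that difference. It assumes the property data of one object is kept in a Tcl dict. The procedures model_get, model_dget and compute_demo, the property name X_DEMO and its values are hypothetical and are not toolkit commands or properties.

```tcl
# Hypothetical model, not the toolkit API: $dataVar names a dict holding
# the property data present on one object.
proc model_get {dataVar prop compute} {
    upvar 1 $dataVar data
    if {![dict exists $data $prop]} {
        # get: try to compute missing data; an error from the callback
        # propagates to the caller.
        dict set data $prop [{*}$compute $prop]
    }
    return [dict get $data $prop]
}

proc model_dget {dataVar prop defaults} {
    upvar 1 $dataVar data
    if {![dict exists $data $prop]} {
        # dget: no computation; the default is stored and returned.
        dict set data $prop [dict get $defaults $prop]
    }
    return [dict get $data $prop]
}

# Stand-in compute function for the example.
proc compute_demo {prop} { return 42 }

set d1 {}
model_get d1 X_DEMO compute_demo        ;# computes and returns 42
set d2 {}
model_dget d2 X_DEMO {X_DEMO 0}         ;# returns the default 0
```

Once the default has been stored, a later model_get on the same dict returns the stored value without computing. This matches the statement that both commands behave identically for data already present.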

dataset dup

dataset dup dhandle ?targethandle? ?cleartarget?

If the optional arguments are not supplied, the dataset is duplicated together with all data attached to it and all objects contained in it. The command returns a new dataset handle. All duplicated objects in the new dataset are also assigned handles, which can be obtained with commands such as dataset list $dhandle.

It is possible to specify a target dataset as an optional argument. In that case, no new dataset is created, and dataset-level property data on the source dataset is not copied. All objects in the source dataset are duplicated and appended to the end of the target dataset. In case the boolean target clearance flag is set, which is also the default if the parameter is omitted, the target dataset is cleared before the new objects from the source dataset are added. In this command variant, the return value of the command is the target dataset handle.

Examples:

dataset dup $dhandle
dataset dup [list $eh1 $eh2] $dtarget 0
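The interplay of the optional target and clearance arguments can be modeled with plain Tcl lists standing in for datasets. The procedure model_dup is hypothetical. It copies list values instead of duplicating objects, and it does not model dataset-level property data.

```tcl
# Hypothetical model: datasets as plain Tcl lists of member values, so that
# "duplicating" a member is simply copying its value.
proc model_dup {source {targetVar ""} {cleartarget 1}} {
    if {$targetVar eq ""} {
        # No target: the result stands for a new dataset with copied members.
        return $source
    }
    upvar 1 $targetVar target
    # The clearance flag defaults to true, as documented above.
    if {$cleartarget} {
        set target {}
    }
    # Copies are appended to the end of the target dataset.
    lappend target {*}$source
    # Stands in for returning the target dataset handle.
    return $targetVar
}

set t {x}
model_dup {a b} t 0     ;# keeps x, appends a and b
```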

dataset ens

dataset ens dhandle ?filterset? ?filtermode? ?recursive?

Return a list of all the ensembles in the dataset. Other objects (reactions, tables, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.

If the optional boolean recursive argument is set, ensembles which are a component of a reaction in the dataset are also listed. Furthermore, if the dataset contains datasets as elements, these are recursively traversed, and ensembles in these, as well as ensembles in reactions in these datasets, are listed. If the output mode of the command is a handle list, items found by recursion are appended to the result list in a straight fashion, without the creation of nested lists. By default the recursion flag is off. Regardless of the flag value, ensembles which are associated with rows of a table in the dataset, but are not themselves dataset members, are not output.

Example:

set elist [dataset ens $dhandle astereogenic]

lists those ensembles in the dataset which have one or more atoms which are potential atom stereo centers.

set cnt [dataset ens $dhandle {} count 1]

returns a count of all ensembles that are direct members of the dataset, components of reactions in the dataset, or contained in datasets which are themselves members of the primary dataset.
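The traversal rules of the recursive mode can be sketched over a nested list model in which each member is a {type payload} pair. The procedure model_ens and this representation are hypothetical. A reaction payload is a list of ensemble names, and a dataset payload is again a member list.

```tcl
# Hypothetical model of dataset ens: members are {type payload} pairs.
# Results found by recursion are appended flat, without nested lists.
proc model_ens {items recursive} {
    set result {}
    foreach item $items {
        lassign $item type payload
        switch -- $type {
            ens {
                lappend result $payload
            }
            reaction {
                # Reaction components are listed only in recursive mode.
                if {$recursive} {
                    lappend result {*}$payload
                }
            }
            dataset {
                # Nested datasets are traversed; their results are appended flat.
                if {$recursive} {
                    lappend result {*}[model_ens $payload 1]
                }
            }
            default {
                # Tables and networks are skipped, including ensembles
                # associated with table rows.
            }
        }
    }
    return $result
}

set ds {{ens e1} {reaction {e2 e3}} {table t1} {dataset {{ens e4} {reaction {e5}}}}}
model_ens $ds 1     ;# e1 e2 e3 e4 e5
```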

dataset exists

dataset exists dhandle

Check whether this dataset exists. The command returns a boolean value. This command cannot be used with transient datasets.

Example:

dataset exists $dhandle

dataset expr

dataset expr dhandle expression

Compute a standard SQL-style property expression for the dataset. This is explained in detail in the chapter on property expressions.

dataset extract

dataset extract dhandle propertylist ?filterset? ?filterprocs?

This command is rather complex and closely related to the dataset xlabel command. It was designed for the efficient extraction of major or minor object data for filtered subsets of the dataset.

The property list parameter selects the property data which is extracted. Multiple properties may be specified, but they can only be associated with major objects and one arbitrary minor object class. So it is possible to simultaneously extract an ensemble and an atom property, but not an atom and a bond property.

The return value is a nested list of data items for every object which is encountered while traversing the dataset on the level of the minor object associated with the extraction property, or just ensembles or other major objects if no such property is selected. Every list element is itself a list which contains the extracted property values in the order they are named in the property list parameter.

The objects for which data is returned can further be filtered by a standard filter set, and additionally by a list of filter procedures. These Tcl script procedures are called with the respective object handles and object labels as arguments. For example, a callback function used in an atom retrieval context would be called for each atom with its ensemble handle and the atom label as arguments. If major objects without a label are checked, such as complete ensembles, 1 is passed as the label. The callback procedures are expected to return a boolean value. If it is false or 0, the object is not added to the returned list, and the other check procedures are no longer called.

The command currently only works on ensembles in the dataset, ignoring any reactions, tables, datasets or networks which may be present.

Because this command is primarily intended for numerical data display, the returned values are formatted as with the nget command, i.e. instead of enumerated values the underlying numerical values are returned.

Example:

set dhandle [dataset create [ens create CO] [ens create CN]]
dataset extract $dhandle [list E_NAME A_SYMBOL] !hydrogen

This example first creates a dataset with methanol and methylamine. The second line performs the actual extraction and returns

{CH4O C} {CH4O O} {CH5N C} {CH5N N}

This kind of extracted data is useful for displaying filtered property values of atoms and other minor objects.
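The callback protocol for the filter procedures of dataset extract can be sketched as follows. Each procedure receives the major object handle and a label, and a false result stops further checks. The procedure passes_filterprocs and the example callbacks are hypothetical and are not part of the toolkit.

```tcl
# Hypothetical sketch of the filter procedure protocol: each procedure is
# called with the major object handle and the minor object label (1 for a
# major object without a label) and must return a boolean.
proc passes_filterprocs {procs handle label} {
    foreach p $procs {
        if {[{*}$p $handle $label]} {
            continue
        }
        # A false result rejects the object; later procedures are skipped.
        return 0
    }
    return 1
}

# Example callback: accept only even labels.
proc even_label {handle label} { expr {$label % 2 == 0} }
passes_filterprocs {even_label} ens1 2     ;# 1
```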

dataset forget

dataset forget dhandle ?objectclass?

This command is essentially the same as the ens forget (or reaction forget, etc.) command. It is applied to all objects in the dataset.

If the object class is dataset , all dataset-level property data is deleted.

dataset get

dataset get dhandle propertylist ?filterset? ?parameterlist?
dataset get dhandle attribute

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

In addition to retrieving property data, it can also be used to query dataset attributes. The set of supported attributes is detailed in the paragraph on the dataset set command.

Examples:

dataset get $dhandle {D_NAME D_SIZE}

yields the name and size of the dataset as a list. If the information is not yet available, an attempt is made to compute it. If the computation fails, an error results.

dataset get $dhandle [list E_FORMULA E_WEIGHT]

gives the formula and molecular weight of all dataset ensembles. The result is delivered as a nested list: the first list contains the formulas, the second list the weights.

Currently, this command (and the other retrieval command variants) cannot use filters which operate on objects lower in the hierarchy, such as ensembles or atoms, rather than directly on the dataset object.

For the use of the optional property parameter list argument, refer to the documentation of the ens get command.

Variants of the dataset get command are dataset new, dataset dget, dataset nget, dataset show, dataset sqldget, dataset sqlget, dataset sqlnew and dataset sqlshow.

dataset getparam

dataset getparam dhandle property ?key? ?default?

Retrieve a named computation parameter from valid property data. If the key is not present in the parameter list, an empty string is returned. If a default value is set, that value is returned in case the key is not found.

If the key parameter is omitted, a complete set of the parameters used for computation of the property value is returned in key/value format.

This command does not attempt to compute property data. If the specified property is not present, an error results.

Example:

dataset getparam $dhandle E_GIF format

returns the actual format of the image, which could be GIF, PNG, or various bitmap formats.
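The key lookup rules of dataset getparam can be modeled on a plain key/value dict. The procedure model_getparam and the parameter values below are hypothetical. In this sketch, an empty key stands in for an omitted key argument.

```tcl
# Hypothetical model: the computation parameters of a property as a dict.
proc model_getparam {params {key ""} {default ""}} {
    if {$key eq ""} {
        # No key: return the complete key/value parameter set.
        return $params
    }
    if {[dict exists $params $key]} {
        return [dict get $params $key]
    }
    # Missing key: the default, which is an empty string unless given.
    return $default
}

set params {format GIF height 100}
model_getparam $params format        ;# GIF
model_getparam $params width 200     ;# 200, key not present
```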

dataset hadd

dataset hadd dhandle ?filterset? ?flags? ?changeset?

Add a standard set of hydrogens to all ensembles and reactions in the dataset. If the filterset parameter is specified, only those atoms which pass the filter set are processed.

Additional operation flags may be activated by setting the flags parameter to a list of flag names, or a numerical value representing the bit-ored values of the selected flags. By default, the flag set is empty, corresponding to the use of an empty string or none as parameter value. These flags are currently supported:

Adding hydrogens with this command is less destructive to the property data set of the ensembles or reactions than adding them with individual atom create/bond create commands, because many properties are defined to be indifferent to explicit hydrogen status changes, but are invalidated if the structure is changed in other ways.

If the effects of the hydrogen addition step on the validity of the property data set should not be handled by this standard procedure, additional property invalidation events can be generated explicitly. To do so, specify a list as the optional last parameter, for example a list of atom and bond to trigger both the atom change and bond change events.

The command returns the total number of hydrogens added to all ensembles and reactions in the dataset.

Example:

dataset hadd $dhandle

dataset hread

dataset hread dhandle ?datasethandle|enshandle? ?#recs|batch|all?

This command provides the same functionality as dataset read, but additionally adds a standard set of hydrogen atoms to the duplicate objects which are read.

The command arguments are explained in the section on dataset read.

dataset hstrip

dataset hstrip dhandle ?flags? ?changeset?

This command removes hydrogens from the dataset ensembles and reactions. By default, all hydrogen atoms in the dataset ensembles or reactions are removed.

The flags parameter can be used to make the operation more selective. It may be a list of the following flags:

If the flags parameter is an empty string, or none, it is ignored. The default flag value is wedgetransfer, but this default is overridden if any flags are set!

If the changeset parameter is given, all property change events listed in the parameter are triggered.

Hydrogen stripping is not as disruptive to the ensemble or reaction data content as normal atom deletion. The system assumes that this operation is done as part of some file output or visualization preparation. However, if any new data is computed after stripping, the computation functions see the stripped structure, and proceed to work on that reduced structure without knowledge that there are implicit hydrogens.

Example:

dataset hstrip $dhandle [list keeporiginal wedgetransfer]

dataset index

dataset index dhandle
dataset index dhandle position

This command comes in two variants. The three-word version is the generic command to check dataset membership, which is the same for all objects which can be dataset members. The second version retrieves object references from this dataset.

This first version gets the position of the dataset in the object list of its parent dataset. If the dataset is not part of a parent dataset, -1 is returned. This is the generic dataset membership test command variant.

This second command variant obtains the object handle of the object at the specified position in this dataset. Position counting begins with zero. If the index is outside the object position range, an empty string is returned. The special value end may be used to address the last object. The indexed object remains in the dataset.

Note that this index command is not equivalent to the standard index command on minor objects which is used to obtain the position of the minor object in the minor object list of the controlling major object. This kind of functionality is not needed for major objects, because they are not contained in any minor object list.

Example:

dataset index $dhandle end
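Both variants of dataset index map naturally onto standard Tcl list commands when member lists are modeled as plain lists of handles. The procedures model_position and model_object_at are hypothetical sketches. They rely only on the documented behavior of lsearch and lindex.

```tcl
# Hypothetical model: member lists as plain Tcl lists of handles.
# Variant 1: position of an object in its parent's member list, -1 if it is
# not a member (lsearch returns -1 when nothing matches).
proc model_position {parentMembers obj} {
    return [lsearch -exact $parentMembers $obj]
}

# Variant 2: object at a zero-based position; "end" addresses the last
# member, and lindex yields an empty string for out-of-range positions.
proc model_object_at {members position} {
    return [lindex $members $position]
}

model_object_at {e1 e2 e3} end     ;# e3
```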

dataset jointhreads

dataset jointhreads ?all?
dataset jointhreads dhandle ?all?
dataset jointhreads dhandle threadid..

This is an alias for the dataset cancelthreads command. Please refer to its documentation.

dataset list

dataset list ?dhandle?

Without a handle argument, the command returns a list of the handles of all existing datasets.

If a dataset handle or transient dataset is passed as third argument, the command returns a list of all major objects in the dataset. This function is different from the behavior of the list subcommand for other major object classes, where the optional argument is a filter list.

Examples:

dataset list
dataset list $dhandle

dataset lock

dataset lock dhandle propertylist/dataset/all ?compute?

Lock property data of the dataset, meaning that it is no longer subject to the standard data consistency manager control. The data consistency manager deletes specific property data if anything is done to the dataset which would invalidate the information. Property data remains locked until it is explicitly unlocked.

The property data to lock can be selected by providing a list of the following identifiers:

A lock can be released by a dataset unlock command.

This command does not recurse into the objects contained in the dataset.

The return value is the dataset handle or, if the dataset was transient, an empty string.

dataset loop

dataset loop dhandle objvar ?maxrec? ?offset? body

Loop over the elements in a dataset. This command is similar to molfile loop. On each iteration, the variable is set to the handle of the current member object, and then the body code is executed. The variable refers to the original dataset element, not a duplicate. This is different from dataset read.

All operations on the current loop item are allowed, including deletion. However, the next object after the current item must not be deleted or moved, because it is needed for the iteration process.

If a maximum record count is set, the loop terminates after the specified number of iterations. If the maximum record argument is set to an empty string, a negative value, or all, the loop covers all dataset elements. This is also the default.

Within the loop, the standard Tcl break and continue commands work as expected. If the body script generates an error, the loop is exited.

If no offset is specified, the loop starts at the first element. Within the loop body, the dataset attribute record is continuously updated to indicate the current loop position. Its value starts with one, like file records in the molfile loop command.

Example:

dataset loop $dh eh {
	puts "[ens get $eh E_NAME] at position [ens index $eh]"
}
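The loop semantics described above can be modeled in plain Tcl with uplevel. These cover the maximum record count, the offset, break, continue and error handling. The procedure model_loop is a hypothetical sketch over a Tcl list, not the toolkit implementation. It assumes a zero-based offset and keeps only an iteration counter starting at one, not the record attribute itself.

```tcl
# Hypothetical model of the dataset loop control flow over a plain Tcl list.
# The offset is assumed to be a zero-based member index.
proc model_loop {members varName maxrec offset body} {
    upvar 1 $varName obj
    # An empty, negative or "all" maximum record count means no limit.
    if {$maxrec eq "" || $maxrec eq "all" || $maxrec < 0} {
        set maxrec [llength $members]
    }
    set record 0
    foreach obj [lrange $members $offset end] {
        if {$record >= $maxrec} {
            break
        }
        # Iteration counter, starting at one like the record attribute.
        incr record
        set code [catch {uplevel 1 $body} result]
        if {$code == 3} {
            # break in the body ends the loop
            break
        } elseif {$code != 0 && $code != 4} {
            # errors and other exceptional codes leave the loop
            return -code $code $result
        }
        # codes 0 (ok) and 4 (continue) proceed with the next member
    }
    return $record
}

set seen {}
model_loop {a b c d} x 2 1 {lappend seen $x}     ;# visits b and c
```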

dataset max

dataset max dhandle propertylist ?filterset?

Get the maximum value of one or more properties from the elements in the dataset. The property argument may be any property attached to dataset members, or minor objects thereof. If the filterset argument is specified, the maximum value is searched only for objects which pass the filter set.

Examples:

dataset max $dhandle E_WEIGHT
dataset max [list $ehandle1 $ehandle2] A_SIGMA_CHARGE carbon

The first example finds the highest molecular weight in the dataset. The second example finds the largest (most positive) Gasteiger partial charge on any carbon atom in the two argument ensembles, which form a transient dataset.
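The filtered maximum search can be sketched over a list of dicts, each holding the property values of one element. The procedures model_max and is_carbon and the charge values are made up for illustration. The empty-string result for an empty selection is an assumption of this sketch.

```tcl
# Hypothetical model: each element is a dict of property values, and the
# optional filter is a command prefix returning 0 or 1 for an element.
proc model_max {elements prop {filter ""}} {
    set values {}
    foreach el $elements {
        # Skip elements rejected by the filter.
        if {$filter ne "" && ![{*}$filter $el]} {
            continue
        }
        if {[dict exists $el $prop]} {
            lappend values [dict get $el $prop]
        }
    }
    # Nothing matched: this sketch returns an empty string.
    if {[llength $values] == 0} {
        return ""
    }
    return [tcl::mathfunc::max {*}$values]
}

proc is_carbon {el} { expr {[dict get $el element] eq "C"} }
set atoms {{element C charge 0.12} {element H charge 0.5} {element C charge -0.05}}
model_max $atoms charge is_carbon     ;# 0.12
```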

dataset metadata

dataset metadata dhandle property field ?value?

Obtain property metadata information, or set it. The handling of property metadata is explained in more detail in its own introductory section. The related commands dataset setparam and dataset getparam can be used for convenient manipulation of specific keys in the computation parameter field. Metadata can only be read from or set on valid property data.

Examples:

array set gifparams [dataset metadata $dhandle D_GIF parameters]
dataset metadata $dhandle D_QUALITY comment "This value looks suspicious to me"

The first line retrieves the computation parameters of the property D_GIF as keyword/value pairs. These are read into the array variable gifparams, and may subsequently be accessed as $gifparams(format), $gifparams(height), etc. The second example shows how to attach a comment to a property value.

dataset min

dataset min dhandle propertylist ?filterset?

Get the minimum value of one or more properties from the elements in the dataset. The property argument may be any property attached to dataset members, or minor objects thereof. If the filterset argument is specified, the minimum value is searched only for objects which pass the filter set.

Examples:

dataset min $dhandle E_WEIGHT
dataset min [list $ehandle1 $ehandle2] A_SIGMA_CHARGE carbon

The first example finds the smallest molecular weight in the dataset. The second example finds the smallest (most negative, or smallest positive) Gasteiger partial charge on any carbon atom in the two argument ensembles, which form a transient dataset.
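The effect of the filter set in dataset max and dataset min can be pictured with plain Tcl data. The following is a minimal sketch, not the toolkit implementation: atoms are modeled as dicts with hypothetical element and charge keys, the filter set is reduced to a simple element test, and the proc name filtered_minmax is hypothetical.

```tcl
# Hypothetical model: atoms are dicts with "element" and "charge" keys,
# and the filter set is reduced to a simple element-symbol test.
# Returns {min max} over the atoms that pass the filter, or an empty
# list if none do (the real command's behaviour in that case is not
# described here).
proc filtered_minmax {atoms element} {
    set vals {}
    foreach atom $atoms {
        if {[dict get $atom element] eq $element} {
            lappend vals [dict get $atom charge]
        }
    }
    if {[llength $vals] == 0} {
        return {}
    }
    return [list [tcl::mathfunc::min {*}$vals] [tcl::mathfunc::max {*}$vals]]
}

# Atoms of two ensembles merged into one list, as in a transient dataset
set atoms {
    {element C charge -0.05}
    {element O charge -0.40}
    {element C charge 0.12}
}
lassign [filtered_minmax $atoms C] lo hi
puts "carbon charge range: $lo .. $hi"
```

The oxygen atom carries the most negative charge, but because it does not pass the filter, the reported minimum is that of the first carbon atom.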

dataset molfile

dataset molfile dhandle ?filterset?

Return the handle of the molfile object associated with the dataset as its backing page file. If no such file object exists, an empty string is returned.

Example:

set fh [dataset molfile $dh]
set fh [dataset get $dh pagefile]

The two commands are equivalent.

dataset move

dataset move dhandle datasethandle|remotehandle ?position?

Move either the objects in the dataset (or transient dataset) into another local or remote dataset, or move the dataset itself, depending on the acceptance flags of the destination dataset. If the destination dataset handle is an empty string, the dataset objects are removed from the original dataset, but not moved into any other dataset. If the destination dataset accepts datasets as members, which is not the default (see the accept attribute in the section on dataset set), the dataset is moved directly as an object. Otherwise, its contained objects are moved, preserving their order from the source dataset, and the source dataset is emptied, but not deleted.

Optionally, a position in the new dataset for the first moved object may be specified. This parameter is either an index (beginning with 0), or end, which is the default. If the contents of a dataset are spliced into another at a specific position, objects after the first element of the source dataset follow as a block.

Another special position value is random. It moves the dataset, or the dataset contents, to a random position in the target dataset. This mode is currently not supported with remote datasets.

In case of a transient command dataset the original dataset memberships of the dataset objects are not restored when the command completes.

The return value of the command is the original parent dataset of the command dataset, as it existed before the move. Usually, it is an empty string.

A dataset cannot be moved into itself.

Examples:

dataset move $dhandle $dhandle2 0
dataset move $dhandle {}
dataset move [ens list] [dataset create]

The first line moves all objects in the source dataset into the first (and following) positions in the destination dataset. The second example removes all elements from the dataset. This is often useful in order to avoid dataset member destruction with the dataset delete command. The final example shows how to move a set of ensembles (here: all ensembles currently defined in the application) into a newly created dataset via an intermediate, transient dataset.

dataset move $dhandle vioxx@server55:10001

This command moves all objects in the first dataset to the remote dataset on host server55, which listens on port 10001 and requires the pass phrase vioxx for access.
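The block behaviour of the position argument can be pictured with ordinary Tcl lists. This is a sketch under the assumption that a dataset is just an ordered list of handles; splice_block is a hypothetical helper, not a toolkit command, and the handle strings are placeholders.

```tcl
# Hypothetical helper (not a toolkit command): datasets are modeled as
# plain Tcl lists of handle strings. All source items are inserted as
# one block, starting at pos, which is a zero-based index or "end".
proc splice_block {target source {pos end}} {
    return [linsert $target $pos {*}$source]
}

set target {t0 t1 t2}
set source {s0 s1 s2}
puts [splice_block $target $source 1]   ;# t0 s0 s1 s2 t1 t2
puts [splice_block $target $source]     ;# t0 t1 t2 s0 s1 s2
```

Only the first source object lands at the requested index; the remaining ones follow it in their original order, and the former occupants of the target shift back.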

dataset mutex

dataset mutex dhandle mode

Manipulate the object mutex. During the execution of a script command, the mutexes of the major object(s) associated with the command are automatically locked and unlocked, so that the operation of the command is thread-safe. This applies to builds that support multi-threading, either by allowing multiple parallel script interpreters in separate threads or by supporting helper threads for the acceleration of command execution or background information processing. This command can lock a major object for a period that spans more than a single command. A lock on the object can only be released from the same interpreter thread that set the lock. Any other threaded interpreters, or auxiliary threads, that access a locked object block until a mutex release command has been executed. This command supports the following modes:

There is no trylock command variant because the command already needs to be able to acquire a transient object mutex lock for its execution.

dataset need

dataset need dhandle propertylist ?mode?

Standard command for the computation of property data, without immediate retrieval of results. This command is explained in more detail in the section about retrieving property data.

If the dataset is not transient, the return value is the dataset handle.

Example:

dataset need $dhandle D_GIF recalc

dataset networks

dataset networks dhandle ?filterset? ?filtermode? ?recursive?

Return a list of all the networks in the dataset. Other objects (ensembles, reactions, datasets, tables) are ignored. The object list may optionally be filtered by the filter list, and the result further modified by a standard filter mode argument.

If the recursive flag is set, and the dataset contains other datasets as objects, networks in these nested datasets are also listed.

Example:

set n [dataset networks $dhandle {} count]
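The recursive flag can be illustrated with a small pure-Tcl traversal. The sketch assumes a hypothetical nested-list model in which each member is a type/handle pair and a nested dataset carries its own member list; collect_members is a hypothetical name, and the toolkit's internal traversal may differ.

```tcl
# Hypothetical model: a member is a {type handle} pair, except that a
# nested dataset is written as {dataset {member member ...}}.
proc collect_members {members type {recursive 0}} {
    set result {}
    foreach member $members {
        lassign $member mtype payload
        if {$mtype eq $type} {
            lappend result $payload
        } elseif {$mtype eq "dataset" && $recursive} {
            # Nested hits are appended one by one, so no sublists appear
            lappend result {*}[collect_members $payload $type 1]
        }
    }
    return $result
}

set ds {{network n1} {ens e1} {dataset {{network n2} {table t1}}} {network n3}}
puts [collect_members $ds network]     ;# n1 n3
puts [collect_members $ds network 1]   ;# n1 n2 n3
```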

dataset new

dataset new dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset new is that the latter forces the re-computation of the property data, regardless of whether it is present and valid.

dataset nget

dataset nget dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data and attributes. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset nget is that the latter always returns numeric data, even if symbolic names for the values are available.

dataset nitrostyle

dataset nitrostyle dhandle style

Change the internal encoding of nitro groups and similar functional groups in the ensembles and reactions in the dataset. Possible values for the style parameter are:

The command returns the dataset handle.

dataset objects

dataset objects dhandle ?pattern?

This is a non-standard cross-referencing command. The result is a list of all the objects in the dataset, where each result list element is a list consisting of the object type (ens, reaction, table, network, dataset) and the object handle. Optionally, the listed objects may be filtered by the pattern argument.

Example:

dataset objects $dhandle ens*

is roughly equivalent to

dataset ens $dhandle

except that the latter only lists the ensemble handles, not pairs of object class name and handle.

dataset pack

dataset pack dhandle ?maxsize? ?requestlist? ?suppresslist?

Pack the dataset and all objects it contains into a base-64 encoded, compressed string as a serialized object. The string does not contain any non-printing characters, quotation marks or other problematic characters and is thus well suited for storage in database tables and similar applications. These packed strings are portable and platform-independent.

By default, all property data on the dataset and its member objects are stored. By providing a request list of properties which are computed if they are not yet present, and/or a list of properties not to store, the data content may be customized.

The maxsize parameter limits the length of the packed string to a maximum number of bytes. The default value is 128K bytes. If the string would be longer, an error is generated.

The return value of this command is the packed string.

Example:

dataset pack $dhandle
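The combination of compression, base-64 encoding and a size limit can be demonstrated with the zlib and binary encode commands of Tcl 8.6. This sketch does not reproduce the toolkit's serialization format; the payload is an arbitrary string, pack_string and unpack_string are hypothetical names, and the default limit of 131072 assumes that 128K means 128 times 1024.

```tcl
# Illustration only: this is NOT the toolkit's serialization format.
# It shows compression plus base-64 encoding (Tcl 8.6 zlib and binary
# encode) and an error when the result exceeds maxsize characters.
# The default of 131072 assumes that "128K" means 128 * 1024.
proc pack_string {payload {maxsize 131072}} {
    set bytes [encoding convertto utf-8 $payload]
    set packed [binary encode base64 [zlib compress $bytes]]
    if {[string length $packed] > $maxsize} {
        error "packed string exceeds $maxsize bytes"
    }
    return $packed
}

proc unpack_string {packed} {
    set bytes [zlib decompress [binary decode base64 $packed]]
    return [encoding convertfrom utf-8 $bytes]
}

set text [string repeat "C6H6 benzene " 100]
set packed [pack_string $text]
puts [string length $packed]
puts [expr {[unpack_string $packed] eq $text}]   ;# 1
```

The base-64 alphabet consists only of letters, digits, +, / and =, which is why such strings contain no quotation marks or non-printing characters.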

dataset pop

dataset pop dhandle|remotehandle ?position? ?timeout?

Remove an object from a dataset. The handle of the selected object is returned, and the object is no longer a member of the dataset when the command completes. If a timeout is specified, it is transferred to the dataset attribute of the same name before the command is executed, as with a dataset set command.

By default, the first object in the dataset, at index zero, is returned. A different object can be selected by means of the optional position argument. It can be a numerical index, or end for the last object. If the object index is larger than the maximum index of any object, it is silently rewritten to end.

This command works with remote datasets. In that case, the object is transferred via an intermediate serialized object representation over the network. It is unpacked on the local interpreter, and deleted on the remote interpreter.

If the desired dataset object cannot be found, and a timeout is set, including a negative value for an unlimited wait time, the command suspends execution until the object appears in the dataset, for example from a different script thread or as result of a remote object insertion. If a wait would be executed, but the eod/targeteod parameter pair of the dataset indicates that no further data can be expected, the command returns an empty string instead of the object handle, but does not trigger an error. Otherwise, if the object cannot be delivered immediately or after the timeout, an error results.

Example:

set eh [dataset pop $dhandle end]
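The position handling of dataset pop can be sketched with a plain Tcl list. This covers only the local case without a timeout, where an empty dataset results in an error; pop_member is a hypothetical name, and the real command additionally supports waiting and remote datasets.

```tcl
# Hypothetical model: the dataset is a plain list held in a variable.
# Only the no-timeout case is modeled: an empty dataset is an error.
proc pop_member {listVar {pos 0}} {
    upvar 1 $listVar members
    if {[llength $members] == 0} {
        error "no object available"
    }
    # Indices beyond the last member are silently rewritten to "end"
    if {$pos ne "end" && $pos > [llength $members] - 1} {
        set pos end
    }
    set item [lindex $members $pos]
    set members [lreplace $members $pos $pos]
    return $item
}

set ds {e1 e2 e3}
puts [pop_member ds]      ;# e1
puts [pop_member ds 99]   ;# e3
puts $ds                  ;# e2
```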

dataset properties

dataset properties dhandle ?pattern? ?intersect/union?

Get a list of valid properties of the dataset proper and of the dataset objects. By default, both dataset properties (prefix D_) and the properties of the objects in the dataset (prefix E_ for ensembles, X_ for reactions, T_ for tables, N_ for networks, D_ for datasets as members), as well as the properties of their minor objects (atoms, bonds, etc.), are listed. Property subsets may be selected by specifying a string filter pattern. For dataset element properties which are not present on all dataset members, the default mode is union, meaning that every property is reported for which at least a single instance exists on any member. The alternative mode intersect only lists those dataset element properties which are present on all dataset members.

This command may also be invoked as dataset props.

Example:

dataset properties $dhandle D_*
dataset props $dhandle E_* intersect

The first example returns a list of the currently valid dataset-level properties. The second example lists ensemble properties which are present in all dataset objects.
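The difference between the union and intersect modes amounts to counting on how many members each property name occurs. The sketch below assumes that each member is represented only by its list of valid property names and treats the filter as a glob pattern; member_props is a hypothetical name, and dataset-level properties are not modeled.

```tcl
# Hypothetical model: each member is represented by the list of property
# names currently valid on it; the pattern is treated as a glob pattern.
proc member_props {memberProps {pattern *} {mode union}} {
    set counts [dict create]
    foreach props $memberProps {
        foreach p [lsort -unique $props] {
            if {[string match $pattern $p]} {
                dict incr counts $p
            }
        }
    }
    set n [llength $memberProps]
    set result {}
    dict for {p c} $counts {
        # union: present on at least one member; intersect: on all members
        if {$mode eq "union" || $c == $n} {
            lappend result $p
        }
    }
    return [lsort $result]
}

set members {{E_NAME E_WEIGHT A_XY} {E_NAME E_IDENT}}
puts [member_props $members E_*]             ;# E_IDENT E_NAME E_WEIGHT
puts [member_props $members E_* intersect]   ;# E_NAME
```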

dataset purge

dataset purge dhandle propertylist ?emptyonly?

Delete property data from the dataset. The properties may be either dataset properties (prefix D_) or properties of the dataset members, such as ensemble or atom properties. If a property marked for deletion is not present on an object, it is silently ignored.

Besides normal property names, a few convenient alias names for common property deletion tasks of ensembles in a dataset, or the reaction ensembles of reactions in the dataset, are defined and can be used as a replacement for the property list. These include:

The optional boolean flag emptyonly restricts the deletion to those properties where all the values for a property associated with a major object (such as on all atoms in an ensemble for atom properties, or just the single ensemble property value for ensemble properties) are set to the default property value.

Examples:

dataset purge $dhandle D_GIF
dataset purge [ens list] E_IDENT 1
dataset purge $dhandle stereochemistry

The first example deletes the property data D_GIF for the selected dataset if it is present. The second example deletes property E_IDENT from all ensembles in the current application if their property value is equal to the default value of E_IDENT. The third example removes stereochemistry from all dataset ensembles.

dataset reactions

dataset reactions dhandle ?filterset? ?filtermode? ?recursive?

Return a list of all the reactions in the dataset. Other objects (ensembles, tables, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.

If the optional boolean recursive argument is set, reactions of which ensembles in the dataset are a component are also listed. Furthermore, if the dataset contains datasets as elements, these are recursively traversed, and reactions in these, as well as reactions with ensembles in these datasets as components, are listed. If the output mode of the command is a handle list, items found by recursion are appended to the same flat list, without creating nested lists. By default, the recursion flag is off. Regardless of the flag value, reactions which are associated with rows of a table in the dataset, but are not themselves dataset members, are not output.

Example:

set xlist [dataset reactions $dhandle]

Return a list of the handles of the reactions in the dataset.

set cnt [dataset reactions $dhandle {} count 1]

returns a count of all reactions which are either directly members of the dataset, or indirectly included because ensembles in the dataset are part of a reaction, or which are contained in datasets that are themselves members of the primary dataset.

dataset read

dataset read dhandle ?datasethandle/enshandle? ?#recs|batch|all?

This command returns duplicates of one or more objects from the current dataset iterator position (record attribute). Its arguments mimic those of the molfile read command. The iterator record attribute is automatically incremented. When the end of the dataset is reached, an empty result is returned, but no error is raised.

The return value is usually the handle of the object duplicated from the dataset member at the current read position. If an optional target dataset has been specified, the object is appended to that dataset, and the return value is the target dataset handle. It is also possible to use the magic dataset handles new or #auto, which create a new receptor dataset.

If an existing target ensemble is specified instead of a target dataset, the recipient ensemble is cleared, and the read dataset object is placed into its hull without changing its handle. This requires that the read object is an ensemble, and not a reaction, table, dataset or network, and that only a single item is read. It is also possible to use an empty argument to skip these options.

By default, a single object is duplicated and the iterator record attribute of the dataset incremented by one. With the optional third argument, a different number of objects can be selected for reading as a block. The special value all reads all remaining objects, and batch copies a number of objects corresponding to the batchsize dataset attribute. If there are insufficient objects in the dataset to read all requested records, only the available set is returned, and no error results.

The dataset contents are not changed by this command. All extracted items are object duplicates. In order to fetch original objects from the dataset, use the dataset pop command, or the various object move commands.
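The iterator arithmetic of dataset read can be sketched with a plain list and a one-based record counter. This is a model under stated assumptions: the member list stands in for the dataset, the batch keyword is not modeled, and read_block is a hypothetical name. Because Tcl lists are values, the returned block is naturally a copy and the member list stays unchanged.

```tcl
# Hypothetical model: members is a plain list and recordVar names a
# variable holding the one-based iterator position (like the record
# attribute). count is a number or "all"; "batch" is not modeled.
proc read_block {members recordVar {count 1}} {
    upvar 1 $recordVar record
    set first [expr {$record - 1}]
    if {$count eq "all"} {
        set last end
    } else {
        set last [expr {$first + $count - 1}]
    }
    # lrange silently truncates at the end of the list, so a short
    # read returns only what is available and no error is raised
    set block [lrange $members $first $last]
    incr record [llength $block]
    return $block
}

set members {a b c d e}
set record 1
puts [read_block $members record 2]    ;# a b
puts [read_block $members record 10]   ;# c d e
puts [read_block $members record]      ;# prints an empty line
puts $record                           ;# 6
```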

The command variant dataset hread provides the same functionality as this command, but additionally adds a standard set of hydrogen atoms to the duplicates.

dataset rename

dataset rename dhandle srcproperty dstproperty

This is a variant of the dataset assign command. Please refer to the description of that command.

dataset request

dataset request dhandle propertylist ?reload? ?modelist?

Request property data for a dataset when the dataset is not maintained locally, but a partial shadow copy of a remotely managed dataset. It is assumed to have been only partially transferred via RPC to a slave from a master controller application, for example for display purposes, but without the full data content, which resides on the master.

If the requested property data is already present on the slave, and the reload flag is not set, this command is equivalent to a dataset need command and does not invoke communication with the master. Otherwise, the master is asked to provide the information, which may be calculated on the master only after receiving the request, or even delegated by the master to another remote server for computation.

Once the requested data has been received by the slave, it is added to the property data set of the local dataset copy. The optional modelist parameter is the same as in the dataset need command. This command is used to guarantee that critical or non-computable property data is obtained from the master. Local, unsynchronized data may still be computed by the slave using standard property data access commands. It is currently not possible to send data back to the master.

This command is only available on toolkit versions which have been compiled with RPC support.

In the absence of errors, the command returns a boolean status code. If it is zero, the request failed in a non-critical way. This for example happens in case the dataset is not under control of a remote application.

Example:

if {![dataset request $dhandle A_XY]} {
	dataset need $dhandle A_XY
}

is a bulletproof method of guaranteeing that correct 2D atomic display coordinates are present for the dataset structures, even if the script is run in a master/slave context.

dataset rewind

dataset rewind dhandle

Reset the dataset iterator record. This is equivalent to setting the record attribute to one.

dataset scan

dataset scan dhandle expression ?mode? ?parameters?

Perform a query on the dataset or transient dataset. The syntax of the query expression is the same as that of the molfile scan command and explained in more detail in its section on query expressions. Essentially, this command behaves like an in-memory data file version of the molfile scan command. However, currently queries work on ensembles and reactions as dataset members only. Any table, network or other object which is a member of a scanned dataset is skipped. Skipped items still count as records for positioning and query result output. In the absence of a specified scan record list (order parameter), dataset scans begin at the current position of the iterator record attribute that is shared with the dataset read/hread commands.

The optional parameter dictionary is the same as for molfile scan, but not all parameters are actually used. At this time, only the matchcallback, maxhits, maxscan, order, progresscallback, progresscallbackfrequency, sscheckcallback, startposition and target parameters have an effect. If result ensembles or reactions are transferred to a remote dataset via the target parameter, they are not deleted from the local dataset; duplicates are sent instead. This is because the original objects are members of the dataset which, just like a structure file, should remain unchanged as a result of a scan. In file scans, by contrast, the transferred ensembles and reactions are read from the file and created as new objects during the scan, so sending them does not change the underlying file. If a progress callback function is used, the dataset handle is passed as argument in place of the molfile handle used in molfile scan.

The return value depends on the mode. The default mode is enslist . The following modes are supported for dataset queries:

If requested property data is not present on the matched dataset objects, an attempt is made to compute it. If this fails, the table object in retrieval mode table contains NULL cells, and property retrieval as list data produces empty list elements, but no errors. For minor object properties, the property list retrieval modes produce lists of all object property values instead of a single value. In table mode, only the data for the first object is retrieved, which makes this mode less suitable for direct minor object property retrieval.

The following pseudo properties can be retrieved in addition to normal properties:

These pseudo properties are identical to those available for structure file queries. However, structure file queries support a couple of additional pseudo properties which are not available for dataset queries.

Examples:

dataset scan $dhandle {E_WEIGHT < 200} recordlist
dataset scan $dhandle "structure >= c1ccccc1" {table E_NAME E_LOGP record}
dataset scan $dhandle "structure >~ $sshnd 90" {cmpvalue E_REACTION_ROLE X_IDENT}

The first example returns the record numbers (dataset member indices plus one) of all structures in the dataset which have a molecular weight of less than 200.

The second example generates a table with columns for name, logP and record number. The table is filled with data from all structures which contain a phenyl ring as a substructure.

The final example returns a nested list of the properties of all dataset structures which have a Tanimoto similarity of 90% or more to the structure whose handle is stored in the variable $sshnd. In this example, the ensembles are expected to be also part of a reaction, which is possible since reaction and dataset membership are completely unrelated. Each result list element contains the actual similarity value (which is the only comparison result value with a threshold evaluated in the query, so there is no ambiguity about which comparison result cmpvalue refers to), the role of the ensemble in the reaction (reagent, product, catalyst, etc.) from property E_REACTION_ROLE, and the reaction ID in X_IDENT. The scan mode is here automatically set to propertylist, because the mode list consists exclusively of names of properties and pseudo properties.

Another example:

set is_chno [dataset scan $ehandle {formula = C0-H0-N0-O0-} count]

This command checks whether the ensemble (which is, for the duration of the command, embedded into a transient dataset) contains only elements C, H, N and O.
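The rule that skipped members still count as records, so that record numbers remain member indices plus one, can be sketched in plain Tcl. The model is hypothetical: members are type/value pairs, the query is reduced to a numeric comparison, scan_below is an invented name, and the start position taken from the iterator record attribute is not modeled.

```tcl
# Hypothetical model: members are {type value} pairs and the query is
# "value < limit". Only ens and reaction members are tested, but every
# member advances the record counter, so the reported record numbers
# stay equal to the member index plus one.
proc scan_below {members limit} {
    set records {}
    set record 0
    foreach member $members {
        incr record
        lassign $member type value
        if {$type ni {ens reaction}} {
            continue
        }
        if {$value < $limit} {
            lappend records $record
        }
    }
    return $records
}

puts [scan_below {{ens 150} {table 0} {ens 250} {ens 120}} 200]   ;# 1 4
```

The table member at record 2 is never tested, even though its value would satisfy the comparison, yet the last ensemble is still reported as record 4.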

dataset set

dataset set dhandle property value ?property value?...

Standard data manipulation command. It is explained in more detail in the section about setting property data.

In addition to property data, the dataset object possesses a few attributes, which can be retrieved with the get command (but not with its related sister subcommands like dget, sqlget, etc.). Many of them are also modifiable via dataset set. These attributes are:

Examples:

dataset set $dhandle D_NAME "New lead structures"
dataset set $dhandle E_NAME "Lead (metal)"

The first line is a simple set operation for a dataset property. The second line shows how to set properties of multiple ensembles in one step. The same property value is assigned to all ensembles.

dataset set $dhandle port 10001 passphrase blockbuster

Set up a listener thread on port 10001 which accepts connections from remote interpreters that present the pass phrase as a credential. Remote interpreters can add objects to this dataset (ens move, reaction move, table move) or remove them (dataset pop), as well as query the dataset object count (dataset count). Objects are transferred over the network connection as serialized objects to and from the remote interpreters.

dataset setparam

dataset setparam dhandle property key value ?key value?...

Set or update a property computation parameter in the parameter list of a valid property. This command is described in the section about retrieving property data.

Example:

dataset setparam $dhandle D_GIF comment "Top Secret"

dataset show

dataset show dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset show is that the latter does not attempt computation of property data, but raises an error if the data is not present and valid. For data already present, dataset get and dataset show are equivalent.

dataset sort

dataset sort dhandle {property ?direction? ?cmpflags?}...

Sort a dataset according to property values of the objects in the dataset. If no sort property set is specified, the default sort properties are E_NATOMS (number of atoms) and, for breaking ties, E_WEIGHT (molecular weight) and finally E_HASHISY (stereo isotope hash code).

Every sort item is interpreted as a nested list and can have from one to three elements. The first, mandatory element is the sort property, or one of the magic names record or random. The next optional element is the sort direction, specified as up (or ascending) or down (descending). The default sorting order is ascending. The final optional comparison flags parameter can be set to a combination of any of the values allowed with the prop compare command. The default is an empty flag set. Properties in the sort list have precedence in the order in which they are specified. Object property values of comparison list entries to the right in this list are only considered if the comparison of all data values of list elements to the left results in a tie.

The magic property name record sorts by the object index in the dataset. Sorting upwards on this property does not change the object sequence in the dataset, and sorting downwards reverses it. This pseudo property is always added as a final implicit criterion, so that the sequence order of objects tied in all explicit comparisons is preserved. The other magic property name random assigns a random value to all dataset objects and sorts on this value, yielding a random object sequence.

The command returns a list of the handles of the objects controlled by the dataset in the newly sorted order. Simultaneously, the objects are physically moved within the dataset, so the sort has a persistent effect. The same object sequence can later be obtained with a dataset objects command.

It is possible to sort transient datasets, but this is only useful if the object sequence returned as the command result is captured and used later. The sort effect is not persistent, because no permanent dataset object exists.

Examples:

dataset sort $dhandle {E_NAME up {ignorecase lazy}}

The example sorts the dataset according to the compound name (property E_NAME, data type string) in alphabetic order, using a lazy (ignoring whitespace and punctuation) and case-insensitive comparison mode.

dataset sort $dhandle {E_NATOMS down} {E_NRINGS up}

Sort the dataset in such a way that the ensembles with the largest number of atoms, and among these those with the smallest number of rings, come first.

dataset sort $dhandle random

This command randomizes the object order in the dataset.

dataset sort $dhandle {*}$sortlist

This is the recommended construct when the sort property list is stored in a Tcl variable. Older versions of the dataset sort command used a single sort argument instead of a variable-size argument set.
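The precedence rules of the sort items, including the implicit final record criterion, can be modeled in pure Tcl with lsort -command. In this sketch, objects are dicts of property values, only plain value comparison is modeled (no comparison flags, no random key), and sort_objects and compare_objects are hypothetical names rather than toolkit commands.

```tcl
# Hypothetical model: each object is a dict of property values. Sort keys
# are {property ?direction?} items, evaluated left to right; the original
# position ("record") is the implicit final ascending key, so that ties
# keep their existing order. Comparison flags are not modeled.
proc sort_objects {objects args} {
    # Pair every object with its zero-based record index
    set indexed {}
    set i 0
    foreach obj $objects {
        lappend indexed [list $i $obj]
        incr i
    }
    set sorted [lsort -command [list compare_objects $args] $indexed]
    set result {}
    foreach item $sorted {
        lappend result [lindex $item 1]
    }
    return $result
}

proc compare_objects {keys a b} {
    lassign $a ia oa
    lassign $b ib ob
    foreach key $keys {
        lassign $key prop dir
        set va [dict get $oa $prop]
        set vb [dict get $ob $prop]
        if {$va == $vb} continue
        set c [expr {$va < $vb ? -1 : 1}]
        if {$dir in {down descending}} {
            set c [expr {-$c}]
        }
        return $c
    }
    # Implicit final key: original record order, ascending
    return [expr {$ia < $ib ? -1 : ($ia > $ib ? 1 : 0)}]
}

set objs {
    {name a E_NATOMS 10 E_NRINGS 2}
    {name b E_NATOMS 12 E_NRINGS 1}
    {name c E_NATOMS 12 E_NRINGS 0}
    {name d E_NATOMS 10 E_NRINGS 2}
}
set names {}
foreach obj [sort_objects $objs {E_NATOMS down} {E_NRINGS up}] {
    lappend names [dict get $obj name]
}
puts $names   ;# c b a d
```

Objects a and d tie on both explicit keys, so the record criterion keeps them in their original order.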

dataset sqldget

dataset sqldget dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The differences between dataset get and dataset sqldget are that the latter does not attempt computation of property data, but initializes the property value to the default and returns that default, if the data is not present and valid; and that the SQL command variant formats the data as SQL values rather than for Tcl script processing.

dataset sqlget

dataset sqlget dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset sqlget is that the SQL command variant formats the data as SQL values rather than for Tcl script processing.

dataset sqlnew

dataset sqlnew dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The differences between dataset get and dataset sqlnew are that the latter forces re-computation of the property data, and that the SQL command variant formats the data as SQL values rather than for Tcl script processing.

dataset sqlshow

dataset sqlshow dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The differences between dataset get and dataset sqlshow are that the latter does not attempt computation of property data, but raises an error if the data is not present and valid, and that the SQL command variant formats the data as SQL values rather than for Tcl script processing.

dataset statistics

dataset statistics dhandle property

Get basic statistics on the property values of the objects in the dataset. The property can be a basic property or a property subfield, but its element data type needs to be cast-able to a simple numeric type. In addition, it must be directly attached to any of the objects which can be members of a dataset, e.g. an ensemble property, but not an atom property.

If the property data is not present on any of the objects, an attempt is made to compute it. In case that fails, or a dataset member object is not of a matching type, these objects are silently skipped.

The return value is a list containing, in this order, the number of objects in the dataset which were used for the statistics, the property value sum, the property value average and the property data standard deviation. The latter three values are floating point, regardless of the property data type. In case any of these values cannot be computed, for example because there were too few objects, the reported value is zero.

The command verb can be abbreviated as stats.

Example:

lassign [dataset statistics $dh E_WEIGHT] n sum avg stddev
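The four values in the result list can be reproduced for a plain list of numbers. This is a sketch under two assumptions that the description above does not settle: non-numeric entries stand in for skipped objects, and the standard deviation uses the sample form (divisor n - 1), reported as zero with fewer than two values. The proc name basic_stats is hypothetical.

```tcl
# Sketch of the four reported values for a plain list of numbers.
# Assumptions: non-numeric entries stand in for skipped objects, and the
# standard deviation uses the sample form (divisor n - 1), reported as
# zero when fewer than two values are available. Which form the toolkit
# uses is not stated in this manual.
proc basic_stats {values} {
    set nums {}
    foreach v $values {
        if {[string is double -strict $v]} {
            lappend nums $v
        }
    }
    set n [llength $nums]
    set sum 0.0
    foreach v $nums {
        set sum [expr {$sum + $v}]
    }
    set avg 0.0
    set stddev 0.0
    if {$n > 0} {
        set avg [expr {$sum / $n}]
    }
    if {$n > 1} {
        set sq 0.0
        foreach v $nums {
            set sq [expr {$sq + ($v - $avg) ** 2}]
        }
        set stddev [expr {sqrt($sq / ($n - 1))}]
    }
    return [list $n $sum $avg $stddev]
}

lassign [basic_stats {2 4 4 4 5 5 7 9}] n sum avg stddev
puts "$n $sum $avg"   ;# 8 40.0 5.0
puts $stddev
```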

dataset subcommands

dataset subcommands

Lists all subcommands of the dataset command. Note that this command does not require a dataset handle.

dataset tables

dataset tables dhandle ?filterset? ?filtermode? ?recursive?

Return a list of all the tables in the dataset. Other objects in the dataset (ensembles, reactions, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the result further modified by a standard filter mode argument.

If the recursive flag is set, and the dataset contains other datasets as objects, tables in these nested datasets are also listed.

Example:

set n [dataset tables $dhandle {} count]

dataset taint

dataset taint dhandle propertylist/changeset ?purge?

Trigger a property data tainting event which acts on the dataset data, and all objects and their data contained in the dataset.

The parameters of this command are the same as for ens taint and explained there.

Example:

dataset taint $dhandle A_XYZ

All property data on the dataset and the dataset members is invalidated if it directly or indirectly depends on the 3D atomic coordinates.

dataset transform

dataset transform dhandle SMIRKSlist ?direction? ?reactionmode?
	?selectionmode? ?flags? ?overlapmode? ?{?exclusionmode? excludesslist}? ?maxstructures? ?timeout? ?maxtransforms? ?niterations?

This command is complex, but very similar to the ens transform command. Please refer to that command for a full description of the command arguments. The major difference is that the start structure set is not a single ensemble, but rather the set of all ensembles in the dataset. Any dataset items which are not ensembles are ignored. The return value is, just as with the ens transform command, a list of result ensembles. These do not become part of the input dataset.

Example:

dataset transform [ens get $ehandle E_KEKULESET] $trafolist bidirectional \
	multistep all {preservecharges checkaro setname}

This example first expands an ensemble into a set of Kekulé structures. Since the data type of the E_KEKULESET property is a dataset, the property value is a dataset handle, and this dataset is then submitted for further transformation. The transformations in this case manipulate bonds in aromatic systems and therefore depend on the Kekulé structures of the input ensembles.

The dataset variant of the transform command does not allow the use of marked or unmarked atom or bond specifications in the exclusion substructure list. Normal substructures are supported, and are applied to all start structures.

dataset unique

dataset unique dhandle {property ?direction? ?cmpflags?}...

This command removes duplicate objects from the dataset and destroys them. Object identity is determined by pair-wise comparison of one or more properties. If all these properties are identical for two objects, one of them is deleted. If no properties are specified, the default is the single property E_HASHISY, the standard isotope- and stereo-aware ensemble hash code.

The command returns the ordered list of objects remaining in the dataset after deletion. The command is closely related to the dataset sort command, and the same restrictions on usable sort properties apply. Internally, the command sorts the objects first, in order to avoid a quadratic growth in the number of pair-wise comparisons. As a side effect, the original object order in the dataset is not preserved. Instead, the surviving objects are ordered by ascending (the default) or descending values of the sort properties, as selected by the optional sort direction arguments. The interpretation of the optional comparison flags and sort direction arguments, the priority of the properties, and the special considerations when working on transient datasets are the same as for the dataset sort command.
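The up-front sort places equal keys next to each other, so duplicates can be found in one linear pass after an O(n log n) sort instead of by O(n^2) pair-wise comparisons. A plain-Tcl sketch of this idea follows. The name unique_model and the {handle key} pairs are hypothetical, only a single string key is compared, and which of two duplicates survives is an assumption: the model keeps the first one in sorted order, relying on the stability of lsort.

```tcl
# Hypothetical model of sort-based duplicate removal. Each object is a
# {handle key} pair; "key" stands in for the compared property value.
proc unique_model {objects} {
    # Sorting brings equal keys together (lsort is a stable merge sort).
    set sorted [lsort -index 1 $objects]
    set kept {}
    set deleted {}
    set first 1
    set prevkey ""
    foreach obj $sorted {
        lassign $obj handle key
        # Only the neighbor in sorted order needs to be compared.
        if {!$first && $key eq $prevkey} {
            lappend deleted $handle
        } else {
            lappend kept $handle
        }
        set prevkey $key
        set first 0
    }
    return [list $kept $deleted]
}

lassign [unique_model {{e0 B} {e1 A} {e2 B} {e3 C}}] kept deleted
puts "kept: $kept, deleted: $deleted"
```

The surviving handles come back in key order rather than input order, which mirrors the side effect described above.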

Example:

molfile read $fh $dh all
dataset unique $dh

This example first reads a complete file into a dataset and then discards duplicates, using the default isotope- and stereo-aware structure hash code.

dataset unlock

dataset unlock dhandle propertylist/dataset/all

Unlock property data of the dataset object, so that it is again under the control of the standard data consistency manager.

The property data to unlock can be selected by providing a list of the following identifiers:

Property data locks are obtained by the dataset lock command.

This command does not recurse into the objects contained in the dataset.

The return value is the dataset handle, or, if the argument was a transient dataset, an empty string.

dataset unpack

dataset unpack string

Generate a dataset, complete with all the objects it contains, from a packed, base64-encoded serialized object string as produced by the complementary dataset pack command.

The return value is the handle of the new dataset. All objects in this dataset are also assigned standard handles, which can be retrieved with the usual commands such as dataset ens and dataset reactions.

Note that this command does not take a dataset handle as argument, but a pack string.

Example:

dataset unpack [dataset pack $dhandle]

This example is effectively the same as a dataset dup operation, but less efficient, because the objects must be serialized, compressed and base64-encoded, and the same sequence of operations must then be run in reverse.
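The overhead described above comes from a pipeline of serialization, compression and base64 encoding. The following plain-Tcl sketch illustrates such a round trip with the standard Tcl 8.6 zlib and binary encode commands. It is only an analogy: the toolkit's actual pack format is not described here, the procedure names pack_model and unpack_model are hypothetical, and a plain string stands in for a serialized dataset.

```tcl
# Illustration only: a generic serialize/compress/encode round trip,
# not the toolkit's pack format.
package require Tcl 8.6-

proc pack_model {text} {
    # Serialize (here: UTF-8 bytes), compress, then base64-encode.
    return [binary encode base64 [zlib compress [encoding convertto utf-8 $text]]]
}

proc unpack_model {packed} {
    # Run the same steps in reverse order.
    return [encoding convertfrom utf-8 [zlib decompress [binary decode base64 $packed]]]
}

set original "ens0 ens1 ens2"
set restored [unpack_model [pack_model $original]]
puts [expr {$restored eq $original}]
```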

dataset valid

dataset valid dhandle propertylist

Returns a list of boolean values indicating whether values for the named properties are currently set for the dataset. No attempt at computation is made.

Example:

dataset valid $dhandle D_NAME

reports whether the dataset is named (has a valid D_NAME property) or not.

dataset wait

dataset wait dhandle ?size|query? ?script?

Suspend the interpreter until the number of objects in the dataset has reached a threshold, or until an object that satisfies a query expression is found. The syntax of query expressions is the same as in the dataset scan command. If neither an explicit size nor a query expression is specified, or an empty string is passed as this parameter, the command uses the value of the highwatermark dataset attribute as an implicit size threshold.

Another dataset attribute that influences the execution of the command is the timeout attribute. If, after waiting for the timeout number of seconds, the dataset has not grown to the required size, or no object satisfying the query expression has been added to the dataset, an error is raised. By default, the maximum wait period is indefinite, which corresponds to a negative timeout value. If the timeout value is set to zero, the wait condition must be met immediately, or an error results. However, no error is raised if the eod/targeteod dataset parameter pair indicates that no more data is expected to be added to the dataset. In that case, the result is an empty string.

If no script body parameter is used, the return value of the command is the number of objects the dataset holds in case of an explicit or implicit size condition, or the handle of the first matching object in case of a query expression.

If the object count has already reached the threshold, or a matching object can be found at the moment the command is executed, the command returns immediately.

If a script parameter is supplied, the script body is executed whenever the wait condition is met. If the script ends with a continue statement, or simply reaches the end of the code block, the wait loop is automatically restarted. If the script raises an error, or is left via a break or return statement, the loop is terminated.

This command is mostly useful in multi-threaded scripts, or when the dataset operates a remote command listener on a port. Under these circumstances, new objects may arrive in the dataset without any action by the local, suspended interpreter.

While a dataset wait command is pending, the dataset cannot be deleted. Since other threads or remote action port monitors may update the dataset between the moment the wait condition is met and the start of script processing, action scripts should be prepared to find more or fewer items in the dataset than were present at the trigger event.
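The loop-control rules for the script body map directly onto Tcl completion codes (0 ok, 1 error, 2 return, 3 break, 4 continue). The following pure-Tcl sketch models them under stated assumptions: wait_loop_model is a hypothetical name, and the blocking wait on the dataset is replaced by a condition command that is re-evaluated on each pass, so only the control flow is shown, not threading or timeout handling. The text states only that an error terminates the loop; re-raising it to the caller is an assumption of the model.

```tcl
# Hypothetical model of a wait loop with a script body.
# Not the toolkit's implementation.
proc wait_loop_model {conditionCmd body} {
    while {1} {
        # Stand-in for the blocking wait: stop when the condition
        # command reports that no further trigger will occur.
        if {![uplevel 1 $conditionCmd]} {
            return
        }
        set code [catch {uplevel 1 $body} msg opts]
        switch -- $code {
            0 - 4 {
                # End of body or continue: restart the wait.
            }
            3 - 2 {
                # break or return: terminate the loop.
                return
            }
            default {
                # Error: terminate the loop and re-raise (assumption).
                return -options $opts $msg
            }
        }
    }
}

set queue {a b c d}
set seen {}
wait_loop_model {expr {[llength $queue] > 0}} {
    set item [lindex $queue 0]
    set queue [lrange $queue 1 end]
    if {$item eq "b"} { continue }   ;# skip, keep waiting
    if {$item eq "c"} { break }      ;# leave the loop
    lappend seen $item
}
```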

Example:

loop n 1 $nrecs {
	set eh [dataset wait $dh "E_FILE(startrec) = $n"]
	molfile write $fh $eh
	ens delete $eh
}

This is part of a simple write thread that writes processed ensembles back in the same order in which they were read from an input file. With multiple processing threads, the computation on an ensemble read from a later input record may well finish before that on an ensemble from an earlier record, so the ensembles arrive in the output queue out of sequence. By waiting for the ensembles in input record order, the script preserves the original order. A more robust version of such a script should handle the case that the ensemble from a specific input record never appears in the dataset, and similar sources of disruption.
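The example above relies on dataset wait with a query to pick up records in order. A common alternative technique, shown here only for comparison and not taken from the toolkit, is a reorder buffer that holds early arrivals until their predecessors show up. The sketch is plain Tcl; reorder_model and the {recno payload} pair format are hypothetical. It also reports records left stuck behind a gap, which corresponds to the disruption case mentioned above.

```tcl
# Hypothetical model: results arrive out of order as {recno payload}
# pairs and are emitted strictly in ascending record order, starting at 1.
proc reorder_model {arrivals} {
    set pending [dict create]
    set next 1
    set out {}
    foreach item $arrivals {
        lassign $item recno payload
        dict set pending $recno $payload
        # Flush every record that is now contiguous with the output.
        while {[dict exists $pending $next]} {
            lappend out [dict get $pending $next]
            dict unset pending $next
            incr next
        }
    }
    # Record numbers still pending here never saw a predecessor arrive.
    return [list $out [dict keys $pending]]
}

lassign [reorder_model {{2 b} {1 a} {4 d} {3 c}}] out stuck
puts "written: $out, stuck: $stuck"
```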

dataset weed

dataset weed dhandle keywords

This command performs standard clean-up operations on all ensembles and reactions in the dataset. The supported operations are described in more detail in the section on the equivalent ens weed command.

The return value of this command is the dataset handle.

dataset xlabel

dataset xlabel dhandle propertylist ?filterset? ?filterprocs?

This command is rather complex and closely related to the dataset extract command. Its purpose is to extract handle and label information for selected subsets of the dataset. The return value is a nested list. The sublists consist of the object handle, the object label (if the object does not have a label, 1 is substituted), and the dataset object index. The dataset object index starts with zero.

The class of objects to extract is selected indirectly via the property list. For practical purposes, this list should contain a single property. Its object association type determines the class of objects selected. For example, A_LABEL or A_SYMBOL returns atom labels, while B_ORDER returns bond labels and E_NAME selects complete ensembles (with 1 as pseudo ensemble label).

The returned objects can be further filtered by a standard filter set, and additionally by a list of callback procedures. These Tcl procedures are called with the respective object handles and object labels as arguments. For example, a callback procedure used in an atom retrieval context is called for each atom with its ensemble handle and the atom label as arguments. If objects without a label are checked, such as complete ensembles, 1 is passed as the label. The callback procedures are expected to return a boolean value. If a procedure returns false, the object is not added to the returned list, and the remaining check procedures are not called for it.
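The callback rules can be modeled in plain Tcl. In this hypothetical sketch, xlabel_filter_model receives candidate triples in the {handle label index} layout of the returned list and applies each callback in turn, stopping at the first false result. label_above_one is an example callback with the documented two-argument signature; presumably it could also be passed as the filterprocs argument of dataset xlabel, but that usage is an assumption and is not exercised here.

```tcl
# Hypothetical model of the xlabel callback chain. Each candidate is
# {handle label index}; each callback receives the handle and the label.
proc xlabel_filter_model {candidates procs} {
    set result {}
    foreach cand $candidates {
        lassign $cand handle label index
        set keep 1
        foreach p $procs {
            # The first false result rejects the object and skips
            # the remaining callbacks.
            if {![$p $handle $label]} {
                set keep 0
                break
            }
        }
        if {$keep} {
            lappend result $cand
        }
    }
    return $result
}

# Example callback (hypothetical): accept only labels greater than 1.
proc label_above_one {handle label} {
    return [expr {$label > 1}]
}

puts [xlabel_filter_model {{ens0 1 0} {ens1 1 1} {ens1 2 1}} label_above_one]
```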

The command currently only works on ensembles in the dataset, ignoring any reactions, tables, datasets or networks which may be present.

This command is primarily useful for the display of filtered minor object data from datasets, such as atom property values for specific types of atoms.

Example:

set dhandle [dataset create [ens create O] [ens create C=C]]
dataset xlabel $dhandle A_LABEL !hydrogen
dataset xlabel $dhandle B_ORDER doublebond

First, a dataset with two ensembles (water and ethene) is created. This dataset is then queried. The first query is for all atoms in it which are not hydrogen. The returned list is

{ens0 1 0} {ens1 1 1} {ens1 2 1}

In object ens0, which is the first object in the dataset, atom 1 passes the filter. In object ens1, which is the second object in the dataset, the atoms with labels 1 and 2 pass. The second query asks for the labels of double bonds in the dataset. The choice of property B_ORDER is arbitrary; any other bond property would serve as well. The return value of this command is

{ens1 1 1}

which indicates that only the bond with label 1 in object ens1, which is the second object in the dataset, fulfills this condition.