Chapter 4: Library Reference

PyTables implements several classes to represent the different nodes in the object tree. They are named File, Group, Leaf, Table, Array, EArray, VLArray and UnImplemented. Another class, AttributeSet, allows the user to complement the information on these objects. Finally, an important class called IsDescription allows a Table record description to be built by declaring a subclass of it. Many other classes are defined in PyTables, but they can be regarded as helpers whose main goal is to declare the data type properties of the different first-class objects; they are described at the end of this chapter as well.

An important function, openFile(), is responsible for creating, opening or appending to files. In addition, a few utility functions are defined to determine whether a user-supplied file is a PyTables or an HDF5 file. These are called isPyTablesFile() and isHDF5File(), respectively. Finally, a function called whichLibVersion() reports the versions of the underlying C libraries (for example, HDF5 or Zlib).

Let's start by discussing the first-level variables and functions available to the user, and then the different classes defined in PyTables.

4.1 tables variables and functions

4.1.1 Global variables

__version__
The PyTables version number.
extVersion
The version of the Pyrex extension module. This might be useful when reporting bugs.
hdf5Version
The underlying HDF5 library version number.

4.1.2 Global functions

copyFile(srcfilename, dstfilename, overwrite=False, **kwargs)

An easy way of copying one PyTables file to another.

This function allows you to copy an existing PyTables file named srcfilename to another file called dstfilename. The source file must exist and be readable. An existing destination file is overwritten in place when the overwrite argument is true.

This function is a shorthand for the File.copyFile() method, which acts on an already opened file. kwargs takes keyword arguments used to customize the copying process. See the documentation of File.copyFile() (see 4.2.2) for a description of those arguments.

isHDF5File(filename)

Determine whether a file is in the HDF5 format.

When successful, it returns a true value if the file is an HDF5 file, false otherwise. If there were problems identifying the file, an HDF5ExtError is raised.

For this function to work, it needs the name of an existing, readable and closed file.

isPyTablesFile(filename)

Determine whether a file is in the PyTables format.

When successful, it returns a true value if the file is a PyTables file, false otherwise. The true value is the format version string of the file. If there were problems identifying the file, an HDF5ExtError is raised.

For this function to work, it needs the name of an existing, readable and closed file.

openFile(filename, mode='r', title='', trMap={}, rootUEP="/", filters=None)

Open a PyTables (or generic HDF5) file and return a File object.

filename
The name of the file (supports environment variable expansion). It is suggested that it should have any of ".h5", ".hdf" or ".hdf5" extensions, although this is not mandatory.
mode
The mode to open the file. It can be one of the following:
'r'
read-only; no data can be modified.
'w'
write; a new file is created (an existing file with the same name would be deleted).
'a'
append; an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+'
is similar to 'a', but the file must already exist.
title
If filename is new, this will set a title for the root group in this file. If filename is not new, the title will be read from disk, and this will not have any effect.
trMap
A dictionary to map names in the object tree Python namespace into different HDF5 names in file namespace. The keys are the Python names, while the values are the HDF5 names. This is useful when you need to use HDF5 node names with invalid or reserved words in Python.
rootUEP
The root User Entry Point. This is a group in the HDF5 hierarchy which will be taken as the starting point to create the object tree. The group has to be named after its HDF5 name and can be a path. If it does not exist, an HDF5ExtError exception is raised. Use this if you do not want to build the entire object tree, but only a subtree of it.
filters
An instance of the Filters class (see section 4.14.1) that provides information about the desired I/O filters applicable to the leaves that hang directly from root (unless other filter properties are specified for those leaves). Besides, if you do not specify filter properties for its child groups, they will inherit these. So, if you open a new file with this parameter set, all the leaves created in the file will recursively inherit these filter properties (unless you prevent that by specifying other filters on the child groups or leaves).

whichLibVersion(name)

Get version information about a C library.

If the library indicated by name is available, this function returns a 3-tuple containing the major library version as an integer, its full version as a string, and the version date as a string. If the library is not available, None is returned.

The currently supported library names are hdf5, zlib, lzo, ucl and bzip2. If another name is given, a ValueError is raised.

4.2 The File class

An instance of this class is returned when a PyTables file is opened with the openFile() function. It offers methods to manipulate (create, rename, delete...) nodes and handle their attributes, as well as methods to traverse the object tree. The user entry point to the object tree attached to the HDF5 file is represented by the rootUEP attribute. Other attributes are also available.

File objects support an Undo/Redo mechanism which can be enabled with the enableUndo() method. Once the Undo/Redo mechanism is enabled, explicit marks (with an optional unique name) can be set on the state of the database using the mark() method. There are two implicit marks which are always available: the initial mark (0) and the final mark (-1). Both the identifier of a mark and its name can be used in undo and redo operations.

Hierarchy manipulation operations (node creation, movement and removal) and attribute handling operations (setting and deleting) made after a mark can be undone by using the undo() method, which returns the database to the state of a past mark. If undo() is not followed by operations that modify the hierarchy or attributes, the redo() method can be used to return the database to the state of a future mark. Else, future states of the database are forgotten.

Please note that data handling operations cannot be undone nor redone for now. Also, hierarchy manipulation operations on nodes that do not support the Undo/Redo mechanism issue an UndoRedoWarning before changing the database.

The Undo/Redo mechanism is persistent between sessions and can only be disabled by calling the disableUndo() method.

4.2.1 File instance variables

filename
The name of the opened file.
format_version
The PyTables version number of this file.
isopen
True if the underlying file is open, false otherwise.
mode
The mode in which the file was opened.
title
The title of the root group in the file.
trMap
A dictionary that maps node names between the PyTables and HDF5 namespaces. Its initial values are set from the trMap parameter passed to the openFile() function. You can change its contents after a file is opened, and the new map will take effect over any new object added to the tree.
rootUEP
The UEP (user entry point) group in the file (see 4.1.2).
filters
Default filter properties for the root group (see section 4.14.1).
root
The root of the object tree hierarchy (a Group instance).
objects
A dictionary which maps path names to objects, for every node in the tree.
groups
A dictionary which maps path names to objects, for every group in the tree.
leaves
A dictionary which maps path names to objects, for every leaf in the tree.

4.2.2 File methods

createGroup(where, name, title='', filters=None)

Create a new Group instance with name name in where location.

where
The parent group the new group will hang from. The where parameter can be a path string (for example "/level1/group5") or another Group instance.
name
The name of the new group.
title
A description for this group.
filters
An instance of the Filters class (see section 4.14.1) that provides information about the desired I/O filters applicable to the leaves that hang directly from this new group (unless other filter properties are specified for those leaves). Besides, if you do not specify filter properties for its child groups, they will inherit these.

createTable(where, name, description, title='', filters=None, expectedrows=10000)

Create a new Table instance with name name in where location.

where
The parent group the new table will hang from. The where parameter can be a path string (for example "/level1/leaf5") or a Group instance.
name
The name of the new table.
description
A user-defined class, derived from the IsDescription class, where table fields are specified. However, in certain situations it is handier to supply this description as a dictionary (for example, when you do not know beforehand what structure your table will have). In such cases, you can pass the description as a dictionary as well. See section 3.3 for an example of use. Finally, a RecArray object from the numarray package is also accepted, and all the information about columns and other metadata is used as a basis to create the Table object. Moreover, if the RecArray has actual data, this is also injected into the newly created Table object.
title
A description for this object.
filters
An instance of the Filters class (see section 4.14.1) that provides information about the desired I/O filters to be applied during the life of this object.
expectedrows
A user estimate of the number of records that will be in the table. If not provided, the default value is appropriate for tables up to about 10 MB in size. If you plan to save bigger tables you should provide a guess; this will optimize the HDF5 B-Tree creation and management process, in both time and memory used. See section 6.1 for a discussion of this issue.

createArray(where, name, object, title='')

Create a new Array instance with name name in where location.

object
The regular array to be saved. Currently accepted values are: lists, tuples, scalars (int and float), strings and (multidimensional) Numeric and NumArray arrays (including CharArrays string arrays). However, these objects must be regular (i.e. they can not be like, for example, [[1,2],2]). Also, objects that have some of their dimensions equal to zero are not supported (use an EArray object if you want to create an array with one of its dimensions equal to 0).

See the createTable description (4.2.2) for more information on the where, name and title parameters.

createEArray(where, name, atom, title='', filters=None, expectedrows=1000)

Create a new EArray instance with name name in where location.

atom
An Atom instance representing the shape, type and flavor of the atomic objects to be saved. One (and only one) of the shape dimensions must be 0. The dimension being 0 means that the resulting EArray object can be extended along it. Multiple enlargeable dimensions are not supported right now. See section 4.13.3 for the supported set of Atom class descendants.
expectedrows
For enlargeable arrays, this represents a user estimate of the number of row elements that will be added along the growable dimension of the EArray object. If not provided, the default value is 1000 rows. If you plan to create EArrays that are much smaller or much bigger, try providing a guess; this will optimize the HDF5 B-Tree creation and management process, in both time and memory used.

See createTable description 4.2.2 for more information on the where, name, title, and filters parameters.

createVLArray(where, name, atom=None, title='', filters=None, expectedsizeinMB=1.0)

Create a new VLArray instance with name name in where location. See the section 4.10 for a description of the VLArray class.

atom
An Atom instance representing the shape, type and flavor of the atomic object to be saved. See section 4.13.3 for the supported set of Atom class descendants.
expectedsizeinMB
A user estimate of the size (in MB) of the final VLArray object. If not provided, the default value is 1 MB. If you plan to create VLArrays that are much smaller or much bigger, try providing a guess; this will optimize the HDF5 B-Tree creation and management process, in both time and memory used.

See createTable description 4.2.2 for more information on the where, name, title, and filters parameters.

getNode(where, name=None, classname=None)

Get the node under where with the given name.

where can be a Node instance or a path string leading to a node. If no name is specified, that node is returned.

If a name is specified, this must be a string with the name of a node under where. In this case the where argument can only lead to a Group instance (else a TypeError is raised). The node called name under the group where is returned.

In both cases, if the node to be returned does not exist, a NoSuchNodeError is raised. Please note that hidden nodes are also considered.

If the classname argument is specified, it must be the name of a class derived from Node. If the node is found but it is not an instance of that class, a NoSuchNodeError is also raised.

getNodeAttr(where, attrname, name=None)

Returns the attribute attrname under where.name location.

where, name
These arguments work as in getNode() (see [here]), referencing the node to be acted upon.
attrname
The name of the attribute to get.

setNodeAttr(where, attrname, attrvalue, name=None)

Sets the attribute attrname with value attrvalue under where.name location. If the node already has a large number of attributes, a PerformanceWarning will be issued.

where, name
These arguments work as in getNode() (see [here]), referencing the node to be acted upon.
attrname
The name of the attribute to set on disk.
attrvalue
The value of the attribute to set. Any scalar (string, ints or floats) attribute is supported natively. However, (c)Pickle is automatically used so as to serialize other kind of objects (like lists, tuples, dicts, small Numeric/numarray objects...) that you might want to save.

delNodeAttr(where, attrname, name=None)

Delete the attribute attrname in where.name location.

where, name
These arguments work as in getNode() (see [here]), referencing the node to be acted upon.
attrname
The name of the attribute to delete on disk.

copyNodeAttrs(where, dstnode, name=None)

Copy the attributes from node where.name to dstnode.

where, name
These arguments work as in getNode() (see [here]), referencing the node to be acted upon.
dstnode
This is the destination node where the attributes will be copied. It can be either a path string or a Node object.

listNodes(where, classname=None)

Returns a list with the child nodes hanging from where. The list is alphanumerically sorted by node name.

where
This argument works as in getNode() (see [here]), referencing the node to be acted upon.
classname
If the name of a class derived from Node is supplied in the classname parameter, only instances of that class (or subclasses of it) will be returned.

removeNode(where, name=None, recursive=False)

Removes the node name under the where location.

where, name
These arguments work as in getNode() (see [here]), referencing the node to be acted upon.
recursive
If not supplied, the object will be removed only if it has no children; if it does, a NodeError will be raised. If supplied with a true value, the object and all its descendants will be completely removed.

copyNode(where, newparent=None, newname=None, name=None, overwrite=False, recursive=False, **kwargs)

Copy the node specified by where and name to newparent/newname.

where, name
These arguments work as in getNode() (see [here]), referencing the node to be acted upon.
newparent
The destination group that the node will be copied to (a path name or a Group instance). If newparent is None, the parent of the source node is selected as the new parent.
newname
The name to be assigned to the new copy in its destination (a string). If newname is None or not specified, the name of the source node is used.
overwrite
Whether the possibly existing node newparent/newname should be overwritten or not. Please note that trying to copy over an existing node without overwriting it will issue a NodeError.
recursive
Specifies whether the copy should recurse into the children of the copied node. This argument is ignored for leaf nodes. The default is not to recurse.
kwargs
Additional keyword arguments may be passed to customize the copying process. The supported arguments depend on the kind of node being copied. The following are some of them:
title
The new title for the destination. If None, the original title is used. This only applies to the topmost node for recursive copies.
filters
Specifying this parameter overrides the original filter properties in the source node. If specified, it must be an instance of the Filters class (see section 4.14.1). The default is to copy the filter attribute from the source node.
copyuserattrs
You can prevent the user attributes from being copied by setting this parameter to False. The default is to copy them.
start, stop, step
Specify the range of rows in child leaves to be copied; the default is to copy all the rows.
stats
This argument may be used to collect statistics on the copy process. When used, it should be a dictionary with keys groups, leaves and bytes having a numeric value. Their values will be incremented to reflect the number of groups, leaves and bytes, respectively, that have been copied in the operation.

renameNode(where, newname, name=None)

Change the name of the node specified by where and name to newname.

where, name
These arguments work as in getNode() (see [here]), referencing the node to be acted upon.
newname
The new name to be assigned to the node (a string).

moveNode(where, newparent=None, newname=None, name=None, overwrite=False)

Move the node specified by where and name to newparent/newname.

where, name
These arguments work as in getNode() (see [here]), referencing the node to be acted upon.
newparent
The destination group the node will be moved to (a path name or a Group instance). If newparent is None, the original node parent is selected as the new parent.
newname
The new name to be assigned to the node in its destination (a string). If newname is None or not specified, the original node name is used.

walkGroups(where='/')

Iterator that returns the Groups (not Leaves) hanging from (and including) where. The where group is listed first (pre-order); then each of its child groups is traversed in alphanumerical order, following the same procedure. If where is not supplied, the root group is used.

where
The origin group. Can be a path string or Group instance.

walkNodes(where="/", classname="")

Recursively iterate over the nodes in the File instance. It takes two parameters:

where
If supplied, the iteration starts from (and includes) this group.
classname
(String) If supplied, only instances of this class are returned.

Example of use:

	      # Recursively print all the nodes hanging from '/detector'
	      print "Nodes hanging from group '/detector':"
	      for node in h5file.walkNodes("/detector"):
	          print node
	    

copyChildren(srcgroup, dstgroup, overwrite=False, recursive=False, **kwargs)

Copy the children of a group into another group.

This method copies the nodes hanging from the source group srcgroup into the destination group dstgroup. Existing destination nodes are replaced when the overwrite argument is true. If the recursive argument is true, all descendant nodes of srcgroup are recursively copied.

kwargs takes keyword arguments used to customize the copying process. See the documentation of Group._f_copyChildren() (see 4.4.2) for a description of those arguments.

copyFile(dstfilename, overwrite=False, **kwargs)

Copy the contents of this file to dstfilename.

dstfilename must be a path string indicating the name of the destination file. If it already exists, the copy will fail with an IOError, unless the overwrite argument is true, in which case the destination file will be overwritten in place.

Additional keyword arguments may be passed to customize the copying process. For instance, title and filters may be changed, user attributes may be or may not be copied, data may be sub-sampled, stats may be collected, etc. Arguments unknown to nodes are simply ignored. Check the documentation for copying operations of nodes to see which options they support.

Copying a file usually has the beneficial side effect of creating a more compact and cleaner version of the original file.

flush()

Flush all the leaves in the object tree.

close()

Flush all the leaves in object tree and close the file.

Undo/Redo support

isUndoEnabled()

Is the Undo/Redo mechanism enabled?

Returns True if the Undo/Redo mechanism has been enabled for this file, False otherwise. Please note that this mechanism is persistent, so a newly opened PyTables file may already have Undo/Redo support.

enableUndo(filters=Filters(complevel=1))

Enable the Undo/Redo mechanism.

This operation prepares the database for undoing and redoing modifications in the node hierarchy. This allows mark(), undo(), redo() and other methods to be called.

The filters argument, when specified, must be an instance of class Filters (see section 4.14.1) and is meant for setting the compression values for the action log. The default is having compression enabled, as the gains in terms of space can be considerable. You may want to disable compression if you want maximum speed for Undo/Redo operations.

Calling enableUndo() when the Undo/Redo mechanism is already enabled raises an UndoRedoError.

disableUndo()

Disable the Undo/Redo mechanism.

Disabling the Undo/Redo mechanism leaves the database in the current state and forgets past and future database states. This makes mark(), undo(), redo() and other methods fail with an UndoRedoError.

Calling disableUndo() when the Undo/Redo mechanism is already disabled raises an UndoRedoError.

mark(name=None)

Mark the state of the database.

Creates a mark for the current state of the database. A unique (and immutable) identifier for the mark is returned. An optional name (a string) can be assigned to the mark. Both the identifier of a mark and its name can be used in undo() and redo() operations. When the name has already been used for another mark, an UndoRedoError is raised.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.

getCurrentMark()

Get the identifier of the current mark.

Returns the identifier of the current mark. This can be used to know the state of a database after an application crash, or to get the identifier of the initial implicit mark after a call to enableUndo().

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.

undo(mark=None)

Go to a past state of the database.

Returns the database to the state associated with the specified mark. Both the identifier of a mark and its name can be used. If the mark is omitted, the last created mark is used. If there are no past marks, or the specified mark is not older than the current one, an UndoRedoError is raised.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.

redo(mark=None)

Go to a future state of the database.

Returns the database to the state associated with the specified mark. Both the identifier of a mark and its name can be used. If the mark is omitted, the next created mark is used. If there are no future marks, or the specified mark is not newer than the current one, an UndoRedoError is raised.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.

goto(mark)

Go to a specific mark of the database.

Returns the database to the state associated with the specified mark. Both the identifier of a mark and its name can be used.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.

4.2.3 File special methods

The following methods automatically trigger actions when a File instance is accessed in a special way.

__contains__(path)

Is there a node with that path?

Returns True if the file has a node with the given path (a string), False otherwise.

__iter__()

Iterate over the children of the File instance. This iterator is recursive and takes no parameters.

Example of use:

	      # Recursively list all the nodes in the object tree
	      h5file = tables.openFile("vlarray1.h5")
	      print "All nodes in the object tree:"
	      for node in h5file:
	          print node
	    

__str__()

Prints a short description of the File object.

Example of use:

>>> f=tables.openFile("data/test.h5")
>>> print f
data/test.h5 (File) 'Table Benchmark'
Last modif.: 'Mon Sep 20 12:40:47 2004'
Object Tree:
/ (Group) 'Table Benchmark'
/tuple0 (Table(100L,)) 'This is the table title'
/group0 (Group) ''
/group0/tuple1 (Table(100L,)) 'This is the table title'
/group0/group1 (Group) ''
/group0/group1/tuple2 (Table(100L,)) 'This is the table title'
/group0/group1/group2 (Group) ''
	    

__repr__()

Prints a detailed description of the File object.

4.3 The Node class

This is the base class for all nodes in a PyTables hierarchy. It is an abstract class, i.e. it may not be directly instantiated; however, every node in the hierarchy is an instance of this class.

A PyTables node is always hosted in a PyTables file, under a parent group, at a certain depth in the node hierarchy. A node knows its own name in the parent group and its own path name in the file. When using a translation map (see 4.2), its HDF5 name might differ from its PyTables name.

All the previous information is location-dependent, i.e. it may change when moving or renaming a node in the hierarchy. A node also has location-independent information, such as its HDF5 object identifier and its attribute set.

This class gathers the operations and attributes (both location-dependent and independent) which are common to all PyTables nodes, whatever their type is. Nonetheless, due to natural naming restrictions, the names of all of these members start with a reserved prefix (see 4.4).

Sub-classes with no children (i.e. leaf nodes) may define new methods, attributes and properties to avoid natural naming restrictions. For instance, _v_attrs may be shortened to attrs and _f_rename to rename. However, the original methods and attributes should still be available.

4.3.1 Node instance variables

Location dependent

_v_file
The hosting File instance (see 4.2).
_v_parent
The parent Group instance (see 4.4).
_v_depth
The depth of this node in the tree (a non-negative integer).
_v_name
The name of this node in its parent group (a string).
_v_hdf5name
The name of this node in the hosting HDF5 file (a string).
_v_pathname
The path name of this node in the tree (a string).
_v_rootgroup
The root group instance. This is deprecated; please use node._v_file.root.

Location independent

_v_objectID
The identifier of this node in the hosting HDF5 file.
_v_attrs
The associated AttributeSet instance (see 4.12).

Attribute shorthands

_v_title
A description of this node. A shorthand for TITLE attribute.

4.3.2 Node methods

Hierarchy manipulation

_f_close()

Close this node in the tree.

This makes the node inaccessible from the object tree. The closing operation is not recursive, i.e. closing a group does not close its children. On nodes with data, it may flush it to disk.

_f_remove(recursive=False)

Remove this node from the hierarchy.

If the node has children, recursive removal must be stated by giving recursive a true value; otherwise, a NodeError will be raised.

_f_rename(newname)

Rename this node in place.

Changes the name of a node to newname (a string).

_f_move(newparent=None, newname=None, overwrite=False)

Move or rename this node.

Moves a node into a new parent group, or changes the name of the node. newparent can be a Group object or a pathname in string form. If it is not specified or None, the current parent group is chosen as the new parent. newname must be a string with a new name. If it is not specified or None, the current name is chosen as the new name.

Moving a node across databases is not allowed, nor is moving a node into itself; both result in a NodeError. However, moving a node over itself is allowed and simply does nothing. Moving over another existing node is similarly not allowed, unless the optional overwrite argument is true, in which case that node is recursively removed before the move.

Usually, only the first argument will be used, effectively moving the node to a new location without changing its name. Using only the second argument is equivalent to renaming the node in place.

_f_copy(newparent=None, newname=None, overwrite=False, recursive=False, **kwargs)

Copy this node and return the new node.

Creates and returns a copy of the node, maybe in a different place in the hierarchy. newparent can be a Group object or a pathname in string form. If it is not specified or None, the current parent group is chosen as the new parent. newname must be a string with a new name. If it is not specified or None, the current name is chosen as the new name. If recursive copy is stated, all descendants are copied as well.

Copying a node across databases is supported but cannot be undone. Copying a node over itself is not allowed, nor is recursively copying a node into itself; both result in a NodeError. Copying over another existing node is similarly not allowed, unless the optional overwrite argument is true, in which case that node is recursively removed before the copy.

Additional keyword arguments may be passed to customize the copying process. For instance, title and filters may be changed, user attributes may be or may not be copied, data may be sub-sampled, stats may be collected, etc. See the documentation for the particular node type.

Using only the first argument is equivalent to copying the node to a new location without changing its name. Using only the second argument is equivalent to making a copy of the node in the same group.

Attribute handling

_f_getAttr(name)

Get a PyTables attribute from this node.

If the named attribute does not exist, an AttributeError is raised.

_f_setAttr(name, value)

Set a PyTables attribute for this node.

If the node already has a large number of attributes, a PerformanceWarning is issued.

_f_delAttr(name)

Delete a PyTables attribute from this node.

If the named attribute does not exist, an AttributeError is raised.

4.4 The Group class

Instances of this class are a grouping structure containing instances of zero or more groups or leaves, together with supporting metadata.

Working with groups and leaves is similar in many ways to working with directories and files, respectively, in a Unix filesystem. As with Unix directories and files, objects in the object tree are often described by giving their full (or absolute) path names. This full path can be specified either as a string (like '/group1/group2') or as a complete object path written in the natural naming schema (like file.root.group1.group2), as discussed in section 1.2.

A collateral effect of the natural naming schema is that the names of Group members must be carefully chosen to avoid colliding with the names of existing child nodes. For this reason, and to avoid polluting the children namespace, it is explicitly forbidden to assign normal attributes to Group instances, and all existing members start with a reserved prefix, like _f_ (for methods) or _v_ (for instance variables). Any attempt to set a new child node whose name starts with one of these prefixes will raise a ValueError exception.

Another effect of natural naming is that nodes having reserved Python names or other non-allowed Python names (such as $a or 44) cannot be accessed using the node.child syntax. You will be forced to use getattr(node, child) and delattr(node, child) to access them.
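For example, with a stand-in object (a plain Python class used here just for illustration, not a real PyTables node), a child named class can only be reached through getattr() and delattr(), since node.class is a syntax error:

```python
class FakeNode(object):
    """Stand-in for a PyTables node (illustration only)."""
    pass

node = FakeNode()

# 'node.class = ...' would be a SyntaxError: 'class' is a reserved word.
setattr(node, 'class', 'a child node')

child = getattr(node, 'class')   # access the oddly-named child
delattr(node, 'class')           # and delete it again
```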

You can also make use of the trMap (translation map dictionary) parameter in the openFile function (see section 4.1.2) in order to translate HDF5 names not suited for natural naming into more convenient ones.

4.4.1 Group instance variables

These instance variables are provided in addition to those in Node (see 4.3).

_v_nchildren
The number of children hanging from this group.
_v_children
Dictionary with all nodes hanging from this group.
_v_groups
Dictionary with all groups hanging from this group.
_v_leaves
Dictionary with all leaves hanging from this group.
_v_filters
Default filter properties for child nodes —see 4.14.1. A shorthand for FILTERS attribute.

4.4.2 Group methods

This class defines the __getattr__ and __delattr__ methods, and they work as normally intended. Please note that __setattr__ should not be used to assign children to a group. Use the node creation methods of File (see 4.2.2) or the node movement methods (move and _f_move) instead. Thus, you can access and delete children of a group using the following constructs:
	      # Add a Table child instance under group with name "tablename"
	      file.createTable(group, 'tablename', recordDict, "Record instance")
	      table = group.tablename     # Get the table child instance
	      del group.tablename         # Delete the table child instance
	    

Caveat: The following methods are documented for completeness, and they can be used without any problem. However, you should prefer the high-level counterpart methods in the File class, because they are the ones most used in documentation and examples, and are a bit more powerful than those exposed here.

These methods are provided in addition to those in Node (see 4.3).

_f_join(name)

Helper method to correctly concatenate the name of a child object with the pathname of this group.
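Conceptually, the concatenation follows POSIX path rules; a stdlib sketch of the documented behavior (not the PyTables implementation itself):

```python
import posixpath

def join_path(group_pathname, child_name):
    # Join a child name onto a group's pathname; the root group's
    # pathname is '/', so children of root get paths like '/child'.
    return posixpath.join(group_pathname, child_name)
```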

_f_copy(newparent, newname, overwrite=False, recursive=False, **kwargs)

Copy this node and return the new one.

This method has the behavior described in Node._f_copy() (see [here]). In addition, it recognizes the following keyword arguments:

title
The new title for the destination. If omitted or None, the original title is used. This only applies to the topmost node in recursive copies.
filters
Specifying this parameter overrides the original filter properties in the source node. If specified, it must be an instance of the Filters class (see section 4.14.1). The default is to copy the filter properties from the source node.
copyuserattrs
You can prevent the user attributes from being copied by setting this parameter to False. The default is to copy them.
stats
This argument may be used to collect statistics on the copy process. When used, it should be a dictionary with keys 'groups', 'leaves' and 'bytes' having a numeric value. Their values will be incremented to reflect the number of groups, leaves and bytes, respectively, that have been copied during the operation.
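The stats dictionary is updated in place, so the same dictionary can be passed to several copy calls to accumulate totals. A sketch of the bookkeeping (record_copy_stats is a hypothetical helper, not part of PyTables):

```python
def record_copy_stats(stats, groups=0, leaves=0, nbytes=0):
    # Increment the caller-supplied dictionary, as the copying
    # machinery does for each node it processes.
    stats['groups'] += groups
    stats['leaves'] += leaves
    stats['bytes'] += nbytes

stats = {'groups': 0, 'leaves': 0, 'bytes': 0}
record_copy_stats(stats, groups=1)              # one group copied
record_copy_stats(stats, leaves=2, nbytes=512)  # two leaves copied
```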

_f_listNodes(classname=None)

Returns a list with all the object nodes hanging from this instance. The list is alphanumerically sorted by node name. If a classname parameter is supplied, only instances of that class (or subclasses of it) are returned.

_f_walkGroups()

Iterate over the list of Groups (not Leaves) hanging from (and including) self. This Group is listed first (pre-order); then each of its child Groups (in alphanumerical order) is traversed, following the same procedure.
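The traversal order can be sketched with nested dictionaries standing in for Group objects (illustration only, not PyTables internals):

```python
def walk_groups(group):
    # Pre-order: yield the group itself first, then recurse into its
    # child groups in alphanumerical order of their names.
    yield group
    for name in sorted(group['groups']):
        for sub in walk_groups(group['groups'][name]):
            yield sub

root = {'name': '/',
        'groups': {'zebra': {'name': 'zebra', 'groups': {}},
                   'apple': {'name': 'apple', 'groups': {}}}}
names = [g['name'] for g in walk_groups(root)]
```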

_f_walkNodes(classname=None, recursive=True)

Iterate over the nodes in the Group instance. It takes two parameters:

classname
(String) If supplied, only instances of this class are returned.
recursive
(Integer) If false, only children hanging immediately after the group are returned. If true, a recursion over all the groups hanging from it is performed.

Example of use:

	      # Recursively print all the arrays hanging from '/'
	      print "Arrays in the object tree '/':"
	      for array in h5file.root._f_walkNodes("Array", recursive=1):
	          print array
	    

_f_close()

Close this node in the tree.

This method has the behavior described in Node._f_close() (see [here]). It should be noted that this operation disables access to nodes descending from this group. Therefore, if you want to explicitly close them, you will need to walk the nodes hanging from this group before closing it.

_f_copyChildren(dstgroup, overwrite=False, recursive=False, **kwargs)

Copy the children of this group into another group.

Children hanging directly from this group are copied into dstgroup, which can be a Group (see 4.4) object or its pathname in string form.

The operation will fail with a NodeError if there is a child node in the destination group with the same name as one of the copied children from this one, unless overwrite is true; in this case, the former child node is recursively removed before copying the latter.

By default, nodes descending from children groups of this node are not copied. If the recursive argument is true, all descendant nodes of this node are recursively copied.

Additional keyword arguments may be passed to customize the copying process. For instance, title and filters may be changed, user attributes may or may not be copied, data may be sub-sampled, stats may be collected, etc. Arguments unknown to nodes are simply ignored. Check the documentation for the copying operations of nodes to see which options they support.

4.4.3 Group special methods

The following methods automatically trigger actions when a Group instance is accessed in a special way.

__contains__(name)

Is there a child with that name?

Returns True if the group has a child node (visible or hidden) with the given name (a string), False otherwise.

__iter__()

Iterate over the children of the group instance. This iterator does not accept parameters and is not recursive.

Example of use:

	      # Non-recursively list all the nodes hanging from '/detector'
	      print "Nodes in '/detector' group:"
	      for node in h5file.root.detector:
	          print node
	    

__str__()

Prints a short description of the Group object.

Example of use:

>>> f=tables.openFile("data/test.h5")
>>> print f.root.group0
/group0 (Group) 'First Group'
>>>
	    

__repr__()

Prints a detailed description of the Group object.

Example of use:

>>> f=tables.openFile("data/test.h5")
>>> f.root.group0
/group0 (Group) 'First Group'
  children := ['tuple1' (Table), 'group1' (Group)]
>>>
	    

4.5 The Leaf class

The goal of this class is to provide a place to put common functionality of all its descendants, as well as to help classify objects in the tree. A Leaf object is an end-node, that is, a node that can hang directly from a group object but that is not a group itself and, thus, cannot have descendants. Right now, the set of end-nodes is composed of Table, Array, EArray, VLArray and UnImplemented class instances. In fact, all the previous classes inherit from the Leaf class.

4.5.1 Leaf instance variables

These instance variables are provided in addition to those in Node (see 4.3).

shape
The shape of data in the leaf.
byteorder
The byte ordering of data in the leaf.
filters
Filter properties for this leaf —see 4.14.1.
name
The name of this node in its parent group (a string). An alias for Node._v_name.
hdf5name
The name of this node in the hosting HDF5 file (a string). An alias for Node._v_hdf5name.
objectID
The identifier of this node in the hosting HDF5 file. An alias for Node._v_objectID.
attrs
The associated AttributeSet instance (see 4.12). An alias for Node._v_attrs.
title
A description for this node. An alias for Node._v_title.

4.5.2 Leaf methods

flush()

Flush pending data to disk.

Saves whatever remaining buffered data to disk.

_f_close(flush=True)

Close this node in the tree.

This method has the behavior described in Node._f_close() (see [here]). Besides that, the optional argument flush tells whether to flush pending data to disk or not before closing.

close(flush=True)

Close this node in the tree.

This method is completely equivalent to _f_close().

remove()

Remove this node from the hierarchy.

This method has the behavior described in Node._f_remove() (see [here]). Please note that there is no recursive flag since leaves do not have child nodes.

copy(newparent, newname, overwrite=False, **kwargs)

Copy this node and return the new one.

This method has the behavior described in Node._f_copy() (see [here]). Please note that there is no recursive flag since leaves do not have child nodes. In addition, this method recognizes the following keyword arguments:

title
The new title for the destination. If omitted or None, the original title is used.
filters
Specifying this parameter overrides the original filter properties in the source node. If specified, it must be an instance of the Filters class (see section 4.14.1). The default is to copy the filter properties from the source node.
copyuserattrs
You can prevent the user attributes from being copied by setting this parameter to False. The default is to copy them.
start, stop, step
Specify the range of rows in child leaves to be copied; the default is to copy all the rows.
stats
This argument may be used to collect statistics on the copy process. When used, it should be a dictionary with keys 'groups', 'leaves' and 'bytes' having a numeric value. Their values will be incremented to reflect the number of groups, leaves and bytes, respectively, that have been copied during the operation.

rename(newname)

Rename this node in place.

This method has the behavior described in Node._f_rename() (see [here]).

move(newparent=None, newname=None, overwrite=False)

Move or rename this node.

This method has the behavior described in Node._f_move() (see [here]).

getAttr(name)

Get a PyTables attribute from this node.

This method has the behavior described in Node._f_getAttr() (see [here]).

setAttr(name, value)

Set a PyTables attribute for this node.

This method has the behavior described in Node._f_setAttr() (see [here]).

delAttr(name)

Delete a PyTables attribute from this node.

This method has the behavior described in Node._f_delAttr() (see [here]).

4.6 The Table class

Instances of this class represent table objects in the object tree. It provides methods to read data from and write data to table objects in the file.

Data can be read from or written to tables by accessing a special object that hangs from Table. This object is an instance of the Row class (see 4.6.4). See the tutorial in chapter 3 for how to use the Row interface. The columns of the tables can also be easily accessed (more specifically, they can be read but not written) by making use of the Column class, through an extension of the natural naming schema applied inside the tables. See section 4.7 for some examples of this capability.

Note that this object inherits all the public attributes and methods that Leaf already has.

4.6.1 Table instance variables

description
The metaobject describing this table.
row
The Row instance for this table (see 4.6.4).
nrows
The number of rows in this table.
rowsize
The size, in bytes, of each row.
cols
A Cols (see section 4.6.5) instance that serves as an accessor to Column (see section 4.7) objects.
colnames
The field names for the table (list).
coltypes
The data types for the table fields (dictionary).
colstypes
The data string-types for the table fields (dictionary).
colshapes
The shapes for the table fields (dictionary).
colindexed
Whether the table fields are indexed (dictionary).
indexed
Whether or not some field in the table is indexed.
indexprops
Properties of an indexed Table (see 4.14.2). This attribute (dictionary) exists only if the Table is indexed.

4.6.2 Table methods

append(rows)

Append a series of rows to this Table instance. rows is an object that can hold the rows to be appended in several formats, like a RecArray, a list of tuples, a list of Numeric/NumArray/CharArray objects, a string, a Python buffer or None (in which case nothing is appended). Of course, this rows object has to be compliant with the underlying format of the Table instance, or a ValueError will be raised.

Example of use:
from tables import *
class Particle(IsDescription):
    name        = StringCol(16, pos=1)   # 16-character String
    lati        = IntCol(pos=2)        # integer
    longi       = IntCol(pos=3)        # integer
    pressure    = Float32Col(pos=4)    # float  (single-precision)
    temperature = FloatCol(pos=5)      # double (double-precision)

fileh = openFile("test4.h5", mode = "w")
table = fileh.createTable(fileh.root, 'table', Particle, "A table")
# Append several rows in only one call
table.append([("Particle:     10", 10, 0, 10*10, 10**2),
              ("Particle:     11", 11, -1, 11*11, 11**2),
              ("Particle:     12", 12, -2, 12*12, 12**2)])
fileh.close()
	      

col(name)

Get a column from the table.

If a column called name exists in the table, it is read and returned as a numarray.NumArray object, or as a numarray.strings.CharArray object (whatever is more appropriate). If it does not exist, a ValueError is raised.

Example of use:

narray = table.col('var2')

That statement is equivalent to:

narray = table.read(field='var2')

Here you can see how this method can be used as a shorthand for the read() (see 4.6.2) method.

iterrows(start=None, stop=None, step=1)

Returns an iterator yielding Row (see section 4.6.4) instances built from rows in table. If a range is supplied (i.e. some of the start, stop or step parameters are passed), only the appropriate rows are returned. Else, all the rows are returned. See also the __iter__() special method in section 4.6.3 for a shorter way to call this iterator.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.
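These rules can be sketched as a small helper (a reading of the documented semantics, not PyTables internals; selected_rows is a hypothetical name):

```python
def selected_rows(nrows, start=None, stop=None, step=1):
    # Negative steps are not allowed; start alone selects one row;
    # neither start nor stop selects every row in the object.
    if step < 1:
        raise ValueError("step must be a positive integer")
    if start is None and stop is None:
        start, stop = 0, nrows
    elif stop is None:
        stop = start + 1   # only start given: a single row
    elif start is None:
        start = 0
    return list(range(start, min(stop, nrows), step))
```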

Example of use:

	      result = [ row['var2'] for row in table.iterrows(step=5)
                                     if row['var1'] <= 20 ]
	    

itersequence(sequence, sort=True)

Iterate over a sequence of row coordinates.

sequence
Can be any object that supports the __getitem__ special method, like lists, tuples, Numeric/numarray objects, etc.
sort
If true, the sequence will be sorted so that the I/O process gets better performance. If your sequence is already sorted or you don't want to sort it, set this parameter to false. The default is to sort the sequence.
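The effect of sort on the iteration order can be sketched with a plain list standing in for the table (illustration only; itersequence_sketch is a hypothetical helper):

```python
def itersequence_sketch(rows, sequence, sort=True):
    # With sort=True the coordinates are visited in ascending order,
    # which lets the real implementation read neighbouring rows together.
    coords = sorted(sequence) if sort else list(sequence)
    return [rows[i] for i in coords]

rows = ['r0', 'r1', 'r2', 'r3']
```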

read(start=None, stop=None, step=1, field=None, flavor="numarray")

Returns the actual data in Table. If field is not supplied, it returns the data as a RecArray object.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.

The rest of the parameters are described next:

field
If specified, only the column field is returned as a NumArray object. If this is not supplied, all the fields are selected and a RecArray is returned.
flavor
When a field in table is selected, passing a flavor parameter makes an additional conversion happen on the default "numarray" returned object. flavor must be one of the following values: "numarray" (i.e. no conversion is made), "Numeric", "Tuple" or "List".

readCoordinates(coords, field=None, flavor='numarray')

Read a set of rows given their indexes into an in-memory object.

This method works much like the read() method (see 4.6.2), but it uses a sequence (coords) of row indexes to select the wanted rows, instead of a row range.

It returns the selected rows in a RecArray object. If both field and flavor are provided, an additional conversion to an object of this flavor is made, just as in read().

modifyRows(start=None, stop=None, step=1, rows=None)

Modify a series of rows in the [start:stop:step] extended slice range. If you pass None to stop, all the rows existing in rows will be used.

rows can be either a RecArray object or a structure that can be converted to a RecArray compliant with the table format.

Returns the number of modified rows.

It raises a ValueError if the rows parameter cannot be converted to an object compliant with the table description.

It raises an IndexError if the modification would exceed the length of the table.

modifyColumns(start=None, stop=None, step=1, columns=None, names=None)

Modify a series of rows in the [start:stop:step] extended slice row range. If you pass None to stop, all the rows existing in columns will be used.

columns can be either a RecArray or a list of arrays (the columns) that can be converted to a RecArray compliant with the subset of the table format specified by the column names.

names specifies the column names of the table to be modified.

Returns the number of modified rows.

It raises a ValueError if the columns parameter cannot be converted to an object compliant with the table description.

It raises an IndexError if the modification would exceed the length of the table.

removeRows(start, stop=None)

Removes a range of rows from the table. If only start is supplied, only that row is removed. If a range is supplied, i.e. both the start and stop parameters are passed, all the rows in the range are removed. A step parameter is not supported, and it is not foreseen to be implemented anytime soon.

start
Sets the starting row to be removed. It accepts negative values meaning that the count starts from the end. A value of 0 means the first row.
stop
Sets the last row to be removed to stop - 1, i.e. the end point is omitted (in the Python range tradition). Like start, it accepts negative values. A special value of None (the default) means removing just the row supplied in start.
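The start/stop semantics match Python's del on a list slice; a sketch with a plain list standing in for the table (remove_rows_sketch is a hypothetical helper, not the PyTables implementation):

```python
def remove_rows_sketch(rows, start, stop=None):
    # stop=None removes just the row at start (negative values count
    # from the end); otherwise rows start..stop-1 are removed.
    if stop is None:
        del rows[start]
    else:
        del rows[start:stop]
    return rows
```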

removeIndex(index)

Remove the index associated with the specified column. Only Index instances (see 4.14.3) are accepted as a parameter. The index can be recreated by calling the createIndex (see 4.7.2) method of the appropriate Column object.

flushRowsToIndex()

Add remaining rows in buffers to non-dirty indexes. This can be useful when you have chosen non-automatic indexing for the table (see section 4.14.2) and want to update the indexes on it.

reIndex()

Recompute all the existing indexes in table. This can be useful when you suspect that, for any reason, the index information for columns is no longer valid and want to rebuild the indexes on it.

reIndexDirty()

Recompute the existing indexes in table, but only if they are dirty. This can be useful when you have set the reindex parameter to 0 in the IndexProps constructor (see 4.14.2) for the table and want to update the indexes after an index-invalidating operation (Table.removeRows, for example).

where(condition, start=None, stop=None, step=None)

Iterate over values fulfilling a condition.

This method returns an iterator yielding Row (see 4.6.4) instances built from rows in the table that satisfy the given condition over a column. If that column is indexed, its index will be used in order to accelerate the search. Else, the in-kernel iterator (which still has better performance than standard Python selections) will be chosen instead.

Moreover, if a range is supplied (i.e. some of the start, stop or step parameters are passed), only the rows in that range fulfilling the condition are returned. The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1.

You can mix this method with standard Python selections in order to have complex queries. It is strongly recommended that you pass the most restrictive condition as the parameter to this method if you want to achieve maximum performance.

Example of use:

passvalues=[]
for row in table.where(0 < table.cols.col1 < 0.3, step=5):
    if row['col2'] <= 20:
        passvalues.append(row['col3'])
print "Values that pass the cuts:", passvalues
	    

whereAppend(dstTable, condition, start=None, stop=None, step=None)

Append rows fulfilling the condition to the dstTable table.

dstTable must be capable of taking the rows resulting from the query, i.e. it must have columns with the expected names and compatible types. The meaning of the other arguments is the same as in the where() method (see 4.6.2).

The number of rows appended to dstTable is returned as a result.

getWhereList(condition, flavor="List")

Get the row coordinates that fulfill the condition parameter. This method will take advantage of an indexed column to speed up the search.

flavor is the desired type of the returned list. It can take the 'List' (the default), 'Tuple' or 'NumArray' values.

4.6.3 Table special methods

The following methods automatically trigger actions when a Table instance is accessed in a special way (e.g., table["var2"] is equivalent to a call to table.__getitem__("var2")).

__iter__()

It returns the same iterator as Table.iterrows(0,0,1), but it does not accept parameters.

Example of use:

	      result = [ row['var2'] for row in table 
                                     if row['var1'] <= 20 ]
	    

Which is equivalent to:

	      result = [ row['var2'] for row in table.iterrows() 
                                     if row['var1'] <= 20 ]
	    

__getitem__(key)

Get a row or a range of rows from the table.

If the key argument is an integer, the corresponding table row is returned as a numarray.records.Record object. If key is a slice, the range of rows determined by it is returned as a numarray.records.RecArray object.

Using a string as key to get a column is supported but deprecated. Please use the col() (see 4.6.2) method.

Example of use:

record = table[4]
recarray = table[4:1000:2]
	    

Those statements are equivalent to:

record = table.read(start=4)[0]
recarray = table.read(start=4, stop=1000, step=2)
	    

Here you can see how indexing and slicing can be used as shorthands for the read() (see 4.6.2) method.

__setitem__(key, value)

It takes different actions depending on the type of the key parameter:

key is an Integer
The corresponding table row is set to value. value must be a List or Tuple capable of being converted to the table field format.
key is a Slice
The row slice determined by key is set to value. value must be a RecArray object or a list of rows capable of being converted to the table field format.

Example of use:

	      # Modify just one existing row
	      table[2] = [456,'db2',1.2]
	      # Modify two existing rows
	      rows = numarray.records.array([[457,'db1',1.2],[6,'de2',1.3]],
	                                    formats="i4,a3,f8")
	      table[1:3:2] = rows
	    

Which is equivalent to:

	      table.modifyRows(start=2, rows=[[456,'db2',1.2]])
	      rows = numarray.records.array([[457,'db1',1.2],[6,'de2',1.3]],
	                                    formats="i4,a3,f8")
	      table.modifyRows(start=1, step=2, rows=rows)
	    

4.6.4 The Row class

This class is used to fetch and set values on the table fields. It works very much like a dictionary, where the keys are the field names of the associated table and the values are the values of those fields in a specific row.

This object is actually an extension type, so you won't be able to access its documentation interactively. Nor will you be able to access its internal attributes directly from Python, although accessors (i.e. methods that return an internal attribute) have been defined for some important variables.

Row methods

append()
Once you have filled the proper fields for the current row, calling this method commits the data (it is actually written to an output buffer, and flushed to disk later).
nrow()
Accessor that returns the current row number in the table. It is useful to know which row is being dealt with in the middle of a loop.
getTable()
Accessor that returns the associated Table object.

4.6.5 The Cols class

This class is used as an accessor to the table columns following the natural naming convention: for each column there exists one attribute, named after the column, holding the associated Column instance. Besides, like the Row class, it works similarly to a dictionary, where the keys are the column names of the associated table and the values are Column instances. See section 4.7 for examples of use.

4.7 The Column class

Each instance of this class is associated with one column of every table. These instances are mainly used to fetch and set actual data from the table columns, but there are a few other associated methods to deal with indexes.

4.7.1 Column instance variables

table
The parent Table instance.
name
The name of the associated column.
type
The data type of the column.
index
The Index object (see 4.14.3) associated with this column (None if it does not exist).
dirty
Whether the index is dirty or not (property).

4.7.2 Column methods

createIndex()

Create an Index (see 4.14.3) object for this column.

reIndex()

Recompute the index associated with this column. This can be useful when you suspect that, for any reason, the index information is no longer valid and want to rebuild it.

reIndexDirty()

Recompute the existing index only if it is dirty. This can be useful when you have set the reindex parameter to 0 in the IndexProps constructor (see 4.14.2) for the table and want to update the column's index after an index-invalidating operation (Table.removeRows, for example).

removeIndex()

Delete the associated column's index. After doing that, you will lose the index information on disk. However, you can always re-create it using the createIndex() method (see 4.7.2).

closeIndex()

Close the index of this column. After that, the column will look as if it had no index, although the index will re-appear when the file is re-opened later on.

4.7.3 Column special methods

__getitem__(key)

Returns a column element or slice. It takes different actions depending on the type of the key parameter:

key is an Integer
The corresponding element in the column is returned as a scalar object or as a NumArray/CharArray object, depending on its shape.
key is a Slice
The row range determined by this slice is returned as a NumArray or CharArray object (whichever is appropriate).
Example of use:
print "Column handlers:"
for name in table.colnames:
    print table.cols[name]
print
print "Some selections:"
print "Select table.cols.name[1]-->", table.cols.name[1]
print "Select table.cols.name[1:2]-->", table.cols.name[1:2]
print "Select table.cols.lati[1:3]-->", table.cols.lati[1:3]
print "Select table.cols.pressure[:]-->", table.cols.pressure[:]
print "Select table.cols['temperature'][:]-->", table.cols['temperature'][:]
	      
and the output of this for a certain arbitrary table is:
Column handlers:
/table.cols.name (Column(1,), CharType)
/table.cols.lati (Column(2,), Int32)
/table.cols.longi (Column(1,), Int32)
/table.cols.pressure (Column(1,), Float32)
/table.cols.temperature (Column(1,), Float64)

Some selections:
Select table.cols.name[1]--> Particle:     11
Select table.cols.name[1:2]--> ['Particle:     11']
Select table.cols.lati[1:3]--> [[11 12]
 [12 13]]
Select table.cols.pressure[:]--> [  90.  110.  132.]
Select table.cols['temperature'][:]--> [ 100.  121.  144.]
	      
See the examples/table2.py for a more complete example.

__setitem__(key, value)

It takes different actions depending on the type of the key parameter:

key is an Integer
The corresponding element in the column is set to value. value must be a scalar or NumArray/CharArray, depending on column's shape.
key is a Slice
The row slice determined by key is set to value. value must be a list of elements or a NumArray/CharArray.

Example of use:

	      # Modify row 1
	      table.cols.col1[1] = -1
	      # Modify rows 1 and 3
	      table.cols.col1[1::2] = [2,3]
	    

Which is equivalent to:

	      # Modify row 1
	      table.modifyColumns(start=1, columns=[[-1]], names=["col1"])
	      # Modify rows 1 and 3
	      columns = numarray.records.fromarrays([[2,3]], formats="i4")
	      table.modifyColumns(start=1, step=2, columns=columns, names=["col1"])
	    

4.8 The Array class

Represents an array on file. It provides methods to write/read data to/from array objects in the file. This class does not allow you to enlarge the datasets on disk; see the EArray descendant in section 4.9 if you want enlargeable dataset support and/or compression features.

The array data types supported are the same as the set provided by Numeric and numarray. For details of these data types see appendix A or the numarray reference manual.

Note that this object inherits all the public attributes and methods that Leaf already provides.

4.8.1 Array instance variables

flavor
The object representation for this array. It can be any of the "NumArray", "CharArray", "Numeric", "List", "Tuple", "String", "Int" or "Float" values.
nrows
The length of the first dimension of Array.
nrow
On iterators, this is the index of the current row.
type
The type class of the represented array.
itemsize
The size of the base items. Especially useful for CharArray objects.

4.8.2 Array methods

Note that, as this object has no internal I/O buffers, it is not necessary to use the flush() method inherited from Leaf in order to save its internal state to disk. When a writing method call returns, all the data is already on disk.

iterrows(start=None, stop=None, step=1)

Returns an iterator yielding numarray instances built from rows in the array. The returned rows are taken from the first dimension in the case of an Array instance and the enlargeable dimension in the case of an EArray instance. If a range is supplied (i.e. some of the start, stop or step parameters are passed), only the appropriate rows are returned. Else, all the rows are returned. See also the __iter__() special method in section 4.8.3 for a shorter way to call this iterator.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.

Example of use:

	      result = [ row for row in arrayInstance.iterrows(step=4) ]
	    

read(start=None, stop=None, step=1)

Read the array from disk and return it as a numarray object (the default), or as an object with the same original flavor with which it was saved. It accepts start, stop and step parameters to select rows (the first dimension in the case of an Array instance and the enlargeable dimension in the case of an EArray) for reading.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.

4.8.3 Array special methods

The following methods automatically trigger actions when an Array instance is accessed in a special way (e.g., array[2:3,...,::2] is equivalent to a call to
array.__getitem__((slice(2, 3, None), Ellipsis, slice(None, None, 2)))).

__iter__()

It returns the same iterator as Array.iterrows(0,0,1), but it does not accept parameters.

Example of use:

	      result = [ row[2] for row in array ]

	    

Which is equivalent to:

	      result = [ row[2] for row in array.iterrows(0, 0, 1) ]
	    

__getitem__(key)

It returns a numarray object (the default), or an object with the same flavor it was saved with, containing the slice of rows stated in the key parameter. The set of tokens allowed in key is the same as for extended slicing in Python (the Ellipsis token included).

Example of use:

	      array1 = array[4]   # array1.shape == array.shape[1:]
	      array2 = array[4:1000:2]  # len(array2.shape) == len(array.shape)
	      array3 = array[::2, 1:4, :]
	      array4 = array[1, ..., ::2, 1:4, 4:] # General slice selection
	    

__setitem__(key, value)

Sets an Array element, row or extended slice. It takes different actions depending on the type of the key parameter:

key is an integer:
The corresponding row is set to value. If needed, value is broadcast to fit the specified row.
key is a slice:
The row slice determined by it is set to value. If needed, value is broadcast to fit the desired range. If the slice to be updated exceeds the actual shape of the array, only the values in the existing range are updated, i.e. the index error is silently ignored. If value is a multidimensional object, then its shape must be compatible with the slice specified in key; otherwise, a ValueError is raised.
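This silent clipping of out-of-range slices mirrors what plain Python sequences already do. A quick sketch with an ordinary list (not a PyTables array; note that, unlike PyTables arrays, lists can also grow through slice assignment, so the point here is only the absence of an IndexError):

```python
a = [0, 1, 2, 3, 4]
# The slice 3:100 exceeds the length of the list, but no IndexError
# is raised: only the existing positions 3 and 4 are updated.
a[3:100] = [30, 40]
print(a)  # [0, 1, 2, 30, 40]
```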

Example of use:

	      a1[0] = 333       # Assign an integer to an integer Array row
	      a2[0] = "b"       # Assign a string to a string Array row
	      a3[1:4] = 5       # Broadcast 5 to slice 1:4
	      a4[1:4:2] = "xXx" # Broadcast "xXx" to slice 1:4:2
	      # General slice update (a5.shape == (4,3,2,8,5,10))
	      a5[1, ..., ::2, 1:4, 4:] = arange(1728, shape=(4,3,2,4,3,6))
	    

4.9 The EArray class

This is a child of the Array class (see 4.8) and, as such, an EArray represents an array in the file. The difference is that EArray allows you to enlarge datasets along any single dimension you select6). Another important difference is that it also supports compression.

So, in addition to the attributes and methods that EArray inherits from Array, it supports a few more that provide a way to enlarge the arrays on disk. The new variables and methods are described below, along with some that already exist in Array but whose meaning and/or functionality differ somewhat in the EArray context.

4.9.1 EArray instance variables

atom
The class instance chosen for the atom object (see section 4.13.3).
extdim
The enlargeable dimension.
nrows
The length of the enlargeable dimension.

4.9.2 EArray methods

append(sequence)

Appends a sequence to the underlying dataset. This sequence must have the same type as the EArray instance; otherwise a TypeError is raised. Likewise, the dimensions of the sequence must conform to those of the EArray; that is, all dimensions must match except, of course, the enlargeable one, which can be of any length (even 0!).
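The shape-conformity rule can be sketched as a small check. This is a hypothetical helper, not the PyTables implementation: every dimension of the appended sequence must match the EArray shape except the enlargeable one:

```python
def conforms(earray_shape, extdim, seq_shape):
    """Return True if a sequence of shape `seq_shape` could be appended
    to an EArray of shape `earray_shape` whose enlargeable dimension is
    `extdim`.  Every dimension must match except the enlargeable one,
    which may have any length (even 0)."""
    if len(seq_shape) != len(earray_shape):
        return False
    return all(s == e
               for i, (s, e) in enumerate(zip(seq_shape, earray_shape))
               if i != extdim)

print(conforms((0, 3, 2), 0, (5, 3, 2)))  # enlargeable dim differs: OK
print(conforms((0, 3, 2), 0, (5, 4, 2)))  # fixed dim differs: not OK
```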

Example of use (code available in examples/earray1.py):

import tables
from numarray import strings

fileh = tables.openFile("earray1.h5", mode = "w")
a = tables.StringAtom(shape=(0,), length=8)
# Use 'a' as the object type for the enlargeable array
array_c = fileh.createEArray(fileh.root, 'array_c', a, "Chars")
array_c.append(strings.array(['a'*2, 'b'*4], itemsize=8))
array_c.append(strings.array(['a'*6, 'b'*8, 'c'*10], itemsize=8))

# Read the string EArray we have created on disk
for s in array_c:
    print "array_c[%s] => '%s'" % (array_c.nrow, s)
# Close the file
fileh.close()
	    

and the output is:

	      array_c[0] => 'aa'
	      array_c[1] => 'bbbb'
	      array_c[2] => 'aaaaaa'
	      array_c[3] => 'bbbbbbbb'
	      array_c[4] => 'cccccccc'
	    

4.10 The VLArray class

Instances of this class represent array objects in the object tree with the property that their rows can have a variable number of (homogeneous) elements (called atomic objects, or just atoms). Variable length arrays (or VLAs for short), like Table instances, can have only one dimension; also like Table, the compound elements (the atoms) of the rows of a VLArray can be fully multidimensional objects.

VLArray provides methods to read/write data from/to variable length array objects resident on disk. Also, note that this object inherits all the public attributes and methods that Leaf already has.

4.10.1 VLArray instance variables

atom
The class instance chosen for the atom object (see section 4.13.3).
nrow
On iterators, this is the index of the current row.
nrows
The total number of rows.

4.10.2 VLArray methods

append(sequence, *objects)

Append the objects in sequence to the array.

This method appends the objects in sequence as a single row of this array. The type of the individual objects must be compliant with the type of the atoms in the array. In the case of variable length strings, the string to append is itself the sequence.

Example of use (code available in examples/vlarray1.py):

import tables
from Numeric import *   # or, from numarray import *

# Create a VLArray:
fileh = tables.openFile("vlarray1.h5", mode = "w")
vlarray = fileh.createVLArray(fileh.root, 'vlarray1',
tables.Int32Atom(flavor="Numeric"),
                 "ragged array of ints", Filters(complevel=1))
# Append some (variable length) rows:
vlarray.append(array([5, 6]))
vlarray.append(array([5, 6, 7]))
vlarray.append([5, 6, 9, 8])

# Now, read it through an iterator:
for x in vlarray:
    print vlarray.name+"["+str(vlarray.nrow)+"]-->", x

# Close the file
fileh.close()
	    

The output of the previous program looks like this:

vlarray1[0]--> [5 6]
vlarray1[1]--> [5 6 7]
vlarray1[2]--> [5 6 9 8]
	    

The objects argument is only retained for backwards compatibility; please do not use it.

iterrows(start=None, stop=None, step=1)

Returns an iterator yielding one row per iteration. If a range is supplied (i.e. some of the start, stop or step parameters are passed), only the appropriate rows are returned. Else, all the rows are returned. See also the __iter__() special method in section 4.10.3 for a shorter way to call this iterator.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.

Example of use:

	      for row in vlarray.iterrows(step=4):
	          print vlarray.name+"["+str(vlarray.nrow)+"]-->", row
	    

read(start=None, stop=None, step=1)

Returns the actual data in the VLArray. As the lengths of the different rows are variable, the returned value is a Python list, with as many entries as rows selected by the range parameters.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.

4.10.3 VLArray special methods

The following describes the methods that automatically trigger actions when a VLArray instance is accessed in a special way (e.g., vlarray[2:5] is equivalent to a call to vlarray.__getitem__(slice(2, 5, None))).

__iter__()

It returns the same iterator as VLArray.iterrows(0, 0, 1). However, it does not accept parameters.

Example of use:

	      result = [ row for row in vlarray ]
	    

Which is equivalent to:

	      result = [ row for row in vlarray.iterrows() ]
	    

__getitem__(key)

It returns the slice of rows determined by key, which can be an integer index or an extended slice. The returned value is a list of objects of type array.atom.type.

Example of use:

	      list1 = vlarray[4]
	      list2 = vlarray[4:1000:2]
	    

__setitem__(keys, value)

Updates a vlarray row described by keys, setting it to value. Depending on the value of keys, the action taken differs:

keys is an integer:
It refers to the number of the row to be modified. The value object must be type- and shape-compatible with the object that already exists in that vlarray row.
keys is a tuple:
The first element refers to the row to be modified, and the second element to the range (so it can be an integer or a slice) of the row that will be updated. As above, the value object must be type- and shape-compatible with the object specified by the vlarray row and range.

Note: When updating VLString atoms (which use UTF-8 encoding) or Object atoms, there is a caveat: you can only update a value with another one that takes exactly the same number of bytes as the original. With UTF-8 encoding this is problematic because, for instance, 'c' takes 1 byte, but 'ç' takes two. The same applies to Object atoms, because cPickle, applied to two instances of the same class, is not guaranteed to return the same number of bytes for each. These facts effectively limit the kinds of objects that can be updated in VLArrays.
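The byte-length caveat above is easy to see with Python's own UTF-8 encoder:

```python
# 'c' occupies one byte in UTF-8, but the accented '\xe7' ('ç') needs
# two, so one cannot be overwritten by the other in place.
print(len(u'c'.encode('utf-8')))     # 1
print(len(u'\xe7'.encode('utf-8')))  # 2
```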

Example of use:

	      vlarray[0] = vlarray[0]*2+3
	      vlarray[99,3:] = arange(96)*2+3
	      # Negative values for start and stop (but not step) are supported
	      vlarray[99,-99:-89:2] = vlarray[5]*2+3 
	    

4.11 The UnImplemented class

Instances of this class represent an unimplemented dataset in a generic HDF5 file. When reading such a file (i.e. one that has not been created with PyTables, but with some other HDF5-based tool), chances are that the specific combination of datatypes and/or dataspaces in some dataset is not yet supported by PyTables. In such a case, the dataset is mapped to the UnImplemented class, so the user can still build the complete object tree of the generic HDF5 file, and can access (both read and write) the attributes of the dataset as well as some metadata. Of course, the user won't be able to read the actual data in it.

This is an elegant way to allow users to work with generic HDF5 files even when some of their datasets are not supported by PyTables. However, if you are really interested in having access to an unimplemented dataset, please get in contact with the developer team.

This class does not have any public instance variables, except those inherited from the Leaf class (see 4.5).

4.12 The AttributeSet class

Represents the set of attributes of a node (Leaf or Group). It provides methods to create new attributes, open, rename or delete existing ones.

Like Group instances, AttributeSet instances make use of the natural naming convention, i.e. you can access the attributes on disk as if they were normal Python attributes of the AttributeSet. This offers a very convenient way to access (but also to set and delete) node attributes by simply referring to them like normal attributes of a class instance.

Caveat: All Python data types are supported. The scalar ones (i.e. String, Int and Float) are mapped directly to their HDF5 counterparts, so you can correctly visualize them with any HDF5 tool. However, the rest of the data types, and more general objects, are serialized using cPickle, so you will only be able to correctly retrieve them from a Python-aware HDF5 library. Hopefully, the list of natively supported attribute types will be extended to fully multidimensional arrays sometime in the future.

4.12.1 AttributeSet instance variables

_v_node
The parent node instance.
_v_attrnames
List with all attribute names.
_v_attrnamessys
List with system attribute names.
_v_attrnamesuser
List with user attribute names.

4.12.2 AttributeSet methods

Note that this class defines the __setattr__, __getattr__ and __delattr__ methods, and they work as normally intended. Any scalar attribute (strings, ints or floats) is supported natively as an HDF5 attribute. However, (c)Pickle is automatically used to serialize other kinds of objects (like lists, tuples, dicts, small Numeric/numarray objects, ...) that you might want to save. If an attribute is set on a target node that already has a large number of attributes, a PerformanceWarning is issued.
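What happens behind the scenes for non-scalar attributes can be sketched with the standard pickle module (the exact storage details are internal to PyTables; this only illustrates the round trip):

```python
import pickle

attr = [3, (1, 2)]                   # a generic object, not a scalar
serialized = pickle.dumps(attr)      # roughly what gets written to disk
restored = pickle.loads(serialized)  # roughly what a read gives back
print(restored == attr)              # the round trip preserves the value
```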

With these special methods, you can access, assign or delete attributes on disk using constructs like the following:
	      leaf.attrs.myattr = "str attr"  # Set a string (native support)
	      leaf.attrs.myattr2 = 3          # Set an integer (native support)
	      leaf.attrs.myattr3 = [3,(1,2)]  # A generic object (Pickled)
	      attrib = leaf.attrs.myattr      # Get the attribute myattr
	      del leaf.attrs.myattr           # Delete the attribute myattr
	    
_f_copy(where)
Copy the user attributes (as well as certain system attributes) to the where object, which has to be a Group or Leaf instance.
_f_list(attrset = "user")
Return a list of attribute names of the parent node. attrset selects the attribute set to be used. A "user" value (the default) returns only the user attributes; "sys" returns only the system attributes; "all" returns both the system and the user attributes.
_f_rename(oldattrname, newattrname)
Rename an attribute.

4.13 Declarative classes

This section describes a series of classes that are meant to declare the datatypes required by primary PyTables objects (like Table or VLArray).

4.13.1 The IsDescription class

This class is in fact a so-called metaclass object. There is nothing special about this fact, except that the attributes of its subclasses are transformed during the instantiation phase, and new methods for instances are defined based on the values of the class attributes.

It is designed to be used as an easy, yet meaningful way to describe the properties of Table objects through the use of classes that inherit properties from it. In order to define such a special class, you have to declare it as a descendant of IsDescription, with as many attributes as columns you want in your table. The names of these attributes will become the names of the columns, while their values are the properties of the columns, obtained through the use of the Col class constructor (see section 4.13.2).

Then, you can pass this object to the Table constructor, where all the information it contains will be used to define the table structure. See the section 3.3 for an example on how that works.

Moreover, you can use the IsDescription object to change the properties of the index creation process for a table. Just create an instance of the IndexProps class (see section 4.14.2) and assign it to the special attribute _v_indexprops of the IsDescription object.
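The pattern of turning class attributes into column definitions can be sketched in plain Python. This is a conceptual illustration only; the real IsDescription machinery is more involved, and the tuple values below merely stand in for Col instances:

```python
class Particle(object):
    # In a real table description these values would be Col instances;
    # hypothetical (type, position) tuples stand in for them here.
    name = ("CharType", 0)
    pressure = ("Float32", 1)
    energy = ("Float64", 2)

def columns(description):
    """Collect the non-special class attributes, much as a metaclass
    could do during class creation."""
    return dict((k, v) for k, v in vars(description).items()
                if not k.startswith("_"))

print(sorted(columns(Particle)))  # the declared column names
```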

4.13.2 The Col class and its descendants

The Col class is used as a means to declare the different properties of a table column. In addition, a series of descendant classes is offered in order to make these column descriptions easier for the user. In general, it is recommended to use these descendant classes, as they are more meaningful when found in the middle of code.

Note that the only public method accessible in these classes is the constructor itself.

Col(dtype="Float64", shape=1, dflt=None, pos=None, indexed=0)
Declare the properties of a Table column.
dtype
The data type for the column. All types listed in appendix A are valid data types for columns. The type description is accepted both in string-type format and as a numarray data type.
shape
An integer or a tuple that specifies the number of dtype items for each element of this column (or its shape, for multidimensional elements). For CharType columns, the last dimension is used as the length of the character strings. However, for this kind of object, the use of the StringCol subclass is strongly recommended.
dflt
The default value for elements of this column. If the user does not supply a value for an element while filling a table, this default value will be written to disk. If the user supplies a scalar value for a multidimensional column, this value is automatically broadcast to all the elements in the column cell. If dflt is not supplied, an appropriate zero value (or null string) will be chosen by default.
pos
By default, columns are arranged in memory following an alpha-numerical order of the column names. In some situations, however, it is convenient to impose a user-defined ordering. The pos parameter allows the user to force the desired ordering.
indexed
Whether this column should be indexed for better performance in table selections.
StringCol(length=None, dflt=None, shape=1, pos=None, indexed=0)
Declare a column to be of type CharType. The length parameter sets the length of the strings. The meaning of the other parameters is the same as in the Col class.
BoolCol(dflt=0, shape=1, pos=None, indexed=0)
Define a column to be of type Bool. The meaning of the parameters is the same as in the Col class.
IntCol(dflt=0, shape=1, itemsize=4, sign=1, pos=None, indexed=0)
Declare a column to be of type IntXX, depending on the value of the itemsize parameter, which sets the number of bytes of the integers in the column. sign determines whether the integers are signed or not. The meaning of the other parameters is the same as in the Col class.

This class has several descendants:

Int8Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type Int8.
UInt8Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type UInt8.
Int16Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type Int16.
UInt16Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type UInt16.
Int32Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type Int32.
UInt32Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type UInt32.
Int64Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type Int64.
UInt64Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type UInt64.
FloatCol(dflt=0.0, shape=1, itemsize=8, pos=None, indexed=0)
Define a column to be of type FloatXX, depending on the value of itemsize. The itemsize parameter sets the number of bytes of the floats in the column and the default is 8 bytes (double precision). The meaning of the other parameters is the same as in the Col class.

This class has two descendants:

Float32Col(dflt=0.0, shape=1, pos=None, indexed=0)
Define a column of type Float32.
Float64Col(dflt=0.0, shape=1, pos=None, indexed=0)
Define a column of type Float64.
ComplexCol(dflt=0.+0.j, shape=1, itemsize=16, pos=None)
Define a column to be of type ComplexXX, depending on the value of itemsize. The itemsize parameter sets the number of bytes of the complex values in the column and the default is 16 bytes (double precision complex). The meaning of the other parameters is the same as in the Col class.

This class has two descendants:

Complex32Col(dflt=0.+0.j, shape=1, pos=None)
Define a column of type Complex32.
Complex64Col(dflt=0.+0.j, shape=1, pos=None)
Define a column of type Complex64.

ComplexCol columns and their descendants do not support indexing.

TimeCol(dflt=0, shape=1, itemsize=8, pos=None, indexed=0)
Define a column to be of type Time. Two kinds of time columns are supported, depending on the value of itemsize: 4-byte signed integer and 8-byte double precision floating point columns (the default). The meaning of the other parameters is the same as in the Col class.

Time columns have a special encoding in the HDF5 file. See appendix A for more information on those types.

This class has two descendants:

Time32Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type Time32.
Time64Col(dflt=0.0, shape=1, pos=None, indexed=0)
Define a column of type Time64.

4.13.3 The Atom class and its descendants.

The Atom class is meant to declare the different properties of the base element (also known as atom) of EArray and VLArray objects. The Atom instances have the property that their length is always the same. However, you can grow objects along the extensible dimension in the case of EArray or put a variable number of them on a VLArray row. Moreover, the atoms are not restricted to scalar values, and they can be fully multidimensional objects.

A series of descendant classes are offered in order to make the use of these element descriptions easier. In general, it is recommended to use these descendant classes, as they are more meaningful when found in the middle of the code. Note that the only public methods accessible in these classes are the atomsize() method and the constructor itself. The atomsize() method returns the total length, in bytes, of the element base atom.
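The value returned by atomsize() can be sketched as the product of the atom's dimensions (skipping the 0-length enlargeable one) times the item size. This is a hypothetical reimplementation for illustration, not the library code:

```python
def atom_size(shape, itemsize):
    """Total bytes of one atom: the product of all non-zero dimensions
    times the size in bytes of a single item."""
    if isinstance(shape, int):
        shape = (shape,)
    total = itemsize
    for dim in shape:
        if dim != 0:          # the enlargeable (0) dimension is skipped
            total *= dim
    return total

print(atom_size((0, 2), 4))   # a Float32 atom of shape (0, 2)
print(atom_size(1, 8))        # a scalar Float64 atom
```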

A description of the different constructors with their parameters follows:

Atom(dtype="Float64", shape=1, flavor="NumArray")
Define properties for the base elements of EArray and VLArray objects.
dtype
The data type for the base element. See the appendix A for a relation of data types supported. The type description is accepted both in string-type format and as a numarray data type.
shape
In an EArray context, it is a tuple specifying the shape of the object, and one (and only one) of its dimensions must be 0, meaning that the EArray object will be enlarged along that axis. In the case of a VLArray, it can be an integer with a value of 1 (one) or a tuple, specifying whether the atom is a scalar (in the case of 1) or has multiple dimensions (in the case of a tuple). For CharType elements, the last dimension is used as the length of the character strings. However, for this kind of object, the use of the StringAtom subclass is strongly recommended.
flavor
The object representation for this atom. It can be any of "CharArray" or "String" for the CharType type, and "NumArray", "Numeric", "List" or "Tuple" for the rest of the types. If the specified value differs from "CharArray" or "NumArray", the read atoms will be converted to that specific flavor. If not specified, the atoms will remain in their native format (i.e. CharArray or NumArray).
StringAtom(shape=1, length=None, flavor="CharArray")
Define an atom to be of CharType type. The meaning of the shape parameter is the same as in the Atom class. length sets the length of the string atoms. flavor can be either "CharArray" or "String". Unicode strings are not supported by this type; see the VLStringAtom class if you want Unicode support (only available for VLArray objects).
BoolAtom(shape=1, flavor="NumArray")
Define an atom to be of type Bool. The meaning of the parameters is the same as in the Atom class.
IntAtom(shape=1, itemsize=4, sign=1, flavor="NumArray")
Define an atom to be of type IntXX, depending on the value of the itemsize parameter, which sets the number of bytes of the integers that make up the atom. sign determines whether the integers are signed or not. The meaning of the other parameters is the same as in the Atom class.

This class has several descendants:

Int8Atom(shape=1, flavor="NumArray")
Define an atom of type Int8.
UInt8Atom(shape=1, flavor="NumArray")
Define an atom of type UInt8.
Int16Atom(shape=1, flavor="NumArray")
Define an atom of type Int16.
UInt16Atom(shape=1, flavor="NumArray")
Define an atom of type UInt16.
Int32Atom(shape=1, flavor="NumArray")
Define an atom of type Int32.
UInt32Atom(shape=1, flavor="NumArray")
Define an atom of type UInt32.
Int64Atom(shape=1, flavor="NumArray")
Define an atom of type Int64.
UInt64Atom(shape=1, flavor="NumArray")
Define an atom of type UInt64.
FloatAtom(shape=1, itemsize=8, flavor="NumArray")
Define an atom to be of FloatXX type, depending on the value of itemsize. The itemsize parameter sets the number of bytes of the floats in the atom and the default is 8 bytes (double precision). The meaning of the other parameters is the same as in the Atom class.

This class has two descendants:

Float32Atom(shape=1, flavor="NumArray")
Define an atom of type Float32.
Float64Atom(shape=1, flavor="NumArray")
Define an atom of type Float64.
ComplexAtom(shape=1, itemsize=16, flavor="NumArray")
Define an atom to be of ComplexXX type, depending on the value of itemsize. The itemsize parameter sets the number of bytes of the complex values in the atom and the default is 16 bytes (double precision complex). The meaning of the other parameters is the same as in the Atom class.

This class has two descendants:

Complex32Atom(shape=1, flavor="NumArray")
Define an atom of type Complex32.
Complex64Atom(shape=1, flavor="NumArray")
Define an atom of type Complex64.
TimeAtom(shape=1, itemsize=8, flavor="NumArray")
Define an atom to be of type Time. Two kinds of time atoms are supported, depending on the value of itemsize: 4-byte signed integer and 8-byte double precision floating point atoms (the default). The meaning of the other parameters is the same as in the Atom class.

Time atoms have a special encoding in the HDF5 file. See appendix A for more information on those types.

This class has two descendants:

Time32Atom(shape=1, flavor="NumArray")
Define an atom of type Time32.
Time64Atom(shape=1, flavor="NumArray")
Define an atom of type Time64.

Now there come two special classes, ObjectAtom and VLStringAtom, that do not actually descend from Atom, but whose purpose is so similar that they should be described here. The difference between them and Atom and its descendants is that these special classes do not allow multidimensional atoms, nor multiple values per row. Nor can a flavor be specified, as it is immutable (see below).

Caveat emptor: You are only allowed to use these classes to create VLArray objects, not EArray objects.

ObjectAtom()
This class is meant to fit any kind of object into a row of a VLArray instance by using cPickle behind the scenes. Because you cannot foresee how long the output of the cPickle serialization will be (i.e. the atom already has a variable length), you can only fit one object per row. However, you can still pass several parameters to the VLArray.append() method, as they will be regarded as a tuple of compound objects (the parameters), so that there is still only one object to be saved in a single row. It does not accept parameters, and its flavor is automatically set to "Object", so reads of rows always return an arbitrary Python object. You can regard ObjectAtom types as an easy way to save an arbitrary number of generic Python objects in a VLArray object.
VLStringAtom()
This class describes a row of the VLArray class, rather than an atom. It differs from the StringAtom class in that you can only add one instance of it to one specific row, i.e. the VLArray.append() method only accepts one object when the base atom is of this type. Besides, it supports Unicode strings (unlike StringAtom) because it uses UTF-8 encoding (this is why its atomsize() method always returns 1) when serializing to disk. It does not accept any parameter, and because its flavor is automatically set to "VLString", reads of rows always return a Python string. See appendix C.3.4 if you are curious about how this is implemented at the low level. You can regard VLStringAtom types as an easy way to save generic variable length strings.

See examples/vlarray1.py and examples/vlarray2.py for further examples on VLArrays, including object serialization and Unicode string management.

4.14 Helper classes

This section lists classes that do not fit in any other section and that mainly serve ancillary purposes.

4.14.1 The Filters class

This class is meant to serve as a container that keeps information about the filter properties associated with the enlargeable leaves, that is Table, EArray and VLArray.

The public variables of Filters are listed below:

complevel
The compression level (0 means no compression).
complib
The compression filter used (in case of compressed dataset).
shuffle
Whether the shuffle filter is active or not.
fletcher32
Whether the fletcher32 filter is active or not.

There are no public Filters methods, with the exception of the constructor itself, which is described next.

Filters(complevel=0, complib="zlib", shuffle=1, fletcher32=0)

The parameters that can be passed to the Filters class constructor are:

complevel
Specifies the compression level for data. The allowed range is 0-9. A value of 0 (the default) disables compression, avoiding the CPU cost of compressing.
complib
Specifies the compression library to be used. Right now, "zlib" (the default), "lzo", "ucl" and "bzip2" values are supported. See section 6.3 for some advice on which library is better suited to your needs.
shuffle
Whether or not to use the shuffle filter present in the HDF5 library. This is normally used to improve the compression ratio (at the cost of a little more CPU time). A value of 0 disables shuffling and 1 makes it active. The default depends on whether compression is enabled: if compression is enabled, shuffling defaults to active; otherwise it is disabled.
fletcher32
Whether or not to use the fletcher32 filter in the HDF5 library. This is used to add a checksum to each data chunk. A value of 0 (the default) disables the checksum.
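The rule for the shuffle default described above can be sketched as a tiny helper (hypothetical, mirroring the prose rather than the library code):

```python
def default_shuffle(complevel):
    """Shuffle defaults to active only when compression is enabled
    (complevel > 0); without compression it defaults to disabled."""
    return 1 if complevel > 0 else 0

print(default_shuffle(0))  # compression disabled
print(default_shuffle(5))  # compression enabled
```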
Of course, you can also create an instance and then assign the properties you want to change. For example:
import numarray as na
from tables import *

fileh = openFile("test5.h5", mode = "w")
atom = Float32Atom(shape=(0,2))
filters = Filters(complevel=1, complib = "lzo")
filters.fletcher32 = 1
arr = fileh.createEArray(fileh.root, 'earray', atom, "A growable array",
                         filters = filters)
# Append several rows in only one call
arr.append(na.array([[1., 2.],
                     [2., 3.],
                     [3., 4.]], type=na.Float32))

# Print information on that enlargeable array
print "Result Array:"
print repr(arr)

fileh.close()
	      
This enforces the use of the LZO library, a compression level of 1 and a fletcher32 checksum filter as well. See the output of this example:
Result Array:
/earray (EArray(3L, 2), fletcher32, shuffle, lzo(1)) 'A growable array'
  type = Float32
  shape = (3L, 2)
  itemsize = 4
  nrows = 3
  extdim = 0
  flavor = 'NumArray'
  byteorder = 'little'
	      

4.14.2 The IndexProps class

You can use this class to set/unset the properties in the indexing process of a Table column. To use it, create an instance, and assign it to the special attribute _v_indexprops in a table description class (see 4.13.1) or dictionary.

The public variables of IndexProps are listed below:

auto
Whether an existing index should be updated or not after a table append operation.
reindex
Whether the table columns are to be re-indexed after an invalidating index operation.
filters
The filter settings for the different Table indexes.

There are no IndexProps public methods with the exception of the constructor itself that is described next.

IndexProps(auto=1, reindex=1, filters=None)

The parameters that can be passed to the IndexProps class constructor are:

auto
Specifies whether an existing index should be updated or not after a table append operation. The default is to enable automatic index updates.
reindex
Specifies whether the table columns are to be re-indexed after an invalidating index operation (like for example, after a Table.removeRows call). The default is to reindex after operations that invalidate indexes.
filters
Sets the filter properties for Column indexes. It has to be an instance of the Filters (see section 4.14.1) class. A None value means that the default settings for the Filters object are selected.

4.14.3 The Index class

This class is used to keep the indexing information for table columns. It is actually a descendant of the Group class, with some added functionality.

It has no methods intended for the programmer's use, but it has some attributes that may be of interest.

Index instance variables

column
The column object this index belongs to.
type
The type class for the index.
itemsize
The size of the atomic items. Especially useful for columns of CharType type.
nelements
The total number of elements in the index.
dirty
Whether the index is dirty or not.
filters
The Filters (see section 4.14.1) instance for this index.

6) In the future, multiple enlargeable dimensions might be implemented as well.
