PyTables implements several classes to represent the different nodes in the object tree. They are named File, Group, Leaf, Table, Array, EArray, VLArray and UnImplemented. Another one, named AttributeSet, allows the user to complement the information on these different objects. Finally, another important class called IsDescription allows the user to build a Table record description by declaring a subclass of it. Many other classes are defined in PyTables, but they can be regarded as helpers whose main goal is to declare the data type properties of the different first-class objects; they will be described at the end of this chapter as well.
An important function, called openFile, is responsible for creating, opening or appending to files. In addition, a few utility functions are defined to determine whether a user-supplied file is a PyTables or HDF5 file. These are called isPyTablesFile() and isHDF5File(), respectively. Finally, there exists a function called whichLibVersion that reports the versions of the underlying C libraries (for example, HDF5 or Zlib).
Let's start discussing the first-level variables and functions available to the user, then the different classes defined in PyTables.
An easy way of copying one PyTables file to another.
This function allows you to copy an existing PyTables file named srcfilename to another file called dstfilename. The source file must exist and be readable. If the destination file already exists, it can be overwritten in place by setting the overwrite argument to a true value.
This function is a shorthand for the File.copyFile() method, which acts on an already opened file. kwargs takes keyword arguments used to customize the copying process. See the documentation of File.copyFile() (see 4.2.2) for a description of those arguments.
Determine whether a file is in the HDF5 format.
When successful, it returns a true value if the file is an HDF5 file, false otherwise. If there were problems identifying the file, an HDF5ExtError is raised.
For this function to work, it needs the name of an existing, readable and closed file.
Determine whether a file is in the PyTables format.
When successful, it returns a true value if the file is a PyTables file, false otherwise. The true value is the format version string of the file. If there were problems identifying the file, an HDF5ExtError is raised.
For this function to work, it needs the name of an existing, readable and closed file.
Open a PyTables (or generic HDF5) file and return a File object.
Get version information about a C library.
If the library indicated by name is available, this function returns a 3-tuple containing the major library version as an integer, its full version as a string, and the version date as a string. If the library is not available, None is returned.
The currently supported library names are hdf5, zlib, lzo, ucl and bzip2. If another name is given, a ValueError is raised.
An instance of this class is returned when a PyTables file is opened with the openFile() function. It offers methods to manipulate (create, rename, delete...) nodes and handle their attributes, as well as methods to traverse the object tree. The user entry point to the object tree attached to the HDF5 file is represented in the rootUEP attribute. Other attributes are available.
File objects support an Undo/Redo mechanism which can be enabled with the enableUndo() method. Once the Undo/Redo mechanism is enabled, explicit marks (with an optional unique name) can be set on the state of the database using the mark() method. There are two implicit marks which are always available: the initial mark (0) and the final mark (-1). Both the identifier of a mark and its name can be used in undo and redo operations.
Hierarchy manipulation operations (node creation, movement and removal) and attribute handling operations (setting and deleting) made after a mark can be undone by using the undo() method, which returns the database to the state of a past mark. If undo() is not followed by operations that modify the hierarchy or attributes, the redo() method can be used to return the database to the state of a future mark. Else, future states of the database are forgotten.
Please note that data handling operations cannot currently be undone or redone. Also, hierarchy manipulation operations on nodes that do not support the Undo/Redo mechanism issue an UndoRedoWarning before changing the database.
The Undo/Redo mechanism is persistent between sessions and can only be disabled by calling the disableUndo() method.
Create a new Group instance with name name in where location.
Create a new Table instance with name name in where location.
Create a new Array instance with name name in where location.
See the createTable description in 4.2.2 for more information on the where, name and title parameters.
Create a new EArray instance with name name in where location.
See the createTable description in 4.2.2 for more information on the where, name, title, and filters parameters.
Create a new VLArray instance with name name in where location. See the section 4.10 for a description of the VLArray class.
See the createTable description in 4.2.2 for more information on the where, name, title, and filters parameters.
Get the node under where with the given name.
where can be a Node instance or a path string leading to a node. If no name is specified, that node is returned.
If a name is specified, this must be a string with the name of a node under where. In this case the where argument can only lead to a Group instance (else a TypeError is raised). The node called name under the group where is returned.
In both cases, if the node to be returned does not exist, a NoSuchNodeError is raised. Please note that hidden nodes are also considered.
If the classname argument is specified, it must be the name of a class derived from Node. If the node is found but it is not an instance of that class, a NoSuchNodeError is also raised.
Returns the attribute attrname under where.name location.
Sets the attribute attrname with value attrvalue under where.name location. If the node already has a large number of attributes, a PerformanceWarning will be issued.
Delete the attribute attrname in where.name location.
Copy the attributes from node where.name to dstnode.
Returns a list with the child nodes hanging from where. The list is alphanumerically sorted by node name.
Removes the object node name under where location.
Copy the node specified by where and name to newparent/newname.
Change the name of the node specified by where and name to newname.
Move the node specified by where and name to newparent/newname.
Iterator that returns the list of Groups (not Leaves) hanging from (and including) where. The where Group is listed first (pre-order), then each of its child Groups (following an alphanumerical order) is also traversed, following the same procedure. If where is not supplied, the root object is used.
Recursively iterate over the nodes in the File instance. It takes two parameters:
Example of use:
# Recursively print all the nodes hanging from '/detector'
print "Nodes hanging from group '/detector':"
for node in h5file.walkNodes("/detector"):
    print node
Copy the children of a group into another group.
This method copies the nodes hanging from the source group srcgroup into the destination group dstgroup. Existing destination nodes can be replaced by setting the overwrite argument to a true value. If the recursive argument is true, all descendant nodes of srcgroup are recursively copied.
kwargs takes keyword arguments used to customize the copying process. See the documentation of Group._f_copyChildren() (see 4.4.2) for a description of those arguments.
Copy the contents of this file to dstfilename.
dstfilename must be a path string indicating the name of the destination file. If it already exists, the copy will fail with an IOError, unless the overwrite argument is true, in which case the destination file will be overwritten in place.
Additional keyword arguments may be passed to customize the copying process. For instance, title and filters may be changed, user attributes may be or may not be copied, data may be sub-sampled, stats may be collected, etc. Arguments unknown to nodes are simply ignored. Check the documentation for copying operations of nodes to see which options they support.
Copying a file usually has the beneficial side effect of creating a more compact and cleaner version of the original file.
Flush all the leaves in the object tree.
Flush all the leaves in object tree and close the file.
Is the Undo/Redo mechanism enabled?
Returns True if the Undo/Redo mechanism has been enabled for this file, False otherwise. Please note that this mechanism is persistent, so a newly opened PyTables file may already have Undo/Redo support.
Enable the Undo/Redo mechanism.
This operation prepares the database for undoing and redoing modifications in the node hierarchy. This allows mark(), undo(), redo() and other methods to be called.
The filters argument, when specified, must be an instance of class Filters (see section 4.14.1) and is meant for setting the compression values for the action log. The default is having compression enabled, as the gains in terms of space can be considerable. You may want to disable compression if you want maximum speed for Undo/Redo operations.
Calling enableUndo() when the Undo/Redo mechanism is already enabled raises an UndoRedoError.
Disable the Undo/Redo mechanism.
Disabling the Undo/Redo mechanism leaves the database in the current state and forgets past and future database states. This makes mark(), undo(), redo() and other methods fail with an UndoRedoError.
Calling disableUndo() when the Undo/Redo mechanism is already disabled raises an UndoRedoError.
Mark the state of the database.
Creates a mark for the current state of the database. A unique (and immutable) identifier for the mark is returned. An optional name (a string) can be assigned to the mark. Both the identifier of a mark and its name can be used in undo() and redo() operations. When the name has already been used for another mark, an UndoRedoError is raised.
This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
Get the identifier of the current mark.
Returns the identifier of the current mark. This can be used to know the state of a database after an application crash, or to get the identifier of the initial implicit mark after a call to enableUndo().
This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
Go to a past state of the database.
Returns the database to the state associated with the specified mark. Both the identifier of a mark and its name can be used. If the mark is omitted, the last created mark is used. If there are no past marks, or the specified mark is not older than the current one, an UndoRedoError is raised.
This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
Go to a future state of the database.
Returns the database to the state associated with the specified mark. Both the identifier of a mark and its name can be used. If the mark is omitted, the next created mark is used. If there are no future marks, or the specified mark is not newer than the current one, an UndoRedoError is raised.
This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
Go to a specific mark of the database.
Returns the database to the state associated with the specified mark. Both the identifier of a mark and its name can be used.
This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
The following describes the methods that automatically trigger actions when a File instance is accessed in a special way.
Is there a node with that path?
Returns True if the file has a node with the given path (a string), False otherwise.
Iterate over the nodes in the File instance. It accepts no parameters. This iterator is recursive.
Example of use:
# Recursively list all the nodes in the object tree
h5file = tables.openFile("vlarray1.h5")
print "All nodes in the object tree:"
for node in h5file:
    print node
Prints a short description of the File object.
Example of use:
>>> f = tables.openFile("data/test.h5")
>>> print f
data/test.h5 (File) 'Table Benchmark'
Last modif.: 'Mon Sep 20 12:40:47 2004'
Object Tree:
/ (Group) 'Table Benchmark'
/tuple0 (Table(100L,)) 'This is the table title'
/group0 (Group) ''
/group0/tuple1 (Table(100L,)) 'This is the table title'
/group0/group1 (Group) ''
/group0/group1/tuple2 (Table(100L,)) 'This is the table title'
/group0/group1/group2 (Group) ''
This is the base class for all nodes in a PyTables hierarchy. It is an abstract class, i.e. it may not be directly instantiated; however, every node in the hierarchy is an instance of this class.
A PyTables node is always hosted in a PyTables file, under a parent group, at a certain depth in the node hierarchy. A node knows its own name in the parent group and its own path name in the file. When using a translation map (see 4.2), its HDF5 name might differ from its PyTables name.
All the previous information is location-dependent, i.e. it may change when moving or renaming a node in the hierarchy. A node also has location-independent information, such as its HDF5 object identifier and its attribute set.
This class gathers the operations and attributes (both location-dependent and independent) which are common to all PyTables nodes, whatever their type is. Nonetheless, due to natural naming restrictions, the names of all of these members start with a reserved prefix (see 4.4).
Sub-classes with no children (i.e. leaf nodes) may define new methods, attributes and properties to avoid natural naming restrictions. For instance, _v_attrs may be shortened to attrs and _f_rename to rename. However, the original methods and attributes should still be available.
Close this node in the tree.
This makes the node inaccessible from the object tree. The closing operation is not recursive, i.e. closing a group does not close its children. On nodes with data, it may flush it to disk.
Remove this node from the hierarchy.
If the node has children, recursive removal must be stated by giving recursive a true value; otherwise, a NodeError will be raised.
Rename this node in place.
Changes the name of a node to newname (a string).
Move or rename this node.
Moves a node into a new parent group, or changes the name of the node. newparent can be a Group object or a pathname in string form. If it is not specified or None, the current parent group is chosen as the new parent. newname must be a string with a new name. If it is not specified or None, the current name is chosen as the new name.
Moving a node across databases is not allowed, nor is moving a node into itself. These result in a NodeError. However, moving a node over itself is allowed and simply does nothing. Moving over another existing node is similarly not allowed, unless the optional overwrite argument is true, in which case that node is recursively removed before moving.
Usually, only the first argument will be used, effectively moving the node to a new location without changing its name. Using only the second argument is equivalent to renaming the node in place.
Copy this node and return the new node.
Creates and returns a copy of the node, maybe in a different place in the hierarchy. newparent can be a Group object or a pathname in string form. If it is not specified or None, the current parent group is chosen as the new parent. newname must be a string with a new name. If it is not specified or None, the current name is chosen as the new name. If recursive copy is stated, all descendants are copied as well.
Copying a node across databases is supported but cannot be undone. Copying a node over itself is not allowed, nor is recursively copying a node into itself. These result in a NodeError. Copying over another existing node is similarly not allowed, unless the optional overwrite argument is true, in which case that node is recursively removed before copying.
Additional keyword arguments may be passed to customize the copying process. For instance, title and filters may be changed, user attributes may be or may not be copied, data may be sub-sampled, stats may be collected, etc. See the documentation for the particular node type.
Using only the first argument is equivalent to copying the node to a new location without changing its name. Using only the second argument is equivalent to making a copy of the node in the same group.
Get a PyTables attribute from this node.
If the named attribute does not exist, an AttributeError is raised.
Set a PyTables attribute for this node.
If the node already has a large number of attributes, a PerformanceWarning is issued.
Delete a PyTables attribute from this node.
If the named attribute does not exist, an AttributeError is raised.
Instances of this class are a grouping structure containing instances of zero or more groups or leaves, together with supporting metadata.
Working with groups and leaves is similar in many ways to working with directories and files, respectively, in a Unix filesystem. As with Unix directories and files, objects in the object tree are often described by giving their full (or absolute) path names. This full path can be specified either as a string (like in '/group1/group2') or as a complete object path written in natural name schema (like in file.root.group1.group2), as discussed in section 1.2.
A collateral effect of the natural naming schema is that names of Group members must be carefully chosen to avoid colliding with existing children node names. For this reason and not to pollute the children namespace, it is explicitly forbidden to assign normal attributes to Group instances, and all existing members start with some reserved prefixes, like _f_ (for methods) or _v_ (for instance variables). Any attempt to set a new child node whose name starts with one of these prefixes will raise a ValueError exception.
Another effect of natural naming is that nodes with reserved Python names or other disallowed Python names (such as $a or 44) cannot be accessed using the node.child syntax. You will be forced to use getattr(node, child) and delattr(node, child) to access them.
You can also make use of the trMap (translation map dictionary) parameter in the openFile function (see section 4.1.2) in order to translate HDF5 names not suited for natural naming into more convenient ones.
These instance variables are provided in addition to those in Node (see 4.3).
# Add a Table child instance under group with name "tablename"
file.createTable(group, 'tablename', recordDict, "Record instance")
table = group.tablename    # Get the table child instance
del group.tablename        # Delete the table child instance
Caveat: The following methods are documented for completeness, and they can be used without any problem. However, you should use the high-level counterpart methods in the File class, because they are the ones most used in documentation and examples, and they are a bit more powerful than those exposed here.
These methods are provided in addition to those in Node (see 4.3).
Helper method to correctly concatenate the name of a child object with the pathname of this group.
Copy this node and return the new one.
This method has the behavior described in Node._f_copy() (see [here]). In addition, it recognizes the following keyword arguments:
Returns a list with all the object nodes hanging from this instance. The list is alpha-numerically sorted by node name. If a classname parameter is supplied, it will only return instances of this class (or subclasses of it).
Iterate over the list of Groups (not Leaves) hanging from (and including) self. This Group is listed first (pre-order), then each of its child Groups (following an alphanumerical order) is also traversed, following the same procedure.
Iterate over the nodes in the Group instance. It takes two parameters:
Example of use:
# Recursively print all the arrays hanging from '/'
print "Arrays in the object tree '/':"
for array in h5file.root._f_walkNodes("Array", recursive=1):
    print array
Close this node in the tree.
This method has the behavior described in Node._f_close() (see [here]). It should be noted that this operation disables access to nodes descending from this group. Therefore, if you want to explicitly close them, you will need to walk the nodes hanging from this group before closing it.
Copy the children of this group into another group.
Children hanging directly from this group are copied into dstgroup, which can be a Group (see 4.4) object or its pathname in string form.
The operation will fail with a NodeError if there is a child node in the destination group with the same name as one of the copied children from this one, unless overwrite is true; in this case, the former child node is recursively removed before copying the latter.
By default, nodes descending from children groups of this node are not copied. If the recursive argument is true, all descendant nodes of this node are recursively copied.
Additional keyword arguments may be passed to customize the copying process. For instance, title and filters may be changed, user attributes may be or may not be copied, data may be sub-sampled, stats may be collected, etc. Arguments unknown to nodes are simply ignored. Check the documentation for copying operations of nodes to see which options they support.
The following describes the methods that automatically trigger actions when a Group instance is accessed in a special way.
Is there a child with that name?
Returns True if the group has a child node (visible or hidden) with the given name (a string), False otherwise.
Iterate over the children of the group instance. It accepts no parameters. This iterator is not recursive.
Example of use:
# Non-recursively list all the nodes hanging from '/detector'
print "Nodes in '/detector' group:"
for node in h5file.root.detector:
    print node
The goal of this class is to provide a place to put common functionality of all its descendants, as well as to provide a way to help classify objects in the tree. A Leaf object is an end-node, that is, a node that can hang directly from a group object but that is not a group itself and thus cannot have descendants. Right now, the set of end-nodes is composed of Table, Array, EArray, VLArray and UnImplemented class instances. In fact, all the previous classes inherit from the Leaf class.
These instance variables are provided in addition to those in Node (see 4.3).
Close this node in the tree.
This method has the behavior described in Node._f_close() (see [here]). Besides that, the optional argument flush tells whether to flush pending data to disk or not before closing.
Remove this node from the hierarchy.
This method has the behavior described in Node._f_remove() (see [here]). Please note that there is no recursive flag since leaves do not have child nodes.
Copy this node and return the new one.
This method has the behavior described in Node._f_copy() (see [here]). Please note that there is no recursive flag since leaves do not have child nodes. In addition, this method recognizes the following keyword arguments:
Rename this node in place.
This method has the behavior described in Node._f_rename() (see [here]).
Move or rename this node.
This method has the behavior described in Node._f_move() (see [here]).
Get a PyTables attribute from this node.
This method has the behavior described in Node._f_getAttr() (see [here]).
Set a PyTables attribute for this node.
This method has the behavior described in Node._f_setAttr() (see [here]).
Delete a PyTables attribute from this node.
This method has the behavior described in Node._f_delAttr() (see [here]).
Instances of this class represent table objects in the object tree. It provides methods to read data from and write data to table objects in the file.
Data can be read from or written to tables by accessing a special object that hangs from Table. This object is an instance of the Row class (see 4.6.4). See the tutorial chapter 3 on how to use the Row interface. The columns of the tables can also be easily accessed (and, more specifically, they can be read but not written) by making use of the Column class, through an extension of the natural naming schema applied inside the tables. See section 4.7 for some examples of use of this capability.
Note that this object inherits all the public attributes and methods that Leaf already has.
Append a series of rows to this Table instance. rows is an object that can hold the rows to be appended in several formats, such as a RecArray, a list of tuples, a list of Numeric/NumArray/CharArray objects, a string, a Python buffer or None (in which case nothing is appended). Of course, this rows object has to be compliant with the underlying format of the Table instance, or a ValueError will be raised.
from tables import *

class Particle(IsDescription):
    name        = StringCol(16, pos=1)   # 16-character String
    lati        = IntCol(pos=2)          # integer
    longi       = IntCol(pos=3)          # integer
    pressure    = Float32Col(pos=4)      # float (single-precision)
    temperature = FloatCol(pos=5)        # double (double-precision)

fileh = openFile("test4.h5", mode="w")
table = fileh.createTable(fileh.root, 'table', Particle, "A table")
# Append several rows in only one call
table.append([("Particle: 10", 10, 0, 10*10, 10**2),
              ("Particle: 11", 11, -1, 11*11, 11**2),
              ("Particle: 12", 12, -2, 12*12, 12**2)])
fileh.close()
Get a column from the table.
If a column called name exists in the table, it is read and returned as a numarray.NumArray object, or as a numarray.strings.CharArray object (whatever is more appropriate). If it does not exist, a ValueError is raised.
Example of use:
narray = table.col('var2')
That statement is equivalent to:
narray = table.read(field='var2')
Here you can see how this method can be used as a shorthand for the read() (see 4.6.2) method.
Returns an iterator yielding Row (see section 4.6.4) instances built from rows in table. If a range is supplied (i.e. some of the start, stop or step parameters are passed), only the appropriate rows are returned. Else, all the rows are returned. See also the __iter__() special method in section 4.6.3 for a shorter way to call this iterator.
The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.
Example of use:
result = [ row['var2'] for row in table.iterrows(step=5) if row['var1'] <= 20 ]
Iterate over a sequence of row coordinates.
Returns the actual data in the Table. If field is not supplied, it returns the data as a RecArray object.
The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.
The rest of the parameters are described next:
Read a set of rows given their indexes into an in-memory object.
This method works much like the read() method (see 4.6.2), but it uses a sequence (coords) of row indexes to select the wanted rows, instead of a row range.
It returns the selected rows in a RecArray object. If both field and flavor are provided, an additional conversion to an object of this flavor is made, just as in read().
Modify a series of rows in the [start:stop:step] extended slice range. If you pass None to stop, all the rows existing in rows will be used.
rows can be either a RecArray object or a structure that is able to be converted to a RecArray and compliant with the table format.
Returns the number of modified rows.
It raises a ValueError if the rows parameter cannot be converted to an object compliant with the table description.
It raises an IndexError if the modification would exceed the length of the table.
Modify a series of rows in the [start:stop:step] extended slice row range. If you pass None to stop, all the rows existing in columns will be used.
columns can be either a RecArray or a list of arrays (the columns) that is able to be converted to a RecArray compliant with the specified column names subset of the table format.
names specifies the column names of the table to be modified.
Returns the number of modified rows.
It raises a ValueError if the columns parameter cannot be converted to an object compliant with the table description.
It raises an IndexError if the modification would exceed the length of the table.
Removes a range of rows from the table. If only start is supplied, only that row is deleted. If a range is supplied, i.e. both the start and stop parameters are passed, all the rows in the range are removed. A step parameter is not supported, and it is not foreseen to implement it anytime soon.
Remove the index associated with the specified column. Only Index instances (see 4.14.3) are accepted as parameter. This index can be recreated again by calling the createIndex (see 4.7.2) method of the appropriate Column object.
Add remaining rows in buffers to non-dirty indexes. This can be useful when you have chosen non-automatic indexing for the table (see section 4.14.2) and want to update the indexes on it.
Recompute all the existing indexes in table. This can be useful when you suspect that, for any reason, the index information for columns is no longer valid and want to rebuild the indexes on it.
Recompute the existing indexes in table, but only if they are dirty. This can be useful when you have set the reindex parameter to 0 in the IndexProps constructor (see 4.14.2) for the table and want to update the indexes after an invalidating index operation (Table.removeRows, for example).
Iterate over values fulfilling a condition.
This method returns an iterator yielding Row (see 4.6.4) instances built from rows in the table that satisfy the given condition over a column. If that column is indexed, its index will be used in order to accelerate the search. Else, the in-kernel iterator (which still has better performance than standard Python selections) will be chosen instead.
Moreover, if a range is supplied (i.e. some of the start, stop or step parameters are passed), only the rows in that range that also fulfill the condition are returned. The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. In addition, if only start is specified, then stop will be set to start+1.
You can mix this method with standard Python selections in order to have complex queries. It is strongly recommended that you pass the most restrictive condition as the parameter to this method if you want to achieve maximum performance.
Example of use:
passvalues = []
for row in table.where(0 < table.cols.col1 < 0.3, step=5):
    if row['col2'] <= 20:
        passvalues.append(row['col3'])
print "Values that pass the cuts:", passvalues
Append rows fulfilling the condition to the dstTable table.
dstTable must be capable of taking the rows resulting from the query, i.e. it must have columns with the expected names and compatible types. The meaning of the other arguments is the same as in the where() method (see 4.6.2).
The number of rows appended to `dstTable` is returned as a result.
The following describes the methods that automatically trigger actions when a Table instance is accessed in a special way (e.g., table["var2"] is equivalent to a call to table.__getitem__("var2")).
It returns the same iterator as Table.iterrows(0, 0, 1), but it accepts no parameters.
Example of use:
result = [ row['var2'] for row in table if row['var1'] <= 20 ]
Which is equivalent to:
result = [ row['var2'] for row in table.iterrows() if row['var1'] <= 20 ]
Get a row or a range of rows from the table.
If the key argument is an integer, the corresponding table row is returned as a numarray.records.Record object. If key is a slice, the range of rows determined by it is returned as a numarray.records.RecArray object.
Using a string as key to get a column is supported but deprecated. Please use the col() (see 4.6.2) method.
Example of use:
record = table[4]
recarray = table[4:1000:2]
Those statements are equivalent to:
record = table.read(start=4)[0]
recarray = table.read(start=4, stop=1000, step=2)
Here you can see how indexing and slicing can be used as shorthands for the read() (see 4.6.2) method.
It takes different actions depending on the type of the key parameter:
Example of use:
# Modify just one existing row
table[2] = [456,'db2',1.2]
# Modify two existing rows
rows = numarray.records.array([[457,'db1',1.2],[6,'de2',1.3]],
                              formats="i4,a3,f8")
table[1:3:2] = rows
Which is equivalent to:
table.modifyRows(start=2, rows=[456,'db2',1.2])
rows = numarray.records.array([[457,'db1',1.2],[6,'de2',1.3]],
                              formats="i4,a3,f8")
table.modifyRows(start=1, step=2, rows=rows)
This class is used to fetch and set values on the table fields. It works very much like a dictionary, where the keys are the field names of the associated table and the values are the values of those fields in a specific row.
This object is actually an extension type, so you won't be able to access its documentation interactively. Nor will you be able to access its internal attributes (they are not directly accessible from Python), although accessors (i.e. methods that return an internal attribute) have been defined for some important variables.
This class is used as an accessor to the table columns following the natural naming convention, so you can access each column through an attribute bearing its name, which holds the associated Column instance. Besides, like the Row class, it works much like a dictionary, where the keys are the column names of the associated table and the values are Column instances. See section 4.7 for examples of use.
Each instance of this class is associated with one column of every table. These instances are mainly used to fetch and set actual data from the table columns, but there are a few other associated methods to deal with indexes.
Recompute the index associated with this column. This can be useful when you suspect that, for any reason, the index information is no longer valid and want to rebuild it.
Recompute the existing index only if it is dirty. This can be useful when you have set the reindex parameter to 0 in the IndexProps constructor (see 4.14.2) for the table and want to update the column's index after an invalidating index operation (Table.removeRows, for example).
Delete the associated column's index. After doing that, you will lose the indexing information on disk. However, you can always re-create it using the createIndex() method (see 4.7.2).
Returns a column element or slice. It takes different actions depending on the type of the key parameter:
print "Column handlers:"
for name in table.colnames:
    print table.cols[name]
print
print "Some selections:"
print "Select table.cols.name[1]-->", table.cols.name[1]
print "Select table.cols.name[1:2]-->", table.cols.name[1:2]
print "Select table.cols.lati[1:3]-->", table.cols.lati[1:3]
print "Select table.cols.pressure[:]-->", table.cols.pressure[:]
print "Select table.cols['temperature'][:]-->", table.cols['temperature'][:]

and the output of this for a certain arbitrary table is:
Column handlers:
/table.cols.name (Column(1,), CharType)
/table.cols.lati (Column(2,), Int32)
/table.cols.longi (Column(1,), Int32)
/table.cols.pressure (Column(1,), Float32)
/table.cols.temperature (Column(1,), Float64)

Some selections:
Select table.cols.name[1]--> Particle: 11
Select table.cols.name[1:2]--> ['Particle: 11']
Select table.cols.lati[1:3]--> [[11 12] [12 13]]
Select table.cols.pressure[:]--> [ 90. 110. 132.]
Select table.cols['temperature'][:]--> [ 100. 121. 144.]

See examples/table2.py for a more complete example.
It takes different actions depending on the type of the key parameter:
Example of use:
# Modify row 1
table.cols.col1[1] = -1
# Modify rows 1 and 3
table.cols.col1[1::2] = [2,3]
Which is equivalent to:
# Modify row 1
table.modifyColumns(start=1, columns=[[-1]], names=["col1"])
# Modify rows 1 and 3
columns = numarray.records.fromarrays([[2,3]], formats="i4")
table.modifyColumns(start=1, step=2, columns=columns, names=["col1"])
Represents an array on file. It provides methods to write/read data to/from array objects in the file. This class does not allow you to enlarge the datasets on disk; see the EArray descendant in section 4.9 if you want enlargeable dataset support and/or compression features.
The array data types supported are the same as the set provided by Numeric and numarray. For details of these data types see appendix A or the numarray reference manual.
Note that this object inherits all the public attributes and methods that Leaf already provides.
Note that, as this object has no internal I/O buffers, it is not necessary to use the flush() method inherited from Leaf in order to save its internal state to disk. When a writing method call returns, all the data is already on disk.
Returns an iterator yielding numarray instances built from rows in the array. The returned rows are taken from the first dimension in the case of an Array instance and from the enlargeable dimension in the case of an EArray instance. If a range is supplied (i.e. some of the start, stop or step parameters are passed), only the appropriate rows are returned. Else, all the rows are returned. See also the __iter__() special method in section 4.8.3 for a shorter way to call this iterator.
The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.
Example of use:
result = [ row for row in arrayInstance.iterrows(step=4) ]
Read the array from disk and return it as a numarray object (the default), or as an object with the same flavor it was saved with. It accepts start, stop and step parameters to select rows (along the first dimension in the case of an Array instance and along the enlargeable dimension in the case of an EArray) for reading.
The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.
The following methods automatically trigger actions when an Array instance is accessed in a special way (e.g., array[2:3,...,::2] is equivalent to a call to array.__getitem__((slice(2,3,None), Ellipsis, slice(None,None,2)))).
It returns the same iterator as Array.iterrows(0,0,1). However, it does not accept parameters.
Example of use:
result = [ row[2] for row in array ]
Which is equivalent to:
result = [ row[2] for row in array.iterrows(0, 0, 1) ]
It returns a numarray object (the default), or an object with the same flavor it was saved with, containing the slice of rows stated in the key parameter. The set of allowed tokens in key is the same as for extended slicing in Python (the Ellipsis token included).
Example of use:
array1 = array[4]                     # array1.shape == array.shape[1:]
array2 = array[4:1000:2]              # len(array2.shape) == len(array.shape)
array3 = array[::2, 1:4, :]
array4 = array[1, ..., ::2, 1:4, 4:]  # General slice selection
Sets an Array element, row or extended slice. It takes different actions depending on the type of the key parameter:
Example of use:
a1[0] = 333        # Assign an integer to an Integer Array row
a2[0] = "b"        # Assign a string to a string Array row
a3[1:4] = 5        # Broadcast 5 to slice 1:4
a4[1:4:2] = "xXx"  # Broadcast "xXx" to slice 1:4:2
# General slice update (a5.shape = (4,3,2,8,5,10))
a5[1, ..., ::2, 1:4, 4:] = arange(1728, shape=(4,3,2,4,3,6))
This is a child of the Array class (see 4.8) and, as such, EArray represents an array on file. The difference is that EArray allows you to enlarge datasets along a single dimension you select. Another important difference is that it also supports compression.
So, in addition to the attributes and methods that EArray inherits from Array, it supports a few more that provide a way to enlarge the arrays on disk. The new variables and methods are described below, as well as some that already exist in Array but whose meaning and/or functionality differ somewhat in the EArray context.
Appends a sequence to the underlying dataset. Obviously, this sequence must have the same type as the EArray instance; otherwise a TypeError is issued. In the same way, the dimensions of the sequence have to conform to those of EArray, that is, all the dimensions have to be the same except, of course, that of the enlargeable dimension which can be of any length (even 0!).
Example of use (code available in examples/earray1.py):
import tables
from numarray import strings

fileh = tables.openFile("earray1.h5", mode = "w")
a = tables.StringAtom(shape=(0,), length=8)
# Use 'a' as the object type for the enlargeable array
array_c = fileh.createEArray(fileh.root, 'array_c', a, "Chars")
array_c.append(strings.array(['a'*2, 'b'*4], itemsize=8))
array_c.append(strings.array(['a'*6, 'b'*8, 'c'*10], itemsize=8))
# Read the string EArray we have created on disk
for s in array_c:
    print "array_c[%s] => '%s'" % (array_c.nrow, s)
# Close the file
fileh.close()
and the output is:
array_c[0] => 'aa'
array_c[1] => 'bbbb'
array_c[2] => 'aaaaaa'
array_c[3] => 'bbbbbbbb'
array_c[4] => 'cccccccc'
Instances of this class represent array objects in the object tree with the property that their rows can have a variable number of (homogeneous) elements, called atomic objects or just atoms. Variable length arrays (or VLAs for short), similarly to Table instances, can have only one dimension; and, as with Table, the component elements (the atoms) of the rows of a VLArray can be fully multidimensional objects.
VLArray provides methods to read/write data from/to variable length array objects resident on disk. Also, note that this object inherits all the public attributes and methods that Leaf already has.
Append objects in the sequence to the array.
This method appends the objects in the sequence to a single row in this array. The type of the individual objects must be compliant with the type of the atoms in the array. In the case of variable length strings, the string to append is itself the sequence.
Example of use (code available in examples/vlarray1.py):
import tables
from Numeric import *   # or, from numarray import *

# Create a VLArray:
fileh = tables.openFile("vlarray1.h5", mode = "w")
vlarray = fileh.createVLArray(fileh.root, 'vlarray1',
                              tables.Int32Atom(flavor="Numeric"),
                              "ragged array of ints",
                              Filters(complevel=1))
# Append some (variable length) rows:
vlarray.append(array([5, 6]))
vlarray.append(array([5, 6, 7]))
vlarray.append([5, 6, 9, 8])
# Now, read it through an iterator:
for x in vlarray:
    print vlarray.name+"["+str(vlarray.nrow)+"]-->", x
# Close the file
fileh.close()
The output of the previous program looks like this:
vlarray1[0]--> [5 6]
vlarray1[1]--> [5 6 7]
vlarray1[2]--> [5 6 9 8]
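The one-call-per-row append semantics shown above can be modelled in plain Python (the class below is an illustrative stand-in, not the PyTables implementation):

```python
class RaggedRows:
    # Hypothetical model of a VLArray's append/iterate behaviour:
    # each append() call stores one row holding a variable number
    # of homogeneous elements.
    def __init__(self):
        self._rows = []
    def append(self, sequence):
        self._rows.append(list(sequence))   # one call -> one row
    def __getitem__(self, key):
        return self._rows[key]
    def __iter__(self):
        return iter(self._rows)

vla = RaggedRows()
vla.append([5, 6])
vla.append([5, 6, 7])
vla.append([5, 6, 9, 8])
print([len(row) for row in vla])   # -> [2, 3, 4]
```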
The objects argument is only retained for backwards compatibility; please do not use it.
Returns an iterator yielding one row per iteration. If a range is supplied (i.e. some of the start, stop or step parameters are passed), only the appropriate rows are returned. Else, all the rows are returned. See also the __iter__() special method in section 4.10.3 for a shorter way to call this iterator.
The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.
Example of use:
for row in vlarray.iterrows(step=4):
    print vlarray.name+"["+str(vlarray.nrow)+"]-->", row
Returns the actual data in the VLArray. As the lengths of the different rows are variable, the returned value is a Python list, with one entry per row selected by the range parameters.
The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.
The following methods automatically trigger actions when a VLArray instance is accessed in a special way (e.g., vlarray[2:5] is equivalent to a call to vlarray.__getitem__(slice(2,5,None))).
It returns the same iterator as VLArray.iterrows(0,0,1). However, it does not accept parameters.
Example of use:
result = [ row for row in vlarray ]
Which is equivalent to:
result = [ row for row in vlarray.iterrows() ]
It returns the slice of rows determined by key, which can be an integer index or an extended slice. The returned value is a list of objects of type array.atom.type.
Example of use:
list1 = vlarray[4]
list2 = vlarray[4:1000:2]
Updates a vlarray row described by keys by setting it to value. The action taken depends on the value of keys:
Note: when updating VLStrings (UTF-8 codification) or Object atoms, there is a problem: one can only update values with exactly the same number of bytes as the original row. With UTF-8 encoding this is problematic because, for instance, 'c' takes 1 byte, but 'ç' takes two. The same applies when using Object atoms: when cPickle is applied to a class instance (for example), it does not guarantee returning the same number of bytes as for another instance, even of the same class. These facts effectively limit the kinds of objects that can be updated in VLArrays.
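The byte-length issue can be verified in plain Python:

```python
# 'c' occupies one byte under UTF-8, while 'ç' occupies two, so the
# encoded row sizes differ and an in-place update would not fit.
plain = 'c'.encode('utf-8')
accented = '\u00e7'.encode('utf-8')   # 'ç'
print(len(plain))      # -> 1
print(len(accented))   # -> 2
```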
Example of use:
vlarray[0] = vlarray[0]*2+3
vlarray[99,3:] = arange(96)*2+3
# Negative values for start and stop (but not step) are supported
vlarray[99,-99:-89:2] = vlarray[5]*2+3
Instances of this class represent an unimplemented dataset in a generic HDF5 file. When reading such a file (i.e. one that has not been created with PyTables, but with some other HDF5 library based tool), chances are that the specific combination of datatypes and/or dataspaces in some dataset might not be supported by PyTables yet. In such a case, the dataset will be mapped to the UnImplemented class, so the user will still be able to build the complete object tree of the generic HDF5 file, as well as access (both for reading and writing) the attributes and some metadata of the dataset. Of course, the user won't be able to read the actual data in it.
This is an elegant way to allow users to work with generic HDF5 files even when some of their datasets are not supported by PyTables. However, if you are really interested in having access to an unimplemented dataset, please get in touch with the development team.
This class does not have any public instance variables, except those inherited from the Leaf class (see 4.5).
Represents the set of attributes of a node (Leaf or Group). It provides methods to create new attributes, open, rename or delete existing ones.
As with Group instances, AttributeSet instances make use of the natural naming convention, i.e. you can access the attributes on disk as if they were normal AttributeSet attributes. This offers the user a very convenient way to access (but also to set and delete) node attributes, by simply specifying them like a normal class attribute.
Caveat: All Python data types are supported. The scalar ones (i.e. String, Int and Float) are mapped directly to the HDF5 counterparts, so you can correctly visualize them with any HDF5 tool. However, the rest of the data types and more general objects are serialized using cPickle, so you will be able to correctly retrieve them only from a Python-aware HDF5 library. Hopefully, the list of supported native attributes will be extended to fully multidimensional arrays sometime in the future.
Note that this class defines __setattr__, __getattr__ and __delattr__, and they work as normally intended. Any scalar (string, int or float) attribute is supported natively as an attribute. However, (c)Pickle is automatically used to serialize other kinds of objects (like lists, tuples, dicts, small Numeric/numarray objects, ...) that you might want to save. If an attribute is set on a target node that already has a large number of attributes, a PerformanceWarning will be issued.
leaf.attrs.myattr = "str attr"    # Set a string (native support)
leaf.attrs.myattr2 = 3            # Set an integer (native support)
leaf.attrs.myattr3 = [3,(1,2)]    # A generic object (Pickled)
attrib = leaf.attrs.myattr        # Get the attribute myattr
del leaf.attrs.myattr             # Delete the attribute myattr
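The pickling behaviour described above can be sketched with the standard pickle module (cPickle's modern successor; the attribute value below is arbitrary):

```python
import pickle

# A non-scalar value such as this list cannot map directly to an
# HDF5 scalar attribute, so a pickled byte string is stored instead;
# a Python-aware reader can recover the original object from it.
value = [3, (1, 2)]
stored = pickle.dumps(value)       # what would land in the file
restored = pickle.loads(stored)    # what the attribute read returns
print(restored == value)           # -> True
```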
This section describes a series of classes meant to declare the datatypes required by first-class PyTables objects (like Table or VLArray).
This class is in fact a so-called metaclass object. There is nothing special about this, except that its subclasses' attributes are transformed during the instantiation phase, and new methods for instances are defined based on the values of the class attributes.
It is designed to be used as an easy, yet meaningful way to describe the properties of Table objects through the use of classes that inherit properties from it. In order to define such a special class, you have to declare it as descendant of IsDescription, with as many attributes as columns you want in your table. The name of these attributes will become the name of the columns, while their values will be the properties of the columns that are obtained through the use of the Col (see section 4.13.2) class constructor.
Then, you can pass this object to the Table constructor, where all the information it contains will be used to define the table structure. See the section 3.3 for an example on how that works.
Moreover, you can use the IsDescription object to change the properties of the index creation process for a table. Just create an instance of the IndexProps class (see section 4.14.2) and assign it to the special attribute _v_indexprops of the IsDescription object.
The Col class is used as a means to declare the different properties of a table column. In addition, a series of descendant classes are offered in order to make these column descriptions easier for the user. In general, it is recommended to use these descendant classes, as they are more meaningful when found in the middle of the code.
Note that the only public method accessible in these classes is the constructor itself.
This class has several descendants:
This class has two descendants:
This class has two descendants:
ComplexCol columns and their descendants do not support indexing.
Time columns have a special encoding in the HDF5 file. See appendix A for more information on those types.
This class has two descendants:
The Atom class is meant to declare the different properties of the base element (also known as atom) of EArray and VLArray objects. The Atom instances have the property that their length is always the same. However, you can grow objects along the extensible dimension in the case of EArray or put a variable number of them on a VLArray row. Moreover, the atoms are not restricted to scalar values, and they can be fully multidimensional objects.
A series of descendant classes are offered in order to make the use of these element descriptions easier. In general, it is recommended to use these descendant classes, as they are more meaningful when found in the middle of the code. Note that the only public methods accessible in these classes are the atomsize() method and the constructor itself. The atomsize() method returns the total length, in bytes, of the element base atom.
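The value returned by atomsize() can be sketched as a simple computation (the helper below is illustrative, not the PyTables implementation):

```python
def atom_size(shape, itemsize):
    # Total length in bytes of the base atom: the product of all
    # dimensions of the atom shape times the item size of its type.
    total = itemsize
    for dim in shape:
        total *= dim
    return total

print(atom_size((3, 2), 4))   # a (3, 2) atom of 4-byte items -> 24 bytes
```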
A description of the different constructors with their parameters follows:
This class has several descendants:
This class has two descendants:
This class has two descendants:
Time atoms have a special encoding in the HDF5 file. See appendix A for more information on those types.
This class has two descendants:
Now there come two special classes, ObjectAtom and VLString, that actually do not descend from Atom, but whose goal is so similar that they should be described here. The difference between them and Atom and its descendant classes is that these special classes do not allow multidimensional atoms, nor multiple values per row. Nor can a flavor be specified, as it is immutable (see below).
Caveat emptor: You are only allowed to use these classes to create VLArray objects, not EArray objects.
See examples/vlarray1.py and examples/vlarray2.py for further examples on VLArrays, including object serialization and Unicode string management.
This section lists classes that do not fit in any other section and that mainly serve ancillary purposes.
This class is meant to serve as a container that keeps information about the filter properties associated with the enlargeable leaves, that is Table, EArray and VLArray.
The public variables of Filters are listed below:
There are no public Filters methods, with the exception of the constructor itself, which is described next.
The parameters that can be passed to the Filters class constructor are:
import numarray as na
from tables import *

fileh = openFile("test5.h5", mode = "w")
atom = Float32Atom(shape=(0,2))
filters = Filters(complevel=1, complib = "lzo")
filters.fletcher32 = 1
arr = fileh.createEArray(fileh.root, 'earray', atom, "A growable array",
                         filters = filters)
# Append several rows in only one call
arr.append(na.array([[1., 2.], [2., 3.], [3., 4.]], type=na.Float32))
# Print information on that enlargeable array
print "Result Array:"
print repr(arr)
fileh.close()

This enforces the use of the LZO library, a compression level of 1 and a fletcher32 checksum filter as well. See the output of this example:
Result Array:
/earray (EArray(3L, 2), fletcher32, shuffle, lzo(1)) 'A growable array'
  type = Float32
  shape = (3L, 2)
  itemsize = 4
  nrows = 3
  extdim = 0
  flavor = 'NumArray'
  byteorder = 'little'
You can use this class to set/unset the properties in the indexing process of a Table column. To use it, create an instance, and assign it to the special attribute _v_indexprops in a table description class (see 4.13.1) or dictionary.
The public variables of IndexProps are listed below:
There are no public IndexProps methods, with the exception of the constructor itself, which is described next.
The parameters that can be passed to the IndexProps class constructor are:
This class is used to keep the indexing information for table columns. It is actually a descendant of the Group class, with some added functionality.
It has no methods intended for the programmer's use, but some of its attributes may be of interest.