--- title: "Quick Start" author: "Michael Koohafkan" date: 2021-06-09 output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Quick Start} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- This document gets you up and running with `hdfqlr`, an R interface to [HDFql](https://www.hdfql.com/). In order to use this package, you will need to [download HDFql](https://www.hdfql.com/#download) for your system. If you are going to be using HDFql regularly, it's a good idea to set a default HDFql directory for use with `hdfqlr`. You can do this by either - Setting the R options `hdfqlr.dir` in your .Rprofile file. - Setting the system environment variable `HDFQL-DIR`. If either of these settings exist, the HDFql library will be loaded on package start. ```r library(hdfqlr) ``` Otherwise, you can load the HDFql library after package start with `hql_load()`: ```r hql_load('path/to/hdfql-x.x.x') ``` If you are on a Linux system, you will need to update the environment variable `LD_LIBRARY_PATH` to include the HDFql directories prior to using the package: ```bash export LD_LIBRARY_PATH=::$LD_LIBRARY_PATH ``` The `hdfqlr` package relies on the R wrapper provided by HDFql. Functions exported by the package are prefixed with `hql_` to make it easy to differentiate them from the functions provided by the wrapper, which are prefixed with `HDFQL_` (for constants) or `hdfql_` (for functions). The contents of the HDFql wrapper are imported into the environment `hql$wrapper`. If you need to directly use the HDFql wrapper functions in an interactive session, you can access them using `with` or `attach` the environment to the search path. Any existing scripts written for use with the HDFql wrapper can therefore be run by attaching the `hql$wrapper` environment prior to running the script, or by using `with(hql$wrapper, ...)`. 
With either setting in place, attaching the package is enough:

```r
library(hdfqlr)
```

Otherwise, you can load the HDFql library after package start with
`hql_load()`:

```r
hql_load('path/to/hdfql-x.x.x')
```

If you are on a Linux system, you will also need to update the
environment variable `LD_LIBRARY_PATH` to include the HDFql
directories prior to using the package:

```bash
# placeholder paths; substitute the directories of your HDFql installation
export LD_LIBRARY_PATH=/path/to/hdfql-x.x.x/lib:/path/to/hdfql-x.x.x/wrapper/R:$LD_LIBRARY_PATH
```

The `hdfqlr` package relies on the R wrapper provided by HDFql.
Functions exported by the package are prefixed with `hql_` to make
them easy to distinguish from the contents of the wrapper, which are
prefixed with `HDFQL_` (for constants) or `hdfql_` (for functions).
The contents of the HDFql wrapper are imported into the environment
`hql$wrapper`. If you need to use the HDFql wrapper functions
directly in an interactive session, you can access them with `with()`
or attach the environment to the search path with `attach()`. Any
existing scripts written for the HDFql wrapper can therefore be run
by attaching the `hql$wrapper` environment prior to running the
script, or by using `with(hql$wrapper, ...)`.

```r
# using with
with(hql$wrapper, HDFQL_VERSION)
```

```
## [1] "2.1.0"
```

```r
# or attach the environment
attach(hql$wrapper)
```

```
## The following objects are masked from hql$wrapper (pos = 4):
## 
##     HDFQL_ARRAY, HDFQL_ASCII, HDFQL_ATTRIBUTE, HDFQL_BIG_ENDIAN,
##     HDFQL_BIGINT, HDFQL_BITFIELD, HDFQL_CHAR, HDFQL_CHUNKED,
##     HDFQL_COMPACT, HDFQL_COMPOUND, HDFQL_CONTIGUOUS, hdfql_cursor,
##     hdfql_cursor_, hdfql_cursor_absolute, hdfql_cursor_clear,
##     hdfql_cursor_clone, hdfql_cursor_first, hdfql_cursor_get_bigint,
##     hdfql_cursor_get_char, hdfql_cursor_get_count,
##     hdfql_cursor_get_data_type, hdfql_cursor_get_double,
##     hdfql_cursor_get_float, hdfql_cursor_get_int,
##     hdfql_cursor_get_position, hdfql_cursor_get_size,
##     hdfql_cursor_get_smallint, hdfql_cursor_get_tinyint,
##     hdfql_cursor_get_unsigned_bigint, hdfql_cursor_get_unsigned_int,
##     hdfql_cursor_get_unsigned_smallint,
##     hdfql_cursor_get_unsigned_tinyint, hdfql_cursor_initialize,
##     hdfql_cursor_last, hdfql_cursor_next, hdfql_cursor_previous,
##     hdfql_cursor_relative, hdfql_cursor_use, hdfql_cursor_use_default,
##     HDFQL_DATASET, HDFQL_DIRECTORY, HDFQL_DISABLED, HDFQL_DOUBLE,
##     HDFQL_EARLIEST, HDFQL_EARLY, HDFQL_ENABLED, HDFQL_ENUMERATION,
##     HDFQL_ERROR_AFTER_LAST, HDFQL_ERROR_ALREADY_EXISTS,
##     HDFQL_ERROR_BEFORE_FIRST, HDFQL_ERROR_DANGLING_LINK,
##     HDFQL_ERROR_EMPTY, HDFQL_ERROR_FULL, hdfql_error_get_line,
##     hdfql_error_get_message, hdfql_error_get_position,
##     HDFQL_ERROR_INVALID_FILE, HDFQL_ERROR_INVALID_REGULAR_EXPRESSION,
##     HDFQL_ERROR_NO_ACCESS, HDFQL_ERROR_NO_ADDRESS,
##     HDFQL_ERROR_NOT_ENOUGH_MEMORY, HDFQL_ERROR_NOT_ENOUGH_SPACE,
##     HDFQL_ERROR_NOT_FOUND, HDFQL_ERROR_NOT_OPEN,
##     HDFQL_ERROR_NOT_REGISTERED, HDFQL_ERROR_NOT_SUPPORTED,
##     HDFQL_ERROR_OUTSIDE_LIMIT, HDFQL_ERROR_PARSE,
##     HDFQL_ERROR_UNEXPECTED_DATA_TYPE,
##     HDFQL_ERROR_UNEXPECTED_STORAGE_TYPE, HDFQL_ERROR_UNEXPECTED_TYPE,
##     HDFQL_ERROR_UNKNOWN, hdfql_execute, hdfql_execute_get_status,
##     HDFQL_EXTERNAL_LINK, HDFQL_FILE, HDFQL_FILL_DEFAULT,
##     HDFQL_FILL_UNDEFINED, HDFQL_FILL_USER_DEFINED, HDFQL_FLOAT,
##     hdfql_get_canonical_path, HDFQL_GLOBAL, HDFQL_GROUP,
##     HDFQL_INCREMENTAL, HDFQL_INDEXED, HDFQL_INT, HDFQL_LATE,
##     HDFQL_LATEST, HDFQL_LITTLE_ENDIAN, HDFQL_LOCAL, hdfql_mpi_get_rank,
##     hdfql_mpi_get_size, HDFQL_NO, HDFQL_OPAQUE, HDFQL_REFERENCE,
##     hdfql_shared_library, HDFQL_SMALLINT, HDFQL_SOFT_LINK,
##     hdfql_subcursor_absolute, hdfql_subcursor_first,
##     hdfql_subcursor_get_bigint, hdfql_subcursor_get_char,
##     hdfql_subcursor_get_count, hdfql_subcursor_get_double,
##     hdfql_subcursor_get_float, hdfql_subcursor_get_int,
##     hdfql_subcursor_get_position, hdfql_subcursor_get_size,
##     hdfql_subcursor_get_smallint, hdfql_subcursor_get_tinyint,
##     hdfql_subcursor_get_unsigned_bigint,
##     hdfql_subcursor_get_unsigned_int,
##     hdfql_subcursor_get_unsigned_smallint,
##     hdfql_subcursor_get_unsigned_tinyint, hdfql_subcursor_last,
##     hdfql_subcursor_next, hdfql_subcursor_previous,
##     hdfql_subcursor_relative, HDFQL_SUCCESS, HDFQL_TINYINT,
##     HDFQL_TRACKED, HDFQL_UNDEFINED, HDFQL_UNLIMITED,
##     HDFQL_UNSIGNED_BIGINT, HDFQL_UNSIGNED_INT, HDFQL_UNSIGNED_SMALLINT,
##     HDFQL_UNSIGNED_TINYINT, HDFQL_UNSIGNED_VARBIGINT,
##     HDFQL_UNSIGNED_VARINT, HDFQL_UNSIGNED_VARSMALLINT,
##     HDFQL_UNSIGNED_VARTINYINT, HDFQL_UTF8, HDFQL_VARBIGINT,
##     HDFQL_VARCHAR, HDFQL_VARDOUBLE, HDFQL_VARFLOAT,
##     hdfql_variable_get_count, hdfql_variable_get_data_type,
##     hdfql_variable_get_dimension, hdfql_variable_get_dimension_count,
##     hdfql_variable_get_number, hdfql_variable_get_size,
##     hdfql_variable_register, hdfql_variable_transient_register,
##     hdfql_variable_unregister, hdfql_variable_unregister_all,
##     HDFQL_VARINT, HDFQL_VARSMALLINT, HDFQL_VARTINYINT, HDFQL_VERSION,
##     HDFQL_VERSION_18, HDFQL_YES
```

```r
HDFQL_VERSION
```

```
## [1] "2.1.0"
```

```r
detach(hql$wrapper)
```

The `hdfqlr` package is primarily designed for simple read and write
use cases, i.e. inspecting, reading, and writing HDF datasets and
attributes. In order to access HDF files, they must first be opened
or "used":

```r
file = tempfile(fileext = ".h5")
hql_create_file(file)
hql_use_file(file)
```

Datasets and attributes are created with the `hql_write_*` family of
functions. Groups and sub-groups are created on the fly when writing
datasets or attributes, but can also be created explicitly with
`hql_create_group()`. The following example writes a matrix to the
file, then reads the dataset back in and compares it to the original
R object. The function `hql_flush()` ensures that any buffered data
is written to the HDF file before the data are read back into R.

```r
values = matrix(rnorm(1000), nrow = 50)
hql_write_dataset(values, "dataset0")
hql_flush()
all.equal(values, hql_read_dataset("dataset0"), check.attributes = FALSE)
```

```
## [1] TRUE
```

Any attributes of the R object (other than `dim`) are also written to
the dataset. To write an attribute (or a list of attributes) to the
root of the HDF file or to a group, use `hql_write_attribute` (or
`hql_write_all_attributes`). When writing character datasets, the
maximum string length is used for all elements of the dataset.

```r
char.values = month.name
attr(char.values, "abb") = month.abb
hql_write_dataset(char.values, "group1/dataset1")
hql_flush()
hql_read_dataset("group1/dataset1", include.attributes = TRUE)
```

```
## [1] "January  " "February " "March    " "April    " "May      " "June     "
## [7] "July     " "August   " "September" "October  " "November " "December "
## attr(,"abb")
## [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
```

The package does not currently support editing existing datasets or
attributes; this is instead accomplished by reading the dataset or
attribute into an R object, modifying it, and writing it back to the
HDF file using the argument `overwrite = TRUE`.
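For example, a minimal sketch of this read-modify-write pattern,
reusing the character dataset created above:

```r
# read the existing dataset (and its attributes) into R
months = hql_read_dataset("group1/dataset1", include.attributes = TRUE)
# modify the R object
months[1] = "JANUARY"
# write it back, replacing the existing dataset in the HDF file
hql_write_dataset(months, "group1/dataset1", overwrite = TRUE)
hql_flush()
```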
The package also provides functions for listing the contents of an
HDF file: `hql_list_groups`, `hql_list_datasets`, and
`hql_list_attributes`. Both `hql_list_groups` and `hql_list_datasets`
can recursively list sub-groups and sub-datasets.

```r
hql_list_groups()
```

```
## [1] "group1"
```

```r
hql_list_datasets()
```

```
## [1] "dataset0"
```

```r
hql_list_datasets(recursive = TRUE)
```

```
## [1] "dataset0"        "group1/dataset1"
```

```r
hql_list_attributes("group1/dataset1")
```

```
## [1] "abb"
```

Datasets, attributes, and groups can be removed from an HDF file with
`hql_drop_dataset`, `hql_drop_attribute`, and `hql_drop_group`.

```r
hql_drop_dataset("dataset0")
```

Once you're finished working with the HDF objects, close the file
with `hql_close_file`:

```r
hql_close_file(file)
```
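Because this example wrote to a temporary file, you may also want to
delete the file itself once it is closed. This is plain base R rather
than part of `hdfqlr`:

```r
# delete the temporary HDF file created for this example
file.remove(file)
```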