README.md

Velox-cuDF

Velox-cuDF is a Velox extension module that uses the cuDF library to implement a GPU-accelerated backend for executing Velox plans. cuDF is an open source library for GPU data processing, and Velox-cuDF integrates with "libcudf", the CUDA C++ core of cuDF. libcudf uses Arrow-compatible data layouts and includes single-node, single-GPU algorithms for data processing.

How Velox and cuDF work together

Velox-cuDF implements the Velox DriverAdapter interface as CudfDriverAdapter to rewrite query plans for GPU execution. Generally, CudfDriverAdapter replaces operators one-to-one. For end-to-end GPU execution, where cuDF operators replace all of the Velox CPU operators, Velox-cuDF relies on Velox's pipeline-based execution model to separate stages of execution, partition the work across drivers, and schedule concurrent work on the GPU.

For more information please refer to our blog: "Extending Velox - GPU Acceleration with cuDF."

Getting started with Velox-cuDF

cuDF supports Linux and WSL2 but not Windows or macOS. cuDF also has minimum requirements for CUDA version, NVIDIA driver, and GPU architecture, which are listed in the RAPIDS Installation Guide. Please refer to cuDF's readme and developer guide for more information.
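
As a quick pre-flight check for the platform requirement above, the following sketch classifies the current machine from `uname` output. The function name `cudf_platform` is hypothetical, and detecting WSL2 from the kernel release string is an assumption that holds for stock Microsoft kernels but may not hold for custom ones. It does not check the CUDA, driver, or GPU architecture minimums, which still need to be compared against the RAPIDS Installation Guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Velox or cuDF): classify a platform from
# the kernel name (`uname -s`) and the kernel release (`uname -r`).
cudf_platform() {
  local kernel_name="$1" kernel_release="$2"
  if [ "$kernel_name" != "Linux" ]; then
    echo "unsupported"          # e.g. Darwin (macOS) or a native Windows shell
    return 1
  fi
  case "$kernel_release" in
    *WSL2*) echo "wsl2" ;;      # assumption: stock WSL2 kernels contain "WSL2"
    *)      echo "linux" ;;
  esac
}

# Classify the machine this script runs on.
cudf_platform "$(uname -s)" "$(uname -r)" || echo "cuDF does not support this platform"
```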

Building Velox with cuDF

The cuDF backend is included in Velox builds when the VELOX_ENABLE_CUDF CMake option is set. The adapters-cuda service in Velox's docker-compose.yml is an excellent starting point for Velox builds with cuDF.

Use docker compose to run an adapters-cuda image.

$ docker compose -f docker-compose.yml run -e NUM_THREADS=8 --rm -v "$(pwd):/velox" adapters-cuda /bin/bash

Once inside the container, build Velox with cuDF using the following flags:

$ CUDA_ARCHITECTURES="native" EXTRA_CMAKE_FLAGS="-DVELOX_ENABLE_ARROW=ON -DVELOX_ENABLE_PARQUET=ON -DVELOX_ENABLE_BENCHMARKS=ON -DVELOX_ENABLE_BENCHMARKS_BASIC=ON" make cudf

After cuDF is built, verify the build by running the unit tests.

$ cd _build/release
$ ctest -R cudf -V
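
To confirm that a build tree was actually configured with the cuDF backend, one option is to inspect its CMake cache. The sketch below assumes it is run from inside the build directory used above and that the option is recorded in `CMakeCache.txt` in CMake's usual `NAME:TYPE=VALUE` form. The helper name `cudf_enabled_in_cache` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if VELOX_ENABLE_CUDF is switched on in the
# given CMakeCache.txt. CMake stores cache entries as NAME:TYPE=VALUE.
cudf_enabled_in_cache() {
  local cache_file="${1:-CMakeCache.txt}"
  [ -f "$cache_file" ] || return 2   # no cache file: tree was never configured
  grep -Eq '^VELOX_ENABLE_CUDF:[A-Z]+=(ON|TRUE|YES|1)$' "$cache_file"
}

# Example: run from inside _build/release after the build.
if cudf_enabled_in_cache "CMakeCache.txt"; then
  echo "cuDF backend enabled"
else
  echo "cuDF backend not enabled (or no CMake cache found here)"
fi
```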

Velox-cuDF builds are included in Velox CI as part of the adapters build. The build step for cuDF does not require the worker to have a GPU, so adding a Velox-cuDF build step to Velox CI is compatible with the existing runners.

Configuring Velox-cuDF

Velox-cuDF provides several configuration properties to control GPU execution behavior, memory management, and debugging. These configurations are available when compiled with cuDF support and can be set via Velox's configuration system. For a complete list of cuDF-specific configuration properties and their descriptions, see the Cudf-specific Configuration section in the Velox configuration documentation.

Testing Velox with cuDF

Tests with Velox-cuDF can only be run on GPU-enabled hardware. The Velox-cuDF tests in experimental/cudf/tests include several types of tests:

operator tests
function tests
fuzz tests (not yet implemented)
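
Because these tests need a GPU, it can help to guard test invocations so that GPU-less machines skip them instead of failing. The following is a minimal sketch, assuming `nvidia-smi` (shipped with the NVIDIA driver) is used for detection. Both function names are hypothetical, and the `ctest` command is the one from the build section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the NVIDIA driver reports at least one
# GPU. `nvidia-smi -L` prints one line per device, starting with "GPU <index>".
has_nvidia_gpu() {
  command -v nvidia-smi >/dev/null 2>&1 || return 1
  nvidia-smi -L 2>/dev/null | grep -q '^GPU [0-9]'
}

# Hypothetical wrapper: run the cuDF tests in a build directory, or skip them
# with a message when no GPU is visible.
run_cudf_tests() {
  local build_dir="${1:-_build/release}"
  if ! has_nvidia_gpu; then
    echo "no NVIDIA GPU detected, skipping Velox-cuDF tests"
    return 0
  fi
  (cd "$build_dir" && ctest -R cudf -V)
}
```

Calling `run_cudf_tests _build/release` from the Velox source root then either runs the tests or reports that they were skipped.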

The repo rapidsai/velox-testing includes standard scripts for testing Velox-cuDF. Please refer to test_velox.sh for running the Velox-cuDF unit tests. We plan to first develop GitHub Actions for GPU CI in rapidsai/velox-testing and later transition GPU-enabled GitHub Actions to Velox mainline.

Operator tests

Many of the tests for cuDF are "operator tests", which confirm correct execution of simple query plans. cuDF's operator tests use CudfDriverAdapter to modify the test plan with GPU operators before executing it. The operator tests for cuDF include both tests that assert successful GPU operator replacement and tests that pass with CPU fallback.

Function tests

Velox-cuDF also includes "function tests" which cover the behavior of shared functions that could be called in multiple operators. Velox-cuDF function tests assess the correctness of functions using one or more cuDF API calls to provide the output. SubfieldFilterAstTest includes several examples of function tests. Please note that unit tests for cuDF APIs are included in cudf/cpp/tests rather than Velox.

Fuzz tests

Velox includes components for "fuzz testing" to ensure robustness of Velox operators. For instance, the Join Fuzzer executes a random join type with random inputs and compares the Velox results with a reference query engine. Fuzz testing tools have been used for cuDF operator development, but fuzz testing for cuDF is not yet integrated into Velox mainline.

Benchmarking Velox with cuDF

Velox's TpchBenchmark is derived from TPC-H and provides a convenient tool for benchmarking Velox's performance with OLAP (Online Analytical Processing) workloads. Velox-cuDF includes GPU operators for the hand-built query plans located in TpchQueryBuilder. Velox PR 13695 extends Velox's TpchBenchmark to the cuDF backend.

Please note that Velox's hand-built query plans require the data set to have floating-point types in place of the fixed-point types defined in the standard. Further development of Velox's TpchBenchmark could allow correct behavior with both fixed-point and floating-point types.
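
The distinction matters because the two representations can produce different results for the same input. As a small illustration that is independent of Velox and of TPC-H data, the sketch below adds two prices once as binary floating-point values and once as integer cents, a simple stand-in for fixed-point decimals. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sum as IEEE-754 doubles (awk performs arithmetic in double precision) and
# print 17 significant digits so that any rounding error is visible.
sum_floating() {
  awk 'BEGIN { printf "%.17g\n", 0.1 + 0.2 }'
}

# Sum as fixed-point with two decimal places, stored as integer cents.
sum_fixed() {
  local cents=$(( 10 + 20 ))
  printf '%d.%02d\n' $(( cents / 100 )) $(( cents % 100 ))
}

sum_floating   # prints 0.30000000000000004
sum_fixed      # prints 0.30
```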

Contributing

Velox-cuDF's development priorities are documented as Velox issues using the "[cuDF]" prefix. Please check out the open issues to learn more.

We would love to hear from you in Velox's Slack workspace. Please see Velox discussion 11348 for information on joining.
