Velox-cuDF is a Velox extension module that uses the cuDF library to implement a GPU-accelerated backend for executing Velox plans. cuDF is an open source library for GPU data processing, and Velox-cuDF integrates with "libcudf", the CUDA C++ core of cuDF. libcudf uses Arrow-compatible data layouts and includes single-node, single-GPU algorithms for data processing.
Velox-cuDF implements the Velox DriverAdapter interface as CudfDriverAdapter to rewrite query plans for GPU execution. Generally the cuDF DriverAdapter replaces operators one-to-one. For end-to-end GPU execution where cuDF replaces all of the Velox CPU operators, cuDF relies on Velox's pipeline-based execution model to separate stages of execution, partition the work across drivers, and schedule concurrent work on the GPU.
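The one-to-one replacement idea can be sketched with a toy model. This is an illustration only, not the CudfDriverAdapter implementation: ToyOperator, ReplacementTable, adaptPipeline, and isFullyOnGpu are hypothetical names invented for this sketch, and the real adapter works on Velox operators rather than on strings. The sketch assumes that an operator with no registered GPU replacement stays on the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for one operator in a driver's pipeline (not a Velox type).
struct ToyOperator {
  std::string name;
  bool onGpu;
};

// Hypothetical lookup table: CPU operator name -> GPU replacement name.
using ReplacementTable = std::map<std::string, std::string>;

// Replaces operators one-to-one where a GPU replacement is registered and
// leaves the rest on the CPU. Returns the number of operators replaced.
inline std::size_t adaptPipeline(
    std::vector<ToyOperator>& pipeline,
    const ReplacementTable& table) {
  std::size_t replaced = 0;
  for (auto& op : pipeline) {
    if (op.onGpu) {
      continue; // Already a GPU operator.
    }
    const auto it = table.find(op.name);
    if (it == table.end()) {
      continue; // No replacement known: keep the CPU operator (fallback).
    }
    op = ToyOperator{it->second, true};
    ++replaced;
  }
  // One-to-one replacement never changes the pipeline length.
  assert(replaced <= pipeline.size());
  return replaced;
}

// True when every operator in the pipeline runs on the GPU.
inline bool isFullyOnGpu(const std::vector<ToyOperator>& pipeline) {
  return std::all_of(
      pipeline.begin(), pipeline.end(), [](const ToyOperator& op) {
        return op.onGpu;
      });
}
```

In this toy model, a pipeline is end-to-end on the GPU only when isFullyOnGpu returns true after adaptPipeline has run. Any operator missing from the table remains a CPU operator in the same position.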
For more information please refer to our blog: "Extending Velox - GPU Acceleration with cuDF."
cuDF supports Linux and WSL2 but not Windows or macOS. cuDF also has minimum CUDA version, NVIDIA driver, and GPU architecture requirements, which are listed in the RAPIDS Installation Guide. Please refer to cuDF's readme and developer guide for more information.
The cuDF backend is included in Velox builds when the VELOX_ENABLE_CUDF CMake option is set. The adapters-cuda service in Velox's docker-compose.yml is an excellent starting point for Velox builds with cuDF.
- Use docker compose to run an adapters-cuda image.
$ docker compose -f docker-compose.yml run -e NUM_THREADS=8 --rm -v "$(pwd):/velox" adapters-cuda /bin/bash
- Once inside the image, build cuDF with the following flags:
$ CUDA_ARCHITECTURES="native" EXTRA_CMAKE_FLAGS="-DVELOX_ENABLE_ARROW=ON -DVELOX_ENABLE_PARQUET=ON -DVELOX_ENABLE_BENCHMARKS=ON -DVELOX_ENABLE_BENCHMARKS_BASIC=ON" make cudf
- After cuDF is built, verify the build by running the unit tests.
$ cd _build/release
$ ctest -R cudf -V
Velox-cuDF builds are included in Velox CI as part of the adapters build. The build step for cuDF does not require the worker to have a GPU, so adding a Velox-cuDF build step to Velox CI is compatible with the existing runners.
Velox-cuDF provides several configuration properties to control GPU execution behavior, memory management, and debugging. These configurations are available when compiled with cuDF support and can be set via Velox's configuration system. For a complete list of cuDF-specific configuration properties and their descriptions, see the Cudf-specific Configuration section in the Velox configuration documentation.
Velox-cuDF tests can only run on GPU-enabled hardware. The Velox-cuDF tests in experimental/cudf/tests fall into several categories:
- operator tests
- function tests
- fuzz tests (not yet implemented)
The repo rapidsai/velox-testing includes standard scripts for testing Velox-cuDF. Please refer to test_velox.sh for running the Velox-cuDF unit tests. We plan to first develop GitHub Actions for GPU CI in rapidsai/velox-testing, and then later transition GPU-enabled GitHub Actions to Velox mainline.
Many of the tests for cuDF are "operator tests" which confirm correct execution of simple query plans. cuDF's operator tests use CudfDriverAdapter to modify the test plan with GPU operators before executing it. The operator tests for cuDF include both tests that assert successful GPU operator replacement, and tests that pass with CPU fallback.
Velox-cuDF also includes "function tests" which cover the behavior of shared functions that could be called in multiple operators. Velox-cuDF function tests assess the correctness of functions using one or more cuDF API calls to provide the output. SubfieldFilterAstTest includes several examples of function tests. Please note that unit tests for cuDF APIs are included in cudf/cpp/tests rather than Velox.
Velox includes components for "fuzz testing" to ensure robustness of Velox operators. For instance, the Join Fuzzer executes a random join type with random inputs and compares the Velox results with a reference query engine. Fuzz testing tools have been used for cuDF operator development, but fuzz testing for cuDF is not yet integrated into Velox mainline.
Velox's TpchBenchmark is derived from TPC-H and provides a convenient tool for benchmarking Velox's performance with OLAP (Online Analytical Processing) workloads. Velox-cuDF includes GPU operators for the hand-built query plans located in TpchQueryBuilder. Velox PR 13695 extends Velox's TpchBenchmark to the cuDF backend.
Please note that Velox's hand-built query plans require the data set to have floating-point types in place of the fixed-point types defined in the standard. Further development of Velox's TpchBenchmark could allow correct behavior with both fixed-point and floating-point types.
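A small sketch can show why fixed-point and floating-point columns are not interchangeable in a query plan. It is not taken from TpchBenchmark. The helper names sumAsDouble and sumAsHundredths are hypothetical, and the scale of two decimal places is an assumption chosen for illustration.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Sums values stored as binary floating point. Each addition may round.
inline double sumAsDouble(const std::vector<double>& values) {
  return std::accumulate(values.begin(), values.end(), 0.0);
}

// Sums values stored as scaled integers (hundredths), the way a fixed-point
// column with two decimal places could be represented. Integer addition is
// exact as long as the total fits in 64 bits.
inline std::int64_t sumAsHundredths(const std::vector<std::int64_t>& values) {
  return std::accumulate(values.begin(), values.end(), std::int64_t{0});
}
```

With the inputs 0.1 and 0.2, the double sum does not compare equal to 0.3, while the scaled-integer sum of 10 and 20 is exactly 30. Differences of this kind are one reason a plan written for floating-point columns may not give matching results on fixed-point data.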
Velox-cuDF's development priorities are documented as Velox issues using the "[cuDF]" prefix. Please check out the open issues to learn more.
We would love to hear from you in Velox's Slack workspace. Please see Velox discussion 11348 for information on joining.