thrust - Get unique elements of multiple arrays in CUDA -


here problem: there number of arrays, example, 2000 arrays, 256 integers in each array. , range of integers quite considerable, [0, 1000000] instance.

i want unique elements each array, in other words, remove duplicate elements. have 2 solutions:

  1. use thrust unique element every array, have 2000 times thrust::unique. each array pretty small, way may not performance.

  2. implement hash table in cuda kernel, use 2000 blocks, 256 thread in each block. , make use of shared memory implement hash table, every single block produce element-unique array.

the above 2 methods seem unprofessional, there elegant ways solve problem cuda ?

you can use thrust::unique if modify data similar done in question: segmented sort cudpp/thrust

for simplification, let's assume each array contains per_array elements , there total of array_num arrays. each element in range [0,max_element].

demo data per_array=4, array_num=3 , max_element=2 this:

data = {1,0,1,2},{2,2,0,0},{0,0,0,0} 

to denote membership of each element respective array use following flags:

flags = {0,0,0,0},{1 1 1 1},{2,2,2,2} 

in order unique elements per array of segmented dataset need following steps:

  1. transform data elements of each array i within unique range [i*2*max_element,i*2*max_element+max_element]

    data = data + flags*2*max_element data = {1,0,1,2},{6,6,4,4},{8,8,8,8} 
  2. sort transformed data:

    data = {0,0,1,2},{4,4,6,6},{8,8,8,8} 
  3. apply thrust::unique_by_key using data keys , flags values:

    data  = {0,1,2}{4,6}{8} flags = {0,0,0}{1,1}{2} 
  4. transform data original values:

    data  = data - flags*2*max_element data  = {0,1,2}{0,2}{0} 

the maximum value of max_element bounded size of integer used representing data. if unsigned integer n bits:

max_max_element(n,array_num) = 2^n/(2*(array_num-1)+1) 

given array_num=2000, following limits 32bit , 64bit unsigned integers:

max_max_element(32,2000) = 1074010 max_max_element(64,2000) = 4612839228234447 

the following code implements above steps:

unique_per_array.cu

#include <thrust/device_vector.h> #include <thrust/extrema.h> #include <thrust/transform.h> #include <thrust/iterator/counting_iterator.h> #include <thrust/functional.h> #include <thrust/sort.h> #include <thrust/unique.h> #include <thrust/copy.h>  #include <iostream> #include <cstdint>  #define printer(name) print(#name, (name)) template <template <typename...> class v, typename t, typename ...args> void print(const char* name, const v<t,args...> & v) {     std::cout << name << ":\t";     thrust::copy(v.begin(), v.end(), std::ostream_iterator<t>(std::cout, "\t"));     std::cout << std::endl; }  int main() {      typedef uint32_t integer;      const std::size_t per_array = 4;     const std::size_t array_num = 3;      const std::size_t total_count = array_num * per_array;      integer demo_data[] = {1,0,1,2,2,2,0,0,0,0,0,0};      thrust::device_vector<integer> data(demo_data, demo_data+total_count);          printer(data);      // if max_element known problem,     // don't need following operation      integer max_element = *(thrust::max_element(data.begin(), data.end()));     std::cout << "max_element=" << max_element << std::endl;      using namespace thrust::placeholders;      // create flags      // smaller integer type     thrust::device_vector<uint32_t> flags(total_count);      thrust::counting_iterator<uint32_t> flags_cit(0);      thrust::transform(flags_cit,                       flags_cit + total_count,                       flags.begin(),                       _1 / per_array);     printer(flags);       // 1. transform data unique ranges       thrust::transform(data.begin(),                       data.end(),                       thrust::counting_iterator<integer>(0),                       data.begin(),                       _1 + (_2/per_array)*2*max_element);     printer(data);      // 2. sort transformed data     thrust::sort(data.begin(), data.end());     printer(data);      // 3. eliminate duplicates per array     auto new_end = thrust::unique_by_key(data.begin(),                                          data.end(),                                          flags.begin());      uint32_t new_size = new_end.first - data.begin();     data.resize(new_size);     flags.resize(new_size);      printer(data);     printer(flags);      // 4. transform data     thrust::transform(data.begin(),                       data.end(),                       flags.begin(),                       data.begin(),                       _1 - _2*2*max_element);      printer(data);  }     

compiling , running yields:

$ nvcc -std=c++11 unique_per_array.cu -o unique_per_array && ./unique_per_array  data:   1   0   1   2   2   2   0   0   0   0   0   0    max_element=2 flags:  0   0   0   0   1   1   1   1   2   2   2   2    data:   1   0   1   2   6   6   4   4   8   8   8   8    data:   0   1   1   2   4   4   6   6   8   8   8   8    data:   0   1   2   4   6   8    flags:  0   0   0   1   1   2    data:   0   1   2   0   2   0    

one more thing:

in thrust development version there improvement implemented thrust::unique* improves performance around 25 %. might want try version if aim better performance.


Comments

Popular posts from this blog

How to show in django cms breadcrumbs full path? -

php - Invalid Cofiguration - yii\base\InvalidConfigException - Yii2 -

ruby on rails - npm error: tunneling socket could not be established, cause=connect ETIMEDOUT -