Hi folks.
I'm trying to achieve the highest possible throughput for number generation and am making my first attempts in C. Unfortunately, I can't get multithreading to work; all my attempts have failed. The goal is simply to generate the numbers from 1 to 2^n and measure the time needed, so that I can make this as efficient as possible.
The user enters an integer n from 1 to 100, and the program then generates all numbers from 1 up to 2^n and writes them to the file output.txt. At the end, it displays the time required. My current simple program looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
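int main(void) {
    int n;
    printf("Enter the value of n (1-100): ");
    if (scanf("%d", &n) != 1 || n < 1 || n > 100) {
        printf("Invalid input. Please enter an integer between 1 and 100.\n");
        return 1;
    }
    /* An int counter overflows for n >= 31, and even a 64-bit unsigned
       counter cannot hold 2^n for n >= 64, so larger values are rejected. */
    if (n > 63) {
        printf("n = %d is too large for a 64-bit counter.\n", n);
        return 1;
    }
    clock_t start_time, end_time;
    double total_time;
    /* Note: clock() measures CPU time used by the process, not wall-clock time. */
    start_time = clock();
    FILE *output_file = fopen("output.txt", "w");
    if (output_file == NULL) {
        printf("Error opening file.\n");
        return 1;
    }
    /* 1ULL << n is exactly 2^n; pow() returns a double and loses precision. */
    unsigned long long end_number = 1ULL << n;
    for (unsigned long long i = 1; i <= end_number; i++) {
        fprintf(output_file, "%llu\n", i);
    }
    fclose(output_file);
    end_time = clock();
    total_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
    printf("Total runtime: %.4f seconds\n", total_time);
    return 0;
}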
Here are some benchmark results on my machine for this simple program:
25 bit --> 1.9 seconds
28 bit --> 9.87 seconds
30 bit --> 40.01 seconds
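Expressed as throughput, these timings work out to roughly 17.7, 27.2 and 26.8 million numbers per second. The small helper below shows the arithmetic; the function name numbers_per_second is only an illustrative choice, not part of the program above.

```c
/* Throughput of a run that generated 2^bits numbers in `seconds` seconds.
   Assumes 0 <= bits <= 63 and seconds > 0. */
double numbers_per_second(int bits, double seconds) {
    return (double)(1ULL << bits) / seconds;
}
```

For example, numbers_per_second(30, 40.01) gives the figure for the 30-bit run.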
I would like to be able to use multithreading or multiprocessing so that the program runs not only on one thread/core but optionally on several or all of the cores available to the system. The load would have to be shared, similar to Python's concurrent.futures, with which I have had the best experience.
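To make the idea of sharing the load concrete, here is a minimal sketch using POSIX threads. It rests on several assumptions: the range 1..2^n is split into contiguous pieces, each thread writes its piece to its own file, and the names job_t, write_range, generate_parallel and the "<prefix>.part<k>" file naming are all hypothetical. It needs a platform with pthreads (typically compiled with -pthread), and it is not guaranteed to be faster, since the disk rather than the CPU may be the limiting factor.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* One thread's share of the work: write first..last to its own file. */
typedef struct {
    unsigned long long first, last;
    char path[256];   /* hypothetical naming scheme: <prefix>.part<k> */
    int ok;           /* set to 1 by the worker on success */
} job_t;

static void *write_range(void *arg) {
    job_t *job = arg;
    FILE *out = fopen(job->path, "w");
    if (out == NULL) return NULL;
    for (unsigned long long v = job->first; v <= job->last; v++)
        fprintf(out, "%llu\n", v);
    job->ok = (fclose(out) == 0);
    return NULL;
}

/* Splits 1..2^n into num_threads contiguous ranges, one file per thread.
   Returns 0 on success, -1 on error. n is limited to 62 in this sketch. */
int generate_parallel(const char *prefix, int n, int num_threads) {
    if (n < 1 || n > 62 || num_threads < 1) return -1;
    unsigned long long total = 1ULL << n;
    /* Never start more threads than there are numbers to write. */
    if ((unsigned long long)num_threads > total) num_threads = (int)total;

    pthread_t *tids = malloc((size_t)num_threads * sizeof *tids);
    job_t *jobs = calloc((size_t)num_threads, sizeof *jobs);
    if (tids == NULL || jobs == NULL) { free(tids); free(jobs); return -1; }

    unsigned long long per = total / (unsigned long long)num_threads;
    unsigned long long extra = total % (unsigned long long)num_threads;
    unsigned long long next = 1;
    int started = 0;
    for (int t = 0; t < num_threads; t++) {
        /* The first `extra` threads take one additional number each. */
        unsigned long long count = per + ((unsigned long long)t < extra ? 1 : 0);
        jobs[t].first = next;
        jobs[t].last = next + count - 1;
        next += count;
        snprintf(jobs[t].path, sizeof jobs[t].path, "%s.part%d", prefix, t);
        if (pthread_create(&tids[t], NULL, write_range, &jobs[t]) != 0) break;
        started++;
    }

    int status = (started == num_threads) ? 0 : -1;
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        if (!jobs[t].ok) status = -1;
    }
    free(tids);
    free(jobs);
    return status;
}
```

A call such as generate_parallel("output", 25, 4) would produce output.part0 to output.part3, which, concatenated in order, contain the same lines as the single-threaded output.txt. Writing separate files avoids having the threads coordinate access to one shared file.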
The very best would of course be to use CUDA, but I have absolutely no idea how to write the kernel for this. A typical sequence of operations for a CUDA C program is:
Declare and allocate host and device memory.
Initialize host data.
Transfer data from the host to the device.
Execute one or more kernels.
Transfer results from the device to the host.
If anyone could assist with a simple basic structure as a template, that would help me a lot. I am very grateful for any tips.