
Topic: fastest way in C to generate numbers - page 2. (Read 327 times)

hero member
Activity: 560
Merit: 1060
February 09, 2024, 03:37:01 PM
#2
Hello, let me share my comments and perhaps we can construct something good.

First of all, I will not comment on CUDA because I am not very familiar with it and I don't want to mislead you.

Now, my comments:

1. You basically want to generate the numbers from 1 to 2^n, where 1 <= n <= 100. I can assure you that writing them to the file takes far more time than just generating them in RAM. If you compare the time it takes to fprintf every number to the file with the time it takes to fprintf only the last number (when the program terminates), the difference will be dramatic. But I believe you do want all the numbers written out, so this comment may not help you much.

2. Writing to the file requires consecutive disk I/Os, which are very time-consuming. A workaround is to write the numbers in batches. In C you still need a loop (or recursion) to convert each number to text, so that cost does not go away, but the output itself can be batched: format the numbers into a buffer in RAM (say, 10,000 numbers at a time) and then hand the whole buffer to a single fwrite call. Keep in mind that stdio already buffers fprintf output internally, so any gain comes mostly from cheaper formatting and a larger buffer, and it has to be measured. The same concept applies in other programming languages.

3. You assign the numbers to an integer (int). With the usual 32-bit int, the range is -2,147,483,648 to 2,147,483,647. The latter is equal to 2^31 - 1, which means that n cannot be larger than 30. So if you enter n=100, you will get an integer overflow (strictly speaking, converting an out-of-range pow result to int is undefined behavior). Try it, just for fun.

Disclaimer: My last C program was written 10 years ago, so please be nice to me :P
hero member
Activity: 630
Merit: 731
Bitcoin g33k
February 09, 2024, 02:29:56 PM
#1
Hi folks.

I'm trying to achieve the highest possible throughput for number generation and am making my first attempts in C. Unfortunately, I can't get multithreading to work; all my attempts have failed. The goal is simply to generate the numbers from 1 to 2^n and measure the time needed, in order to work as efficiently as possible. My current simple program looks like this:

The user can enter an integer number from 1-100 and then the program starts and generates all numbers starting from 1 up to 2^n. It writes the result into the output.txt file. At the end of the program it displays the required time.

Code:
#include <stdio.h>
#include <math.h>
#include <time.h>

int main() {
    int n;
    printf("Enter the value of n (1-100): ");
    scanf("%d", &n);

    if (n < 1 || n > 100) {
        printf("Invalid input. Please enter an integer between 1 and 100.\n");
        return 1;
    }

    clock_t start_time, end_time;
    double total_time;

    start_time = clock();

    FILE *output_file = fopen("output.txt", "w");
    if (output_file == NULL) {
        printf("Error opening file.\n");
        return 1;
    }

    int end_number = pow(2, n);

    for (int i = 1; i <= end_number; i++) {
        fprintf(output_file, "%d\n", i);
    }

    fclose(output_file);

    end_time = clock();
    total_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
    printf("Total runtime: %.4f seconds\n", total_time);

    return 0;
}

Here are some benchmark results on my machine for this simple program:

Quote
25 bit --> 1.9 seconds
28 bit --> 9.87 seconds
30 bit --> 40.01 seconds

I would like to be able to use multithreading or multiprocessing so that the program runs not only on one thread/core but optionally on several or all cores available to the system. The load would have to be shared, similar to Python's concurrent.futures, with which I have had the best experience.

The very best would of course be the use of CUDA, but I have absolutely no idea how to write the kernel for this. A typical sequence of operations for a CUDA C program is:

Declare and allocate host and device memory.
Initialize host data.
Transfer data from the host to the device.
Execute one or more kernels.
Transfer results from the device to the host.

If anyone could assist with a simple basic structure as a template, that would help me a lot. I am very grateful for any tips.