
Topic: BSGS solver for cuda - page 9. (Read 3292 times)

sr. member
Activity: 616
Merit: 312
October 15, 2021, 06:31:56 AM
#39


I can only add code to get the correct number of Ampere cores.
I can't fix the memory issue (the real fix is about getting 64-bit values returned instead of 32-bit ones), because I can't use the unofficial _v2 commands together with the official commands in the same app.
jr. member
Activity: 40
Merit: 7
October 15, 2021, 06:27:15 AM
#38
Thanks, man, for the information. Can you please fix the memory and Ampere issues? Is it possible? And could you recompile it? I am unable to compile it with PureBasic because the free version has limitations.
sr. member
Activity: 616
Merit: 312
October 15, 2021, 06:24:17 AM
#37
The program uses the CUDA driver API (not the runtime API that is usually used), and the GPU code is written in PTX.
The cuda.lib used to call the CUDA driver API always returns 32-bit values, even in the x64 version.
Because of that, you can't use or allocate more than 2**32 bytes of GPU memory.
cuDeviceTotalMem() also returns a 32-bit memory size, which is why you see 4095Mb.
I wrote to NVIDIA about this issue a few times, but according to them there is no problem )
If you look inside cuda.lib, you will find unofficial commands such as cuDeviceTotalMem_v2 and others.
All of these commands have the suffix _v2, and they return correct 64-bit values.
But NVIDIA says they don't have any commands with the _v2 suffix ))
That covers the 2**32-byte GPU memory limit.
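A quick arithmetic check of the numbers above, as a minimal Python sketch. The largest value an unsigned 32-bit integer can hold is 2**32 - 1 bytes, which is 4095 whole megabytes, the same figure the program prints. How the 32-bit call arrives at that value (clamping to the maximum is only my guess) is not confirmed here, and the exact 10 GiB card size is an assumption based on the reports in this thread.

```python
# Sanity check: what a 32-bit size field can and cannot represent.
MIB = 2**20
UINT32_MAX = 2**32 - 1          # largest unsigned 32-bit value


def fits_in_uint32(n_bytes):
    """True if n_bytes can be stored in an unsigned 32-bit integer."""
    return 0 <= n_bytes <= UINT32_MAX


def whole_mib(n_bytes):
    """Size in whole mebibytes, rounded down."""
    return n_bytes // MIB


# Assumption: a 10 GiB card (such as the RTX 3080 discussed here) reports exactly 10 * 2**30 bytes.
card_bytes = 10 * 2**30

print(whole_mib(UINT32_MAX))        # 4095
print(fits_in_uint32(card_bytes))   # False
print(whole_mib(card_bytes))        # 10240
```

So any size above 4095Mb simply cannot come back through a 32-bit return value, whatever the card really has.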
As for "Device have: MP:68 Cores+0": the 0 is there because I didn't add Ampere to the program:
Code:
          Case 2 ; Fermi
            Debug "Fermi"
            If minor=1
              cores = mp * 48
            Else
              cores = mp * 32
            EndIf
          Case 3 ; Kepler
            Debug "Kepler"
            cores = mp * 192

          Case 5 ; Maxwell
            Debug "Maxwell"
            cores = mp * 128

          Case 6 ; Pascal
            Debug "Pascal"
            If minor=0
              cores = mp * 64  ; GP100 (compute capability 6.0)
            Else
              cores = mp * 128 ; compute capability 6.1/6.2
            EndIf

          Case 7 ; Volta / Turing
            Debug "Volta/Turing"
            cores = mp * 64
          Default
            Debug "Unknown device type"
        EndSelect
By the way, this is needed only for information and nothing more.
To get the correct number of cores, only this needs to be added:
Code:
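          Case 8 ; Ampere
            Debug "Ampere RTX"
            If minor=0
              cores = mp * 64  ; GA100 (compute capability 8.0)
            Else
              cores = mp * 128 ; compute capability 8.6, e.g. RTX 30xx
            EndIf
          Default
            Debug "Unknown device type"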
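For anyone who cannot compile the PureBasic source, here is the same lookup as a small Python sketch, only to check the arithmetic. The per-multiprocessor core counts are the values I know from NVIDIA's published tables (the CUDA samples keep a similar table in helper_cuda.h), so verify them before relying on them. The names CORES_PER_SM and total_cores are made up for this example, and treating the RTX 3080 as compute capability 8.6 is my assumption, since the thread does not state it.

```python
# FP32 CUDA cores per multiprocessor (MP/SM), keyed by (major, minor) compute capability.
# Values are assumptions taken from NVIDIA's public tables; check them for your card.
CORES_PER_SM = {
    (2, 0): 32,  (2, 1): 48,                  # Fermi
    (3, 0): 192, (3, 5): 192, (3, 7): 192,    # Kepler
    (5, 0): 128, (5, 2): 128,                 # Maxwell
    (6, 0): 64,  (6, 1): 128,                 # Pascal
    (7, 0): 64,  (7, 5): 64,                  # Volta / Turing
    (8, 0): 64,  (8, 6): 128,                 # Ampere
}


def total_cores(mp, major, minor):
    """Return mp * cores-per-MP, or 0 when the compute capability is not in the table."""
    return mp * CORES_PER_SM.get((major, minor), 0)


print(total_cores(68, 8, 6))   # 8704
print(total_cores(68, 9, 9))   # 0 (unknown entry, like the "Cores+0" output)
```

With 68 multiprocessors this gives 8704, which matches the "8k+" CUDA cores expected for a 3080, while an entry missing from the table gives 0 just like the program does today.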
jr. member
Activity: 40
Merit: 7
October 15, 2021, 06:21:44 AM
#36
I agree with you, but the free version of PureBasic can only compile a limited number of code lines, so that's why I need help from @Etar.

Also, the program sets the memory automatically but calculates it incorrectly.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
October 15, 2021, 06:11:54 AM
#35
There is no need to wait for a patch; you can get these stats yourself on an NVIDIA card using their deviceQuery sample program: https://github.com/NVIDIA/cuda-samples/blob/master/Samples/deviceQuery/deviceQuery.cpp - It needs to be compiled from source, but that is extremely easy to do since it's only a single file.
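If compiling is a hurdle, a different route (not deviceQuery) is to ask the nvidia-smi tool that ships with the driver. The Python sketch below assumes nvidia-smi is on the PATH and that its --query-gpu=name,memory.total output with --format=csv,noheader,nounits is one "name, MiB" line per GPU. The helper names parse_gpu_lines and query_gpus are invented for this example. It reports only the name and total memory, not the multiprocessor count.

```python
import shutil
import subprocess


def parse_gpu_lines(text):
    """Parse 'name, memory_mib' CSV lines into (name, memory_mib) tuples."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue                         # skip blank lines
        name, _, mem = line.rpartition(",")  # split on the last comma
        try:
            gpus.append((name.strip(), int(mem.strip())))
        except ValueError:
            continue                         # skip lines without a numeric size
    return gpus


def query_gpus():
    """Return [(name, total_memory_mib), ...], or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpu_lines(result.stdout)


for gpu_name, mem_mib in query_gpus():
    print(f"{gpu_name}: {mem_mib} MiB")
```

On a machine without the NVIDIA driver it prints nothing, so it is safe to run anywhere.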
jr. member
Activity: 40
Merit: 7
October 15, 2021, 04:29:17 AM
#34
I think I found the problem: the information the program pulls from the device is wrong, or these are maximum values intentionally hardcoded in the program. Etar, can you please make everything dynamic? I mean the device should report all parameters.

Found 1 Cuda device.
Cuda device:GeForce RTX 3080(4095Mb)    wrong
Device have: MP:68 Cores+0                     wrong
Shared memory total:49152                      I guess this is system memory, but 128GB is available
Constant memory total:65536                    not sure how this one is calculated

I am not sure, but I thought MP was a unit for AMD cards and CUDA cores for NVIDIA. The 3080 has 8k+ CUDA cores, so I am not sure what the 68 means here.
So many confusions.
jr. member
Activity: 40
Merit: 7
October 15, 2021, 04:24:12 AM
#33
The source code is available here, I guess: https://github.com/Etayson/BSGS-cuda/blob/main/bsgscudaussualHTchangeble1_2.pb Can you check, please?
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
October 15, 2021, 03:52:45 AM
#32
Possibly due to "memory fragmentation": when the program allocates GPU memory for one struct, it may be placed in the middle of GPU memory, and that limits the maximum contiguous allocation available on the GPU for other structs.

The resolution is to allocate the largest structure first (in this case the TotalBuff) and the smaller ones last. It requires a code modification though, which is impossible to do without the source code.
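To make the fragmentation argument concrete, here is a toy Python model. It is not how the CUDA allocator actually places buffers (I don't know that); it only measures the largest contiguous free span once some blocks are already in place. The 10240 MiB device size, the 64 MiB buffer, and the function name largest_free_block are all hypothetical.

```python
def largest_free_block(total, allocations):
    """Largest contiguous free span, given a list of (offset, size) allocations.

    Allocations are assumed not to overlap and to lie inside [0, total).
    """
    largest = 0
    cursor = 0                                    # end of the previous allocation
    for offset, size in sorted(allocations):
        largest = max(largest, offset - cursor)   # gap before this allocation
        cursor = offset + size
    return max(largest, total - cursor)           # gap after the last allocation


TOTAL = 10240   # MiB, hypothetical 10 GiB device
SMALL = 64      # MiB, hypothetical small buffer

# Small buffer placed in the middle of memory: no span larger than about half remains.
print(largest_free_block(TOTAL, [(5088, SMALL)]))   # 5088
# Same buffer placed at the start: almost everything stays contiguous.
print(largest_free_block(TOTAL, [(0, SMALL)]))      # 10176
```

In this model a single small block in the middle already rules out an 8112Mb request, which is why requesting the big buffer first is worth trying.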
jr. member
Activity: 40
Merit: 7
October 15, 2021, 03:37:15 AM
#31
GPU #0 launched
GPU #0 TotalBuff: 8112.000Mb
error cuMemAlloc-2
Press Enter to exit

I guess you hardcoded 4096 MB of GPU memory: I tried everything, but I am unable to use the full GPU memory. My GPU is a 3080 with 10GB.

This is the maximum I can use:

GPU #0 launched
GPU #0 TotalBuff: 3216.000Mb

The speed I am getting, around 1200M, is also slower than Kangaroo. I want to tweak it to use the maximum GPU memory and RAM at full power, but increasing the item size slows it down and makes the solve take longer.

Any idea how to tweak it?
member
Activity: 71
Merit: 43
October 14, 2021, 06:08:14 AM
#30
I don't have a 3xxx series card available, but based on the specs I can calculate the average speed.
With one 3090 or 3080 Ti, 2^51 operations should be done in 6-7 days.
//edit
Based on your previous post, your Tesla K80 will find (if lucky) a private key in a 100-bit range in ~25-26 days.
newbie
Activity: 9
Merit: 0
October 14, 2021, 05:45:59 AM
#29
With the RTX 3xxx series, maybe it can be done in hours?
The 2 pubkeys I posted were generated randomly, one from the first half and the other from the second half of the 100-bit range. I want to know how fast the RTX 3xxx series can find them, because I need to calculate times. If you have an RTX card and some time, finding those pubkeys in the 100-bit range would help me estimate the power of the 3xxx series.
Thanks.
member
Activity: 71
Merit: 43
October 14, 2021, 05:25:35 AM
#28
It's quite possible to solve the 100-bit puzzle with a single video card, and not even the most powerful one (Kangaroo method).
On a single RTX 2060 you can find such a key in 34-35 days (2^51 operations). Sometimes you don't even need the full 2^51; you can find the key by the time you reach 2^50 (which means half the time, ~17 days).
An RTX 2080 is almost 50% faster than a 2060, which brings this down to ~23 days for the full 2^51 operations.
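The day counts above are plain arithmetic, so here is a short Python sketch to redo them for any card. It takes 34.5 days (the midpoint of 34-35) for the RTX 2060 and backs out the speed this implies; that speed is derived from the estimate, not measured. The function names days_needed and implied_rate are made up for this example.

```python
SECONDS_PER_DAY = 86_400


def days_needed(ops, ops_per_second):
    """Days required to perform `ops` operations at a constant rate."""
    return ops / ops_per_second / SECONDS_PER_DAY


def implied_rate(ops, days):
    """Operations per second implied by finishing `ops` operations in `days` days."""
    return ops / (days * SECONDS_PER_DAY)


OPS = 2**51

rate_2060 = implied_rate(OPS, 34.5)                       # midpoint of the 34-35 day estimate
print(f"{rate_2060 / 1e6:.0f} M ops/s")                   # 755 M ops/s
print(f"{days_needed(OPS, 1.5 * rate_2060):.1f} days")    # 23.0 days at +50% speed
print(f"{days_needed(2**50, rate_2060):.2f} days")        # 17.25 days for 2^50
```

Plug in your own measured speed as ops_per_second to get an estimate for another card.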
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
October 14, 2021, 04:15:17 AM
#27
They definitely do not do it fast, because that's what would happen if the range was 50-60 bits... Are you sure his program wasn't published after he took the #100 coins? Maybe his was the only Kangaroo program at the time and he kept it to himself until he found some private key.
newbie
Activity: 9
Merit: 0
October 13, 2021, 02:48:29 PM
#26
As I see it, the 100-bit puzzle was solved by telaurist, who wrote the first Kangaroo version for CPU, and he used 1 GPU to find it.
Maybe the latest cards do it fast.
member
Activity: 170
Merit: 58
October 13, 2021, 02:01:36 PM
#25
Why? There are no coins.
member
Activity: 173
Merit: 12
October 13, 2021, 01:50:50 PM
#24
Is it really possible to find a 100-bit key on one video card? How long does it take for this?
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
October 13, 2021, 12:32:10 PM
#23
Sorry I made a mistake, anything with ccap 3.5+ will work. Yours is a Kepler GK210 model with ccap 3.7, so it should work fine [despite the caveat on Wikipedia saying that CUDA Toolkit 11.x only partially supports Kepler].

newbie
Activity: 9
Merit: 0
October 13, 2021, 12:24:14 PM
#22
I purchased a Tesla K80, which will arrive in approximately 7 days. Will that work?
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
October 13, 2021, 12:18:33 PM
#21
CUDA toolkits don't support your CUDA version and ccap anymore, so it is highly unlikely you will find any brute-forcing software that works with your GPU. You're better off using a newer GPU with ccap 6.0+ (even then, there is no Linux port of this code).
newbie
Activity: 9
Merit: 0
October 13, 2021, 12:11:22 PM
#20
Searching for these 2 pubkeys in the 100-bit range:
034786ac12686480348261b5dce84efcffc27b56b512ca793a09229ed06d63058d
027ede4f01c7dd2690603cd0449fc4e4ac9ca2d11de2404ef2285ab897d2645391

Can someone help me understand which GPU hardware models you are using for the result data above?

Is there any Ubuntu build or source code of the program available for CUDA 8.0 and ccap 2.0, g++ 4.8?

Love to see your updates.
