Author

Topic: Pollard's kangaroo ECDLP solver - page 124. (Read 55445 times)

sr. member
Activity: 462
Merit: 696
May 25, 2020, 02:52:14 PM
That's what we found. You say that "this check is added in the dbg version", so in that case this led to errors in the earlier version.
Thanks a lot for the job! Hope the server will work like a charm.

This check was already there in the first debug release tested by zielar, which crashed.
As I said, under normal conditions a corrupted hash should not happen.
The thing that seems to solve the problem is this:

thread.cpp
Code:
#ifndef WIN64
  pthread_mutex_init(&ghMutex, NULL); // Why ?
  setvbuf(stdout, NULL, _IONBF, 0);   // make stdout unbuffered
#else
  ghMutex = CreateMutex(NULL,FALSE,NULL); // unnamed mutex, not initially owned
#endif

I had already done that on Linux, and added a "Why ?" comment because I didn't understand it.
On Linux, without re-initializing the mutex in the server's ProcessThread function, it hangs immediately, even though this mutex had never been locked before.
On Windows, it seems that the mutex does not work correctly and the local DP cache can get corrupted.
The mutexes are initialized in the class constructor, so in the main thread.
So I think the issue is due to mutex ownership, but this is not yet fully clear.
3h10 without crash....
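For what it's worth, a portable way to sidestep the pthread/Win32 differences entirely would be std::mutex from the C++ standard library. Below is only a sketch of that alternative, not the code actually used in Kangaroo; the class and member names are invented for illustration.
Code:
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

// Illustrative only: a shared DP cache guarded by std::mutex.
// std::mutex needs no explicit init/destroy and behaves the same on
// Linux and Windows, so the ownership question at construction time
// does not arise.
class DPCache {
public:
  void Add(uint32_t h, const std::vector<uint8_t> &dp) {
    std::lock_guard<std::mutex> lock(m);  // released automatically on return
    table[h].push_back(dp);
  }
  size_t Size() {
    std::lock_guard<std::mutex> lock(m);
    return table.size();
  }
private:
  std::mutex m;                           // constructed together with the object
  std::unordered_map<uint32_t, std::vector<std::vector<uint8_t>>> table;
};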

sr. member
Activity: 616
Merit: 312
May 25, 2020, 02:23:25 PM
-snip-
The number of DPs is encoded on 4 bytes. But if you send a random hash, in the original 1.5 there is no check of the hash validity, so when it is added to the hashtable you get an illegal access. This check is added in the dbg version. However, in a normal situation this should not happen. The server is badly protected against protocol attacks.
2h35 without crash...
That's what we found. You say that "this check is added in the dbg version", so in that case this led to errors in the earlier version.
Thanks a lot for the job! Hope the server will work like a charm.
sr. member
Activity: 462
Merit: 696
May 25, 2020, 02:17:59 PM
-snip-

Yes, this is why the Read function has a loop until it reaches the end of the transmission.
I prefer to do it like this rather than extending the packet length.
The new server is still running: 2h07, 2^22.28 DP. I cross my fingers Smiley
I tested release 1.5 (not the debug version):
I connect to the server, send byte=2 to the server (the command to send DPs), then send word=1 (meaning 1 DP will be sent), then send 40 random bytes (like a DP). The server returns status 0 (meaning OK) and crashes!


The number of DPs is encoded on 4 bytes. But if you send a random hash, in the original 1.5 there is no check of the hash validity, so when it is added to the hashtable you get an illegal access. This check is added in the dbg version. However, in a normal situation this should not happen. The server is badly protected against protocol attacks.
2h35 without crash...
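To illustrate the kind of check described above, the idea is simply to reject a hash that cannot be a valid hashtable index before it is used. This is a minimal sketch; HASH_SIZE and the payload layout are assumptions for the example, not the actual Kangaroo wire format.
Code:
#include <cstdint>
#include <cstring>

#define HASH_SIZE (1 << 18)   // assumed table size, for illustration only

// Validate the 4-byte hash before it is used as a table index.
static bool CheckAndAddDP(const uint8_t *item, size_t len) {
  if (len < 4)
    return false;             // truncated item
  uint32_t h;
  std::memcpy(&h, item, 4);   // hash sent by the client
  if (h >= HASH_SIZE)
    return false;             // corrupted or forged hash: drop it
  // ... it is now safe to insert the rest of the item into table[h] ...
  return true;
}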

sr. member
Activity: 616
Merit: 312
May 25, 2020, 02:07:00 PM
-snip-

Yes, this is why the Read function has a loop until it reaches the end of the transmission.
I prefer to do it like this rather than extending the packet length.
The new server is still running: 2h07, 2^22.28 DP. I cross my fingers Smiley
I tested release 1.5 (not the debug version):
I connect to the server, send byte=2 to the server (the command to send DPs), then send word=1 (meaning 1 DP will be sent), then send 40 random bytes (like a DP). The server returns status 0 (meaning OK) and crashes!
I did this 5 times and got the same result all 5 times.
That means that if an invalid DP arrives, the server can crash, so I think it needs to check a CRC or something before using this DP buffer.
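A CRC like the one suggested here could be appended by the client and verified by the server before the buffer is touched. A minimal sketch using zlib's crc32() follows; the framing (CRC in the last 4 bytes) is an assumption for the example, not the actual Kangaroo protocol.
Code:
#include <cstdint>
#include <cstring>
#include <zlib.h>   // crc32()

// Client side: append a 4-byte CRC32 of the DP buffer at buf[len..len+3].
static void AppendCrc(uint8_t *buf, size_t len) {
  uint32_t c = (uint32_t)crc32(0L, buf, (uInt)len);
  std::memcpy(buf + len, &c, 4);
}

// Server side: recompute the CRC and compare before using the buffer.
static bool CrcOk(const uint8_t *buf, size_t len) {
  if (len < 4)
    return false;
  uint32_t expected;
  std::memcpy(&expected, buf + len - 4, 4);
  return (uint32_t)crc32(0L, buf, (uInt)(len - 4)) == expected;
}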
full member
Activity: 277
Merit: 106
May 25, 2020, 02:03:52 PM
35 minutes without a restart (and still running)
sr. member
Activity: 462
Merit: 696
May 25, 2020, 01:49:05 PM
@jeanLuc, maybe I am wrong, but I can't see any limit in the code on the number of DPs transferred at one time.
A packet in a TCP connection can carry only 65536 bytes max, and I don't know the size of one DP.
Maybe a huge rig can produce a lot of DPs, and sending that amount to the server overflows the packet and causes an unexpected error?
When I send a BIG file in my app to a server, I split the file into parts of no more than 65536 bytes each.

Yes, this is why the Read function has a loop until it reaches the end of the transmission.
I prefer to do it like this rather than extending the packet length.
The new server is still running: 2h07, 2^22.28 DP. I cross my fingers Smiley
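For readers following along, the loop in question works roughly like this generic sketch of reading an exact number of bytes from a TCP socket; it is not a copy of the Kangaroo source.
Code:
#include <sys/types.h>
#include <sys/socket.h>   // recv(); on Windows the same call comes from winsock2.h
#include <cstddef>

// Keep calling recv() until 'total' bytes have arrived or the peer closes,
// because a single recv() may return only part of the transmission.
static bool ReadAll(int sock, void *buf, size_t total) {
  char *p = (char *)buf;
  size_t got = 0;
  while (got < total) {
    ssize_t n = recv(sock, p + got, total - got, 0);
    if (n <= 0)
      return false;       // error or connection closed before the end
    got += (size_t)n;
  }
  return true;
}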
full member
Activity: 427
Merit: 105
May 25, 2020, 01:44:46 PM
@jeanLuc, maybe I am wrong, but I can't see any limit in the code on the number of DPs transferred at one time.
A packet in a TCP connection can carry only 65536 bytes max, and I don't know the size of one DP.
Maybe a huge rig can produce a lot of DPs, and sending that amount to the server overflows the packet and causes an unexpected error?
When I send a BIG file in my app to a server, I split the file into parts of no more than 65536 bytes each.
Hi there, just reading a bit: someone found a similar problem after looking at the dmesg output: nf_conntrack: table full, dropping packet.
So the problem was in the TCP redirection 80 => 8000 using iptables. He removed that iptables rule, made Erlang listen on both ports 80 and 8000 directly, and the 65k limit was gone.
full member
Activity: 277
Merit: 106
May 25, 2020, 01:40:05 PM
10 minutes on the fresh debug version without any restart.
full member
Activity: 427
Merit: 105
May 25, 2020, 01:38:46 PM
And one more thing: that savefull.work file is reported as 17 petabytes large; if that is true... I am guessing it isn't.
I did have one of those that kept continuing even with the file-size mistake; it seems to keep going with that over here, but not for you.
Maybe a merge error, and then too many connections at once; I guess it's a safety measure from Linux for those connections.

sr. member
Activity: 616
Merit: 312
May 25, 2020, 01:34:33 PM
@jeanLuc, maybe I am wrong, but I can't see any limit in the code on the number of DPs transferred at one time.
A packet in a TCP connection can carry only 65536 bytes max, and I don't know the size of one DP.
Maybe a huge rig can produce a lot of DPs, and sending that amount to the server overflows the packet and causes an unexpected error?
When I send a BIG file in my app to a server, I split the file into parts of no more than 65536 bytes each.
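Splitting a large buffer into bounded chunks, as described above, looks roughly like this generic sketch; the 65536-byte chunk size is the figure from the post, not a limit imposed by the sockets API itself.
Code:
#include <sys/types.h>
#include <sys/socket.h>   // send(); on Windows the same call comes from winsock2.h
#include <cstddef>

// Send 'total' bytes in chunks of at most 65536 bytes, looping on short writes.
static bool SendChunked(int sock, const void *buf, size_t total) {
  const char *p = (const char *)buf;
  size_t sent = 0;
  while (sent < total) {
    size_t chunk = total - sent;
    if (chunk > 65536) chunk = 65536;
    ssize_t n = send(sock, p + sent, chunk, 0);
    if (n <= 0)
      return false;       // error or connection closed
    sent += (size_t)n;
  }
  return true;
}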
full member
Activity: 277
Merit: 106
May 25, 2020, 01:12:29 PM
I uploaded a new dbg file (which is in fact a release).
It may be a problem of mutex ownership.
I have had this server running for one hour with 4 clients, 2^21.2 DP, without problems.



I will check it, but I am waiting for the server to auto-restart, because so far it has gone up to 54 minutes without restarting with 88 machines. Beautiful view :-)
full member
Activity: 427
Merit: 105
May 25, 2020, 12:54:36 PM
Hope this sheds some light on it all.

Code:
As of this writing, these details are accurate for Linux kernels from v2.2 onwards, and where possible this article highlights the differences in settings between kernels up through v5.5.
Kernel settings outlined in this article can be adjusted following the primer on configuring kernel settings here:

TCP Receive Queue and netdev_max_backlog
Each CPU core can hold a number of packets in a ring buffer before the network stack is able to process them. If the buffer is filled faster than the TCP stack can process the packets,
a dropped-packet counter is incremented and the excess packets are dropped. The net.core.netdev_max_backlog setting should be increased to maximize the number of packets queued for processing on servers with high burst traffic.
net.core.netdev_max_backlog is a per CPU core setting.

TCP Backlog Queue and tcp_max_syn_backlog
A connection is created for any SYN packets that are picked up from the receive queue and are moved to the SYN Backlog Queue. The connection is marked “SYN_RECV” and a SYN+ACK is sent back to the client.
These connections are not moved to the accept queue until the corresponding ACK is received and processed. The maximum number of connections in the queue is set in the net.ipv4.tcp_max_syn_backlog kernel setting.
Run the following netstat command to check the receive queue, which should be no higher than 1 on a properly configured server under normal load, and should be under the SYN backlog queue size under heavy load:
# netstat -an | grep SYN_RECV | wc -l
If there are a lot of connections in SYN_RECV state, there are some additional settings that can increase the duration a SYN packet sits in this queue and cause problems on a high performance server.
SYN Cookies
If SYN cookies are not enabled, the client will simply retry sending a SYN packet. If SYN cookies are enabled (net.ipv4.tcp_syncookies), the connection is not created and is not placed in the SYN backlog, but a SYN+ACK packet
is sent to the client as if it was. SYN cookies may be beneficial under normal traffic, but during high volume burst traffic some connection details will be lost and the client will experience issues when the connection is established.
There’s a bit more to it than just the SYN cookies, but here’s a write up called “SYN cookies ate my dog” written by Graeme Cole that explains in detail why enabling SYN cookies on high performance servers can cause issues.
SYN+ACK Retries
What happens when a SYN+ACK is sent but never gets a response ACK packet? In this case, the network stack on the server will retry sending the SYN+ACK. The delay between attempts is calculated to allow for server recovery.
If the server receives a SYN, sends a SYN+ACK, and does not receive an ACK, the length of time a retry takes follows the exponential backoff algorithm and therefore depends on the retry counter for that attempt. The kernel setting
that defines the number of SYN+ACK retries is net.ipv4.tcp_synack_retries with a default setting of 5. This will retry at the following intervals after the first attempt: 1s, 3s, 7s, 15s, 31s. The last retry will timeout after roughly 63s after the first attempt was made, which corresponds to when the next attempt would have been made if the number of retries was 6. This alone can keep a SYN packet in the SYN backlog for more than 60 seconds before the packet times out. If the SYN backlog queue is small, it doesn’t take a large volume of connections to cause an amplification event in the network stack where half-open connections never complete and no connections can be established. Set the number of SYN+ACK retries to 0 or 1 to avoid this behavior on high performance servers.
SYN Retries
Although SYN retries refer to the number of times a client will retry sending a SYN while waiting for a SYN+ACK, it can also impact high performance servers that make proxy connections. An nginx server making a few dozen proxy connections to a backend server due to a spike of traffic can overload the backend server’s network stack for a short period, and retries can create an amplification on the backend on both the receive queue and the SYN backlog queue. This, in turn, can impact the client connections being served. The kernel setting for SYN retries is net.ipv4.tcp_syn_retries and defaults to 5 or 6 depending on distribution. Rather than retry for upwards of 63–130s, limit the number of SYN retries to 0 or 1.
See the following for more information on addressing client connection issues on a reverse proxy server: "Linux Kernel Tuning for High Performance Networking: Ephemeral Ports" (levelup.gitconnected.com)

TCP Accept Queue and somaxconn
Applications are responsible for creating their accept queue when opening a listener port by specifying a "backlog" parameter when calling listen(). As of Linux kernel v2.2, this parameter changed from setting the maximum number of incomplete connections a socket can hold to the maximum number of completed connections waiting to be accepted. As described above, the maximum number of incomplete connections is now set with the kernel setting net.ipv4.tcp_max_syn_backlog.
somaxconn and the TCP listen() backlog
Although the application is responsible for the accept queue size on each listener it opens, there is a limit to the number of connections that can be in the listener’s accept queue. There are two settings that control the size of the queue: 1) the backlog parameter on the TCP listen() call, and 2) a kernel limit maximum from the kernel setting net.core.somaxconn.
Accept Queue Default
The default value for net.core.somaxconn comes from the SOMAXCONN constant, which is set to 128 on Linux kernels up through v5.3, while SOMAXCONN was raised to 4096 in v5.4. However, v5.4 is the most current version at the time of this writing and has not been widely adopted yet, so the accept queue is going to be truncated to 128 on many production systems that have not modified net.core.somaxconn.
Applications typically use the value of the SOMAXCONN constant when configuring the default backlog for a listener if it is not set in the application configuration, or it’s sometimes simply hard-coded in the server software. Some applications set their own default, like nginx which sets it to 511 — which is silently truncated to 128 on linux kernels through v5.3. Check the application documentation for configuring the listener to see what is used.
Accept Queue Override
Many applications allow the accept queue size to be specified in the configuration by providing a “backlog” value on the listener directive or configuration that will be used when calling listen(). If an application calls listen() with a backlog value larger than net.core.somaxconn, then the backlog for that listener will be silently truncated to the somaxconn value.
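As a rough illustration of where that backlog value comes from at the API level, here is a hedged C++ sketch (error handling omitted; the port and the 20480 backlog are invented for the example); if the backlog passed to listen() exceeds net.core.somaxconn, the kernel silently clamps it.
Code:
#include <arpa/inet.h>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>

// Open a TCP listener with an explicit accept-queue backlog of 20480.
int OpenListener(uint16_t port) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = INADDR_ANY;
  addr.sin_port = htons(port);
  bind(fd, (sockaddr *)&addr, sizeof(addr));
  listen(fd, 20480);      // effective queue = min(20480, net.core.somaxconn)
  return fd;
}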
Application Workers
If the accept queue is large, also consider increasing the number of threads that can handle accepting requests from the queue in the application. For example, setting a backlog of 20480 on an HTTP listener for a high volume nginx server without allowing for enough worker_connections to manage the queue will cause connection refused responses from the server.
Connections and File Descriptors
System Limit
Every socket connection also uses a file descriptor. The maximum number of all file descriptors that can be allocated to the system is set with the kernel setting fs.file-max. To see the current number of file descriptors in use, cat the following file:
# cat /proc/sys/fs/file-nr
1976      12       2048
The output shows that the number of file descriptors in use is 1976, the number of allocated but free file descriptors is 12 (on kernel v2.6+), and the maximum is 2048. On a high performance system, this should be set high enough to handle the maximum number of connections and any other file descriptor needs for all processes on the system.
User Limit
In addition to the file descriptor system limit, each user is limited to a maximum number of open file descriptors. This is set with the system's limits.conf (nofile), or in the process's systemd unit file if running a process under systemd (LimitNOFILE). To see the maximum number of file descriptors a user can have open by default:
$ ulimit -n
1024
And under systemd, using nginx as an example:
$ systemctl show nginx | grep LimitNOFILE
4096
Settings
To adjust the system limit, set the fs.file-max kernel setting to the maximum number of open file descriptors the system can have, plus some buffer. Example:
fs.file-max = 3261780
To adjust the user limit, set the value high enough to handle the number of connection sockets for all listeners plus any other file descriptor needs for the worker processes, and include some buffer. User limits are set under /etc/security/limits.conf, or a conf file under /etc/security/limits.d/, or in the systemd unit file for the service. Example:
# cat /etc/security/limits.d/nginx.conf
nginx soft nofile 64000
nginx hard nofile 64000
# cat /lib/systemd/system/nginx.service
[Unit]
Description=OpenResty Nginx - high performance web server
Documentation=https://www.nginx.org/en/docs/
After=network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target
[Service]
Type=forking
LimitNOFILE=64000
PIDFile=/var/run/nginx.pid
ExecStart=/usr/local/openresty/nginx/sbin/nginx -c /usr/local/openresty/nginx/conf/nginx.conf
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
[Install]
WantedBy=multi-user.target
Worker Limits
Like file descriptor limits, the number of workers, or threads, that a process can create is limited by both a kernel setting and user limits.
System Limit
Processes can spin up worker threads. The maximum number of all threads that can be created is set with the kernel setting kernel.threads-max. To see the max number of threads along with the current number of threads executing on a system, run the following:
# cat /proc/sys/kernel/threads-max
257083
# ps -eo nlwp | tail -n +2 | \
    awk '{ num_threads += $1 } END { print num_threads }'
576
As long as the total number of threads is lower than the max, the server will be able to spin up new threads for processes as long as they’re within user limits.
User Limit
In addition to the max threads system limit, each user process is limited to a maximum number of threads. This is again set with the system's limits.conf (nproc), or in the process's systemd unit file if running a process under systemd (LimitNPROC). To see the maximum number of threads a user can spin up:
$ ulimit -u
4096
And under systemd, using nginx as an example:
$ systemctl show nginx | grep LimitNPROC
4096
Settings
In most systems, the system limit is already set high enough to handle the number of threads a high performance server needs. However, to adjust the system limit, set the kernel.threads-max kernel setting to the maximum number of threads the system needs, plus some buffer. Example:
kernel.threads-max = 3261780
To adjust the user limit, set the value high enough for the number of worker threads needed to handle the volume of traffic including some buffer. As with nofile, the nproc user limits are set under /etc/security/limits.conf, or a conf file under /etc/security/limits.d/, or in the systemd unit file for the service. Example, with nproc and nofile:
# cat /etc/security/limits.d/nginx.conf
nginx soft nofile 64000
nginx hard nofile 64000
nginx soft nproc 64000
nginx hard nproc 64000
# cat /lib/systemd/system/nginx.service
[Unit]
Description=OpenResty Nginx - high performance web server
Documentation=https://www.nginx.org/en/docs/
After=network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target
[Service]
Type=forking
LimitNOFILE=64000
LimitNPROC=64000
PIDFile=/var/run/nginx.pid
ExecStart=/usr/local/openresty/nginx/sbin/nginx -c /usr/local/openresty/nginx/conf/nginx.conf
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
[Install]
WantedBy=multi-user.target
TCP Reverse Proxy Connections in TIME_WAIT
Under high volume burst traffic, proxy connections stuck in "TIME_WAIT" can add up, tying up many resources during the close-connection handshake. This state indicates the client is waiting for a final FIN packet from the server (or upstream worker) that may never come. In most cases, this is normal and expected behavior and the default of 120s is acceptable. However, when the volume of connections in the "TIME_WAIT" state is high, this can cause the application to run out of worker threads to handle new requests or client sockets to connect to the upstream. It's better to let these time out faster. The kernel setting that controls this timeout is net.ipv4.tcp_fin_timeout, and a good setting for a high performance server is between 5 and 7 seconds.
Bringing it All Together
The receive queue should be sized to handle as many packets as linux can process off of the NIC without causing dropped packets, including some small buffer in case spikes are a bit higher than expected. The softnet_stat file should be monitored for dropped packets to discover the correct value. A good rule of thumb is to use the value set for tcp_max_syn_backlog to allow for at least as many SYN packets that can be processed to create half-open connections. Remember, this is the number of packets each CPU can have in its receive buffer, so divide the total desired by the number of CPUs to be conservative.
The SYN backlog queue should be sized to allow for a large number of half-open connections on a high performance server to handle bursts of occasional spike traffic. A good rule of thumb is to set this at least to the highest number of established connections a listener can have in the accept queue, but no higher than twice the number of established connections a listener can have. It is also recommended to turn off SYN cookie protection on these systems to avoid data loss on high burst initial connections from legitimate clients.
The accept queue should be sized to allow for holding a volume of established connections waiting to be processed as a temporary buffer during periods of high burst traffic. A good rule of thumb is to set this between 20–25% of the number of worker threads.
Configurations
The following kernel settings were discussed in this article:
# /etc/sysctl.d/00-network.conf
# Receive Queue Size per CPU Core, number of packets
# Example server: 8 cores
net.core.netdev_max_backlog = 4096
# SYN Backlog Queue, number of half-open connections
net.ipv4.tcp_max_syn_backlog = 32768
# Accept Queue Limit, maximum number of established
# connections waiting for accept() per listener.
net.core.somaxconn = 65535
# Maximum number of SYN and SYN+ACK retries before
# packet expires.
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_synack_retries = 1
# Timeout in seconds to close client connections in
# TIME_WAIT after waiting for FIN packet.
net.ipv4.tcp_fin_timeout = 5
# Disable SYN cookie flood protection
net.ipv4.tcp_syncookies = 0
# Maximum number of threads system can have, total.
# Commented, may not be needed. See user limits.
#kernel.threads-max = 3261780
# Maximum number of file descriptors system can have, total.
# Commented, may not be needed. See user limits.
#fs.file-max = 3261780
The following user limit settings were discussed in this article:
# /etc/security/limits.d/nginx.conf
nginx soft nofile 64000
nginx hard nofile 64000
nginx soft nproc 64000
nginx hard nproc 64000
Conclusion
The settings in this article are meant as examples and should not be copied directly into your production server configuration without testing. There are also some additional kernel settings that have an impact on network stack performance. Overall, these are the most important settings that I’ve used when tuning the kernel for high performance connections.
WRITTEN BY

John Patton
Follow
Following the motto: “Share what you know, learn what you don’t”


hope it helps


sr. member
Activity: 462
Merit: 696
May 25, 2020, 12:43:48 PM
I uploaded a new dbg file (which is in fact a release).
It may be a problem of mutex ownership.
I have had this server running for one hour with 4 clients, 2^21.2 DP, without problems.

full member
Activity: 277
Merit: 106
May 25, 2020, 12:11:59 PM
Yes, it looks OK.
My server is still running on Windows 10...
What happens if you connect a server with a fresh work file to all the running clients? Does it stop?



Starting work on a new file, combined with the simultaneous load of all machines, did not cause any negative consequences. I would even say that the reset frequency has fallen, but I would be lying, because since I wrote about using a new progress file it has already happened 9 times {EDIT} 10 times ;-).

The fact that I have only three machines on Linux may be what makes it work stably.
sr. member
Activity: 462
Merit: 696
May 25, 2020, 11:50:28 AM
Yes, I reproduced the issue with 3 clients on a local network, but only with the release build.
The debug build seems more stable.
This is not really reproducible Sad
sr. member
Activity: 616
Merit: 312
May 25, 2020, 11:44:21 AM
Yes, it looks OK.
My server is still running on Windows 10...
What happens if you connect a server with a fresh work file to all the running clients? Does it stop?
The server quits without error only if it has some connections (especially not from the local network).
For me the server without clients can run for a very long time. Problems appear when several clients are connected.
sr. member
Activity: 462
Merit: 696
May 25, 2020, 11:35:31 AM
Yes, it looks OK.
My server is still running on Windows 10...
What happens if you connect a server with a fresh work file to all the running clients? Does it stop?

full member
Activity: 277
Merit: 106
May 25, 2020, 11:13:21 AM
I was forced to do as colleagues advised, i.e. change the save file, to get any further. At that point I wanted to check for myself how it works in practice and merged the old progress file with a copy of the current one... And here's what I saw:



Can you confirm from the screenshots below that everything is working? I wouldn't want to continue if it were otherwise...
Below: winfo for each file separately (old, new, and the merged dump):

OLD


NEW:


MERGED:
full member
Activity: 277
Merit: 106
May 25, 2020, 10:49:05 AM
Not on my Windows 10.

At first, the problem occurred only after up to 1.5 hours. Over time, this intensified. Now I'm not able to finish #110 because the server can restart even every 2 minutes.
sr. member
Activity: 462
Merit: 696
May 25, 2020, 10:26:41 AM
Pfff... I have had a server running for 30 min in the debugger without a problem...