rorserver woes

only_a_ptr

The rorserver has been crashing awfully often lately. Sometimes it runs fine for 2 weeks, sometimes it crashes twice a day.

From Tritonas00 on Discord:
lets get the facts from start
years ago socketw working fine (at least from what mike said)
rorserver used to run months without crash
then rornet enhanced of course, new things added
rorserver starts crashing
socketw lib remained the same

My response: it could be
- different build of socketw (like the directx VS2017/19 issue)
- OS update on server
- rorserver bug, obviously.
- RoR bug.

Attached are gdb logs from the crashes.
 

Attachments

  • crashes.txt (9.3 KB)
I'm analyzing the crashes.txt from top to bottom. Currently I'm focusing on the first entry:

Code:
Thread 744 "rorserver" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffcaffd700 (LWP 11039)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51    ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff6cbb8b1 in __GI_abort () at abort.c:79
#2  0x00007ffff6d04907 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff6e31be8 "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007ffff6dafe81 in __GI___fortify_fail_abort (need_backtrace=need_backtrace@entry=false, msg=msg@entry=0x7ffff6e31bc6 "stack smashing detected") at fortify_fail.c:33
#4  0x00007ffff6dafe42 in __stack_chk_fail () at stack_chk_fail.c:29
#5  0x00005555555ac0ae in Receiver::Thread (this=0x7ffff000ae18) at /home/administrator/rorserver/source_242_ch/source/server/receiver.cpp:100

What it tells me: either `Receiver::Thread()` itself or some function called from it corrupts the stack. There aren't many ways that can happen, basically just writing to arrays (or C-strings, which are arrays too). The first thing I did was review all the code by eye. I cleaned up some ugly things but found nothing really dangerous.
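
For illustration, here is a minimal sketch of the kind of array bug in question. It is not code from rorserver: `CopyBounded`, `StoreNickname` and the buffer size are made up for this example. An unchecked `strcpy()` into a fixed-size local array writes past the end when the input is too long, and that overrun is what trips the stack check. A bounded copy avoids it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of rorserver: copies at most dst_size-1
// characters from src and always NUL-terminates dst.
// Returns true if src fit completely, false if it was truncated.
bool CopyBounded(char* dst, std::size_t dst_size, const char* src)
{
    assert(dst != nullptr && src != nullptr && dst_size > 0);
    const std::size_t src_len = std::strlen(src);
    const std::size_t n = (src_len < dst_size) ? src_len : dst_size - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return src_len < dst_size;
}

// A function with a local array, which is what the stack protector guards.
// With strcpy(nick, input), an over-long input would write past `nick`
// and the process would abort with "stack smashing detected" on return.
std::size_t StoreNickname(const char* input)
{
    char nick[16];
    CopyBounded(nick, sizeof(nick), input);
    return std::strlen(nick);
}
```

With this helper an over-long input is truncated to 15 characters plus the terminator instead of overwriting whatever sits next to `nick` on the stack.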

Next, I studied how exactly the stack check works. According to the GCC manual, the check is done when the function returns, which in this case means "when the user is disconnected and the receiver thread exits". Not good: the actual corruption may have happened a lot earlier. But it gave me a very straightforward idea: if I split `Receiver::Thread()` into several sub-functions, the next crash backtrace would tell me more.
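
A rough sketch of that partitioning idea, under stated assumptions: `ReadStep`, `ProcessStep` and `ThreadBody` are invented names, not the real rorserver functions, and a list of in-memory strings stands in for the socket. Each step owns a local buffer, so it gets its own stack guard that is checked when that step returns. The `noinline` attribute is an assumption about how to keep GCC from merging the steps back into the caller at higher optimization levels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// GCC/Clang attribute; keeps each step in a separate stack frame.
#define STEP __attribute__((noinline))

// Step 1: copy one incoming chunk into the caller's buffer (bounded).
STEP std::size_t ReadStep(const std::string& chunk, char* out, std::size_t out_size)
{
    assert(out != nullptr);
    char local[64];  // local array, so this frame is guarded on its own
    const std::size_t n = chunk.size() < sizeof(local) ? chunk.size() : sizeof(local);
    std::memcpy(local, chunk.data(), n);
    const std::size_t m = n < out_size ? n : out_size;
    std::memcpy(out, local, m);
    return m;
}

// Step 2: handle the data; in this sketch it only sums the bytes.
STEP unsigned ProcessStep(const char* data, std::size_t len)
{
    char scratch[64];
    const std::size_t n = len < sizeof(scratch) ? len : sizeof(scratch);
    std::memcpy(scratch, data, n);
    unsigned sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += static_cast<unsigned char>(scratch[i]);
    return sum;
}

// Stand-in for the thread loop: it only dispatches to the steps.
unsigned ThreadBody(const std::vector<std::string>& chunks)
{
    unsigned total = 0;
    for (const std::string& chunk : chunks)
    {
        char buf[64];
        const std::size_t len = ReadStep(chunk, buf, sizeof(buf));
        total += ProcessStep(buf, len);
    }
    return total;
}
```

Built with `-g` and `-fstack-protector-strong` (or `-fstack-protector-all`), a corrupted frame in one of the steps should then show up in the backtrace under that step's name rather than only at the very end of the thread function.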
 
I've (finally!) set up a Linux box (Manjaro Linux 21.0.5, running in a VirtualBox VM) so I can develop the server with confidence. I build with GCC 10.2.0 and connect clients from the Windows host. I tried to make the server crash but had no luck; the furthest I got is "ERROR|ReceiveMessage(): payload too long: 32568b", which is non-critical. It only means the 'length' field in RoRnet::Header contains an oversized value. It happens when I terminate the client while it's doing some non-interactive operation like spawning a big truck. Curiously, the length is always the same.
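
The kind of check behind that error message can be sketched as follows. `DemoHeader`, `kMaxPayload` and `ValidateLength` are placeholders for illustration only; the real `RoRnet::Header` layout and the server's actual limit may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Placeholder header; only the 'length' field matters for this sketch.
struct DemoHeader
{
    std::uint32_t command;
    std::uint32_t length;  // payload size announced by the sender
};

// Assumed upper bound, chosen for the example only.
constexpr std::uint32_t kMaxPayload = 8192;

enum class LengthCheck { Ok, TooLong };

// Decode a header from raw bytes and reject oversized payloads before
// anything is read into a fixed-size receive buffer.
LengthCheck ValidateLength(const unsigned char* raw, DemoHeader& out)
{
    assert(raw != nullptr);
    std::memcpy(&out, raw, sizeof(out));  // avoids unaligned reads
    return (out.length > kMaxPayload) ? LengthCheck::TooLong : LengthCheck::Ok;
}
```

Rejecting the message at this point is what keeps an oversized 'length' value harmless: the payload is never copied, so a bogus header cannot overrun the buffer.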

Next I'm going to learn the details of socket programming and audit the rorserver, removing everything which doesn't make perfect sense. So far I've fixed GCC warnings and straightened up the listening thread: https://github.com/only-a-ptr/ror-server/commit/a5aa14c0ae84cda65f5d2452348d99d97b0e9569. I've only tested in "foreground mode"; hopefully the daemon mode behaves the same, and if not, it'll be good to know.
 
That's awesome!
 