Ganesha crashed after running delegations cases for pynfs #153
Comments
From debugging via gdb, I see that there could be cases wherein an xprt handle is released twice in svc_rqst_xprt_task(). At line 771 we release the xprt ref unconditionally, without any check for errors in SVC_RECV. But there could be cases where we hit an error in SVC_RECV->svc_vc_recv(), which calls SVC_DESTROY->svc_destroy_it->svc_release_it(). So we end up decrementing the ref twice, resulting in a use-after-free. svc_destroy_it() sets SVC_XPRT_FLAG_DESTROYING in xprt->xp_flags; maybe we could check that before calling SVC_RELEASE at line 771. @dang .. thoughts? |
I don't think that's it. The rpc_dplx_rec has a refcount on the svc_xprt which needs to be released when the xprt is destroyed (on failure). In addition, the event has a ref on the xprt (taken in svc_rqst_epoll_event() at line 868). This needs to be unconditionally released when the event is done, whether it succeeded or not, or else the xprt will be leaked. |
Okay.. but the callback chan still has a reference to that xprt. I have collected traces using gdb and copied them to the link below. Please take a look. https://paste.fedoraproject.org/paste/4YOgU70k7OPR6lmOWvwxOg |
Okay, I think Ganesha needs its own ref on the client. Something like this: |
Thanks Dan.. will verify the fix. But wouldn't it result in leaking the clnt object? |
Any ref taken would need to be released. At least as of my work on them,
CLNT and SVC had a shared refcnt which would prevent destruction of xprt if
there were clients.
Matt
…On Wed, Oct 17, 2018 at 11:15 AM, Soumya Koduri ***@***.***> wrote:
Thanks Dan..will verify the fix. But wouldn't it result in leaking clnt
object?
I mean with this fix, when and where shall chan->clnt gets destroyed. And
when it gets destroyed, wouldn't it again try to access freed xprt?
|
Also, it's very easy to reproduce this issue (in case it's needed). Jiffin listed it in the first comment: ./testserver.py -v --outfile ~/pynfs.run.1 --maketree localhost:/ --showomit --rundeps DELEG1 It will result in a use-after-free of the xprt object by the reaper thread, which may or may not result in a crash. |
It's destroyed in _nfs_rpc_destroy_chan(), in the last backtrace of your set of backtraces. |
yes.. but clnt->cl_refcnt always stays positive now and never becomes zero. At least for this particular test case, I don't see clnt_vc_destroy being called at all (after this fix is applied). |
Hmm, okay, so it's not hitting the error case. Is it never hitting the error case, with the extra ref in place? If so, that would be very strange. |
Yes, it's not hitting the error case with the extra ref. |
I sent a patch to fix this: #154. Please review the changes. Thanks! |
On 10/18/18 4:34 AM, Soumya Koduri wrote:
I sent patch to fix the same - #154 <#154>
Please review the changes. Thanks!
As I understand the issue, this doesn't match the overall design.
There are a lot of comments all over the code trying to explain this,
because there were many problems in the past. I've spent years
fixing them....
There is +1 reference for the svc_xprt fd tree. Various parts of
the code call SVC_DESTROY(). So there are bit flags that prevent
repeated destroys -- so that -1 is only done once.
There should be a CLIENT reference. While the client is around,
that bumps +1 reference of the SVCXPRT.
There is also a CLIENT reference for each outstanding clnt_req.
The clnt_req is what is used to do the delegation.
It all forms a link of references: clnt_req -> CLIENT -> SVCXPRT.
There is +1 reference for the svc_rqst_xprt_task, taken before the
task is run.
You are skipping the release for the task, but the svc_xprt ref
has been released by SVC_DESTROY(). It's a different one. So
who is releasing the task reference?
Moreover, that release by svc_rqst_xprt_task is supposed to trigger
the destroy_it call. That in turn sets up another task to free.
So there cannot be a use-after-free here. The SVC_RELEASE is the
very last code in the task. The only way it was freed was that
some other place released a reference that was not taken, and the
freeing task ran before the svc_rqst_xprt_task finished!
I've got some time tomorrow morning before 9 am, and after noon.
I'll wander over to the office and see whether DanG can reproduce
with his lovely lttng code running your sure-fire reproducer.
I'd have done it earlier today, but DanG was working from home.
|
You've fixed problems and created them, just like everybody else. The overall refcounting design is mine, I designed it. Matt |
So, there appear to be multiple issues here. In fact, the client is not getting a ref if the xprt is newly created. svc_xprt_lookup() only returns with a single ref: the initial one, which (per the comment above) belongs to the svc_xprt fd tree. This is clear both by code inspection and in the backtrace linked above. Thus, when there's an error, and so an early destroy, there's a use-after-free when the supposed client's ref is put by the reaper. This would be fixed by this: https://paste.fedoraproject.org/paste/AYel7QSG2xRZyYwl~KfkuQ The second issue is that we must be missing a put somewhere in the non-error case, or we'd be leaking xprts under normal use. This is less clear, and I'm still looking at it. |
ack |
On 10/18/18 8:22 PM, Matt Benjamin wrote:
You've fixed problems and created them, just like everybody else. The overall refcounting design is mine, I designed it.
It was my understanding that our agreement with HR is that you aren't supposed
to reply to my posts. And vice versa. I'm not going to let this one slide.
[laugh] I'm fairly sure that you didn't invent or design refcounting.
In any case, most code that you contributed is long gone. The history shows
Marcus Watts did most of the fixes during 2013-2014, and after that it was
me.... When I arrived, shutting down often crashed. The transport cleanup
thread hung due to lock inversions. Many pynfs tests failed.
In particular, I've converted from a byzantine collection of multiple
locking levels around bare counters -- to atomic! Many fewer locks now.
Much faster.
If you know of any problems that I've created, please enlighten us with
detailed specifics.
Otherwise, please stop making disparaging personal comments about me, both
here and on the main mailing list. TIA.
|
I created the interfaces to take and release refcounts in the transport interface, which appears to still exist. No personally disparaging remark has been made. |
Okay, I have working code with no leaks with the following patches. For ntirpc: https://paste.fedoraproject.org/paste/PxGuEtOdsztrAEtFBbUTjQ For Ganesha: https://paste.fedoraproject.org/paste/mYqhXW5Sp89vSEDEflnLTw @soumyakoduri can you test this with delegations? |
On 10/19/18 2:19 PM, Daniel Gryniewicz wrote:
Okay, I have working code with no leaks with the following patches. For ntirpc:
https://paste.fedoraproject.org/paste/PxGuEtOdsztrAEtFBbUTjQ
For Ganesha:
https://paste.fedoraproject.org/paste/mYqhXW5Sp89vSEDEflnLTw
My Ganesha patch of June 17, 2017, originally did this.
Fix closing at termination and non-supported services
https://review.gerrithub.io/#/c/ffilz/nfs-ganesha/+/365966/1..4/src/MainNFSD/nfs_rpc_dispatcher_thread.c
*
* Actually call SVC_DESTROY(), setting SVC_XPRT_FLAG_DESTROYED to stop
* new connections. SVC_*_FLAG_CLOSE will prompt closing the fd.
As I did, you can replace the close() entirely; SVC_DESTROY() will cause the
close() in the [tcp|udp]_destroy code path. That extra close() here is the
reason we have a warning in the log for each of these fds, because close()
will also remove the epoll() for the fd.
But I removed that code in patch set 3, based upon your mild objection.
|
@dang.. with the above changes I don't see any more crashes. Checked through gdb as well. While destroying chan->clnt, its xprt had a valid ref=1 and there isn't any use-after-free. |
On 10/21/18 2:51 AM, Soumya Koduri wrote:
@dang <https://github.com/dang>.. with the above changes I dont see any more crash. Checked through gdb as well. While destroying chan->clnt, its xprt had valid ref=1 and there isnt any use_after_free.
Wonderful. Thanks, Soumya, your reproducer has been invaluable.
Welcome back!
|
Thanks Bill! |
I am also facing a similar crash, so I took this fix (#155). With the extra ref, it looks like we will never run svc_vc_destroy_task for the xprt, and the socket fd (xp_fd) never gets closed, causing an fd leak even after SVC_DESTROY. |
yes.. #155 is the right and only fix needed in the ntirpc codebase for this crash. You shouldn't need any other fixes. |
On 11/28/18 3:21 AM, Gaurav wrote:
I am also facing similar crash, I took this fix
#155 <#155>
With extra ref, looks like we will never do svc_vc_destroy_task for xprt and socket fd (xp_fd) is never getting closed and causing fd leak even after SVC_DESTROY.
I tried with this patch on top of extra ref patch above, it seems to fix the issue.
I am not sure if its the right way to fix it.
https://paste.fedoraproject.org/paste/fLYaI0rd1kB563m0SOPT1A
SVC_DESTROY isn't always (or even usually) the call that actually runs the svc_vc_destroy_task.
There's usually more SVC_RELEASE calls later.
This latter patch is not a good idea. It breaks a lot of internal design.
For one thing, close() also removes the fd from the epoll as a side effect. But then later,
unregister will error on the fd not being in the epoll. You should unregister before close.
We really REALLY don't want to close early (before the svc_vc_destroy_task). There's good
reasons for that design. Half open (one sided close) TCP connections are not well handled.
Moreover, there is no fd leak. If the fd isn't used, periodic svc_xprt_foreach calls run
through the xprt list looking for old ones that haven't had any activity.
It took many months to find this error. Everybody is affected. Isn't it time for an
ntirpc 1.7.x release?
|
I checked with the latest next branch code.
28/11/2018 13:55:26 : epoch 5bfee335 : centos71 : ganesha.nfsd-46782[svc_84] rpc :TIRPC :DEBUG :svc_xprt_lookup: fd 1054 max_connections 1024 exceeded
We are exceeding max_connections. If I revert the commit for #155, then it works fine.
I agree we do SVC_RELEASE for all the refs taken and SVC_DESTROY releases the extra ref, but now we have an extra ref even after SVC_DESTROY and all the SVC_RELEASEs. It makes sense to keep the xprt entry in the hash to reuse it, but we will need to close the xp_fd on SVC_DESTROY even if we are not releasing/destroying the xprt. I agree we will need to handle epoll_ctl failure with this fix if we close xp_fd early, but I guess we will never end up in xprt_unregister as we will always have the extra ref. |
On 11/28/18 2:32 PM, Gaurav wrote:
I checked with latest next branch code.
I can see fd leaks with this test, mount with tcp in loop.
for i in {1..1000}; do echo $i; mount -t nfs -o vers=3,tcp :/export /mnt; umount /mnt; done
28/11/2018 13:55:26 : epoch 5bfee335 : centos71 : ganesha.nfsd-46782[svc_84] rpc :TIRPC :DEBUG :svc_xprt_lookup: fd 1054 max_connections 1024 exceeded
We are exceeding max_connections.
That's not a "leak" in ntirpc. The OS won't let you have too many connections, and ntirpc has a
configurable max_connections. You exceeded 1024 connections.
If Ganesha is properly releasing all references after umount, there will still be a final reference
until the cleanup runs. (I didn't design this, cleanup goes back to 2013.)
I agree we do SVC_RELEASE for all the refs taken and SVC_DESTROY will release the extra ref, but now we have extra ref even after SVC_DESTROY and all SVC_RELEASE.
No, it means we aren't doing "SVC_RELEASE for all the refs taken". The cleanup list has a ref.
Maybe that's not a good design. Or maybe Ganesha umount code needs to SVC_DESTROY to avoid
waiting for the cleanup task to run. Take it to the nfs-ganesha list.
|
How about this fix? It's aligned to the design: we can do SVC_RELEASE in the cleanup routine if it's already destroyed, and that will handle the extra ref after SVC_DESTROY. |
On 11/30/18 7:22 AM, Gaurav wrote:
How about this fix,
https://paste.fedoraproject.org/paste/lapz1NrOlBxS342S4Q79aw
Its aligned to the design, we can do SVC_RELEASE in cleanup routine if its already destroyed, that will handle extra ref after SVC_DESTROY.
This makes no sense. There is no "extra ref". The svc_rqst_clean_func is explicitly there to
trigger removal from the xprt cleanup list, and the SVC_DESTROY function does that.
If you SVC_RELEASE here, you are probably releasing a ref taken in NFS-Ganesha. This is bad!
Your test script bangs away using up fds until you run out. There is an explicit test preventing a
runaway client from doing that! You are trying to circumvent the explicit test.
Just wait until the fd inactivity timeout has expired. That's what is supposed to happen.
Only NFS-Ganesha knows whether the fd will never be used again. That is not a function of
this library. Please take up the behavior of umount on the NFS-Ganesha list.
|
On 12/1/18 5:44 PM, William Allen Simpson wrote:
Only NFS-Ganesha knows whether the fd will never be used again. That is not a function of
this library. Please take up the behavior of umount on the NFS-Ganesha list.
I'd already asked you to do that earlier, so I've done it for you.
Thanks for bringing it to our attention.
|
On the latest master I am still seeing a crash while running the pynfs test suite for delegations, during the cleanup part. (gdb) bt |
Ganesha crashed after running the delegation cases in the pynfs suite. The crash was seen in the client cleanup path.
Steps to reproduce:
./testserver.py -v --outfile ~/pynfs.run.1 --maketree :/ --showomit --rundeps delegations
Tried only with a glusterfs volume. The tests were not actually running; they all warned out (IMO because support for delegations in the export is not needed).
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f2c517ea700 (LWP 31067)]
0x00007f2cbaafdef5 in svc_release_it (xprt=0x7f2c64018750, flags=0, tag=0x7f2cbab25340 <func.8326> "clnt_vc_destroy", line=462) at /root/nfs-ganesha/src/libntirpc/ntirpc/rpc/svc.h:433
433 ((xprt)->xp_ops->xp_destroy)(xprt, flags, tag, line);
(gdb) bt
#0 0x00007f2cbaafdef5 in svc_release_it (xprt=0x7f2c64018750, flags=0, tag=0x7f2cbab25340 <func.8326> "clnt_vc_destroy", line=462) at /root/nfs-ganesha/src/libntirpc/ntirpc/rpc/svc.h:433
#1 0x00007f2cbaafec57 in clnt_vc_destroy (clnt=0x7f2c64009ff0) at /root/nfs-ganesha/src/libntirpc/src/clnt_vc.c:462
#2 0x000000000043dfdd in clnt_release_it (clnt=0x7f2c64009ff0, flags=0, tag=0x571880 <func.22917> "_nfs_rpc_destroy_chan", line=628) at /root/nfs-ganesha/src/libntirpc/ntirpc/rpc/clnt.h:318
#3 0x000000000043e073 in clnt_destroy_it (clnt=0x7f2c64009ff0, tag=0x571880 <func.22917> "_nfs_rpc_destroy_chan", line=628) at /root/nfs-ganesha/src/libntirpc/ntirpc/rpc/clnt.h:340
#4 0x0000000000441693 in _nfs_rpc_destroy_chan (chan=0x7f2c4801a198) at /root/nfs-ganesha/src/MainNFSD/nfs_rpc_callback.c:628
#5 0x00000000004422fc in nfs_rpc_destroy_chan (chan=0x7f2c4801a198) at /root/nfs-ganesha/src/MainNFSD/nfs_rpc_callback.c:864
#6 0x00000000004d11aa in nfs_client_id_expire (clientid=0x7f2c4801a0f0, make_stale=false) at /root/nfs-ganesha/src/SAL/nfs4_clientid.c:1099
#7 0x00000000004470e2 in reap_hash_table (ht_reap=0xa0d890) at /root/nfs-ganesha/src/MainNFSD/nfs_reaper_thread.c:109
#8 0x0000000000447885 in reaper_run (ctx=0xa43590) at /root/nfs-ganesha/src/MainNFSD/nfs_reaper_thread.c:232
#9 0x0000000000510a1a in fridgethr_start_routine (arg=0xa43590) at /root/nfs-ganesha/src/support/fridgethr.c:550
#10 0x00007f2cb94cbe25 in start_thread (arg=0x7f2c517ea700) at pthread_create.c:308
#11 0x00007f2cb8dd3bad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) f 0
#0 0x00007f2cbaafdef5 in svc_release_it (xprt=0x7f2c64018750, flags=0, tag=0x7f2cbab25340 <func.8326> "clnt_vc_destroy", line=462) at /root/nfs-ganesha/src/libntirpc/ntirpc/rpc/svc.h:433
433 ((xprt)->xp_ops->xp_destroy)(xprt, flags, tag, line);
(gdb) p xprt
$1 = (SVCXPRT *) 0x7f2c64018750
(gdb) p *xprt
$2 = {xp_ops = 0x2a, xp_dispatch = {process_cb = 0x0, rendezvous_cb = 0x0}, xp_parent = 0x2, xp_tp = 0x0, xp_netid = 0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, xp_p1 = 0x0,
xp_p2 = 0xffffffffffffffff, xp_p3 = 0x0, xp_u1 = 0xffffffffffffffff, xp_u2 = 0x0, xp_local = {nb = {maxlen = 4294967295, len = 4294967295, buf = 0x0}, ss = {ss_family = 65535,
__ss_padding = "\377\377\377\377\377\377\000\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377\000\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377\000\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377\000\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377\000\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377", '\000' <repeats 31 times>, __ss_align = 0}},
xp_remote = {nb = {maxlen = 0, len = 0, buf = 0x0}, ss = {ss_family = 0, __ss_padding = '\000' <repeats 117 times>, __ss_align = 0}}, xp_lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
__kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, xp_fd = 0, xp_ifindex = 0, xp_si_type = 0, xp_type = 0, xp_refcnt = -1,
xp_flags = 64}
(gdb) f 4
#4 0x0000000000441693 in _nfs_rpc_destroy_chan (chan=0x7f2c4801a198) at /root/nfs-ganesha/src/MainNFSD/nfs_rpc_callback.c:628
628 CLNT_DESTROY(chan->clnt);
(gdb) p chan
$3 = (rpc_call_channel_t *) 0x7f2c4801a198
(gdb) p *chan
$4 = {type = RPC_CHAN_V40, mtx = {__data = {__lock = 1, __count = 0, __owner = 31067, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = "\001\000\000\000\000\000\000\000[y\000\000\001", '\000' <repeats 26 times>, __align = 1}, states = 0, source = {clientid = 0x7f2c4801a0f0, session = 0x7f2c4801a0f0}, last_called = 0,
clnt = 0x7f2c64009ff0, auth = 0x0, gss_sec = {mech = 0x0, qop = 0, svc = 0, cred = 0x0, req_flags = 0}}
(gdb) p *chan->clnt
$5 = {cl_ops = 0x7f2cbad37420 <ops.8330>, cl_netid = 0x0, cl_tp = 0x0, cl_u1 = 0x0, cl_u2 = 0x0, cl_lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 3, __spins = 0, __elision = 0,
__list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22 times>, __align = 0}, cl_error = {ru = {RE_errno = 0, RE_why = AUTH_OK, RE_vers = {low = 0,
high = 0}, RE_lb = {s1 = 0, s2 = 0}}, re_status = RPC_SUCCESS}, cl_refcnt = 0, cl_flags = 96}
(gdb) p *chan->clnt->cl_ops
$6 = {cl_call = 0x7f2cbaafe74f <clnt_vc_call>, cl_abort = 0x7f2cbaafe9ae <clnt_vc_abort>, cl_freeres = 0x7f2cbaafe985 <clnt_vc_freeres>, cl_destroy = 0x7f2cbaafec15 <clnt_vc_destroy>,
cl_control = 0x7f2cbaafe9b8 <clnt_vc_control>}
(gdb)