Ceph large file transfer failures

asked 2021-06-07 17:11:41 +0200

this post is marked as community wiki

We are using Ceph for Windows in conjunction with Dokan to transfer files from Windows machines to a very large (multi-PB) Ceph cluster running on Linux. The drive mounts correctly and small files (e.g. a few MB) transfer without issue. When we try to send either a large file (e.g. 10 GB) or many small files at once (e.g. 1000 x 1 MB), the transfer fails. Are there any logs we can look at to see why the transfers fail?

The only clue I have is the error message Windows shows after the failed transfer attempt: "Error 0x800705AA: insufficient system resources", yet Task Manager shows the resources as available. Online recommendations suggest running sfc /scannow, but this is a clean install of Windows.
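
To narrow this down, it may help to reproduce both failing workloads (many small files, one large file) from a script rather than from Explorer, so each failing write is recorded. The sketch below is a minimal, hypothetical reproduction helper; the target paths are placeholders. 0x800705AA is the HRESULT form of Win32 error 1450 (ERROR_NO_SYSTEM_RESOURCES). Whether Python surfaces that same code in `OSError.winerror` on a ceph-dokan mount is an assumption, and `winerror` only exists on Windows.

```python
import os
from pathlib import Path

# Win32 error code wrapped by HRESULT 0x800705AA ("insufficient system resources").
ERROR_NO_SYSTEM_RESOURCES = 1450


def describe(exc):
    """Label an exception, flagging ERROR_NO_SYSTEM_RESOURCES when Windows reports it."""
    # `winerror` is only set on Windows; elsewhere getattr falls back to None.
    if getattr(exc, "winerror", None) == ERROR_NO_SYSTEM_RESOURCES:
        return f"ERROR_NO_SYSTEM_RESOURCES (0x800705AA): {exc}"
    return repr(exc)


def write_many_small_files(target_dir, count=1000, size_bytes=1024 * 1024):
    """Write `count` files of `size_bytes` each; return (file name, exception) per failure."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    payload = os.urandom(size_bytes)  # one random payload reused for every file
    failures = []
    for i in range(count):
        path = target / f"small_{i:05d}.bin"
        try:
            with open(path, "wb") as fh:
                fh.write(payload)
        except OSError as exc:
            failures.append((path.name, exc))
    return failures


def write_large_file(path, total_bytes, chunk_bytes=64 * 1024 * 1024):
    """Stream `total_bytes` to `path` in chunks; return (bytes passed to write(), error or None)."""
    chunk = memoryview(os.urandom(min(chunk_bytes, total_bytes)))
    written = 0
    try:
        with open(path, "wb") as fh:
            while written < total_bytes:
                n = min(len(chunk), total_bytes - written)
                fh.write(chunk[:n])
                written += n
    except OSError as exc:
        # The failure may surface on a later write or only when the file is closed.
        return written, exc
    return written, None
```

For example, `write_many_small_files(r"X:\transfer-test\small")` and `write_large_file(r"X:\transfer-test\big.bin", 10 * 1024**3)` mirror the 1000 x 1 MB and 10 GB cases, where `X:` is a hypothetical drive letter for the ceph-dokan mount; passing each failure to `describe()` shows whether error 1450 is what comes back.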

3 answers

answered 2021-06-08 09:43:40 +0200

lpetrut

Hi,

Can you please send us the Ceph version (ceph-dokan -v)?

ceph-dokan can log to stderr, the event log and/or a log file; here's a config sample: http://paste.openstack.org/raw/806450/. As shown there, raising the CephFS client log level to 3 might provide some useful information.
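
The linked sample is not reproduced here, but as a rough sketch of the same idea, the helper below adds client logging options to the `[client]` section of an existing ceph.conf. It assumes the standard Ceph options `debug client` and `log file`, and that the config lives at the usual Windows location `C:\ProgramData\ceph\ceph.conf` (an assumption; adjust to your install). The log path used in the usage note is hypothetical. Back up the file first, since configparser drops comments when rewriting, and ceph-dokan presumably needs to be remounted to pick up the change.

```python
import configparser


def enable_client_logging(conf_path, log_file, debug_level=3):
    """Add client logging options to the [client] section of an existing ceph.conf."""
    # interpolation=None so '%' or '$' in existing values is left untouched.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names exactly as written
    if not parser.read(conf_path):
        raise FileNotFoundError(conf_path)
    if not parser.has_section("client"):
        parser.add_section("client")
    parser.set("client", "debug client", str(debug_level))
    parser.set("client", "log file", str(log_file))
    # configparser does not preserve comments when it writes the file back.
    with open(conf_path, "w") as fh:
        parser.write(fh)
```

For example: `enable_client_logging(r"C:\ProgramData\ceph\ceph.conf", r"C:\ProgramData\ceph\out\client.log")`. Editing the file programmatically is just a convenience; making the same two-line change by hand is equivalent.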

Let's see if it's a Ceph error. If not, we can retrieve Dokany logs from stderr using the --debug --dokan-stderr arguments.

By the way, how reproducible is it?

Thanks, Lucian

answered 2023-10-25 12:40:44 +0200

BenediktS

updated 2023-10-25 12:42:10 +0200

Hello,

we have the same problem. When the error occurs, our Ceph cluster reports "1 clients failing to respond to capability release". I tried to collect logs, but when I raise the log level to 3, the error no longer occurs in the same way.
With log level 3, the servers show "slow OSD operations", but no error is thrown.
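
Since the cluster-side warnings seem to come and go with the client log level, it may be worth capturing them at the moment a copy fails. The sketch below assumes the JSON layout of `ceph health detail --format json` (a `checks` map keyed by health-check code, each with a `summary.message` and a `detail` list of `message` entries). It also assumes that `MDS_CLIENT_LATE_RELEASE` is the check behind "clients failing to respond to capability release" and that `SLOW_OPS` is the one behind the slow-operation warning; older releases may use other codes. `fetch_health_json()` needs a node with admin credentials.

```python
import json
import subprocess

# Health-check codes assumed to match the warnings described above.
WATCHED_CHECKS = ("MDS_CLIENT_LATE_RELEASE", "SLOW_OPS")


def health_check_messages(health_json, codes=WATCHED_CHECKS):
    """Return {code: {"summary": str, "detail": [str]}} for the watched checks present."""
    checks = json.loads(health_json).get("checks", {})
    found = {}
    for code in codes:
        check = checks.get(code)
        if check is None:
            continue
        found[code] = {
            "summary": check.get("summary", {}).get("message", ""),
            "detail": [d.get("message", "") for d in check.get("detail", [])],
        }
    return found


def fetch_health_json():
    """Run `ceph health detail --format json` and return its stdout."""
    result = subprocess.run(
        ["ceph", "health", "detail", "--format", "json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Polling `health_check_messages(fetch_health_json())` every few seconds during a copy would show whether the capability-release warning, and which client it names, lines up with the Windows failure.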

When I start the client with the Dokan debug options enabled, an error occurs that had never occurred before:

###Create file handle = 0x000001D6D8DBE060, eventID = 0006, event Info = 0x000001D6CC02D4A0
###WriteFile file handle = 0x000001D6CC057A50, eventID = 0003, event Info = 0x000001D6CC02CF60
libc++abi: Dokan Information: SendAndPullEventInformation() with NTSTATUS 0x0, context 0xcc057a50, and result   object 0x000001D6CDB11010 with size 48
terminating due to uncaught exception of type std::runtime_error: invalid utf8Dokan Information:    DokanEndDispatchCreate() status = 0, file handle = 0x000001D6D8DBE060, eventID = 0006, result = 0x1
Dokan Information: SendAndPullEventInformation() with NTSTATUS 0x0, context 0xd8dbe060, and result object 0x000001D6CBFBBDE0 with size 48

###WriteFile file handle = 0x000001D6D7A2B720, eventID = 0004, event Info = 0x000001D6CC02D0B0
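
Because stderr from several threads is interleaved here, it is not obvious which operations were logged before the process aborted. A small hypothetical helper like the one below, which assumes only the `###Op file handle = ..., eventID = ...` format and the libc++abi "terminating due to uncaught exception" line visible in this output, can split the debug events around the crash point.

```python
import re

# Dokan debug event lines, e.g. "###WriteFile file handle = 0x..., eventID = 0003, ..."
EVENT_RE = re.compile(
    r"###(?P<op>\w+) file handle = (?P<handle>0x[0-9A-Fa-f]+), eventID = (?P<event_id>\d+)"
)
# libc++abi prints this line when an exception escapes and the process aborts.
CRASH_RE = re.compile(
    r"terminating due to uncaught exception of type (?P<exc_type>\w+(?:::\w+)*)"
)


def summarize_dokan_log(text):
    """Split Dokan debug events into those logged before and after an uncaught exception."""
    # Scan the whole text instead of line by line, since output can be interleaved mid-line.
    events = [
        (m.start(), m.group("op"), m.group("handle"), m.group("event_id"))
        for m in EVENT_RE.finditer(text)
    ]
    crash = CRASH_RE.search(text)
    cutoff = crash.start() if crash else len(text)
    return {
        "exception": crash.group("exc_type") if crash else None,
        "before_crash": [e[1:] for e in events if e[0] < cutoff],
        "after_crash": [e[1:] for e in events if e[0] >= cutoff],
    }
```

Applied to the output above, it should report `std::runtime_error` with a Create and a WriteFile event logged before the abort; text order is only an approximation of execution order when threads share stderr.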

The original error occurred with both Pacific and Reef.

Right now I use:

server: 18.2.0
dokan: 2.0.6.1000
client: https://ask.cloudbase.it/question/364...

I don't know whether this is connected or whether these are completely different errors, but I thought I would post it in case they are related.

Best regards,
Benedikt

Comments

Thanks, we'll look into it as soon as possible.

lpetrut (2023-10-25 16:55:45 +0200)

About the unicode error, did you use the updated MSI? https://cloudbase.it/downloads/ceph_unicode_reef.msi. Could you paste the ceph-dokan version? (ceph-dokan --version).

lpetrut (2023-10-31 13:51:30 +0200)
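
One hypothesis worth ruling out, suggested only by the "invalid utf8" message and the Unicode build mentioned above, is that some file names in the source tree do not convert cleanly to UTF-8. The sketch below is a pre-flight scan that flags names Python cannot encode as strict UTF-8; on Windows these would be names containing unpaired UTF-16 surrogates. Whether ceph-dokan actually fails on such names is an assumption this check does not confirm.

```python
import os


def non_utf8_names(root):
    """Return paths under `root` whose last component cannot be encoded as strict UTF-8."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                name.encode("utf-8")  # raises on lone surrogates
            except UnicodeEncodeError:
                bad.append(os.path.join(dirpath, name))
    return bad
```

An empty result for the folder being copied would point away from file names as the cause of the exception.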

I couldn't reproduce the file copy error yet. How often does it show up? Are there any Dokan driver errors in the "System" Windows event log? Additional driver logs can be obtained like so: https://github.com/dokan-dev/dokany/wiki/How-to-Debug-Dokan#logs-on-debug-build

lpetrut (2023-10-31 13:51:44 +0200)

Could you share some info about the environment? It might give us an idea about possible limitations.

lpetrut (2023-10-31 13:52:04 +0200)

For example: Windows host specs (number of CPU cores, total RAM, available RAM when hitting the problem), Windows version, Ceph OSD count, network card type (e.g. 10G, 100G, NIC teaming), and the number of attached filesystems.

lpetrut (2023-10-31 13:52:19 +0200)
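
The host-side part of that list can be gathered with a short script, ideally run while the failure is happening so the available-RAM figure is meaningful. This is a sketch: it reads physical memory through the Win32 `GlobalMemoryStatusEx` call via ctypes and shells out to `ceph-dokan --version`; OSD count, NIC type and the number of attached filesystems still have to be added by hand.

```python
import ctypes
import os
import platform
import subprocess


def memory_status():
    """Return (total, available) physical RAM in bytes on Windows, else None."""
    if platform.system() != "Windows":
        return None

    # Field layout of the Win32 MEMORYSTATUSEX structure.
    class MEMORYSTATUSEX(ctypes.Structure):
        _fields_ = [
            ("dwLength", ctypes.c_ulong),
            ("dwMemoryLoad", ctypes.c_ulong),
            ("ullTotalPhys", ctypes.c_ulonglong),
            ("ullAvailPhys", ctypes.c_ulonglong),
            ("ullTotalPageFile", ctypes.c_ulonglong),
            ("ullAvailPageFile", ctypes.c_ulonglong),
            ("ullTotalVirtual", ctypes.c_ulonglong),
            ("ullAvailVirtual", ctypes.c_ulonglong),
            ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
        ]

    stat = MEMORYSTATUSEX()
    stat.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(stat)):
        return None
    return stat.ullTotalPhys, stat.ullAvailPhys


def ceph_dokan_version():
    """Return the output of `ceph-dokan --version`, or None if it cannot be run."""
    try:
        result = subprocess.run(
            ["ceph-dokan", "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return (result.stdout or result.stderr).strip() or None


def environment_report():
    """Collect host-side details; cluster and network details must be added manually."""
    mem = memory_status()
    return {
        "os_version": platform.platform(),
        "cpu_cores": os.cpu_count(),
        "total_ram_bytes": mem[0] if mem else None,
        "available_ram_bytes": mem[1] if mem else None,
        "ceph_dokan_version": ceph_dokan_version(),
    }
```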

answered 2023-11-28 07:37:36 +0200

this post is marked as community wiki

I have the same problem. I took some photos of the logs; could you please give me an email address so I can send them to you? My email address is 295510979@qq.com

Stats

Seen: 639 times

Last updated: Nov 28 '23