Segmentation rule for soft line breaks in Excel when "\n" doesn't work
Thread poster: XLTS
XLTS
XLTS  Identity Verified
Germany
Local time: 08:56
Member (2011)
English to German
+ ...
Apr 1, 2023

I would like to split the individual lines that carry soft line breaks in the cells of my (multilingual) Excel (2019) files into separate translation units. However, these lines don't seem to be separated by a "normal" LF/CR (hence adding a "\n" segmentation rule doesn't help), but by a string that is displayed as "_x000D_" when I open the xlsx archive and look at the file "sharedstrings.xml" in the "xl" directory.

When I add a segmentation rule to a translation memory (which I newly created for testing purposes) by entering "_x000D_" in the same advanced view where you would normally enter "\n", it has no effect on the segmentation. (Adding the "\n" segmentation rule, BTW, just makes the "_x000D_" string appear in the source column of the Studio editor view. I could change this into a tag, but this is not what I am after.)

What do I have to do to successfully split these lines into individual TUs? I am using Studio 2021.
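
For anyone who wants to check their own file for the same token without unzipping it by hand, here is a minimal sketch in Python. It assumes the shared strings are stored under the standard part name "xl/sharedStrings.xml" and use the usual SpreadsheetML namespace; the function name `strings_with_escaped_cr` is made up for illustration and has nothing to do with Studio.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the usual SpreadsheetML main namespace.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def strings_with_escaped_cr(xlsx):
    """Return the shared strings that contain the literal token _x000D_.

    `xlsx` may be a file path or a file-like object holding the archive.
    """
    with zipfile.ZipFile(xlsx) as archive:
        # Assumption: standard part name inside the xlsx archive.
        root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    texts = [t.text or "" for t in root.iter(NS + "t")]
    return [text for text in texts if "_x000D_" in text]
```

Calling `strings_with_escaped_cr("myfile.xlsx")` (a hypothetical file name) should list every cell text in which the token occurs.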

[Edited at 2023-04-01 02:24 GMT]


 
Jaime Oriard
Jaime Oriard  Identity Verified
Mexico
Local time: 00:56
Member (2005)
English to Spanish
+ ...
An idea Apr 1, 2023

Have you tried \r instead of \n? After all, 000D is a carriage return. You could also try both \r\n or \n\r. I normally search for \r\n in Windows (https://en.wikipedia.org/wiki/Newline?useskin=vector#Representation).

Hope this helps,


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 09:56
English to Russian
Example Apr 1, 2023

Can you share a screenshot of your strings? You can use imgbb.com for that (copy the BB code generated by that site and paste it here).
Also, after you add a new rule, do you make sure to remove the file from both source and target, so that Trados rebuilds both the source and the target sdlxliff files from scratch with the new rule?


 
XLTS
XLTS  Identity Verified
Germany
Local time: 08:56
Member (2011)
English to German
+ ...
TOPIC STARTER
Your suggestion looks logical, but... Apr 1, 2023

Jaime Oriard wrote:

Have you tried \r instead of \n? After all, 000D is a carriage return. You could also try both \r\n or \n\r. I normally search for \r\n in Windows (https://en.wikipedia.org/wiki/Newline?useskin=vector#Representation).



Thank you, I have just tried all of these, but none of them gives a different result; the TU still looks like this:

This is sentence 1._x000D_This is sentence 2.

BTW, when I reopen the language resources tab of the translation memory, the string \r\n will appear as ".\r[\n]+". But no matter where I place the brackets or even when I put both codes into brackets individually, the result will remain the same.
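
One possible explanation, sketched in plain Python rather than in Studio itself: if the segment text really contains the seven visible characters "_x000D_" (as the editor view suggests) and not an actual carriage return, then a pattern such as \r has nothing to match. The variable names below are only illustrative.

```python
import re

tu = "This is sentence 1._x000D_This is sentence 2."

# The text holds seven ordinary characters, not the control character U+000D.
has_real_cr = "\r" in tu

# A rule built on \r therefore finds nothing to break on...
cr_rule_hits = re.findall(r"\r", tu)

# ...while the literal token itself is present exactly once.
token_hits = re.findall(r"_x000D_", tu)

print(has_real_cr, cr_rule_hits, token_hits)
```

This only shows how the two patterns behave on such a string; it makes no claim about how Studio applies its segmentation rules internally.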


 
XLTS
XLTS  Identity Verified
Germany
Local time: 08:56
Member (2011)
English to German
+ ...
TOPIC STARTER
Action taken between attempts Apr 1, 2023

Stepan Konev wrote:

Can you share a screenshot of your strings?

Which strings do you refer to? The segmentation rule or the TUs to be split?

Also, after you add a new rule, do you make sure to remove the file from both source and target, so that Trados rebuilds both the source and the target sdlxliff files from scratch with the new rule?

I am using the "Translate single document" option, each time making sure to delete the project file that was created on my SSD. I have even created several copies of the source file and closed Studio between attempts, to no avail.


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 09:56
English to Russian
Apr 1, 2023

XLTS wrote:
Which strings do you refer to? The segmentation rule or the TUs to be split?
The TUs to be split. But I can see an example in your previous reply, so that's fine.


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 09:56
English to Russian
Suggestion Apr 1, 2023

I assume that it is not actually _x000D_ but simply x000D. Is that right?
If yes, you should do the following steps:
1. Go to File - Options - File Types - Microsoft Excel 2007-2019 - Embedded Content
2. Tick the 'Enable Embedded Content' box and click the 'Extract in defined document structures' radio button.
3. In the 'Tag definition rules' window, click Add...
4. In the 'Start Tag:' field type x000D
5. Click Advanced and select 'Exclude'.
6. Click OK as many times as necessary to close all windows and save the changes.
7. Open your single file for translation.


 
XLTS
XLTS  Identity Verified
Germany
Local time: 08:56
Member (2011)
English to German
+ ...
TOPIC STARTER
Underline character not for highlighting purposes :) Apr 1, 2023

Stepan Konev wrote:

I assume that it is not actually _x000D_ but simply x000D. Is that right?


Unfortunately not: each "x000D" carries a leading and a trailing underscore character, even when there are several of these strings in a row (e.g. Sentence1._x000D__x000D_Sentence2.)
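
To make the desired result concrete, here is a small sketch in Python of the split I am after, treating one or more consecutive "_x000D_" tokens as a single break. The function name `split_on_escaped_cr` is hypothetical; this is not something Studio offers under that name.

```python
import re


def split_on_escaped_cr(text):
    """Split text at every run of one or more literal _x000D_ tokens."""
    parts = re.split(r"(?:_x000D_)+", text)
    # Drop empty pieces that arise when the text starts or ends with a token.
    return [part for part in parts if part]


print(split_on_escaped_cr("Sentence1._x000D__x000D_Sentence2."))
```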


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 09:56
English to Russian
Ok, then try this Apr 1, 2023

XLTS wrote:
Unfortunately not: each "x000D" carries a leading and a trailing underscore character, even when there are several of these strings in a row (e.g. Sentence1._x000D__x000D_Sentence2.)
Then use the same procedure, but in step 4 type _x005F

*Edit: removed the full stop char after _x005F to avoid ambiguity.
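
A note on the notation: 005F is the hexadecimal code point of the underscore, just as 000D is that of the carriage return. As far as I understand the OOXML string escaping, a token of the form _xHHHH_ stands for the character with that code point, and _x005F_ is used to protect a literal underscore. The sketch below decodes such tokens in plain Python; the function name `decode_ooxml_escapes` is made up for illustration and is not part of Trados or Excel.

```python
import re

# One escape token: underscore, "x", four hex digits, underscore.
_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def decode_ooxml_escapes(text):
    """Replace each _xHHHH_ token with the character it encodes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# _x000D_ becomes a real carriage return.
print(repr(decode_ooxml_escapes("Sentence1._x000D_Sentence2.")))

# _x005F_ becomes a plain underscore, so the rest stays literal text.
print(repr(decode_ooxml_escapes("_x005F_x000D_")))
```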


[Edited at 2023-04-01 14:22 GMT]


 
XLTS
XLTS  Identity Verified
Germany
Local time: 08:56
Member (2011)
English to German
+ ...
TOPIC STARTER
The solution I have found Apr 1, 2023

Stepan Konev wrote:

Then use the same procedure, but in step 4 type _x005F

Before I tried this, I chose to follow your initial recipe...

3. In the 'Tag definition rules' window, click Add...
4. In the 'Start Tag:' field type x000D
5. Click Advanced and select 'Exclude'.

..., except that (in "Bilingual Excel", as this is the file type I am dealing with) I entered "_x000D_" instead and ticked the check box "Line break after the tag" (my own translation of the label), and this actually seems to have done the trick.

For anyone who encounters the same problem in the future, here is a summary of what I did to eventually resolve it (I have the German version of Studio 2019 at hand, so the wording of the commands may be approximate):

1. Go to File - Options - File Types - Bilingual Excel - Embedded Content
2. Tick the 'Enable Embedded Content' box and click the 'Extract in defined document structures' radio button.
3. In the 'Tag definition rules' window, click Add...
4. In the 'Start Tag:' field type _x000D_ (including the underscore characters)
5. Click Advanced, select 'Exclude' and also tick the 'Line break after the tag' check box
6. Click OK as many times as necessary to close all windows and save the changes.

I don't know whether the following is necessary. However, now that I have found a solution that works (for whatever reason), and since with Trados Studio you have to be careful not to ruin working solutions once you have found them, I have kept the changes I made to the translation memory:

1. In the Translation Memories view, right-click the TM you would like to use and select 'Settings' (or 'Properties'?).
2. In the 'Segmentation rules' windows of both the source and the target language, add a new sentence-based segmentation rule, using a suitable description.
3. Pick 'Anything' from the dropdown lists in both 'Before break' and 'After break', click 'Advanced View', and type .\r[\n]+ into the 'Before break' window, leaving the 'After break' window empty.
4. Click OK as many times as necessary to close all windows and save the changes.
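
To illustrate what the pattern in step 3 describes, here is a rough sketch in Python. It assumes the text contains a real carriage return followed by one or more line feeds, and it places the break right after the matched 'Before break' text. The function `segment` is hypothetical and Python's regex engine is not the one Studio uses, so take it as an approximation only.

```python
import re


def segment(text, before_break=r".\r[\n]+"):
    """Cut text after every match of the 'before break' pattern."""
    segments, start = [], 0
    for match in re.finditer(before_break, text):
        # The break falls at the end of the matched text.
        segments.append(text[start:match.end()])
        start = match.end()
    if start < len(text):
        segments.append(text[start:])
    return segments


print(segment("This is sentence 1.\r\nThis is sentence 2."))
```

The pattern reads: any single character, then a carriage return, then at least one line feed.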

Now create a project from your bilingual Excel file and see if this works for you, too...

Stepan: bol'shoe spasibo for your help!


[Edited at 2023-04-01 15:07 GMT]



