Garbled text from PDF in Studio 2022
Thread poster: toasty
toasty  Identity Verified
Italy
Local time: 13:47
Member (2013)
Italian to English
Mar 6, 2023

Hello all,

Recently when trying to translate from PDFs in Studio 2022, the texts are all garbled, as you can see here:
Garbled pdf

This has happened with two different PDFs from two different clients, so I don't think it's a matter of a corrupted file.

Any way to fix this without converting to Word first?

Thanks in advance!


 
Samuel Murray  Identity Verified
Netherlands
Local time: 13:47
Member (2006)
English to Afrikaans
+ ...
These look like OCR errors Mar 6, 2023

toasty wrote:
Any way to fix this without converting to Word first?

There is no way to fix this without converting to Word. In fact, converting to Word is what Trados does as well. Trados tries to perform OCR (optical character recognition) on the file, but whether it produces useful output depends on how bad the PDF file is to begin with. I'm guessing your PDF files are so poor that a standard OCR process can't convert them to text. This means that you have to convert them to text manually, by typing the text into a Word file.


 
toasty  Identity Verified
Italy
Local time: 13:47
Member (2013)
Italian to English
TOPIC STARTER
High-quality PDFs actually Mar 6, 2023

Hi Sam,

The files are high-quality PDFs created from InDesign, not bad scans of old documents or something. In one case I managed to get the InDesign file, but as we all know, Studio is "incompatibile" with those now too.

In any case, thanks for confirming. I'll work directly from the PDF.


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
SolidPDF Mar 6, 2023

Samuel Murray wrote:

Trados tries to perform OCR (optical character recognition) on the file


Doesn't SolidPDF (the converter) try to convert without OCR first?

The extra markup is likely caused by kerning instructions in InDesign.

Perhaps you can convert the PDF manually to DOCX and then use either TransTools or David's CodeZapper?

BTW: You can send me an example and I will see how my CAT tool's OCR filter handles it.


 
Samuel Murray  Identity Verified
Netherlands
Local time: 13:47
Member (2006)
English to Afrikaans
+ ...
Hans makes Mar 6, 2023

Hans Lenting wrote:
Doesn't SolidPDF (the converter) try to convert without OCR first?

Hans makes a good point -- the mistakes that I see look like typical OCR errors, but it could be that Trados is just confused by some elements in the files.


 

