toasty Italy Local time: 08:24 Member since 2013 Italian to English
Mar 6, 2023
Hello all,
Recently when trying to translate from PDFs in Studio 2022, the texts are all garbled, as you can see here:
This has happened with two different PDFs from two different clients, so I don't think it's a matter of a corrupted file.
Any way to fix this without converting to Word first?
Thanks in advance!
Samuel Murray Netherlands Local time: 08:24 Member since 2006 English to Afrikaans + ...
These look like OCR errors
Mar 6, 2023
toasty wrote:
Any way to fix this without converting to Word first?
There is no way to fix this without converting to Word. In fact, converting to Word is what Trados does as well. Trados tries to perform OCR (optical character recognition) on the file, but whether it produces useful output depends on how bad the PDF file is to begin with. I'm guessing your PDF files are so poor that a standard OCR process can't convert them to text. This means that you have to convert them to text manually, by typing the text into a Word file.
toasty Italy Local time: 08:24 Member since 2013 Italian to English
TOPIC STARTER
High-quality PDFs actually
Mar 6, 2023
Hi Sam,
The files are high-quality PDFs created from InDesign, not bad scans of old documents or something. In one case I managed to get the InDesign file, but as we all know, Studio is "incompatible" with those now too.
In any case, thanks for confirming. I'll work directly from the PDF.
Hans Lenting Netherlands Member since 2006 German to Dutch
SolidPDF
Mar 6, 2023
Samuel Murray wrote:
Trados tries to perform OCR (optical character recognition) on the file
Doesn't SolidPDF (the converter) try to convert without OCR first?
The extra markup is likely caused by kerning instructions in InDesign.
Perhaps you can convert the PDF manually to DOCX and then use either TransTools or David's CodeZapper?
BTW: You can send me an example and I will see how my CAT tool's OCR filter handles it.
Samuel Murray Netherlands Local time: 08:24 Member since 2006 English to Afrikaans + ...
Hans makes
Mar 6, 2023
Hans Lenting wrote:
Doesn't SolidPDF (the converter) try to convert without OCR first?
Hans makes a good point: the mistakes that I see look like typical OCR errors, but it could be that Trados is just confused by some elements in the files.