GNU PDF vs Poppler




















I used Acrobat to index all the scans to create a searchable library. Is there an open source solution for something like that? Good point. For me, the only time I need to make detailed changes to vector-based PDFs is when the subject matter is a landscape or site plan or other map, so exporting just the page that needs editing (if there even are multiple pages) is not much of a problem: I'm generally editing one page in great detail.

But for people with other use cases I could imagine that being a frustration, and a good reason to use Draw instead. Works well and I can edit!

Their Linux version is a very poor cousin. You just forgot Scribus, the only open source document editor that handles CMYK documents well for printing. Thanks, Scribus is actually mentioned under the "creating" section. I don't have a need to manage precise print color, but that's a good point for anyone who does. For splitting or merging PDF files I use pdfsam, available for Linux and Windows.

For converting scanned images (mostly scientific papers) into searchable PDF files I use gscan2pdf. It can use either tesseract or cuneiform for the OCR, both with mostly very poor results. I have read that tesseract is the "best" OCR program on Linux, but it is miles away from "professional" closed source solutions like FineReader 10 years back, sorry to say.

I have also tried and used tesseract from the command line with the same poor results, although the scans were of high quality (around dpi) and without artefacts. Tesseract has massive problems recognising the page layout, even on pages with only a single column, not to speak of multicolumn pages. Its ability to correctly recognise single characters is bad as well, even if you have chosen the correct language for the text.
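For anyone who wants to repeat the command-line test, here is a minimal Python sketch of how tesseract can be asked for a searchable PDF. The file names are made up, the helper names are my own, and the `--psm` option assumes Tesseract 4 or later. Setting the page segmentation mode by hand may or may not help with the layout problems described above; I have not verified that it does.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_pdf_command(image, output_base, lang="eng", psm=None):
    """Build a tesseract command that writes <output_base>.pdf with a text layer."""
    cmd = ["tesseract", str(image), str(output_base), "-l", lang]
    if psm is not None:
        # --psm sets the page segmentation mode (Tesseract 4 and later);
        # mode 4, for example, assumes a single column of text.
        cmd += ["--psm", str(psm)]
    # The trailing "pdf" config requests a searchable PDF instead of plain text.
    cmd.append("pdf")
    return cmd


def ocr_to_pdf(image, output_base, lang="eng", psm=None):
    """Run tesseract if it is installed and return the expected PDF path."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    subprocess.run(tesseract_pdf_command(image, output_base, lang, psm), check=True)
    return Path(f"{output_base}.pdf")
```

Calling `ocr_to_pdf("scan.png", "paper", lang="deu", psm=4)` would run `tesseract scan.png paper -l deu --psm 4 pdf` and should leave the result in `paper.pdf`.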

I have read somewhere that tesseract was far better in the past, but that the developers have broken it (not sure if that is true). Tools like OCR Feeder also offer to save a scanned text image with a text layer, but for me this does not work: the program completely fails to save a PDF file at all, searchable or not.

I also sometimes use Master PDF for editing PDFs, mainly for inserting bookmarks for navigation within the document. I use pdflatex to create PDFs. It is a great program and can embed video and insert hyperlinks. My only frustration is that ONLY Acrobat can access those links! I believe the issue is support for JavaScript in the PDF, but I am not sure, and I hope someone will make a Linux alternative eventually.

Where Scribus shines is with complex layout of text and images and its ability to very precisely handle fonts and color. It can also import PDFs as vector drawings, or more precisely groups of vector graphics, which can be ungrouped and edited as vector drawings. Currently there is also work going on to be able to handle complex text layout with non-Latin languages and fonts.

In limited circumstances, I use Google Docs to convert straightforward, simple PDF files. I also use CloudConvert, an add-on to Google Drive.

The latter works surprisingly well, even with fairly complicated documents. It is free for limited conversions, with a minimal cost for ongoing bulk conversions. I didn't know about some of the recent progress in editing PDFs. I use pdflatex a lot, but also a number of other editing tools that support export to PDF.

Do you have recommendations for command-prompt-friendly PDF tools? Good question! This isn't an area I've explored much personally but I'd be really interested to do a little exploring and find out what the available tools in this area are. Do you have one that you like in particular? I suppose technically it's not what you mean, since it is used to create, edit, compose, or convert bitmap images, but it worked for me.

I've found pdftk (the PDF toolkit) very nice for splicing together pieces of several different pre-existing PDFs. It's a command line tool. I'm not a developer; I always use the free online image-to-PDF converter and merge tool from pdfcoding.
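As a rough illustration of that kind of splicing, the sketch below builds pdftk command lines from Python. The helper names and file names are invented for the example; the `cat` and `output` keywords and the `A=file.pdf` handle syntax are pdftk's own.

```python
import subprocess


def pdftk_merge_command(inputs, output):
    """Concatenate whole PDFs in the order given."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return ["pdftk", *[str(p) for p in inputs], "cat", "output", str(output)]


def pdftk_splice_command(sources, page_ranges, output):
    """Pick page ranges from several PDFs.

    sources maps an uppercase handle to a file, e.g. {"A": "report.pdf"};
    page_ranges lists handle-prefixed ranges, e.g. ["A1-3", "B2"].
    """
    cmd = ["pdftk"] + [f"{handle}={path}" for handle, path in sources.items()]
    cmd += ["cat", *page_ranges, "output", str(output)]
    return cmd


def run(cmd):
    """Execute a command built above; raises if pdftk reports an error."""
    subprocess.run(cmd, check=True)
```

For example, `pdftk_splice_command({"A": "report.pdf", "B": "maps.pdf"}, ["A1-3", "B2"], "combined.pdf")` corresponds to `pdftk A=report.pdf B=maps.pdf cat A1-3 B2 output combined.pdf`.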

Aren't we supposed to be living in a paperless world by now? It could be worse. Editing PDFs: editing is a loaded term. Being terminal-based, these are great tools for automated manipulation, too. Editor's note: This article was originally published in and has been updated.

About the author. Jason Baker - I use technology to make the world more open. An explanation of the options used: According to my test, pdftoppm works great and can produce the needed images quickly. If you want to use Python, there is also a package named pdf2image, which is a thin wrapper around pdftoppm.

Make sure you have installed pdftoppm and that it is on your PATH. You can then manipulate the images with the powerful functionality provided by the Pillow package. I have also written a more detailed script to directly generate images from a PPT file on the command line. I think I will start to read a bit more extensively about how the PDF is coded then. Or try to rethink my strategy a little bit. There is a related question concerning "spreadsheet"-type data: Extracting tables from PDF files programmatically?
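To make the pdftoppm route concrete, here is a small Python sketch that calls the tool directly through `subprocess` rather than through pdf2image. The function names, the 150 dpi default and the file names are assumptions for illustration; `-png`, `-r`, `-f` and `-l` are pdftoppm's flags for output format, resolution, first page and last page.

```python
import shutil
import subprocess
from pathlib import Path


def pdftoppm_command(pdf, output_prefix, dpi=150, first_page=None, last_page=None):
    """Build a pdftoppm command that renders PDF pages to PNG files."""
    cmd = ["pdftoppm", "-png", "-r", str(dpi)]
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    # pdftoppm writes <output_prefix>-<page number>.png for each page.
    cmd += [str(pdf), str(output_prefix)]
    return cmd


def render_pages(pdf, output_prefix, **options):
    """Run pdftoppm if it is on PATH and return the PNG files it produced."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm was not found on PATH")
    subprocess.run(pdftoppm_command(pdf, output_prefix, **options), check=True)
    prefix = Path(output_prefix)
    return sorted(prefix.parent.glob(prefix.name + "-*.png"))
```

The PNG files returned by `render_pages("slides.pdf", "page", dpi=200)` can then be opened with Pillow's `Image.open` for cropping, resizing and so on.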



