PDF ... inside and outside

Tuesday, 25 January 2022

Washing day for PDF documents and forms!

As I explained in this blog years ago, Adobe products for form creation embed checksum/encryption routines that detect whether a form created with an Adobe tool was changed by a third-party PDF tool.
What does this mean? If a PDF form or PDF document made with an Adobe tool is changed by a third-party tool (filling in data and saving it into the form counts as a modification, too) and then opened in Adobe Reader, none of the form fields are usable anymore. The form is just a plain document! All you get is an ugly popup message telling you this:

"This document enabled extended features in Adobe Acrobat Reader. The document has been changed since it was created and use of extended features is no longer available. Please contact the author for the original version of this document."

I've had these experiences with forms created by the Adobe products "InDesign" and "LiveCycle Designer"...

These problems are now a thing of the past! I've published a command-line exe that removes the described restrictions and all usage rights within a second. It's called "Unlock_ALD" (32- and 64-bit versions) and runs from the command line in Windows.

First process the PDF file with Unlock_ALD; after that you can work with Adobe Reader, Foxit Reader, any third-party tool and perhaps again with an Adobe product... without these annoying restrictions.

There are four ways to use Unlock_ALD:
1. Directly from the command-line.
2. Inside your batch-scripts (bat- or cmd-files).
3. From the Explorer context menu via "Send to...".
4. Inside your own published/distributed applications and projects via shell-syntax.

[ More details can be found on my website under "products" ... More information regarding Unlock_ALD ]

[All trademarks, labels and company names used here are the property of their respective companies or owners.]

Cheers and have a nice day,
Ingo Schmoekel

Thursday, 5 December 2019

PrPages - Individual enhancements - everything is possible!

PrPages is used to determine the coloured and monochrome (b/w or gray) pages of a PDF document.
This can be used, for example, for more accurate costing of copy jobs or for allocating printing costs to company departments.

If you can only decide between a monochrome and a colour printout for the whole document, a single coloured page in a 30-page PDF document can turn what is actually a mainly b/w copy into a much more expensive colour copy. That hurts, of course, if you have to save money ;-)
Now back to PrPages ... With PrPages we know which pages contain coloured elements and which are completely b/w. This alone probably helps to reduce costs in the printing area. It would be even better if you could extract the coloured pages and print them separately.

A prospective customer came to me with questions about exactly this problem ... He asked me to develop an individual solution for him.

After some consideration, the free tool PDFtk came to mind again. Among other things, the command-line tool PDFtk can merge single PDF pages into a new document, split PDF documents into single pages, assemble selected pages into a new document and much more.
PDFtk together with Microsoft's CMD command set for BAT files (yes ... the files with the extension BAT) is enough to realize the desired individual solution free of charge.

1. PrPages

PrPages creates a csv-file with all relevant data (used colours and so on) for each page of a whole PDF document.

We need a bat(ch) file that starts PrPages with some parameters in the PrPages directory. It could look like this:

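@echo off
rem get_colors.bat: let PrPages write one csv line per page, saved as rows.txt
if "%~1"=="" goto input1
rem the quotes keep paths with spaces together
set "testvar1=%~1"
rem remove results of a previous run
IF EXIST rows.txt del rows.txt
IF EXIST rows.csv del rows.csv
prpages "%testvar1%" E 00 rows.csv
ren rows.csv rows.txt
goto end
:input1
echo The 1st parameter for the input-file (for example: "c:\temp\catalogue.pdf") is missing!
:end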

Calling this bat(ch) file (let's call it get_colors.bat) from the command line (e.g. at the prompt "c:\temp\pdftk\>") could look like this:

get_colors c:\temp\catalogue.pdf

The result should be a file rows.txt with content similar to this:

c:\temp\pdftk\catalogue.pdf;1;842;595;color;
c:\temp\pdftk\catalogue.pdf;2;842;595;color;
c:\temp\pdftk\catalogue.pdf;3;842;595;bw/gray;
c:\temp\pdftk\catalogue.pdf;4;842;595;bw/gray;
c:\temp\pdftk\catalogue.pdf;5;842;595;bw/gray;
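If you'd rather post-process rows.txt with a scripting language than with findstr, the idea looks like this in Python. This is only a sketch, assuming the column layout of the sample above (file name; page number; two page-size values; classification;) and the class names "color" and "bw/gray"; the function name pages_by_class is made up for this example and is not part of PrPages.

```python
from collections import defaultdict


def pages_by_class(lines):
    """Group page numbers by the classification in column 5 of each row."""
    result = defaultdict(list)
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 5:  # skip empty or incomplete lines
            continue
        page, kind = int(fields[1]), fields[4]  # column 2 = page, column 5 = class
        result[kind].append(page)
    return dict(result)


# The sample rows from rows.txt
sample = [
    r"c:\temp\pdftk\catalogue.pdf;1;842;595;color;",
    r"c:\temp\pdftk\catalogue.pdf;2;842;595;color;",
    r"c:\temp\pdftk\catalogue.pdf;3;842;595;bw/gray;",
    r"c:\temp\pdftk\catalogue.pdf;4;842;595;bw/gray;",
    r"c:\temp\pdftk\catalogue.pdf;5;842;595;bw/gray;",
]
print(pages_by_class(sample))  # {'color': [1, 2], 'bw/gray': [3, 4, 5]}
```

For a real file, pass the lines of open("rows.txt") instead of sample; the resulting page lists are exactly what PDFtk needs in the next step.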

2. Doing the real page extraction and reconcatenation into two new separate files (color and b/w)

For this we use PDFtk, which can split PDF files and merge individual pages.
On the command line this could look like this, for example:

pdftk c:\temp\catalogue.pdf cat 2 5 8 9 12 output outputc.pdf

So we have to take the page numbers of the coloured pages from rows.txt and join them into a string that becomes a new parameter for PDFtk ...

The most important line in the bat(ch) file for this functionality is:
for /f "tokens=2 delims=;" %%i in ('findstr /C:";color;" "%testvar2%"') do echo %%i>>color.txt
tokens=2 ... selects the second column of the csv-file rows.txt, which contains the page number.
delims=; ... sets the field separator (the semicolon).
findstr ... is the Windows command-line tool for searching strings in files; /C: makes it search for the given text literally.
;color; ... is the string we're searching for in rows.txt.
%testvar2% ... this variable contains the second parameter passed to the bat(ch) file (e.g. the csv-file rows.txt).
do echo %%i>>color.txt ... appends the page number of every line containing ";color;" to color.txt.
color.txt now contains the page numbers, one per line.
From this file content (the numbers) we have to create a string:

FOR /F %%i in (color.txt) do call set "Myvar1=%%Myvar1%% %%i"
for ... means loop-processing.
%%i ... this variable contains the currently read line content (the page number).
color.txt ... this is our file with the page numbers for the coloured pages.
Myvar1 ... this is the name of the variable for the string with the page numbers.
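%%Myvar1%% ... in a batch file the doubled percent signs become %Myvar1%, and "call" expands that again, so each pass of the loop sees the current content of Myvar1. Without "call", %Myvar1% would be expanded only once, when the whole for line is read.
Myvar1=%%Myvar1%% %%i ... means Myvar1 = previous content of Myvar1 plus a space and the new content (the next number).
At the end of the loop Myvar1 contains a string like 2 5 8 9 12, for example.
Finally, the last important line in the bat file could look like this: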

pdftk %testvar1% cat %Myvar1% output outputc.pdf
%testvar1% ... contains path and file name of the original PDF document.
cat ... is the PDFtk operation that assembles the given pages (single pages or ranges) into a new document.
%Myvar1% ... contains the page numbers from variable Myvar1.
output ... is the PDFtk keyword followed by the name of the new PDF document.
outputc.pdf ... this file finally contains the coloured pages 2, 5, 8, 9 and 12 of the sample PDF.
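The same PDFtk call can also be built from a script. A minimal Python sketch, assuming pdftk is on the PATH; the helper names pdftk_cat_command and extract_pages are invented for this example. Passing the arguments as a list instead of one string means paths with spaces need no extra quoting.

```python
import subprocess


def pdftk_cat_command(source_pdf, pages, output_pdf):
    """Argument list for: pdftk <source_pdf> cat <pages...> output <output_pdf>."""
    if not pages:
        # "cat" without page numbers would copy ALL pages, so refuse an empty list
        raise ValueError("no pages selected")
    return ["pdftk", source_pdf, "cat", *(str(p) for p in pages), "output", output_pdf]


def extract_pages(source_pdf, pages, output_pdf):
    """Run PDFtk; raises subprocess.CalledProcessError if PDFtk reports an error."""
    subprocess.run(pdftk_cat_command(source_pdf, pages, output_pdf), check=True)


cmd = pdftk_cat_command(r"c:\temp\catalogue.pdf", [2, 5, 8, 9, 12], "outputc.pdf")
print(" ".join(cmd))  # pdftk c:\temp\catalogue.pdf cat 2 5 8 9 12 output outputc.pdf
```

extract_pages(r"c:\temp\catalogue.pdf", [2, 5, 8, 9, 12], "outputc.pdf") would then write the coloured pages, just like the bat line.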
Finally, here is the whole bat file with some additional, hopefully self-explanatory lines.
To complete the process, it also handles the b/w pages:

@echo off
rem get_pages.bat: split a PDF into colour pages (outputc.pdf) and b/w pages (outputb.pdf)
rem %1 = original PDF, %2 = rows.txt created by get_colors.bat
if "%~1"=="" goto input1
set "testvar1=%~1"
if "%~2"=="" goto input2
set "testvar2=%~2"
SET "Myvar1="
SET "Myvar2="
IF EXIST color.txt del color.txt
IF EXIST bwgray.txt del bwgray.txt
rem column 2 of every matching line is the page number
for /f "tokens=2 delims=;" %%i in ('findstr /C:";color;" "%testvar2%"') do echo %%i>>color.txt
for /f "tokens=2 delims=;" %%i in ('findstr /C:";bw/gray;" "%testvar2%"') do echo %%i>>bwgray.txt
rem join the page numbers into space-separated lists
IF EXIST color.txt FOR /F %%i in (color.txt) do call set "Myvar1=%%Myvar1%% %%i"
IF EXIST bwgray.txt for /f %%i in (bwgray.txt) do call set "Myvar2=%%Myvar2%% %%i"
IF EXIST outputc.pdf del outputc.pdf
IF EXIST outputb.pdf del outputb.pdf
rem "cat" without page numbers would copy ALL pages, so skip empty lists
if defined Myvar1 (
  echo color pages: %Myvar1%
  pdftk "%testvar1%" cat %Myvar1% output outputc.pdf
)
if defined Myvar2 (
  echo bwgray pages: %Myvar2%
  pdftk "%testvar1%" cat %Myvar2% output outputb.pdf
)
goto end
:input1
echo The 1st parameter pdf-inputfile (eg.: "c:\temp\catalogue.pdf") is missing!
goto end
:input2
echo The 2nd parameter csv-outputfile (with color data) (eg.: "c:\temp\rows.txt") is missing!
:end

Calling this bat file (let's call it get_pages.bat) from the command line could look like this, for example:
get_pages c:\temp\catalogue.pdf c:\temp\rows.txt

As a result, we get two new files from the original file (which is preserved): one with the coloured PDF pages and one with the b/w pages. They can then go to the copy shop or be printed on a colour or b/w laser printer.

Friday, 29 May 2015

Determine dpi-values from pdf- and image-files

I just received an e-mail inquiry from a customer asking how to determine the dpi values of PDF and image files ...

The dpi value can be calculated easily.
1 dpi means 1 pixel (dot) per inch, and 1 inch is 2.54 cm.
An image file that is 1024 pixels wide and 10 cm wide has (1024 x 2.54) / 10 ≈ 260 dpi.
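The formula as a tiny Python function, just to make the arithmetic explicit; the function name dpi is only for this example.

```python
CM_PER_INCH = 2.54  # 1 inch = 2.54 cm


def dpi(pixels, length_cm):
    """Pixels (dots) per inch for an image `pixels` wide that is `length_cm` wide."""
    return pixels * CM_PER_INCH / length_cm


print(round(dpi(1024, 10)))  # 260
```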

An A4 PDF page has the standard dimensions of 595 x 842 points (the PDF unit of 1/72 inch).
With a width of 21 cm and 595 units this results in:
(595 x 2.54) / 21 ... so you get the common 72 dpi.

This means that if an A4 PDF page is rendered to an image file at its nominal size (one pixel per point), the quality won't be great, because the image then has only 72 dpi. For better quality the page has to be rendered at a higher resolution.
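How many pixels do you need for a higher resolution? Since PDF sizes are given in points (1/72 inch), the pixel size is simply points / 72 x dpi. A sketch in Python; render_size is a name made up here, and rounding to whole pixels is an assumption (renderers may round differently).

```python
POINTS_PER_INCH = 72  # PDF unit: 1 point = 1/72 inch


def render_size(width_pt, height_pt, dpi):
    """Pixel size of a page (given in points) rendered at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


print(render_size(595, 842, 72))   # (595, 842)
print(render_size(595, 842, 300))  # (2479, 3508)
```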

Tuesday, 23 December 2014

Foxit versus Adobe ... or "David against Goliath"

For many years now Adobe, as the driving force, has been promoting its PDF format as a "de facto standard", mainly for online and graphical documents. Because a document standard should be usable (at least for reading) by all users, Adobe published the free Adobe Reader for PDF documents.

Of course PDF documents have to be created before they can be read ;-) On the one hand there are normal documents, but on the other hand the PDF format is well suited for all kinds of forms. By embedding e.g. JavaScript actions and buttons, a lot of life can be breathed into PDF forms.

For all these things there are a number of Adobe products, paid products in a high price range. Many years have passed since version 1.0 of the PDF specification, and with each new version of Adobe Reader the required hard disk space grew by many megabytes. This called the open source community into action. Based on free software such as Ghostscript, alternative, free ways of creating PDFs were developed in the form of printer drivers. These printer drivers can be used from most office products. Two well-known free PDF printer drivers are PDFCreator and FreePDF. Meanwhile, more free PDF readers have been published, some of them as portable versions. So there's no need to install anything: just copy them anywhere onto the hard disk. That's all.

One of the best Adobe alternatives, perhaps the best, among free PDF readers is the well-known Foxit PDF Reader. While an Adobe Reader 10 installation needs about 457 MB of hard disk space, Foxit Reader 5.4.2 comes with a slim 44 MB! ... Fewer megabytes mean less hard disk space, and often faster processing and a faster start, too! An additional hint: if you only want to read/view PDF documents, the small "Sumatra PDF" (especially the portable version) is all you need.

Why would a big company like Adobe spend a lot of money to establish a document standard? ... Of course because they are always the first with new PDF products to earn money with ;-) So there's not only the Adobe Reader but also the high-priced products of the Adobe Acrobat series, which allow everything related to the creation of PDF documents.

The Foxit Corporation, maker of the free Foxit PDF Reader, proves that PDF products can be much cheaper with comparable quality. Starting with the free Foxit PDF Reader, the company now offers a wide range of lean and more cost-effective products for developing and editing PDF documents. The products run on nearly any platform (Windows, iOS, Android, ...) and range from applications for PDF editing and creation up to complete developer SDKs. All this with very "nice price" licensing.

As a developer in the PDF environment, of course I have several products and versions of PDF readers installed, including various versions of Adobe Reader. This allows me, for example, to say "runs on Adobe ... too". But in my daily business I just use Foxit Reader. Give it a try ;-)

Thursday, 19 December 2013

PrPages now supports “gray value tolerance”!

The new option “gray value tolerance” makes PrPages pretty much complete. What is the purpose of this value, which is also known as "gray balance"?

Because slightly shaded grays can sometimes be interpreted as very light blue or pink tones, some major printer drivers optionally work with a so-called “gray value tolerance” or "gray balance".

PrPages splits each pixel of the PDF page into its three RGB values. The “gray value tolerance” is the maximum allowed difference between the highest and the lowest RGB value. A value of 15 (good practice, by the way) means that the highest and the lowest colour value of a pixel (each from 0 to 255) may differ by at most 15 and the pixel is still interpreted as a shade of gray.

Without a tolerance value even the smallest deviation leads to colour detection, because the slightest difference means that it's not a real gray.
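The rule described above can be sketched in a few lines of Python. This is only an illustration of the max-minus-min test, not PrPages' actual code; the function names are made up, and the pixel list stands for an already rendered page (rendering itself is not shown).

```python
def is_gray(rgb, tolerance=0):
    """A pixel counts as gray if max(R, G, B) - min(R, G, B) <= tolerance."""
    return max(rgb) - min(rgb) <= tolerance


def page_is_colored(pixels, tolerance=0):
    """A page counts as coloured as soon as one pixel is not gray."""
    return any(not is_gray(p, tolerance) for p in pixels)


light_gray = (200, 205, 210)  # spread of 10 between highest and lowest value
print(is_gray(light_gray))                # False: without tolerance it's a colour
print(is_gray(light_gray, tolerance=15))  # True: within a tolerance of 15
```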

In summary: it could be gray or ... perhaps a very, very light blue? In any case not an absolutely clear gray, so there are a few coloured pixels on the page and it's up to you how to treat them. Some major printer drivers optionally work with “gray value tolerances”, and if you need identical results for your work or accounting, you should use this option with PrPages, too.

Monday, 3 December 2012

Analyze PDF-documents for coloured pages

The calculation of printing costs for pdf-documents, separated
by service line or department, is always a problem in a
company. How many pages - coloured or black & white - were
printed? What were the dimensions of the printed pages? Which
company departments are responsible? And finally: who should
pay for the new toner cartridges?

Internally the pdf-structure offers device colour-space flags
such as DeviceRGB, DeviceGray or DeviceCMYK, which point to
the colours used in the document. Or rather: they could point
to them, because a pdf-document doesn't have to use these
flags. It's possible to have coloured pages in a document
with DeviceGray, coloured and b/w-pages in a document without
any device-flag, only b/w-pages in a document with
DeviceCMYK, and so on...
So the device-flags are only indicators of the colour models
used in a document, not a safe method to determine coloured
pages.

This is a big problem in service-departments! There are
solutions available ... very expensive solutions.

For this purpose I offer the module "PrPages" in my product
range. It's a so-called command-line exe.
A command-line executable means the easiest integration into
all applications and workflows in 32- and 64-bit Windows
environments without any problems!

The module's approach is that the real colour information
is in each image pixel, and only on this basis can you make
a reliable statement about the colours used on a pdf-page.
In a bitmap every pixel has three colour values (red, green
and blue). "PrPages" temporarily renders the pdf-pages into
bitmap format and checks the pixel values to classify each
page as b/w or coloured.
Depending on the options you have set, a csv-file is created.
Each line contains the filename, the page count, the page
size of the first page, the number of coloured pages and
the number of b/w-pages.
Another option produces one line per pdf-page with the
filename, the page number, the page size and a note
for b/w or coloured.
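For cost accounting the per-page lines can be condensed into page counts per class. A minimal Python sketch, assuming a semicolon-separated per-page layout with the classification in the fifth column (as in the rows.txt example further up on this page); whether a given PrPages version writes exactly this layout is an assumption, and summarize is a made-up name.

```python
def summarize(rows):
    """Count pages per classification in per-page rows (class in column 5)."""
    counts = {}
    for row in rows:
        fields = row.strip().split(";")
        if len(fields) < 5:  # skip empty or incomplete lines
            continue
        counts[fields[4]] = counts.get(fields[4], 0) + 1
    return counts


rows = [
    "catalogue.pdf;1;842;595;color;",
    "catalogue.pdf;2;842;595;bw/gray;",
    "catalogue.pdf;3;842;595;bw/gray;",
]
print(summarize(rows))  # {'color': 1, 'bw/gray': 2}
```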

Depending on the system environment, the module runs with
high performance on a safe and stable technology.
On my product pages you'll find "PrPages" as a trial
version to check whether the tool is the right choice for
your purposes. Try before you buy.

Monday, 9 April 2012

PDF, JavaScript and Code you don't want

My dear readers!

Regarding trojan and virus attacks, you often read about PDF documents arriving on your local machine as e-mail attachments. When these attachments are opened, embedded trojans and other malware may be installed invisibly, or system settings may be changed.

This works via the scripting language JavaScript. With embedded JavaScript code the functionality of a PDF document can be greatly extended. The embedded code is bound to an event such as OnLoad (i.e. when the document is opened) and is executed when the event occurs. Normally this is a good thing, but it can also work to your detriment.

When you install Adobe or Foxit PDF Reader (both can interpret JavaScript code), JavaScript is enabled by default. It's an optional setting that you can deactivate.

To deactivate JavaScript in Adobe Reader 9 or 10, for example, go this way:

...Edit -> Preferences -> JavaScript...

In the right-hand section of the window, clear the checkboxes at
"Java Script / Enable Acrobat JavaScript" and
"Java Script Security / Enable menu items Javascript execution privileges".

To deactivate JavaScript in Foxit Reader 5, go this way:

...Tools -> Preferences -> JavaScript...

Clear the checkbox at
"Enable Javascript actions".

If you don't want to deal with such things and a simple PDF reader is enough for you, you should try the small and easy-to-use Sumatra PDF Reader, which can't interpret JavaScript.

Cheers,
Ingo