GSCAN2PDF(1) User Contributed Perl Documentation GSCAN2PDF(1)
NAME
gscan2pdf - A GUI to produce PDFs or DjVus from scanned documents
USAGE
1. Scan one or more pages with File/Scan.
2. Create a PDF of the selected pages with File/Save.
REQUIRED ARGUMENTS
None
OPTIONS
gscan2pdf has the following command-line options:
--device=device
Specifies the device to use, instead of getting the list of devices
from the SANE API. This can be useful if the scanner is on a
remote computer which is not broadcasting its existence.
--help
Displays this help page and exits.
--log=log-file
Specifies a file to store logging messages.
--debug, --info, --warn, --error, --fatal
Defines the log level. If a log file is specified, this defaults
to --debug, otherwise --error.
--import=PDF|DjVu|images
Imports the specified file(s). If the document has more than one
page, a window is displayed to select the required pages.
--import-all=PDF|DjVu|images
Imports all pages of the specified file(s).
--version
Displays the program version and exits.
Scanning is handled by SANE, preferably through the libimage-sane-perl
bindings, or via frontends such as scanimage. PDF conversion is done by
PDF::Builder. TIFF export is handled by libtiff, which is faster and has
a smaller memory footprint for multipage files.
DIAGNOSTICS
To diagnose a possible error, start gscan2pdf from the command line
with logging enabled:
"gscan2pdf --log=file.log"
and check file.log.
EXIT STATUS
None
CONFIGURATION
gscan2pdf creates a text resource file in ~/.config/gscan2pdfrc. The
directory can be changed by setting the $XDG_CONFIG_HOME variable.
Generally, however, preferences should be changed via the
Edit/Preferences menu, or are captured automatically during normal
usage of the program.
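As an illustration of the lookup described above, the following Python sketch resolves the path of the resource file. It assumes the usual XDG fallback to ~/.config when $XDG_CONFIG_HOME is unset or empty; the function name gscan2pdfrc_path is hypothetical and not part of gscan2pdf.

```python
import os
from pathlib import Path


def gscan2pdfrc_path(environ=None):
    """Return the expected location of the gscan2pdf resource file.

    Assumption: $XDG_CONFIG_HOME/gscan2pdfrc if the variable is set and
    non-empty, otherwise ~/.config/gscan2pdfrc.
    """
    env = os.environ if environ is None else environ
    config_home = env.get("XDG_CONFIG_HOME")
    if config_home:
        base = Path(config_home)
    else:
        home = env.get("HOME") or str(Path.home())
        base = Path(home) / ".config"
    return base / "gscan2pdfrc"


print(gscan2pdfrc_path({"HOME": "/home/alice"}))
# /home/alice/.config/gscan2pdfrc
print(gscan2pdfrc_path({"HOME": "/home/alice", "XDG_CONFIG_HOME": "/tmp/cfg"}))
# /tmp/cfg/gscan2pdfrc
```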
INCOMPATIBILITIES
None known.
BUGS AND LIMITATIONS
Whilst it is possible to import PDFs, this feature is intended to allow
round-tripping of files created by gscan2pdf.
Download
gscan2pdf is available on Sourceforge.
Debian-based
If you are using Debian, you should find that sid already has the
latest version packaged.
If you are using an Ubuntu-based system, you can automatically keep up
to date with the latest version via the PPA:
"sudo apt-add-repository ppa:jeffreyratcliffe/ppa"
If you are using Synaptic, then use menu Edit/Reload Package
Information, search for gscan2pdf in the package list, and lo and
behold, you can install the nice shiny new version.
From the command line:
"sudo apt update"
"sudo apt install gscan2pdf"
From source
The source is hosted in the files section of the gscan2pdf project on
Sourceforge.
From the repository
gscan2pdf uses Git for its Revision Control System. You can browse the
tree at .
Git users can clone the complete tree with "git clone
git://git.code.sf.net/p/gscan2pdf/code"
Building gscan2pdf from source
Having downloaded the source either from a Sourceforge file release, or
from the Git repository, unpack it if necessary with:
"tar xvfz gscan2pdf-x.x.x.tar.gz"
"cd gscan2pdf-x.x.x"
"perl Makefile.PL" will create the Makefile.
"make test" should run several hundred tests to confirm that things
will work properly on your system.
You can install directly from the source with "make install", but
building the appropriate package for your distribution should be as
straightforward as "make debdist" or "make rpmdist". However, you will
additionally need the rpm, devscripts, fakeroot, debhelper and gettext
packages.
Dependencies
The list below looks daunting, but all packages are available from any
reasonably up-to-date distribution. If you are using Synaptic, having
installed gscan2pdf, locate the gscan2pdf entry in Synaptic, right-click
it and you can install them under Recommends. Note also that the
library names given below are the Debian/Ubuntu ones. Those
distributions using RPM typically use perl(module) where Debian has
libmodule-perl.
Required
libgtk3-perl (>= 0.028)
There is a bug in versions of libgtk3-perl before 0.028 that
causes gscan2pdf to crash when saving. Whilst I could prevent
gscan2pdf from crashing, it would still be impossible to save
anything, rendering gscan2pdf rather useless.
libgtk3-simplelist-perl
A simple interface to Gtk3's complex MVC list widget
liblocale-gettext-perl (>= 1.05)
Using libc functions for internationalisation in Perl
libpdf-builder-perl
Provides the functions for creating PDF documents in Perl
libsane
API library for scanners
libimage-sane-perl
Perl bindings for libsane.
libset-intspan-perl
Manages sets of integers
libtiff-tools
TIFF manipulation and conversion tools
Imagemagick
Image manipulation programs
perlmagick
A Perl interface to the libMagick graphics routines
sane-utils
API library for scanners -- utilities.
Optional
sane
Scanner graphical frontends. Only required for the scanadf
frontend.
unpaper
Post-processing tool for scanned pages.
xdg-utils
Desktop integration utilities from freedesktop.org. Required
for Email as PDF.
djvulibre-bin
Utilities for the DjVu image format.
gocr
A command-line OCR program.
tesseract
A command-line OCR program.
cuneiform
A command-line OCR program.
Support
There are two mailing lists for gscan2pdf:
gscan2pdf-announce
A low-traffic list for announcements, mostly of new releases. You
can subscribe at
gscan2pdf-help
General support, questions, etc. You can subscribe at
Reporting bugs
Before reporting bugs, please read the "FAQs" section.
Please report any bugs found, preferably against the Debian
package[1][2]. You do not need to be a Debian user, or set up an
account to do this. The Debian tool "reportbug" provides a convenient
GUI for doing so.
1. https://packages.debian.org/sid/gscan2pdf
2. https://www.debian.org/Bugs/
Alternatively, there is a bug tracker for the gscan2pdf project on
Sourceforge.
Please include the log file created by "gscan2pdf --log=log" with any
new bug report.
Translations
gscan2pdf has already been partly translated into several languages.
If you would like to contribute to an existing or new translation,
please check out Rosetta, Launchpad's translation platform.
Note that the translations for the scanner options are taken directly
from sane-backends. If you would like to contribute to these, you can
contact the sane-devel mailing list (sane-devel@lists.alioth.debian.org)
and have a look at the po/ directory in the sane-backends source code.
Alternatively, Ubuntu has its own translation project. For the 9.04
release, the translations are available at
If you have updated a ".po" file in the "po" directory of the
gscan2pdf source tree and would like to test it, pick a test directory
for the compiled locales, e.g. "./locale", and create the ".mo" files
with:
"perl Makefile.PL LOCALEDIR=./locale"
If the updated locale is your standard one, then the following will
find the updated file:
"perl -I lib bin/gscan2pdf --log=log --locale=locale"
If it is not your standard locale, you will need something like (for
Russian):
"LC_ALL=ru_RU.utf8 LC_MESSAGES=ru_RU.utf8 LC_CTYPE=ru_RU.utf8
LANG=ru_RU.utf8 LANGUAGE=ru_RU.utf8 perl -I lib bin/gscan2pdf --log=log
--locale=locale"
or German:
"LC_ALL=de_DE LC_MESSAGES=de_DE LC_CTYPE=de_DE LANG=de_DE
LANGUAGE=de_DE perl -I lib bin/gscan2pdf --log=log --locale=locale"
If the above doesn't work, make sure that the locale is in the list
produced by "locale -a", including any ".utf8" suffix. If necessary,
generate new locales with "sudo dpkg-reconfigure locales".
DESCRIPTION
File
New
Clears the page list.
Open
Opens any format that imagemagick supports. PDFs will have their
embedded images extracted and imported one per page.
Note that files can also be imported by dragging them into the
thumbnail list from a program like nautilus or konqueror.
Scan
Sets options before scanning via SANE.
Device
Chooses between available scanners.
# Pages
Selects the number of pages to scan, or all pages.
Source document
Selects between single-sided and double-sided pages.
This affects the page numbering. Single-sided scans are numbered
consecutively. In double-sided scans, the page number is incremented
(or decremented, see below) by 2, e.g. 1, 3, 5, etc.
Side to scan
If double sided is selected above, assuming a non-duplex scanner, i.e.
a scanner that cannot automatically scan both sides of a page, this
determines whether the page number is incremented or decremented by 2.
To scan both sides of three pages, i.e. 6 sides:
1. Select:
# Pages = 3 (or "all" if your scanner can detect when it is out of
paper)
Double sided
Facing side
2. Scans sides 1, 3 & 5.
3. Put the pile back in the scanner, ready to scan the back of the last page.
4. Select:
# Pages = 3 (or "all" if your scanner can detect when it is out of
paper)
Double sided
Reverse side
5. Scans sides 6, 4 & 2.
6. gscan2pdf automatically sorts the pages so that they appear in the
correct order.
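The numbering in the example above can be sketched in Python as follows. This is an illustration, not gscan2pdf's own code: the helper names are hypothetical, and the sketch assumes, as in step 3, that the reverse sides are scanned starting with the back of the last sheet.

```python
def facing_side_numbers(n_sheets, first_page=1):
    """Facing sides are numbered first_page, first_page + 2, ..."""
    return [first_page + 2 * i for i in range(n_sheets)]


def reverse_side_numbers(n_sheets, first_page=1):
    """Reverse sides start with the back of the last sheet, so numbering
    starts at the highest page and is decremented by 2."""
    last_page = first_page + 2 * n_sheets - 1
    return [last_page - 2 * i for i in range(n_sheets)]


def collate(pages):
    """Sort (page_number, image) pairs into reading order."""
    return sorted(pages, key=lambda page: page[0])


fronts = zip(facing_side_numbers(3), ["front A", "front B", "front C"])
backs = zip(reverse_side_numbers(3), ["back C", "back B", "back A"])
print([image for _, image in collate([*fronts, *backs])])
# ['front A', 'back A', 'front B', 'back B', 'front C', 'back C']
```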
Device-dependent options
These, naturally, depend on your scanner. They can include
Page size.
Mode (colour/black & white/greyscale)
Resolution (in PPI)
Batch-scan
Guarantees that a "no documents" condition will be returned after
the last scanned page, to prevent endless flatbed scans after a
batch scan.
Wait-for-button/Button-wait
After sending the scan command, wait until the button on the
scanner is pressed before actually starting the scan process.
Source
Selects the document source. Possible options can include Flatbed
or ADF. On some scanners, this is the only way of generating an
out-of-documents signal.
Save
Saves the selected or all pages as a PDF, DjVu, TIFF, PNG, JPEG, PNM or
GIF.
Metadata
Metadata are not visible when viewing the PDF/DjVu, but are embedded
in the file, and so are searchable and can be examined, typically with
the "Properties" option of the document viewer.
The metadata are completely optional, but can also be used to generate
the filename; see Preferences for details.
The date can be selected with use of the calendar widget. The displayed
date can be incremented or decremented with use of the '+' and '-'
keys.
DjVu
DjVu produces better compression than PDF for both black and white,
and colour images.
Email as PDF
Attaches the selected or all pages as a PDF to a blank email. This
requires xdg-email, which is in the xdg-utils package. If this is not
present, the option is ghosted out.
Print
Prints the selected or all pages.
Compress temporary files
If your temporary ($TMPDIR) directory is getting full, this function
can be useful: it compresses all images as LZW-compressed TIFFs. These
require much less space than the PNM files that are typically produced
by SANE or by importing a PDF.
Edit
Delete
Deletes the selected page.
Renumber
Renumbers the pages from 1..n.
Note that the page order can also be changed by drag and drop in the
thumbnail view.
Select
The select menus can be used to select all, even, odd, blank, dark or
modified pages. Selecting blank or dark pages runs imagemagick to make
the decision. Selecting modified pages selects those which have been
modified by threshold, unsharp mask, etc. since the last OCR run.
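The exact imagemagick-based test is not described here. Purely as an illustration, a blank or dark test could look like the Python sketch below, which assumes 8-bit grey levels (0 = black, 255 = white), treats a page as blank when its grey levels barely vary and as dark when their average is low. The function names and threshold values are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical thresholds on an 8-bit grey scale (0 = black, 255 = white).
BLANK_MAX_STDEV = 8.0
DARK_MAX_MEAN = 64.0


def is_blank(pixels, max_stdev=BLANK_MAX_STDEV):
    """Treat a page as blank if its grey levels hardly vary."""
    return pstdev(pixels) <= max_stdev


def is_dark(pixels, max_mean=DARK_MAX_MEAN):
    """Treat a page as dark if its average grey level is low."""
    return mean(pixels) <= max_mean


white_page = [255] * 1000
text_page = [255] * 900 + [0] * 100  # 10% black "ink"
print(is_blank(white_page), is_blank(text_page))  # True False
print(is_dark(text_page), is_dark([20] * 1000))   # False True
```

Note that under this heuristic a uniformly black page counts as both blank and dark.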
Properties
When an image is scanned, gscan2pdf attempts to extract the resolution
from the scan options. This nearly always works without problem.
Importing an image can be trickier, however. Some image formats such as
PNM do not encode metadata for resolution. In other cases, the data is
incorrect. Edit/Properties allows the user to manually correct the
metadata for a particular page, thus correcting the size of the final
PDF or DjVu. The image itself is otherwise not changed; it is not
down- or upscaled.
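The resolution matters because the physical page size is the pixel size divided by the resolution. The Python sketch below (page_size is a hypothetical helper) works through an A4 page scanned at 300 PPI, i.e. 2480 x 3508 pixels, and shows what happens if the same image is wrongly tagged as 72 PPI. PDF measures pages in points, 72 to the inch.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # PDF user-space unit


def page_size(width_px, height_px, ppi):
    """Physical page size implied by pixel dimensions and resolution."""
    width_in = width_px / ppi
    height_in = height_px / ppi
    return {
        "inches": (width_in, height_in),
        "mm": (width_in * MM_PER_INCH, height_in * MM_PER_INCH),
        "points": (width_in * POINTS_PER_INCH, height_in * POINTS_PER_INCH),
    }


size = page_size(2480, 3508, 300)
print([round(v, 1) for v in size["mm"]])      # [210.0, 297.0]
print([round(v, 1) for v in size["points"]])  # [595.2, 841.9]
# The same pixels wrongly tagged as 72 PPI give a page over four times too large:
print([round(v, 1) for v in page_size(2480, 3508, 72)["mm"]])  # [874.9, 1237.5]
```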
Preferences
The preferences menu item allows the control of the default behaviour
of various functions. Most of these are self-explanatory.
Frontends
gscan2pdf initially supported two frontends, scanimage and scanadf.
scanadf support was added when it was realised that scanadf works
better than scanimage with some scanners. On Debian-based systems,
scanadf is in the sane package, not, like scanimage, in sane-utils. If
scanadf is not present, the option is obviously ghosted out.
In 0.9.27, Perl bindings for SANE were introduced. These are called
libsane-perl.
Before 1.2.0, options available through CLI frontends like scanimage
were made visible as users asked for them. In 1.2.0, all options can be
shown or hidden via Edit/Preferences, along with the ability to specify
which options trigger a reload.
In 1.8.3, new Perl bindings for SANE were introduced. These are called
libimage-sane-perl and are the preferred frontend.
In 1.8.5, support for libsane-perl was removed.
Device blacklist
Ignore listed devices.
Note that this is a device name regular expression, e.g. /dev/video,
and not the name as listed in the scan window, e.g. Noname
Integrated_Webcam_HD.
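A minimal Python sketch of how such a blacklist could be applied, assuming each entry is matched anywhere within the SANE device name (an unanchored search, as with re.search). The helper filter_devices and the device names are illustrative only.

```python
import re


def filter_devices(device_names, blacklist):
    """Return the device names that match none of the blacklist patterns."""
    patterns = [re.compile(p) for p in blacklist]
    return [name for name in device_names
            if not any(p.search(name) for p in patterns)]


devices = ["v4l:/dev/video0", "genesys:libusb:001:004"]
print(filter_devices(devices, [r"/dev/video"]))  # ['genesys:libusb:001:004']
```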
Default filename for PDF or DjVu files
All strftime codes (e.g. %Y for the current year) are available as
variables, with the following additions:
%Da author
%De filename extension
%Dt title
All document date codes use strftime codes with a leading D, e.g.:
%DY document year
%Dm document month
%Dd document day
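How these codes might combine can be sketched in Python. This is an illustration under stated assumptions, not gscan2pdf's implementation: it expands the %D codes first (author, extension, title, then any other %D<code> as strftime on the document date), escapes any "%" in the inserted values, and finally applies strftime with the current date. expand_filename is a hypothetical name.

```python
import re
from datetime import date, datetime


def expand_filename(template, author, title, extension, doc_date, now=None):
    """Expand %Da, %De, %Dt, %D<strftime code> and plain strftime codes."""
    now = now or datetime.now()
    fixed = {"a": author, "e": extension, "t": title}

    def document_code(match):
        code = match.group(1)
        value = fixed[code] if code in fixed else doc_date.strftime("%" + code)
        # Escape '%' so the second (strftime) pass leaves the value untouched.
        return value.replace("%", "%%")

    # Pass 1: document codes (%D followed by one letter).
    expanded = re.sub(r"%D([A-Za-z])", document_code, template)
    # Pass 2: the remaining strftime codes refer to the current date.
    return now.strftime(expanded)


print(expand_filename("%Y%m%d %Dt by %Da (%DY-%Dm-%Dd).%De",
                      author="Alice", title="Invoice 100%", extension="pdf",
                      doc_date=date(2024, 8, 27), now=datetime(2025, 1, 2)))
# 20250102 Invoice 100% by Alice (2024-08-27).pdf
```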
View
Zoom 100%
Zooms to 1:1. How this appears depends on the desktop resolution.
Zoom to fit
Scales the view such that the whole page is visible.
Zoom in
Zoom out
Rotate 90 clockwise
The rotate options require the package imagemagick and, if this is not
present, are ghosted out.
Rotate 180
Rotate 90 anticlockwise
Tools
Threshold
Changes all pixels darker than the given value to black; all others
become white.
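Per pixel, the operation is a single comparison, as in this Python sketch. It assumes 8-bit grey levels (0 = black, 255 = white) with the threshold given on the same scale; the scale offered by the gscan2pdf dialog may differ, and the function name is hypothetical.

```python
def threshold(pixels, level):
    """Binarise grey levels: darker than `level` -> black (0), else white (255)."""
    return [0 if p < level else 255 for p in pixels]


print(threshold([10, 127, 128, 250], 128))  # [0, 0, 255, 255]
```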
Unsharp mask
The unsharp option sharpens an image. The image is convolved with a
Gaussian operator of the given radius and standard deviation (sigma).
For reasonable results, radius should be larger than sigma. Use a
radius of 0 to have the method select a suitable radius.
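The principle can be shown in one dimension with the Python sketch below. It is not imagemagick's implementation, which works on two-dimensional images and takes further parameters: it blurs a signal with a normalised Gaussian kernel of the given radius and sigma, then adds back the difference between the original and the blurred signal, scaled by a hypothetical strength factor, amount. Picking a radius of about 3 sigma when the radius is 0 is also an assumption of the sketch.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalised 1-D Gaussian weights for offsets -radius..radius."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if radius == 0:
        # Assumption: cover about three standard deviations either side.
        radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def unsharp_1d(signal, radius, sigma, amount=1.0):
    """Sharpen: original + amount * (original - Gaussian-blurred original)."""
    kernel = gaussian_kernel(radius, sigma)
    r = len(kernel) // 2
    n = len(signal)
    blurred = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)  # repeat edge values at the borders
            acc += w * signal[j]
        blurred.append(acc)
    # A real image would additionally be clamped to the 0..255 range.
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]


edge = [0, 0, 0, 0, 255, 255, 255, 255]
sharpened = unsharp_1d(edge, radius=2, sigma=1.0)
# sharpened[3] < 0 and sharpened[4] > 255: the step is exaggerated either side.
```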
Crop
unpaper
unpaper is a utility
for cleaning up a scan.
OCR (Optical Character Recognition)
The gocr, tesseract or cuneiform utilities are used to produce text
from an image.
Each page has an OCR output buffer, which is embedded as plain text
behind the scanned image in the PDF produced. This way, Beagle can
index (i.e. search) the plain text.
In DjVu files, the OCR output buffer is embedded in the hidden text
layer. Thus these can also be indexed by Beagle.
There is an interesting review of OCR software at
.
An important conclusion was that 400ppi is necessary for decent
results.
Up to v2.04, the only way to tell which languages were available to
tesseract was to look for the language files. Therefore, gscan2pdf
checks the path returned by:
"tesseract '' '' -l ''"
If there are no language files in the above location, then gscan2pdf
assumes that tesseract v1.0 is installed, which had no language files.
Variables for user-defined tools
The following variables are available:
%i input filename
%o output filename
%r resolution
An image can be modified in-place by just specifying %i.
FAQs
Why isn't option xyz available in the scan window?
Possibly because SANE or your scanner doesn't support it.
If an option listed in the output of "scanimage --help" that you would
like to use isn't available, send me the output and I will look at
implementing it.
I've only got an old flatbed scanner with no automatic sheetfeeder. How do
I scan a multipage document?
In Edit/Preferences, tick the box "Allow batch scanning from flatbed".
Some Brother scanners report "out of documents", despite scanning from
flatbed. This can be worked around by ticking the box "Force new scan
job between pages".
If you are lucky, you have an option like Wait-for-button or Button-
wait, where the scanner will wait for you to press the scan button on
the device before it starts the scan, allowing you to scan multiple
pages without touching the computer.
If you are quick, you might be able to change the document on the
flatbed whilst the scan head is returning.
Otherwise, you have to set the number of pages to scan to 1 and hit the
scan button on the scan window for each page.
Why is option xyz ghosted out?
Probably because the package required for that option is not installed.
Email as PDF requires xdg-email (xdg-utils), unpaper requires the
unpaper package, and the rotate options require imagemagick.
Why can I not scan from the flatbed of my HP scanner?
Generally for HP scanners with an ADF, to scan from the flatbed, you
should set "# Pages" to "1", and possibly "Batch scan" to "No".
When I update gscan2pdf using the Update Manager in Ubuntu, why is the list
of changes never displayed?
As far as I can tell, this is pulled from changelogs.ubuntu.com, and
therefore only the changelogs from official Ubuntu builds are
displayed.
Why can gscan2pdf not find my scanner?
If your scanner is not connected directly to the machine on which you
are running gscan2pdf and you have not installed the SANE daemon,
saned, gscan2pdf cannot automatically find it. In this case, you can
specify the scanner device on the command line:
"gscan2pdf --device=device"
How can I search for text in the OCR layer of the finished PDF or DJVU
file?
pdftotext or djvutxt can extract the text layer from PDF or DJVU files.
See the respective man pages for details.
Having opened a PDF or DJVU file in evince or Acrobat Reader, the
search function will typically find the page with the requested text
and highlight it.
There are various tools for searching or indexing files, including PDF
and DJVU:
o (meta) Tracker
o plone
o pdfgrep
o swish-e
o recoll
o terrier
How can I change the colour of the selection box in the image viewer?
Create a file called "~/.config/gtk-3.0/gtk.css" with the following
content:
.rubberband,
rubberband,
flowbox rubberband,
treeview.view rubberband,
.content-view rubberband,
.content-view .rubberband {
border: 1px solid #2a76c6;
background-color: rgba(42, 118, 198, 0.2); }
How can I change the colour of the OCR output?
Create a file called "~/.config/gtk-3.0/gtk.css" with the following
content:
#gscan2pdf-ocr-output {
color: black;
}
See Also
XSane
Scan Tailor
Author
Jeffrey Ratcliffe (jffry at posteo dot net)
Thanks to
o all the people who have sent patches, translations, bugs and
feedback.
o the gtk+ project for a most excellent graphics toolkit.
o the Gtk3-Perl project for their superb Perl bindings for GTK3.
o The SANE project for scanner access
o Bjorn Lindqvist for the gtkimageview widget
o Sourceforge for hosting the project.
LICENSE AND COPYRIGHT
Copyright (C) 2006--2024 Jeffrey Ratcliffe
This program is free software: you can redistribute it and/or modify it
under the terms of the version 3 GNU General Public License as
published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program. If not, see .
perl v5.38.2 2024-08-27 GSCAN2PDF(1)