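To make the htmldoc PDF generator support Cyrillic fonts, perform the following three steps: replace the htmldoc fonts with Cyrillic ones, install a decoding script, and configure the PDF Generator in Parallels Business Automation - Standard.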
Replacing the htmldoc Fonts with Cyrillic Fonts
A complete set of Cyrillic fonts for htmldoc is available on the Internet, for example:
http://fonts.kolodka.com/htmldoc.cyr.fonts-0.1.tar.bz2
The archive is based on GPL-licensed Cyrillic fonts. Its developer converted them from PFB to PFA format (pfb2pfa), renamed the fonts according to htmldoc requirements, and changed their FontName, FullName, and FamilyName attributes.
Note that these fonts are rather large, about 250 KB each. Because htmldoc currently embeds fonts in PDF files far from optimally, expect each resulting PDF file to be at least 1 MB. If this is too large, you can significantly reduce the PDF size by processing the htmldoc output with Ghostscript. See the Readme file inside the package for tips.
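As a rough sketch of the Ghostscript post-processing (not the procedure from the package Readme, which is not reproduced here), the Perl snippet below rewrites an htmldoc-generated PDF with Ghostscript's pdfwrite device, which subsets embedded fonts by default. The script name shrink_pdf.pl, the helper gs_command(), the /ebook preset (which also downsamples images), and the assumption that `gs` is on the PATH are all illustrative choices; compare file sizes and output quality before adopting them.

```perl
#!/usr/bin/perl
# Hypothetical sketch: shrink an htmldoc-generated PDF by rewriting it
# with Ghostscript's pdfwrite device (assumes `gs` is on the PATH).
use strict;
use warnings;

# Build the Ghostscript argument list that rewrites $in as $out.
# /ebook is an assumed quality preset; adjust as needed.
sub gs_command {
    my ($in, $out) = @_;
    return ('gs', '-sDEVICE=pdfwrite', '-dPDFSETTINGS=/ebook',
            '-dNOPAUSE', '-dBATCH', '-dQUIET', "-sOutputFile=$out", $in);
}

# Run as: perl shrink_pdf.pl big.pdf small.pdf
if (@ARGV == 2) {
    my ($in, $out) = @ARGV;
    # List form of system() passes arguments without a shell
    system(gs_command($in, $out)) == 0
        or die "Ghostscript failed (status $?)\n";
    printf "%s: %d bytes -> %s: %d bytes\n", $in, -s $in, $out, -s $out;
}
```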
To install the fonts, decompress and unpack the archive; its contents are extracted into the fonts/ directory. Then overwrite the original htmldoc fonts with the extracted ones. By default, the htmldoc fonts are located in the /usr/share/htmldoc/fonts/ directory.
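The installation can also be scripted. The Perl sketch below assumes a tar that supports bzip2 via `-j`, that the archive unpacks into ./fonts/ as described above, and that htmldoc uses the default /usr/share/htmldoc/fonts/ directory. The script name install_fonts.pl and the helper install_fonts() are hypothetical, and the script needs write access to the htmldoc fonts directory.

```perl
#!/usr/bin/perl
# Hypothetical sketch: copy the extracted Cyrillic fonts over the stock htmldoc fonts.
use strict;
use warnings;
use File::Copy qw(copy);

# Copy every regular file from $src_dir into $dest_dir, overwriting
# files with the same name; returns the number of files copied.
sub install_fonts {
    my ($src_dir, $dest_dir) = @_;
    opendir(my $dh, $src_dir) or die "Cannot open $src_dir: $!\n";
    # Skip "." , ".." and any subdirectories
    my @files = grep { -f "$src_dir/$_" } readdir($dh);
    closedir($dh);
    for my $name (@files) {
        copy("$src_dir/$name", "$dest_dir/$name")
            or die "Cannot copy $name: $!\n";
    }
    return scalar @files;
}

# Run as: perl install_fonts.pl htmldoc.cyr.fonts-0.1.tar.bz2
if (@ARGV == 1) {
    # Unpack the archive; it extracts into ./fonts/
    system('tar', '-xjf', $ARGV[0]) == 0 or die "tar failed (status $?)\n";
    my $count = install_fonts('fonts', '/usr/share/htmldoc/fonts');
    print "Installed $count font files\n";
}
```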
Example of a Decoding Script
Parallels Business Automation - Standard provides HTML content encoded in UTF-8, in which every character with a code greater than 127 is replaced with an &#number; HTML entity, where number is the character's decimal Unicode code point; for example, ñ is written as &#241;.
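To illustrate the conversion, the sketch below applies the same three steps as the decoding script to a single string: decode UTF-8, replace &#number; entities with characters, and encode the result as Windows-1251 (cp1251). The helper name entities_to_cp1251() is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: UTF-8 HTML with &#number; entities -> cp1251 bytes
sub entities_to_cp1251 {
    my ($utf8_bytes) = @_;
    # Decode the UTF-8 input into a Perl character string
    my $text = decode('UTF-8', $utf8_bytes);
    # Replace each decimal numeric entity with the character it denotes
    $text =~ s/&#(\d+);/chr($1)/ge;
    # Encode as cp1251; characters outside cp1251 become '?'
    return encode('cp1251', $text);
}

# The Russian word "Привет" written as numeric HTML entities
my $html = '<p>&#1055;&#1088;&#1080;&#1074;&#1077;&#1090;</p>';
my $out  = entities_to_cp1251($html);
print unpack('H*', $out), "\n";   # prints 3c703ecff0e8e2e5f23c2f703e
```

Here the Cyrillic letters become the cp1251 bytes CF F0 E8 E2 E5 F2, while the ASCII markup passes through unchanged. Characters that cp1251 cannot represent, such as ñ (&#241;), are replaced with a question mark, so text mixing Cyrillic with other accented Latin letters does not convert losslessly.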
Below is an example script, topdf.pl, that converts Parallels Business Automation - Standard data into a format suitable for the htmldoc utility, then calls htmldoc to create a PDF file that uses the Cyrillic fonts. Put the script into a directory where Apache can read and execute it.
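```perl
#!/usr/bin/perl
# Convert source HTML files to cp1251 encoding and render them to PDF
# Usage: perl topdf.pl result_filename [source html files]
use strict;
use warnings;
use Encode qw(decode encode);

my $target = shift @ARGV
    or die "Usage: $0 result_filename [source html files]\n";

foreach my $file (@ARGV) {
    # Decode the UTF-8 bytes into a Perl character string
    my $text = decode('UTF-8', load_file($file));
    # Replace &#number; HTML entities with the characters they denote
    $text =~ s/&#(\d+);/chr($1)/ge;
    # Encode as cp1251 and overwrite the source file in place
    save_file($file, encode('cp1251', $text));
}

# Call htmldoc; the list form of system() avoids shell quoting problems
system('/usr/bin/htmldoc', '--webpage', '--embedfonts',
       '--charset', 'cp-1251', '-f', $target, @ARGV) == 0
    or warn "htmldoc exited with status $?\n";

# Read a whole file as raw bytes
sub load_file {
    my ($file) = @_;
    open(my $fh, '<:raw', $file) or die "Cannot read $file: $!\n";
    local $/;    # slurp mode: read the entire file at once
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Write raw bytes to a file, replacing its contents
sub save_file {
    my ($file, $text) = @_;
    open(my $fh, '>:raw', $file) or die "Cannot write $file: $!\n";
    print {$fh} $text;
    close($fh) or die "Cannot write $file: $!\n";
}
```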
Configuring PDF Generator in Parallels Business Automation - Standard
Finally, specify the path to the decoding script in the PDF Generator command line template.
Log in to the Provider Control Center and go to Configuration Director > Miscellaneous Settings > PDF Generator Setup. Click the Edit button and enter the command template into the PDF Generator Template field. For example, if you have put the decoding script into the /var/opt/hspc-root/ directory, enter the following:
perl /var/opt/hspc-root/topdf.pl %target_file% %source_files%
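For illustration only, the sketch below shows how such a template presumably turns into the command that runs the decoding script. It assumes that %target_file% is replaced with the output PDF path and %source_files% with a space-separated list of input HTML files, which matches the script's usage line; the substitution actually performed by Parallels Business Automation - Standard may differ, and expand_template() and the file paths are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: fill in the PDF Generator template placeholders
sub expand_template {
    my ($template, $target, @sources) = @_;
    my $cmd = $template;
    # Assumed: %target_file% is the output PDF path
    $cmd =~ s/%target_file%/$target/g;
    # Assumed: %source_files% is a space-separated list of input HTML files
    my $src = join(' ', @sources);
    $cmd =~ s/%source_files%/$src/g;
    return $cmd;
}

my $template = 'perl /var/opt/hspc-root/topdf.pl %target_file% %source_files%';
print expand_template($template, '/tmp/invoice.pdf',
                      '/tmp/page1.html', '/tmp/page2.html'), "\n";
# prints: perl /var/opt/hspc-root/topdf.pl /tmp/invoice.pdf /tmp/page1.html /tmp/page2.html
```

Under this assumption, topdf.pl receives the PDF path as its first argument and the HTML files after it, which is exactly the order its usage line expects.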