Sunday 30 September 2018

Outputting HTML Content to PDF in C#

If you type "C# PDF Library" into Google, you will find there are lots of different offerings out there.

My search turned up IronPDF, which costs $399. There is also EO.Pdf at $749 and ABCpdf at $329. And what about ExpertPDF HtmlToPdf Converter, which is $550?

But what about free?

Well, after a bit more interweb searching, I came across HTML Renderer for PDF using PdfSharp. I have used HTML Renderer in previous projects, such as my own DataGridViewHTMLCell project, so naturally it was my first port of call.

But when testing out this library I hit a snag...

You see, I want to be able to create bookmarks in the PDF as well as force page breaks from the HTML and, unfortunately, during my testing I was not able to achieve this using "HTML Renderer for PDF". It seems that if all you want to do is throw some HTML together and convert it to PDF, and you don't care about finer formatting such as page breaks and bookmarks, then HTML Renderer will be perfect for you, especially as it's lightweight, performant and 100% managed code.

And so, my search continued!

I guess at this point I should back up and explain the requirements/features I am looking for in my PDF library:

  • The ability to output to PDF (which is err.. self-explanatory!)
  • More importantly, the ability to output HTML to PDF
  • The ability for the library to be used in a ClickOnce app
  • The ability to create bookmarks in PDF Documents (This is important)
  • The ability to force page breaks in HTML (This is important)
  • The ability to set page orientation (landscape, portrait)
  • The ability to set page size (A4)

Now, before continuing, I should say HTML Renderer is built on PdfSharp, which looks very good and in theory should allow me to create bookmarks and page breaks, but I could not get HTML Renderer to work for me. I didn't spend too much time testing, however, and your mileage may vary.

OpenHtmlToPDF

Now, if you do enough digging eventually you will come across wkhtmltopdf. It's an "open source command line tool to render HTML into PDF using the Qt WebKit rendering engine" and it seems a lot of PDF libraries just sit on top of this library and leverage its functionality.

There are a number of libraries that make use of wkhtmltopdf, such as NRecoPDF, but what I liked about OpenHtmlToPDF is that the wkhtmltopdf.exe is held inside the OpenHtmlToPDF DLL as an embedded resource. This means that when it comes to apps that use ClickOnce deployment, everything just works, as you don't have to manually specify the .exe as part of your deployment.

So, what does the code look like? Well, first install OpenHtmlToPDF via NuGet and then run this simple example:

using OpenHtmlToPdf;

// Converts an HTML string to PDF and writes the bytes to the given file.
private void TestOpenHtmlToPDF(string html, string fileName)
{
    var pdfBytes = Pdf.From(html).Content();
    System.IO.File.WriteAllBytes(fileName, pdfBytes);
}

Page Breaks

Like I mentioned, one of my requirements was to be able to programmatically force a page break from HTML. Fortunately, it wasn't long before I stumbled across this StackOverflow post that helped me here.

<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
    <title>Paginated HTML</title>
    <style type="text/css" media="print">
        div.page {
            page-break-after: always;
            page-break-inside: avoid;
        }
    </style>
</head>
<body>
    <div class="page">
        <h1>This is Page 1</h1>
        <p>some text</p>
    </div>

    <div class="page">
        <h1>This is Page 2</h1>
        <p>some more text</p>
    </div>

    <div class="page">
        <h1>This is Page 3</h1>
        <p>some more text</p>
    </div>
</body>
</html>

Combine the above HTML template with the following code snippet, which sets the wkhtmltopdf setting "web.printMediaType" to true so that the print stylesheet is applied:

private void TestOpenHtmlToPDF(string html, string fileName)
{
    var pdfBytes = Pdf.From(html)
                      .WithObjectSetting("web.printMediaType", "true")
                      .Content();

    System.IO.File.WriteAllBytes(fileName, pdfBytes);
}

OpenHtmlToPDF produces the following PDF file. Notice that the bookmarks are automatically generated from HTML <H1> tags and that the content is separated across pages. Exactly what I was looking for.
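
If the pages come from data rather than a hand-written file, the same template can be generated in code. The following is only a minimal sketch: the class name PaginatedHtml and its Build method are hypothetical helpers of mine, not part of OpenHtmlToPDF, and it assumes each page body is already a valid HTML fragment (only the titles are HTML-encoded). It uses C# 7 tuples, so older .NET Framework targets may need the System.ValueTuple package.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper: builds the paginated template shown above from (title, body) pairs.
public static class PaginatedHtml
{
    public static string Build(string documentTitle, IEnumerable<(string Title, string Body)> pages)
    {
        var sb = new StringBuilder();
        sb.AppendLine("<!DOCTYPE html>");
        sb.AppendLine("<html>");
        sb.AppendLine("<head>");
        sb.AppendLine("    <meta http-equiv=\"content-type\" content=\"text/html;charset=UTF-8\" />");
        sb.AppendLine("    <title>" + WebUtility.HtmlEncode(documentTitle) + "</title>");
        // Same print-only rule as the template: break after each page div.
        sb.AppendLine("    <style type=\"text/css\" media=\"print\">");
        sb.AppendLine("        div.page { page-break-after: always; page-break-inside: avoid; }");
        sb.AppendLine("    </style>");
        sb.AppendLine("</head>");
        sb.AppendLine("<body>");

        foreach (var page in pages)
        {
            sb.AppendLine("    <div class=\"page\">");
            // The <h1> is what ends up as the bookmark text, so encode it.
            sb.AppendLine("        <h1>" + WebUtility.HtmlEncode(page.Title) + "</h1>");
            // Assumption: the body is trusted HTML and is inserted as-is.
            sb.AppendLine("        " + page.Body);
            sb.AppendLine("    </div>");
        }

        sb.AppendLine("</body>");
        sb.AppendLine("</html>");
        return sb.ToString();
    }
}
```

The resulting string can then be passed straight to the TestOpenHtmlToPDF method above, for example with new[] { ("This is Page 1", "<p>some text</p>"), ("This is Page 2", "<p>some more text</p>") } as the pages.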

In fact, you can access any of the underlying wkhtmltopdf settings using code like the following:

private void TestOpenHtmlToPDF(string html, string fileName)
{
    var pdfBytes = Pdf.From(html)
                      .WithGlobalSetting("size.pageSize", "A4")          // paper size
                      .WithObjectSetting("web.printMediaType", "true")   // apply the print stylesheet
                      .WithObjectSetting("footer.line", "true")          // draw a line above the footer
                      .WithObjectSetting("footer.center", "me a footer!")
                      .WithObjectSetting("footer.left", "left text")
                      .WithObjectSetting("footer.right", "right text")
                      .WithMargins(0.5.Centimeters())
                      .Content();

    System.IO.File.WriteAllBytes(fileName, pdfBytes);
}

There are lots of settings which can be passed to wkhtmltopdf. You can find a listing of them here.
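
My requirements list also asked for page orientation and bookmarks, so here is a rough sketch of how those might be set through the same mechanism. It assumes the global setting keys "orientation", "outline" and "outlineDepth" behave as described in the wkhtmltopdf settings documentation; the class name LandscapePdfExample is hypothetical and I have not verified every value against every version of the library.

```csharp
using System.IO;
using OpenHtmlToPdf; // NuGet package: OpenHtmlToPdf

// Hypothetical example class; the setting keys are assumed from the wkhtmltopdf docs.
public static class LandscapePdfExample
{
    public static void Write(string html, string fileName)
    {
        var pdfBytes = Pdf.From(html)
                          .WithGlobalSetting("orientation", "Landscape")   // or "Portrait"
                          .WithGlobalSetting("outline", "true")            // generate PDF bookmarks
                          .WithGlobalSetting("outlineDepth", "2")          // how many heading levels to include
                          .WithObjectSetting("web.printMediaType", "true") // keep the page-break CSS active
                          .Content();

        File.WriteAllBytes(fileName, pdfBytes);
    }
}
```

Passing the settings as plain strings keeps the call close to the wkhtmltopdf documentation, at the cost of no compile-time checking, so a mistyped key may simply be ignored.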

Wrapping Up

All in all, if you want a bit more control over the HTML/PDF content you produce, then OpenHtmlToPDF and wkhtmltopdf might be the ticket. It's a thumbs up from me!

Contact Me:  ocean.airdrop@gmail.com
