How to Test PDF Content with Capybara

PDF documents can be challenging to test, and even more so when using your typical Rails testing tools. So what do you do when you have a great new feature that is almost entirely based on the dynamic generation of PDF documents? You want your tests to provide you with an assurance that your code works as expected and puts the right content into your PDFs. Have no fear! There is a simple way to make assertions on the text content inside a PDF.

A feature we built recently uses fillable PDFs populated with Pdftk, alongside PDF files that Prince generates from user data. Testing that the content is correct hinges on the test’s ability to read the PDF content. A simple way to do this is the pdftotext command, which ships with the Xpdf PDF viewer. The only major limitation is that you have to keep the content of your PDFs simple. pdftotext extracts plain text only, so complex layouts such as tables or multiple columns may not come out in the order you expect. In our case, keeping the content simple was easy to do.
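Outside of Capybara, the conversion step on its own looks roughly like this. It's a minimal sketch, assuming pdftotext (from Xpdf or Poppler) is on your PATH. pdftotext accepts - as the output file name and writes the text to standard output, so only the input needs a temporary file. The names pdf_to_text and PdfToTextError are hypothetical and are not part of Capybara or Xpdf.

```ruby
require 'open3'
require 'tempfile'

# Hypothetical error class: raised when pdftotext is missing or fails.
class PdfToTextError < StandardError; end

# Hypothetical helper: convert raw PDF bytes to plain text with pdftotext.
# Assumes the pdftotext binary is installed and on the PATH.
def pdf_to_text(pdf_bytes)
  Tempfile.create(['response', '.pdf']) do |pdf|
    pdf.binmode          # PDF data is binary
    pdf.write(pdf_bytes)
    pdf.flush            # make sure pdftotext sees the whole file

    # "-" as the output file sends the extracted text to stdout
    text, status = Open3.capture2('pdftotext', '-q', pdf.path, '-')
    raise PdfToTextError, "pdftotext exited with #{status.exitstatus}" unless status.success?

    text
  end
rescue Errno::ENOENT
  # Open3 raises ENOENT when the pdftotext executable cannot be found
  raise PdfToTextError, 'pdftotext is not installed or not on the PATH'
end
```

Raising a single error class for both a missing binary and an unreadable PDF keeps test failures easy to read. Without it, you would get an empty string and a confusing assertion failure further down.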

We added this helper method:

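require 'tempfile'

def pdf_response_contains(text)
  # pdftotext works on files, so save the response body to a temp file
  temp_pdf = Tempfile.new('pdf')
  temp_pdf.binmode # PDF data is binary
  if Capybara.current_driver == Capybara.javascript_driver
    # capybara-webkit wraps response.body in HTML; read the raw source instead
    temp_pdf << page.driver.source
  else
    # rack-test exposes the raw response body
    temp_pdf << page.driver.response.body
  end
  temp_pdf.close
  temp_txt = Tempfile.new('txt')
  temp_txt.close
  # Passing arguments separately avoids shell quoting problems with the paths
  system('pdftotext', '-q', temp_pdf.path, temp_txt.path) or raise 'pdftotext failed'
  body = File.read(temp_txt.path)
  # Pages are separated by form feeds; turn them into newlines
  body.gsub!("\f", "\n")
  # Note: text is interpolated into a regular expression
  body.should =~ /#{text}/
ensure
  # Clean up both temp files even if the assertion fails
  temp_pdf.unlink if temp_pdf
  temp_txt.unlink if temp_txt
end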
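One thing to watch in that helper is that text is interpolated straight into /#{text}/, so regex metacharacters take on their special meaning. Searching for "$42.00", for example, treats $ as an end-of-line anchor and will not match the literal price. Here's a minimal sketch of the post-processing step that matches a String literally and passes a Regexp through unchanged. The names normalize_pdf_text and pdf_text_matches? are hypothetical, not part of Capybara or RSpec.

```ruby
# Hypothetical helper: pdftotext separates pages with form feeds ("\f");
# map those to newlines so multi-page output reads as ordinary lines.
def normalize_pdf_text(raw)
  raw.tr("\f", "\n")
end

# Hypothetical helper: match a String literally (metacharacters escaped)
# or use a Regexp as-is when the caller really wants a pattern.
def pdf_text_matches?(raw, expected)
  pattern = expected.is_a?(Regexp) ? expected : Regexp.new(Regexp.escape(expected))
  normalize_pdf_text(raw).match?(pattern)
end

raw = "Invoice total: $42.00\fPage 2"
pdf_text_matches?(raw, '$42.00')  # => true, "$" is matched literally
pdf_text_matches?(raw, /Page \d/) # => true
```

Inside the helper, the last line could then become pdf_text_matches?(body, text).should be true. That keeps the option of passing a real pattern when you need one.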

There are a few caveats you may notice in that method:

pdftotext operates on files, so we need to squirrel away the response body in a temporary file.
We use capybara-webkit whenever possible, but we noticed that when we accessed the response body, capybara-webkit wrapped the content in basic HTML tags. A simple driver check lets us read the response content correctly under either capybara-webkit or rack-test.
If you are using capybara-webkit 0.11 or earlier, a null byte in your PDF will truncate the response that capybara-webkit provides. Newer versions include a patch that fixes this issue.
Finally, pdftotext marks page breaks with form feeds, which you will probably want to replace with newlines.
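Both the HTML wrapping and the null-byte truncation mean the bytes handed to pdftotext may not be a whole PDF. A cheap guard is to check for the "%PDF-" header that every PDF starts with and the "%%EOF" marker near the end of the file. A truncated response would typically lose the trailing marker, and an HTML-wrapped one would fail the header check. This is a heuristic sketch, not a full validator, and complete_pdf? is a hypothetical name.

```ruby
# Hypothetical heuristic: do these bytes look like a complete PDF?
# PDFs begin with a "%PDF-" header and end with an "%%EOF" marker.
def complete_pdf?(bytes)
  data = bytes.b # binary copy, so string encoding never gets in the way
  return false unless data.start_with?('%PDF-')

  # Search the last 1024 bytes, since trailing newlines often follow %%EOF
  tail_start = [data.bytesize - 1024, 0].max
  data.byteslice(tail_start, 1024).include?('%%EOF')
end

good = "%PDF-1.4\n\x00\x01 binary data\n%%EOF\n"
complete_pdf?(good)                     # => true
complete_pdf?(good.split("\x00").first) # => false, cut off at the null byte
```

Calling this on the response before writing the temp file turns a silent truncation into an explicit test failure.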

Now you can cover your PDF feature with complete end-to-end testing to ensure your PDF generation code is correctly integrated into your application.

Check out our other tips for application development:

Automate Away the Pain of Multiple Database.yml Files

Test Pilot-Rails Integration Testing Pattern

Properly Setting HTTP_REFERER in a Rails Integration Test For a File Upload

About Dan Ivovich