What is PDF/A? A Comprehensive Overview You Need to Know
In the world of digital document management, ensuring the longevity and accessibility of documents is very important. This is where PDF/A, a standardized version of the Portable Document Format (PDF), comes into play. PDF/A is designed specifically for archiving and long-term preservation of electronic documents.
This comprehensive overview will delve into what PDF/A is, its standards, and how to apply and validate these standards using the UniPDF library.
Understanding PDF/A:
PDF/A is a subset of the PDF format defined by the International Organization for Standardization (ISO). Unlike regular PDFs, PDF/A is tailored to ensure that documents can be reproduced exactly the same way in the future. This is achieved by embedding all necessary information within the file itself, such as fonts, color profiles, and other content.
PDF/A comes in several standards, each catering to different needs and levels of document preservation:
PDF/A-1: Based on PDF 1.4, suitable for basic document archiving.
PDF/A-2: Based on PDF 1.7, includes additional features like transparency and layers.
PDF/A-3: Allows embedding of non-PDF/A files within the PDF/A file, useful for including source documents or additional metadata.
Before You Begin
Before applying PDF/A standards, you need an API key from your UniCloud account. If this is your first time using the UniPDF SDK, follow this
Setting Up Your Project
First, clone the project repository that contains the necessary Go code examples:
git clone
cd unipdf-examples/pdfa
Configure your environment variables by replacing UNIDOC_LICENSE_API_KEY
with your API credentials from your UniCloud account.
For Linux/Mac:
export UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE
For Windows:
set UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE
Applying PDF/A-1 Standard
Let’s start by applying the PDF/A-1 standard to a PDF file. Below is
package main
import (
"fmt"
"log"
"os"
"time"
"github.com/unidoc/unipdf/v3/common/license"
"github.com/unidoc/unipdf/v3/model"
"github.com/unidoc/unipdf/v3/model/pdfa"
)
func init() {
err := license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`))
if err != nil {
panic(err)
}
}
func main() {
args := os.Args
if len(args) < 3 {
fmt.Printf("Usage: %s INPUT_PDF_PATH OUTPUT_PDF_PATH", os.Args[0])
return
}
inputPath := args[1]
outputPath := args[2]
start := time.Now()
reader, file, err := model.NewPdfReaderFromFile(inputPath, nil)
if err != nil {
log.Fatalf("Fail: %v\n", err)
}
defer file.Close()
pdfWriter, err := reader.ToWriter(nil)
if err != nil {
log.Fatalf("Fail: %v\n", err)
}
pdfWriter.ApplyStandard(pdfa.NewProfile1B(nil))
err = pdfWriter.WriteToFile(outputPath)
if err != nil {
log.Fatalf("Fail: %v\n", err)
}
duration := float64(time.Since(start)) / float64(time.Millisecond)
fmt.Printf("Processing time: %.2f ms\n", duration)
}
To run this code, use the following command:
run pdf_apply_standard.go <input.pdf> <output.pdf>
This code reads an input PDF, applies the PDF/A-1B standard, and writes the output to a new file. It also measures and prints the processing time.
Validating PDF/A-1 Standard
Validation ensures that a document meets the PDF/A standards. Here is how to validate a PDF/A-1 file:
package main
import (
"fmt"
"log"
"os"
"time"
"github.com/unidoc/unipdf/v3/common/license"
"github.com/unidoc/unipdf/v3/model"
"github.com/unidoc/unipdf/v3/model/pdfa"
)
func init() {
err := license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`))
if err != nil {
panic(err)
}
}
func main() {
args := os.Args
if len(args) < 2 {
fmt.Printf("Usage: %s INPUT_PDF_PATH", os.Args[0])
return
}
inputPath := args[1]
start := time.Now()
inputFile, err := os.Open(inputPath)
if err != nil {
log.Fatalf("Fail: %v\n", err)
}
defer inputFile.Close()
if err != nil {
log.Fatalf("Fail: %v\n", err)
}
standards := []model.StandardImplementer{
pdfa.NewProfile1A(nil),
pdfa.NewProfile1B(nil),
}
if err = standard.ValidateStandard(detailedReader);
err != nil {
fmt.Printf("Input document didn't pass the standard: %s - %v\n", standard.StandardName(), err)
}
}
duration := float64(time.Since(start)) / float64(time.Millisecond)
}
Run this validation with the command:
go run pdfa_validate_standard.go <input.pdf>
This script checks if the input PDF meets PDF/A-1A and PDF/A-1B standards and reports the validation status.
Applying PDF/A-2 Standard
Next, let’s apply the PDF/A-2 standard:
import (
"fmt"
"log"
"os"
"time"
"github.com/unidoc/unipdf/v3/common/license"
"github.com/unidoc/unipdf/v3/model"
"github.com/unidoc/unipdf/v3/model/pdfa"
)
func init() {
err := license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`))
if err != nil {
panic(err)
}
}
func main() {
args := os.Args
if len(args) < 3 {
fmt.Printf("Usage: %s INPUT_PDF_PATH OUTPUT_PDF_PATH", os.Args[0])
return
}
inputPath := args[1]
outputPath := args[2]
start := time.Now()
reader, file, err := model.NewPdfReaderFromFile(inputPath, nil)
log.Fatalf("Fail: %v\n", err)
defer file.Close()
pdfWriter, err := reader.ToWriter(nil)
if err != nil {
log.Fatalf("Fail: %v\n", err)
}
pdfWriter.ApplyStandard(pdfa.NewProfile2B(nil))
err = pdfWriter.WriteToFile(outputPath)
if err != nil {
log.Fatalf("Fail: %v\n", err)
}
duration := float64(time.Since(start)) / float64(time.Millisecond)
fmt.Printf("Processing time: %.2f ms\n", duration)
}
Validating PDF/A-2 Standard
To validate a PDF/A-2 file, use the following script:
package main
import (
"fmt"
"log"
"os"
"time"
"github.com/unidoc/unipdf/v3/common/license"
"github.com/unidoc/unipdf/v3/model"
"github.com/unidoc/unipdf/v3/model/pdfa"
)
func init() {
err := license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`))
if err != nil {
panic(err)
}
}
func main() {
args := os.Args
if len(args) < 2 {
fmt.Printf("Usage: %s INPUT_PDF_PATH", os.Args[0])
return
}
inputPath := args[1]
start := time.Now()
inputFile, err := os.Open(inputPath)
if err != nil {
log.Fatalf("Fail: %v\n", err)
}
defer inputFile.Close()
detailedReader, err := model.NewCompliancePdfReader(inputFile)
if err != nil {
log.Fatalf("Fail: %v\n", err)
}
standards := []model.StandardImplementer{
pdfa.NewProfile2A(nil),
pdfa.NewProfile2B(nil),
pdfa.NewProfile2U(nil),
}
for _, standard := range standards {
if err = standard.ValidateStandard(detailedReader); err != nil {
fmt.Printf("Input document didn't pass the standard: %s - %v\n", standard.StandardName(), err)
}
}
duration := float64(time.Since(start)) / float64(time.Millisecond)
fmt.Printf("Processing time: %.2f ms\n", duration)
}
Run the validation with:
go run pdfa2_validate_standard.go <input.pdf>
This script verifies the PDF/A-2A, PDF/A-2B, and PDF/A-2U compliance of the input PDF.
Applying PDF/A-3 Standard
Lastly, let’s look at how to apply the PDF/A-3 standard:
package main
import (
"fmt"
"log"
"os"
"time"
"github.com/unidoc/unipdf/v3/common/license"
"github.com/unidoc/unipdf/v3/model"
"github.com/unidoc/unipdf/v3/model/pdfa"\
)
func init() {
err := license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`))
if err != nil {
panic(err)
}
}
func main() {
args := os.Args
if len(args) < 3 {
fmt.Printf("Usage: %s INPUT_PDF_PATH OUTPUT_PDF_PATH", os.Args[0])
return
}
inputPath := args[1]
outputPath := args[2]
start := time.Now()
reader, file, err := model.NewPdfReaderFromFile(inputPath, nil)
if err != nil {
log.Fatalf("Fail: %v\n", err)
}
defer file.Close()
pdfWriter, err := reader.ToWriter(nil)
if err != nil {
log.Fatalf("Fail: %v\n", err)
}
pdfWriter.ApplyStandard(pdfa.NewProfile3B(nil))
err = pdfWriter.WriteToFile(outputPath)
if err != nil {
log.Fatalf("Fail: %v\n", err)
}
duration := float64(time.Since(start)) / float64(time.Millisecond)
fmt.Printf("Processing time: %.2f ms\n", duration)
}
Run this with:
go run pdfa3_apply_standard.go <input.pdf> <output.pdf>
This code applies the PDF/A-3B standard to the input PDF.
Validating PDF/A-3 Standard
To validate a PDF/A-3 file, use this script:
package main
import (
"fmt"
"log"
"os"
"time"
"github.com/unidoc/unipdf/v3/common/license"
"github.com/unidoc/unipdf/v3/model"
"github.com/unidoc/unipdf/v3/model/pdfa"
)
func init() {
err := license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`))
if err != nil {
panic(err)
}
}
func main() {
args := os.Args
if len(args) < 2 {
fmt.Printf("Usage: %s INPUT_PDF_PATH", os.Args[0])
return
}
inputPath := args[1]
start := time.Now()
inputFile, err := os.Open(inputPath)
if err != nil {
log.Fatalf("Fail: %v\n", err)
}
defer inputFile.Close()
detailedReader, err := model.NewCompliancePdfReader(inputFile)
if err != nil {
log.Fatalf("Fail: %v\n", err)
}
standards := []model.StandardImplementer{
pdfa.NewProfile3A(nil),
pdfa.NewProfile3B(nil),
pdfa.NewProfile3U(nil),
}
for _, standard := range standards {
if err = standard.ValidateStandard(detailedReader); err != nil {
fmt.Printf("Input document didn't pass the standard: %s - %v\n", standard.StandardName(), err)
}
}
duration := float64(time.Since(start)) / float64(time.Millisecond)
fmt.Printf("Processing time: %.2f ms\n", duration)
}
Run the validation with:
go run pdfa3_validate_standard.go <input.pdf>
This script validates the input PDF against the PDF/A-3A, PDF/A-3B, and PDF/A-3U standards
Conclusion:
PDF/A is an essential format for the long-term preservation of digital documents. By embedding all necessary components within the file, PDF/A ensures that documents remain accessible and unaltered over time. Using UniPDF, you can easily apply and validate different PDF/A standards to ensure your documents comply with archival requirements.
FAQ’s
What is PDF A used for?
PDF/A is a special form of the format PDF that is designed to preserve long-term archiving of digital documents. It guarantees that documents are able to be reproduced reliably later on by embedding all of the essential elements, including fonts, color profiles and images into the document.
This format is utilized in institutions and businesses where the appearance and integrity document over the years is vital like for legal, governmental and archives documents. PDF/A files are perfect to preserve the exact appearance as well as material of documents to be used in the future for references.
What does PDF/A compliant mean?
A PDF/A-compliant document means that the PDF file is compliant with the standards established for the long-term preservation of documents. This means that everything necessary in order to present the documents in the identical manner in the near future will be contained within the document.
This covers the entire material like text or fonts as well as color information. It doesn’t include features that might make it difficult to preserve for the long term, such as dependency or external link that could not be accessed in the future.
This makes PDF/A the ideal format for documents that need to be inaccessible and untouched in time.
What is the difference between PDF and PDF/A?
Both PDF/A as well as PDF are the two formats that are used to display and store documents, however they have different functions and have distinct features:
PDF (Portable Document Format): It is a versatile format that is used extensively for the distribution of electronic documents. It is compatible with interactive features such as links, embedding multimedia and filling out pdf forms. PDF is perfect for use in general to assure that documents display exactly the same across all devices.
PDF/A (PDF for Archiving): The version known as PDF was designed specifically to archive long-term electronic documents. PDF/A guarantees that documents can be recreated exactly the same way years into the future regardless of the application employed. It limits certain features such as encryption and font linking to ensure that the appearance of the document remains unaltered and self-contained.
In the end, although the standard PDF format is suitable to be used in interactive and dynamic ways The PDF/A format is specifically designed to preserve documents, making sure they remain usable and unchanged in the course of.