How to convert Office documents to PDF with Microsoft Graph

This tutorial creates a service that accurately converts Microsoft Office file formats into PDFs. The service works by uploading the Office file to a SharePoint document library, requesting it back in a specified format using this API call:
https://docs.microsoft.com/en-us/graph/api/driveitem-get-content-format?view=graph-rest-beta&tabs=http, and then deleting the file again (saving space on SharePoint).
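In raw REST terms, the flow comes down to three requests against the site's default document library: a PUT to upload the file, a GET on the item's content with a format query parameter to download it converted, and a DELETE to remove it. The sketch below only builds those request paths (relative to https://graph.microsoft.com/v1.0) as strings. The GraphPaths class and its methods are hypothetical, and for brevity the upload is shown in its simple-upload form, whereas the code later in this guide uses an upload session.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds the Graph request paths used by the conversion flow. */
final class GraphPaths {
    private GraphPaths() {}

    /** Step 1 (PUT): upload the Office file into the root of the site's default drive. */
    static String upload(String siteId, String fileName) {
        return "/sites/" + siteId + "/drive/root:/" + encode(fileName) + ":/content";
    }

    /** Step 2 (GET): download the uploaded item converted to the target format, e.g. "pdf". */
    static String downloadAs(String siteId, String itemId, String format) {
        return "/sites/" + siteId + "/drive/items/" + itemId + "/content?format=" + encode(format);
    }

    /** Step 3 (DELETE): remove the temporary item again. */
    static String delete(String siteId, String itemId) {
        return "/sites/" + siteId + "/drive/items/" + itemId;
    }

    private static String encode(String value) {
        // URLEncoder produces form encoding ("+" for spaces), so swap in %20 for use in a path
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```

The site ID and item ID placeholders correspond to the SharePoint site ID we look up later and the ID returned by the upload.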

This implementation largely follows/borrows/steals from https://medium.com/medialesson/convert-files-to-pdf-using-microsoft-graph-azure-functions-20bc84d2adc4, but has been re-implemented in Java using the Microsoft Graph SDK.

The code produced in this guide can be found on GitHub and is explained in detail in the Project Setup section.

App Registration

The first thing we need to do is create an app registration, which we will use for login information and permissions for our app.

Secret Key

Navigate to the App registrations page of the Azure Active Directory service:

Azure Portal -> Azure Active Directory -> App registrations

and create a new registration.

Set the name of your application (in this case, I’ve set it to IDR-Office2PDF), leave the rest of the options at their defaults, and click “Register”.
Open the new app registration and take note of the “Application (client) ID” and the “Directory (tenant) ID”; we will use these in our application later.

Next, navigate to the “Certificates & Secrets” tab in the left menu.

From here, create a new client secret, give it a relevant description and an appropriate expiry period, and click the “Add” button.

Now copy the value in the “Value” column and note it down; it will be used to authenticate requests later. The portal only shows this value in full right after the secret is created, so copy it before leaving the page.

Permissions

The last thing we need to do is set up the application’s permissions. The application must be allowed to write files so that we can upload our Office files and then download them in the target format.

Navigate to the “API permissions” tab inside the app registration.

Click “Add a permission”, then, in the menu that appears, select Microsoft Graph.

Then select “Application permissions”.

Then search for “Files”, open the Files submenu, and select “Files.ReadWrite.All”.

Click the “Add permissions” button, then, back on the “API permissions” page, click the “Grant admin consent for X” button (where X is your organization’s name). If you cannot click this button (note that it takes a moment to become available after adding a new permission), you’ll need an administrator account to navigate to this page and click it for you.
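If conversions later fail with an authorization error, one quick check is whether the app-only token actually carries the permission. Tokens issued to an application through the client credentials flow list its granted application permissions in a roles claim. The sketch below is a hypothetical debugging helper (TokenRoles is not part of any SDK) that decodes the token payload with the JDK's Base64 decoder and does a naive string search instead of full JSON parsing. It does not verify the token's signature, so it is only suitable for troubleshooting.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical debugging helper: checks whether an app-only access token carries a given role. */
final class TokenRoles {
    private TokenRoles() {}

    /** Decodes the (unverified) payload segment of a JWT into its JSON text. */
    static String payloadJson(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a JWT: expected header.payload.signature");
        }
        // JWT segments are base64url-encoded without padding; Java's URL decoder accepts unpadded input
        byte[] bytes = Base64.getUrlDecoder().decode(parts[1]);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Naive check: looks for the quoted role name anywhere in the payload rather than parsing the roles array. */
    static boolean hasRole(String jwt, String role) {
        return payloadJson(jwt).contains("\"" + role + "\"");
    }
}
```

For example, TokenRoles.hasRole(token, "Files.ReadWrite.All") should return true for a token obtained with this app's client ID and secret once admin consent has been granted.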

Download as a PDF

SharePoint setup

In order to download a file as a PDF, we need somewhere to upload it first. For this, we’re going to piggyback off of SharePoint.

We want to create a SharePoint site that isn’t linked to any group. To do this, navigate to https://www.office.com, open the admin panel, then the SharePoint tab.

Go to Sites -> Active sites, then click “Create”.

In the menu that pops up, click “Other options”, then select the “Document center” template (the template probably doesn’t matter, but this one has a nice list of any files that remain on the server if they fail to delete). Finally, set a name and note it down; you’ll need it in the next step.

With the site created, we need its ID in order to make API calls against it. To find it, navigate to https://developer.microsoft.com/en-us/graph/graph-explorer, sign in using the same account that was set as an administrator of the SharePoint site, and set the request URL to

https://graph.microsoft.com/v1.0/sites/YOUR-DOMAIN.sharepoint.com:/sites/YOUR-SITE-NAME?$select=id

replacing “YOUR-SITE-NAME” with the name of the site you set in the previous step (as it appears in the site’s URL) and “YOUR-DOMAIN” with your SharePoint domain. Then run the query.

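After running the query, you should get a JSON response whose id field has the form YOUR-DOMAIN.sharepoint.com,<site-collection-id>,<web-id>, that is, the hostname followed by two GUIDs.

Take note of the whole ID, commas included; we will use it for uploading to and downloading from SharePoint.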
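Because the site ID is a composite value, it is easy to copy only part of it. A minimal sanity check, sketched below with a hypothetical SiteIds class, verifies the three comma-separated parts before you paste the value into your settings: a non-empty hostname followed by two GUIDs.

```java
import java.util.UUID;

/** Hypothetical sanity check for the site ID copied from Graph Explorer. */
final class SiteIds {
    private SiteIds() {}

    /**
     * Returns true if the value has the "hostname,siteCollectionId,webId" shape,
     * with both trailing parts parsing as GUIDs.
     */
    static boolean looksLikeSiteId(String siteId) {
        String[] parts = siteId.split(",");
        if (parts.length != 3 || parts[0].isEmpty()) {
            return false;
        }
        try {
            // UUID.fromString throws IllegalArgumentException for malformed GUIDs
            UUID.fromString(parts[1]);
            UUID.fromString(parts[2]);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

A common mistake this catches is pasting the site name (for example IDR-Office2PDF) instead of the ID.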

Project setup

The code for the function will be written in Java using Maven.
To create the project, Azure offers a lovely archetype that sets up the project scaffolding for us. More information can be found here:
https://docs.microsoft.com/en-us/azure/azure-functions/functions-reference-java
To create your project, run the following command:

mvn archetype:generate -DarchetypeGroupId=com.microsoft.azure -DarchetypeArtifactId=azure-functions-archetype

This will run interactively, asking for a groupId, artifactId, version, etc. Fill these in, then open the resulting project.

You should delete the files in /src/test, as we’re going to make some changes that will break these tests, which would prevent the Maven build from finishing.

Additional dependencies

For this code, we need a few extra dependencies:

Microsoft’s Graph SDK for working with SharePoint, Microsoft’s Azure Identity library for authenticating our function, and Google’s Gson to process JSON responses.
Add the following to the dependencies section of your pom.xml:

<dependency>
    <groupId>com.microsoft.graph</groupId>
    <artifactId>microsoft-graph</artifactId>
    <version>5.22.0</version>
</dependency>

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
    <version>1.5.0</version>
</dependency>

<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.9.0</version>
</dependency>

Local Settings

To load all the values we noted earlier into our function, we will use application settings, which are exposed to the function as environment variables. To add them to our local environment for testing, open the local.settings.json file in your project root and add the following settings to the Values object:

"graph:TenantId": "YOUR-TENANT-ID",
"graph:ClientId": "YOUR-APPLICATION-CLIENT-ID",
"graph:ClientSecret": "YOUR-APPLICATION-CLIENT-SECRET",
"pdf:SiteId": "YOUR-SHAREPOINT-SITE-ID"

Make sure to replace the all-caps placeholders with the values you noted earlier.
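The code in this guide reads these values with System.getenv at the point of use, so a missing or misspelled setting tends to surface later as a less obvious error. One option, sketched below under the assumption that you want to fail fast, is to check all four settings at once. The Settings class is hypothetical; in the function you would pass it System.getenv().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that reads the four settings and fails fast if any are missing. */
final class Settings {
    static final List<String> REQUIRED = List.of(
            "graph:TenantId", "graph:ClientId", "graph:ClientSecret", "pdf:SiteId");

    private Settings() {}

    /** Pass System.getenv() in production; accepting a plain map keeps this easy to test. */
    static Map<String, String> load(Map<String, String> env) {
        Map<String, String> values = new LinkedHashMap<>();
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = env.get(key);
            if (value == null || value.isBlank()) {
                missing.add(key);
            } else {
                values.put(key, value);
            }
        }
        if (!missing.isEmpty()) {
            // Report every missing key at once instead of failing on the first one
            throw new IllegalStateException("Missing function settings: " + String.join(", ", missing));
        }
        return values;
    }
}
```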

MimeMap

Java’s built-in MIME type handling doesn’t reliably recognize Microsoft Office file types, so we’re going to create a class that maps file extensions to MIME types, along with a few helper functions.
Create a class called MimeMap and add the following to it:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class MimeMap {
    // The map between extensions and MIME types. A LinkedHashMap keeps insertion order, so getExtension()
    // returns the first extension listed for a MIME type (e.g. "xls" rather than the add-in extension "xla")
    private static final Map<String, String> map = new LinkedHashMap<>();

    static {
        // Add each Office file extension and its MIME type to the map
        // The source of these mappings can be found here: https://stackoverflow.com/a/4212908
        // Note: Graph's format conversion may not support every type listed here (e.g. .mdb)
        map.put("doc", "application/msword");
        map.put("dot", "application/msword");

        map.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        map.put("dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template");
        map.put("docm", "application/vnd.ms-word.document.macroEnabled.12");
        map.put("dotm", "application/vnd.ms-word.template.macroEnabled.12");

        map.put("xls", "application/vnd.ms-excel");
        map.put("xlt", "application/vnd.ms-excel");
        map.put("xla", "application/vnd.ms-excel");

        map.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        map.put("xltx", "application/vnd.openxmlformats-officedocument.spreadsheetml.template");
        map.put("xlsm", "application/vnd.ms-excel.sheet.macroEnabled.12");
        map.put("xltm", "application/vnd.ms-excel.template.macroEnabled.12");
        map.put("xlam", "application/vnd.ms-excel.addin.macroEnabled.12");
        map.put("xlsb", "application/vnd.ms-excel.sheet.binary.macroEnabled.12");

        map.put("ppt", "application/vnd.ms-powerpoint");
        map.put("pot", "application/vnd.ms-powerpoint");
        map.put("pps", "application/vnd.ms-powerpoint");
        map.put("ppa", "application/vnd.ms-powerpoint");

        map.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
        map.put("potx", "application/vnd.openxmlformats-officedocument.presentationml.template");
        map.put("ppsx", "application/vnd.openxmlformats-officedocument.presentationml.slideshow");
        map.put("ppam", "application/vnd.ms-powerpoint.addin.macroEnabled.12");
        map.put("pptm", "application/vnd.ms-powerpoint.presentation.macroEnabled.12");
        map.put("potm", "application/vnd.ms-powerpoint.template.macroEnabled.12");
        map.put("ppsm", "application/vnd.ms-powerpoint.slideshow.macroEnabled.12");

        map.put("mdb", "application/vnd.ms-access");
    }

    // Look up the MIME type for a file extension (without the dot), or null if unknown
    public static String getMimeType(String extension) {
        return map.get(extension);
    }

    // Check whether the given MIME type is one of the Office types above (exact, case-sensitive match)
    public static boolean checkOfficeMimeType(String mimeType) {
        return map.containsValue(mimeType);
    }

    // Find the first extension registered for the given MIME type, or null if there is none
    public static String getExtension(String mimeType) {
        Optional<Map.Entry<String, String>> extension = map.entrySet().stream()
                .filter(entry -> entry.getValue().equals(mimeType))
                .findFirst();

        return extension.map(Map.Entry::getKey).orElse(null);
    }
}
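The Function we write later compares the Content-Type-Actual header against these values exactly, so a header such as "application/msword; charset=binary" would be rejected. If your clients may send parameters, a small normalization step can strip them first. The helper below is hypothetical. It deliberately keeps the original casing, because MimeMap compares values case-sensitively and some of its entries are mixed-case (e.g. macroEnabled).

```java
/** Hypothetical pre-processing step for the Content-Type-Actual header before calling MimeMap. */
final class MimeTypes {
    private MimeTypes() {}

    /**
     * Drops any parameters (everything from the first ';') and surrounding whitespace.
     * Returns null for null or blank input so callers can keep their existing null checks.
     */
    static String stripParameters(String headerValue) {
        if (headerValue == null) {
            return null;
        }
        int semicolon = headerValue.indexOf(';');
        String essence = (semicolon >= 0 ? headerValue.substring(0, semicolon) : headerValue).trim();
        return essence.isEmpty() ? null : essence;
    }
}
```

In the Function, this would wrap the header lookup: MimeTypes.stripParameters(request.getHeaders().get("content-type-actual")).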

File Service

To encapsulate our HTTP operations, we will create a class called FileService. This will be a utility class and will contain only static methods and members.

File Service: Client

In order to use the Graph API, we need to build and authenticate a client. As this client is used for the lifetime of the Function, we will add it as a static variable at the top of our FileService class.

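// Imports needed at the top of FileService.java
import com.azure.identity.ClientSecretCredentialBuilder;
import com.microsoft.graph.authentication.TokenCredentialAuthProvider;
import com.microsoft.graph.requests.GraphServiceClient;
import okhttp3.Request;

// Static field at the top of the FileService class
private static final GraphServiceClient<Request> graphClient = GraphServiceClient
        .builder()
        .authenticationProvider(
                new TokenCredentialAuthProvider(
                        // App-only credential built from the app registration's details
                        new ClientSecretCredentialBuilder()
                                .clientId(System.getenv("graph:ClientId"))
                                .clientSecret(System.getenv("graph:ClientSecret"))
                                .tenantId(System.getenv("graph:TenantId"))
                                .build()
                )
        )
        .buildClient();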
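You don't need to write this yourself, since ClientSecretCredential handles it, but when debugging authentication it can help to know what is requested: an OAuth 2.0 client credentials grant posted to the tenant's Microsoft identity platform token endpoint, asking for the https://graph.microsoft.com/.default scope, which covers the application permissions already granted (such as Files.ReadWrite.All). The sketch below only builds the endpoint URL and the form body with the JDK; the ClientCredentialsRequest class is hypothetical and is not what the SDK does internally line for line.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative only: the client-credentials token request, built by hand with the JDK. */
final class ClientCredentialsRequest {
    private ClientCredentialsRequest() {}

    /** Token endpoint for a given tenant on the Microsoft identity platform (v2.0). */
    static String tokenEndpoint(String tenantId) {
        return "https://login.microsoftonline.com/" + tenantId + "/oauth2/v2.0/token";
    }

    /** application/x-www-form-urlencoded body requesting an app-only Graph token. */
    static String formBody(String clientId, String clientSecret) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("grant_type", "client_credentials");
        fields.put("client_id", clientId);
        fields.put("client_secret", clientSecret);
        // ".default" asks for all application permissions already granted to the app
        fields.put("scope", "https://graph.microsoft.com/.default");
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```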

File Service: SharePoint Helper

Many of the upcoming functions work from the root of our SharePoint site’s document library, so we will create a helper function that abstracts accessing it.

// Import needed at the top of FileService.java
import com.microsoft.graph.requests.DriveItemRequestBuilder;

/**
 * Get a DriveItemRequestBuilder already at the root of the configured SharePoint site's drive
 * @return A DriveItemRequestBuilder at the site root
 */
private static DriveItemRequestBuilder getSharepointSiteRoot() {
    return graphClient
            .sites(System.getenv("pdf:SiteId"))
            .drive()
            .root();
}

File Service: File Upload

Our next function uploads the file to the SharePoint storage.

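// Imports needed at the top of FileService.java for this method
import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.graph.core.ClientException;
import com.microsoft.graph.models.DriveItem;
import com.microsoft.graph.models.DriveItemCreateUploadSessionParameterSet;
import com.microsoft.graph.models.DriveItemUploadableProperties;
import com.microsoft.graph.models.UploadSession;
import com.microsoft.graph.tasks.IProgressCallback;
import com.microsoft.graph.tasks.LargeFileUploadResult;
import com.microsoft.graph.tasks.LargeFileUploadTask;
import java.io.IOException;
import java.io.InputStream;
import java.util.UUID;

/**
 * Uploads the given file to the configured SharePoint storage
 * @param context The function execution context, used to log upload progress
 * @param content An input stream of the file being uploaded
 * @param contentLength The length in bytes of the file being uploaded
 * @param contentType The MIME type of the file being uploaded
 * @return The ID of the file in the SharePoint storage
 * @throws IOException when the upload fails
 * @throws ClientException when the POST request fails
 */
public static String uploadStream(ExecutionContext context, InputStream content, long contentLength, String contentType) throws ClientException, IOException {
    // Use a random file name, keeping the extension so SharePoint knows what type of file it is
    String fileName = UUID.randomUUID() + "." + MimeMap.getExtension(contentType);

    // Log progress as the upload proceeds
    IProgressCallback callback = (current, max) ->
            context.getLogger().info(String.format("Uploaded %d of %d bytes", current, max));

    DriveItemCreateUploadSessionParameterSet uploadParams = DriveItemCreateUploadSessionParameterSet
            .newBuilder()
            .withItem(new DriveItemUploadableProperties())
            .build();

    // Create an upload session for the new file at the root of the site's drive
    UploadSession session = getSharepointSiteRoot()
            .itemWithPath(fileName)
            .createUploadSession(uploadParams)
            .buildRequest()
            .post();

    LargeFileUploadTask<DriveItem> largeFileUploadTask = new LargeFileUploadTask<>(
            session,
            graphClient,
            content,
            contentLength,
            DriveItem.class);

    LargeFileUploadResult<DriveItem> result = largeFileUploadTask.upload(0, null, callback);

    if (result.responseBody == null) {
        // Fail loudly instead of returning null, which would otherwise be used as a file ID
        throw new IOException("Upload finished without returning the uploaded item");
    }

    return result.responseBody.id;
}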
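LargeFileUploadTask takes care of splitting the stream into chunks and sending each one to the upload session with a Content-Range header. Graph requires every chunk except the last to be a multiple of 320 KiB (327,680 bytes). The sketch below, using a hypothetical UploadChunks class, shows only that range arithmetic; it is an illustration of the protocol, not the SDK's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of how an upload session splits a file into Content-Range chunks. */
final class UploadChunks {
    /** Graph requires chunk sizes to be multiples of 320 KiB (except the final chunk). */
    static final int CHUNK_UNIT = 320 * 1024;

    private UploadChunks() {}

    /** Returns the Content-Range header value for each chunk of a file of the given length. */
    static List<String> contentRanges(long contentLength, int unitsPerChunk) {
        if (unitsPerChunk < 1) {
            throw new IllegalArgumentException("unitsPerChunk must be at least 1");
        }
        long chunkSize = (long) CHUNK_UNIT * unitsPerChunk;
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < contentLength; start += chunkSize) {
            // Ranges are inclusive, so the last byte index is one less than the exclusive end
            long end = Math.min(start + chunkSize, contentLength) - 1;
            ranges.add("bytes " + start + "-" + end + "/" + contentLength);
        }
        return ranges;
    }
}
```

For a 1,000,000-byte file and one unit per chunk, this yields four ranges, the last being "bytes 983040-999999/1000000".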

File Service: File Download

Next, we need a function that requests the file we just uploaded back as a PDF.

// Imports needed at the top of FileService.java for this method
import com.microsoft.graph.core.ClientException;
import com.microsoft.graph.options.QueryOption;
import java.io.IOException;
import java.io.InputStream;

/**
 * Download the file with the given fileId in the targetFormat
 * @param fileId The ID of the file to download
 * @param targetFormat The target format to download the file in, e.g. "pdf"
 * @return a byte array containing the converted file
 * @throws ClientException when the Graph API request fails
 * @throws IOException when the converted file cannot be read from the response
 */
public static byte[] downloadConvertedFile(String fileId, String targetFormat) throws IOException, ClientException {
    // The Java SDK doesn't seem to offer a convenient call for downloading an item in another format, so we make
    // a custom request for the item's content endpoint and pass the target format as the "format" query option
    try (
            InputStream stream = graphClient
                .customRequest("/sites/" + System.getenv("pdf:SiteId") + "/drive/items/" + fileId + "/content", InputStream.class)
                .buildRequest(new QueryOption("format", targetFormat))
                .get()
    ) {
        if (stream != null) {
            return stream.readAllBytes();
        }

        throw new IOException("Failed to read file from response");
    }
}
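Before returning the bytes to the caller, it can be worth confirming that the response really is a PDF rather than, say, an error page. A minimal check, sketched below with a hypothetical PdfCheck class, looks for the "%PDF-" header that PDF files start with. It is deliberately strict and only checks offset 0, although some PDF readers tolerate leading bytes before the header.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sanity check on the bytes returned by downloadConvertedFile. */
final class PdfCheck {
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfCheck() {}

    /** True if the data starts with the "%PDF-" header. */
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```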

File Service: File Deletion

To ensure we don’t leave a slowly growing mess, once we’ve converted the file and don’t need it any more, we want to delete it.

/**
 * Delete the file with the given fileId
 * @param fileId The ID of the file to delete
 * @throws ClientException when the Graph API request fails
 */
public static void deleteFile(String fileId) throws ClientException {
    // Address the item by its ID (not by path), matching the ID returned from uploadStream.
    // A successful delete returns no content, so there is no item to inspect afterwards.
    graphClient
            .sites(System.getenv("pdf:SiteId"))
            .drive()
            .items(fileId)
            .buildRequest()
            .delete();
}
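If a delete occasionally fails (the document center template lists any leftover files, so such failures are visible), one option is to retry a few times before giving up. The sketch below is a generic, hypothetical retry helper with exponential backoff. It catches RuntimeException, so using it with deleteFile assumes ClientException is unchecked in your SDK version; if it isn't, wrap the call accordingly.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper, e.g. for a delete that fails transiently. */
final class Retry {
    private Retry() {}

    /**
     * Runs the action up to maxAttempts times, sleeping between attempts with the delay
     * doubling each time. Rethrows the last RuntimeException if every attempt fails.
     */
    static <T> T withBackoff(Supplier<T> action, int maxAttempts, long initialDelayMillis)
            throws InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }
}
```

For example: Retry.withBackoff(() -> { FileService.deleteFile(fileId); return null; }, 3, 500).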

Putting it All Together

The remainder of our code lives in the Function class that the archetype generated for us.
First, set the value of the FunctionName annotation to something more appropriate, then configure the HttpTrigger annotation to listen only for the POST HTTP method, use the binary data type, and use the route convert.
Finally, we get the file from the request, do some error checking, and run our upload, download, and delete functions.

import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.HttpMethod;
import com.microsoft.azure.functions.HttpRequestMessage;
import com.microsoft.azure.functions.HttpResponseMessage;
import com.microsoft.azure.functions.HttpStatus;
import com.microsoft.azure.functions.annotation.AuthorizationLevel;
import com.microsoft.azure.functions.annotation.FunctionName;
import com.microsoft.azure.functions.annotation.HttpTrigger;
import com.microsoft.graph.core.ClientException;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;

public class Function {
    @FunctionName("Office2PDF")
    public HttpResponseMessage run(
            @HttpTrigger(
                name = "req",
                route = "convert",
                methods = {HttpMethod.POST},
                authLevel = AuthorizationLevel.ANONYMOUS,
                dataType = "binary")
                HttpRequestMessage<Optional<byte[]>> request,
            final ExecutionContext context) {
        context.getLogger().info("Java HTTP trigger processed a request.");

        if (request.getBody().isEmpty()) {
            return request.createResponseBuilder(HttpStatus.BAD_REQUEST).body("File must be attached to request").build();
        }

        byte[] body = request.getBody().get();

        // In order for this to accept a raw file, the content type needs to be "application/octet-stream"; however, we
        // still need to rely on the content type of the original file, so it has to be delivered in a separate header
        String mimeType = request.getHeaders().get("content-type-actual");

        if (mimeType == null || mimeType.isEmpty()) {
            return request.createResponseBuilder(HttpStatus.UNSUPPORTED_MEDIA_TYPE).body("Please provide the file's mime-type in the header with the key: Content-Type-Actual").build();
        } else if (!MimeMap.checkOfficeMimeType(mimeType)) {
            return request.createResponseBuilder(HttpStatus.UNSUPPORTED_MEDIA_TYPE).body("Content-Type-Actual must be a valid office type").build();
        }

        String fileId = null;

        try (InputStream stream = new ByteArrayInputStream(body)) {
            fileId = FileService.uploadStream(context, stream, body.length, mimeType);
            byte[] pdf = FileService.downloadConvertedFile(fileId, "pdf");

            // Label the response so clients know they are receiving a PDF
            return request.createResponseBuilder(HttpStatus.OK)
                    .header("Content-Type", "application/pdf")
                    .body(pdf)
                    .build();
        } catch (IOException | ClientException e) {
            context.getLogger().warning(e.getMessage());
            return request.createResponseBuilder(HttpStatus.INTERNAL_SERVER_ERROR).body(e.getMessage()).build();
        } finally {
            // Since we can exit early during the conversion, we need to make sure that if a file was created, it gets
            // deleted, whether the conversion succeeded or not
            if (fileId != null) {
                try {
                    FileService.deleteFile(fileId);
                } catch (ClientException e) {
                    context.getLogger().warning(e.getMessage());
                }
            }
        }
    }
}

To send a file to this service, make an HTTP POST request to the function endpoint with the Content-Type header set to “application/octet-stream”, the Content-Type-Actual header set to the content type of the file being uploaded, and the request body set to the file itself.

Note: in order for Azure Functions to accept the file as binary data, the Content-Type header must be set to “application/octet-stream”. As we still need to know the file type, we pass it in a second header: Content-Type-Actual.

Note: we delete the file in a finally block to ensure that it gets deleted even if the conversion fails.
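For a Java client, the request described above can be sketched with the JDK's java.net.http.HttpClient (Java 11+). The ConvertRequests class is hypothetical, and the local URL http://localhost:7071/api/convert assumes the Functions host's default local port and its default api route prefix.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical client-side helper that builds and sends a conversion request to the function. */
final class ConvertRequests {
    private ConvertRequests() {}

    /** Raw file as the body, octet-stream as Content-Type, the real MIME type in Content-Type-Actual. */
    static HttpRequest build(URI endpoint, byte[] file, String actualMimeType) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/octet-stream")
                .header("Content-Type-Actual", actualMimeType)
                .POST(HttpRequest.BodyPublishers.ofByteArray(file))
                .build();
    }

    /** Sends the request and returns the PDF bytes, or throws if the function did not return 200. */
    static byte[] send(HttpClient client, HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            // The function returns its error message as the response body
            throw new IOException("Conversion failed with HTTP " + response.statusCode() + ": "
                    + new String(response.body(), StandardCharsets.UTF_8));
        }
        return response.body();
    }
}
```

Usage might look like ConvertRequests.send(HttpClient.newHttpClient(), ConvertRequests.build(URI.create("http://localhost:7071/api/convert"), fileBytes, MimeMap.getMimeType("docx"))), where fileBytes holds the contents of a .docx file.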

The full code can be found here: https://github.com/idrsolutions/azure-office-conversion

Testing

Because we set up the local.settings.json file, we can test the app locally by running

mvn azure-functions:run

Provided your local environment has been set up correctly (this relies on the Azure Functions Core Tools being installed), this starts the endpoint locally, ready to receive requests.

Pushing to Azure

To push to Azure, first run:

mvn clean package

to build the jar and resources, then

mvn azure-functions:deploy

to send the files to Azure. If a Function App doesn’t already exist for this function, this command will create one.

Finishing up on Azure

Once you’ve pushed and created the Function on Azure with the azure-functions:deploy command, we still need to add the settings in Azure. To do this, navigate to the function app in the Azure Portal:

Azure Portal -> Function App -> Your function app name

Then use the left menu to navigate to the configuration tab.

From here, use the “New application setting” button to add each setting that you added to the local settings earlier. The required settings are repeated below:

"graph:TenantId": "YOUR-TENANT-ID",
"graph:ClientId": "YOUR-APPLICATION-CLIENT-ID",
"graph:ClientSecret": "YOUR-APPLICATION-CLIENT-SECRET",
"pdf:SiteId": "YOUR-SHAREPOINT-SITE-ID"
