Jacob Lucas Jacob is a Java Developer at IDRsolutions. He enjoys juggling programming projects, PC games & hardware, and breaking the company's build process.

How to convert Office documents to PDF with Microsoft Graph


This tutorial builds a service that accurately converts Microsoft Office file formats to PDF. The service stores the Office file in SharePoint, requests it back in the target format using the Microsoft Graph API call documented at
https://docs.microsoft.com/en-us/graph/api/driveitem-get-content-format?view=graph-rest-beta&tabs=http, and then deletes the file again to save space on the SharePoint site.
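
As a rough sketch of that flow, the three Graph requests differ only in their HTTP method and URL. The class and method names below are hypothetical and exist purely for illustration; the URL shapes mirror the ones built later in this guide.

```java
// Illustrative only: GraphUrls and its method names are hypothetical helpers,
// not part of the project described in this guide.
final class GraphUrls {
    private final String itemsBase;

    // graphEndpoint is expected to end with "/", e.g. "https://graph.microsoft.com/beta/"
    GraphUrls(String graphEndpoint, String siteId) {
        this.itemsBase = graphEndpoint + "sites/" + siteId + "/drive/items/";
    }

    // PUT target: uploads the request body as a new file in the drive root
    String upload(String fileName) {
        return itemsBase + "root:/" + fileName + ":/content";
    }

    // GET target: downloads the stored file converted to the given format
    String download(String fileId, String format) {
        return itemsBase + fileId + "/content?format=" + format;
    }

    // DELETE target: removes the stored file once it is no longer needed
    String delete(String fileId) {
        return itemsBase + fileId;
    }

    public static void main(String[] args) {
        GraphUrls urls = new GraphUrls("https://graph.microsoft.com/beta/", "YOUR-SHAREPOINT-SITE-ID");
        System.out.println("PUT    " + urls.upload("example.docx"));
        System.out.println("GET    " + urls.download("FILE-ID", "pdf"));
        System.out.println("DELETE " + urls.delete("FILE-ID"));
    }
}
```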

This implementation largely follows/borrows/steals from https://medium.com/medialesson/convert-files-to-pdf-using-microsoft-graph-azure-functions-20bc84d2adc4, but has been re-implemented in Java.

The code produced in this guide can be found on GitHub and is explained in detail in the Project Setup section.

App Registration

The first thing we need to do is create an app registration, which we will use for login information and permissions for our app.

Secret Key

Navigate to the app registrations page of the Azure Active Directory service
Azure Portal -> Azure Active Directory -> App registrations
and create a new registration.

 

Registrations page

 

Set the name of your application (in this case I’ve set it to IDR-Office2PDF) and leave the rest of the options at their defaults. Then click “Register”.
Open the new app registration and take note of the “Application (client) ID” and the “Directory (tenant) ID”; we will use those in our application later.

 

The application page

 

Next, navigate to the “Certificates & Secrets” tab in the left menu.

 

 

From here, create a new client secret, give it a relevant description and an appropriate expiry length, and click the add button.

 

Add a client secret

 

Now copy and note down the value in the Value column; this will be used for authenticating requests later. Copy it straight away, as the portal hides the value once you leave the page.

 

Value column

 

Permissions

The last thing we need to do is set up the application’s permissions. The application must be allowed to write files so that we can upload our Office files and then download them in the correct format.

Navigate to the app permissions page inside the application registration.

 

app permissions page

 

Click “Add a permission”, then, in the menu that appears, select the Microsoft Graph API.

 

Microsoft APIs

 

Then Application permissions.

 

Application permissions

 

Then search for files, open the submenu, and select “Files.ReadWrite.All”.

 

Request API permissions

 

Click the “Add permissions” button, then, back on the “API permissions” page, click the “Grant admin consent for X” button. If you cannot click this button (note it does take a moment to become available after adding a new permission), then you’ll need to get an admin account to navigate to this page and click it for you.

 

Download as a PDF

SharePoint setup

In order to download a file as a PDF, we need somewhere to upload it first. For this, we’re going to piggyback off of SharePoint.

We want to create a SharePoint site that isn’t linked to any group. To do this, navigate to https://www.office.com, open the admin panel, then the SharePoint tab.

 

SharePoint tab

 

Go to sites -> Active Sites, then click create.

 

Active Sites

 

In the menu that pops up, click “Other Options”, then select the “document center” template (the template probably doesn’t matter, but this one has a nice list of any files that remain on the server if they fail to delete). Finally, set a name and note it down; you’ll need it in the next step.

With the site created, we need its ID in order to make API calls using it. We can find this ID by navigating to https://developer.microsoft.com/en-us/graph/graph-explorer, signing in using the same account that was set as an administrator to the SharePoint site, setting the URL to

https://graph.microsoft.com/v1.0/sites/YOUR-DOMAIN.sharepoint.com/:/sites/YOUR-SITE-NAME/?$select=id

replacing “YOUR-SITE-NAME” with the name of the site that you set in the previous step and “YOUR-DOMAIN” with your SharePoint domain, and executing the query.

After executing the query, you should get a response that looks like this:

Response Preview

Take note of the ID; we will use it for uploading to and downloading from the SharePoint site.
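
Since the ID is copied by hand, a quick sanity check can catch a truncated paste. The sketch below assumes the usual shape of a Graph site ID, which is a hostname followed by two GUIDs, separated by commas. The class name is hypothetical and the check is deliberately loose.

```java
// Illustrative only: SiteIdCheck is a hypothetical helper, not part of the original project.
// Assumption: a Graph site ID looks like "hostname,site-collection-guid,site-guid".
final class SiteIdCheck {
    // True when the value has exactly three non-blank, comma-separated parts
    static boolean looksLikeSiteId(String value) {
        if (value == null) {
            return false;
        }
        String[] parts = value.split(",", -1);
        if (parts.length != 3) {
            return false;
        }
        for (String part : parts) {
            if (part.isBlank()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A made-up ID with placeholder GUIDs, used only to exercise the check
        String example = "contoso.sharepoint.com,00000000-0000-0000-0000-000000000000,11111111-1111-1111-1111-111111111111";
        System.out.println(looksLikeSiteId(example));
        System.out.println(looksLikeSiteId("YOUR-SHAREPOINT-SITE-ID"));
    }
}
```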

 

Project setup

The code for the function will be written in Java using Maven.
To create the project, Azure offers a lovely archetype that sets up the scaffolding of the project for us. More information can be found here:
https://docs.microsoft.com/en-us/azure/azure-functions/functions-reference-java
To create your project, run the following command:
mvn archetype:generate -DarchetypeGroupId=com.microsoft.azure -DarchetypeArtifactId=azure-functions-archetype

This will run interactively, asking for a groupId, artifactId, version, etc. Fill these in, then open the resulting project.

You should delete the files in /src/test, as we’re going to make some changes that will break these tests, which would prevent the Maven build from finishing.

Additional dependencies

For this code, we need a couple of extra dependencies:
We need Apache’s httpclient for sending the HTTP requests to Azure, and Google’s Gson to process the JSON responses.
Add the following to the dependencies section of your pom:
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.13</version>
</dependency>

<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.9.0</version>
</dependency>

Local Settings

To load all the values we noted earlier for our function, we will use function settings, which will be exposed as environment variables. To add them to our local environment for testing, open the local.settings.json file in your project root, and add the following settings to the Values object:
"graph:Endpoint": "https://login.microsoftonline.com/",
"graph:GrantType": "client_credentials",
"graph:Scope": "Files.ReadWrite.All",
"graph:Resource": "https://graph.microsoft.com",
"graph:TenantId": "YOUR-TENANT-ID",
"graph:ClientId": "YOUR-APPLICATION-CLIENT-ID",
"graph:ClientSecret": "YOUR-APPLICATION-CLIENT-SECRET",
"pdf:GraphEndpoint": "https://graph.microsoft.com/beta/",
"pdf:SiteId": "YOUR-SHAREPOINT-SITE-ID"
Make sure to replace the all caps text with the values we were supposed to remember from earlier.
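
Because these settings are read by name at runtime, a typo or a difference in letter case (environment variable names are case-sensitive on Linux hosts) only shows up when a request fails. A minimal sketch of a fail-fast check follows; the class name is hypothetical, and the lookup is passed in so the check can be exercised without touching the real environment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative only: SettingsCheck is a hypothetical helper, not part of the original project.
final class SettingsCheck {
    // The setting names used in local.settings.json above
    static final List<String> REQUIRED = List.of(
            "graph:Endpoint", "graph:GrantType", "graph:Scope", "graph:Resource",
            "graph:TenantId", "graph:ClientId", "graph:ClientSecret",
            "pdf:GraphEndpoint", "pdf:SiteId");

    // Returns the names of required settings that the lookup reports as missing or blank
    static List<String> missing(Function<String, String> lookup) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = lookup.apply(name);
            if (value == null || value.isBlank()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // System::getenv reads the process environment, where function settings are exposed
        List<String> missing = missing(System::getenv);
        System.out.println(missing.isEmpty() ? "All settings present" : "Missing settings: " + missing);
    }
}
```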


MimeMap

Java’s built-in MIME type handling doesn’t recognise Microsoft Office file types, so we’re going to create a class that maps file extensions to MIME types, with a couple of helper functions as well.
Create a class called MimeMap and add the following to it:
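import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class MimeMap {
    // The map between extensions and MIME types. A LinkedHashMap keeps insertion order, so getExtension
    // returns the first extension registered for a MIME type shared by several extensions (e.g. "doc", not "dot")
    private static final Map<String, String> map = new LinkedHashMap<>();

    static {
        // Add each office file extension and its mimetype to the map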
        // The source of these mappings can be found here: https://stackoverflow.com/a/4212908
        map.put("doc", "application/msword");
        map.put("dot", "application/msword");

        map.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        map.put("dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template");
        map.put("docm", "application/vnd.ms-word.document.macroEnabled.12");
        map.put("dotm", "application/vnd.ms-word.template.macroEnabled.12");

        map.put("xls", "application/vnd.ms-excel");
        map.put("xlt", "application/vnd.ms-excel");
        map.put("xla", "application/vnd.ms-excel");

        map.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        map.put("xltx", "application/vnd.openxmlformats-officedocument.spreadsheetml.template");
        map.put("xlsm", "application/vnd.ms-excel.sheet.macroEnabled.12");
        map.put("xltm", "application/vnd.ms-excel.template.macroEnabled.12");
        map.put("xlam", "application/vnd.ms-excel.addin.macroEnabled.12");
        map.put("xlsb", "application/vnd.ms-excel.sheet.binary.macroEnabled.12");

        map.put("ppt", "application/vnd.ms-powerpoint");
        map.put("pot", "application/vnd.ms-powerpoint");
        map.put("pps", "application/vnd.ms-powerpoint");
        map.put("ppa", "application/vnd.ms-powerpoint");

        map.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
        map.put("potx", "application/vnd.openxmlformats-officedocument.presentationml.template");
        map.put("ppsx", "application/vnd.openxmlformats-officedocument.presentationml.slideshow");
        map.put("ppam", "application/vnd.ms-powerpoint.addin.macroEnabled.12");
        map.put("pptm", "application/vnd.ms-powerpoint.presentation.macroEnabled.12");
        map.put("potm", "application/vnd.ms-powerpoint.template.macroEnabled.12");
        map.put("ppsm", "application/vnd.ms-powerpoint.slideshow.macroEnabled.12");

        map.put("mdb", "application/vnd.ms-access");
    }

    public static String getMimeType(String extension) {
        return map.get(extension);
    }

    public static boolean checkOfficeMimeType(String mimeType) {
        return map.containsValue(mimeType);
    }

    public static String getExtension(String mimeType) {
        Optional<Map.Entry<String, String>> extension = map.entrySet().stream().filter((entry) -> entry.getValue().equals(mimeType)).findFirst();

        return extension.map(Map.Entry::getKey).orElse(null);
    }
}
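
The keys in MimeMap are lower-case extensions without the leading dot, so a caller that only has a file name needs to normalise it before calling getMimeType. One way to do that is sketched below; the Extensions class is hypothetical and not part of the original project.

```java
import java.util.Locale;

// Illustrative only: Extensions is a hypothetical helper, not part of the original project.
final class Extensions {
    // Returns the lower-cased extension of a file name without the dot, or null if there is none
    static String of(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(of("Quarterly Report.DOCX")); // prints "docx"
        System.out.println(of("README"));                // prints "null"
    }
}
```

The result could then be passed to MimeMap.getMimeType to obtain the value for the Content-Type-Actual header used later in this guide.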

File Service

To encapsulate our HTTP operations, we will create a class called FileService. This will be a utility class and will contain only static methods.
The methods in this class rely on the following imports:
import com.google.gson.Gson;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import org.apache.http.Header;
import org.apache.http.HttpException;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpDelete;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.InputStreamEntity;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicHeader;
import org.apache.http.message.BasicNameValuePair;

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

 

File Service: Authentication

In order to make HTTP requests to upload, convert, and delete the office files, we need to first get an access token which we can add to the header to authenticate our requests.
/**
 * Requests an access token from Azure to authorise future requests with
 */
public static String getAccessToken() throws IOException, HttpException {
    List<NameValuePair> values = new ArrayList<>();
    values.add(new BasicNameValuePair("client_id", System.getenv("graph:ClientId")));
    values.add(new BasicNameValuePair("client_secret", System.getenv("graph:ClientSecret")));
    values.add(new BasicNameValuePair("scope", System.getenv("graph:Scope")));
    values.add(new BasicNameValuePair("grant_type", System.getenv("graph:GrantType")));
    values.add(new BasicNameValuePair("resource", System.getenv("graph:Resource")));

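    // The setting is named "graph:Endpoint" in local.settings.json; the lookup is case-sensitive on Linux hosts
    String path = System.getenv("graph:Endpoint") + System.getenv("graph:TenantId") + "/oauth2/token";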

    HttpClient client = HttpClients.createDefault();

    HttpPost post = new HttpPost(path);
    post.setEntity(new UrlEncodedFormEntity(values));
    HttpResponse response = client.execute(post);
    if (response.getStatusLine().getStatusCode() == 200) {
        Reader reader = new InputStreamReader(response.getEntity().getContent());
        Gson gson = new Gson();
        JsonObject json = gson.fromJson(reader, JsonObject.class);

        JsonElement token = json.get("access_token");

        return token != null ? token.getAsString() : null;
    } else {
        throw new HttpException("Failed to get access token: " + response.getStatusLine().getStatusCode() + " - " + response.getStatusLine().getReasonPhrase());
    }
}
To simplify the HTTP request creation process, as well as to prevent making this request 3 times for every execution, we will also create a function that caches the access token, and prewraps it in a header class.
private static String auth;

/**
 * Creates a header that contains the authorisation required to make requests to our azure environment
 */
private static Header createAuthorisedHeader() throws IOException, HttpException {
    if (auth == null) {
        auth = getAccessToken();
    }

    return new BasicHeader("Authorization", "Bearer " + auth);
}
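
One caveat with this cache is that access tokens expire, and a function host can stay alive longer than a token does. Below is a minimal sketch of an expiry-aware cache. ExpiringToken is a hypothetical class, and the 50-minute lifetime in main is an assumed conservative value; the token response also carries an expires_in field that could be used instead.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Illustrative only: ExpiringToken is a hypothetical helper, not part of the original project.
final class ExpiringToken {
    private final Supplier<String> fetcher;  // e.g. a wrapper around getAccessToken()
    private final Duration lifetime;         // how long a fetched token is reused
    private final Supplier<Instant> clock;   // injectable so the behaviour can be tested
    private String token;
    private Instant fetchedAt;

    ExpiringToken(Supplier<String> fetcher, Duration lifetime, Supplier<Instant> clock) {
        this.fetcher = fetcher;
        this.lifetime = lifetime;
        this.clock = clock;
    }

    // Returns the cached token, fetching a new one if none is cached or the cached one is too old
    synchronized String get() {
        Instant now = clock.get();
        if (token == null || !now.isBefore(fetchedAt.plus(lifetime))) {
            token = fetcher.get();
            fetchedAt = now;
        }
        return token;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // The fetcher here is a stand-in that fabricates token strings for demonstration
        ExpiringToken cache = new ExpiringToken(() -> "token-" + (++calls[0]), Duration.ofMinutes(50), Instant::now);
        System.out.println(cache.get());
        System.out.println(cache.get()); // same token: the second call is served from the cache
    }
}
```

Since getAccessToken throws checked exceptions, the fetcher passed in would need to wrap them, for example in an unchecked exception.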

File Service: File Upload

Our next function will upload the file to the SharePoint storage.
/**
 * Uploads the given file to the given sharepoint storage
 * @param path The sharepoint address
 * @param content An input stream of the file being uploaded
 * @param contentLength The length in bytes of the file being uploaded
 * @param contentType The Mimetype of the file being uploaded
 * @return The id of the file in the sharepoint storage
 * @throws HttpException when an unexpected answer is received while making an HTTP Request
 * @throws IOException when an HTTP request fails
 */
public static String uploadStream(String path, InputStream content, long contentLength, String contentType) throws HttpException, IOException {
    HttpClient client = HttpClients.createDefault();

    String fileName = UUID.randomUUID() + "." + MimeMap.getExtension(contentType);

    HttpPut put = new HttpPut(path + "root:/" + fileName + ":/content");
    put.addHeader(createAuthorisedHeader());

    InputStreamEntity entity = new InputStreamEntity(content, contentLength, ContentType.create(contentType));
    entity.setChunked(true);
    put.setEntity(entity);

    HttpResponse response = client.execute(put);

    if (response.getStatusLine().getStatusCode() < 300) {
        Reader reader = new InputStreamReader(response.getEntity().getContent());
        Gson gson = new Gson();
        JsonObject json = gson.fromJson(reader, JsonObject.class);

        JsonElement token = json.get("id");

        if (token == null) throw new HttpException("Failed to upload file to sharepoint: response did not contain file ID");

        return token.getAsString();
    } else {
        throw new HttpException("Failed to upload file to sharepoint: " + response.getStatusLine().getStatusCode() + " - " + response.getStatusLine().getReasonPhrase());
    }
}

File Service: File Download

Next, we need a function that requests the file we just uploaded as a PDF.
/**
 * Download the file with the given fileId in the targetFormat
 * @param path The sharepoint address
 * @param fileId The ID of the file to download
 * @param targetFormat The target format to download the file in
 * @return a byte array containing the converted file
 * @throws HttpException when an unexpected answer is received while making an HTTP Request
 * @throws IOException when an HTTP request fails or the converted file cannot be read from the response
 */
public static byte[] downloadConvertedFile(String path, String fileId, String targetFormat) throws IOException, HttpException {
    HttpClient client = HttpClients.createDefault();

    HttpGet get = new HttpGet(path + fileId + "/content?format=" + targetFormat);
    get.addHeader(createAuthorisedHeader());

    HttpResponse response = client.execute(get);

    if (response.getStatusLine().getStatusCode() < 300) {
        return response.getEntity().getContent().readAllBytes();
    } else {
        throw new HttpException("Failed to fetch converted file: " + response.getStatusLine().getStatusCode() + " - " + response.getStatusLine().getReasonPhrase());
    }
}
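
If you want an extra safety net before returning the bytes to the caller, you could check that the download really is a PDF. The sketch below relies on PDF files starting with the ASCII bytes %PDF-; the PdfCheck class is hypothetical and not part of the original project.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: PdfCheck is a hypothetical helper, not part of the original project.
final class PdfCheck {
    // Every PDF file begins with this ASCII marker, followed by the version number
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True when the data starts with the PDF header marker
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikePdf("%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(looksLikePdf("<html>".getBytes(StandardCharsets.US_ASCII)));
    }
}
```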

File Service: File Deletion

To ensure we don’t leave a slowly growing mess, once we’ve converted the file and don’t need it any more, we want to delete it.
/**
 * Delete the file with the given fileId
 * @param path The sharepoint address
 * @param fileId The ID of the file to delete
 * @throws HttpException when an unexpected answer is received while making an HTTP Request
 * @throws IOException when an HTTP request fails
 */
public static void deleteFile(String path, String fileId) throws HttpException, IOException {
    HttpClient client = HttpClients.createDefault();

    HttpDelete delete = new HttpDelete(path + fileId);
    delete.addHeader(createAuthorisedHeader());

    HttpResponse response = client.execute(delete);
    if (response.getStatusLine().getStatusCode() >= 300) {
        throw new HttpException("Failed to delete file: " + response.getStatusLine().getStatusCode() + " - " + response.getStatusLine().getReasonPhrase());
    }
}

Putting it All Together

The remainder of our code now happens in the Function class that was automatically generated for us.
First we need to make sure to set the value of the FunctionName annotation to a more appropriate value, then we want to set the HttpTrigger annotation to only listen to the POST HTTP method, have a binary data type, and a route of convert.
Finally, we want to get the file from the request, do some error checking, and run our upload, download, and delete functions.
public class Function {
    @FunctionName("Office2PDF")
    public HttpResponseMessage run(
            @HttpTrigger(
                name = "req",
                route = "convert",
                methods = {HttpMethod.POST},
                authLevel = AuthorizationLevel.ANONYMOUS,
                dataType = "binary")
                HttpRequestMessage<Optional<byte[]>> request,
            final ExecutionContext context) {
        context.getLogger().info("Java HTTP trigger processed a request.");

        if (request.getBody().isEmpty()) {
            return request.createResponseBuilder(HttpStatus.BAD_REQUEST).body("File must be attached to request").build();
        }

        byte[] body = request.getBody().get();

        // In order for this to accept a raw file, the content type needs to be "application/octet-stream". However, we
        // still need to rely on the content type of the original file, so it must be delivered separately
        String mimeType = request.getHeaders().get("content-type-actual");

        if (mimeType == null || mimeType.isEmpty()) {
            return request.createResponseBuilder(HttpStatus.UNSUPPORTED_MEDIA_TYPE).body("Please provide the file's mime-type in the header with the key: Content-Type-Actual").build();
        } else if (!MimeMap.checkOfficeMimeType(mimeType)) {
            return request.createResponseBuilder(HttpStatus.UNSUPPORTED_MEDIA_TYPE).body("Content-Type-Actual must be a valid office type").build();
        }

        String path = System.getenv("pdf:GraphEndpoint") + "sites/" + System.getenv("pdf:SiteId") + "/drive/items/";
        String fileId = null;

        try (InputStream stream = new ByteArrayInputStream(body)) {
            fileId = FileService.uploadStream(path, stream, body.length, mimeType);
            byte[] pdf = FileService.downloadConvertedFile(path, fileId, "pdf");

            return request.createResponseBuilder(HttpStatus.OK).body(pdf).build();
        } catch (IOException e) {
            context.getLogger().warning(e.getMessage());
            return request.createResponseBuilder(HttpStatus.INTERNAL_SERVER_ERROR).body(e.getMessage()).build();
        } catch (HttpException e) {
            context.getLogger().warning(e.getMessage());
            return request.createResponseBuilder(HttpStatus.INTERNAL_SERVER_ERROR).body(e.getMessage()).build();
        } finally {
            // Since we can exit early during the conversion, we need to make sure that if a file was created, it gets
            // deleted, successful conversion or not
            if (fileId != null) {
                try {
                    FileService.deleteFile(path, fileId);
                } catch (HttpException | IOException e) {
                    context.getLogger().warning(e.getMessage());
                }
            }
        }
    }
}
To send a file to this service, make an HTTP POST request to the function endpoint with the Content-Type header set to “application/octet-stream”, the Content-Type-Actual header set to the content type of the file being uploaded, and the body of the request set to the file itself.

Note: Azure Functions only accepts the file as binary data when the Content-Type header is “application/octet-stream”. Since we still need to know the file type, we pass it in a second header: Content-Type-Actual.

Note: We delete the file in a finally block to ensure that it gets deleted even if we fail to convert it.
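
For reference, here is a minimal sketch of a client that sends such a request using the HTTP client built into Java 11. ConvertClient is a hypothetical class, and the endpoint in main assumes the function is running locally on the default Azure Functions port (7071) with the default api route prefix; adjust it for your deployment.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: ConvertClient is a hypothetical client, not part of the original project.
final class ConvertClient {
    // Builds the POST request with the two headers described above
    static HttpRequest buildRequest(URI endpoint, byte[] file, String actualContentType) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/octet-stream")
                .header("Content-Type-Actual", actualContentType)
                .POST(HttpRequest.BodyPublishers.ofByteArray(file))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: ConvertClient <input.docx> <output.pdf>");
            return;
        }
        // Assumption: local function host on the default port with the default "api" route prefix
        URI endpoint = URI.create("http://localhost:7071/api/convert");
        byte[] file = Files.readAllBytes(Path.of(args[0]));
        // The MIME type is hard-coded for a .docx input to keep the sketch short
        HttpRequest request = buildRequest(endpoint, file,
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document");

        HttpResponse<byte[]> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() == 200) {
            Files.write(Path.of(args[1]), response.body());
        } else {
            System.err.println("Conversion failed with status " + response.statusCode());
        }
    }
}
```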

The full code can be found here: https://github.com/idrsolutions/azure-office-conversion

 

Testing

Because we set up the local.settings.json file, we can test the app locally by running
mvn azure-functions:run
Provided your local environment has been set up correctly, this starts the endpoint locally, ready to receive requests.

 

Pushing to Azure

To push to Azure, we run the command:
mvn clean package
to build the jar and resources, then
mvn azure-functions:deploy
to send the files to Azure. If a functions project doesn’t already exist for this function, then executing this command will create one.

 

Finishing up on Azure

Once you’ve pushed and created the Function on Azure with the azure-functions:deploy command, we still need to configure the settings in Azure. To do this, navigate to the function in Azure
Azure -> Functions app -> Your function name
Then use the left menu to navigate to the configuration tab.

 

Configuration tab

 

From here, add each config option that you added to the local settings before using the “New application settings” button. The required settings are repeated below:
"graph:Endpoint": "https://login.microsoftonline.com/",
"graph:GrantType": "client_credentials",
"graph:Scope": "Files.ReadWrite.All",
"graph:Resource": "https://graph.microsoft.com",
"graph:TenantId": "YOUR-TENANT-ID",
"graph:ClientId": "YOUR-APPLICATION-CLIENT-ID",
"graph:ClientSecret": "YOUR-APPLICATION-CLIENT-SECRET",
"pdf:GraphEndpoint": "https://graph.microsoft.com/beta/",
"pdf:SiteId": "YOUR-SHAREPOINT-SITE-ID"




IDRsolutions Ltd 2022. All rights reserved.