Contents
This tutorial aims to create a service that can be used to accurately convert Microsoft Office file formats into PDFs. The service works by storing the Office file in a SharePoint file storage, then requesting the file back in a specified format using this API call:
https://docs.microsoft.com/en-us/graph/api/driveitem-get-content-format?view=graph-rest-beta&tabs=http, before deleting the file again (saving space on the SharePoint).
This implementation majorly follows/borrows/steals from https://medium.com/medialesson/convert-files-to-pdf-using-microsoft-graph-azure-functions-20bc84d2adc4, however has been re-implemented in Java.
The code produced in this guide can be found on GitHub and is explained in detail in the Project Setup section.
App Registration
The first thing we need to do is create an app registration, which we will use for login information and permissions for our app.
Secret Key
Azure Portal -> Azure Active Directory -> App registrations
Permissions
The last thing we need to do is set up the application’s permissions. We need to configure the application to be allowed to write files so that we can upload our office files in order to then download them in the correct format.
Navigate to the app permissions page inside the application registration.
Click “Add a permission”, then, in the menu that appears, select the Microsoft Graph API.
Then Application permissions.
Then search for files, open the submenu, and select “Files.ReadWrite.All”.
Click the “Add permissions” button, then, back on the “API permissions” page, click the “Grant admin consent for X” button. If you cannot click this button (note it does take a moment to become available after adding a new permission), then you’ll need to get an admin account to navigate to this page and click it for you.
Download as a PDF
SharePoint setup
In order to download a file as a PDF, we need somewhere to upload it first. For this, we’re going to piggyback off of SharePoint.
We want to create a SharePoint site that isn’t linked to any group, to do this, navigate to https://www.office.com, open the admin panel then the SharePoint tab.
Go to sites -> Active Sites, then click create.
In the menu that pops up, click “Other Options”, then select the “document center” template (the template probably doesn’t matter, but this one has a nice list of any files that remain on the server if they fail to delete). Finally, set a name and note this down, you’ll need it in the next step.
With the site created, we need its ID in order to make API calls using it. We can find this ID by navigating to https://developer.microsoft.com/en-us/graph/graph-explorer, signing in using the same account that was set as an administrator to the SharePoint site, setting the URL to
https://graph.microsoft.com/v1.0/sites/YOUR-DOMAIN.sharepoint.com/:/sites/YOUR-SITE-NAME/?$select=id
replacing “YOUR-SITE-NAME” with the name of the site that you set in the previous step, and “YOUR-DOMAIN” to your SharePoint domain, and executing the query
After executing the query, you should get a response that looks like this:
Take note of the ID, we will use this for uploading to and downloading from the SharePoint.
Project setup
To create the project, azure offers a lovely archetype which sets up the scaffolding of the project for us. More information can be found here:
https://docs.microsoft.com/en-us/azure/azure-functions/functions-reference-java
To create your project, run the following command:
mvn archetype:generate -DarchetypeGroupId=com.microsoft.azure -DarchetypeArtifactId=azure-functions-archetype
This will run interactively, asking for a groupId, artifactId, version, etc. Fill these in, then open the resulting project.
You should delete the files in /src/test, as we’re going to make some changes that will break these tests, which would prevent the Maven build from finishing.
Additional dependencies
Add the following to the dependencies section of your pom:
<dependency> <groupId>org.apache.httpcomponents</groupId> <artifactId>httpclient</artifactId> <version>4.5.13</version> </dependency> <dependency> <groupId>com.google.code.gson</groupId> <artifactId>gson</artifactId> <version>2.9.0</version> </dependency>
Local Settings
"graph:Endpoint": "https://login.microsoftonline.com/", "graph:GrantType": "client_credentials", "graph:Scope": "Files.ReadWrite.All", "graph:Resource": "https://graph.microsoft.com", "graph:TenantId": "YOUR-TENANT-ID", "graph:ClientId": "YOUR-APPLICATION-CLIENT-ID", "graph:ClientSecret": "YOUR-APPLICATION-CLIENT-SECRET", "pdf:GraphEndpoint": "https://graph.microsoft.com/beta/", "pdf:SiteId": "YOUR-SHAREPOINT-SITE-ID"
MimeMap
Create a class called MimeMap and add the following to it:
public class MimeMap { // The map between extensions and Mimetypes private static final HashMap<String, String> map = new HashMap<>(); static { // Add each office file extension and it's mimetype to the map // The source of these mappings can be found here: https://stackoverflow.com/a/4212908 map.put("doc", "application/msword"); map.put("dot", "application/msword"); map.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"); map.put("dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template"); map.put("docm", "application/vnd.ms-word.document.macroEnabled.12"); map.put("dotm", "application/vnd.ms-word.template.macroEnabled.12"); map.put("xls", "application/vnd.ms-excel"); map.put("xlt", "application/vnd.ms-excel"); map.put("xla", "application/vnd.ms-excel"); map.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"); map.put("xltx", "application/vnd.openxmlformats-officedocument.spreadsheetml.template"); map.put("xlsm", "application/vnd.ms-excel.sheet.macroEnabled.12"); map.put("xltm", "application/vnd.ms-excel.template.macroEnabled.12"); map.put("xlam", "application/vnd.ms-excel.addin.macroEnabled.12"); map.put("xlsb", "application/vnd.ms-excel.sheet.binary.macroEnabled.12"); map.put("ppt", "application/vnd.ms-powerpoint"); map.put("pot", "application/vnd.ms-powerpoint"); map.put("pps", "application/vnd.ms-powerpoint"); map.put("ppa", "application/vnd.ms-powerpoint"); map.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation"); map.put("potx", "application/vnd.openxmlformats-officedocument.presentationml.template"); map.put("ppsx", "application/vnd.openxmlformats-officedocument.presentationml.slideshow"); map.put("ppam", "application/vnd.ms-powerpoint.addin.macroEnabled.12"); map.put("pptm", "application/vnd.ms-powerpoint.presentation.macroEnabled.12"); map.put("potm", "application/vnd.ms-powerpoint.template.macroEnabled.12"); map.put("ppsm", "application/vnd.ms-powerpoint.slideshow.macroEnabled.12"); map.put("mdb", "application/vnd.ms-access"); } public static String getMimeType(String extension) { return map.get(extension); } public static boolean checkOfficeMimeType(String mimeType) { return map.containsValue(mimeType); } public static String getExtension(String mimeType) { Optional<Map.Entry<String, String>> extension = map.entrySet().stream().filter((entry) -> entry.getValue().equals(mimeType)).findFirst(); return extension.map(Map.Entry::getKey).orElse(null); } }
File Service
File Service: Authentication
/** * Requests an access token from Azure to authorise future requests with */public static String getAccessToken() throws IOException, HttpException { List<NameValuePair> values = new ArrayList<>(); values.add(new BasicNameValuePair("client_id", System.getenv("graph:ClientId"))); values.add(new BasicNameValuePair("client_secret", System.getenv("graph:ClientSecret"))); values.add(new BasicNameValuePair("scope", System.getenv("graph:Scope"))); values.add(new BasicNameValuePair("grant_type", System.getenv("graph:GrantType"))); values.add(new BasicNameValuePair("resource", System.getenv("graph:Resource"))); String path = System.getenv("graph:EndPoint") + System.getenv("graph:TenantId") + "/oauth2/token"; HttpClient client = HttpClients.createDefault(); HttpPost post = new HttpPost(path); post.setEntity(new UrlEncodedFormEntity(values)); HttpResponse response = client.execute(post); if (response.getStatusLine().getStatusCode() == 200) { Reader reader = new InputStreamReader(response.getEntity().getContent()); Gson gson = new Gson(); JsonObject json = gson.fromJson(reader, JsonObject.class); JsonElement token = json.get("access_token"); return token != null ? token.getAsString() : null; } else { throw new HttpException("Failed to get access token: " + response.getStatusLine().getStatusCode() + " - " + response.getStatusLine().getReasonPhrase()); } }
private static String auth; /** * Creates a header that contains the authorisation required to make requests to our azure environment */private static Header createAuthorisedHeader() throws IOException, HttpException { if (auth == null) { auth = getAccessToken(); } return new BasicHeader("Authorization", "Bearer " + auth); }
File Service: File Upload
/** * Uploads the given file to the given sharepoint storage * @param path The sharepoint address * @param content An input stream of the file being uploaded * @param contentLength The length in bytes of the file being uploaded * @param contentType The Mimetype of the file being uploaded * @return The id of the file in the sharepoint storage * @throws HttpException when an unexpected answer is received while making an HTTP Request * @throws IOException when an HTTP request fails */ public static String uploadStream(String path, InputStream content, long contentLength, String contentType) throws HttpException, IOException { HttpClient client = HttpClients.createDefault(); String fileName = UUID.randomUUID() + "." + MimeMap.getExtension(contentType); HttpPut put = new HttpPut(path + "root:/" + fileName + ":/content"); put.addHeader(createAuthorisedHeader()); InputStreamEntity entity = new InputStreamEntity(content, contentLength, ContentType.create(contentType)); entity.setChunked(true); put.setEntity(entity); HttpResponse response = client.execute(put); if (response.getStatusLine().getStatusCode() < 300) { Reader reader = new InputStreamReader(response.getEntity().getContent()); Gson gson = new Gson(); JsonObject json = gson.fromJson(reader, JsonObject.class); JsonElement token = json.get("id"); if (token == null) throw new HttpException("Failed to upload file to sharepoint: response did not contain file ID"); return token.getAsString(); } else { throw new HttpException("Failed to upload file to sharepoint: " + response.getStatusLine().getStatusCode() + " - " + response.getStatusLine().getReasonPhrase()); } }
File Service: File Download
/** * Download the file with the given fileId in the targetFormat * @param path The sharepoint address * @param fileId The ID of the file to download * @param targetFormat The target format to download the file in * @return a byte array containing the converted file * @throws HttpException when an unexpected answer is received while making an HTTP Request * @throws IOException when an HTTP request fails or the converted file cannot be read from the response */public static byte[] downloadConvertedFile(String path, String fileId, String targetFormat) throws IOException, HttpException { HttpClient client = HttpClients.createDefault(); HttpGet get = new HttpGet(path + fileId + "/content?format=" + targetFormat); get.addHeader(createAuthorisedHeader()); HttpResponse response = client.execute(get); if (response.getStatusLine().getStatusCode() < 300) { return response.getEntity().getContent().readAllBytes(); } else { throw new HttpException("Failed to fetch converted file: " + response.getStatusLine().getStatusCode() + " - " + response.getStatusLine().getReasonPhrase()); } }
File Service: File Deletion
/** * Delete the file with the given fileId * @param path The sharepoint address * @param fileId The ID of the file to delete * @throws HttpException when an unexpected answer is received while making an HTTP Request * @throws IOException when an HTTP request fails */public static void deleteFile(String path, String fileId) throws HttpException, IOException { HttpClient client = HttpClients.createDefault(); HttpDelete delete = new HttpDelete(path + fileId); delete.addHeader(createAuthorisedHeader()); HttpResponse response = client.execute(delete); if (response.getStatusLine().getStatusCode() >= 300) { throw new HttpException("Failed to delete file: " + response.getStatusLine().getStatusCode() + " - " + response.getStatusLine().getReasonPhrase()); } }
Putting it All Together
First we need to make sure to set the value of the FunctionName annotation to a more appropriate value, then we want to set the HttpTrigger annotation to only listen to the POST HTTP method, have a binary data type, and a route of convert.
Finally, we want to get the file from the request, do some error checking, and run our upload, download, and delete functions.
public class Function { @FunctionName("Office2PDF") public HttpResponseMessage run( @HttpTrigger( name = "req", route = "convert", methods = {HttpMethod.POST}, authLevel = AuthorizationLevel.ANONYMOUS, dataType = "binary") HttpRequestMessage<Optional<byte[]>> request, final ExecutionContext context) { context.getLogger().info("Java HTTP trigger processed a request."); if (request.getBody().isEmpty()) { return request.createResponseBuilder(HttpStatus.BAD_REQUEST).body("File must be attached to request").build(); } byte[] body = request.getBody().get(); // In order for this to accept a raw file, the content type needs to be "application/octet-stream", however, we // still need rely on the content type of the original file, thus we need it delivered separately String mimeType = request.getHeaders().get("content-type-actual"); if (mimeType == null || mimeType.isEmpty()) { return request.createResponseBuilder(HttpStatus.UNSUPPORTED_MEDIA_TYPE).body("Please provide the file's mime-type in the header with the key: Content-Type-Actual").build(); } else if (!MimeMap.checkOfficeMimeType(mimeType)) { return request.createResponseBuilder(HttpStatus.UNSUPPORTED_MEDIA_TYPE).body("Content-Type-Actual must be a valid office type").build(); } String path = System.getenv("pdf:GraphEndpoint") + "sites/" + System.getenv("pdf:SiteId") + "/drive/items/"; String fileId = null; try (InputStream stream = new ByteArrayInputStream(body)) { fileId = FileService.uploadStream(path, stream, body.length, mimeType); byte[] pdf = FileService.downloadConvertedFile(path, fileId, "pdf"); return request.createResponseBuilder(HttpStatus.OK).body(pdf).build(); } catch (IOException e) { context.getLogger().warning(e.getMessage()); return request.createResponseBuilder(HttpStatus.INTERNAL_SERVER_ERROR).body(e.getMessage()).build(); } catch (HttpException e) { context.getLogger().warning(e.getMessage()); return request.createResponseBuilder(HttpStatus.INTERNAL_SERVER_ERROR).body(e.getMessage()).build(); } finally { // Since we can exit early during the conversion, we need to make sure that if a file was created, it gets // deleted, successful conversion or not if (fileId != null) { try { FileService.deleteFile(path, fileId); } catch (HttpException | IOException e) { context.getLogger().warning(e.getMessage()); } } } } }
Note: in order for azure functions to accept the file as a binary file, the Content-Typeheader must be set to “application/octet-stream”, as we still need to know the filetype, we also pass that in a second header: Content-Type-Actual.
Note: We delete the file in a finally statement to ensure that it gets deleted even if we fail to convert it
The full code can be found here: https://github.com/idrsolutions/azure-office-conversion
Testing
mvn azure-functions:run
Pushing to azure
mvn clean package
mvn azure-functions:deploy
Finishing up on azure
Azure -> Functions app -> Your function name
"graph:Endpoint": "https://login.microsoftonline.com/", "graph:GrantType": "client_credentials", "graph:Scope": "Files.ReadWrite.All", "graph:Resource": "https://graph.microsoft.com", "graph:TenantId": "YOUR-TENANT-ID", "graph:ClientId": "YOUR-APPLICATION-CLIENT-ID", "graph:ClientSecret": "YOUR-APPLICATION-CLIENT-SECRET", "pdf:GraphEndpoint": "https://graph.microsoft.com/beta/", "pdf:SiteId": "YOUR-SHAREPOINT-SITE-ID"
Are you a Developer working with PDF files?
Our developers guide contains a large number of technical posts to help you understand the PDF file Format.
Find out more about our software for Developers
|
|
|