Tuesday, February 5, 2013

Automate Data Transformation with FlowForce Server

Altova designed FlowForce Server to provide comprehensive automation, management, and control over data transformations performed by dedicated high-speed servers. FlowForce Server Beta 3 is currently available to users of MapForce Enterprise and Professional Editions at no charge during the beta test period.

FlowForce Server can provide hot folder automation of data mappings and maintains a detailed activity log users can monitor remotely in a Web browser window. The screenshot below shows the log for FlowForce Server running the MapForce data mapping CameraLogToGPX we wrote about in the blog post titled Process Multiple Input Files in a Single Data Mapping. This mapping used wildcards to specify multiple input files for processing.

FlowForce Server job log

It only takes a few minutes to set up, run, and review the results of jobs like this on FlowForce Server.

Wildcards or Hot Folders?

Wildcards and hot folders increase the complexity of a data transformation workflow, and using them successfully requires careful planning. Let’s take a minute to look a little deeper at the scenario we want to implement.

Assume we are the IT department in a company that publishes nature and hiking guides. We employ photographers who go out trekking and record their routes as they go, using the GPS tracking feature of their digital cameras. We want to convert the camera GPS log files to XML-based .gpx format for mapping and other processing.

We will publish a folder on our network where photographers can drop off their GPS log files. This will be the hot folder FlowForce Server watches for new files to supply as input to the CameraLogToGPX mapping.

We only need to process each input file once. So, after the data transformation is complete, we can remove the input file from the hot folder. We also want to place the output file in a separate folder.

This suggests the following FlowForce Server job steps:

  • Look in the hot folder to see if new a input files has arrived
  • Perform the data mapping on the input file and place the output file in a separate folder
  • Move the input file to a permanent location

The diagram below shows a folder structure we can use for the workflow, with files ready to drop into the hot folder for processing:

Hot folder structure

The hot folder is C:\CameraGPS\hotFolder and the generated .gpx files will be placed in C:\CameraGPS\outputFiles. When the data mapping is done, input files will be moved to C:\CameraGPS\completedInput.

Deploy the Mapping on a Server

MapForce Beta is a component of FlowForce Server that works just like MapForce, and adds a feature to deploy mapping files to a FlowForce Server. Existing data mappings need little or no special preparation for deployment.

The only thing we need to think about are the filenames of the input and output files. We will instruct FlowForce Server to provide the input filename as a job parameter as new files arrive in the hot folder for processing. The original mapping assigned a wildcard filename for the input component, which is no longer needed. We will also want FlowForce to specify the location of the output file.

We can open the mapping in MapForce Beta, remove the filename from the input component, and add a remove-folder filepath function to the output file so FlowForce Server can set the destination. The screenshot below shows the new filename definitions in the mapping.

Mapping file names

In the MapForce Beta dialog to deploy the mapping, we can choose to immediately open the mapping in a FlowForce Server job definition window in a Web browser to finish defining the job operations.

Deploy mapping dialog

Defining the Job in FlowForce Server

The screenshot below shows the complete job steps defined in a FlowForce Server job properties window:

FlowForce Server job definition

The job trigger is defined at the bottom of the window. Every 30 seconds, FlowForce Server will check the hot folder. If the contents have changed, FlowForce Server will execute the job steps. Each Execution step could be a deployed MapForce mapping, a system step, or even another FlowForce Server job.

The name of each new file entering the hot folder becomes the parameter called {triggerfile} that we use in the mapping step as the input filename, and in the move step as the name of the file to be moved.

The Working-directory parameter in the mapping step defines where the output files will be placed.

FlowForce Server also includes features to set automatic run and stop times for jobs, user permissions and roles, and Queue settings to define minimum time between job runs and maximum parallel instances of a job.

In our scenario, we are likely to receive multiple input files in groups as they are copied from photographers’ memory cards. Multiple parallel runs can greatly improve throughput. As a rule of thumb, you might want to match the number of cores or CPUs in the machine running FlowForce Server.

FlowForce Server job queue settings

Exploring the Job Log

Each time FlowForce Server runs our job, six lines are added to the Log View shown in the illustration at the top of this post. The first and last lines record the start and completion of the job, and each Execution step generated its own start and completion messages. The phrase “completed with status: 0” means the step was successful with no errors.

FlowForce job log

We can click the more links for a detailed report on each Execution step. The screenshot below shows the message for the MapForce mapping Execution step:

FlowForce job detail

Each file dropped into the hot folder generates an individual FlowForce Server job instance, even if multiple files are added as a group. This makes it easy to track any individual input file that generates an error.

When we dropped four files into the hot folder, FlowForce Server ran four jobs, and the contents of the output folder looked like this:

Job results folder

Click here to read more about FlowForce Server Beta 3 at the Altova Web site, or visit the FlowForce Server Beta 3 download page to get started automating data transformations in your data center!


DaveMcG said...

In case you missed it, we described the CameraLogToGPX data mapping in an earlier post. Click here to read the full story.

dkershaw said...

Nice post! This example reminds me of setting up VAN end points for EDI processing. MapForce + FlowForce would have been really useful to me -- wish I'd had them!