MapReduce Forum

Data type for precision – BigDecimal?

  • #49504
    Marco Shaw


    I’m playing around with some pretty simple data, but I’m struggling a bit. I’m basically pulling out a text field and a dollar-amount field to work on sales calculations.

    My output ends up looking like this:

    San Jose 9936721.410000008
    Santa Ana 1.0050309929999959E7

    So I have a problem with both precision and how the data is formatted on output.
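    The trailing digits come from binary floating point: a double cannot represent most decimal fractions exactly, so tiny errors accumulate as the amounts are summed. A minimal standalone illustration of the effect (plain Java, no Hadoop involved):

    ```java
    public class DoublePrecisionDemo {
        public static void main(String[] args) {
            // Summing ten dimes in double arithmetic does not give exactly 1.0,
            // because 0.1 has no exact binary representation.
            double sum = 0.0;
            for (int i = 0; i < 10; i++) {
                sum += 0.1;
            }
            System.out.println(sum); // prints 0.9999999999999999
        }
    }
    ```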

    What I’ve read suggests that BigDecimal is the data type I should be using for currency, but I’m struggling a bit with how to take that type and convert it into one of the Writable classes. Which Writable class should I be using? Should I do my calculations in BigDecimal, and then write out the K,V as Text?

    My mapper:

    import java.io.IOException;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class TotalSalesMapper extends
            Mapper<LongWritable, Text, Text, DoubleWritable> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] data = value.toString().split("\t");
            if (data.length == 6) {
                String store = data[2];
                double cost = Double.parseDouble(data[4]);
                //BigDecimal cost = new BigDecimal(data[4]);
                context.write(new Text(store), new DoubleWritable(cost));
            }
        }
    }

    My reducer:

    import java.io.IOException;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class TotalSalesReducer extends
            Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double sum = 0;  // start at 0, not Double.MIN_VALUE
            for (DoubleWritable value : values) {
                sum += value.get();
            }
            context.write(key, new DoubleWritable(sum));
        }
    }
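    On the BigDecimal question above, one approach is to do the arithmetic in BigDecimal and serialize the result as a String (i.e. as a Text value in the job), since core Hadoop's org.apache.hadoop.io package ships no BigDecimal Writable. A standalone sketch of just the arithmetic, using made-up amounts in place of the real tab-separated input:

    ```java
    import java.math.BigDecimal;

    public class BigDecimalSumDemo {
        public static void main(String[] args) {
            // Parse each dollar amount from its original text so nothing is
            // lost on the way in, then accumulate with exact decimal arithmetic.
            String[] amounts = {"9936721.41", "0.10", "0.20"};
            BigDecimal sum = BigDecimal.ZERO;
            for (String a : amounts) {
                sum = sum.add(new BigDecimal(a));
            }
            // toPlainString() avoids scientific notation like 1.005...E7,
            // so the total can be emitted cleanly as a Text value.
            System.out.println(sum.toPlainString()); // prints 9936721.71
        }
    }
    ```

    Parsing the amount directly with `new BigDecimal(data[4])` (rather than going through `Double.parseDouble` first) matters: converting to double and back reintroduces the rounding error the BigDecimal is supposed to avoid.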
