MapReduce Fun: Sampling for Large Data Set

This post is by Chou-han Yang, principal engineer at BloomReach. The coolest thing about MapReduce is that we suddenly have enormous computing power and storage at our disposal. To me, it’s like being a kid who suddenly has a new toy and wants to incorporate it into his favorite games. What could be more fun than …

 

Strategies for Reducing Your Amazon EMR Costs

This post is by Prateek Gupta, a lead engineer at BloomReach. It is also cross-posted on the AWS Big Data Blog. BloomReach has built a personalized discovery platform with applications for organic search, site search, content marketing and merchandising. BloomReach ingests data from a variety of sources such as merchant inventory feeds, sitefetch data from merchants’ websites …

 

Open Source at BloomReach

BloomReach benefits enormously from open source software throughout our data processing and serving systems. Our backend data processing and analytics systems use Hadoop, Cassandra and a myriad of libraries from the Apache and Python projects and other communities — and of course Linux. While the bulk of our code is tightly linked to our data …