|
Apache Log Analysis using Pig
Analyze your Apache logs using Pig and Amazon Elastic MapReduce.
Last Modified:
Aug 10, 2009 5:09 PM
|
|
|
Processing and Loading Data from Amazon S3 to the Vertica Analytic Database
The Amazon Elastic MapReduce (EMR) service lets users create massively distributed data processing tasks built on Map and Reduce functions. Amazon Elastic Compute Cloud (EC2) lets users run any software on a scale-out compute platform. EC2 can, for example, be used for large-scale data analysis by running an analytic database management system. Data analysis tasks often start with a processing phase in which unstructured or semi-structured data is processed or transformed before being loaded into a relational database. This example shows how to use EMR to process a data set from S3 and load it into the Vertica Analytic Database running on EC2.
Last Modified:
May 30, 2009 7:08 AM
|
|
|
LogAnalyzer for Amazon CloudFront
Analyze your Amazon CloudFront logs using Amazon Elastic MapReduce.
Last Modified:
Jun 1, 2009 11:02 AM
|
|
|
Cascading.Multitool
A command-line tool for processing large data sets.
Last Modified:
Apr 6, 2009 2:49 PM
|
|
|
FreeBase
FreebaseDataProcessor is a simple Hadoop streaming application that finds the most popular items in the given Freebase data input and loads them into Amazon SimpleDB.
Last Modified:
Apr 2, 2009 1:53 PM
|
|
|
ItemSimilarity
ItemSimilarity is a simple Hadoop streaming Python application that attempts to find similar items for each item in the input dataset. This example application finds similar artists using the Audioscrobbler user playlist dataset and Amazon Elastic MapReduce.
Last Modified:
Apr 2, 2009 2:49 PM
|
|
|
CloudBurst
CloudBurst provides highly sensitive short-read mapping with MapReduce.
Last Modified:
Apr 2, 2009 1:53 PM
|
|
|
Word Count Example
This example shows how to use Hadoop Streaming to count how many times each word occurs within a text collection.
Last Modified:
Apr 2, 2009 1:53 PM
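As a rough illustration of the word count idea, the sketch below shows what a streaming mapper and reducer might look like in Python. It is not the code shipped with the sample: the function names map_words and reduce_counts are hypothetical, and it assumes the usual Hadoop Streaming convention of tab-separated key/value lines, with the reducer receiving its input sorted by key.

```python
from itertools import groupby


def map_words(lines):
    """Mapper: emit one tab-separated "word<TAB>1" record per word."""
    for line in lines:
        # Assumption: words are whitespace-delimited and case-insensitive.
        for word in line.lower().split():
            yield f"{word}\t1"


def reduce_counts(records):
    """Reducer: sum the counts for each word.

    Relies on the records arriving sorted by word, which is what the
    shuffle phase provides between the map and reduce steps.
    """
    pairs = (rec.rstrip("\n").split("\t", 1) for rec in records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"
```

In a real streaming job, each function would live in its own script that reads sys.stdin and prints every yielded record. Locally, the pipeline can be imitated by sorting the mapper output before passing it to the reducer.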
|
|