Working With Big Data

When we work with Big Data we need some way to speed up our algorithms. The common ways to do this are to use a lighter algorithm, to use on-line learning, or to distribute the workload among different computers or processors.

A Light Algorithm to Work With Big Data

One way to speed up our application is to use stochastic gradient descent instead of batch gradient descent. It is important to note that stochastic gradient descent is not meant for small datasets; it only pays off on large ones. The steps are as follows:

  1. Randomly shuffle the dataset
  2. For i = 1…m
  3. Compute the new parameters using only example i

This algorithm does not always land exactly on the global optimum, but it gets very close. A minimal sketch is given below.
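To make the steps concrete, here is a small Python sketch of stochastic gradient descent for linear regression; the squared-error cost, learning rate, and epoch count are illustrative assumptions, not part of the original recipe.

    import numpy as np

    def stochastic_gradient_descent(X, y, lr=0.01, epochs=10):
        """Update the parameters after every single example, not the whole batch."""
        m, n = X.shape
        theta = np.zeros(n)
        rng = np.random.default_rng(0)
        for _ in range(epochs):
            order = rng.permutation(m)      # step 1: randomly shuffle the dataset
            for i in order:                 # step 2: for i = 1..m
                error = X[i] @ theta - y[i]
                theta -= lr * error * X[i]  # step 3: compute the new parameters
        return theta

Because each update uses a single example, the cost of one step does not grow with m, which is exactly what makes the method attractive on large datasets.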

On-line Learning

The intuition behind on-line learning is that each time a new example arrives, we use it to update our algorithm's parameters. The steps are as follows:

  1. Repeat forever
  2. Compute the new parameters based on the new example (x, y)
  3. Discard the example

On-line learning consumes almost no memory, since each example is discarded after use, and it adapts to the user over time. A sketch of the loop follows.
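Here is a minimal Python sketch of that loop, again assuming a linear model with a squared-error update; the example stream, feature count, and learning rate are hypothetical.

    import numpy as np

    def online_learning(example_stream, n_features, lr=0.01):
        """Learn from each (x, y) example as it arrives, then discard it."""
        theta = np.zeros(n_features)
        for x, y in example_stream:  # 1. repeat forever (while examples arrive)
            error = x @ theta - y
            theta -= lr * error * x  # 2. compute new parameters from (x, y)
            # 3. the example is not stored; only theta stays in memory
        return theta

    # Example usage with a simulated stream (hypothetical data):
    # stream = ((np.random.rand(3), 1.0) for _ in range(1000))
    # theta = online_learning(stream, n_features=3)

Since only the parameter vector is kept, memory use stays constant no matter how many examples flow through, and the model keeps tracking whatever the most recent users are doing.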

Distributed Computing

Sometimes we really do need all of the data, so the only option is a distributed approach known as map-reduce. The intuition is as follows, with a sketch after the steps:

  1. Divide the data into n distinct parts
  2. Process each part on a different computer
  3. Merge the results
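As an illustration, here is a map-reduce-style sketch in Python that computes a batch gradient for linear regression; it uses local processes in place of separate computers, and the gradient formula is an assumed example of a computation that splits into a sum.

    from multiprocessing import Pool

    import numpy as np

    def partial_gradient(chunk):
        """Map step: gradient contribution of one slice of the data."""
        X_part, y_part, theta = chunk
        return X_part.T @ (X_part @ theta - y_part)

    def mapreduce_gradient(X, y, theta, n_parts=4):
        # 1. divide the data into n distinct parts
        chunks = [(Xp, yp, theta)
                  for Xp, yp in zip(np.array_split(X, n_parts),
                                    np.array_split(y, n_parts))]
        # 2. process each part on a different worker
        with Pool(n_parts) as pool:
            partials = pool.map(partial_gradient, chunks)
        # 3. merge the results by summing the partial gradients
        return sum(partials) / len(y)

The approach works whenever the expensive computation is a sum over the training set, because each machine can compute its share of the sum independently and only the small partial results need to be sent back and merged.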
