Use tools such as Apache Storm, MongoDB, Cassandra, Cloudera, and OpenRefine.
Make the most of your large data sets: keep your raw data raw and never manipulate it without keeping a copy, visualize the information, develop a workflow, use version control, record metadata, and make computing time count.
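Two of these tips, keeping raw data raw and recording metadata, can be illustrated with a minimal Python sketch. It copies a raw file into a separate working directory and writes a small JSON sidecar with the source path, copy time, size, and SHA-256 checksum. That checksum lets you confirm later that the original was never touched. The function names and the `.meta.json` sidecar convention are hypothetical choices for illustration. They are not part of any tool listed above.

```python
import hashlib
import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks so large files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def make_working_copy(raw_path: Path, work_dir: Path) -> Path:
    """Copy raw_path into work_dir and write a metadata sidecar next to the copy.

    The raw file is only read, never modified; downstream edits should target the copy.
    The '<name>.meta.json' sidecar naming is a hypothetical convention.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    copy_path = work_dir / raw_path.name
    shutil.copy2(raw_path, copy_path)  # copy2 also preserves file timestamps
    metadata = {
        "source": str(raw_path.resolve()),
        "copied_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": raw_path.stat().st_size,
        "sha256": sha256_of(raw_path),
    }
    meta_path = copy_path.with_name(copy_path.name + ".meta.json")
    meta_path.write_text(json.dumps(metadata, indent=2))
    return copy_path


def raw_unchanged(raw_path: Path, meta_path: Path) -> bool:
    """Return True if the raw file still matches the checksum recorded in the sidecar."""
    recorded = json.loads(meta_path.read_text())["sha256"]
    return sha256_of(raw_path) == recorded


if __name__ == "__main__":
    # Demo in a throwaway directory: create a raw file, make a working copy, verify the raw file.
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw" / "events.csv"
        raw.parent.mkdir()
        raw.write_text("id,value\n1,42\n")
        copy = make_working_copy(raw, Path(tmp) / "work")
        print(raw_unchanged(raw, copy.with_name(copy.name + ".meta.json")))
```

Writing the metadata as a sidecar file, not inside the data itself, keeps the raw file byte-for-byte intact. The sidecar is also plain text, so it can be committed to version control alongside your workflow scripts.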