10 Big Data Do's and Don'ts

By Kaushik Pal
Published: September 29, 2014 | Last updated: September 30, 2014
Key Takeaways

Big data is a new and emerging domain for most companies. Making it work takes careful fine-tuning and use of best practices.

Big data carries a lot of promise for all types of industries. Leveraged effectively and efficiently, it can have a significant impact on decision-making and analytics. But those benefits can only be achieved if the data is managed in a structured way. Best practices for big data are still taking shape, but some clear do's and don'ts have already emerged when it comes to implementation.

The following guidance is based on practical experience and knowledge gathered from real-life projects. Here are my top big data do's and don'ts.

Do involve all business sections in a big data initiative

A big data initiative is not an isolated, independent activity; the involvement of all business units is a must to get real value and insight. Big data helps organizations leverage large volumes of data to gain insight into customer behavior, events, trends and predictions. This is not possible with a data snapshot, which captures only a fraction of the data a big data platform processes. As a result, companies are increasingly focusing on all types of data coming from every available channel and business unit in order to recognize the correct patterns.

Do evaluate all infrastructure models for big data implementation

The volume of data and its management is a major concern for any big data initiative. Because big data deals with petabytes of data, managing it requires serious data center infrastructure. At the same time, cost has to be considered before selecting and finalizing any storage facility. Cloud services are often the best choice, but the offerings of different cloud providers must be evaluated to determine the appropriate one. As storage is one of the most important components of any big data implementation, it should be evaluated very carefully. (Get another perspective in Today's Big Data Challenge Stems from Variety, Not Volume or Velocity.)

Do consider traditional data sources in big data planning

There are many sources of big data, and their number is increasing day by day. This huge volume of data is used as input to big data processing, which leads some companies to assume that traditional data sources are of no use. This is not true: traditional data is a critical component of any big data success story. It contains valuable information and should be used in conjunction with other big data sources. The real value of big data can only be derived if all data sources, traditional and non-traditional, are taken into account. (Learn more in Take That, Big Data! Why Small Data May Pack a Bigger Punch.)

Do consider a consistent set of data

In a big data environment, data comes from various sources, and its format, structure and type vary from one source to another. Crucially, the data is usually not cleansed when it arrives in your big data environment. So, before you trust the incoming data, you need to check its consistency through repeated observation and analysis. Once the consistency of the data is confirmed, you can derive a consistent set of metadata describing it. Finding that consistent metadata by carefully observing the patterns in the data is an essential exercise in any big data planning.
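As a minimal illustration of this kind of consistency check, the Python sketch below infers a field-to-type "metadata profile" from a sample of incoming records and flags fields that arrive with more than one type. The record structure and field names are hypothetical; a real pipeline would use a proper schema or data-quality tool.

```python
# Illustrative sketch: inferring a metadata profile by repeated
# observation of incoming records (field names are hypothetical).

def infer_profile(records):
    """Build a field -> set-of-type-names profile from sample records."""
    profile = {}
    for record in records:
        for field, value in record.items():
            profile.setdefault(field, set()).add(type(value).__name__)
    return profile

def find_inconsistencies(profile):
    """Fields observed with more than one type are flagged as inconsistent."""
    return {f: types for f, types in profile.items() if len(types) > 1}

sample = [
    {"customer_id": 101, "amount": 25.0},
    {"customer_id": 102, "amount": 13.5},
    {"customer_id": "103", "amount": 9.99},   # id arrived as a string
]

print(find_inconsistencies(infer_profile(sample)))
# {'customer_id': {'int', 'str'}}  (set order may vary)
```

Once such a profile stabilizes over repeated samples, it can serve as the consistent metadata the section describes.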

Do distribute the data

The volume of data is a major concern when we consider a processing environment. Because of the huge volumes involved, processing big data on a single server is impractical. A common solution is Hadoop, a distributed computing framework that runs on commodity hardware and speeds up processing by spreading the work across multiple nodes. (Learn more in 7 Things to Know About Hadoop.)
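To make the distributed model concrete, here is a word-count sketch in the MapReduce style that Hadoop popularized, simulated locally in Python. In a real Hadoop Streaming job the mapper and reducer would run as separate processes on many nodes, with the framework handling the shuffle-and-sort step between them.

```python
# MapReduce-style word count, simulated on one machine. In Hadoop,
# mapper and reducer run on distributed nodes; the framework sorts
# and groups the intermediate (key, value) pairs between the phases.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit a (word, 1) pair for each word in a line."""
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    """Reduce phase: sum all counts emitted for one word."""
    return word, sum(counts)

def run_job(lines):
    """Simulate shuffle-and-sort: map, sort by key, group, then reduce."""
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return dict(reducer(key, (c for _, c in group))
                for key, group in groupby(pairs, key=itemgetter(0)))

print(run_job(["big data big insight", "data wins"]))
# {'big': 2, 'data': 2, 'insight': 1, 'wins': 1}
```

The value of the pattern is that the map and reduce functions stay the same whether the job runs on one laptop or a thousand-node cluster.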

Don't ever rely on a single big data analytics approach

Various technologies are available in the market for processing big data, many of them built on Apache Hadoop and MapReduce. Therefore, it is important to match the right technology to the right purpose. Some of the important analytics approaches are predictive analytics, prescriptive analytics, text analytics and stream data analytics. Selecting the appropriate approach is important to achieving the desired goal, so avoid relying on a single approach; investigate several and choose the best match for your solution.

Don't start a large big data initiative before you are ready

It is always advisable to start any big data initiative with small steps. Begin with pilot projects to gain expertise, and then move on to the actual implementation. The potential of big data is very impressive, but its real value can only be achieved once mistakes are reduced and expertise is gained.

Don't use data in isolation

Big data sources are all around us, and their number is increasing day by day. Integrating all of this data is important for producing correct analytics output. Different integration tools are available in the market, but they should be evaluated properly before use. Integrating big data is a complex task, because data from different sources comes in different formats, but it is essential to achieving good analytics results.
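A toy version of that integration step might look like the following Python sketch, which normalizes records from a CSV source and a JSON source, each using different field names, into one shared schema before analysis. The sources, field names and schema here are all hypothetical.

```python
# Illustrative sketch: merging two differently formatted sources
# into one common schema (field names are hypothetical).
import csv
import io
import json

def from_csv(text):
    """Map a CSV source's columns onto a common customer/spend schema."""
    return [{"customer": row["name"], "spend": float(row["total"])}
            for row in csv.DictReader(io.StringIO(text))]

def from_json(text):
    """Map a JSON source's differently named fields onto the same schema."""
    return [{"customer": rec["cust_name"], "spend": float(rec["amount"])}
            for rec in json.loads(text)]

csv_source = "name,total\nAlice,120.50\nBob,80.00"
json_source = '[{"cust_name": "Carol", "amount": 55.25}]'

# Once normalized, records from both sources can be analyzed together.
combined = from_csv(csv_source) + from_json(json_source)
print(len(combined))  # 3
```

Real integration tools add much more (schema mapping, deduplication, lineage), but the core idea is the same: no source is analyzed in isolation.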

Don't ignore data security

Data security is a major consideration in big data planning. Initially, before any processing is done, the data may run to petabytes, and security tends not to be strictly implemented at that stage. But after some processing, you are left with a subset of data that provides real insight, and at this point data security becomes essential. The more the data is processed and fine-tuned, the more valuable it typically becomes to the organization. This fine-tuned output is intellectual property and must be secured. Data security must be implemented as part of the big data life cycle.
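One small, concrete piece of that life cycle is pseudonymizing sensitive fields before a processed subset is shared. The Python sketch below replaces chosen fields with salted SHA-256 digests; the record layout and field names are hypothetical, and a real deployment would also need key management, encryption at rest and access controls.

```python
# Illustrative sketch: pseudonymizing sensitive fields in a processed
# data subset before sharing it. This is only one layer of security;
# real systems also need encryption at rest and access control.
import hashlib

def pseudonymize(record, sensitive_fields, salt):
    """Replace sensitive values with truncated salted SHA-256 digests."""
    masked = dict(record)
    for field in sensitive_fields:
        if field in masked:
            digest = hashlib.sha256((salt + str(masked[field])).encode())
            masked[field] = digest.hexdigest()[:16]
    return masked

row = {"email": "user@example.com", "segment": "premium", "ltv": 1840}
safe_row = pseudonymize(row, ["email"], salt="s3cret")
print(safe_row["email"] != row["email"])  # True
```

The same customer still maps to the same token (useful for joins), while the raw identifier never leaves the secured environment.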

Don't ignore the performance part of big data analytics

The output of big data analytics is only useful when it performs well. Big data offers deeper insights by processing huge amounts of data at speed, so it is essential to manage that processing effectively and efficiently. If the performance of a big data system is not managed carefully, it will cause problems and render the entire effort meaningless.

We have focused here on the do's and don'ts of big data initiatives. Big data is an emerging area, and when it comes to implementation, many companies are still in the planning phase. Understanding big data best practices is essential to minimizing risk and mistakes. The points above are derived from live project experience, so they should provide useful guidelines for making a big data strategy successful.



Written by Kaushik Pal | Contributor

Kaushik is a technical architect and software consultant with over 20 years of experience in software analysis, development, architecture, design, testing and training. He has an interest in new technologies and areas of innovation, focusing on web architecture, web technologies, Java/J2EE, open source, WebRTC, big data and semantic technologies. Kaushik is also the founder of TechAlpine, a technology blog and consultancy firm based in Kolkata. The TechAlpine team works for clients in India and abroad, with expertise in Java/J2EE, open source, web, WebRTC, Hadoop and big data technologies, as well as technical writing.
