Recent Posts

Troubleshooting in Redshift

Redshift is a one of the most popular data warehousing solution, thousands of companies running millions of ETL jobs everyday. The problem with MPP systems i...

Speculative Execution in Hadoop

Speculative execution is an optimization technique where a computer system tries to predict the slowness the task execution and run a backup task the slownes...

Spell-correct in Python - Part 1

One of the most common problem to face while solving a text processing use case is wrongly spelt words, swapping every wrongly spelt word with a mapping dict...

Introduction To Pandas

Pandas is one of the widely used Python libraries for working with data. it is built on libraries like Matplotlib and NumPy. Pandas is great for data manipul...

Refreshing Probability

Probability: is defined as the measure of the likelihood that an event will occur. Measured by the ratio of the favorable event to the whole number of events...