Category Archives: R Programming

My R learning curve

As a Data Scientist / Analyst we may receive data in different/variety of formats, and also allows to wide range of tools to import data and widely supported formats can be as follows:

R Data input formats R Data input format2

  1. Keyboard
  2. Text Files
  3. Excel
  4. Structured Database – Oracle, MySQL, etc.,
  5. Unstructured / Hybrid Database – eg XML; HDFS, etc.,

and the details are as follows:

Data Scientist role

The role of the Data Scientist came up with the Big Data area. But it’s not a quite new role in the enterprise business. Before we called them statisticians or subject matter experts. So what makes him now so different and what skills brings a Data Scientist to his role? Drew Conway published a very interesting illustration, called the “Data Science Venn Diagram”:

What is data scientist

About data scientists

Rising alongside the relatively new technology of big data is the new job title data scientist. While not tied exclusively to big data projects, the data scientist role does complement them because of the increased breadth and depth of data being examined, as compared to traditional roles.

So what does a data scientist do?

A data scientist represents an evolution from the business or data analyst role. The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math. What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge. Good data scientists will not just address business problems, they will pick the right problems that have the most value to the organization.

The data scientist role has been described as “part analyst, part artist.” Anjul Bhambhri, vice president of big data products at IBM, says, “A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It’s almost like a Renaissance individual who really wants to learn and bring change to an organization.”

Whereas a traditional data analyst may look only at data from a single source – a CRM system, for example – a data scientist will most likely explore and examine data from multiple disparate sources. The data scientist will sift through all incoming data with the goal of discovering a previously hidden insight, which in turn can provide a competitive advantage or address a pressing business problem. A data scientist does not simply collect and report on data, but also looks at it from many angles, determines what it means, then recommends ways to apply the data.

Data scientists are inquisitive: exploring, asking questions, doing “what if” analysis, questioning existing assumptions and processes. Armed with data and analytical results, a top-tier data scientist will then communicate informed conclusions and recommendations across an organization’s leadership structure.

What does a data scientist do?

This is one of the better descriptions, I have seen, for what a data scientist does.

They must find interesting, novel, and useful insights about the real world in the data. And they must turn those insights into products and services, and deliver those products and services at a profit.

Notice, data scientists don’t just need to find insights in data. They also need create profitable products from that insight. I often times feel that data products are not seen as important as improving the machine learning algorithms, but the data products really are the end goal.