Ariful Mondal
Consulting Practice Partner - Data, Analytics & Artificial Intelligence at Wipro Ltd
Real User
Expert, Moderator
Flexible with support for several programming languages, good visualization and workload management functionality
Pros and Cons
  • "Databricks gives you the flexibility of using several programming languages independently or in combination to build models."
  • "Databricks requires writing code in Python or SQL, so if you're a good programmer then you can use Databricks."

What is our primary use case?

The primary use is for data management and managing workloads of data pipelines.

Databricks can also be used for data visualization, as well as to implement machine learning models. Machine learning development can be done using R, Python, and Spark programming.

What is most valuable?

Databricks gives you the flexibility of using several programming languages independently or in combination to build models.

The quick visualization of the data is very good.

The workload management functionality works well.

What needs improvement?

Databricks requires writing code in Python or SQL, so if you're a good programmer then you can use Databricks.

For how long have I used the solution?

I have been using Databricks since 2017. I am no longer using it personally, although my team is, and will continue to do so in the future.

What do I think about the stability of the solution?

Databricks is quite popular these days and it appears to be stable. I have not found any issues with stability.

What do I think about the scalability of the solution?

Databricks is scalable, regardless of which cloud provider is being used. It is supported on Microsoft Azure, AWS, and they have their own cloud as well.

For a small workload, Databricks may not be worth the costs. However, for larger workloads, Databricks is a very good solution.

In my previous organization, there were between 10 and 15 users.

How are customer service and technical support?

The technical support is handled by Microsoft partners and because we had premium support, it was easy to get. That said, I did not require any support.

Which solution did I use previously and why did I switch?

I have not used tools that are similar to Databricks for workload management, but Azure ADFv2, Google BigQuery, and SAS are some of the most powerful tools in this space that I have used in the past. I have also heard of Dataiku and other tools, but I have not used them. The only other things that I have used are tools written in Python or scripting languages.

How was the initial setup?

There is no installation required.

What's my experience with pricing, setup cost, and licensing?

Databricks uses a pay-per-use model, where you can use as much compute as you need. I think that the cost could be reduced, given that there are more users on the platform, although it is not as expensive as some other solutions like SAS.

What other advice do I have?

As we transition to the Azure cloud, I expect that we will be using Databricks for workloads.

This is a product that I recommend for those who want to scale and have a good budget. It is good for automating a data pipeline and managing workloads. My advice for anybody who is starting to use it is to take the proper training.

Overall, based on my uses, I think that this product is pretty good.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Engineer at a tech services company with 10,001+ employees
Real User
Top 10
An easy initial setup with a good time travel feature, but needs better model scoring
Pros and Cons
  • "The time travel feature is the solution's most valuable aspect."
  • "Databricks is an analytics platform. It should offer more data science. It should have more features for data scientists to work with."

What is our primary use case?

We use the solution for multiple items. We use lots of data crunching, development, and algorithms on it.

What is most valuable?

The time travel feature is the solution's most valuable aspect.

What needs improvement?

The management of the solution needs to be modernized. Managing the radius data is hard.

The solution requires model scoring. There's not a good way of knowing how the models are performing from a data science perspective. The solution needs more model scoring abilities. It doesn't necessarily need more model monitoring, but more model scoring and performance measurement from a data science perspective.

Databricks is an analytics platform. It should offer more data science. It should have more features for data scientists to work with.

For how long have I used the solution?

I've been using the solution for one year so far.

What do I think about the stability of the solution?

The solution is not exactly stable. We've faced a few bugs which have really affected it. There are bugs especially when it comes to connecting with Spark.

What do I think about the scalability of the solution?

It's hard to say how scalable the solution is. The scalability comes into play on the Spark side, not on the Databricks side.

We have about 20 people on the solution right now.

How are customer service and technical support?

We've never been in touch with technical support, so I don't have any experience in terms of dealing with them.

How was the initial setup?

The initial setup is straightforward. I wouldn't say that it's complex in any way.

Deployment times vary and really depend on multiple factors. It can take anywhere from a few weeks to a few months to deploy the solution. In our case, it took us about three months to fully deploy it.

It takes two to three people to deploy the solution.

What about the implementation team?

I deployed the solution with the help of my team.

What's my experience with pricing, setup cost, and licensing?

I'm not sure what the licensing costs are on the solution.

Which other solutions did I evaluate?

We did evaluate Amazon SageMaker before ultimately choosing Databricks. It's the only other solution we evaluated at the time.

What other advice do I have?

We're partners with Databricks.

We're using the latest version of the solution, but I can't recall what version number we are on.

I'd advise others considering the solution to look at usage. They shouldn't adopt the solution blindly. How the implementation and usage will go will depend on the skill of the data engineer and what your requirements are.

I'd rate the solution seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: January 2022.
563,148 professionals have used our research since 2012.
Data Scientist at a retailer with 5,001-10,000 employees
Real User
Top 10
Quick development, reliable, has interactive clusters, and is priced per usage
Pros and Cons
  • "One of the features provides nice interactive clusters, or compute instances that you don't really need to manage often."
  • "I would like to see more documentation in terms of how an end-user could use it, and users like me can easily try it and implement use cases."

What is our primary use case?

Currently, I am using this solution for a forecasting project.

What is most valuable?

One of the features provides nice interactive clusters, or compute instances that you don't really need to manage often. You can just spin it off and use that for a lot of your pre-processing, which is very convenient. 

The normal features are very good in terms of doing some quick development or doing some EDA.

Also, one of the newest features brought into this solution provides you with a way to solve, deploy, and train models using the platform itself. Or, it can connect to your Azure Machine Learning in order to train, deploy, and productionalize some of the machine learning models.

What needs improvement?

Since the Databricks community is not that old, there is not a lot of information about some of the issues that we face. We have to go back to the Databricks stream to get some of the issue resolutions from there. 

As time passes, and more people start putting more information out there about this technology, it will be helpful.

Even with the features that we currently have, I think they're still optimizing some of the clusters and trying to parallelize to better read from other types of data. That's going really well: one of the features that they recently came up with, which included the data format for data, was really good and speeds up a lot of the processes.

I would like to see more documentation in terms of how an end-user could use it, and users like me can easily try it and implement use cases.

For how long have I used the solution?

I have been using Databricks on a daily basis for over a year.

It's deployed on the cloud, so it's always up to date.

What do I think about the stability of the solution?

It's definitely quite stable, in terms of an enterprise solution. 

I'd say that it's pretty stable. 

You have these clusters running on-demand, and you can also come up with these clusters that are scheduled, and that can be run for your production jobs.

What's my experience with pricing, setup cost, and licensing?

The pricing depends on the usage itself. They measure the cost of the companies in town. It also depends on the type of cluster that you are using. If you are using a very heavy cluster, it would be the price per CPU.

What other advice do I have?

I would rate Databricks an eight out of ten.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Chief Research Officer at a consumer goods company with 1,001-5,000 employees
Real User
Top 20
Ability to work collaboratively without concerns regarding the infrastructure is very beneficial to us
Pros and Cons
  • "Ability to work collaboratively without having to worry about the infrastructure."
  • "Would be helpful to have additional licensing options."

What is our primary use case?

Our primary use case of Databricks is for advanced analytics. I'm the chief research officer of the company and we're customers of Databricks.  

What is most valuable?

I think the features I like the most are the scalability of the solution as well as its ability to share. We work with multiple people on notebooks and it enables us to work collaboratively in an easy way without having to worry about the infrastructure. I think the solution is very intuitive, very easy to use. And that's what you pay for.

What needs improvement?

I'd like to see more licensing options for the solution, the availability of additional pricing tiers. I understand it's not easy to achieve because it's a kind of platform-as-a-service type of solution. If you wanted to be more specific about the parts, and what you might or might not need, then you could save some money, and go for a lower level. Of course, that would then mean you'd have to manage more configurations which, as a user, would make things more complex but it would be good to have that option. The pricing is not the cheapest but it's understandable because it's a very high-end solution and easy to use, there's a lot of complexity masked away.

I would like to see additional monitoring tools and, in general, anything that can improve visualization of data. I know it's not the main point of Databricks and there are other tools that can be used, but anything that facilitates the integration of Databricks with visualization tools could be really useful. Increasing data scalability would also be great. 

For how long have I used the solution?

I've been using this solution for a year. 

What do I think about the stability of the solution?

The solution has been very stable. 

What do I think about the scalability of the solution?

Scalability of the solution seems very easy to achieve. 

How are customer service and technical support?

We haven't had contact with technical support. 

How was the initial setup?

The initial setup was very straightforward. Because it's also in our Azure cloud, it was quite easy to set up and configure. Very intuitive.

What other advice do I have?

I would rate this solution an eight out of 10. 

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Anirban Bhattacharya
Practice Head, Data & Analytics at a computer software company with 10,001+ employees
Real User
Top 5, Leaderboard
Key feature is ability to make changes in structure or data size and align for subsequent consumption
Pros and Cons
  • "Can cut across the entire ecosystem of open source technology to give an extra level of getting the transformatory process of the data."
  • "Implementation of Databricks is still very code heavy."

What is our primary use case?

We have a team that works on Databricks for our clients. We are customers of Databricks. 

What is most valuable?

Databricks can cut across the entire ecosystem of open source technology which gives an extra level in terms of getting the transformatory process of the data. The solution is primarily open source and they have bolstered its components to make it more fit for purpose for a complete Azure Data platform. The solution is responsible for the core transformatory activities. While Azure Data Factory is very good for pulling in the data, doing the basic standardization and profiling, Databricks is more about making fundamental changes in structure or in size of the data and aligning it for subsequent consumption, or for the final layer on Synapse. It also has the power to complement and work with Spark and elements related to Python. 

What needs improvement?

In my view, the fundamental approach of implementing Databricks is still very code heavy, more than you find in Azure Data Factory and other technologies like Informatica or SQL Server Integration Service. From my perspective, that could be improved. I'd also like to have the ability to facilitate predictive analytics within the solution. 

For how long have I used the solution?

I've been using the solution for a year and a half. 

What do I think about the stability of the solution?

Stability of the product is good, whether it's handling large volumes, diverse elements of data or processing data at speed. It has stood the test of time. It's a solution that really lends itself to that higher level of stability, versatility and diversity in terms of its capability to process different forms of data.

What's my experience with pricing, setup cost, and licensing?

The cost of the solution is slightly on the high side so it's important to use it efficiently.

What other advice do I have?

Use the solution wisely and in tandem with Azure Data Factory. Apply the prism in your overall design of the pipelines of the flow, to utilize it to its potential. Databricks adds significant transformatory and data tranching capabilities, across a diverse variety of data, to the Azure Data Stack. In terms of the license, ensure that the customer is getting what they paid for so that the value for money is realized.

I rate the solution eight out of 10. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Vice President, Business Intelligence and Analytics at a tech services company with 10,001+ employees
Consultant
Top 20
Stable cloud platform for data engineering and has a straightforward setup
Pros and Cons
  • "I haven't heard about any major stability issues. At this time I feel like it's stable."
  • "Pricing is one of the things that could be improved."

What is our primary use case?

We are still exploring the solution. We utilize it much, much better than their star schema models that they are trying to replace it with. We bring in Databricks and then see how they can leverage the additional analytical functionalities around the Databricks cloud. It's more in exploratory ways. We recommend Databricks, especially with the Azure cloud frameworks.

What needs improvement?

Pricing is one of the things that could be improved.

Also, there could be improvement in the visual analytics space there and on the machine learning functions. I haven't explored so I don't know about the functions and features that are there. If it is not there, then I think that's something which they should consider including.

For how long have I used the solution?

My team has been exploring Databricks for close to five or six months.

What do I think about the stability of the solution?

I haven't heard about any major stability issues. At this time I feel like it's stable.

What do I think about the scalability of the solution?

In terms of scalability, I think once we put it across for larger use-cases the scalability question will really arise. So we'll need detailed information. I assume that we will be able to scale up.

I think we do not have more than 10 people working on it now. Because we are in the earlier stages of implementation, it's more like a POC now. I really don't know whether it's been open for the larger audience yet.

How was the initial setup?

The initial setup was straightforward.

What about the implementation team?

It is better to be installed with the help of integrators, or consultants, or with an experienced team.

What other advice do I have?

It's more data scientists using Databricks. I would call them power users trying to see how they can get a hand on it, though they are not data scientists. They try to understand it a little bit better for their future use.

On a scale of one to ten, I would rate it an eight, easy. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Allan Kirszberg
Financial Coordinator at Icatu
Real User
Top 20
Good technical support, but is difficult to set up and integrate
Pros and Cons
  • "The technical support is good."
  • "The initial setup is difficult."

What is our primary use case?

I believe we are using the new version.

Our company makes comprehensive use of the solution to consolidate data and do a certain amount of reporting and analytics. All the data consumers use Databricks to develop the information.

What needs improvement?

Data governance should be addressed. We have some trouble connecting all the governance solutions with Databricks. This means the integrative capabilities are problematic. 

The initial setup is difficult. 

For how long have I used the solution?

We have been using Databricks for a year and a half.

What do I think about the stability of the solution?

The solution is stable. 

What do I think about the scalability of the solution?

The solution is scalable. 

How are customer service and support?

The technical support is good. 

Which solution did I use previously and why did I switch?

As we are talking about a corporate solution, the deployment of Databricks lasted longer than the one day it took for Alteryx. 

We used Alteryx prior to Databricks and continue to do so, it being the only other solution we have employed. We use the two with different software. 

How was the initial setup?

The initial setup is difficult. 

While I don't know exactly how long the deployment took, I do know that it lasted longer than the one day needed for Alteryx. 

What about the implementation team?

I believe we used a partner for the deployment, although I cannot say for certain, as this is not within my purview. 

I don't know how many people are needed for maintenance and deployment. 

What's my experience with pricing, setup cost, and licensing?

As the licensing is not within my purview, I am not in a position to comment on this. 

What other advice do I have?

My company makes use of the solution. It is employed by my data team and the technology one. I do not have personal experience using the solution. 

The solution is deployed on base, on data. 

I am not aware of how many people make use of it. 

I rate Databricks as a seven out of ten. 

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Associate Manager at a consultancy with 501-1,000 employees
Real User
Top 5, Leaderboard
Efficient, high data volume processing, and easy to use
Pros and Cons
  • "The main feature of the solution is efficiency."
  • "There should be better integration with other platforms."

What is our primary use case?

We use this solution to process data, for example, data validation.

What is most valuable?

The main feature of the solution is efficiency.

We were trying to process 300 million records covering 10 years. Processing that many records through the ADF pipeline in Azure, for example, took approximately six hours. To reduce the burden on our ADF pipeline, we wrote some simple code in this solution that reads the file and writes it to the temporary Storage Explorer. By going through this solution, we were able to complete the processing of the data in half an hour.
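
To put those figures in perspective, the arithmetic can be sketched as follows. This is only an illustration based on the rounded numbers quoted in this review (300 million records, roughly six hours versus half an hour); the helper name `throughput` is hypothetical, and the results are averages rather than measured rates.

```python
def throughput(records: int, hours: float) -> float:
    """Return the average number of records processed per second."""
    return records / (hours * 3600)


# Figures quoted in the review (approximate, as stated by the reviewer).
RECORDS = 300_000_000
ADF_HOURS = 6.0          # ADF pipeline alone
DATABRICKS_HOURS = 0.5   # after moving the read/write step into Databricks

adf_rate = throughput(RECORDS, ADF_HOURS)
dbx_rate = throughput(RECORDS, DATABRICKS_HOURS)
speedup = ADF_HOURS / DATABRICKS_HOURS

print(f"ADF pipeline: {adf_rate:,.0f} records/s")
print(f"Databricks:   {dbx_rate:,.0f} records/s")
print(f"Speedup:      {speedup:.0f}x")
```

On these numbers the run time drops by a factor of about twelve, from roughly 14 thousand to roughly 167 thousand records per second on average.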

The ability to write scripts within the solution is extremely beneficial. If I am able to script in SQL, R, Scala, Apache Spark, or Python, for example, I can use that knowledge to write a script in this solution. It is very user-friendly, and you can also process the records from a validation point of view.

The ability to migrate from one environment to another is useful.

What needs improvement?

There should be better integration with other platforms.

For how long have I used the solution?

I have been using this solution for two years.

What do I think about the scalability of the solution?

I have approximately 20 users using this solution in my organization. We have plans to increase our usage in the future.

How was the initial setup?

There is no installation required. It is easy to use: in Azure, for example, it is already available, so you subscribe and use it.

What's my experience with pricing, setup cost, and licensing?

The solution uses a pay-per-use model with an annual subscription fee or package. Typically this solution is used on a cloud platform, such as Azure or AWS, but more people are choosing Azure because the price is more reasonable.

What other advice do I have?

I rate Databricks a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.