Hosting and Deployment: Open-Source vs. Proprietary
Learn how to choose between open-source and proprietary software, including cloud-hosted options.
Another choice we need to make regarding the destination repository is whether to use open-source software or proprietary solutions.
Open-source software (OSS)
Open-source software (OSS) is software distributed under a model in which the software and its underlying code base are freely available for users to use, modify, and distribute (under specific licensing terms). An OSS project often starts with an individual or a small team who release it for public use. Alternatively, an enterprise first develops it as a solution to a specific problem and later decides to release it as open source.
Community-managed OSS
OSS can be managed entirely by a large community of developers at no charge. The community collaborates on and improves the existing code base and distributes it for free for anyone to use (for example, PostgreSQL). This type of OSS requires users to install, host, maintain, and upgrade the software in their own environment, which can range from trivial to extremely complicated.
Some considerations when choosing community-managed OSS are:
Community and popularity: Prefer popular OSS that has existed for a while and has a large, devoted community. A good way to gauge this is to look at the project's GitHub repository and check the number of stars, forks, and recent contributions.
Maturity: How long has the open-source project been around? How active is it lately? Do people use it in production?
Troubleshooting: How will we handle problems when they arise? Can we handle them on our own? How helpful is the community in troubleshooting?
Expertise: When choosing community-managed OSS, it’s important to be aware of our data team’s technical skills and expertise because they have to host and maintain it themselves.
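The community and maturity checks above can be turned into a rough screening step. The following is a minimal sketch, not an established method: the `ProjectStats` fields, the `screen_project` function, and every threshold are hypothetical and chosen only for illustration. The numbers are assumed to be read by hand from a project's repository page.

```python
from dataclasses import dataclass


@dataclass
class ProjectStats:
    """Figures read manually from a project's repository page (hypothetical fields)."""
    stars: int
    forks: int
    commits_last_90_days: int
    age_years: float
    used_in_production: bool


def screen_project(stats: ProjectStats) -> list[str]:
    """Return a list of concerns; an empty list means no red flags were found.

    All thresholds are illustrative assumptions, not industry standards.
    """
    concerns = []
    if stats.stars < 1000 or stats.forks < 100:
        concerns.append("small community")
    if stats.commits_last_90_days < 10:
        concerns.append("low recent activity")
    if stats.age_years < 2:
        concerns.append("young project")
    if not stats.used_in_production:
        concerns.append("no known production use")
    return concerns


# Made-up numbers for two hypothetical candidates.
established = ProjectStats(15000, 4000, 300, 10.0, True)
newcomer = ProjectStats(250, 12, 40, 0.5, False)

print(screen_project(established))
print(screen_project(newcomer))
```

A screen like this only narrows the shortlist. Troubleshooting and expertise still have to be judged by the team that will host and maintain the software.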
Commercially managed OSS
Because of the drawbacks of community-managed OSS, namely the difficulty of hosting and maintaining the software on our own, a second type has emerged: commercially managed OSS.
In other words, a vendor hosts and manages the OSS in its own environment and charges users for access to it. This removes the headache of hosting the OSS ourselves but introduces a cost. This type of OSS typically takes the form of a cloud software as a service (SaaS) platform built around a managed version of the open-source code (for example, Databricks offers a paid platform built around the open-source Apache Spark).
Some considerations when choosing commercially managed OSS are:
Value: Does the vendor's managed service offer better value than hosting and managing the OSS on our own?
Pricing: Does the service offer a reasonable pricing model? Can we employ on-demand pricing or pay-as-you-go? How easy is it to track the billing?
Support: Is there an extra cost for support? How extensive is the support? Anything that’s not supported by the vendor is automatically our responsibility to maintain.
Proprietary software
Another type of software that we can choose to employ is proprietary, or closed-source, software. Many vendors choose to offer their data tools as proprietary solutions instead of releasing them as OSS.
Proprietary software often costs more than commercially managed OSS but can come with added benefits. The vendor is responsible for fixing bugs, and the fixes (known as patches) are typically delivered as part of the license or subscription. Vendor support is often more extensive, and new features are released regularly. This option works well with cloud-based services: cloud vendors develop and sell fully maintained proprietary services for storage and analytics.
Cloud vendors often bundle their proprietary services so that they work well together, which can result in more mature and better-integrated data tools.
Some considerations when choosing proprietary software are:
Interoperability: Make sure that the tool can be integrated and operated with other tools in the data stack.
Support, documentation, and popularity: How large is the community? Is there enough support and documentation?
Pricing: Map out usage scenarios and predict expected costs. If the expected usage is long-term, try to negotiate a better contract with a discount.
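Mapping usage scenarios to expected costs can start as a small calculation. The sketch below assumes a hypothetical pay-as-you-go model billed per terabyte stored and per terabyte scanned, plus a hypothetical 20% discount for a long-term contract. The rates, the scenario volumes, and the `monthly_cost` helper are placeholders, not real vendor prices.

```python
def monthly_cost(storage_tb, scanned_tb, storage_rate, scan_rate, discount=0.0):
    """Estimate a monthly bill from usage volumes and per-TB rates.

    `discount` is a fraction (0.2 means 20% off) standing in for a
    negotiated long-term contract.
    """
    base = storage_tb * storage_rate + scanned_tb * scan_rate
    return base * (1 - discount)


# Hypothetical rates in dollars per TB per month; replace with the vendor's price list.
STORAGE_RATE = 20.0
SCAN_RATE = 5.0

# Hypothetical usage scenarios: (TB stored, TB scanned per month).
scenarios = {
    "low": (1, 10),
    "expected": (5, 100),
    "high": (20, 500),
}

for name, (storage_tb, scanned_tb) in scenarios.items():
    on_demand = monthly_cost(storage_tb, scanned_tb, STORAGE_RATE, SCAN_RATE)
    committed = monthly_cost(storage_tb, scanned_tb, STORAGE_RATE, SCAN_RATE, discount=0.2)
    print(f"{name}: on-demand ${on_demand:,.2f}, with contract ${committed:,.2f}")
```

Running the same scenarios against each candidate's price list makes it easier to see where a negotiated contract starts to pay off.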
Conclusion
Generally, companies should invest their resources where customization adds significant value. We should favor community-managed and commercially managed OSS as the default choice, since it lets the company customize its services and build something of its own. We can then reserve proprietary software for the areas where these options fall short.
Before making the business case for using a proprietary solution as the destination repository for the ETL pipeline, we should first determine whether an OSS option can do the job.