Tobias Macey
This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
Technology

  • Realtime Data Applications Made Easier With Meroxa
    Summary Real-time capabilities have quickly become an expectation for consumers. The complexity of providing those capabilities is still high, however, making it more difficult for small teams to compete. Meroxa was created to enable teams of all sizes to deliver real-time data applications. In this episode DeVaris Brown discusses the types of applications that are possible when teams don't have to manage the complex infrastructure necessary to support continuous data flows. 

Your host is Tobias Macey and today I'm interviewing DeVaris Brown about the impact of real-time data on business opportunities and risk profiles 

Interview 
Introduction 
How did you get involved in the area of data management? 
Can you describe what Meroxa is and the story behind it? 
How have the focus and goals of the platform and company evolved over the past 2 years? 
Who are the target customers for Meroxa? 
What problems are they trying to solve when they come to your platform? 
Applications powered by real-time data were the exclusive domain of large and/or sophisticated tech companies for several years due to the inherent complexities involved. 
What are the shifts that have made them more accessible to a wider variety of teams? 
What are some of the remaining blockers for teams who want to start using real-time data? 
With the democratization of real-time data, what are the new categories of products and applications that are being unlocked? 
How are organizations thinking about the potential value that those types of apps/services can provide? 
With data flowing constantly, there are new challenges around oversight and accuracy. 
How does real-time data change the risk profile for applications that are consuming it? 
What are some of the technical controls that are available for organizations that are risk-averse? 
What skills do developers need to be able to effectively design, develop, and deploy real-time data applications? 
How does this differ when talking about internal vs. consumer/end-user facing applications? 
What are the most interesting, innovative, or unexpected ways that you have seen Meroxa used? 
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Meroxa? 
When is Meroxa the wrong choice? 
What do you have planned for the future of Meroxa? 

Contact Info 
LinkedIn (https://www.linkedin.com/in/devarispbrown/) 
@devarispbrown (https://twitter.com/devarispbrown) on Twitter 

Parting Question 
From your perspective, what is the biggest gap in the tooling or technology for data management today? 

Links 
Meroxa (https://meroxa.com/) 
Podcast Episode (https://www.dataengineeringpodcast.com/meroxa-data-integration-episode-153/) 
Kafka (https://kafka.apache.org/) 
Kafka Connect (https://docs.confluent.io/platform/current/connect/index.html) 
Conduit (https://github.com/ConduitIO/conduit) - golang Kafka connect replacement 
Pulsar (https://pulsar.apache.org/) 
Redpanda (https://redpanda.com/) 
Flink (https://flink.apache.org/) 
Beam (https://beam.apache.org/) 
Clickhouse (https://clickhouse.tech/) 
Druid (https://druid.apache.org/) 
Pinot (https://pinot.apache.org/)
    4/24/2023
    45:26
  • Building Self Serve Business Intelligence With AI And Semantic Modeling At Zenlytic
    Summary Business intelligence has been chasing the promise of self-serve data for decades. As the capabilities of these systems have improved and become more accessible, the target of what self-serve means changes. With the availability of AI powered by large language models combined with the evolution of semantic layers, the team at Zenlytic have taken aim at this problem again. In this episode Paul Blankley and Ryan Janssen explore the power of natural language driven data exploration combined with semantic modeling that enables an intuitive way for everyone in the business to access the data that they need to succeed in their work. 

Your host is Tobias Macey and today I'm interviewing Paul Blankley and Ryan Janssen about Zenlytic, a no-code business intelligence tool focused on emerging commerce brands 

Interview 
Introduction 
How did you get involved in the area of data management? 
Can you describe what Zenlytic is and the story behind it? 
Business intelligence is a crowded market. What was your process for defining the problem you are focused on solving and the method to achieve that outcome? 
Self-serve data exploration has been attempted in myriad ways over successive generations of BI and data platforms. 
What are the barriers that have been the most challenging to overcome in that effort? 
What are the elements that are coming together now that give you confidence in being able to deliver on that? 
Can you describe how Zenlytic is implemented? 
What are the evolutions in the understanding and implementation of semantic layers that provide a sufficient substrate for operating on? 
How have the recent breakthroughs in large language models (LLMs) improved your ability to build features in Zenlytic? 
What is your process for adding domain semantics to the operational aspect of your LLM? 
For someone using Zenlytic, what is the process for getting it set up and integrated with their data? 
Once it is operational, can you describe some typical workflows for using Zenlytic in a business context? 
Who are the target users? 
What are the collaboration options available? 
What are the most complex engineering/data challenges that you have had to address in building Zenlytic? 
What are the most interesting, innovative, or unexpected ways that you have seen Zenlytic used? 
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Zenlytic? 
When is Zenlytic the wrong choice? 
What do you have planned for the future of Zenlytic? 

Contact Info 
Paul Blankley (LinkedIn) (https://www.linkedin.com/in/paulblankley/) 

Parting Question 
From your perspective, what is the biggest gap in the tooling or technology for data management today? 

Links 
Zenlytic (https://zenlytic.com/) 
OLAP Cube (https://analyticsengineers.club/whats-an-olap-cube/) 
Large Language Model (https://en.wikipedia.org/wiki/Large_language_model) 
Starburst (https://www.starburst.io/) 
Prompt Engineering (https://en.wikipedia.org/wiki/Prompt_engineering) 
ChatGPT (https://openai.com/blog/chatgpt)
    4/16/2023
    49:19
  • An Exploration Of The Composable Customer Data Platform
    Summary The customer data platform is a category of services that was developed early in the evolution of the current era of cloud services for data processing. When it was difficult to wire together the event collection, data modeling, reporting, and activation it made sense to buy monolithic products that handled every stage of the customer data lifecycle. Now that the data warehouse has taken center stage a new approach of composable customer data platforms is emerging. In this episode Darren Haken is joined by Tejas Manohar to discuss how Autotrader UK is addressing their customer data needs by building on top of their existing data stack. 

Your host is Tobias Macey and today I'm interviewing Darren Haken and Tejas Manohar about building a composable CDP and how you can start adopting it incrementally 

Interview 
Introduction 
How did you get involved in the area of data management? 
Can you describe what you mean by a "composable CDP"? 
What are some of the key ways that it differs from the ways that we think of a CDP today? 
What are the problems that you were focused on addressing at Autotrader that are solved by a CDP? 
One of the promises of the first generation CDP was an opinionated way to model your data so that non-technical teams could own this responsibility. 
What do you see as the risks/tradeoffs of moving CDP functionality into the same data stack as the rest of the organization? 
What about companies that don't have the capacity to run a full data infrastructure? 
Beyond the core technology of the data warehouse, what are the other evolutions/innovations that allow for a CDP experience to be built on top of the core data stack? 
added burden on core data teams to generate event-driven data models 
When iterating toward a CDP on top of the core investment of the infrastructure to feed and manage a data warehouse, what are the typical first steps? 
What are some of the components in the ecosystem that help to speed up the time to adoption? (e.g. pre-built dbt packages for common transformations, etc.) 
What are the most interesting, innovative, or unexpected ways that you have seen CDPs implemented? 
What are the most interesting, unexpected, or challenging lessons that you have learned while working on CDP related functionality? 
When is a CDP (composable or monolithic) the wrong choice? 
What do you have planned for the future of the CDP stack? 

Contact Info 
Darren 
LinkedIn (https://www.linkedin.com/in/darrenhaken/?originalSubdomain=uk) 
@DarrenHaken (https://twitter.com/darrenhaken) on Twitter 
Tejas 
LinkedIn (https://www.linkedin.com/in/tejasmanohar) 
@tejasmanohar (https://twitter.com/tejasmanohar) on Twitter 

Parting Question 
From your perspective, what is the biggest gap in the tooling or technology for data management today? 

Closing Announcements 
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. 
Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. 
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] (mailto:[email protected]) with your story. 
To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers 

Links 
Autotrader (https://www.autotrader.co.uk/) 
Hightouch (https://hightouch.com/) 
Customer Studio (https://hightouch.com/platform/customer-studio) 
CDP == Customer Data Platform (https://blog.hubspot.com/service/customer-data-platform-guide) 
Segment (https://segment.com/) 
Podcast Episode (https://www.dataengineeringpodcast.com/segment-customer-analytics-episode-72/) 
mParticle (https://www.mparticle.com/) 
Salesforce (https://www.salesforce.com/) 
Amplitude (https://amplitude.com/) 
Snowplow (https://snowplow.io/) 
Podcast Episode (https://www.dataengineeringpodcast.com/snowplow-with-alexander-dean-episode-48/) 
Reverse ETL (https://medium.com/memory-leak/reverse-etl-a-primer-4e6694dcc7fb) 
dbt (https://www.getdbt.com/) 
Podcast Episode (https://www.dataengineeringpodcast.com/dbt-data-analytics-episode-81/) 
Snowflake (https://www.snowflake.com/en/) 
Podcast Episode (https://www.dataengineeringpodcast.com/snowflakedb-cloud-data-warehouse-episode-110/) 
BigQuery (https://cloud.google.com/bigquery) 
Databricks (https://www.databricks.com/) 
ELT (https://en.wikipedia.org/wiki/Extract,_load,_transform) 
Fivetran (https://www.fivetran.com/) 
Podcast Episode (https://www.dataengineeringpodcast.com/fivetran-data-replication-episode-93/) 
DataHub (https://datahubproject.io/) 
Podcast Episode (https://www.dataengineeringpodcast.com/acryl-data-datahub-metadata-graph-episode-230/) 
Amundsen (https://www.amundsen.io/) 
Podcast Episode (https://www.dataengineeringpodcast.com/amundsen-data-discovery-episode-92/) 

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
    4/10/2023
    1:11:42
  • Mapping The Data Infrastructure Landscape As A Venture Capitalist
    Summary The data ecosystem has been building momentum for several years now. As a venture capital investor Matt Turck has been trying to keep track of the main trends and has compiled his findings into the MAD (ML, AI, and Data) landscape reports each year. In this episode he shares his experiences building those reports and the perspective he has gained from the exercise. 

Your host is Tobias Macey and today I'm interviewing Matt Turck about his annual report on the Machine Learning, AI, & Data landscape and the insights around data infrastructure that he has gained in the process 

Interview 
Introduction 
How did you get involved in the area of data management? 
Can you describe what the MAD landscape report is and the story behind it? 
At a high level, what is your goal in the compilation and maintenance of your landscape document? 
What are your guidelines for what to include in the landscape? 
As the data landscape matures, how have you seen that influence the types of projects/companies that are founded? 
What are the product categories that were only viable when capital was plentiful and easy to obtain? 
What are the product categories that you think will be swallowed by adjacent concerns, and which are likely to consolidate to remain competitive? 
The rapid growth and proliferation of data tools helped establish the "Modern Data Stack" as a de-facto architectural paradigm. 
As we move into this phase of contraction, what are your predictions for how the "Modern Data Stack" will evolve? 
Is there a different architectural paradigm that you see as growing to take its place? 
How has your presentation and the types of information that you collate in the MAD landscape evolved since you first started it? 
What are the most interesting, innovative, or unexpected product and positioning approaches that you have seen while tracking data infrastructure as a VC and maintainer of the MAD landscape? 
What are the most interesting, unexpected, or challenging lessons that you have learned while working on the MAD landscape over the years? 
What do you have planned for future iterations of the MAD landscape? 

Contact Info 
Website (https://mattturck.com/) 
@mattturck (https://twitter.com/mattturck) on Twitter 
MAD Landscape Comments Email (mailto:[email protected]) 

Parting Question 
From your perspective, what is the biggest gap in the tooling or technology for data management today? 

Links 
MAD Landscape (https://mad.firstmarkcap.com) 
First Mark Capital (https://firstmark.com/) 
Bayesian Learning (https://en.wikipedia.org/wiki/Bayesian_inference) 
AI Winter (https://en.wikipedia.org/wiki/AI_winter) 
Databricks (https://www.databricks.com/) 
Cloud Native Landscape (https://landscape.cncf.io/) 
LUMA Scape (https://lumapartners.com/lumascapes/) 
Hadoop Ecosystem (https://www.analyticsvidhya.com/blog/2020/10/introduction-hadoop-ecosystem/) 
Modern Data Stack (https://www.fivetran.com/blog/what-is-the-modern-data-stack) 
Reverse ETL (https://medium.com/memory-leak/reverse-etl-a-primer-4e6694dcc7fb) 
Generative AI (https://generativeai.net/) 
dbt (https://www.getdbt.com/) 
Transform (https://transform.co/) 
Podcast Episode (https://www.dataengineeringpodcast.com/transform-co-metrics-layer-episode-206/) 
Snowflake IPO (https://www.cnn.com/2020/09/16/investing/snowflake-ipo/index.html) 
Dataiku (https://www.dataiku.com/) 
Iceberg (https://iceberg.apache.org/) 
Podcast Episode (https://www.dataengineeringpodcast.com/tabular-iceberg-lakehouse-tables-episode-363) 
Hudi (https://hudi.apache.org/) 
Podcast Episode (https://www.dataengineeringpodcast.com/hudi-streaming-data-lake-episode-209/) 
DuckDB (https://duckdb.org/) 
Podcast Episode (https://www.dataengineeringpodcast.com/duckdb-in-process-olap-database-episode-270/) 
Trino (https://trino.io/) 
Y42 (https://www.y42.com/) 
Podcast Episode (https://www.dataengineeringpodcast.com/y42-full-stack-data-platform-episode-295) 
Mozart Data (https://www.mozartdata.com/) 
Podcast Episode (https://www.dataengineeringpodcast.com/mozart-data-modern-data-stack-episode-242/) 
Keboola (https://www.keboola.com/) 
MPP Database (https://www.techtarget.com/searchdatamanagement/definition/MPP-database-massively-parallel-processing-database)
    4/3/2023
    1:01:57
  • Unlocking The Potential Of Streaming Data Applications Without The Operational Headache At Grainite
    Summary The promise of streaming data is that it allows you to react to new information as it happens, rather than introducing latency by batching records together. The peril is that building a robust and scalable streaming architecture is always more complicated and error-prone than you think it's going to be. After experiencing this unfortunate reality for themselves, Abhishek Chauhan and Ashish Kumar founded Grainite so that you don't have to suffer the same pain. In this episode they explain why streaming architectures are so challenging, how they have designed Grainite to be robust and scalable, and how you can start using it today to build your streaming data applications without all of the operational headache. 

Announcements 
Hello and welcome to the Data Engineering Podcast, the show about modern data management 
Businesses that adapt well to change grow 3 times faster than the industry average. As your business adapts, so should your data. RudderStack Transformations lets you customize your event data in real-time with your own JavaScript or Python code. Join The RudderStack Transformation Challenge today for a chance to win a $1,000 cash prize just by submitting a Transformation to the open-source RudderStack Transformation library. Visit dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack) today to learn more 
Hey there podcast listener, are you tired of dealing with the headache that is the 'Modern Data Stack'? We feel your pain. It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. It ends up being anything but that. Setting it up, integrating it, maintaining it: it’s all kind of a nightmare. And let's not even get started on all the extra tools you have to buy to get it to do its thing. But don't worry, there is a better way. TimeXtender takes a holistic approach to data integration that focuses on agility rather than fragmentation. By bringing all the layers of the data stack together, TimeXtender helps you build data solutions up to 10 times faster and saves you 70-80% on costs. If you're fed up with the 'Modern Data Stack', give TimeXtender a try. Head over to dataengineeringpodcast.com/timextender (https://www.dataengineeringpodcast.com/timextender) where you can do two things: watch us build a data estate in 15 minutes and start for free today. 
Join in with the event for the global data community, Data Council Austin. From March 28-30th 2023, they'll play host to hundreds of attendees, 100 top speakers, and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data. As a listener to the Data Engineering Podcast you can get a special discount of 20% off your ticket by using the promo code dataengpod20. Don't miss out on their only event this year! Visit: dataengineeringpodcast.com/data-council (https://www.dataengineeringpodcast.com/data-council) today 

Your host is Tobias Macey and today I'm interviewing Ashish Kumar and Abhishek Chauhan about Grainite, a platform designed to give you a single place to build streaming data applications 

Interview 
Introduction 
How did you get involved in the area of data management? 
Can you describe what Grainite is and the story behind it? 
What are the personas that you are focused on addressing with Grainite? 
What are some of the most complex aspects of building streaming data applications in the absence of something like Grainite? 
How does Grainite work to reduce that complexity? 
What are some of the commonalities that you see in the teams/organizations that find their way to Grainite? 
What are some of the higher-order projects that teams are able to build when they are using Grainite as a starting point vs. where they would be spending effort on a fully managed streaming architecture? 
Can you describe how Grainite is architected? 
How have the design and goals of the platform changed/evolved since you first started working on it? 
What does your internal build vs. buy process look like for identifying where to spend your engineering resources? 
What is the process for getting Grainite set up and integrated into an organization's technical environment? 
What is your process for determining which elements of the platform to expose as end-user features and customization options vs. keeping internal to the operational aspects of the product? 
Once Grainite is running, can you describe the day 0 workflow of building an application or data flow? 
What are the day 2 - N capabilities that Grainite offers for ongoing maintenance/operation/evolution of those applications? 
What are the most interesting, innovative, or unexpected ways that you have seen Grainite used? 
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Grainite? 
When is Grainite the wrong choice? 
What do you have planned for the future of Grainite? 

Contact Info 
Ashish LinkedIn (https://www.linkedin.com/in/ashishkumarprofile/) 
Abhishek LinkedIn (https://www.linkedin.com/in/abhishekchauhan/) 

Parting Question 
From your perspective, what is the biggest gap in the tooling or technology for data management today? 

Links 
Grainite (https://www.grainite.com/) 
Blog about the challenges of streaming architectures (https://www.grainite.com/blog/there-was-an-old-lady-who-swallowed-a-fly) 
Getting Started Docs (https://gitbook.grainite.com/developers/getting-started) 
BigTable (https://research.google/pubs/pub27898/) 
Spanner (https://research.google/pubs/pub39966/) 
Firestore (https://cloud.google.com/firestore) 
OpenCensus (https://opencensus.io/) 
Citrix (https://www.citrix.com/) 
NetScaler (https://www.citrix.com/blogs/2022/10/03/netscaler-is-back/) 
J2EE (https://www.oracle.com/java/technologies/appmodel.html) 
RocksDB (https://rocksdb.org/) 
Pulsar (https://pulsar.apache.org/) 
SQL Server (https://en.wikipedia.org/wiki/Microsoft_SQL_Server) 
MySQL (https://www.mysql.com/) 
RAFT Protocol (https://raft.github.io/)
    3/25/2023
    1:13:33

