Data Platforms Virtual Summit is the only online conference focused exclusively on helping data teams build a modern big data platform. It will bring together practitioners and industry experts who will share best practices, use cases and success stories from their big data work.

Join the global community of data professionals and your peers in a series of interactive webinars, roundtables and keynote sessions: all accessible online from anywhere in the world. There will be 20+ sessions across 3 tracks delivered in 4 days.

Big Data in the Cloud

  • Scaling big data infrastructure in the cloud
  • Role machine learning plays in a modern big data platform
  • Cybersecurity

Data-Driven Cultures

  • Importance of being data driven
  • Culture, People and Technology
  • Application design patterns for the cloud

Data Science

  • Evolution of data science workflows
  • Data exploration and visualization to machine learning and natural language processing
  • Data science workflow best practices

Big Data in the Cloud

Using Machine Learning to Manage User Access

Jon Austin Osborne, Machine Learning Engineer, Capital One

In any enterprise, one of the most prevalent security risks revolves around who has access to which resources. Whether data is stored in a cloud solution or on-premises, it is a major challenge to give associates the correct privileges. By using machine learning and clustering algorithms like the Louvain Method, we can group similar users in the Capital One network and create two valuable features: (1) automated onboarding and (2) automated "rogue access" detection. By using machine learning, we have helped Capital One become a better-managed company and have reduced a major cybersecurity threat. This talk will be a deep dive into the model, the data engineering and the productionization of the web application interface.
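
As a rough illustration of the two features named in the abstract, the sketch below assumes that user groups have already been produced by some clustering step (such as the Louvain Method) and only shows what could happen afterwards. The function names, the thresholds and the entitlement strings are hypothetical and are not taken from the talk.

```python
from collections import Counter, defaultdict


def flag_rogue_access(user_groups, user_entitlements, min_share=0.2):
    """Return {user: entitlements held by fewer than min_share of the user's group}.

    user_groups: {user: group id}, assumed to come from a prior clustering step.
    user_entitlements: {user: set of entitlement names}.
    """
    members = defaultdict(list)
    for user, group in user_groups.items():
        members[group].append(user)

    flagged = {}
    for users in members.values():
        # Count how many members of this group hold each entitlement.
        counts = Counter(e for u in users for e in user_entitlements.get(u, set()))
        for u in users:
            rare = {e for e in user_entitlements.get(u, set())
                    if counts[e] / len(users) < min_share}
            if rare:
                flagged[u] = rare
    return flagged


def suggest_onboarding(group, user_groups, user_entitlements, min_share=0.8):
    """Return entitlements held by at least min_share of the group's members."""
    users = [u for u, g in user_groups.items() if g == group]
    counts = Counter(e for u in users for e in user_entitlements.get(u, set()))
    return {e for e, c in counts.items() if c / len(users) >= min_share}


# Hypothetical data: six users in one group, one of whom holds an unusual privilege.
user_groups = {f"u{i}": "analytics" for i in range(6)}
entitlements = {u: {"warehouse_read", "bi_dashboard"} for u in user_groups}
entitlements["u5"] = entitlements["u5"] | {"prod_db_admin"}

rogue = flag_rogue_access(user_groups, entitlements)
starter_set = suggest_onboarding("analytics", user_groups, entitlements)
```

Here `rogue` singles out the one privilege that almost nobody else in the group holds, and `starter_set` is the set a new member of the group could be offered by default. A production system would need far more care around thresholds and review workflows.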

How We Built a Scalable, Real-time User Targeting System

Sriranjan Manjunath, CTO and Head of Engineering @ Saavn

Saavn is India’s leading music streaming service. Since context is key to music, we have built a system called Sniper that lets us identify cohorts of users in real-time and target them for marketing, advertising and recommendation purposes. This system allows us to understand user behavior by quantifying their engagement characteristics such as stream consumption, affinities or ads. Speed and scalability are critical to its design. This talk will cover our motivations behind building such a system and how big data technologies have helped us architect it.

On-Demand Analytics: Building Big Data Solutions with Azure Data Lake

Cathy Palmer, Ph.D., Principal Program Manager, Microsoft

Enterprises are building big data solutions with Azure Data Lake, an on-demand, real-time stream processing service with a no-limits data lake built to support massively parallel analytics. Patterns of enterprise solutions are emerging and evolving as customers migrate their analytics workloads to the cloud and embrace new business opportunities. With an overview of Azure Data Lake, this webinar briefly explores some of the choices customers are making in building big data solutions with Azure Data Lake.

Accelerate Time to Value with Data Operations

Saket Saurabh, Co-Founder & CEO, Nexla

Today, 91% of companies are ingesting data from third-party partners to run their businesses. Additionally, 70% of companies either currently send or plan to send data to partners. This inter-company data collaboration powers insights, machine learning, and better consumer experiences. But it also increases workloads for strapped engineering teams and creates challenges to data access. Learn how companies are streamlining and even automating their Data Operations to accelerate the time from data to business value.

How We Built a Scalable, Fast, & Reliable Indexing Infrastructure

Navin Agarwal, Principal Engineer, BloomReach

At BloomReach we process around 100 million products every day across all of our customers. For each customer, feed processing needs to be fast and reliable, and indexing must not have any impact on serving. We will walk through how we've built this at BloomReach while keeping the cost minimal.

Data Governance, Discovery, & Lineage in a Heterogeneous Streaming Big Data Platform

Barbara Eckman, Principal Data Architect, Comcast

Data governance, discovery and lineage help data scientists find and integrate data of interest to uncover otherwise hidden trends, anomalies, and powerful predictors of business successes and failures. Comcast’s Streaming Data Platform comprises a wide variety of ingest, transformation, and storage services. Peer-reviewed Apache Avro schemas support end-to-end data governance. Apache Atlas is our metadata repository for data discovery and lineage. We have extended Atlas with custom data and process types, e.g., Avro schemas, AWS S3 buckets and prefixes, Kafka topics, and Kinesis streams. Custom asynchronous messaging libraries notify Atlas of new data and schema entities and lineage links as they are created.

Real-time Analytics on Streaming Data with Azure Stream Analytics

Krishna Mamidipaka, Sr. Program Manager, Microsoft Azure

Continuous streams of data are generated in every industry from sensors, manufacturing IoT devices, business transactions, social media, network devices, clickstream logs, and more. Found within these streams of data are critical business insights that are waiting to be unlocked. Attend this session and learn how customers are building solutions for fleet monitoring, smart grid, network monitoring, recommendations, and other real-time scenarios that turn multiple concurrent streams of data in motion into insights and actions for competitive advantage.
In this session you will see demos and learn how Azure Event Hubs, Stream Analytics, Machine Learning, and other Azure services work together to create end-to-end real-time analytics solutions.

Modern Data Architecture with AWS

Pratap Ramamurthy, Partner Solution Architect, Amazon Web Services

Today’s organizations are tasked with managing multiple data types, coming from a wide variety of sources. Faced with massive volumes and heterogeneous types of data, organizations are finding that in order to deliver insights in a timely manner, they need a data storage and analytics solution that offers more agility and flexibility than traditional data management systems. Every use case is different, and different use cases might need different tools. AWS provides a variety of options for your needs, including RDS, EMR, Redshift, Athena and QuickSight. In this talk we will discuss the different technologies available on AWS and their applications.

Amplifying Retail with Big Data and The Cloud

Carter Bradford, Senior Vice President, Precocity

Once considered the "black magic" of digitally-born retailers like Amazon, personalizing the customer experience has now become table stakes for any retailer interested in surviving in the era of digital transformation. The techniques, tools and scalable platforms necessary to optimize customer interactions are now available and accessible to companies of any shape or size. We'll discuss how the use of big data technology in the cloud eases the implementation of common retail use cases as well as how it helps to avoid typical pitfalls.

Untangling the Cloud Services Hairball

James Curtis, Senior Analyst - Data Platforms & Analytics, 451 Research

The question is no longer whether to migrate to the cloud. Many organizations have already answered it, and the answer is a resounding “full steam ahead.” But the start of the journey can be daunting, especially with so much ‘as-a-service’ terminology floating around. Join James Curtis, senior analyst at 451 Research, as he discusses industry trends, what many organizations are doing, and a simplified approach to understanding cloud services and how they might best fit your organization. It’s not so much “buyer beware” as “buyer understand.”

Big Data Trends with Oracle Bare Metal Cloud

Andrew Reichman, Sr. Director of Cloud Strategy, Oracle

Cloud has changed the game when it comes to data analytics. Previously, organizations had to lock themselves into a particular architecture and level of capacity for three to seven years and do all the lifting themselves. Cloud on the other hand allows them to experiment with different hardware and software options, get more of the solution as a service and scale up and down to meet project spikes and accelerate busy jobs at will. This makes it much more viable for any company to get the advantages of advanced analytics against large data sets, without an oversize IT staff or huge capital investments.
Oracle Cloud is specifically designed to help enterprises take advantage of cloud for data analytics: it offers massive non-variable performance, predictable low cost and a broad choice of deployment and software options. Oracle and Qubole work together to deliver a new breed of data platform, capable of taming the scale, performance, cost and complexity issues associated with gaining business insight from data of all types.
Watch this webinar to understand:
- Summary of industry trends for big data on the cloud
- How Oracle Cloud Infrastructure is optimized for big data workloads from a cost, performance and flexibility perspective
- How Oracle Cloud Big Data solutions compare with on-premises and competing cloud options

Data-Driven Cultures

A Data Platform to Enable Intelligent Features

Karthik Subramaniam, Data Platform Lead, Data Science & Engineering, Under Armour Connected Fitness

Under Armour built the world's most comprehensive health and wellness application: the Connected Fitness Data Platform. It consists of event streaming pipelines and processing using big data technologies like Hive, Presto and Spark to derive the insights needed to keep their users fit and healthy. Discover their step-by-step process.

Building a Culture to Support a Big Data Model in the Cloud

Dale Treece, Senior Solutions Architect and Engineering Lead, Digital Data Services, Scripps Networks Interactive

For years IT has been tasked to produce, gather, and store large volumes of internal and external data. We’re now engaged in developing the infrastructure to support analysis of that data. A cloud-based, self-service, big-data model can be the answer and provide numerous benefits and efficiencies. But with those benefits, there are cultural, organizational and architectural hurdles to clear. We will discuss the challenges we faced at Scripps Networks Interactive, and the successful team and architectural outcomes that emerged.

Qubole and Talend: A Match Made in the Cloud

Shawn James, Sr. Director Technology Alliances, Talend

Talend provides the data agility businesses need to use the latest cloud technologies to act with insight across their organization and win in an economy being deeply transformed by exploding data volumes, technology innovation, and fundamental changes to the IT infrastructure. Join us to learn how Talend and Qubole together help companies’ business users execute data preparation workloads in the cloud at a fraction of the cost and resources.

Creating Real Value: Bridging the Gap between Analysis and Action

Dillon Morrison, Platform Alliances Manager, Looker & Andrew Wynn, Product Manager, Looker

Looker is the modern data platform that democratizes data analytics, creates meaningful insights, and powers critical business actions. The Looker platform allows you to analyze your data and act on it within a single interface, enabling both business users and analysts to add maximum value where it matters most. Stop jumping between tabs and tools – do it all in Looker.

Bringing DevOps to the World of Data Science

Sridhar Alla, Big Data Architect, Comcast

We will look at how DevOps is making Data Science more mainstream with automation, release trains, agility and operational readiness. In this talk, we will look at the various tools and techniques in building a successful Data Science practice and how DevOps can be introduced to provide Continuous Integration, Delivery, and Deployment of Data Science Models.

Get “Datopia”: Transforming to an Agile Data Culture

Tripp Smith, Chief Technical Officer, Clarity

This webinar is focused on:
- Increasing collaborative friction between engineers, analysts, and business
- Process-driven iteration i.e. balancing agility with discipline
- Making the quantitative business case for moving from big bang to continuous enhancement (convincing your CFO/CIO to shift from CapEx to OpEx)
- Case studies and outcomes from our clients

Fireside Chat: Lessons Learned from Facebook

Ashish Thusoo, CEO/Co-Founder, Qubole & Horia Margarit, Resident Data Scientist, Qubole

Qubole's CEO and CMO discuss Data Platforms 2017 highlights, favorite sessions, common themes, and future trends.

Data Science

Industrializing Data Science Workflows

Sean Downes, Senior Data Scientist, Expedia

Discover the evolution of data science workflows implemented at Expedia with a special emphasis on Learning to Rank problems. This session will explore the process of industrializing the data science workflow and best practices on how to keep your data productive, or even pull your organization out of the data swamp.

Data Science Stack in the Cloud

Evan Harris, Data Scientist, Return Path

Journey from exploration and visualization to machine learning and natural language processing. Discover how Return Path built a cloud-based, production-ready, enterprise-scale data solution without a dedicated DevOps team. Leveraging modern distributed computing frameworks like Spark and managed services like EMR and Qubole was key to the process.

Deep Learning for Biotechnology on Qubole

Matt Der, Chief Technology Officer, Notch

In the biological sciences, hypothesis-driven experiments and bottom-up design experiments rely on predicting what will happen with new cells and molecules. Machine learning excels at prediction and has become more democratized, making it an important component in the biotech toolkit. We use Merck's Kaggle competition as a representative task in this domain that involves predicting molecular activity from numeric descriptors of chemical structure. Our approach utilizes deep neural networks using the Keras library in a Qubole notebook, which is conveniently attached to an autoscaled Spark cluster. We use Spark to distribute the hyperparameter search for optimizing the neural net.
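
The idea of fanning a hyperparameter search out across workers can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python's standard-library thread pool as a stand-in for a Spark cluster so that it runs anywhere, and `toy_validation_loss` is a hypothetical scoring function standing in for training a Keras network and measuring its validation loss. The parameter names and values are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def toy_validation_loss(params):
    """Hypothetical stand-in for training a network and returning its validation loss."""
    lr, units = params["learning_rate"], params["hidden_units"]
    return (lr - 0.01) ** 2 * 1e4 + (units - 128) ** 2 / 1e4


def grid(space):
    """Expand {name: [values]} into a list of parameter dicts, one per grid point."""
    names = sorted(space)
    return [dict(zip(names, combo)) for combo in product(*(space[n] for n in names))]


def parallel_search(space, score, workers=4):
    """Score every grid point in parallel and return (best_params, best_loss)."""
    candidates = grid(space)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so losses[i] belongs to candidates[i].
        losses = list(pool.map(score, candidates))
    best = min(range(len(candidates)), key=losses.__getitem__)
    return candidates[best], losses[best]


search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "hidden_units": [64, 128, 256],
}
best_params, best_loss = parallel_search(search_space, toy_validation_loss)
```

Each grid point is independent of the others, which is what makes this kind of search easy to distribute. On a Spark cluster the same shape would typically be expressed by parallelizing the candidate list and mapping the scoring function over it, though the talk's actual implementation is not described here.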

Azure Machine Learning and R to Speed-up Data Science Projects

Scott Donohoo, Technology Solutions Professional, Microsoft, & Erik Zwiefel, Technology Solutions Professional, Microsoft

This session will cover how Azure Machine Learning and R can help data scientists overcome the following challenges:
- Development Time - Dramatically reduce the time of running initial ML experiment validations.
- Performance - Option for best in class performance.
For deeper data science needs, the session will explore how experienced data scientists can use R to tackle the most complex scenarios. Finally, we will explore Python integration on the Microsoft stack and what is new with CNTK and TensorFlow in Azure.

BS-free Data Science

Aman Naimat, Senior Vice President, Technology, DemandBase

There is a surge in hype around Artificial Intelligence. Startups are raising hundreds of millions of dollars by bedazzling investors with Deep Learning, word embeddings, and reinforcement learning. This is a distraction from the very real problems that data and AI can solve if done right. By working across dozens of machine learning problems that are live in the real world, I’ve worked out the most common problems encountered and recurring design patterns on how to solve real-world problems using AI as a tool. This talk will arm you with a perspective on how to get pragmatic solutions with AI today.

Data Platforms 2018

Sign up to hold your seat at the second annual Data Platforms 2018, a unique opportunity for data practitioners to come together to share best practices enabling companies to become data driven.
