Scroll down to view the list of Keynotes, Tech Talks, Hands-On Labs, and Networking
Wednesday May 24
Qubole Co-Founders Ashish Thusoo and Joydeep Sen Sarma welcome you to Data Platforms 2017 to kick off this inaugural event. Pick up a copy of their new book, “Creating a Data Driven Enterprise with DataOps: Insights from Facebook, Uber, LinkedIn, Twitter and eBay,” published by O’Reilly Media and released May 24.
Enjoy cocktails & conversation.
Thursday May 25
Ashish Thusoo will discuss his role in creating the first modern big data platform at Facebook, as well as insights from the new book. Joining Ashish will be book contributors and big data pioneers Shrikanth Shankar of LinkedIn and Karthik Ramasamy, cofounder of Streamlio and formerly at Twitter, who will share how they led their organizations through similar transformations to become data-driven businesses: what they did, how they did it, and the lessons learned on their journeys.
When we were growing up….
From startups to enterprises, industry leaders will discuss the growth aspects and challenges from various stages along the way. Which of the challenges led to technological innovation within the organization and/or adoption of newer tools and technologies and why?
Karthik Subramaniam, Data Platform Lead, Data Science & Engineering, Under Armour Connected Fitness
Oskar Austegard, Director, Data Solutions, Gannett
Colin Riddell, Senior Data Architect, Epic Games
Wade Warren, VP Engineering, Wikia
Tripp Smith, Clarity Insights
Rakesh Soni, Intersys Consulting
Moderated by: Andy Sautins, Technical Manager, Google
Experience the future! Join Joydeep Sen Sarma and team for some exciting announcements and cool demos!
Industry visionaries from Amazon, Microsoft and Oracle share their views on the future and promise of the next wave of cloud computing. Hear from:
Practitioners share best practices, techniques, challenges and solutions in these deep dive sessions on the technical, organizational and cultural aspects of building modern, big data platforms.
Over the last two years, Fanatics Inc., the global leader in licensed sports merchandise, went through major transformations in technology, and especially in data, not only by moving from on-premises to the cloud but also in how data is strategically used to power the e-commerce and backend supply chain systems. From the very start, we expected the data platform to be exposed via a set of data and analytical web services that can act as a brain to provide a delightful customer experience–whether it’s ranking the relevant products or recommending the most interesting ones.
As web visitors browse through the web pages and transact, all events that are generated flow through a Kafka messaging system–Fanflow. These events are then aggregated by Flink and Spark Streaming consumers, and machine-learned models adapt and react to the changes in behavior evident in these events. Setting up data as services and blurring the line with the application stack to provide business metrics and metadata, in addition to storing the data in a traditional warehouse or data lake, made all the difference. In this session, we will dive deep into a couple of these data services and discuss how you may benefit from implementing similar patterns.
Discover the evolution of data science workflows implemented at Expedia with a special emphasis on Learning to Rank problems. This session will explore the process of industrializing the data science workflow and best practices on how to keep your data productive, or even pull your organization out of the data swamp.
Big Data encompasses a large landscape, and building in the cloud can introduce further unique challenges. Two of the primary ones are cost and storage. Join Kellyn as she discusses the cost savings possible by virtualizing multiple tiers of the big data landscape. She will review real use cases, methods of discovery, and the technical specifications behind different big data platforms when applying virtualization where data is big and platforms are vast.
Saavn is India’s leading music streaming service. Since context is key to music, we have built a system called Sniper that lets us identify cohorts of users in real time and target them for marketing, advertising and recommendation purposes. This system allows us to understand user behavior by quantifying engagement characteristics such as stream consumption, affinities or ads. Speed and scalability are critical to its design. This talk will cover our motivations for building such a system and how big data technologies have helped us architect it.
Speaker: Balaji Mohanam, Product Manager @ Qubole
Discover the newly launched features in Qubole, powered by Data Intelligence, that automate mundane Data Model performance appraisal and simplify Data Ops. This session will provide a detailed walkthrough of Qubole’s latest offering in Data Intelligence, which includes Data Model insights and Recommendations for Partitioning, Formatting and Sorting that help optimize data models for improved performance and use of computing resources. In addition, learn about Qubole’s latest offering in self-service analytics and how it can improve analyst productivity by making data discovery easy through column and table name auto-suggestion and completion, and insights preview.
This presentation will discuss tips and tricks on how to build data sets optimized for Apache Spark. You will learn different aggregation strategies, granularity required for a data set, optimal file formats, partitioning strategies, and when to leverage caching to improve performance.
Speaker: Evan Harris, Data Scientist, Return Path
Journey from exploration and visualization to machine learning and natural language processing. Discover how Return Path built a cloud-based, production-ready, enterprise-scale data solution without a dedicated DevOps team. Leveraging modern distributed computing frameworks like Spark and managed services like EMR and Qubole was key to the process.
Under Armour built the world’s most comprehensive health and wellness application: the Connected Fitness Data Platform. It consists of event streaming pipelines and processing using big data technologies like Hive, Presto and Spark to derive the insights needed to keep their users fit and healthy. Discover their step-by-step process.
Infrastructure planning and implementing cluster best practices can lead to significant cost savings. In this in-depth session, discover the benefits Oracle has reaped from using heterogeneous vs. homogeneous clusters.
Increasingly, valuable customer data sources are dispersed among on-premises systems, SaaS providers, partners, 3rd party data providers and public data sets. Data Lakes in the cloud are the foundation for storing on-premises, 3rd party and public data sets at attractive price / performance. Atop this foundation, a portfolio of descriptive, predictive and real-time agile analytics can empower customers to answer their most important business questions. In this talk, 47Lining CEO Mick Bass will walk attendees through a best-practices data lake reference architecture in AWS and share real-world customer use cases such as predicting customer churn and propensity to buy, detecting fraud, optimizing industrial processes and recommending content.
Behind all the glory of AI and machine learning advancements is the work of data operations. Ensuring that the right data is available to the right system in the right form can be as much as 80% of the overall workload. But don’t take our word for it: come hear the results of the first-ever DataOps professionals survey. In this session you’ll learn how the world’s leading companies are organizing their DataOps teams, what systems they are using, and how managers can better support their efforts. Learn how Nexla uses machine learning to automate common tasks such as data source monitoring, schema management, and data quality management. Leave the session knowing how you’ll spend more time analyzing your data, and less time wrangling it.
At the Oracle Data Cloud, petabytes are processed to create custom, targeted, online advertising. Speed, throughput, and scalability are the three core metrics upon which architecture effectiveness is measured.
The Oracle Data Cloud has moved from on-premises, single-machine data processing to cloud-based Hive and, ultimately, Spark solutions. This talk covers the challenges along the way: successes and failures, and where the easy Hive to Spark SQL (and beyond) translations did not work quite as advertised. The Oracle Data Cloud has improved the speed of processes by over 300%, and eliminated some gotchas that caused months of confusion.
For years IT has been tasked to produce, gather, and store large volumes of internal and external data. We’re now engaged in developing the infrastructure to support analysis of that data. A cloud-based, self-service, big-data model can be the answer and provide numerous benefits and efficiencies. But with those benefits, there are cultural, organizational and architectural hurdles to clear. We will discuss the challenges we faced at Scripps Networks Interactive, and the successful team and architectural outcomes that emerged.
Simple decisions in data lead to extra work for developers. Ingesting files (in CSV, TSV, etc. formats) and getting it right without losing data is an expensive proposition. Compare this to receiving a file that has been optimized or organized into a schema and a good file format ahead of time. If industry and government were transporting files in a better way, we’d all save a lot of time and money, and the tools are widely available.
In any enterprise, one of the most prevalent security risks revolves around who has access to which resources. Whether data is stored in a cloud solution or on-premises, knowing how to provide the correct privileges to associates is a large challenge. By using machine learning and clustering algorithms like the Louvain Method, we can group similar users in the Capital One network and create two valuable features: (1) automated onboarding and (2) automated “rogue access” detection. By using machine learning, we have helped Capital One become a better-managed company and have reduced a major cybersecurity threat. This talk will be a deep dive into the model, data engineering and productionization of the web application interface.
R. David Edelman, President Obama’s “Geek in Chief”
The United States is the global leader in big data technology – so how is the United States Government using big data today and how can it be used in the future? What are the key policy debates that will shape the big data landscape?
Enjoy cocktails and a casual barbecue dinner, and play some bocce or volleyball, poolside at the Wigwam.
Friday May 26
Ashish will kick off the morning with highlights from Thursday’s session. Joining him in conversation is book contributor and former VP of Commerce Platform Infrastructure at eBay, Debashis Saha, who will share his insights and lead a discussion around charting your data platform’s next steps. From there we’ll break into facilitated small group discussions designed to help you hone your action plans.