Temporary Delayed Data Processing & Availability
Incident Report for Heap
Resolved
All clear!
Posted Feb 12, 2018 - 10:36 PST
Monitoring
The app has been returned to its normal state and all customers should have access. We are continuing to monitor the situation to ensure stability and performance.
Posted Feb 10, 2018 - 23:06 PST
Update
Migrations and Sources data have resumed normal behavior for all customers. This means that your Identify calls will work as expected and Sources data (Stripe, Mailchimp, etc.) is being ingested again.

Affected environments remain closed. You may contact support@heapanalytics.com to re-enable access to your data.

During this time, new customers may experience slower queries.
Posted Feb 08, 2018 - 16:29 PST
Identified
On Wednesday afternoon at ~3:00PM (PST), Heap experienced multiple hardware failures resulting in several data availability issues for a subset of our customer base as well as ongoing data processing issues for all of our customers. Data collection is unaffected and no data has been lost.

We’re currently investigating internally and working with our hosting provider to identify a root cause. Once we’ve identified the root cause, we will post an update with more information, including estimated resolution time frames and the steps we’re taking to prevent this type of problem in the future.

In particular, the following aspects of Heap data processing will not occur until the incident is resolved:

- Data from third-party Sources will be neither ingested nor available for querying from the time of the incident onward. This data will be fully available once the incident is resolved.
- User migrations will not be processed. If an `identify` call is made to merge two users, it will not be processed and the users will remain separate. If you identify an anonymous user, the event data for the user until the point of identification and the event data after the point of identification will also remain separate during this time period. You may notice discrepancies in user count and user flows for this reason during this time period.
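
As a rough illustration of why user counts can look inflated while migrations are paused, the sketch below counts distinct users with and without a pending `identify` merge applied. The IDs, event names, and the merge model are made up for illustration and are not Heap's actual implementation.

```python
# Hypothetical event records: (user_id, event_name). Illustrative data only.
events = [
    ("anon-123", "view_pricing"),
    ("anon-123", "click_signup"),
    ("alice@example.com", "complete_signup"),
]

# A pending identify call linking the anonymous ID to the identified user.
# Assumption: a migration is modeled as a simple old-ID -> new-ID mapping.
pending_identifies = {"anon-123": "alice@example.com"}


def count_users(events, migrations):
    """Count distinct users after applying the given ID migrations."""
    return len({migrations.get(user_id, user_id) for user_id, _ in events})


# While migrations are paused, the anonymous and identified records stay separate.
users_during_incident = count_users(events, {})
# Once migrations are processed, both records collapse into one user.
users_after_recovery = count_users(events, pending_identifies)
```

In this toy dataset the same person is counted as two users until the migration is applied, which is the kind of discrepancy described above.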

We’re working to determine the timeline to resolve these issues and will update as soon as we have an estimated timeline available.

A small subset of our customer base also experienced a period of failing queries during this incident. These environments are currently unavailable for querying and will remain unavailable until we can fully restore our machines from the hardware failure. Users attempting to query data in an affected environment will see a message that the environment is unavailable for maintenance. Heap SQL syncs will not be available for affected environments until the environments are restored and available for querying in the UI.

We’ve contacted the specific customers affected by this incident to provide context and resolution timelines specific to their dataset, but please do not hesitate to contact support@heapanalytics.com if you have any questions! We will provide individual updates directly as soon as your dataset is available for querying.

For more context, disks on two of our user and event database machines failed unexpectedly this afternoon. We typically experience a disk failure on a single machine every few months, and our infrastructure is built to be robust to single-machine failure because our entire dataset is replicated. This incident, however, caused two machine failures, leaving us unable to query a small portion of the dataset. We’re currently working with our hosting provider (AWS) to determine the root cause of the multiple disk failures so that we can prevent future incidents. All of the missing data is available in backups, and we are in the process of restoring these machines. As mentioned, data will be temporarily unavailable for this small subset, and no data has been lost.
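
To make the single-failure versus double-failure distinction concrete, here is a minimal sketch. It assumes each shard of data is stored on exactly two machines; the replication factor, machine names, and shard layout are hypothetical and are not taken from Heap's actual infrastructure.

```python
# Assumption: each shard lives on exactly two machines (replication factor 2).
# Machine and shard names are hypothetical.
replicas = {
    "shard-a": {"db-1", "db-2"},
    "shard-b": {"db-2", "db-3"},
    "shard-c": {"db-3", "db-1"},
}


def unavailable_shards(replicas, failed):
    """Return the shards whose every replica sits on a failed machine."""
    return sorted(
        shard for shard, machines in replicas.items() if machines <= failed
    )


# One machine down: every shard still has a healthy replica.
single_failure = unavailable_shards(replicas, {"db-1"})
# Two machines down: any shard stored only on those two becomes unqueryable.
double_failure = unavailable_shards(replicas, {"db-1", "db-2"})
```

Under these assumptions a single failure leaves every shard queryable, while two simultaneous failures take out only the shard whose replicas were both on the failed machines. That shard would need to be restored from backup.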

We sincerely apologize for any problems this incident may cause for you and your organization. We treat preventing incidents such as these with the highest priority and will update you as soon as we have more information. As always, please don’t hesitate to reach out to us at support@heapanalytics.com if there’s anything we can help with or any questions we can answer for you.
Posted Feb 07, 2018 - 21:41 PST
Update
Some customers may be experiencing failures when running queries. Some environments may be unavailable. We're currently looking into the cause. Data collection has not been affected and no data has been lost.
Posted Feb 07, 2018 - 16:52 PST
Investigating
Some customers may be experiencing failures when running queries. We're currently looking into the cause.
Posted Feb 07, 2018 - 15:38 PST
Monitoring
Queries should be back to normal, but we're continuing to monitor the affected systems.
Posted Feb 07, 2018 - 14:23 PST
Identified
We've identified the cause of the failing queries and are working on a fix.
Posted Feb 07, 2018 - 13:35 PST
Investigating
Some customers may be experiencing failures when running queries. We're currently looking into the cause.

Data collection has not been affected.
Posted Feb 07, 2018 - 13:21 PST