|
|
(7 intermediate revisions by the same user not shown) |
Line 16: |
Line 16: |
| == Bugzilla Tags == | | == Bugzilla Tags == |
|
| |
|
| These tags are used to classify Socorro bugs. They all start with "V3" and are documented here. The numbers in parentheses are bug counts as of 02/23.
| | Moved to https://wiki.mozilla.org/CrashKill/Plan/BugLists |
| | |
| * '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-integrity&sharer_id=5189 V3-integrity]''' (31): Affecting Crash Data Integrity, i.e. quality of the original data we have stored
| |
| * '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-search&sharer_id=5189 V3-search]''' (28): Search Capabilities
| |
| * '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-classify&sharer_id=5189 V3-classify]''' (47): Classification and Characterization of Crash Reports and Signature Generation
| |
| * '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-correlation&sharer_id=5189 V3-correlation]''' (31): Correlation reports to help identify the circumstances around a crash and the steps to reproduce it
| |
| * '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-trends&sharer_id=5189 V3-trends]''' (17): Trend Reports, e.g. to identify and alert teams about Explosive bugs
| |
| * '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-newreports&sharer_id=5189 V3-newreports]''' (22): New reports (requests for generating new reports)
| |
| * '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-UI&sharer_id=5189 V3-UI]''' (103): User Interface issues
| |
| * '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-UItweaks&sharer_id=5189 V3-UItweaks]''' (58): UI tweaks (probably easy to solve, small UI issues) - subgroup of V3-UI
| |
| * '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-nonHTMLoutput&sharer_id=5189 V3-nonHTMLoutput]''' (12): Non-HTML/web output (.csv, feeds, etc.)
| |
| * '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-notify&sharer_id=5189 V3-notify]''' (14): Notifications (to be) sent out by the Socorro system
| |
| * '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-infra&sharer_id=5189 V3-infra]''' (123): Infrastructure and backend issues (note: out of the direct focus of my project, subject to internal planning in the Socorro team)
| |
| * '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-config&sharer_id=5189 V3-config]''' (20): Configuration adaptations (skiplist additions, etc.)
| |
| * '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-productization&sharer_id=5189 V3-productization]''' (14): Making Socorro a product that can be deployed and understood by others (documentation, etc.)
| |
| * '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-datarequest&sharer_id=5189 V3-datarequest]''' (6): Data requests (bugs that request data through manual jobs)
| |
|
| |
|
| == Prioritization Comments From Socorro Users == | | == Prioritization Comments From Socorro Users == |
|
| |
|
| <wsmwk> KaiRo: second tier needs might be bug 421119, bug 518823, bug 578376, bug 411354. third tier: bug 527304, bug 512910, better workflow for updating skiplist)
| | Moved to https://wiki.mozilla.org/CrashKill/Plan/Priorities |
| <firebot> Bug https://bugzilla.mozilla.org/show_bug.cgi?id=421119 min, P3, 2.1, nobody, NEW, function for socorro to compare stacks of two or more crash reports
| |
| <firebot> Bug https://bugzilla.mozilla.org/show_bug.cgi?id=518823 enh, --, Future, nobody, NEW, indicate bug's status for bugzilla keyword topcrash
| |
| <firebot> Bug https://bugzilla.mozilla.org/show_bug.cgi?id=578376 nor, --, ---, nobody, NEW, multiple crashes from a single person should have less weight then many crashes from different peopl
| |
| <firebot> Bug https://bugzilla.mozilla.org/show_bug.cgi?id=411354 nor, P1, 2.0, nobody, REOP, Add ability to search by build ID
| |
| <firebot> Bug https://bugzilla.mozilla.org/show_bug.cgi?id=527304 enh, --, ---, nobody, REOP, provide smart analysis ala talkback
| |
| <firebot> Bug https://bugzilla.mozilla.org/show_bug.cgi?id=512910 enh, --, ---, nobody, NEW, Make it easier to analyze crashes that share a signature
| |
| | |
| https://bugzilla.mozilla.org/show_bug.cgi?id=551669 provide graphs by crash date too
| |
| Smokey Ardisson (way behind; no bugmail - do not email) <alqahira@ardisson.org> changed:
| |
| CC| |kairo@kairo.at
| |
| | |
| johnjbarton
| |
| Of course *all* of my crashes involve Firebug. The number one question I have when I visit crash-stats site is:
| |
| How many other users who have this crash also running Firebug?
| |
| If the answer is 95%, then I better spend some time on it because no one else will. If the answer is 5%, I'm having lunch.
| |
| | |
| Jeff Muizelaar
| |
| It would be nice if it was possible to get more summary information about a crash.
| |
| For example: What build ids does this crash all occur with? What operating system versions does this all occur with? etc.
| |
| | |
| Josh Matthews
| |
| I, like Jeff, would appreciate summaries of the data available - most recent 10 unique build ids, list of unique OS versions, range of uptimes, etc.
| |
| I would also be really interested in data about spikes - seeing a graph of the number of crashes for a particular signature over time would be useful to track trends.
| |
|
| |
|
| == Explosive Crashes == | | == Explosive Crashes == |
|
| |
|
| Notes on the work on a set of criteria for finding explosive crash reports - [https://bugzilla.mozilla.org/show_bug.cgi?id=629049 bug 629049] is the tracker bug, [https://bugzilla.mozilla.org/show_bug.cgi?id=629062 bug 629062] is detection. The [https://wiki.mozilla.org/Socorro:PRD_2.x#New_.2F_explosive_.2F_critical_crash_tracking PRD doc] has some surrounding info, but no criteria yet.
| | Moved to https://wiki.mozilla.org/CrashKill/Plan/Explosive |
| | |
| === Personal Notes ===
| |
| | |
| * Sharp/significant increase at certain wall-clock time across versions
| |
| * Sharp/significant increase at certain build ID (date?) on single version/series (possibly ignoring everything in version string starting with first letter if the version ends in "pre", to have e.g. 5.0a3pre->5.0b1pre or 4.0b11pre->4.0b12pre not disturb the analysis)
| |
| * Ignore (suspected) duplicates
| |
| * Frequency weighted by ADU more important than bare count (from something chofmann has said)
| |
| * I'm not fond of topcrash rank comparisons, as 20 crashes with similar frequency changing place looks overvalued there, while e.g. #1 having 10,000 crashes and #3 having 500 fully mask #2 exploding from 600 to 5,000 in a day.
| |
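The "version series" idea from the notes above can be sketched as follows. This is only an illustration of the rule as worded there (drop everything from the first letter on when the version ends in "pre"); the function name `version_series` is made up for this sketch and is not part of Socorro.

```python
import re


def version_series(version: str) -> str:
    """Collapse a pre-release version string into its series.

    If the version ends in "pre", everything from the first letter on is
    dropped, so "5.0a3pre" and "5.0b1pre" both map to "5.0".
    Any other version string is returned unchanged.
    """
    if not version.endswith("pre"):
        return version
    # A match always exists here because the string ends in "pre".
    first_letter = re.search(r"[A-Za-z]", version)
    return version[:first_letter.start()]


for v in ("5.0a3pre", "5.0b1pre", "4.0b11pre", "4.0b12pre", "3.6.13"):
    print(v, "->", version_series(v))
```

With this grouping, a switch from 4.0b11pre to 4.0b12pre stays within one series and does not look like a new version to the analysis.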
| | |
| === Criteria Proposal ===
| |
| | |
| This is a quite rough proposal right now.
| |
| | |
| # Get two sets of numbers per signature:
| |
| #* non-duplicate crashes that occurred per day and total ADU for the last 10 days
| |
| #* non-duplicate crashes and ADU per combination of version series (see personal notes) and date of build ID, for the last 10 available build ID dates in the version series
| |
| # For each set, calculate (if there are at least 4 values in the set):
| |
| #* average crashes per ADU over 7 values before recent value ("base")
| |
| #* average ADU over those values ("avgADU")
| |
| #* distance of that average to the highest value in set ("dist"), clamped to a minimum of (50 crashes/avgADU)
| |
| #* recent value per ADU ("data")
| |
| #* '''(total|version)_explosiveness_1''' = (data-base)/dist
| |
| # For each set, calculate (if there are at least 6 values in the set):
| |
| #* average crashes per ADU over 7 values before recent 3 values ("base")
| |
| #* average ADU over those values ("avgADU")
| |
| #* standard deviation of that average ("dist"), clamped to a minimum of (20 crashes/avgADU)
| |
| #* average of recent 3 values per ADU ("data")
| |
| #* '''(total|version)_explosiveness_3''' = (data-base)/dist
| |
| # Mark as explosive in UI if '''*_explosiveness_1 > 3''' or '''*_explosiveness_3 > 2'''.
| |
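A minimal Python sketch of the steps above, for one set of numbers (one signature, either the per-day totals or one version series). Several details are assumptions of this sketch rather than part of the proposal: the function and parameter names are invented, "highest value in set" is read as the highest of the base values, "standard deviation" is taken as the population standard deviation, and shorter sets simply use as many base values as are available.

```python
from statistics import mean, pstdev


def explosiveness_1(crashes, adu, min_crashes=50):
    """Single-value measure. Lists are oldest first; the last entry is the recent value."""
    if len(crashes) < 4:
        return None  # not enough data for a useful number
    rates = [c / a for c, a in zip(crashes, adu)]
    base_rates = rates[-8:-1]            # up to 7 values before the recent one
    avg_adu = mean(adu[-8:-1])
    base = mean(base_rates)
    # Distance from the average to the highest base value, clamped from below.
    dist = max(max(base_rates) - base, min_crashes / avg_adu)
    return (rates[-1] - base) / dist


def explosiveness_3(crashes, adu, min_crashes=20):
    """Three-value measure: average of the recent 3 values against the values before them."""
    if len(crashes) < 6:
        return None
    rates = [c / a for c, a in zip(crashes, adu)]
    base_rates = rates[-10:-3]           # up to 7 values before the recent three
    avg_adu = mean(adu[-10:-3])
    base = mean(base_rates)
    # Spread of the base values, clamped from below.
    dist = max(pstdev(base_rates), min_crashes / avg_adu)
    return (mean(rates[-3:]) - base) / dist


def is_explosive(crashes, adu):
    """Apply the proposed limits: explosiveness_1 > 3 or explosiveness_3 > 2."""
    e1 = explosiveness_1(crashes, adu)
    e3 = explosiveness_3(crashes, adu)
    return (e1 is not None and e1 > 3) or (e3 is not None and e3 > 2)


# Made-up numbers: a quiet signature that suddenly spikes on the last day.
print(is_explosive([10] * 7 + [5000], [1000] * 8))
```

Keeping the two measures as separate functions that return plain numbers matches the idea of storing the explosiveness values and leaving the marking limits to the UI.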
| | |
| ==== Problems with this proposal ====
| |
| * The numbers for the explosiveness marking limits and the "dist" clamping are completely arbitrary; we need to see whether they catch all explosives and/or catch too much.
| |
| * If there's no large enough set of numbers to work with, there's no useful explosiveness.
| |
| * It's unclear whether the version-based numbers give really useful additional value, and they also create a multitude of explosiveness numbers to store (2 per version series).
| |
| * There might be an argument for only calculating the second (*_explosiveness_3) measure, as it's fine-grained enough to catch highly explosive crashes on the first day of explosion.
| |
| | |
| ==== Upsides of this proposal ====
| |
| * Recognizes that dupes and ADU changes can make base values fluctuate and gets rid of those problems.
| |
| * The clamping of "dist" doesn't just prohibit divisions by zero, but also deals with potential skew due to tiny fluctuations in small numbers.
| |
| * Having explosiveness numbers available to UI enables flexibility in marking, sorting and changing limits.
| |
| | |
| ==== Examples ====
| |
| * Bug 554660 (see below) has an interesting example of numbers to look at for this: totals of 54, 72, 86, 83, 67, 46, 47, 123, 131 for 2010-03-08 through 2010-03-16. Here's a look at how this algorithm does, ignoring ADU, which are not given there, and therefore also the clamping:
| |
| ** On 2010-03-15, total_explosiveness_1 would have been 2.7, not yet triggering (?), and total_explosiveness_3 would have been slightly negative, also not triggering.
| |
| ** On 2010-03-16, total_explosiveness_1 would have been 1.2, not triggering, but total_explosiveness_3 would have been 2.2, triggering the warning.
| |
| ** On later days, it should have triggered easily on both values, even with clamping of "dist".
| |
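The figures for bug 554660 can be re-derived with a few lines of Python. As in the example, ADU and the "dist" clamping are left out, so plain crash counts stand in for crashes per ADU. Two points are assumptions of this sketch: the "highest value" is taken from the base values only, and the standard deviation is the population one (`statistics.pstdev`). The helper names are invented for illustration.

```python
from statistics import mean, pstdev

# Totals for 2010-03-08 through 2010-03-16, as quoted above.
totals = [54, 72, 86, 83, 67, 46, 47, 123, 131]


def expl_1(values):
    """Recent value against up to 7 values before it (no ADU, no clamping)."""
    base_values = values[-8:-1]
    base = mean(base_values)
    return (values[-1] - base) / (max(base_values) - base)


def expl_3(values):
    """Average of the recent 3 values against up to 7 values before them."""
    base_values = values[-10:-3]
    return (mean(values[-3:]) - mean(base_values)) / pstdev(base_values)


for day, count in (("2010-03-15", 8), ("2010-03-16", 9)):
    window = totals[:count]  # data available up to and including that day
    print(day, round(expl_1(window), 2), round(expl_3(window), 2))
```

Under these assumptions the values come out at about 2.76 and -0.03 for 2010-03-15 and about 1.17 and 2.24 for 2010-03-16, close to the figures quoted above. Without the clamping, a completely flat base would divide by zero here.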
| | |
| A larger number of examples, including dist clamping (but no ADU) is available as an [[Media:Explosive_calc.ods|ODF spreadsheet]] ([[Media:Explosive_calc.pdf|PDF version]])
| |
| | |
| === User Comments ===
| |
| | |
| From https://wiki.mozilla.org/Socorro:PRD_Interviews
| |
| | |
| damon:
| |
| * (initial) growth of more than 25 positions in the ranking
| |
| * upwards change in rank and no related bugzilla id
| |
| * time since startup < 1 minute
| |
| * highlight these crashes in red or something
| |
| | |
| From https://bugzilla.mozilla.org/show_bug.cgi?id=525316
| |
| | |
| morgamic:
| |
| My suggestion for a delta to watch is an increase in crash frequency of more
| |
| than 50-75% and new crashes in the top 20 overall signatures by version.
| |
| | |
| === Data From Previous Explosive Crash Bugs ===
| |
| | |
| Used the [https://bugzilla.mozilla.org/buglist.cgi?status_whiteboard_type=allwordssubstr;query_format=advanced;status_whiteboard=explos explosive bug query] to find those, trying to pull info out on how those were explosive.
| |
| | |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=503946
| |
| ** #16 tc (2-week) for 3.6 on 2010-01-26, #2 in 1-day, #3 in 3-day
| |
| ** crash numbers with that signature: 2010-01-24 145 (days before similar), 2010-01-25 1950, 2010-01-26 11731
| |
| ** percentage of total crashes on 2010-01-25 was 4 times as high on 3.6 as on 3.5
| |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=528798
| |
| ** Rise from 7-18 null signature crashes with comments to >100 (with some days of 30s or 50s) within a week or less after the 3.5.5 release.
| |
| ** Increase in total null-signature crashes from <5000 to >6000 within 2-3 days
| |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=530074
| |
| ** crashes with that signature jumped from 45-65 to 570 and higher within 3 days
| |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=536974
| |
| ** jumped up 41 ranks in top crasher analysis to #15 on 3.6b5 in 3 days
| |
| ** two signatures, both from <30 total crashes per day to >300 within a week
| |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=538687
| |
| ** uptick from 45-60 (2009-12-16 to 2010-01-02, with some single-day spikes above) to multiple consecutive days with 80-100 (2010-01-03 to -07) total crashes per day
| |
| ** percentage of total crashes on 3.6 is about a factor 50-100 higher than on 3.5
| |
| ** up to 416 crashes on 2010-03-10, 600-900 on 2010-03-12 to -14, >1600 on 2010-03-15, >1000 until -19
| |
| ** #45 topcrash in 3.6b5 (2010-01-08), #8 in early 3.6.2 top crash data (2010-03-23)
| |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=538998
| |
| ** From 0 to >100 in two days, from 94-115 to >3700 in one day
| |
| ** #10 tc in early 3.6rc1 reports
| |
| ** from 3-12 total crashes per hour to 100-600 with a sharp cutoff hour
| |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=543646
| |
| ** from 0 to >1000 crashes in two days, staying there at least 3 days
| |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=546632
| |
| ** new crash in top 30 tc on 3.6
| |
| ** from 0 to >100 in a day, from 260 to >1000 in 7 days
| |
| ** roughly factor 5 between 3.6 and other versions in percentage of total crashes
| |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=547210
| |
| ** 0 to >3000 in a day, stayed >1000 for 3 days at least
| |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=547622
| |
| ** Top 50 Crash for 3.6 (+149!) Firefox 3.6 Crash Report
| |
| ** started showing up around the first of November with 1-10 crashes per day, then 10-40 crashes per day in Dec, 100-150 in January, and last few days of February was running at 400-712 crashes per day
| |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=553581
| |
| ** +224 positions up to Top 33 Crash for 3.6
| |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=554660
| |
| ** from 45-90 to >100 in a day, >400 in 3 days and rising further (>600 in 6 days etc.)
| |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=558955
| |
| ** 200-500 crashes per day in first half of April '10, up from 0-5 crashes per day in Nov '09 through early March '10
| |
| ** ~5 to >100 in 3 days, somewhat back down, then ~80 to >200 in 3 days, 150-180 to >500 in 3 days
| |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=570722
| |
| ** from 7-18 to >600 in a day
| |
| * https://bugzilla.mozilla.org/show_bug.cgi?id=595957
| |
| ** from 30-90 to >3000 in 3 days
| |
|
| |
|
| == Reports == | | == Reports == |
|
| |
|
| Some tools and reports on crash data are currently outside the main Socorro systems:
| | Moved to https://wiki.mozilla.org/CrashKill/Plan#External_Reports |
| * List of Firefox nightly builds, linked to crash queries per build: http://dbaron.org/mozilla/crashes-by-build
| |
| * Top crashes per day: http://people.mozilla.org/~chofmann/crash-stats/
| |
| * Correlation analysis (CPU cores, modules, add-ons): http://people.mozilla.com/crash_analysis/20110228/
| |
| * Top second frame: http://people.mozilla.com/~aphadke/topSecondFrame/
| |