Veridian 3: Difference between revisions

From KaiRoWiki
 
(42 intermediate revisions by the same user not shown)
Line 11: Line 11:
* Improving Search Capabilities
* Improving Search Capabilities
* Improving Classification and Characterization of Crash Reports and Improved Signature Generation
* Improving Classification and Characterization of Crash Reports and Improved Signature Generation
* additional correlation reports to help identifying circumstances around the crash and steps to reproduce. [tag includes everything on correlations]
* additional correlation reports to help identifying circumstances around the crash and steps to reproduce
* Improve Trend Reports to identify and alert teams about Explosive bugs [tag includes all trend reports]
* Improve Trend Reports to identify and alert teams about Explosive bugs


== Bugzilla Tags ==
== Bugzilla Tags ==


These tags are used to classify Socorro bugs. All of them start with "V3", and they will be documented here. Bug counts as of 02/23 are given in parentheses.
Moved to https://wiki.mozilla.org/CrashKill/Plan/BugLists
 
* '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-integrity&sharer_id=5189 V3-integrity]''' (32): Affecting Crash Data Integrity, i.e. quality of the original data we have stored
* '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-search&sharer_id=5189 V3-search]''' (29): Search Capabilities
* '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-classify&sharer_id=5189 V3-classify]''' (46): Classification and Characterization of Crash Reports and Signature Generation
* '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-correlation&sharer_id=5189 V3-correlation]''' (33): Correlation reports to help identifying circumstances around the crash and steps to reproduce
* '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-trends&sharer_id=5189 V3-trends]''' (18): Trend Reports, e.g. to identify and alert teams about Explosive bugs
* '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-newreports&sharer_id=5189 V3-newreports]''' (18): New reports (requests for generating new reports) - currently in V3-classify
* '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-UI&sharer_id=5189 V3-UI]''' (102): User Interface issues
* '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-nonHTMLoutput&sharer_id=5189 V3-nonHTMLoutput]''' (11): Non-HTML/web output (.csv, feeds, etc.)
* '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-notify&sharer_id=5189 V3-notify]''' (14): Notifications (to be) sent out by the Socorro system
* '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-infra&sharer_id=5189 V3-infra]''' (111): Infrastructure and backend issues (note: out of the direct focus of my project, subject to internal planning in the Socorro team)
* '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-config&sharer_id=5189 V3-config]''' (21): Configuration adaptations (skiplist additions, etc.)
* '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-productization&sharer_id=5189 V3-productization]''' (14): Making Socorro a product that can be deployed and understood by others (documentation, etc.)
* '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-datarequest&sharer_id=5189 V3-datarequest]''' (7): Data requests (bugs that request data through manual jobs)
 
Planned but not yet created (or not yet fully done) categories/tags:
 
* '''[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=V3-UItweaks&sharer_id=5189 V3-UItweaks]''' (): UI tweaks (probably easy to solve, small UI issues) - subgroup of V3-UI


== Prioritization Comments From Socorro Users ==
== Prioritization Comments From Socorro Users ==


<wsmwk> KaiRo: second tier needs might be bug 421119, bug 518823, bug 578376, bug 411354.  third tier: bug 527304, bug 512910, better workflow for updating skiplist)
Moved to https://wiki.mozilla.org/CrashKill/Plan/Priorities
<firebot> Bug https://bugzilla.mozilla.org/show_bug.cgi?id=421119 min, P3, 2.1, nobody, NEW, function for socorro to compare stacks of two or more crash reports
<firebot> Bug https://bugzilla.mozilla.org/show_bug.cgi?id=518823 enh, --, Future, nobody, NEW, indicate bug's status for bugzilla keyword topcrash
<firebot> Bug https://bugzilla.mozilla.org/show_bug.cgi?id=578376 nor, --, ---, nobody, NEW, multiple crashes from a single person should have less weight then many crashes from different peopl
<firebot> Bug https://bugzilla.mozilla.org/show_bug.cgi?id=411354 nor, P1, 2.0, nobody, REOP, Add ability to search by build ID
<firebot> Bug https://bugzilla.mozilla.org/show_bug.cgi?id=527304 enh, --, ---, nobody, REOP, provide smart analysis ala talkback


== Explosive Crashes ==
== Explosive Crashes ==


Notes on the work on a set of criteria for finding explosive crash reports - [https://bugzilla.mozilla.org/show_bug.cgi?id=629049 bug 629049] is the tracker bug, [https://bugzilla.mozilla.org/show_bug.cgi?id=629062 bug 629062] is detection. The [https://wiki.mozilla.org/Socorro:PRD_2.x#New_.2F_explosive_.2F_critical_crash_tracking PRD doc] has some surrounding info, but no criteria yet.
Moved to https://wiki.mozilla.org/CrashKill/Plan/Explosive
 
=== Personal Notes ===
 
* Sharp increase at certain wall-clock time across versions
* Sharp increase at certain build ID (date?) on single version/series (possibly ignoring everything in version string starting with first letter if the version ends in "pre", to have e.g. 5.0a3pre->5.0b1pre or 4.0b11pre->4.0b12pre not disturb the analysis)
* Ignore (suspected) duplicates
* Frequency weighted by ADU more important than bare count (from something chofmann has said)
* I'm not fond of topcrash rank comparisons, as 20 crashes with similar frequency changing place looks overvalued there, while e.g. #1 having 10,000 crashes and #3 having 500 fully mask #2 exploding from 600 to 5,000 in a day.
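
The version-series idea from the notes above could look roughly like this. It is only a sketch: the function name <code>version_series</code> is made up for illustration, and it assumes that "first letter" means the first ASCII letter in the version string.

```python
import re


def version_series(version):
    """Collapse a "pre" version string to its series (hypothetical helper).

    If the version ends in "pre", everything from the first letter onward
    is dropped, so 5.0a3pre and 5.0b1pre both land in the series "5.0".
    Other version strings are returned unchanged.
    """
    if version.endswith("pre"):
        # A string ending in "pre" always contains at least one letter.
        first_letter = re.search(r"[A-Za-z]", version)
        return version[:first_letter.start()]
    return version


if __name__ == "__main__":
    for v in ("5.0a3pre", "5.0b1pre", "4.0b11pre", "4.0b12pre", "3.6.2"):
        print(v, "->", version_series(v))
```

With this grouping, build IDs from 4.0b11pre and 4.0b12pre are analyzed as one series, while release versions such as 3.6.2 stay separate.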
 
=== Criteria Proposal ===
 
This is a quite rough proposal right now.
 
# Get two sets of numbers per signature:
#* crashes occurred per day and total ADU for the last 10 days
#* crashes and ADU per combination of version series (see personal notes) and date of build ID, for the last 10 available build ID dates in the version series
# Calculate:
#* average crashes per 1 million ADU over 7 values before recent value ("base")
#* distance of that average to the highest value in set ("dist"), clamped to 1 crash/MADU
#* recent value ("data")
#* '''(total|version)_explosiveness_1''' = (data-base)/(dist-base)
# Calculate:
#* average crashes per 1 million ADU over 7 values before recent 3 values ("base")
#* standard deviation of that average ("dist"), clamped to .5 crash/MADU
#* average of recent 3 values ("data")
#* '''(total|version)_explosiveness_3''' = (data-base)/(dist-base)
# Mark as explosive in UI if '''*_explosiveness_1 > 3''' or '''*_explosiveness_3 > 2'''.
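
A minimal sketch of the calculation steps above, for one signature and one series of 10 values (per day, or per build ID date). Several things in it are assumptions and not part of the proposal: all function names, the use of the population standard deviation, reading "highest value in set" as the highest of the 7 base values, and reading "clamped" as a lower bound. It also divides by "dist" alone instead of the literal "(dist-base)", because "dist" is already described as a distance or deviation from the base; that is an interpretation, not a confirmed correction.

```python
from statistics import mean, pstdev


def rate_per_madu(crashes, adu):
    """Crashes per 1 million ADU for one day or one build ID date."""
    return crashes * 1_000_000 / adu


def _check(rates):
    # Both measures together need the last 10 values of the series.
    if len(rates) < 10:
        raise ValueError("need at least 10 values")


def explosiveness_1(rates):
    """Most recent value compared with the 7 values before it."""
    _check(rates)
    data = rates[-1]
    window = rates[-8:-1]                 # 7 values before the recent value
    base = mean(window)
    dist = max(max(window) - base, 1.0)   # assumed: lower bound of 1 crash/MADU
    return (data - base) / dist           # assumed reading of the formula


def explosiveness_3(rates):
    """Average of the recent 3 values compared with the 7 values before them."""
    _check(rates)
    data = mean(rates[-3:])
    window = rates[-10:-3]                # 7 values before the recent 3
    base = mean(window)
    dist = max(pstdev(window), 0.5)       # assumed: lower bound of .5 crash/MADU
    return (data - base) / dist           # assumed reading of the formula


def is_explosive(rates):
    """Thresholds as given in the proposal."""
    return explosiveness_1(rates) > 3 or explosiveness_3(rates) > 2


if __name__ == "__main__":
    flat = [10.0] * 10
    spike = [10.0] * 9 + [50.0]
    print(is_explosive(flat), is_explosive(spike))
```

The same functions would be called twice per signature, once with the total series and once with each version series, to get the "total" and "version" variants.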
 
Problems with this proposal:
* Completely arbitrary numbers, need to see if they catch all explosives and/or catch too much
 
=== User Comments ===
 
From https://wiki.mozilla.org/Socorro:PRD_Interviews
 
damon:
  * (initial) growth of more than 25 positions in the ranking
  * upwards change in rank and no related bugzilla id
  * time since startup < 1 minute
  * highlight these crashes in red or something
 
From https://bugzilla.mozilla.org/show_bug.cgi?id=525316
 
morgamic:
  My suggestion for a delta to watch is an increase in crash frequency of more
  than 50-75% and new crashes in the top 20 overall signatures by version.
 
=== Data From Previous Explosive Crash Bugs ===


These bugs were found with the [https://bugzilla.mozilla.org/buglist.cgi?status_whiteboard_type=allwordssubstr;query_format=advanced;status_whiteboard=explos explosive bug query]; the notes below try to pull out how each of them was explosive.
== Reports ==


* https://bugzilla.mozilla.org/show_bug.cgi?id=503946
Moved to https://wiki.mozilla.org/CrashKill/Plan#External_Reports
** #16 tc (2-week) for 3.6 on 2010-01-26, #2 in 1-day, #3 in 3-day
** crash numbers with that signature: 2010-01-24 145 (days before similar), 2010-01-25 1950, 2010-01-26 11731
** percentage of total crashes on 2010-01-25 was 4 times as high on 3.6 as on 3.5
* https://bugzilla.mozilla.org/show_bug.cgi?id=528798
** Rise from 7-18 null signature crashes with comments to >100 (with some days of 30s or 50s) within a week or less after the 3.5.5 release.
** Increase in total null-signature crashes from <5000 to >6000 within 2-3 days
* https://bugzilla.mozilla.org/show_bug.cgi?id=530074
** crashes with that signature jumped from 45-65 to 570 and higher within 3 days
* https://bugzilla.mozilla.org/show_bug.cgi?id=536974
** jumped up 41 ranks in top crasher analysis to #15 on 3.6b5 in 3 days
** two signatures, both from <30 total crashes per day to >300 within a week
* https://bugzilla.mozilla.org/show_bug.cgi?id=538687
** uptick from 45-60 (2009-12-16 to 2010-01-02, with some single-day spikes above) to multiple consecutive days with 80-100 (2010-01-03 to -07) total crashes per day
** percentage of total crashes on 3.6 is about a factor 50-100 higher than on 3.5
** up to 416 crashes on 2010-03-10, 600-900 on 2010-03-12 to -14, >1600 on 2010-03-15, >1000 until -19
** #45 topcrash in 3.6b5 (2010-01-08), #8 in early 3.6.2 top crash data (2010-03-23)
* https://bugzilla.mozilla.org/show_bug.cgi?id=538998
** From 0 to >100 in two days, from 94-115 to >3700 in one day
** #10 tc in early 3.6rc1 reports
** from 3-12 total crashes per hour to 100-600 with a sharp cutoff hour
* https://bugzilla.mozilla.org/show_bug.cgi?id=543646
** from 0 to >1000 crashes in two days, staying there at least 3 days
* https://bugzilla.mozilla.org/show_bug.cgi?id=546632
** new crash in top 30 tc on 3.6
** from 0 to >100 in a day, from 260 to >1000 in 7 days
** roughly factor 5 between 3.6 and other versions in percentage of total crashes
* https://bugzilla.mozilla.org/show_bug.cgi?id=547210
** 0 to >3000 in a day, stayed >1000 for 3 days at least
* https://bugzilla.mozilla.org/show_bug.cgi?id=547622
** Top 50 Crash for 3.6 (+149!) Firefox 3.6 Crash Report
** started showing up around the first of November with 1-10 crashes per day, then 10-40 crashes per day in Dec, 100-150 in January, and last few days of February was running at 400-712 crashes per day
* https://bugzilla.mozilla.org/show_bug.cgi?id=553581
** +224 positions up to Top 33 Crash for 3.6
* https://bugzilla.mozilla.org/show_bug.cgi?id=554660
** from 45-90 to >100 in a day, >400 in 3 days and rising further (>600 in 6 days etc.)
* https://bugzilla.mozilla.org/show_bug.cgi?id=558955
** 200-500 crashes per day in first half of April '10, up from 0-5 crashes per day in Nov '09 through early March '10
** ~5 to >100 in 3 days, somewhat back down, then ~80 to >200 in 3 days, 150-180 to >500 in 3 days
* https://bugzilla.mozilla.org/show_bug.cgi?id=570722
** from 7-18 to >600 in a day
* https://bugzilla.mozilla.org/show_bug.cgi?id=595957
** from 30-90 to >3000 in 3 days

Latest revision as of 18:20, March 16, 2011

On a planet called Veridian III, a decisive battle was fought to prevent the firing of a rocket into a star, which would have changed gravitational forces, made "the nexus" crash into the planet, and destroyed the planet with a shock wave. Preventing this catastrophe kept the crash of the USS Enterprise on the planet controllable enough that there were no human casualties.

In the same spirit, the project I internally dub "Veridian 3" is about dealing with crashes to make the bad ones preventable and other ones more controllable, all through prioritizing Socorro work.

Project areas

Those areas have been specified in the contract:

  • Improving Crash Data Integrity
    • Identification and Removal of Duplicate Crash reports
  • Improving Search Capabilities
  • Improving Classification and Characterization of Crash Reports and Improved Signature Generation
  • Additional correlation reports to help identify circumstances around the crash and steps to reproduce
  • Improve Trend Reports to identify and alert teams about Explosive bugs

Bugzilla Tags

Moved to https://wiki.mozilla.org/CrashKill/Plan/BugLists

Prioritization Comments From Socorro Users

Moved to https://wiki.mozilla.org/CrashKill/Plan/Priorities

Explosive Crashes

Moved to https://wiki.mozilla.org/CrashKill/Plan/Explosive

Reports

Moved to https://wiki.mozilla.org/CrashKill/Plan#External_Reports