Mark Runals .conf2014 Presentation

Copyright © 2014 Splunk Inc.

Taming Your Data
Mark Runals
Sr Security Engineer
The Ohio State University

Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Agenda
• OSU Splunk deployment – environmental background
• Props/field extraction score methodology
• Look at data curator app
FYI - Splunk Admin Focused Presentation

Some Background & Program Drivers

OSU Environment
135 distributed IT units around OSU
• Each group is autonomous
• No standardization
• Huge variety of technologies
• Splunk use not mandatory

+ Desired lightweight onboarding process
• For units & for Splunk team

= Incredible roll-on/adoption rate

Fast Forward a Year or 2 +/-
• 2TB of data
• 1,800+ Splunk agents
• 10k devices
• 12 types of firewalls
• Multiple OS
• 90+ teams with data in Splunk
• 700+ sourcetypes – many 'learned'
• 350+ people
Is data being ingested correctly?
What fields have been defined? Where?
What types of data are in Splunk?
What's not configured correctly?

Issue Overview

Out of the box and without specific data definition, Splunk will generally ingest data correctly:
• Host names
• Sourcetypes
• Timestamp
• Line breaking
• Auto key-value fields

At best, though, this isn't efficient. At worst, it can strain your deployment and may drop/lose events.

Factors in play
• Hardware
• Ratio of indexers to total log volume
• Sourcetype velocity
• Data distribution (forwarders pre 5.0.4 will favor the first indexer listed in autoLB outputs.conf)
• Weird date/time information in your logs
• Etc…

Data Import/Definition Pipeline (Mark's View)

Get Data to Splunk

Data Management (DM) = Index Time Processing
• Sourcetyping
• Line breaking
• Timestamp
• Host field
• etc.

Knowledge Management (KM) = Search Time Processing
• Base level field extraction
• Normalized field names
• Field name alignment within Common Information Model (CIM)
• Knowledge objects

The Plan

Data Curator App
• Data Management: score based on 'Getting Data in Correctly' .conf 2012 preso
• Knowledge Management: score based on length of fields relative to _raw length (conversation with Kevin Meeks)
• Identify Common Issues: munge through internal logs
• Data Taxonomy: create way to classify sourcetypes

Data Management – Props Score

[mah_data_stanza]
TIME_PREFIX =                +1
MAX_TIMESTAMP_LOOKAHEAD =    +1
TIME_FORMAT =                +1
   OR DATETIME_CONFIG =      +3
SHOULD_LINEMERGE =           +1 (conditions below)
LINE_BREAKER =               +1
TRUNCATE =                   +1 (non-default value only)
TZ =                         +1

SHOULD_LINEMERGE
• SHOULD_LINEMERGE = False: +1
• ….but what if my data should be merged? SHOULD_LINEMERGE = True AND one of these is populated: +1
   BREAK_ONLY_BEFORE
   MUST_BREAK_AFTER
   MUST_NOT_BREAK_BEFORE
   MUST_NOT_BREAK_AFTER

LINE_BREAKER
• Default is ([\r\n]+)
• Don't want to line break? ((?!)) or ((*FAIL)) are a couple of options*
* http://answers.splunk.com/answers/106075/each-file-as-one-single-splunk-event

TRUNCATE
• Default is 10000: +0
• Game your score! Set this to anything other than the default, e.g. 10001 or 999999

TZ
• If setting this across your environment isn't possible/practical, reduce the max score macro in the app. It's used as a variable.
• Macro: props_score_upper_bounds = 6 (down from 7)

Max Score = 7
(st_score * `props_score_scale`) / `props_score_upper_bounds`
Slide callout next to the formula: 10 (presumably the value of `props_score_scale`)
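As a rough illustration of the rubric above, the scoring can be sketched in Python. This is not the Data Curator app's own code; the function names, the representation of a stanza as a dict, the accepted boolean spellings, and the default scale of 10 are all assumptions made for the example. DATETIME_CONFIG is read here as an alternative that earns the three timestamp points by itself.

```python
TRUNCATE_DEFAULT = "10000"
# Settings that make SHOULD_LINEMERGE = True count toward the score.
LINE_MERGE_HELPERS = (
    "BREAK_ONLY_BEFORE",
    "MUST_BREAK_AFTER",
    "MUST_NOT_BREAK_BEFORE",
    "MUST_NOT_BREAK_AFTER",
)
TIMESTAMP_KEYS = ("TIME_PREFIX", "MAX_TIMESTAMP_LOOKAHEAD", "TIME_FORMAT")


def is_set(stanza, key):
    """True when the key is present with a non-empty value."""
    return bool(str(stanza.get(key, "")).strip())


def raw_props_score(stanza):
    """Return the raw 0-7 score for one props stanza given as a dict (hypothetical helper)."""
    score = 0
    # Timestamp: DATETIME_CONFIG alone earns 3 points; otherwise +1 per setting.
    if is_set(stanza, "DATETIME_CONFIG"):
        score += 3
    else:
        score += sum(is_set(stanza, key) for key in TIMESTAMP_KEYS)
    # Line merging: False scores outright; True needs one helper populated.
    # Only "true"/"false"/"1"/"0" are recognized here (a simplification).
    merge = str(stanza.get("SHOULD_LINEMERGE", "")).strip().lower()
    if merge in ("false", "0"):
        score += 1
    elif merge in ("true", "1") and any(is_set(stanza, k) for k in LINE_MERGE_HELPERS):
        score += 1
    if is_set(stanza, "LINE_BREAKER"):
        score += 1
    # TRUNCATE only scores when it differs from the default of 10000.
    if is_set(stanza, "TRUNCATE") and str(stanza["TRUNCATE"]).strip() != TRUNCATE_DEFAULT:
        score += 1
    if is_set(stanza, "TZ"):
        score += 1
    return score


def scaled_props_score(stanza, scale=10, upper_bounds=7):
    """(st_score * scale) / upper_bounds, mirroring the macro formula on the slide."""
    return raw_props_score(stanza) * scale / upper_bounds
```

Calling `scaled_props_score(stanza, upper_bounds=6)` corresponds to lowering `props_score_upper_bounds` when setting TZ everywhere is not practical.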
Props Score Caveats

There are a lot of additional props settings that could be applicable for your data/environment.

This method/app doesn't address host fields that are incorrect.

Default host field?
• syslog
• Splunk UF

Field Extraction Score Methodology

Sample event:
10.10.10.10 - - [20/Aug/2014:13:44:03.151 -0400] "POST /services/broker/phonehome/connection_10.10.10.10_8089_10.10.10.10_TEST-TS_68D82260-CC1D-4203-83CA-6E24F9FE6538 HTTP/1.0" 200 24 - - - 1ms

Field length labels on the sample event: 11, 11, 4, 11, 7, 36, 8, 3, 2, 3

Length of fields / _raw length = % of event that has fields defined

Length of fields
1. Account for any autokv field names
2. Do convoluted search to get length of fields
3. Account for timestamp in log
4. Get total length

_raw length
1. Remove spaces
2. Remove newline characters
3. Get _raw length
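The ratio described above can be sketched in Python as follows. This is an illustration only, not the app's search: the function name, the dict of extracted fields, and the `autokv_names` parameter are hypothetical, and stripping spaces and newlines from field values as well as from `_raw` is an assumption made so that both sides are measured the same way. The timestamp is simply passed in as one more field, and the cap at 100 follows the caveat stated later in the deck.

```python
# Characters removed before measuring length: spaces and newline characters.
_STRIP = str.maketrans("", "", " \r\n")


def field_coverage_percent(raw, fields, autokv_names=(), cap=100.0):
    """Percent of the stripped event text covered by extracted field values.

    raw          -- the event text (_raw)
    fields       -- dict of field name -> extracted value (include the timestamp)
    autokv_names -- field names found by automatic key=value extraction; the
                    name itself also appears in _raw, so its length is counted
    """
    stripped_raw = raw.translate(_STRIP)
    if not stripped_raw:
        return 0.0
    total = 0
    for name, value in fields.items():
        total += len(str(value).translate(_STRIP))
        if name in autokv_names:
            total += len(name)
    # Scores above 100 are clamped to 100.
    return min(cap, 100.0 * total / len(stripped_raw))
```

Because overlapping or aliased fields are counted more than once, the result is directional rather than exact, which is why the clamp is needed.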
* The sample event is not a great example – Splunk forwarder phonehome logs actually have 100%+ field length compared to _raw

Field Extraction Score Methodology – Caveats/Considerations
• Doesn't account for field aliases (will artificially inflate score)
• If field extraction % is over 100, the score is set to 100
• Directionally correct is about the best this will get
   • Fields extracted != field value

Data Taxonomy

Version 1 – deprecated out of the box

Designed to answer "What type of data is in Splunk?"

Created a 2nd field classification CSV for several hundred sourcetypes
• Data family
• Data subtype

Very useful, but too many one-to-many relationships based on data use. Example: netstat
• Server Monitoring
• Server Information
• Server Configuration
• Server Performance
Configuration? Networking? Too many server *

Data Taxonomy – Interactive Host Dashboard
Host A

Data Taxonomy – Interactive Host Dashboard
Host B

Data Curator App – Goals / Take Note!
Goals
• Flexible scoring scale
• Generate aggregate, system maturity scores
• Generate ~accurate individual maturity score
• Show what app/package contained props settings
• Show current props settings
• Highlight issues related to/solvable by props settings
   – Line breaking
   – Timestamp
   – Transforms issues

Take Note!
• Will NOT tell you what the settings should be
• Requires Splunk 6 search head
• Only able to work through issues I saw in my environment - you may have others
• I can troubleshoot my app – not your deployment =)

Deployment At A Glance

Props Score Breakdown
Holy Crap!! Lots of Work
….but before you slit your wrists

Learned Sourcetypes (-too_small OR -#)
Beware of diminishing returns on working the 'long tail'

Sourcetype Deep Dive Dashboard
Avamar Logs
• Not all items factor into score
• Loaded score based on volume of events per punct. Score created on the fly
• Based on volume of events per punct: a quick way to see how unique logs in a particular sourcetype are. Avamar logs had 75 unique punct

Sourcetype Deep Dive Dashboard
ABDCB (learned)

Sourcetype Deep Dive Dashboard
Argus

Identifying Date/Time Issues
• These events don't have timestamps!
• What if Splunk thinks the last known good timestamp was 6 years ago?

Date/Time Workspace Dashboard
Pre-populated with sourcetypes having issues (no time information set)
Additional Dashboard Elements
• Clustered internal logs giving you a level of visibility
• 100 most recent events
(DATETIME_CONFIG added to view after screenshot)

Line Breaking/Truncate Workspace Dashboard

Line Breaking Sanity Check Dashboard
• Sourcetypes have line breaking set but have multiple line counts in recent events
• Set in multiple apps; potential problem down the road?

Query Troubleshooting
Two main scheduled searches that are somewhat computationally expensive.
Dashboard allows admin to compare run length & frequency to coverage.
Sourcetype field length percentage query

Extract/Report/Transforms Issues

Example Internal Warning Logs
08-21-2014 08:55:46.348 -0400 WARN SearchOperator:kv - IndexOutOfBounds invalid The FORMAT capturing group id: id=7, transform_name='Message'
08-21-2014 08:59:02.854 -0400 WARN SearchOperator:kv - Invalid key-value parser, ignoring it, transform_name='extract_cmd_change'
08-21-2014 08:59:03.345 -0400 WARN SearchOperator:kv - Invalid key-value parser, ignoring it, transform_name='(?i)^(?:[^\|]*\|){3}(?P<dest_domain>[^\|]+)'

…wut? Which app? In props or transforms?
Solution: grep -r through 520+ packages in deployment-apps directory for 'Message'?

Extract/Report/Transforms Issues
• Only 5 tokens
• Anyone know what the issue is?
• Should be an EXTRACT

KM – Sourcetype Fields Comparison
Bottom of explanatory text. There is a freeform text search box at top of dashboard.

App Roadmap

Now
• Props maturity scores
• Field extraction scores
• Issues workspaces
• Data taxonomy

Next
• Dashboard optimization (i.e. searchTemplate)
• Tag based data taxonomy
• Any initial app bug fixes
Relatively non-scaling

After Next
• Tie in data model fields
• Field value?
• Expand issue troubleshooting
Based on community feedback

Questions?

.conf 14 updated Getting Data in Correctly presentation – Andrew Duca
Blog: runals.blogspot.com
Check out the Forwarder Health app in Splunkbase

THANK YOU