Media logging manual
Workflows
User submitting debug info
Basically:
get samples
run a script
submit that directory
This is detailed in tools/ffmpeg/readme_run_tests.txt.
Changing code in pyglet ffmpeg subsystem
Prepare as described in readme_run_tests.txt; optionally install the bokeh library (http://bokeh.pydata.org/en/latest/index.html) for visualization support.
The basic flow is:
initialize the active session subsystem: set the environment variable pyglet_mp_samples_dir to the desired samples_dir.
record a session with the initial state:
    configure.py new <session> [playlist]
    run_test_suite.py
Follow this workflow
while True:
    edit code
    commit to hg
    record a new session:
        configure.py new <new session> [playlist]
        run_test_suite.py
    look at the last session reports in samples_dir/session/reports, especially 00_summary.txt, which shows defect stats and lists condensed info about any failing sample; for more detail, look at the individual reports.
    compare with previous sessions if desired:
        compare.py <session1> <session2>
    render additional reports:
        report.py sample
    or visualize the data collected with:
        bokeh_timeline.py sample
    if results are as wanted, break
done; you may want to delete the sessions for intermediate commits
It is possible to return to a previous session to request additional reports:
configure.py activate <session>
report.py ...
You can list the known sessions for the current samples_dir with:
configure.py list
Important
All this debugging machinery depends on a detailed and accurate capture of media_player related state, currently in examples/media_player.py and pyglet.media.player.
Modifications in those modules may require matching modifications in pyglet/media/sources/instrumentation.py, and further propagation to other modules.
Changing the debug code for pyglet ffmpeg
For initial debugging of debug code, where there are misspellings and trivial errors to weed out, creating a new session for each run_test_suite.py run may be inconvenient.
The flag dev_debug can be set to true in the session configuration file; this allows the session to be rewritten.
Keep in mind that some raw data will be stale or misleading:
The ones captured at session creation time (currently pyglet.info and pyglet_changeset)
The collected crashes info (new crashes will not be seen)
If media_player.py crashes before doing any writing, the state recording will be the previous recording.
The reports using that stale raw data will obviously report stale data.
So it is a good idea to switch to a normal workflow as soon as possible (simply create a new session and delete the special session).
Session
If playlist_file is not specified, the implicit playlist consists of all files in samples_dir except those with the extensions “.dbg”, “.htm”, “.html”, “.json”, “.log”, “.pkl”, “.py”, “.txt”; subdirectories of samples_dir are not explored.
If a playlist_file is specified, it should contain one sample name per line; a sanity check ensures that no blacklisted extension is used and that each sample exists in samples_dir.
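The two playlist rules above can be sketched as follows. This is a minimal illustration, not the code in configure.py: the function names, the sorted order, the case-sensitive extension comparison and the skipping of blank lines are all assumptions made for the example.

```python
import os

# Extensions the manual lists as excluded from playlists.
EXCLUDED_EXTENSIONS = {".dbg", ".htm", ".html", ".json", ".log", ".pkl", ".py", ".txt"}


def implicit_playlist(samples_dir):
    """Return the sample names an implicit playlist would contain (hypothetical helper)."""
    names = []
    for name in sorted(os.listdir(samples_dir)):  # sorting is an assumption
        path = os.path.join(samples_dir, name)
        # Only plain files directly in samples_dir; subdirectories are not explored.
        if not os.path.isfile(path):
            continue
        # Case-sensitive comparison is an assumption of this sketch.
        if os.path.splitext(name)[1] in EXCLUDED_EXTENSIONS:
            continue
        names.append(name)
    return names


def check_playlist(samples_dir, playlist_lines):
    """Return the problems found in an explicit playlist, empty if none (hypothetical helper)."""
    problems = []
    for line in playlist_lines:
        name = line.strip()
        if not name:
            continue  # assumption: blank lines are ignored
        if os.path.splitext(name)[1] in EXCLUDED_EXTENSIONS:
            problems.append("excluded extension: " + name)
        elif not os.path.isfile(os.path.join(samples_dir, name)):
            problems.append("sample not found: " + name)
    return problems
```

Here check_playlist takes the lines of the playlist file, so it can be fed directly from an open file object.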
Once the playlist_file is used in configure.py new, a copy is written to the session raw data directory, and this copy becomes the authoritative playlist for the session; playlist_file can be deleted if desired.
Specifying a playlist is useful in development to restrict the tests to samples relevant to the feature or issue under work.
The session name is used to create a samples_dir subdirectory that stores the test results; hence it must differ from previous session names, and it must not contain slashes (/), backslashes (\) or characters forbidden in directory names.
Active session
Most commands and subcommands target the currently active session.
A session becomes active when
a configure.py new session [playlist] is issued
a configure.py activate session is issued
The current implementation relies on two pieces of data to determine the active session:
the environment variable pyglet_mp_samples_dir, which specifies samples_dir, the directory where all the media samples reside. Under the current path schema this is also where session data is stored, one subdir per session.
a file activation.json in samples_dir, storing the name of the current active session.
Notice that the second precludes running two commands in parallel targeting two different sessions in the same samples_dir.
The concept of active session plus the enforced path schema avoids the need to provide paths at each command invocation, making for fewer errors, easier docs and less typing.
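The lookup described above could look roughly like this. It is only a sketch: the manual does not document the contents of activation.json, so the key name "active_session" is a hypothetical placeholder, and the function is not part of the fs module.

```python
import json
import os

ENV_VAR = "pyglet_mp_samples_dir"    # name given in this manual
ACTIVATION_FILE = "activation.json"  # name given in this manual
SESSION_KEY = "active_session"       # hypothetical key, not taken from the real file


def active_session(environ=None):
    """Return (samples_dir, session_name); session_name is None if no session is active."""
    environ = os.environ if environ is None else environ
    samples_dir = environ.get(ENV_VAR)
    if not samples_dir:
        raise RuntimeError("environment variable %s is not set" % ENV_VAR)
    path = os.path.join(samples_dir, ACTIVATION_FILE)
    if not os.path.isfile(path):
        # Assumption: a missing activation file means no session is active.
        return samples_dir, None
    with open(path, "r") as f:
        data = json.load(f)
    return samples_dir, data.get(SESSION_KEY)
```

Because the active session is a single file per samples_dir, two processes reading it always see the same session, which is why parallel runs against different sessions in one samples_dir are not possible.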
Commands Summary
Primary commands
They are the ones normally used by developers
configure.py, mp.py: session creation, activation, protection, status and list all.
run_test_suite.py: plays the session’s samples, reports results.
report.py: produces the specified report for the specified sample.
timeline.py: translates the event stream to a stream of media_player state, useful to pass to other software.
bokeh_timeline.py: visualization of the data collected for the specified sample.
Helper commands
Somewhat an artifact of run_test_suite.py development, these can help in testing the debugging subsystem. run_test_suite.py is basically playmany.py + retry_crashed.py + summarize.py. When trying to change run_test_suite.py it is easier to first adapt the relevant helper.
playmany.py: plays the active session's samples, recording media_player state along the play.
retry_crashed.py: plays again the samples that have always been seen crashing, hoping to get a recording with no crash. Motivated by early tests on Ubuntu, where sometimes (but not always) a sample would crash the media_player.
summarize.py: using the raw data produced by the two previous commands, elaborates some reports, aiming to give an idea of how well the run went and which samples should be investigated.
Data directory layout
samples_dir/ : directory where the samples live, also used to store sessions data
    <session name>/ : directory to store session info, one per session, named as the session.
        dbg/ : recording of media_player events captured while playing a sample, one per sample, named as sample.dbg; additional versioning info, other raw data collected.
            _crashes_light.pkl : pickle with info for retry crashed
            _pyglet_hg_revision.txt
            _pyglet_info.txt
            _samples_version.txt
            _session_playlist.txt
            <one .dbg file per sample in the session playlist, named sample.dbg>
        reports/ : human readable reports rendered from the raw data (.txt), visualizations (.html), intermediate data used by other tools (.pkl)
        configuration.json : session configuration info, mostly permissions
    activation.json : holds the name of current active session
    <sample> : one for each sample
A subdirectory of samples_dir is detected as a session dir if:
it is a direct child of samples_dir
it has a configuration.json file
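The detection rule can be illustrated with a short sketch. The function name is hypothetical and this is not the code behind configure.py list; it only applies the two conditions stated above.

```python
import os


def list_sessions(samples_dir):
    """Return the names of the direct subdirectories that hold a configuration.json."""
    sessions = []
    for name in sorted(os.listdir(samples_dir)):  # sorting is an assumption
        candidate = os.path.join(samples_dir, name)
        # Condition 1: a directory that is a direct child of samples_dir.
        if not os.path.isdir(candidate):
            continue
        # Condition 2: it contains a configuration.json file.
        if os.path.isfile(os.path.join(candidate, "configuration.json")):
            sessions.append(name)
    return sessions
```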
policies:
it should be hard to rewrite the .dbg files (recordings of media_player states); think of a developer analyzing data sent by a user.
Code Layout and conventions
The emerging separation of responsibilities is as follows
Scripts (commands)
Structured as:
uses the if __main__ idiom to allow use as a module (testing, sharing)
sysargs_to_mainargs(): sys.argv translation to main params
main(...)
    params validation and translation to adequate code entities (uses module fs).
    translates exceptions to prints (uses module mpexceptions)
    short chain of instantiations / function calls to accomplish the command goals, no logic or calculations here.
other functions and classes: code specific to this command, delegates as much as possible to modules.
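A script following this structure might be laid out as in the sketch below. Only the names sysargs_to_mainargs and main come from the convention above; the command name, the describe function and the use of ValueError in place of the mpexceptions classes are placeholders for illustration.

```python
import sys


def sysargs_to_mainargs(argv=None):
    """Translate sys.argv into the parameters expected by main."""
    argv = sys.argv if argv is None else argv
    sample = argv[1] if len(argv) > 1 else None
    return (sample,)


def main(sample):
    """Validate params, translate exceptions to prints, run a short chain of calls."""
    if sample is None:
        print("Usage: example_command.py sample")  # hypothetical command name
        return 1
    try:
        result = describe(sample)
    except ValueError as ex:
        # The real scripts translate mpexceptions errors to prints;
        # ValueError stands in for them in this sketch.
        print("Error: %s" % ex)
        return 1
    print(result)
    return 0


def describe(sample):
    """Command-specific code; real scripts delegate most of the work to modules."""
    if not sample:
        raise ValueError("empty sample name")
    return "would process sample %s" % sample


if __name__ == "__main__":
    # Runs only when executed as a script, so the file stays importable as a module.
    main(*sysargs_to_mainargs())
```

Keeping argument parsing in its own function lets another script, or a test, import the module and call main directly with already parsed values.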
When two scripts use some related but not identical functionality, these parts can be moved to another module. Example: at first summarize had the code to collect defects stats; later, when compare was written, the module extractors was added and the defect stats collection code moved to that module.
If script B needs a subset of unchanged script A functionality, it imports A and uses what it needs. An example is retry_crashed, which calls into playmany.
Because of the last point, some scripts will also be listed as modules.
Modules
buffered_logger
Accumulation of debug events while playing media_player, saves when sample’s play ends
instrumentation
Defines the events that modify media_player state. Defines which events are potential defects. Gives the low level support to extract info from the recorded data.
For new code here, keep accepting and returning only data structures, no paths or files.
fs
Path building for entities in a session directory should be delegated to fs.PathServices.
Session creation, activation and management are at the start of fs.
Version capture is handled at the start of module fs.
Utility functions to load and save are at the end of fs.
While there isn’t a Session object, in practice the code identifies and provides access to a particular session's data by handling a fs.PathServices instance.
extractors
Analyzes a media_player recording to build specific info on behalf of reporters. Uses instrumentation to get input data about the media_player state sequence seen while playing a sample.
Defines object types to collect some specific info about a replay.
reports
Formats as text info captured / generated elsewhere.
mpexceptions
Defines exceptions generated by code in the ffmpeg debug subsystem.
Scripts that also act as modules
timeline
Renders the media player’s debug info to a format more suitable to postprocess in a spreadsheet or other software, particularly to get a data visualization (used by bokeh_timeline.py).
playmany
Produces media_player debug recordings. Runs python scripts as subprocesses with a timeout (used by retry_crashed.py).
Commands detailed
bokeh_timeline.py
Usage:
bokeh_timeline.py sample
Renders media player’s internal state graphically using bokeh.
Arguments:
sample: sample to report
The output will be written to the session’s output dir under reports/sample.timeline.html.
Notice the plot can be zoomed live with the mouse wheel, but you must click the button that looks like a distorted OP; it also pans with mouse drag.
Example:
bokeh_timeline.py small.mp4
will write the output to reports/small.mp4.timeline.html.
compare.py
Usage:
compare.py --reldir=relpath other_session
Builds a report comparing the active session with other_session.
Outputs to samples_dir/relpath/comparison_<session>_<other_session>.txt.
configure.py
Usage:
configure.py subcommand [args]
Subcommands:
new session [playlist] : Creates a new session, sets it as the active one
activate session : activates a session
deactivate : no session will be active
protect [target]: forbids overwrite of session data
status : prints configuration for the active session
help [subcommand] : prints help for the given subcommand or topic
list : list all sessions associated with the current samples_dir
Creates and manages pyglet media_player debug session configurations.
Most commands and subcommands need the environment variable pyglet_mp_samples_dir to be set to the directory where the media samples reside.
The configuration stores some values used when other commands are executed, mostly protection status.
This command can be called either as configure.py or as mp.py; they do the same.
mp.py
alias for configure.py
playmany.py
Usage:
playmany.py
Uses media_player to play a sequence of samples and record debug info.
A session must be active, see command configure.py.
If the active configuration has disallowed dbg overwrites it will do nothing.
If a playlist was provided at session creation, then only the samples in the playlist will be played, otherwise all files in samples_dir.
report.py
Usage:
report.py sample report_name
Generates a report from the debugging info recorded while playing sample.
Arguments:
sample: sample to report
report_name: desired report, one of
"anomalies": Start, end and interesting events
"all": All data is exposed as text
"counter": How many occurrences of each defect
The report will be written to the session’s output dir under reports/sample.report_name.txt.
Example:
report.py small.mp4 anomalies
will write the anomalies report to reports/small.mp4.anomalies.txt.
The authoritative list of reports available comes from reports.available_reports
retry_crashed.py
Usage:
retry_crashed.py [--clean] [max_retries]
Inspects the raw data collected to get the list of samples that crashed the last time they were played. Then it replays those samples, recording new raw data for them.
The process is repeated until all samples have a recording with no crashes or the still-crashing samples have been played max_retries times in this command run.
Notice that only samples recorded as crashing in the last run are retried.
A configuration must be active, see command configure.py.
Besides the updated debug recordings, a state is built and saved:
total_retries: total retries attempted, including previous runs
sometimes_crashed: list of samples that crashed one time but later completed a play
always_crashed: list of samples that always crashed
Options:
--clean: discards crash data collected in a previous run
max_retries: defaults to 5
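The retry policy described for this command can be sketched as a small loop. This is an illustration under stated assumptions, not the code in retry_crashed.py: play is a hypothetical callable that returns True when the sample crashed, and total_retries here counts only the rounds of this run, whereas the saved state also includes previous runs.

```python
def retry_loop(crashed, play, max_retries=5):
    """Replay crashing samples until none crash or max_retries rounds are done.

    crashed: iterable of sample names recorded as crashing in the last run.
    play: hypothetical callable, play(sample) -> True if the play crashed.
    Returns (total_retries, sometimes_crashed, still_crashing).
    """
    still_crashing = set(crashed)
    sometimes_crashed = set()
    total_retries = 0
    while still_crashing and total_retries < max_retries:
        total_retries += 1
        # sorted() returns a new list, so the set can be modified inside the loop.
        for sample in sorted(still_crashing):
            if not play(sample):
                still_crashing.discard(sample)
                sometimes_crashed.add(sample)
    return total_retries, sometimes_crashed, still_crashing
```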
run_test_suite.py
Usage:
run_test_suite.py [samples_dir]
Plays media samples with the pyglet media_player, recording debug information for each sample played and writing reports about the data captured.
Arguments:
samples_dir: directory with the media samples to play
If no samples_dir is provided the active session is the target. If an explicit playlist was specified when creating the session, then only the samples in the playlist will be played, otherwise all samples in samples_dir will be played.
If samples_dir is provided, a session named testrun_00 (_01, _02, … if that name was taken) will be created, with no explicit playlist, and then the command operates as in the previous case.
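The naming rule for the automatic session can be sketched as below. The function name is hypothetical, and the two-digit zero padding is inferred from the names shown above rather than taken from the implementation.

```python
def next_testrun_name(existing_names):
    """Return the first testrun_NN name not present in existing_names."""
    taken = set(existing_names)
    index = 0
    while True:
        name = "testrun_%02d" % index
        if name not in taken:
            return name
        index += 1
```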
Output files will be in:
samples_dir/session/dbg : binary capture of media_player events, other raw data captured
samples_dir/session/reports : human readable reports
Note
This script will refuse to overwrite existing test_run results.
Output files will be in subdirectories:
samples_dir/test_run/dbg
    Each sample will generate a sample.dbg file storing the sequence of player debug events seen while playing the sample. It is simply a pickle of a list of tuples, each tuple an event. These are not meant for direct human use, but to run some analyzers to render useful reports.
    A crash_retries.pkl file, a pickle of (total_retries, sometimes_crashed, still_crashing) <-> (int, set, set).
    A pyglet.info captured at session creation to track hw & sw.
    A pyglet hg revision captured at session creation.
samples_dir/test_run/reports
    Human readable outputs, described in command summarize.py
    Later a user can generate visualizations and additional reports that will be stored in this directory
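Since a sample.dbg file is described as a pickle of a list of tuples, it can be inspected with the standard library alone. The sketch below is not part of the instrumentation module; in particular, treating the first field of each tuple as the event name is an assumption, as the tuple layout is not documented here.

```python
import pickle


def load_events(dbg_path):
    """Load the list of event tuples stored in a sample.dbg file."""
    with open(dbg_path, "rb") as f:
        return pickle.load(f)


def count_event_kinds(events):
    """Count events by their first field (assumed here to be the event name)."""
    counts = {}
    for event in events:
        kind = event[0]
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Unpickling can execute arbitrary code, so this should only be done on recordings from a trusted source.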
summarize.py
Usage:
summarize.py
Summarizes the session info collected with playmany and retry_crashed.
A configuration must be active, see command configure.py.
If a playlist was provided at session creation, then only the samples in the playlist will be played, otherwise all files in samples_dir.
Produces human readable reports, constructed from the .dbg files.
Output will be in samples_dir/test_run/reports
The files in that directory will be:
00_summary.txt, which provides:
    basic defects stats over all samples
    a paragraph for each non-perfect sample play with the count of each anomaly observed
03_pyglet_info.txt, pyglet.info output giving OS, python version, etc. (as captured at session creation).
04_pyglet_hg_revision.txt, pyglet hg revision if running from a repo clone, not written if no repo (as captured at session creation).
sample_name.all.txt and sample_name.anomalies.txt for each sample that played non-perfectly.
sample_name.all.txt has all info in the sample_name.dbg in human readable form, that is, the sequence of player’s internal events along the play.
sample_name.anomalies.txt is a reduced version of the .all variant: normal events are not shown, only anomalies.
timeline.py
Usage:
timeline.py sample [output_format]
Renders the media player’s debug info to a format more suitable to postprocess in a spreadsheet or other software, particularly to get a data visualization.
See output details in the manual.
Arguments:
sample: sample to report
output_format : one of { "csv", "pkl"}, by default saves as .pkl (pickle)
The output will be written to the session’s output dir under reports/sample.timeline[.pkl or .csv].
Example:
timeline.py small.mp4
will write the output to reports/small.mp4.timeline.pkl.
Note
.csv output is currently not implemented.
Samples
Samples should be small; at the moment I suggest an arbitrary limit of 2MB and 2 minutes. The samples dir contains a _sources.txt which lists where each sample comes from.
Caveat:
Samples are not ‘certified to be compliant with the specification’.
When possible, samples should be played with non-ffmpeg software for incidental confirmation that they are well formed:
*.mp4, *.3gp played well with Windows Media Player for win7
*.ogv, *.webm played well with Firefox 54.0
*.flv, *.mkv played well with VLC Media player, but VLC uses ffmpeg
Surely the sample set will be refined as time goes by.
pycharm notes
For examples/video_ffmpeg module visibility and code completion, that directory should be a ‘content root’ in pycharm settings | ‘project structure’; as project roots cannot nest, the pyglet working copy cannot be a ‘content root’, so I removed it; I also added working_copy/pyglet as another ‘content root’ so pycharm also plays well in the library proper. This was with pycharm 2017.2.