Processing vulnerability data for evidence & agreement
Hey Cee, I went ahead and added the "uncertainty aggregated" sheet that Oronde shared with us. It looks to match the SB data after you aggregate. The evidence categories of `small`, `medium`, and `large` make sense here and follow what Oronde shared:
Amt of Evidence:
- Small: fewer than 5 studies measured the indicator.
- Medium: 5-9 studies measured the indicator.
- Large: more than 9 studies measured the indicator.
Still a bit stuck confirming the level of agreement (mismatches between the agreement ordinal and cardinal values); I thought I understood, but I could still be doing the math wrong?
Level of Agreement:
- NA: the level of agreement could not be calculated because the indicator was measured only once.
- Low: the most recorded direction of influence reflects less than 50% of models measuring the determinant.
- Medium: the most recorded direction of influence reflects more than 50% but less than 74% of models measuring the determinant.
- High: more than 74% of models measuring the determinant recorded the same direction of influence on conditions of water insecurity.
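For reference, here is a minimal sketch of how those cutoffs could be encoded with `dplyr::case_when()`, assuming hypothetical columns `total_studies` (number of studies measuring the indicator) and `pct_top_direction` (share of models recording the most common direction of influence); the real column names in the pipeline will differ:

```r
library(dplyr)

# Sketch only: column names are placeholders, not the pipeline's actual names.
classify_uncertainty <- function(df) {
  df |>
    mutate(
      # Amt of evidence: Small < 5 studies, Medium 5-9, Large > 9
      evidence_level = case_when(
        total_studies < 5 ~ "Small",
        total_studies <= 9 ~ "Medium",
        TRUE ~ "Large"
      ),
      # Level of agreement: NA if measured once; otherwise based on the share
      # of models recording the most common direction. The metadata leaves
      # exactly 50% and 74% unassigned, so the boundaries here are assumptions.
      agreement_level = case_when(
        total_studies == 1 ~ NA_character_,
        pct_top_direction < 50 ~ "Low",
        pct_top_direction <= 74 ~ "Medium",
        TRUE ~ "High"
      )
    )
}
```

Because `case_when()` evaluates conditions in order, the 5-9 band does not need an explicit lower bound; anything below 5 is already caught by the first condition.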
requested review from @cnell
```r
# Confirming raw data matches `p1_unc_stats` from SB
tar_target(p2_unc_agg_summary,
  p1_unc_agg |>
    group_by(dimension, determinant, indicator) |>
    summarize(
      total_positively_related = sum(positively_related, na.rm = TRUE),
      total_negatively_related = sum(negatively_related, na.rm = TRUE),
      total_unrelated = sum(unrelated, na.rm = TRUE),
      total_unknown_direction = sum(unknown_direction, na.rm = TRUE),
      total_significant = sum(significant, na.rm = TRUE),
      total_not_significant = sum(not_significant, na.rm = TRUE),
      total_unknown = sum(unknown, na.rm = TRUE))
),
# Based on metadata:
# Amt of evidence: Small = total_studies < 5; Medium = total_studies 5-9; Large = total_studies > 9
# Amt of agreement: Low = < 50% of models; Medium = > 50% & < 74% of models; High = > 74% of models; NA if the level of agreement could not be calculated as indicator was measured only once.
tar_target(p2_agree_evid_stats,
  p1_unc_stats |>
    mutate(
      evidence_val = pos_related + neg_related + unrelated + unk_direction,
      # lots of mismatched ordinal vs cardinal values for agreement
      agreement_val = (pos_related / (sig_strength + not_sig_strength + unk_strength) * 100))
```

Instead of using `pos_related` here, I think it should be whichever value out of `pos_related`, `neg_related`, `unrelated`, or `unknown_direction` is the highest. You could do something like this to find the max value in each of those categories:

```r
top_trend <- p2_unc_agg_summary |>
  select(dimension, determinant, indicator, total_positively_related,
         total_negatively_related, total_unrelated, total_unknown_direction) |>
  pivot_longer(!c(dimension, determinant, indicator)) |>
  group_by(dimension, determinant, indicator) |>
  slice_max(value) |>
  rename(sig_name = name, sig_value = value)
```

and then use a join to add it back to `p2_unc_agg_summary` and calculate the %:

```r
p2_unc_agg_summary |>
  left_join(top_trend) |>
  mutate(
    # evidence_val recomputed here from the aggregated direction totals
    evidence_val = total_positively_related + total_negatively_related +
      total_unrelated + total_unknown_direction,
    level_agreement = 100 * (sig_value / evidence_val)
  )
```

Try that out and see if it provides the expected results.
Edited by Cee Nell
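To track down the ordinal-vs-cardinal mismatches, one option is to re-derive the Low/Medium/High labels from the calculated percentage and flag rows that disagree with the published values. A rough sketch, assuming the joined result above is saved as `p2_agreement_calc`, that `p1_unc_stats` carries the same `dimension`/`determinant`/`indicator` keys, and that its published ordinal column is named `level_of_agreement` (all of these names are assumptions):

```r
library(dplyr)

# Sketch: recompute the ordinal agreement label from the calculated percentage
# and keep only the rows where it differs from the published value.
p2_agreement_calc |>
  mutate(
    agreement_recalc = case_when(
      evidence_val == 1 ~ NA_character_,   # indicator measured only once
      level_agreement < 50 ~ "Low",
      level_agreement <= 74 ~ "Medium",    # 50%/74% boundaries are assumptions
      TRUE ~ "High"
    )
  ) |>
  left_join(
    p1_unc_stats |> select(dimension, determinant, indicator, level_of_agreement),
    by = c("dimension", "determinant", "indicator")
  ) |>
  # coalesce() so rows where only one side is NA still show up as mismatches
  filter(coalesce(agreement_recalc, "NA") != coalesce(level_of_agreement, "NA"))
```

Any rows this returns are either genuine calculation differences or indicator-name mismatches between the two tables (see the thread below).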
In `p1_unc_stats` there are 106 rows, but `p2_unc_agg_summary` has 107 rows. I expected these to be the same length, since `p2_unc_agg_summary` is essentially recreating the stats that are published in `p1_unc_stats` from the raw data. Do you know if there are any indicators missing between the two?

Good catch - yeah, after looking at the spreadsheets (case differences and slight differences in indicator names made it hard to compare indicators in R), it looks like `p1_unc_stats` contains `Other non-white` while `p2_unc_agg_summary` does not, and `p2_unc_agg_summary` has `amenities` & `hazard type` while `p1_unc_stats` does not.

Edited by Azadpour, Elmera
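For that kind of comparison, here is a small sketch of how the mismatched indicators could be surfaced directly in R, using `anti_join()` on a lightly standardized key so that case and whitespace differences don't get in the way (the standardization choices are assumptions):

```r
library(dplyr)
library(stringr)

# Standardize indicator names so case and stray whitespace don't mask matches
std_ind <- function(df) {
  df |> mutate(indicator_std = str_squish(str_to_lower(indicator)))
}

# Indicators in the published stats with no counterpart in the aggregation
missing_from_agg <- std_ind(p1_unc_stats) |>
  anti_join(std_ind(p2_unc_agg_summary), by = "indicator_std") |>
  distinct(indicator)

# Indicators in the aggregation with no counterpart in the published stats
missing_from_stats <- std_ind(p2_unc_agg_summary) |>
  anti_join(std_ind(p1_unc_stats), by = "indicator_std") |>
  distinct(indicator)
```

That should flag `Other non-white`, `amenities`, and `hazard type` without the manual spreadsheet comparison, and the standardization step is easy to extend if other naming quirks turn up.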
mentioned in issue #2 (closed)
mentioned in commit d53d6557
mentioned in issue #46