CS6603 AI, Ethics, and Society

Module 1: Data, Individuals, and Society

Notes by Taichi Nakatani (tnakatani3@gatech.edu)

Lesson 1 - Introduction

China's Social Credit system

  • Punishes those not loyal to Communist party. No due process.
  • AI for monitoring
  • Profiling via video (age, gender)
  • Glasses to recognize those on wanted list.

What is big data

Big Data: Process of applying computing power to aggregate large, complex sets of information.

Targeted Messaging

  • Big data allows orgs to target specific demographics
  • The more data you allow to be collected the more they can target your interests.

What's the problem

  • Organizations aren't interested in you personally, but as a collective "you" with similar traits and behaviors.
  • These assumptions are based on historical data, which means it can embed our historical biases.

Example: Criminal detection with headshots

  • Chinese scientists claimed they could distinguish criminals from headshots with 90% accuracy, and that the algorithms are free from the biases that cloud human judgment.
  • Data: 1,800 photos of Chinese men aged 18-55. 1,100 were photos of non-criminals scraped from the web; the others were pictures of criminals provided by police.
    • Behavioral bias: In mugshots, people are not smiling and not happy; in non-mugshot photos, people typically smile. This injects a behavioral bias linking not smiling to criminality.
  • Physiognomy: The practice of using people's outer appearance to infer inner character.

Example: AI guessing sexual orientation

  • AI "learned" that gay men had larger foreheads than straight men, and vice versa for lesbians.
  • Data: Scraped dating website. 35,000 facial images of ~14,000 ppl, straight/gay evenly distributed.
  • Eval: Compared ML accuracy to human annotation via Amazon Mechanical Turk.

AI & Unintended Consequences

  • Most companies cover up embedded biases in their algorithms by blocking certain outputs rather than retraining the models to remove these biases.
  • Search results for images of certain positions (doctors) returned a specific gender (men). Reflects societal bias.
  • Bias is fed back by user's behaviors (preferring to click male doctors instead of female doctor images)
  • Unintended consequences caused by using historical data for future prediction (criminality based on historical data).

Lesson 3 - Relationship between Ethics and Law

Ethics: Principles that distinguish what is morally right/wrong. No governing authority to sanction it.

Law: System of rules established by government to maintain stability and justice. Defines legal rights and provides means of enforcing them.

Lesson 4 - Data Collection

Lesson 5 - Fairness and Bias

Algorithmic fairness: how can we ensure that our algorithms act in ways that are fair?

  • Accountability: How to supervise/audit AI which have large impact
  • Transparency: Why does an algo behave a certain way, explainability.
  • AI safety: How to make AI without unintended negative consequences.

Why fairness is hard

Bank loan problem: If sensitive attribute (e.g. postal code) is correlated with other attributes, AI will find those correlations.

  • Easy to predict the class if you have lots of other information (e.g. home address, spending patterns)
  • More sophisticated approaches are necessary.

Principles for Quantifying Fairness

Group Fairness: Assessing fairness by using statistical parity. Require the same percentage of group A and group B to receive loans (in the bank loan context).

  • Formula - Given two groups, both groups should receive the same % of loans.
    P(loan | no repay, A) == P(loan | no repay, B)
    P(no loan | would repay, A) == P(no loan | would repay, B)
  • Problem: What if groups A and B have different probabilities of repaying?
  • Should bank take a loss for the sake of group fairness?
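
A minimal sketch (with made-up toy arrays; the rate helper is hypothetical) of checking the two conditions above from recorded decisions and outcomes:

import numpy as np

group = np.array(['A', 'A', 'A', 'B', 'B', 'B'])   # protected attribute
would_repay = np.array([1, 1, 0, 1, 0, 0])         # ground truth
got_loan = np.array([1, 0, 1, 1, 0, 0])            # model decision

def rate(decision_value, repay_value, g):
    # P(decision | repayment outcome, group)
    mask = (group == g) & (would_repay == repay_value)
    return np.mean(got_loan[mask] == decision_value)

for g in ('A', 'B'):
    print(g,
          "P(loan | no repay) =", rate(1, 0, g),
          "P(no loan | would repay) =", rate(0, 1, g))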

Individual Fairness: Assess fairness by whether similar people (background) experience similar outcomes.

  • These measures compare the protected group against the unprotected group.
  • Risk difference (UK law): To ensure fairness, risk difference (ie "absolute risk reduction") should be minimal.
  • Risk ratio (EU court of justice): Proportion of protected/unprotected group is the focus.
  • Problem: Consistency might result in everyone being treated equally badly.

indivfair

What is bias?

  • moralmachine.mit.edu - addresses trolley problem (end one life to save five?)

Module 2: BS of Big Data & Stats 101

Lesson 6: Overview

Statistics: The science of collecting, organizing, presenting, analyzing and interpreting data to assist in making effective decisions.

Goal of module:

  • How to identify bad statistics.
  • How to use data to train your algorithms in more unbiased and fair ways.

Brief history of stats: Graunt's "Natural and Political Observation Made upon the Bills of Mortality"

  • Used London bills of mortality to estimate city's population in ~1660.
  • Stats: There were ~3 deaths for every 88 people. Since London bills cited 13,200 deaths, Graunt estimated London population to be about 387,200 (13,200 * 88 / 3).
  • Issues: Higher-income neighborhoods would have lower death rates and poorer neighborhoods higher ones. Doesn't account for the homeless. Assumes the death-to-population ratio is the same in every neighborhood.

How to mislead through poor sampling

Definitions:

  • sample: data collected
  • population: the body from which the data is collected

Example: Analyzing change in high school students' interest in computing.

  • Biased sampling: Sampling from a specific subgroup (e.g. high-income high school students in Georgia) and extrapolating the findings to the whole population.
  • Problem: Most analysis doesn't provide deep enough info on the sample population.

How to mislead through poor analysis

Definitions:

  • Data analysis: Process of gathering, modeling, transforming data with the goal of highlighting useful info, suggesting conclusions, and supporting decision making.
  • Problem: Scientists have a propensity to throw in all the data and see what works. This magnifies biases in the data.

Example: Lying with graphs

  • In the graphs below, the right chart's y-axis doesn't include zero. This warps the distance between y-values and misleads the viewer into thinking unemployment rates are falling significantly.

charts

Example: Unemployment data

  • Sample: 60,000 households.
  • Formula: Unemployment Rate = # unemployed / # labor force
  • Problem: The questionnaire doesn't count those who haven't looked for a job in over 30 days as part of the labor force, so they aren't counted in the unemployment rate.

Household Survey vs Establishment Survey

  • Household survey asks if you're working. Establishment asks how many ppl are on payroll.
  • Household survey includes agricultural workers, self-employed, and private household workers. Establishment doesn't.
  • Household survey counts people on unpaid leave as employed. Establishment doesn't.
  • Household survey only counts people age 16 and over; the establishment survey has no age restriction.
  • Establishment survey often "double counts" jobs (e.g. person works 2 jobs, employee quits one job and is employed at another in same payroll period)
  • Leads to a delta between the household and establishment numbers. Establishment numbers are also often revised.

payroll

How to mislead through interpretation

tl;dr - don't trust graphs

  • Bar chart axes should include zero.
  • Don't invert the y-axis.

Reference: https://www.callingbullshit.org/tools/tools_misleading_axes.html

Lesson 7: Python and Stats 101

Defining Data Analytics

  • Descriptive Analytics: Methods of organizing, summarizing, presenting data (freq table, histogram, mean, variance)
  • Inferential Analytics: Methods used to draw conclusions about a population using sample statistics.

Diff between big data & data analytics

  • Big data focuses on handling non-traditional "big" data.
  • Data analytics focuses on gaining meaningful insight regardless of the size of the data.

AI / ML / DL

  • AI: machines imitating intelligent human behavior
  • ML: the process by which a computer continuously improves its own performance by incorporating new data into an existing statistical model
  • DL: artificial neural networks learn from large amounts of data.

All about the data

Data:

  • Facts and figures, collected and analyzed.
  • Can have quantitative / qualitative values
  • Can be continuous / categorical
  • Ordinal/rank - in order but not necessarily equal (e.g. Likert scale)
  • Cross-sectional - collected at the same time.
  • Time-series - data collected over several time periods.

Lesson 8: Descriptive Statistics

Types of Descriptive Statistics

Descriptive stats: Methods of organizing, summarizing, and presenting data in an informative way (freq table, histogram, mean, variance)

Inferential Analytics: Methods used to determine something about a population on the basis of a sample (ML/AI for big data)

  • Population: Entire set of indiv or objects of interest or the measurements obtained from all individuals or objects of interest.
  • Sample: Portion, or part of the population of interest.

Types of Studies

  • Experimental Study: One variable is manipulated, and second variable is observed and measured to determine effect of treatment variable. Measurements are compared to see if there are differences between conditions. (Facebook's emotion contagion experiment)
  • Correlation Study: Determine whether there is a relationship between two variables and to describe the relationship. A correlational study simply observes the two variables as they exist naturally.
  • Quasi-Experimental: Compares groups based on a variable that differentiates the groups (e.g. male/female)

Sampling & Sampling Error

Sampling Error: Discrepancy between a sample statistic and its population parameter. Can lead to sampling bias.

Median, Mean, and Mode

When to use median vs mean:

  • Mean is best for symmetric distributions
  • Median is less sensitive to outliers than the mean. Better measure for highly skewed distributions (e.g. family income, housing prices, etc)

Using mean vs median for your messaging

Headline should be the one on the bottom if the math was done correctly.

guns

Mode

  • Most frequently occurring number (score, measurement)
  • Value that is observed most frequently.
  • Value is undefined for sequences with no duplicates.
  • Example: The average number of tickets purchased per person for a GT football game is almost always going to be accurately reflected by the mode.
  • Lying with mode: Any survey that rates on a broad scale can be manipulated to emphasize the mode.
    • If you survey 100 ppl on a scale of 1-10 about their feelings on a subject, and more people rate it "10" than any other number, then even if only one more person gave a 10 rating than gave a 1 rating, 10 is the mode "average".

How to mislead with averages

Example: Manipulating average income of a neighborhood

  • A real estate agent can manipulate the average income with perfect honesty and "truthfulness", telling different people that the average income in the neighborhood is:
  1. $150,000 - mean of the incomes of all the families in the neighborhood. One home is a giant, expensive mansion.
  2. $35,000 - median income.
  3. $10,000 - mode of neighborhood
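
A quick check of these three "averages" using Python's statistics module, with a hypothetical set of neighborhood incomes chosen to reproduce the numbers above:

import statistics

# hypothetical incomes: mostly modest homes plus one expensive mansion
incomes = [10_000, 10_000, 10_000, 35_000, 50_000, 60_000, 875_000]

print(statistics.mean(incomes))    # 150000 -> the "mean" pitch
print(statistics.median(incomes))  # 35000  -> the median income
print(statistics.mode(incomes))    # 10000  -> the mode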

misleadavg

Frequency Distribution

Definition: Tallies number of times a data point occurs.

Cumulative frequency distribution: "Running total" of frequencies.

  • Tells the total number of data items at different stages in the data set.
  • How to lie with frequency distribution: Showing "cumulative" frequency distribution vs basic distribution.
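
A minimal sketch of a frequency and cumulative frequency distribution, using a hypothetical list of data points:

from collections import Counter
from itertools import accumulate

# hypothetical data points (e.g. tickets purchased per person)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

freq = Counter(data)                                     # frequency distribution
values = sorted(freq)
cumulative = list(accumulate(freq[v] for v in values))   # running total

for v, c in zip(values, cumulative):
    print(v, freq[v], c)   # value, frequency, cumulative frequency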

Ref: https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch10/5214862-eng.htm

Variability

Definition: Measures the amount of "scatter" in a dataset. Shows how well the avg characterizes data as a whole.

# Both have same mean (50) but different stdev (20 vs 10)
a = [30, 50, 70]
b = [40, 50, 60]
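
A quick verification of the claim above (the 20 vs 10 figures assume the sample standard deviation, ddof=1):

import numpy as np

a = [30, 50, 70]
b = [40, 50, 60]

# same mean, different spread
print(np.mean(a), np.std(a, ddof=1))   # 50.0 20.0
print(np.mean(b), np.std(b, ddof=1))   # 50.0 10.0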

Examples: range, variance, stdev, interquartile range, coefficient of variation.

range

quartiles

Ref: https://junkcharts.typepad.com/junk_charts/boxplot/

Inferential Statistics: Sampling Bias

Definition: Drawing inferences about an individual based on data drawn from a larger group of similar individuals.

Examples: Credit card / loans, hiring.

Chain of reasoning for inferential stats

chain

  • Not all samples will lead to good prediction about an entire population.

Case Studies

The Institute decides to get rid of the Chick-fil-A Express in the student center. After a survey of all the faculty, it was overwhelmingly decided that Chick-fil-A would be replaced with a To-Go Fogo de Chão Brazilian Steakhouse.

  • Who is the population? A: Faculty (NOT students)
  • A sample weighted more towards a certain group will result in inaccurate conclusions being drawn about the population.

Simpson's Paradox

Definition: A trend appears in several different groups of data but disappears or reverses when these groups are combined.

Example (ref: https://blog.revolutionanalytics.com/2013/07/a-great-example-of-simpsons-paradox.html)

  • Since 2000, the median US wage has risen about 1%, adjusted for inflation.
  • But over the same period, within every education subgroup, the median wage is lower now than it was in 2000:
    • high school dropouts,
    • high school graduates with no college education,
    • people with some college education, and
    • people with Bachelor’s or higher degrees
  • WHY? Changing educational profile of the workforce
    • There are now more college graduates (with higher-paying jobs), and wages for college grads have fallen at a slower rate (-1.2%) than wages for those with less education (-7.9% for high school dropouts). The shift of the workforce toward the higher-paid, college-educated group swamps the within-group wage declines.

Example (How statistics can be misleading - Mark Liddell): https://www.youtube.com/watch?v=sxYrzzy3cq8&t=26s

  • How can Hospital A, with lower survival rates for both healthy and unhealthy patients, have a better overall survival rate than Hospital B, which has higher survival rates for both?
  • WHY? Relative proportion of healthy/unhealthy patients in each sample.
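
A toy numeric illustration of the hospital example (the counts are hypothetical, chosen only to reproduce the paradox):

# (survived, total) per subgroup, hypothetical numbers
hospitals = {
    'A': {'healthy': (870, 900), 'unhealthy': (40, 100)},
    'B': {'healthy': (98, 100),  'unhealthy': (400, 900)},
}

for name, groups in hospitals.items():
    for cond, (s, n) in groups.items():
        print(name, cond, round(s / n, 3))      # A is lower in both subgroups
    total_s = sum(s for s, n in groups.values())
    total_n = sum(n for s, n in groups.values())
    print(name, 'overall', round(total_s / total_n, 3))   # yet A is higher overall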

Biased Sampling

Statistical definition of bias

  • An estimator is unbiased if the mean of means == the true mean (the bias is zero)
  • "Mean of means" is called the "expected value" of the estimator.

Types of Sampling Bias

  1. Area Bias: Bias by sampling in specific area not representative of the population (e.g. sampling only from East End in Pittsburgh vs all neighborhoods).
  2. Selection Bias: Sampling method is done in a way that proper randomization is not achieved, ergo the sample isn't representative of the population (e.g. cherry picking the sample to confirm your hypothesis).
  3. Self-selection Bias: Participants' decision to join a study may be correlated with traits that affect the study. Individuals select themselves into a group, causing a biased sample with nonprobability sampling. E.g. survey under-21s about drinking; respondents are probably those that don't drink.
  4. Leading Question Bias: Giving participants a clue as to the desired answer. (e.g. "Don't you think that...?" suggests agreement)
  5. Social Desirability Bias: Participants persuaded to answer in a socially acceptable manner (e.g. "Do you brush your teeth in the morning?" in a group of people).

Biased Sampling Example

samplingmethods

  1. (a) - high variability and high bias
  2. (b) - low variabilty and low bias
  3. (c) - high variability and low bias
  4. (d) - low variability and high bias
  • Good sampling method has both low bias and low variability
  • Graph (b) is theoretically the best, but it suggests the distribution is gaussian (it might not be)

Types of Randomized Sampling

Simple random sampling: randomly sample from population

Systematic Sampling

  • Given data that is sequentially numbered, choose every nth piece of data.
  • Cons: There may be bias in the ordering of the sequence (e.g. how students are listed).

Stratified random sampling: Data is divided into subgroups (strata)

  • Based on specific characteristics (age, education level)
  • Use random sampling within each stratum.

Pros and Cons of each:

sampling

Cluster random sampling: Split the population into similar parts, or clusters.

  • Each cluster should be mini version of entire population.
  • Select one or few clusters at random and select simple random sample from each cluster.
  • If each cluster fairly represents the full population, cluster random sampling will give us an unbiased sample.
  • Pro: Useful when difficult and costly to develop complete list of population members (e.g. all items sold at grocery store)

Non-probability Sampling: Participants are chosen/choose themselves so that chance of being selected is not known.

  • No one has figured out how to select a representative sample of internet users.

Inferential Statistics: Causation vs Correlation

Correlation tells us two variables are related.

Types of relationship reflected in correlation:

  • X causes Y or Y causes X (causal relationship)
  • X and Y are caused by a third variable Z (spurious relationship)

Important: Correlation doesn't imply causation.

Correlation coefficient summarizes the association between 2 variables.

  • 1.0 is perfect positive relationship, 0.0 is no relationship, -1.0 is perfect negative

Correlation vs Causation Examples

"Correlation between worker's education levels and wages is strongly positive"

Issues:

  • Recall: Correlation tells us two variables are related but doesn't tell us why
  • Causation: Education improves skills and skilled workers get better paying jobs
  • Rebuttal: Individuals are born with an innate talent A that is relevant for success in education as well as for success on the job.

Examples of "spurious correlations": www.tylervigen.com

spurious

Relationships

Relationships between two variables are often influenced by other unknown variables.

  • Common response: Variable Z (unknown) affects X and Y. Change in an unknown variable is causing change in both our explanatory variable and our response variable.
  • Confounding: Variable Z (unknown) or X affects Y. Either the change in our explanatory variable is causing changes in the response variable, or a change in an unknown variable is causing changes in the response variable.

Measuring Linear Correlation in Python

Linear correlation coefficient: a measure of the strength and direction of a linear association between two random variables (also called the Pearson product-moment correlation coefficient)

from scipy import stats

stats.pearsonr(X, Y)
  • The linear correlation coefficient quantifies the strengths and directions of movements in two random variables
  • Correlations of -1 or +1 imply an exact linear relationship
  • Positive correlations imply that as x increases, so does y.
  • Negative correlations imply that as x increases, y decreases.
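
A small usage example of pearsonr on toy data (the x/y arrays are made up):

import numpy as np
from scipy import stats

# toy data: y is a noisy increasing function of x
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

r, p_value = stats.pearsonr(x, y)
print(round(r, 3))   # close to +1: strong positive linear association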

Inferential Statistics: Confidence

Empirical Rule

Definition: For a normal distribution, almost all of the data will fall within 3 standard deviations from the mean. Assumes that the data follows a gaussian distribution.

bell

Example: IQ

  • IQ scores are normally distributed with a mean of 100 and a stdev of 15.
  • 68% of IQ scores (85 to 115) fall within ±1 standard deviation of the mean.
  • 95% of IQ scores (70 to 130) fall within ±2 standard deviations of the mean.
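
A quick check of these percentages with scipy's normal distribution:

from scipy.stats import norm

mean, sd = 100, 15

# fraction of IQ scores within ±1 and ±2 standard deviations of the mean
print(round(norm.cdf(115, mean, sd) - norm.cdf(85, mean, sd), 3))   # ~0.683
print(round(norm.cdf(130, mean, sd) - norm.cdf(70, mean, sd), 3))   # ~0.954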

Population Proportions and Margin of Error (MoE)

sample

Margin of Error:

  • How confident we are is usually expressed as a percentage
  • Going back to Empirical Rule, we saw that 95% of area of a normal curve lies within +-2 stdev of the mean.
  • This means that we are 95% certain that the population proportion is within ±2 stdev of the sample proportion. ±2 stdev is our margin of error
  • Percentage margin of error depends on sample size.
import math

# At the 95% level of confidence, with n = sample size
margin_of_error = 1 / math.sqrt(n)

# Example: for n = 1000, the MoE is about ±3% (1 / math.sqrt(1000) ≈ 0.032)

Example: Surveys

Company X surveys customers and finds that 50% of the respondents say its customer service is "very good". The confidence level is cited as 95% ± 3% MoE.

  • This means if the survey is conducted 1000 times, the percentage of those who respond "very good" will range between 47 and 53 percent 95 percent of the time.

Confidence Interval

  • We can estimate population proportion using a confidence interval.
  • If we build the MoE around the true value, it will capture 95% of all the samples.
  • If we build the MoE around the sample statistic, it would have a 95% chance of capturing the true value.

Example:

moe

  • If we run this sample many times, 95% of the time the proportion of those who spent over $5 will be within a ±0.2 MoE interval from the sampled proportion (e.g. 0.4).

Sample Size and MoE

  • MoE estimates how accurately the results of the poll reflect the true value, ie. the population.
  • As sample size increases, MoE decreases.
  • The MoE decreases with the square root of the sample size (diminishing returns), so consider the cost/benefit of collecting a larger sample.

MoE Table:

| Sample Size | % MoE   |
|:------------|:--------|
| 25          | ±20%    |
| 64          | ±12.5%  |
| 100         | ±10%    |
| 256         | ±6.25%  |
| 400         | ±5%     |
| 625         | ±4%     |
| 1111        | ±3%     |
| 1600        | ±2.5%   |
| 2500        | ±2%     |
| 10000       | ±1%     |
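
The table can be reproduced with the MoE = 1/sqrt(n) rule from above:

import math

# MoE = 1 / sqrt(n) at the 95% confidence level
for n in [25, 64, 100, 256, 400, 625, 1111, 1600, 2500, 10000]:
    print(n, f"±{100 / math.sqrt(n):.2f}%")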

Applications of MoE: Examples

Example 1:

  • A company claims 30% of ppl who eat their product really like it. CI is cited as 95%.
  • In June, an independent survey was conducted with 625 randomly selected members to verify this claim.
  • Result of survey was that 125 liked the product.
  • Q: Would you say, at a 5% level of significance, that the company was correct in stating that 30% of people liked their product?

Answer:

  • Proportion of people who liked the product in sample: 125/625 = 0.2, 20%
  • MoE for n=625 at 95% confidence is 1/math.sqrt(625) = 0.04 (±4%)
  • The MoE range is 0.26 to 0.34 (30% ± 4%)
  • Conclusion: No, 20% is not within the MoE range of 26-34%

Example 2:

  • In a survey I want a MoE to be ±5% at 95% level of confidence. What sample size must I pick in order to achieve this?
  • Answer: Sample size should be 400 to get MoE 5% at 95% level of confidence:
    # Derive the sample size from a MoE of 0.05
    # moe = 1/math.sqrt(n)  =>  n = 1/moe**2
    moe = 0.05
    n = 1 / moe**2
    print(n)  # 400.0

Example 3:

  • Company claims that 10% of candies it produces are green.
  • Students found that in a large sample of 500 M&Ms, 60 were green.
  • Q: Assuming company claim is true, would 60/500 proportion be unusually high or low proportion of green M&Ms?

Answer:

  • Proportion of green M&Ms in the sample: 60/500 = 0.12 (12%)
  • MoE for n=500 is 1/math.sqrt(500) ≈ 0.045 (±4.5%)
  • The MoE range around the sample proportion is 0.075 to 0.165 (7.5-16.5%).
  • Conclusion: The claimed 10% proportion falls within the MoE range for this sample size, so 60/500 is not an unusually high or low proportion.
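
A small helper (hypothetical function name) that applies the course's 1/sqrt(n) approximation to both Example 1 and Example 3:

import math

# check whether a claimed proportion is consistent with an observed sample
# proportion, at the 95% confidence level
def within_moe(claimed, observed, n):
    moe = 1 / math.sqrt(n)
    return abs(claimed - observed) <= moe

print(within_moe(0.30, 0.20, 625))   # Example 1: False (claim not supported)
print(within_moe(0.10, 0.12, 500))   # Example 3: True (claim consistent)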

Module 3: AI/ML Techniques

Goal: Understand and apply basic AI/ML techniques to data scenarios, with a focus on instituting "fair" practices when designing decision-making systems based on big data.

Lesson 12: Word Embeddings

Word Embeddings (NLP)

  • Word embeddings transform human language meaningfully into a numerical form. This allows algorithms to understand the nuances implicitly encoded into our language.

Bias in word embeddings

  • NLP products, if trained on toxic data, will generate biased/toxic output (e.g. Microsoft's Tay chatbot).
  • Data can be "sanitized" to prevent biased/toxic outputs.

Word Similarity & Relatedness

A note on word vs semantic similarity:

  • Semantic similarity: Metric defined over a set of terms, and its distance/similarity is based on the likeliness of their meaning.
  • Word similarity: Metric comparison of a word's syntactical representation or string format.

Two prevailing uses of similarity:

  1. Using a dictionary (e.g. WordNet)
  2. Learning similarity statistics using a large corpus of data.

Vector Space Models

  • Vectorization: Process of converting text to numbers. This conversion helps to measure similarity between words.
  • Vector space models are models representing text as a vector of identifiers in which similar words are mapped to points in geometric space.

cosine

Representations

Document Occurrence: Assign identifiers corresponding to the count of words in each document (from a cluster of docs) in which the word occurs.

documents

  • 12 documents (menus), "chocolate" is mentioned 7 times in 5 docs.
  • A vector space model may find a relation between the docs, e.g. that they are dessert menus. This can be used to score the probability that menus mentioning "chocolate" are dessert menus.

Word Context: Quantify co-occurrence of terms in a corpus by constructing a co-occurrence matrix to capture the number of times a term appears in the context of another term.

Example: Create a word co-occurrence table between "chocolate is the best dessert in the world", "GT is the best university in the world" and "The world runs on chocolate".

wordcontext

Example: Comparing a tiny corpus of sports documents. Document occurrence finds that "losangeles" + "dodgers" and "atlanta" + "falcons" co-occur. Word co-occurrence shows a different viewpoint.

dococcurrence

wordoccurrence

Cosine similarity & word analogy

  • Cosine similarity estimates how similar two words are.
  • IMPORTANT: Similarity measures are highly dependent on which vector representation is selected to represent the words found in your corpus.

FORMULA:

cosine

  • Given two vectors a and b, the cosine similarity is defined as the dot-product of the two vectors divided by their length.
  • The formula measures the cosine of the angle between two vectors projected in a multi-dimensional space.
  • The smaller the angle, the more similar the words are. 1 = related, 0 = unrelated, -1 = related but opposite
# Similarity = (A.B) / (||A||.||B||) 

import numpy as np
from numpy.linalg import norm
from itertools import permutations

# toy vectors using atlanta, falcons, los angeles and dodgers
atlanta = ('atlanta', np.array([1, 1, 0, 0]))
falcons = ('falcons', np.array([1, 1, 0, 0]))
los_angeles = ('los angeles', np.array([0, 0, 1, 1]))
dodgers = ('dodgers', np.array([0, 0, 1, 1]))

# compute cosine similarity
def cos_sim(x, y):
    return np.dot(x, y) / (norm(x) * norm(y))

# compute cosine similarities among toy vectors
for p1, p2 in list(permutations([atlanta, falcons, los_angeles, dodgers], 2)):
    cosine = cos_sim(p1[1], p2[1])
    print(f"Similarity({p1[0]}, {p2[0]}): {round(cosine, 2)}")

Results: The cosine similarity shows that atlanta/falcons and los angeles/dodgers are similar to each other, while cross-city pairs are not.

Similarity(atlanta, falcons): 1.0
Similarity(atlanta, los angeles): 0.0
Similarity(atlanta, dodgers): 0.0
Similarity(falcons, atlanta): 1.0
Similarity(falcons, los angeles): 0.0
Similarity(falcons, dodgers): 0.0
Similarity(los angeles, atlanta): 0.0
Similarity(los angeles, falcons): 0.0
Similarity(los angeles, dodgers): 1.0
Similarity(dodgers, atlanta): 0.0
Similarity(dodgers, falcons): 0.0
Similarity(dodgers, los angeles): 1.0

Word Analogy Task

  • Task: "a is to b as c is to ???"
  • Formula: Find the word vector that is most similar to the result vector of vec_c + vec_b - vec_a
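
A minimal sketch of the analogy task using toy 3-d vectors (the embeddings are made up; a real model would use learned vectors such as the GoogleNews ones below):

import numpy as np
from numpy.linalg import norm

# hypothetical embeddings illustrating "man is to king as woman is to ???"
vectors = {
    'man':   np.array([1.0, 0.0, 0.0]),
    'woman': np.array([1.0, 1.0, 0.0]),
    'king':  np.array([1.0, 0.0, 1.0]),
    'queen': np.array([1.0, 1.0, 1.0]),
}

def cos_sim(x, y):
    return np.dot(x, y) / (norm(x) * norm(y))

# vec_c + vec_b - vec_a with a = man, b = king, c = woman
target = vectors['woman'] + vectors['king'] - vectors['man']

query = {'man', 'king', 'woman'}
best = max((w for w in vectors if w not in query),
           key=lambda w: cos_sim(vectors[w], target))
print(best)   # 'queen'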

Examples: From http://bionlp-www.utu.fi/wv_demo/ - English GoogleNews Model

analogies

Word Embeddings (Word2Vec)

  • Stores each word as a point in a multidimensional space, represented by a vector with a fixed number of dimensions (generally 300).
  • Dimensions are projections along different axes.
  • Assumption: Similar words have similar angles
  • Unsupervised, built just by reading large corpus of data
  • Example: "Chocolate" might be represented as [1, 0, 1, 1, 0, 2]

vecspace

Vector Space Models for Word Embeddings

Context prediction models (Skipgram, W2V): Predict the context of a given word by learning probabilities of co-occurrence from a corpus.

  • In theory, words that share similar contexts tend to have similar meanings.
  • Thus, instead of counting co-occurrence we should be able to generate word vectors that can predict the context of a word based on its surrounding words by learning from a corpus of data.

Word2Vec

w2v

2 Types of Word2Vec:

  1. Continuous Bag of Words (CBOW): Neural network trained to predict which word fits in a gap in a sentence. Example: "the student ___ the exam"; the model is optimized to fill the gap with the word that has the highest probability.
  2. Skipgram: Starts with a single word embedding and tries to predict the surrounding words.
    • W2V uses words a few positions away from each center word to predict similarities between every word and its context words.
    • Pairs of center word / context word are called skip grams
    • Example: "the student passed the exam". Center word = "passed", context words = ["the", "student", "the", "exam"].

cbowskipgram

skipgram

Word2Vec Params

Important parameters

  • Window size: Can affect the result of the vector space model
  • Iterations (epochs): The number of training passes over the corpus can also affect the resulting vectors
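
A hedged sketch of how these parameters appear when training Word2Vec with gensim (assuming gensim 4.x; parameter names differ in older versions, and the toy corpus is far too small for meaningful vectors):

from gensim.models import Word2Vec

# tiny toy corpus; a real model needs a large corpus
sentences = [
    ["the", "student", "passed", "the", "exam"],
    ["the", "student", "failed", "the", "exam"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of the word vectors
    window=2,         # context window size
    epochs=10,        # number of iterations over the corpus
    sg=1,             # 1 = skip-gram, 0 = CBOW
    min_count=1,
)

print(model.wv.most_similar("exam", topn=2))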

Lesson 13: Bias in Word Embeddings

Bias in Word Embeddings

Why does this happen?


Lesson 14: Facial Recognition

Facial Recognition Algorithms

Steps:

  1. Face detection: two-class classification. The first step of any automatic face recognition system
    • Positioning
    • Rotation and pose
    • Occlusion (hidden face)
    • Resolution
    • Single image or sequence of images (video)
  2. Segmentation based on the detected face, then normalization (translation/scaling/rotation).
    • Multi-class classification (one person vs all others)
  3. Face Identification - tell which person it is.
  4. Face verification - verify whether the person is who they claim to be.

Method:

  • Face recognition algorithms "measure" nodal points on the face (distance between the eyes, length of the nose, angle of the jaw)
  • Features: upper ridges of eye, nose shape, mouth size, position of features relative to each other.
  • Face space: A theory from psychology that defines a multidimensional space in which recognizable faces are stored. Faces are represented in this space according to invariant features of the face itself.

facespace

  • Appearance-based methods train a classifier, typically using supervised learning. Deep neural nets are the most common method as of 2022.

Human biases

  • "Own Race" bias (Meissner and Brigham) - 2x more likely to identify own race than other race
  • "Own gender" bias
  • "Own age" bias

Deep Neural Network for Facial Recognition

  1. Facebook DeepFace - largest facial dataset (in 2014), trained on 4MM images belonging to more than 4000 identities.
  2. Microsoft Celeb Dataset - 10MM images of 100,000 individuals. Scraped from images with Creative Commons licenses
  3. Duke MTMC Dataset - captured students between lectures; 2MM frames of 2,000 students
  4. Stanford Brainwash Dataset - 10,000 images, 82,000 annotated heads.

The last three datasets were taken down in 2019. All were Creative Commons licensed but were used by foreign surveillance and defense organizations.

Other datasets: MegaFace Dataset - a face recognition training set of 4.7MM faces, sourced from Flickr.

Emotions: Facial Recognition

Facial recognition algorithms are used to gauge a person's emotions. Uses include driver attention monitoring, gauging movie audience reactions, and healthcare applications.

These AIs were originally built upon Ekman's studies (the claim that emotional expressions are universal)

Procedure:

  1. Extract facial features
  2. Feed features into a classifier (NNs)
  3. Classify image/features to one of the pre-selected emotion categories (6 universal emotions + neutral).

Facial Action Units

fau

Case Study: TSA's Screening of Passengers by Observation Techniques (SPOT) Program

  • Deploys over 3,000 behavioral detection officers in an effort to identify passengers who may pose a risk to aviation security.
  • Criticized for racial profiling.

Lesson 15: Bias in Facial Recognition

In the wild, facial identification becomes problematic because of:

  • resolution
  • facial pose
  • illumination
  • occlusion

Results in:

  1. Facial feature points not found
  2. Higher errors
  3. Not enough data or feature points to analyze

Error rates for face recognition:

  1. False positives - matching a wrong person to an image
  2. False negatives - not matching the right person to an image
  • No standards exist for "acceptable" error rates. Depends on the facial recognition system used and its application.

Bias in the Data

biasdata

Why Bias Occurs in the Data

Training sets are hard to get. Need to buy/scrape/obtain more samples from underrepresented classes. A grey area arises with regard to scraping.

Lesson 16: Predictive Algorithms Pt 1

Evaluation Metrics


Module 4: Bias Mitigation Applications

Lesson 19: Fairness and Bias

Algorithmic Fairness - mitigate the effects of unwarranted bias/discrimination from AI/ML algorithms. Focus is on mathematical formalisms and algorithmic approaches to fairness.

Examples of types of algorithmic bias:

Addressing Source of Bias

bias

Problem: Biased data is stored in protected attributes. Solution: Remove the protected class attributes. But other features that correlate with the protected class may still redundantly encode it.

Addressing Fairness Measures

Problem: There are issues with error-rate imbalances such that different groups have different outcomes. Solution: Only outcomes matter; make sure groups are in line with predetermined "fairness" metrics.

Issues:

  • There are many definitions for fairness
  • Many of the definitions conflict

Principles for quantifying fairness

  • predictions for ppl with similar non-protected attributes should be similar
  • differences should be mostly explainable by non-protected attributes

Two basic frameworks for measuring fairness:

  • Fairness at individual: consistency or individual fairness
  • Fairness at group: statistical parity

Fairness in Loan Scoring Models

redline

Max Profit Model - Setting different thresholds for the two groups in order to maximize profit. Split into privileged vs unprivileged groups.

Profits computed on 4 components:

  1. Max profit if you grant loan to someone with high probability of paying it back
  2. Lose profit if you grant a loan to someone with low probability of paying it back.
  3. Neutral if you deny a loan to someone with a low probability of paying it back.
  4. (IMPORTANT) Lose some profit if you deny a loan to someone with a higher probability of paying it back (Opportunity costs)

Set different thresholds for the two groups and give the most loans to those with the highest probability of paying them back. This at least gives some loans to the underprivileged group, rather than denying them altogether.

Blinding Model - Class features and all "proxy" information removed.

  • The model is still unfair without the sensitive data. Biases are still encoded by proxies in the dataset.
  • The privileged group will have generally higher thresholds on all decision features.
  • Results in less profit in the case study than the max profit model.

Demographic Parity - All groups have same percentage approved.

  • Leads to bias against the privileged group.
  • Makes less profit than the max profit model, but more than the blinding model.

Equal Opportunity - Same percentage of "credit-worthy" candidates, ie. true positives, in both groups.

  • Best of all one-threshold models, but still doesn't do better than Max Profit model.

Other Group Fairness Metrics

  1. Statistical Parity Difference - calculate delta between unprivileged positives and privileged positives.
  2. Disparate Impact - calculate ratio between unprivileged positives vs privileged positives.
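
A minimal sketch of computing both metrics from predicted labels and group membership (the arrays are hypothetical):

import numpy as np

# hypothetical model decisions (1 = favorable outcome) and group membership
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
privileged = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # 1 = privileged group

p_priv = y_pred[privileged == 1].mean()     # positive rate, privileged group
p_unpriv = y_pred[privileged == 0].mean()   # positive rate, unprivileged group

print("Statistical parity difference:", p_unpriv - p_priv)   # ideally 0
print("Disparate impact:", p_unpriv / p_priv)                # ideally 1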

otherfairness

Other Biases in the Algorithm Process

biascycle

biasmitigation

3 phases of bias mitigation steps

  1. Preprocessing algorithms - modify training data
  2. In-processing algorithms - modify learning algorithm
  3. Post-processing algorithms - modify prediction labels

Lesson 20: Fairness and Bias Assessment Tools

AI Fairness 360

360

360algos

Preprocessing

preprocess
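
A rough sketch of how AIF360's Reweighing preprocessing step is typically invoked (the DataFrame is a toy example, and exact class/argument names may vary across aif360 versions):

import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# toy data frame: 'sex' is the protected attribute, 'label' the outcome
df = pd.DataFrame({
    "sex":   [1, 1, 1, 0, 0, 0],
    "score": [0.9, 0.7, 0.8, 0.6, 0.4, 0.5],
    "label": [1, 1, 0, 1, 0, 0],
})

dataset = BinaryLabelDataset(df=df, label_names=["label"],
                             protected_attribute_names=["sex"])

priv, unpriv = [{"sex": 1}], [{"sex": 0}]
metric = BinaryLabelDatasetMetric(dataset, unprivileged_groups=unpriv,
                                  privileged_groups=priv)
print(metric.statistical_parity_difference(), metric.disparate_impact())

# Reweighing assigns instance weights so the transformed training data
# satisfies statistical parity before a model is trained
rw = Reweighing(unprivileged_groups=unpriv, privileged_groups=priv)
dataset_transf = rw.fit_transform(dataset)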

What-if Tool

https://pair-code.github.io/what-if-tool/

  • No-code method of exploring ML models

Other tools

othertools

Lesson 21: AI/ML Techniques for Bias Mitigation

Fair Classifiers

  • Can't simply drop protected attributes because other features are correlated with them.

Race/Sex Discrimination on different algorithms


Fairness-aware Algo Trade-offs

Determining thresholds for accuracy vs fairness must take into consideration legal requirements, ethics, and gaining trust.

When false positives are better than false negatives:

  • Image privacy: a false negative (something that needs to be blurred is not blurred) exposes private content, so erring toward over-blurring (false positives) is preferable.

When false negatives are better than false positives:

  • Spam filtering: a false positive (an important email flagged as spam) means you never read the email, so letting some spam through (false negatives) is preferable.

Bias consideration with regards to task: Example with gender.

  • Gender discrimination is illegal with loan applications.
  • Gender-specific medical diagnosis is desirable.

Lesson 22: AIES Wrap-up