How to interview algorithms without code?

50 questions to ask companies responsible for developing, implementing and monitoring artificial intelligence systems.

This interview guide offers a framework for understanding how to audit artificial intelligence tools implemented across the public and private sector.

As these systems proliferate across all parts of life, it is important to understand how to question specific algorithms, just as we question human decisions.

In a world where algorithms are becoming ubiquitous in government and business, it’s critical to develop a more nuanced understanding of AI. By understanding the intricacies of these systems and knowing what questions to ask, we can help hold them accountable. Here are 50 important questions to ask:

  1. How is the organization using artificial intelligence?
  2. Are any decision trees or rule-based systems being used and framed as AI?
  3. What technology does the algorithm use? (e.g. machine learning, natural language processing, speech recognition, computer vision, robotics etc.)
  4. What is the purpose of the algorithm? (e.g. prediction, classification, optimization, pattern recognition, event detection etc.)
  5. What forms of human intervention are involved, and how do they impact the decision-making process? (e.g. data collection, data labeling/annotation, model development and training, validation, testing, productization)
  6. What are the procedures in place for human oversight and/or override of algorithmic decisions?
  7. What libraries have been used to build the system?
  8. How much of the code is open-source?
  9. Has the author of the code been given due credit?
  10. What types of auditing procedures are in place?
  11. Is there an internal team responsible for auditing the algorithm? (e.g. a specialized data science team, automation editors, subject matter experts etc.)
  12. How are algorithm deliverables communicated to technical and non-technical staff of the contracting company?
  13. What criteria are used to measure the AI system’s effectiveness?
  14. Which of the many algorithm variables are the determining factors in a model’s performance?
  15. What decision-making concerns were considered during the data collection and model generation phases?
  16. What were the initial errors during data collection and algorithm testing phase?
  17. What changes have been made to the algorithmic development workflow to address such errors? How and why?
  18. Has recent academic research and development informed the improvement of the algorithmic workflow?
  19. How does the algorithm address potential concerns related to discrimination or bias? (e.g. racial bias, gender bias, economic bias etc.)
  20. How can the algorithm be exploited and what correction measures have been implemented?
  21. During model training, has enough data been provided to the algorithm/model to account for unusual cases?
  22. How were the criteria and process for data collection determined?
  23. Was public data used to train the algorithm?
  24. Was private data used to train the system? What approvals were required to access the data?
  25. Were the data and copyright owners informed of the potential use of the data? (e.g. via terms of use, email communications etc.)
  26. What other ethical and legal concerns were addressed before and during data collection?
  27. Was the data collected before or after the model was built?
  28. How much of the data was collected after the model was built? And how did the model perform on that data? (Models tend to perform well on the data they were trained on; the best way to assess a model is to collect fresh data and test it on that.)
  29. Has the company paid for licenses to access the training data from a third party? If so, has that partner been vetted?
  30. What happens to data that has been collected for a particular algorithm and used for training?
  31. How is the quality of collected data assessed?
  32. Have datasets containing private identifiable information been anonymized before use?
  33. How did the organization in question manipulate or sanitize the data for the algorithm?
  34. How did it decide what data is relevant to the model?
  35. Are end users informed of what inferences are made using the data? (e.g. algorithmic disclaimers)
  36. If the company discloses this information to users, is the communication clear, made public and easy to understand?
  37. Can the collected data be used for predatory practices?
  38. Can the data be used to influence or limit an individual’s choices in the marketplace or the public realm?
  39. Can this data be used by a government or private surveillance body?
  40. Is the data being collected by a private organization to exploit user engagement?
  41. If so, what mechanisms or triggers are being used to retain user attention? (e.g. fear, conspiracy theories, provocative, violent or user-sensitive content)
  42. Are there any laws in place against these practices?
  43. If not, are there any data privacy groups working on addressing these concerns? And has the organization engaged with them?
  44. What are the ongoing bills addressing these concerns?
  45. Can the data collected by companies expose sensitive details about individuals? (e.g. exact residence location, gender identity, sexual orientation etc.)
  46. What measures has the company taken to address potential security or privacy leaks?
  47. Has the company’s data been audited by an independent expert or platform?
  48. Are there any federal agencies overseeing the practices of data collection in the company’s field of operation?
  49. Are owners of personal data given the right to access their information or to withdraw access privileges from the company or data collector?
  50. Has there been a manual review of the partial or complete dataset?
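The parenthetical in question 28 carries the key point: a model scored only on the data it was trained on can look deceptively accurate. A minimal sketch of that effect, using entirely synthetic data and a toy "model" (a single learned threshold) invented for illustration:

```python
# Sketch of question 28: training-set accuracy vs. fresh-data accuracy.
# All data here is synthetic; the "model" is just a learned threshold.
import random

random.seed(0)

def make_batch(n):
    # Hypothetical data: feature x in [0, 1]; label is 1 when x > 0.5,
    # flipped 20% of the time to simulate label noise.
    batch = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < 0.2:
            y = 1 - y
        batch.append((x, y))
    return batch

def accuracy(data, threshold):
    # Fraction of examples where the threshold rule matches the label.
    return sum(int(x > threshold) == y for x, y in data) / len(data)

def fit_threshold(data):
    # "Training": pick the threshold that maximizes accuracy on this data.
    return max((x for x, _ in data), key=lambda t: accuracy(data, t))

train = make_batch(50)    # retrospective data used to build the model
fresh = make_batch(500)   # data collected after the model was built
t = fit_threshold(train)

print(f"accuracy on training data: {accuracy(train, t):.2f}")
print(f"accuracy on fresh data:    {accuracy(fresh, t):.2f}")
```

Because the threshold was chosen to fit the training batch (noise included), the training-set score is an optimistic estimate; the fresh batch is what an auditor should ask to see.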
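Question 32 asks whether personally identifiable information was anonymized before use. One common (if limited) pre-processing step worth asking about is pseudonymizing direct identifiers with a keyed hash. The field names and key below are purely illustrative, and keyed hashing is pseudonymization, not full anonymization: quasi-identifiers such as exact location can still re-identify people and need separate handling.

```python
# Sketch of question 32: pseudonymizing a direct identifier before training.
# Field names and the secret key are hypothetical.
import hashlib
import hmac

SECRET_KEY = b"rotate-and-store-separately"  # hypothetical key management

def pseudonymize(value: str) -> str:
    # HMAC-SHA256 yields a stable token without exposing the raw value;
    # the same input always maps to the same token, so joins still work.
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

record = {"email": "jane@example.com", "zip": "10001", "clicks": 42}
safe = {**record, "email": pseudonymize(record["email"])}
print(safe)  # email replaced by a stable token; other fields untouched
```

A follow-up question for an auditor: who holds the key, and could the remaining fields (here, ZIP code) still single out an individual?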

Francesco Marconi

Computational journalist and co-founder of Applied XL. I write about data science, storytelling and innovation.