GRCnewsAssistant.py: How you can use it, and better yet - create something like it with Cline AI coding assistant
Learning enough about AI-augmented coding "to be dangerous," but responsibly and to advance cyber risk reduction
GRC has been difficult to bring into projects and portfolio work, because it's mostly spreadsheets and nerd stuff!
Hi! 🤓
More than 27,000 viewers have seen how awesome the open source GRC Assist Python script is. It automates searching for news articles based on keywords relevant to your industry or needs. Minimal to no coding experience is required to:
Save time by automating news searches
Streamline gathering relevant news for budget requests or user training
Build a knowledge base of relevant news stories
Provide recent examples to drive cybersecurity investments
How can we stand on the shoulders of Dr Auger’s cool project and take it to the next level, both for business value gains and to elevate our technical chops as GRC practitioners? 🤔 I’m glad you asked!
In this demo I take the current GRC Assist output here 👇️

And add AI-generated ratings, scores, and one-sentence summaries for each news story that GRC Assist finds. 👇️

The end result is also open source and you can try it for yourself. To get up and running as quickly as possible, go straight to the installation instructions in the README here 👇️

Or, to see the steps I followed to modify the script (and how you can too), coming from a business background with limited hands-on-keyboard experience, check them out below.
It’s helpful as GRC practitioners to elevate our technical capabilities with Python and open source GitHub projects like this, since the most important aspect of managing technology risk is understanding the underlying technology.
YouTube video to follow.
Be safe, be well,
Steve
How I Made My First GitHub Project and You Can Too
🛠️ Tools and Resources Used
Open source projects on GitHub
Visual Studio Code with Cline plugin
Fabric label_and_rate pattern
⌨️ Step 1: Install GRC Assist
Installation Instructions
Demo
Once installed, here’s what it produces based on the keywords you provide:


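For a rough illustration (the values below are made up, and the exact layout in your copy of the project may differ), keywords.csv holds one search term per line, and each run appends the date, keyword, headline, description, and URL of each matching story to grcdata.csv:
keywords.csv (illustrative):
ransomware
phishing
third-party risk
grcdata.csv row (illustrative):
2025-01-27,ransomware,"Hospital chain hit by ransomware","Attackers encrypted patient scheduling systems...",https://example.com/story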
📰 Step 2: Install NP4K Extractor
Installation Instructions
#STEP 1: Go to a directory for installation
cd documents/grc_tools
#STEP 2: Install NP4k-extractor directly from the repo
git clone https://github.com/nopslip/NP4k-extractor
#STEP 3: Go to the NP4k-extractor directory
cd NP4k-extractor
You should probably run this script in a Python virtual env.
#STEP 4: Create a virtual environment named venv in your current directory, a best practice for stability and portability, to help avoid conflicts with other projects
python3 -m venv venv
#STEP 5: Activate your virtual environment
source venv/bin/activate
#STEP 6: Install newspaper4k
pip install newspaper4k
#STEP 7: Install lxml_html_clean
pip install lxml_html_clean
#STEP 8: Install typing_extensions
pip install typing_extensions
#STEP 9: Run the python interpreter
python3
#STEP 10: Import nltk
import nltk
#STEP 11: Download punkt_tab
nltk.download('punkt_tab')
#STEP 12: Exit
exit()
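As a shortcut, steps 9 through 12 can be collapsed into a single non-interactive command:
#STEPS 9-12 in one line: download punkt_tab without opening the interpreter
python3 -c "import nltk; nltk.download('punkt_tab')"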
#STEP 13: Copy the URLs column from grcdata.csv and paste it into a new file saved as “urls.csv” (lowercase, to match the command below) in your NP4k-extractor directory.

#STEP 14: run the script
python extract_articles.py --file urls.csv
Demo
Running through each URL, it extracts the article title, keywords, authors, summary, and publish date, and outputs them all into a single text file.
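For the curious, this is roughly what happens per URL, shown as a minimal Python sketch using the newspaper library (the function name and returned fields here are illustrative, not the NP4k-extractor source):
#Minimal per-URL extraction sketch (illustrative)
from newspaper import Article

def extract(url):
    article = Article(url)
    article.download()  # fetch the page HTML
    article.parse()     # pull out title, authors, text, and publish date
    article.nlp()       # compute keywords and summary (needs the nltk punkt data from Step 11)
    return {
        "title": article.title,
        "keywords": article.keywords,
        "authors": article.authors,
        "summary": article.summary,
        "publish_date": article.publish_date,
    }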


🔥 Step 3: Install Cline
Now things get really interesting as we modify the open source GRC Assist and NP4k-extractor projects, with the help of an AI coding assistant, Cline. It’s an extension to the popular Visual Studio Code IDE and came to me highly recommended.

Install from the Extensions area in VS Code
⚠️ Step 4: Proceed With Caution
I’ve found the outcomes of this demo eyebrow-raising, in terms of both the risks and opportunities that the diffusion of AI-augmented coding brings to novices like myself. Let’s take a step here to focus on using these capabilities responsibly.
One of the biggest impacts of AI that goes kind of unnoticed is that we’re about to see an explosion of poorly built applications.
Specifically, applications built completely by AI with no thought of security whatsoever.
🧵
— ᴅᴀɴɪᴇʟ ᴍɪᴇssʟᴇʀ (@DanielMiessler)
5:16 PM • Jan 27, 2025
INSECURE CODE
GRC & Cybersecurity Perspective:
AI models that generate or facilitate insecure code present a major cybersecurity risk. If an AI system inadvertently produces code snippets with vulnerabilities, these flaws can be integrated into software development pipelines, potentially leading to data breaches, ransomware attacks, or system takeovers. From a governance standpoint, organizations must ensure secure coding practices and compliance with cybersecurity standards such as ISO 27001, PCI DSS, or internal security controls. Failing to address insecure code can also create legal liabilities if clients or regulators find the company negligent in preventing obvious vulnerabilities. This risk grows in tandem with the widespread adoption of AI-assisted coding tools, making robust risk assessment and code review processes essential.
National Security & Public Safety Perspective:
On a national scale, insecure code can enable hostile actors—state-sponsored or otherwise—to penetrate critical infrastructure, such as power grids, communication systems, and healthcare networks. Successful exploits can cause massive societal disruption or harm to public services, raising the stakes from a mere cybersecurity incident to a broader national security threat.
MIT AI Risk Repository

Risks of AI Generated Code
AI-generated code carries a range of risks that can be categorized by likelihood and impact, as well as by their nature (technical, ethical, societal). Here's a breakdown of these risks with IDs from the AI Risk Database and categorized by risk level:
High Risk (High likelihood and significant impact):
Malicious Use [22.01.00, 49.01.00]: AI can be used by malicious actors to create malware, facilitate cyberattacks, generate disinformation, or develop weapons. This includes the potential for AI to be used for terrorism or combating law enforcement. AI can also be used to discover vulnerabilities, write exploit code, and evade detection. This is a high-risk area because it can have immediate and widespread consequences.
Cybercrime [23.01.00, 23.02.00, 23.03.00, 23.04.00, 23.05.00, 23.06.00, 23.07.00]: AI can enable various fraudulent activities including creating fake identities or phishing messages. This category includes a wide range of potential harms, from financial crimes to more serious security breaches. AI-supported malware and hacking are already a reality.
Security Vulnerabilities [02.04.00, 14.06.00, 25.01.00, 51.04.00]: The complex toolchain of AI models can introduce vulnerabilities in the generated code, including software security issues and hardware vulnerabilities. Model attacks, designed to steal information or cause incorrect responses, pose a significant threat. This has a high potential to cause harm since it directly compromises the integrity of software systems.
Lack of Transparency & Explainability [06.06.00, 14.05.00, 26.01.00, 26.02.00, 42.06.00, 42.18.00, 42.21.00, 52.01.00]: AI systems can make decisions without offering insight into the process, leading to a lack of trust and an inability to audit the system. This opaqueness can lead to undetected errors or biases. The mismatch between mathematical optimization and human reasoning makes this a high risk and is a barrier to safety.
Bias and Discrimination [04.02.00, 10.01.00, 11.01.00, 11.02.00, 11.03.00, 11.04.00, 11.05.00, 12.05.00, 15.01.00, 16.01.00, 18.01.00, 28.02.00, 39.08.00, 42.05.00, 54.01.00, 56.01.00, 56.02.00]: AI models can reflect biases present in training data. This can lead to unfair or discriminatory outcomes. Such biases may result in legal ramifications. These can manifest as representational, allocative, quality of service, interpersonal and societal system harms.
Misinformation and Information Manipulation [04.05.00, 16.03.00, 17.03.00, 18.02.00, 31.01.00, 46.03.00, 48.08.00]: AI can generate false, misleading, or inaccurate information, which can erode trust in shared information. This includes using AI for disinformation campaigns or to create fake news. This is particularly risky because it undermines the truth and can be used to manipulate public opinion. AI can produce content that does not distinguish fact from opinion.
Privacy Violations & Data Leakage [02.07.00, 14.02.00, 16.02.00, 17.02.00, 18.03.00, 23.09.00, 38.01.00, 39.07.00, 42.09.00, 48.04.00, 52.01.00]: AI models trained on personal data may unintentionally expose private information such as phone numbers, email and residential addresses. AI systems may also leak sensitive information or violate privacy regulations. This presents an immediate and serious risk to individuals.
Intellectual Property Rights Violations [03.03.00, 30.10.00, 31.05.00, 48.10.00, 56.07.00]: AI code generation can infringe on existing copyrights and patents by appropriating work without permission or compensation. This includes issues related to training data, model outputs, and the blurring of traditional concepts of authorship. This poses legal risks as well as economic harms.
Moderate Risk (Moderate likelihood and/or moderate impact):
Harmful Content & Toxicity [02.01.00, 02.08.00, 17.01.00, 18.01.00, 23.07.00, 28.01.00, 48.03.00, 48.11.00, 56.05.00]: AI-generated content may include toxic, biased, or offensive language and can create harmful and abusive content. This includes promoting social stereotypes or hate speech.
Human-Computer Interaction Harms [16.05.00, 17.05.00, 38.04.00, 48.07.00]: Users may overly trust AI systems or treat them as human-like which may lead to overreliance, automation bias, and emotional entanglement. AI can also influence human agency and decision-making. This could have a negative impact on mental well-being.
Job Displacement & Economic Impact [19.03.00, 25.03.00, 25.05.00, 31.07.00, 41.01.00, 47.04.00, 56.02.00, 56.15.00]: AI-powered code generation may lead to job losses in software development and related fields. This risk involves potential economic disruption as well as exacerbation of inequalities. AI systems could also lead to large-scale deskilling of workers.
Environmental Harms [03.06.00, 12.03.00, 28.00, 31.06.00, 39.02.00, 41.06.00, 45.00, 48.05.00, 56.03.00]: The energy consumption of AI models contributes to a significant carbon footprint. The extraction of resources for AI systems can also be unsustainable. These impacts can be substantial and contribute to climate change.
Unintended Consequences [06.08.00, 40.00]: AI may achieve goals in ways that differ from what its creators intended. This could result from errors in design or unanticipated emergent behaviors. Systems may also misinterpret commands.
Goal-related Failures [24.02.00]: AI assistants may pursue goals in ways that do not benefit users or that overstep bounds and cause harm. A misaligned metric M can cause an AI assistant to harm non-users.
Operational Risks [10.00, 14.04.00, 47.01.00]: There are risks involved in the complex environments in which AI systems operate. This also includes failures caused by inadequate testing or deployment. Poorly designed systems can cause harm.
Incompetence: AI may simply fail at its task, leading to unintentional deaths, unjust rejections, poor performance, or biased decisions. This involves AI systems that lack the skill to do what is asked.
Lack of Robustness [26.06.00, 39.04.00, 52.01.00]: AI systems may not be resilient against attacks or manipulation and may not function as intended if there is unexpected input or changes in the data or environment. This may cause an HLI to be error-prone in practice.
Environmental and Socioeconomic Harms [03.06.00, 16.06.00, 18.06.00]: AI can contribute to social inequities due to an uneven distribution of risk and benefits. This can include the loss of safe employment.
Amplification of Biases [56.04.00, 56.05.00]: AI can amplify existing biases within the training data. This can lead to skewed outputs and discrimination.
Poor Performance of Models: Models may perform poorly for their intended purposes, leading to biased decisions.
Low Risk (Low likelihood and/or minor impact):
Hardware Vulnerabilities [02.05.00, 14.07.00]: While a technical risk, the likelihood of direct hardware vulnerabilities causing major issues is lower than software vulnerabilities. Faults in the hardware can disrupt the execution of an algorithm.
External Tool Issues [02.06.00]: External tools used by AI models may present trustworthiness and privacy issues, but are typically of lower risk than internal system flaws.
Erosion of Society: Personalized news feeds can lead to a loss of a shared sense of reality and basic solidarity. This is a long-term effect and may not be a significant risk for all AI applications.
Loss of Autonomy: Delegating decisions to an AI may leave people feeling helpless and subjected to the power of a machine. This may not be a significant risk if AI is used in a way that preserves human control.
Over-reliance: Over-dependence on AI may lead to a loss of human skills, but this is a slower, longer-term concern. This is dependent on specific AI implementations.
Exclusion: AI technology may end up in the hands of a few powerful players, but the effects of this are slow and may not be a high-risk for most AI applications.
Misapplication by a Non-Malicious Actor: AI can be misused by non-malicious actors, which may cause unintended negative outcomes. This risk may be lower since there is no malicious intent.
Unintended Outcomes from Interactions with Other AI Systems: AI systems can interact in unexpected ways, but this may not result in significant harm.
Impacts Resulting from Interactions with External Societal, Political, and Economic Systems: AI systems may interact with existing systems in ways that cause harm, but this is indirect.
Societal concerns: Societal concerns around AI reduce the realization of potential benefits, however, this is a less direct risk and is dependent on social conditions.
Overreliance: As AI capability increases, humans grant AI more control over critical systems and eventually become irreversibly dependent on systems they do not fully understand but this is a slow risk.
Single point of failure: Intense competition leads to one company gaining a technical edge, and if that company's model controls other key systems then failures are possible. This is a possible but lower probability risk.
It is important to note that the likelihood and impact of these risks can vary based on the specific AI system, its intended use, and the safeguards in place.
🗣️ Step 5: Have a Conversation with Cline About What to Change

I just added a task and had chat conversations with Cline while iterating on the script modification and development. Like many LLMs (Claude Sonnet here), it was forgiving of my “produde” vs. “produce” typo!
🧪 Step 6: Test the Code

These updates look reasonable to someone new to coding. And I like how Cline tests what it produces to show that it completed what it set out to do. We can get a second opinion from AI and human reviewers in validation steps.
Running the script for a test:
#Go to the grcnewsassistant directory if you're not already there
cd grcnewsassistant
#Activate the virtual environment if it's not already activated, e.g. for macOS/Linux:
source venv/bin/activate
#Execute the script
python GRCnewsAssistant.py

It’s working!
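For reference, a successful run appends rows to grcdata_rated.csv along these lines (the values are made up, and the exact column headers depend on the label_and_rate pattern's output):
date,keyword,headline,description,url,label,rating,one_sentence_summary
2025-01-27,ransomware,"Hospital chain hit by ransomware","Attackers encrypted patient scheduling systems...",https://example.com/story,CYBERSECURITY,8/10,"A regional hospital network was disrupted by a ransomware attack."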
🔄 Simply Cyber GRC Mantra: Lather, Rinse, Repeat

🤖🔎 AI Review of AI Generated Code
To get a second opinion on the code that Cline modified and produced, I ran it through a different LLM, Gemini, using the explain_code fabric pattern. Here’s what it had to say:
EXPLANATION: This Python script, GRCnewsAssistant.py, automates the process of gathering, analyzing, and storing news articles related to Governance, Risk, and Compliance (GRC). It uses several libraries to achieve this:
newspaper3k: For extracting article content (title, text, authors, keywords, summary, etc.) from URLs.
newsdata.io: For searching and retrieving news articles based on keywords.
fabric: For labeling and rating the articles (presumably using a custom model).
Standard Python libraries like csv, requests, datetime, time, logging, json, tempfile, os, subprocess, platform, sys, and urllib.
Here's a breakdown of the script's functionality:
Keyword Input: Reads keywords from a keywords.csv file. These keywords are used to search for relevant news articles.
News Retrieval: Uses the NewsData.io API to search for news articles matching the keywords. It stores the date, keyword, headline, description, and URL of each article.
URL Saving: Saves the URLs of the retrieved articles to a urls.csv file.
Article Content Extraction: Extracts the full content of each article using the newspaper3k library. This includes the title, authors, keywords, summary, main text, and publish date.
AI Analysis (Fabric): Sends the extracted article content to a fabric command (presumably a custom tool or script) for analysis. The fabric command is expected to perform tasks like labeling, rating, and generating a one-sentence summary. The script uses the system clipboard to pass data to the fabric command and reads the output from a temporary file. This approach is unusual and could be improved by directly piping data to the fabric command.
Data Storage: Stores the original article data along with the results of the AI analysis into a grcdata_rated.csv file. It avoids duplicate entries by checking for existing URLs in the file. It also handles cases where the AI analysis fails for a particular article.
Logging: Uses the logging module to record the progress and any errors encountered during the process. This is crucial for debugging and monitoring the script's execution.
Clipboard Handling: The get_clipboard_command function and related code is used to handle interactions with the system clipboard for passing data to the external fabric tool. This is done in an OS-specific way (macOS, Linux, Windows).
The script is designed to be run as a command-line tool. It requires the NewsData.io API key to be set as an environment variable. It also depends on the xclip utility being installed on Linux systems.
The use of temporary files and the clipboard to communicate with the fabric command is a less common and potentially less efficient approach. A more direct method of passing data to and from the fabric command (e.g., using pipes or direct file I/O) would likely be preferable. The script also lacks error handling for cases where the fabric command takes too long to execute (it uses a timeout, but doesn't fully handle the interrupted process).
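To make that clipboard handoff concrete, here is a minimal sketch of the OS-specific pattern Gemini describes (a simplified reconstruction for illustration, not the script's exact code):
#Simplified sketch of OS-specific clipboard handling (illustrative)
import platform
import subprocess

def get_clipboard_command():
    """Return the command that copies stdin to the system clipboard on this OS."""
    system = platform.system()
    if system == "Darwin":  # macOS
        return ["pbcopy"]
    if system == "Linux":   # depends on the xclip utility being installed
        return ["xclip", "-selection", "clipboard"]
    if system == "Windows":
        return ["clip"]
    raise RuntimeError(f"Unsupported platform: {system}")

def copy_to_clipboard(text):
    """Pipe text to the clipboard so the external fabric command can pick it up."""
    subprocess.run(get_clipboard_command(), input=text.encode("utf-8"), check=True)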
It sees an opportunity to improve the use of the clipboard function, but I’m not fussed about that at the moment, as clipboard input is how I typically interact with the fabric client, and this script is providing the output I need (for the curious, a piped sketch follows these notes).
I’m more concerned with “subtle backdoors,” vulnerabilities, or strange processes in the code that haven’t been flagged, appreciating that this isn’t a product security code scan, but a blunt check.
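On the first point, the piped alternative Gemini suggests would look roughly like this, assuming the fabric CLI reads piped stdin and accepts a --pattern flag (which matches how I invoke it manually):
#Sketch of piping article text directly to fabric (illustrative)
import subprocess

def analyze_with_fabric(article_text):
    """Send article text to fabric via stdin instead of the clipboard and temp files."""
    result = subprocess.run(
        ["fabric", "--pattern", "label_and_rate"],
        input=article_text.encode("utf-8"),
        capture_output=True,
        timeout=120,  # guard against the fabric call hanging
        check=True,
    )
    return result.stdout.decode("utf-8")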
👨🔎 Human Review of AI Generated Code
🆚 Diff from Original grcAssist to GRCnewsAssistant

It changed a lot when compared to one of the three GitHub projects it built on, but no part of the new script raises my eyebrows. Your feedback is of course welcome.
💡Bottom Line
I’m impressed with the Cline AI agent!
I like how it asked for permission to test running the modified script at different stages, then ran the tests to confirm success
It adapted to a variety of question styles that I sent it, from conversational to pasting back terminal snippets
It’s more integrated into my workflow as a VSC plugin than copy/pasting into a web UI
The speed was remarkable
If I can do it, you can do it
Now sharing my first project on GitHub to crowdsource feedback and collaborate with the community
Let’s code responsibly
🔖 Epilogue: Coding responsibly
Exploring further project ideas after this one, I was surprised by Cline's capability to provide me with a Python script to download all transcripts from a YouTube channel (with permission) in a few clicks.

This capability could create issues in at least these four risk categories in the MIT AI Risk database:

AI Risk Database - Risk Categories Flagged
1. Intellectual Property Rights Violations (Ev_ID: 03.03.00)
This is an emerging risk category as the use of generative AI tools like Stable Diffusion, Midjourney, and ChatGPT expands.
Content creators are suing AI developers for allegedly using their work without permission or compensation.
Issues arise as developers use AI assistants like GitHub CoPilot to generate ready-to-use computer code.
Even if the AI is trained on open-source software (OSS), not all OSS licenses are compatible. Some licenses, like GPL, require derivative works to be made publicly available under the same license.
Misuse of such licenses has led to injunctions and forced disclosure of proprietary source code in the past.
2. Intellectual Property Violations (Ev_ID: 23.10.00)
This category addresses AI-generated responses that may violate or encourage violations of intellectual property (IP) rights.
IP violations include unauthorized use of copyrights, trademarks, or patents belonging to third parties.
3. Impact on Intellectual Property Rights (Ev_ID: 31.05.00)
The effectiveness of traditional IP protections is being challenged by the rise of generative AI.
AI models often train on vast datasets, which may include copyrighted and IP-protected materials, raising questions about legality.
4. Intellectual Property Concerns Related to Unauthorized Use (Ev_ID: 48.10.00)
Generative AI eases the production or replication of copyrighted, trademarked, or licensed content without proper authorization.
It may inadvertently expose trade secrets or encourage plagiarism and illegal replication.
Fair use exceptions might not cover all these instances, increasing the risk of legal disputes.
In finance, the term “knows enough to be dangerous” typically refers to taxes. In cyberspace it is taking on a whole new meaning. Governance, Risk, and Compliance has never been more important.